OpenCL August

Fresh NVIDIA vs. Radeon OpenCL Linux benchmarks. Tests by Michael Larabel for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1808234-PTS-OPENCLAU56&grt&sor.

System Details

Common to all results (each field is listed once in the export):

  Processor:          AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
  Motherboard:        ASUS ROG ZENITH EXTREME (1402 BIOS)
  Chipset:            AMD Family 17h
  Memory:             32768MB
  Disk:               Samsung SSD 970 EVO 500GB
  Audio:              Realtek ALC1220
  Monitor:            ASUS VP28U
  Network:            Intel I211 Gigabit Connection + Qualcomm Atheros QCA6174 802.11ac Wireless
  OS:                 Ubuntu 18.04
  Kernel:             4.15.0-33-generic (x86_64)
  Desktop:            GNOME Shell 3.28.3
  Display Server:     X Server 1.19.6
  Compiler:           GCC 7.3.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Radeon RX Vega 56, Radeon RX Vega 64:

  Graphics:           AMD Radeon RX Vega 8176MB
  Display Driver:     amdgpu 18.0.99
  OpenGL:             4.6.13536
  OpenCL:             OpenCL 2.1 AMD-APP (2671.3)

GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti:

  Display Driver:     NVIDIA 396.54
  OpenGL:             4.6.0
  OpenCL:             OpenCL 1.2 CUDA 9.2.210
  Graphics:
    GeForce GTX 1070:     NVIDIA GeForce GTX 1070 8192MB (1506/4006MHz)
    GeForce GTX 1070 Ti:  Zotac NVIDIA GeForce GTX 1070 Ti 8192MB (1607/4006MHz)
    GeForce GTX 1080:     NVIDIA GeForce GTX 1080 8192MB (1607/5005MHz)
    GeForce GTX 1080 Ti:  NVIDIA GeForce GTX 1080 Ti 11264MB (1480/5508MHz)

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: acpi-cpufreq ondemand

Graphics Details
- Radeon RX Vega 56, Radeon RX Vega 64: GLAMOR

Python Details
- Radeon RX Vega 56: Python 2.7.15rc1 + Python 3.6.5

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp Protection

OpenCL Details
- GeForce GTX 1070: GPU Compute Cores: 1920
- GeForce GTX 1070 Ti: GPU Compute Cores: 2432
- GeForce GTX 1080: GPU Compute Cores: 2560
- GeForce GTX 1080 Ti: GPU Compute Cores: 3584

Results Overview

Columns: Radeon RX Vega 56, Radeon RX Vega 64, GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti.

Test                                   Vega 56       Vega 64       GTX 1070      GTX 1070 Ti   GTX 1080      GTX 1080 Ti
cl-mem: Copy                           313.30        369.93        186.87        186.80        209.33        317.37
cl-mem: Read                           346.47        399.00        205.50        205.63        228.40        337.73
cl-mem: Write                          333.20        388.57        192.30        191           216.70        336.30
fahbench                               92.48         92.46         131.18        145.38        141.38        179.34
luxmark: GPU - Hotel                   5083          5907          3820          4405          3883          5662
mandelgpu: GPU                         165896206.17  192917451.23  148507150.87  184334944.10  188560526.97  250678650.87
shoc: OpenCL - FFT SP                  830.77        882.96        528.71        551.95        650.78        984.61
shoc: OpenCL - MD5 Hash                13.10         17.12         10.70         13.80         14.40         19.72
shoc: OpenCL - Texture Read Bandwidth  362.97        427.74        456.10        501.46        523.65        593.14
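For readers who want to compare cards as ratios rather than raw figures, the following is a minimal sketch that divides each result by a baseline card's result. The numbers are the cl-mem Copy values from this export (GB/s, more is better). The helper name `relative_to` and the choice of the GeForce GTX 1070 as baseline are assumptions made for illustration and are not part of the result file.

```python
# cl-mem Copy results copied from this export (GB/s, more is better).
CL_MEM_COPY = {
    "Radeon RX Vega 56": 313.30,
    "Radeon RX Vega 64": 369.93,
    "GeForce GTX 1070": 186.87,
    "GeForce GTX 1070 Ti": 186.80,
    "GeForce GTX 1080": 209.33,
    "GeForce GTX 1080 Ti": 317.37,
}


def relative_to(results, baseline):
    """Hypothetical helper: divide every result by the baseline card's result.

    Only meaningful for "more is better" metrics, which covers every
    test in this export.
    """
    base = results[baseline]
    return {card: value / base for card, value in results.items()}


if __name__ == "__main__":
    # The baseline is an arbitrary choice for this sketch.
    rel = relative_to(CL_MEM_COPY, "GeForce GTX 1070")
    # Print the cards from fastest to slowest relative to the baseline.
    for card, ratio in sorted(rel.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{card:<22} {ratio:.2f}x")
```

The same helper can be pointed at any other row of the overview by swapping in that row's six values.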

cl-mem

Benchmark: Copy

cl-mem 2017-01-13, Benchmark: Copy (GB/s, More Is Better)

  Radeon RX Vega 64      369.93
  GeForce GTX 1080 Ti    317.37
  Radeon RX Vega 56      313.30
  GeForce GTX 1080       209.33
  GeForce GTX 1070       186.87
  GeForce GTX 1070 Ti    186.80

SE +/- (in the order exported): 0.03, 0.15, 0.30, 0.03, 0.03, 0.00; N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13, Benchmark: Read (GB/s, More Is Better)

  Radeon RX Vega 64      399.00
  Radeon RX Vega 56      346.47
  GeForce GTX 1080 Ti    337.73
  GeForce GTX 1080       228.40
  GeForce GTX 1070 Ti    205.63
  GeForce GTX 1070       205.50

SE +/- (in the order exported): 0.10, 0.52, 0.30, 0.20, 0.09, 0.10; N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13, Benchmark: Write (GB/s, More Is Better)

  Radeon RX Vega 64      388.57
  GeForce GTX 1080 Ti    336.30
  Radeon RX Vega 56      333.20
  GeForce GTX 1080       216.70
  GeForce GTX 1070       192.30
  GeForce GTX 1070 Ti    191.00

SE +/- (in the order exported, five values): 0.71, 0.10, 0.40, 0.06, 0.00; N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)

  GeForce GTX 1080 Ti    179.34
  GeForce GTX 1070 Ti    145.38
  GeForce GTX 1080       141.38
  GeForce GTX 1070       131.18
  Radeon RX Vega 56       92.48
  Radeon RX Vega 64       92.46

SE +/- (in the order exported): 0.34, 0.17, 0.05, 0.18, 0.14, 0.82; N = 3

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1, OpenCL Device: GPU - Scene: Hotel (Score, More Is Better)

  Radeon RX Vega 64      5907
  GeForce GTX 1080 Ti    5662
  Radeon RX Vega 56      5083
  GeForce GTX 1070 Ti    4405
  GeForce GTX 1080       3883
  GeForce GTX 1070       3820

SE +/- (in the order exported, four values): 32.69, 12.60, 4.18, 1.33; N = 3

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1, OpenCL Device: GPU (Samples/sec, More Is Better)

  GeForce GTX 1080 Ti    250678650.87
  Radeon RX Vega 64      192917451.23
  GeForce GTX 1080       188560526.97
  GeForce GTX 1070 Ti    184334944.10
  Radeon RX Vega 56      165896206.17
  GeForce GTX 1070       148507150.87

SE +/- (in the order exported): 727895.77, 234631.71, 430789.92, 572304.04, 144327.89, 483103.56; N = 3
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)

  GeForce GTX 1080 Ti    984.61
  Radeon RX Vega 64      882.96
  Radeon RX Vega 56      830.77
  GeForce GTX 1080       650.78
  GeForce GTX 1070 Ti    551.95
  GeForce GTX 1070       528.71

SE +/- (in the order exported): 1.72 (N = 3), 14.14 (N = 12), 31.13 (N = 12), 1.65 (N = 3), 1.30 (N = 3), 2.10 (N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)

  GeForce GTX 1080 Ti    19.72
  Radeon RX Vega 64      17.12
  GeForce GTX 1080       14.40
  GeForce GTX 1070 Ti    13.80
  Radeon RX Vega 56      13.10
  GeForce GTX 1070       10.70

SE +/- (in the order exported): 0.02, 0.00, 0.02, 0.00, 0.01, 0.02; N = 3
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)

  GeForce GTX 1080 Ti    593.14
  GeForce GTX 1080       523.65
  GeForce GTX 1070 Ti    501.46
  GeForce GTX 1070       456.10
  Radeon RX Vega 64      427.74
  Radeon RX Vega 56      362.97

SE +/- (in the order exported): 0.88, 2.76, 1.71, 0.38, 0.13, 1.35; N = 3
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi


Phoronix Test Suite v10.8.4