OpenCL August

Fresh NVIDIA vs. Radeon OpenCL Linux benchmarks. Tests by Michael Larabel for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/2107105-FI-1808234PT94&grr.

System Details

Radeon RX Vega 56 (first listed configuration):

  Processor:          AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
  Motherboard:        ASUS ROG ZENITH EXTREME (1402 BIOS)
  Chipset:            AMD Family 17h
  Memory:             32768MB
  Disk:               Samsung SSD 970 EVO 500GB
  Graphics:           AMD Radeon RX Vega 8176MB
  Audio:              Realtek ALC1220
  Monitor:            ASUS VP28U
  Network:            Intel I211 Gigabit Connection + Qualcomm Atheros QCA6174 802.11ac Wireless
  OS:                 Ubuntu 18.04
  Kernel:             4.15.0-33-generic (x86_64)
  Desktop:            GNOME Shell 3.28.3
  Display Server:     X Server 1.19.6
  Display Driver:     amdgpu 18.0.99
  OpenGL:             4.6.13536
  OpenCL:             OpenCL 2.1 AMD-APP (2671.3)
  Compiler:           GCC 7.3.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Remaining configurations (only the fields the export lists for each):

Radeon RX Vega 64:

  (no separate values listed)

GeForce GTX 1070:

  Graphics:           NVIDIA GeForce GTX 1070 8192MB (1506/4006MHz)
  Display Driver:     NVIDIA 396.54
  OpenGL:             4.6.0
  OpenCL:             OpenCL 1.2 CUDA 9.2.210

GeForce GTX 1070 Ti:

  Graphics:           Zotac NVIDIA GeForce GTX 1070 Ti 8192MB (1607/4006MHz)

GeForce GTX 1080:

  Graphics:           NVIDIA GeForce GTX 1080 8192MB (1607/5005MHz)

GeForce GTX 1080 Ti:

  Graphics:           NVIDIA GeForce GTX 1080 Ti 11264MB (1480/5508MHz)

Radeon R7 250X + Radeon R7 260X:

  Processor:          AMD FX-6300 Six-Core @ 3.50GHz (3 Cores / 6 Threads)
  Motherboard:        MSI 970 GAMING (MS-7693) v4.0 (V22.4 BIOS)
  Chipset:            AMD RD9x0/RX980
  Memory:             16GB
  Disk:               1000GB Western Digital WD10EZEX-08W + 275GB Crucial CT275MX3 + 2000GB Seagate ST2000DX002-2DV1 + 64GB Cruzer Blade
  Graphics:           Sapphire AMD Radeon HD 7770/8760 / R7 250X 2GB
  Audio:              Realtek ALC1150
  Monitor:            BenQ GW2270
  Network:            Qualcomm Atheros AR9485
  OS:                 openSUSE 20210708
  Kernel:             5.13.0-1-default (x86_64)
  Desktop:            KDE Plasma 5.22.2
  Display Server:     X Server 1.20.11
  Display Driver:     modesetting 1.20.11
  OpenGL:             4.6 Mesa 21.1.4 (LLVM 12.0.0)
  OpenCL:             OpenCL 2.1 AMD-APP (3224.4)
  Vulkan:             1.2.168
  Compiler:           GCC 11.1.1 20210625 [revision 62bbb113ae68a7e724255e17143520735bcb9ec9] + Clang 12.0.0 + LLVM 12.0.0
  Screen Resolution:  1920x1080

Compiler Details

- Radeon RX Vega 56, Radeon RX Vega 64, GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- Radeon R7 250X + Radeon R7 260X: --build=x86_64-suse-linux --disable-libcc1 --disable-libssp --disable-libstdcxx-pch --disable-libvtv --disable-werror --enable-cet=auto --enable-checking=release --enable-gnu-indirect-function --enable-host-shared --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d,jit --enable-libphobos --enable-libstdcxx-allocator=new --enable-link-mutex --enable-linux-futex --enable-multilib --enable-offload-targets=nvptx-none,amdgcn-amdhsa, --enable-plugin --enable-ssp --enable-version-specific-runtime-libs --host=x86_64-suse-linux --mandir=/usr/share/man --with-arch-32=x86-64 --with-build-config=bootstrap-lto-lean --with-gcc-major-version-only --with-slibdir=/lib64 --with-tune=generic --without-cuda-driver --without-system-libunwind

Processor Details

- Radeon RX Vega 56, Radeon RX Vega 64, GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti: Scaling Governor: acpi-cpufreq ondemand
- Radeon R7 250X + Radeon R7 260X: Scaling Governor: acpi-cpufreq schedutil - CPU Microcode: 0x6000852

Graphics Details

- Radeon RX Vega 56, Radeon RX Vega 64, Radeon R7 250X + Radeon R7 260X: GLAMOR

Python Details

- Radeon RX Vega 56: Python 2.7.15rc1 + Python 3.6.5

Security Details

- Radeon RX Vega 56, Radeon RX Vega 64, GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080, GeForce GTX 1080 Ti: __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp Protection
- Radeon R7 250X + Radeon R7 260X: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected

OpenCL Details

- GeForce GTX 1070: GPU Compute Cores: 1920
- GeForce GTX 1070 Ti: GPU Compute Cores: 2432
- GeForce GTX 1080: GPU Compute Cores: 2560
- GeForce GTX 1080 Ti: GPU Compute Cores: 3584

Kernel Details

- Radeon R7 250X + Radeon R7 260X: amdgpu.si_support=1 amdgpu.cik_support=1

Environment Details

- Radeon R7 250X + Radeon R7 260X: DEBUGINFOD_URLS=https://debuginfod.opensuse.org/

Results Overview

Test                                      RX Vega 56    RX Vega 64      GTX 1070   GTX 1070 Ti      GTX 1080   GTX 1080 Ti  R7 250X + R7 260X
luxmark: GPU - Hotel                            5083          5907          3820          4405          3883          5662               1628
fahbench                                       92.48         92.46        131.18        145.38        141.38        179.34            19.0525
shoc: OpenCL - Texture Read Bandwidth         362.97        427.74        456.10        501.46        523.65        593.14            68.5893
cl-mem: Copy                                  313.30        369.93        186.87        186.80        209.33        317.37               45.9
cl-mem: Read                                  346.47        399.00        205.50        205.63        228.40        337.73               63.4
cl-mem: Write                                 333.20        388.57        192.30           191        216.70        336.30               39.2
mandelgpu: GPU                          165896206.17  192917451.23  148507150.87  184334944.10  188560526.97  250678650.87         29262082.9
shoc: OpenCL - FFT SP                         830.77        882.96        528.71        551.95        650.78        984.61            149.008
shoc: OpenCL - MD5 Hash                        13.10         17.12         10.70         13.80         14.40         19.72             1.5159
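The overview figures are easier to compare once each test is scaled to its fastest result. The following is a minimal Python sketch of that normalization, not part of the Phoronix Test Suite output: the names `SYSTEMS`, `RESULTS`, `relative_to_best`, and `fastest` are hypothetical, and only three of the nine tests are copied in (all are "More Is Better" metrics).

```python
# Results copied from the overview above; every metric here is "More Is Better".
# SYSTEMS, RESULTS, relative_to_best and fastest are illustrative names only.
SYSTEMS = [
    "Radeon RX Vega 56",
    "Radeon RX Vega 64",
    "GeForce GTX 1070",
    "GeForce GTX 1070 Ti",
    "GeForce GTX 1080",
    "GeForce GTX 1080 Ti",
    "Radeon R7 250X + Radeon R7 260X",
]

RESULTS = {
    "luxmark: GPU - Hotel": [5083, 5907, 3820, 4405, 3883, 5662, 1628],
    "fahbench": [92.48, 92.46, 131.18, 145.38, 141.38, 179.34, 19.0525],
    "cl-mem: Copy": [313.30, 369.93, 186.87, 186.80, 209.33, 317.37, 45.9],
}


def relative_to_best(values):
    """Scale results so the fastest system in the test scores 100.0."""
    best = max(values)
    return [100.0 * v / best for v in values]


def fastest(test):
    """Return the name of the system with the highest result for a test."""
    values = RESULTS[test]
    return SYSTEMS[values.index(max(values))]


if __name__ == "__main__":
    for test, values in RESULTS.items():
        print(test)
        for name, pct in zip(SYSTEMS, relative_to_best(values)):
            print(f"  {name:<34}{pct:6.1f}%")
```

Scaling to the per-test best keeps tests with very different units (scores, ns/day, GB/s) on one 0 to 100 scale; it says nothing about run-to-run variance, which the per-test SE values cover.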

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1 - Score, More Is Better

  System                               Score
  Radeon RX Vega 56                     5083
  Radeon RX Vega 64                     5907
  GeForce GTX 1070                      3820
  GeForce GTX 1070 Ti                   4405
  GeForce GTX 1080                      3883
  GeForce GTX 1080 Ti                   5662
  Radeon R7 250X + Radeon R7 260X       1628

SE values as listed (five given for seven results, so not matched to systems): SE +/- 4.18, N = 3; SE +/- 32.69, N = 3; SE +/- 1.33, N = 3; SE +/- 12.60, N = 3; SE +/- 20.63, N = 12

FAHBench

FAHBench 2.3.2 - Ns Per Day, More Is Better

  System                              Ns Per Day    SE +/-    N
  Radeon RX Vega 56                        92.48      0.14    3
  Radeon RX Vega 64                        92.46      0.82    3
  GeForce GTX 1070                        131.18      0.18    3
  GeForce GTX 1070 Ti                     145.38      0.17    3
  GeForce GTX 1080                        141.38      0.05    3
  GeForce GTX 1080 Ti                     179.34      0.34    3
  Radeon R7 250X + Radeon R7 260X          19.05      0.04    3

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better

  System                                    GB/s    SE +/-    N
  Radeon RX Vega 56                       362.97      1.35    3
  Radeon RX Vega 64                       427.74      0.13    3
  GeForce GTX 1070                        456.10      0.38    3
  GeForce GTX 1070 Ti                     501.46      1.71    3
  GeForce GTX 1080                        523.65      2.76    3
  GeForce GTX 1080 Ti                     593.14      0.88    3
  Radeon R7 250X + Radeon R7 260X          68.59      1.02    3

Additional link flags listed for six of the seven results: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi
1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 - GB/s, More Is Better

  System                                    GB/s    SE +/-    N
  Radeon RX Vega 56                       313.30      0.30    3
  Radeon RX Vega 64                       369.93      0.03    3
  GeForce GTX 1070                        186.87      0.03    3
  GeForce GTX 1070 Ti                     186.80      0.00    3
  GeForce GTX 1080                        209.33      0.03    3
  GeForce GTX 1080 Ti                     317.37      0.15    3
  Radeon R7 250X + Radeon R7 260X          45.90      0.58    3

1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 - GB/s, More Is Better

  System                                    GB/s    SE +/-    N
  Radeon RX Vega 56                       346.47      0.52    3
  Radeon RX Vega 64                       399.00      0.10    3
  GeForce GTX 1070                        205.50      0.10    3
  GeForce GTX 1070 Ti                     205.63      0.09    3
  GeForce GTX 1080                        228.40      0.20    3
  GeForce GTX 1080 Ti                     337.73      0.30    3
  Radeon R7 250X + Radeon R7 260X          63.40      0.15    3

1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 - GB/s, More Is Better

  System                                    GB/s
  Radeon RX Vega 56                       333.20
  Radeon RX Vega 64                       388.57
  GeForce GTX 1070                        192.30
  GeForce GTX 1070 Ti                     191.00
  GeForce GTX 1080                        216.70
  GeForce GTX 1080 Ti                     336.30
  Radeon R7 250X + Radeon R7 260X          39.20

SE values as listed (six given for seven results, so not matched to systems): SE +/- 0.40, N = 3; SE +/- 0.71, N = 3; SE +/- 0.00, N = 3; SE +/- 0.06, N = 3; SE +/- 0.10, N = 3; SE +/- 0.00, N = 3
1. (CC) gcc options: -O2 -flto -lOpenCL

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 - Samples/sec, More Is Better

  System                               Samples/sec       SE +/-    N
  Radeon RX Vega 56                   165896206.17    144327.89    3
  Radeon RX Vega 64                   192917451.23    234631.71    3
  GeForce GTX 1070                    148507150.87    483103.56    3
  GeForce GTX 1070 Ti                 184334944.10    572304.04    3
  GeForce GTX 1080                    188560526.97    430789.92    3
  GeForce GTX 1080 Ti                 250678650.87    727895.77    3
  Radeon R7 250X + Radeon R7 260X      29262082.90     23993.55    3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GFLOPS, More Is Better

  System                                  GFLOPS    SE +/-     N
  Radeon RX Vega 56                       830.77     31.13    12
  Radeon RX Vega 64                       882.96     14.14    12
  GeForce GTX 1070                        528.71      2.10     3
  GeForce GTX 1070 Ti                     551.95      1.30     3
  GeForce GTX 1080                        650.78      1.65     3
  GeForce GTX 1080 Ti                     984.61      1.72     3
  Radeon R7 250X + Radeon R7 260X         149.01      0.43     3

Additional link flags listed for six of the seven results: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi
1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GHash/s, More Is Better

  System                                 GHash/s    SE +/-    N
  Radeon RX Vega 56                      13.1000    0.0087    3
  Radeon RX Vega 64                      17.1200    0.0035    3
  GeForce GTX 1070                       10.7000    0.0178    3
  GeForce GTX 1070 Ti                    13.8000    0.0017    3
  GeForce GTX 1080                       14.4000    0.0168    3
  GeForce GTX 1080 Ti                    19.7200    0.0158    3
  Radeon R7 250X + Radeon R7 260X         1.5159    0.0012    3

Additional link flags listed for six of the seven results: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi
1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt


Phoronix Test Suite v10.8.4