AMDGPU-PRO OpenCL vs. NVIDIA Linux Comparison

OpenCL benchmarks for a future article on Phoronix.

HTML result view exported from: https://openbenchmarking.org/result/1611102-TA-AMDGPUPRO86&obr_sor=y&obr_rro=y.

System Configuration

Common to all eight test runs:

  Processor:          Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
  Motherboard:        MSI C236A WORKSTATION (MS-7998) v1.0
  Chipset:            Intel Sky Lake
  Memory:             16384MB
  Disk:               256GB INTEL SSDPEKKW256G7
  Audio:              Realtek ALC1150
  Monitor:            Acer B286HK
  Network:            Intel Connection
  OS:                 Ubuntu 16.04
  Kernel:             4.8.4-040804-generic (x86_64)
  Desktop:            Unity 7.4.0
  Display Server:     X Server 1.18.4
  Vulkan:             1.0.8
  Compiler:           GCC 5.4.0 20160609 + LLVM 3.8.0 + CUDA 8.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Radeon runs (Radeon RX 460, Radeon RX 480, Radeon R9 Fury):

  Display Driver:     modesetting 1.18.4
  OpenGL:             4.5.13453
  OpenCL:             OpenCL 2.0 AMD-APP (2117.10)

GeForce runs (GeForce GTX 1050, GeForce GTX 1050 Ti, GeForce GTX 1060, GeForce GTX 1070, GeForce GTX 1080):

  Display Driver:     NVIDIA 375.10
  OpenGL:             4.5.0
  OpenCL:             OpenCL 1.2 CUDA 8.0.0

Graphics, per run:

  Radeon RX 460:        AMD Radeon RX 460 2009.7109375MB
  Radeon RX 480:        AMD Radeon RX 480 8141.7109375MB
  Radeon R9 Fury:       Sapphire AMD Radeon R9 Fury 4053.82421875MB
  GeForce GTX 1050:     Zotac NVIDIA GeForce GTX 1050 2048MB (1316/3504MHz)
  GeForce GTX 1050 Ti:  eVGA NVIDIA GeForce GTX 1050 Ti 4096MB (1341/3504MHz)
  GeForce GTX 1060:     NVIDIA GeForce GTX 1060 6GB 6144MB (1506/4006MHz)
  GeForce GTX 1070:     NVIDIA GeForce GTX 1070 8192MB (1504/4006MHz)
  GeForce GTX 1080:     NVIDIA GeForce GTX 1080 8192MB (35/5005MHz)

Compiler Details

  --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details

  Scaling Governor: intel_pstate performance

Graphics Details

  Radeon RX 460, Radeon RX 480, Radeon R9 Fury: GLAMOR

OpenCL Details

  GeForce GTX 1050:     GPU Compute Cores: 640
  GeForce GTX 1050 Ti:  GPU Compute Cores: 768
  GeForce GTX 1060:     GPU Compute Cores: 1280
  GeForce GTX 1070:     GPU Compute Cores: 1920
  GeForce GTX 1080:     GPU Compute Cores: 2560

System Details

  Same GPU Compute Cores values as listed under OpenCL Details.

Results Overview

The first seven columns are the "shoc: OpenCL" tests; the last two are "juliagpu: GPU" and "mandelbulbgpu: GPU".

GPU                  Triad  FFT SP  MD5 Hash  Max SP Flops  Bus Speed Download  Bus Speed Readback  Texture Read Bandwidth      juliagpu  mandelbulbgpu
Radeon RX 460         3.12   26.83      2.42       2150.57                6.84                7.12                   78.96   41787915.50    26828537.03
Radeon RX 480         5.59   35.28      5.12       5434.47               13.11               13.38                  162.54   60734776.57    38283926.00
Radeon R9 Fury        4.06   24.51      5.88       7133.59               13.07               13.45                  221.97   79965845.45    44542867.47
GeForce GTX 1050     11.06  116.39      2.49       2104.35               12.52               13.17                  279.97   64392809.47    37264194.30
GeForce GTX 1050 Ti  11.05  128.64      3.01       2658.39               12.53               13.17                  311.54   77022561.27    44503222.40
GeForce GTX 1060     11.51  214.12      5.61       4781.43               12.53               13.17                  397.51  113379989.40    62974313.00
GeForce GTX 1070     11.69  302.33      8.34       7110.46               12.51               13.17                  450.88  143023765.00    79186944.93
GeForce GTX 1080     11.84  346.47     11.81       9426.88               12.54               13.17                  526.96  164397650.07    91368843.10
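The raw figures in the results overview above are easier to compare once they are normalized against one card. The following is a minimal Python sketch of that arithmetic, not part of the Phoronix Test Suite output. It copies only the MD5 Hash and Max SP Flops columns from the overview, and the helper name `relative_to` and the choice of the Radeon RX 460 as baseline are assumptions made purely for illustration.

```python
# Values copied from the results overview (more is better for both tests).
RESULTS = {
    "Radeon RX 460":       {"MD5 Hash": 2.42,  "Max SP Flops": 2150.57},
    "Radeon RX 480":       {"MD5 Hash": 5.12,  "Max SP Flops": 5434.47},
    "Radeon R9 Fury":      {"MD5 Hash": 5.88,  "Max SP Flops": 7133.59},
    "GeForce GTX 1050":    {"MD5 Hash": 2.49,  "Max SP Flops": 2104.35},
    "GeForce GTX 1050 Ti": {"MD5 Hash": 3.01,  "Max SP Flops": 2658.39},
    "GeForce GTX 1060":    {"MD5 Hash": 5.61,  "Max SP Flops": 4781.43},
    "GeForce GTX 1070":    {"MD5 Hash": 8.34,  "Max SP Flops": 7110.46},
    "GeForce GTX 1080":    {"MD5 Hash": 11.81, "Max SP Flops": 9426.88},
}


def relative_to(results, baseline):
    """Divide every result by the baseline GPU's result for the same test.

    Hypothetical helper for illustration only.
    """
    base = results[baseline]
    return {
        gpu: {test: value / base[test] for test, value in tests.items()}
        for gpu, tests in results.items()
    }


if __name__ == "__main__":
    ratios = relative_to(RESULTS, "Radeon RX 460")
    for gpu, tests in ratios.items():
        cells = "  ".join(f"{test}: {ratio:.2f}x" for test, ratio in tests.items())
        print(f"{gpu:20s} {cells}")
```

A ratio above 1.00x means the card scored higher than the baseline in that test. Because the tests use different units, ratios should only be compared within a single column.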

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

Version: 2015-11-10. Unit: GB/s, More Is Better.

GPU                   GB/s   SE +/-  N
Radeon RX 460         3.12   0.06    6
Radeon R9 Fury        4.06   0.01    3
Radeon RX 480         5.59   0.70    6
GeForce GTX 1050 Ti  11.05   0.01    3
GeForce GTX 1050     11.06   0.00    3
GeForce GTX 1060     11.51   0.01    3
GeForce GTX 1070     11.69   0.00    3
GeForce GTX 1080     11.84   0.01    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

Version: 2015-11-10. Unit: GFLOPS, More Is Better.

GPU                  GFLOPS  SE +/-  N
Radeon R9 Fury        24.51  0.23    3
Radeon RX 460         26.83  4.17    6
Radeon RX 480         35.28  1.64    6
GeForce GTX 1050     116.39  0.37    3
GeForce GTX 1050 Ti  128.64  0.99    3
GeForce GTX 1060     214.12  2.10    3
GeForce GTX 1070     302.33  0.44    3
GeForce GTX 1080     346.47  0.65    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

Version: 2015-11-10. Unit: GHash/s, More Is Better.

GPU                  GHash/s  SE +/-  N
Radeon RX 460           2.42  0.02    3
GeForce GTX 1050        2.49  0.00    3
GeForce GTX 1050 Ti     3.01  0.00    3
Radeon RX 480           5.12  0.04    3
GeForce GTX 1060        5.61  0.00    3
Radeon R9 Fury          5.88  0.03    3
GeForce GTX 1070        8.34  0.00    3
GeForce GTX 1080       11.81  0.00    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

Version: 2015-11-10. Unit: GFLOPS, More Is Better.

GPU                   GFLOPS  SE +/-  N
GeForce GTX 1050     2104.35   5.38   3
Radeon RX 460        2150.57   2.88   3
GeForce GTX 1050 Ti  2658.39   6.27   3
GeForce GTX 1060     4781.43  21.87   3
Radeon RX 480        5434.47  39.66   3
GeForce GTX 1070     7110.46  51.57   3
Radeon R9 Fury       7133.59   0.46   3
GeForce GTX 1080     9426.88  57.81   3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

Version: 2015-11-10. Unit: GB/s, More Is Better.

GPU                   GB/s   SE +/-  N
Radeon RX 460         6.84   0.00    3
GeForce GTX 1070     12.51   0.01    3
GeForce GTX 1050     12.52   0.00    3
GeForce GTX 1050 Ti  12.53   0.00    3
GeForce GTX 1060     12.53   0.00    3
GeForce GTX 1080     12.54   0.00    3
Radeon R9 Fury       13.07   0.00    3
Radeon RX 480        13.11   0.00    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

Version: 2015-11-10. Unit: GB/s, More Is Better.

GPU                   GB/s   SE +/-  N
Radeon RX 460         7.12   0.00    3
GeForce GTX 1050     13.17   0.00    3
GeForce GTX 1050 Ti  13.17   0.00    3
GeForce GTX 1060     13.17   0.00    3
GeForce GTX 1070     13.17   0.00    3
GeForce GTX 1080     13.17   0.00    3
Radeon RX 480        13.38   0.03    3
Radeon R9 Fury       13.45   0.21    4

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

Version: 2015-11-10. Unit: GB/s, More Is Better.

GPU                    GB/s  SE +/-  N
Radeon RX 460         78.96  0.33    3
Radeon RX 480        162.54  0.34    3
Radeon R9 Fury       221.97  0.11    3
GeForce GTX 1050     279.97  3.00    3
GeForce GTX 1050 Ti  311.54  4.30    3
GeForce GTX 1060     397.51  0.98    3
GeForce GTX 1070     450.88  0.17    3
GeForce GTX 1080     526.96  0.23    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

JuliaGPU

OpenCL Device: GPU

Version: JuliaGPU 1.2pts1. Unit: Samples/sec, More Is Better.

GPU                   Samples/sec      SE +/-  N
Radeon RX 460         41787915.50   629997.60  2
Radeon RX 480         60734776.57   478807.27  3
GeForce GTX 1050      64392809.47    64687.33  3
GeForce GTX 1050 Ti   77022561.27   586830.45  3
Radeon R9 Fury        79965845.45  1373156.95  2
GeForce GTX 1060     113379989.40   406531.89  3
GeForce GTX 1070     143023765.00   507906.47  3
GeForce GTX 1080     164397650.07   545460.59  3

1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

MandelbulbGPU

OpenCL Device: GPU

Version: MandelbulbGPU 1.0pts1. Unit: Samples/sec, More Is Better.

GPU                  Samples/sec     SE +/-  N
Radeon RX 460        26828537.03   76036.11  3
GeForce GTX 1050     37264194.30   38810.35  3
Radeon RX 480        38283926.00  121550.30  2
GeForce GTX 1050 Ti  44503222.40   74696.74  3
Radeon R9 Fury       44542867.47  691465.86  3
GeForce GTX 1060     62974313.00  142262.41  3
GeForce GTX 1070     79186944.93  247192.13  3
GeForce GTX 1080     91368843.10  361988.46  3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL


Phoronix Test Suite v10.8.4