Radeon ROCm 2.0 OpenCL Compute Versus NVIDIA Linux

ROCm 2.0 Linux GPGPU/compute benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1812285-SP-ROCM20NVI57&sro&grw.

System Details

The full configuration is listed for the GTX 980 Ti run; the remaining runs list only the fields that changed.

Processor:          Intel Core i9-9900K @ 5.00GHz (8 Cores / 16 Threads)
Motherboard:        ASUS PRIME Z390-A (0602 BIOS)
Chipset:            Intel Cannon Lake PCH Shared SRAM
Memory:             16384MB
Disk:               2000GB SABRENT + Samsung SSD 970 EVO 250GB
Graphics:           NVIDIA GeForce GTX 980 Ti 6GB (999/3505MHz)
Audio:              Realtek ALC1220
Monitor:            Acer B286HK
Network:            Intel Connection
OS:                 Ubuntu 18.04
Kernel:             4.19.5-041905-generic (x86_64)
Desktop:            GNOME Shell 3.28.3
Display Server:     X Server 1.19.6
Display Driver:     NVIDIA 415.23
OpenGL:             4.6.0
OpenCL:             OpenCL 1.2 CUDA 10.0.132
Vulkan:             1.1.84
Compiler:           GCC 7.3.0 + LLVM 6.0.0 + CUDA 10.0
File-System:        ext4
Screen Resolution:  3840x2160

Changed fields per run:

GTX TITAN X GM200   Graphics: NVIDIA GeForce GTX TITAN X 12GB (1001/3505MHz)
GTX 1060            Graphics: NVIDIA GeForce GTX 1060 6GB (1506/4006MHz)
GTX 1070            Graphics: NVIDIA GeForce GTX 1070 8GB (1506/4006MHz)
GTX 1080            Graphics: NVIDIA GeForce GTX 1080 8GB (1607/5005MHz)
GTX 1080 Ti         Graphics: NVIDIA GeForce GTX 1080 Ti 11GB (1480/5508MHz)
RTX 2080            Graphics: Zotac NVIDIA GeForce RTX 2080 8GB (1515/7000MHz)
RTX 2080 Ti         Graphics: NVIDIA GeForce RTX 2080 Ti 11GB (1350/7000MHz)
TITAN RTX           Graphics: NVIDIA TITAN RTX 24GB (1350/7000MHz)
RX 580              Graphics: MSI AMD Radeon RX 470/480 8GB (1366/2000MHz)
                    OpenGL: 4.5 Mesa 19.0.0-devel (git-17218a0406) (LLVM 8.0.0)
                    OpenCL: OpenCL 2.1 AMD-APP (2783.0)
                    Vulkan: 1.1.90
RX Vega 56          Graphics: AMD Radeon RX Vega 8GB (1590/800MHz)
                    Kernel: 4.15.0-43-generic (x86_64)
RX Vega 64          Graphics: AMD Radeon RX Vega 8GB (1630/945MHz)

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate performance

OpenCL Details
- GTX 980 Ti: GPU Compute Cores: 2816
- GTX TITAN X GM200: GPU Compute Cores: 3072
- GTX 1060: GPU Compute Cores: 1280
- GTX 1070: GPU Compute Cores: 1920
- GTX 1080: GPU Compute Cores: 2560
- GTX 1080 Ti: GPU Compute Cores: 3584
- RTX 2080: GPU Compute Cores: 2944
- RTX 2080 Ti: GPU Compute Cores: 4352
- TITAN RTX: GPU Compute Cores: 4608

Python Details
- Python 2.7.15rc1 + Python 3.6.7

Security Details
- __user pointer sanitization + Full generic retpoline IBPB IBRS_FW + SSB disabled via prctl and seccomp

Results Overview

Column key:
 1  clpeak: Global Memory Bandwidth
 2  clpeak: Integer Compute INT
 3  darktable: Boat - OpenCL
 4  darktable: Server Room - OpenCL
 5  shoc: OpenCL - FFT SP
 6  parboil: OpenCL TPACF
 7  rodinia: OpenCL Particle Filter
 8  fahbench
 9  mixbench: Single Precision
10  cl-mem: Copy
11  luxmark: GPU - Luxball HDR
12  luxmark: GPU - Microphone
13  luxmark: GPU - Hotel

A "-" marks a test with no result for that GPU.

GPU                  1      2      3     4     5     6      7    8      9   10     11     12    13
GTX 980 Ti         264   1616   3.27  1.52   728  1.39  10.22  114   5860  217  16778  11105  3879
GTX TITAN X GM200  263   1783   3.11  1.36   726  1.36   9.03  121   6557  218  17288  10929  4135
GTX 1060           147   1285   3.77  1.99   302  1.39  12.07  103   4346  139  12238   6963     -
GTX 1070           196   1684   2.87  1.10   452  1.08   8.30  140   6410  187  17289   9980  3877
GTX 1080           222   2450   2.72  1.12   575  1.06   6.54  155   8570  209  13833   8732  3823
GTX 1080 Ti        329   3366   2.27  1.02   972  0.77   4.97  198  11605  317  21685  13732  5581
RTX 2080           368  10199   1.88  0.80  1083  0.88   6.17  237  11029  328  29650  19855  6589
RTX 2080 Ti        505  14840   1.62  0.73  1443  0.64   4.45  294  16175  454  42314  28476  9191
TITAN RTX          528  15398   1.61  0.73  1548  0.63   4.26  297  17324  484  46377  30528  9884
RX 580             205   1252  13.60  4.19   548  1.75      -    -   5915  184  15270      -     -
RX Vega 56         317   2006  13.64  4.21   932  1.41      -    -  10519  203  30649      -     -
RX Vega 64         362   2494   3.58  1.76  1074  1.36      -    -  12458  222  32545      -     -
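Because the overview mixes "More Is Better" and "Fewer Is Better" tests, raw numbers cannot be averaged directly. The following is a minimal sketch, not part of the original export, of one common way to compare cards: convert each result to a ratio against a baseline GPU (inverting time-based tests) and take the geometric mean. The choice of GTX 1080 as baseline, the four-test subset, and the names `RESULTS`, `relative_score` and `geometric_mean_vs` are illustrative assumptions; the numbers are copied from the overview.

```python
import math

# Subset of the overview results. Each entry maps a test to
# (higher_is_better, {gpu: result}). The test subset is an arbitrary
# illustration, not a selection made by the original export.
RESULTS = {
    "clpeak: Global Memory Bandwidth (GBPS)": (
        True, {"GTX 1080": 222, "RTX 2080": 368, "RX Vega 64": 362}),
    "darktable: Boat - OpenCL (seconds)": (
        False, {"GTX 1080": 2.72, "RTX 2080": 1.88, "RX Vega 64": 3.58}),
    "shoc: OpenCL - FFT SP (GFLOPS)": (
        True, {"GTX 1080": 575, "RTX 2080": 1083, "RX Vega 64": 1074}),
    "parboil: OpenCL TPACF (seconds)": (
        False, {"GTX 1080": 1.06, "RTX 2080": 0.88, "RX Vega 64": 1.36}),
}


def relative_score(value, baseline, higher_is_better):
    """Return a ratio where values above 1.0 mean faster than the baseline."""
    return value / baseline if higher_is_better else baseline / value


def geometric_mean_vs(gpu, baseline_gpu, results=RESULTS):
    """Geometric mean of per-test ratios of `gpu` against `baseline_gpu`."""
    ratios = [
        relative_score(values[gpu], values[baseline_gpu], higher_is_better)
        for higher_is_better, values in results.values()
    ]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    for gpu in ("RTX 2080", "RX Vega 64"):
        print(f"{gpu}: {geometric_mean_vs(gpu, 'GTX 1080'):.2f}x GTX 1080")
```

The geometric mean is used here so that a single test with a very large ratio does not dominate the summary; a different test subset or baseline would give different figures.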

clpeak

OpenCL Test: Global Memory Bandwidth

GBPS, More Is Better

GPU                 Result    SE +/-  N
GTX 1060               147      0.00  3
GTX 1070               196      0.02  3
GTX 1080               222      0.02  3
GTX 1080 Ti            329      0.70  3
GTX 980 Ti             264      0.01  3
GTX TITAN X GM200      263      0.26  3
RTX 2080               368      0.23  3
RTX 2080 Ti            505      0.74  3
RX 580                 205      1.53  3
RX Vega 56             317      0.02  3
RX Vega 64             362      0.01  3
TITAN RTX              528      1.05  3

clpeak

OpenCL Test: Integer Compute INT

GIOPS, More Is Better

GPU                 Result    SE +/-  N
GTX 1060              1285      4.82  3
GTX 1070              1684     18.17  3
GTX 1080              2450     11.35  3
GTX 1080 Ti           3366     18.09  3
GTX 980 Ti            1616     21.99  3
GTX TITAN X GM200     1783     23.76  3
RTX 2080             10199    652.84  3
RTX 2080 Ti          14840    697.03  3
RX 580                1252      0.01  3
RX Vega 56            2006      1.26  3
RX Vega 64            2494      2.18  3
TITAN RTX            15398   1168.75  3
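The "SE +/-" figures are easier to compare across cards when expressed as a percentage of the reported result. The sketch below is an illustration rather than anything produced by the Phoronix Test Suite: it computes that relative standard error for the Integer Compute INT results and flags the noisier runs. It assumes the SE values in the export are listed in the same order as the GPU labels, and the 3% threshold and the function names are arbitrary choices made for this example.

```python
# (result in GIOPS, standard error) per GPU for clpeak Integer Compute INT.
# ASSUMPTION: SE values are paired with GPUs by their listing order in the export.
INT_COMPUTE = {
    "GTX 1060": (1285, 4.82),
    "GTX 1070": (1684, 18.17),
    "GTX 1080": (2450, 11.35),
    "GTX 1080 Ti": (3366, 18.09),
    "GTX 980 Ti": (1616, 21.99),
    "GTX TITAN X GM200": (1783, 23.76),
    "RTX 2080": (10199, 652.84),
    "RTX 2080 Ti": (14840, 697.03),
    "RX 580": (1252, 0.01),
    "RX Vega 56": (2006, 1.26),
    "RX Vega 64": (2494, 2.18),
    "TITAN RTX": (15398, 1168.75),
}


def relative_standard_error(mean, se):
    """Standard error expressed as a percentage of the mean result."""
    return 100.0 * se / mean


def noisy_results(results, threshold_pct=3.0):
    """Return GPUs whose relative standard error exceeds the threshold.

    The default threshold is a hypothetical cut-off chosen for illustration.
    """
    return sorted(
        gpu
        for gpu, (mean, se) in results.items()
        if relative_standard_error(mean, se) > threshold_pct
    )


if __name__ == "__main__":
    for gpu in noisy_results(INT_COMPUTE):
        mean, se = INT_COMPUTE[gpu]
        print(f"{gpu}: {relative_standard_error(mean, se):.1f}% of {mean} GIOPS")
```

Under these assumptions only the three Turing cards exceed the threshold, which suggests their integer results vary more between runs than the others do.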

Darktable

Test: Boat - Acceleration: OpenCL

Darktable 2.4.2. Seconds, Fewer Is Better

GPU                 Result
GTX 1060              3.77
GTX 1070              2.87
GTX 1080              2.72
GTX 1080 Ti           2.27
GTX 980 Ti            3.27
GTX TITAN X GM200     3.11
RTX 2080              1.88
RTX 2080 Ti           1.62
RX 580               13.60
RX Vega 56           13.64
RX Vega 64            3.58
TITAN RTX             1.61

SE +/- values as exported (11 listed for 12 results, N = 3 each): 0.00, 0.01, 0.01, 0.00, 0.00, 0.00, 0.00, 0.07, 0.04, 0.03, 0.01

Darktable

Test: Server Room - Acceleration: OpenCL

Darktable 2.4.2. Seconds, Fewer Is Better

GPU                 Result
GTX 1060              1.99
GTX 1070              1.10
GTX 1080              1.12
GTX 1080 Ti           1.02
GTX 980 Ti            1.52
GTX TITAN X GM200     1.36
RTX 2080              0.80
RTX 2080 Ti           0.73
RX 580                4.19
RX Vega 56            4.21
RX Vega 64            1.76
TITAN RTX             0.73

SE +/- values as exported (11 listed for 12 results, N = 3 each): 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.01, 0.00, 0.00

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10. GFLOPS, More Is Better

GPU                 Result    SE +/-  N
GTX 1060               302      1.44  3
GTX 1070               452      0.94  3
GTX 1080               575      2.68  3
GTX 1080 Ti            972      1.23  3
GTX 980 Ti             728      0.20  3
GTX TITAN X GM200      726      0.60  3
RTX 2080              1083      5.95  3
RTX 2080 Ti           1443     14.24  3
RX 580                 548      0.10  3
RX Vega 56             932      1.83  3
RX Vega 64            1074      2.24  3
TITAN RTX             1548     11.44  3

1. (CXX) g++ options: -O2 -lSHOCCommon -std=c++14 -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

Parboil

Test: OpenCL TPACF

Parboil 2.5. Seconds, Fewer Is Better

GPU                 Result    SE +/-  N
GTX 1060              1.39      0.02  6
GTX 1070              1.08      0.03  3
GTX 1080              1.06      0.04  3
GTX 1080 Ti           0.77      0.03  3
GTX 980 Ti            1.39      0.05  3
GTX TITAN X GM200     1.36      0.01  3
RTX 2080              0.88      0.03  3
RTX 2080 Ti           0.64      0.03  3
RX 580                1.75      0.01  3
RX Vega 56            1.41      0.00  3
RX Vega 64            1.36      0.00  3
TITAN RTX             0.63      0.04  3

1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Rodinia

Test: OpenCL Particle Filter

Rodinia 2.4. Seconds, Fewer Is Better

GPU                 Result    SE +/-  N
GTX 1060             12.07      0.01  3
GTX 1070              8.30      0.01  3
GTX 1080              6.54      0.07  3
GTX 1080 Ti           4.97      0.02  3
GTX 980 Ti           10.22      0.07  3
GTX TITAN X GM200     9.03      0.16  3
RTX 2080              6.17      0.03  3
RTX 2080 Ti           4.45      0.05  3
TITAN RTX             4.26      0.07  3

1. (CXX) g++ options: -O2 -lOpenCL

FAHBench

FAHBench 2.3.2. Ns Per Day, More Is Better

GPU                 Result    SE +/-  N
GTX 1060               103      0.07  3
GTX 1070               140      0.15  3
GTX 1080               155      0.15  3
GTX 1080 Ti            198      0.16  3
GTX 980 Ti             114      0.03  3
GTX TITAN X GM200      121      0.08  3
RTX 2080               237      0.23  3
RTX 2080 Ti            294      0.29  3
TITAN RTX              297      0.64  3

Mixbench

Benchmark: Single Precision

Mixbench 2016-06-06. GFLOPS, More Is Better

GPU                 Result    SE +/-  N
GTX 1060              4346      0.99  3
GTX 1070              6410     48.40  3
GTX 1080              8570      9.71  3
GTX 1080 Ti          11605    548.77  3
GTX 980 Ti            5860      1.73  3
GTX TITAN X GM200     6557      6.28  3
RTX 2080             11029      1.47  3
RTX 2080 Ti          16175     21.81  3
RX 580                5915      0.59  3
RX Vega 56           10519      3.03  3
RX Vega 64           12458      5.02  3
TITAN RTX            17324      8.55  3

1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

cl-mem

Benchmark: Copy

cl-mem 2017-01-13. GB/s, More Is Better

GPU                 Result    SE +/-  N
GTX 1060               139      0.03  3
GTX 1070               187      0.07  3
GTX 1080               209      0.07  3
GTX 1080 Ti            317      0.07  3
GTX 980 Ti             217      0.00  3
GTX TITAN X GM200      218      0.03  3
RTX 2080               328      0.09  3
RTX 2080 Ti            454      0.39  3
RX 580                 184      0.03  3
RX Vega 56             203      0.03  3
RX Vega 64             222      0.06  3
TITAN RTX              484      0.24  3

1. (CC) gcc options: -O2 -flto -lOpenCL

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

LuxMark 3.1. Score, More Is Better

GPU                 Result    SE +/-  N
GTX 1060             12238      0.88  3
GTX 1070             17288      0.58  3
GTX 1080             13823     29.04  3
GTX 1080 Ti          21562    119.86  3
GTX 980 Ti           16778     15.76  3
GTX TITAN X GM200    17288      1.33  3
RTX 2080             29641      2.73  3
RTX 2080 Ti          42314     82.86  3
RX 580               15270     16.67  3
RX Vega 56           30649     37.86  3
RX Vega 64           32545    545.54  4
TITAN RTX            45932     41.88  3

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.1. Score, More Is Better

GPU                 Result    SE +/-  N
GTX 1060              6963      0.33  3
GTX 1070              9980      1.20  3
GTX 1080              8732     15.59  3
GTX 1080 Ti          13732     12.67  3
GTX 980 Ti           11105     23.78  3
GTX TITAN X GM200    10929     20.28  3
RTX 2080             19855     30.23  3
RTX 2080 Ti          28476     18.48  3
TITAN RTX            30528     39.86  3

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1. Score, More Is Better

GPU                 Result    SE +/-  N
GTX 1070              3877      8.21  3
GTX 1080              3823     18.33  3
GTX 1080 Ti           5581      6.01  3
GTX 980 Ti            3879     10.50  3
GTX TITAN X GM200     4135     31.83  3
RTX 2080              6589      1.15  3
RTX 2080 Ti           9191     15.93  3
TITAN RTX             9884     38.04  3


Phoronix Test Suite v10.8.4