Radeon ROCm 2.0 OpenCL Compute Versus NVIDIA Linux

ROCm 2.0 Linux GPGPU/compute benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1812285-SP-ROCM20NVI57.

Systems tested: GTX 980 Ti, GTX TITAN X GM200, GTX 1060, GTX 1070, GTX 1080, GTX 1080 Ti, RTX 2080, RTX 2080 Ti, TITAN RTX, RX 580, RX Vega 56, RX Vega 64

GTX 980 Ti configuration:

Processor:          Intel Core i9-9900K @ 5.00GHz (8 Cores / 16 Threads)
Motherboard:        ASUS PRIME Z390-A (0602 BIOS)
Chipset:            Intel Cannon Lake PCH Shared SRAM
Memory:             16384MB
Disk:               2000GB SABRENT + Samsung SSD 970 EVO 250GB
Graphics:           NVIDIA GeForce GTX 980 Ti 6GB (999/3505MHz)
Audio:              Realtek ALC1220
Monitor:            Acer B286HK
Network:            Intel Connection
OS:                 Ubuntu 18.04
Kernel:             4.19.5-041905-generic (x86_64)
Desktop:            GNOME Shell 3.28.3
Display Server:     X Server 1.19.6
Display Driver:     NVIDIA 415.23
OpenGL:             4.6.0
OpenCL:             OpenCL 1.2 CUDA 10.0.132
Vulkan:             1.1.84
Compiler:           GCC 7.3.0 + LLVM 6.0.0 + CUDA 10.0
File-System:        ext4
Screen Resolution:  3840x2160

Remaining systems (only the fields the export lists for each):

GTX TITAN X GM200
  Graphics:  NVIDIA GeForce GTX TITAN X 12GB (1001/3505MHz)
GTX 1060
  Graphics:  NVIDIA GeForce GTX 1060 6GB (1506/4006MHz)
GTX 1070
  Graphics:  NVIDIA GeForce GTX 1070 8GB (1506/4006MHz)
GTX 1080
  Graphics:  NVIDIA GeForce GTX 1080 8GB (1607/5005MHz)
GTX 1080 Ti
  Graphics:  NVIDIA GeForce GTX 1080 Ti 11GB (1480/5508MHz)
RTX 2080
  Graphics:  Zotac NVIDIA GeForce RTX 2080 8GB (1515/7000MHz)
RTX 2080 Ti
  Graphics:  NVIDIA GeForce RTX 2080 Ti 11GB (1350/7000MHz)
TITAN RTX
  Graphics:  NVIDIA TITAN RTX 24GB (1350/7000MHz)
RX 580
  Graphics:  MSI AMD Radeon RX 470/480 8GB (1366/2000MHz)
  OpenGL:    4.5 Mesa 19.0.0-devel (git-17218a0406) (LLVM 8.0.0)
  OpenCL:    OpenCL 2.1 AMD-APP (2783.0)
  Vulkan:    1.1.90
RX Vega 56
  Graphics:  AMD Radeon RX Vega 8GB (1590/800MHz)
  Kernel:    4.15.0-43-generic (x86_64)
RX Vega 64
  Graphics:  AMD Radeon RX Vega 8GB (1630/945MHz)

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate performance

OpenCL Details
- GTX 980 Ti: GPU Compute Cores: 2816
- GTX TITAN X GM200: GPU Compute Cores: 3072
- GTX 1060: GPU Compute Cores: 1280
- GTX 1070: GPU Compute Cores: 1920
- GTX 1080: GPU Compute Cores: 2560
- GTX 1080 Ti: GPU Compute Cores: 3584
- RTX 2080: GPU Compute Cores: 2944
- RTX 2080 Ti: GPU Compute Cores: 4352
- TITAN RTX: GPU Compute Cores: 4608

Python Details
- Python 2.7.15rc1 + Python 3.6.7

Security Details
- __user pointer sanitization + Full generic retpoline IBPB IBRS_FW + SSB disabled via prctl and seccomp

Result overview (columns are numbered in the order listed; "-" marks a test with no result for that card):

 1  darktable: Boat - OpenCL
 2  darktable: Server Room - OpenCL
 3  parboil: OpenCL TPACF
 4  rodinia: OpenCL Particle Filter
 5  mixbench: Single Precision
 6  clpeak: Global Memory Bandwidth
 7  clpeak: Integer Compute INT
 8  fahbench
 9  luxmark: GPU - Luxball HDR
10  luxmark: GPU - Microphone
11  luxmark: GPU - Hotel
12  shoc: OpenCL - FFT SP
13  cl-mem: Copy

GPU                     1      2      3      4      5      6      7      8      9     10     11     12     13
GTX 980 Ti           3.27   1.52   1.39  10.22   5860    264   1616    114  16778  11105   3879    728    217
GTX TITAN X GM200    3.11   1.36   1.36   9.03   6557    263   1783    121  17288  10929   4135    726    218
GTX 1060             3.77   1.99   1.39  12.07   4346    147   1285    103  12238   6963      -    302    139
GTX 1070             2.87   1.10   1.08   8.30   6410    196   1684    140  17289   9980   3877    452    187
GTX 1080             2.72   1.12   1.06   6.54   8570    222   2450    155  13833   8732   3823    575    209
GTX 1080 Ti          2.27   1.02   0.77   4.97  11605    329   3366    198  21685  13732   5581    972    317
RTX 2080             1.88   0.80   0.88   6.17  11029    368  10199    237  29650  19855   6589   1083    328
RTX 2080 Ti          1.62   0.73   0.64   4.45  16175    505  14840    294  42314  28476   9191   1443    454
TITAN RTX            1.61   0.73   0.63   4.26  17324    528  15398    297  46377  30528   9884   1548    484
RX 580              13.60   4.19   1.75      -   5915    205   1252      -  15270      -      -    548    184
RX Vega 56          13.64   4.21   1.41      -  10519    317   2006      -  30649      -      -    932    203
RX Vega 64           3.58   1.76   1.36      -  12458    362   2494      -  32545      -      -   1074    222

Darktable

Test: Boat - Acceleration: OpenCL

Darktable 2.4.2, Seconds, Fewer Is Better

GPU                  Seconds
GTX 980 Ti              3.27
GTX TITAN X GM200       3.11
GTX 1060                3.77
GTX 1070                2.87
GTX 1080                2.72
GTX 1080 Ti             2.27
RTX 2080                1.88
RTX 2080 Ti             1.62
TITAN RTX               1.61
RX 580                 13.60
RX Vega 56             13.64
RX Vega 64              3.58

Standard errors as exported (N = 3 each; 11 entries for 12 results, so they are not matched to cards here): 0.00, 0.00, 0.00, 0.01, 0.01, 0.00, 0.00, 0.01, 0.07, 0.04, 0.03
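For a "Fewer Is Better" result such as these Boat timings, relative performance against a reference card is the reference time divided by the card's time. The following is a minimal Python sketch of that arithmetic using the mean times reported for this test. The helper name `speedup_vs` and the choice of the GTX 980 Ti as the baseline are illustrative assumptions, not part of the export.

```python
# Mean Darktable 2.4.2 "Boat" OpenCL times in seconds, as reported for this test.
boat_seconds = {
    "GTX 980 Ti": 3.27,
    "GTX TITAN X GM200": 3.11,
    "GTX 1060": 3.77,
    "GTX 1070": 2.87,
    "GTX 1080": 2.72,
    "GTX 1080 Ti": 2.27,
    "RTX 2080": 1.88,
    "RTX 2080 Ti": 1.62,
    "TITAN RTX": 1.61,
    "RX 580": 13.60,
    "RX Vega 56": 13.64,
    "RX Vega 64": 3.58,
}


def speedup_vs(baseline, times):
    """Return baseline_time / card_time per card (hypothetical helper).

    For a time-based metric, a ratio above 1 means the card finished
    faster than the baseline; below 1 means it was slower.
    """
    base = times[baseline]
    return {gpu: base / seconds for gpu, seconds in times.items()}


if __name__ == "__main__":
    ratios = speedup_vs("GTX 980 Ti", boat_seconds)
    # Print the cards from fastest to slowest relative to the baseline.
    for gpu, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{gpu:<18} {ratio:5.2f}x")
```

For the "More Is Better" metrics later in this export (GFLOPS, GB/s, scores), the ratio would be taken the other way round, card value divided by baseline value.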

Darktable

Test: Server Room - Acceleration: OpenCL

Darktable 2.4.2, Seconds, Fewer Is Better

GPU                  Seconds
GTX 980 Ti              1.52
GTX TITAN X GM200       1.36
GTX 1060                1.99
GTX 1070                1.10
GTX 1080                1.12
GTX 1080 Ti             1.02
RTX 2080                0.80
RTX 2080 Ti             0.73
TITAN RTX               0.73
RX 580                  4.19
RX Vega 56              4.21
RX Vega 64              1.76

Standard errors as exported (N = 3 each; 11 entries for 12 results, so they are not matched to cards here): 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.01, 0.00

Parboil

Test: OpenCL TPACF

Parboil 2.5, Seconds, Fewer Is Better

GPU                  Seconds       SE    N
GTX 980 Ti              1.39     0.05    3
GTX TITAN X GM200       1.36     0.01    3
GTX 1060                1.39     0.02    6
GTX 1070                1.08     0.03    3
GTX 1080                1.06     0.04    3
GTX 1080 Ti             0.77     0.03    3
RTX 2080                0.88     0.03    3
RTX 2080 Ti             0.64     0.03    3
TITAN RTX               0.63     0.04    3
RX 580                  1.75     0.01    3
RX Vega 56              1.41     0.00    3
RX Vega 64              1.36     0.00    3

1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Rodinia

Test: OpenCL Particle Filter

Rodinia 2.4, Seconds, Fewer Is Better

GPU                  Seconds       SE    N
GTX 980 Ti             10.22     0.07    3
GTX TITAN X GM200       9.03     0.16    3
GTX 1060               12.07     0.01    3
GTX 1070                8.30     0.01    3
GTX 1080                6.54     0.07    3
GTX 1080 Ti             4.97     0.02    3
RTX 2080                6.17     0.03    3
RTX 2080 Ti             4.45     0.05    3
TITAN RTX               4.26     0.07    3

1. (CXX) g++ options: -O2 -lOpenCL

Mixbench

Benchmark: Single Precision

Mixbench 2016-06-06, GFLOPS, More Is Better

GPU                   GFLOPS       SE    N
GTX 980 Ti              5860     1.73    3
GTX TITAN X GM200       6557     6.28    3
GTX 1060                4346     0.99    3
GTX 1070                6410    48.40    3
GTX 1080                8570     9.71    3
GTX 1080 Ti            11605   548.77    3
RTX 2080               11029     1.47    3
RTX 2080 Ti            16175    21.81    3
TITAN RTX              17324     8.55    3
RX 580                  5915     0.59    3
RX Vega 56             10519     3.03    3
RX Vega 64             12458     5.02    3

1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak, GBPS, More Is Better

GPU                     GBPS       SE    N
GTX 980 Ti               264     0.01    3
GTX TITAN X GM200        263     0.26    3
GTX 1060                 147     0.00    3
GTX 1070                 196     0.02    3
GTX 1080                 222     0.02    3
GTX 1080 Ti              329     0.70    3
RTX 2080                 368     0.23    3
RTX 2080 Ti              505     0.74    3
TITAN RTX                528     1.05    3
RX 580                   205     1.53    3
RX Vega 56               317     0.02    3
RX Vega 64               362     0.01    3

clpeak

OpenCL Test: Integer Compute INT

clpeak, GIOPS, More Is Better

GPU                    GIOPS       SE    N
GTX 980 Ti              1616    21.99    3
GTX TITAN X GM200       1783    23.76    3
GTX 1060                1285     4.82    3
GTX 1070                1684    18.17    3
GTX 1080                2450    11.35    3
GTX 1080 Ti             3366    18.09    3
RTX 2080               10199   652.84    3
RTX 2080 Ti            14840   697.03    3
TITAN RTX              15398  1168.75    3
RX 580                  1252     0.01    3
RX Vega 56              2006     1.26    3
RX Vega 64              2494     2.18    3

FAHBench

FAHBench 2.3.2, Ns Per Day, More Is Better

GPU                 Ns Per Day     SE    N
GTX 980 Ti               114     0.03    3
GTX TITAN X GM200        121     0.08    3
GTX 1060                 103     0.07    3
GTX 1070                 140     0.15    3
GTX 1080                 155     0.15    3
GTX 1080 Ti              198     0.16    3
RTX 2080                 237     0.23    3
RTX 2080 Ti              294     0.29    3
TITAN RTX                297     0.64    3

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

LuxMark 3.1, Score, More Is Better

GPU                    Score       SE    N
GTX 980 Ti             16910    26.00    3
GTX TITAN X GM200      17299    11.50    3
GTX 1070               17288     0.58    3
GTX 1080               13823    29.04    3
GTX 1080 Ti            21562   119.86    3
RTX 2080               29641     2.73    3
RTX 2080 Ti            42693    66.64    3
TITAN RTX              45932    41.88    3
GTX 1060               12238     0.88    3
RX 580                 15270    16.67    3
RX Vega 56             30649    37.86    3
RX Vega 64             32545   545.54    4

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.1, Score, More Is Better

GPU                    Score       SE    N
GTX 980 Ti             11105    23.78    3
GTX TITAN X GM200      10929    20.28    3
GTX 1060                6963     0.33    3
GTX 1070                9980     1.20    3
GTX 1080                8732    15.59    3
GTX 1080 Ti            13732    12.67    3
RTX 2080               19855    30.23    3
RTX 2080 Ti            28476    18.48    3
TITAN RTX              30528    39.86    3

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1, Score, More Is Better

GPU                    Score       SE    N
GTX 980 Ti              3879    10.50    3
GTX TITAN X GM200       4135    31.83    3
GTX 1070                3877     8.21    3
GTX 1080                3823    18.33    3
GTX 1080 Ti             5581     6.01    3
RTX 2080                6589     1.15    3
RTX 2080 Ti             9191    15.93    3
TITAN RTX               9884    38.04    3

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10, GFLOPS, More Is Better

GPU                   GFLOPS       SE    N
GTX 980 Ti               728     0.20    3
GTX TITAN X GM200        726     0.60    3
GTX 1060                 302     1.44    3
GTX 1070                 452     0.94    3
GTX 1080                 575     2.68    3
GTX 1080 Ti              972     1.23    3
RTX 2080                1083     5.95    3
RTX 2080 Ti             1443    14.24    3
TITAN RTX               1548    11.44    3
RX 580                   548     0.10    3
RX Vega 56               932     1.83    3
RX Vega 64              1074     2.24    3

1. (CXX) g++ options: -O2 -lSHOCCommon -std=c++14 -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

cl-mem

Benchmark: Copy

cl-mem 2017-01-13, GB/s, More Is Better

GPU                     GB/s       SE    N
GTX 980 Ti               217     0.00    3
GTX TITAN X GM200        218     0.03    3
GTX 1060                 139     0.03    3
GTX 1070                 187     0.07    3
GTX 1080                 209     0.07    3
GTX 1080 Ti              317     0.07    3
RTX 2080                 328     0.09    3
RTX 2080 Ti              454     0.39    3
TITAN RTX                484     0.24    3
RX 580                   184     0.03    3
RX Vega 56               203     0.03    3
RX Vega 64               222     0.06    3

1. (CC) gcc options: -O2 -flto -lOpenCL


Phoronix Test Suite v10.8.4