OpenCL ROCm 2.0 vs. AMDGPU-PRO Linux
ROCm Benchmark

ROCm 2.0:

  Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads), Motherboard: ASUS ROG ZENITH EXTREME (1601 BIOS), Chipset: AMD Family 17h, Memory: 32768MB, Disk: 16GB Voyager 3.0 + Samsung SSD 970 EVO 500GB, Graphics: AMD Radeon RX Vega 8GB (1630/945MHz), Audio: Realtek ALC1220, Monitor: ASUS VP28U, Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad

  OS: Ubuntu 18.04, Kernel: 4.15.0-43-generic (x86_64), Desktop: GNOME Shell 3.28.3, Display Server: X Server 1.19.6, Display Driver: amdgpu 18.0.1, OpenGL: 4.5 Mesa 18.0.5 (LLVM 6.0.0), Compiler: GCC 7.3.0, File-System: ext4, Screen Resolution: 3840x2160

AMDGPU-PRO 18.50 PAL:

  Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads), Motherboard: ASUS ROG ZENITH EXTREME (1601 BIOS), Chipset: AMD Family 17h, Memory: 32768MB, Disk: 16GB Voyager 3.0 + Samsung SSD 970 EVO 500GB, Graphics: AMD Radeon RX Vega 8GB (1630/945MHz), Audio: Realtek ALC1220, Monitor: ASUS VP28U, Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad

  OS: Ubuntu 18.04, Kernel: 4.15.0-43-generic (x86_64), Desktop: GNOME Shell 3.28.3, Display Server: X Server 1.19.6, Display Driver: amdgpu 18.1.99, OpenGL: 4.6.13542, Compiler: GCC 7.3.0, File-System: ext4, Screen Resolution: 3840x2160

RX 580 8Gb - ROCm 2.7.0:

  Processor: AMD Ryzen 7 2700 Eight-Core @ 3.20GHz (8 Cores / 16 Threads), Motherboard: ASRock AB350 Pro4 (P5.40 BIOS), Chipset: AMD 17h, Memory: 32768MB, Disk: 240GB SanDisk SSD PLUS + 2000GB Seagate ST2000DM001-9YN1, Graphics: AMD Radeon RX 470/480/570/570X/580/580X/590 8GB (1350/2000MHz), Audio: AMD Ellesmere HDMI Audio, Monitor: S24D332, Network: Realtek RTL8111/8168/8411

  OS: Arch rolling, Kernel: 5.2.11-arch1-1-ARCH (x86_64), Desktop: KDE Plasma 5.16.4, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.1.5 (LLVM 8.0.1), OpenCL: OpenCL 2.0 AMD-APP.internal (2949.0), Vulkan: 1.1.90, Compiler: GCC 9.1.0 + Clang 8.0.1, File-System: ext4, Screen Resolution: 1920x1080

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Triad
GB/s > Higher Is Better
ROCm 2.0 ................ 6.69 |===================================
AMDGPU-PRO 18.50 PAL .... 6.07 |================================
RX 580 8Gb - ROCm 2.7.0 . 8.96 |===============================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: FFT SP
GFLOPS > Higher Is Better
ROCm 2.0 ................ 1075 |===============================================
AMDGPU-PRO 18.50 PAL .... 863 |======================================
RX 580 8Gb - ROCm 2.7.0 . 542 |========================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: MD5 Hash
GHash/s > Higher Is Better
ROCm 2.0 ................ 16.46 |===========================================
AMDGPU-PRO 18.50 PAL .... 17.63 |==============================================
RX 580 8Gb - ROCm 2.7.0 . 7.95 |=====================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Bus Speed Download
GB/s > Higher Is Better
ROCm 2.0 ................ 7.14 |=============================
AMDGPU-PRO 18.50 PAL .... 7.14 |=============================
RX 580 8Gb - ROCm 2.7.0 . 11.34 |==============================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Bus Speed Readback
GB/s > Higher Is Better
ROCm 2.0 ................ 7.16 |=============================================
AMDGPU-PRO 18.50 PAL .... 7.16 |=============================================
RX 580 8Gb - ROCm 2.7.0 . 7.53 |===============================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s > Higher Is Better
ROCm 2.0 ............. 441 |===================================================
AMDGPU-PRO 18.50 PAL . 424 |=================================================

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
ROCm 2.0 ................ 221 |=============================
AMDGPU-PRO 18.50 PAL .... 364 |================================================
RX 580 8Gb - ROCm 2.7.0 . 174 |=======================

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
ROCm 2.0 ................ 160 |===================
AMDGPU-PRO 18.50 PAL .... 398 |================================================
RX 580 8Gb - ROCm 2.7.0 . 155 |===================

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
ROCm 2.0 ................ 384 |================================================
AMDGPU-PRO 18.50 PAL .... 379 |===============================================
RX 580 8Gb - ROCm 2.7.0 . 188 |========================

PlaidML
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 252 |=================================================
AMDGPU-PRO 18.50 PAL . 263 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 479 |===================================================
AMDGPU-PRO 18.50 PAL . 479 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: ResNet 50 - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 225 |=================================================
AMDGPU-PRO 18.50 PAL . 233 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: Inception V3 - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 113.88 |========================================
AMDGPU-PRO 18.50 PAL . 136.38 |================================================

LeelaChessZero 0.20.1
Backend: OpenCL
Nodes Per Second > Higher Is Better
ROCm 2.0 ................ 304 |======================
RX 580 8Gb - ROCm 2.7.0 . 650 |================================================

Rodinia 2.4
Test: OpenCL Heartwall
Seconds < Lower Is Better
ROCm 2.0 ................ 4.03 |===============================================
AMDGPU-PRO 18.50 PAL .... 3.80 |============================================
RX 580 8Gb - ROCm 2.7.0 . 4.07 |===============================================

Darktable 2.4.2
Test: Boat - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 4.48 |==========================
AMDGPU-PRO 18.50 PAL . 8.50 |==================================================

Darktable 2.4.2
Test: Masskrug - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 5.70 |==================================================
AMDGPU-PRO 18.50 PAL . 5.47 |================================================

Darktable 2.4.2
Test: Server Rack - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 0.23 |==================================================
AMDGPU-PRO 18.50 PAL . 0.13 |============================

Darktable 2.4.2
Test: Server Room - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 1.97 |======================================
AMDGPU-PRO 18.50 PAL . 2.57 |==================================================

JuliaGPU 1.2pts1
OpenCL Device: GPU
Samples/sec > Higher Is Better
ROCm 2.0 ................ 166858816 |============================
AMDGPU-PRO 18.50 PAL .... 240293469 |=========================================
RX 580 8Gb - ROCm 2.7.0 . 248633196 |==========================================

clpeak
OpenCL Test: Kernel Latency
us < Lower Is Better
ROCm 2.0 ............. 10.56 |============
AMDGPU-PRO 18.50 PAL . 41.75 |=================================================

clpeak
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
ROCm 2.0 ............. 2497 |==================================================
AMDGPU-PRO 18.50 PAL . 2486 |==================================================

clpeak
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
ROCm 2.0 ............. 13053 |=================================================
AMDGPU-PRO 18.50 PAL . 12528 |===============================================

clpeak
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
ROCm 2.0 ............. 833 |===================================================
AMDGPU-PRO 18.50 PAL . 828 |===================================================

clpeak
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
ROCm 2.0 ............. 362 |===================================================
AMDGPU-PRO 18.50 PAL . 361 |===================================================

clpeak
OpenCL Test: Transfer Bandwidth enqueueReadBuffer
GBPS > Higher Is Better
ROCm 2.0 ............. 17.06 |=================================================
AMDGPU-PRO 18.50 PAL . 10.92 |===============================

clpeak
OpenCL Test: Transfer Bandwidth enqueueWriteBuffer
GBPS > Higher Is Better
ROCm 2.0 ............. 45.44 |=================================================
AMDGPU-PRO 18.50 PAL . 22.49 |========================

Darktable 2.6.2
Test: Boat - Acceleration: OpenCL
Seconds < Lower Is Better
RX 580 8Gb - ROCm 2.7.0 . 14.25 |==============================================

Darktable 2.6.2
Test: Masskrug - Acceleration: OpenCL
Seconds < Lower Is Better
RX 580 8Gb - ROCm 2.7.0 . 7.43 |===============================================

Darktable 2.6.2
Test: Server Rack - Acceleration: OpenCL
Seconds < Lower Is Better
RX 580 8Gb - ROCm 2.7.0 . 0.23 |===============================================

Darktable 2.6.2
Test: Server Room - Acceleration: OpenCL
Seconds < Lower Is Better
RX 580 8Gb - ROCm 2.7.0 . 4.85 |===============================================
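The listing reports each test separately and mixes higher-is-better and lower-is-better metrics, so it does not directly say how ROCm 2.0 and AMDGPU-PRO 18.50 PAL compare on the Radeon RX Vega system overall. The sketch below is one way to summarize that; it is an illustration, not part of the original results. It assumes that a per-test ratio (inverted for lower-is-better tests so that a value above 1.0 always means ROCm 2.0 was faster) combined with a geometric mean is an acceptable summary. The values are copied from the results in this listing. LeelaChessZero is left out because it has no AMDGPU-PRO result, and the RX 580 runs are left out because they used different hardware. The helper `relative_speedup()` is hypothetical.

```python
# Sketch: summarize ROCm 2.0 vs. AMDGPU-PRO 18.50 PAL on the Radeon RX Vega system.
# Values are copied from the results in this listing; relative_speedup() is a
# hypothetical helper written for this example, not part of any benchmark tool.
from statistics import geometric_mean

# (test, ROCm 2.0 result, AMDGPU-PRO 18.50 PAL result, higher_is_better)
RESULTS = [
    ("SHOC: Triad (GB/s)", 6.69, 6.07, True),
    ("SHOC: FFT SP (GFLOPS)", 1075, 863, True),
    ("SHOC: MD5 Hash (GHash/s)", 16.46, 17.63, True),
    ("SHOC: Bus Speed Download (GB/s)", 7.14, 7.14, True),
    ("SHOC: Bus Speed Readback (GB/s)", 7.16, 7.16, True),
    ("SHOC: Texture Read Bandwidth (GB/s)", 441, 424, True),
    ("cl-mem: Copy (GB/s)", 221, 364, True),
    ("cl-mem: Read (GB/s)", 160, 398, True),
    ("cl-mem: Write (GB/s)", 384, 379, True),
    ("PlaidML: IMDB LSTM (examples/s)", 252, 263, True),
    ("PlaidML: Mobilenet (examples/s)", 479, 479, True),
    ("PlaidML: ResNet 50 (examples/s)", 225, 233, True),
    ("PlaidML: Inception V3 (examples/s)", 113.88, 136.38, True),
    ("Rodinia: Heartwall (s)", 4.03, 3.80, False),
    ("Darktable 2.4.2: Boat (s)", 4.48, 8.50, False),
    ("Darktable 2.4.2: Masskrug (s)", 5.70, 5.47, False),
    ("Darktable 2.4.2: Server Rack (s)", 0.23, 0.13, False),
    ("Darktable 2.4.2: Server Room (s)", 1.97, 2.57, False),
    ("JuliaGPU (samples/s)", 166858816, 240293469, True),
    ("clpeak: Kernel Latency (us)", 10.56, 41.75, False),
    ("clpeak: Integer Compute (GIOPS)", 2497, 2486, True),
    ("clpeak: Single-Precision Float (GFLOPS)", 13053, 12528, True),
    ("clpeak: Double-Precision Double (GFLOPS)", 833, 828, True),
    ("clpeak: Global Memory Bandwidth (GBPS)", 362, 361, True),
    ("clpeak: enqueueReadBuffer (GBPS)", 17.06, 10.92, True),
    ("clpeak: enqueueWriteBuffer (GBPS)", 45.44, 22.49, True),
]


def relative_speedup(rocm: float, pro: float, higher_is_better: bool) -> float:
    """Return ROCm 2.0's advantage over AMDGPU-PRO as a ratio (> 1.0 means ROCm 2.0 won)."""
    # For lower-is-better metrics (seconds, microseconds) invert the ratio
    # so that every entry reads the same way.
    return rocm / pro if higher_is_better else pro / rocm


speedups = {name: relative_speedup(r, p, hib) for name, r, p, hib in RESULTS}
# The geometric mean treats a 2x win and a 2x loss as cancelling out,
# which an arithmetic mean of ratios would not.
overall = geometric_mean(speedups.values())

for name, ratio in speedups.items():
    print(f"{name:44s} {ratio:5.2f}x")
print(f"{'Geometric mean':44s} {overall:5.2f}x")
```

A ratio of exactly 1.00x (for example Bus Speed Download or Mobilenet) means both stacks scored the same. Because this summary weights every test equally, it is only as representative as the set of tests included.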