OpenCL ROCm 2.0 vs. AMDGPU-PRO Linux Radeon RX Vega 64

ROCm 2.0 OpenCL versus PAL OpenCL driver in AMDGPU-PRO 18.50. Benchmarks by Michael Larabel for a future article on Phoronix.com.

ROCm 2.0:

  Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads), Motherboard: ASUS ROG ZENITH EXTREME (1601 BIOS), Chipset: AMD Family 17h, Memory: 32768MB, Disk: 16GB Voyager 3.0 + Samsung SSD 970 EVO 500GB, Graphics: AMD Radeon RX Vega 8GB (1630/945MHz), Audio: Realtek ALC1220, Monitor: ASUS VP28U, Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad

  OS: Ubuntu 18.04, Kernel: 4.15.0-43-generic (x86_64), Desktop: GNOME Shell 3.28.3, Display Server: X Server 1.19.6, Display Driver: amdgpu 18.0.1, OpenGL: 4.5 Mesa 18.0.5 (LLVM 6.0.0), Compiler: GCC 7.3.0, File-System: ext4, Screen Resolution: 3840x2160

AMDGPU-PRO 18.50 PAL:

  Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads), Motherboard: ASUS ROG ZENITH EXTREME (1601 BIOS), Chipset: AMD Family 17h, Memory: 32768MB, Disk: 16GB Voyager 3.0 + Samsung SSD 970 EVO 500GB, Graphics: AMD Radeon RX Vega 8GB (1630/945MHz), Audio: Realtek ALC1220, Monitor: ASUS VP28U, Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad

  OS: Ubuntu 18.04, Kernel: 4.15.0-43-generic (x86_64), Desktop: GNOME Shell 3.28.3, Display Server: X Server 1.19.6, Display Driver: amdgpu 18.1.99, OpenGL: 4.6.13542, Compiler: GCC 7.3.0, File-System: ext4, Screen Resolution: 3840x2160

PlaidML
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 252 |=================================================
AMDGPU-PRO 18.50 PAL . 263 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 479 |===================================================
AMDGPU-PRO 18.50 PAL . 479 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: ResNet 50 - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 225 |=================================================
AMDGPU-PRO 18.50 PAL . 233 |===================================================

PlaidML
FP16: No - Mode: Inference - Network: Inception V3 - Device: OpenCL
Examples Per Second > Higher Is Better
ROCm 2.0 ............. 113.88 |========================================
AMDGPU-PRO 18.50 PAL . 136.38 |================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Triad
GB/s > Higher Is Better
ROCm 2.0 ............. 6.69 |==================================================
AMDGPU-PRO 18.50 PAL . 6.07 |=============================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Bus Speed Download
GB/s > Higher Is Better
ROCm 2.0 ............. 7.14 |==================================================
AMDGPU-PRO 18.50 PAL . 7.14 |==================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Bus Speed Readback
GB/s > Higher Is Better
ROCm 2.0 ............. 7.16 |==================================================
AMDGPU-PRO 18.50 PAL . 7.16 |==================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s > Higher Is Better
ROCm 2.0 ............. 441 |===================================================
AMDGPU-PRO 18.50 PAL . 424 |=================================================

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
ROCm 2.0 ............. 221 |===============================
AMDGPU-PRO 18.50 PAL . 364 |===================================================

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
ROCm 2.0 ............. 160 |=====================
AMDGPU-PRO 18.50 PAL . 398 |===================================================

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
ROCm 2.0 ............. 384 |===================================================
AMDGPU-PRO 18.50 PAL . 379 |==================================================

clpeak
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
ROCm 2.0 ............. 362 |===================================================
AMDGPU-PRO 18.50 PAL . 361 |===================================================

clpeak
OpenCL Test: Transfer Bandwidth enqueueReadBuffer
GBPS > Higher Is Better
ROCm 2.0 ............. 17.06 |=================================================
AMDGPU-PRO 18.50 PAL . 10.92 |===============================

clpeak
OpenCL Test: Transfer Bandwidth enqueueWriteBuffer
GBPS > Higher Is Better
ROCm 2.0 ............. 45.44 |=================================================
AMDGPU-PRO 18.50 PAL . 22.49 |========================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: FFT SP
GFLOPS > Higher Is Better
ROCm 2.0 ............. 1075 |==================================================
AMDGPU-PRO 18.50 PAL . 863 |========================================

clpeak
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
ROCm 2.0 ............. 13053 |=================================================
AMDGPU-PRO 18.50 PAL . 12528 |===============================================

clpeak
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
ROCm 2.0 ............. 833 |===================================================
AMDGPU-PRO 18.50 PAL . 828 |===================================================

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: MD5 Hash
GHash/s > Higher Is Better
ROCm 2.0 ............. 16.46 |==============================================
AMDGPU-PRO 18.50 PAL . 17.63 |=================================================

clpeak
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
ROCm 2.0 ............. 2497 |==================================================
AMDGPU-PRO 18.50 PAL . 2486 |==================================================

LeelaChessZero 0.20.1
Backend: OpenCL
Nodes Per Second > Higher Is Better
ROCm 2.0 . 304 |===============================================================

JuliaGPU 1.2pts1
OpenCL Device: GPU
Samples/sec > Higher Is Better
ROCm 2.0 ............. 166858816 |===============================
AMDGPU-PRO 18.50 PAL . 240293469 |=============================================

Rodinia 2.4
Test: OpenCL Heartwall
Seconds < Lower Is Better
ROCm 2.0 ............. 4.03 |==================================================
AMDGPU-PRO 18.50 PAL . 3.80 |===============================================

Darktable 2.4.2
Test: Boat - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 4.48 |==========================
AMDGPU-PRO 18.50 PAL . 8.50 |==================================================

Darktable 2.4.2
Test: Masskrug - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 5.70 |==================================================
AMDGPU-PRO 18.50 PAL . 5.47 |================================================

Darktable 2.4.2
Test: Server Rack - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 0.23 |==================================================
AMDGPU-PRO 18.50 PAL . 0.13 |============================

Darktable 2.4.2
Test: Server Room - Acceleration: OpenCL
Seconds < Lower Is Better
ROCm 2.0 ............. 1.97 |======================================
AMDGPU-PRO 18.50 PAL . 2.57 |==================================================

clpeak
OpenCL Test: Kernel Latency
us < Lower Is Better
ROCm 2.0 ............. 10.56 |============
AMDGPU-PRO 18.50 PAL . 41.75 |=================================================
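

The bars above show each result only relative to the better of the two drivers. To read the gaps as percentages, one option is a small script like the following. It is a minimal sketch, not part of the original benchmark run: the helper name `rocm_vs_pal_percent` and the choice of which results to include are assumptions made for illustration, and the numbers are transcribed from the results listed above.

```python
def rocm_vs_pal_percent(rocm, pal, higher_is_better=True):
    """Return how much better (+) or worse (-) ROCm scored than PAL, in percent.

    Hypothetical helper for illustration; not part of any benchmark tool.
    """
    if higher_is_better:
        return (rocm / pal - 1.0) * 100.0
    # For time or latency results a smaller value is better, so invert the ratio.
    return (pal / rocm - 1.0) * 100.0


# (test name, ROCm 2.0 value, AMDGPU-PRO 18.50 PAL value, higher_is_better)
# Values copied from the results in this file.
results = [
    ("cl-mem Read (GB/s)", 160, 398, True),
    ("clpeak enqueueWriteBuffer (GBPS)", 45.44, 22.49, True),
    ("Darktable Boat (s)", 4.48, 8.50, False),
    ("clpeak Kernel Latency (us)", 10.56, 41.75, False),
]

for name, rocm, pal, higher_is_better in results:
    delta = rocm_vs_pal_percent(rocm, pal, higher_is_better)
    print(f"{name}: ROCm {delta:+.1f}% vs. PAL")
```

A positive number means ROCm 2.0 came out ahead on that test and a negative number means AMDGPU-PRO 18.50 PAL did. Inverting the ratio for "Lower Is Better" tests keeps the sign convention the same across throughput and timing results.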