OpenCL ROCm Linux vs. AMDGPU-PRO Benchmarks

ROCm 1.4 benchmarks on Ubuntu 16.04 compared to AMDGPU-PRO. OpenCL benchmarks by Michael Larabel for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1701170-RI-OPENCLROC11&sro.

Configurations tested (in result-table order): R9 Fury with ROCm 1.4, RX 460 with ROCm 1.4, RX 480 with ROCm 1.4, RX 460 with AMDGPU-PRO 16.50, RX 480 with AMDGPU-PRO 16.50, R9 Fury with AMDGPU-PRO 16.50.

R9 Fury, ROCm 1.4:
Processor: Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
Motherboard: MSI C236A WORKSTATION (MS-7998) v1.0
Chipset: Intel Sky Lake
Memory: 16384MB
Disk: 256GB TOSHIBA-RD400
Graphics: Sapphire AMD Radeon R9 FURY / NANO 3968MB
Audio: Realtek ALC1150
Monitor: Acer B286HK
Network: Intel Connection
OS: Ubuntu 16.04
Kernel: 4.6.0-kfd-compute-rocm-rel-1.4-16 (x86_64)
Desktop: Unity 7.4.0
Display Server: X Server 1.18.3
Display Driver: modesetting 1.18.3
OpenGL: 4.1 Mesa 11.2.0 Gallium 0.4
OpenCL: OpenCL 2.0 AMD-APP (2300.5)
Compiler: GCC 5.4.0 20160609 + Clang 4.0 + LLVM 4.0.0
File-System: ext4
Screen Resolution: 3840x2160

RX 460 and RX 480, ROCm 1.4 (changes from the above):
Graphics: LLVMpipe
OpenGL: 3.3 Mesa 11.2.0 Gallium 0.4

RX 460, RX 480 and R9 Fury, AMDGPU-PRO 16.50 (changes from the above):
Graphics: AMD Radeon RX 460 2048MB (RX 460); AMD Radeon RX 480 8192MB (RX 480); Sapphire AMD Radeon R9 Fury 4096MB (R9 Fury)
Kernel: 4.4.0-59-generic (x86_64)
Display Driver: amdgpu 1.1.99
OpenGL: 4.5.13462
OpenCL: OpenCL 2.0 AMD-APP (2236.5)
Compiler: GCC 5.4.0 20160609

Compiler Details
- --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details
- Scaling Governor: intel_pstate powersave

Graphics Details
- R9 Fury: ROCm 1.4, RX 460: AMDGPU-PRO 16.50, RX 480: AMDGPU-PRO 16.50, R9 Fury: AMDGPU-PRO 16.50: GLAMOR

Environment Details
- RX 460: ROCm 1.4, RX 480: ROCm 1.4: LIBGL_ALWAYS_SOFTWARE=1

Result overview (OpenBenchmarking.org). A "-" marks a test with no result for that configuration.

Test                                   R9 Fury       RX 460        RX 480        RX 460        RX 480        R9 Fury
                                       ROCm 1.4      ROCm 1.4      ROCm 1.4      PRO 16.50     PRO 16.50     PRO 16.50
shoc: OpenCL - Triad                   10.59         5.21          7.94          6.25          9.40          4.12
shoc: OpenCL - FFT SP                  399.71        158.21        403.22        245.13        508.20        751.86
shoc: OpenCL - Max SP Flops            5330.67       2158.12       5815.52       2066.69       5750.69       7131.18
shoc: OpenCL - Bus Speed Download      11.32         5.72          8.37          6.93          13.66         13.69
shoc: OpenCL - Bus Speed Readback      10.86         5.27          8.38          7.14          14.20         14.21
shoc: OpenCL - Texture Read Bandwidth  214.53        91.14         193.49        77.35         160.57        223.25
parboil: OpenCL BFS                    1.43          1.47          1.49          -             -             -
parboil: OpenCL LBM                    36.81         61.84         38.49         -             -             -
parboil: OpenCL TPACF                  2.24          3.90          2.24          -             -             -
rodinia: OpenCL Myocyte                362.48        169.92        244.31        131.59        88.30         -
rodinia: OpenCL Heartwall              6.45          13.51         7.28          7.97          5.35          6.38
darktable: Boat - OpenCL               4.98          9.57          5.72          9.51          4.37          4.22
darktable: Masskrug - OpenCL           6.09          7.05          5.93          7.20          5.76          6.30
darktable: Server Room - OpenCL        1.48          2.48          0.99          2.83          0.99          1.79
juliagpu: GPU                          73072755.80   46101692.27   70675082.10   50807022.25   81972594.40   75992404.70
mandelbulbgpu: GPU                     44388927.12   29562658.90   49050438.67   32208376.98   48517365.80   43447360.40
mandelgpu: GPU                         82051996.27   28295516.33   59296261.87   35552080.15   81101281.90   107202116.40
luxmark: GPU - Hotel                   1201          381           987           897           2399          2402
luxmark: GPU - Microphone              5695          -             -             2623          6924          7681
luxmark: GPU - Luxball HDR             11995         3664          9196          5547          14066         19394
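To compare the two driver stacks directly, the paired figures on this page can be reduced to a single ratio per test. The sketch below is one possible way to do that, not part of the Phoronix Test Suite output: the helper name `relative_performance` and the dictionary layout are assumptions made for illustration, and the numbers are the R9 Fury results reported here. For "fewer is better" tests the ratio is inverted so that a value above 1.0 always means ROCm 1.4 came out ahead.

```python
# R9 Fury results copied from this result file.
# Each entry: (ROCm 1.4 result, AMDGPU-PRO 16.50 result, more_is_better)
r9_fury = {
    "shoc: Triad (GB/s)": (10.59, 4.12, True),
    "shoc: FFT SP (GFLOPS)": (399.71, 751.86, True),
    "rodinia: Heartwall (seconds)": (6.45, 6.38, False),
}


def relative_performance(rocm, pro, more_is_better):
    """Hypothetical helper: ROCm performance relative to AMDGPU-PRO.

    1.0 means parity; above 1.0 means ROCm was faster.
    """
    return rocm / pro if more_is_better else pro / rocm


ratios = {
    test: relative_performance(rocm, pro, more_is_better)
    for test, (rocm, pro, more_is_better) in r9_fury.items()
}

for test, ratio in ratios.items():
    print(f"{test}: ROCm 1.4 at {ratio:.2f}x of AMDGPU-PRO 16.50")
```

The same function applies to the RX 460 and RX 480 rows; tests with no result for one of the drivers (the Parboil tests, for example) have no pair to compare and should be skipped.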

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GB/s, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 4.12 (SE +/- 0.01, N = 3)
R9 Fury, ROCm 1.4: 10.59 (SE +/- 0.04, N = 3)
RX 460, AMDGPU-PRO 16.50: 6.25 (SE +/- 0.10, N = 3)
RX 460, ROCm 1.4: 5.21 (SE +/- 0.00, N = 3)
RX 480, AMDGPU-PRO 16.50: 9.40 (SE +/- 0.14, N = 4)
RX 480, ROCm 1.4: 7.94 (SE +/- 0.01, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
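Each result on this page carries an "SE +/- x, N = y" annotation: a standard error over N runs. The individual run values are not included in this export, so the sketch below uses hypothetical numbers purely to show the arithmetic. It assumes the conventional definition (sample standard deviation divided by the square root of N); whether the Phoronix Test Suite uses the sample or the population deviation internally is not stated here.

```python
import math
import statistics


def standard_error(runs):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(runs) / math.sqrt(len(runs))


# Hypothetical run values for illustration only; NOT taken from this result file.
runs = [10.0, 10.1, 10.3]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

A large standard error relative to the mean, together with N above 3, suggests the test was re-run because of run-to-run variance, so that result deserves more caution than its neighbours.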

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GFLOPS, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 751.86 (SE +/- 14.35, N = 3)
R9 Fury, ROCm 1.4: 399.71 (SE +/- 0.44, N = 3)
RX 460, AMDGPU-PRO 16.50: 245.13 (SE +/- 1.23, N = 3)
RX 460, ROCm 1.4: 158.21 (SE +/- 0.04, N = 3)
RX 480, AMDGPU-PRO 16.50: 508.20 (SE +/- 2.19, N = 3)
RX 480, ROCm 1.4: 403.22 (SE +/- 6.05, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GFLOPS, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 7131.18 (SE +/- 0.69, N = 3)
R9 Fury, ROCm 1.4: 5330.67 (SE +/- 369.63, N = 6)
RX 460, AMDGPU-PRO 16.50: 2066.69 (SE +/- 18.41, N = 3)
RX 460, ROCm 1.4: 2158.12 (SE +/- 0.07, N = 3)
RX 480, AMDGPU-PRO 16.50: 5750.69 (SE +/- 30.75, N = 3)
RX 480, ROCm 1.4: 5815.52 (SE +/- 4.14, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GB/s, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 13.69 (SE +/- 0.01, N = 3)
R9 Fury, ROCm 1.4: 11.32 (SE +/- 0.01, N = 3)
RX 460, AMDGPU-PRO 16.50: 6.93 (SE +/- 0.00, N = 3)
RX 460, ROCm 1.4: 5.72 (SE +/- 0.00, N = 3)
RX 480, AMDGPU-PRO 16.50: 13.66 (SE +/- 0.00, N = 3)
RX 480, ROCm 1.4: 8.37 (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GB/s, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 14.21 (SE +/- 0.01, N = 3)
R9 Fury, ROCm 1.4: 10.86 (SE +/- 0.03, N = 3)
RX 460, AMDGPU-PRO 16.50: 7.14 (SE +/- 0.00, N = 3)
RX 460, ROCm 1.4: 5.27 (SE +/- 0.01, N = 3)
RX 480, AMDGPU-PRO 16.50: 14.20 (SE +/- 0.00, N = 3)
RX 480, ROCm 1.4: 8.38 (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10 (OpenBenchmarking.org)
GB/s, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 223.25 (SE +/- 1.03, N = 3)
R9 Fury, ROCm 1.4: 214.53 (SE +/- 4.26, N = 3)
RX 460, AMDGPU-PRO 16.50: 77.35 (SE +/- 0.69, N = 3)
RX 460, ROCm 1.4: 91.14 (SE +/- 0.16, N = 3)
RX 480, AMDGPU-PRO 16.50: 160.57 (SE +/- 0.37, N = 3)
RX 480, ROCm 1.4: 193.49 (SE +/- 1.30, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt

Parboil

Test: OpenCL BFS

Parboil 2.5 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, ROCm 1.4: 1.43 (SE +/- 0.02, N = 5)
RX 460, ROCm 1.4: 1.47 (SE +/- 0.01, N = 3)
RX 480, ROCm 1.4: 1.49 (SE +/- 0.02, N = 3)

1. (CXX) g++ options: -lm -lpthread -lgomp -ffast-math -fopenmp

Parboil

Test: OpenCL LBM

Parboil 2.5 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, ROCm 1.4: 36.81 (SE +/- 2.40, N = 6)
RX 460, ROCm 1.4: 61.84 (SE +/- 0.13, N = 3)
RX 480, ROCm 1.4: 38.49 (SE +/- 0.21, N = 3)

1. (CXX) g++ options: -lm -lpthread -lgomp -ffast-math -fopenmp

Parboil

Test: OpenCL TPACF

Parboil 2.5 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, ROCm 1.4: 2.24 (SE +/- 0.05, N = 6)
RX 460, ROCm 1.4: 3.90 (SE +/- 0.00, N = 3)
RX 480, ROCm 1.4: 2.24 (SE +/- 0.02, N = 3)

1. (CXX) g++ options: -lm -lpthread -lgomp -ffast-math -fopenmp

Rodinia

Test: OpenCL Myocyte

Rodinia 2.4 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, ROCm 1.4: 362.48 (SE +/- 0.09, N = 3)
RX 460, AMDGPU-PRO 16.50: 131.59 (SE +/- 0.16, N = 2)
RX 460, ROCm 1.4: 169.92 (SE +/- 51.83, N = 6)
RX 480, AMDGPU-PRO 16.50: 88.30 (SE +/- 26.88, N = 6)
RX 480, ROCm 1.4: 244.31 (SE +/- 1.77, N = 3)

1. (CXX) g++ options: -O2 -lOpenCL

Rodinia

Test: OpenCL Heartwall

Rodinia 2.4 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, AMDGPU-PRO 16.50: 6.38 (SE +/- 0.07, N = 3)
R9 Fury, ROCm 1.4: 6.45 (SE +/- 0.16, N = 6)
RX 460, AMDGPU-PRO 16.50: 7.97 (SE +/- 0.01, N = 3)
RX 460, ROCm 1.4: 13.51 (SE +/- 0.08, N = 3)
RX 480, AMDGPU-PRO 16.50: 5.35 (SE +/- 0.02, N = 3)
RX 480, ROCm 1.4: 7.28 (SE +/- 0.03, N = 3)

1. (CXX) g++ options: -O2 -lOpenCL

Darktable

Test: Boat - Acceleration: OpenCL

Darktable 2.2.1 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, AMDGPU-PRO 16.50: 4.22 (SE +/- 0.07, N = 3)
R9 Fury, ROCm 1.4: 4.98 (SE +/- 0.03, N = 3)
RX 460, AMDGPU-PRO 16.50: 9.51 (SE +/- 0.77, N = 6)
RX 460, ROCm 1.4: 9.57 (SE +/- 0.03, N = 3)
RX 480, AMDGPU-PRO 16.50: 4.37 (SE +/- 0.01, N = 3)
RX 480, ROCm 1.4: 5.72 (SE +/- 0.02, N = 3)

Darktable

Test: Masskrug - Acceleration: OpenCL

Darktable 2.2.1 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, AMDGPU-PRO 16.50: 6.30 (SE +/- 0.02, N = 3)
R9 Fury, ROCm 1.4: 6.09 (SE +/- 0.00, N = 3)
RX 460, AMDGPU-PRO 16.50: 7.20 (SE +/- 0.03, N = 3)
RX 460, ROCm 1.4: 7.05 (SE +/- 0.09, N = 3)
RX 480, AMDGPU-PRO 16.50: 5.76 (SE +/- 0.04, N = 3)
RX 480, ROCm 1.4: 5.93 (SE +/- 0.08, N = 3)

Darktable

Test: Server Room - Acceleration: OpenCL

Darktable 2.2.1 (OpenBenchmarking.org)
Seconds, Fewer Is Better

R9 Fury, AMDGPU-PRO 16.50: 1.79 (SE +/- 0.04, N = 6)
R9 Fury, ROCm 1.4: 1.48 (SE +/- 0.02, N = 3)
RX 460, AMDGPU-PRO 16.50: 2.83 (SE +/- 0.02, N = 3)
RX 460, ROCm 1.4: 2.48 (SE +/- 0.03, N = 3)
RX 480, AMDGPU-PRO 16.50: 0.99 (SE +/- 0.01, N = 3)
RX 480, ROCm 1.4: 0.99 (SE +/- 0.02, N = 4)

JuliaGPU

OpenCL Device: GPU

JuliaGPU 1.2pts1 (OpenBenchmarking.org)
Samples/sec, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 75992404.70
R9 Fury, ROCm 1.4: 73072755.80
RX 460, AMDGPU-PRO 16.50: 50807022.25
RX 460, ROCm 1.4: 46101692.27
RX 480, AMDGPU-PRO 16.50: 81972594.40
RX 480, ROCm 1.4: 70675082.10

Standard errors as exported (five values for six results): SE +/- 985012.76, N = 3; SE +/- 97714.65, N = 2; SE +/- 160084.77, N = 3; SE +/- 500734.10, N = 2; SE +/- 94849.94, N = 3

1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

MandelbulbGPU

OpenCL Device: GPU

MandelbulbGPU 1.0pts1 (OpenBenchmarking.org)
Samples/sec, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 43447360.40
R9 Fury, ROCm 1.4: 44388927.12
RX 460, AMDGPU-PRO 16.50: 32208376.98
RX 460, ROCm 1.4: 29562658.90
RX 480, AMDGPU-PRO 16.50: 48517365.80
RX 480, ROCm 1.4: 49050438.67

Standard errors as exported (four values for six results): SE +/- 2304744.64, N = 6; SE +/- 561923.72, N = 4; SE +/- 117840.12, N = 3; SE +/- 81023.55, N = 3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 (OpenBenchmarking.org)
Samples/sec, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 107202116.40
R9 Fury, ROCm 1.4: 82051996.27
RX 460, AMDGPU-PRO 16.50: 35552080.15
RX 460, ROCm 1.4: 28295516.33
RX 480, AMDGPU-PRO 16.50: 81101281.90
RX 480, ROCm 1.4: 59296261.87

Standard errors as exported (four values for six results): SE +/- 71744.88, N = 3; SE +/- 165178.15, N = 2; SE +/- 30521.44, N = 3; SE +/- 126265.24, N = 3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.0 (OpenBenchmarking.org)
Score, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 2402 (SE +/- 11.46, N = 3)
R9 Fury, ROCm 1.4: 1201 (SE +/- 0.00, N = 3)
RX 460, AMDGPU-PRO 16.50: 897 (SE +/- 1.00, N = 3)
RX 460, ROCm 1.4: 381 (SE +/- 0.58, N = 3)
RX 480, AMDGPU-PRO 16.50: 2399 (SE +/- 6.94, N = 3)
RX 480, ROCm 1.4: 987 (SE +/- 2.40, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.0 (OpenBenchmarking.org)
Score, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 7681 (SE +/- 17.84, N = 3)
R9 Fury, ROCm 1.4: 5695 (SE +/- 15.04, N = 3)
RX 460, AMDGPU-PRO 16.50: 2623 (SE +/- 6.98, N = 3)
RX 480, AMDGPU-PRO 16.50: 6924 (SE +/- 13.50, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

LuxMark 3.0 (OpenBenchmarking.org)
Score, More Is Better

R9 Fury, AMDGPU-PRO 16.50: 19394 (SE +/- 75.47, N = 3)
R9 Fury, ROCm 1.4: 11995 (SE +/- 17.34, N = 3)
RX 460, AMDGPU-PRO 16.50: 5547 (SE +/- 9.82, N = 3)
RX 460, ROCm 1.4: 3664 (SE +/- 17.00, N = 3)
RX 480, AMDGPU-PRO 16.50: 14066 (SE +/- 68.10, N = 3)
RX 480, ROCm 1.4: 9196 (SE +/- 0.67, N = 3)


Phoronix Test Suite v10.8.4