NVIDIA vs. OpenCL ROCm Linux vs. AMDGPU-PRO Benchmarks

ROCm 1.4 benchmarks on Ubuntu 16.04 compared to AMDGPU-PRO. Now with NVIDIA comparison points. OpenCL benchmarks by Michael Larabel for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1701190-KH-1701193RI82&rdt&grr.

System Configuration

Tested configurations: Radeon R9 Fury - ROCm, Radeon RX 480 - ROCm, Radeon RX 460 - ROCm, Radeon RX 460 - AMDGPU-PRO, Radeon RX 480 - AMDGPU-PRO, Radeon R9 Fury - AMDGPU-PRO, GeForce GTX 1060, GeForce GTX 1070, GeForce GTX 1080, GeForce GTX 1050, GeForce GTX 1050 Ti

Common to all configurations:
  Processor: Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
  Motherboard: MSI C236A WORKSTATION (MS-7998) v1.0
  Chipset: Intel Sky Lake
  Memory: 16384MB
  Disk: 256GB TOSHIBA-RD400
  Audio: Realtek ALC1150
  Monitor: Acer B286HK
  Network: Intel Connection
  OS: Ubuntu 16.04
  Desktop: Unity 7.4.0
  Display Server: X Server 1.18.3
  File-System: ext4
  Screen Resolution: 3840x2160

ROCm configurations:
  Kernel: 4.6.0-kfd-compute-rocm-rel-1.4-16 (x86_64)
  Display Driver: modesetting 1.18.3
  OpenCL: OpenCL 2.0 AMD-APP (2300.5)
  Compiler: GCC 5.4.0 20160609 + Clang 4.0 + LLVM 4.0.0
  Radeon R9 Fury - ROCm: Graphics: Sapphire AMD Radeon R9 FURY / NANO 3968MB; OpenGL: 4.1 Mesa 11.2.0 Gallium 0.4
  Radeon RX 480 - ROCm, Radeon RX 460 - ROCm: Graphics: LLVMpipe; OpenGL: 3.3 Mesa 11.2.0 Gallium 0.4

AMDGPU-PRO configurations:
  Kernel: 4.4.0-59-generic (x86_64)
  Display Driver: amdgpu 1.1.99
  OpenGL: 4.5.13462
  OpenCL: OpenCL 2.0 AMD-APP (2236.5)
  Compiler: GCC 5.4.0 20160609
  Radeon RX 460 - AMDGPU-PRO: Graphics: AMD Radeon RX 460 2048MB
  Radeon RX 480 - AMDGPU-PRO: Graphics: AMD Radeon RX 480 8192MB
  Radeon R9 Fury - AMDGPU-PRO: Graphics: Sapphire AMD Radeon R9 Fury 4096MB

NVIDIA configurations:
  Kernel: 4.4.0-59-generic (x86_64)
  Display Driver: NVIDIA 375.26
  OpenGL: 4.5.0
  OpenCL: OpenCL 1.2 CUDA 8.0.0
  Vulkan: 1.0.24
  Compiler: GCC 5.4.0 20160609
  GeForce GTX 1060: Graphics: NVIDIA GeForce GTX 1060 6GB 6144MB (418/4006MHz)
  GeForce GTX 1070: Graphics: NVIDIA GeForce GTX 1070 8192MB (1504/4006MHz)
  GeForce GTX 1080: Graphics: NVIDIA GeForce GTX 1080 8192MB (109/5005MHz)
  GeForce GTX 1050: Graphics: Zotac NVIDIA GeForce GTX 1050 2048MB (1075/3504MHz)
  GeForce GTX 1050 Ti: Graphics: eVGA NVIDIA GeForce GTX 1050 Ti 4096MB (1341/3504MHz)

Compiler Details: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details: Scaling Governor: intel_pstate powersave

Graphics Details: Radeon R9 Fury - ROCm, Radeon RX 460 - AMDGPU-PRO, Radeon RX 480 - AMDGPU-PRO, Radeon R9 Fury - AMDGPU-PRO: GLAMOR

Environment Details: Radeon RX 480 - ROCm, Radeon RX 460 - ROCm: LIBGL_ALWAYS_SOFTWARE=1

OpenCL Details:
  GeForce GTX 1060: GPU Compute Cores: 1280
  GeForce GTX 1070: GPU Compute Cores: 1920
  GeForce GTX 1080: GPU Compute Cores: 2560
  GeForce GTX 1050: GPU Compute Cores: 640
  GeForce GTX 1050 Ti: GPU Compute Cores: 768

Results Overview

AMD configurations ("-" marks a test with no reported result):

Test                                   R9 Fury       RX 480        RX 460        RX 460        RX 480        R9 Fury
                                       ROCm          ROCm          ROCm          AMDGPU-PRO    AMDGPU-PRO    AMDGPU-PRO
luxmark: GPU - Luxball HDR             11995         9196          3664          5547          14066         19394
luxmark: GPU - Microphone              5695          -             -             2623          6924          7681
luxmark: GPU - Hotel                   1201          987           381           897           2399          2402
mandelgpu: GPU                         82051996.27   59296261.87   28295516.33   35552080.15   81101281.90   107202116.40
mandelbulbgpu: GPU                     44388927.12   49050438.67   29562658.90   32208376.98   48517365.80   43447360.40
juliagpu: GPU                          73072755.80   70675082.10   46101692.27   50807022.25   81972594.40   75992404.70
darktable: Server Room - OpenCL        1.48          0.99          2.48          2.83          0.99          1.79
darktable: Masskrug - OpenCL           6.09          5.93          7.05          7.20          5.76          6.30
darktable: Boat - OpenCL               4.98          5.72          9.57          9.51          4.37          4.22
rodinia: OpenCL Heartwall              6.45          7.28          13.51         7.97          5.35          6.38
shoc: OpenCL - Texture Read Bandwidth  214.53        193.49        91.14         77.35         160.57        223.25
shoc: OpenCL - Bus Speed Readback      10.86         8.38          5.27          7.14          14.20         14.21
shoc: OpenCL - Bus Speed Download      11.32         8.37          5.72          6.93          13.66         13.69
shoc: OpenCL - Max SP Flops            5330.67       5815.52       2158.12       2066.69       5750.69       7131.18
shoc: OpenCL - FFT SP                  399.71        403.22        158.21        245.13        508.20        751.86
shoc: OpenCL - Triad                   10.59         7.94          5.21          6.25          9.40          4.12

NVIDIA configurations ("-" marks a test with no reported result):

Test                                   GTX 1060      GTX 1070      GTX 1080      GTX 1050      GTX 1050 Ti
luxmark: GPU - Luxball HDR             11768         16215         12968         6656          7391
luxmark: GPU - Microphone              5204          7302          6388          3300          3612
luxmark: GPU - Hotel                   2092          3023          2993          1128          1334
mandelgpu: GPU                         112043183.47  159458228.23  206148858.53  51548791.30   64272664.57
mandelbulbgpu: GPU                     63345982.20   79620073.63   91109498.40   37667402.03   44889116.70
juliagpu: GPU                          115523522.73  144431468.40  165302847.33  64896787.13   78171484.97
darktable: Server Room - OpenCL        1.20          0.99          0.99          11.78         11.01
darktable: Masskrug - OpenCL           5.90          5.74          5.73          15.16         15.44
darktable: Boat - OpenCL               4.67          3.87          3.72          15.45         13.97
rodinia: OpenCL Heartwall              3.36          -             -             5.27          3.65
shoc: OpenCL - Texture Read Bandwidth  393.69        446.64        520.51        282.49        316.10
shoc: OpenCL - Bus Speed Readback      13.22         13.22         13.22         13.11         13.22
shoc: OpenCL - Bus Speed Download      12.78         12.78         12.78         12.75         12.78
shoc: OpenCL - Max SP Flops            4780.88       7115.54       9415.48       2115.38       2697.13
shoc: OpenCL - FFT SP                  296.88        456.72        573.71        223.30        188.16
shoc: OpenCL - Triad                   11.85         12.08         12.20         11.25         11.38
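
One way to read the ROCm and AMDGPU-PRO results against each other is to express each ROCm result as a fraction of the AMDGPU-PRO result on the same GPU. The sketch below does this for the LuxMark Luxball HDR scores reported in this result file. The function name `relative_performance` and the dictionary layout are illustrative choices, not part of the Phoronix Test Suite.

```python
def relative_performance(candidate, baseline, higher_is_better=True):
    """Return candidate performance as a fraction of baseline (1.0 = parity)."""
    if higher_is_better:
        return candidate / baseline
    # For timings in seconds a smaller value is faster, so invert the ratio.
    return baseline / candidate


# LuxMark 3.0 Luxball HDR scores from this result file (more is better).
luxball_hdr = {
    "Radeon R9 Fury": {"ROCm": 11995, "AMDGPU-PRO": 19394},
    "Radeon RX 480": {"ROCm": 9196, "AMDGPU-PRO": 14066},
    "Radeon RX 460": {"ROCm": 3664, "AMDGPU-PRO": 5547},
}

ratios = {
    gpu: relative_performance(scores["ROCm"], scores["AMDGPU-PRO"])
    for gpu, scores in luxball_hdr.items()
}

for gpu, ratio in ratios.items():
    print(f"{gpu}: ROCm reaches {ratio:.1%} of the AMDGPU-PRO score")
```

For the tests reported in seconds (Darktable, Rodinia), pass `higher_is_better=False` so that a value above 1.0 still means the first configuration is faster.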

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

LuxMark 3.0 - Score, More Is Better
  Radeon R9 Fury - ROCm: 11995 (SE +/- 17.34, N = 3)
  Radeon RX 480 - ROCm: 9196 (SE +/- 0.67, N = 3)
  Radeon RX 460 - ROCm: 3664 (SE +/- 17.00, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 5547 (SE +/- 9.82, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 14066 (SE +/- 68.10, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 19394 (SE +/- 75.47, N = 3)
  GeForce GTX 1060: 11768 (SE +/- 36.34, N = 3)
  GeForce GTX 1070: 16215 (SE +/- 2.31, N = 3)
  GeForce GTX 1080: 12968 (SE +/- 12.45, N = 3)
  GeForce GTX 1050: 6656 (SE +/- 5.20, N = 3)
  GeForce GTX 1050 Ti: 7391 (SE +/- 17.00, N = 3)
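
Each result in this file is reported with a standard error (SE) and a run count (N). As a rough guide to reading those figures, the sketch below computes a standard error from a set of runs and expresses a reported SE as a percentage of its result. It assumes the conventional definition (sample standard deviation divided by the square root of N); the exact formula used by the Phoronix Test Suite is not stated in this file, and the sample runs are hypothetical.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def relative_standard_error(se, mean):
    """Express a standard error as a percentage of the reported mean."""
    return 100.0 * se / mean


# Hypothetical run values, for illustration only (not taken from this file).
hypothetical_runs = [2.0, 4.0, 6.0]
print(f"SE of hypothetical runs: {standard_error(hypothetical_runs):.4f}")

# Reported figures for Radeon R9 Fury - ROCm, LuxMark Luxball HDR.
print(f"Relative SE: {relative_standard_error(17.34, 11995):.3f}%")
```

A relative SE well under one percent, as here, suggests the run-to-run variation is small compared with the differences between configurations.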

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.0 - Score, More Is Better
  Radeon R9 Fury - ROCm: 5695 (SE +/- 15.04, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 2623 (SE +/- 6.98, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 6924 (SE +/- 13.50, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 7681 (SE +/- 17.84, N = 3)
  GeForce GTX 1060: 5204 (SE +/- 3.51, N = 3)
  GeForce GTX 1070: 7302 (SE +/- 38.17, N = 3)
  GeForce GTX 1080: 6388 (SE +/- 2.03, N = 3)
  GeForce GTX 1050: 3300 (SE +/- 3.38, N = 3)
  GeForce GTX 1050 Ti: 3612 (SE +/- 2.52, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.0 - Score, More Is Better
  Radeon R9 Fury - ROCm: 1201 (SE +/- 0.00, N = 3)
  Radeon RX 480 - ROCm: 987 (SE +/- 2.40, N = 3)
  Radeon RX 460 - ROCm: 381 (SE +/- 0.58, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 897 (SE +/- 1.00, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 2399 (SE +/- 6.94, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 2402 (SE +/- 11.46, N = 3)
  GeForce GTX 1060: 2092 (SE +/- 6.03, N = 3)
  GeForce GTX 1070: 3023 (SE +/- 4.91, N = 3)
  GeForce GTX 1080: 2993 (SE +/- 9.00, N = 3)
  GeForce GTX 1050: 1128 (SE +/- 5.67, N = 3)
  GeForce GTX 1050 Ti: 1334 (SE +/- 3.79, N = 3)

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 - Samples/sec, More Is Better
  Radeon R9 Fury - ROCm: 82051996.27
  Radeon RX 480 - ROCm: 59296261.87
  Radeon RX 460 - ROCm: 28295516.33
  Radeon RX 460 - AMDGPU-PRO: 35552080.15
  Radeon RX 480 - AMDGPU-PRO: 81101281.90
  Radeon R9 Fury - AMDGPU-PRO: 107202116.40
  GeForce GTX 1060: 112043183.47
  GeForce GTX 1070: 159458228.23
  GeForce GTX 1080: 206148858.53
  GeForce GTX 1050: 51548791.30
  GeForce GTX 1050 Ti: 64272664.57

Standard errors as listed in the chart (9 entries for 11 results): SE +/- 71744.88, N = 3; SE +/- 126265.24, N = 3; SE +/- 30521.44, N = 3; SE +/- 165178.15, N = 2; SE +/- 104172.86, N = 3; SE +/- 248567.98, N = 3; SE +/- 971382.09, N = 3; SE +/- 26110.91, N = 3; SE +/- 75826.86, N = 3

(CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

MandelbulbGPU

OpenCL Device: GPU

MandelbulbGPU 1.0pts1 - Samples/sec, More Is Better
  Radeon R9 Fury - ROCm: 44388927.12
  Radeon RX 480 - ROCm: 49050438.67
  Radeon RX 460 - ROCm: 29562658.90
  Radeon RX 460 - AMDGPU-PRO: 32208376.98
  Radeon RX 480 - AMDGPU-PRO: 48517365.80
  Radeon R9 Fury - AMDGPU-PRO: 43447360.40
  GeForce GTX 1060: 63345982.20
  GeForce GTX 1070: 79620073.63
  GeForce GTX 1080: 91109498.40
  GeForce GTX 1050: 37667402.03
  GeForce GTX 1050 Ti: 44889116.70

Standard errors as listed in the chart (9 entries for 11 results): SE +/- 2304744.64, N = 6; SE +/- 81023.55, N = 3; SE +/- 117840.12, N = 3; SE +/- 561923.72, N = 4; SE +/- 290297.61, N = 3; SE +/- 503324.21, N = 3; SE +/- 423859.17, N = 3; SE +/- 36018.97, N = 3; SE +/- 112131.74, N = 3

(CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

JuliaGPU

OpenCL Device: GPU

JuliaGPU 1.2pts1 - Samples/sec, More Is Better
  Radeon R9 Fury - ROCm: 73072755.80
  Radeon RX 480 - ROCm: 70675082.10
  Radeon RX 460 - ROCm: 46101692.27
  Radeon RX 460 - AMDGPU-PRO: 50807022.25
  Radeon RX 480 - AMDGPU-PRO: 81972594.40
  Radeon R9 Fury - AMDGPU-PRO: 75992404.70
  GeForce GTX 1060: 115523522.73
  GeForce GTX 1070: 144431468.40
  GeForce GTX 1080: 165302847.33
  GeForce GTX 1050: 64896787.13
  GeForce GTX 1050 Ti: 78171484.97

Standard errors as listed in the chart (10 entries for 11 results): SE +/- 985012.76, N = 3; SE +/- 94849.94, N = 3; SE +/- 160084.77, N = 3; SE +/- 97714.65, N = 2; SE +/- 500734.10, N = 2; SE +/- 194570.11, N = 3; SE +/- 169012.99, N = 3; SE +/- 694138.93, N = 3; SE +/- 29908.92, N = 3; SE +/- 109924.23, N = 3

(CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

Darktable

Test: Server Room - Acceleration: OpenCL

Darktable 2.2.1 - Seconds, Fewer Is Better
  Radeon R9 Fury - ROCm: 1.48 (SE +/- 0.02, N = 3)
  Radeon RX 480 - ROCm: 0.99 (SE +/- 0.02, N = 4)
  Radeon RX 460 - ROCm: 2.48 (SE +/- 0.03, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 2.83 (SE +/- 0.02, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 0.99 (SE +/- 0.01, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 1.79 (SE +/- 0.04, N = 6)
  GeForce GTX 1060: 1.20 (SE +/- 0.01, N = 3)
  GeForce GTX 1070: 0.99 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 0.99 (SE +/- 0.00, N = 3)
  GeForce GTX 1050: 11.78 (SE +/- 0.01, N = 3)
  GeForce GTX 1050 Ti: 11.01 (SE +/- 0.01, N = 3)

Darktable

Test: Masskrug - Acceleration: OpenCL

Darktable 2.2.1 - Seconds, Fewer Is Better
  Radeon R9 Fury - ROCm: 6.09 (SE +/- 0.00, N = 3)
  Radeon RX 480 - ROCm: 5.93 (SE +/- 0.08, N = 3)
  Radeon RX 460 - ROCm: 7.05 (SE +/- 0.09, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 7.20 (SE +/- 0.03, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 5.76 (SE +/- 0.04, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 6.30 (SE +/- 0.02, N = 3)
  GeForce GTX 1060: 5.90 (SE +/- 0.01, N = 3)
  GeForce GTX 1070: 5.74 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 5.73 (SE +/- 0.01, N = 3)
  GeForce GTX 1050: 15.16 (SE +/- 0.00, N = 3)
  GeForce GTX 1050 Ti: 15.44 (SE +/- 0.00, N = 3)

Darktable

Test: Boat - Acceleration: OpenCL

Darktable 2.2.1 - Seconds, Fewer Is Better
  Radeon R9 Fury - ROCm: 4.98 (SE +/- 0.03, N = 3)
  Radeon RX 480 - ROCm: 5.72 (SE +/- 0.02, N = 3)
  Radeon RX 460 - ROCm: 9.57 (SE +/- 0.03, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 9.51 (SE +/- 0.77, N = 6)
  Radeon RX 480 - AMDGPU-PRO: 4.37 (SE +/- 0.01, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 4.22 (SE +/- 0.07, N = 3)
  GeForce GTX 1060: 4.67 (SE +/- 0.01, N = 3)
  GeForce GTX 1070: 3.87 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 3.72 (SE +/- 0.00, N = 3)
  GeForce GTX 1050: 15.45 (SE +/- 0.01, N = 3)
  GeForce GTX 1050 Ti: 13.97 (SE +/- 0.01, N = 3)

Rodinia

Test: OpenCL Heartwall

Rodinia 2.4 - Seconds, Fewer Is Better
  Radeon R9 Fury - ROCm: 6.45 (SE +/- 0.16, N = 6)
  Radeon RX 480 - ROCm: 7.28 (SE +/- 0.03, N = 3)
  Radeon RX 460 - ROCm: 13.51 (SE +/- 0.08, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 7.97 (SE +/- 0.01, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 5.35 (SE +/- 0.02, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 6.38 (SE +/- 0.07, N = 3)
  GeForce GTX 1060: 3.36 (SE +/- 0.05, N = 5)
  GeForce GTX 1050: 5.27 (SE +/- 0.04, N = 3)
  GeForce GTX 1050 Ti: 3.65 (SE +/- 0.04, N = 3)

(CXX) g++ options: -O2 -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  Radeon R9 Fury - ROCm: 214.53 (SE +/- 4.26, N = 3)
  Radeon RX 480 - ROCm: 193.49 (SE +/- 1.30, N = 3)
  Radeon RX 460 - ROCm: 91.14 (SE +/- 0.16, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 77.35 (SE +/- 0.69, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 160.57 (SE +/- 0.37, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 223.25 (SE +/- 1.03, N = 3)
  GeForce GTX 1060: 393.69 (SE +/- 0.96, N = 3)
  GeForce GTX 1070: 446.64 (SE +/- 0.12, N = 3)
  GeForce GTX 1080: 520.51 (SE +/- 1.14, N = 3)
  GeForce GTX 1050: 282.49 (SE +/- 0.98, N = 3)
  GeForce GTX 1050 Ti: 316.10 (SE +/- 1.06, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  Radeon R9 Fury - ROCm: 10.86 (SE +/- 0.03, N = 3)
  Radeon RX 480 - ROCm: 8.38 (SE +/- 0.00, N = 3)
  Radeon RX 460 - ROCm: 5.27 (SE +/- 0.01, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 7.14 (SE +/- 0.00, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 14.20 (SE +/- 0.00, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 14.21 (SE +/- 0.01, N = 3)
  GeForce GTX 1060: 13.22 (SE +/- 0.00, N = 3)
  GeForce GTX 1070: 13.22 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 13.22 (SE +/- 0.00, N = 3)
  GeForce GTX 1050: 13.11 (SE +/- 0.03, N = 3)
  GeForce GTX 1050 Ti: 13.22 (SE +/- 0.00, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  Radeon R9 Fury - ROCm: 11.32 (SE +/- 0.01, N = 3)
  Radeon RX 480 - ROCm: 8.37 (SE +/- 0.00, N = 3)
  Radeon RX 460 - ROCm: 5.72 (SE +/- 0.00, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 6.93 (SE +/- 0.00, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 13.66 (SE +/- 0.00, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 13.69 (SE +/- 0.01, N = 3)
  GeForce GTX 1060: 12.78 (SE +/- 0.00, N = 3)
  GeForce GTX 1070: 12.78 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 12.78 (SE +/- 0.00, N = 3)
  GeForce GTX 1050: 12.75 (SE +/- 0.00, N = 3)
  GeForce GTX 1050 Ti: 12.78 (SE +/- 0.00, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GFLOPS, More Is Better
  Radeon R9 Fury - ROCm: 5330.67 (SE +/- 369.63, N = 6)
  Radeon RX 480 - ROCm: 5815.52 (SE +/- 4.14, N = 3)
  Radeon RX 460 - ROCm: 2158.12 (SE +/- 0.07, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 2066.69 (SE +/- 18.41, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 5750.69 (SE +/- 30.75, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 7131.18 (SE +/- 0.69, N = 3)
  GeForce GTX 1060: 4780.88 (SE +/- 22.70, N = 3)
  GeForce GTX 1070: 7115.54 (SE +/- 52.21, N = 3)
  GeForce GTX 1080: 9415.48 (SE +/- 70.36, N = 3)
  GeForce GTX 1050: 2115.38 (SE +/- 0.01, N = 3)
  GeForce GTX 1050 Ti: 2697.13 (SE +/- 5.23, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GFLOPS, More Is Better
  Radeon R9 Fury - ROCm: 399.71 (SE +/- 0.44, N = 3)
  Radeon RX 480 - ROCm: 403.22 (SE +/- 6.05, N = 3)
  Radeon RX 460 - ROCm: 158.21 (SE +/- 0.04, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 245.13 (SE +/- 1.23, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 508.20 (SE +/- 2.19, N = 3)
  Radeon R9 Fury - AMDGPU-PRO: 751.86 (SE +/- 14.35, N = 3)
  GeForce GTX 1060: 296.88 (SE +/- 4.87, N = 3)
  GeForce GTX 1070: 456.72 (SE +/- 6.56, N = 6)
  GeForce GTX 1080: 573.71 (SE +/- 6.31, N = 3)
  GeForce GTX 1050: 223.30 (SE +/- 2.58, N = 3)
  GeForce GTX 1050 Ti: 188.16 (SE +/- 2.31, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2015-11-10 - GB/s, More Is Better
  Radeon R9 Fury - ROCm: 10.59 (SE +/- 0.04, N = 3)
  Radeon RX 480 - ROCm: 7.94 (SE +/- 0.01, N = 3)
  Radeon RX 460 - ROCm: 5.21 (SE +/- 0.00, N = 3)
  Radeon RX 460 - AMDGPU-PRO: 6.25 (SE +/- 0.10, N = 3)
  Radeon RX 480 - AMDGPU-PRO: 9.40 (SE +/- 0.14, N = 4)
  Radeon R9 Fury - AMDGPU-PRO: 4.12 (SE +/- 0.01, N = 3)
  GeForce GTX 1060: 11.85 (SE +/- 0.00, N = 3)
  GeForce GTX 1070: 12.08 (SE +/- 0.00, N = 3)
  GeForce GTX 1080: 12.20 (SE +/- 0.00, N = 3)
  GeForce GTX 1050: 11.25 (SE +/- 0.01, N = 3)
  GeForce GTX 1050 Ti: 11.38 (SE +/- 0.00, N = 3)

(CXX) g++ options: -O2 -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt
Additional options listed for some configurations: -lSHOCCommonMPI -pthread -lmpi_cxx -lmpi


Phoronix Test Suite v10.8.4