CUDA vs. OpenCL NVIDIA Pascal GPU Computing

Tests by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1606129-HA-CUDAVSOPE49.

System details: GeForce GTX 1080

Processor: Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
Motherboard: MSI C236A WORKSTATION (MS-7998) v1.0
Chipset: Intel Sky Lake
Memory: 16384MB
Disk: Samsung SSD 950 PRO 256GB
Graphics: Device 8187MB (1603/5005MHz)
Audio: Realtek ALC1150
Network: Intel Connection
OS: Ubuntu 16.04
Kernel: 4.4.0-22-generic (x86_64)
Desktop: Unity 7.4.0
Display Driver: NVIDIA 367.18
OpenGL: 4.5.0
Vulkan: 1.0.8
Compiler: GCC 5.3.1 20160413 + CUDA 8.0
File-System: ext4
Screen Resolution: 3840x2160

Notes:
- Compiler configuration: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v
- Scaling Governor: intel_pstate powersave
- GPU Compute Cores: 2560

Results summary: GeForce GTX 1080 (all tests: more is better)

Test                            Unit      CUDA       OpenCL
shoc: Triad                     GB/s      14.81      11.93
shoc: FFT SP                    GFLOPS    462.20     324.56
shoc: MD5 Hash                  GHash/s   11.97      11.84
shoc: Max SP Flops              GFLOPS    9366.80    9322.01
shoc: Bus Speed Download        GB/s      12.51      12.53
shoc: Bus Speed Readback        GB/s      13.22      13.22
shoc: Texture Read Bandwidth    GB/s      525.64     518.08
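
To make the summary easier to compare at a glance, the relative CUDA-versus-OpenCL difference can be computed per test. The following is a minimal sketch: the numbers are copied from the summary above, while the dictionary layout and the function name `cuda_advantage_percent` are illustrative choices, not part of the Phoronix Test Suite.

```python
# Results copied from the summary above: test -> (unit, CUDA, OpenCL).
# All seven tests are "more is better".
results = {
    "Triad": ("GB/s", 14.81, 11.93),
    "FFT SP": ("GFLOPS", 462.20, 324.56),
    "MD5 Hash": ("GHash/s", 11.97, 11.84),
    "Max SP Flops": ("GFLOPS", 9366.80, 9322.01),
    "Bus Speed Download": ("GB/s", 12.51, 12.53),
    "Bus Speed Readback": ("GB/s", 13.22, 13.22),
    "Texture Read Bandwidth": ("GB/s", 525.64, 518.08),
}


def cuda_advantage_percent(cuda, opencl):
    """Percent by which the CUDA result exceeds the OpenCL result.

    Hypothetical helper name; a negative value means OpenCL was faster.
    """
    return (cuda / opencl - 1.0) * 100.0


for name, (unit, cuda, opencl) in results.items():
    delta = cuda_advantage_percent(cuda, opencl)
    print(f"{name:<24} {unit:<8} {delta:+7.2f}%")
```

Run this way, only Triad and FFT SP show a gap of more than a few percent; the remaining tests are within roughly two percent of each other.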

SHOC Scalable HeterOgeneous Computing

Benchmark: Triad

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 14.81 (SE +/- 0.05, N = 3)
OpenCL: 11.93 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
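
Each result is reported with "SE +/- x, N = 3", that is, a standard error over three runs. The sketch below shows how such a figure is conventionally derived, assuming the usual definition (sample standard deviation divided by the square root of N); the result file does not spell out the formula. The per-run values are hypothetical, since the export only contains the averages.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run values in GB/s, NOT taken from the result file.
runs = [14.72, 14.82, 14.89]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} GB/s, SE +/- {se:.2f}, N = {len(runs)}")
```

A small standard error relative to the mean, as in the results here, suggests the run-to-run variation is much smaller than the CUDA-versus-OpenCL gap in this test.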

SHOC Scalable HeterOgeneous Computing

Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 462.20 (SE +/- 2.48, N = 3)
OpenCL: 324.56 (SE +/- 3.01, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 11.97 (SE +/- 0.01, N = 3)
OpenCL: 11.84 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 9366.80 (SE +/- 74.92, N = 3)
OpenCL: 9322.01 (SE +/- 2.31, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
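
The Max SP Flops figure can be sanity-checked against a simple peak estimate. This is a rough sketch under two assumptions that the result file does not confirm: that each of the 2560 compute cores retires one fused multiply-add per cycle (counted as 2 FLOPs), and that the 1603 in "1603/5005MHz" is the reported graphics clock in MHz. The function names are illustrative.

```python
GPU_CORES = 2560              # "GPU Compute Cores: 2560" from the system notes
REPORTED_CLOCK_GHZ = 1.603    # assumption: first value of "1603/5005MHz"
FLOPS_PER_CORE_PER_CYCLE = 2  # assumption: one FMA counted as 2 FLOPs


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Theoretical single-precision peak in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle


def implied_clock_ghz(measured_gflops, cores,
                      flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Clock the GPU would need to sustain to reach the measured rate."""
    return measured_gflops / (cores * flops_per_cycle)


estimate = peak_gflops(GPU_CORES, REPORTED_CLOCK_GHZ)
print(f"Peak at reported clock: {estimate:.2f} GFLOPS")
for api, measured in (("CUDA", 9366.80), ("OpenCL", 9322.01)):
    clock = implied_clock_ghz(measured, GPU_CORES)
    print(f"{api}: {measured:.2f} GFLOPS implies about {clock:.3f} GHz")
```

Under these assumptions the measured rates sit above the estimate at the reported clock, which would be consistent with the GPU running above that clock during the test; the result file does not record the actual clock, so this remains an inference.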

SHOC Scalable HeterOgeneous Computing

Benchmark: Bus Speed Download

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 12.51 (SE +/- 0.01, N = 3)
OpenCL: 12.53 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Benchmark: Bus Speed Readback

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 13.22 (SE +/- 0.00, N = 3)
OpenCL: 13.22 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
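
The two bus speed results can be put in context by comparing them with the theoretical bandwidth of the host link. The result file does not state the link type, so the sketch below assumes a PCIe 3.0 x16 slot (8 GT/s per lane with 128b/130b encoding); if the card ran on a different link, the percentages would change accordingly. The function name is illustrative.

```python
def pcie_peak_gb_s(gigatransfers_per_s, lanes, encoding=128 / 130):
    """Theoretical one-direction PCIe payload bandwidth in GB/s.

    Each transfer carries one bit per lane; the encoding factor removes
    line-code overhead, and dividing by 8 converts bits to bytes.
    """
    return gigatransfers_per_s * encoding * lanes / 8


# Assumption: PCIe 3.0 x16 (not stated in the result file).
peak = pcie_peak_gb_s(8.0, 16)

# Measured CUDA values from the Bus Speed Download and Readback results.
for test, measured in (("Download", 12.51), ("Readback", 13.22)):
    share = measured / peak * 100.0
    print(f"{test}: {measured:.2f} GB/s, about {share:.0f}% of {peak:.2f} GB/s")
```

Since CUDA and OpenCL report nearly identical figures in both directions, these tests appear to be bounded by the link rather than by the compute API.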

SHOC Scalable HeterOgeneous Computing

Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
CUDA: 525.64 (SE +/- 1.00, N = 3)
OpenCL: 518.08 (SE +/- 1.01, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft


Phoronix Test Suite v10.8.4