OpenCL CUDA NVIDIA GPGPU Linux Tests

All Maxwell and various Kepler graphics cards were tested on the NVIDIA Linux driver. These benchmarks by Michael Larabel were run for a future article on Phoronix.com and are published here as GPGPU reference results.

HTML result view exported from: https://openbenchmarking.org/result/1511155-BASI-151111343.

System Configurations

Test system for GeForce GTX 680, GeForce GTX 750, GeForce GTX 760, GeForce GTX 780 Ti, GeForce GTX 950, GeForce GTX 960, GeForce GTX 970, GeForce GTX 980, GeForce GTX 980 Ti and GeForce GTX TITAN X:
Processor: Intel Core i5-6600K @ 3.50GHz (4 Cores)
Motherboard: MSI Z170A GAMING PRO (MS-7984) v1.0
Chipset: Intel Device 191f
Memory: 16384MB
Disk: 256GB TS256GSSD370S
Audio: Intel Device a170
Network: Intel Device 15b8
OS: Ubuntu 14.04
Kernel: 3.19.0-33-generic (x86_64)
Desktop: Unity 7.2.5
Display Server: X Server 1.17.1
Display Driver: NVIDIA 352.39
OpenGL: 4.3.0
Compiler: GCC 4.8.4 + Clang 3.4-1ubuntu3 + CUDA 7.5
File-System: ext4
Screen Resolution: 3840x2160

Graphics per configuration:
GeForce GTX 680: NVIDIA GeForce GTX 680 2048MB (1006/3004MHz)
GeForce GTX 750: eVGA NVIDIA GeForce GTX 750 1024MB (1019/2505MHz)
GeForce GTX 760: NVIDIA GeForce GTX 760 2048MB (980/3004MHz)
GeForce GTX 780 Ti: NVIDIA GeForce GTX 780 Ti 3072MB (875/3500MHz)
GeForce GTX 950: eVGA NVIDIA GeForce GTX 950 2048MB (135/405MHz)
GeForce GTX 960: eVGA NVIDIA GeForce GTX 960 2048MB (1277/3505MHz)
GeForce GTX 970: eVGA NVIDIA GeForce GTX 970 4096MB (1163/3505MHz)
GeForce GTX 980: NVIDIA GeForce GTX 980 4096MB (1126/3505MHz)
GeForce GTX 980 Ti: NVIDIA GeForce GTX 980 Ti 6144MB (999/3505MHz)
GeForce GTX TITAN X: NVIDIA GeForce GTX TITAN X 12288MB (1001/3505MHz)

Test system for GeForce GTX 660 (no separate Chipset, Audio or Network entry is listed for it):
Processor: Intel Core i7 920 @ 2.67GHz (8 Cores)
Motherboard: ASUS P6T
Memory: 18432MB
Disk: 250GB Samsung SSD 850 + 1500GB EZ Backup
Graphics: NVIDIA GeForce GTX 660 Ti 2048MB (324/324MHz)
OS: Gentoo 2.2
Kernel: 4.2.2-gentoo (x86_64)
Desktop: KDE 4.14.14
Display Server: X Server 1.16.4
Display Driver: NVIDIA 355.11
OpenGL: 4.4.0
Compiler: GCC 4.9.3 + Clang 3.5.0 + LLVM 3.5.0 + CUDA 7.5
File-System: ext3
Screen Resolution: 2970x1680

Compiler Details (all configurations except GeForce GTX 660): --build=x86_64-linux-gnu --disable-browser-plugin --disable-libmudflap --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details: Scaling Governor: acpi-cpufreq performance

OpenCL Details (GPU Compute Cores):
GeForce GTX 680: 1536
GeForce GTX 750: 512
GeForce GTX 760: 1152
GeForce GTX 780 Ti: 2880
GeForce GTX 950: 768
GeForce GTX 960: 1024
GeForce GTX 970: 1664
GeForce GTX 980: 2048
GeForce GTX 980 Ti: 2816
GeForce GTX TITAN X: 3072
GeForce GTX 660: 1344


SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 750: 113.64 (SE +/- 0.69, N = 3)
GeForce GTX 950: 172.28 (SE +/- 0.47, N = 3)
GeForce GTX 960: 212.43 (SE +/- 1.49, N = 3)
GeForce GTX 970: 263.14 (SE +/- 2.44, N = 3)
GeForce GTX 980: 289.63 (SE +/- 3.09, N = 3)
GeForce GTX 980 Ti: 311.46 (SE +/- 0.32, N = 3)
GeForce GTX TITAN X: 324.09 (SE +/- 1.19, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
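FFT throughput is reported in GFLOPS. A common convention credits one complex FFT of length N with 5·N·log2(N) floating-point operations; the sketch below converts a batch of transforms and an elapsed time into GFLOPS under that convention. Whether SHOC uses exactly this operation count, and the batch size and timing shown, are assumptions for illustration only.

```python
import math


def fft_gflops(n: int, batch: int, seconds: float) -> float:
    """GFLOPS for `batch` complex FFTs of length `n` finished in `seconds`,
    crediting each transform with 5 * n * log2(n) floating-point operations."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two length")
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    flops_per_fft = 5 * n * math.log2(n)
    return flops_per_fft * batch / seconds / 1e9


# Hypothetical workload: 8192 transforms of length 512 completed in 1 ms.
print(f"{fft_gflops(512, 8192, 1e-3):.2f} GFLOPS")
```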

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 750: 1.08 (SE +/- 0.00, N = 3)
GeForce GTX 950: 2.36 (SE +/- 0.00, N = 3)
GeForce GTX 960: 3.38 (SE +/- 0.00, N = 3)
GeForce GTX 970: 4.79 (SE +/- 0.00, N = 3)
GeForce GTX 980: 5.70 (SE +/- 0.00, N = 3)
GeForce GTX 980 Ti: 6.81 (SE +/- 0.00, N = 3)
GeForce GTX TITAN X: 7.42 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
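The MD5 Hash results count MD5 digests computed per second (GHash/s). As a rough CPU-side illustration of the kind of workload such a hash rate describes, here is a brute-force keyspace search in Python; the alphabet, key length and the `brute_force_md5` name are illustrative assumptions, and SHOC's GPU kernel is not reproduced.

```python
import hashlib
import itertools
import string
import time


def brute_force_md5(target_hex: str, alphabet: str, length: int):
    """Try every key of `length` characters drawn from `alphabet` until one
    hashes to `target_hex`. Returns (key or None, hashes tried, hashes/sec)."""
    target = bytes.fromhex(target_hex)
    tried = 0
    found = None
    start = time.perf_counter()
    for chars in itertools.product(alphabet, repeat=length):
        key = "".join(chars).encode()
        tried += 1
        if hashlib.md5(key).digest() == target:
            found = key.decode()
            break
    elapsed = time.perf_counter() - start
    rate = tried / elapsed if elapsed > 0 else float("inf")
    return found, tried, rate


digest = hashlib.md5(b"gpu").hexdigest()
key, tried, rate = brute_force_md5(digest, string.ascii_lowercase, 3)
print(key, tried, f"{rate / 1e9:.6f} GHash/s")
```

A GPU version spreads the keyspace across thousands of threads, which is why the cards above reach billions of hashes per second while a single Python loop does not.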

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 680: 74.97 (SE +/- 0.87, N = 3)
GeForce GTX 750: 54.69 (SE +/- 0.08, N = 3)
GeForce GTX 760: 78.44 (SE +/- 0.31, N = 3)
GeForce GTX 780 Ti: 126.71 (SE +/- 0.19, N = 3)
GeForce GTX 950: 63.22 (SE +/- 0.08, N = 3)
GeForce GTX 960: 62.78 (SE +/- 1.20, N = 3)
GeForce GTX 970: 117.23 (SE +/- 0.52, N = 3)
GeForce GTX 980: 140.12 (SE +/- 1.30, N = 3)
GeForce GTX 980 Ti: 170.36 (SE +/- 0.65, N = 3)
GeForce GTX TITAN X: 173.89 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
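Each result reports "SE +/- x, N = n". Assuming OpenBenchmarking uses the conventional standard error of the mean (sample standard deviation divided by the square root of the run count), the annotation can be reproduced from individual runs as sketched below. The per-run values are hypothetical, since the export gives only means and standard errors, and the function names are illustrative.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    return statistics.stdev(samples) / math.sqrt(len(samples))


def summarize(samples):
    """Format a set of runs in the same 'mean (SE +/- x, N = n)' style."""
    return (f"{statistics.mean(samples):.2f} "
            f"(SE +/- {standard_error(samples):.2f}, N = {len(samples)})")


print(summarize([10.0, 10.5, 11.0]))  # hypothetical run values
```

An SE of 0.00, as in the MD5 results, simply means the runs agreed to within rounding at two decimal places.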

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 680: 1.91 (SE +/- 0.00, N = 3)
GeForce GTX 750: 1.07 (SE +/- 0.00, N = 3)
GeForce GTX 760: 1.40 (SE +/- 0.00, N = 3)
GeForce GTX 780 Ti: 3.78 (SE +/- 0.00, N = 3)
GeForce GTX 950: 2.34 (SE +/- 0.00, N = 3)
GeForce GTX 960: 3.36 (SE +/- 0.00, N = 3)
GeForce GTX 970: 4.77 (SE +/- 0.00, N = 3)
GeForce GTX 980: 5.68 (SE +/- 0.00, N = 3)
GeForce GTX 980 Ti: 6.79 (SE +/- 0.00, N = 3)
GeForce GTX TITAN X: 7.41 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
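Dividing a throughput result by the GPU compute-core count listed in the OpenCL details gives a rough per-core figure. The calculation below does this for the OpenCL MD5 Hash results in this file. It ignores clock-speed differences between cards, so treat it as illustrative arithmetic rather than an architectural measurement; the names are illustrative.

```python
# OpenCL MD5 Hash results (GHash/s) paired with the OpenCL "GPU Compute Cores"
# counts listed for each configuration in this result file.
OPENCL_MD5 = {
    "GeForce GTX 680": (1.91, 1536),
    "GeForce GTX 750": (1.07, 512),
    "GeForce GTX 760": (1.40, 1152),
    "GeForce GTX 780 Ti": (3.78, 2880),
    "GeForce GTX 950": (2.34, 768),
    "GeForce GTX 960": (3.36, 1024),
    "GeForce GTX 970": (4.77, 1664),
    "GeForce GTX 980": (5.68, 2048),
    "GeForce GTX 980 Ti": (6.79, 2816),
    "GeForce GTX TITAN X": (7.41, 3072),
}


def mhash_per_core(ghash_per_s: float, cores: int) -> float:
    """Convert GHash/s into MHash/s per reported compute core."""
    return ghash_per_s * 1000.0 / cores


per_core = {name: mhash_per_core(g, c) for name, (g, c) in OPENCL_MD5.items()}
for name, value in sorted(per_core.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {value:.2f} MHash/s per core")
```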

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 750: 158.42 (SE +/- 0.42, N = 3)
GeForce GTX 950: 326.23 (SE +/- 0.85, N = 3)
GeForce GTX 960: 351.31 (SE +/- 0.14, N = 3)
GeForce GTX 970: 325.16 (SE +/- 0.28, N = 3)
GeForce GTX 980: 336.48 (SE +/- 1.15, N = 3)
GeForce GTX 980 Ti: 348.92 (SE +/- 1.22, N = 3)
GeForce GTX TITAN X: 356.52 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)
GeForce GTX 680: 242.16 (SE +/- 1.02, N = 3)
GeForce GTX 750: 121.14 (SE +/- 0.23, N = 3)
GeForce GTX 760: 170.26 (SE +/- 0.28, N = 3)
GeForce GTX 780 Ti: 286.62 (SE +/- 0.02, N = 3)
GeForce GTX 950: 239.19 (SE +/- 0.73, N = 3)
GeForce GTX 960: 269.98 (SE +/- 0.56, N = 3)
GeForce GTX 970: 283.36 (SE +/- 0.06, N = 3)
GeForce GTX 980: 332.60 (SE +/- 0.20, N = 3)
GeForce GTX 980 Ti: 345.55 (SE +/- 0.21, N = 3)
GeForce GTX TITAN X: 354.09 (SE +/- 1.56, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
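Several cards have Texture Read Bandwidth results for both the CUDA and the OpenCL target, so the two can be compared directly. The short calculation below expresses each OpenCL result as a fraction of the same card's CUDA result, using the values from this file; the dictionary and function names are illustrative.

```python
# SHOC Texture Read Bandwidth results from this file, in GB/s: (CUDA, OpenCL).
TEXTURE_BW = {
    "GeForce GTX 750": (158.42, 121.14),
    "GeForce GTX 950": (326.23, 239.19),
    "GeForce GTX 960": (351.31, 269.98),
    "GeForce GTX 970": (325.16, 283.36),
    "GeForce GTX 980": (336.48, 332.60),
    "GeForce GTX 980 Ti": (348.92, 345.55),
    "GeForce GTX TITAN X": (356.52, 354.09),
}


def opencl_to_cuda_ratio(cuda_gbs: float, opencl_gbs: float) -> float:
    """Fraction of the CUDA texture bandwidth reached by the OpenCL run."""
    return opencl_gbs / cuda_gbs


ratios = {name: opencl_to_cuda_ratio(*bw) for name, bw in TEXTURE_BW.items()}
for name, r in ratios.items():
    print(f"{name:20s} OpenCL reaches {r:.1%} of CUDA")
```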

ASKAP tConvolveCuda

Processing: Gridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
GeForce GTX 950: 3399.14 (SE +/- 14.40, N = 3)
GeForce GTX 960: 3144.85 (SE +/- 12.43, N = 3)
GeForce GTX 970: 5325.12 (SE +/- 0.00, N = 3)
GeForce GTX 980: 6051.27 (SE +/- 0.00, N = 3)
GeForce GTX 980 Ti: 8320.50 (SE +/- 0.00, N = 3)
GeForce GTX TITAN X: 8458.77 (SE +/- 130.14, N = 4)
1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
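Gridding, as exercised by tConvolve, spreads each visibility sample onto a regular grid through a small convolution kernel. The minimal Python sketch below shows that scatter step; the kernel, grid size and `grid_samples` name are toy assumptions. Reading "grid points per second" as the number of grid-cell updates per second, as counted here, is also an assumption about the metric.

```python
def grid_samples(samples, kernel, grid_size):
    """Convolutional gridding (scatter): each (u, v, value) sample adds
    value * kernel[i][j] to every cell in a (2*support+1)^2 window centred
    on (u, v). Returns the grid and the number of grid-point updates.
    Samples are assumed to lie at least `support` cells from the edge."""
    support = len(kernel) // 2
    grid = [[0j] * grid_size for _ in range(grid_size)]
    updates = 0
    for u, v, value in samples:
        for i in range(-support, support + 1):
            for j in range(-support, support + 1):
                grid[u + i][v + j] += value * kernel[i + support][j + support]
                updates += 1
    return grid, updates


kernel = [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]  # toy 3x3 kernel
grid, updates = grid_samples([(2, 2, 1 + 1j), (4, 3, 2 + 0j)], kernel, 8)
print(updates, grid[2][2], grid[3][3])
```

Because overlapping windows write to the same cells, a GPU implementation has to manage concurrent updates, which is part of what makes gridding a demanding benchmark.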

ASKAP tConvolveCuda

Processing: Degridding

Million Grid Points Per Second, More Is Better (ASKAP tConvolveCuda 2015-11-10)
GeForce GTX 950: 5706.07 (SE +/- 41.05, N = 3)
GeForce GTX 960: 5290.32 (SE +/- 34.80, N = 3)
GeForce GTX 970: 9509.14 (SE +/- 0.00, N = 3)
GeForce GTX 980: 11094.00 (SE +/- 0.00, N = 3)
GeForce GTX 980 Ti: 17380.60 (SE +/- 369.80, N = 3)
GeForce GTX TITAN X: 17380.60 (SE +/- 369.80, N = 3)
1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
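Degridding is the inverse operation: each output sample is gathered from the grid cells under the convolution kernel. Because it only reads the grid instead of accumulating into shared cells, it is plausibly the more parallel-friendly of the two. The sketch below is a toy Python illustration under the same assumptions as a gridding sketch would use (illustrative kernel, grid and `degrid` name; "grid points" counted as cells read).

```python
def degrid(grid, coords, kernel):
    """Convolutional degridding (gather): for each (u, v) coordinate, sum
    grid[u + i][v + j] * kernel[i][j] over the (2*support+1)^2 window.
    Returns the predicted samples and the number of grid points read."""
    support = len(kernel) // 2
    samples = []
    reads = 0
    for u, v in coords:
        acc = 0j
        for i in range(-support, support + 1):
            for j in range(-support, support + 1):
                acc += grid[u + i][v + j] * kernel[i + support][j + support]
                reads += 1
        samples.append(acc)
    return samples, reads


grid = [[complex(r, c) for c in range(6)] for r in range(6)]  # toy 6x6 grid
kernel = [[1 / 9] * 3 for _ in range(3)]                      # 3x3 box kernel
samples, reads = degrid(grid, [(2, 2), (3, 1)], kernel)
print(reads, samples)
```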

CUDA Mini-Nbody

Test: Original

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
GeForce GTX 750: 180.66 (SE +/- 0.05, N = 3)
GeForce GTX 780 Ti: 61.03 (SE +/- 0.50, N = 3)
GeForce GTX 950: 105.30 (SE +/- 0.21, N = 3)
GeForce GTX 960: 82.01 (SE +/- 0.43, N = 3)
GeForce GTX 970: 54.32 (SE +/- 0.13, N = 3)
GeForce GTX 980: 45.38 (SE +/- 0.10, N = 3)
GeForce GTX 980 Ti: 34.58 (SE +/- 0.57, N = 3)
GeForce GTX TITAN X: 32.37 (SE +/- 0.35, N = 3)
GeForce GTX 660: 4.56 (SE +/- 0.43, N = 6)
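The Original variant is an all-pairs n-body simulation. The minimal pure-Python sketch below shows that O(N^2) structure, assuming unit masses, G = 1 and a small softening constant; these choices and the function names are assumptions, not the test's actual source. On a GPU, the outer loop over body `i` is what gets spread across threads.

```python
import math

SOFTENING = 1e-9  # assumed softening term; keeps the i == j term finite


def body_force(pos, vel, dt):
    """All-pairs step: for every body, accumulate the softened inverse-cube
    pull of every body (unit masses, G = 1) and update its velocity."""
    n = len(pos)
    for i in range(n):
        xi, yi, zi = pos[i]
        fx = fy = fz = 0.0
        for j in range(n):
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            inv_dist = 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz + SOFTENING)
            inv_dist3 = inv_dist * inv_dist * inv_dist
            fx += dx * inv_dist3
            fy += dy * inv_dist3
            fz += dz * inv_dist3
        vel[i][0] += dt * fx
        vel[i][1] += dt * fy
        vel[i][2] += dt * fz


def integrate(pos, vel, dt):
    """Advance every position using its updated velocity."""
    for p, v in zip(pos, vel):
        for k in range(3):
            p[k] += v[k] * dt


pos = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
body_force(pos, vel, dt=0.01)
integrate(pos, vel, dt=0.01)
print(vel, pos)
```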

CUDA Mini-Nbody

Test: Cache Blocking

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
GeForce GTX 750: 98.19 (SE +/- 0.00, N = 3)
GeForce GTX 780 Ti: 29.99 (SE +/- 0.27, N = 3)
GeForce GTX 950: 49.89 (SE +/- 0.02, N = 3)
GeForce GTX 960: 37.08 (SE +/- 0.01, N = 3)
GeForce GTX 970: 28.53 (SE +/- 0.01, N = 3)
GeForce GTX 980: 25.13 (SE +/- 0.06, N = 3)
GeForce GTX 980 Ti: 19.77 (SE +/- 0.21, N = 3)
GeForce GTX TITAN X: 18.65 (SE +/- 0.10, N = 3)
GeForce GTX 660: 4.20 (SE +/- 0.02, N = 3)
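Cache blocking (tiling) reorganizes the same all-pairs sum so that the bodies are processed one tile at a time. On a GPU, each tile would be loaded once into shared memory and then reused by every thread in the block. The Python sketch below assumes the same unit-mass, softened model as a plain n-body loop; the names and tile size are hypothetical. It checks that the tiled order produces exactly the same accelerations as the untiled loop.

```python
import math

SOFTENING = 1e-9  # assumed softening term


def interact(a, pi, pj):
    """Add the softened inverse-cube contribution of body pj to accumulator a."""
    dx, dy, dz = pj[0] - pi[0], pj[1] - pi[1], pj[2] - pi[2]
    inv = 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz + SOFTENING)
    inv3 = inv * inv * inv
    a[0] += dx * inv3
    a[1] += dy * inv3
    a[2] += dz * inv3


def accel_naive(pos):
    """Reference: every body i walks over all bodies j in one pass."""
    out = []
    for pi in pos:
        a = [0.0, 0.0, 0.0]
        for pj in pos:
            interact(a, pi, pj)
        out.append(a)
    return out


def accel_tiled(pos, tile=4):
    """Same sums, but bodies j are consumed tile by tile. `block` stands in for
    the shared-memory tile a CUDA thread block would load once and then reuse
    for every body i it owns."""
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for start in range(0, len(pos), tile):
        block = pos[start:start + tile]
        for pi, a in zip(pos, acc):
            for pj in block:
                interact(a, pi, pj)
    return acc


pos = [[0.5 * k, (k % 3) * 1.0, (k % 5) * 0.25] for k in range(10)]
print(accel_tiled(pos, tile=3) == accel_naive(pos))
```

The tiled version adds the contributions in the same order for each body, so the floating-point results match exactly. The only thing that changes is how often each body's position has to be fetched from memory.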

CUDA Mini-Nbody

Test: Loop Unrolling

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
GeForce GTX 750: 89.34 (SE +/- 0.04, N = 3)
GeForce GTX 780 Ti: 27.05 (SE +/- 0.05, N = 3)
GeForce GTX 950: 47.54 (SE +/- 0.03, N = 3)
GeForce GTX 960: 35.35 (SE +/- 0.03, N = 3)
GeForce GTX 970: 26.42 (SE +/- 0.02, N = 3)
GeForce GTX 980: 23.88 (SE +/- 0.21, N = 3)
GeForce GTX 980 Ti: 18.46 (SE +/- 0.15, N = 3)
GeForce GTX TITAN X: 17.59 (SE +/- 0.25, N = 3)
GeForce GTX 660: 5.12 (SE +/- 0.02, N = 3)
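Loop unrolling replicates the loop body so that fewer counter updates and branches are executed per useful operation, which gives the compiler more independent instructions to schedule. The Loop Unrolling variant presumably applies this to the inner interaction loop. The generic Python sketch below (hypothetical function names) unrolls a dot product by a factor of 4, adds a remainder loop, and confirms that the result is unchanged. Python itself gains little speed from this; the point is the shape of the transformation.

```python
def dot_plain(a, b):
    """Reference loop: one multiply-add per iteration."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_unrolled4(a, b):
    """Same sum with the body unrolled by 4 plus a remainder loop. The
    additions happen in the same order, so the result is bit-identical."""
    n = len(a)
    if len(b) != n:
        raise ValueError("length mismatch")
    total = 0.0
    k = 0
    main = n - n % 4
    while k < main:
        total += a[k] * b[k]
        total += a[k + 1] * b[k + 1]
        total += a[k + 2] * b[k + 2]
        total += a[k + 3] * b[k + 3]
        k += 4
    while k < n:  # remainder when n is not a multiple of 4
        total += a[k] * b[k]
        k += 1
    return total


xs = [0.1 * k for k in range(11)]
ys = [1.0 - 0.05 * k for k in range(11)]
print(dot_unrolled4(xs, ys) == dot_plain(xs, ys))
```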

CUDA Mini-Nbody

Test: SOA Data Layout

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
GeForce GTX 750: 199.95 (SE +/- 0.04, N = 3)
GeForce GTX 780 Ti: 54.39 (SE +/- 0.16, N = 3)
GeForce GTX 950: 108.50 (SE +/- 0.02, N = 3)
GeForce GTX 960: 79.97 (SE +/- 0.08, N = 3)
GeForce GTX 970: 55.87 (SE +/- 0.05, N = 3)
GeForce GTX 980: 50.15 (SE +/- 0.21, N = 3)
GeForce GTX 980 Ti: 40.94 (SE +/- 0.11, N = 3)
GeForce GTX TITAN X: 37.43 (SE +/- 0.20, N = 3)
GeForce GTX 660: 4.13 (SE +/- 0.00, N = 3)
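The SOA Data Layout variant stores each body field in its own array (structure of arrays), rather than keeping one record per body (array of structures). The sketch below uses Python's `array` module to convert AoS records into per-field float32 arrays, then updates positions on that layout. The `Body` class and helper names are illustrative assumptions, and Python cannot show the memory-coalescing benefit that a GPU gets from this layout.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Body:
    """Array-of-structures (AoS) record: all fields of one body together."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float


FIELDS = ("x", "y", "z", "vx", "vy", "vz")


def to_soa(bodies):
    """Structure of arrays (SoA): one contiguous float32 array per field."""
    return {f: array("f", [getattr(b, f) for b in bodies]) for f in FIELDS}


def integrate_soa(soa, dt):
    """Position update that walks each field array sequentially, mirroring the
    GPU case where thread i reads element i of each array, so neighbouring
    threads touch neighbouring addresses."""
    for p, v in (("x", "vx"), ("y", "vy"), ("z", "vz")):
        pos, vel = soa[p], soa[v]
        for i in range(len(pos)):
            pos[i] += vel[i] * dt


bodies = [Body(0.0, 0.0, 0.0, 1.0, 2.0, 3.0), Body(1.0, 1.0, 1.0, 0.5, 0.5, 0.5)]
soa = to_soa(bodies)
integrate_soa(soa, 0.5)
print(list(soa["x"]), list(soa["z"]))
```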

CUDA Mini-Nbody

Test: Flush Denormals To Zero

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)
GeForce GTX 750: 199.83 (SE +/- 0.03, N = 3)
GeForce GTX 780 Ti: 53.26 (SE +/- 0.10, N = 3)
GeForce GTX 950: 108.48 (SE +/- 0.02, N = 3)
GeForce GTX 960: 79.84 (SE +/- 0.01, N = 3)
GeForce GTX 970: 55.80 (SE +/- 0.07, N = 3)
GeForce GTX 980: 49.53 (SE +/- 0.18, N = 3)
GeForce GTX 980 Ti: 40.85 (SE +/- 0.10, N = 3)
GeForce GTX TITAN X: 37.37 (SE +/- 0.08, N = 3)
GeForce GTX 660: 4.13 (SE +/- 0.02, N = 3)
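Single-precision subnormals (denormals) are nonzero values whose magnitude is below 2^-126. Flush-to-zero mode replaces them with zero, typically trading strict IEEE behaviour for speed. nvcc exposes this as the `-ftz=true` option, although the exact flags this test used are not stated here. The Python sketch below only emulates the effect on float32 values using `struct`; the helper names are illustrative, and this is not how a GPU implements it.

```python
import math
import struct

FLT_MIN = 2.0 ** -126  # smallest positive normal float32


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_subnormal_f32(x: float) -> bool:
    """True when x, stored as float32, has a zero exponent and nonzero mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits >> 23) & 0xFF) == 0 and (bits & 0x7FFFFF) != 0


def flush_to_zero(x: float) -> float:
    """Emulate flush-to-zero for float32: subnormals become a zero of the same sign."""
    x = to_f32(x)
    return math.copysign(0.0, x) if is_subnormal_f32(x) else x


for value in (1.0, FLT_MIN, FLT_MIN / 4, -1e-40):
    print(value, "->", flush_to_zero(value))
```

In this result file the Flush Denormals To Zero times sit very close to the SOA Data Layout times on every card.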

JuliaGPU

OpenCL Device: GPU

Samples/sec, More Is Better (JuliaGPU 1.2pts1)
GeForce GTX 680: 48074789.03 (SE +/- 59682.63, N = 3)
GeForce GTX 750: 36136874.00 (SE +/- 22546.70, N = 3)
GeForce GTX 760: 38310650.50 (SE +/- 14125.16, N = 3)
GeForce GTX 780 Ti: 78839770.13 (SE +/- 293396.06, N = 3)
GeForce GTX 950: 64913682.63 (SE +/- 58084.93, N = 3)
GeForce GTX 960: 80042041.73 (SE +/- 157475.07, N = 3)
GeForce GTX 970: 104144917.23 (SE +/- 84325.23, N = 3)
GeForce GTX 980: 113830604.27 (SE +/- 218639.12, N = 3)
GeForce GTX 980 Ti: 127978049.53 (SE +/- 473156.02, N = 3)
GeForce GTX TITAN X: 136037921.43 (SE +/- 318277.32, N = 3)
GeForce GTX 660: 37267979.63 (SE +/- 373869.96, N = 3)
1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm
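JuliaGPU renders a Julia set fractal through OpenCL and reports samples per second. The 2D escape-time sketch below illustrates only the basic Julia iteration z -> z² + c, with one sample per pixel. It is not the benchmark's actual kernel; the constant `c`, the image size and the function names are arbitrary assumptions.

```python
def julia_escape(z: complex, c: complex, max_iter: int = 256) -> int:
    """Escape-time iteration z -> z*z + c. Returns the step at which |z| first
    exceeds 2, or max_iter if it stays bounded."""
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width: int, height: int, c: complex, max_iter: int = 64):
    """One escape-time sample per pixel over the square [-1.5, 1.5]^2."""
    if width < 2 or height < 2:
        raise ValueError("need at least 2x2 pixels")
    return [
        [
            julia_escape(
                complex(-1.5 + 3.0 * px / (width - 1), -1.5 + 3.0 * py / (height - 1)),
                c,
                max_iter,
            )
            for px in range(width)
        ]
        for py in range(height)
    ]


image = render(8, 4, complex(-0.8, 0.156))
print(sum(len(row) for row in image), "samples")
```

Each pixel is independent of every other, which is why this kind of workload scales almost directly with GPU core count.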

MandelbulbGPU

OpenCL Device: GPU

Samples/sec, More Is Better (MandelbulbGPU 1.0pts1)
GeForce GTX 680: 31636512.97 (SE +/- 36731.70, N = 3)
GeForce GTX 750: 20060275.53 (SE +/- 9818.73, N = 3)
GeForce GTX 760: 25392138.50 (SE +/- 28089.31, N = 3)
GeForce GTX 780 Ti: 47400001.90 (SE +/- 48150.35, N = 3)
GeForce GTX 950: 37156070.87 (SE +/- 29855.85, N = 3)
GeForce GTX 960: 44953399.47 (SE +/- 75512.83, N = 3)
GeForce GTX 970: 58811317.17 (SE +/- 91420.68, N = 3)
GeForce GTX 980: 63616558.77 (SE +/- 140370.89, N = 3)
GeForce GTX 980 Ti: 71656708.83 (SE +/- 168304.91, N = 3)
GeForce GTX TITAN X: 75614774.13 (SE +/- 166919.37, N = 3)
GeForce GTX 660: 22113594.00 (SE +/- 18132.86, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
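MandelbulbGPU renders the Mandelbulb, a 3D analogue of the Mandelbrot set, through OpenCL. The sketch below shows only the escape-time iteration commonly used to define that set: the power-8 "triplex" formula in spherical coordinates. The function name, bailout radius and iteration cap are assumptions, and the benchmark's own OpenCL kernel is not reproduced here.

```python
import math


def mandelbulb_escape(cx, cy, cz, power=8, max_iter=32, bailout=2.0):
    """Iterate v -> v^power + c, where the power is taken in spherical
    coordinates (r^n, n*theta, n*phi). Returns the step at which |v| exceeds
    `bailout`, or max_iter if the point never escapes."""
    x = y = z = 0.0
    for n in range(max_iter):
        r = math.sqrt(x * x + y * y + z * z)
        if r > bailout:
            return n
        # Polar angle from the z axis (clamped for rounding) and azimuth in the xy-plane.
        theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0.0 else 0.0
        phi = math.atan2(y, x)
        rn = r ** power
        x = rn * math.sin(power * theta) * math.cos(power * phi) + cx
        y = rn * math.sin(power * theta) * math.sin(power * phi) + cy
        z = rn * math.cos(power * theta) + cz
    return max_iter


print(mandelbulb_escape(0.0, 0.0, 0.0), mandelbulb_escape(3.0, 0.0, 0.0))
```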

LuxMark

OpenCL Device: GPU - Scene: Hotel

Score, More Is Better (LuxMark 3.0)
GeForce GTX 680: 577 (SE +/- 2.00, N = 3)
GeForce GTX 760: 463 (SE +/- 0.33, N = 3)
GeForce GTX 780 Ti: 992 (SE +/- 0.00, N = 3)
GeForce GTX 950: 769 (SE +/- 0.00, N = 3)
GeForce GTX 960: 897 (SE +/- 0.67, N = 3)
GeForce GTX 970: 1346 (SE +/- 0.00, N = 3)
GeForce GTX 980: 1492 (SE +/- 1.20, N = 3)
GeForce GTX 980 Ti: 1855 (SE +/- 0.33, N = 3)
GeForce GTX TITAN X: 1906 (SE +/- 0.33, N = 3)
GeForce GTX 660: 544 (SE +/- 1.20, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Microphone

Score, More Is Better (LuxMark 3.0)
GeForce GTX 680: 2127 (SE +/- 3.06, N = 3)
GeForce GTX 760: 1941 (SE +/- 0.67, N = 3)
GeForce GTX 780 Ti: 4302 (SE +/- 12.00, N = 3)
GeForce GTX 950: 2423 (SE +/- 4.26, N = 3)
GeForce GTX 960: 2460 (SE +/- 1.15, N = 3)
GeForce GTX 970: 4458 (SE +/- 7.64, N = 3)
GeForce GTX 980: 4776 (SE +/- 0.67, N = 3)
GeForce GTX 980 Ti: 6268 (SE +/- 18.50, N = 3)
GeForce GTX TITAN X: 6360 (SE +/- 3.00, N = 3)
GeForce GTX 660: 1713 (SE +/- 4.18, N = 3)

LuxMark

OpenCL Device: GPU - Scene: Luxball HDR

Score, More Is Better (LuxMark 3.0)
GeForce GTX 680: 4554 (SE +/- 12.17, N = 3)
GeForce GTX 750: 3491 (SE +/- 11.67, N = 3)
GeForce GTX 760: 4253 (SE +/- 1.45, N = 3)
GeForce GTX 780 Ti: 9639 (SE +/- 35.97, N = 3)
GeForce GTX 950: 5313 (SE +/- 16.67, N = 3)
GeForce GTX 960: 5474 (SE +/- 0.88, N = 3)
GeForce GTX 970: 9737 (SE +/- 24.85, N = 3)
GeForce GTX 980: 10713 (SE +/- 1.20, N = 3)
GeForce GTX 980 Ti: 13802 (SE +/- 44.35, N = 3)
GeForce GTX TITAN X: 14081 (SE +/- 4.70, N = 3)
GeForce GTX 660: 3355 (SE +/- 2.40, N = 3)


Phoronix Test Suite v10.8.4