OpenCL CUDA NVIDIA GPGPU Linux Tests

All Maxwell and various Kepler graphics cards tested on the NVIDIA Linux driver. Benchmarks by Michael Larabel for a future article on Phoronix.com, delivering various GPGPU benchmarks for reference purposes.
HTML result view exported from: https://openbenchmarking.org/result/1511113-PTS-GPGPUNVI62&rdt .
Processor:         Intel Core i5-6600K @ 3.50GHz (4 Cores)
Motherboard:       MSI Z170A GAMING PRO (MS-7984) v1.0
Chipset:           Intel Device 191f
Memory:            16384MB
Disk:              256GB TS256GSSD370S
Audio:             Intel Device a170
Network:           Intel Device 15b8
OS:                Ubuntu 14.04
Kernel:            3.19.0-33-generic (x86_64)
Desktop:           Unity 7.2.5
Display Server:    X Server 1.17.1
Display Driver:    NVIDIA 352.39
OpenGL:            4.3.0
Compiler:          GCC 4.8.4 + Clang 3.4-1ubuntu3 + CUDA 7.5
File-System:       ext4
Screen Resolution: 3840x2160

Graphics
- GeForce GTX 950: eVGA NVIDIA GeForce GTX 950 2048MB (135/405MHz)
- GeForce GTX 980 Ti: NVIDIA GeForce GTX 980 Ti 6144MB (999/3505MHz)
- GeForce GTX 970: eVGA NVIDIA GeForce GTX 970 4096MB (1163/3505MHz)
- GeForce GTX 980: NVIDIA GeForce GTX 980 4096MB (1126/3505MHz)
- GeForce GTX 960: eVGA NVIDIA GeForce GTX 960 2048MB (1277/3505MHz)
- GeForce GTX TITAN X: NVIDIA GeForce GTX TITAN X 12288MB (1001/3505MHz)
- GeForce GTX 780 Ti: NVIDIA GeForce GTX 780 Ti 3072MB (875/3500MHz)
- GeForce GTX 680: NVIDIA GeForce GTX 680 2048MB (1006/3004MHz)
- GeForce GTX 750: eVGA NVIDIA GeForce GTX 750 1024MB (1019/2505MHz)
- GeForce GTX 760: NVIDIA GeForce GTX 760 2048MB (980/3004MHz)

Compiler Details
- --build=x86_64-linux-gnu --disable-browser-plugin --disable-libmudflap --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details
- Scaling Governor: acpi-cpufreq performance

OpenCL Details
- GeForce GTX 950: GPU Compute Cores: 768
- GeForce GTX 980 Ti: GPU Compute Cores: 2816
- GeForce GTX 970: GPU Compute Cores: 1664
- GeForce GTX 980: GPU Compute Cores: 2048
- GeForce GTX 960: GPU Compute Cores: 1024
- GeForce GTX TITAN X: GPU Compute Cores: 3072
- GeForce GTX 780 Ti: GPU Compute Cores: 2880
- GeForce GTX 680: GPU Compute Cores: 1536
- GeForce GTX 750: GPU Compute Cores: 512
- GeForce GTX 760: GPU Compute Cores: 1152

Results overview (columns are GeForce cards; "-" means no result was recorded for that card)

Test                                      GTX 950       GTX 980 Ti    GTX 970       GTX 980       GTX 960       GTX TITAN X   GTX 780 Ti    GTX 680       GTX 750       GTX 760
shoc: CUDA - FFT SP                       172.28        311.46        263.14        289.63        212.43        324.09        -             -             113.64        -
shoc: CUDA - MD5 Hash                     2.36          6.81          4.79          5.70          3.38          7.42          -             -             1.08          -
shoc: OpenCL - FFT SP                     63.22         170.36        117.23        140.12        62.78         173.89        126.71        74.97         54.69         78.44
shoc: OpenCL - MD5 Hash                   2.34          6.79          4.77          5.68          3.36          7.41          3.78          1.91          1.07          1.40
shoc: CUDA - Texture Read Bandwidth       326.23        348.92        325.16        336.48        351.31        356.52        -             -             158.42        -
shoc: OpenCL - Texture Read Bandwidth     239.19        345.55        283.36        332.60        269.98        354.09        286.62        242.16        121.14        170.26
askap: Gridding                           3399.14       8320.50       5325.12       6051.27       3144.85       8458.77       -             -             -             -
askap: Degridding                         5706.07       17380.60      9509.14       11094.00      5290.32       17380.60      -             -             -             -
cuda-mini-nbody: Original                 105.30        34.58         54.32         45.38         82.01         32.37         61.03         -             180.66        -
cuda-mini-nbody: Cache Blocking           49.89         19.77         28.53         25.13         37.08         18.65         29.99         -             98.19         -
cuda-mini-nbody: Loop Unrolling           47.54         18.46         26.42         23.88         35.35         17.59         27.05         -             89.34         -
cuda-mini-nbody: SOA Data Layout          108.50        40.94         55.87         50.15         79.97         37.43         54.39         -             199.95        -
cuda-mini-nbody: Flush Denormals To Zero  108.48        40.85         55.80         49.53         79.84         37.37         53.26         -             199.83        -
juliagpu: GPU                             64913682.63   127978049.53  104144917.23  113830604.27  80042041.73   136037921.43  78839770.13   48074789.03   36136874.00   38310650.50
mandelbulbgpu: GPU                        37156070.87   71656708.83   58811317.17   63616558.77   44953399.47   75614774.13   47400001.90   31636512.97   20060275.53   25392138.50
luxmark: GPU - Hotel                      769           1855          1346          1492          897           1906          992           577           -             463
luxmark: GPU - Microphone                 2423          6268          4458          4776          2460          6360          4302          2127          -             1941
luxmark: GPU - Luxball HDR                5313          13802         9737          10713         5474          14081         9639          4554          3491          4253

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: FFT SP
GFLOPS, More Is Better

GeForce GTX 950      172.28    SE +/- 0.47, N = 3
GeForce GTX 980 Ti   311.46    SE +/- 0.32, N = 3
GeForce GTX 970      263.14    SE +/- 2.44, N = 3
GeForce GTX 980      289.63    SE +/- 3.09, N = 3
GeForce GTX 960      212.43    SE +/- 1.49, N = 3
GeForce GTX TITAN X  324.09    SE +/- 1.19, N = 3
GeForce GTX 750      113.64    SE +/- 0.69, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: MD5 Hash
GHash/s, More Is Better

GeForce GTX 950      2.36      SE +/- 0.00, N = 3
GeForce GTX 980 Ti   6.81      SE +/- 0.00, N = 3
GeForce GTX 970      4.79      SE +/- 0.00, N = 3
GeForce GTX 980      5.70      SE +/- 0.00, N = 3
GeForce GTX 960      3.38      SE +/- 0.00, N = 3
GeForce GTX TITAN X  7.42      SE +/- 0.00, N = 3
GeForce GTX 750      1.08      SE +/- 0.00, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: FFT SP
GFLOPS, More Is Better

GeForce GTX 950      63.22     SE +/- 0.08, N = 3
GeForce GTX 980 Ti   170.36    SE +/- 0.65, N = 3
GeForce GTX 970      117.23    SE +/- 0.52, N = 3
GeForce GTX 980      140.12    SE +/- 1.30, N = 3
GeForce GTX 960      62.78     SE +/- 1.20, N = 3
GeForce GTX TITAN X  173.89    SE +/- 0.19, N = 3
GeForce GTX 780 Ti   126.71    SE +/- 0.19, N = 3
GeForce GTX 680      74.97     SE +/- 0.87, N = 3
GeForce GTX 750      54.69     SE +/- 0.08, N = 3
GeForce GTX 760      78.44     SE +/- 0.31, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: MD5 Hash
GHash/s, More Is Better

GeForce GTX 950      2.34      SE +/- 0.00, N = 3
GeForce GTX 980 Ti   6.79      SE +/- 0.00, N = 3
GeForce GTX 970      4.77      SE +/- 0.00, N = 3
GeForce GTX 980      5.68      SE +/- 0.00, N = 3
GeForce GTX 960      3.36      SE +/- 0.00, N = 3
GeForce GTX TITAN X  7.41      SE +/- 0.00, N = 3
GeForce GTX 780 Ti   3.78      SE +/- 0.00, N = 3
GeForce GTX 680      1.91      SE +/- 0.00, N = 3
GeForce GTX 750      1.07      SE +/- 0.00, N = 3
GeForce GTX 760      1.40      SE +/- 0.00, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

GeForce GTX 950      326.23    SE +/- 0.85, N = 3
GeForce GTX 980 Ti   348.92    SE +/- 1.22, N = 3
GeForce GTX 970      325.16    SE +/- 0.28, N = 3
GeForce GTX 980      336.48    SE +/- 1.15, N = 3
GeForce GTX 960      351.31    SE +/- 0.14, N = 3
GeForce GTX TITAN X  356.52    SE +/- 0.12, N = 3
GeForce GTX 750      158.42    SE +/- 0.42, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

GeForce GTX 950      239.19    SE +/- 0.73, N = 3
GeForce GTX 980 Ti   345.55    SE +/- 0.21, N = 3
GeForce GTX 970      283.36    SE +/- 0.06, N = 3
GeForce GTX 980      332.60    SE +/- 0.20, N = 3
GeForce GTX 960      269.98    SE +/- 0.56, N = 3
GeForce GTX TITAN X  354.09    SE +/- 1.56, N = 3
GeForce GTX 780 Ti   286.62    SE +/- 0.02, N = 3
GeForce GTX 680      242.16    SE +/- 1.02, N = 3
GeForce GTX 750      121.14    SE +/- 0.23, N = 3
GeForce GTX 760      170.26    SE +/- 0.28, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

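One rough way to read the SHOC FFT SP numbers is to divide each card's CUDA result by its OpenCL result. The Python sketch below does only that arithmetic on the GFLOPS figures reported above; the dictionary and function names are illustrative assumptions and are not part of SHOC or the Phoronix Test Suite. Cards with no CUDA result are skipped.

```python
# FFT SP throughput in GFLOPS (more is better), copied from the SHOC results above.
CUDA_FFT_SP = {
    "GeForce GTX 950": 172.28,
    "GeForce GTX 980 Ti": 311.46,
    "GeForce GTX 970": 263.14,
    "GeForce GTX 980": 289.63,
    "GeForce GTX 960": 212.43,
    "GeForce GTX TITAN X": 324.09,
    "GeForce GTX 750": 113.64,
}
OPENCL_FFT_SP = {
    "GeForce GTX 950": 63.22,
    "GeForce GTX 980 Ti": 170.36,
    "GeForce GTX 970": 117.23,
    "GeForce GTX 980": 140.12,
    "GeForce GTX 960": 62.78,
    "GeForce GTX TITAN X": 173.89,
    "GeForce GTX 780 Ti": 126.71,
    "GeForce GTX 680": 74.97,
    "GeForce GTX 750": 54.69,
    "GeForce GTX 760": 78.44,
}


def cuda_over_opencl(cuda, opencl):
    """Return the CUDA/OpenCL throughput ratio for cards present in both tables."""
    return {card: cuda[card] / opencl[card] for card in cuda if card in opencl}


ratios = cuda_over_opencl(CUDA_FFT_SP, OPENCL_FFT_SP)
for card, ratio in ratios.items():
    print(f"{card}: {ratio:.2f}x")
```

A ratio above 1.0 means the CUDA build of the FFT SP test reported more GFLOPS than the OpenCL build on that card in this run; it says nothing about other workloads.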
ASKAP tConvolveCuda 2015-11-10
Processing: Gridding
Million Grid Points Per Second, More Is Better

GeForce GTX 950      3399.14   SE +/- 14.40, N = 3
GeForce GTX 980 Ti   8320.50   SE +/- 0.00, N = 3
GeForce GTX 970      5325.12   SE +/- 0.00, N = 3
GeForce GTX 980      6051.27   SE +/- 0.00, N = 3
GeForce GTX 960      3144.85   SE +/- 12.43, N = 3
GeForce GTX TITAN X  8458.77   SE +/- 130.14, N = 4

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

ASKAP tConvolveCuda 2015-11-10
Processing: Degridding
Million Grid Points Per Second, More Is Better

GeForce GTX 950      5706.07   SE +/- 41.05, N = 3
GeForce GTX 980 Ti   17380.60  SE +/- 369.80, N = 3
GeForce GTX 970      9509.14   SE +/- 0.00, N = 3
GeForce GTX 980      11094.00  SE +/- 0.00, N = 3
GeForce GTX 960      5290.32   SE +/- 34.80, N = 3
GeForce GTX TITAN X  17380.60  SE +/- 369.80, N = 3

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl

CUDA Mini-Nbody 2015-11-10
Test: Original
Seconds, Fewer Is Better

GeForce GTX 950      105.30    SE +/- 0.21, N = 3
GeForce GTX 980 Ti   34.58     SE +/- 0.57, N = 3
GeForce GTX 970      54.32     SE +/- 0.13, N = 3
GeForce GTX 980      45.38     SE +/- 0.10, N = 3
GeForce GTX 960      82.01     SE +/- 0.43, N = 3
GeForce GTX TITAN X  32.37     SE +/- 0.35, N = 3
GeForce GTX 780 Ti   61.03     SE +/- 0.50, N = 3
GeForce GTX 750      180.66    SE +/- 0.05, N = 3

CUDA Mini-Nbody 2015-11-10
Test: Cache Blocking
Seconds, Fewer Is Better

GeForce GTX 950      49.89     SE +/- 0.02, N = 3
GeForce GTX 980 Ti   19.77     SE +/- 0.21, N = 3
GeForce GTX 970      28.53     SE +/- 0.01, N = 3
GeForce GTX 980      25.13     SE +/- 0.06, N = 3
GeForce GTX 960      37.08     SE +/- 0.01, N = 3
GeForce GTX TITAN X  18.65     SE +/- 0.10, N = 3
GeForce GTX 780 Ti   29.99     SE +/- 0.27, N = 3
GeForce GTX 750      98.19     SE +/- 0.00, N = 3

CUDA Mini-Nbody 2015-11-10
Test: Loop Unrolling
Seconds, Fewer Is Better

GeForce GTX 950      47.54     SE +/- 0.03, N = 3
GeForce GTX 980 Ti   18.46     SE +/- 0.15, N = 3
GeForce GTX 970      26.42     SE +/- 0.02, N = 3
GeForce GTX 980      23.88     SE +/- 0.21, N = 3
GeForce GTX 960      35.35     SE +/- 0.03, N = 3
GeForce GTX TITAN X  17.59     SE +/- 0.25, N = 3
GeForce GTX 780 Ti   27.05     SE +/- 0.05, N = 3
GeForce GTX 750      89.34     SE +/- 0.04, N = 3

CUDA Mini-Nbody 2015-11-10
Test: SOA Data Layout
Seconds, Fewer Is Better

GeForce GTX 950      108.50    SE +/- 0.02, N = 3
GeForce GTX 980 Ti   40.94     SE +/- 0.11, N = 3
GeForce GTX 970      55.87     SE +/- 0.05, N = 3
GeForce GTX 980      50.15     SE +/- 0.21, N = 3
GeForce GTX 960      79.97     SE +/- 0.08, N = 3
GeForce GTX TITAN X  37.43     SE +/- 0.20, N = 3
GeForce GTX 780 Ti   54.39     SE +/- 0.16, N = 3
GeForce GTX 750      199.95    SE +/- 0.04, N = 3

CUDA Mini-Nbody 2015-11-10
Test: Flush Denormals To Zero
Seconds, Fewer Is Better

GeForce GTX 950      108.48    SE +/- 0.02, N = 3
GeForce GTX 980 Ti   40.85     SE +/- 0.10, N = 3
GeForce GTX 970      55.80     SE +/- 0.07, N = 3
GeForce GTX 980      49.53     SE +/- 0.18, N = 3
GeForce GTX 960      79.84     SE +/- 0.01, N = 3
GeForce GTX TITAN X  37.37     SE +/- 0.08, N = 3
GeForce GTX 780 Ti   53.26     SE +/- 0.10, N = 3
GeForce GTX 750      199.83    SE +/- 0.03, N = 3

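The five CUDA Mini-Nbody tests report run time in seconds, so each variant can be compared with the Original run on the same card. The Python sketch below divides the Original time by each variant's time, using the figures reported above; the names are illustrative assumptions and are not part of the benchmark. A value above 1.0 means the variant finished faster than Original.

```python
# Run time in seconds (fewer is better), copied from the CUDA Mini-Nbody results above.
# Column order follows VARIANTS.
VARIANTS = [
    "Original",
    "Cache Blocking",
    "Loop Unrolling",
    "SOA Data Layout",
    "Flush Denormals To Zero",
]
TIMES = {
    "GeForce GTX 950": [105.30, 49.89, 47.54, 108.50, 108.48],
    "GeForce GTX 980 Ti": [34.58, 19.77, 18.46, 40.94, 40.85],
    "GeForce GTX 970": [54.32, 28.53, 26.42, 55.87, 55.80],
    "GeForce GTX 980": [45.38, 25.13, 23.88, 50.15, 49.53],
    "GeForce GTX 960": [82.01, 37.08, 35.35, 79.97, 79.84],
    "GeForce GTX TITAN X": [32.37, 18.65, 17.59, 37.43, 37.37],
    "GeForce GTX 780 Ti": [61.03, 29.99, 27.05, 54.39, 53.26],
    "GeForce GTX 750": [180.66, 98.19, 89.34, 199.95, 199.83],
}


def speedup_vs_original(times):
    """Map each card to {variant: original_time / variant_time}."""
    result = {}
    for card, row in times.items():
        original = row[0]
        result[card] = {
            name: original / seconds
            for name, seconds in zip(VARIANTS[1:], row[1:])
        }
    return result


speedups = speedup_vs_original(TIMES)
for card, row in speedups.items():
    cells = ", ".join(f"{name}: {value:.2f}x" for name, value in row.items())
    print(f"{card}: {cells}")
```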
JuliaGPU 1.2pts1
OpenCL Device: GPU
Samples/sec, More Is Better

GeForce GTX 950      64913682.63   SE +/- 58084.93, N = 3
GeForce GTX 980 Ti   127978049.53  SE +/- 473156.02, N = 3
GeForce GTX 970      104144917.23  SE +/- 84325.23, N = 3
GeForce GTX 980      113830604.27  SE +/- 218639.12, N = 3
GeForce GTX 960      80042041.73   SE +/- 157475.07, N = 3
GeForce GTX TITAN X  136037921.43  SE +/- 318277.32, N = 3
GeForce GTX 780 Ti   78839770.13   SE +/- 293396.06, N = 3
GeForce GTX 680      48074789.03   SE +/- 59682.63, N = 3
GeForce GTX 750      36136874.00   SE +/- 22546.70, N = 3
GeForce GTX 760      38310650.50   SE +/- 14125.16, N = 3

1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

MandelbulbGPU 1.0pts1
OpenCL Device: GPU
Samples/sec, More Is Better

GeForce GTX 950      37156070.87   SE +/- 29855.85, N = 3
GeForce GTX 980 Ti   71656708.83   SE +/- 168304.91, N = 3
GeForce GTX 970      58811317.17   SE +/- 91420.68, N = 3
GeForce GTX 980      63616558.77   SE +/- 140370.89, N = 3
GeForce GTX 960      44953399.47   SE +/- 75512.83, N = 3
GeForce GTX TITAN X  75614774.13   SE +/- 166919.37, N = 3
GeForce GTX 780 Ti   47400001.90   SE +/- 48150.35, N = 3
GeForce GTX 680      31636512.97   SE +/- 36731.70, N = 3
GeForce GTX 750      20060275.53   SE +/- 9818.73, N = 3
GeForce GTX 760      25392138.50   SE +/- 28089.31, N = 3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

LuxMark 3.0
OpenCL Device: GPU - Scene: Hotel
Score, More Is Better

GeForce GTX 950      769       SE +/- 0.00, N = 3
GeForce GTX 980 Ti   1855      SE +/- 0.33, N = 3
GeForce GTX 970      1346      SE +/- 0.00, N = 3
GeForce GTX 980      1492      SE +/- 1.20, N = 3
GeForce GTX 960      897       SE +/- 0.67, N = 3
GeForce GTX TITAN X  1906      SE +/- 0.33, N = 3
GeForce GTX 780 Ti   992       SE +/- 0.00, N = 3
GeForce GTX 680      577       SE +/- 2.00, N = 3
GeForce GTX 760      463       SE +/- 0.33, N = 3

LuxMark 3.0
OpenCL Device: GPU - Scene: Microphone
Score, More Is Better

GeForce GTX 950      2423      SE +/- 4.26, N = 3
GeForce GTX 980 Ti   6268      SE +/- 18.50, N = 3
GeForce GTX 970      4458      SE +/- 7.64, N = 3
GeForce GTX 980      4776      SE +/- 0.67, N = 3
GeForce GTX 960      2460      SE +/- 1.15, N = 3
GeForce GTX TITAN X  6360      SE +/- 3.00, N = 3
GeForce GTX 780 Ti   4302      SE +/- 12.00, N = 3
GeForce GTX 680      2127      SE +/- 3.06, N = 3
GeForce GTX 760      1941      SE +/- 0.67, N = 3

LuxMark 3.0
OpenCL Device: GPU - Scene: Luxball HDR
Score, More Is Better

GeForce GTX 950      5313      SE +/- 16.67, N = 3
GeForce GTX 980 Ti   13802     SE +/- 44.35, N = 3
GeForce GTX 970      9737      SE +/- 24.85, N = 3
GeForce GTX 980      10713     SE +/- 1.20, N = 3
GeForce GTX 960      5474      SE +/- 0.88, N = 3
GeForce GTX TITAN X  14081     SE +/- 4.70, N = 3
GeForce GTX 780 Ti   9639      SE +/- 35.97, N = 3
GeForce GTX 680      4554      SE +/- 12.17, N = 3
GeForce GTX 750      3491      SE +/- 11.67, N = 3
GeForce GTX 760      4253      SE +/- 1.45, N = 3

Phoronix Test Suite v10.8.5