OpenCL CUDA NVIDIA GPGPU Linux Tests

All Maxwell and various Kepler graphics cards tested on the NVIDIA Linux driver. Benchmarks by Michael Larabel for a future article on Phoronix.com, delivering various GPGPU benchmarks for reference purposes.
HTML result view exported from: https://openbenchmarking.org/result/1511113-PTS-GPGPUNVI62
System configuration (shared by all tested cards; only the graphics card differs):

Processor: Intel Core i5-6600K @ 3.50GHz (4 Cores)
Motherboard: MSI Z170A GAMING PRO (MS-7984) v1.0
Chipset: Intel Device 191f
Memory: 16384MB
Disk: 256GB TS256GSSD370S
Audio: Intel Device a170
Network: Intel Device 15b8
OS: Ubuntu 14.04
Kernel: 3.19.0-33-generic (x86_64)
Desktop: Unity 7.2.5
Display Server: X Server 1.17.1
Display Driver: NVIDIA 352.39
OpenGL: 4.3.0
Compiler: GCC 4.8.4 + Clang 3.4-1ubuntu3 + CUDA 7.5
File-System: ext4
Screen Resolution: 3840x2160

Graphics cards tested (clocks as reported, core/memory; compute cores from the OpenCL Details):

| Configuration | Graphics | GPU Compute Cores |
| GeForce GTX 680 | NVIDIA GeForce GTX 680 2048MB (1006/3004MHz) | 1536 |
| GeForce GTX 750 | eVGA NVIDIA GeForce GTX 750 1024MB (1019/2505MHz) | 512 |
| GeForce GTX 760 | NVIDIA GeForce GTX 760 2048MB (980/3004MHz) | 1152 |
| GeForce GTX 780 Ti | NVIDIA GeForce GTX 780 Ti 3072MB (875/3500MHz) | 2880 |
| GeForce GTX 950 | eVGA NVIDIA GeForce GTX 950 2048MB (135/405MHz) | 768 |
| GeForce GTX 960 | eVGA NVIDIA GeForce GTX 960 2048MB (1277/3505MHz) | 1024 |
| GeForce GTX 970 | eVGA NVIDIA GeForce GTX 970 4096MB (1163/3505MHz) | 1664 |
| GeForce GTX 980 | NVIDIA GeForce GTX 980 4096MB (1126/3505MHz) | 2048 |
| GeForce GTX 980 Ti | NVIDIA GeForce GTX 980 Ti 6144MB (999/3505MHz) | 2816 |
| GeForce GTX TITAN X | NVIDIA GeForce GTX TITAN X 12288MB (1001/3505MHz) | 3072 |

Compiler Details - --build=x86_64-linux-gnu --disable-browser-plugin --disable-libmudflap --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details - Scaling Governor: acpi-cpufreq performance
Results overview (a "-" marks a card with no result for that test):

| Test | GTX 680 | GTX 750 | GTX 760 | GTX 780 Ti | GTX 950 | GTX 960 | GTX 970 | GTX 980 | GTX 980 Ti | GTX TITAN X |
| shoc: CUDA - FFT SP | - | 113.64 | - | - | 172.28 | 212.43 | 263.14 | 289.63 | 311.46 | 324.09 |
| shoc: CUDA - MD5 Hash | - | 1.08 | - | - | 2.36 | 3.38 | 4.79 | 5.70 | 6.81 | 7.42 |
| shoc: OpenCL - FFT SP | 74.97 | 54.69 | 78.44 | 126.71 | 63.22 | 62.78 | 117.23 | 140.12 | 170.36 | 173.89 |
| shoc: OpenCL - MD5 Hash | 1.91 | 1.07 | 1.40 | 3.78 | 2.34 | 3.36 | 4.77 | 5.68 | 6.79 | 7.41 |
| shoc: CUDA - Texture Read Bandwidth | - | 158.42 | - | - | 326.23 | 351.31 | 325.16 | 336.48 | 348.92 | 356.52 |
| shoc: OpenCL - Texture Read Bandwidth | 242.16 | 121.14 | 170.26 | 286.62 | 239.19 | 269.98 | 283.36 | 332.60 | 345.55 | 354.09 |
| askap: Gridding | - | - | - | - | 3399.14 | 3144.85 | 5325.12 | 6051.27 | 8320.50 | 8458.77 |
| askap: Degridding | - | - | - | - | 5706.07 | 5290.32 | 9509.14 | 11094 | 17380.60 | 17380.60 |
| cuda-mini-nbody: Original | - | 180.66 | - | 61.03 | 105.30 | 82.01 | 54.32 | 45.38 | 34.58 | 32.37 |
| cuda-mini-nbody: Cache Blocking | - | 98.19 | - | 29.99 | 49.89 | 37.08 | 28.53 | 25.13 | 19.77 | 18.65 |
| cuda-mini-nbody: Loop Unrolling | - | 89.34 | - | 27.05 | 47.54 | 35.35 | 26.42 | 23.88 | 18.46 | 17.59 |
| cuda-mini-nbody: SOA Data Layout | - | 199.95 | - | 54.39 | 108.50 | 79.97 | 55.87 | 50.15 | 40.94 | 37.43 |
| cuda-mini-nbody: Flush Denormals To Zero | - | 199.83 | - | 53.26 | 108.48 | 79.84 | 55.80 | 49.53 | 40.85 | 37.37 |
| juliagpu: GPU | 48074789.03 | 36136874.00 | 38310650.50 | 78839770.13 | 64913682.63 | 80042041.73 | 104144917.23 | 113830604.27 | 127978049.53 | 136037921.43 |
| mandelbulbgpu: GPU | 31636512.97 | 20060275.53 | 25392138.50 | 47400001.90 | 37156070.87 | 44953399.47 | 58811317.17 | 63616558.77 | 71656708.83 | 75614774.13 |
| luxmark: GPU - Hotel | 577 | - | 463 | 992 | 769 | 897 | 1346 | 1492 | 1855 | 1906 |
| luxmark: GPU - Microphone | 2127 | - | 1941 | 4302 | 2423 | 2460 | 4458 | 4776 | 6268 | 6360 |
| luxmark: GPU - Luxball HDR | 4554 | 3491 | 4253 | 9639 | 5313 | 5474 | 9737 | 10713 | 13802 | 14081 |
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: FFT SP
GFLOPS, More Is Better

| Graphics Card | GFLOPS | SE +/- | N |
| GeForce GTX 750 | 113.64 | 0.69 | 3 |
| GeForce GTX 950 | 172.28 | 0.47 | 3 |
| GeForce GTX 960 | 212.43 | 1.49 | 3 |
| GeForce GTX 970 | 263.14 | 2.44 | 3 |
| GeForce GTX 980 | 289.63 | 3.09 | 3 |
| GeForce GTX 980 Ti | 311.46 | 0.32 | 3 |
| GeForce GTX TITAN X | 324.09 | 1.19 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: MD5 Hash
GHash/s, More Is Better

| Graphics Card | GHash/s | SE +/- | N |
| GeForce GTX 750 | 1.08 | 0.00 | 3 |
| GeForce GTX 950 | 2.36 | 0.00 | 3 |
| GeForce GTX 960 | 3.38 | 0.00 | 3 |
| GeForce GTX 970 | 4.79 | 0.00 | 3 |
| GeForce GTX 980 | 5.70 | 0.00 | 3 |
| GeForce GTX 980 Ti | 6.81 | 0.00 | 3 |
| GeForce GTX TITAN X | 7.42 | 0.00 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: FFT SP
GFLOPS, More Is Better

| Graphics Card | GFLOPS | SE +/- | N |
| GeForce GTX 680 | 74.97 | 0.87 | 3 |
| GeForce GTX 750 | 54.69 | 0.08 | 3 |
| GeForce GTX 760 | 78.44 | 0.31 | 3 |
| GeForce GTX 780 Ti | 126.71 | 0.19 | 3 |
| GeForce GTX 950 | 63.22 | 0.08 | 3 |
| GeForce GTX 960 | 62.78 | 1.20 | 3 |
| GeForce GTX 970 | 117.23 | 0.52 | 3 |
| GeForce GTX 980 | 140.12 | 1.30 | 3 |
| GeForce GTX 980 Ti | 170.36 | 0.65 | 3 |
| GeForce GTX TITAN X | 173.89 | 0.19 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
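The SHOC FFT SP test was run under both CUDA and OpenCL on seven of the cards, so the two sets of GFLOPS figures can be compared card by card. The following is a minimal sketch of that comparison, not part of the exported result: the function name `cuda_over_opencl` and the dictionary layout are illustrative assumptions, and the numbers are copied from the CUDA and OpenCL FFT SP results reported in this file.

```python
# GFLOPS values copied from the SHOC FFT SP results in this file.
# The helper name and data layout are illustrative, not from the Phoronix Test Suite.
cuda_fft_sp = {
    "GeForce GTX 750": 113.64,
    "GeForce GTX 950": 172.28,
    "GeForce GTX 960": 212.43,
    "GeForce GTX 970": 263.14,
    "GeForce GTX 980": 289.63,
    "GeForce GTX 980 Ti": 311.46,
    "GeForce GTX TITAN X": 324.09,
}
opencl_fft_sp = {
    "GeForce GTX 680": 74.97,
    "GeForce GTX 750": 54.69,
    "GeForce GTX 760": 78.44,
    "GeForce GTX 780 Ti": 126.71,
    "GeForce GTX 950": 63.22,
    "GeForce GTX 960": 62.78,
    "GeForce GTX 970": 117.23,
    "GeForce GTX 980": 140.12,
    "GeForce GTX 980 Ti": 170.36,
    "GeForce GTX TITAN X": 173.89,
}


def cuda_over_opencl(cuda, opencl):
    """Return CUDA/OpenCL throughput ratios for cards present in both result sets."""
    return {card: cuda[card] / opencl[card] for card in cuda if card in opencl}


ratios = cuda_over_opencl(cuda_fft_sp, opencl_fft_sp)
for card, ratio in ratios.items():
    print(f"{card}: CUDA is {ratio:.2f}x the OpenCL FFT SP throughput")
```

Cards with only an OpenCL result (the GTX 680, 760 and 780 Ti) are skipped rather than reported with a placeholder ratio.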
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: MD5 Hash
GHash/s, More Is Better

| Graphics Card | GHash/s | SE +/- | N |
| GeForce GTX 680 | 1.91 | 0.00 | 3 |
| GeForce GTX 750 | 1.07 | 0.00 | 3 |
| GeForce GTX 760 | 1.40 | 0.00 | 3 |
| GeForce GTX 780 Ti | 3.78 | 0.00 | 3 |
| GeForce GTX 950 | 2.34 | 0.00 | 3 |
| GeForce GTX 960 | 3.36 | 0.00 | 3 |
| GeForce GTX 970 | 4.77 | 0.00 | 3 |
| GeForce GTX 980 | 5.68 | 0.00 | 3 |
| GeForce GTX 980 Ti | 6.79 | 0.00 | 3 |
| GeForce GTX TITAN X | 7.41 | 0.00 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

| Graphics Card | GB/s | SE +/- | N |
| GeForce GTX 750 | 158.42 | 0.42 | 3 |
| GeForce GTX 950 | 326.23 | 0.85 | 3 |
| GeForce GTX 960 | 351.31 | 0.14 | 3 |
| GeForce GTX 970 | 325.16 | 0.28 | 3 |
| GeForce GTX 980 | 336.48 | 1.15 | 3 |
| GeForce GTX 980 Ti | 348.92 | 1.22 | 3 |
| GeForce GTX TITAN X | 356.52 | 0.12 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

| Graphics Card | GB/s | SE +/- | N |
| GeForce GTX 680 | 242.16 | 1.02 | 3 |
| GeForce GTX 750 | 121.14 | 0.23 | 3 |
| GeForce GTX 760 | 170.26 | 0.28 | 3 |
| GeForce GTX 780 Ti | 286.62 | 0.02 | 3 |
| GeForce GTX 950 | 239.19 | 0.73 | 3 |
| GeForce GTX 960 | 269.98 | 0.56 | 3 |
| GeForce GTX 970 | 283.36 | 0.06 | 3 |
| GeForce GTX 980 | 332.60 | 0.20 | 3 |
| GeForce GTX 980 Ti | 345.55 | 0.21 | 3 |
| GeForce GTX TITAN X | 354.09 | 1.56 | 3 |

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
ASKAP tConvolveCuda 2015-11-10
Processing: Gridding
Million Grid Points Per Second, More Is Better

| Graphics Card | Million Grid Points Per Second | SE +/- | N |
| GeForce GTX 950 | 3399.14 | 14.40 | 3 |
| GeForce GTX 960 | 3144.85 | 12.43 | 3 |
| GeForce GTX 970 | 5325.12 | 0.00 | 3 |
| GeForce GTX 980 | 6051.27 | 0.00 | 3 |
| GeForce GTX 980 Ti | 8320.50 | 0.00 | 3 |
| GeForce GTX TITAN X | 8458.77 | 130.14 | 4 |

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ASKAP tConvolveCuda 2015-11-10
Processing: Degridding
Million Grid Points Per Second, More Is Better

| Graphics Card | Million Grid Points Per Second | SE +/- | N |
| GeForce GTX 950 | 5706.07 | 41.05 | 3 |
| GeForce GTX 960 | 5290.32 | 34.80 | 3 |
| GeForce GTX 970 | 9509.14 | 0.00 | 3 |
| GeForce GTX 980 | 11094.00 | 0.00 | 3 |
| GeForce GTX 980 Ti | 17380.60 | 369.80 | 3 |
| GeForce GTX TITAN X | 17380.60 | 369.80 | 3 |

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
CUDA Mini-Nbody 2015-11-10
Test: Original
Seconds, Fewer Is Better

| Graphics Card | Seconds | SE +/- | N |
| GeForce GTX 750 | 180.66 | 0.05 | 3 |
| GeForce GTX 780 Ti | 61.03 | 0.50 | 3 |
| GeForce GTX 950 | 105.30 | 0.21 | 3 |
| GeForce GTX 960 | 82.01 | 0.43 | 3 |
| GeForce GTX 970 | 54.32 | 0.13 | 3 |
| GeForce GTX 980 | 45.38 | 0.10 | 3 |
| GeForce GTX 980 Ti | 34.58 | 0.57 | 3 |
| GeForce GTX TITAN X | 32.37 | 0.35 | 3 |
CUDA Mini-Nbody 2015-11-10
Test: Cache Blocking
Seconds, Fewer Is Better

| Graphics Card | Seconds | SE +/- | N |
| GeForce GTX 750 | 98.19 | 0.00 | 3 |
| GeForce GTX 780 Ti | 29.99 | 0.27 | 3 |
| GeForce GTX 950 | 49.89 | 0.02 | 3 |
| GeForce GTX 960 | 37.08 | 0.01 | 3 |
| GeForce GTX 970 | 28.53 | 0.01 | 3 |
| GeForce GTX 980 | 25.13 | 0.06 | 3 |
| GeForce GTX 980 Ti | 19.77 | 0.21 | 3 |
| GeForce GTX TITAN X | 18.65 | 0.10 | 3 |
CUDA Mini-Nbody 2015-11-10
Test: Loop Unrolling
Seconds, Fewer Is Better

| Graphics Card | Seconds | SE +/- | N |
| GeForce GTX 750 | 89.34 | 0.04 | 3 |
| GeForce GTX 780 Ti | 27.05 | 0.05 | 3 |
| GeForce GTX 950 | 47.54 | 0.03 | 3 |
| GeForce GTX 960 | 35.35 | 0.03 | 3 |
| GeForce GTX 970 | 26.42 | 0.02 | 3 |
| GeForce GTX 980 | 23.88 | 0.21 | 3 |
| GeForce GTX 980 Ti | 18.46 | 0.15 | 3 |
| GeForce GTX TITAN X | 17.59 | 0.25 | 3 |
CUDA Mini-Nbody 2015-11-10
Test: SOA Data Layout
Seconds, Fewer Is Better

| Graphics Card | Seconds | SE +/- | N |
| GeForce GTX 750 | 199.95 | 0.04 | 3 |
| GeForce GTX 780 Ti | 54.39 | 0.16 | 3 |
| GeForce GTX 950 | 108.50 | 0.02 | 3 |
| GeForce GTX 960 | 79.97 | 0.08 | 3 |
| GeForce GTX 970 | 55.87 | 0.05 | 3 |
| GeForce GTX 980 | 50.15 | 0.21 | 3 |
| GeForce GTX 980 Ti | 40.94 | 0.11 | 3 |
| GeForce GTX TITAN X | 37.43 | 0.20 | 3 |
CUDA Mini-Nbody 2015-11-10
Test: Flush Denormals To Zero
Seconds, Fewer Is Better

| Graphics Card | Seconds | SE +/- | N |
| GeForce GTX 750 | 199.83 | 0.03 | 3 |
| GeForce GTX 780 Ti | 53.26 | 0.10 | 3 |
| GeForce GTX 950 | 108.48 | 0.02 | 3 |
| GeForce GTX 960 | 79.84 | 0.01 | 3 |
| GeForce GTX 970 | 55.80 | 0.07 | 3 |
| GeForce GTX 980 | 49.53 | 0.18 | 3 |
| GeForce GTX 980 Ti | 40.85 | 0.10 | 3 |
| GeForce GTX TITAN X | 37.37 | 0.08 | 3 |
JuliaGPU 1.2pts1
OpenCL Device: GPU
Samples/sec, More Is Better

| Graphics Card | Samples/sec | SE +/- | N |
| GeForce GTX 680 | 48074789.03 | 59682.63 | 3 |
| GeForce GTX 750 | 36136874.00 | 22546.70 | 3 |
| GeForce GTX 760 | 38310650.50 | 14125.16 | 3 |
| GeForce GTX 780 Ti | 78839770.13 | 293396.06 | 3 |
| GeForce GTX 950 | 64913682.63 | 58084.93 | 3 |
| GeForce GTX 960 | 80042041.73 | 157475.07 | 3 |
| GeForce GTX 970 | 104144917.23 | 84325.23 | 3 |
| GeForce GTX 980 | 113830604.27 | 218639.12 | 3 |
| GeForce GTX 980 Ti | 127978049.53 | 473156.02 | 3 |
| GeForce GTX TITAN X | 136037921.43 | 318277.32 | 3 |

1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm
MandelbulbGPU 1.0pts1
OpenCL Device: GPU
Samples/sec, More Is Better

| Graphics Card | Samples/sec | SE +/- | N |
| GeForce GTX 680 | 31636512.97 | 36731.70 | 3 |
| GeForce GTX 750 | 20060275.53 | 9818.73 | 3 |
| GeForce GTX 760 | 25392138.50 | 28089.31 | 3 |
| GeForce GTX 780 Ti | 47400001.90 | 48150.35 | 3 |
| GeForce GTX 950 | 37156070.87 | 29855.85 | 3 |
| GeForce GTX 960 | 44953399.47 | 75512.83 | 3 |
| GeForce GTX 970 | 58811317.17 | 91420.68 | 3 |
| GeForce GTX 980 | 63616558.77 | 140370.89 | 3 |
| GeForce GTX 980 Ti | 71656708.83 | 168304.91 | 3 |
| GeForce GTX TITAN X | 75614774.13 | 166919.37 | 3 |

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
LuxMark 3.0
OpenCL Device: GPU - Scene: Hotel
Score, More Is Better

| Graphics Card | Score | SE +/- | N |
| GeForce GTX 680 | 577 | 2.00 | 3 |
| GeForce GTX 760 | 463 | 0.33 | 3 |
| GeForce GTX 780 Ti | 992 | 0.00 | 3 |
| GeForce GTX 950 | 769 | 0.00 | 3 |
| GeForce GTX 960 | 897 | 0.67 | 3 |
| GeForce GTX 970 | 1346 | 0.00 | 3 |
| GeForce GTX 980 | 1492 | 1.20 | 3 |
| GeForce GTX 980 Ti | 1855 | 0.33 | 3 |
| GeForce GTX TITAN X | 1906 | 0.33 | 3 |
LuxMark 3.0
OpenCL Device: GPU - Scene: Microphone
Score, More Is Better

| Graphics Card | Score | SE +/- | N |
| GeForce GTX 680 | 2127 | 3.06 | 3 |
| GeForce GTX 760 | 1941 | 0.67 | 3 |
| GeForce GTX 780 Ti | 4302 | 12.00 | 3 |
| GeForce GTX 950 | 2423 | 4.26 | 3 |
| GeForce GTX 960 | 2460 | 1.15 | 3 |
| GeForce GTX 970 | 4458 | 7.64 | 3 |
| GeForce GTX 980 | 4776 | 0.67 | 3 |
| GeForce GTX 980 Ti | 6268 | 18.50 | 3 |
| GeForce GTX TITAN X | 6360 | 3.00 | 3 |
LuxMark 3.0
OpenCL Device: GPU - Scene: Luxball HDR
Score, More Is Better

| Graphics Card | Score | SE +/- | N |
| GeForce GTX 680 | 4554 | 12.17 | 3 |
| GeForce GTX 750 | 3491 | 11.67 | 3 |
| GeForce GTX 760 | 4253 | 1.45 | 3 |
| GeForce GTX 780 Ti | 9639 | 35.97 | 3 |
| GeForce GTX 950 | 5313 | 16.67 | 3 |
| GeForce GTX 960 | 5474 | 0.88 | 3 |
| GeForce GTX 970 | 9737 | 24.85 | 3 |
| GeForce GTX 980 | 10713 | 1.20 | 3 |
| GeForce GTX 980 Ti | 13802 | 44.35 | 3 |
| GeForce GTX TITAN X | 14081 | 4.70 | 3 |
Phoronix Test Suite v10.8.5