OpenCL CUDA NVIDIA GPGPU Linux Tests

All Maxwell and various Kepler graphics cards tested on the NVIDIA Linux driver. Benchmarks by Michael Larabel for a future article on Phoronix.com, delivering various GPGPU benchmarks for reference purposes.
HTML result view exported from: https://openbenchmarking.org/result/1511113-PTS-GPGPUNVI62&grw .
Test system (common to all runs)

Processor:          Intel Core i5-6600K @ 3.50GHz (4 Cores)
Motherboard:        MSI Z170A GAMING PRO (MS-7984) v1.0
Chipset:            Intel Device 191f
Memory:             16384MB
Disk:               256GB TS256GSSD370S
Audio:              Intel Device a170
Network:            Intel Device 15b8
OS:                 Ubuntu 14.04
Kernel:             3.19.0-33-generic (x86_64)
Desktop:            Unity 7.2.5
Display Server:     X Server 1.17.1
Display Driver:     NVIDIA 352.39
OpenGL:             4.3.0
Compiler:           GCC 4.8.4 + Clang 3.4-1ubuntu3 + CUDA 7.5
File-System:        ext4
Screen Resolution:  3840x2160

Graphics (per run)

GeForce GTX 680:      NVIDIA GeForce GTX 680 2048MB (1006/3004MHz)
GeForce GTX 750:      eVGA NVIDIA GeForce GTX 750 1024MB (1019/2505MHz)
GeForce GTX 760:      NVIDIA GeForce GTX 760 2048MB (980/3004MHz)
GeForce GTX 780 Ti:   NVIDIA GeForce GTX 780 Ti 3072MB (875/3500MHz)
GeForce GTX 950:      eVGA NVIDIA GeForce GTX 950 2048MB (135/405MHz)
GeForce GTX 960:      eVGA NVIDIA GeForce GTX 960 2048MB (1277/3505MHz)
GeForce GTX 970:      eVGA NVIDIA GeForce GTX 970 4096MB (1163/3505MHz)
GeForce GTX 980:      NVIDIA GeForce GTX 980 4096MB (1126/3505MHz)
GeForce GTX 980 Ti:   NVIDIA GeForce GTX 980 Ti 6144MB (999/3505MHz)
GeForce GTX TITAN X:  NVIDIA GeForce GTX TITAN X 12288MB (1001/3505MHz)

Compiler Details
- --build=x86_64-linux-gnu --disable-browser-plugin --disable-libmudflap --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details
- Scaling Governor: acpi-cpufreq performance

OpenCL Details
- GeForce GTX 680: GPU Compute Cores: 1536
- GeForce GTX 750: GPU Compute Cores: 512
- GeForce GTX 760: GPU Compute Cores: 1152
- GeForce GTX 780 Ti: GPU Compute Cores: 2880
- GeForce GTX 950: GPU Compute Cores: 768
- GeForce GTX 960: GPU Compute Cores: 1024
- GeForce GTX 970: GPU Compute Cores: 1664
- GeForce GTX 980: GPU Compute Cores: 2048
- GeForce GTX 980 Ti: GPU Compute Cores: 2816
- GeForce GTX TITAN X: GPU Compute Cores: 3072
Result overview (columns are GeForce cards; - = no result recorded for that card)

Test                                            GTX 680       GTX 750       GTX 760    GTX 780 Ti       GTX 950       GTX 960       GTX 970       GTX 980    GTX 980 Ti   GTX TITAN X
cuda-mini-nbody: Cache Blocking                       -         98.19             -         29.99         49.89         37.08         28.53         25.13         19.77         18.65
shoc: CUDA - FFT SP                                   -        113.64             -             -        172.28        212.43        263.14        289.63        311.46        324.09
cuda-mini-nbody: SOA Data Layout                      -        199.95             -         54.39        108.50         79.97         55.87         50.15         40.94         37.43
cuda-mini-nbody: Flush Denormals To Zero              -        199.83             -         53.26        108.48         79.84         55.80         49.53         40.85         37.37
shoc: CUDA - MD5 Hash                                 -          1.08             -             -          2.36          3.38          4.79          5.70          6.81          7.42
cuda-mini-nbody: Loop Unrolling                       -         89.34             -         27.05         47.54         35.35         26.42         23.88         18.46         17.59
shoc: OpenCL - FFT SP                             74.97         54.69         78.44        126.71         63.22         62.78        117.23        140.12        170.36        173.89
cuda-mini-nbody: Original                             -        180.66             -         61.03        105.30         82.01         54.32         45.38         34.58         32.37
shoc: OpenCL - MD5 Hash                            1.91          1.07          1.40          3.78          2.34          3.36          4.77          5.68          6.79          7.41
shoc: CUDA - Texture Read Bandwidth                   -        158.42             -             -        326.23        351.31        325.16        336.48        348.92        356.52
shoc: OpenCL - Texture Read Bandwidth            242.16        121.14        170.26        286.62        239.19        269.98        283.36        332.60        345.55        354.09
askap: Gridding                                       -             -             -             -       3399.14       3144.85       5325.12       6051.27       8320.50       8458.77
askap: Degridding                                     -             -             -             -       5706.07       5290.32       9509.14         11094      17380.60      17380.60
juliagpu: GPU                               48074789.03   36136874.00   38310650.50   78839770.13   64913682.63   80042041.73  104144917.23  113830604.27  127978049.53  136037921.43
luxmark: GPU - Hotel                                577             -           463           992           769           897          1346          1492          1855          1906
luxmark: GPU - Microphone                          2127             -          1941          4302          2423          2460          4458          4776          6268          6360
luxmark: GPU - Luxball HDR                         4554          3491          4253          9639          5313          5474          9737         10713         13802         14081
mandelbulbgpu: GPU                          31636512.97   20060275.53   25392138.50   47400001.90   37156070.87   44953399.47   58811317.17   63616558.77   71656708.83   75614774.13
CUDA Mini-Nbody 2015-11-10, Test: Cache Blocking
Seconds, Fewer Is Better

GPU                     Seconds    SE +/-    N
GeForce GTX 750           98.19      0.00    3
GeForce GTX 780 Ti        29.99      0.27    3
GeForce GTX 950           49.89      0.02    3
GeForce GTX 960           37.08      0.01    3
GeForce GTX 970           28.53      0.01    3
GeForce GTX 980           25.13      0.06    3
GeForce GTX 980 Ti        19.77      0.21    3
GeForce GTX TITAN X       18.65      0.10    3
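Every result in this file is reported with a standard error ("SE +/-") over N runs. As a rough illustration of how such a figure is conventionally obtained, the sketch below divides the sample standard deviation by the square root of N. The run times are hypothetical values chosen for the example, not data from this result file, and whether the Phoronix Test Suite uses exactly this estimator is an assumption.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds; NOT taken from the result file.
runs = [29.5, 30.0, 30.5]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} s, SE +/- {se:.2f}, N = {len(runs)}")
```

A small SE relative to the mean, as seen in most charts here, suggests the three runs were closely grouped.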
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: CUDA - Benchmark: FFT SP
GFLOPS, More Is Better

GPU                      GFLOPS    SE +/-    N
GeForce GTX 750          113.64      0.69    3
GeForce GTX 950          172.28      0.47    3
GeForce GTX 960          212.43      1.49    3
GeForce GTX 970          263.14      2.44    3
GeForce GTX 980          289.63      3.09    3
GeForce GTX 980 Ti       311.46      0.32    3
GeForce GTX TITAN X      324.09      1.19    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
CUDA Mini-Nbody 2015-11-10, Test: SOA Data Layout
Seconds, Fewer Is Better

GPU                     Seconds    SE +/-    N
GeForce GTX 750          199.95      0.04    3
GeForce GTX 780 Ti        54.39      0.16    3
GeForce GTX 950          108.50      0.02    3
GeForce GTX 960           79.97      0.08    3
GeForce GTX 970           55.87      0.05    3
GeForce GTX 980           50.15      0.21    3
GeForce GTX 980 Ti        40.94      0.11    3
GeForce GTX TITAN X       37.43      0.20    3
CUDA Mini-Nbody 2015-11-10, Test: Flush Denormals To Zero
Seconds, Fewer Is Better

GPU                     Seconds    SE +/-    N
GeForce GTX 750          199.83      0.03    3
GeForce GTX 780 Ti        53.26      0.10    3
GeForce GTX 950          108.48      0.02    3
GeForce GTX 960           79.84      0.01    3
GeForce GTX 970           55.80      0.07    3
GeForce GTX 980           49.53      0.18    3
GeForce GTX 980 Ti        40.85      0.10    3
GeForce GTX TITAN X       37.37      0.08    3
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: CUDA - Benchmark: MD5 Hash
GHash/s, More Is Better

GPU                     GHash/s    SE +/-    N
GeForce GTX 750            1.08      0.00    3
GeForce GTX 950            2.36      0.00    3
GeForce GTX 960            3.38      0.00    3
GeForce GTX 970            4.79      0.00    3
GeForce GTX 980            5.70      0.00    3
GeForce GTX 980 Ti         6.81      0.00    3
GeForce GTX TITAN X        7.42      0.00    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
CUDA Mini-Nbody 2015-11-10, Test: Loop Unrolling
Seconds, Fewer Is Better

GPU                     Seconds    SE +/-    N
GeForce GTX 750           89.34      0.04    3
GeForce GTX 780 Ti        27.05      0.05    3
GeForce GTX 950           47.54      0.03    3
GeForce GTX 960           35.35      0.03    3
GeForce GTX 970           26.42      0.02    3
GeForce GTX 980           23.88      0.21    3
GeForce GTX 980 Ti        18.46      0.15    3
GeForce GTX TITAN X       17.59      0.25    3
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: FFT SP
GFLOPS, More Is Better

GPU                      GFLOPS    SE +/-    N
GeForce GTX 680           74.97      0.87    3
GeForce GTX 750           54.69      0.08    3
GeForce GTX 760           78.44      0.31    3
GeForce GTX 780 Ti       126.71      0.19    3
GeForce GTX 950           63.22      0.08    3
GeForce GTX 960           62.78      1.20    3
GeForce GTX 970          117.23      0.52    3
GeForce GTX 980          140.12      1.30    3
GeForce GTX 980 Ti       170.36      0.65    3
GeForce GTX TITAN X      173.89      0.19    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
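For cards that ran both SHOC targets, the CUDA and OpenCL FFT SP figures can be set side by side. The sketch below computes the CUDA-to-OpenCL GFLOPS ratio per card from the numbers reported in this file. The dictionary and function names are illustrative only, and the ratio describes these particular runs without saying anything about why the two targets differ.

```python
# SHOC FFT SP results in GFLOPS, copied from the CUDA and OpenCL charts
# in this result file.
cuda_fft = {
    "GTX 750": 113.64, "GTX 950": 172.28, "GTX 960": 212.43,
    "GTX 970": 263.14, "GTX 980": 289.63, "GTX 980 Ti": 311.46,
    "GTX TITAN X": 324.09,
}
opencl_fft = {
    "GTX 680": 74.97, "GTX 750": 54.69, "GTX 760": 78.44,
    "GTX 780 Ti": 126.71, "GTX 950": 63.22, "GTX 960": 62.78,
    "GTX 970": 117.23, "GTX 980": 140.12, "GTX 980 Ti": 170.36,
    "GTX TITAN X": 173.89,
}


def cuda_to_opencl_ratio(cuda, opencl):
    """Ratio of CUDA to OpenCL throughput for cards present in both sets."""
    return {gpu: cuda[gpu] / opencl[gpu] for gpu in cuda if gpu in opencl}


ratios = cuda_to_opencl_ratio(cuda_fft, opencl_fft)
for gpu, ratio in ratios.items():
    print(f"{gpu:<12} {ratio:.2f}x")
```

Cards without a CUDA result (GTX 680, GTX 760, GTX 780 Ti) are skipped instead of being reported with a placeholder ratio.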
CUDA Mini-Nbody 2015-11-10, Test: Original
Seconds, Fewer Is Better

GPU                     Seconds    SE +/-    N
GeForce GTX 750          180.66      0.05    3
GeForce GTX 780 Ti        61.03      0.50    3
GeForce GTX 950          105.30      0.21    3
GeForce GTX 960           82.01      0.43    3
GeForce GTX 970           54.32      0.13    3
GeForce GTX 980           45.38      0.10    3
GeForce GTX 980 Ti        34.58      0.57    3
GeForce GTX TITAN X       32.37      0.35    3
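The five CUDA Mini-Nbody variants can be compared against the Original test by dividing the Original run time by each variant's run time, so that values above 1.0 mean the variant finished faster. The sketch below does this for two of the tested cards using the times reported in this file. The structure and names are illustrative, and only two cards are included to keep the example short.

```python
# Run times in seconds, copied from the CUDA Mini-Nbody charts in this file.
nbody_seconds = {
    "GeForce GTX 750": {
        "Original": 180.66,
        "SOA Data Layout": 199.95,
        "Flush Denormals To Zero": 199.83,
        "Cache Blocking": 98.19,
        "Loop Unrolling": 89.34,
    },
    "GeForce GTX TITAN X": {
        "Original": 32.37,
        "SOA Data Layout": 37.43,
        "Flush Denormals To Zero": 37.37,
        "Cache Blocking": 18.65,
        "Loop Unrolling": 17.59,
    },
}


def speedups_over_original(times):
    """Original time divided by each variant's time (above 1.0 = faster)."""
    base = times["Original"]
    return {name: base / t for name, t in times.items() if name != "Original"}


speedups = {gpu: speedups_over_original(t) for gpu, t in nbody_seconds.items()}
for gpu, per_variant in speedups.items():
    for name, factor in per_variant.items():
        print(f"{gpu:<20} {name:<24} {factor:.2f}x")
```

On both cards the SOA Data Layout and Flush Denormals To Zero runs took longer than Original, while Cache Blocking and Loop Unrolling took less time.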
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: MD5 Hash
GHash/s, More Is Better

GPU                     GHash/s    SE +/-    N
GeForce GTX 680            1.91      0.00    3
GeForce GTX 750            1.07      0.00    3
GeForce GTX 760            1.40      0.00    3
GeForce GTX 780 Ti         3.78      0.00    3
GeForce GTX 950            2.34      0.00    3
GeForce GTX 960            3.36      0.00    3
GeForce GTX 970            4.77      0.00    3
GeForce GTX 980            5.68      0.00    3
GeForce GTX 980 Ti         6.79      0.00    3
GeForce GTX TITAN X        7.41      0.00    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

GPU                        GB/s    SE +/-    N
GeForce GTX 750          158.42      0.42    3
GeForce GTX 950          326.23      0.85    3
GeForce GTX 960          351.31      0.14    3
GeForce GTX 970          325.16      0.28    3
GeForce GTX 980          336.48      1.15    3
GeForce GTX 980 Ti       348.92      1.22    3
GeForce GTX TITAN X      356.52      0.12    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

GPU                        GB/s    SE +/-    N
GeForce GTX 680          242.16      1.02    3
GeForce GTX 750          121.14      0.23    3
GeForce GTX 760          170.26      0.28    3
GeForce GTX 780 Ti       286.62      0.02    3
GeForce GTX 950          239.19      0.73    3
GeForce GTX 960          269.98      0.56    3
GeForce GTX 970          283.36      0.06    3
GeForce GTX 980          332.60      0.20    3
GeForce GTX 980 Ti       345.55      0.21    3
GeForce GTX TITAN X      354.09      1.56    3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
ASKAP tConvolveCuda 2015-11-10, Processing: Gridding
Million Grid Points Per Second, More Is Better

GPU                    Mpoints/s    SE +/-    N
GeForce GTX 950          3399.14     14.40    3
GeForce GTX 960          3144.85     12.43    3
GeForce GTX 970          5325.12      0.00    3
GeForce GTX 980          6051.27      0.00    3
GeForce GTX 980 Ti       8320.50      0.00    3
GeForce GTX TITAN X      8458.77    130.14    4

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ASKAP tConvolveCuda 2015-11-10, Processing: Degridding
Million Grid Points Per Second, More Is Better

GPU                    Mpoints/s    SE +/-    N
GeForce GTX 950          5706.07     41.05    3
GeForce GTX 960          5290.32     34.80    3
GeForce GTX 970          9509.14      0.00    3
GeForce GTX 980         11094.00      0.00    3
GeForce GTX 980 Ti      17380.60    369.80    3
GeForce GTX TITAN X     17380.60    369.80    3

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
JuliaGPU 1.2pts1, OpenCL Device: GPU
Samples/sec, More Is Better

GPU                      Samples/sec       SE +/-    N
GeForce GTX 680          48074789.03     59682.63    3
GeForce GTX 750          36136874.00     22546.70    3
GeForce GTX 760          38310650.50     14125.16    3
GeForce GTX 780 Ti       78839770.13    293396.06    3
GeForce GTX 950          64913682.63     58084.93    3
GeForce GTX 960          80042041.73    157475.07    3
GeForce GTX 970         104144917.23     84325.23    3
GeForce GTX 980         113830604.27    218639.12    3
GeForce GTX 980 Ti      127978049.53    473156.02    3
GeForce GTX TITAN X     136037921.43    318277.32    3

1. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm
LuxMark 3.0, OpenCL Device: GPU - Scene: Hotel
Score, More Is Better

GPU                       Score    SE +/-    N
GeForce GTX 680             577      2.00    3
GeForce GTX 760             463      0.33    3
GeForce GTX 780 Ti          992      0.00    3
GeForce GTX 950             769      0.00    3
GeForce GTX 960             897      0.67    3
GeForce GTX 970            1346      0.00    3
GeForce GTX 980            1492      1.20    3
GeForce GTX 980 Ti         1855      0.33    3
GeForce GTX TITAN X        1906      0.33    3
LuxMark 3.0, OpenCL Device: GPU - Scene: Microphone
Score, More Is Better

GPU                       Score    SE +/-    N
GeForce GTX 680            2127      3.06    3
GeForce GTX 760            1941      0.67    3
GeForce GTX 780 Ti         4302     12.00    3
GeForce GTX 950            2423      4.26    3
GeForce GTX 960            2460      1.15    3
GeForce GTX 970            4458      7.64    3
GeForce GTX 980            4776      0.67    3
GeForce GTX 980 Ti         6268     18.50    3
GeForce GTX TITAN X        6360      3.00    3
LuxMark 3.0, OpenCL Device: GPU - Scene: Luxball HDR
Score, More Is Better

GPU                       Score    SE +/-    N
GeForce GTX 680            4554     12.17    3
GeForce GTX 750            3491     11.67    3
GeForce GTX 760            4253      1.45    3
GeForce GTX 780 Ti         9639     35.97    3
GeForce GTX 950            5313     16.67    3
GeForce GTX 960            5474      0.88    3
GeForce GTX 970            9737     24.85    3
GeForce GTX 980           10713      1.20    3
GeForce GTX 980 Ti        13802     44.35    3
GeForce GTX TITAN X       14081      4.70    3
MandelbulbGPU 1.0pts1, OpenCL Device: GPU
Samples/sec, More Is Better

GPU                      Samples/sec       SE +/-    N
GeForce GTX 680          31636512.97     36731.70    3
GeForce GTX 750          20060275.53      9818.73    3
GeForce GTX 760          25392138.50     28089.31    3
GeForce GTX 780 Ti       47400001.90     48150.35    3
GeForce GTX 950          37156070.87     29855.85    3
GeForce GTX 960          44953399.47     75512.83    3
GeForce GTX 970          58811317.17     91420.68    3
GeForce GTX 980          63616558.77    140370.89    3
GeForce GTX 980 Ti       71656708.83    168304.91    3
GeForce GTX TITAN X      75614774.13    166919.37    3

1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL
Phoronix Test Suite v10.8.5