NVIDIA CUDA Linux 2016 compute benchmarks by Michael Larabel for a future article.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 1702015-TA-1612261TA53
HTML result view exported from: https://openbenchmarking.org/result/1702015-TA-1612261TA53&export=pdf&sro&grs .
CUDA 2016 NVIDIA Linux Ubuntu

System configuration shared by the GeForce GTX 650 through GeForce GTX 1080 runs:
  Processor:          Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
  Motherboard:        MSI C236A WORKSTATION (MS-7998) v1.0
  Chipset:            Intel Sky Lake
  Memory:             16384MB
  Disk:               256GB TOSHIBA-RD400
  Audio:              Realtek ALC1150
  Network:            Intel Connection
  OS:                 Ubuntu 16.04
  Kernel:             4.4.0-57-generic (x86_64)
  Desktop:            Unity 7.4.0
  Display Server:     X Server 1.18.4
  Display Driver:     NVIDIA 375.26
  OpenGL:             4.5.0
  Vulkan:             1.0.24
  Compiler:           GCC 5.4.0 20160609 + CUDA 8.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Graphics, per run:
  GeForce GTX 650:      MSI NVIDIA GeForce GTX 650 1024MB (1084/2500MHz)
  GeForce GTX 680:      NVIDIA GeForce GTX 680 2048MB (1006/3004MHz)
  GeForce GTX 750:      eVGA NVIDIA GeForce GTX 750 1024MB (1019/2505MHz)
  GeForce GTX 760:      NVIDIA GeForce GTX 760 2048MB (1124/3004MHz)
  GeForce GTX 780 Ti:   NVIDIA GeForce GTX 780 Ti 3072MB (875/3500MHz)
  GeForce GTX 950:      eVGA NVIDIA GeForce GTX 950 2048MB (1202/3304MHz)
  GeForce GTX 960:      eVGA NVIDIA GeForce GTX 960 2048MB (1277/3505MHz)
  GeForce GTX 970:      eVGA NVIDIA GeForce GTX 970 4096MB (1163/3505MHz)
  GeForce GTX 980:      NVIDIA GeForce GTX 980 4096MB (1126/3505MHz)
  GeForce GTX 980 Ti:   NVIDIA GeForce GTX 980 Ti 6144MB (999/3505MHz)
  GeForce GTX 1050:     Zotac NVIDIA GeForce GTX 1050 2048MB (1681/3504MHz)
  GeForce GTX 1050 Ti:  eVGA NVIDIA GeForce GTX 1050 Ti 4096MB (1341/3504MHz)
  GeForce GTX 1060:     NVIDIA GeForce GTX 1060 6GB 6144MB (557/4006MHz)
  GeForce GTX 1070:     NVIDIA GeForce GTX 1070 8192MB (1069/4006MHz)
  GeForce GTX 1080:     NVIDIA GeForce GTX 1080 8192MB (1538/5005MHz)

ubu_deep (fields that differ from the configuration above):
  Processor:          2 x Intel 0000 @ 3.00GHz (48 Cores)
  Motherboard:        Supermicro X10DRi-LN4+ v1.01
  Chipset:            Intel Xeon E7 v4/Xeon
  Memory:             64512MB
  Disk:               1000GB My Passport 0820
  Audio:              NVIDIA Device 10f0
  Network:            Intel I350 Gigabit Connection
  Kernel:             4.4.0-59-generic (x86_64)
  Vulkan:             1.0.8
  Screen Resolution:  1024x768

Compiler Details
  --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror
  --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo
  --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++
  --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch
  --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared
  --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64
  --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new
  --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details
  Scaling Governor: intel_pstate powersave

OpenCL Details and System Details (GPU Compute Cores)
  GeForce GTX 650:      384
  GeForce GTX 680:      1536
  GeForce GTX 750:      512
  GeForce GTX 760:      1152
  GeForce GTX 780 Ti:   2880
  GeForce GTX 950:      768
  GeForce GTX 960:      1024
  GeForce GTX 970:      1664
  GeForce GTX 980:      2048
  GeForce GTX 980 Ti:   2816
  GeForce GTX 1050:     640
  GeForce GTX 1050 Ti:  768
  GeForce GTX 1060:     1280
  GeForce GTX 1070:     1920
  GeForce GTX 1080:     2560
CUDA 2016 NVIDIA Linux Ubuntu: result overview

Columns:
  MD5        shoc: CUDA - MD5 Hash (GHash/s, more is better)
  MaxSP      shoc: CUDA - Max SP Flops (GFLOPS, more is better)
  Googlenet  caffe: CUDA Googlenet (milliseconds, fewer is better)
  AlexNet    caffe: CUDA AlexNet (milliseconds, fewer is better)
  N-body     cuda-mini-nbody: Original (seconds, fewer is better)
  FFT SP     shoc: CUDA - FFT SP (GFLOPS, more is better)
  TexBW      shoc: CUDA - Texture Read Bandwidth (GB/s, more is better)
  Degrid     askap: Degridding (million grid points per second, more is better)
  Grid       askap: Gridding (million grid points per second, more is better)

A "-" marks a test with no result for that system.

System               MD5    MaxSP    Googlenet  AlexNet   N-body  FFT SP  TexBW   Degrid    Grid
GeForce GTX 650      -      -        -          -         -       -       -       -         -
GeForce GTX 680      -      -        133624     52573.77  -       -       -       -         -
GeForce GTX 750      1.28   1160.99  -          -         182.08  116.13  160.58  -         -
GeForce GTX 760      -      -        164009     66411.83  -       -       -       -         -
GeForce GTX 780 Ti   -      -        65105.93   28595.10  61.44   -       -       -         -
GeForce GTX 950      2.69   2210.77  68771.30   30595.47  104.15  178.56  364.33  5625.68   3399.14
GeForce GTX 960      3.83   2941.88  59805.47   27360.73  82.70   194.10  379.43  5325.12   3132.42
GeForce GTX 970      5.44   4320.28  40125.47   17005.67  52.51   266.73  349.90  9399.84   5255.51
GeForce GTX 980      6.47   5002.78  35955.47   14977.20  46.60   292.36  335.27  10798.13  6006.45
GeForce GTX 980 Ti   7.73   6145.80  31440.40   11722.10  36.06   308.49  349.09  17010.80  8320.50
GeForce GTX 1050     2.49   2109.11  69616.53   30845.03  115.24  171.27  433.21  5873.92   3698
GeForce GTX 1050 Ti  3.03   2688.23  60253.57   26985.90  101.85  199.71  453.15  5961.62   3715.36
GeForce GTX 1060     5.64   4765.98  37468.77   16266.23  58.42   304.62  503.11  9861.33   5625.68
GeForce GTX 1070     8.40   7096.44  27658.17   11451.90  39.70   377.27  501.08  13312.80  7607.31
GeForce GTX 1080     11.90  9385.11  24039.57   9738.65   33.06   462.60  526.21  14273.00  8236.45
ubu_deep             -      -        25701.77   9734.94   -       -       -       -         -
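The overview mixes "more is better" and "fewer is better" metrics, so comparing cards across tests is easier after normalizing to a baseline. The following is a minimal Python sketch of that arithmetic, using three cards and two tests copied from the results above. The helper name `relative` and the choice of the GeForce GTX 950 as baseline are illustrative assumptions, not part of the Phoronix Test Suite.

```python
# Values copied from the result overview (Max SP Flops and Caffe AlexNet).
max_sp_gflops = {"GTX 950": 2210.77, "GTX 1060": 4765.98, "GTX 1080": 9385.11}  # more is better
alexnet_ms = {"GTX 950": 30595.47, "GTX 1060": 16266.23, "GTX 1080": 9738.65}   # fewer is better


def relative(results, baseline, higher_is_better):
    """Express each result as a multiple of the baseline's performance.

    Hypothetical helper for illustration: for "fewer is better" metrics the
    ratio is inverted so that a value above 1.0 always means "faster".
    """
    base = results[baseline]
    if higher_is_better:
        return {name: value / base for name, value in results.items()}
    return {name: base / value for name, value in results.items()}


flops_rel = relative(max_sp_gflops, "GTX 950", higher_is_better=True)
alexnet_rel = relative(alexnet_ms, "GTX 950", higher_is_better=False)

for name in max_sp_gflops:
    print(f"{name}: Max SP Flops {flops_rel[name]:.2f}x, AlexNet {alexnet_rel[name]:.2f}x")
```

Inverting the ratio for time-based tests keeps every column on the same "higher means faster" scale, which makes it visible that the synthetic peak-FLOPS gain between cards is larger than the gain in the Caffe workload.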
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: MD5 Hash
GHash/s, More Is Better

  GeForce GTX 1050      2.49   SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti   3.03   SE +/- 0.00, N = 3
  GeForce GTX 1060      5.64   SE +/- 0.00, N = 3
  GeForce GTX 1070      8.40   SE +/- 0.00, N = 3
  GeForce GTX 1080      11.90  SE +/- 0.01, N = 3
  GeForce GTX 750       1.28   SE +/- 0.00, N = 3
  GeForce GTX 950       2.69   SE +/- 0.00, N = 3
  GeForce GTX 960       3.83   SE +/- 0.00, N = 3
  GeForce GTX 970       5.44   SE +/- 0.00, N = 3
  GeForce GTX 980       6.47   SE +/- 0.00, N = 3
  GeForce GTX 980 Ti    7.73   SE +/- 0.00, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Max SP Flops
GFLOPS, More Is Better

  GeForce GTX 1050      2109.11  SE +/- 0.30, N = 3
  GeForce GTX 1050 Ti   2688.23  SE +/- 5.43, N = 3
  GeForce GTX 1060      4765.98  SE +/- 21.65, N = 3
  GeForce GTX 1070      7096.44  SE +/- 50.45, N = 3
  GeForce GTX 1080      9385.11  SE +/- 64.36, N = 3
  GeForce GTX 750       1160.99  SE +/- 0.08, N = 3
  GeForce GTX 950       2210.77  SE +/- 6.61, N = 3
  GeForce GTX 960       2941.88  SE +/- 8.34, N = 3
  GeForce GTX 970       4320.28  SE +/- 3.47, N = 3
  GeForce GTX 980       5002.78  SE +/- 9.75, N = 3
  GeForce GTX 980 Ti    6145.80  SE +/- 20.77, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
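The Max SP Flops results can be related to the GPU compute-core counts listed in this result file. The Python sketch below divides measured GFLOPS by core count for three of the cards. It also derives an implied clock, which rests on an assumption not stated in the results: that the benchmark approaches one fused multiply-add (2 FLOPs) per core per cycle. Treat the clock figures as rough estimates, not measurements.

```python
# GPU compute-core counts as listed in this result file.
cores = {"GTX 750": 512, "GTX 980 Ti": 2816, "GTX 1080": 2560}

# Measured SHOC Max SP Flops results (GFLOPS) from this result file.
measured_gflops = {"GTX 750": 1160.99, "GTX 980 Ti": 6145.80, "GTX 1080": 9385.11}

# ASSUMPTION (not from the results): peak single precision throughput is
# 2 FLOPs per core per clock, i.e. one fused multiply-add per cycle.
FLOPS_PER_CORE_PER_CLOCK = 2

per_core_gflops = {gpu: measured_gflops[gpu] / cores[gpu] for gpu in cores}
implied_clock_mhz = {
    gpu: per_core_gflops[gpu] / FLOPS_PER_CORE_PER_CLOCK * 1000 for gpu in cores
}

for gpu in cores:
    print(
        f"{gpu}: {per_core_gflops[gpu]:.2f} GFLOPS per core, "
        f"implied clock about {implied_clock_mhz[gpu]:.0f} MHz"
    )
```

Under that assumption, most of the per-core gain from the older cards to the GTX 1080 comes from higher sustained clocks, since the FLOPs-per-clock figure is held fixed.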
Caffe AlexNet 2016-06-11
Build: CUDA Googlenet
Milli-Seconds, Fewer Is Better

  GeForce GTX 1050      69616.53   SE +/- 7.85, N = 3
  GeForce GTX 1050 Ti   60253.57   SE +/- 33.10, N = 3
  GeForce GTX 1060      37468.77   SE +/- 4.75, N = 3
  GeForce GTX 1070      27658.17   SE +/- 66.90, N = 3
  GeForce GTX 1080      24039.57   SE +/- 5.57, N = 3
  GeForce GTX 680       133624.00  SE +/- 31.51, N = 3
  GeForce GTX 760       164009.00  SE +/- 93.86, N = 3
  GeForce GTX 780 Ti    65105.93   SE +/- 20.21, N = 3
  GeForce GTX 950       68771.30   SE +/- 9.66, N = 3
  GeForce GTX 960       59805.47   SE +/- 13.56, N = 3
  GeForce GTX 970       40125.47   SE +/- 15.33, N = 3
  GeForce GTX 980       35955.47   SE +/- 77.16, N = 3
  GeForce GTX 980 Ti    31440.40   SE +/- 95.74, N = 3
  ubu_deep              25701.77   SE +/- 93.86, N = 3

1. (CXX) g++ options: -pthread -fPIC -O2 -lcaffe -lglog -lgflags -lprotobuf -lboost_system -lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++ -lcblas -latlas
Caffe AlexNet 2016-06-11
Build: CUDA AlexNet
Milli-Seconds, Fewer Is Better

  GeForce GTX 1050      30845.03  SE +/- 2.73, N = 3
  GeForce GTX 1050 Ti   26985.90  SE +/- 10.92, N = 3
  GeForce GTX 1060      16266.23  SE +/- 7.27, N = 3
  GeForce GTX 1070      11451.90  SE +/- 2.23, N = 3
  GeForce GTX 1080      9738.65   SE +/- 11.97, N = 3
  GeForce GTX 680       52573.77  SE +/- 76.71, N = 3
  GeForce GTX 760       66411.83  SE +/- 12.08, N = 3
  GeForce GTX 780 Ti    28595.10  SE +/- 3.70, N = 3
  GeForce GTX 950       30595.47  SE +/- 34.72, N = 3
  GeForce GTX 960       27360.73  SE +/- 5.41, N = 3
  GeForce GTX 970       17005.67  SE +/- 8.15, N = 3
  GeForce GTX 980       14977.20  SE +/- 2.22, N = 3
  GeForce GTX 980 Ti    11722.10  SE +/- 12.80, N = 3
  ubu_deep              9734.94   SE +/- 9.00, N = 3

1. (CXX) g++ options: -pthread -fPIC -O2 -lcaffe -lglog -lgflags -lprotobuf -lboost_system -lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++ -lcblas -latlas
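Some of the AlexNet results lie very close together, so it helps to compare a difference between two means against their reported standard errors. The Python sketch below does this for two pairs of results from the chart above. It assumes the SE values pair with the systems in the order listed and that the two runs are independent. With N = 3 it is only a rough sanity check, not a formal significance test. The function name is hypothetical.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Return (difference, combined SE, difference / combined SE).

    Hypothetical helper: the combined standard error of a difference of two
    independent means is sqrt(se_a^2 + se_b^2).
    """
    diff = mean_a - mean_b
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, combined, diff / combined


# GeForce GTX 1080 vs ubu_deep (milliseconds, SE) from the AlexNet results.
close_diff, close_se, close_ratio = difference_in_se_units(9738.65, 11.97, 9734.94, 9.00)

# GeForce GTX 1070 vs GeForce GTX 980 Ti from the same chart.
far_diff, far_se, far_ratio = difference_in_se_units(11451.90, 2.23, 11722.10, 12.80)

print(f"GTX 1080 vs ubu_deep:   {close_diff:+.2f} ms, {close_ratio:+.2f} combined SEs")
print(f"GTX 1070 vs GTX 980 Ti: {far_diff:+.2f} ms, {far_ratio:+.2f} combined SEs")
```

A difference well under one combined SE, as for the GTX 1080 and ubu_deep, is indistinguishable from run-to-run noise, while the gap between the GTX 1070 and GTX 980 Ti is many times larger than the noise.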
CUDA Mini-Nbody 2015-11-10
Test: Original
Seconds, Fewer Is Better

  GeForce GTX 1050      115.24  SE +/- 0.09, N = 3
  GeForce GTX 1050 Ti   101.85  SE +/- 0.14, N = 3
  GeForce GTX 1060      58.42   SE +/- 0.11, N = 3
  GeForce GTX 1070      39.70   SE +/- 0.05, N = 3
  GeForce GTX 1080      33.06   SE +/- 0.09, N = 3
  GeForce GTX 750       182.08  SE +/- 0.15, N = 3
  GeForce GTX 780 Ti    61.44   SE +/- 0.16, N = 3
  GeForce GTX 950       104.15  SE +/- 0.40, N = 3
  GeForce GTX 960       82.70   SE +/- 0.31, N = 3
  GeForce GTX 970       52.51   SE +/- 0.14, N = 3
  GeForce GTX 980       46.60   SE +/- 0.23, N = 3
  GeForce GTX 980 Ti    36.06   SE +/- 0.48, N = 3
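Each result in this file is reported as a mean with "SE +/- x, N = 3". For readers unfamiliar with the notation, the Python sketch below shows the usual standard-error-of-the-mean calculation. The three run times are hypothetical (the per-run values are not part of this export), and the use of the sample standard deviation is an assumption. The exact formula used by the Phoronix Test Suite is not stated here.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# HYPOTHETICAL run times in seconds; the individual runs behind the
# reported means are not included in this result export.
runs = [33.0, 33.1, 33.2]

mean = statistics.mean(runs)
se = standard_error(runs)

print(f"{mean:.2f} seconds, SE +/- {se:.2f}, N = {len(runs)}")
```

With only three runs per card the standard error is itself a coarse estimate, which is worth keeping in mind when two cards differ by a small margin.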
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: FFT SP
GFLOPS, More Is Better

  GeForce GTX 1050      171.27  SE +/- 0.73, N = 3
  GeForce GTX 1050 Ti   199.71  SE +/- 1.12, N = 3
  GeForce GTX 1060      304.62  SE +/- 0.78, N = 3
  GeForce GTX 1070      377.27  SE +/- 2.67, N = 3
  GeForce GTX 1080      462.60  SE +/- 1.33, N = 3
  GeForce GTX 750       116.13  SE +/- 0.20, N = 3
  GeForce GTX 950       178.56  SE +/- 0.50, N = 3
  GeForce GTX 960       194.10  SE +/- 1.44, N = 3
  GeForce GTX 970       266.73  SE +/- 1.20, N = 3
  GeForce GTX 980       292.36  SE +/- 0.74, N = 3
  GeForce GTX 980 Ti    308.49  SE +/- 0.35, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

  GeForce GTX 1050      433.21  SE +/- 1.01, N = 3
  GeForce GTX 1050 Ti   453.15  SE +/- 1.17, N = 3
  GeForce GTX 1060      503.11  SE +/- 0.09, N = 3
  GeForce GTX 1070      501.08  SE +/- 1.65, N = 3
  GeForce GTX 1080      526.21  SE +/- 1.23, N = 3
  GeForce GTX 750       160.58  SE +/- 0.46, N = 3
  GeForce GTX 950       364.33  SE +/- 0.34, N = 3
  GeForce GTX 960       379.43  SE +/- 0.09, N = 3
  GeForce GTX 970       349.90  SE +/- 0.05, N = 3
  GeForce GTX 980       335.27  SE +/- 0.52, N = 3
  GeForce GTX 980 Ti    349.09  SE +/- 0.22, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft
ASKAP tConvolveCuda 2015-11-10
Processing: Degridding
Million Grid Points Per Second, More Is Better

  GeForce GTX 1050      5873.92   SE +/- 42.88, N = 3
  GeForce GTX 1050 Ti   5961.62   SE +/- 44.82, N = 3
  GeForce GTX 1060      9861.33   SE +/- 0.00, N = 3
  GeForce GTX 1070      13312.80  SE +/- 0.00, N = 3
  GeForce GTX 1080      14273.00  SE +/- 259.50, N = 3
  GeForce GTX 950       5625.68   SE +/- 39.34, N = 3
  GeForce GTX 960       5325.12   SE +/- 0.00, N = 3
  GeForce GTX 970       9399.84   SE +/- 109.30, N = 3
  GeForce GTX 980       10798.13  SE +/- 147.93, N = 3
  GeForce GTX 980 Ti    17010.80  SE +/- 369.80, N = 3

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
ASKAP tConvolveCuda 2015-11-10
Processing: Gridding
Million Grid Points Per Second, More Is Better

  GeForce GTX 1050      3698.00  SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti   3715.36  SE +/- 17.36, N = 3
  GeForce GTX 1060      5625.68  SE +/- 39.34, N = 3
  GeForce GTX 1070      7607.31  SE +/- 0.00, N = 3
  GeForce GTX 1080      8236.45  SE +/- 84.05, N = 3
  GeForce GTX 950       3399.14  SE +/- 14.40, N = 3
  GeForce GTX 960       3132.42  SE +/- 0.00, N = 3
  GeForce GTX 970       5255.51  SE +/- 34.80, N = 3
  GeForce GTX 980       6006.45  SE +/- 44.82, N = 3
  GeForce GTX 980 Ti    8320.50  SE +/- 0.00, N = 3

1. (CXX) g++ options: -fPIC -O3 -m64 -lcudadevrt -lcudart_static -lrt -lpthread -ldl
Phoronix Test Suite v10.8.4