cuda-testing

Intel Xeon E3-1280 v5 testing with an MSI C236A WORKSTATION (MS-7998) v1.0 and eVGA NVIDIA GeForce GTX 960 2043MB on Ubuntu 16.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/1606119-PTS-CUDATEST14&rdt&grt .

cuda-testing: System Configuration

Runs compared: GeForce GTX 1080, GeForce GTX 980, GeForce GTX 960

Processor:          Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
Motherboard:        MSI C236A WORKSTATION (MS-7998) v1.0
Chipset:            Intel Sky Lake
Memory:             16384MB
Disk:               Samsung SSD 950 PRO 256GB
Graphics:
  GeForce GTX 1080: GeForce GTX 1080 8187MB (909/5005MHz)
  GeForce GTX 980:  NVIDIA GeForce GTX 980 4091MB (1126/3505MHz)
  GeForce GTX 960:  eVGA NVIDIA GeForce GTX 960 2043MB (1277/3505MHz)
Audio:              Realtek ALC1150
Network:            Intel Connection
OS:                 Ubuntu 16.04
Kernel:             4.4.0-22-generic (x86_64)
Desktop:            Unity 7.4.0
Display Driver:     NVIDIA 367.18
OpenGL:             4.5.0
Vulkan:             1.0.8
Compiler:           GCC 5.3.1 20160413 + CUDA 8.0
File-System:        ext4
Screen Resolution:  3840x2160

Compiler Details:
  --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details:
  GeForce GTX 1080: Scaling Governor: intel_pstate powersave
  GeForce GTX 980:  Scaling Governor: intel_pstate performance
  GeForce GTX 960:  Scaling Governor: intel_pstate powersave

OpenCL Details:
  GeForce GTX 1080: GPU Compute Cores: 2560
  GeForce GTX 980:  GPU Compute Cores: 2048
  GeForce GTX 960:  GPU Compute Cores: 1024

System Details:
  GeForce GTX 1080: GPU Compute Cores: 2560
  GeForce GTX 980:  GPU Compute Cores: 2048
  GeForce GTX 960:  GPU Compute Cores: 1024

cuda-testing: Results Overview

Test                                      Unit     GeForce GTX 1080  GeForce GTX 980  GeForce GTX 960  Xeon E3-1280 v5 - CPU Only
caffe: CUDA                               ms       8959.77           15504.53         28134.07         1787207
cuda-mini-nbody: Original                 s        30.51             46.51            82.29
cuda-mini-nbody: Cache Blocking           s        14.02             24.91            36.30
cuda-mini-nbody: Loop Unrolling           s        14.52             24.63            35.71
cuda-mini-nbody: SOA Data Layout          s        28.58             51.02            81.27
cuda-mini-nbody: Flush Denormals To Zero  s        28.58             50.44            81.19
shoc: CUDA - Triad                        GB/s     14.86             14.74            14.36
shoc: CUDA - FFT SP                       GFLOPS   461.28            292.78           189.14
shoc: CUDA - MD5 Hash                     GHash/s  11.98             6.53             3.88
shoc: CUDA - Max SP Flops                 GFLOPS   9397.41           4999.85          2944.94
shoc: CUDA - Bus Speed Download           GB/s     12.53             12.53            12.53
shoc: CUDA - Bus Speed Readback           GB/s     13.22             13.22            13.21
shoc: CUDA - Texture Read Bandwidth       GB/s     528.41            332.16           381.05

Caffe AlexNet 2016-06-11
Build: CUDA
Milli-Seconds, Fewer Is Better

Xeon E3-1280 v5 - CPU Only  1787207.00  (SE +/- 4001.26, N = 3)
GeForce GTX 1080            8959.77     (SE +/- 3.43, N = 3)
GeForce GTX 980             15504.53    (SE +/- 17.87, N = 3)
GeForce GTX 960             28134.07    (SE +/- 2.72, N = 3)

1. (CXX) g++ options: -pthread -fPIC -O2 -lcaffe -lglog -lgflags -lprotobuf -lboost_system -lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++ -lcblas -latlas

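The Caffe AlexNet times reported above can be converted into relative speedups for easier comparison. The following is only a minimal sketch and is not part of the exported result: the names caffe_ms and speedup_over are illustrative, and the times are copied from the Caffe AlexNet result in this document.

```python
# Caffe AlexNet (Build: CUDA) times in milliseconds, copied from the result above.
# Fewer is better, so a larger ratio below means a faster configuration.
caffe_ms = {
    "Xeon E3-1280 v5 - CPU Only": 1787207.00,
    "GeForce GTX 1080": 8959.77,
    "GeForce GTX 980": 15504.53,
    "GeForce GTX 960": 28134.07,
}


def speedup_over(baseline, times):
    """Return, for each entry, how many times faster it is than the baseline entry."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


if __name__ == "__main__":
    # Hypothetical choice of baseline: the slowest GPU in this comparison.
    for name, ratio in speedup_over("GeForce GTX 960", caffe_ms).items():
        print(f"{name}: {ratio:.2f}x")
```

Any other entry can be passed as the baseline; the same helper would work on the other "Fewer Is Better" results, while "More Is Better" results would need the ratio inverted.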
CUDA Mini-Nbody 2015-11-10
Test: Original
Seconds, Fewer Is Better

GeForce GTX 1080  30.51  (SE +/- 0.08, N = 3)
GeForce GTX 980   46.51  (SE +/- 0.15, N = 3)
GeForce GTX 960   82.29  (SE +/- 0.27, N = 3)

CUDA Mini-Nbody 2015-11-10
Test: Cache Blocking
Seconds, Fewer Is Better

GeForce GTX 1080  14.02  (SE +/- 0.01, N = 3)
GeForce GTX 980   24.91  (SE +/- 0.16, N = 3)
GeForce GTX 960   36.30  (SE +/- 0.01, N = 3)

CUDA Mini-Nbody 2015-11-10
Test: Loop Unrolling
Seconds, Fewer Is Better

GeForce GTX 1080  14.52  (SE +/- 0.02, N = 3)
GeForce GTX 980   24.63  (SE +/- 0.20, N = 3)
GeForce GTX 960   35.71  (SE +/- 0.03, N = 3)

CUDA Mini-Nbody 2015-11-10
Test: SOA Data Layout
Seconds, Fewer Is Better

GeForce GTX 1080  28.58  (SE +/- 0.05, N = 3)
GeForce GTX 980   51.02  (SE +/- 0.13, N = 3)
GeForce GTX 960   81.27  (SE +/- 0.10, N = 3)

CUDA Mini-Nbody 2015-11-10
Test: Flush Denormals To Zero
Seconds, Fewer Is Better

GeForce GTX 1080  28.58  (SE +/- 0.06, N = 3)
GeForce GTX 980   50.44  (SE +/- 0.22, N = 3)
GeForce GTX 960   81.19  (SE +/- 0.07, N = 3)

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Triad
GB/s, More Is Better

GeForce GTX 1080  14.86  (SE +/- 0.04, N = 3)
GeForce GTX 980   14.74  (SE +/- 0.01, N = 3)
GeForce GTX 960   14.36  (SE +/- 0.01, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: FFT SP
GFLOPS, More Is Better

GeForce GTX 1080  461.28  (SE +/- 2.81, N = 3)
GeForce GTX 980   292.78  (SE +/- 0.60, N = 3)
GeForce GTX 960   189.14  (SE +/- 1.12, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: MD5 Hash
GHash/s, More Is Better

GeForce GTX 1080  11.98  (SE +/- 0.01, N = 3)
GeForce GTX 980   6.53   (SE +/- 0.01, N = 3)
GeForce GTX 960   3.88   (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Max SP Flops
GFLOPS, More Is Better

GeForce GTX 1080  9397.41  (SE +/- 88.40, N = 3)
GeForce GTX 980   4999.85  (SE +/- 11.01, N = 3)
GeForce GTX 960   2944.94  (SE +/- 7.67, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Bus Speed Download
GB/s, More Is Better

GeForce GTX 1080  12.53  (SE +/- 0.00, N = 3)
GeForce GTX 980   12.53  (SE +/- 0.00, N = 3)
GeForce GTX 960   12.53  (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Bus Speed Readback
GB/s, More Is Better

GeForce GTX 1080  13.22  (SE +/- 0.00, N = 3)
GeForce GTX 980   13.22  (SE +/- 0.00, N = 3)
GeForce GTX 960   13.21  (SE +/- 0.00, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing 2015-11-10
Target: CUDA - Benchmark: Texture Read Bandwidth
GB/s, More Is Better

GeForce GTX 1080  528.41  (SE +/- 1.22, N = 3)
GeForce GTX 980   332.16  (SE +/- 0.47, N = 3)
GeForce GTX 960   381.05  (SE +/- 0.15, N = 3)

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

Phoronix Test Suite v10.8.5