NVIDIA GeForce GTX 1080 CUDA Linux Compute GPGPU Testing

NVIDIA GeForce GTX 1080 CUDA benchmarking including deep learning on Pascal. Benchmarks by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1606116-HA-CUDATESTI01&grr&sor.

System configuration (identical for all runs; only the graphics card changes):

  Processor:          Intel Xeon E3-1280 v5 @ 4.00GHz (8 Cores)
  Motherboard:        MSI C236A WORKSTATION (MS-7998) v1.0
  Chipset:            Intel Sky Lake
  Memory:             16384MB
  Disk:               Samsung SSD 950 PRO 256GB
  Audio:              Realtek ALC1150
  Network:            Intel Connection
  OS:                 Ubuntu 16.04
  Kernel:             4.4.0-22-generic (x86_64)
  Desktop:            Unity 7.4.0
  Display Driver:     NVIDIA 367.18
  OpenGL:             4.5.0
  Vulkan:             1.0.8
  Compiler:           GCC 5.3.1 20160413 + CUDA 8.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Graphics per run (GPU Compute Cores as reported under OpenCL Details and System Details):

  Run                    Graphics                                               GPU Compute Cores
  GeForce GTX 960        eVGA NVIDIA GeForce GTX 960 2043MB (1277/3505MHz)      1024
  GeForce GTX 970        eVGA NVIDIA GeForce GTX 970 4091MB (1163/3505MHz)      1664
  GeForce GTX 980        NVIDIA GeForce GTX 980 4091MB (1126/3505MHz)           2048
  GeForce GTX 980 Ti     NVIDIA GeForce GTX 980 Ti 6139MB (999/3505MHz)         2816
  GeForce GTX TITAN X    NVIDIA GeForce GTX TITAN X 12283MB (1001/3505MHz)      3072
  GeForce GTX 1080       GeForce GTX 1080 8187MB (909/5005MHz)                  2560

Compiler Details: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v

Processor Details: Scaling Governor: intel_pstate performance

Result overview:

  Test                                        Unit            GTX 960    GTX 970    GTX 980  GTX 980 Ti  GTX TITAN X   GTX 1080
  cuda-mini-nbody: Flush Denormals To Zero    Seconds           81.19      57.20      50.44       42.10        38.52      28.58
  cuda-mini-nbody: SOA Data Layout            Seconds           81.27      57.09      51.02       42.04        38.69      28.58
  cuda-mini-nbody: Loop Unrolling             Seconds           35.71      26.38      24.63       19.64        18.71      14.52
  cuda-mini-nbody: Cache Blocking             Seconds           36.30      26.75      24.91       19.69        18.67      14.02
  cuda-mini-nbody: Original                   Seconds           82.29      52.04      46.51       35.35        33.09      30.51
  shoc: CUDA - Texture Read Bandwidth         GB/s             381.05     351.32     332.16      348.36       352.05     528.41
  shoc: CUDA - Max SP Flops                   GFLOPS          2944.94    4316.43    4999.85     6144.29      6886.69    9397.41
  shoc: CUDA - MD5 Hash                       GHash/s            3.88       5.47       6.53        7.81         8.43      11.98
  shoc: CUDA - FFT SP                         GFLOPS           189.14     265.17     292.78      302.76       322.57     461.28
  caffe: CUDA                                 Milli-Seconds  28134.07   23567.70   15504.53    12011.27     11397.13    8959.77
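The overview mixes "fewer is better" results (seconds, milliseconds) with "more is better" results (GB/s, GFLOPS, GHash/s), so relative performance has to be computed in the right direction for each test. The following is a minimal sketch, not part of the original export, that compares the GeForce GTX 1080 against the GeForce GTX TITAN X using the numbers above. The names `RESULTS` and `speedup` are hypothetical helpers introduced only for illustration.

```python
# Results copied from the overview: (GTX TITAN X, GTX 1080, higher_is_better).
# The dictionary layout and function name are illustrative assumptions.
RESULTS = {
    "cuda-mini-nbody: Flush Denormals To Zero": (38.52, 28.58, False),
    "cuda-mini-nbody: SOA Data Layout": (38.69, 28.58, False),
    "cuda-mini-nbody: Loop Unrolling": (18.71, 14.52, False),
    "cuda-mini-nbody: Cache Blocking": (18.67, 14.02, False),
    "cuda-mini-nbody: Original": (33.09, 30.51, False),
    "shoc: CUDA - Texture Read Bandwidth": (352.05, 528.41, True),
    "shoc: CUDA - Max SP Flops": (6886.69, 9397.41, True),
    "shoc: CUDA - MD5 Hash": (8.43, 11.98, True),
    "shoc: CUDA - FFT SP": (322.57, 461.28, True),
    "caffe: CUDA": (11397.13, 8959.77, False),
}


def speedup(baseline, candidate, higher_is_better):
    """Return how many times faster the candidate is than the baseline."""
    if higher_is_better:
        # Throughput-style metric: a larger value means faster.
        return candidate / baseline
    # Time-style metric: a smaller value means faster.
    return baseline / candidate


if __name__ == "__main__":
    for test, (titan_x, gtx_1080, higher_is_better) in RESULTS.items():
        ratio = speedup(titan_x, gtx_1080, higher_is_better)
        print(f"{test:<42} GTX 1080 vs TITAN X: {ratio:.2f}x")
```

A ratio above 1.0 means the GTX 1080 is faster on that test, regardless of the unit the test reports.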

CUDA Mini-Nbody

Test: Flush Denormals To Zero

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)

  GeForce GTX 1080       28.58   SE +/- 0.06, N = 3
  GeForce GTX TITAN X    38.52   SE +/- 0.10, N = 3
  GeForce GTX 980 Ti     42.10   SE +/- 0.05, N = 3
  GeForce GTX 980        50.44   SE +/- 0.22, N = 3
  GeForce GTX 970        57.20   SE +/- 0.09, N = 3
  GeForce GTX 960        81.19   SE +/- 0.07, N = 3

CUDA Mini-Nbody

Test: SOA Data Layout

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)

  GeForce GTX 1080       28.58   SE +/- 0.05, N = 3
  GeForce GTX TITAN X    38.69   SE +/- 0.06, N = 3
  GeForce GTX 980 Ti     42.04   SE +/- 0.12, N = 3
  GeForce GTX 980        51.02   SE +/- 0.13, N = 3
  GeForce GTX 970        57.09   SE +/- 0.01, N = 3
  GeForce GTX 960        81.27   SE +/- 0.10, N = 3

CUDA Mini-Nbody

Test: Loop Unrolling

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)

  GeForce GTX 1080       14.52   SE +/- 0.02, N = 3
  GeForce GTX TITAN X    18.71   SE +/- 0.18, N = 3
  GeForce GTX 980 Ti     19.64   SE +/- 0.16, N = 3
  GeForce GTX 980        24.63   SE +/- 0.20, N = 3
  GeForce GTX 970        26.38   SE +/- 0.02, N = 3
  GeForce GTX 960        35.71   SE +/- 0.03, N = 3

CUDA Mini-Nbody

Test: Cache Blocking

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)

  GeForce GTX 1080       14.02   SE +/- 0.01, N = 3
  GeForce GTX TITAN X    18.67   SE +/- 0.20, N = 3
  GeForce GTX 980 Ti     19.69   SE +/- 0.30, N = 3
  GeForce GTX 980        24.91   SE +/- 0.16, N = 3
  GeForce GTX 970        26.75   SE +/- 0.00, N = 3
  GeForce GTX 960        36.30   SE +/- 0.01, N = 3

CUDA Mini-Nbody

Test: Original

Seconds, Fewer Is Better (CUDA Mini-Nbody 2015-11-10)

  GeForce GTX 1080       30.51   SE +/- 0.08, N = 3
  GeForce GTX TITAN X    33.09   SE +/- 0.18, N = 3
  GeForce GTX 980 Ti     35.35   SE +/- 0.21, N = 3
  GeForce GTX 980        46.51   SE +/- 0.15, N = 3
  GeForce GTX 970        52.04   SE +/- 0.13, N = 3
  GeForce GTX 960        82.29   SE +/- 0.27, N = 3

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)

  GeForce GTX 1080       528.41   SE +/- 1.22, N = 3
  GeForce GTX 960        381.05   SE +/- 0.15, N = 3
  GeForce GTX TITAN X    352.05   SE +/- 1.11, N = 3
  GeForce GTX 970        351.32   SE +/- 0.03, N = 3
  GeForce GTX 980 Ti     348.36   SE +/- 0.24, N = 3
  GeForce GTX 980        332.16   SE +/- 0.47, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)

  GeForce GTX 1080       9397.41   SE +/- 88.40, N = 3
  GeForce GTX TITAN X    6886.69   SE +/- 41.66, N = 3
  GeForce GTX 980 Ti     6144.29   SE +/- 21.31, N = 3
  GeForce GTX 980        4999.85   SE +/- 11.01, N = 3
  GeForce GTX 970        4316.43   SE +/- 1.66, N = 3
  GeForce GTX 960        2944.94   SE +/- 7.67, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)

  GeForce GTX 1080       11.98   SE +/- 0.01, N = 3
  GeForce GTX TITAN X     8.43   SE +/- 0.00, N = 3
  GeForce GTX 980 Ti      7.81   SE +/- 0.00, N = 3
  GeForce GTX 980         6.53   SE +/- 0.01, N = 3
  GeForce GTX 970         5.47   SE +/- 0.00, N = 3
  GeForce GTX 960         3.88   SE +/- 0.00, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

SHOC Scalable HeterOgeneous Computing

Target: CUDA - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2015-11-10)

  GeForce GTX 1080       461.28   SE +/- 2.81, N = 3
  GeForce GTX TITAN X    322.57   SE +/- 0.29, N = 3
  GeForce GTX 980 Ti     302.76   SE +/- 4.36, N = 5
  GeForce GTX 980        292.78   SE +/- 0.60, N = 3
  GeForce GTX 970        265.17   SE +/- 0.05, N = 3
  GeForce GTX 960        189.14   SE +/- 1.12, N = 3

1. (CXX) g++ options: -O2 -lSHOCCommon -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lcufft

Caffe AlexNet

Build: CPU Only

Milli-Seconds, Fewer Is Better (Caffe AlexNet 2016-06-11)

  Xeon E3-1280 v5 - CPU Only    1787207   SE +/- 4001.26, N = 3

1. (CXX) g++ options: -pthread -fPIC -O2 -lcaffe -lglog -lgflags -lprotobuf -lboost_system -lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++ -lcblas -latlas

Caffe AlexNet

Build: CUDA

Milli-Seconds, Fewer Is Better (Caffe AlexNet 2016-06-11)

  GeForce GTX 1080        8959.77   SE +/- 3.43, N = 3
  GeForce GTX TITAN X    11397.13   SE +/- 26.29, N = 3
  GeForce GTX 980 Ti     12011.27   SE +/- 7.42, N = 3
  GeForce GTX 980        15504.53   SE +/- 17.87, N = 3
  GeForce GTX 970        23567.70   SE +/- 1758.76, N = 6
  GeForce GTX 960        28134.07   SE +/- 2.72, N = 3

1. (CXX) g++ options: -pthread -fPIC -O2 -lcaffe -lglog -lgflags -lprotobuf -lboost_system -lboost_filesystem -lm -lhdf5_hl -lhdf5 -lleveldb -lsnappy -llmdb -lopencv_core -lopencv_highgui -lopencv_imgproc -lboost_thread -lstdc++ -lcblas -latlas
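The Caffe AlexNet results for the CPU Only build and the CUDA build are both reported in milliseconds, so they can be compared directly. The following is a rough sketch, not part of the original export, of how the CUDA results relate to the Xeon E3-1280 v5 CPU-only time. The names `CPU_ONLY_MS`, `CUDA_MS`, and `cpu_to_gpu_speedups` are hypothetical and exist only for this illustration.

```python
# Caffe AlexNet times in milliseconds, copied from the two result sections.
# Variable and function names are illustrative assumptions.
CPU_ONLY_MS = 1787207.0  # Xeon E3-1280 v5 - CPU Only

CUDA_MS = {
    "GeForce GTX 1080": 8959.77,
    "GeForce GTX TITAN X": 11397.13,
    "GeForce GTX 980 Ti": 12011.27,
    "GeForce GTX 980": 15504.53,
    "GeForce GTX 970": 23567.70,
    "GeForce GTX 960": 28134.07,
}


def cpu_to_gpu_speedups(cpu_ms, gpu_ms_by_card):
    """Return, per card, how many times faster the CUDA build ran than CPU only."""
    return {card: cpu_ms / gpu_ms for card, gpu_ms in gpu_ms_by_card.items()}


if __name__ == "__main__":
    for card, ratio in cpu_to_gpu_speedups(CPU_ONLY_MS, CUDA_MS).items():
        print(f"{card:<20} {ratio:6.1f}x faster than CPU only")
```

The GeForce GTX 970 figure deserves caution: its standard error (1758.76 over 6 runs) is far larger than that of any other card, so its ratio is less reliable than the rest.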


Phoronix Test Suite v10.8.4