NVIDIA Pascal Fresh Summer 2018 OpenCL Benchmarks

NVIDIA OpenCL compute benchmarks on Ubuntu Linux for a future article by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1808082-KH-1807240RA61.

NVIDIA Pascal Fresh Summer 2018 OpenCL BenchmarksProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerDisplay DriverOpenGLOpenCLCompilerFile-SystemScreen ResolutionGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 TitestIntel Core i7-8086K @ 5.00GHz (6 Cores / 12 Threads)ASUS PRIME Z370-A (0809 BIOS)Intel Device 3ec216384MB525GB SABRENT + 118GB INTEL SSDPEK1W120GAZotac NVIDIA GeForce GTX 1050 2048MB (1354/3504MHz)Realtek ALC1220DELL P2415QIntel ConnectionUbuntu 18.044.17.8-041708-generic (x86_64)GNOME Shell 3.28.2X Server 1.19.6NVIDIA 396.454.6.0OpenCL 1.2 CUDA 9.2.177GCC 7.3.0ext43840x2160eVGA NVIDIA GeForce GTX 1050 Ti 4096MB (1354/3504MHz)NVIDIA GeForce GTX 1060 6GB 6144MB (1506/4006MHz)NVIDIA GeForce GTX 1070 8192MB (1506/4006MHz)Zotac NVIDIA GeForce GTX 1070 Ti 8192MB (1607/4006MHz)NVIDIA GeForce GTX 1080 Ti 11264MB (1480/5508MHz)AMD Ryzen 7 1700 Eight-Core @ 3.00GHz (16 Cores)Gigabyte AB350-Gaming-CFAMD Device 1450500GB Samsung SSD 850 + 512GB Intenso SSD SatGigabyte NVIDIA GeForce GTX 1070 8192MB (987/4006MHz)NVIDIA Device 10f0Realtek RTL8111/8168/8411elementary 0.4.14.15.0-30-generic (x86_64)NVIDIA 396.51GCC 5.5.0 20171010ext4 (ecryptfs)2560x1440OpenBenchmarking.orgCompiler Details- GeForce GTX 1050: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - GeForce GTX 1050 Ti: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - GeForce GTX 1060: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - GeForce GTX 1070: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - GeForce GTX 1070 Ti: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - GeForce GTX 1080 Ti: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v - test: --build=x86_64-linux-gnu --disable-browser-plugin --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-arch-directory=amd64 --with-default-libstdcxx-abi=new --with-multilib-list=m32,m64,mx32 --with-tune=generic -v Processor Details- GeForce GTX 1050: Scaling Governor: intel_pstate performance- GeForce GTX 1050 Ti: Scaling Governor: intel_pstate performance- GeForce GTX 1060: Scaling Governor: intel_pstate performance- GeForce GTX 1070: Scaling Governor: intel_pstate performance- GeForce GTX 1070 Ti: Scaling Governor: intel_pstate performance- GeForce GTX 1080 Ti: Scaling Governor: intel_pstate performance- test: Scaling Governor: acpi-cpufreq ondemandOpenCL Details- GeForce GTX 1050: GPU Compute Cores: 640- GeForce GTX 1050 Ti: GPU Compute Cores: 768- GeForce GTX 1060: GPU Compute Cores: 1280- GeForce GTX 1070: GPU Compute Cores: 1920- GeForce GTX 1070 Ti: GPU Compute Cores: 2432- GeForce GTX 1080 Ti: GPU Compute Cores: 3584- test: GPU Compute Cores: 1920Security Details- GeForce GTX 1050, GeForce GTX 1050 Ti, GeForce GTX 1060, GeForce GTX 1070, GeForce GTX 1070 Ti, GeForce GTX 1080 Ti: KPTI + __user pointer sanitization + Full generic retpoline IBPB IBRS_FW + SSB disabled via prctl and seccomp ProtectionSystem Details- test: GPU Compute Cores: 1920.

NVIDIA Pascal Fresh Summer 2018 OpenCL Benchmarksshoc: OpenCL - FFT SPshoc: OpenCL - MD5 Hashshoc: OpenCL - Texture Read Bandwidthviennacl: OpenCL LU Factorizationcl-mem: Copycl-mem: Readcl-mem: Writeindigobench: Bedroomindigobench: Supercarjuliagpu: GPUmandelgpu: GPUluxmark: GPU - Hotelluxmark: GPU - MicrophoneGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Titest248.783.24308.5645.0588.709690.201.424.6468020911.2048526868.7713854122223.704.12336.6749.5588.1795.1789.101.655.4882228551.0761788066.4716344703350.647.38417.8059.01140.57154.47145.602.658.74123277207.17106989159.3026306965516.4110.74458.9964.27188.23206.30197.633.7712.41155907964.80153001211.8738469982557.4613.83510.1866.49188.20206.50196.534.1513.87177590252.53189306428.90446110620988.4220.20606.7369.22319.20339.23344.105.3217.55209807323.37264945183.70570713781392710061OpenBenchmarking.org

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

OpenBenchmarking.orgGFLOPS, More Is BetterSHOC Scalable HeterOgeneous Computing 2015-11-10Target: OpenCL - Benchmark: FFT SPGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti2004006008001000SE +/- 8.24, N = 6SE +/- 4.57, N = 6SE +/- 4.21, N = 3SE +/- 7.97, N = 3SE +/- 0.16, N = 3SE +/- 2.56, N = 3248.78223.70350.64516.41557.46988.421. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

OpenBenchmarking.orgGHash/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2015-11-10Target: OpenCL - Benchmark: MD5 HashGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti510152025SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.04, N = 33.244.127.3810.7413.8320.201. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

OpenBenchmarking.orgGB/s, More Is BetterSHOC Scalable HeterOgeneous Computing 2015-11-10Target: OpenCL - Benchmark: Texture Read BandwidthGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti130260390520650SE +/- 1.01, N = 3SE +/- 1.20, N = 3SE +/- 2.05, N = 3SE +/- 0.13, N = 3SE +/- 0.86, N = 3SE +/- 3.78, N = 3308.56336.67417.80458.99510.18606.731. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ViennaCL

OpenCL LU Factorization

OpenBenchmarking.orgGFLOPS, More Is BetterViennaCL 1.4.2OpenCL LU FactorizationGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti1530456075SE +/- 0.01, N = 3SE +/- 0.01, N = 3SE +/- 0.01, N = 3SE +/- 0.03, N = 3SE +/- 0.02, N = 3SE +/- 0.37, N = 345.0549.5559.0164.2766.4969.221. (CXX) g++ options: -rdynamic -lOpenCL

cl-mem

Benchmark: Copy

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: CopyGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti70140210280350SE +/- 0.30, N = 3SE +/- 0.03, N = 3SE +/- 0.07, N = 3SE +/- 0.07, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 388.7088.17140.57188.23188.20319.201. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: ReadGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti70140210280350SE +/- 0.03, N = 3SE +/- 0.07, N = 3SE +/- 0.12, N = 3SE +/- 0.15, N = 3SE +/- 0.78, N = 396.0095.17154.47206.30206.50339.231. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

OpenBenchmarking.orgGB/s, More Is Bettercl-mem 2017-01-13Benchmark: WriteGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti70140210280350SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.03, N = 3SE +/- 0.03, N = 3SE +/- 0.21, N = 390.2089.10145.60197.63196.53344.101. (CC) gcc options: -O2 -flto -lOpenCL

IndigoBench

Scene: Bedroom

OpenBenchmarking.orgM samples/s, More Is BetterIndigoBench 4.0.64Scene: BedroomGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti1.1972.3943.5914.7885.985SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 31.421.652.653.774.155.32

IndigoBench

Scene: Supercar

OpenBenchmarking.orgM samples/s, More Is BetterIndigoBench 4.0.64Scene: SupercarGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti48121620SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 3SE +/- 0.01, N = 3SE +/- 0.00, N = 3SE +/- 0.00, N = 34.645.488.7412.4113.8717.55

JuliaGPU

OpenCL Device: GPU

OpenBenchmarking.orgSamples/sec, More Is BetterJuliaGPU 1.2pts1OpenCL Device: GPUGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti40M80M120M160M200MSE +/- 85767.34, N = 3SE +/- 158080.14, N = 3SE +/- 228331.44, N = 3SE +/- 396739.11, N = 3SE +/- 280197.95, N = 3SE +/- 588976.70, N = 368020911.2082228551.07123277207.17155907964.80177590252.53209807323.371. (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

MandelGPU

OpenCL Device: GPU

OpenBenchmarking.orgSamples/sec, More Is BetterMandelGPU 1.3pts1OpenCL Device: GPUGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Ti60M120M180M240M300MSE +/- 25451.70, N = 3SE +/- 121552.05, N = 3SE +/- 36638.63, N = 3SE +/- 152368.68, N = 3SE +/- 264378.94, N = 3SE +/- 328049.57, N = 348526868.7761788066.47106989159.30153001211.87189306428.90264945183.701. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

LuxMark

OpenCL Device: GPU - Scene: Hotel

OpenBenchmarking.orgScore, More Is BetterLuxMark 3.1OpenCL Device: GPU - Scene: HotelGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Titest12002400360048006000SE +/- 1.67, N = 3SE +/- 1.33, N = 3SE +/- 0.33, N = 3SE +/- 20.67, N = 3SE +/- 25.01, N = 3SE +/- 16.15, N = 31385163426303846446157073927

LuxMark

OpenCL Device: GPU - Scene: Microphone

OpenBenchmarking.orgScore, More Is BetterLuxMark 3.1OpenCL Device: GPU - Scene: MicrophoneGeForce GTX 1050GeForce GTX 1050 TiGeForce GTX 1060GeForce GTX 1070GeForce GTX 1070 TiGeForce GTX 1080 Titest3K6K9K12K15KSE +/- 16.17, N = 3SE +/- 8.17, N = 3SE +/- 1.33, N = 3SE +/- 0.88, N = 3SE +/- 0.88, N = 3SE +/- 34.18, N = 3SE +/- 27.20, N = 34122470369659982106201378110061


Phoronix Test Suite v10.8.4