NVIDIA Pascal Fresh Summer 2018 OpenCL Benchmarks

NVIDIA OpenCL compute benchmarks on Ubuntu Linux for a future article by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1807240-RA-NVIDIAPAS67&grr&rdt.

System Configuration

All six runs used the same system; only the graphics card differs between runs.

  Processor:          Intel Core i7-8086K @ 5.00GHz (6 Cores / 12 Threads)
  Motherboard:        ASUS PRIME Z370-A (0809 BIOS)
  Chipset:            Intel Device 3ec2
  Memory:             16384MB
  Disk:               525GB SABRENT + 118GB INTEL SSDPEK1W120GA
  Audio:              Realtek ALC1220
  Monitor:            DELL P2415Q
  Network:            Intel Connection
  OS:                 Ubuntu 18.04
  Kernel:             4.17.8-041708-generic (x86_64)
  Desktop:            GNOME Shell 3.28.2
  Display Server:     X Server 1.19.6
  Display Driver:     NVIDIA 396.45
  OpenGL:             4.6.0
  OpenCL:             OpenCL 1.2 CUDA 9.2.177
  Compiler:           GCC 7.3.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Graphics and OpenCL GPU compute cores per run:

  GeForce GTX 1060:     NVIDIA GeForce GTX 1060 6GB 6144MB (1506/4006MHz); 1280 GPU compute cores
  GeForce GTX 1070:     NVIDIA GeForce GTX 1070 8192MB (1506/4006MHz); 1920 GPU compute cores
  GeForce GTX 1070 Ti:  Zotac NVIDIA GeForce GTX 1070 Ti 8192MB (1607/4006MHz); 2432 GPU compute cores
  GeForce GTX 1050 Ti:  eVGA NVIDIA GeForce GTX 1050 Ti 4096MB (1354/3504MHz); 768 GPU compute cores
  GeForce GTX 1080 Ti:  NVIDIA GeForce GTX 1080 Ti 11264MB (1480/5508MHz); 3584 GPU compute cores
  GeForce GTX 1050:     Zotac NVIDIA GeForce GTX 1050 2048MB (1354/3504MHz); 640 GPU compute cores

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-as=/usr/bin/x86_64-linux-gnu-as --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-ld=/usr/bin/x86_64-linux-gnu-ld --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: intel_pstate performance

Security Details: KPTI + __user pointer sanitization + Full generic retpoline IBPB IBRS_FW + SSB disabled via prctl and seccomp Protection

Results Overview

More is better for every test.

Test                                         GTX 1060      GTX 1070   GTX 1070 Ti   GTX 1050 Ti   GTX 1080 Ti      GTX 1050   Unit
indigobench: Bedroom                             2.65          3.77          4.15          1.65          5.32          1.42   M samples/s
luxmark: GPU - Microphone                        6965          9982         10620          4703         13781          4122   Score
luxmark: GPU - Hotel                             2630          3846          4461          1634          5707          1385   Score
indigobench: Supercar                            8.74         12.41         13.87          5.48         17.55          4.64   M samples/s
juliagpu: GPU                            123277207.17  155907964.80  177590252.53   82228551.07  209807323.37   68020911.20   Samples/sec
shoc: OpenCL - Texture Read Bandwidth          417.80        458.99        510.18        336.67        606.73        308.56   GB/s
mandelgpu: GPU                           106989159.30  153001211.87  189306428.90   61788066.47  264945183.70   48526868.77   Samples/sec
cl-mem: Read                                   154.47        206.30        206.50         95.17        339.23         96.00   GB/s
cl-mem: Write                                  145.60        197.63        196.53         89.10        344.10         90.20   GB/s
cl-mem: Copy                                   140.57        188.23        188.20         88.17        319.20         88.70   GB/s
shoc: OpenCL - FFT SP                          350.64        516.41        557.46        223.70        988.42        248.78   GFLOPS
shoc: OpenCL - MD5 Hash                          7.38         10.74         13.83          4.12         20.20          3.24   GHash/s
viennacl: OpenCL LU Factorization               59.01         64.27         66.49         49.55         69.22         45.05   GFLOPS
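Because every test in the overview is "more is better", the cards can be compared by dividing each result by a baseline card's result. The following is a minimal Python sketch of that arithmetic. Using the GTX 1060 as the baseline and summarizing with a geometric mean of per-test ratios are illustrative choices, not something this result export reports.

```python
import math

# Per-test results copied from the overview; every test is "more is better".
# Value order in each list follows CARDS.
CARDS = ["GTX 1060", "GTX 1070", "GTX 1070 Ti", "GTX 1050 Ti", "GTX 1080 Ti", "GTX 1050"]
RESULTS = {
    "indigobench: Bedroom": [2.65, 3.77, 4.15, 1.65, 5.32, 1.42],
    "luxmark: GPU - Microphone": [6965, 9982, 10620, 4703, 13781, 4122],
    "luxmark: GPU - Hotel": [2630, 3846, 4461, 1634, 5707, 1385],
    "indigobench: Supercar": [8.74, 12.41, 13.87, 5.48, 17.55, 4.64],
    "juliagpu: GPU": [123277207.17, 155907964.80, 177590252.53, 82228551.07, 209807323.37, 68020911.20],
    "shoc: OpenCL - Texture Read Bandwidth": [417.80, 458.99, 510.18, 336.67, 606.73, 308.56],
    "mandelgpu: GPU": [106989159.30, 153001211.87, 189306428.90, 61788066.47, 264945183.70, 48526868.77],
    "cl-mem: Read": [154.47, 206.30, 206.50, 95.17, 339.23, 96.00],
    "cl-mem: Write": [145.60, 197.63, 196.53, 89.10, 344.10, 90.20],
    "cl-mem: Copy": [140.57, 188.23, 188.20, 88.17, 319.20, 88.70],
    "shoc: OpenCL - FFT SP": [350.64, 516.41, 557.46, 223.70, 988.42, 248.78],
    "shoc: OpenCL - MD5 Hash": [7.38, 10.74, 13.83, 4.12, 20.20, 3.24],
    "viennacl: OpenCL LU Factorization": [59.01, 64.27, 66.49, 49.55, 69.22, 45.05],
}


def relative_to(baseline: str) -> dict[str, dict[str, float]]:
    """Divide every card's result by the baseline card's result, test by test."""
    b = CARDS.index(baseline)
    return {
        card: {test: values[i] / values[b] for test, values in RESULTS.items()}
        for i, card in enumerate(CARDS)
    }


def geometric_mean(ratios) -> float:
    """Geometric mean: a common way to average ratios without one test dominating."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    rel = relative_to("GTX 1060")
    for card in CARDS:
        print(f"{card:<12} {geometric_mean(rel[card].values()):.2f}x vs GTX 1060")
```

The geometric mean is used here because an arithmetic mean of ratios would let the tests with the largest spread (for example FFT SP) dominate the summary; any other baseline card can be passed to `relative_to`.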

IndigoBench

Scene: Bedroom

IndigoBench 4.0.64, Scene: Bedroom (M samples/s, more is better)

  GeForce GTX 1060     2.65   SE +/- 0.00, N = 3
  GeForce GTX 1070     3.77   SE +/- 0.00, N = 3
  GeForce GTX 1070 Ti  4.15   SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti  1.65   SE +/- 0.00, N = 3
  GeForce GTX 1080 Ti  5.32   SE +/- 0.00, N = 3
  GeForce GTX 1050     1.42   SE +/- 0.00, N = 3
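Each result is reported with a standard error over N runs (for example "SE +/- 0.00, N = 3"). The export does not state how that figure is computed, so the following minimal Python sketch assumes the conventional definition: the sample standard deviation (n - 1 denominator) divided by the square root of N. The run values in the usage block are hypothetical and are not taken from these results.

```python
import math
import statistics


def mean_and_standard_error(samples: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for repeated benchmark runs.

    Assumption: SE = sample standard deviation / sqrt(N). The exact formula
    used by the Phoronix Test Suite is not stated in this result file.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.mean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)
    return mean, se


if __name__ == "__main__":
    # Hypothetical per-run scores, NOT taken from this result file.
    runs = [10.0, 11.0, 12.0]
    mean, se = mean_and_standard_error(runs)
    print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

Under this definition an SE of 0.00 at two decimal places simply means the N runs agreed to within rounding.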

LuxMark

OpenCL Device: GPU - Scene: Microphone

LuxMark 3.1, OpenCL Device: GPU - Scene: Microphone (Score, more is better)

  GeForce GTX 1060      6965   SE +/- 1.33, N = 3
  GeForce GTX 1070      9982   SE +/- 0.88, N = 3
  GeForce GTX 1070 Ti  10620   SE +/- 0.88, N = 3
  GeForce GTX 1050 Ti   4703   SE +/- 8.17, N = 3
  GeForce GTX 1080 Ti  13781   SE +/- 34.18, N = 3
  GeForce GTX 1050      4122   SE +/- 16.17, N = 3

LuxMark

OpenCL Device: GPU - Scene: Hotel

LuxMark 3.1, OpenCL Device: GPU - Scene: Hotel (Score, more is better)

  GeForce GTX 1060     2630
  GeForce GTX 1070     3846
  GeForce GTX 1070 Ti  4461
  GeForce GTX 1050 Ti  1634
  GeForce GTX 1080 Ti  5707
  GeForce GTX 1050     1385

  Standard errors as reported (N = 3): 0.33, 20.67, 1.33, 25.01, 1.67. The export lists five values for six cards, so they cannot be attributed to individual cards.

IndigoBench

Scene: Supercar

IndigoBench 4.0.64, Scene: Supercar (M samples/s, more is better)

  GeForce GTX 1060      8.74   SE +/- 0.00, N = 3
  GeForce GTX 1070     12.41   SE +/- 0.01, N = 3
  GeForce GTX 1070 Ti  13.87   SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti   5.48   SE +/- 0.00, N = 3
  GeForce GTX 1080 Ti  17.55   SE +/- 0.00, N = 3
  GeForce GTX 1050      4.64   SE +/- 0.00, N = 3

JuliaGPU

OpenCL Device: GPU

JuliaGPU 1.2pts1, OpenCL Device: GPU (Samples/sec, more is better)

  GeForce GTX 1060     123277207.17   SE +/- 228331.44, N = 3
  GeForce GTX 1070     155907964.80   SE +/- 396739.11, N = 3
  GeForce GTX 1070 Ti  177590252.53   SE +/- 280197.95, N = 3
  GeForce GTX 1050 Ti   82228551.07   SE +/- 158080.14, N = 3
  GeForce GTX 1080 Ti  209807323.37   SE +/- 588976.70, N = 3
  GeForce GTX 1050      68020911.20   SE +/- 85767.34, N = 3

  Build: (CC) gcc options: -O3 -march=native -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL -lm

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, more is better)

  GeForce GTX 1060     417.80   SE +/- 2.05, N = 3
  GeForce GTX 1070     458.99   SE +/- 0.13, N = 3
  GeForce GTX 1070 Ti  510.18   SE +/- 0.86, N = 3
  GeForce GTX 1050 Ti  336.67   SE +/- 1.20, N = 3
  GeForce GTX 1080 Ti  606.73   SE +/- 3.78, N = 3
  GeForce GTX 1050     308.56   SE +/- 1.01, N = 3

  Build: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1, OpenCL Device: GPU (Samples/sec, more is better)

  GeForce GTX 1060     106989159.30   SE +/- 36638.63, N = 3
  GeForce GTX 1070     153001211.87   SE +/- 152368.68, N = 3
  GeForce GTX 1070 Ti  189306428.90   SE +/- 264378.94, N = 3
  GeForce GTX 1050 Ti   61788066.47   SE +/- 121552.05, N = 3
  GeForce GTX 1080 Ti  264945183.70   SE +/- 328049.57, N = 3
  GeForce GTX 1050      48526868.77   SE +/- 25451.70, N = 3

  Build: (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

cl-mem

Benchmark: Read

cl-mem 2017-01-13, Benchmark: Read (GB/s, more is better)

  GeForce GTX 1060     154.47
  GeForce GTX 1070     206.30
  GeForce GTX 1070 Ti  206.50
  GeForce GTX 1050 Ti   95.17
  GeForce GTX 1080 Ti  339.23
  GeForce GTX 1050      96.00

  Standard errors as reported (N = 3): 0.07, 0.12, 0.15, 0.03, 0.78. The export lists five values for six cards, so they cannot be attributed to individual cards.

  Build: (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13, Benchmark: Write (GB/s, more is better)

  GeForce GTX 1060     145.60   SE +/- 0.00, N = 3
  GeForce GTX 1070     197.63   SE +/- 0.03, N = 3
  GeForce GTX 1070 Ti  196.53   SE +/- 0.03, N = 3
  GeForce GTX 1050 Ti   89.10   SE +/- 0.00, N = 3
  GeForce GTX 1080 Ti  344.10   SE +/- 0.21, N = 3
  GeForce GTX 1050      90.20   SE +/- 0.00, N = 3

  Build: (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Copy

cl-mem 2017-01-13, Benchmark: Copy (GB/s, more is better)

  GeForce GTX 1060     140.57   SE +/- 0.07, N = 3
  GeForce GTX 1070     188.23   SE +/- 0.07, N = 3
  GeForce GTX 1070 Ti  188.20   SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti   88.17   SE +/- 0.03, N = 3
  GeForce GTX 1080 Ti  319.20   SE +/- 0.00, N = 3
  GeForce GTX 1050      88.70   SE +/- 0.30, N = 3

  Build: (CC) gcc options: -O2 -flto -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: FFT SP (GFLOPS, more is better)

  GeForce GTX 1060     350.64   SE +/- 4.21, N = 3
  GeForce GTX 1070     516.41   SE +/- 7.97, N = 3
  GeForce GTX 1070 Ti  557.46   SE +/- 0.16, N = 3
  GeForce GTX 1050 Ti  223.70   SE +/- 4.57, N = 6
  GeForce GTX 1080 Ti  988.42   SE +/- 2.56, N = 3
  GeForce GTX 1050     248.78   SE +/- 8.24, N = 6

  Build: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2015-11-10, Target: OpenCL - Benchmark: MD5 Hash (GHash/s, more is better)

  GeForce GTX 1060      7.38   SE +/- 0.00, N = 3
  GeForce GTX 1070     10.74   SE +/- 0.00, N = 3
  GeForce GTX 1070 Ti  13.83   SE +/- 0.00, N = 3
  GeForce GTX 1050 Ti   4.12   SE +/- 0.00, N = 3
  GeForce GTX 1080 Ti  20.20   SE +/- 0.04, N = 3
  GeForce GTX 1050      3.24   SE +/- 0.00, N = 3

  Build: (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ViennaCL

OpenCL LU Factorization

ViennaCL 1.4.2, OpenCL LU Factorization (GFLOPS, more is better)

  GeForce GTX 1060     59.01   SE +/- 0.01, N = 3
  GeForce GTX 1070     64.27   SE +/- 0.03, N = 3
  GeForce GTX 1070 Ti  66.49   SE +/- 0.02, N = 3
  GeForce GTX 1050 Ti  49.55   SE +/- 0.01, N = 3
  GeForce GTX 1080 Ti  69.22   SE +/- 0.37, N = 3
  GeForce GTX 1050     45.05   SE +/- 0.01, N = 3

  Build: (CXX) g++ options: -rdynamic -lOpenCL


Phoronix Test Suite v10.8.4