Coffeelake Beignet vs. OpenCL NEO Intel

Intel Core i7-8700K testing of OpenCL Linux drivers by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 1905107-HV-COFFEELAK00
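
For scripted comparisons, the command above could be wrapped roughly as follows. This is a minimal sketch, not part of the Phoronix Test Suite: the helper names build_compare_command and run_comparison are hypothetical, and it assumes the phoronix-test-suite executable is on PATH. Only the sub-command and result identifier come from this page.

```python
import shutil
import subprocess

# Result file identifier quoted on this page.
RESULT_ID = "1905107-HV-COFFEELAK00"


def build_compare_command(result_id=RESULT_ID):
    """Build the argument list for the comparison command quoted above."""
    return ["phoronix-test-suite", "benchmark", result_id]


def run_comparison(result_id=RESULT_ID):
    """Run the comparison if the suite is installed; return its exit code.

    Returns None when the executable is not found on PATH (assumption:
    the suite is installed under its usual name).
    """
    if shutil.which("phoronix-test-suite") is None:
        return None
    # The suite may prompt for input, so stdin/stdout are left attached.
    return subprocess.run(build_compare_command(result_id), check=False).returncode


if __name__ == "__main__":
    print(" ".join(build_compare_command()))
```

Calling run_comparison() starts the full benchmark run, which took roughly two to three and a half hours per driver in the runs reported here.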

Run Management

Result Identifier    Date Run       Test Duration
Beignet Git          May 10 2019    3 Hours, 24 Minutes
Intel OpenCL NEO     May 10 2019    1 Hour, 57 Minutes
Average                             2 Hours, 41 Minutes


Coffeelake Beignet vs. OpenCL NEO Intel - System Details

Values apply to both runs (Beignet Git and Intel OpenCL NEO) unless a run is named in brackets.

Processor:          Intel Core i7-8700K @ 4.70GHz (6 Cores / 12 Threads)
Motherboard:        ASUS TUF Z370-PLUS GAMING (1802 BIOS)
Chipset:            Intel 8th Gen Core
Memory:             16384MB
Disk:               128GB THNSN5128GPU7 TOSHIBA
Graphics:           inteldrmfb (1200MHz) [Beignet Git]
                    Intel UHD 630 3GB (1200MHz) [Intel OpenCL NEO]
Audio:              Realtek ALC887-VD
Monitor:            DELL P2415Q
Network:            Intel I219-V
OS:                 Ubuntu 18.04
Kernel:             4.18.0-18-generic (x86_64)
Desktop:            GNOME Shell 3.28.3
Display Server:     X Server 1.20.1
Display Driver:     modesetting 1.20.1
OpenGL:             4.5 Mesa 18.2.8
OpenCL:             OpenCL 2.0 beignet 1.4 (git-fc5f430c) [Beignet Git]
                    OpenCL 2.1 [Intel OpenCL NEO]
Compiler:           GCC 7.4.0 + Clang 6.0.0-1ubuntu2 + LLVM 6.0.0
File-System:        ext4
Screen Resolution:  3840x2160

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave
Python Details: Python 2.7.15rc1 + Python 3.6.7
Security Details: KPTI + __user pointer sanitization + Full generic retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + SSB disabled via prctl and seccomp + PTE Inversion

Beignet Git vs. Intel OpenCL NEO Comparison (Phoronix Test Suite)

Advantage of the faster driver over the slower one, per test:

Test                                      Faster Driver       Advantage
clpeak: Single-Precision Float            Intel OpenCL NEO    751.3%
LuxMark: GPU - Hotel                      Intel OpenCL NEO    3731.3%
Rodinia: OpenCL Heartwall                 Intel OpenCL NEO    284%
SHOC: OpenCL - Max SP Flops               Intel OpenCL NEO    216.1%
ViennaCL: OpenCL LU Factorization         Intel OpenCL NEO    149.2%
LeelaChessZero: OpenCL                    Intel OpenCL NEO    102.5%
SHOC: OpenCL - Triad                      Intel OpenCL NEO    51.2%
SHOC: OpenCL - Bus Speed Download         Intel OpenCL NEO    44.1%
SHOC: OpenCL - Bus Speed Readback         Beignet Git         33.5%
Parboil: OpenCL LBM                       Intel OpenCL NEO    13.6%
Blender: BMW27 - OpenCL                   Beignet Git         11.6%
SHOC: OpenCL - FFT SP                     Intel OpenCL NEO    10.5%
clpeak: Kernel Latency                    Beignet Git         9.8%
Parboil: OpenCL BFS                       Beignet Git         8.3%
clpeak: Global Memory Bandwidth           Intel OpenCL NEO    6%
Darktable: Server Rack - OpenCL           Beignet Git         5.6%
SHOC: OpenCL - MD5 Hash                   Intel OpenCL NEO    5.4%
SHOC: OpenCL - Texture Read Bandwidth     Beignet Git         4.6%
Blender: Fishy Cat - OpenCL               Intel OpenCL NEO    4.4%
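
The percentages in the comparison above appear consistent with a simple ratio of the two results for each test. The following is a minimal sketch of that arithmetic, assuming the advantage is (better / worse - 1) x 100 with the direction taken from each test's "More Is Better" or "Fewer Is Better" label. The function name relative_advantage is hypothetical, and this is not necessarily how OpenBenchmarking.org computes the chart. The input values are copied from the results on this page.

```python
NEO = "Intel OpenCL NEO"
BEIGNET = "Beignet Git"


def relative_advantage(neo, beignet, higher_is_better):
    """Return (faster_driver, percent_advantage) for one pair of results."""
    if neo == beignet:
        return None, 0.0
    # Decide the winner according to the test's direction.
    neo_wins = (neo > beignet) if higher_is_better else (neo < beignet)
    # The ratio is always >= 1, so the advantage is always positive.
    ratio = max(neo, beignet) / min(neo, beignet)
    return (NEO if neo_wins else BEIGNET), round((ratio - 1.0) * 100.0, 1)


# clpeak Single-Precision Float (GFLOPS, more is better)
print(relative_advantage(458.68, 53.88, higher_is_better=True))
# SHOC Bus Speed Readback (GB/s, more is better)
print(relative_advantage(29.27, 39.09, higher_is_better=True))
# Rodinia OpenCL Heartwall (seconds, fewer is better)
print(relative_advantage(9.16, 35.17, higher_is_better=False))
```

Taking the larger value over the smaller one keeps the figure symmetric: a result that is twice as good reads as 100% regardless of which driver produced it.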

Coffeelake Beignet vs. OpenCL NEO Intel - Result Overview

Test                                              Beignet Git    Intel OpenCL NEO
darktable: Boat - OpenCL                          13.87          13.87
darktable: Masskrug - OpenCL                      6.48           6.48
darktable: Server Rack - OpenCL                   0.18           0.19
darktable: Server Room - OpenCL                   4.75           4.78
lczero: OpenCL                                    52.21          105.73
shoc: OpenCL - FFT SP                             15.98          17.65
shoc: OpenCL - MD5 Hash                           0.37           0.39
shoc: OpenCL - Max SP Flops                       467            1476
shoc: OpenCL - Bus Speed Download                 20.48          29.51
shoc: OpenCL - Bus Speed Readback                 39.09          29.27
shoc: OpenCL - Texture Read Bandwidth             59.74          57.09
parboil: OpenCL BFS                               0.96           1.04
parboil: OpenCL LBM                               78.56          69.15
rodinia: OpenCL Heartwall                         35.17          9.16
blender: BMW27 - OpenCL                           413            461
blender: Classroom - OpenCL                       676            678
blender: Fishy Cat - OpenCL                       1014           971
blender: Barbershop - OpenCL                      1355           1372
blender: Pabellon Barcelona - OpenCL              929            926
clpeak: Kernel Latency                            21.70          23.83
clpeak: Single-Precision Float                    53.88          458.68
clpeak: Global Memory Bandwidth                   31.41          33.28
clpeak: Transfer Bandwidth enqueueReadBuffer      15.42          15.34
clpeak: Transfer Bandwidth enqueueWriteBuffer     40.82          41.31
viennacl: OpenCL LU Factorization                 8.52           21.23
luxmark: GPU - Hotel                              16             613
xsbench-cl:                                       11455503       -
comd-cl: Average Atom Update Rate                 2.58           2.58
shoc: OpenCL - Triad                              9.20           13.91

Darktable

Darktable is an open-source photography / workflow application. This test will use any system-installed Darktable program or, on Windows, will automatically download the pre-built binary from the project. Learn more via the OpenBenchmarking.org test page.

Darktable 2.4.2 - Test: Boat - Acceleration: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 13.87 (SE +/- 0.05, N = 3)
  Beignet Git: 13.87 (SE +/- 0.03, N = 3)

Darktable 2.4.2 - Test: Masskrug - Acceleration: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 6.48 (SE +/- 0.01, N = 3)
  Beignet Git: 6.48 (SE +/- 0.01, N = 3)

Darktable 2.4.2 - Test: Server Rack - Acceleration: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 0.19 (SE +/- 0.00, N = 3)
  Beignet Git: 0.18 (SE +/- 0.00, N = 3)

Darktable 2.4.2 - Test: Server Room - Acceleration: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 4.78 (SE +/- 0.01, N = 3)
  Beignet Git: 4.75 (SE +/- 0.01, N = 3)
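
Each result on this page is reported with a standard error (SE) over N runs. As a rough illustration, the sketch below uses the conventional definition of the standard error of the mean, the sample standard deviation divided by the square root of N. Whether the Phoronix Test Suite uses exactly this formula is an assumption, and the run times in the example are hypothetical, not taken from this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("at least two runs are needed to estimate the SE")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds (NOT from this result file).
hypothetical_runs = [13.80, 13.90, 13.91]
mean = statistics.mean(hypothetical_runs)
se = standard_error(hypothetical_runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(hypothetical_runs)})")
```

A small SE relative to the mean suggests the runs were repeatable. Results on this page with a larger N presumably needed extra runs because the first few varied more.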

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero 0.20.1 - Backend: OpenCL (Nodes Per Second, More Is Better)
  Intel OpenCL NEO: 105.73 (SE +/- 0.42, N = 3)
  Beignet Git: 52.21 (SE +/- 3.79, N = 6)
  1. (CXX) g++ options: -lpthread -lOpenCL -lz

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
  Intel OpenCL NEO: 17.65 (SE +/- 0.03, N = 3)
  Beignet Git: 15.98 (SE +/- 0.00, N = 3)

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
  Intel OpenCL NEO: 0.39 (SE +/- 0.00, N = 3)
  Beignet Git: 0.37 (SE +/- 0.00, N = 3)

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better)
  Intel OpenCL NEO: 1476 (SE +/- 16.73, N = 15)
  Beignet Git: 467 (SE +/- 0.01, N = 3)

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better)
  Intel OpenCL NEO: 29.51 (SE +/- 0.15, N = 3)
  Beignet Git: 20.48 (SE +/- 0.09, N = 3)

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
  Intel OpenCL NEO: 29.27 (SE +/- 0.43, N = 4)
  Beignet Git: 39.09 (SE +/- 0.47, N = 6)

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
  Intel OpenCL NEO: 57.09 (SE +/- 0.00, N = 3)
  Beignet Git: 59.74 (SE +/- 0.30, N = 3)

  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Parboil

The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. Learn more via the OpenBenchmarking.org test page.

Parboil 2.5 - Test: OpenCL BFS (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 1.04 (SE +/- 0.01, N = 3)
  Beignet Git: 0.96 (SE +/- 0.00, N = 3)

Parboil 2.5 - Test: OpenCL LBM (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 69.15 (SE +/- 0.41, N = 3)
  Beignet Git: 78.56 (SE +/- 0.03, N = 3)

  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Rodinia

Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes the OpenCL and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.

Rodinia 2.4 - Test: OpenCL Heartwall (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 9.16 (SE +/- 0.04, N = 3)
  Beignet Git: 35.17 (SE +/- 2.44, N = 12)
  1. (CXX) g++ options: -O2 -lOpenCL

Blender

Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing via OpenCL or CUDA is supported. Learn more via the OpenBenchmarking.org test page.

Blender 2.79a - Blend File: BMW27 - Compute: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 461
  Beignet Git: 413

Blender 2.79a - Blend File: Classroom - Compute: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 678
  Beignet Git: 676

Blender 2.79a - Blend File: Fishy Cat - Compute: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 971
  Beignet Git: 1014

Blender 2.79a - Blend File: Barbershop - Compute: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 1372
  Beignet Git: 1355

Blender 2.79a - Blend File: Pabellon Barcelona - Compute: OpenCL (Seconds, Fewer Is Better)
  Intel OpenCL NEO: 926
  Beignet Git: 929

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

clpeak - OpenCL Test: Kernel Latency (us, Fewer Is Better)
  Intel OpenCL NEO: 23.83 (SE +/- 0.10, N = 3)
  Beignet Git: 21.70 (SE +/- 0.26, N = 3)

clpeak - OpenCL Test: Single-Precision Float (GFLOPS, More Is Better)
  Intel OpenCL NEO: 458.68 (SE +/- 0.07, N = 3)
  Beignet Git: 53.88 (SE +/- 5.07, N = 12)

clpeak - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
  Intel OpenCL NEO: 33.28 (SE +/- 0.09, N = 3)
  Beignet Git: 31.41 (SE +/- 2.22, N = 12)

clpeak - OpenCL Test: Transfer Bandwidth enqueueReadBuffer (GBPS, More Is Better)
  Intel OpenCL NEO: 15.34 (SE +/- 0.19, N = 5)
  Beignet Git: 15.42 (SE +/- 0.02, N = 3)

clpeak - OpenCL Test: Transfer Bandwidth enqueueWriteBuffer (GBPS, More Is Better)
  Intel OpenCL NEO: 41.31 (SE +/- 0.09, N = 3)
  Beignet Git: 40.82 (SE +/- 0.15, N = 3)

  1. (CXX) g++ options: -O3 -rdynamic -lOpenCL

ViennaCL

ViennaCL is an open-source linear algebra library written in C++ and with support for OpenCL and OpenMP. This test profile uses ViennaCL OpenCL support and runs the included computational benchmark. Learn more via the OpenBenchmarking.org test page.

ViennaCL 1.4.2 - OpenCL LU Factorization (GFLOPS, More Is Better)
  Intel OpenCL NEO: 21.23 (SE +/- 0.03, N = 3)
  Beignet Git: 8.52 (SE +/- 0.42, N = 12)
  1. (CXX) g++ options: -rdynamic -lOpenCL

LuxMark

LuxMark is a multi-platform OpenCL benchmark using LuxRender. LuxMark supports targeting different OpenCL devices and has multiple scenes available for rendering. LuxMark is a fully open-source OpenCL program with real-world rendering examples. Learn more via the OpenBenchmarking.org test page.

LuxMark 3.1 - OpenCL Device: GPU - Scene: Hotel (Score, More Is Better)
  Intel OpenCL NEO: 613 (SE +/- 6.08, N = 3)
  Beignet Git: 16 (SE +/- 13.09, N = 10)

Xsbench OpenCL

Xsbench benchmark in OpenCL via GPUOpen. Learn more via the OpenBenchmarking.org test page.

Xsbench OpenCL 2017-07-06 (Lookups/s, More Is Better)
  Beignet Git: 11455503 (SE +/- 2043.29, N = 3)
  1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -lm -lOpenCL

CoMD OpenCL

CoMD benchmark in OpenCL via GPUOpen. Learn more via the OpenBenchmarking.org test page.

CoMD OpenCL 2017-07-06 - Average Atom Update Rate (us/atom/task, More Is Better)
  Intel OpenCL NEO: 2.58 (SE +/- 0.00, N = 3)
  Beignet Git: 2.58 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -std=c99 -O5 -lm -lOpenCL

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2015-11-10 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
  Intel OpenCL NEO: 13.91 (SE +/- 0.06, N = 3)
  Beignet Git: 9.20 (SE +/- 0.42, N = 15)
  1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi