oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark onednn.
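
For scripted runs, the basic command can be wrapped in a small launcher. The following is a minimal Python sketch, not part of the test profile: the helper names `build_command` and `run_benchmark` are hypothetical, and it assumes the `phoronix-test-suite` executable is available on the PATH.

```python
import shutil
import subprocess


def build_command(profile="onednn"):
    """Return the argument list for the basic benchmark command."""
    return ["phoronix-test-suite", "benchmark", profile]


def run_benchmark(profile="onednn"):
    """Run the benchmark if the Phoronix Test Suite is installed.

    Returns the process exit code, or None when the executable
    cannot be found on the PATH (hypothetical helper).
    """
    command = build_command(profile)
    if shutil.which(command[0]) is None:
        return None
    # stdin/stdout are inherited, so the suite's interactive prompts still work.
    return subprocess.run(command).returncode


# Show the command that would be executed, without running it.
print(" ".join(build_command()))
```

Calling `run_benchmark()` starts the actual benchmark, which takes several minutes, so the sketch only prints the command by default.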

Project Site

github.com

Test Created

17 June 2020

Last Updated

20 December 2020

Test Maintainer

Michael Larabel

Test Type

Processor

Average Install Time

7 Minutes, 36 Seconds

Average Run Time

2 Minutes, 2 Seconds

Test Dependencies

C/C++ Compiler Toolchain + CMake

Accolades

5k+ Downloads

Supported Platforms


[Chart: oneDNN Popularity Statistics (pts/onednn), OpenBenchmarking.org. Monthly events from 2020.06 through 2021.02 for Public Result Uploads, Reported Installs*, and Test Completions*.]
* Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
Data current as of Thu, 25 Feb 2021 16:14:55 GMT.

Harness Option Popularity (OpenBenchmarking.org)
Deconvolution Batch shapes_3d: 11.3%
Recurrent Neural Network Training: 15.9%
IP Shapes 1D: 11.3%
Convolution Batch Shapes Auto: 11.5%
Matrix Multiply Batch Shapes Transformer: 11.4%
Recurrent Neural Network Inference: 15.8%
IP Shapes 3D: 11.3%
Deconvolution Batch shapes_1d: 11.4%

Data Type Option Popularity (OpenBenchmarking.org)
bf16bf16bf16: 14.1%
u8s8f32: 40.0%
f32: 46.0%

Revision History

pts/onednn-1.6.1   [View Source]   Sun, 20 Dec 2020 09:58:16 GMT
This test profile builds and works fine on macOS so enable it (MacOSX).

pts/onednn-1.6.0   [View Source]   Wed, 09 Dec 2020 13:47:31 GMT
Update against oneDNN 2.0 upstream.

pts/onednn-1.5.0   [View Source]   Wed, 17 Jun 2020 16:26:39 GMT
Initial commit of the oneDNN test profile based on Intel oneDNN 1.5, forked from the existing mkl-dnn test profile, which was named for MKL-DNN before the library was renamed to DNNL and then oneDNN. A new test profile was created to match Intel's naming convention.

Suites Using This Test

Multi-Core

Machine Learning

Intel oneAPI

CPU Massive

Server CPU Tests

Creator Workloads

HPC - High Performance Computing


Performance Metrics

Analyze Test Configuration:

oneDNN 2.0

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

OpenBenchmarking.org metrics for this test profile configuration based on 464 public results since 9 December 2020 with the latest data as of 25 February 2021.

Below is an overview of generalized performance for components that have enough user-uploaded results to be statistically significant. Keep in mind that OS configurations can differ vastly, particularly in the Linux/open-source space, so this overview is intended only as general guidance on performance expectations.

Component   Percentile Rank   # Matching Public Results   ms (Average)
            81st              10                          0.9 +/- 0.1
Mid-Tier    75th                                          > 1.2
            74th              7                           1.3 +/- 0.1
            66th              4                           2.3 +/- 0.1
            66th              6                           2.5 +/- 0.1
            63rd              4                           2.6 +/- 0.1
            55th              7                           3.8 +/- 0.1
            55th              3                           3.8 +/- 0.4
            54th              6                           3.9 +/- 0.1
Median      50th                                          4.0
            46th              3                           4.2 +/- 0.1
            46th              5                           4.2 +/- 0.2
            37th              3                           4.9 +/- 0.1
            32nd              9                           5.3 +/- 0.1
Low-Tier    25th                                          > 5.9
            24th              6                           6.1 +/- 0.2
            21st              3                           6.6 +/- 0.2
            19th              3                           6.7 +/- 0.5
            15th              5                           7.3 +/- 0.3
            15th              3                           7.3 +/- 0.8
            15th              6                           7.5 +/- 0.5
            13th              7                           7.9 +/- 0.2
            11th              3                           8.4 +/- 0.4
            11th              3                           8.9 +/- 0.8
            10th              3                           9.8 +/- 0.6
            8th               3                           11.4 +/- 0.1
            6th               3                           16.4 +/- 0.1
            4th               3                           23.3 +/- 0.1
            2nd               3                           26.6
            2nd               3                           28.7 +/- 0.6
            1st               4                           40.1 +/- 0.3
[Chart: Distribution Of Public Results (OpenBenchmarking.org) - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU. 464 results range from 0 to 41 ms.]
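
The percentile ranks listed for this configuration can be read as the share of results that a given result beats. As a rough illustration only, the Python sketch below ranks a time in ms (lower is better) against a list of reference values. The function name `percentile_rank` is hypothetical, the reference list reuses a few of the component averages listed above rather than the 464 raw public results, and the exact formula used by OpenBenchmarking.org is not stated here, so the formula is an assumption.

```python
def percentile_rank(result_ms, reference_ms):
    """Percent of reference results that are slower than result_ms.

    Lower times are better, so a fast result lands in a high percentile.
    Assumed formula; OpenBenchmarking.org may compute ranks differently.
    """
    if not reference_ms:
        raise ValueError("reference_ms must not be empty")
    slower = sum(1 for value in reference_ms if value > result_ms)
    return 100.0 * slower / len(reference_ms)


# A few component averages (ms) taken from the overview above.
sample_ms = [0.9, 1.3, 2.5, 3.8, 4.2, 5.3, 6.1, 7.9, 11.4, 40.1]

print(percentile_rank(4.0, sample_ms))
```

With so few reference values the ranks are coarse; a real ranking would use every public result for the configuration.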

Based on OpenBenchmarking.org data, the selected test / test configuration (oneDNN 2.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU) has an average run-time of 2 minutes. By default this test profile is set to run at least 3 times, but the run count may increase if the standard deviation exceeds pre-defined defaults or other calculations deem additional runs necessary for greater statistical accuracy of the result.
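
The run-count behavior described above can be sketched as a loop that keeps running until the results are stable. This is a minimal Python sketch under stated assumptions: the names `relative_std_dev` and `run_until_stable`, the 3.5% deviation threshold, and the cap of 10 runs are illustrative choices rather than values taken from this test profile, and the "other calculations" mentioned above are not modeled.

```python
import statistics


def relative_std_dev(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if len(values) < 2 or mean == 0:
        return 0.0
    return 100.0 * statistics.stdev(values) / mean


def run_until_stable(run_once, min_runs=3, max_runs=10, threshold_pct=3.5):
    """Call run_once() at least min_runs times, then keep going while the
    relative standard deviation exceeds threshold_pct (up to max_runs).

    The threshold and the cap are assumed values for illustration.
    """
    results = [run_once() for _ in range(min_runs)]
    while len(results) < max_runs and relative_std_dev(results) > threshold_pct:
        results.append(run_once())
    return results


# Canned timings (ms) standing in for real benchmark runs.
timings = iter([4.0, 4.1, 5.2, 4.0, 4.1, 4.0, 4.05, 4.0, 4.1, 4.0])
results = run_until_stable(lambda: next(timings))
print(len(results), round(relative_std_dev(results), 2))
```

The outlier in the third canned run pushes the deviation above the threshold, so the loop collects extra runs before stopping.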

[Chart: Time Required To Complete Benchmark, in minutes (OpenBenchmarking.org) - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU. Run-Time: Min: 1 / Avg: 1.02 / Max: 3.]

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.4%.

[Chart: Average Deviation Between Runs, in percent, fewer is better (OpenBenchmarking.org) - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU. Deviation: Min: 0 / Avg: 0.43 / Max: 11.]

Does It Scale Well With Increasing Cores?

Yes, based on automated analysis of the collected public benchmark data, this test / test configuration generally scales well with increasing CPU core counts. The data is based on publicly available results for this test / test configuration, separated by vendor, with each result divided by the reference CPU clock speed, grouped by matching physical CPU core count, and normalized against the smallest core count tested from each vendor, for each CPU having a sufficient number of test samples and statistically significant data.

[Chart: oneDNN CPU Core Scaling, Relative Core Scaling To Base, AMD and Intel (OpenBenchmarking.org) - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU.]
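
The normalization steps described for the core scaling analysis can be sketched roughly as follows. This Python example is an illustration under assumptions, not the OpenBenchmarking.org implementation: the function name `core_scaling` and the sample data are made up, and because this test reports time in ms (lower is better), the sketch assumes each result is first converted to a rate (1 / ms) before being divided by the clock speed.

```python
from collections import defaultdict
from statistics import mean


def core_scaling(samples):
    """Compute per-vendor scaling relative to the smallest core count.

    samples: iterable of (vendor, cores, clock_ghz, result_ms) tuples.
    Assumption: each ms result is turned into a rate (1 / ms) and then
    divided by the reference clock speed.
    """
    # Group clock-normalized scores by vendor and physical core count.
    grouped = defaultdict(list)
    for vendor, cores, clock_ghz, result_ms in samples:
        grouped[(vendor, cores)].append((1.0 / result_ms) / clock_ghz)

    per_vendor = defaultdict(dict)
    for (vendor, cores), scores in grouped.items():
        per_vendor[vendor][cores] = mean(scores)

    # Normalize against the smallest core count seen for each vendor.
    scaling = {}
    for vendor, by_cores in per_vendor.items():
        base = by_cores[min(by_cores)]
        scaling[vendor] = {c: by_cores[c] / base for c in sorted(by_cores)}
    return scaling


# Made-up sample data for illustration only (not OpenBenchmarking.org results).
samples = [
    ("VendorA", 4, 4.0, 8.0),
    ("VendorA", 8, 4.0, 4.0),
    ("VendorB", 6, 3.0, 6.0),
    ("VendorB", 12, 3.0, 4.0),
]
print(core_scaling(samples))
```

A value of 2.0 at twice the base core count would indicate linear scaling; values below that indicate diminishing returns.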

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

Instruction Set: SSE2 (SSE2)
Support: Used by default on supported hardware.
Instructions Detected: MOVAPD MOVDQU MOVD CVTSS2SD MOVDQA CVTSI2SD DIVSD ADDSD UCOMISD COMISD MULSD CVTSD2SS MAXSD SQRTSD SUBSD MINSD MOVUPD ADDPD PUNPCKLQDQ PSRLDQ PUNPCKHQDQ SHUFPD PSHUFD CVTTSD2SI PADDQ PSHUFLW CVTDQ2PS CVTTPS2DQ MULPD MOVHPD UNPCKHPD MOVMSKPD CMPNLESD XORPD

Instruction Set: SSE3 (SSE3)
Support: Used by default on supported hardware.
Instructions Detected: MOVDDUP

Instruction Set: SSSE3 (SSSE3)
Support: Used by default on supported hardware.
Instructions Detected: PSHUFB PALIGNR

Instruction Set: AVX (AVX)
Support: Requires passing a supported compiler/build flag (verified with targets: sandybridge, skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VZEROUPPER VEXTRACTF128 VINSERTF128 VBROADCASTSS VMASKMOVPS VPERM2F128 VBROADCASTSD VPERMILPS

Instruction Set: AVX2 (AVX2)
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Haswell (2013). Found on AMD processors since Excavator (2016).
Instructions Detected: VPERM2I128 VPSLLVD VEXTRACTI128 VPBROADCASTD VINSERTI128 VPSRAVD VPMASKMOVQ VPERMQ VPBROADCASTQ VGATHERQPS VPBROADCASTW VPSRLVQ VGATHERDPS VPBROADCASTB VPGATHERQQ VPGATHERQD VPSLLVQ VPGATHERDD VPGATHERDQ VPERMD

Instruction Set: FMA (FMA)
Support: Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Haswell (2013). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VFMADD213SS VFMADD132SS VFMADD231SS VFMADD132PS VFMADD231PS VFMADD213PS VFNMSUB231SS VFNMADD132SS VFNMSUB132SS VFNMADD231SS VFNMADD231PS VFMADD132SD VFMADD231SD VFNMADD213SS VFMSUB231SS VFMSUB231PS VFMADD231PD VFMADD213PD VFMADD132PD VFMSUB231SD VFNMSUB231PS VFNMADD132PS VFNMSUB132PS

The test / benchmark does honor compiler flag changes.
Last automated analysis: 31 January 2021
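
To see which of these instruction set extensions a given Linux machine reports, one option is to read the CPU flags. The sketch below is an illustration, not the analysis performed by the Phoronix Test Suite / OpenBenchmarking.org analytics engine. It assumes an x86 Linux system that exposes `/proc/cpuinfo`, and the helper names `parse_flags` and `supported_extensions` are hypothetical.

```python
import os

# Flag names as they appear in /proc/cpuinfo on x86 Linux
# (SSE3 is reported there as "pni").
EXTENSION_FLAGS = {
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
}


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Map each extension name to True/False based on the CPU flags."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}


# Only attempt to read the file where it exists (Linux).
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        print(supported_extensions(handle.read()))
```

On non-x86 systems the `flags` line is absent, so every extension is reported as unsupported.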

This test profile binary relies on the shared libraries libdnnl.so.2, libpthread.so.0, libm.so.6, libgomp.so.1, libc.so.6, libdl.so.2.

Recent Test Results


5 Systems - 42 Benchmark Results

Ampere eMAG ARMv8 - AmpereComputing OSPREY - Applied Micro Circuits X-Gene

Ubuntu 20.04 - 5.7.0-050700-generic - GNOME Shell 3.36.3

1 System - 144 Benchmark Results

Intel Core i9-10885H - HP 8736 - Intel Comet Lake PCH

Ubuntu 20.10 - 5.8.0-43-generic - GNOME Shell 3.38.2

1 System - 2393 Benchmark Results

AMD Ryzen 7 PRO 4750G - ASRock A520M-ITX/ac - AMD Renoir Root Complex

Gentoo - 5.10.16 - amd

5 Systems - 131 Benchmark Results

Intel Core i7-8086K - ASUS PRIME Z370-A - Intel 8th Gen Core

Ubuntu 20.04 - 5.9.0-050900rc8daily20201009-generic - GNOME Shell 3.36.4

2 Systems - 151 Benchmark Results

Intel Core i7-10700T - Logic Supply RXM-181 - Intel Comet Lake PCH

Ubuntu 20.10 - 5.8.0-41-generic - GNOME Shell 3.38.2

1 System - 2388 Benchmark Results

AMD Ryzen 7 PRO 4750G - ASRock A520M-ITX/ac - AMD Renoir Root Complex

Gentoo - 5.10.16 - amd

1 System - 2326 Benchmark Results

AMD Ryzen 7 PRO 4750G - ASRock A520M-ITX/ac - AMD Renoir Root Complex

Gentoo - 5.10.15 - GCC 10.2.0 + Clang 11.0.0 + LLVM 11.0.0

1 System - 158 Benchmark Results

AMD Ryzen 7 3700X 8-Core - MSI A520M-A PRO - AMD Starship

Fedora 33 - 5.10.14-200.fc33.x86_64 - Clang 11.0.0

1 System - 2276 Benchmark Results

1 System - 167 Benchmark Results

AMD Ryzen Threadripper 2950X 16-Core - ASRock X399 Professional Gaming - AMD 17h

Ubuntu 16.04 - 4.19.174-custom - X Server 1.19.6

6 Systems - 143 Benchmark Results

AMD Ryzen 9 5900X 12-Core - ASUS ROG CROSSHAIR VIII HERO - AMD Starship

Ubuntu 20.10 - 5.10.7-051007-generic - GNOME Shell 3.38.1

4 Systems - 133 Benchmark Results

Intel Xeon E3-1260L v5 - ASRock E3V5 WS - Intel Xeon E3-1200 v5

Ubuntu 20.10 - 5.8.0-33-generic - GNOME Shell 3.38.1

Most Popular Test Results


2 Systems - 151 Benchmark Results

Intel Core i7-10700T - Logic Supply RXM-181 - Intel Comet Lake PCH

Ubuntu 20.10 - 5.8.0-41-generic - GNOME Shell 3.38.2

5 Systems - 42 Benchmark Results

Ampere eMAG ARMv8 - AmpereComputing OSPREY - Applied Micro Circuits X-Gene

Ubuntu 20.04 - 5.7.0-050700-generic - GNOME Shell 3.36.3

3 Systems - 74 Benchmark Results

2 x AMD EPYC 7601 32-Core - Dell 02MJ3T - AMD 17h

Ubuntu 19.10 - 5.9.0-050900rc6daily20200922-generic - GNOME Shell 3.34.1

3 Systems - 191 Benchmark Results

AMD Ryzen 3 2200G - ASUS PRIME B350M-E - AMD Raven

Ubuntu 20.10 - 5.8.0-38-generic - GNOME Shell 3.38.1

4 Systems - 80 Benchmark Results

Intel Core i7-9750H - Notebook P95_96_97Ex Rx - Intel Cannon Lake PCH

Ubuntu 20.04 - 5.7.0-999-generic - GNOME Shell 3.36.4

3 Systems - 376 Benchmark Results

2 x AMD EPYC 7F72 24-Core - Supermicro H11DSi-NT v2.00 - AMD Starship

Ubuntu 20.10 - 5.11.0-051100rc4daily20210122-generic - GNOME Shell 3.38.1

3 Systems - 19 Benchmark Results

AMD Ryzen Threadripper 3990X 64-Core - System76 Thelio Major - AMD Starship

Pop 20.10 - 5.8.0-7625-generic - GNOME Shell 3.38.1

4 Systems - 104 Benchmark Results

AMD Ryzen 5 2400G - MSI B350M GAMING PRO - AMD Raven

Ubuntu 19.10 - 5.3.0-64-generic - GNOME Shell 3.34.1

4 Systems - 133 Benchmark Results

Intel Xeon E3-1260L v5 - ASRock E3V5 WS - Intel Xeon E3-1200 v5

Ubuntu 20.10 - 5.8.0-33-generic - GNOME Shell 3.38.1

2 Systems - 85 Benchmark Results

Intel Core i7-10510U - LENOVO 20U9CTO1WW - Intel Comet Lake PCH-LP

Fedora 33 - 5.9.16-200.fc33.x86_64 - KDE Plasma 5.20.4
