oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The reported result is the total perf time. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, before that, MKL-DNN; it was rebranded as part of the Intel oneAPI initiative.

To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark onednn.

Project Site

github.com

Test Created

17 June 2020

Last Updated

13 March 2021

Test Maintainer

Michael Larabel 

Test Type

Processor

Average Install Time

7 Minutes, 17 Seconds

Average Run Time

2 Minutes, 4 Seconds

Test Dependencies

C/C++ Compiler Toolchain + CMake

Accolades

10k+ Downloads

Supported Platforms


[Chart: oneDNN Popularity Statistics for pts/onednn (OpenBenchmarking.org), events per month from 2020.06 to 2021.10. Series: Public Result Uploads *, Reported Installs **, Reported Test Completions **, Test Profile Page Views ***.]
* Uploading of benchmark result data to OpenBenchmarking.org is always optional (opt-in) via the Phoronix Test Suite for users wishing to share their results publicly.
** Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform.
*** Test profile page view reporting began March 2021.
Data current as of 17 October 2021.
Harness Option Popularity (OpenBenchmarking.org)
IP Shapes 1D: 11.3%
Recurrent Neural Network Training: 15.4%
Matrix Multiply Batch Shapes Transformer: 11.6%
Recurrent Neural Network Inference: 15.5%
Deconvolution Batch shapes_3d: 11.8%
Convolution Batch Shapes Auto: 11.9%
IP Shapes 3D: 11.2%
Deconvolution Batch shapes_1d: 11.1%

Data Type Option Popularity (OpenBenchmarking.org)
bf16bf16bf16: 16.6%
u8s8f32: 41.0%
f32: 42.4%

Revision History

pts/onednn-1.7.0   [View Source]   Sat, 13 Mar 2021 07:49:33 GMT
Update against oneDNN 2.1.2 upstream.

pts/onednn-1.6.1   [View Source]   Sun, 20 Dec 2020 09:58:16 GMT
This test profile builds and works fine on macOS so enable it (MacOSX).

pts/onednn-1.6.0   [View Source]   Wed, 09 Dec 2020 13:47:31 GMT
Update against oneDNN 2.0 upstream.

pts/onednn-1.5.0   [View Source]   Wed, 17 Jun 2020 16:26:39 GMT
Initial commit of the oneDNN test profile, based on Intel oneDNN 1.5. It is forked from the existing mkl-dnn test profile, which took its name from MKL-DNN before the library was renamed to DNNL and then oneDNN; this new test profile was created to match Intel's naming convention.

Suites Using This Test

Multi-Core

Machine Learning

Intel oneAPI

CPU Massive

Server CPU Tests

Creator Workloads

HPC - High Performance Computing


Performance Metrics

Analyze Test Configuration:

oneDNN 2.1.2

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.org metrics for this test profile configuration based on 490 public results since 13 March 2021 with the latest data as of 21 October 2021.

Below is an overview of the generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. It is important to keep in mind particularly in the Linux/open-source space there can be vastly different OS configurations, with this overview intended to offer just general guidance as to the performance expectations.

Component                  Percentile Rank   # Compatible Public Results   ms (Average)
Mid-Tier                   75th                                            > 1
Median                     50th                                            13
Low-Tier                   25th                                            > 21
(component name missing)   14th              3                             23 +/- 1
[Chart: Distribution Of Public Results - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. 477 results ranging from 0 to 507 ms. Source: OpenBenchmarking.org.]
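As a rough illustration of how a percentile rank could be derived for a "fewer is better" metric such as ms, the sketch below counts the share of public results that are slower than a given result. This is an assumption about the general idea, not the actual OpenBenchmarking.org calculation; the function name `percentile_rank` and the sample timings are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(result_ms, public_results_ms):
    """Return the percentage of public results slower than result_ms.

    Assumes lower ms is better, so a higher rank means a faster result.
    Hypothetical helper; not the OpenBenchmarking.org implementation.
    """
    ordered = sorted(public_results_ms)
    # Everything to the right of result_ms in sorted order is strictly slower.
    slower = len(ordered) - bisect_right(ordered, result_ms)
    return 100.0 * slower / len(ordered)


# Hypothetical timings in ms, not real uploaded data.
sample = [1.0, 5.0, 13.0, 21.0, 23.0, 40.0, 120.0, 507.0]
print(percentile_rank(23.0, sample))
```

With real data, the tier boundaries (75th, 50th, 25th) would simply be the timings at which this rank crosses those values.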

Based on OpenBenchmarking.org data, the selected test / test configuration (oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU) has an average run-time of 2 minutes. By default this test profile is set to run at least 3 times but may increase if the standard deviation exceeds pre-defined defaults or other calculations deem additional runs necessary for greater statistical accuracy of the result.

[Chart: Time Required To Complete Benchmark, in minutes - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. Run-Time Min: 1 / Avg: 1 / Max: 1. Source: OpenBenchmarking.org.]
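The "at least 3 runs, more if the deviation is too high" behavior described above can be sketched as a simple loop. This is only a minimal illustration: the function name `run_until_stable`, the 3.5% threshold, and the cap of 15 runs are assumptions chosen for the example, not the Phoronix Test Suite's actual defaults or logic.

```python
import statistics


def run_until_stable(run_once, min_runs=3, max_runs=15, max_rel_stddev_pct=3.5):
    """Collect timings until the relative standard deviation is small enough.

    run_once: callable returning one timing (e.g. ms).
    The threshold and run cap are hypothetical values for illustration.
    """
    results = [run_once() for _ in range(min_runs)]
    while len(results) < max_runs:
        mean = statistics.mean(results)
        if mean == 0:
            break  # avoid division by zero on degenerate timings
        rel_stddev_pct = 100.0 * statistics.stdev(results) / mean
        if rel_stddev_pct <= max_rel_stddev_pct:
            break  # runs agree closely enough; stop here
        results.append(run_once())  # too noisy; take one more run
    return results


# Hypothetical timings standing in for real benchmark runs.
timings = iter([10.0, 10.1, 9.9, 10.0])
runs = run_until_stable(lambda: next(timings))
print(len(runs), statistics.mean(runs))
```

A stable set of timings stops at the minimum of three runs, while a noisy set keeps adding runs until it settles or the cap is reached.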

Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.3%.

[Chart: Average Deviation Between Runs, in percent (fewer is better) - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. Deviation Min: 0 / Avg: 0.31 / Max: 9. Source: OpenBenchmarking.org.]

Does It Scale Well With Increasing Cores?

No. Based on the automated analysis of the collected public benchmark data, this test configuration does not generally scale well with increasing CPU core counts. The analysis uses publicly available results for this configuration: results are separated by vendor, divided by the reference CPU clock speed, grouped by matching physical CPU core count, and normalized against the smallest core count tested from each vendor. Only CPUs with a sufficient number of test samples and statistically significant data are included.

[Chart: oneDNN CPU Core Scaling, relative core scaling to base - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU. Series: AMD, Intel. Core counts: 4, 6, 8, 12, 16, 32, 64. Source: OpenBenchmarking.org.]
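The normalization steps described above (separate by vendor, divide by clock speed, group by core count, normalize against the smallest core count) can be sketched roughly as follows. The function name `core_scaling`, the tuple layout, and the sample values are hypothetical. Expressing the final ratio as base divided by value, so that a larger number means a faster result for this lower-is-better metric, is also an assumption; the source does not state the direction of the ratio.

```python
from collections import defaultdict
from statistics import mean


def core_scaling(samples):
    """Compute relative scaling per (vendor, core count).

    samples: iterable of (vendor, cores, clock_ghz, result_ms) tuples.
    Hypothetical sketch; not the OpenBenchmarking.org analytics code.
    """
    grouped = defaultdict(list)
    for vendor, cores, clock_ghz, result_ms in samples:
        # Result divided by the reference clock speed, as the text describes.
        grouped[(vendor, cores)].append(result_ms / clock_ghz)

    # Average all samples that share a vendor and physical core count.
    averaged = {key: mean(values) for key, values in grouped.items()}

    scaling = {}
    for vendor in {v for v, _ in averaged}:
        counts = sorted(c for v, c in averaged if v == vendor)
        base = averaged[(vendor, counts[0])]  # smallest core count per vendor
        for cores in counts:
            # Assumption: lower ms is better, so base / value > 1 means faster.
            scaling[(vendor, cores)] = base / averaged[(vendor, cores)]
    return scaling


# Hypothetical samples, not real uploaded results.
demo = [
    ("Intel", 4, 4.0, 40.0),
    ("Intel", 8, 4.0, 24.0),
    ("AMD", 6, 3.0, 30.0),
    ("AMD", 6, 3.0, 36.0),
    ("AMD", 12, 3.0, 20.0),
]
for key, value in sorted(core_scaling(demo).items()):
    print(key, round(value, 3))
```

In this sketch, a test that scaled perfectly would show the ratio doubling whenever the core count doubles; values that grow much more slowly correspond to the "does not scale well" verdict.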

Notable Instruction Set Usage

Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.

Instruction Set: Advanced Vector Extensions (AVX)
Support: Used by default on supported hardware. Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VZEROUPPER VBROADCASTSS VEXTRACTF128 VINSERTF128 VMASKMOVPS VPERM2F128 VBROADCASTSD VPERMILPS

Instruction Set: Advanced Vector Extensions 2 (AVX2)
Support: Used by default on supported hardware. Found on Intel processors since Haswell (2013). Found on AMD processors since Excavator (2016).
Instructions Detected: VINSERTI128 VEXTRACTI128 VPERM2I128 VPSLLVD VPBROADCASTD VPSRAVD VPMASKMOVQ VPERMQ VPBROADCASTQ VPBROADCASTW VPSRLVQ VPBROADCASTB VPSLLVQ VGATHERQPS VPGATHERQQ VPGATHERQD VGATHERDPS VPGATHERDQ VPGATHERDD VPERMD

Instruction Set: FMA (FMA)
Support: Used by default on supported hardware. Found on Intel processors since Haswell (2013). Found on AMD processors since Bulldozer (2011).
Instructions Detected: VFMADD231SS VFMADD132SS VFMADD213SS VFMADD132PS VFMADD231PS VFMADD213PS VFNMSUB231SS VFNMADD132SS VFNMSUB132SS VFNMADD231SS VFNMADD231PS VFNMADD213SS VFMADD132SD VFMSUB231SS VFMADD231SD VFMSUB231PS VFMADD231PD VFMADD213PD VFMADD132PD VFMSUB231SD VFNMSUB231PS VFNMADD132PS VFNMSUB132PS

The test / benchmark does honor compiler flag changes.
Last automated analysis: 10 May 2021

This test profile binary relies on the shared libraries libdnnl.so.2, libpthread.so.0, libm.so.6, libgomp.so.1, libc.so.6, libdl.so.2.

Recent Test Results

3 Systems - 90 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 74 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 70 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 60 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 108 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 96 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 59 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 58 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 95 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 45 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 63 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 41 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 59 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

3 Systems - 40 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

2 Systems - 55 Benchmark Results

ARMv8 rev 0 - e3360_1099 - 32GB

Ubuntu 20.04 - 5.10.41-tegra - X Server

Most Popular Test Results

8 Systems - 439 Benchmark Results

Intel Core i9-10900K - Gigabyte Z490 AORUS MASTER - Intel Comet Lake PCH

Ubuntu 21.04 - 5.12.0-051200rc3daily20210315-generic - GNOME Shell 3.38.3

3 Systems - 104 Benchmark Results

Intel Core i7-2700K - BIOSTAR B75MU3B v5.0 - Intel 2nd Generation Core DRAM

Ubuntu 20.04 - 5.9.1-050901-generic - GNOME Shell 3.34.1

4 Systems - 179 Benchmark Results

AMD Ryzen 5 5600X 6-Core - ASRock X570 Taichi - AMD Starship

Ubuntu 20.04 - 5.10.13-051013-lowlatency - GNOME Shell 3.36.4

3 Systems - 52 Benchmark Results

Intel Xeon E3-1280 v5 - MSI Z170A SLI PLUS - Intel Xeon E3-1200 v5

Ubuntu 20.04 - 5.9.0-050900rc2daily20200826-generic - GNOME Shell 3.36.4

3 Systems - 189 Benchmark Results

AMD EPYC 7763 64-Core - Supermicro H12SSL-i v1.01 - AMD Starship

Ubuntu 20.04 - 5.12.0-051200rc6daily20210408-generic - GNOME Shell 3.36.4

3 Systems - 31 Benchmark Results

Intel Core i5-8400 - MSI Z370M MORTAR - Intel 8th Gen Core

Ubuntu 20.04 - 5.9.0-050900rc7daily20200929-generic - GNOME Shell 3.36.4

3 Systems - 178 Benchmark Results

Intel Core i7-4770K - Gigabyte Z97-HD3 - Intel 4th Gen Core DRAM

Ubuntu 20.10 - 5.8.0-43-generic - GNOME Shell 3.38.1

3 Systems - 29 Benchmark Results

AMD Ryzen 7 3700X 8-Core - Gigabyte A320M-S2H-CF - AMD Starship

Ubuntu 20.04 - 5.8.1-050801-generic - GNOME Shell 3.36.4

3 Systems - 24 Benchmark Results

Intel Core i9-10980XE - ASRock X299 Steel Legend - Intel Sky Lake-E DMI3 Registers

Ubuntu 20.04 - 5.12.0-051200rc1daily20210305-generic - GNOME Shell 3.36.4

2 Systems - 354 Benchmark Results

AMD Ryzen Threadripper 3990X 64-Core - Gigabyte TRX40 AORUS PRO WIFI - AMD Starship

Pop 21.04 - 5.11.0-7620-generic - GNOME Shell 3.38.4

3 Systems - 49 Benchmark Results

Intel Core i7-8565U - Dell 0KTW76 - Intel Cannon Point-LP

Ubuntu 20.10 - 5.8.0-44-generic - GNOME Shell 3.38.2

3 Systems - 214 Benchmark Results

2 x Intel Xeon Platinum 8380 - Intel M50CYP2SB2U - Intel Device 0998

Ubuntu 21.04 - 5.13.0-051300rc4-generic - GNOME Shell 3.38.4

4 Systems - 44 Benchmark Results

AMD Ryzen Threadripper 2950X 16-Core - MSI MEG X399 CREATION - AMD 17h

Debian 10 - 5.8.1-050801-generic - GNOME Shell 3.30.2

12 Systems - 453 Benchmark Results

Intel Core i9-11900K - ASUS ROG MAXIMUS XIII HERO - Intel Tiger Lake-H

Ubuntu 21.04 - 5.12.0-051200rc3daily20210315-generic - GNOME Shell 3.38.3

4 Systems - 172 Benchmark Results

Intel Xeon E3-1275 v6 - ASUS P10S-M WS - Intel Xeon E3-1200 v6

Ubuntu 20.04 - 5.9.0-050900rc8daily20201007-generic - X Server 1.20.8
