oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result reported is the total perf time. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative.
To run this test with the Phoronix Test Suite, the basic command is: phoronix-test-suite benchmark onednn.
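For context on what the library under test does, below is a minimal sketch of driving oneDNN directly through its C++ API (dnnl.hpp): a single f32 matrix multiply on the CPU engine, the kind of primitive that benchdnn exercises for the harnesses listed further down. This is an illustrative example written against the oneDNN 2.x API (matmul primitive-descriptor construction changed in later releases) and is not part of the test profile itself.

```cpp
// Minimal oneDNN 2.x sketch: one f32 matmul on the CPU engine.
// Build (assuming oneDNN is installed): g++ -std=c++11 matmul_example.cpp -ldnnl
#include <dnnl.hpp>
#include <vector>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);  // CPU engine, as used by this test profile
    stream strm(eng);

    // C[2x4] = A[2x3] * B[3x4], all f32 in plain row-major ("ab") layout.
    memory::dims a_dims = {2, 3}, b_dims = {3, 4}, c_dims = {2, 4};
    memory::desc a_md(a_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md(b_dims, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md(c_dims, memory::data_type::f32, memory::format_tag::ab);

    std::vector<float> a(2 * 3, 1.0f), b(3 * 4, 2.0f), c(2 * 4, 0.0f);
    memory a_mem(a_md, eng, a.data());
    memory b_mem(b_md, eng, b.data());
    memory c_mem(c_md, eng, c.data());

    // Describe, create, and execute the matmul primitive (oneDNN 2.x style).
    matmul::desc matmul_d(a_md, b_md, c_md);
    matmul::primitive_desc matmul_pd(matmul_d, eng);
    matmul matmul_prim(matmul_pd);

    matmul_prim.execute(strm, {{DNNL_ARG_SRC, a_mem},
                               {DNNL_ARG_WEIGHTS, b_mem},
                               {DNNL_ARG_DST, c_mem}});
    strm.wait();  // each element of c should now be 6.0f
    return 0;
}
```

benchdnn runs drivers for primitives like this across the harness and data-type combinations shown below and reports the total perf time.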
Test Created 17 June 2020
Last Updated 13 March 2021
Test Maintainer Michael Larabel
Test Type Processor
Average Install Time 6 Minutes, 55 Seconds
Average Run Time 2 Minutes, 4 Seconds
Test Dependencies C/C++ Compiler Toolchain + CMake
Accolades: 5k+ Downloads
oneDNN Popularity Statistics (pts/onednn): OpenBenchmarking.org events per month (public result uploads, reported installs*, test completions*), 2020.06 through 2021.04.
* Data based on those opting to upload their test results to OpenBenchmarking.org and users enabling the opt-in anonymous statistics reporting while running benchmarks from an Internet-connected platform. Data current as of Sat, 17 Apr 2021 02:00:18 GMT.
Harness Option Popularity (OpenBenchmarking.org): Recurrent Neural Network Training 16.2%, Recurrent Neural Network Inference 16.2%, Deconvolution Batch shapes_3d 11.7%, IP Shapes 1D 11.5%, IP Shapes 3D 11.3%, Deconvolution Batch shapes_1d 11.2%, Matrix Multiply Batch Shapes Transformer 11.2%, Convolution Batch Shapes Auto 10.7%.
Data Type Option Popularity (OpenBenchmarking.org): f32 44.0%, u8s8f32 41.3%, bf16bf16bf16 14.7%.
Revision History
pts/onednn-1.7.0 [View Source] Sat, 13 Mar 2021 07:49:33 GMT Update against oneDNN 2.1.2 upstream.
pts/onednn-1.6.1 [View Source] Sun, 20 Dec 2020 09:58:16 GMT This test profile builds and works fine on macOS, so enable it (MacOSX).
pts/onednn-1.6.0 [View Source] Wed, 09 Dec 2020 13:47:31 GMT Update against oneDNN 2.0 upstream.
pts/onednn-1.5.0 [View Source] Wed, 17 Jun 2020 16:26:39 GMT Initial commit of the oneDNN test profile based on Intel oneDNN 1.5, forked from the existing mkl-dnn test profile, which was named after MKL-DNN before the library was renamed to DNNL and then oneDNN; this new test profile matches Intel's current naming convention.
Performance Metrics
Analyze Test Configuration: public results are available for pts/onednn-1.7.x (oneDNN 2.1.2), pts/onednn-1.6.x (oneDNN 2.0), and pts/onednn-1.5.x (oneDNN 1.5) across the harness options listed above (IP Shapes 1D / 3D, Convolution Batch Shapes Auto, Deconvolution Batch shapes_1d / shapes_3d, Matrix Multiply Batch Shapes Transformer, and Recurrent Neural Network Training / Inference, plus the older IP Batch 1D / All and Deconvolution Batch deconv_1d / deconv_3d harnesses of the 1.5.x profile) with the f32, u8s8f32, and bf16bf16bf16 data types on the CPU engine. The configuration examined below is:
oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
OpenBenchmarking.org metrics for this test profile configuration are based on 53 public results since 9 December 2020, with the latest data as of 10 February 2021.
Below is an overview of generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can vary widely, so this overview is intended only as general guidance on performance expectations.
Component | Percentile Rank | # Matching Public Results | ms (Average)
OpenBenchmarking.org Distribution Of Public Results - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU - 53 results ranging from 3 to 36 ms.
Based on OpenBenchmarking.org data, the selected test / test configuration (oneDNN 2.0 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU) has an average run-time of 2 minutes. By default this test profile is set to run at least 3 times, but the number of runs may increase if the standard deviation exceeds pre-defined defaults or other calculations deem additional runs necessary for greater statistical accuracy of the result.
OpenBenchmarking.org Time Required To Complete Benchmark (Minutes) - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU - Run-Time: Min 1 / Avg 1.38 / Max 3.
Based on public OpenBenchmarking.org results, the selected test / test configuration has an average standard deviation of 0.5%.
OpenBenchmarking.org Average Deviation Between Runs (Percent, Fewer Is Better) - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU - Deviation: Min 0 / Avg 0.49 / Max 6.
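As a rough illustration of the adaptive run-count policy described above (at least three runs, with more added while the run-to-run deviation remains high), here is a small self-contained sketch. The minimum/maximum run counts, the deviation threshold, and the stand-in workload are illustrative placeholders; the Phoronix Test Suite's actual defaults and statistics live in its own codebase.

```cpp
// Illustrative sketch (not Phoronix Test Suite code): run a workload at least
// min_runs times and keep re-running while the relative standard deviation of
// the collected results stays above a threshold, up to max_runs.
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Relative standard deviation of the samples, in percent.
double relative_stddev(const std::vector<double>& xs) {
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();
    double var = 0.0;
    for (double x : xs) var += (x - mean) * (x - mean);
    var /= xs.size();
    return mean > 0.0 ? 100.0 * std::sqrt(var) / mean : 0.0;
}

std::vector<double> run_adaptively(const std::function<double()>& run_once,
                                   int min_runs = 3, int max_runs = 10,
                                   double threshold_pct = 3.5 /* illustrative */) {
    std::vector<double> results;
    while ((int)results.size() < min_runs ||
           ((int)results.size() < max_runs &&
            relative_stddev(results) > threshold_pct)) {
        results.push_back(run_once());  // e.g. one benchdnn invocation's perf time
    }
    return results;
}

int main() {
    // Stand-in workload: pretend each run reports a time in milliseconds.
    int i = 0;
    auto fake_run = [&i]() { return 10.0 + (i++ % 3) * 0.05; };

    std::vector<double> results = run_adaptively(fake_run);
    std::printf("runs: %zu, deviation: %.2f%%\n",
                results.size(), relative_stddev(results));
    return 0;
}
```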
Notable Instruction Set Usage
Notable instruction set extensions supported by this test, based on an automatic analysis by the Phoronix Test Suite / OpenBenchmarking.org analytics engine.
Instruction Set | Support | Instructions Detected
SSE2 (SSE2)
Used by default on supported hardware.
MOVAPD MOVDQU MOVD CVTSS2SD MOVDQA CVTSI2SD DIVSD ADDSD UCOMISD COMISD MULSD CVTSD2SS MAXSD SQRTSD SUBSD MINSD MOVUPD ADDPD PUNPCKLQDQ PSRLDQ PUNPCKHQDQ SHUFPD PSHUFD CVTTSD2SI PADDQ PSHUFLW CVTDQ2PS CVTTPS2DQ MULPD MOVHPD UNPCKHPD MOVMSKPD CMPNLESD XORPD
SSE3 (SSE3)
Used by default on supported hardware.
MOVDDUP
SSSE3 (SSSE3)
Used by default on supported hardware.
PSHUFB PALIGNR
AVX (AVX)
Requires passing a supported compiler/build flag (verified with targets: sandybridge, skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Sandy Bridge (2011). Found on AMD processors since Bulldozer (2011).
VZEROUPPER VEXTRACTF128 VINSERTF128 VBROADCASTSS VMASKMOVPS VPERM2F128 VBROADCASTSD VPERMILPS
AVX2 (AVX2)
Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Haswell (2013). Found on AMD processors since Excavator (2016).
VPERM2I128 VPSLLVD VEXTRACTI128 VPBROADCASTD VINSERTI128 VPSRAVD VPMASKMOVQ VPERMQ VPBROADCASTQ VGATHERQPS VPBROADCASTW VPSRLVQ VGATHERDPS VPBROADCASTB VPGATHERQQ VPGATHERQD VPSLLVQ VPGATHERDD VPGATHERDQ VPERMD
FMA (FMA)
Requires passing a supported compiler/build flag (verified with targets: skylake, tigerlake, cascadelake, znver2). Found on Intel processors since Haswell (2013). Found on AMD processors since Bulldozer (2011).
VFMADD213SS VFMADD132SS VFMADD231SS VFMADD132PS VFMADD231PS VFMADD213PS VFNMSUB231SS VFNMADD132SS VFNMSUB132SS VFNMADD231SS VFNMADD231PS VFMADD132SD VFMADD231SD VFNMADD213SS VFMSUB231SS VFMSUB231PS VFMADD231PD VFMADD213PD VFMADD132PD VFMSUB231SD VFNMSUB231PS VFNMADD132PS VFNMSUB132PS
The test / benchmark does honor compiler flag changes.
Last automated analysis: 31 January 2021
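The optional extensions above (AVX, AVX2, FMA) are only emitted when the build targets them, but their availability on the host can also be checked at runtime. As a small, generic illustration (not how oneDNN performs its own CPU dispatching), the GCC/Clang builtin __builtin_cpu_supports reports whether each feature is present:

```cpp
// Quick runtime check for the instruction set extensions listed above.
// Uses GCC/Clang builtins; compile with: g++ -O2 check_isa.cpp -o check_isa
#include <cstdio>

int main() {
    __builtin_cpu_init();  // initialize CPU feature detection

    std::printf("sse2 : %s\n", __builtin_cpu_supports("sse2")  ? "yes" : "no");
    std::printf("sse3 : %s\n", __builtin_cpu_supports("sse3")  ? "yes" : "no");
    std::printf("ssse3: %s\n", __builtin_cpu_supports("ssse3") ? "yes" : "no");
    std::printf("avx  : %s\n", __builtin_cpu_supports("avx")   ? "yes" : "no");
    std::printf("avx2 : %s\n", __builtin_cpu_supports("avx2")  ? "yes" : "no");
    std::printf("fma  : %s\n", __builtin_cpu_supports("fma")   ? "yes" : "no");
    return 0;
}
```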
This test profile binary relies on the shared libraries libdnnl.so.2, libpthread.so.0, libm.so.6, libgomp.so.1, libc.so.6, libdl.so.2.
Recent Test Results
1 System - 179 Benchmark Results
2 x Intel Xeon E5-2667 v3 - Microsoft Virtual Machine v7.0 - Intel 440BX
CentOS Linux 8 - 4.18.0-147.8.1.el8_1.x86_64 - GCC 8.3.1 20190507
Featured Compiler Comparison
3 Systems - 95 Benchmark Results
2 x AMD EPYC 7551 32-Core - Microsoft Virtual Machine - 226GB
CentOS Linux 8 - 4.18.0-147.8.1.el8_1.x86_64 - GCC 8.3.1 20190507
2 Systems - 86 Benchmark Results
2 x AMD EPYC 7V12 64-Core - Microsoft Virtual Machine - 450GB
CentOS Linux 8 - 4.18.0-147.8.1.el8_1.x86_64 - GCC 8.3.1 20190507
12 Systems - 453 Benchmark Results
Intel Core i9-10900K - Gigabyte Z490 AORUS MASTER - Intel Comet Lake PCH
Ubuntu 21.04 - 5.12.0-051200rc3daily20210315-generic - GNOME Shell 3.38.3
1 System - 57 Benchmark Results
AMD EPYC 7763 64-Core - Supermicro H12SSL-i v1.01 - AMD Starship
Ubuntu 20.04 - 5.12.0-051200rc6daily20210408-generic - GNOME Shell 3.36.4
3 Systems - 189 Benchmark Results
AMD EPYC 7763 64-Core - Supermicro H12SSL-i v1.01 - AMD Starship
Ubuntu 20.04 - 5.12.0-051200rc6daily20210408-generic - GNOME Shell 3.36.4
3 Systems - 117 Benchmark Results
AMD EPYC 7702 64-Core - ASRockRack EPYCD8 - AMD Starship
Ubuntu 20.04 - 5.9.0-050900rc6daily20200921-generic - GNOME Shell 3.36.4
9 Systems - 442 Benchmark Results
AMD Ryzen 5 5600X 6-Core - ASUS ROG CROSSHAIR VIII HERO - AMD Starship
Ubuntu 21.04 - 5.12.0-051200rc3daily20210315-generic - GNOME Shell 3.38.3
1 System - 18 Benchmark Results
AMD Ryzen 7 5800X 8-Core - MSI B450-A PRO MAX - AMD Starship
Ubuntu 20.10 - 5.8.0-48-generic - GNOME Shell 3.38.2
Most Popular Test Results