oneDNN 2.0 Intel Tiger Lake

Intel Core i7-1165G7 testing with a Dell 0GG9PT (1.0.3 BIOS) and Intel UHD 3GB on Ubuntu 20.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2012203-PTS-ONEDNN2092.

All three runs (Run 1, Run 2, Run 3) were performed on the same system configuration:

Processor: Intel Core i7-1165G7 @ 4.70GHz (4 Cores / 8 Threads)
Motherboard: Dell 0GG9PT (1.0.3 BIOS)
Chipset: Intel Tiger Lake-LP
Memory: 16GB
Disk: Kioxia KBG40ZNS256G NVMe 256GB
Graphics: Intel UHD 3GB (1300MHz)
Audio: Realtek ALC289
Network: Intel Wi-Fi 6 AX201
OS: Ubuntu 20.10
Kernel: 5.10.0-051000rc7daily20201213-generic (x86_64) 20201212
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
Display Driver: modesetting 1.20.9
OpenGL: 4.6 Mesa 20.2.1
Vulkan: 1.2.145
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 1920x1200

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate powersave - CPU Microcode: 0x60 - Thermald 2.3

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced IBRS, IBPB: conditional, RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected

oneDNN 2.0 Intel Tiger Lake - result summary (onednn benchmark, times in ms; fewer is better)

Harness - Data Type - Engine | Run 1 | Run 2 | Run 3
IP Shapes 1D - f32 - CPU | 12.19566 | 12.23141 | 12.28925
IP Shapes 3D - f32 - CPU | 6.43606 | 6.44613 | 7.17560
IP Shapes 1D - u8s8f32 - CPU | 2.48937 | 2.49932 | 2.70714
IP Shapes 3D - u8s8f32 - CPU | 2.84008 | 2.85308 | 3.11496
IP Shapes 1D - bf16bf16bf16 - CPU | 25.5611 | 25.7613 | 26.3507
IP Shapes 3D - bf16bf16bf16 - CPU | 7.92055 | 8.29222 | 8.63478
Convolution Batch Shapes Auto - f32 - CPU | 13.4970 | 14.9592 | 14.1111
Deconvolution Batch shapes_1d - f32 - CPU | 14.0997 | 14.9369 | 14.9243
Deconvolution Batch shapes_3d - f32 - CPU | 11.35453 | 12.11765 | 12.15346
Convolution Batch Shapes Auto - u8s8f32 - CPU | 9.64780 | 10.09398 | 10.07588
Deconvolution Batch shapes_1d - u8s8f32 - CPU | 3.00801 | 3.16932 | 3.17051
Deconvolution Batch shapes_3d - u8s8f32 - CPU | 2.74299 | 2.88129 | 2.89264
Recurrent Neural Network Training - f32 - CPU | 8928.89 | 9301.85 | 9350.26
Recurrent Neural Network Inference - f32 - CPU | 4570.32 | 4793.37 | 4799.77
Recurrent Neural Network Training - u8s8f32 - CPU | 9072.26 | 9309.37 | 9315.57
Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 50.8856 | 51.9651 | 52.1189
Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 59.9437 | 61.7262 | 61.1541
Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 37.7497 | 42.8209 | 42.7931
Recurrent Neural Network Inference - u8s8f32 - CPU | 4537.86 | 4799.98 | 4809.37
Matrix Multiply Batch Shapes Transformer - f32 - CPU | 4.96366 | 5.35102 | 5.35007
Recurrent Neural Network Training - bf16bf16bf16 - CPU | 8891.89 | 9302.70 | 9299.60
Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 4532.64 | 4807.71 | 4816.40
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 2.74771 | 2.99320 | 3.00553
Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 12.0553 | 12.6551 | 12.7105
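Since all three runs used the identical hardware and software stack, the summary above also serves as a run-to-run reproducibility check. Below is a minimal Python sketch of how one might compare the runs; the values are copied from the table above, while the dictionary layout and helper function are purely illustrative and not part of the Phoronix Test Suite or oneDNN tooling.

# Minimal sketch: quantify run-to-run variation for a few results from the
# summary table above (times in ms, lower is better). The data structure and
# helper below are illustrative only, not Phoronix Test Suite / oneDNN APIs.
results = {
    "IP Shapes 1D - f32 - CPU": (12.19566, 12.23141, 12.28925),
    "Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU": (37.7497, 42.8209, 42.7931),
    "Recurrent Neural Network Training - f32 - CPU": (8928.89, 9301.85, 9350.26),
}

def pct_change(run1: float, run_n: float) -> float:
    """Percentage change of a later run relative to Run 1 (positive = slower)."""
    return (run_n - run1) / run1 * 100.0

for name, (r1, r2, r3) in results.items():
    print(f"{name}: Run 2 {pct_change(r1, r2):+.1f}%, Run 3 {pct_change(r1, r3):+.1f}% vs Run 1")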

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 12.20 (SE +/- 0.46, N = 12, MIN: 6.66)
Run 2: 12.23 (SE +/- 0.49, N = 12, MIN: 6.4)
Run 3: 12.29 (SE +/- 0.47, N = 12, MIN: 5.95)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread
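Each per-test entry reports the average of N timing samples together with a standard error ("SE +/-") and the fastest sample observed ("MIN"). Assuming the usual standard-error-of-the-mean convention (an assumption about how these figures are derived, not something stated in this export), the relationship can be sketched as follows; the sample timings are hypothetical.

# Minimal sketch of the standard-error-of-the-mean convention assumed for the
# "SE +/-" figures; the per-iteration timings below are hypothetical and not
# taken from this benchmark run.
import math
import statistics

samples_ms = [12.9, 11.8, 12.4, 13.1, 11.6, 12.2]  # hypothetical timing samples

mean = statistics.mean(samples_ms)
se = statistics.stdev(samples_ms) / math.sqrt(len(samples_ms))  # standard error of the mean

print(f"{mean:.2f} ms (SE +/- {se:.2f}, N = {len(samples_ms)}, MIN: {min(samples_ms)})")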

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 6.43606 (SE +/- 0.05431, N = 14, MIN: 5.61)
Run 2: 6.44613 (SE +/- 0.06535, N = 13, MIN: 5.6)
Run 3: 7.17560 (SE +/- 0.06630, N = 7, MIN: 5.53)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 2.48937 (SE +/- 0.03167, N = 12, MIN: 1.52)
Run 2: 2.49932 (SE +/- 0.03239, N = 12, MIN: 1.52)
Run 3: 2.70714 (SE +/- 0.02378, N = 12, MIN: 1.52)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 2.84008 (SE +/- 0.04400, N = 12, MIN: 2.14)
Run 2: 2.85308 (SE +/- 0.05466, N = 12, MIN: 2.14)
Run 3: 3.11496 (SE +/- 0.04263, N = 12, MIN: 2.11)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 25.56 (SE +/- 0.58, N = 12, MIN: 18.57)
Run 2: 25.76 (SE +/- 0.55, N = 12, MIN: 18.62)
Run 3: 26.35 (SE +/- 0.50, N = 12, MIN: 18.7)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 7.92055 (SE +/- 0.11029, N = 12, MIN: 4.88)
Run 2: 8.29222 (SE +/- 0.29805, N = 12, MIN: 4.94)
Run 3: 8.63478 (SE +/- 0.13054, N = 12, MIN: 4.72)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 13.50 (SE +/- 0.19, N = 12, MIN: 8.19)
Run 2: 14.96 (SE +/- 0.69, N = 15, MIN: 8.31)
Run 3: 14.11 (SE +/- 0.16, N = 12, MIN: 8.28)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 14.10 (SE +/- 0.15, N = 4, MIN: 12.17)
Run 2: 14.94 (SE +/- 0.07, N = 3, MIN: 12.23)
Run 3: 14.92 (SE +/- 0.08, N = 3, MIN: 12.23)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 11.35 (SE +/- 0.16, N = 14, MIN: 9.25)
Run 2: 12.12 (SE +/- 0.17, N = 15, MIN: 9.29)
Run 3: 12.15 (SE +/- 0.18, N = 14, MIN: 9.29)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 9.64780 (SE +/- 0.08613, N = 13, MIN: 7.94)
Run 2: 10.09398 (SE +/- 0.09631, N = 12, MIN: 7.96)
Run 3: 10.07588 (SE +/- 0.08167, N = 12, MIN: 7.97)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 3.00801 (SE +/- 0.02778, N = 3, MIN: 2.49)
Run 2: 3.16932 (SE +/- 0.02553, N = 3, MIN: 2.53)
Run 3: 3.17051 (SE +/- 0.02450, N = 3, MIN: 2.6)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 2.74299 (SE +/- 0.03844, N = 15, MIN: 2.2)
Run 2: 2.88129 (SE +/- 0.04127, N = 15, MIN: 2.2)
Run 3: 2.89264 (SE +/- 0.04298, N = 15, MIN: 2.2)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 8928.89 (SE +/- 7.25, N = 3, MIN: 8884.62)
Run 2: 9301.85 (SE +/- 5.14, N = 3, MIN: 9240.8)
Run 3: 9350.26 (SE +/- 24.45, N = 3, MIN: 9262.28)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 4570.32 (SE +/- 2.42, N = 3, MIN: 4526.96)
Run 2: 4793.37 (SE +/- 12.84, N = 3, MIN: 4728.25)
Run 3: 4799.77 (SE +/- 7.04, N = 3, MIN: 4734.03)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 9072.26 (SE +/- 89.00, N = 3, MIN: 8900.41)
Run 2: 9309.37 (SE +/- 10.72, N = 3, MIN: 9197.72)
Run 3: 9315.57 (SE +/- 4.14, N = 3, MIN: 9244.67)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 50.89 (SE +/- 0.98, N = 12, MIN: 36.95)
Run 2: 51.97 (SE +/- 0.79, N = 12, MIN: 37.03)
Run 3: 52.12 (SE +/- 0.83, N = 12, MIN: 36.97)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 59.94 (SE +/- 0.67, N = 4, MIN: 50.55)
Run 2: 61.73 (SE +/- 0.62, N = 3, MIN: 51.66)
Run 3: 61.15 (SE +/- 0.82, N = 3, MIN: 48.32)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 37.75 (SE +/- 0.36, N = 3, MIN: 37.26)
Run 2: 42.82 (SE +/- 0.54, N = 15, MIN: 37.09)
Run 3: 42.79 (SE +/- 0.55, N = 15, MIN: 37.07)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 4537.86 (SE +/- 1.60, N = 3, MIN: 4499.03)
Run 2: 4799.98 (SE +/- 14.63, N = 3, MIN: 4723.57)
Run 3: 4809.37 (SE +/- 4.81, N = 3, MIN: 4750.47)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 4.96366 (SE +/- 0.00981, N = 3, MIN: 3.84)
Run 2: 5.35102 (SE +/- 0.02405, N = 3, MIN: 3.85)
Run 3: 5.35007 (SE +/- 0.02155, N = 3, MIN: 3.86)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 8891.89 (SE +/- 18.40, N = 3, MIN: 8839.03)
Run 2: 9302.70 (SE +/- 6.77, N = 3, MIN: 9224.29)
Run 3: 9299.60 (SE +/- 8.81, N = 3, MIN: 9240.31)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 4532.64 (SE +/- 3.28, N = 3, MIN: 4496.04)
Run 2: 4807.71 (SE +/- 7.95, N = 3, MIN: 4750.99)
Run 3: 4816.40 (SE +/- 21.46, N = 3, MIN: 4739.61)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 2.74771 (SE +/- 0.00533, N = 3, MIN: 2.05)
Run 2: 2.99320 (SE +/- 0.01448, N = 3, MIN: 2.06)
Run 3: 3.00553 (SE +/- 0.01768, N = 3, MIN: 2.08)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.0 (ms, fewer is better):
Run 1: 12.06 (SE +/- 0.01, N = 3, MIN: 11.05)
Run 2: 12.66 (SE +/- 0.04, N = 3, MIN: 11.29)
Run 3: 12.71 (SE +/- 0.04, N = 3, MIN: 11.25)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread


Phoronix Test Suite v10.8.4