onednn tgl

Intel Core i5-1135G7 testing with a Dell 08642J (3.3.0 BIOS) and Intel Xe TGL GT2 3GB on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2203305-NE-ONEDNNTGL55&grs&sro&rro.
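A result ID like the one in the URL above can be replayed locally with the Phoronix Test Suite, which installs the same test profile and runs your machine for a side-by-side comparison against this result. A minimal sketch, assuming the Phoronix Test Suite is installed and the result ID is still available on OpenBenchmarking.org:

```shell
# Fetch this result's test profile and benchmark the local machine
# against it; the result ID comes from the openbenchmarking.org URL above.
phoronix-test-suite benchmark 2203305-NE-ONEDNNTGL55
```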

onednn tgl - system configuration (identifiers A, V, C; all three runs used the same hardware and software):

Processor: Intel Core i5-1135G7 @ 4.20GHz (4 Cores / 8 Threads)
Motherboard: Dell 08642J (3.3.0 BIOS)
Chipset: Intel Device a0ef
Memory: 8GB
Disk: PC SN530 NVMe WDC 256GB
Graphics: Intel Xe TGL GT2 3GB (1300MHz)
Audio: Realtek ALC289
Network: Intel Device a0f0
OS: Ubuntu 20.04
Kernel: 5.14.0-1029-oem (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.9
OpenGL: 4.6 Mesa 21.2.6
Vulkan: 1.2.182
Compiler: GCC 9.4.0
File-System: ext4
Screen Resolution: 3456x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_power) - CPU Microcode: 0x88 - Thermald 1.9.1
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

onednn tgl - results summary (all values in ms; fewer is better):

Benchmark (Harness - Data Type - Engine)                             A          V          C
Recurrent Neural Network Training - f32 - CPU                  7561.25    8754.58    7565.21
IP Shapes 3D - f32 - CPU                                       6.61640    6.72230    6.54558
Recurrent Neural Network Inference - f32 - CPU                 4465.24    4565.49    4553.79
IP Shapes 3D - bf16bf16bf16 - CPU                              6.44962    6.52050    6.51208
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU       1.50550    1.52166    1.50861
IP Shapes 3D - u8s8f32 - CPU                                   2.70766    2.71662    2.69766
Recurrent Neural Network Inference - bf16bf16bf16 - CPU        4543.09    4563.71    4558.69
Deconvolution Batch shapes_3d - f32 - CPU                      13.5318    13.5573    13.5892
IP Shapes 1D - u8s8f32 - CPU                                   2.10994    2.11645    2.10776
Convolution Batch Shapes Auto - f32 - CPU                      10.3470    10.3449    10.3111
Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU             52.9428    52.9356    52.7790
Deconvolution Batch shapes_3d - u8s8f32 - CPU                  3.14203    3.15049    3.14523
Convolution Batch Shapes Auto - u8s8f32 - CPU                  8.51257    8.49301    8.50841
Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU  10.8998    10.8912    10.8797
Deconvolution Batch shapes_1d - u8s8f32 - CPU                  2.63798    2.63758    2.64240
Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU             60.1397    60.1925    60.2420
Recurrent Neural Network Training - bf16bf16bf16 - CPU         8798.47    8805.12    8811.18
Matrix Multiply Batch Shapes Transformer - f32 - CPU           3.15249    3.15675    3.15390
IP Shapes 1D - f32 - CPU                                       10.2134    10.2234    10.2147
IP Shapes 1D - bf16bf16bf16 - CPU                              25.6483    25.6377    25.6419
Convolution Batch Shapes Auto - bf16bf16bf16 - CPU             52.5447    52.5377    52.5455
Recurrent Neural Network Inference - u8s8f32 - CPU             3840.02    4396.18    4467.07
Recurrent Neural Network Training - u8s8f32 - CPU              8685.42    8986.95    9364.40
Deconvolution Batch shapes_1d - f32 - CPU                      16.5694    17.1800    17.1237

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 8754.58 (SE +/- 135.26, N = 15; MIN: 7528.22)
C: 7565.21 (SE +/- 6.65, N = 3; MIN: 7517.11)
A: 7561.25 (SE +/- 6.49, N = 3; MIN: 7509.86)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl
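Each result in these blocks is reported as a mean over N runs with a standard error ("SE +/- x, N = y"). As a reminder of what that statistic is, here is a minimal sketch of the standard error of the mean; the sample values are illustrative, not taken from this result:

```python
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / len(samples) ** 0.5

# Hypothetical per-run times in ms for a 3-run benchmark
runs = [7565.2, 7561.3, 7583.9]
se = standard_error(runs)
```

A small SE relative to the mean (as in most blocks here) indicates the runs were consistent; the N = 15 entries with large SEs are cases where the harness re-ran the test because results varied.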

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 6.72230 (SE +/- 0.04745, N = 15; MIN: 6.21)
C: 6.54558 (SE +/- 0.07707, N = 4; MIN: 6.22)
A: 6.61640 (SE +/- 0.07031, N = 3; MIN: 6.22)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 4565.49 (SE +/- 2.96, N = 3; MIN: 4519.64)
C: 4553.79 (SE +/- 6.34, N = 3; MIN: 4499.14)
A: 4465.24 (SE +/- 65.02, N = 15; MIN: 3809.91)

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 6.52050 (SE +/- 0.04034, N = 3; MIN: 5.93)
C: 6.51208 (SE +/- 0.02219, N = 3; MIN: 5.91)
A: 6.44962 (SE +/- 0.01617, N = 3; MIN: 5.95)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 1.52166 (SE +/- 0.01522, N = 6; MIN: 1.45)
C: 1.50861 (SE +/- 0.00199, N = 3; MIN: 1.45)
A: 1.50550 (SE +/- 0.00306, N = 3; MIN: 1.45)

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 2.71662 (SE +/- 0.00630, N = 3; MIN: 2.61)
C: 2.69766 (SE +/- 0.00412, N = 3; MIN: 2.61)
A: 2.70766 (SE +/- 0.01070, N = 3; MIN: 2.64)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 4563.71 (SE +/- 1.30, N = 3; MIN: 4521.55)
C: 4558.69 (SE +/- 5.97, N = 3; MIN: 4497.76)
A: 4543.09 (SE +/- 7.37, N = 3; MIN: 4472.94)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 13.56 (SE +/- 0.04, N = 3; MIN: 13.37)
C: 13.59 (SE +/- 0.01, N = 3; MIN: 13.42)
A: 13.53 (SE +/- 0.04, N = 3; MIN: 13.4)

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 2.11645 (SE +/- 0.00628, N = 3; MIN: 1.84)
C: 2.10776 (SE +/- 0.00482, N = 3; MIN: 1.86)
A: 2.10994 (SE +/- 0.00492, N = 3; MIN: 1.89)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 10.34 (SE +/- 0.01, N = 3; MIN: 10.24)
C: 10.31 (SE +/- 0.01, N = 3; MIN: 10.25)
A: 10.35 (SE +/- 0.00, N = 3; MIN: 10.25)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 52.94 (SE +/- 0.12, N = 3; MIN: 52.59)
C: 52.78 (SE +/- 0.04, N = 3; MIN: 52.55)
A: 52.94 (SE +/- 0.07, N = 3; MIN: 52.65)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 3.15049 (SE +/- 0.00828, N = 3; MIN: 3.12)
C: 3.14523 (SE +/- 0.00779, N = 3; MIN: 3.12)
A: 3.14203 (SE +/- 0.00616, N = 3; MIN: 3.11)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 8.49301 (SE +/- 0.00155, N = 3; MIN: 8.4)
C: 8.50841 (SE +/- 0.00998, N = 3; MIN: 8.39)
A: 8.51257 (SE +/- 0.01210, N = 3; MIN: 8.4)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 10.89 (SE +/- 0.02, N = 3; MIN: 10.67)
C: 10.88 (SE +/- 0.02, N = 3; MIN: 10.66)
A: 10.90 (SE +/- 0.01, N = 3; MIN: 10.65)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 2.63758 (SE +/- 0.00141, N = 3; MIN: 2.6)
C: 2.64240 (SE +/- 0.00064, N = 3; MIN: 2.61)
A: 2.63798 (SE +/- 0.00335, N = 3; MIN: 2.6)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 60.19 (SE +/- 0.14, N = 3; MIN: 59.35)
C: 60.24 (SE +/- 0.10, N = 3; MIN: 59.37)
A: 60.14 (SE +/- 0.08, N = 3; MIN: 59.39)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 8805.12 (SE +/- 129.34, N = 15; MIN: 7514.46)
C: 8811.18 (SE +/- 126.15, N = 15; MIN: 7525.68)
A: 8798.47 (SE +/- 129.93, N = 15; MIN: 7521.74)

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 3.15675 (SE +/- 0.00453, N = 3; MIN: 3.07)
C: 3.15390 (SE +/- 0.00193, N = 3; MIN: 3.07)
A: 3.15249 (SE +/- 0.00130, N = 3; MIN: 3.07)

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 10.22 (SE +/- 0.01, N = 3; MIN: 9.17)
C: 10.21 (SE +/- 0.01, N = 3; MIN: 8.88)
A: 10.21 (SE +/- 0.02, N = 3; MIN: 9.58)

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 25.64 (SE +/- 0.01, N = 3; MIN: 25.14)
C: 25.64 (SE +/- 0.01, N = 3; MIN: 25.14)
A: 25.65 (SE +/- 0.02, N = 3; MIN: 25.15)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 52.54 (SE +/- 0.04, N = 3; MIN: 52.3)
C: 52.55 (SE +/- 0.07, N = 3; MIN: 52.3)
A: 52.54 (SE +/- 0.03, N = 3; MIN: 52.31)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 4396.18 (SE +/- 87.30, N = 12; MIN: 3804.91)
C: 4467.07 (SE +/- 65.79, N = 15; MIN: 3801.94)
A: 3840.02 (SE +/- 2.59, N = 3; MIN: 3795.34)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 8986.95 (SE +/- 11.00, N = 3; MIN: 8916.06)
C: 9364.40 (SE +/- 338.92, N = 15; MIN: 7528.44)
A: 8685.42 (SE +/- 167.45, N = 12; MIN: 7519.48)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
V: 17.18 (SE +/- 0.18, N = 5; MIN: 15.23)
C: 17.12 (SE +/- 0.31, N = 15; MIN: 15.19)
A: 16.57 (SE +/- 0.23, N = 3; MIN: 15.22)


Phoronix Test Suite v10.8.5