onednn tgl

Intel Core i5-1135G7 testing with a Dell 08642J (3.3.0 BIOS) and Intel Xe TGL GT2 3GB on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2203305-NE-ONEDNNTGL55.

onednn tgl - System Details (identifier: AVC)

  Processor: Intel Core i5-1135G7 @ 4.20GHz (4 Cores / 8 Threads)
  Motherboard: Dell 08642J (3.3.0 BIOS)
  Chipset: Intel Device a0ef
  Memory: 8GB
  Disk: PC SN530 NVMe WDC 256GB
  Graphics: Intel Xe TGL GT2 3GB (1300MHz)
  Audio: Realtek ALC289
  Network: Intel Device a0f0
  OS: Ubuntu 20.04
  Kernel: 5.14.0-1029-oem (x86_64)
  Desktop: GNOME Shell 3.36.9
  Display Server: X Server 1.20.9
  OpenGL: 4.6 Mesa 21.2.6
  Vulkan: 1.2.182
  Compiler: GCC 9.4.0
  File-System: ext4
  Screen Resolution: 3456x2160

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: intel_pstate powersave (EPP: balance_power)
  - CPU Microcode: 0x88
  - Thermald 1.9.1

Security Details
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  - srbds: Not affected
  - tsx_async_abort: Not affected

onednn tgl - Results Summary (all values in ms; fewer is better)

Three runs per test; the run identifiers appear concatenated as "AVC" in this export and are listed below as Run 1 / Run 2 / Run 3.

  Harness - Data Type - Engine: Run 1 / Run 2 / Run 3
  IP Shapes 1D - f32 - CPU: 10.2134 / 10.2234 / 10.2147
  IP Shapes 3D - f32 - CPU: 6.61640 / 6.72230 / 6.54558
  IP Shapes 1D - u8s8f32 - CPU: 2.10994 / 2.11645 / 2.10776
  IP Shapes 3D - u8s8f32 - CPU: 2.70766 / 2.71662 / 2.69766
  IP Shapes 1D - bf16bf16bf16 - CPU: 25.6483 / 25.6377 / 25.6419
  IP Shapes 3D - bf16bf16bf16 - CPU: 6.44962 / 6.52050 / 6.51208
  Convolution Batch Shapes Auto - f32 - CPU: 10.3470 / 10.3449 / 10.3111
  Deconvolution Batch shapes_1d - f32 - CPU: 16.5694 / 17.1800 / 17.1237
  Deconvolution Batch shapes_3d - f32 - CPU: 13.5318 / 13.5573 / 13.5892
  Convolution Batch Shapes Auto - u8s8f32 - CPU: 8.51257 / 8.49301 / 8.50841
  Deconvolution Batch shapes_1d - u8s8f32 - CPU: 2.63798 / 2.63758 / 2.64240
  Deconvolution Batch shapes_3d - u8s8f32 - CPU: 3.14203 / 3.15049 / 3.14523
  Recurrent Neural Network Training - f32 - CPU: 7561.25 / 8754.58 / 7565.21
  Recurrent Neural Network Inference - f32 - CPU: 4465.24 / 4565.49 / 4553.79
  Recurrent Neural Network Training - u8s8f32 - CPU: 8685.42 / 8986.95 / 9364.40
  Convolution Batch Shapes Auto - bf16bf16bf16 - CPU: 52.5447 / 52.5377 / 52.5455
  Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU: 60.1397 / 60.1925 / 60.2420
  Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU: 52.9428 / 52.9356 / 52.7790
  Recurrent Neural Network Inference - u8s8f32 - CPU: 3840.02 / 4396.18 / 4467.07
  Matrix Multiply Batch Shapes Transformer - f32 - CPU: 3.15249 / 3.15675 / 3.15390
  Recurrent Neural Network Training - bf16bf16bf16 - CPU: 8798.47 / 8805.12 / 8811.18
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 4543.09 / 4563.71 / 4558.69
  Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU: 1.50550 / 1.52166 / 1.50861
  Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU: 10.8998 / 10.8912 / 10.8797
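
The run-to-run variation visible in the summary can be quantified directly from the table. The sketch below (Python; not part of the original export, with values copied from the summary above) computes the spread between the fastest and slowest of the three runs for a few of the more variable tests:

    # Illustrative only: timings (ms) copied from the results summary above.
    results_ms = {
        "Deconvolution Batch shapes_1d - f32 - CPU":          [16.5694, 17.1800, 17.1237],
        "Recurrent Neural Network Training - f32 - CPU":      [7561.25, 8754.58, 7565.21],
        "Recurrent Neural Network Inference - u8s8f32 - CPU": [3840.02, 4396.18, 4467.07],
    }

    for test, runs in results_ms.items():
        spread_pct = (max(runs) - min(runs)) / min(runs) * 100.0
        print(f"{test}: best {min(runs):.2f} ms, worst {max(runs):.2f} ms, "
              f"{spread_pct:.1f}% spread across runs")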

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 10.21 (SE +/- 0.02, N = 3, MIN: 9.58)
  Run 2: 10.22 (SE +/- 0.01, N = 3, MIN: 9.17)
  Run 3: 10.21 (SE +/- 0.01, N = 3, MIN: 8.88)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl
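
Each per-test listing reports the mean of N samples, an attached standard-error figure ("SE +/-"), and the fastest sample ("MIN"). As a rough illustration of how such an annotation relates to its underlying samples, the Python sketch below uses hypothetical per-iteration timings and assumes the usual standard-error-of-the-mean convention (sample standard deviation divided by sqrt(N)); it is not taken from the Phoronix Test Suite source:

    import statistics

    # Hypothetical per-iteration timings in ms; the real samples are not in this export.
    samples_ms = [10.23, 10.19, 10.21]

    n = len(samples_ms)
    mean = statistics.mean(samples_ms)
    se = statistics.stdev(samples_ms) / n ** 0.5  # standard error of the mean (assumed convention)

    print(f"{mean:.2f} (SE +/- {se:.2f}, N = {n}, MIN: {min(samples_ms):.2f})")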

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 6.61640 (SE +/- 0.07031, N = 3, MIN: 6.22)
  Run 2: 6.72230 (SE +/- 0.04745, N = 15, MIN: 6.21)
  Run 3: 6.54558 (SE +/- 0.07707, N = 4, MIN: 6.22)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 2.10994 (SE +/- 0.00492, N = 3, MIN: 1.89)
  Run 2: 2.11645 (SE +/- 0.00628, N = 3, MIN: 1.84)
  Run 3: 2.10776 (SE +/- 0.00482, N = 3, MIN: 1.86)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 2.70766 (SE +/- 0.01070, N = 3, MIN: 2.64)
  Run 2: 2.71662 (SE +/- 0.00630, N = 3, MIN: 2.61)
  Run 3: 2.69766 (SE +/- 0.00412, N = 3, MIN: 2.61)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 25.65 (SE +/- 0.02, N = 3, MIN: 25.15)
  Run 2: 25.64 (SE +/- 0.01, N = 3, MIN: 25.14)
  Run 3: 25.64 (SE +/- 0.01, N = 3, MIN: 25.14)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 6.44962 (SE +/- 0.01617, N = 3, MIN: 5.95)
  Run 2: 6.52050 (SE +/- 0.04034, N = 3, MIN: 5.93)
  Run 3: 6.51208 (SE +/- 0.02219, N = 3, MIN: 5.91)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 10.35 (SE +/- 0.00, N = 3, MIN: 10.25)
  Run 2: 10.34 (SE +/- 0.01, N = 3, MIN: 10.24)
  Run 3: 10.31 (SE +/- 0.01, N = 3, MIN: 10.25)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 16.57 (SE +/- 0.23, N = 3, MIN: 15.22)
  Run 2: 17.18 (SE +/- 0.18, N = 5, MIN: 15.23)
  Run 3: 17.12 (SE +/- 0.31, N = 15, MIN: 15.19)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 13.53 (SE +/- 0.04, N = 3, MIN: 13.4)
  Run 2: 13.56 (SE +/- 0.04, N = 3, MIN: 13.37)
  Run 3: 13.59 (SE +/- 0.01, N = 3, MIN: 13.42)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 8.51257 (SE +/- 0.01210, N = 3, MIN: 8.4)
  Run 2: 8.49301 (SE +/- 0.00155, N = 3, MIN: 8.4)
  Run 3: 8.50841 (SE +/- 0.00998, N = 3, MIN: 8.39)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 2.63798 (SE +/- 0.00335, N = 3, MIN: 2.6)
  Run 2: 2.63758 (SE +/- 0.00141, N = 3, MIN: 2.6)
  Run 3: 2.64240 (SE +/- 0.00064, N = 3, MIN: 2.61)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 3.14203 (SE +/- 0.00616, N = 3, MIN: 3.11)
  Run 2: 3.15049 (SE +/- 0.00828, N = 3, MIN: 3.12)
  Run 3: 3.14523 (SE +/- 0.00779, N = 3, MIN: 3.12)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 7561.25 (SE +/- 6.49, N = 3, MIN: 7509.86)
  Run 2: 8754.58 (SE +/- 135.26, N = 15, MIN: 7528.22)
  Run 3: 7565.21 (SE +/- 6.65, N = 3, MIN: 7517.11)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 4465.24 (SE +/- 65.02, N = 15, MIN: 3809.91)
  Run 2: 4565.49 (SE +/- 2.96, N = 3, MIN: 4519.64)
  Run 3: 4553.79 (SE +/- 6.34, N = 3, MIN: 4499.14)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 8685.42 (SE +/- 167.45, N = 12, MIN: 7519.48)
  Run 2: 8986.95 (SE +/- 11.00, N = 3, MIN: 8916.06)
  Run 3: 9364.40 (SE +/- 338.92, N = 15, MIN: 7528.44)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 52.54 (SE +/- 0.03, N = 3, MIN: 52.31)
  Run 2: 52.54 (SE +/- 0.04, N = 3, MIN: 52.3)
  Run 3: 52.55 (SE +/- 0.07, N = 3, MIN: 52.3)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 60.14 (SE +/- 0.08, N = 3, MIN: 59.39)
  Run 2: 60.19 (SE +/- 0.14, N = 3, MIN: 59.35)
  Run 3: 60.24 (SE +/- 0.10, N = 3, MIN: 59.37)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 52.94 (SE +/- 0.07, N = 3, MIN: 52.65)
  Run 2: 52.94 (SE +/- 0.12, N = 3, MIN: 52.59)
  Run 3: 52.78 (SE +/- 0.04, N = 3, MIN: 52.55)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 3840.02 (SE +/- 2.59, N = 3, MIN: 3795.34)
  Run 2: 4396.18 (SE +/- 87.30, N = 12, MIN: 3804.91)
  Run 3: 4467.07 (SE +/- 65.79, N = 15, MIN: 3801.94)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 3.15249 (SE +/- 0.00130, N = 3, MIN: 3.07)
  Run 2: 3.15675 (SE +/- 0.00453, N = 3, MIN: 3.07)
  Run 3: 3.15390 (SE +/- 0.00193, N = 3, MIN: 3.07)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 8798.47 (SE +/- 129.93, N = 15, MIN: 7521.74)
  Run 2: 8805.12 (SE +/- 129.34, N = 15, MIN: 7514.46)
  Run 3: 8811.18 (SE +/- 126.15, N = 15, MIN: 7525.68)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 4543.09 (SE +/- 7.37, N = 3, MIN: 4472.94)
  Run 2: 4563.71 (SE +/- 1.30, N = 3, MIN: 4521.55)
  Run 3: 4558.69 (SE +/- 5.97, N = 3, MIN: 4497.76)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 1.50550 (SE +/- 0.00306, N = 3, MIN: 1.45)
  Run 2: 1.52166 (SE +/- 0.01522, N = 6, MIN: 1.45)
  Run 3: 1.50861 (SE +/- 0.00199, N = 3, MIN: 1.45)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.6 - ms, Fewer Is Better (OpenBenchmarking.org)
  Run 1: 10.90 (SE +/- 0.01, N = 3, MIN: 10.65)
  Run 2: 10.89 (SE +/- 0.02, N = 3, MIN: 10.67)
  Run 3: 10.88 (SE +/- 0.02, N = 3, MIN: 10.66)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl


Phoronix Test Suite v10.8.4