onednn onnx alderlake

Intel Core i9-12900K testing with an ASUS ROG STRIX Z690-E GAMING WIFI (1003 BIOS) motherboard and a Gigabyte AMD Radeon RX 6800 XT 16GB graphics card on Ubuntu 21.10, via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2203319-NE-ONEDNNONN20&grs&sor&rro.
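This comparison can be re-run locally, since the Phoronix Test Suite accepts an OpenBenchmarking.org result ID and will fetch the same test selection for side-by-side comparison. A minimal sketch, assuming `phoronix-test-suite` is installed and the result ID from the URL above:

```shell
# Hedged sketch: reproduce this oneDNN 2.6 + ONNX Runtime 1.11 comparison.
# Passing the OpenBenchmarking.org result ID downloads the same test
# selection and compares the new local run against the A/B/C/D results here.
phoronix-test-suite benchmark 2203319-NE-ONEDNNONN20
```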

Four identically configured runs (A, B, C, D) were performed on the following system:

- Processor: Intel Core i9-12900K @ 5.20GHz (16 Cores / 24 Threads)
- Motherboard: ASUS ROG STRIX Z690-E GAMING WIFI (1003 BIOS)
- Chipset: Intel Device 7aa7
- Memory: 32GB
- Disk: 1000GB Western Digital WDS100T1X0E-00AFY0 + 2000GB
- Graphics: Gigabyte AMD Radeon RX 6800 XT 16GB (2575/1000MHz)
- Audio: Intel Device 7ad0
- Monitor: ASUS VP28U
- Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
- OS: Ubuntu 21.10
- Kernel: 5.17.0-phx (x86_64)
- Desktop: GNOME Shell 40.5
- Display Server: X Server 1.20.13 + Wayland
- OpenGL: 4.6 Mesa 22.1.0-devel (git-ae710f3 2022-03-26 impish-oibaf-ppa) (LLVM 13.0.1 DRM 3.46)
- Vulkan: 1.3.207
- Compiler: GCC 11.2.0
- File-System: ext4
- Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_performance); CPU Microcode: 0x18; Thermald 2.4.6

Python Details: Python 3.9.7

Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling; srbds: Not affected; tsx_async_abort: Not affected

Tests in this comparison (each run on configurations A through D; per-test results with standard errors follow below):

- ONNX Runtime 1.11 (Inferences Per Minute, more is better): bertsquad-12, GPT-2, yolov4, ArcFace ResNet-100, fcn-resnet101-11, and super-resolution-10, each on the CPU device with both the Parallel and Standard executors.
- oneDNN 2.6 (ms, fewer is better): IP Shapes 1D/3D, Convolution Batch Shapes Auto, Deconvolution Batch shapes_1d/shapes_3d, Matrix Multiply Batch Shapes Transformer, and Recurrent Neural Network Training/Inference, on the CPU engine with f32, u8s8f32, and (for the RNN harnesses) bf16bf16bf16 data types.

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 925 (SE +/- 8.08, N = 3)
  B: 937 (SE +/- 10.12, N = 5)
  C: 933 (SE +/- 12.55, N = 3)
  D: 964 (SE +/- 6.62, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 8.27680 (SE +/- 0.10247, N = 15, MIN: 4.02)
  B: 8.38411 (SE +/- 0.11946, N = 14, MIN: 3.99)
  C: 8.21415 (SE +/- 0.09831, N = 15, MIN: 4.09)
  D: 8.05084 (SE +/- 0.05693, N = 3, MIN: 4.24)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 0.882082 (SE +/- 0.000523, N = 3, MIN: 0.87)
  B: 0.851627 (SE +/- 0.003743, N = 3, MIN: 0.83)
  C: 0.881799 (SE +/- 0.002403, N = 3, MIN: 0.86)
  D: 0.885057 (SE +/- 0.006655, N = 3, MIN: 0.86)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1.08279 (SE +/- 0.01605, N = 15, MIN: 1.01)
  B: 1.05578 (SE +/- 0.00867, N = 3, MIN: 1.01)
  C: 1.05754 (SE +/- 0.00884, N = 3, MIN: 1.01)
  D: 1.05923 (SE +/- 0.00217, N = 3, MIN: 1.02)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 3.91213 (SE +/- 0.00073, N = 3, MIN: 3.89)
  B: 3.81887 (SE +/- 0.00339, N = 3, MIN: 3.78)
  C: 3.88487 (SE +/- 0.00218, N = 3, MIN: 3.86)
  D: 3.88717 (SE +/- 0.00582, N = 3, MIN: 3.85)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 4518 (SE +/- 53.96, N = 3)
  B: 4504 (SE +/- 63.24, N = 3)
  C: 4444 (SE +/- 28.86, N = 3)
  D: 4506 (SE +/- 57.47, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 931 (SE +/- 5.01, N = 3)
  B: 916 (SE +/- 3.92, N = 3)
  C: 918 (SE +/- 9.18, N = 3)
  D: 926 (SE +/- 2.60, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 7989 (SE +/- 58.12, N = 3)
  B: 8110 (SE +/- 47.85, N = 3)
  C: 8046 (SE +/- 40.10, N = 3)
  D: 8095 (SE +/- 24.57, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 1908 (SE +/- 5.36, N = 3)
  B: 1913 (SE +/- 2.93, N = 3)
  C: 1896 (SE +/- 11.79, N = 3)
  D: 1893 (SE +/- 4.07, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 96 (SE +/- 0.33, N = 3)
  B: 95 (SE +/- 0.33, N = 3)
  C: 95 (SE +/- 0.33, N = 3)
  D: 95 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 5332 (SE +/- 20.95, N = 3)
  B: 5349 (SE +/- 31.64, N = 3)
  C: 5328 (SE +/- 28.50, N = 3)
  D: 5384 (SE +/- 52.89, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1.35376 (SE +/- 0.01690, N = 3, MIN: 1.28)
  B: 1.35332 (SE +/- 0.01445, N = 3, MIN: 1.28)
  C: 1.34061 (SE +/- 0.00457, N = 3, MIN: 1.28)
  D: 1.34296 (SE +/- 0.00783, N = 3, MIN: 1.28)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1615.08 (SE +/- 3.13, N = 3, MIN: 1605.12)
  B: 1630.53 (SE +/- 22.13, N = 3, MIN: 1601.77)
  C: 1618.78 (SE +/- 3.52, N = 3, MIN: 1609.4)
  D: 1616.11 (SE +/- 2.17, N = 3, MIN: 1608.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 670 (SE +/- 2.33, N = 3)
  B: 664 (SE +/- 4.18, N = 3)
  C: 666 (SE +/- 2.74, N = 3)
  D: 669 (SE +/- 3.53, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 11035 (SE +/- 13.90, N = 3)
  B: 11077 (SE +/- 11.10, N = 3)
  C: 11079 (SE +/- 13.33, N = 3)
  D: 11123 (SE +/- 41.07, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 628 (SE +/- 2.75, N = 3)
  B: 632 (SE +/- 2.03, N = 3)
  C: 631 (SE +/- 2.05, N = 3)
  D: 633 (SE +/- 1.17, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 2.63724 (SE +/- 0.00542, N = 3, MIN: 2.5)
  B: 2.65135 (SE +/- 0.01495, N = 3, MIN: 2.51)
  C: 2.63180 (SE +/- 0.00440, N = 3, MIN: 2.49)
  D: 2.63287 (SE +/- 0.00281, N = 3, MIN: 2.51)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 363 (SE +/- 0.17, N = 3)
  B: 364 (SE +/- 0.50, N = 3)
  C: 362 (SE +/- 0.17, N = 3)
  D: 364 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1617.55 (SE +/- 5.04, N = 3, MIN: 1606.77)
  B: 1612.80 (SE +/- 1.77, N = 3, MIN: 1604.84)
  C: 1611.27 (SE +/- 1.60, N = 3, MIN: 1602.83)
  D: 1614.27 (SE +/- 0.67, N = 3, MIN: 1608.32)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 2883.12 (SE +/- 3.43, N = 3, MIN: 2869.72)
  B: 2875.93 (SE +/- 3.97, N = 3, MIN: 2865.18)
  C: 2881.76 (SE +/- 2.26, N = 3, MIN: 2866.35)
  D: 2879.92 (SE +/- 3.46, N = 3, MIN: 2866.64)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 2881.96 (SE +/- 6.09, N = 3, MIN: 2865.7)
  B: 2885.96 (SE +/- 2.20, N = 3, MIN: 2873.23)
  C: 2882.62 (SE +/- 1.28, N = 3, MIN: 2872.99)
  D: 2880.35 (SE +/- 2.59, N = 3, MIN: 2869.74)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 2884.94 (SE +/- 4.56, N = 3, MIN: 2874.53)
  B: 2881.54 (SE +/- 4.30, N = 3, MIN: 2866.57)
  C: 2879.53 (SE +/- 2.50, N = 3, MIN: 2866.7)
  D: 2882.22 (SE +/- 0.97, N = 3, MIN: 2871.87)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1614.54 (SE +/- 2.06, N = 3, MIN: 1608.19)
  B: 1611.80 (SE +/- 2.11, N = 3, MIN: 1603.03)
  C: 1612.97 (SE +/- 0.75, N = 3, MIN: 1606.37)
  D: 1614.78 (SE +/- 1.49, N = 3, MIN: 1607.96)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 6.05436 (SE +/- 0.00353, N = 3, MIN: 5.99)
  B: 6.04615 (SE +/- 0.00195, N = 3, MIN: 5.97)
  C: 6.05662 (SE +/- 0.00242, N = 3, MIN: 5.9)
  D: 6.04798 (SE +/- 0.00248, N = 3, MIN: 5.95)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 2.21737 (SE +/- 0.00202, N = 3, MIN: 2.2)
  B: 2.22008 (SE +/- 0.00095, N = 3, MIN: 2.19)
  C: 2.22075 (SE +/- 0.00145, N = 3, MIN: 2.2)
  D: 2.21930 (SE +/- 0.00390, N = 3, MIN: 2.19)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 5.90865 (SE +/- 0.00465, N = 3, MIN: 5.79)
  B: 5.90320 (SE +/- 0.00395, N = 3, MIN: 5.83)
  C: 5.90578 (SE +/- 0.00383, N = 3, MIN: 5.82)
  D: 5.90930 (SE +/- 0.00197, N = 3, MIN: 5.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 5.24536 (SE +/- 0.00223, N = 3, MIN: 5.18)
  B: 5.24601 (SE +/- 0.00260, N = 3, MIN: 5.17)
  C: 5.24648 (SE +/- 0.00169, N = 3, MIN: 5.19)
  D: 5.24826 (SE +/- 0.00317, N = 3, MIN: 5.17)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.org - Inferences Per Minute, More Is Better - ONNX Runtime 1.11
  A: 111 (SE +/- 0.60, N = 3)
  B: 111 (SE +/- 0.29, N = 3)
  C: 111 (SE +/- 0.17, N = 3)
  D: 111 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1.033654 (SE +/- 0.086658, N = 12, MIN: 0.71)
  B: 0.826339 (SE +/- 0.001870, N = 3, MIN: 0.71)
  C: 0.842998 (SE +/- 0.006474, N = 3, MIN: 0.72)
  D: 1.011141 (SE +/- 0.091776, N = 12, MIN: 0.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

OpenBenchmarking.org - ms, Fewer Is Better - oneDNN 2.6
  A: 1.24567 (SE +/- 0.01358, N = 3, MIN: 1.17)
  B: 1.25510 (SE +/- 0.01551, N = 3, MIN: 1.17)
  C: 1.28432 (SE +/- 0.01243, N = 3, MIN: 1.19)
  D: 1.31081 (SE +/- 0.02113, N = 15, MIN: 1.17)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread


Phoronix Test Suite v10.8.5