oneDNN 3.0 Raptor Lake: Intel Core i9-13900K testing with an ASUS PRIME Z790-P WIFI (0602 BIOS) and an eVGA NVIDIA GeForce RTX 3060 12GB on Ubuntu 22.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2212209-PTS-ONEDNN3016
oneDNN 3.0 Raptor Lake - system details (identical for runs a, b, and c):

  Processor: Intel Core i9-13900K @ 4.00GHz (24 Cores / 32 Threads)
  Motherboard: ASUS PRIME Z790-P WIFI (0602 BIOS)
  Chipset: Intel Device 7a27
  Memory: 32GB
  Disk: 1000GB Western Digital WDS100T1X0E-00AFY0
  Graphics: eVGA NVIDIA GeForce RTX 3060 12GB
  Audio: Realtek ALC897
  Monitor: ASUS VP28U
  Network: Realtek RTL8125 2.5GbE + Intel Device 7a70
  OS: Ubuntu 22.10
  Kernel: 5.19.0-26-generic (x86_64)
  Desktop: GNOME Shell 43.1
  Display Server: X Server 1.21.1.4
  Display Driver: NVIDIA 525.60.11
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 12.0.89
  Vulkan: 1.3.224
  Compiler: GCC 12.2.0
  File-System: ext4
  Screen Resolution: 2560x1600

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x10e - Thermald 2.5.1

Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
oneDNN 3.0 Raptor Lake - results summary (all harnesses use Engine: CPU; times in ms, fewer is better):

  Harness - Data Type                                          a          b          c
  IP Shapes 1D - f32                                     1.71588    1.88491    1.86077
  IP Shapes 3D - f32                                     4.15008    3.93573    3.84879
  IP Shapes 1D - u8s8f32                                0.786709   0.863089   0.891727
  IP Shapes 3D - u8s8f32                                0.629303   0.621966   0.599391
  Convolution Batch Shapes Auto - f32                    5.76384    5.75816    5.75127
  Deconvolution Batch shapes_1d - f32                    7.47730    7.36674    7.86792
  Deconvolution Batch shapes_3d - f32                    3.42736    3.42391    3.42519
  Convolution Batch Shapes Auto - u8s8f32                5.91006    5.89829    5.86806
  Deconvolution Batch shapes_1d - u8s8f32               0.955404   0.972727   0.937470
  Deconvolution Batch shapes_3d - u8s8f32                1.44911    1.44938    1.44885
  Recurrent Neural Network Training - f32                2119.68    2122.24    2089.67
  Recurrent Neural Network Inference - f32               1087.51    1073.62    1104.46
  Recurrent Neural Network Training - u8s8f32            2126.29    2143.27    2097.67
  Recurrent Neural Network Inference - u8s8f32           1098.42    1099.64    1095.20
  Matrix Multiply Batch Shapes Transformer - f32        1.269326   1.179855   1.382935
  Recurrent Neural Network Training - bf16bf16bf16       2117.94    2148.96    2113.69
  Recurrent Neural Network Inference - bf16bf16bf16      1084.75    1095.58    1083.23
  Matrix Multiply Batch Shapes Transformer - u8s8f32    0.783081   0.786177   0.759611
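As one way to compare the three runs overall, the 18 timings per run can be reduced to a geometric mean. This sketch is not part of the original report; the values are copied from the summary table above, in the order the tests are listed.

```python
import math

# Timings in ms (lower is better), copied from the results summary,
# one list per run (a, b, c), in the order the tests appear.
runs = {
    "a": [1.71588, 4.15008, 0.786709, 0.629303, 5.76384, 7.47730, 3.42736,
          5.91006, 0.955404, 1.44911, 2119.68, 1087.51, 2126.29, 1098.42,
          1.269326, 2117.94, 1084.75, 0.783081],
    "b": [1.88491, 3.93573, 0.863089, 0.621966, 5.75816, 7.36674, 3.42391,
          5.89829, 0.972727, 1.44938, 2122.24, 1073.62, 2143.27, 1099.64,
          1.179855, 2148.96, 1095.58, 0.786177],
    "c": [1.86077, 3.84879, 0.891727, 0.599391, 5.75127, 7.86792, 3.42519,
          5.86806, 0.937470, 1.44885, 2089.67, 1104.46, 2097.67, 1095.20,
          1.382935, 2113.69, 1083.23, 0.759611],
}

def geomean(xs):
    # The geometric mean keeps the ~2000 ms RNN timings from drowning
    # out the sub-millisecond tests, unlike an arithmetic mean.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

for name, xs in sorted(runs.items()):
    print(f"run {name}: geometric mean {geomean(xs):.2f} ms")
```

As the per-test numbers above suggest, the three runs come out within a few percent of each other on this summary metric.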
oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 1.71588  (SE +/- 0.01122, N = 3, MIN: 1.57)
  b: 1.88491  (SE +/- 0.01809, N = 15, MIN: 1.57)
  c: 1.86077  (SE +/- 0.02132, N = 15, MIN: 1.57)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl (the same flags apply to every test in this result)
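Each result in this report is a mean over N timed runs together with its standard error. As a sketch of how such a figure is conventionally derived (the sample timings below are hypothetical; the raw per-run samples are not included in this export), the standard error of the mean is the sample standard deviation divided by the square root of N:

```python
import math
import statistics

# Hypothetical per-run timings in ms; the actual samples behind the
# "SE +/- ..., N = ..." figures above are not part of this export.
samples = [1.71, 1.73, 1.70]

n = len(samples)
mean = statistics.fmean(samples)
# Standard error of the mean: sample standard deviation / sqrt(N).
se = statistics.stdev(samples) / math.sqrt(n)
print(f"{mean:.5f} ms (SE +/- {se:.5f}, N = {n})")
```

A larger N shrinks the standard error for the same run-to-run scatter, which is why the noisier tests here were run up to 15 times.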
oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 4.15008  (SE +/- 0.02209, N = 3, MIN: 4.08)
  b: 3.93573  (SE +/- 0.00290, N = 3, MIN: 3.88)
  c: 3.84879  (SE +/- 0.00182, N = 3, MIN: 3.8)
oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 0.786709  (SE +/- 0.034022, N = 15, MIN: 0.65)
  b: 0.863089  (SE +/- 0.039890, N = 15, MIN: 0.65)
  c: 0.891727  (SE +/- 0.052959, N = 15, MIN: 0.65)
oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 0.629303  (SE +/- 0.001158, N = 3, MIN: 0.61)
  b: 0.621966  (SE +/- 0.008037, N = 14, MIN: 0.57)
  c: 0.599391  (SE +/- 0.000382, N = 3, MIN: 0.58)
oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 5.76384  (SE +/- 0.00332, N = 3, MIN: 5.54)
  b: 5.75816  (SE +/- 0.00221, N = 3, MIN: 5.53)
  c: 5.75127  (SE +/- 0.00220, N = 3, MIN: 5.53)
oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 7.47730  (SE +/- 0.11831, N = 15, MIN: 2.84)
  b: 7.36674  (SE +/- 0.09423, N = 15, MIN: 2.98)
  c: 7.86792  (SE +/- 0.15590, N = 15, MIN: 2.72)
oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 3.42736  (SE +/- 0.00441, N = 3, MIN: 3.38)
  b: 3.42391  (SE +/- 0.00149, N = 3, MIN: 3.39)
  c: 3.42519  (SE +/- 0.00291, N = 3, MIN: 3.38)
oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 5.91006  (SE +/- 0.00461, N = 3, MIN: 5.6)
  b: 5.89829  (SE +/- 0.00611, N = 3, MIN: 5.63)
  c: 5.86806  (SE +/- 0.00679, N = 3, MIN: 5.65)
oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 0.955404  (SE +/- 0.007682, N = 15, MIN: 0.86)
  b: 0.972727  (SE +/- 0.011174, N = 3, MIN: 0.86)
  c: 0.937470  (SE +/- 0.006580, N = 3, MIN: 0.86)
oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 1.44911  (SE +/- 0.00031, N = 3, MIN: 1.44)
  b: 1.44938  (SE +/- 0.00127, N = 3, MIN: 1.43)
  c: 1.44885  (SE +/- 0.00014, N = 3, MIN: 1.44)
oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 2119.68  (SE +/- 24.68, N = 3, MIN: 1981.97)
  b: 2122.24  (SE +/- 24.57, N = 3, MIN: 1978.3)
  c: 2089.67  (SE +/- 1.48, N = 3, MIN: 1979.87)
oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 1087.51  (SE +/- 8.11, N = 11, MIN: 1012.76)
  b: 1073.62  (SE +/- 4.14, N = 3, MIN: 1013.98)
  c: 1104.46  (SE +/- 15.65, N = 3, MIN: 1015.48)
oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 2126.29  (SE +/- 21.82, N = 5, MIN: 1980.46)
  b: 2143.27  (SE +/- 15.78, N = 11, MIN: 1977.9)
  c: 2097.67  (SE +/- 21.19, N = 6, MIN: 1978.89)
oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 1098.42  (SE +/- 11.90, N = 5, MIN: 1013.68)
  b: 1099.64  (SE +/- 10.37, N = 7, MIN: 1013.55)
  c: 1095.20  (SE +/- 9.64, N = 15, MIN: 1014.02)
oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
  a: 1.269326  (SE +/- 0.079866, N = 15, MIN: 0.74)
  b: 1.179855  (SE +/- 0.089715, N = 12, MIN: 0.73)
  c: 1.382935  (SE +/- 0.081255, N = 15, MIN: 0.73)
oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  a: 2117.94  (SE +/- 26.18, N = 3, MIN: 1979.58)
  b: 2148.96  (SE +/- 25.31, N = 4, MIN: 1978.19)
  c: 2113.69  (SE +/- 21.10, N = 3, MIN: 1980.31)
oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  a: 1084.75  (SE +/- 12.11, N = 4, MIN: 1014.51)
  b: 1095.58  (SE +/- 8.18, N = 15, MIN: 1014.02)
  c: 1083.23  (SE +/- 13.87, N = 3, MIN: 1015.17)
oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  a: 0.783081  (SE +/- 0.020828, N = 15, MIN: 0.53)
  b: 0.786177  (SE +/- 0.058336, N = 12, MIN: 0.53)
  c: 0.759611  (SE +/- 0.017020, N = 15, MIN: 0.55)
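Since the three runs used identical hardware and software, differences between a, b, and c reflect run-to-run noise rather than configuration. As a sketch (using a handful of the reported means, copied from above), the max/min ratio across the three runs gives a quick noise indicator per test:

```python
# Reported means (ms) for runs a, b, c, copied from the results above;
# this is an illustrative subset, not the full set of 18 tests.
results = {
    "IP Shapes 1D - f32": (1.71588, 1.88491, 1.86077),
    "Matrix Multiply Batch Shapes Transformer - f32": (1.269326, 1.179855, 1.382935),
    "Deconvolution Batch shapes_1d - f32": (7.47730, 7.36674, 7.86792),
    "Recurrent Neural Network Training - f32": (2119.68, 2122.24, 2089.67),
}

# Max/min ratio across the three runs: 1.0 means perfect agreement.
spread = {name: max(v) / min(v) for name, v in results.items()}
noisiest = max(spread, key=spread.get)
for name, r in sorted(spread.items(), key=lambda kv: -kv[1]):
    print(f"{r:.3f}x  {name}")
```

On this subset the Matrix Multiply Transformer f32 test shows the widest run-to-run spread (about 1.17x), consistent with its large SE values and the 15 runs the harness took for it.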
Phoronix Test Suite v10.8.4