avx512 onednn 3.0 ryzen 9 7950x

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG CROSSHAIR X670E HERO (0805 BIOS) and AMD Radeon RX 7900 XTX 24GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2212204-PTS-AVX512ON17&sro&grs .
System Details (runs: a, b, cc, d)

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG CROSSHAIR X670E HERO (0805 BIOS)
Chipset: AMD Device 14d8
Memory: 32GB
Disk: Western Digital WD_BLACK SN850X 1000GB + 2000GB
Graphics: AMD Radeon RX 7900 XTX 24GB (3220/1249MHz)
Audio: AMD Device ab30
Monitor: ASUS MG28U
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 22.04
Kernel: 5.15.0-56-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.3 + Wayland
OpenGL: 4.6 Mesa 22.3.0-devel (LLVM 15.0.3 DRM 3.49)
OpenCL: OpenCL 2.1 AMD-APP (3513.0)
Compiler: GCC 11.3.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate schedutil (Boost: Enabled)
- CPU Microcode: 0xa601203

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
Results Overview (oneDNN 3.0; all values in ms, fewer is better)

a         b         cc        d         Test
0.342068  0.329774  0.368081  0.376915  IP Shapes 3D - u8s8f32 - CPU
2.36413   2.55866   2.35925   2.35522   Deconvolution Batch shapes_3d - f32 - CPU
582.378   576.964   569.845   583.767   Recurrent Neural Network Inference - bf16bf16bf16 - CPU
1.71973   1.71247   1.72827   1.74523   IP Shapes 1D - f32 - CPU
1.54703   1.57145   1.57079   1.54588   IP Shapes 3D - bf16bf16bf16 - CPU
5.20379   5.19629   5.22347   5.27147   Convolution Batch Shapes Auto - u8s8f32 - CPU
0.209725  0.212361  0.211933  0.210342  Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
3.24945   3.23468   3.27078   3.23747   IP Shapes 3D - f32 - CPU
1.41319   1.40158   1.40127   1.39831   Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
3.34345   3.37554   3.3673    3.36723   Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
580.580   582.957   579.434   583.752   Recurrent Neural Network Inference - u8s8f32 - CPU
0.587832  0.586989  0.584807  0.588675  Deconvolution Batch shapes_3d - u8s8f32 - CPU
0.436887  0.439701  0.438763  0.439201  Matrix Multiply Batch Shapes Transformer - f32 - CPU
0.129773  0.130403  0.129929  0.129658  Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU
1135.50   1129.74   1135.82   1134.55   Recurrent Neural Network Training - f32 - CPU
580.995   583.066   582.247   581.666   Recurrent Neural Network Inference - f32 - CPU
1136.91   1133.9    1133.54   1134.22   Recurrent Neural Network Training - u8s8f32 - CPU
0.421798  0.420827  0.420617  0.420822  Deconvolution Batch shapes_1d - u8s8f32 - CPU
1135.35   1135.16   1132.81   1134.7    Recurrent Neural Network Training - bf16bf16bf16 - CPU
5.60386   5.60673   5.61189   5.60781   Convolution Batch Shapes Auto - f32 - CPU
1.69724   1.69655   1.69867   1.69653   Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
4.36135   3.76311   2.56838   4.77297   Deconvolution Batch shapes_1d - f32 - CPU
0.702572  0.768543  0.631368  0.718084  IP Shapes 1D - bf16bf16bf16 - CPU
0.458029  0.357162  0.357673  0.368789  IP Shapes 1D - u8s8f32 - CPU
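Most tests agree closely across runs a, b, cc, and d, but a few vary widely. The following is a minimal Python sketch of one way to flag such tests by their run-to-run spread. The three result rows are copied from the overview above; the function names and the 5% threshold are illustrative assumptions, not part of the Phoronix Test Suite.

```python
# Per-run averages in ms (fewer is better), copied from the results above.
# Only three of the 24 tests are listed here; run order is a, b, cc, d.
results = {
    "Deconvolution Batch shapes_1d - f32": [4.36135, 3.76311, 2.56838, 4.77297],
    "IP Shapes 1D - u8s8f32": [0.458029, 0.357162, 0.357673, 0.368789],
    "Convolution Batch Shapes Auto - bf16bf16bf16": [1.69724, 1.69655, 1.69867, 1.69653],
}

# Hypothetical cutoff: treat more than 5% spread between runs as noisy.
THRESHOLD_PERCENT = 5.0


def spread_percent(values):
    """Return (max - min) / min as a percentage."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100.0


def noisy_tests(table, threshold=THRESHOLD_PERCENT):
    """Return the names of tests whose spread exceeds the threshold."""
    return [name for name, values in table.items()
            if spread_percent(values) > threshold]


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.1f}% spread")
print("noisy:", noisy_tests(results))
```

With these three rows, the two tests that also report N = 12 in their detailed results exceed the assumed threshold, while the convolution test stays well under 1%.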
oneDNN 3.0
Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  0.342068 (MIN: 0.29)
b:  0.329774 (MIN: 0.3)
cc: 0.368081 (MIN: 0.32)
d:  0.376915 (MIN: 0.34)
SE +/- 0.003911, N = 15
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  2.36413 (MIN: 2.28)
b:  2.55866 (MIN: 2.29)
cc: 2.35925 (MIN: 2.28)
d:  2.35522 (MIN: 2.29)
SE +/- 0.00179, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  582.38 (MIN: 574.3)
b:  576.96 (MIN: 571.43)
cc: 569.85 (MIN: 566.12)
d:  583.77 (MIN: 578.1)
SE +/- 1.08, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  1.71973 (MIN: 1.53)
b:  1.71247 (MIN: 1.52)
cc: 1.72827 (MIN: 1.53)
d:  1.74523 (MIN: 1.54)
SE +/- 0.00347, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  1.54703 (MIN: 1.42)
b:  1.57145 (MIN: 1.45)
cc: 1.57079 (MIN: 1.44)
d:  1.54588 (MIN: 1.47)
SE +/- 0.01845, N = 4
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  5.20379 (MIN: 5.11)
b:  5.19629 (MIN: 5.13)
cc: 5.22347 (MIN: 5.13)
d:  5.27147 (MIN: 5.14)
SE +/- 0.00451, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  0.209725 (MIN: 0.2)
b:  0.212361 (MIN: 0.2)
cc: 0.211933 (MIN: 0.2)
d:  0.210342 (MIN: 0.2)
SE +/- 0.001018, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  3.24945 (MIN: 3.18)
b:  3.23468 (MIN: 3.18)
cc: 3.27078 (MIN: 3.21)
d:  3.23747 (MIN: 3.18)
SE +/- 0.01389, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  1.41319 (MIN: 1.35)
b:  1.40158 (MIN: 1.36)
cc: 1.40127 (MIN: 1.36)
d:  1.39831 (MIN: 1.36)
SE +/- 0.00885, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  3.34345 (MIN: 3.18)
b:  3.37554 (MIN: 3.27)
cc: 3.36730 (MIN: 3.26)
d:  3.36723 (MIN: 3.26)
SE +/- 0.02067, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  580.58 (MIN: 571.58)
b:  582.96 (MIN: 577.42)
cc: 579.43 (MIN: 573.74)
d:  583.75 (MIN: 577.54)
SE +/- 1.94, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  0.587832 (MIN: 0.57)
b:  0.586989 (MIN: 0.57)
cc: 0.584807 (MIN: 0.56)
d:  0.588675 (MIN: 0.57)
SE +/- 0.000795, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  0.436887 (MIN: 0.42)
b:  0.439701 (MIN: 0.42)
cc: 0.438763 (MIN: 0.42)
d:  0.439201 (MIN: 0.42)
SE +/- 0.001916, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  0.129773 (MIN: 0.12)
b:  0.130403 (MIN: 0.12)
cc: 0.129929 (MIN: 0.12)
d:  0.129658 (MIN: 0.12)
SE +/- 0.000417, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  1135.50 (MIN: 1125.5)
b:  1129.74 (MIN: 1125.41)
cc: 1135.82 (MIN: 1130.28)
d:  1134.55 (MIN: 1129.16)
SE +/- 2.11, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  581.00 (MIN: 572.88)
b:  583.07 (MIN: 576.88)
cc: 582.25 (MIN: 576.12)
d:  581.67 (MIN: 576.15)
SE +/- 1.72, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  1136.91 (MIN: 1128.7)
b:  1133.90 (MIN: 1128.93)
cc: 1133.54 (MIN: 1127.45)
d:  1134.22 (MIN: 1128.64)
SE +/- 1.06, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  0.421798 (MIN: 0.4)
b:  0.420827 (MIN: 0.4)
cc: 0.420617 (MIN: 0.4)
d:  0.420822 (MIN: 0.4)
SE +/- 0.000955, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  1135.35 (MIN: 1128.23)
b:  1135.16 (MIN: 1129.54)
cc: 1132.81 (MIN: 1127.61)
d:  1134.70 (MIN: 1129.33)
SE +/- 0.68, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  5.60386 (MIN: 5.51)
b:  5.60673 (MIN: 5.51)
cc: 5.61189 (MIN: 5.52)
d:  5.60781 (MIN: 5.52)
SE +/- 0.00957, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  1.69724 (MIN: 1.65)
b:  1.69655 (MIN: 1.65)
cc: 1.69867 (MIN: 1.65)
d:  1.69653 (MIN: 1.65)
SE +/- 0.00009, N = 3
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
a:  4.36135 (MIN: 2.42)
b:  3.76311 (MIN: 2.45)
cc: 2.56838 (MIN: 2.39)
d:  4.77297 (MIN: 2.45)
SE +/- 0.31165, N = 12
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

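The Deconvolution Batch shapes_1d f32 result reports the largest standard error in this file (SE +/- 0.31165, N = 12). As a rough aid to reading that figure, the sketch below converts it back to a sample standard deviation. It assumes the SE is the conventional standard error of the mean, that is, the sample standard deviation divided by the square root of N. The export does not state how the Phoronix Test Suite computes it or which run it belongs to, and the function name is hypothetical.

```python
import math


def implied_sample_sd(standard_error, n):
    """Invert SE = sd / sqrt(n) to recover the sample standard deviation.

    Assumes the conventional standard-error-of-the-mean definition.
    """
    return standard_error * math.sqrt(n)


# Values reported for Deconvolution Batch shapes_1d - f32.
se_ms = 0.31165
n_runs = 12

sd_ms = implied_sample_sd(se_ms, n_runs)
print(f"implied sample SD: {sd_ms:.3f} ms over N = {n_runs}")
```

Under that assumption the implied standard deviation is roughly 1.08 ms, large compared with the 2.39 to 2.45 ms minimums reported for the same test.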
oneDNN 3.0
Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
a:  0.702572 (MIN: 0.58)
b:  0.768543 (MIN: 0.67)
cc: 0.631368 (MIN: 0.58)
d:  0.718084 (MIN: 0.59)
SE +/- 0.021497, N = 15
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0
Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
a:  0.458029 (MIN: 0.34)
b:  0.357162 (MIN: 0.34)
cc: 0.357673 (MIN: 0.34)
d:  0.368789 (MIN: 0.35)
SE +/- 0.022723, N = 12
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

Phoronix Test Suite v10.8.5