avx512 onednn 3.0 ryzen 9 7950x

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG CROSSHAIR X670E HERO (0805 BIOS) and AMD Radeon RX 7900 XTX 24GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2212204-PTS-AVX512ON17&grw&sor .

System Details (runs: a, b, cc, d)
  Processor:         AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard:       ASUS ROG CROSSHAIR X670E HERO (0805 BIOS)
  Chipset:           AMD Device 14d8
  Memory:            32GB
  Disk:              Western Digital WD_BLACK SN850X 1000GB + 2000GB
  Graphics:          AMD Radeon RX 7900 XTX 24GB (3220/1249MHz)
  Audio:             AMD Device ab30
  Monitor:           ASUS MG28U
  Network:           Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
  OS:                Ubuntu 22.04
  Kernel:            5.15.0-56-generic (x86_64)
  Desktop:           GNOME Shell 42.5
  Display Server:    X Server 1.21.1.3 + Wayland
  OpenGL:            4.6 Mesa 22.3.0-devel (LLVM 15.0.3 DRM 3.49)
  OpenCL:            OpenCL 2.1 AMD-APP (3513.0)
  Compiler:          GCC 11.3.0
  File-System:       ext4
  Screen Resolution: 3840x2160

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: amd-pstate schedutil (Boost: Enabled)
  - CPU Microcode: 0xa601203

Security Details
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - retbleed: Not affected
  - spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected

Results Overview (ms, fewer is better)

Test                                                                    a         b         cc        d
onednn: IP Shapes 1D - f32 - CPU                                        1.71973   1.71247   1.72827   1.74523
onednn: IP Shapes 3D - f32 - CPU                                        3.24945   3.23468   3.27078   3.23747
onednn: IP Shapes 1D - u8s8f32 - CPU                                    0.458029  0.357162  0.357673  0.368789
onednn: IP Shapes 3D - u8s8f32 - CPU                                    0.342068  0.329774  0.368081  0.376915
onednn: IP Shapes 1D - bf16bf16bf16 - CPU                               0.702572  0.768543  0.631368  0.718084
onednn: IP Shapes 3D - bf16bf16bf16 - CPU                               1.54703   1.57145   1.57079   1.54588
onednn: Convolution Batch Shapes Auto - f32 - CPU                       5.60386   5.60673   5.61189   5.60781
onednn: Deconvolution Batch shapes_1d - f32 - CPU                       4.36135   3.76311   2.56838   4.77297
onednn: Deconvolution Batch shapes_3d - f32 - CPU                       2.36413   2.55866   2.35925   2.35522
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU                   5.20379   5.19629   5.22347   5.27147
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU                   0.421798  0.420827  0.420617  0.420822
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU                   0.587832  0.586989  0.584807  0.588675
onednn: Recurrent Neural Network Training - f32 - CPU                   1135.50   1129.74   1135.82   1134.55
onednn: Recurrent Neural Network Inference - f32 - CPU                  580.995   583.066   582.247   581.666
onednn: Recurrent Neural Network Training - u8s8f32 - CPU               1136.91   1133.9    1133.54   1134.22
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU              1.69724   1.69655   1.69867   1.69653
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU              3.34345   3.37554   3.3673    3.36723
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU              1.41319   1.40158   1.40127   1.39831
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU              580.580   582.957   579.434   583.752
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU            0.436887  0.439701  0.438763  0.439201
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU          1135.35   1135.16   1132.81   1134.7
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU         582.378   576.964   569.845   583.767
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU        0.129773  0.130403  0.129929  0.129658
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU   0.209725  0.212361  0.211933  0.210342
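
Most of the overview figures above agree to within a percent or two across the four runs, but a few do not. The following Python sketch is one way to surface those tests. It is not part of the original result page: the 5% threshold, the `(max - min) / min` spread metric, and the names `RESULTS`, `relative_spread`, and `noisy_tests` are illustrative assumptions. The numbers are copied from the overview, in the order (a, b, cc, d).

```python
# Per-run averages in ms (fewer is better), copied from the results overview.
# Tuple order is (a, b, cc, d). Names and threshold are illustrative choices.
RESULTS = {
    "IP Shapes 1D - f32": (1.71973, 1.71247, 1.72827, 1.74523),
    "IP Shapes 3D - f32": (3.24945, 3.23468, 3.27078, 3.23747),
    "IP Shapes 1D - u8s8f32": (0.458029, 0.357162, 0.357673, 0.368789),
    "IP Shapes 3D - u8s8f32": (0.342068, 0.329774, 0.368081, 0.376915),
    "IP Shapes 1D - bf16bf16bf16": (0.702572, 0.768543, 0.631368, 0.718084),
    "IP Shapes 3D - bf16bf16bf16": (1.54703, 1.57145, 1.57079, 1.54588),
    "Convolution Batch Shapes Auto - f32": (5.60386, 5.60673, 5.61189, 5.60781),
    "Deconvolution Batch shapes_1d - f32": (4.36135, 3.76311, 2.56838, 4.77297),
    "Deconvolution Batch shapes_3d - f32": (2.36413, 2.55866, 2.35925, 2.35522),
    "Convolution Batch Shapes Auto - u8s8f32": (5.20379, 5.19629, 5.22347, 5.27147),
    "Deconvolution Batch shapes_1d - u8s8f32": (0.421798, 0.420827, 0.420617, 0.420822),
    "Deconvolution Batch shapes_3d - u8s8f32": (0.587832, 0.586989, 0.584807, 0.588675),
    "Recurrent Neural Network Training - f32": (1135.50, 1129.74, 1135.82, 1134.55),
    "Recurrent Neural Network Inference - f32": (580.995, 583.066, 582.247, 581.666),
    "Recurrent Neural Network Training - u8s8f32": (1136.91, 1133.9, 1133.54, 1134.22),
    "Convolution Batch Shapes Auto - bf16bf16bf16": (1.69724, 1.69655, 1.69867, 1.69653),
    "Deconvolution Batch shapes_1d - bf16bf16bf16": (3.34345, 3.37554, 3.3673, 3.36723),
    "Deconvolution Batch shapes_3d - bf16bf16bf16": (1.41319, 1.40158, 1.40127, 1.39831),
    "Recurrent Neural Network Inference - u8s8f32": (580.580, 582.957, 579.434, 583.752),
    "Matrix Multiply Batch Shapes Transformer - f32": (0.436887, 0.439701, 0.438763, 0.439201),
    "Recurrent Neural Network Training - bf16bf16bf16": (1135.35, 1135.16, 1132.81, 1134.7),
    "Recurrent Neural Network Inference - bf16bf16bf16": (582.378, 576.964, 569.845, 583.767),
    "Matrix Multiply Batch Shapes Transformer - u8s8f32": (0.129773, 0.130403, 0.129929, 0.129658),
    "Matrix Multiply Batch Shapes Transformer - bf16bf16bf16": (0.209725, 0.212361, 0.211933, 0.210342),
}


def relative_spread(values):
    """Return (max - min) / min for one test's per-run averages."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest


def noisy_tests(results, threshold=0.05):
    """Return names of tests whose run-to-run spread exceeds the threshold."""
    return [name for name, values in results.items()
            if relative_spread(values) > threshold]


# Print each flagged test with its spread as a percentage.
for name in noisy_tests(RESULTS):
    print(f"{name}: {relative_spread(RESULTS[name]):.1%}")
```

With these inputs the sketch flags five tests: the u8s8f32 and bf16bf16bf16 IP Shapes 1D runs, IP Shapes 3D u8s8f32, and the f32 Deconvolution shapes_1d and shapes_3d runs. Differences smaller than the spread in those tests are hard to distinguish from run-to-run noise.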

oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  b:  1.71247 (MIN: 1.52)
  a:  1.71973 (MIN: 1.53)
  cc: 1.72827 (MIN: 1.53)
  d:  1.74523 (MIN: 1.54)
SE +/- 0.00347, N = 3
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  b:  3.23468 (MIN: 3.18)
  d:  3.23747 (MIN: 3.18)
  a:  3.24945 (MIN: 3.18)
  cc: 3.27078 (MIN: 3.21)
SE +/- 0.01389, N = 3

oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  b:  0.357162 (MIN: 0.34)
  cc: 0.357673 (MIN: 0.34)
  d:  0.368789 (MIN: 0.35)
  a:  0.458029 (MIN: 0.34)
SE +/- 0.022723, N = 12

oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  b:  0.329774 (MIN: 0.3)
  a:  0.342068 (MIN: 0.29)
  cc: 0.368081 (MIN: 0.32)
  d:  0.376915 (MIN: 0.34)
SE +/- 0.003911, N = 15

oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  cc: 0.631368 (MIN: 0.58)
  a:  0.702572 (MIN: 0.58)
  d:  0.718084 (MIN: 0.59)
  b:  0.768543 (MIN: 0.67)
SE +/- 0.021497, N = 15

oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  d:  1.54588 (MIN: 1.47)
  a:  1.54703 (MIN: 1.42)
  cc: 1.57079 (MIN: 1.44)
  b:  1.57145 (MIN: 1.45)
SE +/- 0.01845, N = 4

oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  a:  5.60386 (MIN: 5.51)
  b:  5.60673 (MIN: 5.51)
  d:  5.60781 (MIN: 5.52)
  cc: 5.61189 (MIN: 5.52)
SE +/- 0.00957, N = 3

oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  cc: 2.56838 (MIN: 2.39)
  b:  3.76311 (MIN: 2.45)
  a:  4.36135 (MIN: 2.42)
  d:  4.77297 (MIN: 2.45)
SE +/- 0.31165, N = 12
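
The SE figure for this test is large relative to the averages. As a rough aid to reading it, the sketch below converts a standard error back into the sample standard deviation it implies. It assumes that "SE" means the standard error of the mean over N samples, which the export does not define, and the export also does not say which run the SE line belongs to. The helper name `sd_from_se` is hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Standard deviation implied by a standard error of the mean.

    Assumes SE = s / sqrt(N), so s = SE * sqrt(N).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return se * math.sqrt(n)


# Figures reported above for Deconvolution Batch shapes_1d, f32:
# SE +/- 0.31165, N = 12 (the run they belong to is not identified).
implied_sd = sd_from_se(0.31165, 12)
print(f"implied standard deviation: {implied_sd:.2f} ms")
```

Under that assumption the implied standard deviation is about 1.08 ms, which is the same order of magnitude as the gaps between the four run averages for this test.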

oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  d:  2.35522 (MIN: 2.29)
  cc: 2.35925 (MIN: 2.28)
  a:  2.36413 (MIN: 2.28)
  b:  2.55866 (MIN: 2.29)
SE +/- 0.00179, N = 3

oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  b:  5.19629 (MIN: 5.13)
  a:  5.20379 (MIN: 5.11)
  cc: 5.22347 (MIN: 5.13)
  d:  5.27147 (MIN: 5.14)
SE +/- 0.00451, N = 3

oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  cc: 0.420617 (MIN: 0.4)
  d:  0.420822 (MIN: 0.4)
  b:  0.420827 (MIN: 0.4)
  a:  0.421798 (MIN: 0.4)
SE +/- 0.000955, N = 3

oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  cc: 0.584807 (MIN: 0.56)
  b:  0.586989 (MIN: 0.57)
  a:  0.587832 (MIN: 0.57)
  d:  0.588675 (MIN: 0.57)
SE +/- 0.000795, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  b:  1129.74 (MIN: 1125.41)
  d:  1134.55 (MIN: 1129.16)
  a:  1135.50 (MIN: 1125.5)
  cc: 1135.82 (MIN: 1130.28)
SE +/- 2.11, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  a:  581.00 (MIN: 572.88)
  d:  581.67 (MIN: 576.15)
  cc: 582.25 (MIN: 576.12)
  b:  583.07 (MIN: 576.88)
SE +/- 1.72, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  cc: 1133.54 (MIN: 1127.45)
  b:  1133.90 (MIN: 1128.93)
  d:  1134.22 (MIN: 1128.64)
  a:  1136.91 (MIN: 1128.7)
SE +/- 1.06, N = 3

oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  d:  1.69653 (MIN: 1.65)
  b:  1.69655 (MIN: 1.65)
  a:  1.69724 (MIN: 1.65)
  cc: 1.69867 (MIN: 1.65)
SE +/- 0.00009, N = 3

oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  a:  3.34345 (MIN: 3.18)
  d:  3.36723 (MIN: 3.26)
  cc: 3.36730 (MIN: 3.26)
  b:  3.37554 (MIN: 3.27)
SE +/- 0.02067, N = 3

oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  d:  1.39831 (MIN: 1.36)
  cc: 1.40127 (MIN: 1.36)
  b:  1.40158 (MIN: 1.36)
  a:  1.41319 (MIN: 1.35)
SE +/- 0.00885, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  cc: 579.43 (MIN: 573.74)
  a:  580.58 (MIN: 571.58)
  b:  582.96 (MIN: 577.42)
  d:  583.75 (MIN: 577.54)
SE +/- 1.94, N = 3

oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  a:  0.436887 (MIN: 0.42)
  cc: 0.438763 (MIN: 0.42)
  d:  0.439201 (MIN: 0.42)
  b:  0.439701 (MIN: 0.42)
SE +/- 0.001916, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  cc: 1132.81 (MIN: 1127.61)
  d:  1134.70 (MIN: 1129.33)
  b:  1135.16 (MIN: 1129.54)
  a:  1135.35 (MIN: 1128.23)
SE +/- 0.68, N = 3

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  cc: 569.85 (MIN: 566.12)
  b:  576.96 (MIN: 571.43)
  a:  582.38 (MIN: 574.3)
  d:  583.77 (MIN: 578.1)
SE +/- 1.08, N = 3

oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  d:  0.129658 (MIN: 0.12)
  a:  0.129773 (MIN: 0.12)
  cc: 0.129929 (MIN: 0.12)
  b:  0.130403 (MIN: 0.12)
SE +/- 0.000417, N = 3

oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  a:  0.209725 (MIN: 0.2)
  d:  0.210342 (MIN: 0.2)
  cc: 0.211933 (MIN: 0.2)
  b:  0.212361 (MIN: 0.2)
SE +/- 0.001018, N = 3

Phoronix Test Suite v10.8.5