onednn 3.0 threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2212207-PTS-ONEDNN3042&grs

System details (configurations: a, b, cc)

Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: DELL P2415Q
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 22.10
Kernel: 6.1.0-rc8-phx-mglru (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4 + Wayland
OpenGL: 4.6 Mesa 22.2.1 (LLVM 15.0.2 DRM 3.49)
Vulkan: 1.3.224
Compiler: GCC 12.2.0 + LLVM 15.0.2
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301055

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Results overview (onednn, ms, fewer is better)

Test                                                      a          b          cc
IP Shapes 3D - f32 - CPU                                  5.52121    7.5412     7.6897
Convolution Batch Shapes Auto - f32 - CPU                 1.03983    1.10805    0.941291
Deconvolution Batch shapes_1d - u8s8f32 - CPU             1.72926    1.74884    1.6067
IP Shapes 3D - u8s8f32 - CPU                              1.34962    1.36747    1.4283
Deconvolution Batch shapes_3d - u8s8f32 - CPU             0.995989   0.975595   0.957802
Recurrent Neural Network Inference - u8s8f32 - CPU        1281.82    1324.29    1322.57
Recurrent Neural Network Training - bf16bf16bf16 - CPU    5273.92    5109.34    5269.95
Deconvolution Batch shapes_1d - f32 - CPU                 9.82269    9.88978    10.1148
Convolution Batch Shapes Auto - u8s8f32 - CPU             6.48281    6.58094    6.67325
Recurrent Neural Network Training - f32 - CPU             5197.15    5182.4     5246.9
Deconvolution Batch shapes_3d - f32 - CPU                 2.08900    2.09561    2.07244
Recurrent Neural Network Training - u8s8f32 - CPU         5182.24    5208.02    5236.44
Recurrent Neural Network Inference - f32 - CPU            1291.22    1291.11    1297.12
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU  12.7627    10.6197    13.8974
Recurrent Neural Network Inference - bf16bf16bf16 - CPU   1358.93    1314.16    1268.47
Matrix Multiply Batch Shapes Transformer - f32 - CPU      8.64383    10.2806    10.4148
IP Shapes 1D - u8s8f32 - CPU                              2.44556    2.62054    2.42089
IP Shapes 1D - f32 - CPU                                  3.92526    2.03399    1.65081
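
Because the 18 tests span roughly four orders of magnitude in run time, an overall comparison of a, b, and cc is easier to read when each time is first normalized to a baseline. The following is a minimal sketch, not part of the original export: it copies the mean times from the results above, and the choice of configuration a as the baseline, the geometric mean as the summary statistic, and the helper names (`fastest`, `win_counts`, `relative_to_a`) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import geometric_mean

CONFIGS = ("a", "b", "cc")

# Mean times in ms (fewer is better), copied from the results: (a, b, cc).
RESULTS = {
    "IP Shapes 3D - f32": (5.52121, 7.5412, 7.6897),
    "Convolution Batch Shapes Auto - f32": (1.03983, 1.10805, 0.941291),
    "Deconvolution Batch shapes_1d - u8s8f32": (1.72926, 1.74884, 1.6067),
    "IP Shapes 3D - u8s8f32": (1.34962, 1.36747, 1.4283),
    "Deconvolution Batch shapes_3d - u8s8f32": (0.995989, 0.975595, 0.957802),
    "Recurrent Neural Network Inference - u8s8f32": (1281.82, 1324.29, 1322.57),
    "Recurrent Neural Network Training - bf16bf16bf16": (5273.92, 5109.34, 5269.95),
    "Deconvolution Batch shapes_1d - f32": (9.82269, 9.88978, 10.1148),
    "Convolution Batch Shapes Auto - u8s8f32": (6.48281, 6.58094, 6.67325),
    "Recurrent Neural Network Training - f32": (5197.15, 5182.4, 5246.9),
    "Deconvolution Batch shapes_3d - f32": (2.08900, 2.09561, 2.07244),
    "Recurrent Neural Network Training - u8s8f32": (5182.24, 5208.02, 5236.44),
    "Recurrent Neural Network Inference - f32": (1291.22, 1291.11, 1297.12),
    "Matrix Multiply Batch Shapes Transformer - u8s8f32": (12.7627, 10.6197, 13.8974),
    "Recurrent Neural Network Inference - bf16bf16bf16": (1358.93, 1314.16, 1268.47),
    "Matrix Multiply Batch Shapes Transformer - f32": (8.64383, 10.2806, 10.4148),
    "IP Shapes 1D - u8s8f32": (2.44556, 2.62054, 2.42089),
    "IP Shapes 1D - f32": (3.92526, 2.03399, 1.65081),
}


def fastest(times):
    """Return the label of the configuration with the lowest time."""
    return CONFIGS[times.index(min(times))]


def win_counts(results):
    """Count how many tests each configuration finishes fastest."""
    return Counter(fastest(times) for times in results.values())


def relative_to_a(results):
    """Geometric mean of each configuration's time divided by a's time.

    Values below 1.0 mean the configuration is faster than a on balance.
    """
    return {
        label: geometric_mean([times[i] / times[0] for times in results.values()])
        for i, label in enumerate(CONFIGS)
    }


if __name__ == "__main__":
    for label, ratio in relative_to_a(RESULTS).items():
        print(f"{label}: {ratio:.3f}x the time of a (geometric mean)")
    print(dict(win_counts(RESULTS)))
```

The geometric mean is used here so that the multi-second Recurrent Neural Network tests do not dominate the sub-millisecond ones; a plain arithmetic mean of raw times would be driven almost entirely by the training results.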

Per-test results (OpenBenchmarking.org; ms, Fewer Is Better)

oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
SE +/- 0.02020, N = 3
a: 5.52121 (MIN: 5.21)
b: 7.54120 (MIN: 7.39)
cc: 7.68970 (MIN: 7.56)

oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
SE +/- 0.014383, N = 3
a: 1.039830 (MIN: 0.95)
b: 1.108050 (MIN: 1.04)
cc: 0.941291 (MIN: 0.88)

oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.01830, N = 15
a: 1.72926 (MIN: 1.48)
b: 1.74884 (MIN: 1.52)
cc: 1.60670 (MIN: 1.46)

oneDNN 3.0 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.01605, N = 12
a: 1.34962 (MIN: 1.05)
b: 1.36747 (MIN: 1.1)
cc: 1.42830 (MIN: 1.16)

oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.002014, N = 3
a: 0.995989 (MIN: 0.93)
b: 0.975595 (MIN: 0.92)
cc: 0.957802 (MIN: 0.92)

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
SE +/- 14.13, N = 4
a: 1281.82 (MIN: 1216.85)
b: 1324.29 (MIN: 1292.79)
cc: 1322.57 (MIN: 1292.36)

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
SE +/- 13.14, N = 3
a: 5273.92 (MIN: 5193.2)
b: 5109.34 (MIN: 5051.41)
cc: 5269.95 (MIN: 5209.89)

oneDNN 3.0 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
SE +/- 0.08002, N = 3
a: 9.82269 (MIN: 8.11)
b: 9.88978 (MIN: 8.3)
cc: 10.11480 (MIN: 8.29)

oneDNN 3.0 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.00429, N = 3
a: 6.48281 (MIN: 6.36)
b: 6.58094 (MIN: 6.49)
cc: 6.67325 (MIN: 6.56)

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
SE +/- 55.54, N = 5
a: 5197.15 (MIN: 4918.58)
b: 5182.40 (MIN: 5129.76)
cc: 5246.90 (MIN: 5190.59)

oneDNN 3.0 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
SE +/- 0.00438, N = 3
a: 2.08900 (MIN: 2.03)
b: 2.09561 (MIN: 2.04)
cc: 2.07244 (MIN: 2.03)

oneDNN 3.0 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
SE +/- 43.75, N = 3
a: 5182.24 (MIN: 5039.08)
b: 5208.02 (MIN: 5148.07)
cc: 5236.44 (MIN: 5173.95)

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
SE +/- 17.88, N = 3
a: 1291.22 (MIN: 1243.36)
b: 1291.11 (MIN: 1261.39)
cc: 1297.12 (MIN: 1258.04)

oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.37, N = 12
a: 12.76 (MIN: 9.89)
b: 10.62 (MIN: 9.93)
cc: 13.90 (MIN: 13.32)

oneDNN 3.0 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
SE +/- 76.68, N = 15
a: 1358.93 (MIN: 1193.45)
b: 1314.16 (MIN: 1256.02)
cc: 1268.47 (MIN: 1237.01)

oneDNN 3.0 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
SE +/- 0.48813, N = 15
a: 8.64383 (MIN: 5.39)
b: 10.28060 (MIN: 9.92)
cc: 10.41480 (MIN: 10.11)

oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
SE +/- 0.05461, N = 15
a: 2.44556 (MIN: 1.4)
b: 2.62054 (MIN: 2.09)
cc: 2.42089 (MIN: 1.86)

oneDNN 3.0 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
SE +/- 1.76988, N = 12
a: 3.92526 (MIN: 1.42)
b: 2.03399 (MIN: 1.68)
cc: 1.65081 (MIN: 1.43)

1. All tests above: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

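
The "SE +/-" and "N" figures reported with each test give a rough sense of how noisy a result is. The sketch below is not part of the original export and rests on several assumptions: that SE is the usual standard error of the mean (standard deviation divided by the square root of N), that comparing SE against the largest of the three means is a reasonable conservative scale (the export lists one SE per test without saying which configuration it belongs to), and that 5% is a sensible cutoff. The function names and the selection of tests are hypothetical and chosen only for illustration.

```python
import math

# Subset of the results above: test -> (SE, N, (mean a, mean b, mean cc)), in ms.
SAMPLES = {
    "IP Shapes 3D - f32": (0.02020, 3, (5.52121, 7.54120, 7.68970)),
    "Matrix Multiply Batch Shapes Transformer - u8s8f32": (0.37, 12, (12.76, 10.62, 13.90)),
    "Recurrent Neural Network Inference - bf16bf16bf16": (76.68, 15, (1358.93, 1314.16, 1268.47)),
    "Matrix Multiply Batch Shapes Transformer - f32": (0.48813, 15, (8.64383, 10.28060, 10.41480)),
    "IP Shapes 1D - u8s8f32": (0.05461, 15, (2.44556, 2.62054, 2.42089)),
    "IP Shapes 1D - f32": (1.76988, 12, (3.92526, 2.03399, 1.65081)),
}


def implied_std_dev(se, n):
    """Standard deviation implied by SE = std_dev / sqrt(n)."""
    return se * math.sqrt(n)


def relative_se(se, means):
    """SE as a fraction of the largest mean (an assumed, conservative scale)."""
    return se / max(means)


def noisy_tests(samples, threshold=0.05):
    """Names of tests whose relative SE exceeds the assumed threshold."""
    return [
        name
        for name, (se, _n, means) in samples.items()
        if relative_se(se, means) > threshold
    ]


if __name__ == "__main__":
    for name, (se, n, means) in SAMPLES.items():
        print(f"{name}: relative SE {relative_se(se, means):.1%}, "
              f"implied std dev {implied_std_dev(se, n):.3f} ms")
    print("Above threshold:", noisy_tests(SAMPLES))
```

Under these assumptions, differences between a, b, and cc that are smaller than the reported SE for a test are probably not meaningful on their own.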
Phoronix Test Suite v10.8.5