tr onednn 3.1
AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 23.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2303314-PTS-TRONEDNN36&grt&sor .
System details (runs a, b, c, d):
  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 128GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 23.04
  Kernel: 6.2.0-18-generic (x86_64)
  Desktop: GNOME Shell 44.0
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 22.3.6 (LLVM 15.0.7 DRM 3.49); also listed: 4.6 Mesa 23.0.1 (LLVM 15.0.7 DRM 3.49)
  Compiler: GCC 12.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-Pa930Z/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-Pa930Z/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301055

Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result summary (oneDNN 3.1; all values in ms, fewer is better):

  Test                                                             a          b          c          d
  onednn: IP Shapes 1D - f32 - CPU                                 3.69076    2.43237    2.42598    1.57043
  onednn: IP Shapes 3D - f32 - CPU                                 6.47745    6.37674    8.33405    8.41506
  onednn: IP Shapes 1D - u8s8f32 - CPU                             11.59889   3.60412    2.60751    2.33334
  onednn: IP Shapes 3D - u8s8f32 - CPU                             3.48600    1.09122    1.13827    1.14615
  onednn: Convolution Batch Shapes Auto - f32 - CPU                0.919370   0.923333   1.02783    0.967448
  onednn: Deconvolution Batch shapes_1d - f32 - CPU                10.48951   10.9027    10.2384    9.81509
  onednn: Deconvolution Batch shapes_3d - f32 - CPU                2.08690    2.07202    2.10876    2.08485
  onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU            6.48493    6.54503    6.6289     6.66997
  onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU            1.78620    1.76202    1.82592    1.74741
  onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU            0.989134   0.97877    0.964088   0.978819
  onednn: Recurrent Neural Network Training - f32 - CPU            4011.41    3998.12    4018.1     4017.57
  onednn: Recurrent Neural Network Inference - f32 - CPU           859.098    844.462    857.907    839.176
  onednn: Recurrent Neural Network Training - u8s8f32 - CPU        4024.21    4014.98    4008.56    4001.77
  onednn: Recurrent Neural Network Inference - u8s8f32 - CPU       856.722    858.404    850.4      862.304
  onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU   4042.59    4007.14    4027.35    4010.35
  onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU  844.978    864.209    858.755    841.923
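One way to compare the four runs at a glance is to reduce the per-test times to a single ratio per run. The sketch below is not part of the Phoronix Test Suite output: it uses only the four IP Shapes rows copied from the summary, and the helper names `geomean_ratio` and `fastest` are hypothetical. It computes the geometric mean of each run's time relative to run a, so a value below 1.0 means the run was, on balance, faster than a on these tests.

```python
import math

# Times in ms (fewer is better), copied from the summary for the IP Shapes tests.
results = {
    "IP Shapes 1D - f32": {"a": 3.69076, "b": 2.43237, "c": 2.42598, "d": 1.57043},
    "IP Shapes 3D - f32": {"a": 6.47745, "b": 6.37674, "c": 8.33405, "d": 8.41506},
    "IP Shapes 1D - u8s8f32": {"a": 11.59889, "b": 3.60412, "c": 2.60751, "d": 2.33334},
    "IP Shapes 3D - u8s8f32": {"a": 3.48600, "b": 1.09122, "c": 1.13827, "d": 1.14615},
}


def geomean_ratio(results, run, baseline="a"):
    """Geometric mean of run/baseline time ratios over all tests (hypothetical helper)."""
    logs = [math.log(times[run] / times[baseline]) for times in results.values()]
    return math.exp(sum(logs) / len(logs))


def fastest(results, test):
    """Return the run identifier with the lowest time for one test."""
    times = results[test]
    return min(times, key=times.get)


# Print one relative score per run, with run a as the 1.0 baseline.
for run in "abcd":
    print(run, round(geomean_ratio(results, run), 3))
```

The geometric mean is used here rather than the arithmetic mean so that tests with very different absolute times (a few ms versus thousands of ms) contribute equally to the ratio.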
oneDNN 3.1
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.06973, N = 12
  d: 1.57043 (MIN: 1.39)
  c: 2.42598 (MIN: 2)
  b: 2.43237 (MIN: 1.98)
  a: 3.69076 (MIN: 2.44)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN 3.1
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.01580, N = 3
  b: 6.37674 (MIN: 6.23)
  a: 6.47745 (MIN: 5.78)
  c: 8.33405 (MIN: 8.22)
  d: 8.41506 (MIN: 8.3)
oneDNN 3.1
Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 1.62613, N = 12
  d: 2.33334 (MIN: 1.99)
  c: 2.60751 (MIN: 2.14)
  b: 3.60412 (MIN: 2.35)
  a: 11.59889 (MIN: 1.68)
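The SE figure for this test is large relative to the reported times. As a rough aid to reading it, the sketch below converts an SE into an implied standard deviation, assuming SE denotes the standard error of the mean over N trials (the export does not define the term, and does not say which run the SE belongs to). The function name `stddev_from_se` is hypothetical.

```python
import math


def stddev_from_se(se, n):
    """Implied sample standard deviation, assuming se = stddev / sqrt(n)."""
    return se * math.sqrt(n)


# SE and N as reported for IP Shapes 1D - u8s8f32.
se, n = 1.62613, 12
print(round(stddev_from_se(se, n), 2))
```

Under that assumption the trial-to-trial spread is several ms, which is larger than most of the times reported for this test, so differences between runs here should be read with caution.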
oneDNN 3.1
Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.52175, N = 15
  b: 1.09122 (MIN: 0.99)
  c: 1.13827 (MIN: 1.05)
  d: 1.14615 (MIN: 1.04)
  a: 3.48600 (MIN: 0.97)
oneDNN 3.1
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.002441, N = 3
  a: 0.919370 (MIN: 0.85)
  b: 0.923333 (MIN: 0.86)
  d: 0.967448 (MIN: 0.89)
  c: 1.027830 (MIN: 0.94)
oneDNN 3.1
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.09699, N = 7
  d: 9.81509 (MIN: 8.37)
  c: 10.23840 (MIN: 8.5)
  a: 10.48951 (MIN: 8.19)
  b: 10.90270 (MIN: 8.81)
oneDNN 3.1
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.01000, N = 3
  b: 2.07202 (MIN: 2.02)
  d: 2.08485 (MIN: 2.03)
  a: 2.08690 (MIN: 2.03)
  c: 2.10876 (MIN: 2.03)
oneDNN 3.1
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.00405, N = 3
  a: 6.48493 (MIN: 6.38)
  b: 6.54503 (MIN: 6.43)
  c: 6.62890 (MIN: 6.52)
  d: 6.66997 (MIN: 6.56)
oneDNN 3.1
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.01827, N = 3
  d: 1.74741 (MIN: 1.49)
  b: 1.76202 (MIN: 1.54)
  a: 1.78620 (MIN: 1.47)
  c: 1.82592 (MIN: 1.5)
oneDNN 3.1
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 0.001428, N = 3
  c: 0.964088 (MIN: 0.9)
  b: 0.978770 (MIN: 0.92)
  d: 0.978819 (MIN: 0.92)
  a: 0.989134 (MIN: 0.93)
oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 6.72, N = 3
  b: 3998.12 (MIN: 3974.85)
  a: 4011.41 (MIN: 3977.24)
  d: 4017.57 (MIN: 3992.22)
  c: 4018.10 (MIN: 3995.65)
oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 6.16, N = 3
  d: 839.18 (MIN: 821.06)
  b: 844.46 (MIN: 828.46)
  c: 857.91 (MIN: 840.05)
  a: 859.10 (MIN: 831.54)
oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 9.26, N = 3
  d: 4001.77 (MIN: 3979.09)
  c: 4008.56 (MIN: 3987.06)
  b: 4014.98 (MIN: 3992.86)
  a: 4024.21 (MIN: 3990.89)
oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better. SE +/- 8.46, N = 3
  c: 850.40 (MIN: 834.24)
  a: 856.72 (MIN: 825.43)
  b: 858.40 (MIN: 842.21)
  d: 862.30 (MIN: 844.3)
oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better. SE +/- 7.92, N = 3
  b: 4007.14 (MIN: 3981.44)
  d: 4010.35 (MIN: 3987.69)
  c: 4027.35 (MIN: 4005.47)
  a: 4042.59 (MIN: 4009.52)
oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better. SE +/- 3.68, N = 3
  d: 841.92 (MIN: 825.42)
  a: 844.98 (MIN: 824.01)
  c: 858.76 (MIN: 842.97)
  b: 864.21 (MIN: 847.67)
Phoronix Test Suite v10.8.5