oneDNN 3970X: AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1201 BIOS) and AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB on Ubuntu 20.10, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2103139-PTS-ONEDNN3922&grw&sor .
oneDNN 3970X - system configuration (shared by runs 1, 2, and 3):

  Processor: AMD Ryzen Threadripper 3970X 32-Core @ 4.55GHz (32 Cores / 64 Threads)
  Motherboard: ASUS ROG ZENITH II EXTREME (1201 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 64GB
  Disk: Samsung SSD 980 PRO 500GB
  Graphics: AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: ASUS VP28U
  Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 20.10
  Kernel: 5.11.0-rc6-phx (x86_64) 20210203
  Desktop: GNOME Shell 3.38.1
  Display Server: X Server 1.20.9
  OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
  Vulkan: 1.2.131
  Compiler: GCC 10.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301039

Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling; srbds: Not affected; tsx_async_abort: Not affected
oneDNN 3970X - results summary (onednn; all timings in ms, fewer is better; Engine: CPU):

  Harness - Data Type                              | Run 1    | Run 2    | Run 3
  IP Shapes 1D - f32                               | 1.18716  | 1.18818  | 1.18338
  IP Shapes 3D - f32                               | 4.20255  | 5.12478  | 4.61740
  IP Shapes 1D - u8s8f32                           | 0.912011 | 0.911686 | 0.910394
  IP Shapes 3D - u8s8f32                           | 0.780494 | 0.796531 | 0.793081
  Convolution Batch Shapes Auto - f32              | 5.36835  | 5.75977  | 5.43301
  Deconvolution Batch shapes_1d - f32              | 4.45405  | 4.37845  | 4.19662
  Deconvolution Batch shapes_3d - f32              | 2.69314  | 2.69789  | 2.68814
  Convolution Batch Shapes Auto - u8s8f32          | 5.98528  | 6.47449  | 6.18409
  Deconvolution Batch shapes_1d - u8s8f32          | 1.06143  | 1.06134  | 1.06123
  Deconvolution Batch shapes_3d - u8s8f32          | 1.54197  | 1.54137  | 1.54074
  Recurrent Neural Network Training - f32          | 3713.06  | 3732.61  | 3693.95
  Recurrent Neural Network Inference - f32         | 874.517  | 878.573  | 883.114
  Recurrent Neural Network Training - u8s8f32      | 3744.82  | 3721.31  | 3696.62
  Recurrent Neural Network Inference - u8s8f32     | 876.087  | 877.354  | 880.430
  Matrix Multiply Batch Shapes Transformer - f32   | 0.389486 | 0.388398 | 0.388781
  Recurrent Neural Network Training - bf16bf16bf16 | 3737.45  | 3729.65  | 3691.57
  Recurrent Neural Network Inference - bf16bf16bf16| 877.452  | 881.216  | 879.586
  Matrix Multiply Batch Shapes Transformer - u8s8f32 | 0.868249 | 0.868381 | 0.865572
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 3: 1.18338 (SE +/- 0.00256, N = 3, MIN: 1.15)
  Run 1: 1.18716 (SE +/- 0.00280, N = 3, MIN: 1.15)
  Run 2: 1.18818 (SE +/- 0.00052, N = 3, MIN: 1.15)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
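The "SE +/- x, N = y" figure beside each averaged result is the standard error over the N recorded trials. As a minimal sketch of that calculation (the trial timings below are hypothetical, not taken from this result file):

```python
import math

def mean_and_se(trials):
    """Return (mean, standard error) for a list of trial timings.

    Standard error = sample standard deviation / sqrt(N), which is
    the "SE +/- x, N = y" figure reported next to each average.
    """
    n = len(trials)
    mean = sum(trials) / n
    # Sample variance with Bessel's correction (n - 1).
    var = sum((t - mean) ** 2 for t in trials) / (n - 1)
    return mean, math.sqrt(var) / math.sqrt(n)

# Hypothetical trial timings in ms (not from the result file above).
avg, se = mean_and_se([1.18, 1.19, 1.20])
print(f"{avg:.5f} ms, SE +/- {se:.5f}, N = 3")
```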
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 1: 4.20255 (SE +/- 0.00384, N = 3, MIN: 4.14)
  Run 3: 4.61740 (SE +/- 0.01105, N = 3, MIN: 4.54)
  Run 2: 5.12478 (SE +/- 0.00846, N = 3, MIN: 5.08)
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 3: 0.910394 (SE +/- 0.000598, N = 3, MIN: 0.88)
  Run 2: 0.911686 (SE +/- 0.002375, N = 3, MIN: 0.88)
  Run 1: 0.912011 (SE +/- 0.000579, N = 3, MIN: 0.88)
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 1: 0.780494 (SE +/- 0.002676, N = 3, MIN: 0.75)
  Run 3: 0.793081 (SE +/- 0.003880, N = 3, MIN: 0.76)
  Run 2: 0.796531 (SE +/- 0.001938, N = 3, MIN: 0.76)
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 1: 5.36835 (SE +/- 0.01110, N = 3, MIN: 5.29)
  Run 3: 5.43301 (SE +/- 0.00220, N = 3, MIN: 5.37)
  Run 2: 5.75977 (SE +/- 0.00764, N = 3, MIN: 5.7)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 3: 4.19662 (SE +/- 0.12068, N = 14, MIN: 3.42)
  Run 2: 4.37845 (SE +/- 0.15743, N = 12, MIN: 3.31)
  Run 1: 4.45405 (SE +/- 0.17691, N = 15, MIN: 3.44)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 3: 2.68814 (SE +/- 0.00604, N = 3, MIN: 2.62)
  Run 1: 2.69314 (SE +/- 0.00779, N = 3, MIN: 2.62)
  Run 2: 2.69789 (SE +/- 0.00568, N = 3, MIN: 2.63)
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 1: 5.98528 (SE +/- 0.02507, N = 3, MIN: 5.84)
  Run 3: 6.18409 (SE +/- 0.00496, N = 3, MIN: 6.04)
  Run 2: 6.47449 (SE +/- 0.00991, N = 3, MIN: 6.32)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 3: 1.06123 (SE +/- 0.00161, N = 3, MIN: 1.02)
  Run 2: 1.06134 (SE +/- 0.00120, N = 3, MIN: 1.02)
  Run 1: 1.06143 (SE +/- 0.00066, N = 3, MIN: 1.02)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 3: 1.54074 (SE +/- 0.00455, N = 3, MIN: 1.47)
  Run 2: 1.54137 (SE +/- 0.00437, N = 3, MIN: 1.47)
  Run 1: 1.54197 (SE +/- 0.00128, N = 3, MIN: 1.47)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 3: 3693.95 (SE +/- 10.94, N = 3, MIN: 3673.08)
  Run 1: 3713.06 (SE +/- 6.12, N = 3, MIN: 3696.05)
  Run 2: 3732.61 (SE +/- 7.00, N = 3, MIN: 3713.58)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 1: 874.52 (SE +/- 2.11, N = 3, MIN: 868.78)
  Run 2: 878.57 (SE +/- 1.11, N = 3, MIN: 871.76)
  Run 3: 883.11 (SE +/- 4.33, N = 3, MIN: 870.19)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 3: 3696.62 (SE +/- 4.59, N = 3, MIN: 3686.67)
  Run 2: 3721.31 (SE +/- 5.55, N = 3, MIN: 3702.5)
  Run 1: 3744.82 (SE +/- 6.87, N = 3, MIN: 3729.91)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 1: 876.09 (SE +/- 1.09, N = 3, MIN: 869.71)
  Run 2: 877.35 (SE +/- 0.59, N = 3, MIN: 870.9)
  Run 3: 880.43 (SE +/- 0.72, N = 3, MIN: 874.81)
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better):
  Run 2: 0.388398 (SE +/- 0.000135, N = 3, MIN: 0.38)
  Run 3: 0.388781 (SE +/- 0.000706, N = 3, MIN: 0.38)
  Run 1: 0.389486 (SE +/- 0.000488, N = 3, MIN: 0.38)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better):
  Run 3: 3691.57 (SE +/- 12.73, N = 3, MIN: 3665.53)
  Run 2: 3729.65 (SE +/- 6.90, N = 3, MIN: 3716.28)
  Run 1: 3737.45 (SE +/- 19.20, N = 3, MIN: 3700.79)
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better):
  Run 1: 877.45 (SE +/- 1.06, N = 3, MIN: 872.62)
  Run 3: 879.59 (SE +/- 3.26, N = 3, MIN: 869.02)
  Run 2: 881.22 (SE +/- 0.80, N = 3, MIN: 876.32)
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better):
  Run 3: 0.865572 (SE +/- 0.000441, N = 3, MIN: 0.81)
  Run 1: 0.868249 (SE +/- 0.000929, N = 3, MIN: 0.82)
  Run 2: 0.868381 (SE +/- 0.000926, N = 3, MIN: 0.82)
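To compare the three runs as a whole rather than test by test, one common approach (not part of the original report) is a geometric mean over the lower-is-better timings. The sketch below applies it to a small, explicitly labeled subset of the results table above:

```python
import math

def geomean(values):
    """Geometric mean, the usual aggregate for heterogeneous timings."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Subset of the results table above: {benchmark: (run 1, run 2, run 3)} in ms.
results = {
    "IP Shapes 1D - f32": (1.18716, 1.18818, 1.18338),
    "IP Shapes 3D - f32": (4.20255, 5.12478, 4.61740),
    "RNN Training - f32": (3713.06, 3732.61, 3693.95),
    "MatMul Transformer - u8s8f32": (0.868249, 0.868381, 0.865572),
}

for run in range(3):
    g = geomean([times[run] for times in results.values()])
    print(f"Run {run + 1} geometric mean over subset: {g:.2f} ms")
```

On this subset the differences between runs are small, consistent with the per-test results above.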
Phoronix Test Suite v10.8.4