3900XT oneDNN 2.0: AMD Ryzen 9 3900XT 12-Core testing with an MSI MEG X570 GODLIKE (MS-7C34) v1.0 (1.B3 BIOS) motherboard and AMD Radeon RX 56/64 8GB graphics on Ubuntu 20.10, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2012114-PTS-3900XTON61.
3900XT oneDNN 2.0 System Details (identical for runs 1, 2, and 3):

Processor: AMD Ryzen 9 3900XT 12-Core @ 3.80GHz (12 Cores / 24 Threads)
Motherboard: MSI MEG X570 GODLIKE (MS-7C34) v1.0 (1.B3 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 500GB Seagate FireCuda 520 SSD ZP500GM30002
Graphics: AMD Radeon RX 56/64 8GB (1630/945MHz)
Audio: AMD Vega 10 HDMI Audio
Monitor: ASUS MG28U
Network: Realtek Device 2600 + Realtek Device 3000 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10
Kernel: 5.8.0-31-generic (x86_64)
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
Display Driver: amdgpu 19.1.0
OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
Vulkan: 1.2.131
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 3840x2160

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled); CPU Microcode: 0x8701021

Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling; srbds: Not affected; tsx_async_abort: Not affected
3900XT oneDNN 2.0 Results Summary (all values in ms; fewer is better):

Test (Harness - Data Type - Engine)                          Run 1      Run 2      Run 3
IP Shapes 1D - f32 - CPU                                     4.91756    4.93209    5.08005
IP Shapes 3D - f32 - CPU                                     10.9202    10.4841    10.32357
IP Shapes 1D - u8s8f32 - CPU                                 1.97943    2.01142    1.99787
IP Shapes 3D - u8s8f32 - CPU                                 0.917533   0.913077   0.904227
Convolution Batch Shapes Auto - f32 - CPU                    22.7079    22.7706    22.8405
Deconvolution Batch shapes_1d - f32 - CPU                    3.60412    3.80744    3.69057
Deconvolution Batch shapes_3d - f32 - CPU                    5.38129    5.29598    5.42790
Convolution Batch Shapes Auto - u8s8f32 - CPU                25.4572    25.1449    25.2608
Deconvolution Batch shapes_1d - u8s8f32 - CPU                4.36389    4.33763    4.40262
Deconvolution Batch shapes_3d - u8s8f32 - CPU                3.54912    3.58446    3.63921
Recurrent Neural Network Training - f32 - CPU                4173.77    4164.13    4198.12
Recurrent Neural Network Inference - f32 - CPU               2525.88    2488.28    2498.37
Recurrent Neural Network Training - u8s8f32 - CPU            4159.19    4256.15    4099.21
Recurrent Neural Network Inference - u8s8f32 - CPU           2581.13    2502.21    2542.01
Matrix Multiply Batch Shapes Transformer - f32 - CPU         0.955569   0.934456   0.924557
Recurrent Neural Network Training - bf16bf16bf16 - CPU       4234.30    4205.89    4135.54
Recurrent Neural Network Inference - bf16bf16bf16 - CPU      2546.38    2552.93    2564.79
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU     2.26033    2.19307    2.24168
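As a quick way to read the table above, the sketch below computes the mean and run-to-run spread for two of the rows. It is only an illustration: the run values are copied from this result file, the `results` dictionary and variable names are my own, and only the Python standard library is used.

```python
import statistics

# Two rows copied from the results summary (test -> [run 1, run 2, run 3], in ms)
results = {
    "IP Shapes 1D - f32 - CPU": [4.91756, 4.93209, 5.08005],
    "Recurrent Neural Network Training - f32 - CPU": [4173.77, 4164.13, 4198.12],
}

for test, runs in results.items():
    mean = statistics.mean(runs)
    # Run-to-run spread: range of the three runs as a percentage of the mean
    spread_pct = (max(runs) - min(runs)) / mean * 100
    print(f"{test}: mean {mean:.2f} ms, spread {spread_pct:.1f}%")
```

For these rows the spread stays within a few percent of the mean, which matches the small standard errors reported in the per-test sections below.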
Individual oneDNN 2.0 test results (ms; fewer is better). All tests compiled with (CXX) g++ options: -O3 -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
  1: 4.91756 (SE +/- 0.05292, N = 3, MIN: 4.54)
  2: 4.93209 (SE +/- 0.05679, N = 3, MIN: 4.52)
  3: 5.08005 (SE +/- 0.09243, N = 15, MIN: 4.52)

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
  1: 10.92 (SE +/- 0.12, N = 3, MIN: 10.59)
  2: 10.48 (SE +/- 0.12, N = 3, MIN: 9.99)
  3: 10.32 (SE +/- 0.08, N = 14, MIN: 9.63)

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
  1: 1.97943 (SE +/- 0.02675, N = 15, MIN: 1.84)
  2: 2.01142 (SE +/- 0.02633, N = 15, MIN: 1.83)
  3: 1.99787 (SE +/- 0.02721, N = 15, MIN: 1.84)

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
  1: 0.917533 (SE +/- 0.015200, N = 3, MIN: 0.81)
  2: 0.913077 (SE +/- 0.011052, N = 3, MIN: 0.81)
  3: 0.904227 (SE +/- 0.013741, N = 12, MIN: 0.75)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
  1: 22.71 (SE +/- 0.22, N = 3, MIN: 21)
  2: 22.77 (SE +/- 0.25, N = 3, MIN: 21.58)
  3: 22.84 (SE +/- 0.30, N = 4, MIN: 21.64)

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
  1: 3.60412 (SE +/- 0.00833, N = 3, MIN: 3.45)
  2: 3.80744 (SE +/- 0.03432, N = 3, MIN: 3.47)
  3: 3.69057 (SE +/- 0.03303, N = 11, MIN: 3.47)

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
  1: 5.38129 (SE +/- 0.03493, N = 3, MIN: 5.14)
  2: 5.29598 (SE +/- 0.01863, N = 3, MIN: 5.16)
  3: 5.42790 (SE +/- 0.08314, N = 3, MIN: 5.17)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
  1: 25.46 (SE +/- 0.43, N = 3, MIN: 23.47)
  2: 25.14 (SE +/- 0.26, N = 3, MIN: 23.6)
  3: 25.26 (SE +/- 0.33, N = 3, MIN: 23.33)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
  1: 4.36389 (SE +/- 0.06495, N = 3, MIN: 4.07)
  2: 4.33763 (SE +/- 0.04011, N = 3, MIN: 4.08)
  3: 4.40262 (SE +/- 0.07560, N = 3, MIN: 4.08)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
  1: 3.54912 (SE +/- 0.02596, N = 3, MIN: 3.37)
  2: 3.58446 (SE +/- 0.04801, N = 3, MIN: 3.38)
  3: 3.63921 (SE +/- 0.07019, N = 12, MIN: 3.37)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
  1: 4173.77 (SE +/- 42.56, N = 3, MIN: 3999.11)
  2: 4164.13 (SE +/- 26.99, N = 3, MIN: 3992.56)
  3: 4198.12 (SE +/- 55.67, N = 3, MIN: 3922.68)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
  1: 2525.88 (SE +/- 28.90, N = 6, MIN: 2401.16)
  2: 2488.28 (SE +/- 18.11, N = 3, MIN: 2399.27)
  3: 2498.37 (SE +/- 21.94, N = 3, MIN: 2341.2)

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
  1: 4159.19 (SE +/- 15.37, N = 3, MIN: 4016.61)
  2: 4256.15 (SE +/- 36.36, N = 3, MIN: 4032.62)
  3: 4099.21 (SE +/- 32.65, N = 3, MIN: 3912.69)

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
  1: 2581.13 (SE +/- 31.75, N = 5, MIN: 2405.38)
  2: 2502.21 (SE +/- 8.38, N = 3, MIN: 2403.23)
  3: 2542.01 (SE +/- 27.05, N = 3, MIN: 2400.05)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
  1: 0.955569 (SE +/- 0.015503, N = 15, MIN: 0.85)
  2: 0.934456 (SE +/- 0.009665, N = 15, MIN: 0.85)
  3: 0.924557 (SE +/- 0.008671, N = 10, MIN: 0.86)

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
  1: 4234.30 (SE +/- 24.48, N = 3, MIN: 4027.21)
  2: 4205.89 (SE +/- 24.69, N = 3, MIN: 4019.84)
  3: 4135.54 (SE +/- 30.67, N = 3, MIN: 3917.59)

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
  1: 2546.38 (SE +/- 20.57, N = 15, MIN: 2395.99)
  2: 2552.93 (SE +/- 31.82, N = 4, MIN: 2395.41)
  3: 2564.79 (SE +/- 25.22, N = 3, MIN: 2392.97)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
  1: 2.26033 (SE +/- 0.02004, N = 15, MIN: 2.1)
  2: 2.19307 (SE +/- 0.00579, N = 3, MIN: 2.12)
  3: 2.24168 (SE +/- 0.02373, N = 8, MIN: 2.1)
Phoronix Test Suite v10.8.4