onednn y 3990X AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2209285-PTS-ONEDNNY340&export=pdf&grs&sor .
onednn y 3990X Processor Motherboard Chipset Memory Disk Graphics Audio Monitor Network OS Kernel Desktop Display Server OpenGL Vulkan Compiler File-System Screen Resolution A B C AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads) Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) AMD Starship/Matisse 128GB Samsung SSD 970 EVO Plus 500GB AMD Radeon RX 5700 8GB (1750/875MHz) AMD Navi 10 HDMI Audio DELL P2415Q Intel I211 + Intel Wi-Fi 6 AX200 Ubuntu 22.10 6.0.0-060000rc7daily20220927-generic (x86_64) GNOME Shell X Server 1.21.1.3 + Wayland 4.6 Mesa 22.1.7 (LLVM 14.0.6 DRM 3.48) 1.3.211 GCC 12.2.0 ext4 3840x2160 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-Wbc0TK/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-Wbc0TK/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301055 Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
onednn y 3990X onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU onednn: IP Shapes 1D - f32 - CPU onednn: IP Shapes 1D - u8s8f32 - CPU onednn: Convolution Batch Shapes Auto - f32 - CPU onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - f32 - CPU onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Inference - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Recurrent Neural Network Training - u8s8f32 - CPU onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 3D - u8s8f32 - CPU onednn: Recurrent Neural Network Training - f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU y-cruncher: 500M y-cruncher: 1B y-cruncher: 10B A B C 6.60041 2.46105 3.32556 0.963938 1.78326 8.84997 14.3554 1290.42 1311.31 1261.59 0.972299 5170.56 5.34789 1.11141 5099.39 2.08438 5084.99 6.40545 10.999 22.616 277.637 6.25739 1.61291 2.36497 0.850528 1.87072 9.04081 14.3409 1252.43 1269.62 1242.98 0.996593 5089.14 5.31681 1.10316 5075.05 2.0978 5055.59 6.41234 10.972 22.53 277.5 10.274 1.64163 3.58297 0.948633 1.92539 8.42061 13.4715 1329.45 1262.25 1284.38 0.98403 5096.85 5.35675 1.10834 5065.26 2.08861 5074.13 6.43985 11.027 22.589 277.874 OpenBenchmarking.org
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU B A C 3 6 9 12 15 6.25739 6.60041 10.27400 MIN: 6.01 MIN: 6.35 MIN: 9.98 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU B C A 0.5537 1.1074 1.6611 2.2148 2.7685 1.61291 1.64163 2.46105 MIN: 1.41 MIN: 1.44 MIN: 2.06 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU B A C 0.8062 1.6124 2.4186 3.2248 4.031 2.36497 3.32556 3.58297 MIN: 1.88 MIN: 2.29 MIN: 2.61 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU B C A 0.2169 0.4338 0.6507 0.8676 1.0845 0.850528 0.948633 0.963938 MIN: 0.81 MIN: 0.89 MIN: 0.89 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU A B C 0.4332 0.8664 1.2996 1.7328 2.166 1.78326 1.87072 1.92539 MIN: 1.54 MIN: 1.64 MIN: 1.58 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU C A B 3 6 9 12 15 8.42061 8.84997 9.04081 MIN: 7.12 MIN: 6.87 MIN: 7.43 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU C B A 4 8 12 16 20 13.47 14.34 14.36 MIN: 12.66 MIN: 13.83 MIN: 13.65 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU B A C 300 600 900 1200 1500 1252.43 1290.42 1329.45 MIN: 1220.99 MIN: 1257.52 MIN: 1292.26 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU C B A 300 600 900 1200 1500 1262.25 1269.62 1311.31 MIN: 1236.18 MIN: 1241.95 MIN: 1280.37 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU B A C 300 600 900 1200 1500 1242.98 1261.59 1284.38 MIN: 1218.38 MIN: 1232.39 MIN: 1250.24 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU A C B 0.2242 0.4484 0.6726 0.8968 1.121 0.972299 0.984030 0.996593 MIN: 0.92 MIN: 0.93 MIN: 0.93 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU B C A 1100 2200 3300 4400 5500 5089.14 5096.85 5170.56 MIN: 5032.11 MIN: 5046.62 MIN: 5113.2 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU B A C 1.2053 2.4106 3.6159 4.8212 6.0265 5.31681 5.34789 5.35675 MIN: 5.12 MIN: 5.14 MIN: 5.16 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU B C A 0.2501 0.5002 0.7503 1.0004 1.2505 1.10316 1.10834 1.11141 MIN: 1.01 MIN: 1.02 MIN: 1.02 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU C B A 1100 2200 3300 4400 5500 5065.26 5075.05 5099.39 MIN: 5018.58 MIN: 5024.01 MIN: 5042.36 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU A C B 0.472 0.944 1.416 1.888 2.36 2.08438 2.08861 2.09780 MIN: 2.03 MIN: 2.04 MIN: 2.04 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU B C A 1100 2200 3300 4400 5500 5055.59 5074.13 5084.99 MIN: 5000.27 MIN: 5026.36 MIN: 5029.13 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
oneDNN Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 2.7 Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU A B C 2 4 6 8 10 6.40545 6.41234 6.43985 MIN: 6.31 MIN: 6.31 MIN: 6.34 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Y-Cruncher Pi Digits To Calculate: 500M OpenBenchmarking.org Seconds, Fewer Is Better Y-Cruncher 0.7.10.9513 Pi Digits To Calculate: 500M B A C 3 6 9 12 15 10.97 11.00 11.03
Y-Cruncher Pi Digits To Calculate: 1B OpenBenchmarking.org Seconds, Fewer Is Better Y-Cruncher 0.7.10.9513 Pi Digits To Calculate: 1B B C A 5 10 15 20 25 22.53 22.59 22.62
Y-Cruncher Pi Digits To Calculate: 10B OpenBenchmarking.org Seconds, Fewer Is Better Y-Cruncher 0.7.10.9513 Pi Digits To Calculate: 10B B A C 60 120 180 240 300 277.50 277.64 277.87
Phoronix Test Suite v10.8.5