baseline
AMD Ryzen 7 1700 Eight-Core testing with an ASUS PRIME B350-PLUS (5007 BIOS) and eVGA NVIDIA GeForce GTX 1080 Ti 11GB on Arch rolling via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2007193-NI-BASELINE202&grw .
baseline: system details for mainlinelinuxai

Processor: AMD Ryzen 7 1700 Eight-Core @ 3.00GHz (8 Cores / 16 Threads)
Motherboard: ASUS PRIME B350-PLUS (5007 BIOS)
Chipset: AMD 17h
Memory: 32GB
Disk: Samsung SSD 960 EVO 250GB + 3001GB Seagate ST3000DM008-2DM1 + 2000GB Western Digital WD20EFRX-68E + 2 x 2000GB Seagate ST2000DM008-2FR1
Graphics: eVGA NVIDIA GeForce GTX 1080 Ti 11GB (1556/5508MHz)
Audio: NVIDIA GP102 HDMI Audio
Monitor: U2777B
Network: Realtek RTL8111/8168/8411
OS: Arch rolling
Kernel: 5.7.8-arch1-1 (x86_64)
Display Server: X Server 1.20.8
Display Driver: NVIDIA 450.57
OpenGL: 4.6.0
Compiler: GCC 10.1.0 + Clang 10.0.0 + LLVM 10.0.0 + ICC + CUDA 10.2
File-System: ext4
Screen Resolution: 11520x2160

Notes:
- --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-isl --with-linker-hash-style=gnu
- Scaling Governor: acpi-cpufreq schedutil
- CPU Microcode: 0x8001138
- Python 3.8.3
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
baseline: result overview

Test                                                              mainlinelinuxai
onednn: IP Batch 1D - f32 - CPU                                   10.8498
onednn: IP Batch All - f32 - CPU                                  137.040
onednn: IP Batch 1D - u8s8f32 - CPU                               7.81265
onednn: IP Batch All - u8s8f32 - CPU                              91.6060
onednn: Convolution Batch Shapes Auto - f32 - CPU                 21.0595
onednn: Deconvolution Batch deconv_1d - f32 - CPU                 12.2483
onednn: Deconvolution Batch deconv_3d - f32 - CPU                 18.1529
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU             21.8465
onednn: Deconvolution Batch deconv_1d - u8s8f32 - CPU             17.8937
onednn: Deconvolution Batch deconv_3d - u8s8f32 - CPU             16.1617
onednn: Recurrent Neural Network Training - f32 - CPU             1064.854
onednn: Recurrent Neural Network Inference - f32 - CPU            282.763
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU      5.96066
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU  6.74795
build-ffmpeg: Time To Compile                                     91.237
oneDNN 1.5 (OpenBenchmarking.org; ms, Fewer Is Better; result: mainlinelinuxai)

Harness: IP Batch 1D - Data Type: f32 - Engine: CPU
10.85 (SE +/- 0.12, N = 3, MIN: 8.88)

Harness: IP Batch All - Data Type: f32 - Engine: CPU
137.04 (SE +/- 0.35, N = 3, MIN: 125.73)

Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU
7.81265 (SE +/- 0.03612, N = 3, MIN: 6.61)

Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU
91.61 (SE +/- 0.48, N = 3, MIN: 84.73)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
21.06 (SE +/- 0.16, N = 3, MIN: 17.92)

Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU
12.25 (SE +/- 0.16, N = 4, MIN: 10.14)

Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU
18.15 (SE +/- 0.22, N = 6, MIN: 16.86)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
21.85 (SE +/- 0.13, N = 3, MIN: 19.29)

Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU
17.89 (SE +/- 1.01, N = 13, MIN: 13.03)

Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU
16.16 (SE +/- 0.17, N = 15, MIN: 13.05)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
1064.85 (SE +/- 60.97, N = 12, MIN: 682.79)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
282.76 (SE +/- 4.57, N = 3, MIN: 229.75)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
5.96066 (SE +/- 0.02692, N = 3, MIN: 4.84)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
6.74795 (SE +/- 0.08744, N = 5, MIN: 5.32)

1. (CXX) g++ options (all oneDNN results): -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Timed FFmpeg Compilation 4.2.2 (OpenBenchmarking.org; Seconds, Fewer Is Better; result: mainlinelinuxai)

Time To Compile
91.24 (SE +/- 0.22, N = 3)
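
The SE +/- figures reported with each result are easier to compare across tests when expressed as a percentage of the mean. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite output: the function names, the selection of four results, and the 5% threshold are all assumptions made for illustration. The input numbers are copied from the results above.

```python
# (mean, standard error, run count) copied from the results above.
# Only four of the fifteen results are listed; the selection is illustrative.
results = {
    "IP Batch 1D f32": (10.85, 0.12, 3),
    "Deconvolution deconv_1d u8s8f32": (17.89, 1.01, 13),
    "RNN Training f32": (1064.85, 60.97, 12),
    "Timed FFmpeg Compilation": (91.24, 0.22, 3),
}


def relative_se_percent(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


def noisy_results(data, threshold_percent=5.0):
    """Return the names of results whose relative SE exceeds the threshold.

    The 5% default is a hypothetical cut-off, not a Phoronix Test Suite setting.
    """
    return sorted(
        name
        for name, (mean, se, _n) in data.items()
        if relative_se_percent(mean, se) > threshold_percent
    )


if __name__ == "__main__":
    for name, (mean, se, n) in results.items():
        print(f"{name}: {relative_se_percent(mean, se):.2f}% of mean (N = {n})")
    print("Above threshold:", noisy_results(results))
```

Under the assumed 5% threshold, the deconv_1d u8s8f32 and Recurrent Neural Network Training results stand out, so differences of a few percent on those two tests are within run-to-run spread.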
Phoronix Test Suite v10.8.4