oneDNN 3970X

AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1201 BIOS) and AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB on Ubuntu 20.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2103139-PTS-ONEDNN3922
Result Identifier    Date             Run Test Duration
1                    March 13 2021    35 Minutes
2                    March 13 2021    34 Minutes
3                    March 13 2021    35 Minutes



oneDNN 3970X - System Details (identical for runs 1, 2, and 3)

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 4.55GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH II EXTREME (1201 BIOS)
Chipset: AMD Starship/Matisse
Memory: 64GB
Disk: Samsung SSD 980 PRO 500GB
Graphics: AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10
Kernel: 5.11.0-rc6-phx (x86_64) 20210203
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
Vulkan: 1.2.131
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301039
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Result Overview (Phoronix Test Suite): relative performance of runs 1, 2, and 3 across the 18 oneDNN test configurations listed below (normalized scale 100% to 116%).

oneDNN 3970X - Result Summary (OpenBenchmarking.org; all results in ms, fewer is better)

Harness - Data Type - Engine                                   Run 1       Run 2       Run 3
IP Shapes 1D - f32 - CPU                                     1.18716     1.18818     1.18338
IP Shapes 3D - f32 - CPU                                     4.20255     5.12478     4.61740
IP Shapes 1D - u8s8f32 - CPU                                0.912011    0.911686    0.910394
IP Shapes 3D - u8s8f32 - CPU                                0.780494    0.796531    0.793081
Convolution Batch Shapes Auto - f32 - CPU                    5.36835     5.75977     5.43301
Deconvolution Batch shapes_1d - f32 - CPU                    4.45405     4.37845     4.19662
Deconvolution Batch shapes_3d - f32 - CPU                    2.69314     2.69789     2.68814
Convolution Batch Shapes Auto - u8s8f32 - CPU                5.98528     6.47449     6.18409
Deconvolution Batch shapes_1d - u8s8f32 - CPU                1.06143     1.06134     1.06123
Deconvolution Batch shapes_3d - u8s8f32 - CPU                1.54197     1.54137     1.54074
Recurrent Neural Network Training - f32 - CPU                3713.06     3732.61     3693.95
Recurrent Neural Network Inference - f32 - CPU               874.517     878.573     883.114
Recurrent Neural Network Training - u8s8f32 - CPU            3744.82     3721.31     3696.62
Recurrent Neural Network Inference - u8s8f32 - CPU           876.087     877.354     880.430
Matrix Multiply Batch Shapes Transformer - f32 - CPU        0.389486    0.388398    0.388781
Recurrent Neural Network Training - bf16bf16bf16 - CPU       3737.45     3729.65     3691.57
Recurrent Neural Network Inference - bf16bf16bf16 - CPU      877.452     881.216     879.586
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU    0.868249    0.868381    0.865572

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.
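For reference, each harness name in this result file corresponds to a benchdnn driver (inner product, convolution, deconvolution, RNN, matmul) run against a batch file of problem shapes. A minimal sketch of what such invocations look like is shown below; the driver and mode flags are standard benchdnn options, but the specific batch-file paths and configurations are assumptions for illustration and may differ from what the Phoronix Test Suite profile actually runs against the oneDNN 2.1.2 source tree:

    # Assumed example only: inner product driver, f32, performance mode
    ./benchdnn --ip --mode=P --cfg=f32 --batch=inputs/ip/shapes_1d
    # Assumed example only: convolution driver, u8s8f32, performance mode
    ./benchdnn --conv --mode=P --cfg=u8s8f32 --batch=inputs/conv/shapes_auto

The reported number for each harness is the total perf time accumulated across the shapes in the batch file, which is why the results below are in milliseconds with fewer being better.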

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 1.18716  (SE +/- 0.00280, N = 3; Min: 1.18 / Avg: 1.19 / Max: 1.19; MIN: 1.15)
  Run 2: 1.18818  (SE +/- 0.00052, N = 3; Min: 1.19 / Avg: 1.19 / Max: 1.19; MIN: 1.15)
  Run 3: 1.18338  (SE +/- 0.00256, N = 3; Min: 1.18 / Avg: 1.18 / Max: 1.19; MIN: 1.15)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 4.20255  (SE +/- 0.00384, N = 3; Min: 4.2 / Avg: 4.2 / Max: 4.21; MIN: 4.14)
  Run 2: 5.12478  (SE +/- 0.00846, N = 3; Min: 5.12 / Avg: 5.12 / Max: 5.14; MIN: 5.08)
  Run 3: 4.61740  (SE +/- 0.01105, N = 3; Min: 4.6 / Avg: 4.62 / Max: 4.64; MIN: 4.54)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 0.912011  (SE +/- 0.000579, N = 3; Min: 0.91 / Avg: 0.91 / Max: 0.91; MIN: 0.88)
  Run 2: 0.911686  (SE +/- 0.002375, N = 3; Min: 0.91 / Avg: 0.91 / Max: 0.92; MIN: 0.88)
  Run 3: 0.910394  (SE +/- 0.000598, N = 3; Min: 0.91 / Avg: 0.91 / Max: 0.91; MIN: 0.88)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 0.780494  (SE +/- 0.002676, N = 3; Min: 0.78 / Avg: 0.78 / Max: 0.78; MIN: 0.75)
  Run 2: 0.796531  (SE +/- 0.001938, N = 3; Min: 0.79 / Avg: 0.8 / Max: 0.8; MIN: 0.76)
  Run 3: 0.793081  (SE +/- 0.003880, N = 3; Min: 0.79 / Avg: 0.79 / Max: 0.8; MIN: 0.76)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 5.36835  (SE +/- 0.01110, N = 3; Min: 5.35 / Avg: 5.37 / Max: 5.38; MIN: 5.29)
  Run 2: 5.75977  (SE +/- 0.00764, N = 3; Min: 5.75 / Avg: 5.76 / Max: 5.77; MIN: 5.7)
  Run 3: 5.43301  (SE +/- 0.00220, N = 3; Min: 5.43 / Avg: 5.43 / Max: 5.44; MIN: 5.37)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 4.45405  (SE +/- 0.17691, N = 15; Min: 3.7 / Avg: 4.45 / Max: 5.54; MIN: 3.44)
  Run 2: 4.37845  (SE +/- 0.15743, N = 12; Min: 3.74 / Avg: 4.38 / Max: 5.47; MIN: 3.31)
  Run 3: 4.19662  (SE +/- 0.12068, N = 14; Min: 3.66 / Avg: 4.2 / Max: 5.51; MIN: 3.42)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 2.69314  (SE +/- 0.00779, N = 3; Min: 2.68 / Avg: 2.69 / Max: 2.71; MIN: 2.62)
  Run 2: 2.69789  (SE +/- 0.00568, N = 3; Min: 2.69 / Avg: 2.7 / Max: 2.71; MIN: 2.63)
  Run 3: 2.68814  (SE +/- 0.00604, N = 3; Min: 2.68 / Avg: 2.69 / Max: 2.7; MIN: 2.62)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 5.98528  (SE +/- 0.02507, N = 3; Min: 5.96 / Avg: 5.99 / Max: 6.04; MIN: 5.84)
  Run 2: 6.47449  (SE +/- 0.00991, N = 3; Min: 6.46 / Avg: 6.47 / Max: 6.49; MIN: 6.32)
  Run 3: 6.18409  (SE +/- 0.00496, N = 3; Min: 6.17 / Avg: 6.18 / Max: 6.19; MIN: 6.04)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 1.06143  (SE +/- 0.00066, N = 3; Min: 1.06 / Avg: 1.06 / Max: 1.06; MIN: 1.02)
  Run 2: 1.06134  (SE +/- 0.00120, N = 3; Min: 1.06 / Avg: 1.06 / Max: 1.06; MIN: 1.02)
  Run 3: 1.06123  (SE +/- 0.00161, N = 3; Min: 1.06 / Avg: 1.06 / Max: 1.06; MIN: 1.02)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 1.54197  (SE +/- 0.00128, N = 3; Min: 1.54 / Avg: 1.54 / Max: 1.54; MIN: 1.47)
  Run 2: 1.54137  (SE +/- 0.00437, N = 3; Min: 1.53 / Avg: 1.54 / Max: 1.55; MIN: 1.47)
  Run 3: 1.54074  (SE +/- 0.00455, N = 3; Min: 1.53 / Avg: 1.54 / Max: 1.55; MIN: 1.47)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 3713.06  (SE +/- 6.12, N = 3; Min: 3700.96 / Avg: 3713.06 / Max: 3720.66; MIN: 3696.05)
  Run 2: 3732.61  (SE +/- 7.00, N = 3; Min: 3719.81 / Avg: 3732.61 / Max: 3743.9; MIN: 3713.58)
  Run 3: 3693.95  (SE +/- 10.94, N = 3; Min: 3681.09 / Avg: 3693.95 / Max: 3715.71; MIN: 3673.08)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 874.52  (SE +/- 2.11, N = 3; Min: 871.9 / Avg: 874.52 / Max: 878.7; MIN: 868.78)
  Run 2: 878.57  (SE +/- 1.11, N = 3; Min: 876.42 / Avg: 878.57 / Max: 880.13; MIN: 871.76)
  Run 3: 883.11  (SE +/- 4.33, N = 3; Min: 875.85 / Avg: 883.11 / Max: 890.83; MIN: 870.19)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 3744.82  (SE +/- 6.87, N = 3; Min: 3733.37 / Avg: 3744.82 / Max: 3757.12; MIN: 3729.91)
  Run 2: 3721.31  (SE +/- 5.55, N = 3; Min: 3712.1 / Avg: 3721.31 / Max: 3731.28; MIN: 3702.5)
  Run 3: 3696.62  (SE +/- 4.59, N = 3; Min: 3690.9 / Avg: 3696.62 / Max: 3705.7; MIN: 3686.67)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 876.09  (SE +/- 1.09, N = 3; Min: 873.92 / Avg: 876.09 / Max: 877.38; MIN: 869.71)
  Run 2: 877.35  (SE +/- 0.59, N = 3; Min: 876.76 / Avg: 877.35 / Max: 878.54; MIN: 870.9)
  Run 3: 880.43  (SE +/- 0.72, N = 3; Min: 878.98 / Avg: 880.43 / Max: 881.23; MIN: 874.81)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
  Run 1: 0.389486  (SE +/- 0.000488, N = 3; Min: 0.39 / Avg: 0.39 / Max: 0.39; MIN: 0.38)
  Run 2: 0.388398  (SE +/- 0.000135, N = 3; Min: 0.39 / Avg: 0.39 / Max: 0.39; MIN: 0.38)
  Run 3: 0.388781  (SE +/- 0.000706, N = 3; Min: 0.39 / Avg: 0.39 / Max: 0.39; MIN: 0.38)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  Run 1: 3737.45  (SE +/- 19.20, N = 3; Min: 3703.4 / Avg: 3737.45 / Max: 3769.83; MIN: 3700.79)
  Run 2: 3729.65  (SE +/- 6.90, N = 3; Min: 3722.44 / Avg: 3729.65 / Max: 3743.45; MIN: 3716.28)
  Run 3: 3691.57  (SE +/- 12.73, N = 3; Min: 3668.45 / Avg: 3691.57 / Max: 3712.35; MIN: 3665.53)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  Run 1: 877.45  (SE +/- 1.06, N = 3; Min: 875.66 / Avg: 877.45 / Max: 879.33; MIN: 872.62)
  Run 2: 881.22  (SE +/- 0.80, N = 3; Min: 879.62 / Avg: 881.22 / Max: 882.04; MIN: 876.32)
  Run 3: 879.59  (SE +/- 3.26, N = 3; Min: 874.13 / Avg: 879.59 / Max: 885.41; MIN: 869.02)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  Run 1: 0.868249  (SE +/- 0.000929, N = 3; Min: 0.87 / Avg: 0.87 / Max: 0.87; MIN: 0.82)
  Run 2: 0.868381  (SE +/- 0.000926, N = 3; Min: 0.87 / Avg: 0.87 / Max: 0.87; MIN: 0.82)
  Run 3: 0.865572  (SE +/- 0.000441, N = 3; Min: 0.86 / Avg: 0.87 / Max: 0.87; MIN: 0.81)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl