oneDNN 3970X

AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1201 BIOS) and AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB on Ubuntu 20.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2103139-PTS-ONEDNN3922
Result Identifier    Date Run          Test Duration
1                    March 13 2021     35 Minutes
2                    March 13 2021     34 Minutes
3                    March 13 2021     35 Minutes


oneDNN 3970X - System Details (identical for runs 1, 2, and 3)

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 4.55GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH II EXTREME (1201 BIOS)
Chipset: AMD Starship/Matisse
Memory: 64GB
Disk: Samsung SSD 980 PRO 500GB
Graphics: AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: ASUS VP28U
Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.10
Kernel: 5.11.0-rc6-phx (x86_64) 20210203
Desktop: GNOME Shell 3.38.1
Display Server: X Server 1.20.9
OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
Vulkan: 1.2.131
Compiler: GCC 10.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x8301039
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Result Overview (Phoronix Test Suite): relative performance of runs 1, 2, and 3 across all 18 oneDNN sub-tests, spanning roughly 100% to 116%.

oneDNN 3970X - Results Summary (all values in ms; fewer is better; Engine: CPU)

Harness - Data Type                                     Run 1       Run 2       Run 3
IP Shapes 1D - f32                                      1.18716     1.18818     1.18338
IP Shapes 3D - f32                                      4.20255     5.12478     4.61740
IP Shapes 1D - u8s8f32                                  0.912011    0.911686    0.910394
IP Shapes 3D - u8s8f32                                  0.780494    0.796531    0.793081
Convolution Batch Shapes Auto - f32                     5.36835     5.75977     5.43301
Deconvolution Batch shapes_1d - f32                     4.45405     4.37845     4.19662
Deconvolution Batch shapes_3d - f32                     2.69314     2.69789     2.68814
Convolution Batch Shapes Auto - u8s8f32                 5.98528     6.47449     6.18409
Deconvolution Batch shapes_1d - u8s8f32                 1.06143     1.06134     1.06123
Deconvolution Batch shapes_3d - u8s8f32                 1.54197     1.54137     1.54074
Recurrent Neural Network Training - f32                 3713.06     3732.61     3693.95
Recurrent Neural Network Inference - f32                874.517     878.573     883.114
Recurrent Neural Network Training - u8s8f32             3744.82     3721.31     3696.62
Recurrent Neural Network Inference - u8s8f32            876.087     877.354     880.430
Matrix Multiply Batch Shapes Transformer - f32          0.389486    0.388398    0.388781
Recurrent Neural Network Training - bf16bf16bf16        3737.45     3729.65     3691.57
Recurrent Neural Network Inference - bf16bf16bf16       877.452     881.216     879.586
Matrix Multiply Batch Shapes Transformer - u8s8f32      0.868249    0.868381    0.865572
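To collapse the 18 per-test timings into a single figure per run (what the OpenBenchmarking.org viewer's "Show Overall Geometric Mean" option does), the values above can be combined with a geometric mean. The following is a minimal, self-contained sketch in Python; the run_1/run_2/run_3 lists simply copy the table above and the geomean helper is illustrative, not part of the Phoronix Test Suite:

    import math

    # Per-run timings in ms, copied from the summary table above
    # (18 oneDNN sub-tests; lower is better).
    run_1 = [1.18716, 4.20255, 0.912011, 0.780494, 5.36835, 4.45405,
             2.69314, 5.98528, 1.06143, 1.54197, 3713.06, 874.517,
             3744.82, 876.087, 0.389486, 3737.45, 877.452, 0.868249]
    run_2 = [1.18818, 5.12478, 0.911686, 0.796531, 5.75977, 4.37845,
             2.69789, 6.47449, 1.06134, 1.54137, 3732.61, 878.573,
             3721.31, 877.354, 0.388398, 3729.65, 881.216, 0.868381]
    run_3 = [1.18338, 4.61740, 0.910394, 0.793081, 5.43301, 4.19662,
             2.68814, 6.18409, 1.06123, 1.54074, 3693.95, 883.114,
             3696.62, 880.430, 0.388781, 3691.57, 879.586, 0.865572]

    def geomean(values):
        """Geometric mean: the n-th root of the product of n values."""
        return math.exp(sum(math.log(v) for v in values) / len(values))

    for name, run in (("Run 1", run_1), ("Run 2", run_2), ("Run 3", run_3)):
        print(f"{name}: overall geometric mean = {geomean(run):.3f} ms")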

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn benchmarking functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, before that, MKL-DNN, prior to being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page. All results below were produced with oneDNN 2.1.2, built with: (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl.
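Each harness below corresponds to one benchdnn driver and batch file executed in performance mode, with the total perf time as the reported result. Below is a minimal sketch of how such a run might be scripted and its timing extracted; the benchdnn binary path, the --ip / --mode=P / --batch flags, and the "total perf" summary line are assumptions about benchdnn's command line rather than details taken from this result file.

    import re
    import subprocess

    # Hypothetical benchdnn invocation in performance mode (--mode=P) against an
    # "IP Shapes 1D"-style batch file; the binary path and batch file name are
    # assumptions and depend on where and how oneDNN was built.
    cmd = ["./tests/benchdnn/benchdnn", "--ip", "--mode=P",
           "--batch=inputs/ip/shapes_1d"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)

    # benchdnn is assumed to finish with a "total perf: ... avg(ms):<value>" summary
    # line; adjust the pattern if the installed version prints a different format.
    match = re.search(r"total perf:.*avg\(ms\):([0-9.]+)", result.stdout)
    if match:
        print(f"total perf time: {match.group(1)} ms")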

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 1.18716 (SE +/- 0.00280, N = 3; Min/Avg/Max: 1.18 / 1.19 / 1.19; MIN: 1.15)
  Run 2: 1.18818 (SE +/- 0.00052, N = 3; Min/Avg/Max: 1.19 / 1.19 / 1.19; MIN: 1.15)
  Run 3: 1.18338 (SE +/- 0.00256, N = 3; Min/Avg/Max: 1.18 / 1.18 / 1.19; MIN: 1.15)

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 4.20255 (SE +/- 0.00384, N = 3; Min/Avg/Max: 4.2 / 4.2 / 4.21; MIN: 4.14)
  Run 2: 5.12478 (SE +/- 0.00846, N = 3; Min/Avg/Max: 5.12 / 5.12 / 5.14; MIN: 5.08)
  Run 3: 4.61740 (SE +/- 0.01105, N = 3; Min/Avg/Max: 4.6 / 4.62 / 4.64; MIN: 4.54)

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 0.912011 (SE +/- 0.000579, N = 3; Min/Avg/Max: 0.91 / 0.91 / 0.91; MIN: 0.88)
  Run 2: 0.911686 (SE +/- 0.002375, N = 3; Min/Avg/Max: 0.91 / 0.91 / 0.92; MIN: 0.88)
  Run 3: 0.910394 (SE +/- 0.000598, N = 3; Min/Avg/Max: 0.91 / 0.91 / 0.91; MIN: 0.88)

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 0.780494 (SE +/- 0.002676, N = 3; Min/Avg/Max: 0.78 / 0.78 / 0.78; MIN: 0.75)
  Run 2: 0.796531 (SE +/- 0.001938, N = 3; Min/Avg/Max: 0.79 / 0.8 / 0.8; MIN: 0.76)
  Run 3: 0.793081 (SE +/- 0.003880, N = 3; Min/Avg/Max: 0.79 / 0.79 / 0.8; MIN: 0.76)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 5.36835 (SE +/- 0.01110, N = 3; Min/Avg/Max: 5.35 / 5.37 / 5.38; MIN: 5.29)
  Run 2: 5.75977 (SE +/- 0.00764, N = 3; Min/Avg/Max: 5.75 / 5.76 / 5.77; MIN: 5.7)
  Run 3: 5.43301 (SE +/- 0.00220, N = 3; Min/Avg/Max: 5.43 / 5.43 / 5.44; MIN: 5.37)

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 4.45405 (SE +/- 0.17691, N = 15; Min/Avg/Max: 3.7 / 4.45 / 5.54; MIN: 3.44)
  Run 2: 4.37845 (SE +/- 0.15743, N = 12; Min/Avg/Max: 3.74 / 4.38 / 5.47; MIN: 3.31)
  Run 3: 4.19662 (SE +/- 0.12068, N = 14; Min/Avg/Max: 3.66 / 4.2 / 5.51; MIN: 3.42)

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 2.69314 (SE +/- 0.00779, N = 3; Min/Avg/Max: 2.68 / 2.69 / 2.71; MIN: 2.62)
  Run 2: 2.69789 (SE +/- 0.00568, N = 3; Min/Avg/Max: 2.69 / 2.7 / 2.71; MIN: 2.63)
  Run 3: 2.68814 (SE +/- 0.00604, N = 3; Min/Avg/Max: 2.68 / 2.69 / 2.7; MIN: 2.62)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 5.98528 (SE +/- 0.02507, N = 3; Min/Avg/Max: 5.96 / 5.99 / 6.04; MIN: 5.84)
  Run 2: 6.47449 (SE +/- 0.00991, N = 3; Min/Avg/Max: 6.46 / 6.47 / 6.49; MIN: 6.32)
  Run 3: 6.18409 (SE +/- 0.00496, N = 3; Min/Avg/Max: 6.17 / 6.18 / 6.19; MIN: 6.04)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 1.06143 (SE +/- 0.00066, N = 3; Min/Avg/Max: 1.06 / 1.06 / 1.06; MIN: 1.02)
  Run 2: 1.06134 (SE +/- 0.00120, N = 3; Min/Avg/Max: 1.06 / 1.06 / 1.06; MIN: 1.02)
  Run 3: 1.06123 (SE +/- 0.00161, N = 3; Min/Avg/Max: 1.06 / 1.06 / 1.06; MIN: 1.02)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 1.54197 (SE +/- 0.00128, N = 3; Min/Avg/Max: 1.54 / 1.54 / 1.54; MIN: 1.47)
  Run 2: 1.54137 (SE +/- 0.00437, N = 3; Min/Avg/Max: 1.53 / 1.54 / 1.55; MIN: 1.47)
  Run 3: 1.54074 (SE +/- 0.00455, N = 3; Min/Avg/Max: 1.53 / 1.54 / 1.55; MIN: 1.47)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 3713.06 (SE +/- 6.12, N = 3; Min/Avg/Max: 3700.96 / 3713.06 / 3720.66; MIN: 3696.05)
  Run 2: 3732.61 (SE +/- 7.00, N = 3; Min/Avg/Max: 3719.81 / 3732.61 / 3743.9; MIN: 3713.58)
  Run 3: 3693.95 (SE +/- 10.94, N = 3; Min/Avg/Max: 3681.09 / 3693.95 / 3715.71; MIN: 3673.08)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 874.52 (SE +/- 2.11, N = 3; Min/Avg/Max: 871.9 / 874.52 / 878.7; MIN: 868.78)
  Run 2: 878.57 (SE +/- 1.11, N = 3; Min/Avg/Max: 876.42 / 878.57 / 880.13; MIN: 871.76)
  Run 3: 883.11 (SE +/- 4.33, N = 3; Min/Avg/Max: 875.85 / 883.11 / 890.83; MIN: 870.19)

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 3744.82 (SE +/- 6.87, N = 3; Min/Avg/Max: 3733.37 / 3744.82 / 3757.12; MIN: 3729.91)
  Run 2: 3721.31 (SE +/- 5.55, N = 3; Min/Avg/Max: 3712.1 / 3721.31 / 3731.28; MIN: 3702.5)
  Run 3: 3696.62 (SE +/- 4.59, N = 3; Min/Avg/Max: 3690.9 / 3696.62 / 3705.7; MIN: 3686.67)

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 876.09 (SE +/- 1.09, N = 3; Min/Avg/Max: 873.92 / 876.09 / 877.38; MIN: 869.71)
  Run 2: 877.35 (SE +/- 0.59, N = 3; Min/Avg/Max: 876.76 / 877.35 / 878.54; MIN: 870.9)
  Run 3: 880.43 (SE +/- 0.72, N = 3; Min/Avg/Max: 878.98 / 880.43 / 881.23; MIN: 874.81)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 0.389486 (SE +/- 0.000488, N = 3; Min/Avg/Max: 0.39 / 0.39 / 0.39; MIN: 0.38)
  Run 2: 0.388398 (SE +/- 0.000135, N = 3; Min/Avg/Max: 0.39 / 0.39 / 0.39; MIN: 0.38)
  Run 3: 0.388781 (SE +/- 0.000706, N = 3; Min/Avg/Max: 0.39 / 0.39 / 0.39; MIN: 0.38)

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 3737.45 (SE +/- 19.20, N = 3; Min/Avg/Max: 3703.4 / 3737.45 / 3769.83; MIN: 3700.79)
  Run 2: 3729.65 (SE +/- 6.90, N = 3; Min/Avg/Max: 3722.44 / 3729.65 / 3743.45; MIN: 3716.28)
  Run 3: 3691.57 (SE +/- 12.73, N = 3; Min/Avg/Max: 3668.45 / 3691.57 / 3712.35; MIN: 3665.53)

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 877.45 (SE +/- 1.06, N = 3; Min/Avg/Max: 875.66 / 877.45 / 879.33; MIN: 872.62)
  Run 2: 881.22 (SE +/- 0.80, N = 3; Min/Avg/Max: 879.62 / 881.22 / 882.04; MIN: 876.32)
  Run 3: 879.59 (SE +/- 3.26, N = 3; Min/Avg/Max: 874.13 / 879.59 / 885.41; MIN: 869.02)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (oneDNN 2.1.2; ms, fewer is better)
  Run 1: 0.868249 (SE +/- 0.000929, N = 3; Min/Avg/Max: 0.87 / 0.87 / 0.87; MIN: 0.82)
  Run 2: 0.868381 (SE +/- 0.000926, N = 3; Min/Avg/Max: 0.87 / 0.87 / 0.87; MIN: 0.82)
  Run 3: 0.865572 (SE +/- 0.000441, N = 3; Min/Avg/Max: 0.86 / 0.87 / 0.87; MIN: 0.81)