onednn_test_all

2 x AMD EPYC 7552 48-Core testing with a HPE ProLiant DL385 Gen10 Plus (A42 BIOS) and Matrox MGA G200eH3 on Ubuntu 20.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2010062-NE-ONEDNNTES64
Jump To Table - Results

Statistics

Remove Outliers Before Calculating Averages

Graph Settings

Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table
Only show results matching title/arguments (delimit multiple options with a comma):


onednn_test_allOpenBenchmarking.orgPhoronix Test Suite 10.2.22 x AMD EPYC 7552 48-Core @ 2.20GHz (96 Cores / 192 Threads)HPE ProLiant DL385 Gen10 Plus (A42 BIOS)AMD Starship/Matisse1008GB4 x 3201GB MO003200KWZQQ + 192010GB LOGICAL VOLUMEMatrox MGA G200eH34 x QLogic FastLinQ QL41000 10/25/40/50GbE + 4 x Intel I350Ubuntu 20.045.4.0-48-generic (x86_64)GCC 9.3.0ext41024x768ProcessorMotherboardChipsetMemoryDiskGraphicsNetworkOSKernelCompilerFile-SystemScreen ResolutionOnednn_test_all BenchmarksSystem Logs- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301038- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

onednn_test_allonednn: IP Batch 1D - f32 - CPUonednn: IP Batch All - f32 - CPUonednn: IP Batch 1D - u8s8f32 - CPUonednn: IP Batch All - u8s8f32 - CPUonednn: Convolution Batch Shapes Auto - f32 - CPUonednn: Deconvolution Batch deconv_1d - f32 - CPUonednn: Deconvolution Batch deconv_3d - f32 - CPUonednn: Convolution Batch Shapes Auto - u8s8f32 - CPUonednn: Deconvolution Batch deconv_1d - u8s8f32 - CPUonednn: Deconvolution Batch deconv_3d - u8s8f32 - CPUonednn: Recurrent Neural Network Training - f32 - CPUonednn: Recurrent Neural Network Inference - f32 - CPUonednn: Matrix Multiply Batch Shapes Transformer - f32 - CPUonednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU2 x AMD EPYC 7552 48-Core2.2049441.33233.4022210.73050.8601583.345144.514014.002392.464711.63114932.136404.9890.8306510.868965OpenBenchmarking.org

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: IP Batch 1D - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.49610.99221.48831.98442.4805SE +/- 0.01971, N = 112.20494MIN: 1.791. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: IP Batch All - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core918273645SE +/- 0.79, N = 1341.33MIN: 25.511. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.76551.5312.29653.0623.8275SE +/- 0.02438, N = 33.40222MIN: 2.561. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core3691215SE +/- 0.09, N = 310.73MIN: 9.611. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.19350.3870.58050.7740.9675SE +/- 0.012123, N = 30.860158MIN: 0.731. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.75271.50542.25813.01083.7635SE +/- 0.01778, N = 33.34514MIN: 2.651. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core1.01572.03143.04714.06285.0785SE +/- 0.04556, N = 34.51401MIN: 3.171. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.90051.8012.70153.6024.5025SE +/- 0.16329, N = 124.00239MIN: 3.151. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.55461.10921.66382.21842.773SE +/- 0.02700, N = 72.46471MIN: 2.191. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.3670.7341.1011.4681.835SE +/- 0.01236, N = 31.63114MIN: 1.21. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core2004006008001000SE +/- 11.02, N = 15932.14MIN: 767.431. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core90180270360450SE +/- 15.20, N = 12404.99MIN: 316.821. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.18690.37380.56070.74760.9345SE +/- 0.007317, N = 150.830651MIN: 0.621. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 1.5Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU2 x AMD EPYC 7552 48-Core0.19550.3910.58650.7820.9775SE +/- 0.003342, N = 30.868965MIN: 0.751. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl