onednn_test_all
2 x AMD EPYC 7552 48-Core testing with an HPE ProLiant DL385 Gen10 Plus (A42 BIOS) and Matrox MGA G200eH3 on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2010062-NE-ONEDNNTES64

2 x AMD EPYC 7552 48-Core
Processor: 2 x AMD EPYC 7552 48-Core @ 2.20GHz (96 Cores / 192 Threads), Motherboard: HPE ProLiant DL385 Gen10 Plus (A42 BIOS), Chipset: AMD Starship/Matisse, Memory: 1008GB, Disk: 4 x 3201GB MO003200KWZQQ + 192010GB LOGICAL VOLUME, Graphics: Matrox MGA G200eH3, Network: 4 x QLogic FastLinQ QL41000 10/25/40/50GbE + 4 x Intel I350
OS: Ubuntu 20.04, Kernel: 5.4.0-48-generic (x86_64), Compiler: GCC 9.3.0, File-System: ext4, Screen Resolution: 1024x768
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301038
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

onednn_test_all summary (oneDNN 1.5; times in ms, fewer is better; system: 2 x AMD EPYC 7552 48-Core)
Harness - Data Type - Engine                              Time (ms)
IP Batch 1D - f32 - CPU                                   2.20494
IP Batch All - f32 - CPU                                  41.3323
IP Batch 1D - u8s8f32 - CPU                               3.40222
IP Batch All - u8s8f32 - CPU                              10.7305
Convolution Batch Shapes Auto - f32 - CPU                 0.860158
Deconvolution Batch deconv_1d - f32 - CPU                 3.34514
Deconvolution Batch deconv_3d - f32 - CPU                 4.51401
Convolution Batch Shapes Auto - u8s8f32 - CPU             4.00239
Deconvolution Batch deconv_1d - u8s8f32 - CPU             2.46471
Deconvolution Batch deconv_3d - u8s8f32 - CPU             1.63114
Recurrent Neural Network Training - f32 - CPU             932.136
Recurrent Neural Network Inference - f32 - CPU            404.989
Matrix Multiply Batch Shapes Transformer - f32 - CPU      0.830651
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU  0.868965
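The summary results above run several harnesses in both f32 and u8s8f32. Comparing the two helps show where the quantized path paid off on this system.

The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite. The dictionary names and shortened harness labels are illustrative assumptions. Because lower times are better, a ratio of f32 time to u8s8f32 time above 1 means the u8s8f32 run was faster.

```python
# Times in ms (fewer is better), copied from the onednn_test_all summary.
# Dictionary names and shortened harness labels are illustrative only.
F32_MS = {
    "IP Batch 1D": 2.20494,
    "IP Batch All": 41.3323,
    "Convolution Batch Shapes Auto": 0.860158,
    "Deconvolution Batch deconv_1d": 3.34514,
    "Deconvolution Batch deconv_3d": 4.51401,
    "Matrix Multiply Batch Shapes Transformer": 0.830651,
}
U8S8F32_MS = {
    "IP Batch 1D": 3.40222,
    "IP Batch All": 10.7305,
    "Convolution Batch Shapes Auto": 4.00239,
    "Deconvolution Batch deconv_1d": 2.46471,
    "Deconvolution Batch deconv_3d": 1.63114,
    "Matrix Multiply Batch Shapes Transformer": 0.868965,
}


def int8_speedup(f32_ms, int8_ms):
    """Return f32 time / u8s8f32 time for every harness present in both dicts.

    A value above 1 means the u8s8f32 run was faster than the f32 run.
    """
    return {name: f32_ms[name] / int8_ms[name] for name in f32_ms if name in int8_ms}


if __name__ == "__main__":
    ratios = int8_speedup(F32_MS, U8S8F32_MS)
    # Print harnesses from largest to smallest u8s8f32 speedup.
    for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:42s} {ratio:5.2f}x")
```

On these numbers, u8s8f32 is faster for IP Batch All (about 3.85x) and for both deconvolution harnesses (about 2.77x and 1.36x). It is slower for IP Batch 1D, Convolution Batch Shapes Auto and the Transformer matrix multiply.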

oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn benchmarking functionality. The reported result is the total perf time. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, before that, MKL-DNN; it was rebranded as part of the oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

oneDNN 1.5 - Harness: IP Batch 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 2.20494 (SE +/- 0.01971, N = 11; MIN: 1.79)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 1.5 - Harness: IP Batch All - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 41.33 (SE +/- 0.79, N = 13; MIN: 25.51)

oneDNN 1.5 - Harness: IP Batch 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 3.40222 (SE +/- 0.02438, N = 3; MIN: 2.56)

oneDNN 1.5 - Harness: IP Batch All - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 10.73 (SE +/- 0.09, N = 3; MIN: 9.61)

oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 0.860158 (SE +/- 0.012123, N = 3; MIN: 0.73)

oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 3.34514 (SE +/- 0.01778, N = 3; MIN: 2.65)

oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 4.51401 (SE +/- 0.04556, N = 3; MIN: 3.17)

oneDNN 1.5 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 4.00239 (SE +/- 0.16329, N = 12; MIN: 3.15)

oneDNN 1.5 - Harness: Deconvolution Batch deconv_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 2.46471 (SE +/- 0.02700, N = 7; MIN: 2.19)

oneDNN 1.5 - Harness: Deconvolution Batch deconv_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 1.63114 (SE +/- 0.01236, N = 3; MIN: 1.2)

oneDNN 1.5 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 932.14 (SE +/- 11.02, N = 15; MIN: 767.43)

oneDNN 1.5 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 404.99 (SE +/- 15.20, N = 12; MIN: 316.82)
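The two RNN results above have the largest absolute standard errors in this run. One way to read them is as a 95% confidence interval for the mean, computed as mean ± t(0.975, N-1) x SE.

This is a minimal sketch under stated assumptions. It assumes the reported SE is the standard error of the mean over N runs and that run times are roughly normally distributed; neither is confirmed by the result file. The function name `ci95` is hypothetical. The t critical values are hardcoded from standard tables to avoid a SciPy dependency.

```python
# Two-sided 95% t critical values t(0.975, df), taken from standard t tables.
# Only the degrees of freedom needed for the RNN results are included.
T_975 = {11: 2.2010, 14: 2.1448}


def ci95(mean, se, n):
    """Return (low, high) of a 95% CI for the mean.

    Assumes se is the standard error of the mean over n runs (s / sqrt(n))
    and that run times are approximately normal.
    """
    t = T_975[n - 1]  # raises KeyError for df values not in the table
    half_width = t * se
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Mean, SE and N copied from the two RNN results (ms).
    print("RNN Training :", ci95(932.14, 11.02, 15))
    print("RNN Inference:", ci95(404.99, 15.20, 12))
```

Under these assumptions, the training interval is roughly 908.5 to 955.8 ms, about ±2.5% of the mean. The inference interval is roughly 371.5 to 438.4 ms, about ±8%.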

oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 0.830651 (SE +/- 0.007317, N = 15; MIN: 0.62)

oneDNN 1.5 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
2 x AMD EPYC 7552 48-Core: 0.868965 (SE +/- 0.003342, N = 3; MIN: 0.75)
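Every result above reports a mean with a standard error (SE) over N runs. Absolute SEs are hard to compare across harnesses whose times range from under 1 ms to over 900 ms.

This is a minimal sketch that normalizes them as relative SE (SE / mean). Assuming SE was computed as s / sqrt(N), it also recovers the implied sample standard deviation as SE x sqrt(N). The 2% "noisy" threshold is an arbitrary, illustrative cut-off, not a Phoronix Test Suite setting. `RESULTS` and `summarize` are hypothetical names.

```python
import math

# (harness, mean ms, SE, N) copied from the per-test results.
RESULTS = [
    ("IP Batch 1D - f32", 2.20494, 0.01971, 11),
    ("IP Batch All - f32", 41.33, 0.79, 13),
    ("IP Batch 1D - u8s8f32", 3.40222, 0.02438, 3),
    ("IP Batch All - u8s8f32", 10.73, 0.09, 3),
    ("Convolution Batch Shapes Auto - f32", 0.860158, 0.012123, 3),
    ("Deconvolution Batch deconv_1d - f32", 3.34514, 0.01778, 3),
    ("Deconvolution Batch deconv_3d - f32", 4.51401, 0.04556, 3),
    ("Convolution Batch Shapes Auto - u8s8f32", 4.00239, 0.16329, 12),
    ("Deconvolution Batch deconv_1d - u8s8f32", 2.46471, 0.02700, 7),
    ("Deconvolution Batch deconv_3d - u8s8f32", 1.63114, 0.01236, 3),
    ("Recurrent Neural Network Training - f32", 932.14, 11.02, 15),
    ("Recurrent Neural Network Inference - f32", 404.99, 15.20, 12),
    ("Matrix Multiply Batch Shapes Transformer - f32", 0.830651, 0.007317, 15),
    ("Matrix Multiply Batch Shapes Transformer - u8s8f32", 0.868965, 0.003342, 3),
]

NOISY_THRESHOLD_PCT = 2.0  # illustrative cut-off, chosen arbitrarily


def summarize(results):
    """Yield (name, relative SE in percent, implied sample SD in ms) per result."""
    for name, mean, se, n in results:
        rel_se_pct = 100.0 * se / mean
        sample_sd = se * math.sqrt(n)  # assumes SE = s / sqrt(n)
        yield name, rel_se_pct, sample_sd


if __name__ == "__main__":
    for name, rel, sd in summarize(RESULTS):
        flag = "  <- noisy" if rel > NOISY_THRESHOLD_PCT else ""
        print(f"{name:52s} rel. SE {rel:5.2f}%  SD {sd:.4g} ms{flag}")
```

With a 2% cut-off, only two results are flagged: Convolution Batch Shapes Auto u8s8f32 (about 4.1%) and Recurrent Neural Network Inference f32 (about 3.8%). Every other result stays below 2%.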
Testing initiated at 7 October 2020 04:14 by user ansible.