onednn-c3a-03 KVM testing on Ubuntu 22.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2401083-NE-ONEDNNC3A57

AMD EPYC 9Y24 96-Core
Processor: AMD EPYC 9Y24 96-Core (16 Cores / 32 Threads), Motherboard: ByteDance OpenStack Nova v0.1, Chipset: Intel 440FX 82441FX PMC, Memory: 4 x 16 GB RAM QEMU, Disk: 99GB, Graphics: Cirrus Logic GD 5446, Network: Red Hat Virtio device
OS: Ubuntu 22.04, Kernel: 5.15.0-83-generic (x86_64), Compiler: GCC 11.4.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM
Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0x1000065
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
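The Security Notes field packs one entry per vulnerability into a single line, with entries joined by " + " and each entry written as "name: status". A minimal sketch of splitting that line into a lookup table follows; the function name `parse_security_notes` is hypothetical and not part of the Phoronix Test Suite, and the separator handling is an assumption based only on the line as it appears in this result file.

```python
# Security Notes string copied from this result file.
SECURITY_NOTES = (
    "gather_data_sampling: Not affected + itlb_multihit: Not affected + "
    "l1tf: Not affected + mds: Not affected + meltdown: Not affected + "
    "mmio_stale_data: Not affected + retbleed: Not affected + "
    "spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + "
    "spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + "
    "spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional "
    "RSB filling PBRSB-eIBRS: Not affected + "
    "srbds: Not affected + tsx_async_abort: Not affected"
)


def parse_security_notes(notes):
    """Split the ' + '-joined notes into {vulnerability: status}.

    Hypothetical helper: assumes ' + ' never occurs inside a status and that
    the first ': ' in each entry separates the name from the status.
    """
    entries = {}
    for item in notes.split(" + "):
        name, _, status = item.partition(": ")
        entries[name.strip()] = status.strip()
    return entries


parsed = parse_security_notes(SECURITY_NOTES)
# Entries whose status begins with "Mitigation" have an active mitigation.
mitigated = sorted(name for name, status in parsed.items() if status.startswith("Mitigation"))
print(f"{len(parsed)} entries, mitigations active for: {', '.join(mitigated)}")
```

Splitting on the first ": " only matters for spectre_v2, whose status itself contains further colons.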
onednn-c3a-03 results summary (OpenBenchmarking.org)
System: AMD EPYC 9Y24 96-Core. All values in ms, fewer is better.
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU: 963.757
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU: 1847.78
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU: 958.853
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU: 2.36536
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU: 3.60921
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU: 1.55775
onednn: Recurrent Neural Network Training - u8s8f32 - CPU: 1868.10
onednn: Recurrent Neural Network Inference - f32 - CPU: 952.497
onednn: Recurrent Neural Network Training - f32 - CPU: 1850.33
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU: 1.014092
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU: 0.715819
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU: 3.14711
onednn: Deconvolution Batch shapes_3d - f32 - CPU: 3.98901
onednn: Deconvolution Batch shapes_1d - f32 - CPU: 5.33716
onednn: Convolution Batch Shapes Auto - f32 - CPU: 3.49421
onednn: IP Shapes 3D - bf16bf16bf16 - CPU: 1.84735
onednn: IP Shapes 1D - bf16bf16bf16 - CPU: 1.09003
onednn: IP Shapes 1D - u8s8f32 - CPU: 0.640274
onednn: IP Shapes 3D - f32 - CPU: 2.94462
onednn: IP Shapes 1D - f32 - CPU: 2.85576
onednn: IP Shapes 3D - u8s8f32 - CPU: 0.574468
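Because each harness was run with three data types, the summary values can be compared against the f32 run of the same harness. The sketch below computes that ratio (f32 time divided by the other data type's time, so values above 1 mean the other data type finished faster). The dictionary layout and the function name `speedup_vs_f32` are illustrative choices, not part of the result file; the numbers are copied from the summary above.

```python
# Mean times in ms from the results summary (fewer is better).
results = {
    ("Recurrent Neural Network Inference", "bf16bf16bf16"): 963.757,
    ("Recurrent Neural Network Training", "bf16bf16bf16"): 1847.78,
    ("Recurrent Neural Network Inference", "u8s8f32"): 958.853,
    ("Deconvolution Batch shapes_3d", "bf16bf16bf16"): 2.36536,
    ("Deconvolution Batch shapes_1d", "bf16bf16bf16"): 3.60921,
    ("Convolution Batch Shapes Auto", "bf16bf16bf16"): 1.55775,
    ("Recurrent Neural Network Training", "u8s8f32"): 1868.10,
    ("Recurrent Neural Network Inference", "f32"): 952.497,
    ("Recurrent Neural Network Training", "f32"): 1850.33,
    ("Deconvolution Batch shapes_3d", "u8s8f32"): 1.014092,
    ("Deconvolution Batch shapes_1d", "u8s8f32"): 0.715819,
    ("Convolution Batch Shapes Auto", "u8s8f32"): 3.14711,
    ("Deconvolution Batch shapes_3d", "f32"): 3.98901,
    ("Deconvolution Batch shapes_1d", "f32"): 5.33716,
    ("Convolution Batch Shapes Auto", "f32"): 3.49421,
    ("IP Shapes 3D", "bf16bf16bf16"): 1.84735,
    ("IP Shapes 1D", "bf16bf16bf16"): 1.09003,
    ("IP Shapes 1D", "u8s8f32"): 0.640274,
    ("IP Shapes 3D", "f32"): 2.94462,
    ("IP Shapes 1D", "f32"): 2.85576,
    ("IP Shapes 3D", "u8s8f32"): 0.574468,
}


def speedup_vs_f32(results):
    """Return {(harness, dtype): f32_ms / dtype_ms} for every non-f32 entry."""
    ratios = {}
    for (harness, dtype), ms in results.items():
        if dtype == "f32":
            continue
        ratios[(harness, dtype)] = results[(harness, "f32")] / ms
    return ratios


speedups = speedup_vs_f32(results)
for (harness, dtype), ratio in sorted(speedups.items()):
    print(f"{harness:36s} {dtype:13s} {ratio:5.2f}x vs f32")
```

These ratios only restate the figures in this result file; they say nothing about accuracy differences between data types or about other systems.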
oneDNN
This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
oneDNN 3.3 detailed results (OpenBenchmarking.org)
System: AMD EPYC 9Y24 96-Core. Engine: CPU. All values in ms, fewer is better.
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16: 963.76 (SE +/- 2.86, N = 3, MIN: 907.9)
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16: 1847.78 (SE +/- 4.93, N = 3, MIN: 1779.4)
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32: 958.85 (SE +/- 3.98, N = 3, MIN: 907.96)
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16: 2.36536 (SE +/- 0.01768, N = 15, MIN: 2.1)
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16: 3.60921 (SE +/- 0.01201, N = 3, MIN: 3.2)
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16: 1.55775 (SE +/- 0.00101, N = 3, MIN: 1.43)
Harness: Recurrent Neural Network Training - Data Type: u8s8f32: 1868.10 (SE +/- 13.74, N = 3, MIN: 1783.43)
Harness: Recurrent Neural Network Inference - Data Type: f32: 952.50 (SE +/- 3.64, N = 3, MIN: 907.41)
Harness: Recurrent Neural Network Training - Data Type: f32: 1850.33 (SE +/- 7.00, N = 3, MIN: 1778.25)
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32: 1.014092 (SE +/- 0.010282, N = 5, MIN: 0.9)
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32: 0.715819 (SE +/- 0.005655, N = 9, MIN: 0.62)
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32: 3.14711 (SE +/- 0.00237, N = 3, MIN: 2.93)
Harness: Deconvolution Batch shapes_3d - Data Type: f32: 3.98901 (SE +/- 0.00729, N = 3, MIN: 3.61)
Harness: Deconvolution Batch shapes_1d - Data Type: f32: 5.33716 (SE +/- 0.03048, N = 3, MIN: 3.44)
Harness: Convolution Batch Shapes Auto - Data Type: f32: 3.49421 (SE +/- 0.00063, N = 3, MIN: 3.25)
Harness: IP Shapes 3D - Data Type: bf16bf16bf16: 1.84735 (SE +/- 0.02172, N = 3, MIN: 1.59)
Harness: IP Shapes 1D - Data Type: bf16bf16bf16: 1.09003 (SE +/- 0.00508, N = 3, MIN: 0.94)
Harness: IP Shapes 1D - Data Type: u8s8f32: 0.640274 (SE +/- 0.007565, N = 3, MIN: 0.54)
Harness: IP Shapes 3D - Data Type: f32: 2.94462 (SE +/- 0.02857, N = 3, MIN: 2.79)
Harness: IP Shapes 1D - Data Type: f32: 2.85576 (SE +/- 0.02178, N = 14, MIN: 2.37)
Harness: IP Shapes 3D - Data Type: u8s8f32: 0.574468 (SE +/- 0.013460, N = 15, MIN: 0.43)
1. (CXX) g++ options (all tests): -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
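The SE and N figures reported with each result can be turned into a rough measure of run-to-run noise. The sketch below assumes that SE is the standard error of the mean over N runs (an interpretation of the result file, not something it states), so the relative standard error is SE / mean and the sample standard deviation is approximately SE * sqrt(N). The helper name `spread` is hypothetical; the numbers are copied from the results above.

```python
import math

# (harness, data type): (mean ms, SE, N), copied from the detailed results.
runs = {
    ("Recurrent Neural Network Inference", "bf16bf16bf16"): (963.76, 2.86, 3),
    ("Recurrent Neural Network Training", "bf16bf16bf16"): (1847.78, 4.93, 3),
    ("Recurrent Neural Network Inference", "u8s8f32"): (958.85, 3.98, 3),
    ("Deconvolution Batch shapes_3d", "bf16bf16bf16"): (2.36536, 0.01768, 15),
    ("Deconvolution Batch shapes_1d", "bf16bf16bf16"): (3.60921, 0.01201, 3),
    ("Convolution Batch Shapes Auto", "bf16bf16bf16"): (1.55775, 0.00101, 3),
    ("Recurrent Neural Network Training", "u8s8f32"): (1868.10, 13.74, 3),
    ("Recurrent Neural Network Inference", "f32"): (952.50, 3.64, 3),
    ("Recurrent Neural Network Training", "f32"): (1850.33, 7.00, 3),
    ("Deconvolution Batch shapes_3d", "u8s8f32"): (1.014092, 0.010282, 5),
    ("Deconvolution Batch shapes_1d", "u8s8f32"): (0.715819, 0.005655, 9),
    ("Convolution Batch Shapes Auto", "u8s8f32"): (3.14711, 0.00237, 3),
    ("Deconvolution Batch shapes_3d", "f32"): (3.98901, 0.00729, 3),
    ("Deconvolution Batch shapes_1d", "f32"): (5.33716, 0.03048, 3),
    ("Convolution Batch Shapes Auto", "f32"): (3.49421, 0.00063, 3),
    ("IP Shapes 3D", "bf16bf16bf16"): (1.84735, 0.02172, 3),
    ("IP Shapes 1D", "bf16bf16bf16"): (1.09003, 0.00508, 3),
    ("IP Shapes 1D", "u8s8f32"): (0.640274, 0.007565, 3),
    ("IP Shapes 3D", "f32"): (2.94462, 0.02857, 3),
    ("IP Shapes 1D", "f32"): (2.85576, 0.02178, 14),
    ("IP Shapes 3D", "u8s8f32"): (0.574468, 0.013460, 15),
}


def spread(mean, se, n):
    """Return (relative SE in percent, estimated sample standard deviation).

    Assumes `se` is the standard error of the mean over `n` runs.
    """
    return 100.0 * se / mean, se * math.sqrt(n)


# Rank tests from noisiest to most stable by relative standard error.
ranked = sorted(runs, key=lambda key: spread(*runs[key])[0], reverse=True)
noisiest = ranked[0]
for harness, dtype in ranked:
    rel_se, est_sd = spread(*runs[(harness, dtype)])
    n = runs[(harness, dtype)][2]
    print(f"{harness:36s} {dtype:13s} rel. SE {rel_se:5.2f}%  est. SD {est_sd:9.5f} ms  N = {n}")
```

Under that assumption, the sub-millisecond IP Shapes and Deconvolution tests with N above 3 are also among those with the largest relative error, while the long-running Recurrent Neural Network tests stay below 1%.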
Testing initiated at 8 January 2024 21:10 by user root.