oneDNN NVIDIA GH200: ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403018-NE-ONEDNNNVI02&rdt
oneDNN NVIDIA GH200 - Test System Configuration (identical for runs a, b, c, d)

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Vulkan: 1.3.277
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
oneDNN NVIDIA GH200 - Result Summary (all results in ms, fewer is better)

  Harness - Engine                              a        b        c        d
  IP Shapes 1D - CPU                      4.17093  4.13326  4.11456  4.09690
  IP Shapes 3D - CPU                      1.83694  1.83867  1.82904  1.83058
  Convolution Batch Shapes Auto - CPU     4.71311  4.72171  4.70649  4.71070
  Deconvolution Batch shapes_1d - CPU     24.4811  24.5327  24.5554  24.4925
  Deconvolution Batch shapes_3d - CPU     6.43106  6.43839  6.44031  6.44054
  Recurrent Neural Network Training - CPU     3583.01  3584.68  3574.73  3602.94
  Recurrent Neural Network Inference - CPU    2283.74  2287.55  2286.80  2280.18
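For context, the "IP Shapes" harnesses exercise oneDNN's inner-product (fully connected) primitive on the CPU engine. The sketch below creates and executes a single such primitive using the oneDNN 3.x C++ API; it is a minimal illustration only, not the harness that produced these numbers, and the 128x1024x1024 problem size is a hypothetical placeholder.

    #include <dnnl.hpp>
    #include <vector>

    // Minimal oneDNN inner-product (fully connected) run on the CPU engine.
    // Build (assuming liboneDNN/libdnnl is installed):
    //   g++ -O3 -fopenmp ip_sketch.cpp -ldnnl
    int main() {
        using namespace dnnl;

        engine eng(engine::kind::cpu, 0);   // CPU engine, as in "Engine: CPU"
        stream strm(eng);

        // Hypothetical problem size: batch N, input channels IC, output channels OC.
        const memory::dim N = 128, IC = 1024, OC = 1024;
        memory::desc src_md({N, IC}, memory::data_type::f32, memory::format_tag::nc);
        memory::desc wei_md({OC, IC}, memory::data_type::f32, memory::format_tag::oi);
        memory::desc dst_md({N, OC}, memory::data_type::f32, memory::format_tag::nc);

        // oneDNN 3.x style: primitive descriptor built directly from the engine.
        auto pd = inner_product_forward::primitive_desc(
                eng, prop_kind::forward_inference, src_md, wei_md, dst_md);

        std::vector<float> src(N * IC, 1.0f), wei(OC * IC, 0.5f), dst(N * OC, 0.0f);
        memory src_m(src_md, eng, src.data());
        memory wei_m(wei_md, eng, wei.data());
        memory dst_m(dst_md, eng, dst.data());

        // Execute the primitive and wait for the stream to finish.
        inner_product_forward(pd).execute(strm, {
                {DNNL_ARG_SRC, src_m},
                {DNNL_ARG_WEIGHTS, wei_m},
                {DNNL_ARG_DST, dst_m}});
        strm.wait();
        return 0;
    }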
oneDNN 3.4 - Harness: IP Shapes 1D - Engine: CPU (ms, fewer is better)
  a: 4.17093 (SE +/- 0.01783, N = 3, MIN: 3.77)
  b: 4.13326 (SE +/- 0.03178, N = 3, MIN: 3.73)
  c: 4.11456 (SE +/- 0.04283, N = 3, MIN: 3.68)
  d: 4.09690 (SE +/- 0.03906, N = 3, MIN: 3.7)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU (ms, fewer is better)
  a: 1.83694 (SE +/- 0.00416, N = 3, MIN: 1.66)
  b: 1.83867 (SE +/- 0.00422, N = 3, MIN: 1.64)
  c: 1.82904 (SE +/- 0.00414, N = 3, MIN: 1.64)
  d: 1.83058 (SE +/- 0.00258, N = 3, MIN: 1.64)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, fewer is better)
  a: 4.71311 (SE +/- 0.00174, N = 3, MIN: 4.62)
  b: 4.72171 (SE +/- 0.00171, N = 3, MIN: 4.6)
  c: 4.70649 (SE +/- 0.00401, N = 3, MIN: 4.6)
  d: 4.71070 (SE +/- 0.00422, N = 3, MIN: 4.59)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, fewer is better)
  a: 24.48 (SE +/- 0.04, N = 3, MIN: 23.03)
  b: 24.53 (SE +/- 0.10, N = 3, MIN: 22.96)
  c: 24.56 (SE +/- 0.07, N = 3, MIN: 23.08)
  d: 24.49 (SE +/- 0.09, N = 3, MIN: 23.01)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, fewer is better)
  a: 6.43106 (SE +/- 0.01147, N = 3, MIN: 6.14)
  b: 6.43839 (SE +/- 0.00735, N = 3, MIN: 6.13)
  c: 6.44031 (SE +/- 0.01150, N = 3, MIN: 6.17)
  d: 6.44054 (SE +/- 0.01016, N = 3, MIN: 6.15)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better)
  a: 3583.01 (SE +/- 8.84, N = 3, MIN: 3531.56)
  b: 3584.68 (SE +/- 5.09, N = 3, MIN: 3531.87)
  c: 3574.73 (SE +/- 31.12, N = 3, MIN: 3486.97)
  d: 3602.94 (SE +/- 12.76, N = 3, MIN: 3540.98)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
oneDNN 3.4 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better)
  a: 2283.74 (SE +/- 5.70, N = 3, MIN: 2232.84)
  b: 2287.55 (SE +/- 8.97, N = 3, MIN: 2226.72)
  c: 2286.80 (SE +/- 4.93, N = 3, MIN: 2234.59)
  d: 2280.18 (SE +/- 3.22, N = 3, MIN: 2235.1)
  (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
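Each per-harness entry above reports an average over N = 3 runs together with a standard-error band. The sketch below shows how such a "SE +/- x, N = 3" figure can be derived, assuming the usual standard-error-of-the-mean formula (sample standard deviation divided by sqrt(N)); the three timings in it are made-up placeholders, not values from this result, and the Phoronix Test Suite's exact reporting may differ in rounding.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Standard error of the mean over repeated timings:
    //   SE = sample standard deviation / sqrt(N)
    int main() {
        std::vector<double> runs_ms = {4.18, 4.15, 4.17};   // hypothetical N = 3 timings

        const double n = static_cast<double>(runs_ms.size());
        double mean = 0.0;
        for (double v : runs_ms) mean += v;
        mean /= n;

        double var = 0.0;
        for (double v : runs_ms) var += (v - mean) * (v - mean);
        var /= (n - 1.0);                                    // sample variance

        const double se = std::sqrt(var) / std::sqrt(n);     // standard error of the mean
        std::printf("mean = %.5f ms, SE +/- %.5f, N = %d\n",
                    mean, se, static_cast<int>(runs_ms.size()));
        return 0;
    }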
Phoronix Test Suite v10.8.5