oneDNN NVIDIA GH200: ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403018-NE-ONEDNNNVI02&grr&sor .
System configuration (identical for runs a, b, c, d)
  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Vulkan: 1.3.277
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result summary: oneDNN, Engine: CPU (ms, fewer is better)

Harness                              a          b          c          d
Recurrent Neural Network Training    3583.01    3584.68    3574.73    3602.94
Recurrent Neural Network Inference   2283.74    2287.55    2286.80    2280.18
Deconvolution Batch shapes_1d        24.4811    24.5327    24.5554    24.4925
IP Shapes 1D                         4.17093    4.13326    4.11456    4.09690
IP Shapes 3D                         1.83694    1.83867    1.82904    1.83058
Convolution Batch Shapes Auto        4.71311    4.72171    4.70649    4.71070
Deconvolution Batch shapes_3d        6.43106    6.43839    6.44031    6.44054
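As a quick consistency check on the result summary, the run-to-run spread of each harness can be expressed as (max - min) / min across runs a to d. The Python sketch below hard-codes the reported values; the names `RESULTS` and `spread_percent` are illustrative only and are not part of the Phoronix Test Suite or oneDNN.

```python
# Per-harness results in ms (fewer is better) for runs a, b, c, d,
# copied from the oneDNN result summary.
RESULTS = {
    "Recurrent Neural Network Training": [3583.01, 3584.68, 3574.73, 3602.94],
    "Recurrent Neural Network Inference": [2283.74, 2287.55, 2286.80, 2280.18],
    "Deconvolution Batch shapes_1d": [24.4811, 24.5327, 24.5554, 24.4925],
    "IP Shapes 1D": [4.17093, 4.13326, 4.11456, 4.09690],
    "IP Shapes 3D": [1.83694, 1.83867, 1.82904, 1.83058],
    "Convolution Batch Shapes Auto": [4.71311, 4.72171, 4.70649, 4.71070],
    "Deconvolution Batch shapes_3d": [6.43106, 6.43839, 6.44031, 6.44054],
}


def spread_percent(values):
    """Return the spread (max - min) / min of a list of timings, in percent."""
    return (max(values) - min(values)) / min(values) * 100.0


# Print the spread for every harness.
for harness, times in RESULTS.items():
    print(f"{harness:<36} {spread_percent(times):5.2f} %")
```

By this measure every harness stays within 2% across the four runs, with IP Shapes 1D showing the largest spread (about 1.8%).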
oneDNN 3.4 Harness: Recurrent Neural Network Training - Engine: CPU
ms, fewer is better (fastest first)
  c: 3574.73 (SE +/- 31.12, N = 3; MIN: 3486.97)
  a: 3583.01 (SE +/- 8.84, N = 3; MIN: 3531.56)
  b: 3584.68 (SE +/- 5.09, N = 3; MIN: 3531.87)
  d: 3602.94 (SE +/- 12.76, N = 3; MIN: 3540.98)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
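Each result is a mean reported with "SE +/- x, N = 3". Assuming the conventional definition of the standard error of the mean, SE = s / sqrt(N) with s the sample standard deviation, the sketch below computes SE from raw samples and, conversely, recovers the implied standard deviation from a reported SE. The three raw samples are hypothetical, because the export only publishes the mean, SE, and minimum; `standard_error` and `implied_stdev` are illustrative helper names.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def implied_stdev(se, n):
    """Invert SE = s / sqrt(n) to recover the sample standard deviation s."""
    return se * math.sqrt(n)


# Hypothetical raw timings in ms (NOT from the export), N = 3.
samples = [3540.0, 3575.0, 3610.0]
print(f"mean = {statistics.mean(samples):.2f} ms, SE = {standard_error(samples):.2f} ms")

# Run c of the RNN training harness reports SE +/- 31.12 with N = 3.
print(f"implied stdev for run c: {implied_stdev(31.12, 3):.2f} ms")
```

Under this definition, run c's SE of 31.12 ms corresponds to a sample standard deviation of roughly 53.9 ms, about 1.5% of its 3574.73 ms mean.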
oneDNN 3.4 Harness: Recurrent Neural Network Inference - Engine: CPU
ms, fewer is better (fastest first)
  d: 2280.18 (SE +/- 3.22, N = 3; MIN: 2235.1)
  a: 2283.74 (SE +/- 5.70, N = 3; MIN: 2232.84)
  c: 2286.80 (SE +/- 4.93, N = 3; MIN: 2234.59)
  b: 2287.55 (SE +/- 8.97, N = 3; MIN: 2226.72)
oneDNN 3.4 Harness: Deconvolution Batch shapes_1d - Engine: CPU
ms, fewer is better (fastest first)
  a: 24.48 (SE +/- 0.04, N = 3; MIN: 23.03)
  d: 24.49 (SE +/- 0.09, N = 3; MIN: 23.01)
  b: 24.53 (SE +/- 0.10, N = 3; MIN: 22.96)
  c: 24.56 (SE +/- 0.07, N = 3; MIN: 23.08)
oneDNN 3.4 Harness: IP Shapes 1D - Engine: CPU
ms, fewer is better (fastest first)
  d: 4.09690 (SE +/- 0.03906, N = 3; MIN: 3.7)
  c: 4.11456 (SE +/- 0.04283, N = 3; MIN: 3.68)
  b: 4.13326 (SE +/- 0.03178, N = 3; MIN: 3.73)
  a: 4.17093 (SE +/- 0.01783, N = 3; MIN: 3.77)
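IP Shapes 1D shows the widest gap between its fastest (d) and slowest (a) runs. One rough way to judge whether such a gap exceeds run-to-run noise is to divide it by the combined standard error sqrt(SE_a^2 + SE_d^2). The sketch below does this with the reported values; it is an informal heuristic (the helper name `gap_in_standard_errors` is hypothetical), not a formal significance test, and with only N = 3 samples per run any conclusion is weak.

```python
import math


def gap_in_standard_errors(mean_x, se_x, mean_y, se_y):
    """Return |mean_x - mean_y| divided by the combined standard error."""
    combined_se = math.hypot(se_x, se_y)  # sqrt(se_x**2 + se_y**2)
    return abs(mean_x - mean_y) / combined_se


# IP Shapes 1D, values in ms from the export: run a vs run d.
ratio = gap_in_standard_errors(4.17093, 0.01783, 4.09690, 0.03906)
print(f"a vs d gap: {ratio:.2f} combined standard errors")
```

The gap works out to about 1.7 combined standard errors, so with three samples per run the difference between a and d is not clearly separable from noise.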
oneDNN 3.4 Harness: IP Shapes 3D - Engine: CPU
ms, fewer is better (fastest first)
  c: 1.82904 (SE +/- 0.00414, N = 3; MIN: 1.64)
  d: 1.83058 (SE +/- 0.00258, N = 3; MIN: 1.64)
  a: 1.83694 (SE +/- 0.00416, N = 3; MIN: 1.66)
  b: 1.83867 (SE +/- 0.00422, N = 3; MIN: 1.64)
oneDNN 3.4 Harness: Convolution Batch Shapes Auto - Engine: CPU
ms, fewer is better (fastest first)
  c: 4.70649 (SE +/- 0.00401, N = 3; MIN: 4.6)
  d: 4.71070 (SE +/- 0.00422, N = 3; MIN: 4.59)
  a: 4.71311 (SE +/- 0.00174, N = 3; MIN: 4.62)
  b: 4.72171 (SE +/- 0.00171, N = 3; MIN: 4.6)
oneDNN 3.4 Harness: Deconvolution Batch shapes_3d - Engine: CPU
ms, fewer is better (fastest first)
  a: 6.43106 (SE +/- 0.01147, N = 3; MIN: 6.14)
  b: 6.43839 (SE +/- 0.00735, N = 3; MIN: 6.13)
  c: 6.44031 (SE +/- 0.01150, N = 3; MIN: 6.17)
  d: 6.44054 (SE +/- 0.01016, N = 3; MIN: 6.15)
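To summarize each run with a single figure across all seven harnesses, a geometric mean of the per-harness times is a common choice, because it weights relative changes equally regardless of each harness's absolute scale (milliseconds for IP Shapes 3D versus seconds for RNN training). The sketch below applies `statistics.geometric_mean` (Python 3.8+) to the reported values; the aggregate is derived here rather than taken from the export, and `RUNS` is an illustrative name.

```python
import statistics

# Per-run results in ms for the seven oneDNN harnesses, in this order:
# RNN Training, RNN Inference, Deconv shapes_1d, IP Shapes 1D,
# IP Shapes 3D, Conv Batch Shapes Auto, Deconv shapes_3d.
RUNS = {
    "a": [3583.01, 2283.74, 24.4811, 4.17093, 1.83694, 4.71311, 6.43106],
    "b": [3584.68, 2287.55, 24.5327, 4.13326, 1.83867, 4.72171, 6.43839],
    "c": [3574.73, 2286.80, 24.5554, 4.11456, 1.82904, 4.70649, 6.44031],
    "d": [3602.94, 2280.18, 24.4925, 4.09690, 1.83058, 4.71070, 6.44054],
}

# Geometric mean per run; lower is better since every input is a time.
geomeans = {run: statistics.geometric_mean(times) for run, times in RUNS.items()}
for run, gm in sorted(geomeans.items(), key=lambda item: item[1]):
    print(f"{run}: {gm:.4f} ms")
```

All four runs land within about 0.25% of each other by this measure, consistent with the small per-harness spreads.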
Phoronix Test Suite v10.8.5