onednn NVIDIA GH200

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403018-NE-ONEDNNNVI02&grs&sor.

System configuration (runs a, b, c, d)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Result overview (oneDNN, Engine: CPU, all values in ms, fewer is better)

Test                                  a          b          c          d
IP Shapes 1D                          4.17093    4.13326    4.11456    4.09690
Recurrent Neural Network Training     3583.01    3584.68    3574.73    3602.94
IP Shapes 3D                          1.83694    1.83867    1.82904    1.83058
Convolution Batch Shapes Auto         4.71311    4.72171    4.70649    4.71070
Recurrent Neural Network Inference    2283.74    2287.55    2286.80    2280.18
Deconvolution Batch shapes_1d         24.4811    24.5327    24.5554    24.4925
Deconvolution Batch shapes_3d         6.43106    6.43839    6.44031    6.44054
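To judge how closely the four runs agree, one option is to compute the relative spread between the fastest and slowest run for each test. The sketch below is illustrative only and not part of the Phoronix Test Suite output; the names `results` and `spread_percent` are made up for this example, and the numbers are copied from the overview values above.

```python
# Mean times in ms for runs a-d, copied from the result overview above.
results = {
    "IP Shapes 1D": {"a": 4.17093, "b": 4.13326, "c": 4.11456, "d": 4.09690},
    "Recurrent Neural Network Training": {"a": 3583.01, "b": 3584.68, "c": 3574.73, "d": 3602.94},
    "IP Shapes 3D": {"a": 1.83694, "b": 1.83867, "c": 1.82904, "d": 1.83058},
    "Convolution Batch Shapes Auto": {"a": 4.71311, "b": 4.72171, "c": 4.70649, "d": 4.71070},
    "Recurrent Neural Network Inference": {"a": 2283.74, "b": 2287.55, "c": 2286.80, "d": 2280.18},
    "Deconvolution Batch shapes_1d": {"a": 24.4811, "b": 24.5327, "c": 24.5554, "d": 24.4925},
    "Deconvolution Batch shapes_3d": {"a": 6.43106, "b": 6.43839, "c": 6.44031, "d": 6.44054},
}


def spread_percent(runs):
    """Return (slowest - fastest) / fastest as a percentage (hypothetical helper)."""
    fastest = min(runs.values())
    slowest = max(runs.values())
    return (slowest - fastest) / fastest * 100.0


# Relative spread per test, keyed by test name.
spreads = {name: spread_percent(runs) for name, runs in results.items()}

for name, pct in spreads.items():
    print(f"{name}: {pct:.2f}% spread across runs a-d")
```

With these values, every test stays below a 2% spread, and IP Shapes 1D shows the widest gap between runs.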

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-    N   MIN
d     4.09690   0.03906   3   3.7
c     4.11456   0.04283   3   3.68
b     4.13326   0.03178   3   3.73
a     4.17093   0.01783   3   3.77

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-   N   MIN
c     3574.73   31.12    3   3486.97
a     3583.01   8.84     3   3531.56
b     3584.68   5.09     3   3531.87
d     3602.94   12.76    3   3540.98

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: IP Shapes 3D - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-    N   MIN
c     1.82904   0.00414   3   1.64
d     1.83058   0.00258   3   1.64
a     1.83694   0.00416   3   1.66
b     1.83867   0.00422   3   1.64

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-    N   MIN
c     4.70649   0.00401   3   4.6
d     4.71070   0.00422   3   4.59
a     4.71311   0.00174   3   4.62
b     4.72171   0.00171   3   4.6

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-   N   MIN
d     2280.18   3.22     3   2235.1
a     2283.74   5.70     3   2232.84
c     2286.80   4.93     3   2234.59
b     2287.55   8.97     3   2226.72

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result   SE +/-   N   MIN
a     24.48    0.04     3   23.03
d     24.49    0.09     3   23.01
b     24.53    0.10     3   22.96
c     24.56    0.07     3   23.08

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Result    SE +/-    N   MIN
a     6.43106   0.01147   3   6.14
b     6.43839   0.00735   3   6.13
c     6.44031   0.01150   3   6.17
d     6.44054   0.01016   3   6.15

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl


Phoronix Test Suite v10.8.5