onednn NVIDIA GH200

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403018-NE-ONEDNNNVI02.

onednn NVIDIA GH200: system details (identical for runs a, b, c, d)

Processor:          ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard:        Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory:             1 x 480GB DRAM-6400MT/s
Disk:               960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics:           NVIDIA GH200 480GB
Network:            2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS:                 Ubuntu 22.04
Kernel:             6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver:     NVIDIA
OpenCL:             OpenCL 3.0 CUDA 12.4.89
Vulkan:             1.3.277
Compiler:           GCC 11.4.0 + CUDA 11.5
File-System:        ext4
Screen Resolution:  1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

onednn NVIDIA GH200: result overview (ms, fewer is better)

Test                                                       a         b         c         d
onednn: IP Shapes 1D - CPU                           4.17093   4.13326   4.11456   4.09690
onednn: IP Shapes 3D - CPU                           1.83694   1.83867   1.82904   1.83058
onednn: Convolution Batch Shapes Auto - CPU          4.71311   4.72171   4.70649   4.71070
onednn: Deconvolution Batch shapes_1d - CPU          24.4811   24.5327   24.5554   24.4925
onednn: Deconvolution Batch shapes_3d - CPU          6.43106   6.43839   6.44031   6.44054
onednn: Recurrent Neural Network Training - CPU      3583.01   3584.68   3574.73   3602.94
onednn: Recurrent Neural Network Inference - CPU     2283.74   2287.55   2286.80   2280.18
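
To gauge how closely the four runs agree, the overview figures can be summarized per test. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `spread_percent` and the choice of (max - min) / mean as the spread metric are assumptions made here for illustration. The numbers are copied from the overview above.

```python
from statistics import mean

# Per-run averages in ms (fewer is better), copied from the result overview.
# Order of values in each list: runs a, b, c, d.
results = {
    "IP Shapes 1D": [4.17093, 4.13326, 4.11456, 4.09690],
    "IP Shapes 3D": [1.83694, 1.83867, 1.82904, 1.83058],
    "Convolution Batch Shapes Auto": [4.71311, 4.72171, 4.70649, 4.71070],
    "Deconvolution Batch shapes_1d": [24.4811, 24.5327, 24.5554, 24.4925],
    "Deconvolution Batch shapes_3d": [6.43106, 6.43839, 6.44031, 6.44054],
    "Recurrent Neural Network Training": [3583.01, 3584.68, 3574.73, 3602.94],
    "Recurrent Neural Network Inference": [2283.74, 2287.55, 2286.80, 2280.18],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) relative to the mean, in percent."""
    return (max(values) - min(values)) / mean(values) * 100.0


for name, values in results.items():
    print(f"{name:36s} mean={mean(values):10.4f} ms  "
          f"spread={spread_percent(values):5.2f}%")
```

A small spread suggests that runs a through d are repeat measurements of the same configuration, which is consistent with the single system description given for all four.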

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     4.17093    0.01783    3   3.77
b     4.13326    0.03178    3   3.73
c     4.11456    0.04283    3   3.68
d     4.09690    0.03906    3   3.7

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
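
The "SE +/-" figures are reported alongside N = 3 trials per run. Assuming SE denotes the standard error of the mean (an interpretation, not something this export states), the sample standard deviation of the trials can be estimated as SE * sqrt(N). The sketch below applies that to the IP Shapes 1D figures; `stddev_from_se` is a hypothetical helper name.

```python
import math


def stddev_from_se(se, n):
    """Estimate the sample standard deviation from a standard error of the mean.

    Assumes SE = s / sqrt(n), i.e. the usual standard-error definition.
    """
    return se * math.sqrt(n)


# SE values (ms) for IP Shapes 1D, runs a-d, each with N = 3 trials.
se_by_run = {"a": 0.01783, "b": 0.03178, "c": 0.04283, "d": 0.03906}

for run, se in se_by_run.items():
    print(f"run {run}: SE={se:.5f} ms -> estimated std dev={stddev_from_se(se, 3):.5f} ms")
```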

oneDNN

Harness: IP Shapes 3D - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     1.83694    0.00416    3   1.66
b     1.83867    0.00422    3   1.64
c     1.82904    0.00414    3   1.64
d     1.83058    0.00258    3   1.64

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     4.71311    0.00174    3   4.62
b     4.72171    0.00171    3   4.6
c     4.70649    0.00401    3   4.6
d     4.71070    0.00422    3   4.59

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     24.48      0.04       3   23.03
b     24.53      0.10       3   22.96
c     24.56      0.07       3   23.08
d     24.49      0.09       3   23.01

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     6.43106    0.01147    3   6.14
b     6.43839    0.00735    3   6.13
c     6.44031    0.01150    3   6.17
d     6.44054    0.01016    3   6.15

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     3583.01    8.84       3   3531.56
b     3584.68    5.09       3   3531.87
c     3574.73    31.12      3   3486.97
d     3602.94    12.76      3   3540.98

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

oneDNN 3.4, ms, Fewer Is Better

Run   Average    SE +/-     N   MIN
a     2283.74    5.70       3   2232.84
b     2287.55    8.97       3   2226.72
c     2286.80    4.93       3   2234.59
d     2280.18    3.22       3   2235.1

1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl


Phoronix Test Suite v10.8.4