NVIDIA GH200 GPU

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH100 [GH200 120GB] on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2401285-NE-NVIDIAGH229.

System configuration: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480 GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH100 [GH200 120GB]
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 23.10
Kernel: 6.5.0-15-generic (aarch64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

- Transparent Huge Pages: madvise
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
- Python 3.11.6
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Results summary: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]

pytorch: CPU - 1 - ResNet-50: 10.66
pytorch: CPU - 1 - ResNet-152: 3.93
pytorch: CPU - 512 - ResNet-50: 10.28
pytorch: CPU - 512 - ResNet-152: 3.77
pytorch: CPU - 1 - Efficientnet_v2_l: 0.44
pytorch: CPU - 512 - Efficientnet_v2_l: 2.54
blender: BMW27 - CUDA: 40.72
blender: BMW27 - CPU-Only: 41.6
blender: Classroom - CUDA: 86.88
blender: Fishy Cat - CUDA: 83.30
blender: Barbershop - CUDA: 455.69
blender: Classroom - CPU-Only: 87.25
blender: Fishy Cat - CPU-Only: 82.72
blender: Barbershop - CPU-Only: 456.61
blender: Pabellon Barcelona - CUDA: 172.36
blender: Pabellon Barcelona - CPU-Only: 171.76

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 10.66 (SE +/- 0.04, N = 3; MIN: 7.64 / MAX: 11.23)

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 3.93 (SE +/- 0.00, N = 3; MIN: 3.45 / MAX: 4.1)

PyTorch

Device: CPU - Batch Size: 512 - Model: ResNet-50

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 10.28 (SE +/- 0.01, N = 3; MIN: 8.23 / MAX: 10.67)

PyTorch

Device: CPU - Batch Size: 512 - Model: ResNet-152

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 3.77 (SE +/- 0.01, N = 3; MIN: 3.28 / MAX: 3.89)

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 0.44 (SE +/- 0.00, N = 3; MIN: 0.43 / MAX: 0.92)

PyTorch

Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l

PyTorch 2.1 - batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 2.54 (SE +/- 0.01, N = 3; MIN: 2.33 / MAX: 2.63)
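
As a rough aid for reading the six PyTorch results above, the following Python sketch computes, per model, the ratio of the batch-size-512 figure to the batch-size-1 figure. The numbers are copied from the per-test results above; the dictionary layout and the helper name `batch_scaling` are illustrative assumptions and not part of the Phoronix Test Suite.

```python
# Reported PyTorch 2.1 CPU results in batches/sec, keyed by model and batch size.
# Values are copied from the per-test results; the structure is an assumption.
pytorch_batches_per_sec = {
    "ResNet-50": {1: 10.66, 512: 10.28},
    "ResNet-152": {1: 3.93, 512: 3.77},
    "Efficientnet_v2_l": {1: 0.44, 512: 2.54},
}


def batch_scaling(results):
    """Return the batch-512 / batch-1 ratio of the reported figures per model."""
    return {model: by_batch[512] / by_batch[1] for model, by_batch in results.items()}


scaling = batch_scaling(pytorch_batches_per_sec)
for model, ratio in scaling.items():
    print(f"{model}: batch 512 / batch 1 = {ratio:.3f}")
```

A ratio near 1.0 means the reported figure barely changes between the two batch sizes; a ratio well above 1.0 means the batch-512 run reported a higher figure.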

Blender

Blend File: BMW27 - Compute: CUDA

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 40.43 (SE +/- 0.08, N = 3)

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 40.98 (SE +/- 0.18, N = 3)

Blender

Blend File: Classroom - Compute: CUDA

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 86.88 (SE +/- 0.37, N = 3)

Blender

Blend File: Fishy Cat - Compute: CUDA

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 83.30 (SE +/- 0.83, N = 3)

Blender

Blend File: Barbershop - Compute: CUDA

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 455.69 (SE +/- 0.82, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 87.25 (SE +/- 0.07, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 82.72 (SE +/- 0.41, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 456.61 (SE +/- 1.11, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CUDA

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 172.36 (SE +/- 0.22, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6.2 - Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 171.76 (SE +/- 0.46, N = 3)
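
To compare the CUDA and CPU-Only Blender runs scene by scene, a minimal Python sketch can divide each CPU-Only render time by the matching CUDA time. The times are copied from the per-test Blender results above; the dictionary layout and the function name `cpu_to_cuda_ratio` are assumptions made for illustration only.

```python
# Blender 3.6.2 render times in seconds (fewer is better), copied from the
# per-test results. The data layout is an illustrative assumption.
blender_seconds = {
    "BMW27": {"CUDA": 40.43, "CPU-Only": 40.98},
    "Classroom": {"CUDA": 86.88, "CPU-Only": 87.25},
    "Fishy Cat": {"CUDA": 83.30, "CPU-Only": 82.72},
    "Barbershop": {"CUDA": 455.69, "CPU-Only": 456.61},
    "Pabellon Barcelona": {"CUDA": 172.36, "CPU-Only": 171.76},
}


def cpu_to_cuda_ratio(times):
    """CPU-Only time divided by CUDA time; above 1.0 means CUDA was faster."""
    return times["CPU-Only"] / times["CUDA"]


ratios = {scene: cpu_to_cuda_ratio(t) for scene, t in blender_seconds.items()}
for scene, ratio in ratios.items():
    print(f"{scene}: CPU-Only / CUDA = {ratio:.3f}")
```

With these inputs every ratio falls within about 2% of 1.0, so the reported CUDA and CPU-Only times are nearly identical for each scene.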


Phoronix Test Suite v10.8.4