NVIDIA GH200 GPU

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH100 [GH200 120GB] on Ubuntu 23.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2401285-NE-NVIDIAGH229
Run Management

Result Identifier: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -
Date Run: January 27
Test Duration: 6 Hours, 57 Minutes


NVIDIA GH200 GPU Benchmarks

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480 GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH100 [GH200 120GB]
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 23.10
Kernel: 6.5.0-15-generic (aarch64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

System Logs

- Transparent Huge Pages: madvise
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
- Python 3.11.6
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

NVIDIA GH200 GPU
Results for ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -:

pytorch: CPU - 1 - ResNet-50: 10.66
pytorch: CPU - 1 - ResNet-152: 3.93
pytorch: CPU - 512 - ResNet-50: 10.28
pytorch: CPU - 512 - ResNet-152: 3.77
pytorch: CPU - 1 - Efficientnet_v2_l: 0.44
pytorch: CPU - 512 - Efficientnet_v2_l: 2.54
blender: BMW27 - CUDA: 40.72
blender: BMW27 - CPU-Only: 41.6
blender: Classroom - CUDA: 86.88
blender: Fishy Cat - CUDA: 83.30
blender: Barbershop - CUDA: 455.69
blender: Classroom - CPU-Only: 87.25
blender: Fishy Cat - CPU-Only: 82.72
blender: Barbershop - CPU-Only: 456.61
blender: Pabellon Barcelona - CUDA: 172.36
blender: Pabellon Barcelona - CPU-Only: 171.76

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

Device: CPU - Batch Size: 1 - Model: ResNet-50
PyTorch 2.1, batches/sec, More Is Better: 10.66 (SE +/- 0.04, N = 3; MIN: 7.64 / MAX: 11.23)

Device: CPU - Batch Size: 1 - Model: ResNet-152
PyTorch 2.1, batches/sec, More Is Better: 3.93 (SE +/- 0.00, N = 3; MIN: 3.45 / MAX: 4.1)

Device: CPU - Batch Size: 512 - Model: ResNet-50
PyTorch 2.1, batches/sec, More Is Better: 10.28 (SE +/- 0.01, N = 3; MIN: 8.23 / MAX: 10.67)

Device: CPU - Batch Size: 512 - Model: ResNet-152
PyTorch 2.1, batches/sec, More Is Better: 3.77 (SE +/- 0.01, N = 3; MIN: 3.28 / MAX: 3.89)

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
PyTorch 2.1, batches/sec, More Is Better: 0.44 (SE +/- 0.00, N = 3; MIN: 0.43 / MAX: 0.92)

Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l
PyTorch 2.1, batches/sec, More Is Better: 2.54 (SE +/- 0.01, N = 3; MIN: 2.33 / MAX: 2.63)

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled
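
The "Torch not compiled with CUDA enabled" assertion indicates that the installed PyTorch build has no CUDA support, so every CUDA GPU run exits before producing a result. A minimal Python sketch of a pre-flight check is shown here; the function name `cuda_build_status` and its status labels are hypothetical and are not part of the Phoronix Test Suite or the pytorch-benchmark project.

```python
import importlib.util


def cuda_build_status():
    """Return (status, detail) describing whether the local PyTorch can use CUDA."""
    # PyTorch may not be installed at all on the machine running this check.
    if importlib.util.find_spec("torch") is None:
        return "missing", "PyTorch is not installed"

    import torch

    # torch.version.cuda is None for builds compiled without CUDA support,
    # which is the situation the AssertionError above reports.
    if torch.version.cuda is None:
        return "no-cuda", f"PyTorch {torch.__version__} was built without CUDA"

    # A CUDA-enabled build can still lack a usable driver or device at run time.
    if not torch.cuda.is_available():
        return "unavailable", f"CUDA {torch.version.cuda} build, but no usable device"

    return "ok", torch.cuda.get_device_name(0)


if __name__ == "__main__":
    status, detail = cuda_build_status()
    print(f"{status}: {detail}")
```

Running a check like this before the benchmark would distinguish a build problem ("no-cuda") from a driver or device problem ("unavailable").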

Blender

Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing is supported. This system/blender test profile makes use of the system-supplied Blender. Use pts/blender if wishing to stick to a fixed version of Blender. Learn more via the OpenBenchmarking.org test page.

Blend File: BMW27 - Compute: CUDA
Blender 3.6.2, Seconds, Fewer Is Better: 40.43 (SE +/- 0.08, N = 3; Min: 40.28 / Avg: 40.43 / Max: 40.52)

Blend File: BMW27 - Compute: CPU-Only
Blender 3.6.2, Seconds, Fewer Is Better: 40.98 (SE +/- 0.18, N = 3; Min: 40.68 / Avg: 40.98 / Max: 41.3)
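
The "SE +/-" figures can be related to the Min / Avg / Max values reported for the two BMW27 results above. The Python sketch below assumes that SE denotes the standard error of the mean (sample standard deviation divided by the square root of N); the result page itself does not state this. With N = 3, the middle run time is inferred from the rounded average, so it is an approximation. The helper names are illustrative only.

```python
import math
import statistics


def infer_middle_run(minimum, average, maximum):
    """Infer the third of three run times from the reported min, avg and max."""
    # With N = 3, the three values sum to 3 * average (average is rounded).
    return 3 * average - minimum - maximum


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Reported Min / Avg / Max, in seconds, for the two BMW27 results above.
reported = {
    "BMW27 - CUDA": (40.28, 40.43, 40.52),
    "BMW27 - CPU-Only": (40.68, 40.98, 41.30),
}

for name, (lo, avg, hi) in reported.items():
    runs = [lo, infer_middle_run(lo, avg, hi), hi]
    print(f"{name}: SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

Under these assumptions the computed values round to 0.08 and 0.18, matching the SE figures reported for the two runs.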

Blend File: Classroom - Compute: CUDA
Blender 3.6.2, Seconds, Fewer Is Better: 86.88 (SE +/- 0.37, N = 3)

Blend File: Fishy Cat - Compute: CUDA
Blender 3.6.2, Seconds, Fewer Is Better: 83.30 (SE +/- 0.83, N = 3)

Blend File: Barbershop - Compute: CUDA
Blender 3.6.2, Seconds, Fewer Is Better: 455.69 (SE +/- 0.82, N = 3)

Blend File: Classroom - Compute: CPU-Only
Blender 3.6.2, Seconds, Fewer Is Better: 87.25 (SE +/- 0.07, N = 3)

Blend File: Fishy Cat - Compute: CPU-Only
Blender 3.6.2, Seconds, Fewer Is Better: 82.72 (SE +/- 0.41, N = 3)

Blend File: Barbershop - Compute: CPU-Only
Blender 3.6.2, Seconds, Fewer Is Better: 456.61 (SE +/- 1.11, N = 3)

Blend File: Pabellon Barcelona - Compute: CUDA
Blender 3.6.2, Seconds, Fewer Is Better: 172.36 (SE +/- 0.22, N = 3)

Blend File: Pabellon Barcelona - Compute: CPU-Only
Blender 3.6.2, Seconds, Fewer Is Better: 171.76 (SE +/- 0.46, N = 3)
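
Each blend file was rendered with both compute settings, so the paired times can be compared directly. The Python sketch below uses only the average render times from the Blender 3.6.2 results above; the helper name `relative_difference` and the choice of the CUDA time as the baseline are illustrative assumptions.

```python
# Average render times in seconds, copied from the Blender 3.6.2 results above.
render_seconds = {
    "BMW27": {"CUDA": 40.43, "CPU-Only": 40.98},
    "Classroom": {"CUDA": 86.88, "CPU-Only": 87.25},
    "Fishy Cat": {"CUDA": 83.30, "CPU-Only": 82.72},
    "Barbershop": {"CUDA": 455.69, "CPU-Only": 456.61},
    "Pabellon Barcelona": {"CUDA": 172.36, "CPU-Only": 171.76},
}


def relative_difference(cuda, cpu_only):
    """Percentage by which the CPU-Only time differs from the CUDA time."""
    return (cpu_only - cuda) / cuda * 100.0


for scene, times in render_seconds.items():
    diff = relative_difference(times["CUDA"], times["CPU-Only"])
    # A positive value means the CPU-Only render took longer than the CUDA render.
    print(f"{scene}: CPU-Only vs CUDA {diff:+.2f}%")
```

On these figures, the CUDA and CPU-Only times for every blend file differ by less than 2 percent.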