NVIDIA GH200 GPU

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH100 [GH200 120GB] on Ubuntu 23.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2401285-NE-NVIDIAGH229
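The comparison command above can also be launched from a script. The following is a minimal sketch, assuming `phoronix-test-suite` is installed and on PATH; the helper names `build_compare_command` and `run_comparison` are hypothetical and not part of the Phoronix Test Suite.

```python
import shutil
import subprocess

# Result file identifier quoted in the command above.
RESULT_ID = "2401285-NE-NVIDIAGH229"


def build_compare_command(result_id: str = RESULT_ID) -> list[str]:
    """Return the argument list for the comparison run (hypothetical helper)."""
    return ["phoronix-test-suite", "benchmark", result_id]


def run_comparison(result_id: str = RESULT_ID) -> int:
    """Run the comparison and return the process exit code.

    Raises FileNotFoundError if the phoronix-test-suite binary is not on PATH.
    """
    if shutil.which("phoronix-test-suite") is None:
        raise FileNotFoundError("phoronix-test-suite is not on PATH")
    # The tool may prompt interactively, so output is not captured here.
    return subprocess.run(build_compare_command(result_id), check=False).returncode
```

Calling `run_comparison()` starts the same benchmark run as typing the command by hand.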

Result Identifier: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -
Date Run: January 27
Test Duration: 6 Hours, 57 Minutes

NVIDIA GH200 GPU - System Details

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480 GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH100 [GH200 120GB]
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 23.10
Kernel: 6.5.0-15-generic (aarch64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

System Logs

- Transparent Huge Pages: madvise
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
- Python 3.11.6
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

NVIDIA GH200 GPU - Result Overview

Test                                                  ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -
blender: Pabellon Barcelona - CPU-Only                171.76
pytorch: CPU - 1 - ResNet-50                          10.66
pytorch: CPU - 1 - ResNet-152                         3.93
blender: BMW27 - CUDA                                 40.72
pytorch: CPU - 512 - ResNet-50                        10.28
pytorch: CPU - 512 - ResNet-152                       3.77
pytorch: CPU - 1 - Efficientnet_v2_l                  0.44
blender: BMW27 - CPU-Only                             41.6
pytorch: CPU - 512 - Efficientnet_v2_l                2.54
blender: Pabellon Barcelona - CUDA                    172.36
blender: Fishy Cat - CPU-Only                         82.72
blender: Barbershop - CPU-Only                        456.61
blender: Barbershop - CUDA                            455.69
blender: Classroom - CPU-Only                         87.25
blender: Classroom - CUDA                             86.88
blender: Fishy Cat - CUDA                             83.30
pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l    (no result)

Blender

Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing is supported. This system/blender test profile makes use of the system-supplied Blender. Use pts/blender if wishing to stick to a fixed version of Blender. Learn more via the OpenBenchmarking.org test page.

Blender 3.6.2 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 171.76 (SE +/- 0.46, N = 3)
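Each result is reported with a standard error (SE) over N = 3 runs. The sketch below shows the usual definition, the sample standard deviation divided by the square root of N. Whether the Phoronix Test Suite uses exactly this variant is an assumption, and the run times in the example are hypothetical because the individual runs are not listed in this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds (NOT taken from this result file).
runs = [171.0, 172.0, 173.0]
mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

A small SE relative to the mean indicates that the three runs agreed closely with each other.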

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 10.66 (SE +/- 0.04, N = 3; MIN: 7.64 / MAX: 11.23)

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 3.93 (SE +/- 0.00, N = 3; MIN: 3.45 / MAX: 4.1)

Blender

Blender 3.6.2 - Blend File: BMW27 - Compute: CUDA (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 40.43 (SE +/- 0.08, N = 3)

PyTorch

PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 10.28 (SE +/- 0.01, N = 3; MIN: 8.23 / MAX: 10.67)

PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 3.77 (SE +/- 0.01, N = 3; MIN: 3.28 / MAX: 3.89)

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 0.44 (SE +/- 0.00, N = 3; MIN: 0.43 / MAX: 0.92)

Blender

Blender 3.6.2 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 40.98 (SE +/- 0.18, N = 3)

PyTorch

PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 2.54 (SE +/- 0.01, N = 3; MIN: 2.33 / MAX: 2.63)

Blender

Blender 3.6.2 - Blend File: Pabellon Barcelona - Compute: CUDA (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 172.36 (SE +/- 0.22, N = 3)

Blender 3.6.2 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 82.72 (SE +/- 0.41, N = 3)

Blender 3.6.2 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 456.61 (SE +/- 1.11, N = 3)

Blender 3.6.2 - Blend File: Barbershop - Compute: CUDA (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 455.69 (SE +/- 0.82, N = 3)

Blender 3.6.2 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 87.25 (SE +/- 0.07, N = 3)

Blender 3.6.2 - Blend File: Classroom - Compute: CUDA (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 86.88 (SE +/- 0.37, N = 3)

Blender 3.6.2 - Blend File: Fishy Cat - Compute: CUDA (Seconds, Fewer Is Better)
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: 83.30 (SE +/- 0.83, N = 3)
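To make the CPU-Only and CUDA render times easier to compare, the sketch below computes the CPU/CUDA time ratio per blend file. The numbers are copied from the per-test Blender results in this file; the dictionary layout and variable names are illustrative only.

```python
# Render times in seconds, taken from the per-test Blender results above:
# (CPU-Only, CUDA) for each blend file.
blender_seconds = {
    "BMW27": (40.98, 40.43),
    "Classroom": (87.25, 86.88),
    "Fishy Cat": (82.72, 83.30),
    "Pabellon Barcelona": (171.76, 172.36),
    "Barbershop": (456.61, 455.69),
}

# Ratio > 1 means the CUDA run finished faster than the CPU-Only run.
ratios = {name: cpu / cuda for name, (cpu, cuda) in blender_seconds.items()}

for name, ratio in ratios.items():
    print(f"{name:20s} CPU/CUDA time ratio: {ratio:.3f}")
```

In this result file every ratio lies within about 2% of 1.0, so the two compute settings produced nearly identical render times.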

PyTorch

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled

Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l

ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB] -: The test quit with a non-zero exit status. E: AssertionError: Torch not compiled with CUDA enabled
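The error message indicates that the installed PyTorch build was compiled without CUDA support, so every CUDA device test exits early. The following is a minimal sketch of how a script could check for this before selecting a device; `pick_device` and `describe_torch_build` are hypothetical helper names, and the check relies only on `torch.cuda.is_available()` and `torch.version.cuda`.

```python
def pick_device(prefer_cuda: bool = True) -> str:
    """Return "cuda" only when a CUDA-enabled PyTorch can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: fall back to the CPU label.
        return "cpu"
    if prefer_cuda and torch.cuda.is_available():
        return "cuda"
    return "cpu"


def describe_torch_build() -> str:
    """Summarize whether the installed PyTorch was built with CUDA."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    # torch.version.cuda is None for CPU-only builds.
    cuda_version = torch.version.cuda
    if cuda_version is None:
        return f"torch {torch.__version__}: built without CUDA"
    return f"torch {torch.__version__}: built with CUDA {cuda_version}"


print(describe_torch_build())
print("selected device:", pick_device())
```

On a CPU-only build like the one used for these tests, this check would be expected to select "cpu" rather than trigger the assertion.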