NVIDIA GH200 GPU

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH100 [GH200 120GB] on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401285-NE-NVIDIAGH229&grr .
NVIDIA GH200 GPU: system details

System: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480 GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH100 [GH200 120GB]
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 23.10
Kernel: 6.5.0-15-generic (aarch64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

- Transparent Huge Pages: madvise
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
- Python 3.11.6
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
NVIDIA GH200 GPU: result overview

System: ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]

pytorch: CPU - 512 - Efficientnet_v2_l          2.54
pytorch: CPU - 1 - Efficientnet_v2_l            0.44
pytorch: CPU - 512 - ResNet-152                 3.77
blender: Barbershop - CPU-Only                  456.61
blender: Barbershop - CUDA                      455.69
pytorch: CPU - 1 - ResNet-152                   3.93
pytorch: CPU - 512 - ResNet-50                  10.28
blender: Pabellon Barcelona - CUDA              172.36
blender: Pabellon Barcelona - CPU-Only          171.76
pytorch: CPU - 1 - ResNet-50                    10.66
blender: Classroom - CPU-Only                   87.25
blender: Classroom - CUDA                       86.88
blender: Fishy Cat - CUDA                       83.30
blender: Fishy Cat - CPU-Only                   82.72
blender: BMW27 - CPU-Only                       41.6
blender: BMW27 - CUDA                           40.72
pytorch: NVIDIA CUDA GPU - 512 - ResNet-50
PyTorch 2.1
Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 2.54 (SE +/- 0.01, N = 3; MIN: 2.33 / MAX: 2.63)

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 0.44 (SE +/- 0.00, N = 3; MIN: 0.43 / MAX: 0.92)

PyTorch 2.1
Device: CPU - Batch Size: 512 - Model: ResNet-152
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 3.77 (SE +/- 0.01, N = 3; MIN: 3.28 / MAX: 3.89)

Blender 3.6.2
Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 456.61 (SE +/- 1.11, N = 3)

Blender 3.6.2
Blend File: Barbershop - Compute: CUDA
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 455.69 (SE +/- 0.82, N = 3)

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-152
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 3.93 (SE +/- 0.00, N = 3; MIN: 3.45 / MAX: 4.1)

PyTorch 2.1
Device: CPU - Batch Size: 512 - Model: ResNet-50
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 10.28 (SE +/- 0.01, N = 3; MIN: 8.23 / MAX: 10.67)

Blender 3.6.2
Blend File: Pabellon Barcelona - Compute: CUDA
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 172.36 (SE +/- 0.22, N = 3)

Blender 3.6.2
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 171.76 (SE +/- 0.46, N = 3)

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-50
batches/sec, More Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 10.66 (SE +/- 0.04, N = 3; MIN: 7.64 / MAX: 11.23)

Blender 3.6.2
Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 87.25 (SE +/- 0.07, N = 3)

Blender 3.6.2
Blend File: Classroom - Compute: CUDA
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 86.88 (SE +/- 0.37, N = 3)

Blender 3.6.2
Blend File: Fishy Cat - Compute: CUDA
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 83.30 (SE +/- 0.83, N = 3)

Blender 3.6.2
Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 82.72 (SE +/- 0.41, N = 3)

Blender 3.6.2
Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 40.98 (SE +/- 0.18, N = 3)

Blender 3.6.2
Blend File: BMW27 - Compute: CUDA
Seconds, Fewer Is Better
ARMv8 Neoverse-V2 - NVIDIA GH100 [GH200 120GB]: 40.43 (SE +/- 0.08, N = 3)
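
The Blender results above report a CPU-Only and a CUDA render time for each blend file. As a rough illustration of how the two can be compared, the following Python sketch computes the absolute and percentage difference per scene, plus a combined standard error. The times and SE values are copied from the per-test Blender 3.6.2 results; the names `BLENDER_RESULTS`, `compare`, and `summarize` are hypothetical, and the combined standard error assumes the CPU-Only and CUDA runs are independent, which the export does not state.

```python
import math

# (scene, cpu_seconds, cpu_se, cuda_seconds, cuda_se)
# Values copied from the per-test Blender 3.6.2 results (N = 3 each).
BLENDER_RESULTS = [
    ("Barbershop", 456.61, 1.11, 455.69, 0.82),
    ("Pabellon Barcelona", 171.76, 0.46, 172.36, 0.22),
    ("Classroom", 87.25, 0.07, 86.88, 0.37),
    ("Fishy Cat", 82.72, 0.41, 83.30, 0.83),
    ("BMW27", 40.98, 0.18, 40.43, 0.08),
]


def compare(cpu, cpu_se, cuda, cuda_se):
    """Return (difference in seconds, difference in percent, combined SE).

    A negative difference means the CUDA run finished sooner.
    The combined SE assumes the two measurements are independent.
    """
    diff = cuda - cpu
    percent = 100.0 * diff / cpu
    combined_se = math.hypot(cpu_se, cuda_se)
    return diff, percent, combined_se


def summarize(results=BLENDER_RESULTS):
    """Build one rounded summary row per scene."""
    rows = []
    for scene, cpu, cpu_se, cuda, cuda_se in results:
        diff, percent, combined_se = compare(cpu, cpu_se, cuda, cuda_se)
        rows.append((scene, round(diff, 2), round(percent, 2), round(combined_se, 2)))
    return rows


if __name__ == "__main__":
    for scene, diff, percent, combined_se in summarize():
        print(f"{scene}: CUDA - CPU = {diff:+.2f} s ({percent:+.2f}%), combined SE {combined_se:.2f} s")
```

Comparing the difference against the combined standard error is only a coarse check with N = 3 runs per configuration; it is meant as a way to read the numbers, not as a significance test.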
Phoronix Test Suite v10.8.5