NVIDIA GH200 Compilers

Clang and GCC benchmarks by Michael Larabel for a future article. ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:

    phoronix-test-suite benchmark 2402098-NE-NVIDIAGH291

Run        Date          Test Duration
GCC 13     February 09   2 Hours, 27 Minutes
Clang 17   February 09   2 Hours, 31 Minutes
Average                  2 Hours, 29 Minutes

System configuration (identical for both runs except the compiler):

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: ASPEED
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 23.10
  Kernel: 6.8.0-060800rc3daily20240208-generic-64k (aarch64)
  File-System: ext4
  Screen Resolution: 1920x1200
  Compiler (GCC 13 run): GCC 13.2.0
  Compiler (Clang 17 run): Clang 17.0.2

C-Ray 1.1
Seconds < Lower Is Better

  Test                                   GCC 13    Clang 17
  Total Time - 4K, 16 Rays Per Pixel     6.006     6.749

FLAC Audio Encoding 1.4
Seconds < Lower Is Better

  Test           GCC 13    Clang 17
  WAV To FLAC    16.87     16.08

GraphicsMagick 1.3.38
Iterations Per Minute > Higher Is Better

  Operation          GCC 13    Clang 17
  Swirl              3672      3748
  Rotate             1764      1820
  Sharpen            882       1761
  Enhanced           2170      1542
  Resizing           8044      7919
  Noise-Gaussian     1920      1440
  HWB Color Space    4731      4360

Helsing 1.0-beta
Seconds < Lower Is Better

  Digit Range    GCC 13    Clang 17
  14 digit       68.10     84.33

LAME MP3 Encoding 3.100
Seconds < Lower Is Better

  Test          GCC 13    Clang 17
  WAV To MP3    5.474     6.287

LAMMPS Molecular Dynamics Simulator 23Jun2022
ns/day > Higher Is Better

  Model                GCC 13    Clang 17
  20k Atoms            48.23     49.52
  Rhodopsin Protein    55.36     56.34

libavif avifenc 1.0
Seconds < Lower Is Better

  Encoder Speed    GCC 13    Clang 17
  0                109.67    122.87
  2                67.07     79.37
  6, Lossless      3.754     3.717
  10, Lossless     2.851     2.871

Liquid-DSP 1.6 (Buffer Length: 256)
samples/s > Higher Is Better

  Threads    Filter Length    GCC 13        Clang 17
  1          32               45523000      68921000
  1          57               26470667      36488333
  1          512              3194567       3414867
  32         32               1362833333    2066233333
  32         57               795033333     1096366667
  64         32               2671533333    4032833333
  64         57               1587900000    2182700000
  72         32               2952333333    4386000000
  72         57               1767800000    2406400000
  32         512              95931000      102853333
  64         512              191886667     206270000
  72         512              215500000     231370000

LULESH 2.0.3
z/s > Higher Is Better

  GCC 13      Clang 17
  48090.64    47590.09

miniBUDE 20210901 (Implementation: OpenMP)
Higher Is Better

  Input Deck    Unit                      GCC 13     Clang 17
  BM1           GFInst/s                  1193.88    1551.88
  BM1           Billion Interactions/s    47.76      62.08
  BM2           GFInst/s                  1201.03    1524.47
  BM2           Billion Interactions/s    48.04      60.98

Opus Codec Encoding 1.4
Seconds < Lower Is Better

  Test                  GCC 13    Clang 17
  WAV To Opus Encode    33.04     31.42

Primesieve 8.0
Seconds < Lower Is Better

  Length    GCC 13    Clang 17
  1e12      2.891     2.929
  1e13      35.19     35.60

QuantLib 1.32
MFLOPS > Higher Is Better

  Configuration      GCC 13      Clang 17
  Multi-Threaded     232068.2    249451.3
  Single-Threaded    3456.0      3740.4

SecureMark 1.0.4
marks > Higher Is Better

  Benchmark         GCC 13    Clang 17
  SecureMark-TLS    265718    267498

Stress-NG 0.16.04
Bogo Ops/s > Higher Is Better

  Test                     GCC 13          Clang 17
  CPU Cache                949580.78       932492.75
  Matrix Math              515044.15       550915.68
  Vector Math              387369.41       450187.94
  Floating Point           19830.04        19566.28
  Vector Shuffle           71014.33        (not reported)
  Fused Multiply-Add       161511818.72    157339813.59
  Vector Floating Point    83730.08        141522.24

TSCP 1.81
Nodes Per Second > Higher Is Better

  Test                    GCC 13     Clang 17
  AI Chess Performance    2078407    2248073

WebP Image Encode 1.2.4
MP/s > Higher Is Better

  Encode Settings                                GCC 13    Clang 17
  Default                                        13.95     15.42
  Quality 100                                    9.44      10.19
  Quality 100, Lossless                          1.28      1.31
  Quality 100, Highest Compression               3.86      4.66
  Quality 100, Lossless, Highest Compression     0.52      0.56

Zstd Compression 1.5.4
MB/s > Higher Is Better

  Compression Level                    GCC 13    Clang 17
  19 - Compression Speed               14.7      14.9
  19 - Decompression Speed             1237.4    1031.1
  19, Long Mode - Compression Speed    8.57      8.70
  19, Long Mode - Decompression Speed  1283.6    1094.4
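
The results mix lower-is-better and higher-is-better units, so raw numbers cannot be compared across tests directly. The Python sketch below shows one way to normalize a handful of them into a single "Clang 17 relative to GCC 13" ratio and summarize those ratios with a geometric mean. It is illustrative only: the helper name `clang_advantage`, the `RESULTS` list layout, and the choice of five sample rows are assumptions made for this example, not part of the Phoronix Test Suite or its own geometric-mean option. The numbers are copied from the results above.

```python
from statistics import geometric_mean  # standard library, Python 3.8+

# (label, higher_is_better, GCC 13 result, Clang 17 result)
# Sample rows copied from the results above; the selection is arbitrary.
RESULTS = [
    ("C-Ray 1.1 (Seconds)", False, 6.006, 6.749),
    ("GraphicsMagick Sharpen (Iterations Per Minute)", True, 882, 1761),
    ("Helsing 14 digit (Seconds)", False, 68.10, 84.33),
    ("miniBUDE BM1 (GFInst/s)", True, 1193.88, 1551.88),
    ("Zstd 19 Decompression Speed (MB/s)", True, 1237.4, 1031.1),
]


def clang_advantage(higher_is_better, gcc, clang):
    """Clang 17 performance relative to GCC 13.

    A value above 1.0 means Clang 17 was faster on that test,
    below 1.0 means GCC 13 was faster.
    """
    # For time-based (lower-is-better) results the ratio is inverted
    # so that "bigger is better" holds for every normalized value.
    return clang / gcc if higher_is_better else gcc / clang


ratios = {
    label: clang_advantage(hib, gcc, clang)
    for label, hib, gcc, clang in RESULTS
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f}x")

# Geometric mean of the normalized ratios for this sample only.
print(f"Geometric mean (sample of {len(ratios)}): "
      f"{geometric_mean(ratios.values()):.3f}x")
```

A geometric mean is used rather than an arithmetic mean because the inputs are ratios: a 2x win and a 0.5x loss then cancel out to 1.0 instead of averaging to 1.25. Extending `RESULTS` to every row above would give a whole-file summary. The Stress-NG Vector Shuffle row has no Clang 17 value and would need to be skipped.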