NVIDIA GH200 Compilers

Clang and GCC benchmarks by Michael Larabel for a future article.

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

GCC 13:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: ASPEED, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 23.10, Kernel: 6.8.0-060800rc3daily20240208-generic-64k (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

Clang 17:

  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores), Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS), Memory: 1 x 480GB DRAM-6400MT/s, Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9, Graphics: ASPEED, Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE

  OS: Ubuntu 23.10, Kernel: 6.8.0-060800rc3daily20240208-generic-64k (aarch64), Compiler: Clang 17.0.2, File-System: ext4, Screen Resolution: 1920x1200

QuantLib 1.32
Configuration: Multi-Threaded
MFLOPS > Higher Is Better
GCC 13 ... 232068.2 |======================================================
Clang 17 . 249451.3 |==========================================================

QuantLib 1.32
Configuration: Single-Threaded
MFLOPS > Higher Is Better
GCC 13 ... 3456.0 |=======================================================
Clang 17 . 3740.4 |============================================================

miniBUDE 20210901
Implementation: OpenMP - Input Deck: BM1
GFInst/s > Higher Is Better
GCC 13 ... 1193.88 |=============================================
Clang 17 . 1551.88 |===========================================================

miniBUDE 20210901
Implementation: OpenMP - Input Deck: BM1
Billion Interactions/s > Higher Is Better
GCC 13 ... 47.76 |===============================================
Clang 17 . 62.08 |=============================================================

miniBUDE 20210901
Implementation: OpenMP - Input Deck: BM2
GFInst/s > Higher Is Better
GCC 13 ... 1201.03 |==============================================
Clang 17 . 1524.47 |===========================================================

miniBUDE 20210901
Implementation: OpenMP - Input Deck: BM2
Billion Interactions/s > Higher Is Better
GCC 13 ... 48.04 |================================================
Clang 17 . 60.98 |=============================================================

LAMMPS Molecular Dynamics Simulator 23Jun2022
Model: 20k Atoms
ns/day > Higher Is Better
GCC 13 ... 48.23 |===========================================================
Clang 17 . 49.52 |=============================================================

LAMMPS Molecular Dynamics Simulator 23Jun2022
Model: Rhodopsin Protein
ns/day > Higher Is Better
GCC 13 ... 55.36 |============================================================
Clang 17 . 56.34 |=============================================================

LULESH 2.0.3
z/s > Higher Is Better
GCC 13 ... 48090.64 |==========================================================
Clang 17 . 47590.09 |=========================================================

Zstd Compression 1.5.4
Compression Level: 19 - Compression Speed
MB/s > Higher Is Better
GCC 13 ... 14.7 |=============================================================
Clang 17 . 14.9 |==============================================================

Zstd Compression 1.5.4
Compression Level: 19 - Decompression Speed
MB/s > Higher Is Better
GCC 13 ... 1237.4 |============================================================
Clang 17 . 1031.1 |==================================================

Zstd Compression 1.5.4
Compression Level: 19, Long Mode - Compression Speed
MB/s > Higher Is Better
GCC 13 ... 8.57 |=============================================================
Clang 17 . 8.70 |==============================================================

Zstd Compression 1.5.4
Compression Level: 19, Long Mode - Decompression Speed
MB/s > Higher Is Better
GCC 13 ... 1283.6 |============================================================
Clang 17 . 1094.4 |===================================================

WebP Image Encode 1.2.4
Encode Settings: Default
MP/s > Higher Is Better
GCC 13 ... 13.95 |=======================================================
Clang 17 . 15.42 |=============================================================

WebP Image Encode 1.2.4
Encode Settings: Quality 100
MP/s > Higher Is Better
GCC 13 ... 9.44 |=========================================================
Clang 17 . 10.19 |=============================================================

WebP Image Encode 1.2.4
Encode Settings: Quality 100, Lossless
MP/s > Higher Is Better
GCC 13 ... 1.28 |=============================================================
Clang 17 . 1.31 |==============================================================

WebP Image Encode 1.2.4
Encode Settings: Quality 100, Highest Compression
MP/s > Higher Is Better
GCC 13 ... 3.86 |===================================================
Clang 17 . 4.66 |==============================================================

WebP Image Encode 1.2.4
Encode Settings: Quality 100, Lossless, Highest Compression
MP/s > Higher Is Better
GCC 13 ... 0.52 |==========================================================
Clang 17 . 0.56 |==============================================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
GCC 13 ... 2078407 |=======================================================
Clang 17 . 2248073 |===========================================================

GraphicsMagick 1.3.38
Operation: Swirl
Iterations Per Minute > Higher Is Better
GCC 13 ... 3672 |=============================================================
Clang 17 . 3748 |==============================================================

GraphicsMagick 1.3.38
Operation: Rotate
Iterations Per Minute > Higher Is Better
GCC 13 ... 1764 |============================================================
Clang 17 . 1820 |==============================================================

GraphicsMagick 1.3.38
Operation: Sharpen
Iterations Per Minute > Higher Is Better
GCC 13 ... 882 |===============================
Clang 17 . 1761 |==============================================================

GraphicsMagick 1.3.38
Operation: Enhanced
Iterations Per Minute > Higher Is Better
GCC 13 ... 2170 |==============================================================
Clang 17 . 1542 |============================================

GraphicsMagick 1.3.38
Operation: Resizing
Iterations Per Minute > Higher Is Better
GCC 13 ... 8044 |==============================================================
Clang 17 . 7919 |=============================================================

GraphicsMagick 1.3.38
Operation: Noise-Gaussian
Iterations Per Minute > Higher Is Better
GCC 13 ... 1920 |==============================================================
Clang 17 . 1440 |===============================================

GraphicsMagick 1.3.38
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
GCC 13 ... 4731 |==============================================================
Clang 17 . 4360 |=========================================================

libavif avifenc 1.0
Encoder Speed: 0
Seconds < Lower Is Better
GCC 13 ... 109.67 |======================================================
Clang 17 . 122.87 |============================================================

libavif avifenc 1.0
Encoder Speed: 2
Seconds < Lower Is Better
GCC 13 ... 67.07 |====================================================
Clang 17 . 79.37 |=============================================================

libavif avifenc 1.0
Encoder Speed: 6, Lossless
Seconds < Lower Is Better
GCC 13 ... 3.754 |=============================================================
Clang 17 . 3.717 |============================================================

libavif avifenc 1.0
Encoder Speed: 10, Lossless
Seconds < Lower Is Better
GCC 13 ... 2.851 |=============================================================
Clang 17 . 2.871 |=============================================================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
GCC 13 ... 6.006 |======================================================
Clang 17 . 6.749 |=============================================================

Primesieve 8.0
Length: 1e12
Seconds < Lower Is Better
GCC 13 ... 2.891 |============================================================
Clang 17 . 2.929 |=============================================================

Primesieve 8.0
Length: 1e13
Seconds < Lower Is Better
GCC 13 ... 35.19 |============================================================
Clang 17 . 35.60 |=============================================================

FLAC Audio Encoding 1.4
WAV To FLAC
Seconds < Lower Is Better
GCC 13 ... 16.87 |=============================================================
Clang 17 . 16.08 |==========================================================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
GCC 13 ... 5.474 |=====================================================
Clang 17 . 6.287 |=============================================================

Opus Codec Encoding 1.4
WAV To Opus Encode
Seconds < Lower Is Better
GCC 13 ... 33.04 |=============================================================
Clang 17 . 31.42 |==========================================================

Helsing 1.0-beta
Digit Range: 14 digit
Seconds < Lower Is Better
GCC 13 ... 68.10 |=================================================
Clang 17 . 84.33 |=============================================================

SecureMark 1.0.4
Benchmark: SecureMark-TLS
marks > Higher Is Better
GCC 13 ... 265718 |============================================================
Clang 17 . 267498 |============================================================

Liquid-DSP 1.6
Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
GCC 13 ... 45523000 |======================================
Clang 17 . 68921000 |==========================================================

Liquid-DSP 1.6
Threads: 1 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
GCC 13 ... 26470667 |==========================================
Clang 17 . 36488333 |==========================================================

Liquid-DSP 1.6
Threads: 1 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
GCC 13 ... 3194567 |=======================================================
Clang 17 . 3414867 |===========================================================

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
GCC 13 ... 1362833333 |=====================================
Clang 17 . 2066233333 |========================================================

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
GCC 13 ... 795033333 |=========================================
Clang 17 . 1096366667 |========================================================

Liquid-DSP 1.6
Threads: 64 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
GCC 13 ... 2671533333 |=====================================
Clang 17 . 4032833333 |========================================================

Liquid-DSP 1.6
Threads: 64 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
GCC 13 ... 1587900000 |=========================================
Clang 17 . 2182700000 |========================================================

Liquid-DSP 1.6
Threads: 72 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
GCC 13 ... 2952333333 |======================================
Clang 17 . 4386000000 |========================================================

Liquid-DSP 1.6
Threads: 72 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
GCC 13 ... 1767800000 |=========================================
Clang 17 . 2406400000 |========================================================

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
GCC 13 ... 95931000 |=====================================================
Clang 17 . 102853333 |=========================================================

Liquid-DSP 1.6
Threads: 64 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
GCC 13 ... 191886667 |=====================================================
Clang 17 . 206270000 |=========================================================

Liquid-DSP 1.6
Threads: 72 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
GCC 13 ... 215500000 |=====================================================
Clang 17 . 231370000 |=========================================================

Stress-NG 0.16.04
Test: CPU Cache
Bogo Ops/s > Higher Is Better
GCC 13 ... 949580.78 |=========================================================
Clang 17 . 932492.75 |========================================================

Stress-NG 0.16.04
Test: Matrix Math
Bogo Ops/s > Higher Is Better
GCC 13 ... 515044.15 |=====================================================
Clang 17 . 550915.68 |=========================================================

Stress-NG 0.16.04
Test: Vector Math
Bogo Ops/s > Higher Is Better
GCC 13 ... 387369.41 |=================================================
Clang 17 . 450187.94 |=========================================================

Stress-NG 0.16.04
Test: Floating Point
Bogo Ops/s > Higher Is Better
GCC 13 ... 19830.04 |==========================================================
Clang 17 . 19566.28 |=========================================================

Stress-NG 0.16.04
Test: Vector Shuffle
Bogo Ops/s > Higher Is Better
GCC 13 ... 71014.33 |==========================================================

Stress-NG 0.16.04
Test: Fused Multiply-Add
Bogo Ops/s > Higher Is Better
GCC 13 ... 161511818.72 |======================================================
Clang 17 . 157339813.59 |=====================================================

Stress-NG 0.16.04
Test: Vector Floating Point
Bogo Ops/s > Higher Is Better
GCC 13 ... 83730.08 |==================================
Clang 17 . 141522.24 |=========================================================
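
The results above mix "Higher Is Better" and "Lower Is Better" metrics, so comparing the two compilers means normalizing for direction first. The following is a minimal sketch of one way to do that in Python. The helper names `parse_result` and `clang_advantage` are hypothetical and not part of the Phoronix Test Suite, and expressing the gap as "percent by which Clang is ahead of GCC" is an assumption made for illustration.

```python
import re

# Matches one result line of the text export, such as
# "Clang 17 . 249451.3 |=====". The run of dots after the label is padding.
RESULT_LINE = re.compile(r"^(?P<label>.+?)\s\.+\s+(?P<value>[0-9.]+)\s*\|=*$")


def parse_result(line):
    """Return (label, value) for a result line, or None for any other line."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("label"), float(match.group("value"))


def clang_advantage(gcc, clang, higher_is_better):
    """Percent by which Clang is ahead of GCC (negative when GCC is ahead).

    Assumed convention: for lower-is-better metrics (seconds) the ratio is
    inverted so that a positive number always means Clang performed better.
    """
    ratio = clang / gcc if higher_is_better else gcc / clang
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Values copied from the results above.
    print(parse_result("Clang 17 . 249451.3 |===="))
    # QuantLib 1.32, Multi-Threaded (MFLOPS, higher is better)
    print(round(clang_advantage(232068.2, 249451.3, True), 1))
    # libavif avifenc 1.0, Encoder Speed: 2 (seconds, lower is better)
    print(round(clang_advantage(67.07, 79.37, False), 1))
```

With this convention, a positive number means Clang 17 was faster on that test and a negative number means GCC 13 was faster, regardless of the unit.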