NVIDIA GH200 compiler benchmarks comparing Clang and GCC, by Michael Larabel for a future article. ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and ASPEED graphics on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402098-NE-NVIDIAGH291&grs&rdt .

NVIDIA GH200 Compilers

System details (configurations: GCC 13, Clang 17)

  Processor:          ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard:        Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory:             1 x 480GB DRAM-6400MT/s
  Disk:               960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics:           ASPEED
  Network:            2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS:                 Ubuntu 23.10
  Kernel:             6.8.0-060800rc3daily20240208-generic-64k (aarch64)
  Compiler:           GCC 13.2.0 (GCC 13); Clang 17.0.2 (Clang 17)
  File-System:        ext4
  Screen Resolution:  1920x1200

Kernel Details - Transparent Huge Pages: madvise
Environment Details - CXXFLAGS="-O3 -mtune=neoverse-v2 -mcpu=neoverse-v2" CFLAGS="-O3 -mtune=neoverse-v2 -mcpu=neoverse-v2"
Compiler Details - GCC 13: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Details - Python 3.11.6
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

NVIDIA GH200 Compilers: result overview

Test                                                GCC 13        Clang 17
graphics-magick: Sharpen                            882           1761
stress-ng: Vector Floating Point                    83730.08      141522.24
liquid-dsp: 32 - 256 - 32                           1362833333    2066233333
liquid-dsp: 1 - 256 - 32                            45523000      68921000
liquid-dsp: 64 - 256 - 32                           2671533333    4032833333
liquid-dsp: 72 - 256 - 32                           2952333333    4386000000
graphics-magick: Enhanced                           2170          1542
liquid-dsp: 32 - 256 - 57                           795033333     1096366667
liquid-dsp: 1 - 256 - 57                            26470667      36488333
liquid-dsp: 64 - 256 - 57                           1587900000    2182700000
liquid-dsp: 72 - 256 - 57                           1767800000    2406400000
graphics-magick: Noise-Gaussian                     1920          1440
minibude: OpenMP - BM1 (GFInst/s)                   1193.878      1551.880
minibude: OpenMP - BM1 (Billion Interactions/s)     47.755        62.075
minibude: OpenMP - BM2 (Billion Interactions/s)     48.041        60.979
minibude: OpenMP - BM2 (GFInst/s)                   1201.027      1524.466
helsing: 14 digit                                   68.102        84.332
webp: Quality 100, Highest Compression              3.86          4.66
compress-zstd: 19 - Decompression Speed             1237.4        1031.1
avifenc: 2                                          67.073        79.371
compress-zstd: 19, Long Mode - Decompression Speed  1283.6        1094.4
stress-ng: Vector Math                              387369.41     450187.94
encode-mp3: WAV To MP3                              5.474         6.287
c-ray: Total Time - 4K, 16 Rays Per Pixel           6.006         6.749
avifenc: 0                                          109.674       122.871
webp: Default                                       13.95         15.42
graphics-magick: HWB Color Space                    4731          4360
quantlib: Single-Threaded                           3456.0        3740.4
tscp: AI Chess Performance                          2078407       2248073
webp: Quality 100                                   9.44          10.19
webp: Quality 100, Lossless, Highest Compression    0.52          0.56
liquid-dsp: 64 - 256 - 512                          191886667     206270000
quantlib: Multi-Threaded                            232068.2      249451.3
liquid-dsp: 72 - 256 - 512                          215500000     231370000
liquid-dsp: 32 - 256 - 512                          95931000      102853333
stress-ng: Matrix Math                              515044.15     550915.68
liquid-dsp: 1 - 256 - 512                           3194567       3414867
encode-opus: WAV To Opus Encode                     33.037        31.420
encode-flac: WAV To FLAC                            16.872        16.078
graphics-magick: Rotate                             1764          1820
lammps: 20k Atoms                                   48.233        49.521
stress-ng: Fused Multiply-Add                       161511818.72  157339813.59
webp: Quality 100, Lossless                         1.28          1.31
graphics-magick: Swirl                              3672          3748
lammps: Rhodopsin Protein                           55.360        56.335
graphics-magick: Resizing                           8044          7919
compress-zstd: 19, Long Mode - Compression Speed    8.57          8.70
compress-zstd: 19 - Compression Speed               14.7          14.9
stress-ng: Floating Point                           19830.04      19566.28
primesieve: 1e12                                    2.891         2.929
primesieve: 1e13                                    35.191        35.597
lulesh                                              48090.643     47590.091
avifenc: 6, Lossless                                3.754         3.717
avifenc: 10, Lossless                               2.851         2.871
securemark: SecureMark-TLS                          265718        267498
stress-ng: Vector Shuffle                           71014.33      (no result)
stress-ng: CPU Cache                                949580.78     932492.75
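
The overview mixes "More Is Better" and "Fewer Is Better" metrics, so the raw numbers cannot be compared column against column without first normalizing direction. The following is a minimal Python sketch of one way to do that for a handful of rows copied from the overview; the helper name `clang_vs_gcc`, the choice of rows, and the use of a geometric mean are illustrative assumptions and not part of the exported report.

```python
from math import prod

# (test, GCC 13, Clang 17, higher_is_better) copied from the result overview.
# Only a small illustrative subset is listed here.
results = [
    ("graphics-magick: Sharpen", 882, 1761, True),
    ("stress-ng: Vector Floating Point", 83730.08, 141522.24, True),
    ("graphics-magick: Enhanced", 2170, 1542, True),
    ("helsing: 14 digit", 68.102, 84.332, False),  # seconds, fewer is better
    ("compress-zstd: 19 - Decompression Speed", 1237.4, 1031.1, True),
]


def clang_vs_gcc(gcc, clang, higher_is_better):
    """Return Clang's performance relative to GCC (>1 means Clang is faster)."""
    return clang / gcc if higher_is_better else gcc / clang


ratios = {name: clang_vs_gcc(g, c, hib) for name, g, c, hib in results}

# Geometric mean of the ratios (hypothetical summary over this subset only).
geomean = prod(ratios.values()) ** (1 / len(ratios))

for name, ratio in ratios.items():
    print(f"{name:42s} {ratio:.3f}")
print(f"geometric mean over this subset: {geomean:.3f}")
```

A ratio above 1 means the Clang 17 build was faster on that test, and below 1 means the GCC 13 build was faster. Extending `results` to every row of the overview would give a whole-suite summary, with the Vector Shuffle row left out because it has no Clang 17 result.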

GraphicsMagick 1.3.38 - Operation: Sharpen
Iterations Per Minute, More Is Better
  GCC 13:   882 (SE +/- 1.86, N = 3)
  Clang 17: 1761 (SE +/- 11.15, N = 15)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

Stress-NG 0.16.04 - Test: Vector Floating Point
Bogo Ops/s, More Is Better
  GCC 13:   83730.08 (SE +/- 275.28, N = 3)
  Clang 17: 141522.24 (SE +/- 38.13, N = 3)

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  GCC 13:   1362833333 (SE +/- 1337078.07, N = 3)
  Clang 17: 2066233333 (SE +/- 993870.10, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  GCC 13:   45523000 (SE +/- 30138.57, N = 3)
  Clang 17: 68921000 (SE +/- 6082.76, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  GCC 13:   2671533333 (SE +/- 18636374.23, N = 3)
  Clang 17: 4032833333 (SE +/- 37196788.99, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 72 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  GCC 13:   2952333333 (SE +/- 13903516.74, N = 3)
  Clang 17: 4386000000 (SE +/- 26602506.15, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

GraphicsMagick 1.3.38 - Operation: Enhanced
Iterations Per Minute, More Is Better
  GCC 13:   2170 (SE +/- 11.57, N = 3)
  Clang 17: 1542 (SE +/- 14.05, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  GCC 13:   795033333 (SE +/- 153441.99, N = 3)
  Clang 17: 1096366667 (SE +/- 120185.04, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  GCC 13:   26470667 (SE +/- 14240.01, N = 3)
  Clang 17: 36488333 (SE +/- 1666.67, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  GCC 13:   1587900000 (SE +/- 57735.03, N = 3)
  Clang 17: 2182700000 (SE +/- 4628534.69, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 72 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  GCC 13:   1767800000 (SE +/- 5356304.70, N = 3)
  Clang 17: 2406400000 (SE +/- 8154140.05, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

GraphicsMagick 1.3.38 - Operation: Noise-Gaussian
Iterations Per Minute, More Is Better
  GCC 13:   1920 (SE +/- 0.67, N = 3)
  Clang 17: 1440 (SE +/- 4.67, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1
GFInst/s, More Is Better
  GCC 13:   1193.88 (SE +/- 2.51, N = 3)
  Clang 17: 1551.88 (SE +/- 1.53, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -mcpu=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1
Billion Interactions/s, More Is Better
  GCC 13:   47.76 (SE +/- 0.10, N = 3)
  Clang 17: 62.08 (SE +/- 0.06, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -mcpu=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
Billion Interactions/s, More Is Better
  GCC 13:   48.04 (SE +/- 0.09, N = 3)
  Clang 17: 60.98 (SE +/- 0.68, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -mcpu=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
GFInst/s, More Is Better
  GCC 13:   1201.03 (SE +/- 2.12, N = 3)
  Clang 17: 1524.47 (SE +/- 16.91, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -mcpu=native -lm

Helsing 1.0-beta - Digit Range: 14 digit
Seconds, Fewer Is Better
  GCC 13:   68.10 (SE +/- 0.40, N = 15)
  Clang 17: 84.33 (SE +/- 0.72, N = 3)
  (CC) gcc options: -O2 -pthread

WebP Image Encode 1.2.4 - Encode Settings: Quality 100, Highest Compression
MP/s, More Is Better
  GCC 13:   3.86 (SE +/- 0.00, N = 3)
  Clang 17: 4.66 (SE +/- 0.00, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

Zstd Compression 1.5.4 - Compression Level: 19 - Decompression Speed
MB/s, More Is Better
  GCC 13:   1237.4 (SE +/- 13.01, N = 3)
  Clang 17: 1031.1 (SE +/- 4.91, N = 3)
  Additional option listed: -Qunused-arguments
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lz -llzma

libavif avifenc 1.0 - Encoder Speed: 2
Seconds, Fewer Is Better
  GCC 13:   67.07 (SE +/- 0.03, N = 3)
  Clang 17: 79.37 (SE +/- 0.27, N = 3)
  (CXX) g++ options: -O3 -fPIC -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

Zstd Compression 1.5.4 - Compression Level: 19, Long Mode - Decompression Speed
MB/s, More Is Better
  GCC 13:   1283.6 (SE +/- 3.15, N = 3)
  Clang 17: 1094.4 (SE +/- 2.48, N = 3)
  Additional option listed: -Qunused-arguments
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lz -llzma

Stress-NG 0.16.04 - Test: Vector Math
Bogo Ops/s, More Is Better
  GCC 13:   387369.41 (SE +/- 25.00, N = 3)
  Clang 17: 450187.94 (SE +/- 100.06, N = 3)

LAME MP3 Encoding 3.100 - WAV To MP3
Seconds, Fewer Is Better
  GCC 13:   5.474 (SE +/- 0.003, N = 3)
  Clang 17: 6.287 (SE +/- 0.001, N = 3)
  Additional options listed: -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -lncurses
  (CC) gcc options: -O3 -pipe -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel
Seconds, Fewer Is Better
  GCC 13:   6.006 (SE +/- 0.003, N = 3)
  Clang 17: 6.749 (SE +/- 0.009, N = 3)
  (CC) gcc options: -lm -lpthread -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2

libavif avifenc 1.0 - Encoder Speed: 0
Seconds, Fewer Is Better
  GCC 13:   109.67 (SE +/- 0.43, N = 3)
  Clang 17: 122.87 (SE +/- 0.34, N = 3)
  (CXX) g++ options: -O3 -fPIC -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

WebP Image Encode 1.2.4 - Encode Settings: Default
MP/s, More Is Better
  GCC 13:   13.95 (SE +/- 0.03, N = 3)
  Clang 17: 15.42 (SE +/- 0.01, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

GraphicsMagick 1.3.38 - Operation: HWB Color Space
Iterations Per Minute, More Is Better
  GCC 13:   4731 (SE +/- 41.68, N = 3)
  Clang 17: 4360 (SE +/- 16.18, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

QuantLib 1.32 - Configuration: Single-Threaded
MFLOPS, More Is Better
  GCC 13:   3456.0 (SE +/- 35.85, N = 3)
  Clang 17: 3740.4 (SE +/- 37.89, N = 6)
  (CXX) g++ options: -O3 -march=native -fPIE -pie

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  GCC 13:   2078407 (SE +/- 0.00, N = 5)
  Clang 17: 2248073 (SE +/- 0.00, N = 5)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -march=native

WebP Image Encode 1.2.4 - Encode Settings: Quality 100
MP/s, More Is Better
  GCC 13:   9.44 (SE +/- 0.01, N = 3)
  Clang 17: 10.19 (SE +/- 0.00, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

WebP Image Encode 1.2.4 - Encode Settings: Quality 100, Lossless, Highest Compression
MP/s, More Is Better
  GCC 13:   0.52 (SE +/- 0.00, N = 3)
  Clang 17: 0.56 (SE +/- 0.00, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

Liquid-DSP 1.6 - Threads: 64 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  GCC 13:   191886667 (SE +/- 67659.28, N = 3)
  Clang 17: 206270000 (SE +/- 37859.39, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

QuantLib 1.32 - Configuration: Multi-Threaded
MFLOPS, More Is Better
  GCC 13:   232068.2 (SE +/- 3083.44, N = 3)
  Clang 17: 249451.3 (SE +/- 2974.17, N = 3)
  (CXX) g++ options: -O3 -march=native -fPIE -pie

Liquid-DSP 1.6 - Threads: 72 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  GCC 13:   215500000 (SE +/- 79372.54, N = 3)
  Clang 17: 231370000 (SE +/- 91651.51, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  GCC 13:   95931000 (SE +/- 129816.54, N = 3)
  Clang 17: 102853333 (SE +/- 196751.39, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Stress-NG 0.16.04 - Test: Matrix Math
Bogo Ops/s, More Is Better
  GCC 13:   515044.15 (SE +/- 2880.52, N = 3)
  Clang 17: 550915.68 (SE +/- 356.30, N = 3)

Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  GCC 13:   3194567 (SE +/- 1117.04, N = 3)
  Clang 17: 3414867 (SE +/- 2635.86, N = 3)
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lm -lc -lliquid

Opus Codec Encoding 1.4 - WAV To Opus Encode
Seconds, Fewer Is Better
  GCC 13:   33.04 (SE +/- 0.02, N = 5)
  Clang 17: 31.42 (SE +/- 0.01, N = 5)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -fvisibility=hidden -logg -lm

FLAC Audio Encoding 1.4 - WAV To FLAC
Seconds, Fewer Is Better
  GCC 13:   16.87 (SE +/- 0.19, N = 5)
  Clang 17: 16.08 (SE +/- 0.13, N = 9)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -fvisibility=hidden -logg -lm

GraphicsMagick 1.3.38 - Operation: Rotate
Iterations Per Minute, More Is Better
  GCC 13:   1764 (SE +/- 9.35, N = 3)
  Clang 17: 1820 (SE +/- 25.64, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

LAMMPS Molecular Dynamics Simulator 23Jun2022 - Model: 20k Atoms
ns/day, More Is Better
  GCC 13:   48.23 (SE +/- 0.15, N = 3)
  Clang 17: 49.52 (SE +/- 0.56, N = 3)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm -ldl

Stress-NG 0.16.04 - Test: Fused Multiply-Add
Bogo Ops/s, More Is Better
  GCC 13:   161511818.72 (SE +/- 1745267.88, N = 3)
  Clang 17: 157339813.59 (SE +/- 1087775.35, N = 15)

WebP Image Encode 1.2.4 - Encode Settings: Quality 100, Lossless
MP/s, More Is Better
  GCC 13:   1.28 (SE +/- 0.00, N = 3)
  Clang 17: 1.31 (SE +/- 0.00, N = 3)
  (CC) gcc options: -fvisibility=hidden -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

GraphicsMagick 1.3.38 - Operation: Swirl
Iterations Per Minute, More Is Better
  GCC 13:   3672 (SE +/- 5.24, N = 3)
  Clang 17: 3748 (SE +/- 48.77, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

LAMMPS Molecular Dynamics Simulator 23Jun2022 - Model: Rhodopsin Protein
ns/day, More Is Better
  GCC 13:   55.36 (SE +/- 0.12, N = 3)
  Clang 17: 56.34 (SE +/- 0.11, N = 3)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm -ldl

GraphicsMagick 1.3.38 - Operation: Resizing
Iterations Per Minute, More Is Better
  GCC 13:   8044 (SE +/- 39.89, N = 3)
  Clang 17: 7919 (SE +/- 70.92, N = 3)
  (CC) gcc options: -fopenmp -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -ljbig -lwebp -lwebpmux -ltiff -lfreetype -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lxml2 -lz -lzstd -lm -lpthread

Zstd Compression 1.5.4 - Compression Level: 19, Long Mode - Compression Speed
MB/s, More Is Better
  GCC 13:   8.57 (SE +/- 0.00, N = 3)
  Clang 17: 8.70 (SE +/- 0.00, N = 3)
  Additional option listed: -Qunused-arguments
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lz -llzma

Zstd Compression 1.5.4 - Compression Level: 19 - Compression Speed
MB/s, More Is Better
  GCC 13:   14.7 (SE +/- 0.12, N = 3)
  Clang 17: 14.9 (SE +/- 0.17, N = 3)
  Additional option listed: -Qunused-arguments
  (CC) gcc options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -pthread -lz -llzma

Stress-NG 0.16.04 - Test: Floating Point
Bogo Ops/s, More Is Better
  GCC 13:   19830.04 (SE +/- 0.79, N = 3)
  Clang 17: 19566.28 (SE +/- 4.32, N = 3)

Primesieve 8.0 - Length: 1e12
Seconds, Fewer Is Better
  GCC 13:   2.891 (SE +/- 0.002, N = 3)
  Clang 17: 2.929 (SE +/- 0.001, N = 3)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2

Primesieve 8.0 - Length: 1e13
Seconds, Fewer Is Better
  GCC 13:   35.19 (SE +/- 0.42, N = 3)
  Clang 17: 35.60 (SE +/- 0.34, N = 3)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2

LULESH 2.0.3
z/s, More Is Better
  GCC 13:   48090.64 (SE +/- 108.54, N = 3)
  Clang 17: 47590.09 (SE +/- 108.91, N = 3)
  (CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

libavif avifenc 1.0 - Encoder Speed: 6, Lossless
Seconds, Fewer Is Better
  GCC 13:   3.754 (SE +/- 0.007, N = 3)
  Clang 17: 3.717 (SE +/- 0.008, N = 3)
  (CXX) g++ options: -O3 -fPIC -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

libavif avifenc 1.0 - Encoder Speed: 10, Lossless
Seconds, Fewer Is Better
  GCC 13:   2.851 (SE +/- 0.006, N = 3)
  Clang 17: 2.871 (SE +/- 0.003, N = 3)
  (CXX) g++ options: -O3 -fPIC -mtune=neoverse-v2 -mcpu=neoverse-v2 -lm

SecureMark 1.0.4 - Benchmark: SecureMark-TLS
marks, More Is Better
  GCC 13:   265718 (SE +/- 525.41, N = 3)
  Clang 17: 267498 (SE +/- 54.60, N = 3)
  (CC) gcc options: -pedantic -O3

Stress-NG 0.16.04 - Test: Vector Shuffle
Bogo Ops/s, More Is Better
  GCC 13:   71014.33 (SE +/- 173.05, N = 3)
  (CXX) g++ options: -O3 -mtune=neoverse-v2 -mcpu=neoverse-v2 -std=gnu99 -U_FORTIFY_SOURCE -O2 -lc

Stress-NG 0.16.04 - Test: CPU Cache
Bogo Ops/s, More Is Better
  GCC 13:   949580.78 (SE +/- 34515.94, N = 15)
  Clang 17: 932492.75 (SE +/- 44765.07, N = 15)

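
Several results near the end of the list differ by less than their reported standard errors, so the raw means alone do not say which compiler is ahead. As a rough sketch, the gap between two means can be expressed in units of their combined standard error. The helper `separation`, the cutoff `THRESHOLD`, and the pairing of each SE with a compiler by listing order are assumptions made for illustration and are not part of the exported report.

```python
from math import sqrt


def separation(mean_a, se_a, mean_b, se_b):
    """Difference of two means in units of their combined standard error."""
    return abs(mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


# (GCC 13 mean, SE, Clang 17 mean, SE) copied from the per-test results.
# Assumption: the first SE listed belongs to GCC 13, the second to Clang 17.
tests = {
    "Stress-NG CPU Cache": (949580.78, 34515.94, 932492.75, 44765.07),
    "GraphicsMagick Sharpen": (882, 1.86, 1761, 11.15),
}

THRESHOLD = 2.0  # hypothetical rule-of-thumb cutoff, not taken from the report

for name, (gcc, se_gcc, clang, se_clang) in tests.items():
    z = separation(gcc, se_gcc, clang, se_clang)
    verdict = "clear difference" if z > THRESHOLD else "within run-to-run noise"
    print(f"{name}: {z:.2f} combined SEs apart ({verdict})")
```

Under these assumptions the CPU Cache results sit well under one combined standard error apart, while the Sharpen results are separated by many, which matches the visual impression from the numbers themselves.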
Phoronix Test Suite v10.8.5