GCC Clang Compiler Benchmarks Zen 4 Threadripper

GCC and Clang compiler benchmarks by Michael Larabel for a future year-end 2023 article.

GCC 13.2

Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads), Motherboard: HP 8B24 (U65 Ver. 01.01.04 BIOS), Chipset: AMD Device 14a4, Memory: 128GB, Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1, Graphics: NVIDIA RTX A4000 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: ASUS VP28U, Network: Realtek RTL8111/8168/8411

OS: Ubuntu 23.10, Kernel: 6.5.0-14-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 3840x2160

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa108105
OpenCL Notes: GPU Compute Cores: 6144
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Clang 17.0.2

OS: Ubuntu 23.10, Kernel: 6.5.0-14-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: Clang 17.0.2, File-System: ext4, Screen Resolution: 3840x2160

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Processor Notes: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa108105
OpenCL Notes: GPU Compute Cores: 6144
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Clang 18 23 Dec

OS: Ubuntu 23.10, Kernel: 6.5.0-14-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: Clang 18.0.0, File-System: ext4, Screen Resolution: 3840x2160

GCC 14 23 Dec

OS: Ubuntu 23.10, Kernel: 6.5.0-14-generic (x86_64), Desktop: GNOME Shell 45.0, Display Server: X Server 1.21.1.7, Display Driver: NVIDIA 535.129.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.2.147, Compiler: GCC 14.0.0 20231224, File-System: ext4, Screen Resolution: 3840x2160

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Notes: --disable-multilib --enable-checking=release --enable-languages=c,c++
Processor Notes: Scaling Governor: amd-pstate-epp performance (EPP: performance) - CPU Microcode: 0xa108105
OpenCL Notes: GPU Compute Cores: 6144
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
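Each Security Notes line above joins one entry per vulnerability with " + ", and separates the vulnerability name from its status with the first ": ". A minimal Python sketch for turning such a line into a dictionary follows; the function name parse_security_notes is hypothetical and not part of any benchmarking tool, and the sample is a shortened excerpt of the line above.

```python
def parse_security_notes(line: str) -> dict[str, str]:
    """Split a 'name: status + name: status' line into a dict."""
    prefix = "Security Notes:"
    if line.startswith(prefix):
        line = line[len(prefix):]
    entries = {}
    for item in line.split(" + "):
        # Split on the first ': ' only; statuses may contain further colons.
        name, _, status = item.strip().partition(": ")
        entries[name] = status
    return entries


# Shortened excerpt of the Security Notes line, for illustration only.
sample = (
    "Security Notes: retbleed: Not affected + "
    "spec_rstack_overflow: Mitigation of safe RET + "
    "spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on"
)
notes = parse_security_notes(sample)
mitigated = sorted(name for name, status in notes.items() if status.startswith("Mitigation"))
print(mitigated)
```

Splitting on the first ": " keeps multi-part statuses such as the spectre_v2 entry intact.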

ASTC Encoder

ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile runs a coding test covering both compression and decompression. Learn more via the OpenBenchmarking.org test page.

Preset: Medium

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./astcenc: 2: ./astc-encoder-4.0.0/build/Source/astcenc-avx2: not found

Preset: Thorough

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./astcenc: 2: ./astc-encoder-4.0.0/build/Source/astcenc-avx2: not found

Preset: Exhaustive

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./astcenc: 2: ./astc-encoder-4.0.0/build/Source/astcenc-avx2: not found

C-Blosc

C-Blosc (c-blosc2) is a simple, compressed, fast, and persistent data store library for C that focuses on compression of binary data. Learn more via the OpenBenchmarking.org test page.

C-Ray

This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. This test is multi-threaded (16 threads per core), shoots 8 rays per pixel for anti-aliasing, and generates a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
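For a rough sense of scale, the image size and anti-aliasing setting in the description translate into the ray counts below. This is a small arithmetic sketch that assumes exactly one primary ray per anti-aliasing sample and ignores any secondary (reflection or shadow) rays the renderer may cast.

```python
# Figures taken from the test description above.
width, height = 1600, 1200
rays_per_pixel = 8

pixels = width * height
# Assumption: one primary ray per anti-aliasing sample.
primary_rays = pixels * rays_per_pixel

print(f"{pixels:,} pixels, {primary_rays:,} primary rays")
# → 1,920,000 pixels, 15,360,000 primary rays
```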

Coremark

This is a test of the EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.

Crypto++

Crypto++ is a C++ class library of cryptographic algorithms. Learn more via the OpenBenchmarking.org test page.

Test: Keyed Algorithms

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./cryptest.exe: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./cryptest.exe)

Test: Unkeyed Algorithms

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./cryptest.exe: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./cryptest.exe)
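The CXXABI_1.3.15 errors reported for the GCC 14 build say that the libstdc++.so.6 found by the dynamic loader does not define that version tag. One plausible reading, which is an inference from the message rather than something the results state, is that the binary was linked against the newer libstdc++ from the GCC 14 build while the loader picked up the system copy at run time. A minimal Python sketch for checking whether a given library file carries the tag follows; the helper name defines_version is hypothetical, and scanning raw bytes is a crude substitute for reading the ELF version definitions properly.

```python
from pathlib import Path


def defines_version(data: bytes, tag: str) -> bool:
    """Return True if `data` contains `tag` as a NUL-terminated string."""
    # ELF string tables store names NUL-terminated, so appending the
    # terminator avoids matching a longer tag with the same prefix.
    return (tag.encode("ascii") + b"\0") in data


if __name__ == "__main__":
    # Path taken from the error message above.
    lib = Path("/lib/x86_64-linux-gnu/libstdc++.so.6")
    if lib.exists():
        found = defines_version(lib.read_bytes(), "CXXABI_1.3.15")
        print(f"{lib}: CXXABI_1.3.15 {'present' if found else 'missing'}")
```

If the tag is missing from the system library but present in the libstdc++ shipped with the newer compiler build, the run-time library search path is the likely place to look.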

FLAC Audio Encoding

This test times how long it takes to encode a sample WAV file to FLAC audio format ten times using the --best preset settings. Learn more via the OpenBenchmarking.org test page.

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

GraphicsMagick

This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.

GROMACS

This is a test of the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package using the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

Implementation: MPI CPU - Input: water_GMX50_bare

GCC 14 23 Dec: The test quit with a non-zero exit status. E: /mpi-build/bin/gmx_mpi: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by mpi-build/bin/../lib/libgromacs_mpi.so.8)

John The Ripper

This is a benchmark of John The Ripper, which is a password cracker. Learn more via the OpenBenchmarking.org test page.

Kvazaar

This is a test of Kvazaar as a CPU-based H.265/HEVC video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.

LAME MP3 Encoding

LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

Model: 20k Atoms

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ../b/lmp: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ../b/lmp)

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: Eigen

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./lc0: /lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by ./lc0)

libavif avifenc

This is a test of the AOMedia libavif library testing the encoding of a JPEG image to AV1 Image Format (AVIF). Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

LZ4 Compression

This test measures the time needed to compress/decompress a sample file (an Ubuntu ISO) using LZ4 compression. Learn more via the OpenBenchmarking.org test page.

Memcached

Memcached is a high-performance, distributed memory object caching system. This Memcached test profile makes use of memtier_benchmark for executing this CPU/memory-focused server benchmark. Learn more via the OpenBenchmarking.org test page.

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenJPEG

OpenJPEG is an open-source JPEG 2000 codec written in the C programming language. The default input for this test profile is the NASA/JPL-Caltech/MSSS Curiosity panorama, a 717MB TIFF image file that is converted to JPEG 2000 format. Learn more via the OpenBenchmarking.org test page.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.

OpenVINO

This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support to analyze the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

Opus Codec Encoding

Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus five times. Learn more via the OpenBenchmarking.org test page.

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

Test: Streams

GCC 14 23 Dec: The test run did not produce a result. E: /usr/bin/ld: petsc-3.19.0/arch-linux-c-opt/lib/libpetsc.so: undefined reference to `__cxa_call_terminate'

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench to run the database benchmarks. Learn more via the OpenBenchmarking.org test page.

POV-Ray

This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. Learn more via the OpenBenchmarking.org test page.

Trace Time

GCC 14 23 Dec: The test quit with a non-zero exit status. E: ./povray: 3: ./unix/povray: not found

QuantLib

QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost, and its built-in benchmark reports the QuantLib Benchmark Index score. Learn more via the OpenBenchmarking.org test page.

Configuration: Multi-Threaded

Clang 18 23 Dec: The test quit with a non-zero exit status. E: ./quantlib: line 3: ./test-suite/quantlib-benchmark: No such file or directory

Configuration: Single-Threaded

Clang 18 23 Dec: The test quit with a non-zero exit status. E: ./quantlib: line 3: ./test-suite/quantlib-benchmark: No such file or directory

Redis

Redis is an open-source in-memory data structure store, used as a database, cache, and message broker. Learn more via the OpenBenchmarking.org test page.

SecureMark

SecureMark is an objective, standardized benchmarking framework for measuring the efficiency of cryptographic processing solutions developed by EEMBC. SecureMark-TLS is benchmarking Transport Layer Security performance with a focus on IoT/edge computing. Learn more via the OpenBenchmarking.org test page.

simdjson

This is a benchmark of SIMDJSON, a high-performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

Throughput Test: Kostya