AMD EPYC 9575F HPC Tuning Recommendations

Benchmarks for a future article by Michael Larabel.

Default

Processor: AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores / 128 Threads), Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS), Chipset: AMD 1Ah, Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe

OS: Ubuntu 24.10, Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1024x768

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Python Notes: Python 3.12.7
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

HPC Tuning Recommendations

Processor changed to AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores); the Default run reported 64 Cores / 128 Threads.

Security Change: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
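The two security strings above are long and differ in only one place. As a minimal sketch (the helper names `parse_notes`, `diff_notes`, and `changed_subfields` are hypothetical, and the strings are abbreviated to three entries copied from this page), the difference can be isolated like this:

```python
def parse_notes(notes: str) -> dict[str, str]:
    """Split a 'name: status + name: status' string into a dict."""
    entries = {}
    for item in notes.split(" + "):
        # Only the first ': ' separates the name from its status.
        name, _, status = item.partition(": ")
        entries[name.strip()] = status.strip()
    return entries


def diff_notes(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Return {name: (old_status, new_status)} for entries that differ."""
    old, new = parse_notes(before), parse_notes(after)
    return {
        name: (old.get(name, ""), new.get(name, ""))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


def changed_subfields(old_status: str, new_status: str) -> list[tuple[str, str]]:
    """Compare the '; '-separated parts of one status, position by position."""
    return [
        (a, b)
        for a, b in zip(old_status.split("; "), new_status.split("; "))
        if a != b
    ]


# Abbreviated copies of the Default and HPC Tuning Recommendations strings.
DEFAULT_NOTES = (
    "retbleed: Not affected + "
    "spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; "
    "STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + "
    "srbds: Not affected"
)
TUNED_NOTES = DEFAULT_NOTES.replace("STIBP: always-on", "STIBP: disabled")

if __name__ == "__main__":
    for name, (old, new) in diff_notes(DEFAULT_NOTES, TUNED_NOTES).items():
        print(name, changed_subfields(old, new))
```

On these inputs the only differing entry is spectre_v2, and within it only the STIBP field changes from always-on to disabled.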

Graph500

This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data-intensive workloads and commonly run on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
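Graph500 results are reported in TEPS (traversed edges per second). The following is a toy, single-process sketch of how a breadth-first search can be timed and turned into a TEPS figure. It is not the reference implementation, which runs on MPI over far larger generated graphs. The function names and the six-vertex graph are illustrative assumptions.

```python
import time
from collections import deque


def bfs_parents(adj, root):
    """Breadth-first search; returns {vertex: parent} for reached vertices."""
    parent = {root: root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def count_traversed_edges(adj, parent):
    """Count undirected edges whose endpoints were both reached.

    Each undirected edge appears in two adjacency lists, hence the // 2.
    """
    return sum(1 for u in parent for v in adj[u] if v in parent) // 2


def teps(adj, root):
    """Edges in the traversed component divided by BFS wall time."""
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    elapsed = time.perf_counter() - start
    edges = count_traversed_edges(adj, parent)
    return edges / elapsed if elapsed > 0 else float("inf")


# Hypothetical undirected graph: a 4-cycle plus a separate 2-vertex component.
EXAMPLE_GRAPH = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
    4: [5],
    5: [4],
}

if __name__ == "__main__":
    print(f"{teps(EXAMPLE_GRAPH, 0):.0f} TEPS")
```

Starting from vertex 0, the search reaches four vertices and four edges; the disconnected pair 4-5 does not count toward the total.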

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result, CPU Power Consumption, System Power Consumption]

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 3 results: Result, CPU Power Consumption, System Power Consumption]

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient benchmark, a new scientific benchmark from Sandia National Laboratories intended for supercomputer testing with modern real-world workloads, in contrast to HPCC. Learn more via the OpenBenchmarking.org test page.
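For readers unfamiliar with the method the benchmark is named after, here is a minimal, unpreconditioned conjugate gradient sketch in plain Python. It solves a small 1D Laplacian system and is only an illustration of the iteration. The HPCG benchmark itself works on a much larger 3D problem with a preconditioner, and the function names here are assumptions.

```python
def matvec(x):
    """Apply the 1D Laplacian stencil [-1, 2, -1] without storing a matrix."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    """Solve A x = b for the symmetric positive definite A defined by matvec."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)                        # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                             # direction update
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    x_true = [float(i) for i in range(1, 9)]
    x = conjugate_gradient(matvec(x_true))
    print(max(abs(a - b) for a, b in zip(x, x_true)))
```

The work per iteration is dominated by the sparse matrix-vector product and the dot products, which is why this class of benchmark tends to be limited by memory bandwidth rather than peak floating-point throughput.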

[Graphs for each of 2 results: Result, CPU Power Consumption, System Power Consumption]

ACES DGEMM

[Graphs: Result, CPU Power Consumption, System Power Consumption]

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 4 results: Result, CPU Power Consumption, System Power Consumption]

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 4 results: Result, CPU Power Consumption, System Power Consumption]

Laghos

[Graphs for each of 2 results: Result, CPU Power Consumption, System Power Consumption]

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result, CPU Power Consumption, System Power Consumption]

NAMD

[Graphs: Result, CPU Power Consumption, System Power Consumption]

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result, CPU Power Consumption, System Power Consumption]

Graph500

[Graphs: Result, CPU Power Consumption, System Power Consumption]

CP2K Molecular Dynamics

[Graphs for each of 3 results: Result, CPU Power Consumption, System Power Consumption]

easyWave

The easyWave software allows simulating tsunami generation and propagation in the context of early warning systems. EasyWave supports making use of OpenMP for CPU multi-threading and there are also GPU ports available but not currently incorporated as part of this test profile. The easyWave tsunami generation software is run with one of the example/reference input files for measuring the CPU execution time. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 2 results: Result, CPU Power Consumption, System Power Consumption]

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 5 results: Result, CPU Power Consumption, System Power Consumption]

OpenRadioss

OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss and was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test currently uses a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 6 results: Result, CPU Power Consumption, System Power Consumption]

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 3 results: Result, CPU Power Consumption, System Power Consumption]

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation together with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 2 results: Result, CPU Power Consumption, System Power Consumption]

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

[Graphs: Result, CPU Power Consumption, System Power Consumption]

NWChem

[Graphs: Result, CPU Power Consumption, System Power Consumption]

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code, making use of MPI for this benchmark of the H2O example code. QMCPACK is a production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

[Graphs for each of 6 results: Result, CPU Power Consumption, System Power Consumption]

XNNPACK

[Graphs: Result, CPU Power Consumption, System Power Consumption]

65 Results Shown

Graph500:
  26:
    bfs max_TEPS
    bfs median_TEPS
Algebraic Multi-Grid Benchmark
Quicksilver:
  CORAL2 P1
  CORAL2 P2
  CTS2
High Performance Conjugate Gradient:
  144 144 144 - 60
  160 160 160 - 60
ACES DGEMM
HeFFTe - Highly Efficient FFT for Exascale:
  r2c - FFTW - float - 128
  r2c - FFTW - float - 256
  c2c - FFTW - float - 128
  c2c - FFTW - float - 256
libxsmm:
  32
  64
  128
  256
Laghos:
  Sedov Blast Wave, ube_922_hex.mesh
  Triple Point Problem
GROMACS
NAMD
LAMMPS Molecular Dynamics Simulator
Graph500:
  sssp max_TEPS
  sssp median_TEPS
CP2K Molecular Dynamics:
  Fayalite-FIST
  H20-64
  H20-256
easyWave:
  e2Asean Grid + BengkuluSept2007 Source - 1200
  e2Asean Grid + BengkuluSept2007 Source - 2400
SPECFEM3D:
  Layered Halfspace
  Water-layered Halfspace
  Homogeneous Halfspace
  Mount St. Helens
  Tomographic Model
OpenRadioss:
  Bird Strike on Windshield
  Rubber O-Ring Seal Installation
  Cell Phone Drop Test
  Bumper Beam
  INIVOL and Fluid Structure Interaction Drop Container
  Chrysler Neon 1M
OpenFOAM:
  drivaerFastback, Small Mesh Size - Mesh Time
  drivaerFastback, Small Mesh Size - Execution Time
  drivaerFastback, Medium Mesh Size - Mesh Time
  drivaerFastback, Medium Mesh Size - Execution Time
  drivaerFastback, Large Mesh Size - Mesh Time
  drivaerFastback, Large Mesh Size - Execution Time
Xcompact3d Incompact3d:
  input.i3d 193 Cells Per Direction
  X3D-benchmarking input.i3d
GPAW
NWChem
QMCPACK:
  simple-H2O
  Li2_STO_ae
  FeCO6_b3lyp_gms
  O_ae_pyscf_UHF
  LiH_ae_MSD
  H4_ae
XNNPACK:
  FP32MobileNetV1
  FP32MobileNetV2
  FP32MobileNetV3Large
  FP32MobileNetV3Small
  FP16MobileNetV1
  FP16MobileNetV2
  FP16MobileNetV3Large
  FP16MobileNetV3Small
  QS8MobileNetV2

Default

Testing initiated at 29 November 2024 02:36 by user phoronix.

HPC Tuning Recommendations

Processor: AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores), Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS), Chipset: AMD 1Ah, Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe

Testing initiated at 29 November 2024 12:04 by user phoronix.
