AMD EPYC 9575F HPC Tuning Guide

Benchmarks for a future article by Michael Larabel.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411294-NE-AMDEPYC9542

Run Management

Result Identifier | Date | Test Run Duration
Default | November 29 | 10 Hours, 49 Minutes
HPC Tuning Recommendations | November 29 | 12 Hours, 25 Minutes
Average test run duration: 11 Hours, 37 Minutes


AMD EPYC 9575F HPC Tuning Guide Benchmarks - System Details

Processors: AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores / 128 Threads), AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Logs

- Transparent Huge Pages: madvise
- GCC configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security, Default: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- Security, HPC Tuning Recommendations: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

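Default vs. HPC Tuning Recommendations Comparison

Benchmark | Test | Faster Configuration | Margin
XNNPACK | QS8MobileNetV2 | HPC Tuning Recommendations | 92.3%
XNNPACK | FP16MobileNetV3Small | HPC Tuning Recommendations | 91.7%
XNNPACK | FP32MobileNetV3Small | HPC Tuning Recommendations | 83.5%
XNNPACK | FP16MobileNetV2 | HPC Tuning Recommendations | 83.4%
XNNPACK | FP32MobileNetV3Large | HPC Tuning Recommendations | 83.3%
XNNPACK | FP16MobileNetV3Large | HPC Tuning Recommendations | 82.7%
XNNPACK | FP32MobileNetV2 | HPC Tuning Recommendations | 79.5%
XNNPACK | FP16MobileNetV1 | HPC Tuning Recommendations | 65.7%
XNNPACK | FP32MobileNetV1 | HPC Tuning Recommendations | 44.1%
Xcompact3d Incompact3d | X3D-benchmarking input.i3d | HPC Tuning Recommendations | 32.3%
OpenFOAM | drivaerFastback, Large Mesh Size - Execution Time | HPC Tuning Recommendations | 30.4%
OpenFOAM | drivaerFastback, Medium Mesh Size - Execution Time | HPC Tuning Recommendations | 29.8%
libxsmm | 256 | HPC Tuning Recommendations | 27.9%
OpenRadioss | Bumper Beam | HPC Tuning Recommendations | 25.3%
High Performance Conjugate Gradient | 144 144 144 - 60 | HPC Tuning Recommendations | 24.1%
High Performance Conjugate Gradient | 160 160 160 - 60 | HPC Tuning Recommendations | 21.7%
OpenFOAM | drivaerFastback, Small Mesh Size - Mesh Time | HPC Tuning Recommendations | 18.8%
HeFFTe - Highly Efficient FFT for Exascale | c2c - FFTW - float - 256 | HPC Tuning Recommendations | 18%
OpenFOAM | drivaerFastback, Large Mesh Size - Mesh Time | HPC Tuning Recommendations | 16.6%
OpenRadioss | Chrysler Neon 1M | HPC Tuning Recommendations | 13.5%
Xcompact3d Incompact3d | input.i3d 193 Cells Per Direction | HPC Tuning Recommendations | 13.5%
HeFFTe - Highly Efficient FFT for Exascale | r2c - FFTW - float - 256 | HPC Tuning Recommendations | 9.6%
Graph500 | 26 (bfs max_TEPS) | HPC Tuning Recommendations | 9.5%
Graph500 | 26 (bfs median_TEPS) | HPC Tuning Recommendations | 8.8%
Graph500 | 26 (sssp max_TEPS) | HPC Tuning Recommendations | 7.8%
SPECFEM3D | Tomographic Model | HPC Tuning Recommendations | 6.4%
OpenFOAM | drivaerFastback, Medium Mesh Size - Mesh Time | HPC Tuning Recommendations | 5.9%
SPECFEM3D | Homogeneous Halfspace | HPC Tuning Recommendations | 5.8%
OpenRadioss | Bird Strike on Windshield | HPC Tuning Recommendations | 5.7%
Graph500 | 26 (sssp median_TEPS) | HPC Tuning Recommendations | 5.4%
HeFFTe - Highly Efficient FFT for Exascale | c2c - FFTW - float - 128 | HPC Tuning Recommendations | 5.2%
OpenRadioss | INIVOL and Fluid Structure Interaction Drop Container | Default | 4%
NWChem | C240 Buckyball | Default | 3.9%
easyWave | e2Asean Grid + BengkuluSept2007 Source - 1200 | Default | 2.2%
GROMACS | MPI CPU - water_GMX50_bare | Default | 2.1%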
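The percentages in this comparison appear to be the ratio of the two results minus one, taking each test's "Fewer Is Better" or "More Is Better" direction into account. The sketch below reproduces that arithmetic for two results from this file; the function name `relative_advantage` and its signature are hypothetical and are not part of the Phoronix Test Suite.

```python
def relative_advantage(default: float, tuned: float, fewer_is_better: bool) -> tuple[str, float]:
    """Return (faster configuration, margin in percent) for one result pair.

    Hypothetical helper: it assumes the margin is (larger / smaller - 1) * 100,
    which matches the percentages listed in the comparison.
    """
    if fewer_is_better:
        # Times and latencies: the smaller value wins.
        winner = "HPC Tuning Recommendations" if tuned < default else "Default"
    else:
        # Throughput-style metrics: the larger value wins.
        winner = "HPC Tuning Recommendations" if tuned > default else "Default"
    larger, smaller = max(default, tuned), min(default, tuned)
    return winner, (larger / smaller - 1.0) * 100.0


# XNNPACK QS8MobileNetV2, microseconds, fewer is better.
xnn_winner, xnn_margin = relative_advantage(default=5032, tuned=2617, fewer_is_better=True)

# GROMACS water_GMX50_bare, ns per day, more is better.
gmx_winner, gmx_margin = relative_advantage(default=14.53, tuned=14.23, fewer_is_better=False)

print(xnn_winner, round(xnn_margin, 1))
print(gmx_winner, round(gmx_margin, 1))
```

Using the larger value as the numerator keeps the margin positive regardless of which configuration wins, so the winner has to be reported alongside it.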

AMD EPYC 9575F HPC Tuning Guide - Result Table

Test | Default | HPC Tuning Recommendations
xnnpack: FP16MobileNetV3Small | 5449 | 2842
xnnpack: FP32MobileNetV3Large | 7258 | 3959
xnnpack: FP16MobileNetV3Large | 7046 | 3856
xnnpack: FP32MobileNetV2 | 4758 | 2651
openfoam: drivaerFastback, Large Mesh Size - Execution Time | 8380.5323 | 6426.0644
openfoam: drivaerFastback, Medium Mesh Size - Execution Time | 283.1187 | 218.12775
libxsmm: 256 | 2422.7 | 3098.9
openradioss: Bumper Beam | 66.09 | 52.73
hpcg: 160 160 160 - 60 | 40.3481 | 49.0908
openfoam: drivaerFastback, Small Mesh Size - Mesh Time | 18.974323 | 15.967165
heffte: c2c - FFTW - float - 256 | 178.295 | 210.419
openfoam: drivaerFastback, Large Mesh Size - Mesh Time | 603.69492 | 517.58403
openradioss: Chrysler Neon 1M | 125.14 | 110.25
incompact3d: input.i3d 193 Cells Per Direction | 7.95593405 | 7.01252317
heffte: r2c - FFTW - float - 256 | 374.902 | 411.072
graph500: 26 (bfs max_TEPS) | 1335290000 | 1461890000
graph500: 26 (bfs median_TEPS) | 1305560000 | 1420090000
graph500: 26 (sssp max_TEPS) | 600153000 | 646942000
specfem3d: Tomographic Model | 7.310008888 | 6.872884974
openfoam: drivaerFastback, Medium Mesh Size - Mesh Time | 84.119326 | 79.468032
specfem3d: Homogeneous Halfspace | 9.007696830 | 8.514378931
openradioss: Bird Strike on Windshield | 82.77 | 78.27
graph500: 26 (sssp median_TEPS) | 470378000 | 495711000
openradioss: INIVOL and Fluid Structure Interaction Drop Container | 90.69 | 94.31
nwchem: C240 Buckyball | 1249.1 | 1298.2
easywave: e2Asean Grid + BengkuluSept2007 Source - 1200 | 20.814 | 21.265
gromacs: MPI CPU - water_GMX50_bare | 14.533 | 14.229
openradioss: Cell Phone Drop Test | 17.76 | 18.07
easywave: e2Asean Grid + BengkuluSept2007 Source - 2400 | 51.643 | 52.469
cp2k: H20-256 | 137.802 | 135.945
openfoam: drivaerFastback, Small Mesh Size - Execution Time | 24.579344 | 24.256714
openradioss: Rubber O-Ring Seal Installation | 38.44 | 38.33
heffte: r2c - FFTW - float - 128 | 332.013 | 332.913
gpaw: Carbon Nanotube | 27.857 | 27.837
xnnpack: QS8MobileNetV2 | 5032 | 2617
xnnpack: FP16MobileNetV2 | 4681 | 2553
xnnpack: FP16MobileNetV1 | 2467 | 1489
xnnpack: FP32MobileNetV3Small | 5398 | 2941
xnnpack: FP32MobileNetV1 | 2429 | 1686
heffte: c2c - FFTW - float - 128 | 223.464 | 235.079
incompact3d: X3D-benchmarking input.i3d | 325.052568 | 245.679475
hpcg: 144 144 144 - 60 | 41.0940 | 51.0171
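A single summary figure for a table like this is usually a geometric mean of per-test ratios rather than an arithmetic mean, because the tests use different units and scales. The following is a minimal sketch of that calculation over four results from this file. The helper `geometric_mean` and the choice of tests are illustrative assumptions, and the result is not the overall figure OpenBenchmarking.org would report for the full set.

```python
import math


def geometric_mean(ratios):
    """Geometric mean via the mean of logarithms (all ratios must be > 0)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Speedup of "HPC Tuning Recommendations" over "Default" for a few tests.
# Fewer-is-better results use default / tuned; more-is-better use tuned / default,
# so a value above 1.0 always means the tuned configuration was faster.
speedups = [
    5449 / 2842,            # xnnpack FP16MobileNetV3Small (us, fewer is better)
    8380.5323 / 6426.0644,  # openfoam Large Mesh execution time (s, fewer is better)
    3098.9 / 2422.7,        # libxsmm 256 (GFLOPS/s, more is better)
    14.229 / 14.533,        # gromacs water_GMX50_bare (ns/day, more is better)
]

print(f"geometric mean speedup over {len(speedups)} tests: {geometric_mean(speedups):.3f}")
```

Normalizing every ratio to "above 1.0 is better" before averaging avoids mixing time-based and throughput-based results with opposite signs.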

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV3Small (us, Fewer Is Better)
HPC Tuning Recommendations: 2842 (SE +/- 15.58, N = 10)
Default: 5449 (SE +/- 65.16, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV3Large (us, Fewer Is Better)
HPC Tuning Recommendations: 3959 (SE +/- 23.40, N = 10)
Default: 7258 (SE +/- 43.97, N = 3)

XNNPACK b7b048 - Model: FP16MobileNetV3Large (us, Fewer Is Better)
HPC Tuning Recommendations: 3856 (SE +/- 55.41, N = 10)
Default: 7046 (SE +/- 11.89, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV2 (us, Fewer Is Better)
HPC Tuning Recommendations: 2651 (SE +/- 49.78, N = 10)
Default: 4758 (SE +/- 24.31, N = 3)

1. (CXX) g++ options: -O3 -lrt -lm
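Each result carries an "SE +/- x, N = n" annotation. Assuming SE is the standard error of the mean over N runs, the run-to-run standard deviation and a rough interval around the mean can be recovered as sketched below. Both helper names are hypothetical, and the 1.96 multiplier is a normal approximation that is only indicative for sample sizes as small as N = 3.

```python
import math


def sample_std_from_se(se: float, n: int) -> float:
    """Recover the sample standard deviation, assuming SE = s / sqrt(N)."""
    return se * math.sqrt(n)


def approx_95_interval(mean: float, se: float) -> tuple[float, float]:
    """Rough 95% interval under a normal approximation (an assumption)."""
    half_width = 1.96 * se
    return mean - half_width, mean + half_width


# XNNPACK FP16MobileNetV3Small, HPC Tuning Recommendations: 2842 us, SE 15.58, N = 10.
std_tuned = sample_std_from_se(15.58, 10)
low, high = approx_95_interval(2842, 15.58)

print(f"std dev ~ {std_tuned:.2f} us, interval ~ [{low:.1f}, {high:.1f}] us")
```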

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Execution Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 6426.06
Default: 8380.53

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 218.13
Default: 283.12

1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 256 (GFLOPS/s, More Is Better)
HPC Tuning Recommendations: 3098.9 (SE +/- 23.61, N = 3)
Default: 2422.7 (SE +/- 4.19, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

OpenRadioss

OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis. OpenRadioss is based on Altair Radioss and was open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.

OpenRadioss 2023.09.15 - Model: Bumper Beam (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 52.73 (SE +/- 0.19, N = 3)
Default: 66.09 (SE +/- 0.61, N = 15)

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a newer scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 49.09 (SE +/- 0.11, N = 3)
Default: 40.35 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Mesh Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 15.97
Default: 18.97
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 210.42 (SE +/- 0.34, N = 13)
Default: 178.30 (SE +/- 2.31, N = 15)
1. (CXX) g++ options: -O3

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Mesh Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 517.58
Default: 603.69
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenRadioss

OpenRadioss 2023.09.15 - Model: Chrysler Neon 1M (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 110.25 (SE +/- 0.10, N = 3)
Default: 125.14 (SE +/- 1.75, N = 12)

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite-difference high-performance code for solving the incompressible Navier-Stokes equation along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 7.01252317 (SE +/- 0.02216234, N = 6)
Default: 7.95593405 (SE +/- 0.05610512, N = 5)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 411.07 (SE +/- 2.78, N = 15)
Default: 374.90 (SE +/- 2.51, N = 13)
1. (CXX) g++ options: -O3

Graph500

This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.

Graph500 3.0 - Scale: 26 (bfs max_TEPS, More Is Better)
HPC Tuning Recommendations: 1461890000
Default: 1335290000

Graph500 3.0 - Scale: 26 (bfs median_TEPS, More Is Better)
HPC Tuning Recommendations: 1420090000
Default: 1305560000

Graph500 3.0 - Scale: 26 (sssp max_TEPS, More Is Better)
HPC Tuning Recommendations: 646942000
Default: 600153000

1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.1.1 - Model: Tomographic Model (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 6.872884974 (SE +/- 0.017950948, N = 5)
Default: 7.310008888 (SE +/- 0.069507248, N = 6)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 79.47
Default: 84.12
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

SPECFEM3D

SPECFEM3D 4.1.1 - Model: Homogeneous Halfspace (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 8.514378931 (SE +/- 0.018784174, N = 5)
Default: 9.007696830 (SE +/- 0.067842113, N = 5)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenRadioss

OpenRadioss 2023.09.15 - Model: Bird Strike on Windshield (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 78.27 (SE +/- 0.26, N = 3)
Default: 82.77 (SE +/- 0.12, N = 3)

Graph500

Graph500 3.0 - Scale: 26 (sssp median_TEPS, More Is Better)
HPC Tuning Recommendations: 495711000
Default: 470378000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

OpenRadioss

OpenRadioss 2023.09.15 - Model: INIVOL and Fluid Structure Interaction Drop Container (Seconds, Fewer Is Better)
Default: 90.69 (SE +/- 0.35, N = 3)
HPC Tuning Recommendations: 94.31 (SE +/- 0.16, N = 3)

NWChem

NWChem 7.2.3 - Input: C240 Buckyball (Seconds, Fewer Is Better)
Default: 1249.1
HPC Tuning Recommendations: 1298.2
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lfcidump -lgwmol -lga -larmci -lpeigs -l64to32 -llapack -lopenblas -lpthread -lrt -lcomex -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -ffast-math -std=legacy -fdefault-integer-8 -O0

easyWave

The easyWave software allows simulating tsunami generation and propagation in the context of early warning systems. EasyWave supports making use of OpenMP for CPU multi-threading and there are also GPU ports available but not currently incorporated as part of this test profile. The easyWave tsunami generation software is run with one of the example/reference input files for measuring the CPU execution time. Learn more via the OpenBenchmarking.org test page.

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, Fewer Is Better)
Default: 20.81 (SE +/- 0.23, N = 3)
HPC Tuning Recommendations: 21.27 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -fopenmp

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
Default: 14.53 (SE +/- 0.00, N = 3)
HPC Tuning Recommendations: 14.23 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -lm

OpenRadioss

OpenRadioss 2023.09.15 - Model: Cell Phone Drop Test (Seconds, Fewer Is Better)
Default: 17.76 (SE +/- 0.16, N = 3)
HPC Tuning Recommendations: 18.07 (SE +/- 0.02, N = 3)

easyWave

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, Fewer Is Better)
Default: 51.64 (SE +/- 0.13, N = 3)
HPC Tuning Recommendations: 52.47 (SE +/- 0.49, N = 15)
1. (CXX) g++ options: -O3 -fopenmp

CP2K Molecular Dynamics

CP2K Molecular Dynamics 2024.3 - Input: H20-256 (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 135.95 (SE +/- 0.47, N = 3)
Default: 137.80 (SE +/- 0.29, N = 3)
1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Execution Time (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 24.26
Default: 24.58
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenRadioss

OpenRadioss 2023.09.15 - Model: Rubber O-Ring Seal Installation (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 38.33 (SE +/- 0.02, N = 3)
Default: 38.44 (SE +/- 0.41, N = 4)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 332.91 (SE +/- 3.79, N = 15)
Default: 332.01 (SE +/- 1.37, N = 14)
1. (CXX) g++ options: -O3

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

GPAW 23.6 - Input: Carbon Nanotube (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 27.84 (SE +/- 0.07, N = 3)
Default: 27.86 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

XNNPACK

XNNPACK b7b048 - System Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 97.7 / Avg 353.6 / Max 403.1
Default: Min 94.9 / Avg 438.7 / Max 489.6

XNNPACK b7b048 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 1.9 / Avg 224.7 / Max 253.8
Default: Min 126.0 / Avg 294.5 / Max 327.8

XNNPACK b7b048 - Model: QS8MobileNetV2 (us, Fewer Is Better)
HPC Tuning Recommendations: 2617 (SE +/- 75.96, N = 10)
Default: 5032 (SE +/- 15.76, N = 3)

XNNPACK b7b048 - Model: FP16MobileNetV2 (us, Fewer Is Better)
HPC Tuning Recommendations: 2553 (SE +/- 73.59, N = 10)
Default: 4681 (SE +/- 14.43, N = 3)

XNNPACK b7b048 - Model: FP16MobileNetV1 (us, Fewer Is Better)
HPC Tuning Recommendations: 1489 (SE +/- 75.71, N = 10)
Default: 2467 (SE +/- 1.45, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV3Small (us, Fewer Is Better)
HPC Tuning Recommendations: 2941 (SE +/- 99.39, N = 10)
Default: 5398 (SE +/- 13.20, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV1 (us, Fewer Is Better)
HPC Tuning Recommendations: 1686 (SE +/- 170.15, N = 10)
Default: 2429 (SE +/- 5.13, N = 3)

1. (CXX) g++ options: -O3 -lrt -lm

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - System Power Consumption Monitor (Watts, Fewer Is Better)
Default: Min 95.4 / Avg 105.9 / Max 168.3
HPC Tuning Recommendations: Min 95.2 / Avg 128.0 / Max 236.1

HeFFTe - Highly Efficient FFT for Exascale 2.4 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 1.4 / Avg 54.6 / Max 72.9
Default: Min 56.7 / Avg 61.6 / Max 71.7

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 235.08 (SE +/- 3.65, N = 15)
Default: 223.46 (SE +/- 0.74, N = 14)
1. (CXX) g++ options: -O3

Xcompact3d Incompact3d

Xcompact3d Incompact3d 2021-03-11 - System Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 101 / Avg 581 / Max 613
Default: Min 106 / Avg 642 / Max 685

Xcompact3d Incompact3d 2021-03-11 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 1.0 / Avg 333.4 / Max 344.1
Default: Min 0.1 / Avg 387.2 / Max 401.6

Xcompact3d Incompact3d 2021-03-11 - Input: X3D-benchmarking input.i3d (Seconds, Fewer Is Better)
HPC Tuning Recommendations: 245.68 (SE +/- 3.22, N = 9)
Default: 325.05 (SE +/- 7.62, N = 9)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 - System Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 100 / Avg 578 / Max 611
Default: Min 101 / Avg 639 / Max 679

High Performance Conjugate Gradient 3.1 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
HPC Tuning Recommendations: Min 25.4 / Avg 336.6 / Max 347.9
Default: Min 83.7 / Avg 387.3 / Max 402.2

High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60 (GFLOP/s, More Is Better)
HPC Tuning Recommendations: 51.02 (SE +/- 1.04, N = 12)
Default: 41.09 (SE +/- 0.58, N = 9)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
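The power monitor figures can be combined with a throughput result to estimate efficiency. The sketch below divides the HPCG 144 144 144 result by the average CPU power reported for HPCG. It rests on an assumption: the monitored average may span every HPCG run rather than only this problem size, so the output is a rough indication and not a measured performance-per-Watt result. The helper `perf_per_watt` is hypothetical.

```python
def perf_per_watt(score: float, avg_watts: float) -> float:
    """Throughput divided by average power draw (more-is-better metrics only)."""
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return score / avg_watts


# HPCG 144 144 144 (GFLOP/s) paired with the HPCG average CPU power (Watts).
tuned_eff = perf_per_watt(51.02, 336.6)    # HPC Tuning Recommendations
default_eff = perf_per_watt(41.09, 387.3)  # Default

print(f"tuned:   {tuned_eff:.4f} GFLOP/s per Watt")
print(f"default: {default_eff:.4f} GFLOP/s per Watt")
print(f"ratio:   {tuned_eff / default_eff:.2f}x")
```

For fewer-is-better results such as run times, the analogous figure would be energy per run (average Watts multiplied by seconds) rather than a score divided by power.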