AMD EPYC 9575F HPC Tuning Recommendations

Benchmarks for a future article by Michael Larabel.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411294-NE-AMDEPYC9526
Result identifiers:
Default: run November 29, test duration 11 Hours, 59 Minutes
HPC Tuning Recommendations: run November 29, test duration 13 Hours, 48 Minutes
Average test duration: 12 Hours, 54 Minutes


AMD EPYC 9575F HPC Tuning Recommendations (OpenBenchmarking.org, Phoronix Test Suite)

Processor: AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores / 128 Threads) [Default]; AMD EPYC 9575F 64-Core @ 3.30GHz (64 Cores) [HPC Tuning Recommendations]
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Logs
- Transparent Huge Pages: madvise
- GCC configure options: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security (Default): gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- Security (HPC Tuning Recommendations): gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
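Per the system details, the two configurations differ in SMT (128 threads vs. 64 cores only) and in the reported STIBP state (always-on vs. disabled), while both run Transparent Huge Pages in madvise mode. A minimal Python sketch for checking those settings on a Linux host follows. It assumes the standard sysfs files `/sys/kernel/mm/transparent_hugepage/enabled`, `/sys/devices/system/cpu/smt/active`, and `/sys/devices/system/cpu/vulnerabilities/spectre_v2`; the helper names are hypothetical and this is not part of the Phoronix Test Suite.

```python
from pathlib import Path
from typing import Optional


def active_choice(sysfs_text: str) -> str:
    """Return the bracketed entry, e.g. 'always [madvise] never' -> 'madvise'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed choice in {sysfs_text!r}")


def stibp_state(spectre_v2_text: str) -> Optional[str]:
    """Extract the 'STIBP: ...' field from a spectre_v2 vulnerability string."""
    for field in spectre_v2_text.split(";"):
        field = field.strip()
        if field.startswith("STIBP:"):
            return field.split(":", 1)[1].strip()
    return None


def read_tuning_state(sys_root: Path = Path("/sys")) -> dict:
    """Collect THP mode, SMT state, and STIBP state where the files exist."""
    state = {}
    thp = sys_root / "kernel/mm/transparent_hugepage/enabled"
    smt = sys_root / "devices/system/cpu/smt/active"
    v2 = sys_root / "devices/system/cpu/vulnerabilities/spectre_v2"
    if thp.exists():
        state["thp"] = active_choice(thp.read_text())
    if smt.exists():
        state["smt_active"] = smt.read_text().strip() == "1"
    if v2.exists():
        state["stibp"] = stibp_state(v2.read_text())
    return state


if __name__ == "__main__":
    print(read_tuning_state())
```

On a system matching the HPC Tuning Recommendations run, one would expect `thp` to be `madvise`, `smt_active` to be False, and `stibp` to be `disabled`.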

Default vs. HPC Tuning Recommendations Comparison (Phoronix Test Suite): chart of the per-test percentage difference between the two configurations, ranging from 2.1% (GROMACS, MPI CPU - water_GMX50_bare) to 92.3% (XNNPACK, QS8MobileNetV2).
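The comparison chart appears to express each test as the percentage by which the leading configuration beats the other, whichever direction the metric runs. The sketch below reproduces three of the chart's figures under that assumption (92.3% for XNNPACK QS8MobileNetV2, 67.2% for libxsmm M N K: 128, 2.1% for GROMACS) from the per-test results in this file; the function and type names are hypothetical.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    winner: str
    percent: float


def compare(default: float, tuned: float, higher_is_better: bool) -> Comparison:
    """Which configuration leads, and by how many percent (rounded to 0.1)."""
    tuned_wins = tuned > default if higher_is_better else tuned < default
    winner = "HPC Tuning Recommendations" if tuned_wins else "Default"
    # Larger value over smaller value works for both metric directions:
    # for "fewer is better" the winner is the one with the smaller time.
    percent = (max(default, tuned) / min(default, tuned) - 1.0) * 100.0
    return Comparison(winner, round(percent, 1))


results = {
    "XNNPACK QS8MobileNetV2 (us, fewer is better)": compare(5032, 2617, False),
    "libxsmm M N K: 128 (GFLOPS/s, more is better)": compare(3358.0, 2007.9, True),
    "GROMACS water_GMX50_bare (ns/day, more is better)": compare(14.53, 14.23, True),
}
for name, (winner, percent) in results.items():
    print(f"{name}: {winner} ahead by {percent}%")
```

Because the chart only shows magnitudes, the `winner` field is what tells the two cases apart: the tuned configuration leads XNNPACK, while Default leads libxsmm at 128 and GROMACS.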

AMD EPYC 9575F HPC Tuning Recommendations result overview table (OpenBenchmarking.org): Default and HPC Tuning Recommendations results for Graph500, HeFFTe, OpenRadioss, XNNPACK, SPECFEM3D, GROMACS, easyWave, LAMMPS, HPCG, libxsmm, NAMD, ACES DGEMM, AMG, NWChem, OpenFOAM, Quicksilver, Laghos, QMCPACK, Xcompact3d Incompact3d, GPAW, and CP2K. The individual results follow.

Graph500

This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data-intensive workloads and commonly run on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
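Graph500 reports traversed edges per second (TEPS) for its breadth-first search (bfs) and single-source shortest path (sssp) kernels, summarized below as median and max values; Scale: 26 means a graph of 2^26 vertices. The plain-Python sketch below illustrates only the BFS side of the metric under a simplified reading of the definition: each undirected edge inside the component reached from the search root counts once, divided by the search time. The toy graph and function name are hypothetical; the reference implementation is MPI-based C code.

```python
import statistics
import time
from collections import deque


def bfs_teps(adjacency: dict[int, list[int]], root: int) -> tuple[int, float]:
    """Breadth-first search from root; return (edges counted, edges per second).

    Simplified TEPS accounting (an assumption, not the reference code):
    every undirected edge inside the reached component counts once.
    """
    start = time.perf_counter()
    parent = {root: root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    elapsed = time.perf_counter() - start
    # A symmetric adjacency list stores each undirected edge twice.
    edges = sum(len(adjacency[v]) for v in parent) // 2
    return edges, edges / max(elapsed, 1e-9)


# Hypothetical toy graph: a 4-cycle (0-1-2-3) plus vertex 4 hanging off vertex 2.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3, 4], 3: [2, 0], 4: [2]}
rates = [bfs_teps(graph, root)[1] for root in graph]
print(f"median TEPS {statistics.median(rates):.3g}, max TEPS {max(rates):.3g}")
```

Running one search per root and then taking the median and maximum mirrors the "bfs median_TEPS" and "bfs max_TEPS" labels used in the results.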

Graph500 3.0 - Scale: 26 - bfs median_TEPS (More Is Better)
Default: 1305560000
HPC Tuning Recommendations: 1420090000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe (Highly Efficient FFT for Exascale) is software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently targets CPU testing. Learn more via the OpenBenchmarking.org test page.
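FFT benchmarks commonly turn run time into GFLOP/s using the nominal operation count 5 N log2 N for a complex-to-complex (c2c) transform of N points, and roughly half that for a real-to-complex (r2c) transform. Whether HeFFTe's speed benchmark uses exactly this convention, and what it times per iteration, are assumptions here. The sketch only shows how such a rate maps to wall time for the 128^3 and 256^3 grids in these results; the helper names are hypothetical.

```python
import math


def nominal_fft_flops(nx: int, ny: int, nz: int, real_input: bool = False) -> float:
    """Nominal flop count 5*N*log2(N) for an N = nx*ny*nz point 3D FFT."""
    n = nx * ny * nz
    flops = 5.0 * n * math.log2(n)
    # Real-to-complex transforms are conventionally credited with half the work.
    return flops / 2.0 if real_input else flops


def seconds_per_transform(gflops_rate: float, nx: int, ny: int, nz: int,
                          real_input: bool = False) -> float:
    """Wall time per transform implied by a reported GFLOP/s rate."""
    return nominal_fft_flops(nx, ny, nz, real_input) / (gflops_rate * 1e9)


# The c2c / FFTW / float / 256^3 results reported in this file.
for label, rate in [("Default", 178.30), ("HPC Tuning Recommendations", 210.42)]:
    ms = seconds_per_transform(rate, 256, 256, 256) * 1e3
    print(f"{label}: about {ms:.2f} ms per 256^3 c2c transform (nominal)")
```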

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
Default: 332.01 (SE +/- 1.37, N = 14)
HPC Tuning Recommendations: 332.91 (SE +/- 3.79, N = 15)
1. (CXX) g++ options: -O3
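Results such as this one report a standard error (SE) over N runs. Assuming the conventional definition, SE is the sample standard deviation divided by the square root of N. The short sketch below computes it for a hypothetical list of per-run values, since the individual runs are not included in this file.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Sample standard deviation divided by sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run GFLOP/s values; the individual runs are not in this file.
runs = [330.2, 334.1, 331.5, 332.0]
print(f"{statistics.mean(runs):.2f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
```

Because SE shrinks with the square root of N, a result reported with N = 15 and a similar SE to one with N = 3 reflects noticeably more run-to-run spread.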

Graph500

Graph500 3.0 - Scale: 26 - bfs max_TEPS (More Is Better)
Default: 1335290000
HPC Tuning Recommendations: 1461890000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500 3.0 - Scale: 26 - sssp median_TEPS (More Is Better)
Default: 470378000
HPC Tuning Recommendations: 495711000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500 3.0 - Scale: 26 - sssp max_TEPS (More Is Better)
Default: 600153000
HPC Tuning Recommendations: 646942000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

OpenRadioss

OpenRadioss is an open-source, AGPL-licensed finite element solver for dynamic event analysis. It is based on Altair Radioss, which was open-sourced in 2022. This solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test currently uses a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.

OpenRadioss 2023.09.15 - Model: Bird Strike on Windshield (Seconds, Fewer Is Better)
Default: 82.77 (SE +/- 0.12, N = 3)
HPC Tuning Recommendations: 78.27 (SE +/- 0.26, N = 3)

OpenRadioss 2023.09.15 - Model: Cell Phone Drop Test (Seconds, Fewer Is Better)
Default: 17.76 (SE +/- 0.16, N = 3)
HPC Tuning Recommendations: 18.07 (SE +/- 0.02, N = 3)

OpenRadioss 2023.09.15 - Model: Bumper Beam (Seconds, Fewer Is Better)
Default: 66.09 (SE +/- 0.61, N = 15)
HPC Tuning Recommendations: 52.73 (SE +/- 0.19, N = 3)

OpenRadioss 2023.09.15 - Model: INIVOL and Fluid Structure Interaction Drop Container (Seconds, Fewer Is Better)
Default: 90.69 (SE +/- 0.35, N = 3)
HPC Tuning Recommendations: 94.31 (SE +/- 0.16, N = 3)

OpenRadioss 2023.09.15 - Model: Chrysler Neon 1M (Seconds, Fewer Is Better)
Default: 125.14 (SE +/- 1.75, N = 12)
HPC Tuning Recommendations: 110.25 (SE +/- 0.10, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV1 (us, Fewer Is Better)
Default: 2429 (SE +/- 5.13, N = 3)
HPC Tuning Recommendations: 1686 (SE +/- 170.15, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV2 (us, Fewer Is Better)
Default: 4758 (SE +/- 24.31, N = 3)
HPC Tuning Recommendations: 2651 (SE +/- 49.78, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
Default: 374.90 (SE +/- 2.51, N = 13)
HPC Tuning Recommendations: 411.07 (SE +/- 2.78, N = 15)
1. (CXX) g++ options: -O3

OpenRadioss

OpenRadioss 2023.09.15 - Model: Rubber O-Ring Seal Installation (Seconds, Fewer Is Better)
Default: 38.44 (SE +/- 0.41, N = 4)
HPC Tuning Recommendations: 38.33 (SE +/- 0.02, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV3Large (us, Fewer Is Better)
Default: 7258 (SE +/- 43.97, N = 3)
HPC Tuning Recommendations: 3959 (SE +/- 23.40, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP32MobileNetV3Small (us, Fewer Is Better)
Default: 5398 (SE +/- 13.20, N = 3)
HPC Tuning Recommendations: 2941 (SE +/- 99.39, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic, or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution of SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.1.1 - Model: Layered Halfspace (Seconds, Fewer Is Better)
Default: 16.55 (SE +/- 0.18, N = 3)
HPC Tuning Recommendations: 18.49 (SE +/- 0.04, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
Default: 178.30 (SE +/- 2.31, N = 15)
HPC Tuning Recommendations: 210.42 (SE +/- 0.34, N = 13)
1. (CXX) g++ options: -O3

SPECFEM3D

SPECFEM3D 4.1.1 - Model: Water-layered Halfspace (Seconds, Fewer Is Better)
Default: 15.78 (SE +/- 0.14, N = 3)
HPC Tuning Recommendations: 16.42 (SE +/- 0.09, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D 4.1.1 - Model: Homogeneous Halfspace (Seconds, Fewer Is Better)
Default: 9.007696830 (SE +/- 0.067842113, N = 5)
HPC Tuning Recommendations: 8.514378931 (SE +/- 0.018784174, N = 5)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D 4.1.1 - Model: Mount St. Helens (Seconds, Fewer Is Better)
Default: 6.206598110 (SE +/- 0.051223682, N = 9)
HPC Tuning Recommendations: 6.901913424 (SE +/- 0.013455303, N = 5)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.4 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
Default: 223.46 (SE +/- 0.74, N = 14)
HPC Tuning Recommendations: 235.08 (SE +/- 3.65, N = 15)
1. (CXX) g++ options: -O3

SPECFEM3D

SPECFEM3D 4.1.1 - Model: Tomographic Model (Seconds, Fewer Is Better)
Default: 7.310008888 (SE +/- 0.069507248, N = 6)
HPC Tuning Recommendations: 6.872884974 (SE +/- 0.017950948, N = 5)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV1 (us, Fewer Is Better)
Default: 2467 (SE +/- 1.45, N = 3)
HPC Tuning Recommendations: 1489 (SE +/- 75.71, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV2 (us, Fewer Is Better)
Default: 4681 (SE +/- 14.43, N = 3)
HPC Tuning Recommendations: 2553 (SE +/- 73.59, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV3Large (us, Fewer Is Better)
Default: 7046 (SE +/- 11.89, N = 3)
HPC Tuning Recommendations: 3856 (SE +/- 55.41, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: FP16MobileNetV3Small (us, Fewer Is Better)
Default: 5449 (SE +/- 65.16, N = 3)
HPC Tuning Recommendations: 2842 (SE +/- 15.58, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048 - Model: QS8MobileNetV2 (us, Fewer Is Better)
Default: 5032 (SE +/- 15.76, N = 3)
HPC Tuning Recommendations: 2617 (SE +/- 75.96, N = 10)
1. (CXX) g++ options: -O3 -lrt -lm

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. The test profile allows selecting between CPU- and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
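GROMACS, like LAMMPS and NAMD further on, reports throughput as nanoseconds of simulated time per day of wall-clock time. A small sketch of that conversion follows. The step count, 2 fs timestep, and helper names are hypothetical inputs for illustration, not the settings of the water_GMX50_bare case.

```python
SECONDS_PER_DAY = 86_400.0


def ns_per_day(steps: int, timestep_fs: float, wall_seconds: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = steps * timestep_fs * 1e-6   # 1 fs = 1e-6 ns
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


def wall_seconds_per_ns(rate_ns_per_day: float) -> float:
    """Invert a reported ns/day figure into wall-clock seconds per simulated ns."""
    return SECONDS_PER_DAY / rate_ns_per_day


# Hypothetical run: 10,000 steps of 2 fs finishing in 60 s.
print(f"{ns_per_day(10_000, 2.0, 60.0):.1f} ns/day")
# The GROMACS results in this file: 14.53 (Default) and 14.23 (HPC Tuning Recommendations).
for rate in (14.53, 14.23):
    print(f"{rate} ns/day = {wall_seconds_per_ns(rate):.0f} s of wall time per simulated ns")
```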

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
Default: 14.53 (SE +/- 0.00, N = 3)
HPC Tuning Recommendations: 14.23 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -lm

easyWave

easyWave simulates tsunami generation and propagation for use in early warning systems. It supports OpenMP for CPU multi-threading; GPU ports also exist but are not currently incorporated into this test profile. This test runs easyWave with one of the example/reference input files and measures the CPU execution time. Learn more via the OpenBenchmarking.org test page.

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200 (Seconds, Fewer Is Better)
Default: 20.81 (SE +/- 0.23, N = 3)
HPC Tuning Recommendations: 21.27 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -fopenmp

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

LAMMPS Molecular Dynamics Simulator 23Jun2022 - Model: 20k Atoms (ns/day, More Is Better)
Default: 53.21 (SE +/- 0.14, N = 3)
HPC Tuning Recommendations: 51.42 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

High Performance Conjugate Gradient

HPCG (High Performance Conjugate Gradient) is a newer scientific benchmark from Sandia National Labs aimed at supercomputer testing with workloads more representative of modern real-world applications than HPCC. Learn more via the OpenBenchmarking.org test page.
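HPCG times a preconditioned conjugate gradient (CG) solve on a large sparse 3D problem. The core CG iteration can be sketched in plain Python on a tiny dense symmetric positive-definite system. This sketch omits HPCG's preconditioner, sparse storage, and MPI halo exchanges, and the test matrices are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (unpreconditioned CG)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))  # exact solution is [1/11, 7/11]
```

In exact arithmetic CG converges in at most n iterations for an n x n system, which is why the 2 x 2 example finishes after two steps; at HPCG's problem sizes the preconditioner is what keeps the iteration count manageable.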

High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60 (GFLOP/s, More Is Better)
Default: 41.09 (SE +/- 0.58, N = 9)
HPC Tuning Recommendations: 51.02 (SE +/- 1.04, N = 12)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60 (GFLOP/s, More Is Better)
Default: 40.35 (SE +/- 0.27, N = 3)
HPC Tuning Recommendations: 49.09 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. It can make use of Intel AMX, AVX-512, and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.
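The libxsmm results are labeled by matrix size (M N K: 32, 64, 128, 256), presumably square products with M = N = K. Matrix-multiply rates are conventionally computed from 2·M·N·K floating-point operations per product (one multiply and one add per inner-product term). The sketch below shows that accounting with a naive triple loop. It is not libxsmm's kernel, which relies on JIT-generated, vectorized code, and the helper names are hypothetical.

```python
import time


def matmul(a, b, m, n, k):
    """C = A @ B for row-major flat lists: A is m x k, B is k x n, C is m x n."""
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def gemm_gflops(m, n, k, repetitions, seconds):
    """2*M*N*K flops per product (one multiply plus one add per term)."""
    return 2.0 * m * n * k * repetitions / seconds / 1e9


size, reps = 32, 20
a = [1.0] * (size * size)
b = [2.0] * (size * size)
start = time.perf_counter()
for _ in range(reps):
    c = matmul(a, b, size, size, size)
elapsed = time.perf_counter() - start
print(f"naive Python: {gemm_gflops(size, size, size, reps, elapsed):.4f} GFLOP/s")
```

The gap between this figure and the hundreds to thousands of GFLOPS/s reported below illustrates why libxsmm generates specialized kernels for each small size.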

libxsmm 2-1.17-3645 - M N K: 32 (GFLOPS/s, More Is Better)
Default: 948.7 (SE +/- 0.78, N = 6)
HPC Tuning Recommendations: 548.4 (SE +/- 6.50, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

easyWave

easyWave r34 - Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400 (Seconds, Fewer Is Better)
Default: 51.64 (SE +/- 0.13, N = 3)
HPC Tuning Recommendations: 52.47 (SE +/- 0.49, N = 15)
1. (CXX) g++ options: -O3 -fopenmp

NAMD

NAMD 3.0 - Input: STMV with 1,066,628 Atoms (ns/day, More Is Better)
Default: 3.72199 (SE +/- 0.00564, N = 4)
HPC Tuning Recommendations: 3.24773 (SE +/- 0.00833, N = 3)

ACES DGEMM

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
Default: 5183.44 (SE +/- 5.20, N = 5)
HPC Tuning Recommendations: 4633.22 (SE +/- 7.97, N = 5)
1. (CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
Default: 3190838250 (SE +/- 11683916.89, N = 4)
HPC Tuning Recommendations: 3064564750 (SE +/- 21054535.14, N = 4)

NWChem

NWChem 7.2.3 - Input: C240 Buckyball (Seconds, Fewer Is Better)
Default: 1249.1
HPC Tuning Recommendations: 1298.2
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lfcidump -lgwmol -lga -larmci -lpeigs -l64to32 -llapack -lopenblas -lpthread -lrt -lcomex -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -ffast-math -std=legacy -fdefault-integer-8 -O0

libxsmm

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
Default: 3358.0 (SE +/- 10.68, N = 3)
HPC Tuning Recommendations: 2007.9 (SE +/- 24.99, N = 4)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm 2-1.17-3645 - M N K: 64 (GFLOPS/s, More Is Better)
Default: 1799.1 (SE +/- 2.83, N = 6)
HPC Tuning Recommendations: 1060.6 (SE +/- 8.66, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Mesh Time (Seconds, Fewer Is Better)
Default: 18.97
HPC Tuning Recommendations: 15.97
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.
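Quicksilver's workload is Monte Carlo particle transport, where each particle history samples random flight distances and collision outcomes. The deliberately tiny sketch below applies the idea to a purely absorbing 1D slab, which has a known analytic answer of exp(-sigma_t * thickness). It is hypothetical and omits everything that makes Quicksilver realistic, including 3D geometry, scattering, energy groups, tallies, and OpenMP threading.

```python
import math
import random


def slab_transmission(sigma_t: float, thickness: float, histories: int,
                      seed: int = 1) -> float:
    """Estimate the fraction of particles crossing a purely absorbing slab.

    sigma_t is the total macroscopic cross-section (1/length); flight
    distances to the first collision are exponentially distributed.
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(histories):
        distance = rng.expovariate(sigma_t)  # sampled free-flight length
        if distance > thickness:
            transmitted += 1                 # escaped without colliding
    return transmitted / histories


estimate = slab_transmission(sigma_t=1.0, thickness=2.0, histories=100_000)
print(f"Monte Carlo {estimate:.4f} vs analytic {math.exp(-2.0):.4f}")
```

Each history is independent, which is what makes this class of workload straightforward to spread across threads, as Quicksilver does with OpenMP.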

Quicksilver 20230818 - Input: CORAL2 P1 (Figure Of Merit, More Is Better)
Default: 36610000 (SE +/- 80000.00, N = 3)
HPC Tuning Recommendations: 27653333 (SE +/- 100388.14, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Small Mesh Size - Execution Time (Seconds, Fewer Is Better)
Default: 24.58
HPC Tuning Recommendations: 24.26
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

libxsmm

libxsmm 2-1.17-3645 - M N K: 256 (GFLOPS/s, More Is Better)
Default: 2422.7 (SE +/- 4.19, N = 3)
HPC Tuning Recommendations: 3098.9 (SE +/- 23.61, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better)
Default: 84.12
HPC Tuning Recommendations: 79.47
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better)
Default: 283.12
HPC Tuning Recommendations: 218.13
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Mesh Time (Seconds, Fewer Is Better)
Default: 603.69
HPC Tuning Recommendations: 517.58
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Execution Time (Seconds, Fewer Is Better)
Default: 8380.53
HPC Tuning Recommendations: 6426.06
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Laghos

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, More Is Better)
  Default: 567.09 (SE +/- 3.42, N = 3)
  HPC Tuning Recommendations: 551.12 (SE +/- 5.64, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Laghos 3.1 - Test: Triple Point Problem (Major Kernels Total Rate, More Is Better)
  Default: 295.58 (SE +/- 1.56, N = 3)
  HPC Tuning Recommendations: 293.03 (SE +/- 2.26, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
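The reported standard errors (SE) help judge whether a gap between configurations is meaningful. One rough heuristic expresses the difference between the two means in units of their combined standard error, sqrt(SE_a² + SE_b²). It assumes the runs of each configuration are independent and approximately normally distributed. With only N = 3 runs per configuration, treat this as a coarse screen rather than a proper significance test; a t-based test with so few samples would demand a wider margin. Applied to the two Laghos results:

```python
import math


def difference_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Gap between two means in units of their combined standard error.

    Assumes the two configurations were measured independently, so the
    standard errors add in quadrature. A rough heuristic only.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined


# Laghos 3.1 results above: (mean, SE) for Default and HPC Tuning Recommendations.
sedov = difference_in_standard_errors(567.09, 3.42, 551.12, 5.64)
triple = difference_in_standard_errors(295.58, 1.56, 293.03, 2.26)
for name, z in (("Sedov Blast Wave", sedov), ("Triple Point Problem", triple)):
    verdict = "beyond ~2 combined SE" if z > 2.0 else "within ~2 combined SE (likely noise)"
    print(f"{name}: {z:.2f} combined SE apart, {verdict}")
```

By this measure, the Sedov Blast Wave gap (about 2.4 combined SE) is borderline notable. The Triple Point Problem gap (about 0.9) cannot be distinguished from run-to-run noise.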

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.
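Monte Carlo particle transport rests on a simple loop: sample a random distance to the next collision, move the particle, then randomly decide whether it is absorbed or scattered. The Python sketch below shows that idea in a deliberately tiny one-dimensional "rod model", for illustration only. It is not Quicksilver's code or algorithm; Quicksilver tracks particles through far richer geometry and physics. The slab thickness, cross section, and scattering probability are arbitrary illustrative parameters.

```python
import math
import random


def slab_transport(n_particles, thickness, sigma_t, scatter_prob, seed=0):
    """Toy 1D 'rod model' Monte Carlo transport through a homogeneous slab.

    Illustrative assumptions: particles enter at x = 0 moving in +x,
    sigma_t > 0 is the total cross section (1 / mean free path), and at
    each collision a particle scatters with probability scatter_prob
    (into +x or -x with equal odds); otherwise it is absorbed.
    Returns the (transmitted, reflected, absorbed) fractions.
    """
    rng = random.Random(seed)
    transmitted = reflected = absorbed = 0
    for _ in range(n_particles):
        x, direction = 0.0, 1.0
        while True:
            # Exponentially distributed flight distance to the next collision.
            x += direction * rng.expovariate(sigma_t)
            if x >= thickness:
                transmitted += 1
                break
            if x < 0.0:
                reflected += 1
                break
            if rng.random() >= scatter_prob:
                absorbed += 1
                break
            direction = 1.0 if rng.random() < 0.5 else -1.0
    n = float(n_particles)
    return transmitted / n, reflected / n, absorbed / n


if __name__ == "__main__":
    t, r, a = slab_transport(100_000, 1.0, 1.0, 0.0)
    print(f"pure absorber: transmitted {t:.4f} vs analytic {math.exp(-1.0):.4f}")
    t, r, a = slab_transport(100_000, 1.0, 1.0, 0.5)
    print(f"50% scattering: transmitted {t:.4f}, reflected {r:.4f}, absorbed {a:.4f}")
```

For a pure absorber (scatter_prob = 0), the transmitted fraction should approach the analytic exp(-sigma_t * thickness), which gives a convenient sanity check.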

Quicksilver 20230818 - Input: CTS2 (Figure Of Merit, More Is Better)
  Default: 26510000 (SE +/- 60827.63, N = 3)
  HPC Tuning Recommendations: 17513333 (SE +/- 110503.90, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -march=native

Quicksilver 20230818 - Input: CORAL2 P2 (Figure Of Merit, More Is Better)
  Default: 24666667 (SE +/- 29627.31, N = 3)
  HPC Tuning Recommendations: 15020000 (SE +/- 11547.01, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -march=native

QMCPACK

QMCPACK is a modern, high-performance, open-source Quantum Monte Carlo (QMC) simulation code, making use of MPI for this benchmark of the H2O example code. It is a production-level, many-body ab initio QMC code for computing the electronic structure of atoms, molecules, and solids, and is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.
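Quantum Monte Carlo works by sampling electron positions from the square of a trial wavefunction and averaging the "local energy" (H psi) / psi over those samples. The Python sketch below is a minimal variational Monte Carlo (VMC) example for a single hydrogen atom, for illustration only. It assumes the trial wavefunction psi(r) = exp(-alpha r) in Hartree atomic units and uses Metropolis sampling. It is not QMCPACK code; QMCPACK applies far more sophisticated wavefunctions and methods to many-electron systems.

```python
import math
import random


def local_energy(r, alpha):
    """E_L = (H psi) / psi for the assumed trial function psi(r) = exp(-alpha r).

    Hartree atomic units with the electron-proton potential -1/r.
    """
    return -0.5 * alpha ** 2 + (alpha - 1.0) / r


def vmc_hydrogen(alpha, n_steps=100_000, step=1.0, seed=1):
    """Metropolis variational Monte Carlo estimate of <E> for hydrogen."""
    rng = random.Random(seed)
    pos = [0.5, 0.5, 0.5]
    r = math.sqrt(sum(c * c for c in pos))
    n_burn = n_steps // 10  # discard early samples while the walker equilibrates
    total = 0.0
    for i in range(n_steps):
        # Symmetric uniform trial move in each Cartesian direction.
        trial = [c + step * (rng.random() - 0.5) for c in pos]
        r_trial = math.sqrt(sum(c * c for c in trial))
        # Metropolis acceptance with probability |psi(trial)|^2 / |psi(current)|^2.
        if rng.random() < math.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
        if i >= n_burn:
            total += local_energy(r, alpha)
    return total / (n_steps - n_burn)


if __name__ == "__main__":
    for alpha in (0.8, 1.0, 1.2):
        exact = 0.5 * alpha ** 2 - alpha
        print(f"alpha={alpha}: VMC {vmc_hydrogen(alpha):.4f}, analytic {exact:.4f}")
```

At alpha = 1 the trial function is the exact ground state, so every sampled local energy equals -0.5 Hartree and the estimate has zero variance. For any other alpha, the analytic expectation alpha²/2 - alpha lies above -0.5, as the variational principle requires.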

QMCPACK 3.17.1 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  Default: 19.30 (SE +/- 0.14, N = 12)
  HPC Tuning Recommendations: 20.23 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK 3.17.1 - Input: Li2_STO_ae (Total Execution Time - Seconds, Fewer Is Better)
  Default: 72.84 (SE +/- 0.21, N = 3)
  HPC Tuning Recommendations: 75.75 (SE +/- 0.16, N = 3)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK 3.17.1 - Input: FeCO6_b3lyp_gms (Total Execution Time - Seconds, Fewer Is Better)
  Default: 69.26 (SE +/- 0.12, N = 3)
  HPC Tuning Recommendations: 73.99 (SE +/- 0.07, N = 3)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK 3.17.1 - Input: O_ae_pyscf_UHF (Total Execution Time - Seconds, Fewer Is Better)
  Default: 142.31 (SE +/- 1.70, N = 3)
  HPC Tuning Recommendations: 146.58 (SE +/- 0.88, N = 3)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK 3.17.1 - Input: LiH_ae_MSD (Total Execution Time - Seconds, Fewer Is Better)
  Default: 52.36 (SE +/- 0.14, N = 3)
  HPC Tuning Recommendations: 54.87 (SE +/- 0.15, N = 3)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK 3.17.1 - Input: H4_ae (Total Execution Time - Seconds, Fewer Is Better)
  Default: 8.702 (SE +/- 0.079, N = 15)
  HPC Tuning Recommendations: 8.754 (SE +/- 0.010, N = 5)
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
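In the QMCPACK results above, the tuned configuration took longer on all six inputs. The geometric mean is a common way to summarize such per-test ratios, because it treats a ratio r and its reciprocal 1/r symmetrically in log space. The Python sketch below applies it to the six QMCPACK times. This is a generic calculation and not necessarily how OpenBenchmarking.org computes its own overall geometric mean.

```python
import math

# QMCPACK 3.17.1 total execution times in seconds (fewer is better),
# as (Default, HPC Tuning Recommendations) pairs for each input above.
QMCPACK = {
    "simple-H2O": (19.30, 20.23),
    "Li2_STO_ae": (72.84, 75.75),
    "FeCO6_b3lyp_gms": (69.26, 73.99),
    "O_ae_pyscf_UHF": (142.31, 146.58),
    "LiH_ae_MSD": (52.36, 54.87),
    "H4_ae": (8.702, 8.754),
}


def geometric_mean(values):
    """Geometric mean via the mean of logarithms (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A ratio above 1 means the tuned configuration took longer than Default.
ratios = [tuned / default for default, tuned in QMCPACK.values()]
print(f"Geometric mean time ratio (tuned / default): {geometric_mean(ratios):.3f}")
```

The result, roughly 1.04, means the tuned configuration took about 4% longer on average across these inputs.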

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, high-performance finite-difference code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.
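In one dimension with a constant velocity c and diffusivity nu, a scalar transport equation reduces to the advection-diffusion form du/dt + c du/dx = nu d²u/dx². The Python sketch below advances that equation on a periodic grid, for illustration only. It uses second-order central differences in space and explicit forward-Euler time stepping. Xcompact3d itself works in 3D with higher-order compact finite-difference schemes. The grid size, c, nu, and time step here are arbitrary choices that keep the explicit scheme stable.

```python
import math


def advect_diffuse(u, c, nu, dx, dt, steps):
    """Advance du/dt + c du/dx = nu d2u/dx2 on a periodic 1D grid.

    Second-order central differences in space and forward Euler in time.
    This explicit scheme is stable only when nu*dt/dx**2 <= 0.5 and
    (c*dt/dx)**2 <= 2*nu*dt/dx**2; the parameters below satisfy both.
    """
    n = len(u)
    u = list(u)
    for _ in range(steps):
        new = [0.0] * n
        for i in range(n):
            left, right = u[i - 1], u[(i + 1) % n]  # u[-1] wraps periodically
            advection = -c * (right - left) / (2.0 * dx)
            diffusion = nu * (right - 2.0 * u[i] + left) / (dx * dx)
            new[i] = u[i] + dt * (advection + diffusion)
        u = new
    return u


# Illustrative parameters: a shifted sine wave on [0, 2*pi).
n = 64
dx = 2.0 * math.pi / n
xs = [i * dx for i in range(n)]
u0 = [1.0 + math.sin(x) for x in xs]
c, nu, dt, steps = 1.0, 0.1, 0.005, 200
u = advect_diffuse(u0, c, nu, dx, dt, steps)

# Exact solution: the sine translates at speed c and decays as exp(-nu t).
t = dt * steps
exact = [1.0 + math.exp(-nu * t) * math.sin(x - c * t) for x in xs]
max_err = max(abs(a - b) for a, b in zip(u, exact))
print(f"max error vs exact: {max_err:.2e}; total before/after: {sum(u0):.6f} / {sum(u):.6f}")
```

On a periodic grid the central-difference terms telescope when summed, so the total amount of scalar is preserved to rounding error. The result can also be checked against the exact decaying, translating sine wave.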

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
  Default: 7.95593405 (SE +/- 0.05610512, N = 5)
  HPC Tuning Recommendations: 7.01252317 (SE +/- 0.02216234, N = 6)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xcompact3d Incompact3d 2021-03-11 - Input: X3D-benchmarking input.i3d (Seconds, Fewer Is Better)
  Default: 325.05 (SE +/- 7.62, N = 9)
  HPC Tuning Recommendations: 245.68 (SE +/- 3.22, N = 9)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

GPAW 23.6 - Input: Carbon Nanotube (Seconds, Fewer Is Better)
  Default: 27.86 (SE +/- 0.17, N = 3)
  HPC Tuning Recommendations: 27.84 (SE +/- 0.07, N = 3)
  1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

CP2K Molecular Dynamics

CP2K Molecular Dynamics 2024.3 - Input: Fayalite-FIST (Seconds, Fewer Is Better)
  Default: 62.27 (SE +/- 0.10, N = 3)
  HPC Tuning Recommendations: 64.35 (SE +/- 0.06, N = 3)
  1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

CP2K Molecular Dynamics 2024.3 - Input: H20-64 (Seconds, Fewer Is Better)
  Default: 14.18 (SE +/- 0.12, N = 4)
  HPC Tuning Recommendations: 14.31 (SE +/- 0.05, N = 4)
  1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm

CP2K Molecular Dynamics 2024.3 - Input: H20-256 (Seconds, Fewer Is Better)
  Default: 137.80 (SE +/- 0.29, N = 3)
  HPC Tuning Recommendations: 135.95 (SE +/- 0.47, N = 3)
  1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
