mpitest
AMD Ryzen Threadripper PRO 3995WX 64-Cores testing with a GIGABYTE WRX80-SU8 N/A (WRX80SU8-F2 BIOS) and Gigabyte NVIDIA GeForce RTX 3080 Ti 12GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2501206-NE-MPITEST7573&grs .
mpitest: system details for AMD Ryzen Threadripper PRO 3995WX 64-Cores
  Processor: AMD Ryzen Threadripper PRO 3995WX 64-Cores @ 2.70GHz (64 Cores / 128 Threads)
  Motherboard: GIGABYTE WRX80-SU8 N/A (WRX80SU8-F2 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 4 x 16GB DDR4-2133MT/s CM4X16GC3200C16K2E
  Disk: 1000GB CT1000P3SSD8 + 1000GB KINGSTON SNV3S1000G
  Graphics: Gigabyte NVIDIA GeForce RTX 3080 Ti 12GB
  Audio: AMD Starship/Matisse
  Monitor: 2 x LG TV
  Network: 2 x Realtek RTL8111/8168/8211/8411 + 2 x Intel I210 + 2 x Intel X550 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 24.04
  Kernel: 6.8.0-51-generic (x86_64)
  Desktop: GNOME Shell 46.0
  Display Server: X Server 1.21.1.11
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.131
  Compiler: GCC 13.3.0 + CUDA 12.6
  File-System: ext4
  Screen Resolution: 1920x1080
Notes
  - Transparent Huge Pages: madvise
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
  - CPU Microcode: 0x830107c
  - Python 3.12.4
  - Security mitigations:
      gather_data_sampling: Not affected
      itlb_multihit: Not affected
      l1tf: Not affected
      mds: Not affected
      meltdown: Not affected
      mmio_stale_data: Not affected
      reg_file_data_sampling: Not affected
      retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
      spec_rstack_overflow: Mitigation of Safe RET
      spec_store_bypass: Mitigation of SSB disabled via prctl
      spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
      spectre_v2: Mitigation of Retpolines; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
      srbds: Not affected
      tsx_async_abort: Not affected
mpitest: result overview for AMD Ryzen Threadripper PRO 3995WX 64-Cores
  gpaw: Carbon Nanotube                              136.787
  gromacs: NVIDIA CUDA GPU - water_GMX50_bare        23.801
  gromacs: MPI CPU - water_GMX50_bare                3.171
  intel-mpi: IMB-MPI1 Sendrecv (Average usec)        292.17
  intel-mpi: IMB-MPI1 Sendrecv (Average Mbytes/sec)  1859.20
  intel-mpi: IMB-MPI1 PingPong                       2411.51
  intel-mpi: IMB-MPI1 Exchange (Average usec)        557.43
  intel-mpi: IMB-MPI1 Exchange (Average Mbytes/sec)  2490.11
  intel-mpi: IMB-P2P PingPong                        25366668
  lammps: Rhodopsin Protein                          21.838
  lammps: 20k Atoms                                  24.598
  mocassin: Dust 2D tau100.0                         145.597
  mocassin: Gas HII40                                19.249
  incompact3d: input.i3d 193 Cells Per Direction     59.3955548
  incompact3d: input.i3d 129 Cells Per Direction     13.4860032
  mrbayes: Primate Phylogeny Analysis                131.461
  pennant: leblancbig                                5.925810
  pennant: sedovbig                                  17.35116
  minife: Small                                      6809.84
  hpcc: Max Ping Pong Bandwidth                      9246.266
  hpcc: Rand Ring Bandwidth                          0.71653
  hpcc: Rand Ring Latency                            1.14125
  hpcc: G-Rand Access                                0.19271
  hpcc: EP-STREAM Triad                              0.56827
  hpcc: G-Ptrans                                     2.54122
  hpcc: EP-DGEMM                                     7.25358
  hpcc: G-Ffte                                       11.19347
  hpcc: G-HPL                                        152.53333
  npb: SP.C                                          16955.27
  npb: SP.B                                          30075.20
  npb: MG.C                                          19608.97
  npb: LU.C                                          49703.3
  npb: IS.D                                          853.88
  npb: FT.C                                          21478.66
  npb: EP.D                                          5322.84
  npb: EP.C                                          5140.81
  npb: CG.C                                          5819.63
  npb: BT.C                                          49906.82
  hpcg: 104 104 104 - 1800                           6.33226
  askap: tConvolve MPI - Gridding                    10871.70
  askap: tConvolve MPI - Degridding                  11744.39
  hpcg: 104 104 104 - 60                             5.59916
GPAW 23.6 - Input: Carbon Nanotube
  Seconds, Fewer Is Better: 136.79 (SE +/- 0.41, N = 3)
  1. (CC) gcc options: -pthread -shared -lxc -lblas -lmpi -fno-strict-overflow -O2 -fPIC -isystem -UNDEBUG -std=c99
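Each result on this page is reported as a mean with "SE +/- x, N = n". The page does not define SE; the sketch below assumes it is the standard error of the mean over the N runs, so that SE = s / sqrt(N) for a sample standard deviation s. Under that assumption it derives the relative error and the implied run-to-run spread for the GPAW result. The function names are illustrative only and are not part of the Phoronix Test Suite.

```python
import math


def relative_se_percent(mean: float, se: float) -> float:
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def implied_sample_stddev(se: float, n: int) -> float:
    """Sample standard deviation implied by SE = s / sqrt(N) (assumed definition)."""
    return se * math.sqrt(n)


# GPAW, Carbon Nanotube, as reported on this page: 136.79 s, SE +/- 0.41, N = 3
gpaw_rel = relative_se_percent(136.79, 0.41)
gpaw_sd = implied_sample_stddev(0.41, 3)
print(f"relative SE: {gpaw_rel:.2f}%  implied stddev: {gpaw_sd:.2f} s")
```

The same two helpers apply to any other result line here; tests with larger N (for example N = 12 or N = 15) were presumably repeated because their early runs varied more, though the page does not state the rule used.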
GROMACS 2024 - Input: water_GMX50_bare
  Ns Per Day, More Is Better
  Implementation: NVIDIA CUDA GPU: 23.80 (SE +/- 0.01, N = 3)
  Implementation: MPI CPU: 3.171 (SE +/- 0.036, N = 3)
  1. (CXX) g++ options: -O3 -lm
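The two GROMACS results share the same input and unit, so they can be compared directly. As a small worked example, the sketch below converts ns/day into wall-clock hours per simulated nanosecond and computes the ratio between the CUDA GPU and MPI CPU runs, using the overview values 23.801 and 3.171. The helper name is hypothetical, and the ratio describes only this system and this input.

```python
def hours_per_ns(ns_per_day: float) -> float:
    """Wall-clock hours needed to simulate one nanosecond at the given rate."""
    return 24.0 / ns_per_day


# Values from the result overview on this page (ns/day, more is better).
gpu_ns_day = 23.801   # Implementation: NVIDIA CUDA GPU
cpu_ns_day = 3.171    # Implementation: MPI CPU

gpu_over_cpu = gpu_ns_day / cpu_ns_day
print(f"GPU run: {hours_per_ns(gpu_ns_day):.2f} h per ns")
print(f"CPU run: {hours_per_ns(cpu_ns_day):.2f} h per ns")
print(f"GPU / CPU throughput ratio: {gpu_over_cpu:.2f}x")
```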
Intel MPI Benchmarks 2019.3
  Test: IMB-MPI1 Sendrecv, Average usec, Fewer Is Better: 292.17 (SE +/- 2.93, N = 3; MIN: 0.5 / MAX: 9957.71)
  Test: IMB-MPI1 Sendrecv, Average Mbytes/sec, More Is Better: 1859.20 (SE +/- 6.66, N = 3; MAX: 6882.76)
  Test: IMB-MPI1 PingPong, Average Mbytes/sec, More Is Better: 2411.51 (SE +/- 10.55, N = 3; MIN: 4.38 / MAX: 7065.21)
  Test: IMB-MPI1 Exchange, Average usec, Fewer Is Better: 557.43 (SE +/- 8.21, N = 3; MIN: 0.97 / MAX: 22328.95)
  Test: IMB-MPI1 Exchange, Average Mbytes/sec, More Is Better: 2490.11 (SE +/- 29.82, N = 3; MAX: 9884.84)
  Test: IMB-P2P PingPong, Average Msg/sec, More Is Better: 25366668 (SE +/- 223826.40, N = 3; MIN: 2089 / MAX: 67855108)
  1. (CXX) g++ options: -O0 -pedantic -fopenmp -lmpi_cxx -lmpi
LAMMPS Molecular Dynamics Simulator 23Jun2022
  ns/day, More Is Better
  Model: Rhodopsin Protein: 21.84 (SE +/- 0.24, N = 5)
  Model: 20k Atoms: 24.60 (SE +/- 0.32, N = 3)
  1. (CXX) g++ options: -O3 -lm -ldl
Monte Carlo Simulations of Ionised Nebulae 2.02.73.3
  Seconds, Fewer Is Better
  Input: Dust 2D tau100.0: 145.60 (SE +/- 0.98, N = 3)
  Input: Gas HII40: 19.25 (SE +/- 0.08, N = 3)
  1. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O2 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lz
Xcompact3d Incompact3d 2021-03-11
  Seconds, Fewer Is Better
  Input: input.i3d 193 Cells Per Direction: 59.40 (SE +/- 0.13, N = 3)
  Input: input.i3d 129 Cells Per Direction: 13.49 (SE +/- 0.10, N = 3)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis
  Seconds, Fewer Is Better: 131.46 (SE +/- 0.17, N = 3)
  1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm
Pennant 1.0.1
  Hydro Cycle Time - Seconds, Fewer Is Better
  Test: leblancbig: 5.925810 (SE +/- 0.074192, N = 15)
  Test: sedovbig: 17.35 (SE +/- 0.22, N = 4)
  1. (CXX) g++ options: -fopenmp -lmpi_cxx -lmpi
miniFE 2.2 - Problem Size: Small
  CG Mflops, More Is Better: 6809.84 (SE +/- 73.24, N = 5)
  1. (CXX) g++ options: -O3 -fopenmp -lmpi_cxx -lmpi
HPC Challenge 1.5.0
  Test / Class: Max Ping Pong Bandwidth, MB/s, More Is Better: 9246.27 (SE +/- 9.69, N = 3)
  Test / Class: Random Ring Bandwidth, GB/s, More Is Better: 0.71653 (SE +/- 0.01836, N = 3)
  Test / Class: Random Ring Latency, usecs, Fewer Is Better: 1.14125 (SE +/- 0.00848, N = 3)
  Test / Class: G-Random Access, GUP/s, More Is Better: 0.19271 (SE +/- 0.00082, N = 3)
  Test / Class: EP-STREAM Triad, GB/s, More Is Better: 0.56827 (SE +/- 0.00036, N = 3)
  Test / Class: G-Ptrans, GB/s, More Is Better: 2.54122 (SE +/- 0.01461, N = 3)
  Test / Class: EP-DGEMM, GFLOPS, More Is Better: 7.25358 (SE +/- 0.08554, N = 3)
  Test / Class: G-Ffte, GFLOPS, More Is Better: 11.19 (SE +/- 0.01, N = 3)
  Test / Class: G-HPL, GFLOPS, More Is Better: 152.53 (SE +/- 0.48, N = 3)
  1. (CC) gcc options: -lblas -lm -lmpi -fomit-frame-pointer -funroll-loops
  2. OpenBLAS + Open MPI 4.1.6
NAS Parallel Benchmarks 3.4
  Total Mop/s, More Is Better
  Test / Class: SP.C: 16955.27 (SE +/- 70.76, N = 3)
  Test / Class: SP.B: 30075.20 (SE +/- 331.56, N = 4)
  Test / Class: MG.C: 19608.97 (SE +/- 58.72, N = 3)
  Test / Class: LU.C: 49703.3 (SE +/- 171.34, N = 3)
  Test / Class: IS.D: 853.88 (SE +/- 5.18, N = 3)
  Test / Class: FT.C: 21478.66 (SE +/- 53.71, N = 3)
  Test / Class: EP.D: 5322.84 (SE +/- 13.91, N = 3)
  Test / Class: EP.C: 5140.81 (SE +/- 57.80, N = 3)
  Test / Class: CG.C: 5819.63 (SE +/- 70.34, N = 4)
  Test / Class: BT.C: 49906.82 (SE +/- 287.94, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  2. Open MPI 4.1.6
High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 1800
  GFLOP/s, More Is Better: 6.33226 (SE +/- 0.00769, N = 3)
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
ASKAP 1.0
  Mpix/sec, More Is Better
  Test: tConvolve MPI - Gridding: 10871.70 (SE +/- 392.31, N = 12)
  Test: tConvolve MPI - Degridding: 11744.39 (SE +/- 430.92, N = 12)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60
  GFLOP/s, More Is Better: 5.59916 (SE +/- 0.17243, N = 9)
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
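HPCG appears twice on this page with the same 104 104 104 grid but different run-time settings (RT: 1800 and RT: 60), and the two means differ. One rough way to judge whether that gap is larger than the run-to-run noise is to divide the difference by the combined standard error. The sketch below does this under two assumptions that the page does not confirm: that SE is the standard error of the mean, and that the two sets of runs are independent. The function name is hypothetical, and the result is an informal indicator, not a formal significance test.

```python
import math


def separation_in_se(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference of two means in units of their combined standard error.

    Assumes independent samples, so the standard errors add in quadrature.
    """
    return (mean_a - mean_b) / math.hypot(se_a, se_b)


# HPCG 3.1, X Y Z: 104 104 104, GFLOP/s as reported on this page.
rt1800_mean, rt1800_se = 6.33226, 0.00769   # RT: 1800, N = 3
rt60_mean, rt60_se = 5.59916, 0.17243       # RT: 60,   N = 9

percent_gap = 100.0 * (rt1800_mean - rt60_mean) / rt60_mean
separation = separation_in_se(rt1800_mean, rt1800_se, rt60_mean, rt60_se)
print(f"RT 1800 is {percent_gap:.1f}% higher than RT 60")
print(f"gap is about {separation:.1f} combined standard errors")
```

Nearly all of the combined error comes from the RT: 60 runs, whose SE is much larger relative to the mean than that of the RT: 1800 runs.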
Phoronix Test Suite v10.8.5