Microsoft Azure EPYC Milan-X HBv3 Benchmarks

Microsoft Azure HBv3 (Milan) versus HBv3 (Milan-X) benchmarking by Michael Larabel for a future article on Phoronix.com, looking at the performance of AMD EPYC Milan-X in the Microsoft Azure cloud across a variety of workloads.
HTML result view exported from: https://openbenchmarking.org/result/2203201-PTS-AZUREHBV49&grs .
Configurations tested: HBv3 64 Cores, HBv3 Milan-X 64 Cores, HBv3 120 Cores, HBv3 Milan-X 120 Cores

Processor:
  HBv3 64 Cores: 2 x AMD EPYC 7V13 64-Core (64 Cores)
  HBv3 Milan-X 64 Cores: 2 x AMD EPYC 7V73X 64-Core (64 Cores)
  HBv3 120 Cores: 2 x AMD EPYC 7V13 64-Core (120 Cores)
  HBv3 Milan-X 120 Cores: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Memory: 442GB
Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
Graphics: hyperv_fb
Network: Mellanox MT27710
OS: CentOS Linux 8
Kernel: 4.18.0-147.8.1.el8_1.x86_64 (x86_64)
Compiler: GCC 8.3.1 20190507
File-System: ext4
Screen Resolution: 1152x864
System Layer: microsoft

Kernel Details - Transparent Huge Pages: always
Compiler Details - --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Details - CPU Microcode: 0xffffffff
Python Details - Python 3.6.8
Security Details - SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling + tsx_async_abort: Not affected
Test | HBv3 64 Cores | HBv3 Milan-X 64 Cores | HBv3 120 Cores | HBv3 Milan-X 120 Cores
hpcc: Rand Ring Bandwidth | 1.82992 | 6.20242 | 0.76414 | 3.41538
ospray: San Miguel - Path Tracer | 4.32 | 4.93 | 7.39 | 8.50
hpcc: G-HPL | 99.56610 | 175.02700 | 89.35550 | 139.04400
graphics-magick: Noise-Gaussian | 585 | 874 | 721 | 1123
embree: Pathtracer - Asian Dragon | 41.8596 | 45.6833 | 64.3717 | 78.7710
embree: Pathtracer - Crown | 40.7800 | 46.0929 | 66.5049 | 75.9409
embree: Pathtracer ISPC - Crown | 38.9051 | 44.2523 | 63.2862 | 71.9802
embree: Pathtracer ISPC - Asian Dragon | 42.0015 | 45.6566 | 63.4384 | 76.5242
brl-cad: VGR Performance Metric | 618492 | 655183 | 1044368 | 1109486
ospray: NASA Streamlines - Path Tracer | 15.87 | 17.24 | 24.59 | 27.78
ospray: NASA Streamlines - SciVis | 71.43 | 83.33 | 111.11 | 125
ospray: XFrog Forest - Path Tracer | 5.70 | 6.20 | 9.09 | 9.90
ospray: Magnetic Reconnection - SciVis | 38.46 | 40 | 62.5 | 66.67
ospray: XFrog Forest - SciVis | 10.75 | 11.36 | 16.95 | 18.41
openfoam: Motorbike 60M | 89.65 | 65.50 | 80.60 | 54.03
compress-zstd: 19, Long Mode - Compression Speed | 39.8 | 59.8 | 36.6 | 52.8
ospray: San Miguel - SciVis | 52.63 | 55.56 | 83.33 | 85.86
askap: tConvolve MPI - Degridding | 35988.0 | 40896.3 | 41724.7 | 57881.4
rocksdb: Rand Read | 324822447 | 330410911 | 502728808 | 522387301
openvkl: vklBenchmark Scalar | 72 | 74 | 106 | 111
namd: ATPase Simulation - 327,506 Atoms | 0.41157 | 0.40802 | 0.27619 | 0.26900
relion: Basic - CPU | 418.479 | 414.541 | 312.797 | 274.353
askap: tConvolve MPI - Gridding | 38175.0 | 41160.1 | 41287.0 | 57042.5
openvkl: vklBenchmark ISPC | 120 | 126 | 166 | 177
incompact3d: X3D-benchmarking input.i3d | 348.114604 | 322.875112 | 287.761383 | 255.836411
lulesh: | 44262.227 | 54759.689 | 40262.206 | 47341.128
build-nodejs: Time To Compile | 96.348 | 93.795 | 75.714 | 72.452
build-linux-kernel: Time To Compile | 24.159 | 23.906 | 19.065 | 18.559
wrf: conus 2.5km | 10150.067 | 9294.703 | 8766.54 | 7804.46
gromacs: MPI CPU - water_GMX50_bare | 7.476 | 7.977 | 9.054 | 9.705
compress-zstd: 19 - Compression Speed | 85.1 | 106.2 | 82.0 | 93.8
lammps: 20k Atoms | 31.605 | 32.373 | 36.881 | 39.535
rocksdb: Read Rand Write Rand | 1357349 | 1381157 | 1587743 | 1684654
lammps: Rhodopsin Protein | 32.958 | 33.955 | 35.409 | 38.467
hpcc: Max Ping Pong Bandwidth | 17174.753 | 18347.866 | 15815.903 | 16082.148
nwchem: C240 Buckyball | 2256.6 | 2219.8 | 2557.1 | 2467.9
onnx: super-resolution-10 - CPU | 6107 | 6354 | 5852 | 6485
npb: CG.C | 20940.70 | 22323.23 | 20926.52 | 21914.51
hpcg: | 40.0233 | 41.1303 | 38.7180 | 39.4368
kripke: | 73635521 | 97373301 | 88201142 | 93936541
john-the-ripper: MD5 | 5697467 | 5913000 | 7143267 | 8141400
parboil: OpenMP CUTCP | 1.515548 | 1.127166 | 0.976470 | 0.847720
incompact3d: input.i3d 193 Cells Per Direction | 13.6781092 | 10.8780505 | 12.2859945 | 9.93829823
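The overview figures can be turned into Milan-X versus Milan ratios, but the direction of each metric matters: some tests report throughput (more is better) and others report run time (fewer is better). The following is a minimal sketch of that arithmetic for four of the tests, using values copied from the overview. The dictionary layout and the function name `milan_x_speedup` are illustrative assumptions and are not part of the Phoronix Test Suite.

```python
# Sketch: Milan-X vs. Milan ratio per test, taking metric direction into account.
# Values are copied from the result overview; names here are illustrative only.

RESULTS = {
    # test: (higher_is_better, HBv3 64, Milan-X 64, HBv3 120, Milan-X 120)
    "hpcc: G-HPL (GFLOPS)": (True, 99.5661, 175.027, 89.3555, 139.044),
    "openfoam: Motorbike 60M (s)": (False, 89.65, 65.50, 80.60, 54.03),
    "nwchem: C240 Buckyball (s)": (False, 2256.6, 2219.8, 2557.1, 2467.9),
    "npb: CG.C (Mop/s)": (True, 20940.70, 22323.23, 20926.52, 21914.51),
}


def milan_x_speedup(higher_is_better, milan, milan_x):
    """Return a ratio above 1.0 when Milan-X is faster, whatever the metric direction."""
    if higher_is_better:
        return milan_x / milan
    return milan / milan_x


for name, (hib, m64, x64, m120, x120) in RESULTS.items():
    s64 = milan_x_speedup(hib, m64, x64)
    s120 = milan_x_speedup(hib, m120, x120)
    print(f"{name}: 64 cores {s64:.2f}x, 120 cores {s120:.2f}x")
```

Inverting the ratio for time-based tests keeps every figure on the same "above 1.0 means Milan-X is faster" scale, so the rows can be compared directly.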
HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
HBv3 64 Cores: 1.82992; HBv3 120 Cores: 0.76414; HBv3 Milan-X 64 Cores: 6.20242; HBv3 Milan-X 120 Cores: 3.41538
1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
2. ATLAS + Open MPI 4.0.5
OSPray 1.8.5 - Demo: San Miguel - Renderer: Path Tracer (FPS, More Is Better)
HBv3 64 Cores: 4.32; HBv3 120 Cores: 7.39; HBv3 Milan-X 64 Cores: 4.93; HBv3 Milan-X 120 Cores: 8.50
SE +/- 0.01, N = 3; SE +/- 0.02, N = 3; SE +/- 0.02, N = 3; SE +/- 0.02, N = 3
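Each result is accompanied by an "SE +/- x, N = n" figure. Assuming this denotes the standard error of the mean over N runs, computed from the sample standard deviation (an assumption, since the export does not state the formula), the calculation can be sketched as follows. The per-run values below are hypothetical and are not taken from this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate the standard error")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical per-run FPS values, NOT taken from the result file.
runs = [4.30, 4.32, 4.34]
mean = statistics.mean(runs)
se = standard_error(runs)
print(f"mean = {mean:.2f}, SE +/- {se:.2f}, N = {len(runs)}")
```

A small SE relative to the mean suggests the runs were consistent. Results with a larger N (such as N = 15) indicate that more runs than the usual three were recorded for that test.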
HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
HBv3 64 Cores: 99.57; HBv3 120 Cores: 89.36; HBv3 Milan-X 64 Cores: 175.03; HBv3 Milan-X 120 Cores: 139.04
1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
2. ATLAS + Open MPI 4.0.5
GraphicsMagick 1.3.33 - Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
HBv3 64 Cores: 585; HBv3 120 Cores: 721; HBv3 Milan-X 64 Cores: 874; HBv3 Milan-X 120 Cores: 1123
SE +/- 6.24, N = 4; SE +/- 4.18, N = 3; SE +/- 4.48, N = 3; SE +/- 11.24, N = 15
1. (CC) gcc options: -fopenmp -O2 -pthread -ltiff -ljpeg -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Embree 3.13 - Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, More Is Better)
HBv3 64 Cores: 41.86; HBv3 120 Cores: 64.37; HBv3 Milan-X 64 Cores: 45.68; HBv3 Milan-X 120 Cores: 78.77
SE +/- 0.23, N = 3; SE +/- 0.28, N = 3; SE +/- 0.29, N = 3; SE +/- 0.21, N = 3
Embree 3.13 - Binary: Pathtracer - Model: Crown (Frames Per Second, More Is Better)
HBv3 64 Cores: 40.78; HBv3 120 Cores: 66.50; HBv3 Milan-X 64 Cores: 46.09; HBv3 Milan-X 120 Cores: 75.94
SE +/- 0.05, N = 3; SE +/- 0.17, N = 3; SE +/- 0.15, N = 3; SE +/- 0.07, N = 3
Embree 3.13 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
HBv3 64 Cores: 38.91; HBv3 120 Cores: 63.29; HBv3 Milan-X 64 Cores: 44.25; HBv3 Milan-X 120 Cores: 71.98
SE +/- 0.08, N = 3; SE +/- 0.16, N = 3; SE +/- 0.07, N = 3; SE +/- 0.18, N = 3
Embree 3.13 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
HBv3 64 Cores: 42.00; HBv3 120 Cores: 63.44; HBv3 Milan-X 64 Cores: 45.66; HBv3 Milan-X 120 Cores: 76.52
SE +/- 0.21, N = 3; SE +/- 0.14, N = 3; SE +/- 0.14, N = 3; SE +/- 0.12, N = 3
BRL-CAD 7.32.2 - VGR Performance Metric (VGR Performance Metric, More Is Better)
HBv3 64 Cores: 618492; HBv3 120 Cores: 1044368; HBv3 Milan-X 64 Cores: 655183; HBv3 Milan-X 120 Cores: 1109486
1. (CXX) g++ options: -std=c++11 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -pedantic -pthread -ldl -lm
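Because each processor was tested at both 64 and 120 cores, the results also show how well a workload scales with core count. The following is a minimal sketch that compares the measured gain against ideal linear scaling (120 / 64 = 1.875x), using the BRL-CAD figures above. The function name `scaling_efficiency` is an illustrative assumption, and the formula only applies to more-is-better scores.

```python
def scaling_efficiency(score_small, score_large, cores_small=64, cores_large=120):
    """Fraction of ideal linear scaling achieved by a more-is-better score."""
    speedup = score_large / score_small
    ideal = cores_large / cores_small
    return speedup / ideal


# BRL-CAD VGR Performance Metric values from the result above.
hbv3 = scaling_efficiency(618492, 1044368)
milan_x = scaling_efficiency(655183, 1109486)
print(f"HBv3: {hbv3:.0%} of linear, HBv3 Milan-X: {milan_x:.0%} of linear")
```

For fewer-is-better tests such as compile times, the two scores would need to be swapped before applying the same ratio.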
OSPray 1.8.5 - Demo: NASA Streamlines - Renderer: Path Tracer (FPS, More Is Better)
HBv3 64 Cores: 15.87; HBv3 120 Cores: 24.59; HBv3 Milan-X 64 Cores: 17.24; HBv3 Milan-X 120 Cores: 27.78
SE +/- 0.00, N = 3; SE +/- 0.20, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
OSPray 1.8.5 - Demo: NASA Streamlines - Renderer: SciVis (FPS, More Is Better)
HBv3 64 Cores: 71.43; HBv3 120 Cores: 111.11; HBv3 Milan-X 64 Cores: 83.33; HBv3 Milan-X 120 Cores: 125.00
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
OSPray 1.8.5 - Demo: XFrog Forest - Renderer: Path Tracer (FPS, More Is Better)
HBv3 64 Cores: 5.70; HBv3 120 Cores: 9.09; HBv3 Milan-X 64 Cores: 6.20; HBv3 Milan-X 120 Cores: 9.90
SE +/- 0.01, N = 3; SE +/- 0.00, N = 3; SE +/- 0.01, N = 3; SE +/- 0.00, N = 3
OSPray 1.8.5 - Demo: Magnetic Reconnection - Renderer: SciVis (FPS, More Is Better)
HBv3 64 Cores: 38.46; HBv3 120 Cores: 62.50; HBv3 Milan-X 64 Cores: 40.00; HBv3 Milan-X 120 Cores: 66.67
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
OSPray 1.8.5 - Demo: XFrog Forest - Renderer: SciVis (FPS, More Is Better)
HBv3 64 Cores: 10.75; HBv3 120 Cores: 16.95; HBv3 Milan-X 64 Cores: 11.36; HBv3 Milan-X 120 Cores: 18.41
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.11, N = 3
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
HBv3 64 Cores: 89.65; HBv3 120 Cores: 80.60; HBv3 Milan-X 64 Cores: 65.50; HBv3 Milan-X 120 Cores: 54.03
SE +/- 0.13, N = 3; SE +/- 0.05, N = 3; SE +/- 0.15, N = 3; SE +/- 0.22, N = 3
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfoamToVTK -ldynamicMesh -llagrangian -lgenericPatchFields -lfileFormats -lOpenFOAM -ldl -lm
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
HBv3 64 Cores: 39.8; HBv3 120 Cores: 36.6; HBv3 Milan-X 64 Cores: 59.8; HBv3 Milan-X 120 Cores: 52.8
SE +/- 0.34, N = 15; SE +/- 0.29, N = 3; SE +/- 0.64, N = 3; SE +/- 0.50, N = 15
1. (CC) gcc options: -O3 -pthread -lz -llzma
OSPray 1.8.5 - Demo: San Miguel - Renderer: SciVis (FPS, More Is Better)
HBv3 64 Cores: 52.63; HBv3 120 Cores: 83.33; HBv3 Milan-X 64 Cores: 55.56; HBv3 Milan-X 120 Cores: 85.86
SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.95, N = 15
ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
HBv3 64 Cores: 35988.0; HBv3 120 Cores: 41724.7; HBv3 Milan-X 64 Cores: 40896.3; HBv3 Milan-X 120 Cores: 57881.4
SE +/- 204.47, N = 3; SE +/- 146.90, N = 3; SE +/- 263.83, N = 3; SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
Facebook RocksDB 6.22.1 - Test: Random Read (Op/s, More Is Better)
HBv3 64 Cores: 324822447; HBv3 120 Cores: 502728808; HBv3 Milan-X 64 Cores: 330410911; HBv3 Milan-X 120 Cores: 522387301
SE +/- 95304.17, N = 3; SE +/- 4680557.08, N = 7; SE +/- 648314.38, N = 3; SE +/- 1522212.52, N = 3
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -O2 -fno-rtti -lgflags
OpenVKL 1.0 - Benchmark: vklBenchmark Scalar (Items / Sec, More Is Better)
HBv3 64 Cores: 72; HBv3 120 Cores: 106; HBv3 Milan-X 64 Cores: 74; HBv3 Milan-X 120 Cores: 111
SE +/- 0.88, N = 3; SE +/- 1.21, N = 9; SE +/- 0.67, N = 3; SE +/- 1.11, N = 6
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
HBv3 64 Cores: 0.41157; HBv3 120 Cores: 0.27619; HBv3 Milan-X 64 Cores: 0.40802; HBv3 Milan-X 120 Cores: 0.26900
SE +/- 0.00048, N = 3; SE +/- 0.00012, N = 3; SE +/- 0.00005, N = 3; SE +/- 0.00007, N = 3
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
HBv3 64 Cores: 418.48; HBv3 120 Cores: 312.80; HBv3 Milan-X 64 Cores: 414.54; HBv3 Milan-X 120 Cores: 274.35
SE +/- 1.03, N = 3; SE +/- 1.58, N = 3; SE +/- 0.68, N = 3; SE +/- 1.22, N = 3
1. (CXX) g++ options: -fopenmp -std=c++0x -O2 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -fexceptions -pthread -lmpi_cxx -lmpi
ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
HBv3 64 Cores: 38175.0; HBv3 120 Cores: 41287.0; HBv3 Milan-X 64 Cores: 41160.1; HBv3 Milan-X 120 Cores: 57042.5
SE +/- 400.79, N = 3; SE +/- 143.87, N = 3; SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OpenVKL 1.0 - Benchmark: vklBenchmark ISPC (Items / Sec, More Is Better)
HBv3 64 Cores: 120; HBv3 120 Cores: 166; HBv3 Milan-X 64 Cores: 126; HBv3 Milan-X 120 Cores: 177
SE +/- 0.88, N = 3; SE +/- 1.75, N = 5
Xcompact3d Incompact3d 2021-03-11 - Input: X3D-benchmarking input.i3d (Seconds, Fewer Is Better)
HBv3 64 Cores: 348.11; HBv3 120 Cores: 287.76; HBv3 Milan-X 64 Cores: 322.88; HBv3 Milan-X 120 Cores: 255.84
SE +/- 0.83, N = 3; SE +/- 0.24, N = 3; SE +/- 0.69, N = 3; SE +/- 0.52, N = 3
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
LULESH 2.0.3 (z/s, More Is Better)
HBv3 64 Cores: 44262.23; HBv3 120 Cores: 40262.21; HBv3 Milan-X 64 Cores: 54759.69; HBv3 Milan-X 120 Cores: 47341.13
SE +/- 36.72, N = 3; SE +/- 286.97, N = 3; SE +/- 258.47, N = 3; SE +/- 209.57, N = 3
1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi
Timed Node.js Compilation 15.11 - Time To Compile (Seconds, Fewer Is Better)
HBv3 64 Cores: 96.35; HBv3 120 Cores: 75.71; HBv3 Milan-X 64 Cores: 93.80; HBv3 Milan-X 120 Cores: 72.45
SE +/- 0.27, N = 3; SE +/- 0.28, N = 3; SE +/- 0.31, N = 3; SE +/- 0.32, N = 3
Timed Linux Kernel Compilation 5.14 - Time To Compile (Seconds, Fewer Is Better)
HBv3 64 Cores: 24.16; HBv3 120 Cores: 19.07; HBv3 Milan-X 64 Cores: 23.91; HBv3 Milan-X 120 Cores: 18.56
SE +/- 0.22, N = 7; SE +/- 0.16, N = 8; SE +/- 0.21, N = 8; SE +/- 0.12, N = 13
WRF 4.2.2 - Input: conus 2.5km (Seconds, Fewer Is Better)
HBv3 64 Cores: 10150.07; HBv3 120 Cores: 8766.54; HBv3 Milan-X 64 Cores: 9294.70; HBv3 Milan-X 120 Cores: 7804.46
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -fexceptions -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
HBv3 64 Cores: 7.476; HBv3 120 Cores: 9.054; HBv3 Milan-X 64 Cores: 7.977; HBv3 Milan-X 120 Cores: 9.705
SE +/- 0.061, N = 3; SE +/- 0.051, N = 3; SE +/- 0.020, N = 3; SE +/- 0.061, N = 3
1. (CXX) g++ options: -O2 -pthread
Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
HBv3 64 Cores: 85.1; HBv3 120 Cores: 82.0; HBv3 Milan-X 64 Cores: 106.2; HBv3 Milan-X 120 Cores: 93.8
SE +/- 0.93, N = 15; SE +/- 0.83, N = 6; SE +/- 1.05, N = 15; SE +/- 1.33, N = 3
1. (CC) gcc options: -O3 -pthread -lz -llzma
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
HBv3 64 Cores: 31.61; HBv3 120 Cores: 36.88; HBv3 Milan-X 64 Cores: 32.37; HBv3 Milan-X 120 Cores: 39.54
SE +/- 0.12, N = 3; SE +/- 0.22, N = 3; SE +/- 0.03, N = 3; SE +/- 0.01, N = 3
1. (CXX) g++ options: -O2 -pthread -lm
Facebook RocksDB 6.22.1 - Test: Read Random Write Random (Op/s, More Is Better)
HBv3 64 Cores: 1357349; HBv3 120 Cores: 1587743; HBv3 Milan-X 64 Cores: 1381157; HBv3 Milan-X 120 Cores: 1684654
SE +/- 10799.70, N = 3; SE +/- 5368.09, N = 3; SE +/- 6520.37, N = 3; SE +/- 6175.84, N = 3
1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -O2 -fno-rtti -lgflags
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better)
HBv3 64 Cores: 32.96; HBv3 120 Cores: 35.41; HBv3 Milan-X 64 Cores: 33.96; HBv3 Milan-X 120 Cores: 38.47
SE +/- 0.24, N = 3; SE +/- 0.11, N = 3; SE +/- 0.11, N = 3; SE +/- 0.29, N = 3
1. (CXX) g++ options: -O2 -pthread -lm
HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
HBv3 64 Cores: 17174.75; HBv3 120 Cores: 15815.90; HBv3 Milan-X 64 Cores: 18347.87; HBv3 Milan-X 120 Cores: 16082.15
1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
2. ATLAS + Open MPI 4.0.5
NWChem 7.0.2 - Input: C240 Buckyball (Seconds, Fewer Is Better)
HBv3 64 Cores: 2256.6; HBv3 120 Cores: 2557.1; HBv3 Milan-X 64 Cores: 2219.8; HBv3 Milan-X 120 Cores: 2467.9
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lcomex -lm -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
ONNX Runtime 1.9.1 - Model: super-resolution-10 - Device: CPU (Inferences Per Minute, More Is Better)
HBv3 64 Cores: 6107; HBv3 120 Cores: 5852; HBv3 Milan-X 64 Cores: 6354; HBv3 Milan-X 120 Cores: 6485
SE +/- 62.83, N = 3; SE +/- 56.15, N = 3; SE +/- 100.53, N = 9; SE +/- 117.46, N = 9
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O2 -flto -fno-fat-lto-objects -ldl -lrt -pthread -lpthread
NAS Parallel Benchmarks 3.4 - Test / Class: CG.C (Total Mop/s, More Is Better)
HBv3 64 Cores: 20940.70; HBv3 120 Cores: 20926.52; HBv3 Milan-X 64 Cores: 22323.23; HBv3 Milan-X 120 Cores: 21914.51
SE +/- 51.72, N = 3; SE +/- 26.14, N = 3; SE +/- 34.77, N = 3; SE +/- 70.02, N = 3
1. (F9X) gfortran options: -O3 -march=native -fexceptions -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HBv3 64 Cores: 40.02; HBv3 120 Cores: 38.72; HBv3 Milan-X 64 Cores: 41.13; HBv3 Milan-X 120 Cores: 39.44
SE +/- 0.01, N = 3; SE +/- 0.04, N = 3; SE +/- 0.09, N = 3; SE +/- 0.03, N = 3
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
Kripke 1.2.4 (Throughput FoM, More Is Better)
HBv3 64 Cores: 73635521; HBv3 120 Cores: 88201142; HBv3 Milan-X 64 Cores: 97373301; HBv3 Milan-X 120 Cores: 93936541
SE +/- 1812363.94, N = 15; SE +/- 2974209.65, N = 15; SE +/- 2522711.91, N = 15; SE +/- 2167036.06, N = 15
1. (CXX) g++ options: -O2 -fopenmp
John The Ripper 1.9.0-jumbo-1 - Test: MD5 (Real C/S, More Is Better)
HBv3 64 Cores: 5697467; HBv3 120 Cores: 7143267; HBv3 Milan-X 64 Cores: 5913000; HBv3 Milan-X 120 Cores: 8141400
SE +/- 54210.51, N = 15; SE +/- 283586.65, N = 15; SE +/- 10969.66, N = 3; SE +/- 271831.96, N = 15
1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -pthread -lm -lz -ldl -lcrypt -lbz2
Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
HBv3 64 Cores: 1.515548; HBv3 120 Cores: 0.976470; HBv3 Milan-X 64 Cores: 1.127166; HBv3 Milan-X 120 Cores: 0.847720
SE +/- 0.014450, N = 3; SE +/- 0.006046, N = 3; SE +/- 0.011448, N = 6; SE +/- 0.022647, N = 12
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
HBv3 64 Cores: 13.67810920; HBv3 120 Cores: 12.28599450; HBv3 Milan-X 64 Cores: 10.87805050; HBv3 Milan-X 120 Cores: 9.93829823
SE +/- 0.43535891, N = 15; SE +/- 0.02189533, N = 3; SE +/- 0.04593850, N = 3; SE +/- 0.02331367, N = 3
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Phoronix Test Suite v10.8.5