Microsoft Azure EPYC Milan-X HBv3 Benchmarks

Microsoft Azure HBv3 (Milan) versus HBv3 (Milan-X) benchmarking by Michael Larabel for a future article on Phoronix.com, looking at the performance of AMD EPYC Milan-X in the Microsoft Azure cloud across a variety of workloads.
HTML result view exported from: https://openbenchmarking.org/result/2203201-PTS-AZUREHBV49&sro&grr
System configuration

Processor:
  HBv3, 64 Cores: 2 x AMD EPYC 7V13 64-Core (64 Cores)
  HBv3 Milan-X, 64 Cores: 2 x AMD EPYC 7V73X 64-Core (64 Cores)
  HBv3, 120 Cores: 2 x AMD EPYC 7V13 64-Core (120 Cores)
  HBv3 Milan-X, 120 Cores: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Memory: 442GB
Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
Graphics: hyperv_fb
Network: Mellanox MT27710
OS: CentOS Linux 8
Kernel: 4.18.0-147.8.1.el8_1.x86_64 (x86_64)
Compiler: GCC 8.3.1 20190507
File-System: ext4
Screen Resolution: 1152x864
System Layer: microsoft

Kernel Details - Transparent Huge Pages: always
Compiler Details - --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Details - CPU Microcode: 0xffffffff
Python Details - Python 3.6.8
Security Details - SELinux
  + itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + spec_store_bypass: Vulnerable
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling
  + tsx_async_abort: Not affected
Result overview

Test | HBv3, 64 Cores | HBv3 Milan-X, 64 Cores | HBv3, 120 Cores | HBv3 Milan-X, 120 Cores
--- | --- | --- | --- | ---
hpcc: G-HPL | 99.56610 | 175.02700 | 89.35550 | 139.04400
wrf: conus 2.5km | 10150.067 | 9294.703 | 8766.54 | 7804.46
openvkl: vklBenchmark Scalar | 72 | 74 | 106 | 111
openvkl: vklBenchmark ISPC | 120 | 126 | 166 | 177
nwchem: C240 Buckyball | 2256.6 | 2219.8 | 2557.1 | 2467.9
relion: Basic - CPU | 418.479 | 414.541 | 312.797 | 274.353
incompact3d: X3D-benchmarking input.i3d | 348.114604 | 322.875112 | 287.761383 | 255.836411
brl-cad: VGR Performance Metric | 618492 | 655183 | 1044368 | 1109486
lammps: 20k Atoms | 31.605 | 32.373 | 36.881 | 39.535
john-the-ripper: MD5 | 5697467 | 5913000 | 7143267 | 8141400
onnx: super-resolution-10 - CPU | 6107 | 6354 | 5852 | 6485
kripke | 73635521 | 97373301 | 88201142 | 93936541
hpcg | 40.0233 | 41.1303 | 38.7180 | 39.4368
compress-zstd: 19 - Compression Speed | 85.1 | 106.2 | 82.0 | 93.8
openfoam: Motorbike 60M | 89.65 | 65.50 | 80.60 | 54.03
compress-zstd: 19, Long Mode - Compression Speed | 39.8 | 59.8 | 36.6 | 52.8
graphics-magick: Noise-Gaussian | 585 | 874 | 721 | 1123
build-nodejs: Time To Compile | 96.348 | 93.795 | 75.714 | 72.452
rocksdb: Rand Read | 324822447 | 330410911 | 502728808 | 522387301
build-linux-kernel: Time To Compile | 24.159 | 23.906 | 19.065 | 18.559
ospray: San Miguel - Path Tracer | 4.32 | 4.93 | 7.39 | 8.50
rocksdb: Read Rand Write Rand | 1357349 | 1381157 | 1587743 | 1684654
ospray: San Miguel - SciVis | 52.63 | 55.56 | 83.33 | 85.86
askap: tConvolve MPI - Gridding | 38175.0 | 41160.1 | 41287.0 | 57042.5
askap: tConvolve MPI - Degridding | 35988.0 | 40896.3 | 41724.7 | 57881.4
ospray: XFrog Forest - Path Tracer | 5.70 | 6.20 | 9.09 | 9.90
gromacs: MPI CPU - water_GMX50_bare | 7.476 | 7.977 | 9.054 | 9.705
namd: ATPase Simulation - 327,506 Atoms | 0.41157 | 0.40802 | 0.27619 | 0.26900
incompact3d: input.i3d 193 Cells Per Direction | 13.6781092 | 10.8780505 | 12.2859945 | 9.93829823
ospray: XFrog Forest - SciVis | 10.75 | 11.36 | 16.95 | 18.41
npb: CG.C | 20940.70 | 22323.23 | 20926.52 | 21914.51
embree: Pathtracer ISPC - Crown | 38.9051 | 44.2523 | 63.2862 | 71.9802
ospray: NASA Streamlines - Path Tracer | 15.87 | 17.24 | 24.59 | 27.78
embree: Pathtracer ISPC - Asian Dragon | 42.0015 | 45.6566 | 63.4384 | 76.5242
embree: Pathtracer - Crown | 40.7800 | 46.0929 | 66.5049 | 75.9409
embree: Pathtracer - Asian Dragon | 41.8596 | 45.6833 | 64.3717 | 78.7710
lulesh | 44262.227 | 54759.689 | 40262.206 | 47341.128
ospray: Magnetic Reconnection - SciVis | 38.46 | 40 | 62.5 | 66.67
ospray: NASA Streamlines - SciVis | 71.43 | 83.33 | 111.11 | 125
lammps: Rhodopsin Protein | 32.958 | 33.955 | 35.409 | 38.467
parboil: OpenMP CUTCP | 1.515548 | 1.127166 | 0.976470 | 0.847720
hpcc: Max Ping Pong Bandwidth | 17174.753 | 18347.866 | 15815.903 | 16082.148
hpcc: Rand Ring Bandwidth | 1.82992 | 6.20242 | 0.76414 | 3.41538
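Because the overview mixes "More Is Better" and "Fewer Is Better" metrics, raw values cannot be compared directly across tests. The following is a minimal Python sketch, not part of the exported result, that normalizes a handful of the 120-core figures into Milan-X over Milan speedup ratios and summarizes them with a geometric mean. The selection of tests, the `RESULTS` layout, and the helper names `speedup` and `geometric_mean` are illustrative assumptions; the numbers are copied from the overview above.

```python
from math import prod

# (test, which direction is better, HBv3 120 Cores, HBv3 Milan-X 120 Cores)
# Values are copied from the result overview; the choice of tests is arbitrary.
RESULTS = [
    ("openfoam: Motorbike 60M", "lower", 80.60, 54.03),
    ("askap: tConvolve MPI - Gridding", "higher", 41287.0, 57042.5),
    ("relion: Basic - CPU", "lower", 312.797, 274.353),
    ("hpcg", "higher", 38.7180, 39.4368),
    ("nwchem: C240 Buckyball", "lower", 2557.1, 2467.9),
]


def speedup(direction, baseline, candidate):
    """Return the candidate's speedup over the baseline (>1.0 means faster)."""
    if direction == "higher":   # throughput-style metric
        return candidate / baseline
    if direction == "lower":    # time-style metric
        return baseline / candidate
    raise ValueError(f"unknown direction: {direction!r}")


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = {name: speedup(d, base, cand) for name, d, base, cand in RESULTS}

for name, ratio in ratios.items():
    print(f"{name:35s} {ratio:.3f}x")
print(f"{'geometric mean':35s} {geometric_mean(ratios.values()):.3f}x")
```

A geometric mean is used rather than an arithmetic mean so that no single test with a large ratio dominates the summary; extending `RESULTS` to all tests only requires recording each metric's direction from its chart heading.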
HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 139.04
  HBv3 Milan-X, 64 Cores: 175.03
  HBv3, 120 Cores: 89.36
  HBv3, 64 Cores: 99.57
  1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
  2. ATLAS + Open MPI 4.0.5
WRF 4.2.2 - Input: conus 2.5km (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 7804.46
  HBv3 Milan-X, 64 Cores: 9294.70
  HBv3, 120 Cores: 8766.54
  HBv3, 64 Cores: 10150.07
  1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -fexceptions -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
OpenVKL 1.0 - Benchmark: vklBenchmark Scalar (Items / Sec, More Is Better)
  HBv3 Milan-X, 120 Cores: 111 (SE +/- 1.11, N = 6)
  HBv3 Milan-X, 64 Cores: 74 (SE +/- 0.67, N = 3)
  HBv3, 120 Cores: 106 (SE +/- 1.21, N = 9)
  HBv3, 64 Cores: 72 (SE +/- 0.88, N = 3)
OpenVKL 1.0 - Benchmark: vklBenchmark ISPC (Items / Sec, More Is Better)
  HBv3 Milan-X, 120 Cores: 177
  HBv3 Milan-X, 64 Cores: 126
  HBv3, 120 Cores: 166
  HBv3, 64 Cores: 120
  SE +/- 1.75, N = 5; SE +/- 0.88, N = 3
NWChem 7.0.2 - Input: C240 Buckyball (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 2467.9
  HBv3 Milan-X, 64 Cores: 2219.8
  HBv3, 120 Cores: 2557.1
  HBv3, 64 Cores: 2256.6
  1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lcomex -lm -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 274.35 (SE +/- 1.22, N = 3)
  HBv3 Milan-X, 64 Cores: 414.54 (SE +/- 0.68, N = 3)
  HBv3, 120 Cores: 312.80 (SE +/- 1.58, N = 3)
  HBv3, 64 Cores: 418.48 (SE +/- 1.03, N = 3)
  1. (CXX) g++ options: -fopenmp -std=c++0x -O2 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -fexceptions -pthread -lmpi_cxx -lmpi
Xcompact3d Incompact3d 2021-03-11 - Input: X3D-benchmarking input.i3d (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 255.84 (SE +/- 0.52, N = 3)
  HBv3 Milan-X, 64 Cores: 322.88 (SE +/- 0.69, N = 3)
  HBv3, 120 Cores: 287.76 (SE +/- 0.24, N = 3)
  HBv3, 64 Cores: 348.11 (SE +/- 0.83, N = 3)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
BRL-CAD 7.32.2 - VGR Performance Metric (VGR Performance Metric, More Is Better)
  HBv3 Milan-X, 120 Cores: 1109486
  HBv3 Milan-X, 64 Cores: 655183
  HBv3, 120 Cores: 1044368
  HBv3, 64 Cores: 618492
  1. (CXX) g++ options: -std=c++11 -pipe -fvisibility=hidden -fno-strict-aliasing -fno-common -fexceptions -ftemplate-depth-128 -m64 -ggdb3 -O3 -fipa-pta -fstrength-reduce -finline-functions -flto -pedantic -pthread -ldl -lm
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: 20k Atoms (ns/day, More Is Better)
  HBv3 Milan-X, 120 Cores: 39.54 (SE +/- 0.01, N = 3)
  HBv3 Milan-X, 64 Cores: 32.37 (SE +/- 0.03, N = 3)
  HBv3, 120 Cores: 36.88 (SE +/- 0.22, N = 3)
  HBv3, 64 Cores: 31.61 (SE +/- 0.12, N = 3)
  1. (CXX) g++ options: -O2 -pthread -lm
John The Ripper 1.9.0-jumbo-1 - Test: MD5 (Real C/S, More Is Better)
  HBv3 Milan-X, 120 Cores: 8141400 (SE +/- 271831.96, N = 15)
  HBv3 Milan-X, 64 Cores: 5913000 (SE +/- 10969.66, N = 3)
  HBv3, 120 Cores: 7143267 (SE +/- 283586.65, N = 15)
  HBv3, 64 Cores: 5697467 (SE +/- 54210.51, N = 15)
  1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -pthread -lm -lz -ldl -lcrypt -lbz2
ONNX Runtime 1.9.1 - Model: super-resolution-10 - Device: CPU (Inferences Per Minute, More Is Better)
  HBv3 Milan-X, 120 Cores: 6485 (SE +/- 117.46, N = 9)
  HBv3 Milan-X, 64 Cores: 6354 (SE +/- 100.53, N = 9)
  HBv3, 120 Cores: 5852 (SE +/- 56.15, N = 3)
  HBv3, 64 Cores: 6107 (SE +/- 62.83, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O2 -flto -fno-fat-lto-objects -ldl -lrt -pthread -lpthread
Kripke 1.2.4 (Throughput FoM, More Is Better)
  HBv3 Milan-X, 120 Cores: 93936541 (SE +/- 2167036.06, N = 15)
  HBv3 Milan-X, 64 Cores: 97373301 (SE +/- 2522711.91, N = 15)
  HBv3, 120 Cores: 88201142 (SE +/- 2974209.65, N = 15)
  HBv3, 64 Cores: 73635521 (SE +/- 1812363.94, N = 15)
  1. (CXX) g++ options: -O2 -fopenmp
High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 39.44 (SE +/- 0.03, N = 3)
  HBv3 Milan-X, 64 Cores: 41.13 (SE +/- 0.09, N = 3)
  HBv3, 120 Cores: 38.72 (SE +/- 0.04, N = 3)
  HBv3, 64 Cores: 40.02 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
Zstd Compression 1.5.0 - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 93.8 (SE +/- 1.33, N = 3)
  HBv3 Milan-X, 64 Cores: 106.2 (SE +/- 1.05, N = 15)
  HBv3, 120 Cores: 82.0 (SE +/- 0.83, N = 6)
  HBv3, 64 Cores: 85.1 (SE +/- 0.93, N = 15)
  1. (CC) gcc options: -O3 -pthread -lz -llzma
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 54.03 (SE +/- 0.22, N = 3)
  HBv3 Milan-X, 64 Cores: 65.50 (SE +/- 0.15, N = 3)
  HBv3, 120 Cores: 80.60 (SE +/- 0.05, N = 3)
  HBv3, 64 Cores: 89.65 (SE +/- 0.13, N = 3)
  1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfoamToVTK -ldynamicMesh -llagrangian -lgenericPatchFields -lfileFormats -lOpenFOAM -ldl -lm
Zstd Compression 1.5.0 - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 52.8 (SE +/- 0.50, N = 15)
  HBv3 Milan-X, 64 Cores: 59.8 (SE +/- 0.64, N = 3)
  HBv3, 120 Cores: 36.6 (SE +/- 0.29, N = 3)
  HBv3, 64 Cores: 39.8 (SE +/- 0.34, N = 15)
  1. (CC) gcc options: -O3 -pthread -lz -llzma
GraphicsMagick 1.3.33 - Operation: Noise-Gaussian (Iterations Per Minute, More Is Better)
  HBv3 Milan-X, 120 Cores: 1123 (SE +/- 11.24, N = 15)
  HBv3 Milan-X, 64 Cores: 874 (SE +/- 4.48, N = 3)
  HBv3, 120 Cores: 721 (SE +/- 4.18, N = 3)
  HBv3, 64 Cores: 585 (SE +/- 6.24, N = 4)
  1. (CC) gcc options: -fopenmp -O2 -pthread -ltiff -ljpeg -lX11 -llzma -lbz2 -lxml2 -lz -lm -lpthread
Timed Node.js Compilation 15.11 - Time To Compile (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 72.45 (SE +/- 0.32, N = 3)
  HBv3 Milan-X, 64 Cores: 93.80 (SE +/- 0.31, N = 3)
  HBv3, 120 Cores: 75.71 (SE +/- 0.28, N = 3)
  HBv3, 64 Cores: 96.35 (SE +/- 0.27, N = 3)
Facebook RocksDB 6.22.1 - Test: Random Read (Op/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 522387301 (SE +/- 1522212.52, N = 3)
  HBv3 Milan-X, 64 Cores: 330410911 (SE +/- 648314.38, N = 3)
  HBv3, 120 Cores: 502728808 (SE +/- 4680557.08, N = 7)
  HBv3, 64 Cores: 324822447 (SE +/- 95304.17, N = 3)
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -O2 -fno-rtti -lgflags
Timed Linux Kernel Compilation 5.14 - Time To Compile (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 18.56 (SE +/- 0.12, N = 13)
  HBv3 Milan-X, 64 Cores: 23.91 (SE +/- 0.21, N = 8)
  HBv3, 120 Cores: 19.07 (SE +/- 0.16, N = 8)
  HBv3, 64 Cores: 24.16 (SE +/- 0.22, N = 7)
OSPray 1.8.5 - Demo: San Miguel - Renderer: Path Tracer (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 8.50 (SE +/- 0.02, N = 3)
  HBv3 Milan-X, 64 Cores: 4.93 (SE +/- 0.02, N = 3)
  HBv3, 120 Cores: 7.39 (SE +/- 0.02, N = 3)
  HBv3, 64 Cores: 4.32 (SE +/- 0.01, N = 3)
Facebook RocksDB 6.22.1 - Test: Read Random Write Random (Op/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 1684654 (SE +/- 6175.84, N = 3)
  HBv3 Milan-X, 64 Cores: 1381157 (SE +/- 6520.37, N = 3)
  HBv3, 120 Cores: 1587743 (SE +/- 5368.09, N = 3)
  HBv3, 64 Cores: 1357349 (SE +/- 10799.70, N = 3)
  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -O2 -fno-rtti -lgflags
OSPray 1.8.5 - Demo: San Miguel - Renderer: SciVis (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 85.86 (SE +/- 0.95, N = 15)
  HBv3 Milan-X, 64 Cores: 55.56 (SE +/- 0.00, N = 3)
  HBv3, 120 Cores: 83.33 (SE +/- 0.00, N = 3)
  HBv3, 64 Cores: 52.63 (SE +/- 0.00, N = 3)
ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
  HBv3 Milan-X, 120 Cores: 57042.5
  HBv3 Milan-X, 64 Cores: 41160.1
  HBv3, 120 Cores: 41287.0
  HBv3, 64 Cores: 38175.0
  SE +/- 0.00, N = 3; SE +/- 143.87, N = 3; SE +/- 400.79, N = 3
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
  HBv3 Milan-X, 120 Cores: 57881.4 (SE +/- 0.00, N = 3)
  HBv3 Milan-X, 64 Cores: 40896.3 (SE +/- 263.83, N = 3)
  HBv3, 120 Cores: 41724.7 (SE +/- 146.90, N = 3)
  HBv3, 64 Cores: 35988.0 (SE +/- 204.47, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
OSPray 1.8.5 - Demo: XFrog Forest - Renderer: Path Tracer (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 9.90 (SE +/- 0.00, N = 3)
  HBv3 Milan-X, 64 Cores: 6.20 (SE +/- 0.01, N = 3)
  HBv3, 120 Cores: 9.09 (SE +/- 0.00, N = 3)
  HBv3, 64 Cores: 5.70 (SE +/- 0.01, N = 3)
GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
  HBv3 Milan-X, 120 Cores: 9.705 (SE +/- 0.061, N = 3)
  HBv3 Milan-X, 64 Cores: 7.977 (SE +/- 0.020, N = 3)
  HBv3, 120 Cores: 9.054 (SE +/- 0.051, N = 3)
  HBv3, 64 Cores: 7.476 (SE +/- 0.061, N = 3)
  1. (CXX) g++ options: -O2 -pthread
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 0.26900 (SE +/- 0.00007, N = 3)
  HBv3 Milan-X, 64 Cores: 0.40802 (SE +/- 0.00005, N = 3)
  HBv3, 120 Cores: 0.27619 (SE +/- 0.00012, N = 3)
  HBv3, 64 Cores: 0.41157 (SE +/- 0.00048, N = 3)
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 9.93829823 (SE +/- 0.02331367, N = 3)
  HBv3 Milan-X, 64 Cores: 10.87805050 (SE +/- 0.04593850, N = 3)
  HBv3, 120 Cores: 12.28599450 (SE +/- 0.02189533, N = 3)
  HBv3, 64 Cores: 13.67810920 (SE +/- 0.43535891, N = 15)
  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
OSPray 1.8.5 - Demo: XFrog Forest - Renderer: SciVis (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 18.41 (SE +/- 0.11, N = 3)
  HBv3 Milan-X, 64 Cores: 11.36 (SE +/- 0.00, N = 3)
  HBv3, 120 Cores: 16.95 (SE +/- 0.00, N = 3)
  HBv3, 64 Cores: 10.75 (SE +/- 0.00, N = 3)
NAS Parallel Benchmarks 3.4 - Test / Class: CG.C (Total Mop/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 21914.51 (SE +/- 70.02, N = 3)
  HBv3 Milan-X, 64 Cores: 22323.23 (SE +/- 34.77, N = 3)
  HBv3, 120 Cores: 20926.52 (SE +/- 26.14, N = 3)
  HBv3, 64 Cores: 20940.70 (SE +/- 51.72, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -fexceptions -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Embree 3.13 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  HBv3 Milan-X, 120 Cores: 71.98 (SE +/- 0.18, N = 3)
  HBv3 Milan-X, 64 Cores: 44.25 (SE +/- 0.07, N = 3)
  HBv3, 120 Cores: 63.29 (SE +/- 0.16, N = 3)
  HBv3, 64 Cores: 38.91 (SE +/- 0.08, N = 3)
OSPray 1.8.5 - Demo: NASA Streamlines - Renderer: Path Tracer (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 27.78 (SE +/- 0.00, N = 3)
  HBv3 Milan-X, 64 Cores: 17.24 (SE +/- 0.00, N = 3)
  HBv3, 120 Cores: 24.59 (SE +/- 0.20, N = 3)
  HBv3, 64 Cores: 15.87 (SE +/- 0.00, N = 3)
Embree 3.13 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  HBv3 Milan-X, 120 Cores: 76.52 (SE +/- 0.12, N = 3)
  HBv3 Milan-X, 64 Cores: 45.66 (SE +/- 0.14, N = 3)
  HBv3, 120 Cores: 63.44 (SE +/- 0.14, N = 3)
  HBv3, 64 Cores: 42.00 (SE +/- 0.21, N = 3)
Embree 3.13 - Binary: Pathtracer - Model: Crown (Frames Per Second, More Is Better)
  HBv3 Milan-X, 120 Cores: 75.94 (SE +/- 0.07, N = 3)
  HBv3 Milan-X, 64 Cores: 46.09 (SE +/- 0.15, N = 3)
  HBv3, 120 Cores: 66.50 (SE +/- 0.17, N = 3)
  HBv3, 64 Cores: 40.78 (SE +/- 0.05, N = 3)
Embree 3.13 - Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, More Is Better)
  HBv3 Milan-X, 120 Cores: 78.77 (SE +/- 0.21, N = 3)
  HBv3 Milan-X, 64 Cores: 45.68 (SE +/- 0.29, N = 3)
  HBv3, 120 Cores: 64.37 (SE +/- 0.28, N = 3)
  HBv3, 64 Cores: 41.86 (SE +/- 0.23, N = 3)
LULESH 2.0.3 (z/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 47341.13 (SE +/- 209.57, N = 3)
  HBv3 Milan-X, 64 Cores: 54759.69 (SE +/- 258.47, N = 3)
  HBv3, 120 Cores: 40262.21 (SE +/- 286.97, N = 3)
  HBv3, 64 Cores: 44262.23 (SE +/- 36.72, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi
OSPray 1.8.5 - Demo: Magnetic Reconnection - Renderer: SciVis (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 66.67
  HBv3 Milan-X, 64 Cores: 40.00
  HBv3, 120 Cores: 62.50
  HBv3, 64 Cores: 38.46
  SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
OSPray 1.8.5 - Demo: NASA Streamlines - Renderer: SciVis (FPS, More Is Better)
  HBv3 Milan-X, 120 Cores: 125.00
  HBv3 Milan-X, 64 Cores: 83.33
  HBv3, 120 Cores: 111.11
  HBv3, 64 Cores: 71.43
  SE +/- 0.00, N = 3; SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
LAMMPS Molecular Dynamics Simulator 29Oct2020 - Model: Rhodopsin Protein (ns/day, More Is Better)
  HBv3 Milan-X, 120 Cores: 38.47 (SE +/- 0.29, N = 3)
  HBv3 Milan-X, 64 Cores: 33.96 (SE +/- 0.11, N = 3)
  HBv3, 120 Cores: 35.41 (SE +/- 0.11, N = 3)
  HBv3, 64 Cores: 32.96 (SE +/- 0.24, N = 3)
  1. (CXX) g++ options: -O2 -pthread -lm
Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
  HBv3 Milan-X, 120 Cores: 0.847720 (SE +/- 0.022647, N = 12)
  HBv3 Milan-X, 64 Cores: 1.127166 (SE +/- 0.011448, N = 6)
  HBv3, 120 Cores: 0.976470 (SE +/- 0.006046, N = 3)
  HBv3, 64 Cores: 1.515548 (SE +/- 0.014450, N = 3)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 16082.15
  HBv3 Milan-X, 64 Cores: 18347.87
  HBv3, 120 Cores: 15815.90
  HBv3, 64 Cores: 17174.75
  1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
  2. ATLAS + Open MPI 4.0.5
HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
  HBv3 Milan-X, 120 Cores: 3.41538
  HBv3 Milan-X, 64 Cores: 6.20242
  HBv3, 120 Cores: 0.76414
  HBv3, 64 Cores: 1.82992
  1. (CC) gcc options: -lblas -lm -fexceptions -pthread -lmpi
  2. ATLAS + Open MPI 4.0.5
Phoronix Test Suite v10.8.5