Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HC

Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb

OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: CPU Microcode: 0xffffffff
Python Notes: Python 3.6.8
Security Notes: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
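
On Linux, per-vulnerability status strings like the ones in the Security Notes are exposed by the kernel under /sys/devices/system/cpu/vulnerabilities. The following is a minimal sketch of collecting them into the same "name: status + name: status" layout. It assumes that sysfs directory as the source, and the helper names read_vulnerabilities and format_notes are hypothetical. Raw kernel strings may differ slightly from the wording shown on this page.

```python
from pathlib import Path

# Assumed default location of the kernel's vulnerability status files.
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(directory=SYSFS_VULN_DIR):
    """Return {name: status} for each file in the given directory."""
    statuses = {}
    directory = Path(directory)
    if not directory.is_dir():
        return statuses  # Not Linux, or the kernel does not expose the files.
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            statuses[entry.name] = entry.read_text().strip()
    return statuses


def format_notes(statuses):
    """Join statuses in the 'name: status + name: status' layout."""
    return " + ".join(f"{name}: {status}" for name, status in statuses.items())


if __name__ == "__main__":
    print(format_notes(read_vulnerabilities()) or "no vulnerability data found")
```

Passing a different directory to read_vulnerabilities makes the helper usable on a copied snapshot of the sysfs files rather than the live system.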

HBv2

Changed Processor to 2 x AMD EPYC 7V12 64-Core (120 Cores).

Changed Memory to 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB.

Changed Disk to 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk.

Security Change: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
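
Because the security strings are long, the entries that actually differ between two systems are easier to see programmatically. The sketch below parses two such strings and reports only the changed entries. The function names are hypothetical, and the inputs are shortened excerpts of the HC and HBv2 notes listed above, not the full strings.

```python
def parse_notes(notes):
    """Split 'name: status + name: status' into a {name: status} dict."""
    parsed = {}
    for item in notes.split(" + "):
        # Split on the first ': ' only, since a status may contain ': ' itself.
        name, _, status = item.partition(": ")
        parsed[name.strip()] = status.strip()
    return parsed


def changed_entries(before, after):
    """Return {name: (old, new)} for entries whose status differs."""
    old, new = parse_notes(before), parse_notes(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Shortened excerpts of the HC and HBv2 notes listed above.
hc = "l1tf: Mitigation of PTE Inversion + meltdown: Mitigation of PTI + srbds: Not affected"
hbv2 = "l1tf: Not affected + meltdown: Not affected + srbds: Not affected"

for name, (old, new) in changed_entries(hc, hbv2).items():
    print(f"{name}: {old} -> {new}")
```

With these excerpts, only l1tf and meltdown are reported, since srbds is "Not affected" on both systems.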

HBv3

Changed Processor to 2 x AMD EPYC 7V73X 64-Core (120 Cores).

Changed Disk to 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk.

Security Change: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

HBv4

Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB, Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft
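
The Memory fields above list individual regions rather than a total. As a rough worked example, the sketch below sums such a string into a single figure. It assumes 1 GB = 1024 MB for the mixed-unit HC entry, and the function name total_memory_gb is hypothetical.

```python
def total_memory_gb(memory):
    """Sum a 'N GB + M MB + ...' string and return the total in GB."""
    units_in_gb = {"GB": 1.0, "MB": 1.0 / 1024}  # assumes 1 GB = 1024 MB
    total = 0.0
    for part in memory.split("+"):
        value, unit = part.split()
        total += float(value) * units_in_gb[unit]
    return total


# Memory strings copied from the system descriptions above.
systems = {
    "HC": "1 GB + 60928 MB + 118272 MB + 176 GB",
    "HBv2": "1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB",
    "HBv4": "1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB",
}

for name, memory in systems.items():
    print(f"{name}: {total_memory_gb(memory):g} GB")
```

Under that unit assumption, the totals come to 352 GB for HC, 456 GB for HBv2, and 704 GB for HBv4 as reported to the guest.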

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

Blender

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a newer scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.
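
For readers unfamiliar with the method the benchmark is named after, here is a minimal, unpreconditioned conjugate gradient sketch in plain Python. It is not the benchmark's code or problem setup: the 1D tridiagonal operator, the problem size, and the function names are assumptions chosen only to keep the example small.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with the initial guess x = 0
    p = list(r)  # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # residual norm is small enough
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) matrix, a simple sparse SPD operator."""
    n = len(v)
    return [
        2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


b = [1.0] * 8
x = conjugate_gradient(laplacian_1d, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, laplacian_1d(x)))
print(f"max residual: {residual:.2e}")
```

The solver only needs a matrix-vector product, which is why workloads of this kind tend to stress memory bandwidth and sparse access patterns more than dense floating-point throughput.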

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
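
DGEMM is the double-precision general matrix multiply, C = alpha * A * B + beta * C, and throughput for it is commonly expressed in floating-point operations per second. The sketch below is a single-threaded, pure-Python illustration of the operation and of the usual 2 * n^3 operation count for square matrices. It is not the benchmark's implementation, the matrix size is a hypothetical choice, and the resulting number reflects interpreter overhead rather than hardware capability.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for row-major lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    b_cols = list(zip(*b))  # columns of B, so each dot product walks two sequences
    return [
        [
            alpha * sum(a[i][p] * b_cols[j][p] for p in range(k)) + beta * c[i][j]
            for j in range(m)
        ]
        for i in range(n)
    ]


n = 64  # hypothetical matrix size, kept small for pure Python
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
c = [[0.0] * n for _ in range(n)]

start = time.perf_counter()
result = dgemm(1.0, a, b, 0.0, c)
elapsed = time.perf_counter() - start

flops = 2 * n ** 3  # one multiply and one add per inner-loop step
print(f"{flops / elapsed / 1e9:.4f} GFLOP/s for n={n}")
```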

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

95 Results Shown

HC

OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Testing initiated at 4 July 2023 19:25.

HBv2

Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: CPU Microcode: 0xffffffff
Python Notes: Python 3.6.8
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Testing initiated at 3 July 2023 12:09.

HBv3

Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: CPU Microcode: 0xffffffff
Python Notes: Python 3.6.8
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Testing initiated at 2 July 2023 12:27.

HBv4

OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: CPU Microcode: 0xffffffff
Python Notes: Python 3.6.8
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Testing initiated at 1 July 2023 11:02.
