Microsoft Azure HBv4 HPC Performance Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HC

Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb

OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

Kernel Notes: Transparent Huge Pages: always
Environment Notes: CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"
Compiler Notes: --disable-multilib --enable-checking=release
Processor Notes: CPU Microcode: 0xffffffff
Python Notes: Python 3.6.8
Security Notes: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown

HBv2

Changed Processor to 2 x AMD EPYC 7V12 64-Core (120 Cores).

Changed Memory to 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB.

Changed Disk to 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk.

Security Change: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

HBv3

Changed Processor to 2 x AMD EPYC 7V73X 64-Core (120 Cores).

Changed Disk to 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk.

Security Change: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

HBv4

Changed Processor to 2 x AMD EPYC 9V33X 96-Core (176 Cores).

Changed Memory to 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB.

Changed Disk to 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk.

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a newer scientific benchmark from Sandia National Labs aimed at testing supercomputers with workloads closer to modern real-world applications than HPCC. Learn more via the OpenBenchmarking.org test page.
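To make the benchmark's core kernel concrete, here is a minimal sketch of the plain conjugate gradient iteration for a symmetric positive-definite system. It is an illustration only: HPCG itself runs a preconditioned, distributed solver on a 3D sparse problem, whereas this sketch is serial, unpreconditioned, and uses a small 1D stand-in matrix. The names conjugate_gradient and laplacian_1d are hypothetical and do not come from the benchmark.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0 initially
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm is small enough
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) matrix, a small sparse SPD stand-in."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


b = [1.0] * 8
x = conjugate_gradient(laplacian_1d, b)
# Largest absolute entry of b - A x, used to check the solution.
residual = max(abs(bi - ai) for bi, ai in zip(b, laplacian_1d(x)))
```

The solver only needs a matrix-vector product, which is why sparse matrix-vector throughput and memory bandwidth tend to dominate this kind of workload.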

NAS Parallel Benchmarks

NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and problem sizes. Learn more via the OpenBenchmarking.org test page.

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently targets CPU/processor testing. Learn more via the OpenBenchmarking.org test page.
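The test configurations distinguish complex-to-complex (c2c) from real-to-complex (r2c) transforms. The sketch below illustrates the difference with a naive 1D discrete Fourier transform: for real input the spectrum is conjugate-symmetric, so an r2c transform only needs to return n/2 + 1 outputs. This is not how HeFFTe works internally (it performs distributed multi-dimensional FFTs on top of backends such as FFTW); the function names dft and r2c are hypothetical and the O(n^2) loop is for illustration only.

```python
import cmath


def dft(signal):
    """Naive O(n^2) complex-to-complex discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def r2c(real_signal):
    """Real-to-complex transform: keep only the non-redundant half spectrum."""
    return dft(real_signal)[: len(real_signal) // 2 + 1]


real_signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
full = dft(real_signal)   # c2c on real data: all 8 outputs
half = r2c(real_signal)   # r2c: 8 // 2 + 1 = 5 outputs

# Conjugate symmetry of a real signal's spectrum: X[n - k] == conj(X[k]).
n = len(real_signal)
symmetric = all(
    abs(full[n - k] - full[k].conjugate()) < 1e-9 for k in range(1, n)
)
```

Because roughly half of the outputs are redundant for real input, r2c and c2c results for the same grid size should not be compared as if they did the same amount of work.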

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
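For readers unfamiliar with the operation, DGEMM is the BLAS double-precision general matrix multiply, C = alpha * A * B + beta * C. The sketch below is a single-threaded, pure-Python illustration of that operation and of the usual way a GFLOPS figure is derived from it (2 * m * n * k floating-point operations divided by elapsed time). It is not the ACES DGEMM code, and the function name dgemm and the matrix size are assumptions chosen for illustration.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Return alpha * A * B + beta * C for matrices given as lists of rows."""
    b_columns = list(zip(*b))  # transpose B once so columns are contiguous
    return [
        [
            alpha * sum(x * y for x, y in zip(row, col)) + beta * c[i][j]
            for j, col in enumerate(b_columns)
        ]
        for i, row in enumerate(a)
    ]


# Small hand-checkable case.
small = dgemm(2.0, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]],
              1.0, [[1.0, 1.0], [1.0, 1.0]])

# Timed case: multiply by the identity so the result is easy to verify.
n = 64
a = [[float(i + j) for j in range(n)] for i in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
zeros = [[0.0] * n for _ in range(n)]

start = time.perf_counter()
c = dgemm(1.0, a, identity, 0.0, zeros)
elapsed = time.perf_counter() - start

# A square n x n multiply performs about 2 * n^3 floating-point operations.
gflops = 2.0 * n ** 3 / elapsed / 1e9
```

An interpreted loop like this reaches only a tiny fraction of what an optimized, multi-threaded BLAS achieves, so the number it prints says nothing about the systems tested here.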

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

PostgreSQL

This is a benchmark of PostgreSQL that uses the integrated pgbench tool to drive the database workload. Learn more via the OpenBenchmarking.org test page.

Blender

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.
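As a rough illustration of what a streams-style throughput figure measures, the sketch below runs a triad kernel (a[i] = b[i] + scalar * c[i]) of the kind used in the STREAM family of benchmarks and converts the elapsed time into MB/s by counting two reads and one write per element. It assumes the reported rate is of this general form; the exact kernel and accounting used by PETSc's "make streams" are not described in this result file. The function name triad and the array length are hypothetical.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style triad: a[i] = b[i] + scalar * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


n = 200_000                      # illustrative length, not a benchmark setting
a = array("d", [0.0]) * n        # contiguous arrays of 8-byte doubles
b = array("d", [1.0]) * n
c = array("d", [2.0]) * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = time.perf_counter() - start

# Two arrays read and one array written: 3 * 8 bytes moved per element.
bytes_moved = 3 * a.itemsize * n
rate_mb_s = bytes_moved / elapsed / 1e6
```

In Python the loop is limited by interpreter overhead rather than memory bandwidth, so the sketch shows the arithmetic behind the metric, not a meaningful rate.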

Geometric Mean Of All Test Results

73 Results Shown

High Performance Conjugate Gradient:
104 104 104 - 60
144 144 144 - 60
160 160 160 - 60
NAS Parallel Benchmarks:
BT.C
CG.C
FT.C
IS.D
MG.C
SP.C
NAMD
libxsmm:
128
256
32
64
Laghos:
Triple Point Problem
Sedov Blast Wave, ube_922_hex.mesh
HeFFTe - Highly Efficient FFT for Exascale:
c2c - FFTW - float - 256
c2c - FFTW - float - 512
r2c - FFTW - float - 512
c2c - FFTW - double - 512
c2c - Stock - float - 256
c2c - Stock - float - 512
r2c - FFTW - double - 256
r2c - FFTW - double - 512
r2c - Stock - float - 512
c2c - Stock - double - 512
r2c - Stock - double - 256
r2c - Stock - double - 512
c2c - FFTW - float-long - 256
c2c - FFTW - float-long - 512
r2c - FFTW - float-long - 256
r2c - FFTW - float-long - 512
c2c - FFTW - double-long - 512
c2c - Stock - float-long - 256
c2c - Stock - float-long - 512
r2c - FFTW - double-long - 256
r2c - FFTW - double-long - 512
r2c - Stock - float-long - 512
c2c - Stock - double-long - 512
r2c - Stock - double-long - 256
r2c - Stock - double-long - 512
Pennant:
sedovbig
leblancbig
ACES DGEMM
Intel Open Image Denoise:
RT.hdr_alb_nrm.3840x2160 - CPU-Only
RT.ldr_alb_nrm.3840x2160 - CPU-Only
RTLightmap.hdr.4096x4096 - CPU-Only
OSPRay:
particle_volume/ao/real_time
particle_volume/scivis/real_time
particle_volume/pathtracer/real_time
gravity_spheres_volume/dim_512/ao/real_time
gravity_spheres_volume/dim_512/scivis/real_time
gravity_spheres_volume/dim_512/pathtracer/real_time
7-Zip Compression:
Compression Rating
Decompression Rating
Timed Node.js Compilation
oneDNN:
Recurrent Neural Network Training - bf16bf16bf16 - CPU
Recurrent Neural Network Inference - bf16bf16bf16 - CPU
Liquid-DSP:
128 - 256 - 57
176 - 256 - 32
176 - 256 - 57
176 - 256 - 512
PostgreSQL:
1 - 500 - Read Only
1 - 500 - Read Only - Average Latency
1 - 800 - Read Only
1 - 800 - Read Only - Average Latency
Blender:
BMW27 - CPU-Only
Classroom - CPU-Only
Fishy Cat - CPU-Only
Barbershop - CPU-Only
Pabellon Barcelona - CPU-Only
PETSc
Geometric Mean Of All Test Results
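A geometric mean across tests with different units is usually taken over ratios against a baseline, with lower-is-better metrics (such as average latency or compile time) inverted first. The sketch below shows that general technique; how OpenBenchmarking.org computes its own figure is not stated in this result file, and the numbers used are hypothetical placeholders, not measurements from these runs.

```python
import math


def normalize(value, baseline, higher_is_better=True):
    """Express a result as a speedup ratio over the baseline system."""
    return value / baseline if higher_is_better else baseline / value


def geometric_mean(ratios):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (value, baseline, higher_is_better) triples, NOT real results.
results = [
    (200.0, 100.0, True),    # throughput doubled
    (50.0, 100.0, False),    # latency halved, which also counts as 2x
    (100.0, 100.0, True),    # unchanged
]

ratios = [normalize(v, base, hib) for v, base, hib in results]
overall = geometric_mean(ratios)   # cube root of 2 * 2 * 1
```

Unlike an arithmetic mean, the geometric mean is not dominated by the test with the largest raw numbers, which is why it is a common summary for mixed benchmark suites.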

HC

Testing initiated at 27 July 2023 22:13.

HBv2

Testing initiated at 27 July 2023 16:01.

HBv3

Testing initiated at 27 July 2023 10:21.

HBv4

Testing initiated at 26 July 2023 23:36.
