Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2308011-PTS-AZUREHBV71
Microsoft Azure HBv4 HPC Performance Benchmarks
HBv4:
Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB, Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft
HBv3:
Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft
HBv2:
Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft
HC:
Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb
OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft
High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s > Higher Is Better
HBv4 . 89.38 |=================================================================
HBv3 . 39.61 |=============================
HBv2 . 37.04 |===========================
HC ... 26.00 |===================
High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s > Higher Is Better
HBv4 . 88.52 |=================================================================
HBv3 . 38.97 |=============================
HBv2 . 36.09 |===========================
HC ... 25.87 |===================
High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s > Higher Is Better
HBv4 . 87.90 |=================================================================
HBv3 . 39.11 |=============================
HBv2 . 36.02 |===========================
HC ... 25.56 |===================
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 256.35 |================================================================
HBv3 . 103.51 |==========================
HBv2 . 91.54 |=======================
HC ... 58.36 |===============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 355.86 |================================================================
HBv3 . 135.69 |========================
HBv2 . 95.88 |=================
HC ... 62.98 |===========
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 622.58 |================================================================
HBv3 . 254.25 |==========================
HBv2 . 191.78 |====================
HC ... 114.03 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 159.18 |================================================================
HBv3 . 57.33 |=======================
HBv2 . 47.61 |===================
HC ... 33.52 |=============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 244.34 |================================================================
HBv3 . 103.41 |===========================
HBv2 . 91.26 |========================
HC ... 59.73 |================
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 323.36 |================================================================
HBv3 . 123.24 |========================
HBv2 . 93.79 |===================
HC ... 57.76 |===========
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 261.90 |================================================================
HBv3 . 103.25 |=========================
HBv2 . 91.92 |======================
HC ... 57.31 |==============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 314.34 |================================================================
HBv3 . 121.28 |=========================
HBv2 . 91.48 |===================
HC ... 60.88 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 596.23 |================================================================
HBv3 . 232.17 |=========================
HBv2 . 190.95 |====================
HC ... 110.05 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 154.65 |================================================================
HBv3 . 56.22 |=======================
HBv2 . 46.98 |===================
HC ... 31.57 |=============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 264.95 |================================================================
HBv3 . 102.70 |=========================
HBv2 . 93.31 |=======================
HC ... 60.57 |===============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 311.80 |================================================================
HBv3 . 117.73 |========================
HBv2 . 94.53 |===================
HC ... 59.82 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 255.97 |================================================================
HBv3 . 105.09 |==========================
HBv2 . 90.79 |=======================
HC ... 58.55 |===============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 355.51 |================================================================
HBv3 . 135.95 |========================
HBv2 . 96.49 |=================
HC ... 62.90 |===========
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 427.10 |================================================================
HBv3 . 221.86 |=================================
HBv2 . 200.04 |==============================
HC ... 122.77 |==================
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 624.95 |================================================================
HBv3 . 257.42 |==========================
HBv2 . 191.14 |====================
HC ... 113.94 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 159.26 |================================================================
HBv3 . 57.23 |=======================
HBv2 . 47.37 |===================
HC ... 33.55 |=============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 247.73 |================================================================
HBv3 . 105.36 |===========================
HBv2 . 92.13 |========================
HC ... 59.55 |===============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 323.70 |================================================================
HBv3 . 124.60 |=========================
HBv2 . 93.26 |==================
HC ... 57.92 |===========
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 273.12 |================================================================
HBv3 . 106.63 |=========================
HBv2 . 88.61 |=====================
HC ... 57.13 |=============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 315.98 |================================================================
HBv3 . 120.96 |========================
HBv2 . 91.43 |===================
HC ... 60.82 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 590.93 |================================================================
HBv3 . 233.80 |=========================
HBv2 . 189.21 |====================
HC ... 110.20 |============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 154.57 |================================================================
HBv3 . 56.27 |=======================
HBv2 . 46.93 |===================
HC ... 31.58 |=============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HBv4 . 258.72 |================================================================
HBv3 . 105.50 |==========================
HBv2 . 92.39 |=======================
HC ... 60.89 |===============
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HBv4 . 311.27 |================================================================
HBv3 . 118.24 |========================
HBv2 . 95.20 |====================
HC ... 59.90 |============
ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
HBv4 . 52.802440 |=============================================================
HBv3 . 25.048352 |=============================
HBv2 . 6.395415 |=======
HC ... 14.072027 |================
libxsmm 2-1.17-3645
M N K: 128
GFLOP/s > Higher Is Better
HBv4 . 6655.2 |================================================================
HBv3 . 2273.5 |======================
HBv2 . 1011.4 |==========
HC ... 1284.8 |============
libxsmm 2-1.17-3645
M N K: 256
GFLOP/s > Higher Is Better
HBv4 . 6908.6 |================================================================
HBv3 . 2045.7 |===================
HBv2 . 1128.3 |==========
HC ... 904.1 |========
libxsmm 2-1.17-3645
M N K: 32
GFLOP/s > Higher Is Better
HBv4 . 6163.0 |================================================================
HBv3 . 1438.1 |===============
HBv2 . 164.8 |==
HC ... 384.9 |====
libxsmm 2-1.17-3645
M N K: 64
GFLOP/s > Higher Is Better
HBv4 . 5898.2 |================================================================
HBv3 . 2413.7 |==========================
HBv2 . 331.4 |====
HC ... 748.1 |========
Intel Open Image Denoise 2.0
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HBv4 . 3.11 |==================================================================
HBv3 . 1.72 |=====================================
HBv2 . 2.03 |===========================================
HC ... 1.85 |=======================================
Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HBv4 . 3.08 |==================================================================
HBv3 . 1.69 |====================================
HBv2 . 2.01 |===========================================
HC ... 1.85 |========================================
Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec > Higher Is Better
HBv4 . 1.32 |==================================================================
HBv3 . 0.80 |========================================
HBv2 . 0.96 |================================================
HC ... 0.87 |============================================
OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second > Higher Is Better
HBv4 . 36.65480 |==============================================================
HBv3 . 24.47100 |=========================================
HBv2 . 22.36680 |======================================
HC ... 8.99618 |===============
OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second > Higher Is Better
HBv4 . 36.54460 |==============================================================
HBv3 . 24.21970 |=========================================
HBv2 . 22.17470 |======================================
HC ... 8.87831 |===============
OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second > Higher Is Better
HBv4 . 208.05 |================================================================
HBv3 . 167.50 |====================================================
HBv2 . 162.45 |==================================================
HC ... 96.76 |==============================
OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second > Higher Is Better
HBv4 . 38.07690 |==============================================================
HBv3 . 11.75010 |===================
HBv2 . 8.66888 |==============
HC ... 9.52293 |================
OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second > Higher Is Better
HBv4 . 37.06240 |==============================================================
HBv3 . 11.17230 |===================
HBv2 . 8.32323 |==============
HC ... 9.02689 |===============
OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second > Higher Is Better
HBv4 . 32.58 |=================================================================
HBv3 . 14.61 |=============================
HBv2 . 13.94 |============================
HC ... 10.06 |====================
Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate > Higher Is Better
HBv4 . 228.15 |================================================================
HBv3 . 192.74 |======================================================
HBv2 . 183.82 |====================================================
HC ... 156.52 |============================================
Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate > Higher Is Better
HBv4 . 402.94 |================================================================
HBv3 . 361.81 |=========================================================
HBv2 . 345.14 |=======================================================
HC ... 247.49 |=======================================
PETSc 3.19
Test: Streams
MB/s > Higher Is Better
HBv4 . 598417.70 |=============================================================
HBv3 . 284001.92 |=============================
HBv2 . 197895.47 |====================
HC ... 151286.25 |===============
7-Zip Compression 22.01
Test: Compression Rating
MIPS > Higher Is Better
HBv4 . 1083523 |===============================================================
HBv3 . 566595 |=================================
HBv2 . 501534 |=============================
HC ... 216451 |=============
7-Zip Compression 22.01
Test: Decompression Rating
MIPS > Higher Is Better
HBv4 . 742859 |================================================================
HBv3 . 406516 |===================================
HBv2 . 388577 |=================================
HC ... 150841 |=============
Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
HBv4 . 5412900000 |============================================================
HBv3 . 4216966667 |===============================================
HBv2 . 4309133333 |================================================
HC ... 1570633333 |=================
Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
HBv4 . 6181766667 |============================================================
HBv3 . 3864000000 |======================================
HBv2 . 4275533333 |=========================================
HC ... 1536633333 |===============
Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
HBv4 . 7095033333 |============================================================
HBv3 . 4281533333 |====================================
HBv2 . 4350100000 |=====================================
HC ... 1683033333 |==============
Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
HBv4 . 2221966667 |============================================================
HBv3 . 814950000 |======================
HBv2 . 924243333 |=========================
HC ... 544626667 |===============
NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s > Higher Is Better
HBv4 . 744413.90 |=============================================================
HBv3 . 313813.98 |==========================
HBv2 . 241509.88 |====================
HC ... 106230.52 |=========
NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s > Higher Is Better
HBv4 . 74101.94 |==============================================================
HBv3 . 36681.43 |===============================
HBv2 . 36367.35 |==============================
HC ... 27619.05 |=======================
NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s > Higher Is Better
HBv4 . 230164.79 |=============================================================
HBv3 . 102122.36 |===========================
HBv2 . 98485.23 |==========================
HC ... 55288.19 |===============
NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s > Higher Is Better
HBv4 . 12967.37 |==============================================================
HBv3 . 5730.01 |===========================
HBv2 . 3977.02 |===================
HC ... 1864.68 |=========
NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s > Higher Is Better
HBv4 . 437417.16 |=============================================================
HBv3 . 131635.41 |==================
HBv2 . 108985.72 |===============
HC ... 63404.01 |=========
NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s > Higher Is Better
HBv4 . 427298.99 |=============================================================
HBv3 . 205795.59 |=============================
HBv2 . 104771.90 |===============
HC ... 41543.94 |======
PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS > Higher Is Better
HBv4 . 3161848 |===============================================================
HBv3 . 2434749 |=================================================
HBv2 . 2467328 |=================================================
HC ... 1353510 |===========================
PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS > Higher Is Better
HBv4 . 3146173 |===============================================================
HBv3 . 2478917 |==================================================
HBv2 . 2481320 |==================================================
HC ... 1159492 |=======================
NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns < Lower Is Better
HBv4 . 0.14380 |=================
HBv3 . 0.27111 |================================
HBv2 . 0.26505 |================================
HC ... 0.52697 |===============================================================
Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds < Lower Is Better
HBv4 . 3.581391 |=========
HBv3 . 6.277107 |===============
HBv2 . 5.915805 |==============
HC ... 25.019560 |=============================================================
Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds < Lower Is Better
HBv4 . 2.122074 |============
HBv3 . 3.649317 |=====================
HBv2 . 3.466885 |====================
HC ... 10.645480 |=============================================================
oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HBv4 . 533.49 |=========================
HBv3 . 886.81 |=========================================
HBv2 . 1367.73 |===============================================================
HC ... 707.32 |=================================
oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HBv4 . 411.23 |=============================
HBv3 . 529.97 |=====================================
HBv2 . 910.94 |================================================================
HC ... 442.47 |===============================
PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms < Lower Is Better
HBv4 . 0.158 |============================
HBv3 . 0.206 |====================================
HBv2 . 0.203 |====================================
HC ... 0.369 |=================================================================
PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms < Lower Is Better
HBv4 . 0.254 |========================
HBv3 . 0.323 |==============================
HBv2 . 0.323 |==============================
HC ... 0.690 |=================================================================
Timed Node.js Compilation 19.8.1
Time To Compile
Seconds < Lower Is Better
HBv4 . 150.56 |=============================
HBv3 . 185.57 |====================================
HBv2 . 194.37 |======================================
HC ... 330.61 |================================================================
Blender 3.6
Blend File: BMW27 - Compute: CPU-Only
Seconds < Lower Is Better
HBv4 . 10.11 |=============
HBv3 . 19.43 |=========================
HBv2 . 19.58 |=========================
HC ... 49.95 |=================================================================
Blender 3.6
Blend File: Classroom - Compute: CPU-Only
Seconds < Lower Is Better
HBv4 . 25.61 |============
HBv3 . 50.71 |=======================
HBv2 . 50.95 |========================
HC ... 138.51 |================================================================
Blender 3.6
Blend File: Fishy Cat - Compute: CPU-Only
Seconds < Lower Is Better
HBv4 . 13.74 |============
HBv3 . 25.59 |=======================
HBv2 . 26.43 |========================
HC ... 71.76 |=================================================================
Blender 3.6
Blend File: Barbershop - Compute: CPU-Only
Seconds < Lower Is Better
HBv4 . 97.52 |============
HBv3 . 188.96 |=======================
HBv2 . 211.46 |==========================
HC ... 526.93 |================================================================
Blender 3.6
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds < Lower Is Better
HBv4 . 33.01 |============
HBv3 . 62.90 |=======================
HBv2 . 64.84 |========================
HC ... 175.07 |================================================================
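The bars above show relative standing but not exact ratios. As a rough sketch, the result rows in this text export could be parsed and turned into speedup factors with a few lines of Python. The helper names parse_results and speedup are hypothetical and not part of the Phoronix Test Suite; the regular expression assumes rows shaped like "HBv4 . 89.38 |====", as they appear in this file.

```python
import re

# Assumed row shape in this text export: "<name> <dots> <value> |<bar>".
RESULT_LINE = re.compile(r"^(\S+) \.+ +([0-9.]+) \|=*$")


def parse_results(lines):
    """Return {system: value} for every line that looks like a result row."""
    results = {}
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


def speedup(results, system, baseline, higher_is_better=True):
    """Return how many times faster `system` is than `baseline`.

    For "Higher Is Better" metrics this is system / baseline; for
    "Lower Is Better" metrics (times, latencies) the ratio is inverted.
    """
    if higher_is_better:
        return results[system] / results[baseline]
    return results[baseline] / results[system]


# HPCG 3.1, X Y Z: 104 104 104 (GFLOP/s, higher is better), copied from above.
hpcg = parse_results([
    "HBv4 . 89.38 |=================================================================",
    "HBv3 . 39.61 |=============================",
    "HBv2 . 37.04 |===========================",
    "HC ... 26.00 |===================",
])

# NAMD 2.14, ATPase Simulation (days/ns, lower is better), copied from above.
namd = parse_results([
    "HBv4 . 0.14380 |=================",
    "HBv3 . 0.27111 |================================",
    "HBv2 . 0.26505 |================================",
    "HC ... 0.52697 |===============================================================",
])

print(f"HPCG HBv4 vs HBv3: {speedup(hpcg, 'HBv4', 'HBv3'):.2f}x")
print(f"NAMD HBv4 vs HBv3: {speedup(namd, 'HBv4', 'HBv3', higher_is_better=False):.2f}x")
```

Passing higher_is_better=False for the time-based and latency results keeps every ratio reading as "times faster", so the numbers can be compared across both kinds of metric.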