Microsoft Azure HBv4 HPC Performance Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HC:

  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb

  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv2:

  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv3:

  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv4:

  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB, Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb

  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

7-Zip Compression 22.01
Test: Compression Rating
MIPS > Higher Is Better
HC ... 216451 |=============
HBv2 . 501534 |=============================
HBv3 . 566595 |=================================
HBv4 . 1083523 |===============================================================

7-Zip Compression 22.01
Test: Decompression Rating
MIPS > Higher Is Better
HC ... 150841 |=============
HBv2 . 388577 |=================================
HBv3 . 406516 |===================================
HBv4 . 742859 |================================================================

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
HC ... 14.072027 |================
HBv2 . 6.395415 |=======
HBv3 . 25.048352 |=============================
HBv4 . 52.802440 |=============================================================

Blender 3.6
Blend File: BMW27 - Compute: CPU-Only
Seconds < Lower Is Better
HC ... 49.95 |=================================================================
HBv2 . 19.58 |=========================
HBv3 . 19.43 |=========================
HBv4 . 10.11 |=============

Blender 3.6
Blend File: Classroom - Compute: CPU-Only
Seconds < Lower Is Better
HC ... 138.51 |================================================================
HBv2 . 50.95 |========================
HBv3 . 50.71 |=======================
HBv4 . 25.61 |============

Blender 3.6
Blend File: Fishy Cat - Compute: CPU-Only
Seconds < Lower Is Better
HC ... 71.76 |=================================================================
HBv2 . 26.43 |========================
HBv3 . 25.59 |=======================
HBv4 . 13.74 |============

Blender 3.6
Blend File: Barbershop - Compute: CPU-Only
Seconds < Lower Is Better
HC ... 526.93 |================================================================
HBv2 . 211.46 |==========================
HBv3 . 188.96 |=======================
HBv4 . 97.52 |============

Blender 3.6
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds < Lower Is Better
HC ... 175.07 |================================================================
HBv2 . 64.84 |========================
HBv3 . 62.90 |=======================
HBv4 . 33.01 |============

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 58.36 |===============
HBv2 . 91.54 |=======================
HBv3 . 103.51 |==========================
HBv4 . 256.35 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 62.98 |===========
HBv2 . 95.88 |=================
HBv3 . 135.69 |========================
HBv4 . 355.86 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 114.03 |============
HBv2 . 191.78 |====================
HBv3 . 254.25 |==========================
HBv4 . 622.58 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 33.52 |=============
HBv2 . 47.61 |===================
HBv3 . 57.33 |=======================
HBv4 . 159.18 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 59.73 |================
HBv2 . 91.26 |========================
HBv3 . 103.41 |===========================
HBv4 . 244.34 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 57.76 |===========
HBv2 . 93.79 |===================
HBv3 . 123.24 |========================
HBv4 . 323.36 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 57.31 |==============
HBv2 . 91.92 |======================
HBv3 . 103.25 |=========================
HBv4 . 261.90 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 60.88 |============
HBv2 . 91.48 |===================
HBv3 . 121.28 |=========================
HBv4 . 314.34 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 110.05 |============
HBv2 . 190.95 |====================
HBv3 . 232.17 |=========================
HBv4 . 596.23 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 31.57 |=============
HBv2 . 46.98 |===================
HBv3 . 56.22 |=======================
HBv4 . 154.65 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 60.57 |===============
HBv2 . 93.31 |=======================
HBv3 . 102.70 |=========================
HBv4 . 264.95 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 59.82 |============
HBv2 . 94.53 |===================
HBv3 . 117.73 |========================
HBv4 . 311.80 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 58.55 |===============
HBv2 . 90.79 |=======================
HBv3 . 105.09 |==========================
HBv4 . 255.97 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 62.90 |===========
HBv2 . 96.49 |=================
HBv3 . 135.95 |========================
HBv4 . 355.51 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 122.77 |==================
HBv2 . 200.04 |==============================
HBv3 . 221.86 |=================================
HBv4 . 427.10 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 113.94 |============
HBv2 . 191.14 |====================
HBv3 . 257.42 |==========================
HBv4 . 624.95 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 33.55 |=============
HBv2 . 47.37 |===================
HBv3 . 57.23 |=======================
HBv4 . 159.26 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 59.55 |===============
HBv2 . 92.13 |========================
HBv3 . 105.36 |===========================
HBv4 . 247.73 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 57.92 |===========
HBv2 . 93.26 |==================
HBv3 . 124.60 |=========================
HBv4 . 323.70 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 57.13 |=============
HBv2 . 88.61 |=====================
HBv3 . 106.63 |=========================
HBv4 . 273.12 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 60.82 |============
HBv2 . 91.43 |===================
HBv3 . 120.96 |========================
HBv4 . 315.98 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 110.20 |============
HBv2 . 189.21 |====================
HBv3 . 233.80 |=========================
HBv4 . 590.93 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 31.58 |=============
HBv2 . 46.93 |===================
HBv3 . 56.27 |=======================
HBv4 . 154.57 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ... 60.89 |===============
HBv2 . 92.39 |=======================
HBv3 . 105.50 |==========================
HBv4 . 258.72 |================================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ... 59.90 |============
HBv2 . 95.20 |====================
HBv3 . 118.24 |========================
HBv4 . 311.27 |================================================================

High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s > Higher Is Better
HC ... 26.00 |===================
HBv2 . 37.04 |===========================
HBv3 . 39.61 |=============================
HBv4 . 89.38 |=================================================================

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s > Higher Is Better
HC ... 25.87 |===================
HBv2 . 36.09 |===========================
HBv3 . 38.97 |=============================
HBv4 . 88.52 |=================================================================

High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s > Higher Is Better
HC ... 25.56 |===================
HBv2 . 36.02 |===========================
HBv3 . 39.11 |=============================
HBv4 . 87.90 |=================================================================

Intel Open Image Denoise 2.0
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ... 1.85 |=======================================
HBv2 . 2.03 |===========================================
HBv3 . 1.72 |=====================================
HBv4 . 3.11 |==================================================================

Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ... 1.85 |========================================
HBv2 . 2.01 |===========================================
HBv3 . 1.69 |====================================
HBv4 . 3.08 |==================================================================

Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ... 0.87 |============================================
HBv2 . 0.96 |================================================
HBv3 . 0.80 |========================================
HBv4 . 1.32 |==================================================================

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate > Higher Is Better
HC ... 156.52 |============================================
HBv2 . 183.82 |====================================================
HBv3 . 192.74 |======================================================
HBv4 . 228.15 |================================================================

Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate > Higher Is Better
HC ... 247.49 |=======================================
HBv2 . 345.14 |=======================================================
HBv3 . 361.81 |=========================================================
HBv4 . 402.94 |================================================================

libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s > Higher Is Better
HC ... 1284.8 |============
HBv2 . 1011.4 |==========
HBv3 . 2273.5 |======================
HBv4 . 6655.2 |================================================================

libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s > Higher Is Better
HC ... 904.1 |========
HBv2 . 1128.3 |==========
HBv3 . 2045.7 |===================
HBv4 . 6908.6 |================================================================

libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s > Higher Is Better
HC ... 384.9 |====
HBv2 . 164.8 |==
HBv3 . 1438.1 |===============
HBv4 . 6163.0 |================================================================

libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s > Higher Is Better
HC ... 748.1 |========
HBv2 . 331.4 |====
HBv3 . 2413.7 |==========================
HBv4 . 5898.2 |================================================================

Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
HC ... 1570633333 |=================
HBv2 . 4309133333 |================================================
HBv3 . 4216966667 |===============================================
HBv4 . 5412900000 |============================================================

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
HC ... 1536633333 |===============
HBv2 . 4275533333 |=========================================
HBv3 . 3864000000 |======================================
HBv4 . 6181766667 |============================================================

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
HC ... 1683033333 |==============
HBv2 . 4350100000 |=====================================
HBv3 . 4281533333 |====================================
HBv4 . 7095033333 |============================================================

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s > Higher Is Better
HC ... 544626667 |===============
HBv2 . 924243333 |=========================
HBv3 . 814950000 |======================
HBv4 . 2221966667 |============================================================

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns < Lower Is Better
HC ... 0.52697 |===============================================================
HBv2 . 0.26505 |================================
HBv3 . 0.27111 |================================
HBv4 . 0.14380 |=================

NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s > Higher Is Better
HC ... 106230.52 |=========
HBv2 . 241509.88 |====================
HBv3 . 313813.98 |==========================
HBv4 . 744413.90 |=============================================================

NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s > Higher Is Better
HC ... 27619.05 |=======================
HBv2 . 36367.35 |==============================
HBv3 . 36681.43 |===============================
HBv4 . 74101.94 |==============================================================

NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s > Higher Is Better
HC ... 55288.19 |===============
HBv2 . 98485.23 |==========================
HBv3 . 102122.36 |===========================
HBv4 . 230164.79 |=============================================================

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s > Higher Is Better
HC ... 1864.68 |=========
HBv2 . 3977.02 |===================
HBv3 . 5730.01 |===========================
HBv4 . 12967.37 |==============================================================

NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s > Higher Is Better
HC ... 63404.01 |=========
HBv2 . 108985.72 |===============
HBv3 . 131635.41 |==================
HBv4 . 437417.16 |=============================================================

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s > Higher Is Better
HC ... 41543.94 |======
HBv2 . 104771.90 |===============
HBv3 . 205795.59 |=============================
HBv4 . 427298.99 |=============================================================

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HC ... 707.32 |=================================
HBv2 . 1367.73 |===============================================================
HBv3 . 886.81 |=========================================
HBv4 . 533.49 |=========================

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HC ... 442.47 |===============================
HBv2 . 910.94 |================================================================
HBv3 . 529.97 |=====================================
HBv4 . 411.23 |=============================

OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second > Higher Is Better
HC ... 8.99618 |===============
HBv2 . 22.36680 |======================================
HBv3 . 24.47100 |=========================================
HBv4 . 36.65480 |==============================================================

OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second > Higher Is Better
HC ... 8.87831 |===============
HBv2 . 22.17470 |======================================
HBv3 . 24.21970 |=========================================
HBv4 . 36.54460 |==============================================================

OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second > Higher Is Better
HC ... 96.76 |==============================
HBv2 . 162.45 |==================================================
HBv3 . 167.50 |====================================================
HBv4 . 208.05 |================================================================

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second > Higher Is Better
HC ... 9.52293 |================
HBv2 . 8.66888 |==============
HBv3 . 11.75010 |===================
HBv4 . 38.07690 |==============================================================

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second > Higher Is Better
HC ... 9.02689 |===============
HBv2 . 8.32323 |==============
HBv3 . 11.17230 |===================
HBv4 . 37.06240 |==============================================================

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second > Higher Is Better
HC ... 10.06 |====================
HBv2 . 13.94 |============================
HBv3 . 14.61 |=============================
HBv4 . 32.58 |=================================================================

Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds < Lower Is Better
HC ... 25.019560 |=============================================================
HBv2 . 5.915805 |==============
HBv3 . 6.277107 |===============
HBv4 . 3.581391 |=========

Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds < Lower Is Better
HC ... 10.645480 |=============================================================
HBv2 . 3.466885 |====================
HBv3 . 3.649317 |=====================
HBv4 . 2.122074 |============

PETSc 3.19
Test: Streams
MB/s > Higher Is Better
HC ... 151286.25 |===============
HBv2 . 197895.47 |====================
HBv3 . 284001.92 |=============================
HBv4 . 598417.70 |=============================================================

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS > Higher Is Better
HC ... 1353510 |===========================
HBv2 . 2467328 |=================================================
HBv3 . 2434749 |=================================================
HBv4 . 3161848 |===============================================================

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms < Lower Is Better
HC ... 0.369 |=================================================================
HBv2 . 0.203 |====================================
HBv3 . 0.206 |====================================
HBv4 . 0.158 |============================

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS > Higher Is Better
HC ... 1159492 |=======================
HBv2 . 2481320 |==================================================
HBv3 . 2478917 |==================================================
HBv4 . 3146173 |===============================================================

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms < Lower Is Better
HC ... 0.690 |=================================================================
HBv2 . 0.323 |==============================
HBv3 . 0.323 |==============================
HBv4 . 0.254 |========================

Timed Node.js Compilation 19.8.1
Time To Compile
Seconds < Lower Is Better
HC ... 330.61 |================================================================
HBv2 . 194.37 |======================================
HBv3 . 185.57 |====================================
HBv4 . 150.56 |=============================