Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HC:
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv2:
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv3:
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv4:
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB, Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 8.5.0 20210514 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv4 + Optimizations:
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB, Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.8, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv3 + Optimizations:
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HBv2 + Optimizations:
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB, Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

HC + Optimizations:
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores), Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS), Memory: 1 GB + 60928 MB + 118272 MB + 176 GB, Disk: 32GB Virtual Disk + 752GB Virtual Disk, Graphics: hyperv_fb
  OS: AlmaLinux 8.7, Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64), Compiler: GCC 13.1.0 + CUDA 12.1, File-System: nfs, Screen Resolution: 1024x768, System Layer: microsoft

High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s > Higher Is Better
HC ................... 26.00 |==============
HBv2 ................. 37.04 |====================
HBv3 ................. 39.61 |======================
HBv4 ................. 89.38 |=================================================

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s > Higher Is Better
HC ................... 25.87 |==============
HBv2 ................. 36.09 |====================
HBv3 ................. 38.97 |======================
HBv4 ................. 88.52 |=================================================

High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s > Higher Is Better
HC ................... 25.56 |==============
HBv2 ................. 36.02 |====================
HBv3 ................. 39.11 |======================
HBv4 ................. 87.90 |=================================================

NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s > Higher Is Better
HC ................... 28794.28 |==
HBv2 ................. 66829.18 |====
HBv3 ................. 62427.86 |====
HBv4 ................. 151067.81 |=========
HBv4 + Optimizations . 744413.90 |=============================================
HBv3 + Optimizations . 313813.98 |===================
HBv2 + Optimizations . 241509.88 |===============
HC + Optimizations ... 106230.52 |======

NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s > Higher Is Better
HC ................... 14356.20 |=========
HBv2 ................. 22314.02 |==============
HBv3 ................. 21551.48 |=============
HBv4 ................. 40326.29 |=========================
HBv4 + Optimizations . 74101.94 |==============================================
HBv3 + Optimizations . 36681.43 |=======================
HBv2 + Optimizations . 36367.35 |=======================
HC + Optimizations ... 27619.05 |=================

NAS Parallel Benchmarks 3.4
Test / Class: EP.D
Total Mop/s > Higher Is Better
HC ................... 1642.03 |=========
HBv2 ................. 3222.82 |=================
HBv3 ................. 2879.08 |===============
HBv4 ................. 5985.75 |===============================
HBv4 + Optimizations . 9031.46 |===============================================
HBv3 + Optimizations . 4840.07 |=========================
HBv2 + Optimizations . 5542.08 |=============================
HC + Optimizations ... 1853.47 |==========

NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s > Higher Is Better
HC ................... 20188.89 |====
HBv2 ................. 41977.69 |========
HBv3 ................. 36619.29 |=======
HBv4 ................. 69051.63 |==============
HBv4 + Optimizations . 230164.79 |=============================================
HBv3 + Optimizations . 102122.36 |====================
HBv2 + Optimizations . 98485.23 |===================
HC + Optimizations ... 55288.19 |===========

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s > Higher Is Better
HC ................... 1181.48 |====
HBv2 ................. 1884.22 |=======
HBv3 ................. 2793.55 |==========
HBv4 ................. 5870.00 |=====================
HBv4 + Optimizations . 12967.37 |==============================================
HBv3 + Optimizations . 5730.01 |====================
HBv2 + Optimizations . 3977.02 |==============
HC + Optimizations ... 1864.68 |=======

NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s > Higher Is Better
HC ................... 19508.00 |==
HBv2 ................. 43410.71 |====
HBv3 ................. 46705.47 |=====
HBv4 ................. 108125.86 |===========
HBv4 + Optimizations . 437417.16 |=============================================
HBv3 + Optimizations . 131635.41 |==============
HBv2 + Optimizations . 108985.72 |===========
HC + Optimizations ... 63404.01 |=======

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s > Higher Is Better
HC ................... 12907.54 |=
HBv2 ................. 32495.89 |===
HBv3 ................. 31024.76 |===
HBv4 ................. 68819.34 |=======
HBv4 + Optimizations . 427298.99 |=============================================
HBv3 + Optimizations . 205795.59 |======================
HBv2 + Optimizations . 104771.90 |===========
HC + Optimizations ... 41543.94 |====

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns < Lower Is Better
HC ................... 0.52650 |===============================================
HBv2 ................. 0.26385 |========================
HBv3 ................. 0.27115 |========================
HBv4 ................. 0.14292 |=============
HBv4 + Optimizations . 0.14380 |=============
HBv3 + Optimizations . 0.27111 |========================
HBv2 + Optimizations . 0.26505 |========================
HC + Optimizations ... 0.52697 |===============================================

libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s > Higher Is Better
HC ................... 1328.4 |==========
HBv2 ................. 1519.5 |===========
HBv3 ................. 2284.6 |================
HBv4 ................. 6585.6 |===============================================
HBv4 + Optimizations . 6655.2 |================================================
HBv3 + Optimizations . 2273.5 |================
HBv2 + Optimizations . 1011.4 |=======
HC + Optimizations ... 1284.8 |=========

libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s > Higher Is Better
HC ................... 898.8 |======
HBv2 ................. 1444.2 |==========
HBv3 ................. 2032.1 |==============
HBv4 ................. 6983.2 |================================================
HBv4 + Optimizations . 6908.6 |===============================================
HBv3 + Optimizations . 2045.7 |==============
HBv2 + Optimizations . 1128.3 |========
HC + Optimizations ... 904.1 |======

libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s > Higher Is Better
HC ................... 379.9 |===
HBv2 ................. 195.1 |==
HBv3 ................. 1506.3 |============
HBv4 ................. 5006.8 |=======================================
HBv4 + Optimizations . 6163.0 |================================================
HBv3 + Optimizations . 1438.1 |===========
HBv2 + Optimizations . 164.8 |=
HC + Optimizations ... 384.9 |===

libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s > Higher Is Better
HC ................... 731.6 |======
HBv2 ................. 411.7 |===
HBv3 ................. 2435.6 |====================
HBv4 ................. 5719.0 |===============================================
HBv4 + Optimizations . 5898.2 |================================================
HBv3 + Optimizations . 2413.7 |====================
HBv2 + Optimizations . 331.4 |===
HC + Optimizations ... 748.1 |======

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate > Higher Is Better
HC ................... 156.52 |=================================
HBv2 ................. 183.82 |=======================================
HBv3 ................. 192.74 |=========================================
HBv4 ................. 228.15 |================================================

Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate > Higher Is Better
HC ................... 247.49 |=============================
HBv2 ................. 345.14 |=========================================
HBv3 ................. 361.81 |===========================================
HBv4 ................. 402.94 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 58.36 |===========
HBv2 ................. 91.54 |=================
HBv3 ................. 103.51 |===================
HBv4 ................. 256.35 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 62.98 |========
HBv2 ................. 95.88 |=============
HBv3 ................. 135.69 |==================
HBv4 ................. 355.86 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 123.63 |=============
HBv2 ................. 203.77 |======================
HBv3 ................. 198.66 |======================
HBv4 ................. 442.83 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 114.03 |=========
HBv2 ................. 191.78 |===============
HBv3 ................. 254.25 |====================
HBv4 ................. 622.58 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128
GFLOP/s > Higher Is Better
HC ................... 59.14 |====================================
HBv2 ................. 59.42 |====================================
HBv3 ................. 59.38 |====================================
HBv4 ................. 80.25 |=================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 30.12 |============
HBv2 ................. 50.90 |====================
HBv3 ................. 39.81 |===============
HBv4 ................. 123.39 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 33.52 |==========
HBv2 ................. 47.61 |==============
HBv3 ................. 57.33 |=================
HBv4 ................. 159.18 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 59.73 |============
HBv2 ................. 91.26 |==================
HBv3 ................. 103.41 |====================
HBv4 ................. 244.34 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 57.76 |=========
HBv2 ................. 93.79 |==============
HBv3 ................. 123.24 |==================
HBv4 ................. 323.36 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 57.31 |===========
HBv2 ................. 91.92 |=================
HBv3 ................. 103.25 |===================
HBv4 ................. 261.90 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 60.88 |=========
HBv2 ................. 91.48 |==============
HBv3 ................. 121.28 |===================
HBv4 ................. 314.34 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 134.76 |==============
HBv2 ................. 205.21 |=====================
HBv3 ................. 214.06 |======================
HBv4 ................. 459.92 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 110.05 |=========
HBv2 ................. 190.95 |===============
HBv3 ................. 232.17 |===================
HBv4 ................. 596.23 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 128
GFLOP/s > Higher Is Better
HC ................... 41.73 |=======================
HBv2 ................. 51.40 |=============================
HBv3 ................. 50.61 |============================
HBv4 ................. 87.66 |=================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 30.17 |============
HBv2 ................. 50.71 |====================
HBv3 ................. 38.45 |===============
HBv4 ................. 121.61 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 31.57 |==========
HBv2 ................. 46.98 |===============
HBv3 ................. 56.22 |=================
HBv4 ................. 154.65 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 60.57 |===========
HBv2 ................. 93.31 |=================
HBv3 ................. 102.70 |===================
HBv4 ................. 264.95 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 59.82 |=========
HBv2 ................. 94.53 |===============
HBv3 ................. 117.73 |==================
HBv4 ................. 311.80 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 58.55 |===========
HBv2 ................. 90.79 |=================
HBv3 ................. 105.09 |====================
HBv4 ................. 255.97 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 62.90 |========
HBv2 ................. 96.49 |=============
HBv3 ................. 135.95 |==================
HBv4 ................. 355.51 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 122.77 |==============
HBv2 ................. 200.04 |======================
HBv3 ................. 221.86 |=========================
HBv4 ................. 427.10 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 113.94 |=========
HBv2 ................. 191.14 |===============
HBv3 ................. 257.42 |====================
HBv4 ................. 624.95 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128
GFLOP/s > Higher Is Better
HC ................... 58.91 |==================================
HBv2 ................. 61.14 |===================================
HBv3 ................. 56.87 |=================================
HBv4 ................. 85.01 |=================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 30.22 |============
HBv2 ................. 51.20 |====================
HBv3 ................. 39.37 |===============
HBv4 ................. 122.98 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 33.55 |==========
HBv2 ................. 47.37 |==============
HBv3 ................. 57.23 |=================
HBv4 ................. 159.26 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 59.55 |============
HBv2 ................. 92.13 |==================
HBv3 ................. 105.36 |====================
HBv4 ................. 247.73 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 57.92 |=========
HBv2 ................. 93.26 |==============
HBv3 ................. 124.60 |==================
HBv4 ................. 323.70 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 57.13 |==========
HBv2 ................. 88.61 |================
HBv3 ................. 106.63 |===================
HBv4 ................. 273.12 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 60.82 |=========
HBv2 ................. 91.43 |==============
HBv3 ................. 120.96 |==================
HBv4 ................. 315.98 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 131.96 |==============
HBv2 ................. 211.42 |======================
HBv3 ................. 207.97 |=====================
HBv4 ................. 467.72 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 110.20 |=========
HBv2 ................. 189.21 |===============
HBv3 ................. 233.80 |===================
HBv4 ................. 590.93 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 30.27 |============
HBv2 ................. 50.08 |===================
HBv3 ................. 38.57 |===============
HBv4 ................. 123.41 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 31.58 |==========
HBv2 ................. 46.93 |===============
HBv3 ................. 56.27 |=================
HBv4 ................. 154.57 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s > Higher Is Better
HC ................... 60.89 |===========
HBv2 ................. 92.39 |=================
HBv3 ................. 105.50 |====================
HBv4 ................. 258.72 |================================================

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s > Higher Is Better
HC ................... 59.90 |=========
HBv2 ................. 95.20 |===============
HBv3 ................. 118.24 |==================
HBv4 ................. 311.27 |================================================

Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds < Lower Is Better
HC ................... 25.019560 |=============================================
HBv2 ................. 5.915805 |===========
HBv3 ................. 6.277107 |===========
HBv4 ................. 3.581391 |======

Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds < Lower Is Better
HC ................... 10.645480 |=============================================
HBv2 ................. 3.466885 |===============
HBv3 ................. 3.649317 |===============
HBv4 ................. 2.122074 |=========

Remhos 1.0
Test: Sample Remap Example
Seconds < Lower Is Better
HC ................... 27.38 |=================================================
HBv2 ................. 14.93 |===========================
HBv3 ................. 15.26 |===========================
HBv4 ................. 15.37 |============================

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
HC ................... 14.340830 |============
HBv2 ................. 5.899903 |=====
HBv3 ................. 25.104876 |=====================
HBv4 ................. 53.175691 |=============================================
HBv4 + Optimizations . 52.802440 |=============================================
HBv3 + Optimizations . 25.048352 |=====================
HBv2 + Optimizations . 6.395415 |=====
HC + Optimizations ... 14.072027 |============

Intel Open Image Denoise 2.0
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ................... 1.82 |=============================
HBv2 ................. 2.08 |=================================
HBv3 ................. 1.68 |===========================
HBv4 ................. 3.08 |==================================================
HBv4 + Optimizations . 3.11 |==================================================
HBv3 + Optimizations . 1.72 |============================
HBv2 + Optimizations . 2.03 |=================================
HC + Optimizations ... 1.85 |==============================

Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ................... 1.84 |=============================
HBv2 ................. 2.03 |================================
HBv3 ................. 1.69 |===========================
HBv4 ................. 3.13 |==================================================
HBv4 + Optimizations . 3.08 |=================================================
HBv3 + Optimizations . 1.69 |===========================
HBv2 + Optimizations . 2.01 |================================
HC + Optimizations ... 1.85 |==============================

Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec > Higher Is Better
HC ................... 0.88 |=================================
HBv2 ................. 1.04 |=======================================
HBv3 ................. 0.79 |==============================
HBv4 ................. 1.29 |=================================================
HBv4 + Optimizations . 1.32 |==================================================
HBv3 + Optimizations . 0.80 |==============================
HBv2 + Optimizations . 0.96 |====================================
HC + Optimizations ... 0.87 |=================================

OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second > Higher Is Better
HC ................... 8.97547 |===========
HBv2 ................. 22.33360 |============================
HBv3 ................. 24.45860 |===============================
HBv4 ................. 36.61210 |==============================================
HBv4 + Optimizations . 36.65480 |==============================================
HBv3 + Optimizations . 24.47100 |===============================
HBv2 + Optimizations . 22.36680 |============================
HC + Optimizations ... 8.99618 |===========

OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second > Higher Is Better
HC ................... 8.97020 |===========
HBv2 ................. 22.15330 |============================
HBv3 ................. 24.17360 |==============================
HBv4 ................. 36.56710 |==============================================
HBv4 + Optimizations . 36.54460 |==============================================
HBv3 + Optimizations . 24.21970 |==============================
HBv2 + Optimizations . 22.17470 |============================
HC + Optimizations ... 8.87831 |===========

OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second > Higher Is Better
HC ................... 86.57 |====================
HBv2 ................. 157.13 |====================================
HBv3 ................. 168.24 |=======================================
HBv4 ................. 208.34 |================================================
HBv4 + Optimizations . 208.05 |================================================
HBv3 + Optimizations . 167.50 |=======================================
HBv2 + Optimizations . 162.45 |=====================================
HC + Optimizations ... 96.76 |======================

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second > Higher Is Better
HC ................... 9.49421 |===========
HBv2 ................. 8.67327 |==========
HBv3 ................. 11.74850 |==============
HBv4 ................. 38.07640 |==============================================
HBv4 + Optimizations . 38.07690 |==============================================
HBv3 + Optimizations . 11.75010 |==============
HBv2 + Optimizations . 8.66888 |==========
HC + Optimizations ... 9.52293 |============

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second > Higher Is Better
HC ................... 8.98723 |===========
HBv2 ................. 8.12356 |==========
HBv3 ................. 11.18450 |==============
HBv4 ................. 37.09180 |==============================================
HBv4 + Optimizations . 37.06240 |==============================================
HBv3 + Optimizations . 11.17230 |==============
HBv2 + Optimizations . 8.32323 |==========
HC + Optimizations ... 9.02689 |===========

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second > Higher Is Better
HC ................... 10.05 |===============
HBv2 ................. 13.92 |=====================
HBv3 ................. 14.61 |======================
HBv4 ................. 32.79 |=================================================
HBv4 + Optimizations . 32.58 |=================================================
HBv3 + Optimizations . 14.61 |======================
HBv2 + Optimizations . 13.94 |=====================
HC + Optimizations ... 10.06 |===============

7-Zip Compression 22.01
Test: Compression Rating
MIPS > Higher Is Better
HC ................... 210732 |=========
HBv2 ................. 489456 |=====================
HBv3 ................. 558290 |========================
HBv4 ................. 1032267 |=============================================
HBv4 + Optimizations . 1083523 |===============================================
HBv3 + Optimizations . 566595 |=========================
HBv2 + Optimizations . 501534 |======================
HC + Optimizations ... 216451 |=========

7-Zip Compression 22.01
Test: Decompression Rating
MIPS > Higher Is Better
HC ................... 148193 |==========
HBv2 ................. 371044 |========================
HBv3 ................. 397505 |==========================
HBv4 ................. 727995 |===============================================
HBv4 + Optimizations . 742859 |================================================
HBv3 + Optimizations . 406516 |==========================
HBv2 + Optimizations . 388577 |=========================
HC + Optimizations ... 150841 |==========

Timed Linux Kernel Compilation 6.1
Build: allmodconfig
Seconds < Lower Is Better
HC ................... 1950.63 |===============================================
HBv2 ................. 1782.93 |===========================================
HBv3 ................. 1889.46 |==============================================
HBv4 ................. 1681.26 |=========================================

Timed Node.js Compilation 19.8.1
Time To Compile
Seconds < Lower Is Better
HC ................... 330.61 |================================================
HBv2 ................. 194.37 |============================
HBv3 ................. 185.57 |===========================
HBv4 ................. 150.56 |======================

oneDNN 3.1
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 0.882446 |=============================
HBv2 ................. 1.407580 |==============================================
HBv3 ................. 0.910091 |==============================
HBv4 ................. 0.752929 |=========================

oneDNN 3.1
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 2.079200 |==============
HBv2 ................. 6.838250 |==============================================
HBv3 ................. 0.624233 |====
HBv4 ................. 0.306141 |==

oneDNN 3.1
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 3.111210 |==============================================
HBv2 ................. 0.573878 |========
HBv3 ................. 0.556741 |========
HBv4 ................. 0.276472 |====

oneDNN 3.1
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 1.244800 |====================================
HBv2 ................. 1.610020 |==============================================
HBv3 ................. 1.408620 |========================================
HBv4 ................. 0.582806 |=================

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 707.35 |=========================
HBv2 ................. 1345.14 |===============================================
HBv3 ................. 860.98 |==============================
HBv4 ................. 535.85 |===================

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
HC ................... 450.25 |========================
HBv2 ................. 896.81 |================================================
HBv3 ................. 533.50 |=============================
HBv4 ................. 401.86 |======================

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HC ................... 707.32 |========================
HBv2 ................. 1367.73 |===============================================
HBv3 ................. 886.81 |==============================
HBv4 ................. 533.49 |==================

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
HC ................... 442.47 |=======================
HBv2 ................. 910.94 |================================================
HBv3 ................. 529.97 |============================
HBv4 ................. 411.23 |======================

Liquid-DSP 1.6
Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
HC ................... 31796333 |=======================================
HBv2 ................. 33211667 |=========================================
HBv3 ................. 32817333 |=========================================
HBv4 ................. 35362667 |============================================
HBv4 + Optimizations . 35693667 |============================================
HBv3 + Optimizations . 37175000 |==============================================
HBv2 + Optimizations . 35080667 |===========================================
HC + Optimizations ... 31262000 |=======================================

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s > Higher Is Better
HC ................... 964423333 |=====================================
HBv2 ................. 1061433333 |=========================================
HBv3 ................. 917336667 |====================================
HBv4 ................. 1113300000 |===========================================
HBv4 + Optimizations . 1122866667 |===========================================
HBv3 + Optimizations . 1045000000 |========================================
HBv2 + Optimizations . 1136733333 |============================================
HC + Optimizations ... 948450000 |=====================================

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s > Higher Is Better
HC ................... 721290909 |======================
HBv2 ................. 1193400000 |====================================
HBv3 ................. 1086000000 |=================================
HBv4 ................. 1390540000 |==========================================
HBv4 + Optimizations . 1463200000 |============================================
HBv3 + Optimizations . 1347733333 |=========================================
HBv2 + Optimizations . 
1257833333 |====================================== HC + Optimizations ... 719580000 |====================== Liquid-DSP 1.6 Threads: 128 - Buffer Length: 256 - Filter Length: 32 samples/s > Higher Is Better HC ................... 1512600000 |=============== HBv2 ................. 3925933333 |======================================= HBv3 ................. 3366733333 |================================= HBv4 ................. 4426300000 |============================================ HBv4 + Optimizations . 4467266667 |============================================ HBv3 + Optimizations . 3832800000 |====================================== HBv2 + Optimizations . 4196833333 |========================================= HC + Optimizations ... 1478433333 |=============== Liquid-DSP 1.6 Threads: 128 - Buffer Length: 256 - Filter Length: 57 samples/s > Higher Is Better HC ................... 1572400000 |============= HBv2 ................. 4045933333 |================================= HBv3 ................. 3516300000 |============================= HBv4 ................. 5168233333 |========================================== HBv4 + Optimizations . 5412900000 |============================================ HBv3 + Optimizations . 4216966667 |================================== HBv2 + Optimizations . 4309133333 |=================================== HC + Optimizations ... 1570633333 |============= Liquid-DSP 1.6 Threads: 176 - Buffer Length: 256 - Filter Length: 32 samples/s > Higher Is Better HC ................... 1566133333 |=========== HBv2 ................. 4027100000 |============================= HBv3 ................. 3419533333 |======================== HBv4 ................. 6122233333 |============================================ HBv4 + Optimizations . 6181766667 |============================================ HBv3 + Optimizations . 3864000000 |============================ HBv2 + Optimizations . 4275533333 |============================== HC + Optimizations ... 
1536633333 |=========== Liquid-DSP 1.6 Threads: 176 - Buffer Length: 256 - Filter Length: 57 samples/s > Higher Is Better HC ................... 1664733333 |========== HBv2 ................. 4106700000 |========================= HBv3 ................. 3563433333 |====================== HBv4 ................. 6758166667 |========================================== HBv4 + Optimizations . 7095033333 |============================================ HBv3 + Optimizations . 4281533333 |=========================== HBv2 + Optimizations . 4350100000 |=========================== HC + Optimizations ... 1683033333 |========== Liquid-DSP 1.6 Threads: 176 - Buffer Length: 256 - Filter Length: 512 samples/s > Higher Is Better HC ................... 529213333 |========== HBv2 ................. 825653333 |================ HBv3 ................. 735370000 |=============== HBv4 ................. 2058233333 |========================================= HBv4 + Optimizations . 2221966667 |============================================ HBv3 + Optimizations . 814950000 |================ HBv2 + Optimizations . 924243333 |================== HC + Optimizations ... 544626667 |=========== PostgreSQL 15 Scaling Factor: 1 - Clients: 500 - Mode: Read Only TPS > Higher Is Better HC ................... 1354877 |==================== HBv2 ................. 2466249 |===================================== HBv3 ................. 2375005 |=================================== HBv4 ................. 3139846 |=============================================== HBv4 + Optimizations . 3161848 |=============================================== HBv3 + Optimizations . 2434749 |==================================== HBv2 + Optimizations . 2467328 |===================================== HC + Optimizations ... 1353510 |==================== PostgreSQL 15 Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency ms < Lower Is Better HC ................... 
0.369 |================================================= HBv2 ................. 0.203 |=========================== HBv3 ................. 0.210 |============================ HBv4 ................. 0.159 |===================== HBv4 + Optimizations . 0.158 |===================== HBv3 + Optimizations . 0.206 |=========================== HBv2 + Optimizations . 0.203 |=========================== HC + Optimizations ... 0.369 |================================================= PostgreSQL 15 Scaling Factor: 1 - Clients: 800 - Mode: Read Only TPS > Higher Is Better HC ................... 1161800 |================= HBv2 ................. 2439650 |==================================== HBv3 ................. 2407602 |==================================== HBv4 ................. 3123042 |=============================================== HBv4 + Optimizations . 3146173 |=============================================== HBv3 + Optimizations . 2478917 |===================================== HBv2 + Optimizations . 2481320 |===================================== HC + Optimizations ... 1159492 |================= PostgreSQL 15 Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency ms < Lower Is Better HC ................... 0.688 |================================================= HBv2 ................. 0.328 |======================= HBv3 ................. 0.332 |======================== HBv4 ................. 0.256 |================== HBv4 + Optimizations . 0.254 |================== HBv3 + Optimizations . 0.323 |======================= HBv2 + Optimizations . 0.323 |======================= HC + Optimizations ... 0.690 |================================================= Blender 3.6 Blend File: BMW27 - Compute: CPU-Only Seconds < Lower Is Better HC ................... 50.53 |================================================= HBv2 ................. 19.46 |=================== HBv3 ................. 19.49 |=================== HBv4 ................. 9.97 |========== HBv4 + Optimizations . 
10.11 |========== HBv3 + Optimizations . 19.43 |=================== HBv2 + Optimizations . 19.58 |=================== HC + Optimizations ... 49.95 |================================================ Blender 3.6 Blend File: Classroom - Compute: CPU-Only Seconds < Lower Is Better HC ................... 138.81 |================================================ HBv2 ................. 50.86 |================== HBv3 ................. 51.08 |================== HBv4 ................. 25.26 |========= HBv4 + Optimizations . 25.61 |========= HBv3 + Optimizations . 50.71 |================== HBv2 + Optimizations . 50.95 |================== HC + Optimizations ... 138.51 |================================================ Blender 3.6 Blend File: Fishy Cat - Compute: CPU-Only Seconds < Lower Is Better HC ................... 72.57 |================================================= HBv2 ................. 26.19 |================== HBv3 ................. 25.47 |================= HBv4 ................. 13.96 |========= HBv4 + Optimizations . 13.74 |========= HBv3 + Optimizations . 25.59 |================= HBv2 + Optimizations . 26.43 |================== HC + Optimizations ... 71.76 |================================================ Blender 3.6 Blend File: Barbershop - Compute: CPU-Only Seconds < Lower Is Better HC ................... 524.86 |================================================ HBv2 ................. 210.18 |=================== HBv3 ................. 189.30 |================= HBv4 ................. 96.77 |========= HBv4 + Optimizations . 97.52 |========= HBv3 + Optimizations . 188.96 |================= HBv2 + Optimizations . 211.46 |=================== HC + Optimizations ... 526.93 |================================================ Blender 3.6 Blend File: Pabellon Barcelona - Compute: CPU-Only Seconds < Lower Is Better HC ................... 176.21 |================================================ HBv2 ................. 64.14 |================= HBv3 ................. 
62.64 |================= HBv4 ................. 33.40 |========= HBv4 + Optimizations . 33.01 |========= HBv3 + Optimizations . 62.90 |================= HBv2 + Optimizations . 64.84 |================== HC + Optimizations ... 175.07 |================================================ PETSc 3.19 Test: Streams MB/s > Higher Is Better HC ................... 151286.25 |=========== HBv2 ................. 197895.47 |=============== HBv3 ................. 284001.92 |===================== HBv4 ................. 598417.70 |=============================================