Benchmarks for a future article on Phoronix looking at Azure HBv4 (Genoa-X) Linux performance.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2308011-PTS-AZUREHBV71
HTML result view exported from: https://openbenchmarking.org/result/2308011-PTS-AZUREHBV71&rdt&gru
Microsoft Azure HBv4 HPC Performance Benchmarks

System configurations (for HBv3, HBv2, and HC, only the fields differing from HBv4 are listed in the export):

HBv4:
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 13.1.0 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv3:
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv2:
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HC:
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk

Kernel Details: Transparent Huge Pages: always
Environment Details: CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Processor Details: CPU Microcode: 0xffffffff
Python Details: Python 3.6.8
Security Details:
  HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HBv3: identical to HBv4
  HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
Result summary: the detailed per-test results, with per-run standard errors, follow below. The following results appear only in this summary (fewer is better for all):

  Test                                               HBv4      HBv3      HBv2      HC
  PostgreSQL 1 - 500 - Read Only - Avg Latency (ms)  0.158     0.206     0.203     0.369
  PostgreSQL 1 - 800 - Read Only - Avg Latency (ms)  0.254     0.323     0.323     0.690
  Timed Node.js Compilation (seconds)                150.558   185.567   194.367   330.613
  Blender BMW27 - CPU-Only (seconds)                 10.11     19.43     19.58     49.95
  Blender Classroom - CPU-Only (seconds)             25.61     50.71     50.95     138.51
  Blender Fishy Cat - CPU-Only (seconds)             13.74     25.59     26.43     71.76
  Blender Barbershop - CPU-Only (seconds)            97.52     188.96    211.46    526.93
  Blender Pabellon Barcelona - CPU-Only (seconds)    33.01     62.90     64.84     175.07
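Each result below is reported with a standard error ("SE +/-") over N runs. As a hedged sketch (the exact averaging logic inside the Phoronix Test Suite may differ), the standard error of the mean can be reproduced from raw per-run figures like so; the sample values here are hypothetical:

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical GFLOP/s figures from a 3-run benchmark:
runs = [89.1, 89.4, 89.6]
print(f"SE +/- {standard_error(runs):.3f}, N = {len(runs)}")
```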
High Performance Conjugate Gradient 3.1 (GFLOP/s, more is better). Compiler notes: (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
  X Y Z: 104 104 104 - RT: 60: HBv4: 89.38 (SE +/- 0.26, N = 3); HBv3: 39.61 (SE +/- 0.03, N = 3); HBv2: 37.04 (SE +/- 0.03, N = 3); HC: 26.00 (SE +/- 0.02, N = 3)
  X Y Z: 144 144 144 - RT: 60: HBv4: 88.52 (SE +/- 0.11, N = 3); HBv3: 38.97 (SE +/- 0.02, N = 3); HBv2: 36.09 (SE +/- 0.02, N = 3); HC: 25.87 (SE +/- 0.05, N = 3)
  X Y Z: 160 160 160 - RT: 60: HBv4: 87.90 (SE +/- 0.12, N = 3); HBv3: 39.11 (SE +/- 0.02, N = 3); HBv2: 36.02 (SE +/- 0.01, N = 3); HC: 25.56 (SE +/- 0.06, N = 3)
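The generational uplift can be computed directly from the HPCG GFLOP/s figures; a small illustrative script using the 104 104 104 run values:

```python
# HPCG 3.1, X Y Z: 104 104 104 - RT: 60 (GFLOP/s, from the results above)
hpcg = {"HBv4": 89.38, "HBv3": 39.61, "HBv2": 37.04, "HC": 26.00}

baseline = hpcg["HBv4"]
for system, gflops in hpcg.items():
    # Ratio of the HBv4 result to each other configuration's result
    print(f"HBv4 is {baseline / gflops:.2f}x the {system} result")
```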
HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, more is better; rows are Test - Backend - Precision - X Y Z). Compiler notes: (CXX) g++ options: -O3 -pthread
  c2c - FFTW - float - 256: HBv4: 256.35 (SE +/- 1.07, N = 3); HBv3: 103.51 (SE +/- 1.41, N = 15); HBv2: 91.54 (SE +/- 0.67, N = 15); HC: 58.36 (SE +/- 0.07, N = 3)
  c2c - FFTW - float - 512: HBv4: 355.86 (SE +/- 1.24, N = 3); HBv3: 135.69 (SE +/- 0.93, N = 3); HBv2: 95.88 (SE +/- 0.47, N = 3); HC: 62.98 (SE +/- 0.04, N = 3)
  r2c - FFTW - float - 512: HBv4: 622.58 (SE +/- 2.25, N = 3); HBv3: 254.25 (SE +/- 2.52, N = 6); HBv2: 191.78 (SE +/- 1.03, N = 3); HC: 114.03 (SE +/- 0.09, N = 3)
  c2c - FFTW - double - 512: HBv4: 159.18 (SE +/- 0.34, N = 3); HBv3: 57.33 (SE +/- 0.07, N = 3); HBv2: 47.61 (SE +/- 0.09, N = 3); HC: 33.52 (SE +/- 0.03, N = 3)
  c2c - Stock - float - 256: HBv4: 244.34 (SE +/- 3.04, N = 4); HBv3: 103.41 (SE +/- 0.77, N = 15); HBv2: 91.26 (SE +/- 0.61, N = 15); HC: 59.73 (SE +/- 0.02, N = 3)
  c2c - Stock - float - 512: HBv4: 323.36 (SE +/- 0.80, N = 3); HBv3: 123.24 (SE +/- 0.73, N = 3); HBv2: 93.79 (SE +/- 0.34, N = 3); HC: 57.76 (SE +/- 0.02, N = 3)
  r2c - FFTW - double - 256: HBv4: 261.90 (SE +/- 5.66, N = 15); HBv3: 103.25 (SE +/- 0.75, N = 15); HBv2: 91.92 (SE +/- 1.31, N = 3); HC: 57.31 (SE +/- 0.25, N = 3)
  r2c - FFTW - double - 512: HBv4: 314.34 (SE +/- 0.50, N = 3); HBv3: 121.28 (SE +/- 0.86, N = 3); HBv2: 91.48 (SE +/- 0.15, N = 3); HC: 60.88 (SE +/- 0.05, N = 3)
  r2c - Stock - float - 512: HBv4: 596.23 (SE +/- 2.14, N = 3); HBv3: 232.17 (SE +/- 1.85, N = 3); HBv2: 190.95 (SE +/- 2.04, N = 3); HC: 110.05 (SE +/- 0.06, N = 3)
  c2c - Stock - double - 512: HBv4: 154.65 (SE +/- 0.27, N = 3); HBv3: 56.22 (SE +/- 0.04, N = 3); HBv2: 46.98 (SE +/- 0.05, N = 3); HC: 31.57 (SE +/- 0.02, N = 3)
  r2c - Stock - double - 256: HBv4: 264.95 (SE +/- 4.27, N = 12); HBv3: 102.70 (SE +/- 0.80, N = 15); HBv2: 93.31 (SE +/- 1.10, N = 4); HC: 60.57 (SE +/- 0.08, N = 3)
  r2c - Stock - double - 512: HBv4: 311.80 (SE +/- 1.60, N = 3); HBv3: 117.73 (SE +/- 0.40, N = 3); HBv2: 94.53 (SE +/- 0.25, N = 3); HC: 59.82 (SE +/- 0.05, N = 3)
  c2c - FFTW - float-long - 256: HBv4: 255.97 (SE +/- 3.64, N = 15); HBv3: 105.09 (SE +/- 1.13, N = 3); HBv2: 90.79 (SE +/- 0.74, N = 15); HC: 58.55 (SE +/- 0.16, N = 3)
  c2c - FFTW - float-long - 512: HBv4: 355.51 (SE +/- 1.18, N = 3); HBv3: 135.95 (SE +/- 0.58, N = 3); HBv2: 96.49 (SE +/- 0.05, N = 3); HC: 62.90 (SE +/- 0.04, N = 3)
  r2c - FFTW - float-long - 256: HBv4: 427.10 (SE +/- 10.91, N = 15); HBv3: 221.86 (SE +/- 3.45, N = 15); HBv2: 200.04 (SE +/- 3.34, N = 12); HC: 122.77 (SE +/- 0.53, N = 3)
  r2c - FFTW - float-long - 512: HBv4: 624.95 (SE +/- 4.23, N = 3); HBv3: 257.42 (SE +/- 2.91, N = 3); HBv2: 191.14 (SE +/- 1.39, N = 3); HC: 113.94 (SE +/- 0.18, N = 3)
  c2c - FFTW - double-long - 512: HBv4: 159.26 (SE +/- 0.05, N = 3); HBv3: 57.23 (SE +/- 0.04, N = 3); HBv2: 47.37 (SE +/- 0.02, N = 3); HC: 33.55 (SE +/- 0.02, N = 3)
  c2c - Stock - float-long - 256: HBv4: 247.73 (SE +/- 4.85, N = 15); HBv3: 105.36 (SE +/- 1.07, N = 6); HBv2: 92.13 (SE +/- 1.33, N = 3); HC: 59.55 (SE +/- 0.27, N = 3)
  c2c - Stock - float-long - 512: HBv4: 323.70 (SE +/- 0.96, N = 3); HBv3: 124.60 (SE +/- 0.05, N = 3); HBv2: 93.26 (SE +/- 0.23, N = 3); HC: 57.92 (SE +/- 0.06, N = 3)
  r2c - FFTW - double-long - 256: HBv4: 273.12 (SE +/- 4.03, N = 14); HBv3: 106.63 (SE +/- 1.05, N = 3); HBv2: 88.61 (SE +/- 1.12, N = 15); HC: 57.13 (SE +/- 0.12, N = 3)
  r2c - FFTW - double-long - 512: HBv4: 315.98 (SE +/- 1.65, N = 3); HBv3: 120.96 (SE +/- 0.04, N = 3); HBv2: 91.43 (SE +/- 0.07, N = 3); HC: 60.82 (SE +/- 0.06, N = 3)
  r2c - Stock - float-long - 512: HBv4: 590.93 (SE +/- 2.49, N = 3); HBv3: 233.80 (SE +/- 0.15, N = 3); HBv2: 189.21 (SE +/- 1.02, N = 3); HC: 110.20 (SE +/- 0.10, N = 3)
  c2c - Stock - double-long - 512: HBv4: 154.57 (SE +/- 0.03, N = 3); HBv3: 56.27 (SE +/- 0.03, N = 3); HBv2: 46.93 (SE +/- 0.09, N = 3); HC: 31.58 (SE +/- 0.02, N = 3)
  r2c - Stock - double-long - 256: HBv4: 258.72 (SE +/- 2.84, N = 15); HBv3: 105.50 (SE +/- 0.81, N = 15); HBv2: 92.39 (SE +/- 1.27, N = 3); HC: 60.89 (SE +/- 0.19, N = 3)
  r2c - Stock - double-long - 512: HBv4: 311.27 (SE +/- 0.81, N = 3); HBv3: 118.24 (SE +/- 0.49, N = 3); HBv2: 95.20 (SE +/- 0.16, N = 3); HC: 59.90 (SE +/- 0.03, N = 3)
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, more is better). Compiler notes: (CC) gcc options: -O3 -march=native -fopenmp
  HBv4: 52.802440 (SE +/- 0.581762, N = 5); HBv3: 25.048352 (SE +/- 0.146977, N = 3); HBv2: 6.395415 (SE +/- 0.275809, N = 12); HC: 14.072027 (SE +/- 0.474074, N = 12)
libxsmm 2-1.17-3645 (GFLOP/s, more is better). Compiler notes: (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2
  M N K: 128: HBv4: 6655.2 (SE +/- 59.23, N = 3); HBv3: 2273.5 (SE +/- 20.51, N = 9); HBv2: 1011.4 (SE +/- 169.50, N = 9); HC: 1284.8 (SE +/- 13.64, N = 15)
  M N K: 256: HBv4: 6908.6 (SE +/- 57.85, N = 9); HBv3: 2045.7 (SE +/- 25.11, N = 4); HBv2: 1128.3 (SE +/- 17.53, N = 9); HC: 904.1 (SE +/- 23.39, N = 9)
  M N K: 32: HBv4: 6163.0 (SE +/- 87.98, N = 3); HBv3: 1438.1 (SE +/- 38.99, N = 12); HBv2: 164.8 (SE +/- 1.72, N = 3); HC: 384.9 (SE +/- 3.15, N = 9)
  M N K: 64: HBv4: 5898.2 (SE +/- 74.65, N = 3); HBv3: 2413.7 (SE +/- 8.24, N = 3); HBv2: 331.4 (SE +/- 2.64, N = 15); HC: 748.1 (SE +/- 7.70, N = 3)
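Raw GFLOP/s comparisons mix very different core counts (176 for HBv4 down to 44 for HC). An illustrative per-core normalization, using the core counts from the system table and the libxsmm M N K = 128 results above, might look like:

```python
# libxsmm M N K = 128 results (GFLOP/s) paired with each VM's core count
results = {
    "HBv4": (6655.2, 176),
    "HBv3": (2273.5, 120),
    "HBv2": (1011.4, 120),
    "HC":   (1284.8, 44),
}

for system, (gflops, cores) in results.items():
    # Divide throughput by core count for a rough per-core figure
    print(f"{system}: {gflops / cores:.1f} GFLOP/s per core")
```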
Intel Open Image Denoise 2.0 (Images / Sec, more is better; Device: CPU-Only)
  RT.hdr_alb_nrm.3840x2160: HBv4: 3.11 (SE +/- 0.03, N = 3); HBv3: 1.72 (SE +/- 0.02, N = 4); HBv2: 2.03 (SE +/- 0.01, N = 3); HC: 1.85 (SE +/- 0.01, N = 3)
  RT.ldr_alb_nrm.3840x2160: HBv4: 3.08 (SE +/- 0.02, N = 3); HBv3: 1.69 (SE +/- 0.01, N = 15); HBv2: 2.01 (SE +/- 0.01, N = 3); HC: 1.85 (SE +/- 0.00, N = 3)
  RTLightmap.hdr.4096x4096: HBv4: 1.32 (SE +/- 0.01, N = 3); HBv3: 0.80 (SE +/- 0.01, N = 3); HBv2: 0.96 (SE +/- 0.01, N = 15); HC: 0.87 (SE +/- 0.00, N = 3)
OSPRay 2.12 (Items Per Second, more is better)
  particle_volume/ao/real_time: HBv4: 36.65480 (SE +/- 0.04011, N = 3); HBv3: 24.47100 (SE +/- 0.00987, N = 3); HBv2: 22.36680 (SE +/- 0.00858, N = 3); HC: 8.99618 (SE +/- 0.01510, N = 3)
  particle_volume/scivis/real_time: HBv4: 36.54460 (SE +/- 0.05762, N = 3); HBv3: 24.21970 (SE +/- 0.00564, N = 3); HBv2: 22.17470 (SE +/- 0.02944, N = 3); HC: 8.87831 (SE +/- 0.05412, N = 3)
  particle_volume/pathtracer/real_time: HBv4: 208.05 (SE +/- 0.81, N = 3); HBv3: 167.50 (SE +/- 1.50, N = 7); HBv2: 162.45 (SE +/- 0.83, N = 3); HC: 96.76 (SE +/- 7.22, N = 9)
  gravity_spheres_volume/dim_512/ao/real_time: HBv4: 38.07690 (SE +/- 0.02835, N = 3); HBv3: 11.75010 (SE +/- 0.01464, N = 3); HBv2: 8.66888 (SE +/- 0.15055, N = 15); HC: 9.52293 (SE +/- 0.03191, N = 3)
  gravity_spheres_volume/dim_512/scivis/real_time: HBv4: 37.06240 (SE +/- 0.12574, N = 3); HBv3: 11.17230 (SE +/- 0.02977, N = 3); HBv2: 8.32323 (SE +/- 0.13284, N = 15); HC: 9.02689 (SE +/- 0.01641, N = 3)
  gravity_spheres_volume/dim_512/pathtracer/real_time: HBv4: 32.58 (SE +/- 0.08, N = 3); HBv3: 14.61 (SE +/- 0.02, N = 3); HBv2: 13.94 (SE +/- 0.05, N = 3); HC: 10.06 (SE +/- 0.02, N = 3)
Laghos 3.1 (Major Kernels Total Rate, more is better). Compiler notes: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi
  Triple Point Problem: HBv4: 228.15 (SE +/- 1.25, N = 3); HBv3: 192.74 (SE +/- 0.38, N = 3); HBv2: 183.82 (SE +/- 0.57, N = 3); HC: 156.52 (SE +/- 0.08, N = 3)
  Sedov Blast Wave, ube_922_hex.mesh: HBv4: 402.94 (SE +/- 0.78, N = 3); HBv3: 361.81 (SE +/- 0.15, N = 3); HBv2: 345.14 (SE +/- 3.57, N = 5); HC: 247.49 (SE +/- 1.35, N = 3)
PETSc 3.19 - Streams (MB/s, more is better). Compiler notes: (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
  HBv4: 598417.70 (SE +/- 46271.80, N = 9); HBv3: 284001.92 (SE +/- 2674.31, N = 7); HBv2: 197895.47 (SE +/- 12025.83, N = 6); HC: 151286.25 (SE +/- 256.75, N = 3)
7-Zip Compression 22.01 (MIPS, more is better). Compiler notes: (CXX) g++ options: -lpthread -ldl -O2 -fPIC
  Compression Rating: HBv4: 1083523 (SE +/- 4158.65, N = 3); HBv3: 566595 (SE +/- 7198.45, N = 3); HBv2: 501534 (SE +/- 3504.63, N = 3); HC: 216451 (SE +/- 672.17, N = 3)
  Decompression Rating: HBv4: 742859 (SE +/- 8621.97, N = 3); HBv3: 406516 (SE +/- 3365.82, N = 3); HBv2: 388577 (SE +/- 10621.28, N = 3); HC: 150841 (SE +/- 300.63, N = 3)
Liquid-DSP 1.6 (samples/s, more is better; rows are Threads - Buffer Length - Filter Length). Compiler notes: (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
  128 - 256 - 57: HBv4: 5412900000 (SE +/- 24008123.63, N = 3); HBv3: 4216966667 (SE +/- 6263474.36, N = 3); HBv2: 4309133333 (SE +/- 14518991.39, N = 3); HC: 1570633333 (SE +/- 4733333.33, N = 3)
  176 - 256 - 32: HBv4: 6181766667 (SE +/- 6999365.05, N = 3); HBv3: 3864000000 (SE +/- 2858321.19, N = 3); HBv2: 4275533333 (SE +/- 25439885.57, N = 3); HC: 1536633333 (SE +/- 8873431.00, N = 3)
  176 - 256 - 57: HBv4: 7095033333 (SE +/- 36788419.07, N = 3); HBv3: 4281533333 (SE +/- 8996542.55, N = 3); HBv2: 4350100000 (SE +/- 8195730.60, N = 3); HC: 1683033333 (SE +/- 7033807.25, N = 3)
  176 - 256 - 512: HBv4: 2221966667 (SE +/- 5336145.09, N = 3); HBv3: 814950000 (SE +/- 1919487.78, N = 3); HBv2: 924243333 (SE +/- 3265385.80, N = 3); HC: 544626667 (SE +/- 2270626.44, N = 3)
NAS Parallel Benchmarks 3.4 (Total Mop/s, more is better). Compiler notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  BT.C: HBv4: 744413.90 (SE +/- 6061.11, N = 3); HBv3: 313813.98 (SE +/- 2034.04, N = 3); HBv2: 241509.88 (SE +/- 108.10, N = 3); HC: 106230.52 (SE +/- 62.47, N = 3)
  CG.C: HBv4: 74101.94 (SE +/- 599.32, N = 3); HBv3: 36681.43 (SE +/- 503.29, N = 3); HBv2: 36367.35 (SE +/- 778.45, N = 15); HC: 27619.05 (SE +/- 218.98, N = 3)
  FT.C: HBv4: 230164.79 (SE +/- 1773.50, N = 3); HBv3: 102122.36 (SE +/- 339.33, N = 3); HBv2: 98485.23 (SE +/- 320.45, N = 3); HC: 55288.19 (SE +/- 131.36, N = 3)
  IS.D: HBv4: 12967.37 (SE +/- 308.75, N = 15); HBv3: 5730.01 (SE +/- 67.99, N = 4); HBv2: 3977.02 (SE +/- 35.84, N = 7); HC: 1864.68 (SE +/- 7.55, N = 3)
  MG.C: HBv4: 437417.16 (SE +/- 5249.92, N = 15); HBv3: 131635.41 (SE +/- 1313.15, N = 15); HBv2: 108985.72 (SE +/- 768.30, N = 3); HC: 63404.01 (SE +/- 149.23, N = 3)
  SP.C: HBv4: 427298.99 (SE +/- 2970.97, N = 15); HBv3: 205795.59 (SE +/- 1576.20, N = 3); HBv2: 104771.90 (SE +/- 324.54, N = 3); HC: 41543.94 (SE +/- 105.69, N = 3)
PostgreSQL 15 (TPS, more is better). Compiler notes: (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
  Scaling Factor: 1 - Clients: 500 - Mode: Read Only: HBv4: 3161848 (SE +/- 3042.04, N = 3); HBv3: 2434749 (SE +/- 28428.57, N = 4); HBv2: 2467328 (SE +/- 4710.42, N = 3); HC: 1353510 (SE +/- 2849.38, N = 3)
  Scaling Factor: 1 - Clients: 800 - Mode: Read Only: HBv4: 3146173 (SE +/- 2972.36, N = 3); HBv3: 2478917 (SE +/- 13675.06, N = 3); HBv2: 2481320 (SE +/- 9212.17, N = 3); HC: 1159492 (SE +/- 2818.34, N = 3)
NAMD 2.14 - ATPase Simulation - 327,506 Atoms - days/ns (fewer is better)

  HBv4  0.14380 +/- 0.00011 (N=3)
  HBv3  0.27111 +/- 0.00015 (N=3)
  HBv2  0.26505 +/- 0.00069 (N=3)
  HC    0.52697 +/- 0.00060 (N=3)
Pennant 1.0.1 - Hydro Cycle Time in Seconds (fewer is better)

  Test        HBv4                          HBv3                          HBv2                          HC
  sedovbig    3.581391 +/- 0.018282 (N=3)   6.277107 +/- 0.027453 (N=3)   5.915805 +/- 0.011742 (N=3)   25.019560 +/- 0.026763 (N=3)
  leblancbig  2.122074 +/- 0.029043 (N=3)   3.649317 +/- 0.006682 (N=3)   3.466885 +/- 0.009233 (N=3)   10.645480 +/- 0.017495 (N=3)

  (CXX) g++ options: -fopenmp -pthread -lmpi
oneDNN 3.1 - Recurrent Neural Network - Data Type: bf16bf16bf16 - Engine: CPU - ms (fewer is better)

  Harness    HBv4                     HBv3                     HBv2                       HC
  Training   533.49 +/- 1.90 (N=3)    886.81 +/- 6.66 (N=3)    1367.73 +/- 13.52 (N=15)   707.32 +/- 1.51 (N=3)
  Inference  411.23 +/- 3.60 (N=8)    529.97 +/- 4.36 (N=3)     910.94 +/- 9.54 (N=15)    442.47 +/- 1.89 (N=3)

  Reported minimums - Training: 518.68 / 849.06 / 687.14 ms; Inference: 469.93 / 429.93 ms (per-system attribution incomplete in the source export)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
PostgreSQL 15 - Scaling Factor: 1 - Mode: Read Only - Average Latency in ms (fewer is better)

  Clients  HBv4                    HBv3                    HBv2                    HC
  500      0.158 +/- 0.000 (N=3)   0.206 +/- 0.002 (N=4)   0.203 +/- 0.000 (N=3)   0.369 +/- 0.001 (N=3)
  800      0.254 +/- 0.000 (N=3)   0.323 +/- 0.002 (N=3)   0.323 +/- 0.001 (N=3)   0.690 +/- 0.002 (N=3)

  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
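The throughput and latency results are internally consistent: with pgbench, average latency is approximately clients / TPS. A small sanity-check sketch (TPS values copied from the 500-client read-only results above) reproduces the reported latencies to within rounding:

```python
# Sanity check: pgbench average latency (ms) ~= clients / TPS * 1000.
# TPS values copied from the 500-client read-only results above.
tps_500 = {"HBv4": 3161848, "HBv3": 2434749, "HBv2": 2467328, "HC": 1353510}
for name, tps in tps_500.items():
    latency_ms = 500 / tps * 1000  # 500 concurrent clients
    print(f"{name}: {latency_ms:.3f} ms")
```

For example, 500 / 3161848 * 1000 gives roughly 0.158 ms for HBv4, matching the latency table.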
Timed Node.js Compilation 19.8.1 - Time To Compile in Seconds (fewer is better)

  HBv4  150.56 +/- 2.23 (N=12)
  HBv3  185.57 +/- 1.46 (N=3)
  HBv2  194.37 +/- 1.32 (N=3)
  HC    330.61 +/- 2.37 (N=3)
Blender 3.6 - Compute: CPU-Only - Render Time in Seconds (fewer is better)

  Blend File          HBv4                   HBv3                    HBv2                    HC
  BMW27               10.11 +/- 0.08 (N=3)    19.43 +/- 0.10 (N=3)    19.58 +/- 0.16 (N=3)    49.95 +/- 0.36 (N=3)
  Classroom           25.61 +/- 0.11 (N=3)    50.71 +/- 0.06 (N=3)    50.95 +/- 0.15 (N=3)   138.51 +/- 0.04 (N=3)
  Fishy Cat           13.74 +/- 0.09 (N=3)    25.59 +/- 0.15 (N=3)    26.43 +/- 0.04 (N=3)    71.76 +/- 0.23 (N=3)
  Barbershop          97.52 +/- 0.47 (N=3)   188.96 +/- 0.38 (N=3)   211.46 +/- 0.22 (N=3)   526.93 +/- 1.15 (N=3)
  Pabellon Barcelona  33.01 +/- 0.12 (N=3)    62.90 +/- 0.45 (N=3)    64.84 +/- 0.28 (N=3)   175.07 +/- 0.33 (N=3)
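The Blender render times improve fairly uniformly from HBv3 to HBv4 across scenes. A sketch computing the per-scene ratio (seconds copied from the CPU-only results above; lower time is better, so the HBv3/HBv4 ratio is the speedup factor):

```python
# HBv4 vs. HBv3 render-time ratio per scene (seconds copied from
# the Blender 3.6 CPU-only results above).
render_s = {
    # scene: (HBv4 seconds, HBv3 seconds)
    "BMW27": (10.11, 19.43),
    "Classroom": (25.61, 50.71),
    "Fishy Cat": (13.74, 25.59),
    "Barbershop": (97.52, 188.96),
    "Pabellon Barcelona": (33.01, 62.90),
}
for scene, (hbv4, hbv3) in render_s.items():
    print(f"{scene}: {hbv3 / hbv4:.2f}x faster on HBv4")
```

Every scene lands near a 1.9x speedup, consistent with the move from 120 to 176 cores plus the larger per-core cache of Genoa-X.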
Phoronix Test Suite v10.8.4