Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2308011-PTS-AZUREHBV71

Microsoft Azure HBv4 HPC Performance Benchmarks - Phoronix Test Suite
HTML result view exported from: https://openbenchmarking.org/result/2308011-PTS-AZUREHBV71&rdt&grs
Microsoft Azure HBv4 HPC Performance Benchmarks

HBv4
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 13.1.0 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv3
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv2
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk

For HBv3, HBv2, and HC the export lists only the fields shown.

Kernel Details
  - Transparent Huge Pages: always
Environment Details
  - CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"
Compiler Details
  - --disable-multilib --enable-checking=release
Processor Details
  - CPU Microcode: 0xffffffff
Python Details
  - Python 3.6.8
Security Details
  - HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown

Microsoft Azure HBv4 HPC Performance Benchmarks

Test | HBv4 | HBv3 | HBv2 | HC
npb: BT.C | 744413.90 | 313813.98 | 241509.88 | 106230.52
pennant: sedovbig | 3.581391 | 6.277107 | 5.915805 | 25.01956
npb: MG.C | 437417.16 | 131635.41 | 108985.72 | 63404.01
heffte: c2c - FFTW - float-long - 512 | 355.512 | 135.950 | 96.4941 | 62.9027
heffte: c2c - FFTW - float - 512 | 355.855 | 135.694 | 95.8801 | 62.9750
heffte: c2c - Stock - float - 512 | 323.356 | 123.242 | 93.7923 | 57.7643
heffte: c2c - Stock - float-long - 512 | 323.696 | 124.595 | 93.2573 | 57.9203
heffte: r2c - FFTW - float-long - 512 | 624.951 | 257.419 | 191.141 | 113.940
heffte: r2c - FFTW - float - 512 | 622.580 | 254.252 | 191.775 | 114.025
heffte: r2c - Stock - float - 512 | 596.226 | 232.166 | 190.949 | 110.049
blender: Classroom - CPU-Only | 25.61 | 50.71 | 50.95 | 138.51
blender: Barbershop - CPU-Only | 97.52 | 188.96 | 211.46 | 526.93
heffte: r2c - Stock - float-long - 512 | 590.925 | 233.797 | 189.208 | 110.197
blender: Pabellon Barcelona - CPU-Only | 33.01 | 62.90 | 64.84 | 175.07
blender: Fishy Cat - CPU-Only | 13.74 | 25.59 | 26.43 | 71.76
heffte: r2c - Stock - double - 512 | 311.803 | 117.731 | 94.5301 | 59.8216
heffte: r2c - Stock - double-long - 512 | 311.267 | 118.236 | 95.1989 | 59.8954
heffte: r2c - FFTW - double-long - 512 | 315.982 | 120.957 | 91.4296 | 60.8204
heffte: r2c - FFTW - double - 512 | 314.336 | 121.283 | 91.4802 | 60.8804
pennant: leblancbig | 2.122074 | 3.649317 | 3.466885 | 10.64548
compress-7zip: Compression Rating | 1083523 | 566595 | 501534 | 216451
blender: BMW27 - CPU-Only | 10.11 | 19.43 | 19.58 | 49.95
compress-7zip: Decompression Rating | 742859 | 406516 | 388577 | 150841
heffte: c2c - Stock - double - 512 | 154.648 | 56.2161 | 46.9794 | 31.5718
heffte: c2c - Stock - double-long - 512 | 154.568 | 56.2690 | 46.9289 | 31.5846
heffte: r2c - FFTW - double-long - 256 | 273.121 | 106.632 | 88.6081 | 57.1290
heffte: c2c - FFTW - double - 512 | 159.175 | 57.3307 | 47.6050 | 33.5193
heffte: c2c - FFTW - double-long - 512 | 159.258 | 57.2263 | 47.3696 | 33.5545
heffte: c2c - FFTW - float - 256 | 256.349 | 103.5147 | 91.5383 | 58.3567
heffte: r2c - Stock - double - 256 | 264.954 | 102.7046 | 93.3137 | 60.5727
heffte: c2c - FFTW - float-long - 256 | 255.968 | 105.093 | 90.7883 | 58.5498
heffte: r2c - Stock - double-long - 256 | 258.716 | 105.5003 | 92.3883 | 60.8872
liquid-dsp: 176 - 256 - 57 | 7095033333 | 4281533333 | 4350100000 | 1683033333
npb: FT.C | 230164.79 | 102122.36 | 98485.23 | 55288.19
ospray: particle_volume/scivis/real_time | 36.5446 | 24.2197 | 22.1747 | 8.87831
heffte: c2c - Stock - float - 256 | 244.342 | 103.409 | 91.2601 | 59.7292
liquid-dsp: 176 - 256 - 512 | 2221966667 | 814950000 | 924243333 | 544626667
ospray: particle_volume/ao/real_time | 36.6548 | 24.4710 | 22.3668 | 8.99618
liquid-dsp: 176 - 256 - 32 | 6181766667 | 3864000000 | 4275533333 | 1536633333
namd: ATPase Simulation - 327,506 Atoms | 0.14380 | 0.27111 | 0.26505 | 0.52697
liquid-dsp: 128 - 256 - 57 | 5412900000 | 4216966667 | 4309133333 | 1570633333
hpcg: 160 160 160 - 60 | 87.9013 | 39.1106 | 36.0167 | 25.5635
hpcg: 104 104 104 - 60 | 89.3840 | 39.6093 | 37.0410 | 25.9971
hpcg: 144 144 144 - 60 | 88.5160 | 38.9739 | 36.0866 | 25.8659
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 32.5839 | 14.6088 | 13.9416 | 10.0611
pgbench: 1 - 800 - Read Only - Average Latency | 0.254 | 0.323 | 0.323 | 0.690
pgbench: 1 - 800 - Read Only | 3146173 | 2478917 | 2481320 | 1159492
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 533.494 | 886.810 | 1367.73 | 707.322
pgbench: 1 - 500 - Read Only | 3161848 | 2434749 | 2467328 | 1353510
pgbench: 1 - 500 - Read Only - Average Latency | 0.158 | 0.206 | 0.203 | 0.369
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 411.234 | 529.973 | 910.937 | 442.471
build-nodejs: Time To Compile | 150.558 | 185.567 | 194.367 | 330.613
libxsmm: 64 | 5898.2 | 2413.7 | 331.4 | 748.1
npb: SP.C | 427298.99 | 205795.59 | 104771.90 | 41543.94
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 3.08 | 1.69 | 2.01 | 1.85
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 3.11 | 1.72 | 2.03 | 1.85
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 1.32 | 0.80 | 0.96 | 0.87
laghos: Sedov Blast Wave, ube_922_hex.mesh | 402.94 | 361.81 | 345.14 | 247.49
laghos: Triple Point Problem | 228.15 | 192.74 | 183.82 | 156.52
petsc: Streams | 598417.6957 | 284001.9162 | 197895.4717 | 151286.2491
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 37.0624 | 11.1723 | 8.32323 | 9.02689
ospray: gravity_spheres_volume/dim_512/ao/real_time | 38.0769 | 11.7501 | 8.66888 | 9.52293
ospray: particle_volume/pathtracer/real_time | 208.050 | 167.504 | 162.449 | 96.7630
mt-dgemm: Sustained Floating-Point Rate | 52.802440 | 25.048352 | 6.395415 | 14.072027
heffte: c2c - Stock - float-long - 256 | 247.725 | 105.361 | 92.1290 | 59.5527
heffte: r2c - FFTW - float-long - 256 | 427.101 | 221.861 | 200.035 | 122.772
heffte: r2c - FFTW - double - 256 | 261.903 | 103.2457 | 91.9186 | 57.3101
libxsmm: 32 | 6163.0 | 1438.1 | 164.8 | 384.9
libxsmm: 256 | 6908.6 | 2045.7 | 1128.3 | 904.1
libxsmm: 128 | 6655.2 | 2273.5 | 1011.4 | 1284.8
npb: IS.D | 12967.37 | 5730.01 | 3977.02 | 1864.68
npb: CG.C | 74101.94 | 36681.43 | 36367.35 | 27619.05

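The overview mixes "More Is Better" and "Fewer Is Better" metrics, so raw numbers are not directly comparable across rows. As a minimal sketch (not part of the Phoronix Test Suite; the `RESULTS` layout and the `speedup` helper are hypothetical names chosen for illustration), the following Python turns two rows copied from the overview into ratios relative to HBv4, inverting time-based results so that a ratio above 1 always means HBv4 is faster.

```python
# Two rows copied from the overview above; the dictionary layout is an
# assumption made for this illustration only.
RESULTS = {
    "npb: BT.C": {
        "higher_is_better": True,  # Total Mop/s
        "values": {"HBv4": 744413.90, "HBv3": 313813.98,
                   "HBv2": 241509.88, "HC": 106230.52},
    },
    "pennant: sedovbig": {
        "higher_is_better": False,  # Hydro Cycle Time in seconds
        "values": {"HBv4": 3.581391, "HBv3": 6.277107,
                   "HBv2": 5.915805, "HC": 25.01956},
    },
}


def speedup(values, higher_is_better, target="HBv4"):
    """Return how many times faster `target` is than each other system."""
    ref = values[target]
    ratios = {}
    for system, value in values.items():
        if system == target:
            continue
        # For rates divide target by other; for times divide other by target.
        ratios[system] = ref / value if higher_is_better else value / ref
    return ratios


for test, entry in RESULTS.items():
    ratios = speedup(entry["values"], entry["higher_is_better"])
    summary = ", ".join(f"{s}: {r:.2f}x" for s, r in ratios.items())
    print(f"{test} -> HBv4 vs {summary}")
```

The same helper can be applied to any other row once its direction (more or fewer is better) is taken from the corresponding chart heading.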
NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s, More Is Better
HBv4: 744413.90 (SE +/- 6061.11, N = 3)
HBv3: 313813.98 (SE +/- 2034.04, N = 3)
HBv2: 241509.88 (SE +/- 108.10, N = 3)
HC: 106230.52 (SE +/- 62.47, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds, Fewer Is Better
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HC: 25.019560 (SE +/- 0.026763, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s, More Is Better
HBv4: 437417.16 (SE +/- 5249.92, N = 15)
HBv3: 131635.41 (SE +/- 1313.15, N = 15)
HBv2: 108985.72 (SE +/- 768.30, N = 3)
HC: 63404.01 (SE +/- 149.23, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 355.51 (SE +/- 1.18, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HC: 62.90 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 355.86 (SE +/- 1.24, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HC: 62.98 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 323.36 (SE +/- 0.80, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HC: 57.76 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 323.70 (SE +/- 0.96, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HC: 57.92 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 624.95 (SE +/- 4.23, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HC: 113.94 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 622.58 (SE +/- 2.25, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HC: 114.03 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 596.23 (SE +/- 2.14, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HC: 110.05 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender 3.6
Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
HBv4: 25.61 (SE +/- 0.11, N = 3)
HBv3: 50.71 (SE +/- 0.06, N = 3)
HBv2: 50.95 (SE +/- 0.15, N = 3)
HC: 138.51 (SE +/- 0.04, N = 3)

Blender 3.6
Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
HBv4: 97.52 (SE +/- 0.47, N = 3)
HBv3: 188.96 (SE +/- 0.38, N = 3)
HBv2: 211.46 (SE +/- 0.22, N = 3)
HC: 526.93 (SE +/- 1.15, N = 3)

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 590.93 (SE +/- 2.49, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HC: 110.20 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender 3.6
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
HBv4: 33.01 (SE +/- 0.12, N = 3)
HBv3: 62.90 (SE +/- 0.45, N = 3)
HBv2: 64.84 (SE +/- 0.28, N = 3)
HC: 175.07 (SE +/- 0.33, N = 3)

Blender 3.6
Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
HBv4: 13.74 (SE +/- 0.09, N = 3)
HBv3: 25.59 (SE +/- 0.15, N = 3)
HBv2: 26.43 (SE +/- 0.04, N = 3)
HC: 71.76 (SE +/- 0.23, N = 3)

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 311.80 (SE +/- 1.60, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HC: 59.82 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 311.27 (SE +/- 0.81, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HC: 59.90 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 315.98 (SE +/- 1.65, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HC: 60.82 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 314.34 (SE +/- 0.50, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HC: 60.88 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds, Fewer Is Better
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HC: 10.645480 (SE +/- 0.017495, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

7-Zip Compression 22.01
Test: Compression Rating
MIPS, More Is Better
HBv4: 1083523 (SE +/- 4158.65, N = 3)
HBv3: 566595 (SE +/- 7198.45, N = 3)
HBv2: 501534 (SE +/- 3504.63, N = 3)
HC: 216451 (SE +/- 672.17, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Blender 3.6
Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
HBv4: 10.11 (SE +/- 0.08, N = 3)
HBv3: 19.43 (SE +/- 0.10, N = 3)
HBv2: 19.58 (SE +/- 0.16, N = 3)
HC: 49.95 (SE +/- 0.36, N = 3)

7-Zip Compression 22.01
Test: Decompression Rating
MIPS, More Is Better
HBv4: 742859 (SE +/- 8621.97, N = 3)
HBv3: 406516 (SE +/- 3365.82, N = 3)
HBv2: 388577 (SE +/- 10621.28, N = 3)
HC: 150841 (SE +/- 300.63, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

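Because the four instance types expose very different core counts, dividing the 7-Zip ratings by the core count reported in the system table gives a rough per-core view. The following Python is only an illustrative sketch: `CORES`, `SEVEN_ZIP`, and `per_core` are hypothetical names, the core counts are the VM-visible counts from the system table (176, 120, 120, 44), and the MIPS figures are the ones reported above.

```python
# VM-visible core counts as listed in the system table.
CORES = {"HBv4": 176, "HBv3": 120, "HBv2": 120, "HC": 44}

# 7-Zip 22.01 ratings in MIPS, copied from the results above.
SEVEN_ZIP = {
    "Compression Rating": {"HBv4": 1083523, "HBv3": 566595,
                           "HBv2": 501534, "HC": 216451},
    "Decompression Rating": {"HBv4": 742859, "HBv3": 406516,
                             "HBv2": 388577, "HC": 150841},
}


def per_core(ratings, cores):
    """Divide each system's aggregate rating by its reported core count."""
    return {system: ratings[system] / cores[system] for system in ratings}


for test, ratings in SEVEN_ZIP.items():
    for system, mips in per_core(ratings, CORES).items():
        print(f"{test:22s} {system:5s} {mips:8.1f} MIPS per core")
```

Per-core figures ignore clock speed, cache size, and memory bandwidth differences, so they are best read as a normalization aid rather than a ranking.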
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 154.65 (SE +/- 0.27, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HC: 31.57 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 154.57 (SE +/- 0.03, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HC: 31.58 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 273.12 (SE +/- 4.03, N = 14)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HC: 57.13 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 159.18 (SE +/- 0.34, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HC: 33.52 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 159.26 (SE +/- 0.05, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HC: 33.55 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 256.35 (SE +/- 1.07, N = 3)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HC: 58.36 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 264.95 (SE +/- 4.27, N = 12)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HC: 60.57 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 255.97 (SE +/- 3.64, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HC: 58.55 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 258.72 (SE +/- 2.84, N = 15)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HC: 60.89 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
HBv4: 7095033333 (SE +/- 36788419.07, N = 3)
HBv3: 4281533333 (SE +/- 8996542.55, N = 3)
HBv2: 4350100000 (SE +/- 8195730.60, N = 3)
HC: 1683033333 (SE +/- 7033807.25, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s, More Is Better
HBv4: 230164.79 (SE +/- 1773.50, N = 3)
HBv3: 102122.36 (SE +/- 339.33, N = 3)
HBv2: 98485.23 (SE +/- 320.45, N = 3)
HC: 55288.19 (SE +/- 131.36, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second, More Is Better
HBv4: 36.54460 (SE +/- 0.05762, N = 3)
HBv3: 24.21970 (SE +/- 0.00564, N = 3)
HBv2: 22.17470 (SE +/- 0.02944, N = 3)
HC: 8.87831 (SE +/- 0.05412, N = 3)

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 244.34 (SE +/- 3.04, N = 4)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HC: 59.73 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
HBv4: 2221966667 (SE +/- 5336145.09, N = 3)
HBv3: 814950000 (SE +/- 1919487.78, N = 3)
HBv2: 924243333 (SE +/- 3265385.80, N = 3)
HC: 544626667 (SE +/- 2270626.44, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
HBv4: 36.65480 (SE +/- 0.04011, N = 3)
HBv3: 24.47100 (SE +/- 0.00987, N = 3)
HBv2: 22.36680 (SE +/- 0.00858, N = 3)
HC: 8.99618 (SE +/- 0.01510, N = 3)

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
HBv4: 6181766667 (SE +/- 6999365.05, N = 3)
HBv3: 3864000000 (SE +/- 2858321.19, N = 3)
HBv2: 4275533333 (SE +/- 25439885.57, N = 3)
HC: 1536633333 (SE +/- 8873431.00, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better
HBv4: 0.14380 (SE +/- 0.00011, N = 3)
HBv3: 0.27111 (SE +/- 0.00015, N = 3)
HBv2: 0.26505 (SE +/- 0.00069, N = 3)
HC: 0.52697 (SE +/- 0.00060, N = 3)

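NAMD reports days/ns here, while molecular dynamics throughput is often quoted as ns/day. The two are reciprocals, so the conversion is a one-liner. The snippet below is an illustrative Python sketch; `NAMD_DAYS_PER_NS` and `ns_per_day` are hypothetical names, and the inputs are the days/ns values reported above.

```python
# NAMD 2.14 ATPase results in days/ns, copied from the chart above.
NAMD_DAYS_PER_NS = {"HBv4": 0.14380, "HBv3": 0.27111,
                    "HBv2": 0.26505, "HC": 0.52697}


def ns_per_day(days_per_ns):
    """Convert days per simulated nanosecond into nanoseconds per day."""
    return 1.0 / days_per_ns


for system, value in NAMD_DAYS_PER_NS.items():
    print(f"{system}: {value} days/ns = {ns_per_day(value):.3f} ns/day")
```

After conversion the metric becomes "More Is Better", which makes it easier to compare against the rate-based results elsewhere in this file.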
Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
HBv4: 5412900000 (SE +/- 24008123.63, N = 3)
HBv3: 4216966667 (SE +/- 6263474.36, N = 3)
HBv2: 4309133333 (SE +/- 14518991.39, N = 3)
HC: 1570633333 (SE +/- 4733333.33, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

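The Liquid-DSP results with filter length 57 were collected at both 128 and 176 threads, which allows a simple thread-scaling check. The Python below is a sketch under stated assumptions: `AT_128`, `AT_176`, `CORES`, and `thread_scaling_gain` are hypothetical names, the throughput figures are the samples/s values reported in this file, and the core counts come from the system table. Only HBv4 exposes 176 cores, so on the other systems the 176-thread run oversubscribes the cores.

```python
# VM-visible core counts from the system table.
CORES = {"HBv4": 176, "HBv3": 120, "HBv2": 120, "HC": 44}

# Liquid-DSP 1.6, buffer length 256, filter length 57, in samples/s.
AT_128 = {"HBv4": 5412900000, "HBv3": 4216966667,
          "HBv2": 4309133333, "HC": 1570633333}
AT_176 = {"HBv4": 7095033333, "HBv3": 4281533333,
          "HBv2": 4350100000, "HC": 1683033333}


def thread_scaling_gain(system):
    """Throughput at 176 threads divided by throughput at 128 threads."""
    return AT_176[system] / AT_128[system]


ideal = 176 / 128  # gain expected if throughput scaled linearly with threads
for system in CORES:
    gain = thread_scaling_gain(system)
    print(f"{system} ({CORES[system]} cores): {gain:.3f}x "
          f"vs {ideal:.3f}x ideal")
```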
High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better
HBv4: 87.90 (SE +/- 0.12, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv2: 36.02 (SE +/- 0.01, N = 3)
HC: 25.56 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better
HBv4: 89.38 (SE +/- 0.26, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv2: 37.04 (SE +/- 0.03, N = 3)
HC: 26.00 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better
HBv4: 88.52 (SE +/- 0.11, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv2: 36.09 (SE +/- 0.02, N = 3)
HC: 25.87 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
HBv4: 32.58 (SE +/- 0.08, N = 3)
HBv3: 14.61 (SE +/- 0.02, N = 3)
HBv2: 13.94 (SE +/- 0.05, N = 3)
HC: 10.06 (SE +/- 0.02, N = 3)

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms, Fewer Is Better
HBv4: 0.254 (SE +/- 0.000, N = 3)
HBv3: 0.323 (SE +/- 0.002, N = 3)
HBv2: 0.323 (SE +/- 0.001, N = 3)
HC: 0.690 (SE +/- 0.002, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS, More Is Better
HBv4: 3146173 (SE +/- 2972.36, N = 3)
HBv3: 2478917 (SE +/- 13675.06, N = 3)
HBv2: 2481320 (SE +/- 9212.17, N = 3)
HC: 1159492 (SE +/- 2818.34, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
HBv4: 533.49 (SE +/- 1.90, N = 3)
HBv3: 886.81 (SE +/- 6.66, N = 3)
HBv2: 1367.73 (SE +/- 13.52, N = 15)
HC: 707.32 (SE +/- 1.51, N = 3)
MIN: 518.68 MIN: 849.06 MIN: 687.14
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS, More Is Better
HBv4: 3161848 (SE +/- 3042.04, N = 3)
HBv3: 2434749 (SE +/- 28428.57, N = 4)
HBv2: 2467328 (SE +/- 4710.42, N = 3)
HC: 1353510 (SE +/- 2849.38, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms, Fewer Is Better
HBv4: 0.158 (SE +/- 0.000, N = 3)
HBv3: 0.206 (SE +/- 0.002, N = 4)
HBv2: 0.203 (SE +/- 0.000, N = 3)
HC: 0.369 (SE +/- 0.001, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

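The PostgreSQL latency and TPS figures are two views of the same runs: with a fixed number of concurrent clients, throughput is roughly the client count divided by the average latency (Little's law). The Python below is a sketch for checking that relationship against the reported numbers; `PGBENCH`, `implied_tps`, and `relative_gap` are hypothetical names, and the tuples are copied from the four PostgreSQL charts in this file. Latency is reported to only three decimal places, so small gaps are expected.

```python
# Read-only pgbench results copied from the charts in this file.
# Each entry: (clients, average latency in ms, reported TPS).
PGBENCH = {
    "HBv4": [(800, 0.254, 3146173), (500, 0.158, 3161848)],
    "HBv3": [(800, 0.323, 2478917), (500, 0.206, 2434749)],
    "HBv2": [(800, 0.323, 2481320), (500, 0.203, 2467328)],
    "HC": [(800, 0.690, 1159492), (500, 0.369, 1353510)],
}


def implied_tps(clients, latency_ms):
    """Throughput implied by Little's law: concurrency divided by latency."""
    return clients / (latency_ms / 1000.0)


def relative_gap(clients, latency_ms, reported_tps):
    """Fractional difference between implied and reported TPS."""
    implied = implied_tps(clients, latency_ms)
    return abs(implied - reported_tps) / reported_tps


for system, runs in PGBENCH.items():
    for clients, latency_ms, tps in runs:
        gap = relative_gap(clients, latency_ms, tps)
        print(f"{system} {clients} clients: implied "
              f"{implied_tps(clients, latency_ms):.0f} TPS, "
              f"reported {tps}, gap {gap:.2%}")
```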
oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
HBv4: 411.23 (SE +/- 3.60, N = 8)
HBv3: 529.97 (SE +/- 4.36, N = 3)
HBv2: 910.94 (SE +/- 9.54, N = 15)
HC: 442.47 (SE +/- 1.89, N = 3)
MIN: 469.93 MIN: 429.93
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Timed Node.js Compilation 19.8.1
Time To Compile
Seconds, Fewer Is Better
HBv4: 150.56 (SE +/- 2.23, N = 12)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HC: 330.61 (SE +/- 2.37, N = 3)

libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s, More Is Better
HBv4: 5898.2 (SE +/- 74.65, N = 3)
HBv3: 2413.7 (SE +/- 8.24, N = 3)
HBv2: 331.4 (SE +/- 2.64, N = 15)
HC: 748.1 (SE +/- 7.70, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s, More Is Better
HBv4: 427298.99 (SE +/- 2970.97, N = 15)
HBv3: 205795.59 (SE +/- 1576.20, N = 3)
HBv2: 104771.90 (SE +/- 324.54, N = 3)
HC: 41543.94 (SE +/- 105.69, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 3.08 (SE +/- 0.02, N = 3)
HBv3: 1.69 (SE +/- 0.01, N = 15)
HBv2: 2.01 (SE +/- 0.01, N = 3)
HC: 1.85 (SE +/- 0.00, N = 3)

Intel Open Image Denoise 2.0
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 3.11 (SE +/- 0.03, N = 3)
HBv3: 1.72 (SE +/- 0.02, N = 4)
HBv2: 2.03 (SE +/- 0.01, N = 3)
HC: 1.85 (SE +/- 0.01, N = 3)

Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 1.32 (SE +/- 0.01, N = 3)
HBv3: 0.80 (SE +/- 0.01, N = 3)
HBv2: 0.96 (SE +/- 0.01, N = 15)
HC: 0.87 (SE +/- 0.00, N = 3)

Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better
HBv4: 402.94 (SE +/- 0.78, N = 3)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HC: 247.49 (SE +/- 1.35, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate, More Is Better
HBv4: 228.15 (SE +/- 1.25, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HC: 156.52 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

PETSc 3.19
Test: Streams
MB/s, More Is Better
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HC: 151286.25 (SE +/- 256.75, N = 3)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64

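The standard errors for PETSc Streams differ by orders of magnitude between systems, which is easier to judge when expressed relative to the mean. As an illustrative Python sketch (the `PETSC_STREAMS` layout and `relative_se_percent` helper are hypothetical names; the mean and SE values are the ones reported above), the relative standard error is simply SE divided by the mean.

```python
# PETSc 3.19 Streams results in MB/s: (mean, standard error, run count N),
# copied from the chart above.
PETSC_STREAMS = {
    "HBv4": (598417.70, 46271.80, 9),
    "HBv3": (284001.92, 2674.31, 7),
    "HBv2": (197895.47, 12025.83, 6),
    "HC": (151286.25, 256.75, 3),
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


for system, (mean, se, n) in PETSC_STREAMS.items():
    print(f"{system}: {relative_se_percent(mean, se):.2f}% of mean over N = {n}")
```

A larger relative standard error goes together with the higher run counts here (N = 9, 7, 6 versus 3), which suggests the noisier systems were re-run more often, though this file does not state why the run counts differ.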
OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
  HBv4: 37.06240 (SE +/- 0.12574, N = 3)
  HBv3: 11.17230 (SE +/- 0.02977, N = 3)
  HBv2: 8.32323 (SE +/- 0.13284, N = 15)
  HC:   9.02689 (SE +/- 0.01641, N = 3)

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
  HBv4: 38.07690 (SE +/- 0.02835, N = 3)
  HBv3: 11.75010 (SE +/- 0.01464, N = 3)
  HBv2: 8.66888 (SE +/- 0.15055, N = 15)
  HC:   9.52293 (SE +/- 0.03191, N = 3)

OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second, More Is Better
  HBv4: 208.05 (SE +/- 0.81, N = 3)
  HBv3: 167.50 (SE +/- 1.50, N = 7)
  HBv2: 162.45 (SE +/- 0.83, N = 3)
  HC:   96.76 (SE +/- 7.22, N = 9)

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s, More Is Better
  HBv4: 52.802440 (SE +/- 0.581762, N = 5)
  HBv3: 25.048352 (SE +/- 0.146977, N = 3)
  HBv2: 6.395415 (SE +/- 0.275809, N = 12)
  HC:   14.072027 (SE +/- 0.474074, N = 12)
1. (CC) gcc options: -O3 -march=native -fopenmp

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HBv4: 247.73 (SE +/- 4.85, N = 15)
  HBv3: 105.36 (SE +/- 1.07, N = 6)
  HBv2: 92.13 (SE +/- 1.33, N = 3)
  HC:   59.55 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HBv4: 427.10 (SE +/- 10.91, N = 15)
  HBv3: 221.86 (SE +/- 3.45, N = 15)
  HBv2: 200.04 (SE +/- 3.34, N = 12)
  HC:   122.77 (SE +/- 0.53, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
  HBv4: 261.90 (SE +/- 5.66, N = 15)
  HBv3: 103.25 (SE +/- 0.75, N = 15)
  HBv2: 91.92 (SE +/- 1.31, N = 3)
  HC:   57.31 (SE +/- 0.25, N = 3)
1. (CXX) g++ options: -O3 -pthread

libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s, More Is Better
  HBv4: 6163.0 (SE +/- 87.98, N = 3)
  HBv3: 1438.1 (SE +/- 38.99, N = 12)
  HBv2: 164.8 (SE +/- 1.72, N = 3)
  HC:   384.9 (SE +/- 3.15, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s, More Is Better
  HBv4: 6908.6 (SE +/- 57.85, N = 9)
  HBv3: 2045.7 (SE +/- 25.11, N = 4)
  HBv2: 1128.3 (SE +/- 17.53, N = 9)
  HC:   904.1 (SE +/- 23.39, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s, More Is Better
  HBv4: 6655.2 (SE +/- 59.23, N = 3)
  HBv3: 2273.5 (SE +/- 20.51, N = 9)
  HBv2: 1011.4 (SE +/- 169.50, N = 9)
  HC:   1284.8 (SE +/- 13.64, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s, More Is Better
  HBv4: 12967.37 (SE +/- 308.75, N = 15)
  HBv3: 5730.01 (SE +/- 67.99, N = 4)
  HBv2: 3977.02 (SE +/- 35.84, N = 7)
  HC:   1864.68 (SE +/- 7.55, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s, More Is Better
  HBv4: 74101.94 (SE +/- 599.32, N = 3)
  HBv3: 36681.43 (SE +/- 503.29, N = 3)
  HBv2: 36367.35 (SE +/- 778.45, N = 15)
  HC:   27619.05 (SE +/- 218.98, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Phoronix Test Suite v10.8.4