Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2308011-PTS-AZUREHBV71
Microsoft Azure HBv4 HPC Performance Benchmarks
HTML result view exported from: https://openbenchmarking.org/result/2308011-PTS-AZUREHBV71&grr .
Microsoft Azure HBv4 HPC Performance Benchmarks: System Details
Fields not shown for a system were left blank in the export.

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 13.1.0 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv2
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv3
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv4
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

Kernel Details: Transparent Huge Pages: always
Environment Details: CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Processor Details: CPU Microcode: 0xffffffff
Python Details: Python 3.6.8
Security Details:
  HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Microsoft Azure HBv4 HPC Performance Benchmarks: Results Overview

Test | HC | HBv2 | HBv3 | HBv4
libxsmm: 128 | 1284.8 | 1011.4 | 2273.5 | 6655.2
petsc: Streams | 151286.2491 | 197895.4717 | 284001.9162 | 598417.6957
hpcg: 160 160 160 - 60 | 25.5635 | 36.0167 | 39.1106 | 87.9013
libxsmm: 256 | 904.1 | 1128.3 | 2045.7 | 6908.6
hpcg: 144 144 144 - 60 | 25.8659 | 36.0866 | 38.9739 | 88.5160
build-nodejs: Time To Compile | 330.613 | 194.367 | 185.567 | 150.558
ospray: particle_volume/pathtracer/real_time | 96.7630 | 162.449 | 167.504 | 208.050
blender: Barbershop - CPU-Only | 526.93 | 211.46 | 188.96 | 97.52
hpcg: 104 104 104 - 60 | 25.9971 | 37.0410 | 39.6093 | 89.3840
ospray: particle_volume/scivis/real_time | 8.87831 | 22.1747 | 24.2197 | 36.5446
pgbench: 1 - 500 - Read Only - Average Latency | 0.369 | 0.203 | 0.206 | 0.158
pgbench: 1 - 500 - Read Only | 1353510 | 2467328 | 2434749 | 3161848
pgbench: 1 - 800 - Read Only - Average Latency | 0.690 | 0.323 | 0.323 | 0.254
pgbench: 1 - 800 - Read Only | 1159492 | 2481320 | 2478917 | 3146173
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 9.02689 | 8.32323 | 11.1723 | 37.0624
ospray: gravity_spheres_volume/dim_512/ao/real_time | 9.52293 | 8.66888 | 11.7501 | 38.0769
laghos: Sedov Blast Wave, ube_922_hex.mesh | 247.49 | 345.14 | 361.81 | 402.94
ospray: particle_volume/ao/real_time | 8.99618 | 22.3668 | 24.4710 | 36.6548
blender: Pabellon Barcelona - CPU-Only | 175.07 | 64.84 | 62.90 | 33.01
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 707.322 | 1367.73 | 886.810 | 533.494
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 442.471 | 910.937 | 529.973 | 411.234
blender: Classroom - CPU-Only | 138.51 | 50.95 | 50.71 | 25.61
laghos: Triple Point Problem | 156.52 | 183.82 | 192.74 | 228.15
mt-dgemm: Sustained Floating-Point Rate | 14.072027 | 6.395415 | 25.048352 | 52.802440
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 10.0611 | 13.9416 | 14.6088 | 32.5839
libxsmm: 64 | 748.1 | 331.4 | 2413.7 | 5898.2
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 0.87 | 0.96 | 0.80 | 1.32
heffte: c2c - Stock - double-long - 512 | 31.5846 | 46.9289 | 56.2690 | 154.568
heffte: c2c - Stock - double - 512 | 31.5718 | 46.9794 | 56.2161 | 154.648
heffte: c2c - FFTW - double-long - 512 | 33.5545 | 47.3696 | 57.2263 | 159.258
heffte: c2c - FFTW - double - 512 | 33.5193 | 47.6050 | 57.3307 | 159.175
blender: Fishy Cat - CPU-Only | 71.76 | 26.43 | 25.59 | 13.74
liquid-dsp: 176 - 256 - 512 | 544626667 | 924243333 | 814950000 | 2221966667
liquid-dsp: 176 - 256 - 57 | 1683033333 | 4350100000 | 4281533333 | 7095033333
liquid-dsp: 176 - 256 - 32 | 1536633333 | 4275533333 | 3864000000 | 6181766667
npb: SP.C | 41543.94 | 104771.90 | 205795.59 | 427298.99
namd: ATPase Simulation - 327,506 Atoms | 0.52697 | 0.26505 | 0.27111 | 0.14380
liquid-dsp: 128 - 256 - 57 | 1570633333 | 4309133333 | 4216966667 | 5412900000
npb: IS.D | 1864.68 | 3977.02 | 5730.01 | 12967.37
compress-7zip: Decompression Rating | 150841 | 388577 | 406516 | 742859
compress-7zip: Compression Rating | 216451 | 501534 | 566595 | 1083523
blender: BMW27 - CPU-Only | 49.95 | 19.58 | 19.43 | 10.11
heffte: r2c - FFTW - float-long - 256 | 122.772 | 200.035 | 221.861 | 427.101
libxsmm: 32 | 384.9 | 164.8 | 1438.1 | 6163.0
heffte: r2c - FFTW - double - 256 | 57.3101 | 91.9186 | 103.2457 | 261.903
heffte: r2c - Stock - double-long - 256 | 60.8872 | 92.3883 | 105.5003 | 258.716
heffte: c2c - FFTW - float-long - 256 | 58.5498 | 90.7883 | 105.093 | 255.968
heffte: c2c - Stock - float-long - 512 | 57.9203 | 93.2573 | 124.595 | 323.696
heffte: c2c - Stock - float - 512 | 57.7643 | 93.7923 | 123.242 | 323.356
heffte: r2c - FFTW - double-long - 256 | 57.1290 | 88.6081 | 106.632 | 273.121
heffte: r2c - Stock - double-long - 512 | 59.8954 | 95.1989 | 118.236 | 311.267
heffte: r2c - Stock - double - 512 | 59.8216 | 94.5301 | 117.731 | 311.803
heffte: r2c - FFTW - double-long - 512 | 60.8204 | 91.4296 | 120.957 | 315.982
heffte: r2c - FFTW - double - 512 | 60.8804 | 91.4802 | 121.283 | 314.336
heffte: r2c - Stock - double - 256 | 60.5727 | 93.3137 | 102.7046 | 264.954
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 1.85 | 2.01 | 1.69 | 3.08
heffte: c2c - FFTW - float - 512 | 62.9750 | 95.8801 | 135.694 | 355.855
heffte: c2c - FFTW - float-long - 512 | 62.9027 | 96.4941 | 135.950 | 355.512
heffte: c2c - Stock - float - 256 | 59.7292 | 91.2601 | 103.409 | 244.342
heffte: c2c - FFTW - float - 256 | 58.3567 | 91.5383 | 103.5147 | 256.349
heffte: c2c - Stock - float-long - 256 | 59.5527 | 92.1290 | 105.361 | 247.725
npb: BT.C | 106230.52 | 241509.88 | 313813.98 | 744413.90
heffte: r2c - FFTW - float - 512 | 114.025 | 191.775 | 254.252 | 622.580
npb: MG.C | 63404.01 | 108985.72 | 131635.41 | 437417.16
pennant: sedovbig | 25.01956 | 5.915805 | 6.277107 | 3.581391
heffte: r2c - Stock - float - 512 | 110.049 | 190.949 | 232.166 | 596.226
heffte: r2c - Stock - float-long - 512 | 110.197 | 189.208 | 233.797 | 590.925
heffte: r2c - FFTW - float-long - 512 | 113.940 | 191.141 | 257.419 | 624.951
npb: CG.C | 27619.05 | 36367.35 | 36681.43 | 74101.94
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 1.85 | 2.03 | 1.72 | 3.11
pennant: leblancbig | 10.64548 | 3.466885 | 3.649317 | 2.122074
npb: FT.C | 55288.19 | 98485.23 | 102122.36 | 230164.79
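The overview values are easiest to compare as ratios between systems. The following is a minimal sketch, not part of the result file: it copies three rows from the overview and computes how much faster one system is than another, inverting the ratio for tests where fewer is better. The `RESULTS` layout and the `speedup` helper are hypothetical names chosen for illustration; the direction flags follow the "More Is Better" / "Fewer Is Better" labels in the per-test results.

```python
# Illustrative subset of the overview values.
# Layout (an assumption for this sketch): test -> (higher_is_better, {system: value}).
RESULTS = {
    "hpcg: 160 160 160 - 60": (
        True,
        {"HC": 25.5635, "HBv2": 36.0167, "HBv3": 39.1106, "HBv4": 87.9013},
    ),
    "blender: Barbershop - CPU-Only": (
        False,  # seconds, fewer is better
        {"HC": 526.93, "HBv2": 211.46, "HBv3": 188.96, "HBv4": 97.52},
    ),
    "npb: MG.C": (
        True,
        {"HC": 63404.01, "HBv2": 108985.72, "HBv3": 131635.41, "HBv4": 437417.16},
    ),
}


def speedup(test: str, system: str, baseline: str) -> float:
    """Return how many times faster `system` is than `baseline` on `test`."""
    higher_is_better, values = RESULTS[test]
    ratio = values[system] / values[baseline]
    # For time-like metrics a smaller value is faster, so invert the ratio.
    return ratio if higher_is_better else 1.0 / ratio


for test in RESULTS:
    print(f"{test}: HBv4 is {speedup(test, 'HBv4', 'HBv3'):.2f}x HBv3")
```

Inverting the ratio for time-based tests keeps every figure on the same "larger means faster" scale, so results with different units can be read side by side.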
libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s, More Is Better
  HC:   1284.8 (SE +/- 13.64, N = 15)
  HBv2: 1011.4 (SE +/- 169.50, N = 9)
  HBv3: 2273.5 (SE +/- 20.51, N = 9)
  HBv4: 6655.2 (SE +/- 59.23, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

PETSc 3.19
Test: Streams
MB/s, More Is Better
  HC:   151286.25 (SE +/- 256.75, N = 3)
  HBv2: 197895.47 (SE +/- 12025.83, N = 6)
  HBv3: 284001.92 (SE +/- 2674.31, N = 7)
  HBv4: 598417.70 (SE +/- 46271.80, N = 9)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
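The standard errors in these results vary widely between systems, so it can help to express each one relative to its mean before comparing averages. The sketch below does this for the PETSc Streams result. It assumes the SE and N values are listed in the same system order as the averages (HC, HBv2, HBv3, HBv4); the 5% threshold and the names `PETSC_STREAMS`, `relative_se`, and `NOISY_THRESHOLD` are hypothetical choices for illustration.

```python
# PETSc 3.19 Streams, MB/s: system -> (mean, standard error, number of runs).
# Assumption: SE/N entries pair with the averages in HC, HBv2, HBv3, HBv4 order.
PETSC_STREAMS = {
    "HC": (151286.25, 256.75, 3),
    "HBv2": (197895.47, 12025.83, 6),
    "HBv3": (284001.92, 2674.31, 7),
    "HBv4": (598417.70, 46271.80, 9),
}

# Hypothetical cut-off for calling a result noisy; not a Phoronix Test Suite setting.
NOISY_THRESHOLD = 0.05


def relative_se(mean: float, se: float) -> float:
    """Return the standard error as a fraction of the mean."""
    return se / mean


for system, (mean, se, runs) in PETSC_STREAMS.items():
    rel = relative_se(mean, se)
    label = "noisy" if rel > NOISY_THRESHOLD else "stable"
    print(f"{system}: {rel:.2%} relative SE over {runs} runs ({label})")
```

Under these assumptions the HBv2 and HBv4 averages carry noticeably more run-to-run variation than HC and HBv3, which is worth keeping in mind when reading the bandwidth gap between systems.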
High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better
  HC:   25.56 (SE +/- 0.06, N = 3)
  HBv2: 36.02 (SE +/- 0.01, N = 3)
  HBv3: 39.11 (SE +/- 0.02, N = 3)
  HBv4: 87.90 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s, More Is Better
  HC:   904.1 (SE +/- 23.39, N = 9)
  HBv2: 1128.3 (SE +/- 17.53, N = 9)
  HBv3: 2045.7 (SE +/- 25.11, N = 4)
  HBv4: 6908.6 (SE +/- 57.85, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better
  HC:   25.87 (SE +/- 0.05, N = 3)
  HBv2: 36.09 (SE +/- 0.02, N = 3)
  HBv3: 38.97 (SE +/- 0.02, N = 3)
  HBv4: 88.52 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

Timed Node.js Compilation 19.8.1
Time To Compile
Seconds, Fewer Is Better
  HC:   330.61 (SE +/- 2.37, N = 3)
  HBv2: 194.37 (SE +/- 1.32, N = 3)
  HBv3: 185.57 (SE +/- 1.46, N = 3)
  HBv4: 150.56 (SE +/- 2.23, N = 12)

OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second, More Is Better
  HC:   96.76 (SE +/- 7.22, N = 9)
  HBv2: 162.45 (SE +/- 0.83, N = 3)
  HBv3: 167.50 (SE +/- 1.50, N = 7)
  HBv4: 208.05 (SE +/- 0.81, N = 3)

Blender 3.6
Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
  HC:   526.93 (SE +/- 1.15, N = 3)
  HBv2: 211.46 (SE +/- 0.22, N = 3)
  HBv3: 188.96 (SE +/- 0.38, N = 3)
  HBv4: 97.52 (SE +/- 0.47, N = 3)

High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better
  HC:   26.00 (SE +/- 0.02, N = 3)
  HBv2: 37.04 (SE +/- 0.03, N = 3)
  HBv3: 39.61 (SE +/- 0.03, N = 3)
  HBv4: 89.38 (SE +/- 0.26, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second, More Is Better
  HC:   8.87831 (SE +/- 0.05412, N = 3)
  HBv2: 22.17470 (SE +/- 0.02944, N = 3)
  HBv3: 24.21970 (SE +/- 0.00564, N = 3)
  HBv4: 36.54460 (SE +/- 0.05762, N = 3)
PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC:   0.369 (SE +/- 0.001, N = 3)
  HBv2: 0.203 (SE +/- 0.000, N = 3)
  HBv3: 0.206 (SE +/- 0.002, N = 4)
  HBv4: 0.158 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS, More Is Better
  HC:   1353510 (SE +/- 2849.38, N = 3)
  HBv2: 2467328 (SE +/- 4710.42, N = 3)
  HBv3: 2434749 (SE +/- 28428.57, N = 4)
  HBv4: 3161848 (SE +/- 3042.04, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC:   0.690 (SE +/- 0.002, N = 3)
  HBv2: 0.323 (SE +/- 0.001, N = 3)
  HBv3: 0.323 (SE +/- 0.002, N = 3)
  HBv4: 0.254 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS, More Is Better
  HC:   1159492 (SE +/- 2818.34, N = 3)
  HBv2: 2481320 (SE +/- 9212.17, N = 3)
  HBv3: 2478917 (SE +/- 13675.06, N = 3)
  HBv4: 3146173 (SE +/- 2972.36, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
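The PostgreSQL latency and throughput figures above appear to be two views of the same measurement: dividing the client count by the average latency gives a value close to the reported TPS. The sketch below checks that relationship against the numbers in this result file. It is an observation about these figures, not a statement of how the benchmark derives them, and the names `PGBENCH`, `implied_tps`, and `relative_gap` are hypothetical.

```python
# clients -> {system: (average latency in ms, reported TPS)}, copied from the results above.
PGBENCH = {
    500: {
        "HC": (0.369, 1353510),
        "HBv2": (0.203, 2467328),
        "HBv3": (0.206, 2434749),
        "HBv4": (0.158, 3161848),
    },
    800: {
        "HC": (0.690, 1159492),
        "HBv2": (0.323, 2481320),
        "HBv3": (0.323, 2478917),
        "HBv4": (0.254, 3146173),
    },
}


def implied_tps(clients: int, latency_ms: float) -> float:
    """Throughput implied by `clients` concurrent sessions at the given average latency."""
    return clients / (latency_ms / 1000.0)


def relative_gap(clients: int, latency_ms: float, reported_tps: float) -> float:
    """Relative difference between the implied and the reported TPS."""
    return abs(implied_tps(clients, latency_ms) - reported_tps) / reported_tps


for clients, systems in PGBENCH.items():
    for system, (latency_ms, tps) in systems.items():
        gap = relative_gap(clients, latency_ms, tps)
        print(f"{clients} clients, {system}: implied {implied_tps(clients, latency_ms):,.0f} TPS, "
              f"reported {tps:,} TPS, gap {gap:.2%}")
```

The small remaining gaps are in line with the latency being rounded to three decimal places, so for read-only runs like these the latency chart adds little beyond the TPS chart.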
OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
  HC:   9.02689 (SE +/- 0.01641, N = 3)
  HBv2: 8.32323 (SE +/- 0.13284, N = 15)
  HBv3: 11.17230 (SE +/- 0.02977, N = 3)
  HBv4: 37.06240 (SE +/- 0.12574, N = 3)

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
  HC:   9.52293 (SE +/- 0.03191, N = 3)
  HBv2: 8.66888 (SE +/- 0.15055, N = 15)
  HBv3: 11.75010 (SE +/- 0.01464, N = 3)
  HBv4: 38.07690 (SE +/- 0.02835, N = 3)

Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better
  HC:   247.49 (SE +/- 1.35, N = 3)
  HBv2: 345.14 (SE +/- 3.57, N = 5)
  HBv3: 361.81 (SE +/- 0.15, N = 3)
  HBv4: 402.94 (SE +/- 0.78, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
  HC:   8.99618 (SE +/- 0.01510, N = 3)
  HBv2: 22.36680 (SE +/- 0.00858, N = 3)
  HBv3: 24.47100 (SE +/- 0.00987, N = 3)
  HBv4: 36.65480 (SE +/- 0.04011, N = 3)

Blender 3.6
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
  HC:   175.07 (SE +/- 0.33, N = 3)
  HBv2: 64.84 (SE +/- 0.28, N = 3)
  HBv3: 62.90 (SE +/- 0.45, N = 3)
  HBv4: 33.01 (SE +/- 0.12, N = 3)

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HC:   707.32 (SE +/- 1.51, N = 3)
  HBv2: 1367.73 (SE +/- 13.52, N = 15)
  HBv3: 886.81 (SE +/- 6.66, N = 3)
  HBv4: 533.49 (SE +/- 1.90, N = 3)
MIN: 687.14 MIN: 849.06 MIN: 518.68
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HC:   442.47 (SE +/- 1.89, N = 3)
  HBv2: 910.94 (SE +/- 9.54, N = 15)
  HBv3: 529.97 (SE +/- 4.36, N = 3)
  HBv4: 411.23 (SE +/- 3.60, N = 8)
MIN: 429.93 MIN: 469.93
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Blender 3.6
Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
  HC:   138.51 (SE +/- 0.04, N = 3)
  HBv2: 50.95 (SE +/- 0.15, N = 3)
  HBv3: 50.71 (SE +/- 0.06, N = 3)
  HBv4: 25.61 (SE +/- 0.11, N = 3)

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate, More Is Better
  HC:   156.52 (SE +/- 0.08, N = 3)
  HBv2: 183.82 (SE +/- 0.57, N = 3)
  HBv3: 192.74 (SE +/- 0.38, N = 3)
  HBv4: 228.15 (SE +/- 1.25, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi
ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s, More Is Better
  HC:   14.072027 (SE +/- 0.474074, N = 12)
  HBv2: 6.395415 (SE +/- 0.275809, N = 12)
  HBv3: 25.048352 (SE +/- 0.146977, N = 3)
  HBv4: 52.802440 (SE +/- 0.581762, N = 5)
1. (CC) gcc options: -O3 -march=native -fopenmp

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
  HC:   10.06 (SE +/- 0.02, N = 3)
  HBv2: 13.94 (SE +/- 0.05, N = 3)
  HBv3: 14.61 (SE +/- 0.02, N = 3)
  HBv4: 32.58 (SE +/- 0.08, N = 3)

libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s, More Is Better
  HC:   748.1 (SE +/- 7.70, N = 3)
  HBv2: 331.4 (SE +/- 2.64, N = 15)
  HBv3: 2413.7 (SE +/- 8.24, N = 3)
  HBv4: 5898.2 (SE +/- 74.65, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, More Is Better
  HC:   0.87 (SE +/- 0.00, N = 3)
  HBv2: 0.96 (SE +/- 0.01, N = 15)
  HBv3: 0.80 (SE +/- 0.01, N = 3)
  HBv4: 1.32 (SE +/- 0.01, N = 3)

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   31.58 (SE +/- 0.02, N = 3)
  HBv2: 46.93 (SE +/- 0.09, N = 3)
  HBv3: 56.27 (SE +/- 0.03, N = 3)
  HBv4: 154.57 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
  HC:   31.57 (SE +/- 0.02, N = 3)
  HBv2: 46.98 (SE +/- 0.05, N = 3)
  HBv3: 56.22 (SE +/- 0.04, N = 3)
  HBv4: 154.65 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   33.55 (SE +/- 0.02, N = 3)
  HBv2: 47.37 (SE +/- 0.02, N = 3)
  HBv3: 57.23 (SE +/- 0.04, N = 3)
  HBv4: 159.26 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
  HC:   33.52 (SE +/- 0.03, N = 3)
  HBv2: 47.61 (SE +/- 0.09, N = 3)
  HBv3: 57.33 (SE +/- 0.07, N = 3)
  HBv4: 159.18 (SE +/- 0.34, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender 3.6
Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
  HC:   71.76 (SE +/- 0.23, N = 3)
  HBv2: 26.43 (SE +/- 0.04, N = 3)
  HBv3: 25.59 (SE +/- 0.15, N = 3)
  HBv4: 13.74 (SE +/- 0.09, N = 3)
Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  HC:   544626667 (SE +/- 2270626.44, N = 3)
  HBv2: 924243333 (SE +/- 3265385.80, N = 3)
  HBv3: 814950000 (SE +/- 1919487.78, N = 3)
  HBv4: 2221966667 (SE +/- 5336145.09, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC:   1683033333 (SE +/- 7033807.25, N = 3)
  HBv2: 4350100000 (SE +/- 8195730.60, N = 3)
  HBv3: 4281533333 (SE +/- 8996542.55, N = 3)
  HBv4: 7095033333 (SE +/- 36788419.07, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC:   1536633333 (SE +/- 8873431.00, N = 3)
  HBv2: 4275533333 (SE +/- 25439885.57, N = 3)
  HBv3: 3864000000 (SE +/- 2858321.19, N = 3)
  HBv4: 6181766667 (SE +/- 6999365.05, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s, More Is Better
  HC:   41543.94 (SE +/- 105.69, N = 3)
  HBv2: 104771.90 (SE +/- 324.54, N = 3)
  HBv3: 205795.59 (SE +/- 1576.20, N = 3)
  HBv4: 427298.99 (SE +/- 2970.97, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better
  HC:   0.52697 (SE +/- 0.00060, N = 3)
  HBv2: 0.26505 (SE +/- 0.00069, N = 3)
  HBv3: 0.27111 (SE +/- 0.00015, N = 3)
  HBv4: 0.14380 (SE +/- 0.00011, N = 3)

Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC:   1570633333 (SE +/- 4733333.33, N = 3)
  HBv2: 4309133333 (SE +/- 14518991.39, N = 3)
  HBv3: 4216966667 (SE +/- 6263474.36, N = 3)
  HBv4: 5412900000 (SE +/- 24008123.63, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s, More Is Better
  HC:   1864.68 (SE +/- 7.55, N = 3)
  HBv2: 3977.02 (SE +/- 35.84, N = 7)
  HBv3: 5730.01 (SE +/- 67.99, N = 4)
  HBv4: 12967.37 (SE +/- 308.75, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

7-Zip Compression 22.01
Test: Decompression Rating
MIPS, More Is Better
  HC:   150841 (SE +/- 300.63, N = 3)
  HBv2: 388577 (SE +/- 10621.28, N = 3)
  HBv3: 406516 (SE +/- 3365.82, N = 3)
  HBv4: 742859 (SE +/- 8621.97, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression 22.01
Test: Compression Rating
MIPS, More Is Better
  HC:   216451 (SE +/- 672.17, N = 3)
  HBv2: 501534 (SE +/- 3504.63, N = 3)
  HBv3: 566595 (SE +/- 7198.45, N = 3)
  HBv4: 1083523 (SE +/- 4158.65, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Blender 3.6
Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
  HC:   49.95 (SE +/- 0.36, N = 3)
  HBv2: 19.58 (SE +/- 0.16, N = 3)
  HBv3: 19.43 (SE +/- 0.10, N = 3)
  HBv4: 10.11 (SE +/- 0.08, N = 3)
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC:   122.77 (SE +/- 0.53, N = 3)
  HBv2: 200.04 (SE +/- 3.34, N = 12)
  HBv3: 221.86 (SE +/- 3.45, N = 15)
  HBv4: 427.10 (SE +/- 10.91, N = 15)
1. (CXX) g++ options: -O3 -pthread

libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s, More Is Better
  HC:   384.9 (SE +/- 3.15, N = 9)
  HBv2: 164.8 (SE +/- 1.72, N = 3)
  HBv3: 1438.1 (SE +/- 38.99, N = 12)
  HBv4: 6163.0 (SE +/- 87.98, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
  HC:   57.31 (SE +/- 0.25, N = 3)
  HBv2: 91.92 (SE +/- 1.31, N = 3)
  HBv3: 103.25 (SE +/- 0.75, N = 15)
  HBv4: 261.90 (SE +/- 5.66, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
  HC:   60.89 (SE +/- 0.19, N = 3)
  HBv2: 92.39 (SE +/- 1.27, N = 3)
  HBv3: 105.50 (SE +/- 0.81, N = 15)
  HBv4: 258.72 (SE +/- 2.84, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC:   58.55 (SE +/- 0.16, N = 3)
  HBv2: 90.79 (SE +/- 0.74, N = 15)
  HBv3: 105.09 (SE +/- 1.13, N = 3)
  HBv4: 255.97 (SE +/- 3.64, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   57.92 (SE +/- 0.06, N = 3)
  HBv2: 93.26 (SE +/- 0.23, N = 3)
  HBv3: 124.60 (SE +/- 0.05, N = 3)
  HBv4: 323.70 (SE +/- 0.96, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
  HC:   57.76 (SE +/- 0.02, N = 3)
  HBv2: 93.79 (SE +/- 0.34, N = 3)
  HBv3: 123.24 (SE +/- 0.73, N = 3)
  HBv4: 323.36 (SE +/- 0.80, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
  HC:   57.13 (SE +/- 0.12, N = 3)
  HBv2: 88.61 (SE +/- 1.12, N = 15)
  HBv3: 106.63 (SE +/- 1.05, N = 3)
  HBv4: 273.12 (SE +/- 4.03, N = 14)
1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   59.90 (SE +/- 0.03, N = 3)
  HBv2: 95.20 (SE +/- 0.16, N = 3)
  HBv3: 118.24 (SE +/- 0.49, N = 3)
  HBv4: 311.27 (SE +/- 0.81, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
  HC:   59.82 (SE +/- 0.05, N = 3)
  HBv2: 94.53 (SE +/- 0.25, N = 3)
  HBv3: 117.73 (SE +/- 0.40, N = 3)
  HBv4: 311.80 (SE +/- 1.60, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   60.82 (SE +/- 0.06, N = 3)
  HBv2: 91.43 (SE +/- 0.07, N = 3)
  HBv3: 120.96 (SE +/- 0.04, N = 3)
  HBv4: 315.98 (SE +/- 1.65, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
  HC:   60.88 (SE +/- 0.05, N = 3)
  HBv2: 91.48 (SE +/- 0.15, N = 3)
  HBv3: 121.28 (SE +/- 0.86, N = 3)
  HBv4: 314.34 (SE +/- 0.50, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
  HC:   60.57 (SE +/- 0.08, N = 3)
  HBv2: 93.31 (SE +/- 1.10, N = 4)
  HBv3: 102.70 (SE +/- 0.80, N = 15)
  HBv4: 264.95 (SE +/- 4.27, N = 12)
1. (CXX) g++ options: -O3 -pthread

Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
  HC:   1.85 (SE +/- 0.00, N = 3)
  HBv2: 2.01 (SE +/- 0.01, N = 3)
  HBv3: 1.69 (SE +/- 0.01, N = 15)
  HBv4: 3.08 (SE +/- 0.02, N = 3)

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
  HC:   62.98 (SE +/- 0.04, N = 3)
  HBv2: 95.88 (SE +/- 0.47, N = 3)
  HBv3: 135.69 (SE +/- 0.93, N = 3)
  HBv4: 355.86 (SE +/- 1.24, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
  HC:   62.90 (SE +/- 0.04, N = 3)
  HBv2: 96.49 (SE +/- 0.05, N = 3)
  HBv3: 135.95 (SE +/- 0.58, N = 3)
  HBv4: 355.51 (SE +/- 1.18, N = 3)
1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 HC HBv2 HBv3 HBv4 50 100 150 200 250 SE +/- 0.02, N = 3 SE +/- 0.61, N = 15 SE +/- 0.77, N = 15 SE +/- 3.04, N = 4 59.73 91.26 103.41 244.34 1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 HC HBv2 HBv3 HBv4 60 120 180 240 300 SE +/- 0.07, N = 3 SE +/- 0.67, N = 15 SE +/- 1.41, N = 15 SE +/- 1.07, N = 3 58.36 91.54 103.51 256.35 1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
  GFLOP/s, More Is Better
  HC:   59.55 (SE +/- 0.27, N = 3)
  HBv2: 92.13 (SE +/- 1.33, N = 3)
  HBv3: 105.36 (SE +/- 1.07, N = 6)
  HBv4: 247.73 (SE +/- 4.85, N = 15)
  1. (CXX) g++ options: -O3 -pthread
NAS Parallel Benchmarks 3.4 - Test / Class: BT.C
  Total Mop/s, More Is Better
  HC:   106230.52 (SE +/- 62.47, N = 3)
  HBv2: 241509.88 (SE +/- 108.10, N = 3)
  HBv3: 313813.98 (SE +/- 2034.04, N = 3)
  HBv4: 744413.90 (SE +/- 6061.11, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
  GFLOP/s, More Is Better
  HC:   114.03 (SE +/- 0.09, N = 3)
  HBv2: 191.78 (SE +/- 1.03, N = 3)
  HBv3: 254.25 (SE +/- 2.52, N = 6)
  HBv4: 622.58 (SE +/- 2.25, N = 3)
  1. (CXX) g++ options: -O3 -pthread
NAS Parallel Benchmarks 3.4 - Test / Class: MG.C
  Total Mop/s, More Is Better
  HC:   63404.01 (SE +/- 149.23, N = 3)
  HBv2: 108985.72 (SE +/- 768.30, N = 3)
  HBv3: 131635.41 (SE +/- 1313.15, N = 15)
  HBv4: 437417.16 (SE +/- 5249.92, N = 15)
  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Pennant 1.0.1 - Test: sedovbig
  Hydro Cycle Time - Seconds, Fewer Is Better
  HC:   25.019560 (SE +/- 0.026763, N = 3)
  HBv2: 5.915805 (SE +/- 0.011742, N = 3)
  HBv3: 6.277107 (SE +/- 0.027453, N = 3)
  HBv4: 3.581391 (SE +/- 0.018282, N = 3)
  1. (CXX) g++ options: -fopenmp -pthread -lmpi
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
  GFLOP/s, More Is Better
  HC:   110.05 (SE +/- 0.06, N = 3)
  HBv2: 190.95 (SE +/- 2.04, N = 3)
  HBv3: 232.17 (SE +/- 1.85, N = 3)
  HBv4: 596.23 (SE +/- 2.14, N = 3)
  1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
  GFLOP/s, More Is Better
  HC:   110.20 (SE +/- 0.10, N = 3)
  HBv2: 189.21 (SE +/- 1.02, N = 3)
  HBv3: 233.80 (SE +/- 0.15, N = 3)
  HBv4: 590.93 (SE +/- 2.49, N = 3)
  1. (CXX) g++ options: -O3 -pthread
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
  GFLOP/s, More Is Better
  HC:   113.94 (SE +/- 0.18, N = 3)
  HBv2: 191.14 (SE +/- 1.39, N = 3)
  HBv3: 257.42 (SE +/- 2.91, N = 3)
  HBv4: 624.95 (SE +/- 4.23, N = 3)
  1. (CXX) g++ options: -O3 -pthread
NAS Parallel Benchmarks 3.4 - Test / Class: CG.C
  Total Mop/s, More Is Better
  HC:   27619.05 (SE +/- 218.98, N = 3)
  HBv2: 36367.35 (SE +/- 778.45, N = 15)
  HBv3: 36681.43 (SE +/- 503.29, N = 3)
  HBv4: 74101.94 (SE +/- 599.32, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Intel Open Image Denoise 2.0 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
  Images / Sec, More Is Better
  HC:   1.85 (SE +/- 0.01, N = 3)
  HBv2: 2.03 (SE +/- 0.01, N = 3)
  HBv3: 1.72 (SE +/- 0.02, N = 4)
  HBv4: 3.11 (SE +/- 0.03, N = 3)
Pennant 1.0.1 - Test: leblancbig
  Hydro Cycle Time - Seconds, Fewer Is Better
  HC:   10.645480 (SE +/- 0.017495, N = 3)
  HBv2: 3.466885 (SE +/- 0.009233, N = 3)
  HBv3: 3.649317 (SE +/- 0.006682, N = 3)
  HBv4: 2.122074 (SE +/- 0.029043, N = 3)
  1. (CXX) g++ options: -fopenmp -pthread -lmpi
NAS Parallel Benchmarks 3.4 - Test / Class: FT.C
  Total Mop/s, More Is Better
  HC:   55288.19 (SE +/- 131.36, N = 3)
  HBv2: 98485.23 (SE +/- 320.45, N = 3)
  HBv3: 102122.36 (SE +/- 339.33, N = 3)
  HBv4: 230164.79 (SE +/- 1773.50, N = 3)
  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
Phoronix Test Suite v10.8.4