Microsoft Azure HBv4 HPC Performance Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HTML result view exported from: https://openbenchmarking.org/result/2308011-PTS-AZUREHBV71&grs.

Processor:
  HC:   2 x Intel Xeon Platinum 8168 (44 Cores)
  HBv2: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  HBv3: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  HBv4: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Memory:
  HC:         1 GB + 60928 MB + 118272 MB + 176 GB
  HBv2, HBv3: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  HBv4:       1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
Disk:
  HC:   32GB Virtual Disk + 752GB Virtual Disk
  HBv2: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  HBv3: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  HBv4: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
Graphics: hyperv_fb
OS: AlmaLinux 8.8
Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
Compiler: GCC 13.1.0 + CUDA 12.1
File-System: nfs
Screen Resolution: 1024x768
System Layer: microsoft

Kernel Details
- Transparent Huge Pages: always

Environment Details
- CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"

Compiler Details
- --disable-multilib --enable-checking=release

Processor Details
- CPU Microcode: 0xffffffff

Python Details
- Python 3.6.8

Security Details
- HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
- HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected


NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   106230.52 (SE +/- 62.47, N = 3)
HBv2: 241509.88 (SE +/- 108.10, N = 3)
HBv3: 313813.98 (SE +/- 2034.04, N = 3)
HBv4: 744413.90 (SE +/- 6061.11, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
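The per-system figures in this export are easier to compare as multiples of the oldest instance type. The following is a minimal sketch, not part of the original export: it takes the BT.C values reported above and expresses each system relative to HC. The function name `relative_to_baseline` and the choice of HC as baseline are illustrative assumptions.

```python
# Values copied from the BT.C result above (Total Mop/s, more is better).
bt_c = {"HC": 106230.52, "HBv2": 241509.88, "HBv3": 313813.98, "HBv4": 744413.90}


def relative_to_baseline(results, baseline="HC", higher_is_better=True):
    """Return each system's result as a multiple of the baseline system.

    Hypothetical helper for illustration; a ratio above 1.0 always means
    "faster than the baseline", regardless of the metric's direction.
    """
    base = results[baseline]
    if higher_is_better:
        return {name: value / base for name, value in results.items()}
    # For "Fewer Is Better" metrics (seconds, ms), invert the ratio.
    return {name: base / value for name, value in results.items()}


speedups = relative_to_baseline(bt_c)
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

For results marked "Fewer Is Better" (for example the Pennant and Blender timings), pass `higher_is_better=False` so the ratio keeps the same meaning.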

Pennant

Test: sedovbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
HC:   25.019560 (SE +/- 0.026763, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

NAS Parallel Benchmarks

Test / Class: MG.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   63404.01 (SE +/- 149.23, N = 3)
HBv2: 108985.72 (SE +/- 768.30, N = 3)
HBv3: 131635.41 (SE +/- 1313.15, N = 15)
HBv4: 437417.16 (SE +/- 5249.92, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   62.90 (SE +/- 0.04, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv4: 355.51 (SE +/- 1.18, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   62.98 (SE +/- 0.04, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv4: 355.86 (SE +/- 1.24, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   57.76 (SE +/- 0.02, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv4: 323.36 (SE +/- 0.80, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   57.92 (SE +/- 0.06, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv4: 323.70 (SE +/- 0.96, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   113.94 (SE +/- 0.18, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv4: 624.95 (SE +/- 4.23, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   114.03 (SE +/- 0.09, N = 3)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv4: 622.58 (SE +/- 2.25, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   110.05 (SE +/- 0.06, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv4: 596.23 (SE +/- 2.14, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6 (Seconds, Fewer Is Better)
HC:   138.51 (SE +/- 0.04, N = 3)
HBv2: 50.95 (SE +/- 0.15, N = 3)
HBv3: 50.71 (SE +/- 0.06, N = 3)
HBv4: 25.61 (SE +/- 0.11, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6 (Seconds, Fewer Is Better)
HC:   526.93 (SE +/- 1.15, N = 3)
HBv2: 211.46 (SE +/- 0.22, N = 3)
HBv3: 188.96 (SE +/- 0.38, N = 3)
HBv4: 97.52 (SE +/- 0.47, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   110.20 (SE +/- 0.10, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv4: 590.93 (SE +/- 2.49, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6 (Seconds, Fewer Is Better)
HC:   175.07 (SE +/- 0.33, N = 3)
HBv2: 64.84 (SE +/- 0.28, N = 3)
HBv3: 62.90 (SE +/- 0.45, N = 3)
HBv4: 33.01 (SE +/- 0.12, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6 (Seconds, Fewer Is Better)
HC:   71.76 (SE +/- 0.23, N = 3)
HBv2: 26.43 (SE +/- 0.04, N = 3)
HBv3: 25.59 (SE +/- 0.15, N = 3)
HBv4: 13.74 (SE +/- 0.09, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   59.82 (SE +/- 0.05, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv4: 311.80 (SE +/- 1.60, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   59.90 (SE +/- 0.03, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv4: 311.27 (SE +/- 0.81, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   60.82 (SE +/- 0.06, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv4: 315.98 (SE +/- 1.65, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   60.88 (SE +/- 0.05, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv4: 314.34 (SE +/- 0.50, N = 3)
1. (CXX) g++ options: -O3 -pthread

Pennant

Test: leblancbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
HC:   10.645480 (SE +/- 0.017495, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
HC:   216451 (SE +/- 672.17, N = 3)
HBv2: 501534 (SE +/- 3504.63, N = 3)
HBv3: 566595 (SE +/- 7198.45, N = 3)
HBv4: 1083523 (SE +/- 4158.65, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6 (Seconds, Fewer Is Better)
HC:   49.95 (SE +/- 0.36, N = 3)
HBv2: 19.58 (SE +/- 0.16, N = 3)
HBv3: 19.43 (SE +/- 0.10, N = 3)
HBv4: 10.11 (SE +/- 0.08, N = 3)

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
HC:   150841 (SE +/- 300.63, N = 3)
HBv2: 388577 (SE +/- 10621.28, N = 3)
HBv3: 406516 (SE +/- 3365.82, N = 3)
HBv4: 742859 (SE +/- 8621.97, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   31.57 (SE +/- 0.02, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv4: 154.65 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   31.58 (SE +/- 0.02, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv4: 154.57 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   57.13 (SE +/- 0.12, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv4: 273.12 (SE +/- 4.03, N = 14)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   33.52 (SE +/- 0.03, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv4: 159.18 (SE +/- 0.34, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   33.55 (SE +/- 0.02, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv4: 159.26 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   58.36 (SE +/- 0.07, N = 3)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv4: 256.35 (SE +/- 1.07, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   60.57 (SE +/- 0.08, N = 3)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv4: 264.95 (SE +/- 4.27, N = 12)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   58.55 (SE +/- 0.16, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv4: 255.97 (SE +/- 3.64, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   60.89 (SE +/- 0.19, N = 3)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv4: 258.72 (SE +/- 2.84, N = 15)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
HC:   1683033333 (SE +/- 7033807.25, N = 3)
HBv2: 4350100000 (SE +/- 8195730.60, N = 3)
HBv3: 4281533333 (SE +/- 8996542.55, N = 3)
HBv4: 7095033333 (SE +/- 36788419.07, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   55288.19 (SE +/- 131.36, N = 3)
HBv2: 98485.23 (SE +/- 320.45, N = 3)
HBv3: 102122.36 (SE +/- 339.33, N = 3)
HBv4: 230164.79 (SE +/- 1773.50, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   8.87831 (SE +/- 0.05412, N = 3)
HBv2: 22.17470 (SE +/- 0.02944, N = 3)
HBv3: 24.21970 (SE +/- 0.00564, N = 3)
HBv4: 36.54460 (SE +/- 0.05762, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   59.73 (SE +/- 0.02, N = 3)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv4: 244.34 (SE +/- 3.04, N = 4)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 (samples/s, More Is Better)
HC:   544626667 (SE +/- 2270626.44, N = 3)
HBv2: 924243333 (SE +/- 3265385.80, N = 3)
HBv3: 814950000 (SE +/- 1919487.78, N = 3)
HBv4: 2221966667 (SE +/- 5336145.09, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   8.99618 (SE +/- 0.01510, N = 3)
HBv2: 22.36680 (SE +/- 0.00858, N = 3)
HBv3: 24.47100 (SE +/- 0.00987, N = 3)
HBv4: 36.65480 (SE +/- 0.04011, N = 3)

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
HC:   1536633333 (SE +/- 8873431.00, N = 3)
HBv2: 4275533333 (SE +/- 25439885.57, N = 3)
HBv3: 3864000000 (SE +/- 2858321.19, N = 3)
HBv4: 6181766667 (SE +/- 6999365.05, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
HC:   0.52697 (SE +/- 0.00060, N = 3)
HBv2: 0.26505 (SE +/- 0.00069, N = 3)
HBv3: 0.27111 (SE +/- 0.00015, N = 3)
HBv4: 0.14380 (SE +/- 0.00011, N = 3)
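NAMD reports days/ns here, while molecular dynamics throughput is often quoted the other way around, as ns/day. The two are reciprocals, so the conversion is a single division. The sketch below is illustrative only and uses the four values from the result above; the variable names are assumptions.

```python
# days/ns values copied from the NAMD ATPase result above (fewer is better).
namd_days_per_ns = {"HC": 0.52697, "HBv2": 0.26505, "HBv3": 0.27111, "HBv4": 0.14380}

# ns/day is the reciprocal of days/ns, so a larger number is better.
namd_ns_per_day = {name: 1.0 / value for name, value in namd_days_per_ns.items()}

for name, rate in namd_ns_per_day.items():
    print(f"{name}: {rate:.2f} ns/day")
```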

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
HC:   1570633333 (SE +/- 4733333.33, N = 3)
HBv2: 4309133333 (SE +/- 14518991.39, N = 3)
HBv3: 4216966667 (SE +/- 6263474.36, N = 3)
HBv4: 5412900000 (SE +/- 24008123.63, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HC:   25.56 (SE +/- 0.06, N = 3)
HBv2: 36.02 (SE +/- 0.01, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv4: 87.90 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HC:   26.00 (SE +/- 0.02, N = 3)
HBv2: 37.04 (SE +/- 0.03, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv4: 89.38 (SE +/- 0.26, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HC:   25.87 (SE +/- 0.05, N = 3)
HBv2: 36.09 (SE +/- 0.02, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv4: 88.52 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   10.06 (SE +/- 0.02, N = 3)
HBv2: 13.94 (SE +/- 0.05, N = 3)
HBv3: 14.61 (SE +/- 0.02, N = 3)
HBv4: 32.58 (SE +/- 0.08, N = 3)

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency

PostgreSQL 15 (ms, Fewer Is Better)
HC:   0.690 (SE +/- 0.002, N = 3)
HBv2: 0.323 (SE +/- 0.001, N = 3)
HBv3: 0.323 (SE +/- 0.002, N = 3)
HBv4: 0.254 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only

PostgreSQL 15 (TPS, More Is Better)
HC:   1159492 (SE +/- 2818.34, N = 3)
HBv2: 2481320 (SE +/- 9212.17, N = 3)
HBv3: 2478917 (SE +/- 13675.06, N = 3)
HBv4: 3146173 (SE +/- 2972.36, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
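The 800-client latency and TPS results describe the same runs from two angles. As a rough consistency check, and assuming all 800 clients are kept busy for the whole run, average latency should be close to clients divided by throughput. The sketch below applies that relation to the reported TPS values; it is an illustration of the arithmetic, not how the export itself was produced, and small differences from the reported latencies are expected because both figures are averages over several runs.

```python
# TPS values copied from the 800-client read-only result above.
clients = 800
tps = {"HC": 1159492, "HBv2": 2481320, "HBv3": 2478917, "HBv4": 3146173}

# Reported average latencies (ms) for the same configuration.
reported_latency_ms = {"HC": 0.690, "HBv2": 0.323, "HBv3": 0.323, "HBv4": 0.254}

# Estimated latency in ms: clients / TPS gives seconds per transaction per client.
estimated_latency_ms = {name: clients / rate * 1000.0 for name, rate in tps.items()}

for name in tps:
    print(f"{name}: estimated {estimated_latency_ms[name]:.3f} ms, "
          f"reported {reported_latency_ms[name]:.3f} ms")
```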

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HC:   707.32 (SE +/- 1.51, N = 3)
HBv2: 1367.73 (SE +/- 13.52, N = 15)
HBv3: 886.81 (SE +/- 6.66, N = 3)
HBv4: 533.49 (SE +/- 1.90, N = 3)
MIN: 687.14 / MIN: 849.06 / MIN: 518.68
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only

PostgreSQL 15 (TPS, More Is Better)
HC:   1353510 (SE +/- 2849.38, N = 3)
HBv2: 2467328 (SE +/- 4710.42, N = 3)
HBv3: 2434749 (SE +/- 28428.57, N = 4)
HBv4: 3161848 (SE +/- 3042.04, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency

PostgreSQL 15 (ms, Fewer Is Better)
HC:   0.369 (SE +/- 0.001, N = 3)
HBv2: 0.203 (SE +/- 0.000, N = 3)
HBv3: 0.206 (SE +/- 0.002, N = 4)
HBv4: 0.158 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HC:   442.47 (SE +/- 1.89, N = 3)
HBv2: 910.94 (SE +/- 9.54, N = 15)
HBv3: 529.97 (SE +/- 4.36, N = 3)
HBv4: 411.23 (SE +/- 3.60, N = 8)
MIN: 429.93 / MIN: 469.93
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1 (Seconds, Fewer Is Better)
HC:   330.61 (SE +/- 2.37, N = 3)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv4: 150.56 (SE +/- 2.23, N = 12)

libxsmm

M N K: 64

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HC:   748.1 (SE +/- 7.70, N = 3)
HBv2: 331.4 (SE +/- 2.64, N = 15)
HBv3: 2413.7 (SE +/- 8.24, N = 3)
HBv4: 5898.2 (SE +/- 74.65, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   41543.94 (SE +/- 105.69, N = 3)
HBv2: 104771.90 (SE +/- 324.54, N = 3)
HBv3: 205795.59 (SE +/- 1576.20, N = 3)
HBv4: 427298.99 (SE +/- 2970.97, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HC:   1.85 (SE +/- 0.00, N = 3)
HBv2: 2.01 (SE +/- 0.01, N = 3)
HBv3: 1.69 (SE +/- 0.01, N = 15)
HBv4: 3.08 (SE +/- 0.02, N = 3)

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HC:   1.85 (SE +/- 0.01, N = 3)
HBv2: 2.03 (SE +/- 0.01, N = 3)
HBv3: 1.72 (SE +/- 0.02, N = 4)
HBv4: 3.11 (SE +/- 0.03, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HC:   0.87 (SE +/- 0.00, N = 3)
HBv2: 0.96 (SE +/- 0.01, N = 15)
HBv3: 0.80 (SE +/- 0.01, N = 3)
HBv4: 1.32 (SE +/- 0.01, N = 3)

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1 (Major Kernels Total Rate, More Is Better)
HC:   247.49 (SE +/- 1.35, N = 3)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv4: 402.94 (SE +/- 0.78, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Laghos

Test: Triple Point Problem

Laghos 3.1 (Major Kernels Total Rate, More Is Better)
HC:   156.52 (SE +/- 0.08, N = 3)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv4: 228.15 (SE +/- 1.25, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

PETSc

Test: Streams

PETSc 3.19 (MB/s, More Is Better)
HC:   151286.25 (SE +/- 256.75, N = 3)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
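The "SE +/-" figures are easier to judge relative to the result they belong to. A minimal sketch, assuming SE denotes the standard error of the mean over the N runs: dividing each SE by its reported value gives a percentage that shows how noisy a result was. The helper name `relative_se_percent` is hypothetical, and the inputs are the PETSc Streams values from the result above.

```python
# (mean MB/s, SE) pairs copied from the PETSc Streams result above.
petsc_streams = {
    "HC": (151286.25, 256.75),
    "HBv2": (197895.47, 12025.83),
    "HBv3": (284001.92, 2674.31),
    "HBv4": (598417.70, 46271.80),
}


def relative_se_percent(mean, se):
    """Express a standard error as a percentage of the reported mean."""
    return se / mean * 100.0


relative_se = {
    name: relative_se_percent(mean, se) for name, (mean, se) in petsc_streams.items()
}

for name, pct in relative_se.items():
    print(f"{name}: +/- {pct:.2f}%")
```

A higher percentage alongside a higher run count (N) suggests a result that varied more from run to run and deserves a wider margin when comparing systems.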

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   9.02689 (SE +/- 0.01641, N = 3)
HBv2: 8.32323 (SE +/- 0.13284, N = 15)
HBv3: 11.17230 (SE +/- 0.02977, N = 3)
HBv4: 37.06240 (SE +/- 0.12574, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   9.52293 (SE +/- 0.03191, N = 3)
HBv2: 8.66888 (SE +/- 0.15055, N = 15)
HBv3: 11.75010 (SE +/- 0.01464, N = 3)
HBv4: 38.07690 (SE +/- 0.02835, N = 3)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HC:   96.76 (SE +/- 7.22, N = 9)
HBv2: 162.45 (SE +/- 0.83, N = 3)
HBv3: 167.50 (SE +/- 1.50, N = 7)
HBv4: 208.05 (SE +/- 0.81, N = 3)

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
HC:   14.072027 (SE +/- 0.474074, N = 12)
HBv2: 6.395415 (SE +/- 0.275809, N = 12)
HBv3: 25.048352 (SE +/- 0.146977, N = 3)
HBv4: 52.802440 (SE +/- 0.581762, N = 5)
1. (CC) gcc options: -O3 -march=native -fopenmp

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   59.55 (SE +/- 0.27, N = 3)
HBv2: 92.13 (SE +/- 1.33, N = 3)
HBv3: 105.36 (SE +/- 1.07, N = 6)
HBv4: 247.73 (SE +/- 4.85, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   122.77 (SE +/- 0.53, N = 3)
HBv2: 200.04 (SE +/- 3.34, N = 12)
HBv3: 221.86 (SE +/- 3.45, N = 15)
HBv4: 427.10 (SE +/- 10.91, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HC:   57.31 (SE +/- 0.25, N = 3)
HBv2: 91.92 (SE +/- 1.31, N = 3)
HBv3: 103.25 (SE +/- 0.75, N = 15)
HBv4: 261.90 (SE +/- 5.66, N = 15)
1. (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 32

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HC:   384.9 (SE +/- 3.15, N = 9)
HBv2: 164.8 (SE +/- 1.72, N = 3)
HBv3: 1438.1 (SE +/- 38.99, N = 12)
HBv4: 6163.0 (SE +/- 87.98, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm

M N K: 256

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HC:   904.1 (SE +/- 23.39, N = 9)
HBv2: 1128.3 (SE +/- 17.53, N = 9)
HBv3: 2045.7 (SE +/- 25.11, N = 4)
HBv4: 6908.6 (SE +/- 57.85, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HC:   1284.8 (SE +/- 13.64, N = 15)
HBv2: 1011.4 (SE +/- 169.50, N = 9)
HBv3: 2273.5 (SE +/- 20.51, N = 9)
HBv4: 6655.2 (SE +/- 59.23, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   1864.68 (SE +/- 7.55, N = 3)
HBv2: 3977.02 (SE +/- 35.84, N = 7)
HBv3: 5730.01 (SE +/- 67.99, N = 4)
HBv4: 12967.37 (SE +/- 308.75, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: CG.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HC:   27619.05 (SE +/- 218.98, N = 3)
HBv2: 36367.35 (SE +/- 778.45, N = 15)
HBv3: 36681.43 (SE +/- 503.29, N = 3)
HBv4: 74101.94 (SE +/- 599.32, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi


Phoronix Test Suite v10.8.5