Microsoft Azure HBv4 HPC Performance Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HTML result view exported from: https://openbenchmarking.org/result/2308011-PTS-AZUREHBV71&sor&grw.

System Details

Only the fields that the export lists for each system are shown; the HB systems list just Processor, Memory, and Disk.

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 13.1.0 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv2
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv3
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv4
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

Kernel Details
  - Transparent Huge Pages: always

Environment Details
  - CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"

Compiler Details
  - --disable-multilib --enable-checking=release

Processor Details
  - CPU Microcode: 0xffffffff

Python Details
  - Python 3.6.8

Security Details
  - HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  - HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Result overview table (all tests for HC, HBv2, HBv3, and HBv4). The individual results for each test are listed in the sections below.

libxsmm

M N K: 32

libxsmm 2-1.17-3645 - M N K: 32 (GFLOPS/s, More Is Better)
HBv4: 6163.0 (SE +/- 87.98, N = 3)
HBv3: 1438.1 (SE +/- 38.99, N = 12)
HC: 384.9 (SE +/- 3.15, N = 9)
HBv2: 164.8 (SE +/- 1.72, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60 (GFLOP/s, More Is Better)
HBv4: 88.52 (SE +/- 0.11, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv2: 36.09 (SE +/- 0.02, N = 3)
HC: 25.87 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
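
For "More Is Better" results such as this one, a relative speedup over a chosen baseline is often easier to compare than the raw figures. The following is a minimal sketch, assuming HC as the baseline; the values are copied from the HPCG 144 144 144 result above, and the function name `speedup_over` is purely illustrative rather than part of the Phoronix Test Suite.

```python
# HPCG results in GFLOP/s (X Y Z: 144 144 144, RT: 60), copied from the result above.
hpcg_gflops = {"HBv4": 88.52, "HBv3": 38.97, "HBv2": 36.09, "HC": 25.87}


def speedup_over(results, baseline):
    """Return each system's result divided by the baseline system's result.

    Only meaningful for "More Is Better" metrics; for "Fewer Is Better"
    metrics the ratio would have to be inverted.
    """
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


speedups = speedup_over(hpcg_gflops, "HC")
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x relative to HC")
```

The choice of HC as the baseline is an assumption made for illustration; any of the four systems could serve as the reference.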

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 624.95 (SE +/- 4.23, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HC: 113.94 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 247.73 (SE +/- 4.85, N = 15)
HBv3: 105.36 (SE +/- 1.07, N = 6)
HBv2: 92.13 (SE +/- 1.33, N = 3)
HC: 59.55 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4 - Test / Class: BT.C (Total Mop/s, More Is Better)
HBv4: 744413.90 (SE +/- 6061.11, N = 3)
HBv3: 313813.98 (SE +/- 2034.04, N = 3)
HBv2: 241509.88 (SE +/- 108.10, N = 3)
HC: 106230.52 (SE +/- 62.47, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 355.51 (SE +/- 1.18, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HC: 62.90 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 323.36 (SE +/- 0.80, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HC: 57.76 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 314.34 (SE +/- 0.50, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HC: 60.88 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: CG.C

NAS Parallel Benchmarks 3.4 - Test / Class: CG.C (Total Mop/s, More Is Better)
HBv4: 74101.94 (SE +/- 599.32, N = 3)
HBv3: 36681.43 (SE +/- 503.29, N = 3)
HBv2: 36367.35 (SE +/- 778.45, N = 15)
HC: 27619.05 (SE +/- 218.98, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 159.18 (SE +/- 0.34, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HC: 33.52 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4 - Test / Class: FT.C (Total Mop/s, More Is Better)
HBv4: 230164.79 (SE +/- 1773.50, N = 3)
HBv3: 102122.36 (SE +/- 339.33, N = 3)
HBv2: 98485.23 (SE +/- 320.45, N = 3)
HC: 55288.19 (SE +/- 131.36, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4 - Test / Class: IS.D (Total Mop/s, More Is Better)
HBv4: 12967.37 (SE +/- 308.75, N = 15)
HBv3: 5730.01 (SE +/- 67.99, N = 4)
HBv2: 3977.02 (SE +/- 35.84, N = 7)
HC: 1864.68 (SE +/- 7.55, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: MG.C

NAS Parallel Benchmarks 3.4 - Test / Class: MG.C (Total Mop/s, More Is Better)
HBv4: 437417.16 (SE +/- 5249.92, N = 15)
HBv3: 131635.41 (SE +/- 1313.15, N = 15)
HBv2: 108985.72 (SE +/- 768.30, N = 3)
HC: 63404.01 (SE +/- 149.23, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4 - Test / Class: SP.C (Total Mop/s, More Is Better)
HBv4: 427298.99 (SE +/- 2970.97, N = 15)
HBv3: 205795.59 (SE +/- 1576.20, N = 3)
HBv2: 104771.90 (SE +/- 324.54, N = 3)
HC: 41543.94 (SE +/- 105.69, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
HBv4: 0.14380 (SE +/- 0.00011, N = 3)
HBv2: 0.26505 (SE +/- 0.00069, N = 3)
HBv3: 0.27111 (SE +/- 0.00015, N = 3)
HC: 0.52697 (SE +/- 0.00060, N = 3)
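
NAMD reports days/ns here, where fewer is better. Readers who prefer the inverse throughput metric (ns/day, where more is better) can take the reciprocal. A minimal sketch of that conversion, using the values copied from the result above; the variable names are illustrative only.

```python
# NAMD ATPase results in days/ns (fewer is better), copied from the result above.
namd_days_per_ns = {
    "HBv4": 0.14380,
    "HBv2": 0.26505,
    "HBv3": 0.27111,
    "HC": 0.52697,
}

# ns/day is simply the reciprocal of days/ns (more is better).
namd_ns_per_day = {name: 1.0 / value for name, value in namd_days_per_ns.items()}

for name, value in namd_ns_per_day.items():
    print(f"{name}: {value:.3f} ns/day")
```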

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
HBv4: 533.49 (SE +/- 1.90, N = 3)
HC: 707.32 (SE +/- 1.51, N = 3)
HBv3: 886.81 (SE +/- 6.66, N = 3)
HBv2: 1367.73 (SE +/- 13.52, N = 15)
MIN values as exported (not attributed to a system in the export): 518.68, 687.14, 849.06
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

libxsmm

M N K: 128

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
HBv4: 6655.2 (SE +/- 59.23, N = 3)
HBv3: 2273.5 (SE +/- 20.51, N = 9)
HC: 1284.8 (SE +/- 13.64, N = 15)
HBv2: 1011.4 (SE +/- 169.50, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 622.58 (SE +/- 2.25, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HC: 114.03 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 256

libxsmm 2-1.17-3645 - M N K: 256 (GFLOPS/s, More Is Better)
HBv4: 6908.6 (SE +/- 57.85, N = 9)
HBv3: 2045.7 (SE +/- 25.11, N = 4)
HBv2: 1128.3 (SE +/- 17.53, N = 9)
HC: 904.1 (SE +/- 23.39, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 244.34 (SE +/- 3.04, N = 4)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HC: 59.73 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 154.65 (SE +/- 0.27, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HC: 31.57 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
HBv4: 411.23 (SE +/- 3.60, N = 8)
HC: 442.47 (SE +/- 1.89, N = 3)
HBv3: 529.97 (SE +/- 4.36, N = 3)
HBv2: 910.94 (SE +/- 9.54, N = 15)
MIN values as exported (not attributed to a system in the export): 429.93, 469.93
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 311.80 (SE +/- 1.60, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HC: 59.82 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 64

libxsmm 2-1.17-3645 - M N K: 64 (GFLOPS/s, More Is Better)
HBv4: 5898.2 (SE +/- 74.65, N = 3)
HBv3: 2413.7 (SE +/- 8.24, N = 3)
HC: 748.1 (SE +/- 7.70, N = 3)
HBv2: 331.4 (SE +/- 2.64, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60 (GFLOP/s, More Is Better)
HBv4: 87.90 (SE +/- 0.12, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv2: 36.02 (SE +/- 0.01, N = 3)
HC: 25.56 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 256

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 264.95 (SE +/- 4.27, N = 12)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HC: 60.57 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -pthread

Laghos

Test: Triple Point Problem

Laghos 3.1 - Test: Triple Point Problem (Major Kernels Total Rate, More Is Better)
HBv4: 228.15 (SE +/- 1.25, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HC: 156.52 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 255.97 (SE +/- 3.64, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HC: 58.55 (SE +/- 0.16, N = 3)
1. (CXX) g++ options: -O3 -pthread

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, More Is Better)
HBv4: 402.94 (SE +/- 0.78, N = 3)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HC: 247.49 (SE +/- 1.35, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 427.10 (SE +/- 10.91, N = 15)
HBv3: 221.86 (SE +/- 3.45, N = 15)
HBv2: 200.04 (SE +/- 3.34, N = 12)
HC: 122.77 (SE +/- 0.53, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 256.35 (SE +/- 1.07, N = 3)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HC: 58.36 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 159.26 (SE +/- 0.05, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HC: 33.55 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 355.86 (SE +/- 1.24, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HC: 62.98 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 323.70 (SE +/- 0.96, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HC: 57.92 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 261.90 (SE +/- 5.66, N = 15)
HBv3: 103.25 (SE +/- 0.75, N = 15)
HBv2: 91.92 (SE +/- 1.31, N = 3)
HC: 57.31 (SE +/- 0.25, N = 3)
1. (CXX) g++ options: -O3 -pthread

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
HBv4: 52.802440 (SE +/- 0.581762, N = 5)
HBv3: 25.048352 (SE +/- 0.146977, N = 3)
HC: 14.072027 (SE +/- 0.474074, N = 12)
HBv2: 6.395415 (SE +/- 0.275809, N = 12)
1. (CC) gcc options: -O3 -march=native -fopenmp

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 596.23 (SE +/- 2.14, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HC: 110.05 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

Pennant

Test: sedovbig

Pennant 1.0.1 - Test: sedovbig (Hydro Cycle Time - Seconds, Fewer Is Better)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HC: 25.019560 (SE +/- 0.026763, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

Pennant

Test: leblancbig

Pennant 1.0.1 - Test: leblancbig (Hydro Cycle Time - Seconds, Fewer Is Better)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HC: 10.645480 (SE +/- 0.017495, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 311.27 (SE +/- 0.81, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HC: 59.90 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 258.72 (SE +/- 2.84, N = 15)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HC: 60.89 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe 2.3 - Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 154.57 (SE +/- 0.03, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HC: 31.58 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 590.93 (SE +/- 2.49, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HC: 110.20 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -pthread

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01 - Test: Compression Rating (MIPS, More Is Better)
HBv4: 1083523 (SE +/- 4158.65, N = 3)
HBv3: 566595 (SE +/- 7198.45, N = 3)
HBv2: 501534 (SE +/- 3504.63, N = 3)
HC: 216451 (SE +/- 672.17, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 - Test: Decompression Rating (MIPS, More Is Better)
HBv4: 742859 (SE +/- 8621.97, N = 3)
HBv3: 406516 (SE +/- 3365.82, N = 3)
HBv2: 388577 (SE +/- 10621.28, N = 3)
HC: 150841 (SE +/- 300.63, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better)
HBv4: 273.12 (SE +/- 4.03, N = 14)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HC: 57.13 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
HBv4: 10.11 (SE +/- 0.08, N = 3)
HBv3: 19.43 (SE +/- 0.10, N = 3)
HBv2: 19.58 (SE +/- 0.16, N = 3)
HC: 49.95 (SE +/- 0.36, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
HBv4: 25.61 (SE +/- 0.11, N = 3)
HBv3: 50.71 (SE +/- 0.06, N = 3)
HBv2: 50.95 (SE +/- 0.15, N = 3)
HC: 138.51 (SE +/- 0.04, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
HBv4: 13.74 (SE +/- 0.09, N = 3)
HBv3: 25.59 (SE +/- 0.15, N = 3)
HBv2: 26.43 (SE +/- 0.04, N = 3)
HC: 71.76 (SE +/- 0.23, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
HBv4: 97.52 (SE +/- 0.47, N = 3)
HBv3: 188.96 (SE +/- 0.38, N = 3)
HBv2: 211.46 (SE +/- 0.22, N = 3)
HC: 526.93 (SE +/- 1.15, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
HBv4: 33.01 (SE +/- 0.12, N = 3)
HBv3: 62.90 (SE +/- 0.45, N = 3)
HBv2: 64.84 (SE +/- 0.28, N = 3)
HC: 175.07 (SE +/- 0.33, N = 3)

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
HBv4: 3.11 (SE +/- 0.03, N = 3)
HBv2: 2.03 (SE +/- 0.01, N = 3)
HC: 1.85 (SE +/- 0.01, N = 3)
HBv3: 1.72 (SE +/- 0.02, N = 4)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
HBv4: 3.08 (SE +/- 0.02, N = 3)
HBv2: 2.01 (SE +/- 0.01, N = 3)
HC: 1.85 (SE +/- 0.00, N = 3)
HBv3: 1.69 (SE +/- 0.01, N = 15)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better)
HBv4: 1.32 (SE +/- 0.01, N = 3)
HBv2: 0.96 (SE +/- 0.01, N = 15)
HC: 0.87 (SE +/- 0.00, N = 3)
HBv3: 0.80 (SE +/- 0.01, N = 3)

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better)
HBv4: 36.65480 (SE +/- 0.04011, N = 3)
HBv3: 24.47100 (SE +/- 0.00987, N = 3)
HBv2: 22.36680 (SE +/- 0.00858, N = 3)
HC: 8.99618 (SE +/- 0.01510, N = 3)

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
HBv4: 36.54460 (SE +/- 0.05762, N = 3)
HBv3: 24.21970 (SE +/- 0.00564, N = 3)
HBv2: 22.17470 (SE +/- 0.02944, N = 3)
HC: 8.87831 (SE +/- 0.05412, N = 3)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better)
HBv4: 208.05 (SE +/- 0.81, N = 3)
HBv3: 167.50 (SE +/- 1.50, N = 7)
HBv2: 162.45 (SE +/- 0.83, N = 3)
HC: 96.76 (SE +/- 7.22, N = 9)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe 2.3 - Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512 (GFLOP/s, More Is Better)
HBv4: 315.98 (SE +/- 1.65, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HC: 60.82 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
HBv4: 38.07690 (SE +/- 0.02835, N = 3)
HBv3: 11.75010 (SE +/- 0.01464, N = 3)
HC: 9.52293 (SE +/- 0.03191, N = 3)
HBv2: 8.66888 (SE +/- 0.15055, N = 15)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
HBv4: 37.06240 (SE +/- 0.12574, N = 3)
HBv3: 11.17230 (SE +/- 0.02977, N = 3)
HC: 9.02689 (SE +/- 0.01641, N = 3)
HBv2: 8.32323 (SE +/- 0.13284, N = 15)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better)
HBv4: 32.58 (SE +/- 0.08, N = 3)
HBv3: 14.61 (SE +/- 0.02, N = 3)
HBv2: 13.94 (SE +/- 0.05, N = 3)
HC: 10.06 (SE +/- 0.02, N = 3)

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1 - Time To Compile (Seconds, Fewer Is Better)
HBv4: 150.56 (SE +/- 2.23, N = 12)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HC: 330.61 (SE +/- 2.37, N = 3)

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
HBv4: 5412900000 (SE +/- 24008123.63, N = 3)
HBv2: 4309133333 (SE +/- 14518991.39, N = 3)
HBv3: 4216966667 (SE +/- 6263474.36, N = 3)
HC: 1570633333 (SE +/- 4733333.33, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better)
HBv4: 6181766667 (SE +/- 6999365.05, N = 3)
HBv2: 4275533333 (SE +/- 25439885.57, N = 3)
HBv3: 3864000000 (SE +/- 2858321.19, N = 3)
HC: 1536633333 (SE +/- 8873431.00, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
HBv4: 7095033333 (SE +/- 36788419.07, N = 3)
HBv2: 4350100000 (SE +/- 8195730.60, N = 3)
HBv3: 4281533333 (SE +/- 8996542.55, N = 3)
HC: 1683033333 (SE +/- 7033807.25, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better)
HBv4: 2221966667 (SE +/- 5336145.09, N = 3)
HBv2: 924243333 (SE +/- 3265385.80, N = 3)
HBv3: 814950000 (SE +/- 1919487.78, N = 3)
HC: 544626667 (SE +/- 2270626.44, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only (TPS, More Is Better)
HBv4: 3161848 (SE +/- 3042.04, N = 3)
HBv2: 2467328 (SE +/- 4710.42, N = 3)
HBv3: 2434749 (SE +/- 28428.57, N = 4)
HC: 1353510 (SE +/- 2849.38, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
HBv4: 0.158 (SE +/- 0.000, N = 3)
HBv2: 0.203 (SE +/- 0.000, N = 3)
HBv3: 0.206 (SE +/- 0.002, N = 4)
HC: 0.369 (SE +/- 0.001, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
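
The average latency figures appear to track the throughput figures closely: with all clients kept busy, average latency is roughly the client count divided by TPS. The sketch below checks that relation against the 500-client read-only numbers reported above. It is an illustrative consistency check, not a description of how pgbench computes its output, and the helper name `expected_latency_ms` is hypothetical.

```python
# 500-client read-only results copied from the two PostgreSQL results above.
CLIENTS = 500
tps = {"HBv4": 3161848, "HBv2": 2467328, "HBv3": 2434749, "HC": 1353510}
reported_latency_ms = {"HBv4": 0.158, "HBv2": 0.203, "HBv3": 0.206, "HC": 0.369}


def expected_latency_ms(clients, transactions_per_second):
    """Estimate average latency in ms, assuming every client is always busy."""
    return clients / transactions_per_second * 1000.0


estimates = {name: expected_latency_ms(CLIENTS, rate) for name, rate in tps.items()}
for name, estimate in estimates.items():
    delta = estimate - reported_latency_ms[name]
    print(f"{name}: estimated {estimate:.3f} ms, "
          f"reported {reported_latency_ms[name]:.3f} ms, delta {delta:+.4f} ms")
```

Small differences between the estimate and the reported value are expected, since both reported figures are averages over several runs.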

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only (TPS, More Is Better)
HBv4: 3146173 (SE +/- 2972.36, N = 3)
HBv2: 2481320 (SE +/- 9212.17, N = 3)
HBv3: 2478917 (SE +/- 13675.06, N = 3)
HC: 1159492 (SE +/- 2818.34, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
HBv4: 0.254 (SE +/- 0.000, N = 3)
HBv2: 0.323 (SE +/- 0.001, N = 3)
HBv3: 0.323 (SE +/- 0.002, N = 3)
HC: 0.690 (SE +/- 0.002, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PETSc

Test: Streams

PETSc 3.19 - Test: Streams (MB/s, More Is Better)
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HC: 151286.25 (SE +/- 256.75, N = 3)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60 (GFLOP/s, More Is Better)
HBv4: 89.38 (SE +/- 0.26, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv2: 37.04 (SE +/- 0.03, N = 3)
HC: 26.00 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi


Phoronix Test Suite v10.8.5