Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HTML result view exported from: https://openbenchmarking.org/result/2307054-PTS-AZUREHPC63&grs&sro.

Microsoft Azure HBv4 HPC Comparison Benchmarks: System Details

The export lists only the fields that differ from the preceding system.

HC
Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
Disk: 32GB Virtual Disk + 752GB Virtual Disk
Graphics: hyperv_fb
OS: AlmaLinux 8.7
Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
Compiler: GCC 8.5.0 20210514 + CUDA 12.1
File-System: nfs
Screen Resolution: 1024x768
System Layer: microsoft

HBv2
Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv3
Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv4
Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
OS: AlmaLinux 8.8

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
- CPU Microcode: 0xffffffff

Python Details
- Python 3.6.8

Security Details
- HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
- HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Microsoft Azure HBv4 HPC Comparison Benchmarks: Result Overview

Test | HC | HBv2 | HBv3 | HBv4
pennant: sedovbig | 25.01956 | 5.915805 | 6.277107 | 3.581391
heffte: c2c - FFTW - float-long - 512 | 62.9027 | 96.4941 | 135.950 | 355.512
heffte: c2c - FFTW - float - 512 | 62.9750 | 95.8801 | 135.694 | 355.855
heffte: c2c - Stock - float - 512 | 57.7643 | 93.7923 | 123.242 | 323.356
heffte: c2c - Stock - float-long - 512 | 57.9203 | 93.2573 | 124.595 | 323.696
npb: MG.C | 19508.00 | 43410.71 | 46705.47 | 108125.86
blender: Classroom - CPU-Only | 138.81 | 50.86 | 51.08 | 25.26
heffte: r2c - FFTW - float-long - 512 | 113.940 | 191.141 | 257.419 | 624.951
heffte: r2c - FFTW - float - 512 | 114.025 | 191.775 | 254.252 | 622.580
blender: Barbershop - CPU-Only | 524.86 | 210.18 | 189.30 | 96.77
heffte: r2c - Stock - float - 512 | 110.049 | 190.949 | 232.166 | 596.226
heffte: r2c - Stock - float-long - 512 | 110.197 | 189.208 | 233.797 | 590.925
npb: SP.C | 12907.54 | 32495.89 | 31024.76 | 68819.34
blender: Pabellon Barcelona - CPU-Only | 176.21 | 64.14 | 62.64 | 33.40
npb: BT.C | 28794.28 | 66829.18 | 62427.86 | 151067.81
heffte: r2c - Stock - double - 512 | 59.8216 | 94.5301 | 117.731 | 311.803
blender: Fishy Cat - CPU-Only | 72.57 | 26.19 | 25.47 | 13.96
heffte: r2c - Stock - double-long - 512 | 59.8954 | 95.1989 | 118.236 | 311.267
heffte: r2c - FFTW - double-long - 512 | 60.8204 | 91.4296 | 120.957 | 315.982
heffte: r2c - FFTW - double - 512 | 60.8804 | 91.4802 | 121.283 | 314.336
blender: BMW27 - CPU-Only | 50.53 | 19.46 | 19.49 | 9.97
pennant: leblancbig | 10.64548 | 3.466885 | 3.649317 | 2.122074
npb: IS.D | 1181.48 | 1884.22 | 2793.55 | 5870.00
compress-7zip: Compression Rating | 210732 | 489456 | 558290 | 1032267
heffte: c2c - Stock - double - 512 | 31.5718 | 46.9794 | 56.2161 | 154.648
heffte: c2c - Stock - double-long - 512 | 31.5846 | 46.9289 | 56.2690 | 154.568
heffte: r2c - FFTW - double-long - 256 | 57.1290 | 88.6081 | 106.632 | 273.121
heffte: c2c - FFTW - double - 512 | 33.5193 | 47.6050 | 57.3307 | 159.175
heffte: c2c - FFTW - double-long - 512 | 33.5545 | 47.3696 | 57.2263 | 159.258
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 8.98723 | 8.12356 | 11.1845 | 37.0918
heffte: c2c - FFTW - float - 256 | 58.3567 | 91.5383 | 103.5147 | 256.349
ospray: gravity_spheres_volume/dim_512/ao/real_time | 9.49421 | 8.67327 | 11.7485 | 38.0764
heffte: r2c - Stock - double - 256 | 60.5727 | 93.3137 | 102.7046 | 264.954
heffte: c2c - FFTW - float-long - 256 | 58.5498 | 90.7883 | 105.093 | 255.968
heffte: r2c - Stock - double-long - 256 | 60.8872 | 92.3883 | 105.5003 | 258.716
heffte: c2c - FFTW - double - 256 | 30.1190 | 50.9032 | 39.8117 | 123.391
heffte: c2c - Stock - float - 256 | 59.7292 | 91.2601 | 103.409 | 244.342
ospray: particle_volume/ao/real_time | 8.97547 | 22.3336 | 24.4586 | 36.6121
heffte: c2c - Stock - double-long - 256 | 30.2672 | 50.0759 | 38.5694 | 123.408
ospray: particle_volume/scivis/real_time | 8.97020 | 22.1533 | 24.1736 | 36.5671
heffte: c2c - FFTW - double-long - 256 | 30.2175 | 51.1954 | 39.3709 | 122.981
liquid-dsp: 176 - 256 - 57 | 1664733333 | 4106700000 | 3563433333 | 6758166667
heffte: c2c - Stock - double - 256 | 30.1663 | 50.7070 | 38.4461 | 121.605
liquid-dsp: 176 - 256 - 32 | 1566133333 | 4027100000 | 3419533333 | 6122233333
liquid-dsp: 176 - 256 - 512 | 529213333 | 825653333 | 735370000 | 2058233333
namd: ATPase Simulation - 327,506 Atoms | 0.52650 | 0.26385 | 0.27115 | 0.14292
hpcg: 160 160 160 - 60 | 25.5635 | 36.0167 | 39.1106 | 87.9013
hpcg: 104 104 104 - 60 | 25.9971 | 37.0410 | 39.6093 | 89.3840
hpcg: 144 144 144 - 60 | 25.8659 | 36.0866 | 38.9739 | 88.5160
npb: FT.C | 20188.89 | 41977.69 | 36619.29 | 69051.63
liquid-dsp: 128 - 256 - 57 | 1572400000 | 4045933333 | 3516300000 | 5168233333
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 10.0490 | 13.9151 | 14.6067 | 32.7911
liquid-dsp: 128 - 256 - 32 | 1512600000 | 3925933333 | 3366733333 | 4426300000
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 1.24480 | 1.61002 | 1.40862 | 0.582806
pgbench: 1 - 800 - Read Only | 1161800 | 2439650 | 2407602 | 3123042
pgbench: 1 - 800 - Read Only - Average Latency | 0.688 | 0.328 | 0.332 | 0.256
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 707.322 | 1367.73 | 886.810 | 533.494
onednn: Recurrent Neural Network Training - f32 - CPU | 707.353 | 1345.14 | 860.975 | 535.853
pgbench: 1 - 500 - Read Only - Average Latency | 0.369 | 0.203 | 0.210 | 0.159
pgbench: 1 - 500 - Read Only | 1354877 | 2466249 | 2375005 | 3139846
onednn: Recurrent Neural Network Inference - f32 - CPU | 450.247 | 896.813 | 533.496 | 401.855
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 442.471 | 910.937 | 529.973 | 411.234
build-nodejs: Time To Compile | 330.613 | 194.367 | 185.567 | 150.558
onednn: Convolution Batch Shapes Auto - f32 - CPU | 3.11121 | 0.573878 | 0.556741 | 0.276472
liquid-dsp: 32 - 256 - 57 | 721290909 | 1193400000 | 1086000000 | 1390540000
onednn: IP Shapes 1D - f32 - CPU | 0.882446 | 1.40758 | 0.910091 | 0.752929
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 1.84 | 2.03 | 1.69 | 3.13
remhos: Sample Remap Example | 27.378 | 14.931 | 15.256 | 15.370
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 1.82 | 2.08 | 1.68 | 3.08
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 0.88 | 1.04 | 0.79 | 1.29
laghos: Sedov Blast Wave, ube_922_hex.mesh | 247.49 | 345.14 | 361.81 | 402.94
laghos: Triple Point Problem | 156.52 | 183.82 | 192.74 | 228.15
liquid-dsp: 32 - 256 - 32 | 964423333 | 1061433333 | 917336667 | 1113300000
build-linux-kernel: allmodconfig | 1950.626 | 1782.933 | 1889.463 | 1681.255
liquid-dsp: 1 - 256 - 32 | 31796333 | 33211667 | 32817333 | 35362667
petsc: Streams | 151286.2491 | 197895.4717 | 284001.9162 | 598417.6957
onednn: IP Shapes 3D - f32 - CPU | 2.07920 | 6.83825 | 0.624233 | 0.306141
compress-7zip: Decompression Rating | 148193 | 371044 | 397505 | 727995
ospray: particle_volume/pathtracer/real_time | 86.5734 | 157.133 | 168.242 | 208.338
mt-dgemm: Sustained Floating-Point Rate | 14.340830 | 5.899903 | 25.104876 | 53.175691
heffte: r2c - Stock - float-long - 256 | 131.962 | 211.418 | 207.974 | 467.718
heffte: c2c - Stock - float-long - 256 | 59.5527 | 92.1290 | 105.361 | 247.725
heffte: c2c - FFTW - double-long - 128 | 58.9125 | 61.1403 | 56.8693 | 85.0078
heffte: r2c - FFTW - float-long - 256 | 122.772 | 200.035 | 221.861 | 427.101
heffte: c2c - Stock - double - 128 | 41.7345 | 51.3955 | 50.6068 | 87.6623
heffte: r2c - Stock - float - 256 | 134.760 | 205.206 | 214.063 | 459.918
heffte: r2c - FFTW - double - 256 | 57.3101 | 91.9186 | 103.2457 | 261.903
heffte: c2c - FFTW - double - 128 | 59.1442 | 59.4244 | 59.3811 | 80.2514
heffte: r2c - FFTW - float - 256 | 123.632 | 203.772 | 198.660 | 442.829
libxsmm: 64 | 731.6 | 411.7 | 2435.6 | 5719.0
libxsmm: 32 | 379.9 | 195.1 | 1506.3 | 5006.8
libxsmm: 256 | 898.8 | 1444.2 | 2032.1 | 6983.2
libxsmm: 128 | 1328.4 | 1519.5 | 2284.6 | 6585.6
npb: EP.D | 1642.03 | 3222.82 | 2879.08 | 5985.75
npb: CG.C | 14356.20 | 22314.02 | 21551.48 | 40326.29

Pennant

Test: sedovbig

Pennant 1.0.1: Hydro Cycle Time - Seconds, Fewer Is Better
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
HC: 25.019560 (SE +/- 0.026763, N = 3)
(CXX) g++ options: -fopenmp -pthread -lmpi
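The results in this export mix "Fewer Is Better" and "More Is Better" metrics, so comparing systems means taking the ratio in the right direction. The following is a minimal Python sketch of that arithmetic using the Pennant sedovbig times above; the function name `speedup` and the choice of HC as the baseline are illustrative assumptions, not part of the export.

```python
def speedup(baseline: float, candidate: float, lower_is_better: bool) -> float:
    """Return how many times better `candidate` is than `baseline`."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("results must be positive")
    # For time-like metrics a smaller value wins, so the ratio is inverted.
    return baseline / candidate if lower_is_better else candidate / baseline


# Pennant sedovbig hydro cycle times in seconds, copied from the result above.
pennant_sedovbig = {
    "HC": 25.019560,
    "HBv2": 5.915805,
    "HBv3": 6.277107,
    "HBv4": 3.581391,
}

for name, seconds in pennant_sedovbig.items():
    ratio = speedup(pennant_sedovbig["HC"], seconds, lower_is_better=True)
    print(f"{name}: {ratio:.2f}x relative to HC")
```

For throughput metrics such as GFLOP/s or Mop/s, the same helper would be called with `lower_is_better=False`.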

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 96.49 (SE +/- 0.05, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv4: 355.51 (SE +/- 1.18, N = 3)
HC: 62.90 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 95.88 (SE +/- 0.47, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv4: 355.86 (SE +/- 1.24, N = 3)
HC: 62.98 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 93.79 (SE +/- 0.34, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv4: 323.36 (SE +/- 0.80, N = 3)
HC: 57.76 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 93.26 (SE +/- 0.23, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv4: 323.70 (SE +/- 0.96, N = 3)
HC: 57.92 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: MG.C

NAS Parallel Benchmarks 3.4: Total Mop/s, More Is Better
HBv2: 43410.71 (SE +/- 354.81, N = 3)
HBv3: 46705.47 (SE +/- 613.84, N = 15)
HBv4: 108125.86 (SE +/- 748.94, N = 13)
HC: 19508.00 (SE +/- 24.47, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
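Each result is reported with a standard error (SE) and a run count (N). A rough way to judge run-to-run noise is the SE relative to the mean. The sketch below uses the MG.C figures above and assumes the usual definition SE = s / sqrt(N) when recovering a sample standard deviation; the export itself does not state how the SE is computed, and the helper names are illustrative.

```python
import math


def relative_standard_error(mean: float, se: float) -> float:
    """Standard error expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * se / abs(mean)


def stddev_from_se(se: float, n: int) -> float:
    """Sample standard deviation, assuming SE = s / sqrt(N)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return se * math.sqrt(n)


# NPB MG.C results copied from above: (mean in Mop/s, SE, N).
mg_c = {
    "HBv2": (43410.71, 354.81, 3),
    "HBv3": (46705.47, 613.84, 15),
    "HBv4": (108125.86, 748.94, 13),
    "HC": (19508.00, 24.47, 3),
}

for name, (mean, se, n) in mg_c.items():
    rse = relative_standard_error(mean, se)
    sd = stddev_from_se(se, n)
    print(f"{name}: relative SE {rse:.2f}%, implied std dev {sd:.1f} Mop/s over {n} runs")
```

The larger run counts on HBv3 and HBv4 (N = 15 and N = 13) go together with the larger implied spread between individual runs.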

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6: Seconds, Fewer Is Better
HBv2: 50.86 (SE +/- 0.10, N = 3)
HBv3: 51.08 (SE +/- 0.04, N = 3)
HBv4: 25.26 (SE +/- 0.11, N = 3)
HC: 138.81 (SE +/- 0.49, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 191.14 (SE +/- 1.39, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv4: 624.95 (SE +/- 4.23, N = 3)
HC: 113.94 (SE +/- 0.18, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 191.78 (SE +/- 1.03, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv4: 622.58 (SE +/- 2.25, N = 3)
HC: 114.03 (SE +/- 0.09, N = 3)
(CXX) g++ options: -O3 -pthread

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6: Seconds, Fewer Is Better
HBv2: 210.18 (SE +/- 0.01, N = 3)
HBv3: 189.30 (SE +/- 0.45, N = 3)
HBv4: 96.77 (SE +/- 0.12, N = 3)
HC: 524.86 (SE +/- 2.13, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 190.95 (SE +/- 2.04, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv4: 596.23 (SE +/- 2.14, N = 3)
HC: 110.05 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 189.21 (SE +/- 1.02, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv4: 590.93 (SE +/- 2.49, N = 3)
HC: 110.20 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4: Total Mop/s, More Is Better
HBv2: 32495.89 (SE +/- 34.59, N = 3)
HBv3: 31024.76 (SE +/- 273.09, N = 8)
HBv4: 68819.34 (SE +/- 954.46, N = 12)
HC: 12907.54 (SE +/- 12.00, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6: Seconds, Fewer Is Better
HBv2: 64.14 (SE +/- 0.10, N = 3)
HBv3: 62.64 (SE +/- 0.24, N = 3)
HBv4: 33.40 (SE +/- 0.06, N = 3)
HC: 176.21 (SE +/- 1.13, N = 3)

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4: Total Mop/s, More Is Better
HBv2: 66829.18 (SE +/- 32.07, N = 3)
HBv3: 62427.86 (SE +/- 36.56, N = 3)
HBv4: 151067.81 (SE +/- 760.56, N = 3)
HC: 28794.28 (SE +/- 15.19, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 94.53 (SE +/- 0.25, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv4: 311.80 (SE +/- 1.60, N = 3)
HC: 59.82 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -pthread

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6: Seconds, Fewer Is Better
HBv2: 26.19 (SE +/- 0.10, N = 3)
HBv3: 25.47 (SE +/- 0.08, N = 3)
HBv4: 13.96 (SE +/- 0.14, N = 3)
HC: 72.57 (SE +/- 0.48, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 95.20 (SE +/- 0.16, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv4: 311.27 (SE +/- 0.81, N = 3)
HC: 59.90 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 91.43 (SE +/- 0.07, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv4: 315.98 (SE +/- 1.65, N = 3)
HC: 60.82 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 91.48 (SE +/- 0.15, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv4: 314.34 (SE +/- 0.50, N = 3)
HC: 60.88 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -pthread

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6: Seconds, Fewer Is Better
HBv2: 19.46 (SE +/- 0.11, N = 3)
HBv3: 19.49 (SE +/- 0.02, N = 3)
HBv4: 9.97 (SE +/- 0.06, N = 3)
HC: 50.53 (SE +/- 0.65, N = 15)

Pennant

Test: leblancbig

Pennant 1.0.1: Hydro Cycle Time - Seconds, Fewer Is Better
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
HC: 10.645480 (SE +/- 0.017495, N = 3)
(CXX) g++ options: -fopenmp -pthread -lmpi

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4: Total Mop/s, More Is Better
HBv2: 1884.22 (SE +/- 11.15, N = 3)
HBv3: 2793.55 (SE +/- 22.55, N = 3)
HBv4: 5870.00 (SE +/- 17.88, N = 3)
HC: 1181.48 (SE +/- 2.10, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01: MIPS, More Is Better
HBv2: 489456 (SE +/- 2650.49, N = 3)
HBv3: 558290 (SE +/- 6724.92, N = 3)
HBv4: 1032267 (SE +/- 7680.08, N = 15)
HC: 210732 (SE +/- 748.55, N = 3)
(CXX) g++ options: -lpthread -ldl -O2 -fPIC

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 46.98 (SE +/- 0.05, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv4: 154.65 (SE +/- 0.27, N = 3)
HC: 31.57 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 46.93 (SE +/- 0.09, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv4: 154.57 (SE +/- 0.03, N = 3)
HC: 31.58 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 88.61 (SE +/- 1.12, N = 15)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv4: 273.12 (SE +/- 4.03, N = 14)
HC: 57.13 (SE +/- 0.12, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 47.61 (SE +/- 0.09, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv4: 159.18 (SE +/- 0.34, N = 3)
HC: 33.52 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 47.37 (SE +/- 0.02, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv4: 159.26 (SE +/- 0.05, N = 3)
HC: 33.55 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12: Items Per Second, More Is Better
HBv2: 8.12356 (SE +/- 0.12026, N = 15)
HBv3: 11.18450 (SE +/- 0.01165, N = 3)
HBv4: 37.09180 (SE +/- 0.11164, N = 3)
HC: 8.98723 (SE +/- 0.03491, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 91.54 (SE +/- 0.67, N = 15)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv4: 256.35 (SE +/- 1.07, N = 3)
HC: 58.36 (SE +/- 0.07, N = 3)
(CXX) g++ options: -O3 -pthread

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12: Items Per Second, More Is Better
HBv2: 8.67327 (SE +/- 0.13915, N = 12)
HBv3: 11.74850 (SE +/- 0.03837, N = 3)
HBv4: 38.07640 (SE +/- 0.03610, N = 3)
HC: 9.49421 (SE +/- 0.02906, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 93.31 (SE +/- 1.10, N = 4)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv4: 264.95 (SE +/- 4.27, N = 12)
HC: 60.57 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 90.79 (SE +/- 0.74, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv4: 255.97 (SE +/- 3.64, N = 15)
HC: 58.55 (SE +/- 0.16, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 92.39 (SE +/- 1.27, N = 3)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv4: 258.72 (SE +/- 2.84, N = 15)
HC: 60.89 (SE +/- 0.19, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 50.90 (SE +/- 0.55, N = 3)
HBv3: 39.81 (SE +/- 0.14, N = 3)
HBv4: 123.39 (SE +/- 1.65, N = 3)
HC: 30.12 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 91.26 (SE +/- 0.61, N = 15)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv4: 244.34 (SE +/- 3.04, N = 4)
HC: 59.73 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12: Items Per Second, More Is Better
HBv2: 22.33360 (SE +/- 0.00495, N = 3)
HBv3: 24.45860 (SE +/- 0.01755, N = 3)
HBv4: 36.61210 (SE +/- 0.04053, N = 3)
HC: 8.97547 (SE +/- 0.01225, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 50.08 (SE +/- 0.55, N = 3)
HBv3: 38.57 (SE +/- 0.14, N = 3)
HBv4: 123.41 (SE +/- 1.16, N = 3)
HC: 30.27 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -pthread

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12: Items Per Second, More Is Better
HBv2: 22.15330 (SE +/- 0.01671, N = 3)
HBv3: 24.17360 (SE +/- 0.01956, N = 3)
HBv4: 36.56710 (SE +/- 0.03598, N = 3)
HC: 8.97020 (SE +/- 0.00763, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 51.20 (SE +/- 0.57, N = 3)
HBv3: 39.37 (SE +/- 0.33, N = 3)
HBv4: 122.98 (SE +/- 1.21, N = 15)
HC: 30.22 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -pthread

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6: samples/s, More Is Better
HBv2: 4106700000 (SE +/- 13588352.86, N = 3)
HBv3: 3563433333 (SE +/- 4247482.91, N = 3)
HBv4: 6758166667 (SE +/- 11394345.58, N = 3)
HC: 1664733333 (SE +/- 5446813.54, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3: GFLOP/s, More Is Better
HBv2: 50.71 (SE +/- 0.29, N = 3)
HBv3: 38.45 (SE +/- 0.29, N = 11)
HBv4: 121.61 (SE +/- 1.20, N = 3)
HC: 30.17 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -pthread

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6: samples/s, More Is Better
HBv2: 4027100000 (SE +/- 44818002.34, N = 3)
HBv3: 3419533333 (SE +/- 8912600.32, N = 3)
HBv4: 6122233333 (SE +/- 9214903.39, N = 3)
HC: 1566133333 (SE +/- 2852094.75, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6: samples/s, More Is Better
HBv2: 825653333 (SE +/- 3174614.59, N = 3)
HBv3: 735370000 (SE +/- 3040334.41, N = 3)
HBv4: 2058233333 (SE +/- 4603018.33, N = 3)
HC: 529213333 (SE +/- 6341443.93, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14: days/ns, Fewer Is Better
HBv2: 0.26385 (SE +/- 0.00045, N = 3)
HBv3: 0.27115 (SE +/- 0.00027, N = 3)
HBv4: 0.14292 (SE +/- 0.00035, N = 3)
HC: 0.52650 (SE +/- 0.00096, N = 3)
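NAMD reports days/ns, where fewer is better. Many readers find the reciprocal, ns/day, easier to compare. The sketch below performs that conversion on the values above; the helper name `ns_per_day` is an illustrative choice.

```python
def ns_per_day(days_per_ns: float) -> float:
    """Convert a days/ns figure into simulated nanoseconds per day."""
    if days_per_ns <= 0:
        raise ValueError("days/ns must be positive")
    return 1.0 / days_per_ns


# NAMD ATPase results in days/ns, copied from the result above.
namd_days_per_ns = {
    "HBv2": 0.26385,
    "HBv3": 0.27115,
    "HBv4": 0.14292,
    "HC": 0.52650,
}

for name, value in namd_days_per_ns.items():
    print(f"{name}: {ns_per_day(value):.2f} ns/day")
```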

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

High Performance Conjugate Gradient 3.1: GFLOP/s, More Is Better
HBv2: 36.02 (SE +/- 0.01, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv4: 87.90 (SE +/- 0.12, N = 3)
HC: 25.56 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1: GFLOP/s, More Is Better
HBv2: 37.04 (SE +/- 0.03, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv4: 89.38 (SE +/- 0.26, N = 3)
HC: 26.00 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1: GFLOP/s, More Is Better
HBv2: 36.09 (SE +/- 0.02, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv4: 88.52 (SE +/- 0.11, N = 3)
HC: 25.87 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4: Total Mop/s, More Is Better
HBv2: 41977.69 (SE +/- 219.43, N = 3)
HBv3: 36619.29 (SE +/- 194.34, N = 3)
HBv4: 69051.63 (SE +/- 745.61, N = 3)
HC: 20188.89 (SE +/- 13.57, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6: samples/s, More Is Better
HBv2: 4045933333 (SE +/- 4421286.89, N = 3)
HBv3: 3516300000 (SE +/- 6947661.48, N = 3)
HBv4: 5168233333 (SE +/- 10401335.38, N = 3)
HC: 1572400000 (SE +/- 8373967.60, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 2.12: Items Per Second, More Is Better
HBv2: 13.92 (SE +/- 0.03, N = 3)
HBv3: 14.61 (SE +/- 0.03, N = 3)
HBv4: 32.79 (SE +/- 0.04, N = 3)
HC: 10.05 (SE +/- 0.02, N = 3)

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6: samples/s, More Is Better
HBv2: 3925933333 (SE +/- 3602930.91, N = 3)
HBv3: 3366733333 (SE +/- 5345506.94, N = 3)
HBv4: 4426300000 (SE +/- 3774034.09, N = 3)
HC: 1512600000 (SE +/- 8213606.60, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 3.1: ms, Fewer Is Better
HBv2: 1.610020 (SE +/- 0.021847, N = 3, MIN: 1.49)
HBv3: 1.408620 (SE +/- 0.003506, N = 3, MIN: 1.36)
HBv4: 0.582806 (SE +/- 0.001551, N = 3, MIN: 0.56)
HC: 1.244800 (SE +/- 0.002723, N = 3, MIN: 1.22)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only

PostgreSQL 15: TPS, More Is Better
HBv2: 2439650 (SE +/- 4115.38, N = 3)
HBv3: 2407602 (SE +/- 11149.78, N = 3)
HBv4: 3123042 (SE +/- 20304.79, N = 3)
HC: 1161800 (SE +/- 4936.18, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency

PostgreSQL 15: ms, Fewer Is Better
HBv2: 0.328 (SE +/- 0.001, N = 3)
HBv3: 0.332 (SE +/- 0.001, N = 3)
HBv4: 0.256 (SE +/- 0.001, N = 3)
HC: 0.688 (SE +/- 0.003, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
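The read-only TPS and average latency figures at 800 clients can be cross-checked against each other. With a fixed number of clients that each keep one request in flight, Little's law suggests throughput is roughly clients divided by latency. The sketch below applies that relationship as an assumption (the export does not say how the latency figure is derived), and the latencies are rounded to three decimals, so only approximate agreement is expected.

```python
def implied_tps(clients: int, avg_latency_ms: float) -> float:
    """Throughput implied by Little's law: clients / latency (in seconds)."""
    if avg_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return clients / (avg_latency_ms / 1000.0)


# (reported TPS, reported average latency in ms) at 800 clients, from the results above.
reported = {
    "HBv2": (2439650, 0.328),
    "HBv3": (2407602, 0.332),
    "HBv4": (3123042, 0.256),
    "HC": (1161800, 0.688),
}

for name, (tps, latency_ms) in reported.items():
    estimate = implied_tps(800, latency_ms)
    gap = abs(estimate - tps) / tps
    print(f"{name}: reported {tps} TPS, implied {estimate:.0f} TPS, gap {gap:.2%}")
```

On these numbers the implied and reported throughput agree to well under one percent for all four systems, which is consistent with the two charts describing the same runs.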

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1: ms, Fewer Is Better
HBv2: 1367.73 (SE +/- 13.52, N = 15, MIN: 1212.94)
HBv3: 886.81 (SE +/- 6.66, N = 3, MIN: 849.06)
HBv4: 533.49 (SE +/- 1.90, N = 3, MIN: 518.68)
HC: 707.32 (SE +/- 1.51, N = 3, MIN: 687.14)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.1: ms, Fewer Is Better
HBv2: 1345.14 (SE +/- 13.31, N = 3, MIN: 1237.17)
HBv3: 860.98 (SE +/- 3.89, N = 3, MIN: 814.31)
HBv4: 535.85 (SE +/- 3.26, N = 3, MIN: 521.12)
HC: 707.35 (SE +/- 1.60, N = 3, MIN: 689.52)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency

PostgreSQL 15: ms, Fewer Is Better
HBv2: 0.203 (SE +/- 0.001, N = 3)
HBv3: 0.210 (SE +/- 0.000, N = 3)
HBv4: 0.159 (SE +/- 0.000, N = 3)
HC: 0.369 (SE +/- 0.001, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only

PostgreSQL 15 (TPS, More Is Better)
HBv2: 2466249 (SE +/- 8486.11, N = 3)
HBv3: 2375005 (SE +/- 4803.91, N = 3)
HBv4: 3139846 (SE +/- 4762.10, N = 3)
HC: 1354877 (SE +/- 3475.53, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HBv2: 896.81 (SE +/- 9.52, N = 15, MIN: 799.26)
HBv3: 533.50 (SE +/- 4.61, N = 15, MIN: 469.44)
HBv4: 401.86 (SE +/- 1.40, N = 3, MIN: 388.53)
HC: 450.25 (SE +/- 4.72, N = 3, MIN: 432.99)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HBv2: 910.94 (SE +/- 9.54, N = 15, MIN: 799.88)
HBv3: 529.97 (SE +/- 4.36, N = 3, MIN: 469.93)
HBv4: 411.23 (SE +/- 3.60, N = 8, MIN: 384.83)
HC: 442.47 (SE +/- 1.89, N = 3, MIN: 429.93)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1 (Seconds, Fewer Is Better)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv4: 150.56 (SE +/- 2.23, N = 12)
HC: 330.61 (SE +/- 2.37, N = 3)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HBv2: 0.573878 (SE +/- 0.002431, N = 3, MIN: 0.47)
HBv3: 0.556741 (SE +/- 0.001799, N = 3, MIN: 0.5)
HBv4: 0.276472 (SE +/- 0.000440, N = 3)
HC: 3.111210 (SE +/- 0.015370, N = 3, MIN: 1.73)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 (samples/s, More Is Better)
HBv2: 1193400000 (SE +/- 472581.56, N = 3)
HBv3: 1086000000 (SE +/- 550757.05, N = 3)
HBv4: 1390540000 (SE +/- 14294460.47, N = 5)
HC: 721290909 (SE +/- 5360840.75, N = 11)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HBv2: 1.407580 (SE +/- 0.014464, N = 3, MIN: 1.11)
HBv3: 0.910091 (SE +/- 0.013826, N = 12, MIN: 0.76)
HBv4: 0.752929 (SE +/- 0.001421, N = 3, MIN: 0.69)
HC: 0.882446 (SE +/- 0.000702, N = 3, MIN: 0.83)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HBv2: 2.03 (SE +/- 0.02, N = 9)
HBv3: 1.69 (SE +/- 0.01, N = 15)
HBv4: 3.13 (SE +/- 0.01, N = 3)
HC: 1.84 (SE +/- 0.01, N = 3)

Remhos

Test: Sample Remap Example

Remhos 1.0 (Seconds, Fewer Is Better)
HBv2: 14.93 (SE +/- 0.07, N = 3)
HBv3: 15.26 (SE +/- 0.02, N = 3)
HBv4: 15.37 (SE +/- 0.14, N = 3)
HC: 27.38 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HBv2: 2.08 (SE +/- 0.01, N = 3)
HBv3: 1.68 (SE +/- 0.02, N = 3)
HBv4: 3.08 (SE +/- 0.02, N = 3)
HC: 1.82 (SE +/- 0.02, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)
HBv2: 1.04 (SE +/- 0.01, N = 3)
HBv3: 0.79 (SE +/- 0.01, N = 3)
HBv4: 1.29 (SE +/- 0.00, N = 3)
HC: 0.88 (SE +/- 0.00, N = 3)

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1 (Major Kernels Total Rate, More Is Better)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv4: 402.94 (SE +/- 0.78, N = 3)
HC: 247.49 (SE +/- 1.35, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Laghos

Test: Triple Point Problem

Laghos 3.1 (Major Kernels Total Rate, More Is Better)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv4: 228.15 (SE +/- 1.25, N = 3)
HC: 156.52 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
HBv2: 1061433333 (SE +/- 33333.33, N = 3)
HBv3: 917336667 (SE +/- 2475306.94, N = 3)
HBv4: 1113300000 (SE +/- 1950213.66, N = 3)
HC: 964423333 (SE +/- 3947135.39, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1 (Seconds, Fewer Is Better)
HBv2: 1782.93 (SE +/- 22.46, N = 3)
HBv3: 1889.46 (SE +/- 22.02, N = 3)
HBv4: 1681.26 (SE +/- 32.03, N = 9)
HC: 1950.63 (SE +/- 7.59, N = 3)

Liquid-DSP

Threads: 1 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 (samples/s, More Is Better)
HBv2: 33211667 (SE +/- 2185.81, N = 3)
HBv3: 32817333 (SE +/- 4096.07, N = 3)
HBv4: 35362667 (SE +/- 20201.76, N = 3)
HC: 31796333 (SE +/- 1333.33, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

PETSc

Test: Streams

PETSc 3.19 (MB/s, More Is Better)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
HC: 151286.25 (SE +/- 256.75, N = 3)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 3.1 (ms, Fewer Is Better)
HBv2: 6.838250 (SE +/- 0.032665, N = 3, MIN: 5.97)
HBv3: 0.624233 (SE +/- 0.039917, N = 15)
HBv4: 0.306141 (SE +/- 0.002422, N = 3)
HC: 2.079200 (SE +/- 0.093711, N = 12, MIN: 1.41)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01 (MIPS, More Is Better)
HBv2: 371044 (SE +/- 2438.40, N = 3)
HBv3: 397505 (SE +/- 19127.89, N = 3)
HBv4: 727995 (SE +/- 8360.33, N = 15)
HC: 148193 (SE +/- 256.58, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
HBv2: 157.13 (SE +/- 3.07, N = 12)
HBv3: 168.24 (SE +/- 0.23, N = 3)
HBv4: 208.34 (SE +/- 0.07, N = 3)
HC: 86.57 (SE +/- 8.14, N = 12)

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
HBv2: 5.899903 (SE +/- 0.272351, N = 15)
HBv3: 25.104876 (SE +/- 0.132089, N = 3)
HBv4: 53.175691 (SE +/- 0.359007, N = 3)
HC: 14.340830 (SE +/- 0.199669, N = 15)
1. (CC) gcc options: -O3 -march=native -fopenmp

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 211.42 (SE +/- 2.37, N = 15)
HBv3: 207.97 (SE +/- 7.34, N = 15)
HBv4: 467.72 (SE +/- 17.46, N = 12)
HC: 131.96 (SE +/- 0.90, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 92.13 (SE +/- 1.33, N = 3)
HBv3: 105.36 (SE +/- 1.07, N = 6)
HBv4: 247.73 (SE +/- 4.85, N = 15)
HC: 59.55 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 61.14 (SE +/- 1.30, N = 15)
HBv3: 56.87 (SE +/- 0.34, N = 3)
HBv4: 85.01 (SE +/- 4.77, N = 15)
HC: 58.91 (SE +/- 0.23, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 200.04 (SE +/- 3.34, N = 12)
HBv3: 221.86 (SE +/- 3.45, N = 15)
HBv4: 427.10 (SE +/- 10.91, N = 15)
HC: 122.77 (SE +/- 0.53, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 51.40 (SE +/- 1.33, N = 15)
HBv3: 50.61 (SE +/- 1.12, N = 15)
HBv4: 87.66 (SE +/- 3.68, N = 14)
HC: 41.73 (SE +/- 0.30, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 205.21 (SE +/- 2.79, N = 12)
HBv3: 214.06 (SE +/- 5.19, N = 15)
HBv4: 459.92 (SE +/- 14.34, N = 15)
HC: 134.76 (SE +/- 0.57, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 91.92 (SE +/- 1.31, N = 3)
HBv3: 103.25 (SE +/- 0.75, N = 15)
HBv4: 261.90 (SE +/- 5.66, N = 15)
HC: 57.31 (SE +/- 0.25, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 59.42 (SE +/- 1.72, N = 15)
HBv3: 59.38 (SE +/- 1.84, N = 15)
HBv4: 80.25 (SE +/- 3.67, N = 15)
HC: 59.14 (SE +/- 0.65, N = 5)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
HBv2: 203.77 (SE +/- 1.85, N = 3)
HBv3: 198.66 (SE +/- 5.11, N = 15)
HBv4: 442.83 (SE +/- 14.97, N = 12)
HC: 123.63 (SE +/- 0.52, N = 3)
1. (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 64

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HBv2: 411.7 (SE +/- 18.03, N = 13)
HBv3: 2435.6 (SE +/- 17.54, N = 12)
HBv4: 5719.0 (SE +/- 226.33, N = 12)
HC: 731.6 (SE +/- 5.15, N = 15)
Per-system flags, in export order: -fopenmp -march=core-avx2; -msse4.2; -fopenmp -march=core-avx2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm

M N K: 32

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HBv2: 195.1 (SE +/- 3.90, N = 12)
HBv3: 1506.3 (SE +/- 32.59, N = 14)
HBv4: 5006.8 (SE +/- 443.26, N = 12)
HC: 379.9 (SE +/- 2.82, N = 11)
Per-system flags, in export order: -fopenmp -march=core-avx2; -msse4.2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm

M N K: 256

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HBv2: 1444.2 (SE +/- 51.69, N = 9)
HBv3: 2032.1 (SE +/- 23.34, N = 3)
HBv4: 6983.2 (SE +/- 63.60, N = 3)
HC: 898.8 (SE +/- 13.41, N = 12)
Per-system flags, in export order: -msse4.2; -fopenmp -march=core-avx2; -msse4.2; -fopenmp -march=core-avx2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HBv2: 1519.5 (SE +/- 153.42, N = 6)
HBv3: 2284.6 (SE +/- 29.40, N = 3)
HBv4: 6585.6 (SE +/- 59.85, N = 3)
HC: 1328.4 (SE +/- 11.02, N = 3)
Per-system flags, in export order: -msse4.2; -fopenmp -march=core-avx2; -msse4.2; -fopenmp -march=core-avx2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

NAS Parallel Benchmarks

Test / Class: EP.D

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HBv2: 3222.82 (SE +/- 32.15, N = 6)
HBv3: 2879.08 (SE +/- 80.22, N = 12)
HBv4: 5985.75 (SE +/- 37.41, N = 3)
HC: 1642.03 (SE +/- 1.76, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: CG.C

NAS Parallel Benchmarks 3.4 (Total Mop/s, More Is Better)
HBv2: 22314.02 (SE +/- 108.02, N = 3)
HBv3: 21551.48 (SE +/- 20.87, N = 3)
HBv4: 40326.29 (SE +/- 77.41, N = 3)
HC: 14356.20 (SE +/- 233.39, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi


Phoronix Test Suite v10.8.5