Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2307054-PTS-AZUREHPC63
Test categories covered by this result file:

Timed Code Compilation 2 Tests
C/C++ Compiler Tests 2 Tests
CPU Massive 8 Tests
Creator Workloads 4 Tests
Fortran Tests 4 Tests
Game Development 2 Tests
HPC - High Performance Computing 6 Tests
Molecular Dynamics 2 Tests
MPI Benchmarks 3 Tests
Multi-Core 13 Tests
Intel oneAPI 3 Tests
OpenMPI Tests 6 Tests
Programmer / Developer System Benchmarks 3 Tests
Python Tests 2 Tests
Renderers 2 Tests
Scientific Computing 3 Tests
Server CPU Tests 6 Tests

Run Management

Result Identifier | Date Run | Test Duration
HC | July 04 2023 | 7 Hours, 40 Minutes
HBv2 | July 03 2023 | 10 Hours, 37 Minutes
HBv3 | July 02 2023 | 8 Hours, 22 Minutes
HBv4 | July 01 2023 | 11 Hours, 13 Minutes
Average test duration: 9 Hours, 28 Minutes
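
As a quick arithmetic check, the four test durations listed above can be averaged directly. The sketch below is illustrative only: the dictionary and helper names are made up for this example, and truncating to whole minutes is an assumption, since the page does not state its rounding rule. The result agrees with the final 9 Hours, 28 Minutes figure shown after the four runs.

```python
# Test durations copied from the run list above, as (hours, minutes).
durations = {
    "HC": (7, 40),
    "HBv2": (10, 37),
    "HBv3": (8, 22),
    "HBv4": (11, 13),
}


def to_minutes(hours, minutes):
    """Convert an (hours, minutes) pair to total minutes."""
    return hours * 60 + minutes


total = sum(to_minutes(h, m) for h, m in durations.values())

# Assumption: truncate to whole minutes (here the division is exact anyway).
average = total // len(durations)
avg_hours, avg_minutes = divmod(average, 60)

print(f"{avg_hours} Hours, {avg_minutes} Minutes")
```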


Microsoft Azure HBv4 HPC Comparison Benchmarks - System Details

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.7
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 8.5.0 20210514 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv2 (only the fields that differ from HC are listed)
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv3 (only the fields that differ from HBv2 are listed)
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv4 (only the fields that differ from HBv3 are listed)
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  OS: AlmaLinux 8.8

Kernel Details
- Transparent Huge Pages: always

Compiler Details
- --build=x86_64-redhat-linux --disable-libmpx --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-gcc-major-version-only --with-isl --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
- CPU Microcode: 0xffffffff

Python Details
- Python 3.6.8

Security Details
- HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
- HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Logarithmic Result Overview (chart comparing HC, HBv2, HBv3 and HBv4): libxsmm, ACES DGEMM, Pennant, Blender, 7-Zip Compression, NAS Parallel Benchmarks, HeFFTe - Highly Efficient FFT for Exascale, PETSc, NAMD, OSPRay, High Performance Conjugate Gradient, oneDNN, PostgreSQL, Liquid-DSP, Timed Node.js Compilation, Remhos, Intel Open Image Denoise, Laghos, Timed Linux Kernel Compilation

Microsoft Azure HBv4 HPC Comparison Benchmarks - Result Table

Test | HC | HBv2 | HBv3 | HBv4
pennant: sedovbig | 25.01956 | 5.915805 | 6.277107 | 3.581391
heffte: c2c - FFTW - float-long - 512 | 62.9027 | 96.4941 | 135.950 | 355.512
heffte: c2c - FFTW - float - 512 | 62.9750 | 95.8801 | 135.694 | 355.855
heffte: c2c - Stock - float - 512 | 57.7643 | 93.7923 | 123.242 | 323.356
heffte: c2c - Stock - float-long - 512 | 57.9203 | 93.2573 | 124.595 | 323.696
npb: MG.C | 19508.00 | 43410.71 | 46705.47 | 108125.86
blender: Classroom - CPU-Only | 138.81 | 50.86 | 51.08 | 25.26
heffte: r2c - FFTW - float-long - 512 | 113.940 | 191.141 | 257.419 | 624.951
heffte: r2c - FFTW - float - 512 | 114.025 | 191.775 | 254.252 | 622.580
blender: Barbershop - CPU-Only | 524.86 | 210.18 | 189.30 | 96.77
heffte: r2c - Stock - float - 512 | 110.049 | 190.949 | 232.166 | 596.226
heffte: r2c - Stock - float-long - 512 | 110.197 | 189.208 | 233.797 | 590.925
npb: SP.C | 12907.54 | 32495.89 | 31024.76 | 68819.34
blender: Pabellon Barcelona - CPU-Only | 176.21 | 64.14 | 62.64 | 33.40
npb: BT.C | 28794.28 | 66829.18 | 62427.86 | 151067.81
heffte: r2c - Stock - double - 512 | 59.8216 | 94.5301 | 117.731 | 311.803
blender: Fishy Cat - CPU-Only | 72.57 | 26.19 | 25.47 | 13.96
heffte: r2c - Stock - double-long - 512 | 59.8954 | 95.1989 | 118.236 | 311.267
heffte: r2c - FFTW - double-long - 512 | 60.8204 | 91.4296 | 120.957 | 315.982
heffte: r2c - FFTW - double - 512 | 60.8804 | 91.4802 | 121.283 | 314.336
blender: BMW27 - CPU-Only | 50.53 | 19.46 | 19.49 | 9.97
pennant: leblancbig | 10.64548 | 3.466885 | 3.649317 | 2.122074
npb: IS.D | 1181.48 | 1884.22 | 2793.55 | 5870.00
compress-7zip: Compression Rating | 210732 | 489456 | 558290 | 1032267
heffte: c2c - Stock - double - 512 | 31.5718 | 46.9794 | 56.2161 | 154.648
heffte: c2c - Stock - double-long - 512 | 31.5846 | 46.9289 | 56.2690 | 154.568
heffte: r2c - FFTW - double-long - 256 | 57.1290 | 88.6081 | 106.632 | 273.121
heffte: c2c - FFTW - double - 512 | 33.5193 | 47.6050 | 57.3307 | 159.175
heffte: c2c - FFTW - double-long - 512 | 33.5545 | 47.3696 | 57.2263 | 159.258
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 8.98723 | 8.12356 | 11.1845 | 37.0918
heffte: c2c - FFTW - float - 256 | 58.3567 | 91.5383 | 103.5147 | 256.349
ospray: gravity_spheres_volume/dim_512/ao/real_time | 9.49421 | 8.67327 | 11.7485 | 38.0764
heffte: r2c - Stock - double - 256 | 60.5727 | 93.3137 | 102.7046 | 264.954
heffte: c2c - FFTW - float-long - 256 | 58.5498 | 90.7883 | 105.093 | 255.968
heffte: r2c - Stock - double-long - 256 | 60.8872 | 92.3883 | 105.5003 | 258.716
heffte: c2c - FFTW - double - 256 | 30.1190 | 50.9032 | 39.8117 | 123.391
heffte: c2c - Stock - float - 256 | 59.7292 | 91.2601 | 103.409 | 244.342
ospray: particle_volume/ao/real_time | 8.97547 | 22.3336 | 24.4586 | 36.6121
heffte: c2c - Stock - double-long - 256 | 30.2672 | 50.0759 | 38.5694 | 123.408
ospray: particle_volume/scivis/real_time | 8.97020 | 22.1533 | 24.1736 | 36.5671
heffte: c2c - FFTW - double-long - 256 | 30.2175 | 51.1954 | 39.3709 | 122.981
liquid-dsp: 176 - 256 - 57 | 1664733333 | 4106700000 | 3563433333 | 6758166667
heffte: c2c - Stock - double - 256 | 30.1663 | 50.7070 | 38.4461 | 121.605
liquid-dsp: 176 - 256 - 32 | 1566133333 | 4027100000 | 3419533333 | 6122233333
liquid-dsp: 176 - 256 - 512 | 529213333 | 825653333 | 735370000 | 2058233333
namd: ATPase Simulation - 327,506 Atoms | 0.52650 | 0.26385 | 0.27115 | 0.14292
hpcg: 160 160 160 - 60 | 25.5635 | 36.0167 | 39.1106 | 87.9013
hpcg: 104 104 104 - 60 | 25.9971 | 37.0410 | 39.6093 | 89.3840
hpcg: 144 144 144 - 60 | 25.8659 | 36.0866 | 38.9739 | 88.5160
npb: FT.C | 20188.89 | 41977.69 | 36619.29 | 69051.63
liquid-dsp: 128 - 256 - 57 | 1572400000 | 4045933333 | 3516300000 | 5168233333
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 10.0490 | 13.9151 | 14.6067 | 32.7911
liquid-dsp: 128 - 256 - 32 | 1512600000 | 3925933333 | 3366733333 | 4426300000
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 1.24480 | 1.61002 | 1.40862 | 0.582806
pgbench: 1 - 800 - Read Only | 1161800 | 2439650 | 2407602 | 3123042
pgbench: 1 - 800 - Read Only - Average Latency | 0.688 | 0.328 | 0.332 | 0.256
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 707.322 | 1367.73 | 886.810 | 533.494
onednn: Recurrent Neural Network Training - f32 - CPU | 707.353 | 1345.14 | 860.975 | 535.853
pgbench: 1 - 500 - Read Only - Average Latency | 0.369 | 0.203 | 0.210 | 0.159
pgbench: 1 - 500 - Read Only | 1354877 | 2466249 | 2375005 | 3139846
onednn: Recurrent Neural Network Inference - f32 - CPU | 450.247 | 896.813 | 533.496 | 401.855
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 442.471 | 910.937 | 529.973 | 411.234
build-nodejs: Time To Compile | 330.613 | 194.367 | 185.567 | 150.558
onednn: Convolution Batch Shapes Auto - f32 - CPU | 3.11121 | 0.573878 | 0.556741 | 0.276472
liquid-dsp: 32 - 256 - 57 | 721290909 | 1193400000 | 1086000000 | 1390540000
onednn: IP Shapes 1D - f32 - CPU | 0.882446 | 1.40758 | 0.910091 | 0.752929
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 1.84 | 2.03 | 1.69 | 3.13
remhos: Sample Remap Example | 27.378 | 14.931 | 15.256 | 15.370
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 1.82 | 2.08 | 1.68 | 3.08
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 0.88 | 1.04 | 0.79 | 1.29
laghos: Sedov Blast Wave, ube_922_hex.mesh | 247.49 | 345.14 | 361.81 | 402.94
laghos: Triple Point Problem | 156.52 | 183.82 | 192.74 | 228.15
liquid-dsp: 32 - 256 - 32 | 964423333 | 1061433333 | 917336667 | 1113300000
build-linux-kernel: allmodconfig | 1950.626 | 1782.933 | 1889.463 | 1681.255
liquid-dsp: 1 - 256 - 32 | 31796333 | 33211667 | 32817333 | 35362667
petsc: Streams | 151286.2491 | 197895.4717 | 284001.9162 | 598417.6957
onednn: IP Shapes 3D - f32 - CPU | 2.07920 | 6.83825 | 0.624233 | 0.306141
compress-7zip: Decompression Rating | 148193 | 371044 | 397505 | 727995
ospray: particle_volume/pathtracer/real_time | 86.5734 | 157.133 | 168.242 | 208.338
mt-dgemm: Sustained Floating-Point Rate | 14.340830 | 5.899903 | 25.104876 | 53.175691
heffte: r2c - Stock - float-long - 256 | 131.962 | 211.418 | 207.974 | 467.718
heffte: c2c - Stock - float-long - 256 | 59.5527 | 92.1290 | 105.361 | 247.725
heffte: c2c - FFTW - double-long - 128 | 58.9125 | 61.1403 | 56.8693 | 85.0078
heffte: r2c - FFTW - float-long - 256 | 122.772 | 200.035 | 221.861 | 427.101
heffte: c2c - Stock - double - 128 | 41.7345 | 51.3955 | 50.6068 | 87.6623
heffte: r2c - Stock - float - 256 | 134.760 | 205.206 | 214.063 | 459.918
heffte: r2c - FFTW - double - 256 | 57.3101 | 91.9186 | 103.2457 | 261.903
heffte: c2c - FFTW - double - 128 | 59.1442 | 59.4244 | 59.3811 | 80.2514
heffte: r2c - FFTW - float - 256 | 123.632 | 203.772 | 198.660 | 442.829
libxsmm: 64 | 731.6 | 411.7 | 2435.6 | 5719.0
libxsmm: 32 | 379.9 | 195.1 | 1506.3 | 5006.8
libxsmm: 256 | 898.8 | 1444.2 | 2032.1 | 6983.2
libxsmm: 128 | 1328.4 | 1519.5 | 2284.6 | 6585.6
npb: EP.D | 1642.03 | 3222.82 | 2879.08 | 5985.75
npb: CG.C | 14356.20 | 22314.02 | 21551.48 | 40326.29

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Pennant 1.0.1
Test: sedovbig
Hydro Cycle Time - Seconds, Fewer Is Better
HC: 25.019560 (SE +/- 0.026763, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi
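
For a "Fewer Is Better" result such as this one, a relative speedup can be read off by dividing the baseline time by the time of the system being compared. The sketch below does this for the sedovbig cycle times, taking HC as the baseline. The baseline choice and the helper name are assumptions made for illustration, not something the result page defines.

```python
# Hydro cycle times in seconds from the sedovbig result above (fewer is better).
cycle_time = {
    "HC": 25.019560,
    "HBv3": 6.277107,
    "HBv2": 5.915805,
    "HBv4": 3.581391,
}


def speedup(baseline_seconds, candidate_seconds):
    """For a lower-is-better metric: baseline time divided by candidate time."""
    return baseline_seconds / candidate_seconds


baseline = cycle_time["HC"]  # assumption: HC is used as the reference system
for name, seconds in cycle_time.items():
    print(f"{name}: {speedup(baseline, seconds):.2f}x relative to HC")
```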

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 62.90 (SE +/- 0.04, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv4: 355.51 (SE +/- 1.18, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HC: 62.98 (SE +/- 0.04, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv4: 355.86 (SE +/- 1.24, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HC: 57.76 (SE +/- 0.02, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv4: 323.36 (SE +/- 0.80, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 57.92 (SE +/- 0.06, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv4: 323.70 (SE +/- 0.96, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s, More Is Better
HC: 19508.00 (SE +/- 24.47, N = 3)
HBv2: 43410.71 (SE +/- 354.81, N = 3)
HBv3: 46705.47 (SE +/- 613.84, N = 15)
HBv4: 108125.86 (SE +/- 748.94, N = 13)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
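
The "SE +/-" figures are easier to compare across systems when expressed as a percentage of each mean. The sketch below does this for the MG.C numbers above. The helper name is illustrative, and reading "SE" as the standard error of the mean over N runs is an assumption about how the figures were produced.

```python
# (mean Total Mop/s, reported SE, run count N) from the MG.C result above.
mg_c = {
    "HC": (19508.00, 24.47, 3),
    "HBv2": (43410.71, 354.81, 3),
    "HBv3": (46705.47, 613.84, 15),
    "HBv4": (108125.86, 748.94, 13),
}


def relative_se_percent(mean, se):
    """Express the reported SE as a percentage of the mean."""
    return 100.0 * se / mean


relative = {
    name: relative_se_percent(mean, se) for name, (mean, se, _runs) in mg_c.items()
}

for name, pct in relative.items():
    runs = mg_c[name][2]
    print(f"{name}: SE is {pct:.2f}% of the mean over {runs} runs")
```

On these numbers every system stays below 2% relative SE, with HBv3 showing the largest spread.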

Blender

Blender 3.6
Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
HC: 138.81 (SE +/- 0.49, N = 3)
HBv3: 51.08 (SE +/- 0.04, N = 3)
HBv2: 50.86 (SE +/- 0.10, N = 3)
HBv4: 25.26 (SE +/- 0.11, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 113.94 (SE +/- 0.18, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv4: 624.95 (SE +/- 4.23, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HC: 114.03 (SE +/- 0.09, N = 3)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv4: 622.58 (SE +/- 2.25, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blender 3.6
Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
HC: 524.86 (SE +/- 2.13, N = 3)
HBv2: 210.18 (SE +/- 0.01, N = 3)
HBv3: 189.30 (SE +/- 0.45, N = 3)
HBv4: 96.77 (SE +/- 0.12, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HC: 110.05 (SE +/- 0.06, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv4: 596.23 (SE +/- 2.14, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 110.20 (SE +/- 0.10, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv4: 590.93 (SE +/- 2.49, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s, More Is Better
HC: 12907.54 (SE +/- 12.00, N = 3)
HBv3: 31024.76 (SE +/- 273.09, N = 8)
HBv2: 32495.89 (SE +/- 34.59, N = 3)
HBv4: 68819.34 (SE +/- 954.46, N = 12)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Blender

Blender 3.6
Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
HC: 176.21 (SE +/- 1.13, N = 3)
HBv2: 64.14 (SE +/- 0.10, N = 3)
HBv3: 62.64 (SE +/- 0.24, N = 3)
HBv4: 33.40 (SE +/- 0.06, N = 3)

NAS Parallel Benchmarks

NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s, More Is Better
HC: 28794.28 (SE +/- 15.19, N = 3)
HBv3: 62427.86 (SE +/- 36.56, N = 3)
HBv2: 66829.18 (SE +/- 32.07, N = 3)
HBv4: 151067.81 (SE +/- 760.56, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HC: 59.82 (SE +/- 0.05, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv4: 311.80 (SE +/- 1.60, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blender 3.6
Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
HC: 72.57 (SE +/- 0.48, N = 3)
HBv2: 26.19 (SE +/- 0.10, N = 3)
HBv3: 25.47 (SE +/- 0.08, N = 3)
HBv4: 13.96 (SE +/- 0.14, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 59.90 (SE +/- 0.03, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv4: 311.27 (SE +/- 0.81, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 60.82 (SE +/- 0.06, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv4: 315.98 (SE +/- 1.65, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HC: 60.88 (SE +/- 0.05, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv4: 314.34 (SE +/- 0.50, N = 3)
1. (CXX) g++ options: -O3 -pthread

Blender

Blender 3.6
Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
HC: 50.53 (SE +/- 0.65, N = 15)
HBv3: 19.49 (SE +/- 0.02, N = 3)
HBv2: 19.46 (SE +/- 0.11, N = 3)
HBv4: 9.97 (SE +/- 0.06, N = 3)

Pennant

Pennant 1.0.1
Test: leblancbig
Hydro Cycle Time - Seconds, Fewer Is Better
HC: 10.645480 (SE +/- 0.017495, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

NAS Parallel Benchmarks

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s, More Is Better
HC: 1181.48 (SE +/- 2.10, N = 3)
HBv2: 1884.22 (SE +/- 11.15, N = 3)
HBv3: 2793.55 (SE +/- 22.55, N = 3)
HBv4: 5870.00 (SE +/- 17.88, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression 22.01
Test: Compression Rating
MIPS, More Is Better
HC: 210732 (SE +/- 748.55, N = 3)
HBv2: 489456 (SE +/- 2650.49, N = 3)
HBv3: 558290 (SE +/- 6724.92, N = 3)
HBv4: 1032267 (SE +/- 7680.08, N = 15)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HC: 31.57 (SE +/- 0.02, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv4: 154.65 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 31.58 (SE +/- 0.02, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv4: 154.57 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HC: 57.13 (SE +/- 0.12, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv4: 273.12 (SE +/- 4.03, N = 14)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HC: 33.52 (SE +/- 0.03, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv4: 159.18 (SE +/- 0.34, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HC: 33.55 (SE +/- 0.02, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv4: 159.26 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
HBv2: 8.12356 (SE +/- 0.12026, N = 15)
HC: 8.98723 (SE +/- 0.03491, N = 3)
HBv3: 11.18450 (SE +/- 0.01165, N = 3)
HBv4: 37.09180 (SE +/- 0.11164, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
HC: 58.36 (SE +/- 0.07, N = 3)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv4: 256.35 (SE +/- 1.07, N = 3)
1. (CXX) g++ options: -O3 -pthread

OSPRay

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
HBv2: 8.67327 (SE +/- 0.13915, N = 12)
HC: 9.49421 (SE +/- 0.02906, N = 3)
HBv3: 11.74850 (SE +/- 0.03837, N = 3)
HBv4: 38.07640 (SE +/- 0.03610, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HC: 60.57 (SE +/- 0.08, N = 3)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv4: 264.95 (SE +/- 4.27, N = 12)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
HC: 58.55 (SE +/- 0.16, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv4: 255.97 (SE +/- 3.64, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HC: 60.89 (SE +/- 0.19, N = 3)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv4: 258.72 (SE +/- 2.84, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HC: 30.12 (SE +/- 0.08, N = 3)
HBv3: 39.81 (SE +/- 0.14, N = 3)
HBv2: 50.90 (SE +/- 0.55, N = 3)
HBv4: 123.39 (SE +/- 1.65, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
HC: 59.73 (SE +/- 0.02, N = 3)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv4: 244.34 (SE +/- 3.04, N = 4)
1. (CXX) g++ options: -O3 -pthread

OSPRay

OSPRay 2.12
Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
HC: 8.97547 (SE +/- 0.01225, N = 3)
HBv2: 22.33360 (SE +/- 0.00495, N = 3)
HBv3: 24.45860 (SE +/- 0.01755, N = 3)
HBv4: 36.61210 (SE +/- 0.04053, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HC: 30.27 (SE +/- 0.03, N = 3)
HBv3: 38.57 (SE +/- 0.14, N = 3)
HBv2: 50.08 (SE +/- 0.55, N = 3)
HBv4: 123.41 (SE +/- 1.16, N = 3)
1. (CXX) g++ options: -O3 -pthread

OSPRay

OSPRay 2.12
Benchmark: particle_volume/scivis/real_time
Items Per Second, More Is Better
HC: 8.97020 (SE +/- 0.00763, N = 3)
HBv2: 22.15330 (SE +/- 0.01671, N = 3)
HBv3: 24.17360 (SE +/- 0.01956, N = 3)
HBv4: 36.56710 (SE +/- 0.03598, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HC: 30.22 (SE +/- 0.05, N = 3)
HBv3: 39.37 (SE +/- 0.33, N = 3)
HBv2: 51.20 (SE +/- 0.57, N = 3)
HBv4: 122.98 (SE +/- 1.21, N = 15)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
HC: 1664733333 (SE +/- 5446813.54, N = 3)
HBv3: 3563433333 (SE +/- 4247482.91, N = 3)
HBv2: 4106700000 (SE +/- 13588352.86, N = 3)
HBv4: 6758166667 (SE +/- 11394345.58, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HC: 30.17 (SE +/- 0.08, N = 3)
HBv3: 38.45 (SE +/- 0.29, N = 11)
HBv2: 50.71 (SE +/- 0.29, N = 3)
HBv4: 121.61 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -O3 -pthread

Liquid-DSP

OpenBenchmarking.orgsamples/s, More Is BetterLiquid-DSP 1.6Threads: 176 - Buffer Length: 256 - Filter Length: 32HCHBv3HBv2HBv41300M2600M3900M5200M6500MSE +/- 2852094.75, N = 3SE +/- 8912600.32, N = 3SE +/- 44818002.34, N = 3SE +/- 9214903.39, N = 315661333333419533333402710000061222333331. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6
Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  HC: 529213333 (SE +/- 6341443.93, N = 3)
  HBv3: 735370000 (SE +/- 3040334.41, N = 3)
  HBv2: 825653333 (SE +/- 3174614.59, N = 3)
  HBv4: 2058233333 (SE +/- 4603018.33, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better
  HC: 0.52650 (SE +/- 0.00096, N = 3)
  HBv3: 0.27115 (SE +/- 0.00027, N = 3)
  HBv2: 0.26385 (SE +/- 0.00045, N = 3)
  HBv4: 0.14292 (SE +/- 0.00035, N = 3)
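
NAMD reports days/ns here, so lower is better; taking the reciprocal gives the more familiar ns/day. The following is a minimal Python sketch of that conversion using the four means from the result above. The helper name `to_ns_per_day` is illustrative only and is not part of NAMD or the Phoronix Test Suite.

```python
# Mean days/ns values copied from the NAMD 2.14 ATPase result above.
days_per_ns = {"HC": 0.52650, "HBv3": 0.27115, "HBv2": 0.26385, "HBv4": 0.14292}


def to_ns_per_day(value_days_per_ns: float) -> float:
    """Invert days/ns into ns/day (hypothetical helper for illustration)."""
    return 1.0 / value_days_per_ns


ns_per_day = {name: to_ns_per_day(v) for name, v in days_per_ns.items()}

for name, value in ns_per_day.items():
    print(f"{name}: {value:.2f} ns/day")
```

Because the metric is inverted, the ranking flips: the smallest days/ns figure becomes the largest ns/day figure.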

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a new scientific benchmark from Sandia National Labs aimed at supercomputer testing with modern real-world workloads, in contrast to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better
  HC: 25.56 (SE +/- 0.06, N = 3)
  HBv2: 36.02 (SE +/- 0.01, N = 3)
  HBv3: 39.11 (SE +/- 0.02, N = 3)
  HBv4: 87.90 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better
  HC: 26.00 (SE +/- 0.02, N = 3)
  HBv2: 37.04 (SE +/- 0.03, N = 3)
  HBv3: 39.61 (SE +/- 0.03, N = 3)
  HBv4: 89.38 (SE +/- 0.26, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better
  HC: 25.87 (SE +/- 0.05, N = 3)
  HBv2: 36.09 (SE +/- 0.02, N = 3)
  HBv3: 38.97 (SE +/- 0.02, N = 3)
  HBv4: 88.52 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
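
To put the three HPCG results side by side, a generation-over-generation speedup can be computed as a simple ratio of the more-is-better scores. This is a rough sketch using the means listed above; the `speedup` helper is a hypothetical name introduced only for this example.

```python
# Mean GFLOP/s from the three HPCG 3.1 results above, keyed by problem size.
hpcg = {
    "160 160 160": {"HC": 25.56, "HBv2": 36.02, "HBv3": 39.11, "HBv4": 87.90},
    "104 104 104": {"HC": 26.00, "HBv2": 37.04, "HBv3": 39.61, "HBv4": 89.38},
    "144 144 144": {"HC": 25.87, "HBv2": 36.09, "HBv3": 38.97, "HBv4": 88.52},
}


def speedup(results: dict, system: str, baseline: str) -> float:
    """Ratio of two more-is-better scores (hypothetical helper)."""
    return results[system] / results[baseline]


hbv4_vs_hbv3 = {size: speedup(r, "HBv4", "HBv3") for size, r in hpcg.items()}

for size, ratio in hbv4_vs_hbv3.items():
    print(f"X Y Z {size}: HBv4 is {ratio:.2f}x HBv3")
```

For fewer-is-better metrics elsewhere in this result file, the ratio would need to be inverted (baseline divided by system).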

NAS Parallel Benchmarks

NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s, More Is Better
  HC: 20188.89 (SE +/- 13.57, N = 3)
  HBv3: 36619.29 (SE +/- 194.34, N = 3)
  HBv2: 41977.69 (SE +/- 219.43, N = 3)
  HBv4: 69051.63 (SE +/- 745.61, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Liquid-DSP

Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC: 1572400000 (SE +/- 8373967.60, N = 3)
  HBv3: 3516300000 (SE +/- 6947661.48, N = 3)
  HBv2: 4045933333 (SE +/- 4421286.89, N = 3)
  HBv4: 5168233333 (SE +/- 10401335.38, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
  HC: 10.05 (SE +/- 0.02, N = 3)
  HBv2: 13.92 (SE +/- 0.03, N = 3)
  HBv3: 14.61 (SE +/- 0.03, N = 3)
  HBv4: 32.79 (SE +/- 0.04, N = 3)

Liquid-DSP

Liquid-DSP 1.6
Threads: 128 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 1512600000 (SE +/- 8213606.60, N = 3)
  HBv3: 3366733333 (SE +/- 5345506.94, N = 3)
  HBv2: 3925933333 (SE +/- 3602930.91, N = 3)
  HBv4: 4426300000 (SE +/- 3774034.09, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HBv2: 1.610020 (SE +/- 0.021847, N = 3, MIN: 1.49)
  HBv3: 1.408620 (SE +/- 0.003506, N = 3, MIN: 1.36)
  HC: 1.244800 (SE +/- 0.002723, N = 3, MIN: 1.22)
  HBv4: 0.582806 (SE +/- 0.001551, N = 3, MIN: 0.56)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL

This is a benchmark of PostgreSQL that uses the integrated pgbench tool to drive the database workload. Learn more via the OpenBenchmarking.org test page.

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS, More Is Better
  HC: 1161800 (SE +/- 4936.18, N = 3)
  HBv3: 2407602 (SE +/- 11149.78, N = 3)
  HBv2: 2439650 (SE +/- 4115.38, N = 3)
  HBv4: 3123042 (SE +/- 20304.79, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC: 0.688 (SE +/- 0.003, N = 3)
  HBv3: 0.332 (SE +/- 0.001, N = 3)
  HBv2: 0.328 (SE +/- 0.001, N = 3)
  HBv4: 0.256 (SE +/- 0.001, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
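
The TPS and average-latency figures for the 800-client run appear to be two views of the same measurement: with a fixed number of clients each running one transaction at a time, throughput is roughly clients divided by latency. The sketch below checks that relationship against the means listed above. It assumes this simple closed-loop model, and `estimated_tps` is a hypothetical helper, not a pgbench function.

```python
# Means copied from the two 800-client read-only PostgreSQL results above.
clients = 800
latency_ms = {"HC": 0.688, "HBv3": 0.332, "HBv2": 0.328, "HBv4": 0.256}
reported_tps = {"HC": 1161800, "HBv3": 2407602, "HBv2": 2439650, "HBv4": 3123042}


def estimated_tps(client_count: int, avg_latency_ms: float) -> float:
    """Closed-loop estimate: each client completes 1000 / latency_ms tx per second."""
    return client_count / (avg_latency_ms / 1000.0)


relative_error = {
    name: (estimated_tps(clients, latency_ms[name]) - reported_tps[name])
    / reported_tps[name]
    for name in reported_tps
}

for name, err in relative_error.items():
    print(f"{name}: estimate differs from reported TPS by {err * 100:+.3f}%")
```

The small residual differences are consistent with the latency values being rounded to three decimal places.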

oneDNN

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HBv2: 1367.73 (SE +/- 13.52, N = 15, MIN: 1212.94)
  HBv3: 886.81 (SE +/- 6.66, N = 3, MIN: 849.06)
  HC: 707.32 (SE +/- 1.51, N = 3, MIN: 687.14)
  HBv4: 533.49 (SE +/- 1.90, N = 3, MIN: 518.68)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HBv2: 1345.14 (SE +/- 13.31, N = 3, MIN: 1237.17)
  HBv3: 860.98 (SE +/- 3.89, N = 3, MIN: 814.31)
  HC: 707.35 (SE +/- 1.60, N = 3, MIN: 689.52)
  HBv4: 535.85 (SE +/- 3.26, N = 3, MIN: 521.12)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

PostgreSQL

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC: 0.369 (SE +/- 0.001, N = 3)
  HBv3: 0.210 (SE +/- 0.000, N = 3)
  HBv2: 0.203 (SE +/- 0.001, N = 3)
  HBv4: 0.159 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15
Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS, More Is Better
  HC: 1354877 (SE +/- 3475.53, N = 3)
  HBv3: 2375005 (SE +/- 4803.91, N = 3)
  HBv2: 2466249 (SE +/- 8486.11, N = 3)
  HBv4: 3139846 (SE +/- 4762.10, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

oneDNN

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HBv2: 896.81 (SE +/- 9.52, N = 15, MIN: 799.26)
  HBv3: 533.50 (SE +/- 4.61, N = 15, MIN: 469.44)
  HC: 450.25 (SE +/- 4.72, N = 3, MIN: 432.99)
  HBv4: 401.86 (SE +/- 1.40, N = 3, MIN: 388.53)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HBv2: 910.94 (SE +/- 9.54, N = 15, MIN: 799.88)
  HBv3: 529.97 (SE +/- 4.36, N = 3, MIN: 469.93)
  HC: 442.47 (SE +/- 1.89, N = 3, MIN: 429.93)
  HBv4: 411.23 (SE +/- 3.60, N = 8, MIN: 384.83)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation 19.8.1
Time To Compile
Seconds, Fewer Is Better
  HC: 330.61 (SE +/- 2.37, N = 3)
  HBv2: 194.37 (SE +/- 1.32, N = 3)
  HBv3: 185.57 (SE +/- 1.46, N = 3)
  HBv4: 150.56 (SE +/- 2.23, N = 12)

oneDNN

oneDNN 3.1
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HC: 3.111210 (SE +/- 0.015370, N = 3)
  HBv2: 0.573878 (SE +/- 0.002431, N = 3)
  HBv3: 0.556741 (SE +/- 0.001799, N = 3)
  HBv4: 0.276472 (SE +/- 0.000440, N = 3)
MIN values as extracted (only three present; attribution to systems not recoverable): 1.73, 0.47, 0.5
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Liquid-DSP

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC: 721290909 (SE +/- 5360840.75, N = 11)
  HBv3: 1086000000 (SE +/- 550757.05, N = 3)
  HBv2: 1193400000 (SE +/- 472581.56, N = 3)
  HBv4: 1390540000 (SE +/- 14294460.47, N = 5)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

oneDNN

oneDNN 3.1
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HBv2: 1.407580 (SE +/- 0.014464, N = 3, MIN: 1.11)
  HBv3: 0.910091 (SE +/- 0.013826, N = 12, MIN: 0.76)
  HC: 0.882446 (SE +/- 0.000702, N = 3, MIN: 0.83)
  HBv4: 0.752929 (SE +/- 0.001421, N = 3, MIN: 0.69)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 2.0
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
  HBv3: 1.69 (SE +/- 0.01, N = 15)
  HC: 1.84 (SE +/- 0.01, N = 3)
  HBv2: 2.03 (SE +/- 0.02, N = 9)
  HBv4: 3.13 (SE +/- 0.01, N = 3)

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Remhos 1.0
Test: Sample Remap Example
Seconds, Fewer Is Better
  HC: 27.38 (SE +/- 0.06, N = 3)
  HBv4: 15.37 (SE +/- 0.14, N = 3)
  HBv3: 15.26 (SE +/- 0.02, N = 3)
  HBv2: 14.93 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Intel Open Image Denoise

Intel Open Image Denoise 2.0
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
  HBv3: 1.68 (SE +/- 0.02, N = 3)
  HC: 1.82 (SE +/- 0.02, N = 3)
  HBv2: 2.08 (SE +/- 0.01, N = 3)
  HBv4: 3.08 (SE +/- 0.02, N = 3)

Intel Open Image Denoise 2.0
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, More Is Better
  HBv3: 0.79 (SE +/- 0.01, N = 3)
  HC: 0.88 (SE +/- 0.00, N = 3)
  HBv2: 1.04 (SE +/- 0.01, N = 3)
  HBv4: 1.29 (SE +/- 0.00, N = 3)

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better
  HC: 247.49 (SE +/- 1.35, N = 3)
  HBv2: 345.14 (SE +/- 3.57, N = 5)
  HBv3: 361.81 (SE +/- 0.15, N = 3)
  HBv4: 402.94 (SE +/- 0.78, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate, More Is Better
  HC: 156.52 (SE +/- 0.08, N = 3)
  HBv2: 183.82 (SE +/- 0.57, N = 3)
  HBv3: 192.74 (SE +/- 0.38, N = 3)
  HBv4: 228.15 (SE +/- 1.25, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Liquid-DSP

Liquid-DSP 1.6
Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HBv3: 917336667 (SE +/- 2475306.94, N = 3)
  HC: 964423333 (SE +/- 3947135.39, N = 3)
  HBv2: 1061433333 (SE +/- 33333.33, N = 3)
  HBv4: 1113300000 (SE +/- 1950213.66, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation 6.1
Build: allmodconfig
Seconds, Fewer Is Better
  HC: 1950.63 (SE +/- 7.59, N = 3)
  HBv3: 1889.46 (SE +/- 22.02, N = 3)
  HBv2: 1782.93 (SE +/- 22.46, N = 3)
  HBv4: 1681.26 (SE +/- 32.03, N = 9)

Liquid-DSP

Liquid-DSP 1.6
Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 31796333 (SE +/- 1333.33, N = 3)
  HBv3: 32817333 (SE +/- 4096.07, N = 3)
  HBv2: 33211667 (SE +/- 2185.81, N = 3)
  HBv4: 35362667 (SE +/- 20201.76, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

PETSc 3.19
Test: Streams
MB/s, More Is Better
  HC: 151286.25 (SE +/- 256.75, N = 3)
  HBv2: 197895.47 (SE +/- 12025.83, N = 6)
  HBv3: 284001.92 (SE +/- 2674.31, N = 7)
  HBv4: 598417.70 (SE +/- 46271.80, N = 9)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
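
The PETSc Streams result has noticeably larger SE figures and higher run counts than most other tests here, so it can help to express each SE as a percentage of its mean. A minimal sketch follows, assuming "SE" denotes the standard error of the mean over N runs; the 5% threshold is an arbitrary illustrative cut-off, not a Phoronix Test Suite setting.

```python
# (mean MB/s, standard error, run count) copied from the PETSc Streams result above.
petsc_streams = {
    "HC": (151286.25, 256.75, 3),
    "HBv2": (197895.47, 12025.83, 6),
    "HBv3": (284001.92, 2674.31, 7),
    "HBv4": (598417.70, 46271.80, 9),
}

NOISY_THRESHOLD_PCT = 5.0  # illustrative cut-off, chosen for this example only

relative_se_pct = {
    name: se / mean * 100.0 for name, (mean, se, _runs) in petsc_streams.items()
}
noisy = sorted(n for n, pct in relative_se_pct.items() if pct > NOISY_THRESHOLD_PCT)

for name, pct in relative_se_pct.items():
    print(f"{name}: SE is {pct:.2f}% of the mean")
print("Above threshold:", ", ".join(noisy))
```

By this measure the HBv2 and HBv4 means carry several percent of uncertainty, while the HC and HBv3 means are within about one percent.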

oneDNN

oneDNN 3.1
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HBv2: 6.838250 (SE +/- 0.032665, N = 3)
  HC: 2.079200 (SE +/- 0.093711, N = 12)
  HBv3: 0.624233 (SE +/- 0.039917, N = 15)
  HBv4: 0.306141 (SE +/- 0.002422, N = 3)
MIN values as extracted (only two present; attribution to systems not recoverable): 5.97, 1.41
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression 22.01
Test: Decompression Rating
MIPS, More Is Better
  HC: 148193 (SE +/- 256.58, N = 3)
  HBv2: 371044 (SE +/- 2438.40, N = 3)
  HBv3: 397505 (SE +/- 19127.89, N = 3)
  HBv4: 727995 (SE +/- 8360.33, N = 15)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

OSPRay

OSPRay 2.12
Benchmark: particle_volume/pathtracer/real_time
Items Per Second, More Is Better
  HC: 86.57 (SE +/- 8.14, N = 12)
  HBv2: 157.13 (SE +/- 3.07, N = 12)
  HBv3: 168.24 (SE +/- 0.23, N = 3)
  HBv4: 208.34 (SE +/- 0.07, N = 3)

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s, More Is Better
  HBv2: 5.899903 (SE +/- 0.272351, N = 15)
  HC: 14.340830 (SE +/- 0.199669, N = 15)
  HBv3: 25.104876 (SE +/- 0.132089, N = 3)
  HBv4: 53.175691 (SE +/- 0.359007, N = 3)
1. (CC) gcc options: -O3 -march=native -fopenmp
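
For readers unfamiliar with how a DGEMM rate is derived: DGEMM computes C = alpha*A*B + beta*C, and for square N x N matrices the work is commonly counted as about 2*N^3 floating-point operations, so GFLOP/s is that count divided by elapsed time. The pure-Python sketch below only illustrates the arithmetic. It is not the ACES DGEMM code (which, per the compiler options above, is built with gcc and OpenMP), and its measured rate will be orders of magnitude lower.

```python
import time


def dgemm(alpha, a, b, beta, c):
    """Naive C = alpha*A*B + beta*C on square lists of lists (illustration only)."""
    n = len(a)
    # Transpose B once so the inner loop walks rows of both operands.
    bt = [list(col) for col in zip(*b)]
    return [
        [
            alpha * sum(x * y for x, y in zip(a[i], bt[j])) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def gflops(n: int, seconds: float) -> float:
    """Approximate rate using the conventional 2*N^3 operation count."""
    return 2.0 * n**3 / seconds / 1e9


n = 64  # deliberately tiny so the pure-Python loop finishes quickly
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]
c = [[0.5] * n for _ in range(n)]

start = time.perf_counter()
result = dgemm(1.0, a, b, 1.0, c)
elapsed = time.perf_counter() - start

print(f"N={n}: {elapsed:.4f} s, about {gflops(n, elapsed):.4f} GFLOP/s")
```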

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 131.96 (SE +/- 0.90, N = 3)
  HBv3: 207.97 (SE +/- 7.34, N = 15)
  HBv2: 211.42 (SE +/- 2.37, N = 15)
  HBv4: 467.72 (SE +/- 17.46, N = 12)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 59.55 (SE +/- 0.27, N = 3)
  HBv2: 92.13 (SE +/- 1.33, N = 3)
  HBv3: 105.36 (SE +/- 1.07, N = 6)
  HBv4: 247.73 (SE +/- 4.85, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128
GFLOP/s, More Is Better
  HBv3: 56.87 (SE +/- 0.34, N = 3)
  HC: 58.91 (SE +/- 0.23, N = 3)
  HBv2: 61.14 (SE +/- 1.30, N = 15)
  HBv4: 85.01 (SE +/- 4.77, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 122.77 (SE +/- 0.53, N = 3)
  HBv2: 200.04 (SE +/- 3.34, N = 12)
  HBv3: 221.86 (SE +/- 3.45, N = 15)
  HBv4: 427.10 (SE +/- 10.91, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 128
GFLOP/s, More Is Better
  HC: 41.73 (SE +/- 0.30, N = 3)
  HBv3: 50.61 (SE +/- 1.12, N = 15)
  HBv2: 51.40 (SE +/- 1.33, N = 15)
  HBv4: 87.66 (SE +/- 3.68, N = 14)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
  HC: 134.76 (SE +/- 0.57, N = 3)
  HBv2: 205.21 (SE +/- 2.79, N = 12)
  HBv3: 214.06 (SE +/- 5.19, N = 15)
  HBv4: 459.92 (SE +/- 14.34, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
  HC: 57.31 (SE +/- 0.25, N = 3)
  HBv2: 91.92 (SE +/- 1.31, N = 3)
  HBv3: 103.25 (SE +/- 0.75, N = 15)
  HBv4: 261.90 (SE +/- 5.66, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128
GFLOP/s, More Is Better
  HC: 59.14 (SE +/- 0.65, N = 5)
  HBv3: 59.38 (SE +/- 1.84, N = 15)
  HBv2: 59.42 (SE +/- 1.72, N = 15)
  HBv4: 80.25 (SE +/- 3.67, N = 15)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
  HC: 123.63 (SE +/- 0.52, N = 3)
  HBv3: 198.66 (SE +/- 5.11, N = 15)
  HBv2: 203.77 (SE +/- 1.85, N = 3)
  HBv4: 442.83 (SE +/- 14.97, N = 12)
1. (CXX) g++ options: -O3 -pthread
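
The nine HeFFTe results in this group vary widely (HBv4 leads HC by well under 2x at X Y Z: 128 and by more than 4x in some 256 cases), so a geometric mean of the per-test ratios is a reasonable single summary. This is a sketch over the HC and HBv4 means listed above only; it is not the page's own "Overall Geometric Mean" calculation, which may use different inputs.

```python
from statistics import geometric_mean

# (HC, HBv4) mean GFLOP/s pairs copied from the nine HeFFTe 2.3 results above.
heffte_pairs = {
    "r2c Stock float-long 256": (131.96, 467.72),
    "c2c Stock float-long 256": (59.55, 247.73),
    "c2c FFTW double-long 128": (58.91, 85.01),
    "r2c FFTW float-long 256": (122.77, 427.10),
    "c2c Stock double 128": (41.73, 87.66),
    "r2c Stock float 256": (134.76, 459.92),
    "r2c FFTW double 256": (57.31, 261.90),
    "c2c FFTW double 128": (59.14, 80.25),
    "r2c FFTW float 256": (123.63, 442.83),
}

# Per-test speedup of HBv4 over HC (more is better, so HBv4 / HC).
ratios = {test: hbv4 / hc for test, (hc, hbv4) in heffte_pairs.items()}
overall = geometric_mean(list(ratios.values()))

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")
print(f"Geometric mean of HBv4 / HC: {overall:.2f}x")
```

The geometric mean is used rather than the arithmetic mean because it treats a 2x gain and a 0.5x loss symmetrically, which suits ratios.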

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s, More Is Better
  HBv2: 411.7 (SE +/- 18.03, N = 13)
  HC: 731.6 (SE +/- 5.15, N = 15)
  HBv3: 2435.6 (SE +/- 17.54, N = 12)
  HBv4: 5719.0 (SE +/- 226.33, N = 12)
Additional per-system compiler flags as extracted (attribution to systems not recoverable): -fopenmp -march=core-avx2; -fopenmp -march=core-avx2; -msse4.2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s, More Is Better
  HBv2: 195.1 (SE +/- 3.90, N = 12)
  HC: 379.9 (SE +/- 2.82, N = 11)
  HBv3: 1506.3 (SE +/- 32.59, N = 14)
  HBv4: 5006.8 (SE +/- 443.26, N = 12)
Additional per-system compiler flags as extracted (attribution to systems not recoverable): -fopenmp -march=core-avx2; -msse4.2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s, More Is Better
  HC: 898.8 (SE +/- 13.41, N = 12)
  HBv2: 1444.2 (SE +/- 51.69, N = 9)
  HBv3: 2032.1 (SE +/- 23.34, N = 3)
  HBv4: 6983.2 (SE +/- 63.60, N = 3)
Additional per-system compiler flags as extracted (attribution to systems not recoverable): -fopenmp -march=core-avx2; -msse4.2; -fopenmp -march=core-avx2; -msse4.2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s, More Is Better
  HC: 1328.4 (SE +/- 11.02, N = 3)
  HBv2: 1519.5 (SE +/- 153.42, N = 6)
  HBv3: 2284.6 (SE +/- 29.40, N = 3)
  HBv4: 6585.6 (SE +/- 59.85, N = 3)
Additional per-system compiler flags as extracted (attribution to systems not recoverable): -fopenmp -march=core-avx2; -msse4.2; -fopenmp -march=core-avx2; -msse4.2
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden

NAS Parallel Benchmarks

NAS Parallel Benchmarks 3.4
Test / Class: EP.D
Total Mop/s, More Is Better
  HC: 1642.03 (SE +/- 1.76, N = 3)
  HBv3: 2879.08 (SE +/- 80.22, N = 12)
  HBv2: 3222.82 (SE +/- 32.15, N = 6)
  HBv4: 5985.75 (SE +/- 37.41, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s, More Is Better
  HC: 14356.20 (SE +/- 233.39, N = 15)
  HBv3: 21551.48 (SE +/- 20.87, N = 3)
  HBv2: 22314.02 (SE +/- 108.02, N = 3)
  HBv4: 40326.29 (SE +/- 77.41, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

95 Results Shown
