Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2307260-PTS-AZUREHPC62
Run Management

Result Identifier   Date Run        Test Duration
HC                  July 04 2023    6 Hours, 58 Minutes
HBv2                July 03 2023    7 Hours, 49 Minutes
HBv3                July 02 2023    6 Hours, 48 Minutes
HBv4                July 01 2023    9 Hours, 44 Minutes
Average                             7 Hours, 50 Minutes
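
The final duration in the run list (7 Hours, 50 Minutes) is consistent with the arithmetic mean of the four run durations. A minimal sketch of that check follows; the helper names `mean_duration` and `format_hm` are illustrative and not part of the Phoronix Test Suite.

```python
from datetime import timedelta

# Run durations as listed in the run table.
durations = {
    "HC": timedelta(hours=6, minutes=58),
    "HBv2": timedelta(hours=7, minutes=49),
    "HBv3": timedelta(hours=6, minutes=48),
    "HBv4": timedelta(hours=9, minutes=44),
}


def mean_duration(values):
    """Arithmetic mean of a collection of timedeltas."""
    values = list(values)
    return sum(values, timedelta()) / len(values)


def format_hm(delta):
    """Format a timedelta as 'H Hours, M Minutes', rounded to the nearest minute."""
    total_minutes = round(delta.total_seconds() / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} Hours, {minutes} Minutes"


average = mean_duration(durations.values())
print(format_hm(average))
```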



Microsoft Azure HBv4 HPC Comparison Benchmarks: System Details

HBv4
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 8.5.0 20210514 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv3
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
  OS: AlmaLinux 8.7

HBv2
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk
  Graphics: hyperv_fb

[Chart: Logarithmic Result Overview (Phoronix Test Suite) for HBv4, HBv3, HBv2, and HC. Tests shown: libxsmm, ACES DGEMM, Pennant, Blender, 7-Zip Compression, NAS Parallel Benchmarks, HeFFTe - Highly Efficient FFT for Exascale, PETSc, NAMD, OSPRay, High Performance Conjugate Gradient, oneDNN, PostgreSQL, Liquid-DSP, Timed Node.js Compilation, Remhos, Intel Open Image Denoise, Laghos, Timed Linux Kernel Compilation]

Microsoft Azure HBv4 HPC Comparison Benchmarks: Result Table

Systems: HBv4, HBv3, HBv2, HC

Tests included, in table order:

heffte: r2c - FFTW - double - 512
heffte: r2c - Stock - float - 512
hpcg: 144 144 144 - 60
heffte: r2c - Stock - double-long - 512
heffte: r2c - FFTW - float - 256
heffte: c2c - FFTW - double - 128
hpcg: 160 160 160 - 60
heffte: c2c - FFTW - float - 256
npb: BT.C
npb: CG.C
npb: EP.D
npb: FT.C
npb: IS.D
npb: MG.C
onednn: IP Shapes 1D - f32 - CPU
heffte: r2c - Stock - double - 256
npb: SP.C
heffte: c2c - FFTW - float - 512
namd: ATPase Simulation - 327,506 Atoms
heffte: r2c - FFTW - float - 512
libxsmm: 128
heffte: c2c - FFTW - double - 256
libxsmm: 256
heffte: c2c - Stock - double-long - 512
libxsmm: 32
heffte: r2c - FFTW - double - 256
libxsmm: 64
heffte: r2c - Stock - float - 256
laghos: Triple Point Problem
heffte: c2c - Stock - double - 128
laghos: Sedov Blast Wave, ube_922_hex.mesh
heffte: c2c - Stock - double - 512
heffte: c2c - FFTW - double - 512
onednn: IP Shapes 3D - f32 - CPU
heffte: r2c - FFTW - double-long - 512
heffte: c2c - Stock - float - 256
heffte: c2c - Stock - float-long - 512
heffte: r2c - FFTW - double-long - 256
heffte: c2c - FFTW - double-long - 512
heffte: c2c - Stock - float-long - 256
heffte: c2c - FFTW - double-long - 128
heffte: c2c - FFTW - double-long - 256
heffte: r2c - FFTW - float-long - 512
heffte: r2c - FFTW - float-long - 256
heffte: c2c - FFTW - float-long - 512
heffte: c2c - FFTW - float-long - 256
onednn: Convolution Batch Shapes Auto - f32 - CPU
heffte: c2c - Stock - double-long - 256
heffte: c2c - Stock - float - 512
heffte: r2c - Stock - double-long - 256
heffte: c2c - Stock - double - 256
onednn: Deconvolution Batch shapes_3d - f32 - CPU
onednn: Recurrent Neural Network Training - f32 - CPU
heffte: r2c - Stock - double - 512
onednn: Recurrent Neural Network Inference - f32 - CPU
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
mt-dgemm: Sustained Floating-Point Rate
remhos: Sample Remap Example
pennant: sedovbig
pennant: leblancbig
compress-7zip: Compression Rating
compress-7zip: Decompression Rating
build-linux-kernel: allmodconfig
heffte: r2c - Stock - float-long - 256
blender: BMW27 - CPU-Only
blender: Classroom - CPU-Only
blender: Fishy Cat - CPU-Only
blender: Barbershop - CPU-Only
blender: Pabellon Barcelona - CPU-Only
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only
oidn: RTLightmap.hdr.4096x4096 - CPU-Only
ospray: particle_volume/ao/real_time
ospray: particle_volume/scivis/real_time
ospray: particle_volume/pathtracer/real_time
ospray: gravity_spheres_volume/dim_512/ao/real_time
ospray: gravity_spheres_volume/dim_512/scivis/real_time
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time
heffte: r2c - Stock - float-long - 512
build-nodejs: Time To Compile
liquid-dsp: 1 - 256 - 32
liquid-dsp: 32 - 256 - 32
liquid-dsp: 32 - 256 - 57
liquid-dsp: 128 - 256 - 32
liquid-dsp: 128 - 256 - 57
liquid-dsp: 176 - 256 - 32
liquid-dsp: 176 - 256 - 57
liquid-dsp: 176 - 256 - 512
pgbench: 1 - 500 - Read Only
pgbench: 1 - 500 - Read Only - Average Latency
pgbench: 1 - 800 - Read Only
pgbench: 1 - 800 - Read Only - Average Latency
petsc: Streams
hpcg: 104 104 104 - 60

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      60.88   +/- 0.05  3
HBv2    91.48   +/- 0.15  3
HBv3    121.28  +/- 0.86  3
HBv4    314.34  +/- 0.50  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      110.05  +/- 0.06  3
HBv2    190.95  +/- 2.04  3
HBv3    232.17  +/- 1.85  3
HBv4    596.23  +/- 2.14  3

1. (CXX) g++ options: -O3 -pthread
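
To read results like these as relative speedups, each value can be divided by a chosen baseline. The sketch below uses the r2c / FFTW / double / 512 figures and picks HC as the baseline; that choice and the `speedups` helper are illustrative assumptions, not something the result page defines.

```python
# GFLOP/s for HeFFTe r2c / FFTW / double / 512, copied from the result above.
results = {"HC": 60.88, "HBv2": 91.48, "HBv3": 121.28, "HBv4": 314.34}


def speedups(results, baseline, higher_is_better=True):
    """Return each system's performance relative to the baseline system.

    For "More Is Better" metrics the ratio is value / baseline; for
    "Fewer Is Better" metrics it is inverted so that >1 still means faster.
    """
    base = results[baseline]
    if higher_is_better:
        return {name: value / base for name, value in results.items()}
    return {name: base / value for name, value in results.items()}


relative = speedups(results, baseline="HC")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x")
```

Passing `higher_is_better=False` covers the "Fewer Is Better" results such as the oneDNN and NAMD timings.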

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a new scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better

System  Result  SE        N
HC      25.87   +/- 0.05  3
HBv2    36.09   +/- 0.02  3
HBv3    38.97   +/- 0.02  3
HBv4    88.52   +/- 0.11  3

1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
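
Because the four instance types expose very different core counts, a per-core view can complement the raw totals. The sketch below divides the HPCG 144 144 144 results by the core counts given in the system details (176, 120, 120, and 44 cores); treating those counts as the cores actually used by the benchmark is an assumption.

```python
# HPCG 144 144 144 results in GFLOP/s, copied from the result above.
hpcg_gflops = {"HC": 25.87, "HBv2": 36.09, "HBv3": 38.97, "HBv4": 88.52}

# Core counts as reported in the system details; assumed to match the
# number of cores the benchmark ran on.
cores = {"HC": 44, "HBv2": 120, "HBv3": 120, "HBv4": 176}

per_core = {name: hpcg_gflops[name] / cores[name] for name in hpcg_gflops}

for name, value in sorted(per_core.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f} GFLOP/s per core")
```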

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      59.90   +/- 0.03  3
HBv2    95.20   +/- 0.16  3
HBv3    118.24  +/- 0.49  3
HBv4    311.27  +/- 0.81  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE         N
HC      123.63  +/- 0.52   3
HBv2    203.77  +/- 1.85   3
HBv3    198.66  +/- 5.11   15
HBv4    442.83  +/- 14.97  12

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128
GFLOP/s, More Is Better

System  Result  SE        N
HC      59.14   +/- 0.65  5
HBv2    59.42   +/- 1.72  15
HBv3    59.38   +/- 1.84  15
HBv4    80.25   +/- 3.67  15

1. (CXX) g++ options: -O3 -pthread
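
The "SE +/- x, N = n" annotations can be turned into a rough measure of run-to-run spread. The sketch below assumes SE is the standard error of the mean over N runs, so the sample standard deviation is approximately SE * sqrt(N); that interpretation and the `spread` helper are assumptions rather than something stated on the result page. It uses the r2c / FFTW / float / 256 figures.

```python
import math

# (mean GFLOP/s, standard error, sample count) for HeFFTe r2c / FFTW / float / 256.
runs = {
    "HC": (123.63, 0.52, 3),
    "HBv2": (203.77, 1.85, 3),
    "HBv3": (198.66, 5.11, 15),
    "HBv4": (442.83, 14.97, 12),
}


def spread(mean, se, n):
    """Estimate standard deviation and relative standard error (percent).

    Assumes `se` is the standard error of the mean, i.e. sd / sqrt(n).
    """
    sd = se * math.sqrt(n)
    return {"sd": sd, "rse_pct": 100.0 * se / mean}


summary = {name: spread(*row) for name, row in runs.items()}
for name, stats in summary.items():
    print(f"{name}: SD ~ {stats['sd']:.2f}, relative SE {stats['rse_pct']:.2f}%")
```

A higher N alongside a larger relative SE, as with HBv3 and HBv4 here, suggests the extra runs were triggered by noisier results.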

High Performance Conjugate Gradient


High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better

System  Result  SE        N
HC      25.56   +/- 0.06  3
HBv2    36.02   +/- 0.01  3
HBv3    39.11   +/- 0.02  3
HBv4    87.90   +/- 0.12  3

1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      58.36   +/- 0.07  3
HBv2    91.54   +/- 0.67  15
HBv3    103.51  +/- 1.41  15
HBv4    256.35  +/- 1.07  3

1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and lets you select among the different NPB tests/problems and vary the problem size. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4
Test / Class: BT.C
Total Mop/s, More Is Better

System  Result     SE          N
HC      28794.28   +/- 15.19   3
HBv2    66829.18   +/- 32.07   3
HBv3    62427.86   +/- 36.56   3
HBv4    151067.81  +/- 760.56  3

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: CG.C
Total Mop/s, More Is Better

System  Result    SE          N
HC      14356.20  +/- 233.39  15
HBv2    22314.02  +/- 108.02  3
HBv3    21551.48  +/- 20.87   3
HBv4    40326.29  +/- 77.41   3

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: EP.D
Total Mop/s, More Is Better

System  Result   SE         N
HC      1642.03  +/- 1.76   3
HBv2    3222.82  +/- 32.15  6
HBv3    2879.08  +/- 80.22  12
HBv4    5985.75  +/- 37.41  3

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: FT.C
Total Mop/s, More Is Better

System  Result    SE          N
HC      20188.89  +/- 13.57   3
HBv2    41977.69  +/- 219.43  3
HBv3    36619.29  +/- 194.34  3
HBv4    69051.63  +/- 745.61  3

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: IS.D
Total Mop/s, More Is Better

System  Result   SE         N
HC      1181.48  +/- 2.10   3
HBv2    1884.22  +/- 11.15  3
HBv3    2793.55  +/- 22.55  3
HBv4    5870.00  +/- 17.88  3

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4
Test / Class: MG.C
Total Mop/s, More Is Better

System  Result     SE          N
HC      19508.00   +/- 24.47   3
HBv2    43410.71   +/- 354.81  3
HBv3    46705.47   +/- 613.84  15
HBv4    108125.86  +/- 748.94  13

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
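
To summarise a group of results with one number, a geometric mean of per-test ratios is a common choice because it is not dominated by tests with large absolute values. The sketch below compares HBv4 against HBv3 over the six NPB results in this group; the pairing and the restriction to these six tests are illustrative choices.

```python
from statistics import geometric_mean

# Total Mop/s as (HBv3, HBv4) for the six NPB results in this group.
npb = {
    "BT.C": (62427.86, 151067.81),
    "CG.C": (21551.48, 40326.29),
    "EP.D": (2879.08, 5985.75),
    "FT.C": (36619.29, 69051.63),
    "IS.D": (2793.55, 5870.00),
    "MG.C": (46705.47, 108125.86),
}

# Per-test generational uplift, then the geometric mean across tests.
ratios = {test: hbv4 / hbv3 for test, (hbv3, hbv4) in npb.items()}
overall = geometric_mean(list(ratios.values()))

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")
print(f"Geometric mean: {overall:.2f}x")
```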

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better

System  Result    SE            N
HC      0.882446  +/- 0.000702  3
HBv2    1.407580  +/- 0.014464  3
HBv3    0.910091  +/- 0.013826  12
HBv4    0.752929  +/- 0.001421  3

Reported MIN values: 0.83, 1.11, 0.69

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      60.57   +/- 0.08  3
HBv2    93.31   +/- 1.10  4
HBv3    102.70  +/- 0.80  15
HBv4    264.95  +/- 4.27  12

1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks


NAS Parallel Benchmarks 3.4
Test / Class: SP.C
Total Mop/s, More Is Better

System  Result    SE          N
HC      12907.54  +/- 12.00   3
HBv2    32495.89  +/- 34.59   3
HBv3    31024.76  +/- 273.09  8
HBv4    68819.34  +/- 954.46  12

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      62.98   +/- 0.04  3
HBv2    95.88   +/- 0.47  3
HBv3    135.69  +/- 0.93  3
HBv4    355.86  +/- 1.24  3

1. (CXX) g++ options: -O3 -pthread

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14
ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better

System  Result   SE           N
HC      0.52650  +/- 0.00096  3
HBv2    0.26385  +/- 0.00045  3
HBv3    0.27115  +/- 0.00027  3
HBv4    0.14292  +/- 0.00035  3
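
NAMD reports days per simulated nanosecond, where lower is better. Taking the reciprocal gives ns/day, which reads as throughput. A minimal sketch of the conversion using the values above:

```python
# NAMD ATPase results in days/ns (fewer is better), copied from the result above.
days_per_ns = {"HC": 0.52650, "HBv2": 0.26385, "HBv3": 0.27115, "HBv4": 0.14292}

# The reciprocal converts to ns/day (more is better).
ns_per_day = {name: 1.0 / value for name, value in days_per_ns.items()}

for name, value in ns_per_day.items():
    print(f"{name}: {value:.2f} ns/day")
```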

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      114.03  +/- 0.09  3
HBv2    191.78  +/- 1.03  3
HBv3    254.25  +/- 2.52  6
HBv4    622.58  +/- 2.25  3

1. (CXX) g++ options: -O3 -pthread

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645
M N K: 128
GFLOPS/s, More Is Better

System  Result  SE          N
HC      1328.4  +/- 11.02   3
HBv2    1519.5  +/- 153.42  6
HBv3    2284.6  +/- 29.40   3
HBv4    6585.6  +/- 59.85   3

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      30.12   +/- 0.08  3
HBv2    50.90   +/- 0.55  3
HBv3    39.81   +/- 0.14  3
HBv4    123.39  +/- 1.65  3

1. (CXX) g++ options: -O3 -pthread

libxsmm


libxsmm 2-1.17-3645
M N K: 256
GFLOPS/s, More Is Better

System  Result  SE         N
HC      898.8   +/- 13.41  12
HBv2    1444.2  +/- 51.69  9
HBv3    2032.1  +/- 23.34  3
HBv4    6983.2  +/- 63.60  3

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      31.58   +/- 0.02  3
HBv2    46.93   +/- 0.09  3
HBv3    56.27   +/- 0.03  3
HBv4    154.57  +/- 0.03  3

1. (CXX) g++ options: -O3 -pthread

libxsmm


libxsmm 2-1.17-3645
M N K: 32
GFLOPS/s, More Is Better

System  Result  SE          N
HC      379.9   +/- 2.82    11
HBv2    195.1   +/- 3.90    12
HBv3    1506.3  +/- 32.59   14
HBv4    5006.8  +/- 443.26  12

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      57.31   +/- 0.25  3
HBv2    91.92   +/- 1.31  3
HBv3    103.25  +/- 0.75  15
HBv4    261.90  +/- 5.66  15

1. (CXX) g++ options: -O3 -pthread

libxsmm


libxsmm 2-1.17-3645
M N K: 64
GFLOPS/s, More Is Better

System  Result  SE          N
HC      731.6   +/- 5.15    15
HBv2    411.7   +/- 18.03   13
HBv3    2435.6  +/- 17.54   12
HBv4    5719.0  +/- 226.33  12

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE         N
HC      134.76  +/- 0.57   3
HBv2    205.21  +/- 2.79   12
HBv3    214.06  +/- 5.19   15
HBv4    459.92  +/- 14.34  15

1. (CXX) g++ options: -O3 -pthread

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Laghos 3.1
Test: Triple Point Problem
Major Kernels Total Rate, More Is Better

System  Result  SE        N
HC      156.52  +/- 0.08  3
HBv2    183.82  +/- 0.57  3
HBv3    192.74  +/- 0.38  3
HBv4    228.15  +/- 1.25  3

1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 128
GFLOP/s, More Is Better

System  Result  SE        N
HC      41.73   +/- 0.30  3
HBv2    51.40   +/- 1.33  15
HBv3    50.61   +/- 1.12  15
HBv4    87.66   +/- 3.68  14

1. (CXX) g++ options: -O3 -pthread

Laghos


Laghos 3.1
Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better

System  Result  SE        N
HC      247.49  +/- 1.35  3
HBv2    345.14  +/- 3.57  5
HBv3    361.81  +/- 0.15  3
HBv4    402.94  +/- 0.78  3

1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      31.57   +/- 0.02  3
HBv2    46.98   +/- 0.05  3
HBv3    56.22   +/- 0.04  3
HBv4    154.65  +/- 0.27  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      33.52   +/- 0.03  3
HBv2    47.61   +/- 0.09  3
HBv3    57.33   +/- 0.07  3
HBv4    159.18  +/- 0.34  3

1. (CXX) g++ options: -O3 -pthread

oneDNN


oneDNN 3.1
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better

System  Result    SE            N
HC      2.079200  +/- 0.093711  12
HBv2    6.838250  +/- 0.032665  3
HBv3    0.624233  +/- 0.039917  15
HBv4    0.306141  +/- 0.002422  3

Reported MIN values: 5.97

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      60.82   +/- 0.06  3
HBv2    91.43   +/- 0.07  3
HBv3    120.96  +/- 0.04  3
HBv4    315.98  +/- 1.65  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      59.73   +/- 0.02  3
HBv2    91.26   +/- 0.61  15
HBv3    103.41  +/- 0.77  15
HBv4    244.34  +/- 3.04  4

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      57.92   +/- 0.06  3
HBv2    93.26   +/- 0.23  3
HBv3    124.60  +/- 0.05  3
HBv4    323.70  +/- 0.96  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      57.13   +/- 0.12  3
HBv2    88.61   +/- 1.12  15
HBv3    106.63  +/- 1.05  3
HBv4    273.12  +/- 4.03  14

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      33.55   +/- 0.02  3
HBv2    47.37   +/- 0.02  3
HBv3    57.23   +/- 0.04  3
HBv4    159.26  +/- 0.05  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      59.55   +/- 0.27  3
HBv2    92.13   +/- 1.33  3
HBv3    105.36  +/- 1.07  6
HBv4    247.73  +/- 4.85  15

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128
GFLOP/s, More Is Better

System  Result  SE        N
HC      58.91   +/- 0.23  3
HBv2    61.14   +/- 1.30  15
HBv3    56.87   +/- 0.34  3
HBv4    85.01   +/- 4.77  15

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      30.22   +/- 0.05  3
HBv2    51.20   +/- 0.57  3
HBv3    39.37   +/- 0.33  3
HBv4    122.98  +/- 1.21  15

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      113.94  +/- 0.18  3
HBv2    191.14  +/- 1.39  3
HBv3    257.42  +/- 2.91  3
HBv4    624.95  +/- 4.23  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE         N
HC      122.77  +/- 0.53   3
HBv2    200.04  +/- 3.34   12
HBv3    221.86  +/- 3.45   15
HBv4    427.10  +/- 10.91  15

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better

System  Result  SE        N
HC      62.90   +/- 0.04  3
HBv2    96.49   +/- 0.05  3
HBv3    135.95  +/- 0.58  3
HBv4    355.51  +/- 1.18  3

1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3
Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better

System  Result  SE        N
HC      58.55   +/- 0.16  3
HBv2    90.79   +/- 0.74  15
HBv3    105.09  +/- 1.13  3
HBv4    255.97  +/- 3.64  15

1. (CXX) g++ options: -O3 -pthread

oneDNN


oneDNN 3.1
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better

System  Result    SE            N
HC      3.111210  +/- 0.015370  3
HBv2    0.573878  +/- 0.002431  3
HBv3    0.556741  +/- 0.001799  3
HBv4    0.276472  +/- 0.000440  3

Reported MIN values: 1.73, 0.47, 0.5

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 30.27 (SE +/- 0.03, N = 3)
  HBv2: 50.08 (SE +/- 0.55, N = 3)
  HBv3: 38.57 (SE +/- 0.14, N = 3)
  HBv4: 123.41 (SE +/- 1.16, N = 3)
  1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
  HC: 57.76 (SE +/- 0.02, N = 3)
  HBv2: 93.79 (SE +/- 0.34, N = 3)
  HBv3: 123.24 (SE +/- 0.73, N = 3)
  HBv4: 323.36 (SE +/- 0.80, N = 3)
  1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 60.89 (SE +/- 0.19, N = 3)
  HBv2: 92.39 (SE +/- 1.27, N = 3)
  HBv3: 105.50 (SE +/- 0.81, N = 15)
  HBv4: 258.72 (SE +/- 2.84, N = 15)
  1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
  HC: 30.17 (SE +/- 0.08, N = 3)
  HBv2: 50.71 (SE +/- 0.29, N = 3)
  HBv3: 38.45 (SE +/- 0.29, N = 11)
  HBv4: 121.61 (SE +/- 1.20, N = 3)
  1. (CXX) g++ options: -O3 -pthread

oneDNN

oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HC: 1.244800 (SE +/- 0.002723, N = 3, MIN: 1.22)
  HBv2: 1.610020 (SE +/- 0.021847, N = 3, MIN: 1.49)
  HBv3: 1.408620 (SE +/- 0.003506, N = 3, MIN: 1.36)
  HBv4: 0.582806 (SE +/- 0.001551, N = 3, MIN: 0.56)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HC: 707.35 (SE +/- 1.60, N = 3, MIN: 689.52)
  HBv2: 1345.14 (SE +/- 13.31, N = 3, MIN: 1237.17)
  HBv3: 860.98 (SE +/- 3.89, N = 3, MIN: 814.31)
  HBv4: 535.85 (SE +/- 3.26, N = 3, MIN: 521.12)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
  HC: 59.82 (SE +/- 0.05, N = 3)
  HBv2: 94.53 (SE +/- 0.25, N = 3)
  HBv3: 117.73 (SE +/- 0.40, N = 3)
  HBv4: 311.80 (SE +/- 1.60, N = 3)
  1. (CXX) g++ options: -O3 -pthread

oneDNN

oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  HC: 450.25 (SE +/- 4.72, N = 3)
  HBv2: 896.81 (SE +/- 9.52, N = 15)
  HBv3: 533.50 (SE +/- 4.61, N = 15)
  HBv4: 401.86 (SE +/- 1.40, N = 3)
  MIN (system not identified in source): 432.99, 388.53
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HC: 707.32 (SE +/- 1.51, N = 3)
  HBv2: 1367.73 (SE +/- 13.52, N = 15)
  HBv3: 886.81 (SE +/- 6.66, N = 3)
  HBv4: 533.49 (SE +/- 1.90, N = 3)
  MIN (system not identified in source): 687.14, 849.06, 518.68
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HC: 442.47 (SE +/- 1.89, N = 3)
  HBv2: 910.94 (SE +/- 9.54, N = 15)
  HBv3: 529.97 (SE +/- 4.36, N = 3)
  HBv4: 411.23 (SE +/- 3.60, N = 8)
  MIN (system not identified in source): 429.93, 469.93
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM 1.0 - Sustained Floating-Point Rate
GFLOP/s, More Is Better
  HC: 14.340830 (SE +/- 0.199669, N = 15)
  HBv2: 5.899903 (SE +/- 0.272351, N = 15)
  HBv3: 25.104876 (SE +/- 0.132089, N = 3)
  HBv4: 53.175691 (SE +/- 0.359007, N = 3)
  1. (CC) gcc options: -O3 -march=native -fopenmp
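To make the GFLOP/s unit concrete: a dense double-precision matrix multiply of two N x N matrices is conventionally counted as about 2·N³ floating-point operations, so the rate is that count divided by the elapsed time. The sketch below uses that conventional count with a deliberately naive pure-Python kernel; the function names are hypothetical, and the exact operation count used by the ACES DGEMM benchmark itself may differ.

```python
import time

def gflops(n, seconds):
    """Rate for an n x n matrix multiply, using the conventional 2*n^3 flop count."""
    return 2.0 * n**3 / seconds / 1e9

def naive_dgemm(a, b, n):
    """Plain triple-loop C = A * B on lists of lists (illustration only)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

n = 64
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]
start = time.perf_counter()
c = naive_dgemm(a, b, n)
elapsed = time.perf_counter() - start
print(f"{gflops(n, elapsed):.4f} GFLOP/s (pure Python, single thread)")
```

The absolute number printed here is not comparable to the results above, which come from a compiled, OpenMP-threaded build; only the way the rate is derived carries over.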

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Remhos 1.0 - Test: Sample Remap Example
Seconds, Fewer Is Better
  HC: 27.38 (SE +/- 0.06, N = 3)
  HBv2: 14.93 (SE +/- 0.07, N = 3)
  HBv3: 15.26 (SE +/- 0.02, N = 3)
  HBv4: 15.37 (SE +/- 0.14, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Pennant 1.0.1 - Test: sedovbig
Hydro Cycle Time - Seconds, Fewer Is Better
  HC: 25.019560 (SE +/- 0.026763, N = 3)
  HBv2: 5.915805 (SE +/- 0.011742, N = 3)
  HBv3: 6.277107 (SE +/- 0.027453, N = 3)
  HBv4: 3.581391 (SE +/- 0.018282, N = 3)
  1. (CXX) g++ options: -fopenmp -pthread -lmpi

Pennant 1.0.1 - Test: leblancbig
Hydro Cycle Time - Seconds, Fewer Is Better
  HC: 10.645480 (SE +/- 0.017495, N = 3)
  HBv2: 3.466885 (SE +/- 0.009233, N = 3)
  HBv3: 3.649317 (SE +/- 0.006682, N = 3)
  HBv4: 2.122074 (SE +/- 0.029043, N = 3)
  1. (CXX) g++ options: -fopenmp -pthread -lmpi

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression 22.01 - Test: Compression Rating
MIPS, More Is Better
  HC: 210732 (SE +/- 748.55, N = 3)
  HBv2: 489456 (SE +/- 2650.49, N = 3)
  HBv3: 558290 (SE +/- 6724.92, N = 3)
  HBv4: 1032267 (SE +/- 7680.08, N = 15)
  1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression 22.01 - Test: Decompression Rating
MIPS, More Is Better
  HC: 148193 (SE +/- 256.58, N = 3)
  HBv2: 371044 (SE +/- 2438.40, N = 3)
  HBv3: 397505 (SE +/- 19127.89, N = 3)
  HBv4: 727995 (SE +/- 8360.33, N = 15)
  1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation 6.1 - Build: allmodconfig
Seconds, Fewer Is Better
  HC: 1950.63 (SE +/- 7.59, N = 3)
  HBv2: 1782.93 (SE +/- 22.46, N = 3)
  HBv3: 1889.46 (SE +/- 22.02, N = 3)
  HBv4: 1681.26 (SE +/- 32.03, N = 9)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
  HC: 131.96 (SE +/- 0.90, N = 3)
  HBv2: 211.42 (SE +/- 2.37, N = 15)
  HBv3: 207.97 (SE +/- 7.34, N = 15)
  HBv4: 467.72 (SE +/- 17.46, N = 12)
  1. (CXX) g++ options: -O3 -pthread

Blender

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
  HC: 50.53 (SE +/- 0.65, N = 15)
  HBv2: 19.46 (SE +/- 0.11, N = 3)
  HBv3: 19.49 (SE +/- 0.02, N = 3)
  HBv4: 9.97 (SE +/- 0.06, N = 3)

Blender 3.6 - Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
  HC: 138.81 (SE +/- 0.49, N = 3)
  HBv2: 50.86 (SE +/- 0.10, N = 3)
  HBv3: 51.08 (SE +/- 0.04, N = 3)
  HBv4: 25.26 (SE +/- 0.11, N = 3)

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
  HC: 72.57 (SE +/- 0.48, N = 3)
  HBv2: 26.19 (SE +/- 0.10, N = 3)
  HBv3: 25.47 (SE +/- 0.08, N = 3)
  HBv4: 13.96 (SE +/- 0.14, N = 3)

Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
  HC: 524.86 (SE +/- 2.13, N = 3)
  HBv2: 210.18 (SE +/- 0.01, N = 3)
  HBv3: 189.30 (SE +/- 0.45, N = 3)
  HBv4: 96.77 (SE +/- 0.12, N = 3)

Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
  HC: 176.21 (SE +/- 1.13, N = 3)
  HBv2: 64.14 (SE +/- 0.10, N = 3)
  HBv3: 62.64 (SE +/- 0.24, N = 3)
  HBv4: 33.40 (SE +/- 0.06, N = 3)
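For "Fewer Is Better" timings such as the Blender renders, a generational speedup can be expressed as the baseline time divided by the newer system's time, and the per-scene ratios can be summarized with a geometric mean. The sketch below does this for HBv4 against HBv3 using the five render times reported above; the variable and function names are illustrative, and the choice of HBv3 as the baseline is just one reasonable option.

```python
import statistics

# Render times in seconds from the Blender 3.6 CPU-Only results above.
hbv3 = {"BMW27": 19.49, "Classroom": 51.08, "Fishy Cat": 25.47,
        "Barbershop": 189.30, "Pabellon Barcelona": 62.64}
hbv4 = {"BMW27": 9.97, "Classroom": 25.26, "Fishy Cat": 13.96,
        "Barbershop": 96.77, "Pabellon Barcelona": 33.40}

def speedups(baseline, candidate):
    """For 'fewer is better' timings, speedup = baseline time / candidate time."""
    return {scene: baseline[scene] / candidate[scene] for scene in baseline}

ratios = speedups(hbv3, hbv4)
overall = statistics.geometric_mean(list(ratios.values()))

for scene, ratio in ratios.items():
    print(f"{scene}: {ratio:.2f}x")
print(f"Geometric mean: {overall:.2f}x")
```

The geometric mean is used rather than the arithmetic mean because the inputs are ratios; it gives the same summary regardless of which system is treated as the baseline (the result simply inverts).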

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 2.0 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
  HC: 1.82 (SE +/- 0.02, N = 3)
  HBv2: 2.08 (SE +/- 0.01, N = 3)
  HBv3: 1.68 (SE +/- 0.02, N = 3)
  HBv4: 3.08 (SE +/- 0.02, N = 3)

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
  HC: 1.84 (SE +/- 0.01, N = 3)
  HBv2: 2.03 (SE +/- 0.02, N = 9)
  HBv3: 1.69 (SE +/- 0.01, N = 15)
  HBv4: 3.13 (SE +/- 0.01, N = 3)

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, More Is Better
  HC: 0.88 (SE +/- 0.00, N = 3)
  HBv2: 1.04 (SE +/- 0.01, N = 3)
  HBv3: 0.79 (SE +/- 0.01, N = 3)
  HBv4: 1.29 (SE +/- 0.00, N = 3)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
  HC: 8.97547 (SE +/- 0.01225, N = 3)
  HBv2: 22.33360 (SE +/- 0.00495, N = 3)
  HBv3: 24.45860 (SE +/- 0.01755, N = 3)
  HBv4: 36.61210 (SE +/- 0.04053, N = 3)

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time
Items Per Second, More Is Better
  HC: 8.97020 (SE +/- 0.00763, N = 3)
  HBv2: 22.15330 (SE +/- 0.01671, N = 3)
  HBv3: 24.17360 (SE +/- 0.01956, N = 3)
  HBv4: 36.56710 (SE +/- 0.03598, N = 3)

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time
Items Per Second, More Is Better
  HC: 86.57 (SE +/- 8.14, N = 12)
  HBv2: 157.13 (SE +/- 3.07, N = 12)
  HBv3: 168.24 (SE +/- 0.23, N = 3)
  HBv4: 208.34 (SE +/- 0.07, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
  HC: 9.49421 (SE +/- 0.02906, N = 3)
  HBv2: 8.67327 (SE +/- 0.13915, N = 12)
  HBv3: 11.74850 (SE +/- 0.03837, N = 3)
  HBv4: 38.07640 (SE +/- 0.03610, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
  HC: 8.98723 (SE +/- 0.03491, N = 3)
  HBv2: 8.12356 (SE +/- 0.12026, N = 15)
  HBv3: 11.18450 (SE +/- 0.01165, N = 3)
  HBv4: 37.09180 (SE +/- 0.11164, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
  HC: 10.05 (SE +/- 0.02, N = 3)
  HBv2: 13.92 (SE +/- 0.03, N = 3)
  HBv3: 14.61 (SE +/- 0.03, N = 3)
  HBv4: 32.79 (SE +/- 0.04, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
  HC: 110.20 (SE +/- 0.10, N = 3)
  HBv2: 189.21 (SE +/- 1.02, N = 3)
  HBv3: 233.80 (SE +/- 0.15, N = 3)
  HBv4: 590.93 (SE +/- 2.49, N = 3)
  1. (CXX) g++ options: -O3 -pthread

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript runtime built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation 19.8.1 - Time To Compile
Seconds, Fewer Is Better
  HC: 330.61 (SE +/- 2.37, N = 3)
  HBv2: 194.37 (SE +/- 1.32, N = 3)
  HBv3: 185.57 (SE +/- 1.46, N = 3)
  HBv4: 150.56 (SE +/- 2.23, N = 12)

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 31796333 (SE +/- 1333.33, N = 3)
  HBv2: 33211667 (SE +/- 2185.81, N = 3)
  HBv3: 32817333 (SE +/- 4096.07, N = 3)
  HBv4: 35362667 (SE +/- 20201.76, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 964423333 (SE +/- 3947135.39, N = 3)
  HBv2: 1061433333 (SE +/- 33333.33, N = 3)
  HBv3: 917336667 (SE +/- 2475306.94, N = 3)
  HBv4: 1113300000 (SE +/- 1950213.66, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC: 721290909 (SE +/- 5360840.75, N = 11)
  HBv2: 1193400000 (SE +/- 472581.56, N = 3)
  HBv3: 1086000000 (SE +/- 550757.05, N = 3)
  HBv4: 1390540000 (SE +/- 14294460.47, N = 5)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 1512600000 (SE +/- 8213606.60, N = 3)
  HBv2: 3925933333 (SE +/- 3602930.91, N = 3)
  HBv3: 3366733333 (SE +/- 5345506.94, N = 3)
  HBv4: 4426300000 (SE +/- 3774034.09, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC: 1572400000 (SE +/- 8373967.60, N = 3)
  HBv2: 4045933333 (SE +/- 4421286.89, N = 3)
  HBv3: 3516300000 (SE +/- 6947661.48, N = 3)
  HBv4: 5168233333 (SE +/- 10401335.38, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HC: 1566133333 (SE +/- 2852094.75, N = 3)
  HBv2: 4027100000 (SE +/- 44818002.34, N = 3)
  HBv3: 3419533333 (SE +/- 8912600.32, N = 3)
  HBv4: 6122233333 (SE +/- 9214903.39, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HC: 1664733333 (SE +/- 5446813.54, N = 3)
  HBv2: 4106700000 (SE +/- 13588352.86, N = 3)
  HBv3: 3563433333 (SE +/- 4247482.91, N = 3)
  HBv4: 6758166667 (SE +/- 11394345.58, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  HC: 529213333 (SE +/- 6341443.93, N = 3)
  HBv2: 825653333 (SE +/- 3174614.59, N = 3)
  HBv3: 735370000 (SE +/- 3040334.41, N = 3)
  HBv4: 2058233333 (SE +/- 4603018.33, N = 3)
  1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
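Because Liquid-DSP is run at several thread counts, the results can be read as a scaling study. The sketch below divides each system's 176-thread throughput by its 1-thread throughput for the Filter Length 32 case, using the figures reported above; the function and variable names are illustrative. Note that, per the system table, only HBv4 exposes 176 cores (HBv2 and HBv3 list 120, HC lists 44), so the other systems are oversubscribed at this thread count.

```python
# Liquid-DSP samples/s from the results above (Buffer Length: 256, Filter Length: 32).
one_thread = {"HC": 31796333, "HBv2": 33211667, "HBv3": 32817333, "HBv4": 35362667}
threads_176 = {"HC": 1566133333, "HBv2": 4027100000, "HBv3": 3419533333, "HBv4": 6122233333}

def scaling(single, multi):
    """Throughput ratio of the multi-threaded run over the single-threaded run."""
    return {name: multi[name] / single[name] for name in single}

speedup = scaling(one_thread, threads_176)
for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x over one thread")
```

Dividing a system's factor by its core count gives a rough per-core scaling efficiency, which is a fairer basis for comparison than raw throughput when core counts differ this much.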

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS, More Is Better
  HC: 1354877 (SE +/- 3475.53, N = 3)
  HBv2: 2466249 (SE +/- 8486.11, N = 3)
  HBv3: 2375005 (SE +/- 4803.91, N = 3)
  HBv4: 3139846 (SE +/- 4762.10, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC: 0.369 (SE +/- 0.001, N = 3)
  HBv2: 0.203 (SE +/- 0.001, N = 3)
  HBv3: 0.210 (SE +/- 0.000, N = 3)
  HBv4: 0.159 (SE +/- 0.000, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS, More Is Better
  HC: 1161800 (SE +/- 4936.18, N = 3)
  HBv2: 2439650 (SE +/- 4115.38, N = 3)
  HBv3: 2407602 (SE +/- 11149.78, N = 3)
  HBv4: 3123042 (SE +/- 20304.79, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HC: 0.688 (SE +/- 0.003, N = 3)
  HBv2: 0.328 (SE +/- 0.001, N = 3)
  HBv3: 0.332 (SE +/- 0.001, N = 3)
  HBv4: 0.256 (SE +/- 0.001, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
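The throughput and latency results above are not independent measurements: with a fixed number of concurrent clients, average latency is approximately the client count divided by TPS. The sketch below applies that relation to the reported TPS values; treating it as how the latency figure was produced is an assumption, and the function name is illustrative. The computed values agree with the reported latencies to within about 0.001 ms.

```python
def avg_latency_ms(clients, tps):
    """Approximate average latency in ms for a closed loop of `clients` connections."""
    return clients / tps * 1000.0

# TPS and reported average latency (ms) from the Read Only results above.
tps = {
    500: {"HC": 1354877, "HBv2": 2466249, "HBv3": 2375005, "HBv4": 3139846},
    800: {"HC": 1161800, "HBv2": 2439650, "HBv3": 2407602, "HBv4": 3123042},
}
reported_ms = {
    500: {"HC": 0.369, "HBv2": 0.203, "HBv3": 0.210, "HBv4": 0.159},
    800: {"HC": 0.688, "HBv2": 0.328, "HBv3": 0.332, "HBv4": 0.256},
}

for clients, systems in tps.items():
    for name, rate in systems.items():
        derived = avg_latency_ms(clients, rate)
        diff = abs(derived - reported_ms[clients][name])
        print(f"{clients} clients, {name}: derived {derived:.4f} ms, "
              f"reported {reported_ms[clients][name]:.3f} ms, diff {diff:.4f}")
```

This also explains why going from 500 to 800 clients raises latency on every system while TPS stays flat or drops: the systems are already saturated at 500 clients, so extra clients mostly add queueing.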

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

PETSc 3.19 - Test: Streams
MB/s, More Is Better
  HC: 151286.25 (SE +/- 256.75, N = 3)
  HBv2: 197895.47 (SE +/- 12025.83, N = 6)
  HBv3: 284001.92 (SE +/- 2674.31, N = 7)
  HBv4: 598417.70 (SE +/- 46271.80, N = 9)
  1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64

High Performance Conjugate Gradient

HPCG, the High Performance Conjugate Gradient, is a new scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better
  HC: 26.00 (SE +/- 0.02, N = 3)
  HBv2: 37.04 (SE +/- 0.03, N = 3)
  HBv3: 39.61 (SE +/- 0.03, N = 3)
  HBv4: 89.38 (SE +/- 0.26, N = 3)
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

95 Results Shown

HeFFTe - Highly Efficient FFT for Exascale:
  r2c - FFTW - double - 512
  r2c - Stock - float - 512
High Performance Conjugate Gradient
HeFFTe - Highly Efficient FFT for Exascale:
  r2c - Stock - double-long - 512
  r2c - FFTW - float - 256
  c2c - FFTW - double - 128
High Performance Conjugate Gradient
HeFFTe - Highly Efficient FFT for Exascale
NAS Parallel Benchmarks:
  BT.C
  CG.C
  EP.D
  FT.C
  IS.D
  MG.C
oneDNN
HeFFTe - Highly Efficient FFT for Exascale
NAS Parallel Benchmarks
HeFFTe - Highly Efficient FFT for Exascale
NAMD
HeFFTe - Highly Efficient FFT for Exascale
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
Laghos
HeFFTe - Highly Efficient FFT for Exascale
Laghos
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - Stock - double - 512
  c2c - FFTW - double - 512
oneDNN
HeFFTe - Highly Efficient FFT for Exascale:
  r2c - FFTW - double-long - 512
  c2c - Stock - float - 256
  c2c - Stock - float-long - 512
  r2c - FFTW - double-long - 256
  c2c - FFTW - double-long - 512
  c2c - Stock - float-long - 256
  c2c - FFTW - double-long - 128
  c2c - FFTW - double-long - 256
  r2c - FFTW - float-long - 512
  r2c - FFTW - float-long - 256
  c2c - FFTW - float-long - 512
  c2c - FFTW - float-long - 256
oneDNN
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - Stock - double-long - 256
  c2c - Stock - float - 512
  r2c - Stock - double-long - 256
  c2c - Stock - double - 256
oneDNN:
  Deconvolution Batch shapes_3d - f32 - CPU
  Recurrent Neural Network Training - f32 - CPU
HeFFTe - Highly Efficient FFT for Exascale
oneDNN:
  Recurrent Neural Network Inference - f32 - CPU
  Recurrent Neural Network Training - bf16bf16bf16 - CPU
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU
ACES DGEMM
Remhos
Pennant:
  sedovbig
  leblancbig
7-Zip Compression:
  Compression Rating
  Decompression Rating
Timed Linux Kernel Compilation
HeFFTe - Highly Efficient FFT for Exascale
Blender:
  BMW27 - CPU-Only
  Classroom - CPU-Only
  Fishy Cat - CPU-Only
  Barbershop - CPU-Only
  Pabellon Barcelona - CPU-Only
Intel Open Image Denoise:
  RT.hdr_alb_nrm.3840x2160 - CPU-Only
  RT.ldr_alb_nrm.3840x2160 - CPU-Only
  RTLightmap.hdr.4096x4096 - CPU-Only
OSPRay:
  particle_volume/ao/real_time
  particle_volume/scivis/real_time
  particle_volume/pathtracer/real_time
  gravity_spheres_volume/dim_512/ao/real_time
  gravity_spheres_volume/dim_512/scivis/real_time
  gravity_spheres_volume/dim_512/pathtracer/real_time
HeFFTe - Highly Efficient FFT for Exascale
Timed Node.js Compilation
Liquid-DSP:
  1 - 256 - 32
  32 - 256 - 32
  32 - 256 - 57
  128 - 256 - 32
  128 - 256 - 57
  176 - 256 - 32
  176 - 256 - 57
  176 - 256 - 512
PostgreSQL:
  1 - 500 - Read Only
  1 - 500 - Read Only - Average Latency
  1 - 800 - Read Only
  1 - 800 - Read Only - Average Latency
PETSc
High Performance Conjugate Gradient