Microsoft Azure HBv4 HPC Performance Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2308011-PTS-AZUREHBV71

Run Management

Result Identifier   Date           Test Duration
HC                  July 27 2023   5 Hours, 11 Minutes
HBv2                July 27 2023   7 Hours, 22 Minutes
HBv3                July 27 2023   6 Hours, 40 Minutes
HBv4                July 26 2023   6 Hours, 44 Minutes
Average                            6 Hours, 29 Minutes
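
The last duration listed above, 6 Hours, 29 Minutes, is consistent with the mean of the four per-run durations. The short Python sketch below checks that arithmetic; the dictionary and function names are illustrative and not part of the Phoronix Test Suite.

```python
# Run durations as listed for each instance, in (hours, minutes).
durations = {
    "HC": (5, 11),
    "HBv2": (7, 22),
    "HBv3": (6, 40),
    "HBv4": (6, 44),
}


def to_minutes(hours, minutes):
    """Convert an (hours, minutes) pair to total minutes."""
    return hours * 60 + minutes


def mean_duration(runs):
    """Return the mean duration as (hours, minutes), rounded down to whole minutes."""
    total = sum(to_minutes(h, m) for h, m in runs.values())
    mean = total // len(runs)
    return divmod(mean, 60)


hours, minutes = mean_duration(durations)
print(f"{hours} Hours, {minutes} Minutes")
```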



Microsoft Azure HBv4 HPC Performance Benchmarks - System Details

HC
  Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
  Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
  Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
  Disk: 32GB Virtual Disk + 752GB Virtual Disk
  Graphics: hyperv_fb
  OS: AlmaLinux 8.8
  Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
  Compiler: GCC 13.1.0 + CUDA 12.1
  File-System: nfs
  Screen Resolution: 1024x768
  System Layer: microsoft

HBv2 (changes from HC)
  Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
  Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
  Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv3 (changes from HBv2)
  Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
  Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HBv4 (changes from HBv3)
  Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
  Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
  Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

Kernel Details
  Transparent Huge Pages: always

Environment Details
  CFLAGS="-O3 -march=native" CXXFLAGS="-O3 -march=native"

Compiler Details
  --disable-multilib --enable-checking=release

Processor Details
  CPU Microcode: 0xffffffff

Python Details
  Python 3.6.8

Security Details
  HC: itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + retbleed: Vulnerable + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  HBv2: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT disabled + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HBv3: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  HBv4: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

[Figure: Logarithmic Result Overview (Phoronix Test Suite) for HC, HBv2, HBv3, and HBv4. Tests shown: ACES DGEMM, Pennant, NAS Parallel Benchmarks, Blender, 7-Zip Compression, HeFFTe - Highly Efficient FFT for Exascale, PETSc, Liquid-DSP, NAMD, OSPRay, High Performance Conjugate Gradient, PostgreSQL, oneDNN, Timed Node.js Compilation, libxsmm, Intel Open Image Denoise, Laghos.]

[Figure: Logarithmic Per Dollar Result Overview (Phoronix Test Suite) for HC, HBv2, HBv3, and HBv4. Tests shown: ACES DGEMM, PETSc, NAS Parallel Benchmarks, libxsmm, 7-Zip Compression, HeFFTe - Highly Efficient FFT for Exascale, Liquid-DSP, High Performance Conjugate Gradient, PostgreSQL, Laghos, OSPRay, Intel Open Image Denoise.]

Microsoft Azure HBv4 HPC Performance Benchmarkshpcg: 104 104 104 - 60hpcg: 144 144 144 - 60hpcg: 160 160 160 - 60npb: BT.Cnpb: CG.Cnpb: FT.Cnpb: IS.Dnpb: MG.Cnpb: SP.Cnamd: ATPase Simulation - 327,506 Atomslibxsmm: 128libxsmm: 256libxsmm: 32libxsmm: 64laghos: Triple Point Problemlaghos: Sedov Blast Wave, ube_922_hex.meshheffte: c2c - FFTW - float - 256heffte: c2c - FFTW - float - 512heffte: r2c - FFTW - float - 512heffte: c2c - FFTW - double - 512heffte: c2c - Stock - float - 256heffte: c2c - Stock - float - 512heffte: r2c - FFTW - double - 256heffte: r2c - FFTW - double - 512heffte: r2c - Stock - float - 512heffte: c2c - Stock - double - 512heffte: r2c - Stock - double - 256heffte: r2c - Stock - double - 512heffte: c2c - FFTW - float-long - 256heffte: c2c - FFTW - float-long - 512heffte: r2c - FFTW - float-long - 256heffte: r2c - FFTW - float-long - 512heffte: c2c - FFTW - double-long - 512heffte: c2c - Stock - float-long - 256heffte: c2c - Stock - float-long - 512heffte: r2c - FFTW - double-long - 256heffte: r2c - FFTW - double-long - 512heffte: r2c - Stock - float-long - 512heffte: c2c - Stock - double-long - 512heffte: r2c - Stock - double-long - 256heffte: r2c - Stock - double-long - 512pennant: sedovbigpennant: leblancbigmt-dgemm: Sustained Floating-Point Rateoidn: RT.hdr_alb_nrm.3840x2160 - CPU-Onlyoidn: RT.ldr_alb_nrm.3840x2160 - CPU-Onlyoidn: RTLightmap.hdr.4096x4096 - CPU-Onlyospray: particle_volume/ao/real_timeospray: particle_volume/scivis/real_timeospray: particle_volume/pathtracer/real_timeospray: gravity_spheres_volume/dim_512/ao/real_timeospray: gravity_spheres_volume/dim_512/scivis/real_timeospray: gravity_spheres_volume/dim_512/pathtracer/real_timecompress-7zip: Compression Ratingcompress-7zip: Decompression Ratingbuild-nodejs: Time To Compileonednn: Recurrent Neural Network Training - bf16bf16bf16 - CPUonednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPUliquid-dsp: 128 - 256 - 57liquid-dsp: 176 - 256 - 32liquid-dsp: 176 - 256 - 
57liquid-dsp: 176 - 256 - 512pgbench: 1 - 500 - Read Onlypgbench: 1 - 500 - Read Only - Average Latencypgbench: 1 - 800 - Read Onlypgbench: 1 - 800 - Read Only - Average Latencyblender: BMW27 - CPU-Onlyblender: Classroom - CPU-Onlyblender: Fishy Cat - CPU-Onlyblender: Barbershop - CPU-Onlyblender: Pabellon Barcelona - CPU-Onlypetsc: StreamsHCHBv2HBv3HBv425.997125.865925.5635106230.5227619.0555288.191864.6863404.0141543.940.526971284.8904.1384.9748.1156.52247.4958.356762.9750114.02533.519359.729257.764357.310160.8804110.04931.571860.572759.821658.549862.9027122.772113.94033.554559.552757.920357.129060.8204110.19731.584660.887259.895425.0195610.6454814.0720271.851.850.878.996188.8783196.76309.522939.0268910.0611216451150841330.613707.322442.47115706333331536633333168303333354462666713535100.36911594920.69049.95138.5171.76526.93175.07151286.249137.041036.086636.0167241509.8836367.3598485.233977.02108985.72104771.900.265051011.41128.3164.8331.4183.82345.1491.538395.8801191.77547.605091.260193.792391.918691.4802190.94946.979493.313794.530190.788396.4941200.035191.14147.369692.129093.257388.608191.4296189.20846.928992.388395.19895.9158053.4668856.3954152.032.010.9622.366822.1747162.4498.668888.3232313.9416501534388577194.3671367.73910.93743091333334275533333435010000092424333324673280.20324813200.32319.5850.9526.43211.4664.84197895.471739.609338.973939.1106313813.9836681.43102122.365730.01131635.41205795.590.271112273.52045.71438.12413.7192.74361.81103.5147135.694254.25257.3307103.409123.242103.2457121.283232.16656.2161102.7046117.731105.093135.950221.861257.41957.2263105.361124.595106.632120.957233.79756.2690105.5003118.2366.2771073.64931725.0483521.721.690.8024.471024.2197167.50411.750111.172314.6088566595406516185.567886.810529.97342169666673864000000428153333381495000024347490.20624789170.32319.4350.7125.59188.9662.90284001.916289.384088.516087.9013744413.9074101.94230164.7912967.37437417.16427298.990.143806655.26908.66163.05898.2228.15402.94256.349355.855622.580159.1
75244.342323.356261.903314.336596.226154.648264.954311.803255.968355.512427.101624.951159.258247.725323.696273.121315.982590.925154.568258.716311.2673.5813912.12207452.8024403.113.081.3236.654836.5446208.05038.076937.062432.58391083523742859150.558533.494411.234541290000061817666677095033333222196666731618480.15831461730.25410.1125.6113.7497.5233.01598417.6957OpenBenchmarking.org

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient benchmark, a newer scientific benchmark from Sandia National Labs that, compared to HPCC, focuses on supercomputer testing with modern real-world workloads. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min    Max    Per Core
HBv4      89.38   0.26    3  88.87  89.68  0.5079
HBv3      39.61   0.03    3  39.56  39.68  0.3301
HBv2      37.04   0.03    3  36.98  37.08  0.3087
HC        26.00   0.02    3  25.95  26.02  0.5908

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
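
The per-core figures above appear to be the averaged result divided by the detected core count. The following Python sketch reproduces them under that assumption; the variable and function names are illustrative, and because the inputs are the rounded values shown on the page, the output agrees with the chart only to about three decimal places.

```python
# Detected core counts and HPCG results (GFLOP/s) for X Y Z: 104 104 104, as listed above.
core_counts = {"HC": 44, "HBv2": 120, "HBv3": 120, "HBv4": 176}
hpcg_104 = {"HBv4": 89.38, "HBv3": 39.61, "HBv2": 37.04, "HC": 26.00}


def per_core(results, cores):
    """Divide each instance's result by its detected core count."""
    return {name: value / cores[name] for name, value in results.items()}


per_core_104 = per_core(hpcg_104, core_counts)

# Print from highest to lowest per-core throughput.
for name, value in sorted(per_core_104.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:5s} {value:.4f} GFLOP/s per core")
```

Note that the ranking changes once results are normalized: HC has the lowest total but the highest per-core value.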

High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min    Max    Per Core
HBv4      88.52   0.11    3  88.38  88.73  0.5029
HBv3      38.97   0.02    3  38.94  39     0.3248
HBv2      36.09   0.02    3  36.06  36.13  0.3007
HC        25.87   0.05    3  25.77  25.94  0.5879

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min    Max    Per Core
HBv4      87.90   0.12    3  87.66  88.05  0.4994
HBv3      39.11   0.02    3  39.07  39.15  0.3259
HBv2      36.02   0.01    3  36.01  36.03  0.3001
HC        25.56   0.06    3  25.44  25.65  0.5810

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
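
For results with N = 3, the minimum, average, and maximum are enough to recover the third run, so the reported "SE +/-" can be cross-checked. The Python sketch below does this for the 160 160 160 case. It assumes the standard error is the sample standard deviation divided by the square root of N; the source does not state the formula, and the helper name is hypothetical. The inputs are the rounded values shown on the page, so only two decimal places are meaningful.

```python
import math
import statistics


def standard_error_from_summary(minimum, average, maximum):
    """Estimate the standard error for exactly three runs from min, mean, and max.

    Assumption: SE = sample standard deviation / sqrt(N), with N = 3.
    """
    # With three runs, the remaining value follows from the mean.
    middle = 3 * average - minimum - maximum
    runs = [minimum, middle, maximum]
    return statistics.stdev(runs) / math.sqrt(len(runs))


# (Min, Avg, Max) in GFLOP/s for HPCG X Y Z: 160 160 160, as listed above.
hpcg_160 = {
    "HBv4": (87.66, 87.90, 88.05),
    "HBv3": (39.07, 39.11, 39.15),
    "HBv2": (36.01, 36.02, 36.03),
    "HC": (25.44, 25.56, 25.65),
}

for name, summary in hpcg_160.items():
    print(name, round(standard_error_from_summary(*summary), 2))
```

Under that assumption, the rounded estimates match the SE values reported for this test. The same trick does not apply to results with N greater than 3, where the individual runs cannot be recovered from the summary.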

NAS Parallel Benchmarks

NPB, the NAS Parallel Benchmarks, is a benchmark suite developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4 - Test / Class: BT.C
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result     SE +/-   N  Min        Max        Per Core
HBv4      744413.90  6061.11  3  733730.44  754716.38  4229.62
HBv3      313813.98  2034.04  3  311190.27  317818.23  2615.12
HBv2      241509.88  108.10   3  241308.65  241678.94  2012.58
HC        106230.52  62.47    3  106134.39  106347.7   2414.33

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4 - Test / Class: CG.C
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result    SE +/-  N   Min       Max       Per Core
HBv4      74101.94  599.32  3   73046.39  75121.55  421.03
HBv3      36681.43  503.29  3   35696.72  37354.55  305.68
HBv2      36367.35  778.45  15  31145.32  41343.61  303.06
HC        27619.05  218.98  3   27277.99  28027.52  627.71

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4 - Test / Class: FT.C
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result     SE +/-   N  Min        Max        Per Core
HBv4      230164.79  1773.50  3  226865.45  232942.14  1307.75
HBv3      102122.36  339.33   3  101535.53  102710.99  851.02
HBv2      98485.23   320.45   3  97940.68   99050.19   820.71
HC        55288.19   131.36   3  55038.7    55484.24   1256.55

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4 - Test / Class: IS.D
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result    SE +/-  N   Min       Max       Per Core
HBv4      12967.37  308.75  15  11582.83  15992.01  73.68
HBv3      5730.01   67.99   4   5554.37   5864.94   47.75
HBv2      3977.02   35.84   7   3808.28   4112.15   33.14
HC        1864.68   7.55    3   1849.95   1874.89   42.38

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4 - Test / Class: MG.C
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result     SE +/-   N   Min        Max        Per Core
HBv4      437417.16  5249.92  15  415671.76  488545.03  2485.32
HBv3      131635.41  1313.15  15  121866.81  138438.25  1096.96
HBv2      108985.72  768.30   3   107819.8   110435.47  908.21
HC        63404.01   149.23   3   63145.73   63662.67   1441.00

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks 3.4 - Test / Class: SP.C
Total Mop/s, More Is Better (Per Core: Total Mop/s Per Core, More Is Better)

Instance  Result     SE +/-   N   Min        Max        Per Core
HBv4      427298.99  2970.97  15  407516.77  448615.1   2427.84
HBv3      205795.59  1576.20  3   203506.4   208817.12  1714.96
HBv2      104771.90  324.54   3   104388.54  105417.17  873.10
HC        41543.94   105.69   3   41392.57   41747.41   944.18

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
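
One way to condense the six NPB results above into a single generational comparison is to take the HBv4-to-HBv3 ratio for each test and then the geometric mean of those ratios. The Python sketch below does that with the averaged Total Mop/s values listed above. The choice of a geometric mean and all names here are this sketch's own; the resulting figure is not one reported in the source.

```python
import statistics

# Averaged Total Mop/s per NPB test, as listed above.
npb_hbv4 = {
    "BT.C": 744413.90, "CG.C": 74101.94, "FT.C": 230164.79,
    "IS.D": 12967.37, "MG.C": 437417.16, "SP.C": 427298.99,
}
npb_hbv3 = {
    "BT.C": 313813.98, "CG.C": 36681.43, "FT.C": 102122.36,
    "IS.D": 5730.01, "MG.C": 131635.41, "SP.C": 205795.59,
}


def speedups(new, old):
    """Per-test ratio of the newer result to the older one (higher is better)."""
    return {test: new[test] / old[test] for test in new}


ratios = speedups(npb_hbv4, npb_hbv3)
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")

# The geometric mean is the usual way to average ratios (requires Python 3.8+).
overall = statistics.geometric_mean(ratios.values())
print(f"Geometric mean HBv4 / HBv3: {overall:.2f}x")
```

Every individual ratio is above 2x, with MG.C showing the largest gap between the two instances.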

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14 - ATPase Simulation - 327,506 Atoms
days/ns, Fewer Is Better (Per Core: days/ns x Core, Fewer Is Better)

Instance  Result   SE +/-   N  Min   Max   Per Core
HBv4      0.14380  0.00011  3  0.14  0.14  25.31
HBv2      0.26505  0.00069  3  0.26  0.27  31.81
HBv3      0.27111  0.00015  3  0.27  0.27  32.53
HC        0.52697  0.00060  3  0.53  0.53  23.19

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
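
Because NAMD reports days/ns, where fewer is better, the per-core figure is labelled "days/ns x Core": the result is multiplied by the core count instead of divided by it. The Python sketch below reproduces those per-core values and also converts each result to the more familiar ns/day. The names are illustrative, and the ns/day conversion is an addition of this sketch, not a figure from the source.

```python
# NAMD results (days/ns) and detected core counts, as listed above.
namd_days_per_ns = {"HBv4": 0.14380, "HBv2": 0.26505, "HBv3": 0.27111, "HC": 0.52697}
core_counts = {"HC": 44, "HBv2": 120, "HBv3": 120, "HBv4": 176}


def core_days_per_ns(days_per_ns, cores):
    """Per-core cost for a fewer-is-better metric: multiply by the core count."""
    return days_per_ns * cores


def ns_per_day(days_per_ns):
    """Invert days/ns to get simulated nanoseconds per day (more is better)."""
    return 1.0 / days_per_ns


for name, value in namd_days_per_ns.items():
    cost = core_days_per_ns(value, core_counts[name])
    print(f"{name:5s} {cost:6.2f} days/ns x core   {ns_per_day(value):5.2f} ns/day")
```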

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 128
GFLOPS/s, More Is Better (Per Core: GFLOPS/s Per Core, More Is Better)

Instance  Result  SE +/-  N   Min     Max     Per Core
HBv4      6655.2  59.23   3   6537.6  6726.5  37.81
HBv3      2273.5  20.51   9   2191.5  2339.4  18.95
HC        1284.8  13.64   15  1161.5  1351.9  29.20
HBv2      1011.4  169.50  9   644.4   2094.8  8.43

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm 2-1.17-3645 - M N K: 256
GFLOPS/s, More Is Better (Per Core: GFLOPS/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min     Max     Per Core
HBv4      6908.6  57.85   9  6502.6  7067.9  39.25
HBv3      2045.7  25.11   4  1972.1  2084.7  17.05
HBv2      1128.3  17.53   9  1046    1194.2  9.40
HC        904.1   23.39   9  779.8   982.1   20.55

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm 2-1.17-3645 - M N K: 32
GFLOPS/s, More Is Better (Per Core: GFLOPS/s Per Core, More Is Better)

Instance  Result  SE +/-  N   Min     Max     Per Core
HBv4      6163.0  87.98   3   5987.5  6262.1  35.02
HBv3      1438.1  38.99   12  1187.3  1602.2  11.98
HC        384.9   3.15    9   369.1   396.3   8.75
HBv2      164.8   1.72    3   162.7   168.2   1.37

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

libxsmm 2-1.17-3645 - M N K: 64
GFLOPS/s, More Is Better (Per Core: GFLOPS/s Per Core, More Is Better)

Instance  Result  SE +/-  N   Min     Max     Per Core
HBv4      5898.2  74.65   3   5766.8  6025.3  33.51
HBv3      2413.7  8.24    3   2397.2  2422.3  20.11
HC        748.1   7.70    3   732.7   755.8   17.00
HBv2      331.4   2.64    15  314.3   354.3   2.76

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lpthread -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -pedantic -O2 -fopenmp -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -march=core-avx2

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Laghos 3.1 - Test: Triple Point Problem
Major Kernels Total Rate, More Is Better (Per Core: Major Kernels Total Rate Per Core, More Is Better)

Instance  Result  SE +/-  N  Min     Max     Per Core
HBv4      228.15  1.25    3  226.1   230.43  1.30
HBv3      192.74  0.38    3  192.08  193.41  1.61
HBv2      183.82  0.57    3  182.76  184.72  1.53
HC        156.52  0.08    3  156.43  156.69  3.56

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better (Per Core: Major Kernels Total Rate Per Core, More Is Better)

Instance  Result  SE +/-  N  Min     Max     Per Core
HBv4      402.94  0.78    3  401.38  403.76  2.29
HBv3      361.81  0.15    3  361.51  361.97  3.02
HBv2      345.14  3.57    5  331     349.78  2.88
HC        247.49  1.35    3  244.82  249.17  5.62

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N   Min     Max     Per Core
HBv4      256.35  1.07    3   254.28  257.85  1.4600
HBv3      103.51  1.41    15  88.39   109.14  0.8626
HBv2      91.54   0.67    15  87.54   95.47   0.7628
HC        58.36   0.07    3   58.27   58.49   1.3300

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min     Max     Per Core
HBv4      355.86  1.24    3  353.48  357.66  2.020
HBv3      135.69  0.93    3  134.06  137.3   1.130
HBv2      95.88   0.47    3  95.31   96.8    0.799
HC        62.98   0.04    3  62.9    63.02   1.430

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
GFLOP/s, More Is Better (Per Core: GFLOP/s Per Core, More Is Better)

Instance  Result  SE +/-  N  Min     Max     Per Core
HBv4      622.58  2.25    3  618.16  625.55  3.54
HBv3      254.25  2.52    6  241.85  257.94  2.12
HBv2      191.78  1.03    3  189.88  193.44  1.60
HC        114.03  0.09    3  113.9   114.19  2.59

Detected core counts: HC 44, HBv2 120, HBv3 120, HBv4 176.
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 159.18 (SE +/- 0.34, N = 3; Min: 158.75 / Max: 159.85); Per Core: 0.9044 (176 cores)
HBv3: 57.33 (SE +/- 0.07, N = 3; Min: 57.24 / Max: 57.46); Per Core: 0.4778 (120 cores)
HBv2: 47.61 (SE +/- 0.09, N = 3; Min: 47.43 / Max: 47.73); Per Core: 0.3967 (120 cores)
HC: 33.52 (SE +/- 0.03, N = 3; Min: 33.48 / Max: 33.57); Per Core: 0.7618 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 244.34 (SE +/- 3.04, N = 4; Min: 236.73 / Max: 251.53); Per Core: 1.3900 (176 cores)
HBv3: 103.41 (SE +/- 0.77, N = 15; Min: 100.4 / Max: 109.56); Per Core: 0.8617 (120 cores)
HBv2: 91.26 (SE +/- 0.61, N = 15; Min: 85.05 / Max: 93.88); Per Core: 0.7605 (120 cores)
HC: 59.73 (SE +/- 0.02, N = 3; Min: 59.7 / Max: 59.76); Per Core: 1.3600 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 323.36 (SE +/- 0.80, N = 3; Min: 321.77 / Max: 324.36); Per Core: 1.8400 (176 cores)
HBv3: 123.24 (SE +/- 0.73, N = 3; Min: 121.78 / Max: 124.03); Per Core: 1.0300 (120 cores)
HBv2: 93.79 (SE +/- 0.34, N = 3; Min: 93.15 / Max: 94.28); Per Core: 0.7816 (120 cores)
HC: 57.76 (SE +/- 0.02, N = 3; Min: 57.73 / Max: 57.79); Per Core: 1.3100 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 261.90 (SE +/- 5.66, N = 15; Min: 224.36 / Max: 295.7); Per Core: 1.4900 (176 cores)
HBv3: 103.25 (SE +/- 0.75, N = 15; Min: 97.13 / Max: 108.35); Per Core: 0.8604 (120 cores)
HBv2: 91.92 (SE +/- 1.31, N = 3; Min: 89.33 / Max: 93.58); Per Core: 0.7660 (120 cores)
HC: 57.31 (SE +/- 0.25, N = 3; Min: 56.81 / Max: 57.64); Per Core: 1.3000 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 314.34 (SE +/- 0.50, N = 3; Min: 313.51 / Max: 315.25); Per Core: 1.7900 (176 cores)
HBv3: 121.28 (SE +/- 0.86, N = 3; Min: 119.67 / Max: 122.61); Per Core: 1.0100 (120 cores)
HBv2: 91.48 (SE +/- 0.15, N = 3; Min: 91.27 / Max: 91.78); Per Core: 0.7623 (120 cores)
HC: 60.88 (SE +/- 0.05, N = 3; Min: 60.78 / Max: 60.95); Per Core: 1.3800 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 596.23 (SE +/- 2.14, N = 3; Min: 593.05 / Max: 600.29); Per Core: 3.39 (176 cores)
HBv3: 232.17 (SE +/- 1.85, N = 3; Min: 228.68 / Max: 235); Per Core: 1.93 (120 cores)
HBv2: 190.95 (SE +/- 2.04, N = 3; Min: 187.44 / Max: 194.5); Per Core: 1.59 (120 cores)
HC: 110.05 (SE +/- 0.06, N = 3; Min: 109.93 / Max: 110.13); Per Core: 2.50 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 154.65 (SE +/- 0.27, N = 3; Min: 154.16 / Max: 155.11); Per Core: 0.8787 (176 cores)
HBv3: 56.22 (SE +/- 0.04, N = 3; Min: 56.14 / Max: 56.26); Per Core: 0.4685 (120 cores)
HBv2: 46.98 (SE +/- 0.05, N = 3; Min: 46.9 / Max: 47.05); Per Core: 0.3915 (120 cores)
HC: 31.57 (SE +/- 0.02, N = 3; Min: 31.53 / Max: 31.6); Per Core: 0.7175 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 264.95 (SE +/- 4.27, N = 12; Min: 239.43 / Max: 281.79); Per Core: 1.5100 (176 cores)
HBv3: 102.70 (SE +/- 0.80, N = 15; Min: 97.65 / Max: 107.66); Per Core: 0.8559 (120 cores)
HBv2: 93.31 (SE +/- 1.10, N = 4; Min: 90.82 / Max: 96.2); Per Core: 0.7776 (120 cores)
HC: 60.57 (SE +/- 0.08, N = 3; Min: 60.41 / Max: 60.67); Per Core: 1.3800 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 311.80 (SE +/- 1.60, N = 3; Min: 309.28 / Max: 314.77); Per Core: 1.7700 (176 cores)
HBv3: 117.73 (SE +/- 0.40, N = 3; Min: 116.97 / Max: 118.33); Per Core: 0.9811 (120 cores)
HBv2: 94.53 (SE +/- 0.25, N = 3; Min: 94.04 / Max: 94.85); Per Core: 0.7878 (120 cores)
HC: 59.82 (SE +/- 0.05, N = 3; Min: 59.76 / Max: 59.91); Per Core: 1.3600 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 255.97 (SE +/- 3.64, N = 15; Min: 232.66 / Max: 287.94); Per Core: 1.4500 (176 cores)
HBv3: 105.09 (SE +/- 1.13, N = 3; Min: 102.88 / Max: 106.61); Per Core: 0.8758 (120 cores)
HBv2: 90.79 (SE +/- 0.74, N = 15; Min: 85.49 / Max: 94.98); Per Core: 0.7566 (120 cores)
HC: 58.55 (SE +/- 0.16, N = 3; Min: 58.23 / Max: 58.77); Per Core: 1.3300 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 355.51 (SE +/- 1.18, N = 3; Min: 353.91 / Max: 357.82); Per Core: 2.0200 (176 cores)
HBv3: 135.95 (SE +/- 0.58, N = 3; Min: 134.82 / Max: 136.75); Per Core: 1.1300 (120 cores)
HBv2: 96.49 (SE +/- 0.05, N = 3; Min: 96.4 / Max: 96.54); Per Core: 0.8041 (120 cores)
HC: 62.90 (SE +/- 0.04, N = 3; Min: 62.84 / Max: 62.98); Per Core: 1.4300 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 427.10 (SE +/- 10.91, N = 15; Min: 364.81 / Max: 510.37); Per Core: 2.43 (176 cores)
HBv3: 221.86 (SE +/- 3.45, N = 15; Min: 202.54 / Max: 240.96); Per Core: 1.85 (120 cores)
HBv2: 200.04 (SE +/- 3.34, N = 12; Min: 180.58 / Max: 215.7); Per Core: 1.67 (120 cores)
HC: 122.77 (SE +/- 0.53, N = 3; Min: 121.95 / Max: 123.75); Per Core: 2.79 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 624.95 (SE +/- 4.23, N = 3; Min: 616.54 / Max: 629.91); Per Core: 3.55 (176 cores)
HBv3: 257.42 (SE +/- 2.91, N = 3; Min: 251.62 / Max: 260.68); Per Core: 2.15 (120 cores)
HBv2: 191.14 (SE +/- 1.39, N = 3; Min: 188.61 / Max: 193.38); Per Core: 1.59 (120 cores)
HC: 113.94 (SE +/- 0.18, N = 3; Min: 113.71 / Max: 114.3); Per Core: 2.59 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 159.26 (SE +/- 0.05, N = 3; Min: 159.17 / Max: 159.33); Per Core: 0.9049 (176 cores)
HBv3: 57.23 (SE +/- 0.04, N = 3; Min: 57.15 / Max: 57.28); Per Core: 0.4769 (120 cores)
HBv2: 47.37 (SE +/- 0.02, N = 3; Min: 47.34 / Max: 47.41); Per Core: 0.3947 (120 cores)
HC: 33.55 (SE +/- 0.02, N = 3; Min: 33.52 / Max: 33.57); Per Core: 0.7626 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 247.73 (SE +/- 4.85, N = 15; Min: 209.23 / Max: 279.64); Per Core: 1.4100 (176 cores)
HBv3: 105.36 (SE +/- 1.07, N = 6; Min: 102.03 / Max: 109.89); Per Core: 0.8780 (120 cores)
HBv2: 92.13 (SE +/- 1.33, N = 3; Min: 90.79 / Max: 94.78); Per Core: 0.7677 (120 cores)
HC: 59.55 (SE +/- 0.27, N = 3; Min: 59.24 / Max: 60.1); Per Core: 1.3500 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 323.70 (SE +/- 0.96, N = 3; Min: 322.27 / Max: 325.51); Per Core: 1.8400 (176 cores)
HBv3: 124.60 (SE +/- 0.05, N = 3; Min: 124.52 / Max: 124.68); Per Core: 1.0400 (120 cores)
HBv2: 93.26 (SE +/- 0.23, N = 3; Min: 92.87 / Max: 93.67); Per Core: 0.7771 (120 cores)
HC: 57.92 (SE +/- 0.06, N = 3; Min: 57.83 / Max: 58.03); Per Core: 1.3200 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 273.12 (SE +/- 4.03, N = 14; Min: 242.69 / Max: 295.76); Per Core: 1.5500 (176 cores)
HBv3: 106.63 (SE +/- 1.05, N = 3; Min: 104.73 / Max: 108.34); Per Core: 0.8886 (120 cores)
HBv2: 88.61 (SE +/- 1.12, N = 15; Min: 74.31 / Max: 93.3); Per Core: 0.7384 (120 cores)
HC: 57.13 (SE +/- 0.12, N = 3; Min: 56.9 / Max: 57.27); Per Core: 1.3000 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 315.98 (SE +/- 1.65, N = 3; Min: 312.74 / Max: 318.07); Per Core: 1.8000 (176 cores)
HBv3: 120.96 (SE +/- 0.04, N = 3; Min: 120.88 / Max: 121.02); Per Core: 1.0100 (120 cores)
HBv2: 91.43 (SE +/- 0.07, N = 3; Min: 91.29 / Max: 91.55); Per Core: 0.7619 (120 cores)
HC: 60.82 (SE +/- 0.06, N = 3; Min: 60.75 / Max: 60.95); Per Core: 1.3800 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 590.93 (SE +/- 2.49, N = 3; Min: 587.06 / Max: 595.58); Per Core: 3.36 (176 cores)
HBv3: 233.80 (SE +/- 0.15, N = 3; Min: 233.51 / Max: 234.02); Per Core: 1.95 (120 cores)
HBv2: 189.21 (SE +/- 1.02, N = 3; Min: 187.17 / Max: 190.36); Per Core: 1.58 (120 cores)
HC: 110.20 (SE +/- 0.10, N = 3; Min: 110.1 / Max: 110.39); Per Core: 2.50 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 154.57 (SE +/- 0.03, N = 3; Min: 154.52 / Max: 154.61); Per Core: 0.8782 (176 cores)
HBv3: 56.27 (SE +/- 0.03, N = 3; Min: 56.21 / Max: 56.32); Per Core: 0.4689 (120 cores)
HBv2: 46.93 (SE +/- 0.09, N = 3; Min: 46.8 / Max: 47.1); Per Core: 0.3911 (120 cores)
HC: 31.58 (SE +/- 0.02, N = 3; Min: 31.55 / Max: 31.62); Per Core: 0.7178 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256
GFLOP/s, More Is Better
HBv4: 258.72 (SE +/- 2.84, N = 15; Min: 239.08 / Max: 284.57); Per Core: 1.4700 (176 cores)
HBv3: 105.50 (SE +/- 0.81, N = 15; Min: 99.82 / Max: 110.78); Per Core: 0.8792 (120 cores)
HBv2: 92.39 (SE +/- 1.27, N = 3; Min: 89.86 / Max: 93.77); Per Core: 0.7699 (120 cores)
HC: 60.89 (SE +/- 0.19, N = 3; Min: 60.52 / Max: 61.17); Per Core: 1.3800 (44 cores)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
GFLOP/s, More Is Better
HBv4: 311.27 (SE +/- 0.81, N = 3; Min: 310.4 / Max: 312.88); Per Core: 1.7700 (176 cores)
HBv3: 118.24 (SE +/- 0.49, N = 3; Min: 117.66 / Max: 119.21); Per Core: 0.9853 (120 cores)
HBv2: 95.20 (SE +/- 0.16, N = 3; Min: 95.03 / Max: 95.52); Per Core: 0.7933 (120 cores)
HC: 59.90 (SE +/- 0.03, N = 3; Min: 59.87 / Max: 59.95); Per Core: 1.3600 (44 cores)
(CXX) g++ options: -O3 -pthread

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Pennant 1.0.1 - Test: sedovbig
Hydro Cycle Time - Seconds, Fewer Is Better
HBv4: 3.581391 (SE +/- 0.018282, N = 3; Min: 3.56 / Max: 3.62); Seconds x Core: 630.33 (176 cores)
HBv2: 5.915805 (SE +/- 0.011742, N = 3; Min: 5.89 / Max: 5.94); Seconds x Core: 709.90 (120 cores)
HBv3: 6.277107 (SE +/- 0.027453, N = 3; Min: 6.23 / Max: 6.33); Seconds x Core: 753.25 (120 cores)
HC: 25.019560 (SE +/- 0.026763, N = 3; Min: 24.98 / Max: 25.07); Seconds x Core: 1100.86 (44 cores)
(CXX) g++ options: -fopenmp -pthread -lmpi
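For this "Fewer Is Better" metric, the per-core graph is labelled "Seconds x Core", which suggests the cycle time is multiplied by the detected core count rather than divided by it. A minimal Python sketch under that assumption (the helper name `seconds_x_core` is hypothetical, not a Phoronix Test Suite function) lands within about 0.01 of the published sedovbig figures:

```python
# Hypothetical helper: assumes "Seconds x Core" is cycle time multiplied by core count.
# Times (seconds) and detected core counts are copied from the Pennant sedovbig graph.
SEDOVBIG = {
    "HBv4": (3.581391, 176),
    "HBv2": (5.915805, 120),
    "HBv3": (6.277107, 120),
    "HC": (25.019560, 44),
}


def seconds_x_core(seconds: float, cores: int) -> float:
    """Scale a run time by core count so that lower still means better."""
    return seconds * cores


for system, (seconds, cores) in SEDOVBIG.items():
    print(f"{system}: {seconds_x_core(seconds, cores):.2f} seconds x core")
```

The small residual on HBv4 (roughly 630.32 computed against 630.33 published) is presumably down to rounding of the displayed average.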

Pennant 1.0.1 - Test: leblancbig
Hydro Cycle Time - Seconds, Fewer Is Better
HBv4: 2.122074 (SE +/- 0.029043, N = 3; Min: 2.06 / Max: 2.16); Seconds x Core: 373.49 (176 cores)
HBv2: 3.466885 (SE +/- 0.009233, N = 3; Min: 3.45 / Max: 3.48); Seconds x Core: 416.03 (120 cores)
HBv3: 3.649317 (SE +/- 0.006682, N = 3; Min: 3.64 / Max: 3.66); Seconds x Core: 437.92 (120 cores)
HC: 10.645480 (SE +/- 0.017495, N = 3; Min: 10.62 / Max: 10.68); Seconds x Core: 468.40 (44 cores)
(CXX) g++ options: -fopenmp -pthread -lmpi

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM 1.0 - Sustained Floating-Point Rate
GFLOP/s, More Is Better
HBv4: 52.802440 (SE +/- 0.581762, N = 5; Min: 51.88 / Max: 55.03); Per Core: 0.3000 (176 cores)
HBv3: 25.048352 (SE +/- 0.146977, N = 3; Min: 24.87 / Max: 25.34); Per Core: 0.2087 (120 cores)
HC: 14.072027 (SE +/- 0.474074, N = 12; Min: 9.89 / Max: 15.85); Per Core: 0.3198 (44 cores)
HBv2: 6.395415 (SE +/- 0.275809, N = 12; Min: 5.19 / Max: 8.05); Per Core: 0.0533 (120 cores)
(CC) gcc options: -O3 -march=native -fopenmp

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 2.0 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 3.11 (SE +/- 0.03, N = 3; Min: 3.05 / Max: 3.16); Per Core: 0.0177 (176 cores)
HBv2: 2.03 (SE +/- 0.01, N = 3; Min: 2.02 / Max: 2.05); Per Core: 0.0169 (120 cores)
HC: 1.85 (SE +/- 0.01, N = 3; Min: 1.84 / Max: 1.86); Per Core: 0.0420 (44 cores)
HBv3: 1.72 (SE +/- 0.02, N = 4; Min: 1.67 / Max: 1.77); Per Core: 0.0143 (120 cores)

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 3.08 (SE +/- 0.02, N = 3; Min: 3.06 / Max: 3.13); Per Core: 0.0175 (176 cores)
HBv2: 2.01 (SE +/- 0.01, N = 3; Min: 2 / Max: 2.02); Per Core: 0.0168 (120 cores)
HC: 1.85 (SE +/- 0.00, N = 3; Min: 1.84 / Max: 1.86); Per Core: 0.0420 (44 cores)
HBv3: 1.69 (SE +/- 0.01, N = 15; Min: 1.6 / Max: 1.76); Per Core: 0.0141 (120 cores)

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, More Is Better
HBv4: 1.32 (SE +/- 0.01, N = 3; Min: 1.31 / Max: 1.33); Per Core: 0.0075 (176 cores)
HBv2: 0.96 (SE +/- 0.01, N = 15; Min: 0.92 / Max: 1.03); Per Core: 0.0080 (120 cores)
HC: 0.87 (SE +/- 0.00, N = 3; Min: 0.86 / Max: 0.87); Per Core: 0.0198 (44 cores)
HBv3: 0.80 (SE +/- 0.01, N = 3; Min: 0.79 / Max: 0.81); Per Core: 0.0067 (120 cores)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
HBv4: 36.65480 (SE +/- 0.04011, N = 3; Min: 36.59 / Max: 36.73); Per Core: 0.2083 (176 cores)
HBv3: 24.47100 (SE +/- 0.00987, N = 3; Min: 24.46 / Max: 24.49); Per Core: 0.2039 (120 cores)
HBv2: 22.36680 (SE +/- 0.00858, N = 3; Min: 22.35 / Max: 22.38); Per Core: 0.1864 (120 cores)
HC: 8.99618 (SE +/- 0.01510, N = 3; Min: 8.97 / Max: 9.01); Per Core: 0.2045 (44 cores)

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time
Items Per Second, More Is Better
HBv4: 36.54460 (SE +/- 0.05762, N = 3; Min: 36.43 / Max: 36.61); Per Core: 0.2076 (176 cores)
HBv3: 24.21970 (SE +/- 0.00564, N = 3; Min: 24.21 / Max: 24.23); Per Core: 0.2018 (120 cores)
HBv2: 22.17470 (SE +/- 0.02944, N = 3; Min: 22.12 / Max: 22.21); Per Core: 0.1848 (120 cores)
HC: 8.87831 (SE +/- 0.05412, N = 3; Min: 8.77 / Max: 8.95); Per Core: 0.2018 (44 cores)

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time
Items Per Second, More Is Better
HBv4: 208.05 (SE +/- 0.81, N = 3; Min: 206.5 / Max: 209.25); Per Core: 1.18 (176 cores)
HBv3: 167.50 (SE +/- 1.50, N = 7; Min: 158.58 / Max: 169.91); Per Core: 1.40 (120 cores)
HBv2: 162.45 (SE +/- 0.83, N = 3; Min: 160.92 / Max: 163.75); Per Core: 1.35 (120 cores)
HC: 96.76 (SE +/- 7.22, N = 9; Min: 50.4 / Max: 122.91); Per Core: 2.20 (44 cores)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
HBv4: 38.07690 (SE +/- 0.02835, N = 3; Min: 38.02 / Max: 38.11); Per Core: 0.2163 (176 cores)
HBv3: 11.75010 (SE +/- 0.01464, N = 3; Min: 11.72 / Max: 11.77); Per Core: 0.0979 (120 cores)
HC: 9.52293 (SE +/- 0.03191, N = 3; Min: 9.47 / Max: 9.58); Per Core: 0.2164 (44 cores)
HBv2: 8.66888 (SE +/- 0.15055, N = 15; Min: 7.13 / Max: 9.27); Per Core: 0.0722 (120 cores)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
  HBv4: 37.06240 (SE +/- 0.12574, N = 3; Min: 36.92 / Avg: 37.06 / Max: 37.31)
  HBv3: 11.17230 (SE +/- 0.02977, N = 3; Min: 11.13 / Avg: 11.17 / Max: 11.23)
  HC: 9.02689 (SE +/- 0.01641, N = 3; Min: 9 / Avg: 9.03 / Max: 9.06)
  HBv2: 8.32323 (SE +/- 0.13284, N = 15; Min: 6.92 / Avg: 8.32 / Max: 8.82)
Performance Per Core - Items Per Second Per Core, More Is Better
  HBv4: 0.2106 (detected core count of 176)
  HC: 0.2052 (detected core count of 44)
  HBv3: 0.0931 (detected core count of 120)
  HBv2: 0.0694 (detected core count of 120)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
  HBv4: 32.58 (SE +/- 0.08, N = 3; Min: 32.45 / Avg: 32.58 / Max: 32.72)
  HBv3: 14.61 (SE +/- 0.02, N = 3; Min: 14.59 / Avg: 14.61 / Max: 14.64)
  HBv2: 13.94 (SE +/- 0.05, N = 3; Min: 13.84 / Avg: 13.94 / Max: 13.99)
  HC: 10.06 (SE +/- 0.02, N = 3; Min: 10.03 / Avg: 10.06 / Max: 10.09)
Performance Per Core - Items Per Second Per Core, More Is Better
  HC: 0.2287 (detected core count of 44)
  HBv4: 0.1851 (detected core count of 176)
  HBv3: 0.1217 (detected core count of 120)
  HBv2: 0.1162 (detected core count of 120)

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression 22.01 - Test: Compression Rating
MIPS, More Is Better
  HBv4: 1083523 (SE +/- 4158.65, N = 3; Min: 1075330 / Avg: 1083523.33 / Max: 1088859)
  HBv3: 566595 (SE +/- 7198.45, N = 3; Min: 557703 / Avg: 566595.33 / Max: 580847)
  HBv2: 501534 (SE +/- 3504.63, N = 3; Min: 494658 / Avg: 501534 / Max: 506150)
  HC: 216451 (SE +/- 672.17, N = 3; Min: 215107 / Avg: 216451.33 / Max: 217124)
  (CXX) g++ options: -lpthread -ldl -O2 -fPIC
Performance Per Core - MIPS Per Core, More Is Better
  HBv4: 6156.38 (detected core count of 176)
  HC: 4919.34 (detected core count of 44)
  HBv3: 4721.63 (detected core count of 120)
  HBv2: 4179.45 (detected core count of 120)

7-Zip Compression 22.01 - Test: Decompression Rating
MIPS, More Is Better
  HBv4: 742859 (SE +/- 8621.97, N = 3; Min: 727340 / Avg: 742858.67 / Max: 757129)
  HBv3: 406516 (SE +/- 3365.82, N = 3; Min: 400081 / Avg: 406516 / Max: 411445)
  HBv2: 388577 (SE +/- 10621.28, N = 3; Min: 369434 / Avg: 388577.33 / Max: 406123)
  HC: 150841 (SE +/- 300.63, N = 3; Min: 150363 / Avg: 150841.33 / Max: 151396)
  (CXX) g++ options: -lpthread -ldl -O2 -fPIC
Performance Per Core - MIPS Per Core, More Is Better
  HBv4: 4220.79 (detected core count of 176)
  HC: 3428.20 (detected core count of 44)
  HBv3: 3387.63 (detected core count of 120)
  HBv2: 3238.14 (detected core count of 120)

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript runtime built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation 19.8.1 - Time To Compile
Seconds, Fewer Is Better
  HBv4: 150.56 (SE +/- 2.23, N = 12; Min: 142.56 / Avg: 150.56 / Max: 162.06)
  HBv3: 185.57 (SE +/- 1.46, N = 3; Min: 182.65 / Avg: 185.57 / Max: 187.16)
  HBv2: 194.37 (SE +/- 1.32, N = 3; Min: 192.29 / Avg: 194.37 / Max: 196.81)
  HC: 330.61 (SE +/- 2.37, N = 3; Min: 326.07 / Avg: 330.61 / Max: 334.06)
Performance Per Core - Seconds x Core, Fewer Is Better
  HC: 14546.97 (detected core count of 44)
  HBv3: 22268.04 (detected core count of 120)
  HBv2: 23324.04 (detected core count of 120)
  HBv4: 26498.21 (detected core count of 176)

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HBv4: 533.49 (SE +/- 1.90, N = 3; Min: 530.55 / Avg: 533.49 / Max: 537.05)
  HC: 707.32 (SE +/- 1.51, N = 3; Min: 704.5 / Avg: 707.32 / Max: 709.66)
  HBv3: 886.81 (SE +/- 6.66, N = 3; Min: 874.05 / Avg: 886.81 / Max: 896.51)
  HBv2: 1367.73 (SE +/- 13.52, N = 15; Min: 1308.08 / Avg: 1367.73 / Max: 1490.48)
  Reported MIN values (three listed for four systems): 518.68, 687.14, 849.06
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Performance Per Core - ms x Core, Fewer Is Better
  HC: 31122.17 (detected core count of 44)
  HBv4: 93894.94 (detected core count of 176)
  HBv3: 106417.20 (detected core count of 120)
  HBv2: 164127.60 (detected core count of 120)

oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  HBv4: 411.23 (SE +/- 3.60, N = 8; Min: 397.55 / Avg: 411.23 / Max: 424.89)
  HC: 442.47 (SE +/- 1.89, N = 3; Min: 438.95 / Avg: 442.47 / Max: 445.42)
  HBv3: 529.97 (SE +/- 4.36, N = 3; Min: 523.55 / Avg: 529.97 / Max: 538.28)
  HBv2: 910.94 (SE +/- 9.54, N = 15; Min: 845.56 / Avg: 910.94 / Max: 969.22)
  Reported MIN values (two listed for four systems): 429.93, 469.93
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Performance Per Core - ms x Core, Fewer Is Better
  HC: 19468.72 (detected core count of 44)
  HBv3: 63596.76 (detected core count of 120)
  HBv4: 72377.18 (detected core count of 176)
  HBv2: 109312.44 (detected core count of 120)

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HBv4: 5412900000 (SE +/- 24008123.63, N = 3; Min: 5380200000 / Avg: 5412900000 / Max: 5459700000)
  HBv2: 4309133333 (SE +/- 14518991.39, N = 3; Min: 4286000000 / Avg: 4309133333.33 / Max: 4335900000)
  HBv3: 4216966667 (SE +/- 6263474.36, N = 3; Min: 4207300000 / Avg: 4216966666.67 / Max: 4228700000)
  HC: 1570633333 (SE +/- 4733333.33, N = 3; Min: 1565900000 / Avg: 1570633333.33 / Max: 1580100000)
  (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Performance Per Core - samples/s Per Core, More Is Better
  HBv2: 35909444.44 (detected core count of 120)
  HC: 35696212.11 (detected core count of 44)
  HBv3: 35141388.89 (detected core count of 120)
  HBv4: 30755113.64 (detected core count of 176)

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
  HBv4: 6181766667 (SE +/- 6999365.05, N = 3; Min: 6170100000 / Avg: 6181766666.67 / Max: 6194300000)
  HBv2: 4275533333 (SE +/- 25439885.57, N = 3; Min: 4249100000 / Avg: 4275533333.33 / Max: 4326400000)
  HBv3: 3864000000 (SE +/- 2858321.19, N = 3; Min: 3858500000 / Avg: 3864000000 / Max: 3868100000)
  HC: 1536633333 (SE +/- 8873431.00, N = 3; Min: 1519700000 / Avg: 1536633333.33 / Max: 1549700000)
  (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Performance Per Core - samples/s Per Core, More Is Better
  HBv2: 35629444.44 (detected core count of 120)
  HBv4: 35123674.24 (detected core count of 176)
  HC: 34923484.84 (detected core count of 44)
  HBv3: 32200000.00 (detected core count of 120)

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
  HBv4: 7095033333 (SE +/- 36788419.07, N = 3; Min: 7033100000 / Avg: 7095033333.33 / Max: 7160400000)
  HBv2: 4350100000 (SE +/- 8195730.60, N = 3; Min: 4338700000 / Avg: 4350100000 / Max: 4366000000)
  HBv3: 4281533333 (SE +/- 8996542.55, N = 3; Min: 4264000000 / Avg: 4281533333.33 / Max: 4293800000)
  HC: 1683033333 (SE +/- 7033807.25, N = 3; Min: 1669000000 / Avg: 1683033333.33 / Max: 1690900000)
  (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Performance Per Core - samples/s Per Core, More Is Better
  HBv4: 40312689.39 (detected core count of 176)
  HC: 38250757.57 (detected core count of 44)
  HBv2: 36250833.33 (detected core count of 120)
  HBv3: 35679444.44 (detected core count of 120)

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
  HBv4: 2221966667 (SE +/- 5336145.09, N = 3; Min: 2214000000 / Avg: 2221966666.67 / Max: 2232100000)
  HBv2: 924243333 (SE +/- 3265385.80, N = 3; Min: 919240000 / Avg: 924243333.33 / Max: 930380000)
  HBv3: 814950000 (SE +/- 1919487.78, N = 3; Min: 811460000 / Avg: 814950000 / Max: 818080000)
  HC: 544626667 (SE +/- 2270626.44, N = 3; Min: 540260000 / Avg: 544626666.67 / Max: 547890000)
  (CC) gcc options: -O3 -march=native -pthread -lm -lc -lliquid
Performance Per Core - samples/s Per Core, More Is Better
  HBv4: 12624810.61 (detected core count of 176)
  HC: 12377878.80 (detected core count of 44)
  HBv2: 7702027.78 (detected core count of 120)
  HBv3: 6791250.00 (detected core count of 120)

PostgreSQL

This is a benchmark of PostgreSQL using the integrated pgbench tool to drive the database benchmarks. Learn more via the OpenBenchmarking.org test page.

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only
TPS, More Is Better
  HBv4: 3161848 (SE +/- 3042.04, N = 3; Min: 3156146.16 / Avg: 3161848.42 / Max: 3166536.73)
  HBv2: 2467328 (SE +/- 4710.42, N = 3; Min: 2458848.41 / Avg: 2467328.26 / Max: 2475122.53)
  HBv3: 2434749 (SE +/- 28428.57, N = 4; Min: 2381365.28 / Avg: 2434748.65 / Max: 2502058.08)
  HC: 1353510 (SE +/- 2849.38, N = 3; Min: 1349873.15 / Avg: 1353509.9 / Max: 1359127.94)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Performance Per Core - TPS Per Core, More Is Better
  HC: 30761.59 (detected core count of 44)
  HBv2: 20561.07 (detected core count of 120)
  HBv3: 20289.58 (detected core count of 120)
  HBv4: 17965.05 (detected core count of 176)

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HBv4: 0.158 (SE +/- 0.000, N = 3; Min: 0.16 / Avg: 0.16 / Max: 0.16)
  HBv2: 0.203 (SE +/- 0.000, N = 3; Min: 0.2 / Avg: 0.2 / Max: 0.2)
  HBv3: 0.206 (SE +/- 0.002, N = 4; Min: 0.2 / Avg: 0.21 / Max: 0.21)
  HC: 0.369 (SE +/- 0.001, N = 3; Min: 0.37 / Avg: 0.37 / Max: 0.37)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Performance Per Core - ms x Core, Fewer Is Better
  HC: 16.24 (detected core count of 44)
  HBv2: 24.36 (detected core count of 120)
  HBv3: 24.72 (detected core count of 120)
  HBv4: 27.81 (detected core count of 176)

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only
TPS, More Is Better
  HBv4: 3146173 (SE +/- 2972.36, N = 3; Min: 3142297.34 / Avg: 3146173.31 / Max: 3152014.82)
  HBv2: 2481320 (SE +/- 9212.17, N = 3; Min: 2463458.88 / Avg: 2481320.36 / Max: 2494164.91)
  HBv3: 2478917 (SE +/- 13675.06, N = 3; Min: 2452680.26 / Avg: 2478917.08 / Max: 2498724.57)
  HC: 1159492 (SE +/- 2818.34, N = 3; Min: 1154151.15 / Avg: 1159492.38 / Max: 1163722.66)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Performance Per Core - TPS Per Core, More Is Better
  HC: 26352.09 (detected core count of 44)
  HBv2: 20677.67 (detected core count of 120)
  HBv3: 20657.64 (detected core count of 120)
  HBv4: 17875.98 (detected core count of 176)

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency
ms, Fewer Is Better
  HBv4: 0.254 (SE +/- 0.000, N = 3; Min: 0.25 / Avg: 0.25 / Max: 0.26)
  HBv2: 0.323 (SE +/- 0.001, N = 3; Min: 0.32 / Avg: 0.32 / Max: 0.33)
  HBv3: 0.323 (SE +/- 0.002, N = 3; Min: 0.32 / Avg: 0.32 / Max: 0.33)
  HC: 0.690 (SE +/- 0.002, N = 3; Min: 0.69 / Avg: 0.69 / Max: 0.69)
  (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
Performance Per Core - ms x Core, Fewer Is Better
  HC: 30.36 (detected core count of 44)
  HBv2: 38.76 (detected core count of 120)
  HBv3: 38.76 (detected core count of 120)
  HBv4: 44.70 (detected core count of 176)
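
The read-only latency results track the TPS results closely. As a rough cross-check, assuming each client keeps exactly one query in flight, average latency is approximately the client count divided by the throughput. The sketch below applies that relation to a few of the PostgreSQL figures above; the function name is illustrative and not part of pgbench, and small differences from the reported values are expected because the page averages per-run results.

```python
def estimated_latency_ms(clients: int, tps: float) -> float:
    """Approximate average latency in ms, assuming one in-flight query per client."""
    return clients / tps * 1000.0


# (clients, TPS, reported average latency in ms), copied from the results above.
results = {
    "HBv4, 500 clients": (500, 3161848, 0.158),
    "HC, 500 clients": (500, 1353510, 0.369),
    "HBv4, 800 clients": (800, 3146173, 0.254),
    "HC, 800 clients": (800, 1159492, 0.690),
}

for label, (clients, tps, reported) in results.items():
    estimate = estimated_latency_ms(clients, tps)
    print(f"{label}: estimated {estimate:.3f} ms, reported {reported:.3f} ms")
```

The estimates agree with the reported latencies to within a few thousandths of a millisecond, which suggests the latency charts carry the same information as the TPS charts rather than an independent measurement.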

Blender

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only
Seconds, Fewer Is Better
  HBv4: 10.11 (SE +/- 0.08, N = 3; Min: 10.02 / Avg: 10.11 / Max: 10.26)
  HBv3: 19.43 (SE +/- 0.10, N = 3; Min: 19.26 / Avg: 19.43 / Max: 19.62)
  HBv2: 19.58 (SE +/- 0.16, N = 3; Min: 19.38 / Avg: 19.58 / Max: 19.89)
  HC: 49.95 (SE +/- 0.36, N = 3; Min: 49.58 / Avg: 49.95 / Max: 50.67)
Performance Per Core - Seconds x Core, Fewer Is Better
  HBv4: 1779.36 (detected core count of 176)
  HC: 2197.80 (detected core count of 44)
  HBv3: 2331.60 (detected core count of 120)
  HBv2: 2349.60 (detected core count of 120)
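
The "SE +/-" figure attached to each result is a standard error. A minimal sketch of how such a value can be derived, assuming the usual definition (sample standard deviation divided by the square root of N). The page reports only the minimum, average, and maximum of the three HBv4 BMW27 runs, so the middle run time used here is inferred from those three numbers and should be treated as an assumption.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# HBv4 BMW27 run times in seconds. 10.02 (min) and 10.26 (max) come from the
# chart; 10.05 is inferred as 3 * 10.11 - 10.02 - 10.26 and is an assumption.
hbv4_bmw27 = [10.02, 10.05, 10.26]

print(f"mean = {statistics.mean(hbv4_bmw27):.2f} s")
print(f"SE   = {standard_error(hbv4_bmw27):.2f} s")
```

With these inputs the standard error rounds to the 0.08 shown for HBv4, which is consistent with the assumed definition but does not confirm how the page computes it.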

Blender 3.6 - Blend File: Classroom - Compute: CPU-Only
Seconds, Fewer Is Better
  HBv4: 25.61 (SE +/- 0.11, N = 3; Min: 25.46 / Avg: 25.61 / Max: 25.82)
  HBv3: 50.71 (SE +/- 0.06, N = 3; Min: 50.63 / Avg: 50.71 / Max: 50.83)
  HBv2: 50.95 (SE +/- 0.15, N = 3; Min: 50.65 / Avg: 50.95 / Max: 51.12)
  HC: 138.51 (SE +/- 0.04, N = 3; Min: 138.45 / Avg: 138.51 / Max: 138.58)
Performance Per Core - Seconds x Core, Fewer Is Better
  HBv4: 4507.36 (detected core count of 176)
  HBv3: 6085.20 (detected core count of 120)
  HC: 6094.44 (detected core count of 44)
  HBv2: 6114.00 (detected core count of 120)

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only
Seconds, Fewer Is Better
  HBv4: 13.74 (SE +/- 0.09, N = 3; Min: 13.65 / Avg: 13.74 / Max: 13.92)
  HBv3: 25.59 (SE +/- 0.15, N = 3; Min: 25.42 / Avg: 25.59 / Max: 25.9)
  HBv2: 26.43 (SE +/- 0.04, N = 3; Min: 26.38 / Avg: 26.43 / Max: 26.5)
  HC: 71.76 (SE +/- 0.23, N = 3; Min: 71.37 / Avg: 71.76 / Max: 72.16)
Performance Per Core - Seconds x Core, Fewer Is Better
  HBv4: 2418.24 (detected core count of 176)
  HBv3: 3070.80 (detected core count of 120)
  HC: 3157.44 (detected core count of 44)
  HBv2: 3171.60 (detected core count of 120)

Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only
Seconds, Fewer Is Better
  HBv4: 97.52 (SE +/- 0.47, N = 3; Min: 96.65 / Avg: 97.52 / Max: 98.27)
  HBv3: 188.96 (SE +/- 0.38, N = 3; Min: 188.32 / Avg: 188.96 / Max: 189.64)
  HBv2: 211.46 (SE +/- 0.22, N = 3; Min: 211.01 / Avg: 211.46 / Max: 211.7)
  HC: 526.93 (SE +/- 1.15, N = 3; Min: 525.76 / Avg: 526.93 / Max: 529.24)
Performance Per Core - Seconds x Core, Fewer Is Better
  HBv4: 17163.52 (detected core count of 176)
  HBv3: 22675.20 (detected core count of 120)
  HC: 23184.92 (detected core count of 44)
  HBv2: 25375.20 (detected core count of 120)

Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only
Seconds, Fewer Is Better
  HBv4: 33.01 (SE +/- 0.12, N = 3; Min: 32.76 / Avg: 33.01 / Max: 33.14)
  HBv3: 62.90 (SE +/- 0.45, N = 3; Min: 62.35 / Avg: 62.9 / Max: 63.79)
  HBv2: 64.84 (SE +/- 0.28, N = 3; Min: 64.42 / Avg: 64.84 / Max: 65.38)
  HC: 175.07 (SE +/- 0.33, N = 3; Min: 174.47 / Avg: 175.07 / Max: 175.62)
Performance Per Core - Seconds x Core, Fewer Is Better
  HBv4: 5809.76 (detected core count of 176)
  HBv3: 7548.00 (detected core count of 120)
  HC: 7703.08 (detected core count of 44)
  HBv2: 7780.80 (detected core count of 120)
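
The "Performance Per Core" figures can be reproduced from the headline results and the detected core counts. A small sketch, assuming the relation implied by the units on the charts: a Fewer Is Better result is multiplied by the core count (Seconds x Core), while a More Is Better result is divided by it (for example Items Per Second Per Core). The helper name `per_core` is hypothetical and not part of the Phoronix Test Suite.

```python
# Detected core counts as listed on the per-core charts.
CORES = {"HBv4": 176, "HBv3": 120, "HBv2": 120, "HC": 44}


def per_core(value, cores, fewer_is_better):
    """Scale a result by core count in the direction implied by its unit."""
    return value * cores if fewer_is_better else value / cores


# Blender BMW27 render times in seconds (Fewer Is Better).
bmw27_seconds = {"HBv4": 10.11, "HBv3": 19.43, "HBv2": 19.58, "HC": 49.95}

for system, seconds in bmw27_seconds.items():
    score = per_core(seconds, CORES[system], fewer_is_better=True)
    print(f"{system}: {score:.2f} seconds x core")
```

For the Blender results the products match the charted values to two decimal places. Some other tests differ slightly in the last digits, presumably because the page multiplies the unrounded average.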

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

PETSc 3.19 - Test: Streams
MB/s, More Is Better
  HBv4: 598417.70 (SE +/- 46271.80, N = 9; Min: 336060.74 / Avg: 598417.7 / Max: 671902.13)
  HBv3: 284001.92 (SE +/- 2674.31, N = 7; Min: 267956.5 / Avg: 284001.92 / Max: 286773.56)
  HBv2: 197895.47 (SE +/- 12025.83, N = 6; Min: 142495.78 / Avg: 197895.47 / Max: 216555.03)
  HC: 151286.25 (SE +/- 256.75, N = 3; Min: 150867.06 / Avg: 151286.25 / Max: 151752.7)
  (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
Performance Per Core - MB/s Per Core, More Is Better
  HC: 3438.32 (detected core count of 44)
  HBv4: 3400.10 (detected core count of 176)
  HBv3: 2366.68 (detected core count of 120)
  HBv2: 1649.13 (detected core count of 120)

Geometric Mean Of All Test Results

Result Composite - Microsoft Azure HBv4 HPC Performance Benchmarks
Geometric Mean, More Is Better
  HBv4: 955.53
  HBv3: 453.62
  HBv2: 365.47
  HC: 233.00
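
A geometric mean is the n-th root of the product of n values, which keeps one test with very large numbers from dominating the composite. The page does not say how individual results are normalized before being combined, so the sketch below does not attempt to reproduce the composite. It only illustrates the calculation on three HBv4-over-HBv3 speedup ratios taken from results above; that selection is an arbitrary assumption.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# HBv4 / HBv3 ratios for three More Is Better results from this page.
speedups = [
    1083523 / 566595,       # 7-Zip Compression Rating (MIPS)
    598417.70 / 284001.92,  # PETSc Streams (MB/s)
    3161848 / 2434749,      # PostgreSQL, 500 clients, read only (TPS)
]

print(f"geometric mean of sample speedups: {geometric_mean(speedups):.3f}")
print(f"composite HBv4 / HBv3 ratio:       {955.53 / 453.62:.3f}")
```

The two printed numbers are not expected to match, since the composite covers all 73 results while the sample covers three.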

73 Results Shown

High Performance Conjugate Gradient:
  104 104 104 - 60
  144 144 144 - 60
  160 160 160 - 60
NAS Parallel Benchmarks:
  BT.C
  CG.C
  FT.C
  IS.D
  MG.C
  SP.C
NAMD
libxsmm:
  128
  256
  32
  64
Laghos:
  Triple Point Problem
  Sedov Blast Wave, ube_922_hex.mesh
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - FFTW - float - 256
  c2c - FFTW - float - 512
  r2c - FFTW - float - 512
  c2c - FFTW - double - 512
  c2c - Stock - float - 256
  c2c - Stock - float - 512
  r2c - FFTW - double - 256
  r2c - FFTW - double - 512
  r2c - Stock - float - 512
  c2c - Stock - double - 512
  r2c - Stock - double - 256
  r2c - Stock - double - 512
  c2c - FFTW - float-long - 256
  c2c - FFTW - float-long - 512
  r2c - FFTW - float-long - 256
  r2c - FFTW - float-long - 512
  c2c - FFTW - double-long - 512
  c2c - Stock - float-long - 256
  c2c - Stock - float-long - 512
  r2c - FFTW - double-long - 256
  r2c - FFTW - double-long - 512
  r2c - Stock - float-long - 512
  c2c - Stock - double-long - 512
  r2c - Stock - double-long - 256
  r2c - Stock - double-long - 512
Pennant:
  sedovbig
  leblancbig
ACES DGEMM
Intel Open Image Denoise:
  RT.hdr_alb_nrm.3840x2160 - CPU-Only
  RT.ldr_alb_nrm.3840x2160 - CPU-Only
  RTLightmap.hdr.4096x4096 - CPU-Only
OSPRay:
  particle_volume/ao/real_time
  particle_volume/scivis/real_time
  particle_volume/pathtracer/real_time
  gravity_spheres_volume/dim_512/ao/real_time
  gravity_spheres_volume/dim_512/scivis/real_time
  gravity_spheres_volume/dim_512/pathtracer/real_time
7-Zip Compression:
  Compression Rating
  Decompression Rating
Timed Node.js Compilation
oneDNN:
  Recurrent Neural Network Training - bf16bf16bf16 - CPU
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU
Liquid-DSP:
  128 - 256 - 57
  176 - 256 - 32
  176 - 256 - 57
  176 - 256 - 512
PostgreSQL:
  1 - 500 - Read Only
  1 - 500 - Read Only - Average Latency
  1 - 800 - Read Only
  1 - 800 - Read Only - Average Latency
Blender:
  BMW27 - CPU-Only
  Classroom - CPU-Only
  Fishy Cat - CPU-Only
  Barbershop - CPU-Only
  Pabellon Barcelona - CPU-Only
PETSc
Geometric Mean Of All Test Results