Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HTML result view exported from: https://openbenchmarking.org/result/2307260-PTS-AZUREHPC62&grw.

Systems tested: HC, HBv2, HBv3 and HBv4.

Common to all four systems:
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Graphics: hyperv_fb
Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
Compiler: GCC 8.5.0 20210514 + CUDA 12.1
File-System: nfs
Screen Resolution: 1024x768
System Layer: microsoft

HC
Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
Disk: 32GB Virtual Disk + 752GB Virtual Disk
OS: AlmaLinux 8.7

HBv2
Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
OS: AlmaLinux 8.7

HBv3
Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
OS: AlmaLinux 8.7

HBv4
Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
OS: AlmaLinux 8.8

[Results overview table: all tests, with columns HC, HBv2, HBv3 and HBv4. Individual results follow.]

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 60.88 (SE +/- 0.05, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv4: 314.34 (SE +/- 0.50, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread
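Reading these results mostly comes down to ratios between systems. The Python sketch below is not part of the test suite; `relative_performance` is a hypothetical helper that reports how many times better one system is than another, flipping the ratio for fewer-is-better metrics such as seconds, ms or days/ns. It is fed the HeFFTe values from this test.

```python
def relative_performance(baseline: float, candidate: float, higher_is_better: bool = True) -> float:
    """How many times better `candidate` is than `baseline`.

    More-is-better metrics (GFLOP/s, Mop/s, MIPS): candidate / baseline.
    Fewer-is-better metrics (seconds, ms, days/ns): baseline / candidate.
    """
    if baseline <= 0 or candidate <= 0:
        raise ValueError("benchmark results must be positive")
    return candidate / baseline if higher_is_better else baseline / candidate


# HeFFTe r2c / FFTW / double / 512 results (GFLOP/s, more is better)
results = {"HC": 60.88, "HBv2": 91.48, "HBv3": 121.28, "HBv4": 314.34}
for name in ("HC", "HBv2", "HBv3"):
    print(f"HBv4 vs {name}: {relative_performance(results[name], results['HBv4']):.2f}x")
```

With these inputs HBv4 comes out about 5.16x ahead of HC, 3.44x ahead of HBv2 and 2.59x ahead of HBv3.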

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 110.05 (SE +/- 0.06, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv4: 596.23 (SE +/- 2.14, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

GFLOP/s, more is better (High Performance Conjugate Gradient 3.1)
HC: 25.87 (SE +/- 0.05, N = 3)
HBv2: 36.09 (SE +/- 0.02, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv4: 88.52 (SE +/- 0.11, N = 3)
Build notes: (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi
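HPCG times a preconditioned conjugate-gradient (CG) solve on a large sparse 3D problem, so its GFLOP/s figure reflects how fast that iterative loop runs. As a rough illustration only, here is the plain, unpreconditioned CG method in Python with NumPy on a small dense matrix; it leaves out HPCG's preconditioner, sparse storage and MPI distribution, and `conjugate_gradient` is an illustrative name, not an HPCG function.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A @ x = b for a symmetric positive-definite A with plain (unpreconditioned) CG."""
    n = b.shape[0]
    if max_iter is None:
        max_iter = 10 * n
    x = np.zeros(n)
    r = b - A @ x            # residual of the initial guess
    p = r.copy()             # first search direction is the residual itself
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p           # the single matrix-vector product per iteration
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


# Small SPD test problem: 1D Laplacian (2 on the diagonal, -1 off it) with a right-hand side of ones
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.allclose(x, np.linalg.solve(A, b)))
```

Each iteration is dominated by one matrix-vector product plus a few dot products, which is why CG-style benchmarks tend to track memory bandwidth more closely than peak arithmetic throughput.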

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 59.90 (SE +/- 0.03, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv4: 311.27 (SE +/- 0.81, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 123.63 (SE +/- 0.52, N = 3)
HBv2: 203.77 (SE +/- 1.85, N = 3)
HBv3: 198.66 (SE +/- 5.11, N = 15)
HBv4: 442.83 (SE +/- 14.97, N = 12)
Build notes: (CXX) g++ options: -O3 -pthread
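Every result here carries an "SE +/- x, N = n" annotation. Assuming SE is the standard error of the mean over the N runs (the usual meaning; the export itself does not define it), it equals the sample standard deviation divided by the square root of N. The sketch below computes it with the Python standard library; the three run values are made up for illustration and are not taken from this result file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical GFLOP/s readings from three runs of one test (illustrative values only)
runs = [440.0, 445.0, 443.5]
print(f"mean = {statistics.mean(runs):.2f}, SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

The larger N on the HBv3 and HBv4 rows of this test (15 and 12 runs instead of 3) suggests the harness repeated the noisier configurations more often.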

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

GFLOP/s, more is better (HeFFTe 2.3)
HC: 59.14 (SE +/- 0.65, N = 5)
HBv2: 59.42 (SE +/- 1.72, N = 15)
HBv3: 59.38 (SE +/- 1.84, N = 15)
HBv4: 80.25 (SE +/- 3.67, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread
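The HeFFTe test names distinguish r2c (real-to-complex) from c2c (complex-to-complex) transforms. For real input the full complex spectrum is Hermitian-symmetric, so an r2c transform only needs to produce roughly half of it. The single-node NumPy sketch below shows that difference on a small, hypothetical 8 x 8 x 8 grid; it illustrates the transform types only and does not use HeFFTe or its distributed API.

```python
import numpy as np

# Hypothetical 8 x 8 x 8 real-valued grid standing in for the 128^3 / 256^3 / 512^3 grids tested here
rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 8, 8))

c2c = np.fft.fftn(grid)    # complex-to-complex: full 8 x 8 x 8 complex output
r2c = np.fft.rfftn(grid)   # real-to-complex: the last axis keeps only n // 2 + 1 = 5 entries

print(c2c.shape, r2c.shape)                                  # (8, 8, 8) (8, 8, 5)
print(np.allclose(c2c[..., :5], r2c))                        # True: r2c is the non-redundant part
print(np.allclose(np.fft.irfftn(r2c, s=grid.shape), grid))   # True: the inverse restores the input
```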

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

GFLOP/s, more is better (High Performance Conjugate Gradient 3.1)
HC: 25.56 (SE +/- 0.06, N = 3)
HBv2: 36.02 (SE +/- 0.01, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv4: 87.90 (SE +/- 0.12, N = 3)
Build notes: (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 58.36 (SE +/- 0.07, N = 3)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv4: 256.35 (SE +/- 1.07, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: BT.C

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 28794.28 (SE +/- 15.19, N = 3)
HBv2: 66829.18 (SE +/- 32.07, N = 3)
HBv3: 62427.86 (SE +/- 36.56, N = 3)
HBv4: 151067.81 (SE +/- 760.56, N = 3)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
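The four VMs expose different core counts (44, 120, 120 and 176 according to the system table), so an aggregate figure such as Total Mop/s mixes per-core speed with core count. A simple normalization, assuming every run used all of the listed cores, divides each result by that count. `per_core` is a hypothetical helper written for this illustration.

```python
# Core counts as listed in the system table of this result file
CORES = {"HC": 44, "HBv2": 120, "HBv3": 120, "HBv4": 176}

# NAS Parallel Benchmarks BT.C, Total Mop/s (more is better)
bt_c = {"HC": 28794.28, "HBv2": 66829.18, "HBv3": 62427.86, "HBv4": 151067.81}


def per_core(results, cores):
    """Divide each system's aggregate result by its core count (assumes all cores were used)."""
    return {name: value / cores[name] for name, value in results.items()}


for name, value in per_core(bt_c, CORES).items():
    print(f"{name}: {value:.1f} Mop/s per core")
```

Under that assumption BT.C comes to roughly 654, 557, 520 and 858 Mop/s per core for HC, HBv2, HBv3 and HBv4.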

NAS Parallel Benchmarks

Test / Class: CG.C

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 14356.20 (SE +/- 233.39, N = 15)
HBv2: 22314.02 (SE +/- 108.02, N = 3)
HBv3: 21551.48 (SE +/- 20.87, N = 3)
HBv4: 40326.29 (SE +/- 77.41, N = 3)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: EP.D

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 1642.03 (SE +/- 1.76, N = 3)
HBv2: 3222.82 (SE +/- 32.15, N = 6)
HBv3: 2879.08 (SE +/- 80.22, N = 12)
HBv4: 5985.75 (SE +/- 37.41, N = 3)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: FT.C

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 20188.89 (SE +/- 13.57, N = 3)
HBv2: 41977.69 (SE +/- 219.43, N = 3)
HBv3: 36619.29 (SE +/- 194.34, N = 3)
HBv4: 69051.63 (SE +/- 745.61, N = 3)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: IS.D

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 1181.48 (SE +/- 2.10, N = 3)
HBv2: 1884.22 (SE +/- 11.15, N = 3)
HBv3: 2793.55 (SE +/- 22.55, N = 3)
HBv4: 5870.00 (SE +/- 17.88, N = 3)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

NAS Parallel Benchmarks

Test / Class: MG.C

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 19508.00 (SE +/- 24.47, N = 3)
HBv2: 43410.71 (SE +/- 354.81, N = 3)
HBv3: 46705.47 (SE +/- 613.84, N = 15)
HBv4: 108125.86 (SE +/- 748.94, N = 13)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 0.882446 (SE +/- 0.000702, N = 3)
HBv2: 1.407580 (SE +/- 0.014464, N = 3)
HBv3: 0.910091 (SE +/- 0.013826, N = 12)
HBv4: 0.752929 (SE +/- 0.001421, N = 3)
Reported minimums (not attributed to individual systems in the export): 0.83, 1.11, 0.69
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 60.57 (SE +/- 0.08, N = 3)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv4: 264.95 (SE +/- 4.27, N = 12)
Build notes: (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: SP.C

Total Mop/s, more is better (NAS Parallel Benchmarks 3.4)
HC: 12907.54 (SE +/- 12.00, N = 3)
HBv2: 32495.89 (SE +/- 34.59, N = 3)
HBv3: 31024.76 (SE +/- 273.09, N = 8)
HBv4: 68819.34 (SE +/- 954.46, N = 12)
Build notes: (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 62.98 (SE +/- 0.04, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv4: 355.86 (SE +/- 1.24, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

NAMD

ATPase Simulation - 327,506 Atoms

days/ns, fewer is better (NAMD 2.14)
HC: 0.52650 (SE +/- 0.00096, N = 3)
HBv2: 0.26385 (SE +/- 0.00045, N = 3)
HBv3: 0.27115 (SE +/- 0.00027, N = 3)
HBv4: 0.14292 (SE +/- 0.00035, N = 3)
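NAMD's result is given in days/ns (wall-clock days needed per simulated nanosecond), where lower is better. Many NAMD reports quote ns/day instead, which is simply the reciprocal. A quick conversion of this section's values, with `ns_per_day` as an illustrative helper name:

```python
# NAMD ATPase results from this section, in days per simulated nanosecond (fewer is better)
namd_days_per_ns = {"HC": 0.52650, "HBv2": 0.26385, "HBv3": 0.27115, "HBv4": 0.14292}


def ns_per_day(days_per_ns: float) -> float:
    """Invert days/ns into ns/day, a throughput figure where more is better."""
    return 1.0 / days_per_ns


for name, value in namd_days_per_ns.items():
    print(f"{name}: {ns_per_day(value):.2f} ns/day")
```

That works out to about 1.90, 3.79, 3.69 and 7.00 ns/day for HC, HBv2, HBv3 and HBv4.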

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 114.03 (SE +/- 0.09, N = 3)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv4: 622.58 (SE +/- 2.25, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 128

GFLOPS/s, more is better (libxsmm 2-1.17-3645)
HC: 1328.4 (SE +/- 11.02, N = 3)
HBv2: 1519.5 (SE +/- 153.42, N = 6)
HBv3: 2284.6 (SE +/- 29.40, N = 3)
HBv4: 6585.6 (SE +/- 59.85, N = 3)
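libxsmm targets small dense matrix multiplications. Assuming "M N K: 128" means C (M x N) = A (M x K) times B (K x N) with all three dimensions set to 128, each multiplication performs the conventional 2*M*N*K floating-point operations (one multiply and one add per inner-product term). The helpers below are illustrative, not libxsmm functions, and libxsmm's own operation accounting may differ.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Operations in C = A @ B for A (m x k) and B (k x n): one multiply and one add per term."""
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float, repetitions: int = 1) -> float:
    """Achieved GFLOP/s when `repetitions` GEMMs of size m x n x k finish in `seconds`."""
    return gemm_flops(m, n, k) * repetitions / seconds / 1e9


def seconds_per_gemm(m: int, n: int, k: int, rate_gflops: float) -> float:
    """Equivalent time per GEMM at a sustained aggregate rate given in GFLOP/s."""
    return gemm_flops(m, n, k) / (rate_gflops * 1e9)


# HBv4's 128 x 128 x 128 result from this section, treated as a whole-VM aggregate rate
print(f"{gemm_flops(128, 128, 128):,} flops per GEMM")
print(f"{seconds_per_gemm(128, 128, 128, 6585.6) * 1e6:.2f} microseconds per GEMM at 6585.6 GFLOP/s")
```

Because the rate is an aggregate across all cores, the second figure is a throughput equivalent (one GEMM completed every so often across the VM), not the latency of a single GEMM on one core.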

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 30.12 (SE +/- 0.08, N = 3)
HBv2: 50.90 (SE +/- 0.55, N = 3)
HBv3: 39.81 (SE +/- 0.14, N = 3)
HBv4: 123.39 (SE +/- 1.65, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 256

GFLOPS/s, more is better (libxsmm 2-1.17-3645)
HC: 898.8 (SE +/- 13.41, N = 12)
HBv2: 1444.2 (SE +/- 51.69, N = 9)
HBv3: 2032.1 (SE +/- 23.34, N = 3)
HBv4: 6983.2 (SE +/- 63.60, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 31.58 (SE +/- 0.02, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv4: 154.57 (SE +/- 0.03, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 32

GFLOPS/s, more is better (libxsmm 2-1.17-3645)
HC: 379.9 (SE +/- 2.82, N = 11)
HBv2: 195.1 (SE +/- 3.90, N = 12)
HBv3: 1506.3 (SE +/- 32.59, N = 14)
HBv4: 5006.8 (SE +/- 443.26, N = 12)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 57.31 (SE +/- 0.25, N = 3)
HBv2: 91.92 (SE +/- 1.31, N = 3)
HBv3: 103.25 (SE +/- 0.75, N = 15)
HBv4: 261.90 (SE +/- 5.66, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

libxsmm

M N K: 64

GFLOPS/s, more is better (libxsmm 2-1.17-3645)
HC: 731.6 (SE +/- 5.15, N = 15)
HBv2: 411.7 (SE +/- 18.03, N = 13)
HBv3: 2435.6 (SE +/- 17.54, N = 12)
HBv4: 5719.0 (SE +/- 226.33, N = 12)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 134.76 (SE +/- 0.57, N = 3)
HBv2: 205.21 (SE +/- 2.79, N = 12)
HBv3: 214.06 (SE +/- 5.19, N = 15)
HBv4: 459.92 (SE +/- 14.34, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

Laghos

Test: Triple Point Problem

Major Kernels Total Rate, more is better (Laghos 3.1)
HC: 156.52 (SE +/- 0.08, N = 3)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv4: 228.15 (SE +/- 1.25, N = 3)
Build notes: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 128

GFLOP/s, more is better (HeFFTe 2.3)
HC: 41.73 (SE +/- 0.30, N = 3)
HBv2: 51.40 (SE +/- 1.33, N = 15)
HBv3: 50.61 (SE +/- 1.12, N = 15)
HBv4: 87.66 (SE +/- 3.68, N = 14)
Build notes: (CXX) g++ options: -O3 -pthread

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Major Kernels Total Rate, more is better (Laghos 3.1)
HC: 247.49 (SE +/- 1.35, N = 3)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv4: 402.94 (SE +/- 0.78, N = 3)
Build notes: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 31.57 (SE +/- 0.02, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv4: 154.65 (SE +/- 0.27, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 33.52 (SE +/- 0.03, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv4: 159.18 (SE +/- 0.34, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 2.079200 (SE +/- 0.093711, N = 12)
HBv2: 6.838250 (SE +/- 0.032665, N = 3; min 5.97)
HBv3: 0.624233 (SE +/- 0.039917, N = 15)
HBv4: 0.306141 (SE +/- 0.002422, N = 3)
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 60.82 (SE +/- 0.06, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv4: 315.98 (SE +/- 1.65, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 59.73 (SE +/- 0.02, N = 3)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv4: 244.34 (SE +/- 3.04, N = 4)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 57.92 (SE +/- 0.06, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv4: 323.70 (SE +/- 0.96, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 57.13 (SE +/- 0.12, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv4: 273.12 (SE +/- 4.03, N = 14)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 33.55 (SE +/- 0.02, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv4: 159.26 (SE +/- 0.05, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 59.55 (SE +/- 0.27, N = 3)
HBv2: 92.13 (SE +/- 1.33, N = 3)
HBv3: 105.36 (SE +/- 1.07, N = 6)
HBv4: 247.73 (SE +/- 4.85, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128

GFLOP/s, more is better (HeFFTe 2.3)
HC: 58.91 (SE +/- 0.23, N = 3)
HBv2: 61.14 (SE +/- 1.30, N = 15)
HBv3: 56.87 (SE +/- 0.34, N = 3)
HBv4: 85.01 (SE +/- 4.77, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 30.22 (SE +/- 0.05, N = 3)
HBv2: 51.20 (SE +/- 0.57, N = 3)
HBv3: 39.37 (SE +/- 0.33, N = 3)
HBv4: 122.98 (SE +/- 1.21, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 113.94 (SE +/- 0.18, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv4: 624.95 (SE +/- 4.23, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 122.77 (SE +/- 0.53, N = 3)
HBv2: 200.04 (SE +/- 3.34, N = 12)
HBv3: 221.86 (SE +/- 3.45, N = 15)
HBv4: 427.10 (SE +/- 10.91, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 62.90 (SE +/- 0.04, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv4: 355.51 (SE +/- 1.18, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 58.55 (SE +/- 0.16, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv4: 255.97 (SE +/- 3.64, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 3.111210 (SE +/- 0.015370, N = 3; min 1.73)
HBv2: 0.573878 (SE +/- 0.002431, N = 3; min 0.47)
HBv3: 0.556741 (SE +/- 0.001799, N = 3; min 0.5)
HBv4: 0.276472 (SE +/- 0.000440, N = 3)
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 30.27 (SE +/- 0.03, N = 3)
HBv2: 50.08 (SE +/- 0.55, N = 3)
HBv3: 38.57 (SE +/- 0.14, N = 3)
HBv4: 123.41 (SE +/- 1.16, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 57.76 (SE +/- 0.02, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv4: 323.36 (SE +/- 0.80, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 60.89 (SE +/- 0.19, N = 3)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv4: 258.72 (SE +/- 2.84, N = 15)
Build notes: (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 30.17 (SE +/- 0.08, N = 3)
HBv2: 50.71 (SE +/- 0.29, N = 3)
HBv3: 38.45 (SE +/- 0.29, N = 11)
HBv4: 121.61 (SE +/- 1.20, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 1.244800 (SE +/- 0.002723, N = 3; min 1.22)
HBv2: 1.610020 (SE +/- 0.021847, N = 3; min 1.49)
HBv3: 1.408620 (SE +/- 0.003506, N = 3; min 1.36)
HBv4: 0.582806 (SE +/- 0.001551, N = 3; min 0.56)
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 707.35 (SE +/- 1.60, N = 3; min 689.52)
HBv2: 1345.14 (SE +/- 13.31, N = 3; min 1237.17)
HBv3: 860.98 (SE +/- 3.89, N = 3; min 814.31)
HBv4: 535.85 (SE +/- 3.26, N = 3; min 521.12)
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 512

GFLOP/s, more is better (HeFFTe 2.3)
HC: 59.82 (SE +/- 0.05, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv4: 311.80 (SE +/- 1.60, N = 3)
Build notes: (CXX) g++ options: -O3 -pthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 450.25 (SE +/- 4.72, N = 3)
HBv2: 896.81 (SE +/- 9.52, N = 15)
HBv3: 533.50 (SE +/- 4.61, N = 15)
HBv4: 401.86 (SE +/- 1.40, N = 3)
Reported minimums (not attributed to individual systems in the export): 432.99, 388.53
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 707.32 (SE +/- 1.51, N = 3)
HBv2: 1367.73 (SE +/- 13.52, N = 15)
HBv3: 886.81 (SE +/- 6.66, N = 3)
HBv4: 533.49 (SE +/- 1.90, N = 3)
Reported minimums (not attributed to individual systems in the export): 687.14, 849.06, 518.68
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

ms, fewer is better (oneDNN 3.1)
HC: 442.47 (SE +/- 1.89, N = 3)
HBv2: 910.94 (SE +/- 9.54, N = 15)
HBv3: 529.97 (SE +/- 4.36, N = 3)
HBv4: 411.23 (SE +/- 3.60, N = 8)
Reported minimums (not attributed to individual systems in the export): 429.93, 469.93
Build notes: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ACES DGEMM

Sustained Floating-Point Rate

GFLOP/s, more is better (ACES DGEMM 1.0)
HC: 14.340830 (SE +/- 0.199669, N = 15)
HBv2: 5.899903 (SE +/- 0.272351, N = 15)
HBv3: 25.104876 (SE +/- 0.132089, N = 3)
HBv4: 53.175691 (SE +/- 0.359007, N = 3)
Build notes: (CC) gcc options: -O3 -march=native -fopenmp

Remhos

Test: Sample Remap Example

Seconds, fewer is better (Remhos 1.0)
HC: 27.38 (SE +/- 0.06, N = 3)
HBv2: 14.93 (SE +/- 0.07, N = 3)
HBv3: 15.26 (SE +/- 0.02, N = 3)
HBv4: 15.37 (SE +/- 0.14, N = 3)
Build notes: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

Pennant

Test: sedovbig

Hydro cycle time in seconds, fewer is better (Pennant 1.0.1)
HC: 25.019560 (SE +/- 0.026763, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
Build notes: (CXX) g++ options: -fopenmp -pthread -lmpi

Pennant

Test: leblancbig

Hydro cycle time in seconds, fewer is better (Pennant 1.0.1)
HC: 10.645480 (SE +/- 0.017495, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
Build notes: (CXX) g++ options: -fopenmp -pthread -lmpi

7-Zip Compression

Test: Compression Rating

MIPS, more is better (7-Zip Compression 22.01)
HC: 210732 (SE +/- 748.55, N = 3)
HBv2: 489456 (SE +/- 2650.49, N = 3)
HBv3: 558290 (SE +/- 6724.92, N = 3)
HBv4: 1032267 (SE +/- 7680.08, N = 15)
Build notes: (CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Decompression Rating

MIPS, more is better (7-Zip Compression 22.01)
HC: 148193 (SE +/- 256.58, N = 3)
HBv2: 371044 (SE +/- 2438.40, N = 3)
HBv3: 397505 (SE +/- 19127.89, N = 3)
HBv4: 727995 (SE +/- 8360.33, N = 15)
Build notes: (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Timed Linux Kernel Compilation

Build: allmodconfig

Seconds, fewer is better (Timed Linux Kernel Compilation 6.1)
HC: 1950.63 (SE +/- 7.59, N = 3)
HBv2: 1782.93 (SE +/- 22.46, N = 3)
HBv3: 1889.46 (SE +/- 22.02, N = 3)
HBv4: 1681.26 (SE +/- 32.03, N = 9)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256

GFLOP/s, more is better (HeFFTe 2.3)
HC: 131.96 (SE +/- 0.90, N = 3)
HBv2: 211.42 (SE +/- 2.37, N = 15)
HBv3: 207.97 (SE +/- 7.34, N = 15)
HBv4: 467.72 (SE +/- 17.46, N = 12)
Build notes: (CXX) g++ options: -O3 -pthread

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
HC: 50.53 (SE +/- 0.65, N = 15)
HBv2: 19.46 (SE +/- 0.11, N = 3)
HBv3: 19.49 (SE +/- 0.02, N = 3)
HBv4: 9.97 (SE +/- 0.06, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
HC: 138.81 (SE +/- 0.49, N = 3)
HBv2: 50.86 (SE +/- 0.10, N = 3)
HBv3: 51.08 (SE +/- 0.04, N = 3)
HBv4: 25.26 (SE +/- 0.11, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
HC: 72.57 (SE +/- 0.48, N = 3)
HBv2: 26.19 (SE +/- 0.10, N = 3)
HBv3: 25.47 (SE +/- 0.08, N = 3)
HBv4: 13.96 (SE +/- 0.14, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
HC: 524.86 (SE +/- 2.13, N = 3)
HBv2: 210.18 (SE +/- 0.01, N = 3)
HBv3: 189.30 (SE +/- 0.45, N = 3)
HBv4: 96.77 (SE +/- 0.12, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
HC: 176.21 (SE +/- 1.13, N = 3)
HBv2: 64.14 (SE +/- 0.10, N = 3)
HBv3: 62.64 (SE +/- 0.24, N = 3)
HBv4: 33.40 (SE +/- 0.06, N = 3)
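The Blender results above are render times, so a smaller number is better. Many of the other charts report throughput, where a larger number is better. The sketch below turns a pair of results into a single "times faster" factor. It assumes the common convention that speedup equals baseline / candidate for time-based metrics and candidate / baseline for throughput metrics. The `speedup` helper is hypothetical and is not part of the Phoronix Test Suite.

```python
def speedup(baseline: float, candidate: float, lower_is_better: bool) -> float:
    """Return how many times faster `candidate` is than `baseline`.

    For time-based metrics (Seconds, Fewer Is Better) a smaller candidate
    value is faster, so the ratio is baseline / candidate. For throughput
    metrics (More Is Better) the ratio is candidate / baseline.
    """
    if baseline <= 0 or candidate <= 0:
        raise ValueError("benchmark results must be positive")
    return baseline / candidate if lower_is_better else candidate / baseline


# Blender 3.6 CPU-only render times in seconds, copied from the results above.
bmw27 = {"HC": 50.53, "HBv2": 19.46, "HBv3": 19.49, "HBv4": 9.97}
barbershop = {"HC": 524.86, "HBv2": 210.18, "HBv3": 189.30, "HBv4": 96.77}

for name, times in (("BMW27", bmw27), ("Barbershop", barbershop)):
    gain = speedup(times["HBv3"], times["HBv4"], lower_is_better=True)
    print(f"{name}: HBv4 is {gain:.2f}x faster than HBv3")

# The same helper applied to a More Is Better metric (7-Zip compression rating, MIPS).
zip_gain = speedup(558290, 1032267, lower_is_better=False)
print(f"7-Zip compression: HBv4 rates {zip_gain:.2f}x HBv3")
```

Keeping the direction as an explicit flag, rather than inferring it from the unit, avoids silently inverting a result when a chart's unit string is ambiguous.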

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
HC: 1.82 (SE +/- 0.02, N = 3)
HBv2: 2.08 (SE +/- 0.01, N = 3)
HBv3: 1.68 (SE +/- 0.02, N = 3)
HBv4: 3.08 (SE +/- 0.02, N = 3)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
HC: 1.84 (SE +/- 0.01, N = 3)
HBv2: 2.03 (SE +/- 0.02, N = 9)
HBv3: 1.69 (SE +/- 0.01, N = 15)
HBv4: 3.13 (SE +/- 0.01, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better)
HC: 0.88 (SE +/- 0.00, N = 3)
HBv2: 1.04 (SE +/- 0.01, N = 3)
HBv3: 0.79 (SE +/- 0.01, N = 3)
HBv4: 1.29 (SE +/- 0.00, N = 3)

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better)
HC: 8.97547 (SE +/- 0.01225, N = 3)
HBv2: 22.33360 (SE +/- 0.00495, N = 3)
HBv3: 24.45860 (SE +/- 0.01755, N = 3)
HBv4: 36.61210 (SE +/- 0.04053, N = 3)

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
HC: 8.97020 (SE +/- 0.00763, N = 3)
HBv2: 22.15330 (SE +/- 0.01671, N = 3)
HBv3: 24.17360 (SE +/- 0.01956, N = 3)
HBv4: 36.56710 (SE +/- 0.03598, N = 3)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better)
HC: 86.57 (SE +/- 8.14, N = 12)
HBv2: 157.13 (SE +/- 3.07, N = 12)
HBv3: 168.24 (SE +/- 0.23, N = 3)
HBv4: 208.34 (SE +/- 0.07, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
HC: 9.49421 (SE +/- 0.02906, N = 3)
HBv2: 8.67327 (SE +/- 0.13915, N = 12)
HBv3: 11.74850 (SE +/- 0.03837, N = 3)
HBv4: 38.07640 (SE +/- 0.03610, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
HC: 8.98723 (SE +/- 0.03491, N = 3)
HBv2: 8.12356 (SE +/- 0.12026, N = 15)
HBv3: 11.18450 (SE +/- 0.01165, N = 3)
HBv4: 37.09180 (SE +/- 0.11164, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better)
HC: 10.05 (SE +/- 0.02, N = 3)
HBv2: 13.92 (SE +/- 0.03, N = 3)
HBv3: 14.61 (SE +/- 0.03, N = 3)
HBv4: 32.79 (SE +/- 0.04, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512 (GFLOP/s, More Is Better)
HC: 110.20 (SE +/- 0.10, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv4: 590.93 (SE +/- 2.49, N = 3)
1. (CXX) g++ options: -O3 -pthread

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1 - Time To Compile (Seconds, Fewer Is Better)
HC: 330.61 (SE +/- 2.37, N = 3)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv4: 150.56 (SE +/- 2.23, N = 12)

Liquid-DSP

Threads: 1 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 - Threads: 1 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better)
HC: 1796333 (SE +/- 1333.33, N = 3)
HBv2: 33211667 (SE +/- 2185.81, N = 3)
HBv3: 32817333 (SE +/- 4096.07, N = 3)
HBv4: 35362667 (SE +/- 20201.76, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better)
HC: 964423333 (SE +/- 3947135.39, N = 3)
HBv2: 1061433333 (SE +/- 33333.33, N = 3)
HBv3: 917336667 (SE +/- 2475306.94, N = 3)
HBv4: 1113300000 (SE +/- 1950213.66, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 - Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
HC: 721290909 (SE +/- 5360840.75, N = 11)
HBv2: 1193400000 (SE +/- 472581.56, N = 3)
HBv3: 1086000000 (SE +/- 550757.05, N = 3)
HBv4: 1390540000 (SE +/- 14294460.47, N = 5)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better)
HC: 1512600000 (SE +/- 8213606.60, N = 3)
HBv2: 3925933333 (SE +/- 3602930.91, N = 3)
HBv3: 3366733333 (SE +/- 5345506.94, N = 3)
HBv4: 4426300000 (SE +/- 3774034.09, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 - Threads: 128 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
HC: 1572400000 (SE +/- 8373967.60, N = 3)
HBv2: 4045933333 (SE +/- 4421286.89, N = 3)
HBv3: 3516300000 (SE +/- 6947661.48, N = 3)
HBv4: 5168233333 (SE +/- 10401335.38, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better)
HC: 1566133333 (SE +/- 2852094.75, N = 3)
HBv2: 4027100000 (SE +/- 44818002.34, N = 3)
HBv3: 3419533333 (SE +/- 8912600.32, N = 3)
HBv4: 6122233333 (SE +/- 9214903.39, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better)
HC: 1664733333 (SE +/- 5446813.54, N = 3)
HBv2: 4106700000 (SE +/- 13588352.86, N = 3)
HBv3: 3563433333 (SE +/- 4247482.91, N = 3)
HBv4: 6758166667 (SE +/- 11394345.58, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6 - Threads: 176 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better)
HC: 529213333 (SE +/- 6341443.93, N = 3)
HBv2: 825653333 (SE +/- 3174614.59, N = 3)
HBv3: 735370000 (SE +/- 3040334.41, N = 3)
HBv4: 2058233333 (SE +/- 4603018.33, N = 3)
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
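Liquid-DSP was run at both 1 and 176 threads with the same buffer length (256) and filter length (32), so the results can give a rough parallel-scaling estimate. Parallel efficiency is the multi-thread throughput divided by the thread count times the single-thread throughput. The sketch below uses the HBv4 figures. It is a simplification under stated assumptions: ideal scaling is taken to be perfectly linear, and clock-frequency differences between one active core and all cores are ignored. The `parallel_efficiency` helper is hypothetical.

```python
def parallel_efficiency(single_thread: float, multi_thread: float, threads: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if single_thread <= 0:
        raise ValueError("single-thread throughput must be positive")
    return multi_thread / (threads * single_thread)


# HBv4 Liquid-DSP 1.6 results in samples/s (buffer length 256, filter length 32),
# copied from the 1-thread and 176-thread results above.
hbv4_1_thread = 35_362_667
hbv4_176_threads = 6_122_233_333

eff = parallel_efficiency(hbv4_1_thread, hbv4_176_threads, threads=176)
print(f"HBv4 scaling efficiency at 176 threads: {eff:.1%}")
```

The ratio is only a coarse indicator. A value near 1.0 suggests the filter workload scales almost linearly across HBv4's 176 cores, but turbo behaviour alone can move it a few percent in either direction.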

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only (TPS, More Is Better)
HC: 1354877 (SE +/- 3475.53, N = 3)
HBv2: 2466249 (SE +/- 8486.11, N = 3)
HBv3: 2375005 (SE +/- 4803.91, N = 3)
HBv4: 3139846 (SE +/- 4762.10, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency

PostgreSQL 15 - Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
HC: 0.369 (SE +/- 0.001, N = 3)
HBv2: 0.203 (SE +/- 0.001, N = 3)
HBv3: 0.210 (SE +/- 0.000, N = 3)
HBv4: 0.159 (SE +/- 0.000, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only (TPS, More Is Better)
HC: 1161800 (SE +/- 4936.18, N = 3)
HBv2: 2439650 (SE +/- 4115.38, N = 3)
HBv3: 2407602 (SE +/- 11149.78, N = 3)
HBv4: 3123042 (SE +/- 20304.79, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency

PostgreSQL 15 - Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency (ms, Fewer Is Better)
HC: 0.688 (SE +/- 0.003, N = 3)
HBv2: 0.328 (SE +/- 0.001, N = 3)
HBv3: 0.332 (SE +/- 0.001, N = 3)
HBv4: 0.256 (SE +/- 0.001, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
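The PostgreSQL throughput and latency results are two views of the same read-only runs. In a closed-loop benchmark, each client keeps exactly one transaction in flight, so Little's law (L = λW) gives average latency W ≈ clients / TPS. The sketch below checks that relationship against the reported numbers. It assumes that closed-loop model. Connection setup and measurement-window effects mean the match is approximate, not exact. The `littles_law_latency_ms` helper is hypothetical.

```python
# (clients, TPS, reported average latency in ms), copied from the PostgreSQL 15 results above.
results = {
    "HC":   [(500, 1_354_877, 0.369), (800, 1_161_800, 0.688)],
    "HBv2": [(500, 2_466_249, 0.203), (800, 2_439_650, 0.328)],
    "HBv3": [(500, 2_375_005, 0.210), (800, 2_407_602, 0.332)],
    "HBv4": [(500, 3_139_846, 0.159), (800, 3_123_042, 0.256)],
}


def littles_law_latency_ms(clients: int, tps: float) -> float:
    """Average latency implied by Little's law: L = lambda * W, so W = L / lambda."""
    if tps <= 0:
        raise ValueError("TPS must be positive")
    return clients / tps * 1000.0  # seconds -> milliseconds


for system, rows in results.items():
    for clients, tps, reported in rows:
        predicted = littles_law_latency_ms(clients, tps)
        print(f"{system} @ {clients} clients: predicted {predicted:.3f} ms, reported {reported:.3f} ms")
```

Every predicted value lands within about a thousandth of a millisecond of the reported average. This is a quick way to confirm that the TPS and latency charts were read from the same runs.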

PETSc

Test: Streams

PETSc 3.19 - Test: Streams (MB/s, More Is Better)
HC: 151286.25 (SE +/- 256.75, N = 3)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
1. (CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60 (GFLOP/s, More Is Better)
HC: 26.00 (SE +/- 0.02, N = 3)
HBv2: 37.04 (SE +/- 0.03, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv4: 89.38 (SE +/- 0.26, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi


Phoronix Test Suite v10.8.5