Microsoft Azure HBv4 HPC Comparison Benchmarks

Benchmarks for a future article on Phoronix looking at HBv4 Genoa-X Linux performance.

HTML result view exported from: https://openbenchmarking.org/result/2307260-PTS-AZUREHPC62&grr&rdt.

System configurations (fields not listed for HBv3, HBv2, and HC were left blank in the export):

HBv4
Processor: 2 x AMD EPYC 9V33X 96-Core (176 Cores)
Motherboard: Microsoft Virtual Machine (Hyper-V UEFI v4.1 BIOS)
Memory: 1 GB + 59 GB + 116 GB + 176 GB + 176 GB + 176 GB
Disk: 2 x 1920GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
Graphics: hyperv_fb
OS: AlmaLinux 8.8
Kernel: 4.18.0-425.3.1.el8.x86_64 (x86_64)
Compiler: GCC 8.5.0 20210514 + CUDA 12.1
File-System: nfs
Screen Resolution: 1024x768
System Layer: microsoft

HBv3
Processor: 2 x AMD EPYC 7V73X 64-Core (120 Cores)
Memory: 1 GB + 59 GB + 54 GB + 114 GB + 114 GB + 114 GB
Disk: 2 x 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk
OS: AlmaLinux 8.7

HBv2
Processor: 2 x AMD EPYC 7V12 64-Core (120 Cores)
Disk: 960GB Microsoft NVMe Direct Disk + 32GB Virtual Disk + 515GB Virtual Disk

HC
Processor: 2 x Intel Xeon Platinum 8168 (44 Cores)
Memory: 1 GB + 60928 MB + 118272 MB + 176 GB
Disk: 32GB Virtual Disk + 752GB Virtual Disk
Graphics: hyperv_fb

Results overview: the export includes a summary table with one row per test and one column per system (HBv4, HBv3, HBv2, HC). The individual results follow below.

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.1
Seconds, Fewer Is Better
HBv4: 1681.26 (SE +/- 32.03, N = 9)
HBv3: 1889.46 (SE +/- 22.02, N = 3)
HBv2: 1782.93 (SE +/- 22.46, N = 3)
HC: 1950.63 (SE +/- 7.59, N = 3)
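The kernel build times above can be turned into relative performance figures. The following is a minimal Python sketch, not part of the exported result: the helper name `relative_speed` and the choice of HC as the baseline are assumptions made for illustration, and the values are copied from the allmodconfig result.

```python
# Values copied from the allmodconfig result (seconds, fewer is better).
build_seconds = {"HBv4": 1681.26, "HBv3": 1889.46, "HBv2": 1782.93, "HC": 1950.63}


def relative_speed(results, baseline, fewer_is_better=True):
    """Return each system's performance relative to the baseline (1.0 = equal).

    Hypothetical helper for illustration: for "fewer is better" metrics the
    ratio is inverted so that a larger factor always means a faster system.
    """
    base = results[baseline]
    if fewer_is_better:
        return {name: base / value for name, value in results.items()}
    return {name: value / base for name, value in results.items()}


speedups = relative_speed(build_seconds, baseline="HC")
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x vs HC")
```

For "More Is Better" results such as GFLOP/s or MB/s, the same helper can be called with `fewer_is_better=False`.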

PETSc

Test: Streams

PETSc 3.19
MB/s, More Is Better
HBv4: 598417.70 (SE +/- 46271.80, N = 9)
HBv3: 284001.92 (SE +/- 2674.31, N = 7)
HBv2: 197895.47 (SE +/- 12025.83, N = 6)
HC: 151286.25 (SE +/- 256.75, N = 3)
(CXX) g++ options: -O3 -std=gnu++17 -fPIC -include -m64
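The standard errors reported for PETSc Streams differ widely between systems. As a rough illustration, the sketch below expresses each SE as a percentage of the mean and flags noisy results. The 5% threshold and the names `relative_se_percent` and `NOISY_THRESHOLD_PERCENT` are assumptions for this example, not something the export defines.

```python
# (mean MB/s, standard error) pairs copied from the PETSc Streams result.
petsc_streams = {
    "HBv4": (598417.70, 46271.80),
    "HBv3": (284001.92, 2674.31),
    "HBv2": (197895.47, 12025.83),
    "HC": (151286.25, 256.75),
}

# Hypothetical cut-off chosen for illustration only.
NOISY_THRESHOLD_PERCENT = 5.0


def relative_se_percent(mean, standard_error):
    """Standard error expressed as a percentage of the mean."""
    return standard_error / mean * 100.0


relative = {name: relative_se_percent(m, se) for name, (m, se) in petsc_streams.items()}
noisy = [name for name, pct in relative.items() if pct > NOISY_THRESHOLD_PERCENT]

for name, pct in relative.items():
    print(f"{name}: SE is {pct:.2f}% of the mean")
print("Above threshold:", noisy)
```

Results flagged this way are worth reading with more caution when comparing systems, since the run-to-run spread is a larger share of the reported value.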

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

High Performance Conjugate Gradient 3.1
GFLOP/s, More Is Better
HBv4: 87.90 (SE +/- 0.12, N = 3)
HBv3: 39.11 (SE +/- 0.02, N = 3)
HBv2: 36.02 (SE +/- 0.01, N = 3)
HC: 25.56 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1
GFLOP/s, More Is Better
HBv4: 88.52 (SE +/- 0.11, N = 3)
HBv3: 38.97 (SE +/- 0.02, N = 3)
HBv2: 36.09 (SE +/- 0.02, N = 3)
HC: 25.87 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 208.34 (SE +/- 0.07, N = 3)
HBv3: 168.24 (SE +/- 0.23, N = 3)
HBv2: 157.13 (SE +/- 3.07, N = 12)
HC: 86.57 (SE +/- 8.14, N = 12)

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 19.8.1
Seconds, Fewer Is Better
HBv4: 150.56 (SE +/- 2.23, N = 12)
HBv3: 185.57 (SE +/- 1.46, N = 3)
HBv2: 194.37 (SE +/- 1.32, N = 3)
HC: 330.61 (SE +/- 2.37, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 3.6
Seconds, Fewer Is Better
HBv4: 96.77 (SE +/- 0.12, N = 3)
HBv3: 189.30 (SE +/- 0.45, N = 3)
HBv2: 210.18 (SE +/- 0.01, N = 3)
HC: 524.86 (SE +/- 2.13, N = 3)

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1
GFLOP/s, More Is Better
HBv4: 89.38 (SE +/- 0.26, N = 3)
HBv3: 39.61 (SE +/- 0.03, N = 3)
HBv2: 37.04 (SE +/- 0.03, N = 3)
HC: 26.00 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 36.56710 (SE +/- 0.03598, N = 3)
HBv3: 24.17360 (SE +/- 0.01956, N = 3)
HBv2: 22.15330 (SE +/- 0.01671, N = 3)
HC: 8.97020 (SE +/- 0.00763, N = 3)

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only - Average Latency

PostgreSQL 15
ms, Fewer Is Better
HBv4: 0.159 (SE +/- 0.000, N = 3)
HBv3: 0.210 (SE +/- 0.000, N = 3)
HBv2: 0.203 (SE +/- 0.001, N = 3)
HC: 0.369 (SE +/- 0.001, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 500 - Mode: Read Only

PostgreSQL 15
TPS, More Is Better
HBv4: 3139846 (SE +/- 4762.10, N = 3)
HBv3: 2375005 (SE +/- 4803.91, N = 3)
HBv2: 2466249 (SE +/- 8486.11, N = 3)
HC: 1354877 (SE +/- 3475.53, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm
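The 500-client latency and TPS results appear to be two views of the same measurement. A minimal sketch, assuming all 500 clients are kept busy for the whole run, estimates the average latency as clients divided by throughput (Little's law) and compares it with the reported figures. This is an observation drawn from the numbers, not a statement about how the test profile derives them; the helper name `estimated_latency_ms` is hypothetical.

```python
# Reported pgbench results for scaling factor 1, 500 clients, read only.
clients = 500
tps = {"HBv4": 3139846, "HBv3": 2375005, "HBv2": 2466249, "HC": 1354877}
reported_latency_ms = {"HBv4": 0.159, "HBv3": 0.210, "HBv2": 0.203, "HC": 0.369}


def estimated_latency_ms(client_count, transactions_per_second):
    """Little's law estimate: average latency = clients in flight / throughput."""
    return client_count / transactions_per_second * 1000.0


estimates = {name: estimated_latency_ms(clients, rate) for name, rate in tps.items()}
for name, estimate in estimates.items():
    print(f"{name}: estimated {estimate:.3f} ms, reported {reported_latency_ms[name]:.3f} ms")
```

The estimates land within a few thousandths of a millisecond of the reported averages; small differences are expected because each reported figure is itself an average over three runs.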

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only - Average Latency

PostgreSQL 15
ms, Fewer Is Better
HBv4: 0.256 (SE +/- 0.001, N = 3)
HBv3: 0.332 (SE +/- 0.001, N = 3)
HBv2: 0.328 (SE +/- 0.001, N = 3)
HC: 0.688 (SE +/- 0.003, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

PostgreSQL

Scaling Factor: 1 - Clients: 800 - Mode: Read Only

PostgreSQL 15
TPS, More Is Better
HBv4: 3123042 (SE +/- 20304.79, N = 3)
HBv3: 2407602 (SE +/- 11149.78, N = 3)
HBv2: 2439650 (SE +/- 4115.38, N = 3)
HC: 1161800 (SE +/- 4936.18, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lpthread -lrt -ldl -lm

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 37.09180 (SE +/- 0.11164, N = 3)
HBv3: 11.18450 (SE +/- 0.01165, N = 3)
HBv2: 8.12356 (SE +/- 0.12026, N = 15)
HC: 8.98723 (SE +/- 0.03491, N = 3)

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1
Major Kernels Total Rate, More Is Better
HBv4: 402.94 (SE +/- 0.78, N = 3)
HBv3: 361.81 (SE +/- 0.15, N = 3)
HBv2: 345.14 (SE +/- 3.57, N = 5)
HC: 247.49 (SE +/- 1.35, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 38.07640 (SE +/- 0.03610, N = 3)
HBv3: 11.74850 (SE +/- 0.03837, N = 3)
HBv2: 8.67327 (SE +/- 0.13915, N = 12)
HC: 9.49421 (SE +/- 0.02906, N = 3)

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 36.61210 (SE +/- 0.04053, N = 3)
HBv3: 24.45860 (SE +/- 0.01755, N = 3)
HBv2: 22.33360 (SE +/- 0.00495, N = 3)
HC: 8.97547 (SE +/- 0.01225, N = 3)

NAS Parallel Benchmarks

Test / Class: SP.C

NAS Parallel Benchmarks 3.4
Total Mop/s, More Is Better
HBv4: 68819.34 (SE +/- 954.46, N = 12)
HBv3: 31024.76 (SE +/- 273.09, N = 8)
HBv2: 32495.89 (SE +/- 34.59, N = 3)
HC: 12907.54 (SE +/- 12.00, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 3.6
Seconds, Fewer Is Better
HBv4: 33.40 (SE +/- 0.06, N = 3)
HBv3: 62.64 (SE +/- 0.24, N = 3)
HBv2: 64.14 (SE +/- 0.10, N = 3)
HC: 176.21 (SE +/- 1.13, N = 3)

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 3.1
ms, Fewer Is Better
HBv4: 535.85 (SE +/- 3.26, N = 3, MIN: 521.12)
HBv3: 860.98 (SE +/- 3.89, N = 3, MIN: 814.31)
HBv2: 1345.14 (SE +/- 13.31, N = 3, MIN: 1237.17)
HC: 707.35 (SE +/- 1.60, N = 3, MIN: 689.52)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1
ms, Fewer Is Better
HBv4: 533.49 (SE +/- 1.90, N = 3)
HBv3: 886.81 (SE +/- 6.66, N = 3)
HBv2: 1367.73 (SE +/- 13.52, N = 15)
HC: 707.32 (SE +/- 1.51, N = 3)
MIN values as exported (system labels not preserved): 518.68, 849.06, 687.14
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1
ms, Fewer Is Better
HBv4: 411.23 (SE +/- 3.60, N = 8)
HBv3: 529.97 (SE +/- 4.36, N = 3)
HBv2: 910.94 (SE +/- 9.54, N = 15)
HC: 442.47 (SE +/- 1.89, N = 3)
MIN values as exported (system labels not preserved): 469.93, 429.93
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 3.1
ms, Fewer Is Better
HBv4: 401.86 (SE +/- 1.40, N = 3)
HBv3: 533.50 (SE +/- 4.61, N = 15)
HBv2: 896.81 (SE +/- 9.52, N = 15)
HC: 450.25 (SE +/- 4.72, N = 3)
MIN values as exported (system labels not preserved): 388.53, 432.99
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6
Seconds, Fewer Is Better
HBv4: 9.97 (SE +/- 0.06, N = 3)
HBv3: 19.49 (SE +/- 0.02, N = 3)
HBv2: 19.46 (SE +/- 0.11, N = 3)
HC: 50.53 (SE +/- 0.65, N = 15)

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0
GFLOP/s, More Is Better
HBv4: 53.175691 (SE +/- 0.359007, N = 3)
HBv3: 25.104876 (SE +/- 0.132089, N = 3)
HBv2: 5.899903 (SE +/- 0.272351, N = 15)
HC: 14.340830 (SE +/- 0.199669, N = 15)
(CC) gcc options: -O3 -march=native -fopenmp

NAS Parallel Benchmarks

Test / Class: EP.D

NAS Parallel Benchmarks 3.4
Total Mop/s, More Is Better
HBv4: 5985.75 (SE +/- 37.41, N = 3)
HBv3: 2879.08 (SE +/- 80.22, N = 12)
HBv2: 3222.82 (SE +/- 32.15, N = 6)
HC: 1642.03 (SE +/- 1.76, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 3.6
Seconds, Fewer Is Better
HBv4: 25.26 (SE +/- 0.11, N = 3)
HBv3: 51.08 (SE +/- 0.04, N = 3)
HBv2: 50.86 (SE +/- 0.10, N = 3)
HC: 138.81 (SE +/- 0.49, N = 3)

Laghos

Test: Triple Point Problem

Laghos 3.1
Major Kernels Total Rate, More Is Better
HBv4: 228.15 (SE +/- 1.25, N = 3)
HBv3: 192.74 (SE +/- 0.38, N = 3)
HBv2: 183.82 (SE +/- 0.57, N = 3)
HC: 156.52 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4
Total Mop/s, More Is Better
HBv4: 151067.81 (SE +/- 760.56, N = 3)
HBv3: 62427.86 (SE +/- 36.56, N = 3)
HBv2: 66829.18 (SE +/- 32.07, N = 3)
HC: 28794.28 (SE +/- 15.19, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OSPRay 2.12
Items Per Second, More Is Better
HBv4: 32.79 (SE +/- 0.04, N = 3)
HBv3: 14.61 (SE +/- 0.03, N = 3)
HBv2: 13.92 (SE +/- 0.03, N = 3)
HC: 10.05 (SE +/- 0.02, N = 3)

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 1390540000 (SE +/- 14294460.47, N = 5)
HBv3: 1086000000 (SE +/- 550757.05, N = 3)
HBv2: 1193400000 (SE +/- 472581.56, N = 3)
HC: 721290909 (SE +/- 5360840.75, N = 11)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 22.01
MIPS, More Is Better
HBv4: 727995 (SE +/- 8360.33, N = 15)
HBv3: 397505 (SE +/- 19127.89, N = 3)
HBv2: 371044 (SE +/- 2438.40, N = 3)
HC: 148193 (SE +/- 256.58, N = 3)
(CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Compression Rating

7-Zip Compression 22.01
MIPS, More Is Better
HBv4: 1032267 (SE +/- 7680.08, N = 15)
HBv3: 558290 (SE +/- 6724.92, N = 3)
HBv2: 489456 (SE +/- 2650.49, N = 3)
HC: 210732 (SE +/- 748.55, N = 3)
(CXX) g++ options: -lpthread -ldl -O2 -fPIC

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 154.57 (SE +/- 0.03, N = 3)
HBv3: 56.27 (SE +/- 0.03, N = 3)
HBv2: 46.93 (SE +/- 0.09, N = 3)
HC: 31.58 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 154.65 (SE +/- 0.27, N = 3)
HBv3: 56.22 (SE +/- 0.04, N = 3)
HBv2: 46.98 (SE +/- 0.05, N = 3)
HC: 31.57 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 159.26 (SE +/- 0.05, N = 3)
HBv3: 57.23 (SE +/- 0.04, N = 3)
HBv2: 47.37 (SE +/- 0.02, N = 3)
HC: 33.55 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 159.18 (SE +/- 0.34, N = 3)
HBv3: 57.33 (SE +/- 0.07, N = 3)
HBv2: 47.61 (SE +/- 0.09, N = 3)
HC: 33.52 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -pthread

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6
Seconds, Fewer Is Better
HBv4: 13.96 (SE +/- 0.14, N = 3)
HBv3: 25.47 (SE +/- 0.08, N = 3)
HBv2: 26.19 (SE +/- 0.10, N = 3)
HC: 72.57 (SE +/- 0.48, N = 3)

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 512

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 2058233333 (SE +/- 4603018.33, N = 3)
HBv3: 735370000 (SE +/- 3040334.41, N = 3)
HBv2: 825653333 (SE +/- 3174614.59, N = 3)
HC: 529213333 (SE +/- 6341443.93, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 6122233333 (SE +/- 9214903.39, N = 3)
HBv3: 3419533333 (SE +/- 8912600.32, N = 3)
HBv2: 4027100000 (SE +/- 44818002.34, N = 3)
HC: 1566133333 (SE +/- 2852094.75, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 176 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 6758166667 (SE +/- 11394345.58, N = 3)
HBv3: 3563433333 (SE +/- 4247482.91, N = 3)
HBv2: 4106700000 (SE +/- 13588352.86, N = 3)
HC: 1664733333 (SE +/- 5446813.54, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 57

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 5168233333 (SE +/- 10401335.38, N = 3)
HBv3: 3516300000 (SE +/- 6947661.48, N = 3)
HBv2: 4045933333 (SE +/- 4421286.89, N = 3)
HC: 1572400000 (SE +/- 8373967.60, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 128 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 4426300000 (SE +/- 3774034.09, N = 3)
HBv3: 3366733333 (SE +/- 5345506.94, N = 3)
HBv2: 3925933333 (SE +/- 3602930.91, N = 3)
HC: 1512600000 (SE +/- 8213606.60, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 1 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 35362667 (SE +/- 20201.76, N = 3)
HBv3: 32817333 (SE +/- 4096.07, N = 3)
HBv2: 33211667 (SE +/- 2185.81, N = 3)
HC: 31796333 (SE +/- 1333.33, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP

Threads: 32 - Buffer Length: 256 - Filter Length: 32

Liquid-DSP 1.6
samples/s, More Is Better
HBv4: 1113300000 (SE +/- 1950213.66, N = 3)
HBv3: 917336667 (SE +/- 2475306.94, N = 3)
HBv2: 1061433333 (SE +/- 33333.33, N = 3)
HC: 964423333 (SE +/- 3947135.39, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14
days/ns, Fewer Is Better
HBv4: 0.14292 (SE +/- 0.00035, N = 3)
HBv3: 0.27115 (SE +/- 0.00027, N = 3)
HBv2: 0.26385 (SE +/- 0.00045, N = 3)
HC: 0.52650 (SE +/- 0.00096, N = 3)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0
Images / Sec, More Is Better
HBv4: 3.13 (SE +/- 0.01, N = 3)
HBv3: 1.69 (SE +/- 0.01, N = 15)
HBv2: 2.03 (SE +/- 0.02, N = 9)
HC: 1.84 (SE +/- 0.01, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 427.10 (SE +/- 10.91, N = 15)
HBv3: 221.86 (SE +/- 3.45, N = 15)
HBv2: 200.04 (SE +/- 3.34, N = 12)
HC: 122.77 (SE +/- 0.53, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 459.92 (SE +/- 14.34, N = 15)
HBv3: 214.06 (SE +/- 5.19, N = 15)
HBv2: 205.21 (SE +/- 2.79, N = 12)
HC: 134.76 (SE +/- 0.57, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 80.25 (SE +/- 3.67, N = 15)
HBv3: 59.38 (SE +/- 1.84, N = 15)
HBv2: 59.42 (SE +/- 1.72, N = 15)
HC: 59.14 (SE +/- 0.65, N = 5)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 467.72 (SE +/- 17.46, N = 12)
HBv3: 207.97 (SE +/- 7.34, N = 15)
HBv2: 211.42 (SE +/- 2.37, N = 15)
HC: 131.96 (SE +/- 0.90, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 87.66 (SE +/- 3.68, N = 14)
HBv3: 50.61 (SE +/- 1.12, N = 15)
HBv2: 51.40 (SE +/- 1.33, N = 15)
HC: 41.73 (SE +/- 0.30, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 261.90 (SE +/- 5.66, N = 15)
HBv3: 103.25 (SE +/- 0.75, N = 15)
HBv2: 91.92 (SE +/- 1.31, N = 3)
HC: 57.31 (SE +/- 0.25, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 258.72 (SE +/- 2.84, N = 15)
HBv3: 105.50 (SE +/- 0.81, N = 15)
HBv2: 92.39 (SE +/- 1.27, N = 3)
HC: 60.89 (SE +/- 0.19, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 255.97 (SE +/- 3.64, N = 15)
HBv3: 105.09 (SE +/- 1.13, N = 3)
HBv2: 90.79 (SE +/- 0.74, N = 15)
HC: 58.55 (SE +/- 0.16, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 323.70 (SE +/- 0.96, N = 3)
HBv3: 124.60 (SE +/- 0.05, N = 3)
HBv2: 93.26 (SE +/- 0.23, N = 3)
HC: 57.92 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 323.36 (SE +/- 0.80, N = 3)
HBv3: 123.24 (SE +/- 0.73, N = 3)
HBv2: 93.79 (SE +/- 0.34, N = 3)
HC: 57.76 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: IS.D

NAS Parallel Benchmarks 3.4
Total Mop/s, More Is Better
HBv4: 5870.00 (SE +/- 17.88, N = 3)
HBv3: 2793.55 (SE +/- 22.55, N = 3)
HBv2: 1884.22 (SE +/- 11.15, N = 3)
HC: 1181.48 (SE +/- 2.10, N = 3)
(F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 273.12 (SE +/- 4.03, N = 14)
HBv3: 106.63 (SE +/- 1.05, N = 3)
HBv2: 88.61 (SE +/- 1.12, N = 15)
HC: 57.13 (SE +/- 0.12, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 311.27 (SE +/- 0.81, N = 3)
HBv3: 118.24 (SE +/- 0.49, N = 3)
HBv2: 95.20 (SE +/- 0.16, N = 3)
HC: 59.90 (SE +/- 0.03, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 311.80 (SE +/- 1.60, N = 3)
HBv3: 117.73 (SE +/- 0.40, N = 3)
HBv2: 94.53 (SE +/- 0.25, N = 3)
HC: 59.82 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 315.98 (SE +/- 1.65, N = 3)
HBv3: 120.96 (SE +/- 0.04, N = 3)
HBv2: 91.43 (SE +/- 0.07, N = 3)
HC: 60.82 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 314.34 (SE +/- 0.50, N = 3)
HBv3: 121.28 (SE +/- 0.86, N = 3)
HBv2: 91.48 (SE +/- 0.15, N = 3)
HC: 60.88 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 264.95 (SE +/- 4.27, N = 12)
HBv3: 102.70 (SE +/- 0.80, N = 15)
HBv2: 93.31 (SE +/- 1.10, N = 4)
HC: 60.57 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 355.86 (SE +/- 1.24, N = 3)
HBv3: 135.69 (SE +/- 0.93, N = 3)
HBv2: 95.88 (SE +/- 0.47, N = 3)
HC: 62.98 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float-long - X Y Z: 512

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 355.51 (SE +/- 1.18, N = 3)
HBv3: 135.95 (SE +/- 0.58, N = 3)
HBv2: 96.49 (SE +/- 0.05, N = 3)
HC: 62.90 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3
GFLOP/s, More Is Better
HBv4: 244.34 (SE +/- 3.04, N = 4)
HBv3: 103.41 (SE +/- 0.77, N = 15)
HBv2: 91.26 (SE +/- 0.61, N = 15)
HC: 59.73 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -pthread

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Images / Sec, More Is Better (Intel Open Image Denoise 2.0)
HBv4: 1.29 (SE +/- 0.00, N = 3)
HBv3: 0.79 (SE +/- 0.01, N = 3)
HBv2: 1.04 (SE +/- 0.01, N = 3)
HC: 0.88 (SE +/- 0.00, N = 3)

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 256.35 (SE +/- 1.07, N = 3)
HBv3: 103.51 (SE +/- 1.41, N = 15)
HBv2: 91.54 (SE +/- 0.67, N = 15)
HC: 58.36 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 122.98 (SE +/- 1.21, N = 15)
HBv3: 39.37 (SE +/- 0.33, N = 3)
HBv2: 51.20 (SE +/- 0.57, N = 3)
HC: 30.22 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 85.01 (SE +/- 4.77, N = 15)
HBv3: 56.87 (SE +/- 0.34, N = 3)
HBv2: 61.14 (SE +/- 1.30, N = 15)
HC: 58.91 (SE +/- 0.23, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 442.83 (SE +/- 14.97, N = 12)
HBv3: 198.66 (SE +/- 5.11, N = 15)
HBv2: 203.77 (SE +/- 1.85, N = 3)
HC: 123.63 (SE +/- 0.52, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: float-long - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 247.73 (SE +/- 4.85, N = 15)
HBv3: 105.36 (SE +/- 1.07, N = 6)
HBv2: 92.13 (SE +/- 1.33, N = 3)
HC: 59.55 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -pthread

Remhos

Test: Sample Remap Example

Seconds, Fewer Is Better (Remhos 1.0)
HBv4: 15.37 (SE +/- 0.14, N = 3)
HBv3: 15.26 (SE +/- 0.02, N = 3)
HBv2: 14.93 (SE +/- 0.07, N = 3)
HC: 27.38 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -pthread -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 622.58 (SE +/- 2.25, N = 3)
HBv3: 254.25 (SE +/- 2.52, N = 6)
HBv2: 191.78 (SE +/- 1.03, N = 3)
HC: 114.03 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 121.61 (SE +/- 1.20, N = 3)
HBv3: 38.45 (SE +/- 0.29, N = 11)
HBv2: 50.71 (SE +/- 0.29, N = 3)
HC: 30.17 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -pthread

NAS Parallel Benchmarks

Test / Class: FT.C

Total Mop/s, More Is Better (NAS Parallel Benchmarks 3.4)
HBv4: 69051.63 (SE +/- 745.61, N = 3)
HBv3: 36619.29 (SE +/- 194.34, N = 3)
HBv2: 41977.69 (SE +/- 219.43, N = 3)
HC: 20188.89 (SE +/- 13.57, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Pennant

Test: sedovbig

Hydro Cycle Time - Seconds, Fewer Is Better (Pennant 1.0.1)
HBv4: 3.581391 (SE +/- 0.018282, N = 3)
HBv3: 6.277107 (SE +/- 0.027453, N = 3)
HBv2: 5.915805 (SE +/- 0.011742, N = 3)
HC: 25.019560 (SE +/- 0.026763, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
HBv4: 0.752929 (SE +/- 0.001421, N = 3)
HBv3: 0.910091 (SE +/- 0.013826, N = 12)
HBv2: 1.407580 (SE +/- 0.014464, N = 3)
HC: 0.882446 (SE +/- 0.000702, N = 3)
MIN (as exported; per-system attribution not preserved): 0.69, 1.11, 0.83
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float - X Y Z: 512

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 596.23 (SE +/- 2.14, N = 3)
HBv3: 232.17 (SE +/- 1.85, N = 3)
HBv2: 190.95 (SE +/- 2.04, N = 3)
HC: 110.05 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: Stock - Precision: float-long - X Y Z: 512

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 590.93 (SE +/- 2.49, N = 3)
HBv3: 233.80 (SE +/- 0.15, N = 3)
HBv2: 189.21 (SE +/- 1.02, N = 3)
HC: 110.20 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float-long - X Y Z: 512

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 624.95 (SE +/- 4.23, N = 3)
HBv3: 257.42 (SE +/- 2.91, N = 3)
HBv2: 191.14 (SE +/- 1.39, N = 3)
HC: 113.94 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -O3 -pthread

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Images / Sec, More Is Better (Intel Open Image Denoise 2.0)
HBv4: 3.08 (SE +/- 0.02, N = 3)
HBv3: 1.68 (SE +/- 0.02, N = 3)
HBv2: 2.08 (SE +/- 0.01, N = 3)
HC: 1.82 (SE +/- 0.02, N = 3)

Pennant

Test: leblancbig

Hydro Cycle Time - Seconds, Fewer Is Better (Pennant 1.0.1)
HBv4: 2.122074 (SE +/- 0.029043, N = 3)
HBv3: 3.649317 (SE +/- 0.006682, N = 3)
HBv2: 3.466885 (SE +/- 0.009233, N = 3)
HC: 10.645480 (SE +/- 0.017495, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi

NAS Parallel Benchmarks

Test / Class: CG.C

Total Mop/s, More Is Better (NAS Parallel Benchmarks 3.4)
HBv4: 40326.29 (SE +/- 77.41, N = 3)
HBv3: 21551.48 (SE +/- 20.87, N = 3)
HBv2: 22314.02 (SE +/- 108.02, N = 3)
HC: 14356.20 (SE +/- 233.39, N = 15)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 123.41 (SE +/- 1.16, N = 3)
HBv3: 38.57 (SE +/- 0.14, N = 3)
HBv2: 50.08 (SE +/- 0.55, N = 3)
HC: 30.27 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3 -pthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
HBv4: 123.39 (SE +/- 1.65, N = 3)
HBv3: 39.81 (SE +/- 0.14, N = 3)
HBv2: 50.90 (SE +/- 0.55, N = 3)
HC: 30.12 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -pthread

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
HBv4: 0.306141 (SE +/- 0.002422, N = 3)
HBv3: 0.624233 (SE +/- 0.039917, N = 15)
HBv2: 6.838250 (SE +/- 0.032665, N = 3)
HC: 2.079200 (SE +/- 0.093711, N = 12)
MIN (as exported; per-system attribution not preserved): 5.97
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

NAS Parallel Benchmarks

Test / Class: MG.C

Total Mop/s, More Is Better (NAS Parallel Benchmarks 3.4)
HBv4: 108125.86 (SE +/- 748.94, N = 13)
HBv3: 46705.47 (SE +/- 613.84, N = 15)
HBv2: 43410.71 (SE +/- 354.81, N = 3)
HC: 19508.00 (SE +/- 24.47, N = 3)
1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
HBv4: 0.276472 (SE +/- 0.000440, N = 3)
HBv3: 0.556741 (SE +/- 0.001799, N = 3)
HBv2: 0.573878 (SE +/- 0.002431, N = 3)
HC: 3.111210 (SE +/- 0.015370, N = 3)
MIN (as exported; per-system attribution not preserved): 0.5, 0.47, 1.73
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
HBv4: 0.582806 (SE +/- 0.001551, N = 3, MIN: 0.56)
HBv3: 1.408620 (SE +/- 0.003506, N = 3, MIN: 1.36)
HBv2: 1.610020 (SE +/- 0.021847, N = 3, MIN: 1.49)
HC: 1.244800 (SE +/- 0.002723, N = 3, MIN: 1.22)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

libxsmm

M N K: 64

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HBv4: 5719.0 (SE +/- 226.33, N = 12)
HBv3: 2435.6 (SE +/- 17.54, N = 12)
HBv2: 411.7 (SE +/- 18.03, N = 13)
HC: 731.6 (SE +/- 5.15, N = 15)

libxsmm

M N K: 32

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HBv4: 5006.8 (SE +/- 443.26, N = 12)
HBv3: 1506.3 (SE +/- 32.59, N = 14)
HBv2: 195.1 (SE +/- 3.90, N = 12)
HC: 379.9 (SE +/- 2.82, N = 11)

libxsmm

M N K: 256

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HBv4: 6983.2 (SE +/- 63.60, N = 3)
HBv3: 2032.1 (SE +/- 23.34, N = 3)
HBv2: 1444.2 (SE +/- 51.69, N = 9)
HC: 898.8 (SE +/- 13.41, N = 12)

libxsmm

M N K: 128

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HBv4: 6585.6 (SE +/- 59.85, N = 3)
HBv3: 2284.6 (SE +/- 29.40, N = 3)
HBv2: 1519.5 (SE +/- 153.42, N = 6)
HC: 1328.4 (SE +/- 11.02, N = 3)


Phoronix Test Suite v10.8.5