AMD EPYC 9684X 3D V-Cache Benchmark

AMD EPYC 9684X 96-Core testing by Michael Larabel for a future article. Various benchmarks were conducted with the EPYC 9684X in a 1P configuration and then repeated after disabling 3D V-Cache from the BIOS, giving a direct comparison of the 3D V-Cache (3DV) impact. CPU thermal, power, and frequency data were also monitored for a future follow-up article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2307201-PTS-GENOAX3D86

Run Management

Result Identifier: Default
Date Run: July 17 2023
Test Duration: 1 Day, 46 Minutes

Result Identifier: 3DV Disabled
Date Run: July 19 2023
Test Duration: 1 Day, 4 Hours, 53 Minutes

Average Test Duration: 1 Day, 2 Hours, 50 Minutes


Processor: AMD EPYC 9684X 96-Core @ 2.55GHz (96 Cores / 192 Threads)
Motherboard: AMD Titanite_4G (RTI1007B BIOS)
Chipset: AMD Device 14a4
Memory: 768GB
Disk: 2 x 1920GB SAMSUNG MZWLJ1T9HBJR-00007
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.04
Kernel: 5.19.0-41-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 11.3.0
File-System: ext4
Screen Resolution: 1024x768

System Logs

- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xa101121
- Python 3.10.6
- Security mitigations: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Default vs. 3DV Disabled Comparison

Relative difference per test (benchmark, test: difference). Test labels are abbreviated as in the original chart.

ASKAP, tConvolve OpenMP - Degridding: 109.5%
OpenFOAM, d.M.M.S - Execution Time: 104.6%
Stress-NG, Matrix 3D Math: 103.5%
ASKAP, tConvolve MPI - Gridding: 76.2%
libxsmm, 64: 71.9%
NAS Parallel Benchmarks, SP.C: 64.4%
libxsmm, 32: 63.6%
Stress-NG, AVL Tree: 62.3%
OpenVINO, V.D.F - CPU: 58.4%
OpenVINO, V.D.F - CPU: 58.3%
ASKAP, tConvolve MPI - Degridding: 54.1%
HeFFTe - Highly Efficient FFT for Exascale, r2c - FFTW - float - 512: 33.7%
OpenFOAM, d.L.M.S - Execution Time: 30.8%
NAS Parallel Benchmarks, LU.C: 27.6%
NAS Parallel Benchmarks, BT.C: 23.1%
LeelaChessZero, Eigen: 21.6%
Xmrig, Monero - 1M: 19.8%
HeFFTe - Highly Efficient FFT for Exascale, c2c - FFTW - float - 512: 18.4%
NAS Parallel Benchmarks, IS.D: 16.7%
libxsmm, 128: 16.2%
Stress-NG, CPU Cache: 15.5%
NAS Parallel Benchmarks, MG.C: 14.7%
NAS Parallel Benchmarks, CG.C: 14.1%
HeFFTe - Highly Efficient FFT for Exascale, c2c - FFTW - double - 256: 13.8%
HeFFTe - Highly Efficient FFT for Exascale, r2c - FFTW - double - 512: 13.6%
ASKAP, tConvolve OpenMP - Gridding: 13.6%
PyHPC Benchmarks, CPU - Numpy - 4194304 - Equation of State: 13.4%
Xcompact3d Incompact3d, i.i.1.C.P.D: 13.3%
LeelaChessZero, BLAS: 13.2%
HeFFTe - Highly Efficient FFT for Exascale, r2c - FFTW - double - 256: 13.2%
Xcompact3d Incompact3d, i.i.1.C.P.D: 12.8%
Google Draco, Church Facade: 12%
Stress-NG, Malloc: 11.2%
OpenFOAM, d.L.M.S - Mesh Time: 10.9%
High Performance Conjugate Gradient, 104 104 104 - 60: 10.9%
WRF, conus 2.5km: 10.4%
Stress-NG, Matrix Math: 10.4%
GROMACS, MPI CPU - water_GMX50_bare: 10.3%
Embree, Pathtracer ISPC - Crown: 9.8%
Palabos, 400: 9.5%
GPAW, Carbon Nanotube: 9.4%
Embree, Pathtracer ISPC - Asian Dragon Obj: 9.1%
libxsmm, 256: 9%
Embree, Pathtracer ISPC - Asian Dragon: 8.5%
LULESH: 7.7%
ASKAP, tConvolve MT - Degridding: 7.7%
PyHPC Benchmarks, CPU - Numpy - 4194304 - Isoneutral Mixing: 7.5%
Stress-NG, Pipe: 7.3%
Stress-NG, Futex: 7.3%
ASKAP, tConvolve MT - Gridding: 6.8%
Ngspice, C7552: 6.7%
ASTC Encoder, Exhaustive: 6.6%
Stress-NG, Fused Multiply-Add: 6.6%
Palabos, 500: 6.6%
miniFE, Small: 6.2%
Timed Linux Kernel Compilation, allmodconfig: 6.2%
Neural Magic DeepSparse, N.Q.A.B.b.u.S.1.P - A.M.S: 6.1%
ASTC Encoder, Thorough: 6%
Neural Magic DeepSparse, N.Q.A.B.b.u.S.1.P - A.M.S: 6%
ACES DGEMM, S.F.P.R: 6%
Timed Godot Game Engine Compilation, Time To Compile: 5.9%
Timed LLVM Compilation, Ninja: 5.8%
Palabos, 1000: 5.7%
OSPRay Studio, 2 - 4K - 1 - Path Tracer: 5.7%
Stress-NG, V.F.P: 5.6%
OSPRay Studio, 2 - 4K - 16 - Path Tracer: 5.6%
OSPRay Studio, 1 - 4K - 16 - Path Tracer: 5.5%
OSPRay Studio, 3 - 4K - 1 - Path Tracer: 5.4%
Google Draco, Lion: 5.3%
OSPRay Studio, 3 - 4K - 32 - Path Tracer: 5.3%
OSPRay Studio, 1 - 4K - 1 - Path Tracer: 5.2%
OSPRay Studio, 2 - 4K - 32 - Path Tracer: 5%
Blender, Barbershop - CPU-Only: 5%
OSPRay, gravity_spheres_volume/dim_512/ao/real_time: 5%
OSPRay, gravity_spheres_volume/dim_512/scivis/real_time: 5%
OSPRay Studio, 3 - 4K - 16 - Path Tracer: 4.9%
HeFFTe - Highly Efficient FFT for Exascale, c2c - FFTW - double - 512: 4.9%
OSPRay Studio, 1 - 4K - 32 - Path Tracer: 4.9%
NAMD, ATPase Simulation - 327,506 Atoms: 4.8%
Timed Node.js Compilation, Time To Compile: 4.7%
Stress-NG, Memory Copying: 4.7%
High Performance Conjugate Gradient, 144 144 144 - 60: 4.6%
OSPRay, particle_volume/scivis/real_time: 4.3%
NAS Parallel Benchmarks, EP.D: 4.3%
OSPRay, particle_volume/ao/real_time: 4.3%
Timed LLVM Compilation, Unix Makefiles: 4.2%
Stockfish, Total Time: 4.1%
OpenVINO, P.D.F - CPU: 4%
OpenVINO, P.D.F - CPU: 3.9%
Neural Magic DeepSparse, C.S.9.P.Y.P - A.M.S: 3.8%
Blender, Classroom - CPU-Only: 3.8%
srsRAN Project, P.P.B.T.T: 3.7%
Neural Magic DeepSparse, C.S.9.P.Y.P - A.M.S: 3.7%
ASTC Encoder, Medium: 3.6%
Numpy Benchmark: 3.6%
OpenVINO, P.V.B.D.F - CPU: 3.6%
OSPRay, gravity_spheres_volume/dim_512/pathtracer/real_time: 3.6%
OpenVINO, P.V.B.D.F - CPU: 3.5%
Algebraic Multi-Grid Benchmark: 3.5%
Stress-NG, Semaphores: 3.5%
TensorFlow, CPU - 512 - GoogLeNet: 3.3%
Blender, Pabellon Barcelona - CPU-Only: 3.3%
Blender, Fishy Cat - CPU-Only: 3.2%
OpenFOAM, d.M.M.S - Mesh Time: 2.8%
PETSc, Streams: 2.7%
High Performance Conjugate Gradient, 160 160 160 - 60: 2.6%
Stress-NG, CPU Stress: 2.6%
Timed PHP Compilation, Time To Compile: 2.6%
OpenVINO, P.D.F - CPU: 2.6%
Stress-NG, Mutex: 2.6%
OpenVINO, P.D.F - CPU: 2.4%
Liquid-DSP, 192 - 256 - 512: 2.4%
ASKAP, H.C.O: 2.4%
Stress-NG, Vector Math: 2.4%
Stress-NG, Wide Vector Math: 2.3%
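The comparison percentages can be reproduced from the raw per-test results. The sketch below is an illustration only: the helper name `relative_gain_percent` is hypothetical, and the formula (better result divided by worse result, minus one) is an assumption that happens to match the figures reported on this page, not a statement of how the Phoronix Test Suite computes them internally.

```python
def relative_gain_percent(default: float, disabled: float, higher_is_better: bool) -> float:
    """Percent advantage of the Default run over the 3DV Disabled run.

    Assumed formula: ratio of the two results, oriented so that a value
    above zero means the Default (3D V-Cache enabled) run was better.
    """
    if higher_is_better:
        ratio = default / disabled
    else:
        # For "Fewer Is Better" metrics (seconds, ms) the smaller value wins.
        ratio = disabled / default
    return (ratio - 1.0) * 100.0


# Values taken from the per-test results on this page.
samples = [
    ("OpenFOAM drivaerFastback, Medium Mesh Size - Execution Time", 181.02, 370.31, False),
    ("WRF conus 2.5km", 11269.26, 12439.01, False),
    ("LeelaChessZero Eigen", 11884, 9770, True),
]

for name, default, disabled, higher_is_better in samples:
    gain = relative_gain_percent(default, disabled, higher_is_better)
    print(f"{name}: {gain:.1f}%")
```

Rounded to one decimal place, these three cases give 104.6%, 10.4%, and 21.6%, which agree with the corresponding entries in the comparison.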

[Detailed system result table: Default and 3DV Disabled values for every test in this result file, covering the wrf, hpcg, openfoam, petsc, libxsmm, tensorflow, lczero, palabos, askap, build-linux-kernel, build-llvm, stockfish, openvino, numpy, blender, ospray, build-gem5, ospray-studio, ngspice, build-nodejs, build-godot, stress-ng, pyhpc, deepsparse, npb, gpaw, build-php, srsran, liquid-dsp, heffte, amg, gromacs, namd, remhos, lulesh, embree, xmrig, cloverleaf, astcenc, draco, incompact3d, minife, and mt-dgemm test profiles. The flattened numeric columns could not be separated reliably; per-test values are given in the result sections.]

WRF

WRF, the Weather Research and Forecasting Model, is a "next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. It features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility." Learn more via the OpenBenchmarking.org test page.

WRF 4.2.2 - Input: conus 2.5km (Seconds, Fewer Is Better)
3DV Disabled: 12439.01
Default: 11269.26
1. (F9X) gfortran options: -O2 -ftree-vectorize -funroll-loops -ffree-form -fconvert=big-endian -frecord-marker=4 -fallow-invalid-boz -lesmf_time -lwrfio_nf -lnetcdff -lnetcdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient benchmark, a newer scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1 - X Y Z: 192 192 192 - RT: 60 (GFLOP/s, More Is Better)
3DV Disabled: 22.57 (SE +/- 0.20, N = 9)
Default: 22.83 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
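The reported standard errors (SE) help judge whether a gap between the two runs is larger than run-to-run noise. The following is a minimal sketch, assuming the two runs are independent so their standard errors combine in quadrature; the function name and the threshold of 2 are illustrative rules of thumb, not anything the Phoronix Test Suite applies.

```python
import math


def gap_in_standard_errors(a: float, se_a: float, b: float, se_b: float) -> float:
    """Absolute difference between two means, in units of their combined SE."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(a - b) / combined_se


# HPCG 192 192 192: 3DV Disabled 22.57 (SE 0.20) vs. Default 22.83 (SE 0.18)
hpcg_192 = gap_in_standard_errors(22.57, 0.20, 22.83, 0.18)

# HPCG 104 104 104: 3DV Disabled 24.20 (SE 0.34) vs. Default 26.83 (SE 0.82)
hpcg_104 = gap_in_standard_errors(24.20, 0.34, 26.83, 0.82)

THRESHOLD = 2.0  # hypothetical cut-off; choose to taste
for label, gap in [("HPCG 192", hpcg_192), ("HPCG 104", hpcg_104)]:
    verdict = "likely a real difference" if gap > THRESHOLD else "within noise"
    print(f"{label}: {gap:.2f} combined SEs ({verdict})")
```

Under these assumptions the 192-cubed HPCG result sits within about one combined standard error, while the 104-cubed result is separated by roughly three, which suggests the V-Cache benefit in HPCG shows up mainly at the smaller problem size.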

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Execution Time (Seconds, Fewer Is Better)
3DV Disabled: 11763.37
Default: 8994.47
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM 10 - Input: drivaerFastback, Large Mesh Size - Mesh Time (Seconds, Fewer Is Better)
3DV Disabled: 649.02
Default: 585.14
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 - X Y Z: 160 160 160 - RT: 60 (GFLOP/s, More Is Better)
3DV Disabled: 23.23 (SE +/- 0.23, N = 9)
Default: 23.84 (SE +/- 0.34, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

PETSc

PETSc, the Portable, Extensible Toolkit for Scientific Computation, is for the scalable (parallel) solution of scientific applications modeled by partial differential equations. This test profile runs the PETSc "make streams" benchmark and records the throughput rate when all available cores are utilized for the MPI Streams build. Learn more via the OpenBenchmarking.org test page.

PETSc 3.19 - Test: Streams (MB/s, More Is Better)
3DV Disabled: 265446.59 (SE +/- 799.96, N = 3)
Default: 272616.80 (SE +/- 6007.86, N = 9)
1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -ludev -lpciaccess -lm

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 - X Y Z: 144 144 144 - RT: 60 (GFLOP/s, More Is Better)
3DV Disabled: 23.52 (SE +/- 0.21, N = 3)
Default: 24.61 (SE +/- 0.44, N = 9)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
3DV Disabled: 2506.8 (SE +/- 6.05, N = 3)
Default: 2913.4 (SE +/- 27.97, N = 9)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60 (GFLOP/s, More Is Better)
3DV Disabled: 24.20 (SE +/- 0.34, N = 9)
Default: 26.83 (SE +/- 0.82, N = 9)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
3DV Disabled: 395.85 (SE +/- 5.03, N = 12)
Default: 409.02 (SE +/- 3.82, N = 12)

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
3DV Disabled: 8619 (SE +/- 93.12, N = 4)
Default: 9760 (SE +/- 103.04, N = 5)
1. (CXX) g++ options: -flto -pthread

LeelaChessZero 0.28 - Backend: Eigen (Nodes Per Second, More Is Better)
3DV Disabled: 9770 (SE +/- 103.20, N = 5)
Default: 11884 (SE +/- 72.53, N = 3)
1. (CXX) g++ options: -flto -pthread

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

Palabos 2.3 - Grid Size: 400 (Mega Site Updates Per Second, More Is Better)
3DV Disabled: 290.19 (SE +/- 0.38, N = 3)
Default: 317.82 (SE +/- 2.86, N = 12)
1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and the Convolutional Resampling Benchmark (tConvolve); some earlier ASKAP benchmarks are also included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
3DV Disabled: 14487.7 (SE +/- 45.70, N = 3)
Default: 15603.2 (SE +/- 16.78, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
3DV Disabled: 12713.5 (SE +/- 1.82, N = 3)
Default: 13582.8 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

libxsmm

libxsmm 2-1.17-3645 - M N K: 256 (GFLOPS/s, More Is Better)
3DV Disabled: 2811.7 (SE +/- 15.39, N = 3)
Default: 3064.4 (SE +/- 32.92, N = 5)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation 6.1 - Build: allmodconfig (Seconds, Fewer Is Better)
3DV Disabled: 215.05 (SE +/- 0.76, N = 3)
Default: 202.56 (SE +/- 0.58, N = 3)

Timed LLVM Compilation

This test times how long it takes to compile/build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

Timed LLVM Compilation 16.0 - Build System: Unix Makefiles (Seconds, Fewer Is Better)
3DV Disabled: 191.73 (SE +/- 1.10, N = 3)
Default: 184.05 (SE +/- 1.07, N = 3)

Palabos

Palabos 2.3 - Grid Size: 500 (Mega Site Updates Per Second, More Is Better)
3DV Disabled: 308.47 (SE +/- 0.57, N = 3)
Default: 328.71 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Stockfish

This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.

Stockfish 15 - Total Time (Nodes Per Second, More Is Better)
3DV Disabled: 285533040 (SE +/- 5254044.14, N = 12)
Default: 297289892 (SE +/- 1047576.89, N = 3)
1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto=jobserver

Palabos

Palabos 2.3 - Grid Size: 1000 (Mega Site Updates Per Second, More Is Better)
3DV Disabled: 350.09 (SE +/- 0.97, N = 3)
Default: 370.19 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU (ms, Fewer Is Better)
3DV Disabled: 19.67 (SE +/- 0.14, N = 14; MIN: 5.14 / MAX: 51.62)
Default: 12.42 (SE +/- 0.00, N = 3; MIN: 5.8 / MAX: 47.23)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU (FPS, More Is Better)
3DV Disabled: 2438.56 (SE +/- 18.66, N = 14)
Default: 3860.64 (SE +/- 0.30, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
3DV Disabled: 565.91 (SE +/- 2.45, N = 3)
Default: 586.50 (SE +/- 0.96, N = 3)

Blender

Blender 3.6 - Blend File: Barbershop - Compute: CPU-Only (Seconds, Fewer Is Better)
3DV Disabled: 149.19 (SE +/- 0.33, N = 3)
Default: 142.03 (SE +/- 0.01, N = 3)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
3DV Disabled: 24.06 (SE +/- 0.03, N = 3)
Default: 25.11 (SE +/- 0.01, N = 3)

Timed Gem5 Compilation

This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.

Timed Gem5 Compilation 21.2 - Time To Compile (Seconds, Fewer Is Better)
3DV Disabled: 139.70 (SE +/- 1.60, N = 3)
Default: 137.69 (SE +/- 0.72, N = 3)

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 42524 (SE +/- 38.63, N = 3)
Default: 40392 (SE +/- 43.02, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

OpenFOAM

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time (Seconds, Fewer Is Better)
3DV Disabled: 370.31
Default: 181.02
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better)
3DV Disabled: 111.44
Default: 108.37
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Ngspice

Ngspice is an open-source SPICE circuit simulator. Ngspice was originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile is making use of the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.

Ngspice 34 - Circuit: C2670 (Seconds, Fewer Is Better)
3DV Disabled: 119.71 (SE +/- 0.47, N = 3)
Default: 118.39 (SE +/- 0.26, N = 3)
1. (CC) gcc options: -O0 -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE

OSPRay Studio

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 36073 (SE +/- 46.77, N = 3)
Default: 34341 (SE +/- 51.48, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 35661 (SE +/- 22.19, N = 3)
Default: 34007 (SE +/- 22.67, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

Timed LLVM Compilation

Timed LLVM Compilation 16.0 - Build System: Ninja (Seconds, Fewer Is Better)
3DV Disabled: 119.45 (SE +/- 0.24, N = 3)
Default: 112.89 (SE +/- 0.31, N = 3)

OSPRay Studio

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 1329 (SE +/- 1.53, N = 3)
Default: 1261 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 1126 (SE +/- 1.33, N = 3)
Default: 1065 (SE +/- 2.33, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 1114 (SE +/- 0.67, N = 3)
Default: 1059 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

Ngspice

Ngspice is an open-source SPICE circuit simulator, originally based on the Berkeley SPICE electronic circuit simulator. Ngspice supports basic threading using OpenMP. This test profile uses the ISCAS 85 benchmark circuits. Learn more via the OpenBenchmarking.org test page.

Ngspice 34 - Circuit: C7552 (Seconds, Fewer Is Better)
3DV Disabled: 107.58 (SE +/- 0.13, N = 3)
Default: 100.85 (SE +/- 0.08, N = 3)
(CC) gcc options: -O0 -fopenmp -lm -lstdc++ -lfftw3 -lXaw -lXmu -lXt -lXext -lX11 -lXft -lfontconfig -lXrender -lfreetype -lSM -lICE

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript runtime built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation 19.8.1 - Time To Compile (Seconds, Fewer Is Better)
3DV Disabled: 110.20 (SE +/- 0.35, N = 3)
Default: 105.23 (SE +/- 0.15, N = 3)

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 18030 (SE +/- 28.09, N = 3)
Default: 17078 (SE +/- 6.03, N = 3)
(CXX) g++ options: -O3 -lm -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 17879 (SE +/- 41.97, N = 3)
Default: 16953 (SE +/- 50.26, N = 3)
(CXX) g++ options: -O3 -lm -ldl

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer (ms, Fewer Is Better)
3DV Disabled: 21222 (SE +/- 17.14, N = 3)
Default: 20223 (SE +/- 50.67, N = 3)
(CXX) g++ options: -O3 -lm -ldl

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better)
3DV Disabled: 24.13 (SE +/- 0.01, N = 3)
Default: 25.16 (SE +/- 0.01, N = 3)

Timed Godot Game Engine Compilation

This test times how long it takes to compile the Godot Game Engine. Godot is a popular, open-source, cross-platform 2D/3D game engine; this test builds it with the SCons build system, targeting the X11 platform. Learn more via the OpenBenchmarking.org test page.

Timed Godot Game Engine Compilation 4.0 - Time To Compile (Seconds, Fewer Is Better)
3DV Disabled: 93.64 (SE +/- 0.11, N = 3)
Default: 88.38 (SE +/- 0.06, N = 3)

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Stress-NG 0.15.10 - Test: Pipe (Bogo Ops/s, More Is Better)
3DV Disabled: 56210994.92 (SE +/- 848784.40, N = 15)
Default: 60302912.67 (SE +/- 612854.45, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

PyHPC Benchmarks

PyHPC-Benchmarks is a suite of Python high performance computing benchmarks for execution on CPUs and GPUs using various popular Python HPC libraries. The PyHPC CPU-based benchmarks focus on sequential CPU performance. Learn more via the OpenBenchmarking.org test page.

PyHPC Benchmarks 3.0 - Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing (Seconds, Fewer Is Better)
3DV Disabled: 1.696 (SE +/- 0.005, N = 3)
Default: 1.578 (SE +/- 0.001, N = 3)

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Stress-NG 0.15.10 - Test: Matrix 3D Math (Bogo Ops/s, More Is Better)
3DV Disabled: 8156.46 (SE +/- 866.14, N = 12)
Default: 16595.28 (SE +/- 154.68, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better)
3DV Disabled: 1796.42 (SE +/- 6.35, N = 3; MIN: 961.13 / MAX: 2503)
Default: 1753.63 (SE +/- 4.64, N = 3; MIN: 955.99 / MAX: 2502.09)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU (FPS, More Is Better)
3DV Disabled: 26.43 (SE +/- 0.11, N = 3)
Default: 27.11 (SE +/- 0.10, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU (ms, Fewer Is Better)
3DV Disabled: 1835.23 (SE +/- 13.28, N = 3; MIN: 923.09 / MAX: 2505.65)
Default: 1766.54 (SE +/- 6.69, N = 3; MIN: 966.02 / MAX: 2507.19)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU (FPS, More Is Better)
3DV Disabled: 25.86 (SE +/- 0.19, N = 3)
Default: 26.89 (SE +/- 0.10, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resampling Benchmark (tConvolve), along with some earlier ASKAP benchmarks included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
3DV Disabled: 41569.4 (SE +/- 1120.94, N = 6)
Default: 73226.7 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
3DV Disabled: 38810.4 (SE +/- 363.62, N = 6)
Default: 59791.1 (SE +/- 380.83, N = 3)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
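For more-is-better throughput results like the two tConvolve MPI tests, the 3D V-Cache effect reads most naturally as a speedup ratio (Default divided by 3DV Disabled). A small Python sketch with the reported means; `speedup` is a hypothetical helper, not something the Phoronix Test Suite provides:

```python
def speedup(baseline: float, candidate: float) -> float:
    """Ratio of candidate throughput to baseline throughput (more is better)."""
    return candidate / baseline

# Means in Mpix/sec from the ASKAP 1.0 tConvolve MPI results.
gridding_speedup = speedup(41569.4, 73226.7)     # 3DV Disabled -> Default
degridding_speedup = speedup(38810.4, 59791.1)   # 3DV Disabled -> Default

print(f"Gridding: {gridding_speedup:.2f}x, Degridding: {degridding_speedup:.2f}x")
```

With these inputs the ratios come out to roughly 1.76x for gridding and 1.54x for degridding.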

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better)
3DV Disabled: 8.66 (SE +/- 0.00, N = 3; MIN: 5.11 / MAX: 29.49)
Default: 8.36 (SE +/- 0.00, N = 3; MIN: 4.99 / MAX: 31.02)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better)
3DV Disabled: 5538.58 (SE +/- 2.81, N = 3)
Default: 5732.75 (SE +/- 2.96, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
3DV Disabled: 8.55 (SE +/- 0.00, N = 3; MIN: 4.32 / MAX: 31.02)
Default: 8.45 (SE +/- 0.00, N = 3; MIN: 4.5 / MAX: 30.54)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, More Is Better)
3DV Disabled: 5603.34 (SE +/- 2.58, N = 3)
Default: 5672.06 (SE +/- 0.38, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
3DV Disabled: 24.70 (SE +/- 0.07, N = 3)
Default: 25.93 (SE +/- 0.04, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time (Items Per Second, More Is Better)
3DV Disabled: 25.60 (SE +/- 0.01, N = 3)
Default: 26.51 (SE +/- 0.02, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
3DV Disabled: 25.52 (SE +/- 0.06, N = 3)
Default: 26.79 (SE +/- 0.12, N = 3)

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
3DV Disabled: 433.14 (SE +/- 0.14, N = 3)
Default: 417.86 (SE +/- 0.35, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
3DV Disabled: 109.95 (SE +/- 0.06, N = 3)
Default: 114.18 (SE +/- 0.07, N = 3)

Blender

Blender 3.6 - Blend File: Pabellon Barcelona - Compute: CPU-Only (Seconds, Fewer Is Better)
3DV Disabled: 51.23 (SE +/- 0.30, N = 3)
Default: 49.61 (SE +/- 0.08, N = 3)

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4 - Test / Class: EP.D (Total Mop/s, More Is Better)
3DV Disabled: 10257.81 (SE +/- 149.17, N = 15)
Default: 10697.90 (SE +/- 40.63, N = 4)
(F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Open MPI 4.1.2

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
3DV Disabled: 153.87 (SE +/- 0.07, N = 3)
Default: 145.05 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
3DV Disabled: 311.20 (SE +/- 0.17, N = 3)
Default: 329.98 (SE +/- 0.25, N = 3)

Blender

Blender 3.6 - Blend File: Classroom - Compute: CPU-Only (Seconds, Fewer Is Better)
3DV Disabled: 42.05 (SE +/- 0.06, N = 3)
Default: 40.51 (SE +/- 0.11, N = 3)

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
3DV Disabled: 61.01 (SE +/- 0.02, N = 3)
Default: 60.06 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
3DV Disabled: 785.58 (SE +/- 0.28, N = 3)
Default: 797.95 (SE +/- 0.46, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
3DV Disabled: 141.31 (SE +/- 0.05, N = 3)
Default: 139.02 (SE +/- 0.13, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
3DV Disabled: 338.83 (SE +/- 0.08, N = 3)
Default: 344.41 (SE +/- 0.41, N = 3)

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

GPAW 23.6 - Input: Carbon Nanotube (Seconds, Fewer Is Better)
3DV Disabled: 38.05 (SE +/- 0.05, N = 3)
Default: 34.77 (SE +/- 0.18, N = 3)
(CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

Timed PHP Compilation

This test times how long it takes to build PHP. Learn more via the OpenBenchmarking.org test page.

Timed PHP Compilation 8.1.9 - Time To Compile (Seconds, Fewer Is Better)
3DV Disabled: 34.31 (SE +/- 0.29, N = 3)
Default: 33.45 (SE +/- 0.27, N = 3)

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

srsRAN Project 23.5 - Test: PUSCH Processor Benchmark, Throughput Total (Mbps, More Is Better)
3DV Disabled: 17748.8 (SE +/- 16.77, N = 3)
Default: 18408.1 (SE +/- 23.08, N = 3)
(CXX) g++ options: -march=native -mfma -O3 -fno-trapping-math -fno-math-errno -lgtest

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Stress-NG 0.15.10 - Test: CPU Cache (Bogo Ops/s, More Is Better)
3DV Disabled: 1209699.75 (SE +/- 17248.98, N = 3)
Default: 1397574.75 (SE +/- 11233.16, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Malloc (Bogo Ops/s, More Is Better)
3DV Disabled: 323948044.10 (SE +/- 35157.83, N = 3)
Default: 360348999.65 (SE +/- 494466.73, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP 1.6 - Threads: 192 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better)
3DV Disabled: 1257366667 (SE +/- 1211518.79, N = 3)
Default: 1287866667 (SE +/- 762306.44, N = 3)
(CC) gcc options: -O3 -pthread -lm -lc -lliquid

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Stress-NG 0.15.10 - Test: AVL Tree (Bogo Ops/s, More Is Better)
3DV Disabled: 1026.40 (SE +/- 0.16, N = 3)
Default: 1665.57 (SE +/- 0.40, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Matrix Math (Bogo Ops/s, More Is Better)
3DV Disabled: 378776.65 (SE +/- 16.98, N = 3)
Default: 418033.13 (SE +/- 67.17, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Futex (Bogo Ops/s, More Is Better)
3DV Disabled: 3716134.14 (SE +/- 37571.61, N = 3)
Default: 3985709.56 (SE +/- 13786.79, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Vector Math (Bogo Ops/s, More Is Better)
3DV Disabled: 533171.17 (SE +/- 28.98, N = 3)
Default: 545725.73 (SE +/- 61.16, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Wide Vector Math (Bogo Ops/s, More Is Better)
3DV Disabled: 3407854.77 (SE +/- 2839.05, N = 3)
Default: 3485374.33 (SE +/- 964.27, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Fused Multiply-Add (Bogo Ops/s, More Is Better)
3DV Disabled: 71844167.06 (SE +/- 33044.58, N = 3)
Default: 76566577.12 (SE +/- 17500.16, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Vector Floating Point (Bogo Ops/s, More Is Better)
3DV Disabled: 244276.80 (SE +/- 430.31, N = 3)
Default: 257925.61 (SE +/- 151.27, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Memory Copying (Bogo Ops/s, More Is Better)
3DV Disabled: 31510.80 (SE +/- 4.41, N = 3)
Default: 32994.18 (SE +/- 1.22, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: CPU Stress (Bogo Ops/s, More Is Better)
3DV Disabled: 206993.59 (SE +/- 296.75, N = 3)
Default: 212380.76 (SE +/- 233.96, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Semaphores (Bogo Ops/s, More Is Better)
3DV Disabled: 215763708.61 (SE +/- 2598030.86, N = 3)
Default: 223213939.86 (SE +/- 2019374.88, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

Stress-NG 0.15.10 - Test: Mutex (Bogo Ops/s, More Is Better)
3DV Disabled: 48488923.41 (SE +/- 127023.50, N = 3)
Default: 49736118.06 (SE +/- 152725.58, N = 3)
(CXX) g++ options: -O2 -std=gnu99 -lc

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
3DV Disabled: 64.70 (SE +/- 0.01, N = 3)
Default: 67.85 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3

PyHPC Benchmarks

PyHPC-Benchmarks is a suite of Python high performance computing benchmarks for execution on CPUs and GPUs using various popular Python HPC libraries. The PyHPC CPU-based benchmarks focus on sequential CPU performance. Learn more via the OpenBenchmarking.org test page.

PyHPC Benchmarks 3.0 - Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State (Seconds, Fewer Is Better)
3DV Disabled: 0.870 (SE +/- 0.001, N = 3)
Default: 0.767 (SE +/- 0.004, N = 3)

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
3DV Disabled: 2332743667 (SE +/- 1555438.45, N = 3)
Default: 2414283000 (SE +/- 4643811.51, N = 3)
(CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2023 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
3DV Disabled: 10.69 (SE +/- 0.01, N = 3)
Default: 11.79 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
3DV Disabled: 0.25909 (SE +/- 0.00025, N = 4)
Default: 0.24733 (SE +/- 0.00040, N = 3)

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resampling Benchmark (tConvolve), along with some earlier ASKAP benchmarks included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
3DV Disabled: 26323.0 (SE +/- 302.56, N = 8)
Default: 55153.0 (SE +/- 1901.83, N = 7)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
3DV Disabled: 23436.7 (SE +/- 460.19, N = 12)
Default: 26625.6 (SE +/- 0.00, N = 7)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Remhos 1.0 - Test: Sample Remap Example (Seconds, Fewer Is Better)
3DV Disabled: 10.46 (SE +/- 0.10, N = 7)
Default: 10.29 (SE +/- 0.10, N = 6)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
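When two means are this close, it helps to compare the gap against the reported standard errors before reading anything into it. The Python sketch below divides the difference of the Remhos means (10.46 and 10.29 seconds) by their combined standard error (both SE values are 0.10). It is a rough heuristic that assumes the two runs are independent, not a formal significance test, and `gap_in_standard_errors` is a hypothetical helper name:

```python
import math

def gap_in_standard_errors(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Difference of two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se

# Remhos 1.0, Sample Remap Example: 3DV Disabled vs. Default.
remhos_gap = gap_in_standard_errors(10.46, 0.10, 10.29, 0.10)
print(f"Remhos gap: {remhos_gap:.2f} combined standard errors")
```

For Remhos the gap is about 1.2 combined standard errors, so under this heuristic the two configurations are hard to tell apart; results such as ASKAP tConvolve, where the gap is many times the standard error, are a different matter.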

Blender

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
3DV Disabled: 21.29 (SE +/- 0.04, N = 3)
Default: 20.62 (SE +/- 0.09, N = 3)

LULESH

LULESH is the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. Learn more via the OpenBenchmarking.org test page.

LULESH 2.0.3 (z/s, More Is Better)
3DV Disabled: 28515.36 (SE +/- 202.03, N = 4)
Default: 30715.33 (SE +/- 182.94, N = 4)
(CXX) g++ options: -O3 -fopenmp -lm -lmpi_cxx -lmpi

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 4.1 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better)
3DV Disabled: 113.24 (SE +/- 0.06, N = 4; MIN: 111.56 / MAX: 116.79)
Default: 123.51 (SE +/- 0.05, N = 4; MIN: 121.61 / MAX: 126.87)

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512 (GFLOP/s, More Is Better)
3DV Disabled: 129.93 (SE +/- 0.17, N = 4)
Default: 153.84 (SE +/- 0.85, N = 4)
(CXX) g++ options: -O3

Xmrig

Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.

Xmrig 6.18.1 - Variant: Monero - Hash Count: 1M (H/s, More Is Better)
3DV Disabled: 58160.0 (SE +/- 265.77, N = 3)
Default: 69684.3 (SE +/- 83.67, N = 4)
(CXX) g++ options: -fexceptions -fno-rtti -maes -O3 -Ofast -static-libgcc -static-libstdc++ -rdynamic -lssl -lcrypto -luv -lpthread -lrt -ldl -lhwloc

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512 (GFLOP/s, More Is Better)
3DV Disabled: 118.64 (SE +/- 0.47, N = 3)
Default: 134.82 (SE +/- 0.58, N = 4)
(CXX) g++ options: -O3

ASKAP

ASKAP is a set of benchmarks from the Australian SKA Pathfinder. The principal ASKAP benchmarks are the Hogbom Clean Benchmark (tHogbomClean) and Convolutional Resampling Benchmark (tConvolve), along with some earlier ASKAP benchmarks included for OpenCL and CUDA execution of tConvolve. Learn more via the OpenBenchmarking.org test page.

ASKAP 1.0 - Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
3DV Disabled: 1183.56 (SE +/- 6.92, N = 4)
Default: 1212.17 (SE +/- 4.24, N = 4)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently uses CloverLeaf's OpenMP version and benchmarks with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.

CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
3DV Disabled: 10.48 (SE +/- 0.03, N = 5)
Default: 10.29 (SE +/- 0.04, N = 5)
(F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Blender

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
3DV Disabled: 16.57 (SE +/- 0.03, N = 3)
Default: 16.28 (SE +/- 0.10, N = 3)

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4 - Test / Class: BT.C (Total Mop/s, More Is Better)
3DV Disabled: 255679.10 (SE +/- 1397.51, N = 4)
Default: 314777.90 (SE +/- 1672.02, N = 5)
(F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Open MPI 4.1.2

NAS Parallel Benchmarks 3.4 - Test / Class: SP.C (Total Mop/s, More Is Better)
3DV Disabled: 126301.03 (SE +/- 380.05, N = 4)
Default: 207614.70 (SE +/- 1595.09, N = 6)
(F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Open MPI 4.1.2

ASTC Encoder

ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile runs a coding test covering both compression and decompression. Learn more via the OpenBenchmarking.org test page.

ASTC Encoder 4.0 - Preset: Exhaustive (MT/s, More Is Better)
3DV Disabled: 5.7598 (SE +/- 0.0018, N = 4)
Default: 6.1411 (SE +/- 0.0017, N = 4)
(CXX) g++ options: -O3 -flto -pthread

Google Draco

Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.

Google Draco 1.5.6 - Model: Church Facade (ms, Fewer Is Better)
3DV Disabled: 6768 (SE +/- 17.42, N = 6)
Default: 6043 (SE +/- 3.90, N = 6)
(CXX) g++ options: -O3

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better)
3DV Disabled: 2.46395473 (SE +/- 0.01883751, N = 15)
Default: 2.17521613 (SE +/- 0.04209465, N = 15)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniFE

MiniFE Finite Element is an application for unstructured implicit finite element codes. Learn more via the OpenBenchmarking.org test page.

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
3DV Disabled: 51223.5 (SE +/- 56.96, N = 5)
Default: 54408.2 (SE +/- 227.90, N = 5)
(CXX) g++ options: -O3 -fopenmp -lmpi_cxx -lmpi

ASTC Encoder

ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile runs a coding test covering both compression and decompression. Learn more via the OpenBenchmarking.org test page.

ASTC Encoder 4.0 - Preset: Thorough (MT/s, More Is Better)
3DV Disabled: 53.54 (SE +/- 0.01, N = 5)
Default: 56.78 (SE +/- 0.02, N = 6)
(CXX) g++ options: -O3 -flto -pthread

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
3DV Disabled: 8.66886444 (SE +/- 0.03638771, N = 5)
Default: 7.68528681 (SE +/- 0.06675033, N = 5)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4 - Test / Class: IS.D (Total Mop/s, More Is Better)
3DV Disabled: 4881.22 (SE +/- 36.66, N = 5)
Default: 5696.51 (SE +/- 29.57, N = 5)
(F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Open MPI 4.1.2

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
3DV Disabled: 38.26 (SE +/- 0.32, N = 8)
Default: 40.55 (SE +/- 0.13, N = 7)
(CC) gcc options: -O3 -march=native -fopenmp

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB and allows selecting among the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks 3.4 - Test / Class: CG.C (OpenBenchmarking.org; Total Mop/s, More Is Better)
  3DV Disabled: 52350.04 (SE +/- 512.83, N = 15)
  Default: 59737.94 (SE +/- 470.01, N = 10)
  1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  2. Open MPI 4.1.2
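
The "SE +/-" figures give a sense of run-to-run noise, so one way to judge whether a gap is meaningful is to express it in units of the combined standard error. The sketch below does this for the CG.C result above. It is a sketch under stated assumptions: that the reported SE is the standard error of the mean, that the two configurations were measured independently, and that `gap_in_standard_errors` is a hypothetical helper name rather than anything provided by the Phoronix Test Suite.

```python
import math


def gap_in_standard_errors(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined standard error.

    Assumes independent measurements, so the variances (SE squared) add.
    """
    combined_se = math.sqrt(se_a**2 + se_b**2)
    return abs(mean_a - mean_b) / combined_se


# NPB CG.C, Total Mop/s: Default vs. 3DV Disabled (values from the result above).
cg_c_gap = gap_in_standard_errors(59737.94, 470.01, 52350.04, 512.83)
print(f"CG.C gap: {cg_c_gap:.1f} combined standard errors")
```

A gap of many combined standard errors suggests the difference is unlikely to be run-to-run noise. A value near 1 or 2 would call for more runs before drawing conclusions.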

Google Draco

Draco is a library developed by Google for compressing/decompressing 3D geometric meshes and point clouds. This test profile uses some Artec3D PLY models as the sample 3D model input formats for Draco compression/decompression. Learn more via the OpenBenchmarking.org test page.

Google Draco 1.5.6 - Model: Lion (OpenBenchmarking.org; ms, Fewer Is Better)
  3DV Disabled: 5278 (SE +/- 18.65, N = 6)
  Default: 5011 (SE +/- 2.98, N = 7)
  1. (CXX) g++ options: -O3

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 64 (OpenBenchmarking.org; GFLOPS/s, More Is Better)
  3DV Disabled: 1428.3 (SE +/- 0.97, N = 6)
  Default: 2455.4 (SE +/- 4.53, N = 7)
  1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512 (OpenBenchmarking.org; GFLOP/s, More Is Better)
  3DV Disabled: 248.85 (SE +/- 0.81, N = 5)
  Default: 332.74 (SE +/- 1.86, N = 6)
  1. (CXX) g++ options: -O3

NAS Parallel Benchmarks


NAS Parallel Benchmarks 3.4 - Test / Class: LU.C (OpenBenchmarking.org; Total Mop/s, More Is Better)
  3DV Disabled: 264731.92 (SE +/- 1268.32, N = 5)
  Default: 337910.64 (SE +/- 1381.97, N = 6)
  1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  2. Open MPI 4.1.2

libxsmm


libxsmm 2-1.17-3645 - M N K: 32 (OpenBenchmarking.org; GFLOPS/s, More Is Better)
  3DV Disabled: 801.6 (SE +/- 0.89, N = 6)
  Default: 1311.3 (SE +/- 1.58, N = 8)
  1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 4.1 - Binary: Pathtracer ISPC - Model: Crown (OpenBenchmarking.org; Frames Per Second, More Is Better)
  3DV Disabled: 107.27 (SE +/- 0.10, N = 6; MIN: 104.48 / MAX: 111.58)
  Default: 117.83 (SE +/- 0.08, N = 7; MIN: 114.92 / MAX: 122.62)

Embree 4.1 - Binary: Pathtracer ISPC - Model: Asian Dragon (OpenBenchmarking.org; Frames Per Second, More Is Better)
  3DV Disabled: 132.77 (SE +/- 0.10, N = 7; MIN: 130.82 / MAX: 136.56)
  Default: 144.00 (SE +/- 0.10, N = 7; MIN: 141.47 / MAX: 149.12)

ASTC Encoder

ASTC Encoder (astcenc) is an encoder for the Adaptive Scalable Texture Compression (ASTC) format commonly used with the OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile runs a coding test of both compression and decompression. Learn more via the OpenBenchmarking.org test page.

ASTC Encoder 4.0 - Preset: Medium (OpenBenchmarking.org; MT/s, More Is Better)
  3DV Disabled: 404.66 (SE +/- 0.30, N = 8)
  Default: 419.43 (SE +/- 0.05, N = 8)
  1. (CXX) g++ options: -O3 -flto -pthread

NAS Parallel Benchmarks


NAS Parallel Benchmarks 3.4 - Test / Class: MG.C (OpenBenchmarking.org; Total Mop/s, More Is Better)
  3DV Disabled: 119761.07 (SE +/- 1316.62, N = 15)
  Default: 137308.27 (SE +/- 1630.26, N = 15)
  1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
  2. Open MPI 4.1.2

HeFFTe - Highly Efficient FFT for Exascale


HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 (OpenBenchmarking.org; GFLOP/s, More Is Better)
  3DV Disabled: 76.45 (SE +/- 0.50, N = 9)
  Default: 87.04 (SE +/- 0.52, N = 9)
  1. (CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 (OpenBenchmarking.org; GFLOP/s, More Is Better)
  3DV Disabled: 167.96 (SE +/- 1.47, N = 15)
  Default: 190.13 (SE +/- 1.48, N = 15)
  1. (CXX) g++ options: -O3
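
Several results for the same benchmark are commonly condensed into one figure with a geometric mean of the per-test ratios. The sketch below does so for the three HeFFTe results in this part of the file (r2c float 512, c2c double 256, r2c double 256). It is an illustrative subset only, not the result file's overall geometric mean, and the `heffte_results` name is a hypothetical label chosen for this example.

```python
from statistics import geometric_mean

# (Default, 3DV Disabled) in GFLOP/s, More Is Better; values from the HeFFTe results above.
heffte_results = {
    "r2c - FFTW - float - 512": (332.74, 248.85),
    "c2c - FFTW - double - 256": (87.04, 76.45),
    "r2c - FFTW - double - 256": (190.13, 167.96),
}

# Per-test ratio of Default over 3DV Disabled; above 1.0 means Default is faster.
ratios = {name: default / disabled
          for name, (default, disabled) in heffte_results.items()}

overall = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}x")
print(f"Geometric mean over these three tests: {overall:.3f}x")
```

The geometric mean is preferred over the arithmetic mean for ratios because it treats a 2x gain and a 2x loss symmetrically.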

CPU Temperature Monitor

CPU Temperature Monitor - Phoronix Test Suite System Monitoring (OpenBenchmarking.org; Celsius)
  Default: Min: 23.88 / Avg: 54.01 / Max: 78.13
  3DV Disabled: Min: 24.5 / Avg: 52.16 / Max: 78.13

CPU Power Consumption Monitor

CPU Power Consumption Monitor - Phoronix Test Suite System Monitoring (OpenBenchmarking.org; Watts)
  Default: Min: 16.08 / Avg: 255.79 / Max: 446.59
  3DV Disabled: Min: 14.18 / Avg: 250.73 / Max: 502.24

CPU Peak Freq (Highest CPU Core Frequency) Monitor

CPU Peak Freq (Highest CPU Core Frequency) Monitor - Phoronix Test Suite System Monitoring (OpenBenchmarking.org; Megahertz)
  3DV Disabled: Min: 2227 / Avg: 3284.82 / Max: 4264
  Default: Min: 2272 / Avg: 3283.5 / Max: 4260

130 Results Shown

WRF
High Performance Conjugate Gradient
OpenFOAM:
  drivaerFastback, Large Mesh Size - Execution Time
  drivaerFastback, Large Mesh Size - Mesh Time
High Performance Conjugate Gradient
PETSc
High Performance Conjugate Gradient
libxsmm
High Performance Conjugate Gradient
TensorFlow
LeelaChessZero:
  BLAS
  Eigen
Palabos
ASKAP:
  tConvolve MT - Degridding
  tConvolve MT - Gridding
libxsmm
Timed Linux Kernel Compilation
Timed LLVM Compilation
Palabos
Stockfish
Palabos
OpenVINO:
  Vehicle Detection FP16 - CPU:
    ms
    FPS
Numpy Benchmark
Blender
OSPRay
Timed Gem5 Compilation
OSPRay Studio
OpenFOAM:
  drivaerFastback, Medium Mesh Size - Execution Time
  drivaerFastback, Medium Mesh Size - Mesh Time
Ngspice
OSPRay Studio:
  2 - 4K - 32 - Path Tracer
  1 - 4K - 32 - Path Tracer
Timed LLVM Compilation
OSPRay Studio:
  3 - 4K - 1 - Path Tracer
  2 - 4K - 1 - Path Tracer
  1 - 4K - 1 - Path Tracer
Ngspice
Timed Node.js Compilation
OSPRay Studio:
  2 - 4K - 16 - Path Tracer
  1 - 4K - 16 - Path Tracer
  3 - 4K - 16 - Path Tracer
OSPRay
Timed Godot Game Engine Compilation
Stress-NG
PyHPC Benchmarks
Stress-NG
OpenVINO:
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Detection FP32 - CPU:
    ms
    FPS
ASKAP:
  tConvolve MPI - Gridding
  tConvolve MPI - Degridding
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
OSPRay:
  gravity_spheres_volume/dim_512/scivis/real_time
  gravity_spheres_volume/dim_512/pathtracer/real_time
  gravity_spheres_volume/dim_512/ao/real_time
Neural Magic DeepSparse:
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Blender
NAS Parallel Benchmarks
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Blender
Neural Magic DeepSparse:
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
    ms/batch
    items/sec
GPAW
Timed PHP Compilation
srsRAN Project
Stress-NG:
  CPU Cache
  Malloc
Liquid-DSP
Stress-NG:
  AVL Tree
  Matrix Math
  Futex
  Vector Math
  Wide Vector Math
  Fused Multiply-Add
  Vector Floating Point
  Memory Copying
  CPU Stress
  Semaphores
  Mutex
HeFFTe - Highly Efficient FFT for Exascale
PyHPC Benchmarks
Algebraic Multi-Grid Benchmark
GROMACS
NAMD
ASKAP:
  tConvolve OpenMP - Degridding
  tConvolve OpenMP - Gridding
Remhos
Blender
LULESH
Embree
HeFFTe - Highly Efficient FFT for Exascale
Xmrig
HeFFTe - Highly Efficient FFT for Exascale
ASKAP
CloverLeaf
Blender
NAS Parallel Benchmarks:
  BT.C
  SP.C
ASTC Encoder
Google Draco
Xcompact3d Incompact3d
miniFE
ASTC Encoder
Xcompact3d Incompact3d
NAS Parallel Benchmarks
ACES DGEMM
NAS Parallel Benchmarks
Google Draco
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
NAS Parallel Benchmarks
libxsmm
Embree:
  Pathtracer ISPC - Crown
  Pathtracer ISPC - Asian Dragon
ASTC Encoder
NAS Parallel Benchmarks
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - FFTW - double - 256
  r2c - FFTW - double - 256
CPU Temperature Monitor:
  Phoronix Test Suite System Monitoring:
    Celsius
    Watts
    Megahertz