Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with a Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2308099-NE-XEONPLATI49
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results
Show Result Confidence Charts

Limit displaying results to tests within:

AV1 2 Tests
Bioinformatics 2 Tests
C/C++ Compiler Tests 5 Tests
CPU Massive 10 Tests
Creator Workloads 12 Tests
Encoding 5 Tests
Fortran Tests 4 Tests
Game Development 3 Tests
HPC - High Performance Computing 12 Tests
Machine Learning 6 Tests
Molecular Dynamics 3 Tests
MPI Benchmarks 4 Tests
Multi-Core 14 Tests
NVIDIA GPU Compute 3 Tests
Intel oneAPI 6 Tests
OpenMPI Tests 10 Tests
Python Tests 5 Tests
Renderers 2 Tests
Scientific Computing 5 Tests
Server CPU Tests 6 Tests
Video Encoding 5 Tests

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Geometric Means Per-Suite/Category
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Hide
Result
Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
0xd000390
August 06 2023
  11 Hours, 56 Minutes
0xd0003a5
August 08 2023
  15 Hours, 40 Minutes
Invert Hiding All Results Option
  13 Hours, 48 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


Xeon Platinum 8380 AVX-512 WorkloadsOpenBenchmarking.orgPhoronix Test Suite2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)Intel Ice Lake IEH512GB7682GB INTEL SSDPF2KX076TZASPEEDVE2282 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFPUbuntu 22.106.5.0-060500rc4daily20230804-generic (x86_64)6.5.0-rc5-phx-tues (x86_64)GNOME Shell 43.0X Server 1.21.1.31.3.224GCC 12.2.0ext41920x1080ProcessorMotherboardChipsetMemoryDiskGraphicsMonitorNetworkOSKernelsDesktopDisplay ServerVulkanCompilerFile-SystemScreen ResolutionXeon Platinum 8380 AVX-512 Workloads PerformanceSystem Logs- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390 - 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5 - Python 3.10.7- 0xd000390: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected - 0xd0003a5: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

0xd000390 vs. 0xd0003a5 ComparisonPhoronix Test SuiteBaseline+13.1%+13.1%+26.2%+26.2%+39.3%+39.3%17.2%12.9%10.4%10%8%7.1%6.1%5.8%5%3.9%3.7%3.6%3.4%3.2%3%2.9%2.9%2.8%2.7%2.2%2.1%2%2%5.1%particle_volume/scivis/real_time52.3%particle_volume/ao/real_time50.3%Garlicoin32.2%B.L.N.Q.A.S.I - A.M.S21%B.L.N.Q.A.S.I - A.M.S21%FeCO6_b3lyp_gms20.8%CPU - regnety_400mLargeRandgravity_spheres_volume/dim_512/ao/real_time11.6%gravity_spheres_volume/dim_512/scivis/real_time11.5%N.T.C.B.b.u.S.S.I - A.M.S11.3%N.T.C.B.b.u.S.S.I - A.M.S11.3%CPU - 512 - AlexNetKostyaparticle_volume/pathtracer/real_time9.8%CPU - 256 - AlexNetCPU - googlenet7.9%CPU - resnet507.9%CPU - vgg167.8%R.5.S.I - A.M.S7.7%R.5.S.I - A.M.S7.6%64C.D.Y.C.S.I - A.M.S7.1%C.D.Y.C.S.I - A.M.S7.1%vklBenchmark ISPC6.5%r2c - FFTW - double - 128GPT-2 - CPU - StandardPathtracer ISPC - Crown5.7%fcn-resnet101-11 - CPU - Standard5.6%CPU - FastestDetN.T.C.B.b.u.S - A.M.S5%CPU - resnet185%CPU - squeezenet_ssd5%N.T.C.B.b.u.S - A.M.S4.9%CaffeNet 12-int8 - CPU - Standard4.8%yolov4 - CPU - Standard4.5%CPU - blazeface4.4%B.L.N.Q.A - A.M.S4.3%simple-H2O4.3%N.T.C.D.m - A.M.S3.9%CPU - 256 - GoogLeNetN.T.C.D.m - A.M.S3.9%B.L.N.Q.A - A.M.S3.9%CPU - vision_transformerCPU - 256 - ResNet-50Pathtracer ISPC - Asian Dragon3.5%DistinctUserIDPartialTweetsTomographic ModelOpenMP - BM2OpenMP - BM2c2c - FFTW - float - 128TopTweetC.D.Y.C - A.M.S2.5%bertsquad-12 - CPU - Standard2.5%C.D.Y.C - A.M.S2.4%Speed 5 - Bosphorus 4K2.4%CPU - mnasnetR.v.1.i - CPU - Standard2.1%CPU - 512 - GoogLeNetFeCO6_b3lyp_gmsR.N.N.T - bf16bf16bf16 - CPUsuper-resolution-10 - CPU - Standard6.4%R.v.1.i - CPU - Standard2.1%fcn-resnet101-11 - CPU - Standard6.5%CaffeNet 12-int8 - CPU - Standard5.3%bertsquad-12 - CPU - Standard2.8%yolov4 - CPU - Standard4.7%GPT-2 - CPU - StandardOSPRayOSPRayCpuminer-OptNeural Magic DeepSparseNeural Magic DeepSparseQMCPACKNCNNsimdjsonOSPRayOSPRayNeural Magic DeepSparseNeural Magic DeepSparseTensorFlowsimdjsonOSPRayTensorFlowNCNNNCNNNCNNNeural Magic DeepSparseNeural Magic DeepSparselibxsmmNeural Magic DeepSparseNeural Magic DeepSparseOpenVKLHeFFTe - Highly Efficient FFT for ExascaleONNX RuntimeEmbreeONNX RuntimeNCNNNeural Magic DeepSparseNCNNNCNNNeural Magic DeepSparseONNX RuntimeONNX RuntimeNCNNNeural Magic DeepSparseQMCPACKNeural Magic DeepSparseTensorFlowNeural Magic DeepSparseNeural Magic DeepSparseNCNNTensorFlowEmbreesimdjsonsimdjsonSPECFEM3DminiBUDEminiBUDEHeFFTe - Highly Efficient FFT for ExascalesimdjsonNeural Magic DeepSparseONNX RuntimeNeural Magic DeepSparseVP9 libvpx EncodingNCNNONNX RuntimeTensorFlowQMCPACKoneDNNONNX RuntimeONNX RuntimeONNX RuntimeONNX RuntimeONNX RuntimeONNX RuntimeONNX Runtime0xd0003900xd0003a5

Xeon Platinum 8380 AVX-512 Workloadsospray: particle_volume/scivis/real_timeospray: particle_volume/ao/real_timedeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Streamqmcpack: FeCO6_b3lyp_gmssimdjson: LargeRandospray: gravity_spheres_volume/dim_512/ao/real_timeospray: gravity_spheres_volume/dim_512/scivis/real_timedeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Streamtensorflow: CPU - 512 - AlexNetsimdjson: Kostyaospray: particle_volume/pathtracer/real_timetensorflow: CPU - 256 - AlexNetncnn: CPU - googlenetncnn: CPU - vgg16deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Streamlibxsmm: 64deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamdeepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Streamopenvkl: vklBenchmark ISPCheffte: r2c - FFTW - double - 128embree: Pathtracer ISPC - Crowndeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamncnn: CPU - resnet18ncnn: CPU - squeezenet_ssddeepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Streamonnx: yolov4 - CPU - Standardncnn: CPU - blazefacedeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamqmcpack: simple-H2Odeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamtensorflow: CPU - 256 - GoogLeNetdeepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Streamdeepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Streamncnn: CPU - vision_transformertensorflow: CPU - 256 - ResNet-50embree: Pathtracer ISPC - Asian Dragonsimdjson: DistinctUserIDsimdjson: PartialTweetsspecfem3d: Tomographic Modelminibude: OpenMP - BM2minibude: OpenMP - BM2heffte: c2c - FFTW - float - 128simdjson: TopTweetdeepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamonnx: bertsquad-12 - CPU - Standarddeepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Streamvpxenc: Speed 5 - Bosphorus 4Kncnn: CPU - mnasnetonnx: ResNet50 v1-12-int8 - CPU - Standardtensorflow: CPU - 512 - GoogLeNetqmcpack: FeCO6_b3lyp_gmslibxsmm: 128heffte: r2c - FFTW - float - 128deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamsvt-av1: Preset 12 - Bosphorus 4Kdeepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Streamopenvino: Machine Translation EN To DE FP16 - CPUncnn: CPU - yolov4-tinyopenvino: Machine Translation EN To DE FP16 - CPUspecfem3d: Mount St. Helensgromacs: MPI CPU - water_GMX50_barespecfem3d: Homogeneous Halfspaceopenvino: Person Vehicle Bike Detection FP16 - CPUopenvino: Person Vehicle Bike Detection FP16 - CPUpalabos: 400ncnn: CPU-v3-v3 - mobilenet-v3heffte: c2c - FFTW - double - 256ncnn: CPU - shufflenet-v2tensorflow: CPU - 512 - ResNet-50ncnn: CPU - alexnetncnn: CPU - mobilenetremhos: Sample Remap Examplesvt-av1: Preset 13 - Bosphorus 4Kheffte: r2c - FFTW - double - 256svt-av1: Preset 8 - Bosphorus 4Kheffte: r2c - FFTW - float - 256palabos: 500onnx: ArcFace ResNet-100 - CPU - Standarddeepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamlibxsmm: 256heffte: c2c - FFTW - float - 256svt-hevc: 10 - Bosphorus 4Kheffte: c2c - FFTW - double - 128deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamncnn: CPU-v2-v2 - mobilenet-v2deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Streamdeepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Streamdeepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Streamminibude: OpenMP - BM1minibude: OpenMP - BM1qmcpack: Li2_STO_aencnn: CPU - efficientnet-b0openvino: Weld Porosity Detection FP16 - CPUopenvino: Weld Porosity Detection FP16 - CPUspecfem3d: Water-layered Halfspacecpuminer-opt: Myriad-Groestllibxsmm: 32deepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamdeepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: ResNet-50, Baseline - Asynchronous Multi-Streamonednn: IP Shapes 3D - bf16bf16bf16 - CPUmrbayes: Primate Phylogeny Analysiscpuminer-opt: Skeincoindeepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Streamdeepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Streamopenvino: Face Detection FP16 - CPUopenvino: Vehicle Detection FP16-INT8 - CPUopenvino: Face Detection FP16 - CPUblender: Fishy Cat - CPU-Onlyonednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPUcloverleaf: Lagrangian-Eulerian Hydrodynamicscpuminer-opt: Quad SHA-256, Pyritevvenc: Bosphorus 4K - Fasterblender: BMW27 - CPU-Onlyopenvino: Vehicle Detection FP16-INT8 - CPUspecfem3d: Layered Halfspaceopenvino: Vehicle Detection FP16 - CPUopenvino: Vehicle Detection FP16 - CPUdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamopenvino: Person Detection FP16 - CPUcpuminer-opt: LBC, LBRY Creditscpuminer-opt: Deepcoinopenvino: Person Detection FP16 - CPUvvenc: Bosphorus 4K - Fastdeepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Streamopenvino: Weld Porosity Detection FP16-INT8 - CPUonednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPUdav1d: Chimera 1080popenvino: Weld Porosity Detection FP16-INT8 - CPUlaghos: Triple Point Problemopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUsvt-hevc: 7 - Bosphorus 4Kdav1d: Summer Nature 4Kincompact3d: input.i3d 193 Cells Per Directionopenvino: Age Gender Recognition Retail 0013 FP16 - CPUopenvino: Person Detection FP32 - CPUopenvino: Face Detection FP16-INT8 - CPUopenvino: Face Detection FP16-INT8 - CPUpalabos: 100cpuminer-opt: scryptcpuminer-opt: Blake-2 Ssvt-hevc: 1 - Bosphorus 4Kopenvino: Person Detection FP32 - CPUonednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPUcpuminer-opt: Triple SHA-256, Onecoinlaghos: Sedov Blast Wave, ube_922_hex.meshcpuminer-opt: Magionednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPUcpuminer-opt: x25xoidn: RTLightmap.hdr.4096x4096 - CPU-Onlyoidn: RT.ldr_alb_nrm.3840x2160 - CPU-Onlyopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUopenvino: Age Gender Recognition Retail 0013 FP16 - CPUncnn: CPU - FastestDetncnn: CPU - regnety_400mncnn: CPU - resnet50onnx: super-resolution-10 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: GPT-2 - CPU - Standardonnx: GPT-2 - CPU - Standardcpuminer-opt: Garlicoinonednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU0xd0003900xd0003a524.950624.74731051.786837.9943147.510.8521.076120.578017.35042301.1554760.152.61150.281723.2715.3623.865.22107633.25981098.893.5399426.9495912144.94288.1941307.87298.9715.34129.809511.66054.3598.339439.55562.0099309.63644.4430405.966246.9283.89104.68445.524.6214.5741920222526.887101.076154.7315.6094.612616.7110422.116712.637.57221.020317.27268.561941.1195.1991005.5843180.96739.730879.4723.71251.0013.1486923629.23418.0222364702039.639.77388.4768.8846.35169.8985.975.3915.4612.245175.10293.747467.170224.417413.20739.115784.0181594.6101.977184.3893.090672.17888.03474.6917922.930243.30332353.38994.136124.2311.6233.892344.9731.14695123843127604.71006.3407173.794539.69542.59271166.528613333230.0525551.818024.044419.17827.5130.74524.38112.0492173010.36423.834.5129.4979878981121.5617.80551.11951490.764216606467713.295.72272.06749396.523.90360515.818.50256.2767604.00138.75281.3611.024027859274.061517.69209.3195.42312.1952319.31446232710.4613.033.625261332237385.892309.472.069362659.171.463.031.161.3310.2045.5417.156.31428158.3554.5240325.5687110.3239.080671.43407696.72559.841585.79875.54783180.16329203832.45216.384916.4611869.039345.9597178.190.9618.886218.462619.30902068.3047839.412.87136.853781.2516.5825.715.62107092.50301177.1100.1856398.7153856153.79083.4621293.12419.4216.10136.178011.15394.5494.257341.24664.4439321.72620.2548421.754445.2386.93101.09595.714.7714.1459995462601.212104.048159.1005.7596.942916.3098412.053112.337.41216.481323.79263.231978.9198.869987.5452177.72140.450878.1824.10255.0912.9487597959.09417.7516289722070.729.63393.8448.7646.98339.7684.845.4615.6612.401177.20394.861466.460226.783417.48338.718583.2041600.2102.920182.7493.920771.54367.96478.8510930.750642.94052372.41894.897123.2611.7133.632362.8431.38179852843450609.21013.8215172.557939.41312.57437165.419617130231.4277555.087624.184442.98823.0930.90521.74211.9892627710.41523.724.4929.3736065511117.0917.87553.26821496.194231306489713.255.70571.88689419.773.91322514.588.48256.8767754.34138.49280.8411.004196859377.961519.84209.0295.54312.5302321.74446665310.4513.043.622661333117386.082308.662.069672659.551.463.031.161.339.7138.8518.516.71968158.0144.6208525.8601117.5458.599081.51029664.53761.509189.82995.28095190.57022086.25816.508OpenBenchmarking.org

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgItems Per Second, More Is BetterOSPRay 2.12Benchmark: particle_volume/scivis/real_time0xd0003a50xd000390612182430SE +/- 0.01, N = 3SE +/- 0.01, N = 316.3824.95

OpenBenchmarking.orgItems Per Second, More Is BetterOSPRay 2.12Benchmark: particle_volume/ao/real_time0xd0003a50xd000390612182430SE +/- 0.01, N = 3SE +/- 0.11, N = 316.4624.75

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003902004006008001000SE +/- 1.83, N = 3SE +/- 1.88, N = 3869.041051.79

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003901020304050SE +/- 0.10, N = 3SE +/- 0.07, N = 345.9637.99

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.16Input: FeCO6_b3lyp_gms0xd0003a50xd0003904080120160200SE +/- 0.31, N = 3SE +/- 0.12, N = 3178.19147.511. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGB/s, More Is Bettersimdjson 2.0Throughput Test: LargeRandom0xd0003900xd0003a50.2160.4320.6480.8641.08SE +/- 0.00, N = 3SE +/- 0.00, N = 30.850.961. (CXX) g++ options: -O3

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgItems Per Second, More Is BetterOSPRay 2.12Benchmark: gravity_spheres_volume/dim_512/ao/real_time0xd0003a50xd000390510152025SE +/- 0.02, N = 3SE +/- 0.05, N = 318.8921.08

OpenBenchmarking.orgItems Per Second, More Is BetterOSPRay 2.12Benchmark: gravity_spheres_volume/dim_512/scivis/real_time0xd0003a50xd000390510152025SE +/- 0.03, N = 3SE +/- 0.08, N = 318.4620.58

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390510152025SE +/- 0.03, N = 3SE +/- 0.01, N = 319.3117.35

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003905001000150020002500SE +/- 3.15, N = 3SE +/- 1.14, N = 32068.302301.16

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 512 - Model: AlexNet0xd0003900xd0003a52004006008001000SE +/- 3.77, N = 3SE +/- 2.61, N = 3760.15839.41

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGB/s, More Is Bettersimdjson 2.0Throughput Test: Kostya0xd0003900xd0003a50.64581.29161.93742.58323.229SE +/- 0.00, N = 3SE +/- 0.00, N = 32.612.871. (CXX) g++ options: -O3

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgItems Per Second, More Is BetterOSPRay 2.12Benchmark: particle_volume/pathtracer/real_time0xd0003a50xd000390306090120150SE +/- 0.45, N = 3SE +/- 0.54, N = 3136.85150.28

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 256 - Model: AlexNet0xd0003900xd0003a52004006008001000SE +/- 2.99, N = 3SE +/- 4.27, N = 3723.27781.25

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: googlenet0xd0003a50xd00039048121620SE +/- 0.48, N = 3SE +/- 0.17, N = 316.5815.36MIN: 15.29 / MAX: 39.63MIN: 14.6 / MAX: 182.221. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: vgg160xd0003a50xd000390612182430SE +/- 0.19, N = 3SE +/- 0.25, N = 325.7123.86MIN: 24.88 / MAX: 62.96MIN: 23.05 / MAX: 47.411. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003901.26472.52943.79415.05886.3235SE +/- 0.0018, N = 3SE +/- 0.0056, N = 35.62105.2210

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039016003200480064008000SE +/- 2.44, N = 3SE +/- 8.14, N = 37092.507633.26

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 640xd0003900xd0003a530060090012001500SE +/- 8.60, N = 3SE +/- 13.09, N = 151098.81177.11. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039020406080100SE +/- 1.28, N = 4SE +/- 0.07, N = 3100.1993.54

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039090180270360450SE +/- 4.97, N = 4SE +/- 0.21, N = 3398.72426.95

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgItems / Sec, More Is BetterOpenVKL 1.3.1Benchmark: vklBenchmark ISPC0xd0003a50xd0003902004006008001000SE +/- 0.88, N = 3SE +/- 1.53, N = 3856912MIN: 137 / MAX: 7211MIN: 140 / MAX: 7236

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: double - X Y Z: 1280xd0003900xd0003a5306090120150SE +/- 1.81, N = 4SE +/- 1.71, N = 3144.94153.791. (CXX) g++ options: -O3

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterEmbree 4.1Binary: Pathtracer ISPC - Model: Crown0xd0003a50xd00039020406080100SE +/- 0.32, N = 3SE +/- 0.07, N = 383.4688.19MIN: 80.27 / MAX: 87.69MIN: 85.24 / MAX: 92.72

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039070140210280350SE +/- 1.05, N = 3SE +/- 0.57, N = 3293.12307.87

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: resnet180xd0003a50xd0003903691215SE +/- 0.09, N = 2SE +/- 0.14, N = 39.428.97MIN: 9.21 / MAX: 32.8MIN: 8.63 / MAX: 27.271. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: squeezenet_ssd0xd0003a50xd00039048121620SE +/- 0.25, N = 3SE +/- 0.12, N = 316.1015.34MIN: 15.43 / MAX: 48.02MIN: 14.63 / MAX: 39.651. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390306090120150SE +/- 0.60, N = 3SE +/- 0.25, N = 3136.18129.81

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: yolov4 - Device: CPU - Executor: Standard0xd0003a50xd0003903691215SE +/- 0.13, N = 15SE +/- 0.11, N = 711.1511.661. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: blazeface0xd0003a50xd0003901.02152.0433.06454.0865.1075SE +/- 0.07, N = 3SE +/- 0.06, N = 34.544.35MIN: 4.37 / MAX: 5.17MIN: 4.16 / MAX: 4.971. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039020406080100SE +/- 0.51, N = 3SE +/- 0.12, N = 394.2698.34

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.16Input: simple-H2O0xd0003a50xd000390918273645SE +/- 0.02, N = 3SE +/- 0.12, N = 341.2539.561. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003901428425670SE +/- 0.78, N = 3SE +/- 0.23, N = 364.4462.01

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 256 - Model: GoogLeNet0xd0003900xd0003a570140210280350SE +/- 1.89, N = 3SE +/- 2.10, N = 3309.63321.72

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390140280420560700SE +/- 7.47, N = 3SE +/- 2.45, N = 3620.25644.44

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039090180270360450SE +/- 2.15, N = 3SE +/- 0.42, N = 3421.75405.97

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: vision_transformer0xd0003900xd0003a51122334455SE +/- 1.17, N = 3SE +/- 0.31, N = 346.9245.23MIN: 43.11 / MAX: 881.49MIN: 43.43 / MAX: 73.391. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 256 - Model: ResNet-500xd0003900xd0003a520406080100SE +/- 0.37, N = 3SE +/- 0.80, N = 983.8986.93

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterEmbree 4.1Binary: Pathtracer ISPC - Model: Asian Dragon0xd0003a50xd00039020406080100SE +/- 0.19, N = 3SE +/- 0.43, N = 3101.10104.68MIN: 98.59 / MAX: 105.53MIN: 101.9 / MAX: 109.48

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGB/s, More Is Bettersimdjson 2.0Throughput Test: DistinctUserID0xd0003900xd0003a51.28482.56963.85445.13926.424SE +/- 0.02, N = 3SE +/- 0.00, N = 35.525.711. (CXX) g++ options: -O3

OpenBenchmarking.orgGB/s, More Is Bettersimdjson 2.0Throughput Test: PartialTweets0xd0003900xd0003a51.07332.14663.21994.29325.3665SE +/- 0.01, N = 3SE +/- 0.01, N = 34.624.771. (CXX) g++ options: -O3

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterSPECFEM3D 4.0Model: Tomographic Model0xd0003900xd0003a548121620SE +/- 0.02, N = 3SE +/- 0.15, N = 314.5714.151. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

MiniBUDE is a mini application for the the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFInst/s, More Is BetterminiBUDE 20210901Implementation: OpenMP - Input Deck: BM20xd0003900xd0003a56001200180024003000SE +/- 9.78, N = 3SE +/- 9.53, N = 32526.892601.211. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

OpenBenchmarking.orgBillion Interactions/s, More Is BetterminiBUDE 20210901Implementation: OpenMP - Input Deck: BM20xd0003900xd0003a520406080100SE +/- 0.39, N = 3SE +/- 0.38, N = 3101.08104.051. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: float - X Y Z: 1280xd0003900xd0003a54080120160200SE +/- 1.53, N = 5SE +/- 1.01, N = 3154.73159.101. (CXX) g++ options: -O3

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGB/s, More Is Bettersimdjson 2.0Throughput Test: TopTweet0xd0003900xd0003a51.29382.58763.88145.17526.469SE +/- 0.03, N = 3SE +/- 0.01, N = 35.605.751. (CXX) g++ options: -O3

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039020406080100SE +/- 0.29, N = 3SE +/- 0.07, N = 396.9494.61

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: bertsquad-12 - Device: CPU - Executor: Standard0xd0003a50xd00039048121620SE +/- 0.26, N = 12SE +/- 0.08, N = 316.3116.711. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039090180270360450SE +/- 1.24, N = 3SE +/- 0.36, N = 3412.05422.12

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9 video format. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVP9 libvpx Encoding 1.13Speed: Speed 5 - Input: Bosphorus 4K0xd0003a50xd0003903691215SE +/- 0.13, N = 3SE +/- 0.12, N = 312.3312.631. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: mnasnet0xd0003900xd0003a5246810SE +/- 0.10, N = 3SE +/- 0.04, N = 37.577.41MIN: 7.28 / MAX: 30.26MIN: 7.15 / MAX: 31.171. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard0xd0003a50xd00039050100150200250SE +/- 1.91, N = 8SE +/- 1.39, N = 3216.48221.021. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 512 - Model: GoogLeNet0xd0003900xd0003a570140210280350SE +/- 0.87, N = 3SE +/- 0.43, N = 3317.27323.79

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.16Input: FeCO6_b3lyp_gms0xd0003900xd0003a560120180240300SE +/- 3.60, N = 3SE +/- 2.19, N = 3268.56263.231. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 1280xd0003900xd0003a5400800120016002000SE +/- 32.53, N = 7SE +/- 20.92, N = 31941.11978.91. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: float - X Y Z: 1280xd0003900xd0003a54080120160200SE +/- 0.92, N = 3SE +/- 1.08, N = 3195.20198.871. (CXX) g++ options: -O3

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003902004006008001000SE +/- 2.17, N = 3SE +/- 1.41, N = 3987.551005.58

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.6Encoder Mode: Preset 12 - Input: Bosphorus 4K0xd0003a50xd0003904080120160200SE +/- 1.20, N = 3SE +/- 1.29, N = 3177.72180.971. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390918273645SE +/- 0.10, N = 3SE +/- 0.06, N = 340.4539.73

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Machine Translation EN To DE FP16 - Device: CPU0xd0003900xd0003a520406080100SE +/- 0.22, N = 3SE +/- 0.19, N = 379.4778.18MIN: 62.57 / MAX: 232.01MIN: 66.08 / MAX: 194.381. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: yolov4-tiny0xd0003a50xd000390612182430SE +/- 0.17, N = 3SE +/- 0.22, N = 324.1023.71MIN: 23.25 / MAX: 51.47MIN: 22.57 / MAX: 46.111. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Machine Translation EN To DE FP16 - Device: CPU0xd0003900xd0003a560120180240300SE +/- 0.68, N = 3SE +/- 0.63, N = 3251.00255.091. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterSPECFEM3D 4.0Model: Mount St. Helens0xd0003900xd0003a53691215SE +/- 0.18, N = 3SE +/- 0.12, N = 313.1512.951. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgNs Per Day, More Is BetterGROMACS 2023Implementation: MPI CPU - Input: water_GMX50_bare0xd0003a50xd0003903691215SE +/- 0.026, N = 3SE +/- 0.021, N = 39.0949.2341. (CXX) g++ options: -O3

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterSPECFEM3D 4.0Model: Homogeneous Halfspace0xd0003900xd0003a548121620SE +/- 0.15, N = 3SE +/- 0.16, N = 318.0217.751. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Person Vehicle Bike Detection FP16 - Device: CPU0xd0003900xd0003a5400800120016002000SE +/- 1.77, N = 3SE +/- 3.75, N = 32039.632070.721. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Person Vehicle Bike Detection FP16 - Device: CPU0xd0003900xd0003a53691215SE +/- 0.01, N = 3SE +/- 0.02, N = 39.779.63MIN: 7.83 / MAX: 20.01MIN: 8.33 / MAX: 19.331. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 4000xd0003900xd0003a590180270360450SE +/- 0.94, N = 3SE +/- 0.06, N = 3388.48393.841. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU-v3-v3 - Model: mobilenet-v30xd0003900xd0003a5246810SE +/- 0.05, N = 3SE +/- 0.12, N = 38.888.76MIN: 8.69 / MAX: 32.33MIN: 8.3 / MAX: 32.61. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: double - X Y Z: 2560xd0003900xd0003a51122334455SE +/- 0.25, N = 3SE +/- 0.61, N = 346.3546.981. (CXX) g++ options: -O3

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: shufflenet-v20xd0003900xd0003a53691215SE +/- 0.10, N = 3SE +/- 0.13, N = 39.899.76MIN: 9.61 / MAX: 33.59MIN: 9.32 / MAX: 33.41. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgimages/sec, More Is BetterTensorFlow 2.12Device: CPU - Batch Size: 512 - Model: ResNet-500xd0003a50xd00039020406080100SE +/- 1.17, N = 3SE +/- 1.13, N = 384.8485.97

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: alexnet0xd0003a50xd0003901.22852.4573.68554.9146.1425SE +/- 0.15, N = 3SE +/- 0.16, N = 35.465.39MIN: 5.01 / MAX: 29.14MIN: 4.83 / MAX: 151.521. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: mobilenet0xd0003a50xd00039048121620SE +/- 0.07, N = 3SE +/- 0.09, N = 315.6615.46MIN: 15.24 / MAX: 38.75MIN: 14.85 / MAX: 106.561. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterRemhos 1.0Test: Sample Remap Example0xd0003a50xd0003903691215SE +/- 0.04, N = 3SE +/- 0.03, N = 312.4012.251. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.6Encoder Mode: Preset 13 - Input: Bosphorus 4K0xd0003900xd0003a54080120160200SE +/- 0.79, N = 3SE +/- 2.02, N = 3175.10177.201. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: double - X Y Z: 2560xd0003900xd0003a520406080100SE +/- 1.09, N = 3SE +/- 1.13, N = 393.7594.861. (CXX) g++ options: -O3

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-AV1 1.6Encoder Mode: Preset 8 - Input: Bosphorus 4K0xd0003a50xd0003901530456075SE +/- 0.46, N = 3SE +/- 0.47, N = 366.4667.171. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: r2c - Backend: FFTW - Precision: float - X Y Z: 2560xd0003900xd0003a550100150200250SE +/- 2.25, N = 3SE +/- 2.85, N = 3224.42226.781. (CXX) g++ options: -O3

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 5000xd0003900xd0003a590180270360450SE +/- 0.42, N = 3SE +/- 0.72, N = 3413.21417.481. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard0xd0003a50xd000390918273645SE +/- 0.38, N = 15SE +/- 0.46, N = 338.7239.121. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003a50xd00039020406080100SE +/- 0.18, N = 3SE +/- 0.32, N = 383.2084.02

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 2560xd0003900xd0003a5130260390520650SE +/- 2.33, N = 3SE +/- 2.54, N = 3594.6600.21. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: float - X Y Z: 2560xd0003900xd0003a520406080100SE +/- 0.27, N = 3SE +/- 0.20, N = 3101.98102.921. (CXX) g++ options: -O3

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-HEVC 1.5.0Tuning: 10 - Input: Bosphorus 4K0xd0003a50xd0003904080120160200SE +/- 0.38, N = 3SE +/- 2.03, N = 3182.74184.381. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOP/s, More Is BetterHeFFTe - Highly Efficient FFT for Exascale 2.3Test: c2c - Backend: FFTW - Precision: double - X Y Z: 1280xd0003900xd0003a520406080100SE +/- 0.15, N = 3SE +/- 0.28, N = 393.0993.921. (CXX) g++ options: -O3

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003901632486480SE +/- 0.12, N = 3SE +/- 0.15, N = 371.5472.18

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU-v2-v2 - Model: mobilenet-v20xd0003900xd0003a5246810SE +/- 0.09, N = 3SE +/- 0.04, N = 38.037.96MIN: 7.8 / MAX: 31.34MIN: 7.75 / MAX: 31.751. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390100200300400500SE +/- 1.39, N = 3SE +/- 1.00, N = 3478.85474.69

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a52004006008001000SE +/- 11.47, N = 3SE +/- 1.22, N = 3922.93930.75

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51020304050SE +/- 0.53, N = 3SE +/- 0.06, N = 343.3042.94

miniBUDE

MiniBUDE is a mini application for the the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFInst/s, More Is BetterminiBUDE 20210901Implementation: OpenMP - Input Deck: BM10xd0003900xd0003a55001000150020002500SE +/- 3.60, N = 3SE +/- 10.90, N = 32353.392372.421. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

OpenBenchmarking.orgBillion Interactions/s, More Is BetterminiBUDE 20210901Implementation: OpenMP - Input Deck: BM10xd0003900xd0003a520406080100SE +/- 0.14, N = 3SE +/- 0.44, N = 394.1494.901. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.16Input: Li2_STO_ae0xd0003900xd0003a5306090120150SE +/- 1.55, N = 3SE +/- 1.03, N = 3124.23123.261. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: efficientnet-b00xd0003a50xd0003903691215SE +/- 0.11, N = 3SE +/- 0.38, N = 311.7111.62MIN: 11.15 / MAX: 19.87MIN: 10.82 / MAX: 21.031. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Weld Porosity Detection FP16 - Device: CPU0xd0003900xd0003a5816243240SE +/- 0.05, N = 3SE +/- 0.06, N = 333.8933.63MIN: 29.83 / MAX: 113.13MIN: 29.54 / MAX: 113.551. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Weld Porosity Detection FP16 - Device: CPU0xd0003900xd0003a55001000150020002500SE +/- 3.01, N = 3SE +/- 3.91, N = 32344.972362.841. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterSPECFEM3D 4.0Model: Water-layered Halfspace0xd0003a50xd000390714212835SE +/- 0.31, N = 5SE +/- 0.24, N = 331.3831.151. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Myriad-Groestl0xd0003900xd0003a59K18K27K36K45KSE +/- 386.00, N = 15SE +/- 406.32, N = 343127434501. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgGFLOPS/s, More Is Betterlibxsmm 2-1.17-3645M N K: 320xd0003900xd0003a5130260390520650SE +/- 8.71, N = 15SE +/- 4.92, N = 3604.7609.21. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a52004006008001000SE +/- 0.56, N = 3SE +/- 0.31, N = 31006.341013.82

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a54080120160200SE +/- 0.34, N = 3SE +/- 1.64, N = 3173.79172.56

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5918273645SE +/- 0.02, N = 3SE +/- 0.01, N = 339.7039.41

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU0xd0003900xd0003a50.58341.16681.75022.33362.917SE +/- 0.03265, N = 15SE +/- 0.02489, N = 152.592712.57437MIN: 1.91MIN: 1.991. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Timed MrBayes Analysis

This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterTimed MrBayes Analysis 3.2.7Primate Phylogeny Analysis0xd0003900xd0003a54080120160200SE +/- 1.36, N = 3SE +/- 0.98, N = 3166.53165.421. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Skeincoin0xd0003900xd0003a5130K260K390K520K650KSE +/- 1652.89, N = 3SE +/- 3788.20, N = 36133336171301. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a550100150200250SE +/- 0.46, N = 3SE +/- 2.37, N = 3230.05231.43

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390120240360480600SE +/- 1.12, N = 3SE +/- 0.62, N = 3555.09551.82

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Face Detection FP16 - Device: CPU0xd0003900xd0003a5612182430SE +/- 0.01, N = 3SE +/- 0.02, N = 324.0424.181. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Vehicle Detection FP16-INT8 - Device: CPU0xd0003900xd0003a510002000300040005000SE +/- 3.29, N = 3SE +/- 2.64, N = 34419.174442.981. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Face Detection FP16 - Device: CPU0xd0003900xd0003a52004006008001000SE +/- 0.42, N = 3SE +/- 0.71, N = 3827.51823.09MIN: 628.21 / MAX: 980.48MIN: 550.41 / MAX: 926.211. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: Fishy Cat - Compute: CPU-Only0xd0003a50xd000390714212835SE +/- 0.02, N = 3SE +/- 0.06, N = 330.9030.74

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU0xd0003900xd0003a5110220330440550SE +/- 7.19, N = 3SE +/- 2.82, N = 3524.38521.74MIN: 499.75MIN: 505.721. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version and benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterCloverLeafLagrangian-Eulerian Hydrodynamics0xd0003900xd0003a53691215SE +/- 0.06, N = 3SE +/- 0.09, N = 312.0411.981. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Quad SHA-256, Pyrite0xd0003900xd0003a5200K400K600K800K1000KSE +/- 3352.41, N = 3SE +/- 1690.96, N = 39217309262771. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 4K - Video Preset: Faster0xd0003900xd0003a53691215SE +/- 0.09, N = 3SE +/- 0.06, N = 310.3610.421. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Blender

OpenBenchmarking.orgSeconds, Fewer Is BetterBlender 3.6Blend File: BMW27 - Compute: CPU-Only0xd0003900xd0003a5612182430SE +/- 0.06, N = 3SE +/- 0.03, N = 323.8323.72

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Vehicle Detection FP16-INT8 - Device: CPU0xd0003900xd0003a51.01482.02963.04444.05925.074SE +/- 0.00, N = 3SE +/- 0.00, N = 34.514.49MIN: 4.02 / MAX: 13.95MIN: 4.04 / MAX: 15.851. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterSPECFEM3D 4.0Model: Layered Halfspace0xd0003900xd0003a5714212835SE +/- 0.05, N = 3SE +/- 0.18, N = 329.5029.371. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Vehicle Detection FP16 - Device: CPU0xd0003a50xd0003902004006008001000SE +/- 1.14, N = 3SE +/- 0.88, N = 31117.091121.561. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Vehicle Detection FP16 - Device: CPU0xd0003a50xd00039048121620SE +/- 0.02, N = 3SE +/- 0.01, N = 317.8717.80MIN: 12.24 / MAX: 38.45MIN: 12.83 / MAX: 32.641. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Neural Magic DeepSparse

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream0xd0003a50xd000390120240360480600SE +/- 1.19, N = 3SE +/- 0.77, N = 3553.27551.12

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Person Detection FP16 - Device: CPU0xd0003a50xd00039030060090012001500SE +/- 0.81, N = 3SE +/- 2.35, N = 31496.191490.76MIN: 1043.7 / MAX: 1711.44MIN: 1074.22 / MAX: 1692.481. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: LBC, LBRY Credits0xd0003900xd0003a590K180K270K360K450KSE +/- 313.42, N = 3SE +/- 860.95, N = 34216604231301. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Deepcoin0xd0003900xd0003a514K28K42K56K70KSE +/- 89.69, N = 3SE +/- 187.02, N = 364677648971. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Person Detection FP16 - Device: CPU0xd0003a50xd0003903691215SE +/- 0.01, N = 3SE +/- 0.02, N = 313.2513.291. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.9Video Input: Bosphorus 4K - Video Preset: Fast0xd0003a50xd0003901.28752.5753.86255.156.4375SE +/- 0.029, N = 3SE +/- 0.033, N = 35.7055.7221. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Neural Magic DeepSparse

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream0xd0003a50xd0003901632486480SE +/- 0.10, N = 3SE +/- 0.09, N = 371.8972.07

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Weld Porosity Detection FP16-INT8 - Device: CPU0xd0003900xd0003a52K4K6K8K10KSE +/- 3.74, N = 3SE +/- 2.19, N = 39396.529419.771. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU0xd0003a50xd0003900.88051.7612.64153.5224.4025SE +/- 0.00080, N = 3SE +/- 0.00202, N = 33.913223.90360MIN: 3.69MIN: 3.681. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.2.1Video Input: Chimera 1080p0xd0003a50xd000390110220330440550SE +/- 0.51, N = 3SE +/- 0.80, N = 3514.58515.811. (CC) gcc options: -pthread -lm

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Weld Porosity Detection FP16-INT8 - Device: CPU0xd0003900xd0003a5246810SE +/- 0.00, N = 3SE +/- 0.00, N = 38.508.48MIN: 7.17 / MAX: 18.39MIN: 7.15 / MAX: 22.531. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMajor Kernels Total Rate, More Is BetterLaghos 3.1Test: Triple Point Problem0xd0003900xd0003a560120180240300SE +/- 0.32, N = 3SE +/- 0.94, N = 3256.27256.871. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU0xd0003900xd0003a515K30K45K60K75KSE +/- 17.48, N = 3SE +/- 29.72, N = 367604.0067754.341. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-HEVC 1.5.0Tuning: 7 - Input: Bosphorus 4K0xd0003a50xd000390306090120150SE +/- 0.44, N = 3SE +/- 0.60, N = 3138.49138.751. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is Betterdav1d 1.2.1Video Input: Summer Nature 4K0xd0003a50xd00039060120180240300SE +/- 0.81, N = 3SE +/- 1.06, N = 3280.84281.361. (CC) gcc options: -pthread -lm

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many as you need scalar transport equations. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgSeconds, Fewer Is BetterXcompact3d Incompact3d 2021-03-11Input: input.i3d 193 Cells Per Direction0xd0003900xd0003a53691215SE +/- 0.02, N = 3SE +/- 0.02, N = 311.0211.001. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU0xd0003900xd0003a513K26K39K52K65KSE +/- 17.38, N = 3SE +/- 18.68, N = 359274.0659377.961. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Person Detection FP32 - Device: CPU0xd0003a50xd00039030060090012001500SE +/- 0.84, N = 3SE +/- 0.52, N = 31519.841517.69MIN: 1074.08 / MAX: 1721.59MIN: 1081.6 / MAX: 1690.61. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Face Detection FP16-INT8 - Device: CPU0xd0003900xd0003a550100150200250SE +/- 0.03, N = 3SE +/- 0.14, N = 3209.31209.02MIN: 160.46 / MAX: 249.09MIN: 152.83 / MAX: 234.281. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Face Detection FP16-INT8 - Device: CPU0xd0003900xd0003a520406080100SE +/- 0.01, N = 3SE +/- 0.07, N = 395.4295.541. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMega Site Updates Per Second, More Is BetterPalabos 2.3Grid Size: 1000xd0003900xd0003a570140210280350SE +/- 0.89, N = 3SE +/- 0.20, N = 3312.20312.531. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: scrypt0xd0003900xd0003a55001000150020002500SE +/- 1.08, N = 3SE +/- 8.35, N = 32319.312321.741. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Blake-2 S0xd0003900xd0003a51000K2000K3000K4000K5000KSE +/- 8325.80, N = 3SE +/- 8676.85, N = 3446232744666531. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFrames Per Second, More Is BetterSVT-HEVC 1.5.0Tuning: 1 - Input: Bosphorus 4K0xd0003a50xd0003903691215SE +/- 0.01, N = 3SE +/- 0.06, N = 310.4510.461. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2022.3Model: Person Detection FP32 - Device: CPU0xd0003900xd0003a53691215SE +/- 0.00, N = 3SE +/- 0.01, N = 313.0313.041. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU0xd0003900xd0003a50.81571.63142.44713.26284.0785SE +/- 0.00896, N = 3SE +/- 0.00622, N = 33.625263.62266MIN: 3.54MIN: 3.531. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Triple SHA-256, Onecoin0xd0003900xd0003a5300K600K900K1200K1500KSE +/- 7105.40, N = 3SE +/- 7235.75, N = 3133223713331171. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgMajor Kernels Total Rate, More Is BetterLaghos 3.1Test: Sedov Blast Wave, ube_922_hex.mesh0xd0003900xd0003a580160240320400SE +/- 0.23, N = 3SE +/- 0.80, N = 3385.89386.081. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Magi0xd0003a50xd0003905001000150020002500SE +/- 1.08, N = 3SE +/- 3.91, N = 32308.662309.471. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU0xd0003a50xd0003900.46570.93141.39711.86282.3285SE +/- 0.00191, N = 3SE +/- 0.00038, N = 32.069672.06936MIN: 2.03MIN: 2.031. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: x25x0xd0003900xd0003a56001200180024003000SE +/- 4.75, N = 3SE +/- 5.54, N = 32659.172659.551. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.0Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only0xd0003900xd0003a50.32850.6570.98551.3141.6425SE +/- 0.00, N = 3SE +/- 0.00, N = 31.461.46

OpenBenchmarking.orgImages / Sec, More Is BetterIntel Open Image Denoise 2.0Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only0xd0003900xd0003a50.68181.36362.04542.72723.409SE +/- 0.00, N = 3SE +/- 0.00, N = 33.033.03

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU0xd0003a50xd0003900.2610.5220.7831.0441.305SE +/- 0.00, N = 3SE +/- 0.00, N = 31.161.16MIN: 0.86 / MAX: 17.98MIN: 0.88 / MAX: 12.611. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2022.3Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU0xd0003a50xd0003900.29930.59860.89791.19721.4965SE +/- 0.00, N = 3SE +/- 0.00, N = 31.331.33MIN: 0.97 / MAX: 13.43MIN: 0.97 / MAX: 131. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: FastestDet0xd0003900xd0003a53691215SE +/- 0.64, N = 3SE +/- 0.07, N = 310.209.71MIN: 9.01 / MAX: 500.17MIN: 9.22 / MAX: 27.291. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: regnety_400m0xd0003900xd0003a51020304050SE +/- 7.69, N = 3SE +/- 0.50, N = 345.5438.85MIN: 36.01 / MAX: 3343.68MIN: 37.13 / MAX: 233.441. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20230517Target: CPU - Model: resnet500xd0003a50xd000390510152025SE +/- 0.69, N = 3SE +/- 0.52, N = 318.5117.15MIN: 17.32 / MAX: 299.93MIN: 16.19 / MAX: 41.831. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.14Model: super-resolution-10 - Device: CPU - Executor: Standard0xd0003a50xd000390246810SE +/- 0.43315, N = 15SE +/- 0.00502, N = 36.719686.314281. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: super-resolution-10 - Device: CPU - Executor: Standard0xd0003a50xd000390306090120150SE +/- 10.25, N = 15SE +/- 0.13, N = 3158.01158.361. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.14Model: fcn-resnet101-11 - Device: CPU - Executor: Standard0xd0003a50xd000390306090120150SE +/- 3.29, N = 15SE +/- 1.29, N = 15117.55110.321. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: fcn-resnet101-11 - Device: CPU - Executor: Standard0xd0003a50xd0003903691215SE +/- 0.23491, N = 15SE +/- 0.10283, N = 158.599089.080671. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.14Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard0xd0003a50xd0003900.33980.67961.01941.35921.699SE +/- 0.02722, N = 15SE +/- 0.00633, N = 31.510291.434071. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard0xd0003a50xd000390150300450600750SE +/- 12.01, N = 15SE +/- 3.08, N = 3664.54696.731. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.14Model: GPT-2 - Device: CPU - Executor: Standard0xd0003900xd0003a51.24832.49663.74494.99326.2415SE +/- 0.00245, N = 3SE +/- 0.11956, N = 155.547835.280951. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: GPT-2 - Device: CPU - Executor: Standard0xd0003900xd0003a54080120160200SE +/- 0.08, N = 3SE +/- 4.37, N = 15180.16190.571. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgkH/s, More Is BetterCpuminer-Opt 3.20.3Algorithm: Garlicoin0xd0003a50xd0003906K12K18K24K30KSE +/- 3833.17, N = 12SE +/- 330.17, N = 322086.2529203.001. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgms, Fewer Is BetteroneDNN 3.1Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU0xd0003900xd0003a52004006008001000SE +/- 16.59, N = 15SE +/- 14.20, N = 12832.45816.51MIN: 714.88MIN: 723.441. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

169 Results Shown

OSPRay:
  particle_volume/scivis/real_time
  particle_volume/ao/real_time
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
QMCPACK
simdjson
OSPRay:
  gravity_spheres_volume/dim_512/ao/real_time
  gravity_spheres_volume/dim_512/scivis/real_time
Neural Magic DeepSparse:
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
TensorFlow
simdjson
OSPRay
TensorFlow
NCNN:
  CPU - googlenet
  CPU - vgg16
Neural Magic DeepSparse:
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
libxsmm
Neural Magic DeepSparse:
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
OpenVKL
HeFFTe - Highly Efficient FFT for Exascale
Embree
Neural Magic DeepSparse
NCNN:
  CPU - resnet18
  CPU - squeezenet_ssd
Neural Magic DeepSparse
ONNX Runtime
NCNN
Neural Magic DeepSparse
QMCPACK
Neural Magic DeepSparse
TensorFlow
Neural Magic DeepSparse:
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream
  BERT-Large, NLP Question Answering - Asynchronous Multi-Stream
NCNN
TensorFlow
Embree
simdjson:
  DistinctUserID
  PartialTweets
SPECFEM3D
miniBUDE:
  OpenMP - BM2:
    GFInst/s
    Billion Interactions/s
HeFFTe - Highly Efficient FFT for Exascale
simdjson
Neural Magic DeepSparse
ONNX Runtime
Neural Magic DeepSparse
VP9 libvpx Encoding
NCNN
ONNX Runtime
TensorFlow
QMCPACK
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
Neural Magic DeepSparse
SVT-AV1
Neural Magic DeepSparse
OpenVINO
NCNN
OpenVINO
SPECFEM3D
GROMACS
SPECFEM3D
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU:
    FPS
    ms
Palabos
NCNN
HeFFTe - Highly Efficient FFT for Exascale
NCNN
TensorFlow
NCNN:
  CPU - alexnet
  CPU - mobilenet
Remhos
SVT-AV1
HeFFTe - Highly Efficient FFT for Exascale
SVT-AV1
HeFFTe - Highly Efficient FFT for Exascale
Palabos
ONNX Runtime
Neural Magic DeepSparse
libxsmm
HeFFTe - Highly Efficient FFT for Exascale
SVT-HEVC
HeFFTe - Highly Efficient FFT for Exascale
Neural Magic DeepSparse
NCNN
Neural Magic DeepSparse:
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream
miniBUDE:
  OpenMP - BM1:
    GFInst/s
    Billion Interactions/s
QMCPACK
NCNN
OpenVINO:
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
SPECFEM3D
Cpuminer-Opt
libxsmm
Neural Magic DeepSparse:
  ResNet-50, Baseline - Asynchronous Multi-Stream
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream
  ResNet-50, Baseline - Asynchronous Multi-Stream
oneDNN
Timed MrBayes Analysis
Cpuminer-Opt
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream
OpenVINO:
  Face Detection FP16 - CPU
  Vehicle Detection FP16-INT8 - CPU
  Face Detection FP16 - CPU
Blender
oneDNN
CloverLeaf
Cpuminer-Opt
VVenC
Blender
OpenVINO
SPECFEM3D
OpenVINO:
  Vehicle Detection FP16 - CPU:
    FPS
    ms
Neural Magic DeepSparse
OpenVINO
Cpuminer-Opt:
  LBC, LBRY Credits
  Deepcoin
OpenVINO
VVenC
Neural Magic DeepSparse
OpenVINO
oneDNN
dav1d
OpenVINO
Laghos
OpenVINO
SVT-HEVC
dav1d
Xcompact3d Incompact3d
OpenVINO:
  Age Gender Recognition Retail 0013 FP16 - CPU
  Person Detection FP32 - CPU
  Face Detection FP16-INT8 - CPU
  Face Detection FP16-INT8 - CPU
Palabos
Cpuminer-Opt:
  scrypt
  Blake-2 S
SVT-HEVC
OpenVINO
oneDNN
Cpuminer-Opt
Laghos
Cpuminer-Opt
oneDNN
Cpuminer-Opt
Intel Open Image Denoise:
  RTLightmap.hdr.4096x4096 - CPU-Only
  RT.ldr_alb_nrm.3840x2160 - CPU-Only
OpenVINO:
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU
  Age Gender Recognition Retail 0013 FP16 - CPU
NCNN:
  CPU - FastestDet
  CPU - regnety_400m
  CPU - resnet50
ONNX Runtime:
  super-resolution-10 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  fcn-resnet101-11 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  CaffeNet 12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  GPT-2 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
Cpuminer-Opt
oneDNN