Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article: 2 x Intel Xeon Platinum 8380 tested on an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) with ASPEED graphics, running Ubuntu 22.10, via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2308099-NE-XEONPLATI49

Run Management

0xd000390: run August 06 2023, test duration 11 Hours, 56 Minutes
0xd0003a5: run August 08 2023, test duration 15 Hours, 40 Minutes
Average test duration: 13 Hours, 48 Minutes


Xeon Platinum 8380 AVX-512 Workloads - system details (OpenBenchmarking.org, Phoronix Test Suite)

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernels: 6.5.0-060500rc4daily20230804-generic (x86_64), 6.5.0-rc5-phx-tues (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080

System Logs
- Transparent Huge Pages: madvise
- Compiler notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5
- Python 3.10.7
- Security (0xd000390): itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- Security (0xd0003a5): gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

0xd000390 vs. 0xd0003a5 Comparison (Phoronix Test Suite chart of per-test percentage differences). The largest spreads shown are OSPRay particle_volume/scivis/real_time (52.3%), OSPRay particle_volume/ao/real_time (50.3%), Cpuminer-Opt Garlicoin (32.2%), Neural Magic DeepSparse BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream (21%) and QMCPACK FeCO6_b3lyp_gms (20.8%).
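The percentages in the comparison chart can be reproduced, at least for the entries checked here, by dividing the larger result by the smaller one. The minimal Python sketch below assumes that rule (it is our reading of the chart, not a documented Phoronix Test Suite formula). The ratio is the same for more-is-better and fewer-is-better tests; only the identity of the winning run changes. The `compare` helper is hypothetical.

```python
def compare(result_a, result_b, higher_is_better):
    """Return (index of the better run, its percentage lead over the other run).

    Assumed rule: lead = (larger / smaller - 1) * 100, for both metric directions.
    """
    if result_a <= 0 or result_b <= 0:
        raise ValueError("results must be positive")
    if higher_is_better:
        winner = 0 if result_a >= result_b else 1
    else:
        winner = 0 if result_a <= result_b else 1
    lead = (max(result_a, result_b) / min(result_a, result_b) - 1.0) * 100.0
    return winner, lead


# Values taken from the individual results in this file (0xd000390 first).
checks = {
    "OSPRay particle_volume/scivis/real_time": compare(24.95, 16.38, higher_is_better=True),
    "Cpuminer-Opt Garlicoin": compare(29203.00, 22086.25, higher_is_better=True),
    "QMCPACK FeCO6_b3lyp_gms (seconds)": compare(147.51, 178.19, higher_is_better=False),
}
runs = ("0xd000390", "0xd0003a5")
for name, (winner, lead) in checks.items():
    print(f"{name}: {runs[winner]} ahead by {lead:.1f}%")
```

For these three entries the sketch prints leads of 52.3%, 32.2% and 20.8% with 0xd000390 ahead in each, matching the chart.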

Xeon Platinum 8380 AVX-512 Workloads: result overview table covering every test for 0xd000390 and 0xd0003a5 (the individual results are listed per test below).

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better)
0xd000390: 85.97 (SE +/- 1.13, N = 3)
0xd0003a5: 84.84 (SE +/- 1.17, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
0xd000390: 83.89 (SE +/- 0.37, N = 3)
0xd0003a5: 86.93 (SE +/- 0.80, N = 9)
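Each result is reported as a mean with a standard error (SE) over N runs. A minimal sketch follows, assuming the conventional definition (sample standard deviation divided by the square root of N). `difference_in_se` is a hypothetical helper, and reading a gap of "a few SE" as meaningful is a rough heuristic, not a formal significance test.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


def difference_in_se(mean_a, se_a, mean_b, se_b):
    """Gap between two means expressed in units of their combined standard error."""
    return abs(mean_a - mean_b) / math.hypot(se_a, se_b)


# Hypothetical per-run samples, only to show the arithmetic (stdev 2.0, N = 3).
print(standard_error([84.0, 86.0, 88.0]))

# Reported means and SEs from this file.
print(difference_in_se(85.97, 1.13, 84.84, 1.17))  # TensorFlow ResNet-50, batch 512
print(difference_in_se(24.95, 0.01, 16.38, 0.01))  # OSPRay particle_volume/scivis
```

The ResNet-50 gap at batch size 512 is under one combined SE, so it is plausibly run-to-run noise, while the particle_volume/scivis gap is several hundred SEs wide.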

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. It can make use of Intel AMX, AVX-512, and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 128 (GFLOPS/s, More Is Better)
0xd000390: 1941.1 (SE +/- 32.53, N = 7)
0xd0003a5: 1978.9 (SE +/- 20.92, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
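libxsmm reports GEMM throughput in GFLOPS/s. The sketch below assumes that "M N K: 128" means square 128 x 128 x 128 matrix multiplications, with the usual count of one multiply plus one add per inner-product term (2·M·N·K operations). The naive triple loop only shows where the 2·M·N·K count comes from. It is not how libxsmm computes, and the function names are hypothetical.

```python
def gemm_flops(m, n, k):
    """Operations for C = A @ B with A of shape (m, k) and B of shape (k, n)."""
    return 2 * m * n * k


def gflops_per_second(m, n, k, seconds, repetitions=1):
    """Throughput when `repetitions` GEMMs of size (m, n, k) finish in `seconds`."""
    return gemm_flops(m, n, k) * repetitions / seconds / 1e9


def naive_gemm(a, b):
    """Reference matrix multiply that also counts multiply and add operations."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    ops = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply + one add
                ops += 2
            c[i][j] = acc
    return c, ops


c, ops = naive_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, ops)  # [[19.0, 22.0], [43.0, 50.0]] 16
print(gemm_flops(128, 128, 128))  # 4194304
```

At the 1941.1 GFLOPS/s reported for 0xd000390, one 128³ GEMM corresponds to about 4194304 / 1941.1e9 ≈ 2.2 µs of whole-machine time.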

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inference and training accelerator. This test profile runs ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 110.32 (SE +/- 1.29, N = 15)
0xd0003a5: 117.55 (SE +/- 3.29, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 9.08067 (SE +/- 0.10283, N = 15)
0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
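The two fcn-resnet101-11 results describe the same runs as latency (110.32 ms) and throughput (9.08067 inferences per second for 0xd000390). Converting the mean latency directly gives 1000 / 110.32 ≈ 9.06, slightly below the reported throughput. One plausible explanation, which we have not confirmed against how the ONNX Runtime perf tooling aggregates its numbers, is that averaging latencies and averaging rates are different operations: for positive values, the mean of the rates is never smaller than the rate of the mean latency. A small stdlib sketch with hypothetical latencies:

```python
from statistics import fmean


def rate_per_second(latency_ms):
    """Inferences per second for a single inference taking latency_ms milliseconds."""
    return 1000.0 / latency_ms


def summarize(latencies_ms):
    """Return (rate derived from the mean latency, mean of per-sample rates)."""
    rate_of_mean = rate_per_second(fmean(latencies_ms))
    mean_of_rates = fmean(rate_per_second(x) for x in latencies_ms)
    return rate_of_mean, mean_of_rates


print(rate_per_second(110.32))    # about 9.06, vs. 9.08067 reported
print(summarize([100.0, 120.0]))  # about (9.09, 9.17): the two averages differ
```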

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 1.3.1 - Benchmark: vklBenchmark ISPC (Items / Sec, More Is Better)
0xd000390: 912 (SE +/- 1.53, N = 3; MIN: 140 / MAX: 7236)
0xd0003a5: 856 (SE +/- 0.88, N = 3; MIN: 137 / MAX: 7211)
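Volume kernels of the kind OpenVKL provides spend much of their time sampling a scalar field at arbitrary positions. For a regular structured grid, the textbook method is trilinear interpolation. The sketch below is plain Python for illustration, not OpenVKL's API. It assumes unit grid spacing, at least two samples per axis, and values stored as grid[z][y][x].

```python
import math


def trilinear(grid, x, y, z):
    """Sample a scalar field stored as grid[z][y][x] at a fractional position."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
        raise ValueError("sample position outside the grid")
    # Lower corner of the enclosing cell (clamped so the upper face stays inside).
    x0 = min(int(math.floor(x)), nx - 2)
    y0 = min(int(math.floor(y)), ny - 2)
    z0 = min(int(math.floor(z)), nz - 2)
    fx, fy, fz = x - x0, y - y0, z - z0

    def v(dx, dy, dz):
        return grid[z0 + dz][y0 + dy][x0 + dx]

    # Interpolate along x, then y, then z.
    c00 = v(0, 0, 0) * (1 - fx) + v(1, 0, 0) * fx
    c10 = v(0, 1, 0) * (1 - fx) + v(1, 1, 0) * fx
    c01 = v(0, 0, 1) * (1 - fx) + v(1, 0, 1) * fx
    c11 = v(0, 1, 1) * (1 - fx) + v(1, 1, 1) * fx
    c0 = c00 * (1 - fy) + c10 * fy
    c1 = c01 * (1 - fy) + c11 * fy
    return c0 * (1 - fz) + c1 * fz


# A linear field f = x + 2y + 4z on a 2 x 2 x 2 grid is reproduced exactly.
grid = [[[x + 2 * y + 4 * z for x in range(2)] for y in range(2)] for z in range(2)]
print(trilinear(grid, 0.5, 0.5, 0.5))  # 3.5
```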

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and, earlier, MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
0xd000390: 832.45 (SE +/- 16.59, N = 15; MIN: 714.88)
0xd0003a5: 816.51 (SE +/- 14.20, N = 12; MIN: 723.44)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
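As we read benchdnn's naming, the bf16bf16bf16 data type tested here means bfloat16 source, weight and destination tensors. bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 explicit mantissa bits, so converting from float32 amounts to rounding away the low 16 bits. The minimal sketch below uses round-to-nearest-even and omits NaN and infinity handling. It illustrates the format and is not oneDNN code.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 bits (round-to-nearest-even; NaN/inf not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF, plus 1 if the kept part is odd, so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits back to a float32 value by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def to_bf16(x: float) -> float:
    return bf16_bits_to_float(float_to_bf16_bits(x))


print(to_bf16(3.14159))     # 3.140625
print(to_bf16(1.00390625))  # 1.0 (exact tie, rounds down to even)
print(to_bf16(1.01171875))  # 1.015625 (exact tie, rounds up to even)
```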

ONNX Runtime

ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 85.80 (SE +/- 0.79, N = 7)
0xd0003a5: 89.83 (SE +/- 1.07, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 11.66 (SE +/- 0.11, N = 7)
0xd0003a5: 11.15 (SE +/- 0.13, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 5.54783 (SE +/- 0.00245, N = 3)
0xd0003a5: 5.28095 (SE +/- 0.11956, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 180.16 (SE +/- 0.08, N = 3)
0xd0003a5: 190.57 (SE +/- 4.37, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 25.57 (SE +/- 0.30, N = 3)
0xd0003a5: 25.86 (SE +/- 0.26, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 39.12 (SE +/- 0.46, N = 3)
0xd0003a5: 38.72 (SE +/- 0.38, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 1.43407 (SE +/- 0.00633, N = 3)
0xd0003a5: 1.51029 (SE +/- 0.02722, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 696.73 (SE +/- 3.08, N = 3)
0xd0003a5: 664.54 (SE +/- 12.01, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 6.31428 (SE +/- 0.00502, N = 3)
0xd0003a5: 6.71968 (SE +/- 0.43315, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 158.36 (SE +/- 0.13, N = 3)
0xd0003a5: 158.01 (SE +/- 10.25, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

QMCPACK

QMCPACK is a modern, high-performance, open-source Quantum Monte Carlo (QMC) simulation code; this benchmark makes use of MPI with the H2O example code. QMCPACK is a production-level, many-body, ab initio QMC code for computing the electronic structure of atoms, molecules, and solids, and is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

QMCPACK 3.16 - Input: FeCO6_b3lyp_gms (Total Execution Time - Seconds, Fewer Is Better)
0xd000390: 268.56 (SE +/- 3.60, N = 3)
0xd0003a5: 263.23 (SE +/- 2.19, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
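QMCPACK's production methods (variational and diffusion Monte Carlo with many-electron wavefunctions) are far more elaborate, but the core loop can be shown on a toy problem. The sketch below is a variational Monte Carlo estimate for the hydrogen atom in atomic units. It assumes the trial wavefunction psi(r) = exp(-alpha r), samples positions from |psi|^2 with the Metropolis algorithm, and averages the local energy E_L = -alpha^2/2 + (alpha - 1)/r. It is not QMCPACK code, and `vmc_hydrogen` and its parameters are hypothetical.

```python
import math
import random


def local_energy(r, alpha):
    """E_L = H psi / psi for psi = exp(-alpha r), hydrogen atom, atomic units."""
    return -0.5 * alpha * alpha + (alpha - 1.0) / r


def vmc_hydrogen(alpha, steps=20000, step_size=1.0, seed=1):
    """Metropolis sampling of |psi|^2 = exp(-2 alpha r); returns the mean local energy."""
    rng = random.Random(seed)
    pos = [1.0, 0.0, 0.0]
    r = 1.0
    burn_in = steps // 10
    total, count = 0.0, 0
    for i in range(steps):
        # Symmetric proposal: uniform move inside a cube around the current point.
        trial = [c + rng.uniform(-step_size, step_size) for c in pos]
        r_trial = math.sqrt(sum(c * c for c in trial))
        # Accept with probability min(1, |psi(trial)|^2 / |psi(current)|^2).
        if rng.random() < math.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
        if i >= burn_in:
            total += local_energy(r, alpha)
            count += 1
    return total / count


print(vmc_hydrogen(1.0))  # -0.5 (exact ground state, zero variance)
print(vmc_hydrogen(0.8))  # close to 0.8**2 / 2 - 0.8 = -0.48
```

With alpha = 1 the trial function is the exact ground state, so every sample has E_L = -0.5 hartree. Other alpha values give higher average energies (analytically alpha^2/2 - alpha), which is the variational principle VMC relies on.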

ONNX Runtime

ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 59.84 (SE +/- 0.28, N = 3)
0xd0003a5: 61.51 (SE +/- 1.14, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 16.71 (SE +/- 0.08, N = 3)
0xd0003a5: 16.31 (SE +/- 0.26, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

libxsmm

libxsmm 2-1.17-3645 - M N K: 256 (GFLOPS/s, More Is Better)
0xd000390: 594.6 (SE +/- 2.33, N = 3)
0xd0003a5: 600.2 (SE +/- 2.54, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
0xd000390: 317.27 (SE +/- 0.87, N = 3)
0xd0003a5: 323.79 (SE +/- 0.43, N = 3)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds on Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better)
0xd000390: 150.28 (SE +/- 0.54, N = 3)
0xd0003a5: 136.85 (SE +/- 0.45, N = 3)
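At the core of any ray tracer, OSPRay (via Embree) included, is the ray-primitive intersection test. As a generic illustration, not OSPRay's or Embree's API, here is the textbook ray-sphere test. It solves |o + t·d - c|² = r² for the nearest non-negative t. The function name and the tuple-based vectors are assumptions made for the sketch.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Smallest t >= 0 with |origin + t*direction - center| = radius, or None on a miss."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    # Quadratic a t^2 + b t + c = 0 from expanding the sphere equation along the ray.
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t >= 0:
            return t  # nearest hit in front of the origin
    return None


print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```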

Timed MrBayes Analysis

This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
0xd000390: 166.53 (SE +/- 1.36, N = 3)
0xd0003a5: 165.42 (SE +/- 0.98, N = 3)
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
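MrBayes estimates phylogenies by Bayesian MCMC sampling of trees under nucleotide substitution models. The simplest such model is Jukes-Cantor (JC69), which assumes equal base frequencies and equal substitution rates. Under it, the observed fraction p of differing sites between two aligned sequences corrects to an evolutionary distance d = -3/4 ln(1 - 4p/3). MrBayes works with full likelihoods rather than pairwise distances, so this is only an illustration of the model, with hypothetical function names.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y) / len(seq_a)


def jukes_cantor_distance(seq_a, seq_b):
    """JC69-corrected substitutions per site; infinite once p reaches saturation (3/4)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jukes_cantor_distance("ACGT", "ACGA"))  # p = 0.25, d is about 0.304
```

The correction always yields a distance at least as large as p, because multiple substitutions at the same site are invisible in a raw count of differences.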

ONNX Runtime

ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 4.52403 (SE +/- 0.02854, N = 3)
0xd0003a5: 4.62085 (SE +/- 0.04028, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
0xd000390: 221.02 (SE +/- 1.39, N = 3)
0xd0003a5: 216.48 (SE +/- 1.91, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

QMCPACK

OpenBenchmarking.orgTotal Execution Time - Seconds, Fewer Is BetterQMCPACK 3.16Input: FeCO6_b3lyp_gms0xd0003900xd0003a54080120160200SE +/- 0.12, N = 3SE +/- 0.31, N = 3147.51178.191. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Palabos

The Palabos library is a framework for general-purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

Palabos 2.3 - Grid Size: 100 (Mega Site Updates Per Second, More Is Better)
0xd000390: 312.20 (SE +/- 0.89, N = 3)
0xd0003a5: 312.53 (SE +/- 0.20, N = 3)
1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
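Palabos' Cavity3D benchmark runs on a 3D lattice, but the ingredients of a lattice Boltzmann step are easier to see in 2D. The sketch below assumes the standard D2Q9 lattice (nine discrete velocities with weights 4/9, 1/9 and 1/36). It computes the equilibrium distribution used in the BGK collision step and checks that it reproduces the prescribed density and momentum. It also shows one reading of the reported metric, Mega Site Updates Per Second: lattice sites times time steps per second, divided by 10^6. That interpretation and the function names are assumptions, not Palabos API.

```python
# Standard D2Q9 lattice: rest velocity, 4 axis velocities, 4 diagonals.
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium f_i = w_i rho (1 + 3 c.u + 4.5 (c.u)^2 - 1.5 u.u)."""
    usq = ux * ux + uy * uy
    feq = []
    for w, (cx, cy) in zip(WEIGHTS, VELOCITIES):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(f):
    """Density and momentum (rho, rho*ux, rho*uy) carried by a distribution."""
    rho = sum(f)
    jx = sum(fi * cx for fi, (cx, _) in zip(f, VELOCITIES))
    jy = sum(fi * cy for fi, (_, cy) in zip(f, VELOCITIES))
    return rho, jx, jy


def mega_site_updates_per_second(sites, steps, seconds):
    return sites * steps / seconds / 1e6


print(moments(equilibrium(1.2, 0.05, -0.02)))  # approximately (1.2, 0.06, -0.024)
print(mega_site_updates_per_second(100 ** 3, steps=1000, seconds=3.2))  # about 312.5
```

Under that reading, and if Grid Size: 100 means a 100 x 100 x 100 lattice, the roughly 312 MSUPS measured here would correspond to about 312 full-lattice time steps per second.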

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations; this test uses it to measure a processor's potential cryptocurrency mining performance across a wide variety of cryptocurrencies. The benchmark reports the hash rate achieved when mining the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt 3.20.3 - Algorithm: Garlicoin (kH/s, More Is Better)
0xd000390: 29203.00 (SE +/- 330.17, N = 3)
0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
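Cpuminer-Opt's kH/s figure is a hash rate: thousands of candidate hashes evaluated per second while searching for a nonce whose hash meets a difficulty target. The sketch below shows that loop with Python's hashlib. It uses double SHA-256 purely as a stand-in (the benchmarked algorithms, Garlicoin's among them, use other hash functions), a simplified "leading zero bits" target, and a hypothetical header. A pure-Python loop is orders of magnitude slower than the optimized miner.

```python
import hashlib
import time


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a big-endian digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 1 << 24):
    """Search nonces until the hash has `difficulty_bits` leading zeros.

    Returns (nonce, digest, hash rate in kH/s), or None if no nonce qualifies.
    """
    start = time.perf_counter()
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        if leading_zero_bits(digest) >= difficulty_bits:
            elapsed = max(time.perf_counter() - start, 1e-9)
            return nonce, digest, (nonce + 1) / elapsed / 1000.0
    return None


nonce, digest, khs = mine(b"hypothetical block header", difficulty_bits=12)
print(nonce, digest.hex(), f"{khs:.1f} kH/s")
```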

OSPRay

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
0xd000390: 24.95 (SE +/- 0.01, N = 3)
0xd0003a5: 16.38 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 405.97 (SE +/- 0.42, N = 3)
0xd0003a5: 421.75 (SE +/- 2.15, N = 3)

Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 98.34 (SE +/- 0.12, N = 3)
0xd0003a5: 94.26 (SE +/- 0.51, N = 3)
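
In the asynchronous multi-stream scenario, the DeepSparse latency (ms/batch) and throughput (items/sec) results are two views of the same system. Little's law (L = lambda * W) relates them. Assuming a steady state, throughput times latency gives the average number of items in flight. The sketch below applies this to the BERT-Large numbers from the two runs; the function name is hypothetical.

```python
# Illustrative Little's-law check; the function name is hypothetical.

def implied_in_flight(items_per_second: float, ms_per_batch: float) -> float:
    """Little's law, L = lambda * W: average number of items processed concurrently."""
    return items_per_second * ms_per_batch / 1000.0

# BERT-Large, Asynchronous Multi-Stream: (items/sec, ms/batch) for each run.
runs = {"0xd000390": (98.34, 405.97), "0xd0003a5": (94.26, 421.75)}
for name, (ips, ms) in runs.items():
    print(f"{name}: about {implied_in_flight(ips, ms):.1f} items in flight")
```

Both runs imply roughly 40 concurrent items. The lower throughput and higher latency of the second run therefore describe the same slowdown, not two independent effects.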

Palabos

Palabos 2.3 - Grid Size: 400
Mega Site Updates Per Second, More Is Better
0xd000390: 388.48 (SE +/- 0.94, N = 3)
0xd0003a5: 393.84 (SE +/- 0.06, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H2O example code. QMCPACK is an open-source, production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.
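
QMCPACK's production methods for many-electron systems are far beyond a snippet, but the core idea of variational Monte Carlo fits in a few lines. The toy sketch below is not derived from QMCPACK. It samples |psi|^2 for a hydrogen atom with the trial wavefunction psi = exp(-alpha r) using the Metropolis algorithm, then averages the local energy E_L = -alpha^2/2 + (alpha - 1)/r in atomic units. With alpha = 1 the trial function is exact, so every sample gives -0.5 hartree. Other values of alpha give a higher average energy, as the variational principle requires. Function names, step size, and sample counts are hypothetical choices.

```python
import math
import random

# Illustrative variational Monte Carlo toy; names and parameters are hypothetical.

def local_energy(r: float, alpha: float) -> float:
    """E_L = (H psi) / psi for psi = exp(-alpha r) in the hydrogen atom (hartree)."""
    return -0.5 * alpha * alpha + (alpha - 1.0) / r

def vmc_hydrogen(alpha: float, steps: int = 50_000, step_size: float = 2.0,
                 burn_in: int = 1_000, seed: int = 1):
    """Metropolis sampling of |psi|^2; returns (mean local energy, acceptance ratio)."""
    rng = random.Random(seed)
    pos = [0.5, 0.5, 0.5]
    r = math.sqrt(sum(c * c for c in pos))
    total, accepted, samples = 0.0, 0, 0
    for step in range(burn_in + steps):
        # Symmetric uniform trial move inside a cube around the current position.
        trial = [c + step_size * (rng.random() - 0.5) for c in pos]
        r_trial = math.sqrt(sum(c * c for c in trial))
        # Accept with probability |psi(trial)|^2 / |psi(pos)|^2 = exp(-2 alpha (r_trial - r)).
        if rng.random() < math.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
            accepted += 1
        if step >= burn_in:  # discard equilibration steps
            total += local_energy(r, alpha)
            samples += 1
    return total / samples, accepted / (burn_in + steps)

for alpha in (0.8, 1.0, 1.2):
    energy, acc = vmc_hydrogen(alpha)
    print(f"alpha={alpha}: E = {energy:.4f} hartree (acceptance {acc:.2f})")
```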

QMCPACK 3.16 - Input: Li2_STO_ae
Total Execution Time (Seconds), Fewer Is Better
0xd000390: 124.23 (SE +/- 1.55, N = 3)
0xd0003a5: 123.26 (SE +/- 1.03, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Palabos

Palabos 2.3 - Grid Size: 500
Mega Site Updates Per Second, More Is Better
0xd000390: 413.21 (SE +/- 0.42, N = 3)
0xd0003a5: 417.48 (SE +/- 0.72, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

NCNN

NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.
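
The NCNN results report an average latency together with MIN and MAX values. The sketch below mimics that style of reporting for any Python callable. `fake_inference` is a hypothetical stand-in for a model forward pass, not an NCNN API, and the warmup and run counts are arbitrary.

```python
import statistics
import time

# Illustrative latency harness; names and counts are hypothetical, not NCNN APIs.

def benchmark(fn, runs: int = 50, warmup: int = 5):
    """Time repeated calls of fn; return min/max/avg latency in milliseconds."""
    for _ in range(warmup):  # untimed calls to warm caches and lazy initialization
        fn()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return {"min": min(times_ms), "max": max(times_ms),
            "avg": statistics.fmean(times_ms), "runs": runs}

def fake_inference():
    """Hypothetical stand-in for one model forward pass."""
    return sum(i * i for i in range(20_000))

stats = benchmark(fake_inference)
print(f"min = {stats['min']:.2f} ms  max = {stats['max']:.2f} ms  avg = {stats['avg']:.2f} ms")
```

A few slow outlier runs raise MAX far more than the average. Several NCNN results here show this, with MAX values many times the mean latency.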

NCNN 20230517 - Target: CPU - Model: FastestDet
ms, Fewer Is Better
0xd000390: 10.20 (SE +/- 0.64, N = 3; MIN: 9.01 / MAX: 500.17)
0xd0003a5: 9.71 (SE +/- 0.07, N = 3; MIN: 9.22 / MAX: 27.29)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: vision_transformer
ms, Fewer Is Better
0xd000390: 46.92 (SE +/- 1.17, N = 3; MIN: 43.11 / MAX: 881.49)
0xd0003a5: 45.23 (SE +/- 0.31, N = 3; MIN: 43.43 / MAX: 73.39)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: regnety_400m
ms, Fewer Is Better
0xd000390: 45.54 (SE +/- 7.69, N = 3; MIN: 36.01 / MAX: 3343.68)
0xd0003a5: 38.85 (SE +/- 0.50, N = 3; MIN: 37.13 / MAX: 233.44)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: squeezenet_ssd
ms, Fewer Is Better
0xd000390: 15.34 (SE +/- 0.12, N = 3; MIN: 14.63 / MAX: 39.65)
0xd0003a5: 16.10 (SE +/- 0.25, N = 3; MIN: 15.43 / MAX: 48.02)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: yolov4-tiny
ms, Fewer Is Better
0xd000390: 23.71 (SE +/- 0.22, N = 3; MIN: 22.57 / MAX: 46.11)
0xd0003a5: 24.10 (SE +/- 0.17, N = 3; MIN: 23.25 / MAX: 51.47)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: resnet50
ms, Fewer Is Better
0xd000390: 17.15 (SE +/- 0.52, N = 3; MIN: 16.19 / MAX: 41.83)
0xd0003a5: 18.51 (SE +/- 0.69, N = 3; MIN: 17.32 / MAX: 299.93)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: alexnet
ms, Fewer Is Better
0xd000390: 5.39 (SE +/- 0.16, N = 3; MIN: 4.83 / MAX: 151.52)
0xd0003a5: 5.46 (SE +/- 0.15, N = 3; MIN: 5.01 / MAX: 29.14)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: resnet18
ms, Fewer Is Better
0xd000390: 8.97 (SE +/- 0.14, N = 3; MIN: 8.63 / MAX: 27.27)
0xd0003a5: 9.42 (SE +/- 0.09, N = 2; MIN: 9.21 / MAX: 32.8)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: vgg16
ms, Fewer Is Better
0xd000390: 23.86 (SE +/- 0.25, N = 3; MIN: 23.05 / MAX: 47.41)
0xd0003a5: 25.71 (SE +/- 0.19, N = 3; MIN: 24.88 / MAX: 62.96)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: googlenet
ms, Fewer Is Better
0xd000390: 15.36 (SE +/- 0.17, N = 3; MIN: 14.6 / MAX: 182.22)
0xd0003a5: 16.58 (SE +/- 0.48, N = 3; MIN: 15.29 / MAX: 39.63)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: blazeface
ms, Fewer Is Better
0xd000390: 4.35 (SE +/- 0.06, N = 3; MIN: 4.16 / MAX: 4.97)
0xd0003a5: 4.54 (SE +/- 0.07, N = 3; MIN: 4.37 / MAX: 5.17)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: efficientnet-b0
ms, Fewer Is Better
0xd000390: 11.62 (SE +/- 0.38, N = 3; MIN: 10.82 / MAX: 21.03)
0xd0003a5: 11.71 (SE +/- 0.11, N = 3; MIN: 11.15 / MAX: 19.87)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: mnasnet
ms, Fewer Is Better
0xd000390: 7.57 (SE +/- 0.10, N = 3; MIN: 7.28 / MAX: 30.26)
0xd0003a5: 7.41 (SE +/- 0.04, N = 3; MIN: 7.15 / MAX: 31.17)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: shufflenet-v2
ms, Fewer Is Better
0xd000390: 9.89 (SE +/- 0.10, N = 3; MIN: 9.61 / MAX: 33.59)
0xd0003a5: 9.76 (SE +/- 0.13, N = 3; MIN: 9.32 / MAX: 33.4)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
0xd000390: 8.88 (SE +/- 0.05, N = 3; MIN: 8.69 / MAX: 32.33)
0xd0003a5: 8.76 (SE +/- 0.12, N = 3; MIN: 8.3 / MAX: 32.6)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
0xd000390: 8.03 (SE +/- 0.09, N = 3; MIN: 7.8 / MAX: 31.34)
0xd0003a5: 7.96 (SE +/- 0.04, N = 3; MIN: 7.75 / MAX: 31.75)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517 - Target: CPU - Model: mobilenet
ms, Fewer Is Better
0xd000390: 15.46 (SE +/- 0.09, N = 3; MIN: 14.85 / MAX: 106.56)
0xd0003a5: 15.66 (SE +/- 0.07, N = 3; MIN: 15.24 / MAX: 38.75)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

VVenC

VVenC is the Fraunhofer Versatile Video Encoder, a fast and efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Fast
Frames Per Second, More Is Better
0xd000390: 5.722 (SE +/- 0.033, N = 3)
0xd0003a5: 5.705 (SE +/- 0.029, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OSPRay

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time
Items Per Second, More Is Better
0xd000390: 24.75 (SE +/- 0.11, N = 3)
0xd0003a5: 16.46 (SE +/- 0.01, N = 3)
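
This file mixes "More Is Better" and "Fewer Is Better" metrics. Comparing the two result identifiers (0xd000390 and 0xd0003a5) therefore means inverting time-based results before forming ratios. The minimal sketch below uses three results from this section. The function names are illustrative. The geometric mean is the conventional way to summarize normalized ratios, but this is not necessarily OpenBenchmarking.org's own calculation.

```python
import math

# Illustrative comparison helpers; names are hypothetical.

def relative_performance(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Candidate performance relative to baseline; > 1.0 means the candidate is faster."""
    return candidate / baseline if higher_is_better else baseline / candidate

def geometric_mean(ratios):
    """Geometric mean of performance ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# (0xd000390 result, 0xd0003a5 result, higher_is_better) for three tests in this file.
results = [
    (24.75, 16.46, True),     # OSPRay particle_volume/ao/real_time, items/s
    (124.23, 123.26, False),  # QMCPACK Li2_STO_ae, seconds
    (2.61, 2.87, True),       # simdjson Kostya, GB/s
]
ratios = [relative_performance(a, b, hib) for a, b, hib in results]
for r in ratios:
    print(f"{(r - 1) * 100:+.1f}%")
print(f"geometric mean: {geometric_mean(ratios):.3f}")
```

Unlike an arithmetic mean of percentages, the geometric mean treats a 2x gain and a 2x loss as cancelling out.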

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries, if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: GoogLeNet
images/sec, More Is Better
0xd000390: 309.63 (SE +/- 1.89, N = 3)
0xd0003a5: 321.72 (SE +/- 2.10, N = 3)

Cpuminer-Opt

Cpuminer-Opt 3.20.3 - Algorithm: Myriad-Groestl
kH/s, More Is Better
0xd000390: 43127 (SE +/- 386.00, N = 15)
0xd0003a5: 43450 (SE +/- 406.32, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.
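
Explicit time-stepping, as used by Laghos, limits the step size through a CFL-type condition. A signal moving at the local sound speed plus flow speed must not cross more than a fraction of a cell per step. A sketch for an ideal gas follows. The CFL number 0.5, the cell size, and the function names are arbitrary example choices. Laghos's own time-step control, which accounts for high-order elements, is more involved.

```python
import math

# Illustrative CFL time-step estimate; names and parameter values are hypothetical.

def sound_speed(gamma: float, pressure: float, density: float) -> float:
    """Ideal-gas sound speed c = sqrt(gamma * p / rho)."""
    return math.sqrt(gamma * pressure / density)

def cfl_time_step(cfl: float, cell_size: float, velocity: float, c: float) -> float:
    """Largest stable explicit step: a signal may cross only a fraction of a cell."""
    return cfl * cell_size / (abs(velocity) + c)

c = sound_speed(gamma=1.4, pressure=1.0, density=1.0)
dt = cfl_time_step(cfl=0.5, cell_size=0.01, velocity=0.0, c=c)
print(f"c = {c:.4f}, dt = {dt:.6f}")
```

Halving the cell size halves the allowed step, so refining the mesh increases both the work per step and the number of steps. That is one reason these solvers are compute-heavy.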

Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh
Major Kernels Total Rate, More Is Better
0xd000390: 385.89 (SE +/- 0.23, N = 3)
0xd0003a5: 386.08 (SE +/- 0.80, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
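
The oneDNN results here use the bf16bf16bf16 data type, understood here as bfloat16 throughout the operation. bfloat16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits. Converting from float32 therefore amounts to keeping the upper 16 bits with rounding. A minimal sketch using round-to-nearest-even follows. The function names are hypothetical, and NaN handling is omitted.

```python
import struct

# Illustrative float32 <-> bfloat16 conversion; names are hypothetical, NaN handling omitted.

def float_to_bf16_bits(x: float) -> int:
    """Round a value (as float32) to bfloat16 bits using round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties round to the even bf16 value
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(h: int) -> float:
    """Widen bfloat16 to float32 by appending 16 zero mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

for value in (1.0, 3.14159265, 65504.0, 1e-3):
    h = float_to_bf16_bits(value)
    print(f"{value!r:>12} -> 0x{h:04X} -> {bf16_bits_to_float(h)!r}")
```

The wide exponent keeps float32's dynamic range, while the short mantissa costs precision (pi becomes 3.140625). This trade-off suits neural-network arithmetic that accumulates in higher precision.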

oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
0xd000390: 524.38 (SE +/- 7.19, N = 3; MIN: 499.75)
0xd0003a5: 521.74 (SE +/- 2.82, N = 3; MIN: 505.72)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
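
simdjson's first stage locates structural characters with SIMD bitmasks. It then computes a prefix XOR over the quote mask to tell which characters sit inside strings; the simdjson authors implement that prefix XOR with a carry-less multiplication. The scalar Python emulation below illustrates the idea on one block of up to 64 characters. It ignores backslash escapes, and its function names are hypothetical, so treat it as a teaching sketch rather than simdjson's full algorithm.

```python
# Illustrative scalar emulation of a stage-1 structural scan; names are hypothetical.

STRUCTURAL = set("{}[]:,")

def prefix_xor(mask: int, width: int) -> int:
    """Bit i of the result is the XOR of mask bits 0..i (log-step doubling)."""
    shift = 1
    while shift < width:
        mask ^= mask << shift
        shift <<= 1
    return mask & ((1 << width) - 1)

def structural_positions(chunk: str):
    """Indexes of structural characters outside strings in a block of <= 64 chars.

    Simplifications: backslash escapes are ignored, and no string state is
    carried over from a previous block.
    """
    width = len(chunk)
    assert width <= 64, "one 64-character block at a time"
    quotes = 0
    structural = 0
    for i, ch in enumerate(chunk):  # SIMD code builds these masks in parallel
        if ch == '"':
            quotes |= 1 << i
        elif ch in STRUCTURAL:
            structural |= 1 << i
    in_string = prefix_xor(quotes, width)  # 1 from an opening quote up to its closing quote
    outside = structural & ~in_string
    return [i for i in range(width) if (outside >> i) & 1]

print(structural_positions('{"a":[1,2]}'))
print(structural_positions('{"a,b":1}'))  # the comma inside the string is skipped
```

Because every step is a bitwise operation on a whole block, the real implementation avoids per-character branches. This helps explain why throughput in these tests is measured in GB/s.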

simdjson 2.0 - Throughput Test: PartialTweets
GB/s, More Is Better
0xd000390: 4.62 (SE +/- 0.01, N = 3)
0xd0003a5: 4.77 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3

simdjson 2.0 - Throughput Test: DistinctUserID
GB/s, More Is Better
0xd000390: 5.52 (SE +/- 0.02, N = 3)
0xd0003a5: 5.71 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3

simdjson 2.0 - Throughput Test: TopTweet
GB/s, More Is Better
0xd000390: 5.60 (SE +/- 0.03, N = 3)
0xd0003a5: 5.75 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: AlexNet
images/sec, More Is Better
0xd000390: 760.15 (SE +/- 3.77, N = 3)
0xd0003a5: 839.41 (SE +/- 2.61, N = 3)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 93.54 (SE +/- 0.07, N = 3)
0xd0003a5: 100.19 (SE +/- 1.28, N = 4)

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 426.95 (SE +/- 0.21, N = 3)
0xd0003a5: 398.72 (SE +/- 4.97, N = 4)

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.
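
Several OpenVINO models in this file come in FP16 and FP16-INT8 variants, and in these results the INT8 variants run considerably faster. The general idea of INT8 quantization is to map real-valued tensors onto 8-bit integers through a scale factor. The minimal sketch below shows symmetric per-tensor quantization. OpenVINO's own quantization tooling uses its own calibration and may differ (for example, per-channel scales or zero points), so treat this as an illustration of the concept only. The function names and weights are hypothetical.

```python
# Illustrative symmetric INT8 quantization; names and values are hypothetical.

def quantize_int8(values):
    """Symmetric per-tensor quantization: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [qi * scale for qi in q]

weights = [-0.9, -0.1, 0.0, 0.33, 0.75]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)
print([round(a - w, 4) for a, w in zip(approx, weights)])  # errors are at most scale / 2
```

Working on 8-bit integers lets a CPU pack four times as many values per vector register as float32, which is the main source of the INT8 speedups.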

OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 1490.76 (SE +/- 2.35, N = 3; MIN: 1074.22 / MAX: 1692.48)
0xd0003a5: 1496.19 (SE +/- 0.81, N = 3; MIN: 1043.7 / MAX: 1711.44)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU
FPS, More Is Better
0xd000390: 13.29 (SE +/- 0.02, N = 3)
0xd0003a5: 13.25 (SE +/- 0.01, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU
ms, Fewer Is Better
0xd000390: 1517.69 (SE +/- 0.52, N = 3; MIN: 1081.6 / MAX: 1690.6)
0xd0003a5: 1519.84 (SE +/- 0.84, N = 3; MIN: 1074.08 / MAX: 1721.59)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU
FPS, More Is Better
0xd000390: 13.03 (SE +/- 0.00, N = 3)
0xd0003a5: 13.04 (SE +/- 0.01, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Face Detection FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 827.51 (SE +/- 0.42, N = 3; MIN: 628.21 / MAX: 980.48)
0xd0003a5: 823.09 (SE +/- 0.71, N = 3; MIN: 550.41 / MAX: 926.21)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Face Detection FP16 - Device: CPU
FPS, More Is Better
0xd000390: 24.04 (SE +/- 0.01, N = 3)
0xd0003a5: 24.18 (SE +/- 0.02, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Face Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
0xd000390: 209.31 (SE +/- 0.03, N = 3; MIN: 160.46 / MAX: 249.09)
0xd0003a5: 209.02 (SE +/- 0.14, N = 3; MIN: 152.83 / MAX: 234.28)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Face Detection FP16-INT8 - Device: CPU
FPS, More Is Better
0xd000390: 95.42 (SE +/- 0.01, N = 3)
0xd0003a5: 95.54 (SE +/- 0.07, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Machine Translation EN To DE FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 79.47 (SE +/- 0.22, N = 3; MIN: 62.57 / MAX: 232.01)
0xd0003a5: 78.18 (SE +/- 0.19, N = 3; MIN: 66.08 / MAX: 194.38)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Machine Translation EN To DE FP16 - Device: CPU
FPS, More Is Better
0xd000390: 251.00 (SE +/- 0.68, N = 3)
0xd0003a5: 255.09 (SE +/- 0.63, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 9.77 (SE +/- 0.01, N = 3; MIN: 7.83 / MAX: 20.01)
0xd0003a5: 9.63 (SE +/- 0.02, N = 3; MIN: 8.33 / MAX: 19.33)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU
FPS, More Is Better
0xd000390: 2039.63 (SE +/- 1.77, N = 3)
0xd0003a5: 2070.72 (SE +/- 3.75, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

VVenC

VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Faster
Frames Per Second, More Is Better
0xd000390: 10.36 (SE +/- 0.09, N = 3)
0xd0003a5: 10.42 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

OpenVINO

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 33.89 (SE +/- 0.05, N = 3; MIN: 29.83 / MAX: 113.13)
0xd0003a5: 33.63 (SE +/- 0.06, N = 3; MIN: 29.54 / MAX: 113.55)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16 - Device: CPU
FPS, More Is Better
0xd000390: 2344.97 (SE +/- 3.01, N = 3)
0xd0003a5: 2362.84 (SE +/- 3.91, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
ms, Fewer Is Better
0xd000390: 1.16 (SE +/- 0.00, N = 3; MIN: 0.88 / MAX: 12.61)
0xd0003a5: 1.16 (SE +/- 0.00, N = 3; MIN: 0.86 / MAX: 17.98)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
FPS, More Is Better
0xd000390: 67604.00 (SE +/- 17.48, N = 3)
0xd0003a5: 67754.34 (SE +/- 29.72, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 37.99 (SE +/- 0.07, N = 3)
0xd0003a5: 45.96 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse 1.5 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 1051.79 (SE +/- 1.88, N = 3)
0xd0003a5: 869.04 (SE +/- 1.83, N = 3)

OpenVINO

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
0xd000390: 8.50 (SE +/- 0.00, N = 3; MIN: 7.17 / MAX: 18.39)
0xd0003a5: 8.48 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 22.53)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
FPS, More Is Better
0xd000390: 9396.52 (SE +/- 3.74, N = 3)
0xd0003a5: 9419.77 (SE +/- 2.19, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13)
0xd0003a5: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13.43)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
FPS, More Is Better
0xd000390: 59274.06 (SE +/- 17.38, N = 3)
0xd0003a5: 59377.96 (SE +/- 18.68, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
0xd000390: 4.51 (SE +/- 0.00, N = 3; MIN: 4.02 / MAX: 13.95)
0xd0003a5: 4.49 (SE +/- 0.00, N = 3; MIN: 4.04 / MAX: 15.85)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU
FPS, More Is Better
0xd000390: 4419.17 (SE +/- 3.29, N = 3)
0xd0003a5: 4442.98 (SE +/- 2.64, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU
ms, Fewer Is Better
0xd000390: 17.80 (SE +/- 0.01, N = 3; MIN: 12.83 / MAX: 32.64)
0xd0003a5: 17.87 (SE +/- 0.02, N = 3; MIN: 12.24 / MAX: 38.45)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU
FPS, More Is Better
0xd000390: 1121.56 (SE +/- 0.88, N = 3)
0xd0003a5: 1117.09 (SE +/- 1.14, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 4K
Frames Per Second, More Is Better
0xd000390: 10.46 (SE +/- 0.06, N = 3)
0xd0003a5: 10.45 (SE +/- 0.01, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

simdjson

simdjson 2.0 - Throughput Test: Kostya
GB/s, More Is Better
0xd000390: 2.61 (SE +/- 0.00, N = 3)
0xd0003a5: 2.87 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 474.69 (SE +/- 1.00, N = 3)
0xd0003a5: 478.85 (SE +/- 1.39, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 84.02 (SE +/- 0.32, N = 3)
0xd0003a5: 83.20 (SE +/- 0.18, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 17.35 (SE +/- 0.01, N = 3)
0xd0003a5: 19.31 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 2301.16 (SE +/- 1.14, N = 3)
0xd0003a5: 2068.30 (SE +/- 3.15, N = 3)

OSPRay

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
0xd000390: 20.58 (SE +/- 0.08, N = 3)
0xd0003a5: 18.46 (SE +/- 0.03, N = 3)

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
0xd000390: 21.08 (SE +/- 0.05, N = 3)
0xd0003a5: 18.89 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 129.81 (SE +/- 0.25, N = 3)
0xd0003a5: 136.18 (SE +/- 0.60, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 307.87 (SE +/- 0.57, N = 3)
0xd0003a5: 293.12 (SE +/- 1.05, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 551.12 (SE +/- 0.77, N = 3)
0xd0003a5: 553.27 (SE +/- 1.19, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 72.07 (SE +/- 0.09, N = 3)
0xd0003a5: 71.89 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 551.82 (SE +/- 0.62, N = 3)
0xd0003a5: 555.09 (SE +/- 1.12, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 72.18 (SE +/- 0.15, N = 3)
0xd0003a5: 71.54 (SE +/- 0.12, N = 3)

simdjson

simdjson 2.0 - Throughput Test: LargeRandom
GB/s, More Is Better
0xd000390: 0.85 (SE +/- 0.00, N = 3)
0xd0003a5: 0.96 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9 video format. Learn more via the OpenBenchmarking.org test page.

VP9 libvpx Encoding 1.13 - Speed: Speed 5 - Input: Bosphorus 4K
Frames Per Second, More Is Better
0xd000390: 12.63 (SE +/- 0.12, N = 3)
0xd0003a5: 12.33 (SE +/- 0.13, N = 3)
(CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
Billion Interactions/s, More Is Better
0xd000390: 101.08 (SE +/- 0.39, N = 3)
0xd0003a5: 104.05 (SE +/- 0.38, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
GFInst/s, More Is Better
0xd000390: 2526.89 (SE +/- 9.78, N = 3)
0xd0003a5: 2601.21 (SE +/- 9.53, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D, using a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.0 - Model: Water-layered Halfspace
Seconds, Fewer Is Better
0xd000390: 31.15 (SE +/- 0.24, N = 3)
0xd0003a5: 31.38 (SE +/- 0.31, N = 5)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

oneDNN

oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
0xd000390: 2.59271 (SE +/- 0.03265, N = 15; MIN: 1.91)
0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; MIN: 1.99)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 173.79 (SE +/- 0.34, N = 3)
0xd0003a5: 172.56 (SE +/- 1.64, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 230.05 (SE +/- 0.46, N = 3)
0xd0003a5: 231.43 (SE +/- 2.37, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
0xd000390: 43.30 (SE +/- 0.53, N = 3)
0xd0003a5: 42.94 (SE +/- 0.06, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
0xd000390: 922.93 (SE +/- 11.47, N = 3)
0xd0003a5: 930.75 (SE +/- 1.22, N = 3)

Laghos

Laghos 3.1 - Test: Triple Point Problem
Major Kernels Total Rate, More Is Better
0xd000390: 256.27 (SE +/- 0.32, N = 3)
0xd0003a5: 256.87 (SE +/- 0.94, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 62.01 (SE +/- 0.23, N = 3)
  0xd0003a5: 64.44 (SE +/- 0.78, N = 3)

Neural Magic DeepSparse 1.5 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 644.44 (SE +/- 2.45, N = 3)
  0xd0003a5: 620.25 (SE +/- 7.47, N = 3)

QMCPACK

QMCPACK is a modern, high-performance, open-source Quantum Monte Carlo (QMC) simulation code; this benchmark runs its simple-H2O example using MPI. QMCPACK is a production-level many-body ab initio QMC code for computing the electronic structure of atoms, molecules, and solids, and is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

QMCPACK 3.16 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  0xd000390: 39.56 (SE +/- 0.12, N = 3)
  0xd0003a5: 41.25 (SE +/- 0.02, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
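Quantum Monte Carlo estimates quantum-mechanical energies by random sampling. As an illustration of the idea only, the sketch below is the textbook variational Monte Carlo calculation for a single hydrogen atom (atomic units) with an assumed trial wavefunction psi = exp(-alpha r): a Metropolis walk samples |psi|^2 and the local energy is averaged. QMCPACK's production methods and many-body wavefunctions are far more involved, and none of these function names come from QMCPACK.

```python
import math
import random


def local_energy(r: float, alpha: float) -> float:
    """E_L = (H psi) / psi for psi = exp(-alpha r), hydrogen atom, atomic units."""
    return -0.5 * alpha * alpha + (alpha - 1.0) / r


def vmc_energy(alpha: float, steps: int = 50_000, step_size: float = 2.0, seed: int = 1) -> float:
    """Average local energy over a Metropolis walk sampling |psi|^2 in 3D."""
    rng = random.Random(seed)
    pos = [1.0, 0.0, 0.0]
    r = 1.0
    burn_in = steps // 10  # discard early samples while the walker equilibrates
    total = 0.0
    for step in range(burn_in + steps):
        trial = [x + step_size * (rng.random() - 0.5) for x in pos]
        r_trial = math.sqrt(sum(x * x for x in trial))
        # Acceptance ratio |psi(trial)|^2 / |psi(pos)|^2 = exp(-2 alpha (r_trial - r)).
        if rng.random() < math.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
        if step >= burn_in:
            total += local_energy(r, alpha)
    return total / steps


if __name__ == "__main__":
    for alpha in (0.8, 1.0, 1.2):
        exact = alpha**2 / 2 - alpha
        print(f"alpha = {alpha}: E ~ {vmc_energy(alpha):.4f} hartree (exact {exact:.4f})")
```

With alpha = 1 the trial function is the exact ground state, so every sample gives E_L = -0.5 hartree; for alpha = 0.8 the estimate scatters around the analytic value alpha^2/2 - alpha = -0.48, above the true ground-state energy, as the variational principle requires.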

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries, if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
  0xd000390: 723.27 (SE +/- 2.99, N = 3)
  0xd0003a5: 781.25 (SE +/- 4.27, N = 3)
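Comparing the two microcode revisions means normalizing for each test's direction ("More Is Better" versus "Fewer Is Better"). A minimal helper, assuming the older revision 0xd000390 as the baseline and signing the result so that a positive value means 0xd0003a5 did better:

```python
def relative_change(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Percent change of candidate versus baseline, signed so positive means better."""
    change = (candidate - baseline) / baseline * 100.0
    return change if higher_is_better else -change


if __name__ == "__main__":
    # Values copied from the results above (0xd000390 baseline, 0xd0003a5 candidate).
    print(f"TensorFlow AlexNet: {relative_change(723.27, 781.25, True):+.1f}%")
    print(f"QMCPACK simple-H2O: {relative_change(39.56, 41.25, False):+.1f}%")
```

Applied to these results, TensorFlow AlexNet improves by about 8.0% on 0xd0003a5, while the QMCPACK simple-H2O run regresses by about 4.3% (41.25 s versus 39.56 s).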

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 94.61 (SE +/- 0.07, N = 3)
  0xd0003a5: 96.94 (SE +/- 0.29, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 422.12 (SE +/- 0.36, N = 3)
  0xd0003a5: 412.05 (SE +/- 1.24, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 39.73 (SE +/- 0.06, N = 3)
  0xd0003a5: 40.45 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 1005.58 (SE +/- 1.41, N = 3)
  0xd0003a5: 987.55 (SE +/- 2.17, N = 3)

Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 39.70 (SE +/- 0.02, N = 3)
  0xd0003a5: 39.41 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 1006.34 (SE +/- 0.56, N = 3)
  0xd0003a5: 1013.82 (SE +/- 0.31, N = 3)

Neural Magic DeepSparse 1.5 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 5.2210 (SE +/- 0.0056, N = 3)
  0xd0003a5: 5.6210 (SE +/- 0.0018, N = 3)

Neural Magic DeepSparse 1.5 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 7633.26 (SE +/- 8.14, N = 3)
  0xd0003a5: 7092.50 (SE +/- 2.44, N = 3)
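The ms/batch and items/sec figures for each DeepSparse model are not independent. By Little's law (average items in flight = throughput times time in system), multiplying the two gives the average concurrency the asynchronous multi-stream scenario sustained. The sketch below applies that arithmetic to the 0xd000390 results above; treating ms/batch as the time each item spends in the system is an assumption, since the page does not state the batch size or stream count.

```python
# (ms/batch, items/sec) for microcode 0xd000390, copied from the charts above.
RESULTS_0XD000390 = {
    "CV Detection, YOLOv5s COCO": (94.61, 422.12),
    "CV Classification, ResNet-50 ImageNet": (39.73, 1005.58),
    "ResNet-50, Baseline": (39.70, 1006.34),
    "ResNet-50, Sparse INT8": (5.2210, 7633.26),
}


def items_in_flight(latency_ms: float, throughput_per_s: float) -> float:
    """Little's law, L = lambda * W, with W converted from ms to seconds."""
    return throughput_per_s * latency_ms / 1000.0


if __name__ == "__main__":
    for model, (ms, per_s) in RESULTS_0XD000390.items():
        print(f"{model:<40} ~{items_in_flight(ms, per_s):.1f} items in flight")
```

Every model lands at roughly 40 items in flight, which suggests the runs used a fixed concurrency of about 40, though the results page does not say how DeepSparse was configured.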

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. It can make use of Intel AMX, AVX-512, and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

libxsmm 2-1.17-3645 - M N K: 64 (GFLOPS/s, More Is Better)
  0xd000390: 1098.8 (SE +/- 8.60, N = 3)
  0xd0003a5: 1177.1 (SE +/- 13.09, N = 15)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
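Each result carries a standard error ("SE +/- x") and a run count N; this libxsmm result took N = 15 runs on 0xd0003a5 versus N = 3 on 0xd000390. As a reminder of what the SE figure means, here is a sketch of the standard error of the mean, assuming the usual sample standard deviation (n - 1 denominator) divided by sqrt(N); the Phoronix Test Suite's exact formula is not shown on this page, and the per-run values in the demo are hypothetical.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    return statistics.stdev(samples) / math.sqrt(len(samples))


if __name__ == "__main__":
    runs = [1090.2, 1105.9, 1100.3]  # hypothetical per-run GFLOPS/s values
    print(f"{statistics.mean(runs):.1f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
```

Because the SE shrinks with sqrt(N), a noisier test can be brought to a comparable SE by running it more times, which is presumably why some results here show N = 15.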

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic, or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution of SPECFEM3D and uses a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.0 - Model: Layered Halfspace (Seconds, Fewer Is Better)
  0xd000390: 29.50 (SE +/- 0.05, N = 3)
  0xd0003a5: 29.37 (SE +/- 0.18, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Blender

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
  0xd000390: 30.74 (SE +/- 0.06, N = 3)
  0xd0003a5: 30.90 (SE +/- 0.02, N = 3)

libxsmm

libxsmm 2-1.17-3645 - M N K: 32 (GFLOPS/s, More Is Better)
  0xd000390: 604.7 (SE +/- 8.71, N = 15)
  0xd0003a5: 609.2 (SE +/- 4.92, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
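For the libxsmm results, "M N K: 32" and "M N K: 64" presumably denote square matrix multiplications with M = N = K. Under the common convention that multiplying an M x K matrix by a K x N matrix costs 2*M*N*K floating-point operations (one multiply and one add per term), GFLOPS/s is that count divided by the elapsed time. The pure-Python triple loop below only illustrates the arithmetic; libxsmm itself JIT-generates vectorized kernels, and we assume, without having checked, that its benchmark uses the same 2*M*N*K count.

```python
import time


def gemm_flops(m: int, n: int, k: int) -> int:
    """m*n*k multiply-adds, conventionally counted as 2*m*n*k FLOPs."""
    return 2 * m * n * k


def naive_gemm(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Textbook C = A @ B in i-p-j loop order (row-wise access to B and C)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        row_c = c[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    size = 32  # M = N = K = 32, matching one of the libxsmm tests above
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    start = time.perf_counter()
    naive_gemm(a, b)
    elapsed = time.perf_counter() - start
    print(f"{gemm_flops(size, size, size) / elapsed / 1e9:.4f} GFLOPS/s (pure Python)")
```

The gap between this figure and the roughly 600 GFLOPS/s measured above is the point of libxsmm's specialized small-matrix kernels.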

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi with a wide range of CPU performance optimizations; here it measures the potential cryptocurrency mining performance of the CPU across a wide variety of cryptocurrencies. The benchmark reports the CPU mining hash rate for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits (kH/s, More Is Better)
  0xd000390: 421660 (SE +/- 313.42, N = 3)
  0xd0003a5: 423130 (SE +/- 860.95, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: scrypt (kH/s, More Is Better)
  0xd000390: 2319.31 (SE +/- 1.08, N = 3)
  0xd0003a5: 2321.74 (SE +/- 8.35, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Skeincoin (kH/s, More Is Better)
  0xd000390: 613333 (SE +/- 1652.89, N = 3)
  0xd0003a5: 617130 (SE +/- 3788.20, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Blake-2 S (kH/s, More Is Better)
  0xd000390: 4462327 (SE +/- 8325.80, N = 3)
  0xd0003a5: 4466653 (SE +/- 8676.85, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Magi (kH/s, More Is Better)
  0xd000390: 2309.47 (SE +/- 3.91, N = 3)
  0xd0003a5: 2308.66 (SE +/- 1.08, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Deepcoin (kH/s, More Is Better)
  0xd000390: 64677 (SE +/- 89.69, N = 3)
  0xd0003a5: 64897 (SE +/- 187.02, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Triple SHA-256, Onecoin (kH/s, More Is Better)
  0xd000390: 1332237 (SE +/- 7105.40, N = 3)
  0xd0003a5: 1333117 (SE +/- 7235.75, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: x25x (kH/s, More Is Better)
  0xd000390: 2659.17 (SE +/- 4.75, N = 3)
  0xd0003a5: 2659.55 (SE +/- 5.54, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite (kH/s, More Is Better)
  0xd000390: 921730 (SE +/- 3352.41, N = 3)
  0xd0003a5: 926277 (SE +/- 1690.96, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
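The kH/s figures are thousands of hash evaluations per second. The sketch below times a hash function in a loop to show what that unit measures, using Python's hashlib. "Triple SHA-256" is assumed here to mean SHA-256 applied three times in sequence, and the 80-byte header and nonce offset are hypothetical choices; interpreted, single-threaded Python will be orders of magnitude slower than cpuminer-opt's optimized multi-threaded code.

```python
import hashlib
import time


def triple_sha256(data: bytes) -> bytes:
    """SHA-256 applied three times in sequence (assumed meaning of "Triple SHA-256")."""
    digest = data
    for _ in range(3):
        digest = hashlib.sha256(digest).digest()
    return digest


def hash_rate_khs(hash_fn, seconds: float = 0.5) -> float:
    """Hash a hypothetical 80-byte header with an incrementing nonce; return kH/s."""
    header = bytearray(80)
    count = 0
    start = time.perf_counter()
    while (elapsed := time.perf_counter() - start) < seconds:
        header[76:80] = (count & 0xFFFFFFFF).to_bytes(4, "little")  # nonce field
        hash_fn(bytes(header))
        count += 1
    return count / elapsed / 1000.0


if __name__ == "__main__":
    print(f"triple SHA-256: {hash_rate_khs(triple_sha256):.1f} kH/s (one Python thread)")
```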

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package on the water_GMX50 data. The test profile allows selecting between CPU- and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2023 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
  0xd000390: 9.234 (SE +/- 0.021, N = 3)
  0xd0003a5: 9.094 (SE +/- 0.026, N = 3)
(CXX) g++ options: -O3
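GROMACS reports throughput as nanoseconds of simulated time per day of wall-clock time, so the figure depends on both the step rate and the integration timestep. The sketch below converts between the two; the 2 fs timestep used in the example is an assumption, since the water_GMX50_bare input's timestep is not given on this page.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(steps_per_second: float, timestep_fs: float) -> float:
    """Simulated nanoseconds per wall-clock day for a given MD step rate."""
    return steps_per_second * timestep_fs * SECONDS_PER_DAY / FS_PER_NS


def steps_per_second(ns_day: float, timestep_fs: float) -> float:
    """Inverse conversion: the step rate needed to reach a given ns/day."""
    return ns_day * FS_PER_NS / (timestep_fs * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 9.234 ns/day is the 0xd000390 result above; the 2 fs timestep is assumed.
    print(f"{steps_per_second(9.234, 2.0):.2f} steps/s")
```

Under that assumption, the 9.234 ns/day measured on 0xd000390 corresponds to about 53.4 MD steps per second.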

Blender

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
  0xd000390: 23.83 (SE +/- 0.06, N = 3)
  0xd0003a5: 23.72 (SE +/- 0.03, N = 3)

oneDNN

oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  0xd000390: 3.90360 (SE +/- 0.00202, N = 3; MIN: 3.68)
  0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; MIN: 3.69)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

SPECFEM3D

SPECFEM3D 4.0 - Model: Homogeneous Halfspace (Seconds, Fewer Is Better)
  0xd000390: 18.02 (SE +/- 0.15, N = 3)
  0xd0003a5: 17.75 (SE +/- 0.16, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

dav1d 1.2.1 - Video Input: Chimera 1080p (FPS, More Is Better)
  0xd000390: 515.81 (SE +/- 0.80, N = 3)
  0xd0003a5: 514.58 (SE +/- 0.51, N = 3)
(CC) gcc options: -pthread -lm

SPECFEM3D

SPECFEM3D 4.0 - Model: Tomographic Model (Seconds, Fewer Is Better)
  0xd000390: 14.57 (SE +/- 0.02, N = 3)
  0xd0003a5: 14.15 (SE +/- 0.15, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D 4.0 - Model: Mount St. Helens (Seconds, Fewer Is Better)
  0xd000390: 13.15 (SE +/- 0.18, N = 3)
  0xd0003a5: 12.95 (SE +/- 0.12, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better)
  0xd000390: 1.46 (SE +/- 0.00, N = 3)
  0xd0003a5: 1.46 (SE +/- 0.00, N = 3)

dav1d

dav1d 1.2.1 - Video Input: Summer Nature 4K (FPS, More Is Better)
  0xd000390: 281.36 (SE +/- 1.06, N = 3)
  0xd0003a5: 280.84 (SE +/- 0.81, N = 3)
(CC) gcc options: -pthread -lm

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of its Open Visual Cloud / Scalable Video Technology (SVT) effort; development has since moved to the Alliance for Open Media as part of upstream AV1 development. The test uses SVT-AV1's CPU-based, multi-threaded AV1 encoder to encode a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-AV1 1.6 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 67.17 (SE +/- 0.47, N = 3)
  0xd0003a5: 66.46 (SE +/- 0.46, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Remhos 1.0 - Test: Sample Remap Example (Seconds, Fewer Is Better)
  0xd000390: 12.25 (SE +/- 0.03, N = 3)
  0xd0003a5: 12.40 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
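The two properties Remhos is built around, monotonicity (no new over- or undershoots) and conservation (total mass preserved), can be seen in the simplest possible advection scheme: first-order upwind on a periodic 1D grid. This finite-volume sketch is for illustration only; Remhos itself uses high-order finite element discretizations, and none of the names below come from it.

```python
def upwind_step(u: list[float], courant: float) -> list[float]:
    """One first-order upwind step for u_t + a u_x = 0 (a > 0), periodic grid.

    u_i <- u_i - c (u_i - u_{i-1}) with Courant number c = a dt / dx.
    For 0 <= c <= 1 each new value is a convex combination of two old values
    (monotone: no new extrema), and the flux differences telescope around the
    periodic domain, so sum(u), the discrete mass, is unchanged.
    """
    if not 0.0 <= courant <= 1.0:
        raise ValueError("Courant number must lie in [0, 1]")
    # u[i - 1] wraps to u[-1] when i == 0, which gives the periodic boundary.
    return [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]


if __name__ == "__main__":
    u0 = [0.0] * 10 + [1.0] * 10 + [0.0] * 20  # square pulse
    u = u0
    for _ in range(50):
        u = upwind_step(u, 0.5)
    print(f"mass {sum(u0):.6f} -> {sum(u):.6f}, range [{min(u):.3f}, {max(u):.3f}]")
```

The price of this simplicity is heavy numerical diffusion: the square pulse smears out, which is why production remap codes such as Remhos pair high-order discretizations with limiters to keep both properties without that loss of accuracy.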

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently uses CloverLeaf's OpenMP version, benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.

CloverLeaf - Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
  0xd000390: 12.04 (SE +/- 0.06, N = 3)
  0xd0003a5: 11.98 (SE +/- 0.09, N = 3)
(F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran/MPI-based, high-performance finite-difference code for solving the incompressible Navier-Stokes equations together with any number of scalar transport equations. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)
  0xd000390: 11.02 (SE +/- 0.02, N = 3)
  0xd0003a5: 11.00 (SE +/- 0.02, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently uses the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better)
  0xd000390: 94.14 (SE +/- 0.14, N = 3)
  0xd0003a5: 94.90 (SE +/- 0.44, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better)
  0xd000390: 2353.39 (SE +/- 3.60, N = 3)
  0xd0003a5: 2372.42 (SE +/- 10.90, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL), supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree can also make use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 4.1 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  0xd000390: 88.19 (SE +/- 0.07, N = 3; MIN: 85.24 / MAX: 92.72)
  0xd0003a5: 83.46 (SE +/- 0.32, N = 3; MIN: 80.27 / MAX: 87.69)

Intel Open Image Denoise

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
  0xd000390: 3.03 (SE +/- 0.00, N = 3)
  0xd0003a5: 3.03 (SE +/- 0.00, N = 3)

Embree

Embree 4.1 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  0xd000390: 104.68 (SE +/- 0.43, N = 3; MIN: 101.9 / MAX: 109.48)
  0xd0003a5: 101.10 (SE +/- 0.19, N = 3; MIN: 98.59 / MAX: 105.53)

SVT-HEVC

This is a test of SVT-HEVC, the CPU-based, multi-threaded HEVC / H.265 video encoder from Intel's Open Visual Cloud Scalable Video Technology project, encoding a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 138.75 (SE +/- 0.60, N = 3)
  0xd0003a5: 138.49 (SE +/- 0.44, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

oneDNN

oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  0xd000390: 2.06936 (SE +/- 0.00038, N = 3; MIN: 2.03)
  0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; MIN: 2.03)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

SVT-HEVC

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 184.38 (SE +/- 2.03, N = 3)
  0xd0003a5: 182.74 (SE +/- 0.38, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-AV1

SVT-AV1 1.6 - Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 180.97 (SE +/- 1.29, N = 3)
  0xd0003a5: 177.72 (SE +/- 1.20, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1 1.6 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 175.10 (SE +/- 0.79, N = 3)
  0xd0003a5: 177.20 (SE +/- 2.02, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently targets CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
  0xd000390: 46.35 (SE +/- 0.25, N = 3)
  0xd0003a5: 46.98 (SE +/- 0.61, N = 3)
(CXX) g++ options: -O3

oneDNN

oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  0xd000390: 3.62526 (SE +/- 0.00896, N = 3; MIN: 3.54)
  0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; MIN: 3.53)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
  0xd000390: 93.75 (SE +/- 1.09, N = 3)
  0xd0003a5: 94.86 (SE +/- 1.13, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
  0xd000390: 101.98 (SE +/- 0.27, N = 3)
  0xd0003a5: 102.92 (SE +/- 0.20, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
  0xd000390: 224.42 (SE +/- 2.25, N = 3)
  0xd0003a5: 226.78 (SE +/- 2.85, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 154.73 (SE +/- 1.53, N = 5)
  0xd0003a5: 159.10 (SE +/- 1.01, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 144.94 (SE +/- 1.81, N = 4)
  0xd0003a5: 153.79 (SE +/- 1.71, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 93.09 (SE +/- 0.15, N = 3)
  0xd0003a5: 93.92 (SE +/- 0.28, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 195.20 (SE +/- 0.92, N = 3)
  0xd0003a5: 198.87 (SE +/- 1.08, N = 3)
(CXX) g++ options: -O3
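FFT benchmarks commonly report GFLOP/s from a nominal operation count rather than from counted instructions; a widespread convention for a complex-to-complex transform of N points is 5 N log2 N. The sketch below assumes that convention to show what the c2c numbers imply. Whether HeFFTe uses exactly this formula, and how it counts r2c transforms, is not stated on this page.

```python
import math


def c2c_fft_flops(nx: int, ny: int, nz: int) -> float:
    """Nominal 5 N log2 N operation count for one complex-to-complex 3D FFT."""
    n = nx * ny * nz
    return 5.0 * n * math.log2(n)


def implied_seconds(nx: int, ny: int, nz: int, gflop_per_s: float) -> float:
    """Time per transform implied by a GFLOP/s figure under the same convention."""
    return c2c_fft_flops(nx, ny, nz) / (gflop_per_s * 1e9)


if __name__ == "__main__":
    # 46.35 GFLOP/s: c2c, double precision, 256^3 on 0xd000390 (from the results above).
    print(f"{implied_seconds(256, 256, 256, 46.35) * 1e3:.1f} ms per transform")
```

Under this convention, one 256^3 transform is about 2.01e9 nominal operations, so the 46.35 GFLOP/s result corresponds to roughly 43 ms per transform.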

173 Results Shown

TensorFlow:
  CPU - 512 - ResNet-50
  CPU - 256 - ResNet-50
libxsmm
ONNX Runtime:
  fcn-resnet101-11 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVKL
oneDNN
ONNX Runtime:
  yolov4 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  GPT-2 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  ArcFace ResNet-100 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  CaffeNet 12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  super-resolution-10 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
QMCPACK
ONNX Runtime:
  bertsquad-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
libxsmm
TensorFlow
OSPRay
Timed MrBayes Analysis
ONNX Runtime:
  ResNet50 v1-12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
QMCPACK
Palabos
Cpuminer-Opt
OSPRay
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Palabos
QMCPACK
Palabos
NCNN:
  CPU - FastestDet
  CPU - vision_transformer
  CPU - regnety_400m
  CPU - squeezenet_ssd
  CPU - yolov4-tiny
  CPU - resnet50
  CPU - alexnet
  CPU - resnet18
  CPU - vgg16
  CPU - googlenet
  CPU - blazeface
  CPU - efficientnet-b0
  CPU - mnasnet
  CPU - shufflenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU-v2-v2 - mobilenet-v2
  CPU - mobilenet
VVenC
OSPRay
TensorFlow
Cpuminer-Opt
Laghos
oneDNN
simdjson:
  PartialTweets
  DistinctUserID
  TopTweet
TensorFlow
Neural Magic DeepSparse:
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
OpenVINO:
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Detection FP32 - CPU:
    ms
    FPS
  Face Detection FP16 - CPU:
    ms
    FPS
  Face Detection FP16-INT8 - CPU:
    ms
    FPS
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
VVenC
OpenVINO:
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
    FPS
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
OpenVINO:
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
  Vehicle Detection FP16 - CPU:
    ms
    FPS
SVT-HEVC
simdjson
Neural Magic DeepSparse:
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
OSPRay:
  gravity_spheres_volume/dim_512/scivis/real_time
  gravity_spheres_volume/dim_512/ao/real_time
Neural Magic DeepSparse:
  NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
simdjson
VP9 libvpx Encoding
miniBUDE:
  OpenMP - BM2:
    Billion Interactions/s
    GFInst/s
SPECFEM3D
oneDNN
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
    ms/batch
    items/sec
Laghos
Neural Magic DeepSparse:
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
    ms/batch
    items/sec
QMCPACK
TensorFlow
Neural Magic DeepSparse:
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  ResNet-50, Baseline - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
libxsmm
SPECFEM3D
Blender
libxsmm
Cpuminer-Opt:
  LBC, LBRY Credits
  scrypt
  Skeincoin
  Blake-2 S
  Magi
  Deepcoin
  Triple SHA-256, Onecoin
  x25x
  Quad SHA-256, Pyrite
GROMACS
Blender
oneDNN
SPECFEM3D
dav1d
SPECFEM3D:
  Tomographic Model
  Mount St. Helens
Intel Open Image Denoise
dav1d
SVT-AV1
Remhos
CloverLeaf
Xcompact3d Incompact3d
miniBUDE:
  OpenMP - BM1:
    Billion Interactions/s
    GFInst/s
Embree
Intel Open Image Denoise
Embree
SVT-HEVC
oneDNN
SVT-HEVC
SVT-AV1:
  Preset 12 - Bosphorus 4K
  Preset 13 - Bosphorus 4K
HeFFTe - Highly Efficient FFT for Exascale
oneDNN
HeFFTe - Highly Efficient FFT for Exascale:
  r2c - FFTW - double - 256
  c2c - FFTW - float - 256
  r2c - FFTW - float - 256
  c2c - FFTW - float - 128
  r2c - FFTW - double - 128
  c2c - FFTW - double - 128
  r2c - FFTW - float - 128