Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2308099-NE-XEONPLATI49&sro&grs.

System configuration (two configurations, labeled by CPU microcode revision: 0xd000390 and 0xd0003a5):

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernel: 6.5.0-060500rc4daily20230804-generic (x86_64) for 0xd000390; 6.5.0-rc5-phx-tues (x86_64) for 0xd0003a5
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5

Python Details
- Python 3.10.7

Security Details
- 0xd000390: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- 0xd0003a5: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
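The two configurations are distinguished by their CPU microcode revision. The sketch below shows one way to check which revision a Linux x86 host is running. It parses the `microcode` field that `/proc/cpuinfo` reports for each logical CPU. The helper name `microcode_revisions` is hypothetical, and the demo runs against an inline sample rather than depending on the live file.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Return the distinct microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Lines look like "microcode\t: 0xd0003a5"; skip everything else.
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


# Abbreviated sample in /proc/cpuinfo format (two logical CPUs).
SAMPLE = """processor\t: 0
microcode\t: 0xd0003a5
processor\t: 1
microcode\t: 0xd0003a5
"""

if __name__ == "__main__":
    print(microcode_revisions(SAMPLE))
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        print(microcode_revisions(cpuinfo.read_text()))
```

A host with mixed revisions across sockets would return more than one entry, so the set doubles as a quick consistency check.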

Result overview: combined results table for all tests and both configurations (0xd000390 and 0xd0003a5); the individual results follow.

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd000390: 24.95 (SE +/- 0.01, N = 3)
0xd0003a5: 16.38 (SE +/- 0.01, N = 3)

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd000390: 24.75 (SE +/- 0.11, N = 3)
0xd0003a5: 16.46 (SE +/- 0.01, N = 3)
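Each result reports a standard error (SE) over N runs. The sketch below assumes the conventional definition: sample standard deviation divided by the square root of N. This export does not state the exact formula the Phoronix Test Suite applies. The per-run values are invented for illustration and are not taken from this result file.

```python
import math
import statistics


def standard_error(runs: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.stdev(runs) / math.sqrt(len(runs))


# Hypothetical per-run scores, formatted like the results in this file.
runs = [24.60, 24.75, 24.90]
print(f"{statistics.mean(runs):.2f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
```

With only three runs, the SE is a rough indicator. Differences between configurations that are several SEs apart are the ones worth reading into.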

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 1051.79 (SE +/- 1.88, N = 3)
0xd0003a5: 869.04 (SE +/- 1.83, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 37.99 (SE +/- 0.07, N = 3)
0xd0003a5: 45.96 (SE +/- 0.10, N = 3)
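Some results are "More Is Better" (items/sec) and others are "Fewer Is Better" (ms/batch). Comparing the two microcode revisions on equal footing therefore means inverting the ratio for the latter. The sketch below uses the BERT-Large Sparse INT8 figures above and treats 0xd000390 as the baseline. The function names are illustrative only.

```python
def relative_performance(baseline: float, new: float, higher_is_better: bool) -> float:
    """Performance of `new` relative to `baseline`; a value above 1.0 means `new` is better."""
    if baseline <= 0 or new <= 0:
        raise ValueError("results must be positive")
    return new / baseline if higher_is_better else baseline / new


def percent_change(ratio: float) -> float:
    """Express a performance ratio as a signed percentage."""
    return (ratio - 1.0) * 100.0


# Baseline: 0xd000390; new run: 0xd0003a5 (BERT-Large, Sparse INT8).
throughput = relative_performance(1051.79, 869.04, higher_is_better=True)
latency = relative_performance(37.99, 45.96, higher_is_better=False)
print(f"items/sec: {percent_change(throughput):+.1f}%")
print(f"ms/batch:  {percent_change(latency):+.1f}%")
```

Both metrics come out as a regression of roughly 17% for 0xd0003a5, which is what a consistent sign convention should show for the same workload.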

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd000390: 147.51 (SE +/- 0.12, N = 3)
0xd0003a5: 178.19 (SE +/- 0.31, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

simdjson

Throughput Test: LargeRandom

simdjson 2.0 (GB/s, More Is Better)
0xd000390: 0.85 (SE +/- 0.00, N = 3)
0xd0003a5: 0.96 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd000390: 21.08 (SE +/- 0.05, N = 3)
0xd0003a5: 18.89 (SE +/- 0.02, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd000390: 20.58 (SE +/- 0.08, N = 3)
0xd0003a5: 18.46 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 17.35 (SE +/- 0.01, N = 3)
0xd0003a5: 19.31 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 2301.16 (SE +/- 1.14, N = 3)
0xd0003a5: 2068.30 (SE +/- 3.15, N = 3)

TensorFlow

Device: CPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd000390: 760.15 (SE +/- 3.77, N = 3)
0xd0003a5: 839.41 (SE +/- 2.61, N = 3)

simdjson

Throughput Test: Kostya

simdjson 2.0 (GB/s, More Is Better)
0xd000390: 2.61 (SE +/- 0.00, N = 3)
0xd0003a5: 2.87 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 (Items Per Second, More Is Better)
0xd000390: 150.28 (SE +/- 0.54, N = 3)
0xd0003a5: 136.85 (SE +/- 0.45, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd000390: 723.27 (SE +/- 2.99, N = 3)
0xd0003a5: 781.25 (SE +/- 4.27, N = 3)

NCNN

Target: CPU - Model: googlenet

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 15.36 (SE +/- 0.17, N = 3; MIN: 14.6 / MAX: 182.22)
0xd0003a5: 16.58 (SE +/- 0.48, N = 3; MIN: 15.29 / MAX: 39.63)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 23.86 (SE +/- 0.25, N = 3; MIN: 23.05 / MAX: 47.41)
0xd0003a5: 25.71 (SE +/- 0.19, N = 3; MIN: 24.88 / MAX: 62.96)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 5.2210 (SE +/- 0.0056, N = 3)
0xd0003a5: 5.6210 (SE +/- 0.0018, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 7633.26 (SE +/- 8.14, N = 3)
0xd0003a5: 7092.50 (SE +/- 2.44, N = 3)

libxsmm

M N K: 64

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
0xd000390: 1098.8 (SE +/- 8.60, N = 3)
0xd0003a5: 1177.1 (SE +/- 13.09, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
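libxsmm reports small-matrix GEMM throughput. Reading "M N K: 64" as square 64 x 64 x 64 multiplications is an assumption about the label. Under that reading, each C += A·B costs 2·M·N·K floating-point operations: one multiply and one add per inner-product term. The sketch converts a GFLOP/s figure into multiplications per second. The function names are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for one C(m x n) += A(m x k) @ B(k x n): a multiply and an add per term."""
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, count: int, seconds: float) -> float:
    """Throughput in GFLOP/s for `count` multiplications finished in `seconds`."""
    return gemm_flops(m, n, k) * count / seconds / 1e9


per_gemm = gemm_flops(64, 64, 64)
# Multiplications per second implied by the 0xd000390 result of 1098.8 GFLOP/s.
print(per_gemm, f"{1098.8e9 / per_gemm:.3e} GEMMs/s")
```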

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 93.54 (SE +/- 0.07, N = 3)
0xd0003a5: 100.19 (SE +/- 1.28, N = 4)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 426.95 (SE +/- 0.21, N = 3)
0xd0003a5: 398.72 (SE +/- 4.97, N = 4)

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1 (Items / Sec, More Is Better)
0xd000390: 912 (SE +/- 1.53, N = 3; MIN: 140 / MAX: 7236)
0xd0003a5: 856 (SE +/- 0.88, N = 3; MIN: 137 / MAX: 7211)

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd000390: 144.94 (SE +/- 1.81, N = 4)
0xd0003a5: 153.79 (SE +/- 1.71, N = 3)
1. (CXX) g++ options: -O3

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 4.1 (Frames Per Second, More Is Better)
0xd000390: 88.19 (SE +/- 0.07, N = 3; MIN: 85.24 / MAX: 92.72)
0xd0003a5: 83.46 (SE +/- 0.32, N = 3; MIN: 80.27 / MAX: 87.69)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 307.87 (SE +/- 0.57, N = 3)
0xd0003a5: 293.12 (SE +/- 1.05, N = 3)

NCNN

Target: CPU - Model: resnet18

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 8.97 (SE +/- 0.14, N = 3; MIN: 8.63 / MAX: 27.27)
0xd0003a5: 9.42 (SE +/- 0.09, N = 2; MIN: 9.21 / MAX: 32.8)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 15.34 (SE +/- 0.12, N = 3; MIN: 14.63 / MAX: 39.65)
0xd0003a5: 16.10 (SE +/- 0.25, N = 3; MIN: 15.43 / MAX: 48.02)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 129.81 (SE +/- 0.25, N = 3)
0xd0003a5: 136.18 (SE +/- 0.60, N = 3)

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd000390: 11.66 (SE +/- 0.11, N = 7)
0xd0003a5: 11.15 (SE +/- 0.13, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

NCNN

Target: CPU - Model: blazeface

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 4.35 (SE +/- 0.06, N = 3; MIN: 4.16 / MAX: 4.97)
0xd0003a5: 4.54 (SE +/- 0.07, N = 3; MIN: 4.37 / MAX: 5.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 98.34 (SE +/- 0.12, N = 3)
0xd0003a5: 94.26 (SE +/- 0.51, N = 3)

QMCPACK

Input: simple-H2O

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd000390: 39.56 (SE +/- 0.12, N = 3)
0xd0003a5: 41.25 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 62.01 (SE +/- 0.23, N = 3)
0xd0003a5: 64.44 (SE +/- 0.78, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd000390: 309.63 (SE +/- 1.89, N = 3)
0xd0003a5: 321.72 (SE +/- 2.10, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 644.44 (SE +/- 2.45, N = 3)
0xd0003a5: 620.25 (SE +/- 7.47, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 405.97 (SE +/- 0.42, N = 3)
0xd0003a5: 421.75 (SE +/- 2.15, N = 3)

NCNN

Target: CPU - Model: vision_transformer

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 46.92 (SE +/- 1.17, N = 3; MIN: 43.11 / MAX: 881.49)
0xd0003a5: 45.23 (SE +/- 0.31, N = 3; MIN: 43.43 / MAX: 73.39)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

TensorFlow

Device: CPU - Batch Size: 256 - Model: ResNet-50

TensorFlow 2.12 (images/sec, More Is Better)
0xd000390: 83.89 (SE +/- 0.37, N = 3)
0xd0003a5: 86.93 (SE +/- 0.80, N = 9)

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon

Embree 4.1 (Frames Per Second, More Is Better)
0xd000390: 104.68 (SE +/- 0.43, N = 3; MIN: 101.9 / MAX: 109.48)
0xd0003a5: 101.10 (SE +/- 0.19, N = 3; MIN: 98.59 / MAX: 105.53)

simdjson

Throughput Test: DistinctUserID

simdjson 2.0 (GB/s, More Is Better)
0xd000390: 5.52 (SE +/- 0.02, N = 3)
0xd0003a5: 5.71 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: PartialTweets

simdjson 2.0 (GB/s, More Is Better)
0xd000390: 4.62 (SE +/- 0.01, N = 3)
0xd0003a5: 4.77 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

SPECFEM3D

Model: Tomographic Model

SPECFEM3D 4.0 (Seconds, Fewer Is Better)
0xd000390: 14.57 (SE +/- 0.02, N = 3)
0xd0003a5: 14.15 (SE +/- 0.15, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (GFInst/s, More Is Better)
0xd000390: 2526.89 (SE +/- 9.78, N = 3)
0xd0003a5: 2601.21 (SE +/- 9.53, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
0xd000390: 101.08 (SE +/- 0.39, N = 3)
0xd0003a5: 104.05 (SE +/- 0.38, N = 3)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd000390: 154.73 (SE +/- 1.53, N = 5)
0xd0003a5: 159.10 (SE +/- 1.01, N = 3)
1. (CXX) g++ options: -O3
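HeFFTe results are given in GFLOP/s. A widely used convention, for example FFTW's benchFFT, credits a complex transform of N points with 5 N log2 N floating-point operations. This export does not confirm that HeFFTe's benchmark counts flops the same way, so the sketch below is only an estimate under that assumption. It converts the 128^3 c2c result into transforms per second. The function name is hypothetical.

```python
import math


def c2c_fft_flops(nx: int, ny: int, nz: int) -> float:
    """Nominal 5 * N * log2(N) FLOP count for a complex 3-D FFT of N = nx * ny * nz points."""
    n = nx * ny * nz
    return 5.0 * n * math.log2(n)


flops = c2c_fft_flops(128, 128, 128)  # N = 2**21 points
# Transforms per second implied by 154.73 GFLOP/s (0xd000390, float, 128^3).
print(f"{flops / 1e9:.3f} GFLOP per transform, {154.73e9 / flops:.0f} transforms/s")
```

Under the same convention, a real-to-complex (r2c) transform is usually credited with half the work, 2.5 N log2 N.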

simdjson

Throughput Test: TopTweet

simdjson 2.0 (GB/s, More Is Better)
0xd000390: 5.60 (SE +/- 0.03, N = 3)
0xd0003a5: 5.75 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 94.61 (SE +/- 0.07, N = 3)
0xd0003a5: 96.94 (SE +/- 0.29, N = 3)

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd000390: 16.71 (SE +/- 0.08, N = 3)
0xd0003a5: 16.31 (SE +/- 0.26, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 422.12 (SE +/- 0.36, N = 3)
0xd0003a5: 412.05 (SE +/- 1.24, N = 3)

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.13 (Frames Per Second, More Is Better)
0xd000390: 12.63 (SE +/- 0.12, N = 3)
0xd0003a5: 12.33 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

NCNN

Target: CPU - Model: mnasnet

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 7.57 (SE +/- 0.10, N = 3; MIN: 7.28 / MAX: 30.26)
0xd0003a5: 7.41 (SE +/- 0.04, N = 3; MIN: 7.15 / MAX: 31.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)
0xd000390: 221.02 (SE +/- 1.39, N = 3)
0xd0003a5: 216.48 (SE +/- 1.91, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

TensorFlow

Device: CPU - Batch Size: 512 - Model: GoogLeNet

TensorFlow 2.12 (images/sec, More Is Better)
0xd000390: 317.27 (SE +/- 0.87, N = 3)
0xd0003a5: 323.79 (SE +/- 0.43, N = 3)

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)
0xd000390: 268.56 (SE +/- 3.60, N = 3)
0xd0003a5: 263.23 (SE +/- 2.19, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
0xd000390: 1941.1 (SE +/- 32.53, N = 7)
0xd0003a5: 1978.9 (SE +/- 20.92, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3 (GFLOP/s, More Is Better)
0xd000390: 195.20 (SE +/- 0.92, N = 3)
0xd0003a5: 198.87 (SE +/- 1.08, N = 3)
1. (CXX) g++ options: -O3

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (items/sec, More Is Better)
0xd000390: 1005.58 (SE +/- 1.41, N = 3)
0xd0003a5: 987.55 (SE +/- 2.17, N = 3)

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 4K

SVT-AV1 1.6 (Frames Per Second, More Is Better)
0xd000390: 180.97 (SE +/- 1.29, N = 3)
0xd0003a5: 177.72 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 (ms/batch, Fewer Is Better)
0xd000390: 39.73 (SE +/- 0.06, N = 3)
0xd0003a5: 40.45 (SE +/- 0.10, N = 3)

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.3 (ms, Fewer Is Better)
0xd000390: 79.47 (SE +/- 0.22, N = 3; MIN: 62.57 / MAX: 232.01)
0xd0003a5: 78.18 (SE +/- 0.19, N = 3; MIN: 66.08 / MAX: 194.38)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20230517 (ms, Fewer Is Better)
0xd000390: 23.71 (SE +/- 0.22, N = 3; MIN: 22.57 / MAX: 46.11)
0xd0003a5: 24.10 (SE +/- 0.17, N = 3; MIN: 23.25 / MAX: 51.47)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 251.00 (SE +/- 0.68, N = 3)
0xd0003a5: 255.09 (SE +/- 0.63, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
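The FPS figure here and the latency (ms) figure for the same OpenVINO model can be cross-checked with Little's law: the average number of items in flight equals throughput times average latency. This sketch assumes steady state and that the reported ms value is the mean per-request latency. For both revisions the product comes out near 20, which suggests roughly 20 requests were kept in flight. That is an inference from these numbers; the result page does not state the benchmark's stream or request configuration.

```python
def implied_in_flight(throughput_per_s, latency_ms):
    """Little's law: average items in flight = throughput x mean latency."""
    return throughput_per_s * latency_ms / 1000.0


# OpenVINO Machine Translation EN To DE FP16: FPS from this chart,
# ms from the latency chart for the same model
for rev, fps, ms in [("0xd000390", 251.00, 79.47), ("0xd0003a5", 255.09, 78.18)]:
    print(rev, round(implied_in_flight(fps, ms), 1))  # both print 19.9
```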

SPECFEM3D

Model: Mount St. Helens

Seconds, Fewer Is Better (SPECFEM3D 4.0)
0xd000390: 13.15 (SE +/- 0.18, N = 3)
0xd0003a5: 12.95 (SE +/- 0.12, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

Ns Per Day, More Is Better (GROMACS 2023)
0xd000390: 9.234 (SE +/- 0.021, N = 3)
0xd0003a5: 9.094 (SE +/- 0.026, N = 3)
(CXX) g++ options: -O3

SPECFEM3D

Model: Homogeneous Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.0)
0xd000390: 18.02 (SE +/- 0.15, N = 3)
0xd0003a5: 17.75 (SE +/- 0.16, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 2039.63 (SE +/- 1.77, N = 3)
0xd0003a5: 2070.72 (SE +/- 3.75, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 9.77 (SE +/- 0.01, N = 3; MIN: 7.83 / MAX: 20.01)
0xd0003a5: 9.63 (SE +/- 0.02, N = 3; MIN: 8.33 / MAX: 19.33)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Palabos

Grid Size: 400

Mega Site Updates Per Second, More Is Better (Palabos 2.3)
0xd000390: 388.48 (SE +/- 0.94, N = 3)
0xd0003a5: 393.84 (SE +/- 0.06, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
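A Mega Site Updates Per Second (MLUPS) figure converts to wall time per lattice sweep once the lattice size is known. The sketch below assumes that "Grid Size: 400" means a cubic lattice of 400 x 400 x 400 sites. The result page does not confirm this, so treat it as an assumption.

```python
def seconds_per_iteration(grid_edge, mlups):
    """Wall time for one update of every site in an assumed N x N x N lattice."""
    sites = grid_edge ** 3          # assumption: cubic lattice with edge grid_edge
    return sites / (mlups * 1e6)    # MLUPS = millions of site updates per second


# Palabos, Grid Size 400, 0xd000390 result
print(f"{seconds_per_iteration(400, 388.48):.3f} s")  # prints "0.165 s"
```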

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 8.88 (SE +/- 0.05, N = 3; MIN: 8.69 / MAX: 32.33)
0xd0003a5: 8.76 (SE +/- 0.12, N = 3; MIN: 8.3 / MAX: 32.6)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
0xd000390: 46.35 (SE +/- 0.25, N = 3)
0xd0003a5: 46.98 (SE +/- 0.61, N = 3)
(CXX) g++ options: -O3
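FFT benchmarks usually report GFLOP/s from a nominal operation count rather than from counting real instructions. The sketch below estimates the time per 256 x 256 x 256 complex transform from the 0xd000390 figure, using the conventional 5 N log2 N flop count that FFTW uses for complex transforms. Whether HeFFTe uses exactly this count, or counts forward and backward transforms together, is an assumption not confirmed by the result page. The `flop_factor` parameter is a hypothetical knob for trying other conventions.

```python
import math


def fft_seconds(edge, gflops, flop_factor=5.0):
    """Time per 3-D complex FFT implied by a GFLOP/s figure.

    Assumes the nominal count flop_factor * N * log2(N) with N = edge**3
    points (5 N log2 N is the FFTW convention for complex transforms).
    """
    n = edge ** 3
    flops = flop_factor * n * math.log2(n)
    return flops / (gflops * 1e9)


# HeFFTe c2c, FFTW backend, double precision, 256^3, 0xd000390 result
print(f"{fft_seconds(256, 46.35) * 1e3:.1f} ms")  # prints "43.4 ms"
```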

NCNN

Target: CPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 9.89 (SE +/- 0.10, N = 3; MIN: 9.61 / MAX: 33.59)
0xd0003a5: 9.76 (SE +/- 0.13, N = 3; MIN: 9.32 / MAX: 33.4)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

images/sec, More Is Better (TensorFlow 2.12)
0xd000390: 85.97 (SE +/- 1.13, N = 3)
0xd0003a5: 84.84 (SE +/- 1.17, N = 3)

NCNN

Target: CPU - Model: alexnet

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 5.39 (SE +/- 0.16, N = 3; MIN: 4.83 / MAX: 151.52)
0xd0003a5: 5.46 (SE +/- 0.15, N = 3; MIN: 5.01 / MAX: 29.14)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 15.46 (SE +/- 0.09, N = 3; MIN: 14.85 / MAX: 106.56)
0xd0003a5: 15.66 (SE +/- 0.07, N = 3; MIN: 15.24 / MAX: 38.75)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Remhos

Test: Sample Remap Example

Seconds, Fewer Is Better (Remhos 1.0)
0xd000390: 12.25 (SE +/- 0.03, N = 3)
0xd0003a5: 12.40 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 1.6)
0xd000390: 175.10 (SE +/- 0.79, N = 3)
0xd0003a5: 177.20 (SE +/- 2.02, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
0xd000390: 93.75 (SE +/- 1.09, N = 3)
0xd0003a5: 94.86 (SE +/- 1.13, N = 3)
(CXX) g++ options: -O3

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 1.6)
0xd000390: 67.17 (SE +/- 0.47, N = 3)
0xd0003a5: 66.46 (SE +/- 0.46, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
0xd000390: 224.42 (SE +/- 2.25, N = 3)
0xd0003a5: 226.78 (SE +/- 2.85, N = 3)
(CXX) g++ options: -O3

Palabos

Grid Size: 500

Mega Site Updates Per Second, More Is Better (Palabos 2.3)
0xd000390: 413.21 (SE +/- 0.42, N = 3)
0xd0003a5: 417.48 (SE +/- 0.72, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.14)
0xd000390: 39.12 (SE +/- 0.46, N = 3)
0xd0003a5: 38.72 (SE +/- 0.38, N = 15)
(CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 84.02 (SE +/- 0.32, N = 3)
0xd0003a5: 83.20 (SE +/- 0.18, N = 3)

libxsmm

M N K: 256

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
0xd000390: 594.6 (SE +/- 2.33, N = 3)
0xd0003a5: 600.2 (SE +/- 2.54, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
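A dense matrix multiply of an M x K matrix by a K x N matrix performs M*N*K multiply-add pairs, which the usual convention counts as 2*M*N*K flops. The sketch below converts the libxsmm rate into matrix multiplications per second. It assumes that "M N K: 256" means M = N = K = 256 and that the benchmark uses the 2*M*N*K count; neither is stated on the result page.

```python
def gemms_per_second(m, n, k, gflops):
    """Matrix multiplications per second implied by a GFLOP/s rate."""
    flops_per_gemm = 2 * m * n * k  # one multiply plus one add per inner-product term
    return gflops * 1e9 / flops_per_gemm


# libxsmm, M N K: 256, 0xd000390 result (assumed square 256 x 256 x 256 GEMM)
print(round(gemms_per_second(256, 256, 256, 594.6)))  # prints 17720
```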

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
0xd000390: 101.98 (SE +/- 0.27, N = 3)
0xd0003a5: 102.92 (SE +/- 0.20, N = 3)
(CXX) g++ options: -O3

SVT-HEVC

Tuning: 10 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-HEVC 1.5.0)
0xd000390: 184.38 (SE +/- 2.03, N = 3)
0xd0003a5: 182.74 (SE +/- 0.38, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

GFLOP/s, More Is Better (HeFFTe - Highly Efficient FFT for Exascale 2.3)
0xd000390: 93.09 (SE +/- 0.15, N = 3)
0xd0003a5: 93.92 (SE +/- 0.28, N = 3)
(CXX) g++ options: -O3

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 72.18 (SE +/- 0.15, N = 3)
0xd0003a5: 71.54 (SE +/- 0.12, N = 3)

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 8.03 (SE +/- 0.09, N = 3; MIN: 7.8 / MAX: 31.34)
0xd0003a5: 7.96 (SE +/- 0.04, N = 3; MIN: 7.75 / MAX: 31.75)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 474.69 (SE +/- 1.00, N = 3)
0xd0003a5: 478.85 (SE +/- 1.39, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 922.93 (SE +/- 11.47, N = 3)
0xd0003a5: 930.75 (SE +/- 1.22, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 43.30 (SE +/- 0.53, N = 3)
0xd0003a5: 42.94 (SE +/- 0.06, N = 3)

miniBUDE

Implementation: OpenMP - Input Deck: BM1

GFInst/s, More Is Better (miniBUDE 20210901)
0xd000390: 2353.39 (SE +/- 3.60, N = 3)
0xd0003a5: 2372.42 (SE +/- 10.90, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM1

Billion Interactions/s, More Is Better (miniBUDE 20210901)
0xd000390: 94.14 (SE +/- 0.14, N = 3)
0xd0003a5: 94.90 (SE +/- 0.44, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
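The two miniBUDE metrics for the same BM1 run track each other closely. Dividing GFInst/s by billion interactions/s gives almost exactly 25 for both microcode revisions. This suggests the instruction-rate figure is derived from the interaction rate by a fixed factor of about 25, but that is an inference from these numbers, not something the result page documents. The check below is a minimal sketch of that arithmetic.

```python
# miniBUDE OpenMP, BM1: (GFInst/s, billion interactions/s) per microcode revision
results = {
    "0xd000390": (2353.39, 94.14),
    "0xd0003a5": (2372.42, 94.90),
}

for rev, (gfinst, ginteractions) in results.items():
    # Ratio of the two reported metrics; both come out at 25.00 to two decimals
    print(rev, f"{gfinst / ginteractions:.2f} instructions per interaction")
```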

QMCPACK

Input: Li2_STO_ae

Total Execution Time - Seconds, Fewer Is Better (QMCPACK 3.16)
0xd000390: 124.23 (SE +/- 1.55, N = 3)
0xd0003a5: 123.26 (SE +/- 1.03, N = 3)
(CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

NCNN

Target: CPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 11.62 (SE +/- 0.38, N = 3; MIN: 10.82 / MAX: 21.03)
0xd0003a5: 11.71 (SE +/- 0.11, N = 3; MIN: 11.15 / MAX: 19.87)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 33.89 (SE +/- 0.05, N = 3; MIN: 29.83 / MAX: 113.13)
0xd0003a5: 33.63 (SE +/- 0.06, N = 3; MIN: 29.54 / MAX: 113.55)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 2344.97 (SE +/- 3.01, N = 3)
0xd0003a5: 2362.84 (SE +/- 3.91, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SPECFEM3D

Model: Water-layered Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.0)
0xd000390: 31.15 (SE +/- 0.24, N = 3)
0xd0003a5: 31.38 (SE +/- 0.31, N = 5)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Cpuminer-Opt

Algorithm: Myriad-Groestl

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 43127 (SE +/- 386.00, N = 15)
0xd0003a5: 43450 (SE +/- 406.32, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

libxsmm

M N K: 32

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
0xd000390: 604.7 (SE +/- 8.71, N = 15)
0xd0003a5: 609.2 (SE +/- 4.92, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 1006.34 (SE +/- 0.56, N = 3)
0xd0003a5: 1013.82 (SE +/- 0.31, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 173.79 (SE +/- 0.34, N = 3)
0xd0003a5: 172.56 (SE +/- 1.64, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 39.70 (SE +/- 0.02, N = 3)
0xd0003a5: 39.41 (SE +/- 0.01, N = 3)

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 2.59271 (SE +/- 0.03265, N = 15; MIN: 1.91)
0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; MIN: 1.99)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Timed MrBayes Analysis

Primate Phylogeny Analysis

Seconds, Fewer Is Better (Timed MrBayes Analysis 3.2.7)
0xd000390: 166.53 (SE +/- 1.36, N = 3)
0xd0003a5: 165.42 (SE +/- 0.98, N = 3)
(CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

Cpuminer-Opt

Algorithm: Skeincoin

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 613333 (SE +/- 1652.89, N = 3)
0xd0003a5: 617130 (SE +/- 3788.20, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 230.05 (SE +/- 0.46, N = 3)
0xd0003a5: 231.43 (SE +/- 2.37, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 551.82 (SE +/- 0.62, N = 3)
0xd0003a5: 555.09 (SE +/- 1.12, N = 3)

OpenVINO

Model: Face Detection FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 24.04 (SE +/- 0.01, N = 3)
0xd0003a5: 24.18 (SE +/- 0.02, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 4419.17 (SE +/- 3.29, N = 3)
0xd0003a5: 4442.98 (SE +/- 2.64, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 827.51 (SE +/- 0.42, N = 3; MIN: 628.21 / MAX: 980.48)
0xd0003a5: 823.09 (SE +/- 0.71, N = 3; MIN: 550.41 / MAX: 926.21)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 3.6)
0xd000390: 30.74 (SE +/- 0.06, N = 3)
0xd0003a5: 30.90 (SE +/- 0.02, N = 3)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 524.38 (SE +/- 7.19, N = 3; MIN: 499.75)
0xd0003a5: 521.74 (SE +/- 2.82, N = 3; MIN: 505.72)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

CloverLeaf

Lagrangian-Eulerian Hydrodynamics

Seconds, Fewer Is Better (CloverLeaf)
0xd000390: 12.04 (SE +/- 0.06, N = 3)
0xd0003a5: 11.98 (SE +/- 0.09, N = 3)
(F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Cpuminer-Opt

Algorithm: Quad SHA-256, Pyrite

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 921730 (SE +/- 3352.41, N = 3)
0xd0003a5: 926277 (SE +/- 1690.96, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

Frames Per Second, More Is Better (VVenC 1.9)
0xd000390: 10.36 (SE +/- 0.09, N = 3)
0xd0003a5: 10.42 (SE +/- 0.06, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Blender

Blend File: BMW27 - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 3.6)
0xd000390: 23.83 (SE +/- 0.06, N = 3)
0xd0003a5: 23.72 (SE +/- 0.03, N = 3)

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 4.51 (SE +/- 0.00, N = 3; MIN: 4.02 / MAX: 13.95)
0xd0003a5: 4.49 (SE +/- 0.00, N = 3; MIN: 4.04 / MAX: 15.85)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SPECFEM3D

Model: Layered Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.0)
0xd000390: 29.50 (SE +/- 0.05, N = 3)
0xd0003a5: 29.37 (SE +/- 0.18, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 1121.56 (SE +/- 0.88, N = 3)
0xd0003a5: 1117.09 (SE +/- 1.14, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 17.80 (SE +/- 0.01, N = 3; MIN: 12.83 / MAX: 32.64)
0xd0003a5: 17.87 (SE +/- 0.02, N = 3; MIN: 12.24 / MAX: 38.45)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

ms/batch, Fewer Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 551.12 (SE +/- 0.77, N = 3)
0xd0003a5: 553.27 (SE +/- 1.19, N = 3)

OpenVINO

Model: Person Detection FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 1490.76 (SE +/- 2.35, N = 3; MIN: 1074.22 / MAX: 1692.48)
0xd0003a5: 1496.19 (SE +/- 0.81, N = 3; MIN: 1043.7 / MAX: 1711.44)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Cpuminer-Opt

Algorithm: LBC, LBRY Credits

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 421660 (SE +/- 313.42, N = 3)
0xd0003a5: 423130 (SE +/- 860.95, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Deepcoin

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 64677 (SE +/- 89.69, N = 3)
0xd0003a5: 64897 (SE +/- 187.02, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenVINO

Model: Person Detection FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 13.29 (SE +/- 0.02, N = 3)
0xd0003a5: 13.25 (SE +/- 0.01, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

Frames Per Second, More Is Better (VVenC 1.9)
0xd000390: 5.722 (SE +/- 0.033, N = 3)
0xd0003a5: 5.705 (SE +/- 0.029, N = 3)
(CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

items/sec, More Is Better (Neural Magic DeepSparse 1.5)
0xd000390: 72.07 (SE +/- 0.09, N = 3)
0xd0003a5: 71.89 (SE +/- 0.10, N = 3)

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 9396.52 (SE +/- 3.74, N = 3)
0xd0003a5: 9419.77 (SE +/- 2.19, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 3.90360 (SE +/- 0.00202, N = 3; MIN: 3.68)
0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; MIN: 3.69)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

dav1d

Video Input: Chimera 1080p

FPS, More Is Better (dav1d 1.2.1)
0xd000390: 515.81 (SE +/- 0.80, N = 3)
0xd0003a5: 514.58 (SE +/- 0.51, N = 3)
(CC) gcc options: -pthread -lm

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 8.50 (SE +/- 0.00, N = 3; MIN: 7.17 / MAX: 18.39)
0xd0003a5: 8.48 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 22.53)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Laghos

Test: Triple Point Problem

Major Kernels Total Rate, More Is Better (Laghos 3.1)
0xd000390: 256.27 (SE +/- 0.32, N = 3)
0xd0003a5: 256.87 (SE +/- 0.94, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 67604.00 (SE +/- 17.48, N = 3)
0xd0003a5: 67754.34 (SE +/- 29.72, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

SVT-HEVC

Tuning: 7 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-HEVC 1.5.0)
0xd000390: 138.75 (SE +/- 0.60, N = 3)
0xd0003a5: 138.49 (SE +/- 0.44, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

dav1d

Video Input: Summer Nature 4K

FPS, More Is Better (dav1d 1.2.1)
0xd000390: 281.36 (SE +/- 1.06, N = 3)
0xd0003a5: 280.84 (SE +/- 0.81, N = 3)
(CC) gcc options: -pthread -lm

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Seconds, Fewer Is Better (Xcompact3d Incompact3d 2021-03-11)
0xd000390: 11.02 (SE +/- 0.02, N = 3)
0xd0003a5: 11.00 (SE +/- 0.02, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 59274.06 (SE +/- 17.38, N = 3)
0xd0003a5: 59377.96 (SE +/- 18.68, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP32 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 1517.69 (SE +/- 0.52, N = 3; MIN: 1081.6 / MAX: 1690.6)
0xd0003a5: 1519.84 (SE +/- 0.84, N = 3; MIN: 1074.08 / MAX: 1721.59)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 209.31 (SE +/- 0.03, N = 3; MIN: 160.46 / MAX: 249.09)
0xd0003a5: 209.02 (SE +/- 0.14, N = 3; MIN: 152.83 / MAX: 234.28)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 95.42 (SE +/- 0.01, N = 3)
0xd0003a5: 95.54 (SE +/- 0.07, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

Palabos

Grid Size: 100

Mega Site Updates Per Second, More Is Better (Palabos 2.3)
0xd000390: 312.20 (SE +/- 0.89, N = 3)
0xd0003a5: 312.53 (SE +/- 0.20, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Cpuminer-Opt

Algorithm: scrypt

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 2319.31 (SE +/- 1.08, N = 3)
0xd0003a5: 2321.74 (SE +/- 8.35, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Blake-2 S

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 4462327 (SE +/- 8325.80, N = 3)
0xd0003a5: 4466653 (SE +/- 8676.85, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

SVT-HEVC

Tuning: 1 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-HEVC 1.5.0)
0xd000390: 10.46 (SE +/- 0.06, N = 3)
0xd0003a5: 10.45 (SE +/- 0.01, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

OpenVINO

Model: Person Detection FP32 - Device: CPU

FPS, More Is Better (OpenVINO 2022.3)
0xd000390: 13.03 (SE +/- 0.00, N = 3)
0xd0003a5: 13.04 (SE +/- 0.01, N = 3)
(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 3.62526 (SE +/- 0.00896, N = 3; MIN: 3.54)
0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; MIN: 3.53)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Cpuminer-Opt

Algorithm: Triple SHA-256, Onecoin

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 1332237 (SE +/- 7105.40, N = 3)
0xd0003a5: 1333117 (SE +/- 7235.75, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Major Kernels Total Rate, More Is Better (Laghos 3.1)
0xd000390: 385.89 (SE +/- 0.23, N = 3)
0xd0003a5: 386.08 (SE +/- 0.80, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Cpuminer-Opt

Algorithm: Magi

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 2309.47 (SE +/- 3.91, N = 3)
0xd0003a5: 2308.66 (SE +/- 1.08, N = 3)
(CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 2.06936 (SE +/- 0.00038, N = 3; MIN: 2.03)
0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; MIN: 2.03)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Cpuminer-Opt

Algorithm: x25x

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 2659.17 (SE +/- 4.75, N = 3)
0xd0003a5: 2659.55 (SE +/- 5.54, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Images / Sec, More Is Better (Intel Open Image Denoise 2.0)
0xd000390: 1.46 (SE +/- 0.00, N = 3)
0xd0003a5: 1.46 (SE +/- 0.00, N = 3)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Images / Sec, More Is Better (Intel Open Image Denoise 2.0)
0xd000390: 3.03 (SE +/- 0.00, N = 3)
0xd0003a5: 3.03 (SE +/- 0.00, N = 3)

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 1.16 (SE +/- 0.00, N = 3; MIN: 0.88 / MAX: 12.61)
0xd0003a5: 1.16 (SE +/- 0.00, N = 3; MIN: 0.86 / MAX: 17.98)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

ms, Fewer Is Better (OpenVINO 2022.3)
0xd000390: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13)
0xd0003a5: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13.43)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

NCNN

Target: CPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 10.20 (SE +/- 0.64, N = 3; MIN: 9.01 / MAX: 500.17)
0xd0003a5: 9.71 (SE +/- 0.07, N = 3; MIN: 9.22 / MAX: 27.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 45.54 (SE +/- 7.69, N = 3; MIN: 36.01 / MAX: 3343.68)
0xd0003a5: 38.85 (SE +/- 0.50, N = 3; MIN: 37.13 / MAX: 233.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

ms, Fewer Is Better (NCNN 20230517)
0xd000390: 17.15 (SE +/- 0.52, N = 3; MIN: 16.19 / MAX: 41.83)
0xd0003a5: 18.51 (SE +/- 0.69, N = 3; MIN: 17.32 / MAX: 299.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
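The three NCNN gaps between microcode revisions are small next to some of the reported standard errors. One rough way to screen them is to divide the difference of the two means by the combined standard error sqrt(SE_a^2 + SE_b^2). This is a heuristic, not a formal significance test, and it is not part of the Phoronix Test Suite output. The function name `separation` and the 2-SE threshold are illustrative assumptions. The means and SEs below are copied from the NCNN results above.

```python
import math

def separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute difference of two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        # No reported spread: any difference is "infinitely" separated.
        return math.inf if mean_a != mean_b else 0.0
    return abs(mean_a - mean_b) / combined_se

# (0xd000390 mean, SE), (0xd0003a5 mean, SE) in ms, from the NCNN results above.
NCNN_MS = {
    "FastestDet": ((10.20, 0.64), (9.71, 0.07)),
    "regnety_400m": ((45.54, 7.69), (38.85, 0.50)),
    "resnet50": ((17.15, 0.52), (18.51, 0.69)),
}

for model, ((m1, s1), (m2, s2)) in NCNN_MS.items():
    s = separation(m1, s1, m2, s2)
    # The 2-SE cut-off is an illustrative assumption, not a statistical guarantee.
    verdict = "outside" if s > 2 else "within"
    print(f"{model}: {s:.2f} combined SEs apart ({verdict} a rough 2-SE band)")
```

With only N = 3 runs per configuration, even a value somewhat above 2 would be weak evidence. Treat the output as a sanity check rather than a conclusion.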

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 6.31428 (SE +/- 0.00502, N = 3)
0xd0003a5: 6.71968 (SE +/- 0.43315, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.14)
0xd000390: 158.36 (SE +/- 0.13, N = 3)
0xd0003a5: 158.01 (SE +/- 10.25, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 4.52403 (SE +/- 0.02854, N = 3)
0xd0003a5: 4.62085 (SE +/- 0.04028, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 25.57 (SE +/- 0.30, N = 3)
0xd0003a5: 25.86 (SE +/- 0.26, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 110.32 (SE +/- 1.29, N = 15)
0xd0003a5: 117.55 (SE +/- 3.29, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.14)
0xd000390: 9.08067 (SE +/- 0.10283, N = 15)
0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 1.43407 (SE +/- 0.00633, N = 3)
0xd0003a5: 1.51029 (SE +/- 0.02722, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.14)
0xd000390: 696.73 (SE +/- 3.08, N = 3)
0xd0003a5: 664.54 (SE +/- 12.01, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 59.84 (SE +/- 0.28, N = 3)
0xd0003a5: 61.51 (SE +/- 1.14, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 85.80 (SE +/- 0.79, N = 7)
0xd0003a5: 89.83 (SE +/- 1.07, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

Inference Time Cost (ms), Fewer Is Better (ONNX Runtime 1.14)
0xd000390: 5.54783 (SE +/- 0.00245, N = 3)
0xd0003a5: 5.28095 (SE +/- 0.11956, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

Inferences Per Second, More Is Better (ONNX Runtime 1.14)
0xd000390: 180.16 (SE +/- 0.08, N = 3)
0xd0003a5: 190.57 (SE +/- 4.37, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
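ONNX Runtime results appear twice for several models: once as inference time cost in milliseconds and once as inferences per second. If throughput were simply the reciprocal of mean latency, 1000 / ms would reproduce the second figure. The Python sketch below checks that for the GPT-2 pair. `latency_ms_to_throughput` is a hypothetical helper, and the reciprocal relationship is an assumption, not something this export documents.

```python
def latency_ms_to_throughput(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms

# (mean inference time cost in ms, reported inferences per second) for GPT-2,
# copied from the two ONNX Runtime GPT-2 results above.
GPT2_RESULTS = {
    "0xd000390": (5.54783, 180.16),
    "0xd0003a5": (5.28095, 190.57),
}

for microcode, (latency, reported) in GPT2_RESULTS.items():
    derived = latency_ms_to_throughput(latency)
    gap_pct = (derived - reported) / reported * 100.0
    print(f"{microcode}: derived {derived:.2f}/s, reported {reported:.2f}/s ({gap_pct:+.2f}%)")
```

The derived values (about 180.25 and 189.36 inferences per second) are close to, but not identical with, the reported 180.16 and 190.57. Each reported figure is an average over several runs (N = 3 and N = 15), and the mean of per-run throughputs need not equal the reciprocal of the mean latency.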

Cpuminer-Opt

Algorithm: Garlicoin

kH/s, More Is Better (Cpuminer-Opt 3.20.3)
0xd000390: 29203.00 (SE +/- 330.17, N = 3)
0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
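This export mixes "More Is Better" and "Fewer Is Better" metrics, so a raw percentage change between the two microcode revisions can flip meaning from one test to the next. The minimal Python sketch below normalizes the sign so that a positive value always means 0xd0003a5 did better. It treats 0xd000390 as the baseline, which is an arbitrary choice for illustration, and `percent_better` is a hypothetical helper, not something the Phoronix Test Suite provides.

```python
def percent_better(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Signed percentage by which `candidate` beats `baseline`.

    Positive means the candidate is better and negative means worse,
    whether the metric is "More Is Better" or "Fewer Is Better".
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    if higher_is_better:
        return (candidate - baseline) / baseline * 100.0
    return (baseline - candidate) / baseline * 100.0

# (baseline 0xd000390, candidate 0xd0003a5, higher_is_better), copied from results above.
EXAMPLES = {
    "Cpuminer-Opt Garlicoin (kH/s)": (29203.00, 22086.25, True),
    "NCNN regnety_400m (ms)": (45.54, 38.85, False),
}

for name, (base, cand, hib) in EXAMPLES.items():
    print(f"{name}: {percent_better(base, cand, hib):+.2f}%")
```

For Garlicoin this gives about -24.4% (0xd0003a5 slower). For regnety_400m it gives about +14.7% (0xd0003a5 faster). Both gaps come with a large standard error on one side (SE +/- 3833.17 with N = 12, and SE +/- 7.69 with N = 3), so read them alongside the variance figures.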

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

ms, Fewer Is Better (oneDNN 3.1)
0xd000390: 832.45 (SE +/- 16.59, N = 15; MIN: 714.88)
0xd0003a5: 816.51 (SE +/- 14.20, N = 12; MIN: 723.44)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread


Phoronix Test Suite v10.8.5