Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2308099-NE-XEONPLATI49&rdt&grw.

Xeon Platinum 8380 AVX-512 Workloads: System Details

Configurations compared: 0xd000390 and 0xd0003a5 (values are shared by both unless marked)

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernel (0xd000390): 6.5.0-060500rc4daily20230804-generic (x86_64)
Kernel (0xd0003a5): 6.5.0-rc5-phx-tues (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5

Python Details
- Python 3.10.7

Security Details
- 0xd000390: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- 0xd0003a5: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

Xeon Platinum 8380 AVX-512 Workloads: result overview table for 0xd000390 and 0xd0003a5. The individual results follow per test.

SPECFEM3D

Model: Homogeneous Halfspace

SPECFEM3D 4.0, Model: Homogeneous Halfspace (Seconds, Fewer Is Better)
0xd000390: 18.02 (SE +/- 0.15, N = 3)
0xd0003a5: 17.75 (SE +/- 0.16, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM1 (GFInst/s, More Is Better)
0xd000390: 2353.39 (SE +/- 3.60, N = 3)
0xd0003a5: 2372.42 (SE +/- 10.90, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128 (GFLOP/s, More Is Better)
0xd000390: 93.09 (SE +/- 0.15, N = 3)
0xd0003a5: 93.92 (SE +/- 0.28, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
0xd000390: 46.35 (SE +/- 0.25, N = 3)
0xd0003a5: 46.98 (SE +/- 0.61, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
0xd000390: 195.20 (SE +/- 0.92, N = 3)
0xd0003a5: 198.87 (SE +/- 1.08, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
0xd000390: 224.42 (SE +/- 2.25, N = 3)
0xd0003a5: 226.78 (SE +/- 2.85, N = 3)
(CXX) g++ options: -O3

Palabos

Grid Size: 500

Palabos 2.3, Grid Size: 500 (Mega Site Updates Per Second, More Is Better)
0xd000390: 413.21 (SE +/- 0.42, N = 3)
0xd0003a5: 417.48 (SE +/- 0.72, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256 (GFLOP/s, More Is Better)
0xd000390: 101.98 (SE +/- 0.27, N = 3)
0xd0003a5: 102.92 (SE +/- 0.20, N = 3)
(CXX) g++ options: -O3

Laghos

Test: Triple Point Problem

Laghos 3.1, Test: Triple Point Problem (Major Kernels Total Rate, More Is Better)
0xd000390: 256.27 (SE +/- 0.32, N = 3)
0xd0003a5: 256.87 (SE +/- 0.94, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

Palabos

Grid Size: 100

Palabos 2.3, Grid Size: 100 (Mega Site Updates Per Second, More Is Better)
0xd000390: 312.20 (SE +/- 0.89, N = 3)
0xd0003a5: 312.53 (SE +/- 0.20, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

libxsmm

M N K: 32

libxsmm 2-1.17-3645, M N K: 32 (GFLOPS/s, More Is Better)
0xd000390: 604.7 (SE +/- 8.71, N = 15)
0xd0003a5: 609.2 (SE +/- 4.92, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 64

libxsmm 2-1.17-3645, M N K: 64 (GFLOPS/s, More Is Better)
0xd000390: 1098.8 (SE +/- 8.60, N = 3)
0xd0003a5: 1177.1 (SE +/- 13.09, N = 15)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
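
One way to read the SE figures in the libxsmm M N K: 64 result is to compare the gap between the two microcode runs with their combined standard error. The Python sketch below does that. It assumes the reported SE is the standard error of the mean and that the two runs are independent; the helper name gap_in_standard_errors is made up for illustration and is not part of the Phoronix Test Suite.

```python
import math


def gap_in_standard_errors(a: float, se_a: float, b: float, se_b: float) -> float:
    """Difference b - a expressed in units of the combined standard error.

    Assumption: se_a and se_b are standard errors of the mean from
    independent runs, so their variances add.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (b - a) / combined_se


# libxsmm M N K: 64, GFLOPS/s (values copied from the result above)
ratio = gap_in_standard_errors(1098.8, 8.60, 1177.1, 13.09)
print(f"gap = {ratio:.1f} combined standard errors")
```

A gap of only one or two combined standard errors would be hard to tell apart from run-to-run noise. A larger gap suggests a real difference, although the run counts here are small.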

libxsmm

M N K: 256

libxsmm 2-1.17-3645, M N K: 256 (GFLOPS/s, More Is Better)
0xd000390: 594.6 (SE +/- 2.33, N = 3)
0xd0003a5: 600.2 (SE +/- 2.54, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 128

libxsmm 2-1.17-3645, M N K: 128 (GFLOPS/s, More Is Better)
0xd000390: 1941.1 (SE +/- 32.53, N = 7)
0xd0003a5: 1978.9 (SE +/- 20.92, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM1 (Billion Interactions/s, More Is Better)
0xd000390: 94.14 (SE +/- 0.14, N = 3)
0xd0003a5: 94.90 (SE +/- 0.44, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 (GFLOP/s, More Is Better)
0xd000390: 144.94 (SE +/- 1.81, N = 4)
0xd0003a5: 153.79 (SE +/- 1.71, N = 3)
(CXX) g++ options: -O3

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Laghos 3.1, Test: Sedov Blast Wave, ube_922_hex.mesh (Major Kernels Total Rate, More Is Better)
0xd000390: 385.89 (SE +/- 0.23, N = 3)
0xd0003a5: 386.08 (SE +/- 0.80, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256 (GFLOP/s, More Is Better)
0xd000390: 93.75 (SE +/- 1.09, N = 3)
0xd0003a5: 94.86 (SE +/- 1.13, N = 3)
(CXX) g++ options: -O3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128

HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
0xd000390: 154.73 (SE +/- 1.53, N = 5)
0xd0003a5: 159.10 (SE +/- 1.01, N = 3)
(CXX) g++ options: -O3

Palabos

Grid Size: 400

Palabos 2.3, Grid Size: 400 (Mega Site Updates Per Second, More Is Better)
0xd000390: 388.48 (SE +/- 0.94, N = 3)
0xd0003a5: 393.84 (SE +/- 0.06, N = 3)
(CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

SPECFEM3D

Model: Water-layered Halfspace

SPECFEM3D 4.0, Model: Water-layered Halfspace (Seconds, Fewer Is Better)
0xd000390: 31.15 (SE +/- 0.24, N = 3)
0xd0003a5: 31.38 (SE +/- 0.31, N = 5)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better)
0xd000390: 2526.89 (SE +/- 9.78, N = 3)
0xd0003a5: 2601.21 (SE +/- 9.53, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

SPECFEM3D

Model: Tomographic Model

SPECFEM3D 4.0, Model: Tomographic Model (Seconds, Fewer Is Better)
0xd000390: 14.57 (SE +/- 0.02, N = 3)
0xd0003a5: 14.15 (SE +/- 0.15, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Layered Halfspace

SPECFEM3D 4.0, Model: Layered Halfspace (Seconds, Fewer Is Better)
0xd000390: 29.50 (SE +/- 0.05, N = 3)
0xd0003a5: 29.37 (SE +/- 0.18, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Mount St. Helens

SPECFEM3D 4.0, Model: Mount St. Helens (Seconds, Fewer Is Better)
0xd000390: 13.15 (SE +/- 0.18, N = 3)
0xd0003a5: 12.95 (SE +/- 0.12, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better)
0xd000390: 101.08 (SE +/- 0.39, N = 3)
0xd0003a5: 104.05 (SE +/- 0.38, N = 3)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

Timed MrBayes Analysis

Primate Phylogeny Analysis

Timed MrBayes Analysis 3.2.7, Primate Phylogeny Analysis (Seconds, Fewer Is Better)
0xd000390: 166.53 (SE +/- 1.36, N = 3)
0xd0003a5: 165.42 (SE +/- 0.98, N = 3)
(CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

TensorFlow

Device: CPU - Batch Size: 256 - Model: AlexNet

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
0xd000390: 723.27 (SE +/- 2.99, N = 3)
0xd0003a5: 781.25 (SE +/- 4.27, N = 3)

TensorFlow

Device: CPU - Batch Size: 512 - Model: AlexNet

TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
0xd000390: 760.15 (SE +/- 3.77, N = 3)
0xd0003a5: 839.41 (SE +/- 2.61, N = 3)

Remhos

Test: Sample Remap Example

Remhos 1.0, Test: Sample Remap Example (Seconds, Fewer Is Better)
0xd000390: 12.25 (SE +/- 0.03, N = 3)
0xd0003a5: 12.40 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

TensorFlow

Device: CPU - Batch Size: 256 - Model: GoogLeNet

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec, More Is Better)
0xd000390: 309.63 (SE +/- 1.89, N = 3)
0xd0003a5: 321.72 (SE +/- 2.10, N = 3)

TensorFlow

Device: CPU - Batch Size: 256 - Model: ResNet-50

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
0xd000390: 83.89 (SE +/- 0.37, N = 3)
0xd0003a5: 86.93 (SE +/- 0.80, N = 9)

TensorFlow

Device: CPU - Batch Size: 512 - Model: GoogLeNet

TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
0xd000390: 317.27 (SE +/- 0.87, N = 3)
0xd0003a5: 323.79 (SE +/- 0.43, N = 3)

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better)
0xd000390: 85.97 (SE +/- 1.13, N = 3)
0xd0003a5: 84.84 (SE +/- 1.17, N = 3)

CloverLeaf

Lagrangian-Eulerian Hydrodynamics

CloverLeaf, Lagrangian-Eulerian Hydrodynamics (Seconds, Fewer Is Better)
0xd000390: 12.04 (SE +/- 0.06, N = 3)
0xd0003a5: 11.98 (SE +/- 0.09, N = 3)
(F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 72.07 (SE +/- 0.09, N = 3)
0xd0003a5: 71.89 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 551.12 (SE +/- 0.77, N = 3)
0xd0003a5: 553.27 (SE +/- 1.19, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 2301.16 (SE +/- 1.14, N = 3)
0xd0003a5: 2068.30 (SE +/- 3.15, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 17.35 (SE +/- 0.01, N = 3)
0xd0003a5: 19.31 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 922.93 (SE +/- 11.47, N = 3)
0xd0003a5: 930.75 (SE +/- 1.22, N = 3)

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 43.30 (SE +/- 0.53, N = 3)
0xd0003a5: 42.94 (SE +/- 0.06, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 230.05 (SE +/- 0.46, N = 3)
0xd0003a5: 231.43 (SE +/- 2.37, N = 3)

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 173.79 (SE +/- 0.34, N = 3)
0xd0003a5: 172.56 (SE +/- 1.64, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 1006.34 (SE +/- 0.56, N = 3)
0xd0003a5: 1013.82 (SE +/- 0.31, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 39.70 (SE +/- 0.02, N = 3)
0xd0003a5: 39.41 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 7633.26 (SE +/- 8.14, N = 3)
0xd0003a5: 7092.50 (SE +/- 2.44, N = 3)

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 5.2210 (SE +/- 0.0056, N = 3)
0xd0003a5: 5.6210 (SE +/- 0.0018, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 422.12 (SE +/- 0.36, N = 3)
0xd0003a5: 412.05 (SE +/- 1.24, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 94.61 (SE +/- 0.07, N = 3)
0xd0003a5: 96.94 (SE +/- 0.29, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 98.34 (SE +/- 0.12, N = 3)
0xd0003a5: 94.26 (SE +/- 0.51, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 405.97 (SE +/- 0.42, N = 3)
0xd0003a5: 421.75 (SE +/- 2.15, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 1005.58 (SE +/- 1.41, N = 3)
0xd0003a5: 987.55 (SE +/- 2.17, N = 3)

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 39.73 (SE +/- 0.06, N = 3)
0xd0003a5: 40.45 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 426.95 (SE +/- 0.21, N = 3)
0xd0003a5: 398.72 (SE +/- 4.97, N = 4)

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
0xd000390: 93.54 (SE +/- 0.07, N = 3)
0xd0003a5: 100.19 (SE +/- 1.28, N = 4)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
0xd000390: 644.44 (SE +/- 2.45, N = 3)
0xd0003a5: 620.25 (SE +/- 7.47, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd000390: 62.01 (SE +/- 0.23, N = 3)
0xd0003a5: 64.44 (SE +/- 0.78, N = 3)

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd000390: 84.02 (SE +/- 0.32, N = 3)
0xd0003a5: 83.20 (SE +/- 0.18, N = 3)

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd000390: 474.69 (SE +/- 1.00, N = 3)
0xd0003a5: 478.85 (SE +/- 1.39, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd000390: 1051.79 (SE +/- 1.88, N = 3)
0xd0003a5: 869.04 (SE +/- 1.83, N = 3)

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd000390: 37.99 (SE +/- 0.07, N = 3)
0xd0003a5: 45.96 (SE +/- 0.10, N = 3)
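
When reading paired results such as the BERT-Large Sparse INT8 figures above, it can help to express the gap between the two microcode revisions as a relative change. The following is a minimal sketch and not part of the Phoronix Test Suite output: the helper name `percent_change` is hypothetical, and the two inputs are the items/sec means reported above.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Return the change of candidate relative to baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# items/sec for BERT-Large Sparse INT8, Asynchronous Multi-Stream (more is better)
throughput_390 = 1051.79  # microcode 0xd000390
throughput_3a5 = 869.04   # microcode 0xd0003a5

change = percent_change(throughput_390, throughput_3a5)
print(f"0xd0003a5 vs 0xd000390: {change:+.1f}% items/sec")
```

For this pair the newer microcode is roughly 17% lower in throughput. The same helper applies to "Fewer Is Better" metrics, where a positive change indicates a regression.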

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd000390: 307.87 (SE +/- 0.57, N = 3)
0xd0003a5: 293.12 (SE +/- 1.05, N = 3)

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd000390: 129.81 (SE +/- 0.25, N = 3)
0xd0003a5: 136.18 (SE +/- 0.60, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - items/sec, More Is Better
0xd000390: 72.18 (SE +/- 0.15, N = 3)
0xd0003a5: 71.54 (SE +/- 0.12, N = 3)

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse 1.5 - ms/batch, Fewer Is Better
0xd000390: 551.82 (SE +/- 0.62, N = 3)
0xd0003a5: 555.09 (SE +/- 1.12, N = 3)

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 180.16 (SE +/- 0.08, N = 3)
0xd0003a5: 190.57 (SE +/- 4.37, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 11.66 (SE +/- 0.11, N = 7)
0xd0003a5: 11.15 (SE +/- 0.13, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 16.71 (SE +/- 0.08, N = 3)
0xd0003a5: 16.31 (SE +/- 0.26, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 696.73 (SE +/- 3.08, N = 3)
0xd0003a5: 664.54 (SE +/- 12.01, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 9.08067 (SE +/- 0.10283, N = 15)
0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 39.12 (SE +/- 0.46, N = 3)
0xd0003a5: 38.72 (SE +/- 0.38, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 221.02 (SE +/- 1.39, N = 3)
0xd0003a5: 216.48 (SE +/- 1.91, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Inferences Per Second, More Is Better
0xd000390: 158.36 (SE +/- 0.13, N = 3)
0xd0003a5: 158.01 (SE +/- 10.25, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

NCNN

Target: CPU - Model: mobilenet

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 15.46 (SE +/- 0.09, N = 3; MIN: 14.85 / MAX: 106.56)
0xd0003a5: 15.66 (SE +/- 0.07, N = 3; MIN: 15.24 / MAX: 38.75)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 8.03 (SE +/- 0.09, N = 3; MIN: 7.8 / MAX: 31.34)
0xd0003a5: 7.96 (SE +/- 0.04, N = 3; MIN: 7.75 / MAX: 31.75)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 8.88 (SE +/- 0.05, N = 3; MIN: 8.69 / MAX: 32.33)
0xd0003a5: 8.76 (SE +/- 0.12, N = 3; MIN: 8.3 / MAX: 32.6)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 9.89 (SE +/- 0.10, N = 3; MIN: 9.61 / MAX: 33.59)
0xd0003a5: 9.76 (SE +/- 0.13, N = 3; MIN: 9.32 / MAX: 33.4)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 7.57 (SE +/- 0.10, N = 3; MIN: 7.28 / MAX: 30.26)
0xd0003a5: 7.41 (SE +/- 0.04, N = 3; MIN: 7.15 / MAX: 31.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 11.62 (SE +/- 0.38, N = 3; MIN: 10.82 / MAX: 21.03)
0xd0003a5: 11.71 (SE +/- 0.11, N = 3; MIN: 11.15 / MAX: 19.87)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 4.35 (SE +/- 0.06, N = 3; MIN: 4.16 / MAX: 4.97)
0xd0003a5: 4.54 (SE +/- 0.07, N = 3; MIN: 4.37 / MAX: 5.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 15.36 (SE +/- 0.17, N = 3; MIN: 14.6 / MAX: 182.22)
0xd0003a5: 16.58 (SE +/- 0.48, N = 3; MIN: 15.29 / MAX: 39.63)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 23.86 (SE +/- 0.25, N = 3; MIN: 23.05 / MAX: 47.41)
0xd0003a5: 25.71 (SE +/- 0.19, N = 3; MIN: 24.88 / MAX: 62.96)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 8.97 (SE +/- 0.14, N = 3; MIN: 8.63 / MAX: 27.27)
0xd0003a5: 9.42 (SE +/- 0.09, N = 2; MIN: 9.21 / MAX: 32.8)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 5.39 (SE +/- 0.16, N = 3; MIN: 4.83 / MAX: 151.52)
0xd0003a5: 5.46 (SE +/- 0.15, N = 3; MIN: 5.01 / MAX: 29.14)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 17.15 (SE +/- 0.52, N = 3; MIN: 16.19 / MAX: 41.83)
0xd0003a5: 18.51 (SE +/- 0.69, N = 3; MIN: 17.32 / MAX: 299.93)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 23.71 (SE +/- 0.22, N = 3; MIN: 22.57 / MAX: 46.11)
0xd0003a5: 24.10 (SE +/- 0.17, N = 3; MIN: 23.25 / MAX: 51.47)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 15.34 (SE +/- 0.12, N = 3; MIN: 14.63 / MAX: 39.65)
0xd0003a5: 16.10 (SE +/- 0.25, N = 3; MIN: 15.43 / MAX: 48.02)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 45.54 (SE +/- 7.69, N = 3; MIN: 36.01 / MAX: 3343.68)
0xd0003a5: 38.85 (SE +/- 0.50, N = 3; MIN: 37.13 / MAX: 233.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vision_transformer

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 46.92 (SE +/- 1.17, N = 3; MIN: 43.11 / MAX: 881.49)
0xd0003a5: 45.23 (SE +/- 0.31, N = 3; MIN: 43.43 / MAX: 73.39)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: FastestDet

NCNN 20230517 - ms, Fewer Is Better
0xd000390: 10.20 (SE +/- 0.64, N = 3; MIN: 9.01 / MAX: 500.17)
0xd0003a5: 9.71 (SE +/- 0.07, N = 3; MIN: 9.22 / MAX: 27.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2023 - Ns Per Day, More Is Better
0xd000390: 9.234 (SE +/- 0.021, N = 3)
0xd0003a5: 9.094 (SE +/- 0.026, N = 3)
1. (CXX) g++ options: -O3

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 2.59271 (SE +/- 0.03265, N = 15; MIN: 1.91)
0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; MIN: 1.99)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 2.06936 (SE +/- 0.00038, N = 3; MIN: 2.03)
0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; MIN: 2.03)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 3.90360 (SE +/- 0.00202, N = 3; MIN: 3.68)
0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; MIN: 3.69)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 3.62526 (SE +/- 0.00896, N = 3; MIN: 3.54)
0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; MIN: 3.53)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 832.45 (SE +/- 16.59, N = 15; MIN: 714.88)
0xd0003a5: 816.51 (SE +/- 14.20, N = 12; MIN: 723.44)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.1 - ms, Fewer Is Better
0xd000390: 524.38 (SE +/- 7.19, N = 3; MIN: 499.75)
0xd0003a5: 521.74 (SE +/- 2.82, N = 3; MIN: 505.72)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 24.04 (SE +/- 0.01, N = 3)
0xd0003a5: 24.18 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 827.51 (SE +/- 0.42, N = 3; MIN: 628.21 / MAX: 980.48)
0xd0003a5: 823.09 (SE +/- 0.71, N = 3; MIN: 550.41 / MAX: 926.21)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 13.29 (SE +/- 0.02, N = 3)
0xd0003a5: 13.25 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 1490.76 (SE +/- 2.35, N = 3; MIN: 1074.22 / MAX: 1692.48)
0xd0003a5: 1496.19 (SE +/- 0.81, N = 3; MIN: 1043.7 / MAX: 1711.44)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 13.03 (SE +/- 0.00, N = 3)
0xd0003a5: 13.04 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 1517.69 (SE +/- 0.52, N = 3; MIN: 1081.6 / MAX: 1690.6)
0xd0003a5: 1519.84 (SE +/- 0.84, N = 3; MIN: 1074.08 / MAX: 1721.59)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 1121.56 (SE +/- 0.88, N = 3)
0xd0003a5: 1117.09 (SE +/- 1.14, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 17.80 (SE +/- 0.01, N = 3; MIN: 12.83 / MAX: 32.64)
0xd0003a5: 17.87 (SE +/- 0.02, N = 3; MIN: 12.24 / MAX: 38.45)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 95.42 (SE +/- 0.01, N = 3)
0xd0003a5: 95.54 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 209.31 (SE +/- 0.03, N = 3; MIN: 160.46 / MAX: 249.09)
0xd0003a5: 209.02 (SE +/- 0.14, N = 3; MIN: 152.83 / MAX: 234.28)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 4419.17 (SE +/- 3.29, N = 3)
0xd0003a5: 4442.98 (SE +/- 2.64, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 4.51 (SE +/- 0.00, N = 3; MIN: 4.02 / MAX: 13.95)
0xd0003a5: 4.49 (SE +/- 0.00, N = 3; MIN: 4.04 / MAX: 15.85)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 2344.97 (SE +/- 3.01, N = 3)
0xd0003a5: 2362.84 (SE +/- 3.91, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 33.89 (SE +/- 0.05, N = 3; MIN: 29.83 / MAX: 113.13)
0xd0003a5: 33.63 (SE +/- 0.06, N = 3; MIN: 29.54 / MAX: 113.55)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 251.00 (SE +/- 0.68, N = 3)
0xd0003a5: 255.09 (SE +/- 0.63, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 79.47 (SE +/- 0.22, N = 3; MIN: 62.57 / MAX: 232.01)
0xd0003a5: 78.18 (SE +/- 0.19, N = 3; MIN: 66.08 / MAX: 194.38)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 9396.52 (SE +/- 3.74, N = 3)
0xd0003a5: 9419.77 (SE +/- 2.19, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 8.50 (SE +/- 0.00, N = 3; MIN: 7.17 / MAX: 18.39)
0xd0003a5: 8.48 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 22.53)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 2039.63 (SE +/- 1.77, N = 3)
0xd0003a5: 2070.72 (SE +/- 3.75, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 9.77 (SE +/- 0.01, N = 3; MIN: 7.83 / MAX: 20.01)
0xd0003a5: 9.63 (SE +/- 0.02, N = 3; MIN: 8.33 / MAX: 19.33)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 59274.06 (SE +/- 17.38, N = 3)
0xd0003a5: 59377.96 (SE +/- 18.68, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13)
0xd0003a5: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13.43)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.3 - FPS, More Is Better
0xd000390: 67604.00 (SE +/- 17.48, N = 3)
0xd0003a5: 67754.34 (SE +/- 29.72, N = 3)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO 2022.3 - ms, Fewer Is Better
0xd000390: 1.16 (SE +/- 0.00, N = 3; MIN: 0.88 / MAX: 12.61)
0xd0003a5: 1.16 (SE +/- 0.00, N = 3; MIN: 0.86 / MAX: 17.98)
1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF

QMCPACK

Input: Li2_STO_ae

QMCPACK 3.16 - Total Execution Time - Seconds, Fewer Is Better
0xd000390: 124.23 (SE +/- 1.55, N = 3)
0xd0003a5: 123.26 (SE +/- 1.03, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK

Input: simple-H2O

QMCPACK 3.16 - Total Execution Time - Seconds, Fewer Is Better
0xd000390: 39.56 (SE +/- 0.12, N = 3)
0xd0003a5: 41.25 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 - Total Execution Time - Seconds, Fewer Is Better
0xd000390: 147.51 (SE +/- 0.12, N = 3)
0xd0003a5: 178.19 (SE +/- 0.31, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
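
Each result reports a mean with a standard error (SE) and a run count (N), so one rough way to judge whether a gap between the two microcode revisions stands out from run-to-run noise is to divide the difference in means by the combined standard error. The sketch below is an illustrative heuristic only, not how the Phoronix Test Suite evaluates results; the function name is hypothetical, and it assumes the two sets of runs are independent.

```python
import math


def difference_in_standard_errors(mean_a: float, se_a: float,
                                  mean_b: float, se_b: float) -> float:
    """Absolute difference of two means, in units of their combined SE.

    Assumes the two samples are independent, so the SEs add in quadrature.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        # Identical means with no spread: no measurable difference.
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined_se


# QMCPACK FeCO6_b3lyp_gms, seconds: 147.51 (SE 0.12) vs 178.19 (SE 0.31)
qmcpack = difference_in_standard_errors(147.51, 0.12, 178.19, 0.31)

# ONNX Runtime super-resolution-10, inferences/sec: 158.36 (SE 0.13) vs 158.01 (SE 10.25)
onnx = difference_in_standard_errors(158.36, 0.13, 158.01, 10.25)

print(f"QMCPACK gap: {qmcpack:.1f} combined SEs")
print(f"ONNX super-resolution gap: {onnx:.2f} combined SEs")
```

A ratio far above 1 (as for the QMCPACK pair) suggests the gap is well outside the measured noise, while a ratio well below 1 (as for the ONNX super-resolution pair) suggests the two results are indistinguishable at this sample size.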

QMCPACK

Input: FeCO6_b3lyp_gms

QMCPACK 3.16 - Total Execution Time - Seconds, Fewer Is Better
0xd000390: 268.56 (SE +/- 3.60, N = 3)
0xd0003a5: 263.23 (SE +/- 2.19, N = 3)
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 - Seconds, Fewer Is Better
0xd000390: 11.02 (SE +/- 0.02, N = 3)
0xd0003a5: 11.00 (SE +/- 0.02, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Cpuminer-Opt

Algorithm: Magi

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 2309.47 (SE +/- 3.91, N = 3)
0xd0003a5: 2308.66 (SE +/- 1.08, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: x25x

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 2659.17 (SE +/- 4.75, N = 3)
0xd0003a5: 2659.55 (SE +/- 5.54, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: scrypt

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 2319.31 (SE +/- 1.08, N = 3)
0xd0003a5: 2321.74 (SE +/- 8.35, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Deepcoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 64677 (SE +/- 89.69, N = 3)
0xd0003a5: 64897 (SE +/- 187.02, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Blake-2 S

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 4462327 (SE +/- 8325.80, N = 3)
0xd0003a5: 4466653 (SE +/- 8676.85, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Garlicoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 29203.00 (SE +/- 330.17, N = 3)
0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Skeincoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 613333 (SE +/- 1652.89, N = 3)
0xd0003a5: 617130 (SE +/- 3788.20, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Myriad-Groestl

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 43127 (SE +/- 386.00, N = 15)
0xd0003a5: 43450 (SE +/- 406.32, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: LBC, LBRY Credits

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 421660 (SE +/- 313.42, N = 3)
0xd0003a5: 423130 (SE +/- 860.95, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Quad SHA-256, Pyrite

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 921730 (SE +/- 3352.41, N = 3)
0xd0003a5: 926277 (SE +/- 1690.96, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

Cpuminer-Opt

Algorithm: Triple SHA-256, Onecoin

Cpuminer-Opt 3.20.3 - kH/s, More Is Better
0xd000390: 1332237 (SE +/- 7105.40, N = 3)
0xd0003a5: 1333117 (SE +/- 7235.75, N = 3)
1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

VP9 libvpx Encoding 1.13 - Frames Per Second, More Is Better
0xd000390: 12.63 (SE +/- 0.12, N = 3)
0xd0003a5: 12.33 (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

dav1d

Video Input: Chimera 1080p

dav1d 1.2.1 - FPS, More Is Better
0xd000390: 515.81 (SE +/- 0.80, N = 3)
0xd0003a5: 514.58 (SE +/- 0.51, N = 3)
1. (CC) gcc options: -pthread -lm

dav1d

Video Input: Summer Nature 4K

dav1d 1.2.1 - FPS, More Is Better
0xd000390: 281.36 (SE +/- 1.06, N = 3)
0xd0003a5: 280.84 (SE +/- 0.81, N = 3)
1. (CC) gcc options: -pthread -lm

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 1.6 - Frames Per Second, More Is Better
0xd000390: 67.17 (SE +/- 0.47, N = 3)
0xd0003a5: 66.46 (SE +/- 0.46, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 4K

SVT-AV1 1.6 - Frames Per Second, More Is Better
0xd000390: 180.97 (SE +/- 1.29, N = 3)
0xd0003a5: 177.72 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 1.6 - Frames Per Second, More Is Better
0xd000390: 175.10 (SE +/- 0.79, N = 3)
0xd0003a5: 177.20 (SE +/- 2.02, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-HEVC

Tuning: 1 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
0xd000390: 10.46 (SE +/- 0.06, N = 3)
0xd0003a5: 10.45 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 7 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
0xd000390: 138.75 (SE +/- 0.60, N = 3)
0xd0003a5: 138.49 (SE +/- 0.44, N = 3)
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 10 - Input: Bosphorus 4K

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
0xd000390: 184.38 (SE +/- 2.03, N = 3)
0xd0003a5: 182.74 (SE +/- 0.38, N = 3)
1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only (Seconds, Fewer Is Better)
0xd000390: 23.83 (SE +/- 0.06, N = 3)
0xd0003a5: 23.72 (SE +/- 0.03, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only (Seconds, Fewer Is Better)
0xd000390: 30.74 (SE +/- 0.06, N = 3)
0xd0003a5: 30.90 (SE +/- 0.02, N = 3)

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, More Is Better)
0xd000390: 5.722 (SE +/- 0.033, N = 3)
0xd0003a5: 5.705 (SE +/- 0.029, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better)
0xd000390: 10.36 (SE +/- 0.09, N = 3)
0xd0003a5: 10.42 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Embree

Binary: Pathtracer ISPC - Model: Crown

Embree 4.1 - Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
0xd000390: 88.19 (SE +/- 0.07, N = 3; MIN: 85.24 / MAX: 92.72)
0xd0003a5: 83.46 (SE +/- 0.32, N = 3; MIN: 80.27 / MAX: 87.69)

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon

Embree 4.1 - Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
0xd000390: 104.68 (SE +/- 0.43, N = 3; MIN: 101.9 / MAX: 109.48)
0xd0003a5: 101.10 (SE +/- 0.19, N = 3; MIN: 98.59 / MAX: 105.53)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better)
0xd000390: 3.03 (SE +/- 0.00, N = 3)
0xd0003a5: 3.03 (SE +/- 0.00, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better)
0xd000390: 1.46 (SE +/- 0.00, N = 3)
0xd0003a5: 1.46 (SE +/- 0.00, N = 3)

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.3.1 - Benchmark: vklBenchmark ISPC (Items / Sec, More Is Better)
0xd000390: 912 (SE +/- 1.53, N = 3; MIN: 140 / MAX: 7236)
0xd0003a5: 856 (SE +/- 0.88, N = 3; MIN: 137 / MAX: 7211)

OSPRay

Benchmark: particle_volume/ao/real_time

OSPRay 2.12 - Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better)
0xd000390: 24.75 (SE +/- 0.11, N = 3)
0xd0003a5: 16.46 (SE +/- 0.01, N = 3)

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
0xd000390: 24.95 (SE +/- 0.01, N = 3)
0xd0003a5: 16.38 (SE +/- 0.01, N = 3)

OSPRay

Benchmark: particle_volume/pathtracer/real_time

OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better)
0xd000390: 150.28 (SE +/- 0.54, N = 3)
0xd0003a5: 136.85 (SE +/- 0.45, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
0xd000390: 21.08 (SE +/- 0.05, N = 3)
0xd0003a5: 18.89 (SE +/- 0.02, N = 3)

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
0xd000390: 20.58 (SE +/- 0.08, N = 3)
0xd0003a5: 18.46 (SE +/- 0.03, N = 3)
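
The OSPRay results above show the largest gap between the two microcode revisions in this section. As a rough aid for reading them, the sketch below computes the relative change from 0xd000390 to 0xd0003a5 for each OSPRay 2.12 benchmark listed so far. The helper name percent_change and the ospray_results dictionary are illustrative only and are not part of the Phoronix Test Suite; the numbers are copied from the results above. Because the metric is "More Is Better", a negative value means 0xd0003a5 was slower.

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (0xd000390 result, 0xd0003a5 result) in Items Per Second,
# copied from the OSPRay 2.12 results in this document.
ospray_results = {
    "particle_volume/ao/real_time": (24.75, 16.46),
    "particle_volume/scivis/real_time": (24.95, 16.38),
    "particle_volume/pathtracer/real_time": (150.28, 136.85),
    "gravity_spheres_volume/dim_512/ao/real_time": (21.08, 18.89),
    "gravity_spheres_volume/dim_512/scivis/real_time": (20.58, 18.46),
}

# Relative change per benchmark, keyed by benchmark name.
changes = {
    name: percent_change(old, new)
    for name, (old, new) in ospray_results.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

This only compares the reported averages; it does not account for the standard errors shown with each result.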

simdjson

Throughput Test: Kostya

simdjson 2.0 - Throughput Test: Kostya (GB/s, More Is Better)
0xd000390: 2.61 (SE +/- 0.00, N = 3)
0xd0003a5: 2.87 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: TopTweet

simdjson 2.0 - Throughput Test: TopTweet (GB/s, More Is Better)
0xd000390: 5.60 (SE +/- 0.03, N = 3)
0xd0003a5: 5.75 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: LargeRandom

simdjson 2.0 - Throughput Test: LargeRandom (GB/s, More Is Better)
0xd000390: 0.85 (SE +/- 0.00, N = 3)
0xd0003a5: 0.96 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: PartialTweets

simdjson 2.0 - Throughput Test: PartialTweets (GB/s, More Is Better)
0xd000390: 4.62 (SE +/- 0.01, N = 3)
0xd0003a5: 4.77 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3

simdjson

Throughput Test: DistinctUserID

simdjson 2.0 - Throughput Test: DistinctUserID (GB/s, More Is Better)
0xd000390: 5.52 (SE +/- 0.02, N = 3)
0xd0003a5: 5.71 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 5.54783 (SE +/- 0.00245, N = 3)
0xd0003a5: 5.28095 (SE +/- 0.11956, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 85.80 (SE +/- 0.79, N = 7)
0xd0003a5: 89.83 (SE +/- 1.07, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 59.84 (SE +/- 0.28, N = 3)
0xd0003a5: 61.51 (SE +/- 1.14, N = 12)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 1.43407 (SE +/- 0.00633, N = 3)
0xd0003a5: 1.51029 (SE +/- 0.02722, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 110.32 (SE +/- 1.29, N = 15)
0xd0003a5: 117.55 (SE +/- 3.29, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 25.57 (SE +/- 0.30, N = 3)
0xd0003a5: 25.86 (SE +/- 0.26, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 4.52403 (SE +/- 0.02854, N = 3)
0xd0003a5: 4.62085 (SE +/- 0.04028, N = 8)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
0xd000390: 6.31428 (SE +/- 0.00502, N = 3)
0xd0003a5: 6.71968 (SE +/- 0.43315, N = 15)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.5