Xeon Platinum 8380 AVX-512 Workloads
Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2308099-NE-XEONPLATI49&grs&rdt .
Xeon Platinum 8380 AVX-512 Workloads
Configurations: 0xd000390, 0xd0003a5
Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 512GB
Disk: 7682GB INTEL SSDPF2KX076TZ
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 22.10
Kernel: 6.5.0-060500rc4daily20230804-generic (x86_64) [0xd000390]; 6.5.0-rc5-phx-tues (x86_64) [0xd0003a5]
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.3
Vulkan: 1.3.224
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 1920x1080
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5
Python Details - Python 3.10.7
Security Details
- 0xd000390: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- 0xd0003a5: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
Xeon Platinum 8380 AVX-512 Workloads
Test | 0xd000390 | 0xd0003a5
ospray: particle_volume/scivis/real_time | 24.9506 | 16.3849
ospray: particle_volume/ao/real_time | 24.7473 | 16.4611
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 1051.7868 | 869.0393
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 37.9943 | 45.9597
qmcpack: FeCO6_b3lyp_gms | 147.51 | 178.19
simdjson: LargeRand | 0.85 | 0.96
ospray: gravity_spheres_volume/dim_512/ao/real_time | 21.0761 | 18.8862
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 20.5780 | 18.4626
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 17.3504 | 19.3090
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 2301.1554 | 2068.3047
tensorflow: CPU - 512 - AlexNet | 760.15 | 839.41
simdjson: Kostya | 2.61 | 2.87
ospray: particle_volume/pathtracer/real_time | 150.281 | 136.853
tensorflow: CPU - 256 - AlexNet | 723.27 | 781.25
ncnn: CPU - googlenet | 15.36 | 16.58
ncnn: CPU - vgg16 | 23.86 | 25.71
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 5.2210 | 5.6210
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 7633.2598 | 7092.5030
libxsmm: 64 | 1098.8 | 1177.1
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 93.5399 | 100.1856
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 426.9495 | 398.7153
openvkl: vklBenchmark ISPC | 912 | 856
heffte: r2c - FFTW - double - 128 | 144.942 | 153.790
embree: Pathtracer ISPC - Crown | 88.1941 | 83.4621
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 307.8729 | 293.1241
ncnn: CPU - resnet18 | 8.97 | 9.42
ncnn: CPU - squeezenet_ssd | 15.34 | 16.10
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 129.8095 | 136.1780
onnx: yolov4 - CPU - Standard | 11.6605 | 11.1539
ncnn: CPU - blazeface | 4.35 | 4.54
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 98.3394 | 94.2573
qmcpack: simple-H2O | 39.555 | 41.246
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 62.0099 | 64.4439
tensorflow: CPU - 256 - GoogLeNet | 309.63 | 321.72
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 644.4430 | 620.2548
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 405.9662 | 421.7544
ncnn: CPU - vision_transformer | 46.92 | 45.23
tensorflow: CPU - 256 - ResNet-50 | 83.89 | 86.93
embree: Pathtracer ISPC - Asian Dragon | 104.6844 | 101.0959
simdjson: DistinctUserID | 5.52 | 5.71
simdjson: PartialTweets | 4.62 | 4.77
specfem3d: Tomographic Model | 14.574192022 | 14.145999546
minibude: OpenMP - BM2 | 2526.887 | 2601.212
minibude: OpenMP - BM2 | 101.076 | 104.048
heffte: c2c - FFTW - float - 128 | 154.731 | 159.100
simdjson: TopTweet | 5.60 | 5.75
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 94.6126 | 96.9429
onnx: bertsquad-12 - CPU - Standard | 16.7110 | 16.3098
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 422.1167 | 412.0531
vpxenc: Speed 5 - Bosphorus 4K | 12.63 | 12.33
ncnn: CPU - mnasnet | 7.57 | 7.41
onnx: ResNet50 v1-12-int8 - CPU - Standard | 221.020 | 216.481
tensorflow: CPU - 512 - GoogLeNet | 317.27 | 323.79
qmcpack: FeCO6_b3lyp_gms | 268.56 | 263.23
libxsmm: 128 | 1941.1 | 1978.9
heffte: r2c - FFTW - float - 128 | 195.199 | 198.869
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 1005.5843 | 987.5452
svt-av1: Preset 12 - Bosphorus 4K | 180.967 | 177.721
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 39.7308 | 40.4508
openvino: Machine Translation EN To DE FP16 - CPU | 79.47 | 78.18
ncnn: CPU - yolov4-tiny | 23.71 | 24.10
openvino: Machine Translation EN To DE FP16 - CPU | 251.00 | 255.09
specfem3d: Mount St. Helens | 13.148692362 | 12.948759795
gromacs: MPI CPU - water_GMX50_bare | 9.234 | 9.094
specfem3d: Homogeneous Halfspace | 18.022236470 | 17.751628972
openvino: Person Vehicle Bike Detection FP16 - CPU | 2039.63 | 2070.72
openvino: Person Vehicle Bike Detection FP16 - CPU | 9.77 | 9.63
palabos: 400 | 388.476 | 393.844
ncnn: CPU-v3-v3 - mobilenet-v3 | 8.88 | 8.76
heffte: c2c - FFTW - double - 256 | 46.3516 | 46.9833
ncnn: CPU - shufflenet-v2 | 9.89 | 9.76
tensorflow: CPU - 512 - ResNet-50 | 85.97 | 84.84
ncnn: CPU - alexnet | 5.39 | 5.46
ncnn: CPU - mobilenet | 15.46 | 15.66
remhos: Sample Remap Example | 12.245 | 12.401
svt-av1: Preset 13 - Bosphorus 4K | 175.102 | 177.203
heffte: r2c - FFTW - double - 256 | 93.7474 | 94.8614
svt-av1: Preset 8 - Bosphorus 4K | 67.170 | 66.460
heffte: r2c - FFTW - float - 256 | 224.417 | 226.783
palabos: 500 | 413.207 | 417.483
onnx: ArcFace ResNet-100 - CPU - Standard | 39.1157 | 38.7185
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 84.0181 | 83.2041
libxsmm: 256 | 594.6 | 600.2
heffte: c2c - FFTW - float - 256 | 101.977 | 102.920
svt-hevc: 10 - Bosphorus 4K | 184.38 | 182.74
heffte: c2c - FFTW - double - 128 | 93.0906 | 93.9207
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 72.1788 | 71.5436
ncnn: CPU-v2-v2 - mobilenet-v2 | 8.03 | 7.96
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 474.6917 | 478.8510
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 922.9302 | 930.7506
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 43.3033 | 42.9405
minibude: OpenMP - BM1 | 2353.389 | 2372.418
minibude: OpenMP - BM1 | 94.136 | 94.897
qmcpack: Li2_STO_ae | 124.23 | 123.26
ncnn: CPU - efficientnet-b0 | 11.62 | 11.71
openvino: Weld Porosity Detection FP16 - CPU | 33.89 | 33.63
openvino: Weld Porosity Detection FP16 - CPU | 2344.97 | 2362.84
specfem3d: Water-layered Halfspace | 31.146951238 | 31.381798528
cpuminer-opt: Myriad-Groestl | 43127 | 43450
libxsmm: 32 | 604.7 | 609.2
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 1006.3407 | 1013.8215
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 173.7945 | 172.5579
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 39.6954 | 39.4131
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 2.59271 | 2.57437
mrbayes: Primate Phylogeny Analysis | 166.528 | 165.419
cpuminer-opt: Skeincoin | 613333 | 617130
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 230.0525 | 231.4277
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 551.8180 | 555.0876
openvino: Face Detection FP16 - CPU | 24.04 | 24.18
openvino: Vehicle Detection FP16-INT8 - CPU | 4419.17 | 4442.98
openvino: Face Detection FP16 - CPU | 827.51 | 823.09
blender: Fishy Cat - CPU-Only | 30.74 | 30.90
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 524.381 | 521.742
cloverleaf: Lagrangian-Eulerian Hydrodynamics | 12.04 | 11.98
cpuminer-opt: Quad SHA-256, Pyrite | 921730 | 926277
vvenc: Bosphorus 4K - Faster | 10.364 | 10.415
blender: BMW27 - CPU-Only | 23.83 | 23.72
openvino: Vehicle Detection FP16-INT8 - CPU | 4.51 | 4.49
specfem3d: Layered Halfspace | 29.497987898 | 29.373606551
openvino: Vehicle Detection FP16 - CPU | 1121.56 | 1117.09
openvino: Vehicle Detection FP16 - CPU | 17.80 | 17.87
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 551.1195 | 553.2682
openvino: Person Detection FP16 - CPU | 1490.76 | 1496.19
cpuminer-opt: LBC, LBRY Credits | 421660 | 423130
cpuminer-opt: Deepcoin | 64677 | 64897
openvino: Person Detection FP16 - CPU | 13.29 | 13.25
vvenc: Bosphorus 4K - Fast | 5.722 | 5.705
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 72.0674 | 71.8868
openvino: Weld Porosity Detection FP16-INT8 - CPU | 9396.52 | 9419.77
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 3.90360 | 3.91322
dav1d: Chimera 1080p | 515.81 | 514.58
openvino: Weld Porosity Detection FP16-INT8 - CPU | 8.50 | 8.48
laghos: Triple Point Problem | 256.27 | 256.87
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 67604.00 | 67754.34
svt-hevc: 7 - Bosphorus 4K | 138.75 | 138.49
dav1d: Summer Nature 4K | 281.36 | 280.84
incompact3d: input.i3d 193 Cells Per Direction | 11.0240278 | 11.0041968
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 59274.06 | 59377.96
openvino: Person Detection FP32 - CPU | 1517.69 | 1519.84
openvino: Face Detection FP16-INT8 - CPU | 209.31 | 209.02
openvino: Face Detection FP16-INT8 - CPU | 95.42 | 95.54
palabos: 100 | 312.195 | 312.530
cpuminer-opt: scrypt | 2319.31 | 2321.74
cpuminer-opt: Blake-2 S | 4462327 | 4466653
svt-hevc: 1 - Bosphorus 4K | 10.46 | 10.45
openvino: Person Detection FP32 - CPU | 13.03 | 13.04
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 3.62526 | 3.62266
cpuminer-opt: Triple SHA-256, Onecoin | 1332237 | 1333117
laghos: Sedov Blast Wave, ube_922_hex.mesh | 385.89 | 386.08
cpuminer-opt: Magi | 2309.47 | 2308.66
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 2.06936 | 2.06967
cpuminer-opt: x25x | 2659.17 | 2659.55
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 1.46 | 1.46
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 3.03 | 3.03
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 1.16 | 1.16
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 1.33 | 1.33
ncnn: CPU - FastestDet | 10.20 | 9.71
ncnn: CPU - regnety_400m | 45.54 | 38.85
ncnn: CPU - resnet50 | 17.15 | 18.51
onnx: super-resolution-10 - CPU - Standard | 6.31428 | 6.71968
onnx: super-resolution-10 - CPU - Standard | 158.355 | 158.014
onnx: ResNet50 v1-12-int8 - CPU - Standard | 4.52403 | 4.62085
onnx: ArcFace ResNet-100 - CPU - Standard | 25.5687 | 25.8601
onnx: fcn-resnet101-11 - CPU - Standard | 110.323 | 117.545
onnx: fcn-resnet101-11 - CPU - Standard | 9.08067 | 8.59908
onnx: CaffeNet 12-int8 - CPU - Standard | 1.43407 | 1.51029
onnx: CaffeNet 12-int8 - CPU - Standard | 696.725 | 664.537
onnx: bertsquad-12 - CPU - Standard | 59.8415 | 61.5091
onnx: yolov4 - CPU - Standard | 85.7987 | 89.8299
onnx: GPT-2 - CPU - Standard | 5.54783 | 5.28095
onnx: GPT-2 - CPU - Standard | 180.163 | 190.570
cpuminer-opt: Garlicoin | 29203 | 22086.25
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 832.452 | 816.508
OSPRay 2.12, Benchmark: particle_volume/scivis/real_time (Items Per Second, More Is Better)
  0xd000390: 24.95 (SE +/- 0.01, N = 3)
  0xd0003a5: 16.38 (SE +/- 0.01, N = 3)
OSPRay 2.12, Benchmark: particle_volume/ao/real_time (Items Per Second, More Is Better)
  0xd000390: 24.75 (SE +/- 0.11, N = 3)
  0xd0003a5: 16.46 (SE +/- 0.01, N = 3)
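The two OSPRay particle_volume results above show a large gap between the microcode revisions. As a rough sketch of how that gap can be quantified, the helper below computes the relative change from 0xd000390 to 0xd0003a5. The function name `percent_change` and its `higher_is_better` flag are illustrative choices, not part of the Phoronix Test Suite; the input values are copied from the results above.

```python
def percent_change(old: float, new: float, higher_is_better: bool = True) -> float:
    """Signed performance change in percent; negative means the new result is worse."""
    if old <= 0 or new <= 0:
        raise ValueError("results must be positive")
    # For "Fewer Is Better" metrics the ratio is inverted so the sign keeps its meaning.
    ratio = new / old if higher_is_better else old / new
    return (ratio - 1.0) * 100.0


# OSPRay 2.12 particle_volume results (Items Per Second, More Is Better)
scivis = percent_change(24.95, 16.38)
ao = percent_change(24.75, 16.46)
print(f"scivis: {scivis:+.1f}%  ao: {ao:+.1f}%")
```

For these inputs both changes come out at roughly a one-third drop. Passing `higher_is_better=False` applies the same convention to latency or run-time metrics.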
Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 1051.79 (SE +/- 1.88, N = 3)
  0xd0003a5: 869.04 (SE +/- 1.83, N = 3)
Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 37.99 (SE +/- 0.07, N = 3)
  0xd0003a5: 45.96 (SE +/- 0.10, N = 3)
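The DeepSparse throughput and latency figures above are two views of the same runs. As a hedged consistency check, assuming Little's law applies (items in flight = throughput x latency) and that each batch carries one item, the sketch below estimates the implied number of concurrent requests. The function and variable names are hypothetical, and the one-item-per-batch assumption is not stated in the source.

```python
def implied_concurrency(items_per_sec: float, ms_per_batch: float) -> float:
    """Little's law estimate of in-flight items, assuming one item per batch."""
    return items_per_sec * ms_per_batch / 1000.0


# BERT-Large, NLP Question Answering, Sparse INT8: (items/sec, ms/batch)
runs = {
    "0xd000390": (1051.79, 37.99),
    "0xd0003a5": (869.04, 45.96),
}

concurrency = {name: implied_concurrency(*pair) for name, pair in runs.items()}
for name, value in concurrency.items():
    print(f"{name}: about {value:.1f} items in flight")
```

Under these assumptions both revisions land close to 40 items in flight, which would be consistent with the same concurrency in both runs and the throughput loss appearing as higher per-batch latency.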
QMCPACK 3.16, Input: FeCO6_b3lyp_gms (Total Execution Time - Seconds, Fewer Is Better)
  0xd000390: 147.51 (SE +/- 0.12, N = 3)
  0xd0003a5: 178.19 (SE +/- 0.31, N = 3)
  (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
simdjson 2.0, Throughput Test: LargeRandom (GB/s, More Is Better)
  0xd000390: 0.85 (SE +/- 0.00, N = 3)
  0xd0003a5: 0.96 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3
OSPRay 2.12, Benchmark: gravity_spheres_volume/dim_512/ao/real_time (Items Per Second, More Is Better)
  0xd000390: 21.08 (SE +/- 0.05, N = 3)
  0xd0003a5: 18.89 (SE +/- 0.02, N = 3)
OSPRay 2.12, Benchmark: gravity_spheres_volume/dim_512/scivis/real_time (Items Per Second, More Is Better)
  0xd000390: 20.58 (SE +/- 0.08, N = 3)
  0xd0003a5: 18.46 (SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 17.35 (SE +/- 0.01, N = 3)
  0xd0003a5: 19.31 (SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 2301.16 (SE +/- 1.14, N = 3)
  0xd0003a5: 2068.30 (SE +/- 3.15, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
  0xd000390: 760.15 (SE +/- 3.77, N = 3)
  0xd0003a5: 839.41 (SE +/- 2.61, N = 3)
simdjson 2.0, Throughput Test: Kostya (GB/s, More Is Better)
  0xd000390: 2.61 (SE +/- 0.00, N = 3)
  0xd0003a5: 2.87 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3
OSPRay 2.12, Benchmark: particle_volume/pathtracer/real_time (Items Per Second, More Is Better)
  0xd000390: 150.28 (SE +/- 0.54, N = 3)
  0xd0003a5: 136.85 (SE +/- 0.45, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
  0xd000390: 723.27 (SE +/- 2.99, N = 3)
  0xd0003a5: 781.25 (SE +/- 4.27, N = 3)
NCNN 20230517, Target: CPU - Model: googlenet (ms, Fewer Is Better)
  0xd000390: 15.36 (SE +/- 0.17, N = 3; MIN: 14.6 / MAX: 182.22)
  0xd0003a5: 16.58 (SE +/- 0.48, N = 3; MIN: 15.29 / MAX: 39.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  0xd000390: 23.86 (SE +/- 0.25, N = 3; MIN: 23.05 / MAX: 47.41)
  0xd0003a5: 25.71 (SE +/- 0.19, N = 3; MIN: 24.88 / MAX: 62.96)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Neural Magic DeepSparse 1.5, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 5.2210 (SE +/- 0.0056, N = 3)
  0xd0003a5: 5.6210 (SE +/- 0.0018, N = 3)
Neural Magic DeepSparse 1.5, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 7633.26 (SE +/- 8.14, N = 3)
  0xd0003a5: 7092.50 (SE +/- 2.44, N = 3)
libxsmm 2-1.17-3645, M N K: 64 (GFLOPS/s, More Is Better)
  0xd000390: 1098.8 (SE +/- 8.60, N = 3)
  0xd0003a5: 1177.1 (SE +/- 13.09, N = 15)
  (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 93.54 (SE +/- 0.07, N = 3)
  0xd0003a5: 100.19 (SE +/- 1.28, N = 4)
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 426.95 (SE +/- 0.21, N = 3)
  0xd0003a5: 398.72 (SE +/- 4.97, N = 4)
OpenVKL 1.3.1, Benchmark: vklBenchmark ISPC (Items / Sec, More Is Better)
  0xd000390: 912 (SE +/- 1.53, N = 3; MIN: 140 / MAX: 7236)
  0xd0003a5: 856 (SE +/- 0.88, N = 3; MIN: 137 / MAX: 7211)
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 144.94 (SE +/- 1.81, N = 4)
  0xd0003a5: 153.79 (SE +/- 1.71, N = 3)
  (CXX) g++ options: -O3
Embree 4.1, Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better)
  0xd000390: 88.19 (SE +/- 0.07, N = 3; MIN: 85.24 / MAX: 92.72)
  0xd0003a5: 83.46 (SE +/- 0.32, N = 3; MIN: 80.27 / MAX: 87.69)
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 307.87 (SE +/- 0.57, N = 3)
  0xd0003a5: 293.12 (SE +/- 1.05, N = 3)
NCNN 20230517, Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  0xd000390: 8.97 (SE +/- 0.14, N = 3; MIN: 8.63 / MAX: 27.27)
  0xd0003a5: 9.42 (SE +/- 0.09, N = 2; MIN: 9.21 / MAX: 32.8)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  0xd000390: 15.34 (SE +/- 0.12, N = 3; MIN: 14.63 / MAX: 39.65)
  0xd0003a5: 16.10 (SE +/- 0.25, N = 3; MIN: 15.43 / MAX: 48.02)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 129.81 (SE +/- 0.25, N = 3)
  0xd0003a5: 136.18 (SE +/- 0.60, N = 3)
ONNX Runtime 1.14, Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  0xd000390: 11.66 (SE +/- 0.11, N = 7)
  0xd0003a5: 11.15 (SE +/- 0.13, N = 15)
  (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
NCNN 20230517, Target: CPU - Model: blazeface (ms, Fewer Is Better)
  0xd000390: 4.35 (SE +/- 0.06, N = 3; MIN: 4.16 / MAX: 4.97)
  0xd0003a5: 4.54 (SE +/- 0.07, N = 3; MIN: 4.37 / MAX: 5.17)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 98.34 (SE +/- 0.12, N = 3)
  0xd0003a5: 94.26 (SE +/- 0.51, N = 3)
QMCPACK 3.16, Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  0xd000390: 39.56 (SE +/- 0.12, N = 3)
  0xd0003a5: 41.25 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 62.01 (SE +/- 0.23, N = 3)
  0xd0003a5: 64.44 (SE +/- 0.78, N = 3)
TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec, More Is Better)
  0xd000390: 309.63 (SE +/- 1.89, N = 3)
  0xd0003a5: 321.72 (SE +/- 2.10, N = 3)
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 644.44 (SE +/- 2.45, N = 3)
  0xd0003a5: 620.25 (SE +/- 7.47, N = 3)
Neural Magic DeepSparse 1.5, Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 405.97 (SE +/- 0.42, N = 3)
  0xd0003a5: 421.75 (SE +/- 2.15, N = 3)
NCNN 20230517, Target: CPU - Model: vision_transformer (ms, Fewer Is Better)
  0xd000390: 46.92 (SE +/- 1.17, N = 3; MIN: 43.11 / MAX: 881.49)
  0xd0003a5: 45.23 (SE +/- 0.31, N = 3; MIN: 43.43 / MAX: 73.39)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
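Several of the differences above are small relative to their reported standard errors. A rough way to screen for this, assuming the two runs are independent and that the reported SE values can be combined in quadrature, is sketched below. The function name `se_ratio` is an illustrative assumption; with as few as three runs per result this is a heuristic, not a formal significance test.

```python
import math


def se_ratio(a: float, se_a: float, b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined standard error."""
    combined = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    if combined == 0.0:
        # Both SE values were reported as 0.00; any difference is beyond resolution.
        return 0.0 if a == b else math.inf
    return abs(a - b) / combined


# NCNN vision_transformer (ms): a small gap with a large SE on one side
ncnn_vit = se_ratio(46.92, 1.17, 45.23, 0.31)
# OSPRay particle_volume/pathtracer (Items Per Second): a large gap with small SE
ospray_pt = se_ratio(150.28, 0.54, 136.85, 0.45)
print(f"NCNN vision_transformer: {ncnn_vit:.1f}  OSPRay pathtracer: {ospray_pt:.1f}")
```

A ratio near or below about 2 suggests the gap may be run-to-run noise, as with the NCNN vision_transformer result here, while the OSPRay pathtracer gap is many times its combined standard error.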
TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
  0xd000390: 83.89 (SE +/- 0.37, N = 3)
  0xd0003a5: 86.93 (SE +/- 0.80, N = 9)
Embree 4.1, Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better)
  0xd000390: 104.68 (SE +/- 0.43, N = 3; MIN: 101.9 / MAX: 109.48)
  0xd0003a5: 101.10 (SE +/- 0.19, N = 3; MIN: 98.59 / MAX: 105.53)
simdjson 2.0, Throughput Test: DistinctUserID (GB/s, More Is Better)
  0xd000390: 5.52 (SE +/- 0.02, N = 3)
  0xd0003a5: 5.71 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -O3
simdjson 2.0, Throughput Test: PartialTweets (GB/s, More Is Better)
  0xd000390: 4.62 (SE +/- 0.01, N = 3)
  0xd0003a5: 4.77 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -O3
SPECFEM3D 4.0, Model: Tomographic Model (Seconds, Fewer Is Better)
  0xd000390: 14.57 (SE +/- 0.02, N = 3)
  0xd0003a5: 14.15 (SE +/- 0.15, N = 3)
  (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (GFInst/s, More Is Better)
  0xd000390: 2526.89 (SE +/- 9.78, N = 3)
  0xd0003a5: 2601.21 (SE +/- 9.53, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
miniBUDE 20210901, Implementation: OpenMP - Input Deck: BM2 (Billion Interactions/s, More Is Better)
  0xd000390: 101.08 (SE +/- 0.39, N = 3)
  0xd0003a5: 104.05 (SE +/- 0.38, N = 3)
  (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128 (GFLOP/s, More Is Better)
  0xd000390: 154.73 (SE +/- 1.53, N = 5)
  0xd0003a5: 159.10 (SE +/- 1.01, N = 3)
  (CXX) g++ options: -O3
simdjson 2.0, Throughput Test: TopTweet (GB/s, More Is Better)
  0xd000390: 5.60 (SE +/- 0.03, N = 3)
  0xd0003a5: 5.75 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -O3
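All five simdjson throughput results reported above are higher on 0xd0003a5. To summarize them as a single figure, a geometric mean of the per-test ratios is a common choice for benchmark ratios. The sketch below uses the GB/s values reported above; the variable and function names are hypothetical.

```python
import math

# simdjson 2.0 throughput in GB/s: (0xd000390, 0xd0003a5)
simdjson_gbps = {
    "LargeRandom": (0.85, 0.96),
    "Kostya": (2.61, 2.87),
    "DistinctUserID": (5.52, 5.71),
    "PartialTweets": (4.62, 4.77),
    "TopTweet": (5.60, 5.75),
}


def geometric_mean(values) -> float:
    """Geometric mean of positive numbers, computed via logs for stability."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [new / old for old, new in simdjson_gbps.values()]
speedup = geometric_mean(ratios)
print(f"simdjson geometric-mean ratio (0xd0003a5 / 0xd000390): {speedup:.3f}")
```

The geometric mean is used instead of the arithmetic mean so that a ratio and its inverse cancel out, which keeps the summary independent of which revision is taken as the baseline.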
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
  0xd000390: 94.61 (SE +/- 0.07, N = 3)
  0xd0003a5: 96.94 (SE +/- 0.29, N = 3)
ONNX Runtime 1.14, Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  0xd000390: 16.71 (SE +/- 0.08, N = 3)
  0xd0003a5: 16.31 (SE +/- 0.26, N = 12)
  (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
  0xd000390: 422.12 (SE +/- 0.36, N = 3)
  0xd0003a5: 412.05 (SE +/- 1.24, N = 3)
VP9 libvpx Encoding 1.13, Speed: Speed 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 12.63 (SE +/- 0.12, N = 3)
  0xd0003a5: 12.33 (SE +/- 0.13, N = 3)
  (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11
NCNN 20230517, Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  0xd000390: 7.57 (SE +/- 0.10, N = 3; MIN: 7.28 / MAX: 30.26)
  0xd0003a5: 7.41 (SE +/- 0.04, N = 3; MIN: 7.15 / MAX: 31.17)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ONNX Runtime 1.14, Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  0xd000390: 221.02 (SE +/- 1.39, N = 3)
  0xd0003a5: 216.48 (SE +/- 1.91, N = 8)
  (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
  0xd000390: 317.27 (SE +/- 0.87, N = 3)
  0xd0003a5: 323.79 (SE +/- 0.43, N = 3)
QMCPACK 3.16, Input: FeCO6_b3lyp_gms (Total Execution Time - Seconds, Fewer Is Better)
  0xd000390: 268.56 (SE +/- 3.60, N = 3)
  0xd0003a5: 263.23 (SE +/- 2.19, N = 3)
  (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
libxsmm M N K: 128 OpenBenchmarking.org GFLOPS/s, More Is Better libxsmm 2-1.17-3645 M N K: 128 0xd000390 0xd0003a5 400 800 1200 1600 2000 SE +/- 32.53, N = 7 SE +/- 20.92, N = 3 1941.1 1978.9 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
HeFFTe - Highly Efficient FFT for Exascale Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 OpenBenchmarking.org GFLOP/s, More Is Better HeFFTe - Highly Efficient FFT for Exascale 2.3 Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128 0xd000390 0xd0003a5 40 80 120 160 200 SE +/- 0.92, N = 3 SE +/- 1.08, N = 3 195.20 198.87 1. (CXX) g++ options: -O3
Neural Magic DeepSparse Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.5 Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream 0xd000390 0xd0003a5 200 400 600 800 1000 SE +/- 1.41, N = 3 SE +/- 2.17, N = 3 1005.58 987.55
SVT-AV1 1.6 - Encoder Mode: Preset 12 - Input: Bosphorus 4K. Frames Per Second, More Is Better. 0xd000390: 180.97 (SE +/- 1.29, N = 3); 0xd0003a5: 177.72 (SE +/- 1.20, N = 3). 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Neural Magic DeepSparse 1.5 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 39.73 (SE +/- 0.06, N = 3); 0xd0003a5: 40.45 (SE +/- 0.10, N = 3).
OpenVINO 2022.3 - Model: Machine Translation EN To DE FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 79.47 (SE +/- 0.22, N = 3; MIN: 62.57 / MAX: 232.01); 0xd0003a5: 78.18 (SE +/- 0.19, N = 3; MIN: 66.08 / MAX: 194.38). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
NCNN 20230517 - Target: CPU - Model: yolov4-tiny. ms, Fewer Is Better. 0xd000390: 23.71 (SE +/- 0.22, N = 3; MIN: 22.57 / MAX: 46.11); 0xd0003a5: 24.10 (SE +/- 0.17, N = 3; MIN: 23.25 / MAX: 51.47). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
OpenVINO 2022.3 - Model: Machine Translation EN To DE FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 251.00 (SE +/- 0.68, N = 3); 0xd0003a5: 255.09 (SE +/- 0.63, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
SPECFEM3D 4.0 - Model: Mount St. Helens. Seconds, Fewer Is Better. 0xd000390: 13.15 (SE +/- 0.18, N = 3); 0xd0003a5: 12.95 (SE +/- 0.12, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
GROMACS 2023 - Implementation: MPI CPU - Input: water_GMX50_bare. Ns Per Day, More Is Better. 0xd000390: 9.234 (SE +/- 0.021, N = 3); 0xd0003a5: 9.094 (SE +/- 0.026, N = 3). 1. (CXX) g++ options: -O3
SPECFEM3D 4.0 - Model: Homogeneous Halfspace. Seconds, Fewer Is Better. 0xd000390: 18.02 (SE +/- 0.15, N = 3); 0xd0003a5: 17.75 (SE +/- 0.16, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 2039.63 (SE +/- 1.77, N = 3); 0xd0003a5: 2070.72 (SE +/- 3.75, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Person Vehicle Bike Detection FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 9.77 (SE +/- 0.01, N = 3; MIN: 7.83 / MAX: 20.01); 0xd0003a5: 9.63 (SE +/- 0.02, N = 3; MIN: 8.33 / MAX: 19.33). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Palabos 2.3 - Grid Size: 400. Mega Site Updates Per Second, More Is Better. 0xd000390: 388.48 (SE +/- 0.94, N = 3); 0xd0003a5: 393.84 (SE +/- 0.06, N = 3). 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
NCNN 20230517 - Target: CPU-v3-v3 - Model: mobilenet-v3. ms, Fewer Is Better. 0xd000390: 8.88 (SE +/- 0.05, N = 3; MIN: 8.69 / MAX: 32.33); 0xd0003a5: 8.76 (SE +/- 0.12, N = 3; MIN: 8.3 / MAX: 32.6). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256. GFLOP/s, More Is Better. 0xd000390: 46.35 (SE +/- 0.25, N = 3); 0xd0003a5: 46.98 (SE +/- 0.61, N = 3). 1. (CXX) g++ options: -O3
NCNN 20230517 - Target: CPU - Model: shufflenet-v2. ms, Fewer Is Better. 0xd000390: 9.89 (SE +/- 0.10, N = 3; MIN: 9.61 / MAX: 33.59); 0xd0003a5: 9.76 (SE +/- 0.13, N = 3; MIN: 9.32 / MAX: 33.4). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: ResNet-50. images/sec, More Is Better. 0xd000390: 85.97 (SE +/- 1.13, N = 3); 0xd0003a5: 84.84 (SE +/- 1.17, N = 3).
NCNN 20230517 - Target: CPU - Model: alexnet. ms, Fewer Is Better. 0xd000390: 5.39 (SE +/- 0.16, N = 3; MIN: 4.83 / MAX: 151.52); 0xd0003a5: 5.46 (SE +/- 0.15, N = 3; MIN: 5.01 / MAX: 29.14). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: mobilenet. ms, Fewer Is Better. 0xd000390: 15.46 (SE +/- 0.09, N = 3; MIN: 14.85 / MAX: 106.56); 0xd0003a5: 15.66 (SE +/- 0.07, N = 3; MIN: 15.24 / MAX: 38.75). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Remhos 1.0 - Test: Sample Remap Example. Seconds, Fewer Is Better. 0xd000390: 12.25 (SE +/- 0.03, N = 3); 0xd0003a5: 12.40 (SE +/- 0.04, N = 3). 1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
SVT-AV1 1.6 - Encoder Mode: Preset 13 - Input: Bosphorus 4K. Frames Per Second, More Is Better. 0xd000390: 175.10 (SE +/- 0.79, N = 3); 0xd0003a5: 177.20 (SE +/- 2.02, N = 3). 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256. GFLOP/s, More Is Better. 0xd000390: 93.75 (SE +/- 1.09, N = 3); 0xd0003a5: 94.86 (SE +/- 1.13, N = 3). 1. (CXX) g++ options: -O3
SVT-AV1 1.6 - Encoder Mode: Preset 8 - Input: Bosphorus 4K. Frames Per Second, More Is Better. 0xd000390: 67.17 (SE +/- 0.47, N = 3); 0xd0003a5: 66.46 (SE +/- 0.46, N = 3). 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256. GFLOP/s, More Is Better. 0xd000390: 224.42 (SE +/- 2.25, N = 3); 0xd0003a5: 226.78 (SE +/- 2.85, N = 3). 1. (CXX) g++ options: -O3
Palabos 2.3 - Grid Size: 500. Mega Site Updates Per Second, More Is Better. 0xd000390: 413.21 (SE +/- 0.42, N = 3); 0xd0003a5: 417.48 (SE +/- 0.72, N = 3). 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard. Inferences Per Second, More Is Better. 0xd000390: 39.12 (SE +/- 0.46, N = 3); 0xd0003a5: 38.72 (SE +/- 0.38, N = 15). 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 84.02 (SE +/- 0.32, N = 3); 0xd0003a5: 83.20 (SE +/- 0.18, N = 3).
libxsmm 2-1.17-3645 - M N K: 256. GFLOPS/s, More Is Better. 0xd000390: 594.6 (SE +/- 2.33, N = 3); 0xd0003a5: 600.2 (SE +/- 2.54, N = 3). 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256. GFLOP/s, More Is Better. 0xd000390: 101.98 (SE +/- 0.27, N = 3); 0xd0003a5: 102.92 (SE +/- 0.20, N = 3). 1. (CXX) g++ options: -O3
SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 4K. Frames Per Second, More Is Better. 0xd000390: 184.38 (SE +/- 2.03, N = 3); 0xd0003a5: 182.74 (SE +/- 0.38, N = 3). 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128. GFLOP/s, More Is Better. 0xd000390: 93.09 (SE +/- 0.15, N = 3); 0xd0003a5: 93.92 (SE +/- 0.28, N = 3). 1. (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 72.18 (SE +/- 0.15, N = 3); 0xd0003a5: 71.54 (SE +/- 0.12, N = 3).
NCNN 20230517 - Target: CPU-v2-v2 - Model: mobilenet-v2. ms, Fewer Is Better. 0xd000390: 8.03 (SE +/- 0.09, N = 3; MIN: 7.8 / MAX: 31.34); 0xd0003a5: 7.96 (SE +/- 0.04, N = 3; MIN: 7.75 / MAX: 31.75). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Neural Magic DeepSparse 1.5 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 474.69 (SE +/- 1.00, N = 3); 0xd0003a5: 478.85 (SE +/- 1.39, N = 3).
Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 922.93 (SE +/- 11.47, N = 3); 0xd0003a5: 930.75 (SE +/- 1.22, N = 3).
Neural Magic DeepSparse 1.5 - Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 43.30 (SE +/- 0.53, N = 3); 0xd0003a5: 42.94 (SE +/- 0.06, N = 3).
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1. GFInst/s, More Is Better. 0xd000390: 2353.39 (SE +/- 3.60, N = 3); 0xd0003a5: 2372.42 (SE +/- 10.90, N = 3). 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1. Billion Interactions/s, More Is Better. 0xd000390: 94.14 (SE +/- 0.14, N = 3); 0xd0003a5: 94.90 (SE +/- 0.44, N = 3). 1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
QMCPACK 3.16 - Input: Li2_STO_ae. Total Execution Time - Seconds, Fewer Is Better. 0xd000390: 124.23 (SE +/- 1.55, N = 3); 0xd0003a5: 123.26 (SE +/- 1.03, N = 3). 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
NCNN 20230517 - Target: CPU - Model: efficientnet-b0. ms, Fewer Is Better. 0xd000390: 11.62 (SE +/- 0.38, N = 3; MIN: 10.82 / MAX: 21.03); 0xd0003a5: 11.71 (SE +/- 0.11, N = 3; MIN: 11.15 / MAX: 19.87). 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
OpenVINO 2022.3 - Model: Weld Porosity Detection FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 33.89 (SE +/- 0.05, N = 3; MIN: 29.83 / MAX: 113.13); 0xd0003a5: 33.63 (SE +/- 0.06, N = 3; MIN: 29.54 / MAX: 113.55). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Weld Porosity Detection FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 2344.97 (SE +/- 3.01, N = 3); 0xd0003a5: 2362.84 (SE +/- 3.91, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
SPECFEM3D 4.0 - Model: Water-layered Halfspace. Seconds, Fewer Is Better. 0xd000390: 31.15 (SE +/- 0.24, N = 3); 0xd0003a5: 31.38 (SE +/- 0.31, N = 5). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Cpuminer-Opt 3.20.3 - Algorithm: Myriad-Groestl. kH/s, More Is Better. 0xd000390: 43127 (SE +/- 386.00, N = 15); 0xd0003a5: 43450 (SE +/- 406.32, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
libxsmm 2-1.17-3645 - M N K: 32. GFLOPS/s, More Is Better. 0xd000390: 604.7 (SE +/- 8.71, N = 15); 0xd0003a5: 609.2 (SE +/- 4.92, N = 3). 1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 1006.34 (SE +/- 0.56, N = 3); 0xd0003a5: 1013.82 (SE +/- 0.31, N = 3).
Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 173.79 (SE +/- 0.34, N = 3); 0xd0003a5: 172.56 (SE +/- 1.64, N = 3).
Neural Magic DeepSparse 1.5 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 39.70 (SE +/- 0.02, N = 3); 0xd0003a5: 39.41 (SE +/- 0.01, N = 3).
oneDNN 3.1 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. 0xd000390: 2.59271 (SE +/- 0.03265, N = 15; MIN: 1.91); 0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; MIN: 1.99). 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis. Seconds, Fewer Is Better. 0xd000390: 166.53 (SE +/- 1.36, N = 3); 0xd0003a5: 165.42 (SE +/- 0.98, N = 3). 1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
Cpuminer-Opt 3.20.3 - Algorithm: Skeincoin. kH/s, More Is Better. 0xd000390: 613333 (SE +/- 1652.89, N = 3); 0xd0003a5: 617130 (SE +/- 3788.20, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
Neural Magic DeepSparse 1.5 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 230.05 (SE +/- 0.46, N = 3); 0xd0003a5: 231.43 (SE +/- 2.37, N = 3).
Neural Magic DeepSparse 1.5 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 551.82 (SE +/- 0.62, N = 3); 0xd0003a5: 555.09 (SE +/- 1.12, N = 3).
OpenVINO 2022.3 - Model: Face Detection FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 24.04 (SE +/- 0.01, N = 3); 0xd0003a5: 24.18 (SE +/- 0.02, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU. FPS, More Is Better. 0xd000390: 4419.17 (SE +/- 3.29, N = 3); 0xd0003a5: 4442.98 (SE +/- 2.64, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Face Detection FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 827.51 (SE +/- 0.42, N = 3; MIN: 628.21 / MAX: 980.48); 0xd0003a5: 823.09 (SE +/- 0.71, N = 3; MIN: 550.41 / MAX: 926.21). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Blender 3.6 - Blend File: Fishy Cat - Compute: CPU-Only. Seconds, Fewer Is Better. 0xd000390: 30.74 (SE +/- 0.06, N = 3); 0xd0003a5: 30.90 (SE +/- 0.02, N = 3).
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. 0xd000390: 524.38 (SE +/- 7.19, N = 3; MIN: 499.75); 0xd0003a5: 521.74 (SE +/- 2.82, N = 3; MIN: 505.72). 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
CloverLeaf Lagrangian-Eulerian Hydrodynamics. Seconds, Fewer Is Better. 0xd000390: 12.04 (SE +/- 0.06, N = 3); 0xd0003a5: 11.98 (SE +/- 0.09, N = 3). 1. (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp
Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite. kH/s, More Is Better. 0xd000390: 921730 (SE +/- 3352.41, N = 3); 0xd0003a5: 926277 (SE +/- 1690.96, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Faster. Frames Per Second, More Is Better. 0xd000390: 10.36 (SE +/- 0.09, N = 3); 0xd0003a5: 10.42 (SE +/- 0.06, N = 3). 1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Blender 3.6 - Blend File: BMW27 - Compute: CPU-Only. Seconds, Fewer Is Better. 0xd000390: 23.83 (SE +/- 0.06, N = 3); 0xd0003a5: 23.72 (SE +/- 0.03, N = 3).
OpenVINO 2022.3 - Model: Vehicle Detection FP16-INT8 - Device: CPU. ms, Fewer Is Better. 0xd000390: 4.51 (SE +/- 0.00, N = 3; MIN: 4.02 / MAX: 13.95); 0xd0003a5: 4.49 (SE +/- 0.00, N = 3; MIN: 4.04 / MAX: 15.85). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
SPECFEM3D 4.0 - Model: Layered Halfspace. Seconds, Fewer Is Better. 0xd000390: 29.50 (SE +/- 0.05, N = 3); 0xd0003a5: 29.37 (SE +/- 0.18, N = 3). 1. (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 1121.56 (SE +/- 0.88, N = 3); 0xd0003a5: 1117.09 (SE +/- 1.14, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Vehicle Detection FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 17.80 (SE +/- 0.01, N = 3; MIN: 12.83 / MAX: 32.64); 0xd0003a5: 17.87 (SE +/- 0.02, N = 3; MIN: 12.24 / MAX: 38.45). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream. ms/batch, Fewer Is Better. 0xd000390: 551.12 (SE +/- 0.77, N = 3); 0xd0003a5: 553.27 (SE +/- 1.19, N = 3).
OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU. ms, Fewer Is Better. 0xd000390: 1490.76 (SE +/- 2.35, N = 3; MIN: 1074.22 / MAX: 1692.48); 0xd0003a5: 1496.19 (SE +/- 0.81, N = 3; MIN: 1043.7 / MAX: 1711.44). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits. kH/s, More Is Better. 0xd000390: 421660 (SE +/- 313.42, N = 3); 0xd0003a5: 423130 (SE +/- 860.95, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
Cpuminer-Opt 3.20.3 - Algorithm: Deepcoin. kH/s, More Is Better. 0xd000390: 64677 (SE +/- 89.69, N = 3); 0xd0003a5: 64897 (SE +/- 187.02, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
OpenVINO 2022.3 - Model: Person Detection FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 13.29 (SE +/- 0.02, N = 3); 0xd0003a5: 13.25 (SE +/- 0.01, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
VVenC 1.9 - Video Input: Bosphorus 4K - Video Preset: Fast. Frames Per Second, More Is Better. 0xd000390: 5.722 (SE +/- 0.033, N = 3); 0xd0003a5: 5.705 (SE +/- 0.029, N = 3). 1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Neural Magic DeepSparse 1.5 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream. items/sec, More Is Better. 0xd000390: 72.07 (SE +/- 0.09, N = 3); 0xd0003a5: 71.89 (SE +/- 0.10, N = 3).
OpenVINO 2022.3 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU. FPS, More Is Better. 0xd000390: 9396.52 (SE +/- 3.74, N = 3); 0xd0003a5: 9419.77 (SE +/- 2.19, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
oneDNN 3.1 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU. ms, Fewer Is Better. 0xd000390: 3.90360 (SE +/- 0.00202, N = 3; MIN: 3.68); 0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; MIN: 3.69). 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
dav1d 1.2.1 - Video Input: Chimera 1080p. FPS, More Is Better. 0xd000390: 515.81 (SE +/- 0.80, N = 3); 0xd0003a5: 514.58 (SE +/- 0.51, N = 3). 1. (CC) gcc options: -pthread -lm
OpenVINO 2022.3 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU. ms, Fewer Is Better. 0xd000390: 8.50 (SE +/- 0.00, N = 3; MIN: 7.17 / MAX: 18.39); 0xd0003a5: 8.48 (SE +/- 0.00, N = 3; MIN: 7.15 / MAX: 22.53). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Laghos 3.1 - Test: Triple Point Problem. Major Kernels Total Rate, More Is Better. 0xd000390: 256.27 (SE +/- 0.32, N = 3); 0xd0003a5: 256.87 (SE +/- 0.94, N = 3). 1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU. FPS, More Is Better. 0xd000390: 67604.00 (SE +/- 17.48, N = 3); 0xd0003a5: 67754.34 (SE +/- 29.72, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 4K. Frames Per Second, More Is Better. 0xd000390: 138.75 (SE +/- 0.60, N = 3); 0xd0003a5: 138.49 (SE +/- 0.44, N = 3). 1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
dav1d 1.2.1 - Video Input: Summer Nature 4K. FPS, More Is Better. 0xd000390: 281.36 (SE +/- 1.06, N = 3); 0xd0003a5: 280.84 (SE +/- 0.81, N = 3). 1. (CC) gcc options: -pthread -lm
Xcompact3d Incompact3d 2021-03-11 - Input: input.i3d 193 Cells Per Direction. Seconds, Fewer Is Better. 0xd000390: 11.02 (SE +/- 0.02, N = 3); 0xd0003a5: 11.00 (SE +/- 0.02, N = 3). 1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU. FPS, More Is Better. 0xd000390: 59274.06 (SE +/- 17.38, N = 3); 0xd0003a5: 59377.96 (SE +/- 18.68, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU. ms, Fewer Is Better. 0xd000390: 1517.69 (SE +/- 0.52, N = 3; MIN: 1081.6 / MAX: 1690.6); 0xd0003a5: 1519.84 (SE +/- 0.84, N = 3; MIN: 1074.08 / MAX: 1721.59). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Face Detection FP16-INT8 - Device: CPU. ms, Fewer Is Better. 0xd000390: 209.31 (SE +/- 0.03, N = 3; MIN: 160.46 / MAX: 249.09); 0xd0003a5: 209.02 (SE +/- 0.14, N = 3; MIN: 152.83 / MAX: 234.28). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Face Detection FP16-INT8 - Device: CPU. FPS, More Is Better. 0xd000390: 95.42 (SE +/- 0.01, N = 3); 0xd0003a5: 95.54 (SE +/- 0.07, N = 3). 1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
Palabos 2.3 - Grid Size: 100. Mega Site Updates Per Second, More Is Better. 0xd000390: 312.20 (SE +/- 0.89, N = 3); 0xd0003a5: 312.53 (SE +/- 0.20, N = 3). 1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Cpuminer-Opt 3.20.3 - Algorithm: scrypt. kH/s, More Is Better. 0xd000390: 2319.31 (SE +/- 1.08, N = 3); 0xd0003a5: 2321.74 (SE +/- 8.35, N = 3). 1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
Cpuminer-Opt 3.20.3 - Algorithm: Blake-2 S
  kH/s, More Is Better
  0xd000390: 4462327 (SE +/- 8325.80, N = 3)
  0xd0003a5: 4466653 (SE +/- 8676.85, N = 3)
  1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
SVT-HEVC 1.5.0 - Tuning: 1 - Input: Bosphorus 4K
  Frames Per Second, More Is Better
  0xd000390: 10.46 (SE +/- 0.06, N = 3)
  0xd0003a5: 10.45 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt
OpenVINO 2022.3 - Model: Person Detection FP32 - Device: CPU
  FPS, More Is Better
  0xd000390: 13.03 (SE +/- 0.00, N = 3)
  0xd0003a5: 13.04 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
oneDNN 3.1 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
  ms, Fewer Is Better
  0xd000390: 3.62526 (SE +/- 0.00896, N = 3; MIN: 3.54)
  0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; MIN: 3.53)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Cpuminer-Opt 3.20.3 - Algorithm: Triple SHA-256, Onecoin
  kH/s, More Is Better
  0xd000390: 1332237 (SE +/- 7105.40, N = 3)
  0xd0003a5: 1333117 (SE +/- 7235.75, N = 3)
  1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh
  Major Kernels Total Rate, More Is Better
  0xd000390: 385.89 (SE +/- 0.23, N = 3)
  0xd0003a5: 386.08 (SE +/- 0.80, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
Cpuminer-Opt 3.20.3 - Algorithm: Magi
  kH/s, More Is Better
  0xd000390: 2309.47 (SE +/- 3.91, N = 3)
  0xd0003a5: 2308.66 (SE +/- 1.08, N = 3)
  1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
oneDNN 3.1 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
  ms, Fewer Is Better
  0xd000390: 2.06936 (SE +/- 0.00038, N = 3; MIN: 2.03)
  0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; MIN: 2.03)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Cpuminer-Opt 3.20.3 - Algorithm: x25x
  kH/s, More Is Better
  0xd000390: 2659.17 (SE +/- 4.75, N = 3)
  0xd0003a5: 2659.55 (SE +/- 5.54, N = 3)
  1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
Intel Open Image Denoise 2.0 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
  Images / Sec, More Is Better
  0xd000390: 1.46 (SE +/- 0.00, N = 3)
  0xd0003a5: 1.46 (SE +/- 0.00, N = 3)
Intel Open Image Denoise 2.0 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
  Images / Sec, More Is Better
  0xd000390: 3.03 (SE +/- 0.00, N = 3)
  0xd0003a5: 3.03 (SE +/- 0.00, N = 3)
OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
  ms, Fewer Is Better
  0xd000390: 1.16 (SE +/- 0.00, N = 3; MIN: 0.88 / MAX: 12.61)
  0xd0003a5: 1.16 (SE +/- 0.00, N = 3; MIN: 0.86 / MAX: 17.98)
  1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
OpenVINO 2022.3 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
  ms, Fewer Is Better
  0xd000390: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13)
  0xd0003a5: 1.33 (SE +/- 0.00, N = 3; MIN: 0.97 / MAX: 13.43)
  1. (CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
NCNN 20230517 - Target: CPU - Model: FastestDet
  ms, Fewer Is Better
  0xd000390: 10.20 (SE +/- 0.64, N = 3; MIN: 9.01 / MAX: 500.17)
  0xd0003a5: 9.71 (SE +/- 0.07, N = 3; MIN: 9.22 / MAX: 27.29)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: regnety_400m
  ms, Fewer Is Better
  0xd000390: 45.54 (SE +/- 7.69, N = 3; MIN: 36.01 / MAX: 3343.68)
  0xd0003a5: 38.85 (SE +/- 0.50, N = 3; MIN: 37.13 / MAX: 233.44)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: CPU - Model: resnet50
  ms, Fewer Is Better
  0xd000390: 17.15 (SE +/- 0.52, N = 3; MIN: 16.19 / MAX: 41.83)
  0xd0003a5: 18.51 (SE +/- 0.69, N = 3; MIN: 17.32 / MAX: 299.93)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
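The NCNN results above show larger gaps between the two microcode revisions than most other tests, but also much larger standard errors. As a rough way to read them, the following Python sketch computes the percent change from 0xd000390 to 0xd0003a5 and compares the absolute difference against the combined standard error. The helper name `compare` and the use of a root-sum-of-squares combined SE are assumptions made for illustration and are not part of the Phoronix Test Suite output; with N = 3 this is a heuristic, not a formal significance test.

```python
import math

def compare(mean_a, se_a, mean_b, se_b):
    """Return (percent change of b relative to a, |difference| / combined SE).

    Hypothetical helper: combines the two standard errors as
    sqrt(se_a^2 + se_b^2), assuming the two runs are independent.
    """
    diff = mean_b - mean_a
    pct = 100.0 * diff / mean_a
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    ratio = abs(diff) / combined_se if combined_se > 0 else math.inf
    return pct, ratio

# NCNN values copied from the listing above (ms, fewer is better):
# (mean 0xd000390, SE 0xd000390, mean 0xd0003a5, SE 0xd0003a5)
results = {
    "FastestDet": (10.20, 0.64, 9.71, 0.07),
    "regnety_400m": (45.54, 7.69, 38.85, 0.50),
    "resnet50": (17.15, 0.52, 18.51, 0.69),
}

for name, values in results.items():
    pct, ratio = compare(*values)
    print(f"{name}: {pct:+.1f}% change, {ratio:.2f}x combined SE")
```

Under these assumptions, none of the three differences reaches twice the combined standard error, so they are hard to distinguish from run-to-run noise.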
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 6.31428 (SE +/- 0.00502, N = 3)
  0xd0003a5: 6.71968 (SE +/- 0.43315, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  0xd000390: 158.36 (SE +/- 0.13, N = 3)
  0xd0003a5: 158.01 (SE +/- 10.25, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 4.52403 (SE +/- 0.02854, N = 3)
  0xd0003a5: 4.62085 (SE +/- 0.04028, N = 8)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 25.57 (SE +/- 0.30, N = 3)
  0xd0003a5: 25.86 (SE +/- 0.26, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 110.32 (SE +/- 1.29, N = 15)
  0xd0003a5: 117.55 (SE +/- 3.29, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  0xd000390: 9.08067 (SE +/- 0.10283, N = 15)
  0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 1.43407 (SE +/- 0.00633, N = 3)
  0xd0003a5: 1.51029 (SE +/- 0.02722, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  0xd000390: 696.73 (SE +/- 3.08, N = 3)
  0xd0003a5: 664.54 (SE +/- 12.01, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 59.84 (SE +/- 0.28, N = 3)
  0xd0003a5: 61.51 (SE +/- 1.14, N = 12)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 85.80 (SE +/- 0.79, N = 7)
  0xd0003a5: 89.83 (SE +/- 1.07, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  0xd000390: 5.54783 (SE +/- 0.00245, N = 3)
  0xd0003a5: 5.28095 (SE +/- 0.11956, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  0xd000390: 180.16 (SE +/- 0.08, N = 3)
  0xd0003a5: 190.57 (SE +/- 4.37, N = 15)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
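Each ONNX Runtime model is reported twice, once as inference time in milliseconds and once as inferences per second. The short Python sketch below converts the GPT-2 latency figures into a throughput via 1000 / ms and prints them next to the reported throughput. The function name `latency_ms_to_ips` is hypothetical, and the idea that the two metrics are averaged independently across runs (rather than one being derived from the other) is an assumption, not something the export states.

```python
def latency_ms_to_ips(latency_ms):
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms

# GPT-2 values copied from the two ONNX Runtime entries above:
# (reported latency in ms, reported inferences per second)
gpt2 = {
    "0xd000390": (5.54783, 180.16),
    "0xd0003a5": (5.28095, 190.57),
}

for microcode, (latency_ms, reported_ips) in gpt2.items():
    derived_ips = latency_ms_to_ips(latency_ms)
    gap = reported_ips - derived_ips
    print(f"{microcode}: derived {derived_ips:.2f}, "
          f"reported {reported_ips:.2f}, gap {gap:+.2f}")
```

The derived and reported values agree closely for the N = 3 run and less closely for the noisier N = 15 run, which is consistent with (but does not prove) separate averaging of the two metrics.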
Cpuminer-Opt 3.20.3 - Algorithm: Garlicoin
  kH/s, More Is Better
  0xd000390: 29203.00 (SE +/- 330.17, N = 3)
  0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12)
  1. (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp
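The Garlicoin result has the largest gap between the two microcode revisions in this part of the listing, and also a very large standard error on the 0xd0003a5 side. A minimal Python sketch, assuming the reported SE is the sample standard deviation divided by sqrt(N), recovers the implied per-run standard deviation and the coefficient of variation. The helper name `spread_from_se` is hypothetical.

```python
import math

def spread_from_se(mean, se, n):
    """Return (standard deviation, coefficient of variation in percent).

    Assumes SE = SD / sqrt(N), so SD = SE * sqrt(N).
    """
    sd = se * math.sqrt(n)
    cv_percent = 100.0 * sd / mean
    return sd, cv_percent

# Garlicoin values copied from the entry above (kH/s): (mean, SE, N)
garlicoin = {
    "0xd000390": (29203.00, 330.17, 3),
    "0xd0003a5": (22086.25, 3833.17, 12),
}

for microcode, (mean, se, n) in garlicoin.items():
    sd, cv = spread_from_se(mean, se, n)
    print(f"{microcode}: SD ~ {sd:.0f} kH/s, CV ~ {cv:.1f}%")
```

Under that assumption, the 0xd0003a5 runs vary by roughly 60% of their mean versus about 2% for 0xd000390, so the lower average more likely reflects unstable runs than a steady slowdown.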
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
  ms, Fewer Is Better
  0xd000390: 832.45 (SE +/- 16.59, N = 15; MIN: 714.88)
  0xd0003a5: 816.51 (SE +/- 14.20, N = 12; MIN: 723.44)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread
Phoronix Test Suite v10.8.5