ml-benchmark2: AMD Ryzen 9 7950X3D 16-Core testing with an ASUS PRIME X670E-PRO WIFI (1813 BIOS) and an MSI NVIDIA GeForce RTX 3060 12GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2312269-NE-MLBENCHMA88&grs.
ml-benchmark2 - system details for ml-benchmark2-12-25-23:
Processor: AMD Ryzen 9 7950X3D 16-Core @ 4.20GHz (16 Cores / 32 Threads)
Motherboard: ASUS PRIME X670E-PRO WIFI (1813 BIOS)
Chipset: AMD Device 14d8
Memory: 128GB
Disk: 4001GB CT4000P3PSSD8 + 1024GB SPCC M.2 PCIe SSD
Graphics: MSI NVIDIA GeForce RTX 3060 12GB
Audio: NVIDIA Device 228e
Monitor: LC27T55
Network: Realtek RTL8125 2.5GbE + MEDIATEK Device 0608
OS: Ubuntu 22.04
Kernel: 6.2.0-39-generic (x86_64)
Desktop: GNOME Shell 42.9
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 535.129.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.2.147
Vulkan: 1.3.242
Compiler: GCC 11.4.0 + CUDA 12.3
File-System: ext4
Screen Resolution: 1920x1080
OpenBenchmarking.org notes:
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
- CPU Microcode: 0xa601206
- GLAMOR
- BAR1 / Visible vRAM Size: 16384 MiB
- vBIOS Version: 94.06.2f.00.98
- GPU Compute Cores: 3584
- Python 3.10.12
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
ml-benchmark2 whisper-cpp: ggml-medium.en - 2016 State of the Union whisper-cpp: ggml-small.en - 2016 State of the Union whisper-cpp: ggml-base.en - 2016 State of the Union scikit-learn: Sparse Rand Projections / 100 Iterations scikit-learn: 20 Newsgroups / Logistic Regression scikit-learn: Isotonic / Perturbed Logarithm scikit-learn: Covertype Dataset Benchmark scikit-learn: Isotonic / Pathological scikit-learn: Plot Incremental PCA scikit-learn: Isotonic / Logistic scikit-learn: Feature Expansions scikit-learn: Plot Hierarchical scikit-learn: Text Vectorizers scikit-learn: Plot Neighbors scikit-learn: Plot Ward scikit-learn: Sparsify scikit-learn: Lasso scikit-learn: Tree mlpack: scikit_linearridgeregression mlpack: scikit_svm mlpack: scikit_qda mlpack: scikit_ica ai-benchmark: Device AI Score ai-benchmark: Device Training Score ai-benchmark: Device Inference Score numenta-nab: Contextual Anomaly Detector OSE numenta-nab: Bayesian Changepoint numenta-nab: Earthgecko Skyline numenta-nab: Windowed Gaussian numenta-nab: Relative Entropy numenta-nab: KNN CAD openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face 
Detection Retail FP16-INT8 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU tnn: CPU - SqueezeNet v1.1 tnn: CPU - SqueezeNet v2 tnn: CPU - MobileNet v2 tnn: CPU - DenseNet ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - vgg16 ncnn: CPU - vision_transformer ncnn: CPU - regnety_400m ncnn: CPU - yolov4-tiny ncnn: CPU - resnet50 ncnn: CPU - vgg16 ncnn: CPU - googlenet mnn: inception-v3 mnn: mobilenet-v1-1.0 mnn: MobileNetV2_224 mnn: SqueezeNetV1.0 mnn: resnet-v2-50 mnn: mobilenetV3 mnn: nasnet caffe: GoogleNet - CPU - 1000 caffe: GoogleNet - CPU - 200 caffe: GoogleNet - CPU - 100 caffe: AlexNet - CPU - 1000 caffe: AlexNet - CPU - 200 caffe: AlexNet - CPU - 100 spacy: en_core_web_trf spacy: en_core_web_lg deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP 
Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - 
Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream tensorflow: CPU - 512 - ResNet-50 tensorflow: CPU - 512 - GoogLeNet tensorflow: CPU - 256 - ResNet-50 tensorflow: CPU - 256 - GoogLeNet tensorflow: CPU - 64 - ResNet-50 tensorflow: CPU - 64 - GoogLeNet tensorflow: CPU - 32 - ResNet-50 tensorflow: CPU - 32 - GoogLeNet tensorflow: CPU - 16 - ResNet-50 tensorflow: CPU - 16 - GoogLeNet tensorflow: CPU - 512 - AlexNet tensorflow: CPU - 256 - AlexNet tensorflow: CPU - 64 - AlexNet tensorflow: CPU - 512 - VGG-16 tensorflow: CPU - 32 - AlexNet tensorflow: CPU - 256 - VGG-16 tensorflow: CPU - 16 - AlexNet tensorflow: CPU - 64 - VGG-16 tensorflow: CPU - 32 - VGG-16 tensorflow: 
CPU - 16 - VGG-16 pytorch: CPU - 512 - Efficientnet_v2_l pytorch: CPU - 256 - Efficientnet_v2_l pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 1 - Efficientnet_v2_l pytorch: CPU - 512 - ResNet-152 pytorch: CPU - 256 - ResNet-152 pytorch: CPU - 64 - ResNet-152 pytorch: CPU - 512 - ResNet-50 pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 16 - ResNet-152 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 32 - ResNet-50 pytorch: CPU - 16 - ResNet-50 pytorch: CPU - 1 - ResNet-152 pytorch: CPU - 1 - ResNet-50 tensorflow-lite: Inception ResNet V2 tensorflow-lite: Mobilenet Quant tensorflow-lite: Mobilenet Float tensorflow-lite: NASNet Mobile tensorflow-lite: Inception V4 tensorflow-lite: SqueezeNet rnnoise: rbenchmark: deepspeech: CPU numpy: onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Inference - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Training - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - f32 - CPU onednn: Recurrent Neural Network Training - f32 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU onednn: Convolution Batch Shapes Auto - f32 - CPU onednn: IP Shapes 3D - bf16bf16bf16 - CPU onednn: IP Shapes 1D - bf16bf16bf16 - CPU onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 1D - f32 - CPU lczero: BLAS opencv: DNN - Deep Neural Network ncnn: Vulkan GPU - FastestDet ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - 
resnet18 ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU - mobilenet ncnn: CPU - FastestDet ncnn: CPU - squeezenet_ssd ncnn: CPU - alexnet ncnn: CPU - resnet18 ncnn: CPU - blazeface ncnn: CPU - efficientnet-b0 ncnn: CPU - mnasnet ncnn: CPU - shufflenet-v2 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU-v2-v2 - mobilenet-v2 ncnn: CPU - mobilenet mnn: squeezenetv1.1 onednn: IP Shapes 3D - u8s8f32 - CPU onednn: IP Shapes 1D - u8s8f32 - CPU shoc: OpenCL - S3D ml-benchmark2-12-25-23 807.17623 289.73641 96.97771 401.306 28.219 1384.028 270.959 3114.638 49.606 1123.364 90.526 138.998 41.823 123.612 40.579 71.653 237.287 37.432 1.08 14.12 31.30 29.93 6524 3592 2932 25.412 12.749 51.837 4.713 8.048 100.046 0.26 58961.44 26.55 602.34 0.38 41501.98 21.72 736.26 5.06 1579.58 5.95 2688.84 65.69 121.71 15.00 532.88 3.28 4876.38 11.53 1386.49 4.79 1667.89 18.10 441.60 2.29 3485.66 301.34 26.49 8.04 994.76 91.56 87.33 91.06 87.80 581.79 13.70 179.657 42.143 187.479 2134.430 42.15 11.29 17.22 13.39 35.09 42.19 11.06 17.12 13.00 35.80 10.38 22.381 2.640 3.723 4.402 11.360 1.818 11.960 626915 124421 63606 243445 49227 24893 2354 19452 56.8061 17.6016 356.9725 22.3806 9.9862 100.1107 18.6647 428.2324 37.7800 26.4575 228.4604 34.9310 9.1055 109.7687 42.7042 187.2701 10.5009 95.2152 62.4189 128.0944 5.2028 192.1292 29.5869 270.2553 56.8125 17.5992 289.5002 27.6321 10.6189 94.1453 66.4627 120.3088 0.8513 1172.1452 3.0495 2614.2430 5.2053 192.0351 29.5978 270.1520 3.6930 270.6650 8.5518 934.0846 56.9632 17.5530 361.4936 22.0683 34.09 110.88 34.24 112.12 35.45 120.40 36.10 127.29 35.90 127.79 400.77 386.97 297.39 19.13 215.96 19.00 140.39 18.48 17.71 16.48 10.42 10.47 10.43 10.45 10.29 13.70 17.89 17.71 17.98 44.93 17.78 45.77 17.88 45.17 45.71 45.48 26.53 67.87 22832.8 1985.59 1293.76 10670.2 22388.9 
1798.07 14.603 0.1133 52.49225 721.25 702.315 1343.53 704.857 1.61432 2.60894 1.49610 1367.75 707.837 1358.07 0.731894 0.505580 5.08690 2.88833 3.43419 5.27943 1.49285 0.772163 3.81408 2.13148 175 31437 4.76 8.79 5.20 6.64 10.69 1.60 5.56 3.97 4.36 4.40 4.29 10.67 5.08 8.75 5.09 6.70 1.70 5.30 3.86 4.42 4.25 4.52 10.64 2.770 0.325432 0.542833 OpenBenchmarking.org
Whisper.cpp 1.4 (Seconds, Fewer Is Better) - ml-benchmark2-12-25-23
Model: ggml-medium.en - Input: 2016 State of the Union: 807.18 (SE +/- 9.64, N = 3)
Model: ggml-small.en - Input: 2016 State of the Union: 289.74 (SE +/- 3.07, N = 9)
Model: ggml-base.en - Input: 2016 State of the Union: 96.98 (SE +/- 1.08, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
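Each result in this export is reported as a mean with "SE +/- x, N = y": N timed runs summarized by their mean and, consistent with the usual convention, the standard error of the mean. A minimal sketch of that summary, using hypothetical per-run times (the export does not include the individual samples):

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical per-run times (seconds) for a 3-run benchmark; the real
# per-run samples are not part of this export.
runs = [797.5, 807.2, 816.8]
print(f"{statistics.mean(runs):.2f} (SE +/- {standard_error(runs):.2f}, N = {len(runs)})")
```

A smaller SE at the same N (compare the ggml-base.en run's SE of 1.08 over 15 runs) indicates a tighter spread across repetitions, not a faster result.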
Scikit-Learn 1.2.2 (Seconds, Fewer Is Better) - ml-benchmark2-12-25-23
Benchmark: Sparse Random Projections / 100 Iterations: 401.31 (SE +/- 2.81, N = 3)
Benchmark: 20 Newsgroups / Logistic Regression: 28.22 (SE +/- 0.07, N = 3)
Benchmark: Isotonic / Perturbed Logarithm: 1384.03 (SE +/- 6.24, N = 3)
Benchmark: Covertype Dataset Benchmark: 270.96 (SE +/- 2.26, N = 3)
Benchmark: Isotonic / Pathological: 3114.64 (SE +/- 12.71, N = 3)
Benchmark: Plot Incremental PCA: 49.61 (SE +/- 0.66, N = 15)
Benchmark: Isotonic / Logistic: 1123.36 (SE +/- 4.14, N = 3)
Benchmark: Feature Expansions: 90.53 (SE +/- 0.70, N = 3)
Benchmark: Plot Hierarchical: 139.00 (SE +/- 0.48, N = 3)
Benchmark: Text Vectorizers: 41.82 (SE +/- 0.09, N = 3)
Benchmark: Plot Neighbors: 123.61 (SE +/- 1.25, N = 12)
Benchmark: Plot Ward: 40.58 (SE +/- 0.09, N = 3)
Benchmark: Sparsify: 71.65 (SE +/- 1.01, N = 3)
Benchmark: Lasso: 237.29 (SE +/- 1.50, N = 3)
Benchmark: Tree: 37.43 (SE +/- 0.42, N = 3)
1. (F9X) gfortran options: -O3 -fopenmp -fno-tree-vectorize -lm -lpthread -lgfortran -lc
Mlpack Benchmark (Seconds, Fewer Is Better) - ml-benchmark2-12-25-23
Benchmark: scikit_linearridgeregression: 1.08 (SE +/- 0.01, N = 15)
Benchmark: scikit_svm: 14.12 (SE +/- 0.18, N = 3)
Benchmark: scikit_qda: 31.30 (SE +/- 0.13, N = 3)
Benchmark: scikit_ica: 29.93 (SE +/- 0.04, N = 3)
AI Benchmark Alpha 0.1.2 (Score, More Is Better) - ml-benchmark2-12-25-23
Device AI Score: 6524
Device Training Score: 3592
Device Inference Score: 2932
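The three AI Benchmark Alpha numbers are internally consistent: the Device AI Score equals the training score plus the inference score (3592 + 2932 = 6524). A quick check; whether AI Benchmark Alpha always defines the combined score as this sum is an assumption, but it holds for the figures reported here:

```python
# Scores reported above for ml-benchmark2-12-25-23.
training_score = 3592
inference_score = 2932
device_ai_score = 6524

# The combined score matches the sum of its two components for this run.
assert training_score + inference_score == device_ai_score
print("Device AI Score = Training + Inference for this result")
```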
Numenta Anomaly Benchmark 1.1 (Seconds, Fewer Is Better) - ml-benchmark2-12-25-23
Detector: Contextual Anomaly Detector OSE: 25.41 (SE +/- 0.27, N = 4)
Detector: Bayesian Changepoint: 12.75 (SE +/- 0.14, N = 4)
Detector: Earthgecko Skyline: 51.84 (SE +/- 0.56, N = 3)
Detector: Windowed Gaussian: 4.713 (SE +/- 0.049, N = 15)
Detector: Relative Entropy: 8.048 (SE +/- 0.034, N = 3)
Detector: KNN CAD: 100.05 (SE +/- 0.47, N = 3)
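Suites with runtimes this spread out (about 4.7 s for Windowed Gaussian versus about 3115 s for Isotonic / Pathological) are usually compared across systems with a geometric mean of per-test ratios, so no single long-running test dominates the summary. A minimal sketch, using hypothetical times for an imagined second system, since this export contains only one result file:

```python
from statistics import geometric_mean

# "Fewer is better" times (seconds) for three of the tests above, plus
# hypothetical times for an imagined second system.
this_system = {"Windowed Gaussian": 4.713, "KNN CAD": 100.05, "Earthgecko Skyline": 51.84}
other_system = {"Windowed Gaussian": 5.2, "KNN CAD": 90.0, "Earthgecko Skyline": 60.0}

# Per-test speedup of this system over the other; the geometric mean weighs
# each ratio equally regardless of each test's absolute runtime.
speedups = [other_system[t] / this_system[t] for t in this_system]
print(f"geometric mean speedup: {geometric_mean(speedups):.3f}")
```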
OpenVINO 2023.2.dev - Device: CPU - ml-benchmark2-12-25-23 (latency in ms, Fewer Is Better; throughput in FPS, More Is Better)
Model: Age Gender Recognition Retail 0013 FP16-INT8: 0.26 ms (SE +/- 0.00, N = 3; MIN: 0.16 / MAX: 134.43); 58961.44 FPS (SE +/- 14.39, N = 3)
Model: Handwritten English Recognition FP16-INT8: 26.55 ms (SE +/- 0.05, N = 3; MIN: 18.75 / MAX: 163.99); 602.34 FPS (SE +/- 1.20, N = 3)
Model: Age Gender Recognition Retail 0013 FP16: 0.38 ms (SE +/- 0.00, N = 3; MIN: 0.21 / MAX: 73.29); 41501.98 FPS (SE +/- 18.73, N = 3)
Model: Handwritten English Recognition FP16: 21.72 ms (SE +/- 0.09, N = 3; MIN: 14.79 / MAX: 129.44); 736.26 FPS (SE +/- 3.04, N = 3)
Model: Person Vehicle Bike Detection FP16: 5.06 ms (SE +/- 0.01, N = 3; MIN: 3.44 / MAX: 59.53); 1579.58 FPS (SE +/- 3.61, N = 3)
Model: Weld Porosity Detection FP16-INT8: 5.95 ms (SE +/- 0.00, N = 3; MIN: 3.2 / MAX: 99.8); 2688.84 FPS (SE +/- 1.97, N = 3)
Model: Machine Translation EN To DE FP16: 65.69 ms (SE +/- 0.20, N = 3; MIN: 28.67 / MAX: 156.27); 121.71 FPS (SE +/- 0.37, N = 3)
Model: Road Segmentation ADAS FP16-INT8: 15.00 ms (SE +/- 0.05, N = 3; MIN: 8.89 / MAX: 105.47); 532.88 FPS (SE +/- 1.61, N = 3)
Model: Face Detection Retail FP16-INT8: 3.28 ms (SE +/- 0.00, N = 3; MIN: 2.08 / MAX: 88.57); 4876.38 FPS (SE +/- 3.56, N = 3)
Model: Weld Porosity Detection FP16: 11.53 ms (SE +/- 0.01, N = 3; MIN: 5.99 / MAX: 113.69); 1386.49 FPS (SE +/- 0.66, N = 3)
Model: Vehicle Detection FP16-INT8: 4.79 ms (SE +/- 0.01, N = 3; MIN: 2.93 / MAX: 107.95); 1667.89 FPS (SE +/- 2.61, N = 3)
Model: Road Segmentation ADAS FP16: 18.10 ms (SE +/- 0.04, N = 3; MIN: 10.53 / MAX: 93.07); 441.60 FPS (SE +/- 1.13, N = 3)
Model: Face Detection Retail FP16: 2.29 ms (SE +/- 0.00, N = 3; MIN: 1.38 / MAX: 55.46); 3485.66 FPS (SE +/- 2.28, N = 3)
Model: Face Detection FP16-INT8: 301.34 ms (SE +/- 0.11, N = 3; MIN: 180.53 / MAX: 466.91); 26.49 FPS (SE +/- 0.02, N = 3)
Model: Vehicle Detection FP16: 8.04 ms (SE +/- 0.03, N = 3; MIN: 3.93 / MAX: 100.58); 994.76 FPS (SE +/- 3.28, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU ml-benchmark2-12-25-23 20 40 60 80 100 SE +/- 0.18, N = 3 91.56 MIN: 31.31 / MAX: 170.1 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU ml-benchmark2-12-25-23 20 40 60 80 100 SE +/- 0.18, N = 3 87.33 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU ml-benchmark2-12-25-23 20 40 60 80 100 SE +/- 0.62, N = 3 91.06 MIN: 33.94 / MAX: 197.35 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU ml-benchmark2-12-25-23 20 40 60 80 100 SE +/- 0.59, N = 3 87.80 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU ml-benchmark2-12-25-23 130 260 390 520 650 SE +/- 0.48, N = 3 581.79 MIN: 290.44 / MAX: 695.5 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU ml-benchmark2-12-25-23 4 8 12 16 20 SE +/- 0.01, N = 3 13.70 1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
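The paired latency and throughput figures are not independent: with a fixed number of in-flight inference requests, throughput ≈ requests / latency, so latency × FPS recovers the apparent request count. A minimal sanity check over a few of the rows above (the request count is inferred from the data, not stated in the results):

```python
# Apparent concurrent inference requests per OpenVINO result:
# requests ≈ latency_ms * fps / 1000.
results = {
    "Weld Porosity Detection FP16": (11.53, 1386.49),
    "Vehicle Detection FP16-INT8": (4.79, 1667.89),
    "Person Detection FP32": (91.56, 87.33),
    "Face Detection FP16": (581.79, 13.70),
}

for model, (latency_ms, fps) in results.items():
    streams = latency_ms * fps / 1000.0
    print(f"{model}: ~{streams:.2f} concurrent requests")
```

On this run the product lands very close to 8 for most models and 16 for Weld Porosity Detection, which suggests 8 or 16 parallel infer requests per model; that is an inference from the numbers, not something the export states.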
TNN 0.3 - Target: CPU - ml-benchmark2-12-25-23
Latency in ms (fewer is better); standard error (SE) over N runs.

Model              ms (SE)           N    Min / Max
SqueezeNet v1.1     179.66 (2.19)   15    169.37 /  193.32
SqueezeNet v2        42.14 (0.64)   15     38.56 /   49.14
MobileNet v2        187.48 (1.57)   15    176.20 /  252.15
DenseNet           2134.43 (8.59)    3   2002.11 / 2290.65

1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
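Every result line carries a standard error (SE) over N runs. Assuming the conventional definition (sample standard deviation divided by the square root of N), it can be recomputed from raw timings like so — the sample values here are made up for illustration:

```python
# Standard error of the mean: SE = sample standard deviation / sqrt(N).
# The run timings below are hypothetical, just to exercise the formula.
import math
import statistics

def standard_error(samples):
    return statistics.stdev(samples) / math.sqrt(len(samples))

runs = [179.1, 181.9, 178.0]  # hypothetical ms timings, N = 3
print(f"mean = {statistics.mean(runs):.2f} ms, SE = {standard_error(runs):.2f} ms")
```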
NCNN 20230517 - ml-benchmark2-12-25-23
Latency in ms (fewer is better); standard error (SE) over N = 12 runs (Vulkan GPU) or N = 15 runs (CPU).

Target: Vulkan GPU
Model                ms (SE)        Min / Max
vision_transformer   42.15 (0.31)   34.59 / 476.34
regnety_400m         11.29 (0.16)    8.75 / 379.29
yolov4-tiny          17.22 (0.18)   13.77 / 381.41
resnet50             13.39 (0.20)   10.90 / 369.34
vgg16                35.09 (0.46)   28.16 / 385.01

Target: CPU
Model                ms (SE)        Min / Max
vision_transformer   42.19 (0.29)   35.18 / 582.74
regnety_400m         11.06 (0.14)    8.41 / 389.46
yolov4-tiny          17.12 (0.14)   13.93 / 387.87
resnet50             13.00 (0.15)   10.61 / 388.18
vgg16                35.80 (0.47)   28.59 / 399.50
googlenet            10.38 (0.15)    8.27 / 421.12

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Mobile Neural Network 2.1 - ml-benchmark2-12-25-23
Latency in ms (fewer is better); standard error (SE) over N = 3 runs.

Model              ms (SE)          Min / Max
inception-v3       22.38  (0.32)    19.40 /  90.14
mobilenet-v1-1.0    2.640 (0.053)    2.30 /  44.01
MobileNetV2_224     3.723 (0.086)    3.23 /  82.07
SqueezeNetV1.0      4.402 (0.049)    3.93 /  80.86
resnet-v2-50       11.36  (0.21)    10.13 /  74.03
mobilenetV3         1.818 (0.014)    1.54 /  31.92
nasnet             11.96  (0.11)    10.36 / 117.89

1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Caffe 2020-02-13 - Acceleration: CPU - ml-benchmark2-12-25-23
Total time in milliseconds (fewer is better); standard error (SE) over N = 3 runs.

Model       Iterations   ms (SE)
GoogleNet         1000   626915 (1421.72)
GoogleNet          200   124421 (239.03)
GoogleNet          100    63606 (328.59)
AlexNet           1000   243445 (481.92)
AlexNet            200    49227 (122.11)
AlexNet            100    24893 (159.49)

1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm -llmdb
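Caffe reports total wall time for the requested iteration count, so per-iteration cost and scaling fall out directly; e.g. GoogleNet takes 63606 ms for 100 iterations and 626915 ms for 1000, a ratio of about 9.86, i.e. close to linear. A quick derivation from the totals above:

```python
# Per-iteration cost from the Caffe GoogleNet totals (iterations -> total ms).
googlenet = {100: 63606, 200: 124421, 1000: 626915}

for iters, total_ms in sorted(googlenet.items()):
    print(f"{iters} iterations: {total_ms / iters:.1f} ms/iteration")

ratio = googlenet[1000] / googlenet[100]
print(f"1000 vs 100 iterations: {ratio:.2f}x total time")
```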
spaCy 3.4.1 - ml-benchmark2-12-25-23
Throughput in tokens/sec (more is better); standard error (SE) over N = 3 runs.

Model             tokens/sec (SE)
en_core_web_trf    2354 (24.52)
en_core_web_lg    19452 (113.01)
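The spaCy figures are tokens processed per second through the full pipeline. A measurement of that shape can be sketched as below; to keep the sketch runnable without the language models, a whitespace tokenizer stands in for the real pipeline (in an actual run, `nlp = spacy.load("en_core_web_lg")` and `nlp(text)` would replace `tokenize`):

```python
# Tokens-per-second measurement harness. `tokenize` is a stand-in for a
# real spaCy pipeline; whitespace splitting keeps this self-contained.
import time

def tokenize(text):
    return text.split()  # stand-in for doc = nlp(text)

def tokens_per_sec(texts, tokenize):
    start = time.perf_counter()
    total_tokens = sum(len(tokenize(t)) for t in texts)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

corpus = ["the quick brown fox jumps over the lazy dog"] * 10000
print(f"{tokens_per_sec(corpus, tokenize):,.0f} tokens/sec")
```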
Neural Magic DeepSparse 1.6 - ml-benchmark2-12-25-23
ms/batch (fewer is better) and items/sec (more is better); standard error (SE) over N = 3 runs.

Scenario: Synchronous Single-Stream
Model                                                         ms/batch (SE)       items/sec (SE)
NLP Token Classification, BERT base uncased conll2003          56.81   (0.08)       17.60 (0.03)
BERT-Large, NLP Question Answering, Sparse INT8                 9.9862 (0.0082)    100.11 (0.08)
CV Segmentation, 90% Pruned YOLACT Pruned                      37.78   (0.02)       26.46 (0.02)
NLP Text Classification, DistilBERT mnli                        9.1055 (0.0104)    109.77 (0.13)
CV Detection, YOLOv5s COCO, Sparse INT8                        10.50   (0.01)       95.22 (0.07)
CV Classification, ResNet-50 ImageNet                           5.2028 (0.0097)    192.13 (0.36)
BERT-Large, NLP Question Answering                             56.81   (0.04)       17.60 (0.01)
CV Detection, YOLOv5s COCO                                     10.62   (0.02)       94.15 (0.13)
ResNet-50, Sparse INT8                                          0.8513 (0.0017)   1172.15 (2.24)
ResNet-50, Baseline                                             5.2053 (0.0090)    192.04 (0.33)
NLP Text Classification, BERT base uncased SST2, Sparse INT8    3.6930 (0.0023)    270.67 (0.17)
NLP Document Classification, oBERT base uncased on IMDB        56.96   (0.03)       17.55 (0.01)

Scenario: Asynchronous Multi-Stream
Model                                                         ms/batch (SE)       items/sec (SE)
NLP Token Classification, BERT base uncased conll2003         356.97   (0.71)       22.38 (0.05)
BERT-Large, NLP Question Answering, Sparse INT8                18.66   (0.03)      428.23 (0.62)
CV Segmentation, 90% Pruned YOLACT Pruned                     228.46   (1.03)       34.93 (0.16)
NLP Text Classification, DistilBERT mnli                       42.70   (0.06)      187.27 (0.27)
CV Detection, YOLOv5s COCO, Sparse INT8                        62.42   (0.07)      128.09 (0.13)
CV Classification, ResNet-50 ImageNet                          29.59   (0.03)      270.26 (0.30)
BERT-Large, NLP Question Answering                            289.50   (0.25)       27.63 (0.02)
CV Detection, YOLOv5s COCO                                     66.46   (0.17)      120.31 (0.30)
ResNet-50, Sparse INT8                                          3.0495 (0.0077)   2614.24 (6.37)
ResNet-50, Baseline                                            29.60   (0.01)      270.15 (0.07)
NLP Text Classification, BERT base uncased SST2, Sparse INT8    8.5518 (0.0005)   934.08 (0.06)
NLP Document Classification, oBERT base uncased on IMDB       361.49   (0.63)       22.07 (0.03)
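In the synchronous single-stream scenario only one batch is in flight at a time, so the two columns should be approximately reciprocal: items/sec ≈ 1000 / (ms/batch). Checking a few of the rows above (model names abbreviated here):

```python
# Synchronous single-stream: throughput is roughly the reciprocal of latency.
# Values are (ms/batch, reported items/sec) from the results above.
single_stream = {
    "BERT base uncased conll2003": (56.81, 17.60),
    "DistilBERT mnli": (9.1055, 109.77),
    "ResNet-50, Sparse INT8": (0.8513, 1172.15),
}

for model, (ms_per_batch, items_per_sec) in single_stream.items():
    predicted = 1000.0 / ms_per_batch
    print(f"{model}: predicted {predicted:.2f}, reported {items_per_sec:.2f} items/sec")
```

The predictions agree with the reported throughput to well under 1 percent; the small residual is presumably scheduling and measurement overhead.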
TensorFlow 2.12 - Device: CPU (images/sec, more is better):
  Batch Size: 512 - Model: ResNet-50: 34.09 (SE +/- 0.00, N = 3)
  Batch Size: 512 - Model: GoogLeNet: 110.88 (SE +/- 0.03, N = 3)
  Batch Size: 256 - Model: ResNet-50: 34.24 (SE +/- 0.00, N = 3)
  Batch Size: 256 - Model: GoogLeNet: 112.12 (SE +/- 0.03, N = 3)
  Batch Size: 64 - Model: ResNet-50: 35.45 (SE +/- 0.01, N = 3)
  Batch Size: 64 - Model: GoogLeNet: 120.40 (SE +/- 0.09, N = 3)
  Batch Size: 32 - Model: ResNet-50: 36.10 (SE +/- 0.02, N = 3)
  Batch Size: 32 - Model: GoogLeNet: 127.29 (SE +/- 0.10, N = 3)
  Batch Size: 16 - Model: ResNet-50: 35.90 (SE +/- 0.04, N = 3)
  Batch Size: 16 - Model: GoogLeNet: 127.79 (SE +/- 0.16, N = 3)
  Batch Size: 512 - Model: AlexNet: 400.77 (SE +/- 0.13, N = 3)
  Batch Size: 256 - Model: AlexNet: 386.97 (SE +/- 0.26, N = 3)
  Batch Size: 64 - Model: AlexNet: 297.39 (SE +/- 0.16, N = 3)
  Batch Size: 512 - Model: VGG-16: 19.13 (SE +/- 0.01, N = 3)
  Batch Size: 32 - Model: AlexNet: 215.96 (SE +/- 0.12, N = 3)
  Batch Size: 256 - Model: VGG-16: 19.00 (SE +/- 0.01, N = 3)
  Batch Size: 16 - Model: AlexNet: 140.39 (SE +/- 0.22, N = 3)
  Batch Size: 64 - Model: VGG-16: 18.48 (SE +/- 0.01, N = 3)
  Batch Size: 32 - Model: VGG-16: 17.71 (SE +/- 0.00, N = 3)
  Batch Size: 16 - Model: VGG-16: 16.48 (SE +/- 0.00, N = 3)
PyTorch 2.1 - Device: CPU (batches/sec, more is better):
  Batch Size: 512 - Model: Efficientnet_v2_l: 10.42 (SE +/- 0.04, N = 3; min 8.29 / max 11.15)
  Batch Size: 256 - Model: Efficientnet_v2_l: 10.47 (SE +/- 0.11, N = 4; min 8.19 / max 11.25)
  Batch Size: 64 - Model: Efficientnet_v2_l: 10.43 (SE +/- 0.14, N = 3; min 8.26 / max 11.26)
  Batch Size: 32 - Model: Efficientnet_v2_l: 10.45 (SE +/- 0.11, N = 3; min 8.22 / max 11.22)
  Batch Size: 16 - Model: Efficientnet_v2_l: 10.29 (SE +/- 0.07, N = 3; min 8.22 / max 11.02)
  Batch Size: 1 - Model: Efficientnet_v2_l: 13.70 (SE +/- 0.08, N = 3; min 10.96 / max 14.13)
  Batch Size: 512 - Model: ResNet-152: 17.89 (SE +/- 0.01, N = 3; min 13.88 / max 18.4)
  Batch Size: 256 - Model: ResNet-152: 17.71 (SE +/- 0.13, N = 3; min 14.07 / max 18.34)
  Batch Size: 64 - Model: ResNet-152: 17.98 (SE +/- 0.07, N = 3; min 13.29 / max 18.54)
  Batch Size: 512 - Model: ResNet-50: 44.93 (SE +/- 0.01, N = 3; min 34.06 / max 46.17)
  Batch Size: 32 - Model: ResNet-152: 17.78 (SE +/- 0.08, N = 3; min 13.52 / max 18.49)
  Batch Size: 256 - Model: ResNet-50: 45.77 (SE +/- 0.52, N = 3; min 35.6 / max 47.88)
  Batch Size: 16 - Model: ResNet-152: 17.88 (SE +/- 0.24, N = 3; min 12.93 / max 18.62)
  Batch Size: 64 - Model: ResNet-50: 45.17 (SE +/- 0.42, N = 3; min 35.7 / max 47.3)
  Batch Size: 32 - Model: ResNet-50: 45.71 (SE +/- 0.19, N = 3; min 38.99 / max 47.61)
  Batch Size: 16 - Model: ResNet-50: 45.48 (SE +/- 0.34, N = 3; min 33.75 / max 47.46)
  Batch Size: 1 - Model: ResNet-152: 26.53 (SE +/- 0.25, N = 3; min 20.93 / max 27.56)
  Batch Size: 1 - Model: ResNet-50: 67.87 (SE +/- 0.22, N = 3; min 53.01 / max 70.77)
TensorFlow Lite 2022-05-18 (microseconds, fewer is better):
  Model: Inception ResNet V2: 22832.8 (SE +/- 111.48, N = 3)
  Model: Mobilenet Quant: 1985.59 (SE +/- 11.21, N = 3)
  Model: Mobilenet Float: 1293.76 (SE +/- 2.11, N = 3)
  Model: NASNet Mobile: 10670.2 (SE +/- 51.78, N = 3)
  Model: Inception V4: 22388.9 (SE +/- 52.61, N = 3)
  Model: SqueezeNet: 1798.07 (SE +/- 7.51, N = 3)
RNNoise 2020-06-28 (seconds, fewer is better): 14.60 (SE +/- 0.04, N = 3). Compiler: (CC) gcc options: -O2 -pedantic -fvisibility=hidden
R Benchmark (seconds, fewer is better): 0.1133 (SE +/- 0.0003, N = 3). R scripting front-end version 4.1.2 (2021-11-01)
DeepSpeech 0.6 - Acceleration: CPU (seconds, fewer is better): 52.49 (SE +/- 0.73, N = 3)
Numpy Benchmark (score, more is better): 721.25 (SE +/- 3.15, N = 3)
oneDNN 3.3 - Engine: CPU (ms, fewer is better):
  Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16: 702.32 (SE +/- 6.22, N = 3; min 612.77)
  Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16: 1343.53 (SE +/- 12.14, N = 3; min 1191.31)
  Harness: Recurrent Neural Network Inference - Data Type: u8s8f32: 704.86 (SE +/- 4.11, N = 3; min 611.57)
  Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16: 1.61432 (SE +/- 0.01893, N = 3; min 1.43)
  Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16: 2.60894 (SE +/- 0.02456, N = 3; min 2.23)
  Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16: 1.49610 (SE +/- 0.01461, N = 15; min 1.2)
  Harness: Recurrent Neural Network Training - Data Type: u8s8f32: 1367.75 (SE +/- 3.41, N = 3; min 1186.82)
  Harness: Recurrent Neural Network Inference - Data Type: f32: 707.84 (SE +/- 10.07, N = 3; min 607.92)
  Harness: Recurrent Neural Network Training - Data Type: f32: 1358.07 (SE +/- 16.16, N = 4; min 1187.15)
  Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32: 0.731894 (SE +/- 0.007344, N = 15; min 0.62)
  Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32: 0.505580 (SE +/- 0.007201, N = 3; min 0.43)
  Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32: 5.08690 (SE +/- 0.06167, N = 4; min 4.34)
  Harness: Deconvolution Batch shapes_3d - Data Type: f32: 2.88833 (SE +/- 0.03209, N = 15; min 2.52)
  Harness: Deconvolution Batch shapes_1d - Data Type: f32: 3.43419 (SE +/- 0.03067, N = 15; min 2.56)
  Harness: Convolution Batch Shapes Auto - Data Type: f32: 5.27943 (SE +/- 0.04391, N = 3; min 4.76)
  Harness: IP Shapes 3D - Data Type: bf16bf16bf16: 1.49285 (SE +/- 0.01642, N = 3; min 1.23)
  Harness: IP Shapes 1D - Data Type: bf16bf16bf16: 0.772163 (SE +/- 0.008040, N = 3; min 0.64)
  Harness: IP Shapes 3D - Data Type: f32: 3.81408 (SE +/- 0.05089, N = 15; min 3.26)
  Harness: IP Shapes 1D - Data Type: f32: 2.13148 (SE +/- 0.02341, N = 3; min 1.7)
  Compiler: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
LeelaChessZero 0.30 - Backend: BLAS (nodes per second, more is better): 175 (SE +/- 2.31, N = 3). Compiler: (CXX) g++ options: -flto -pthread
OpenCV 4.7 - Test: DNN - Deep Neural Network (ms, fewer is better): 31437 (SE +/- 829.30, N = 15). Compiler: (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared
NCNN 20230517 (ms, fewer is better):
  Target: Vulkan GPU - Model: FastestDet: 4.76 (SE +/- 0.28, N = 12; min 2.77 / max 309.73)
  Target: Vulkan GPU - Model: squeezenet_ssd: 8.79 (SE +/- 0.17, N = 12; min 6.89 / max 385.84)
  Target: Vulkan GPU - Model: alexnet: 5.20 (SE +/- 0.21, N = 12; min 4.2 / max 274.71)
  Target: Vulkan GPU - Model: resnet18: 6.64 (SE +/- 0.19, N = 12; min 5.3 / max 394.05)
  Target: Vulkan GPU - Model: googlenet: 10.69 (SE +/- 0.23, N = 12; min 8.16 / max 460.92)
  Target: Vulkan GPU - Model: blazeface: 1.60 (SE +/- 0.03, N = 12; min 1.29 / max 9.03)
  Target: Vulkan GPU - Model: efficientnet-b0: 5.56 (SE +/- 0.21, N = 12; min 4.32 / max 601.76)
  Target: Vulkan GPU - Model: mnasnet: 3.97 (SE +/- 0.14, N = 12; min 3.13 / max 314.65)
  Target: Vulkan GPU - Model: shufflenet-v2: 4.36 (SE +/- 0.17, N = 12; min 3.67 / max 375.41)
  Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3: 4.40 (SE +/- 0.18, N = 12; min 3.22 / max 352.45)
  Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2: 4.29 (SE +/- 0.13, N = 12; min 3.31 / max 320.74)
  Target: Vulkan GPU - Model: mobilenet: 10.67 (SE +/- 0.26, N = 12; min 8.56 / max 353.75)
  Target: CPU - Model: FastestDet: 5.08 (SE +/- 0.22, N = 15; min 3 / max 221.01)
  Target: CPU - Model: squeezenet_ssd: 8.75 (SE +/- 0.14, N = 15; min 6.94 / max 418.75)
  Target: CPU - Model: alexnet: 5.09 (SE +/- 0.15, N = 15; min 4.18 / max 317.53)
  Target: CPU - Model: resnet18: 6.70 (SE +/- 0.17, N = 15; min 5.33 / max 359.17)
  Target: CPU - Model: blazeface: 1.70 (SE +/- 0.09, N = 15; min 1.16 / max 246.39)
  Target: CPU - Model: efficientnet-b0: 5.30 (SE +/- 0.13, N = 15; min 4.07 / max 351.52)
  Target: CPU - Model: mnasnet: 3.86 (SE +/- 0.13, N = 15; min 3.06 / max 334.53)
  Target: CPU - Model: shufflenet-v2: 4.42 (SE +/- 0.15, N = 15; min 3.7 / max 373.85)
  Target: CPU-v3-v3 - Model: mobilenet-v3: 4.25 (SE +/- 0.14, N = 15; min 3.54 / max 358.68)
  Target: CPU-v2-v2 - Model: mobilenet-v2: 4.52 (SE +/- 0.15, N = 15; min 3.49 / max 453.27)
  Target: CPU - Model: mobilenet: 10.64 (SE +/- 0.22, N = 15; min 8.78 / max 504.79)
  Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Mobile Neural Network 2.1 - Model: squeezenetv1.1 (ms, fewer is better): 2.770 (SE +/- 0.098, N = 3; min 2.36 / max 51.44). Compiler: (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
oneDNN 3.3 - Engine: CPU (ms, fewer is better):
  Harness: IP Shapes 3D - Data Type: u8s8f32: 0.325432 (SE +/- 0.008133, N = 12; min 0.23)
  Harness: IP Shapes 1D - Data Type: u8s8f32: 0.542833 (SE +/- 0.014251, N = 15; min 0.37)
  Compiler: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
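Every result above is reported as "SE +/- x, N = y": the standard error of the mean over the N recorded runs of that test. As a minimal illustration of how that figure is derived (the run values below are hypothetical, not taken from this export), the standard error is the sample standard deviation divided by the square root of N:

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample stddev / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (N - 1 denominator)
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(variance / n)

# Hypothetical run values for a test reported as "SE +/- ..., N = 3"
runs = [34.09, 34.12, 34.06]
se = standard_error(runs)  # roughly 0.017 for these values
```

A small SE relative to the mean (as in most results here) indicates the runs were tightly clustered and the reported average is stable.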
Phoronix Test Suite v10.8.4