24.03.13.Pop.2204.ML.test1: AMD Ryzen 9 7950X 16-Core testing with an ASUS ProArt X670E-CREATOR WIFI (1710 BIOS) and a Zotac NVIDIA GeForce RTX 4070 Ti 12GB on Pop 22.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403157-NE-240313POP28&grr.
System Configuration (result identifier: "Initial test 1 No water cool"):

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ProArt X670E-CREATOR WIFI (1710 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16 GB DDR5-4800MT/s G Skill F5-6000J3636F16G
Disk: 1000GB PNY CS2130 1TB SSD
Graphics: Zotac NVIDIA GeForce RTX 4070 Ti 12GB
Audio: NVIDIA Device 22bc
Monitor: 2 x DELL 2001FP
Network: Intel I225-V + Aquantia AQtion AQC113CS NBase-T/IEEE + MEDIATEK MT7922 802.11ax PCI
OS: Pop 22.04
Kernel: 6.6.10-76060610-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 550.54.14
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 3200x1200

Notes (OpenBenchmarking.org):
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
- CPU Microcode: 0xa601206
- GLAMOR
- BAR1 / Visible vRAM Size: 16384 MiB
- vBIOS Version: 95.04.31.00.3b
- GPU Compute Cores: 7680
- Python 3.10.12
- Security mitigations: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Not affected; spec_rstack_overflow: Mitigation of Safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
Component test listing (in run order): tensorflow: GPU - 256 - VGG-16 tensorflow: GPU - 256 - ResNet-50 tensorflow: GPU - 64 - VGG-16 tensorflow: GPU - 512 - GoogLeNet tensorflow: GPU - 32 - VGG-16 tensorflow: GPU - 256 - GoogLeNet shoc: OpenCL - Max SP Flops tensorflow: GPU - 512 - AlexNet tensorflow: CPU - 256 - VGG-16 tensorflow: GPU - 64 - ResNet-50 tensorflow: GPU - 16 - VGG-16 tensorflow: GPU - 256 - AlexNet tensorflow: CPU - 256 - ResNet-50 tensorflow: GPU - 32 - ResNet-50 tensorflow: CPU - 512 - GoogLeNet tensorflow: GPU - 64 - GoogLeNet tensorflow: CPU - 64 - VGG-16 tensorflow: GPU - 16 - ResNet-50 numenta-nab: KNN CAD ai-benchmark: Device AI Score ai-benchmark: Device Training Score ai-benchmark: Device Inference Score tensorflow: CPU - 256 - GoogLeNet tensorflow: GPU - 32 - GoogLeNet tensorflow: CPU - 32 - VGG-16 tensorflow: GPU - 64 - AlexNet tensorflow: CPU - 64 - ResNet-50 pytorch: CPU - 512 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 256 - Efficientnet_v2_l pytorch: CPU - 64 - Efficientnet_v2_l opencv: DNN - Deep Neural Network openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU tensorflow: CPU - 512 - AlexNet tnn: CPU - DenseNet mnn: inception-v3 mnn: mobilenet-v1-1.0 mnn: MobileNetV2_224 mnn: SqueezeNetV1.0 mnn: resnet-v2-50 mnn: squeezenetv1.1 mnn: mobilenetV3 mnn: nasnet tensorflow: GPU - 16 - GoogLeNet numpy: tensorflow: CPU - 16 - VGG-16 pytorch: CPU - 256 - ResNet-152 pytorch: CPU - 64 - ResNet-152 pytorch: CPU - 512 - ResNet-152 pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 16 - ResNet-152 tensorflow: GPU - 32 - AlexNet tensorflow: CPU - 32 - ResNet-50 pytorch: CPU - 1 - Efficientnet_v2_l tensorflow: GPU - 1 - VGG-16 onednn: Recurrent Neural Network Training - CPU onednn: Recurrent Neural Network Inference - CPU tensorflow: CPU - 256 - AlexNet deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream mlpack: scikit_qda openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU ncnn: CPU - FastestDet ncnn: CPU - vision_transformer ncnn: CPU - regnety_400m ncnn: CPU - squeezenet_ssd ncnn: CPU - yolov4-tiny ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 ncnn: CPU - resnet50 ncnn: CPU - alexnet ncnn: CPU - resnet18 ncnn: CPU - vgg16 ncnn: CPU - googlenet ncnn: CPU - blazeface ncnn: CPU - efficientnet-b0 ncnn: CPU - mnasnet ncnn: CPU - shufflenet-v2 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU-v2-v2 - mobilenet-v2 ncnn: CPU - mobilenet tensorflow: CPU - 64 - GoogLeNet ncnn: Vulkan GPU - FastestDet ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU - mobilenet openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU tensorflow-lite: Inception V4 openvino: Noise Suppression Poconet-Like FP16 - CPU openvino: Noise Suppression Poconet-Like FP16 - CPU tensorflow-lite: Inception ResNet V2 openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU tensorflow-lite: NASNet Mobile tensorflow-lite: Mobilenet Float tensorflow-lite: SqueezeNet openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU tensorflow-lite: Mobilenet Quant openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU tensorflow: GPU - 16 - AlexNet numenta-nab: Earthgecko Skyline tensorflow: CPU - 16 - ResNet-50 onednn: IP Shapes 1D - CPU pytorch: CPU - 1 - ResNet-152 pytorch: CPU - 512 - ResNet-50 pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 32 - ResNet-50 pytorch: CPU - 16 - ResNet-50 deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream spacy: en_core_web_trf spacy: en_core_web_lg deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepspeech: CPU deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream onednn: Deconvolution Batch shapes_1d - CPU pytorch: NVIDIA CUDA GPU - 64 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 32 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 16 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 256 - Efficientnet_v2_l mlpack: scikit_ica tensorflow: CPU - 32 - GoogLeNet mlpack: scikit_linearridgeregression tensorflow: GPU - 1 - ResNet-50 numenta-nab: Contextual Anomaly Detector OSE tensorflow: CPU - 1 - VGG-16 numenta-nab: Windowed Gaussian tensorflow: CPU - 64 - AlexNet pytorch: CPU - 1 - ResNet-50 pytorch: NVIDIA CUDA GPU - 256 - ResNet-152 pytorch: NVIDIA CUDA GPU - 16 - ResNet-152 pytorch: NVIDIA CUDA GPU - 32 - ResNet-152 pytorch: NVIDIA CUDA GPU - 64 - ResNet-152 pytorch: NVIDIA CUDA GPU - 512 - ResNet-152 pytorch: NVIDIA CUDA GPU - 1 - Efficientnet_v2_l mlpack: scikit_svm tensorflow: CPU - 32 - AlexNet tensorflow: CPU - 16 - GoogLeNet rnnoise: tensorflow: CPU - 16 - AlexNet numenta-nab: Bayesian Changepoint tnn: CPU - MobileNet v2 tnn: CPU - SqueezeNet v1.1 tensorflow: CPU - 1 - ResNet-50 tensorflow: GPU - 1 - GoogLeNet numenta-nab: Relative Entropy pytorch: NVIDIA CUDA GPU - 1 - ResNet-152 tensorflow: GPU - 1 - AlexNet tensorflow: CPU - 1 - AlexNet onednn: IP Shapes 3D - CPU pytorch: NVIDIA CUDA GPU - 256 - ResNet-50 pytorch: NVIDIA CUDA GPU - 64 - ResNet-50 pytorch: NVIDIA CUDA GPU - 16 - ResNet-50 pytorch: NVIDIA CUDA GPU - 32 - ResNet-50 pytorch: NVIDIA CUDA GPU - 512 - ResNet-50 shoc: OpenCL - Texture Read Bandwidth onednn: Convolution Batch Shapes Auto - CPU shoc: OpenCL - GEMM SGEMM_N pytorch: NVIDIA CUDA GPU - 1 - ResNet-50 whisper-cpp: ggml-medium.en - 2016 State of the Union tensorflow: CPU - 1 - GoogLeNet shoc: OpenCL - Bus Speed Readback onednn: Deconvolution Batch shapes_3d - CPU tnn: CPU - SqueezeNet v2 shoc: OpenCL - S3D whisper-cpp: ggml-small.en - 2016 State of the Union shoc: OpenCL - FFT SP shoc: OpenCL - Triad shoc: OpenCL - Reduction shoc: OpenCL - Bus Speed Download whisper-cpp: ggml-base.en - 2016 State of the Union shoc: OpenCL - MD5 Hash lczero: BLAS

Result values for "Initial test 1 No water cool" (same order as the listing above): 1.77 5.56 1.73 15.90 1.72 15.76 43074.9 35.93 18.12 5.51 1.70 35.82 36.15 5.49 115.70 15.61 17.44 5.42 105.001 6473 3573 2900 116.33 15.45 16.89 34.84 36.36 10.44 10.46 10.59 10.63 10.58 30277 625.66 12.75 392.16 2005.606 23.421 2.456 3.410 4.141 12.123 2.542 1.638 11.300 15.10 704.52 16.09 17.69 17.66 17.59 17.64 17.66 33.39 36.74 14.14 1.46 1452.96 747.499 388.40 303.3330 26.3345 54.0028 18.5143 34.07 323.12 24.71 4.69 37.92 9.87 8.49 16.28 9.80 13.48 5.52 6.62 32.60 9.56 1.60 4.50 3.46 3.93 3.69 3.65 9.80 119.04 4.30 38.12 9.83 8.38 15.84 9.36 13.16 5.67 6.65 32.31 9.69 1.62 4.57 3.45 3.90 3.72 3.69 9.36 65.52 121.96 104.27 76.65 103.09 77.54 21139.4 11.33 1386.93 21857.0 17.81 448.28 10099.3 1214.11 1716.04 5.53 1442.07 1861.53 29.32 272.36 4.46 1785.48 21.88 729.99 5.18 1538.27 3.61 4335.66 23.94 667.29 0.31 46025.94 0.45 32402.42 12.91 618.41 12.61 1266.85 2.53 3062.63 6.44 2470.86 30.67 55.566 36.43 1.17351 25.64 42.91 43.49 43.35 44.08 44.08 397.5797 20.0852 3.5898 278.3122 2415 18557 8.9714 890.1894 400.3931 19.9101 19.1612 417.1723 10.1862 98.0758 57.6038 17.3572 57.8304 17.2893 236.3144 33.8095 43.7801 182.6225 36.3089 27.5303 10.8023 92.5135 10.3590 96.4692 69.4409 115.1124 30.2415 264.3773 30.1491 265.2018 72.0412 110.9914 11.0556 90.3558 5.7601 173.3786 3.9240 2031.8775 5.7436 173.8986 47.03514 0.8214 1214.2399 3.06179 69.80 69.97 70.52 70.63 70.81 30.12 122.39 1.03 4.25 25.400 4.74 4.984 305.81 64.81 138.62 138.78 139.41 138.72 140.41 71.98 15.12 224.56 125.83 13.707 148.72 13.213 183.277 179.679 12.70 12.36 8.281 137.39 12.58 13.00 4.42170 380.34 379.98 380.67 380.74 383.56 2985.70 7.16631 13212.0 387.06 0.86615 47.21 27.0723 2.56519 42.215 299.468 0.34881 1292.53 25.4632 388.934 26.8275 0.15350 47.8988
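Each per-test entry below reports an average over N runs together with a standard error ("SE +/- ..."). As a minimal sketch of how such a figure is derived, the following uses hypothetical per-run throughputs (illustrative values, not taken from this result file):

```python
import math
import statistics

# Hypothetical per-run throughputs (images/sec) for a 3-run benchmark;
# these numbers are illustrative, not from the actual result file.
runs = [5.54, 5.56, 5.58]

mean = statistics.mean(runs)                         # the reported result value
se = statistics.stdev(runs) / math.sqrt(len(runs))   # standard error of the mean

print(f"{mean:.2f} SE +/- {se:.2f}, N = {len(runs)}")  # 5.56 SE +/- 0.01, N = 3
```

The standard error shrinks with the square root of the run count, which is one reason noisier tests end up with more runs (e.g. N = 9 for the KNN CAD result below).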
TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: VGG-16 (images/sec, more is better): 1.77 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: ResNet-50 (images/sec, more is better): 5.56 (SE +/- 0.02, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: VGG-16 (images/sec, more is better): 1.73 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 512 - Model: GoogLeNet (images/sec, more is better): 15.90 (SE +/- 0.02, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: VGG-16 (images/sec, more is better): 1.72 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: GoogLeNet (images/sec, more is better): 15.76 (SE +/- 0.04, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, more is better): 43074.9 (SE +/- 97.97, N = 3). 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
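As a rough plausibility check on the Max SP Flops result above: single-precision peak throughput scales as cores x clock x 2 FLOPs per FMA. The system table reports 7680 GPU compute cores; the sustained boost clock below is an assumption for illustration, not a value from this result file.

```python
# Theoretical single-precision peak for the tested GPU.
cuda_cores = 7680          # GPU Compute Cores from the system table
boost_clock_ghz = 2.8      # assumed sustained boost clock (hypothetical)
flops_per_core_cycle = 2   # one fused multiply-add = 2 FLOPs

peak_gflops = cuda_cores * boost_clock_ghz * flops_per_core_cycle
print(round(peak_gflops))  # 43008
```

With that assumed clock, the theoretical peak lands within a fraction of a percent of the measured 43074.9 GFLOPS, which is typical for this synthetic FMA-throughput test.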
TensorFlow 2.12 - Device: GPU - Batch Size: 512 - Model: AlexNet (images/sec, more is better): 35.93 (SE +/- 0.09, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: VGG-16 (images/sec, more is better): 18.12 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: ResNet-50 (images/sec, more is better): 5.51 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: VGG-16 (images/sec, more is better): 1.70 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 256 - Model: AlexNet (images/sec, more is better): 35.82 (SE +/- 0.10, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, more is better): 36.15 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: ResNet-50 (images/sec, more is better): 5.49 (SE +/- 0.04, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, more is better): 115.70 (SE +/- 0.08, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: GoogLeNet (images/sec, more is better): 15.61 (SE +/- 0.04, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: VGG-16 (images/sec, more is better): 17.44 (SE +/- 0.07, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: ResNet-50 (images/sec, more is better): 5.42 (SE +/- 0.01, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: KNN CAD (Seconds, fewer is better): 105.00 (SE +/- 0.86, N = 9)
AI Benchmark Alpha 0.1.2 - Device AI Score (Score, more is better): 6473
AI Benchmark Alpha 0.1.2 - Device Training Score (Score, more is better): 3573
AI Benchmark Alpha 0.1.2 - Device Inference Score (Score, more is better): 2900
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec, more is better): 116.33 (SE +/- 0.41, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: GoogLeNet (images/sec, more is better): 15.45 (SE +/- 0.03, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: VGG-16 (images/sec, more is better): 16.89 (SE +/- 0.03, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 64 - Model: AlexNet (images/sec, more is better): 34.84 (SE +/- 0.12, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec, more is better): 36.36 (SE +/- 0.02, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, more is better): 10.44 (SE +/- 0.10, N = 3; MIN: 8.67 / MAX: 11.33)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, more is better): 10.46 (SE +/- 0.07, N = 3; MIN: 8.62 / MAX: 11.22)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, more is better): 10.59 (SE +/- 0.06, N = 3; MIN: 8.62 / MAX: 11.44)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, more is better): 10.63 (SE +/- 0.08, N = 3; MIN: 8.45 / MAX: 11.32)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, more is better): 10.58 (SE +/- 0.05, N = 3; MIN: 8.79 / MAX: 11.4)
OpenCV 4.7 - Test: DNN - Deep Neural Network (ms, fewer is better): 30277 (SE +/- 360.44, N = 15). 1. (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared
OpenVINO 2024.0 - Model: Face Detection FP16 - Device: CPU (ms, fewer is better): 625.66 (SE +/- 5.40, N = 7; MIN: 442.72 / MAX: 668.18)
OpenVINO 2024.0 - Model: Face Detection FP16 - Device: CPU (FPS, more is better): 12.75 (SE +/- 0.12, N = 7)
Both OpenVINO results: 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec, more is better): 392.16 (SE +/- 2.15, N = 3)
TNN 0.3 - Target: CPU - Model: DenseNet (ms, fewer is better): 2005.61 (SE +/- 7.45, N = 3; MIN: 1929.36 / MAX: 2112.89). 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
Mobile Neural Network 2.1 - Model: inception-v3 (ms, fewer is better): 23.42 (SE +/- 0.58, N = 3; MIN: 20.69 / MAX: 54.3)
Mobile Neural Network 2.1 - Model: mobilenet-v1-1.0 (ms, fewer is better): 2.456 (SE +/- 0.038, N = 3; MIN: 2.27 / MAX: 6.46)
Mobile Neural Network 2.1 - Model: MobileNetV2_224 (ms, fewer is better): 3.410 (SE +/- 0.049, N = 3; MIN: 3.18 / MAX: 12.53)
Mobile Neural Network 2.1 - Model: SqueezeNetV1.0 (ms, fewer is better): 4.141 (SE +/- 0.111, N = 3; MIN: 3.72 / MAX: 10.49)
Mobile Neural Network 2.1 - Model: resnet-v2-50 (ms, fewer is better): 12.12 (SE +/- 0.04, N = 3; MIN: 11.31 / MAX: 29.79)
Mobile Neural Network 2.1 - Model: squeezenetv1.1 (ms, fewer is better): 2.542 (SE +/- 0.046, N = 3; MIN: 2.3 / MAX: 9.08)
Mobile Neural Network 2.1 - Model: mobilenetV3 (ms, fewer is better): 1.638 (SE +/- 0.026, N = 3; MIN: 1.47 / MAX: 4.85)
Mobile Neural Network 2.1 - Model: nasnet (ms, fewer is better): 11.30 (SE +/- 0.10, N = 3; MIN: 10.49 / MAX: 27.45)
All Mobile Neural Network results: 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: GoogLeNet (images/sec, more is better): 15.10 (SE +/- 0.05, N = 3)
Numpy Benchmark (Score, more is better): 704.52 (SE +/- 7.50, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16 (images/sec, more is better): 16.09 (SE +/- 0.12, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, more is better): 17.69 (SE +/- 0.04, N = 3; MIN: 14.61 / MAX: 18.11)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, more is better): 17.66 (SE +/- 0.23, N = 3; MIN: 14.14 / MAX: 18.38)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, more is better): 17.59 (SE +/- 0.10, N = 3; MIN: 16.92 / MAX: 18.27)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, more is better): 17.64 (SE +/- 0.10, N = 3; MIN: 14.95 / MAX: 18.12)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, more is better): 17.66 (SE +/- 0.11, N = 3; MIN: 17.14 / MAX: 18.3)
TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: AlexNet (images/sec, more is better): 33.39 (SE +/- 0.12, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, more is better): 36.74 (SE +/- 0.05, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, more is better): 14.14 (SE +/- 0.09, N = 3; MIN: 12.35 / MAX: 14.45)
TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: VGG-16 (images/sec, more is better): 1.46 (SE +/- 0.00, N = 3)
oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better): 1452.96 (SE +/- 7.60, N = 3; MIN: 1391.34)
oneDNN 3.4 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better): 747.50 (SE +/- 1.15, N = 3; MIN: 714.81)
Both oneDNN results: 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, more is better): 388.40 (SE +/- 3.29, N = 3)
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream Initial test 1 No water cool 70 140 210 280 350 SE +/- 0.41, N = 3 303.33
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream Initial test 1 No water cool 6 12 18 24 30 SE +/- 0.04, N = 3 26.33
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream Initial test 1 No water cool 12 24 36 48 60 SE +/- 0.12, N = 3 54.00
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream Initial test 1 No water cool 5 10 15 20 25 SE +/- 0.04, N = 3 18.51
Mlpack Benchmark Benchmark: scikit_qda OpenBenchmarking.org Seconds, Fewer Is Better Mlpack Benchmark Benchmark: scikit_qda Initial test 1 No water cool 8 16 24 32 40 SE +/- 0.25, N = 3 34.07
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.0 Model: Face Detection FP16-INT8 - Device: CPU Initial test 1 No water cool 70 140 210 280 350 SE +/- 0.74, N = 3 323.12 MIN: 174.68 / MAX: 361.95 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.0 Model: Face Detection FP16-INT8 - Device: CPU Initial test 1 No water cool 6 12 18 24 30 SE +/- 0.06, N = 3 24.71 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
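Every entry here is the mean of repeated runs together with its standard error (the "SE +/- x, N = 3" figures). A minimal sketch of how such a summary is derived; the three run values below are hypothetical, not taken from this result:

```python
import statistics

def summarize(runs):
    """Return (mean, standard error) for a list of per-run results,
    mirroring the 'value (SE +/- x, N = n)' summaries above."""
    n = len(runs)
    mean = statistics.fmean(runs)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(runs) / n ** 0.5
    return mean, se

# Hypothetical batches/sec readings from three runs of one benchmark.
runs = [17.45, 17.59, 17.73]
mean, se = summarize(runs)
print(f"{mean:.2f}, SE +/- {se:.2f}, N = {len(runs)}")  # prints: 17.59, SE +/- 0.08, N = 3
```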
NCNN 20230517 - Target: CPU - ms, fewer is better (N = 3 throughout; g++ options: -O3 -rdynamic -lgomp -lpthread):
  FastestDet: 4.69 (SE +/- 0.11; MIN 4.25 / MAX 7.77)
  vision_transformer: 37.92 (SE +/- 0.43; MIN 34.53 / MAX 51.03)
  regnety_400m: 9.87 (SE +/- 0.06; MIN 9.16 / MAX 15.63)
  squeezenet_ssd: 8.49 (SE +/- 0.16; MIN 7.41 / MAX 14.78)
  yolov4-tiny: 16.28 (SE +/- 0.33; MIN 14.44 / MAX 34.48)
  mobilenetv2-yolov3 (Target: CPU-v2-yolov3): 9.80 (SE +/- 0.08; MIN 8.81 / MAX 23.96)
  resnet50: 13.48 (SE +/- 0.34; MIN 11.93 / MAX 33.06)
  alexnet: 5.52 (SE +/- 0.01; MIN 5.06 / MAX 10.11)
  resnet18: 6.62 (SE +/- 0.03; MIN 5.93 / MAX 11.83)
  vgg16: 32.60 (SE +/- 0.24; MIN 30.13 / MAX 77.85)
  googlenet: 9.56 (SE +/- 0.02; MIN 8.74 / MAX 18.1)
  blazeface: 1.60 (SE +/- 0.01; MIN 1.48 / MAX 6.71)
  efficientnet-b0: 4.50 (SE +/- 0.02; MIN 4.16 / MAX 8.33)
  mnasnet: 3.46 (SE +/- 0.01; MIN 3.18 / MAX 18.79)
  shufflenet-v2: 3.93 (SE +/- 0.01; MIN 3.65 / MAX 7.51)
  mobilenet-v3 (Target: CPU-v3-v3): 3.69 (SE +/- 0.03; MIN 3.42 / MAX 8.19)
  mobilenet-v2 (Target: CPU-v2-v2): 3.65 (SE +/- 0.02; MIN 3.37 / MAX 7.36)
  mobilenet: 9.80 (SE +/- 0.08; MIN 8.81 / MAX 23.96)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: GoogLeNet: 119.04 images/sec (more is better; SE +/- 0.13, N = 3)
NCNN 20230517 - Target: Vulkan GPU - ms, fewer is better (N = 3 throughout):
  FastestDet: 4.30 (SE +/- 0.11; MIN 3.94 / MAX 7.54)
  vision_transformer: 38.12 (SE +/- 0.62; MIN 34.66 / MAX 105.14)
  regnety_400m: 9.83 (SE +/- 0.05; MIN 9.06 / MAX 26.47)
  squeezenet_ssd: 8.38 (SE +/- 0.04; MIN 7.63 / MAX 22.04)
  yolov4-tiny: 15.84 (SE +/- 0.04; MIN 14.55 / MAX 30.6)
  mobilenetv2-yolov3 (Target: Vulkan GPU-v2-yolov3): 9.36 (SE +/- 0.03; MIN 8.66 / MAX 15.56)
  resnet50: 13.16 (SE +/- 0.06; MIN 11.9 / MAX 19.89)
  alexnet: 5.67 (SE +/- 0.16; MIN 5.06 / MAX 11.09)
  resnet18: 6.65 (SE +/- 0.03; MIN 5.9 / MAX 23.01)
  vgg16: 32.31 (SE +/- 0.15; MIN 29.89 / MAX 88.74)
  googlenet: 9.69 (SE +/- 0.08; MIN 8.75 / MAX 15.08)
  blazeface: 1.62 (SE +/- 0.03; MIN 1.47 / MAX 14.36)
  efficientnet-b0: 4.57 (SE +/- 0.06; MIN 4.16 / MAX 9.03)
  mnasnet: 3.45 (SE +/- 0.00; MIN 3.2 / MAX 7.9)
  shufflenet-v2: 3.90 (SE +/- 0.02; MIN 3.65 / MAX 8.48)
  mobilenet-v3 (Target: Vulkan GPU-v3-v3): 3.72 (SE +/- 0.03; MIN 3.43 / MAX 15.14)
  mobilenet-v2 (Target: Vulkan GPU-v2-v2): 3.69 (SE +/- 0.03; MIN 3.36 / MAX 25.98)
  mobilenet: 9.36 (SE +/- 0.03; MIN 8.66 / MAX 15.56)
OpenVINO 2024.0 - Machine Translation EN To DE FP16 - Device: CPU: 65.52 ms (fewer is better; SE +/- 0.08, N = 3; MIN 34.28 / MAX 85.73); 121.96 FPS (more is better; SE +/- 0.16, N = 3)
OpenVINO 2024.0 - Person Detection FP16 - Device: CPU: 104.27 ms (fewer is better; SE +/- 0.46, N = 3; MIN 66.39 / MAX 133.39); 76.65 FPS (more is better; SE +/- 0.34, N = 3)
OpenVINO 2024.0 - Person Detection FP32 - Device: CPU: 103.09 ms (fewer is better; SE +/- 1.01, N = 3; MIN 50.34 / MAX 135.14); 77.54 FPS (more is better; SE +/- 0.77, N = 3)
TensorFlow Lite 2022-05-18 - Model: Inception V4: 21139.4 microseconds (fewer is better; SE +/- 19.27, N = 3)
OpenVINO 2024.0 - Noise Suppression Poconet-Like FP16 - Device: CPU: 11.33 ms (fewer is better; SE +/- 0.01, N = 3; MIN 6.71 / MAX 21.43); 1386.93 FPS (more is better; SE +/- 1.13, N = 3)
TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2: 21857.0 microseconds (fewer is better; SE +/- 104.72, N = 3)
OpenVINO 2024.0 - Road Segmentation ADAS FP16-INT8 - Device: CPU: 17.81 ms (fewer is better; SE +/- 0.04, N = 3; MIN 8.88 / MAX 27.3); 448.28 FPS (more is better; SE +/- 1.04, N = 3)
TensorFlow Lite 2022-05-18 - Model: NASNet Mobile: 10099.3 microseconds (fewer is better; SE +/- 20.39, N = 3)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Float: 1214.11 microseconds (fewer is better; SE +/- 1.58, N = 3)
TensorFlow Lite 2022-05-18 - Model: SqueezeNet: 1716.04 microseconds (fewer is better; SE +/- 12.41, N = 3)
OpenVINO 2024.0 - Person Vehicle Bike Detection FP16 - Device: CPU: 5.53 ms (fewer is better; SE +/- 0.01, N = 3; MIN 3.88 / MAX 14.27); 1442.07 FPS (more is better; SE +/- 1.93, N = 3)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant: 1861.53 microseconds (fewer is better; SE +/- 11.38, N = 3)
OpenVINO 2024.0 - Device: CPU (latency in ms, fewer is better; throughput in FPS, more is better; N = 3 throughout):
  Road Segmentation ADAS FP16: 29.32 ms (SE +/- 0.19; MIN 13.59 / MAX 44.55); 272.36 FPS (SE +/- 1.80)
  Person Re-Identification Retail FP16: 4.46 ms (SE +/- 0.01; MIN 2.58 / MAX 13.94); 1785.48 FPS (SE +/- 2.90)
  Handwritten English Recognition FP16-INT8: 21.88 ms (SE +/- 0.07; MIN 16.69 / MAX 40.53); 729.99 FPS (SE +/- 2.28)
  Vehicle Detection FP16-INT8: 5.18 ms (SE +/- 0.02; MIN 3.09 / MAX 14.58); 1538.27 FPS (SE +/- 4.10)
  Face Detection Retail FP16-INT8: 3.61 ms (SE +/- 0.01; MIN 1.95 / MAX 11.37); 4335.66 FPS (SE +/- 18.97)
  Handwritten English Recognition FP16: 23.94 ms (SE +/- 0.06; MIN 14.82 / MAX 35.6); 667.29 FPS (SE +/- 1.79)
  Age Gender Recognition Retail 0013 FP16-INT8: 0.31 ms (SE +/- 0.00; MIN 0.16 / MAX 8.12); 46025.94 FPS (SE +/- 95.25)
  Age Gender Recognition Retail 0013 FP16: 0.45 ms (SE +/- 0.00; MIN 0.21 / MAX 7.28); 32402.42 FPS (SE +/- 75.38)
  Vehicle Detection FP16: 12.91 ms (SE +/- 0.04; MIN 5.88 / MAX 26.23); 618.41 FPS (SE +/- 1.73)
  Weld Porosity Detection FP16: 12.61 ms (SE +/- 0.03; MIN 6.44 / MAX 25.71); 1266.85 FPS (SE +/- 3.30)
  Face Detection Retail FP16: 2.53 ms (SE +/- 0.01; MIN 1.29 / MAX 10.52); 3062.63 FPS (SE +/- 11.24)
  Weld Porosity Detection FP16-INT8: 6.44 ms (SE +/- 0.02; MIN 3.29 / MAX 17.17); 2470.86 FPS (SE +/- 8.85)
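The paired OpenVINO latency and throughput figures are internally consistent: under a steady-state assumption (Little's law), throughput times latency gives the number of inference requests in flight, which for these CPU results comes out near eight, plausibly the number of parallel infer requests the runtime chose. A rough consistency check, not part of the benchmark harness, using the Machine Translation EN To DE FP16 figures above:

```python
def implied_concurrency(latency_ms, throughput_fps):
    """Requests in flight implied by mean latency and aggregate throughput,
    assuming a steady-state pipeline (Little's law: L = throughput * latency)."""
    return throughput_fps * latency_ms / 1000.0

# 65.52 ms mean latency, 121.96 FPS aggregate throughput (from the result above).
print(round(implied_concurrency(65.52, 121.96), 2))  # prints 7.99
```

The Person Detection FP16 pair (104.27 ms, 76.65 FPS) implies almost exactly the same in-flight count, which supports reading these as parallel-request runs rather than single-stream latencies.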
TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: AlexNet: 30.67 images/sec (more is better; SE +/- 0.21, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Earthgecko Skyline: 55.57 seconds (fewer is better; SE +/- 0.29, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50: 36.43 images/sec (more is better; SE +/- 0.07, N = 3)
oneDNN 3.4 - Harness: IP Shapes 1D - Engine: CPU: 1.17351 ms (fewer is better; SE +/- 0.00923, N = 10; MIN 1.01)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152: 25.64 batches/sec (more is better; SE +/- 0.33, N = 3; MIN 23.55 / MAX 26.92)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50: 42.91 batches/sec (more is better; SE +/- 0.37, N = 3; MIN 37.63 / MAX 44.8)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50: 43.49 batches/sec (more is better; SE +/- 0.47, N = 3; MIN 37.04 / MAX 45.45)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50: 43.35 batches/sec (more is better; SE +/- 0.35, N = 3; MIN 38.96 / MAX 45.4)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50: 44.08 batches/sec (more is better; SE +/- 0.26, N = 3; MIN 41.27 / MAX 45.58)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50: 44.08 batches/sec (more is better; SE +/- 0.15, N = 3; MIN 38.79 / MAX 45.8)
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 397.58 (SE +/- 2.37, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 20.09 (SE +/- 0.12, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 3.5898 (SE +/- 0.0014, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 278.31 (SE +/- 0.11, N = 3)
spaCy 3.4.1 - Model: en_core_web_trf - tokens/sec, More Is Better: 2415 (SE +/- 29.81, N = 3)
spaCy 3.4.1 - Model: en_core_web_lg - tokens/sec, More Is Better: 18557 (SE +/- 175.73, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 8.9714 (SE +/- 0.0324, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 890.19 (SE +/- 3.22, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 400.39 (SE +/- 1.37, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 19.91 (SE +/- 0.08, N = 3)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 19.16 (SE +/- 0.02, N = 3)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 417.17 (SE +/- 0.37, N = 3)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 10.19 (SE +/- 0.01, N = 3)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 98.08 (SE +/- 0.11, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 57.60 (SE +/- 0.15, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 17.36 (SE +/- 0.04, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 57.83 (SE +/- 0.09, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 17.29 (SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 236.31 (SE +/- 0.26, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 33.81 (SE +/- 0.05, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 43.78 (SE +/- 0.09, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 182.62 (SE +/- 0.39, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 36.31 (SE +/- 0.01, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 27.53 (SE +/- 0.01, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 10.80 (SE +/- 0.01, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 92.51 (SE +/- 0.10, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 10.36 (SE +/- 0.05, N = 3)
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 96.47 (SE +/- 0.48, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 69.44 (SE +/- 0.38, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 115.11 (SE +/- 0.61, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 30.24 (SE +/- 0.11, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 264.38 (SE +/- 0.96, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 30.15 (SE +/- 0.14, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 265.20 (SE +/- 1.19, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 72.04 (SE +/- 0.05, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 110.99 (SE +/- 0.07, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 11.06 (SE +/- 0.02, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 90.36 (SE +/- 0.16, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 5.7601 (SE +/- 0.0168, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 173.38 (SE +/- 0.50, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream - ms/batch, Fewer Is Better: 3.9240 (SE +/- 0.0211, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream - items/sec, More Is Better: 2031.88 (SE +/- 10.82, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 5.7436 (SE +/- 0.0188, N = 3)
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 173.90 (SE +/- 0.56, N = 3)
DeepSpeech 0.6 - Acceleration: CPU - Seconds, Fewer Is Better: 47.04 (SE +/- 0.35, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream - ms/batch, Fewer Is Better: 0.8214 (SE +/- 0.0023, N = 3)
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream - items/sec, More Is Better: 1214.24 (SE +/- 3.33, N = 3)
oneDNN 3.4 - Harness: Deconvolution Batch shapes_1d - Engine: CPU - ms, Fewer Is Better: 3.06179 (SE +/- 0.03049, N = 5; MIN: 2.28) 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 69.80 (SE +/- 0.47, N = 3; MIN: 59.21 / MAX: 72.04)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 69.97 (SE +/- 0.08, N = 3; MIN: 59.27 / MAX: 71.36)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 70.52 (SE +/- 0.19, N = 3; MIN: 60 / MAX: 71.86)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 70.63 (SE +/- 0.85, N = 3; MIN: 59.58 / MAX: 73.6)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 70.81 (SE +/- 0.55, N = 3; MIN: 59.4 / MAX: 72.65)
Mlpack Benchmark - Benchmark: scikit_ica - Seconds, Fewer Is Better: 30.12 (SE +/- 0.05, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: GoogLeNet - images/sec, More Is Better: 122.39 (SE +/- 0.15, N = 3)
Mlpack Benchmark - Benchmark: scikit_linearridgeregression - Seconds, Fewer Is Better: 1.03 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: ResNet-50 - images/sec, More Is Better: 4.25 (SE +/- 0.02, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Contextual Anomaly Detector OSE - Seconds, Fewer Is Better: 25.40 (SE +/- 0.23, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: VGG-16 - images/sec, More Is Better: 4.74 (SE +/- 0.01, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Windowed Gaussian - Seconds, Fewer Is Better: 4.984 (SE +/- 0.046, N = 15)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: AlexNet - images/sec, More Is Better: 305.81 (SE +/- 1.28, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 - batches/sec, More Is Better: 64.81 (SE +/- 0.45, N = 3; MIN: 58.56 / MAX: 67.34)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152 - batches/sec, More Is Better: 138.62 (SE +/- 0.83, N = 3; MIN: 121.16 / MAX: 142.38)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152 - batches/sec, More Is Better: 138.78 (SE +/- 0.16, N = 3; MIN: 119.06 / MAX: 141.64)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152 - batches/sec, More Is Better: 139.41 (SE +/- 0.75, N = 3; MIN: 120.33 / MAX: 143.02)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152 - batches/sec, More Is Better: 138.72 (SE +/- 0.85, N = 3; MIN: 120.45 / MAX: 142.01)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152 - batches/sec, More Is Better: 140.41 (SE +/- 1.67, N = 3; MIN: 122.18 / MAX: 145.83)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 71.98 (SE +/- 0.41, N = 3; MIN: 60.96 / MAX: 73.57)
Mlpack Benchmark - Benchmark: scikit_svm - Seconds, Fewer Is Better: 15.12 (SE +/- 0.05, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: AlexNet - images/sec, More Is Better: 224.56 (SE +/- 0.23, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: GoogLeNet - images/sec, More Is Better: 125.83 (SE +/- 0.29, N = 3)
RNNoise 2020-06-28 - Seconds, Fewer Is Better: 13.71 (SE +/- 0.16, N = 3) 1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet - images/sec, More Is Better: 148.72 (SE +/- 0.24, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Bayesian Changepoint - Seconds, Fewer Is Better: 13.21 (SE +/- 0.05, N = 3)
TNN 0.3 - Target: CPU - Model: MobileNet v2 - ms, Fewer Is Better: 183.28 (SE +/- 0.72, N = 3; MIN: 178.67 / MAX: 193.66) 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
TNN 0.3 - Target: CPU - Model: SqueezeNet v1.1 - ms, Fewer Is Better: 179.68 (SE +/- 1.36, N = 3; MIN: 176.67 / MAX: 181.68) 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: ResNet-50 - images/sec, More Is Better: 12.70 (SE +/- 0.02, N = 3)
TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: GoogLeNet - images/sec, More Is Better: 12.36 (SE +/- 0.04, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Relative Entropy - Seconds, Fewer Is Better: 8.281 (SE +/- 0.101, N = 4)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 - batches/sec, More Is Better: 137.39 (SE +/- 0.86, N = 3; MIN: 123.44 / MAX: 140.4)
TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: AlexNet - images/sec, More Is Better: 12.58 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: AlexNet - images/sec, More Is Better: 13.00 (SE +/- 0.01, N = 3)
oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU - ms, Fewer Is Better: 4.42170 (SE +/- 0.00764, N = 3; MIN: 4.19) 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 - batches/sec, More Is Better: 380.34 (SE +/- 0.44, N = 3; MIN: 332.41 / MAX: 387.86)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50 - batches/sec, More Is Better: 379.98 (SE +/- 0.25, N = 3; MIN: 282.37 / MAX: 389.84)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50 - batches/sec, More Is Better: 380.67 (SE +/- 1.52, N = 3; MIN: 329.56 / MAX: 390.17)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50 - batches/sec, More Is Better: 380.74 (SE +/- 3.53, N = 3; MIN: 326.25 / MAX: 392.43)
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50 - batches/sec, More Is Better: 383.56 (SE +/- 3.59, N = 3; MIN: 325.34 / MAX: 393.54)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth - GB/s, More Is Better: 2985.70 (SE +/- 3.00, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU - ms, Fewer Is Better: 7.16631 (SE +/- 0.01686, N = 3; MIN: 6.75) 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N - GFLOPS, More Is Better: 13212.0 (SE +/- 99.14, N = 11) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50 - batches/sec, More Is Better: 387.06 (SE +/- 3.94, N = 3; MIN: 305.77 / MAX: 401.83)
Whisper.cpp 1.4 - Model: ggml-medium.en - Input: 2016 State of the Union - Seconds, Fewer Is Better: 0.86615 (SE +/- 0.08461, N = 15) 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: GoogLeNet - images/sec, More Is Better: 47.21 (SE +/- 0.14, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback - GB/s, More Is Better: 27.07 (SE +/- 0.00, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
oneDNN 3.4 - Harness: Deconvolution Batch shapes_3d - Engine: CPU - ms, Fewer Is Better: 2.56519 (SE +/- 0.00599, N = 3; MIN: 2.37) 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
TNN 0.3 - Target: CPU - Model: SqueezeNet v2 - ms, Fewer Is Better: 42.22 (SE +/- 0.13, N = 3; MIN: 41.67 / MAX: 45.74) 1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D - GFLOPS, More Is Better: 299.47 (SE +/- 0.24, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Whisper.cpp 1.4 - Model: ggml-small.en - Input: 2016 State of the Union - Seconds, Fewer Is Better: 0.34881 (SE +/- 0.02582, N = 15) 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP - GFLOPS, More Is Better: 1292.53 (SE +/- 1.18, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad - GB/s, More Is Better: 25.46 (SE +/- 0.05, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction - GB/s, More Is Better: 388.93 (SE +/- 0.08, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download - GB/s, More Is Better: 26.83 (SE +/- 0.00, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
Whisper.cpp 1.4 - Model: ggml-base.en - Input: 2016 State of the Union - Seconds, Fewer Is Better: 0.15350 (SE +/- 0.00750, N = 15) 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash - GHash/s, More Is Better: 47.90 (SE +/- 0.04, N = 3) 1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi
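Every result line reports `SE +/- x, N = y`: the standard error of the mean across y repeated runs (sample standard deviation divided by the square root of the run count). A minimal sketch of that calculation, using illustrative values rather than numbers from this result file:

```python
import math
import statistics

def standard_error(samples):
    # Standard error of the mean: sample standard deviation / sqrt(N).
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical three-run sample; each result row reports the mean plus this SE.
runs = [43.0, 43.5, 44.0]
print(f"mean = {statistics.mean(runs):.2f}, SE +/- {standard_error(runs):.2f}")
# -> mean = 43.50, SE +/- 0.29
```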
Phoronix Test Suite v10.8.5