24.03.13.Pop.2204.ML.test1

AMD Ryzen 9 7950X 16-Core testing with an ASUS ProArt X670E-CREATOR WIFI (1710 BIOS) and Zotac NVIDIA GeForce RTX 4070 Ti 12GB on Pop 22.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2403157-NE-240313POP28

Run Management

Result Identifier: Initial test 1 No water cool
Date Run: March 13 2024
Test Duration: 2 Days, 55 Minutes


24.03.13.Pop.2204.ML.test1

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ProArt X670E-CREATOR WIFI (1710 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16 GB DDR5-4800MT/s G Skill F5-6000J3636F16G
Disk: 1000GB PNY CS2130 1TB SSD
Graphics: Zotac NVIDIA GeForce RTX 4070 Ti 12GB
Audio: NVIDIA Device 22bc
Monitor: 2 x DELL 2001FP
Network: Intel I225-V + Aquantia AQtion AQC113CS NBase-T/IEEE + MEDIATEK MT7922 802.11ax PCI
OS: Pop 22.04
Kernel: 6.6.10-76060610-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 550.54.14
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 3200x1200

24.03.13.Pop.2204.ML.test1 Benchmarks

System Logs

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa601206
- GLAMOR - BAR1 / Visible vRAM Size: 16384 MiB - vBIOS Version: 95.04.31.00.3b
- GPU Compute Cores: 7680
- Python 3.10.12
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

24.03.13.Pop.2204.ML.test1

Test | Initial test 1 No water cool
tensorflow: GPU - 256 - VGG-16 | 1.77
tensorflow: GPU - 256 - ResNet-50 | 5.56
tensorflow: GPU - 64 - VGG-16 | 1.73
tensorflow: GPU - 512 - GoogLeNet | 15.90
tensorflow: GPU - 32 - VGG-16 | 1.72
tensorflow: GPU - 256 - GoogLeNet | 15.76
shoc: OpenCL - Max SP Flops | 43074.9
tensorflow: GPU - 512 - AlexNet | 35.93
tensorflow: CPU - 256 - VGG-16 | 18.12
tensorflow: GPU - 64 - ResNet-50 | 5.51
tensorflow: GPU - 16 - VGG-16 | 1.70
tensorflow: GPU - 256 - AlexNet | 35.82
tensorflow: CPU - 256 - ResNet-50 | 36.15
tensorflow: GPU - 32 - ResNet-50 | 5.49
tensorflow: CPU - 512 - GoogLeNet | 115.70
tensorflow: GPU - 64 - GoogLeNet | 15.61
tensorflow: CPU - 64 - VGG-16 | 17.44
tensorflow: GPU - 16 - ResNet-50 | 5.42
numenta-nab: KNN CAD | 105.001
ai-benchmark: Device AI Score | 6473
ai-benchmark: Device Training Score | 3573
ai-benchmark: Device Inference Score | 2900
tensorflow: CPU - 256 - GoogLeNet | 116.33
tensorflow: GPU - 32 - GoogLeNet | 15.45
tensorflow: CPU - 32 - VGG-16 | 16.89
tensorflow: GPU - 64 - AlexNet | 34.84
tensorflow: CPU - 64 - ResNet-50 | 36.36
pytorch: CPU - 512 - Efficientnet_v2_l | 10.44
pytorch: CPU - 16 - Efficientnet_v2_l | 10.46
pytorch: CPU - 32 - Efficientnet_v2_l | 10.59
pytorch: CPU - 256 - Efficientnet_v2_l | 10.63
pytorch: CPU - 64 - Efficientnet_v2_l | 10.58
opencv: DNN - Deep Neural Network | 30277
openvino: Face Detection FP16 - CPU | 625.66
openvino: Face Detection FP16 - CPU | 12.75
tensorflow: CPU - 512 - AlexNet | 392.16
tnn: CPU - DenseNet | 2005.606
mnn: inception-v3 | 23.421
mnn: mobilenet-v1-1.0 | 2.456
mnn: MobileNetV2_224 | 3.410
mnn: SqueezeNetV1.0 | 4.141
mnn: resnet-v2-50 | 12.123
mnn: squeezenetv1.1 | 2.542
mnn: mobilenetV3 | 1.638
mnn: nasnet | 11.300
tensorflow: GPU - 16 - GoogLeNet | 15.10
numpy | 704.52
tensorflow: CPU - 16 - VGG-16 | 16.09
pytorch: CPU - 256 - ResNet-152 | 17.69
pytorch: CPU - 64 - ResNet-152 | 17.66
pytorch: CPU - 512 - ResNet-152 | 17.59
pytorch: CPU - 32 - ResNet-152 | 17.64
pytorch: CPU - 16 - ResNet-152 | 17.66
tensorflow: GPU - 32 - AlexNet | 33.39
tensorflow: CPU - 32 - ResNet-50 | 36.74
pytorch: CPU - 1 - Efficientnet_v2_l | 14.14
tensorflow: GPU - 1 - VGG-16 | 1.46
onednn: Recurrent Neural Network Training - CPU | 1452.96
onednn: Recurrent Neural Network Inference - CPU | 747.499
tensorflow: CPU - 256 - AlexNet | 388.40
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 303.3330
deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream | 26.3345
deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream | 54.0028
deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream | 18.5143
mlpack: scikit_qda | 34.07
openvino: Face Detection FP16-INT8 - CPU | 323.12
openvino: Face Detection FP16-INT8 - CPU | 24.71
ncnn: CPU - FastestDet | 4.69
ncnn: CPU - vision_transformer | 37.92
ncnn: CPU - regnety_400m | 9.87
ncnn: CPU - squeezenet_ssd | 8.49
ncnn: CPU - yolov4-tiny | 16.28
ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 9.80
ncnn: CPU - resnet50 | 13.48
ncnn: CPU - alexnet | 5.52
ncnn: CPU - resnet18 | 6.62
ncnn: CPU - vgg16 | 32.60
ncnn: CPU - googlenet | 9.56
ncnn: CPU - blazeface | 1.60
ncnn: CPU - efficientnet-b0 | 4.50
ncnn: CPU - mnasnet | 3.46
ncnn: CPU - shufflenet-v2 | 3.93
ncnn: CPU-v3-v3 - mobilenet-v3 | 3.69
ncnn: CPU-v2-v2 - mobilenet-v2 | 3.65
ncnn: CPU - mobilenet | 9.80
tensorflow: CPU - 64 - GoogLeNet | 119.04
ncnn: Vulkan GPU - FastestDet | 4.30
ncnn: Vulkan GPU - vision_transformer | 38.12
ncnn: Vulkan GPU - regnety_400m | 9.83
ncnn: Vulkan GPU - squeezenet_ssd | 8.38
ncnn: Vulkan GPU - yolov4-tiny | 15.84
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 9.36
ncnn: Vulkan GPU - resnet50 | 13.16
ncnn: Vulkan GPU - alexnet | 5.67
ncnn: Vulkan GPU - resnet18 | 6.65
ncnn: Vulkan GPU - vgg16 | 32.31
ncnn: Vulkan GPU - googlenet | 9.69
ncnn: Vulkan GPU - blazeface | 1.62
ncnn: Vulkan GPU - efficientnet-b0 | 4.57
ncnn: Vulkan GPU - mnasnet | 3.45
ncnn: Vulkan GPU - shufflenet-v2 | 3.90
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 3.72
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 3.69
ncnn: Vulkan GPU - mobilenet | 9.36
openvino: Machine Translation EN To DE FP16 - CPU | 65.52
openvino: Machine Translation EN To DE FP16 - CPU | 121.96
openvino: Person Detection FP16 - CPU | 104.27
openvino: Person Detection FP16 - CPU | 76.65
openvino: Person Detection FP32 - CPU | 103.09
openvino: Person Detection FP32 - CPU | 77.54
tensorflow-lite: Inception V4 | 21139.4
openvino: Noise Suppression Poconet-Like FP16 - CPU | 11.33
openvino: Noise Suppression Poconet-Like FP16 - CPU | 1386.93
tensorflow-lite: Inception ResNet V2 | 21857.0
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 17.81
openvino: Road Segmentation ADAS FP16-INT8 - CPU | 448.28
tensorflow-lite: NASNet Mobile | 10099.3
tensorflow-lite: Mobilenet Float | 1214.11
tensorflow-lite: SqueezeNet | 1716.04
openvino: Person Vehicle Bike Detection FP16 - CPU | 5.53
openvino: Person Vehicle Bike Detection FP16 - CPU | 1442.07
tensorflow-lite: Mobilenet Quant | 1861.53
openvino: Road Segmentation ADAS FP16 - CPU | 29.32
openvino: Road Segmentation ADAS FP16 - CPU | 272.36
openvino: Person Re-Identification Retail FP16 - CPU | 4.46
openvino: Person Re-Identification Retail FP16 - CPU | 1785.48
openvino: Handwritten English Recognition FP16-INT8 - CPU | 21.88
openvino: Handwritten English Recognition FP16-INT8 - CPU | 729.99
openvino: Vehicle Detection FP16-INT8 - CPU | 5.18
openvino: Vehicle Detection FP16-INT8 - CPU | 1538.27
openvino: Face Detection Retail FP16-INT8 - CPU | 3.61
openvino: Face Detection Retail FP16-INT8 - CPU | 4335.66
openvino: Handwritten English Recognition FP16 - CPU | 23.94
openvino: Handwritten English Recognition FP16 - CPU | 667.29
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 0.31
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU | 46025.94
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 0.45
openvino: Age Gender Recognition Retail 0013 FP16 - CPU | 32402.42
openvino: Vehicle Detection FP16 - CPU | 12.91
openvino: Vehicle Detection FP16 - CPU | 618.41
openvino: Weld Porosity Detection FP16 - CPU | 12.61
openvino: Weld Porosity Detection FP16 - CPU | 1266.85
openvino: Face Detection Retail FP16 - CPU | 2.53
openvino: Face Detection Retail FP16 - CPU | 3062.63
openvino: Weld Porosity Detection FP16-INT8 - CPU | 6.44
openvino: Weld Porosity Detection FP16-INT8 - CPU | 2470.86
tensorflow: GPU - 16 - AlexNet | 30.67
numenta-nab: Earthgecko Skyline | 55.566
tensorflow: CPU - 16 - ResNet-50 | 36.43
onednn: IP Shapes 1D - CPU | 1.17351
pytorch: CPU - 1 - ResNet-152 | 25.64
pytorch: CPU - 512 - ResNet-50 | 42.91
pytorch: CPU - 256 - ResNet-50 | 43.49
pytorch: CPU - 64 - ResNet-50 | 43.35
pytorch: CPU - 32 - ResNet-50 | 44.08
pytorch: CPU - 16 - ResNet-50 | 44.08
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 397.5797
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 20.0852
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream | 3.5898
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream | 278.3122
spacy: en_core_web_trf | 2415
spacy: en_core_web_lg | 18557
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 8.9714
deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream | 890.1894
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 400.3931
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 19.9101
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 19.1612
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream | 417.1723
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream | 10.1862
deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream | 98.0758
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 57.6038
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 17.3572
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 57.8304
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 17.2893
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 236.3144
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 33.8095
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 43.7801
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 182.6225
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream | 36.3089
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream | 27.5303
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream | 10.8023
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream | 92.5135
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 10.3590
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 96.4692
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 69.4409
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream | 115.1124
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 30.2415
deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream | 264.3773
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 30.1491
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 265.2018
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 72.0412
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 110.9914
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 11.0556
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 90.3558
deepsparse: ResNet-50, Baseline - Synchronous Single-Stream | 5.7601
deepsparse: ResNet-50, Baseline - Synchronous Single-Stream | 173.3786
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 3.9240
deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream | 2031.8775
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 5.7436
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 173.8986
deepspeech: CPU | 47.03514
deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream | 0.8214
deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream | 1214.2399
onednn: Deconvolution Batch shapes_1d - CPU | 3.06179
pytorch: NVIDIA CUDA GPU - 64 - Efficientnet_v2_l | 69.80
pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l | 69.97
pytorch: NVIDIA CUDA GPU - 32 - Efficientnet_v2_l | 70.52
pytorch: NVIDIA CUDA GPU - 16 - Efficientnet_v2_l | 70.63
pytorch: NVIDIA CUDA GPU - 256 - Efficientnet_v2_l | 70.81
mlpack: scikit_ica | 30.12
tensorflow: CPU - 32 - GoogLeNet | 122.39
mlpack: scikit_linearridgeregression | 1.03
tensorflow: GPU - 1 - ResNet-50 | 4.25
numenta-nab: Contextual Anomaly Detector OSE | 25.400
tensorflow: CPU - 1 - VGG-16 | 4.74
numenta-nab: Windowed Gaussian | 4.984
tensorflow: CPU - 64 - AlexNet | 305.81
pytorch: CPU - 1 - ResNet-50 | 64.81
pytorch: NVIDIA CUDA GPU - 256 - ResNet-152 | 138.62
pytorch: NVIDIA CUDA GPU - 16 - ResNet-152 | 138.78
pytorch: NVIDIA CUDA GPU - 32 - ResNet-152 | 139.41
pytorch: NVIDIA CUDA GPU - 64 - ResNet-152 | 138.72
pytorch: NVIDIA CUDA GPU - 512 - ResNet-152 | 140.41
pytorch: NVIDIA CUDA GPU - 1 - Efficientnet_v2_l | 71.98
mlpack: scikit_svm | 15.12
tensorflow: CPU - 32 - AlexNet | 224.56
tensorflow: CPU - 16 - GoogLeNet | 125.83
rnnoise | 13.707
tensorflow: CPU - 16 - AlexNet | 148.72
numenta-nab: Bayesian Changepoint | 13.213
tnn: CPU - MobileNet v2 | 183.277
tnn: CPU - SqueezeNet v1.1 | 179.679
tensorflow: CPU - 1 - ResNet-50 | 12.70
tensorflow: GPU - 1 - GoogLeNet | 12.36
numenta-nab: Relative Entropy | 8.281
pytorch: NVIDIA CUDA GPU - 1 - ResNet-152 | 137.39
tensorflow: GPU - 1 - AlexNet | 12.58
tensorflow: CPU - 1 - AlexNet | 13.00
onednn: IP Shapes 3D - CPU | 4.42170
pytorch: NVIDIA CUDA GPU - 256 - ResNet-50 | 380.34
pytorch: NVIDIA CUDA GPU - 64 - ResNet-50 | 379.98
pytorch: NVIDIA CUDA GPU - 16 - ResNet-50 | 380.67
pytorch: NVIDIA CUDA GPU - 32 - ResNet-50 | 380.74
pytorch: NVIDIA CUDA GPU - 512 - ResNet-50 | 383.56
shoc: OpenCL - Texture Read Bandwidth | 2985.70
onednn: Convolution Batch Shapes Auto - CPU | 7.16631
shoc: OpenCL - GEMM SGEMM_N | 13212.0
pytorch: NVIDIA CUDA GPU - 1 - ResNet-50 | 387.06
whisper-cpp: ggml-medium.en - 2016 State of the Union | 0.86615
tensorflow: CPU - 1 - GoogLeNet | 47.21
shoc: OpenCL - Bus Speed Readback | 27.0723
onednn: Deconvolution Batch shapes_3d - CPU | 2.56519
tnn: CPU - SqueezeNet v2 | 42.215
shoc: OpenCL - S3D | 299.468
whisper-cpp: ggml-small.en - 2016 State of the Union | 0.34881
shoc: OpenCL - FFT SP | 1292.53
shoc: OpenCL - Triad | 25.4632
shoc: OpenCL - Reduction | 388.934
shoc: OpenCL - Bus Speed Download | 26.8275
whisper-cpp: ggml-base.en - 2016 State of the Union | 0.15350
shoc: OpenCL - MD5 Hash | 47.8988
lczero: BLAS |

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). The Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 1.77 (SE +/- 0.00, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 5.56 (SE +/- 0.02, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 1.73 (SE +/- 0.01, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 15.90 (SE +/- 0.02, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 1.72 (SE +/- 0.01, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 15.76 (SE +/- 0.04, N = 3)
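
Each result above reports a standard error ("SE +/-") over N runs. As an illustration only, the sketch below computes a standard error of the mean as the sample standard deviation divided by sqrt(N). Whether the Phoronix Test Suite uses exactly this formula is an assumption, and the per-run values are hypothetical, because this page lists only the averaged result.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run throughputs in images/sec; NOT taken from the result file.
runs = [5.52, 5.56, 5.60]
mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} images/sec (SE +/- {se:.2f}, N = {len(runs)})")
```

A small SE relative to the mean indicates that the individual runs agreed closely with each other.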

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better)
Initial test 1 No water cool: 43074.9 (SE +/- 97.97, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

TensorFlow

TensorFlow 2.12, Device: GPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 35.93 (SE +/- 0.09, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 18.12 (SE +/- 0.01, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 5.51 (SE +/- 0.01, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 1.70 (SE +/- 0.00, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 35.82 (SE +/- 0.10, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 36.15 (SE +/- 0.00, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 5.49 (SE +/- 0.04, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 115.70 (SE +/- 0.08, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 15.61 (SE +/- 0.04, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 64 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 17.44 (SE +/- 0.07, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 5.42 (SE +/- 0.01, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Numenta Anomaly Benchmark 1.1, Detector: KNN CAD (Seconds, Fewer Is Better)
Initial test 1 No water cool: 105.00 (SE +/- 0.86, N = 9)

AI Benchmark Alpha

AI Benchmark Alpha is a Python library for evaluating artificial intelligence (AI) performance on diverse hardware platforms and relies upon the TensorFlow machine learning library. Learn more via the OpenBenchmarking.org test page.

AI Benchmark Alpha 0.1.2, Device AI Score (Score, More Is Better)
Initial test 1 No water cool: 6473

AI Benchmark Alpha 0.1.2, Device Training Score (Score, More Is Better)
Initial test 1 No water cool: 3573

AI Benchmark Alpha 0.1.2, Device Inference Score (Score, More Is Better)
Initial test 1 No water cool: 2900

TensorFlow

TensorFlow 2.12, Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 116.33 (SE +/- 0.41, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 32 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 15.45 (SE +/- 0.03, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 32 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 16.89 (SE +/- 0.03, N = 3)

TensorFlow 2.12, Device: GPU - Batch Size: 64 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 34.84 (SE +/- 0.12, N = 3)

TensorFlow 2.12, Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 36.36 (SE +/- 0.02, N = 3)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile caters to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.1, Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 10.44 (SE +/- 0.10, N = 3; MIN: 8.67 / MAX: 11.33)

PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 10.46 (SE +/- 0.07, N = 3; MIN: 8.62 / MAX: 11.22)

PyTorch 2.1, Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 10.59 (SE +/- 0.06, N = 3; MIN: 8.62 / MAX: 11.44)

PyTorch 2.1, Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 10.63 (SE +/- 0.08, N = 3; MIN: 8.45 / MAX: 11.32)

PyTorch 2.1, Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 10.58 (SE +/- 0.05, N = 3; MIN: 8.79 / MAX: 11.4)

OpenCV

This is a benchmark of the OpenCV (Computer Vision) library's built-in performance tests. Learn more via the OpenBenchmarking.org test page.

OpenCV 4.7, Test: DNN - Deep Neural Network (ms, Fewer Is Better)
Initial test 1 No water cool: 30277 (SE +/- 360.44, N = 15)
1. (CXX) g++ options: -fPIC -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -shared

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0, Model: Face Detection FP16 - Device: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 625.66 (SE +/- 5.40, N = 7; MIN: 442.72 / MAX: 668.18)

OpenVINO 2024.0, Model: Face Detection FP16 - Device: CPU (FPS, More Is Better)
Initial test 1 No water cool: 12.75 (SE +/- 0.12, N = 7)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
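
The two OpenVINO figures above, latency in ms and throughput in FPS, can be related through Little's law (items in flight = throughput x latency). The sketch below applies it to the Face Detection FP16 values. Reading the product as the number of inference requests processed in parallel is an assumption; the result page does not state how many requests the benchmark keeps in flight, and `implied_concurrency` is a hypothetical helper name.

```python
def implied_concurrency(latency_ms, throughput_fps):
    """Little's law: average items in flight = throughput * latency."""
    return throughput_fps * (latency_ms / 1000.0)


# Face Detection FP16 on CPU, as reported above: 625.66 ms and 12.75 FPS.
in_flight = implied_concurrency(latency_ms=625.66, throughput_fps=12.75)
print(f"about {in_flight:.1f} requests in flight on average")
```

This kind of check helps explain why a per-request latency of several hundred milliseconds can coexist with a throughput well above 1000 / latency frames per second.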

TensorFlow

TensorFlow 2.12, Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 392.16 (SE +/- 2.15, N = 3)

TNN

TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

TNN 0.3, Target: CPU - Model: DenseNet (ms, Fewer Is Better)
Initial test 1 No water cool: 2005.61 (SE +/- 7.45, N = 3; MIN: 1929.36 / MAX: 2112.89)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

Mobile Neural Network

MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. This MNN test profile builds the OpenMP / CPU threaded version for processor benchmarking, not any GPU-accelerated test. MNN does allow making use of AVX-512 extensions. Learn more via the OpenBenchmarking.org test page.

Mobile Neural Network 2.1, Model: inception-v3 (ms, Fewer Is Better)
Initial test 1 No water cool: 23.42 (SE +/- 0.58, N = 3; MIN: 20.69 / MAX: 54.3)

Mobile Neural Network 2.1, Model: mobilenet-v1-1.0 (ms, Fewer Is Better)
Initial test 1 No water cool: 2.456 (SE +/- 0.038, N = 3; MIN: 2.27 / MAX: 6.46)

Mobile Neural Network 2.1, Model: MobileNetV2_224 (ms, Fewer Is Better)
Initial test 1 No water cool: 3.410 (SE +/- 0.049, N = 3; MIN: 3.18 / MAX: 12.53)

Mobile Neural Network 2.1, Model: SqueezeNetV1.0 (ms, Fewer Is Better)
Initial test 1 No water cool: 4.141 (SE +/- 0.111, N = 3; MIN: 3.72 / MAX: 10.49)

Mobile Neural Network 2.1, Model: resnet-v2-50 (ms, Fewer Is Better)
Initial test 1 No water cool: 12.12 (SE +/- 0.04, N = 3; MIN: 11.31 / MAX: 29.79)

Mobile Neural Network 2.1, Model: squeezenetv1.1 (ms, Fewer Is Better)
Initial test 1 No water cool: 2.542 (SE +/- 0.046, N = 3; MIN: 2.3 / MAX: 9.08)

Mobile Neural Network 2.1, Model: mobilenetV3 (ms, Fewer Is Better)
Initial test 1 No water cool: 1.638 (SE +/- 0.026, N = 3; MIN: 1.47 / MAX: 4.85)

Mobile Neural Network 2.1, Model: nasnet (ms, Fewer Is Better)
Initial test 1 No water cool: 11.30 (SE +/- 0.10, N = 3; MIN: 10.49 / MAX: 27.45)

1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

TensorFlow

TensorFlow 2.12, Device: GPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 15.10 (SE +/- 0.05, N = 3)

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
Initial test 1 No water cool: 704.52 (SE +/- 7.50, N = 3)

TensorFlow

TensorFlow 2.12, Device: CPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 16.09 (SE +/- 0.12, N = 3)

Device: GPU - Batch Size: 512 - Model: VGG-16

Initial test 1 No water cool: The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault

PyTorch

PyTorch 2.1, Device: CPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 17.69 (SE +/- 0.04, N = 3; MIN: 14.61 / MAX: 18.11)

PyTorch 2.1, Device: CPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 17.66 (SE +/- 0.23, N = 3; MIN: 14.14 / MAX: 18.38)

PyTorch 2.1, Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 17.59 (SE +/- 0.10, N = 3; MIN: 16.92 / MAX: 18.27)

PyTorch 2.1, Device: CPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 17.64 (SE +/- 0.10, N = 3; MIN: 14.95 / MAX: 18.12)

PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 17.66 (SE +/- 0.11, N = 3; MIN: 17.14 / MAX: 18.3)

TensorFlow

TensorFlow 2.12 - Device: GPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 33.39 (SE +/- 0.12, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 36.74 (SE +/- 0.05, N = 3)

PyTorch

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 14.14 (SE +/- 0.09, N = 3, MIN: 12.35 / MAX: 14.45)

TensorFlow

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 1.46 (SE +/- 0.00, N = 3)

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 1452.96 (SE +/- 7.60, N = 3, MIN: 1391.34)

oneDNN 3.4 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 747.50 (SE +/- 1.15, N = 3, MIN: 714.81)

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 388.40 (SE +/- 3.29, N = 3)

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream
Initial test 1 No water cool:
ms/batch, Fewer Is Better: 303.33 (SE +/- 0.41, N = 3)
items/sec, More Is Better: 26.33 (SE +/- 0.04, N = 3)

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream
Initial test 1 No water cool:
ms/batch, Fewer Is Better: 54.00 (SE +/- 0.12, N = 3)
items/sec, More Is Better: 18.51 (SE +/- 0.04, N = 3)
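The DeepSparse latency (ms/batch) and throughput (items/sec) figures can be cross-checked against each other. The sketch below is illustrative Python, not DeepSparse code: the function names are hypothetical, and it assumes each batch holds a single item, which the result file does not state. Under that assumption a single stream should deliver about 1000 / latency items per second, and for the multi-stream scenario the product of throughput and latency gives a rough estimate of how many batches were in flight at once.

```python
def throughput_from_latency(latency_ms: float, concurrency: float = 1.0) -> float:
    """Items per second for a given per-batch latency (assumes 1 item per batch)."""
    return concurrency * 1000.0 / latency_ms


def implied_concurrency(latency_ms: float, items_per_sec: float) -> float:
    """Estimate of batches in flight: throughput multiplied by latency in seconds."""
    return items_per_sec * latency_ms / 1000.0


# Values copied from the BERT-Large results in this file.
sync_latency_ms, sync_items_per_sec = 54.00, 18.51
async_latency_ms, async_items_per_sec = 303.33, 26.33

print(f"single-stream throughput predicted from latency: "
      f"{throughput_from_latency(sync_latency_ms):.2f} items/sec "
      f"(reported: {sync_items_per_sec})")
print(f"multi-stream implied concurrency: "
      f"{implied_concurrency(async_latency_ms, async_items_per_sec):.2f}")
```

If the predicted single-stream throughput were far from the reported figure, that would hint that the one-item-per-batch assumption does not hold for this test profile.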

Mlpack Benchmark

Mlpack benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Mlpack Benchmark - Benchmark: scikit_qda (Seconds, Fewer Is Better)
Initial test 1 No water cool: 34.07 (SE +/- 0.25, N = 3)

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2024.0 - Model: Face Detection FP16-INT8 - Device: CPU
Initial test 1 No water cool:
ms, Fewer Is Better: 323.12 (SE +/- 0.74, N = 3, MIN: 174.68 / MAX: 361.95)
FPS, More Is Better: 24.71 (SE +/- 0.06, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

NCNN 20230517 - Target: CPU (ms, Fewer Is Better)
Initial test 1 No water cool:
Model: FastestDet - 4.69 (SE +/- 0.11, N = 3, MIN: 4.25 / MAX: 7.77)
Model: vision_transformer - 37.92 (SE +/- 0.43, N = 3, MIN: 34.53 / MAX: 51.03)
Model: regnety_400m - 9.87 (SE +/- 0.06, N = 3, MIN: 9.16 / MAX: 15.63)
Model: squeezenet_ssd - 8.49 (SE +/- 0.16, N = 3, MIN: 7.41 / MAX: 14.78)
Model: yolov4-tiny - 16.28 (SE +/- 0.33, N = 3, MIN: 14.44 / MAX: 34.48)
Model: mobilenetv2-yolov3 - 9.80 (SE +/- 0.08, N = 3, MIN: 8.81 / MAX: 23.96)
Model: resnet50 - 13.48 (SE +/- 0.34, N = 3, MIN: 11.93 / MAX: 33.06)
Model: alexnet - 5.52 (SE +/- 0.01, N = 3, MIN: 5.06 / MAX: 10.11)
Model: resnet18 - 6.62 (SE +/- 0.03, N = 3, MIN: 5.93 / MAX: 11.83)
Model: vgg16 - 32.60 (SE +/- 0.24, N = 3, MIN: 30.13 / MAX: 77.85)
Model: googlenet - 9.56 (SE +/- 0.02, N = 3, MIN: 8.74 / MAX: 18.1)
Model: blazeface - 1.60 (SE +/- 0.01, N = 3, MIN: 1.48 / MAX: 6.71)
Model: efficientnet-b0 - 4.50 (SE +/- 0.02, N = 3, MIN: 4.16 / MAX: 8.33)
Model: mnasnet - 3.46 (SE +/- 0.01, N = 3, MIN: 3.18 / MAX: 18.79)
Model: shufflenet-v2 - 3.93 (SE +/- 0.01, N = 3, MIN: 3.65 / MAX: 7.51)
Model: mobilenet-v3 - 3.69 (SE +/- 0.03, N = 3, MIN: 3.42 / MAX: 8.19)
Model: mobilenet-v2 - 3.65 (SE +/- 0.02, N = 3, MIN: 3.37 / MAX: 7.36)
Model: mobilenet - 9.80 (SE +/- 0.08, N = 3, MIN: 8.81 / MAX: 23.96)

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 119.04 (SE +/- 0.13, N = 3)

NCNN

NCNN 20230517 - Target: Vulkan GPU (ms, Fewer Is Better)
Initial test 1 No water cool:
Model: FastestDet - 4.30 (SE +/- 0.11, N = 3, MIN: 3.94 / MAX: 7.54)
Model: vision_transformer - 38.12 (SE +/- 0.62, N = 3, MIN: 34.66 / MAX: 105.14)
Model: regnety_400m - 9.83 (SE +/- 0.05, N = 3, MIN: 9.06 / MAX: 26.47)
Model: squeezenet_ssd - 8.38 (SE +/- 0.04, N = 3, MIN: 7.63 / MAX: 22.04)
Model: yolov4-tiny - 15.84 (SE +/- 0.04, N = 3, MIN: 14.55 / MAX: 30.6)
Model: mobilenetv2-yolov3 - 9.36 (SE +/- 0.03, N = 3, MIN: 8.66 / MAX: 15.56)
Model: resnet50 - 13.16 (SE +/- 0.06, N = 3, MIN: 11.9 / MAX: 19.89)
Model: alexnet - 5.67 (SE +/- 0.16, N = 3, MIN: 5.06 / MAX: 11.09)
Model: resnet18 - 6.65 (SE +/- 0.03, N = 3, MIN: 5.9 / MAX: 23.01)
Model: vgg16 - 32.31 (SE +/- 0.15, N = 3, MIN: 29.89 / MAX: 88.74)
Model: googlenet - 9.69 (SE +/- 0.08, N = 3, MIN: 8.75 / MAX: 15.08)
Model: blazeface - 1.62 (SE +/- 0.03, N = 3, MIN: 1.47 / MAX: 14.36)
Model: efficientnet-b0 - 4.57 (SE +/- 0.06, N = 3, MIN: 4.16 / MAX: 9.03)
Model: mnasnet - 3.45 (SE +/- 0.00, N = 3, MIN: 3.2 / MAX: 7.9)
Model: shufflenet-v2 - 3.90 (SE +/- 0.02, N = 3, MIN: 3.65 / MAX: 8.48)
Model: mobilenet-v3 - 3.72 (SE +/- 0.03, N = 3, MIN: 3.43 / MAX: 15.14)
Model: mobilenet-v2 - 3.69 (SE +/- 0.03, N = 3, MIN: 3.36 / MAX: 25.98)
Model: mobilenet - 9.36 (SE +/- 0.03, N = 3, MIN: 8.66 / MAX: 15.56)

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenVINO

OpenVINO 2024.0 - Device: CPU
Initial test 1 No water cool:

Model: Machine Translation EN To DE FP16
ms, Fewer Is Better: 65.52 (SE +/- 0.08, N = 3, MIN: 34.28 / MAX: 85.73)
FPS, More Is Better: 121.96 (SE +/- 0.16, N = 3)

Model: Person Detection FP16
ms, Fewer Is Better: 104.27 (SE +/- 0.46, N = 3, MIN: 66.39 / MAX: 133.39)
FPS, More Is Better: 76.65 (SE +/- 0.34, N = 3)

Model: Person Detection FP32
ms, Fewer Is Better: 103.09 (SE +/- 1.01, N = 3, MIN: 50.34 / MAX: 135.14)
FPS, More Is Better: 77.54 (SE +/- 0.77, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

TensorFlow Lite

This is a benchmark of the TensorFlow Lite implementation focused on TensorFlow machine learning for mobile, IoT, edge, and other cases. The current Linux support is limited to running on CPUs. This test profile is measuring the average inference time. Learn more via the OpenBenchmarking.org test page.

TensorFlow Lite 2022-05-18 - Model: Inception V4 (Microseconds, Fewer Is Better)
Initial test 1 No water cool: 21139.4 (SE +/- 19.27, N = 3)
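TensorFlow Lite reports average inference time in microseconds, while the NCNN and OpenVINO results in this file use milliseconds. A small conversion makes the figures easier to read side by side. This is a hedged Python sketch with hypothetical helper names; the inferences-per-second figure assumes inferences run strictly one after another, which the test profile description does not confirm.

```python
def microseconds_to_ms(value_us: float) -> float:
    """Convert an average inference time from microseconds to milliseconds."""
    return value_us / 1000.0


def serial_inferences_per_sec(value_us: float) -> float:
    """Reciprocal of the average inference time (assumes no overlap between inferences)."""
    return 1_000_000.0 / value_us


# Inception V4 average inference time copied from this result file.
inception_v4_us = 21139.4

print(f"{microseconds_to_ms(inception_v4_us):.2f} ms per inference")
print(f"{serial_inferences_per_sec(inception_v4_us):.1f} inferences/sec (serial assumption)")
```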

OpenVINO

OpenVINO 2024.0 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU
Initial test 1 No water cool:
ms, Fewer Is Better: 11.33 (SE +/- 0.01, N = 3, MIN: 6.71 / MAX: 21.43)
FPS, More Is Better: 1386.93 (SE +/- 1.13, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

TensorFlow Lite

TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
Initial test 1 No water cool: 21857.0 (SE +/- 104.72, N = 3)

OpenVINO

OpenVINO 2024.0 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU
Initial test 1 No water cool:
ms, Fewer Is Better: 17.81 (SE +/- 0.04, N = 3, MIN: 8.88 / MAX: 27.3)
FPS, More Is Better: 448.28 (SE +/- 1.04, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

TensorFlow Lite

TensorFlow Lite 2022-05-18 (Microseconds, Fewer Is Better)
Initial test 1 No water cool:
Model: NASNet Mobile - 10099.3 (SE +/- 20.39, N = 3)
Model: Mobilenet Float - 1214.11 (SE +/- 1.58, N = 3)
Model: SqueezeNet - 1716.04 (SE +/- 12.41, N = 3)

OpenVINO

OpenVINO 2024.0 - Model: Person Vehicle Bike Detection FP16 - Device: CPU
Initial test 1 No water cool:
ms, Fewer Is Better: 5.53 (SE +/- 0.01, N = 3, MIN: 3.88 / MAX: 14.27)
FPS, More Is Better: 1442.07 (SE +/- 1.93, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

TensorFlow Lite

TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant (Microseconds, Fewer Is Better)
Initial test 1 No water cool: 1861.53 (SE +/- 11.38, N = 3)

OpenVINO

OpenVINO 2024.0 - Device: CPU
Initial test 1 No water cool:

Model: Road Segmentation ADAS FP16
ms, Fewer Is Better: 29.32 (SE +/- 0.19, N = 3, MIN: 13.59 / MAX: 44.55)
FPS, More Is Better: 272.36 (SE +/- 1.80, N = 3)

Model: Person Re-Identification Retail FP16
ms, Fewer Is Better: 4.46 (SE +/- 0.01, N = 3, MIN: 2.58 / MAX: 13.94)
FPS, More Is Better: 1785.48 (SE +/- 2.90, N = 3)

Model: Handwritten English Recognition FP16-INT8
ms, Fewer Is Better: 21.88 (SE +/- 0.07, N = 3, MIN: 16.69 / MAX: 40.53)
FPS, More Is Better: 729.99 (SE +/- 2.28, N = 3)

Model: Vehicle Detection FP16-INT8
ms, Fewer Is Better: 5.18 (SE +/- 0.02, N = 3, MIN: 3.09 / MAX: 14.58)
FPS, More Is Better: 1538.27 (SE +/- 4.10, N = 3)

Model: Face Detection Retail FP16-INT8
ms, Fewer Is Better: 3.61 (SE +/- 0.01, N = 3, MIN: 1.95 / MAX: 11.37)
FPS, More Is Better: 4335.66 (SE +/- 18.97, N = 3)

Model: Handwritten English Recognition FP16
ms, Fewer Is Better: 23.94 (SE +/- 0.06, N = 3, MIN: 14.82 / MAX: 35.6)
FPS, More Is Better: 667.29 (SE +/- 1.79, N = 3)

Model: Age Gender Recognition Retail 0013 FP16-INT8
ms, Fewer Is Better: 0.31 (SE +/- 0.00, N = 3, MIN: 0.16 / MAX: 8.12)
FPS, More Is Better: 46025.94 (SE +/- 95.25, N = 3)

Model: Age Gender Recognition Retail 0013 FP16
ms, Fewer Is Better: 0.45 (SE +/- 0.00, N = 3, MIN: 0.21 / MAX: 7.28)
FPS, More Is Better: 32402.42 (SE +/- 75.38, N = 3)

Model: Vehicle Detection FP16
ms, Fewer Is Better: 12.91 (SE +/- 0.04, N = 3, MIN: 5.88 / MAX: 26.23)
FPS, More Is Better: 618.41 (SE +/- 1.73, N = 3)

Model: Weld Porosity Detection FP16
ms, Fewer Is Better: 12.61 (SE +/- 0.03, N = 3, MIN: 6.44 / MAX: 25.71)
FPS, More Is Better: 1266.85 (SE +/- 3.30, N = 3)

Model: Face Detection Retail FP16
ms, Fewer Is Better: 2.53 (SE +/- 0.01, N = 3, MIN: 1.29 / MAX: 10.52)
FPS, More Is Better: 3062.63 (SE +/- 11.24, N = 3)

Model: Weld Porosity Detection FP16-INT8
ms, Fewer Is Better: 6.44 (SE +/- 0.02, N = 3, MIN: 3.29 / MAX: 17.17)
FPS, More Is Better: 2470.86 (SE +/- 8.85, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl

TensorFlow

Device: GPU - Batch Size: 512 - Model: ResNet-50

Initial test 1 No water cool: The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault

TensorFlow 2.12 - Device: GPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 30.67 (SE +/- 0.21, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating anomaly detection algorithms in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Numenta Anomaly Benchmark 1.1 - Detector: Earthgecko Skyline (Seconds, Fewer Is Better)
Initial test 1 No water cool: 55.57 (SE +/- 0.29, N = 3)

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 36.43 (SE +/- 0.07, N = 3)

oneDNN

oneDNN 3.4 - Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 1.17351 (SE +/- 0.00923, N = 10, MIN: 1.01)

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

This is a benchmark of PyTorch using pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is geared toward CPU-based testing. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 25.64 (SE +/- 0.33, N = 3; MIN: 23.55 / MAX: 26.92)

PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 42.91 (SE +/- 0.37, N = 3; MIN: 37.63 / MAX: 44.8)

PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 43.49 (SE +/- 0.47, N = 3; MIN: 37.04 / MAX: 45.45)

PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 43.35 (SE +/- 0.35, N = 3; MIN: 38.96 / MAX: 45.4)

PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 44.08 (SE +/- 0.26, N = 3; MIN: 41.27 / MAX: 45.58)

PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 44.08 (SE +/- 0.15, N = 3; MIN: 38.79 / MAX: 45.8)

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 397.58 (SE +/- 2.37, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 20.09 (SE +/- 0.12, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 3.5898 (SE +/- 0.0014, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 278.31 (SE +/- 0.11, N = 3)
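The DeepSparse results come in pairs: a latency in ms/batch and a throughput in items/sec for the same model and scenario. For the Synchronous Single-Stream scenario the two should be roughly reciprocal. The sketch below checks that against figures reported in this file, assuming a batch size of 1 and one request in flight at a time; neither is stated in the result file, and the helper name is illustrative only.

```python
def items_per_sec_from_latency(ms_per_batch, batch_size=1):
    """Throughput implied by a per-batch latency with one batch in flight."""
    return batch_size * 1000.0 / ms_per_batch


# (model, reported ms/batch, reported items/sec) for Synchronous Single-Stream
# results reported in this file.
sync_results = [
    ("BERT base uncased SST2, Sparse INT8", 3.5898, 278.31),
    ("BERT-Large, NLP Question Answering, Sparse INT8", 10.19, 98.08),
    ("oBERT base uncased on IMDB", 57.60, 17.36),
]

relative_gaps = []
for model, ms, reported in sync_results:
    implied = items_per_sec_from_latency(ms)  # batch_size=1 is an assumption
    gap = abs(implied - reported) / reported
    relative_gaps.append(gap)
    print(f"{model}: implied {implied:.2f} vs reported {reported:.2f} items/sec "
          f"({gap:.2%} apart)")
```

Under these assumptions the implied and reported throughputs agree to within a fraction of a percent, which is a quick sanity check that a latency/throughput pair was parsed correctly.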

spaCy

The spaCy library is an open-source solution for advanced natural language processing (NLP). It is written for Python and is a leading natural language processing library. This test profile times the spaCy CPU performance with various models. Learn more via the OpenBenchmarking.org test page.

spaCy 3.4.1 - Model: en_core_web_trf (tokens/sec, More Is Better)
Initial test 1 No water cool: 2415 (SE +/- 29.81, N = 3)

spaCy 3.4.1 - Model: en_core_web_lg (tokens/sec, More Is Better)
Initial test 1 No water cool: 18557 (SE +/- 175.73, N = 3)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 8.9714 (SE +/- 0.0324, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 890.19 (SE +/- 3.22, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 400.39 (SE +/- 1.37, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 19.91 (SE +/- 0.08, N = 3)

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 19.16 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 417.17 (SE +/- 0.37, N = 3)

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 10.19 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 98.08 (SE +/- 0.11, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 57.60 (SE +/- 0.15, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 17.36 (SE +/- 0.04, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 57.83 (SE +/- 0.09, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 17.29 (SE +/- 0.03, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 236.31 (SE +/- 0.26, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 33.81 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 43.78 (SE +/- 0.09, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 182.62 (SE +/- 0.39, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 36.31 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 27.53 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 10.80 (SE +/- 0.01, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 92.51 (SE +/- 0.10, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 10.36 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 96.47 (SE +/- 0.48, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 69.44 (SE +/- 0.38, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 115.11 (SE +/- 0.61, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 30.24 (SE +/- 0.11, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 264.38 (SE +/- 0.96, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 30.15 (SE +/- 0.14, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 265.20 (SE +/- 1.19, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 72.04 (SE +/- 0.05, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 110.99 (SE +/- 0.07, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 11.06 (SE +/- 0.02, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 90.36 (SE +/- 0.16, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 5.7601 (SE +/- 0.0168, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 173.38 (SE +/- 0.50, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 3.9240 (SE +/- 0.0211, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 2031.88 (SE +/- 10.82, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 5.7436 (SE +/- 0.0188, N = 3)

Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 173.90 (SE +/- 0.56, N = 3)
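In the Asynchronous Multi-Stream results, throughput is far higher than the reciprocal of the per-batch latency, because several requests are in flight at once. As a rough sketch, Little's law (average number in flight = throughput x latency) can be applied to the reported pairs to estimate that concurrency. The figures are taken from this file; the interpretation as a stream count is an inference, not something the result file states, and the helper name is illustrative.

```python
def implied_concurrency(items_per_sec, ms_per_batch):
    """Little's law: average requests in flight = throughput * latency."""
    return items_per_sec * (ms_per_batch / 1000.0)


# (model, reported ms/batch, reported items/sec) for Asynchronous Multi-Stream
# results reported in this file.
async_results = [
    ("oBERT base uncased on IMDB", 397.58, 20.09),
    ("BERT-Large, NLP Question Answering, Sparse INT8", 19.16, 417.17),
    ("ResNet-50, Baseline", 30.24, 264.38),
    ("YOLOv5s COCO, Sparse INT8", 69.44, 115.11),
]

estimates = []
for model, ms, items in async_results:
    in_flight = implied_concurrency(items, ms)
    estimates.append(in_flight)
    print(f"{model}: about {in_flight:.2f} requests in flight")
```

Every pair works out to roughly eight requests in flight, which suggests (but does not prove) that the multi-stream scenario ran with about eight concurrent streams on this 16-core system.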

DeepSpeech

Mozilla DeepSpeech is a speech-to-text engine powered by TensorFlow for machine learning and derived from Baidu's Deep Speech research paper. This test profile times the speech-to-text process for a roughly three-minute audio recording. Learn more via the OpenBenchmarking.org test page.

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
Initial test 1 No water cool: 47.04 (SE +/- 0.35, N = 3)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
Initial test 1 No water cool: 0.8214 (SE +/- 0.0023, N = 3)

Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
Initial test 1 No water cool: 1214.24 (SE +/- 3.33, N = 3)

TensorFlow

Device: CPU - Batch Size: 512 - Model: VGG-16

Initial test 1 No water cool: The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault
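The failure above pairs a non-zero exit status with a segmentation fault reported by Python. As an illustration only (this is not how the Phoronix Test Suite itself classifies failures), the sketch below shows one way a Python harness might describe a child process's return code, relying on the documented behavior of the `subprocess` module on POSIX, where a negative return code means the child was terminated by that signal number. The function name is hypothetical.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess return code in human-readable form (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports "killed by signal N" as the return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


# A child process killed by a segmentation fault would be reported like this:
print(describe_exit(-signal.SIGSEGV))
print(describe_exit(1))
```

In practice the value passed in would come from something like `subprocess.run(...).returncode`; logging the decoded form makes crashes such as this one easier to tell apart from ordinary error exits.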

oneDNN

oneDNN 3.4 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 3.06179 (SE +/- 0.03049, N = 5; MIN: 2.28)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 69.80 (SE +/- 0.47, N = 3; MIN: 59.21 / MAX: 72.04)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 69.97 (SE +/- 0.08, N = 3; MIN: 59.27 / MAX: 71.36)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 70.52 (SE +/- 0.19, N = 3; MIN: 60 / MAX: 71.86)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 70.63 (SE +/- 0.85, N = 3; MIN: 59.58 / MAX: 73.6)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 70.81 (SE +/- 0.55, N = 3; MIN: 59.4 / MAX: 72.65)

Mlpack Benchmark

Mlpack benchmark scripts for machine learning libraries. Learn more via the OpenBenchmarking.org test page.

Mlpack Benchmark - Benchmark: scikit_ica (Seconds, Fewer Is Better)
Initial test 1 No water cool: 30.12 (SE +/- 0.05, N = 3)

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 122.39 (SE +/- 0.15, N = 3)

Device: CPU - Batch Size: 512 - Model: ResNet-50

Initial test 1 No water cool: The test quit with a non-zero exit status. E: Fatal Python error: Segmentation fault

Mlpack Benchmark

Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
Initial test 1 No water cool: 1.03 (SE +/- 0.00, N = 3)

TensorFlow

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 4.25 (SE +/- 0.02, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Contextual Anomaly Detector OSE (Seconds, Fewer Is Better)
Initial test 1 No water cool: 25.40 (SE +/- 0.23, N = 3)

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better)
Initial test 1 No water cool: 4.74 (SE +/- 0.01, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Windowed Gaussian (Seconds, Fewer Is Better)
Initial test 1 No water cool: 4.984 (SE +/- 0.046, N = 15)

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 305.81 (SE +/- 1.28, N = 3)

PyTorch

PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 64.81 (SE +/- 0.45, N = 3; MIN: 58.56 / MAX: 67.34)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 138.62 (SE +/- 0.83, N = 3; MIN: 121.16 / MAX: 142.38)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 138.78 (SE +/- 0.16, N = 3; MIN: 119.06 / MAX: 141.64)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 139.41 (SE +/- 0.75, N = 3; MIN: 120.33 / MAX: 143.02)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 138.72 (SE +/- 0.85, N = 3; MIN: 120.45 / MAX: 142.01)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 140.41 (SE +/- 1.67, N = 3; MIN: 122.18 / MAX: 145.83)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better)
Initial test 1 No water cool: 71.98 (SE +/- 0.41, N = 3; MIN: 60.96 / MAX: 73.57)

Mlpack Benchmark

Mlpack Benchmark - Benchmark: scikit_svm (Seconds, Fewer Is Better)
Initial test 1 No water cool: 15.12 (SE +/- 0.05, N = 3)

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 224.56 (SE +/- 0.23, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 125.83 (SE +/- 0.29, N = 3)

RNNoise

RNNoise is a recurrent neural network for audio noise reduction developed by Mozilla and Xiph.Org. This test profile is a single-threaded test measuring the time to denoise a sample 26-minute-long 16-bit RAW audio file using this recurrent neural network noise suppression library. Learn more via the OpenBenchmarking.org test page.

RNNoise 2020-06-28 (Seconds, Fewer Is Better)
Initial test 1 No water cool: 13.71 (SE +/- 0.16, N = 3)
(CC) gcc options: -O2 -pedantic -fvisibility=hidden
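Because the RNNoise test processes an audio file of known length, the elapsed time can be turned into a speed relative to real time. The sketch below does that arithmetic with the 26-minute duration from the test description and the 13.71-second result reported here; the function name is illustrative, and the duration is treated as exactly 26 minutes, which is an approximation.

```python
def speed_vs_real_time(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


audio_seconds = 26 * 60        # 26-minute sample, per the test description
processing_seconds = 13.71     # reported RNNoise result in this file

speedup = speed_vs_real_time(audio_seconds, processing_seconds)
print(f"RNNoise ran at roughly {speedup:.0f}x real time on a single thread")
```

A value above 1 means the denoiser keeps up with live audio; here the margin is about two orders of magnitude.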

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 148.72 (SE +/- 0.24, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Bayesian Changepoint (Seconds, Fewer Is Better)
Initial test 1 No water cool: 13.21 (SE +/- 0.05, N = 3)

TNN

TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

TNN 0.3 - Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)
Initial test 1 No water cool: 183.28 (SE +/- 0.72, N = 3; MIN: 178.67 / MAX: 193.66)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN 0.3 - Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)
Initial test 1 No water cool: 179.68 (SE +/- 1.36, N = 3; MIN: 176.67 / MAX: 181.68)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TensorFlow

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better)
Initial test 1 No water cool: 12.70 (SE +/- 0.02, N = 3)

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better)
Initial test 1 No water cool: 12.36 (SE +/- 0.04, N = 3)

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Relative Entropy (Seconds, Fewer Is Better)
Initial test 1 No water cool: 8.281 (SE +/- 0.101, N = 4)

PyTorch

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, More Is Better)
Initial test 1 No water cool: 137.39 (SE +/- 0.86, N = 3; MIN: 123.44 / MAX: 140.4)

TensorFlow

TensorFlow 2.12 - Device: GPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 12.58 (SE +/- 0.01, N = 3)

TensorFlow 2.12 - Device: CPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better)
Initial test 1 No water cool: 13.00 (SE +/- 0.01, N = 3)

oneDNN

oneDNN 3.4 - Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 4.42170 (SE +/- 0.00764, N = 3; MIN: 4.19)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 380.34 (SE +/- 0.44, N = 3; MIN: 332.41 / MAX: 387.86)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 64 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 379.98 (SE +/- 0.25, N = 3; MIN: 282.37 / MAX: 389.84)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 380.67 (SE +/- 1.52, N = 3; MIN: 329.56 / MAX: 390.17)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 380.74 (SE +/- 3.53, N = 3; MIN: 326.25 / MAX: 392.43)

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 383.56 (SE +/- 3.59, N = 3; MIN: 325.34 / MAX: 393.54)

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
Initial test 1 No water cool: 2985.70 (SE +/- 3.00, N = 3)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

oneDNN

oneDNN 3.4 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better)
Initial test 1 No water cool: 7.16631 (SE +/- 0.01686, N = 3; MIN: 6.75)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

SHOC Scalable HeterOgeneous Computing

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
Initial test 1 No water cool: 13212.0 (SE +/- 99.14, N = 11)
(CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

PyTorch

PyTorch 2.1 - Device: NVIDIA CUDA GPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better)
Initial test 1 No water cool: 387.06 (SE +/- 3.94, N = 3; MIN: 305.77 / MAX: 401.83)

Whisper.cpp

Whisper.cpp is a port of OpenAI's Whisper model in C/C++. Whisper.cpp is developed by Georgi Gerganov for transcribing WAV audio files to text / speech recognition. Whisper.cpp supports ARM NEON, x86 AVX, and other advanced CPU features. Learn more via the OpenBenchmarking.org test page.

Whisper.cpp 1.4 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
Initial test 1 No water cool: 0.86615 (SE +/- 0.08461, N = 15)
(CXX) g++ options: -O3 -std=c++11 -fPIC -pthread

TensorFlow

TensorFlow 2.12
Device: CPU - Batch Size: 1 - Model: GoogLeNet
images/sec, More Is Better
Initial test 1 No water cool: 47.21 (SE +/- 0.14, N = 3)

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Readback
GB/s, More Is Better
Initial test 1 No water cool: 27.07 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.4
Harness: Deconvolution Batch shapes_3d - Engine: CPU
ms, Fewer Is Better
Initial test 1 No water cool: 2.56519 (SE +/- 0.00599, N = 3, MIN: 2.37)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

TNN

TNN is an open-source deep learning inference framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

TNN 0.3
Target: CPU - Model: SqueezeNet v2
ms, Fewer Is Better
Initial test 1 No water cool: 42.22 (SE +/- 0.13, N = 3, MIN: 41.67 / MAX: 45.74)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: S3D
GFLOPS, More Is Better
Initial test 1 No water cool: 299.47 (SE +/- 0.24, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Whisper.cpp

Whisper.cpp is a port of OpenAI's Whisper model in C/C++. Whisper.cpp is developed by Georgi Gerganov for transcribing WAV audio files to text / speech recognition. Whisper.cpp supports ARM NEON, x86 AVX, and other advanced CPU features. Learn more via the OpenBenchmarking.org test page.

Whisper.cpp 1.4
Model: ggml-small.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
Initial test 1 No water cool: 0.34881 (SE +/- 0.02582, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: FFT SP
GFLOPS, More Is Better
Initial test 1 No water cool: 1292.53 (SE +/- 1.18, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Triad
GB/s, More Is Better
Initial test 1 No water cool: 25.46 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Reduction
GB/s, More Is Better
Initial test 1 No water cool: 388.93 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Download
GB/s, More Is Better
Initial test 1 No water cool: 26.83 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

Whisper.cpp

Whisper.cpp is a port of OpenAI's Whisper model in C/C++. Whisper.cpp is developed by Georgi Gerganov for transcribing WAV audio files to text / speech recognition. Whisper.cpp supports ARM NEON, x86 AVX, and other advanced CPU features. Learn more via the OpenBenchmarking.org test page.

Whisper.cpp 1.4
Model: ggml-base.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
Initial test 1 No water cool: 0.15350 (SE +/- 0.00750, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread

Scikit-Learn

Scikit-learn is a BSD-licensed Python module for machine learning built on NumPy and SciPy. Learn more via the OpenBenchmarking.org test page.

Benchmark: Tree

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: MD5 Hash
GHash/s, More Is Better
Initial test 1 No water cool: 47.90 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

PlaidML

This test profile uses the PlaidML deep learning framework developed by Intel to offer various benchmarks. Learn more via the OpenBenchmarking.org test page.

FP16: No - Mode: Inference - Network: VGG16 - Device: CPU

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: cannot import name 'Iterable' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
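
This ImportError is consistent with the abstract base classes living in `collections.abc`: the old aliases in `collections` were removed in Python 3.10, which matches the `/usr/lib/python3.10` path in the message. A minimal sketch of the import pattern that works on both older and newer interpreters. It illustrates the general Python idiom and is not a patch taken from PlaidML; `is_iterable` is a hypothetical helper:

```python
try:
    # Location of the abstract base classes since Python 3.3.
    from collections.abc import Iterable
except ImportError:
    # Fallback for very old interpreters only.
    from collections import Iterable


def is_iterable(obj) -> bool:
    """Return True when obj supports iteration via the Iterable ABC."""
    return isinstance(obj, Iterable)


print(is_iterable([1, 2, 3]))
print(is_iterable(42))
```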

Scikit-Learn

Scikit-learn is a BSD-licensed Python module for machine learning built on NumPy and SciPy. Learn more via the OpenBenchmarking.org test page.

Benchmark: Plot Non-Negative Matrix Factorization

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

PlaidML

This test profile uses the PlaidML deep learning framework developed by Intel to offer various benchmarks. Learn more via the OpenBenchmarking.org test page.

FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: cannot import name 'Iterable' from 'collections' (/usr/lib/python3.10/collections/__init__.py)

Scikit-Learn

Scikit-learn is a BSD-licensed Python module for machine learning built on NumPy and SciPy. Learn more via the OpenBenchmarking.org test page.

Benchmark: Kernel PCA Solvers / Time vs. N Samples

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Text Vectorizers

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Kernel PCA Solvers / Time vs. N Components

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Sample Without Replacement

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: RCV1 Logreg Convergencet

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Parallel Pairwise

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Hist Gradient Boosting

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: SGD Regression

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Neighbors

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Feature Expansions

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Incremental PCA

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Isolation Forest

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: LocalOutlierFactor

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Hist Gradient Boosting Adult

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Hist Gradient Boosting Higgs Boson

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: GLM

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Sparse Random Projections / 100 Iterations

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Hist Gradient Boosting Categorical Only

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Polynomial Kernel Approximation

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: 20 Newsgroups / Logistic Regression

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Hist Gradient Boosting Threading

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Covertype Dataset Benchmark

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Isotonic / Logistic

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot OMP vs. LARS

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Hierarchical

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Fast KMeans

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Lasso Path

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: MNIST Dataset

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Sparsify

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Glmnet

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Lasso

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: SAGA

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Isotonic / Perturbed Logarithm

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Isotonic / Pathological

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: TSNE MNIST Dataset

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Ward

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas

Benchmark: Plot Singular Value Decomposition

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/libblas.so.3: undefined symbol: gotoblas

Benchmark: SGDOneClassSVM

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ImportError: /lib/x86_64-linux-gnu/liblapack.so.3: undefined symbol: gotoblas
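
Every Scikit-Learn failure above reports the same undefined `gotoblas` symbol, a symbol associated with OpenBLAS, in either `liblapack.so.3` or `libblas.so.3`. One plausible cause, which these logs do not confirm, is that the two library paths resolve to different BLAS/LAPACK providers. A small diagnostic sketch that follows the symlinks and checks whether each target looks like an OpenBLAS build. The helper names are hypothetical, and matching on the substring "openblas" in the resolved path is only a heuristic based on the usual Debian/Ubuntu directory naming:

```python
import os

# Library paths taken from the error messages above.
LIBS = [
    "/lib/x86_64-linux-gnu/libblas.so.3",
    "/lib/x86_64-linux-gnu/liblapack.so.3",
]


def uses_openblas(path: str) -> bool:
    """Heuristic: does the fully resolved path mention OpenBLAS?"""
    return "openblas" in os.path.realpath(path).lower()


def providers_agree(paths) -> bool:
    """True when all paths are OpenBLAS-backed or none of them are."""
    return len({uses_openblas(p) for p in paths}) == 1


for lib in LIBS:
    print(f"{lib} -> {os.path.realpath(lib)}")
print("providers agree:", providers_agree(LIBS))
```

If the two targets disagree, the mismatch would be worth resolving before re-running the Scikit-Learn tests; if they agree, the cause lies elsewhere.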

Llamafile

Mozilla's Llamafile allows distributing and running large language models (LLMs) as a single file. Llamafile aims to make open-source LLMs more accessible to developers and users. Llamafile supports a variety of models, CPUs and GPUs, and other options. Learn more via the OpenBenchmarking.org test page.

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./run-mistral: line 2: ./mistral-7b-instruct-v0.2.Q8_0.llamafile: No such file or directory

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./run-wizardcoder: line 2: ./wizardcoder-python-34b-v1.0.Q6_K.llamafile: No such file or directory

Test: llava-v1.5-7b-q4 - Acceleration: CPU

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./run-llava: line 2: ./llava-v1.5-7b-q4.llamafile: No such file or directory

Llama.cpp

Llama.cpp is a port of Facebook's LLaMA model in C/C++ developed by Georgi Gerganov. Llama.cpp allows the inference of LLaMA and other supported models in C/C++. For CPU inference Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs along with features like OpenBLAS usage. Learn more via the OpenBenchmarking.org test page.

Model: llama-2-70b-chat.Q5_0.gguf

Initial test 1 No water cool: The test quit with a non-zero exit status. E: main: error: unable to load model

Model: llama-2-13b.Q4_0.gguf

Initial test 1 No water cool: The test quit with a non-zero exit status. E: main: error: unable to load model

Model: llama-2-7b.Q4_0.gguf

Initial test 1 No water cool: The test quit with a non-zero exit status. E: main: error: unable to load model
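
The log gives no reason for `unable to load model`. One thing that can be ruled out quickly is a missing, empty, or truncated model download, since GGUF files begin with the four ASCII bytes `GGUF`. A minimal pre-flight sketch, assuming the model file is reachable at the path passed in; the function name `check_gguf` and the relative path in the example call are illustrative, not taken from the test profile:

```python
import os

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF model file


def check_gguf(path: str) -> str:
    """Classify a model file as 'missing', 'too small', 'bad magic', or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) < len(GGUF_MAGIC):
        return "too small"
    with open(path, "rb") as fh:
        if fh.read(len(GGUF_MAGIC)) != GGUF_MAGIC:
            return "bad magic"
    return "ok"


# Example call; the real location depends on where the test was installed.
print(check_gguf("llama-2-7b.Q4_0.gguf"))
```

A result of "ok" only shows that the header looks right; it does not prove the file is complete or that the model fits in memory.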

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

Model: GPT-2 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: super-resolution-10 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: super-resolution-10 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: bertsquad-12 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: bertsquad-12 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: T5 Encoder - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: T5 Encoder - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: yolov4 - Device: CPU - Executor: Standard

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: yolov4 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Model: GPT-2 - Device: CPU - Executor: Parallel

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./onnx: line 2: ./onnxruntime/build/Linux/Release/onnxruntime_perf_test: No such file or directory

Caffe

This is a benchmark of the Caffe deep learning framework. It currently supports the AlexNet and GoogleNet models and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.

Model: GoogleNet - Acceleration: CPU - Iterations: 1000

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

Model: GoogleNet - Acceleration: CPU - Iterations: 200

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

Model: GoogleNet - Acceleration: CPU - Iterations: 100

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

Model: AlexNet - Acceleration: CPU - Iterations: 1000

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

Model: AlexNet - Acceleration: CPU - Iterations: 200

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

Model: AlexNet - Acceleration: CPU - Iterations: 100

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./caffe: 3: ./tools/caffe: not found

R Benchmark

This test is a quick-running survey of general R performance. Learn more via the OpenBenchmarking.org test page.

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ERROR: Rscript is not found on the system!
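
This failure means the `Rscript` executable was not found when the test ran. A quick way to confirm that before re-running is to look the command up on `PATH`. The sketch below uses Python's `shutil.which`; the `find_missing` helper is hypothetical and is not part of the Phoronix Test Suite:

```python
import shutil


def find_missing(commands):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


missing = find_missing(["Rscript"])
print("missing:", ", ".join(missing) if missing else "none")
```

The same check does not cover the other "No such file or directory" failures in this file, which refer to binaries at paths relative to each test's install directory rather than to commands on `PATH`.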

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: BLAS

Initial test 1 No water cool: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

261 Results Shown

TensorFlow:
  GPU - 256 - VGG-16
  GPU - 256 - ResNet-50
  GPU - 64 - VGG-16
  GPU - 512 - GoogLeNet
  GPU - 32 - VGG-16
  GPU - 256 - GoogLeNet
SHOC Scalable HeterOgeneous Computing
TensorFlow:
  GPU - 512 - AlexNet
  CPU - 256 - VGG-16
  GPU - 64 - ResNet-50
  GPU - 16 - VGG-16
  GPU - 256 - AlexNet
  CPU - 256 - ResNet-50
  GPU - 32 - ResNet-50
  CPU - 512 - GoogLeNet
  GPU - 64 - GoogLeNet
  CPU - 64 - VGG-16
  GPU - 16 - ResNet-50
Numenta Anomaly Benchmark
AI Benchmark Alpha:
  Device AI Score
  Device Training Score
  Device Inference Score
TensorFlow:
  CPU - 256 - GoogLeNet
  GPU - 32 - GoogLeNet
  CPU - 32 - VGG-16
  GPU - 64 - AlexNet
  CPU - 64 - ResNet-50
PyTorch:
  CPU - 512 - Efficientnet_v2_l
  CPU - 16 - Efficientnet_v2_l
  CPU - 32 - Efficientnet_v2_l
  CPU - 256 - Efficientnet_v2_l
  CPU - 64 - Efficientnet_v2_l
OpenCV
OpenVINO:
  Face Detection FP16 - CPU:
    ms
    FPS
TensorFlow
TNN
Mobile Neural Network:
  inception-v3
  mobilenet-v1-1.0
  MobileNetV2_224
  SqueezeNetV1.0
  resnet-v2-50
  squeezenetv1.1
  mobilenetV3
  nasnet
TensorFlow
Numpy Benchmark
TensorFlow
PyTorch:
  CPU - 256 - ResNet-152
  CPU - 64 - ResNet-152
  CPU - 512 - ResNet-152
  CPU - 32 - ResNet-152
  CPU - 16 - ResNet-152
TensorFlow:
  GPU - 32 - AlexNet
  CPU - 32 - ResNet-50
PyTorch
TensorFlow
oneDNN:
  Recurrent Neural Network Training - CPU
  Recurrent Neural Network Inference - CPU
TensorFlow
Neural Magic DeepSparse:
  BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  BERT-Large, NLP Question Answering - Synchronous Single-Stream:
    ms/batch
    items/sec
Mlpack Benchmark
OpenVINO:
  Face Detection FP16-INT8 - CPU:
    ms
    FPS
NCNN:
  CPU - FastestDet
  CPU - vision_transformer
  CPU - regnety_400m
  CPU - squeezenet_ssd
  CPU - yolov4-tiny
  CPU - mobilenetv2-yolov3
  CPU - resnet50
  CPU - alexnet
  CPU - resnet18
  CPU - vgg16
  CPU - googlenet
  CPU - blazeface
  CPU - efficientnet-b0
  CPU - mnasnet
  CPU - shufflenet-v2
  CPU - mobilenet-v3
  CPU - mobilenet-v2
  CPU - mobilenet
TensorFlow
NCNN:
  Vulkan GPU - FastestDet
  Vulkan GPU - vision_transformer
  Vulkan GPU - regnety_400m
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - mobilenetv2-yolov3
  Vulkan GPU - resnet50
  Vulkan GPU - alexnet
  Vulkan GPU - resnet18
  Vulkan GPU - vgg16
  Vulkan GPU - googlenet
  Vulkan GPU - blazeface
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - mnasnet
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mobilenet-v3
  Vulkan GPU - mobilenet-v2
  Vulkan GPU - mobilenet
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Detection FP32 - CPU:
    ms
    FPS
TensorFlow Lite
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
TensorFlow Lite
OpenVINO:
  Road Segmentation ADAS FP16-INT8 - CPU:
    ms
    FPS
TensorFlow Lite:
  NASNet Mobile
  Mobilenet Float
  SqueezeNet
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
TensorFlow Lite
OpenVINO:
  Road Segmentation ADAS FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16-INT8 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
  Face Detection Retail FP16-INT8 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
  Face Detection Retail FP16 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
TensorFlow
Numenta Anomaly Benchmark
TensorFlow
oneDNN
PyTorch:
  CPU - 1 - ResNet-152
  CPU - 512 - ResNet-50
  CPU - 256 - ResNet-50
  CPU - 64 - ResNet-50
  CPU - 32 - ResNet-50
  CPU - 16 - ResNet-50
Neural Magic DeepSparse:
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream:
    ms/batch
    items/sec
spaCy:
  en_core_web_trf
  en_core_web_lg
Neural Magic DeepSparse:
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream:
    ms/batch
    items/sec
  NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream:
    ms/batch
    items/sec
  NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream:
    ms/batch
    items/sec
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream:
    ms/batch
    items/sec
  NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  ResNet-50, Baseline - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Detection, YOLOv5s COCO - Synchronous Single-Stream:
    ms/batch
    items/sec
  ResNet-50, Baseline - Synchronous Single-Stream:
    ms/batch
    items/sec
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
    ms/batch
    items/sec
  CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream:
    ms/batch
    items/sec
DeepSpeech
Neural Magic DeepSparse:
  ResNet-50, Sparse INT8 - Synchronous Single-Stream:
    ms/batch
    items/sec
oneDNN
PyTorch:
  NVIDIA CUDA GPU - 64 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 512 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 32 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 16 - Efficientnet_v2_l
  NVIDIA CUDA GPU - 256 - Efficientnet_v2_l
Mlpack Benchmark
TensorFlow
Mlpack Benchmark
TensorFlow
Numenta Anomaly Benchmark
TensorFlow
Numenta Anomaly Benchmark
TensorFlow
PyTorch:
  CPU - 1 - ResNet-50
  NVIDIA CUDA GPU - 256 - ResNet-152
  NVIDIA CUDA GPU - 16 - ResNet-152
  NVIDIA CUDA GPU - 32 - ResNet-152
  NVIDIA CUDA GPU - 64 - ResNet-152
  NVIDIA CUDA GPU - 512 - ResNet-152
  NVIDIA CUDA GPU - 1 - Efficientnet_v2_l
Mlpack Benchmark
TensorFlow:
  CPU - 32 - AlexNet
  CPU - 16 - GoogLeNet
RNNoise
TensorFlow
Numenta Anomaly Benchmark
TNN:
  CPU - MobileNet v2
  CPU - SqueezeNet v1.1
TensorFlow:
  CPU - 1 - ResNet-50
  GPU - 1 - GoogLeNet
Numenta Anomaly Benchmark
PyTorch
TensorFlow:
  GPU - 1 - AlexNet
  CPU - 1 - AlexNet
oneDNN
PyTorch:
  NVIDIA CUDA GPU - 256 - ResNet-50
  NVIDIA CUDA GPU - 64 - ResNet-50
  NVIDIA CUDA GPU - 16 - ResNet-50
  NVIDIA CUDA GPU - 32 - ResNet-50
  NVIDIA CUDA GPU - 512 - ResNet-50
SHOC Scalable HeterOgeneous Computing
oneDNN
SHOC Scalable HeterOgeneous Computing
PyTorch
Whisper.cpp
TensorFlow
SHOC Scalable HeterOgeneous Computing
oneDNN
TNN
SHOC Scalable HeterOgeneous Computing
Whisper.cpp
SHOC Scalable HeterOgeneous Computing:
  OpenCL - FFT SP
  OpenCL - Triad
  OpenCL - Reduction
  OpenCL - Bus Speed Download
Whisper.cpp
SHOC Scalable HeterOgeneous Computing