HDVR4-A8.9600-1: AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G testing with an ASRock A320M-HDV R4.0 (P2.00 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2312201-HERT-HDVR4A802&grr.
HDVR4-A8.9600-1 system configuration (result identifier: llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):
  Processor: AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G @ 3.10GHz (2 Cores / 4 Threads)
  Motherboard: ASRock A320M-HDV R4.0 (P2.00 BIOS)
  Chipset: AMD 15h
  Memory: 3584MB
  Disk: 1000GB Western Digital WDS100T2B0A
  Graphics: llvmpipe
  Audio: AMD Kabini HDMI/DP
  Network: Realtek RTL8111/8168/8411
  OS: Ubuntu 20.04
  Kernel: 5.15.0-89-generic (x86_64)
  Desktop: GNOME Shell 3.36.9
  Display Server: X Server 1.20.13
  OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits)
  Vulkan: 1.1.182
  Compiler: GCC 9.4.0
  File-System: ext4
  Screen Resolution: 1368x768

OpenBenchmarking.org notes:
  - Transparent Huge Pages: madvise
  - Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  - Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
  - CPU Microcode: 0x600611a
  - Python 3.8.10
  - Security: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT vulnerable; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: disabled, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
[Flattened results-overview table: the original HTML export listed every benchmark identifier in this run (whisper-cpp, tensorflow, tensorflow-lite, pytorch, scikit-learn, caffe, mnn, ncnn, deepsparse, openvino, onnx, onednn, tnn, numenta-nab, plaidml, opencv, numpy, mlpack, lczero, deepspeech, rnnoise, rbenchmark) alongside its result value for the single llvmpipe - AMD A8-9600 configuration. The per-test pairings are not reliably recoverable from this text export; individual results are reported in the detailed sections below.]
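When heterogeneous results like these (seconds, images/sec, ms, FPS) are condensed into a single comparison figure, OpenBenchmarking-style tooling typically uses the geometric mean, which is insensitive to the wildly different scales of the individual tests. A minimal sketch of that aggregation; the function name is illustrative and not part of the Phoronix Test Suite API:

```python
import math

def geometric_mean(values):
    """Geometric mean: the n-th root of the product of n positive values.

    Unlike the arithmetic mean, a test that runs 2x faster shifts the
    aggregate by the same factor whether its scale is 0.5 images/sec
    or 29000 seconds.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # Sum logs instead of multiplying to avoid overflow on long result lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical relative speedups of one system over another on two tests:
print(geometric_mean([2.0, 8.0]))  # 4.0
```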
Detailed results (all entries below are for the single configuration: llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):

Whisper.cpp 1.4 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): 29217.40 (SE +/- 3.98, N = 3). 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
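Each entry reports the standard error of the mean across N runs (here SE +/- 3.98 over N = 3). As a sketch, the standard error is the sample standard deviation divided by the square root of the run count; the run-time values below are illustrative, not the actual per-run samples:

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample stdev / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Three hypothetical run times in seconds, spread around 29217.4:
runs = [29213.1, 29217.4, 29221.7]
print(standard_error(runs))
```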
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: VGG-16 (images/sec, More Is Better): 0.45 (SE +/- 0.00, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Sparse Random Projections / 100 Iterations (Seconds, Fewer Is Better): 8668.53 (SE +/- 67.77, N = 3). 1. (F9X) gfortran options: -O0
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: GoogLeNet (images/sec, More Is Better): 3.02 (SE +/- 0.03, N = 3)
Whisper.cpp 1.4 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): 8252.81 (SE +/- 4.12, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: ResNet-50 (images/sec, More Is Better): 1.02 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: VGG-16 (images/sec, More Is Better): 0.53 (SE +/- 0.00, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 0.59 (SE +/- 0.01, N = 5; MIN: 0.45 / MAX: 0.66)
TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: AlexNet (images/sec, More Is Better): 12.23 (SE +/- 0.02, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 0.59 (SE +/- 0.00, N = 3; MIN: 0.45 / MAX: 0.66)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better): 0.61 (SE +/- 0.00, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 0.60 (SE +/- 0.01, N = 3; MIN: 0.44 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 0.60 (SE +/- 0.00, N = 3; MIN: 0.45 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 0.60 (SE +/- 0.01, N = 3; MIN: 0.45 / MAX: 0.66)
Scikit-Learn 1.2.2 - Benchmark: SAGA (Seconds, Fewer Is Better): 1919.81 (SE +/- 9.65, N = 3)
Scikit-Learn 1.2.2 - Benchmark: GLM (Seconds, Fewer Is Better): 1877.54 (SE +/- 3.03, N = 3)
Whisper.cpp 1.4 - Model: ggml-base.en - Input: 2016 State of the Union (Seconds, Fewer Is Better): 2498.67 (SE +/- 3.14, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (images/sec, More Is Better): 1.50 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: AlexNet (images/sec, More Is Better): 12.02 (SE +/- 0.01, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better): 2318550 (SE +/- 23558.22, N = 3). 1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
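The Caffe results report total wall time in milliseconds for the full iteration count, so the average per-iteration cost follows by simple division: 2318550 ms over 1000 GoogleNet iterations is roughly 2318.55 ms per iteration. A trivial sketch of that conversion (the helper name is ours, not part of any benchmark tooling):

```python
def per_iteration_ms(total_ms, iterations):
    """Average time per iteration from a total-runtime result."""
    return total_ms / iterations

# GoogleNet, 1000 iterations, from the Caffe entry above:
print(per_iteration_ms(2318550, 1000))  # 2318.55
```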
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 (batches/sec, More Is Better): 2.21 (SE +/- 0.02, N = 8; MIN: 1.46 / MAX: 2.33)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152 (batches/sec, More Is Better): 0.91 (SE +/- 0.01, N = 3; MIN: 0.58 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-152 (batches/sec, More Is Better): 0.93 (SE +/- 0.01, N = 3; MIN: 0.58 / MAX: 0.96)
Scikit-Learn 1.2.2 - Benchmark: TSNE MNIST Dataset (Seconds, Fewer Is Better): 1425.79 (SE +/- 3.72, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-152 (batches/sec, More Is Better): 0.92 (SE +/- 0.00, N = 3; MIN: 0.59 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better): 0.92 (SE +/- 0.00, N = 3; MIN: 0.58 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better): 0.93 (SE +/- 0.00, N = 3; MIN: 0.56 / MAX: 0.96)
Scikit-Learn 1.2.2 - Benchmark: Lasso (Seconds, Fewer Is Better): 1312.06 (SE +/- 2.47, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: VGG16 - Device: CPU (FPS, More Is Better): 1.33 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: GoogLeNet (images/sec, More Is Better): 4.58 (SE +/- 0.00, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better): 2.17 (SE +/- 0.02, N = 5; MIN: 1.45 / MAX: 2.3)
PlaidML - FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU (FPS, More Is Better): 1.60 (SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Covertype Dataset Benchmark (Seconds, Fewer Is Better): 965.03 (SE +/- 1.44, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better): 1.47 (SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Lasso Path (Seconds, Fewer Is Better): 835.04 (SE +/- 1.49, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Threading (Seconds, Fewer Is Better): 838.41 (SE +/- 1.98, N = 3)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better): 1079113 (SE +/- 2982.40, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l (batches/sec, More Is Better): 1.10 (SE +/- 0.01, N = 3; MIN: 0.79 / MAX: 1.16)
Scikit-Learn 1.2.2 - Benchmark: Plot Polynomial Kernel Approximation (Seconds, Fewer Is Better): 761.96 (SE +/- 0.98, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: KNN CAD (Seconds, Fewer Is Better): 1009.02 (SE +/- 6.74, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Hierarchical (Seconds, Fewer Is Better): 722.27 (SE +/- 0.36, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Samples (Seconds, Fewer Is Better): 720.17 (SE +/- 1.47, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Earthgecko Skyline (Seconds, Fewer Is Better): 953.65 (SE +/- 1.84, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Plot Singular Value Decomposition (Seconds, Fewer Is Better): 692.64 (SE +/- 2.73, N = 3)
Mobile Neural Network Model: inception-v3 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.1 Model: inception-v3 llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 40 80 120 160 200 SE +/- 0.98, N = 3 163.10 MIN: 160.43 / MAX: 252.41 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 2.1 (ms, fewer is better) [CXX: g++ -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl]:
  Model: mobilenet-v1-1.0: 20.46 (SE +/- 0.09, N = 3; MIN 20.1 / MAX 40.6)
  Model: MobileNetV2_224: 17.25 (SE +/- 0.06, N = 3; MIN 16.89 / MAX 37.42)
  Model: SqueezeNetV1.0: 32.41 (SE +/- 0.08, N = 3; MIN 31.77 / MAX 53.11)
  Model: resnet-v2-50: 130.58 (SE +/- 0.22, N = 3; MIN 129.32 / MAX 174.02)
  Model: squeezenetv1.1: 17.63 (SE +/- 0.12, N = 3; MIN 17.14 / MAX 37.87)
  Model: mobilenetV3: 6.958 (SE +/- 0.031, N = 3; MIN 6.82 / MAX 9.38)
  Model: nasnet: 46.61 (SE +/- 0.18, N = 3; MIN 45.85 / MAX 67.49)
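The "SE +/- x, N = y" notation throughout this result is the standard error of the mean over the y runs the Phoronix Test Suite executed. A minimal sketch of that statistic (the three sample times below are hypothetical, for illustration only; the exported result does not include per-run values):

```python
import statistics

def standard_error(samples):
    """Standard error of the mean: sample stdev / sqrt(N)."""
    n = len(samples)
    return statistics.stdev(samples) / n ** 0.5

# Hypothetical per-run latencies (ms) for a 3-run benchmark.
runs = [20.37, 20.46, 20.55]
print(round(standard_error(runs), 2))  # prints 0.05
```

A small SE relative to the mean (as in most entries here) indicates the runs were consistent; the MIN/MAX bounds instead track the extremes of individual inference timings within the runs.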
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, more is better): 2.24 (SE +/- 0.01, N = 3; MIN 1.48 / MAX 2.34)
Scikit-Learn 1.2.2 - Benchmark: Plot Neighbors (Seconds, fewer is better): 599.88 (SE +/- 1.93, N = 3) [F9X: gfortran -O0]
PyTorch 2.1 - Device: CPU - Model: ResNet-50 (batches/sec, more is better):
  Batch Size: 64: 2.21 (SE +/- 0.02, N = 3; MIN 1.41 / MAX 2.32)
  Batch Size: 512: 2.23 (SE +/- 0.02, N = 3; MIN 1.47 / MAX 2.34)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream:
  1089.99 ms/batch (SE +/- 10.83, N = 9; fewer is better)
  0.9181 items/sec (SE +/- 0.0086, N = 9; more is better)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: GoogLeNet (images/sec, more is better): 4.53 (SE +/- 0.00, N = 3)
Scikit-Learn 1.2.2 - Benchmark: LocalOutlierFactor (Seconds, fewer is better): 559.54 (SE +/- 2.37, N = 3) [F9X: gfortran -O0]
NCNN 20230517 - Target: Vulkan GPU (ms, fewer is better) [CXX: g++ -O3 -rdynamic -lgomp -lpthread -pthread]:
  Model: FastestDet: 22.62 (SE +/- 0.05, N = 3; MIN 22.2 / MAX 28.41)
  Model: vision_transformer: 1140.71 (SE +/- 0.81, N = 3; MIN 1116.02 / MAX 1259.95)
  Model: regnety_400m: 33.20 (SE +/- 0.03, N = 3; MIN 32.32 / MAX 52.57)
  Model: squeezenet_ssd: 68.80 (SE +/- 0.35, N = 3; MIN 67.79 / MAX 146.66)
  Model: yolov4-tiny: 139.80 (SE +/- 0.25, N = 3; MIN 137.44 / MAX 150.14)
  Model: resnet50: 135.98 (SE +/- 0.11, N = 3; MIN 134.41 / MAX 153.95)
  Model: alexnet: 42.25 (SE +/- 0.10, N = 3; MIN 41.49 / MAX 44.6)
  Model: resnet18: 61.47 (SE +/- 0.12, N = 3; MIN 60.86 / MAX 79.52)
  Model: vgg16: 335.23 (SE +/- 1.47, N = 3; MIN 328.93 / MAX 356.34)
  Model: googlenet: 75.41 (SE +/- 0.17, N = 3; MIN 73.99 / MAX 100.71)
  Model: blazeface: 5.32 (SE +/- 0.03, N = 3; MIN 5.19 / MAX 8.52)
  Model: efficientnet-b0: 39.25 (SE +/- 0.08, N = 3; MIN 38.83 / MAX 45.93)
  Model: mnasnet: 25.62 (SE +/- 0.04, N = 3; MIN 25.19 / MAX 58.61)
  Model: shufflenet-v2: 16.53 (SE +/- 0.10, N = 3; MIN 16.14 / MAX 36.47)
  Model: mobilenet-v3 (Target: Vulkan GPU-v3-v3): 22.45 (SE +/- 0.09, N = 3; MIN 21.93 / MAX 41.71)
  Model: mobilenet-v2 (Target: Vulkan GPU-v2-v2): 31.44 (SE +/- 0.10, N = 3; MIN 31.12 / MAX 34.77)
  Model: mobilenet: 100.53 (SE +/- 0.16, N = 3; MIN 99.32 / MAX 121.45)
NCNN 20230517 - Target: CPU (ms, fewer is better) [CXX: g++ -O3 -rdynamic -lgomp -lpthread -pthread]:
  Model: FastestDet: 22.26 (SE +/- 0.11, N = 3; MIN 21.72 / MAX 40.61)
  Model: vision_transformer: 1141.93 (SE +/- 0.72, N = 3; MIN 1114.61 / MAX 1288.76)
  Model: regnety_400m: 33.68 (SE +/- 0.11, N = 3; MIN 32.55 / MAX 40.17)
  Model: squeezenet_ssd: 68.66 (SE +/- 0.46, N = 3; MIN 67.06 / MAX 88.53)
  Model: yolov4-tiny: 139.59 (SE +/- 0.30, N = 3; MIN 136.68 / MAX 156.49)
  Model: resnet50: 135.48 (SE +/- 0.32, N = 3; MIN 134.06 / MAX 143.27)
  Model: alexnet: 42.29 (SE +/- 0.13, N = 3; MIN 41.47 / MAX 57.39)
  Model: resnet18: 60.75 (SE +/- 0.25, N = 3; MIN 59.78 / MAX 79.65)
  Model: vgg16: 333.56 (SE +/- 1.44, N = 3; MIN 326.64 / MAX 373.95)
  Model: googlenet: 75.15 (SE +/- 0.12, N = 3; MIN 73.47 / MAX 97.99)
  Model: blazeface: 5.33 (SE +/- 0.02, N = 3; MIN 5.2 / MAX 5.68)
  Model: efficientnet-b0: 39.50 (SE +/- 0.04, N = 3; MIN 39.16 / MAX 43.83)
  Model: mnasnet: 25.63 (SE +/- 0.04, N = 3; MIN 25.22 / MAX 28.14)
  Model: shufflenet-v2: 16.32 (SE +/- 0.23, N = 3; MIN 15.65 / MAX 36.76)
  Model: mobilenet-v3 (Target: CPU-v3-v3): 22.58 (SE +/- 0.11, N = 3; MIN 22.01 / MAX 65.55)
  Model: mobilenet-v2 (Target: CPU-v2-v2): 31.61 (SE +/- 0.13, N = 3; MIN 31.19 / MAX 35.08)
  Model: mobilenet: 100.21 (SE +/- 0.14, N = 3; MIN 99.08 / MAX 122.54)
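The "Vulkan GPU" and CPU NCNN results track each other to within a couple of percent on every model, which is consistent with the Vulkan target executing on llvmpipe (a software rasterizer running on the same two CPU cores) rather than a discrete GPU. A small sketch comparing a few of the paired means reported above:

```python
# Paired NCNN means from this result (ms): Vulkan GPU vs CPU target.
pairs = {
    "FastestDet": (22.62, 22.26),
    "vision_transformer": (1140.71, 1141.93),
    "vgg16": (335.23, 333.56),
    "mobilenet": (100.53, 100.21),
}
for name, (vulkan, cpu) in pairs.items():
    diff_pct = abs(vulkan - cpu) / cpu * 100
    print(f"{name}: {diff_pct:.1f}% apart")
```

All four pairs land well under 2% apart, i.e. within normal run-to-run noise for this system.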
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 (batches/sec, more is better): 1.62 (SE +/- 0.01, N = 3; MIN 1.04 / MAX 1.7)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: AlexNet (images/sec, more is better): 10.54 (SE +/- 0.01, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Feature Expansions (Seconds, fewer is better): 491.63 (SE +/- 0.21, N = 3) [F9X: gfortran -O0]
TNN 0.3 - Target: CPU - Model: DenseNet (ms, fewer is better): 8874.35 (SE +/- 65.94, N = 3; MIN 8585.49 / MAX 9341.04) [CXX: g++ -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl]
Scikit-Learn 1.2.2 (Seconds, fewer is better) [F9X: gfortran -O0]:
  Benchmark: Tree: 103.79 (SE +/- 0.84, N = 15)
  Benchmark: Hist Gradient Boosting: 414.92 (SE +/- 0.77, N = 3)
OpenCV 4.7 - Test: DNN - Deep Neural Network (ms, fewer is better): 106960 (SE +/- 2616.66, N = 15) [CXX: g++ -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt]
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Higgs Boson (Seconds, fewer is better): 309.51 (SE +/- 2.43, N = 3) [F9X: gfortran -O0]
Numpy Benchmark (Score, more is better): 122.42 (SE +/- 0.39, N = 3)
Scikit-Learn 1.2.2 (Seconds, fewer is better) [F9X: gfortran -O0]:
  Benchmark: SGD Regression: 358.67 (SE +/- 0.51, N = 3)
  Benchmark: Plot OMP vs. LARS: 355.35 (SE +/- 4.35, N = 3)
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, fewer is better): 457100 (SE +/- 498.93, N = 3) [CXX: g++ -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas]
Scikit-Learn 1.2.2 (Seconds, fewer is better) [F9X: gfortran -O0]:
  Benchmark: Sample Without Replacement: 338.63 (SE +/- 2.47, N = 3)
  Benchmark: Kernel PCA Solvers / Time vs. N Components: 329.19 (SE +/- 0.55, N = 3)
TensorFlow 2.12 - Device: CPU (images/sec, more is better):
  Batch Size: 16 - Model: GoogLeNet: 4.48 (SE +/- 0.01, N = 3)
  Batch Size: 32 - Model: AlexNet: 9.07 (SE +/- 0.00, N = 3)
LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, more is better): 124 (SE +/- 1.20, N = 3) [CXX: g++ -flto -pthread]
DeepSpeech 0.6 - Acceleration: CPU (Seconds, fewer is better): 378.78 (SE +/- 0.80, N = 3)
Numenta Anomaly Benchmark 1.1 - Detector: Bayesian Changepoint (Seconds, fewer is better): 325.85 (SE +/- 3.10, N = 3)
OpenVINO 2023.2.dev - Device: CPU [CXX: g++ -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl]:
  Model: Handwritten English Recognition FP16: 593.05 ms (SE +/- 12.86, N = 15; MIN 368.67 / MAX 676.18; fewer is better); 3.40 FPS (SE +/- 0.08, N = 15; more is better)
  Model: Face Detection Retail FP16-INT8: 30.73 ms (SE +/- 0.25, N = 15; MIN 17.23 / MAX 53.2; fewer is better); 65.08 FPS (SE +/- 0.53, N = 15; more is better)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Float (Microseconds, fewer is better): 26921.0 (SE +/- 228.19, N = 15)
Scikit-Learn 1.2.2 - Benchmark: Sparsify (Seconds, fewer is better): 226.79 (SE +/- 0.05, N = 3) [F9X: gfortran -O0]
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, more is better): 3.85 (SE +/- 0.02, N = 3; MIN 2.63 / MAX 4.09)
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Engine: CPU (ms, fewer is better) [CXX: g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl]:
  Data Type: bf16bf16bf16: 38767.3 (SE +/- 123.68, N = 3; MIN 38319.3)
  Data Type: u8s8f32: 38464.9 (SE +/- 111.64, N = 3; MIN 38219.3)
  Data Type: f32: 38466.3 (SE +/- 25.25, N = 3; MIN 38310.5)
Numenta Anomaly Benchmark 1.1 - Detector: Contextual Anomaly Detector OSE (Seconds, fewer is better): 272.34 (SE +/- 0.36, N = 3)
Scikit-Learn 1.2.2 (Seconds, fewer is better) [F9X: gfortran -O0]:
  Benchmark: Plot Ward: 202.93 (SE +/- 0.33, N = 3)
  Benchmark: Hist Gradient Boosting Adult: 202.00 (SE +/- 2.73, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, more is better): 7.02 (SE +/- 0.03, N = 3)
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream:
  2245.61 ms/batch (SE +/- 11.37, N = 3; fewer is better)
  0.8881 items/sec (SE +/- 0.0040, N = 3; more is better)
Scikit-Learn 1.2.2 - Benchmark: MNIST Dataset (Seconds, fewer is better): 177.39 (SE +/- 0.10, N = 3) [F9X: gfortran -O0]
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, fewer is better): 231503 (SE +/- 2399.23, N = 3) [CXX: g++ -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas]
Mlpack Benchmark - Benchmark: scikit_ica (Seconds, fewer is better): 223.37 (SE +/- 0.45, N = 3)
Scikit-Learn 1.2.2 - Benchmark: Text Vectorizers (Seconds, fewer is better): 171.50 (SE +/- 0.44, N = 3) [F9X: gfortran -O0]
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, fewer is better): 216588 (SE +/- 882.15, N = 3) [CXX: g++ -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas]
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms, fewer is better) [CXX: g++ -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl]:
  Data Type: bf16bf16bf16: 21318.6 (SE +/- 32.23, N = 3; MIN 21195.4)
  Data Type: f32: 21158.0 (SE +/- 31.33, N = 3; MIN 21017.1)
  Data Type: u8s8f32: 21148.2 (SE +/- 10.14, N = 3; MIN 21028.3)
Mlpack Benchmark - Benchmark: scikit_svm (Seconds, fewer is better): 44.60 (SE +/- 0.34, N = 10)
Scikit-Learn 1.2.2 (Seconds, fewer is better) [F9X: gfortran -O0]:
  Benchmark: 20 Newsgroups / Logistic Regression: 126.69 (SE +/- 0.05, N = 3)
  Benchmark: Plot Incremental PCA: 110.29 (SE +/- 0.25, N = 3)
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Parallel [CXX: g++ -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread]:
  Inference Time Cost: 120.16 ms (SE +/- 1.40, N = 4; fewer is better)
  Inferences Per Second: 8.32539 (SE +/- 0.09724, N = 4; more is better)
Numenta Anomaly Benchmark 1.1 - Detector: Relative Entropy (Seconds, fewer is better): 119.86 (SE +/- 1.36, N = 3)
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel [CXX: g++ -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread]:
  Inference Time Cost: 13409.4 ms (SE +/- 55.92, N = 3; fewer is better)
  Inferences Per Second: 0.0745773 (SE +/- 0.0003114, N = 3; more is better)
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, fewer is better): 107310 (SE +/- 551.70, N = 3) [CXX: g++ -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas]
ONNX Runtime 1.14 - Device: CPU (Inference Time Cost in ms, fewer is better; Inferences Per Second, more is better) [CXX: g++ -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread]:
  Model: fcn-resnet101-11 - Executor: Standard: 9394.74 ms (SE +/- 10.14, N = 3); 0.106443 inf/sec (SE +/- 0.000115, N = 3)
  Model: bertsquad-12 - Executor: Parallel: 1319.06 ms (SE +/- 4.38, N = 3); 0.758124 inf/sec (SE +/- 0.002525, N = 3)
  Model: bertsquad-12 - Executor: Standard: 814.17 ms (SE +/- 1.30, N = 3); 1.22825 inf/sec (SE +/- 0.00196, N = 3)
  Model: ArcFace ResNet-100 - Executor: Parallel: 514.17 ms (SE +/- 3.30, N = 3); 1.94502 inf/sec (SE +/- 0.01254, N = 3)
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 0.67, N = 3 893.65 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2518 0.5036 0.7554 1.0072 1.259 SE +/- 0.00084, N = 3 1.11900 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: GPT-2 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 0.12, N = 3 85.33 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: GPT-2 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.02, N = 3 11.72 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 2.36, N = 3 1109.54 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2028 0.4056 0.6084 0.8112 1.014 SE +/- 0.001919, N = 3 0.901280 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: GPT-2 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 10 20 30 40 50 SE +/- 0.04, N = 3 46.24 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: GPT-2 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 5 10 15 20 25 SE +/- 0.02, N = 3 21.62 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 60 120 180 240 300 SE +/- 0.20, N = 3 281.84 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.7983 1.5966 2.3949 3.1932 3.9915 SE +/- 0.00258, N = 3 3.54808 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 9 18 27 36 45 SE +/- 0.05, N = 3 38.09 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 6 12 18 24 30 SE +/- 0.03, N = 3 26.25 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 30 60 90 120 150 SE +/- 0.40, N = 3 154.24 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.01700, N = 3 6.48329 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 5 10 15 20 25 SE +/- 0.05, N = 3 22.43 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 10 20 30 40 50 SE +/- 0.09, N = 3 44.58 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 0.05, N = 3 84.58 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.01, N = 3 11.82 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 1.18, N = 3 87.78 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.15, N = 3 11.40 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
Neural Magic DeepSparse Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 16 32 48 64 80 SE +/- 0.04, N = 3 74.18
Neural Magic DeepSparse Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.01, N = 3 13.48
OpenVINO Model: Handwritten English Recognition FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 130 260 390 520 650 SE +/- 7.12, N = 4 587.13 MIN: 363.64 / MAX: 654.95 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Handwritten English Recognition FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Handwritten English Recognition FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.7673 1.5346 2.3019 3.0692 3.8365 SE +/- 0.04, N = 4 3.41 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Scikit-Learn Benchmark: Hist Gradient Boosting Categorical Only OpenBenchmarking.org Seconds, Fewer Is Better Scikit-Learn 1.2.2 Benchmark: Hist Gradient Boosting Categorical Only llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 14 28 42 56 70 SE +/- 0.19, N = 3 61.35 1. (F9X) gfortran options: -O0
Neural Magic DeepSparse Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 30 60 90 120 150 SE +/- 0.10, N = 3 137.27
Neural Magic DeepSparse Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 4 8 12 16 20 SE +/- 0.01, N = 3 14.56
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2K 4K 6K 8K 10K SE +/- 5.44, N = 3 8841.25 MIN: 8808.56 / MAX: 8886.06 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.0518 0.1036 0.1554 0.2072 0.259 SE +/- 0.00, N = 3 0.23 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 1400 2800 4200 5600 7000 SE +/- 12.52, N = 3 6728.06 MIN: 6553.59 / MAX: 7002.51 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.0675 0.135 0.2025 0.27 0.3375 SE +/- 0.00, N = 3 0.3 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 70 140 210 280 350 SE +/- 3.55, N = 3 316.59
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.0711, N = 3 6.3146
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 30 60 90 120 150 SE +/- 0.53, N = 3 157.31
Neural Magic DeepSparse Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.0214, N = 3 6.3561
Neural Magic DeepSparse Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 500 1000 1500 2000 2500 SE +/- 0.75, N = 3 2218.22
Neural Magic DeepSparse Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2028 0.4056 0.6084 0.8112 1.014 SE +/- 0.0003, N = 3 0.9012
Neural Magic DeepSparse Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 1.84, N = 3 1041.22
Neural Magic DeepSparse Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2161 0.4322 0.6483 0.8644 1.0805 SE +/- 0.0017, N = 3 0.9604
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 1.28, N = 3 1044.45 MIN: 970.14 / MAX: 1115.69 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP32 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.4298 0.8596 1.2894 1.7192 2.149 SE +/- 0.00, N = 3 1.91 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 1.70, N = 3 1042.19 MIN: 990.71 / MAX: 1084.32 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.432 0.864 1.296 1.728 2.16 SE +/- 0.00, N = 3 1.92 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 150 300 450 600 750 SE +/- 1.68, N = 3 716.36 MIN: 663.2 / MAX: 742.36 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Machine Translation EN To DE FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.6278 1.2556 1.8834 2.5112 3.139 SE +/- 0.00, N = 3 2.79 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
TensorFlow Lite Model: Inception V4 OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2022-05-18 Model: Inception V4 llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 120K 240K 360K 480K 600K SE +/- 5686.85, N = 3 547885
TensorFlow Lite Model: Inception ResNet V2 OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2022-05-18 Model: Inception ResNet V2 llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 90K 180K 270K 360K 450K SE +/- 439.95, N = 3 419892
OpenVINO Model: Road Segmentation ADAS FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 40 80 120 160 200 SE +/- 1.03, N = 3 202.73 MIN: 172.53 / MAX: 241.67 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Road Segmentation ADAS FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.05, N = 3 9.86 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Person Vehicle Bike Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 1.13, N = 3 84.09 MIN: 43.23 / MAX: 147.08 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Person Vehicle Bike Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Person Vehicle Bike Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 6 12 18 24 30 SE +/- 0.32, N = 3 23.78 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Road Segmentation ADAS FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 90 180 270 360 450 SE +/- 1.31, N = 3 426.61 MIN: 366.94 / MAX: 466.48 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Road Segmentation ADAS FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Road Segmentation ADAS FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 1.0553 2.1106 3.1659 4.2212 5.2765 SE +/- 0.01, N = 3 4.69 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Vehicle Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 0.80, N = 3 98.95 MIN: 63.4 / MAX: 121.43 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Vehicle Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 5 10 15 20 25 SE +/- 0.16, N = 3 20.20 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
R Benchmark OpenBenchmarking.org Seconds, Fewer Is Better R Benchmark llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.1382 0.2764 0.4146 0.5528 0.691 SE +/- 0.0070, N = 3 0.6143 1. R scripting front-end version 3.6.3 (2020-02-29)
OpenVINO Model: Vehicle Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 40 80 120 160 200 SE +/- 0.99, N = 3 168.71 MIN: 119.34 / MAX: 195.23 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Vehicle Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Vehicle Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.07, N = 3 11.85 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
TensorFlow Lite Model: NASNet Mobile OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2022-05-18 Model: NASNet Mobile llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 14K 28K 42K 56K 70K SE +/- 427.94, N = 3 66075.7
Neural Magic DeepSparse Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org ms/batch, Fewer Is Better Neural Magic DeepSparse 1.6 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 600 1200 1800 2400 3000 SE +/- 3.11, N = 3 2785.40
Neural Magic DeepSparse Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream OpenBenchmarking.org items/sec, More Is Better Neural Magic DeepSparse 1.6 Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.1606 0.3212 0.4818 0.6424 0.803 SE +/- 0.0019, N = 3 0.7139
OpenVINO Model: Weld Porosity Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 15 30 45 60 75 SE +/- 0.29, N = 3 67.76 MIN: 39.77 / MAX: 88.07 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Weld Porosity Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 7 14 21 28 35 SE +/- 0.13, N = 3 29.49 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Weld Porosity Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 0.25, N = 3 92.76 MIN: 67.15 / MAX: 134.83 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Weld Porosity Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Weld Porosity Detection FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 5 10 15 20 25 SE +/- 0.06, N = 3 21.55 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
TensorFlow Lite Model: SqueezeNet OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2022-05-18 Model: SqueezeNet llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 7K 14K 21K 28K 35K SE +/- 45.90, N = 3 34007.2
TensorFlow Lite Model: Mobilenet Quant OpenBenchmarking.org Microseconds, Fewer Is Better TensorFlow Lite 2022-05-18 Model: Mobilenet Quant llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 6K 12K 18K 24K 30K SE +/- 20.03, N = 3 28341.7
OpenVINO Model: Face Detection Retail FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 10 20 30 40 50 SE +/- 0.09, N = 3 42.70 MIN: 20.92 / MAX: 63.97 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Face Detection Retail FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Face Detection Retail FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 11 22 33 44 55 SE +/- 0.10, N = 3 46.80 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.6525 1.305 1.9575 2.61 3.2625 SE +/- 0.01, N = 3 2.90 MIN: 1.7 / MAX: 21.02 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 150 300 450 600 750 SE +/- 2.10, N = 3 681.70 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 1.0215 2.043 3.0645 4.086 5.1075 SE +/- 0.01, N = 3 4.54 MIN: 2.6 / MAX: 35.33 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
OpenVINO Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2023.2.dev Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 90 180 270 360 450 SE +/- 1.43, N = 3 438.38 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
Neural Magic DeepSparse 1.6 (ms/batch - fewer is better; items/sec - more is better):
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream: 2788.65 ms/batch (SE +/- 1.19, N = 3); 0.7168 items/sec (SE +/- 0.0003, N = 3)
  NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream: 1351.89 ms/batch (SE +/- 7.57, N = 3); 0.7398 items/sec (SE +/- 0.0042, N = 3)
  NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream: 1354.91 ms/batch (SE +/- 1.17, N = 3); 0.7380 items/sec (SE +/- 0.0006, N = 3)

Numenta Anomaly Benchmark 1.1:
  Detector: Windowed Gaussian: 57.32 seconds (SE +/- 0.27, N = 3) - fewer is better

TNN 0.3 - Target: CPU (built with g++: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl):
  Model: MobileNet v2: 656.29 ms (SE +/- 1.60, N = 3; MIN: 647.62 / MAX: 668.3) - fewer is better

Neural Magic DeepSparse 1.6 (continued):
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream: 325.30 ms/batch (SE +/- 0.05, N = 3); 6.1356 items/sec (SE +/- 0.0051, N = 3)
  NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream: 181.79 ms/batch (SE +/- 0.23, N = 3); 5.5004 items/sec (SE +/- 0.0069, N = 3)
  ResNet-50, Baseline - Asynchronous Multi-Stream: 262.34 ms/batch (SE +/- 0.82, N = 3); 7.6144 items/sec (SE +/- 0.0256, N = 3)
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream: 261.94 ms/batch (SE +/- 0.37, N = 3); 7.6267 items/sec (SE +/- 0.0155, N = 3)
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream: 554.86 ms/batch (SE +/- 1.70, N = 3); 3.6022 items/sec (SE +/- 0.0115, N = 3)
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream: 565.80 ms/batch (SE +/- 1.23, N = 3); 3.5275 items/sec (SE +/- 0.0022, N = 3)
  CV Detection, YOLOv5s COCO - Synchronous Single-Stream: 253.36 ms/batch (SE +/- 0.11, N = 3); 3.9464 items/sec (SE +/- 0.0017, N = 3)
  CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream: 120.21 ms/batch (SE +/- 0.12, N = 3); 8.3171 items/sec (SE +/- 0.0084, N = 3)
  CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream: 250.61 ms/batch (SE +/- 0.34, N = 3); 3.9896 items/sec (SE +/- 0.0054, N = 3)
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream: 42.69 ms/batch (SE +/- 0.19, N = 3); 46.77 items/sec (SE +/- 0.21, N = 3)
  ResNet-50, Baseline - Synchronous Single-Stream: 119.97 ms/batch (SE +/- 0.12, N = 3); 8.3342 items/sec (SE +/- 0.0086, N = 3)

TNN 0.3 - Target: CPU:
  Model: SqueezeNet v1.1: 557.95 ms (SE +/- 2.06, N = 3; MIN: 553.57 / MAX: 564.29) - fewer is better

Neural Magic DeepSparse 1.6 (continued):
  ResNet-50, Sparse INT8 - Synchronous Single-Stream: 23.41 ms/batch (SE +/- 0.03, N = 3); 42.68 items/sec (SE +/- 0.05, N = 3)
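The DeepSparse latency and throughput figures can be cross-checked against each other: in the synchronous single-stream scenario one batch is processed at a time, so items/sec should be close to the reciprocal of ms/batch. A small illustrative check using three of the reported pairs (the tolerance is an assumption, chosen only to show the figures agree to within roughly 0.1%):

```python
# Cross-check of reported synchronous single-stream pairs from the
# results above: throughput (items/sec) vs 1000 / latency (ms/batch).
sync_results = {
    # model: (ms/batch, items/sec) as reported
    "BERT token classification": (1354.91, 0.7380),
    "ResNet-50, Baseline":       (119.97,  8.3342),
    "ResNet-50, Sparse INT8":    (23.41,   42.68),
}

for name, (ms_per_batch, items_per_sec) in sync_results.items():
    implied = 1000.0 / ms_per_batch
    rel_err = abs(implied - items_per_sec) / items_per_sec
    print(f"{name}: implied {implied:.4f} vs reported {items_per_sec} "
          f"items/sec (rel. diff {rel_err:.2%})")
```

The asynchronous multi-stream pairs do not satisfy this reciprocal relation, since multiple batches are in flight there.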
RNNoise 2020-06-28 (built with gcc: -O2 -pedantic -fvisibility=hidden):
  36.44 seconds (SE +/- 0.07, N = 3) - fewer is better

oneDNN 3.3 - Engine: CPU (ms - fewer is better; built with g++: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl):
  Deconvolution Batch shapes_1d - f32: 53.30 ms (SE +/- 0.30, N = 3; MIN: 50.67)
  Deconvolution Batch shapes_1d - u8s8f32: 32.51 ms (SE +/- 0.05, N = 3; MIN: 31.99)
  IP Shapes 1D - f32: 42.23 ms (SE +/- 0.27, N = 3; MIN: 40.72)
  IP Shapes 1D - u8s8f32: 23.79 ms (SE +/- 0.13, N = 3; MIN: 23.16)
  Deconvolution Batch shapes_3d - u8s8f32: 60.62 ms (SE +/- 0.80, N = 12; MIN: 55.57)
  IP Shapes 3D - f32: 45.54 ms (SE +/- 0.10, N = 3; MIN: 44.65)
  IP Shapes 3D - u8s8f32: 12.00 ms (SE +/- 0.04, N = 3; MIN: 11.72)

TNN 0.3 - Target: CPU (built with g++: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl):
  Model: SqueezeNet v2: 135.31 ms (SE +/- 0.52, N = 3; MIN: 132.96 / MAX: 140.41) - fewer is better

oneDNN 3.3 - Engine: CPU (continued):
  Convolution Batch Shapes Auto - f32: 90.59 ms (SE +/- 0.17, N = 3; MIN: 89.23)
  Convolution Batch Shapes Auto - u8s8f32: 70.48 ms (SE +/- 0.22, N = 3; MIN: 69.03)
  Deconvolution Batch shapes_3d - f32: 54.13 ms (SE +/- 0.06, N = 3; MIN: 51.76)
Phoronix Test Suite v10.8.5