HDVR4-A8.9600-1: AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G testing with an ASRock A320M-HDV R4.0 (P2.00 BIOS) motherboard and llvmpipe graphics on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2312201-HERT-HDVR4A802&grs
HDVR4-A8.9600-1 system details (llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):

Processor: AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G @ 3.10GHz (2 Cores / 4 Threads)
Motherboard: ASRock A320M-HDV R4.0 (P2.00 BIOS)
Chipset: AMD 15h
Memory: 3584MB
Disk: 1000GB Western Digital WDS100T2B0A
Graphics: llvmpipe
Audio: AMD Kabini HDMI/DP
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 20.04
Kernel: 5.15.0-89-generic (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.13
OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits)
Vulkan: 1.1.182
Compiler: GCC 9.4.0
File-System: ext4
Screen Resolution: 1368x768

Notes:
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0x600611a
- Python 3.8.10
- Security mitigations: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT vulnerable; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: disabled, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
HDVR4-A8.9600-1 consolidated results. Tests run, grouped by suite and listed in order (the raw result values follow in the same order):

whisper-cpp: ggml-medium.en - 2016 State of the Union whisper-cpp: ggml-small.en - 2016 State of the Union whisper-cpp: ggml-base.en - 2016 State of the Union
scikit-learn: Sparse Rand Projections / 100 Iterations scikit-learn: Kernel PCA Solvers / Time vs. N Components scikit-learn: Kernel PCA Solvers / Time vs. N Samples scikit-learn: Hist Gradient Boosting Categorical Only scikit-learn: Plot Polynomial Kernel Approximation scikit-learn: 20 Newsgroups / Logistic Regression scikit-learn: Hist Gradient Boosting Higgs Boson scikit-learn: Plot Singular Value Decomposition scikit-learn: Hist Gradient Boosting Threading scikit-learn: Hist Gradient Boosting Adult scikit-learn: Covertype Dataset Benchmark scikit-learn: Sample Without Replacement scikit-learn: Hist Gradient Boosting scikit-learn: Plot Incremental PCA scikit-learn: TSNE MNIST Dataset scikit-learn: LocalOutlierFactor scikit-learn: Feature Expansions scikit-learn: Plot OMP vs. LARS scikit-learn: Plot Hierarchical scikit-learn: Text Vectorizers scikit-learn: Plot Lasso Path scikit-learn: SGD Regression scikit-learn: Plot Neighbors scikit-learn: MNIST Dataset scikit-learn: Plot Ward scikit-learn: Sparsify scikit-learn: Lasso scikit-learn: Tree scikit-learn: SAGA scikit-learn: GLM
mlpack: scikit_svm mlpack: scikit_ica
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Parallel
numenta-nab: Contextual Anomaly Detector OSE numenta-nab: Bayesian Changepoint numenta-nab: Earthgecko Skyline numenta-nab: Windowed Gaussian numenta-nab: Relative Entropy numenta-nab: KNN CAD
openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU
plaidml: No - Inference - ResNet 50 - CPU plaidml: No - Inference - VGG16 - CPU
tnn: CPU - SqueezeNet v1.1 tnn: CPU - SqueezeNet v2 tnn: CPU - MobileNet v2 tnn: CPU - DenseNet
ncnn: Vulkan GPU - FastestDet ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU - mobilenet ncnn: CPU - FastestDet ncnn: CPU - vision_transformer ncnn: CPU - regnety_400m ncnn: CPU - squeezenet_ssd ncnn: CPU - yolov4-tiny ncnn: CPU - resnet50 ncnn: CPU - alexnet ncnn: CPU - resnet18 ncnn: CPU - vgg16 ncnn: CPU - googlenet ncnn: CPU - blazeface ncnn: CPU - efficientnet-b0 ncnn: CPU - mnasnet ncnn: CPU - shufflenet-v2 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU-v2-v2 - mobilenet-v2 ncnn: CPU - mobilenet
mnn: inception-v3 mnn: mobilenet-v1-1.0 mnn: MobileNetV2_224 mnn: SqueezeNetV1.0 mnn: resnet-v2-50 mnn: squeezenetv1.1 mnn: mobilenetV3 mnn: nasnet
caffe: GoogleNet - CPU - 1000 caffe: GoogleNet - CPU - 200 caffe: GoogleNet - CPU - 100 caffe: AlexNet - CPU - 1000 caffe: AlexNet - CPU - 200 caffe: AlexNet - CPU - 100
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream
tensorflow: CPU - 256 - GoogLeNet tensorflow: CPU - 64 - ResNet-50 tensorflow: CPU - 64 - GoogLeNet tensorflow: CPU - 32 - ResNet-50 tensorflow: CPU - 32 - GoogLeNet tensorflow: CPU - 16 - ResNet-50 tensorflow: CPU - 16 - GoogLeNet tensorflow: CPU - 512 - AlexNet tensorflow: CPU - 256 - AlexNet tensorflow: CPU - 64 - AlexNet tensorflow: CPU - 32 - AlexNet tensorflow: CPU - 16 - AlexNet tensorflow: CPU - 64 - VGG-16 tensorflow: CPU - 32 - VGG-16 tensorflow: CPU - 16 - VGG-16
pytorch: CPU - 512 - Efficientnet_v2_l pytorch: CPU - 256 - Efficientnet_v2_l pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 1 - Efficientnet_v2_l pytorch: CPU - 512 - ResNet-152 pytorch: CPU - 256 - ResNet-152 pytorch: CPU - 64 - ResNet-152 pytorch: CPU - 512 - ResNet-50 pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 16 - ResNet-152 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 32 - ResNet-50 pytorch: CPU - 16 - ResNet-50 pytorch: CPU - 1 - ResNet-152 pytorch: CPU - 1 - ResNet-50
tensorflow-lite: Inception ResNet V2 tensorflow-lite: Mobilenet Quant tensorflow-lite: Mobilenet Float tensorflow-lite: NASNet Mobile tensorflow-lite: Inception V4 tensorflow-lite: SqueezeNet
rnnoise: rbenchmark: deepspeech: CPU numpy:
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Inference - u8s8f32 - CPU onednn: Recurrent Neural Network Training - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - f32 - CPU onednn: Recurrent Neural Network Training - f32 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU onednn: Convolution Batch Shapes Auto - f32 - CPU onednn: IP Shapes 3D - u8s8f32 - CPU onednn: IP Shapes 1D - u8s8f32 - CPU onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 1D - f32 - CPU
lczero: BLAS opencv: DNN - Deep Neural Network
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Parallel
openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU onednn: IP Shapes 1D - bf16bf16bf16 - CPU

Raw results for llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C, in the order above:

29217.395 8252.8063 2498.66658 8668.525 329.194 720.168 61.348 761.959 126.691 309.508 692.635 838.410 201.998 965.025 338.633 414.916 110.289 1425.788 559.538 491.632 355.351 722.267 171.503 835.044 358.670 599.881 177.394 202.927 226.792 1312.064 103.793 1919.812 1877.535 44.60 223.37 1.11900 0.901280 11.3952 8.32539 11.8228 6.48329 3.54808 1.94502 0.106443 0.0745773 44.5752 26.2489 1.22825 0.758124 21.6196 11.7164 272.340 325.849 953.649 57.324 119.856 1009.020 2.90 681.70 587.13 3.41 4.54 438.38 84.09 23.78 67.76 29.49 716.36 2.79 202.73 9.86 30.73 65.08 92.76 21.55 98.95 20.20 426.61 4.69 42.70 46.80 6728.06 0.3 168.71 11.85 1044.45 1.91 1042.19 1.92 8841.25 0.23 1.60 1.33 557.950 135.312 656.292 8874.345 22.62 1140.71 33.20 68.80 139.80 135.98 42.25 61.47 335.23 75.41 5.32 39.25 25.62 16.53 22.45 31.44 100.53 22.26 1141.93 33.68 68.66 139.59 135.48 42.29 60.75 333.56 75.15 5.33 39.50 25.63 16.32 22.58 31.61 100.21 163.103 20.461 17.253 32.409 130.581 17.625 6.958 46.612 2318550 457100 231503 1079113 216588 107310 1354.9126 0.7380 2788.6516 0.7168 157.3097 6.3561 316.5860 6.3146 1041.2191 0.9604 2218.2224 0.9012 181.7890 5.5004 325.3001 6.1356 250.6134 3.9896 565.8025 3.5275 120.2128 8.3171 261.9379 7.6267 1089.9907 0.9181 2245.6114 0.8881 253.3570 3.9464 554.8602 3.6022 23.4090 42.6815 42.6926 46.7652 119.9662 8.3342 262.3391 7.6144 74.1834 13.4772 137.2652 14.5588 1351.8913 0.7398 2785.3976 0.7139 3.02 1.02 4.58 1.50 4.53 1.47 4.48 12.23 12.02 10.54 9.07 7.02 0.45 0.53 0.61 0.60 0.60 0.60 0.59 0.59 1.10 0.93 0.92 0.92 2.23 0.93 2.17 0.91 2.21 2.21 2.24 1.62 3.85 419892 28341.7 26921.0 66075.7 547885 34007.2 36.435 0.6143 378.78422 122.42 21318.6 38767.3 21148.2 38464.9 21158.0 38466.3 60.6197 32.5108 70.4784 54.1334 53.2981 90.5910 12.0047 23.7873 45.5410 42.2339 124 106960 893.652 1109.54 87.7839 120.159 84.5800 154.240 281.839 514.172 9394.74 13409.4 22.4319 38.0923 814.166 1319.06 46.2448 85.3327 593.05 3.40
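The summary above appears to list every test identifier first and every raw value second, in matching order (the first value, 29217.395, lines up with the first whisper-cpp entry). A minimal Python sketch of recovering (test, value) pairs from two such lists; the three-entry excerpt below is hand-picked for illustration, not the full result set:

```python
# Pair the flattened test-identifier list with the raw-value list.
# Assumption: the export emits both lists in the same order, so a
# positional zip recovers the (test, value) association.

test_ids = [
    "whisper-cpp: ggml-medium.en - 2016 State of the Union",
    "whisper-cpp: ggml-small.en - 2016 State of the Union",
    "whisper-cpp: ggml-base.en - 2016 State of the Union",
]
raw_values = [29217.395, 8252.8063, 2498.66658]

# dict(zip(...)) maps each identifier to its value by position.
results = dict(zip(test_ids, raw_values))

for test, value in results.items():
    print(f"{test}: {value}")
```

A length check (`len(test_ids) == len(raw_values)`) is worth adding before zipping real exports, since a miscount would silently shift every pairing.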
Whisper.cpp 1.4 (Seconds, Fewer Is Better; llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):
Model: ggml-medium.en - Input: 2016 State of the Union: 29217.40 (SE +/- 3.98, N = 3)
Model: ggml-small.en - Input: 2016 State of the Union: 8252.81 (SE +/- 4.12, N = 3)
Model: ggml-base.en - Input: 2016 State of the Union: 2498.67 (SE +/- 3.14, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
Scikit-Learn 1.2.2 (Seconds, Fewer Is Better; llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):
Benchmark: Sparse Random Projections / 100 Iterations: 8668.53 (SE +/- 67.77, N = 3)
Benchmark: Kernel PCA Solvers / Time vs. N Components: 329.19 (SE +/- 0.55, N = 3)
Benchmark: Kernel PCA Solvers / Time vs. N Samples: 720.17 (SE +/- 1.47, N = 3)
Benchmark: Hist Gradient Boosting Categorical Only: 61.35 (SE +/- 0.19, N = 3)
Benchmark: Plot Polynomial Kernel Approximation: 761.96 (SE +/- 0.98, N = 3)
Benchmark: 20 Newsgroups / Logistic Regression: 126.69 (SE +/- 0.05, N = 3)
Benchmark: Hist Gradient Boosting Higgs Boson: 309.51 (SE +/- 2.43, N = 3)
Benchmark: Plot Singular Value Decomposition: 692.64 (SE +/- 2.73, N = 3)
Benchmark: Hist Gradient Boosting Threading: 838.41 (SE +/- 1.98, N = 3)
Benchmark: Hist Gradient Boosting Adult: 202.00 (SE +/- 2.73, N = 3)
Benchmark: Covertype Dataset Benchmark: 965.03 (SE +/- 1.44, N = 3)
Benchmark: Sample Without Replacement: 338.63 (SE +/- 2.47, N = 3)
Benchmark: Hist Gradient Boosting: 414.92 (SE +/- 0.77, N = 3)
Benchmark: Plot Incremental PCA: 110.29 (SE +/- 0.25, N = 3)
Benchmark: TSNE MNIST Dataset: 1425.79 (SE +/- 3.72, N = 3)
Benchmark: LocalOutlierFactor: 559.54 (SE +/- 2.37, N = 3)
Benchmark: Feature Expansions: 491.63 (SE +/- 0.21, N = 3)
Benchmark: Plot OMP vs. LARS: 355.35 (SE +/- 4.35, N = 3)
Benchmark: Plot Hierarchical: 722.27 (SE +/- 0.36, N = 3)
Benchmark: Text Vectorizers: 171.50 (SE +/- 0.44, N = 3)
Benchmark: Plot Lasso Path: 835.04 (SE +/- 1.49, N = 3)
Benchmark: SGD Regression: 358.67 (SE +/- 0.51, N = 3)
Benchmark: Plot Neighbors: 599.88 (SE +/- 1.93, N = 3)
Benchmark: MNIST Dataset: 177.39 (SE +/- 0.10, N = 3)
Benchmark: Plot Ward: 202.93 (SE +/- 0.33, N = 3)
Benchmark: Sparsify: 226.79 (SE +/- 0.05, N = 3)
Benchmark: Lasso: 1312.06 (SE +/- 2.47, N = 3)
Benchmark: Tree: 103.79 (SE +/- 0.84, N = 15)
Benchmark: SAGA: 1919.81 (SE +/- 9.65, N = 3)
Benchmark: GLM: 1877.54 (SE +/- 3.03, N = 3)
1. (F9X) gfortran options: -O0
Mlpack Benchmark (Seconds, Fewer Is Better; llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C):
Benchmark: scikit_svm: 44.60 (SE +/- 0.34, N = 10)
Benchmark: scikit_ica: 223.37 (SE +/- 0.45, N = 3)
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2518 0.5036 0.7554 1.0072 1.259 SE +/- 0.00084, N = 3 1.11900 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2028 0.4056 0.6084 0.8112 1.014 SE +/- 0.001919, N = 3 0.901280 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.15, N = 3 11.40 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.09724, N = 4 8.32539 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.01, N = 3 11.82 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.01700, N = 3 6.48329 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.7983 1.5966 2.3949 3.1932 3.9915 SE +/- 0.00258, N = 3 3.54808 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.4376 0.8752 1.3128 1.7504 2.188 SE +/- 0.01254, N = 3 1.94502 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime 1.14, Device: CPU (Inferences Per Second; more is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Model: fcn-resnet101-11, Executor: Standard:  0.106443  (SE +/- 0.000115, N = 3)
  Model: fcn-resnet101-11, Executor: Parallel:  0.0745773 (SE +/- 0.0003114, N = 3)
  Model: CaffeNet 12-int8, Executor: Standard:  44.58     (SE +/- 0.09, N = 3)
  Model: CaffeNet 12-int8, Executor: Parallel:  26.25     (SE +/- 0.03, N = 3)
  Model: bertsquad-12, Executor: Standard:      1.22825   (SE +/- 0.00196, N = 3)
  Model: bertsquad-12, Executor: Parallel:      0.758124  (SE +/- 0.002525, N = 3)
  Model: GPT-2, Executor: Standard:             21.62     (SE +/- 0.02, N = 3)
  Model: GPT-2, Executor: Parallel:             11.72     (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
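The "SE +/- x, N = 3" figures attached to the ONNX Runtime results above are standard errors over the repeated runs. A minimal sketch of the usual formula (sample standard deviation divided by the square root of the run count); the exact computation inside the Phoronix Test Suite may differ, and the `runs` values here are hypothetical, not taken from this report:

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample stdev / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Three hypothetical run results (illustrative only):
runs = [0.1063, 0.1065, 0.1066]
print(f"mean = {statistics.mean(runs):.4f}, SE +/- {standard_error(runs):.6f}")
```

With only N = 3 samples the SE is a rough stability indicator rather than a tight confidence bound.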
Numenta Anomaly Benchmark 1.1 (Seconds; fewer is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Detector: Contextual Anomaly Detector OSE:  272.34  (SE +/- 0.36, N = 3)
  Detector: Bayesian Changepoint:             325.85  (SE +/- 3.10, N = 3)
  Detector: Earthgecko Skyline:               953.65  (SE +/- 1.84, N = 3)
  Detector: Windowed Gaussian:                 57.32  (SE +/- 0.27, N = 3)
  Detector: Relative Entropy:                 119.86  (SE +/- 1.36, N = 3)
  Detector: KNN CAD:                         1009.02  (SE +/- 6.74, N = 3)
OpenVINO 2023.2.dev, Device: CPU (latency in ms, fewer is better; throughput in FPS, more is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Model: Age Gender Recognition Retail 0013 FP16-INT8:  2.90 ms (SE +/- 0.01, N = 3, MIN: 1.7 / MAX: 21.02); 681.70 FPS (SE +/- 2.10, N = 3)
  Model: Handwritten English Recognition FP16-INT8:     587.13 ms (SE +/- 7.12, N = 4, MIN: 363.64 / MAX: 654.95); 3.41 FPS (SE +/- 0.04, N = 4)
  Model: Age Gender Recognition Retail 0013 FP16:       4.54 ms (SE +/- 0.01, N = 3, MIN: 2.6 / MAX: 35.33); 438.38 FPS (SE +/- 1.43, N = 3)
  Model: Person Vehicle Bike Detection FP16:            84.09 ms (SE +/- 1.13, N = 3, MIN: 43.23 / MAX: 147.08); 23.78 FPS (SE +/- 0.32, N = 3)
  Model: Weld Porosity Detection FP16-INT8:             67.76 ms (SE +/- 0.29, N = 3, MIN: 39.77 / MAX: 88.07); 29.49 FPS (SE +/- 0.13, N = 3)
  Model: Machine Translation EN To DE FP16:             716.36 ms (SE +/- 1.68, N = 3, MIN: 663.2 / MAX: 742.36); 2.79 FPS (SE +/- 0.00, N = 3)
  Model: Road Segmentation ADAS FP16-INT8:              202.73 ms (SE +/- 1.03, N = 3, MIN: 172.53 / MAX: 241.67); 9.86 FPS (SE +/- 0.05, N = 3)
  Model: Face Detection Retail FP16-INT8:               30.73 ms (SE +/- 0.25, N = 15, MIN: 17.23 / MAX: 53.2); 65.08 FPS (SE +/- 0.53, N = 15)
  Model: Weld Porosity Detection FP16:                  92.76 ms (SE +/- 0.25, N = 3, MIN: 67.15 / MAX: 134.83); 21.55 FPS (SE +/- 0.06, N = 3)
  Model: Vehicle Detection FP16-INT8:                   98.95 ms (SE +/- 0.80, N = 3, MIN: 63.4 / MAX: 121.43); 20.20 FPS (SE +/- 0.16, N = 3)
  Model: Road Segmentation ADAS FP16:                   426.61 ms (SE +/- 1.31, N = 3, MIN: 366.94 / MAX: 466.48); 4.69 FPS (SE +/- 0.01, N = 3)
  Model: Face Detection Retail FP16:                    42.70 ms (SE +/- 0.09, N = 3, MIN: 20.92 / MAX: 63.97); 46.80 FPS (SE +/- 0.10, N = 3)
  Model: Face Detection FP16-INT8:                      6728.06 ms (SE +/- 12.52, N = 3, MIN: 6553.59 / MAX: 7002.51); 0.3 FPS (SE +/- 0.00, N = 3)
  Model: Vehicle Detection FP16:                        168.71 ms (SE +/- 0.99, N = 3, MIN: 119.34 / MAX: 195.23); 11.85 FPS (SE +/- 0.07, N = 3)
  Model: Person Detection FP32:                         1044.45 ms (SE +/- 1.28, N = 3, MIN: 970.14 / MAX: 1115.69); 1.91 FPS (SE +/- 0.00, N = 3)
  Model: Person Detection FP16:                         1042.19 ms (SE +/- 1.70, N = 3, MIN: 990.71 / MAX: 1084.32); 1.92 FPS (SE +/- 0.00, N = 3)
  Model: Face Detection FP16:                           8841.25 ms (SE +/- 5.44, N = 3, MIN: 8808.56 / MAX: 8886.06); 0.23 FPS (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
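Each OpenVINO model above reports a latency/throughput pair. As a rough consistency check, throughput is approximately the number of parallel inference streams divided by the per-inference latency; the stream count is an assumption here (it is not stated in this report), with two streams being plausible on this 2-core CPU:

```python
def implied_fps(latency_ms, streams):
    """Throughput implied by mean per-inference latency and parallel stream count."""
    return streams * 1000.0 / latency_ms

# Age Gender Recognition Retail 0013 FP16-INT8: 2.90 ms measured alongside 681.70 FPS.
# Assuming 2 streams (hypothetical), the implied figure lands near the measurement:
print(implied_fps(2.90, streams=2))  # ~689.7 vs. the reported 681.70 FPS
```

This only holds approximately, since latency averages and throughput are measured over different steady-state windows.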
PlaidML, FP16: No, Mode: Inference, Device: CPU (FPS; more is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Network: ResNet 50:  1.60  (SE +/- 0.01, N = 3)
  Network: VGG16:      1.33  (SE +/- 0.01, N = 3)
TNN 0.3, Target: CPU (ms; fewer is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Model: SqueezeNet v1.1:  557.95  (SE +/- 2.06, N = 3, MIN: 553.57 / MAX: 564.29)
  Model: SqueezeNet v2:    135.31  (SE +/- 0.52, N = 3, MIN: 132.96 / MAX: 140.41)
  Model: MobileNet v2:     656.29  (SE +/- 1.60, N = 3, MIN: 647.62 / MAX: 668.3)
  Model: DenseNet:        8874.35  (SE +/- 65.94, N = 3, MIN: 8585.49 / MAX: 9341.04)
1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
NCNN 20230517, Target: Vulkan GPU (ms; fewer is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Model: FastestDet:           22.62  (SE +/- 0.05, N = 3, MIN: 22.2 / MAX: 28.41)
  Model: vision_transformer: 1140.71  (SE +/- 0.81, N = 3, MIN: 1116.02 / MAX: 1259.95)
  Model: regnety_400m:         33.20  (SE +/- 0.03, N = 3, MIN: 32.32 / MAX: 52.57)
  Model: squeezenet_ssd:       68.80  (SE +/- 0.35, N = 3, MIN: 67.79 / MAX: 146.66)
  Model: yolov4-tiny:         139.80  (SE +/- 0.25, N = 3, MIN: 137.44 / MAX: 150.14)
  Model: resnet50:            135.98  (SE +/- 0.11, N = 3, MIN: 134.41 / MAX: 153.95)
  Model: alexnet:              42.25  (SE +/- 0.10, N = 3, MIN: 41.49 / MAX: 44.6)
  Model: resnet18:             61.47  (SE +/- 0.12, N = 3, MIN: 60.86 / MAX: 79.52)
  Model: vgg16:               335.23  (SE +/- 1.47, N = 3, MIN: 328.93 / MAX: 356.34)
  Model: googlenet:            75.41  (SE +/- 0.17, N = 3, MIN: 73.99 / MAX: 100.71)
  Model: blazeface:             5.32  (SE +/- 0.03, N = 3, MIN: 5.19 / MAX: 8.52)
  Model: efficientnet-b0:      39.25  (SE +/- 0.08, N = 3, MIN: 38.83 / MAX: 45.93)
  Model: mnasnet:              25.62  (SE +/- 0.04, N = 3, MIN: 25.19 / MAX: 58.61)
  Model: shufflenet-v2:        16.53  (SE +/- 0.10, N = 3, MIN: 16.14 / MAX: 36.47)
  Model: mobilenet-v3 (Target: Vulkan GPU-v3-v3):  22.45  (SE +/- 0.09, N = 3, MIN: 21.93 / MAX: 41.71)
  Model: mobilenet-v2 (Target: Vulkan GPU-v2-v2):  31.44  (SE +/- 0.10, N = 3, MIN: 31.12 / MAX: 34.77)
  Model: mobilenet:           100.53  (SE +/- 0.16, N = 3, MIN: 99.32 / MAX: 121.45)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
NCNN 20230517, Target: CPU (ms; fewer is better)
llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
  Model: FastestDet:           22.26  (SE +/- 0.11, N = 3, MIN: 21.72 / MAX: 40.61)
  Model: vision_transformer: 1141.93  (SE +/- 0.72, N = 3, MIN: 1114.61 / MAX: 1288.76)
  Model: regnety_400m:         33.68  (SE +/- 0.11, N = 3, MIN: 32.55 / MAX: 40.17)
  Model: squeezenet_ssd:       68.66  (SE +/- 0.46, N = 3, MIN: 67.06 / MAX: 88.53)
  Model: yolov4-tiny:         139.59  (SE +/- 0.30, N = 3, MIN: 136.68 / MAX: 156.49)
  Model: resnet50:            135.48  (SE +/- 0.32, N = 3, MIN: 134.06 / MAX: 143.27)
  Model: alexnet:              42.29  (SE +/- 0.13, N = 3, MIN: 41.47 / MAX: 57.39)
  Model: resnet18:             60.75  (SE +/- 0.25, N = 3, MIN: 59.78 / MAX: 79.65)
  Model: vgg16:               333.56  (SE +/- 1.44, N = 3, MIN: 326.64 / MAX: 373.95)
  Model: googlenet:            75.15  (SE +/- 0.12, N = 3, MIN: 73.47 / MAX: 97.99)
  Model: blazeface:             5.33  (SE +/- 0.02, N = 3, MIN: 5.2 / MAX: 5.68)
  Model: efficientnet-b0:      39.50  (SE +/- 0.04, N = 3, MIN: 39.16 / MAX: 43.83)
  Model: mnasnet:              25.63  (SE +/- 0.04, N = 3, MIN: 25.22 / MAX: 28.14)
  Model: shufflenet-v2:        16.32  (SE +/- 0.23, N = 3, MIN: 15.65 / MAX: 36.76)
  Model: mobilenet-v3 (Target: CPU-v3-v3):  22.58  (SE +/- 0.11, N = 3, MIN: 22.01 / MAX: 65.55)
  Model: mobilenet-v2 (Target: CPU-v2-v2):  31.61  (SE +/- 0.13, N = 3, MIN: 31.19 / MAX: 35.08)
  Model: mobilenet:           100.21  (SE +/- 0.14, N = 3, MIN: 99.08 / MAX: 122.54)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
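The NCNN "Vulkan GPU" and "CPU" targets above report nearly identical times, which is consistent with llvmpipe being a software (CPU-based) Vulkan/OpenGL implementation: the "GPU" path still executes on the same two cores. A quick ratio check over a few of the reported means (values copied from this report):

```python
# Mean times in ms, copied from the NCNN results in this report.
vulkan = {"mobilenet": 100.53, "vgg16": 335.23, "resnet50": 135.98}
cpu = {"mobilenet": 100.21, "vgg16": 333.56, "resnet50": 135.48}

for model in vulkan:
    ratio = vulkan[model] / cpu[model]
    print(f"{model}: Vulkan/CPU time ratio = {ratio:.3f}")  # each within ~1% of 1.0
```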
Mobile Neural Network Model: inception-v3 OpenBenchmarking.org ms, Fewer Is Better Mobile Neural Network 2.1 Model: inception-v3 llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 40 80 120 160 200 SE +/- 0.98, N = 3 163.10 MIN: 160.43 / MAX: 252.41 1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
Mobile Neural Network 2.1 (ms, fewer is better) - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C

  Model: mobilenet-v1-1.0    20.46   (SE +/- 0.09,  N = 3; MIN: 20.1   / MAX: 40.6)
  Model: MobileNetV2_224     17.25   (SE +/- 0.06,  N = 3; MIN: 16.89  / MAX: 37.42)
  Model: SqueezeNetV1.0      32.41   (SE +/- 0.08,  N = 3; MIN: 31.77  / MAX: 53.11)
  Model: resnet-v2-50       130.58   (SE +/- 0.22,  N = 3; MIN: 129.32 / MAX: 174.02)
  Model: squeezenetv1.1      17.63   (SE +/- 0.12,  N = 3; MIN: 17.14  / MAX: 37.87)
  Model: mobilenetV3          6.958  (SE +/- 0.031, N = 3; MIN: 6.82   / MAX: 9.38)
  Model: nasnet              46.61   (SE +/- 0.18,  N = 3; MIN: 45.85  / MAX: 67.49)

Compiler options: (CXX) g++: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
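The SE values attached to each result are standard errors over the N benchmark runs. Assuming the usual definition (sample standard deviation divided by the square root of N), the computation can be sketched as follows; the per-run timings here are hypothetical illustrations, not the actual samples behind any reported result:

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample stdev / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run timings (ms) for a 3-run benchmark.
runs = [20.37, 20.42, 20.59]
mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} ms, SE +/- {se:.2f}, N = {len(runs)}")
# → 20.46 ms, SE +/- 0.07, N = 3
```

With only three runs the SE is a rough dispersion estimate, which is why the MIN/MAX columns are also worth reading: a MAX far above the mean (as in several Mobile Neural Network rows) signals an occasional slow outlier run.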
Caffe 2020-02-13 (Milli-Seconds, fewer is better) - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C

  Model: GoogleNet - Acceleration: CPU - Iterations: 1000   2318550  (SE +/- 23558.22, N = 3)
  Model: GoogleNet - Acceleration: CPU - Iterations: 200     457100  (SE +/- 498.93,   N = 3)
  Model: GoogleNet - Acceleration: CPU - Iterations: 100     231503  (SE +/- 2399.23,  N = 3)
  Model: AlexNet   - Acceleration: CPU - Iterations: 1000   1079113  (SE +/- 2982.40,  N = 3)
  Model: AlexNet   - Acceleration: CPU - Iterations: 200     216588  (SE +/- 882.15,   N = 3)
  Model: AlexNet   - Acceleration: CPU - Iterations: 100     107310  (SE +/- 551.70,   N = 3)

Compiler options: (CXX) g++: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
Neural Magic DeepSparse 1.6 - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Latency in ms/batch (fewer is better); throughput in items/sec (more is better). N = 3 unless noted.

  Model                                                          Scenario                     ms/batch (SE)               items/sec (SE)
  NLP Token Classification, BERT base uncased conll2003          Synchronous Single-Stream    1354.91 (+/- 1.17)          0.7380 (+/- 0.0006)
  NLP Token Classification, BERT base uncased conll2003          Asynchronous Multi-Stream    2788.65 (+/- 1.19)          0.7168 (+/- 0.0003)
  BERT-Large, NLP Question Answering, Sparse INT8                Synchronous Single-Stream     157.31 (+/- 0.53)          6.3561 (+/- 0.0214)
  BERT-Large, NLP Question Answering, Sparse INT8                Asynchronous Multi-Stream     316.59 (+/- 3.55)          6.3146 (+/- 0.0711)
  CV Segmentation, 90% Pruned YOLACT Pruned                      Synchronous Single-Stream    1041.22 (+/- 1.84)          0.9604 (+/- 0.0017)
  CV Segmentation, 90% Pruned YOLACT Pruned                      Asynchronous Multi-Stream    2218.22 (+/- 0.75)          0.9012 (+/- 0.0003)
  NLP Text Classification, DistilBERT mnli                       Synchronous Single-Stream     181.79 (+/- 0.23)          5.5004 (+/- 0.0069)
  NLP Text Classification, DistilBERT mnli                       Asynchronous Multi-Stream     325.30 (+/- 0.05)          6.1356 (+/- 0.0051)
  CV Detection, YOLOv5s COCO, Sparse INT8                        Synchronous Single-Stream     250.61 (+/- 0.34)          3.9896 (+/- 0.0054)
  CV Detection, YOLOv5s COCO, Sparse INT8                        Asynchronous Multi-Stream     565.80 (+/- 1.23)          3.5275 (+/- 0.0022)
  CV Classification, ResNet-50 ImageNet                          Synchronous Single-Stream     120.21 (+/- 0.12)          8.3171 (+/- 0.0084)
  CV Classification, ResNet-50 ImageNet                          Asynchronous Multi-Stream     261.94 (+/- 0.37)          7.6267 (+/- 0.0155)
  BERT-Large, NLP Question Answering                             Synchronous Single-Stream    1089.99 (+/- 10.83, N = 9)  0.9181 (+/- 0.0086, N = 9)
  BERT-Large, NLP Question Answering                             Asynchronous Multi-Stream    2245.61 (+/- 11.37)         0.8881 (+/- 0.0040)
  CV Detection, YOLOv5s COCO                                     Synchronous Single-Stream     253.36 (+/- 0.11)          3.9464 (+/- 0.0017)
  CV Detection, YOLOv5s COCO                                     Asynchronous Multi-Stream     554.86 (+/- 1.70)          3.6022 (+/- 0.0115)
  ResNet-50, Sparse INT8                                         Synchronous Single-Stream      23.41 (+/- 0.03)         42.68 (+/- 0.05)
  ResNet-50, Sparse INT8                                         Asynchronous Multi-Stream      42.69 (+/- 0.19)         46.77 (+/- 0.21)
  ResNet-50, Baseline                                            Synchronous Single-Stream     119.97 (+/- 0.12)          8.3342 (+/- 0.0086)
  ResNet-50, Baseline                                            Asynchronous Multi-Stream     262.34 (+/- 0.82)          7.6144 (+/- 0.0256)
  NLP Text Classification, BERT base uncased SST2, Sparse INT8   Synchronous Single-Stream      74.18 (+/- 0.04)         13.48 (+/- 0.01)
  NLP Text Classification, BERT base uncased SST2, Sparse INT8   Asynchronous Multi-Stream     137.27 (+/- 0.10)         14.56 (+/- 0.01)
  NLP Document Classification, oBERT base uncased on IMDB        Synchronous Single-Stream    1351.89 (+/- 7.57)          0.7398 (+/- 0.0042)
  NLP Document Classification, oBERT base uncased on IMDB        Asynchronous Multi-Stream    2785.40 (+/- 3.11)          0.7139 (+/- 0.0019)
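For the DeepSparse synchronous single-stream results, the two reported metrics are two views of the same runs: with one batch in flight at a time, throughput in items/sec is approximately 1000 divided by latency in ms/batch. A quick sanity check against a few of the figures above (model names abbreviated; the 1% tolerance is an assumption about rounding in the published numbers):

```python
# Cross-check: for synchronous single-stream runs, items/sec ~= 1000 / (ms per batch).
sync_results = {
    # model: (reported ms/batch, reported items/sec)
    "NLP Token Classification, BERT base conll2003": (1354.91, 0.7380),
    "NLP Text Classification, DistilBERT mnli": (181.79, 5.5004),
    "ResNet-50, Sparse INT8": (23.41, 42.68),
}

for model, (latency_ms, reported) in sync_results.items():
    derived = 1000.0 / latency_ms
    # Agreement within ~1% confirms both charts describe the same runs.
    assert abs(derived - reported) / reported < 0.01
    print(f"{model}: derived {derived:.4f} items/sec vs reported {reported}")
```

The asynchronous multi-stream numbers do not follow this reciprocal rule, since multiple batches are in flight concurrently, so per-batch latency roughly doubles while throughput stays close to the single-stream figure on this 2-core part.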
TensorFlow 2.12 (images/sec, more is better) - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Device: CPU, N = 3

  Model: GoogLeNet - Batch Size: 256    3.02  (SE +/- 0.03)
  Model: GoogLeNet - Batch Size: 64     4.58  (SE +/- 0.00)
  Model: GoogLeNet - Batch Size: 32     4.53  (SE +/- 0.00)
  Model: GoogLeNet - Batch Size: 16     4.48  (SE +/- 0.01)
  Model: ResNet-50 - Batch Size: 64     1.02  (SE +/- 0.01)
  Model: ResNet-50 - Batch Size: 32     1.50  (SE +/- 0.00)
  Model: ResNet-50 - Batch Size: 16     1.47  (SE +/- 0.01)
  Model: AlexNet   - Batch Size: 512   12.23  (SE +/- 0.02)
  Model: AlexNet   - Batch Size: 256   12.02  (SE +/- 0.01)
  Model: AlexNet   - Batch Size: 64    10.54  (SE +/- 0.01)
  Model: AlexNet   - Batch Size: 32     9.07  (SE +/- 0.00)
  Model: AlexNet   - Batch Size: 16     7.02  (SE +/- 0.03)
  Model: VGG-16    - Batch Size: 64     0.45  (SE +/- 0.00)
  Model: VGG-16    - Batch Size: 32     0.53  (SE +/- 0.00)
  Model: VGG-16    - Batch Size: 16     0.61  (SE +/- 0.00)
PyTorch 2.1 (batches/sec, more is better) - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Device: CPU, N = 3 unless noted

  Model: Efficientnet_v2_l - Batch Size: 512   0.60  (SE +/- 0.01;        MIN: 0.45 / MAX: 0.66)
  Model: Efficientnet_v2_l - Batch Size: 256   0.60  (SE +/- 0.00;        MIN: 0.45 / MAX: 0.66)
  Model: Efficientnet_v2_l - Batch Size: 64    0.60  (SE +/- 0.01;        MIN: 0.44 / MAX: 0.66)
  Model: Efficientnet_v2_l - Batch Size: 32    0.59  (SE +/- 0.01, N = 5; MIN: 0.45 / MAX: 0.66)
  Model: Efficientnet_v2_l - Batch Size: 16    0.59  (SE +/- 0.00;        MIN: 0.45 / MAX: 0.66)
  Model: Efficientnet_v2_l - Batch Size: 1     1.10  (SE +/- 0.01;        MIN: 0.79 / MAX: 1.16)
  Model: ResNet-152        - Batch Size: 512   0.93  (SE +/- 0.00;        MIN: 0.56 / MAX: 0.96)
  Model: ResNet-152        - Batch Size: 256   0.92  (SE +/- 0.00;        MIN: 0.58 / MAX: 0.96)
  Model: ResNet-152        - Batch Size: 64    0.92  (SE +/- 0.00;        MIN: 0.59 / MAX: 0.96)
  Model: ResNet-152        - Batch Size: 32    0.93  (SE +/- 0.01;        MIN: 0.58 / MAX: 0.96)
  Model: ResNet-152        - Batch Size: 16    0.91  (SE +/- 0.01;        MIN: 0.58 / MAX: 0.96)
  Model: ResNet-152        - Batch Size: 1     1.62  (SE +/- 0.01;        MIN: 1.04 / MAX: 1.7)
  Model: ResNet-50         - Batch Size: 512   2.23  (SE +/- 0.02;        MIN: 1.47 / MAX: 2.34)
  Model: ResNet-50         - Batch Size: 256   2.17  (SE +/- 0.02, N = 5; MIN: 1.45 / MAX: 2.3)
  Model: ResNet-50         - Batch Size: 64    2.21  (SE +/- 0.02;        MIN: 1.41 / MAX: 2.32)
  Model: ResNet-50         - Batch Size: 32    2.21  (SE +/- 0.02, N = 8; MIN: 1.46 / MAX: 2.33)
  Model: ResNet-50         - Batch Size: 16    2.24  (SE +/- 0.01;        MIN: 1.48 / MAX: 2.34)
  Model: ResNet-50         - Batch Size: 1     3.85  (SE +/- 0.02;        MIN: 2.63 / MAX: 4.09)
TensorFlow Lite 2022-05-18 (Microseconds, fewer is better) - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C

  Model: Inception ResNet V2   419892    (SE +/- 439.95,  N = 3)
  Model: Mobilenet Quant        28341.7  (SE +/- 20.03,   N = 3)
  Model: Mobilenet Float        26921.0  (SE +/- 228.19,  N = 15)
  Model: NASNet Mobile          66075.7  (SE +/- 427.94,  N = 3)
  Model: Inception V4          547885    (SE +/- 5686.85, N = 3)
  Model: SqueezeNet             34007.2  (SE +/- 45.90,   N = 3)
RNNoise 2020-06-28 - OpenBenchmarking.org Seconds, Fewer Is Better: 36.44 (SE +/- 0.07, N = 3) 1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden
R Benchmark - OpenBenchmarking.org Seconds, Fewer Is Better: 0.6143 (SE +/- 0.0070, N = 3) 1. R scripting front-end version 3.6.3 (2020-02-29)
DeepSpeech 0.6 - Acceleration: CPU - OpenBenchmarking.org Seconds, Fewer Is Better: 378.78 (SE +/- 0.80, N = 3)
Numpy Benchmark - OpenBenchmarking.org Score, More Is Better: 122.42 (SE +/- 0.39, N = 3)
oneDNN 3.3 - OpenBenchmarking.org ms, Fewer Is Better (Engine: CPU)
  Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16: 21318.6 (SE +/- 32.23, N = 3; MIN: 21195.4)
  Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16: 38767.3 (SE +/- 123.68, N = 3; MIN: 38319.3)
  Harness: Recurrent Neural Network Inference - Data Type: u8s8f32: 21148.2 (SE +/- 10.14, N = 3; MIN: 21028.3)
  Harness: Recurrent Neural Network Training - Data Type: u8s8f32: 38464.9 (SE +/- 111.64, N = 3; MIN: 38219.3)
  Harness: Recurrent Neural Network Inference - Data Type: f32: 21158.0 (SE +/- 31.33, N = 3; MIN: 21017.1)
  Harness: Recurrent Neural Network Training - Data Type: f32: 38466.3 (SE +/- 25.25, N = 3; MIN: 38310.5)
  Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32: 60.62 (SE +/- 0.80, N = 12; MIN: 55.57)
  Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32: 32.51 (SE +/- 0.05, N = 3; MIN: 31.99)
  Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32: 70.48 (SE +/- 0.22, N = 3; MIN: 69.03)
  Harness: Deconvolution Batch shapes_3d - Data Type: f32: 54.13 (SE +/- 0.06, N = 3; MIN: 51.76)
  Harness: Deconvolution Batch shapes_1d - Data Type: f32: 53.30 (SE +/- 0.30, N = 3; MIN: 50.67)
  Harness: Convolution Batch Shapes Auto - Data Type: f32: 90.59 (SE +/- 0.17, N = 3; MIN: 89.23)
  Harness: IP Shapes 3D - Data Type: u8s8f32: 12.00 (SE +/- 0.04, N = 3; MIN: 11.72)
  Harness: IP Shapes 1D - Data Type: u8s8f32: 23.79 (SE +/- 0.13, N = 3; MIN: 23.16)
  Harness: IP Shapes 3D - Data Type: f32: 45.54 (SE +/- 0.10, N = 3; MIN: 44.65)
  Harness: IP Shapes 1D - Data Type: f32: 42.23 (SE +/- 0.27, N = 3; MIN: 40.72)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
LeelaChessZero 0.28 - Backend: BLAS - OpenBenchmarking.org Nodes Per Second, More Is Better: 124 (SE +/- 1.20, N = 3) 1. (CXX) g++ options: -flto -pthread
OpenCV 4.7 - Test: DNN - Deep Neural Network - OpenBenchmarking.org ms, Fewer Is Better: 106960 (SE +/- 2616.66, N = 15) 1. (CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt
ONNX Runtime 1.14 - OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better (Device: CPU)
  Model: Faster R-CNN R-50-FPN-int8 - Executor: Standard: 893.65 (SE +/- 0.67, N = 3)
  Model: Faster R-CNN R-50-FPN-int8 - Executor: Parallel: 1109.54 (SE +/- 2.36, N = 3)
  Model: super-resolution-10 - Executor: Standard: 87.78 (SE +/- 1.18, N = 3)
  Model: super-resolution-10 - Executor: Parallel: 120.16 (SE +/- 1.40, N = 4)
  Model: ResNet50 v1-12-int8 - Executor: Standard: 84.58 (SE +/- 0.05, N = 3)
  Model: ResNet50 v1-12-int8 - Executor: Parallel: 154.24 (SE +/- 0.40, N = 3)
  Model: ArcFace ResNet-100 - Executor: Standard: 281.84 (SE +/- 0.20, N = 3)
  Model: ArcFace ResNet-100 - Executor: Parallel: 514.17 (SE +/- 3.30, N = 3)
  Model: fcn-resnet101-11 - Executor: Standard: 9394.74 (SE +/- 10.14, N = 3)
  Model: fcn-resnet101-11 - Executor: Parallel: 13409.4 (SE +/- 55.92, N = 3)
  Model: CaffeNet 12-int8 - Executor: Standard: 22.43 (SE +/- 0.05, N = 3)
  Model: CaffeNet 12-int8 - Executor: Parallel: 38.09 (SE +/- 0.05, N = 3)
  Model: bertsquad-12 - Executor: Standard: 814.17 (SE +/- 1.30, N = 3)
  Model: bertsquad-12 - Executor: Parallel: 1319.06 (SE +/- 4.38, N = 3)
  Model: GPT-2 - Executor: Standard: 46.24 (SE +/- 0.04, N = 3)
  Model: GPT-2 - Executor: Parallel: 85.33 (SE +/- 0.12, N = 3)
  1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
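A consistent pattern in the ONNX Runtime results: for every model, the Parallel executor is slower than the Standard executor on this 2-core/4-thread CPU, where inter-op parallelism has little to exploit. A small sketch quantifying the slowdown from the reported means:

```python
# (Standard, Parallel) mean inference time cost in ms,
# taken from the ONNX Runtime 1.14 results above.
results = {
    "GPT-2": (46.24, 85.33),
    "CaffeNet 12-int8": (22.43, 38.09),
    "bertsquad-12": (814.17, 1319.06),
}

for model, (std, par) in results.items():
    print(f"{model}: Parallel is {par / std:.2f}x slower than Standard")
```

The slowdown ranges from roughly 1.2x to 1.85x across the models measured, so on low-core-count hardware the Standard (sequential) executor is clearly the better default.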
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16 - Device: CPU
  OpenBenchmarking.org ms, Fewer Is Better: 593.05 (SE +/- 12.86, N = 15; MIN: 368.67 / MAX: 676.18)
  OpenBenchmarking.org FPS, More Is Better: 3.40 (SE +/- 0.08, N = 15)
  1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
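The OpenVINO latency and FPS figures can be roughly cross-checked: 1000 / 593.05 ms is only about 1.69 FPS, so the reported 3.40 FPS suggests around two inference streams running concurrently. The stream count is not recorded in this export, so it is an assumption here:

```python
# Cross-check of the OpenVINO figures above.
latency_ms = 593.05   # reported mean latency
streams = 2           # assumed concurrent streams; not in the exported result

# With concurrent streams, throughput ~= streams * 1000 / latency.
fps = streams * 1000 / latency_ms
print(f"~{fps:.2f} FPS")  # close to the reported 3.40 FPS
```

The small gap between the estimate and the reported 3.40 FPS is expected, since the mean latency and mean throughput are aggregated over separate run windows.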
Phoronix Test Suite v10.8.5