HDVR4-A8.9600-1
AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G testing with an ASRock A320M-HDV R4.0 (P2.00 BIOS) and llvmpipe on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2312201-HERT-HDVR4A802&grt .
HDVR4-A8.9600-1
System: llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Processor: AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C+6G @ 3.10GHz (2 Cores / 4 Threads)
Motherboard: ASRock A320M-HDV R4.0 (P2.00 BIOS)
Chipset: AMD 15h
Memory: 3584MB
Disk: 1000GB Western Digital WDS100T2B0A
Graphics: llvmpipe
Audio: AMD Kabini HDMI/DP
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 20.04
Kernel: 5.15.0-89-generic (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.13
OpenGL: 4.5 Mesa 21.2.6 (LLVM 12.0.0 256 bits)
Vulkan: 1.1.182
Compiler: GCC 9.4.0
File-System: ext4
Screen Resolution: 1368x768
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-9QDOt0/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
- CPU Microcode: 0x600611a
- Python 3.8.10
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT vulnerable + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
HDVR4-A8.9600-1 caffe: AlexNet - CPU - 100 caffe: AlexNet - CPU - 200 caffe: AlexNet - CPU - 1000 caffe: GoogleNet - CPU - 100 caffe: GoogleNet - CPU - 200 caffe: GoogleNet - CPU - 1000 deepspeech: CPU lczero: BLAS mlpack: scikit_ica mlpack: scikit_svm mnn: nasnet mnn: mobilenetV3 mnn: squeezenetv1.1 mnn: resnet-v2-50 mnn: SqueezeNetV1.0 mnn: MobileNetV2_224 mnn: mobilenet-v1-1.0 mnn: inception-v3 ncnn: CPU - mobilenet ncnn: CPU-v2-v2 - mobilenet-v2 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU - shufflenet-v2 ncnn: CPU - mnasnet ncnn: CPU - efficientnet-b0 ncnn: CPU - blazeface ncnn: CPU - googlenet ncnn: CPU - vgg16 ncnn: CPU - resnet18 ncnn: CPU - alexnet ncnn: CPU - resnet50 ncnn: CPU - yolov4-tiny ncnn: CPU - squeezenet_ssd ncnn: CPU - regnety_400m ncnn: CPU - vision_transformer ncnn: CPU - FastestDet ncnn: Vulkan GPU - mobilenet ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - FastestDet deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse 
INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream deepsparse: 
NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream numenta-nab: KNN CAD numenta-nab: Relative Entropy numenta-nab: Windowed Gaussian numenta-nab: Earthgecko Skyline numenta-nab: Bayesian Changepoint numenta-nab: Contextual Anomaly Detector OSE numpy: onednn: IP Shapes 1D - f32 - CPU onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 1D - u8s8f32 - CPU onednn: IP Shapes 3D - u8s8f32 - CPU onednn: Convolution Batch Shapes Auto - f32 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Recurrent Neural Network Training - f32 - CPU onednn: Recurrent Neural Network Inference - f32 - CPU onednn: Recurrent Neural Network 
Training - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - u8s8f32 - CPU onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard opencv: DNN - Deep Neural Network openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail 
FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU plaidml: No - Inference - VGG16 - CPU plaidml: No - Inference - ResNet 50 - CPU pytorch: CPU - 1 - ResNet-50 pytorch: CPU - 1 - ResNet-152 pytorch: CPU - 16 - ResNet-50 pytorch: CPU - 32 - ResNet-50 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 16 - ResNet-152 pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 512 - ResNet-50 pytorch: CPU - 64 - ResNet-152 pytorch: CPU - 256 - ResNet-152 pytorch: CPU - 512 - ResNet-152 pytorch: CPU - 1 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 256 - Efficientnet_v2_l pytorch: CPU - 512 - Efficientnet_v2_l rbenchmark: rnnoise: scikit-learn: GLM scikit-learn: SAGA scikit-learn: Tree scikit-learn: 
Lasso scikit-learn: Sparsify scikit-learn: Plot Ward scikit-learn: MNIST Dataset scikit-learn: Plot Neighbors scikit-learn: SGD Regression scikit-learn: Plot Lasso Path scikit-learn: Text Vectorizers scikit-learn: Plot Hierarchical scikit-learn: Plot OMP vs. LARS scikit-learn: Feature Expansions scikit-learn: LocalOutlierFactor scikit-learn: TSNE MNIST Dataset scikit-learn: Plot Incremental PCA scikit-learn: Hist Gradient Boosting scikit-learn: Sample Without Replacement scikit-learn: Covertype Dataset Benchmark scikit-learn: Hist Gradient Boosting Adult scikit-learn: Hist Gradient Boosting Threading scikit-learn: Plot Singular Value Decomposition scikit-learn: Hist Gradient Boosting Higgs Boson scikit-learn: 20 Newsgroups / Logistic Regression scikit-learn: Plot Polynomial Kernel Approximation scikit-learn: Hist Gradient Boosting Categorical Only scikit-learn: Kernel PCA Solvers / Time vs. N Samples scikit-learn: Kernel PCA Solvers / Time vs. N Components scikit-learn: Sparse Rand Projections / 100 Iterations tensorflow: CPU - 16 - VGG-16 tensorflow: CPU - 32 - VGG-16 tensorflow: CPU - 64 - VGG-16 tensorflow: CPU - 16 - AlexNet tensorflow: CPU - 32 - AlexNet tensorflow: CPU - 64 - AlexNet tensorflow: CPU - 256 - AlexNet tensorflow: CPU - 512 - AlexNet tensorflow: CPU - 16 - GoogLeNet tensorflow: CPU - 16 - ResNet-50 tensorflow: CPU - 32 - GoogLeNet tensorflow: CPU - 32 - ResNet-50 tensorflow: CPU - 64 - GoogLeNet tensorflow: CPU - 64 - ResNet-50 tensorflow: CPU - 256 - GoogLeNet tensorflow-lite: SqueezeNet tensorflow-lite: Inception V4 tensorflow-lite: NASNet Mobile tensorflow-lite: Mobilenet Float tensorflow-lite: Mobilenet Quant tensorflow-lite: Inception ResNet V2 tnn: CPU - DenseNet tnn: CPU - MobileNet v2 tnn: CPU - SqueezeNet v2 tnn: CPU - SqueezeNet v1.1 whisper-cpp: ggml-base.en - 2016 State of the Union whisper-cpp: ggml-small.en - 2016 State of the Union whisper-cpp: ggml-medium.en - 2016 State of the Union llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE 
CORES 4C 107310 216588 1079113 231503 457100 2318550 378.78422 124 223.37 44.60 46.612 6.958 17.625 130.581 32.409 17.253 20.461 163.103 100.21 31.61 22.58 16.32 25.63 39.50 5.33 75.15 333.56 60.75 42.29 135.48 139.59 68.66 33.68 1141.93 22.26 100.53 31.44 22.45 16.53 25.62 39.25 5.32 75.41 335.23 61.47 42.25 135.98 139.80 68.80 33.20 1140.71 22.62 0.7139 2785.3976 0.7398 1351.8913 14.5588 137.2652 13.4772 74.1834 7.6144 262.3391 8.3342 119.9662 46.7652 42.6926 42.6815 23.4090 3.6022 554.8602 3.9464 253.3570 0.8881 2245.6114 0.9181 1089.9907 7.6267 261.9379 8.3171 120.2128 3.5275 565.8025 3.9896 250.6134 6.1356 325.3001 5.5004 181.7890 0.9012 2218.2224 0.9604 1041.2191 6.3146 316.5860 6.3561 157.3097 0.7168 2788.6516 0.7380 1354.9126 1009.020 119.856 57.324 953.649 325.849 272.340 122.42 42.2339 45.5410 23.7873 12.0047 90.5910 53.2981 54.1334 70.4784 32.5108 60.6197 38466.3 21158.0 38464.9 21148.2 38767.3 21318.6 11.7164 85.3327 21.6196 46.2448 0.758124 1319.06 1.22825 814.166 26.2489 38.0923 44.5752 22.4319 0.0745773 13409.4 0.106443 9394.74 1.94502 514.172 3.54808 281.839 6.48329 154.240 11.8228 84.5800 8.32539 120.159 11.3952 87.7839 0.901280 1109.54 1.11900 893.652 106960 0.23 8841.25 1.92 1042.19 1.91 1044.45 11.85 168.71 0.3 6728.06 46.80 42.70 4.69 426.61 20.20 98.95 21.55 92.76 65.08 30.73 9.86 202.73 2.79 716.36 29.49 67.76 23.78 84.09 3.40 593.05 438.38 4.54 3.41 587.13 681.70 2.90 1.33 1.60 3.85 1.62 2.24 2.21 2.21 0.91 2.17 0.93 2.23 0.92 0.92 0.93 1.10 0.59 0.59 0.60 0.60 0.60 0.6143 36.435 1877.535 1919.812 103.793 1312.064 226.792 202.927 177.394 599.881 358.670 835.044 171.503 722.267 355.351 491.632 559.538 1425.788 110.289 414.916 338.633 965.025 201.998 838.410 692.635 309.508 126.691 761.959 61.348 720.168 329.194 8668.525 0.61 0.53 0.45 7.02 9.07 10.54 12.02 12.23 4.48 1.47 4.53 1.50 4.58 1.02 3.02 34007.2 547885 66075.7 26921.0 28341.7 419892 8874.345 656.292 135.312 557.950 2498.66658 8252.8063 29217.395 OpenBenchmarking.org
Caffe 2020-02-13 - Milli-Seconds, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Model: AlexNet - Acceleration: CPU - Iterations: 100 - Result: 107310 - SE +/- 551.70, N = 3
Model: AlexNet - Acceleration: CPU - Iterations: 200 - Result: 216588 - SE +/- 882.15, N = 3
Model: AlexNet - Acceleration: CPU - Iterations: 1000 - Result: 1079113 - SE +/- 2982.40, N = 3
Model: GoogleNet - Acceleration: CPU - Iterations: 100 - Result: 231503 - SE +/- 2399.23, N = 3
Model: GoogleNet - Acceleration: CPU - Iterations: 200 - Result: 457100 - SE +/- 498.93, N = 3
Model: GoogleNet - Acceleration: CPU - Iterations: 1000 - Result: 2318550 - SE +/- 23558.22, N = 3
1. (CXX) g++ options: -fPIC -O3 -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
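As a rough sanity check on the Caffe figures above, the totals can be divided by the iteration count to see whether the cost per iteration stays roughly constant across the 100, 200, and 1000 iteration runs. The following is a minimal Python sketch, not part of the Phoronix Test Suite output. The helper names `per_iteration_ms` and `relative_se_percent` are hypothetical, and it assumes that each reported value is the total run time in milliseconds, which the export does not state explicitly.

```python
# Caffe results copied from the lines above:
# (model, iterations) -> (reported mean in ms, standard error in ms)
caffe_results = {
    ("AlexNet", 100): (107310, 551.70),
    ("AlexNet", 200): (216588, 882.15),
    ("AlexNet", 1000): (1079113, 2982.40),
    ("GoogleNet", 100): (231503, 2399.23),
    ("GoogleNet", 200): (457100, 498.93),
    ("GoogleNet", 1000): (2318550, 23558.22),
}


def per_iteration_ms(total_ms, iterations):
    """Average time per iteration, assuming total_ms covers the whole run."""
    return total_ms / iterations


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the reported mean."""
    return 100.0 * se / mean


for (model, iterations), (mean, se) in caffe_results.items():
    print(
        f"{model:9s} {iterations:5d} iterations: "
        f"{per_iteration_ms(mean, iterations):8.1f} ms/iteration, "
        f"SE {relative_se_percent(mean, se):.2f}% of mean"
    )
```

Under that assumption, a near-constant per-iteration figure within each model would suggest the run time scales linearly with the iteration count on this system.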
DeepSpeech 0.6 - Acceleration: CPU - Seconds, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Result: 378.78 - SE +/- 0.80, N = 3
LeelaChessZero 0.28 - Backend: BLAS - Nodes Per Second, More Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Result: 124 - SE +/- 1.20, N = 3 - 1. (CXX) g++ options: -flto -pthread
Mlpack Benchmark - Benchmark: scikit_ica - Seconds, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Result: 223.37 - SE +/- 0.45, N = 3
Mlpack Benchmark - Benchmark: scikit_svm - Seconds, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C - Result: 44.60 - SE +/- 0.34, N = 10
Mobile Neural Network 2.1 - ms, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Model: nasnet - Result: 46.61 - SE +/- 0.18, N = 3 - MIN: 45.85 / MAX: 67.49
Model: mobilenetV3 - Result: 6.958 - SE +/- 0.031, N = 3 - MIN: 6.82 / MAX: 9.38
Model: squeezenetv1.1 - Result: 17.63 - SE +/- 0.12, N = 3 - MIN: 17.14 / MAX: 37.87
Model: resnet-v2-50 - Result: 130.58 - SE +/- 0.22, N = 3 - MIN: 129.32 / MAX: 174.02
Model: SqueezeNetV1.0 - Result: 32.41 - SE +/- 0.08, N = 3 - MIN: 31.77 / MAX: 53.11
Model: MobileNetV2_224 - Result: 17.25 - SE +/- 0.06, N = 3 - MIN: 16.89 / MAX: 37.42
Model: mobilenet-v1-1.0 - Result: 20.46 - SE +/- 0.09, N = 3 - MIN: 20.1 / MAX: 40.6
Model: inception-v3 - Result: 163.10 - SE +/- 0.98, N = 3 - MIN: 160.43 / MAX: 252.41
1. (CXX) g++ options: -std=c++11 -O3 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl
NCNN 20230517 - ms, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Target: CPU - Model: mobilenet - Result: 100.21 - SE +/- 0.14, N = 3 - MIN: 99.08 / MAX: 122.54
Target: CPU-v2-v2 - Model: mobilenet-v2 - Result: 31.61 - SE +/- 0.13, N = 3 - MIN: 31.19 / MAX: 35.08
Target: CPU-v3-v3 - Model: mobilenet-v3 - Result: 22.58 - SE +/- 0.11, N = 3 - MIN: 22.01 / MAX: 65.55
Target: CPU - Model: shufflenet-v2 - Result: 16.32 - SE +/- 0.23, N = 3 - MIN: 15.65 / MAX: 36.76
Target: CPU - Model: mnasnet - Result: 25.63 - SE +/- 0.04, N = 3 - MIN: 25.22 / MAX: 28.14
Target: CPU - Model: efficientnet-b0 - Result: 39.50 - SE +/- 0.04, N = 3 - MIN: 39.16 / MAX: 43.83
Target: CPU - Model: blazeface - Result: 5.33 - SE +/- 0.02, N = 3 - MIN: 5.2 / MAX: 5.68
Target: CPU - Model: googlenet - Result: 75.15 - SE +/- 0.12, N = 3 - MIN: 73.47 / MAX: 97.99
Target: CPU - Model: vgg16 - Result: 333.56 - SE +/- 1.44, N = 3 - MIN: 326.64 / MAX: 373.95
Target: CPU - Model: resnet18 - Result: 60.75 - SE +/- 0.25, N = 3 - MIN: 59.78 / MAX: 79.65
Target: CPU - Model: alexnet - Result: 42.29 - SE +/- 0.13, N = 3 - MIN: 41.47 / MAX: 57.39
Target: CPU - Model: resnet50 - Result: 135.48 - SE +/- 0.32, N = 3 - MIN: 134.06 / MAX: 143.27
Target: CPU - Model: yolov4-tiny - Result: 139.59 - SE +/- 0.30, N = 3 - MIN: 136.68 / MAX: 156.49
Target: CPU - Model: squeezenet_ssd - Result: 68.66 - SE +/- 0.46, N = 3 - MIN: 67.06 / MAX: 88.53
Target: CPU - Model: regnety_400m - Result: 33.68 - SE +/- 0.11, N = 3 - MIN: 32.55 / MAX: 40.17
Target: CPU - Model: vision_transformer - Result: 1141.93 - SE +/- 0.72, N = 3 - MIN: 1114.61 / MAX: 1288.76
Target: CPU - Model: FastestDet - Result: 22.26 - SE +/- 0.11, N = 3 - MIN: 21.72 / MAX: 40.61
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
NCNN 20230517 - ms, Fewer Is Better - llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C
Target: Vulkan GPU - Model: mobilenet - Result: 100.53 - SE +/- 0.16, N = 3 - MIN: 99.32 / MAX: 121.45
Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 - Result: 31.44 - SE +/- 0.10, N = 3 - MIN: 31.12 / MAX: 34.77
Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 - Result: 22.45 - SE +/- 0.09, N = 3 - MIN: 21.93 / MAX: 41.71
Target: Vulkan GPU - Model: shufflenet-v2 - Result: 16.53 - SE +/- 0.10, N = 3 - MIN: 16.14 / MAX: 36.47
Target: Vulkan GPU - Model: mnasnet - Result: 25.62 - SE +/- 0.04, N = 3 - MIN: 25.19 / MAX: 58.61
Target: Vulkan GPU - Model: efficientnet-b0 - Result: 39.25 - SE +/- 0.08, N = 3 - MIN: 38.83 / MAX: 45.93
Target: Vulkan GPU - Model: blazeface - Result: 5.32 - SE +/- 0.03, N = 3 - MIN: 5.19 / MAX: 8.52
Target: Vulkan GPU - Model: googlenet - Result: 75.41 - SE +/- 0.17, N = 3 - MIN: 73.99 / MAX: 100.71
Target: Vulkan GPU - Model: vgg16 - Result: 335.23 - SE +/- 1.47, N = 3 - MIN: 328.93 / MAX: 356.34
Target: Vulkan GPU - Model: resnet18 - Result: 61.47 - SE +/- 0.12, N = 3 - MIN: 60.86 / MAX: 79.52
Target: Vulkan GPU - Model: alexnet - Result: 42.25 - SE +/- 0.10, N = 3 - MIN: 41.49 / MAX: 44.6
Target: Vulkan GPU - Model: resnet50 - Result: 135.98 - SE +/- 0.11, N = 3 - MIN: 134.41 / MAX: 153.95
Target: Vulkan GPU - Model: yolov4-tiny - Result: 139.80 - SE +/- 0.25, N = 3 - MIN: 137.44 / MAX: 150.14
Target: Vulkan GPU - Model: squeezenet_ssd - Result: 68.80 - SE +/- 0.35, N = 3 - MIN: 67.79 / MAX: 146.66
Target: Vulkan GPU - Model: regnety_400m - Result: 33.20 - SE +/- 0.03, N = 3 - MIN: 32.32 / MAX: 52.57
Target: Vulkan GPU - Model: vision_transformer - Result: 1140.71 - SE +/- 0.81, N = 3 - MIN: 1116.02 / MAX: 1259.95
Target: Vulkan GPU - Model: FastestDet - Result: 22.62 - SE +/- 0.05, N = 3 - MIN: 22.2 / MAX: 28.41
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread -pthread
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 0.7139, SE +/- 0.0019, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 2785.40, SE +/- 3.11, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 0.7398, SE +/- 0.0042, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 1351.89, SE +/- 7.57, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 14.56, SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 137.27, SE +/- 0.10, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 13.48, SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 74.18, SE +/- 0.04, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 7.6144, SE +/- 0.0256, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 262.34, SE +/- 0.82, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 8.3342, SE +/- 0.0086, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 119.97, SE +/- 0.12, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 46.77, SE +/- 0.21, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 42.69, SE +/- 0.19, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 42.68, SE +/- 0.05, N = 3
Neural Magic DeepSparse 1.6 - Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 23.41, SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 3.6022, SE +/- 0.0115, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 554.86, SE +/- 1.70, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 3.9464, SE +/- 0.0017, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 253.36, SE +/- 0.11, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 0.8881, SE +/- 0.0040, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 2245.61, SE +/- 11.37, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 0.9181, SE +/- 0.0086, N = 9
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 1089.99, SE +/- 10.83, N = 9
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 7.6267, SE +/- 0.0155, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 261.94, SE +/- 0.37, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 8.3171, SE +/- 0.0084, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 120.21, SE +/- 0.12, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 3.5275, SE +/- 0.0022, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 565.80, SE +/- 1.23, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 3.9896, SE +/- 0.0054, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 250.61, SE +/- 0.34, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 6.1356, SE +/- 0.0051, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 325.30, SE +/- 0.05, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 5.5004, SE +/- 0.0069, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 181.79, SE +/- 0.23, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 0.9012, SE +/- 0.0003, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 2218.22, SE +/- 0.75, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 0.9604, SE +/- 0.0017, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 1041.22, SE +/- 1.84, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 6.3146, SE +/- 0.0711, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 316.59, SE +/- 3.55, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 6.3561, SE +/- 0.0214, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 157.31, SE +/- 0.53, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): 0.7168, SE +/- 0.0003, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): 2788.65, SE +/- 1.19, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): 0.7380, SE +/- 0.0006, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): 1354.91, SE +/- 1.17, N = 3
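Reading aid for the DeepSparse figures above: each model is reported twice per scenario, once as items/sec and once as ms/batch. The sketch below, which assumes one item per batch (the export does not state the batch size), checks that the Synchronous Single-Stream pairs are close to reciprocals and uses Little's law (items in flight = throughput x latency) to estimate how many streams the Asynchronous Multi-Stream runs kept busy. The function names are hypothetical helpers, and the "about two streams" reading is an inference from the numbers, not something the export reports.

```python
# Values copied from the DeepSparse results above: (items/sec, ms/batch).
SYNC = {
    "oBERT base uncased on IMDB": (0.7398, 1351.89),
    "ResNet-50, Baseline": (8.3342, 119.97),
    "ResNet-50, Sparse INT8": (42.68, 23.41),
}
ASYNC = {
    "oBERT base uncased on IMDB": (0.7139, 2785.40),
    "ResNet-50, Baseline": (7.6144, 262.34),
    "ResNet-50, Sparse INT8": (46.77, 42.69),
}


def throughput_from_latency(ms_per_batch, items_per_batch=1):
    """items/sec implied by a latency; a batch size of 1 is an assumption."""
    return items_per_batch * 1000.0 / ms_per_batch


def implied_concurrency(items_per_sec, ms_per_batch):
    """Average items in flight by Little's law: throughput * latency."""
    return items_per_sec * ms_per_batch / 1000.0


for name, (ips, ms) in SYNC.items():
    implied = throughput_from_latency(ms)
    print(f"sync  {name}: reported {ips} items/sec, implied {implied:.4f}")

for name, (ips, ms) in ASYNC.items():
    streams = implied_concurrency(ips, ms)
    print(f"async {name}: implied items in flight {streams:.2f}")
```

On these samples the asynchronous runs come out near two items in flight, so their higher ms/batch reflects queueing across streams rather than slower per-item compute.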
Numenta Anomaly Benchmark 1.1 - Detector: KNN CAD (Seconds, Fewer Is Better): 1009.02, SE +/- 6.74, N = 3
Numenta Anomaly Benchmark 1.1 - Detector: Relative Entropy (Seconds, Fewer Is Better): 119.86, SE +/- 1.36, N = 3
Numenta Anomaly Benchmark 1.1 - Detector: Windowed Gaussian (Seconds, Fewer Is Better): 57.32, SE +/- 0.27, N = 3
Numenta Anomaly Benchmark 1.1 - Detector: Earthgecko Skyline (Seconds, Fewer Is Better): 953.65, SE +/- 1.84, N = 3
Numenta Anomaly Benchmark 1.1 - Detector: Bayesian Changepoint (Seconds, Fewer Is Better): 325.85, SE +/- 3.10, N = 3
Numenta Anomaly Benchmark 1.1 - Detector: Contextual Anomaly Detector OSE (Seconds, Fewer Is Better): 272.34, SE +/- 0.36, N = 3
Numpy Benchmark (Score, More Is Better): 122.42, SE +/- 0.39, N = 3
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 42.23, SE +/- 0.27, N = 3, MIN: 40.72
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 45.54, SE +/- 0.10, N = 3, MIN: 44.65
oneDNN 3.3 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 23.79, SE +/- 0.13, N = 3, MIN: 23.16
oneDNN 3.3 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 12.00, SE +/- 0.04, N = 3, MIN: 11.72
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 90.59, SE +/- 0.17, N = 3, MIN: 89.23
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 53.30, SE +/- 0.30, N = 3, MIN: 50.67
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 54.13, SE +/- 0.06, N = 3, MIN: 51.76
oneDNN 3.3 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 70.48, SE +/- 0.22, N = 3, MIN: 69.03
oneDNN 3.3 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 32.51, SE +/- 0.05, N = 3, MIN: 31.99
oneDNN 3.3 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 60.62, SE +/- 0.80, N = 12, MIN: 55.57
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 38466.3, SE +/- 25.25, N = 3, MIN: 38310.5
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): 21158.0, SE +/- 31.33, N = 3, MIN: 21017.1
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 38464.9, SE +/- 111.64, N = 3, MIN: 38219.3
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): 21148.2, SE +/- 10.14, N = 3, MIN: 21028.3
oneDNN 3.3 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): 38767.3, SE +/- 123.68, N = 3, MIN: 38319.3
oneDNN 3.3 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better): 21318.6, SE +/- 32.23, N = 3, MIN: 21195.4
(CXX) g++ options for the oneDNN results above: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
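The "SE +/-" figures are easier to compare across harnesses when expressed relative to the mean. A minimal sketch, using a few oneDNN values copied from the results above; `relative_se_percent` is a hypothetical helper name, and the export itself only reports the absolute standard error and the run count N.

```python
# (mean in ms, standard error in ms, number of runs), copied from the results above.
ONEDNN = {
    "IP Shapes 1D, f32": (42.23, 0.27, 3),
    "Deconvolution Batch shapes_3d, u8s8f32": (60.62, 0.80, 12),
    "Recurrent Neural Network Training, f32": (38466.3, 25.25, 3),
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


for name, (mean, se, n) in ONEDNN.items():
    pct = relative_se_percent(mean, se)
    print(f"{name}: SE is {pct:.2f}% of the mean over N = {n} runs")
```

Of these three, the Deconvolution Batch shapes_3d u8s8f32 result has the largest relative spread, and it is also the one oneDNN result above that was run 12 times rather than 3.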
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better): 11.72, SE +/- 0.02, N = 3
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better): 85.33, SE +/- 0.12, N = 3
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): 21.62, SE +/- 0.02, N = 3
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better): 46.24, SE +/- 0.04, N = 3
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better): 0.758124, SE +/- 0.002525, N = 3
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better): 1319.06, SE +/- 4.38, N = 3
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): 1.22825, SE +/- 0.00196, N = 3
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better): 814.17, SE +/- 1.30, N = 3
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better): 26.25, SE +/- 0.03, N = 3
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better): 38.09, SE +/- 0.05, N = 3
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): 44.58, SE +/- 0.09, N = 3
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better): 22.43, SE +/- 0.05, N = 3
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better): 0.0745773, SE +/- 0.0003114, N = 3
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better): 13409.4, SE +/- 55.92, N = 3
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better): 0.106443, SE +/- 0.000115, N = 3
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better): 9394.74, SE +/- 10.14, N = 3
(CXX) g++ options for the ONNX Runtime results above: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.4376 0.8752 1.3128 1.7504 2.188 SE +/- 0.01254, N = 3 1.94502 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 110 220 330 440 550 SE +/- 3.30, N = 3 514.17 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.7983 1.5966 2.3949 3.1932 3.9915 SE +/- 0.00258, N = 3 3.54808 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 60 120 180 240 300 SE +/- 0.20, N = 3 281.84 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.01700, N = 3 6.48329 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 30 60 90 120 150 SE +/- 0.40, N = 3 154.24 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.01, N = 3 11.82 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 0.05, N = 3 84.58 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 2 4 6 8 10 SE +/- 0.09724, N = 4 8.32539 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 30 60 90 120 150 SE +/- 1.40, N = 4 120.16 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 3 6 9 12 15 SE +/- 0.15, N = 3 11.40 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: super-resolution-10 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 20 40 60 80 100 SE +/- 1.18, N = 3 87.78 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2028 0.4056 0.6084 0.8112 1.014 SE +/- 0.001919, N = 3 0.901280 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 2.36, N = 3 1109.54 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 0.2518 0.5036 0.7554 1.0072 1.259 SE +/- 0.00084, N = 3 1.11900 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.14 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C 200 400 600 800 1000 SE +/- 0.67, N = 3 893.65 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt -lpthread -pthread
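Each ONNX Runtime model/executor pair above is reported twice, once as inferences per second and once as inference time cost in milliseconds. For these numbers the two appear to be reciprocals of each other (throughput ≈ 1000 / latency in ms). The following is a minimal sketch that checks this on a few of the reported pairs; the function names are illustrative, and the reciprocal relationship is an observation about these values, not a documented property of the test profile.

```python
# Values copied from the ONNX Runtime results above:
# name -> (inferences per second, inference time cost in ms)
REPORTED = {
    "bertsquad-12 / Standard": (1.22825, 814.17),
    "CaffeNet 12-int8 / Parallel": (26.25, 38.09),
    "fcn-resnet101-11 / Parallel": (0.0745773, 13409.4),
    "Faster R-CNN R-50-FPN-int8 / Standard": (1.11900, 893.65),
}


def throughput_from_latency_ms(latency_ms: float) -> float:
    """Inferences per second implied by a per-inference latency in ms."""
    return 1000.0 / latency_ms


def relative_gap(reported_ips: float, latency_ms: float) -> float:
    """Relative difference between reported and implied throughput."""
    implied = throughput_from_latency_ms(latency_ms)
    return abs(implied - reported_ips) / reported_ips


if __name__ == "__main__":
    for name, (ips, ms) in REPORTED.items():
        implied = throughput_from_latency_ms(ms)
        gap = relative_gap(ips, ms)
        print(f"{name}: implied {implied:.5f}/s, reported {ips}/s, gap {gap:.3%}")
```

Small gaps are expected because each figure is an average over several runs and is rounded independently.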
OpenCV 4.7 - Test: DNN - Deep Neural Network - ms, Fewer Is Better: 106960 (SE +/- 2616.66, N = 15) - (CXX) g++ options: -fsigned-char -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -ldl -lm -lpthread -lrt
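The "SE +/-" figure attached to each result is a standard error over N runs. To compare run-to-run noise across tests with very different scales, it can help to express it relative to the mean. The sketch below assumes the usual definition (sample standard deviation divided by the square root of N); whether the Phoronix Test Suite uses the sample or population deviation is not stated in this export, and the helper names are illustrative.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


if __name__ == "__main__":
    # OpenCV DNN result above: 106960 ms, SE +/- 2616.66, N = 15
    print(f"OpenCV DNN relative SE: {relative_se_percent(106960, 2616.66):.2f}%")
    # ONNX Runtime bertsquad-12 (Standard) latency: 814.17 ms, SE +/- 1.30, N = 3
    print(f"bertsquad-12 relative SE: {relative_se_percent(814.17, 1.30):.2f}%")
```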
OpenVINO 2023.2.dev - Model: Face Detection FP16 - Device: CPU - FPS, More Is Better: 0.23 (SE +/- 0.00, N = 3)
OpenVINO 2023.2.dev - Model: Face Detection FP16 - Device: CPU - ms, Fewer Is Better: 8841.25 (SE +/- 5.44, N = 3; MIN: 8808.56 / MAX: 8886.06)
OpenVINO 2023.2.dev - Model: Person Detection FP16 - Device: CPU - FPS, More Is Better: 1.92 (SE +/- 0.00, N = 3)
OpenVINO 2023.2.dev - Model: Person Detection FP16 - Device: CPU - ms, Fewer Is Better: 1042.19 (SE +/- 1.70, N = 3; MIN: 990.71 / MAX: 1084.32)
OpenVINO 2023.2.dev - Model: Person Detection FP32 - Device: CPU - FPS, More Is Better: 1.91 (SE +/- 0.00, N = 3)
OpenVINO 2023.2.dev - Model: Person Detection FP32 - Device: CPU - ms, Fewer Is Better: 1044.45 (SE +/- 1.28, N = 3; MIN: 970.14 / MAX: 1115.69)
OpenVINO 2023.2.dev - Model: Vehicle Detection FP16 - Device: CPU - FPS, More Is Better: 11.85 (SE +/- 0.07, N = 3)
OpenVINO 2023.2.dev - Model: Vehicle Detection FP16 - Device: CPU - ms, Fewer Is Better: 168.71 (SE +/- 0.99, N = 3; MIN: 119.34 / MAX: 195.23)
OpenVINO 2023.2.dev - Model: Face Detection FP16-INT8 - Device: CPU - FPS, More Is Better: 0.3 (SE +/- 0.00, N = 3)
OpenVINO 2023.2.dev - Model: Face Detection FP16-INT8 - Device: CPU - ms, Fewer Is Better: 6728.06 (SE +/- 12.52, N = 3; MIN: 6553.59 / MAX: 7002.51)
OpenVINO 2023.2.dev - Model: Face Detection Retail FP16 - Device: CPU - FPS, More Is Better: 46.80 (SE +/- 0.10, N = 3)
OpenVINO 2023.2.dev - Model: Face Detection Retail FP16 - Device: CPU - ms, Fewer Is Better: 42.70 (SE +/- 0.09, N = 3; MIN: 20.92 / MAX: 63.97)
OpenVINO 2023.2.dev - Model: Road Segmentation ADAS FP16 - Device: CPU - FPS, More Is Better: 4.69 (SE +/- 0.01, N = 3)
OpenVINO 2023.2.dev - Model: Road Segmentation ADAS FP16 - Device: CPU - ms, Fewer Is Better: 426.61 (SE +/- 1.31, N = 3; MIN: 366.94 / MAX: 466.48)
OpenVINO 2023.2.dev - Model: Vehicle Detection FP16-INT8 - Device: CPU - FPS, More Is Better: 20.20 (SE +/- 0.16, N = 3)
OpenVINO 2023.2.dev - Model: Vehicle Detection FP16-INT8 - Device: CPU - ms, Fewer Is Better: 98.95 (SE +/- 0.80, N = 3; MIN: 63.4 / MAX: 121.43)
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16 - Device: CPU - FPS, More Is Better: 21.55 (SE +/- 0.06, N = 3)
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16 - Device: CPU - ms, Fewer Is Better: 92.76 (SE +/- 0.25, N = 3; MIN: 67.15 / MAX: 134.83)
OpenVINO 2023.2.dev - Model: Face Detection Retail FP16-INT8 - Device: CPU - FPS, More Is Better: 65.08 (SE +/- 0.53, N = 15)
OpenVINO 2023.2.dev - Model: Face Detection Retail FP16-INT8 - Device: CPU - ms, Fewer Is Better: 30.73 (SE +/- 0.25, N = 15; MIN: 17.23 / MAX: 53.2)
OpenVINO 2023.2.dev - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU - FPS, More Is Better: 9.86 (SE +/- 0.05, N = 3)
OpenVINO 2023.2.dev - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU - ms, Fewer Is Better: 202.73 (SE +/- 1.03, N = 3; MIN: 172.53 / MAX: 241.67)
OpenVINO 2023.2.dev - Model: Machine Translation EN To DE FP16 - Device: CPU - FPS, More Is Better: 2.79 (SE +/- 0.00, N = 3)
OpenVINO 2023.2.dev - Model: Machine Translation EN To DE FP16 - Device: CPU - ms, Fewer Is Better: 716.36 (SE +/- 1.68, N = 3; MIN: 663.2 / MAX: 742.36)
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU - FPS, More Is Better: 29.49 (SE +/- 0.13, N = 3)
OpenVINO 2023.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU - ms, Fewer Is Better: 67.76 (SE +/- 0.29, N = 3; MIN: 39.77 / MAX: 88.07)
OpenVINO 2023.2.dev - Model: Person Vehicle Bike Detection FP16 - Device: CPU - FPS, More Is Better: 23.78 (SE +/- 0.32, N = 3)
OpenVINO 2023.2.dev - Model: Person Vehicle Bike Detection FP16 - Device: CPU - ms, Fewer Is Better: 84.09 (SE +/- 1.13, N = 3; MIN: 43.23 / MAX: 147.08)
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16 - Device: CPU - FPS, More Is Better: 3.40 (SE +/- 0.08, N = 15)
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16 - Device: CPU - ms, Fewer Is Better: 593.05 (SE +/- 12.86, N = 15; MIN: 368.67 / MAX: 676.18)
OpenVINO 2023.2.dev - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU - FPS, More Is Better: 438.38 (SE +/- 1.43, N = 3)
OpenVINO 2023.2.dev - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU - ms, Fewer Is Better: 4.54 (SE +/- 0.01, N = 3; MIN: 2.6 / MAX: 35.33)
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16-INT8 - Device: CPU - FPS, More Is Better: 3.41 (SE +/- 0.04, N = 4)
OpenVINO 2023.2.dev - Model: Handwritten English Recognition FP16-INT8 - Device: CPU - ms, Fewer Is Better: 587.13 (SE +/- 7.12, N = 4; MIN: 363.64 / MAX: 654.95)
OpenVINO 2023.2.dev - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU - FPS, More Is Better: 681.70 (SE +/- 2.10, N = 3)
OpenVINO 2023.2.dev - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU - ms, Fewer Is Better: 2.90 (SE +/- 0.01, N = 3; MIN: 1.7 / MAX: 21.02)
OpenVINO 2023.2.dev - (CXX) g++ options for the results listed here: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl
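Unlike the ONNX Runtime figures, the OpenVINO FPS and latency values above are not simple reciprocals: 1000 / 42.70 ms is about 23.4, yet the reported rate for Face Detection Retail FP16 is 46.80 FPS. One way to read this is through Little's law (items in flight = throughput × latency). The sketch below applies it to a few reported pairs; a value near 2 would be consistent with roughly two inference requests being processed concurrently. That reading is an inference from the numbers, not something this export states, and the function name is illustrative.

```python
# Values copied from the OpenVINO results above:
# model -> (FPS, average latency in ms)
REPORTED = {
    "Face Detection FP16": (0.23, 8841.25),
    "Vehicle Detection FP16": (11.85, 168.71),
    "Face Detection Retail FP16": (46.80, 42.70),
    "Age Gender Recognition Retail 0013 FP16": (438.38, 4.54),
}


def implied_in_flight(fps: float, latency_ms: float) -> float:
    """Little's law: average requests in flight = throughput * latency."""
    return fps * (latency_ms / 1000.0)


if __name__ == "__main__":
    for model, (fps, ms) in REPORTED.items():
        print(f"{model}: about {implied_in_flight(fps, ms):.2f} requests in flight")
```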
PlaidML - FP16: No - Mode: Inference - Network: VGG16 - Device: CPU - FPS, More Is Better: 1.33 (SE +/- 0.01, N = 3)
PlaidML - FP16: No - Mode: Inference - Network: ResNet 50 - Device: CPU - FPS, More Is Better: 1.60 (SE +/- 0.01, N = 3)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-50 - batches/sec, More Is Better: 3.85 (SE +/- 0.02, N = 3; MIN: 2.63 / MAX: 4.09)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: ResNet-152 - batches/sec, More Is Better: 1.62 (SE +/- 0.01, N = 3; MIN: 1.04 / MAX: 1.7)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-50 - batches/sec, More Is Better: 2.24 (SE +/- 0.01, N = 3; MIN: 1.48 / MAX: 2.34)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-50 - batches/sec, More Is Better: 2.21 (SE +/- 0.02, N = 8; MIN: 1.46 / MAX: 2.33)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-50 - batches/sec, More Is Better: 2.21 (SE +/- 0.02, N = 3; MIN: 1.41 / MAX: 2.32)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: ResNet-152 - batches/sec, More Is Better: 0.91 (SE +/- 0.01, N = 3; MIN: 0.58 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 - batches/sec, More Is Better: 2.17 (SE +/- 0.02, N = 5; MIN: 1.45 / MAX: 2.3)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: ResNet-152 - batches/sec, More Is Better: 0.93 (SE +/- 0.01, N = 3; MIN: 0.58 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 - batches/sec, More Is Better: 2.23 (SE +/- 0.02, N = 3; MIN: 1.47 / MAX: 2.34)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: ResNet-152 - batches/sec, More Is Better: 0.92 (SE +/- 0.00, N = 3; MIN: 0.59 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152 - batches/sec, More Is Better: 0.92 (SE +/- 0.00, N = 3; MIN: 0.58 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152 - batches/sec, More Is Better: 0.93 (SE +/- 0.00, N = 3; MIN: 0.56 / MAX: 0.96)
PyTorch 2.1 - Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 1.10 (SE +/- 0.01, N = 3; MIN: 0.79 / MAX: 1.16)
PyTorch 2.1 - Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 0.59 (SE +/- 0.00, N = 3; MIN: 0.45 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 0.59 (SE +/- 0.01, N = 5; MIN: 0.45 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 64 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 0.60 (SE +/- 0.01, N = 3; MIN: 0.44 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 256 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 0.60 (SE +/- 0.00, N = 3; MIN: 0.45 / MAX: 0.66)
PyTorch 2.1 - Device: CPU - Batch Size: 512 - Model: Efficientnet_v2_l - batches/sec, More Is Better: 0.60 (SE +/- 0.01, N = 3; MIN: 0.45 / MAX: 0.66)
R Benchmark - Seconds, Fewer Is Better: 0.6143 (SE +/- 0.0070, N = 3) - R scripting front-end version 3.6.3 (2020-02-29)
RNNoise 2020-06-28 - Seconds, Fewer Is Better: 36.44 (SE +/- 0.07, N = 3) - (CC) gcc options: -O2 -pedantic -fvisibility=hidden
Scikit-Learn 1.2.2 - Benchmark: GLM [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1877.54 (SE +/- 3.03, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: SAGA [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1919.81 (SE +/- 9.65, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Tree [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 103.79 (SE +/- 0.84, N = 15) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Lasso [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1312.06 (SE +/- 2.47, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Sparsify [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 226.79 (SE +/- 0.05, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Ward [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 202.93 (SE +/- 0.33, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: MNIST Dataset [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 177.39 (SE +/- 0.10, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Neighbors [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 599.88 (SE +/- 1.93, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: SGD Regression [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 358.67 (SE +/- 0.51, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Lasso Path [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 835.04 (SE +/- 1.49, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Text Vectorizers [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 171.50 (SE +/- 0.44, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Hierarchical [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 722.27 (SE +/- 0.36, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot OMP vs. LARS [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 355.35 (SE +/- 4.35, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Feature Expansions [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 491.63 (SE +/- 0.21, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: LocalOutlierFactor [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 559.54 (SE +/- 2.37, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: TSNE MNIST Dataset [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1425.79 (SE +/- 3.72, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Incremental PCA [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 110.29 (SE +/- 0.25, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 414.92 (SE +/- 0.77, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Sample Without Replacement [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 338.63 (SE +/- 2.47, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Covertype Dataset Benchmark [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 965.03 (SE +/- 1.44, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Adult [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 202.00 (SE +/- 2.73, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Threading [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 838.41 (SE +/- 1.98, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Singular Value Decomposition [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 692.64 (SE +/- 2.73, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Higgs Boson [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 309.51 (SE +/- 2.43, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: 20 Newsgroups / Logistic Regression [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 126.69 (SE +/- 0.05, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Plot Polynomial Kernel Approximation [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 761.96 (SE +/- 0.98, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Hist Gradient Boosting Categorical Only [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 61.35 (SE +/- 0.19, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Samples [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 720.17 (SE +/- 1.47, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Kernel PCA Solvers / Time vs. N Components [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 329.19 (SE +/- 0.55, N = 3) - (F9X) gfortran options: -O0
Scikit-Learn 1.2.2 - Benchmark: Sparse Random Projections / 100 Iterations [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 8668.53 (SE +/- 67.77, N = 3) - (F9X) gfortran options: -O0
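The SE figures in the Scikit-Learn results are easier to compare across tests when expressed relative to each mean. The following is a minimal sketch of that arithmetic for three of the results listed here; the dictionary layout and the helper name `relative_se_percent` are illustrative assumptions, not part of the Phoronix Test Suite output.

```python
# Relative standard error (SE as a percentage of the reported mean).
# Means and SE values are copied from the Scikit-Learn result lines;
# the data structure and helper name are illustrative only.
results = {
    "Tree": (103.79, 0.84),
    "SAGA": (1919.81, 9.65),
    "Sparse Random Projections / 100 Iterations": (8668.53, 67.77),
}


def relative_se_percent(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


rel_se = {name: relative_se_percent(mean, se) for name, (mean, se) in results.items()}

for name, pct in rel_se.items():
    print(f"{name}: SE is {pct:.2f}% of the mean")
```

All three land below 1% of the mean, which suggests the run-to-run variation in these tests was small compared with the measured times.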
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 0.61 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: VGG-16 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 0.53 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: VGG-16 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 0.45 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 7.02 (SE +/- 0.03, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: AlexNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 9.07 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: AlexNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 10.54 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: AlexNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 12.02 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 512 - Model: AlexNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 12.23 (SE +/- 0.02, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: GoogLeNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 4.48 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: ResNet-50 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1.47 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: GoogLeNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 4.53 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 32 - Model: ResNet-50 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1.50 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: GoogLeNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 4.58 (SE +/- 0.00, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 64 - Model: ResNet-50 [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 1.02 (SE +/- 0.01, N = 3)
TensorFlow 2.12 - Device: CPU - Batch Size: 256 - Model: GoogLeNet [images/sec, More Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 3.02 (SE +/- 0.03, N = 3)
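The TensorFlow results respond differently to batch size: AlexNet throughput rises as the batch grows, while VGG-16 throughput falls. A small sketch of that comparison follows, normalising each model's images/sec to its batch-size-16 result. The throughput numbers are copied from the results listed here; the helper name `scaling_vs_smallest_batch` is a hypothetical choice for illustration.

```python
# Throughput relative to the smallest tested batch size, per model.
# images/sec values are copied from the TensorFlow 2.12 result lines.
alexnet = {16: 7.02, 32: 9.07, 64: 10.54, 256: 12.02, 512: 12.23}
vgg16 = {16: 0.61, 32: 0.53, 64: 0.45}


def scaling_vs_smallest_batch(rates):
    """Divide every rate by the rate at the smallest batch size."""
    baseline = rates[min(rates)]
    return {batch: rate / baseline for batch, rate in rates.items()}


alexnet_scaling = scaling_vs_smallest_batch(alexnet)
vgg16_scaling = scaling_vs_smallest_batch(vgg16)

for batch, factor in alexnet_scaling.items():
    print(f"AlexNet batch {batch}: {factor:.2f}x of batch 16")
for batch, factor in vgg16_scaling.items():
    print(f"VGG-16 batch {batch}: {factor:.2f}x of batch 16")
```

The data alone does not say why VGG-16 slows down. One plausible but unverified explanation is memory pressure on a system reporting 3584MB of RAM.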
TensorFlow Lite 2022-05-18 - Model: SqueezeNet [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 34007.2 (SE +/- 45.90, N = 3)
TensorFlow Lite 2022-05-18 - Model: Inception V4 [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 547885 (SE +/- 5686.85, N = 3)
TensorFlow Lite 2022-05-18 - Model: NASNet Mobile [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 66075.7 (SE +/- 427.94, N = 3)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Float [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 26921.0 (SE +/- 228.19, N = 15)
TensorFlow Lite 2022-05-18 - Model: Mobilenet Quant [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 28341.7 (SE +/- 20.03, N = 3)
TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2 [Microseconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 419892 (SE +/- 439.95, N = 3)
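The TensorFlow Lite figures are reported in microseconds, which can be awkward to compare with the millisecond TNN results. As a rough sketch, the conversion below turns each value into milliseconds and an equivalent rate per second. It assumes each reported number is an average time per inference, which the result lines themselves do not state explicitly; the helper names are illustrative.

```python
# Convert TensorFlow Lite results from microseconds to milliseconds and
# to an equivalent inferences-per-second rate.
# ASSUMPTION: each value is an average time per single inference.
tflite_us = {
    "SqueezeNet": 34007.2,
    "Inception V4": 547885.0,
    "NASNet Mobile": 66075.7,
    "Mobilenet Float": 26921.0,
    "Mobilenet Quant": 28341.7,
    "Inception ResNet V2": 419892.0,
}


def us_to_ms(microseconds):
    """Microseconds to milliseconds."""
    return microseconds / 1000.0


def us_to_rate(microseconds):
    """Equivalent number of inferences per second for a per-inference time."""
    return 1_000_000.0 / microseconds


converted = {
    model: (us_to_ms(us), us_to_rate(us)) for model, us in tflite_us.items()
}

for model, (ms, rate) in converted.items():
    print(f"{model}: {ms:.1f} ms, about {rate:.2f} inferences/sec")
```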
TNN 0.3 - Target: CPU - Model: DenseNet [ms, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 8874.35 (SE +/- 65.94, N = 3; MIN: 8585.49 / MAX: 9341.04) - (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
TNN 0.3 - Target: CPU - Model: MobileNet v2 [ms, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 656.29 (SE +/- 1.60, N = 3; MIN: 647.62 / MAX: 668.3) - (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
TNN 0.3 - Target: CPU - Model: SqueezeNet v2 [ms, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 135.31 (SE +/- 0.52, N = 3; MIN: 132.96 / MAX: 140.41) - (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
TNN 0.3 - Target: CPU - Model: SqueezeNet v1.1 [ms, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 557.95 (SE +/- 2.06, N = 3; MIN: 553.57 / MAX: 564.29) - (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl
Whisper.cpp 1.4 - Model: ggml-base.en - Input: 2016 State of the Union [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 2498.67 (SE +/- 3.14, N = 3) - (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
Whisper.cpp 1.4 - Model: ggml-small.en - Input: 2016 State of the Union [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 8252.81 (SE +/- 4.12, N = 3) - (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
Whisper.cpp 1.4 - Model: ggml-medium.en - Input: 2016 State of the Union [Seconds, Fewer Is Better] llvmpipe - AMD A8-9600 RADEON R7 10 COMPUTE CORES 4C: 29217.40 (SE +/- 3.98, N = 3) - (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
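The Whisper.cpp run times are large enough that raw seconds are hard to read. The sketch below converts the three reported means to hours, minutes and seconds, and expresses each model's time as a multiple of the ggml-base.en time. The values are copied from the results listed here; `to_hms` is a hypothetical helper that truncates fractional seconds.

```python
# Whisper.cpp mean run times in seconds, copied from the result lines.
whisper_seconds = {
    "ggml-base.en": 2498.67,
    "ggml-small.en": 8252.81,
    "ggml-medium.en": 29217.40,
}


def to_hms(seconds):
    """Format a duration as H:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


base = whisper_seconds["ggml-base.en"]
relative_cost = {model: secs / base for model, secs in whisper_seconds.items()}

for model, secs in whisper_seconds.items():
    print(f"{model}: {to_hms(secs)} ({relative_cost[model]:.1f}x ggml-base.en)")
```

On this system the medium model takes a little over eight hours for the one input, roughly 11.7 times as long as the base model.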
Phoronix Test Suite v10.8.5