hpc-xeon: 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2105039-IB-HPCXEON8261&sor .
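Because this result file is published on OpenBenchmarking.org, the same test selection can usually be rerun locally for side-by-side comparison by passing the result identifier to the Phoronix Test Suite, e.g. phoronix-test-suite benchmark 2105039-IB-HPCXEON8261.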
hpc-xeon system details (configuration shared by runs 1, 1a, 2, 2a, 4):
  Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
  Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
  Chipset: Intel Device 0998
  Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN
  Disk: 2 x 7682GB INTEL SSDPF2KX076TZ + 2 x 800GB INTEL SSDPF21Q800GB + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
  Graphics: ASPEED
  Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
  OS: Ubuntu 20.04
  Kernel: 5.11.0-051100-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1024x768
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details - 1, 2, 2a, 4: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Details - Scaling Governor: intel_pstate powersave - CPU Microcode: 0xd000270
Python Details - Python 2.7.18 + Python 3.8.5
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
hpc-xeon results overview (one column of values per run, in the test order listed first):
Tests: ior: 2MB - Default Test Directory, ior: 4MB - Default Test Directory, hpcg, parboil: OpenMP LBM, parboil: OpenMP CUTCP, parboil: OpenMP Stencil, parboil: OpenMP MRI Gridding, minife: Small, neat, mocassin: Dust 2D tau100.0, arrayfire: BLAS CPU, arrayfire: Conjugate Gradient CPU, deepspeech: CPU, daphne: OpenMP - NDT Mapping, daphne: OpenMP - Points2Image, daphne: OpenMP - Euclidean Cluster, octave-benchmark, ncnn: CPU - mobilenet, ncnn: CPU-v2-v2 - mobilenet-v2, ncnn: CPU-v3-v3 - mobilenet-v3, ncnn: CPU - shufflenet-v2, ncnn: CPU - mnasnet, ncnn: CPU - efficientnet-b0, ncnn: CPU - blazeface, ncnn: CPU - googlenet, ncnn: CPU - vgg16, ncnn: CPU - resnet18, ncnn: CPU - alexnet, ncnn: CPU - resnet50, ncnn: CPU - yolov4-tiny, ncnn: CPU - squeezenet_ssd, ncnn: CPU - regnety_400m, openvino: Face Detection 0106 FP16 - CPU, openvino: Face Detection 0106 FP16 - CPU, openvino: Face Detection 0106 FP32 - CPU, openvino: Face Detection 0106 FP32 - CPU, openvino: Person Detection 0106 FP16 - CPU, openvino: Person Detection 0106 FP16 - CPU, openvino: Person Detection 0106 FP32 - CPU, openvino: Person Detection 0106 FP32 - CPU, openvino: Age Gender Recognition Retail 0013 FP16 - CPU, openvino: Age Gender Recognition Retail 0013 FP16 - CPU, openvino: Age Gender Recognition Retail 0013 FP32 - CPU, openvino: Age Gender Recognition Retail 0013 FP32 - CPU, onnx: yolov4 - OpenMP CPU, onnx: bertsquad-10 - OpenMP CPU, onnx: fcn-resnet101-11 - OpenMP CPU, onnx: shufflenet-v2-10 - OpenMP CPU, onnx: super-resolution-10 - OpenMP CPU, ecp-candle: P1B2, ecp-candle: P3B1, ecp-candle: P3B2, ai-benchmark: Device Inference Score, ai-benchmark: Device Training Score, ai-benchmark: Device AI Score, mlpack: scikit_ica, mlpack: scikit_qda, mlpack: scikit_svm, mlpack: scikit_linearridgeregression
Run 1 (ior results only): 356.14 343.81
Run 1a (hpcg onward): 24.2662 13.412544 1.409518 1.653598 415.582367 16991.1 22.921 193 4730.45 4.101 240.08165 433.03 6515.63 442.63 13.134 41.39 18.22 14.42 10.34 16.08 19.28 5.96 30.60 50.93 18.18 8.65 47.38 41.38 32.96 90.90 18.76 2100.58 18.46 2131.30 10.89 3567.46 10.97 3555.17 43169.99 0.79 42475.84 0.8 305 441 114 7399 6153 47.092 1282.471 3201.121 1139 572 1711 65.57 30.66 31.47 3.50
Run 2 (through openvino: Person Detection 0106 FP16 - CPU): 358.60 362.28 24.2519 13.128737 1.430858 1.644848 458.382121 17878.6 22.365 194 4673.13 4.696 179.71648 421.38 6616.65 441.37 12.931 43.17 18.17 14.33 10.82 15.71 20.15 6.11 30.38 51.92 18.02 8.62 48.24 42.91 34.73 91.14 18.79 2097.44 18.44 2138.44 11.02 3524.64
Run 2a: 363.19 357.00 24.2219 13.391599 1.438047 1.671065 443.167887 17815.5 22.909 193 4678.03 4.759 182.43483 405.38 6393.69 432.65 12.822 39.28 17.11 13.93 10.01 15.18 18.22 5.79 28.40 50.10 16.78 8.27 41.65 40.14 30.90 89.61 18.78 2102.66 18.45 2136.83 11.02 3538.09 10.91 3563.66 30539.47 0.96 42412.47 0.8 301 443 113 7328 6771 44.978 1292.22 3185.066 1146 582 1728 69.04 33.88 31.13 3.50
Run 4: 359.89 365.79 24.2219 12.952265 1.407187 1.664530 408.710375 17495.6 23.154 194 4684.87 4.370 186.67090 440.88 6495.99 430.14 13.258 41.05 18.00 14.68 10.27 15.74 19.14 5.77 29.62 52.95 17.42 8.89 46.98 40.37 33.16 90.84 18.83 2100.78 18.40 2138.20 11.00 3533.40 10.84 3585.49 30845.88 0.96 42558.09 0.8 300 434 113 7393 6810 45.523 1281.034 3188.631 1135 577 1712 66.04 32.57 32.12 3.41
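Each per-test listing below reports, for every run, an average over N trials together with a standard error (SE) and, for some tests, the minimum and maximum trial values. The short Python sketch below illustrates how such a summary can be computed from raw trial values, assuming SE here means the standard error of the mean (sample standard deviation divided by the square root of N); the summarize helper and the trial numbers are illustrative only and are not taken from this result file or from the Phoronix Test Suite source.

    import math

    def summarize(trials):
        # Average, standard error of the mean (SE), trial count N, and MIN/MAX,
        # matching the fields shown in the per-test listings below.
        n = len(trials)
        avg = sum(trials) / n
        # Sample standard deviation (Bessel's correction), then SE = sd / sqrt(n).
        sd = math.sqrt(sum((x - avg) ** 2 for x in trials) / (n - 1))
        se = sd / math.sqrt(n)
        return {"avg": round(avg, 2), "se": round(se, 2), "n": n,
                "min": min(trials), "max": max(trials)}

    # Illustrative trial values (MB/s), not taken from this result file.
    print(summarize([356.1, 362.4, 349.9]))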
IOR 3.3.0 - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, more is better)
  2a: 363.19 (SE +/- 4.82, N = 3; MIN: 314.55 / MAX: 619.81)
  4: 359.89 (SE +/- 2.90, N = 3; MIN: 313.46 / MAX: 610.7)
  2: 358.60 (SE +/- 1.88, N = 3; MIN: 316.53 / MAX: 627.12)
  1: 356.14 (SE +/- 3.31, N = 15; MIN: 304.39 / MAX: 641.59)
  1. (CC) gcc options: -O2 -lm -pthread -lmpi
IOR 3.3.0 - Block Size: 4MB - Disk Target: Default Test Directory (MB/s, more is better)
  4: 365.79 (SE +/- 2.06, N = 3; MIN: 305.34 / MAX: 640.45)
  2: 362.28 (SE +/- 2.35, N = 14; MIN: 299.19 / MAX: 670.93)
  2a: 357.00 (SE +/- 1.97, N = 3; MIN: 303.98 / MAX: 645.01)
  1: 343.81 (SE +/- 2.36, N = 3; MIN: 284.82 / MAX: 637.37)
  1. (CC) gcc options: -O2 -lm -pthread -lmpi
High Performance Conjugate Gradient 3.1 (GFLOP/s, more is better)
  1a: 24.27 (SE +/- 0.05, N = 3)
  2: 24.25 (SE +/- 0.10, N = 3)
  4: 24.22 (SE +/- 0.06, N = 3)
  2a: 24.22 (SE +/- 0.07, N = 3)
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi
Parboil 2.5 - Test: OpenMP LBM (Seconds, fewer is better)
  4: 12.95 (SE +/- 0.12, N = 15)
  2: 13.13 (SE +/- 0.13, N = 12)
  2a: 13.39 (SE +/- 0.04, N = 3)
  1a: 13.41 (SE +/- 0.11, N = 3)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP CUTCP (Seconds, fewer is better)
  4: 1.407187 (SE +/- 0.016684, N = 3)
  1a: 1.409518 (SE +/- 0.015811, N = 4)
  2: 1.430858 (SE +/- 0.015699, N = 5)
  2a: 1.438047 (SE +/- 0.011700, N = 9)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP Stencil (Seconds, fewer is better)
  2: 1.644848 (SE +/- 0.010033, N = 3)
  1a: 1.653598 (SE +/- 0.011350, N = 3)
  4: 1.664530 (SE +/- 0.001988, N = 3)
  2a: 1.671065 (SE +/- 0.006902, N = 3)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, fewer is better)
  4: 408.71 (SE +/- 8.08, N = 9)
  1a: 415.58 (SE +/- 20.24, N = 6)
  2a: 443.17 (SE +/- 16.35, N = 9)
  2: 458.38 (SE +/- 17.63, N = 6)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
miniFE 2.2 - Problem Size: Small (CG Mflops, more is better)
  2: 17878.6 (SE +/- 343.64, N = 15)
  2a: 17815.5 (SE +/- 313.32, N = 15)
  4: 17495.6 (SE +/- 298.66, N = 15)
  1a: 16991.1 (SE +/- 154.34, N = 15)
  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
Nebular Empirical Analysis Tool 2020-02-29 (Seconds, fewer is better)
  2: 22.37 (SE +/- 0.12, N = 3)
  2a: 22.91 (SE +/- 0.25, N = 15)
  1a: 22.92 (SE +/- 0.21, N = 15)
  4: 23.15 (SE +/- 0.30, N = 15)
  1. (F9X) gfortran options: -cpp -ffree-line-length-0 -Jsource/ -fopenmp -O3 -fno-backtrace
Monte Carlo Simulations of Ionised Nebulae 2019-03-24 - Input: Dust 2D tau100.0 (Seconds, fewer is better)
  1a: 193, 2a: 193, 2: 194, 4: 194 (one run reported SE +/- 0.58, N = 3)
  1. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O3 -O2 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, more is better)
  1a: 4730.45 (SE +/- 13.93, N = 3)
  4: 4684.87 (SE +/- 19.25, N = 3)
  2a: 4678.03 (SE +/- 13.38, N = 3)
  2: 4673.13 (SE +/- 9.91, N = 3)
  1. (CXX) g++ options: -rdynamic
ArrayFire 3.7 - Test: Conjugate Gradient CPU (ms, fewer is better)
  1a: 4.101 (SE +/- 0.035, N = 3)
  4: 4.370 (SE +/- 0.148, N = 12)
  2: 4.696 (SE +/- 0.122, N = 15)
  2a: 4.759 (SE +/- 0.132, N = 15)
  1. (CXX) g++ options: -rdynamic
DeepSpeech 0.6 - Acceleration: CPU (Seconds, fewer is better)
  2: 179.72 (SE +/- 0.46, N = 3)
  2a: 182.43 (SE +/- 1.38, N = 12)
  4: 186.67 (SE +/- 4.30, N = 12)
  1a: 240.08 (SE +/- 9.43, N = 12)
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, more is better)
  4: 440.88 (SE +/- 4.85, N = 4)
  1a: 433.03 (SE +/- 5.41, N = 3)
  2: 421.38 (SE +/- 5.68, N = 15)
  2a: 405.38 (SE +/- 7.34, N = 15)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, more is better)
  2: 6616.65 (SE +/- 71.05, N = 12)
  1a: 6515.63 (SE +/- 66.31, N = 12)
  4: 6495.99 (SE +/- 71.06, N = 12)
  2a: 6393.69 (SE +/- 76.80, N = 4)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, more is better)
  1a: 442.63 (SE +/- 8.55, N = 12)
  2: 441.37 (SE +/- 8.80, N = 15)
  2a: 432.65 (SE +/- 11.68, N = 12)
  4: 430.14 (SE +/- 3.31, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
GNU Octave Benchmark 5.2.0 (Seconds, fewer is better)
  2a: 12.82 (SE +/- 0.14, N = 20)
  2: 12.93 (SE +/- 0.14, N = 25)
  1a: 13.13 (SE +/- 0.10, N = 25)
  4: 13.26 (SE +/- 0.11, N = 25)
NCNN 20201218 - Target: CPU - Model: mobilenet (ms, fewer is better)
  2a: 39.28 (SE +/- 0.04, N = 3; MIN: 38.42 / MAX: 68.14)
  4: 41.05 (SE +/- 0.95, N = 13; MIN: 37.92 / MAX: 117.92)
  1a: 41.39 (SE +/- 1.21, N = 12; MIN: 37.92 / MAX: 77.52)
  2: 43.17 (SE +/- 1.67, N = 9; MIN: 37.53 / MAX: 168.48)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, fewer is better)
  2a: 17.11 (SE +/- 0.07, N = 3; MIN: 16.72 / MAX: 38.78)
  4: 18.00 (SE +/- 0.44, N = 13; MIN: 16.69 / MAX: 240.12)
  2: 18.17 (SE +/- 0.60, N = 9; MIN: 16.66 / MAX: 255.58)
  1a: 18.22 (SE +/- 0.56, N = 12; MIN: 16.61 / MAX: 93.24)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, fewer is better)
  2a: 13.93 (SE +/- 0.06, N = 3; MIN: 13.42 / MAX: 42.9)
  2: 14.33 (SE +/- 0.38, N = 9; MIN: 13.43 / MAX: 22.53)
  1a: 14.42 (SE +/- 0.39, N = 12; MIN: 13.27 / MAX: 61.57)
  4: 14.68 (SE +/- 0.43, N = 13; MIN: 13.4 / MAX: 403.86)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, fewer is better)
  2a: 10.01 (SE +/- 0.03, N = 3; MIN: 9.84 / MAX: 11.17)
  4: 10.27 (SE +/- 0.15, N = 13; MIN: 9.74 / MAX: 37.86)
  1a: 10.34 (SE +/- 0.22, N = 12; MIN: 9.78 / MAX: 30.48)
  2: 10.82 (SE +/- 0.39, N = 9; MIN: 9.82 / MAX: 284.11)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: mnasnet (ms, fewer is better)
  2a: 15.18 (SE +/- 0.06, N = 3; MIN: 14.91 / MAX: 42.82)
  2: 15.71 (SE +/- 0.16, N = 9; MIN: 14.8 / MAX: 118.47)
  4: 15.74 (SE +/- 0.27, N = 13; MIN: 14.84 / MAX: 49.53)
  1a: 16.08 (SE +/- 0.43, N = 12; MIN: 14.77 / MAX: 63.14)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, fewer is better)
  2a: 18.22 (SE +/- 0.03, N = 3; MIN: 17.86 / MAX: 36.46)
  4: 19.14 (SE +/- 0.38, N = 13; MIN: 17.72 / MAX: 51.28)
  1a: 19.28 (SE +/- 0.53, N = 12; MIN: 17.82 / MAX: 54.47)
  2: 20.15 (SE +/- 0.83, N = 9; MIN: 17.74 / MAX: 131.21)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: blazeface (ms, fewer is better)
  4: 5.77 (SE +/- 0.11, N = 13; MIN: 5.48 / MAX: 33.8)
  2a: 5.79 (SE +/- 0.21, N = 3; MIN: 5.5 / MAX: 7.01)
  1a: 5.96 (SE +/- 0.16, N = 12; MIN: 5.47 / MAX: 8.13)
  2: 6.11 (SE +/- 0.21, N = 9; MIN: 5.43 / MAX: 33.62)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: googlenet (ms, fewer is better)
  2a: 28.40 (SE +/- 1.36, N = 3; MIN: 26.44 / MAX: 59.77)
  4: 29.62 (SE +/- 1.00, N = 13; MIN: 26.59 / MAX: 97.53)
  2: 30.38 (SE +/- 1.67, N = 9; MIN: 26.25 / MAX: 69.1)
  1a: 30.60 (SE +/- 1.15, N = 12; MIN: 25.89 / MAX: 363.03)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: vgg16 (ms, fewer is better)
  2a: 50.10 (SE +/- 0.86, N = 3; MIN: 46.01 / MAX: 368.47)
  1a: 50.93 (SE +/- 0.90, N = 12; MIN: 44.34 / MAX: 470.99)
  2: 51.92 (SE +/- 1.35, N = 9; MIN: 45.29 / MAX: 485.69)
  4: 52.95 (SE +/- 0.97, N = 13; MIN: 44.1 / MAX: 1134.63)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet18 (ms, fewer is better)
  2a: 16.78 (SE +/- 0.61, N = 3; MIN: 15.83 / MAX: 38.93)
  4: 17.42 (SE +/- 0.52, N = 13; MIN: 15.8 / MAX: 53.79)
  2: 18.02 (SE +/- 0.79, N = 9; MIN: 15.61 / MAX: 52.43)
  1a: 18.18 (SE +/- 0.60, N = 12; MIN: 15.63 / MAX: 50.57)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: alexnet (ms, fewer is better)
  2a: 8.27 (SE +/- 0.61, N = 3; MIN: 7.56 / MAX: 10.65)
  2: 8.62 (SE +/- 0.34, N = 9; MIN: 7.59 / MAX: 41.13)
  1a: 8.65 (SE +/- 0.24, N = 12; MIN: 7.58 / MAX: 29.2)
  4: 8.89 (SE +/- 0.27, N = 13; MIN: 7.58 / MAX: 47.92)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet50 (ms, fewer is better)
  2a: 41.65 (SE +/- 0.06, N = 3; MIN: 40.83 / MAX: 76.33)
  4: 46.98 (SE +/- 1.26, N = 13; MIN: 40.68 / MAX: 309.92)
  1a: 47.38 (SE +/- 0.97, N = 12; MIN: 40.86 / MAX: 304.03)
  2: 48.24 (SE +/- 1.43, N = 9; MIN: 40.1 / MAX: 658.14)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, fewer is better)
  2a: 40.14 (SE +/- 3.34, N = 3; MIN: 35.96 / MAX: 676.26)
  4: 40.37 (SE +/- 1.29, N = 13; MIN: 36.01 / MAX: 1384.64)
  1a: 41.38 (SE +/- 1.35, N = 12; MIN: 35.51 / MAX: 1384.91)
  2: 42.91 (SE +/- 1.43, N = 9; MIN: 35.84 / MAX: 953.59)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, fewer is better)
  2a: 30.90 (SE +/- 0.05, N = 3; MIN: 30.39 / MAX: 61.66)
  1a: 32.96 (SE +/- 0.91, N = 12; MIN: 30.31 / MAX: 200.45)
  4: 33.16 (SE +/- 0.94, N = 13; MIN: 30.14 / MAX: 141.88)
  2: 34.73 (SE +/- 1.38, N = 9; MIN: 29.98 / MAX: 260.21)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, fewer is better)
  2a: 89.61 (SE +/- 0.70, N = 3; MIN: 86.83 / MAX: 232.99)
  4: 90.84 (SE +/- 0.70, N = 13; MIN: 85.68 / MAX: 182.76)
  1a: 90.90 (SE +/- 0.56, N = 12; MIN: 87.44 / MAX: 198.82)
  2: 91.14 (SE +/- 0.96, N = 9; MIN: 86.2 / MAX: 160.68)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (FPS, more is better)
  4: 18.83 (SE +/- 0.03, N = 3)
  2: 18.79 (SE +/- 0.11, N = 3)
  2a: 18.78 (SE +/- 0.12, N = 3)
  1a: 18.76 (SE +/- 0.09, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (ms, fewer is better)
  2: 2097.44 (SE +/- 8.24, N = 3)
  1a: 2100.58 (SE +/- 4.10, N = 3)
  4: 2100.78 (SE +/- 1.04, N = 3)
  2a: 2102.66 (SE +/- 8.74, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (FPS, more is better)
  1a: 18.46 (SE +/- 0.10, N = 3)
  2a: 18.45 (SE +/- 0.10, N = 3)
  2: 18.44 (SE +/- 0.07, N = 3)
  4: 18.40 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (ms, fewer is better)
  1a: 2131.30 (SE +/- 5.79, N = 3)
  2a: 2136.83 (SE +/- 5.92, N = 3)
  4: 2138.20 (SE +/- 1.18, N = 3)
  2: 2138.44 (SE +/- 2.13, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (FPS, more is better)
  2a: 11.02 (SE +/- 0.04, N = 3)
  2: 11.02 (SE +/- 0.02, N = 3)
  4: 11.00 (SE +/- 0.03, N = 3)
  1a: 10.89 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (ms, fewer is better)
  2: 3524.64 (SE +/- 6.11, N = 3)
  4: 3533.40 (SE +/- 12.37, N = 3)
  2a: 3538.09 (SE +/- 12.83, N = 3)
  1a: 3567.46 (SE +/- 15.45, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (FPS, more is better)
  1a: 10.97 (SE +/- 0.05, N = 3)
  2a: 10.91 (SE +/- 0.03, N = 3)
  4: 10.84 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (ms, fewer is better)
  1a: 3555.17 (SE +/- 9.50, N = 3)
  2a: 3563.66 (SE +/- 1.74, N = 3)
  4: 3585.49 (SE +/- 5.28, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, more is better)
  1a: 43169.99 (SE +/- 109.04, N = 3)
  4: 30845.88 (SE +/- 320.36, N = 3)
  2a: 30539.47 (SE +/- 274.91, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, fewer is better)
  1a: 0.79 (SE +/- 0.00, N = 3)
  2a: 0.96 (SE +/- 0.01, N = 3)
  4: 0.96 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (FPS, more is better)
  4: 42558.09 (SE +/- 63.17, N = 3)
  1a: 42475.84 (SE +/- 114.87, N = 3)
  2a: 42412.47 (SE +/- 56.80, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (ms, fewer is better)
  1a: 0.8 (SE +/- 0.00, N = 3)
  2a: 0.8 (SE +/- 0.00, N = 3)
  4: 0.8 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, more is better)
  1a: 305 (SE +/- 1.44, N = 3)
  2a: 301 (SE +/- 3.79, N = 3)
  4: 300 (SE +/- 3.91, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, more is better)
  2a: 443 (SE +/- 5.39, N = 3)
  1a: 441 (SE +/- 3.33, N = 3)
  4: 434 (SE +/- 7.23, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, more is better)
  1a: 114 (SE +/- 0.29, N = 3)
  4: 113 (SE +/- 0.44, N = 3)
  2a: 113 (SE +/- 0.76, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, more is better)
  1a: 7399 (SE +/- 28.82, N = 3)
  4: 7393 (SE +/- 16.54, N = 3)
  2a: 7328 (SE +/- 86.98, N = 3)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, more is better)
  4: 6810 (SE +/- 22.93, N = 3)
  2a: 6771 (SE +/- 45.37, N = 3)
  1a: 6153 (SE +/- 234.15, N = 12)
  1. (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ECP-CANDLE 0.3 - Benchmark: P1B2 (Seconds, fewer is better)
  2a: 44.98
  4: 45.52
  1a: 47.09
ECP-CANDLE 0.3 - Benchmark: P3B1 (Seconds, fewer is better)
  4: 1281.03
  1a: 1282.47
  2a: 1292.22
ECP-CANDLE 0.3 - Benchmark: P3B2 (Seconds, fewer is better)
  2a: 3185.07
  4: 3188.63
  1a: 3201.12
AI Benchmark Alpha 0.1.2 - Device Inference Score (Score, more is better)
  2a: 1146
  1a: 1139
  4: 1135
AI Benchmark Alpha 0.1.2 - Device Training Score (Score, more is better)
  2a: 582
  4: 577
  1a: 572
AI Benchmark Alpha 0.1.2 - Device AI Score (Score, more is better)
  2a: 1728
  4: 1712
  1a: 1711
Mlpack Benchmark - Benchmark: scikit_ica (Seconds, fewer is better)
  1a: 65.57 (SE +/- 0.50, N = 15)
  4: 66.04 (SE +/- 0.88, N = 15)
  2a: 69.04 (SE +/- 0.81, N = 4)
Mlpack Benchmark - Benchmark: scikit_qda (Seconds, fewer is better)
  1a: 30.66 (SE +/- 0.02, N = 3)
  4: 32.57 (SE +/- 0.30, N = 3)
  2a: 33.88 (SE +/- 0.42, N = 3)
Mlpack Benchmark - Benchmark: scikit_svm (Seconds, fewer is better)
  2a: 31.13 (SE +/- 0.23, N = 15)
  1a: 31.47 (SE +/- 0.32, N = 5)
  4: 32.12 (SE +/- 0.22, N = 13)
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, fewer is better)
  4: 3.41 (SE +/- 0.05, N = 12)
  1a: 3.50 (SE +/- 0.04, N = 15)
  2a: 3.50 (SE +/- 0.07, N = 15)
Phoronix Test Suite v10.8.5