hpc-xeon: 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2105039-IB-HPCXEON8261&grs&sro

hpc-xeon system (runs 1, 1a, 2, 2a, 4):
Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Device 0998
Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN
Disk: 2 x 7682GB INTEL SSDPF2KX076TZ + 2 x 800GB INTEL SSDPF21Q800GB + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
Graphics: ASPEED
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 20.04
Kernel: 5.11.0-051100-generic (x86_64)
Desktop: GNOME Shell 3.36.4
Display Server: X Server 1.20.8
Compiler: GCC 9.3.0
File-System: ext4
Screen Resolution: 1024x768

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details - 1, 2, 2a, 4: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Details - Scaling Governor: intel_pstate powersave - CPU Microcode: 0xd000270
Python Details - Python 2.7.18 + Python 3.8.5
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

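The memory line supports a quick estimate of installed capacity and theoretical peak bandwidth. That figure is useful context for memory-bandwidth-bound tests such as HPCG. This is a back-of-envelope sketch, not a measurement. It assumes the 16 DIMMs occupy one slot on each of 16 DDR4 channels. Ice Lake-SP parts such as the Platinum 8380 provide 8 channels per socket, but the export does not list the slot layout.

```python
# Back-of-envelope memory figures for the hpc-xeon configuration.
# ASSUMPTION: one DIMM per channel on 16 channels (2 sockets x 8 channels);
# the export lists the DIMMs but not how they are populated.
DIMM_COUNT = 16
DIMM_GB = 32
TRANSFER_RATE_MT_S = 3200   # DDR4-3200
BYTES_PER_TRANSFER = 8      # 64-bit data bus per channel (ECC bits excluded)
CHANNELS = 16               # assumed: 2 sockets x 8 channels

total_gb = DIMM_COUNT * DIMM_GB
per_channel_gb_s = TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER / 1000
peak_gb_s = per_channel_gb_s * CHANNELS

print(f"Installed memory: {total_gb} GB")
print(f"Theoretical peak bandwidth: {peak_gb_s:.1f} GB/s "
      f"({per_channel_gb_s:.1f} GB/s per channel)")
```

Sustained bandwidth on a real system is typically well below this theoretical peak.
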
hpc-xeon result overview ("-" = no result for that run):
Test (units) | 1 | 1a | 2 | 2a | 4
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (FPS) | - | 43169.99 | - | 30539.47 | 30845.88
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (ms) | - | 0.79 | - | 0.96 | 0.96
mlpack: scikit_qda (Seconds) | - | 30.66 | - | 33.88 | 32.57
ior: 4MB - Default Test Directory (MB/s) | 343.81 | - | 362.28 | 357.00 | 365.79
mlpack: scikit_ica (Seconds) | - | 65.57 | - | 69.04 | 66.04
ecp-candle: P1B2 (Seconds) | - | 47.092 | - | 44.978 | 45.523
parboil: OpenMP LBM (Seconds) | - | 13.412544 | 13.128737 | 13.391599 | 12.952265
neat (Seconds) | - | 22.921 | 22.365 | 22.909 | 23.154
daphne: OpenMP - Points2Image (Test Cases Per Minute) | - | 6515.63 | 6616.65 | 6393.69 | 6495.99
octave-benchmark (Seconds) | - | 13.134 | 12.931 | 12.822 | 13.258
mlpack: scikit_svm (Seconds) | - | 31.47 | - | 31.13 | 32.12
parboil: OpenMP CUTCP (Seconds) | - | 1.409518 | 1.430858 | 1.438047 | 1.407187
onnx: bertsquad-10 - OpenMP CPU (Inferences Per Minute) | - | 441 | - | 443 | 434
ior: 2MB - Default Test Directory (MB/s) | 356.14 | - | 358.60 | 363.19 | 359.89
ai-benchmark: Device Training Score (Score) | - | 572 | - | 582 | 577
ncnn: CPU - regnety_400m (ms) | - | 90.90 | 91.14 | 89.61 | 90.84
onnx: yolov4 - OpenMP CPU (Inferences Per Minute) | - | 305 | - | 301 | 300
parboil: OpenMP Stencil (Seconds) | - | 1.653598 | 1.644848 | 1.671065 | 1.664530
arrayfire: BLAS CPU (GFLOPS) | - | 4730.45 | 4673.13 | 4678.03 | 4684.87
openvino: Person Detection 0106 FP16 - CPU (ms) | - | 3567.46 | 3524.64 | 3538.09 | 3533.40
openvino: Person Detection 0106 FP32 - CPU (FPS) | - | 10.97 | - | 10.91 | 10.84
openvino: Person Detection 0106 FP16 - CPU (FPS) | - | 10.89 | 11.02 | 11.02 | 11.00
ai-benchmark: Device AI Score (Score) | - | 1711 | - | 1728 | 1712
ai-benchmark: Device Inference Score (Score) | - | 1139 | - | 1146 | 1135
onnx: shufflenet-v2-10 - OpenMP CPU (Inferences Per Minute) | - | 7399 | - | 7328 | 7393
onnx: fcn-resnet101-11 - OpenMP CPU (Inferences Per Minute) | - | 114 | - | 113 | 113
ecp-candle: P3B1 (Seconds) | - | 1282.471 | - | 1292.22 | 1281.034
openvino: Person Detection 0106 FP32 - CPU (ms) | - | 3555.17 | - | 3563.66 | 3585.49
mocassin: Dust 2D tau100.0 (Seconds) | - | 193 | 194 | 193 | 194
ecp-candle: P3B2 (Seconds) | - | 3201.121 | - | 3185.066 | 3188.631
openvino: Face Detection 0106 FP16 - CPU (FPS) | - | 18.76 | 18.79 | 18.78 | 18.83
openvino: Age Gender Recognition Retail 0013 FP32 - CPU (FPS) | - | 42475.84 | - | 42412.47 | 42558.09
openvino: Face Detection 0106 FP32 - CPU (ms) | - | 2131.30 | 2138.44 | 2136.83 | 2138.20
openvino: Face Detection 0106 FP32 - CPU (FPS) | - | 18.46 | 18.44 | 18.45 | 18.40
openvino: Face Detection 0106 FP16 - CPU (ms) | - | 2100.58 | 2097.44 | 2102.66 | 2100.78
hpcg (GFLOP/s) | - | 24.2662 | 24.2519 | 24.2219 | 24.2219
openvino: Age Gender Recognition Retail 0013 FP32 - CPU (ms) | - | 0.8 | - | 0.8 | 0.8
mlpack: scikit_linearridgeregression (Seconds) | - | 3.50 | - | 3.50 | 3.41
onnx: super-resolution-10 - OpenMP CPU (Inferences Per Minute) | - | 6153 | - | 6771 | 6810
ncnn: CPU - squeezenet_ssd (ms) | - | 32.96 | 34.73 | 30.90 | 33.16
ncnn: CPU - yolov4-tiny (ms) | - | 41.38 | 42.91 | 40.14 | 40.37
ncnn: CPU - resnet50 (ms) | - | 47.38 | 48.24 | 41.65 | 46.98
ncnn: CPU - alexnet (ms) | - | 8.65 | 8.62 | 8.27 | 8.89
ncnn: CPU - resnet18 (ms) | - | 18.18 | 18.02 | 16.78 | 17.42
ncnn: CPU - vgg16 (ms) | - | 50.93 | 51.92 | 50.10 | 52.95
ncnn: CPU - googlenet (ms) | - | 30.60 | 30.38 | 28.40 | 29.62
ncnn: CPU - blazeface (ms) | - | 5.96 | 6.11 | 5.79 | 5.77
ncnn: CPU - efficientnet-b0 (ms) | - | 19.28 | 20.15 | 18.22 | 19.14
ncnn: CPU - mnasnet (ms) | - | 16.08 | 15.71 | 15.18 | 15.74
ncnn: CPU - shufflenet-v2 (ms) | - | 10.34 | 10.82 | 10.01 | 10.27
ncnn: CPU-v3-v3 - mobilenet-v3 (ms) | - | 14.42 | 14.33 | 13.93 | 14.68
ncnn: CPU-v2-v2 - mobilenet-v2 (ms) | - | 18.22 | 18.17 | 17.11 | 18.00
ncnn: CPU - mobilenet (ms) | - | 41.39 | 43.17 | 39.28 | 41.05
daphne: OpenMP - Euclidean Cluster (Test Cases Per Minute) | - | 442.63 | 441.37 | 432.65 | 430.14
daphne: OpenMP - NDT Mapping (Test Cases Per Minute) | - | 433.03 | 421.38 | 405.38 | 440.88
deepspeech: CPU (Seconds) | - | 240.08165 | 179.71648 | 182.43483 | 186.67090
arrayfire: Conjugate Gradient CPU (ms) | - | 4.101 | 4.696 | 4.759 | 4.370
minife: Small (CG Mflops) | - | 16991.1 | 17878.6 | 17815.5 | 17495.6
parboil: OpenMP MRI Gridding (Seconds) | - | 415.582367 | 458.382121 | 443.167887 | 408.710375

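One way to see which tests moved most between runs is a direction-agnostic spread, max/min - 1, computed over each row of the overview above. Because it ignores direction, the same formula works for More Is Better and Fewer Is Better tests. The minimal Python sketch below uses four rows copied from the overview; the choice of rows is arbitrary.

```python
# Rank tests by how much the runs disagree, using a few rows of the overview.
# Spread = max/min - 1 ignores the test's direction, so it applies to both
# "More Is Better" and "Fewer Is Better" metrics.
overview = {
    "openvino Age Gender FP16 (FPS)": [43169.99, 30539.47, 30845.88],
    "deepspeech CPU (Seconds)": [240.08165, 179.71648, 182.43483, 186.67090],
    "onnx super-resolution-10 (Inferences Per Minute)": [6153, 6771, 6810],
    "hpcg (GFLOP/s)": [24.2662, 24.2519, 24.2219, 24.2219],
}

def spread(values):
    """Relative gap between the largest and smallest run result."""
    return max(values) / min(values) - 1.0

ranked = sorted(overview, key=lambda name: spread(overview[name]), reverse=True)
for name in ranked:
    print(f"{spread(overview[name]) * 100:6.2f}%  {name}")
```

A large spread only flags a test for a closer look. The per-test standard errors then show whether the gap exceeds run-to-run noise.
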
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, More Is Better)
  1a: 43169.99 (SE +/- 109.04, N = 3)
  2a: 30539.47 (SE +/- 274.91, N = 3)
  4: 30845.88 (SE +/- 320.36, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

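On this test, runs 2a and 4 score about 29% lower than 1a. A rough check that the gap is not run-to-run noise is to divide the difference by the combined standard error of the two means. The sketch assumes the reported SE is the standard error of the mean and that runs are independent. With N = 3 it is only a coarse screen, not a formal significance test. The helper name `compare` is hypothetical.

```python
import math

# OpenVINO Age Gender Recognition Retail 0013 FP16 (FPS): run -> (mean, SE), from the result above.
runs = {"1a": (43169.99, 109.04), "2a": (30539.47, 274.91), "4": (30845.88, 320.36)}

def compare(a, b):
    """Return (percent change from a to b, difference in units of combined SE)."""
    (mean_a, se_a), (mean_b, se_b) = runs[a], runs[b]
    pct = (mean_b - mean_a) / mean_a * 100
    # Standard error of a difference of two independent means: sqrt(se_a^2 + se_b^2)
    se_diff = math.hypot(se_a, se_b)
    return pct, (mean_b - mean_a) / se_diff

for other in ("2a", "4"):
    pct, z = compare("1a", other)
    print(f"1a -> {other}: {pct:+.1f}% ({z:+.1f} combined SEs)")
```
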
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 0.79 (SE +/- 0.00, N = 3)
  2a: 0.96 (SE +/- 0.01, N = 3)
  4: 0.96 (SE +/- 0.01, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

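The millisecond figures are not the reciprocal of the FPS figures. For run 1a, 1000 / 0.79 ms is about 1,266 images per second, far below the 43,169.99 FPS reported in the FPS chart. One plausible reading is that many inference requests run concurrently. Under that reading, Little's law (items in flight = throughput × latency) estimates the average concurrency. This assumes the ms value is a mean per-request latency measured alongside a steady-state throughput; the export does not say how either value is aggregated.

```python
# Little's law: average items in flight = throughput x mean time in system.
# ASSUMPTION: ms is a mean per-request latency and FPS is steady-state throughput
# measured over the same interval; the export does not document the aggregation.
def in_flight(throughput_per_s, latency_ms):
    return throughput_per_s * latency_ms / 1000.0

# run -> (FPS, ms) for Age Gender Recognition Retail 0013 FP16 on CPU
runs = {"1a": (43169.99, 0.79), "2a": (30539.47, 0.96), "4": (30845.88, 0.96)}

for run, (fps, ms) in runs.items():
    serial_fps = 1000.0 / ms  # throughput if only one request were ever in flight
    print(f"{run}: 1000/ms = {serial_fps:,.0f} FPS vs reported {fps:,.0f} FPS; "
          f"~{in_flight(fps, ms):.0f} images in flight")
```
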
Mlpack Benchmark - Benchmark: scikit_qda (Seconds, Fewer Is Better)
  1a: 30.66 (SE +/- 0.02, N = 3)
  2a: 33.88 (SE +/- 0.42, N = 3)
  4: 32.57 (SE +/- 0.30, N = 3)

IOR 3.3.0 - Block Size: 4MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  1: 343.81 (SE +/- 2.36, N = 3; MIN: 284.82 / MAX: 637.37)
  2: 362.28 (SE +/- 2.35, N = 14; MIN: 299.19 / MAX: 670.93)
  2a: 357.00 (SE +/- 1.97, N = 3; MIN: 303.98 / MAX: 645.01)
  4: 365.79 (SE +/- 2.06, N = 3; MIN: 305.34 / MAX: 640.45)
  (CC) gcc options: -O2 -lm -pthread -lmpi

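The SE figures can be converted back to a sample standard deviation, s = SE × √N. This helps when runs use different repeat counts, here N = 14 for run 2 and N = 3 for the others. The sketch assumes the suite's SE is the standard error of the mean computed from the per-trial sample standard deviation. The helper name `sample_sd` is hypothetical.

```python
import math

# IOR 4MB (MB/s): run -> (mean, SE, N), copied from the result above.
ior_4mb = {
    "1": (343.81, 2.36, 3),
    "2": (362.28, 2.35, 14),
    "2a": (357.00, 1.97, 3),
    "4": (365.79, 2.06, 3),
}

def sample_sd(se, n):
    # SE = s / sqrt(N)  =>  s = SE * sqrt(N)
    return se * math.sqrt(n)

for run, (mean, se, n) in ior_4mb.items():
    sd = sample_sd(se, n)
    # Coefficient of variation: spread relative to the mean
    print(f"run {run}: s = {sd:.2f} MB/s, CV = {sd / mean:.1%} over N = {n}")
```
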
Mlpack Benchmark - Benchmark: scikit_ica (Seconds, Fewer Is Better)
  1a: 65.57 (SE +/- 0.50, N = 15)
  2a: 69.04 (SE +/- 0.81, N = 4)
  4: 66.04 (SE +/- 0.88, N = 15)

ECP-CANDLE 0.3 - Benchmark: P1B2 (Seconds, Fewer Is Better)
  1a: 47.09
  2a: 44.98
  4: 45.52

Parboil 2.5 - Test: OpenMP LBM (Seconds, Fewer Is Better)
  1a: 13.41 (SE +/- 0.11, N = 3)
  2: 13.13 (SE +/- 0.13, N = 12)
  2a: 13.39 (SE +/- 0.04, N = 3)
  4: 12.95 (SE +/- 0.12, N = 15)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Nebular Empirical Analysis Tool 2020-02-29 (Seconds, Fewer Is Better)
  1a: 22.92 (SE +/- 0.21, N = 15)
  2: 22.37 (SE +/- 0.12, N = 3)
  2a: 22.91 (SE +/- 0.25, N = 15)
  4: 23.15 (SE +/- 0.30, N = 15)
  (F9X) gfortran options: -cpp -ffree-line-length-0 -Jsource/ -fopenmp -O3 -fno-backtrace

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
  1a: 6515.63 (SE +/- 66.31, N = 12)
  2: 6616.65 (SE +/- 71.05, N = 12)
  2a: 6393.69 (SE +/- 76.80, N = 4)
  4: 6495.99 (SE +/- 71.06, N = 12)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

GNU Octave Benchmark 5.2.0 (Seconds, Fewer Is Better)
  1a: 13.13 (SE +/- 0.10, N = 25)
  2: 12.93 (SE +/- 0.14, N = 25)
  2a: 12.82 (SE +/- 0.14, N = 20)
  4: 13.26 (SE +/- 0.11, N = 25)

Mlpack Benchmark - Benchmark: scikit_svm (Seconds, Fewer Is Better)
  1a: 31.47 (SE +/- 0.32, N = 5)
  2a: 31.13 (SE +/- 0.23, N = 15)
  4: 32.12 (SE +/- 0.22, N = 13)

Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
  1a: 1.409518 (SE +/- 0.015811, N = 4)
  2: 1.430858 (SE +/- 0.015699, N = 5)
  2a: 1.438047 (SE +/- 0.011700, N = 9)
  4: 1.407187 (SE +/- 0.016684, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 441 (SE +/- 3.33, N = 3)
  2a: 443 (SE +/- 5.39, N = 3)
  4: 434 (SE +/- 7.23, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

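Inferences Per Minute can be restated as a per-second rate and as the mean interval between completed inferences. The interval is not necessarily the per-inference latency: if the runtime overlaps inferences, each one may take longer than the interval suggests. The helper name `to_rates` is hypothetical.

```python
# ONNX Runtime bertsquad-10 (Inferences Per Minute), from the result above.
bertsquad = {"1a": 441, "2a": 443, "4": 434}

def to_rates(per_min):
    """Return (inferences per second, mean ms between completed inferences)."""
    per_s = per_min / 60.0
    interval_ms = 60_000.0 / per_min
    return per_s, interval_ms

for run, per_min in bertsquad.items():
    per_s, interval_ms = to_rates(per_min)
    print(f"{run}: {per_s:.2f} inferences/s, one every {interval_ms:.1f} ms on average")
```
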
IOR 3.3.0 - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  1: 356.14 (SE +/- 3.31, N = 15; MIN: 304.39 / MAX: 641.59)
  2: 358.60 (SE +/- 1.88, N = 3; MIN: 316.53 / MAX: 627.12)
  2a: 363.19 (SE +/- 4.82, N = 3; MIN: 314.55 / MAX: 619.81)
  4: 359.89 (SE +/- 2.90, N = 3; MIN: 313.46 / MAX: 610.7)
  (CC) gcc options: -O2 -lm -pthread -lmpi

AI Benchmark Alpha 0.1.2 - Device Training Score (Score, More Is Better)
  1a: 572
  2a: 582
  4: 577

NCNN 20201218 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  1a: 90.90 (SE +/- 0.56, N = 12; MIN: 87.44 / MAX: 198.82)
  2: 91.14 (SE +/- 0.96, N = 9; MIN: 86.2 / MAX: 160.68)
  2a: 89.61 (SE +/- 0.70, N = 3; MIN: 86.83 / MAX: 232.99)
  4: 90.84 (SE +/- 0.70, N = 13; MIN: 85.68 / MAX: 182.76)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 305 (SE +/- 1.44, N = 3)
  2a: 301 (SE +/- 3.79, N = 3)
  4: 300 (SE +/- 3.91, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

Parboil 2.5 - Test: OpenMP Stencil (Seconds, Fewer Is Better)
  1a: 1.653598 (SE +/- 0.011350, N = 3)
  2: 1.644848 (SE +/- 0.010033, N = 3)
  2a: 1.671065 (SE +/- 0.006902, N = 3)
  4: 1.664530 (SE +/- 0.001988, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, More Is Better)
  1a: 4730.45 (SE +/- 13.93, N = 3)
  2: 4673.13 (SE +/- 9.91, N = 3)
  2a: 4678.03 (SE +/- 13.38, N = 3)
  4: 4684.87 (SE +/- 19.25, N = 3)
  (CXX) g++ options: -rdynamic

OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 3567.46 (SE +/- 15.45, N = 3)
  2: 3524.64 (SE +/- 6.11, N = 3)
  2a: 3538.09 (SE +/- 12.83, N = 3)
  4: 3533.40 (SE +/- 12.37, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (FPS, More Is Better)
  1a: 10.97 (SE +/- 0.05, N = 3)
  2a: 10.91 (SE +/- 0.03, N = 3)
  4: 10.84 (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (FPS, More Is Better)
  1a: 10.89 (SE +/- 0.05, N = 3)
  2: 11.02 (SE +/- 0.02, N = 3)
  2a: 11.02 (SE +/- 0.04, N = 3)
  4: 11.00 (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

AI Benchmark Alpha 0.1.2 - Device AI Score (Score, More Is Better)
  1a: 1711
  2a: 1728
  4: 1712

AI Benchmark Alpha 0.1.2 - Device Inference Score (Score, More Is Better)
  1a: 1139
  2a: 1146
  4: 1135

ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 7399 (SE +/- 28.82, N = 3)
  2a: 7328 (SE +/- 86.98, N = 3)
  4: 7393 (SE +/- 16.54, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 114 (SE +/- 0.29, N = 3)
  2a: 113 (SE +/- 0.76, N = 3)
  4: 113 (SE +/- 0.44, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

ECP-CANDLE 0.3 - Benchmark: P3B1 (Seconds, Fewer Is Better)
  1a: 1282.47
  2a: 1292.22
  4: 1281.03

OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 3555.17 (SE +/- 9.50, N = 3)
  2a: 3563.66 (SE +/- 1.74, N = 3)
  4: 3585.49 (SE +/- 5.28, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

Monte Carlo Simulations of Ionised Nebulae 2019-03-24 - Input: Dust 2D tau100.0 (Seconds, Fewer Is Better)
  1a: 193
  2: 194
  2a: 193
  4: 194
  One standard error is listed without a run label: SE +/- 0.58, N = 3
  (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O3 -O2 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

ECP-CANDLE 0.3 - Benchmark: P3B2 (Seconds, Fewer Is Better)
  1a: 3201.12
  2a: 3185.07
  4: 3188.63

OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (FPS, More Is Better)
  1a: 18.76 (SE +/- 0.09, N = 3)
  2: 18.79 (SE +/- 0.11, N = 3)
  2a: 18.78 (SE +/- 0.12, N = 3)
  4: 18.83 (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (FPS, More Is Better)
  1a: 42475.84 (SE +/- 114.87, N = 3)
  2a: 42412.47 (SE +/- 56.80, N = 3)
  4: 42558.09 (SE +/- 63.17, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 2131.30 (SE +/- 5.79, N = 3)
  2: 2138.44 (SE +/- 2.13, N = 3)
  2a: 2136.83 (SE +/- 5.92, N = 3)
  4: 2138.20 (SE +/- 1.18, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (FPS, More Is Better)
  1a: 18.46 (SE +/- 0.10, N = 3)
  2: 18.44 (SE +/- 0.07, N = 3)
  2a: 18.45 (SE +/- 0.10, N = 3)
  4: 18.40 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 2100.58 (SE +/- 4.10, N = 3)
  2: 2097.44 (SE +/- 8.24, N = 3)
  2a: 2102.66 (SE +/- 8.74, N = 3)
  4: 2100.78 (SE +/- 1.04, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
  1a: 24.27 (SE +/- 0.05, N = 3)
  2: 24.25 (SE +/- 0.10, N = 3)
  2a: 24.22 (SE +/- 0.07, N = 3)
  4: 24.22 (SE +/- 0.06, N = 3)
  (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi

OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 0.8 (SE +/- 0.00, N = 3)
  2a: 0.8 (SE +/- 0.00, N = 3)
  4: 0.8 (SE +/- 0.00, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
  1a: 3.50 (SE +/- 0.04, N = 15)
  2a: 3.50 (SE +/- 0.07, N = 15)
  4: 3.41 (SE +/- 0.05, N = 12)

ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 6153 (SE +/- 234.15, N = 12)
  2a: 6771 (SE +/- 45.37, N = 3)
  4: 6810 (SE +/- 22.93, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

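On this test, run 1a has a much wider spread (SE +/- 234.15 over N = 12) than 2a and 4 (N = 3 each). Welch's t-test allows unequal variances and sample sizes and can be computed from the summary statistics alone. The sketch assumes SE = s/√N and roughly normal per-trial results. For about 11 degrees of freedom, the two-sided 5% critical value from a t table is roughly 2.2. The helper name `welch_from_se` is hypothetical.

```python
import math

def welch_from_se(m1, se1, n1, m2, se2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from (mean, SE, N) pairs."""
    v1, v2 = se1 ** 2, se2 ** 2          # squared standard errors (s^2 / N)
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# super-resolution-10 (Inferences Per Minute): run 1a vs run 4
t, df = welch_from_se(6153, 234.15, 12, 6810, 22.93, 3)
print(f"1a vs 4: t = {t:.2f}, df = {df:.1f}")
```

Here the degrees of freedom come out near 11 because the noisy 1a run dominates the combined variance.
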
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  1a: 32.96 (SE +/- 0.91, N = 12; MIN: 30.31 / MAX: 200.45)
  2: 34.73 (SE +/- 1.38, N = 9; MIN: 29.98 / MAX: 260.21)
  2a: 30.90 (SE +/- 0.05, N = 3; MIN: 30.39 / MAX: 61.66)
  4: 33.16 (SE +/- 0.94, N = 13; MIN: 30.14 / MAX: 141.88)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  1a: 41.38 (SE +/- 1.35, N = 12; MIN: 35.51 / MAX: 1384.91)
  2: 42.91 (SE +/- 1.43, N = 9; MIN: 35.84 / MAX: 953.59)
  2a: 40.14 (SE +/- 3.34, N = 3; MIN: 35.96 / MAX: 676.26)
  4: 40.37 (SE +/- 1.29, N = 13; MIN: 36.01 / MAX: 1384.64)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  1a: 47.38 (SE +/- 0.97, N = 12; MIN: 40.86 / MAX: 304.03)
  2: 48.24 (SE +/- 1.43, N = 9; MIN: 40.1 / MAX: 658.14)
  2a: 41.65 (SE +/- 0.06, N = 3; MIN: 40.83 / MAX: 76.33)
  4: 46.98 (SE +/- 1.26, N = 13; MIN: 40.68 / MAX: 309.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  1a: 8.65 (SE +/- 0.24, N = 12; MIN: 7.58 / MAX: 29.2)
  2: 8.62 (SE +/- 0.34, N = 9; MIN: 7.59 / MAX: 41.13)
  2a: 8.27 (SE +/- 0.61, N = 3; MIN: 7.56 / MAX: 10.65)
  4: 8.89 (SE +/- 0.27, N = 13; MIN: 7.58 / MAX: 47.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  1a: 18.18 (SE +/- 0.60, N = 12; MIN: 15.63 / MAX: 50.57)
  2: 18.02 (SE +/- 0.79, N = 9; MIN: 15.61 / MAX: 52.43)
  2a: 16.78 (SE +/- 0.61, N = 3; MIN: 15.83 / MAX: 38.93)
  4: 17.42 (SE +/- 0.52, N = 13; MIN: 15.8 / MAX: 53.79)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  1a: 50.93 (SE +/- 0.90, N = 12; MIN: 44.34 / MAX: 470.99)
  2: 51.92 (SE +/- 1.35, N = 9; MIN: 45.29 / MAX: 485.69)
  2a: 50.10 (SE +/- 0.86, N = 3; MIN: 46.01 / MAX: 368.47)
  4: 52.95 (SE +/- 0.97, N = 13; MIN: 44.1 / MAX: 1134.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  1a: 30.60 (SE +/- 1.15, N = 12; MIN: 25.89 / MAX: 363.03)
  2: 30.38 (SE +/- 1.67, N = 9; MIN: 26.25 / MAX: 69.1)
  2a: 28.40 (SE +/- 1.36, N = 3; MIN: 26.44 / MAX: 59.77)
  4: 29.62 (SE +/- 1.00, N = 13; MIN: 26.59 / MAX: 97.53)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  1a: 5.96 (SE +/- 0.16, N = 12; MIN: 5.47 / MAX: 8.13)
  2: 6.11 (SE +/- 0.21, N = 9; MIN: 5.43 / MAX: 33.62)
  2a: 5.79 (SE +/- 0.21, N = 3; MIN: 5.5 / MAX: 7.01)
  4: 5.77 (SE +/- 0.11, N = 13; MIN: 5.48 / MAX: 33.8)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  1a: 19.28 (SE +/- 0.53, N = 12; MIN: 17.82 / MAX: 54.47)
  2: 20.15 (SE +/- 0.83, N = 9; MIN: 17.74 / MAX: 131.21)
  2a: 18.22 (SE +/- 0.03, N = 3; MIN: 17.86 / MAX: 36.46)
  4: 19.14 (SE +/- 0.38, N = 13; MIN: 17.72 / MAX: 51.28)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  1a: 16.08 (SE +/- 0.43, N = 12; MIN: 14.77 / MAX: 63.14)
  2: 15.71 (SE +/- 0.16, N = 9; MIN: 14.8 / MAX: 118.47)
  2a: 15.18 (SE +/- 0.06, N = 3; MIN: 14.91 / MAX: 42.82)
  4: 15.74 (SE +/- 0.27, N = 13; MIN: 14.84 / MAX: 49.53)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  1a: 10.34 (SE +/- 0.22, N = 12; MIN: 9.78 / MAX: 30.48)
  2: 10.82 (SE +/- 0.39, N = 9; MIN: 9.82 / MAX: 284.11)
  2a: 10.01 (SE +/- 0.03, N = 3; MIN: 9.84 / MAX: 11.17)
  4: 10.27 (SE +/- 0.15, N = 13; MIN: 9.74 / MAX: 37.86)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  1a: 14.42 (SE +/- 0.39, N = 12; MIN: 13.27 / MAX: 61.57)
  2: 14.33 (SE +/- 0.38, N = 9; MIN: 13.43 / MAX: 22.53)
  2a: 13.93 (SE +/- 0.06, N = 3; MIN: 13.42 / MAX: 42.9)
  4: 14.68 (SE +/- 0.43, N = 13; MIN: 13.4 / MAX: 403.86)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  1a: 18.22 (SE +/- 0.56, N = 12; MIN: 16.61 / MAX: 93.24)
  2: 18.17 (SE +/- 0.60, N = 9; MIN: 16.66 / MAX: 255.58)
  2a: 17.11 (SE +/- 0.07, N = 3; MIN: 16.72 / MAX: 38.78)
  4: 18.00 (SE +/- 0.44, N = 13; MIN: 16.69 / MAX: 240.12)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20201218 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  1a: 41.39 (SE +/- 1.21, N = 12; MIN: 37.92 / MAX: 77.52)
  2: 43.17 (SE +/- 1.67, N = 9; MIN: 37.53 / MAX: 168.48)
  2a: 39.28 (SE +/- 0.04, N = 3; MIN: 38.42 / MAX: 68.14)
  4: 41.05 (SE +/- 0.95, N = 13; MIN: 37.92 / MAX: 117.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

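Comparing runs across the NCNN models is easier with one number per run. A common choice is the geometric mean of the latencies; this is assumed here and not taken from the export. It weights each model's relative change equally, so vgg16 does not outweigh blazeface. The sketch covers the 14 models from squeezenet_ssd through mobilenet and leaves out regnety_400m.

```python
import math

# NCNN latencies (ms, Fewer Is Better), in the order squeezenet_ssd, yolov4-tiny,
# resnet50, alexnet, resnet18, vgg16, googlenet, blazeface, efficientnet-b0,
# mnasnet, shufflenet-v2, mobilenet-v3, mobilenet-v2, mobilenet.
ncnn_ms = {
    "1a": [32.96, 41.38, 47.38, 8.65, 18.18, 50.93, 30.60, 5.96, 19.28, 16.08, 10.34, 14.42, 18.22, 41.39],
    "2":  [34.73, 42.91, 48.24, 8.62, 18.02, 51.92, 30.38, 6.11, 20.15, 15.71, 10.82, 14.33, 18.17, 43.17],
    "2a": [30.90, 40.14, 41.65, 8.27, 16.78, 50.10, 28.40, 5.79, 18.22, 15.18, 10.01, 13.93, 17.11, 39.28],
    "4":  [33.16, 40.37, 46.98, 8.89, 17.42, 52.95, 29.62, 5.77, 19.14, 15.74, 10.27, 14.68, 18.00, 41.05],
}

def geomean(values):
    # exp(mean(log x)): each model's relative change counts equally
    return math.exp(sum(math.log(v) for v in values) / len(values))

geo = {run: geomean(v) for run, v in ncnn_ms.items()}
best = min(geo, key=geo.get)  # lowest latency wins for Fewer Is Better
for run, g in sorted(geo.items(), key=lambda kv: kv[1]):
    print(f"{run}: {g:.2f} ms geometric mean ({g / geo[best]:.3f}x of best)")
```
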
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
  1a: 442.63 (SE +/- 8.55, N = 12)
  2: 441.37 (SE +/- 8.80, N = 15)
  2a: 432.65 (SE +/- 11.68, N = 12)
  4: 430.14 (SE +/- 3.31, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
  1a: 433.03 (SE +/- 5.41, N = 3)
  2: 421.38 (SE +/- 5.68, N = 15)
  2a: 405.38 (SE +/- 7.34, N = 15)
  4: 440.88 (SE +/- 4.85, N = 4)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
  1a: 240.08 (SE +/- 9.43, N = 12)
  2: 179.72 (SE +/- 0.46, N = 3)
  2a: 182.43 (SE +/- 1.38, N = 12)
  4: 186.67 (SE +/- 4.30, N = 12)

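Because the charts mix More Is Better and Fewer Is Better metrics, it helps to normalize each run to the best run in the correct direction. Below is a sketch with a hypothetical helper, `relative_to_best`. Applied to DeepSpeech, it shows run 1a taking about a third longer than run 2.

```python
def relative_to_best(results, more_is_better):
    """Return (best run, {run: score}) where 1.0 is the best run and lower is worse."""
    best_run = (max if more_is_better else min)(results, key=results.get)
    best = results[best_run]
    # Invert the ratio for Fewer Is Better so that both directions read the same way
    rel = {run: (v / best if more_is_better else best / v) for run, v in results.items()}
    return best_run, rel

deepspeech_s = {"1a": 240.08, "2": 179.72, "2a": 182.43, "4": 186.67}  # Seconds, Fewer Is Better
best_run, rel = relative_to_best(deepspeech_s, more_is_better=False)
print(best_run, {run: round(r, 3) for run, r in rel.items()})
```

The same call with `more_is_better=True` works for throughput-style charts such as miniFE CG Mflops.
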
ArrayFire 3.7 - Test: Conjugate Gradient CPU (ms, Fewer Is Better)
  1a: 4.101 (SE +/- 0.035, N = 3)
  2: 4.696 (SE +/- 0.122, N = 15)
  2a: 4.759 (SE +/- 0.132, N = 15)
  4: 4.370 (SE +/- 0.148, N = 12)
  (CXX) g++ options: -rdynamic

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
  1a: 16991.1 (SE +/- 154.34, N = 15)
  2: 17878.6 (SE +/- 343.64, N = 15)
  2a: 17815.5 (SE +/- 313.32, N = 15)
  4: 17495.6 (SE +/- 298.66, N = 15)
  (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, Fewer Is Better)
  1a: 415.58 (SE +/- 20.24, N = 6)
  2: 458.38 (SE +/- 17.63, N = 6)
  2a: 443.17 (SE +/- 16.35, N = 9)
  4: 408.71 (SE +/- 8.08, N = 9)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Phoronix Test Suite v10.8.5