hpc-xeon: 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) motherboard and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from https://openbenchmarking.org/result/2105039-IB-HPCXEON8261&gru&rdt
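The comparison above can be re-run locally by pointing the Phoronix Test Suite at the public OpenBenchmarking.org result ID taken from the export URL. The Python sketch below simply shells out to the phoronix-test-suite CLI; it assumes the CLI is installed and on PATH and that the referenced result is still publicly available.

    # Sketch: re-run this comparison locally against the published result.
    # Assumes phoronix-test-suite is installed; the ID comes from the export URL above.
    import subprocess

    RESULT_ID = "2105039-IB-HPCXEON8261"

    # "phoronix-test-suite benchmark <result-id>" offers to install the same test
    # profiles and appends the local numbers alongside the published runs.
    subprocess.run(["phoronix-test-suite", "benchmark", RESULT_ID], check=True)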
hpc-xeon - runs 1, 1a, 2, 2a, and 4 all used the following configuration:
  Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
  Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
  Chipset: Intel Device 0998
  Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN
  Disk: 2 x 7682GB INTEL SSDPF2KX076TZ + 2 x 800GB INTEL SSDPF21Q800GB + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
  Graphics: ASPEED
  Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
  OS: Ubuntu 20.04
  Kernel: 5.11.0-051100-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1024x768

  Kernel Details: Transparent Huge Pages: madvise
  Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  Disk Details: 1, 2, 2a, 4: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
  Processor Details: Scaling Governor: intel_pstate powersave - CPU Microcode: 0xd000270
  Python Details: Python 2.7.18 + Python 3.8.5
  Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
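Two of the platform details listed above (Transparent Huge Pages set to madvise, and the intel_pstate powersave scaling governor) can be spot-checked on a comparable Linux host by reading sysfs. A minimal Python sketch, assuming the standard sysfs paths on a recent kernel:

    # Sketch: verify the THP mode and the cpu0 scaling governor via sysfs (Linux only).
    from pathlib import Path

    def read_sysfs(path):
        return Path(path).read_text().strip()

    # THP prints all modes with the active one bracketed, e.g. "always [madvise] never"
    thp = read_sysfs("/sys/kernel/mm/transparent_hugepage/enabled")
    governor = read_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")

    print("Transparent Huge Pages:", thp)
    print("cpu0 scaling governor:", governor)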
Result overview: across runs 1, 1a, 2, 2a, and 4 the benchmarks covered are miniFE, OpenVINO, HPCG (High Performance Conjugate Gradient), ArrayFire, ONNX Runtime, IOR, AI Benchmark Alpha, the Darmstadt Automotive Parallel Heterogeneous Suite, NCNN, Parboil, the Nebular Empirical Analysis Tool, Monte Carlo Simulations of Ionised Nebulae, DeepSpeech, the GNU Octave Benchmark, ECP-CANDLE, and the Mlpack Benchmark. Full per-test results follow below.
miniFE 2.2 - Problem Size: Small (CG Mflops; more is better)
  1a: 16991.1  (SE +/- 154.34, N = 15)
  2:  17878.6  (SE +/- 343.64, N = 15)
  2a: 17815.5  (SE +/- 313.32, N = 15)
  4:  17495.6  (SE +/- 298.66, N = 15)
  (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
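Each result block here reports the mean of N runs together with its standard error (SE), i.e. the sample standard deviation divided by the square root of N. A small Python sketch of that arithmetic, using made-up sample values (the export only publishes mean/SE/N, not the raw per-run data), plus the relative gap between the reported miniFE means for runs 1a and 2 above:

    # Sketch: how "SE +/- x, N = y" relates to raw samples, with placeholder data.
    import math
    import statistics

    samples = [16800.0, 17050.0, 17120.0]  # hypothetical per-run CG Mflops values
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    print(f"mean = {statistics.fmean(samples):.1f}, SE +/- {se:.2f}, N = {len(samples)}")

    # Relative difference between the reported miniFE means for runs 2 and 1a:
    print(f"run 2 vs 1a: {(17878.6 - 16991.1) / 16991.1 * 100:+.1f}%")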
OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (FPS; more is better)
  1a: 18.76  (SE +/- 0.09, N = 3)
  2:  18.79  (SE +/- 0.11, N = 3)
  2a: 18.78  (SE +/- 0.12, N = 3)
  4:  18.83  (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (FPS; more is better)
  1a: 18.46  (SE +/- 0.10, N = 3)
  2:  18.44  (SE +/- 0.07, N = 3)
  2a: 18.45  (SE +/- 0.10, N = 3)
  4:  18.40  (SE +/- 0.02, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (FPS; more is better)
  1a: 10.89  (SE +/- 0.05, N = 3)
  2:  11.02  (SE +/- 0.02, N = 3)
  2a: 11.02  (SE +/- 0.04, N = 3)
  4:  11.00  (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (FPS; more is better)
  1a: 10.97  (SE +/- 0.05, N = 3)
  2a: 10.91  (SE +/- 0.03, N = 3)
  4:  10.84  (SE +/- 0.03, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS; more is better)
  1a: 43169.99  (SE +/- 109.04, N = 3)
  2a: 30539.47  (SE +/- 274.91, N = 3)
  4:  30845.88  (SE +/- 320.36, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (FPS; more is better)
  1a: 42475.84  (SE +/- 114.87, N = 3)
  2a: 42412.47  (SE +/- 56.80, N = 3)
  4:  42558.09  (SE +/- 63.17, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
High Performance Conjugate Gradient 3.1 (GFLOP/s; more is better)
  1a: 24.27  (SE +/- 0.05, N = 3)
  2:  24.25  (SE +/- 0.10, N = 3)
  2a: 24.22  (SE +/- 0.07, N = 3)
  4:  24.22  (SE +/- 0.06, N = 3)
  (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi
ArrayFire 3.7 - Test: BLAS CPU (GFLOPS; more is better)
  1a: 4730.45  (SE +/- 13.93, N = 3)
  2:  4673.13  (SE +/- 9.91, N = 3)
  2a: 4678.03  (SE +/- 13.38, N = 3)
  4:  4684.87  (SE +/- 19.25, N = 3)
  (CXX) g++ options: -rdynamic
ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute; more is better)
  1a: 305  (SE +/- 1.44, N = 3)
  2a: 301  (SE +/- 3.79, N = 3)
  4:  300  (SE +/- 3.91, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute; more is better)
  1a: 441  (SE +/- 3.33, N = 3)
  2a: 443  (SE +/- 5.39, N = 3)
  4:  434  (SE +/- 7.23, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute; more is better)
  1a: 114  (SE +/- 0.29, N = 3)
  2a: 113  (SE +/- 0.76, N = 3)
  4:  113  (SE +/- 0.44, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute; more is better)
  1a: 7399  (SE +/- 28.82, N = 3)
  2a: 7328  (SE +/- 86.98, N = 3)
  4:  7393  (SE +/- 16.54, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute; more is better)
  1a: 6153  (SE +/- 234.15, N = 12)
  2a: 6771  (SE +/- 45.37, N = 3)
  4:  6810  (SE +/- 22.93, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt
IOR 3.3.0 - Block Size: 2MB - Disk Target: Default Test Directory (MB/s; more is better)
  1:  356.14  (SE +/- 3.31, N = 15; MIN: 304.39 / MAX: 641.59)
  2:  358.60  (SE +/- 1.88, N = 3; MIN: 316.53 / MAX: 627.12)
  2a: 363.19  (SE +/- 4.82, N = 3; MIN: 314.55 / MAX: 619.81)
  4:  359.89  (SE +/- 2.90, N = 3; MIN: 313.46 / MAX: 610.7)
  (CC) gcc options: -O2 -lm -pthread -lmpi
IOR 3.3.0 - Block Size: 4MB - Disk Target: Default Test Directory (MB/s; more is better)
  1:  343.81  (SE +/- 2.36, N = 3; MIN: 284.82 / MAX: 637.37)
  2:  362.28  (SE +/- 2.35, N = 14; MIN: 299.19 / MAX: 670.93)
  2a: 357.00  (SE +/- 1.97, N = 3; MIN: 303.98 / MAX: 645.01)
  4:  365.79  (SE +/- 2.06, N = 3; MIN: 305.34 / MAX: 640.45)
  (CC) gcc options: -O2 -lm -pthread -lmpi
AI Benchmark Alpha 0.1.2 - Device Inference Score (Score; more is better)
  1a: 1139
  2a: 1146
  4:  1135
AI Benchmark Alpha 0.1.2 - Device Training Score (Score; more is better)
  1a: 572
  2a: 582
  4:  577
AI Benchmark Alpha 0.1.2 - Device AI Score (Score; more is better)
  1a: 1711
  2a: 1728
  4:  1712
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute; more is better)
  1a: 433.03  (SE +/- 5.41, N = 3)
  2:  421.38  (SE +/- 5.68, N = 15)
  2a: 405.38  (SE +/- 7.34, N = 15)
  4:  440.88  (SE +/- 4.85, N = 4)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute; more is better)
  1a: 6515.63  (SE +/- 66.31, N = 12)
  2:  6616.65  (SE +/- 71.05, N = 12)
  2a: 6393.69  (SE +/- 76.80, N = 4)
  4:  6495.99  (SE +/- 71.06, N = 12)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute; more is better)
  1a: 442.63  (SE +/- 8.55, N = 12)
  2:  441.37  (SE +/- 8.80, N = 15)
  2a: 432.65  (SE +/- 11.68, N = 12)
  4:  430.14  (SE +/- 3.31, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp
ArrayFire 3.7 - Test: Conjugate Gradient CPU (ms; fewer is better)
  1a: 4.101  (SE +/- 0.035, N = 3)
  2:  4.696  (SE +/- 0.122, N = 15)
  2a: 4.759  (SE +/- 0.132, N = 15)
  4:  4.370  (SE +/- 0.148, N = 12)
  (CXX) g++ options: -rdynamic
NCNN 20201218 - Target: CPU - Model: mobilenet (ms; fewer is better)
  1a: 41.39  (SE +/- 1.21, N = 12; MIN: 37.92 / MAX: 77.52)
  2:  43.17  (SE +/- 1.67, N = 9; MIN: 37.53 / MAX: 168.48)
  2a: 39.28  (SE +/- 0.04, N = 3; MIN: 38.42 / MAX: 68.14)
  4:  41.05  (SE +/- 0.95, N = 13; MIN: 37.92 / MAX: 117.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms; fewer is better)
  1a: 18.22  (SE +/- 0.56, N = 12; MIN: 16.61 / MAX: 93.24)
  2:  18.17  (SE +/- 0.60, N = 9; MIN: 16.66 / MAX: 255.58)
  2a: 17.11  (SE +/- 0.07, N = 3; MIN: 16.72 / MAX: 38.78)
  4:  18.00  (SE +/- 0.44, N = 13; MIN: 16.69 / MAX: 240.12)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms; fewer is better)
  1a: 14.42  (SE +/- 0.39, N = 12; MIN: 13.27 / MAX: 61.57)
  2:  14.33  (SE +/- 0.38, N = 9; MIN: 13.43 / MAX: 22.53)
  2a: 13.93  (SE +/- 0.06, N = 3; MIN: 13.42 / MAX: 42.9)
  4:  14.68  (SE +/- 0.43, N = 13; MIN: 13.4 / MAX: 403.86)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: shufflenet-v2 (ms; fewer is better)
  1a: 10.34  (SE +/- 0.22, N = 12; MIN: 9.78 / MAX: 30.48)
  2:  10.82  (SE +/- 0.39, N = 9; MIN: 9.82 / MAX: 284.11)
  2a: 10.01  (SE +/- 0.03, N = 3; MIN: 9.84 / MAX: 11.17)
  4:  10.27  (SE +/- 0.15, N = 13; MIN: 9.74 / MAX: 37.86)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: mnasnet (ms; fewer is better)
  1a: 16.08  (SE +/- 0.43, N = 12; MIN: 14.77 / MAX: 63.14)
  2:  15.71  (SE +/- 0.16, N = 9; MIN: 14.8 / MAX: 118.47)
  2a: 15.18  (SE +/- 0.06, N = 3; MIN: 14.91 / MAX: 42.82)
  4:  15.74  (SE +/- 0.27, N = 13; MIN: 14.84 / MAX: 49.53)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: efficientnet-b0 (ms; fewer is better)
  1a: 19.28  (SE +/- 0.53, N = 12; MIN: 17.82 / MAX: 54.47)
  2:  20.15  (SE +/- 0.83, N = 9; MIN: 17.74 / MAX: 131.21)
  2a: 18.22  (SE +/- 0.03, N = 3; MIN: 17.86 / MAX: 36.46)
  4:  19.14  (SE +/- 0.38, N = 13; MIN: 17.72 / MAX: 51.28)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: blazeface (ms; fewer is better)
  1a: 5.96  (SE +/- 0.16, N = 12; MIN: 5.47 / MAX: 8.13)
  2:  6.11  (SE +/- 0.21, N = 9; MIN: 5.43 / MAX: 33.62)
  2a: 5.79  (SE +/- 0.21, N = 3; MIN: 5.5 / MAX: 7.01)
  4:  5.77  (SE +/- 0.11, N = 13; MIN: 5.48 / MAX: 33.8)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: googlenet (ms; fewer is better)
  1a: 30.60  (SE +/- 1.15, N = 12; MIN: 25.89 / MAX: 363.03)
  2:  30.38  (SE +/- 1.67, N = 9; MIN: 26.25 / MAX: 69.1)
  2a: 28.40  (SE +/- 1.36, N = 3; MIN: 26.44 / MAX: 59.77)
  4:  29.62  (SE +/- 1.00, N = 13; MIN: 26.59 / MAX: 97.53)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: vgg16 (ms; fewer is better)
  1a: 50.93  (SE +/- 0.90, N = 12; MIN: 44.34 / MAX: 470.99)
  2:  51.92  (SE +/- 1.35, N = 9; MIN: 45.29 / MAX: 485.69)
  2a: 50.10  (SE +/- 0.86, N = 3; MIN: 46.01 / MAX: 368.47)
  4:  52.95  (SE +/- 0.97, N = 13; MIN: 44.1 / MAX: 1134.63)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet18 (ms; fewer is better)
  1a: 18.18  (SE +/- 0.60, N = 12; MIN: 15.63 / MAX: 50.57)
  2:  18.02  (SE +/- 0.79, N = 9; MIN: 15.61 / MAX: 52.43)
  2a: 16.78  (SE +/- 0.61, N = 3; MIN: 15.83 / MAX: 38.93)
  4:  17.42  (SE +/- 0.52, N = 13; MIN: 15.8 / MAX: 53.79)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: alexnet (ms; fewer is better)
  1a: 8.65  (SE +/- 0.24, N = 12; MIN: 7.58 / MAX: 29.2)
  2:  8.62  (SE +/- 0.34, N = 9; MIN: 7.59 / MAX: 41.13)
  2a: 8.27  (SE +/- 0.61, N = 3; MIN: 7.56 / MAX: 10.65)
  4:  8.89  (SE +/- 0.27, N = 13; MIN: 7.58 / MAX: 47.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: resnet50 (ms; fewer is better)
  1a: 47.38  (SE +/- 0.97, N = 12; MIN: 40.86 / MAX: 304.03)
  2:  48.24  (SE +/- 1.43, N = 9; MIN: 40.1 / MAX: 658.14)
  2a: 41.65  (SE +/- 0.06, N = 3; MIN: 40.83 / MAX: 76.33)
  4:  46.98  (SE +/- 1.26, N = 13; MIN: 40.68 / MAX: 309.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: yolov4-tiny (ms; fewer is better)
  1a: 41.38  (SE +/- 1.35, N = 12; MIN: 35.51 / MAX: 1384.91)
  2:  42.91  (SE +/- 1.43, N = 9; MIN: 35.84 / MAX: 953.59)
  2a: 40.14  (SE +/- 3.34, N = 3; MIN: 35.96 / MAX: 676.26)
  4:  40.37  (SE +/- 1.29, N = 13; MIN: 36.01 / MAX: 1384.64)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: squeezenet_ssd (ms; fewer is better)
  1a: 32.96  (SE +/- 0.91, N = 12; MIN: 30.31 / MAX: 200.45)
  2:  34.73  (SE +/- 1.38, N = 9; MIN: 29.98 / MAX: 260.21)
  2a: 30.90  (SE +/- 0.05, N = 3; MIN: 30.39 / MAX: 61.66)
  4:  33.16  (SE +/- 0.94, N = 13; MIN: 30.14 / MAX: 141.88)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20201218 - Target: CPU - Model: regnety_400m (ms; fewer is better)
  1a: 90.90  (SE +/- 0.56, N = 12; MIN: 87.44 / MAX: 198.82)
  2:  91.14  (SE +/- 0.96, N = 9; MIN: 86.2 / MAX: 160.68)
  2a: 89.61  (SE +/- 0.70, N = 3; MIN: 86.83 / MAX: 232.99)
  4:  90.84  (SE +/- 0.70, N = 13; MIN: 85.68 / MAX: 182.76)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP16 - Device: CPU (ms; fewer is better)
  1a: 2100.58  (SE +/- 4.10, N = 3)
  2:  2097.44  (SE +/- 8.24, N = 3)
  2a: 2102.66  (SE +/- 8.74, N = 3)
  4:  2100.78  (SE +/- 1.04, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Face Detection 0106 FP32 - Device: CPU (ms; fewer is better)
  1a: 2131.30  (SE +/- 5.79, N = 3)
  2:  2138.44  (SE +/- 2.13, N = 3)
  2a: 2136.83  (SE +/- 5.92, N = 3)
  4:  2138.20  (SE +/- 1.18, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP16 - Device: CPU (ms; fewer is better)
  1a: 3567.46  (SE +/- 15.45, N = 3)
  2:  3524.64  (SE +/- 6.11, N = 3)
  2a: 3538.09  (SE +/- 12.83, N = 3)
  4:  3533.40  (SE +/- 12.37, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Person Detection 0106 FP32 - Device: CPU (ms; fewer is better)
  1a: 3555.17  (SE +/- 9.50, N = 3)
  2a: 3563.66  (SE +/- 1.74, N = 3)
  4:  3585.49  (SE +/- 5.28, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms; fewer is better)
  1a: 0.79  (SE +/- 0.00, N = 3)
  2a: 0.96  (SE +/- 0.01, N = 3)
  4:  0.96  (SE +/- 0.01, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
OpenVINO 2021.1 - Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (ms; fewer is better)
  1a: 0.8  (SE +/- 0.00, N = 3)
  2a: 0.8  (SE +/- 0.00, N = 3)
  4:  0.8  (SE +/- 0.00, N = 3)
  (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread
Parboil 2.5 - Test: OpenMP LBM (Seconds; fewer is better)
  1a: 13.41  (SE +/- 0.11, N = 3)
  2:  13.13  (SE +/- 0.13, N = 12)
  2a: 13.39  (SE +/- 0.04, N = 3)
  4:  12.95  (SE +/- 0.12, N = 15)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP CUTCP (Seconds; fewer is better)
  1a: 1.409518  (SE +/- 0.015811, N = 4)
  2:  1.430858  (SE +/- 0.015699, N = 5)
  2a: 1.438047  (SE +/- 0.011700, N = 9)
  4:  1.407187  (SE +/- 0.016684, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP Stencil (Seconds; fewer is better)
  1a: 1.653598  (SE +/- 0.011350, N = 3)
  2:  1.644848  (SE +/- 0.010033, N = 3)
  2a: 1.671065  (SE +/- 0.006902, N = 3)
  4:  1.664530  (SE +/- 0.001988, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds; fewer is better)
  1a: 415.58  (SE +/- 20.24, N = 6)
  2:  458.38  (SE +/- 17.63, N = 6)
  2a: 443.17  (SE +/- 16.35, N = 9)
  4:  408.71  (SE +/- 8.08, N = 9)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Nebular Empirical Analysis Tool 2020-02-29 (Seconds; fewer is better)
  1a: 22.92  (SE +/- 0.21, N = 15)
  2:  22.37  (SE +/- 0.12, N = 3)
  2a: 22.91  (SE +/- 0.25, N = 15)
  4:  23.15  (SE +/- 0.30, N = 15)
  (F9X) gfortran options: -cpp -ffree-line-length-0 -Jsource/ -fopenmp -O3 -fno-backtrace
Monte Carlo Simulations of Ionised Nebulae 2019-03-24 - Input: Dust 2D tau100.0 (Seconds; fewer is better)
  1a: 193
  2:  194
  2a: 193
  4:  194
  (One run reports SE +/- 0.58, N = 3; the exported view does not attribute it to a specific run.)
  (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O3 -O2 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
DeepSpeech 0.6 - Acceleration: CPU (Seconds; fewer is better)
  1a: 240.08  (SE +/- 9.43, N = 12)
  2:  179.72  (SE +/- 0.46, N = 3)
  2a: 182.43  (SE +/- 1.38, N = 12)
  4:  186.67  (SE +/- 4.30, N = 12)
GNU Octave Benchmark 5.2.0 (Seconds; fewer is better)
  1a: 13.13  (SE +/- 0.10, N = 25)
  2:  12.93  (SE +/- 0.14, N = 25)
  2a: 12.82  (SE +/- 0.14, N = 20)
  4:  13.26  (SE +/- 0.11, N = 25)
ECP-CANDLE 0.3 - Benchmark: P1B2 (Seconds; fewer is better)
  1a: 47.09
  2a: 44.98
  4:  45.52
ECP-CANDLE 0.3 - Benchmark: P3B1 (Seconds; fewer is better)
  1a: 1282.47
  2a: 1292.22
  4:  1281.03
ECP-CANDLE 0.3 - Benchmark: P3B2 (Seconds; fewer is better)
  1a: 3201.12
  2a: 3185.07
  4:  3188.63
Mlpack Benchmark - Benchmark: scikit_ica (Seconds; fewer is better)
  1a: 65.57  (SE +/- 0.50, N = 15)
  2a: 69.04  (SE +/- 0.81, N = 4)
  4:  66.04  (SE +/- 0.88, N = 15)
Mlpack Benchmark - Benchmark: scikit_qda (Seconds; fewer is better)
  1a: 30.66  (SE +/- 0.02, N = 3)
  2a: 33.88  (SE +/- 0.42, N = 3)
  4:  32.57  (SE +/- 0.30, N = 3)
Mlpack Benchmark - Benchmark: scikit_svm (Seconds; fewer is better)
  1a: 31.47  (SE +/- 0.32, N = 5)
  2a: 31.13  (SE +/- 0.23, N = 15)
  4:  32.12  (SE +/- 0.22, N = 13)
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds; fewer is better)
  1a: 3.50  (SE +/- 0.04, N = 15)
  2a: 3.50  (SE +/- 0.07, N = 15)
  4:  3.41  (SE +/- 0.05, N = 12)
Phoronix Test Suite v10.8.5