hpc-xeon: 2 x Intel Xeon Platinum 8380 testing on an Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) motherboard with ASPEED graphics, running Ubuntu 20.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2105039-IB-HPCXEON8261&rdt&grr.
hpc-xeon (runs 1, 1a, 2, 2a, and 4 share this configuration)
  Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
  Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
  Chipset: Intel Device 0998
  Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN
  Disk: 2 x 7682GB INTEL SSDPF2KX076TZ + 2 x 800GB INTEL SSDPF21Q800GB + 3841GB Micron_9300_MTFDHAL3T8TDP + 960GB INTEL SSDSC2KG96
  Graphics: ASPEED
  Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
  OS: Ubuntu 20.04
  Kernel: 5.11.0-051100-generic (x86_64)
  Desktop: GNOME Shell 3.36.4
  Display Server: X Server 1.20.8
  Compiler: GCC 9.3.0
  File-System: ext4
  Screen Resolution: 1024x768

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details - 1, 2, 2a, 4: NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Details - Scaling Governor: intel_pstate powersave - CPU Microcode: 0xd000270
Python Details - Python 2.7.18 + Python 3.8.5
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
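The kernel detail above reports Transparent Huge Pages in madvise mode, meaning the kernel only considers huge pages for memory regions that explicitly opt in. A minimal, hypothetical Python sketch of such an opt-in (not part of any benchmark in this result file; it assumes Linux and the mmap.MADV_HUGEPAGE support added in Python 3.8):

    import mmap

    # Anonymous 1 GiB mapping; with THP set to "madvise", huge pages are only
    # used for regions that request them via madvise(MADV_HUGEPAGE).
    buf = mmap.mmap(-1, 1 << 30)
    if hasattr(mmap, "MADV_HUGEPAGE"):
        buf.madvise(mmap.MADV_HUGEPAGE)  # a hint, not a guarantee, of huge-page backing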
Benchmarks included (detailed per-run results for configurations 1, 1a, 2, 2a, and 4 follow below): Parboil (OpenMP MRI Gridding, LBM, Stencil, CUTCP), ECP-CANDLE (P3B2, P3B1, P1B2), DeepSpeech (CPU), AI Benchmark Alpha (Device AI / Training / Inference Scores), Darmstadt Automotive Parallel Heterogeneous Suite (Points2Image, Euclidean Cluster, NDT Mapping), NCNN (CPU targets across 15 models), Mlpack Benchmark (scikit_linearridgeregression, scikit_ica, scikit_svm, scikit_qda), High Performance Conjugate Gradient, ONNX Runtime (bertsquad-10, super-resolution-10, fcn-resnet101-11, yolov4, shufflenet-v2-10), Monte Carlo Simulations of Ionised Nebulae, IOR (2MB and 4MB block sizes), miniFE, GNU Octave Benchmark, Nebular Empirical Analysis Tool, OpenVINO (Face Detection, Person Detection, and Age Gender Recognition Retail models in FP16 and FP32), and ArrayFire (Conjugate Gradient CPU, BLAS CPU).
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, Fewer Is Better)
  1a: 415.58 (SE +/- 20.24, N = 6)
  2: 458.38 (SE +/- 17.63, N = 6)
  2a: 443.17 (SE +/- 16.35, N = 9)
  4: 408.71 (SE +/- 8.08, N = 9)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
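Each result below is reported as a mean with an "SE +/-" figure over N runs, which is presumably the standard error of the mean (sample standard deviation divided by the square root of N). A minimal sketch of that calculation; the per-trial timings here are made-up placeholders, not values taken from this result file:

    import statistics

    def standard_error(samples):
        # SE of the mean: sample standard deviation / sqrt(N)
        return statistics.stdev(samples) / len(samples) ** 0.5

    trials = [415.2, 440.7, 392.8, 401.5, 428.3, 414.9]  # hypothetical timings in seconds
    print(f"mean {statistics.mean(trials):.2f}, "
          f"SE +/- {standard_error(trials):.2f}, N = {len(trials)}")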
ECP-CANDLE 0.3 - Benchmark: P3B2 (Seconds, Fewer Is Better)
  1a: 3201.12
  2a: 3185.07
  4: 3188.63

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
  1a: 240.08 (SE +/- 9.43, N = 12)
  2: 179.72 (SE +/- 0.46, N = 3)
  2a: 182.43 (SE +/- 1.38, N = 12)
  4: 186.67 (SE +/- 4.30, N = 12)

AI Benchmark Alpha 0.1.2 - Device AI Score (Score, More Is Better)
  1a: 1711
  2a: 1728
  4: 1712

AI Benchmark Alpha 0.1.2 - Device Training Score (Score, More Is Better)
  1a: 572
  2a: 582
  4: 577

AI Benchmark Alpha 0.1.2 - Device Inference Score (Score, More Is Better)
  1a: 1139
  2a: 1146
  4: 1135

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
  1a: 6515.63 (SE +/- 66.31, N = 12)
  2: 6616.65 (SE +/- 71.05, N = 12)
  2a: 6393.69 (SE +/- 76.80, N = 4)
  4: 6495.99 (SE +/- 71.06, N = 12)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

ECP-CANDLE 0.3 - Benchmark: P3B1 (Seconds, Fewer Is Better)
  1a: 1282.47
  2a: 1292.22
  4: 1281.03
NCNN 20201218 (ms, Fewer Is Better) - all NCNN tests built with (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Target: CPU - Model: regnety_400m
  1a: 90.90 (SE +/- 0.56, N = 12; MIN: 87.44 / MAX: 198.82)
  2: 91.14 (SE +/- 0.96, N = 9; MIN: 86.2 / MAX: 160.68)
  2a: 89.61 (SE +/- 0.70, N = 3; MIN: 86.83 / MAX: 232.99)
  4: 90.84 (SE +/- 0.70, N = 13; MIN: 85.68 / MAX: 182.76)

Target: CPU - Model: squeezenet_ssd
  1a: 32.96 (SE +/- 0.91, N = 12; MIN: 30.31 / MAX: 200.45)
  2: 34.73 (SE +/- 1.38, N = 9; MIN: 29.98 / MAX: 260.21)
  2a: 30.90 (SE +/- 0.05, N = 3; MIN: 30.39 / MAX: 61.66)
  4: 33.16 (SE +/- 0.94, N = 13; MIN: 30.14 / MAX: 141.88)

Target: CPU - Model: yolov4-tiny
  1a: 41.38 (SE +/- 1.35, N = 12; MIN: 35.51 / MAX: 1384.91)
  2: 42.91 (SE +/- 1.43, N = 9; MIN: 35.84 / MAX: 953.59)
  2a: 40.14 (SE +/- 3.34, N = 3; MIN: 35.96 / MAX: 676.26)
  4: 40.37 (SE +/- 1.29, N = 13; MIN: 36.01 / MAX: 1384.64)

Target: CPU - Model: resnet50
  1a: 47.38 (SE +/- 0.97, N = 12; MIN: 40.86 / MAX: 304.03)
  2: 48.24 (SE +/- 1.43, N = 9; MIN: 40.1 / MAX: 658.14)
  2a: 41.65 (SE +/- 0.06, N = 3; MIN: 40.83 / MAX: 76.33)
  4: 46.98 (SE +/- 1.26, N = 13; MIN: 40.68 / MAX: 309.92)

Target: CPU - Model: alexnet
  1a: 8.65 (SE +/- 0.24, N = 12; MIN: 7.58 / MAX: 29.2)
  2: 8.62 (SE +/- 0.34, N = 9; MIN: 7.59 / MAX: 41.13)
  2a: 8.27 (SE +/- 0.61, N = 3; MIN: 7.56 / MAX: 10.65)
  4: 8.89 (SE +/- 0.27, N = 13; MIN: 7.58 / MAX: 47.92)

Target: CPU - Model: resnet18
  1a: 18.18 (SE +/- 0.60, N = 12; MIN: 15.63 / MAX: 50.57)
  2: 18.02 (SE +/- 0.79, N = 9; MIN: 15.61 / MAX: 52.43)
  2a: 16.78 (SE +/- 0.61, N = 3; MIN: 15.83 / MAX: 38.93)
  4: 17.42 (SE +/- 0.52, N = 13; MIN: 15.8 / MAX: 53.79)

Target: CPU - Model: vgg16
  1a: 50.93 (SE +/- 0.90, N = 12; MIN: 44.34 / MAX: 470.99)
  2: 51.92 (SE +/- 1.35, N = 9; MIN: 45.29 / MAX: 485.69)
  2a: 50.10 (SE +/- 0.86, N = 3; MIN: 46.01 / MAX: 368.47)
  4: 52.95 (SE +/- 0.97, N = 13; MIN: 44.1 / MAX: 1134.63)

Target: CPU - Model: googlenet
  1a: 30.60 (SE +/- 1.15, N = 12; MIN: 25.89 / MAX: 363.03)
  2: 30.38 (SE +/- 1.67, N = 9; MIN: 26.25 / MAX: 69.1)
  2a: 28.40 (SE +/- 1.36, N = 3; MIN: 26.44 / MAX: 59.77)
  4: 29.62 (SE +/- 1.00, N = 13; MIN: 26.59 / MAX: 97.53)

Target: CPU - Model: blazeface
  1a: 5.96 (SE +/- 0.16, N = 12; MIN: 5.47 / MAX: 8.13)
  2: 6.11 (SE +/- 0.21, N = 9; MIN: 5.43 / MAX: 33.62)
  2a: 5.79 (SE +/- 0.21, N = 3; MIN: 5.5 / MAX: 7.01)
  4: 5.77 (SE +/- 0.11, N = 13; MIN: 5.48 / MAX: 33.8)

Target: CPU - Model: efficientnet-b0
  1a: 19.28 (SE +/- 0.53, N = 12; MIN: 17.82 / MAX: 54.47)
  2: 20.15 (SE +/- 0.83, N = 9; MIN: 17.74 / MAX: 131.21)
  2a: 18.22 (SE +/- 0.03, N = 3; MIN: 17.86 / MAX: 36.46)
  4: 19.14 (SE +/- 0.38, N = 13; MIN: 17.72 / MAX: 51.28)

Target: CPU - Model: mnasnet
  1a: 16.08 (SE +/- 0.43, N = 12; MIN: 14.77 / MAX: 63.14)
  2: 15.71 (SE +/- 0.16, N = 9; MIN: 14.8 / MAX: 118.47)
  2a: 15.18 (SE +/- 0.06, N = 3; MIN: 14.91 / MAX: 42.82)
  4: 15.74 (SE +/- 0.27, N = 13; MIN: 14.84 / MAX: 49.53)

Target: CPU - Model: shufflenet-v2
  1a: 10.34 (SE +/- 0.22, N = 12; MIN: 9.78 / MAX: 30.48)
  2: 10.82 (SE +/- 0.39, N = 9; MIN: 9.82 / MAX: 284.11)
  2a: 10.01 (SE +/- 0.03, N = 3; MIN: 9.84 / MAX: 11.17)
  4: 10.27 (SE +/- 0.15, N = 13; MIN: 9.74 / MAX: 37.86)

Target: CPU-v3-v3 - Model: mobilenet-v3
  1a: 14.42 (SE +/- 0.39, N = 12; MIN: 13.27 / MAX: 61.57)
  2: 14.33 (SE +/- 0.38, N = 9; MIN: 13.43 / MAX: 22.53)
  2a: 13.93 (SE +/- 0.06, N = 3; MIN: 13.42 / MAX: 42.9)
  4: 14.68 (SE +/- 0.43, N = 13; MIN: 13.4 / MAX: 403.86)

Target: CPU-v2-v2 - Model: mobilenet-v2
  1a: 18.22 (SE +/- 0.56, N = 12; MIN: 16.61 / MAX: 93.24)
  2: 18.17 (SE +/- 0.60, N = 9; MIN: 16.66 / MAX: 255.58)
  2a: 17.11 (SE +/- 0.07, N = 3; MIN: 16.72 / MAX: 38.78)
  4: 18.00 (SE +/- 0.44, N = 13; MIN: 16.69 / MAX: 240.12)

Target: CPU - Model: mobilenet
  1a: 41.39 (SE +/- 1.21, N = 12; MIN: 37.92 / MAX: 77.52)
  2: 43.17 (SE +/- 1.67, N = 9; MIN: 37.53 / MAX: 168.48)
  2a: 39.28 (SE +/- 0.04, N = 3; MIN: 38.42 / MAX: 68.14)
  4: 41.05 (SE +/- 0.95, N = 13; MIN: 37.92 / MAX: 117.92)
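For "fewer is better" latency results like the NCNN numbers above, one way to compare the runs is each run's deviation from the fastest. A minimal sketch using the mobilenet row above as input (the values are copied from this result file; the comparison itself is not part of the export):

    # NCNN mobilenet (ms, fewer is better), per-run means from the table above
    results = {"1a": 41.39, "2": 43.17, "2a": 39.28, "4": 41.05}

    best = min(results.values())
    for run, ms in results.items():
        print(f"{run}: {ms:.2f} ms ({(ms / best - 1) * 100:+.1f}% vs. fastest run)")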
Mlpack Benchmark - Benchmark: scikit_linearridgeregression (Seconds, Fewer Is Better)
  1a: 3.50 (SE +/- 0.04, N = 15)
  2a: 3.50 (SE +/- 0.07, N = 15)
  4: 3.41 (SE +/- 0.05, N = 12)

Mlpack Benchmark - Benchmark: scikit_ica (Seconds, Fewer Is Better)
  1a: 65.57 (SE +/- 0.50, N = 15)
  2a: 69.04 (SE +/- 0.81, N = 4)
  4: 66.04 (SE +/- 0.88, N = 15)

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
  1a: 24.27 (SE +/- 0.05, N = 3)
  2: 24.25 (SE +/- 0.10, N = 3)
  2a: 24.22 (SE +/- 0.07, N = 3)
  4: 24.22 (SE +/- 0.06, N = 3)
  (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi

ONNX Runtime 1.6 - Model: bertsquad-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 441 (SE +/- 3.33, N = 3)
  2a: 443 (SE +/- 5.39, N = 3)
  4: 434 (SE +/- 7.23, N = 12)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

ONNX Runtime 1.6 - Model: super-resolution-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 6153 (SE +/- 234.15, N = 12)
  2a: 6771 (SE +/- 45.37, N = 3)
  4: 6810 (SE +/- 22.93, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

Monte Carlo Simulations of Ionised Nebulae 2019-03-24 - Input: Dust 2D tau100.0 (Seconds, Fewer Is Better)
  1a: 193
  2: 194
  2a: 193
  4: 194
  SE +/- 0.58, N = 3 (reported for one run)
  (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O3 -O2 -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

IOR 3.3.0 - Block Size: 4MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  1: 343.81 (SE +/- 2.36, N = 3; MIN: 284.82 / MAX: 637.37)
  2: 362.28 (SE +/- 2.35, N = 14; MIN: 299.19 / MAX: 670.93)
  2a: 357.00 (SE +/- 1.97, N = 3; MIN: 303.98 / MAX: 645.01)
  4: 365.79 (SE +/- 2.06, N = 3; MIN: 305.34 / MAX: 640.45)
  (CC) gcc options: -O2 -lm -pthread -lmpi

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
  1a: 442.63 (SE +/- 8.55, N = 12)
  2: 441.37 (SE +/- 8.80, N = 15)
  2a: 432.65 (SE +/- 11.68, N = 12)
  4: 430.14 (SE +/- 3.31, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

Mlpack Benchmark - Benchmark: scikit_svm (Seconds, Fewer Is Better)
  1a: 31.47 (SE +/- 0.32, N = 5)
  2a: 31.13 (SE +/- 0.23, N = 15)
  4: 32.12 (SE +/- 0.22, N = 13)

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
  1a: 16991.1 (SE +/- 154.34, N = 15)
  2: 17878.6 (SE +/- 343.64, N = 15)
  2a: 17815.5 (SE +/- 313.32, N = 15)
  4: 17495.6 (SE +/- 298.66, N = 15)
  (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

ONNX Runtime 1.6 - Model: fcn-resnet101-11 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 114 (SE +/- 0.29, N = 3)
  2a: 113 (SE +/- 0.76, N = 3)
  4: 113 (SE +/- 0.44, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

ONNX Runtime 1.6 - Model: yolov4 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 305 (SE +/- 1.44, N = 3)
  2a: 301 (SE +/- 3.79, N = 3)
  4: 300 (SE +/- 3.91, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

ONNX Runtime 1.6 - Model: shufflenet-v2-10 - Device: OpenMP CPU (Inferences Per Minute, More Is Better)
  1a: 7399 (SE +/- 28.82, N = 3)
  2a: 7328 (SE +/- 86.98, N = 3)
  4: 7393 (SE +/- 16.54, N = 3)
  (CXX) g++ options: -fopenmp -ffunction-sections -fdata-sections -O3 -ldl -lrt

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
  1a: 433.03 (SE +/- 5.41, N = 3)
  2: 421.38 (SE +/- 5.68, N = 15)
  2a: 405.38 (SE +/- 7.34, N = 15)
  4: 440.88 (SE +/- 4.85, N = 4)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp

Mlpack Benchmark - Benchmark: scikit_qda (Seconds, Fewer Is Better)
  1a: 30.66 (SE +/- 0.02, N = 3)
  2a: 33.88 (SE +/- 0.42, N = 3)
  4: 32.57 (SE +/- 0.30, N = 3)

GNU Octave Benchmark 5.2.0 (Seconds, Fewer Is Better)
  1a: 13.13 (SE +/- 0.10, N = 25)
  2: 12.93 (SE +/- 0.14, N = 25)
  2a: 12.82 (SE +/- 0.14, N = 20)
  4: 13.26 (SE +/- 0.11, N = 25)

IOR 3.3.0 - Block Size: 2MB - Disk Target: Default Test Directory (MB/s, More Is Better)
  1: 356.14 (SE +/- 3.31, N = 15; MIN: 304.39 / MAX: 641.59)
  2: 358.60 (SE +/- 1.88, N = 3; MIN: 316.53 / MAX: 627.12)
  2a: 363.19 (SE +/- 4.82, N = 3; MIN: 314.55 / MAX: 619.81)
  4: 359.89 (SE +/- 2.90, N = 3; MIN: 313.46 / MAX: 610.7)
  (CC) gcc options: -O2 -lm -pthread -lmpi

Nebular Empirical Analysis Tool 2020-02-29 (Seconds, Fewer Is Better)
  1a: 22.92 (SE +/- 0.21, N = 15)
  2: 22.37 (SE +/- 0.12, N = 3)
  2a: 22.91 (SE +/- 0.25, N = 15)
  4: 23.15 (SE +/- 0.30, N = 15)
  (F9X) gfortran options: -cpp -ffree-line-length-0 -Jsource/ -fopenmp -O3 -fno-backtrace
OpenVINO 2021.1 - all OpenVINO tests built with (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -pie -pthread -lpthread

Model: Face Detection 0106 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 2100.58 (SE +/- 4.10, N = 3)
  2: 2097.44 (SE +/- 8.24, N = 3)
  2a: 2102.66 (SE +/- 8.74, N = 3)
  4: 2100.78 (SE +/- 1.04, N = 3)

Model: Face Detection 0106 FP16 - Device: CPU (FPS, More Is Better)
  1a: 18.76 (SE +/- 0.09, N = 3)
  2: 18.79 (SE +/- 0.11, N = 3)
  2a: 18.78 (SE +/- 0.12, N = 3)
  4: 18.83 (SE +/- 0.03, N = 3)

Model: Face Detection 0106 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 2131.30 (SE +/- 5.79, N = 3)
  2: 2138.44 (SE +/- 2.13, N = 3)
  2a: 2136.83 (SE +/- 5.92, N = 3)
  4: 2138.20 (SE +/- 1.18, N = 3)

Model: Face Detection 0106 FP32 - Device: CPU (FPS, More Is Better)
  1a: 18.46 (SE +/- 0.10, N = 3)
  2: 18.44 (SE +/- 0.07, N = 3)
  2a: 18.45 (SE +/- 0.10, N = 3)
  4: 18.40 (SE +/- 0.02, N = 3)

Model: Person Detection 0106 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 3567.46 (SE +/- 15.45, N = 3)
  2: 3524.64 (SE +/- 6.11, N = 3)
  2a: 3538.09 (SE +/- 12.83, N = 3)
  4: 3533.40 (SE +/- 12.37, N = 3)

Model: Person Detection 0106 FP16 - Device: CPU (FPS, More Is Better)
  1a: 10.89 (SE +/- 0.05, N = 3)
  2: 11.02 (SE +/- 0.02, N = 3)
  2a: 11.02 (SE +/- 0.04, N = 3)
  4: 11.00 (SE +/- 0.03, N = 3)

Model: Person Detection 0106 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 3555.17 (SE +/- 9.50, N = 3)
  2a: 3563.66 (SE +/- 1.74, N = 3)
  4: 3585.49 (SE +/- 5.28, N = 3)

Model: Person Detection 0106 FP32 - Device: CPU (FPS, More Is Better)
  1a: 10.97 (SE +/- 0.05, N = 3)
  2a: 10.91 (SE +/- 0.03, N = 3)
  4: 10.84 (SE +/- 0.03, N = 3)

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better)
  1a: 0.79 (SE +/- 0.00, N = 3)
  2a: 0.96 (SE +/- 0.01, N = 3)
  4: 0.96 (SE +/- 0.01, N = 3)

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, More Is Better)
  1a: 43169.99 (SE +/- 109.04, N = 3)
  2a: 30539.47 (SE +/- 274.91, N = 3)
  4: 30845.88 (SE +/- 320.36, N = 3)

Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (ms, Fewer Is Better)
  1a: 0.8 (SE +/- 0.00, N = 3)
  2a: 0.8 (SE +/- 0.00, N = 3)
  4: 0.8 (SE +/- 0.00, N = 3)

Model: Age Gender Recognition Retail 0013 FP32 - Device: CPU (FPS, More Is Better)
  1a: 42475.84 (SE +/- 114.87, N = 3)
  2a: 42412.47 (SE +/- 56.80, N = 3)
  4: 42558.09 (SE +/- 63.17, N = 3)
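The OpenVINO results above report both a per-model latency (ms) and an aggregate throughput (FPS), and the two do not reduce to 1000/latency because the benchmark keeps many inference requests in flight at once. Assuming the ms figure is per-request latency (an interpretation not stated in the export), Little's law gives the implied concurrency as throughput times latency. A minimal sketch with the Face Detection FP16 numbers from run 1a above:

    latency_ms = 2100.58    # Face Detection 0106 FP16, run 1a (ms per request, assumed)
    throughput_fps = 18.76  # same model and run (inferences per second)

    # Little's law: average requests in flight = throughput rate * time in system
    in_flight = throughput_fps * latency_ms / 1000.0
    print(f"~{in_flight:.0f} concurrent inference requests implied")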
Parboil 2.5 - Test: OpenMP LBM (Seconds, Fewer Is Better)
  1a: 13.41 (SE +/- 0.11, N = 3)
  2: 13.13 (SE +/- 0.13, N = 12)
  2a: 13.39 (SE +/- 0.04, N = 3)
  4: 12.95 (SE +/- 0.12, N = 15)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ECP-CANDLE 0.3 - Benchmark: P1B2 (Seconds, Fewer Is Better)
  1a: 47.09
  2a: 44.98
  4: 45.52

ArrayFire 3.7 - Test: Conjugate Gradient CPU (ms, Fewer Is Better)
  1a: 4.101 (SE +/- 0.035, N = 3)
  2: 4.696 (SE +/- 0.122, N = 15)
  2a: 4.759 (SE +/- 0.132, N = 15)
  4: 4.370 (SE +/- 0.148, N = 12)
  (CXX) g++ options: -rdynamic

ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, More Is Better)
  1a: 4730.45 (SE +/- 13.93, N = 3)
  2: 4673.13 (SE +/- 9.91, N = 3)
  2a: 4678.03 (SE +/- 13.38, N = 3)
  4: 4684.87 (SE +/- 19.25, N = 3)
  (CXX) g++ options: -rdynamic

Parboil 2.5 - Test: OpenMP Stencil (Seconds, Fewer Is Better)
  1a: 1.653598 (SE +/- 0.011350, N = 3)
  2: 1.644848 (SE +/- 0.010033, N = 3)
  2a: 1.671065 (SE +/- 0.006902, N = 3)
  4: 1.664530 (SE +/- 0.001988, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
  1a: 1.409518 (SE +/- 0.015811, N = 4)
  2: 1.430858 (SE +/- 0.015699, N = 5)
  2a: 1.438047 (SE +/- 0.011700, N = 9)
  4: 1.407187 (SE +/- 0.016684, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Phoronix Test Suite v10.8.5