NCNN mit Vulkan
AMD Ryzen 9 3950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS) and NVIDIA GeForce RTX 2080 Ti 11GB on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2009242-FI-NCNNMITVU25 .
System Details
Result identifiers: TR 3950X + RTX 2080 Ti, 2, 3

Processor: AMD Ryzen 9 3950X 16-Core @ 3.50GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 2000GB Corsair Force MP600 + 2000GB
Graphics: NVIDIA GeForce RTX 2080 Ti 11GB (1350/7000MHz)
Audio: NVIDIA TU102 HD Audio
Monitor: DELL P2415Q
Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.04
Kernel: 5.4.0-47-generic (x86_64)
Desktop: GNOME Shell 3.36.4
Display Server: X Server 1.20.8
Display Driver: NVIDIA 450.66
OpenGL: 4.6.0
OpenCL: OpenCL 1.2 CUDA 11.0.228 + OpenCL 2.0 AMD-APP (3182.0)
Vulkan: 1.2.133
Compiler: GCC 9.3.0 + CUDA 11.0
File-System: ext4
Screen Resolution: 3840x2160

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: acpi-cpufreq ondemand
- CPU Microcode: 0x8701013

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected
Result Overview (ms, Fewer Is Better)

Test                                    TR 3950X + RTX 2080 Ti       2       3
ncnn: CPU - squeezenet                                   27.61   26.72   26.42
ncnn: CPU - mobilenet                                    17.11   16.68   16.82
ncnn: CPU-v2-v2 - mobilenet-v2                            5.83    5.82    6.00
ncnn: CPU-v3-v3 - mobilenet-v3                            5.19    5.23    5.31
ncnn: CPU - shufflenet-v2                                 5.04    5.04    5.06
ncnn: CPU - mnasnet                                       5.23    5.23    5.35
ncnn: CPU - efficientnet-b0                               7.03    7.13    7.25
ncnn: CPU - blazeface                                     1.98    1.99    2.00
ncnn: CPU - googlenet                                    35.00   35.20   35.16
ncnn: CPU - vgg16                                        91.89   85.95   85.97
ncnn: CPU - resnet18                                     17.39   17.34   17.39
ncnn: CPU - alexnet                                      16.74   16.79   16.81
ncnn: CPU - resnet50                                     53.85   53.55   53.22
ncnn: CPU - yolov4-tiny                                  28.85   28.62   28.83
ncnn: Vulkan GPU - squeezenet                            28.16   28.02   28.53
ncnn: Vulkan GPU - mobilenet                              4.10    4.09    4.10
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2                     2.34    2.27    2.41
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3                     1.60    1.61    1.60
ncnn: Vulkan GPU - shufflenet-v2                          1.26    1.26    1.27
ncnn: Vulkan GPU - mnasnet                                1.42    1.42    1.42
ncnn: Vulkan GPU - efficientnet-b0                        2.52    2.53    2.52
ncnn: Vulkan GPU - blazeface                              0.59    0.59    0.58
ncnn: Vulkan GPU - googlenet                             35.17   35.29   35.16
ncnn: Vulkan GPU - vgg16                                 87.17   86.51   87.13
ncnn: Vulkan GPU - resnet18                              17.31   17.41   17.65
ncnn: Vulkan GPU - alexnet                                2.23    2.32    2.45
ncnn: Vulkan GPU - resnet50                              54.18   54.09   54.60
ncnn: Vulkan GPU - yolov4-tiny                            7.25    7.09    7.13
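The overview figures can be turned into a per-model CPU-versus-Vulkan ratio. The following is a minimal Python sketch, not part of the Phoronix Test Suite output: the dictionary names and the `vulkan_speedup` helper are hypothetical, and the numbers are copied from the overview above in the order TR 3950X + RTX 2080 Ti, 2, 3. It averages the three results per model and divides the CPU time by the Vulkan GPU time, so a value above 1 means the Vulkan target was faster.

```python
from statistics import mean

# Reported averages in ms, copied from the result overview.
# List order: "TR 3950X + RTX 2080 Ti", "2", "3".
cpu_ms = {
    "squeezenet": [27.61, 26.72, 26.42],
    "mobilenet": [17.11, 16.68, 16.82],
    "mobilenet-v2": [5.83, 5.82, 6.00],
    "mobilenet-v3": [5.19, 5.23, 5.31],
    "shufflenet-v2": [5.04, 5.04, 5.06],
    "mnasnet": [5.23, 5.23, 5.35],
    "efficientnet-b0": [7.03, 7.13, 7.25],
    "blazeface": [1.98, 1.99, 2.00],
    "googlenet": [35.00, 35.20, 35.16],
    "vgg16": [91.89, 85.95, 85.97],
    "resnet18": [17.39, 17.34, 17.39],
    "alexnet": [16.74, 16.79, 16.81],
    "resnet50": [53.85, 53.55, 53.22],
    "yolov4-tiny": [28.85, 28.62, 28.83],
}
vulkan_ms = {
    "squeezenet": [28.16, 28.02, 28.53],
    "mobilenet": [4.10, 4.09, 4.10],
    "mobilenet-v2": [2.34, 2.27, 2.41],
    "mobilenet-v3": [1.60, 1.61, 1.60],
    "shufflenet-v2": [1.26, 1.26, 1.27],
    "mnasnet": [1.42, 1.42, 1.42],
    "efficientnet-b0": [2.52, 2.53, 2.52],
    "blazeface": [0.59, 0.59, 0.58],
    "googlenet": [35.17, 35.29, 35.16],
    "vgg16": [87.17, 86.51, 87.13],
    "resnet18": [17.31, 17.41, 17.65],
    "alexnet": [2.23, 2.32, 2.45],
    "resnet50": [54.18, 54.09, 54.60],
    "yolov4-tiny": [7.25, 7.09, 7.13],
}


def vulkan_speedup(cpu, gpu):
    """Hypothetical helper: mean CPU time / mean Vulkan time per model."""
    return {model: mean(cpu[model]) / mean(gpu[model]) for model in cpu}


speedup = vulkan_speedup(cpu_ms, vulkan_ms)

# Print models from largest to smallest ratio.
for model, ratio in sorted(speedup.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:16s} {ratio:5.2f}x")
```

With these inputs, alexnet comes out at roughly 7.2x, while squeezenet lands slightly below 1x. The ratio only summarizes the reported averages and says nothing about why a given model behaves as it does.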
NCNN 20200916 - Target: CPU - Model: squeezenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 27.61 (SE +/- 0.26, N = 3; MIN: 26.1 / MAX: 36.11)
  2: 26.72 (SE +/- 0.42, N = 3; MIN: 25.12 / MAX: 36.54)
  3: 26.42 (SE +/- 0.13, N = 3; MIN: 24.93 / MAX: 27.32)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 17.11 (SE +/- 0.14, N = 3; MIN: 16.65 / MAX: 19.36)
  2: 16.68 (SE +/- 0.20, N = 3; MIN: 16.11 / MAX: 17.62)
  3: 16.82 (SE +/- 0.26, N = 3; MIN: 16.26 / MAX: 24.63)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 5.83 (SE +/- 0.04, N = 3; MIN: 5.66 / MAX: 6.57)
  2: 5.82 (SE +/- 0.06, N = 3; MIN: 5.59 / MAX: 8.08)
  3: 6.00 (SE +/- 0.07, N = 3; MIN: 5.77 / MAX: 7.88)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 5.19 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 5.38)
  2: 5.23 (SE +/- 0.01, N = 3; MIN: 5.11 / MAX: 7.07)
  3: 5.31 (SE +/- 0.09, N = 3; MIN: 5.15 / MAX: 14.95)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 5.04 (SE +/- 0.03, N = 3; MIN: 4.89 / MAX: 8.78)
  2: 5.04 (SE +/- 0.04, N = 3; MIN: 4.93 / MAX: 5.44)
  3: 5.06 (SE +/- 0.04, N = 3; MIN: 4.95 / MAX: 5.77)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 5.23 (SE +/- 0.02, N = 3; MIN: 5.12 / MAX: 6.1)
  2: 5.23 (SE +/- 0.03, N = 3; MIN: 5.11 / MAX: 5.46)
  3: 5.35 (SE +/- 0.07, N = 3; MIN: 5.18 / MAX: 5.7)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 7.03 (SE +/- 0.03, N = 3; MIN: 6.91 / MAX: 7.22)
  2: 7.13 (SE +/- 0.00, N = 3; MIN: 7.03 / MAX: 7.74)
  3: 7.25 (SE +/- 0.10, N = 3; MIN: 6.97 / MAX: 7.68)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 1.98 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 2.12)
  2: 1.99 (SE +/- 0.02, N = 3; MIN: 1.93 / MAX: 2.11)
  3: 2.00 (SE +/- 0.02, N = 3; MIN: 1.95 / MAX: 2.64)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 35.00 (SE +/- 0.06, N = 3; MIN: 34.18 / MAX: 35.87)
  2: 35.20 (SE +/- 0.17, N = 3; MIN: 34.38 / MAX: 44.97)
  3: 35.16 (SE +/- 0.08, N = 3; MIN: 34.6 / MAX: 36.58)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 91.89 (SE +/- 6.10, N = 3; MIN: 84.91 / MAX: 1168.4)
  2: 85.95 (SE +/- 0.12, N = 3; MIN: 84.99 / MAX: 100.08)
  3: 85.97 (SE +/- 0.12, N = 3; MIN: 85.07 / MAX: 95.89)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 17.39 (SE +/- 0.10, N = 3; MIN: 16.97 / MAX: 26.26)
  2: 17.34 (SE +/- 0.06, N = 3; MIN: 16.85 / MAX: 27.19)
  3: 17.39 (SE +/- 0.02, N = 3; MIN: 17.15 / MAX: 17.62)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 16.74 (SE +/- 0.04, N = 3; MIN: 16.53 / MAX: 17.35)
  2: 16.79 (SE +/- 0.02, N = 3; MIN: 16.6 / MAX: 17.47)
  3: 16.81 (SE +/- 0.09, N = 3; MIN: 16.5 / MAX: 18.96)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 53.85 (SE +/- 0.48, N = 3; MIN: 52.74 / MAX: 64.3)
  2: 53.55 (SE +/- 0.23, N = 3; MIN: 52.67 / MAX: 62.67)
  3: 53.22 (SE +/- 0.07, N = 3; MIN: 52.57 / MAX: 53.9)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 28.85 (SE +/- 0.13, N = 3; MIN: 28.43 / MAX: 29.45)
  2: 28.62 (SE +/- 0.09, N = 3; MIN: 28.24 / MAX: 37.65)
  3: 28.83 (SE +/- 0.16, N = 3; MIN: 28.3 / MAX: 35.97)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: squeezenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 28.16 (SE +/- 0.26, N = 3; MIN: 26.66 / MAX: 29.72)
  2: 28.02 (SE +/- 0.42, N = 3; MIN: 26.5 / MAX: 33.23)
  3: 28.53 (SE +/- 0.43, N = 3; MIN: 26.56 / MAX: 30.3)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 4.10 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 7.22)
  2: 4.09 (SE +/- 0.01, N = 3; MIN: 4.05 / MAX: 4.84)
  3: 4.10 (SE +/- 0.00, N = 3; MIN: 4.06 / MAX: 4.86)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 2.34 (SE +/- 0.01, N = 3; MIN: 1.34 / MAX: 15.98)
  2: 2.27 (SE +/- 0.08, N = 3; MIN: 1.34 / MAX: 23.06)
  3: 2.41 (SE +/- 0.03, N = 3; MIN: 1.35 / MAX: 15.97)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 1.60 (SE +/- 0.01, N = 3; MIN: 1.58 / MAX: 1.74)
  2: 1.61 (SE +/- 0.00, N = 3; MIN: 1.59 / MAX: 2.76)
  3: 1.60 (SE +/- 0.00, N = 3; MIN: 1.58 / MAX: 2.81)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.29)
  2: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.3)
  3: 1.27 (SE +/- 0.01, N = 3; MIN: 1.25 / MAX: 2.12)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 1.42 (SE +/- 0.00, N = 3; MIN: 1.4 / MAX: 1.51)
  2: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 2.58)
  3: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 1.88)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.55)
  2: 2.53 (SE +/- 0.01, N = 3; MIN: 2.51 / MAX: 2.65)
  3: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.79)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 5.35)
  2: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 8.6)
  3: 0.58 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 4.59)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 35.17 (SE +/- 0.21, N = 3; MIN: 34.11 / MAX: 44.36)
  2: 35.29 (SE +/- 0.17, N = 3; MIN: 34.45 / MAX: 43.71)
  3: 35.16 (SE +/- 0.18, N = 3; MIN: 34.22 / MAX: 43.96)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 87.17 (SE +/- 0.90, N = 3; MIN: 84.9 / MAX: 105.29)
  2: 86.51 (SE +/- 0.38, N = 3; MIN: 85.12 / MAX: 96.36)
  3: 87.13 (SE +/- 0.25, N = 3; MIN: 85.52 / MAX: 100.74)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 17.31 (SE +/- 0.12, N = 3; MIN: 15.73 / MAX: 18.73)
  2: 17.41 (SE +/- 0.12, N = 3; MIN: 17.08 / MAX: 19.12)
  3: 17.65 (SE +/- 0.20, N = 3; MIN: 16.27 / MAX: 25.81)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 2.23 (SE +/- 0.17, N = 3; MIN: 1.5 / MAX: 17.19)
  2: 2.32 (SE +/- 0.07, N = 3; MIN: 1.5 / MAX: 12.93)
  3: 2.45 (SE +/- 0.06, N = 3; MIN: 1.51 / MAX: 15.82)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 54.18 (SE +/- 0.30, N = 3; MIN: 52.65 / MAX: 62.38)
  2: 54.09 (SE +/- 0.67, N = 3; MIN: 52.17 / MAX: 61.67)
  3: 54.60 (SE +/- 0.19, N = 3; MIN: 53.73 / MAX: 64.51)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20200916 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
  TR 3950X + RTX 2080 Ti: 7.25 (SE +/- 0.17, N = 3; MIN: 6.88 / MAX: 8.47)
  2: 7.09 (SE +/- 0.02, N = 3; MIN: 6.89 / MAX: 9.07)
  3: 7.13 (SE +/- 0.01, N = 3; MIN: 6.93 / MAX: 8.62)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Phoronix Test Suite v10.8.4