NCNN mit Vulkan

AMD Ryzen 9 3950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS) and NVIDIA GeForce RTX 2080 Ti 11GB on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2009242-FI-NCNNMITVU25.

NCNN mit Vulkan
Result identifiers: TR 3950X + RTX 2080 Ti, 2, 3

Processor: AMD Ryzen 9 3950X 16-Core @ 3.50GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 2000GB Corsair Force MP600 + 2000GB
Graphics: NVIDIA GeForce RTX 2080 Ti 11GB (1350/7000MHz)
Audio: NVIDIA TU102 HD Audio
Monitor: DELL P2415Q
Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.04
Kernel: 5.4.0-47-generic (x86_64)
Desktop: GNOME Shell 3.36.4
Display Server: X Server 1.20.8
Display Driver: NVIDIA 450.66
OpenGL: 4.6.0
OpenCL: OpenCL 1.2 CUDA 11.0.228 + OpenCL 2.0 AMD-APP (3182.0)
Vulkan: 1.2.133
Compiler: GCC 9.3.0 + CUDA 11.0
File-System: ext4
Screen Resolution: 3840x2160

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: acpi-cpufreq ondemand
- CPU Microcode: 0x8701013

Security Details
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
- srbds: Not affected
- tsx_async_abort: Not affected

NCNN mit Vulkan: result overview (ms, fewer is better)

Test                                    TR 3950X + RTX 2080 Ti       2       3
ncnn: CPU - squeezenet                                   27.61   26.72   26.42
ncnn: CPU - mobilenet                                    17.11   16.68   16.82
ncnn: CPU-v2-v2 - mobilenet-v2                            5.83    5.82    6.00
ncnn: CPU-v3-v3 - mobilenet-v3                            5.19    5.23    5.31
ncnn: CPU - shufflenet-v2                                 5.04    5.04    5.06
ncnn: CPU - mnasnet                                       5.23    5.23    5.35
ncnn: CPU - efficientnet-b0                               7.03    7.13    7.25
ncnn: CPU - blazeface                                     1.98    1.99    2.00
ncnn: CPU - googlenet                                    35.00   35.20   35.16
ncnn: CPU - vgg16                                        91.89   85.95   85.97
ncnn: CPU - resnet18                                     17.39   17.34   17.39
ncnn: CPU - alexnet                                      16.74   16.79   16.81
ncnn: CPU - resnet50                                     53.85   53.55   53.22
ncnn: CPU - yolov4-tiny                                  28.85   28.62   28.83
ncnn: Vulkan GPU - squeezenet                            28.16   28.02   28.53
ncnn: Vulkan GPU - mobilenet                              4.10    4.09    4.10
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2                     2.34    2.27    2.41
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3                     1.60    1.61    1.60
ncnn: Vulkan GPU - shufflenet-v2                          1.26    1.26    1.27
ncnn: Vulkan GPU - mnasnet                                1.42    1.42    1.42
ncnn: Vulkan GPU - efficientnet-b0                        2.52    2.53    2.52
ncnn: Vulkan GPU - blazeface                              0.59    0.59    0.58
ncnn: Vulkan GPU - googlenet                             35.17   35.29   35.16
ncnn: Vulkan GPU - vgg16                                 87.17   86.51   87.13
ncnn: Vulkan GPU - resnet18                              17.31   17.41   17.65
ncnn: Vulkan GPU - alexnet                                2.23    2.32    2.45
ncnn: Vulkan GPU - resnet50                              54.18   54.09   54.60
ncnn: Vulkan GPU - yolov4-tiny                            7.25    7.09    7.13
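One way to read the overview is to compare each model's CPU time with its Vulkan GPU time. The following is a minimal Python sketch of that comparison for four of the models, using values copied from the overview. The `results` dictionary, the helper names, and the choice to summarize the three result identifiers by their arithmetic mean are assumptions made for illustration; they are not part of the Phoronix Test Suite export.

```python
# Average inference times in ms (fewer is better), copied from the result overview.
# Keys are (target, model); each tuple holds the values for the result
# identifiers "TR 3950X + RTX 2080 Ti", "2" and "3", in that order.
results = {
    ("CPU", "mobilenet"): (17.11, 16.68, 16.82),
    ("Vulkan GPU", "mobilenet"): (4.10, 4.09, 4.10),
    ("CPU", "alexnet"): (16.74, 16.79, 16.81),
    ("Vulkan GPU", "alexnet"): (2.23, 2.32, 2.45),
    ("CPU", "yolov4-tiny"): (28.85, 28.62, 28.83),
    ("Vulkan GPU", "yolov4-tiny"): (7.25, 7.09, 7.13),
    ("CPU", "resnet50"): (53.85, 53.55, 53.22),
    ("Vulkan GPU", "resnet50"): (54.18, 54.09, 54.60),
}


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def cpu_to_vulkan_ratio(model):
    """Mean CPU time divided by mean Vulkan GPU time.

    A ratio above 1 means the Vulkan GPU target was faster for this model;
    a ratio near 1 means both targets reported about the same time.
    """
    cpu_ms = mean(results[("CPU", model)])
    gpu_ms = mean(results[("Vulkan GPU", model)])
    return cpu_ms / gpu_ms


if __name__ == "__main__":
    for model in ("mobilenet", "alexnet", "yolov4-tiny", "resnet50"):
        print(f"{model}: CPU/Vulkan ratio = {cpu_to_vulkan_ratio(model):.2f}")
```

With these inputs, the ratio is well above 1 for mobilenet, alexnet and yolov4-tiny, and close to 1 for resnet50, whose CPU and Vulkan GPU times are nearly identical in the overview.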

NCNN

Target: CPU - Model: squeezenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 27.61 (SE +/- 0.26, N = 3; MIN: 26.1 / MAX: 36.11)
2: 26.72 (SE +/- 0.42, N = 3; MIN: 25.12 / MAX: 36.54)
3: 26.42 (SE +/- 0.13, N = 3; MIN: 24.93 / MAX: 27.32)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mobilenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 17.11 (SE +/- 0.14, N = 3; MIN: 16.65 / MAX: 19.36)
2: 16.68 (SE +/- 0.20, N = 3; MIN: 16.11 / MAX: 17.62)
3: 16.82 (SE +/- 0.26, N = 3; MIN: 16.26 / MAX: 24.63)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 5.83 (SE +/- 0.04, N = 3; MIN: 5.66 / MAX: 6.57)
2: 5.82 (SE +/- 0.06, N = 3; MIN: 5.59 / MAX: 8.08)
3: 6.00 (SE +/- 0.07, N = 3; MIN: 5.77 / MAX: 7.88)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 5.19 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 5.38)
2: 5.23 (SE +/- 0.01, N = 3; MIN: 5.11 / MAX: 7.07)
3: 5.31 (SE +/- 0.09, N = 3; MIN: 5.15 / MAX: 14.95)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 5.04 (SE +/- 0.03, N = 3; MIN: 4.89 / MAX: 8.78)
2: 5.04 (SE +/- 0.04, N = 3; MIN: 4.93 / MAX: 5.44)
3: 5.06 (SE +/- 0.04, N = 3; MIN: 4.95 / MAX: 5.77)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 5.23 (SE +/- 0.02, N = 3; MIN: 5.12 / MAX: 6.1)
2: 5.23 (SE +/- 0.03, N = 3; MIN: 5.11 / MAX: 5.46)
3: 5.35 (SE +/- 0.07, N = 3; MIN: 5.18 / MAX: 5.7)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 7.03 (SE +/- 0.03, N = 3; MIN: 6.91 / MAX: 7.22)
2: 7.13 (SE +/- 0.00, N = 3; MIN: 7.03 / MAX: 7.74)
3: 7.25 (SE +/- 0.10, N = 3; MIN: 6.97 / MAX: 7.68)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 1.98 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 2.12)
2: 1.99 (SE +/- 0.02, N = 3; MIN: 1.93 / MAX: 2.11)
3: 2.00 (SE +/- 0.02, N = 3; MIN: 1.95 / MAX: 2.64)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 35.00 (SE +/- 0.06, N = 3; MIN: 34.18 / MAX: 35.87)
2: 35.20 (SE +/- 0.17, N = 3; MIN: 34.38 / MAX: 44.97)
3: 35.16 (SE +/- 0.08, N = 3; MIN: 34.6 / MAX: 36.58)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 91.89 (SE +/- 6.10, N = 3; MIN: 84.91 / MAX: 1168.4)
2: 85.95 (SE +/- 0.12, N = 3; MIN: 84.99 / MAX: 100.08)
3: 85.97 (SE +/- 0.12, N = 3; MIN: 85.07 / MAX: 95.89)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 17.39 (SE +/- 0.10, N = 3; MIN: 16.97 / MAX: 26.26)
2: 17.34 (SE +/- 0.06, N = 3; MIN: 16.85 / MAX: 27.19)
3: 17.39 (SE +/- 0.02, N = 3; MIN: 17.15 / MAX: 17.62)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 16.74 (SE +/- 0.04, N = 3; MIN: 16.53 / MAX: 17.35)
2: 16.79 (SE +/- 0.02, N = 3; MIN: 16.6 / MAX: 17.47)
3: 16.81 (SE +/- 0.09, N = 3; MIN: 16.5 / MAX: 18.96)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 53.85 (SE +/- 0.48, N = 3; MIN: 52.74 / MAX: 64.3)
2: 53.55 (SE +/- 0.23, N = 3; MIN: 52.67 / MAX: 62.67)
3: 53.22 (SE +/- 0.07, N = 3; MIN: 52.57 / MAX: 53.9)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 28.85 (SE +/- 0.13, N = 3; MIN: 28.43 / MAX: 29.45)
2: 28.62 (SE +/- 0.09, N = 3; MIN: 28.24 / MAX: 37.65)
3: 28.83 (SE +/- 0.16, N = 3; MIN: 28.3 / MAX: 35.97)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 28.16 (SE +/- 0.26, N = 3; MIN: 26.66 / MAX: 29.72)
2: 28.02 (SE +/- 0.42, N = 3; MIN: 26.5 / MAX: 33.23)
3: 28.53 (SE +/- 0.43, N = 3; MIN: 26.56 / MAX: 30.3)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 4.10 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 7.22)
2: 4.09 (SE +/- 0.01, N = 3; MIN: 4.05 / MAX: 4.84)
3: 4.10 (SE +/- 0.00, N = 3; MIN: 4.06 / MAX: 4.86)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 2.34 (SE +/- 0.01, N = 3; MIN: 1.34 / MAX: 15.98)
2: 2.27 (SE +/- 0.08, N = 3; MIN: 1.34 / MAX: 23.06)
3: 2.41 (SE +/- 0.03, N = 3; MIN: 1.35 / MAX: 15.97)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 1.60 (SE +/- 0.01, N = 3; MIN: 1.58 / MAX: 1.74)
2: 1.61 (SE +/- 0.00, N = 3; MIN: 1.59 / MAX: 2.76)
3: 1.60 (SE +/- 0.00, N = 3; MIN: 1.58 / MAX: 2.81)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.29)
2: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.3)
3: 1.27 (SE +/- 0.01, N = 3; MIN: 1.25 / MAX: 2.12)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 1.42 (SE +/- 0.00, N = 3; MIN: 1.4 / MAX: 1.51)
2: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 2.58)
3: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 1.88)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.55)
2: 2.53 (SE +/- 0.01, N = 3; MIN: 2.51 / MAX: 2.65)
3: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.79)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 5.35)
2: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 8.6)
3: 0.58 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 4.59)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 35.17 (SE +/- 0.21, N = 3; MIN: 34.11 / MAX: 44.36)
2: 35.29 (SE +/- 0.17, N = 3; MIN: 34.45 / MAX: 43.71)
3: 35.16 (SE +/- 0.18, N = 3; MIN: 34.22 / MAX: 43.96)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 87.17 (SE +/- 0.90, N = 3; MIN: 84.9 / MAX: 105.29)
2: 86.51 (SE +/- 0.38, N = 3; MIN: 85.12 / MAX: 96.36)
3: 87.13 (SE +/- 0.25, N = 3; MIN: 85.52 / MAX: 100.74)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 17.31 (SE +/- 0.12, N = 3; MIN: 15.73 / MAX: 18.73)
2: 17.41 (SE +/- 0.12, N = 3; MIN: 17.08 / MAX: 19.12)
3: 17.65 (SE +/- 0.20, N = 3; MIN: 16.27 / MAX: 25.81)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 2.23 (SE +/- 0.17, N = 3; MIN: 1.5 / MAX: 17.19)
2: 2.32 (SE +/- 0.07, N = 3; MIN: 1.5 / MAX: 12.93)
3: 2.45 (SE +/- 0.06, N = 3; MIN: 1.51 / MAX: 15.82)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 54.18 (SE +/- 0.30, N = 3; MIN: 52.65 / MAX: 62.38)
2: 54.09 (SE +/- 0.67, N = 3; MIN: 52.17 / MAX: 61.67)
3: 54.60 (SE +/- 0.19, N = 3; MIN: 53.73 / MAX: 64.51)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20200916 (ms, fewer is better)
TR 3950X + RTX 2080 Ti: 7.25 (SE +/- 0.17, N = 3; MIN: 6.88 / MAX: 8.47)
2: 7.09 (SE +/- 0.02, N = 3; MIN: 6.89 / MAX: 9.07)
3: 7.13 (SE +/- 0.01, N = 3; MIN: 6.93 / MAX: 8.62)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread


Phoronix Test Suite v10.8.4