NCNN with Vulkan

AMD Ryzen 9 3950X 16-Core testing with an ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS) and NVIDIA GeForce RTX 2080 Ti 11GB on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2009242-FI-NCNNMITVU25&grt&sro.

Result identifiers: TR 3950X + RTX 2080 Ti, 2, 3

Processor: AMD Ryzen 9 3950X 16-Core @ 3.50GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG CROSSHAIR VIII HERO (WI-FI) (1302 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16GB
Disk: 2000GB Corsair Force MP600 + 2000GB
Graphics: NVIDIA GeForce RTX 2080 Ti 11GB (1350/7000MHz)
Audio: NVIDIA TU102 HD Audio
Monitor: DELL P2415Q
Network: Realtek RTL8125 2.5GbE + Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 20.04
Kernel: 5.4.0-47-generic (x86_64)
Desktop: GNOME Shell 3.36.4
Display Server: X Server 1.20.8
Display Driver: NVIDIA 450.66
OpenGL: 4.6.0
OpenCL: OpenCL 1.2 CUDA 11.0.228 + OpenCL 2.0 AMD-APP (3182.0)
Vulkan: 1.2.133
Compiler: GCC 9.3.0 + CUDA 11.0
File-System: ext4
Screen Resolution: 3840x2160

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8701013

Security Details:
itlb_multihit: Not affected
l1tf: Not affected
mds: Not affected
meltdown: Not affected
spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
srbds: Not affected
tsx_async_abort: Not affected

Results overview (ms, Fewer Is Better)

Test                                    TR 3950X + RTX 2080 Ti  2       3
ncnn: CPU - squeezenet                  27.61                   26.72   26.42
ncnn: CPU - mobilenet                   17.11                   16.68   16.82
ncnn: CPU-v2-v2 - mobilenet-v2          5.83                    5.82    6.00
ncnn: CPU-v3-v3 - mobilenet-v3          5.19                    5.23    5.31
ncnn: CPU - shufflenet-v2               5.04                    5.04    5.06
ncnn: CPU - mnasnet                     5.23                    5.23    5.35
ncnn: CPU - efficientnet-b0             7.03                    7.13    7.25
ncnn: CPU - blazeface                   1.98                    1.99    2.00
ncnn: CPU - googlenet                   35.00                   35.20   35.16
ncnn: CPU - vgg16                       91.89                   85.95   85.97
ncnn: CPU - resnet18                    17.39                   17.34   17.39
ncnn: CPU - alexnet                     16.74                   16.79   16.81
ncnn: CPU - resnet50                    53.85                   53.55   53.22
ncnn: CPU - yolov4-tiny                 28.85                   28.62   28.83
ncnn: Vulkan GPU - squeezenet           28.16                   28.02   28.53
ncnn: Vulkan GPU - mobilenet            4.10                    4.09    4.10
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2   2.34                    2.27    2.41
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3   1.60                    1.61    1.60
ncnn: Vulkan GPU - shufflenet-v2        1.26                    1.26    1.27
ncnn: Vulkan GPU - mnasnet              1.42                    1.42    1.42
ncnn: Vulkan GPU - efficientnet-b0      2.52                    2.53    2.52
ncnn: Vulkan GPU - blazeface            0.59                    0.59    0.58
ncnn: Vulkan GPU - googlenet            35.17                   35.29   35.16
ncnn: Vulkan GPU - vgg16                87.17                   86.51   87.13
ncnn: Vulkan GPU - resnet18             17.31                   17.41   17.65
ncnn: Vulkan GPU - alexnet              2.23                    2.32    2.45
ncnn: Vulkan GPU - resnet50             54.18                   54.09   54.60
ncnn: Vulkan GPU - yolov4-tiny          7.25                    7.09    7.13
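One way to read the overview results is to divide each CPU time by the matching Vulkan GPU time. The sketch below does that for result "2" only, using the mean times (ms) copied from the overview. The dictionary names and the speedup helper are illustrative assumptions and are not part of the Phoronix Test Suite or NCNN.

```python
# Mean inference times in ms for result "2", copied from the results overview.
cpu_ms = {
    "squeezenet": 26.72, "mobilenet": 16.68, "mobilenet-v2": 5.82,
    "mobilenet-v3": 5.23, "shufflenet-v2": 5.04, "mnasnet": 5.23,
    "efficientnet-b0": 7.13, "blazeface": 1.99, "googlenet": 35.20,
    "vgg16": 85.95, "resnet18": 17.34, "alexnet": 16.79,
    "resnet50": 53.55, "yolov4-tiny": 28.62,
}
vulkan_ms = {
    "squeezenet": 28.02, "mobilenet": 4.09, "mobilenet-v2": 2.27,
    "mobilenet-v3": 1.61, "shufflenet-v2": 1.26, "mnasnet": 1.42,
    "efficientnet-b0": 2.53, "blazeface": 0.59, "googlenet": 35.29,
    "vgg16": 86.51, "resnet18": 17.41, "alexnet": 2.32,
    "resnet50": 54.09, "yolov4-tiny": 7.09,
}


def speedup(cpu, gpu):
    """Return CPU time divided by GPU time per model (above 1.0 means the GPU run was faster)."""
    return {model: cpu[model] / gpu[model] for model in cpu}


ratios = speedup(cpu_ms, vulkan_ms)

# Print models from largest to smallest ratio.
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:16s} {ratio:5.2f}x")
```

A ratio close to 1.0 (as for squeezenet, googlenet, vgg16, resnet18 and resnet50 here) means the Vulkan GPU result was about the same as the CPU result for that model; the data alone does not say why.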

NCNN

Target: CPU - Model: squeezenet

NCNN 20200916 (ms, Fewer Is Better)
2: 26.72 (SE +/- 0.42, N = 3; MIN: 25.12 / MAX: 36.54)
3: 26.42 (SE +/- 0.13, N = 3; MIN: 24.93 / MAX: 27.32)
TR 3950X + RTX 2080 Ti: 27.61 (SE +/- 0.26, N = 3; MIN: 26.1 / MAX: 36.11)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mobilenet

NCNN 20200916 (ms, Fewer Is Better)
2: 16.68 (SE +/- 0.20, N = 3; MIN: 16.11 / MAX: 17.62)
3: 16.82 (SE +/- 0.26, N = 3; MIN: 16.26 / MAX: 24.63)
TR 3950X + RTX 2080 Ti: 17.11 (SE +/- 0.14, N = 3; MIN: 16.65 / MAX: 19.36)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

NCNN 20200916 (ms, Fewer Is Better)
2: 5.82 (SE +/- 0.06, N = 3; MIN: 5.59 / MAX: 8.08)
3: 6.00 (SE +/- 0.07, N = 3; MIN: 5.77 / MAX: 7.88)
TR 3950X + RTX 2080 Ti: 5.83 (SE +/- 0.04, N = 3; MIN: 5.66 / MAX: 6.57)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

NCNN 20200916 (ms, Fewer Is Better)
2: 5.23 (SE +/- 0.01, N = 3; MIN: 5.11 / MAX: 7.07)
3: 5.31 (SE +/- 0.09, N = 3; MIN: 5.15 / MAX: 14.95)
TR 3950X + RTX 2080 Ti: 5.19 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 5.38)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

NCNN 20200916 (ms, Fewer Is Better)
2: 5.04 (SE +/- 0.04, N = 3; MIN: 4.93 / MAX: 5.44)
3: 5.06 (SE +/- 0.04, N = 3; MIN: 4.95 / MAX: 5.77)
TR 3950X + RTX 2080 Ti: 5.04 (SE +/- 0.03, N = 3; MIN: 4.89 / MAX: 8.78)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

NCNN 20200916 (ms, Fewer Is Better)
2: 5.23 (SE +/- 0.03, N = 3; MIN: 5.11 / MAX: 5.46)
3: 5.35 (SE +/- 0.07, N = 3; MIN: 5.18 / MAX: 5.7)
TR 3950X + RTX 2080 Ti: 5.23 (SE +/- 0.02, N = 3; MIN: 5.12 / MAX: 6.1)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

NCNN 20200916 (ms, Fewer Is Better)
2: 7.13 (SE +/- 0.00, N = 3; MIN: 7.03 / MAX: 7.74)
3: 7.25 (SE +/- 0.10, N = 3; MIN: 6.97 / MAX: 7.68)
TR 3950X + RTX 2080 Ti: 7.03 (SE +/- 0.03, N = 3; MIN: 6.91 / MAX: 7.22)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

NCNN 20200916 (ms, Fewer Is Better)
2: 1.99 (SE +/- 0.02, N = 3; MIN: 1.93 / MAX: 2.11)
3: 2.00 (SE +/- 0.02, N = 3; MIN: 1.95 / MAX: 2.64)
TR 3950X + RTX 2080 Ti: 1.98 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 2.12)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

NCNN 20200916 (ms, Fewer Is Better)
2: 35.20 (SE +/- 0.17, N = 3; MIN: 34.38 / MAX: 44.97)
3: 35.16 (SE +/- 0.08, N = 3; MIN: 34.6 / MAX: 36.58)
TR 3950X + RTX 2080 Ti: 35.00 (SE +/- 0.06, N = 3; MIN: 34.18 / MAX: 35.87)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

NCNN 20200916 (ms, Fewer Is Better)
2: 85.95 (SE +/- 0.12, N = 3; MIN: 84.99 / MAX: 100.08)
3: 85.97 (SE +/- 0.12, N = 3; MIN: 85.07 / MAX: 95.89)
TR 3950X + RTX 2080 Ti: 91.89 (SE +/- 6.10, N = 3; MIN: 84.91 / MAX: 1168.4)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

NCNN 20200916 (ms, Fewer Is Better)
2: 17.34 (SE +/- 0.06, N = 3; MIN: 16.85 / MAX: 27.19)
3: 17.39 (SE +/- 0.02, N = 3; MIN: 17.15 / MAX: 17.62)
TR 3950X + RTX 2080 Ti: 17.39 (SE +/- 0.10, N = 3; MIN: 16.97 / MAX: 26.26)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

NCNN 20200916 (ms, Fewer Is Better)
2: 16.79 (SE +/- 0.02, N = 3; MIN: 16.6 / MAX: 17.47)
3: 16.81 (SE +/- 0.09, N = 3; MIN: 16.5 / MAX: 18.96)
TR 3950X + RTX 2080 Ti: 16.74 (SE +/- 0.04, N = 3; MIN: 16.53 / MAX: 17.35)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

NCNN 20200916 (ms, Fewer Is Better)
2: 53.55 (SE +/- 0.23, N = 3; MIN: 52.67 / MAX: 62.67)
3: 53.22 (SE +/- 0.07, N = 3; MIN: 52.57 / MAX: 53.9)
TR 3950X + RTX 2080 Ti: 53.85 (SE +/- 0.48, N = 3; MIN: 52.74 / MAX: 64.3)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

NCNN 20200916 (ms, Fewer Is Better)
2: 28.62 (SE +/- 0.09, N = 3; MIN: 28.24 / MAX: 37.65)
3: 28.83 (SE +/- 0.16, N = 3; MIN: 28.3 / MAX: 35.97)
TR 3950X + RTX 2080 Ti: 28.85 (SE +/- 0.13, N = 3; MIN: 28.43 / MAX: 29.45)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet

NCNN 20200916 (ms, Fewer Is Better)
2: 28.02 (SE +/- 0.42, N = 3; MIN: 26.5 / MAX: 33.23)
3: 28.53 (SE +/- 0.43, N = 3; MIN: 26.56 / MAX: 30.3)
TR 3950X + RTX 2080 Ti: 28.16 (SE +/- 0.26, N = 3; MIN: 26.66 / MAX: 29.72)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20200916 (ms, Fewer Is Better)
2: 4.09 (SE +/- 0.01, N = 3; MIN: 4.05 / MAX: 4.84)
3: 4.10 (SE +/- 0.00, N = 3; MIN: 4.06 / MAX: 4.86)
TR 3950X + RTX 2080 Ti: 4.10 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 7.22)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20200916 (ms, Fewer Is Better)
2: 2.27 (SE +/- 0.08, N = 3; MIN: 1.34 / MAX: 23.06)
3: 2.41 (SE +/- 0.03, N = 3; MIN: 1.35 / MAX: 15.97)
TR 3950X + RTX 2080 Ti: 2.34 (SE +/- 0.01, N = 3; MIN: 1.34 / MAX: 15.98)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20200916 (ms, Fewer Is Better)
2: 1.61 (SE +/- 0.00, N = 3; MIN: 1.59 / MAX: 2.76)
3: 1.60 (SE +/- 0.00, N = 3; MIN: 1.58 / MAX: 2.81)
TR 3950X + RTX 2080 Ti: 1.60 (SE +/- 0.01, N = 3; MIN: 1.58 / MAX: 1.74)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20200916 (ms, Fewer Is Better)
2: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.3)
3: 1.27 (SE +/- 0.01, N = 3; MIN: 1.25 / MAX: 2.12)
TR 3950X + RTX 2080 Ti: 1.26 (SE +/- 0.00, N = 3; MIN: 1.25 / MAX: 1.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20200916 (ms, Fewer Is Better)
2: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 2.58)
3: 1.42 (SE +/- 0.00, N = 3; MIN: 1.41 / MAX: 1.88)
TR 3950X + RTX 2080 Ti: 1.42 (SE +/- 0.00, N = 3; MIN: 1.4 / MAX: 1.51)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20200916 (ms, Fewer Is Better)
2: 2.53 (SE +/- 0.01, N = 3; MIN: 2.51 / MAX: 2.65)
3: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.79)
TR 3950X + RTX 2080 Ti: 2.52 (SE +/- 0.00, N = 3; MIN: 2.51 / MAX: 2.55)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20200916 (ms, Fewer Is Better)
2: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 8.6)
3: 0.58 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 4.59)
TR 3950X + RTX 2080 Ti: 0.59 (SE +/- 0.01, N = 3; MIN: 0.56 / MAX: 5.35)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20200916 (ms, Fewer Is Better)
2: 35.29 (SE +/- 0.17, N = 3; MIN: 34.45 / MAX: 43.71)
3: 35.16 (SE +/- 0.18, N = 3; MIN: 34.22 / MAX: 43.96)
TR 3950X + RTX 2080 Ti: 35.17 (SE +/- 0.21, N = 3; MIN: 34.11 / MAX: 44.36)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20200916 (ms, Fewer Is Better)
2: 86.51 (SE +/- 0.38, N = 3; MIN: 85.12 / MAX: 96.36)
3: 87.13 (SE +/- 0.25, N = 3; MIN: 85.52 / MAX: 100.74)
TR 3950X + RTX 2080 Ti: 87.17 (SE +/- 0.90, N = 3; MIN: 84.9 / MAX: 105.29)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20200916 (ms, Fewer Is Better)
2: 17.41 (SE +/- 0.12, N = 3; MIN: 17.08 / MAX: 19.12)
3: 17.65 (SE +/- 0.20, N = 3; MIN: 16.27 / MAX: 25.81)
TR 3950X + RTX 2080 Ti: 17.31 (SE +/- 0.12, N = 3; MIN: 15.73 / MAX: 18.73)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20200916 (ms, Fewer Is Better)
2: 2.32 (SE +/- 0.07, N = 3; MIN: 1.5 / MAX: 12.93)
3: 2.45 (SE +/- 0.06, N = 3; MIN: 1.51 / MAX: 15.82)
TR 3950X + RTX 2080 Ti: 2.23 (SE +/- 0.17, N = 3; MIN: 1.5 / MAX: 17.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20200916 (ms, Fewer Is Better)
2: 54.09 (SE +/- 0.67, N = 3; MIN: 52.17 / MAX: 61.67)
3: 54.60 (SE +/- 0.19, N = 3; MIN: 53.73 / MAX: 64.51)
TR 3950X + RTX 2080 Ti: 54.18 (SE +/- 0.30, N = 3; MIN: 52.65 / MAX: 62.38)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20200916 (ms, Fewer Is Better)
2: 7.09 (SE +/- 0.02, N = 3; MIN: 6.89 / MAX: 9.07)
3: 7.13 (SE +/- 0.01, N = 3; MIN: 6.93 / MAX: 8.62)
TR 3950X + RTX 2080 Ti: 7.25 (SE +/- 0.17, N = 3; MIN: 6.88 / MAX: 8.47)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread


Phoronix Test Suite v10.8.4