ncnn llama

Intel Core Ultra 9 285K testing with an ASUS ROG MAXIMUS Z890 HERO (1203 BIOS) and ASUS AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412290-NE-NCNNLLAMA46&grw&sor.

System configuration (runs a, b, c, d)

Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores)
Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS)
Chipset: Intel Device ae7f
Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1
Disk: Western Digital WD_BLACK SN850X 1000GB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: ASUS AMD Radeon RX 7900 XTX 24GB
Audio: Intel Device 7f50
Monitor: ASUS VP28U
Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7
OS: Ubuntu 24.10
Kernel: 6.11.0-13-generic (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 25.0~git2412210600.83a7d9~oibaf~o (git-83a7d9a 2024-12-21 oracular-oibaf-pp (LLVM 19.1.1 DRM 3.58)
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate powersave (EPP: balance_performance)
- CPU Microcode: 0x114
- Thermald 2.5.8

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Results overview

Test | a | b | c | d
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | 8.69 | 8.8 | 8.82 | 8.98
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | 50.91 | 50.65 | 51.12 | 50.86
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | 47.19 | 47.3 | 47.26 | 47.39
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | 46.32 | 46.44 | 46.26 |
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | 8.90 | 9.19 | 9.15 |
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | 51.20 | 51.62 | 51.35 |
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | 47.40 | 47.51 | 47.55 |
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | 46.53 | 46.54 | 46.58 |
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | 31.17 | 29.36 | 32.44 |
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | 112.06 | 111.69 | 113.29 |
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 | 97.95 | 97.78 | 98.11 |
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | 87.30 | 87.53 | 86.93 |
ncnn: CPU - mobilenet | 71.23 | 70.33 | 69.57 | 69.72
ncnn: CPU-v2-v2 - mobilenet-v2 | 44.82 | 55.68 | 48.02 | 45.05
ncnn: CPU-v3-v3 - mobilenet-v3 | 5.17 | 9.53 | 6.31 | 4.15
ncnn: CPU - shufflenet-v2 | 15.90 | 5.47 | 13.86 | 11.99
ncnn: CPU - mnasnet | 39.76 | 45.25 | 20.4 | 19.24
ncnn: CPU - efficientnet-b0 | 85.60 | 73.64 | 84.88 | 85.14
ncnn: CPU - blazeface | 7.70 | 7.31 | 18.69 | 10.65
ncnn: CPU - googlenet | 50.48 | 56.81 | 47.12 | 47.79
ncnn: CPU - vgg16 | 42.22 | 41.28 | 41.92 | 42.45
ncnn: CPU - resnet18 | 20.34 | 18.65 | 16.44 | 23.52
ncnn: CPU - alexnet | 18.78 | 18.38 | 17.86 | 17.64
ncnn: CPU - resnet50 | 55.70 | 59.72 | 53.65 | 62.05
ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 71.23 | 70.33 | 69.57 | 69.72
ncnn: CPU - yolov4-tiny | 43.56 | 43.71 | 44.17 | 44.73
ncnn: CPU - squeezenet_ssd | 79.12 | 81.94 | 78.21 | 83.35
ncnn: CPU - regnety_400m | 164.12 | 169.79 | 147.09 | 134.37
ncnn: CPU - vision_transformer | 104.35 | 103.92 | 107.06 | 105.12
ncnn: CPU - FastestDet | 62.66 | 57.86 | 70.6 | 45.58
ncnn: Vulkan GPU - mobilenet | 71.75 | 71.47 | 70.1 | 70.61
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 54.48 | 51.27 | 49.18 | 50.5
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 5.30 | 14.26 | 5.91 | 4.14
ncnn: Vulkan GPU - shufflenet-v2 | 19.28 | 31.22 | 31.51 | 8.22
ncnn: Vulkan GPU - mnasnet | 30.96 | 51.29 | 38.53 | 34.98
ncnn: Vulkan GPU - efficientnet-b0 | 82.40 | 79.07 | 75.92 | 65.04
ncnn: Vulkan GPU - blazeface | 11.13 | 8.83 | 17.35 | 6.12
ncnn: Vulkan GPU - googlenet | 47.36 | 54.37 | 53.68 | 50.05
ncnn: Vulkan GPU - vgg16 | 42.02 | 41.99 | 41.99 | 41.94
ncnn: Vulkan GPU - resnet18 | 20.83 | 17.4 | 23.88 | 17.27
ncnn: Vulkan GPU - alexnet | 17.23 | 18.75 | 18.41 | 16.67
ncnn: Vulkan GPU - resnet50 | 48.28 | 52.88 | 48.93 | 49.84
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 71.75 | 71.47 | 70.1 | 70.61
ncnn: Vulkan GPU - yolov4-tiny | 44.47 | 43.93 | 43.51 | 44.61
ncnn: Vulkan GPU - squeezenet_ssd | 77.93 | 78.04 | 79.2 | 79.67
ncnn: Vulkan GPU - regnety_400m | 189.21 | 196.2 | 157.04 | 132.19
ncnn: Vulkan GPU - vision_transformer | 104.88 | 101.49 | 102.97 | 103.92
ncnn: Vulkan GPU - FastestDet | 54.20 | 54.47 | 43.48 | 67.85
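One way to condense the twelve llama.cpp rows into a single figure per run is a geometric mean of the tokens-per-second values. The sketch below is only an illustration, not part of the Phoronix Test Suite output: the dictionary name and the `geometric_mean` helper are hypothetical, the numbers are copied from the overview for runs a, b and c, and run d is left out because it only reports three of the twelve llama.cpp tests.

```python
import math

# Tokens per second for the 12 llama.cpp tests, in overview order.
# Run d is omitted because it has results for only 3 of these tests.
llama_tokens_per_s = {
    "a": [8.69, 50.91, 47.19, 46.32, 8.90, 51.20, 47.40, 46.53,
          31.17, 112.06, 97.95, 87.30],
    "b": [8.8, 50.65, 47.3, 46.44, 9.19, 51.62, 47.51, 46.54,
          29.36, 111.69, 97.78, 87.53],
    "c": [8.82, 51.12, 47.26, 46.26, 9.15, 51.35, 47.55, 46.58,
          32.44, 113.29, 98.11, 86.93],
}


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


geomeans = {run: geometric_mean(vals) for run, vals in llama_tokens_per_s.items()}

for run, value in sorted(geomeans.items()):
    print(f"run {run}: geometric mean {value:.2f} tokens/s")
```

A geometric mean is used here so that the fast granite prompt-processing tests do not dominate the slow text-generation tests, as they would in an arithmetic mean.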

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4397)
  d: 8.98
  c: 8.82
  b: 8.80
  a: 8.69
SE +/- 0.06, N = 3
1. (CXX) g++ options: -O3
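To judge how far apart the four runs are on this test, the gap between the best and worst run can be expressed as a percentage of the worst. The following is a minimal sketch using the four values reported above; `spread_percent` is a hypothetical helper name, not something provided by the Phoronix Test Suite.

```python
def spread_percent(results):
    """Gap between the highest and lowest result, as a percentage of the lowest."""
    lowest = min(results.values())
    highest = max(results.values())
    return (highest - lowest) / lowest * 100.0


# Text Generation 128, Llama-3.1-Tulu-3-8B-Q8_0, tokens per second per run.
tg128 = {"a": 8.69, "b": 8.80, "c": 8.82, "d": 8.98}

spread = spread_percent(tg128)
print(f"best-to-worst spread: {spread:.1f}%")
```

The same helper can be applied to any other result block in this export by swapping in that block's per-run values.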

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 51.12
  a: 50.91
  d: 50.86
  b: 50.65
SE +/- 0.11, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4397)
  d: 47.39
  b: 47.30
  c: 47.26
  a: 47.19
SE +/- 0.05, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4397)
  b: 46.44
  a: 46.32
  c: 46.26
SE +/- 0.07, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4397)
  b: 9.19
  c: 9.15
  a: 8.90
SE +/- 0.09, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4397)
  b: 51.62
  c: 51.35
  a: 51.20
SE +/- 0.11, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 47.55
  b: 47.51
  a: 47.40
SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 46.58
  b: 46.54
  a: 46.53
SE +/- 0.05, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 32.44
  a: 31.17
  b: 29.36
SE +/- 0.39, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 113.29
  a: 112.06
  b: 111.69
SE +/- 0.35, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4397)
  c: 98.11
  a: 97.95
  b: 97.78
SE +/- 0.16, N = 3
1. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4397)
  b: 87.53
  a: 87.30
  c: 86.93
SE +/- 0.10, N = 3
1. (CXX) g++ options: -O3

NCNN

Target: CPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20241226)
  c: 69.57 (MIN: 9.08 / MAX: 80.55)
  d: 69.72 (MIN: 9.13 / MAX: 83.05)
  b: 70.33 (MIN: 11.53 / MAX: 80.88)
  a: 71.23 (MIN: 11.21 / MAX: 82.45)
SE +/- 0.62, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
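The NCNN results are reported as average latency in milliseconds, whereas the llama.cpp results are throughput figures. A latency can be turned into an approximate inferences-per-second figure by dividing 1000 by the millisecond value. The sketch below does this for the CPU mobilenet numbers above; the helper name is hypothetical, and the conversion assumes one inference at a time with no batching or overlap.

```python
def inferences_per_second(latency_ms):
    """Convert an average per-inference latency in ms to inferences per second."""
    return 1000.0 / latency_ms


# NCNN, Target: CPU, Model: mobilenet, average latency in ms per run.
mobilenet_cpu_ms = {"c": 69.57, "d": 69.72, "b": 70.33, "a": 71.23}

throughput = {run: inferences_per_second(ms) for run, ms in mobilenet_cpu_ms.items()}

for run, value in throughput.items():
    print(f"run {run}: {value:.2f} inferences/s")
```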

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20241226)
  a: 44.82 (MIN: 3.84 / MAX: 68.4)
  d: 45.05 (MIN: 3.75 / MAX: 68.38)
  c: 48.02 (MIN: 3.75 / MAX: 68.62)
  b: 55.68 (MIN: 3.78 / MAX: 68.51)
SE +/- 6.03, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20241226)
  d: 4.15 (MIN: 3.98 / MAX: 4.5)
  a: 5.17 (MIN: 3.96 / MAX: 78.99)
  c: 6.31 (MIN: 4.01 / MAX: 79.64)
  b: 9.53 (MIN: 4.04 / MAX: 79)
SE +/- 1.04, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20241226)
  b: 5.47 (MIN: 3.86 / MAX: 74.99)
  d: 11.99 (MIN: 3.84 / MAX: 74.44)
  c: 13.86 (MIN: 3.84 / MAX: 76.25)
  a: 15.90 (MIN: 3.82 / MAX: 75.54)
SE +/- 3.47, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20241226)
  d: 19.24 (MIN: 3.59 / MAX: 65.09)
  c: 20.40 (MIN: 3.6 / MAX: 66.21)
  a: 39.76 (MIN: 3.6 / MAX: 67.4)
  b: 45.25 (MIN: 3.72 / MAX: 67.37)
SE +/- 1.69, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20241226)
  b: 73.64 (MIN: 6.25 / MAX: 115.94)
  c: 84.88 (MIN: 6.28 / MAX: 113.65)
  d: 85.14 (MIN: 6.37 / MAX: 116.23)
  a: 85.60 (MIN: 6.21 / MAX: 115.16)
SE +/- 3.03, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

ms, Fewer Is Better (NCNN 20241226)
  b: 7.31 (MIN: 2.35 / MAX: 51.61)
  a: 7.70 (MIN: 2.32 / MAX: 53.44)
  d: 10.65 (MIN: 2.33 / MAX: 52.36)
  c: 18.69 (MIN: 2.34 / MAX: 52.3)
SE +/- 2.18, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

ms, Fewer Is Better (NCNN 20241226)
  c: 47.12 (MIN: 7.45 / MAX: 103.26)
  d: 47.79 (MIN: 7.54 / MAX: 105.7)
  a: 50.48 (MIN: 7.39 / MAX: 105.51)
  b: 56.81 (MIN: 7.49 / MAX: 102.72)
SE +/- 1.33, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

ms, Fewer Is Better (NCNN 20241226)
  b: 41.28 (MIN: 24.94 / MAX: 47.12)
  c: 41.92 (MIN: 24.97 / MAX: 47.79)
  a: 42.22 (MIN: 24.2 / MAX: 47.3)
  d: 42.45 (MIN: 25.96 / MAX: 47.91)
SE +/- 0.30, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

ms, Fewer Is Better (NCNN 20241226)
  c: 16.44 (MIN: 4.42 / MAX: 44.68)
  b: 18.65 (MIN: 4.47 / MAX: 45.11)
  a: 20.34 (MIN: 4.45 / MAX: 44.84)
  d: 23.52 (MIN: 4.46 / MAX: 46.34)
SE +/- 2.01, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

ms, Fewer Is Better (NCNN 20241226)
  d: 17.64 (MIN: 3.21 / MAX: 22.72)
  c: 17.86 (MIN: 3.21 / MAX: 22.48)
  b: 18.38 (MIN: 3.18 / MAX: 22.74)
  a: 18.78 (MIN: 3.18 / MAX: 22.69)
SE +/- 0.49, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

ms, Fewer Is Better (NCNN 20241226)
  c: 53.65 (MIN: 9.97 / MAX: 91.49)
  a: 55.70 (MIN: 9.9 / MAX: 93.55)
  b: 59.72 (MIN: 9.94 / MAX: 94)
  d: 62.05 (MIN: 9.98 / MAX: 90.99)
SE +/- 1.70, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

ms, Fewer Is Better (NCNN 20241226)
  c: 69.57 (MIN: 9.08 / MAX: 80.55)
  d: 69.72 (MIN: 9.13 / MAX: 83.05)
  b: 70.33 (MIN: 11.53 / MAX: 80.88)
  a: 71.23 (MIN: 11.21 / MAX: 82.45)
SE +/- 0.62, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20241226)
  a: 43.56 (MIN: 14.43 / MAX: 49.75)
  b: 43.71 (MIN: 19.46 / MAX: 49.45)
  c: 44.17 (MIN: 15.49 / MAX: 50.39)
  d: 44.73 (MIN: 16.6 / MAX: 50.31)
SE +/- 0.23, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20241226)
  c: 78.21 (MIN: 7.27 / MAX: 101.31)
  a: 79.12 (MIN: 7.22 / MAX: 103.38)
  b: 81.94 (MIN: 7.23 / MAX: 102.07)
  d: 83.35 (MIN: 7.34 / MAX: 103.9)
SE +/- 3.12, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20241226)
  d: 134.37 (MIN: 21.58 / MAX: 473.03)
  c: 147.09 (MIN: 21.39 / MAX: 477.74)
  a: 164.12 (MIN: 21.38 / MAX: 476.7)
  b: 169.79 (MIN: 21.74 / MAX: 479.47)
SE +/- 6.08, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vision_transformer

ms, Fewer Is Better (NCNN 20241226)
  b: 103.92 (MIN: 44.99 / MAX: 116.24)
  a: 104.35 (MIN: 41.42 / MAX: 116.07)
  d: 105.12 (MIN: 42.41 / MAX: 117.29)
  c: 107.06 (MIN: 40.18 / MAX: 117.48)
SE +/- 0.22, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20241226)
  d: 45.58 (MIN: 5.02 / MAX: 96.81)
  b: 57.86 (MIN: 5.06 / MAX: 96.39)
  a: 62.66 (MIN: 4.99 / MAX: 96.55)
  c: 70.60 (MIN: 5.07 / MAX: 96.45)
SE +/- 4.55, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20241226)
  c: 70.10 (MIN: 11.58 / MAX: 81.3)
  d: 70.61 (MIN: 13.63 / MAX: 80.56)
  b: 71.47 (MIN: 15.45 / MAX: 79.59)
  a: 71.75 (MIN: 11.94 / MAX: 80.56)
SE +/- 0.34, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20241226)
  c: 49.18 (MIN: 3.97 / MAX: 68.88)
  d: 50.50 (MIN: 3.77 / MAX: 68.22)
  b: 51.27 (MIN: 3.9 / MAX: 68.67)
  a: 54.48 (MIN: 3.8 / MAX: 68.76)
SE +/- 1.36, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20241226)
  d: 4.14 (MIN: 3.96 / MAX: 4.43)
  a: 5.30 (MIN: 3.99 / MAX: 80.9)
  c: 5.91 (MIN: 4 / MAX: 77.51)
  b: 14.26 (MIN: 4.02 / MAX: 79.3)
SE +/- 1.14, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20241226)
  d: 8.22 (MIN: 3.81 / MAX: 75.54)
  a: 19.28 (MIN: 3.83 / MAX: 76.29)
  b: 31.22 (MIN: 3.84 / MAX: 76.79)
  c: 31.51 (MIN: 3.86 / MAX: 75.58)
SE +/- 4.90, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20241226)
  a: 30.96 (MIN: 3.59 / MAX: 67.26)
  d: 34.98 (MIN: 3.62 / MAX: 66.99)
  c: 38.53 (MIN: 3.61 / MAX: 67.69)
  b: 51.29 (MIN: 3.6 / MAX: 66.32)
SE +/- 3.31, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20241226)
  d: 65.04 (MIN: 6.22 / MAX: 113.74)
  c: 75.92 (MIN: 6.25 / MAX: 113.38)
  b: 79.07 (MIN: 6.34 / MAX: 115.89)
  a: 82.40 (MIN: 6.21 / MAX: 115.66)
SE +/- 3.96, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

ms, Fewer Is Better (NCNN 20241226)
  d: 6.12 (MIN: 2.33 / MAX: 52.84)
  b: 8.83 (MIN: 2.35 / MAX: 52.07)
  a: 11.13 (MIN: 2.33 / MAX: 53.38)
  c: 17.35 (MIN: 2.34 / MAX: 53.07)
SE +/- 4.01, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

ms, Fewer Is Better (NCNN 20241226)
  a: 47.36 (MIN: 7.41 / MAX: 105.78)
  d: 50.05 (MIN: 7.41 / MAX: 102.05)
  c: 53.68 (MIN: 7.42 / MAX: 108.36)
  b: 54.37 (MIN: 7.48 / MAX: 102.35)
SE +/- 3.10, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

ms, Fewer Is Better (NCNN 20241226)
  d: 41.94 (MIN: 25.14 / MAX: 47.3)
  b: 41.99 (MIN: 25.33 / MAX: 46.26)
  c: 41.99 (MIN: 25.22 / MAX: 47.18)
  a: 42.02 (MIN: 24.02 / MAX: 47.17)
SE +/- 0.05, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
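Since each model is reported once for the CPU target and once for the Vulkan GPU target, the two can be compared per run as a latency ratio. The sketch below does this for vgg16, using the CPU and Vulkan GPU values reported in this export; the variable names are hypothetical, and a ratio below 1.0 simply means the Vulkan GPU figure was the lower (better) of the two for that run.

```python
# NCNN vgg16 average latency in ms per run, copied from this result file.
vgg16_cpu_ms = {"a": 42.22, "b": 41.28, "c": 41.92, "d": 42.45}
vgg16_vulkan_ms = {"a": 42.02, "b": 41.99, "c": 41.99, "d": 41.94}

# Ratio of Vulkan GPU latency to CPU latency for each run.
vulkan_to_cpu = {
    run: vgg16_vulkan_ms[run] / vgg16_cpu_ms[run] for run in sorted(vgg16_cpu_ms)
}

for run, ratio in vulkan_to_cpu.items():
    print(f"run {run}: Vulkan GPU / CPU latency ratio = {ratio:.3f}")
```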

NCNN

Target: Vulkan GPU - Model: resnet18

ms, Fewer Is Better (NCNN 20241226)
  d: 17.27 (MIN: 4.45 / MAX: 45.45)
  b: 17.40 (MIN: 4.47 / MAX: 44.79)
  a: 20.83 (MIN: 4.44 / MAX: 45.19)
  c: 23.88 (MIN: 4.49 / MAX: 44.81)
SE +/- 0.63, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

ms, Fewer Is Better (NCNN 20241226)
  d: 16.67 (MIN: 3.2 / MAX: 22.1)
  a: 17.23 (MIN: 3.2 / MAX: 22.74)
  c: 18.41 (MIN: 3.21 / MAX: 22.59)
  b: 18.75 (MIN: 3.27 / MAX: 23.09)
SE +/- 1.03, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

ms, Fewer Is Better (NCNN 20241226)
  a: 48.28 (MIN: 9.92 / MAX: 92.21)
  c: 48.93 (MIN: 10 / MAX: 90.33)
  d: 49.84 (MIN: 9.95 / MAX: 89.78)
  b: 52.88 (MIN: 10 / MAX: 91.72)
SE +/- 1.61, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

ms, Fewer Is Better (NCNN 20241226)
  c: 70.10 (MIN: 11.58 / MAX: 81.3)
  d: 70.61 (MIN: 13.63 / MAX: 80.56)
  b: 71.47 (MIN: 15.45 / MAX: 79.59)
  a: 71.75 (MIN: 11.94 / MAX: 80.56)
SE +/- 0.34, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20241226)
  c: 43.51 (MIN: 13.73 / MAX: 48.78)
  b: 43.93 (MIN: 12.4 / MAX: 49.68)
  a: 44.47 (MIN: 17.29 / MAX: 49.86)
  d: 44.61 (MIN: 14.39 / MAX: 49.43)
SE +/- 0.19, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20241226)
  a: 77.93 (MIN: 7.16 / MAX: 103.09)
  b: 78.04 (MIN: 7.28 / MAX: 103.38)
  c: 79.20 (MIN: 7.17 / MAX: 101.7)
  d: 79.67 (MIN: 7.3 / MAX: 101.71)
SE +/- 1.69, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20241226)
  d: 132.19 (MIN: 21.33 / MAX: 480.09)
  c: 157.04 (MIN: 21.6 / MAX: 479.14)
  a: 189.21 (MIN: 21.44 / MAX: 478.58)
  b: 196.20 (MIN: 21.65 / MAX: 485.47)
SE +/- 3.06, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

ms, Fewer Is Better (NCNN 20241226)
  b: 101.49 (MIN: 40.32 / MAX: 115.58)
  c: 102.97 (MIN: 40.99 / MAX: 116.63)
  d: 103.92 (MIN: 40.94 / MAX: 117.51)
  a: 104.88 (MIN: 39.82 / MAX: 118.73)
SE +/- 0.66, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20241226)
  c: 43.48 (MIN: 5.02 / MAX: 96.02)
  a: 54.20 (MIN: 5.02 / MAX: 95.65)
  b: 54.47 (MIN: 5.06 / MAX: 96.49)
  d: 67.85 (MIN: 5.04 / MAX: 96.21)
SE +/- 3.57, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread


Phoronix Test Suite v10.8.5