ncnn llama
Intel Core Ultra 9 285K testing with an ASUS ROG MAXIMUS Z890 HERO (1203 BIOS) and an ASUS AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412290-NE-NCNNLLAMA46 .
System details (identical for runs a, b, c and d):
Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores)
Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS)
Chipset: Intel Device ae7f
Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1
Disk: Western Digital WD_BLACK SN850X 1000GB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: ASUS AMD Radeon RX 7900 XTX 24GB
Audio: Intel Device 7f50
Monitor: ASUS VP28U
Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7
OS: Ubuntu 24.10
Kernel: 6.11.0-13-generic (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 25.0~git2412210600.83a7d9~oibaf~o (git-83a7d9a 2024-12-21 oracular-oibaf-pp (LLVM 19.1.1 DRM 3.58)
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_performance); CPU Microcode: 0x114; Thermald 2.5.8
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result summary (runs a, b, c and d; "-" means no result was recorded for that run)

NCNN 20241226 - Target: CPU (ms, fewer is better)
Model                      a       b       c       d
mobilenet              71.23   70.33   69.57   69.72
mobilenet-v2           44.82   55.68   48.02   45.05
mobilenet-v3            5.17    9.53    6.31    4.15
shufflenet-v2          15.90    5.47   13.86   11.99
mnasnet                39.76   45.25   20.40   19.24
efficientnet-b0        85.60   73.64   84.88   85.14
blazeface               7.70    7.31   18.69   10.65
googlenet              50.48   56.81   47.12   47.79
vgg16                  42.22   41.28   41.92   42.45
resnet18               20.34   18.65   16.44   23.52
alexnet                18.78   18.38   17.86   17.64
resnet50               55.70   59.72   53.65   62.05
mobilenetv2-yolov3     71.23   70.33   69.57   69.72
yolov4-tiny            43.56   43.71   44.17   44.73
squeezenet_ssd         79.12   81.94   78.21   83.35
regnety_400m          164.12  169.79  147.09  134.37
vision_transformer    104.35  103.92  107.06  105.12
FastestDet             62.66   57.86   70.60   45.58

NCNN 20241226 - Target: Vulkan GPU (ms, fewer is better)
Model                      a       b       c       d
mobilenet              71.75   71.47   70.10   70.61
mobilenet-v2           54.48   51.27   49.18   50.50
mobilenet-v3            5.30   14.26    5.91    4.14
shufflenet-v2          19.28   31.22   31.51    8.22
mnasnet                30.96   51.29   38.53   34.98
efficientnet-b0        82.40   79.07   75.92   65.04
blazeface              11.13    8.83   17.35    6.12
googlenet              47.36   54.37   53.68   50.05
vgg16                  42.02   41.99   41.99   41.94
resnet18               20.83   17.40   23.88   17.27
alexnet                17.23   18.75   18.41   16.67
resnet50               48.28   52.88   48.93   49.84
mobilenetv2-yolov3     71.75   71.47   70.10   70.61
yolov4-tiny            44.47   43.93   43.51   44.61
squeezenet_ssd         77.93   78.04   79.20   79.67
regnety_400m          189.21  196.20  157.04  132.19
vision_transformer    104.88  101.49  102.97  103.92
FastestDet             54.20   54.47   43.48   67.85

Llama.cpp b4397 - Backend: CPU BLAS (tokens per second, more is better)
Model, test                                                        a       b       c       d
Llama-3.1-Tulu-3-8B-Q8_0, Text Generation 128                   8.69    8.80    8.82    8.98
Llama-3.1-Tulu-3-8B-Q8_0, Prompt Processing 512                50.91   50.65   51.12   50.86
Llama-3.1-Tulu-3-8B-Q8_0, Prompt Processing 1024               47.19   47.30   47.26   47.39
Llama-3.1-Tulu-3-8B-Q8_0, Prompt Processing 2048               46.32   46.44   46.26       -
Mistral-7B-Instruct-v0.3-Q8_0, Text Generation 128              8.90    9.19    9.15       -
Mistral-7B-Instruct-v0.3-Q8_0, Prompt Processing 512           51.20   51.62   51.35       -
Mistral-7B-Instruct-v0.3-Q8_0, Prompt Processing 1024          47.40   47.51   47.55       -
Mistral-7B-Instruct-v0.3-Q8_0, Prompt Processing 2048          46.53   46.54   46.58       -
granite-3.0-3b-a800m-instruct-Q8_0, Text Generation 128        31.17   29.36   32.44       -
granite-3.0-3b-a800m-instruct-Q8_0, Prompt Processing 512     112.06  111.69  113.29       -
granite-3.0-3b-a800m-instruct-Q8_0, Prompt Processing 1024     97.95   97.78   98.11       -
granite-3.0-3b-a800m-instruct-Q8_0, Prompt Processing 2048     87.30   87.53   86.93       -
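Several models show large spreads between runs a to d (for example, CPU mnasnet ranges from 19.24 ms to 45.25 ms). As a minimal sketch, assuming you want a single number for that spread, the Python below computes the coefficient of variation (sample standard deviation divided by the mean) over the four runs for a few CPU models copied from the summary. The 10% cut-off in the hypothetical `flag_noisy` helper is an arbitrary illustrative threshold, not a Phoronix Test Suite setting.

```python
from statistics import mean, stdev

# Per-run average latencies (ms) for a few NCNN CPU models, copied from the
# result summary (runs a, b, c, d).
CPU_LATENCY_MS = {
    "mobilenet": [71.23, 70.33, 69.57, 69.72],
    "vgg16": [42.22, 41.28, 41.92, 42.45],
    "mnasnet": [39.76, 45.25, 20.40, 19.24],
    "shufflenet-v2": [15.90, 5.47, 13.86, 11.99],
}

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (as a fraction)."""
    return stdev(values) / mean(values)

def flag_noisy(results, threshold=0.10):
    """Return model names whose run-to-run CV exceeds `threshold`.

    The default threshold is an illustrative assumption, not a PTS setting.
    """
    return sorted(name for name, vals in results.items()
                  if coefficient_of_variation(vals) > threshold)

if __name__ == "__main__":
    for name, vals in CPU_LATENCY_MS.items():
        cv = coefficient_of_variation(vals)
        print(f"{name:15s} mean={mean(vals):7.2f} ms  CV={cv:6.1%}")
    print("noisy:", flag_noisy(CPU_LATENCY_MS))
```

With these values, mobilenet and vgg16 stay near 1% while mnasnet and shufflenet-v2 exceed the threshold.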
NCNN 20241226 - Target: CPU - Model: mobilenet (ms, fewer is better; SE +/- 0.62, N = 3)
a: 71.23 (min 11.21, max 82.45); b: 70.33 (min 11.53, max 80.88); c: 69.57 (min 9.08, max 80.55); d: 69.72 (min 9.13, max 83.05)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
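Each chart reports a standard error with its sample count (here SE +/- 0.62, N = 3). Assuming that figure is the usual standard error of the mean (sample standard deviation divided by the square root of N), a rough 95% confidence interval needs a Student t multiplier rather than 1.96, because N is so small. The sketch below hard-codes a few two-sided 95% t critical values; the helper names are illustrative, not part of the Phoronix Test Suite.

```python
import math
from statistics import stdev

# Two-sided 95% Student t critical values, keyed by degrees of freedom (N - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(samples) / math.sqrt(len(samples))

def ci95_half_width(se, n):
    """Half-width of a 95% confidence interval for the mean, given SE and N."""
    return T_95[n - 1] * se

if __name__ == "__main__":
    # SE +/- 0.62 with N = 3, as reported for the CPU mobilenet result.
    print(f"95% CI half-width: +/- {ci95_half_width(0.62, 3):.2f} ms")
```

With SE = 0.62 and N = 3 this gives roughly +/- 2.67 ms; the export does not show which run that SE belongs to.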
NCNN 20241226 - Target: CPU - Model: mobilenet-v2 (ms, fewer is better; SE +/- 6.03, N = 3)
a: 44.82 (min 3.84, max 68.4); b: 55.68 (min 3.78, max 68.51); c: 48.02 (min 3.75, max 68.62); d: 45.05 (min 3.75, max 68.38)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: mobilenet-v3 (ms, fewer is better; SE +/- 1.04, N = 3)
a: 5.17 (min 3.96, max 78.99); b: 9.53 (min 4.04, max 79); c: 6.31 (min 4.01, max 79.64); d: 4.15 (min 3.98, max 4.5)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: shufflenet-v2 (ms, fewer is better; SE +/- 3.47, N = 3)
a: 15.90 (min 3.82, max 75.54); b: 5.47 (min 3.86, max 74.99); c: 13.86 (min 3.84, max 76.25); d: 11.99 (min 3.84, max 74.44)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: mnasnet (ms, fewer is better; SE +/- 1.69, N = 3)
a: 39.76 (min 3.6, max 67.4); b: 45.25 (min 3.72, max 67.37); c: 20.40 (min 3.6, max 66.21); d: 19.24 (min 3.59, max 65.09)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: efficientnet-b0 (ms, fewer is better; SE +/- 3.03, N = 3)
a: 85.60 (min 6.21, max 115.16); b: 73.64 (min 6.25, max 115.94); c: 84.88 (min 6.28, max 113.65); d: 85.14 (min 6.37, max 116.23)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: blazeface (ms, fewer is better; SE +/- 2.18, N = 3)
a: 7.70 (min 2.32, max 53.44); b: 7.31 (min 2.35, max 51.61); c: 18.69 (min 2.34, max 52.3); d: 10.65 (min 2.33, max 52.36)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: googlenet (ms, fewer is better; SE +/- 1.33, N = 3)
a: 50.48 (min 7.39, max 105.51); b: 56.81 (min 7.49, max 102.72); c: 47.12 (min 7.45, max 103.26); d: 47.79 (min 7.54, max 105.7)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vgg16 (ms, fewer is better; SE +/- 0.30, N = 3)
a: 42.22 (min 24.2, max 47.3); b: 41.28 (min 24.94, max 47.12); c: 41.92 (min 24.97, max 47.79); d: 42.45 (min 25.96, max 47.91)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet18 (ms, fewer is better; SE +/- 2.01, N = 3)
a: 20.34 (min 4.45, max 44.84); b: 18.65 (min 4.47, max 45.11); c: 16.44 (min 4.42, max 44.68); d: 23.52 (min 4.46, max 46.34)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: alexnet (ms, fewer is better; SE +/- 0.49, N = 3)
a: 18.78 (min 3.18, max 22.69); b: 18.38 (min 3.18, max 22.74); c: 17.86 (min 3.21, max 22.48); d: 17.64 (min 3.21, max 22.72)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet50 (ms, fewer is better; SE +/- 1.70, N = 3)
a: 55.70 (min 9.9, max 93.55); b: 59.72 (min 9.94, max 94); c: 53.65 (min 9.97, max 91.49); d: 62.05 (min 9.98, max 90.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: mobilenetv2-yolov3 (ms, fewer is better; SE +/- 0.62, N = 3)
a: 71.23 (min 11.21, max 82.45); b: 70.33 (min 11.53, max 80.88); c: 69.57 (min 9.08, max 80.55); d: 69.72 (min 9.13, max 83.05)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: yolov4-tiny (ms, fewer is better; SE +/- 0.23, N = 3)
a: 43.56 (min 14.43, max 49.75); b: 43.71 (min 19.46, max 49.45); c: 44.17 (min 15.49, max 50.39); d: 44.73 (min 16.6, max 50.31)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: squeezenet_ssd (ms, fewer is better; SE +/- 3.12, N = 3)
a: 79.12 (min 7.22, max 103.38); b: 81.94 (min 7.23, max 102.07); c: 78.21 (min 7.27, max 101.31); d: 83.35 (min 7.34, max 103.9)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: regnety_400m (ms, fewer is better; SE +/- 6.08, N = 3)
a: 164.12 (min 21.38, max 476.7); b: 169.79 (min 21.74, max 479.47); c: 147.09 (min 21.39, max 477.74); d: 134.37 (min 21.58, max 473.03)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vision_transformer (ms, fewer is better; SE +/- 0.22, N = 3)
a: 104.35 (min 41.42, max 116.07); b: 103.92 (min 44.99, max 116.24); c: 107.06 (min 40.18, max 117.48); d: 105.12 (min 42.41, max 117.29)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: FastestDet (ms, fewer is better; SE +/- 4.55, N = 3)
a: 62.66 (min 4.99, max 96.55); b: 57.86 (min 5.06, max 96.39); c: 70.60 (min 5.07, max 96.45); d: 45.58 (min 5.02, max 96.81)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
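The NCNN figures are average per-inference latencies in milliseconds. Assuming inferences run back to back, one at a time, latency converts to throughput by taking the reciprocal, and two runs can be compared as a simple ratio. A minimal sketch with hypothetical helper names:

```python
def latency_ms_to_throughput(latency_ms):
    """Convert an average per-inference latency (ms) to inferences per second.

    Assumes sequential, one-at-a-time inference, so throughput is simply the
    reciprocal of latency.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms

def speedup(baseline_ms, candidate_ms):
    """How many times faster `candidate` is than `baseline` (lower ms is better)."""
    return baseline_ms / candidate_ms

if __name__ == "__main__":
    # CPU blazeface, run d.
    print(f"blazeface d: {latency_ms_to_throughput(10.65):.1f} inferences/s")
    # CPU regnety_400m, run d versus run a.
    print(f"regnety_400m d vs a: {speedup(164.12, 134.37):.2f}x")
```

For CPU blazeface in run d, 10.65 ms corresponds to about 93.9 inferences per second; for CPU regnety_400m, run d (134.37 ms) is about 1.22x faster than run a (164.12 ms).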
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, fewer is better; SE +/- 0.34, N = 3)
a: 71.75 (min 11.94, max 80.56); b: 71.47 (min 15.45, max 79.59); c: 70.10 (min 11.58, max 81.3); d: 70.61 (min 13.63, max 80.56)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v2 (ms, fewer is better; SE +/- 1.36, N = 3)
a: 54.48 (min 3.8, max 68.76); b: 51.27 (min 3.9, max 68.67); c: 49.18 (min 3.97, max 68.88); d: 50.50 (min 3.77, max 68.22)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet-v3 (ms, fewer is better; SE +/- 1.14, N = 3)
a: 5.30 (min 3.99, max 80.9); b: 14.26 (min 4.02, max 79.3); c: 5.91 (min 4, max 77.51); d: 4.14 (min 3.96, max 4.43)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, fewer is better; SE +/- 4.90, N = 3)
a: 19.28 (min 3.83, max 76.29); b: 31.22 (min 3.84, max 76.79); c: 31.51 (min 3.86, max 75.58); d: 8.22 (min 3.81, max 75.54)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, fewer is better; SE +/- 3.31, N = 3)
a: 30.96 (min 3.59, max 67.26); b: 51.29 (min 3.6, max 66.32); c: 38.53 (min 3.61, max 67.69); d: 34.98 (min 3.62, max 66.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, fewer is better; SE +/- 3.96, N = 3)
a: 82.40 (min 6.21, max 115.66); b: 79.07 (min 6.34, max 115.89); c: 75.92 (min 6.25, max 113.38); d: 65.04 (min 6.22, max 113.74)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, fewer is better; SE +/- 4.01, N = 3)
a: 11.13 (min 2.33, max 53.38); b: 8.83 (min 2.35, max 52.07); c: 17.35 (min 2.34, max 53.07); d: 6.12 (min 2.33, max 52.84)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, fewer is better; SE +/- 3.10, N = 3)
a: 47.36 (min 7.41, max 105.78); b: 54.37 (min 7.48, max 102.35); c: 53.68 (min 7.42, max 108.36); d: 50.05 (min 7.41, max 102.05)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, fewer is better; SE +/- 0.05, N = 3)
a: 42.02 (min 24.02, max 47.17); b: 41.99 (min 25.33, max 46.26); c: 41.99 (min 25.22, max 47.18); d: 41.94 (min 25.14, max 47.3)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, fewer is better; SE +/- 0.63, N = 3)
a: 20.83 (min 4.44, max 45.19); b: 17.40 (min 4.47, max 44.79); c: 23.88 (min 4.49, max 44.81); d: 17.27 (min 4.45, max 45.45)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, fewer is better; SE +/- 1.03, N = 3)
a: 17.23 (min 3.2, max 22.74); b: 18.75 (min 3.27, max 23.09); c: 18.41 (min 3.21, max 22.59); d: 16.67 (min 3.2, max 22.1)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, fewer is better; SE +/- 1.61, N = 3)
a: 48.28 (min 9.92, max 92.21); b: 52.88 (min 10, max 91.72); c: 48.93 (min 10, max 90.33); d: 49.84 (min 9.95, max 89.78)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, fewer is better; SE +/- 0.34, N = 3)
a: 71.75 (min 11.94, max 80.56); b: 71.47 (min 15.45, max 79.59); c: 70.10 (min 11.58, max 81.3); d: 70.61 (min 13.63, max 80.56)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, fewer is better; SE +/- 0.19, N = 3)
a: 44.47 (min 17.29, max 49.86); b: 43.93 (min 12.4, max 49.68); c: 43.51 (min 13.73, max 48.78); d: 44.61 (min 14.39, max 49.43)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, fewer is better; SE +/- 1.69, N = 3)
a: 77.93 (min 7.16, max 103.09); b: 78.04 (min 7.28, max 103.38); c: 79.20 (min 7.17, max 101.7); d: 79.67 (min 7.3, max 101.71)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, fewer is better; SE +/- 3.06, N = 3)
a: 189.21 (min 21.44, max 478.58); b: 196.20 (min 21.65, max 485.47); c: 157.04 (min 21.6, max 479.14); d: 132.19 (min 21.33, max 480.09)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, fewer is better; SE +/- 0.66, N = 3)
a: 104.88 (min 39.82, max 118.73); b: 101.49 (min 40.32, max 115.58); c: 102.97 (min 40.99, max 116.63); d: 103.92 (min 40.94, max 117.51)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, fewer is better; SE +/- 3.57, N = 3)
a: 54.20 (min 5.02, max 95.65); b: 54.47 (min 5.06, max 96.49); c: 43.48 (min 5.02, max 96.02); d: 67.85 (min 5.04, max 96.21)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
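To compare the two NCNN targets within the same run, a percent change in latency can be easier to read than raw milliseconds. The sketch below uses run a values for a few models copied from the CPU and Vulkan GPU results; a negative number means the Vulkan GPU target had lower latency. The model selection and the `percent_change` helper are illustrative only.

```python
# Run "a" average latencies in ms as (CPU, Vulkan GPU), copied from the
# results above. The selection of models is illustrative.
RUN_A_MS = {
    "resnet50": (55.70, 48.28),
    "regnety_400m": (164.12, 189.21),
    "vgg16": (42.22, 42.02),
    "efficientnet-b0": (85.60, 82.40),
}

def percent_change(cpu_ms, gpu_ms):
    """Percent change in latency from the CPU target to the Vulkan GPU target.

    Negative values mean the Vulkan GPU target was faster (lower latency).
    """
    return (gpu_ms - cpu_ms) / cpu_ms * 100.0

if __name__ == "__main__":
    for model, (cpu_ms, gpu_ms) in RUN_A_MS.items():
        print(f"{model:16s} {percent_change(cpu_ms, gpu_ms):+6.1f}%")
```

For run a, resnet50 comes out about 13.3% lower on the Vulkan GPU target, while regnety_400m is about 15.3% higher.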
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.06, N = 3)
a: 8.69; b: 8.80; c: 8.82; d: 8.98
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 0.11, N = 3)
a: 50.91; b: 50.65; c: 51.12; d: 50.86
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 0.05, N = 3)
a: 47.19; b: 47.30; c: 47.26; d: 47.39
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 0.07, N = 3)
a: 46.32; b: 46.44; c: 46.26
(CXX) g++ options: -O3
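Prompt-processing throughput for Llama-3.1-Tulu-3-8B-Q8_0 falls slightly as the prompt grows from 512 to 2048 tokens. Treating each figure as a constant average rate (an approximation that ignores any fixed per-request overhead), the time to ingest a prompt is its length divided by the rate. A minimal sketch using run a values from the three prompt-processing results; the helper names are hypothetical:

```python
# Run "a" prompt-processing throughput (tokens/s) for Llama-3.1-Tulu-3-8B-Q8_0,
# keyed by prompt length in tokens, copied from the results above.
PP_RUN_A = {512: 50.91, 1024: 47.19, 2048: 46.32}

def seconds_to_process(prompt_tokens, tokens_per_s):
    """Approximate prompt ingestion time, assuming a constant average rate."""
    return prompt_tokens / tokens_per_s

def relative_drop(base_rate, other_rate):
    """Fractional throughput loss of `other_rate` relative to `base_rate`."""
    return (base_rate - other_rate) / base_rate

if __name__ == "__main__":
    for n, rate in PP_RUN_A.items():
        print(f"{n:5d} tokens: ~{seconds_to_process(n, rate):5.1f} s")
    drop = relative_drop(PP_RUN_A[512], PP_RUN_A[2048])
    print(f"512 -> 2048 throughput drop: {drop:.1%}")
```

Under that assumption a 512-token prompt takes about 10.1 s and a 2048-token prompt about 44.2 s, with throughput roughly 9% lower at 2048 tokens.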
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.09, N = 3)
a: 8.90; b: 9.19; c: 9.15
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 0.11, N = 3)
a: 51.20; b: 51.62; c: 51.35
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 0.04, N = 3)
a: 47.40; b: 47.51; c: 47.55
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 0.05, N = 3)
a: 46.53; b: 46.54; c: 46.58
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.39, N = 3)
a: 31.17; b: 29.36; c: 32.44
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 0.35, N = 3)
a: 112.06; b: 111.69; c: 113.29
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 0.16, N = 3)
a: 97.95; b: 97.78; c: 98.11
(CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 0.10, N = 3)
a: 87.30; b: 87.53; c: 86.93
(CXX) g++ options: -O3
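Text-generation throughput can also be read as average time per generated token, which is the reciprocal of tokens per second (assuming a steady rate over the 128 generated tokens). A minimal sketch using run a values from the three Text Generation 128 results; the helper name is hypothetical:

```python
# Run "a" Text Generation 128 throughput (tokens/s), copied from the results above.
TG128_RUN_A = {
    "Llama-3.1-Tulu-3-8B-Q8_0": 8.69,
    "Mistral-7B-Instruct-v0.3-Q8_0": 8.90,
    "granite-3.0-3b-a800m-instruct-Q8_0": 31.17,
}

def ms_per_token(tokens_per_s):
    """Average time per generated token in milliseconds (steady-rate assumption)."""
    return 1000.0 / tokens_per_s

if __name__ == "__main__":
    for model, rate in TG128_RUN_A.items():
        print(f"{model:36s} {ms_per_token(rate):6.1f} ms/token")
    ratio = (TG128_RUN_A["granite-3.0-3b-a800m-instruct-Q8_0"]
             / TG128_RUN_A["Llama-3.1-Tulu-3-8B-Q8_0"])
    print(f"granite vs Tulu generation speed: {ratio:.2f}x")
```

In run a this works out to about 115.1 ms per token for Llama-3.1-Tulu-3-8B-Q8_0 and about 32.1 ms per token for granite-3.0-3b-a800m-instruct-Q8_0, a ratio of roughly 3.59x.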
Phoronix Test Suite v10.8.5