ncnn llama: Intel Core Ultra 7 256V testing with an ASUS Zenbook S 14 UX5406SA_UX5406SA UX5406SA v1.0 (UX5406SA.300 BIOS) and ASUS Intel LNL 7GB graphics on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412296-NE-NCNNLLAMA15&grr .
System configuration (shared by runs a, b, c and d)
Processor: Intel Core Ultra 7 256V @ 4.70GHz (8 Cores)
Motherboard: ASUS Zenbook S 14 UX5406SA_UX5406SA UX5406SA v1.0 (UX5406SA.300 BIOS)
Chipset: Intel Device a87f
Memory: 8 x 2GB LPDDR5-8533MT/s Samsung
Disk: 1024GB Western Digital WD PC SN560 SDDPNQE-1T00-1102
Graphics: ASUS Intel LNL 7GB
Audio: Intel Lunar Lake-M HD Audio
Network: Intel Device a840
OS: Ubuntu 24.10
Kernel: 6.12.0-rc6-phx-drm-next (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 25.0~git2411250600.45c523~oibaf~o (git-45c5231 2024-11-25 oracular-oibaf-pp
OpenCL: OpenCL 3.0
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 2880x1800

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details:
  Scaling Governor: intel_pstate powersave (EPP: performance)
  Platform Profile: performance
  CPU Microcode: 0x114
  Thermald 2.5.8
  ACPI Profile: performance
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result overview (runs a, b, c, d)

llama-cpp, Backend: CPU BLAS (tokens per second, more is better)
Model                               Test                          a      b      c      d
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 2048    27.16  27.09  27.15  27.01
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 2048    27.11  27.22  27.19  27.26
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 1024    28.00  27.90  26.25  27.81
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 1024    28.09  27.93  27.78  26.87
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 2048    58.73  61.40  56.79  57.28
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 512     28.70  28.30  28.54  28.50
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 512     28.77  28.61  28.43  28.51
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 1024    60.83  61.49  54.08  54.39
Llama-3.1-Tulu-3-8B-Q8_0            Text Generation 128        8.87   8.85   8.85   8.83
Mistral-7B-Instruct-v0.3-Q8_0       Text Generation 128        9.24   9.22   9.20   9.21
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 512     61.73  61.34  60.85  62.31
granite-3.0-3b-a800m-instruct-Q8_0  Text Generation 128       38.35  38.35  38.43  37.64

ncnn, Target: CPU (ms, fewer is better)
Model                     a      b      c      d
FastestDet             4.93   4.69   4.68   4.94
vision_transformer   199.02 199.84 197.49 199.30
regnety_400m          21.56  16.51  16.90  16.34
squeezenet_ssd        15.17   9.55   9.66   9.63
yolov4-tiny           17.70  17.62  17.54  17.93
mobilenetv2-yolov3    14.14  13.59  14.40  14.29
resnet50              30.49  18.96  19.25  19.20
alexnet                8.39   5.86   6.12   6.06
resnet18              11.36   7.51   7.59   7.62
vgg16                 48.72  44.40  44.85  44.32
googlenet             16.62  11.21  11.26  11.36
blazeface              2.37   1.84   1.88   1.94
efficientnet-b0       11.16   7.99   7.84   7.95
mnasnet                6.33   4.48   4.49   4.52
shufflenet-v2          4.82   3.65   3.64   3.65
mobilenet-v3           6.13   4.50   4.43   4.53
mobilenet-v2           7.01   5.00   4.90   4.98
mobilenet             14.14  13.59  14.40  14.29

ncnn, Target: Vulkan GPU (ms, fewer is better)
Model                     a      b      c      d
FastestDet             4.82   4.95   4.81   4.63
vision_transformer   196.62 200.12 202.37 198.73
regnety_400m          16.59  16.73  16.39  16.25
squeezenet_ssd         9.57   9.77   9.66   9.70
yolov4-tiny           17.64  17.74  17.51  17.84
mobilenetv2-yolov3    13.99  14.35  13.88  13.63
resnet50              19.10  19.26  19.06  19.06
alexnet                5.91   6.17   5.91   5.94
resnet18               7.56   7.65   7.55   7.56
vgg16                 44.95  44.61  44.65  45.04
googlenet             11.25  11.28  11.27  11.25
blazeface              1.92   1.92   1.92   1.91
efficientnet-b0        7.92   7.99   7.87   7.98
mnasnet                4.47   4.37   4.40   4.42
shufflenet-v2          3.65   3.66   3.65   3.66
mobilenet-v3           4.51   4.39   4.53   4.49
mobilenet-v2           4.75   4.87   5.00   4.96
mobilenet             13.99  14.35  13.88  13.63
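One rough way to quantify the run-to-run spread in this overview is the coefficient of variation (sample standard deviation divided by the mean) across runs a, b, c and d. The sketch below computes it for a few rows copied from the table above. The 5% flag threshold and the shortened row labels are assumptions made for illustration. Neither comes from the Phoronix Test Suite or OpenBenchmarking.org.

```python
from statistics import mean, stdev

# A few rows copied from the result overview (runs a, b, c, d).
# The labels are shortened, hypothetical names used only in this sketch.
RESULTS = {
    "llama-cpp Tulu-3-8B PP2048 (tok/s)": [27.16, 27.09, 27.15, 27.01],
    "llama-cpp granite-3.0-3b PP1024 (tok/s)": [60.83, 61.49, 54.08, 54.39],
    "ncnn CPU resnet50 (ms)": [30.49, 18.96, 19.25, 19.20],
    "ncnn Vulkan GPU resnet50 (ms)": [19.10, 19.26, 19.06, 19.06],
}


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def summarize(results, threshold_pct=5.0):
    """Return (label, CV in %, flagged) tuples; flagged means CV > threshold_pct."""
    rows = []
    for label, values in results.items():
        cv = coefficient_of_variation(values)
        rows.append((label, round(cv, 2), cv > threshold_pct))
    return rows


if __name__ == "__main__":
    for label, cv, flagged in summarize(RESULTS):
        note = "  <- high run-to-run variation" if flagged else ""
        print(f"{label}: CV = {cv:.2f}%{note}")
```

With these rows, only the CPU resnet50 result (about 26%) and the granite Prompt Processing 1024 result (about 7%) exceed the threshold. In both cases, one or two runs account for the spread rather than all four drifting.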
Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
a: 27.16 | b: 27.09 | c: 27.15 | d: 27.01
Compiler notes: (CXX) g++ options: -O3
Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
a: 27.11 | b: 27.22 | c: 27.19 | d: 27.26
Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
a: 28.00 | b: 27.90 | c: 26.25 | d: 27.81
Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
a: 28.09 | b: 27.93 | c: 27.78 | d: 26.87
Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
a: 58.73 | b: 61.40 | c: 56.79 | d: 57.28
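The prompt-processing figures are throughputs, so dividing the prompt length by the rate gives a rough ingestion time. This minimal sketch assumes the reported rate is an average over the whole prompt (tokens divided by elapsed time), and it ignores model loading. The run a values are copied from the Prompt Processing 2048 charts above.

```python
# Run "a" prompt-processing rates (tokens per second) from the
# Prompt Processing 2048 charts above.
PP2048_RUN_A = {
    "Llama-3.1-Tulu-3-8B-Q8_0": 27.16,
    "Mistral-7B-Instruct-v0.3-Q8_0": 27.11,
    "granite-3.0-3b-a800m-instruct-Q8_0": 58.73,
}


def prompt_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to process a prompt at a constant average throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


if __name__ == "__main__":
    for model, rate in PP2048_RUN_A.items():
        print(f"{model}: ~{prompt_seconds(2048, rate):.1f} s for 2048 prompt tokens")
```

By this estimate, a 2048-token prompt takes roughly 75 s on either the 7B or the 8B model and about 35 s on granite-3.0-3b-a800m-instruct.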
NCNN 20241226 Target: CPU - Model: FastestDet
ms, Fewer Is Better
a: 4.93 (MIN: 4.85 / MAX: 5.13)
b: 4.69 (MIN: 4.36 / MAX: 4.91)
c: 4.68 (MIN: 4.45 / MAX: 5.02)
d: 4.94 (MIN: 4.53 / MAX: 5.07)
Compiler notes: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 Target: CPU - Model: vision_transformer
ms, Fewer Is Better
a: 199.02 (MIN: 194.44 / MAX: 203.76)
b: 199.84 (MIN: 194.4 / MAX: 204.25)
c: 197.49 (MIN: 193.52 / MAX: 203.23)
d: 199.30 (MIN: 195.1 / MAX: 203.24)
NCNN 20241226 Target: CPU - Model: regnety_400m
ms, Fewer Is Better
a: 21.56 (MIN: 20.87 / MAX: 28.32)
b: 16.51 (MIN: 15.62 / MAX: 18.22)
c: 16.90 (MIN: 15.72 / MAX: 18)
d: 16.34 (MIN: 15.56 / MAX: 21.35)
NCNN 20241226 Target: CPU - Model: squeezenet_ssd
ms, Fewer Is Better
a: 15.17 (MIN: 13.87 / MAX: 20.8)
b: 9.55 (MIN: 9.04 / MAX: 10.3)
c: 9.66 (MIN: 9.09 / MAX: 10.11)
d: 9.63 (MIN: 9.13 / MAX: 10.02)
NCNN 20241226 Target: CPU - Model: yolov4-tiny
ms, Fewer Is Better
a: 17.70 (MIN: 16.34 / MAX: 19.92)
b: 17.62 (MIN: 16.86 / MAX: 18.72)
c: 17.54 (MIN: 16.19 / MAX: 18.43)
d: 17.93 (MIN: 16.96 / MAX: 19.34)
NCNN 20241226 Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
ms, Fewer Is Better
a: 14.14 (MIN: 12.81 / MAX: 19.39)
b: 13.59 (MIN: 12.35 / MAX: 16.17)
c: 14.40 (MIN: 13.42 / MAX: 16.32)
d: 14.29 (MIN: 13.43 / MAX: 15.38)
NCNN 20241226 Target: CPU - Model: resnet50
ms, Fewer Is Better
a: 30.49 (MIN: 28.71 / MAX: 34.17)
b: 18.96 (MIN: 18.53 / MAX: 19.8)
c: 19.25 (MIN: 18.8 / MAX: 21.77)
d: 19.20 (MIN: 18.8 / MAX: 19.92)
NCNN 20241226 Target: CPU - Model: alexnet
ms, Fewer Is Better
a: 8.39 (MIN: 7.52 / MAX: 9.64)
b: 5.86 (MIN: 5.71 / MAX: 6.23)
c: 6.12 (MIN: 5.95 / MAX: 6.36)
d: 6.06 (MIN: 5.91 / MAX: 7.6)
NCNN 20241226 Target: CPU - Model: resnet18
ms, Fewer Is Better
a: 11.36 (MIN: 10.53 / MAX: 12.29)
b: 7.51 (MIN: 7.26 / MAX: 7.95)
c: 7.59 (MIN: 7.4 / MAX: 7.96)
d: 7.62 (MIN: 7.39 / MAX: 8.42)
NCNN 20241226 Target: CPU - Model: vgg16
ms, Fewer Is Better
a: 48.72 (MIN: 47.5 / MAX: 50.51)
b: 44.40 (MIN: 41.87 / MAX: 45.98)
c: 44.85 (MIN: 42.51 / MAX: 46.34)
d: 44.32 (MIN: 41.83 / MAX: 45.85)
NCNN 20241226 Target: CPU - Model: googlenet
ms, Fewer Is Better
a: 16.62 (MIN: 15.75 / MAX: 17.72)
b: 11.21 (MIN: 10.76 / MAX: 11.55)
c: 11.26 (MIN: 10.66 / MAX: 13.28)
d: 11.36 (MIN: 11.11 / MAX: 11.76)
NCNN 20241226 Target: CPU - Model: blazeface
ms, Fewer Is Better
a: 2.37 (MIN: 2.23 / MAX: 2.49)
b: 1.84 (MIN: 1.74 / MAX: 1.91)
c: 1.88 (MIN: 1.82 / MAX: 1.99)
d: 1.94 (MIN: 1.9 / MAX: 1.99)
NCNN 20241226 Target: CPU - Model: efficientnet-b0
ms, Fewer Is Better
a: 11.16 (MIN: 10.59 / MAX: 11.76)
b: 7.99 (MIN: 7.54 / MAX: 9.91)
c: 7.84 (MIN: 7.34 / MAX: 9.39)
d: 7.95 (MIN: 7.4 / MAX: 8.36)
NCNN 20241226 Target: CPU - Model: mnasnet
ms, Fewer Is Better
a: 6.33 (MIN: 5.92 / MAX: 7.4)
b: 4.48 (MIN: 4.18 / MAX: 4.99)
c: 4.49 (MIN: 4.38 / MAX: 5.08)
d: 4.52 (MIN: 4.29 / MAX: 7.54)
NCNN 20241226 Target: CPU - Model: shufflenet-v2
ms, Fewer Is Better
a: 4.82 (MIN: 4.59 / MAX: 5.02)
b: 3.65 (MIN: 3.46 / MAX: 5.16)
c: 3.64 (MIN: 3.49 / MAX: 4.97)
d: 3.65 (MIN: 3.6 / MAX: 5.03)
NCNN 20241226 Target: CPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
a: 6.13 (MIN: 5.69 / MAX: 8.86)
b: 4.50 (MIN: 4.3 / MAX: 7.07)
c: 4.43 (MIN: 4.15 / MAX: 6.44)
d: 4.53 (MIN: 4.32 / MAX: 6.54)
NCNN 20241226 Target: CPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
a: 7.01 (MIN: 6.26 / MAX: 8.27)
b: 5.00 (MIN: 4.79 / MAX: 6.66)
c: 4.90 (MIN: 4.22 / MAX: 7.14)
d: 4.98 (MIN: 4.61 / MAX: 7.09)
NCNN 20241226 Target: CPU - Model: mobilenet
ms, Fewer Is Better
a: 14.14 (MIN: 12.81 / MAX: 19.39)
b: 13.59 (MIN: 12.35 / MAX: 16.17)
c: 14.40 (MIN: 13.42 / MAX: 16.32)
d: 14.29 (MIN: 13.43 / MAX: 15.38)
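In the CPU charts above, run a is noticeably slower than runs b, c and d for many of the models. The sketch below expresses that gap as a percentage of the mean of the other three runs. The latencies are copied from the charts, and the choice of models is arbitrary.

```python
from statistics import mean

# ncnn Target: CPU latencies in ms (fewer is better) for runs a, b, c, d,
# copied from the charts above; the choice of models is arbitrary.
CPU_MS = {
    "resnet50": [30.49, 18.96, 19.25, 19.20],
    "resnet18": [11.36, 7.51, 7.59, 7.62],
    "googlenet": [16.62, 11.21, 11.26, 11.36],
    "yolov4-tiny": [17.70, 17.62, 17.54, 17.93],
}


def slowdown_vs_rest(values):
    """Percent by which the first run's latency exceeds the mean of the other runs."""
    first, rest = values[0], values[1:]
    baseline = mean(rest)
    return 100.0 * (first - baseline) / baseline


if __name__ == "__main__":
    for model, runs in CPU_MS.items():
        print(f"{model}: run a is {slowdown_vs_rest(runs):+.1f}% vs mean of b, c, d")
```

By this measure, run a is about 59%, 50% and 47% slower for resnet50, resnet18 and googlenet. For yolov4-tiny it is essentially unchanged.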
NCNN 20241226 Target: Vulkan GPU - Model: FastestDet
ms, Fewer Is Better
a: 4.82 (MIN: 4.71 / MAX: 5.03)
b: 4.95 (MIN: 4.86 / MAX: 5.31)
c: 4.81 (MIN: 4.69 / MAX: 5.02)
d: 4.63 (MIN: 4.33 / MAX: 4.89)
NCNN 20241226 Target: Vulkan GPU - Model: vision_transformer
ms, Fewer Is Better
a: 196.62 (MIN: 192.65 / MAX: 200.94)
b: 200.12 (MIN: 194.52 / MAX: 204.7)
c: 202.37 (MIN: 195.84 / MAX: 206.71)
d: 198.73 (MIN: 194.73 / MAX: 203.13)
NCNN 20241226 Target: Vulkan GPU - Model: regnety_400m
ms, Fewer Is Better
a: 16.59 (MIN: 15.66 / MAX: 19.68)
b: 16.73 (MIN: 15.69 / MAX: 21)
c: 16.39 (MIN: 15.67 / MAX: 19.65)
d: 16.25 (MIN: 15.65 / MAX: 17.82)
NCNN 20241226 Target: Vulkan GPU - Model: squeezenet_ssd
ms, Fewer Is Better
a: 9.57 (MIN: 9.13 / MAX: 11.39)
b: 9.77 (MIN: 9.06 / MAX: 10.17)
c: 9.66 (MIN: 8.92 / MAX: 10.07)
d: 9.70 (MIN: 9.2 / MAX: 10.59)
NCNN 20241226 Target: Vulkan GPU - Model: yolov4-tiny
ms, Fewer Is Better
a: 17.64 (MIN: 16.96 / MAX: 18.52)
b: 17.74 (MIN: 16.9 / MAX: 19.03)
c: 17.51 (MIN: 16.35 / MAX: 19.44)
d: 17.84 (MIN: 16.99 / MAX: 19.52)
NCNN 20241226 Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
ms, Fewer Is Better
a: 13.99 (MIN: 12.4 / MAX: 16.02)
b: 14.35 (MIN: 13.46 / MAX: 16.3)
c: 13.88 (MIN: 12.41 / MAX: 15.7)
d: 13.63 (MIN: 12.41 / MAX: 15.26)
NCNN 20241226 Target: Vulkan GPU - Model: resnet50
ms, Fewer Is Better
a: 19.10 (MIN: 18.54 / MAX: 21.48)
b: 19.26 (MIN: 18.87 / MAX: 20.2)
c: 19.06 (MIN: 18.63 / MAX: 19.69)
d: 19.06 (MIN: 18.66 / MAX: 19.88)
NCNN 20241226 Target: Vulkan GPU - Model: alexnet
ms, Fewer Is Better
a: 5.91 (MIN: 5.73 / MAX: 6.22)
b: 6.17 (MIN: 5.98 / MAX: 6.59)
c: 5.91 (MIN: 5.74 / MAX: 6.27)
d: 5.94 (MIN: 5.75 / MAX: 6.23)
NCNN 20241226 Target: Vulkan GPU - Model: resnet18
ms, Fewer Is Better
a: 7.56 (MIN: 7.29 / MAX: 7.92)
b: 7.65 (MIN: 7.42 / MAX: 7.91)
c: 7.55 (MIN: 7.3 / MAX: 9.6)
d: 7.56 (MIN: 7.28 / MAX: 7.94)
NCNN 20241226 Target: Vulkan GPU - Model: vgg16
ms, Fewer Is Better
a: 44.95 (MIN: 42.32 / MAX: 46.93)
b: 44.61 (MIN: 42.3 / MAX: 46.22)
c: 44.65 (MIN: 41.13 / MAX: 46.45)
d: 45.04 (MIN: 43.03 / MAX: 46.43)
NCNN 20241226 Target: Vulkan GPU - Model: googlenet
ms, Fewer Is Better
a: 11.25 (MIN: 10.92 / MAX: 12.23)
b: 11.28 (MIN: 10.66 / MAX: 11.92)
c: 11.27 (MIN: 10.72 / MAX: 14.11)
d: 11.25 (MIN: 10.97 / MAX: 12.35)
NCNN 20241226 Target: Vulkan GPU - Model: blazeface
ms, Fewer Is Better
a: 1.92 (MIN: 1.87 / MAX: 1.98)
b: 1.92 (MIN: 1.83 / MAX: 2.03)
c: 1.92 (MIN: 1.84 / MAX: 1.98)
d: 1.91 (MIN: 1.84 / MAX: 1.97)
NCNN 20241226 Target: Vulkan GPU - Model: efficientnet-b0
ms, Fewer Is Better
a: 7.92 (MIN: 7.56 / MAX: 8.16)
b: 7.99 (MIN: 7.88 / MAX: 8.15)
c: 7.87 (MIN: 7.36 / MAX: 9.34)
d: 7.98 (MIN: 7.68 / MAX: 8.29)
NCNN 20241226 Target: Vulkan GPU - Model: mnasnet
ms, Fewer Is Better
a: 4.47 (MIN: 4.22 / MAX: 5.04)
b: 4.37 (MIN: 4.19 / MAX: 4.99)
c: 4.40 (MIN: 4.22 / MAX: 4.98)
d: 4.42 (MIN: 4.23 / MAX: 4.94)
NCNN 20241226 Target: Vulkan GPU - Model: shufflenet-v2
ms, Fewer Is Better
a: 3.65 (MIN: 3.6 / MAX: 3.72)
b: 3.66 (MIN: 3.58 / MAX: 6.33)
c: 3.65 (MIN: 3.48 / MAX: 5.38)
d: 3.66 (MIN: 3.59 / MAX: 5.53)
NCNN 20241226 Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
a: 4.51 (MIN: 4.25 / MAX: 4.9)
b: 4.39 (MIN: 4.16 / MAX: 6.09)
c: 4.53 (MIN: 4.3 / MAX: 7.77)
d: 4.49 (MIN: 4.21 / MAX: 5.78)
NCNN 20241226 Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
a: 4.75 (MIN: 4.23 / MAX: 5.52)
b: 4.87 (MIN: 4.22 / MAX: 7.49)
c: 5.00 (MIN: 4.26 / MAX: 7.25)
d: 4.96 (MIN: 4.56 / MAX: 7.55)
NCNN 20241226 Target: Vulkan GPU - Model: mobilenet
ms, Fewer Is Better
a: 13.99 (MIN: 12.4 / MAX: 16.02)
b: 14.35 (MIN: 13.46 / MAX: 16.3)
c: 13.88 (MIN: 12.41 / MAX: 15.7)
d: 13.63 (MIN: 12.41 / MAX: 15.26)
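The Vulkan GPU latencies above sit close to the CPU latencies of runs b, c and d. One way to check this is to divide the mean CPU latency by the mean Vulkan latency for each model, then convert latency to inferences per second as 1000 / ms. Dropping CPU run a, which is a slow outlier on several CPU charts, is a choice made for this sketch and is not something the benchmark itself does. The values are copied from the charts.

```python
from statistics import mean

# ncnn latencies in ms for runs a, b, c, d, copied from the CPU and
# Vulkan GPU charts above.
LATENCY_MS = {
    "resnet50": {"cpu": [30.49, 18.96, 19.25, 19.20], "vulkan": [19.10, 19.26, 19.06, 19.06]},
    "vgg16": {"cpu": [48.72, 44.40, 44.85, 44.32], "vulkan": [44.95, 44.61, 44.65, 45.04]},
    "blazeface": {"cpu": [2.37, 1.84, 1.88, 1.94], "vulkan": [1.92, 1.92, 1.92, 1.91]},
}


def inferences_per_second(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


def cpu_to_vulkan_ratio(cpu, vulkan, skip_first_cpu=True):
    """Mean CPU latency / mean Vulkan latency; above 1.0 means Vulkan was faster.

    skip_first_cpu drops CPU run a, a slow outlier on several CPU charts.
    """
    cpu_runs = cpu[1:] if skip_first_cpu else cpu
    return mean(cpu_runs) / mean(vulkan)


if __name__ == "__main__":
    for model, runs in LATENCY_MS.items():
        ratio = cpu_to_vulkan_ratio(runs["cpu"], runs["vulkan"])
        ips = inferences_per_second(mean(runs["vulkan"]))
        print(f"{model}: CPU/Vulkan = {ratio:.3f}, Vulkan ~{ips:.1f} inferences/s")
```

For these three models, the ratio stays within about 2% of 1.0 once CPU run a is excluded. In other words, the two targets deliver nearly the same latency on this system.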
Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
a: 28.70 | b: 28.30 | c: 28.54 | d: 28.50
Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
a: 28.77 | b: 28.61 | c: 28.43 | d: 28.51
Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
a: 60.83 | b: 61.49 | c: 54.08 | d: 54.39
Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
a: 8.87 | b: 8.85 | c: 8.85 | d: 8.83
Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
a: 9.24 | b: 9.22 | c: 9.20 | d: 9.21
Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
a: 61.73 | b: 61.34 | c: 60.85 | d: 62.31
Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
a: 38.35 | b: 38.35 | c: 38.43 | d: 37.64
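Text-generation throughput is often easier to reason about as per-token latency. This minimal conversion assumes that the Text Generation 128 rate is an average over all 128 generated tokens. The run a figures are copied from the charts above.

```python
# Run "a" Text Generation 128 rates (tokens per second) from the charts above.
TG128_RUN_A = {
    "Llama-3.1-Tulu-3-8B-Q8_0": 8.87,
    "Mistral-7B-Instruct-v0.3-Q8_0": 9.24,
    "granite-3.0-3b-a800m-instruct-Q8_0": 38.35,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to generate n_tokens at a constant average rate."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    for model, rate in TG128_RUN_A.items():
        print(f"{model}: {ms_per_token(rate):.1f} ms/token, "
              f"~{generation_seconds(128, rate):.1f} s for 128 tokens")
```

That works out to roughly 113 ms per token for Llama-3.1-Tulu-3-8B, 108 ms for Mistral-7B and 26 ms for granite-3.0-3b-a800m.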
Phoronix Test Suite v10.8.5