ncnn llama Intel Core Ultra 7 256V testing with an ASUS Zenbook S 14 UX5406SA_UX5406SA UX5406SA v1.0 (UX5406SA.300 BIOS) and ASUS Intel LNL 7GB on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412296-NE-NCNNLLAMA15 .
ncnn llama system configuration (runs a, b, c, d):

  Processor: Intel Core Ultra 7 256V @ 4.70GHz (8 Cores)
  Motherboard: ASUS Zenbook S 14 UX5406SA_UX5406SA UX5406SA v1.0 (UX5406SA.300 BIOS)
  Chipset: Intel Device a87f
  Memory: 8 x 2GB LPDDR5-8533MT/s Samsung
  Disk: 1024GB Western Digital WD PC SN560 SDDPNQE-1T00-1102
  Graphics: ASUS Intel LNL 7GB
  Audio: Intel Lunar Lake-M HD Audio
  Network: Intel Device a840
  OS: Ubuntu 24.10
  Kernel: 6.12.0-rc6-phx-drm-next (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 25.0~git2411250600.45c523~oibaf~o (git-45c5231 2024-11-25 oracular-oibaf-pp
  OpenCL: OpenCL 3.0
  Compiler: GCC 14.2.0
  File-System: ext4
  Screen Resolution: 2880x1800

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details - Scaling Governor: intel_pstate powersave (EPP: performance) - Platform Profile: performance - CPU Microcode: 0x114 - Thermald 2.5.8 - ACPI Profile: performance

Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
ncnn llama results overview (runs a, b, c, d). ncnn values are in ms (fewer is better); llama-cpp values are in tokens per second (more is better).

      a      b      c      d  Test
  14.14  13.59  14.40  14.29  ncnn: CPU - mobilenet
   7.01   5.00   4.90   4.98  ncnn: CPU-v2-v2 - mobilenet-v2
   6.13   4.50   4.43   4.53  ncnn: CPU-v3-v3 - mobilenet-v3
   4.82   3.65   3.64   3.65  ncnn: CPU - shufflenet-v2
   6.33   4.48   4.49   4.52  ncnn: CPU - mnasnet
  11.16   7.99   7.84   7.95  ncnn: CPU - efficientnet-b0
   2.37   1.84   1.88   1.94  ncnn: CPU - blazeface
  16.62  11.21  11.26  11.36  ncnn: CPU - googlenet
  48.72  44.40  44.85  44.32  ncnn: CPU - vgg16
  11.36   7.51   7.59   7.62  ncnn: CPU - resnet18
   8.39   5.86   6.12   6.06  ncnn: CPU - alexnet
  30.49  18.96  19.25  19.20  ncnn: CPU - resnet50
  14.14  13.59  14.40  14.29  ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  17.70  17.62  17.54  17.93  ncnn: CPU - yolov4-tiny
  15.17   9.55   9.66   9.63  ncnn: CPU - squeezenet_ssd
  21.56  16.51  16.90  16.34  ncnn: CPU - regnety_400m
 199.02 199.84 197.49 199.30  ncnn: CPU - vision_transformer
   4.93   4.69   4.68   4.94  ncnn: CPU - FastestDet
  13.99  14.35  13.88  13.63  ncnn: Vulkan GPU - mobilenet
   4.75   4.87   5.00   4.96  ncnn: Vulkan GPU-v2-v2 - mobilenet-v2
   4.51   4.39   4.53   4.49  ncnn: Vulkan GPU-v3-v3 - mobilenet-v3
   3.65   3.66   3.65   3.66  ncnn: Vulkan GPU - shufflenet-v2
   4.47   4.37   4.40   4.42  ncnn: Vulkan GPU - mnasnet
   7.92   7.99   7.87   7.98  ncnn: Vulkan GPU - efficientnet-b0
   1.92   1.92   1.92   1.91  ncnn: Vulkan GPU - blazeface
  11.25  11.28  11.27  11.25  ncnn: Vulkan GPU - googlenet
  44.95  44.61  44.65  45.04  ncnn: Vulkan GPU - vgg16
   7.56   7.65   7.55   7.56  ncnn: Vulkan GPU - resnet18
   5.91   6.17   5.91   5.94  ncnn: Vulkan GPU - alexnet
  19.10  19.26  19.06  19.06  ncnn: Vulkan GPU - resnet50
  13.99  14.35  13.88  13.63  ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  17.64  17.74  17.51  17.84  ncnn: Vulkan GPU - yolov4-tiny
   9.57   9.77   9.66   9.70  ncnn: Vulkan GPU - squeezenet_ssd
  16.59  16.73  16.39  16.25  ncnn: Vulkan GPU - regnety_400m
 196.62 200.12 202.37 198.73  ncnn: Vulkan GPU - vision_transformer
   4.82   4.95   4.81   4.63  ncnn: Vulkan GPU - FastestDet
   8.87   8.85   8.85   8.83  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  28.77  28.61  28.43  28.51  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  28.09  27.93  27.78  26.87  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  27.16  27.09  27.15  27.01  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
   9.24   9.22   9.20   9.21  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  28.70  28.30  28.54  28.50  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  28.00  27.90  26.25  27.81  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  27.11  27.22  27.19  27.26  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  38.35  38.35  38.43  37.64  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  61.73  61.34  60.85  62.31  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  60.83  61.49  54.08  54.39  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  58.73  61.40  56.79  57.28  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
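The runs can be compared by computing each run's relative change against run a, using the overview values. The following is a minimal sketch and not part of the exported result: the helper name `relative_change` and the `cpu_ms` dictionary are hypothetical, and only three of the ncnn CPU rows are copied in for illustration.

```python
# Hypothetical helper, not part of the Phoronix Test Suite export.
# Values are copied from the ncnn CPU rows of the overview (ms, fewer is better).
cpu_ms = {
    "mobilenet-v2": {"a": 7.01, "b": 5.00, "c": 4.90, "d": 4.98},
    "resnet50": {"a": 30.49, "b": 18.96, "c": 19.25, "d": 19.20},
    "vision_transformer": {"a": 199.02, "b": 199.84, "c": 197.49, "d": 199.30},
}


def relative_change(baseline: float, value: float) -> float:
    """Return (value - baseline) / baseline as a fraction of the baseline."""
    return (value - baseline) / baseline


# Print the change of runs b, c and d relative to run a for each model.
for model, runs in cpu_ms.items():
    changes = ", ".join(
        f"{run}: {relative_change(runs['a'], runs[run]) * 100:+.1f}%"
        for run in ("b", "c", "d")
    )
    print(f"{model}: {changes}")
```

For the latency rows a negative percentage means the run was faster than run a; for the llama-cpp tokens-per-second rows the sign reads the other way.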
NCNN 20241226, CPU targets (ms, Fewer Is Better; per-run MIN / MAX in parentheses)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN Target: CPU - Model: mobilenet
  a: 14.14 (MIN: 12.81 / MAX: 19.39)
  b: 13.59 (MIN: 12.35 / MAX: 16.17)
  c: 14.40 (MIN: 13.42 / MAX: 16.32)
  d: 14.29 (MIN: 13.43 / MAX: 15.38)

NCNN Target: CPU-v2-v2 - Model: mobilenet-v2
  a: 7.01 (MIN: 6.26 / MAX: 8.27)
  b: 5.00 (MIN: 4.79 / MAX: 6.66)
  c: 4.90 (MIN: 4.22 / MAX: 7.14)
  d: 4.98 (MIN: 4.61 / MAX: 7.09)

NCNN Target: CPU-v3-v3 - Model: mobilenet-v3
  a: 6.13 (MIN: 5.69 / MAX: 8.86)
  b: 4.50 (MIN: 4.3 / MAX: 7.07)
  c: 4.43 (MIN: 4.15 / MAX: 6.44)
  d: 4.53 (MIN: 4.32 / MAX: 6.54)

NCNN Target: CPU - Model: shufflenet-v2
  a: 4.82 (MIN: 4.59 / MAX: 5.02)
  b: 3.65 (MIN: 3.46 / MAX: 5.16)
  c: 3.64 (MIN: 3.49 / MAX: 4.97)
  d: 3.65 (MIN: 3.6 / MAX: 5.03)

NCNN Target: CPU - Model: mnasnet
  a: 6.33 (MIN: 5.92 / MAX: 7.4)
  b: 4.48 (MIN: 4.18 / MAX: 4.99)
  c: 4.49 (MIN: 4.38 / MAX: 5.08)
  d: 4.52 (MIN: 4.29 / MAX: 7.54)

NCNN Target: CPU - Model: efficientnet-b0
  a: 11.16 (MIN: 10.59 / MAX: 11.76)
  b: 7.99 (MIN: 7.54 / MAX: 9.91)
  c: 7.84 (MIN: 7.34 / MAX: 9.39)
  d: 7.95 (MIN: 7.4 / MAX: 8.36)

NCNN Target: CPU - Model: blazeface
  a: 2.37 (MIN: 2.23 / MAX: 2.49)
  b: 1.84 (MIN: 1.74 / MAX: 1.91)
  c: 1.88 (MIN: 1.82 / MAX: 1.99)
  d: 1.94 (MIN: 1.9 / MAX: 1.99)

NCNN Target: CPU - Model: googlenet
  a: 16.62 (MIN: 15.75 / MAX: 17.72)
  b: 11.21 (MIN: 10.76 / MAX: 11.55)
  c: 11.26 (MIN: 10.66 / MAX: 13.28)
  d: 11.36 (MIN: 11.11 / MAX: 11.76)

NCNN Target: CPU - Model: vgg16
  a: 48.72 (MIN: 47.5 / MAX: 50.51)
  b: 44.40 (MIN: 41.87 / MAX: 45.98)
  c: 44.85 (MIN: 42.51 / MAX: 46.34)
  d: 44.32 (MIN: 41.83 / MAX: 45.85)

NCNN Target: CPU - Model: resnet18
  a: 11.36 (MIN: 10.53 / MAX: 12.29)
  b: 7.51 (MIN: 7.26 / MAX: 7.95)
  c: 7.59 (MIN: 7.4 / MAX: 7.96)
  d: 7.62 (MIN: 7.39 / MAX: 8.42)

NCNN Target: CPU - Model: alexnet
  a: 8.39 (MIN: 7.52 / MAX: 9.64)
  b: 5.86 (MIN: 5.71 / MAX: 6.23)
  c: 6.12 (MIN: 5.95 / MAX: 6.36)
  d: 6.06 (MIN: 5.91 / MAX: 7.6)

NCNN Target: CPU - Model: resnet50
  a: 30.49 (MIN: 28.71 / MAX: 34.17)
  b: 18.96 (MIN: 18.53 / MAX: 19.8)
  c: 19.25 (MIN: 18.8 / MAX: 21.77)
  d: 19.20 (MIN: 18.8 / MAX: 19.92)

NCNN Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
  a: 14.14 (MIN: 12.81 / MAX: 19.39)
  b: 13.59 (MIN: 12.35 / MAX: 16.17)
  c: 14.40 (MIN: 13.42 / MAX: 16.32)
  d: 14.29 (MIN: 13.43 / MAX: 15.38)

NCNN Target: CPU - Model: yolov4-tiny
  a: 17.70 (MIN: 16.34 / MAX: 19.92)
  b: 17.62 (MIN: 16.86 / MAX: 18.72)
  c: 17.54 (MIN: 16.19 / MAX: 18.43)
  d: 17.93 (MIN: 16.96 / MAX: 19.34)

NCNN Target: CPU - Model: squeezenet_ssd
  a: 15.17 (MIN: 13.87 / MAX: 20.8)
  b: 9.55 (MIN: 9.04 / MAX: 10.3)
  c: 9.66 (MIN: 9.09 / MAX: 10.11)
  d: 9.63 (MIN: 9.13 / MAX: 10.02)

NCNN Target: CPU - Model: regnety_400m
  a: 21.56 (MIN: 20.87 / MAX: 28.32)
  b: 16.51 (MIN: 15.62 / MAX: 18.22)
  c: 16.90 (MIN: 15.72 / MAX: 18)
  d: 16.34 (MIN: 15.56 / MAX: 21.35)

NCNN Target: CPU - Model: vision_transformer
  a: 199.02 (MIN: 194.44 / MAX: 203.76)
  b: 199.84 (MIN: 194.4 / MAX: 204.25)
  c: 197.49 (MIN: 193.52 / MAX: 203.23)
  d: 199.30 (MIN: 195.1 / MAX: 203.24)

NCNN Target: CPU - Model: FastestDet
  a: 4.93 (MIN: 4.85 / MAX: 5.13)
  b: 4.69 (MIN: 4.36 / MAX: 4.91)
  c: 4.68 (MIN: 4.45 / MAX: 5.02)
  d: 4.94 (MIN: 4.53 / MAX: 5.07)

NCNN 20241226, Vulkan GPU targets (ms, Fewer Is Better; per-run MIN / MAX in parentheses)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN Target: Vulkan GPU - Model: mobilenet
  a: 13.99 (MIN: 12.4 / MAX: 16.02)
  b: 14.35 (MIN: 13.46 / MAX: 16.3)
  c: 13.88 (MIN: 12.41 / MAX: 15.7)
  d: 13.63 (MIN: 12.41 / MAX: 15.26)

NCNN Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
  a: 4.75 (MIN: 4.23 / MAX: 5.52)
  b: 4.87 (MIN: 4.22 / MAX: 7.49)
  c: 5.00 (MIN: 4.26 / MAX: 7.25)
  d: 4.96 (MIN: 4.56 / MAX: 7.55)

NCNN Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
  a: 4.51 (MIN: 4.25 / MAX: 4.9)
  b: 4.39 (MIN: 4.16 / MAX: 6.09)
  c: 4.53 (MIN: 4.3 / MAX: 7.77)
  d: 4.49 (MIN: 4.21 / MAX: 5.78)

NCNN Target: Vulkan GPU - Model: shufflenet-v2
  a: 3.65 (MIN: 3.6 / MAX: 3.72)
  b: 3.66 (MIN: 3.58 / MAX: 6.33)
  c: 3.65 (MIN: 3.48 / MAX: 5.38)
  d: 3.66 (MIN: 3.59 / MAX: 5.53)

NCNN Target: Vulkan GPU - Model: mnasnet
  a: 4.47 (MIN: 4.22 / MAX: 5.04)
  b: 4.37 (MIN: 4.19 / MAX: 4.99)
  c: 4.40 (MIN: 4.22 / MAX: 4.98)
  d: 4.42 (MIN: 4.23 / MAX: 4.94)

NCNN Target: Vulkan GPU - Model: efficientnet-b0
  a: 7.92 (MIN: 7.56 / MAX: 8.16)
  b: 7.99 (MIN: 7.88 / MAX: 8.15)
  c: 7.87 (MIN: 7.36 / MAX: 9.34)
  d: 7.98 (MIN: 7.68 / MAX: 8.29)

NCNN Target: Vulkan GPU - Model: blazeface
  a: 1.92 (MIN: 1.87 / MAX: 1.98)
  b: 1.92 (MIN: 1.83 / MAX: 2.03)
  c: 1.92 (MIN: 1.84 / MAX: 1.98)
  d: 1.91 (MIN: 1.84 / MAX: 1.97)

NCNN Target: Vulkan GPU - Model: googlenet
  a: 11.25 (MIN: 10.92 / MAX: 12.23)
  b: 11.28 (MIN: 10.66 / MAX: 11.92)
  c: 11.27 (MIN: 10.72 / MAX: 14.11)
  d: 11.25 (MIN: 10.97 / MAX: 12.35)

NCNN Target: Vulkan GPU - Model: vgg16
  a: 44.95 (MIN: 42.32 / MAX: 46.93)
  b: 44.61 (MIN: 42.3 / MAX: 46.22)
  c: 44.65 (MIN: 41.13 / MAX: 46.45)
  d: 45.04 (MIN: 43.03 / MAX: 46.43)

NCNN Target: Vulkan GPU - Model: resnet18
  a: 7.56 (MIN: 7.29 / MAX: 7.92)
  b: 7.65 (MIN: 7.42 / MAX: 7.91)
  c: 7.55 (MIN: 7.3 / MAX: 9.6)
  d: 7.56 (MIN: 7.28 / MAX: 7.94)

NCNN Target: Vulkan GPU - Model: alexnet
  a: 5.91 (MIN: 5.73 / MAX: 6.22)
  b: 6.17 (MIN: 5.98 / MAX: 6.59)
  c: 5.91 (MIN: 5.74 / MAX: 6.27)
  d: 5.94 (MIN: 5.75 / MAX: 6.23)

NCNN Target: Vulkan GPU - Model: resnet50
  a: 19.10 (MIN: 18.54 / MAX: 21.48)
  b: 19.26 (MIN: 18.87 / MAX: 20.2)
  c: 19.06 (MIN: 18.63 / MAX: 19.69)
  d: 19.06 (MIN: 18.66 / MAX: 19.88)

NCNN Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
  a: 13.99 (MIN: 12.4 / MAX: 16.02)
  b: 14.35 (MIN: 13.46 / MAX: 16.3)
  c: 13.88 (MIN: 12.41 / MAX: 15.7)
  d: 13.63 (MIN: 12.41 / MAX: 15.26)

NCNN Target: Vulkan GPU - Model: yolov4-tiny
  a: 17.64 (MIN: 16.96 / MAX: 18.52)
  b: 17.74 (MIN: 16.9 / MAX: 19.03)
  c: 17.51 (MIN: 16.35 / MAX: 19.44)
  d: 17.84 (MIN: 16.99 / MAX: 19.52)

NCNN Target: Vulkan GPU - Model: squeezenet_ssd
  a: 9.57 (MIN: 9.13 / MAX: 11.39)
  b: 9.77 (MIN: 9.06 / MAX: 10.17)
  c: 9.66 (MIN: 8.92 / MAX: 10.07)
  d: 9.70 (MIN: 9.2 / MAX: 10.59)

NCNN Target: Vulkan GPU - Model: regnety_400m
  a: 16.59 (MIN: 15.66 / MAX: 19.68)
  b: 16.73 (MIN: 15.69 / MAX: 21)
  c: 16.39 (MIN: 15.67 / MAX: 19.65)
  d: 16.25 (MIN: 15.65 / MAX: 17.82)

NCNN Target: Vulkan GPU - Model: vision_transformer
  a: 196.62 (MIN: 192.65 / MAX: 200.94)
  b: 200.12 (MIN: 194.52 / MAX: 204.7)
  c: 202.37 (MIN: 195.84 / MAX: 206.71)
  d: 198.73 (MIN: 194.73 / MAX: 203.13)

NCNN Target: Vulkan GPU - Model: FastestDet
  a: 4.82 (MIN: 4.71 / MAX: 5.03)
  b: 4.95 (MIN: 4.86 / MAX: 5.31)
  c: 4.81 (MIN: 4.69 / MAX: 5.02)
  d: 4.63 (MIN: 4.33 / MAX: 4.89)

Llama.cpp b4397, Backend: CPU BLAS (Tokens Per Second, More Is Better)
1. (CXX) g++ options: -O3

      a      b      c      d  Model - Test
   8.87   8.85   8.85   8.83  Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  28.77  28.61  28.43  28.51  Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  28.09  27.93  27.78  26.87  Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  27.16  27.09  27.15  27.01  Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
   9.24   9.22   9.20   9.21  Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  28.70  28.30  28.54  28.50  Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  28.00  27.90  26.25  27.81  Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  27.11  27.22  27.19  27.26  Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  38.35  38.35  38.43  37.64  granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  61.73  61.34  60.85  62.31  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  60.83  61.49  54.08  54.39  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  58.73  61.40  56.79  57.28  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
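
A tokens-per-second figure can be turned into an approximate wall-clock time by dividing a token count by the rate. The following is a minimal sketch, not part of the exported result. It assumes that the number in each test name is the token count and that the rate stays constant over the whole test. `seconds_for_tokens` is a hypothetical helper, and the rates are run a's Llama-3.1-Tulu-3-8B-Q8_0 values from the results above.

```python
def seconds_for_tokens(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate time in seconds to handle n_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


# Run a, Llama-3.1-Tulu-3-8B-Q8_0 (values copied from the results above).
generation_s = seconds_for_tokens(128, 8.87)   # Text Generation 128
prompt_s = seconds_for_tokens(512, 28.77)      # Prompt Processing 512

print(f"Text Generation 128: about {generation_s:.1f} s")
print(f"Prompt Processing 512: about {prompt_s:.1f} s")
```

This is only a rough estimate: it ignores model load time and any variation in rate during the test.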
Phoronix Test Suite v10.8.5