ncnn llama ryzen ai

AMD Ryzen AI 9 HX 370 testing with an ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412295-NE-NCNNLLAMA93

System configuration (identical for runs a, b and c):

Processor: AMD Ryzen AI 9 HX 370 @ 4.37GHz (12 Cores / 24 Threads)
Motherboard: ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS)
Chipset: AMD Device 1507
Memory: 4 x 8GB LPDDR5-7500MT/s Samsung K3KL9L90CM-MGCT
Disk: 1024GB MTFDKBA1T0QFM-1BD1AABGB
Graphics: AMD Radeon 512MB
Audio: AMD Rembrandt Radeon HD Audio
Network: MEDIATEK Device 7925
OS: Ubuntu 24.10
Kernel: 6.11.0-rc6-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 DRM 3.58)
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 2880x1800

Kernel Details
- amdgpu.dcdebugmask=0x600
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance)
- Platform Profile: balanced
- CPU Microcode: 0xb204011
- ACPI Profile: balanced

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

Result overview (runs a, b and c)

NCNN 20241226 (ms, fewer is better)

Model                    CPU a     CPU b     CPU c  Vulkan a  Vulkan b  Vulkan c
mobilenet                11.61     11.49     11.72     11.33     12.67     11.27
mobilenet-v2              4.23      4.05      4.13      4.10      4.54      4.11
mobilenet-v3              2.97      3.03      3.02      3.02      3.41      3.13
shufflenet-v2             2.57      2.59      2.57      2.62      2.96      2.69
mnasnet                   2.99      3.03      3.05      3.03      3.65      3.05
efficientnet-b0           4.57      4.54      4.59      4.56      5.22      4.76
blazeface                 1.00      1.09      1.00      1.01      1.19      1.04
googlenet                 8.02      8.21      7.85      8.51      9.49      8.47
vgg16                    34.92     34.34     34.05     35.80     35.20     33.45
resnet18                  5.74      5.92      5.63      6.55      6.82      6.46
alexnet                   4.65      4.79      4.39      5.31      5.40      5.10
resnet50                 13.79     14.03     14.35     14.72     14.65     14.06
mobilenetv2-yolov3       11.61     11.49     11.72     11.33     12.67     11.27
yolov4-tiny              16.66     16.66     16.23     15.57     15.95     15.77
squeezenet_ssd            8.11      8.33      8.30      8.93      9.41      8.69
regnety_400m              8.44      8.96      8.68      8.48      9.30      8.77
vision_transformer       64.65     67.13     63.27     65.73     67.70     64.41
FastestDet                3.28      4.13      3.81      3.81      4.23      4.12

llama-cpp, CPU BLAS backend (tokens per second, more is better)

Model                               Test                            a        b        c
Llama-3.1-Tulu-3-8B-Q8_0            Text Generation 128         10.16    10.17    10.12
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 512       34.67    37.63    37.59
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 1024      30.41    31.93    33.19
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 2048      29.61    31.04    30.88
Mistral-7B-Instruct-v0.3-Q8_0       Text Generation 128         10.26    10.35    10.36
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 512       31.23    33.32    33.20
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 1024      30.77    32.50    32.44
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 2048      29.70    30.28    30.57
granite-3.0-3b-a800m-instruct-Q8_0  Text Generation 128         53.26    53.94    54.39
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 512      124.53   138.47   136.99
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 1024     122.15   151.70   144.13
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 2048     114.91   137.10   135.14
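
Runs a, b and c used the same hardware and software configuration, so the differences between them give a rough idea of run-to-run variation. The following Python sketch computes that spread for a handful of results copied from the overview above. The metric, (largest - smallest) / smallest of the three per-run averages, is an illustrative choice of ours and not something the Phoronix Test Suite reports.

```python
# Per-run averages (a, b, c) copied from the result overview above.
results = {
    "ncnn CPU - FastestDet (ms)": (3.28, 4.13, 3.81),
    "ncnn Vulkan GPU - mnasnet (ms)": (3.03, 3.65, 3.05),
    "llama-cpp granite Prompt Processing 1024 (tokens/s)": (122.15, 151.70, 144.13),
    "llama-cpp Mistral Text Generation 128 (tokens/s)": (10.26, 10.35, 10.36),
}


def spread_pct(values):
    """Relative spread between the largest and smallest per-run value, in percent."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


for name, runs in results.items():
    print(f"{name}: spread {spread_pct(runs):.1f}%")
```

With these inputs, ncnn CPU FastestDet and granite prompt processing at 1024 tokens vary by roughly a quarter between runs, while Mistral text generation stays within about 1%.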

NCNN 20241226 - Target: CPU (ms, fewer is better; per-run average with MIN / MAX in parentheses)

Model               a                        b                        c
mobilenet           11.61 (11.1 / 23.56)     11.49 (11.36 / 14.39)    11.72 (11.3 / 35.17)
mobilenet-v2        4.23 (3.44 / 17.58)      4.05 (3.41 / 5.99)       4.13 (3.49 / 5.81)
mobilenet-v3        2.97 (2.92 / 4.63)       3.03 (2.94 / 8.81)       3.02 (2.96 / 4.65)
shufflenet-v2       2.57 (2.49 / 7.42)       2.59 (2.55 / 4.08)       2.57 (2.53 / 4.41)
mnasnet             2.99 (2.93 / 4.4)        3.03 (2.99 / 4.54)       3.05 (2.97 / 4.46)
efficientnet-b0     4.57 (4.5 / 8.31)        4.54 (4.48 / 6.51)       4.59 (4.52 / 7.56)
blazeface           1.00 (0.98 / 1.32)       1.09 (1.05 / 6.92)       1.00 (0.98 / 2.6)
googlenet           8.02 (7.65 / 34.96)      8.21 (8.04 / 20.87)      7.85 (7.54 / 40.26)
vgg16               34.92 (33.21 / 71.54)    34.34 (31.53 / 79.85)    34.05 (32.12 / 65.41)
resnet18            5.74 (5.6 / 7.56)        5.92 (5.44 / 24.29)      5.63 (5.5 / 7.64)
alexnet             4.65 (4.2 / 22.47)       4.79 (4.42 / 10.5)       4.39 (4.12 / 4.64)
resnet50            13.79 (13.4 / 28.04)     14.03 (13.74 / 20.15)    14.35 (14.17 / 16.67)
mobilenetv2-yolov3  11.61 (11.1 / 23.56)     11.49 (11.36 / 14.39)    11.72 (11.3 / 35.17)
yolov4-tiny         16.66 (14.94 / 60.31)    16.66 (16.04 / 42.41)    16.23 (15.4 / 25.68)
squeezenet_ssd      8.11 (7.96 / 14.05)      8.33 (8.2 / 10.67)       8.30 (8.21 / 9.81)
regnety_400m        8.44 (8.34 / 10.28)      8.96 (8.41 / 68.08)      8.68 (8.33 / 62.13)
vision_transformer  64.65 (58.81 / 107.05)   67.13 (62.34 / 114.31)   63.27 (61.33 / 84.29)
FastestDet          3.28 (3.22 / 7.63)       4.13 (4.1 / 5.77)        3.81 (3.79 / 4.18)

Compiled with (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
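
The MIN / MAX figures show that individual iterations can be several times slower than the reported average, which the average alone hides. A minimal Python sketch that flags such runs follows; the 3x threshold is an arbitrary assumption chosen for illustration, and the sample values are copied from the CPU results above.

```python
# (average, MIN, MAX) in ms for a few ncnn CPU runs, copied from the results above.
cpu_runs = {
    ("googlenet", "a"): (8.02, 7.65, 34.96),
    ("googlenet", "c"): (7.85, 7.54, 40.26),
    ("regnety_400m", "a"): (8.44, 8.34, 10.28),
    ("regnety_400m", "b"): (8.96, 8.41, 68.08),
    ("alexnet", "c"): (4.39, 4.12, 4.64),
}

# Hypothetical threshold for illustration; not a Phoronix Test Suite setting.
OUTLIER_FACTOR = 3.0


def tail_ratio(avg, peak):
    """How many times slower the slowest iteration was than the run average."""
    return peak / avg


def outliers(runs, factor=OUTLIER_FACTOR):
    """Return the (model, run) keys whose MAX exceeds `factor` times the average, sorted."""
    return sorted(
        key for key, (avg, _min, peak) in runs.items() if tail_ratio(avg, peak) > factor
    )


for model, run in outliers(cpu_runs):
    avg, _min, peak = cpu_runs[(model, run)]
    print(f"{model} run {run}: MAX is {tail_ratio(avg, peak):.1f}x the average")
```

With this threshold, googlenet runs a and c and regnety_400m run b are flagged; alexnet run c, whose MAX is only about 6% above its average, is not.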

NCNN 20241226 - Target: Vulkan GPU (ms, fewer is better; per-run average with MIN / MAX in parentheses)

Model               a                        b                        c
mobilenet           11.33 (11.16 / 18.76)    12.67 (12.24 / 20.51)    11.27 (10.9 / 33.6)
mobilenet-v2        4.10 (3.4 / 10.88)       4.54 (3.98 / 6.73)       4.11 (3.46 / 6.51)
mobilenet-v3        3.02 (2.97 / 4.46)       3.41 (3.34 / 5.15)       3.13 (3.06 / 4.97)
shufflenet-v2       2.62 (2.56 / 5.99)       2.96 (2.92 / 4.5)        2.69 (2.66 / 3.78)
mnasnet             3.03 (2.95 / 6.34)       3.65 (3.59 / 5.28)       3.05 (2.97 / 8.74)
efficientnet-b0     4.56 (4.49 / 6.26)       5.22 (5.16 / 6.84)       4.76 (4.69 / 7.83)
blazeface           1.01 (0.99 / 2.69)       1.19 (1.14 / 7.92)       1.04 (1.02 / 2.73)
googlenet           8.51 (7.98 / 34.51)      9.49 (9.19 / 15.31)      8.47 (7.96 / 20.87)
vgg16               35.80 (32.19 / 125.49)   35.20 (32.79 / 90.63)    33.45 (32.29 / 53.44)
resnet18            6.55 (6.25 / 31.85)      6.82 (6.38 / 37.9)       6.46 (5.71 / 55.37)
alexnet             5.31 (4.94 / 7.23)       5.40 (5.04 / 7.04)       5.10 (4.76 / 48.23)
resnet50            14.72 (14.4 / 21.48)     14.65 (14.41 / 20.75)    14.06 (13.35 / 76.9)
mobilenetv2-yolov3  11.33 (11.16 / 18.76)    12.67 (12.24 / 20.51)    11.27 (10.9 / 33.6)
yolov4-tiny         15.57 (15.11 / 21.2)     15.95 (15.06 / 46.73)    15.77 (14.9 / 64.97)
squeezenet_ssd      8.93 (8.8 / 11.44)       9.41 (9.29 / 11.18)      8.69 (8.52 / 14.64)
regnety_400m        8.48 (8.36 / 14.49)      9.30 (9.16 / 13.75)      8.77 (8.62 / 12.87)
vision_transformer  65.73 (59.09 / 105.79)   67.70 (62.75 / 75.81)    64.41 (57 / 128.53)
FastestDet          3.81 (3.75 / 6.11)       4.23 (4.19 / 6.15)       4.12 (4.08 / 5.88)
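
Comparing the two ncnn targets model by model is easier as a ratio of mean latencies. The Python sketch below averages the three runs per target for three models copied from the CPU and Vulkan GPU results above; a ratio above 1.0 means the Vulkan GPU target took longer. The choice of models and of the arithmetic mean is illustrative only.

```python
from statistics import mean

# Per-run averages (a, b, c) in ms, copied from the ncnn results above.
cpu = {
    "resnet18": (5.74, 5.92, 5.63),
    "yolov4-tiny": (16.66, 16.66, 16.23),
    "vgg16": (34.92, 34.34, 34.05),
}
vulkan = {
    "resnet18": (6.55, 6.82, 6.46),
    "yolov4-tiny": (15.57, 15.95, 15.77),
    "vgg16": (35.80, 35.20, 33.45),
}


def vulkan_to_cpu_ratio(model):
    """Mean Vulkan GPU latency divided by mean CPU latency (> 1.0: Vulkan slower)."""
    return mean(vulkan[model]) / mean(cpu[model])


for model in cpu:
    print(f"{model}: Vulkan/CPU latency ratio {vulkan_to_cpu_ratio(model):.2f}")
```

For these inputs the ratios come out at about 1.15 (resnet18), 0.95 (yolov4-tiny) and 1.01 (vgg16), so for these three models the Vulkan GPU target is not consistently faster than the CPU target on this system.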

Llama.cpp b4397 - Backend: CPU BLAS (tokens per second, more is better)

Model                               Test                            a        b        c
Llama-3.1-Tulu-3-8B-Q8_0            Text Generation 128         10.16    10.17    10.12
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 512       34.67    37.63    37.59
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 1024      30.41    31.93    33.19
Llama-3.1-Tulu-3-8B-Q8_0            Prompt Processing 2048      29.61    31.04    30.88
Mistral-7B-Instruct-v0.3-Q8_0       Text Generation 128         10.26    10.35    10.36
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 512       31.23    33.32    33.20
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 1024      30.77    32.50    32.44
Mistral-7B-Instruct-v0.3-Q8_0       Prompt Processing 2048      29.70    30.28    30.57
granite-3.0-3b-a800m-instruct-Q8_0  Text Generation 128         53.26    53.94    54.39
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 512      124.53   138.47   136.99
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 1024     122.15   151.70   144.13
granite-3.0-3b-a800m-instruct-Q8_0  Prompt Processing 2048     114.91   137.10   135.14

Compiled with (CXX) g++ options: -O3
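
Throughput in tokens per second can be easier to reason about as time per token. The Python sketch below converts the run-a Llama-3.1-Tulu-3-8B-Q8_0 figures from the llama.cpp results above. It assumes, as in llama.cpp's llama-bench, that "Prompt Processing N" means an N-token prompt and "Text Generation 128" means 128 generated tokens, and that the rate stays constant over the whole test, which is only an approximation.

```python
# Run-a figures for Llama-3.1-Tulu-3-8B-Q8_0: test name -> (token count, tokens per second).
# Token counts are assumed from the test names (see the note above).
tulu_run_a = {
    "Text Generation 128": (128, 10.16),
    "Prompt Processing 512": (512, 34.67),
    "Prompt Processing 1024": (1024, 30.41),
    "Prompt Processing 2048": (2048, 29.61),
}


def ms_per_token(tokens_per_second):
    """Average time per token in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for(n_tokens, tokens_per_second):
    """Approximate wall time for n_tokens, assuming a constant rate."""
    return n_tokens / tokens_per_second


for test, (n, tps) in tulu_run_a.items():
    print(f"{test}: {ms_per_token(tps):.1f} ms/token, ~{seconds_for(n, tps):.1f} s for {n} tokens")
```

For example, 10.16 tokens per second of text generation corresponds to roughly 98 ms per generated token, and a 2048-token prompt at 29.61 tokens per second takes about 69 s.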
Phoronix Test Suite v10.8.5