ncnn llama ryzen ai

AMD Ryzen AI 9 HX 370 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412295-NE-NCNNLLAMA93&sor.

ncnn llama ryzen aiProcessorMotherboardChipsetMemoryDiskGraphicsAudioNetworkOSKernelDesktopDisplay ServerOpenGLCompilerFile-SystemScreen ResolutionabcAMD Ryzen AI 9 HX 370 @ 4.37GHz (12 Cores / 24 Threads)ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS)AMD Device 15074 x 8GB LPDDR5-7500MT/s Samsung K3KL9L90CM-MGCT1024GB MTFDKBA1T0QFM-1BD1AABGBAMD Radeon 512MBAMD Rembrandt Radeon HD AudioMEDIATEK Device 7925Ubuntu 24.106.11.0-rc6-phx (x86_64)GNOME Shell 47.0X Server + Wayland4.6 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 DRM 3.58)GCC 14.2.0ext42880x1800OpenBenchmarking.orgKernel Details- amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

ncnn llama ryzen aincnn: CPU - mobilenetncnn: CPU-v2-v2 - mobilenet-v2ncnn: CPU-v3-v3 - mobilenet-v3ncnn: CPU - shufflenet-v2ncnn: CPU - mnasnetncnn: CPU - efficientnet-b0ncnn: CPU - blazefacencnn: CPU - googlenetncnn: CPU - vgg16ncnn: CPU - resnet18ncnn: CPU - alexnetncnn: CPU - resnet50ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3ncnn: CPU - yolov4-tinyncnn: CPU - squeezenet_ssdncnn: CPU - regnety_400mncnn: CPU - vision_transformerncnn: CPU - FastestDetncnn: Vulkan GPU - mobilenetncnn: Vulkan GPU-v2-v2 - mobilenet-v2ncnn: Vulkan GPU-v3-v3 - mobilenet-v3ncnn: Vulkan GPU - shufflenet-v2ncnn: Vulkan GPU - mnasnetncnn: Vulkan GPU - efficientnet-b0ncnn: Vulkan GPU - blazefacencnn: Vulkan GPU - googlenetncnn: Vulkan GPU - vgg16ncnn: Vulkan GPU - resnet18ncnn: Vulkan GPU - alexnetncnn: Vulkan GPU - resnet50ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3ncnn: Vulkan GPU - yolov4-tinyncnn: Vulkan GPU - squeezenet_ssdncnn: Vulkan GPU - regnety_400mncnn: Vulkan GPU - vision_transformerncnn: Vulkan GPU - FastestDetllama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048abc11.614.232.972.572.994.5718.0234.925.744.6513.7911.6116.668.118.4464.653.2811.334.13.022.623.034.561.018.5135.86.555.3114.7211.3315.578.938.4865.733.8110.1634.6730.4129.6110.2631.2330.7729.753.26124.53122.15114.9111.494.053.032.593.034.541.098.2134.345.924.7914.0311.4916.668.338.9667.134.1312.674.543.412.963.655.221.199.4935.26.825.414.6512.6715.959.419.367.74.2310.1737.6331.9331.0410.3533.3232.530.2853.94138.47151.7137.111.724.133.022.573.054.5917.8534.055.634.3914.3511.7216.238.38.6863.273.8111.274.113.132.693.054.761.048.4733.456.465.114.0611.2715.778.698.7764.414.1210.1237.5933.1930.8810.3633.232.4430.5754.39136.99144.13135.14OpenBenchmarking.org

NCNN

Target: CPU - Model: mobilenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mobilenetbac369121511.4911.6111.72MIN: 11.36 / MAX: 14.39MIN: 11.1 / MAX: 23.56MIN: 11.3 / MAX: 35.171. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v2-v2 - Model: mobilenet-v2bca0.95181.90362.85543.80724.7594.054.134.23MIN: 3.41 / MAX: 5.99MIN: 3.49 / MAX: 5.81MIN: 3.44 / MAX: 17.581. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v3-v3 - Model: mobilenet-v3acb0.68181.36362.04542.72723.4092.973.023.03MIN: 2.92 / MAX: 4.63MIN: 2.96 / MAX: 4.65MIN: 2.94 / MAX: 8.811. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: shufflenet-v2acb0.58281.16561.74842.33122.9142.572.572.59MIN: 2.49 / MAX: 7.42MIN: 2.53 / MAX: 4.41MIN: 2.55 / MAX: 4.081. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mnasnetabc0.68631.37262.05892.74523.43152.993.033.05MIN: 2.93 / MAX: 4.4MIN: 2.99 / MAX: 4.54MIN: 2.97 / MAX: 4.461. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: efficientnet-b0bac1.03282.06563.09844.13125.1644.544.574.59MIN: 4.48 / MAX: 6.51MIN: 4.5 / MAX: 8.31MIN: 4.52 / MAX: 7.561. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: blazefaceacb0.24530.49060.73590.98121.22651.001.001.09MIN: 0.98 / MAX: 1.32MIN: 0.98 / MAX: 2.6MIN: 1.05 / MAX: 6.921. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: googlenetcab2468107.858.028.21MIN: 7.54 / MAX: 40.26MIN: 7.65 / MAX: 34.96MIN: 8.04 / MAX: 20.871. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vgg16cba81624324034.0534.3434.92MIN: 32.12 / MAX: 65.41MIN: 31.53 / MAX: 79.85MIN: 33.21 / MAX: 71.541. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet18cab1.3322.6643.9965.3286.665.635.745.92MIN: 5.5 / MAX: 7.64MIN: 5.6 / MAX: 7.56MIN: 5.44 / MAX: 24.291. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: alexnetcab1.07782.15563.23344.31125.3894.394.654.79MIN: 4.12 / MAX: 4.64MIN: 4.2 / MAX: 22.47MIN: 4.42 / MAX: 10.51. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet50abc4812162013.7914.0314.35MIN: 13.4 / MAX: 28.04MIN: 13.74 / MAX: 20.15MIN: 14.17 / MAX: 16.671. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3bac369121511.4911.6111.72MIN: 11.36 / MAX: 14.39MIN: 11.1 / MAX: 23.56MIN: 11.3 / MAX: 35.171. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: yolov4-tinycab4812162016.2316.6616.66MIN: 15.4 / MAX: 25.68MIN: 14.94 / MAX: 60.31MIN: 16.04 / MAX: 42.411. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: squeezenet_ssdacb2468108.118.308.33MIN: 7.96 / MAX: 14.05MIN: 8.21 / MAX: 9.81MIN: 8.2 / MAX: 10.671. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: regnety_400macb36912158.448.688.96MIN: 8.34 / MAX: 10.28MIN: 8.33 / MAX: 62.13MIN: 8.41 / MAX: 68.081. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vision_transformer

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vision_transformercab153045607563.2764.6567.13MIN: 61.33 / MAX: 84.29MIN: 58.81 / MAX: 107.05MIN: 62.34 / MAX: 114.311. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: FastestDet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: FastestDetacb0.92931.85862.78793.71724.64653.283.814.13MIN: 3.22 / MAX: 7.63MIN: 3.79 / MAX: 4.18MIN: 4.1 / MAX: 5.771. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: mobilenetcab369121511.2711.3312.67MIN: 10.9 / MAX: 33.6MIN: 11.16 / MAX: 18.76MIN: 12.24 / MAX: 20.511. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2acb1.02152.0433.06454.0865.10754.104.114.54MIN: 3.4 / MAX: 10.88MIN: 3.46 / MAX: 6.51MIN: 3.98 / MAX: 6.731. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3acb0.76731.53462.30193.06923.83653.023.133.41MIN: 2.97 / MAX: 4.46MIN: 3.06 / MAX: 4.97MIN: 3.34 / MAX: 5.151. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: shufflenet-v2acb0.6661.3321.9982.6643.332.622.692.96MIN: 2.56 / MAX: 5.99MIN: 2.66 / MAX: 3.78MIN: 2.92 / MAX: 4.51. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: mnasnetacb0.82131.64262.46393.28524.10653.033.053.65MIN: 2.95 / MAX: 6.34MIN: 2.97 / MAX: 8.74MIN: 3.59 / MAX: 5.281. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: efficientnet-b0acb1.17452.3493.52354.6985.87254.564.765.22MIN: 4.49 / MAX: 6.26MIN: 4.69 / MAX: 7.83MIN: 5.16 / MAX: 6.841. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: blazefaceacb0.26780.53560.80341.07121.3391.011.041.19MIN: 0.99 / MAX: 2.69MIN: 1.02 / MAX: 2.73MIN: 1.14 / MAX: 7.921. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: googlenetcab36912158.478.519.49MIN: 7.96 / MAX: 20.87MIN: 7.98 / MAX: 34.51MIN: 9.19 / MAX: 15.311. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: vgg16cba81624324033.4535.2035.80MIN: 32.29 / MAX: 53.44MIN: 32.79 / MAX: 90.63MIN: 32.19 / MAX: 125.491. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: resnet18cab2468106.466.556.82MIN: 5.71 / MAX: 55.37MIN: 6.25 / MAX: 31.85MIN: 6.38 / MAX: 37.91. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: alexnetcab1.2152.433.6454.866.0755.105.315.40MIN: 4.76 / MAX: 48.23MIN: 4.94 / MAX: 7.23MIN: 5.04 / MAX: 7.041. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: resnet50cba4812162014.0614.6514.72MIN: 13.35 / MAX: 76.9MIN: 14.41 / MAX: 20.75MIN: 14.4 / MAX: 21.481. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3cab369121511.2711.3312.67MIN: 10.9 / MAX: 33.6MIN: 11.16 / MAX: 18.76MIN: 12.24 / MAX: 20.511. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: yolov4-tinyacb4812162015.5715.7715.95MIN: 15.11 / MAX: 21.2MIN: 14.9 / MAX: 64.97MIN: 15.06 / MAX: 46.731. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: squeezenet_ssdcab36912158.698.939.41MIN: 8.52 / MAX: 14.64MIN: 8.8 / MAX: 11.44MIN: 9.29 / MAX: 11.181. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: regnety_400macb36912158.488.779.30MIN: 8.36 / MAX: 14.49MIN: 8.62 / MAX: 12.87MIN: 9.16 / MAX: 13.751. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: vision_transformercab153045607564.4165.7367.70MIN: 57 / MAX: 128.53MIN: 59.09 / MAX: 105.79MIN: 62.75 / MAX: 75.811. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: FastestDetacb0.95181.90362.85543.80724.7593.814.124.23MIN: 3.75 / MAX: 6.11MIN: 4.08 / MAX: 5.88MIN: 4.19 / MAX: 6.151. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128bac369121510.1710.1610.121. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512bca91827364537.6337.5934.671. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024cba81624324033.1931.9330.411. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048bca71421283531.0430.8829.611. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128cba369121510.3610.3510.261. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512bca81624324033.3233.2031.231. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024bca81624324032.5032.4430.771. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048cba71421283530.5730.2829.701. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128cba122436486054.3953.9453.261. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512bca306090120150138.47136.99124.531. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024bca306090120150151.70144.13122.151. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048bca306090120150137.10135.14114.911. (CXX) g++ options: -O3


Phoronix Test Suite v10.8.5