ncnn llama

Intel Core Ultra 7 155H testing with a MTL Swift SFG14-72T Coral_MTH (V1.01 BIOS) and Intel Arc MTL 8GB on Ubuntu 24.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412304-NE-NCNNLLAMA37&sro.

ncnn llamaProcessorMotherboardChipsetMemoryDiskGraphicsAudioNetworkOSKernelDesktopDisplay ServerOpenGLOpenCLCompilerFile-SystemScreen ResolutionabcIntel Core Ultra 7 155H @ 4.80GHz (16 Cores / 22 Threads)MTL Swift SFG14-72T Coral_MTH (V1.01 BIOS)Intel Device 7e7f8 x 2GB LPDDR5-6400MT/s Micron MT62F1G32D2DS-0261024GB Micron_2550_MTFDKBA1T0TGEIntel Arc MTL 8GBIntel Meteor Lake-P HD AudioIntel Meteor Lake PCH CNVi WiFiUbuntu 24.106.11.0-rc6-phx (x86_64)GNOME Shell 47.0X Server + Wayland4.6 Mesa 25.0~git2411250600.45c523~oibaf~o (git-45c5231 2024-11-25 oracular-oibaf-ppOpenCL 3.0GCC 14.2.0ext43840x1200OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x1e - Thermald 2.5.8Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected

ncnn llamancnn: CPU - mobilenetncnn: CPU-v2-v2 - mobilenet-v2ncnn: CPU-v3-v3 - mobilenet-v3ncnn: CPU - shufflenet-v2ncnn: CPU - mnasnetncnn: CPU - efficientnet-b0ncnn: CPU - blazefacencnn: CPU - googlenetncnn: CPU - vgg16ncnn: CPU - resnet18ncnn: CPU - alexnetncnn: CPU - resnet50ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3ncnn: CPU - yolov4-tinyncnn: CPU - squeezenet_ssdncnn: CPU - regnety_400mncnn: CPU - vision_transformerncnn: CPU - FastestDetncnn: Vulkan GPU - mobilenetncnn: Vulkan GPU-v2-v2 - mobilenet-v2ncnn: Vulkan GPU-v3-v3 - mobilenet-v3ncnn: Vulkan GPU - shufflenet-v2ncnn: Vulkan GPU - mnasnetncnn: Vulkan GPU - efficientnet-b0ncnn: Vulkan GPU - blazefacencnn: Vulkan GPU - googlenetncnn: Vulkan GPU - vgg16ncnn: Vulkan GPU - resnet18ncnn: Vulkan GPU - alexnetncnn: Vulkan GPU - resnet50ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3ncnn: Vulkan GPU - yolov4-tinyncnn: Vulkan GPU - squeezenet_ssdncnn: Vulkan GPU - regnety_400mncnn: Vulkan GPU - vision_transformerncnn: Vulkan GPU - FastestDetllama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048abc16.757.687.786.747.4412.834.4914.4736.888.976.3921.4216.7520.1112.9642.43148.179.1516.577.748.037.017.8913.254.4914.4536.888.876.4421.416.5720.0313.141.53147.629.277.320.8820.4819.937.5120.5220.3519.8522.9188.0982.0475.5616.67.697.716.777.3912.364.4114.2836.788.986.2921.2116.620.0212.8139.9150.679.216.697.627.956.897.6713.144.4614.4836.639.026.4321.2916.6920.0213.0841.56147.559.237.3920.8720.6120.157.5720.7920.3919.8823.2588.3182.1776.1316.697.727.826.797.4312.354.4214.436.558.876.3921.3716.6919.7613.0240.4149.749.1816.797.747.866.87.6712.994.4814.4436.898.836.3721.4416.7919.8913.0541.71148.259.37.3420.7820.5819.867.5320.6220.3819.723.3389.1982.0175.56OpenBenchmarking.org

NCNN

Target: CPU - Model: mobilenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mobilenetabc4812162016.7516.6016.69MIN: 14.93 / MAX: 20.88MIN: 14.51 / MAX: 21.62MIN: 14.67 / MAX: 22.071. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v2-v2 - Model: mobilenet-v2abc2468107.687.697.72MIN: 6.41 / MAX: 11.21MIN: 6.45 / MAX: 10.36MIN: 6.42 / MAX: 9.331. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v3-v3 - Model: mobilenet-v3abc2468107.787.717.82MIN: 6.43 / MAX: 10.34MIN: 6.41 / MAX: 10.93MIN: 6.59 / MAX: 10.21. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: shufflenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: shufflenet-v2abc2468106.746.776.79MIN: 6.38 / MAX: 7.47MIN: 6.39 / MAX: 7.29MIN: 6.05 / MAX: 9.411. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: mnasnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mnasnetabc2468107.447.397.43MIN: 6.33 / MAX: 8.18MIN: 6.52 / MAX: 10.84MIN: 6.46 / MAX: 10.631. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: efficientnet-b0

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: efficientnet-b0abc369121512.8312.3612.35MIN: 11.4 / MAX: 16.56MIN: 10.91 / MAX: 17.1MIN: 10.5 / MAX: 17.241. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: blazeface

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: blazefaceabc1.01032.02063.03094.04125.05154.494.414.42MIN: 4.22 / MAX: 4.83MIN: 3.55 / MAX: 6.6MIN: 3.64 / MAX: 5.891. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: googlenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: googlenetabc4812162014.4714.2814.40MIN: 13.56 / MAX: 19.51MIN: 13.29 / MAX: 19.25MIN: 13.45 / MAX: 20.131. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vgg16

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vgg16abc81624324036.8836.7836.55MIN: 33.2 / MAX: 42.02MIN: 32.94 / MAX: 42.53MIN: 33.17 / MAX: 41.661. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet18

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet18abc36912158.978.988.87MIN: 8.17 / MAX: 12.1MIN: 8.19 / MAX: 12.72MIN: 8.21 / MAX: 12.631. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: alexnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: alexnetabc2468106.396.296.39MIN: 5.46 / MAX: 9.11MIN: 5.57 / MAX: 8.75MIN: 5.77 / MAX: 8.361. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: resnet50

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet50abc51015202521.4221.2121.37MIN: 20.05 / MAX: 26.84MIN: 19.75 / MAX: 25.95MIN: 20.01 / MAX: 25.981. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3abc4812162016.7516.6016.69MIN: 14.93 / MAX: 20.88MIN: 14.51 / MAX: 21.62MIN: 14.67 / MAX: 22.071. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: yolov4-tiny

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: yolov4-tinyabc51015202520.1120.0219.76MIN: 17.71 / MAX: 27.12MIN: 17.01 / MAX: 26.57MIN: 17.47 / MAX: 26.611. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: squeezenet_ssd

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: squeezenet_ssdabc369121512.9612.8113.02MIN: 11.92 / MAX: 16.61MIN: 11.75 / MAX: 15.98MIN: 11.97 / MAX: 17.191. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: regnety_400m

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: regnety_400mabc102030405042.4339.9040.40MIN: 36.44 / MAX: 54.34MIN: 34.56 / MAX: 43.09MIN: 35.22 / MAX: 44.821. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: vision_transformer

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vision_transformerabc306090120150148.17150.67149.74MIN: 115.2 / MAX: 184.69MIN: 115.09 / MAX: 182.83MIN: 115.38 / MAX: 192.661. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: CPU - Model: FastestDet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: FastestDetabc36912159.159.209.18MIN: 7.81 / MAX: 10.38MIN: 7.55 / MAX: 11.72MIN: 7.59 / MAX: 9.961. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: mobilenetabc4812162016.5716.6916.79MIN: 14.41 / MAX: 20.36MIN: 14.89 / MAX: 19.78MIN: 14.77 / MAX: 21.671. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2abc2468107.747.627.74MIN: 6.54 / MAX: 9.96MIN: 6.4 / MAX: 11.61MIN: 6.46 / MAX: 11.781. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3abc2468108.037.957.86MIN: 6.85 / MAX: 12.15MIN: 6.62 / MAX: 11.98MIN: 6.8 / MAX: 11.681. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: shufflenet-v2abc2468107.016.896.80MIN: 5.71 / MAX: 10.26MIN: 5.74 / MAX: 10.47MIN: 5.75 / MAX: 9.761. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: mnasnetabc2468107.897.677.67MIN: 6.21 / MAX: 12.29MIN: 6.59 / MAX: 11.5MIN: 6.39 / MAX: 11.691. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: efficientnet-b0abc369121513.2513.1412.99MIN: 11.37 / MAX: 16.4MIN: 11.23 / MAX: 17.39MIN: 11.41 / MAX: 15.941. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: blazefaceabc1.01032.02063.03094.04125.05154.494.464.48MIN: 3.6 / MAX: 6.53MIN: 4.21 / MAX: 5MIN: 4.29 / MAX: 4.881. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: googlenetabc4812162014.4514.4814.44MIN: 13.51 / MAX: 20.97MIN: 13.55 / MAX: 18.16MIN: 13.47 / MAX: 17.851. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: vgg16abc81624324036.8836.6336.89MIN: 33.67 / MAX: 42.46MIN: 33.11 / MAX: 43.2MIN: 33.19 / MAX: 43.261. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: resnet18abc36912158.879.028.83MIN: 8.26 / MAX: 12.35MIN: 8.25 / MAX: 13.6MIN: 8.12 / MAX: 12.051. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: alexnetabc2468106.446.436.37MIN: 5.72 / MAX: 8.52MIN: 5.74 / MAX: 8.68MIN: 5.51 / MAX: 8.641. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: resnet50abc51015202521.4021.2921.44MIN: 20.34 / MAX: 26.47MIN: 19.81 / MAX: 26.44MIN: 20.23 / MAX: 25.811. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3abc4812162016.5716.6916.79MIN: 14.41 / MAX: 20.36MIN: 14.89 / MAX: 19.78MIN: 14.77 / MAX: 21.671. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: yolov4-tinyabc51015202520.0320.0219.89MIN: 17.73 / MAX: 25.71MIN: 17.52 / MAX: 25.48MIN: 17.8 / MAX: 26.151. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: squeezenet_ssdabc369121513.1013.0813.05MIN: 12.01 / MAX: 17.22MIN: 11.88 / MAX: 16.69MIN: 11.97 / MAX: 16.691. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: regnety_400mabc102030405041.5341.5641.71MIN: 34.82 / MAX: 45.86MIN: 35.3 / MAX: 44.58MIN: 35.35 / MAX: 44.461. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: vision_transformerabc306090120150147.62147.55148.25MIN: 116.91 / MAX: 177.85MIN: 116.1 / MAX: 182.72MIN: 117.01 / MAX: 183.441. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: Vulkan GPU - Model: FastestDetabc36912159.279.239.30MIN: 7.32 / MAX: 11.69MIN: 7.72 / MAX: 11.53MIN: 8.55 / MAX: 10.041. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128abc2468107.307.397.341. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512abc51015202520.8820.8720.781. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024abc51015202520.4820.6120.581. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048abc51015202519.9320.1519.861. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128abc2468107.517.577.531. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512abc51015202520.5220.7920.621. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024abc51015202520.3520.3920.381. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048abc51015202519.8519.8819.701. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128abc61218243022.9123.2523.331. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512abc2040608010088.0988.3189.191. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024abc2040608010082.0482.1782.011. (CXX) g++ options: -O3

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048abc2040608010075.5676.1375.561. (CXX) g++ options: -O3


Phoronix Test Suite v10.8.5