Llamafile

Mozilla's Llamafile allows distributing and running large language models (LLMs) as a single self-contained file, with the aim of making open-source LLMs more accessible to developers and end users. Llamafile supports a wide range of models and runs on a variety of CPUs and GPUs, along with other configuration options.
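As a rough illustration of the single-file workflow, the commands below sketch downloading-and-running usage for the model benchmarked on this page. The file name is assumed from the model string in the test configuration; exact release file names and flags may differ by llamafile version.

```shell
# The llamafile is a single executable: mark it runnable, then run it.
# (Assumes mistral-7b-instruct-v0.2.llamafile has already been downloaded.)
chmod +x mistral-7b-instruct-v0.2.llamafile

# One-shot text generation from the command line:
# -p is the prompt, -n caps the number of tokens generated.
./mistral-7b-instruct-v0.2.llamafile -p "Explain quicksort briefly." -n 128

# Alternatively, start the built-in local web server without
# opening a browser window.
./mistral-7b-instruct-v0.2.llamafile --server --nobrowser
```

The same binary runs unmodified across operating systems, which is what makes per-CPU benchmarking like the results below straightforward to reproduce.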


Llamafile 0.8.16

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16

OpenBenchmarking.org metrics for this test profile configuration based on 36 public results since 5 December 2024 with the latest data as of 13 December 2024.

Below is an overview of generalized performance for components with sufficient statistically significant data, based on user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can vary widely; this overview is intended only as general guidance on performance expectations.

| Component                       | Percentile Rank | # Compatible Public Results | Tokens Per Second (Average) |
|---------------------------------|-----------------|-----------------------------|-----------------------------|
| Zen 5 [64 Cores / 128 Threads]  | 96th            | 3                           | 64.6 +/- 0.4                |
| Mid-Tier                        | 75th            |                             | < 23.5                      |
| Zen 4 [64 Cores / 128 Threads]  | 68th            | 4                           | 22.8 +/- 1.0                |
| Zen 5 [16 Cores / 32 Threads]   | 57th            | 4                           | 13.5                        |
| Zen 5 [10 Cores / 20 Threads]   | 51st            | 3                           | 12.7 +/- 1.1                |
| Median                          | 50th            |                             | 12.2                        |
| Zen 5 [8 Cores / 16 Threads]    | 38th            | 4                           | 11.0 +/- 0.1                |
| Zen 4 [8 Cores / 16 Threads]    | 29th            | 3                           | 10.9 +/- 0.4                |
| Zen 4 [12 Cores / 24 Threads]   | 26th            | 4                           | 10.5                        |
| Low-Tier                        | 25th            |                             | < 10.5                      |
| Zen 4 [16 Cores / 32 Threads]   | 10th            | 5                           | 10.2                        |
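A percentile rank like those in the table above expresses what fraction of the public results a given configuration outperforms. The sketch below shows one straightforward way such a rank can be computed; the sample values are illustrative only, not the actual OpenBenchmarking.org dataset.

```python
def percentile_rank(results, value):
    """Percentage of results that the given value outperforms."""
    below = sum(1 for r in results if r < value)
    return 100.0 * below / len(results)

# Hypothetical tokens-per-second values from public uploads
samples = [10.2, 10.5, 10.9, 11.0, 12.2, 12.7, 13.5, 22.8, 64.6]

# Rank a result of 12.2 tokens/second against the sample set
print(percentile_rank(samples, 12.2))
```

Aggregators typically report such ranks only where enough compatible results exist, which is why the table also lists the number of public results behind each row.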