llama
AMD Ryzen 7 9800X3D 8-Core testing with an ASRock X870E Taichi (3.12.AS02 BIOS) and AMD Radeon PRO W7500 8GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412052-PTS-LLAMA11353&gru&sro .
llama: system details
Runs: a, b, AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB, c (one shared configuration is listed for all four)

Processor: AMD Ryzen 7 9800X3D 8-Core @ 6.22GHz (8 Cores / 16 Threads)
Motherboard: ASRock X870E Taichi (3.12.AS02 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DDR5-6000MT/s F5-6000J2836G16G
Disk: Western Digital WD_BLACK SN850X 2000GB
Graphics: AMD Radeon PRO W7500 8GB
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Realtek Device 8126 + MEDIATEK Device 0717
OS: Ubuntu 24.04
Kernel: 6.8.0-49-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server 1.21.1.11 + Wayland
OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.58)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
- CPU Microcode: 0xb404023

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
llama: results overview
Values for each test are listed in run order: a | b | AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB | c

llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16: 21.47 | 20.78 | 20.77 | 20.79
llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128: 21.42 | 21.42 | 21.43 | 21.43
llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256: 4096 | 4096 | 4096 | 4096
llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512: 8192 | 8192 | 8192 | 8192
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16: 27.54 | 26.28 | 26.42 | 26.51
llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024: 16384 | 16384 | 16384 | 16384
llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048: 32768 | 32768 | 32768 | 32768
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128: 27.57 | 27.51 | 27.55 | 27.52
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16: 11.19 | 10.94 | 10.94 | 10.95
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256: 4096 | 4096 | 4096 | 4096
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512: 8192 | 8192 | 8192 | 8192
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128: 11.31 | 11.3 | 11.3 | 11.3
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16: 2.09 | 2 | 1.98 | 1.96
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024: 16384 | 16384 | 16384 | 16384
llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048: 32768 | 32768 | 32768 | 32768
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128: 2.1 | 2.1 | 2.1 | 2.1
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 256: 4096 | 4096 | 4096 | 4096
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 512: 8192 | 8192 | 8192 | 8192
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 1024: 16384 | 16384 | 16384 | 16384
llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 2048: 32768 | 32768 | 32768 | 32768
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 256: 1536 | 1536 | 1536 | 1536
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 512: 3072 | 3072 | 3072 | 3072
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 1024: 6144 | 6144 | 6144 | 6144
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 2048: 12288 | 12288 | 12288 | 12288
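One way to read the text generation numbers is to look at how far apart the fastest and slowest of the four runs are for each test. The following is a minimal Python sketch of that arithmetic, using the text generation values from the overview; the function name `spread_percent` is hypothetical and not part of the Phoronix Test Suite.

```python
# Text generation results (tokens per second) copied from the overview,
# in run order: a, b, AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB, c.
text_generation = {
    "Llama-3.2-3B-Instruct.Q6_K - Text Generation 16": [21.47, 20.78, 20.77, 20.79],
    "Llama-3.2-3B-Instruct.Q6_K - Text Generation 128": [21.42, 21.42, 21.43, 21.43],
    "TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16": [27.54, 26.28, 26.42, 26.51],
    "TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128": [27.57, 27.51, 27.55, 27.52],
    "mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16": [11.19, 10.94, 10.94, 10.95],
    "mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128": [11.31, 11.3, 11.3, 11.3],
    "wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16": [2.09, 2.0, 1.98, 1.96],
    "wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128": [2.1, 2.1, 2.1, 2.1],
}


def spread_percent(values):
    """Return the gap between the fastest and slowest run, as a percentage of the slowest."""
    slowest, fastest = min(values), max(values)
    return (fastest - slowest) / slowest * 100.0


if __name__ == "__main__":
    for name, values in text_generation.items():
        print(f"{name}: {spread_percent(values):.2f}% spread")
```

On these numbers, run a is the fastest in every Text Generation 16 test, while the Text Generation 128 results differ by well under 1% across runs.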
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better; SE +/- 0.01, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 20.77; a: 21.47; b: 20.78; c: 20.79

Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 21.43; a: 21.42; b: 21.42; c: 21.43

Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 4096; a: 4096; b: 4096; c: 4096

Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 8192; a: 8192; b: 8192; c: 8192

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 26.42; a: 27.54; b: 26.28; c: 26.51

Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 16384; a: 16384; b: 16384; c: 16384

Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 32768; a: 32768; b: 32768; c: 32768

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 (Tokens Per Second, More Is Better; SE +/- 0.02, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 27.55; a: 27.57; b: 27.51; c: 27.52

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 (Tokens Per Second, More Is Better; SE +/- 0.13, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 10.94; a: 11.19; b: 10.94; c: 10.95

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 4096; a: 4096; b: 4096; c: 4096

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 8192; a: 8192; b: 8192; c: 8192

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 11.30; a: 11.31; b: 11.30; c: 11.30

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 (Tokens Per Second, More Is Better; SE +/- 0.02, N = 12)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 1.98; a: 2.09; b: 2.00; c: 1.96

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 16384; a: 16384; b: 16384; c: 16384

Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 32768; a: 32768; b: 32768; c: 32768

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 2.1; a: 2.1; b: 2.1; c: 2.1

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 4096; a: 4096; b: 4096; c: 4096

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 8192; a: 8192; b: 8192; c: 8192

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 16384; a: 16384; b: 16384; c: 16384

Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 32768; a: 32768; b: 32768; c: 32768

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 1536; a: 1536; b: 1536; c: 1536

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 3072; a: 3072; b: 3072; c: 3072

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 6144; a: 6144; b: 6144; c: 6144

Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better; SE +/- 0.00, N = 3)
AMD Ryzen 7 9800X3D 8-Core - AMD Radeon PRO W7500 8GB: 12288; a: 12288; b: 12288; c: 12288
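The prompt processing figures follow a regular pattern: every reported value is an exact multiple of the prompt length, identical across all four runs and reported with SE +/- 0.00. The Python sketch below only checks that arithmetic against the reported numbers; it makes no claim about why the values come out this way, and the helper name `length_ratios` is hypothetical.

```python
# Reported prompt processing results (tokens per second), keyed by prompt length.
# All four runs report the same value for each test, so one value is kept per test.
prompt_processing = {
    "Llama-3.2-3B-Instruct.Q6_K": {256: 4096, 512: 8192, 1024: 16384, 2048: 32768},
    "TinyLlama-1.1B-Chat-v1.0.BF16": {256: 4096, 512: 8192, 1024: 16384, 2048: 32768},
    "mistral-7b-instruct-v0.2.Q5_K_M": {256: 4096, 512: 8192, 1024: 16384, 2048: 32768},
    "wizardcoder-python-34b-v1.0.Q6_K": {256: 1536, 512: 3072, 1024: 6144, 2048: 12288},
}


def length_ratios(results):
    """Return the set of (reported tokens per second) / (prompt length) ratios."""
    return {tokens_per_second / length for length, tokens_per_second in results.items()}


if __name__ == "__main__":
    for model, results in prompt_processing.items():
        print(f"{model}: ratio(s) {sorted(length_ratios(results))}")
```

Each model yields a single ratio (16 for the three smaller models, 6 for wizardcoder-python-34b), so the reported figure scales linearly with prompt length. That is worth keeping in mind before comparing these prompt processing results with other systems.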
Phoronix Test Suite v10.8.5