aorus-llama-cpp AMD Ryzen 9 9950X 16-Core testing with a Gigabyte X870 AORUS ELITE WIFI7 (F3h BIOS) and Gigabyte NVIDIA GeForce RTX 4090 24GB on openSUSE Leap 15.6 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2502038-NE-AORUSLLAM92.
aorus-llama-cpp

System details:
  Processor:          AMD Ryzen 9 9950X 16-Core @ 4.30GHz (16 Cores / 32 Threads)
  Motherboard:        Gigabyte X870 AORUS ELITE WIFI7 (F3h BIOS)
  Chipset:            AMD Raphael/Granite
  Memory:             4 x 32 GB DDR5-3600MT/s CMH64GX5M2B6000C38
  Disk:               1000GB Samsung SSD 970 EVO Plus 1TB + 2 x 2000GB Samsung SSD 870
  Graphics:           Gigabyte NVIDIA GeForce RTX 4090 24GB
  Audio:              NVIDIA AD102 HD Audio
  Monitor:            SyncMaster
  Network:            Realtek RTL8125 2.5GbE + MEDIATEK Device 7925
  OS:                 openSUSE Leap 15.6
  Kernel:             6.4.0-150600.23.30-default (x86_64)
  Display Server:     X Server 1.21.1.11
  Display Driver:     NVIDIA
  Compiler:           GCC 11.3.0 + CUDA 12.6
  File-System:        btrfs
  Screen Resolution:  1280x1024

OpenBenchmarking.org notes:
  Transparent Huge Pages: always
  Compiler configuration: --build=x86_64-suse-linux --disable-libcc1 --disable-libssp --disable-libstdcxx-pch --disable-libvtv --disable-plugin --disable-werror --enable-cet=auto --enable-checking=release --enable-gnu-indirect-function --enable-languages=c,c++,objc,fortran,obj-c++,ada,go,d --enable-libphobos --enable-libstdcxx-allocator=new --enable-linux-futex --enable-multilib --enable-offload-targets=nvptx-none, --enable-ssp --enable-version-specific-runtime-libs --host=x86_64-suse-linux --mandir=/usr/share/man --with-arch-32=x86-64 --with-gcc-major-version-only --with-slibdir=/lib64 --with-tune=generic --without-cuda-driver --without-system-libunwind
  Scaling Governor: acpi-cpufreq ondemand (Boost: Enabled)
  CPU Microcode: 0xb404023
  Security mitigations:
    gather_data_sampling: Not affected
    itlb_multihit: Not affected
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    mmio_stale_data: Not affected
    reg_file_data_sampling: Not affected
    retbleed: Not affected
    spec_rstack_overflow: Not affected
    spec_store_bypass: Vulnerable
    spectre_v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
    spectre_v2: Vulnerable; IBPB: disabled; STIBP: disabled; PBRSB-eIBRS: Not affected; BHI: Not affected
    srbds: Not affected
    tsx_async_abort: Not affected
aorus-llama-cpp results summary (Tokens Per Second; higher is better):

  Backend      Model                                Test                     Tokens/s
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Text Generation 128          5.70
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 512       29.63
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 1024      31.09
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 2048      31.85
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Text Generation 128        100.83
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 512    11536.45
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 1024   10755.70
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 2048    9540.82
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Text Generation 128          6.02
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 512       29.62
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 1024      30.91
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 2048      32.02
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Text Generation 128        105.74
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 512    11646.75
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 1024   10797.67
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 2048    9555.98
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Text Generation 128         42.74
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 512       48.71
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 1024      51.51
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 2048      52.34
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Text Generation 128        137.78
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 512     4793.60
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 1024    4752.53
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 2048    4573.06
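As a quick reading aid for the numbers above, the GPU-over-CPU speedup for the Text Generation 128 test can be computed directly from the reported tokens-per-second values. This snippet is illustrative only and is not part of the original export:

```python
# GPU (NVIDIA CUDA) vs CPU (BLAS) speedup for Text Generation 128,
# using the tokens-per-second values reported in this result file.
results = {
    "Llama-3.1-Tulu-3-8B-Q8_0": {"cpu": 5.70, "cuda": 100.83},
    "Mistral-7B-Instruct-v0.3-Q8_0": {"cpu": 6.02, "cuda": 105.74},
    "granite-3.0-3b-a800m-instruct-Q8_0": {"cpu": 42.74, "cuda": 137.78},
}

for model, r in results.items():
    speedup = r["cuda"] / r["cpu"]
    print(f"{model}: {speedup:.1f}x faster on the RTX 4090")
```

The two 8B Q8_0 models see roughly an order-of-magnitude gain on the GPU, while the small granite MoE model is already fast enough on the CPU that the gap narrows considerably.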
Detailed per-test results (Llama.cpp b4397; OpenBenchmarking.org; Tokens Per Second, more is better; each result is the average of N = 3 runs; compiled with (CXX) g++ -O3):

  Backend      Model                                Test                     Tokens/s   SE +/-
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Text Generation 128          5.70     0.00
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 512       29.63     0.06
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 1024      31.09     0.06
  CPU BLAS     Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 2048      31.85     0.13
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Text Generation 128        100.83     0.02
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 512    11536.45     4.04
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 1024   10755.70     9.57
  NVIDIA CUDA  Llama-3.1-Tulu-3-8B-Q8_0             Prompt Processing 2048    9540.82     2.82
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Text Generation 128          6.02     0.00
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 512       29.62     0.06
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 1024      30.91     0.09
  CPU BLAS     Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 2048      32.02     0.14
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Text Generation 128        105.74     0.04
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 512    11646.75     1.40
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 1024   10797.67     4.21
  NVIDIA CUDA  Mistral-7B-Instruct-v0.3-Q8_0        Prompt Processing 2048    9555.98     1.50
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Text Generation 128         42.74     0.09
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 512       48.71     0.27
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 1024      51.51     0.01
  CPU BLAS     granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 2048      52.34     0.13
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Text Generation 128        137.78     0.59
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 512     4793.60     9.24
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 1024    4752.53    11.92
  NVIDIA CUDA  granite-3.0-3b-a800m-instruct-Q8_0   Prompt Processing 2048    4573.06    11.17
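One pattern visible in the prompt-processing results: on the NVIDIA CUDA backend, throughput declines as the prompt grows from 512 to 2048 tokens, while the CPU BLAS backend holds roughly steady. The relative drop can be computed from the reported values (illustrative snippet, not part of the original export):

```python
# Relative CUDA prompt-processing throughput decline from 512- to
# 2048-token prompts, using the tokens-per-second values reported above.
pp = {
    "Llama-3.1-Tulu-3-8B-Q8_0": (11536.45, 9540.82),
    "Mistral-7B-Instruct-v0.3-Q8_0": (11646.75, 9555.98),
    "granite-3.0-3b-a800m-instruct-Q8_0": (4793.60, 4573.06),
}

for model, (pp512, pp2048) in pp.items():
    drop = (1 - pp2048 / pp512) * 100
    print(f"{model}: {drop:.1f}% lower throughput at 2048-token prompts")
```

Both 8B models lose roughly 17-18% throughput at the longest prompt length tested, while the 3B granite model loses under 5%.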
Phoronix Test Suite v10.8.5