NVIDIA LLAMA.CPP

Tests for a future article. Intel Core Ultra 9 285K testing with an ASUS ROG MAXIMUS Z890 HERO (1203 BIOS) and ASUS NVIDIA GeForce RTX 5090 32GB on Ubuntu 24.10 via the Phoronix Test Suite.

RTX 5090:

  Processor: Intel Core Ultra 9 285K @ 5.10GHz (24 Cores), Motherboard: ASUS ROG MAXIMUS Z890 HERO (1203 BIOS), Chipset: Intel Device ae7f, Memory: 2 x 16GB DDR5-6400MT/s Micron CP16G64C38U5B.M8D1, Disk: 4001GB Western Digital WD_BLACK SN850X 4000GB + 1000GB Western Digital WDS100T1X0E-00AFY0, Graphics: ASUS NVIDIA GeForce RTX 5090 32GB, Audio: Intel Device 7f50, Monitor: ASUS VP28U, Network: Realtek Device 8126 + Intel I226-V + Intel Wi-Fi 7

  OS: Ubuntu 24.10, Kernel: 6.11.0-13-generic (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server 1.21.1.13, Display Driver: NVIDIA 570.86.10, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.8.51 + OpenCL 3.0, Compiler: GCC 14.2.0 + CUDA 12.8, File-System: ext4, Screen Resolution: 3840x2160

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
RTX 5090 . 158.57 |============================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
RTX 5090 . 12730.90 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
RTX 5090 . 12362.00 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
RTX 5090 . 11198.06 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
RTX 5090 . 166.28 |============================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
RTX 5090 . 12859.57 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
RTX 5090 . 12357.95 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
RTX 5090 . 11234.62 |==========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
RTX 5090 . 107.50 |============================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
RTX 5090 . 4166.56 |===========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
RTX 5090 . 4150.99 |===========================================================

Llama.cpp b4397
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
RTX 5090 . 4099.51 |===========================================================