llama cpp update

AMD Ryzen Threadripper 7980X 64-Cores testing with a System76 Thelio Major (FA Z5 BIOS) and AMD Radeon RX 6700 XT 12GB on Ubuntu 24.04 via the Phoronix Test Suite.

a:

  Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads), Motherboard: System76 Thelio Major (FA Z5 BIOS), Chipset: AMD Device 14a4, Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2, Disk: 1000GB CT1000T700SSD5, Graphics: AMD Radeon RX 6700 XT 12GB, Audio: AMD Device 14cc, Monitor: DELL P2415Q, Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E

  OS: Ubuntu 24.04, Kernel: 6.8.0-48-generic (x86_64), Desktop: GNOME Shell 46.0, Display Server: X Server + Wayland, OpenGL: 4.6 Mesa 24.0.9-0ubuntu0.2 (LLVM 17.0.6 DRM 3.57), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

b, c:

  Same hardware and software configuration as a.

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 15.54 |====================================================================
b . 15.50 |====================================================================
c . 15.60 |====================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 102.05 |=================================================================
b . 103.50 |==================================================================
c . 104.43 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 123.08 |==================================================================
b . 124.40 |===================================================================
c . 122.97 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 154.68 |===================================================================
b . 153.21 |==================================================================
c . 153.51 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 16.46 |====================================================================
b . 16.52 |====================================================================
c . 16.45 |====================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 103.99 |==================================================================
b . 105.14 |===================================================================
c . 103.32 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 126.34 |=================================================================
b . 129.31 |===================================================================
c . 126.69 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 149.47 |=================================================================
b . 149.65 |=================================================================
c . 154.26 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 76.34 |====================================================================
b . 76.53 |====================================================================
c . 76.56 |====================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 249.15 |===================================================================
b . 250.07 |===================================================================
c . 250.38 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 328.55 |================================================================
b . 329.89 |=================================================================
c . 341.69 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 418.96 |===================================================================
b . 405.82 |=================================================================
c . 406.84 |=================================================================