llama cpp update
AMD Ryzen Threadripper 7980X 64-Cores testing with a System76 Thelio Major (FA Z5 BIOS) and AMD Radeon RX 6700 XT 12GB on Ubuntu 24.04 via the Phoronix Test Suite.


a: 

	Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads), Motherboard: System76 Thelio Major (FA Z5 BIOS), Chipset: AMD Device 14a4, Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2, Disk: 1000GB CT1000T700SSD5, Graphics: AMD Radeon RX 6700 XT 12GB, Audio: AMD Device 14cc, Monitor: DELL P2415Q, Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E

	OS: Ubuntu 24.04, Kernel: 6.8.0-48-generic (x86_64), Desktop: GNOME Shell 46.0, Display Server: X Server + Wayland, OpenGL: 4.6 Mesa 24.0.9-0ubuntu0.2 (LLVM 17.0.6 DRM 3.57), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

b: 

	Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads), Motherboard: System76 Thelio Major (FA Z5 BIOS), Chipset: AMD Device 14a4, Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2, Disk: 1000GB CT1000T700SSD5, Graphics: AMD Radeon RX 6700 XT 12GB, Audio: AMD Device 14cc, Monitor: DELL P2415Q, Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E

	OS: Ubuntu 24.04, Kernel: 6.8.0-48-generic (x86_64), Desktop: GNOME Shell 46.0, Display Server: X Server + Wayland, OpenGL: 4.6 Mesa 24.0.9-0ubuntu0.2 (LLVM 17.0.6 DRM 3.57), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

c: 

	Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads), Motherboard: System76 Thelio Major (FA Z5 BIOS), Chipset: AMD Device 14a4, Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2, Disk: 1000GB CT1000T700SSD5, Graphics: AMD Radeon RX 6700 XT 12GB, Audio: AMD Device 14cc, Monitor: DELL P2415Q, Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E

	OS: Ubuntu 24.04, Kernel: 6.8.0-48-generic (x86_64), Desktop: GNOME Shell 46.0, Display Server: X Server + Wayland, OpenGL: 4.6 Mesa 24.0.9-0ubuntu0.2 (LLVM 17.0.6 DRM 3.57), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200


Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 15.54 |====================================================================
b . 15.50 |====================================================================
c . 15.60 |====================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 102.05 |=================================================================
b . 103.50 |==================================================================
c . 104.43 |===================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 123.08 |==================================================================
b . 124.40 |===================================================================
c . 122.97 |==================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 154.68 |===================================================================
b . 153.21 |==================================================================
c . 153.51 |==================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 16.46 |====================================================================
b . 16.52 |====================================================================
c . 16.45 |====================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 103.99 |==================================================================
b . 105.14 |===================================================================
c . 103.32 |==================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 126.34 |=================================================================
b . 129.31 |===================================================================
c . 126.69 |==================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 149.47 |=================================================================
b . 149.65 |=================================================================
c . 154.26 |===================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 76.34 |====================================================================
b . 76.53 |====================================================================
c . 76.56 |====================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 249.15 |===================================================================
b . 250.07 |===================================================================
c . 250.38 |===================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 328.55 |================================================================
b . 329.89 |=================================================================
c . 341.69 |===================================================================


Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 418.96 |===================================================================
b . 405.82 |=================================================================
c . 406.84 |=================================================================