llama cpp grace

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

a:

  Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores), Motherboard: Pegatron JIMBO P4352 (00022432 BIOS), Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1, Disk: 1000GB CT1000T700SSD3, Graphics: ASPEED, Network: 2 x Intel X550

  OS: Ubuntu 24.04, Kernel: 6.8.0-49-generic-64k (aarch64), Compiler: GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8, File-System: ext4, Screen Resolution: 1920x1200

b:

  Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores), Motherboard: Pegatron JIMBO P4352 (00022432 BIOS), Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1, Disk: 1000GB CT1000T700SSD3, Graphics: ASPEED, Network: 2 x Intel X550

  OS: Ubuntu 24.04, Kernel: 6.8.0-49-generic-64k (aarch64), Compiler: GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8, File-System: ext4, Screen Resolution: 1920x1200

c:

  Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores), Motherboard: Pegatron JIMBO P4352 (00022432 BIOS), Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1, Disk: 1000GB CT1000T700SSD3, Graphics: ASPEED, Network: 2 x Intel X550

  OS: Ubuntu 24.04, Kernel: 6.8.0-49-generic-64k (aarch64), Compiler: GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8, File-System: ext4, Screen Resolution: 1920x1200

d:

  Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores), Motherboard: Pegatron JIMBO P4352 (00022432 BIOS), Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1, Disk: 1000GB CT1000T700SSD3, Graphics: ASPEED, Network: 2 x Intel X550

  OS: Ubuntu 24.04, Kernel: 6.8.0-49-generic-64k (aarch64), Compiler: GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8, File-System: ext4, Screen Resolution: 1920x1200

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 20.07 |==================================================================
b . 20.56 |====================================================================
c . 20.70 |====================================================================
d . 18.24 |============================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 121.74 |===================================================================
b . 121.76 |===================================================================
c . 121.85 |===================================================================
d . 120.71 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 118.77 |===================================================================
b . 118.88 |===================================================================
c . 119.02 |===================================================================
d . 118.98 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 105.71 |===================================================================
b . 105.56 |===================================================================
c . 105.75 |===================================================================
d . 106.10 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 21.48 |===================================================================
b . 21.78 |====================================================================
c . 20.06 |===============================================================
d . 19.48 |=============================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 122.12 |===================================================================
b . 122.28 |===================================================================
c . 122.54 |===================================================================
d . 121.45 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 119.72 |===================================================================
b . 119.92 |===================================================================
c . 119.62 |===================================================================
d . 119.36 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 107.00 |===================================================================
b . 106.65 |===================================================================
c . 106.90 |===================================================================
d . 106.49 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 50.75 |====================================================================
b . 50.87 |====================================================================
c . 51.00 |====================================================================
d . 49.91 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 123.29 |===============================================================
b . 123.69 |===============================================================
c . 123.66 |===============================================================
d . 131.19 |===================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 132.61 |===================================================================
b . 132.75 |===================================================================
c . 128.63 |=================================================================
d . 131.74 |==================================================================

Llama.cpp b4154
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 131.81 |==================================================================
b . 134.42 |===================================================================
c . 130.52 |=================================================================
d . 133.64 |===================================================================