9654 llama ncnn

Tests for a future article. 2 x AMD EPYC 9654 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 24.10 via the Phoronix Test Suite.

a:

  Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1007B BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS + 257GB Flash Drive, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

  OS: Ubuntu 24.10, Kernel: 6.11.0-13-generic (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1920x1200

b:

  Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads), Motherboard: AMD Titanite_4G (RTI1007B BIOS), Chipset: AMD Device 14a4, Memory: 1520GB, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS + 257GB Flash Drive, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

  OS: Ubuntu 24.10, Kernel: 6.11.0-13-generic (x86_64), Desktop: GNOME Shell 47.0, Display Server: X Server, Compiler: GCC 14.2.0, File-System: ext4, Screen Resolution: 1920x1200

NCNN 20241226
Target: CPU - Model: mobilenet
ms < Lower Is Better
a . 34.84 |===============================================================
b . 37.56 |====================================================================

NCNN 20241226
Target: CPU - Model: mobilenet-v2
ms < Lower Is Better
a . 24.59 |==================================================================
b . 25.39 |====================================================================

NCNN 20241226
Target: CPU - Model: mobilenet-v3
ms < Lower Is Better
a . 31.68 |====================================================================
b . 27.30 |===========================================================

NCNN 20241226
Target: CPU - Model: shufflenet-v2
ms < Lower Is Better
a . 30.69 |===================================================================
b . 31.22 |====================================================================

NCNN 20241226
Target: CPU - Model: mnasnet
ms < Lower Is Better
a . 22.36 |==============================================================
b . 24.67 |====================================================================

NCNN 20241226
Target: CPU - Model: efficientnet-b0
ms < Lower Is Better
a . 35.79 |==================================================================
b . 36.82 |====================================================================

NCNN 20241226
Target: CPU - Model: blazeface
ms < Lower Is Better
a . 16.36 |====================================================================
b . 16.32 |====================================================================

NCNN 20241226
Target: CPU - Model: googlenet
ms < Lower Is Better
a . 40.06 |=============================================================
b . 44.60 |====================================================================

NCNN 20241226
Target: CPU - Model: vgg16
ms < Lower Is Better
a . 60.49 |=============================================================
b . 67.76 |====================================================================

NCNN 20241226
Target: CPU - Model: resnet18
ms < Lower Is Better
a . 25.47 |==============================================================
b . 27.73 |====================================================================

NCNN 20241226
Target: CPU - Model: alexnet
ms < Lower Is Better
a . 10.56 |===============================================================
b . 11.32 |====================================================================

NCNN 20241226
Target: CPU - Model: resnet50
ms < Lower Is Better
a . 43.40 |===============================================================
b . 46.85 |====================================================================

NCNN 20241226
Target: CPU - Model: mobilenetv2-yolov3
ms < Lower Is Better
a . 34.84 |===============================================================
b . 37.56 |====================================================================

NCNN 20241226
Target: CPU - Model: yolov4-tiny
ms < Lower Is Better
a . 44.00 |===============================================================
b . 47.74 |====================================================================

NCNN 20241226
Target: CPU - Model: squeezenet_ssd
ms < Lower Is Better
a . 51.30 |=================================================================
b . 53.55 |====================================================================

NCNN 20241226
Target: CPU - Model: regnety_400m
ms < Lower Is Better
a . 142.51 |===================================================================
b . 141.36 |==================================================================

NCNN 20241226
Target: CPU - Model: vision_transformer
ms < Lower Is Better
a . 71.80 |====================================================================
b . 71.49 |====================================================================

NCNN 20241226
Target: CPU - Model: FastestDet
ms < Lower Is Better
a . 35.47 |====================================================================
b . 35.68 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 20.74 |====================================================================
b . 20.55 |===================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 80.26 |====================================================================
b . 79.31 |===================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 75.61 |====================================================================
b . 75.72 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 74.75 |================================================================
b . 78.96 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 21.54 |====================================================================
b . 21.41 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 75.15 |=================================================================
b . 78.25 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 79.86 |====================================================================
b . 78.85 |===================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 77.23 |==================================================================
b . 79.17 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second > Higher Is Better
a . 35.60 |===================================================================
b . 36.04 |====================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second > Higher Is Better
a . 168.94 |===================================================================
b . 169.42 |===================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second > Higher Is Better
a . 166.49 |===================================================================
b . 164.59 |==================================================================

Llama.cpp b4397
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second > Higher Is Better
a . 162.34 |===================================================================
b . 161.38 |===================================================================
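
Runs a and b list identical hardware and software, so the gap between them on any one test can be read as a rough indication of run-to-run spread. The following is a minimal Python sketch of that arithmetic, not part of the Phoronix Test Suite itself. The helper names (percent_delta, faster_run) and the choice of sample tests are assumptions made for illustration; the numbers are copied from the results above.

```python
# Sample (a, b) pairs copied from the results above.
# NCNN reports milliseconds, where lower is better.
NCNN_MS = {
    "mobilenet": (34.84, 37.56),
    "mobilenet-v3": (31.68, 27.30),
    "vgg16": (60.49, 67.76),
}
# Llama.cpp reports tokens per second, where higher is better.
LLAMA_TPS = {
    "Llama-3.1-Tulu-3-8B-Q8_0 / Prompt Processing 2048": (74.75, 78.96),
    "Mistral-7B-Instruct-v0.3-Q8_0 / Text Generation 128": (21.54, 21.41),
}


def percent_delta(a: float, b: float) -> float:
    """Return how far run b deviates from run a, as a percentage of a."""
    return (b - a) / a * 100.0


def faster_run(a: float, b: float, lower_is_better: bool) -> str:
    """Name the run with the better result, or 'tie' if they are equal."""
    if a == b:
        return "tie"
    a_wins = a < b if lower_is_better else a > b
    return "a" if a_wins else "b"


def report(results: dict, lower_is_better: bool) -> None:
    """Print one line per test with the delta and the better run."""
    for name, (a, b) in results.items():
        delta = percent_delta(a, b)
        winner = faster_run(a, b, lower_is_better)
        print(f"{name}: b vs a {delta:+.2f}%, better run: {winner}")


if __name__ == "__main__":
    report(NCNN_MS, lower_is_better=True)
    report(LLAMA_TPS, lower_is_better=False)
```

The sign of the delta has to be read together with the "Lower Is Better" or "Higher Is Better" label of each test, which is why the direction is passed explicitly rather than inferred.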