ncnn llama arm

ARMv8 Neoverse-N1 testing with a System76 Thelio Astra (3.02 BIOS) and NVIDIA RTX A400/PCIe 4GB on Ubuntu 24.04 via the Phoronix Test Suite.

a, b, c, d (identical configuration for all four runs):

  Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores)
  Motherboard: System76 Thelio Astra (3.02 BIOS)
  Chipset: Ampere Computing LLC Altra PCI Root Complex A
  Memory: 8 x 32GB DDR4-3200MT/s Micron 18ASF4G72PDZ-3G2F1
  Disk: 1024GB KINGSTON SKC3000S1024G
  Graphics: NVIDIA RTX A400/PCIe 4GB
  Audio: NVIDIA Device 2291
  Monitor: DELL P2415Q
  Network: 2 x Intel X550 + Intel I210

  OS: Ubuntu 24.04
  Kernel: 6.8.0-48-generic-64k (aarch64)
  Desktop: GNOME Shell 46.0
  Display Server: X Server
  Display Driver: NVIDIA 550.120
  OpenGL: 4.6.0
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

NCNN 20241226
Target: CPU
ms < Lower Is Better

  Model                      a        b        c        d
  mobilenet               132.32   132.57   133.30   132.77
  mobilenet-v2            109.39   110.23   109.86   109.62
  mobilenet-v3            126.91   126.22   126.50   126.24
  shufflenet-v2           124.14   122.90   123.54   122.95
  mnasnet                 107.34   107.58   107.83   106.92
  efficientnet-b0         180.13   180.43   180.77   179.82
  blazeface                86.92    86.72    86.93    86.83
  googlenet               167.09   166.67   168.01   166.96
  vgg16                    64.26    64.22    64.62    64.32
  resnet18                 74.65    74.71    74.77    74.86
  alexnet                  37.61    38.03    37.77    37.96
  resnet50                152.92   153.31   153.04   152.99
  mobilenetv2-yolov3      132.32   132.57   133.30   132.77
  yolov4-tiny              80.31    80.15    80.63    80.07
  squeezenet_ssd          197.60   199.41   198.44   198.01
  regnety_400m            881.00   879.25   880.92   882.32
  vision_transformer      214.96   215.28   215.13   215.79
  FastestDet              155.86   155.88   156.04   156.43

NCNN 20241226
Target: Vulkan GPU
ms < Lower Is Better

  Model                      a        b        c        d
  mobilenet               132.64   131.57   132.63   131.96
  mobilenet-v2            109.33   110.25   109.94   110.18
  mobilenet-v3            126.22   126.98   125.86   126.02
  shufflenet-v2           122.92   123.67   123.34   123.07
  mnasnet                 107.35   107.14   108.18   107.43
  efficientnet-b0         180.53   180.89   180.51   180.31
  blazeface                86.65    86.84    86.81    86.84
  googlenet               166.62   166.82   167.89   166.97
  vgg16                    64.74    64.66    63.93    64.54
  resnet18                 75.04    74.56    74.67    74.61
  alexnet                  37.96    37.98    38.01    38.10
  resnet50                153.16   152.86   153.40   153.57
  mobilenetv2-yolov3      132.64   131.57   132.63   131.96
  yolov4-tiny              81.40    80.73    80.76    80.87
  squeezenet_ssd          198.87   199.52   198.24   198.08
  regnety_400m            880.29   881.65   882.11   880.20
  vision_transformer      214.77   215.84   215.40   214.85
  FastestDet              155.28   156.44   156.25   156.59

Llama.cpp b4397
Backend: CPU BLAS
Tokens Per Second > Higher Is Better

  Model: Llama-3.1-Tulu-3-8B-Q8_0
  Test                       a        b        c        d
  Text Generation 128       2.01     2.01     2.01     2.01
  Prompt Processing 512    70.53    69.70    69.11    70.01
  Prompt Processing 1024   65.80    65.98    66.05    65.88
  Prompt Processing 2048   63.45    63.07    63.42    63.49

  Model: Mistral-7B-Instruct-v0.3-Q8_0
  Test                       a        b        c        d
  Text Generation 128       2.02     2.02     2.02     2.01
  Prompt Processing 512    70.55    70.70    67.77    67.57
  Prompt Processing 1024   65.84    66.00    66.03    66.33
  Prompt Processing 2048   63.46    63.62    63.21    63.65

  Model: granite-3.0-3b-a800m-instruct-Q8_0
  Test                       a        b        c        d
  Text Generation 128       1.23     1.23     1.23     1.23
  Prompt Processing 512   146.54   144.15   146.19   144.66
  Prompt Processing 1024  142.75   142.81   142.21   143.29
  Prompt Processing 2048  132.64   133.69   134.47   133.99
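
Because runs a through d use the same configuration, one way to read the results above is to look at how far the four runs spread around their mean. The following is a minimal sketch, not part of the Phoronix Test Suite output: the `spread_percent` helper and the choice of three example results are assumptions made for illustration, and the values are copied by hand from the results above.

```python
from statistics import mean

# Values copied from the results above, in run order a, b, c, d.
# The selection of these three results is illustrative only.
results = {
    "NCNN CPU mobilenet (ms)": [132.32, 132.57, 133.30, 132.77],
    "NCNN CPU regnety_400m (ms)": [881.00, 879.25, 880.92, 882.32],
    "Llama.cpp Mistral-7B-Instruct-v0.3-Q8_0 Prompt Processing 512 (tokens/s)": [
        70.55, 70.70, 67.77, 67.57,
    ],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) as a percentage of the mean."""
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    for name, values in results.items():
        print(f"{name}: mean={mean(values):.2f}, spread={spread_percent(values):.2f}%")
```

The spread is expressed relative to the mean so that millisecond latencies and tokens-per-second throughputs can be compared on the same scale.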