llama smoke

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2411249-NE-LLAMASMOK60.
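
For readers who want to rerun these tests or compare their own hardware against this result, a minimal command sketch follows. It assumes the Phoronix Test Suite is installed, that the llama.cpp benchmark is published as the pts/llama-cpp test profile, and that OpenBenchmarking.org still accepts the result ID above for side-by-side comparison; adjust if any of these differ.

    # Run the same llama.cpp test profile locally (interactive prompts
    # will ask which backend/model/test combinations to run):
    phoronix-test-suite benchmark pts/llama-cpp

    # Or benchmark directly against this public result so the local run
    # is merged into and compared with result 2411249-NE-LLAMASMOK60:
    phoronix-test-suite benchmark 2411249-NE-LLAMASMOK60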

llama smoke - System Details (run identifier: a)

Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
Motherboard: Pegatron JIMBO P4352 (00022432 BIOS)
Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
Disk: 1000GB CT1000T700SSD3
Graphics: ASPEED
Network: 2 x Intel X550
OS: Ubuntu 24.04
Kernel: 6.8.0-49-generic-64k (aarch64)
Compiler: GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8
File-System: ext4
Screen Resolution: 1920x1200

OpenBenchmarking.org notes:
- Transparent Huge Pages: madvise
- GCC configure options: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v
- Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)
- Security mitigations: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

llama smoke - Result Overview (system a; all results in Tokens Per Second, more is better)

llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128: 20.04
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512: 121.12
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024: 119.19
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048: 106.01
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128: 50.62
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512: 126.98
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024: 130.04
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048: 131.38
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128: 20.99
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512: 122.57
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024: 119.72
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048: 106.83

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 20.04, SE +/- 0.31, N = 15. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas
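
The same text-generation and prompt-processing numbers can be approximated outside PTS with llama.cpp's own llama-bench tool. The sketch below is illustrative rather than the exact invocation the test profile uses: the GGML_BLAS cmake options and the local model filename (Llama-3.1-Tulu-3-8B-Q8_0.gguf) are assumptions to verify against the b4154 tag and your own model download.

    # Build llama.cpp b4154 with an OpenBLAS-backed CPU build
    # (GGML_BLAS / GGML_BLAS_VENDOR are assumed cmake options; check the tag).
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp && git checkout b4154
    cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
    cmake --build build --config Release -j

    # -n 128 corresponds to "Text Generation 128"; -p 512,1024,2048 covers
    # the three "Prompt Processing" lengths reported in this result file.
    ./build/bin/llama-bench -m Llama-3.1-Tulu-3-8B-Q8_0.gguf -n 128 -p 512,1024,2048

llama-bench reports tokens per second for each prompt-processing and text-generation case, which is the same metric charted here, though absolute numbers will depend on the build flags and thread count used.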

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 284 / 3320 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 1.7 / 106.9 / 180.6 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 71.0 / 98.2 Percent
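
The per-test CPU frequency, power, and usage figures in this report come from the Phoronix Test Suite's sensor monitoring rather than from llama.cpp itself. A hedged sketch of enabling the same logging on a local run is shown below; the individual sensor identifiers are assumptions, with MONITOR=all as the fallback if a given PTS version names them differently.

    # Log CPU sensors alongside each test run (sensor names such as
    # cpu.freq, cpu.power and cpu.usage are assumptions; MONITOR=all
    # enables every sensor PTS can detect on the system):
    MONITOR=cpu.freq,cpu.power,cpu.usage phoronix-test-suite benchmark pts/llama-cpp

    # Fallback: record all available sensors
    MONITOR=all phoronix-test-suite benchmark pts/llama-cpp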

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 121.12, SE +/- 0.22, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 487 / 3302 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 2.7 / 198.5 / 279.8 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 79.8 / 95.9 Percent

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 119.19, SE +/- 0.06, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 319 / 3428 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.0 / 220.7 / 290.7 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 87.6 / 96.3 Percent

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 106.01, SE +/- 0.04, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 761 / 3461 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.7 / 229.2 / 286.2 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 91.9 / 96.9 Percent

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 50.62, SE +/- 0.60, N = 4. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 656 / 3270 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.4 / 88.1 / 132.5 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 66.8 / 95.3 Percent

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 126.98, SE +/- 1.34, N = 4. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 556 / 3387 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 2.8 / 146.3 / 197.1 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 76.9 / 97.6 Percent

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 130.04, SE +/- 0.95, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 589 / 3425 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 2.8 / 162.5 / 202.6 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 83.1 / 97.9 Percent

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 131.38, SE +/- 0.52, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 2010 / 3474 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.0 / 174.2 / 203.0 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 88.7 / 98.2 Percent

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 20.99, SE +/- 0.03, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 827 / 3349 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 2.9 / 100.3 / 179.4 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 70.7 / 96.6 Percent

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 122.57, SE +/- 0.22, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 1404 / 3439 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 2.5 / 202.6 / 283.0 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 80.3 / 95.9 Percent

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 119.72, SE +/- 0.18, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 521 / 3438 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.5 / 222.8 / 285.5 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 87.5 / 96.3 Percent

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 (Tokens Per Second, more is better) - a: 106.83, SE +/- 0.16, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

CPU Peak Freq (Highest CPU Core Frequency) Monitor

Min / Avg / Max (a): 3456 / 3491 / 3492 Megahertz

Llama.cpp

CPU Power Consumption Monitor

Min / Avg / Max (a): 3.9 / 229.3 / 283.6 Watts

Llama.cpp

CPU Usage (Summary) Monitor

Min / Avg / Max (a): 0.1 / 91.8 / 97.0 Percent


Phoronix Test Suite v10.8.5