llama smoke

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411249-NE-LLAMASMOK60

Result Identifier: a
Date: November 24
Test Duration: 39 Minutes

llama smoke: system details (OpenBenchmarking.org, Phoronix Test Suite)

Processor:          ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
Motherboard:        Pegatron JIMBO P4352 (00022432 BIOS)
Memory:             1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
Disk:               1000GB CT1000T700SSD3
Graphics:           ASPEED
Network:            2 x Intel X550
OS:                 Ubuntu 24.04
Kernel:             6.8.0-49-generic-64k (aarch64)
Compiler:           GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8
File-System:        ext4
Screen Resolution:  1920x1200

System Logs

- Transparent Huge Pages: madvise
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v
- Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

llama smoke: result overview (llama-cpp, Backend: CPU BLAS, result identifier a)

Model                                 Test                      a
Llama-3.1-Tulu-3-8B-Q8_0              Text Generation 128       20.04
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 512     121.12
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 1024    119.19
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 2048    106.01
granite-3.0-3b-a800m-instruct-Q8_0    Text Generation 128       50.62
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 512     126.98
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 1024    130.04
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 2048    131.38
Mistral-7B-Instruct-v0.3-Q8_0         Text Generation 128       20.99
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 512     122.57
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 1024    119.72
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 2048    106.83

Source: OpenBenchmarking.org

Llama.cpp

Llama.cpp b4154: CPU Peak Freq (Highest CPU Core Frequency) Monitor
Megahertz, More Is Better (OpenBenchmarking.org). Result identifier: a. One row per monitor graph, in page order.

Graph    Min     Avg     Max
1        284     3320    3492
2        487     3302    3492
3        319     3428    3492
4        761     3461    3492
5        656     3270    3492
6        556     3387    3492
7        589     3425    3492
8        2010    3474    3492
9        827     3349    3492
10       1404    3439    3492
11       521     3438    3492
12       3456    3491    3492

Llama.cpp b4154: Backend: CPU BLAS
Tokens Per Second, More Is Better (OpenBenchmarking.org). Result identifier: a.

Model                                 Test                      Tokens/s    SE +/-    N
Llama-3.1-Tulu-3-8B-Q8_0              Text Generation 128       20.04       0.31      15
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 512     121.12      0.22      3
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 1024    119.19      0.06      3
Llama-3.1-Tulu-3-8B-Q8_0              Prompt Processing 2048    106.01      0.04      3
granite-3.0-3b-a800m-instruct-Q8_0    Text Generation 128       50.62       0.60      4
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 512     126.98      1.34      4
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 1024    130.04      0.95      3
granite-3.0-3b-a800m-instruct-Q8_0    Prompt Processing 2048    131.38      0.52      3
Mistral-7B-Instruct-v0.3-Q8_0         Text Generation 128       20.99       0.03      3
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 512     122.57      0.22      3
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 1024    119.72      0.18      3
Mistral-7B-Instruct-v0.3-Q8_0         Prompt Processing 2048    106.83      0.16      3

1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas
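
The results above each report a standard error (SE) and a sample count (N). To compare run-to-run noise across tests, the SE can be expressed as a percentage of the mean. The sketch below is illustrative only: it assumes the reported SE is the standard error of the mean over N runs (SE = s / sqrt(N)), and the helper names `relative_se_percent` and `sample_std_dev` are hypothetical, not part of the Phoronix Test Suite. The three rows are the Text Generation 128 results copied from this page.

```python
import math

# (model, mean tokens/s, reported SE, N) copied from the Text Generation 128 results
text_generation = [
    ("Llama-3.1-Tulu-3-8B-Q8_0", 20.04, 0.31, 15),
    ("granite-3.0-3b-a800m-instruct-Q8_0", 50.62, 0.60, 4),
    ("Mistral-7B-Instruct-v0.3-Q8_0", 20.99, 0.03, 3),
]


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def sample_std_dev(se, n):
    """Sample standard deviation, ASSUMING se is the standard error of the mean."""
    return se * math.sqrt(n)


if __name__ == "__main__":
    for model, mean, se, n in text_generation:
        print(
            f"{model}: relative SE {relative_se_percent(mean, se):.2f}%, "
            f"implied std dev {sample_std_dev(se, n):.2f} tokens/s (N = {n})"
        )
```

A larger relative SE together with a higher N, as in the first row, suggests the test was rerun because early samples varied more than usual.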