llama smoke

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411249-NE-LLAMASMOK60
Jump To Table - Results

Statistics

Remove Outliers Before Calculating Averages

Graph Settings

Prefer Vertical Bar Graphs
No Box Plots
On Line Graphs With Missing Data, Connect The Line Gaps

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Sensor Monitoring

Show Accumulated Sensor Monitoring Data For Displayed Results
Generate Power Efficiency / Performance Per Watt Results

Run Management

Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
a
November 24
  39 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


llama smokeOpenBenchmarking.orgPhoronix Test SuiteARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)Pegatron JIMBO P4352 (00022432 BIOS)1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC11000GB CT1000T700SSD3ASPEED2 x Intel X550Ubuntu 24.046.8.0-49-generic-64k (aarch64)GCC 13.2.0 + Clang 18.1.3 + CUDA 11.8ext41920x1200ProcessorMotherboardMemoryDiskGraphicsNetworkOSKernelCompilerFile-SystemScreen ResolutionLlama Smoke BenchmarksSystem Logs- Transparent Huge Pages: madvise- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v - Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

llama smokellama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048a20.04121.12119.19106.0150.62126.98130.04131.3820.99122.57119.72106.83OpenBenchmarking.org

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128a510152025SE +/- 0.31, N = 1520.041. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512a306090120150SE +/- 0.22, N = 3121.121. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024a306090120150SE +/- 0.06, N = 3119.191. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048a20406080100SE +/- 0.04, N = 3106.011. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128a1122334455SE +/- 0.60, N = 450.621. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512a306090120150SE +/- 1.34, N = 4126.981. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024a306090120150SE +/- 0.95, N = 3130.041. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048a306090120150SE +/- 0.52, N = 3131.381. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128a510152025SE +/- 0.03, N = 320.991. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512a306090120150SE +/- 0.22, N = 3122.571. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024a306090120150SE +/- 0.18, N = 3119.721. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048a20406080100SE +/- 0.16, N = 3106.831. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas