nnn

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412154-NE-NNN52010343&grr&sor.

System configuration (identical for runs a, b, c and d):
Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
Motherboard: Pegatron JIMBO P4352 (00022432 BIOS)
Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
Disk: 1000GB CT1000T700SSD3
Graphics: NVIDIA GH200 144G HBM3e 143GB
Network: 2 x Intel X550
OS: Ubuntu 24.04
Kernel: 6.8.0-50-generic-64k (aarch64)
Display Driver: NVIDIA
Compiler: GCC 13.3.0 + CUDA 12.4
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-Nz4ro4/gcc-13-13.3.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v
Processor Details: Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; reg_file_data_sampling: Not affected; retbleed: Not affected; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of __user pointer sanitization; spectre_v2: Not affected; srbds: Not affected; tsx_async_abort: Not affected

Result summary (units: srsran in Mbps, llama-cpp in tokens per second, vvenc and x265 in frames per second; more is better for every test)

Test | a | b | c | d
srsran: PUSCH Processor Benchmark, Throughput Total | 1760.1 | 1821.8 | 1834.6 | 1907.0
srsran: PUSCH Processor Benchmark, Throughput Thread | 52.6 | 48.1 | 52.7 | 50.6
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | 19.89 | 21.04 | 20.95 | 21.27
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | 26.04 | 22.55 | 23.20 | 22.40
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | 106.46 | 106.54 | 106.75 | 106.77
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | 107.88 | 106.97 | 107.58 | 106.76
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | 136.56 | 136.15 | 138.75 | 138.36
vvenc: Bosphorus 4K - Fast | 9.430 | 9.364 | 9.374 | 9.390
srsran: PDSCH Processor Benchmark, Throughput Thread | 328.6 | 464.5 | 393.2 | 389.2
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | 57.71 | 50.70 | 51.50 | 52.89
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | 120.08 | 120.15 | 120.25 | 120.26
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | 120.57 | 120.86 | 121.13 | 120.76
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 | 136.37 | 138.20 | 136.15 | 139.04
x265: Bosphorus 4K | 13.91 | 13.82 | 13.89 | 13.64
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | 128.95 | 130.74 | 129.97 | 131.82
vvenc: Bosphorus 4K - Faster | 18.682 | 18.651 | 18.420 | 18.336
srsran: PDSCH Processor Benchmark, Throughput Total | 12647.7 | 13307.7 | 13057.6 | 12934.9
vvenc: Bosphorus 1080p - Fast | 19.818 | 19.701 | 19.696 | 19.626
x265: Bosphorus 1080p | 20.73 | 20.12 | 19.94 | 19.90
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | 122.47 | 123.26 | 123.10 | 123.00
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | 122.27 | 123.13 | 123.28 | 123.26
vvenc: Bosphorus 1080p - Faster | 36.794 | 36.934 | 36.972 | 36.803
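The summary table lists raw results only. As a rough way to see which tests actually differ between the four runs, the sketch below computes the best-to-worst spread, (max / min - 1) x 100, for a hand-picked subset of rows. The subset, the `RESULTS` dictionary and the spread metric are our own illustrative choices, not part of the Phoronix Test Suite output; the values are copied from the table for runs a, b, c and d.

```python
"""Per-test spread across runs a-d, using values copied from the summary table."""

# Illustrative subset of the summary table: test name -> results for runs a, b, c, d.
# Every test here is "More Is Better", so the highest value is the best run.
RESULTS = {
    "srsran PUSCH, Throughput Total (Mbps)": [1760.1, 1821.8, 1834.6, 1907.0],
    "srsran PDSCH, Throughput Thread (Mbps)": [328.6, 464.5, 393.2, 389.2],
    "llama-cpp Mistral-7B TG 128 (tok/s)": [26.04, 22.55, 23.20, 22.40],
    "llama-cpp Llama-3.1-Tulu-3-8B PP 2048 (tok/s)": [106.46, 106.54, 106.75, 106.77],
    "x265 Bosphorus 4K (fps)": [13.91, 13.82, 13.89, 13.64],
}


def spread_percent(values):
    """Return (best / worst - 1) * 100 for a higher-is-better metric."""
    return (max(values) / min(values) - 1.0) * 100.0


def main():
    # Print tests from the most to the least run-to-run variation.
    ranked = sorted(RESULTS.items(), key=lambda kv: -spread_percent(kv[1]))
    for name, values in ranked:
        print(f"{spread_percent(values):6.2f}%  {name}")


if __name__ == "__main__":
    main()
```

With these rows, the per-thread PDSCH result spreads by roughly 41% between runs, while prompt processing for Llama-3.1-Tulu-3-8B stays within about 0.3%, which suggests reading single-run differences on the srsRAN and text-generation tests with caution.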

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Total

Mbps, More Is Better (srsRAN Project 24.10), ranked best first:
d: 1907.0
c: 1834.6
b: 1821.8
a: 1760.1
Standard error: SE +/- 0.22, N = 3; SE +/- 21.87, N = 12; SE +/- 21.78, N = 12
Compiler notes: (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Thread

Mbps, More Is Better (srsRAN Project 24.10), ranked best first:
c: 52.7
a: 52.6
d: 50.6
b: 48.1
Standard error: SE +/- 1.61, N = 15; SE +/- 2.74, N = 15; SE +/- 3.67, N = 12
Compiler notes: (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
d: 21.27
b: 21.04
c: 20.95
a: 19.89
Standard error: SE +/- 0.35, N = 15; SE +/- 0.44, N = 12; SE +/- 0.32, N = 15
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas
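Text-generation throughput is often easier to reason about as latency per token. The sketch below converts the Llama-3.1-Tulu-3-8B-Q8_0 Text Generation 128 results into mean milliseconds per token and an approximate time to produce 128 tokens. It assumes that "128" is the number of generated tokens (as in llama-bench's tg128 test) and that the rate is steady over the run; the helper names are hypothetical.

```python
"""Convert llama.cpp text-generation throughput (tokens/s) to mean time per token."""

# Text Generation 128 results for Llama-3.1-Tulu-3-8B-Q8_0, copied from the chart above.
TG128_TOKENS_PER_SECOND = {"d": 21.27, "b": 21.04, "c": 20.95, "a": 19.89}

# Assumption: the "128" in the test name is the number of generated tokens.
GENERATED_TOKENS = 128


def ms_per_token(tokens_per_second):
    """Mean milliseconds spent per generated token."""
    return 1000.0 / tokens_per_second


def seconds_for(tokens, tokens_per_second):
    """Approximate wall time to generate `tokens` tokens at a steady rate."""
    return tokens / tokens_per_second


for run, tps in TG128_TOKENS_PER_SECOND.items():
    print(f"{run}: {ms_per_token(tps):.1f} ms/token, "
          f"~{seconds_for(GENERATED_TOKENS, tps):.2f} s for {GENERATED_TOKENS} tokens")
```

On these numbers the fastest and slowest runs differ by roughly 3 ms per token (about 47 ms versus 50 ms).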

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
a: 26.04
c: 23.20
b: 22.55
d: 22.40
Standard error: SE +/- 0.34, N = 15; SE +/- 0.37, N = 15; SE +/- 0.31, N = 15
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
d: 106.77
c: 106.75
b: 106.54
a: 106.46
Standard error: SE +/- 0.30, N = 3; SE +/- 0.21, N = 3; SE +/- 0.20, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
a: 107.88
c: 107.58
b: 106.97
d: 106.76
Standard error: SE +/- 0.28, N = 3; SE +/- 0.38, N = 3; SE +/- 0.28, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
c: 138.75
d: 138.36
a: 136.56
b: 136.15
Standard error: SE +/- 1.02, N = 3; SE +/- 0.53, N = 3; SE +/- 1.10, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

Frames Per Second, More Is Better (VVenC 1.13), ranked best first:
a: 9.430
d: 9.390
c: 9.374
b: 9.364
Standard error: SE +/- 0.016, N = 3; SE +/- 0.001, N = 3; SE +/- 0.013, N = 3
Compiler notes: (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Thread

Mbps, More Is Better (srsRAN Project 24.10), ranked best first:
b: 464.5
c: 393.2
d: 389.2
a: 328.6
Standard error: SE +/- 39.20, N = 12; SE +/- 11.73, N = 12; SE +/- 16.01, N = 12
Compiler notes: (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl
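The per-thread PDSCH test carries the largest standard errors in this result. Assuming the conventional definition SE = s / sqrt(N), the sketch below recovers the sample standard deviation s behind each "SE +/- x, N = n" annotation and a rough 95% interval half-width. Only three annotations appear for four results and the export does not say which run each belongs to, so they are treated as unlabeled inputs. The normal approximation is our simplification; a Student t quantile would be more appropriate at N = 12.

```python
"""Recover the sample standard deviation behind an "SE +/- x, N = n" annotation."""

import math
from statistics import NormalDist


def sample_stddev(se, n):
    """Assuming SE = s / sqrt(n), return s = se * sqrt(n)."""
    return se * math.sqrt(n)


def approx_ci95_halfwidth(se):
    """Half-width of a normal-approximation 95% interval for the mean.

    With N around 12, a Student t quantile (about 2.20 for 11 degrees of
    freedom) would give a somewhat wider, more appropriate interval.
    """
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return z * se


# The three annotations reported for PDSCH Throughput Thread, in the order listed.
ANNOTATIONS = [(39.20, 12), (11.73, 12), (16.01, 12)]

for se, n in ANNOTATIONS:
    print(f"SE {se:6.2f}, N {n}: s ~ {sample_stddev(se, n):6.1f} Mbps, "
          f"95% CI ~ +/- {approx_ci95_halfwidth(se):5.1f} Mbps")
```

An SE of 39.20 at N = 12 corresponds to a per-run standard deviation of roughly 136 Mbps, which is large relative to results in the 330 to 465 Mbps range.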

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
a: 57.71
d: 52.89
c: 51.50
b: 50.70
Standard error: SE +/- 0.79, N = 12; SE +/- 0.53, N = 15; SE +/- 0.54, N = 15
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
d: 120.26
c: 120.25
b: 120.15
a: 120.08
Standard error: SE +/- 0.30, N = 3; SE +/- 0.20, N = 3; SE +/- 0.17, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
c: 121.13
b: 120.86
d: 120.76
a: 120.57
Standard error: SE +/- 0.16, N = 3; SE +/- 0.10, N = 3; SE +/- 0.02, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
d: 139.04
b: 138.20
a: 136.37
c: 136.15
Standard error: SE +/- 0.96, N = 3; SE +/- 1.97, N = 3; SE +/- 0.39, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

x265

Video Input: Bosphorus 4K

Frames Per Second, More Is Better (x265 4.1), ranked best first:
a: 13.91
c: 13.89
b: 13.82
d: 13.64
Standard error: SE +/- 0.03, N = 3; SE +/- 0.06, N = 3; SE +/- 0.08, N = 3
Compiler notes: (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
d: 131.82
b: 130.74
c: 129.97
a: 128.95
Standard error: SE +/- 0.54, N = 3; SE +/- 1.12, N = 8; SE +/- 1.60, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

Frames Per Second, More Is Better (VVenC 1.13), ranked best first:
a: 18.68
b: 18.65
c: 18.42
d: 18.34
Standard error: SE +/- 0.06, N = 3; SE +/- 0.06, N = 3; SE +/- 0.11, N = 3
Compiler notes: (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Total

Mbps, More Is Better (srsRAN Project 24.10), ranked best first:
b: 13307.7
c: 13057.6
d: 12934.9
a: 12647.7
Standard error: SE +/- 135.81, N = 3; SE +/- 130.37, N = 3; SE +/- 159.32, N = 4
Compiler notes: (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

VVenC

Video Input: Bosphorus 1080p - Video Preset: Fast

Frames Per Second, More Is Better (VVenC 1.13), ranked best first:
a: 19.82
b: 19.70
c: 19.70
d: 19.63
Standard error: SE +/- 0.09, N = 3; SE +/- 0.07, N = 3; SE +/- 0.17, N = 3
Compiler notes: (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

x265

Video Input: Bosphorus 1080p

Frames Per Second, More Is Better (x265 4.1), ranked best first:
a: 20.73
b: 20.12
c: 19.94
d: 19.90
Standard error: SE +/- 0.24, N = 3; SE +/- 0.14, N = 3; SE +/- 0.13, N = 3
Compiler notes: (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
b: 123.26
c: 123.10
d: 123.00
a: 122.47
Standard error: SE +/- 0.14, N = 3; SE +/- 0.24, N = 3; SE +/- 0.20, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

Tokens Per Second, More Is Better (Llama.cpp b4154), ranked best first:
c: 123.28
d: 123.26
b: 123.13
a: 122.27
Standard error: SE +/- 0.37, N = 3; SE +/- 0.37, N = 3; SE +/- 0.40, N = 3
Compiler notes: (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 1080p - Video Preset: Faster

Frames Per Second, More Is Better (VVenC 1.13), ranked best first:
c: 36.97
b: 36.93
d: 36.80
a: 36.79
Standard error: SE +/- 0.48, N = 3; SE +/- 0.13, N = 3; SE +/- 0.24, N = 3
Compiler notes: (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects


Phoronix Test Suite v10.8.5