nnn

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412154-NE-NNN52010343&grs.

nnn

System configuration (runs a, b, c, d):
Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
Motherboard: Pegatron JIMBO P4352 (00022432 BIOS)
Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
Disk: 1000GB CT1000T700SSD3
Graphics: NVIDIA GH200 144G HBM3e 143GB
Network: 2 x Intel X550
OS: Ubuntu 24.04
Kernel: 6.8.0-50-generic-64k (aarch64)
Display Driver: NVIDIA
Compiler: GCC 13.3.0 + CUDA 12.4
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-Nz4ro4/gcc-13-13.3.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v

Processor Details
- Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)

Security Details
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

nnn

Results overview (one column per run):

a         b         c         d         Test
57.71     50.70     51.50     52.89     llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
1760.1    1821.8    1834.6    1907.0    srsran: PUSCH Processor Benchmark, Throughput Total
12647.7   13307.7   13057.6   12934.9   srsran: PDSCH Processor Benchmark, Throughput Total
20.73     20.12     19.94     19.90     x265: Bosphorus 1080p
128.95    130.74    129.97    131.82    llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
136.37    138.20    136.15    139.04    llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
13.91     13.82     13.89     13.64     x265: Bosphorus 4K
136.56    136.15    138.75    138.36    llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
18.682    18.651    18.420    18.336    vvenc: Bosphorus 4K - Faster
107.88    106.97    107.58    106.76    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
19.818    19.701    19.696    19.626    vvenc: Bosphorus 1080p - Fast
122.27    123.13    123.28    123.26    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
9.43      9.364     9.374     9.390     vvenc: Bosphorus 4K - Fast
122.47    123.26    123.10    123.00    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
36.794    36.934    36.972    36.803    vvenc: Bosphorus 1080p - Faster
120.57    120.86    121.13    120.76    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
106.46    106.54    106.75    106.77    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
120.08    120.15    120.25    120.26    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
26.04     22.55     23.20     22.40     llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
19.89     21.04     20.95     21.27     llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
52.6      48.1      52.7      50.6      srsran: PUSCH Processor Benchmark, Throughput Thread
328.6     464.5     393.2     389.2     srsran: PDSCH Processor Benchmark, Throughput Thread
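
The four runs can be compared by expressing each result as a percent change relative to one of them. The following is a minimal sketch rather than part of the export: it assumes run a is the baseline (the export does not designate one), copies three rows from the overview above, and uses the illustrative names `results` and `percent_vs_baseline`.

```python
# Values copied from the overview (runs a, b, c, d); all three tests are "More Is Better".
# Choosing run a as the baseline is an assumption made for illustration only.
results = {
    "llama-cpp granite-3.0-3b-a800m-instruct-Q8_0, Text Generation 128 (tokens/s)": [57.71, 50.70, 51.50, 52.89],
    "srsran PUSCH Processor Benchmark, Throughput Total (Mbps)": [1760.1, 1821.8, 1834.6, 1907.0],
    "x265 Bosphorus 4K (frames/s)": [13.91, 13.82, 13.89, 13.64],
}


def percent_vs_baseline(values):
    """Return the percent change of each value relative to the first value."""
    base = values[0]
    return [round((v - base) / base * 100.0, 2) for v in values]


if __name__ == "__main__":
    for name, values in results.items():
        deltas = percent_vs_baseline(values)
        # Pair each delta with its run label for readability.
        print(name, dict(zip("abcd", deltas)))
```

A positive figure means the run scored higher than run a on that test. Because the export reports standard errors alongside the means, small differences are best read with those errors in mind.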

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 57.71   b: 50.70   c: 51.50   d: 52.89
SE +/- 0.54, N = 15; SE +/- 0.53, N = 15; SE +/- 0.79, N = 12
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Total

srsRAN Project 24.10 - Mbps, More Is Better
a: 1760.1   b: 1821.8   c: 1834.6   d: 1907.0
SE +/- 21.78, N = 12; SE +/- 21.87, N = 12; SE +/- 0.22, N = 3
1. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Total

srsRAN Project 24.10 - Mbps, More Is Better
a: 12647.7   b: 13307.7   c: 13057.6   d: 12934.9
SE +/- 135.81, N = 3; SE +/- 130.37, N = 3; SE +/- 159.32, N = 4
1. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

x265

Video Input: Bosphorus 1080p

x265 4.1 - Frames Per Second, More Is Better
a: 20.73   b: 20.12   c: 19.94   d: 19.90
SE +/- 0.24, N = 3; SE +/- 0.14, N = 3; SE +/- 0.13, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 128.95   b: 130.74   c: 129.97   d: 131.82
SE +/- 1.12, N = 8; SE +/- 1.60, N = 3; SE +/- 0.54, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 136.37   b: 138.20   c: 136.15   d: 139.04
SE +/- 1.97, N = 3; SE +/- 0.39, N = 3; SE +/- 0.96, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

x265

Video Input: Bosphorus 4K

x265 4.1 - Frames Per Second, More Is Better
a: 13.91   b: 13.82   c: 13.89   d: 13.64
SE +/- 0.06, N = 3; SE +/- 0.03, N = 3; SE +/- 0.08, N = 3
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 136.56   b: 136.15   c: 138.75   d: 138.36
SE +/- 1.10, N = 3; SE +/- 1.02, N = 3; SE +/- 0.53, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

VVenC 1.13 - Frames Per Second, More Is Better
a: 18.68   b: 18.65   c: 18.42   d: 18.34
SE +/- 0.06, N = 3; SE +/- 0.06, N = 3; SE +/- 0.11, N = 3
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 107.88   b: 106.97   c: 107.58   d: 106.76
SE +/- 0.38, N = 3; SE +/- 0.28, N = 3; SE +/- 0.28, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 1080p - Video Preset: Fast

VVenC 1.13 - Frames Per Second, More Is Better
a: 19.82   b: 19.70   c: 19.70   d: 19.63
SE +/- 0.09, N = 3; SE +/- 0.07, N = 3; SE +/- 0.17, N = 3
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 122.27   b: 123.13   c: 123.28   d: 123.26
SE +/- 0.40, N = 3; SE +/- 0.37, N = 3; SE +/- 0.37, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

VVenC 1.13 - Frames Per Second, More Is Better
a: 9.430   b: 9.364   c: 9.374   d: 9.390
SE +/- 0.013, N = 3; SE +/- 0.001, N = 3; SE +/- 0.016, N = 3
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 122.47   b: 123.26   c: 123.10   d: 123.00
SE +/- 0.14, N = 3; SE +/- 0.24, N = 3; SE +/- 0.20, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

VVenC

Video Input: Bosphorus 1080p - Video Preset: Faster

VVenC 1.13 - Frames Per Second, More Is Better
a: 36.79   b: 36.93   c: 36.97   d: 36.80
SE +/- 0.13, N = 3; SE +/- 0.48, N = 3; SE +/- 0.24, N = 3
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 120.57   b: 120.86   c: 121.13   d: 120.76
SE +/- 0.10, N = 3; SE +/- 0.16, N = 3; SE +/- 0.02, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 106.46   b: 106.54   c: 106.75   d: 106.77
SE +/- 0.20, N = 3; SE +/- 0.21, N = 3; SE +/- 0.30, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 120.08   b: 120.15   c: 120.25   d: 120.26
SE +/- 0.17, N = 3; SE +/- 0.20, N = 3; SE +/- 0.30, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 26.04   b: 22.55   c: 23.20   d: 22.40
SE +/- 0.37, N = 15; SE +/- 0.34, N = 15; SE +/- 0.31, N = 15
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

Llama.cpp b4154 - Tokens Per Second, More Is Better
a: 19.89   b: 21.04   c: 20.95   d: 21.27
SE +/- 0.44, N = 12; SE +/- 0.32, N = 15; SE +/- 0.35, N = 15
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Thread

srsRAN Project 24.10 - Mbps, More Is Better
a: 52.6   b: 48.1   c: 52.7   d: 50.6
SE +/- 3.67, N = 12; SE +/- 1.61, N = 15; SE +/- 2.74, N = 15
1. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Thread

srsRAN Project 24.10 - Mbps, More Is Better
a: 328.6   b: 464.5   c: 393.2   d: 389.2
SE +/- 39.20, N = 12; SE +/- 11.73, N = 12; SE +/- 16.01, N = 12
1. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl


Phoronix Test Suite v10.8.5