nnn

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412154-NE-NNN52010343&sor.

nnnProcessorMotherboardMemoryDiskGraphicsNetworkOSKernelDisplay DriverCompilerFile-SystemScreen ResolutionabcdARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)Pegatron JIMBO P4352 (00022432 BIOS)1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC11000GB CT1000T700SSD3NVIDIA GH200 144G HBM3e 143GB2 x Intel X550Ubuntu 24.046.8.0-50-generic-64k (aarch64)NVIDIAGCC 13.3.0 + CUDA 12.4ext41920x1200OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-Nz4ro4/gcc-13-13.3.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v Processor Details- Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

nnnsrsran: PDSCH Processor Benchmark, Throughput Totalsrsran: PUSCH Processor Benchmark, Throughput Totalsrsran: PDSCH Processor Benchmark, Throughput Threadsrsran: PUSCH Processor Benchmark, Throughput Threadvvenc: Bosphorus 4K - Fastvvenc: Bosphorus 4K - Fastervvenc: Bosphorus 1080p - Fastvvenc: Bosphorus 1080p - Fasterx265: Bosphorus 4Kx265: Bosphorus 1080pllama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048abcd12647.71760.1328.652.69.4318.68219.81836.79413.9120.7319.89122.47120.08106.4626.04122.27120.57107.8857.71128.95136.37136.5613307.71821.8464.548.19.36418.65119.70136.93413.8220.1221.04123.26120.15106.5422.55123.13120.86106.9750.70130.74138.20136.1513057.61834.6393.252.79.37418.42019.69636.97213.8919.9420.95123.10120.25106.7523.20123.28121.13107.5851.50129.97136.15138.7512934.91907.0389.250.69.39018.33619.62636.80313.6419.9021.27123.00120.26106.7722.40123.26120.76106.7652.89131.82139.04138.36OpenBenchmarking.org

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Total

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 24.10Test: PDSCH Processor Benchmark, Throughput Totalbcda3K6K9K12K15KSE +/- 135.81, N = 3SE +/- 130.37, N = 3SE +/- 159.32, N = 413307.713057.612934.912647.71. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Total

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 24.10Test: PUSCH Processor Benchmark, Throughput Totaldcba400800120016002000SE +/- 0.22, N = 3SE +/- 21.87, N = 12SE +/- 21.78, N = 121907.01834.61821.81760.11. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PDSCH Processor Benchmark, Throughput Thread

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 24.10Test: PDSCH Processor Benchmark, Throughput Threadbcda100200300400500SE +/- 39.20, N = 12SE +/- 11.73, N = 12SE +/- 16.01, N = 12464.5393.2389.2328.61. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

srsRAN Project

Test: PUSCH Processor Benchmark, Throughput Thread

OpenBenchmarking.orgMbps, More Is BettersrsRAN Project 24.10Test: PUSCH Processor Benchmark, Throughput Threadcadb1224364860SE +/- 1.61, N = 15SE +/- 2.74, N = 15SE +/- 3.67, N = 1252.752.650.648.11. (CXX) g++ options: -O3 -march=armv8-a -mtune=generic -fno-trapping-math -fno-math-errno -ldl

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.13Video Input: Bosphorus 4K - Video Preset: Fastadcb3691215SE +/- 0.016, N = 3SE +/- 0.001, N = 3SE +/- 0.013, N = 39.4309.3909.3749.3641. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.13Video Input: Bosphorus 4K - Video Preset: Fasterabcd510152025SE +/- 0.06, N = 3SE +/- 0.06, N = 3SE +/- 0.11, N = 318.6818.6518.4218.341. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

VVenC

Video Input: Bosphorus 1080p - Video Preset: Fast

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.13Video Input: Bosphorus 1080p - Video Preset: Fastabcd510152025SE +/- 0.09, N = 3SE +/- 0.07, N = 3SE +/- 0.17, N = 319.8219.7019.7019.631. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

VVenC

Video Input: Bosphorus 1080p - Video Preset: Faster

OpenBenchmarking.orgFrames Per Second, More Is BetterVVenC 1.13Video Input: Bosphorus 1080p - Video Preset: Fastercbda918273645SE +/- 0.48, N = 3SE +/- 0.13, N = 3SE +/- 0.24, N = 336.9736.9336.8036.791. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

x265

Video Input: Bosphorus 4K

OpenBenchmarking.orgFrames Per Second, More Is Betterx265 4.1Video Input: Bosphorus 4Kacbd48121620SE +/- 0.03, N = 3SE +/- 0.06, N = 3SE +/- 0.08, N = 313.9113.8913.8213.641. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

x265

Video Input: Bosphorus 1080p

OpenBenchmarking.orgFrames Per Second, More Is Betterx265 4.1Video Input: Bosphorus 1080pabcd510152025SE +/- 0.24, N = 3SE +/- 0.14, N = 3SE +/- 0.13, N = 320.7320.1219.9419.901. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128dbca510152025SE +/- 0.35, N = 15SE +/- 0.44, N = 12SE +/- 0.32, N = 1521.2721.0420.9519.891. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512bcda306090120150SE +/- 0.14, N = 3SE +/- 0.24, N = 3SE +/- 0.20, N = 3123.26123.10123.00122.471. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024dcba306090120150SE +/- 0.30, N = 3SE +/- 0.20, N = 3SE +/- 0.17, N = 3120.26120.25120.15120.081. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048dcba20406080100SE +/- 0.30, N = 3SE +/- 0.21, N = 3SE +/- 0.20, N = 3106.77106.75106.54106.461. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128acbd612182430SE +/- 0.34, N = 15SE +/- 0.37, N = 15SE +/- 0.31, N = 1526.0423.2022.5522.401. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512cdba306090120150SE +/- 0.37, N = 3SE +/- 0.37, N = 3SE +/- 0.40, N = 3123.28123.26123.13122.271. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024cbda306090120150SE +/- 0.16, N = 3SE +/- 0.10, N = 3SE +/- 0.02, N = 3121.13120.86120.76120.571. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048acbd20406080100SE +/- 0.28, N = 3SE +/- 0.38, N = 3SE +/- 0.28, N = 3107.88107.58106.97106.761. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128adcb1326395265SE +/- 0.79, N = 12SE +/- 0.53, N = 15SE +/- 0.54, N = 1557.7152.8951.5050.701. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512dbca306090120150SE +/- 0.54, N = 3SE +/- 1.12, N = 8SE +/- 1.60, N = 3131.82130.74129.97128.951. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024dbac306090120150SE +/- 0.96, N = 3SE +/- 1.97, N = 3SE +/- 0.39, N = 3139.04138.20136.37136.151. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048cdab306090120150SE +/- 1.02, N = 3SE +/- 0.53, N = 3SE +/- 1.10, N = 3138.75138.36136.56136.151. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -mcpu=native -fopenmp -lopenblas


Phoronix Test Suite v10.8.5