new tests 9655 Dec
Benchmarks for a future article. AMD EPYC 9655P 96-Core testing with a Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS) and ASPEED on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412134-NE-NEWTESTS903&grs
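For reference, the same comparison can be rerun locally against this result file by its OpenBenchmarking.org ID. The sketch below is a minimal Python wrapper around that invocation; it assumes the Phoronix Test Suite is installed and on the PATH, and the result ID is taken from the URL above.

```python
# Minimal sketch: rerun this published comparison on the local machine.
# Assumes phoronix-test-suite is installed and available on PATH.
import subprocess

RESULT_ID = "2412134-NE-NEWTESTS903"  # taken from the exported URL above

# "benchmark <result-id>" installs the same tests and benchmarks the local
# system side by side with runs a/b/c from this export.
subprocess.run(["phoronix-test-suite", "benchmark", RESULT_ID], check=True)
```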
System Details (identical for runs a, b, and c)
  Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
  Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
  Chipset: AMD 1Ah
  Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
  Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
  Graphics: ASPEED
  Network: 2 x Broadcom NetXtreme BCM5720 PCIe
  OS: Ubuntu 24.10
  Kernel: 6.13.0-rc1-phx (x86_64)
  Desktop: GNOME Shell 47.0
  Display Server: X Server
  Compiler: GCC 14.2.0
  File-System: ext4
  Screen Resolution: 1024x768

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002116
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
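The Processor Details and Security Details fields above come from standard Linux sysfs interfaces. The sketch below shows one way to cross-check them on a running system; the paths are standard kernel interfaces, but the script itself is illustrative and not the Phoronix Test Suite's own collection code.

```python
# Minimal sketch: verify the scaling governor and CPU vulnerability mitigations
# reported in the system details above. Standard Linux sysfs paths; output
# formatting is illustrative only.
from pathlib import Path

def read(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "unknown"

# Scaling governor of CPU 0 (reported above as acpi-cpufreq performance)
print("governor:", read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"))

# Per-vulnerability mitigation status (gather_data_sampling, spectre_v2, ...)
vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
if vuln_dir.is_dir():
    for entry in sorted(vuln_dir.iterdir()):
        print(f"{entry.name}: {entry.read_text().strip()}")
```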
Overview (values listed as a / b / c; higher is better unless noted)
  llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16 (Tokens/s): 5.91 / 14.31 / 14.28
  srsran: PUSCH Processor Benchmark, Throughput Thread (Mbps): 317.1 / 317.1 / 158.7
  llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16 (Tokens/s): 38.72 / 54.57 / 54.49
  llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16 (Tokens/s): 86.83 / 122.08 / 119.14
  llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16 (Tokens/s): 65.15 / 86.02 / 79.73
  relion: Basic - CPU (Seconds, fewer is better): 137.746 / 150.359 / 151.654
  llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128 (Tokens/s): 80.69 / 86.5 / 85.05
  llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128 (Tokens/s): 52.75 / 55.07 / 55.16
  llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128 (Tokens/s): 122.87 / 125.5 / 126.93
  vvenc: Bosphorus 4K - Fast (FPS): 10.653 / 10.921 / 10.908
  srsran: PDSCH Processor Benchmark, Throughput Total (Mbps): 60763.5 / 59350.2 / 59987.8
  vvenc: Bosphorus 1080p - Fast (FPS): 29.577 / 29.913 / 29.879
  srsran: PDSCH Processor Benchmark, Throughput Thread (Mbps): 2424.2 / 2428.9 / 2449.8
  x265: Bosphorus 1080p (FPS): 118.41 / 118.69 / 119.56
  x265: Bosphorus 4K (FPS): 41.46 / 41.47 / 41.13
  vvenc: Bosphorus 4K - Faster (FPS): 23.259 / 23.406 / 23.429
  llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128 (Tokens/s): 14.43 / 14.4 / 14.33
  srsran: PUSCH Processor Benchmark, Throughput Total (Mbps): 8552.1 / 8590.2 / 8537.4
  vvenc: Bosphorus 1080p - Faster (FPS): 62.913 / 62.999 / 63.151
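One way to read the overview (and the "&grs" ordering in the exported URL) is as a normalized comparison across the three runs. The sketch below is an illustrative calculation, not OpenBenchmarking.org's exact summary method: it copies a subset of values from the table above, inverts the only lower-is-better result (RELION, seconds), normalizes each test to run a, and reports a per-run geometric mean.

```python
# Illustrative summary of the overview table: normalize each test to run "a"
# and take a geometric mean per run. Values copied from the table above; the
# RELION result is in seconds (fewer is better) and is inverted first.
from math import prod

results = {
    # test: (a, b, c), all higher-is-better after adjustment
    "llamafile wizardcoder 34B TG16":  (5.91, 14.31, 14.28),
    "srsRAN PUSCH thread (Mbps)":      (317.1, 317.1, 158.7),
    "llamafile mistral 7B TG16":       (38.72, 54.57, 54.49),
    "RELION Basic CPU (1/seconds)":    (1 / 137.746, 1 / 150.359, 1 / 151.654),
    "x265 Bosphorus 4K (FPS)":         (41.46, 41.47, 41.13),
}

def geomean(values):
    return prod(values) ** (1 / len(values))

baseline = {test: vals[0] for test, vals in results.items()}
for run, idx in (("a", 0), ("b", 1), ("c", 2)):
    ratios = [vals[idx] / baseline[test] for test, vals in results.items()]
    print(f"run {run}: relative geometric mean = {geomean(ratios):.3f}")
```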
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 (Tokens Per Second, more is better): a: 5.91, b: 14.31, c: 14.28
srsRAN Project 24.10 - Test: PUSCH Processor Benchmark, Throughput Thread (Mbps, more is better): a: 317.1, b: 317.1, c: 158.7
1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 (Tokens Per Second, more is better): a: 38.72, b: 54.57, c: 54.49
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 (Tokens Per Second, more is better): a: 86.83, b: 122.08, c: 119.14
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 (Tokens Per Second, more is better): a: 65.15, b: 86.02, c: 79.73
RELION 5.0 - Test: Basic - Device: CPU (Seconds, fewer is better): a: 137.75, b: 150.36, c: 151.65
1. (CXX) g++ options: -fPIC -std=c++14 -fopenmp -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -ljpeg -lmpi_cxx -lmpi
Llamafile 0.8.16 - Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 (Tokens Per Second, more is better): a: 80.69, b: 86.50, c: 85.05
Llamafile 0.8.16 - Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 (Tokens Per Second, more is better): a: 52.75, b: 55.07, c: 55.16
Llamafile 0.8.16 - Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 (Tokens Per Second, more is better): a: 122.87, b: 125.50, c: 126.93
VVenC 1.13 - Video Input: Bosphorus 4K - Video Preset: Fast (Frames Per Second, more is better): a: 10.65, b: 10.92, c: 10.91
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
srsRAN Project 24.10 - Test: PDSCH Processor Benchmark, Throughput Total (Mbps, more is better): a: 60763.5, b: 59350.2, c: 59987.8
1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
VVenC 1.13 - Video Input: Bosphorus 1080p - Video Preset: Fast (Frames Per Second, more is better): a: 29.58, b: 29.91, c: 29.88
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
srsRAN Project 24.10 - Test: PDSCH Processor Benchmark, Throughput Thread (Mbps, more is better): a: 2424.2, b: 2428.9, c: 2449.8
1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
x265 4.1 - Video Input: Bosphorus 1080p (Frames Per Second, more is better): a: 118.41, b: 118.69, c: 119.56
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
x265 4.1 - Video Input: Bosphorus 4K (Frames Per Second, more is better): a: 41.46, b: 41.47, c: 41.13
1. (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma
VVenC 1.13 - Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, more is better): a: 23.26, b: 23.41, c: 23.43
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Llamafile 0.8.16 - Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128 (Tokens Per Second, more is better): a: 14.43, b: 14.40, c: 14.33
srsRAN Project 24.10 - Test: PUSCH Processor Benchmark, Throughput Total (Mbps, more is better): a: 8552.1, b: 8590.2, c: 8537.4
1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
VVenC 1.13 - Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, more is better): a: 62.91, b: 63.00, c: 63.15
1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects
Phoronix Test Suite v10.8.5