feb 9950X AMD Ryzen 9 9950X 16-Core testing with a ASRock X870E Taichi (3.12.AS02 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2502104-PTS-FEB9950X21&grr&sor .
feb 9950X Processor Motherboard Chipset Memory Disk Graphics Audio Monitor Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads) ASRock X870E Taichi (3.12.AS02 BIOS) AMD Device 14d8 2 x 16GB DDR5-6000MT/s F5-6000J2836G16G Western Digital WD_BLACK SN850X 2000GB XFX AMD Radeon RX 7900 XTX 24GB AMD Navi 31 HDMI/DP DELL U2723QE Realtek Device 8126 + MEDIATEK Device 0717 Ubuntu 24.04 6.12.3-061203-generic (x86_64) GNOME Shell 46.0 X Server 1.21.1.11 + Wayland 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.59) GCC 13.3.0 ext4 3840x2160 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - CPU Microcode: 0xb404023 Python Details - Python 2.7.16 + Python 3.12.3 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
feb 9950X llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 qmcpack: O_ae_pyscf_UHF qmcpack: Li2_STO_ae llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 qmcpack: FeCO6_b3lyp_gms qmcpack: LiH_ae_MSD llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 liquid-dsp: 32 - 256 - 512 liquid-dsp: 16 - 256 - 512 liquid-dsp: 8 - 256 - 512 liquid-dsp: 32 - 256 - 57 liquid-dsp: 4 - 256 - 512 liquid-dsp: 32 - 256 - 32 liquid-dsp: 2 - 256 - 512 liquid-dsp: 8 - 256 - 32 liquid-dsp: 16 - 256 - 57 liquid-dsp: 8 - 256 - 57 liquid-dsp: 4 - 256 - 57 liquid-dsp: 16 - 256 - 32 liquid-dsp: 1 - 256 - 512 liquid-dsp: 4 - 256 - 32 liquid-dsp: 2 - 256 - 57 liquid-dsp: 2 - 256 - 32 liquid-dsp: 1 - 256 - 57 liquid-dsp: 1 - 256 - 32 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 qmcpack: H4_ae llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 a b c d 88.41 88.09 125.15 121.76 9.17 91.42 90.33 9.71 52.822 42.036 91.25 89.38 375.05 618230000 582860000 315200000 1596800000 158040000 1583300000 82303000 454100000 1096000000 586180000 315060000 892520000 41410000 229660000 182470000 113930000 90088000 56787000 395.79 11 65.08 413.38 89.13 90.16 126.79 123.24 9.17 91.37 92.07 9.74 52.295 42.005 92.04 88.6 370.37 617980000 574670000 322050000 1601200000 160690000 1587600000 82576000 454350000 1093500000 585930000 313980000 893150000 41571000 229740000 183150000 113780000 90375000 56789000 402.62 10.36 65.01 418.76 89.14 89.69 124.76 122.97 9.17 90.08 89.3 9.75 52.529 41.883 88.49 93.6 371.21 618330000 580280000 312230000 1599700000 160370000 1584800000 82543000 453740000 1091900000 589920000 315910000 891310000 42347000 229920000 183260000 114160000 90142000 57980000 394.84 10.69 65.1 406.48 89.55 89.74 124.25 123.99 9.19 90.29 91.21 9.74 52.313 41.811 90.77 92.18 376.59 616800000 585540000 317070000 1599300000 155760000 1583700000 80008000 454640000 1089400000 592680000 315130000 892320000 42423000 229660000 182390000 113880000 90466000 59018000 399.88 10.54 64.9 420.46 OpenBenchmarking.org
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 d c b a 20 40 60 80 100 89.55 89.14 89.13 88.41 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 b d c a 20 40 60 80 100 90.16 89.74 89.69 88.09 1. (CXX) g++ options: -O3
QMCPACK Input: O_ae_pyscf_UHF OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 4.0 Input: O_ae_pyscf_UHF d c a b 30 60 90 120 150 124.25 124.76 125.15 126.79 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QMCPACK Input: Li2_STO_ae OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 4.0 Input: Li2_STO_ae a c b d 30 60 90 120 150 121.76 122.97 123.24 123.99 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 d c b a 3 6 9 12 15 9.19 9.17 9.17 9.17 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 a b d c 20 40 60 80 100 91.42 91.37 90.29 90.08 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 b d a c 20 40 60 80 100 92.07 91.21 90.33 89.30 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 c d b a 3 6 9 12 15 9.75 9.74 9.74 9.71 1. (CXX) g++ options: -O3
QMCPACK Input: FeCO6_b3lyp_gms OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 4.0 Input: FeCO6_b3lyp_gms b d c a 12 24 36 48 60 52.30 52.31 52.53 52.82 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
QMCPACK Input: LiH_ae_MSD OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 4.0 Input: LiH_ae_MSD d c b a 10 20 30 40 50 41.81 41.88 42.01 42.04 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 b a d c 20 40 60 80 100 92.04 91.25 90.77 88.49 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 c d a b 20 40 60 80 100 93.60 92.18 89.38 88.60 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 d a c b 80 160 240 320 400 376.59 375.05 371.21 370.37 1. (CXX) g++ options: -O3
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 32 - Buffer Length: 256 - Filter Length: 512 c a b d 130M 260M 390M 520M 650M 618330000 618230000 617980000 616800000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 512 d a c b 130M 260M 390M 520M 650M 585540000 582860000 580280000 574670000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 512 b d a c 70M 140M 210M 280M 350M 322050000 317070000 315200000 312230000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 32 - Buffer Length: 256 - Filter Length: 57 b c d a 300M 600M 900M 1200M 1500M 1601200000 1599700000 1599300000 1596800000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 512 b c a d 30M 60M 90M 120M 150M 160690000 160370000 158040000 155760000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 32 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 32 - Buffer Length: 256 - Filter Length: 32 b c d a 300M 600M 900M 1200M 1500M 1587600000 1584800000 1583700000 1583300000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 512 b c a d 20M 40M 60M 80M 100M 82576000 82543000 82303000 80008000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 32 d b a c 100M 200M 300M 400M 500M 454640000 454350000 454100000 453740000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 57 a b c d 200M 400M 600M 800M 1000M 1096000000 1093500000 1091900000 1089400000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 57 d c a b 130M 260M 390M 520M 650M 592680000 589920000 586180000 585930000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 57 c d a b 70M 140M 210M 280M 350M 315910000 315130000 315060000 313980000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 32 b a d c 200M 400M 600M 800M 1000M 893150000 892520000 892320000 891310000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 512 d c b a 9M 18M 27M 36M 45M 42423000 42347000 41571000 41410000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 32 c b d a 50M 100M 150M 200M 250M 229920000 229740000 229660000 229660000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 57 c b a d 40M 80M 120M 160M 200M 183260000 183150000 182470000 182390000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 32 c a d b 20M 40M 60M 80M 100M 114160000 113930000 113880000 113780000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 57 d b c a 20M 40M 60M 80M 100M 90466000 90375000 90142000 90088000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 32 d c b a 13M 26M 39M 52M 65M 59018000 57980000 56789000 56787000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 b d a c 90 180 270 360 450 402.62 399.88 395.79 394.84 1. (CXX) g++ options: -O3
QMCPACK Input: H4_ae OpenBenchmarking.org Total Execution Time - Seconds, Fewer Is Better QMCPACK 4.0 Input: H4_ae b d c a 3 6 9 12 15 10.36 10.54 10.69 11.00 1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 c a b d 15 30 45 60 75 65.10 65.08 65.01 64.90 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 d b a c 90 180 270 360 450 420.46 418.76 413.38 406.48 1. (CXX) g++ options: -O3
Phoronix Test Suite v10.8.5