feb 9950X: AMD Ryzen 9 9950X 16-Core testing with an ASRock X870E Taichi (3.12.AS02 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2502104-PTS-FEB9950X21&grs&sor .
feb 9950X: system details (runs a, b, c, d)

Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
Motherboard: ASRock X870E Taichi (3.12.AS02 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DDR5-6000MT/s F5-6000J2836G16G
Disk: Western Digital WD_BLACK SN850X 2000GB
Graphics: XFX AMD Radeon RX 7900 XTX 24GB
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Realtek Device 8126 + MEDIATEK Device 0717
OS: Ubuntu 24.04
Kernel: 6.12.3-061203-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server 1.21.1.11 + Wayland
OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.59)
Compiler: GCC 13.3.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
- Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance)
- CPU Microcode: 0xb404023

Python Details: Python 2.7.16 + Python 3.12.3

Security Details:
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
feb 9950X: result overview

Test | a | b | c | d
qmcpack: H4_ae | 11 | 10.36 | 10.69 | 10.54
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | 89.38 | 88.6 | 93.6 | 92.18
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | 91.25 | 92.04 | 88.49 | 90.77
liquid-dsp: 1 - 256 - 32 | 56787000 | 56789000 | 57980000 | 59018000
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | 413.38 | 418.76 | 406.48 | 420.46
liquid-dsp: 2 - 256 - 512 | 82303000 | 82576000 | 82543000 | 80008000
liquid-dsp: 4 - 256 - 512 | 158040000 | 160690000 | 160370000 | 155760000
liquid-dsp: 8 - 256 - 512 | 315200000 | 322050000 | 312230000 | 317070000
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | 90.33 | 92.07 | 89.3 | 91.21
liquid-dsp: 1 - 256 - 512 | 41410000 | 41571000 | 42347000 | 42423000
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | 88.09 | 90.16 | 89.69 | 89.74
qmcpack: O_ae_pyscf_UHF | 125.15 | 126.79 | 124.76 | 124.25
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 | 395.79 | 402.62 | 394.84 | 399.88
liquid-dsp: 16 - 256 - 512 | 582860000 | 574670000 | 580280000 | 585540000
qmcpack: Li2_STO_ae | 121.76 | 123.24 | 122.97 | 123.99
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | 375.05 | 370.37 | 371.21 | 376.59
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | 91.42 | 91.37 | 90.08 | 90.29
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | 88.41 | 89.13 | 89.14 | 89.55
liquid-dsp: 8 - 256 - 57 | 586180000 | 585930000 | 589920000 | 592680000
qmcpack: FeCO6_b3lyp_gms | 52.822 | 52.295 | 52.529 | 52.313
liquid-dsp: 4 - 256 - 57 | 315060000 | 313980000 | 315910000 | 315130000
liquid-dsp: 16 - 256 - 57 | 1096000000 | 1093500000 | 1091900000 | 1089400000
qmcpack: LiH_ae_MSD | 42.036 | 42.005 | 41.883 | 41.811
liquid-dsp: 2 - 256 - 57 | 182470000 | 183150000 | 183260000 | 182390000
liquid-dsp: 1 - 256 - 57 | 90088000 | 90375000 | 90142000 | 90466000
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | 9.71 | 9.74 | 9.75 | 9.74
liquid-dsp: 2 - 256 - 32 | 113930000 | 113780000 | 114160000 | 113880000
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | 65.08 | 65.01 | 65.1 | 64.9
liquid-dsp: 32 - 256 - 57 | 1596800000 | 1601200000 | 1599700000 | 1599300000
liquid-dsp: 32 - 256 - 32 | 1583300000 | 1587600000 | 1584800000 | 1583700000
liquid-dsp: 32 - 256 - 512 | 618230000 | 617980000 | 618330000 | 616800000
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | 9.17 | 9.17 | 9.17 | 9.19
liquid-dsp: 16 - 256 - 32 | 892520000 | 893150000 | 891310000 | 892320000
liquid-dsp: 8 - 256 - 32 | 454100000 | 454350000 | 453740000 | 454640000
liquid-dsp: 4 - 256 - 32 | 229660000 | 229740000 | 229920000 | 229660000
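The overview lists four runs (a, b, c, d) against a single system configuration, so one quick way to read it is to look at how far the runs spread for each test. The following is a minimal sketch, not part of the Phoronix Test Suite export: the `spread_percent` helper and the choice of three sample tests are assumptions made for illustration, and the numbers are copied from the overview above in a, b, c, d order.

```python
# Illustrative sketch only: spread_percent is a hypothetical helper,
# not a Phoronix Test Suite function.
# Values are copied from the result overview, in run order a, b, c, d.
results = {
    "qmcpack: H4_ae": [11.0, 10.36, 10.69, 10.54],
    "llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512": [
        89.38, 88.6, 93.6, 92.18,
    ],
    "liquid-dsp: 32 - 256 - 512": [618230000, 617980000, 618330000, 616800000],
}


def spread_percent(values):
    """Return (max - min) / min as a percentage of the smallest value."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


for name, values in results.items():
    print(f"{name}: spread {spread_percent(values):.2f}%")
```

The ratio is unit-free, so the same helper works for seconds, tokens per second, and samples per second; it only measures variation between runs and says nothing about which direction is better for a given test.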
QMCPACK 4.0 - Input: H4_ae
Total Execution Time - Seconds, Fewer Is Better
b: 10.36, d: 10.54, c: 10.69, a: 11.00
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
c: 93.60, d: 92.18, a: 89.38, b: 88.60
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
b: 92.04, a: 91.25, d: 90.77, c: 88.49
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 1 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
d: 59018000, c: 57980000, b: 56789000, a: 56787000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
d: 420.46, b: 418.76, a: 413.38, c: 406.48
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 2 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
b: 82576000, c: 82543000, a: 82303000, d: 80008000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 4 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
b: 160690000, c: 160370000, a: 158040000, d: 155760000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 8 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
b: 322050000, d: 317070000, a: 315200000, c: 312230000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
b: 92.07, d: 91.21, a: 90.33, c: 89.30
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 1 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
d: 42423000, c: 42347000, b: 41571000, a: 41410000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
b: 90.16, d: 89.74, c: 89.69, a: 88.09
1. (CXX) g++ options: -O3

QMCPACK 4.0 - Input: O_ae_pyscf_UHF
Total Execution Time - Seconds, Fewer Is Better
d: 124.25, c: 124.76, a: 125.15, b: 126.79
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
b: 402.62, d: 399.88, a: 395.79, c: 394.84
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 16 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
d: 585540000, a: 582860000, c: 580280000, b: 574670000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

QMCPACK 4.0 - Input: Li2_STO_ae
Total Execution Time - Seconds, Fewer Is Better
a: 121.76, c: 122.97, b: 123.24, d: 123.99
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
d: 376.59, a: 375.05, c: 371.21, b: 370.37
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
a: 91.42, b: 91.37, d: 90.29, c: 90.08
1. (CXX) g++ options: -O3

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
d: 89.55, c: 89.14, b: 89.13, a: 88.41
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 8 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
d: 592680000, c: 589920000, a: 586180000, b: 585930000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

QMCPACK 4.0 - Input: FeCO6_b3lyp_gms
Total Execution Time - Seconds, Fewer Is Better
b: 52.30, d: 52.31, c: 52.53, a: 52.82
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Liquid-DSP 1.7 - Threads: 4 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
c: 315910000, d: 315130000, a: 315060000, b: 313980000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 16 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
a: 1096000000, b: 1093500000, c: 1091900000, d: 1089400000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

QMCPACK 4.0 - Input: LiH_ae_MSD
Total Execution Time - Seconds, Fewer Is Better
d: 41.81, c: 41.88, b: 42.01, a: 42.04
1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl

Liquid-DSP 1.7 - Threads: 2 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
c: 183260000, b: 183150000, a: 182470000, d: 182390000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 1 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
d: 90466000, b: 90375000, c: 90142000, a: 90088000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
c: 9.75, d: 9.74, b: 9.74, a: 9.71
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 2 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
c: 114160000, a: 113930000, d: 113880000, b: 113780000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
c: 65.10, a: 65.08, b: 65.01, d: 64.90
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 32 - Buffer Length: 256 - Filter Length: 57
samples/s, More Is Better
b: 1601200000, c: 1599700000, d: 1599300000, a: 1596800000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 32 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
b: 1587600000, c: 1584800000, d: 1583700000, a: 1583300000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 32 - Buffer Length: 256 - Filter Length: 512
samples/s, More Is Better
c: 618330000, a: 618230000, b: 617980000, d: 616800000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
d: 9.19, c: 9.17, b: 9.17, a: 9.17
1. (CXX) g++ options: -O3

Liquid-DSP 1.7 - Threads: 16 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
b: 893150000, a: 892520000, d: 892320000, c: 891310000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 8 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
d: 454640000, b: 454350000, a: 454100000, c: 453740000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid

Liquid-DSP 1.7 - Threads: 4 - Buffer Length: 256 - Filter Length: 32
samples/s, More Is Better
c: 229920000, b: 229740000, d: 229660000, a: 229660000
1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Phoronix Test Suite v10.8.5