new feb AMD Ryzen AI 9 HX 370 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2502026-NE-NEWFEB88615&grr&rdt .
new feb Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d AMD Ryzen AI 9 HX 370 @ 4.37GHz (12 Cores / 24 Threads) ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) AMD Device 1507 4 x 8GB LPDDR5-7500MT/s Samsung K3KL9L90CM-MGCT 1024GB MTFDKBA1T0QFM-1BD1AABGB AMD Radeon 512MB AMD Rembrandt Radeon HD Audio MEDIATEK Device 7925 Ubuntu 24.10 6.11.0-rc6-phx (x86_64) GNOME Shell 47.0 X Server + Wayland 4.6 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 DRM 3.58) GCC 14.2.0 ext4 2880x1800 OpenBenchmarking.org Kernel Details - amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - a: Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_power) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced - b: Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_power) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced - c: Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: power) - Platform Profile: low-power - CPU Microcode: 0xb204011 - ACPI Profile: low-power - d: Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
new feb llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 srsran: PUSCH Processor Benchmark, Throughput Total llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 liquid-dsp: 24 - 256 - 512 liquid-dsp: 16 - 256 - 512 liquid-dsp: 8 - 256 - 512 liquid-dsp: 4 - 256 - 512 liquid-dsp: 24 - 256 - 57 liquid-dsp: 2 - 256 - 512 liquid-dsp: 1 - 256 - 512 liquid-dsp: 24 - 256 - 32 liquid-dsp: 8 - 256 - 57 liquid-dsp: 16 - 256 - 57 liquid-dsp: 16 - 256 - 32 liquid-dsp: 4 - 256 - 57 liquid-dsp: 2 - 256 - 32 liquid-dsp: 8 - 256 - 32 liquid-dsp: 2 - 256 - 57 liquid-dsp: 1 - 256 - 57 liquid-dsp: 1 - 256 - 32 liquid-dsp: 4 - 256 - 32 srsran: PUSCH Processor Benchmark, Throughput Thread srsran: PDSCH Processor Benchmark, Throughput Total llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 srsran: PDSCH Processor Benchmark, Throughput Thread a b c d 34.55 36.83 37.26 38.36 146.25 39.29 37.4 10.14 10.62 2302.7 171.34 175350000 174070000 127510000 71434000 627970000 40464000 19340000 623040000 370880000 532610000 481950000 208370000 82974000 289220000 126200000 51965000 32814000 147090000 142.2 13881.1 176.3 55.48 922.1 33.36 34.19 35.73 37.67 128.98 38.08 37.34 10.1 10.59 2311.5 142.69 175110000 173140000 131770000 72413000 627070000 40117000 16375000 630610000 370980000 537800000 487150000 205530000 82734000 292860000 126430000 51976000 32809000 147980000 142.2 13645.2 145.77 50.29 919.2 29.49 29.66 30.57 30.83 113.16 30.4 30.92 10.14 10.12 1962.2 119.29 173480000 172230000 131960000 71542000 621930000 39374000 10019000 629110000 234220000 395490000 319550000 119840000 41054000 163770000 63351000 31499000 20243000 81463000 106.4 10930.4 123.08 54.77 696.8 32.15 31.43 32.86 31.89 147.57 31.27 33.9 10.11 10.43 2306.4 153.27 173070000 171480000 132340000 72638000 616900000 38620000 16680000 623460000 376430000 534610000 482810000 212600000 77682000 292960000 121610000 52131000 33348000 144880000 178 14074.7 147.4 53.94 1176.4 OpenBenchmarking.org
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 a b c d 8 16 24 32 40 34.55 33.36 29.49 32.15 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 a b c d 8 16 24 32 40 36.83 34.19 29.66 31.43 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 a b c d 9 18 27 36 45 37.26 35.73 30.57 32.86 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 a b c d 9 18 27 36 45 38.36 37.67 30.83 31.89 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 a b c d 30 60 90 120 150 146.25 128.98 113.16 147.57 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 a b c d 9 18 27 36 45 39.29 38.08 30.40 31.27 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 a b c d 9 18 27 36 45 37.40 37.34 30.92 33.90 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 a b c d 3 6 9 12 15 10.14 10.10 10.14 10.11 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 a b c d 3 6 9 12 15 10.62 10.59 10.12 10.43 1. (CXX) g++ options: -O3
srsRAN Project Test: PUSCH Processor Benchmark, Throughput Total OpenBenchmarking.org Mbps, More Is Better srsRAN Project 24.10 Test: PUSCH Processor Benchmark, Throughput Total a b c d 500 1000 1500 2000 2500 2302.7 2311.5 1962.2 2306.4 1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 a b c d 40 80 120 160 200 171.34 142.69 119.29 153.27 1. (CXX) g++ options: -O3
Liquid-DSP Threads: 24 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 24 - Buffer Length: 256 - Filter Length: 512 a b c d 40M 80M 120M 160M 200M 175350000 175110000 173480000 173070000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 512 a b c d 40M 80M 120M 160M 200M 174070000 173140000 172230000 171480000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 512 a b c d 30M 60M 90M 120M 150M 127510000 131770000 131960000 132340000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 512 a b c d 16M 32M 48M 64M 80M 71434000 72413000 71542000 72638000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 24 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 24 - Buffer Length: 256 - Filter Length: 57 a b c d 130M 260M 390M 520M 650M 627970000 627070000 621930000 616900000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 512 a b c d 9M 18M 27M 36M 45M 40464000 40117000 39374000 38620000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 512 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 512 a b c d 4M 8M 12M 16M 20M 19340000 16375000 10019000 16680000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 24 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 24 - Buffer Length: 256 - Filter Length: 32 a b c d 140M 280M 420M 560M 700M 623040000 630610000 629110000 623460000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 57 a b c d 80M 160M 240M 320M 400M 370880000 370980000 234220000 376430000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 57 a b c d 120M 240M 360M 480M 600M 532610000 537800000 395490000 534610000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 16 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 16 - Buffer Length: 256 - Filter Length: 32 a b c d 100M 200M 300M 400M 500M 481950000 487150000 319550000 482810000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 57 a b c d 50M 100M 150M 200M 250M 208370000 205530000 119840000 212600000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 32 a b c d 20M 40M 60M 80M 100M 82974000 82734000 41054000 77682000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 8 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 8 - Buffer Length: 256 - Filter Length: 32 a b c d 60M 120M 180M 240M 300M 289220000 292860000 163770000 292960000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 2 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 2 - Buffer Length: 256 - Filter Length: 57 a b c d 30M 60M 90M 120M 150M 126200000 126430000 63351000 121610000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 57 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 57 a b c d 11M 22M 33M 44M 55M 51965000 51976000 31499000 52131000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 1 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 1 - Buffer Length: 256 - Filter Length: 32 a b c d 7M 14M 21M 28M 35M 32814000 32809000 20243000 33348000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP Threads: 4 - Buffer Length: 256 - Filter Length: 32 OpenBenchmarking.org samples/s, More Is Better Liquid-DSP 1.7 Threads: 4 - Buffer Length: 256 - Filter Length: 32 a b c d 30M 60M 90M 120M 150M 147090000 147980000 81463000 144880000 1. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
srsRAN Project Test: PUSCH Processor Benchmark, Throughput Thread OpenBenchmarking.org Mbps, More Is Better srsRAN Project 24.10 Test: PUSCH Processor Benchmark, Throughput Thread a b c d 40 80 120 160 200 142.2 142.2 106.4 178.0 1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
srsRAN Project Test: PDSCH Processor Benchmark, Throughput Total OpenBenchmarking.org Mbps, More Is Better srsRAN Project 24.10 Test: PDSCH Processor Benchmark, Throughput Total a b c d 3K 6K 9K 12K 15K 13881.1 13645.2 10930.4 14074.7 1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 a b c d 40 80 120 160 200 176.30 145.77 123.08 147.40 1. (CXX) g++ options: -O3
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4397 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 a b c d 12 24 36 48 60 55.48 50.29 54.77 53.94 1. (CXX) g++ options: -O3
srsRAN Project Test: PDSCH Processor Benchmark, Throughput Thread OpenBenchmarking.org Mbps, More Is Better srsRAN Project 24.10 Test: PDSCH Processor Benchmark, Throughput Thread a b c d 300 600 900 1200 1500 922.1 919.2 696.8 1176.4 1. (CXX) g++ options: -O3 -march=native -mtune=generic -fno-trapping-math -fno-math-errno -ldl
Phoronix Test Suite v10.8.5