Llama3bench
AMD Ryzen Threadripper 2990WX 32-Core testing with an ASRock X399 Phantom Gaming 6 (P1.10 BIOS) and AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB on AlmaLinux 9.5 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2411289-NE-LLAMA3BEN89&grt .
Llama3bench system details (LLama3bench1)
Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
Motherboard: ASRock X399 Phantom Gaming 6 (P1.10 BIOS)
Chipset: AMD 17h
Memory: 128GB
Disk: 1024GB INTEL SSDPEKNW010T8
Graphics: AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1183/1500MHz)
Audio: Realtek ALC1220
Monitor: PHL 273B9
Network: Intel I211 + Realtek RTL8125 2.5GbE
OS: AlmaLinux 9.5
Kernel: 5.14.0-503.14.1.el9_5.x86_64 (x86_64)
Desktop: GNOME Shell 40.10
Display Server: X Server
Compiler: GCC 11.5.0 20240719
File-System: xfs
Screen Resolution: 1920x1080
- Transparent Huge Pages: always
- --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-host-bind-now --enable-host-pie --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-link-serialization=1 --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-arch_64=x86-64-v2 --with-build-config=bootstrap-lto --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver --without-isl
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0x800820d
- SELinux
  + gather_data_sampling: Not affected
  + itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + mmio_stale_data: Not affected
  + reg_file_data_sampling: Not affected
  + retbleed: Mitigation of untrained return thunk; SMT vulnerable
  + spec_rstack_overflow: Mitigation of Safe RET
  + spec_store_bypass: Mitigation of SSB disabled via prctl
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Retpolines; IBPB: conditional; STIBP: disabled; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  + srbds: Not affected
  + tsx_async_abort: Not affected
Llama3bench results summary (LLama3bench1, tokens per second)
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128: 2.75
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512: 4.18
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024: 4.20
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048: 4.14
Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
LLama3bench1: 2.75 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
LLama3bench1: 4.18 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
LLama3bench1: 4.20 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp b4154
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
LLama3bench1: 4.14 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
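To put the tokens-per-second figures above in wall-clock terms, the following is a rough sketch that converts each reported mean into an approximate run time. It assumes the number at the end of each test name is the token count for that test and that throughput is steady over the run. The names `RESULTS_TPS`, `seconds_for_tokens`, and `token_count` are illustrative and not part of the Phoronix Test Suite or llama.cpp.

```python
# Reported means in tokens per second, copied from the results above.
RESULTS_TPS = {
    "Text Generation 128": 2.75,
    "Prompt Processing 512": 4.18,
    "Prompt Processing 1024": 4.20,
    "Prompt Processing 2048": 4.14,
}


def seconds_for_tokens(tokens_per_second: float, n_tokens: int) -> float:
    """Seconds needed to handle n_tokens at a steady rate (an assumption)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


def token_count(test_name: str) -> int:
    """Hypothetical helper: read the trailing token count from a test name."""
    return int(test_name.rsplit(" ", 1)[1])


if __name__ == "__main__":
    for name, tps in RESULTS_TPS.items():
        n = token_count(name)
        secs = seconds_for_tokens(tps, n)
        print(f"{name}: {tps:.2f} tok/s -> about {secs:.1f} s for {n} tokens")
```

These are back-of-the-envelope estimates only; they ignore model load time and any warm-up that the benchmark itself may exclude.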
Phoronix Test Suite v10.8.5