llama cpp 7995WX AMD Ryzen Threadripper PRO 7995WX 96-Cores testing with a HP 8B24 (U65 Ver. 01.01.04 BIOS) and NVIDIA RTX A4000 16GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401105-PTS-LLAMACPP49&sro .
llama cpp 7995WX Processor Motherboard Chipset Memory Disk Graphics Audio Monitor Network OS Kernel Desktop Display Server Display Driver OpenGL OpenCL Compiler File-System Screen Resolution a b c d AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads) HP 8B24 (U65 Ver. 01.01.04 BIOS) AMD Device 14a4 8 x 16 GB DRAM-5600MT/s Hynix HMCG78AGBRA190N 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1 NVIDIA RTX A4000 16GB NVIDIA GA104 HD Audio ASUS VP28U Realtek RTL8111/8168/8411 Ubuntu 23.10 6.5.0-14-generic (x86_64) GNOME Shell 45.0 X Server 1.21.1.7 NVIDIA 535.129.03 4.6.0 OpenCL 3.0 CUDA 12.2.147 GCC 13.2.0 ext4 3840x2160 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
llama cpp 7995WX llama-cpp: llama-2-7b.Q4_0.gguf llama-cpp: llama-2-13b.Q4_0.gguf llama-cpp: llama-2-70b-chat.Q5_0.gguf a b c d 31.53 20.30 4.66 32.17 20.02 4.63 30.92 20.14 4.66 31.09 20.08 4.69 OpenBenchmarking.org
Llama.cpp Model: llama-2-7b.Q4_0.gguf OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b1808 Model: llama-2-7b.Q4_0.gguf a b c d 7 14 21 28 35 SE +/- 0.29, N = 15 SE +/- 0.32, N = 5 SE +/- 0.22, N = 3 SE +/- 0.29, N = 3 31.53 32.17 30.92 31.09 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llama.cpp Model: llama-2-13b.Q4_0.gguf OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b1808 Model: llama-2-13b.Q4_0.gguf a b c d 5 10 15 20 25 SE +/- 0.23, N = 3 SE +/- 0.06, N = 3 SE +/- 0.02, N = 3 SE +/- 0.07, N = 3 20.30 20.02 20.14 20.08 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llama.cpp Model: llama-2-70b-chat.Q5_0.gguf OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b1808 Model: llama-2-70b-chat.Q5_0.gguf a b c d 1.0553 2.1106 3.1659 4.2212 5.2765 SE +/- 0.02, N = 3 SE +/- 0.01, N = 3 SE +/- 0.01, N = 3 SE +/- 0.02, N = 3 4.66 4.63 4.66 4.69 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Phoronix Test Suite v10.8.4