llama ryzen

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA NV174 8GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401115-PTS-LLAMARYZ22&rdt&grt .
llama ryzen: system details
Runs: a b c d e

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16 GB DRAM-4800MT/s F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: NVIDIA NV174 8GB
Audio: NVIDIA GA104 HD Audio
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700rc2daily20231127-generic (x86_64)
Desktop: GNOME Shell 45.1
Display Server: X Server 1.21.1.7 + Wayland
Display Driver: nouveau
OpenGL: 4.3 Mesa 24.0~git2311260600.945288~oibaf~m (git-945288f 2023-11-26 mantic-oibaf-ppa)
Compiler: GCC 13.2.0 + LLVM 16.0.6
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
- CPU Microcode: 0xa601203

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Vulnerable: Safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
llama ryzen: results summary

Test                               a              b              c              d              e
cachebench: Read                   14564.526857   13928.904051   14012.378556   13941.948907   14000.991451
cachebench: Write                  82521.012306   82683.705256   82816.166886   82411.453785   82721.458246
cachebench: Read / Modify / Write  148655.268243  149322.503152  149341.694893  148643.769977  149477.486712
llama-cpp: llama-2-7b.Q4_0.gguf    16.57          16.29          16.01          16.24          16.37
llama-cpp: llama-2-13b.Q4_0.gguf   8.91           8.59           8.64           8.69           8.50
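To gauge how closely the five runs agree, the per-run figures from the results summary above can be reduced to a mean and a relative spread per test. The following is a minimal sketch, not part of the Phoronix Test Suite output: the helper name `spread_percent` and the choice of (max - min) / mean as the spread measure are assumptions made for illustration, and the units in the labels are taken from the per-test sections of this export.

```python
from statistics import mean

# Per-run results (runs a, b, c, d, e) copied from the results summary.
results = {
    "cachebench: Read (MB/s)": [
        14564.526857, 13928.904051, 14012.378556, 13941.948907, 14000.991451],
    "cachebench: Write (MB/s)": [
        82521.012306, 82683.705256, 82816.166886, 82411.453785, 82721.458246],
    "cachebench: Read / Modify / Write (MB/s)": [
        148655.268243, 149322.503152, 149341.694893, 148643.769977, 149477.486712],
    "llama-cpp: llama-2-7b.Q4_0.gguf (tokens/s)": [
        16.57, 16.29, 16.01, 16.24, 16.37],
    "llama-cpp: llama-2-13b.Q4_0.gguf (tokens/s)": [
        8.91, 8.59, 8.64, 8.69, 8.50],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) / mean, expressed as a percentage."""
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    for name, values in results.items():
        print(f"{name}: mean={mean(values):.2f}, "
              f"spread={spread_percent(values):.2f}%")
```

A small spread suggests the runs are effectively repeat measurements of the same configuration; it is a rough consistency check, not a substitute for the SE figures reported per test.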
CacheBench
Test: Read
MB/s, More Is Better (OpenBenchmarking.org)

Run  Result    MIN       MAX
a    14564.53  14559.58  14565.55
b    13928.90  13869.2   13952.83
c    14012.38  13989.59  14051.73
d    13941.95  13893.22  13992.49
e    14000.99  13972.94  14166.78

SE values as listed: +/- 6.58, N = 3; +/- 17.63, N = 3; +/- 13.31, N = 3; +/- 10.09, N = 3
1. (CC) gcc options: -O3 -lrt
CacheBench
Test: Write
MB/s, More Is Better (OpenBenchmarking.org)

Run  Result    MIN       MAX
a    82521.01  82045.36  82824.07
b    82683.71  82026.91  83187.56
c    82816.17  82313.94  83190.85
d    82411.45  81658.82  83010.71
e    82721.46  82103.53  83190.13

SE values as listed: +/- 110.45, N = 3; +/- 25.26, N = 3; +/- 129.02, N = 3; +/- 28.12, N = 3
1. (CC) gcc options: -O3 -lrt
CacheBench
Test: Read / Modify / Write
MB/s, More Is Better (OpenBenchmarking.org)

Run  Result     MIN        MAX
a    148655.27  117568.67  160508.34
b    149322.50  118003.83  161230.87
c    149341.69  117995     161238.81
d    148643.77  117490.81  160497.66
e    149477.49  118036.01  161269.58

SE values as listed: +/- 49.22, N = 3; +/- 28.47, N = 3; +/- 74.37, N = 3; +/- 94.46, N = 3
1. (CC) gcc options: -O3 -lrt
Llama.cpp b1808
Model: llama-2-7b.Q4_0.gguf
Tokens Per Second, More Is Better (OpenBenchmarking.org)

Run  Result
a    16.57
b    16.29
c    16.01
d    16.24
e    16.37

SE values as listed: +/- 0.03, N = 3; +/- 0.15, N = 3; +/- 0.15, N = 7; +/- 0.09, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llama.cpp b1808
Model: llama-2-13b.Q4_0.gguf
Tokens Per Second, More Is Better (OpenBenchmarking.org)

Run  Result
a    8.91
b    8.59
c    8.64
d    8.69
e    8.50

SE values as listed: +/- 0.06, N = 3; +/- 0.05, N = 3; +/- 0.01, N = 3; +/- 0.05, N = 3
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
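The two Llama.cpp results can be compared run by run to see how much throughput drops when moving from the 7B to the 13B model. This is a minimal sketch under stated assumptions: the helper name `throughput_ratio` is hypothetical, and the tokens-per-second figures are the per-run results reported in this export.

```python
# Tokens per second for runs a, b, c, d, e, as reported in this export.
runs = ["a", "b", "c", "d", "e"]
tps_7b = [16.57, 16.29, 16.01, 16.24, 16.37]   # llama-2-7b.Q4_0.gguf
tps_13b = [8.91, 8.59, 8.64, 8.69, 8.50]       # llama-2-13b.Q4_0.gguf


def throughput_ratio(faster, slower):
    """Hypothetical helper: how many times faster one result is than another."""
    return faster / slower


# Ratio of 7B throughput to 13B throughput for each run.
ratios = {
    run: throughput_ratio(t7, t13)
    for run, t7, t13 in zip(runs, tps_7b, tps_13b)
}

if __name__ == "__main__":
    for run, ratio in ratios.items():
        print(f"run {run}: 7B is {ratio:.2f}x the 13B tokens/s")
```

Reporting a ratio per run, rather than one ratio of averages, keeps any run-to-run variation visible.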
Phoronix Test Suite v10.8.5