llama ryzen

AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and NVIDIA NV174 8GB on Ubuntu 23.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2401115-PTS-LLAMARYZ22&grs&sro.

llama ryzen: system details

Runs: a, b, c, d, e

Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16 GB DRAM-4800MT/s F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: NVIDIA NV174 8GB
Audio: NVIDIA GA104 HD Audio
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700rc2daily20231127-generic (x86_64)
Desktop: GNOME Shell 45.1
Display Server: X Server 1.21.1.7 + Wayland
Display Driver: nouveau
OpenGL: 4.3 Mesa 24.0~git2311260600.945288~oibaf~m (git-945288f 2023-11-26 mantic-oibaf-ppa)
Compiler: GCC 13.2.0 + LLVM 16.0.6
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
- CPU Microcode: 0xa601203

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Vulnerable: Safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

llama ryzen: result summary

Test                                 a               b               c               d               e
llama-cpp: llama-2-13b.Q4_0.gguf     8.91            8.59            8.64            8.69            8.50
cachebench: Read                     14564.526857    13928.904051    14012.378556    13941.948907    14000.991451
llama-cpp: llama-2-7b.Q4_0.gguf      16.57           16.29           16.01           16.24           16.37
cachebench: Read / Modify / Write    148655.268243   149322.503152   149341.694893   148643.769977   149477.486712
cachebench: Write                    82521.012306    82683.705256    82816.166886    82411.453785    82721.458246

llama-cpp results are in tokens per second; cachebench results are in MB/s. More is better for all tests.
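To put the five runs side by side, the per-test mean and the best-to-worst spread can be computed from the summary figures. The following is a minimal sketch, not part of the Phoronix Test Suite output: the names `results`, `spread_percent`, and `summarize` are hypothetical, and the spread metric, (max - min) / mean, is an assumption chosen for illustration rather than a statistic reported by the suite.

```python
# Per-run results for runs a-e, copied from the summary figures above.
# llama-cpp values are tokens per second; cachebench values are MB/s.
results = {
    "llama-cpp: llama-2-13b.Q4_0.gguf": [8.91, 8.59, 8.64, 8.69, 8.50],
    "cachebench: Read": [
        14564.526857, 13928.904051, 14012.378556, 13941.948907, 14000.991451,
    ],
    "llama-cpp: llama-2-7b.Q4_0.gguf": [16.57, 16.29, 16.01, 16.24, 16.37],
    "cachebench: Read / Modify / Write": [
        148655.268243, 149322.503152, 149341.694893, 148643.769977, 149477.486712,
    ],
    "cachebench: Write": [
        82521.012306, 82683.705256, 82816.166886, 82411.453785, 82721.458246,
    ],
}


def spread_percent(values):
    """Return (max - min) / mean * 100 for one test's run results."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


def summarize(all_results):
    """Map each test name to a (mean, spread in percent) tuple."""
    return {
        name: (sum(values) / len(values), spread_percent(values))
        for name, values in all_results.items()
    }


if __name__ == "__main__":
    for name, (mean, spread) in summarize(results).items():
        print(f"{name}: mean={mean:.2f}, spread={spread:.2f}%")
```

A small spread suggests the runs are effectively repeats of the same configuration; whether a given difference is meaningful still depends on the standard errors reported for each test.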

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

Llama.cpp b1808, Tokens Per Second, More Is Better

a: 8.91
b: 8.59
c: 8.64
d: 8.69
e: 8.50

SE as listed (four values for five runs; the run each belongs to is not recoverable from this export): +/- 0.06, N = 3; +/- 0.05, N = 3; +/- 0.01, N = 3; +/- 0.05, N = 3

1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

CacheBench

Test: Read

CacheBench, MB/s, More Is Better

Run   Result      MIN         MAX
a     14564.53    14559.58    14565.55
b     13928.90    13869.2     13952.83
c     14012.38    13989.59    14051.73
d     13941.95    13893.22    13992.49
e     14000.99    13972.94    14166.78

SE as listed (four values for five runs; the run each belongs to is not recoverable from this export): +/- 6.58, N = 3; +/- 17.63, N = 3; +/- 13.31, N = 3; +/- 10.09, N = 3

1. (CC) gcc options: -O3 -lrt

Llama.cpp

Model: llama-2-7b.Q4_0.gguf

Llama.cpp b1808, Tokens Per Second, More Is Better

a: 16.57
b: 16.29
c: 16.01
d: 16.24
e: 16.37

SE as listed (four values for five runs; the run each belongs to is not recoverable from this export): +/- 0.03, N = 3; +/- 0.15, N = 3; +/- 0.15, N = 7; +/- 0.09, N = 3

1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas

CacheBench

Test: Read / Modify / Write

CacheBench, MB/s, More Is Better

Run   Result       MIN          MAX
a     148655.27    117568.67    160508.34
b     149322.50    118003.83    161230.87
c     149341.69    117995       161238.81
d     148643.77    117490.81    160497.66
e     149477.49    118036.01    161269.58

SE as listed (four values for five runs; the run each belongs to is not recoverable from this export): +/- 49.22, N = 3; +/- 28.47, N = 3; +/- 74.37, N = 3; +/- 94.46, N = 3

1. (CC) gcc options: -O3 -lrt

CacheBench

Test: Write

CacheBench, MB/s, More Is Better

Run   Result      MIN         MAX
a     82521.01    82045.36    82824.07
b     82683.71    82026.91    83187.56
c     82816.17    82313.94    83190.85
d     82411.45    81658.82    83010.71
e     82721.46    82103.53    83190.13

SE as listed (four values for five runs; the run each belongs to is not recoverable from this export): +/- 110.45, N = 3; +/- 25.26, N = 3; +/- 129.02, N = 3; +/- 28.12, N = 3

1. (CC) gcc options: -O3 -lrt


Phoronix Test Suite v10.8.5