kkk

AMD Ryzen Threadripper PRO 7995WX 96-Cores testing with a HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS) and NVIDIA RTX A4000 16GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2401223-PTS-KKK7127595&grr&sro&rro .
kkk: System Details

Runs: a, b, c, d, e

Processor: AMD Ryzen Threadripper PRO 7995WX 96-Cores @ 6.44GHz (96 Cores / 192 Threads)
Motherboard: HP Z6 G5 A Workstation 8B24 (U65 Ver. 01.01.04 BIOS)
Chipset: AMD Device 14a4
Memory: 8 x 16 GB DRAM-5200MT/s Hynix HMCG78AGBRA190N
Disk: 2 x 1024GB SAMSUNG MZVL21T0HCLR-00BH1
Graphics: NVIDIA RTX A4000 16GB
Audio: NVIDIA GA104 HD Audio
Monitor: ASUS VP28U
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 23.10
Kernel: 6.5.0-14-generic (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server 1.21.1.7
Display Driver: NVIDIA 535.129.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.2.147
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance)
- CPU Microcode: 0xa108105

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of safe RET
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
kkk: Results Overview

Test                                              a        b        c        d        e
llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU    17.89    18.01    17.96    18.02    18.00
llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU 7.66     7.65     7.63     7.63     7.66
compress-lz4: 9 - Decompression Speed             5681.9   5590.2   5580.7   5620.0   5663.7
compress-lz4: 9 - Compression Speed               48.06    47.42    47.32    47.59    47.97
compress-lz4: 1 - Decompression Speed             6052.9   6005.4   6014.7   5950.4   5931.0
compress-lz4: 1 - Compression Speed               954.27   941.76   945.60   938.32   934.49
compress-lz4: 3 - Decompression Speed             5398.7   5396.3   5352.5   5439.7   5410.1
compress-lz4: 3 - Compression Speed               144.10   143.73   142.97   145.00   144.49
llamafile: llava-v1.5-7b-q4 - CPU                 27.77    27.87    27.91    27.87    27.67
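The overview figures can be compared across the five runs with a short script. The following is only a sketch: the names `RESULTS` and `spread_pct` are made up for this illustration and are not part of the Phoronix Test Suite, and the spread metric (range divided by mean) is one simple choice among several.

```python
from statistics import mean

# Overview values copied from the results above, in run order a, b, c, d, e.
RUNS = ("a", "b", "c", "d", "e")
RESULTS = {
    "llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU": (17.89, 18.01, 17.96, 18.02, 18.00),
    "llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU": (7.66, 7.65, 7.63, 7.63, 7.66),
    "compress-lz4: 9 - Decompression Speed": (5681.9, 5590.2, 5580.7, 5620.0, 5663.7),
    "compress-lz4: 9 - Compression Speed": (48.06, 47.42, 47.32, 47.59, 47.97),
    "compress-lz4: 1 - Decompression Speed": (6052.9, 6005.4, 6014.7, 5950.4, 5931.0),
    "compress-lz4: 1 - Compression Speed": (954.27, 941.76, 945.60, 938.32, 934.49),
    "compress-lz4: 3 - Decompression Speed": (5398.7, 5396.3, 5352.5, 5439.7, 5410.1),
    "compress-lz4: 3 - Compression Speed": (144.10, 143.73, 142.97, 145.00, 144.49),
    "llamafile: llava-v1.5-7b-q4 - CPU": (27.77, 27.87, 27.91, 27.87, 27.67),
}


def spread_pct(values):
    """Range of the run results as a percentage of their mean (hypothetical metric)."""
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    for name, values in RESULTS.items():
        best = RUNS[values.index(max(values))]  # all tests here are "More Is Better"
        print(f"{name}: best run {best}, spread {spread_pct(values):.2f}%")
```

With these figures the spread stays between roughly 0.4% and 2.1% for every test, which suggests the five runs are close to repeats of one another on the same configuration.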
Llamafile 0.6
Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU
Tokens Per Second, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    18.00    18.02    17.96    18.01    17.89
SE +/-    0.06     0.04     0.03     0.05     0.04

Llamafile 0.6
Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU
Tokens Per Second, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    7.66     7.63     7.63     7.65     7.66
SE +/-    0.01     0.02     0.02     0.02     0.01

LZ4 Compression 1.9.4
Compression Level: 9 - Decompression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    5663.7   5620.0   5580.7   5590.2   5681.9
SE +/-    26.41    37.68    34.89    33.75    12.30
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4
Compression Level: 9 - Compression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    47.97    47.59    47.32    47.42    48.06
SE +/-    0.19     0.29     0.18     0.24     0.10
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4
Compression Level: 1 - Decompression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    5931.0   5950.4   6014.7   6005.4   6052.9
SE +/-    6.66     61.87    42.49    9.19     6.59
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4
Compression Level: 1 - Compression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    934.49   938.32   945.60   941.76   954.27
SE +/-    2.33     9.75     6.64     2.97     2.19
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4
Compression Level: 3 - Decompression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    5410.1   5439.7   5352.5   5396.3   5398.7
SE +/-    42.04    40.02    36.23    76.63    11.60
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4
Compression Level: 3 - Compression Speed
MB/s, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    144.49   145.00   142.97   143.73   144.10
SE +/-    1.33     1.20     0.86     1.76     0.26
1. (CC) gcc options: -O3

Llamafile 0.6
Test: llava-v1.5-7b-q4 - Acceleration: CPU
Tokens Per Second, More Is Better (N = 3 for every run)
Run       e        d        c        b        a
Result    27.67    27.87    27.91    27.87    27.77
SE +/-    0.20     0.15     0.15     0.15     0.11
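The "SE +/- x, N = 3" figures can help judge whether two runs really differ. The sketch below rests on two assumptions that the export itself does not confirm: that SE is the standard error of the mean (sample standard deviation divided by the square root of N), and that the SE figures are listed in the same e, d, c, b, a order as the results. The function names are hypothetical, and with only three trials per run the outcome is a rough indication at best.

```python
import math


def stddev_from_se(se, n):
    """Recover the sample standard deviation, assuming SE = s / sqrt(n)."""
    return se * math.sqrt(n)


def difference_in_se_units(mean_x, se_x, mean_y, se_y):
    """Difference of two means divided by the combined standard error."""
    combined_se = math.hypot(se_x, se_y)  # sqrt(se_x**2 + se_y**2)
    return (mean_x - mean_y) / combined_se


if __name__ == "__main__":
    # LZ4 level 1 decompression speed, run a against run e (MB/s).
    ratio = difference_in_se_units(6052.9, 6.59, 5931.0, 6.66)
    print(f"a vs e: difference is {ratio:.1f} combined standard errors")

    # Llamafile mistral-7b, run e: implied per-trial standard deviation.
    print(f"implied stddev: {stddev_from_se(0.06, 3):.3f} tokens/s")
```

Under those assumptions, the gap between runs a and e in level 1 decompression is about 13 combined standard errors, so it is larger than the trial-to-trial noise of either run, even though it amounts to only about 2% in absolute terms.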
Phoronix Test Suite v10.8.5