AMD EPYC 8324P 32-Core testing with an AMD Cinnabar (RCB1009C BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2401278-NE-LLL53839360
HTML result view exported from: https://openbenchmarking.org/result/2401278-NE-LLL53839360&sor&grt.
System Configuration (identical for runs a, b, and c)

  Processor:         AMD EPYC 8324P 32-Core @ 2.65GHz (32 Cores / 64 Threads)
  Motherboard:       AMD Cinnabar (RCB1009C BIOS)
  Chipset:           AMD Device 14a4
  Memory:            6 x 32 GB DRAM-4800MT/s Samsung M321R4GA0BB0-CQKMG
  Disk:              3201GB Micron_7450_MTFDKCB3T2TFS
  Graphics:          ASPEED
  Network:           2 x Broadcom NetXtreme BCM5720 PCIe
  OS:                Ubuntu 23.10
  Kernel:            6.5.0-5-generic (x86_64)
  Desktop:           GNOME Shell
  Display Server:    X Server 1.21.1.7
  Compiler:          GCC 13.2.0
  File-System:       ext4
  Screen Resolution: 640x480

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0xaa00212

Security Details:
  gather_data_sampling:  Not affected
  itlb_multihit:         Not affected
  l1tf:                  Not affected
  mds:                   Not affected
  meltdown:              Not affected
  mmio_stale_data:       Not affected
  retbleed:              Not affected
  spec_rstack_overflow:  Mitigation of safe RET
  spec_store_bypass:     Mitigation of SSB disabled via prctl
  spectre_v1:            Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2:            Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected
  srbds:                 Not affected
  tsx_async_abort:       Not affected
Result Overview (runs a, b, c)

  Test                                                Unit       a             b             c
  CacheBench: Read                                    MB/s       7612.835004   7615.129372   7615.214808
  CacheBench: Write                                   MB/s       45638.468414  45639.681202  45639.009333
  CacheBench: Read / Modify / Write                   MB/s       87264.314442  87154.946426  87247.993005
  Llama.cpp: llama-2-7b.Q4_0.gguf                     Tokens/s   29.58         30.03         29.85
  Llama.cpp: llama-2-13b.Q4_0.gguf                    Tokens/s   17.97         17.95         18.31
  Llama.cpp: llama-2-70b-chat.Q5_0.gguf               Tokens/s   3.42          3.43          3.43
  Llamafile: llava-v1.5-7b-q4 (CPU)                   Tokens/s   23.54         23.57         23.76
  Llamafile: mistral-7b-instruct-v0.2.Q8_0 (CPU)      Tokens/s   14.75         14.76         14.75
  Llamafile: wizardcoder-python-34b-v1.0.Q6_K (CPU)   Tokens/s   5.50          5.53          5.51
  LZ4: Level 1 - Compression Speed                    MB/s       565.03        564.21        565.42
  LZ4: Level 1 - Decompression Speed                  MB/s       3650.1        3649.8        3647.9
  LZ4: Level 3 - Compression Speed                    MB/s       88.20         88.23         88.09
  LZ4: Level 3 - Decompression Speed                  MB/s       3306.9        3307.6        3308.6
  LZ4: Level 9 - Compression Speed                    MB/s       28.35         28.34         28.29
  LZ4: Level 9 - Decompression Speed                  MB/s       3471.6        3467.4        3468.1
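The Phoronix Test Suite summarizes each result with a standard error over its trial runs ("SE +/- x, N = y" in the sections below). A minimal sketch of that statistic, using the three per-system CacheBench Read means from this table as stand-in samples (PTS itself computes SE from each system's individual trials, which this export does not include):

```python
import statistics

def mean_and_se(samples):
    """Mean and standard error (sample stdev / sqrt(N)),
    the quantity PTS reports as 'SE +/- x, N = y'."""
    n = len(samples)
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / n ** 0.5
    return mean, se, n

# The three per-system Read means above, used here only as example samples:
mean, se, n = mean_and_se([7612.84, 7615.13, 7615.21])
print(f"{mean:.2f} MB/s, SE +/- {se:.2f}, N = {n}")  # → 7614.39 MB/s, SE +/- 0.78, N = 3
```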
CacheBench - Test: Read (MB/s, more is better)
  c: 7615.21  (MIN: 7612.23 / MAX: 7615.74)
  b: 7615.13  (MIN: 7614.3 / MAX: 7615.59)
  a: 7612.84  (MIN: 7612.38 / MAX: 7613.67)
  Reported SE: +/- 0.07, N = 3; +/- 0.12, N = 3
  1. (CC) gcc options: -O3 -lrt
CacheBench - Test: Write (MB/s, more is better)
  b: 45639.68  (MIN: 45474.92 / MAX: 45693.28)
  c: 45639.01  (MIN: 45474.02 / MAX: 45692.39)
  a: 45638.47  (MIN: 45475.7 / MAX: 45690.04)
  Reported SE: +/- 0.89, N = 3; +/- 0.76, N = 3
  1. (CC) gcc options: -O3 -lrt
CacheBench - Test: Read / Modify / Write (MB/s, more is better)
  a: 87264.31  (MIN: 65718.12 / MAX: 90689.42)
  c: 87247.99  (MIN: 65710.28 / MAX: 90697.68)
  b: 87154.95  (MIN: 65719.2 / MAX: 90700.03)
  Reported SE: +/- 10.58, N = 3; +/- 87.15, N = 3
  1. (CC) gcc options: -O3 -lrt
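The CacheBench numbers above are memory throughput: bytes moved divided by wall time, reported in MB/s. A toy, interpreter-bound sketch of the same metric; this is not CacheBench's C kernel, and its absolute numbers will be orders of magnitude below the figures above:

```python
import array
import time

def read_bandwidth_mb_s(size_mb=32, passes=3):
    """Rough sequential-read throughput in MB/s: repeatedly sum a large
    buffer and divide its size by the elapsed wall time. Python-level
    overhead dominates, so this only illustrates the unit, not the hardware."""
    buf = array.array("q", range(size_mb * 1024 * 1024 // 8))  # 8-byte ints
    best = 0.0
    for _ in range(passes):
        t0 = time.perf_counter()
        _ = sum(buf)                        # forces a full read of the buffer
        dt = time.perf_counter() - t0
        best = max(best, size_mb / dt)      # MB/s for this pass; keep the best
    return best

print(f"~{read_bandwidth_mb_s():.0f} MB/s (interpreter-bound)")
```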
Llama.cpp b1808 - Model: llama-2-7b.Q4_0.gguf (Tokens Per Second, more is better)
  b: 30.03
  c: 29.85
  a: 29.58
  Reported SE: +/- 0.22, N = 12; +/- 0.12, N = 3
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llama.cpp b1808 - Model: llama-2-13b.Q4_0.gguf (Tokens Per Second, more is better)
  c: 18.31
  a: 17.97
  b: 17.95
  Reported SE: +/- 0.17, N = 3; +/- 0.04, N = 3
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llama.cpp b1808 - Model: llama-2-70b-chat.Q5_0.gguf (Tokens Per Second, more is better)
  c: 3.43
  b: 3.43
  a: 3.42
  Reported SE: +/- 0.00, N = 3; +/- 0.00, N = 3
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
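The Llama.cpp (and Llamafile) metric is simply generated tokens divided by wall-clock seconds. A minimal illustration; the token count and timing below are hypothetical values chosen to land near the 70B chat result, not measurements from this run:

```python
def tokens_per_second(n_tokens, seconds):
    """The Tokens Per Second metric: generated tokens / wall time."""
    return n_tokens / seconds

# Hypothetical: 128 tokens generated in 37.4 s is roughly the 70B chat pace above.
print(round(tokens_per_second(128, 37.4), 2))  # → 3.42
```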
Llamafile 0.6 - Test: llava-v1.5-7b-q4 - Acceleration: CPU (Tokens Per Second, more is better)
  c: 23.76
  b: 23.57
  a: 23.54
  Reported SE: +/- 0.06, N = 3; +/- 0.02, N = 3
Llamafile 0.6 - Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU (Tokens Per Second, more is better)
  b: 14.76
  c: 14.75
  a: 14.75
  Reported SE: +/- 0.08, N = 3; +/- 0.09, N = 3
Llamafile 0.6 - Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU (Tokens Per Second, more is better)
  b: 5.53
  c: 5.51
  a: 5.50
  Reported SE: +/- 0.01, N = 3; +/- 0.01, N = 3
LZ4 Compression 1.9.4 - Compression Level: 1 - Compression Speed (MB/s, more is better)
  c: 565.42
  a: 565.03
  b: 564.21
  Reported SE: +/- 0.78, N = 3; +/- 0.34, N = 3
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 1 - Decompression Speed (MB/s, more is better)
  a: 3650.1
  b: 3649.8
  c: 3647.9
  Reported SE: +/- 0.58, N = 3; +/- 0.41, N = 3
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 3 - Compression Speed (MB/s, more is better)
  b: 88.23
  a: 88.20
  c: 88.09
  Reported SE: +/- 0.01, N = 3; +/- 0.10, N = 3
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 3 - Decompression Speed (MB/s, more is better)
  c: 3308.6
  b: 3307.6
  a: 3306.9
  Reported SE: +/- 0.62, N = 3; +/- 0.71, N = 3
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 9 - Compression Speed (MB/s, more is better)
  a: 28.35
  b: 28.34
  c: 28.29
  Reported SE: +/- 0.01, N = 3; +/- 0.03, N = 3
  1. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 9 - Decompression Speed (MB/s, more is better)
  a: 3471.6
  c: 3468.1
  b: 3467.4
  Reported SE: +/- 0.17, N = 3; +/- 0.52, N = 3
  1. (CC) gcc options: -O3
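The LZ4 tables report throughput as megabytes of input (for compression) or of decompressed output (for decompression) per second of wall time. A sketch of that measurement using zlib level 1 as a stand-in codec, since an LZ4 binding may not be installed; absolute speeds and ratios will differ from LZ4's:

```python
import time
import zlib

data = b"phoronix benchmark " * 500_000          # ~9.5 MB of compressible input

t0 = time.perf_counter()
comp = zlib.compress(data, 1)                    # level 1, analogous to LZ4's fastest level
c_mb_s = len(data) / (time.perf_counter() - t0) / 2**20

t0 = time.perf_counter()
deco = zlib.decompress(comp)
d_mb_s = len(deco) / (time.perf_counter() - t0) / 2**20  # decompressed MB per second

assert deco == data                              # round-trip sanity check
print(f"compress: {c_mb_s:.0f} MB/s  decompress: {d_mb_s:.0f} MB/s  "
      f"ratio: {len(data) / len(comp):.1f}x")
```

Note how the level-1 vs level-9 rows above show the usual trade-off: higher levels compress several times slower while decompression speed stays roughly flat.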
Phoronix Test Suite v10.8.4