hw-mig

AMD EPYC 7R13 48-Core testing with a Supermicro H12SSL-I v1.02 (2.7 BIOS) and NVIDIA GeForce RTX 4090 24GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402176-NE-HWMIG953002&grt

Result identifiers: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB; AMD EPYC 7R13 48-Core

Processor: AMD EPYC 7R13 48-Core @ 3.73GHz (48 Cores / 96 Threads)
Motherboard: Supermicro H12SSL-I v1.02 (2.7 BIOS)
Chipset: AMD Starship/Matisse
Memory: 256GB
Disk: 15363GB Micron_7450_MTFDKCC15T3TFR
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: NVIDIA AD102 HD Audio
Monitor: 38GN950
Network: 2 x Intel X710 for 10GbE SFP+
OS: EndeavourOS rolling
Kernel: 6.7.4-zen1-1-zen (x86_64)
Desktop: Xfce 4.18
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 545.29.06
OpenGL: 4.6.0
Compiler: GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
File-System: btrfs
Screen Resolution: 3840x1600

Kernel Details
- Transparent Huge Pages: always

Environment Details
- NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Compiler Details
- --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa0011d1

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of Safe RET
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  blender: BMW27 - NVIDIA CUDA: 5.80
  blender: Classroom - NVIDIA CUDA: 10.36
  blender: Fishy Cat - NVIDIA CUDA: 12.00
  blender: Barbershop - NVIDIA CUDA: 45.57
  blender: Pabellon Barcelona - NVIDIA CUDA: 21.48
  llama-cpp: llama-2-7b.Q4_0.gguf: 23.17
  llama-cpp: llama-2-13b.Q4_0.gguf: 14.45
  llama-cpp: llama-2-70b-chat.Q5_0.gguf: 2.86
  llamafile: llava-v1.5-7b-q4 - CPU: 22.30
  llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU: 13.34
  llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU: 4.73
  memcached: 1:1: 1375588.16
  memcached: 1:5: 3850395.87
  memcached: 5:1: 870630.72
  memcached: 1:10: 4989344.12
  memcached: 1:100: 5147962.66
  redis: GET - 50: 2792000.50
  redis: GET - 1000: 2505995.40
  redis: SET - 1000: 2055844.83
  redis: LPOP - 1000: 1679590.47
  redis: SADD - 1000: 2268077.83
  redis: LPUSH - 1000: 1744859.50

AMD EPYC 7R13 48-Core
  povray: Trace Time: 10.616

Blender 4.0 (Seconds, Fewer Is Better)
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  Blend File: BMW27 - Compute: NVIDIA CUDA: 5.80 (SE +/- 0.02, N = 3)
  Blend File: Classroom - Compute: NVIDIA CUDA: 10.36 (SE +/- 0.01, N = 3)
  Blend File: Fishy Cat - Compute: NVIDIA CUDA: 12.00 (SE +/- 0.04, N = 3)
  Blend File: Barbershop - Compute: NVIDIA CUDA: 45.57 (SE +/- 0.07, N = 3)
  Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA: 21.48 (SE +/- 0.00, N = 3)

Llama.cpp b1808 (Tokens Per Second, More Is Better)
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  Model: llama-2-7b.Q4_0.gguf: 23.17 (SE +/- 0.03, N = 3)
  Model: llama-2-13b.Q4_0.gguf: 14.45 (SE +/- 0.01, N = 3)
  Model: llama-2-70b-chat.Q5_0.gguf: 2.86 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -fopenmp -lopenblas

Llamafile 0.6 (Tokens Per Second, More Is Better)
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  Test: llava-v1.5-7b-q4 - Acceleration: CPU: 22.30 (SE +/- 0.05, N = 3)
  Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU: 13.34 (SE +/- 0.03, N = 3)
  Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU: 4.73 (SE +/- 0.01, N = 3)
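
The tokens-per-second figures reported for Llama.cpp and Llamafile can also be read as an average time per generated token, which some readers find easier to compare. The following is a minimal sketch of that conversion; the dictionary name, keys, and the `ms_per_token` helper are illustrative and not part of the Phoronix Test Suite output, and the conversion assumes the reported rate is a steady average over the run.

```python
# Tokens-per-second results copied from the Llama.cpp and Llamafile entries.
# The key names are illustrative labels, not identifiers used by the test suite.
TOKENS_PER_SECOND = {
    "llama.cpp llama-2-7b.Q4_0": 23.17,
    "llama.cpp llama-2-13b.Q4_0": 14.45,
    "llama.cpp llama-2-70b-chat.Q5_0": 2.86,
    "llamafile llava-v1.5-7b-q4 (CPU)": 22.30,
    "llamafile mistral-7b-instruct-v0.2.Q8_0 (CPU)": 13.34,
    "llamafile wizardcoder-python-34b-v1.0.Q6_K (CPU)": 4.73,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Convert an average throughput in tokens/s to average milliseconds per token."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_second


if __name__ == "__main__":
    for name, tps in TOKENS_PER_SECOND.items():
        print(f"{name}: {tps:.2f} tok/s = {ms_per_token(tps):.1f} ms/token")
```

This is plain reciprocal arithmetic on the published averages; it says nothing about the latency of the first token or about variation between tokens.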

Memcached 1.6.19 (Ops/sec, More Is Better)
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  Set To Get Ratio: 1:1: 1375588.16 (SE +/- 8189.12, N = 3)
  Set To Get Ratio: 1:5: 3850395.87 (SE +/- 34072.71, N = 3)
  Set To Get Ratio: 5:1: 870630.72 (SE +/- 1751.22, N = 3)
  Set To Get Ratio: 1:10: 4989344.12 (SE +/- 8659.87, N = 3)
  Set To Get Ratio: 1:100: 5147962.66 (SE +/- 11107.87, N = 3)
1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

POV-Ray 3.7.0.7 (Seconds, Fewer Is Better)
Result identifier: AMD EPYC 7R13 48-Core
  Trace Time: 10.62 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -pipe -O3 -ffast-math -march=native -lXpm -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system

Redis 7.0.4 (Requests Per Second, More Is Better)
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  Test: GET - Parallel Connections: 50: 2792000.50 (SE +/- 19715.89, N = 3)
  Test: GET - Parallel Connections: 1000: 2505995.40 (SE +/- 38739.75, N = 15)
  Test: SET - Parallel Connections: 1000: 2055844.83 (SE +/- 20463.94, N = 12)
  Test: LPOP - Parallel Connections: 1000: 1679590.47 (SE +/- 11821.21, N = 12)
  Test: SADD - Parallel Connections: 1000: 2268077.83 (SE +/- 33076.13, N = 15)
  Test: LPUSH - Parallel Connections: 1000: 1744859.50 (SE +/- 22632.71, N = 15)
1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

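
The Memcached and Redis results are reported with an absolute standard error (SE) on values in the millions, which makes run-to-run noise hard to judge at a glance. As a rough sketch, the SE can be expressed as a percentage of the mean. The `RESULTS` table and the `relative_se_percent` helper below are illustrative names, not part of the Phoronix Test Suite; the numbers are copied from the entries for these two tests.

```python
# (mean, standard error, N) copied from the Memcached and Redis entries.
# The key names are illustrative labels, not identifiers used by the test suite.
RESULTS = {
    "memcached 1:1": (1375588.16, 8189.12, 3),
    "memcached 1:5": (3850395.87, 34072.71, 3),
    "memcached 5:1": (870630.72, 1751.22, 3),
    "memcached 1:10": (4989344.12, 8659.87, 3),
    "memcached 1:100": (5147962.66, 11107.87, 3),
    "redis GET - 50": (2792000.50, 19715.89, 3),
    "redis GET - 1000": (2505995.40, 38739.75, 15),
    "redis SET - 1000": (2055844.83, 20463.94, 12),
    "redis LPOP - 1000": (1679590.47, 11821.21, 12),
    "redis SADD - 1000": (2268077.83, 33076.13, 15),
    "redis LPUSH - 1000": (1744859.50, 22632.71, 15),
}


def relative_se_percent(mean: float, se: float) -> float:
    """Return the standard error as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * se / mean


if __name__ == "__main__":
    for name, (mean, se, n) in RESULTS.items():
        print(f"{name}: {relative_se_percent(mean, se):.2f}% of mean (N = {n})")
```

Note that the Redis tests at 1000 parallel connections report N = 12 or N = 15 rather than N = 3, so their SE values are based on more runs than the other entries.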
Phoronix Test Suite v10.8.5