hw-mig

AMD EPYC 7R13 48-Core testing with a Supermicro H12SSL-I v1.02 (2.7 BIOS) and NVIDIA GeForce RTX 4090 24GB on EndeavourOS rolling via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402171-NE-HWMIG942302 .
hw-mig system details

Result identifiers:
  AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  AMD EPYC 7R13 48-Core

Processor:          AMD EPYC 7R13 48-Core @ 3.73GHz (48 Cores / 96 Threads)
Motherboard:        Supermicro H12SSL-I v1.02 (2.7 BIOS)
Chipset:            AMD Starship/Matisse
Memory:             256GB
Disk:               15363GB Micron_7450_MTFDKCC15T3TFR
Graphics:           NVIDIA GeForce RTX 4090 24GB
Audio:              NVIDIA AD102 HD Audio
Monitor:            38GN950
Network:            2 x Intel X710 for 10GbE SFP+
OS:                 EndeavourOS rolling
Kernel:             6.7.4-zen1-1-zen (x86_64)
Desktop:            Xfce 4.18
Display Server:     X Server 1.21.1.11
Display Driver:     NVIDIA 545.29.06
OpenGL:             4.6.0
Compiler:           GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
File-System:        btrfs
Screen Resolution:  3840x1600

Kernel Details
- Transparent Huge Pages: always

Environment Details
- NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Compiler Details
- --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa0011d1

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of Safe RET
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
hw-mig results overview

AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB
  llama-cpp: llama-2-7b.Q4_0.gguf                     23.17
  llama-cpp: llama-2-13b.Q4_0.gguf                    14.45
  llama-cpp: llama-2-70b-chat.Q5_0.gguf               2.86
  llamafile: llava-v1.5-7b-q4 - CPU                   22.30
  llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU      13.34
  llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU   4.73
  redis: GET - 50                                     2792000.50
  redis: GET - 1000                                   2505995.40
  redis: SET - 1000                                   2055844.83
  redis: LPOP - 1000                                  1679590.47
  redis: SADD - 1000                                  2268077.83
  redis: LPUSH - 1000                                 1744859.50
  blender: BMW27 - NVIDIA CUDA                        5.80
  blender: Classroom - NVIDIA CUDA                    10.36
  blender: Fishy Cat - NVIDIA CUDA                    12.00
  blender: Barbershop - NVIDIA CUDA                   45.57
  blender: Pabellon Barcelona - NVIDIA CUDA           21.48
  memcached: 1:1                                      1375588.16
  memcached: 1:5                                      3850395.87
  memcached: 5:1                                      870630.72
  memcached: 1:10                                     4989344.12
  memcached: 1:100                                    5147962.66

AMD EPYC 7R13 48-Core
  povray: Trace Time                                  10.616
  john-the-ripper: bcrypt                             72465
  john-the-ripper: WPA PSK                            166599
  john-the-ripper: Blowfish                           72501
  john-the-ripper: HMAC-SHA512                        89836867
  john-the-ripper: MD5                                3044250
Llama.cpp b1808
Tokens Per Second, More Is Better
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB

  Model: llama-2-7b.Q4_0.gguf         23.17  (SE +/- 0.03, N = 3)
  Model: llama-2-13b.Q4_0.gguf        14.45  (SE +/- 0.01, N = 3)
  Model: llama-2-70b-chat.Q5_0.gguf   2.86   (SE +/- 0.01, N = 3)

1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -fopenmp -lopenblas
Llamafile 0.6
Tokens Per Second, More Is Better
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB

  Test: llava-v1.5-7b-q4 - Acceleration: CPU                   22.30  (SE +/- 0.05, N = 3)
  Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU      13.34  (SE +/- 0.03, N = 3)
  Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU   4.73   (SE +/- 0.01, N = 3)
Redis 7.0.4
Requests Per Second, More Is Better
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB

  Test: GET - Parallel Connections: 50       2792000.50  (SE +/- 19715.89, N = 3)
  Test: GET - Parallel Connections: 1000     2505995.40  (SE +/- 38739.75, N = 15)
  Test: SET - Parallel Connections: 1000     2055844.83  (SE +/- 20463.94, N = 12)
  Test: LPOP - Parallel Connections: 1000    1679590.47  (SE +/- 11821.21, N = 12)
  Test: SADD - Parallel Connections: 1000    2268077.83  (SE +/- 33076.13, N = 15)
  Test: LPUSH - Parallel Connections: 1000   1744859.50  (SE +/- 22632.71, N = 15)

1. (CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
POV-Ray 3.7.0.7
Seconds, Fewer Is Better
Result identifier: AMD EPYC 7R13 48-Core

  Trace Time   10.62  (SE +/- 0.02, N = 3)

1. (CXX) g++ options: -pipe -O3 -ffast-math -march=native -lXpm -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system
Blender 4.0
Seconds, Fewer Is Better
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB

  Blend File: BMW27 - Compute: NVIDIA CUDA                5.80   (SE +/- 0.02, N = 3)
  Blend File: Classroom - Compute: NVIDIA CUDA            10.36  (SE +/- 0.01, N = 3)
  Blend File: Fishy Cat - Compute: NVIDIA CUDA            12.00  (SE +/- 0.04, N = 3)
  Blend File: Barbershop - Compute: NVIDIA CUDA           45.57  (SE +/- 0.07, N = 3)
  Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA   21.48  (SE +/- 0.00, N = 3)
Memcached 1.6.19
Ops/sec, More Is Better
Result identifier: AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB

  Set To Get Ratio: 1:1     1375588.16  (SE +/- 8189.12, N = 3)
  Set To Get Ratio: 1:5     3850395.87  (SE +/- 34072.71, N = 3)
  Set To Get Ratio: 5:1     870630.72   (SE +/- 1751.22, N = 3)
  Set To Get Ratio: 1:10    4989344.12  (SE +/- 8659.87, N = 3)
  Set To Get Ratio: 1:100   5147962.66  (SE +/- 11107.87, N = 3)

1. (CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
John The Ripper 2023.03.14
Real C/S, More Is Better
Result identifier: AMD EPYC 7R13 48-Core

  Test: bcrypt        72465     (SE +/- 507.63, N = 3)
  Test: WPA PSK       166599    (SE +/- 948.75, N = 3)
  Test: Blowfish      72501     (SE +/- 340.86, N = 3)
  Test: HMAC-SHA512   89836867  (SE +/- 4103804.44, N = 15)
  Test: MD5           3044250   (SE +/- 108593.08, N = 12)

1. (CC) gcc options: -m64 -lssl -lcrypto -fopenmp -lgmp -lm -lrt -lz -ldl -lcrypt -lbz2
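
The SE +/- figures reported with each result are in the same units as the result itself, so they are hard to compare across tests. As a rough aid, the sketch below expresses the standard error as a percentage of the reported mean for a few results copied from this file. The helper name `relative_se_percent` and the choice of which results to include are illustrative assumptions, not part of the Phoronix Test Suite output.

```python
# Illustrative sketch: standard error as a percentage of the reported mean.
# Values are (mean, SE, N) copied from the results in this file; the helper
# name and the selection of tests are assumptions made for this example.
results = {
    "Llama.cpp llama-2-7b.Q4_0.gguf": (23.17, 0.03, 3),
    "Redis GET, 1000 connections": (2505995.40, 38739.75, 15),
    "John The Ripper HMAC-SHA512": (89836867, 4103804.44, 15),
    "John The Ripper MD5": (3044250, 108593.08, 12),
}


def relative_se_percent(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


for name, (mean, se, n) in results.items():
    print(f"{name}: {relative_se_percent(mean, se):.2f}% of mean (N = {n})")
```

A percentage near zero suggests the runs agreed closely; the HMAC-SHA512 and MD5 results stand out as the noisiest of the four by this measure.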
Phoronix Test Suite v10.8.5