hw-mig

AMD EPYC 7R13 48-Core testing with a Supermicro H12SSL-I v1.02 (2.7 BIOS) and NVIDIA GeForce RTX 4090 24GB on EndeavourOS rolling via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402178-NE-HWMIG611102&grw.

hw-mig system details

Result identifiers: "AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB" and "AMD EPYC 7R13 48-Core" (the export lists one set of system details for both)

Processor: AMD EPYC 7R13 48-Core @ 3.73GHz (48 Cores / 96 Threads)
Motherboard: Supermicro H12SSL-I v1.02 (2.7 BIOS)
Chipset: AMD Starship/Matisse
Memory: 256GB
Disk: 15363GB Micron_7450_MTFDKCC15T3TFR
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: NVIDIA AD102 HD Audio
Monitor: 38GN950
Network: 2 x Intel X710 for 10GbE SFP+
OS: EndeavourOS rolling
Kernel: 6.7.4-zen1-1-zen (x86_64)
Desktop: Xfce 4.18
Display Server: X Server 1.21.1.11
Display Driver: NVIDIA 545.29.06
OpenGL: 4.6.0
Compiler: GCC 13.2.1 20230801 + Clang 16.0.6 + LLVM 16.0.6 + CUDA 12.3
File-System: btrfs
Screen Resolution: 3840x1600

Kernel Details
- Transparent Huge Pages: always

Environment Details
- NVCC_PREPEND_FLAGS="-ccbin /opt/cuda/bin"

Compiler Details
- --disable-libssp --disable-libstdcxx-pch --disable-werror --enable-__cxa_atexit --enable-bootstrap --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++ --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-build-config=bootstrap-lto --with-linker-hash-style=gnu

Processor Details
- Scaling Governor: amd-pstate-epp performance (EPP: performance)
- CPU Microcode: 0xa0011d1

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of Safe RET
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines, IBPB: conditional, IBRS_FW, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
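The kernel details record Transparent Huge Pages set to "always". On Linux the THP mode is exposed in /sys/kernel/mm/transparent_hugepage/enabled, and the active option appears in square brackets, for example "[always] madvise never". The following is a minimal Python sketch for checking another machine against this configuration. It assumes that sysfs format, and the function names are hypothetical:

```python
from pathlib import Path
from typing import Optional

# Linux sysfs location of the THP mode; the active mode is shown in brackets,
# e.g. "[always] madvise never".
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active (bracketed) mode from a THP sysfs line."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the live THP mode, or return None where the file does not exist."""
    try:
        return parse_thp_mode(path.read_text())
    except FileNotFoundError:
        return None
```

On a machine configured like this one, current_thp_mode() should return "always". It returns None on systems that do not expose the file, such as non-Linux hosts.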

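hw-mig result overview ("-" = no result recorded for that identifier)

Test | Unit | AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB | AMD EPYC 7R13 48-Core
llama-cpp: llama-2-7b.Q4_0.gguf | Tokens/s | 23.17 | -
llama-cpp: llama-2-13b.Q4_0.gguf | Tokens/s | 14.45 | -
llama-cpp: llama-2-70b-chat.Q5_0.gguf | Tokens/s | 2.86 | -
llamafile: llava-v1.5-7b-q4 - CPU | Tokens/s | 22.30 | -
llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU | Tokens/s | 13.34 | -
llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU | Tokens/s | 4.73 | -
blender: BMW27 - NVIDIA CUDA | Seconds | 5.80 | -
blender: Classroom - NVIDIA CUDA | Seconds | 10.36 | -
blender: Fishy Cat - NVIDIA CUDA | Seconds | 12.00 | -
blender: Barbershop - NVIDIA CUDA | Seconds | 45.57 | -
blender: Pabellon Barcelona - NVIDIA CUDA | Seconds | 21.48 | -
povray: Trace Time | Seconds | - | 10.616
memcached: 1:1 | Ops/sec | 1375588.16 | -
memcached: 1:5 | Ops/sec | 3850395.87 | -
memcached: 5:1 | Ops/sec | 870630.72 | -
memcached: 1:10 | Ops/sec | 4989344.12 | -
memcached: 1:100 | Ops/sec | 5147962.66 | -
redis: GET - 50 | Requests/s | 2792000.50 | -
redis: GET - 1000 | Requests/s | 2505995.40 | -
redis: SET - 1000 | Requests/s | 2055844.83 | -
redis: LPOP - 1000 | Requests/s | 1679590.47 | -
redis: SADD - 1000 | Requests/s | 2268077.83 | -
redis: LPUSH - 1000 | Requests/s | 1744859.50 | -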

Llama.cpp

Model: llama-2-7b.Q4_0.gguf

Llama.cpp b1808 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 23.17 (SE +/- 0.03, N = 3)
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -fopenmp -lopenblas
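Each result reports a standard error (SE) over N runs. Dividing SE by the reported mean gives a relative error, which makes results in different units comparable. The following is a minimal Python sketch. It assumes SE is the usual sample standard deviation divided by sqrt(N); the export does not state the formula. The (mean, SE) pairs are copied from this result file:

```python
import math
import statistics


def standard_error(samples):
    """Sample standard deviation divided by sqrt(N) (assumed definition)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def relative_se_percent(se, mean):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


# (mean, SE) pairs taken from this result file.
reported = {
    "Llama.cpp llama-2-7b.Q4_0, tokens/s": (23.17, 0.03),
    "Redis GET at 1000 connections, requests/s": (2505995.40, 38739.75),
}

for name, (mean, se) in reported.items():
    print(f"{name}: SE is {relative_se_percent(se, mean):.2f}% of the mean")
```

By this measure, the Llama.cpp 7B result is considerably tighter than the Redis GET result at 1000 connections.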

Llama.cpp

Model: llama-2-13b.Q4_0.gguf

Llama.cpp b1808 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 14.45 (SE +/- 0.01, N = 3)
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -fopenmp -lopenblas

Llama.cpp

Model: llama-2-70b-chat.Q5_0.gguf

Llama.cpp b1808 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 2.86 (SE +/- 0.01, N = 3)
(CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -fopenmp -lopenblas
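A tokens-per-second figure converts directly to an average time per generated token: 1000 / tokens-per-second gives milliseconds. The following minimal Python sketch applies that conversion to the three Llama.cpp results. Note that the 70B model uses a different quantization (Q5_0) and a chat variant, so the comparison across sizes is only approximate:

```python
# Llama.cpp b1808 results from this file, in tokens per second.
results_tps = {
    "llama-2-7b.Q4_0": 23.17,
    "llama-2-13b.Q4_0": 14.45,
    "llama-2-70b-chat.Q5_0": 2.86,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average milliseconds spent per generated token."""
    return 1000.0 / tokens_per_second


for model, tps in results_tps.items():
    print(f"{model}: {tps:.2f} tok/s = {ms_per_token(tps):.1f} ms/token")
```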

Llamafile

Test: llava-v1.5-7b-q4 - Acceleration: CPU

Llamafile 0.6 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 22.30 (SE +/- 0.05, N = 3)

Llamafile

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

Llamafile 0.6 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 13.34 (SE +/- 0.03, N = 3)

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

Llamafile 0.6 - Tokens Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 4.73 (SE +/- 0.01, N = 3)

Blender

Blend File: BMW27 - Compute: NVIDIA CUDA

Blender 4.0 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 5.80 (SE +/- 0.02, N = 3)

Blender

Blend File: Classroom - Compute: NVIDIA CUDA

Blender 4.0 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 10.36 (SE +/- 0.01, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA CUDA

Blender 4.0 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 12.00 (SE +/- 0.04, N = 3)

Blender

Blend File: Barbershop - Compute: NVIDIA CUDA

Blender 4.0 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 45.57 (SE +/- 0.07, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA CUDA

Blender 4.0 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 21.48 (SE +/- 0.00, N = 3)

POV-Ray

Trace Time

POV-Ray 3.7.0.7 - Seconds, Fewer Is Better
AMD EPYC 7R13 48-Core: 10.62 (SE +/- 0.02, N = 3)
(CXX) g++ options: -pipe -O3 -ffast-math -march=native -lXpm -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system

Memcached

Set To Get Ratio: 1:1

Memcached 1.6.19 - Ops/sec, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 1375588.16 (SE +/- 8189.12, N = 3)
(CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Memcached

Set To Get Ratio: 1:5

Memcached 1.6.19 - Ops/sec, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 3850395.87 (SE +/- 34072.71, N = 3)
(CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Memcached

Set To Get Ratio: 5:1

Memcached 1.6.19 - Ops/sec, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 870630.72 (SE +/- 1751.22, N = 3)
(CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Memcached

Set To Get Ratio: 1:10

Memcached 1.6.19 - Ops/sec, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 4989344.12 (SE +/- 8659.87, N = 3)
(CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre

Memcached

Set To Get Ratio: 1:100

Memcached 1.6.19 - Ops/sec, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 5147962.66 (SE +/- 11107.87, N = 3)
(CXX) g++ options: -O2 -levent_openssl -levent -lcrypto -lssl -lpthread -lz -lpcre
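The Memcached results report only aggregate ops/sec for each Set To Get Ratio. The following minimal Python sketch splits each total into approximate per-command rates. It assumes that a ratio "S:G" means S SET requests for every G GET requests in the generated load; the export does not break throughput down by command, so this illustrates the arithmetic only:

```python
from typing import Tuple

# Ops/sec results from this file, keyed by Set To Get Ratio.
memcached_ops = {
    "1:1": 1375588.16,
    "1:5": 3850395.87,
    "5:1": 870630.72,
    "1:10": 4989344.12,
    "1:100": 5147962.66,
}


def split_by_ratio(ratio: str, total_ops: float) -> Tuple[float, float]:
    """Split total ops/sec into (sets/sec, gets/sec), assuming 'S:G' means
    S SET requests for every G GET requests."""
    sets, gets = (int(part) for part in ratio.split(":"))
    set_share = sets / (sets + gets)
    return total_ops * set_share, total_ops * (1.0 - set_share)


for ratio, ops in memcached_ops.items():
    set_rate, get_rate = split_by_ratio(ratio, ops)
    print(f"{ratio:>5}: ~{set_rate:,.0f} SET/s + ~{get_rate:,.0f} GET/s")
```

In this result, total throughput rises as the mix shifts toward GETs, from 870,630.72 ops/sec at 5:1 to 5,147,962.66 ops/sec at 1:100.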

Redis

Test: GET - Parallel Connections: 50

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 2792000.50 (SE +/- 19715.89, N = 3)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis

Test: GET - Parallel Connections: 1000

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 2505995.40 (SE +/- 38739.75, N = 15)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis

Test: SET - Parallel Connections: 1000

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 2055844.83 (SE +/- 20463.94, N = 12)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis

Test: LPOP - Parallel Connections: 1000

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 1679590.47 (SE +/- 11821.21, N = 12)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis

Test: SADD - Parallel Connections: 1000

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 2268077.83 (SE +/- 33076.13, N = 15)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3

Redis

Test: LPUSH - Parallel Connections: 1000

Redis 7.0.4 - Requests Per Second, More Is Better
AMD EPYC 7R13 48-Core - NVIDIA GeForce RTX 4090 24GB: 1744859.50 (SE +/- 22632.71, N = 15)
(CXX) g++ options: -MM -MT -g3 -fvisibility=hidden -O3
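GET is the only Redis test run at two connection counts (50 and 1000), so it is the one place where this file shows the effect of concurrency directly. The following minimal Python sketch computes the percentage change between the two GET results. It also expresses the gap in units of their combined standard error. Combining the errors this way treats the two means as independent; it is a rough check, not a formal significance test:

```python
import math


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def diff_in_combined_se(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """|mean_a - mean_b| divided by sqrt(se_a^2 + se_b^2), assuming independence."""
    return abs(mean_a - mean_b) / math.hypot(se_a, se_b)


# (mean, SE) for Redis GET from this result file.
get_50 = (2792000.50, 19715.89)
get_1000 = (2505995.40, 38739.75)

print(f"GET 50 -> 1000 connections: {pct_change(get_50[0], get_1000[0]):+.1f}%")
print(f"gap: {diff_in_combined_se(*get_50, *get_1000):.1f} combined SEs")
```

The drop of roughly 10% at 1000 connections is several times larger than the combined run-to-run error. On this system, the difference is therefore unlikely to be noise.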


Phoronix Test Suite v10.8.5