Ff Benchmarks [2401116-NE-FF240899407]

Tests for a future article. AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1802 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.04 via the Phoronix Test Suite.

a

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830107a
Python Notes: Python 3.10.12
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

b

Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads), Motherboard: ASUS ROG ZENITH II EXTREME (1802 BIOS), Chipset: AMD Starship/Matisse, Memory: 4 x 16 GB DRAM-3600MT/s Corsair CMT64GX4M4Z3600C16, Disk: Samsung SSD 980 PRO 500GB, Graphics: AMD Radeon RX 5700 8GB (1750/875MHz), Audio: AMD Navi 10 HDMI Audio, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200

OS: Ubuntu 22.04, Kernel: 6.2.0-39-generic (x86_64), Desktop: GNOME Shell 42.2, Display Server: X Server + Wayland, OpenGL: 4.6 Mesa 22.0.1 (LLVM 13.0.1 DRM 3.49), Vulkan: 1.2.204, Compiler: GCC 11.4.0, File-System: ext4, Screen Resolution: 3840x2160

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

CacheBench

This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test memory and cache bandwidth performance. Learn more via the OpenBenchmarking.org test page.
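The three CacheBench results reported here (Read, Write, Read / Modify / Write) correspond to three memory access patterns. The following is a minimal Python sketch of those patterns, not CacheBench's actual code: the function names, buffer sizes, and repeat count are assumptions made for illustration, and interpreter overhead means the absolute numbers say little about real cache bandwidth.

```python
import time
from array import array


def read_pass(buf):
    """Read: load every element and accumulate a sum."""
    total = 0
    for x in buf:
        total += x
    return total


def write_pass(buf, value=1):
    """Write: store a constant into every element."""
    for i in range(len(buf)):
        buf[i] = value


def rmw_pass(buf):
    """Read / Modify / Write: load, increment, and store back each element."""
    for i in range(len(buf)):
        buf[i] += 1


def mb_per_second(kernel, buf, repeats=5):
    """Time `repeats` passes of `kernel`; return MB of array data traversed per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        kernel(buf)
    elapsed = time.perf_counter() - start
    return buf.itemsize * len(buf) * repeats / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical working-set sizes; a real run sweeps many sizes
    # to expose each cache level.
    for kib in (16, 256, 4096):
        buf = array("i", [0]) * (kib * 1024 // 4)
        for name, kernel in (("read", read_pass),
                             ("write", write_pass),
                             ("rmw", rmw_pass)):
            print(f"{kib:5d} KiB  {name:5s}  {mb_per_second(kernel, buf):10.1f} MB/s")
```

Sweeping the buffer size is what lets a benchmark of this kind show the drop in bandwidth as the working set falls out of each cache level.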

rav1e

Xiph rav1e is a Rust-written AV1 video encoder that claims to be the fastest and safest AV1 encoder. Learn more via the OpenBenchmarking.org test page.

Y-Cruncher

Y-Cruncher is a multi-threaded Pi benchmark capable of computing Pi to trillions of digits. Learn more via the OpenBenchmarking.org test page.
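To give a sense of what computing Pi to a fixed number of digits involves, here is a small single-threaded Python sketch that uses Machin's formula with integer arithmetic. It is only an illustration: this is not Y-Cruncher's method, the helper names `arctan_inv` and `pi_digits` are hypothetical, and computing hundreds of millions of digits in practice relies on far faster series and multiplication algorithms.

```python
def arctan_inv(x, one):
    """Return arctan(1/x) scaled by `one`, via the Taylor series in integer arithmetic."""
    term = one // x          # (1/x)^1, scaled
    total = term
    x2 = x * x
    n = 3
    sign = -1
    while term:
        term //= x2          # next odd power of 1/x
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_digits(digits):
    """Return Pi as a string with `digits` decimal places (truncated, not rounded)."""
    guard = 10               # extra digits to absorb truncation error
    one = 10 ** (digits + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    s = str(pi_scaled // 10 ** guard)
    return s[0] + "." + s[1:]


if __name__ == "__main__":
    print(pi_digits(50))
```

The 1B and 500M results in this comparison refer to digit counts many orders of magnitude beyond what a sketch like this can reach in reasonable time.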

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

Speedb

Speedb is a next-generation key-value storage engine that is RocksDB-compatible and aims for stability, efficiency, and performance. Learn more via the OpenBenchmarking.org test page.

Llama.cpp

Llama.cpp is a port of Facebook's LLaMA model in C/C++ developed by Georgi Gerganov. Llama.cpp allows the inference of LLaMA and other supported models in C/C++. For CPU inference Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs along with features like OpenBLAS usage. Learn more via the OpenBenchmarking.org test page.

36 Results Shown

Quicksilver:
  CTS2
  CORAL2 P1
  CORAL2 P2
CacheBench:
  Read
  Write
  Read / Modify / Write
rav1e:
  1
  5
  6
  10
Y-Cruncher:
  1B
  500M
PyTorch:
  CPU - 1 - ResNet-50
  CPU - 1 - ResNet-152
  CPU - 16 - ResNet-50
  CPU - 16 - ResNet-152
  CPU - 1 - Efficientnet_v2_l
  CPU - 16 - Efficientnet_v2_l
TensorFlow:
  CPU - 1 - VGG-16
  CPU - 1 - AlexNet
  CPU - 16 - VGG-16
  CPU - 16 - AlexNet
  CPU - 1 - GoogLeNet
  CPU - 1 - ResNet-50
  CPU - 16 - GoogLeNet
  CPU - 16 - ResNet-50
Speedb:
  Rand Fill
  Rand Read
  Update Rand
  Seq Fill
  Rand Fill Sync
  Read While Writing
  Read Rand Write Rand
Llama.cpp:
  llama-2-7b.Q4_0.gguf
  llama-2-13b.Q4_0.gguf
  llama-2-70b-chat.Q5_0.gguf

a

Testing initiated at 28 January 2024 20:40 by user phoronix.

b

Testing initiated at 29 January 2024 00:22 by user phoronix.
