Ddf Benchmarks [2401089-NE-DDF54911740]

Tests for a future article. AMD EPYC 8534PN 64-Core testing with a AMD Cinnabar (RCB1009C BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.

QuantLib

QuantLib is an open-source library/framework around quantitative finance for modeling, trading and risk management scenarios. QuantLib is written in C++ with Boost and its built-in benchmark used reports the QuantLib Benchmark Index benchmark score. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated vian neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version. Learn more via the OpenBenchmarking.org test page.

OpenRadioss

OpenRadioss is an open-source AGPL-licensed finite element solver for dynamic event analysis OpenRadioss is based on Altair Radioss and open-sourced in 2022. This open-source finite element solver is benchmarked with various example models available from https://www.openradioss.org/models/ and https://github.com/OpenRadioss/ModelExchange/tree/main/Examples. This test is currently using a reference OpenRadioss binary build offered via GitHub. Learn more via the OpenBenchmarking.org test page.

FFmpeg

This is a benchmark of the FFmpeg multimedia framework. The FFmpeg test profile is making use of a modified version of vbench from Columbia University's Architecture and Design Lab (ARCADE) [http://arcade.cs.columbia.edu/vbench/] that is a benchmark for video-as-a-service workloads. The test profile offers the options of a range of vbench scenarios based on freely distributable video content and offers the options of using the x264 or x265 video encoders for transcoding. Learn more via the OpenBenchmarking.org test page.

Xmrig

Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is setup to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.

WebP2 Image Encode

This is a test of Google's libwebp2 library with the WebP2 image encode utility and using a sample 6000x4000 pixel JPEG image as the input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as ultimately the successor to WebP. WebP2 supports 10-bit HDR, more efficienct lossy compression, improved lossless compression, animation support, and full multi-threading support compared to WebP. Learn more via the OpenBenchmarking.org test page.

easyWave

The easyWave software allows simulating tsunami generation and propagation in the context of early warning systems. EasyWave supports making use of OpenMP for CPU multi-threading and there are also GPU ports available but not currently incorporated as part of this test profile. The easyWave tsunami generation software is run with one of the example/reference input files for measuring the CPU execution time. Learn more via the OpenBenchmarking.org test page.

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

rav1e

Xiph rav1e is a Rust-written AV1 video encoder that claims to be the fastest and safest AV1 encoder. Learn more via the OpenBenchmarking.org test page.

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

Timed FFmpeg Compilation

This test times how long it takes to build the FFmpeg multimedia library. Learn more via the OpenBenchmarking.org test page.

Timed Gem5 Compilation

This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.

Y-Cruncher

Y-Cruncher is a multi-threaded Pi benchmark capable of computing Pi to trillions of digits. Learn more via the OpenBenchmarking.org test page.

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Blender

Blender is an open-source 3D creation and modeling software project. This test is of Blender's Cycles performance with various sample files. GPU computing via NVIDIA OptiX and NVIDIA CUDA is currently supported as well as HIP for AMD Radeon GPUs and Intel oneAPI for Intel Graphics. Learn more via the OpenBenchmarking.org test page.

Speedb

Speedb is a next-generation key value storage engine that is RocksDB compatible and aiming for stability, efficiency, and performance. Learn more via the OpenBenchmarking.org test page.

137 Results Shown

QuantLib:
Multi-Threaded
Single-Threaded
LeelaChessZero:
BLAS
Eigen
Quicksilver:
CTS2
CORAL2 P1
CORAL2 P2
CloverLeaf:
clover_bm
clover_bm64_short
OpenRadioss:
Bumper Beam
Chrysler Neon 1M
Cell Phone Drop Test
Bird Strike on Windshield
Rubber O-Ring Seal Installation
INIVOL and Fluid Structure Interaction Drop Container
FFmpeg:
libx265 - Live
libx265 - Upload
libx265 - Platform
libx265 - Video On Demand
Xmrig:
KawPow - 1M
Monero - 1M
Wownero - 1M
GhostRider - 1M
CryptoNight-Heavy - 1M
CryptoNight-Femto UPX2 - 1M
WebP2 Image Encode:
Default
Quality 75, Compression Effort 7
Quality 95, Compression Effort 7
Quality 100, Compression Effort 5
Quality 100, Lossless Compression
easyWave:
e2Asean Grid + BengkuluSept2007 Source - 240
e2Asean Grid + BengkuluSept2007 Source - 1200
e2Asean Grid + BengkuluSept2007 Source - 2400
Embree:
Pathtracer - Crown
Pathtracer ISPC - Crown
Pathtracer - Asian Dragon
Pathtracer - Asian Dragon Obj
Pathtracer ISPC - Asian Dragon
Pathtracer ISPC - Asian Dragon Obj
rav1e:
1
5
6
10
SVT-AV1:
Preset 4 - Bosphorus 4K
Preset 8 - Bosphorus 4K
Preset 12 - Bosphorus 4K
Preset 13 - Bosphorus 4K
Preset 4 - Bosphorus 1080p
Preset 8 - Bosphorus 1080p
Preset 12 - Bosphorus 1080p
Preset 13 - Bosphorus 1080p
Timed FFmpeg Compilation
Timed Gem5 Compilation
Y-Cruncher:
1B
5B
10B
500M
PyTorch:
CPU - 1 - ResNet-50
CPU - 1 - ResNet-152
CPU - 16 - ResNet-50
CPU - 32 - ResNet-50
CPU - 64 - ResNet-50
CPU - 16 - ResNet-152
CPU - 32 - ResNet-152
CPU - 64 - ResNet-152
CPU - 1 - Efficientnet_v2_l
CPU - 16 - Efficientnet_v2_l
CPU - 32 - Efficientnet_v2_l
CPU - 64 - Efficientnet_v2_l
TensorFlow:
CPU - 1 - VGG-16
CPU - 1 - AlexNet
CPU - 16 - VGG-16
CPU - 16 - AlexNet
CPU - 1 - GoogLeNet
CPU - 1 - ResNet-50
CPU - 16 - GoogLeNet
CPU - 16 - ResNet-50
Neural Magic DeepSparse:
NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream:
items/sec
ms/batch
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream:
items/sec
ms/batch
ResNet-50, Baseline - Asynchronous Multi-Stream:
items/sec
ms/batch
ResNet-50, Baseline - Synchronous Single-Stream:
items/sec
ms/batch
ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
items/sec
ms/batch
ResNet-50, Sparse INT8 - Synchronous Single-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO - Synchronous Single-Stream:
items/sec
ms/batch
BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
items/sec
ms/batch
BERT-Large, NLP Question Answering - Synchronous Single-Stream:
items/sec
ms/batch
CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream:
items/sec
ms/batch
NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream:
items/sec
ms/batch
CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
items/sec
ms/batch
CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream:
items/sec
ms/batch
BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
items/sec
ms/batch
BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream:
items/sec
ms/batch
NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
items/sec
ms/batch
NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream:
items/sec
ms/batch
Blender:
BMW27 - CPU-Only
Classroom - CPU-Only
Fishy Cat - CPU-Only
Barbershop - CPU-Only
Pabellon Barcelona - CPU-Only
Speedb:
Rand Fill
Rand Read
Update Rand
Seq Fill
Rand Fill Sync
Read While Writing
Read Rand Write Rand

a

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-FTCNCZ/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xaa00212
Python Notes: Python 3.11.5
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Testing initiated at 7 January 2024 21:18 by user phoronix.

b

Processor: AMD EPYC 8534PN 64-Core @ 2.00GHz (64 Cores / 128 Threads), Motherboard: AMD Cinnabar (RCB1009C BIOS), Chipset: AMD Device 14a4, Memory: 192GB, Disk: 3201GB Micron_7450_MTFDKCB3T2TFS, Graphics: ASPEED, Network: 2 x Broadcom NetXtreme BCM5720 PCIe

OS: Ubuntu 23.10, Kernel: 6.5.0-5-generic (x86_64), Desktop: GNOME Shell, Display Server: X Server 1.21.1.7, Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 640x480

Testing initiated at 8 January 2024 00:13 by user phoronix.

ddf

View

Statistics

Graph Settings

Multi-Way Comparison

Table

Run Management

a

b