Gigabyte G242-P36 Ampere Altra Max Server

Benchmarks by Michael Larabel for a future article.

G242-P36

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Notes: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
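
The security notes above are a single " + "-separated string of "name: status" pairs. As a minimal sketch (the helper name parse_security_notes is hypothetical and not part of any benchmarking tool), the string can be split into a dictionary to see at a glance which entries report an active mitigation:

```python
def parse_security_notes(notes: str) -> dict[str, str]:
    """Split a 'name: status + name: status' string into a dict."""
    result = {}
    for entry in notes.split(" + "):
        # partition() splits on the first ': ' only, so the status text is kept whole.
        name, _, status = entry.partition(": ")
        result[name.strip()] = status.strip()
    return result


# A shortened copy of the security notes reported for this system.
notes = (
    "meltdown: Not affected + "
    "spec_store_bypass: Mitigation of SSB disabled via prctl + "
    "spectre_v1: Mitigation of __user pointer sanitization + "
    "spectre_v2: Mitigation of CSV2 BHB"
)
parsed = parse_security_notes(notes)
mitigated = sorted(k for k, v in parsed.items() if v.startswith("Mitigation"))
print(mitigated)
```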

gig

dd

Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores), Motherboard: GIGABYTE G242-P36-00 MP32-AR2-00 v01000100 (F31k SCP: 2.10.20220531 BIOS), Chipset: Ampere Computing LLC Altra PCI Root Complex A, Memory: 16 x 32 GB DDR4-3200MT/s Samsung M393A4K40DB3-CWE, Disk: 800GB Micron_7450_MTFDKBA800TFS, Graphics: ASPEED, Monitor: VGA HDMI, Network: 2 x Intel I350

OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1080

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is catered to CPU-based testing. Learn more via the OpenBenchmarking.org test page.

Xmrig

Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.

Speedb

Speedb is a next-generation key-value storage engine that is RocksDB compatible and aims for stability, efficiency, and performance. Learn more via the OpenBenchmarking.org test page.

Xmrig

Neural Magic DeepSparse

This is a benchmark of Neural Magic's DeepSparse using its built-in deepsparse.benchmark utility and various models from their SparseZoo (https://sparsezoo.neuralmagic.com/). Learn more via the OpenBenchmarking.org test page.

Timed LLVM Compilation

This test times how long it takes to compile/build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Neural Magic DeepSparse

Quicksilver

Quicksilver is a proxy application that represents some elements of the Mercury workload by solving a simplified dynamic Monte Carlo particle transport problem. Quicksilver is developed by Lawrence Livermore National Laboratory (LLNL) and this test profile currently makes use of the OpenMP CPU threaded code path. Learn more via the OpenBenchmarking.org test page.
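
To give a feel for what a Monte Carlo particle transport calculation does, here is a toy sketch that is far simpler than Quicksilver itself. It assumes a one-dimensional slab that only absorbs particles; the function name and parameters are hypothetical. Each particle draws a random free path length from an exponential distribution, and the fraction that crosses the slab is compared with the analytic value exp(-sigma * thickness).

```python
import math
import random


def transmitted_fraction(n_particles, sigma_total, thickness, seed=0):
    """Estimate the fraction of particles crossing a purely absorbing slab."""
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_particles):
        # Inverse-transform sample of an exponential free path length.
        path = -math.log(1.0 - rng.random()) / sigma_total
        if path > thickness:
            transmitted += 1
    return transmitted / n_particles


estimate = transmitted_fraction(100_000, sigma_total=1.0, thickness=2.0)
analytic = math.exp(-1.0 * 2.0)
print(f"Monte Carlo: {estimate:.4f}, analytic: {analytic:.4f}")
```

Because every particle history is independent, this kind of loop parallelizes naturally across threads, which is the sort of scaling an OpenMP code path exercises on a many-core CPU.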

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested or alternatively an allmodconfig for building all possible kernel modules for the build. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Timed LLVM Compilation

Llama.cpp

Llama.cpp is a port of Facebook's LLaMA model in C/C++ developed by Georgi Gerganov. Llama.cpp allows the inference of LLaMA and other supported models in C/C++. For CPU inference Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs along with features like OpenBLAS usage. Learn more via the OpenBenchmarking.org test page.

Stockfish

This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
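
As a rough illustration of the kind of figure a throughput benchmark like this produces, the sketch below hashes a fixed-size block repeatedly for a short time window and reports bytes per second. It uses Python's hashlib rather than the "openssl speed" tool, so the function name, block size, and duration are assumptions made for the example, and the numbers are not comparable to the results on this page.

```python
import hashlib
import time


def digest_throughput(name: str, block_size: int = 16384, duration: float = 0.2) -> float:
    """Return bytes hashed per second for one digest and block size."""
    block = bytes(block_size)  # zero-filled input buffer
    hashed = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        hashlib.new(name, block).digest()
        hashed += block_size
    elapsed = time.perf_counter() - start
    return hashed / elapsed


for algo in ("sha256", "sha512"):
    rate = digest_throughput(algo)
    print(f"{algo}: {rate / 1e6:.1f} MB/s")
```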

Speedb

RocksDB

This is a benchmark of Meta/Facebook's RocksDB as an embeddable persistent key-value store for fast storage based on Google's LevelDB. Learn more via the OpenBenchmarking.org test page.

Quicksilver

OpenSSL

Speedb

RocksDB

Llama.cpp

CacheBench

This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test the memory and cache bandwidth performance. Learn more via the OpenBenchmarking.org test page.

RocksDB

Stress-NG

Neural Magic DeepSparse

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation

Neural Magic DeepSparse

GROMACS

This is a test of the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package using the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Speedb

RocksDB

OpenSSL

Neural Magic DeepSparse

Quicksilver

Neural Magic DeepSparse

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.
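
For readers who want a quick feel for a compression/decompression measurement, here is a small sketch using Python's standard lzma module. It is not the 7-Zip integrated benchmark and does not produce 7-Zip's ratings; the function name and the sample data are assumptions for illustration, and the output is a plain MB/s figure plus the compression ratio.

```python
import lzma
import time


def lzma_round_trip(data: bytes, preset: int = 6) -> dict:
    """Time one compress/decompress cycle and report throughput in MB/s."""
    t0 = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    t1 = time.perf_counter()
    unpacked = lzma.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise RuntimeError("round trip did not reproduce the input")
    megabytes = len(data) / 1e6
    return {
        "ratio": len(packed) / len(data),
        # max() guards against a zero interval on very coarse timers.
        "compress_mb_s": megabytes / max(t1 - t0, 1e-9),
        "decompress_mb_s": megabytes / max(t2 - t1, 1e-9),
    }


# Highly repetitive sample input, so the ratio will be very small.
sample = b"Gigabyte G242-P36 Ampere Altra Max Server\n" * 20000
stats = lzma_round_trip(sample)
print(stats)
```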

Stress-NG

Llama.cpp

miniFE

MiniFE Finite Element is an application for unstructured implicit finite element codes. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
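
DGEMM is the double-precision general matrix multiply, C = alpha * A * B + beta * C. The sketch below is a plain single-threaded reference version in Python, written only to show the operation and the conventional 2 * m * n * k operation count used when converting a run time into a FLOPS figure. It is not the ACES DGEMM code, and the function names are hypothetical.

```python
def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def dgemm_flops(m, n, k):
    """Conventional operation count: one multiply and one add per inner step."""
    return 2 * m * n * k


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = dgemm(2.0, a, b, 0.5, c)
print(result, dgemm_flops(2, 2, 2))
```

Dividing dgemm_flops(m, n, k) by the measured run time gives the floating-point rate for a run.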

110 Results Shown

PyTorch:
CPU - 1 - Efficientnet_v2_l
CPU - 16 - ResNet-152
CPU - 16 - ResNet-50
CPU - 1 - ResNet-152
CPU - 1 - ResNet-50
Xmrig
Speedb
Xmrig
Neural Magic DeepSparse:
ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
Timed LLVM Compilation
LeelaChessZero:
BLAS
Eigen
Neural Magic DeepSparse:
CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
ms/batch
items/sec
Quicksilver
Timed Linux Kernel Compilation
Stress-NG
Timed LLVM Compilation
Llama.cpp
Stockfish
OpenSSL:
ChaCha20-Poly1305
ChaCha20
AES-256-GCM
Speedb
RocksDB
Quicksilver
OpenSSL:
AES-128-GCM
SHA256
SHA512
Speedb
RocksDB
Llama.cpp
CacheBench:
Read
Read / Modify / Write
Write
RocksDB
Stress-NG:
Futex
Context Switching
Neural Magic DeepSparse:
BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
ms/batch
items/sec
Algebraic Multi-Grid Benchmark
Timed Linux Kernel Compilation
Neural Magic DeepSparse:
BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
GROMACS
Stress-NG
Speedb:
Rand Fill
Rand Fill Sync
Update Rand
Read Rand Write Rand
RocksDB
OpenSSL:
RSA4096:
verify/s
sign/s
Neural Magic DeepSparse:
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
ms/batch
items/sec
CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
Quicksilver
Neural Magic DeepSparse:
CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
ms/batch
items/sec
ResNet-50, Baseline - Asynchronous Multi-Stream:
ms/batch
items/sec
CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
ms/batch
items/sec
7-Zip Compression:
Decompression Rating
Compression Rating
Stress-NG:
IO_uring
MMAP
Cloning
Malloc
CPU Cache
Pthread
Zlib
Vector Shuffle
Vector Math
Wide Vector Math
Matrix Math
Function Call
Matrix 3D Math
CPU Stress
AVL Tree
Crypto
Fused Multiply-Add
Hash
SENDFILE
AVX-512 VNNI
Glibc Qsort Data Sorting
Vector Floating Point
Floating Point
Poll
Glibc C String Functions
System V Message Passing
Forking
Memory Copying
Semaphores
Mutex
Mixed Scheduler
NUMA
Pipe
Socket Activity
Llama.cpp
miniFE
ACES DGEMM

G242-P36

Testing initiated at 16 January 2024 23:01 by user phoronix.

gig

Testing initiated at 17 January 2024 18:09 by user phoronix.

dd

OS: Ubuntu 23.10, Kernel: 6.5.0-13-generic (aarch64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1080

Testing initiated at 17 January 2024 20:45 by user phoronix.
