Amazon AWS Graviton3E vs. Graviton 2/3 benchmarks

Benchmarks by Michael Larabel for a future article on Phoronix.com.

m7g.16xlarge Graviton3

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 m7g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 256GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected
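The Security Notes lines in this listing follow a "name: status + name: status" layout, which makes them easy to compare across systems. The sketch below is only an illustration of that idea: the helper name parse_security_notes is hypothetical, and the sample string is a shortened excerpt of the Graviton3 line above rather than the full entry.

```python
def parse_security_notes(notes: str) -> dict[str, str]:
    """Split a 'name: status + name: status' string into a dict.

    Only the first ': ' in each entry separates the name from the status,
    so statuses that themselves contain ': ' are kept whole.
    """
    entries = {}
    for item in notes.split(" + "):
        name, _, status = item.partition(": ")
        entries[name.strip()] = status.strip()
    return entries


# Shortened excerpt of the m7g.16xlarge Graviton3 Security Notes line.
graviton3_notes = (
    "meltdown: Not affected + "
    "spec_store_bypass: Mitigation of SSB disabled via prctl + "
    "spectre_v1: Mitigation of __user pointer sanitization + "
    "spectre_v2: Mitigation of CSV2 BHB"
)

parsed = parse_security_notes(graviton3_notes)
# Names whose status reports an active mitigation.
mitigated = sorted(k for k, v in parsed.items() if v.startswith("Mitigation"))
print(mitigated)
```

Running the same helper over the x86_64 entries would show which mitigations are specific to those systems.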

c6g.16xlarge Graviton2

Changed Processor to ARMv8 Neoverse-N1 (64 Cores).

Changed Motherboard to Amazon EC2 c6g.16xlarge (1.0 BIOS).

Changed Memory to 128GB.

c7g.16xlarge Graviton3

Changed Processor to ARMv8 Neoverse-V1 (64 Cores).

Changed Motherboard to Amazon EC2 c7g.16xlarge (1.0 BIOS).

c7gn.16xlarge Graviton3E

Changed Motherboard to Amazon EC2 c7gn.16xlarge (1.0 BIOS).

c6a.16xlarge AMD Zen 3

Processor: AMD EPYC 7R13 (32 Cores / 64 Threads), Motherboard: Amazon EC2 c6a.16xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 128GB, Disk: 322GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (x86_64), Vulkan: 1.3.238, Compiler: GCC 11.4.0, File-System: ext4, System Layer: amazon

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0xa0011cf
Python Notes: Python 3.10.12
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

egeo-07

Processor: 2 x Intel Xeon Silver 4208 @ 3.20GHz (16 Cores / 32 Threads), Motherboard: Dell Precision 7920 Rack 0DY2X0 (2.21.2 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 64GB, Disk: 2000GB TOSHIBA DT01ACA2, Graphics: Matrox G200eW3 15GB, Audio: NVIDIA TU104 HD Audio, Monitor: DELL 17FP, Network: 4 x Intel I350

OS: Debian 11, Kernel: 5.10.0-28-amd64 (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.2.138, Vulkan: 1.3.242, Compiler: GCC 10.2.1 20210110 + Clang 11.0.1-2 + CUDA 11.2, File-System: ext4, Screen Resolution: 1280x1024

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-mutex --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x5003605
Python Notes: Python 2.7.18 + Python 3.9.2
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently caters to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Graph500

This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.
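Graph500 results are reported in TEPS (traversed edges per second). As a rough sketch of how the problem size and the metric relate, the snippet below assumes that the "26" in the results list is the Graph500 scale (log2 of the vertex count) and that the reference edge factor of 16 applies. The function names and the 2.0 second timing are hypothetical and are not taken from this result file.

```python
def graph500_problem_size(scale: int, edge_factor: int = 16) -> tuple[int, int]:
    """Return (vertices, edges) for a given scale, assuming edges = edge_factor * 2**scale."""
    vertices = 2 ** scale
    edges = edge_factor * vertices
    return vertices, edges


def teps(edges_traversed: int, seconds: float) -> float:
    """Traversed edges per second for one search."""
    return edges_traversed / seconds


vertices, edges = graph500_problem_size(26)
print(vertices, edges)

# Hypothetical timing: a search that touches every edge in 2.0 seconds.
print(teps(edges, 2.0))
```

A real run counts only the edges in the component actually reached from each search root, so the figure above is best read as an upper bound for the assumed timing.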

nekRS

nekRS is an open-source Navier-Stokes solver based on the spectral element method. nekRS supports both CPU and GPU/accelerator execution, though this test profile is currently configured for CPU execution. nekRS is part of Nek5000 from the Mathematics and Computer Science (MCS) division at Argonne National Laboratory. This nekRS benchmark is primarily relevant to large core count HPC servers and may otherwise be very time consuming on smaller systems. Learn more via the OpenBenchmarking.org test page.

Input: Kershaw

egeo-07: The test quit with a non-zero exit status. E: [egeo-07.qteorica.unal.edu.co:290025] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2795

Input: TurboPipe Periodic

egeo-07: The test quit with a non-zero exit status. E: [egeo-07.qteorica.unal.edu.co:290233] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2795

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: BLAS

egeo-07: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

Backend: Eigen

egeo-07: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package, tested with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

BRL-CAD

BRL-CAD is a cross-platform, open-source solid modeling system with built-in benchmark mode. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

Rodinia

Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.
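For context on how a DGEMM figure is usually derived: multiplying an m x k matrix by a k x n matrix takes about 2*m*n*k floating-point operations, and dividing that count by the elapsed time gives a FLOP/s rate. The sketch below is a single-threaded pure-Python illustration of that arithmetic, not the ACES DGEMM code itself. The function names and the matrix size of 64 are assumptions chosen for the example.

```python
import time


def dgemm(a, b):
    """Naive C = A x B for matrices stored as lists of lists of floats."""
    b_columns = list(zip(*b))  # transpose B once so columns are easy to iterate
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def dgemm_flops(m: int, n: int, k: int) -> int:
    """Approximate operation count: one multiply and one add per inner-product term."""
    return 2 * m * n * k


n = 64  # assumed size, small enough for pure Python
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i == j) for j in range(n)] for i in range(n)]  # identity matrix

start = time.perf_counter()
c = dgemm(a, b)
elapsed = time.perf_counter() - start

gflops = dgemm_flops(n, n, n) / elapsed / 1e9
print(f"{gflops:.4f} GFLOP/s")
```

Multiplying by the identity makes the result easy to check (C equals A); an optimized multi-threaded BLAS would report rates several orders of magnitude higher than this loop.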

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

Kripke

Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures affect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.

LULESH

LULESH is the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. Learn more via the OpenBenchmarking.org test page.

NWChem

NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.

Monte Carlo Simulations of Ionised Nebulae

Mocassin is the Monte Carlo Simulations of Ionised Nebulae. MOCASSIN is a fully 3D or 2D photoionisation and dust radiative transfer code which employs a Monte Carlo approach to the transfer of radiation through media of arbitrary geometry and density distribution. Learn more via the OpenBenchmarking.org test page.

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H2O example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

Coremark

This is a test of the EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.

Stockfish

This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

Timed Godot Game Engine Compilation

This test times how long it takes to compile the Godot Game Engine. Godot is a popular, open-source, cross-platform 2D/3D game engine and is built using the SCons build system and targeting the X11 platform. Learn more via the OpenBenchmarking.org test page.

Timed Gem5 Compilation

This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built on the Chrome V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

nginx

This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period of time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.
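The idea behind a throughput test of this kind is to hash or encrypt fixed-size blocks for a set period and report bytes processed per second. The sketch below illustrates that loop for SHA256 using Python's standard hashlib module. It is not the "openssl speed" tool, the function name sha256_throughput, the 16384-byte block and the 0.2 second duration are assumptions for the example, and interpreter overhead means the number is not comparable to the results in this file.

```python
import hashlib
import time


def sha256_throughput(block_size: int = 16384, duration: float = 0.2) -> float:
    """Hash zero-filled blocks for roughly `duration` seconds; return bytes per second."""
    block = bytes(block_size)
    hashed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        hashlib.sha256(block).digest()
        hashed += block_size
    elapsed = time.perf_counter() - start
    return hashed / elapsed


rate = sha256_throughput()
print(f"{rate / 1e6:.1f} MB/s")
```

The same loop can be pointed at hashlib.sha512 to mirror the SHA512 entry in the results list.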

87 Results Shown

HeFFTe - Highly Efficient FFT for Exascale:
c2c - FFTW - float - 128
c2c - FFTW - float - 256
Laghos:
Triple Point Problem
Sedov Blast Wave, ube_922_hex.mesh
Stress-NG:
NUMA
CPU Cache
Matrix Math
Vector Math
Matrix 3D Math
Vector Shuffle
HeFFTe - Highly Efficient FFT for Exascale
Stress-NG:
Memory Copying
Wide Vector Math
Fused Multiply-Add
Vector Floating Point
Graph500
HeFFTe - Highly Efficient FFT for Exascale
Graph500
HeFFTe - Highly Efficient FFT for Exascale
Graph500:
26:
sssp median_TEPS
sssp max_TEPS
HeFFTe - Highly Efficient FFT for Exascale:
r2c - FFTW - double - 256
r2c - FFTW - double - 512
r2c - FFTW - double - 128
c2c - FFTW - double - 512
c2c - FFTW - double - 256
c2c - FFTW - double - 128
nekRS:
Kershaw
TurboPipe Periodic
LeelaChessZero:
BLAS
Eigen
GROMACS
LAMMPS Molecular Dynamics Simulator
HeFFTe - Highly Efficient FFT for Exascale
LAMMPS Molecular Dynamics Simulator
Remhos
BRL-CAD
NAS Parallel Benchmarks:
CG.C
EP.D
LU.C
MG.C
SP.C
Rodinia:
OpenMP LavaMD
OpenMP CFD Solver
OpenMP Streamcluster
ACES DGEMM
Pennant:
sedovbig
leblancbig
Algebraic Multi-Grid Benchmark
Kripke
LULESH
NWChem
Monte Carlo Simulations of Ionised Nebulae:
Gas HII40
Dust 2D tau100.0
QMCPACK:
Li2_STO_ae
simple-H2O
FeCO6_b3lyp_gms
FeCO6_b3lyp_gms
Xcompact3d Incompact3d:
input.i3d 129 Cells Per Direction
input.i3d 193 Cells Per Direction
GPAW
Coremark
Stockfish
7-Zip Compression:
Compression Rating
Decompression Rating
Timed Godot Game Engine Compilation
Timed Gem5 Compilation
Timed Node.js Compilation
Liquid-DSP:
32 - 256 - 32
32 - 256 - 57
64 - 256 - 32
64 - 256 - 57
32 - 256 - 512
64 - 256 - 512
srsRAN Project:
Downlink Processor Benchmark
PUSCH Processor Benchmark, Throughput Total
PUSCH Processor Benchmark, Throughput Thread
nginx:
500
1000
OpenSSL:
SHA256
SHA512
RSA4096
RSA4096
ChaCha20
AES-128-GCM
AES-256-GCM
ChaCha20-Poly1305

m7g.16xlarge Graviton3

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 22 June 2023 16:24 by user ubuntu.

c6g.16xlarge Graviton2

Processor: ARMv8 Neoverse-N1 (64 Cores), Motherboard: Amazon EC2 c6g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 23 June 2023 01:32 by user ubuntu.

c7g.16xlarge Graviton3

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 c7g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 23 June 2023 10:31 by user ubuntu.

c7gn.16xlarge Graviton3E

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 c7gn.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 10 July 2023 15:05 by user ubuntu.

c6a.16xlarge AMD Zen 3

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (x86_64), Vulkan: 1.3.238, Compiler: GCC 11.4.0, File-System: ext4, System Layer: amazon

Testing initiated at 11 August 2023 14:59 by user ubuntu.

egeo-07

Testing initiated at 28 May 2024 01:22 by user root.
