Amazon AWS Graviton3E vs. Graviton 2/3 benchmarks

Benchmarks by Michael Larabel for a future article on Phoronix.com.

m7g.16xlarge Graviton3

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 m7g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 256GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Python Notes: Python 3.10.6
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected

c6g.16xlarge Graviton2

Changed Processor to ARMv8 Neoverse-N1 (64 Cores).

Changed Motherboard to Amazon EC2 c6g.16xlarge (1.0 BIOS).

Changed Memory to 128GB.

c7g.16xlarge Graviton3

Changed Processor to ARMv8 Neoverse-V1 (64 Cores).

Changed Motherboard to Amazon EC2 c7g.16xlarge (1.0 BIOS).

c7gn.16xlarge Graviton3E

Changed Motherboard to Amazon EC2 c7gn.16xlarge (1.0 BIOS).

c6a.16xlarge AMD Zen 3

Processor: AMD EPYC 7R13 (32 Cores / 64 Threads), Motherboard: Amazon EC2 c6a.16xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 128GB, Disk: 322GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (x86_64), Vulkan: 1.3.238, Compiler: GCC 11.4.0, File-System: ext4, System Layer: amazon

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0xa0011cf
Python Notes: Python 3.10.12
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

egeo-07

Processor: 2 x Intel Xeon Silver 4208 @ 3.20GHz (16 Cores / 32 Threads), Motherboard: Dell Precision 7920 Rack 0DY2X0 (2.21.2 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 64GB, Disk: 2000GB TOSHIBA DT01ACA2, Graphics: Matrox G200eW3 15GB, Audio: NVIDIA TU104 HD Audio, Monitor: DELL 17FP, Network: 4 x Intel I350

OS: Debian 11, Kernel: 5.10.0-28-amd64 (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.2.138, Vulkan: 1.3.242, Compiler: GCC 10.2.1 20210110 + Clang 11.0.1-2 + CUDA 11.2, File-System: ext4, Screen Resolution: 1280x1024

Kernel Notes: Transparent Huge Pages: always
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-mutex --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x5003605
Python Notes: Python 2.7.18 + Python 3.9.2
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test profile makes use of the built-in "openssl speed" benchmarking capabilities. Learn more via the OpenBenchmarking.org test page.

BRL-CAD

BRL-CAD is a cross-platform, open-source solid modeling system with built-in benchmark mode. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many as you need scalar transport equations. Learn more via the OpenBenchmarking.org test page.

OpenSSL

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Pennant

Pennant is an application focused on hydrodynamics on general unstructured meshes in 2D. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d

HeFFTe - Highly Efficient FFT for Exascale

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Graph500

This is a benchmark of the reference implementation of Graph500, an HPC benchmark focused on data intensive loads and commonly tested on supercomputers for complex data problems. Graph500 primarily stresses the communication subsystem of the hardware under test. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

NWChem

NWChem is an open-source high performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.

srsRAN Project

srsRAN Project is a complete ORAN-native 5G RAN solution created by Software Radio Systems (SRS). The srsRAN Project radio suite was formerly known as srsLTE and can be used for building your own software-defined radio (SDR) 4G/5G mobile network. Learn more via the OpenBenchmarking.org test page.

Graph500

HeFFTe - Highly Efficient FFT for Exascale

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

NAS Parallel Benchmarks

NPB, NAS Parallel Benchmarks, is a benchmark developed by NASA for high-end computer systems. This test profile currently uses the MPI version of NPB. This test profile offers selecting the different NPB tests/problems and varying problem sizes. Learn more via the OpenBenchmarking.org test page.

LULESH

LULESH is the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

LAMMPS Molecular Dynamics Simulator

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. Learn more via the OpenBenchmarking.org test page.

Graph500

Rodinia

Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

GPAW

GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE). Learn more via the OpenBenchmarking.org test page.

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

Coremark

This is a test of EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.

HeFFTe - Highly Efficient FFT for Exascale

Graph500

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

Rodinia

7-Zip Compression

This is a test of 7-Zip compression/decompression with its integrated benchmark feature. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

Algebraic Multi-Grid Benchmark

AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided with AMG builds linear systems for various 3-dimensional problems. Learn more via the OpenBenchmarking.org test page.

OpenSSL

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Laghos

nginx

This is a benchmark of the lightweight Nginx HTTP(S) web-server. This Nginx web server benchmark test profile makes use of the wrk program for facilitating the HTTP requests over a fixed period time with a configurable number of concurrent clients/connections. HTTPS with a self-signed OpenSSL certificate is used by this test for local benchmarking. Learn more via the OpenBenchmarking.org test page.

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

OpenSSL

Liquid-DSP

srsRAN Project

NAS Parallel Benchmarks

OpenSSL

nginx

Monte Carlo Simulations of Ionised Nebulae

Mocassin is the Monte Carlo Simulations of Ionised Nebulae. MOCASSIN is a fully 3D or 2D photoionisation and dust radiative transfer code which employs a Monte Carlo approach to the transfer of radiation through media of arbitrary geometry and density distribution. Learn more via the OpenBenchmarking.org test page.

srsRAN Project

Liquid-DSP

QMCPACK

Kripke

Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures effect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.

QMCPACK

NAS Parallel Benchmarks

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript run-time built from the Chrome V8 JavaScript engine while itself is written in C/C++. Learn more via the OpenBenchmarking.org test page.

Rodinia

Stress-NG

Stress-NG is a Linux stress tool developed by Colin Ian King. Learn more via the OpenBenchmarking.org test page.

Timed Godot Game Engine Compilation

This test times how long it takes to compile the Godot Game Engine. Godot is a popular, open-source, cross-platform 2D/3D game engine and is built using the SCons build system and targeting the X11 platform. Learn more via the OpenBenchmarking.org test page.

NAS Parallel Benchmarks

Timed Gem5 Compilation

This test times how long it takes to compile Gem5. Gem5 is a simulator for computer system architecture research. Gem5 is widely used for computer architecture research within the industry, academia, and more. Learn more via the OpenBenchmarking.org test page.

OpenSSL

nekRS

nekRS is an open-source Navier Stokes solver based on the spectral element method. NekRS supports both CPU and GPU/accelerator support though this test profile is currently configured for CPU execution. NekRS is part of Nek5000 of the Mathematics and Computer Science MCS at Argonne National Laboratory. This nekRS benchmark is primarily relevant to large core count HPC servers and otherwise may be very time consuming on smaller systems. Learn more via the OpenBenchmarking.org test page.

Input: Kershaw

egeo-07: The test quit with a non-zero exit status. E: [egeo-07.qteorica.unal.edu.co:290025] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2795

NAS Parallel Benchmarks

Monte Carlo Simulations of Ionised Nebulae

OpenSSL

ACES DGEMM

This is a multi-threaded DGEMM benchmark. Learn more via the OpenBenchmarking.org test page.

nekRS

Input: TurboPipe Periodic

egeo-07: The test quit with a non-zero exit status. E: [egeo-07.qteorica.unal.edu.co:290233] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2795

Liquid-DSP

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated vian neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: Eigen

egeo-07: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

Backend: BLAS

egeo-07: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

Stockfish

This is a test of Stockfish, an advanced open-source C++11 chess benchmark that can scale up to 512 CPU threads. Learn more via the OpenBenchmarking.org test page.

87 Results Shown

m7g.16xlarge Graviton3

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 22 June 2023 16:24 by user ubuntu.

c6g.16xlarge Graviton2

Processor: ARMv8 Neoverse-N1 (64 Cores), Motherboard: Amazon EC2 c6g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 23 June 2023 01:32 by user ubuntu.

c7g.16xlarge Graviton3

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 c7g.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 23 June 2023 10:31 by user ubuntu.

c7gn.16xlarge Graviton3E

Processor: ARMv8 Neoverse-V1 (64 Cores), Motherboard: Amazon EC2 c7gn.16xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 128GB, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (aarch64), Compiler: GCC 11.3.0, File-System: ext4, System Layer: amazon

Testing initiated at 10 July 2023 15:05 by user ubuntu.

c6a.16xlarge AMD Zen 3

OS: Ubuntu 22.04, Kernel: 5.19.0-1025-aws (x86_64), Vulkan: 1.3.238, Compiler: GCC 11.4.0, File-System: ext4, System Layer: amazon

Testing initiated at 11 August 2023 14:59 by user ubuntu.

egeo-07

Testing initiated at 28 May 2024 01:22 by user root.

Amazon AWS Graviton3E vs. Graviton 2/3 benchmarks

View

Statistics

Graph Settings

Additional Graphs

Multi-Way Comparison

Table

Run Management

m7g.16xlarge Graviton3

c6g.16xlarge Graviton2

c7g.16xlarge Graviton3

c7gn.16xlarge Graviton3E

c6a.16xlarge AMD Zen 3

egeo-07

Pennant

OpenSSL

BRL-CAD

Stress-NG

HeFFTe - Highly Efficient FFT for Exascale

Xcompact3d Incompact3d

OpenSSL

Stress-NG

Pennant

Stress-NG

Xcompact3d Incompact3d

HeFFTe - Highly Efficient FFT for Exascale

Laghos

HeFFTe - Highly Efficient FFT for Exascale

Stress-NG

Graph500

Stress-NG

NWChem

srsRAN Project

Graph500

HeFFTe - Highly Efficient FFT for Exascale

LAMMPS Molecular Dynamics Simulator

HeFFTe - Highly Efficient FFT for Exascale

NAS Parallel Benchmarks

LULESH

Stress-NG

Remhos

LAMMPS Molecular Dynamics Simulator

Graph500

Rodinia

7-Zip Compression

HeFFTe - Highly Efficient FFT for Exascale

GPAW

Stress-NG

HeFFTe - Highly Efficient FFT for Exascale

Coremark

HeFFTe - Highly Efficient FFT for Exascale

Graph500

GROMACS

Rodinia

7-Zip Compression

Liquid-DSP

Algebraic Multi-Grid Benchmark

OpenSSL

Stress-NG

Laghos

nginx

QMCPACK

Liquid-DSP

OpenSSL

Liquid-DSP

srsRAN Project

NAS Parallel Benchmarks

OpenSSL

nginx

Monte Carlo Simulations of Ionised Nebulae

srsRAN Project

Liquid-DSP

QMCPACK

Kripke

QMCPACK

NAS Parallel Benchmarks

Timed Node.js Compilation

Rodinia

Stress-NG

Timed Godot Game Engine Compilation