bandwidth

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402279-NE-BANDWIDTH66&sor&gru .
bandwidth: system details

Runs: ARMv8 Neoverse-V2, b, c, d, e, f

  Processor:          ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard:        Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory:             1 x 480GB DRAM-6400MT/s
  Disk:               960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics:           NVIDIA GH200 480GB
  Network:            2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS:                 Ubuntu 22.04
  Kernel:             6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver:     NVIDIA
  OpenCL:             OpenCL 3.0 CUDA 12.4.89
  Vulkan:             1.3.277
  Compiler:           GCC 11.4.0 + CUDA 11.5
  File-System:        ext4
  Screen Resolution:  1920x1200

Kernel Details - Transparent Huge Pages: madvise

Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Security Details -
  gather_data_sampling:  Not affected
  itlb_multihit:         Not affected
  l1tf:                  Not affected
  mds:                   Not affected
  meltdown:              Not affected
  mmio_stale_data:       Not affected
  retbleed:              Not affected
  spec_rstack_overflow:  Not affected
  spec_store_bypass:     Mitigation of SSB disabled via prctl
  spectre_v1:            Mitigation of __user pointer sanitization
  spectre_v2:            Not affected
  srbds:                 Not affected
  tsx_async_abort:       Not affected
bandwidth: results summary (rows are tests, columns are runs)

Test                                 ARMv8 Neoverse-V2  b           c           d           e           f
graph500: 26 (bfs max_TEPS)          1573470000         1574260000  1571860000  1562170000  1547980000  1559480000
graph500: 26 (bfs median_TEPS)       1505570000         1503510000  1498670000  1489270000  1469720000  1481400000
hpcg: 104 104 104 - 60               44.7059            39.5718     38.9483     38.6346     38.6864     38.7058
hpcg: 144 144 144 - 60               41.9191            38.9644     38.2333     38.017      37.9912     38.1082
hpcg: 160 160 160 - 60               39.9158            38.775      38.5127     38.1725     38.0004     38.0297
hpcg: 192 192 192 - 60               38.7312            38.4843     38.1051     38.0568     37.9825     37.8538
gromacs: MPI CPU - water_GMX50_bare  5.429              5.52        5.529       5.53        5.515       5.508
graph500: 26 (sssp max_TEPS)         511909000          501380000   492413000   498666000   491501000   483974000
graph500: 26 (sssp median_TEPS)      333905000          322051000   327451000   322704000   322883000   318469000
Graph500 3.0
Scale: 26
bfs max_TEPS, More Is Better
  b                  1574260000
  ARMv8 Neoverse-V2  1573470000
  c                  1571860000
  d                  1562170000
  f                  1559480000
  e                  1547980000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
Graph500 3.0
Scale: 26
bfs median_TEPS, More Is Better
  ARMv8 Neoverse-V2  1505570000
  b                  1503510000
  c                  1498670000
  d                  1489270000
  f                  1481400000
  e                  1469720000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 60
GFLOP/s, More Is Better
  ARMv8 Neoverse-V2  44.71
  b                  39.57
  c                  38.95
  f                  38.71
  e                  38.69
  d                  38.63
SE +/- 0.02, N = 3
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
High Performance Conjugate Gradient 3.1
X Y Z: 144 144 144 - RT: 60
GFLOP/s, More Is Better
  ARMv8 Neoverse-V2  41.92
  b                  38.96
  c                  38.23
  f                  38.11
  d                  38.02
  e                  37.99
SE +/- 0.21, N = 3
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
High Performance Conjugate Gradient 3.1
X Y Z: 160 160 160 - RT: 60
GFLOP/s, More Is Better
  ARMv8 Neoverse-V2  39.92
  b                  38.78
  c                  38.51
  d                  38.17
  f                  38.03
  e                  38.00
SE +/- 0.15, N = 3
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
High Performance Conjugate Gradient 3.1
X Y Z: 192 192 192 - RT: 60
GFLOP/s, More Is Better
  ARMv8 Neoverse-V2  38.73
  b                  38.48
  c                  38.11
  d                  38.06
  e                  37.98
  f                  37.85
SE +/- 0.18, N = 3
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
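One way to read the four HPCG results above is to compare the first run (ARMv8 Neoverse-V2) against the mean of runs b through f at each problem size. The following is a minimal sketch, not part of the Phoronix Test Suite output: the values are copied by hand from the results summary, and the names `hpcg` and `first_run_gap_percent` are hypothetical, introduced only for illustration.

```python
# HPCG GFLOP/s per run, hand-copied from the results summary.
# Keys are the X Y Z problem sizes; all runs use RT: 60.
hpcg = {
    "104 104 104": {"ARMv8 Neoverse-V2": 44.7059, "b": 39.5718, "c": 38.9483,
                    "d": 38.6346, "e": 38.6864, "f": 38.7058},
    "144 144 144": {"ARMv8 Neoverse-V2": 41.9191, "b": 38.9644, "c": 38.2333,
                    "d": 38.017, "e": 37.9912, "f": 38.1082},
    "160 160 160": {"ARMv8 Neoverse-V2": 39.9158, "b": 38.775, "c": 38.5127,
                    "d": 38.1725, "e": 38.0004, "f": 38.0297},
    "192 192 192": {"ARMv8 Neoverse-V2": 38.7312, "b": 38.4843, "c": 38.1051,
                    "d": 38.0568, "e": 37.9825, "f": 37.8538},
}


def first_run_gap_percent(results, first="ARMv8 Neoverse-V2"):
    """Percent by which the `first` run exceeds the mean of the other runs."""
    others = [value for run, value in results.items() if run != first]
    mean_others = sum(others) / len(others)
    return 100.0 * (results[first] - mean_others) / mean_others


if __name__ == "__main__":
    for size, results in hpcg.items():
        print(f"{size}: {first_run_gap_percent(results):+.1f}%")
```

A positive value means the first run scored higher than the average of the later runs. The sketch only summarizes the published numbers and does not say anything about why the runs differ.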
GROMACS 2024
Implementation: MPI CPU - Input: water_GMX50_bare
Ns Per Day, More Is Better
  d                  5.530
  c                  5.529
  b                  5.520
  e                  5.515
  f                  5.508
  ARMv8 Neoverse-V2  5.429
SE +/- 0.033, N = 3
Graph500 3.0
Scale: 26
sssp max_TEPS, More Is Better
  ARMv8 Neoverse-V2  511909000
  b                  501380000
  d                  498666000
  c                  492413000
  e                  491501000
  f                  483974000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
Graph500 3.0
Scale: 26
sssp median_TEPS, More Is Better
  ARMv8 Neoverse-V2  333905000
  c                  327451000
  e                  322883000
  d                  322704000
  b                  322051000
  f                  318469000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
Phoronix Test Suite v10.8.5