bandwidth
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402279-NE-BANDWIDTH66&grw&sro .
System configuration (shared by all six runs: ARMv8 Neoverse-V2, b, c, d, e, f)
  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Vulkan: 1.3.277
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details - Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Security Details -
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result summary (all metrics: More Is Better)
Test                                        ARMv8 Neoverse-V2           b           c           d           e           f
Graph500 Scale 26, bfs median_TEPS                 1505570000  1503510000  1498670000  1489270000  1469720000  1481400000
Graph500 Scale 26, bfs max_TEPS                    1573470000  1574260000  1571860000  1562170000  1547980000  1559480000
Graph500 Scale 26, sssp median_TEPS                 333905000   322051000   327451000   322704000   322883000   318469000
Graph500 Scale 26, sssp max_TEPS                    511909000   501380000   492413000   498666000   491501000   483974000
GROMACS MPI CPU, water_GMX50_bare (ns/day)              5.429       5.520       5.529       5.530       5.515       5.508
HPCG 104 104 104, RT 60 (GFLOP/s)                     44.7059     39.5718     38.9483     38.6346     38.6864     38.7058
HPCG 144 144 144, RT 60 (GFLOP/s)                     41.9191     38.9644     38.2333     38.0170     37.9912     38.1082
HPCG 160 160 160, RT 60 (GFLOP/s)                     39.9158     38.7750     38.5127     38.1725     38.0004     38.0297
HPCG 192 192 192, RT 60 (GFLOP/s)                     38.7312     38.4843     38.1051     38.0568     37.9825     37.8538
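To put the run-to-run differences in the summary on a common scale, here is a minimal Python sketch. It is a hypothetical helper and not part of the Phoronix Test Suite. For each test it reports which run scored highest and the relative spread (max minus min) / min across the six runs. The values are transcribed from the summary, and the sketch assumes every metric is higher-is-better, as the export states.

```python
# Values transcribed from the result summary (unrounded); every metric is "More Is Better".
RUNS = ["ARMv8 Neoverse-V2", "b", "c", "d", "e", "f"]
RESULTS = {
    "Graph500 bfs median_TEPS": [1505570000, 1503510000, 1498670000, 1489270000, 1469720000, 1481400000],
    "Graph500 bfs max_TEPS": [1573470000, 1574260000, 1571860000, 1562170000, 1547980000, 1559480000],
    "Graph500 sssp median_TEPS": [333905000, 322051000, 327451000, 322704000, 322883000, 318469000],
    "Graph500 sssp max_TEPS": [511909000, 501380000, 492413000, 498666000, 491501000, 483974000],
    "GROMACS ns/day": [5.429, 5.520, 5.529, 5.530, 5.515, 5.508],
    "HPCG 104 104 104 GFLOP/s": [44.7059, 39.5718, 38.9483, 38.6346, 38.6864, 38.7058],
    "HPCG 144 144 144 GFLOP/s": [41.9191, 38.9644, 38.2333, 38.0170, 37.9912, 38.1082],
    "HPCG 160 160 160 GFLOP/s": [39.9158, 38.7750, 38.5127, 38.1725, 38.0004, 38.0297],
    "HPCG 192 192 192 GFLOP/s": [38.7312, 38.4843, 38.1051, 38.0568, 37.9825, 37.8538],
}


def spread_pct(values):
    """Relative spread (max - min) / min across runs, in percent."""
    low, high = min(values), max(values)
    return (high - low) / low * 100.0


def best_run(values):
    """Name of the run with the highest value (all metrics here are higher-is-better)."""
    return RUNS[values.index(max(values))]


for test, values in RESULTS.items():
    print(f"{test:26s} best: {best_run(values):17s} spread: {spread_pct(values):5.2f}%")
```

On these numbers the spread works out to about 1.9% for GROMACS and about 2.4% for Graph500 bfs median_TEPS. HPCG at 104 104 104 spreads by roughly 15.7%, and nearly all of that comes from the first run.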
Graph500 3.0, Scale: 26 (TEPS, More Is Better)
Run                bfs median_TEPS  bfs max_TEPS  sssp median_TEPS  sssp max_TEPS
ARMv8 Neoverse-V2       1505570000    1573470000         333905000      511909000
b                       1503510000    1574260000         322051000      501380000
c                       1498670000    1571860000         327451000      492413000
d                       1489270000    1562170000         322704000      498666000
e                       1469720000    1547980000         322883000      491501000
f                       1481400000    1559480000         318469000      483974000
Notes: (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi
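TEPS is traversed edges per second, so the BFS and SSSP figures share a unit and can be compared directly. The short Python sketch below, a hypothetical helper built only on the Graph500 median values above, computes the per-run ratio of SSSP to BFS median TEPS.

```python
# Graph500 3.0, Scale 26, median TEPS per run, copied from the Graph500 results.
RUNS = ["ARMv8 Neoverse-V2", "b", "c", "d", "e", "f"]
BFS_MEDIAN_TEPS = [1505570000, 1503510000, 1498670000, 1489270000, 1469720000, 1481400000]
SSSP_MEDIAN_TEPS = [333905000, 322051000, 327451000, 322704000, 322883000, 318469000]


def sssp_to_bfs_ratios(bfs, sssp):
    """Per-run ratio of SSSP median TEPS to BFS median TEPS."""
    return [s / b for b, s in zip(bfs, sssp)]


for run, r in zip(RUNS, sssp_to_bfs_ratios(BFS_MEDIAN_TEPS, SSSP_MEDIAN_TEPS)):
    print(f"{run:17s} sssp/bfs = {r:.3f}  (BFS median is {1 / r:.2f}x the SSSP median)")
```

For every run the SSSP median is roughly 21 to 22% of the BFS median. Put the other way, BFS reports about 4.5 to 4.7 times as many TEPS.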
GROMACS 2024, Implementation: MPI CPU, Input: water_GMX50_bare (Ns Per Day, More Is Better)
Run                ns/day
ARMv8 Neoverse-V2   5.429
b                   5.520
c                   5.529
d                   5.530
e                   5.515
f                   5.508
SE +/- 0.033, N = 3
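A throughput in ns/day can be hard to read as a wait time. One way to make it concrete is to convert it to wall-clock hours per simulated nanosecond, which is simply 24 / (ns/day). The minimal Python sketch below applies that conversion to the GROMACS values above. The helper name is illustrative.

```python
# GROMACS 2024, MPI CPU, water_GMX50_bare throughput per run (ns/day).
RUNS = ["ARMv8 Neoverse-V2", "b", "c", "d", "e", "f"]
NS_PER_DAY = [5.429, 5.520, 5.529, 5.530, 5.515, 5.508]


def hours_per_ns(ns_per_day):
    """Wall-clock hours needed to simulate one nanosecond at the given ns/day rate."""
    return 24.0 / ns_per_day


for run, rate in zip(RUNS, NS_PER_DAY):
    print(f"{run:17s} {rate:.3f} ns/day -> {hours_per_ns(rate):.3f} h per simulated ns")
```

At 5.429 ns/day, the lowest run, one simulated nanosecond takes about 4.42 hours. At 5.530 ns/day (run d) it takes about 4.34 hours, a difference of roughly five minutes.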
High Performance Conjugate Gradient 3.1, RT: 60 (GFLOP/s, More Is Better; columns are the X Y Z problem dimensions)
Run                104 104 104  144 144 144  160 160 160  192 192 192
ARMv8 Neoverse-V2        44.71        41.92        39.92        38.73
b                        39.57        38.96        38.78        38.48
c                        38.95        38.23        38.51        38.11
d                        38.63        38.02        38.17        38.06
e                        38.69        37.99        38.00        37.98
f                        38.71        38.11        38.03        37.85
SE +/- 0.02 (104 104 104), 0.21 (144 144 144), 0.15 (160 160 160), 0.18 (192 192 192); N = 3 each
Notes: (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
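In the HPCG results, the first run ("ARMv8 Neoverse-V2") stands apart from runs b to f at the smaller problem sizes. The minimal Python sketch below quantifies that gap as the percentage by which the first run exceeds the mean of the other five. It uses the unrounded values from the result summary, and the helper is hypothetical.

```python
from statistics import mean

# HPCG 3.1 (RT: 60) GFLOP/s per run, unrounded values from the result summary.
# Keys are the X Y Z problem dimension; list order is ARMv8 Neoverse-V2, b, c, d, e, f.
HPCG_GFLOPS = {
    104: [44.7059, 39.5718, 38.9483, 38.6346, 38.6864, 38.7058],
    144: [41.9191, 38.9644, 38.2333, 38.0170, 37.9912, 38.1082],
    160: [39.9158, 38.7750, 38.5127, 38.1725, 38.0004, 38.0297],
    192: [38.7312, 38.4843, 38.1051, 38.0568, 37.9825, 37.8538],
}


def first_run_gap_pct(values):
    """Percent by which the first run exceeds the mean of the remaining runs (b-f)."""
    first, others = values[0], values[1:]
    return (first / mean(others) - 1.0) * 100.0


gaps = {size: first_run_gap_pct(v) for size, v in HPCG_GFLOPS.items()}
for size, gap in gaps.items():
    print(f"X Y Z = {size} {size} {size}: first run is {gap:5.2f}% above the b-f mean")
```

The gap narrows steadily with problem size. It is about 14.9% at 104 104 104, then about 9.6%, 4.2% and 1.7% at 144, 160 and 192.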
Phoronix Test Suite v10.8.5