NVIDIA GH200 memory bandwidth system

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402276-NE-NVIDIAGH242&grr&rdt .
System configuration (identical for runs a, b and c)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Vulkan: 1.3.277
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Security Details:
gather_data_sampling: Not affected
itlb_multihit: Not affected
l1tf: Not affected
mds: Not affected
meltdown: Not affected
mmio_stale_data: Not affected
retbleed: Not affected
spec_rstack_overflow: Not affected
spec_store_bypass: Mitigation of SSB disabled via prctl
spectre_v1: Mitigation of __user pointer sanitization
spectre_v2: Not affected
srbds: Not affected
tsx_async_abort: Not affected
Result overview (more is better for every test)

Test                                           Unit   a          b          c
tinymembench: Standard Memset                  MB/s   68827.5    65367.2    70609.9
tinymembench: Standard Memcpy                  MB/s   25198.3    24982.6    26996.1
ramspeed: Copy - Integer                       MB/s   74997.25   75559.86   74789.13
mbw: Memory Copy, Fixed Block Size - 8192 MiB  MiB/s  13429.535  12999.403  13490.987
mbw: Memory Copy - 8192 MiB                    MiB/s  13432.382  13487.665  13489.447
ramspeed: Add - Integer                        MB/s   35195.30   35312.41   35348.29
ramspeed: Average - Integer                    MB/s   50216.61   49961.21   49510.70
ramspeed: Triad - Integer                      MB/s   34645.14   34759.77   34882.37
ramspeed: Scale - Integer                      MB/s   53823.14   54778.69   53893.69
ramspeed: Scale - Floating Point               MB/s   77928.96   78075.30   78268.45
ramspeed: Average - Floating Point             MB/s   55752.10   55527.02   55658.87
ramspeed: Copy - Floating Point                MB/s   77595.32   74212.45   74633.38
ramspeed: Triad - Floating Point               MB/s   34714.34   34806.77   34842.29
ramspeed: Add - Floating Point                 MB/s   34873.11   34985.36   34891.91
mbw: Memory Copy - 4096 MiB                    MiB/s  13139.174  13190.154  13258.184
mbw: Memory Copy, Fixed Block Size - 4096 MiB  MiB/s  13214.407  13184.271  13325.486
mbw: Memory Copy - 1024 MiB                    MiB/s  13621.048  13721.316  13726.316
mbw: Memory Copy, Fixed Block Size - 1024 MiB  MiB/s  13684.468  13705.804  13718.736
stream: Copy                                   MB/s   321183.9   342107.4   342567.8
mbw: Memory Copy, Fixed Block Size - 128 MiB   MiB/s  16806.058  16881.744  17095.964
mbw: Memory Copy - 512 MiB                     MiB/s  14018.313  14099.967  13987.139
mbw: Memory Copy, Fixed Block Size - 512 MiB   MiB/s  13991.543  14040.383  14105.604
mbw: Memory Copy - 128 MiB                     MiB/s  16741.266  16179.790  17104.441
stream: Add                                    MB/s   313161.3   341627.2   340226.7
stream: Triad                                  MB/s   311997.8   337166.4   336556.6
stream: Scale                                  MB/s   319543.8   338616.6   337805.2
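Because a, b and c share one system configuration, one quick way to read the result overview is the relative spread between the slowest and fastest run of each test. The following is a minimal Python sketch of that arithmetic, using three rows copied from the overview. The helper name `spread_percent` is hypothetical and is not part of the Phoronix Test Suite.

```python
# Values copied from the result overview, in run order (a, b, c).
results = {
    "stream: Copy": (321183.9, 342107.4, 342567.8),
    "tinymembench: Standard Memset": (68827.5, 65367.2, 70609.9),
    "mbw: Memory Copy - 8192 MiB": (13432.382, 13487.665, 13489.447),
}


def spread_percent(values):
    """Return (max - min) / min * 100 for the runs of one test."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100.0


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.2f}% spread across runs")
```

A small spread suggests the runs agree closely; a larger one may be worth checking against the per-test standard errors before drawing conclusions.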
Tinymembench 2018-05-28 - Standard Memset (MB/s, more is better)
a: 68827.5 (SE +/- 52.90, N = 3)
b: 65367.2 (SE +/- 376.32, N = 6)
c: 70609.9 (SE +/- 292.74, N = 9)
(CC) gcc options: -O2 -lm

Tinymembench 2018-05-28 - Standard Memcpy (MB/s, more is better)
a: 25198.3 (SE +/- 180.61, N = 3)
b: 24982.6 (SE +/- 896.59, N = 6)
c: 26996.1 (SE +/- 720.83, N = 9)
(CC) gcc options: -O2 -lm

RAMspeed SMP 3.5.0 - Type: Copy - Benchmark: Integer (MB/s, more is better)
a: 74997.25 (SE +/- 532.99, N = 3)
b: 75559.86 (SE +/- 518.30, N = 15)
c: 74789.13 (SE +/- 1063.03, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy, Fixed Block Size - Array Size: 8192 MiB (MiB/s, more is better)
a: 13429.54 (SE +/- 1.46, N = 3)
b: 12999.40 (SE +/- 1.46, N = 3)
c: 13490.99 (SE +/- 1.15, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy - Array Size: 8192 MiB (MiB/s, more is better)
a: 13432.38 (SE +/- 1.93, N = 3)
b: 13487.67 (SE +/- 0.98, N = 3)
c: 13489.45 (SE +/- 1.24, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Add - Benchmark: Integer (MB/s, more is better)
a: 35195.30 (SE +/- 12.14, N = 3)
b: 35312.41 (SE +/- 27.82, N = 3)
c: 35348.29 (SE +/- 16.40, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Average - Benchmark: Integer (MB/s, more is better)
a: 50216.61 (SE +/- 523.54, N = 3)
b: 49961.21 (SE +/- 339.06, N = 3)
c: 49510.70 (SE +/- 120.67, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Triad - Benchmark: Integer (MB/s, more is better)
a: 34645.14 (SE +/- 14.04, N = 3)
b: 34759.77 (SE +/- 24.34, N = 3)
c: 34882.37 (SE +/- 106.35, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Scale - Benchmark: Integer (MB/s, more is better)
a: 53823.14 (SE +/- 135.79, N = 3)
b: 54778.69 (SE +/- 175.46, N = 3)
c: 53893.69 (SE +/- 368.97, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Scale - Benchmark: Floating Point (MB/s, more is better)
a: 77928.96 (SE +/- 54.22, N = 3)
b: 78075.30 (SE +/- 42.08, N = 3)
c: 78268.45 (SE +/- 53.91, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Average - Benchmark: Floating Point (MB/s, more is better)
a: 55752.10 (SE +/- 274.93, N = 3)
b: 55527.02 (SE +/- 360.38, N = 3)
c: 55658.87 (SE +/- 333.14, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Copy - Benchmark: Floating Point (MB/s, more is better)
a: 77595.32 (SE +/- 147.53, N = 3)
b: 74212.45 (SE +/- 483.74, N = 3)
c: 74633.38 (SE +/- 957.21, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Triad - Benchmark: Floating Point (MB/s, more is better)
a: 34714.34 (SE +/- 11.29, N = 3)
b: 34806.77 (SE +/- 52.63, N = 3)
c: 34842.29 (SE +/- 27.38, N = 3)
(CC) gcc options: -O3 -march=native

RAMspeed SMP 3.5.0 - Type: Add - Benchmark: Floating Point (MB/s, more is better)
a: 34873.11 (SE +/- 14.35, N = 3)
b: 34985.36 (SE +/- 31.66, N = 3)
c: 34891.91 (SE +/- 9.27, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy - Array Size: 4096 MiB (MiB/s, more is better)
a: 13139.17 (SE +/- 20.59, N = 3)
b: 13190.15 (SE +/- 24.22, N = 3)
c: 13258.18 (SE +/- 62.34, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy, Fixed Block Size - Array Size: 4096 MiB (MiB/s, more is better)
a: 13214.41 (SE +/- 26.21, N = 3)
b: 13184.27 (SE +/- 83.60, N = 3)
c: 13325.49 (SE +/- 50.01, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy - Array Size: 1024 MiB (MiB/s, more is better)
a: 13621.05 (SE +/- 46.36, N = 3)
b: 13721.32 (SE +/- 8.31, N = 3)
c: 13726.32 (SE +/- 9.44, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy, Fixed Block Size - Array Size: 1024 MiB (MiB/s, more is better)
a: 13684.47 (SE +/- 4.14, N = 3)
b: 13705.80 (SE +/- 9.34, N = 3)
c: 13718.74 (SE +/- 15.68, N = 3)
(CC) gcc options: -O3 -march=native

Stream 2013-01-17 - Type: Copy (MB/s, more is better)
a: 321183.9 (SE +/- 2114.35, N = 5)
b: 342107.4 (SE +/- 1265.00, N = 5)
c: 342567.8 (SE +/- 981.27, N = 5)
(CC) gcc options: -O3 -march=native -fopenmp

MBW 2018-09-08 - Test: Memory Copy, Fixed Block Size - Array Size: 128 MiB (MiB/s, more is better)
a: 16806.06 (SE +/- 160.35, N = 15)
b: 16881.74 (SE +/- 151.66, N = 15)
c: 17095.96 (SE +/- 153.98, N = 15)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy - Array Size: 512 MiB (MiB/s, more is better)
a: 14018.31 (SE +/- 22.87, N = 3)
b: 14099.97 (SE +/- 21.00, N = 3)
c: 13987.14 (SE +/- 48.48, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy, Fixed Block Size - Array Size: 512 MiB (MiB/s, more is better)
a: 13991.54 (SE +/- 46.16, N = 3)
b: 14040.38 (SE +/- 60.13, N = 3)
c: 14105.60 (SE +/- 13.55, N = 3)
(CC) gcc options: -O3 -march=native

MBW 2018-09-08 - Test: Memory Copy - Array Size: 128 MiB (MiB/s, more is better)
a: 16741.27 (SE +/- 168.54, N = 3)
b: 16179.79 (SE +/- 216.80, N = 12)
c: 17104.44 (SE +/- 127.70, N = 10)
(CC) gcc options: -O3 -march=native

Stream 2013-01-17 - Type: Add (MB/s, more is better)
a: 313161.3 (SE +/- 2548.52, N = 5)
b: 341627.2 (SE +/- 1241.84, N = 5)
c: 340226.7 (SE +/- 1117.82, N = 5)
(CC) gcc options: -O3 -march=native -fopenmp

Stream 2013-01-17 - Type: Triad (MB/s, more is better)
a: 311997.8 (SE +/- 2534.25, N = 5)
b: 337166.4 (SE +/- 561.20, N = 5)
c: 336556.6 (SE +/- 1078.16, N = 5)
(CC) gcc options: -O3 -march=native -fopenmp

Stream 2013-01-17 - Type: Scale (MB/s, more is better)
a: 319543.8 (SE +/- 1618.66, N = 5)
b: 338616.6 (SE +/- 819.28, N = 5)
c: 337805.2 (SE +/- 1113.20, N = 5)
(CC) gcc options: -O3 -march=native -fopenmp

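Each result is reported with a standard error (SE) and a sample count (N), which allows a rough check of whether a gap between two runs exceeds run-to-run noise. The sketch below is an illustration only and not something the Phoronix Test Suite computes: it assumes the two runs are independent and combines their standard errors in quadrature. The helper name `z_score` is hypothetical. With only N = 5 samples per run, the resulting figure is an approximate indicator rather than a formal significance test.

```python
import math


def z_score(mean_1, se_1, mean_2, se_2):
    """Difference of two means divided by their combined standard error.

    Assumes the two runs are independent (an assumption, not stated in the
    result file).
    """
    return (mean_2 - mean_1) / math.sqrt(se_1 ** 2 + se_2 ** 2)


# Stream Copy, run a versus run b (values and SE taken from this result file).
z = z_score(321183.9, 2114.35, 342107.4, 1265.00)
print(f"Stream Copy, b vs a: difference is {z:.1f} combined standard errors")
```

A value of several combined standard errors, as the Stream Copy gap between runs a and b gives here, suggests the difference is unlikely to be measurement noise alone; values near or below 1 suggest the runs are indistinguishable.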
Phoronix Test Suite v10.8.5