EPYC EO 2021 Linux Distros

2 x AMD EPYC 75F3 32-Core testing with an ASRockRack ROME2D16-2T (P3.10 BIOS) and ASPEED on Ubuntu 21.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2110202-TJ-EPYCEO20288

Result Identifier: Ubuntu 21.10
Date Run: October 20 2021
Test Duration: 8 Hours, 28 Minutes


System Details

Processor: 2 x AMD EPYC 75F3 32-Core @ 2.95GHz (64 Cores / 128 Threads)
Motherboard: ASRockRack ROME2D16-2T (P3.10 BIOS)
Chipset: AMD Starship/Matisse
Memory: 16 x 8 GB DDR4-3200MT/s HMA81GR7CJR8N-XN
Disk: 1000GB Western Digital WD_BLACK SN850 1TB
Graphics: ASPEED
Audio: AMD Starship/Matisse
Network: 2 x Intel 10G X550T
OS: Ubuntu 21.10
Kernel: 5.13.0-20-generic (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server
Vulkan: 1.1.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 1024x768

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa001114
Python Details: Python 3.9.7
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: always-on RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected

EPYC EO 2021 Linux Distros - Results Overview

Test | Ubuntu 21.10
build-nodejs: Time To Compile | 84.24
build-llvm: Unix Makefiles | 175.35
build-linux-kernel: Time To Compile | 20.12
build-godot: Time To Compile | 44.27
oidn: RT.ldr_alb_nrm.3840x2160 | 1.79
mt-dgemm: Sustained Floating-Point Rate | 22.95
x265: Bosphorus 4K | 21.80
svt-vp9: PSNR/SSIM Optimized - Bosphorus 1080p | 412.69
svt-vp9: VMAF Optimized - Bosphorus 1080p | 410.33
svt-hevc: 10 - Bosphorus 1080p | 560.76
svt-hevc: 7 - Bosphorus 1080p | 362.89
svt-av1: Preset 8 - Bosphorus 4K | 62.82
kvazaar: Bosphorus 4K - Ultra Fast | 49.74
kvazaar: Bosphorus 4K - Very Fast | 26.13
embree: Pathtracer - Crown | 73.16
aom-av1: Speed 10 Realtime - Bosphorus 4K | 39.71
aom-av1: Speed 9 Realtime - Bosphorus 4K | 35.66
aom-av1: Speed 8 Realtime - Bosphorus 4K | 26.85
ospray: Magnetic Reconnection - SciVis | 55.56
ospray: NASA Streamlines - SciVis | 111.11
ospray: XFrog Forest - SciVis | 17.34
ospray: San Miguel - SciVis | 83.33
gnuradio: Hilbert Transform | 437.4
gnuradio: FM Deemphasis Filter | 874.3
gnuradio: IIR Filter | 708.2
gnuradio: FIR Filter | 791.7
gnuradio: Signal Source (Cosine) | 3790.9
gnuradio: Five Back to Back FIR Filters | 785.4
compress-zstd: 19, Long Mode - Decompression Speed | 3623.9
compress-zstd: 19, Long Mode - Compression Speed | 40.0
compress-zstd: 8, Long Mode - Decompression Speed | 4119.0
compress-zstd: 8, Long Mode - Compression Speed | 183.8
compress-zstd: 3, Long Mode - Decompression Speed | 3928.4
compress-zstd: 3, Long Mode - Compression Speed | 163.1
compress-zstd: 19 - Decompression Speed | 3587.3
compress-zstd: 19 - Compression Speed | 78.4
compress-zstd: 8 - Compression Speed | 3271.1
compress-zstd: 3 - Decompression Speed | 3688.3
qe: AUSURF112 | 243.14
nwchem: C240 Buckyball | 1881.4
namd: ATPase Simulation - 327,506 Atoms | 0.30493
wireguard | 403.91
vpxenc: Speed 5 - Bosphorus 4K | 13.84
compress-zstd: 3 - Compression Speed | 6162.1
hpcg | 23.90

Timed Node.js Compilation

This test profile times how long it takes to build/compile Node.js itself from source. Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine and is itself written in C/C++. Learn more via the OpenBenchmarking.org test page.

Timed Node.js Compilation 15.11 - Time To Compile (Seconds, Fewer Is Better)
Ubuntu 21.10: 84.24 (SE +/- 0.09, N = 3)

Timed LLVM Compilation

This test times how long it takes to build the LLVM compiler stack. Learn more via the OpenBenchmarking.org test page.

Timed LLVM Compilation 13.0 - Build System: Unix Makefiles (Seconds, Fewer Is Better)
Ubuntu 21.10: 175.35 (SE +/- 0.60, N = 3)

Timed Linux Kernel Compilation

This test times how long it takes to build the Linux kernel in a default configuration (defconfig) for the architecture being tested. Learn more via the OpenBenchmarking.org test page.

Timed Linux Kernel Compilation 5.14 - Time To Compile (Seconds, Fewer Is Better)
Ubuntu 21.10: 20.12 (SE +/- 0.15, N = 10)
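Each "Time To Compile" result is a wall-clock duration averaged over N runs and reported with a standard error. The following is a minimal Python sketch of that kind of measurement, not the Phoronix test profile's own harness. The helper names `time_build`, `time_repeated` and `mean_and_se` are hypothetical. It also assumes that SE means the sample standard deviation divided by sqrt(N).

```python
import math
import statistics
import subprocess
import time


def time_build(cmd, cwd=None):
    """Run one build command and return its wall-clock duration in seconds.

    check=True makes subprocess.run raise CalledProcessError on a failed
    build, so a failure is never reported as a time.
    """
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def time_repeated(cmd, runs=3, cwd=None, prepare=None):
    """Time `runs` builds; the optional `prepare` command (e.g. a clean step) is not timed."""
    times = []
    for _ in range(runs):
        if prepare is not None:
            subprocess.run(prepare, cwd=cwd, check=True)
        times.append(time_build(cmd, cwd=cwd))
    return times


def mean_and_se(samples):
    """Mean and standard error (sample standard deviation / sqrt(N)); assumed SE definition."""
    mean = statistics.fmean(samples)
    if len(samples) < 2:
        return mean, 0.0
    return mean, statistics.stdev(samples) / math.sqrt(len(samples))
```

For example, from the top of a kernel source tree, run a one-off `make defconfig`. Then `time_repeated(["make", "-j", str(os.cpu_count())], runs=10, prepare=["make", "clean"])` would time ten full builds. Passing the result to `mean_and_se` gives figures in the same form as the line above.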

Timed Godot Game Engine Compilation

This test times how long it takes to compile the Godot Game Engine. Godot is a popular, open-source, cross-platform 2D/3D game engine; the build uses the SCons build system and targets the X11 platform. Learn more via the OpenBenchmarking.org test page.

Timed Godot Game Engine Compilation 3.2.3 - Time To Compile (Seconds, Fewer Is Better)
Ubuntu 21.10: 44.27 (SE +/- 0.10, N = 3)

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 1.4.0 - Run: RT.ldr_alb_nrm.3840x2160 (Images / Sec, More Is Better)
Ubuntu 21.10: 1.79 (SE +/- 0.00, N = 3)

ACES DGEMM

This is a multi-threaded DGEMM (double-precision general matrix multiplication) benchmark. Learn more via the OpenBenchmarking.org test page.

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
Ubuntu 21.10: 22.95 (SE +/- 0.22, N = 15)
Note: (CC) gcc options: -O3 -march=native -fopenmp
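DGEMM computes C = alpha*A*B + beta*C in double precision. Multiplying two n x n matrices costs about 2n^3 floating-point operations, one multiply and one add per inner-product term, so a sustained GFLOP/s figure follows from the elapsed time. The sketch below shows only that bookkeeping, in pure Python. The helper names are hypothetical, and the ACES benchmark itself is an OpenMP C program whose exact operation count and problem size may differ.

```python
import time


def matmul(a, b):
    """Naive product of two square matrices given as lists of lists."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(n):
            aik = a[i][k]
            row_b = b[k]
            for j in range(n):
                row_c[j] += aik * row_b[j]  # one multiply + one add
    return c


def gemm_flops(n):
    """Operation count for an n x n by n x n product: n^3 multiplies + n^3 adds."""
    return 2 * n ** 3


def gflops(n, seconds):
    """Convert the elapsed time of one n x n product into GFLOP/s."""
    return gemm_flops(n) / seconds / 1e9


def measure(n=64):
    """Time one naive product and return (result, GFLOP/s)."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    c = matmul(a, b)
    return c, gflops(n, time.perf_counter() - start)
```

Working backwards from the 22.95 GFLOP/s measured here, a single 4096 x 4096 product (about 1.37 x 10^11 operations) would take roughly 6 seconds.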

x265

This is a simple test of H.265 video encoding performance with the x265 encoder running on the CPU, with 1080p and 4K input options. Learn more via the OpenBenchmarking.org test page.

x265 3.4 - Video Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 21.80 (SE +/- 0.21, N = 3)
Note: (CXX) g++ options: -O3 -rdynamic -lpthread -lrt -ldl -lnuma

SVT-VP9

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.

SVT-VP9 0.3 - Tuning: PSNR/SSIM Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
Ubuntu 21.10: 412.69 (SE +/- 4.59, N = 4)
Note: (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-VP9 0.3 - Tuning: VMAF Optimized - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
Ubuntu 21.10: 410.33 (SE +/- 5.23, N = 3)
Note: (CC) gcc options: -O3 -fcommon -fPIE -fPIC -fvisibility=hidden -pie -rdynamic -lpthread -lrt -lm

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC 1.5.0 - Tuning: 10 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
Ubuntu 21.10: 560.76 (SE +/- 1.51, N = 3)
Note: (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC 1.5.0 - Tuning: 7 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
Ubuntu 21.10: 362.89 (SE +/- 3.19, N = 3)
Note: (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. This test runs SVT-AV1 as a CPU-based, multi-threaded encoder for the AV1 video format on a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-AV1 0.8.7 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 62.82 (SE +/- 0.74, N = 3)
Note: (CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

Kvazaar

This is a test of Kvazaar, a CPU-based H.265/HEVC video encoder written in C and optimized with assembly. Kvazaar won the 2016 ACM Open-Source Software Competition and is developed by the Ultra Video Group at Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.

Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Ultra Fast (Frames Per Second, More Is Better)
Ubuntu 21.10: 49.74 (SE +/- 0.29, N = 3)
Note: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Kvazaar 2.1 - Video Input: Bosphorus 4K - Video Preset: Very Fast (Frames Per Second, More Is Better)
Ubuntu 21.10: 26.13 (SE +/- 0.03, N = 3)
Note: (CC) gcc options: -pthread -ftree-vectorize -fvisibility=hidden -O2 -lpthread -lm -lrt

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs, with support for instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree can also make use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 3.13 - Binary: Pathtracer - Model: Crown (Frames Per Second, More Is Better)
Ubuntu 21.10: 73.16 (SE +/- 0.15, N = 3; MIN: 71.54 / MAX: 76.03)
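Embree's kernels add acceleration structures and SIMD on top of basic geometric tests. The sketch below shows only one such primitive: the textbook Möller-Trumbore ray/triangle intersection, written in plain Python. It is not Embree's API or its actual kernel code, and the function names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray/triangle test.

    Returns the distance t along `direction` to the hit point, or None on a miss.
    """
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:          # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det   # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None   # ignore hits behind the ray origin
```

A path tracer such as the Crown scene above performs huge numbers of these tests per frame. This is why the kernels that accelerate them dominate the frame rate.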

AOM AV1

This is a test of the AOMedia AV1 encoder (libaom) developed by AOMedia and Google. Learn more via the OpenBenchmarking.org test page.

AOM AV1 3.2 - Encoder Mode: Speed 10 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 39.71 (SE +/- 0.21, N = 3)
Note: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

AOM AV1 3.2 - Encoder Mode: Speed 9 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 35.66 (SE +/- 0.48, N = 13)
Note: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

AOM AV1 3.2 - Encoder Mode: Speed 8 Realtime - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 26.85 (SE +/- 0.30, N = 3)
Note: (CXX) g++ options: -O3 -std=c++11 -U_FORTIFY_SOURCE -lm

OSPray

Intel OSPray is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPray builds on Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPray 1.8.5 - Demo: Magnetic Reconnection - Renderer: SciVis (FPS, More Is Better)
Ubuntu 21.10: 55.56 (SE +/- 0.00, N = 3; MIN: 30.3 / MAX: 58.82)

OSPray 1.8.5 - Demo: NASA Streamlines - Renderer: SciVis (FPS, More Is Better)
Ubuntu 21.10: 111.11 (SE +/- 0.00, N = 3; MIN: 41.67 / MAX: 125)

OSPray 1.8.5 - Demo: XFrog Forest - Renderer: SciVis (FPS, More Is Better)
Ubuntu 21.10: 17.34 (SE +/- 0.10, N = 3; MIN: 14.71 / MAX: 17.86)

OSPray 1.8.5 - Demo: San Miguel - Renderer: SciVis (FPS, More Is Better)
Ubuntu 21.10: 83.33 (SE +/- 0.00, N = 3; MIN: 58.82 / MAX: 90.91)

GNU Radio

GNU Radio is a free software development toolkit providing signal processing blocks to implement software-defined radios (SDR) and signal processing systems. Learn more via the OpenBenchmarking.org test page.

GNU Radio - Test: Hilbert Transform (MiB/s, More Is Better)
Ubuntu 21.10: 437.4 (SE +/- 0.78, N = 9)
Version: 3.8.2.0

GNU Radio - Test: FM Deemphasis Filter (MiB/s, More Is Better)
Ubuntu 21.10: 874.3 (SE +/- 9.67, N = 9)
Version: 3.8.2.0

GNU Radio - Test: IIR Filter (MiB/s, More Is Better)
Ubuntu 21.10: 708.2 (SE +/- 1.26, N = 9)
Version: 3.8.2.0

GNU Radio - Test: FIR Filter (MiB/s, More Is Better)
Ubuntu 21.10: 791.7 (SE +/- 1.91, N = 9)
Version: 3.8.2.0

GNU Radio - Test: Signal Source (Cosine) (MiB/s, More Is Better)
Ubuntu 21.10: 3790.9 (SE +/- 22.34, N = 9)
Version: 3.8.2.0

GNU Radio - Test: Five Back to Back FIR Filters (MiB/s, More Is Better)
Ubuntu 21.10: 785.4 (SE +/- 13.13, N = 9)
Version: 3.8.2.0
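The FIR tests stream samples through blocks that compute y[n] = sum over k of h[k] * x[n - k], where h holds the filter taps. The pure-Python sketch below shows that direct-form computation and a simple back-to-back cascade loosely resembling the five-filter test. The names `fir_filter` and `cascade` are hypothetical. GNU Radio's own blocks are optimized C++ with a different API, so this illustrates only the arithmetic, not the library.

```python
def fir_filter(taps, samples):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[m] = 0 for m < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:       # earlier samples do not exist; treat them as zero
                break
            acc += h * samples[n - k]
        out.append(acc)
    return out


def cascade(taps_list, samples):
    """Feed the output of each FIR stage into the next one."""
    for taps in taps_list:
        samples = fir_filter(taps, samples)
    return samples
```

Feeding a unit impulse through `fir_filter` returns the taps themselves, which makes a quick sanity check. Each output sample costs one multiply-add per tap, so throughput in MiB/s falls as the tap count or the number of chained stages grows.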

Zstd Compression

This test measures compression and decompression speed (in MB/s) for a sample input file, using the Zstd implementation supplied by the system or otherwise provided externally to the test profile. Learn more via the OpenBenchmarking.org test page.

Zstd Compression - Compression Level: 19, Long Mode - Decompression Speed (MB/s, More Is Better)
Ubuntu 21.10: 3623.9 (SE +/- 33.91, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 19, Long Mode - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 40.0 (SE +/- 0.53, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 8, Long Mode - Decompression Speed (MB/s, More Is Better)
Ubuntu 21.10: 4119.0 (SE +/- 62.42, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 8, Long Mode - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 183.8 (SE +/- 1.67, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 3, Long Mode - Decompression Speed (MB/s, More Is Better)
Ubuntu 21.10: 3928.4 (SE +/- 10.76, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 3, Long Mode - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 163.1 (SE +/- 1.20, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 19 - Decompression Speed (MB/s, More Is Better)
Ubuntu 21.10: 3587.3 (SE +/- 33.14, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 19 - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 78.4 (SE +/- 0.33, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 8 - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 3271.1 (SE +/- 10.24, N = 3)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

Zstd Compression - Compression Level: 3 - Decompression Speed (MB/s, More Is Better)
Ubuntu 21.10: 3688.3 (SE +/- 56.80, N = 5)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***
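The MB/s figures are input bytes processed per second of wall-clock time. The minimal sketch below measures the same kind of numbers. It assumes both speeds are expressed against the uncompressed size, with 1 MB = 10^6 bytes. It uses Python's standard-library zlib as a stand-in compressor, to avoid a third-party zstd dependency, so its numbers are not comparable with the zstd results. zlib's levels only span 0 to 9 and do not map onto zstd's levels 3, 8 and 19. The names `best_time` and `bench` are hypothetical.

```python
import time
import zlib


def best_time(fn, repeats=3):
    """Fastest wall-clock time, in seconds, over several calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bench(data, level, repeats=3):
    """Return (compression MB/s, decompression MB/s, compression ratio) for zlib.

    Both speeds are computed against the uncompressed size of `data`.
    """
    compressed = zlib.compress(data, level)
    if zlib.decompress(compressed) != data:
        raise ValueError("round trip failed")
    mb = len(data) / 1e6
    comp_s = best_time(lambda: zlib.compress(data, level), repeats)
    decomp_s = best_time(lambda: zlib.decompress(compressed), repeats)
    return mb / comp_s, mb / decomp_s, len(data) / len(compressed)
```

The zstd results in this file show the same broad pattern. Compression speed drops steeply at high levels (78.4 MB/s at level 19 versus 6162.1 MB/s at level 3), while decompression speed stays in the thousands of MB/s at every level.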

Quantum ESPRESSO

Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials. Learn more via the OpenBenchmarking.org test page.

Quantum ESPRESSO 6.8 - Input: AUSURF112 (Seconds, Fewer Is Better)
Ubuntu 21.10: 243.14 (SE +/- 0.28, N = 3)
Note: (F9X) gfortran options: -ldevXlib -lopenblas -lFoX_dom -lFoX_sax -lFoX_wxml -lFoX_common -lFoX_utils -lFoX_fsys -lfftw3 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

NWChem

NWChem is an open-source, high-performance computational chemistry package. Per NWChem's documentation, "NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters." Learn more via the OpenBenchmarking.org test page.

NWChem 7.0.2 - Input: C240 Buckyball (Seconds, Fewer Is Better)
Ubuntu 21.10: 1881.4
Note: (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

NAMD

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Learn more via the OpenBenchmarking.org test page.

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
Ubuntu 21.10: 0.30493 (SE +/- 0.00024, N = 3)
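NAMD's metric is days of wall-clock time per simulated nanosecond, so lower is better. Many users quote the reciprocal, ns/day, instead. The two small helpers below show the conversion and its planning use. Their names are hypothetical, and they assume the measured rate stays constant over a longer run.

```python
def ns_per_day(days_per_ns):
    """Invert NAMD's days/ns metric into ns/day (higher is better)."""
    return 1.0 / days_per_ns


def wall_days(sim_ns, days_per_ns):
    """Wall-clock days needed to simulate `sim_ns` nanoseconds at a fixed days/ns rate."""
    return sim_ns * days_per_ns
```

Applied to the 0.30493 days/ns result, `ns_per_day` gives about 3.28 ns/day. At that rate, a 10 ns trajectory of this ATPase system would take roughly 3.05 days on this machine.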

WireGuard + Linux Networking Stack Stress Test

This is a benchmark of the WireGuard secure VPN tunnel and a stress test of the Linux networking stack. The test runs on the local host but requires root permissions. It creates three network namespaces: ns0 has a loopback device, while ns1 and ns2 each have a WireGuard device. The two WireGuard devices send their traffic through the loopback device of ns0. As a result, the test exercises encryption and decryption at the same time, a CPU- and scheduler-heavy workload. Learn more via the OpenBenchmarking.org test page.

WireGuard + Linux Networking Stack Stress Test (Seconds, Fewer Is Better)
Ubuntu 21.10: 403.91 (SE +/- 1.06, N = 3)
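The three-namespace layout can be expressed as a list of iproute2 commands. This is a dry-run sketch, not the test profile's actual script, and nothing is executed. It assumes the pattern of WireGuard's upstream netns selftest: each `wg0` device is created in ns0 and then moved into ns1 or ns2. WireGuard keeps an interface's UDP socket in the namespace where the interface was created, which is what sends the encrypted traffic over ns0's loopback. Key generation, `wg` peer configuration and IP addressing are omitted. The function names are hypothetical, and actually running the commands would require root.

```python
import shlex


def netns_setup_commands(outer="ns0", peers=("ns1", "ns2"), dev="wg0"):
    """Build (but do not run) iproute2 commands for the three-namespace layout."""
    cmds = [["ip", "netns", "add", ns] for ns in (outer, *peers)]
    # The outer namespace's loopback device carries both tunnels' UDP traffic.
    cmds.append(["ip", "-n", outer, "link", "set", "lo", "up"])
    for ns in peers:
        # Assumption: create the WireGuard device in the outer namespace (its
        # UDP socket stays there), then move the interface into the peer namespace.
        cmds.append(["ip", "-n", outer, "link", "add", dev, "type", "wireguard"])
        cmds.append(["ip", "-n", outer, "link", "set", dev, "netns", ns])
    return cmds


def as_script(cmds):
    """Render the commands as shell lines for review; keys, peers and addresses are omitted."""
    return "\n".join(shlex.join(c) for c in cmds)
```

Printing `as_script(netns_setup_commands())` shows the eight setup commands for review.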

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9 video format. Learn more via the OpenBenchmarking.org test page.

VP9 libvpx Encoding 1.10.0 - Speed: Speed 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
Ubuntu 21.10: 13.84 (SE +/- 0.36, N = 15)
Note: (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

Zstd Compression

Zstd Compression - Compression Level: 3 - Compression Speed (MB/s, More Is Better)
Ubuntu 21.10: 6162.1 (SE +/- 147.08, N = 12)
Note: *** zstd command line interface 64-bits v1.4.8, by Yann Collet ***

High Performance Conjugate Gradient

HPCG (High Performance Conjugate Gradient) is a new scientific benchmark from Sandia National Labs for supercomputer testing. It focuses on workloads more representative of modern real-world applications than those in HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
Ubuntu 21.10: 23.90 (SE +/- 1.12, N = 9)
Note: (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
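HPCG is built around a preconditioned conjugate-gradient solve of a large sparse linear system. The sketch below is a minimal, unpreconditioned conjugate-gradient iteration on a small dense symmetric positive-definite matrix, in pure Python. It omits HPCG's sparse storage, preconditioner and MPI distribution, and the helper names are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(a, x):
    return [dot(row, x) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (dense lists) with plain CG."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)              # residual b - A x, with the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration is dominated by one matrix-vector product and a few dot products and vector updates. These operations move a lot of data for little arithmetic, which is why HPCG's GFLOP/s figure sits far below a compute-bound kernel such as DGEMM.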