Intel vs. Graviton2 Amazon EC2 Benchmarks

KVM testing on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2108196-TJ-2108197TJ41&sro.

System Details

m6g.metal
  Processor: ARMv8 Neoverse-N1 (64 Cores)
  Motherboard: Amazon EC2 m6g.metal v1.0
  Memory: 252GB
  Disk: 107GB Amazon Elastic Block Store
  Network: Amazon Elastic
  OS: Ubuntu 20.04
  Kernel: 5.4.0-1045-aws (aarch64)
  Vulkan: 1.0.2
  Compiler: GCC 9.3.0
  File-System: ext4

m5.24xlarge
  Processor: 2 x Intel Xeon Platinum 8259CL (48 Cores / 96 Threads)
  Motherboard: Amazon EC2 m5.24xlarge (1.0 BIOS)
  Chipset: Intel 440FX 82441FX PMC
  Memory: 374GB
  Kernel: 5.4.0-1045-aws (x86_64)
  System Layer: KVM

m6i.24xlarge
  Processor: 2 x Intel Xeon Platinum 8375C (48 Cores / 96 Threads)
  Motherboard: Amazon EC2 m6i.24xlarge (1.0 BIOS)
  Memory: 372GB

Fields not listed for m5.24xlarge or m6i.24xlarge are not repeated in the export.

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- m6g.metal: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
- m5.24xlarge: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- m6i.24xlarge: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Java Details
- m6g.metal, m5.24xlarge: OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)

Security Details
- m6g.metal:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
- m5.24xlarge:
  itlb_multihit: KVM: Vulnerable
  l1tf: Mitigation of PTE Inversion
  mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  meltdown: Mitigation of PTI
  spec_store_bypass: Vulnerable
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
- m6i.24xlarge:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected

Processor Details
- m5.24xlarge: CPU Microcode: 0x5003005
- m6i.24xlarge: CPU Microcode: 0xd0002b1

Test                                                  m6g.metal        m5.24xlarge      m6i.24xlarge
hpcg                                                  21.4570          26.8884          37.2245
npb: BT.C                                             24464.82         104533.15        136431.11
npb: CG.C                                             13438.71         30206.03         33146.76
npb: EP.C                                             2218.08          4777.13          6426.32
npb: EP.D                                             2233.14          4875.47          6752.38
npb: FT.C                                             21850.28         50800.74         70031.71
npb: MG.C                                             25872.77         65732.22         88248.73
minife: Small                                         23848.2          14007.1          19946.4
pennant: sedovbig                                     15.41301         25.22237         17.24513
pennant: leblancbig                                   11.29726         10.03026         6.413928
incompact3d: input.i3d 129 Cells Per Direction        5.18380547       4.76513163       3.49298970
incompact3d: input.i3d 193 Cells Per Direction        23.2480348       21.4682878       14.9058580
lulesh                                                16867.370        16272.587        22519.115
coremark: CoreMark Size 666 - Iterations Per Second   1236555.803752   1451630.519049   1607068.543334
stockfish: Total Time                                 96657449         105658561        136790816
asmfish: 1024 Hash Memory, 26 Depth                   104868482        115160185        136656900
povray: Trace Time                                    57.439           42.964           10.631
m-queens: Time To Solve                               19.430           22.341           16.068
n-queens: Elapsed Time                                3.761            3.892            3.144
tnn: CPU - DenseNet                                   3288.829         3797.589         3522.581
tnn: CPU - MobileNet v2                               365.839          426.094          350.378
tnn: CPU - SqueezeNet v2                              105.072          93.266           70.932
tnn: CPU - SqueezeNet v1.1                            341.315          394.508          357.721
rocksdb: Rand Read                                    270332614        194576074        231109408
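
The summary figures above mix "more is better" and "fewer is better" metrics, so a direct comparison needs the direction taken into account. The following is a minimal sketch of one way to do that, not something the Phoronix Test Suite export itself provides: it normalizes a handful of the values above against m6g.metal so that a ratio above 1.0 always means "faster than m6g.metal". The function names and the choice of m6g.metal as baseline are assumptions made for illustration.

```python
from math import prod

SYSTEMS = ("m6g.metal", "m5.24xlarge", "m6i.24xlarge")

# A few rows copied from the results above, in SYSTEMS order.
# The boolean records the direction the export reports for that test.
RESULTS = {
    "hpcg (GFLOP/s)": ((21.4570, 26.8884, 37.2245), True),
    "npb BT.C (Mop/s)": ((24464.82, 104533.15, 136431.11), True),
    "pennant sedovbig (s)": ((15.41301, 25.22237, 17.24513), False),
    "rocksdb Rand Read (Op/s)": ((270332614, 194576074, 231109408), True),
}


def relative_to_baseline(values, higher_is_better, baseline=0):
    """Return ratios where > 1.0 means faster than the baseline system."""
    base = values[baseline]
    if higher_is_better:
        return tuple(v / base for v in values)
    # For time-like metrics a smaller value is faster, so invert the ratio.
    return tuple(base / v for v in values)


def geometric_mean(xs):
    """Geometric mean, the usual way to average ratios."""
    xs = list(xs)
    return prod(xs) ** (1.0 / len(xs))


if __name__ == "__main__":
    per_system = {name: [] for name in SYSTEMS}
    for test, (values, higher) in RESULTS.items():
        ratios = relative_to_baseline(values, higher)
        for name, ratio in zip(SYSTEMS, ratios):
            per_system[name].append(ratio)
        print(test, [round(r, 3) for r in ratios])
    for name, ratios in per_system.items():
        print(name, "geomean vs m6g.metal:", round(geometric_mean(ratios), 3))
```

The geometric mean over only four rows is illustrative; extending `RESULTS` with the remaining tests would give a fuller picture.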

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)

m5.24xlarge:  26.89 (SE +/- 0.17, N = 3)
m6g.metal:    21.46 (SE +/- 0.01, N = 3)
m6i.24xlarge: 37.22 (SE +/- 0.01, N = 3)

1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi
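
Each result in this export is reported with an "SE +/-" figure and a run count N. Assuming SE denotes the standard error of the mean, the short sketch below shows how to express it as a percentage of the result and how to back out an approximate sample standard deviation. The helper names are hypothetical and are not part of the Phoronix Test Suite.

```python
from math import sqrt


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the reported mean."""
    return 100.0 * se / mean


def approx_stddev(se, n):
    """Sample standard deviation implied by SE = s / sqrt(N)."""
    return se * sqrt(n)


if __name__ == "__main__":
    # HPCG values from the result above: (mean, SE, N) per system.
    hpcg = {
        "m5.24xlarge": (26.89, 0.17, 3),
        "m6g.metal": (21.46, 0.01, 3),
        "m6i.24xlarge": (37.22, 0.01, 3),
    }
    for system, (mean, se, n) in hpcg.items():
        print(system,
              "relative SE %:", round(relative_se_percent(mean, se), 3),
              "approx stddev:", round(approx_stddev(se, n), 3))
```

A larger N in later results (12 or 15 instead of 3) goes together with a noticeably higher SE on the same system, so the relative SE is a quick way to spot the noisier measurements.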

NAS Parallel Benchmarks

Test / Class: BT.C

NAS Parallel Benchmarks 3.4, Test / Class: BT.C (Total Mop/s, More Is Better)

m5.24xlarge:  104533.15 (SE +/- 119.93, N = 3)
m6g.metal:    24464.82 (SE +/- 12.96, N = 3)
m6i.24xlarge: 136431.11 (SE +/- 147.64, N = 3)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

NAS Parallel Benchmarks

Test / Class: CG.C

NAS Parallel Benchmarks 3.4, Test / Class: CG.C (Total Mop/s, More Is Better)

m5.24xlarge:  30206.03 (SE +/- 40.34, N = 3)
m6g.metal:    13438.71 (SE +/- 27.23, N = 3)
m6i.24xlarge: 33146.76 (SE +/- 54.87, N = 3)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

NAS Parallel Benchmarks

Test / Class: EP.C

NAS Parallel Benchmarks 3.4, Test / Class: EP.C (Total Mop/s, More Is Better)

m5.24xlarge:  4777.13 (SE +/- 188.76, N = 15)
m6g.metal:    2218.08 (SE +/- 9.83, N = 3)
m6i.24xlarge: 6426.32 (SE +/- 82.22, N = 3)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

NAS Parallel Benchmarks

Test / Class: EP.D

NAS Parallel Benchmarks 3.4, Test / Class: EP.D (Total Mop/s, More Is Better)

m5.24xlarge:  4875.47 (SE +/- 257.35, N = 12)
m6g.metal:    2233.14 (SE +/- 1.71, N = 3)
m6i.24xlarge: 6752.38 (SE +/- 78.27, N = 15)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

NAS Parallel Benchmarks

Test / Class: FT.C

NAS Parallel Benchmarks 3.4, Test / Class: FT.C (Total Mop/s, More Is Better)

m5.24xlarge:  50800.74 (SE +/- 441.43, N = 15)
m6g.metal:    21850.28 (SE +/- 2.47, N = 3)
m6i.24xlarge: 70031.71 (SE +/- 54.84, N = 3)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

NAS Parallel Benchmarks

Test / Class: MG.C

NAS Parallel Benchmarks 3.4, Test / Class: MG.C (Total Mop/s, More Is Better)

m5.24xlarge:  65732.22 (SE +/- 462.23, N = 3)
m6g.metal:    25872.77 (SE +/- 37.90, N = 3)
m6i.24xlarge: 88248.73 (SE +/- 6.50, N = 3)

1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
2. Open MPI 4.0.3

miniFE

Problem Size: Small

miniFE 2.2, Problem Size: Small (CG Mflops, More Is Better)

m5.24xlarge:  14007.1 (SE +/- 284.82, N = 15)
m6g.metal:    23848.2 (SE +/- 5.77, N = 3)
m6i.24xlarge: 19946.4 (SE +/- 817.97, N = 15)

1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

Pennant

Test: sedovbig

Pennant 1.0.1, Test: sedovbig (Hydro Cycle Time - Seconds, Fewer Is Better)

m5.24xlarge:  25.22 (SE +/- 0.03, N = 3)
m6g.metal:    15.41 (SE +/- 0.00, N = 3)
m6i.24xlarge: 17.25 (SE +/- 0.01, N = 3)

1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Pennant

Test: leblancbig

Pennant 1.0.1, Test: leblancbig (Hydro Cycle Time - Seconds, Fewer Is Better)

m5.24xlarge:  10.030260 (SE +/- 0.019405, N = 3)
m6g.metal:    11.297260 (SE +/- 0.003475, N = 3)
m6i.24xlarge: 6.413928 (SE +/- 0.016794, N = 3)

1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Xcompact3d Incompact3d

Input: input.i3d 129 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 129 Cells Per Direction (Seconds, Fewer Is Better)

m5.24xlarge:  4.76513163 (SE +/- 0.01271715, N = 3)
m6g.metal:    5.18380547 (SE +/- 0.00359720, N = 3)
m6i.24xlarge: 3.49298970 (SE +/- 0.02606307, N = 3)

1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 193 Cells Per Direction (Seconds, Fewer Is Better)

m5.24xlarge:  21.47 (SE +/- 0.11, N = 3)
m6g.metal:    23.25 (SE +/- 0.02, N = 3)
m6i.24xlarge: 14.91 (SE +/- 0.08, N = 3)

1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

LULESH

LULESH 2.0.3 (z/s, More Is Better)

m5.24xlarge:  16272.59 (SE +/- 6.88, N = 3)
m6g.metal:    16867.37 (SE +/- 6.43, N = 3)
m6i.24xlarge: 22519.12 (SE +/- 64.35, N = 3)

1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

Coremark

CoreMark Size 666 - Iterations Per Second

Coremark 1.0, CoreMark Size 666 - Iterations Per Second (Iterations/Sec, More Is Better)

m5.24xlarge:  1451630.52 (SE +/- 5103.90, N = 3)
m6g.metal:    1236555.80 (SE +/- 279.74, N = 3)
m6i.24xlarge: 1607068.54 (SE +/- 8144.03, N = 3)

1. (CC) gcc options: -O2 -lrt" -lrt

Stockfish

Total Time

Stockfish 13, Total Time (Nodes Per Second, More Is Better)

m5.24xlarge:  105658561 (SE +/- 1176206.14, N = 15)
m6g.metal:    96657449 (SE +/- 692846.14, N = 3)
m6i.24xlarge: 136790816 (SE +/- 185784.28, N = 3)

Additional per-system flags as exported (two sets, both x86-specific):
- -m64 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -msse4.1 -mssse3 -msse2 -mbmi2
- -m64 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2

1. (CXX) g++ options: -lgcov -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -flto -flto=jobserver

asmFish

1024 Hash Memory, 26 Depth

asmFish 2018-07-23, 1024 Hash Memory, 26 Depth (Nodes/second, More Is Better)

m5.24xlarge:  115160185 (SE +/- 806502.14, N = 12)
m6g.metal:    104868482 (SE +/- 1056350.97, N = 3)
m6i.24xlarge: 136656900 (SE +/- 1425879.59, N = 3)

POV-Ray

Trace Time

POV-Ray 3.7.0.7, Trace Time (Seconds, Fewer Is Better)

m5.24xlarge:  42.96 (SE +/- 4.19, N = 15)
m6g.metal:    57.44 (SE +/- 0.92, N = 15)
m6i.24xlarge: 10.63 (SE +/- 0.06, N = 3)

Additional per-system flag as exported (listed twice): -march=native

1. (CXX) g++ options: -pipe -O3 -ffast-math -pthread -lSM -lICE -lX11 -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system

m-queens

Time To Solve

m-queens 1.2, Time To Solve (Seconds, Fewer Is Better)

m5.24xlarge:  22.34 (SE +/- 0.13, N = 3)
m6g.metal:    19.43 (SE +/- 0.01, N = 3)
m6i.24xlarge: 16.07 (SE +/- 0.19, N = 3)

1. (CXX) g++ options: -fopenmp -O2 -march=native

N-Queens

Elapsed Time

N-Queens 1.0, Elapsed Time (Seconds, Fewer Is Better)

m5.24xlarge:  3.892 (SE +/- 0.028, N = 3)
m6g.metal:    3.761 (SE +/- 0.001, N = 3)
m6i.24xlarge: 3.144 (SE +/- 0.041, N = 3)

1. (CC) gcc options: -static -fopenmp -O3 -march=native

TNN

Target: CPU - Model: DenseNet

TNN 0.3, Target: CPU - Model: DenseNet (ms, Fewer Is Better)

m5.24xlarge:  3797.59 (SE +/- 2.50, N = 3; MIN: 3754.49 / MAX: 4081.73)
m6g.metal:    3288.83 (SE +/- 4.12, N = 3; MIN: 3237.1 / MAX: 3327.59)
m6i.24xlarge: 3522.58 (SE +/- 3.59, N = 3; MIN: 3492.16 / MAX: 3621.39)

1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: MobileNet v2

TNN 0.3, Target: CPU - Model: MobileNet v2 (ms, Fewer Is Better)

m5.24xlarge:  426.09 (SE +/- 0.52, N = 3; MIN: 422.84 / MAX: 477.54)
m6g.metal:    365.84 (SE +/- 0.26, N = 3; MIN: 364.34 / MAX: 367.32)
m6i.24xlarge: 350.38 (SE +/- 0.24, N = 3; MIN: 348.46 / MAX: 395.91)

1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v2

TNN 0.3, Target: CPU - Model: SqueezeNet v2 (ms, Fewer Is Better)

m5.24xlarge:  93.27 (SE +/- 0.01, N = 3; MIN: 93.03 / MAX: 93.7)
m6g.metal:    105.07 (SE +/- 0.14, N = 3; MIN: 104.55 / MAX: 105.78)
m6i.24xlarge: 70.93 (SE +/- 0.53, N = 3; MIN: 70.08 / MAX: 72.98)

1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v1.1

TNN 0.3, Target: CPU - Model: SqueezeNet v1.1 (ms, Fewer Is Better)

m5.24xlarge:  394.51 (SE +/- 0.14, N = 3; MIN: 393.65 / MAX: 397.65)
m6g.metal:    341.32 (SE +/- 0.83, N = 3; MIN: 338.67 / MAX: 344.11)
m6i.24xlarge: 357.72 (SE +/- 0.23, N = 3; MIN: 356.98 / MAX: 361.71)

1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

Facebook RocksDB

Test: Random Read

Facebook RocksDB 6.22.1, Test: Random Read (Op/s, More Is Better)

m5.24xlarge:  194576074 (SE +/- 128495.22, N = 3)
m6g.metal:    270332614 (SE +/- 1242136.86, N = 3)
m6i.24xlarge: 231109408 (SE +/- 1027109.62, N = 3)

1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread


Phoronix Test Suite v10.8.4