Amazon EC2 benchmarking for a future article.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
    phoronix-test-suite benchmark 2112046-TJ-2108199TJ92

Intel M6i Ice Lake vs. Graviton2 Amazon EC2 Benchmarks
HTML result view exported from: https://openbenchmarking.org/result/2112046-TJ-2108199TJ92&sro&grt

System Details
Only the fields listed under each system appear in the export.

  m6g.metal
    Processor:     ARMv8 Neoverse-N1 (64 Cores)
    Motherboard:   Amazon EC2 m6g.metal v1.0
    Memory:        252GB
    Disk:          107GB Amazon Elastic Block Store
    Network:       Amazon Elastic
    OS:            Ubuntu 20.04
    Kernel:        5.4.0-1045-aws (aarch64)
    Vulkan:        1.0.2
    Compiler:      GCC 9.3.0
    File-System:   ext4

  m5.24xlarge
    Processor:     2 x Intel Xeon Platinum 8259CL (48 Cores / 96 Threads)
    Motherboard:   Amazon EC2 m5.24xlarge (1.0 BIOS)
    Chipset:       Intel 440FX 82441FX PMC
    Memory:        374GB
    Kernel:        5.4.0-1045-aws (x86_64)
    System Layer:  KVM

  m6i.24xlarge
    Processor:     2 x Intel Xeon Platinum 8375C (48 Cores / 96 Threads)
    Motherboard:   Amazon EC2 m6i.24xlarge (1.0 BIOS)
    Memory:        372GB

  m6i.32xlarge
    Processor:     2 x Intel Xeon Platinum 8375C (64 Cores / 128 Threads)
    Motherboard:   Amazon EC2 m6i.32xlarge (1.0 BIOS)
    Memory:        496GB

  m6a.24xlarge
    Processor:     AMD EPYC 7R13 (48 Cores / 96 Threads)
    Motherboard:   Amazon EC2 m6a.24xlarge (1.0 BIOS)
    Memory:        370GB
    Kernel:        5.11.0-1020-aws (x86_64)

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - m6g.metal: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
  - m5.24xlarge, m6i.24xlarge, m6i.32xlarge, m6a.24xlarge: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Java Details
  - m6g.metal, m5.24xlarge: OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)

Security Details
  - m6g.metal: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
  - m5.24xlarge: itlb_multihit: KVM: Vulnerable + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected
  - m6i.24xlarge, m6i.32xlarge: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
  - m6a.24xlarge: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional IBRS_FW STIBP: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Processor Details
  - m5.24xlarge: CPU Microcode: 0x5003005
  - m6i.24xlarge: CPU Microcode: 0xd0002b1
  - m6i.32xlarge: CPU Microcode: 0xd0002b1
  - m6a.24xlarge: CPU Microcode: 0xa001143

Result Summary

Test                                                  m6g.metal       m5.24xlarge     m6i.24xlarge    m6i.32xlarge    m6a.24xlarge
asmfish: 1024 Hash Memory, 26 Depth                   104868482       115160185       136656900       169329043       130814939
coremark: CoreMark Size 666 - Iterations Per Second   1236555.803752  1451630.519049  1607068.543334  2128843.094210  1850380.040001
rocksdb: Rand Read                                    270332614       194576074       231109408       298073130       269589717
hpcg                                                  21.4570         26.8884         37.2245         39.1328         17.0281
lulesh                                                16867.370       16272.587       22519.115       35739.821       16745.120
m-queens: Time To Solve                               19.430          22.341          16.068          12.334          12.334
minife: Small                                         23848.2         14007.1         19946.4         18797.6         11197.8
n-queens: Elapsed Time                                3.761           3.892           3.144           2.312           2.382
npb: BT.C                                             24464.82        104533.15       136431.11       202455.31       92962.40
npb: CG.C                                             13438.71        30206.03        33146.76        38736.50        23714.82
npb: EP.C                                             2218.08         4777.13         6426.32         8107.02         2759.76
npb: EP.D                                             2233.14         4875.47         6752.38         8765.66         2780.04
npb: FT.C                                             21850.28        50800.74        70031.71        102661.18       54008.74
npb: MG.C                                             25872.77        65732.22        88248.73        117771.92       49638.45
pennant: sedovbig                                     15.41301        25.22237        17.24513        15.14226        12.27951
pennant: leblancbig                                   11.29726        10.03026        6.413928        5.105541        7.508991
povray: Trace Time                                    57.439          42.964          10.631          9.015           10.726
stockfish: Total Time                                 96657449        105658561       136790816       169762583       127232305
tnn: CPU - DenseNet                                   3288.829        3797.589        3522.581        3524.690        2866.455
tnn: CPU - MobileNet v2                               365.839         426.094         350.378         349.167         312.780
tnn: CPU - SqueezeNet v2                              105.072         93.266          70.932          70.518          78.819
tnn: CPU - SqueezeNet v1.1                            341.315         394.508         357.721         357.689         282.809
incompact3d: input.i3d 129 Cells Per Direction        5.18380547      4.76513163      3.49298970      2.95411030      5.42248488
incompact3d: input.i3d 193 Cells Per Direction        23.2480348      21.4682878      14.9058580      12.5943028      24.9592756
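
The summary mixes "More Is Better" and "Fewer Is Better" tests, so raw numbers cannot be compared across rows without accounting for direction. The sketch below is one way a reader might turn a few rows into relative-performance ratios between two systems and average them with a geometric mean. It is an illustration only, not part of the Phoronix Test Suite: the function names `speedup` and `geometric_mean` and the choice of four tests are assumptions made here; the numbers are copied from the summary.

```python
from math import prod

# Values copied from the summary: (m6g.metal, m6i.32xlarge, higher_is_better).
RESULTS = {
    "coremark (iterations/sec)": (1236555.803752, 2128843.094210, True),
    "npb BT.C (total Mop/s)": (24464.82, 202455.31, True),
    "minife Small (CG Mflops)": (23848.2, 18797.6, True),
    "povray (seconds)": (57.439, 9.015, False),
}


def speedup(baseline, candidate, higher_is_better):
    """Return how many times better the candidate is; > 1 favors the candidate."""
    if higher_is_better:
        return candidate / baseline
    return baseline / candidate


def geometric_mean(values):
    """Geometric mean, a common way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


# Hypothetical comparison: m6i.32xlarge relative to m6g.metal.
ratios = {name: speedup(b, c, hib) for name, (b, c, hib) in RESULTS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean over {len(ratios)} tests: {geometric_mean(ratios.values()):.2f}x")
```

A ratio below 1 (as for miniFE here) means the baseline system was faster on that test. The average depends heavily on which tests are included, so it should be read as a rough indicator only.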

asmFish 2018-07-23: 1024 Hash Memory, 26 Depth
Nodes/second, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   115160185     806502.14     12
  m6a.24xlarge  130814939     280698.15     3
  m6g.metal     104868482     1056350.97    3
  m6i.24xlarge  136656900     1425879.59    3
  m6i.32xlarge  169329043     870392.07     3
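
Each result is reported with a standard error (SE) and a run count (N). As a rough aid for judging run-to-run noise, the sketch below converts SE into a percentage of the mean and recovers the sample standard deviation. It assumes the conventional definition SE = s / sqrt(N); the export does not state how SE was computed, and the helper names are made up for this illustration. The figures are the asmFish results above.

```python
from math import sqrt


def relative_standard_error(mean, standard_error):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * standard_error / mean


def sample_std_dev(standard_error, n):
    """Invert SE = s / sqrt(N) to recover s (assumes that definition of SE)."""
    return standard_error * sqrt(n)


# asmFish results: system -> (mean nodes/second, SE, N)
ASMFISH = {
    "m5.24xlarge": (115160185, 806502.14, 12),
    "m6a.24xlarge": (130814939, 280698.15, 3),
    "m6g.metal": (104868482, 1056350.97, 3),
    "m6i.24xlarge": (136656900, 1425879.59, 3),
    "m6i.32xlarge": (169329043, 870392.07, 3),
}

for system, (mean, se, n) in ASMFISH.items():
    rse = relative_standard_error(mean, se)
    print(f"{system}: SE = {rse:.2f}% of mean over N = {n} runs")
```

Comparing this percentage with the gap between two systems gives a quick sense of whether a difference is likely to be larger than the measurement noise.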

Coremark 1.0: CoreMark Size 666 - Iterations Per Second
Iterations/Sec, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   1451630.52    5103.90       3
  m6a.24xlarge  1850380.04    16150.98      3
  m6g.metal     1236555.80    279.74        3
  m6i.24xlarge  1607068.54    8144.03       3
  m6i.32xlarge  2128843.09    2056.46       3

  1. (CC) gcc options: -O2 -lrt" -lrt
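
The instances differ in core count, so whole-instance throughput and per-core throughput can rank them differently. The sketch below divides the CoreMark results by the core counts listed in the processor descriptions. It assumes those counts are per-instance totals of physical cores; the x86 instances also expose two threads per core, which this simple division ignores. The dictionary and variable names are illustrative only.

```python
# system -> (CoreMark iterations/sec, cores as listed in the processor description)
COREMARK = {
    "m6g.metal": (1236555.80, 64),
    "m5.24xlarge": (1451630.52, 48),
    "m6i.24xlarge": (1607068.54, 48),
    "m6i.32xlarge": (2128843.09, 64),
    "m6a.24xlarge": (1850380.04, 48),
}

# Iterations per second per listed core (assumption: counts are physical-core totals).
per_core = {system: score / cores for system, (score, cores) in COREMARK.items()}

for system, value in sorted(per_core.items(), key=lambda item: item[1], reverse=True):
    print(f"{system}: {value:,.0f} iterations/sec per core")
```

This is a coarse normalization: it says nothing about price, memory bandwidth, or how well a given workload scales across cores.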

Facebook RocksDB 6.22.1: Test: Random Read
Op/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   194576074     128495.22     3
  m6a.24xlarge  269589717     749179.54     3
  m6g.metal     270332614     1242136.86    3
  m6i.24xlarge  231109408     1027109.62    3
  m6i.32xlarge  298073130     2848830.44    3

  1. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread

High Performance Conjugate Gradient 3.1
GFLOP/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   26.89         0.17          3
  m6a.24xlarge  17.03         0.01          3
  m6g.metal     21.46         0.01          3
  m6i.24xlarge  37.22         0.01          3
  m6i.32xlarge  39.13         0.13          3

  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi

LULESH 2.0.3
z/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   16272.59      6.88          3
  m6a.24xlarge  16745.12      110.64        3
  m6g.metal     16867.37      6.43          3
  m6i.24xlarge  22519.12      64.35         3
  m6i.32xlarge  35739.82      50.85         3

  1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

m-queens 1.2: Time To Solve
Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   22.34         0.13          3
  m6a.24xlarge  12.33         0.02          3
  m6g.metal     19.43         0.01          3
  m6i.24xlarge  16.07         0.19          3
  m6i.32xlarge  12.33         0.14          4

  1. (CXX) g++ options: -fopenmp -O2 -march=native

miniFE 2.2: Problem Size: Small
CG Mflops, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   14007.1       284.82        15
  m6a.24xlarge  11197.8       149.74        15
  m6g.metal     23848.2       5.77          3
  m6i.24xlarge  19946.4       817.97        15
  m6i.32xlarge  18797.6       590.35        15

  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

N-Queens 1.0: Elapsed Time
Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   3.892         0.028         3
  m6a.24xlarge  2.382         0.007         3
  m6g.metal     3.761         0.001         3
  m6i.24xlarge  3.144         0.041         3
  m6i.32xlarge  2.312         0.052         15

  1. (CC) gcc options: -static -fopenmp -O3 -march=native

NAS Parallel Benchmarks 3.4: Test / Class: BT.C
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   104533.15     119.93        3
  m6a.24xlarge  92962.40      140.47        3
  m6g.metal     24464.82      12.96         3
  m6i.24xlarge  136431.11     147.64        3
  m6i.32xlarge  202455.31     156.09        3

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

NAS Parallel Benchmarks 3.4: Test / Class: CG.C
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   30206.03      40.34         3
  m6a.24xlarge  23714.82      167.20        3
  m6g.metal     13438.71      27.23         3
  m6i.24xlarge  33146.76      54.87         3
  m6i.32xlarge  38736.50      259.41        3

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

NAS Parallel Benchmarks 3.4: Test / Class: EP.C
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   4777.13       188.76        15
  m6a.24xlarge  2759.76       6.49          3
  m6g.metal     2218.08       9.83          3
  m6i.24xlarge  6426.32       82.22         3
  m6i.32xlarge  8107.02       60.61         11

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

NAS Parallel Benchmarks 3.4: Test / Class: EP.D
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   4875.47       257.35        12
  m6a.24xlarge  2780.04       4.54          3
  m6g.metal     2233.14       1.71          3
  m6i.24xlarge  6752.38       78.27         15
  m6i.32xlarge  8765.66       77.50         15

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

NAS Parallel Benchmarks 3.4: Test / Class: FT.C
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   50800.74      441.43        15
  m6a.24xlarge  54008.74      84.37         3
  m6g.metal     21850.28      2.47          3
  m6i.24xlarge  70031.71      54.84         3
  m6i.32xlarge  102661.18     1220.06       4

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

NAS Parallel Benchmarks 3.4: Test / Class: MG.C
Total Mop/s, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   65732.22      462.23        3
  m6a.24xlarge  49638.45      36.70         3
  m6g.metal     25872.77      37.90         3
  m6i.24xlarge  88248.73      6.50          3
  m6i.32xlarge  117771.92     270.09        3

  1. (F9X) gfortran options: -O3 -march=native -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi
  2. Open MPI 4.0.3

Pennant 1.0.1: Test: sedovbig
Hydro Cycle Time - Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   25.22         0.03          3
  m6a.24xlarge  12.28         0.04          3
  m6g.metal     15.41         0.00          3
  m6i.24xlarge  17.25         0.01          3
  m6i.32xlarge  15.14         0.01          3

  1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Pennant 1.0.1: Test: leblancbig
Hydro Cycle Time - Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   10.030260     0.019405      3
  m6a.24xlarge  7.508991      0.005942      3
  m6g.metal     11.297260     0.003475      3
  m6i.24xlarge  6.413928      0.016794      3
  m6i.32xlarge  5.105541      0.014585      3

  1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

POV-Ray 3.7.0.7: Trace Time
Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   42.964        4.195         15
  m6a.24xlarge  10.726        0.016         3
  m6g.metal     57.439        0.915         15
  m6i.24xlarge  10.631        0.059         3
  m6i.32xlarge  9.015         0.105         3

  Additional per-system flags, as listed in the export: -march=native -lSM -lICE -lX11 -march=native -lSM -lICE -lX11 -march=native -lSM -lICE -lX11 -march=native
  1. (CXX) g++ options: -pipe -O3 -ffast-math -pthread -ltiff -ljpeg -lpng -lz -lrt -lm -lboost_thread -lboost_system

Stockfish 13: Total Time
Nodes Per Second, More Is Better

  System        Result        SE +/-        N
  m5.24xlarge   105658561     1176206.14    15
  m6a.24xlarge  127232305     1261758.38    6
  m6g.metal     96657449      692846.14     3
  m6i.24xlarge  136790816     185784.28     3
  m6i.32xlarge  169762583     554216.71     3

  Additional per-system flags, as listed in the export:
    -m64 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -msse4.1 -mssse3 -msse2 -mbmi2
    -m64 -msse -msse3 -mpopcnt -mavx2 -msse4.1 -mssse3 -msse2
    -m64 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2
    -m64 -msse -msse3 -mpopcnt -mavx2 -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2
  1. (CXX) g++ options: -lgcov -lpthread -fno-exceptions -std=c++17 -fprofile-use -fno-peel-loops -fno-tracer -pedantic -O3 -flto -flto=jobserver

TNN 0.3: Target: CPU - Model: DenseNet
ms, Fewer Is Better

  System        Result        SE +/-        N     MIN         MAX
  m5.24xlarge   3797.59       2.50          3     3754.49     4081.73
  m6a.24xlarge  2866.46       1.59          3     2825.11     2989.74
  m6g.metal     3288.83       4.12          3     3237.1      3327.59
  m6i.24xlarge  3522.58       3.59          3     3492.16     3621.39
  m6i.32xlarge  3524.69       0.73          3     3482.31     3882.56

  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN 0.3: Target: CPU - Model: MobileNet v2
ms, Fewer Is Better

  System        Result        SE +/-        N     MIN         MAX
  m5.24xlarge   426.09        0.52          3     422.84      477.54
  m6a.24xlarge  312.78        1.81          3     309.33      382.03
  m6g.metal     365.84        0.26          3     364.34      367.32
  m6i.24xlarge  350.38        0.24          3     348.46      395.91
  m6i.32xlarge  349.17        0.19          3     346.78      378.54

  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN 0.3: Target: CPU - Model: SqueezeNet v2
ms, Fewer Is Better

  System        Result        SE +/-        N     MIN         MAX
  m5.24xlarge   93.27         0.01          3     93.03       93.7
  m6a.24xlarge  78.82         0.09          3     78.48       84.22
  m6g.metal     105.07        0.14          3     104.55      105.78
  m6i.24xlarge  70.93         0.53          3     70.08       72.98
  m6i.32xlarge  70.52         0.10          3     70.09       71.46

  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN 0.3: Target: CPU - Model: SqueezeNet v1.1
ms, Fewer Is Better

  System        Result        SE +/-        N     MIN         MAX
  m5.24xlarge   394.51        0.14          3     393.65      397.65
  m6a.24xlarge  282.81        0.32          3     281.46      315.41
  m6g.metal     341.32        0.83          3     338.67      344.11
  m6i.24xlarge  357.72        0.23          3     356.98      361.71
  m6i.32xlarge  357.69        0.00          3     357.05      360.09

  1. (CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

Xcompact3d Incompact3d 2021-03-11: Input: input.i3d 129 Cells Per Direction
Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   4.76513163    0.01271715    3
  m6a.24xlarge  5.42248488    0.01191253    3
  m6g.metal     5.18380547    0.00359720    3
  m6i.24xlarge  3.49298970    0.02606307    3
  m6i.32xlarge  2.95411030    0.01051693    3

  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Xcompact3d Incompact3d 2021-03-11: Input: input.i3d 193 Cells Per Direction
Seconds, Fewer Is Better

  System        Result        SE +/-        N
  m5.24xlarge   21.47         0.11          3
  m6a.24xlarge  24.96         0.25          3
  m6g.metal     23.25         0.02          3
  m6i.24xlarge  14.91         0.08          3
  m6i.32xlarge  12.59         0.02          3

  1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Phoronix Test Suite v10.8.4