AARCH64 codegen comparison update

gcc7's performance on Cortex A53 (32kB L1)

HTML result view exported from: https://openbenchmarking.org/result/1703238-RI-GCCLATEST31&grr.

System Details

Configurations compared (10):
  A53 vectorize, pre-patch
  thunderx/vectorize, pre-patch
  A53 vectorize/LTO, pre patch
  A53, post patch
  A53 mtune/vectorize, post-patch
  A53 vectorize, updated
  A53 vectorize, earlier build
  A57 vectorize/unrolled GCC 7.0.1
  A53 vectorize GCC 7.0.1
  A57 vectorize/unrolled GCC 6.3

Values recorded across the ten configurations, in order of first appearance:
  Processor:          AArch64 rev 4 @ 1.50GHz (4 Cores); AArch64 rev 4 @ 1.55GHz (4 Cores); Unknown @ 1.54GHz (4 Cores)
  Motherboard:        Amlogic
  Memory:             2048MB
  Disk:               32GB 00000 + 16GB NCard; 16GB NCard + 32GB 00000; 8GB NCard + 32GB 00000
  OS:                 Ubuntu 16.04
  Kernel:             3.14.29 (aarch64); 3.14.79-vegas95 (aarch64)
  Compiler:           GCC 7.0.0 20170110; GCC 7.0.0 20170113; GCC 7.0.1 20170214; GCC 7.0.1 20170220; GCC 7.0.1 20170322; GCC 6.3.1 20170316 (each + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0)
  File-System:        ext4
  Screen Resolution:  1920x3240; 1280x1440

Compiler Details
- A53 vectorize, pre-patch; thunderx/vectorize, pre-patch; A53 vectorize/LTO, pre patch; A53, post patch; A53 mtune/vectorize, post-patch; A53 vectorize, updated; A53 vectorize, earlier build:
  --build=aarch64-linux-gnu --disable-browser-plugin --disable-libquadmath --disable-werror --enable-checking=release --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-arch-directory=aarch64 --with-default-libstdcxx-abi=new
- A57 vectorize/unrolled GCC 7.0.1; A53 vectorize GCC 7.0.1; A57 vectorize/unrolled GCC 6.3:
  --build=aarch64-linux-gnu --disable-bootstrap --disable-browser-plugin --disable-libquadmath --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-arch-directory=aarch64 --with-default-libstdcxx-abi=new

Disk Details
- A53 vectorize, pre-patch; thunderx/vectorize, pre-patch; A53 vectorize/LTO, pre patch; A53, post patch; A53 mtune/vectorize, post-patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A53 vectorize, updated: DEADLINE / commit=45,errors=remount-ro,noatime,nodiratime,rw
- A53 vectorize, earlier build: CFQ / commit=45,errors=remount-ro,noatime,nodiratime,rw
- A57 vectorize/unrolled GCC 7.0.1; A53 vectorize GCC 7.0.1; A57 vectorize/unrolled GCC 6.3: CFQ / commit=120,errors=remount-ro,noatime,nodiratime,rw

Processor Details
- A53 mtune/vectorize, post-patch: Scaling Governor: meson_cpufreq interactive
- All other configurations: Scaling Governor: meson_cpufreq performance

Results Overview

Each test lists one value per configuration, in this order:
(1) A53 vectorize, pre-patch; (2) thunderx/vectorize, pre-patch; (3) A53 vectorize/LTO, pre patch; (4) A53, post patch; (5) A53 mtune/vectorize, post-patch; (6) A53 vectorize, updated; (7) A53 vectorize, earlier build; (8) A57 vectorize/unrolled GCC 7.0.1; (9) A53 vectorize GCC 7.0.1; (10) A57 vectorize/unrolled GCC 6.3

                                                            (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)       (10)
redis: GET
                                                      310344.73  318926.02  311785.02  309030.64  313438.91  277268.23  283742.83  276169.86  276298.44  275458.70
openssl: RSA 4096-bit Performance
                                                          21.50      21.50      21.50      21.50      21.50      21.40      21.40      21.47      21.47      21.20
tachyon: Total Time
                                                          69.27      71.41      67.64      69.40      69.34      69.90      69.39          -      69.64      72.28
sudokut: Total Time
                                                         101.95     102.75     101.75     101.88     102.17     102.72     102.59     103.04     102.97     103.00
smallpt: Global Illumination Renderer; 100 Samples
                                                            167        167        168        168        167        166        166        168        166        169
primesieve: 1e12 Prime Number Generation
                                                         543.16     566.21     540.95     553.13     573.13     574.65     523.43     525.00     547.12     531.95
c-ray: Total Time
                                                         187.97     149.82     184.81     186.69     186.61     161.80     162.23     154.81     151.02     149.61
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping
                                                          23.16      23.01      23.77      23.47      23.49      23.29      23.49      23.56      23.13      22.29
fhourstones: Complex Connect-4 Solving
                                                        3212.10    3210.20    3213.77    3209.67    3205.40    3223.57    3233.77    3398.47    3415.07    3325.60
gmpbench: Total Time
                                                         552.84     554.83     554.37     552.56     555.10     554.11     554.21     553.17     552.75     554.03
mafft: Multiple Sequence Alignment
                                                          35.42      34.46      33.16      33.06      32.17      33.90      34.52      34.22      35.52      35.47
fftw: Stock - 2D FFT Size 2048
                                                         196.90     190.63     180.53     186.21     184.81     185.15     191.54     173.03     156.72     157.97
ramspeed: Copy - Floating Point
                                                        4580.39    2817.45    4825.13    4964.66    4965.60    4785.59    4816.71    4193.97    4188.18    4388.73
ramspeed: Copy - Integer
                                                        4581.32    2821.43    4829.91    4965.06    4955.97    4706.40    4816.69    4201.53    4161.09    4384.53
postmark: Disk Transaction Performance
                                                           1363       1351       1378       1381       1378       1217       1211       1184       1194       1190

Redis

Test: GET

Requests Per Second, More Is Better (Redis 3.0.1, Test: GET)

Configuration                        Result      SE +/-   N  Flags
A53 vectorize, pre-patch             310344.73   4662.92  6  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch        318926.02   2784.59  3  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch         311785.02   2239.53  3  -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                      309030.64   1052.91  3  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch      313438.91   1967.34  3  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated               277268.23   2017.17  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build         283742.83    419.32  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1     276169.86    649.43  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1              276298.44   2031.32  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3       275458.70   3267.58  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O2 -fomit-frame-pointer -fipa-pta -march=armv8-a+crc
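
As a rough aid for reading these figures, the sketch below computes the relative change between two of the Redis GET results above. The function name `percent_change` is illustrative only and is not part of Phoronix Test Suite; the two input values are copied from the results.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Return the change from baseline to candidate as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Redis GET, requests per second (more is better), taken from the results above.
pre_patch = 310344.73  # A53 vectorize, pre-patch
updated = 277268.23    # A53 vectorize, updated

# A negative value means the candidate handled fewer requests per second.
print(f"{percent_change(pre_patch, updated):.2f}%")
```

For "fewer is better" tests such as C-Ray, the sign reads the other way: a negative change is an improvement.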

OpenSSL

RSA 4096-bit Performance

Signs Per Second, More Is Better (OpenSSL 1.0.1g, RSA 4096-bit Performance)

Configuration                        Result   SE +/-  N
A53 vectorize, pre-patch              21.50     0.00  3
thunderx/vectorize, pre-patch         21.50     0.00  3
A53 vectorize/LTO, pre patch          21.50     0.00  3
A53, post patch                       21.50     0.00  3
A53 mtune/vectorize, post-patch       21.50     0.00  3
A53 vectorize, updated                21.40     0.00  3
A53 vectorize, earlier build          21.40     0.00  3
A57 vectorize/unrolled GCC 7.0.1      21.47     0.03  3
A53 vectorize GCC 7.0.1               21.47     0.03  3
A57 vectorize/unrolled GCC 6.3        21.20     0.00  3

1. (CC) gcc options: -O3 -fomit-frame-pointer -lssl -lcrypto -ldl

Tachyon

Total Time

Seconds, Fewer Is Better (Tachyon 0.98.9, Total Time)

Configuration                        Result   SE +/-  N
A53 vectorize, pre-patch              69.27     0.08  3
thunderx/vectorize, pre-patch         71.41     0.06  3
A53 vectorize/LTO, pre patch          67.64     0.11  3
A53, post patch                       69.40     0.12  3
A53 mtune/vectorize, post-patch       69.34     0.10  3
A53 vectorize, updated                69.90     0.29  3
A53 vectorize, earlier build          69.39     0.22  3
A53 vectorize GCC 7.0.1               69.64     0.17  3
A57 vectorize/unrolled GCC 6.3        72.28     0.03  3

No result is listed for A57 vectorize/unrolled GCC 7.0.1.

Sudokut

Total Time

Seconds, Fewer Is Better (Sudokut 0.4, Total Time)

Configuration                        Result   SE +/-  N
A53 vectorize, pre-patch             101.95     0.20  3
thunderx/vectorize, pre-patch        102.75     0.76  3
A53 vectorize/LTO, pre patch         101.75     0.21  3
A53, post patch                      101.88     0.10  3
A53 mtune/vectorize, post-patch      102.17     0.09  3
A53 vectorize, updated               102.72     0.20  3
A53 vectorize, earlier build         102.59     0.06  3
A57 vectorize/unrolled GCC 7.0.1     103.04     0.04  3
A53 vectorize GCC 7.0.1              102.97     0.13  3
A57 vectorize/unrolled GCC 6.3       103.00     0.04  3

Smallpt

Global Illumination Renderer; 100 Samples

Seconds, Fewer Is Better (Smallpt 1.0, Global Illumination Renderer; 100 Samples)

Configuration                        Result  Flags
A53 vectorize, pre-patch                167  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch           167  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch            168  -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                         168  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch         167  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated                  166  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build            166  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1        168  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1                 166  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3          169  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

Standard errors as exported (8 entries for 10 results, so they cannot be matched to individual rows): SE +/- 0.00, 0.00, 0.00, 0.00, 0.00, 0.33, 0.33, 2.33; N = 3 for each.

1. (CXX) g++ options: -fopenmp -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

Primesieve

1e12 Prime Number Generation

Seconds, Fewer Is Better (Primesieve 5.4.2, 1e12 Prime Number Generation)

Configuration                        Result   SE +/-  N  Flags
A53 vectorize, pre-patch             543.16     3.01  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch        566.21     2.99  3  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch         540.95     8.42  3  -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                      553.13     9.14  3  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch      573.13     6.92  3  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated               574.65     9.13  4  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build         523.43     4.17  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1     525.00     3.16  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1              547.12     9.38  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3       531.95     1.80  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CXX) g++ options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -fopenmp
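
With N = 3 or 4 runs per configuration, small differences in the Primesieve times may sit within run-to-run noise. The sketch below is one crude way to screen for that: it checks whether the "result +/- SE" ranges of two configurations overlap. The helper `intervals_overlap` and the choice of one standard error (`k=1.0`) are assumptions made for illustration; this is not a significance test and not something Phoronix Test Suite itself reports.

```python
def intervals_overlap(mean_a: float, se_a: float,
                      mean_b: float, se_b: float,
                      k: float = 1.0) -> bool:
    """Return True if [mean - k*se, mean + k*se] of the two results intersect."""
    low_a, high_a = mean_a - k * se_a, mean_a + k * se_a
    low_b, high_b = mean_b - k * se_b, mean_b + k * se_b
    return low_a <= high_b and low_b <= high_a


# Primesieve results in seconds (fewer is better), copied from the results above.
pre_patch = (543.16, 3.01)  # A53 vectorize, pre-patch
lto = (540.95, 8.42)        # A53 vectorize/LTO, pre patch
thunderx = (566.21, 2.99)   # thunderx/vectorize, pre-patch

print(intervals_overlap(*pre_patch, *lto))       # ranges intersect
print(intervals_overlap(*pre_patch, *thunderx))  # ranges are disjoint
```

Overlapping ranges only suggest that a difference is not clearly separated from noise; disjoint ranges at so few runs are still weak evidence.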

C-Ray

Total Time

Seconds, Fewer Is Better (C-Ray 1.1, Total Time)

Configuration                        Result   SE +/-  N  Flags
A53 vectorize, pre-patch             187.97     0.69  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch        149.82     1.37  3  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch         184.81     0.17  3  -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                      186.69     0.14  3  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch      186.61     0.12  3  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated               161.80     0.27  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build         162.23     1.00  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1     154.81     0.78  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1              151.02     0.02  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3       149.61     1.47  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CC) gcc options: -lm -lpthread -O3 -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

FPS, More Is Better (TTSIOD 3D Renderer 2.3a, Phong Rendering With Soft-Shadow Mapping)

Configuration                        Result   SE +/-  N  Flags
A53 vectorize, pre-patch              23.16     0.01  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch         23.01     0.00  3  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch          23.77     0.01  3  -O3 -mcpu=cortex-a53 -ftree-vectorize -ffat-lto-objects
A53, post patch                       23.47     0.02  3  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch       23.49     0.01  3  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated                23.29     0.09  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build          23.49     0.00  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1      23.56     0.01  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1               23.13     0.02  3  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3        22.29     0.01  3  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CXX) g++ options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -ffast-math -mtune=native -flto -lSDL -lstdc++

Fhourstones

Complex Connect-4 Solving

Kpos / sec, More Is Better (Fhourstones 3.1, Complex Connect-4 Solving)

Configuration                         Result   SE +/-  N
A53 vectorize, pre-patch             3212.10     0.35  3
thunderx/vectorize, pre-patch        3210.20     0.76  3
A53 vectorize/LTO, pre patch         3213.77     0.22  3
A53, post patch                      3209.67     1.47  3
A53 mtune/vectorize, post-patch      3205.40     1.81  3
A53 vectorize, updated               3223.57     3.32  3
A53 vectorize, earlier build         3233.77     1.49  3
A57 vectorize/unrolled GCC 7.0.1     3398.47     1.93  3
A53 vectorize GCC 7.0.1              3415.07     2.42  3
A57 vectorize/unrolled GCC 6.3       3325.60     1.97  3

1. (CC) gcc options: -O3

GMPbench

Total Time

GMPbench Score, More Is Better (GMPbench 0.2, Total Time)

Configuration                        Result  Flags
A53 vectorize, pre-patch             552.84  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch        554.83  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch         554.37  -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                      552.56  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch      555.10  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated               554.11  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build         554.21  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1     553.17  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1              552.75  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3       554.03  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CC) gcc options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -lm

Timed MAFFT Alignment

Multiple Sequence Alignment

Seconds, Fewer Is Better (Timed MAFFT Alignment 6.864, Multiple Sequence Alignment)

Configuration                        Result   SE +/-  N
A53 vectorize, pre-patch              35.42     0.80  6
thunderx/vectorize, pre-patch         34.46     0.73  6
A53 vectorize/LTO, pre patch          33.16     0.70  6
A53, post patch                       33.06     0.71  6
A53 mtune/vectorize, post-patch       32.17     0.01  3
A53 vectorize, updated                33.90     0.79  6
A53 vectorize, earlier build          34.52     1.04  6
A57 vectorize/unrolled GCC 7.0.1      34.22     0.71  6
A53 vectorize GCC 7.0.1               35.52     0.97  6
A57 vectorize/unrolled GCC 6.3        35.47     0.08  3

1. (CC) gcc options: -O3 -lm -lpthread

FFTW

Build: Stock - Size: 2D FFT Size 2048

Mflops, More Is Better (FFTW 3.3.4, Build: Stock - Size: 2D FFT Size 2048)

Configuration                        Result   SE +/-  N  Flags
A53 vectorize, pre-patch             196.90     0.99  5  -Ofast -mcpu=cortex-a53 -ftree-vectorize
thunderx/vectorize, pre-patch        190.63     1.10  5  -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize/LTO, pre patch         180.53     0.49  5  -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch                      186.21     0.08  5  -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch      184.81     0.21  5  -Ofast -mtune=cortex-a53 -ftree-vectorize
A53 vectorize, updated               185.15     0.06  5  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize, earlier build         191.54     0.16  5  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 7.0.1     173.03     0.10  5  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops
A53 vectorize GCC 7.0.1              156.72     0.16  5  -Ofast -mcpu=cortex-a53 -ftree-vectorize
A57 vectorize/unrolled GCC 6.3       157.97     0.26  5  -Ofast -mtune=cortex-a57 -ftree-vectorize -funroll-loops

1. (CC) gcc options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -lm

RAMspeed SMP

Type: Copy - Benchmark: Floating Point

MB/s, More Is Better (RAMspeed SMP 3.5.0, Type: Copy - Benchmark: Floating Point)

Configuration                         Result
A53 vectorize, pre-patch             4580.39
thunderx/vectorize, pre-patch        2817.45
A53 vectorize/LTO, pre patch         4825.13
A53, post patch                      4964.66
A53 mtune/vectorize, post-patch      4965.60
A53 vectorize, updated               4785.59
A53 vectorize, earlier build         4816.71
A57 vectorize/unrolled GCC 7.0.1     4193.97
A53 vectorize GCC 7.0.1              4188.18
A57 vectorize/unrolled GCC 6.3       4388.73

RAMspeed SMP

Type: Copy - Benchmark: Integer

MB/s, More Is Better (RAMspeed SMP 3.5.0, Type: Copy - Benchmark: Integer)

Configuration                         Result
A53 vectorize, pre-patch             4581.32
thunderx/vectorize, pre-patch        2821.43
A53 vectorize/LTO, pre patch         4829.91
A53, post patch                      4965.06
A53 mtune/vectorize, post-patch      4955.97
A53 vectorize, updated               4706.40
A53 vectorize, earlier build         4816.69
A57 vectorize/unrolled GCC 7.0.1     4201.53
A53 vectorize GCC 7.0.1              4161.09
A57 vectorize/unrolled GCC 6.3       4384.53

PostMark

Disk Transaction Performance

TPS, More Is Better (PostMark 1.51, Disk Transaction Performance)

Configuration                        Result
A53 vectorize, pre-patch               1363
thunderx/vectorize, pre-patch          1351
A53 vectorize/LTO, pre patch           1378
A53, post patch                        1381
A53 mtune/vectorize, post-patch        1378
A53 vectorize, updated                 1217
A53 vectorize, earlier build           1211
A57 vectorize/unrolled GCC 7.0.1       1184
A53 vectorize GCC 7.0.1                1194
A57 vectorize/unrolled GCC 6.3         1190

Standard errors as exported (8 entries for 10 results, so they cannot be matched to individual rows): SE +/- 2.67, 0.00, 2.67, 4.33, 5.00, 2.00, 5.29, 3.67; N = 3 for each.

1. (CC) gcc options: -O3


Phoronix Test Suite v10.8.4