Cortex A53 GCC7 codegen comparison

Benchmarking the effect of d8c4c75 ARM patch

HTML result view exported from: https://openbenchmarking.org/result/1702153-RI-GCCCOMPAR59&grs&sor.

System Details

Configurations:
- A53 vectorize, pre-patch
- thunderx/vectorize, pre-patch
- A53 vectorize/LTO, pre patch
- A53, post patch
- A53 mtune/vectorize, post-patch
- A53/clang 3.8
- A72 vectorize
- thunderx mtune
- A53 vectorize, updated

Each field lists the value for the first configuration, followed by the values it changes to in later configurations, in export order. The flattened table does not preserve which configuration each later value belongs to.

- Processor: AArch64 rev 4 @ 1.50GHz (4 Cores); AArch64 rev 4 @ 1.55GHz (4 Cores); AArch64 rev 4 @ 2.00GHz (4 Cores); Unknown @ 1.54GHz (4 Cores)
- Motherboard: Amlogic
- Memory: 2048MB
- Disk: 32GB 00000 + 16GB NCard; 16GB NCard + 32GB 00000
- OS: Ubuntu 16.04
- Kernel: 3.14.29 (aarch64); 3.14.79-vegas95 (aarch64)
- Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0; GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0; Clang 3.8.0-2ubuntu4 + GCC 5.4.0 20160609 + LLVM 3.8.0; GCC 7.0.1 20170127 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0; GCC 7.0.1 20170214 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0
- File-System: ext4
- Screen Resolution: 1920x3240; 1280x1440

Compiler Details
- A53 vectorize, pre-patch, thunderx/vectorize, pre-patch, A53 vectorize/LTO, pre patch, A53, post patch, A53 mtune/vectorize, post-patch, A72 vectorize, thunderx mtune, A53 vectorize, updated: --build=aarch64-linux-gnu --disable-browser-plugin --disable-libquadmath --disable-werror --enable-checking=release --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-arch-directory=aarch64 --with-default-libstdcxx-abi=new

Disk Details
- A53 vectorize, pre-patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- thunderx/vectorize, pre-patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A53 vectorize/LTO, pre patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A53, post patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A53 mtune/vectorize, post-patch: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A53/clang 3.8: CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw
- A72 vectorize: DEADLINE / commit=30,errors=remount-ro,noatime,nodiratime,rw
- thunderx mtune: DEADLINE / commit=45,errors=remount-ro,noatime,nodiratime,rw
- A53 vectorize, updated: DEADLINE / commit=45,errors=remount-ro,noatime,nodiratime,rw

Processor Details
- A53 vectorize, pre-patch: Scaling Governor: meson_cpufreq performance
- thunderx/vectorize, pre-patch: Scaling Governor: meson_cpufreq performance
- A53 vectorize/LTO, pre patch: Scaling Governor: meson_cpufreq performance
- A53, post patch: Scaling Governor: meson_cpufreq performance
- A53 mtune/vectorize, post-patch: Scaling Governor: meson_cpufreq interactive
- A53/clang 3.8: Scaling Governor: meson_cpufreq performance
- A72 vectorize: Scaling Governor: meson_cpufreq performance
- thunderx mtune: Scaling Governor: meson_cpufreq performance
- A53 vectorize, updated: Scaling Governor: meson_cpufreq performance

Results Overview

Columns:
[1] A53 vectorize, pre-patch
[2] thunderx/vectorize, pre-patch
[3] A53 vectorize/LTO, pre patch
[4] A53, post patch
[5] A53 mtune/vectorize, post-patch
[6] A53/clang 3.8
[7] A72 vectorize
[8] thunderx mtune
[9] A53 vectorize, updated

A "-" marks a test with no result for that configuration.

Test                                                              [1]        [2]        [3]        [4]        [5]        [6]        [7]        [8]        [9]
primesieve: 1e12 Prime Number Generation                       543.16     566.21     540.95     553.13     573.13    2256.35     549.83     607.41     574.65
ramspeed: Copy - Integer                                      4581.32    2821.43    4829.91    4965.06    4955.97          -    4769.37    2604.10    4706.40
ramspeed: Copy - Floating Point                               4580.39    2817.45    4825.13    4964.66    4965.60          -    4862.68    2606.30    4785.59
c-ray: Total Time                                              187.97     149.82     184.81     186.69     186.61     250.77     150.90     155.28     161.80
mafft: Multiple Sequence Alignment                              35.42      34.46      33.16      33.06      32.17      27.87      33.02      31.89      33.90
redis: GET                                                  310344.73  318926.02  311785.02  309030.64  313438.91          -  314705.37  277191.13  277268.23
postmark: Disk Transaction Performance                           1363       1351       1378       1381       1378       1358       1399       1225       1217
tachyon: Total Time                                             69.27      71.41      67.64      69.40      69.34      75.96      71.96      70.91      69.90
fftw: Stock - 2D FFT Size 2048                                 196.90     190.63     180.53     186.21     184.81     184.62     175.61     175.46     185.15
fhourstones: Complex Connect-4 Solving                        3212.10    3210.20    3213.77    3209.67    3205.40    3358.40    3206.00    3189.40    3223.57
openssl: RSA 4096-bit Performance                               21.50      21.50      21.50      21.50      21.50      20.70      21.47      21.40      21.40
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping       23.16      23.01      23.77      23.47      23.49          -      23.48      23.34      23.29
sudokut: Total Time                                            101.95     102.75     101.75     101.88     102.17     102.33     102.24     103.09     102.72
smallpt: Global Illumination Renderer; 100 Samples                167        167        168        168        167          -        167        167        166
gmpbench: Total Time                                           552.84     554.83     554.37     552.56     555.10          -     554.06     550.57     554.11

Primesieve

1e12 Prime Number Generation

Primesieve 5.4.2 (Seconds, Fewer Is Better)
- A53 vectorize/LTO, pre patch: 540.95 (SE +/- 8.42, N = 3); flags: -O3 -mcpu=cortex-a53 -fipa-pta -ftree-vectorize -flto -ffat-lto-objects -fopenmp
- A53 vectorize, pre-patch: 543.16 (SE +/- 3.01, N = 3); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize -fopenmp
- A72 vectorize: 549.83 (SE +/- 7.59, N = 6); flags: -Ofast -mcpu=cortex-a72 -fipa-pta -ftree-vectorize -fopenmp
- A53, post patch: 553.13 (SE +/- 9.14, N = 3); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -fopenmp
- thunderx/vectorize, pre-patch: 566.21 (SE +/- 2.99, N = 3); flags: -Ofast -mcpu=thunderx -fipa-pta -ftree-vectorize -fopenmp
- A53 mtune/vectorize, post-patch: 573.13 (SE +/- 6.92, N = 3); flags: -Ofast -mtune=cortex-a53 -fipa-pta -ftree-vectorize -fopenmp
- A53 vectorize, updated: 574.65 (SE +/- 9.13, N = 4); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize -fopenmp
- thunderx mtune: 607.41 (SE +/- 10.24, N = 6); flags: -Ofast -mtune=thunderx -fipa-pta -fopenmp
- A53/clang 3.8: 2256.35 (SE +/- 12.87, N = 3); flags: -O3 -mtune=cortex-a53 -ftree-vectorize
1. (CXX) g++ options: -fomit-frame-pointer -march=armv8-a+crc

RAMspeed SMP

Type: Copy - Benchmark: Integer

RAMspeed SMP 3.5.0 (MB/s, More Is Better)
- A53, post patch: 4965.06
- A53 mtune/vectorize, post-patch: 4955.97
- A53 vectorize/LTO, pre patch: 4829.91
- A72 vectorize: 4769.37
- A53 vectorize, updated: 4706.40
- A53 vectorize, pre-patch: 4581.32
- thunderx/vectorize, pre-patch: 2821.43
- thunderx mtune: 2604.10

RAMspeed SMP

Type: Copy - Benchmark: Floating Point

RAMspeed SMP 3.5.0 (MB/s, More Is Better)
- A53 mtune/vectorize, post-patch: 4965.60
- A53, post patch: 4964.66
- A72 vectorize: 4862.68
- A53 vectorize/LTO, pre patch: 4825.13
- A53 vectorize, updated: 4785.59
- A53 vectorize, pre-patch: 4580.39
- thunderx/vectorize, pre-patch: 2817.45
- thunderx mtune: 2606.30

C-Ray

Total Time

C-Ray 1.1 (Seconds, Fewer Is Better)
- thunderx/vectorize, pre-patch: 149.82 (SE +/- 1.37, N = 3); flags: -Ofast -mcpu=thunderx -fipa-pta -ftree-vectorize
- A72 vectorize: 150.90 (SE +/- 0.06, N = 3); flags: -Ofast -mcpu=cortex-a72 -fipa-pta -ftree-vectorize
- thunderx mtune: 155.28 (SE +/- 2.98, N = 3); flags: -Ofast -mtune=thunderx -fipa-pta
- A53 vectorize, updated: 161.80 (SE +/- 0.27, N = 3); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize
- A53 vectorize/LTO, pre patch: 184.81 (SE +/- 0.17, N = 3); flags: -mcpu=cortex-a53 -fipa-pta -ftree-vectorize -flto -ffat-lto-objects
- A53 mtune/vectorize, post-patch: 186.61 (SE +/- 0.12, N = 3); flags: -Ofast -mtune=cortex-a53 -fipa-pta -ftree-vectorize
- A53, post patch: 186.69 (SE +/- 0.14, N = 3); flags: -Ofast -mcpu=cortex-a53 -fipa-pta
- A53 vectorize, pre-patch: 187.97 (SE +/- 0.69, N = 3); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize
- A53/clang 3.8: 250.77 (SE +/- 1.03, N = 3); flags: -mtune=cortex-a53 -ftree-vectorize
1. (CC) gcc options: -lm -lpthread -O3 -fomit-frame-pointer -march=armv8-a+crc

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 6.864 (Seconds, Fewer Is Better)
- A53/clang 3.8: 27.87 (SE +/- 0.04, N = 3); compiler: clang
- thunderx mtune: 31.89 (SE +/- 0.09, N = 3); compiler: gcc
- A53 mtune/vectorize, post-patch: 32.17 (SE +/- 0.01, N = 3); compiler: gcc
- A72 vectorize: 33.02 (SE +/- 0.67, N = 6); compiler: gcc
- A53, post patch: 33.06 (SE +/- 0.71, N = 6); compiler: gcc
- A53 vectorize/LTO, pre patch: 33.16 (SE +/- 0.70, N = 6); compiler: gcc
- A53 vectorize, updated: 33.90 (SE +/- 0.79, N = 6); compiler: gcc
- thunderx/vectorize, pre-patch: 34.46 (SE +/- 0.73, N = 6); compiler: gcc
- A53 vectorize, pre-patch: 35.42 (SE +/- 0.80, N = 6); compiler: gcc

Redis

Test: GET

Redis 3.0.1 (Requests Per Second, More Is Better)
- thunderx/vectorize, pre-patch: 318926.02 (SE +/- 2784.59, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
- A72 vectorize: 314705.37 (SE +/- 3423.31, N = 3); flags: -Ofast -mcpu=cortex-a72 -ftree-vectorize
- A53 mtune/vectorize, post-patch: 313438.91 (SE +/- 1967.34, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize
- A53 vectorize/LTO, pre patch: 311785.02 (SE +/- 2239.53, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
- A53 vectorize, pre-patch: 310344.73 (SE +/- 4662.92, N = 6); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- A53, post patch: 309030.64 (SE +/- 1052.91, N = 3); flags: -Ofast -mcpu=cortex-a53
- A53 vectorize, updated: 277268.23 (SE +/- 2017.17, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- thunderx mtune: 277191.13 (SE +/- 2008.41, N = 3); flags: -Ofast -mtune=thunderx
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O2 -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

PostMark

Disk Transaction Performance

PostMark 1.51 (TPS, More Is Better)
- A72 vectorize: 1399 (SE +/- 2.67, N = 3); compiler: gcc
- A53, post patch: 1381 (SE +/- 4.33, N = 3); compiler: gcc
- A53 mtune/vectorize, post-patch: 1378 (SE +/- 5.00, N = 3); compiler: gcc
- A53 vectorize/LTO, pre patch: 1378 (SE +/- 2.67, N = 3); compiler: gcc
- A53 vectorize, pre-patch: 1363 (SE +/- 2.67, N = 3); compiler: gcc
- A53/clang 3.8: 1358 (SE +/- 4.33, N = 3); compiler: clang
- thunderx/vectorize, pre-patch: 1351 (SE +/- 0.00, N = 3); compiler: gcc
- thunderx mtune: 1225 (SE +/- 3.46, N = 3); compiler: gcc
- A53 vectorize, updated: 1217 (SE +/- 2.00, N = 3); compiler: gcc

Tachyon

Total Time

Tachyon 0.98.9 (Seconds, Fewer Is Better)
- A53 vectorize/LTO, pre patch: 67.64 (SE +/- 0.11, N = 3)
- A53 vectorize, pre-patch: 69.27 (SE +/- 0.08, N = 3)
- A53 mtune/vectorize, post-patch: 69.34 (SE +/- 0.10, N = 3)
- A53, post patch: 69.40 (SE +/- 0.12, N = 3)
- A53 vectorize, updated: 69.90 (SE +/- 0.29, N = 3)
- thunderx mtune: 70.91 (SE +/- 0.15, N = 3)
- thunderx/vectorize, pre-patch: 71.41 (SE +/- 0.06, N = 3)
- A72 vectorize: 71.96 (SE +/- 0.17, N = 3)
- A53/clang 3.8: 75.96 (SE +/- 0.22, N = 3)

FFTW

Build: Stock - Size: 2D FFT Size 2048

FFTW 3.3.4 (Mflops, More Is Better)
- A53 vectorize, pre-patch: 196.90 (SE +/- 0.99, N = 5); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize
- thunderx/vectorize, pre-patch: 190.63 (SE +/- 1.10, N = 5); flags: -Ofast -mcpu=thunderx -fipa-pta -ftree-vectorize
- A53, post patch: 186.21 (SE +/- 0.08, N = 5); flags: -Ofast -mcpu=cortex-a53 -fipa-pta
- A53 vectorize, updated: 185.15 (SE +/- 0.06, N = 5); flags: -Ofast -mcpu=cortex-a53 -fipa-pta -ftree-vectorize
- A53 mtune/vectorize, post-patch: 184.81 (SE +/- 0.21, N = 5); flags: -Ofast -mtune=cortex-a53 -fipa-pta -ftree-vectorize
- A53/clang 3.8: 184.62 (SE +/- 0.11, N = 5); flags: -O3 -mtune=cortex-a53 -ftree-vectorize
- A53 vectorize/LTO, pre patch: 180.53 (SE +/- 0.49, N = 5); flags: -O3 -mcpu=cortex-a53 -fipa-pta -ftree-vectorize -flto -ffat-lto-objects
- A72 vectorize: 175.61 (SE +/- 0.14, N = 5); flags: -Ofast -mcpu=cortex-a72 -fipa-pta -ftree-vectorize
- thunderx mtune: 175.46 (SE +/- 0.11, N = 5); flags: -Ofast -mtune=thunderx -fipa-pta
1. (CC) gcc options: -fomit-frame-pointer -march=armv8-a+crc -lm

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, More Is Better)
- A53/clang 3.8: 3358.40 (SE +/- 0.99, N = 3); compiler: clang
- A53 vectorize, updated: 3223.57 (SE +/- 3.32, N = 3); compiler: gcc
- A53 vectorize/LTO, pre patch: 3213.77 (SE +/- 0.22, N = 3); compiler: gcc
- A53 vectorize, pre-patch: 3212.10 (SE +/- 0.35, N = 3); compiler: gcc
- thunderx/vectorize, pre-patch: 3210.20 (SE +/- 0.76, N = 3); compiler: gcc
- A53, post patch: 3209.67 (SE +/- 1.47, N = 3); compiler: gcc
- A72 vectorize: 3206.00 (SE +/- 0.12, N = 3); compiler: gcc
- A53 mtune/vectorize, post-patch: 3205.40 (SE +/- 1.81, N = 3); compiler: gcc
- thunderx mtune: 3189.40 (SE +/- 0.60, N = 3); compiler: gcc

OpenSSL

RSA 4096-bit Performance

OpenSSL 1.0.1g (Signs Per Second, More Is Better)
- A53 mtune/vectorize, post-patch: 21.50 (SE +/- 0.00, N = 3); compiler: gcc
- A53, post patch: 21.50 (SE +/- 0.00, N = 3); compiler: gcc
- A53 vectorize/LTO, pre patch: 21.50 (SE +/- 0.00, N = 3); compiler: gcc
- thunderx/vectorize, pre-patch: 21.50 (SE +/- 0.00, N = 3); compiler: gcc
- A53 vectorize, pre-patch: 21.50 (SE +/- 0.00, N = 3); compiler: gcc
- A72 vectorize: 21.47 (SE +/- 0.03, N = 3); compiler: gcc
- A53 vectorize, updated: 21.40 (SE +/- 0.00, N = 3); compiler: gcc
- thunderx mtune: 21.40 (SE +/- 0.00, N = 3); compiler: gcc
- A53/clang 3.8: 20.70 (SE +/- 0.00, N = 3); compiler: clang

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

TTSIOD 3D Renderer 2.3a (FPS, More Is Better)
- A53 vectorize/LTO, pre patch: 23.77 (SE +/- 0.01, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -ffat-lto-objects
- A53 mtune/vectorize, post-patch: 23.49 (SE +/- 0.01, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize
- A72 vectorize: 23.48 (SE +/- 0.05, N = 3); flags: -Ofast -mcpu=cortex-a72 -ftree-vectorize
- A53, post patch: 23.47 (SE +/- 0.02, N = 3); flags: -Ofast -mcpu=cortex-a53
- thunderx mtune: 23.34 (SE +/- 0.01, N = 3); flags: -Ofast -mtune=thunderx
- A53 vectorize, updated: 23.29 (SE +/- 0.09, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- A53 vectorize, pre-patch: 23.16 (SE +/- 0.01, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- thunderx/vectorize, pre-patch: 23.01 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
1. (CXX) g++ options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -flto -ffast-math -mtune=native -lSDL -lstdc++

Sudokut

Total Time

Sudokut 0.4 (Seconds, Fewer Is Better)
- A53 vectorize/LTO, pre patch: 101.75 (SE +/- 0.21, N = 3)
- A53, post patch: 101.88 (SE +/- 0.10, N = 3)
- A53 vectorize, pre-patch: 101.95 (SE +/- 0.20, N = 3)
- A53 mtune/vectorize, post-patch: 102.17 (SE +/- 0.09, N = 3)
- A72 vectorize: 102.24 (SE +/- 0.25, N = 3)
- A53/clang 3.8: 102.33 (SE +/- 0.26, N = 3)
- A53 vectorize, updated: 102.72 (SE +/- 0.20, N = 3)
- thunderx/vectorize, pre-patch: 102.75 (SE +/- 0.76, N = 3)
- thunderx mtune: 103.09 (SE +/- 0.33, N = 3)

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 (Seconds, Fewer Is Better)
- A53 vectorize, updated: 166 (SE +/- 0.33, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- A53 vectorize, pre-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- thunderx/vectorize, pre-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
- A53 mtune/vectorize, post-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize
- A72 vectorize: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=cortex-a72 -ftree-vectorize
- thunderx mtune: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mtune=thunderx
- A53 vectorize/LTO, pre patch: 168 (SE +/- 0.00, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
- A53, post patch: 168 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=cortex-a53
1. (CXX) g++ options: -fopenmp -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

GMPbench

Total Time

GMPbench 0.2 (GMPbench Score, More Is Better)
- A53 mtune/vectorize, post-patch: 555.10; flags: -Ofast -mtune=cortex-a53 -ftree-vectorize
- thunderx/vectorize, pre-patch: 554.83; flags: -Ofast -mcpu=thunderx -ftree-vectorize
- A53 vectorize/LTO, pre patch: 554.37; flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
- A53 vectorize, updated: 554.11; flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- A72 vectorize: 554.06; flags: -Ofast -mcpu=cortex-a72 -ftree-vectorize
- A53 vectorize, pre-patch: 552.84; flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
- A53, post patch: 552.56; flags: -Ofast -mcpu=cortex-a53
- thunderx mtune: 550.57; flags: -Ofast -mtune=thunderx
1. (CC) gcc options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -lm


Phoronix Test Suite v10.8.4