Cortex A53 GCC7 codegen comparison

Benchmarking the effect of d8c4c75 ARM patch

HTML result view exported from: https://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66&grs&rdt.

Configurations: thunderx/vectorize, pre-patch; A53 vectorize, pre-patch; A53 vectorize/LTO, pre patch; A53, post patch; A53 mtune/vectorize, post-patch

Processor: AArch64 rev 4 @ 1.50GHz (4 Cores); also reported as AArch64 rev 4 @ 1.55GHz (4 Cores)
Motherboard: Amlogic
Memory: 2048MB
Disk: 32GB 00000 + 16GB NCard
OS: Ubuntu 16.04
Kernel: 3.14.29 (aarch64)
Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0; also reported as GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0
File-System: ext4
Screen Resolution: 1920x3240

Compiler Details
- --build=aarch64-linux-gnu --disable-browser-plugin --disable-libquadmath --disable-werror --enable-checking=release --enable-clocale=gnu --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-arch-directory=aarch64 --with-default-libstdcxx-abi=new

Disk Details
- CFQ / commit=30,errors=remount-ro,noatime,nodiratime,rw

Processor Details
- thunderx/vectorize, pre-patch: Scaling Governor: meson_cpufreq performance
- A53 vectorize, pre-patch: Scaling Governor: meson_cpufreq performance
- A53 vectorize/LTO, pre patch: Scaling Governor: meson_cpufreq performance
- A53, post patch: Scaling Governor: meson_cpufreq performance
- A53 mtune/vectorize, post-patch: Scaling Governor: meson_cpufreq interactive

Result summary

Test | thunderx/vectorize, pre-patch | A53 vectorize, pre-patch | A53 vectorize/LTO, pre patch | A53, post patch | A53 mtune/vectorize, post-patch
ramspeed: Copy - Floating Point | 2817.45 | 4580.39 | 4825.13 | 4964.66 | 4965.60
ramspeed: Copy - Integer | 2821.43 | 4581.32 | 4829.91 | 4965.06 | 4955.97
c-ray: Total Time | 149.82 | 187.97 | 184.81 | 186.69 | 186.61
mafft: Multiple Sequence Alignment | 34.46 | 35.42 | 33.16 | 33.06 | 32.17
fftw: Stock - 2D FFT Size 2048 | 190.63 | 196.90 | 180.53 | 186.21 | 184.81
primesieve: 1e12 Prime Number Generation | 566.21 | 543.16 | 540.95 | 553.13 | 573.13
tachyon: Total Time | 71.41 | 69.27 | 67.64 | 69.40 | 69.34
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping | 23.01 | 23.16 | 23.77 | 23.47 | 23.49
redis: GET | 318926.02 | 310344.73 | 311785.02 | 309030.64 | 313438.91
postmark: Disk Transaction Performance | 1351 | 1363 | 1378 | 1381 | 1378
sudokut: Total Time | 102.75 | 101.95 | 101.75 | 101.88 | 102.17
smallpt: Global Illumination Renderer; 100 Samples | 167 | 167 | 168 | 168 | 167
gmpbench: Total Time | 554.83 | 552.84 | 554.37 | 552.56 | 555.10
fhourstones: Complex Connect-4 Solving | 3210.20 | 3212.10 | 3213.77 | 3209.67 | 3205.40
openssl: RSA 4096-bit Performance | 21.50 | 21.50 | 21.50 | 21.50 | 21.50
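The raw figures in the result summary above mix "more is better" and "fewer is better" units, so it can help to normalize them before comparing runs. The following is a minimal Python sketch, not part of the exported result: the names `RESULTS`, `relative_gain`, and `summarize` are hypothetical, the four tests are an arbitrary sample, and comparing "A53 vectorize, pre-patch" against "A53, post patch" is only one possible pairing. The values are copied from the results above and the direction flags from the labels on each chart.

```python
# Sample of the results above: (higher_is_better, baseline, candidate).
# baseline  = "A53 vectorize, pre-patch"
# candidate = "A53, post patch"
# The choice of tests and of these two columns is illustrative only.
RESULTS = {
    "ramspeed: Copy - Floating Point": (True, 4580.39, 4964.66),
    "c-ray: Total Time": (False, 187.97, 186.69),
    "fftw: Stock - 2D FFT Size 2048": (True, 196.90, 186.21),
    "primesieve: 1e12 Prime Number Generation": (False, 543.16, 553.13),
}


def relative_gain(higher_is_better, baseline, candidate):
    """Return the candidate's improvement over the baseline in percent.

    A positive value always means the candidate is better, regardless of
    whether the underlying unit is "more is better" or "fewer is better".
    """
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    return (baseline / candidate - 1.0) * 100.0


def summarize(results):
    """Map each test name to its relative gain, rounded to two decimals."""
    return {name: round(relative_gain(*row), 2) for name, row in results.items()}


if __name__ == "__main__":
    for name, gain in summarize(RESULTS).items():
        print(f"{name}: {gain:+.2f}%")
```

Differences smaller than the standard errors reported with each chart should probably not be read as real changes, and the two columns also differ in compiler flags, so a gain or loss here is not attributable to the patch alone.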

RAMspeed SMP

Type: Copy - Benchmark: Floating Point

RAMspeed SMP 3.5.0. MB/s, more is better.

thunderx/vectorize, pre-patch: 2817.45
A53 vectorize, pre-patch: 4580.39
A53 vectorize/LTO, pre patch: 4825.13
A53, post patch: 4964.66
A53 mtune/vectorize, post-patch: 4965.60

RAMspeed SMP

Type: Copy - Benchmark: Integer

RAMspeed SMP 3.5.0. MB/s, more is better.

thunderx/vectorize, pre-patch: 2821.43
A53 vectorize, pre-patch: 4581.32
A53 vectorize/LTO, pre patch: 4829.91
A53, post patch: 4965.06
A53 mtune/vectorize, post-patch: 4955.97

C-Ray

Total Time

C-Ray 1.1. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 149.82 (SE +/- 1.37, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 187.97 (SE +/- 0.69, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 184.81 (SE +/- 0.17, N = 3); flags: -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 186.69 (SE +/- 0.14, N = 3); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 186.61 (SE +/- 0.12, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CC) gcc options: -lm -lpthread -O3 -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 6.864. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 34.46 (SE +/- 0.73, N = 6)
A53 vectorize, pre-patch: 35.42 (SE +/- 0.80, N = 6)
A53 vectorize/LTO, pre patch: 33.16 (SE +/- 0.70, N = 6)
A53, post patch: 33.06 (SE +/- 0.71, N = 6)
A53 mtune/vectorize, post-patch: 32.17 (SE +/- 0.01, N = 3)

(CC) gcc options: -O3 -lm -lpthread

FFTW

Build: Stock - Size: 2D FFT Size 2048

FFTW 3.3.4. Mflops, more is better.

thunderx/vectorize, pre-patch: 190.63 (SE +/- 1.10, N = 5); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 196.90 (SE +/- 0.99, N = 5); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 180.53 (SE +/- 0.49, N = 5); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 186.21 (SE +/- 0.08, N = 5); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 184.81 (SE +/- 0.21, N = 5); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CC) gcc options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -lm

Primesieve

1e12 Prime Number Generation

Primesieve 5.4.2. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 566.21 (SE +/- 2.99, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 543.16 (SE +/- 3.01, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 540.95 (SE +/- 8.42, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 553.13 (SE +/- 9.14, N = 3); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 573.13 (SE +/- 6.92, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CXX) g++ options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -fopenmp

Tachyon

Total Time

Tachyon 0.98.9. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 71.41 (SE +/- 0.06, N = 3)
A53 vectorize, pre-patch: 69.27 (SE +/- 0.08, N = 3)
A53 vectorize/LTO, pre patch: 67.64 (SE +/- 0.11, N = 3)
A53, post patch: 69.40 (SE +/- 0.12, N = 3)
A53 mtune/vectorize, post-patch: 69.34 (SE +/- 0.10, N = 3)

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

TTSIOD 3D Renderer 2.3a. FPS, more is better.

thunderx/vectorize, pre-patch: 23.01 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 23.16 (SE +/- 0.01, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 23.77 (SE +/- 0.01, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -ffat-lto-objects
A53, post patch: 23.47 (SE +/- 0.02, N = 3); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 23.49 (SE +/- 0.01, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CXX) g++ options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -ffast-math -mtune=native -flto -lSDL -lstdc++

Redis

Test: GET

Redis 3.0.1. Requests per second, more is better.

thunderx/vectorize, pre-patch: 318926.02 (SE +/- 2784.59, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 310344.73 (SE +/- 4662.92, N = 6); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 311785.02 (SE +/- 2239.53, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 309030.64 (SE +/- 1052.91, N = 3); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 313438.91 (SE +/- 1967.34, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O2 -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

PostMark

Disk Transaction Performance

PostMark 1.51. TPS, more is better.

thunderx/vectorize, pre-patch: 1351 (SE +/- 0.00, N = 3)
A53 vectorize, pre-patch: 1363 (SE +/- 2.67, N = 3)
A53 vectorize/LTO, pre patch: 1378 (SE +/- 2.67, N = 3)
A53, post patch: 1381 (SE +/- 4.33, N = 3)
A53 mtune/vectorize, post-patch: 1378 (SE +/- 5.00, N = 3)

(CC) gcc options: -O3

Sudokut

Total Time

Sudokut 0.4. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 102.75 (SE +/- 0.76, N = 3)
A53 vectorize, pre-patch: 101.95 (SE +/- 0.20, N = 3)
A53 vectorize/LTO, pre patch: 101.75 (SE +/- 0.21, N = 3)
A53, post patch: 101.88 (SE +/- 0.10, N = 3)
A53 mtune/vectorize, post-patch: 102.17 (SE +/- 0.09, N = 3)

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0. Seconds, fewer is better.

thunderx/vectorize, pre-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 168 (SE +/- 0.00, N = 3); flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 168 (SE +/- 0.00, N = 3); flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 167 (SE +/- 0.00, N = 3); flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CXX) g++ options: -fopenmp -fomit-frame-pointer -fipa-pta -march=armv8-a+crc

GMPbench

Total Time

GMPbench 0.2. GMPbench score, more is better.

thunderx/vectorize, pre-patch: 554.83; flags: -Ofast -mcpu=thunderx -ftree-vectorize
A53 vectorize, pre-patch: 552.84; flags: -Ofast -mcpu=cortex-a53 -ftree-vectorize
A53 vectorize/LTO, pre patch: 554.37; flags: -O3 -mcpu=cortex-a53 -ftree-vectorize -flto -ffat-lto-objects
A53, post patch: 552.56; flags: -Ofast -mcpu=cortex-a53
A53 mtune/vectorize, post-patch: 555.10; flags: -Ofast -mtune=cortex-a53 -ftree-vectorize

(CC) gcc options: -fomit-frame-pointer -fipa-pta -march=armv8-a+crc -lm

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1. Kpos / sec, more is better.

thunderx/vectorize, pre-patch: 3210.20 (SE +/- 0.76, N = 3)
A53 vectorize, pre-patch: 3212.10 (SE +/- 0.35, N = 3)
A53 vectorize/LTO, pre patch: 3213.77 (SE +/- 0.22, N = 3)
A53, post patch: 3209.67 (SE +/- 1.47, N = 3)
A53 mtune/vectorize, post-patch: 3205.40 (SE +/- 1.81, N = 3)

(CC) gcc options: -O3

OpenSSL

RSA 4096-bit Performance

OpenSSL 1.0.1g. Signs per second, more is better.

thunderx/vectorize, pre-patch: 21.50 (SE +/- 0.00, N = 3)
A53 vectorize, pre-patch: 21.50 (SE +/- 0.00, N = 3)
A53 vectorize/LTO, pre patch: 21.50 (SE +/- 0.00, N = 3)
A53, post patch: 21.50 (SE +/- 0.00, N = 3)
A53 mtune/vectorize, post-patch: 21.50 (SE +/- 0.00, N = 3)

(CC) gcc options: -O3 -fomit-frame-pointer -lssl -lcrypto -ldl


Phoronix Test Suite v10.8.4