ARMv7 performance evolution

GCC 4.9 vs 6/7

HTML result view exported from: https://openbenchmarking.org/result/1609188-LO-1511262DE80&grr&sor.

Run: 4.9 orig.
  Processor:         ARMv7 rev 1 @ 1.73GHz (4 Cores)
  Motherboard:       ODROIDC
  Memory:            948MB
  Disk:              16GB SL16G
  OS:                Ubuntu 14.04
  Kernel:            3.10.80-120 (armv7l)
  Desktop:           LXDE 0.6.1
  Display Server:    X Server 1.15.1
  OpenGL:            2.1 Mesa 10.1.3
  Compiler:          GCC 4.9.2
  File-System:       ext4
  Screen Resolution: 1280x1024

Run: 6.2.1 (changed fields only)
  Memory:            916MB
  Disk:              60GB A + 64GB 00000
  Kernel:            3.10.96-149 (armv7l)
  Compiler:          GCC 6.2.1 20160901 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0

Run: 4.9.4 (changed fields only)
  Compiler:          GCC 4.9.4 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0

Run: 7.0.0 (changed fields only)
  Compiler:          GCC 7.0.0 20160916 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0

Compiler Details

- 4.9 orig.: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-float=hard --with-fpu=vfpv3-d16 --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf/jre --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-armhf --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf --with-mode=thumb -v

- 6.2.1: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-default-libstdcxx-abi=gcc4-compatible --with-float=hard --with-fpu=vfpv3 --with-mode=arm -v

- 4.9.4: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb -v

- 7.0.0: --build=arm-linux-gnueabihf --disable-bootstrap --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-default-libstdcxx-abi=gcc4-compatible --with-float=hard --with-fpu=vfpv3 --with-mode=arm -v

Processor Details

- 4.9 orig.: Scaling Governor: meson_cpufreq interactive
- 6.2.1: Scaling Governor: meson_cpufreq performance
- 4.9.4: Scaling Governor: meson_cpufreq performance
- 7.0.0: Scaling Governor: meson_cpufreq performance

Test                                                  4.9 orig.      6.2.1      4.9.4      7.0.0
redis: SET                                            123463.07  119440.02  126638.55  110578.01
redis: GET                                            164362.15  171361.76  178988.49  153872.77
pgbench: Buffer Test - Single Thread - Read Write        100.78     249.46     244.61     253.25
pgbench: Buffer Test - Normal Load - Read Write          110.45     561.99     551.94     559.72
n-queens: Elapsed Time                                   226.57     228.20     229.64     202.09
ffmpeg: H.264 HD To NTSC DV                              213.49     225.11     226.66     223.02
encode-flac: WAV To FLAC                                 174.16     171.15     170.55     172.25
stockfish: Total Time                                     46435      54082      55482      52711
smallpt: Global Illumination Renderer; 100 Samples          323        308        312        307
primesieve: 1e12 Prime Number Generation                 994.89     995.14    1042.52     928.50
c-ray: Total Time                                        258.85     262.32     268.73     261.82
build-apache: Time To Compile                            533.04     388.98     366.01     589.82
vpxenc: vpxenc                                             4.40       4.24       3.88       4.21
fhourstones: Complex Connect-4 Solving                  1184.60    1135.10    1238.13    1243.23
fftw: Stock - 2D FFT Size 2048                           124.23     138.20     133.19     131.13
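The overview mixes "more is better" and "fewer is better" metrics, so raw numbers cannot be compared across rows directly. The following is a minimal sketch, not part of the original export, that normalises a few rows into a relative change of GCC 7.0.0 against GCC 4.9.4. The selection of rows, the dictionary layout, and the function names `speedup` and `percent_change` are illustrative assumptions; the values and the direction of each metric are taken from the results in this document.

```python
# Values copied from the overview table: (GCC 4.9.4, GCC 7.0.0, higher_is_better).
# The choice of rows is illustrative; any row can be added the same way.
RESULTS = {
    "redis: SET": (126638.55, 110578.01, True),
    "pgbench: Single Thread": (244.61, 253.25, True),
    "n-queens: Elapsed Time": (229.64, 202.09, False),
    "primesieve: 1e12": (1042.52, 928.50, False),
    "build-apache: Time To Compile": (366.01, 589.82, False),
    "fhourstones": (1238.13, 1243.23, True),
}


def speedup(baseline, candidate, higher_is_better):
    """Return candidate performance relative to baseline (> 1.0 means faster)."""
    if higher_is_better:
        return candidate / baseline
    # For time-like metrics a smaller value is better, so invert the ratio.
    return baseline / candidate


def percent_change(baseline, candidate, higher_is_better):
    """Relative change in percent; positive means the candidate is faster."""
    return (speedup(baseline, candidate, higher_is_better) - 1.0) * 100.0


if __name__ == "__main__":
    for name, (old, new, hib) in RESULTS.items():
        print(f"{name:32s} {percent_change(old, new, hib):+6.1f}%")
```

Inverting the ratio for time-based metrics keeps the sign convention uniform, so a positive percentage always means GCC 7.0.0 did better on that row.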

Redis

Test: SET

Redis 3.0.1 (Requests Per Second, More Is Better)

Run        Result     SE +/-   N
4.9.4      126638.55  1096.12  3
4.9 orig.  123463.07  1276.03  3
6.2.1      119440.02  3005.56  6
7.0.0      110578.01  2166.40  3

Additional flags per run:
4.9.4:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
4.9 orig.: -std=gnu99 -pipe -g3 -funroll-loops -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math
6.2.1:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
7.0.0:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects

1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O3 -fipa-pta -mfpu=neon -ftree-vectorize
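Each result is reported with a standard error (SE) and a run count (N), which gives a rough way to judge whether two results differ by more than run-to-run noise. The sketch below is an illustration rather than anything the export computes: it assumes the runs are independent and that the reported SE is the standard error of the mean, and the helper name `se_ratio` is hypothetical. It expresses the gap between two results in units of their combined standard error, using the Redis SET figures for GCC 4.9.4 and GCC 7.0.0.

```python
import math


def se_ratio(mean_a, se_a, mean_b, se_b):
    """Difference of two means divided by their combined standard error.

    Assumes independent runs; this is a rough heuristic, not a formal test.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / combined_se


if __name__ == "__main__":
    # Redis SET, requests per second: GCC 4.9.4 vs GCC 7.0.0 (values from this document).
    ratio = se_ratio(126638.55, 1096.12, 110578.01, 2166.40)
    print(f"difference = {ratio:.1f} combined standard errors")
```

With only three to six runs per result, a ratio of this kind is best read as a sanity check; small ratios (around 2 or below) suggest the two compilers are not clearly distinguishable on that test.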

Redis

Test: GET

Redis 3.0.1 (Requests Per Second, More Is Better)

Run        Result     SE +/-   N
4.9.4      178988.49  2715.81  5
6.2.1      171361.76   385.21  3
4.9 orig.  164362.15  1099.43  3
7.0.0      153872.77  2538.14  6

Additional flags per run:
4.9.4:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
6.2.1:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
4.9 orig.: -std=gnu99 -pipe -g3 -funroll-loops -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math
7.0.0:     -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects

1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O3 -fipa-pta -mfpu=neon -ftree-vectorize

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Write

PostgreSQL pgbench 9.4.3 (TPS, More Is Better)

Run        Result  SE +/-  N
7.0.0      253.25  0.05    3
6.2.1      249.46  1.35    3
4.9.4      244.61  1.51    3
4.9 orig.  100.78  3.92    6

Additional flags per run:
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9.4:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects

1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -fipa-pta -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Write

PostgreSQL pgbench 9.4.3 (TPS, More Is Better)

Run        Result  SE +/-  N
6.2.1      561.99   9.60   3
7.0.0      559.72   8.38   4
4.9.4      551.94   7.92   3
4.9 orig.  110.45  31.02   6

Additional flags per run:
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
4.9.4:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects

1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -fipa-pta -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

N-Queens

Elapsed Time

N-Queens 1.0 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
7.0.0      202.09  0.00    3
4.9 orig.  226.57  0.18    3
6.2.1      228.20  0.12    3
4.9.4      229.64  1.21    3

Additional per-run flags (three sets exported for the four runs, in export order):
-mfloat-abi=hard -ffat-lto-objects -ffast-math
-mcpu=cortex-a5 -marm
-mcpu=cortex-a5 -marm

1. (CC) gcc options: -static -fopenmp -O3 -fipa-pta -ftree-vectorize -mfpu=neon -fomit-frame-pointer

FFmpeg

H.264 HD To NTSC DV

FFmpeg 2.6.2 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
4.9 orig.  213.49  1.05    3
7.0.0      223.02  0.75    3
6.2.1      225.11  0.08    3
4.9.4      226.66  1.04    3

Additional flags per run:
4.9 orig.: -ljack -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math -mthumb
7.0.0:     -mcpu=cortex-a5 -marm -mfpu=neon -flto -ffat-lto-objects
6.2.1:     -mcpu=cortex-a5 -marm -mfpu=neon
4.9.4:     -mcpu=cortex-a5 -marm -mfpu=neon

1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lXv -lX11 -lXext -lxcb -lxcb-shm -lxcb-xfixes -lxcb-render -lxcb-shape -lasound -lSDL -lm -llzma -lbz2 -pthread -O3 -fipa-pta -ftree-vectorize -march=armv7-a -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.1 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
4.9.4      170.55  1.55    5
6.2.1      171.15  2.37    5
7.0.0      172.25  1.90    5
4.9 orig.  174.16  1.15    5

Additional flags per run:
4.9.4:     -mcpu=cortex-a5 -marm -fomit-frame-pointer
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects -ffast-math

1. (CXX) g++ options: -O3 -fipa-pta -mfpu=neon -ftree-vectorize -fvisibility=hidden -logg -lm

Stockfish

Total Time

Stockfish 2014-11-26 (ms, Fewer Is Better)

Run        Result  SE +/-   N
4.9 orig.  46435    794.66  4
7.0.0      52711    204.86  3
6.2.1      54082    102.22  3
4.9.4      55482   1183.39  6

Additional per-run flags (three sets exported for the four runs, in export order):
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize -ffat-lto-objects
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize

1. (CXX) g++ options: -lpthread -fno-exceptions -fno-rtti -ansi -pedantic -O3 -flto

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
7.0.0      307     0.33    3
6.2.1      308     0.33    3
4.9.4      312     1.53    3
4.9 orig.  323     0.00    3

Additional flags per run:
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9.4:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects

1. (CXX) g++ options: -fopenmp -O3 -fipa-pta

Primesieve

1e12 Prime Number Generation

Primesieve 5.4.2 (Seconds, Fewer Is Better)

Run        Result   SE +/-  N
7.0.0       928.50  11.02   3
4.9 orig.   994.89  19.94   3
6.2.1       995.14  17.44   3
4.9.4      1042.52  23.33   4

Additional flags per run:
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects -mthumb
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
4.9.4:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize

1. (CXX) g++ options: -O3 -fipa-pta -fopenmp

C-Ray

Total Time

C-Ray 1.1 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
4.9 orig.  258.85  0.18    3
7.0.0      261.82  0.25    3
6.2.1      262.32  0.70    3
4.9.4      268.73  0.62    3

Additional flags per run:
4.9 orig.: -mfloat-abi=hard -flto -ffat-lto-objects -ffast-math
7.0.0:     -mcpu=cortex-a5 -marm -flto -ffat-lto-objects
6.2.1:     -mcpu=cortex-a5 -marm
4.9.4:     -mcpu=cortex-a5 -marm

1. (CC) gcc options: -lm -lpthread -O3 -fipa-pta -ftree-vectorize -mfpu=neon -fomit-frame-pointer

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 (Seconds, Fewer Is Better)

Run        Result  SE +/-  N
4.9.4      366.01  0.44    3
6.2.1      388.98  1.92    3
4.9 orig.  533.04  2.88    3
7.0.0      589.82  1.31    3

VP8 libvpx Encoding

vpxenc

VP8 libvpx Encoding 1.3.0 (Frames Per Second, More Is Better)

Run        Result  SE +/-  N
4.9 orig.  4.40    0.05    3
6.2.1      4.24    0.07    6
7.0.0      4.21    0.08    6
4.9.4      3.88    0.07    3

Additional per-run flags (three sets exported for the four runs, in export order):
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
-mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize

1. (CXX) g++ options: -lvpx -lgtest -lpthread -lm -O3

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, More Is Better)

Run        Result   SE +/-  N
7.0.0      1243.23  1.23    3
4.9.4      1238.13  0.83    3
4.9 orig.  1184.60  2.86    3
6.2.1      1135.10  0.15    3

1. (CC) gcc options: -O3

FFTW

Build: Stock - Size: 2D FFT Size 2048

FFTW 3.3.4 (Mflops, More Is Better)

Run        Result  SE +/-  N
6.2.1      138.20  0.12    5
4.9.4      133.19  0.14    5
7.0.0      131.13  0.54    5
4.9 orig.  124.23  1.07    5

Additional flags per run:
6.2.1:     -mcpu=cortex-a5 -marm -fomit-frame-pointer
4.9.4:     -std=gnu99 -mcpu=cortex-a5 -marm -fomit-frame-pointer
7.0.0:     -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
4.9 orig.: -std=gnu99 -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math

1. (CC) gcc options: -O3 -fipa-pta -mfpu=neon -ftree-vectorize -lm


Phoronix Test Suite v10.8.4