ARMv7 performance evolution

GCC 4.9 vs 6/7

HTML result view exported from: https://openbenchmarking.org/result/1609188-LO-1511262DE80.

System details (first run, 4.9 orig.):
  Processor: ARMv7 rev 1 @ 1.73GHz (4 Cores)
  Motherboard: ODROIDC
  Memory: 948MB
  Disk: 16GB SL16G
  OS: Ubuntu 14.04
  Kernel: 3.10.80-120 (armv7l)
  Desktop: LXDE 0.6.1
  Display Server: X Server 1.15.1
  OpenGL: 2.1 Mesa 10.1.3
  Compiler: GCC 4.9.2
  File-System: ext4
  Screen Resolution: 1280x1024
Changed values listed for the later runs:
  Memory: 916MB
  Disk: 60GB A + 64GB 00000
  Kernel: 3.10.96-149 (armv7l)
  Compiler (6.2.1): GCC 6.2.1 20160901 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0
  Compiler (4.9.4): GCC 4.9.4 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0
  Compiler (7.0.0): GCC 7.0.0 20160916 + Clang 3.6.0-2ubuntu1~trusty1 + LLVM 3.6.0
Compiler Details
  - 4.9 orig.: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-float=hard --with-fpu=vfpv3-d16 --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf/jre --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.9-armhf --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.9-armhf --with-mode=thumb -v
  - 6.2.1: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-default-libstdcxx-abi=gcc4-compatible --with-float=hard --with-fpu=vfpv3 --with-mode=arm -v
  - 4.9.4: --build=arm-linux-gnueabihf --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gtk-cairo --enable-java-awt=gtk --enable-java-home --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-objc-gc --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb -v
  - 7.0.0: --build=arm-linux-gnueabihf --disable-bootstrap --disable-browser-plugin --disable-libitm --disable-libquadmath --disable-sjlj-exceptions --disable-werror --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-languages=c,c++,fortran --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-multilib --enable-nls --enable-plugin --enable-shared --enable-threads=posix --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --with-arch-directory=arm --with-arch=armv7-a --with-default-libstdcxx-abi=gcc4-compatible --with-float=hard --with-fpu=vfpv3 --with-mode=arm -v
Processor Details
  - 4.9 orig.: Scaling Governor: meson_cpufreq interactive
  - 6.2.1: Scaling Governor: meson_cpufreq performance
  - 4.9.4: Scaling Governor: meson_cpufreq performance
  - 7.0.0: Scaling Governor: meson_cpufreq performance

Results overview
Test                                                  4.9 orig.      6.2.1      4.9.4      7.0.0
fftw: Stock - 2D FFT Size 2048                           124.23     138.20     133.19     131.13
fhourstones: Complex Connect-4 Solving                  1184.60    1135.10    1238.13    1243.23
vpxenc: vpxenc                                             4.40       4.24       3.88       4.21
build-apache: Time To Compile                            533.04     388.98     366.01     589.82
c-ray: Total Time                                        258.85     262.32     268.73     261.82
primesieve: 1e12 Prime Number Generation                 994.89     995.14    1042.52     928.50
smallpt: Global Illumination Renderer; 100 Samples          323        308        312        307
stockfish: Total Time                                     46435      54082      55482      52711
encode-flac: WAV To FLAC                                 174.16     171.15     170.55     172.25
ffmpeg: H.264 HD To NTSC DV                              213.49     225.11     226.66     223.02
n-queens: Elapsed Time                                   226.57     228.20     229.64     202.09
pgbench: Buffer Test - Normal Load - Read Write          110.45     561.99     551.94     559.72
pgbench: Buffer Test - Single Thread - Read Write        100.78     249.46     244.61     253.25
redis: GET                                            164362.15  171361.76  178988.49  153872.77
redis: SET                                            123463.07  119440.02  126638.55  110578.01
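The overview mixes "more is better" and "fewer is better" metrics, so raw numbers are not directly comparable across tests. The following is a minimal sketch, not part of the Phoronix Test Suite, that converts a few of the values above into a percentage improvement relative to the 4.9 orig. run. The names RUNS, RESULTS, improvement_pct and summarize are hypothetical and chosen for illustration only; the direction of each metric is taken from the per-test chart labels.

```python
# Illustrative helper, not part of the Phoronix Test Suite.
RUNS = ["4.9 orig.", "6.2.1", "4.9.4", "7.0.0"]

# test name -> (more_is_better, values in RUNS order), copied from the overview.
RESULTS = {
    "fftw (Mflops)": (True, [124.23, 138.20, 133.19, 131.13]),
    "build-apache (s)": (False, [533.04, 388.98, 366.01, 589.82]),
    "n-queens (s)": (False, [226.57, 228.20, 229.64, 202.09]),
    "pgbench normal load (TPS)": (True, [110.45, 561.99, 551.94, 559.72]),
}


def improvement_pct(baseline, value, more_is_better):
    """Percent improvement over the baseline; positive means the run is better."""
    if more_is_better:
        # Throughput-style metric: a larger value is an improvement.
        return (value / baseline - 1.0) * 100.0
    # Time-style metric: a smaller value is an improvement (speedup minus one).
    return (baseline / value - 1.0) * 100.0


def summarize(results, runs=RUNS):
    """Return one formatted line per (test, non-baseline run) pair."""
    lines = []
    for test, (more_is_better, values) in results.items():
        baseline = values[0]  # the first run (4.9 orig.) is the baseline
        for run, value in zip(runs[1:], values[1:]):
            pct = improvement_pct(baseline, value, more_is_better)
            lines.append(f"{test:28s} {run:6s} {pct:+8.1f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(RESULTS)))
```

Note that the 4.9 orig. run used the interactive scaling governor and reports a different kernel, memory size and disk, so differences against that baseline may not be attributable to the compiler alone.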

FFTW

Build: Stock - Size: 2D FFT Size 2048

FFTW 3.3.4 (Mflops, more is better)
  4.9 orig.: 124.23 (SE +/- 1.07, N = 5)
  6.2.1:     138.20 (SE +/- 0.12, N = 5)
  4.9.4:     133.19 (SE +/- 0.14, N = 5)
  7.0.0:     131.13 (SE +/- 0.54, N = 5)
Additional per-run compiler flags, in export order:
  -std=gnu99 -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math
  -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -std=gnu99 -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
1. (CC) gcc options: -O3 -fipa-pta -ftree-vectorize -mfpu=neon -lm

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, more is better)
  4.9 orig.: 1184.60 (SE +/- 2.86, N = 3)
  6.2.1:     1135.10 (SE +/- 0.15, N = 3)
  4.9.4:     1238.13 (SE +/- 0.83, N = 3)
  7.0.0:     1243.23 (SE +/- 1.23, N = 3)
1. (CC) gcc options: -O3

VP8 libvpx Encoding

vpxenc

VP8 libvpx Encoding 1.3.0 (Frames Per Second, more is better)
  4.9 orig.: 4.40 (SE +/- 0.05, N = 3)
  6.2.1:     4.24 (SE +/- 0.07, N = 6)
  4.9.4:     3.88 (SE +/- 0.07, N = 3)
  7.0.0:     4.21 (SE +/- 0.08, N = 6)
Additional per-run compiler flags, in export order (three groups listed):
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
1. (CXX) g++ options: -lvpx -lgtest -lpthread -lm -O3

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 (Seconds, fewer is better)
  4.9 orig.: 533.04 (SE +/- 2.88, N = 3)
  6.2.1:     388.98 (SE +/- 1.92, N = 3)
  4.9.4:     366.01 (SE +/- 0.44, N = 3)
  7.0.0:     589.82 (SE +/- 1.31, N = 3)

C-Ray

Total Time

C-Ray 1.1 (Seconds, fewer is better)
  4.9 orig.: 258.85 (SE +/- 0.18, N = 3)
  6.2.1:     262.32 (SE +/- 0.70, N = 3)
  4.9.4:     268.73 (SE +/- 0.62, N = 3)
  7.0.0:     261.82 (SE +/- 0.25, N = 3)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects -ffast-math
  -mcpu=cortex-a5 -marm
  -mcpu=cortex-a5 -marm
  -mcpu=cortex-a5 -marm -flto -ffat-lto-objects
1. (CC) gcc options: -lm -lpthread -O3 -fipa-pta -ftree-vectorize -mfpu=neon -fomit-frame-pointer

Primesieve

1e12 Prime Number Generation

Primesieve 5.4.2 (Seconds, fewer is better)
  4.9 orig.: 994.89 (SE +/- 19.94, N = 3)
  6.2.1:     995.14 (SE +/- 17.44, N = 3)
  4.9.4:     1042.52 (SE +/- 23.33, N = 4)
  7.0.0:     928.50 (SE +/- 11.02, N = 3)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects -mthumb
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
1. (CXX) g++ options: -O3 -fipa-pta -fopenmp

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 (Seconds, fewer is better)
  4.9 orig.: 323 (SE +/- 0.00, N = 3)
  6.2.1:     308 (SE +/- 0.33, N = 3)
  4.9.4:     312 (SE +/- 1.53, N = 3)
  7.0.0:     307 (SE +/- 0.33, N = 3)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
1. (CXX) g++ options: -fopenmp -O3 -fipa-pta

Stockfish

Total Time

Stockfish 2014-11-26 (ms, fewer is better)
  4.9 orig.: 46435 (SE +/- 794.66, N = 4)
  6.2.1:     54082 (SE +/- 102.22, N = 3)
  4.9.4:     55482 (SE +/- 1183.39, N = 6)
  7.0.0:     52711 (SE +/- 204.86, N = 3)
Additional per-run compiler flags, in export order (three groups listed):
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -fipa-pta -mfpu=neon -ftree-vectorize -ffat-lto-objects
1. (CXX) g++ options: -lpthread -fno-exceptions -fno-rtti -ansi -pedantic -O3 -flto

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.1 (Seconds, fewer is better)
  4.9 orig.: 174.16 (SE +/- 1.15, N = 5)
  6.2.1:     171.15 (SE +/- 2.37, N = 5)
  4.9.4:     170.55 (SE +/- 1.55, N = 5)
  7.0.0:     172.25 (SE +/- 1.90, N = 5)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects -ffast-math
  -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
1. (CXX) g++ options: -O3 -fipa-pta -ftree-vectorize -mfpu=neon -fvisibility=hidden -logg -lm

FFmpeg

H.264 HD To NTSC DV

FFmpeg 2.6.2 (Seconds, fewer is better)
  4.9 orig.: 213.49 (SE +/- 1.05, N = 3)
  6.2.1:     225.11 (SE +/- 0.08, N = 3)
  4.9.4:     226.66 (SE +/- 1.04, N = 3)
  7.0.0:     223.02 (SE +/- 0.75, N = 3)
Additional per-run compiler flags, in export order:
  -ljack -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math -mthumb
  -mcpu=cortex-a5 -marm -mfpu=neon
  -mcpu=cortex-a5 -marm -mfpu=neon
  -mcpu=cortex-a5 -marm -mfpu=neon -flto -ffat-lto-objects
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -lXv -lX11 -lXext -lxcb -lxcb-shm -lxcb-xfixes -lxcb-render -lxcb-shape -lasound -lSDL -lm -llzma -lbz2 -pthread -O3 -fipa-pta -ftree-vectorize -march=armv7-a -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

N-Queens

Elapsed Time

N-Queens 1.0 (Seconds, fewer is better)
  4.9 orig.: 226.57 (SE +/- 0.18, N = 3)
  6.2.1:     228.20 (SE +/- 0.12, N = 3)
  4.9.4:     229.64 (SE +/- 1.21, N = 3)
  7.0.0:     202.09 (SE +/- 0.00, N = 3)
Additional per-run compiler flags, in export order (three groups listed):
  -mfloat-abi=hard -ffat-lto-objects -ffast-math
  -mcpu=cortex-a5 -marm
  -mcpu=cortex-a5 -marm
1. (CC) gcc options: -static -fopenmp -O3 -fipa-pta -ftree-vectorize -mfpu=neon -fomit-frame-pointer

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Write

PostgreSQL pgbench 9.4.3 (TPS, more is better)
  4.9 orig.: 110.45 (SE +/- 31.02, N = 6)
  6.2.1:     561.99 (SE +/- 9.60, N = 3)
  4.9.4:     551.94 (SE +/- 7.92, N = 3)
  7.0.0:     559.72 (SE +/- 8.38, N = 4)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -fipa-pta -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Write

PostgreSQL pgbench 9.4.3 (TPS, more is better)
  4.9 orig.: 100.78 (SE +/- 3.92, N = 6)
  6.2.1:     249.46 (SE +/- 1.35, N = 3)
  4.9.4:     244.61 (SE +/- 1.51, N = 3)
  7.0.0:     253.25 (SE +/- 0.05, N = 3)
Additional per-run compiler flags, in export order:
  -mfloat-abi=hard -flto -ffat-lto-objects
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize
  -mcpu=cortex-a5 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize -flto -ffat-lto-objects
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -fipa-pta -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Redis

Test: GET

Redis 3.0.1 (Requests Per Second, more is better)
  4.9 orig.: 164362.15 (SE +/- 1099.43, N = 3)
  6.2.1:     171361.76 (SE +/- 385.21, N = 3)
  4.9.4:     178988.49 (SE +/- 2715.81, N = 5)
  7.0.0:     153872.77 (SE +/- 2538.14, N = 6)
Additional per-run compiler flags, in export order:
  -std=gnu99 -pipe -g3 -funroll-loops -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O3 -fipa-pta -ftree-vectorize -mfpu=neon

Redis

Test: SET

Redis 3.0.1 (Requests Per Second, more is better)
  4.9 orig.: 123463.07 (SE +/- 1276.03, N = 3)
  6.2.1:     119440.02 (SE +/- 3005.56, N = 6)
  4.9.4:     126638.55 (SE +/- 1096.12, N = 3)
  7.0.0:     110578.01 (SE +/- 2166.40, N = 3)
Additional per-run compiler flags, in export order:
  -std=gnu99 -pipe -g3 -funroll-loops -mfloat-abi=hard -flto -fuse-linker-plugin -ffat-lto-objects -ffast-math
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer
  -O2 -mcpu=cortex-a5 -marm -fomit-frame-pointer -flto -ffat-lto-objects
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl -O3 -fipa-pta -ftree-vectorize -mfpu=neon


Phoronix Test Suite v10.8.4