ARM64 gcc codegen comparison
gcc 5.4/6.3/7.0 benchmarks running on a Cortex-A53


gcc5 A57 vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc5 thunderx vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc5 A72 LTO:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A53:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A53 mtune/vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A57 vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 A53 vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 thunderx vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 A53 vectorize LTO:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240


TTSIOD 3D Renderer 2.3a
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
gcc5 A57 vectorize ....... 22.48 |===========================================
gcc5 thunderx vectorize .. 21.86 |=========================================
gcc5 A72 LTO ............. 22.57 |===========================================
gcc6 A53 ................. 22.01 |==========================================
gcc6 A53 mtune/vectorize . 21.99 |==========================================
gcc6 A57 vectorize ....... 21.71 |=========================================
gcc7 A53 vectorize ....... 23.16 |============================================
gcc7 thunderx vectorize .. 23.01 |============================================
gcc7 A53 vectorize LTO ... 23.77 |=============================================


GMPbench 0.2
Total Time
GMPbench Score > Higher Is Better
gcc5 A57 vectorize ....... 554.31 |============================================
gcc5 thunderx vectorize .. 554.44 |============================================
gcc5 A72 LTO ............. 549.96 |============================================
gcc6 A53 ................. 554.94 |============================================
gcc6 A53 mtune/vectorize . 553.02 |============================================
gcc6 A57 vectorize ....... 555.05 |============================================
gcc7 A53 vectorize ....... 552.84 |============================================
gcc7 thunderx vectorize .. 554.83 |============================================
gcc7 A53 vectorize LTO ... 554.37 |============================================


Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec > Higher Is Better
gcc5 A57 vectorize ....... 3052.83 |=========================================
gcc5 thunderx vectorize .. 3045.83 |=========================================
gcc5 A72 LTO ............. 3048.70 |=========================================
gcc6 A53 ................. 3129.50 |==========================================
gcc6 A53 mtune/vectorize . 3125.77 |==========================================
gcc6 A57 vectorize ....... 3123.23 |==========================================
gcc7 A53 vectorize ....... 3212.10 |===========================================
gcc7 thunderx vectorize .. 3210.20 |===========================================
gcc7 A53 vectorize LTO ... 3213.77 |===========================================


RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Integer
MB/s > Higher Is Better
gcc5 A57 vectorize ....... 4614.26 |=========================================
gcc5 thunderx vectorize .. 4472.85 |========================================
gcc5 A72 LTO ............. 2916.64 |==========================
gcc6 A53 ................. 4847.38 |===========================================
gcc6 A53 mtune/vectorize . 4812.66 |===========================================
gcc6 A57 vectorize ....... 4621.04 |=========================================
gcc7 A53 vectorize ....... 4581.32 |=========================================
gcc7 thunderx vectorize .. 2821.43 |=========================
gcc7 A53 vectorize LTO ... 4829.91 |===========================================


RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Floating Point
MB/s > Higher Is Better
gcc5 A57 vectorize ....... 4613.36 |=========================================
gcc5 thunderx vectorize .. 4497.17 |========================================
gcc5 A72 LTO ............. 2917.54 |==========================
gcc6 A53 ................. 4844.14 |===========================================
gcc6 A53 mtune/vectorize . 4809.44 |===========================================
gcc6 A57 vectorize ....... 4624.88 |=========================================
gcc7 A53 vectorize ....... 4580.39 |=========================================
gcc7 thunderx vectorize .. 2817.45 |=========================
gcc7 A53 vectorize LTO ... 4825.13 |===========================================


FFTW 3.3.4
Build: Stock - Size: 2D FFT Size 2048
Mflops > Higher Is Better
gcc5 A57 vectorize ....... 189.29 |==========================================
gcc5 thunderx vectorize .. 193.88 |===========================================
gcc5 A72 LTO ............. 185.55 |=========================================
gcc6 A53 ................. 172.73 |=======================================
gcc6 A53 mtune/vectorize . 175.09 |=======================================
gcc6 A57 vectorize ....... 164.61 |=====================================
gcc7 A53 vectorize ....... 196.90 |============================================
gcc7 thunderx vectorize .. 190.63 |===========================================
gcc7 A53 vectorize LTO ... 180.53 |========================================


Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
gcc5 A57 vectorize ....... 311665.33 |=======================================
gcc5 thunderx vectorize .. 303529.91 |======================================
gcc5 A72 LTO ............. 305506.49 |=======================================
gcc6 A53 ................. 315587.06 |========================================
gcc6 A53 mtune/vectorize . 317672.24 |========================================
gcc6 A57 vectorize ....... 324752.05 |=========================================
gcc7 A53 vectorize ....... 310344.73 |=======================================
gcc7 thunderx vectorize .. 318926.02 |========================================
gcc7 A53 vectorize LTO ... 311785.02 |=======================================


OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
gcc5 A57 vectorize ....... 21.30 |=============================================
gcc5 thunderx vectorize .. 21.20 |============================================
gcc5 A72 LTO ............. 21.20 |============================================
gcc6 A53 ................. 21.30 |=============================================
gcc6 A53 mtune/vectorize . 21.23 |============================================
gcc6 A57 vectorize ....... 21.23 |============================================
gcc7 A53 vectorize ....... 21.50 |=============================================
gcc7 thunderx vectorize .. 21.50 |=============================================
gcc7 A53 vectorize LTO ... 21.50 |=============================================


PostMark 1.51
Disk Transaction Performance
TPS > Higher Is Better
gcc5 A57 vectorize ....... 1361 |=============================================
gcc5 thunderx vectorize .. 1351 |=============================================
gcc5 A72 LTO ............. 1363 |=============================================
gcc6 A53 ................. 1378 |==============================================
gcc6 A53 mtune/vectorize . 1378 |==============================================
gcc6 A57 vectorize ....... 1356 |=============================================
gcc7 A53 vectorize ....... 1363 |=============================================
gcc7 thunderx vectorize .. 1351 |=============================================
gcc7 A53 vectorize LTO ... 1378 |==============================================


Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 34.01 |==========================================
gcc5 thunderx vectorize .. 34.78 |===========================================
gcc5 A72 LTO ............. 34.62 |===========================================
gcc6 A53 ................. 36.10 |=============================================
gcc6 A53 mtune/vectorize . 34.23 |===========================================
gcc6 A57 vectorize ....... 34.94 |============================================
gcc7 A53 vectorize ....... 35.42 |============================================
gcc7 thunderx vectorize .. 34.46 |===========================================
gcc7 A53 vectorize LTO ... 33.16 |=========================================


C-Ray 1.1
Total Time
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 150.16 |==============================
gcc5 thunderx vectorize .. 152.96 |==============================
gcc5 A72 LTO ............. 223.05 |============================================
gcc6 A53 ................. 200.00 |=======================================
gcc6 A53 mtune/vectorize . 199.00 |=======================================
gcc6 A57 vectorize ....... 144.49 |=============================
gcc7 A53 vectorize ....... 187.97 |=====================================
gcc7 thunderx vectorize .. 149.82 |==============================
gcc7 A53 vectorize LTO ... 184.81 |====================================


Primesieve 5.4.2
1e12 Prime Number Generation
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 574.61 |=========================================
gcc5 thunderx vectorize .. 591.80 |===========================================
gcc5 A72 LTO ............. 604.68 |============================================
gcc6 A53 ................. 610.66 |============================================
gcc6 A53 mtune/vectorize . 592.00 |===========================================
gcc6 A57 vectorize ....... 571.71 |=========================================
gcc7 A53 vectorize ....... 543.16 |=======================================
gcc7 thunderx vectorize .. 566.21 |=========================================
gcc7 A53 vectorize LTO ... 540.95 |=======================================


Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 173 |===============================================
gcc5 thunderx vectorize .. 171 |==============================================
gcc5 A72 LTO ............. 172 |===============================================
gcc6 A53 ................. 169 |==============================================
gcc6 A53 mtune/vectorize . 169 |==============================================
gcc6 A57 vectorize ....... 168 |==============================================
gcc7 A53 vectorize ....... 167 |=============================================
gcc7 thunderx vectorize .. 167 |=============================================
gcc7 A53 vectorize LTO ... 168 |==============================================


Sudokut 0.4
Total Time
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 102.05 |===========================================
gcc5 thunderx vectorize .. 103.63 |============================================
gcc5 A72 LTO ............. 102.65 |============================================
gcc6 A53 ................. 101.99 |===========================================
gcc6 A53 mtune/vectorize . 101.61 |===========================================
gcc6 A57 vectorize ....... 101.99 |===========================================
gcc7 A53 vectorize ....... 101.95 |===========================================
gcc7 thunderx vectorize .. 102.75 |============================================
gcc7 A53 vectorize LTO ... 101.75 |===========================================


Tachyon 0.98.9
Total Time
Seconds < Lower Is Better
gcc5 A57 vectorize ....... 79.47 |============================================
gcc5 thunderx vectorize .. 81.49 |=============================================
gcc5 A72 LTO ............. 82.03 |=============================================
gcc6 A53 ................. 71.65 |=======================================
gcc6 A53 mtune/vectorize . 71.82 |=======================================
gcc6 A57 vectorize ....... 76.94 |==========================================
gcc7 A53 vectorize ....... 69.27 |======================================
gcc7 thunderx vectorize .. 71.41 |=======================================
gcc7 A53 vectorize LTO ... 67.64 |=====================================
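The charts mix "Higher Is Better" throughput metrics with "Lower Is Better" run times, so comparing configurations across tests is easier after converting each result to a percent gain over one reference build. The sketch below does that for two of the tests above, using the C-Ray 1.1 (seconds) and TTSIOD 3D Renderer 2.3a (FPS) values copied from this result file. Treating "gcc6 A53" as the reference build is an assumption chosen only because its label lists no extra options, and the helper name `relative_gain` is hypothetical; neither comes from the original results.

```python
"""Sketch: percent gain of each build configuration over a reference build.

Values are copied from the C-Ray 1.1 and TTSIOD 3D Renderer 2.3a results.
The "gcc6 A53" reference and the helper name are illustrative assumptions.
"""


def relative_gain(results, baseline, higher_is_better):
    """Return {config: percent gain over baseline}; positive means faster."""
    base = results[baseline]
    gains = {}
    for name, value in results.items():
        # Throughput (FPS, MB/s): a bigger number is better, so value/base.
        # Run time (seconds): invert the ratio so positive still means faster.
        ratio = value / base if higher_is_better else base / value
        gains[name] = (ratio - 1.0) * 100.0
    return gains


C_RAY_SECONDS = {  # Seconds < Lower Is Better
    "gcc5 A57 vectorize": 150.16,
    "gcc5 thunderx vectorize": 152.96,
    "gcc5 A72 LTO": 223.05,
    "gcc6 A53": 200.00,
    "gcc6 A53 mtune/vectorize": 199.00,
    "gcc6 A57 vectorize": 144.49,
    "gcc7 A53 vectorize": 187.97,
    "gcc7 thunderx vectorize": 149.82,
    "gcc7 A53 vectorize LTO": 184.81,
}

TTSIOD_FPS = {  # FPS > Higher Is Better
    "gcc5 A57 vectorize": 22.48,
    "gcc5 thunderx vectorize": 21.86,
    "gcc5 A72 LTO": 22.57,
    "gcc6 A53": 22.01,
    "gcc6 A53 mtune/vectorize": 21.99,
    "gcc6 A57 vectorize": 21.71,
    "gcc7 A53 vectorize": 23.16,
    "gcc7 thunderx vectorize": 23.01,
    "gcc7 A53 vectorize LTO": 23.77,
}

c_ray_gain = relative_gain(C_RAY_SECONDS, "gcc6 A53", higher_is_better=False)
ttsiod_gain = relative_gain(TTSIOD_FPS, "gcc6 A53", higher_is_better=True)

if __name__ == "__main__":
    print(f"{'configuration':26s} {'C-Ray':>8s} {'TTSIOD':>8s}")
    for name in C_RAY_SECONDS:
        print(f"{name:26s} {c_ray_gain[name]:+7.1f}% {ttsiod_gain[name]:+7.1f}%")
```

For example, gcc6 A57 vectorize finishes C-Ray in 144.49 s against 200.00 s for gcc6 A53, roughly 38% faster, while gcc5 A72 LTO at 223.05 s is roughly 10% slower. Inverting the ratio for time-based tests keeps the sign convention the same for every benchmark.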