ARM64 gcc codegen comparison

gcc 5.4/6.3/7.0 benchmarks running on a Cortex-A53

gcc5 A72 LTO:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc5 thunderx vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 thunderx vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 A53 vectorize:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc7 A53 vectorize LTO:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A53:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A53 mtune/vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc6 A57 vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 6.3.0 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

gcc5 A57 vectorize:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 5.4.0 20160609 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

C-Ray 1.1
Total Time
Seconds < Lower Is Better
gcc5 A72 LTO ............. 223.05 |============================================
gcc5 thunderx vectorize .. 152.96 |==============================
gcc7 thunderx vectorize .. 149.82 |==============================
gcc7 A53 vectorize ....... 187.97 |=====================================
gcc7 A53 vectorize LTO ... 184.81 |====================================
gcc6 A53 ................. 200.00 |=======================================
gcc6 A53 mtune/vectorize . 199.00 |=======================================
gcc6 A57 vectorize ....... 144.49 |=============================
gcc5 A57 vectorize ....... 150.16 |==============================

FFTW 3.3.4
Build: Stock - Size: 2D FFT Size 2048
Mflops > Higher Is Better
gcc5 A72 LTO ............. 185.55 |=========================================
gcc5 thunderx vectorize .. 193.88 |===========================================
gcc7 thunderx vectorize .. 190.63 |===========================================
gcc7 A53 vectorize ....... 196.90 |============================================
gcc7 A53 vectorize LTO ... 180.53 |========================================
gcc6 A53 ................. 172.73 |=======================================
gcc6 A53 mtune/vectorize . 175.09 |=======================================
gcc6 A57 vectorize ....... 164.61 |=====================================
gcc5 A57 vectorize ....... 189.29 |==========================================

Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec > Higher Is Better
gcc5 A72 LTO ............. 3048.70 |=========================================
gcc5 thunderx vectorize .. 3045.83 |=========================================
gcc7 thunderx vectorize .. 3210.20 |===========================================
gcc7 A53 vectorize ....... 3212.10 |===========================================
gcc7 A53 vectorize LTO ... 3213.77 |===========================================
gcc6 A53 ................. 3129.50 |==========================================
gcc6 A53 mtune/vectorize . 3125.77 |==========================================
gcc6 A57 vectorize ....... 3123.23 |==========================================
gcc5 A57 vectorize ....... 3052.83 |=========================================

GMPbench 0.2
Total Time
GMPbench Score > Higher Is Better
gcc5 A72 LTO ............. 549.96 |============================================
gcc5 thunderx vectorize .. 554.44 |============================================
gcc7 thunderx vectorize .. 554.83 |============================================
gcc7 A53 vectorize ....... 552.84 |============================================
gcc7 A53 vectorize LTO ... 554.37 |============================================
gcc6 A53 ................. 554.94 |============================================
gcc6 A53 mtune/vectorize . 553.02 |============================================
gcc6 A57 vectorize ....... 555.05 |============================================
gcc5 A57 vectorize ....... 554.31 |============================================

OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
gcc5 A72 LTO ............. 21.20 |============================================
gcc5 thunderx vectorize .. 21.20 |============================================
gcc7 thunderx vectorize .. 21.50 |=============================================
gcc7 A53 vectorize ....... 21.50 |=============================================
gcc7 A53 vectorize LTO ... 21.50 |=============================================
gcc6 A53 ................. 21.30 |=============================================
gcc6 A53 mtune/vectorize . 21.23 |============================================
gcc6 A57 vectorize ....... 21.23 |============================================
gcc5 A57 vectorize ....... 21.30 |=============================================

PostMark 1.51
Disk Transaction Performance
TPS > Higher Is Better
gcc5 A72 LTO ............. 1363 |=============================================
gcc5 thunderx vectorize .. 1351 |=============================================
gcc7 thunderx vectorize .. 1351 |=============================================
gcc7 A53 vectorize ....... 1363 |=============================================
gcc7 A53 vectorize LTO ... 1378 |==============================================
gcc6 A53 ................. 1378 |==============================================
gcc6 A53 mtune/vectorize . 1378 |==============================================
gcc6 A57 vectorize ....... 1356 |=============================================
gcc5 A57 vectorize ....... 1361 |=============================================

Primesieve 5.4.2
1e12 Prime Number Generation
Seconds < Lower Is Better
gcc5 A72 LTO ............. 604.68 |============================================
gcc5 thunderx vectorize .. 591.80 |===========================================
gcc7 thunderx vectorize .. 566.21 |=========================================
gcc7 A53 vectorize ....... 543.16 |=======================================
gcc7 A53 vectorize LTO ... 540.95 |=======================================
gcc6 A53 ................. 610.66 |============================================
gcc6 A53 mtune/vectorize . 592.00 |===========================================
gcc6 A57 vectorize ....... 571.71 |=========================================
gcc5 A57 vectorize ....... 574.61 |=========================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Integer
MB/s > Higher Is Better
gcc5 A72 LTO ............. 2916.64 |==========================
gcc5 thunderx vectorize .. 4472.85 |========================================
gcc7 thunderx vectorize .. 2821.43 |=========================
gcc7 A53 vectorize ....... 4581.32 |=========================================
gcc7 A53 vectorize LTO ... 4829.91 |===========================================
gcc6 A53 ................. 4847.38 |===========================================
gcc6 A53 mtune/vectorize . 4812.66 |===========================================
gcc6 A57 vectorize ....... 4621.04 |=========================================
gcc5 A57 vectorize ....... 4614.26 |=========================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Floating Point
MB/s > Higher Is Better
gcc5 A72 LTO ............. 2917.54 |==========================
gcc5 thunderx vectorize .. 4497.17 |========================================
gcc7 thunderx vectorize .. 2817.45 |=========================
gcc7 A53 vectorize ....... 4580.39 |=========================================
gcc7 A53 vectorize LTO ... 4825.13 |===========================================
gcc6 A53 ................. 4844.14 |===========================================
gcc6 A53 mtune/vectorize . 4809.44 |===========================================
gcc6 A57 vectorize ....... 4624.88 |=========================================
gcc5 A57 vectorize ....... 4613.36 |=========================================

Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
gcc5 A72 LTO ............. 305506.49 |=======================================
gcc5 thunderx vectorize .. 303529.91 |======================================
gcc7 thunderx vectorize .. 318926.02 |========================================
gcc7 A53 vectorize ....... 310344.73 |=======================================
gcc7 A53 vectorize LTO ... 311785.02 |=======================================
gcc6 A53 ................. 315587.06 |========================================
gcc6 A53 mtune/vectorize . 317672.24 |========================================
gcc6 A57 vectorize ....... 324752.05 |=========================================
gcc5 A57 vectorize ....... 311665.33 |=======================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
gcc5 A72 LTO ............. 172 |===============================================
gcc5 thunderx vectorize .. 171 |==============================================
gcc7 thunderx vectorize .. 167 |=============================================
gcc7 A53 vectorize ....... 167 |=============================================
gcc7 A53 vectorize LTO ... 168 |==============================================
gcc6 A53 ................. 169 |==============================================
gcc6 A53 mtune/vectorize . 169 |==============================================
gcc6 A57 vectorize ....... 168 |==============================================
gcc5 A57 vectorize ....... 173 |===============================================

Sudokut 0.4
Total Time
Seconds < Lower Is Better
gcc5 A72 LTO ............. 102.65 |============================================
gcc5 thunderx vectorize .. 103.63 |============================================
gcc7 thunderx vectorize .. 102.75 |============================================
gcc7 A53 vectorize ....... 101.95 |===========================================
gcc7 A53 vectorize LTO ... 101.75 |===========================================
gcc6 A53 ................. 101.99 |===========================================
gcc6 A53 mtune/vectorize . 101.61 |===========================================
gcc6 A57 vectorize ....... 101.99 |===========================================
gcc5 A57 vectorize ....... 102.05 |===========================================

Tachyon 0.98.9
Total Time
Seconds < Lower Is Better
gcc5 A72 LTO ............. 82.03 |=============================================
gcc5 thunderx vectorize .. 81.49 |=============================================
gcc7 thunderx vectorize .. 71.41 |=======================================
gcc7 A53 vectorize ....... 69.27 |======================================
gcc7 A53 vectorize LTO ... 67.64 |=====================================
gcc6 A53 ................. 71.65 |=======================================
gcc6 A53 mtune/vectorize . 71.82 |=======================================
gcc6 A57 vectorize ....... 76.94 |==========================================
gcc5 A57 vectorize ....... 79.47 |============================================

Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds < Lower Is Better
gcc5 A72 LTO ............. 34.62 |===========================================
gcc5 thunderx vectorize .. 34.78 |===========================================
gcc7 thunderx vectorize .. 34.46 |===========================================
gcc7 A53 vectorize ....... 35.42 |============================================
gcc7 A53 vectorize LTO ... 33.16 |=========================================
gcc6 A53 ................. 36.10 |=============================================
gcc6 A53 mtune/vectorize . 34.23 |===========================================
gcc6 A57 vectorize ....... 34.94 |============================================
gcc5 A57 vectorize ....... 34.01 |==========================================

TTSIOD 3D Renderer 2.3a
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
gcc5 A72 LTO ............. 22.57 |===========================================
gcc5 thunderx vectorize .. 21.86 |=========================================
gcc7 thunderx vectorize .. 23.01 |============================================
gcc7 A53 vectorize ....... 23.16 |============================================
gcc7 A53 vectorize LTO ... 23.77 |=============================================
gcc6 A53 ................. 22.01 |==========================================
gcc6 A53 mtune/vectorize . 21.99 |==========================================
gcc6 A57 vectorize ....... 21.71 |=========================================
gcc5 A57 vectorize ....... 22.48 |===========================================