AARCH64 codegen comparison
update gcc7's performance on Cortex A53 (32kB L1)

A53 vectorize, pre-patch:
    Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
    OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

thunderx/vectorize, pre-patch:
    Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
    OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53 vectorize/LTO, pre patch:
    Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
    OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53, post patch:
    Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
    OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53 mtune/vectorize, post-patch:
    Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
    OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53 vectorize, updated:
    Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 16GB NCard + 32GB 00000
    OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170214 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

A53 vectorize, earlier build:
    Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 16GB NCard + 32GB 00000
    OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170220 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

A57 vectorize/unrolled GCC 7.0.1:
    Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 8GB NCard + 32GB 00000
    OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170322 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

A53 vectorize GCC 7.0.1:
    Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 8GB NCard + 32GB 00000
    OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170322 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

A57 vectorize/unrolled GCC 6.3:
    Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 8GB NCard + 32GB 00000
    OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 6.3.1 20170316 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

PostMark 1.51
Disk Transaction Performance
TPS > Higher Is Better
A53 vectorize, pre-patch ......... 1363 |======================================
thunderx/vectorize, pre-patch .... 1351 |=====================================
A53 vectorize/LTO, pre patch ..... 1378 |======================================
A53, post patch .................. 1381 |======================================
A53 mtune/vectorize, post-patch .. 1378 |======================================
A53 vectorize, updated ........... 1217 |=================================
A53 vectorize, earlier build ..... 1211 |=================================
A57 vectorize/unrolled GCC 7.0.1 . 1184 |=================================
A53 vectorize GCC 7.0.1 .......... 1194 |=================================
A57 vectorize/unrolled GCC 6.3 ... 1190 |=================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Integer
MB/s > Higher Is Better
A53 vectorize, pre-patch ......... 4581.32 |================================
thunderx/vectorize, pre-patch .... 2821.43 |====================
A53 vectorize/LTO, pre patch ..... 4829.91 |==================================
A53, post patch .................. 4965.06 |===================================
A53 mtune/vectorize, post-patch .. 4955.97 |===================================
A53 vectorize, updated ........... 4706.40 |=================================
A53 vectorize, earlier build ..... 4816.69 |==================================
A57 vectorize/unrolled GCC 7.0.1 . 4201.53 |==============================
A53 vectorize GCC 7.0.1 .......... 4161.09 |=============================
A57 vectorize/unrolled GCC 6.3 ... 4384.53 |===============================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Floating Point
MB/s > Higher Is Better
A53 vectorize, pre-patch ......... 4580.39 |================================
thunderx/vectorize, pre-patch .... 2817.45 |====================
A53 vectorize/LTO, pre patch ..... 4825.13 |==================================
A53, post patch .................. 4964.66 |===================================
A53 mtune/vectorize, post-patch .. 4965.60 |===================================
A53 vectorize, updated ........... 4785.59 |==================================
A53 vectorize, earlier build ..... 4816.71 |==================================
A57 vectorize/unrolled GCC 7.0.1 . 4193.97 |==============================
A53 vectorize GCC 7.0.1 .......... 4188.18 |==============================
A57 vectorize/unrolled GCC 6.3 ... 4388.73 |===============================

FFTW 3.3.4
Build: Stock - Size: 2D FFT Size 2048
Mflops > Higher Is Better
A53 vectorize, pre-patch ......... 196.90 |====================================
thunderx/vectorize, pre-patch .... 190.63 |===================================
A53 vectorize/LTO, pre patch ..... 180.53 |=================================
A53, post patch .................. 186.21 |==================================
A53 mtune/vectorize, post-patch .. 184.81 |==================================
A53 vectorize, updated ........... 185.15 |==================================
A53 vectorize, earlier build ..... 191.54 |===================================
A57 vectorize/unrolled GCC 7.0.1 . 173.03 |================================
A53 vectorize GCC 7.0.1 .......... 156.72 |=============================
A57 vectorize/unrolled GCC 6.3 ... 157.97 |=============================

Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds < Lower Is Better
A53 vectorize, pre-patch ......... 35.42 |=====================================
thunderx/vectorize, pre-patch .... 34.46 |====================================
A53 vectorize/LTO, pre patch ..... 33.16 |===================================
A53, post patch .................. 33.06 |==================================
A53 mtune/vectorize, post-patch .. 32.17 |==================================
A53 vectorize, updated ........... 33.90 |===================================
A53 vectorize, earlier build ..... 34.52 |====================================
A57 vectorize/unrolled GCC 7.0.1 . 34.22 |====================================
A53 vectorize GCC 7.0.1 .......... 35.52 |=====================================
A57 vectorize/unrolled GCC 6.3 ... 35.47 |=====================================

GMPbench 0.2
Total Time
GMPbench Score > Higher Is Better
A53 vectorize, pre-patch ......... 552.84 |====================================
thunderx/vectorize, pre-patch .... 554.83 |====================================
A53 vectorize/LTO, pre patch ..... 554.37 |====================================
A53, post patch .................. 552.56 |====================================
A53 mtune/vectorize, post-patch .. 555.10 |====================================
A53 vectorize, updated ........... 554.11 |====================================
A53 vectorize, earlier build ..... 554.21 |====================================
A57 vectorize/unrolled GCC 7.0.1 . 553.17 |====================================
A53 vectorize GCC 7.0.1 .......... 552.75 |====================================
A57 vectorize/unrolled GCC 6.3 ... 554.03 |====================================

Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec > Higher Is Better
A53 vectorize, pre-patch ......... 3212.10 |=================================
thunderx/vectorize, pre-patch .... 3210.20 |=================================
A53 vectorize/LTO, pre patch ..... 3213.77 |=================================
A53, post patch .................. 3209.67 |=================================
A53 mtune/vectorize, post-patch .. 3205.40 |=================================
A53 vectorize, updated ........... 3223.57 |=================================
A53 vectorize, earlier build ..... 3233.77 |=================================
A57 vectorize/unrolled GCC 7.0.1 . 3398.47 |===================================
A53 vectorize GCC 7.0.1 .......... 3415.07 |===================================
A57 vectorize/unrolled GCC 6.3 ... 3325.60 |==================================

TTSIOD 3D Renderer 2.3a
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
A53 vectorize, pre-patch ......... 23.16 |====================================
thunderx/vectorize, pre-patch .... 23.01 |====================================
A53 vectorize/LTO, pre patch ..... 23.77 |=====================================
A53, post patch .................. 23.47 |=====================================
A53 mtune/vectorize, post-patch .. 23.49 |=====================================
A53 vectorize, updated ........... 23.29 |====================================
A53 vectorize, earlier build ..... 23.49 |=====================================
A57 vectorize/unrolled GCC 7.0.1 . 23.56 |=====================================
A53 vectorize GCC 7.0.1 .......... 23.13 |====================================
A57 vectorize/unrolled GCC 6.3 ... 22.29 |===================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ......... 187.97 |====================================
thunderx/vectorize, pre-patch .... 149.82 |=============================
A53 vectorize/LTO, pre patch ..... 184.81 |===================================
A53, post patch .................. 186.69 |====================================
A53 mtune/vectorize, post-patch .. 186.61 |====================================
A53 vectorize, updated ........... 161.80 |===============================
A53 vectorize, earlier build ..... 162.23 |===============================
A57 vectorize/unrolled GCC 7.0.1 . 154.81 |==============================
A53 vectorize GCC 7.0.1 .......... 151.02 |=============================
A57 vectorize/unrolled GCC 6.3 ... 149.61 |=============================

Primesieve 5.4.2
1e12 Prime Number Generation
Seconds < Lower Is Better
A53 vectorize, pre-patch ......... 543.16 |==================================
thunderx/vectorize, pre-patch .... 566.21 |===================================
A53 vectorize/LTO, pre patch ..... 540.95 |==================================
A53, post patch .................. 553.13 |===================================
A53 mtune/vectorize, post-patch .. 573.13 |====================================
A53 vectorize, updated ........... 574.65 |====================================
A53 vectorize, earlier build ..... 523.43 |=================================
A57 vectorize/unrolled GCC 7.0.1 . 525.00 |=================================
A53 vectorize GCC 7.0.1 .......... 547.12 |==================================
A57 vectorize/unrolled GCC 6.3 ... 531.95 |=================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
A53 vectorize, pre-patch ......... 167 |=======================================
thunderx/vectorize, pre-patch .... 167 |=======================================
A53 vectorize/LTO, pre patch ..... 168 |=======================================
A53, post patch .................. 168 |=======================================
A53 mtune/vectorize, post-patch .. 167 |=======================================
A53 vectorize, updated ........... 166 |======================================
A53 vectorize, earlier build ..... 166 |======================================
A57 vectorize/unrolled GCC 7.0.1 . 168 |=======================================
A53 vectorize GCC 7.0.1 .......... 166 |======================================
A57 vectorize/unrolled GCC 6.3 ... 169 |=======================================

Sudokut 0.4
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ......... 101.95 |====================================
thunderx/vectorize, pre-patch .... 102.75 |====================================
A53 vectorize/LTO, pre patch ..... 101.75 |====================================
A53, post patch .................. 101.88 |====================================
A53 mtune/vectorize, post-patch .. 102.17 |====================================
A53 vectorize, updated ........... 102.72 |====================================
A53 vectorize, earlier build ..... 102.59 |====================================
A57 vectorize/unrolled GCC 7.0.1 . 103.04 |====================================
A53 vectorize GCC 7.0.1 .......... 102.97 |====================================
A57 vectorize/unrolled GCC 6.3 ... 103.00 |====================================

Tachyon 0.98.9
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 69.27 |====================================
thunderx/vectorize, pre-patch ... 71.41 |======================================
A53 vectorize/LTO, pre patch .... 67.64 |====================================
A53, post patch ................. 69.40 |====================================
A53 mtune/vectorize, post-patch . 69.34 |====================================
A53 vectorize, updated .......... 69.90 |=====================================
A53 vectorize, earlier build .... 69.39 |====================================
A53 vectorize GCC 7.0.1 ......... 69.64 |=====================================
A57 vectorize/unrolled GCC 6.3 .. 72.28 |======================================

OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
A53 vectorize, pre-patch ......... 21.50 |=====================================
thunderx/vectorize, pre-patch .... 21.50 |=====================================
A53 vectorize/LTO, pre patch ..... 21.50 |=====================================
A53, post patch .................. 21.50 |=====================================
A53 mtune/vectorize, post-patch .. 21.50 |=====================================
A53 vectorize, updated ........... 21.40 |=====================================
A53 vectorize, earlier build ..... 21.40 |=====================================
A57 vectorize/unrolled GCC 7.0.1 . 21.47 |=====================================
A53 vectorize GCC 7.0.1 .......... 21.47 |=====================================
A57 vectorize/unrolled GCC 6.3 ... 21.20 |====================================

Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
A53 vectorize, pre-patch ......... 310344.73 |================================
thunderx/vectorize, pre-patch .... 318926.02 |=================================
A53 vectorize/LTO, pre patch ..... 311785.02 |================================
A53, post patch .................. 309030.64 |================================
A53 mtune/vectorize, post-patch .. 313438.91 |================================
A53 vectorize, updated ........... 277268.23 |=============================
A53 vectorize, earlier build ..... 283742.83 |=============================
A57 vectorize/unrolled GCC 7.0.1 . 276169.86 |=============================
A53 vectorize GCC 7.0.1 .......... 276298.44 |=============================
A57 vectorize/unrolled GCC 6.3 ... 275458.70 |=============================