Cortex A53 GCC7 codegen comparison

Benchmarking the effect of d8c4c75 ARM patch

A53 vectorize, pre-patch:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

thunderx/vectorize, pre-patch:

  Processor: AArch64 rev 4 @ 1.50GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53 vectorize/LTO, pre patch:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170110 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53, post patch:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53 mtune/vectorize, post-patch:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.0 20170113 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A53/clang 3.8:

  Processor: AArch64 rev 4 @ 1.55GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: Clang 3.8.0-2ubuntu4 + GCC 5.4.0 20160609 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

A72 vectorize:

  Processor: AArch64 rev 4 @ 2.00GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 16GB NCard + 32GB 00000

  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), Compiler: GCC 7.0.1 20170127 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1920x3240

thunderx mtune:

  Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 16GB NCard + 32GB 00000

  OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170214 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440

A53 vectorize, updated:

  Processor: Unknown @ 1.54GHz (4 Cores), Motherboard: Amlogic, Memory: 2048MB, Disk: 16GB NCard + 32GB 00000

  OS: Ubuntu 16.04, Kernel: 3.14.79-vegas95 (aarch64), Compiler: GCC 7.0.1 20170214 + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0, File-System: ext4, Screen Resolution: 1280x1440


PostMark 1.51
Disk Transaction Performance
TPS > Higher Is Better
A53 vectorize, pre-patch ........ 1363 |======================================
thunderx/vectorize, pre-patch ... 1351 |======================================
A53 vectorize/LTO, pre patch .... 1378 |======================================
A53, post patch ................. 1381 |======================================
A53 mtune/vectorize, post-patch . 1378 |======================================
A53/clang 3.8 ................... 1358 |======================================
A72 vectorize ................... 1399 |=======================================
thunderx mtune .................. 1225 |==================================
A53 vectorize, updated .......... 1217 |==================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Integer
MB/s > Higher Is Better
A53 vectorize, pre-patch ........ 4581.32 |=================================
thunderx/vectorize, pre-patch ... 2821.43 |====================
A53 vectorize/LTO, pre patch .... 4829.91 |===================================
A53, post patch ................. 4965.06 |====================================
A53 mtune/vectorize, post-patch . 4955.97 |====================================
A72 vectorize ................... 4769.37 |===================================
thunderx mtune .................. 2604.10 |===================
A53 vectorize, updated .......... 4706.40 |==================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Floating Point
MB/s > Higher Is Better
A53 vectorize, pre-patch ........ 4580.39 |=================================
thunderx/vectorize, pre-patch ... 2817.45 |====================
A53 vectorize/LTO, pre patch .... 4825.13 |===================================
A53, post patch ................. 4964.66 |====================================
A53 mtune/vectorize, post-patch . 4965.60 |====================================
A72 vectorize ................... 4862.68 |===================================
thunderx mtune .................. 2606.30 |===================
A53 vectorize, updated .......... 4785.59 |===================================

FFTW 3.3.4
Build: Stock - Size: 2D FFT Size 2048
Mflops > Higher Is Better
A53 vectorize, pre-patch ........ 196.90 |=====================================
thunderx/vectorize, pre-patch ... 190.63 |====================================
A53 vectorize/LTO, pre patch .... 180.53 |==================================
A53, post patch ................. 186.21 |===================================
A53 mtune/vectorize, post-patch . 184.81 |===================================
A53/clang 3.8 ................... 184.62 |===================================
A72 vectorize ................... 175.61 |=================================
thunderx mtune .................. 175.46 |=================================
A53 vectorize, updated .......... 185.15 |===================================

Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 35.42 |======================================
thunderx/vectorize, pre-patch ... 34.46 |=====================================
A53 vectorize/LTO, pre patch .... 33.16 |====================================
A53, post patch ................. 33.06 |===================================
A53 mtune/vectorize, post-patch . 32.17 |===================================
A53/clang 3.8 ................... 27.87 |==============================
A72 vectorize ................... 33.02 |===================================
thunderx mtune .................. 31.89 |==================================
A53 vectorize, updated .......... 33.90 |====================================

GMPbench 0.2
Total Time
GMPbench Score > Higher Is Better
A53 vectorize, pre-patch ........ 552.84 |=====================================
thunderx/vectorize, pre-patch ... 554.83 |=====================================
A53 vectorize/LTO, pre patch .... 554.37 |=====================================
A53, post patch ................. 552.56 |=====================================
A53 mtune/vectorize, post-patch . 555.10 |=====================================
A72 vectorize ................... 554.06 |=====================================
thunderx mtune .................. 550.57 |=====================================
A53 vectorize, updated .......... 554.11 |=====================================

Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec > Higher Is Better
A53 vectorize, pre-patch ........ 3212.10 |==================================
thunderx/vectorize, pre-patch ... 3210.20 |==================================
A53 vectorize/LTO, pre patch .... 3213.77 |==================================
A53, post patch ................. 3209.67 |==================================
A53 mtune/vectorize, post-patch . 3205.40 |==================================
A53/clang 3.8 ................... 3358.40 |====================================
A72 vectorize ................... 3206.00 |==================================
thunderx mtune .................. 3189.40 |==================================
A53 vectorize, updated .......... 3223.57 |===================================

TTSIOD 3D Renderer 2.3a
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
A53 vectorize, pre-patch ........ 23.16 |=====================================
thunderx/vectorize, pre-patch ... 23.01 |=====================================
A53 vectorize/LTO, pre patch .... 23.77 |======================================
A53, post patch ................. 23.47 |======================================
A53 mtune/vectorize, post-patch . 23.49 |======================================
A72 vectorize ................... 23.48 |======================================
thunderx mtune .................. 23.34 |=====================================
A53 vectorize, updated .......... 23.29 |=====================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 187.97 |============================
thunderx/vectorize, pre-patch ... 149.82 |======================
A53 vectorize/LTO, pre patch .... 184.81 |===========================
A53, post patch ................. 186.69 |============================
A53 mtune/vectorize, post-patch . 186.61 |============================
A53/clang 3.8 ................... 250.77 |=====================================
A72 vectorize ................... 150.90 |======================
thunderx mtune .................. 155.28 |=======================
A53 vectorize, updated .......... 161.80 |========================

Primesieve 5.4.2
1e12 Prime Number Generation
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 543.16 |=========
thunderx/vectorize, pre-patch ... 566.21 |=========
A53 vectorize/LTO, pre patch .... 540.95 |=========
A53, post patch ................. 553.13 |=========
A53 mtune/vectorize, post-patch . 573.13 |=========
A53/clang 3.8 ................... 2256.35 |====================================
A72 vectorize ................... 549.83 |=========
thunderx mtune .................. 607.41 |==========
A53 vectorize, updated .......... 574.65 |=========

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 167 |========================================
thunderx/vectorize, pre-patch ... 167 |========================================
A53 vectorize/LTO, pre patch .... 168 |========================================
A53, post patch ................. 168 |========================================
A53 mtune/vectorize, post-patch . 167 |========================================
A72 vectorize ................... 167 |========================================
thunderx mtune .................. 167 |========================================
A53 vectorize, updated .......... 166 |========================================

Sudokut 0.4
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 101.95 |=====================================
thunderx/vectorize, pre-patch ... 102.75 |=====================================
A53 vectorize/LTO, pre patch .... 101.75 |=====================================
A53, post patch ................. 101.88 |=====================================
A53 mtune/vectorize, post-patch . 102.17 |=====================================
A53/clang 3.8 ................... 102.33 |=====================================
A72 vectorize ................... 102.24 |=====================================
thunderx mtune .................. 103.09 |=====================================
A53 vectorize, updated .......... 102.72 |=====================================

Tachyon 0.98.9
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 69.27 |===================================
thunderx/vectorize, pre-patch ... 71.41 |====================================
A53 vectorize/LTO, pre patch .... 67.64 |==================================
A53, post patch ................. 69.40 |===================================
A53 mtune/vectorize, post-patch . 69.34 |===================================
A53/clang 3.8 ................... 75.96 |======================================
A72 vectorize ................... 71.96 |====================================
thunderx mtune .................. 70.91 |===================================
A53 vectorize, updated .......... 69.90 |===================================

OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
A53 vectorize, pre-patch ........ 21.50 |======================================
thunderx/vectorize, pre-patch ... 21.50 |======================================
A53 vectorize/LTO, pre patch .... 21.50 |======================================
A53, post patch ................. 21.50 |======================================
A53 mtune/vectorize, post-patch . 21.50 |======================================
A53/clang 3.8 ................... 20.70 |=====================================
A72 vectorize ................... 21.47 |======================================
thunderx mtune .................. 21.40 |======================================
A53 vectorize, updated .......... 21.40 |======================================

Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
A53 vectorize, pre-patch ........ 310344.73 |=================================
thunderx/vectorize, pre-patch ... 318926.02 |==================================
A53 vectorize/LTO, pre patch .... 311785.02 |=================================
A53, post patch ................. 309030.64 |=================================
A53 mtune/vectorize, post-patch . 313438.91 |=================================
A72 vectorize ................... 314705.37 |==================================
thunderx mtune .................. 277191.13 |==============================
A53 vectorize, updated .......... 277268.23 |==============================