Cortex A53 GCC7 codegen comparison

Benchmarking the effect of d8c4c75 ARM patch.

Test configurations (identical except for processor clock and GCC 7.0.0 snapshot date):

  Configuration                     Processor                           GCC 7.0.0 snapshot
  A53 vectorize, pre-patch          AArch64 rev 4 @ 1.50GHz (4 Cores)   20170110
  thunderx/vectorize, pre-patch     AArch64 rev 4 @ 1.50GHz (4 Cores)   20170110
  A53 vectorize/LTO, pre patch      AArch64 rev 4 @ 1.55GHz (4 Cores)   20170110
  A53, post patch                   AArch64 rev 4 @ 1.55GHz (4 Cores)   20170113
  A53 mtune/vectorize, post-patch   AArch64 rev 4 @ 1.55GHz (4 Cores)   20170113

All five configurations otherwise share:

  Motherboard: Amlogic, Memory: 2048MB, Disk: 32GB 00000 + 16GB NCard
  OS: Ubuntu 16.04, Kernel: 3.14.29 (aarch64), File-System: ext4, Screen Resolution: 1920x3240
  Compiler: GCC 7.0.0 (snapshot as listed above) + Clang 3.8.0-2ubuntu4 + LLVM 3.8.0

PostMark 1.51
Disk Transaction Performance
TPS > Higher Is Better
A53 vectorize, pre-patch ........ 1363 |======================================
thunderx/vectorize, pre-patch ... 1351 |======================================
A53 vectorize/LTO, pre patch .... 1378 |=======================================
A53, post patch ................. 1381 |=======================================
A53 mtune/vectorize, post-patch . 1378 |=======================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Integer
MB/s > Higher Is Better
A53 vectorize, pre-patch ........ 4581.32 |=================================
thunderx/vectorize, pre-patch ... 2821.43 |====================
A53 vectorize/LTO, pre patch .... 4829.91 |===================================
A53, post patch ................. 4965.06 |====================================
A53 mtune/vectorize, post-patch . 4955.97 |====================================

RAMspeed SMP 3.5.0
Type: Copy - Benchmark: Floating Point
MB/s > Higher Is Better
A53 vectorize, pre-patch ........ 4580.39 |=================================
thunderx/vectorize, pre-patch ... 2817.45 |====================
A53 vectorize/LTO, pre patch .... 4825.13 |===================================
A53, post patch ................. 4964.66 |====================================
A53 mtune/vectorize, post-patch . 4965.60 |====================================

FFTW 3.3.4
Build: Stock - Size: 2D FFT Size 2048
Mflops > Higher Is Better
A53 vectorize, pre-patch ........ 196.90 |=====================================
thunderx/vectorize, pre-patch ... 190.63 |====================================
A53 vectorize/LTO, pre patch .... 180.53 |==================================
A53, post patch ................. 186.21 |===================================
A53 mtune/vectorize, post-patch . 184.81 |===================================

Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 35.42 |======================================
thunderx/vectorize, pre-patch ... 34.46 |=====================================
A53 vectorize/LTO, pre patch .... 33.16 |====================================
A53, post patch ................. 33.06 |===================================
A53 mtune/vectorize, post-patch . 32.17 |===================================

GMPbench 0.2
Total Time
GMPbench Score > Higher Is Better
A53 vectorize, pre-patch ........ 552.84 |=====================================
thunderx/vectorize, pre-patch ... 554.83 |=====================================
A53 vectorize/LTO, pre patch .... 554.37 |=====================================
A53, post patch ................. 552.56 |=====================================
A53 mtune/vectorize, post-patch . 555.10 |=====================================

Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec > Higher Is Better
A53 vectorize, pre-patch ........ 3212.10 |====================================
thunderx/vectorize, pre-patch ... 3210.20 |====================================
A53 vectorize/LTO, pre patch .... 3213.77 |====================================
A53, post patch ................. 3209.67 |====================================
A53 mtune/vectorize, post-patch . 3205.40 |====================================

TTSIOD 3D Renderer 2.3a
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
A53 vectorize, pre-patch ........ 23.16 |=====================================
thunderx/vectorize, pre-patch ... 23.01 |=====================================
A53 vectorize/LTO, pre patch .... 23.77 |======================================
A53, post patch ................. 23.47 |======================================
A53 mtune/vectorize, post-patch . 23.49 |======================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 187.97 |=====================================
thunderx/vectorize, pre-patch ... 149.82 |=============================
A53 vectorize/LTO, pre patch .... 184.81 |====================================
A53, post patch ................. 186.69 |=====================================
A53 mtune/vectorize, post-patch . 186.61 |=====================================

Primesieve 5.4.2
1e12 Prime Number Generation
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 543.16 |===================================
thunderx/vectorize, pre-patch ... 566.21 |=====================================
A53 vectorize/LTO, pre patch .... 540.95 |===================================
A53, post patch ................. 553.13 |====================================
A53 mtune/vectorize, post-patch . 573.13 |=====================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 167 |========================================
thunderx/vectorize, pre-patch ... 167 |========================================
A53 vectorize/LTO, pre patch .... 168 |========================================
A53, post patch ................. 168 |========================================
A53 mtune/vectorize, post-patch . 167 |========================================

Sudokut 0.4
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 101.95 |=====================================
thunderx/vectorize, pre-patch ... 102.75 |=====================================
A53 vectorize/LTO, pre patch .... 101.75 |=====================================
A53, post patch ................. 101.88 |=====================================
A53 mtune/vectorize, post-patch . 102.17 |=====================================

Tachyon 0.98.9
Total Time
Seconds < Lower Is Better
A53 vectorize, pre-patch ........ 69.27 |=====================================
thunderx/vectorize, pre-patch ... 71.41 |======================================
A53 vectorize/LTO, pre patch .... 67.64 |====================================
A53, post patch ................. 69.40 |=====================================
A53 mtune/vectorize, post-patch . 69.34 |=====================================

OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
A53 vectorize, pre-patch ........ 21.50 |======================================
thunderx/vectorize, pre-patch ... 21.50 |======================================
A53 vectorize/LTO, pre patch .... 21.50 |======================================
A53, post patch ................. 21.50 |======================================
A53 mtune/vectorize, post-patch . 21.50 |======================================

Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
A53 vectorize, pre-patch ........ 310344.73 |=================================
thunderx/vectorize, pre-patch ... 318926.02 |==================================
A53 vectorize/LTO, pre patch .... 311785.02 |=================================
A53, post patch ................. 309030.64 |=================================
A53 mtune/vectorize, post-patch . 313438.91 |=================================
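The tables above report absolute scores, and half of them are time-based (lower is better). To compare configurations on a common footing, the following Python sketch converts a subset of the figures into percent improvement over the "A53 vectorize, pre-patch" run. Treating that run as the baseline, and expressing time-based results as a speedup (baseline time / new time - 1), are illustrative assumptions, not part of the original comparison; the function names are hypothetical.

```python
"""Percent change of each configuration versus 'A53 vectorize, pre-patch'.

Values are copied from a subset of the result tables. Using the first run as
the baseline is an illustrative assumption, not the original author's method.
"""

CONFIGS = [
    "A53 vectorize, pre-patch",
    "thunderx/vectorize, pre-patch",
    "A53 vectorize/LTO, pre patch",
    "A53, post patch",
    "A53 mtune/vectorize, post-patch",
]

# (test name, higher_is_better, results listed in CONFIGS order)
RESULTS = [
    ("PostMark TPS", True, [1363, 1351, 1378, 1381, 1378]),
    ("RAMspeed Copy Integer", True, [4581.32, 2821.43, 4829.91, 4965.06, 4955.97]),
    ("RAMspeed Copy FP", True, [4580.39, 2817.45, 4825.13, 4964.66, 4965.60]),
    ("FFTW 2D 2048", True, [196.90, 190.63, 180.53, 186.21, 184.81]),
    ("MAFFT", False, [35.42, 34.46, 33.16, 33.06, 32.17]),
    ("C-Ray", False, [187.97, 149.82, 184.81, 186.69, 186.61]),
    ("Primesieve 1e12", False, [543.16, 566.21, 540.95, 553.13, 573.13]),
    ("Tachyon", False, [69.27, 71.41, 67.64, 69.40, 69.34]),
    ("Redis GET", True, [310344.73, 318926.02, 311785.02, 309030.64, 313438.91]),
]


def improvement_pct(baseline: float, value: float, higher_is_better: bool) -> float:
    """Percent improvement of value over baseline; positive means better."""
    if higher_is_better:
        return (value / baseline - 1.0) * 100.0
    # Time-based tests: express the gain as a speedup (baseline time / new time).
    return (baseline / value - 1.0) * 100.0


def relative_table() -> dict:
    """Map each test name to its per-configuration improvement over CONFIGS[0]."""
    return {
        name: [improvement_pct(values[0], v, higher) for v in values]
        for name, higher, values in RESULTS
    }


if __name__ == "__main__":
    print("columns: " + " | ".join(CONFIGS))
    for name, deltas in relative_table().items():
        cells = "  ".join(f"{d:+6.1f}%" for d in deltas)
        print(f"{name:<22} {cells}")
```

Running the script prints one row per test with the five configurations as columns. For instance, "A53, post patch" comes out roughly 8.4% ahead of the baseline on RAMspeed integer copy, while the thunderx-tuned build is roughly 38% behind on that test yet roughly 25% faster on C-Ray.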