AArch64 Compiler Benchmarks Feb 2018

Ampere eMAG ARMv8 compiler benchmarking with GCC and Clang for a future article on Phoronix.

GCC 8.2.0:

  Processor: Ampere eMAG ARMv8 @ 3.00GHz (32 Cores), Motherboard: AmpereComputing OSPREY (4.8.19 BIOS), Chipset: Applied Micro Circuits X-Gene, Memory: 129024MB, Disk: 256GB Samsung SSD 860, Graphics: ASPEED Family, Network: Intel I210

  OS: Fedora 29, Kernel: 4.20.6-200.fc29.aarch64 (aarch64) 20190131, Compiler: GCC 8.2.0, File-System: xfs, Screen Resolution: 1024x768

GCC 9.0.1:

  Processor: Ampere eMAG ARMv8 @ 3.00GHz (32 Cores), Motherboard: AmpereComputing OSPREY (4.8.19 BIOS), Chipset: Applied Micro Circuits X-Gene, Memory: 129024MB, Disk: 256GB Samsung SSD 860, Graphics: ASPEED Family, Network: Intel I210

  OS: Fedora 29, Kernel: 4.20.6-200.fc29.aarch64 (aarch64) 20190131, Compiler: GCC 9.0.1 20190203, File-System: xfs, Screen Resolution: 1024x768

Clang 7.0.1:

  Processor: Ampere eMAG ARMv8 @ 3.00GHz (32 Cores), Motherboard: AmpereComputing OSPREY (4.8.19 BIOS), Chipset: Applied Micro Circuits X-Gene, Memory: 129024MB, Disk: 256GB Samsung SSD 860, Graphics: ASPEED Family, Network: Intel I210

  OS: Fedora 29, Kernel: 4.20.6-200.fc29.aarch64 (aarch64) 20190131, Compiler: Clang 7.0.1 + LLVM 7.0.1, File-System: xfs, Screen Resolution: 1024x768

Clang 8.0.0-rc2:

  Processor: Ampere eMAG ARMv8 @ 3.00GHz (32 Cores), Motherboard: AmpereComputing OSPREY (4.8.19 BIOS), Chipset: Applied Micro Circuits X-Gene, Memory: 129024MB, Disk: 256GB Samsung SSD 860, Graphics: ASPEED Family, Network: Intel I210

  OS: Fedora 29, Kernel: 4.20.6-200.fc29.aarch64 (aarch64) 20190131, Compiler: Clang 8.0.0 + LLVM 8.0.0, File-System: xfs, Screen Resolution: 1024x768

t-test1 2017-01-13
Threads: 1
Seconds < Lower Is Better
GCC 8.2.0 ....... 85.77
GCC 9.0.1 ....... 85.50
Clang 7.0.1 ..... 85.48
Clang 8.0.0-rc2 . 85.49

t-test1 2017-01-13
Threads: 2
Seconds < Lower Is Better
GCC 8.2.0 ....... 32.46
GCC 9.0.1 ....... 32.37
Clang 7.0.1 ..... 32.24
Clang 8.0.0-rc2 . 32.28

lzbench 2017-08-08
Test: XZ 0 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 15
GCC 9.0.1 . 15

lzbench 2017-08-08
Test: XZ 0 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 65
GCC 9.0.1 . 64

lzbench 2017-08-08
Test: Zstd 1 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 155
GCC 9.0.1 . 154

lzbench 2017-08-08
Test: Zstd 1 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 483
GCC 9.0.1 . 492

lzbench 2017-08-08
Test: Brotli 0 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 152
GCC 9.0.1 . 152

lzbench 2017-08-08
Test: Brotli 0 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 215
GCC 9.0.1 . 215

lzbench 2017-08-08
Test: Libdeflate 1 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 64
GCC 9.0.1 . 64

lzbench 2017-08-08
Test: Libdeflate 1 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 484
GCC 9.0.1 . 471

Timed MAFFT Alignment 7.392
Multiple Sequence Alignment
Seconds < Lower Is Better
GCC 8.2.0 ....... 7.40
GCC 9.0.1 ....... 7.08
Clang 7.0.1 ..... 7.01
Clang 8.0.0-rc2 . 6.97

CacheBench
Test: Read
MB/s > Higher Is Better
GCC 8.2.0 ....... 4571
GCC 9.0.1 ....... 4571
Clang 7.0.1 ..... 4571
Clang 8.0.0-rc2 . 4571

CacheBench
Test: Write
MB/s > Higher Is Better
GCC 8.2.0 ....... 10391
GCC 9.0.1 ....... 10376
Clang 7.0.1 ..... 17475
Clang 8.0.0-rc2 . 17824

CacheBench
Test: Read / Modify / Write
MB/s > Higher Is Better
GCC 8.2.0 ....... 13558
GCC 9.0.1 ....... 13548
Clang 7.0.1 ..... 27461
Clang 8.0.0-rc2 . 27462

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
GCC 8.2.0 ....... 706
GCC 9.0.1 ....... 711
Clang 7.0.1 ..... 600
Clang 8.0.0-rc2 . 606

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
GCC 8.2.0 ....... 279
GCC 9.0.1 ....... 279
Clang 7.0.1 ..... 240
Clang 8.0.0-rc2 . 240

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
GCC 8.2.0 ....... 176
GCC 9.0.1 ....... 177
Clang 7.0.1 ..... 163
Clang 8.0.0-rc2 . 168

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
GCC 8.2.0 ....... 748
GCC 9.0.1 ....... 755
Clang 7.0.1 ..... 561
Clang 8.0.0-rc2 . 568

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
GCC 8.2.0 ....... 1138
GCC 9.0.1 ....... 1156
Clang 7.0.1 ..... 1140
Clang 8.0.0-rc2 . 1158

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
GCC 8.2.0 ....... 1188
GCC 9.0.1 ....... 1188
Clang 7.0.1 ..... 895
Clang 8.0.0-rc2 . 896

GraphicsMagick 1.3.30
Operation: Swirl
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 139
GCC 9.0.1 ....... 139
Clang 7.0.1 ..... 23
Clang 8.0.0-rc2 . 23

GraphicsMagick 1.3.30
Operation: Rotate
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 141
GCC 9.0.1 ....... 140
Clang 7.0.1 ..... 144
Clang 8.0.0-rc2 . 143

GraphicsMagick 1.3.30
Operation: Sharpen
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 128
GCC 9.0.1 ....... 128
Clang 7.0.1 ..... 10
Clang 8.0.0-rc2 . 10

GraphicsMagick 1.3.30
Operation: Enhanced
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 105
GCC 9.0.1 ....... 105
Clang 7.0.1 ..... 7
Clang 8.0.0-rc2 . 7

GraphicsMagick 1.3.30
Operation: Resizing
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 151
GCC 9.0.1 ....... 148
Clang 7.0.1 ..... 45
Clang 8.0.0-rc2 . 46

GraphicsMagick 1.3.30
Operation: Noise-Gaussian
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 101
GCC 9.0.1 ....... 103
Clang 7.0.1 ..... 7
Clang 8.0.0-rc2 . 7

GraphicsMagick 1.3.30
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ....... 160
GCC 9.0.1 ....... 159
Clang 7.0.1 ..... 52
Clang 8.0.0-rc2 . 52

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 8.2.0 ....... 806
GCC 9.0.1 ....... 812
Clang 7.0.1 ..... 582
Clang 8.0.0-rc2 . 608

7-Zip Compression 16.02
Compress Speed Test
MIPS > Higher Is Better
GCC 8.2.0 . 43730
GCC 9.0.1 . 45171

Timed GCC Compilation 8.2
Time To Compile
Seconds < Lower Is Better
GCC 8.2.0 ....... 2729
GCC 9.0.1 ....... 2719
Clang 7.0.1 ..... 3761
Clang 8.0.0-rc2 . 3667

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
GCC 8.2.0 ....... 86.01
GCC 9.0.1 ....... 94.42
Clang 7.0.1 ..... 64.65
Clang 8.0.0-rc2 . 63.50

Timed LLVM Compilation 6.0.1
Time To Compile
Seconds < Lower Is Better
GCC 8.2.0 ....... 621
GCC 9.0.1 ....... 638
Clang 7.0.1 ..... 496
Clang 8.0.0-rc2 . 517

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
GCC 8.2.0 ....... 78.42
GCC 9.0.1 ....... 78.58
Clang 7.0.1 ..... 156.13
Clang 8.0.0-rc2 . 156.18

Parallel BZIP2 Compression 1.1.12
256MB File Compression
Seconds < Lower Is Better
GCC 8.2.0 . 4.58
GCC 9.0.1 . 4.66

Smallpt 1.0
Global Illumination Renderer; 128 Samples
Seconds < Lower Is Better
GCC 8.2.0 . 16.79
GCC 9.0.1 . 17.06

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
GCC 8.2.0 ....... 109
GCC 9.0.1 ....... 110
Clang 7.0.1 ..... 141
Clang 8.0.0-rc2 . 141

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
GCC 8.2.0 ....... 19.40
GCC 9.0.1 ....... 19.54
Clang 7.0.1 ..... 20.37
Clang 8.0.0-rc2 . 20.15

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
GCC 8.2.0 ....... 23.81
GCC 9.0.1 ....... 23.81
Clang 7.0.1 ..... 25.03
Clang 8.0.0-rc2 . 25.04

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
GCC 8.2.0 ....... 28.15
GCC 9.0.1 ....... 28.39
Clang 7.0.1 ..... 29.10
Clang 8.0.0-rc2 . 29.05

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
GCC 8.2.0 ....... 11.43
GCC 9.0.1 ....... 11.36
Clang 7.0.1 ..... 12.16
Clang 8.0.0-rc2 . 12.12

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
GCC 8.2.0 ....... 3.81
GCC 9.0.1 ....... 3.81
Clang 7.0.1 ..... 3.96
Clang 8.0.0-rc2 . 3.97

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
GCC 8.2.0 ....... 5.64
GCC 9.0.1 ....... 5.64
Clang 7.0.1 ..... 5.84
Clang 8.0.0-rc2 . 5.83

XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
GCC 8.2.0 ....... 173
GCC 9.0.1 ....... 179
Clang 7.0.1 ..... 184
Clang 8.0.0-rc2 . 185

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
GCC 8.2.0 ....... 22.41
GCC 9.0.1 ....... 22.63
Clang 7.0.1 ..... 23.81
Clang 8.0.0-rc2 . 23.96

dav1d 0.1
Video Input: Summer Nature 4K
Seconds < Lower Is Better
GCC 8.2.0 ....... 184
GCC 9.0.1 ....... 181
Clang 7.0.1 ..... 193
Clang 8.0.0-rc2 . 192

dav1d 0.1
Video Input: Summer Nature 1080p
Seconds < Lower Is Better
GCC 8.2.0 ....... 68.73
GCC 9.0.1 ....... 68.23
Clang 7.0.1 ..... 69.81
Clang 8.0.0-rc2 . 69.89

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
GCC 8.2.0 ....... 51.04
GCC 9.0.1 ....... 50.37
Clang 7.0.1 ..... 60.27
Clang 8.0.0-rc2 . 60.89

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
GCC 8.2.0 ....... 29.38
GCC 9.0.1 ....... 30.46
Clang 7.0.1 ..... 30.57
Clang 8.0.0-rc2 . 33.67

FFmpeg 4.0.2
H.264 HD To NTSC DV
Seconds < Lower Is Better
GCC 8.2.0 ....... 36.63
GCC 9.0.1 ....... 36.38
Clang 7.0.1 ..... 36.19
Clang 8.0.0-rc2 . 35.91

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
GCC 8.2.0 ....... 2364
GCC 9.0.1 ....... 2368
Clang 7.0.1 ..... 2333
Clang 8.0.0-rc2 . 2336

libjpeg-turbo tjbench 1.5.3
Test: Decompression Throughput
Megapixels/sec > Higher Is Better
GCC 8.2.0 ....... 62.27
GCC 9.0.1 ....... 67.70
Clang 7.0.1 ..... 67.19
Clang 8.0.0-rc2 . 67.80

CppPerformanceBenchmarks 9
Test: Atol
Seconds < Lower Is Better
GCC 8.2.0 ....... 143
GCC 9.0.1 ....... 142
Clang 7.0.1 ..... 143
Clang 8.0.0-rc2 . 142

CppPerformanceBenchmarks 9
Test: Ctype
Seconds < Lower Is Better
GCC 8.2.0 ....... 85.70
GCC 9.0.1 ....... 84.93
Clang 7.0.1 ..... 83.06
Clang 8.0.0-rc2 . 82.15

CppPerformanceBenchmarks 9
Test: Stepanov Vector
Seconds < Lower Is Better
GCC 8.2.0 ....... 170
GCC 9.0.1 ....... 170
Clang 7.0.1 ..... 166
Clang 8.0.0-rc2 . 164

CppPerformanceBenchmarks 9
Test: Function Objects
Seconds < Lower Is Better
GCC 8.2.0 ....... 28.97
GCC 9.0.1 ....... 29.02
Clang 7.0.1 ..... 28.89
Clang 8.0.0-rc2 . 29.10

CppPerformanceBenchmarks 9
Test: Stepanov Abstraction
Seconds < Lower Is Better
GCC 8.2.0 ....... 65.48
GCC 9.0.1 ....... 65.62
Clang 7.0.1 ..... 64.43
Clang 8.0.0-rc2 . 64.52

Redis 4.0.8
Test: LPOP
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 303369
GCC 9.0.1 ....... 335430
Clang 7.0.1 ..... 326052
Clang 8.0.0-rc2 . 347380

Redis 4.0.8
Test: SADD
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 312119
GCC 9.0.1 ....... 316431
Clang 7.0.1 ..... 297902
Clang 8.0.0-rc2 . 296772

Redis 4.0.8
Test: LPUSH
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 234390
GCC 9.0.1 ....... 222818
Clang 7.0.1 ..... 229757
Clang 8.0.0-rc2 . 227467

Redis 4.0.8
Test: GET
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 335269
GCC 9.0.1 ....... 327054
Clang 7.0.1 ..... 343826
Clang 8.0.0-rc2 . 343477

Redis 4.0.8
Test: SET
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 272504
GCC 9.0.1 ....... 282048
Clang 7.0.1 ..... 277030
Clang 8.0.0-rc2 . 272753

Sysbench 2018-07-28
Test: Memory
Events Per Second > Higher Is Better
GCC 8.2.0 ....... 56788453
GCC 9.0.1 ....... 56299123
Clang 7.0.1 ..... 55343486
Clang 8.0.0-rc2 . 53144563

Sysbench 2018-07-28
Test: CPU
Events Per Second > Higher Is Better
GCC 8.2.0 ....... 40086
GCC 9.0.1 ....... 40294
Clang 7.0.1 ..... 40976
Clang 8.0.0-rc2 . 40963

Xsbench 2017-07-06
Lookups/s > Higher Is Better
GCC 8.2.0 . 2748697
GCC 9.0.1 . 2442397

Memcached mcperf 1.5.10
Method: Add
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 6809
GCC 9.0.1 ....... 6688
Clang 7.0.1 ..... 6855
Clang 8.0.0-rc2 . 6872

Memcached mcperf 1.5.10
Method: Get
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 17329
GCC 9.0.1 ....... 17128
Clang 7.0.1 ..... 17213
Clang 8.0.0-rc2 . 17233

Memcached mcperf 1.5.10
Method: Set
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 6781
GCC 9.0.1 ....... 6613
Clang 7.0.1 ..... 6843
Clang 8.0.0-rc2 . 6756

Memcached mcperf 1.5.10
Method: Append
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 6835
GCC 9.0.1 ....... 6810
Clang 7.0.1 ..... 6836
Clang 8.0.0-rc2 . 6861

Memcached mcperf 1.5.10
Method: Delete
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 16987
GCC 9.0.1 ....... 17311
Clang 7.0.1 ..... 17375
Clang 8.0.0-rc2 . 17263

Memcached mcperf 1.5.10
Method: Prepend
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 6957
GCC 9.0.1 ....... 6901
Clang 7.0.1 ..... 6874
Clang 8.0.0-rc2 . 6754

Memcached mcperf 1.5.10
Method: Replace
Operations Per Second > Higher Is Better
GCC 8.2.0 ....... 6776
GCC 9.0.1 ....... 6933
Clang 7.0.1 ..... 6919
Clang 8.0.0-rc2 . 6794

Apache Benchmark 2.4.29
Static Web Page Serving
Requests Per Second > Higher Is Better
GCC 8.2.0 ....... 2967
GCC 9.0.1 ....... 2964
Clang 7.0.1 ..... 2945
Clang 8.0.0-rc2 . 2960

Apache Siege 2.4.29
Concurrent Users: 200
Transactions Per Second > Higher Is Better
GCC 8.2.0 ....... 23431
GCC 9.0.1 ....... 23156
Clang 7.0.1 ..... 23215
Clang 8.0.0-rc2 . 23508

Apache Siege 2.4.29
Concurrent Users: 250
Transactions Per Second > Higher Is Better
GCC 8.2.0 ....... 23373
GCC 9.0.1 ....... 23455
Clang 7.0.1 ..... 23473
Clang 8.0.0-rc2 . 23382