POWER9 Talos II Compiler Benchmarks

POWER9 compiler benchmarking for a future article on Phoronix.

GCC 8.2.0:

  Processor: POWER9 @ 3.80GHz (44 Cores / 176 Threads), Motherboard: PowerNV T2P9D01 REV 1.01, Memory: 65536MB, Disk: Samsung SSD 960 EVO 500GB + 2000GB Seagate ST2000DM006-2DM1, Graphics: ASPEED Family, Network: 2 x Broadcom NetXtreme BCM5719 PCIe
  OS: Ubuntu 19.04, Kernel: 4.18.0-11-generic (ppc64le), Compiler: GCC 8.2.0 + clang (GCC) 8.2.0, File-System: ext4, Screen Resolution: 1024x768

GCC 9.0.1:

  Processor: POWER9 @ 3.80GHz (44 Cores / 176 Threads), Motherboard: PowerNV T2P9D01 REV 1.01, Memory: 65536MB, Disk: Samsung SSD 960 EVO 500GB + 2000GB Seagate ST2000DM006-2DM1, Graphics: ASPEED Family, Network: 2 x Broadcom NetXtreme BCM5719 PCIe
  OS: Ubuntu 19.04, Kernel: 4.18.0-11-generic (ppc64le), Compiler: GCC 9.0.1 20190203 + clang (GCC) 9.0.1 20190203 (experimental), File-System: ext4, Screen Resolution: 1024x768

Clang 7.0.1:

  Processor: POWER9 @ 3.80GHz (44 Cores / 176 Threads), Motherboard: PowerNV T2P9D01 REV 1.01, Memory: 65536MB, Disk: Samsung SSD 960 EVO 500GB + 2000GB Seagate ST2000DM006-2DM1, Graphics: ASPEED Family, Network: 2 x Broadcom NetXtreme BCM5719 PCIe
  OS: Ubuntu 19.04, Kernel: 4.18.0-11-generic (ppc64le), Compiler: Clang 7.0.1 + LLVM 7.0.1, File-System: ext4, Screen Resolution: 1024x768

Clang 8.0.0-rc:

  Processor: POWER9 @ 3.80GHz (44 Cores / 176 Threads), Motherboard: PowerNV T2P9D01 REV 1.01, Memory: 65536MB, Disk: Samsung SSD 960 EVO 500GB + 2000GB Seagate ST2000DM006-2DM1, Graphics: ASPEED Family, Network: 2 x Broadcom NetXtreme BCM5719 PCIe
  OS: Ubuntu 19.04, Kernel: 4.18.0-11-generic (ppc64le), Compiler: Clang 8.0.0 + LLVM 8.0.0, File-System: ext4, Screen Resolution: 1024x768

t-test1 2017-01-13
Threads: 1
Seconds < Lower Is Better
GCC 8.2.0 ...... 17.71 |======================================================
GCC 9.0.1 ...... 17.80 |=======================================================
Clang 7.0.1 .... 17.92 |=======================================================
Clang 8.0.0-rc . 17.83 |=======================================================

t-test1 2017-01-13
Threads: 2
Seconds < Lower Is Better
GCC 8.2.0 ...... 6.65 |====================================================
GCC 9.0.1 ...... 6.99 |=======================================================
Clang 7.0.1 .... 6.88 |======================================================
Clang 8.0.0-rc . 7.18 |========================================================

lzbench 2017-08-08
Test: XZ 0 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 21 |============================================================
GCC 9.0.1 . 22 |===============================================================

lzbench 2017-08-08
Test: XZ 0 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 70 |=============================================================
GCC 9.0.1 . 72 |===============================================================

lzbench 2017-08-08
Test: Zstd 1 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 286 |==============================================================
GCC 8.2.0 . 285 |==============================================================
GCC 9.0.1 . 280 |=============================================================

lzbench 2017-08-08
Test: Zstd 1 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 800 |=============================================================
GCC 8.2.0 . 790 |============================================================
GCC 9.0.1 . 813 |==============================================================

lzbench 2017-08-08
Test: Brotli 0 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 322 |==============================================================
GCC 9.0.1 . 317 |=============================================================

lzbench 2017-08-08
Test: Brotli 0 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 292 |=============================================================
GCC 9.0.1 . 299 |==============================================================

lzbench 2017-08-08
Test: Libdeflate 1 - Process: Compression
MB/s > Higher Is Better
GCC 8.2.0 . 115 |==============================================================
GCC 9.0.1 . 115 |==============================================================

lzbench 2017-08-08
Test: Libdeflate 1 - Process: Decompression
MB/s > Higher Is Better
GCC 8.2.0 . 353 |=============================================================
GCC 9.0.1 . 356 |==============================================================

Timed MAFFT Alignment 7.392
Multiple Sequence Alignment
Seconds < Lower Is Better
GCC 8.2.0 ...... 4.51 |========================================================
GCC 9.0.1 ...... 3.79 |===============================================
Clang 7.0.1 .... 4.39 |=======================================================
Clang 8.0.0-rc . 4.24 |=====================================================

CacheBench
Test: Read
MB/s > Higher Is Better
GCC 8.2.0 ...... 4782 |=======================================================
GCC 9.0.1 ...... 4897 |========================================================
Clang 7.0.1 .... 4750 |======================================================
Clang 8.0.0-rc . 4750 |======================================================

CacheBench
Test: Write
MB/s > Higher Is Better
GCC 8.2.0 ...... 11925 |===============
GCC 9.0.1 ...... 11147 |==============
Clang 7.0.1 .... 44459 |=======================================================
Clang 8.0.0-rc . 44754 |=======================================================

CacheBench
Test: Read / Modify / Write
MB/s > Higher Is Better
GCC 8.2.0 ...... 21154 |====================
GCC 9.0.1 ...... 19793 |==================
Clang 7.0.1 .... 58691 |=======================================================
Clang 8.0.0-rc . 58871 |=======================================================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
GCC 8.2.0 ...... 1255 |============================================
GCC 9.0.1 ...... 1173 |=========================================
Clang 7.0.1 .... 1585 |========================================================
Clang 8.0.0-rc . 1593 |========================================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
GCC 8.2.0 ...... 161 |========================
GCC 9.0.1 ...... 153 |=======================
Clang 7.0.1 .... 348 |====================================================
Clang 8.0.0-rc . 378 |=========================================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
GCC 8.2.0 ...... 307 |=========================================================
GCC 9.0.1 ...... 308 |=========================================================
Clang 7.0.1 .... 305 |========================================================
Clang 8.0.0-rc . 305 |========================================================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
GCC 8.2.0 ...... 1119 |==================================
GCC 9.0.1 ...... 1116 |==================================
Clang 7.0.1 .... 1724 |=====================================================
Clang 8.0.0-rc . 1829 |========================================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
GCC 8.2.0 ...... 3441 |==========================================
GCC 9.0.1 ...... 3042 |=====================================
Clang 7.0.1 .... 4605 |========================================================
Clang 8.0.0-rc . 4509 |=======================================================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
GCC 8.2.0 ...... 1248 |========================================================
GCC 9.0.1 ...... 1248 |========================================================
Clang 7.0.1 .... 940 |==========================================
Clang 8.0.0-rc . 944 |==========================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
GCC 8.2.0 ...... 714368 |=============================================
GCC 9.0.1 ...... 740293 |===============================================
Clang 7.0.1 .... 835778 |=====================================================
Clang 8.0.0-rc . 849966 |======================================================

x264 2018-09-25
H.264 Video Encoding
Frames Per Second > Higher Is Better
GCC 8.2.0 ...... 53.75 |=======================================================
GCC 9.0.1 ...... 52.15 |=====================================================
Clang 7.0.1 .... 52.34 |======================================================
Clang 8.0.0-rc . 51.98 |=====================================================

x265 3.0
H.265 1080p Video Encoding
Frames Per Second > Higher Is Better
GCC 8.2.0 . 11.30 |============================================================
GCC 9.0.1 . 11.25 |============================================================

GraphicsMagick 1.3.30
Operation: Swirl
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 164 |=========================================================
GCC 9.0.1 ...... 165 |=========================================================
Clang 7.0.1 .... 43 |===============
Clang 8.0.0-rc . 43 |===============

GraphicsMagick 1.3.30
Operation: Rotate
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 197 |=========================================================
GCC 9.0.1 ...... 197 |=========================================================
Clang 7.0.1 .... 159 |==============================================
Clang 8.0.0-rc . 159 |==============================================

GraphicsMagick 1.3.30
Operation: Sharpen
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 147 |=========================================================
GCC 9.0.1 ...... 148 |=========================================================
Clang 7.0.1 .... 20 |========
Clang 8.0.0-rc . 19 |=======

GraphicsMagick 1.3.30
Operation: Enhanced
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 154 |=========================================================
GCC 9.0.1 ...... 154 |=========================================================
Clang 7.0.1 .... 23 |=========
Clang 8.0.0-rc . 22 |========

GraphicsMagick 1.3.30
Operation: Resizing
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 178 |=========================================================
GCC 9.0.1 ...... 177 |=========================================================
Clang 7.0.1 .... 89 |=============================
Clang 8.0.0-rc . 88 |============================

GraphicsMagick 1.3.30
Operation: Noise-Gaussian
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 159 |=========================================================
GCC 9.0.1 ...... 157 |========================================================
Clang 7.0.1 .... 15 |=====
Clang 8.0.0-rc . 15 |=====

GraphicsMagick 1.3.30
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
GCC 8.2.0 ...... 183 |========================================================
GCC 9.0.1 ...... 185 |=========================================================
Clang 7.0.1 .... 118 |====================================
Clang 8.0.0-rc . 118 |====================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 8.2.0 ...... 618 |======================================================
GCC 9.0.1 ...... 656 |=========================================================
Clang 7.0.1 .... 594 |====================================================
Clang 8.0.0-rc . 623 |======================================================

7-Zip Compression 16.02
Compress Speed Test
MIPS > Higher Is Better
GCC 8.2.0 . 157870 |===========================================================
GCC 9.0.1 . 153692 |=========================================================

ebizzy 0.3
Records/s > Higher Is Better
GCC 8.2.0 ...... 1117793 |===================================================
GCC 9.0.1 ...... 1130592 |====================================================
Clang 7.0.1 .... 1152232 |=====================================================
Clang 8.0.0-rc . 1071048 |=================================================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
GCC 8.2.0 ...... 17.84 |=================
GCC 9.0.1 ...... 17.87 |=================
Clang 7.0.1 .... 56.23 |=======================================================
Clang 8.0.0-rc . 56.53 |=======================================================

Parallel BZIP2 Compression 1.1.12
256MB File Compression
Seconds < Lower Is Better
GCC 8.2.0 . 2.34 |=============================================================
GCC 9.0.1 . 2.26 |===========================================================

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
GCC 8.2.0 ...... 58.83 |===========================================
GCC 9.0.1 ...... 59.03 |===========================================
Clang 7.0.1 .... 74.95 |=======================================================
Clang 8.0.0-rc . 75.60 |=======================================================

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
GCC 8.2.0 ...... 4.91 |===================================================
GCC 9.0.1 ...... 4.91 |===================================================
Clang 7.0.1 .... 5.38 |========================================================
Clang 8.0.0-rc . 5.41 |========================================================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
GCC 8.2.0 ...... 6.28 |===================================================
GCC 9.0.1 ...... 6.26 |===================================================
Clang 7.0.1 .... 6.84 |========================================================
Clang 8.0.0-rc . 6.85 |========================================================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
GCC 8.2.0 ...... 7.69 |====================================================
GCC 9.0.1 ...... 7.55 |===================================================
Clang 7.0.1 .... 8.32 |========================================================
Clang 8.0.0-rc . 8.35 |========================================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
GCC 8.2.0 ...... 8.22 |=====================================================
GCC 9.0.1 ...... 8.45 |======================================================
Clang 7.0.1 .... 8.73 |========================================================
Clang 8.0.0-rc . 8.66 |========================================================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
GCC 8.2.0 ...... 4.15 |================================================
GCC 9.0.1 ...... 4.13 |================================================
Clang 7.0.1 .... 4.84 |========================================================
Clang 8.0.0-rc . 4.82 |========================================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
GCC 8.2.0 ...... 1.54 |===================================================
GCC 9.0.1 ...... 1.54 |===================================================
Clang 7.0.1 .... 1.70 |========================================================
Clang 8.0.0-rc . 1.69 |========================================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
GCC 8.2.0 ...... 1.95 |====================================================
GCC 9.0.1 ...... 1.98 |====================================================
Clang 7.0.1 .... 2.12 |========================================================
Clang 8.0.0-rc . 2.11 |========================================================

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
GCC 8.2.0 ...... 11.37 |======================================================
GCC 9.0.1 ...... 11.59 |=======================================================
Clang 7.0.1 .... 11.04 |====================================================
Clang 8.0.0-rc . 11.33 |======================================================

dav1d 0.1
Video Input: Summer Nature 4K
Seconds < Lower Is Better
GCC 8.2.0 ...... 93.23 |=================================================
GCC 9.0.1 ...... 84.58 |============================================
Clang 7.0.1 .... 101.06 |=====================================================
Clang 8.0.0-rc . 103.17 |======================================================

dav1d 0.1
Video Input: Summer Nature 1080p
Seconds < Lower Is Better
GCC 8.2.0 ...... 28.93 |=================================================
GCC 9.0.1 ...... 27.69 |===============================================
Clang 7.0.1 .... 31.38 |======================================================
Clang 8.0.0-rc . 32.19 |=======================================================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
GCC 8.2.0 ...... 44.39 |===================================================
GCC 9.0.1 ...... 39.96 |==============================================
Clang 7.0.1 .... 47.71 |=======================================================
Clang 8.0.0-rc . 47.84 |=======================================================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
GCC 8.2.0 ...... 15.73 |==============================================
GCC 9.0.1 ...... 15.73 |==============================================
Clang 7.0.1 .... 18.61 |======================================================
Clang 8.0.0-rc . 18.86 |=======================================================

m-queens 1.2
Time To Solve
Seconds < Lower Is Better
GCC 8.2.0 . 20.36 |============================================================
GCC 9.0.1 . 20.35 |============================================================

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
GCC 8.2.0 ...... 7514 |========================================================
GCC 9.0.1 ...... 7407 |=======================================================
Clang 7.0.1 .... 6958 |====================================================
Clang 8.0.0-rc . 7063 |=====================================================

libjpeg-turbo tjbench 1.5.3
Test: Decompression Throughput
Megapixels/sec > Higher Is Better
GCC 8.2.0 ...... 106 |=======================================================
GCC 9.0.1 ...... 106 |=======================================================
Clang 7.0.1 .... 109 |=========================================================
Clang 8.0.0-rc . 107 |========================================================

CppPerformanceBenchmarks 9
Test: Atol
Seconds < Lower Is Better
GCC 8.2.0 ...... 89.17 |=======================================================
GCC 9.0.1 ...... 89.24 |=======================================================
Clang 7.0.1 .... 89.75 |=======================================================
Clang 8.0.0-rc . 89.49 |=======================================================

CppPerformanceBenchmarks 9
Test: Ctype
Seconds < Lower Is Better
GCC 8.2.0 ...... 75.02 |=======================================================
GCC 9.0.1 ...... 74.86 |=======================================================
Clang 7.0.1 .... 66.25 |=================================================
Clang 8.0.0-rc . 66.34 |=================================================

CppPerformanceBenchmarks 9
Test: Stepanov Vector
Seconds < Lower Is Better
GCC 8.2.0 ...... 142 |=========================================================
GCC 9.0.1 ...... 140 |========================================================
Clang 7.0.1 .... 131 |=====================================================
Clang 8.0.0-rc . 132 |=====================================================

CppPerformanceBenchmarks 9
Test: Function Objects
Seconds < Lower Is Better
GCC 8.2.0 ...... 20.00 |=======================================================
GCC 9.0.1 ...... 19.70 |======================================================
Clang 7.0.1 .... 19.53 |======================================================
Clang 8.0.0-rc . 19.50 |======================================================

CppPerformanceBenchmarks 9
Test: Stepanov Abstraction
Seconds < Lower Is Better
GCC 8.2.0 ...... 59.98 |=======================================================
GCC 9.0.1 ...... 56.89 |====================================================
Clang 7.0.1 .... 53.04 |=================================================
Clang 8.0.0-rc . 53.24 |=================================================

Redis 4.0.8
Test: LPOP
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 1685935 |==================================================
GCC 9.0.1 ...... 1714327 |==================================================
Clang 7.0.1 .... 1804132 |=====================================================
Clang 8.0.0-rc . 1729936 |===================================================

Redis 4.0.8
Test: SADD
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 1217931 |===============================================
GCC 9.0.1 ...... 1298341 |==================================================
Clang 7.0.1 .... 1376795 |=====================================================
Clang 8.0.0-rc . 1211944 |===============================================

Redis 4.0.8
Test: LPUSH
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 791788 |=================================================
GCC 9.0.1 ...... 820878 |===================================================
Clang 7.0.1 .... 867234 |======================================================
Clang 8.0.0-rc . 836662 |====================================================

Redis 4.0.8
Test: GET
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 1604180 |=================================================
GCC 9.0.1 ...... 1618223 |==================================================
Clang 7.0.1 .... 1721346 |=====================================================
Clang 8.0.0-rc . 1666854 |===================================================

Redis 4.0.8
Test: SET
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 1017131 |================================================
GCC 9.0.1 ...... 1096671 |====================================================
Clang 7.0.1 .... 1113384 |=====================================================
Clang 8.0.0-rc . 1033115 |=================================================

Xsbench 2017-07-06
Lookups/s > Higher Is Better
GCC 8.2.0 . 5560007 |==========================================================
GCC 9.0.1 . 5486220 |=========================================================

Memcached mcperf 1.5.10
Method: Add
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 49781 |=======================================================
GCC 9.0.1 ...... 37853 |==========================================
Clang 7.0.1 .... 49578 |======================================================
Clang 8.0.0-rc . 50110 |=======================================================

Memcached mcperf 1.5.10
Method: Get
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 74015 |=======================================================
GCC 9.0.1 ...... 56285 |==========================================
Clang 7.0.1 .... 74061 |=======================================================
Clang 8.0.0-rc . 74547 |=======================================================

Memcached mcperf 1.5.10
Method: Set
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 49744 |=======================================================
GCC 9.0.1 ...... 38135 |==========================================
Clang 7.0.1 .... 49453 |=======================================================
Clang 8.0.0-rc . 49682 |=======================================================

Memcached mcperf 1.5.10
Method: Append
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 51756 |======================================================
GCC 9.0.1 ...... 39666 |==========================================
Clang 7.0.1 .... 51981 |=======================================================
Clang 8.0.0-rc . 52234 |=======================================================

Memcached mcperf 1.5.10
Method: Delete
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 74515 |=======================================================
GCC 9.0.1 ...... 56599 |==========================================
Clang 7.0.1 .... 74409 |=======================================================
Clang 8.0.0-rc . 73918 |=======================================================

Memcached mcperf 1.5.10
Method: Prepend
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 51972 |======================================================
GCC 9.0.1 ...... 39442 |=========================================
Clang 7.0.1 .... 52563 |======================================================
Clang 8.0.0-rc . 53049 |=======================================================

Memcached mcperf 1.5.10
Method: Replace
Operations Per Second > Higher Is Better
GCC 8.2.0 ...... 52559 |=======================================================
GCC 9.0.1 ...... 39540 |=========================================
Clang 7.0.1 .... 52200 |=======================================================
Clang 8.0.0-rc . 52343 |=======================================================

Hierarchical INTegration 1.0
Test: FLOAT
QUIPs > Higher Is Better
GCC 8.2.0 ...... 220611266 |=================================================
GCC 9.0.1 ...... 227407921 |===================================================
Clang 7.0.1 .... 174464419 |=======================================
Clang 8.0.0-rc . 176740207 |========================================

Hierarchical INTegration 1.0
Test: DOUBLE
QUIPs > Higher Is Better
GCC 8.2.0 ...... 535814531 |=================================================
GCC 9.0.1 ...... 552839732 |===================================================
Clang 7.0.1 .... 377718193 |===================================
Clang 8.0.0-rc . 379561299 |===================================

Apache Benchmark 2.4.29
Static Web Page Serving
Requests Per Second > Higher Is Better
GCC 8.2.0 ...... 21110 |======================================================
GCC 9.0.1 ...... 21008 |======================================================
Clang 7.0.1 .... 21209 |======================================================
Clang 8.0.0-rc . 21444 |=======================================================
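
Every result row in this export has the same shape: a configuration label, a dot leader, a value, and an ASCII bar. As a minimal sketch for comparing configurations numerically, the Python below parses one result block and reports each configuration's change relative to a chosen baseline, using the Memcached mcperf "Add" figures listed above. It is not part of the Phoronix Test Suite; the helper names `parse_rows` and `percent_change` are hypothetical, and the row pattern is an assumption inferred from this file only.

```python
import re

# Assumed row shape, inferred from this export: "<label> <dots> <value> |<bar>".
ROW = re.compile(r"^\s*(?P<label>.+?)\s+\.+\s+(?P<value>[0-9.]+)\s*\|=*\s*$")


def parse_rows(block):
    """Return {label: value} for every result row found in a text block.

    If a label occurs twice (as in the lzbench Zstd 1 results), the
    later row overwrites the earlier one in this simple sketch.
    """
    results = {}
    for line in block.splitlines():
        match = ROW.match(line)
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


def percent_change(results, baseline):
    """Percent difference of each configuration relative to the baseline."""
    base = results[baseline]
    return {
        label: (value - base) / base * 100.0
        for label, value in results.items()
        if label != baseline
    }


# Values copied from the Memcached mcperf "Method: Add" result above.
MEMCACHED_ADD = """\
GCC 8.2.0 ...... 49781 |=======================================================
GCC 9.0.1 ...... 37853 |==========================================
Clang 7.0.1 .... 49578 |======================================================
Clang 8.0.0-rc . 50110 |=======================================================
"""

if __name__ == "__main__":
    rows = parse_rows(MEMCACHED_ADD)
    for label, change in percent_change(rows, "GCC 8.2.0").items():
        print(f"{label}: {change:+.1f}% vs GCC 8.2.0")
```

The sign has to be read against each test's direction line: for "Higher Is Better" results a negative change is a regression, while for "Lower Is Better" results a negative change is an improvement.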