GCC 4.9 Linux LTO Optimizations

Link-time optimization compiler benchmarks of GCC 4.8.2 and GCC 4.9.0 RC1 by Michael Larabel for Phoronix.com. The link-time optimization results with -flto didn't turn out to be as exciting as anticipated, so they are published here for a short upcoming article on Phoronix. The benchmarks were run on an Intel Core i7 Haswell system running Ubuntu Linux.

System details

All four configurations (GCC 4.8.2 - Stock, GCC 4.8.2 - LTO, GCC 4.9.0 RC1 - Stock, GCC 4.9.0 RC1 - LTO) report the same hardware and software and differ only in the compiler:

	Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
	Motherboard: ECS Z87H3-A2X EXTREME v1.0
	Chipset: Intel 4th Gen Core DRAM
	Memory: 16384MB
	Disk: 120GB Samsung SSD 840
	Graphics: ECS NVIDIA GeForce GTX 460 768MB (675/1804MHz)
	Audio: Realtek ALC1150
	Monitor: Samsung SyncMaster
	Network: Realtek RTL8111/8168/8411

	OS: Ubuntu 14.04
	Kernel: 3.13.0-22-generic (x86_64)
	Desktop: Unity 7.2.0
	Display Server: X Server 1.15.0
	Display Driver: NVIDIA 337.12
	OpenGL: 4.3.0
	File-System: ext4
	Screen Resolution: 2560x1600

	Compiler (GCC 4.8.2 - Stock): GCC 4.8.2
	Compiler (GCC 4.8.2 - LTO): GCC 4.8.2
	Compiler (GCC 4.9.0 RC1 - Stock): GCC 4.9.0 20140411
	Compiler (GCC 4.9.0 RC1 - LTO): GCC 4.9.0 20140411

HPC Challenge 1.4.3
Test / Class: G-Ptrans
GB/s > Higher Is Better
GCC 4.8.2 - Stock ..... 1.26591 |==============================================
GCC 4.8.2 - LTO ....... 1.26956 |==============================================
GCC 4.9.0 RC1 - Stock . 1.26904 |==============================================
GCC 4.9.0 RC1 - LTO ... 1.26850 |==============================================

HPC Challenge 1.4.3
Test / Class: EP-STREAM Triad
GB/s > Higher Is Better
GCC 4.8.2 - Stock ..... 1.08895 |==============================================
GCC 4.8.2 - LTO ....... 1.08612 |==============================================
GCC 4.9.0 RC1 - Stock . 1.08175 |==============================================
GCC 4.9.0 RC1 - LTO ... 1.08126 |==============================================

HPC Challenge 1.4.3
Test / Class: G-HPL
GFLOPS > Higher Is Better
GCC 4.8.2 - Stock ..... 50.05 |================================================
GCC 4.8.2 - LTO ....... 49.79 |================================================
GCC 4.9.0 RC1 - Stock . 50.08 |================================================
GCC 4.9.0 RC1 - LTO ... 49.96 |================================================

HPC Challenge 1.4.3
Test / Class: G-Ffte
GFLOPS > Higher Is Better
GCC 4.8.2 - Stock ..... 2.11922 |==============================================
GCC 4.8.2 - LTO ....... 2.12967 |==============================================
GCC 4.9.0 RC1 - Stock . 2.11338 |==============================================
GCC 4.9.0 RC1 - LTO ... 2.13555 |==============================================

HPC Challenge 1.4.3
Test / Class: EP-DGEMM
GFLOPS > Higher Is Better
GCC 4.8.2 - Stock ..... 6.57694 |=============================================
GCC 4.8.2 - LTO ....... 6.63749 |=============================================
GCC 4.9.0 RC1 - Stock . 6.72611 |==============================================
GCC 4.9.0 RC1 - LTO ... 6.63834 |=============================================

HPC Challenge 1.4.3
Test / Class: G-Random Access
GUP/s > Higher Is Better
GCC 4.8.2 - Stock ..... 0.00979 |==============================================
GCC 4.8.2 - LTO ....... 0.00980 |==============================================
GCC 4.9.0 RC1 - Stock . 0.00980 |==============================================
GCC 4.9.0 RC1 - LTO ... 0.00989 |==============================================

GraphicsMagick 1.3.19
Operation: Blur
Iterations Per Minute > Higher Is Better
GCC 4.8.2 - Stock ..... 170 |==================================================
GCC 4.8.2 - LTO ....... 171 |==================================================
GCC 4.9.0 RC1 - Stock . 166 |=================================================
GCC 4.9.0 RC1 - LTO ... 165 |================================================

GraphicsMagick 1.3.19
Operation: Sharpen
Iterations Per Minute > Higher Is Better
GCC 4.8.2 - Stock ..... 140 |==================================================
GCC 4.8.2 - LTO ....... 141 |==================================================
GCC 4.9.0 RC1 - Stock . 140 |==================================================
GCC 4.9.0 RC1 - LTO ... 140 |==================================================

GraphicsMagick 1.3.19
Operation: Resizing
Iterations Per Minute > Higher Is Better
GCC 4.8.2 - Stock ..... 198 |==================================================
GCC 4.8.2 - LTO ....... 199 |==================================================
GCC 4.9.0 RC1 - Stock . 199 |==================================================
GCC 4.9.0 RC1 - LTO ... 198 |==================================================

GraphicsMagick 1.3.19
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
GCC 4.8.2 - Stock ..... 216 |==================================================
GCC 4.8.2 - LTO ....... 218 |==================================================
GCC 4.9.0 RC1 - Stock . 214 |=================================================
GCC 4.9.0 RC1 - LTO ... 213 |=================================================

GraphicsMagick 1.3.19
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
GCC 4.8.2 - Stock ..... 97 |================================================
GCC 4.8.2 - LTO ....... 97 |================================================
GCC 4.9.0 RC1 - Stock . 102 |==================================================
GCC 4.9.0 RC1 - LTO ... 102 |==================================================

BYTE Unix Benchmark 3.6
Computational Test: Dhrystone 2
LPS > Higher Is Better
GCC 4.8.2 - Stock ..... 35908267.37 |=========================================
GCC 4.8.2 - LTO ....... 34717223.53 |========================================
GCC 4.9.0 RC1 - Stock . 36694838.47 |==========================================
GCC 4.9.0 RC1 - LTO ... 33544122.87 |======================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 4.8.2 - Stock ..... 1810.10 |==============================================
GCC 4.8.2 - LTO ....... 1813.55 |==============================================
GCC 4.9.0 RC1 - Stock . 1828.15 |==============================================
GCC 4.9.0 RC1 - LTO ... 1825.41 |==============================================

Hierarchical INTegration 1.0
Test: FLOAT
QUIPs > Higher Is Better
GCC 4.8.2 - Stock ..... 367476347.00 |========================================
GCC 4.8.2 - LTO ....... 367633749.83 |========================================
GCC 4.9.0 RC1 - Stock . 373674384.83 |=========================================
GCC 4.9.0 RC1 - LTO ... 371516290.66 |=========================================

Apache Benchmark 2.4.7
Static Web Page Serving
Requests Per Second > Higher Is Better
GCC 4.8.2 - Stock ..... 35904.53 |=============================================
GCC 4.8.2 - LTO ....... 35521.51 |============================================
GCC 4.9.0 RC1 - Stock . 35953.73 |=============================================
GCC 4.9.0 RC1 - LTO ... 35817.51 |=============================================

ebizzy 0.3
Records/s
Seconds > Higher Is Better
GCC 4.8.2 - Stock ..... 42849 |================================================
GCC 4.8.2 - LTO ....... 42698 |================================================
GCC 4.9.0 RC1 - Stock . 42950 |================================================
GCC 4.9.0 RC1 - LTO ... 42565 |================================================

Timed Apache Compilation 2.4.7
Time To Compile
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 27.14 |============================
GCC 4.8.2 - LTO ....... 44.37 |===============================================
GCC 4.9.0 RC1 - Stock . 27.88 |=============================
GCC 4.9.0 RC1 - LTO ... 45.74 |================================================

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 60.35 |======================
GCC 4.8.2 - LTO ....... 124.51 |=============================================
GCC 4.9.0 RC1 - Stock . 60.59 |======================
GCC 4.9.0 RC1 - LTO ... 129.91 |===============================================

Timed PHP Compilation 5.2.9
Time To Compile
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 26.27 |=============
GCC 4.8.2 - LTO ....... 93.29 |===============================================
GCC 4.9.0 RC1 - Stock . 27.05 |==============
GCC 4.9.0 RC1 - LTO ... 95.42 |================================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 16.98 |================================================
GCC 4.8.2 - LTO ....... 16.98 |================================================
GCC 4.9.0 RC1 - Stock . 17.09 |================================================
GCC 4.9.0 RC1 - LTO ... 17.08 |================================================

Open Porous Media 2013-11-26
OPM Benchmark: Upscale-Relperm
Seconds < Lower Is Better
GCC 4.8.2 - Stock . 100.58 |===================================================
GCC 4.8.2 - LTO ... 100.42 |===================================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
GCC 4.8.2 - Stock . 24 |=======================================================
GCC 4.8.2 - LTO ... 24 |=======================================================

FLAC Audio Encoding 1.3.0
WAV To FLAC
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 4.82 |=================================================
GCC 4.9.0 RC1 - Stock . 3.70 |======================================
GCC 4.9.0 RC1 - LTO ... 3.72 |======================================

LAME MP3 Encoding 3.99.3
WAV To MP3
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 12.42 |=============================================
GCC 4.8.2 - LTO ....... 13.24 |================================================
GCC 4.9.0 RC1 - Stock . 10.87 |=======================================
GCC 4.9.0 RC1 - LTO ... 10.88 |=======================================

FFmpeg 2.1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
GCC 4.8.2 - Stock ..... 13.99 |================================================
GCC 4.8.2 - LTO ....... 13.80 |===============================================
GCC 4.9.0 RC1 - Stock . 13.80 |===============================================
GCC 4.9.0 RC1 - LTO ... 13.75 |===============================================

Open FMM Nero2D 2.0.2
Total Time
Seconds < Lower Is Better
GCC 4.8.2 - Stock . 453.45 |==================================================
GCC 4.8.2 - LTO ... 458.86 |===================================================
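
The clearest LTO effect in these results is on build time rather than run time. As a rough illustration, the sketch below computes how many times longer each LTO build took than the matching stock build, using the three timed-compilation results listed above. The dictionary layout and the helper name `lto_slowdown` are illustrative choices, not part of the benchmark tooling.

```python
# Compile times in seconds, copied from the timed-compilation results above.
# Each value is (stock build, LTO build) for one project/compiler pair.
compile_times = {
    ("Apache 2.4.7", "GCC 4.8.2"): (27.14, 44.37),
    ("Apache 2.4.7", "GCC 4.9.0 RC1"): (27.88, 45.74),
    ("ImageMagick 6.8.1-10", "GCC 4.8.2"): (60.35, 124.51),
    ("ImageMagick 6.8.1-10", "GCC 4.9.0 RC1"): (60.59, 129.91),
    ("PHP 5.2.9", "GCC 4.8.2"): (26.27, 93.29),
    ("PHP 5.2.9", "GCC 4.9.0 RC1"): (27.05, 95.42),
}


def lto_slowdown(stock_seconds, lto_seconds):
    """Return how many times longer the LTO build took than the stock build."""
    return lto_seconds / stock_seconds


# Hypothetical summary structure: slowdown factor per (project, compiler).
slowdowns = {key: lto_slowdown(*times) for key, times in compile_times.items()}

for (project, compiler), factor in slowdowns.items():
    print(f"{project:<22} {compiler:<14} LTO build took {factor:.2f}x as long")
```

With these figures the LTO builds take roughly 1.6x (Apache), 2.1x (ImageMagick), and 3.5x (PHP) as long as the stock builds, with little difference between the two compiler versions.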