GCC 4.9 Linux LTO Optimizations
Link-time optimization (LTO) compiler benchmarks of GCC 4.8.2 and GCC 4.9.0 RC1 by Michael Larabel for Phoronix.com. The results with -flto did not turn out to be as exciting as anticipated, so they are published here for a short upcoming article on Phoronix. The benchmarks were run on an Intel Core i7 Haswell system running Ubuntu Linux.
HTML result view exported from: https://openbenchmarking.org/result/1404126-PTS-GCC4849L62&sro&grw .
System details
Configurations tested: GCC 4.8.2 - Stock, GCC 4.8.2 - LTO, GCC 4.9.0 RC1 - Stock, GCC 4.9.0 RC1 - LTO
  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard: ECS Z87H3-A2X EXTREME v1.0
  Chipset: Intel 4th Gen Core DRAM
  Memory: 16384MB
  Disk: 120GB Samsung SSD 840
  Graphics: ECS NVIDIA GeForce GTX 460 768MB (675/1804MHz)
  Audio: Realtek ALC1150
  Monitor: Samsung SyncMaster
  Network: Realtek RTL8111/8168/8411
  OS: Ubuntu 14.04
  Kernel: 3.13.0-22-generic (x86_64)
  Desktop: Unity 7.2.0
  Display Server: X Server 1.15.0
  Display Driver: NVIDIA 337.12
  OpenGL: 4.3.0
  Compiler: GCC 4.8.2 (GCC 4.8.2 configurations); GCC 4.9.0 20140411 (GCC 4.9.0 RC1 configurations)
  File-System: ext4
  Screen Resolution: 2560x1600
Compiler Details: --enable-checking=release
Processor Details: Scaling Governor: acpi-cpufreq ondemand
Result overview (a "-" marks a test with no result for that configuration)
Test | GCC 4.8.2 - Stock | GCC 4.8.2 - LTO | GCC 4.9.0 RC1 - Stock | GCC 4.9.0 RC1 - LTO
hint: FLOAT | 367476347.00 | 367633749.83 | 373674384.83 | 371516290.66
encode-flac: WAV To FLAC | 4.82 | - | 3.70 | 3.72
encode-mp3: WAV To MP3 | 12.42 | 13.24 | 10.87 | 10.88
himeno: Poisson Pressure Solver | 1810.10 | 1813.55 | 1828.15 | 1825.41
hpcc: G-HPL | 50.05477 | 49.78783 | 50.08117 | 49.96230
hpcc: G-Ffte | 2.11922 | 2.12967 | 2.11338 | 2.13555
hpcc: EP-DGEMM | 6.57694 | 6.63749 | 6.72611 | 6.63834
hpcc: G-Ptrans | 1.26591 | 1.26956 | 1.26904 | 1.26850
hpcc: EP-STREAM Triad | 1.08895 | 1.08612 | 1.08175 | 1.08126
hpcc: G-Rand Access | 0.00979 | 0.00980 | 0.00980 | 0.00989
build-imagemagick: Time To Compile | 60.35 | 124.51 | 60.59 | 129.91
build-apache: Time To Compile | 27.14 | 44.37 | 27.88 | 45.74
build-php: Time To Compile | 26.27 | 93.29 | 27.05 | 95.42
nero2d: Total Time | 453.45 | 458.86 | - | -
graphics-magick: Blur | 170 | 171 | 166 | 165
graphics-magick: Sharpen | 140 | 141 | 140 | 140
open-porous-media: Upscale-Relperm | 100.58 | 100.42 | - | -
graphics-magick: Resizing | 198 | 199 | 199 | 198
graphics-magick: HWB Color Space | 216 | 218 | 214 | 213
graphics-magick: Local Adaptive Thresholding | 97 | 97 | 102 | 102
c-ray: Total Time | 16.98 | 16.98 | 17.09 | 17.08
ffmpeg: H.264 HD To NTSC DV | 13.99 | 13.80 | 13.80 | 13.75
smallpt: Global Illumination Renderer; 100 Samples | 24 | 24 | - | -
apache: Static Web Page Serving | 35904.53 | 35521.51 | 35953.73 | 35817.51
ebizzy: Records/s | 42849 | 42698 | 42950 | 42565
byte: Dhrystone 2 | 35908267.37 | 34717223.53 | 36694838.47 | 33544122.87
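The relative effect of -flto is easier to judge as a percentage than from the raw figures. The following is a minimal sketch and not part of the original export: the helper name `lto_gain_percent` and the `RESULTS` selection are illustrative assumptions, and the numbers are copied from the overview above.

```python
# Hypothetical helper (not part of the Phoronix Test Suite): signed gain of
# the LTO build over the stock build, in percent. Positive means LTO is better.
def lto_gain_percent(stock, lto, higher_is_better):
    change = (lto - stock) / stock * 100.0
    # For "fewer is better" tests, a lower LTO value is an improvement.
    return change if higher_is_better else -change

# (test, compiler, stock result, LTO result, higher_is_better)
RESULTS = [
    ("encode-mp3: WAV To MP3", "GCC 4.8.2", 12.42, 13.24, False),
    ("hint: FLOAT", "GCC 4.9.0 RC1", 373674384.83, 371516290.66, True),
    ("byte: Dhrystone 2", "GCC 4.9.0 RC1", 36694838.47, 33544122.87, True),
]

GAINS = {
    (test, compiler): lto_gain_percent(stock, lto, higher_is_better)
    for test, compiler, stock, lto, higher_is_better in RESULTS
}

if __name__ == "__main__":
    for (test, compiler), gain in GAINS.items():
        print(f"{test} [{compiler}]: {gain:+.2f}% with -flto")
```

All three selected results come out negative, meaning the LTO build was slower than the stock build in those tests.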
Hierarchical INTegration 1.0, Test: FLOAT
QUIPs, More Is Better
  GCC 4.8.2 - LTO: 367633749.83 (SE +/- 129752.70, N = 3)
  GCC 4.8.2 - Stock: 367476347.00 (SE +/- 103032.27, N = 3)
  GCC 4.9.0 RC1 - LTO: 371516290.66 (SE +/- 994638.92, N = 3)
  GCC 4.9.0 RC1 - Stock: 373674384.83 (SE +/- 362392.17, N = 3)
1. (CC) gcc options: -O3 -march=native -lm (LTO configurations add -flto)
FLAC Audio Encoding 1.3.0, WAV To FLAC
Seconds, Fewer Is Better
  GCC 4.8.2 - Stock: 4.82 (SE +/- 0.06, N = 7)
  GCC 4.9.0 RC1 - LTO: 3.72 (SE +/- 0.09, N = 10)
  GCC 4.9.0 RC1 - Stock: 3.70 (SE +/- 0.09, N = 10)
1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm (LTO configuration adds -flto)
LAME MP3 Encoding 3.99.3, WAV To MP3
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 13.24 (SE +/- 0.05, N = 5)
  GCC 4.8.2 - Stock: 12.42 (SE +/- 0.04, N = 5)
  GCC 4.9.0 RC1 - LTO: 10.88 (SE +/- 0.01, N = 5)
  GCC 4.9.0 RC1 - Stock: 10.87 (SE +/- 0.01, N = 5)
1. (CC) gcc options: -pipe -O3 -march=native -lm (LTO configurations add -flto)
Himeno Benchmark 3.0, Poisson Pressure Solver
MFLOPS, More Is Better
  GCC 4.8.2 - LTO: 1813.55 (SE +/- 0.48, N = 3)
  GCC 4.8.2 - Stock: 1810.10 (SE +/- 3.93, N = 3)
  GCC 4.9.0 RC1 - LTO: 1825.41 (SE +/- 0.63, N = 3)
  GCC 4.9.0 RC1 - Stock: 1828.15 (SE +/- 1.31, N = 3)
1. (CC) gcc options: -O3 -march=native (LTO configurations add -flto)
HPC Challenge 1.4.3, Test / Class: G-HPL
GFLOPS, More Is Better
  GCC 4.8.2 - LTO: 49.79 (SE +/- 0.24, N = 3)
  GCC 4.8.2 - Stock: 50.05 (SE +/- 0.08, N = 3)
  GCC 4.9.0 RC1 - LTO: 49.96 (SE +/- 0.11, N = 3)
  GCC 4.9.0 RC1 - Stock: 50.08 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
HPC Challenge 1.4.3, Test / Class: G-Ffte
GFLOPS, More Is Better
  GCC 4.8.2 - LTO: 2.12967 (SE +/- 0.00327, N = 3)
  GCC 4.8.2 - Stock: 2.11922 (SE +/- 0.00495, N = 3)
  GCC 4.9.0 RC1 - LTO: 2.13555 (SE +/- 0.01396, N = 3)
  GCC 4.9.0 RC1 - Stock: 2.11338 (SE +/- 0.00571, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
HPC Challenge 1.4.3, Test / Class: EP-DGEMM
GFLOPS, More Is Better
  GCC 4.8.2 - LTO: 6.63749 (SE +/- 0.05029, N = 3)
  GCC 4.8.2 - Stock: 6.57694 (SE +/- 0.04345, N = 3)
  GCC 4.9.0 RC1 - LTO: 6.63834 (SE +/- 0.06945, N = 3)
  GCC 4.9.0 RC1 - Stock: 6.72611 (SE +/- 0.05569, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
HPC Challenge 1.4.3, Test / Class: G-Ptrans
GB/s, More Is Better
  GCC 4.8.2 - LTO: 1.26956 (SE +/- 0.00112, N = 3)
  GCC 4.8.2 - Stock: 1.26591 (SE +/- 0.00051, N = 3)
  GCC 4.9.0 RC1 - LTO: 1.26850 (SE +/- 0.00220, N = 3)
  GCC 4.9.0 RC1 - Stock: 1.26904 (SE +/- 0.00329, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
HPC Challenge 1.4.3, Test / Class: EP-STREAM Triad
GB/s, More Is Better
  GCC 4.8.2 - LTO: 1.08612 (SE +/- 0.00405, N = 3)
  GCC 4.8.2 - Stock: 1.08895 (SE +/- 0.00488, N = 3)
  GCC 4.9.0 RC1 - LTO: 1.08126 (SE +/- 0.00342, N = 3)
  GCC 4.9.0 RC1 - Stock: 1.08175 (SE +/- 0.00379, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
HPC Challenge 1.4.3, Test / Class: G-Random Access
GUP/s, More Is Better
  GCC 4.8.2 - LTO: 0.00980 (SE +/- 0.00001, N = 3)
  GCC 4.8.2 - Stock: 0.00979 (SE +/- 0.00008, N = 3)
  GCC 4.9.0 RC1 - LTO: 0.00989 (SE +/- 0.00003, N = 3)
  GCC 4.9.0 RC1 - Stock: 0.00980 (SE +/- 0.00013, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO configurations add -flto)
2. BLAS + Open MPI 1.6.5
Timed ImageMagick Compilation 6.8.1-10, Time To Compile
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 124.51 (SE +/- 0.30, N = 3)
  GCC 4.8.2 - Stock: 60.35 (SE +/- 0.36, N = 3)
  GCC 4.9.0 RC1 - LTO: 129.91 (SE +/- 0.32, N = 3)
  GCC 4.9.0 RC1 - Stock: 60.59 (SE +/- 0.04, N = 3)
Timed Apache Compilation 2.4.7, Time To Compile
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 44.37 (SE +/- 0.07, N = 3)
  GCC 4.8.2 - Stock: 27.14 (SE +/- 0.19, N = 3)
  GCC 4.9.0 RC1 - LTO: 45.74 (SE +/- 0.05, N = 3)
  GCC 4.9.0 RC1 - Stock: 27.88 (SE +/- 0.23, N = 3)
Timed PHP Compilation 5.2.9, Time To Compile
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 93.29 (SE +/- 1.16, N = 3)
  GCC 4.8.2 - Stock: 26.27 (SE +/- 0.08, N = 3)
  GCC 4.9.0 RC1 - LTO: 95.42 (SE +/- 0.14, N = 3)
  GCC 4.9.0 RC1 - Stock: 27.05 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm (LTO configurations add -flto)
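The three timed compilation tests show the build-time cost of -flto most directly. As a rough sketch, the slowdown factor can be computed from the stock and LTO build times; the dictionary layout and the `slowdown` helper are illustrative assumptions, and the timings are copied from the results above.

```python
# Build times in seconds as (stock, LTO), copied from the three timed
# compilation tests. The data layout is an illustrative choice.
BUILD_TIMES = {
    ("ImageMagick", "GCC 4.8.2"): (60.35, 124.51),
    ("ImageMagick", "GCC 4.9.0 RC1"): (60.59, 129.91),
    ("Apache", "GCC 4.8.2"): (27.14, 44.37),
    ("Apache", "GCC 4.9.0 RC1"): (27.88, 45.74),
    ("PHP", "GCC 4.8.2"): (26.27, 93.29),
    ("PHP", "GCC 4.9.0 RC1"): (27.05, 95.42),
}

def slowdown(stock_seconds, lto_seconds):
    """How many times longer the LTO build took than the stock build."""
    return lto_seconds / stock_seconds

SLOWDOWNS = {key: slowdown(*times) for key, times in BUILD_TIMES.items()}

if __name__ == "__main__":
    for (project, compiler), factor in SLOWDOWNS.items():
        print(f"{project} [{compiler}]: LTO build took {factor:.2f}x as long")
```

With these figures, the LTO builds took between about 1.6 and 3.6 times as long as the stock builds, with PHP showing the largest increase.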
Open FMM Nero2D 2.0.2, Total Time
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 458.86
  GCC 4.8.2 - Stock: 453.45
1. (CXX) g++ options: -O3 -march=native -lfftw3 -llapack -lf77blas -latlas -lm (LTO configuration adds -flto)
GraphicsMagick 1.3.19, Operation: Blur
Iterations Per Minute, More Is Better
  GCC 4.8.2 - LTO: 171 (SE +/- 0.33, N = 3)
  GCC 4.8.2 - Stock: 170 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - LTO: 165 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - Stock: 166 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO configurations add -flto)
GraphicsMagick 1.3.19, Operation: Sharpen
Iterations Per Minute, More Is Better
  GCC 4.8.2 - LTO: 141 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 140 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - LTO: 140 (SE +/- 0.00, N = 3)
  GCC 4.9.0 RC1 - Stock: 140 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO configurations add -flto)
Open Porous Media 2013-11-26, OPM Benchmark: Upscale-Relperm
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 100.42 (SE +/- 0.38, N = 3)
  GCC 4.8.2 - Stock: 100.58 (SE +/- 0.17, N = 3)
1. (F9X) gfortran options: -rdynamic
GraphicsMagick 1.3.19, Operation: Resizing
Iterations Per Minute, More Is Better
  GCC 4.8.2 - LTO: 199 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 198 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - LTO: 198 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - Stock: 199 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO configurations add -flto)
GraphicsMagick 1.3.19, Operation: HWB Color Space
Iterations Per Minute, More Is Better
  GCC 4.8.2 - LTO: 218 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 216 (SE +/- 0.00, N = 3)
  GCC 4.9.0 RC1 - LTO: 213 (SE +/- 0.00, N = 3)
  GCC 4.9.0 RC1 - Stock: 214 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO configurations add -flto)
GraphicsMagick 1.3.19, Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
  GCC 4.8.2 - LTO: 97 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 97 (SE +/- 0.33, N = 3)
  GCC 4.9.0 RC1 - LTO: 102 (SE +/- 0.00, N = 3)
  GCC 4.9.0 RC1 - Stock: 102 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO configurations add -flto)
C-Ray 1.1, Total Time
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 16.98 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 16.98 (SE +/- 0.01, N = 3)
  GCC 4.9.0 RC1 - LTO: 17.08 (SE +/- 0.01, N = 3)
  GCC 4.9.0 RC1 - Stock: 17.09 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=native (LTO configurations add -flto)
FFmpeg 2.1.1, H.264 HD To NTSC DV
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 13.80 (SE +/- 0.07, N = 3)
  GCC 4.8.2 - Stock: 13.99 (SE +/- 0.03, N = 3)
  GCC 4.9.0 RC1 - LTO: 13.75 (SE +/- 0.06, N = 3)
  GCC 4.9.0 RC1 - Stock: 13.80 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -O3 -march=native -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT (LTO configurations add -flto)
Smallpt 1.0, Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  GCC 4.8.2 - LTO: 24 (SE +/- 0.00, N = 3)
  GCC 4.8.2 - Stock: 24 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native (LTO configuration adds -flto)
Apache Benchmark 2.4.7, Static Web Page Serving
Requests Per Second, More Is Better
  GCC 4.8.2 - LTO: 35521.51 (SE +/- 75.16, N = 3)
  GCC 4.8.2 - Stock: 35904.53 (SE +/- 136.25, N = 3)
  GCC 4.9.0 RC1 - LTO: 35817.51 (SE +/- 55.60, N = 3)
  GCC 4.9.0 RC1 - Stock: 35953.73 (SE +/- 76.61, N = 3)
1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native (LTO configurations add -flto)
ebizzy 0.3
Records/s, More Is Better
  GCC 4.8.2 - LTO: 42698 (SE +/- 104.13, N = 3)
  GCC 4.8.2 - Stock: 42849 (SE +/- 109.85, N = 3)
  GCC 4.9.0 RC1 - LTO: 42565 (SE +/- 33.07, N = 3)
  GCC 4.9.0 RC1 - Stock: 42950 (SE +/- 44.08, N = 3)
1. (CC) gcc options: -pthread -lpthread -O3 -march=native (LTO configurations add -flto)
BYTE Unix Benchmark 3.6, Computational Test: Dhrystone 2
LPS, More Is Better
  GCC 4.8.2 - LTO: 34717223.53 (SE +/- 9630.07, N = 3)
  GCC 4.8.2 - Stock: 35908267.37 (SE +/- 13215.25, N = 3)
  GCC 4.9.0 RC1 - LTO: 33544122.87 (SE +/- 32494.09, N = 3)
  GCC 4.9.0 RC1 - Stock: 36694838.47 (SE +/- 165665.98, N = 3)
1. (CC) gcc options: -O3 -march=native (LTO configurations add -flto)
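Many of the stock and LTO results differ by less than their reported standard errors, so it helps to separate real differences from run-to-run noise. The sketch below expresses the gap between two means in units of their combined standard error. It is only a rough heuristic, not a formal significance test, since each result is based on just three runs. The helper name is hypothetical, and the SE values are assumed to follow the same order as the configuration labels in the export.

```python
import math

def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means divided by their combined standard error."""
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# GCC 4.9.0 RC1, Stock vs LTO. SE pairing is assumed from the label order.
DHRYSTONE_GAP = gap_in_standard_errors(36694838.47, 165665.98, 33544122.87, 32494.09)
C_RAY_GAP = gap_in_standard_errors(17.09, 0.01, 17.08, 0.01)

if __name__ == "__main__":
    print(f"Dhrystone 2: gap of {DHRYSTONE_GAP:.1f} combined standard errors")
    print(f"C-Ray: gap of {C_RAY_GAP:.1f} combined standard errors")
```

Under these assumptions, the Dhrystone 2 regression with -flto on GCC 4.9.0 RC1 stands well clear of the noise, while the C-Ray difference is smaller than one combined standard error.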
Phoronix Test Suite v10.8.5