GCC 4.9 Linux LTO Optimizations

Link-time optimization (LTO) compiler benchmarks of GCC 4.8.2 and GCC 4.9.0 RC1 by Michael Larabel for Phoronix.com. The results with -flto did not turn out to be as exciting as anticipated, so they are published here for a short upcoming article on Phoronix. The benchmarks were run on an Intel Core i7 Haswell system running Ubuntu Linux.

HTML result view exported from: https://openbenchmarking.org/result/1404126-PTS-GCC4849L62&sro&grr.

GCC 4.9 Linux LTO Optimizations: system details

Configurations: GCC 4.8.2 - Stock, GCC 4.8.2 - LTO, GCC 4.9.0 RC1 - Stock, GCC 4.9.0 RC1 - LTO

Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
Motherboard: ECS Z87H3-A2X EXTREME v1.0
Chipset: Intel 4th Gen Core DRAM
Memory: 16384MB
Disk: 120GB Samsung SSD 840
Graphics: ECS NVIDIA GeForce GTX 460 768MB (675/1804MHz)
Audio: Realtek ALC1150
Monitor: Samsung SyncMaster
Network: Realtek RTL8111/8168/8411
OS: Ubuntu 14.04
Kernel: 3.13.0-22-generic (x86_64)
Desktop: Unity 7.2.0
Display Server: X Server 1.15.0
Display Driver: NVIDIA 337.12
OpenGL: 4.3.0
Compiler: GCC 4.8.2 (GCC 4.8.2 configurations); GCC 4.9.0 20140411 (GCC 4.9.0 RC1 configurations)
File-System: ext4
Screen Resolution: 2560x1600

Compiler Details: --enable-checking=release
Processor Details: Scaling Governor: acpi-cpufreq ondemand

GCC 4.9 Linux LTO Optimizations: result overview

A dash marks a test with no result for that configuration.

Test | GCC 4.8.2 - Stock | GCC 4.8.2 - LTO | GCC 4.9.0 RC1 - Stock | GCC 4.9.0 RC1 - LTO
apache: Static Web Page Serving | 35904.53 | 35521.51 | 35953.73 | 35817.51
hint: FLOAT | 367476347.00 | 367633749.83 | 373674384.83 | 371516290.66
nero2d: Total Time | 453.45 | 458.86 | - | -
ffmpeg: H.264 HD To NTSC DV | 13.99 | 13.80 | 13.80 | 13.75
encode-mp3: WAV To MP3 | 12.42 | 13.24 | 10.87 | 10.88
encode-flac: WAV To FLAC | 4.82 | - | 3.70 | 3.72
smallpt: Global Illumination Renderer; 100 Samples | 24 | 24 | - | -
open-porous-media: Upscale-Relperm | 100.58 | 100.42 | - | -
ebizzy: Records/s | 42849 | 42698 | 42950 | 42565
c-ray: Total Time | 16.98 | 16.98 | 17.09 | 17.08
build-php: Time To Compile | 26.27 | 93.29 | 27.05 | 95.42
build-imagemagick: Time To Compile | 60.35 | 124.51 | 60.59 | 129.91
build-apache: Time To Compile | 27.14 | 44.37 | 27.88 | 45.74
himeno: Poisson Pressure Solver | 1810.10 | 1813.55 | 1828.15 | 1825.41
graphics-magick: Local Adaptive Thresholding | 97 | 97 | 102 | 102
graphics-magick: HWB Color Space | 216 | 218 | 214 | 213
graphics-magick: Resizing | 198 | 199 | 199 | 198
graphics-magick: Sharpen | 140 | 141 | 140 | 140
graphics-magick: Blur | 170 | 171 | 166 | 165
byte: Dhrystone 2 | 35908267.37 | 34717223.53 | 36694838.47 | 33544122.87
hpcc: G-Rand Access | 0.00979 | 0.00980 | 0.00980 | 0.00989
hpcc: EP-STREAM Triad | 1.08895 | 1.08612 | 1.08175 | 1.08126
hpcc: G-Ptrans | 1.26591 | 1.26956 | 1.26904 | 1.26850
hpcc: EP-DGEMM | 6.57694 | 6.63749 | 6.72611 | 6.63834
hpcc: G-Ffte | 2.11922 | 2.12967 | 2.11338 | 2.13555
hpcc: G-HPL | 50.05477 | 49.78783 | 50.08117 | 49.96230
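The raw overview numbers are easier to compare as relative changes between each LTO build and its stock counterpart. The following is a minimal sketch, not part of the Phoronix Test Suite: the function names and the `RESULTS` table are illustrative assumptions, and the values are a small sample copied from this result file.

```python
# Illustrative helper (hypothetical names, not a Phoronix Test Suite API).
# Each entry: (stock result, LTO result, higher_is_better), copied from this result file.
RESULTS = {
    "Timed PHP Compilation, GCC 4.8.2 (seconds)": (26.27, 93.29, False),
    "LAME MP3 Encoding, GCC 4.8.2 (seconds)": (12.42, 13.24, False),
    "Dhrystone 2, GCC 4.9.0 RC1 (LPS)": (36694838.47, 33544122.87, True),
    "Apache static serving, GCC 4.9.0 RC1 (requests/s)": (35953.73, 35817.51, True),
}


def lto_change_percent(stock, lto):
    """Relative change of the LTO result versus the stock result, in percent."""
    return (lto - stock) / stock * 100.0


def lto_is_better(stock, lto, higher_is_better):
    """True if the LTO result beats stock, given the direction of the metric."""
    return lto > stock if higher_is_better else lto < stock


for name, (stock, lto, higher_is_better) in RESULTS.items():
    change = lto_change_percent(stock, lto)
    verdict = "better" if lto_is_better(stock, lto, higher_is_better) else "not better"
    print(f"{name}: {change:+.2f}% with -flto ({verdict})")
```

The sign of the percentage alone does not say whether LTO helped, which is why the direction of each metric ("More Is Better" or "Fewer Is Better") is carried alongside the values.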

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.7: Requests Per Second, More Is Better
GCC 4.8.2 - LTO: 35521.51 (SE +/- 75.16, N = 3)
GCC 4.8.2 - Stock: 35904.53 (SE +/- 136.25, N = 3)
GCC 4.9.0 RC1 - LTO: 35817.51 (SE +/- 55.60, N = 3)
GCC 4.9.0 RC1 - Stock: 35953.73 (SE +/- 76.61, N = 3)
1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native (LTO runs add -flto)

Hierarchical INTegration

Test: FLOAT

Hierarchical INTegration 1.0: QUIPs, More Is Better
GCC 4.8.2 - LTO: 367633749.83 (SE +/- 129752.70, N = 3)
GCC 4.8.2 - Stock: 367476347.00 (SE +/- 103032.27, N = 3)
GCC 4.9.0 RC1 - LTO: 371516290.66 (SE +/- 994638.92, N = 3)
GCC 4.9.0 RC1 - Stock: 373674384.83 (SE +/- 362392.17, N = 3)
1. (CC) gcc options: -O3 -march=native -lm (LTO runs add -flto)

Open FMM Nero2D

Total Time

Open FMM Nero2D 2.0.2: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 458.86
GCC 4.8.2 - Stock: 453.45
1. (CXX) g++ options: -O3 -march=native -lfftw3 -llapack -lf77blas -latlas -lm (LTO run adds -flto)

FFmpeg

H.264 HD To NTSC DV

FFmpeg 2.1.1: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 13.80 (SE +/- 0.07, N = 3)
GCC 4.8.2 - Stock: 13.99 (SE +/- 0.03, N = 3)
GCC 4.9.0 RC1 - LTO: 13.75 (SE +/- 0.06, N = 3)
GCC 4.9.0 RC1 - Stock: 13.80 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -O3 -march=native -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT (LTO runs add -flto)

LAME MP3 Encoding

WAV To MP3

LAME MP3 Encoding 3.99.3: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 13.24 (SE +/- 0.05, N = 5)
GCC 4.8.2 - Stock: 12.42 (SE +/- 0.04, N = 5)
GCC 4.9.0 RC1 - LTO: 10.88 (SE +/- 0.01, N = 5)
GCC 4.9.0 RC1 - Stock: 10.87 (SE +/- 0.01, N = 5)
1. (CC) gcc options: -pipe -O3 -march=native -lm (LTO runs add -flto)

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.0: Seconds, Fewer Is Better
GCC 4.8.2 - Stock: 4.82 (SE +/- 0.06, N = 7)
GCC 4.9.0 RC1 - LTO: 3.72 (SE +/- 0.09, N = 10)
GCC 4.9.0 RC1 - Stock: 3.70 (SE +/- 0.09, N = 10)
1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm (LTO run adds -flto)

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 24 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 24 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native (LTO run adds -flto)

Open Porous Media

OPM Benchmark: Upscale-Relperm

Open Porous Media 2013-11-26: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 100.42 (SE +/- 0.38, N = 3)
GCC 4.8.2 - Stock: 100.58 (SE +/- 0.17, N = 3)
1. (F9X) gfortran options: -rdynamic

ebizzy

Records/s

ebizzy 0.3: Records/s, More Is Better
GCC 4.8.2 - LTO: 42698 (SE +/- 104.13, N = 3)
GCC 4.8.2 - Stock: 42849 (SE +/- 109.85, N = 3)
GCC 4.9.0 RC1 - LTO: 42565 (SE +/- 33.07, N = 3)
GCC 4.9.0 RC1 - Stock: 42950 (SE +/- 44.08, N = 3)
1. (CC) gcc options: -pthread -lpthread -O3 -march=native (LTO runs add -flto)

C-Ray

Total Time

C-Ray 1.1: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 16.98 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 16.98 (SE +/- 0.01, N = 3)
GCC 4.9.0 RC1 - LTO: 17.08 (SE +/- 0.01, N = 3)
GCC 4.9.0 RC1 - Stock: 17.09 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=native (LTO runs add -flto)

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 5.2.9: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 93.29 (SE +/- 1.16, N = 3)
GCC 4.8.2 - Stock: 26.27 (SE +/- 0.08, N = 3)
GCC 4.9.0 RC1 - LTO: 95.42 (SE +/- 0.14, N = 3)
GCC 4.9.0 RC1 - Stock: 27.05 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm (LTO runs add -flto)

Timed ImageMagick Compilation

Time To Compile

Timed ImageMagick Compilation 6.8.1-10: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 124.51 (SE +/- 0.30, N = 3)
GCC 4.8.2 - Stock: 60.35 (SE +/- 0.36, N = 3)
GCC 4.9.0 RC1 - LTO: 129.91 (SE +/- 0.32, N = 3)
GCC 4.9.0 RC1 - Stock: 60.59 (SE +/- 0.04, N = 3)

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7: Seconds, Fewer Is Better
GCC 4.8.2 - LTO: 44.37 (SE +/- 0.07, N = 3)
GCC 4.8.2 - Stock: 27.14 (SE +/- 0.19, N = 3)
GCC 4.9.0 RC1 - LTO: 45.74 (SE +/- 0.05, N = 3)
GCC 4.9.0 RC1 - Stock: 27.88 (SE +/- 0.23, N = 3)

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0: MFLOPS, More Is Better
GCC 4.8.2 - LTO: 1813.55 (SE +/- 0.48, N = 3)
GCC 4.8.2 - Stock: 1810.10 (SE +/- 3.93, N = 3)
GCC 4.9.0 RC1 - LTO: 1825.41 (SE +/- 0.63, N = 3)
GCC 4.9.0 RC1 - Stock: 1828.15 (SE +/- 1.31, N = 3)
1. (CC) gcc options: -O3 -march=native (LTO runs add -flto)

GraphicsMagick

Operation: Local Adaptive Thresholding

GraphicsMagick 1.3.19: Iterations Per Minute, More Is Better
GCC 4.8.2 - LTO: 97 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 97 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - LTO: 102 (SE +/- 0.00, N = 3)
GCC 4.9.0 RC1 - Stock: 102 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO runs add -flto)

GraphicsMagick

Operation: HWB Color Space

GraphicsMagick 1.3.19: Iterations Per Minute, More Is Better
GCC 4.8.2 - LTO: 218 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 216 (SE +/- 0.00, N = 3)
GCC 4.9.0 RC1 - LTO: 213 (SE +/- 0.00, N = 3)
GCC 4.9.0 RC1 - Stock: 214 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO runs add -flto)

GraphicsMagick

Operation: Resizing

GraphicsMagick 1.3.19: Iterations Per Minute, More Is Better
GCC 4.8.2 - LTO: 199 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 198 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - LTO: 198 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - Stock: 199 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO runs add -flto)

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.19: Iterations Per Minute, More Is Better
GCC 4.8.2 - LTO: 141 (SE +/- 0.00, N = 3)
GCC 4.8.2 - Stock: 140 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - LTO: 140 (SE +/- 0.00, N = 3)
GCC 4.9.0 RC1 - Stock: 140 (SE +/- 0.00, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO runs add -flto)

GraphicsMagick

Operation: Blur

GraphicsMagick 1.3.19: Iterations Per Minute, More Is Better
GCC 4.8.2 - LTO: 171 (SE +/- 0.33, N = 3)
GCC 4.8.2 - Stock: 170 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - LTO: 165 (SE +/- 0.33, N = 3)
GCC 4.9.0 RC1 - Stock: 166 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -march=native -pthread -ljbig -lwebp -ljpeg -lXext -lX11 -llzma -lxml2 -lz -lm -lpthread (LTO runs add -flto)

BYTE Unix Benchmark

Computational Test: Dhrystone 2

BYTE Unix Benchmark 3.6: LPS, More Is Better
GCC 4.8.2 - LTO: 34717223.53 (SE +/- 9630.07, N = 3)
GCC 4.8.2 - Stock: 35908267.37 (SE +/- 13215.25, N = 3)
GCC 4.9.0 RC1 - LTO: 33544122.87 (SE +/- 32494.09, N = 3)
GCC 4.9.0 RC1 - Stock: 36694838.47 (SE +/- 165665.98, N = 3)
1. (CC) gcc options: -O3 -march=native (LTO runs add -flto)

HPC Challenge

Test / Class: G-Random Access

HPC Challenge 1.4.3: GUP/s, More Is Better
GCC 4.8.2 - LTO: 0.00980 (SE +/- 0.00001, N = 3)
GCC 4.8.2 - Stock: 0.00979 (SE +/- 0.00008, N = 3)
GCC 4.9.0 RC1 - LTO: 0.00989 (SE +/- 0.00003, N = 3)
GCC 4.9.0 RC1 - Stock: 0.00980 (SE +/- 0.00013, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5

HPC Challenge

Test / Class: EP-STREAM Triad

HPC Challenge 1.4.3: GB/s, More Is Better
GCC 4.8.2 - LTO: 1.08612 (SE +/- 0.00405, N = 3)
GCC 4.8.2 - Stock: 1.08895 (SE +/- 0.00488, N = 3)
GCC 4.9.0 RC1 - LTO: 1.08126 (SE +/- 0.00342, N = 3)
GCC 4.9.0 RC1 - Stock: 1.08175 (SE +/- 0.00379, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5

HPC Challenge

Test / Class: G-Ptrans

HPC Challenge 1.4.3: GB/s, More Is Better
GCC 4.8.2 - LTO: 1.26956 (SE +/- 0.00112, N = 3)
GCC 4.8.2 - Stock: 1.26591 (SE +/- 0.00051, N = 3)
GCC 4.9.0 RC1 - LTO: 1.26850 (SE +/- 0.00220, N = 3)
GCC 4.9.0 RC1 - Stock: 1.26904 (SE +/- 0.00329, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5

HPC Challenge

Test / Class: EP-DGEMM

HPC Challenge 1.4.3: GFLOPS, More Is Better
GCC 4.8.2 - LTO: 6.63749 (SE +/- 0.05029, N = 3)
GCC 4.8.2 - Stock: 6.57694 (SE +/- 0.04345, N = 3)
GCC 4.9.0 RC1 - LTO: 6.63834 (SE +/- 0.06945, N = 3)
GCC 4.9.0 RC1 - Stock: 6.72611 (SE +/- 0.05569, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5

HPC Challenge

Test / Class: G-Ffte

HPC Challenge 1.4.3: GFLOPS, More Is Better
GCC 4.8.2 - LTO: 2.12967 (SE +/- 0.00327, N = 3)
GCC 4.8.2 - Stock: 2.11922 (SE +/- 0.00495, N = 3)
GCC 4.9.0 RC1 - LTO: 2.13555 (SE +/- 0.01396, N = 3)
GCC 4.9.0 RC1 - Stock: 2.11338 (SE +/- 0.00571, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5

HPC Challenge

Test / Class: G-HPL

HPC Challenge 1.4.3: GFLOPS, More Is Better
GCC 4.8.2 - LTO: 49.79 (SE +/- 0.24, N = 3)
GCC 4.8.2 - Stock: 50.05 (SE +/- 0.08, N = 3)
GCC 4.9.0 RC1 - LTO: 49.96 (SE +/- 0.11, N = 3)
GCC 4.9.0 RC1 - Stock: 50.08 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -lblas -lm -pthread -lmpi -ldl -lhwloc -fomit-frame-pointer -O3 -march=native -funroll-loops (LTO runs add -flto)
2. BLAS + Open MPI 1.6.5
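Many of the stock and LTO results above differ by less than their reported standard errors, so it can help to express each gap in units of the combined SE. The sketch below is an illustrative helper, not something the Phoronix Test Suite provides: the function name is hypothetical, it assumes the two runs are independent, and with N = 3 it is only a rough screen rather than a significance test.

```python
import math


def difference_in_se_units(a, se_a, b, se_b):
    """Absolute gap between two results, divided by their combined standard error.

    Assumes the two measurements are independent, so the SE values add in quadrature.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    gap = abs(a - b)
    if combined_se == 0.0:
        # Both SE values were reported as 0.00 (as with Smallpt above).
        return 0.0 if gap == 0.0 else math.inf
    return gap / combined_se


# Values copied from this result file.
# Dhrystone 2, GCC 4.9.0 RC1: stock versus LTO.
dhrystone = difference_in_se_units(36694838.47, 165665.98, 33544122.87, 32494.09)
# HPC Challenge G-HPL, GCC 4.8.2: stock versus LTO.
hpl = difference_in_se_units(50.05, 0.08, 49.79, 0.24)

print(f"Dhrystone 2 (GCC 4.9.0 RC1): gap is {dhrystone:.1f} combined SEs")
print(f"G-HPL (GCC 4.8.2): gap is {hpl:.1f} combined SEs")
```

A gap of many combined SEs, as with Dhrystone 2, suggests a real difference between the builds, while a gap near one, as with G-HPL, is hard to distinguish from run-to-run noise.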


Phoronix Test Suite v10.8.5