Intel Haswell GCC 4.8 core-avx2 Tuning

Testing an Intel Core i7-4770K with different CFLAGS/CXXFLAGS to examine the GCC 4.8.1 core-avx2 (Haswell) compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.

HTML result view exported from: https://openbenchmarking.org/result/1309165-SO-1309136DA64&grs&rdt.

core-avx2, core2, corei7, corei7-avx, core-avx-i, nocona:
  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard: Intel DH87RL
  Chipset: Intel Haswell DRAM
  Memory: 15360MB
  Disk: 240GB OCZ VERTEX3
  Graphics: Intel Haswell IGP
  Audio: Intel Haswell HDMI
  Monitor: VA2431
  Network: Intel Connection I217-V
  OS: Ubuntu 13.04
  Kernel: 3.10.0-999-generic (x86_64)
  Desktop: Unity 7.0.0
  Display Server: X Server 1.13.3
  Display Driver: intel 2.21.9
  OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c)
  Compiler: GCC 4.8.1 + LLVM 3.2
  File-System: ext4
  Screen Resolution: 1920x1080

test:
  Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores)
  Motherboard: ASRock Z77 Pro4-M
  Memory: 16384MB
  Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080
  Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz)
  Monitor: LCD3090WQXi
  OS: Gentoo Base 2.2
  Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64)
  Desktop: KDE
  Display Server: X Server 1.14.2.902 (1.14.3 RC 2)
  Display Driver: radeon 7.2.99
  OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4
  Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn
  Screen Resolution: 2560x1600

3820 @ 4.5:
  Processor: Intel Core i7-3820 @ 4.20GHz (8 Cores)
  Motherboard: Gigabyte X79-UD3
  Chipset: Intel Xeon E5/Core
  Disk: 250GB Samsung SSD 840 + 80GB TOSHIBA MK8052GS + 640GB Western Digital WD6401AALS-0
  Graphics: eVGA NVIDIA GeForce GTX 650 Ti 2048MB (928/2700MHz)
  Audio: Realtek ALC898
  Network: Intel 82579V Gigabit Connection
  OS: Linux
  Kernel: 3.10.10-1-ARCH (x86_64)
  Desktop: Cinnamon 1.8.8
  Display Server: X Server 1.14.2
  Display Driver: NVIDIA 325.15
  OpenGL: 4.3.0 NVIDIA 325.15
  Compiler: GCC 4.8.1 20130725
  File-System: btrfs

Compiler Details

- core-avx2, core2, corei7, corei7-avx, core-avx-i, nocona: --enable-checking=release --enable-languages=c,c++,fortran
- test: --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.1 --build=x86_64-pc-linux-gnu --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1 --disable-altivec --disable-fixed-point --disable-isl-version-check --disable-libgcj --disable-libssp --disable-lto --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-languages=c,c++,fortran --enable-libgomp --enable-libmudflap --enable-libstdcxx-time --enable-multilib --enable-nls --enable-obsolete --enable-secureplt --enable-shared --enable-targets=all --enable-threads=posix --host=x86_64-pc-linux-gnu --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.1/include --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/man --with-cloog --with-multilib-list=m32,m64 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/python
- 3820 @ 4.5: --disable-cloog-version-check --disable-install-libiberty --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-cloog-backend=isl --enable-gnu-unique-object --enable-gold --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-ld=default --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-linker-hash-style=gnu --with-plugin-ld=ld.gold

Processor Details

- core-avx2, core2, corei7, corei7-avx, core-avx-i, nocona: Scaling Governor: acpi-cpufreq ondemand
- test: Scaling Governor: intel_pstate powersave
- 3820 @ 4.5: Scaling Governor: intel_pstate powersave

Result overview (a dash marks a test with no result for that system)

Test                                                       core-avx2   core2       corei7      corei7-avx  core-avx-i  nocona      test        3820 @ 4.5
smallpt: Global Illumination Renderer; 100 Samples         24          26          26          26          26          26          87          90
graphics-magick: Sharpen                                   136         84          84          96          96          83          83          92
c-ray: Total Time                                          17.02       22.95       22.95       22.84       22.83       23.07       27.78       26.59
scimark2: Fast Fourier Transform                           226.57      250.93      249.11      251.86      247.35      245.07      339.88      223.45
build-imagemagick: Time To Compile                         80.66       79.03       79.64       80.91       81.06       76.98       59.51       55.02
scimark2: Dense LU Matrix Factorization                    1817.03     1859.97     1863.19     1851.10     1824.28     1825.73     2386.29     2542.25
himeno: Poisson Pressure Solver                            1282.30     1564.22     1560.18     1404.92     1630.12     1517.03     1686.65     1735.83
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping  119.78      121.58      123.14      117.71      116.54      122.02      148.75      130.61
graphics-magick: Blur                                      138         117         116         122         122         115         132         144
apache: Static Web Page Serving                            25644.10    25606.17    25490.14    25580.44    25549.84    24888.11    23897.32    29308.24
build-linux-kernel: Time To Compile                        97.25       97.63       97.77       98.10       97.85       97.89       89.94       80.48
graphics-magick: Local Adaptive Thresholding               121         120         120         119         120         118         123         138
graphics-magick: Resizing                                  182         160         160         166         167         157         161         177
scimark2: Monte Carlo                                      596.16      616.21      616.65      616.65      615.76      615.33      553.48      549.20
ffmpeg: H.264 HD To NTSC DV                                13.01       13.16       12.93       12.86       13.00       12.94       11.86       -
hmmer: Pfam Database Search                                10.55       10.14       10.22       10.62       10.45       10.16       10.13       9.95
botan: Tiger                                               424.56      438.87      427.31      442.47      440.37      438.78      -           -
x264: H.264 Video Encoding                                 155.18      156.74      156.06      155.63      156.08      156.80      158.19      161.39
botan: CAST-256                                            95.76       95.80       95.54       95.77       95.79       95.48       -           -
botan: AES-256                                             158.43      158.35      157.96      158.19      158.31      157.97      -           -
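Because the results mix "fewer is better" and "more is better" metrics, comparing two -march settings takes a little care. The following is a minimal Python sketch, not part of the original export, that expresses the core-avx2 result relative to core2 for three of the tests. The helper name `percent_gain` and the choice of tests are illustrative assumptions; the numbers are copied from the results in this export.

```python
# Hypothetical helper (not part of the Phoronix Test Suite): expresses how much
# better one result is than another, independent of the metric's direction.
def percent_gain(candidate, baseline, higher_is_better):
    """Return the candidate's advantage over the baseline in percent.

    A positive value means the candidate is better; negative means worse.
    """
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    # For time-based metrics a smaller value is better, so invert the ratio.
    return (baseline / candidate - 1.0) * 100.0


# (test, core-avx2 result, core2 result, higher_is_better)
results = [
    ("C-Ray: Total Time (s)", 17.02, 22.95, False),
    ("GraphicsMagick: Sharpen (iter/min)", 136, 84, True),
    ("Himeno: Poisson Pressure Solver (MFLOPS)", 1282.30, 1564.22, True),
]

gains = {name: percent_gain(avx2, core2, hib) for name, avx2, core2, hib in results}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}% for core-avx2 relative to core2")
```

On these figures core-avx2 comes out roughly a third faster in C-Ray and well ahead in GraphicsMagick Sharpen, but noticeably behind core2 in Himeno, so the newer target is not a uniform win in this data set.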

Smallpt

Global Illumination Renderer; 100 Samples

Seconds, Fewer Is Better (Smallpt 1.0)

System      Result    SE +/-  N  Flags
core-avx2   24        0.33    3  -O3 -march=core-avx2
core2       26        0.00    3  -O3 -march=core2
corei7      26        0.33    3  -O3 -march=corei7
corei7-avx  26        0.33    3  -O3 -march=corei7-avx
core-avx-i  26        0.33    3  -O3 -march=core-avx-i
nocona      26        0.00    3  -O3 -march=nocona
test        87        0.00    3
3820 @ 4.5  90        0.00    3

(CXX) g++ options: -fopenmp

GraphicsMagick

Operation: Sharpen

Iterations Per Minute, More Is Better (GraphicsMagick 1.3.16)

System      Result    SE +/-  N  Flags
core-avx2   136       0.00    3  -O3 -march=core-avx2 -ljbig
core2       84        0.00    3  -O3 -march=core2 -ljbig
corei7      84        0.00    3  -O3 -march=corei7 -ljbig
corei7-avx  96        0.00    3  -O3 -march=corei7-avx -ljbig
core-avx-i  96        0.00    3  -O3 -march=core-avx-i -ljbig
nocona      83        0.00    3  -O3 -march=nocona -ljbig
test        83        0.00    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
3820 @ 4.5  92        0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp

(CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

C-Ray

Total Time

Seconds, Fewer Is Better (C-Ray 1.1)

System      Result    SE +/-  N  Flags
core-avx2   17.02     0.00    3  -march=core-avx2
core2       22.95     0.01    3  -march=core2
corei7      22.95     0.01    3  -march=corei7
corei7-avx  22.84     0.01    3  -march=corei7-avx
core-avx-i  22.83     0.00    3  -march=core-avx-i
nocona      23.07     0.02    3  -march=nocona
test        27.78     0.00    3
3820 @ 4.5  26.59     0.02    3

(CC) gcc options: -lm -lpthread -O3

SciMark

Computational Test: Fast Fourier Transform

Mflops, More Is Better (SciMark 2.0)

System      Result    SE +/-  N
core-avx2   226.57    2.02    4
core2       250.93    0.67    4
corei7      249.11    0.86    4
corei7-avx  251.86    1.22    4
core-avx-i  247.35    2.13    4
nocona      245.07    2.50    4
test        339.88    0.34    4
3820 @ 4.5  223.45    2.69    4

Timed ImageMagick Compilation

Time To Compile

Seconds, Fewer Is Better (Timed ImageMagick Compilation 6.8.1-10)

System      Result    SE +/-  N
core-avx2   80.66     0.32    3
core2       79.03     0.06    3
corei7      79.64     0.18    3
corei7-avx  80.91     0.10    3
core-avx-i  81.06     0.31    3
nocona      76.98     0.04    3
test        59.51     0.08    3
3820 @ 4.5  55.02     0.08    3

SciMark

Computational Test: Dense LU Matrix Factorization

Mflops, More Is Better (SciMark 2.0)

System      Result    SE +/-  N
core-avx2   1817.03   28.95   4
core2       1859.97   5.53    4
corei7      1863.19   3.12    4
corei7-avx  1851.10   22.67   4
core-avx-i  1824.28   23.35   4
nocona      1825.73   21.90   4
test        2386.29   3.08    4
3820 @ 4.5  2542.25   38.45   5

Himeno Benchmark

Poisson Pressure Solver

MFLOPS, More Is Better (Himeno Benchmark 3.0)

System      Result    SE +/-  N  Flags
core-avx2   1282.30   19.87   6  -march=core-avx2
core2       1564.22   3.07    3  -march=core2
corei7      1560.18   0.75    3  -march=corei7
corei7-avx  1404.92   105.46  6  -march=corei7-avx
core-avx-i  1630.12   1.05    3  -march=core-avx-i
nocona      1517.03   1.20    3  -march=nocona
test        1686.65   0.98    3
3820 @ 4.5  1735.83   18.21   3

(CC) gcc options: -O3
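The Himeno results carry unusually large standard errors for two systems (core-avx2 and corei7-avx, both with N = 6), so some of the gaps may be less solid than the raw MFLOPS figures suggest. A rough way to gauge this, sketched below in Python, is to divide the difference between two means by their combined standard error. This is only a heuristic that assumes independent runs, not a formal significance test, and the function name is a hypothetical one chosen for illustration.

```python
import math


# Hypothetical helper: expresses the gap between two mean results in units of
# their combined standard error (errors added in quadrature, which assumes the
# two sets of runs are independent).
def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Himeno MFLOPS and SE values as reported for this test.
avx2_vs_corei7_avx = difference_in_se_units(1282.30, 19.87, 1404.92, 105.46)
avx2_vs_core2 = difference_in_se_units(1282.30, 19.87, 1564.22, 3.07)

print(f"core-avx2 vs corei7-avx: {avx2_vs_corei7_avx:.2f} combined SEs apart")
print(f"core-avx2 vs core2:      {avx2_vs_core2:.2f} combined SEs apart")
```

A gap of only about one combined standard error, as between core-avx2 and corei7-avx here, is hard to distinguish from run-to-run noise, whereas the core-avx2 versus core2 gap is many times larger than its combined error.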

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

FPS, More Is Better (TTSIOD 3D Renderer 2.2z)

System      Result    SE +/-  N  Flags
core-avx2   119.78    0.09    3  -march=core-avx2 -flto
core2       121.58    0.76    3  -march=core2 -flto
corei7      123.14    0.45    3  -march=corei7 -flto
corei7-avx  117.71    0.66    3  -march=corei7-avx -flto
core-avx-i  116.54    0.36    3  -march=core-avx-i -flto
nocona      122.02    0.39    3  -march=nocona -flto
test        148.75    0.27    3  -lpthread
3820 @ 4.5  130.61    0.03    3  -flto -lpthread

(CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++

GraphicsMagick

Operation: Blur

Iterations Per Minute, More Is Better (GraphicsMagick 1.3.16)

System      Result    SE +/-  N  Flags
core-avx2   138       0.88    3  -O3 -march=core-avx2 -ljbig
core2       117       0.00    3  -O3 -march=core2 -ljbig
corei7      116       1.00    3  -O3 -march=corei7 -ljbig
corei7-avx  122       1.00    3  -O3 -march=corei7-avx -ljbig
core-avx-i  122       0.67    3  -O3 -march=core-avx-i -ljbig
nocona      115       0.00    3  -O3 -march=nocona -ljbig
test        132       0.00    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
3820 @ 4.5  144       0.33    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp

(CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

Apache Benchmark

Static Web Page Serving

Requests Per Second, More Is Better (Apache Benchmark 2.4.3)

System      Result    SE +/-  N  Flags
core-avx2   25644.10  170.85  3  -O3 -march=core-avx2
core2       25606.17  229.80  3  -O3 -march=core2
corei7      25490.14  193.34  3  -O3 -march=corei7
corei7-avx  25580.44  178.37  3  -O3 -march=corei7-avx
core-avx-i  25549.84  126.25  3  -O3 -march=core-avx-i
nocona      24888.11  107.43  3  -O3 -march=nocona
test        23897.32  94.17   3  -O2
3820 @ 4.5  29308.24  162.25  3  -O2

(CC) gcc options: -shared -fPIC -pthread

Timed Linux Kernel Compilation

Time To Compile

Seconds, Fewer Is Better (Timed Linux Kernel Compilation 3.1)

System      Result    SE +/-  N
core-avx2   97.25     0.60    3
core2       97.63     0.54    3
corei7      97.77     0.69    3
corei7-avx  98.10     0.54    3
core-avx-i  97.85     0.76    3
nocona      97.89     0.59    3
test        89.94     0.79    3
3820 @ 4.5  80.48     0.50    3

GraphicsMagick

Operation: Local Adaptive Thresholding

Iterations Per Minute, More Is Better (GraphicsMagick 1.3.16)

System      Result    SE +/-  N  Flags
core-avx2   121       0.00    3  -O3 -march=core-avx2 -ljbig
core2       120       0.33    3  -O3 -march=core2 -ljbig
corei7      120       0.00    3  -O3 -march=corei7 -ljbig
corei7-avx  119       0.00    3  -O3 -march=corei7-avx -ljbig
core-avx-i  120       0.33    3  -O3 -march=core-avx-i -ljbig
nocona      118       0.33    3  -O3 -march=nocona -ljbig
test        123       0.33    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
3820 @ 4.5  138       0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp

(CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Resizing

Iterations Per Minute, More Is Better (GraphicsMagick 1.3.16)

System      Result    SE +/-  N  Flags
core-avx2   182       0.33    3  -O3 -march=core-avx2 -ljbig
core2       160       0.00    3  -O3 -march=core2 -ljbig
corei7      160       0.00    3  -O3 -march=corei7 -ljbig
corei7-avx  166       0.33    3  -O3 -march=corei7-avx -ljbig
core-avx-i  167       0.00    3  -O3 -march=core-avx-i -ljbig
nocona      157       0.33    3  -O3 -march=nocona -ljbig
test        161       0.33    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
3820 @ 4.5  177       0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp

(CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark

Computational Test: Monte Carlo

Mflops, More Is Better (SciMark 2.0)

System      Result    SE +/-  N
core-avx2   596.16    20.17   8
core2       616.21    0.51    4
corei7      616.65    0.44    4
corei7-avx  616.65    0.44    4
core-avx-i  615.76    0.44    4
nocona      615.33    0.72    4
test        553.48    0.00    4
3820 @ 4.5  549.20    9.67    8

FFmpeg

H.264 HD To NTSC DV

Seconds, Fewer Is Better (FFmpeg 1.1)

System      Result    SE +/-  N  Flags
core-avx2   13.01     0.12    3  -march=core-avx2
core2       13.16     0.09    3  -march=core2
corei7      12.93     0.01    3  -march=corei7
corei7-avx  12.86     0.06    3  -march=corei7-avx
core-avx-i  13.00     0.05    3  -march=core-avx-i
nocona      12.94     0.07    3  -march=nocona
test        11.86     0.06    3  -lva -lpthread -lrt

(CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

Timed HMMer Search

Pfam Database Search

Seconds, Fewer Is Better (Timed HMMer Search 2.3.2)

System      Result    SE +/-  N  Flags
core-avx2   10.55     0.17    3  -O3 -march=core-avx2
core2       10.14     0.01    3  -O3 -march=core2
corei7      10.22     0.06    3  -O3 -march=corei7
corei7-avx  10.62     0.02    3  -O3 -march=corei7-avx
core-avx-i  10.45     0.04    3  -O3 -march=core-avx-i
nocona      10.16     0.06    3  -O3 -march=nocona
test        10.13     0.03    3  -O2
3820 @ 4.5  9.95      0.17    4  -O2

(CC) gcc options: -pthread -lhmmer -lsquid -lm

Botan

Test: Tiger

Mbytes/s, More Is Better (Botan 1.10.3)

System      Result
core-avx2   424.56
core2       438.87
corei7      427.31
corei7-avx  442.47
core-avx-i  440.37
nocona      438.78

(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

x264

H.264 Video Encoding

Frames Per Second, More Is Better (x264 2013-06-08)

System      Result    SE +/-  N  Flags
core-avx2   155.18    0.90    5  -march=core-avx2
core2       156.74    0.55    5  -march=core2
corei7      156.06    0.51    5  -march=corei7
corei7-avx  155.63    0.20    5  -march=corei7-avx
core-avx-i  156.08    0.50    5  -march=core-avx-i
nocona      156.80    0.30    5  -march=nocona
test        158.19    0.52    5  -lavformat -lavcodec -lavutil -lswscale
3820 @ 4.5  161.39    1.34    5  -lavformat -lavcodec -lavutil -lswscale

(CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Botan

Test: CAST-256

Mbytes/s, More Is Better (Botan 1.10.3)

System      Result
core-avx2   95.76
core2       95.80
corei7      95.54
corei7-avx  95.77
core-avx-i  95.79
nocona      95.48

(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan

Test: AES-256

Mbytes/s, More Is Better (Botan 1.10.3)

System      Result
core-avx2   158.43
core2       158.35
corei7      157.96
corei7-avx  158.19
core-avx-i  158.31
nocona      157.97

(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2


Phoronix Test Suite v10.8.4