Intel Haswell GCC 4.8 core-avx2 Tuning

Testing the Intel Core i7 4770K with different CFLAGS/CXXFLAGS to examine the GCC 4.8.1 compiler optimizations for Haswell (-march=core-avx2). Benchmarks by Michael Larabel of Phoronix for a future article.
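As a rough illustration of what "different CFLAGS/CXXFLAGS" means here, the sketch below builds one environment per -march target. The target names come from this result file, and the "-O3 -march=<target>" form matches the option lists recorded with several of the individual tests. The helper name build_env is hypothetical, and how the flags were actually handed to the Phoronix Test Suite is not stated in the export, so treat this as an assumption-laden sketch rather than the procedure used.

```python
import os

# -march targets named in this result file (the six GCC tuning configurations).
MARCH_TARGETS = ["nocona", "core2", "corei7", "corei7-avx", "core-avx-i", "core-avx2"]


def build_env(march, opt_level="-O3"):
    """Return a copy of the environment with CFLAGS/CXXFLAGS set for one target.

    Hypothetical helper: the export does not say how the flags were applied.
    """
    flags = f"{opt_level} -march={march}"
    env = dict(os.environ)
    env["CFLAGS"] = flags
    env["CXXFLAGS"] = flags
    return env


for target in MARCH_TARGETS:
    env = build_env(target)
    print(f"{target}: CFLAGS={env['CFLAGS']!r}")
```

Each returned dictionary could then be passed as the env argument of subprocess.run when launching a build, which keeps the configurations isolated from one another.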

HTML result view exported from: https://openbenchmarking.org/result/1309136-DARK-130615064&gru&sor.

System configurations

nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2:
  Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard:        Intel DH87RL
  Chipset:            Intel Haswell DRAM
  Memory:             15360MB
  Disk:               240GB OCZ VERTEX3
  Graphics:           Intel Haswell IGP
  Audio:              Intel Haswell HDMI
  Monitor:            VA2431
  Network:            Intel Connection I217-V
  OS:                 Ubuntu 13.04
  Kernel:             3.10.0-999-generic (x86_64)
  Desktop:            Unity 7.0.0
  Display Server:     X Server 1.13.3
  Display Driver:     intel 2.21.9
  OpenGL:             3.0 Mesa 9.2.0-devel (git-a2e3b1c)
  Compiler:           GCC 4.8.1 + LLVM 3.2
  File-System:        ext4
  Screen Resolution:  1920x1080

test, i7-3770K core-avx-i:
  Processor:          Intel Core i7-3770K @ 3.90GHz (8 Cores)
  Motherboard:        ASRock Z77 Pro4-M
  Memory:             16384MB
  Disk:               256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080
  Graphics:           Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz)
  Monitor:            LCD3090WQXi
  OS:                 Gentoo Base 2.2
  Kernel:             3.11.0-drmfixes20130912-core-avx-i (x86_64)
  Desktop:            KDE
  Display Server:     X Server 1.14.2.902 (1.14.3 RC 2)
  Display Driver:     radeon 7.2.99
  OpenGL:             3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4
  Compiler:           GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn
  Screen Resolution:  2560x1600

Compiler Details

- nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2: --enable-checking=release --enable-languages=c,c++,fortran
- test, i7-3770K core-avx-i: --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.1 --build=x86_64-pc-linux-gnu --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1 --disable-altivec --disable-fixed-point --disable-isl-version-check --disable-libgcj --disable-libssp --disable-lto --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-languages=c,c++,fortran --enable-libgomp --enable-libmudflap --enable-libstdcxx-time --enable-multilib --enable-nls --enable-obsolete --enable-secureplt --enable-shared --enable-targets=all --enable-threads=posix --host=x86_64-pc-linux-gnu --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.1/include --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/man --with-cloog --with-multilib-list=m32,m64 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/python

Processor Details

- nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2: Scaling Governor: acpi-cpufreq ondemand
- test, i7-3770K core-avx-i: Scaling Governor: intel_pstate powersave

Result overview

A dash marks a test with no result for that configuration. Units and direction (more or fewer is better) are listed with each test.

Test                                                       nocona      core2       corei7      corei7-avx  core-avx-i  core-avx2   test        i7-3770K core-avx-i
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping  122.02      121.58      123.14      117.71      116.54      119.78      148.75      148.59
x264: H.264 Video Encoding                                 156.80      156.74      156.06      155.63      156.08      155.18      158.19      157.85
graphics-magick: Blur                                      115         117         116         122         122         138         132         138
graphics-magick: Sharpen                                   83          84          84          96          96          136         83          95
graphics-magick: Resizing                                  157         160         160         166         167         182         161         167
graphics-magick: Local Adaptive Thresholding               118         120         120         119         120         121         123         116
botan: Tiger                                               438.78      438.87      427.31      442.47      440.37      424.56      -           -
botan: AES-256                                             157.97      158.35      157.96      158.19      158.31      158.43      -           -
botan: CAST-256                                            95.48       95.80       95.54       95.77       95.79       95.76       -           -
scimark2: Monte Carlo                                      615.33      616.21      616.65      616.65      615.76      596.16      553.48      553.48
scimark2: Fast Fourier Transform                           245.07      250.93      249.11      251.86      247.35      226.57      339.88      346.41
scimark2: Dense LU Matrix Factorization                    1825.73     1859.97     1863.19     1851.10     1824.28     1817.03     2386.29     2378.31
himeno: Poisson Pressure Solver                            1517.03     1564.22     1560.18     1404.92     1630.12     1282.30     1686.65     1677.67
apache: Static Web Page Serving                            24888.11    25606.17    25490.14    25580.44    25549.84    25644.10    23897.32    23771.72
hmmer: Pfam Database Search                                10.16       10.14       10.22       10.62       10.45       10.55       10.13       9.87
build-imagemagick: Time To Compile                         76.98       79.03       79.64       80.91       81.06       80.66       59.51       64.08
build-linux-kernel: Time To Compile                        97.89       97.63       97.77       98.10       97.85       97.25       89.94       89.90
c-ray: Total Time                                          23.07       22.95       22.95       22.84       22.83       17.02       27.78       28.18
smallpt: Global Illumination Renderer; 100 Samples         26          26          26          26          26          24          87          25
ffmpeg: H.264 HD To NTSC DV                                12.94       13.16       12.93       12.86       13.00       13.01       11.86       11.89
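To read the overview as relative changes rather than raw scores, a small sketch like the following can help. It compares core-avx2 against nocona for four tests, using values copied from this export. The function name relative_gain and the choice of nocona as the baseline are illustrative assumptions, not part of the original result file. For "fewer is better" timings the ratio is inverted so that a positive number always means core-avx2 was faster.

```python
# (nocona result, core-avx2 result, higher_is_better), copied from this export.
RESULTS = {
    "graphics-magick: Sharpen": (83, 136, True),
    "c-ray: Total Time": (23.07, 17.02, False),
    "himeno: Poisson Pressure Solver": (1517.03, 1282.30, True),
    "scimark2: Fast Fourier Transform": (245.07, 226.57, True),
}


def relative_gain(baseline, candidate, higher_is_better):
    """Percent improvement of candidate over baseline; negative means a regression."""
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    # For timings, a shorter time is a gain, so compare baseline / candidate.
    return (baseline / candidate - 1.0) * 100.0


for name, (base, cand, hib) in RESULTS.items():
    print(f"{name}: {relative_gain(base, cand, hib):+.1f}%")
```

On these four tests the sign alone is informative: Sharpen and C-Ray come out positive for core-avx2, while Himeno and the SciMark FFT come out negative.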

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

TTSIOD 3D Renderer 2.2z: FPS, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
test                 148.75    0.27    3   -lpthread
i7-3770K core-avx-i  148.59    0.81    3   -march=core-avx-i -lpthread
corei7               123.14    0.45    3   -march=corei7 -flto
nocona               122.02    0.39    3   -march=nocona -flto
core2                121.58    0.76    3   -march=core2 -flto
core-avx2            119.78    0.09    3   -march=core-avx2 -flto
corei7-avx           117.71    0.66    3   -march=corei7-avx -flto
core-avx-i           116.54    0.36    3   -march=core-avx-i -flto

1. (CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++
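Each result in this export carries an "SE +/-" figure and a run count N. Assuming SE is the conventional standard error of the mean (sample standard deviation divided by the square root of N), it could be computed as sketched below. The exact formula used by the Phoronix Test Suite is not stated in the export, and the per-run readings are not included either, so the three FPS values here are hypothetical placeholders.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate the standard error")
    return statistics.stdev(samples) / math.sqrt(n)


# Hypothetical FPS readings for three runs; NOT taken from the result file.
runs = [148.3, 148.7, 149.25]
print(
    f"mean = {statistics.mean(runs):.2f}, "
    f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}"
)
```

A small SE relative to the gap between two configurations suggests the difference is not just run-to-run noise. Several of the gaps in this export are smaller than the reported SE values.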

x264

H.264 Video Encoding

x264 2013-06-08: Frames Per Second, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
test                 158.19    0.52    5   -lavformat -lavcodec -lavutil -lswscale
i7-3770K core-avx-i  157.85    0.20    5   -lavformat -lavcodec -lavutil -lswscale -march=core-avx-i
nocona               156.80    0.30    5   -march=nocona
core2                156.74    0.55    5   -march=core2
core-avx-i           156.08    0.50    5   -march=core-avx-i
corei7               156.06    0.51    5   -march=corei7
corei7-avx           155.63    0.20    5   -march=corei7-avx
core-avx2            155.18    0.90    5   -march=core-avx2

1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

GraphicsMagick

Operation: Blur

GraphicsMagick 1.3.16: Iterations Per Minute, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
i7-3770K core-avx-i  138       0.00    3   -march=core-avx-i -O3 -llcms2 -ltiff -lfreetype -lxml2 -lrt
core-avx2            138       0.88    3   -O3 -march=core-avx2 -ljbig
test                 132       0.00    3   -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
core-avx-i           122       0.67    3   -O3 -march=core-avx-i -ljbig
corei7-avx           122       1.00    3   -O3 -march=corei7-avx -ljbig
core2                117       0.00    3   -O3 -march=core2 -ljbig
corei7               116       1.00    3   -O3 -march=corei7 -ljbig
nocona               115       0.00    3   -O3 -march=nocona -ljbig

1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.16: Iterations Per Minute, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
core-avx2            136       0.00    3   -O3 -march=core-avx2 -ljbig
core-avx-i           96        0.00    3   -O3 -march=core-avx-i -ljbig
corei7-avx           96        0.00    3   -O3 -march=corei7-avx -ljbig
i7-3770K core-avx-i  95        0.00    3   -march=core-avx-i -O3 -llcms2 -ltiff -lfreetype -lxml2 -lrt
corei7               84        0.00    3   -O3 -march=corei7 -ljbig
core2                84        0.00    3   -O3 -march=core2 -ljbig
test                 83        0.00    3   -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
nocona               83        0.00    3   -O3 -march=nocona -ljbig

1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Resizing

GraphicsMagick 1.3.16: Iterations Per Minute, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
core-avx2            182       0.33    3   -O3 -march=core-avx2 -ljbig
i7-3770K core-avx-i  167       0.00    3   -march=core-avx-i -O3 -llcms2 -ltiff -lfreetype -lxml2 -lrt
core-avx-i           167       0.00    3   -O3 -march=core-avx-i -ljbig
corei7-avx           166       0.33    3   -O3 -march=corei7-avx -ljbig
test                 161       0.33    3   -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
corei7               160       0.00    3   -O3 -march=corei7 -ljbig
core2                160       0.00    3   -O3 -march=core2 -ljbig
nocona               157       0.33    3   -O3 -march=nocona -ljbig

1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Local Adaptive Thresholding

GraphicsMagick 1.3.16: Iterations Per Minute, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
test                 123       0.33    3   -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
core-avx2            121       0.00    3   -O3 -march=core-avx2 -ljbig
core-avx-i           120       0.33    3   -O3 -march=core-avx-i -ljbig
corei7               120       0.00    3   -O3 -march=corei7 -ljbig
core2                120       0.33    3   -O3 -march=core2 -ljbig
corei7-avx           119       0.00    3   -O3 -march=corei7-avx -ljbig
nocona               118       0.33    3   -O3 -march=nocona -ljbig
i7-3770K core-avx-i  116       0.33    3   -march=core-avx-i -O3 -llcms2 -ltiff -lfreetype -lxml2 -lrt

1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

Botan

Test: Tiger

Botan 1.10.3: Mbytes/s, More Is Better

Configuration        Result
corei7-avx           442.47
core-avx-i           440.37
core2                438.87
nocona               438.78
corei7               427.31
core-avx2            424.56

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan

Test: AES-256

Botan 1.10.3: Mbytes/s, More Is Better

Configuration        Result
core-avx2            158.43
core2                158.35
core-avx-i           158.31
corei7-avx           158.19
nocona               157.97
corei7               157.96

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan

Test: CAST-256

Botan 1.10.3: Mbytes/s, More Is Better

Configuration        Result
core2                95.80
core-avx-i           95.79
corei7-avx           95.77
core-avx2            95.76
corei7               95.54
nocona               95.48

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

SciMark

Computational Test: Monte Carlo

SciMark 2.0: Mflops, More Is Better

Configuration        Result    SE +/-  N
corei7-avx           616.65    0.44    4
corei7               616.65    0.44    4
core2                616.21    0.51    4
core-avx-i           615.76    0.44    4
nocona               615.33    0.72    4
core-avx2            596.16    20.17   8
i7-3770K core-avx-i  553.48    0.00    4
test                 553.48    0.00    4

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0: Mflops, More Is Better

Configuration        Result    SE +/-  N
i7-3770K core-avx-i  346.41    0.00    4
test                 339.88    0.34    4
corei7-avx           251.86    1.22    4
core2                250.93    0.67    4
corei7               249.11    0.86    4
core-avx-i           247.35    2.13    4
nocona               245.07    2.50    4
core-avx2            226.57    2.02    4

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0: Mflops, More Is Better

Configuration        Result    SE +/-  N
test                 2386.29   3.08    4
i7-3770K core-avx-i  2378.31   2.64    4
corei7               1863.19   3.12    4
core2                1859.97   5.53    4
corei7-avx           1851.10   22.67   4
nocona               1825.73   21.90   4
core-avx-i           1824.28   23.35   4
core-avx2            1817.03   28.95   4

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0: MFLOPS, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
test                 1686.65   0.98    3
i7-3770K core-avx-i  1677.67   1.04    3   -march=core-avx-i
core-avx-i           1630.12   1.05    3   -march=core-avx-i
core2                1564.22   3.07    3   -march=core2
corei7               1560.18   0.75    3   -march=corei7
nocona               1517.03   1.20    3   -march=nocona
corei7-avx           1404.92   105.46  6   -march=corei7-avx
core-avx2            1282.30   19.87   6   -march=core-avx2

1. (CC) gcc options: -O3

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.3: Requests Per Second, More Is Better

Configuration        Result    SE +/-  N   Per-configuration options
core-avx2            25644.10  170.85  3   -O3 -march=core-avx2
core2                25606.17  229.80  3   -O3 -march=core2
corei7-avx           25580.44  178.37  3   -O3 -march=corei7-avx
core-avx-i           25549.84  126.25  3   -O3 -march=core-avx-i
corei7               25490.14  193.34  3   -O3 -march=corei7
nocona               24888.11  107.43  3   -O3 -march=nocona
test                 23897.32  94.17   3   -O2
i7-3770K core-avx-i  23771.72  100.68  3   -march=core-avx-i -O3

1. (CC) gcc options: -shared -fPIC -pthread

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N   Per-configuration options
i7-3770K core-avx-i  9.87      0.01    3   -march=core-avx-i -O3
test                 10.13     0.03    3   -O2
core2                10.14     0.01    3   -O3 -march=core2
nocona               10.16     0.06    3   -O3 -march=nocona
corei7               10.22     0.06    3   -O3 -march=corei7
core-avx-i           10.45     0.04    3   -O3 -march=core-avx-i
core-avx2            10.55     0.17    3   -O3 -march=core-avx2
corei7-avx           10.62     0.02    3   -O3 -march=corei7-avx

1. (CC) gcc options: -pthread -lhmmer -lsquid -lm

Timed ImageMagick Compilation

Time To Compile

Timed ImageMagick Compilation 6.8.1-10: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N
test                 59.51     0.08    3
i7-3770K core-avx-i  64.08     0.23    3
nocona               76.98     0.04    3
core2                79.03     0.06    3
corei7               79.64     0.18    3
core-avx2            80.66     0.32    3
corei7-avx           80.91     0.10    3
core-avx-i           81.06     0.31    3

Timed Linux Kernel Compilation

Time To Compile

Timed Linux Kernel Compilation 3.1: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N
i7-3770K core-avx-i  89.90     0.59    3
test                 89.94     0.79    3
core-avx2            97.25     0.60    3
core2                97.63     0.54    3
corei7               97.77     0.69    3
core-avx-i           97.85     0.76    3
nocona               97.89     0.59    3
corei7-avx           98.10     0.54    3

C-Ray

Total Time

C-Ray 1.1: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N   Per-configuration options
core-avx2            17.02     0.00    3   -march=core-avx2
core-avx-i           22.83     0.00    3   -march=core-avx-i
corei7-avx           22.84     0.01    3   -march=corei7-avx
core2                22.95     0.01    3   -march=core2
corei7               22.95     0.01    3   -march=corei7
nocona               23.07     0.02    3   -march=nocona
test                 27.78     0.00    3
i7-3770K core-avx-i  28.18     0.02    3   -march=core-avx-i

1. (CC) gcc options: -lm -lpthread -O3

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N   Per-configuration options
core-avx2            24        0.33    3   -O3 -march=core-avx2
i7-3770K core-avx-i  25        0.33    3   -march=core-avx-i -O3
nocona               26        0.00    3   -O3 -march=nocona
core2                26        0.00    3   -O3 -march=core2
corei7               26        0.33    3   -O3 -march=corei7
corei7-avx           26        0.33    3   -O3 -march=corei7-avx
core-avx-i           26        0.33    3   -O3 -march=core-avx-i
test                 87        0.00    3

1. (CXX) g++ options: -fopenmp

FFmpeg

H.264 HD To NTSC DV

FFmpeg 1.1: Seconds, Fewer Is Better

Configuration        Result    SE +/-  N   Per-configuration options
test                 11.86     0.06    3   -lva -lpthread -lrt
i7-3770K core-avx-i  11.89     0.03    3   -lva -lpthread -lrt -march=core-avx-i
corei7-avx           12.86     0.06    3   -march=corei7-avx
corei7               12.93     0.01    3   -march=corei7
nocona               12.94     0.07    3   -march=nocona
core-avx-i           13.00     0.05    3   -march=core-avx-i
core-avx2            13.01     0.12    3   -march=core-avx2
core2                13.16     0.09    3   -march=core2

1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT


Phoronix Test Suite v10.8.4