Intel Haswell GCC 4.8 core-avx2 Tuning

Testing an Intel Core i7-4770K with different CFLAGS/CXXFLAGS to look at the GCC 4.8.1 core-avx2 (Haswell) compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.
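As a rough illustration of what "different CFLAGS/CXXFLAGS" means here, the sketch below builds one environment per -march target named in the result labels. The helper name build_env and its extra_flags parameter are hypothetical, and the export does not record the exact environment used for the runs, so treat this as an assumption rather than the actual test setup.

```python
# The six -march targets that appear as result labels in this export.
MARCH_TARGETS = ["nocona", "core2", "corei7", "corei7-avx", "core-avx-i", "core-avx2"]


def build_env(march, extra_flags=""):
    """Return CFLAGS/CXXFLAGS for one -march target.

    extra_flags is a hypothetical hook for additional options; the export
    only shows the options each test reported, not the full environment.
    """
    flags = " ".join(part for part in (extra_flags, f"-march={march}") if part)
    return {"CFLAGS": flags, "CXXFLAGS": flags}


envs = {march: build_env(march) for march in MARCH_TARGETS}
for march, env in envs.items():
    print(f"{march}: CFLAGS={env['CFLAGS']!r}")
```

Each dictionary could then be merged into the process environment before a test is built, so that every run differs only in its -march value.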

HTML result view exported from: https://openbenchmarking.org/result/1306150-PTS-INTELHAS05&grs&sro.

System details for the nocona, core2, corei7, corei7-avx, core-avx-i and core-avx2 runs:

Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
Motherboard:        Intel DH87RL
Chipset:            Intel Haswell DRAM
Memory:             15360MB
Disk:               240GB OCZ VERTEX3
Graphics:           Intel Haswell IGP
Audio:              Intel Haswell HDMI
Monitor:            VA2431
Network:            Intel Connection I217-V
OS:                 Ubuntu 13.04
Kernel:             3.10.0-999-generic (x86_64)
Desktop:            Unity 7.0.0
Display Server:     X Server 1.13.3
Display Driver:     intel 2.21.9
OpenGL:             3.0 Mesa 9.2.0-devel (git-a2e3b1c)
Compiler:           GCC 4.8.1 + LLVM 3.2
File-System:        ext4
Screen Resolution:  1920x1080

Compiler Details:   --enable-checking=release --enable-languages=c,c++,fortran
Processor Details:  Scaling Governor: acpi-cpufreq ondemand

Results overview:

Test                                                            nocona       core2      corei7  corei7-avx  core-avx-i   core-avx2
graphics-magick: Sharpen                                            83          84          84          96          96         136
c-ray: Total Time                                                23.07       22.95       22.95       22.84       22.83       17.02
himeno: Poisson Pressure Solver                                1517.03     1564.22     1560.18     1404.92     1630.12     1282.30
graphics-magick: Blur                                              115         117         116         122         122         138
graphics-magick: Resizing                                          157         160         160         166         167         182
scimark2: Fast Fourier Transform                                245.07      250.93      249.11      251.86      247.35      226.57
smallpt: Global Illumination Renderer; 100 Samples                  26          26          26          26          26          24
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping       122.02      121.58      123.14      117.71      116.54      119.78
build-imagemagick: Time To Compile                               76.98       79.03       79.64       80.91       81.06       80.66
hmmer: Pfam Database Search                                      10.16       10.14       10.22       10.62       10.45       10.55
botan: Tiger                                                    438.78      438.87      427.31      442.47      440.37      424.56
scimark2: Monte Carlo                                           615.33      616.21      616.65      616.65      615.76      596.16
apache: Static Web Page Serving                               24888.11    25606.17    25490.14    25580.44    25549.84    25644.10
graphics-magick: Local Adaptive Thresholding                       118         120         120         119         120         121
scimark2: Dense LU Matrix Factorization                        1825.73     1859.97     1863.19     1851.10     1824.28     1817.03
ffmpeg: H.264 HD To NTSC DV                                      12.94       13.16       12.93       12.86       13.00       13.01
x264: H.264 Video Encoding                                      156.80      156.74      156.06      155.63      156.08      155.18
build-linux-kernel: Time To Compile                              97.89       97.63       97.77       98.10       97.85       97.25
botan: CAST-256                                                  95.48       95.80       95.54       95.77       95.79       95.76
botan: AES-256                                                  157.97      158.35      157.96      158.19      158.31      158.43
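Because some tests report throughput (more is better) and others report time (fewer is better), raw differences are not directly comparable across rows. The sketch below shows one way to turn a pair of results into a signed percentage; the function name percent_gain and the choice of nocona as the baseline are illustrative assumptions, and the four rows are copied from the results listed above.

```python
# (nocona result, core-avx2 result, higher_is_better), copied from the results above.
RESULTS = {
    "graphics-magick: Sharpen": (83, 136, True),
    "c-ray: Total Time": (23.07, 17.02, False),
    "himeno: Poisson Pressure Solver": (1517.03, 1282.30, True),
    "scimark2: Fast Fourier Transform": (245.07, 226.57, True),
}


def percent_gain(baseline, candidate, higher_is_better):
    """Performance change of candidate over baseline, in percent.

    A positive value means the candidate performed better.
    """
    if higher_is_better:
        ratio = candidate / baseline
    else:
        # For time-based results a smaller number is the better one.
        ratio = baseline / candidate
    return (ratio - 1.0) * 100.0


for name, (base, cand, higher) in RESULTS.items():
    print(f"{name}: {percent_gain(base, cand, higher):+.1f}%")
```

With these figures, core-avx2 comes out roughly 64% ahead of nocona on Sharpen and roughly 35% ahead on C-Ray, while it is roughly 15% behind on Himeno.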

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.16, Operation: Sharpen
Iterations Per Minute, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i         96     0.00   3
-march=core-avx2         136     0.00   3
-march=core2              84     0.00   3
-march=corei7             84     0.00   3
-march=corei7-avx         96     0.00   3
-march=nocona             83     0.00   3

1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

C-Ray

Total Time

C-Ray 1.1, Total Time
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i      22.83     0.00   3
-march=core-avx2       17.02     0.00   3
-march=core2           22.95     0.01   3
-march=corei7          22.95     0.01   3
-march=corei7-avx      22.84     0.01   3
-march=nocona          23.07     0.02   3

1. (CC) gcc options: -lm -lpthread -O3

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0, Poisson Pressure Solver
MFLOPS, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i    1630.12     1.05   3
-march=core-avx2     1282.30    19.87   6
-march=core2         1564.22     3.07   3
-march=corei7        1560.18     0.75   3
-march=corei7-avx    1404.92   105.46   6
-march=nocona        1517.03     1.20   3

1. (CC) gcc options: -O3
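The Himeno standard errors vary widely between configurations (from 0.75 up to 105.46), so it helps to check whether a gap between two means is large relative to the run-to-run noise. The sketch below expresses the difference between two results in units of their combined standard error. It assumes the runs are independent, and the idea of treating values below about 2 as inconclusive is a common rule of thumb, not something stated in the export; the function name separation is hypothetical.

```python
import math


def separation(mean_a, se_a, mean_b, se_b):
    """Absolute difference of two means, in units of their combined standard error."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) / combined_se


# Himeno MFLOPS as reported above: (mean, standard error).
core_avx_i = (1630.12, 1.05)
core_avx2 = (1282.30, 19.87)
corei7_avx = (1404.92, 105.46)

print(f"core-avx-i vs core-avx2: {separation(*core_avx_i, *core_avx2):.1f}")
print(f"corei7-avx vs core-avx2: {separation(*corei7_avx, *core_avx2):.1f}")
```

Under these assumptions the core-avx-i versus core-avx2 gap is many times the combined error, whereas the corei7-avx versus core-avx2 gap is close to one combined standard error and is better read as inconclusive.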

GraphicsMagick

Operation: Blur

GraphicsMagick 1.3.16, Operation: Blur
Iterations Per Minute, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i        122     0.67   3
-march=core-avx2         138     0.88   3
-march=core2             117     0.00   3
-march=corei7            116     1.00   3
-march=corei7-avx        122     1.00   3
-march=nocona            115     0.00   3

1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick

Operation: Resizing

GraphicsMagick 1.3.16, Operation: Resizing
Iterations Per Minute, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i        167     0.00   3
-march=core-avx2         182     0.33   3
-march=core2             160     0.00   3
-march=corei7            160     0.00   3
-march=corei7-avx        166     0.33   3
-march=nocona            157     0.33   3

1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0, Computational Test: Fast Fourier Transform
Mflops, More Is Better

Configuration         Result   SE +/-   N
core-avx-i            247.35     2.13   4
core-avx2             226.57     2.02   4
core2                 250.93     0.67   4
corei7                249.11     0.86   4
corei7-avx            251.86     1.22   4
nocona                245.07     2.50   4

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0, Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i         26     0.33   3
-march=core-avx2          24     0.33   3
-march=core2              26     0.00   3
-march=corei7             26     0.33   3
-march=corei7-avx         26     0.33   3
-march=nocona             26     0.00   3

1. (CXX) g++ options: -fopenmp -O3

TTSIOD 3D Renderer

Phong Rendering With Soft-Shadow Mapping

TTSIOD 3D Renderer 2.2z, Phong Rendering With Soft-Shadow Mapping
FPS, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i     116.54     0.36   3
-march=core-avx2      119.78     0.09   3
-march=core2          121.58     0.76   3
-march=corei7         123.14     0.45   3
-march=corei7-avx     117.71     0.66   3
-march=nocona         122.02     0.39   3

1. (CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++

Timed ImageMagick Compilation

Time To Compile

Timed ImageMagick Compilation 6.8.1-10, Time To Compile
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
core-avx-i             81.06     0.31   3
core-avx2              80.66     0.32   3
core2                  79.03     0.06   3
corei7                 79.64     0.18   3
corei7-avx             80.91     0.10   3
nocona                 76.98     0.04   3

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2, Pfam Database Search
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i      10.45     0.04   3
-march=core-avx2       10.55     0.17   3
-march=core2           10.14     0.01   3
-march=corei7          10.22     0.06   3
-march=corei7-avx      10.62     0.02   3
-march=nocona          10.16     0.06   3

1. (CC) gcc options: -O3 -pthread -lhmmer -lsquid -lm

Botan

Test: Tiger

Botan 1.10.3, Test: Tiger
Mbytes/s, More Is Better

Configuration         Result
core-avx-i            440.37
core-avx2             424.56
core2                 438.87
corei7                427.31
corei7-avx            442.47
nocona                438.78

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

SciMark

Computational Test: Monte Carlo

SciMark 2.0, Computational Test: Monte Carlo
Mflops, More Is Better

Configuration         Result   SE +/-   N
core-avx-i            615.76     0.44   4
core-avx2             596.16    20.17   8
core2                 616.21     0.51   4
corei7                616.65     0.44   4
corei7-avx            616.65     0.44   4
nocona                615.33     0.72   4

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.3, Static Web Page Serving
Requests Per Second, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i   25549.84   126.25   3
-march=core-avx2    25644.10   170.85   3
-march=core2        25606.17   229.80   3
-march=corei7       25490.14   193.34   3
-march=corei7-avx   25580.44   178.37   3
-march=nocona       24888.11   107.43   3

1. (CC) gcc options: -shared -fPIC -pthread -O3

GraphicsMagick

Operation: Local Adaptive Thresholding

GraphicsMagick 1.3.16, Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i        120     0.33   3
-march=core-avx2         121     0.00   3
-march=core2             120     0.33   3
-march=corei7            120     0.00   3
-march=corei7-avx        119     0.00   3
-march=nocona            118     0.33   3

1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0, Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better

Configuration         Result   SE +/-   N
core-avx-i           1824.28    23.35   4
core-avx2            1817.03    28.95   4
core2                1859.97     5.53   4
corei7               1863.19     3.12   4
corei7-avx           1851.10    22.67   4
nocona               1825.73    21.90   4

FFmpeg

H.264 HD To NTSC DV

FFmpeg 1.1, H.264 HD To NTSC DV
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i      13.00     0.05   3
-march=core-avx2       13.01     0.12   3
-march=core2           13.16     0.09   3
-march=corei7          12.93     0.01   3
-march=corei7-avx      12.86     0.06   3
-march=nocona          12.94     0.07   3

1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

x264

H.264 Video Encoding

x264 2013-06-08, H.264 Video Encoding
Frames Per Second, More Is Better

Configuration         Result   SE +/-   N
-march=core-avx-i     156.08     0.50   5
-march=core-avx2      155.18     0.90   5
-march=core2          156.74     0.55   5
-march=corei7         156.06     0.51   5
-march=corei7-avx     155.63     0.20   5
-march=nocona         156.80     0.30   5

1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Timed Linux Kernel Compilation

Time To Compile

Timed Linux Kernel Compilation 3.1, Time To Compile
Seconds, Fewer Is Better

Configuration         Result   SE +/-   N
core-avx-i             97.85     0.76   3
core-avx2              97.25     0.60   3
core2                  97.63     0.54   3
corei7                 97.77     0.69   3
corei7-avx             98.10     0.54   3
nocona                 97.89     0.59   3

Botan

Test: CAST-256

Botan 1.10.3, Test: CAST-256
Mbytes/s, More Is Better

Configuration         Result
core-avx-i             95.79
core-avx2              95.76
core2                  95.80
corei7                 95.54
corei7-avx             95.77
nocona                 95.48

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan

Test: AES-256

Botan 1.10.3, Test: AES-256
Mbytes/s, More Is Better

Configuration         Result
core-avx-i            158.31
core-avx2             158.43
core2                 158.35
corei7                157.96
corei7-avx            158.19
nocona                157.97

1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2


Phoronix Test Suite v10.8.5