Intel Haswell GCC 4.8 core-avx2 Tuning

Testing Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 Haswell GCC 4.8.1 compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.

nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2 (the same system was reported for all six runs):

    Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores), Motherboard: Intel DH87RL, Chipset: Intel Haswell DRAM, Memory: 15360MB, Disk: 240GB OCZ VERTEX3, Graphics: Intel Haswell IGP, Audio: Intel Haswell HDMI, Monitor: VA2431, Network: Intel Connection I217-V

    OS: Ubuntu 13.04, Kernel: 3.10.0-999-generic (x86_64), Desktop: Unity 7.0.0, Display Server: X Server 1.13.3, Display Driver: intel 2.21.9, OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c), Compiler: GCC 4.8.1 + LLVM 3.2, File-System: ext4, Screen Resolution: 1920x1080


Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
nocona ..... 10.16 |========================================================
core2 ...... 10.14 |========================================================
corei7 ..... 10.22 |=========================================================
corei7-avx . 10.62 |===========================================================
core-avx-i . 10.45 |==========================================================
core-avx2 .. 10.55 |===========================================================

Botan 1.10.3
Test: Tiger
Mbytes/s > Higher Is Better
nocona ..... 438.78 |==========================================================
core2 ...... 438.87 |==========================================================
corei7 ..... 427.31 |========================================================
corei7-avx . 442.47 |==========================================================
core-avx-i . 440.37 |==========================================================
core-avx2 .. 424.56 |========================================================

Botan 1.10.3
Test: AES-256
Mbytes/s > Higher Is Better
nocona ..... 157.97 |==========================================================
core2 ...... 158.35 |==========================================================
corei7 ..... 157.96 |==========================================================
corei7-avx . 158.19 |==========================================================
core-avx-i . 158.31 |==========================================================
core-avx2 .. 158.43 |==========================================================

Botan 1.10.3
Test: CAST-256
Mbytes/s > Higher Is Better
nocona ..... 95.48 |===========================================================
core2 ...... 95.80 |===========================================================
corei7 ..... 95.54 |===========================================================
corei7-avx . 95.77 |===========================================================
core-avx-i . 95.79 |===========================================================
core-avx2 .. 95.76 |===========================================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
nocona ..... 615.33 |==========================================================
core2 ...... 616.21 |==========================================================
corei7 ..... 616.65 |==========================================================
corei7-avx . 616.65 |==========================================================
core-avx-i . 615.76 |==========================================================
core-avx2 .. 596.16 |========================================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
nocona ..... 245.07 |========================================================
core2 ...... 250.93 |==========================================================
corei7 ..... 249.11 |=========================================================
corei7-avx . 251.86 |==========================================================
core-avx-i . 247.35 |=========================================================
core-avx2 .. 226.57 |====================================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
nocona ..... 1825.73 |========================================================
core2 ...... 1859.97 |=========================================================
corei7 ..... 1863.19 |=========================================================
corei7-avx . 1851.10 |=========================================================
core-avx-i . 1824.28 |========================================================
core-avx2 .. 1817.03 |========================================================

TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
nocona ..... 122.02 |=========================================================
core2 ...... 121.58 |=========================================================
corei7 ..... 123.14 |==========================================================
corei7-avx . 117.71 |=======================================================
core-avx-i . 116.54 |=======================================================
core-avx2 .. 119.78 |========================================================

x264 2013-06-08
H.264 Video Encoding
Frames Per Second > Higher Is Better
nocona ..... 156.80 |==========================================================
core2 ...... 156.74 |==========================================================
corei7 ..... 156.06 |==========================================================
corei7-avx . 155.63 |==========================================================
core-avx-i . 156.08 |==========================================================
core-avx2 .. 155.18 |=========================================================

GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute > Higher Is Better
nocona ..... 115 |===================================================
core2 ...... 117 |====================================================
corei7 ..... 116 |===================================================
corei7-avx . 122 |======================================================
core-avx-i . 122 |======================================================
core-avx2 .. 138 |=============================================================

GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute > Higher Is Better
nocona ..... 83  |=====================================
core2 ...... 84  |======================================
corei7 ..... 84  |======================================
corei7-avx . 96  |===========================================
core-avx-i . 96  |===========================================
core-avx2 .. 136 |=============================================================

GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute > Higher Is Better
nocona ..... 157 |=====================================================
core2 ...... 160 |======================================================
corei7 ..... 160 |======================================================
corei7-avx . 166 |========================================================
core-avx-i . 167 |========================================================
core-avx2 .. 182 |=============================================================

GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
nocona ..... 118 |===========================================================
core2 ...... 120 |============================================================
corei7 ..... 120 |============================================================
corei7-avx . 119 |============================================================
core-avx-i . 120 |============================================================
core-avx2 .. 121 |=============================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
nocona ..... 1517.03 |=====================================================
core2 ...... 1564.22 |=======================================================
corei7 ..... 1560.18 |=======================================================
corei7-avx . 1404.92 |=================================================
core-avx-i . 1630.12 |=========================================================
core-avx2 .. 1282.30 |=============================================

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
nocona ..... 76.98 |========================================================
core2 ...... 79.03 |==========================================================
corei7 ..... 79.64 |==========================================================
corei7-avx . 80.91 |===========================================================
core-avx-i . 81.06 |===========================================================
core-avx2 .. 80.66 |===========================================================

Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds < Lower Is Better
nocona ..... 97.89 |===========================================================
core2 ...... 97.63 |===========================================================
corei7 ..... 97.77 |===========================================================
corei7-avx . 98.10 |===========================================================
core-avx-i . 97.85 |===========================================================
core-avx2 .. 97.25 |==========================================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
nocona ..... 23.07 |===========================================================
core2 ...... 22.95 |===========================================================
corei7 ..... 22.95 |===========================================================
corei7-avx . 22.84 |==========================================================
core-avx-i . 22.83 |==========================================================
core-avx2 .. 17.02 |============================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
nocona ..... 26 |==============================================================
core2 ...... 26 |==============================================================
corei7 ..... 26 |==============================================================
corei7-avx . 26 |==============================================================
core-avx-i . 26 |==============================================================
core-avx2 .. 24 |=========================================================

FFmpeg 1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
nocona ..... 12.94 |==========================================================
core2 ...... 13.16 |===========================================================
corei7 ..... 12.93 |==========================================================
corei7-avx . 12.86 |==========================================================
core-avx-i . 13.00 |==========================================================
core-avx2 .. 13.01 |==========================================================

Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second > Higher Is Better
nocona ..... 24888.11 |======================================================
core2 ...... 25606.17 |========================================================
corei7 ..... 25490.14 |========================================================
corei7-avx . 25580.44 |========================================================
core-avx-i . 25549.84 |========================================================
core-avx2 .. 25644.10 |========================================================
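
Because the results above mix "Lower Is Better" and "Higher Is Better" tests, the raw numbers are awkward to compare at a glance. The following is a minimal sketch, not part of the original report, that turns a handful of the nocona and core-avx2 values copied from above into a signed percentage where positive always means core-avx2 did better. The names RESULTS, percent_change, improvement, and summarize are hypothetical and chosen only for this illustration.

```python
# Values copied from the results above: (test, nocona, core-avx2, higher_is_better).
# The selection of tests is illustrative only, not a summary of the whole run.
RESULTS = [
    ("C-Ray 1.1 (s)", 23.07, 17.02, False),
    ("GraphicsMagick Sharpen (iter/min)", 83, 136, True),
    ("GraphicsMagick Blur (iter/min)", 115, 138, True),
    ("Himeno 3.0 (MFLOPS)", 1517.03, 1282.30, True),
    ("SciMark FFT (Mflops)", 245.07, 226.57, True),
]


def percent_change(baseline, value):
    """Raw percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def improvement(baseline, value, higher_is_better):
    """Percent change, signed so that a positive number always means 'better'."""
    change = percent_change(baseline, value)
    return change if higher_is_better else -change


def summarize(results):
    """Return (test name, improvement in percent rounded to one decimal) pairs."""
    return [
        (name, round(improvement(base, val, hib), 1))
        for name, base, val, hib in results
    ]


if __name__ == "__main__":
    for name, pct in summarize(RESULTS):
        print(f"{name:36s} {pct:+6.1f}%")
```

For the values listed, this puts core-avx2 roughly 26% ahead of nocona on C-Ray and roughly 15% behind on Himeno. The figures are single-run comparisons from this one system and say nothing about run-to-run variance.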