Intel Haswell GCC 4.8 core-avx2 Tuning

Testing Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 Haswell GCC 4.8.1 compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.

nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2 (identical system for all six configurations):

    Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores), Motherboard: Intel DH87RL, Chipset: Intel Haswell DRAM, Memory: 15360MB, Disk: 240GB OCZ VERTEX3, Graphics: Intel Haswell IGP, Audio: Intel Haswell HDMI, Monitor: VA2431, Network: Intel Connection I217-V

    OS: Ubuntu 13.04, Kernel: 3.10.0-999-generic (x86_64), Desktop: Unity 7.0.0, Display Server: X Server 1.13.3, Display Driver: intel 2.21.9, OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c), Compiler: GCC 4.8.1 + LLVM 3.2, File-System: ext4, Screen Resolution: 1920x1080

test:

    Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

    OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600

i7-3770K core-avx-i:

    Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

    OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
nocona .............. 10.16
core2 ............... 10.14
corei7 .............. 10.22
corei7-avx .......... 10.62
core-avx-i .......... 10.45
core-avx2 ........... 10.55
test ................ 10.13
i7-3770K core-avx-i . 9.87

Botan 1.10.3
Test: Tiger
Mbytes/s > Higher Is Better
nocona ..... 438.78
core2 ...... 438.87
corei7 ..... 427.31
corei7-avx . 442.47
core-avx-i . 440.37
core-avx2 .. 424.56

Botan 1.10.3
Test: AES-256
Mbytes/s > Higher Is Better
nocona ..... 157.97
core2 ...... 158.35
corei7 ..... 157.96
corei7-avx . 158.19
core-avx-i . 158.31
core-avx2 .. 158.43

Botan 1.10.3
Test: CAST-256
Mbytes/s > Higher Is Better
nocona ..... 95.48
core2 ...... 95.80
corei7 ..... 95.54
corei7-avx . 95.77
core-avx-i . 95.79
core-avx2 .. 95.76

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
nocona .............. 615.33
core2 ............... 616.21
corei7 .............. 616.65
corei7-avx .......... 616.65
core-avx-i .......... 615.76
core-avx2 ........... 596.16
test ................ 553.48
i7-3770K core-avx-i . 553.48

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
nocona .............. 245.07
core2 ............... 250.93
corei7 .............. 249.11
corei7-avx .......... 251.86
core-avx-i .......... 247.35
core-avx2 ........... 226.57
test ................ 339.88
i7-3770K core-avx-i . 346.41

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
nocona .............. 1825.73
core2 ............... 1859.97
corei7 .............. 1863.19
corei7-avx .......... 1851.10
core-avx-i .......... 1824.28
core-avx2 ........... 1817.03
test ................ 2386.29
i7-3770K core-avx-i . 2378.31

TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
nocona .............. 122.02
core2 ............... 121.58
corei7 .............. 123.14
corei7-avx .......... 117.71
core-avx-i .......... 116.54
core-avx2 ........... 119.78
test ................ 148.75
i7-3770K core-avx-i . 148.59

x264 2013-06-08
H.264 Video Encoding
Frames Per Second > Higher Is Better
nocona .............. 156.80
core2 ............... 156.74
corei7 .............. 156.06
corei7-avx .......... 155.63
core-avx-i .......... 156.08
core-avx2 ........... 155.18
test ................ 158.19
i7-3770K core-avx-i . 157.85

GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute > Higher Is Better
nocona .............. 115
core2 ............... 117
corei7 .............. 116
corei7-avx .......... 122
core-avx-i .......... 122
core-avx2 ........... 138
test ................ 132
i7-3770K core-avx-i . 138

GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute > Higher Is Better
nocona .............. 83
core2 ............... 84
corei7 .............. 84
corei7-avx .......... 96
core-avx-i .......... 96
core-avx2 ........... 136
test ................ 83
i7-3770K core-avx-i . 95

GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute > Higher Is Better
nocona .............. 157
core2 ............... 160
corei7 .............. 160
corei7-avx .......... 166
core-avx-i .......... 167
core-avx2 ........... 182
test ................ 161
i7-3770K core-avx-i . 167

GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
nocona .............. 118
core2 ............... 120
corei7 .............. 120
corei7-avx .......... 119
core-avx-i .......... 120
core-avx2 ........... 121
test ................ 123
i7-3770K core-avx-i . 116

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
nocona .............. 1517.03
core2 ............... 1564.22
corei7 .............. 1560.18
corei7-avx .......... 1404.92
core-avx-i .......... 1630.12
core-avx2 ........... 1282.30
test ................ 1686.65
i7-3770K core-avx-i . 1677.67

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
nocona .............. 76.98
core2 ............... 79.03
corei7 .............. 79.64
corei7-avx .......... 80.91
core-avx-i .......... 81.06
core-avx2 ........... 80.66
test ................ 59.51
i7-3770K core-avx-i . 64.08

Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds < Lower Is Better
nocona .............. 97.89
core2 ............... 97.63
corei7 .............. 97.77
corei7-avx .......... 98.10
core-avx-i .......... 97.85
core-avx2 ........... 97.25
test ................ 89.94
i7-3770K core-avx-i . 89.90

C-Ray 1.1
Total Time
Seconds < Lower Is Better
nocona .............. 23.07
core2 ............... 22.95
corei7 .............. 22.95
corei7-avx .......... 22.84
core-avx-i .......... 22.83
core-avx2 ........... 17.02
test ................ 27.78
i7-3770K core-avx-i . 28.18

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
nocona .............. 26
core2 ............... 26
corei7 .............. 26
corei7-avx .......... 26
core-avx-i .......... 26
core-avx2 ........... 24
test ................ 87
i7-3770K core-avx-i . 25

FFmpeg 1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
nocona .............. 12.94
core2 ............... 13.16
corei7 .............. 12.93
corei7-avx .......... 12.86
core-avx-i .......... 13.00
core-avx2 ........... 13.01
test ................ 11.86
i7-3770K core-avx-i . 11.89

Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second > Higher Is Better
nocona .............. 24888.11
core2 ............... 25606.17
corei7 .............. 25490.14
corei7-avx .......... 25580.44
core-avx-i .......... 25549.84
core-avx2 ........... 25644.10
test ................ 23897.32
i7-3770K core-avx-i . 23771.72
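
The results mix "Higher Is Better" and "Lower Is Better" units, so comparing configurations is easier once each result is expressed as a percent change relative to a baseline. The sketch below is not part of the original export: the `percent_gain` helper and the choice of nocona as the baseline are assumptions made for illustration. It uses three nocona and core-avx2 values copied from the i7-4770K results above.

```python
# Minimal sketch: express core-avx2 results as a percent change versus the
# nocona baseline. The helper name and the baseline choice are illustrative
# assumptions; the numbers are copied from the i7-4770K results above.

def percent_gain(baseline, value, lower_is_better):
    """Return the change versus baseline in percent; positive means better."""
    if lower_is_better:
        # For timings, a smaller value is an improvement.
        return (baseline - value) / baseline * 100.0
    # For throughput-style metrics, a larger value is an improvement.
    return (value - baseline) / baseline * 100.0


# (test name, lower_is_better, nocona result, core-avx2 result)
samples = [
    ("C-Ray 1.1 (seconds)", True, 23.07, 17.02),
    ("GraphicsMagick Sharpen (iterations/minute)", False, 83, 136),
    ("Himeno 3.0 (MFLOPS)", False, 1517.03, 1282.30),
]

for name, lower_is_better, nocona, core_avx2 in samples:
    gain = percent_gain(nocona, core_avx2, lower_is_better)
    print(f"{name}: {gain:+.1f}% for core-avx2 versus nocona")
```

For "Lower Is Better" results the figure is the reduction in run time, not a speedup factor. A negative value means core-avx2 was slower than the nocona build in that test.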