Intel Haswell GCC 4.8 core-avx2 Tuning

Testing Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 Haswell GCC 4.8.1 compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.

nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2 (identical system for all six runs):

  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores), Motherboard: Intel DH87RL, Chipset: Intel Haswell DRAM, Memory: 15360MB, Disk: 240GB OCZ VERTEX3, Graphics: Intel Haswell IGP, Audio: Intel Haswell HDMI, Monitor: VA2431, Network: Intel Connection I217-V

  OS: Ubuntu 13.04, Kernel: 3.10.0-999-generic (x86_64), Desktop: Unity 7.0.0, Display Server: X Server 1.13.3, Display Driver: intel 2.21.9, OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c), Compiler: GCC 4.8.1 + LLVM 3.2, File-System: ext4, Screen Resolution: 1920x1080

test:

  Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

  OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600

i7-3770K core-avx-i:

  Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

  OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600

Q9300@3.33GHz:

  Processor: Intel Core 2 Quad Q9300 @ 3.33GHz (4 Cores), Motherboard: ASUS P5K3 Deluxe, Chipset: Intel 82G33/G31/P35/P31 + ICH9R, Memory: 8192MB, Disk: 1000GB Seagate ST31000340AS, Graphics: LLVMpipe, Audio: Analog Devices AD1988B, Monitor: SyncMaster, Network: Marvell 88E8056 PCI-E Gigabit

  OS: Slackware 14.0, Kernel: 3.2.45 (x86_64), Display Server: X Server 1.12.4, Display Driver: nouveau 0.0.16, OpenGL: 2.1 Mesa 8.0.4 Gallium 0.4, Compiler: GCC 4.7.1 + Clang 3.0 + LLVM 3.0, File-System: ext4, Screen Resolution: 1680x1050

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
nocona .............. 10.16 |===========================
core2 ............... 10.14 |===========================
corei7 .............. 10.22 |===========================
corei7-avx .......... 10.62 |============================
core-avx-i .......... 10.45 |============================
core-avx2 ........... 10.55 |============================
test ................ 10.13 |===========================
i7-3770K core-avx-i . 9.87  |==========================
Q9300@3.33GHz ....... 18.74 |==================================================

Botan 1.10.3
Test: Tiger
Mbytes/s > Higher Is Better
nocona ........ 438.78 |=======================================================
core2 ......... 438.87 |=======================================================
corei7 ........ 427.31 |=====================================================
corei7-avx .... 442.47 |=======================================================
core-avx-i .... 440.37 |=======================================================
core-avx2 ..... 424.56 |=====================================================
Q9300@3.33GHz . 331.70 |=========================================

Botan 1.10.3
Test: AES-256
Mbytes/s > Higher Is Better
nocona ........ 157.97 |======================================
core2 ......... 158.35 |======================================
corei7 ........ 157.96 |======================================
corei7-avx .... 158.19 |======================================
core-avx-i .... 158.31 |======================================
core-avx2 ..... 158.43 |======================================
Q9300@3.33GHz . 227.29 |=======================================================

Botan 1.10.3
Test: CAST-256
Mbytes/s > Higher Is Better
nocona ........ 95.48 |========================================================
core2 ......... 95.80 |========================================================
corei7 ........ 95.54 |========================================================
corei7-avx .... 95.77 |========================================================
core-avx-i .... 95.79 |========================================================
core-avx2 ..... 95.76 |========================================================
Q9300@3.33GHz . 75.73 |============================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
nocona .............. 615.33 |=================================================
core2 ............... 616.21 |=================================================
corei7 .............. 616.65 |=================================================
corei7-avx .......... 616.65 |=================================================
core-avx-i .......... 615.76 |=================================================
core-avx2 ........... 596.16 |===============================================
test ................ 553.48 |============================================
i7-3770K core-avx-i . 553.48 |============================================
Q9300@3.33GHz ....... 325.95 |==========================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
nocona .............. 245.07 |===================================
core2 ............... 250.93 |===================================
corei7 .............. 249.11 |===================================
corei7-avx .......... 251.86 |====================================
core-avx-i .......... 247.35 |===================================
core-avx2 ........... 226.57 |================================
test ................ 339.88 |================================================
i7-3770K core-avx-i . 346.41 |=================================================
Q9300@3.33GHz ....... 93.72  |=============

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
nocona .............. 1825.73 |=====================================
core2 ............... 1859.97 |=====================================
corei7 .............. 1863.19 |=====================================
corei7-avx .......... 1851.10 |=====================================
core-avx-i .......... 1824.28 |=====================================
core-avx2 ........... 1817.03 |=====================================
test ................ 2386.29 |================================================
i7-3770K core-avx-i . 2378.31 |================================================
Q9300@3.33GHz ....... 865.11  |=================

TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
nocona .............. 122.02 |========================================
core2 ............... 121.58 |========================================
corei7 .............. 123.14 |=========================================
corei7-avx .......... 117.71 |=======================================
core-avx-i .......... 116.54 |======================================
core-avx2 ........... 119.78 |=======================================
test ................ 148.75 |=================================================
i7-3770K core-avx-i . 148.59 |=================================================
Q9300@3.33GHz ....... 0.93   |

x264 2013-06-08
H.264 Video Encoding
Frames Per Second > Higher Is Better
nocona .............. 156.80 |=================================================
core2 ............... 156.74 |=================================================
corei7 .............. 156.06 |================================================
corei7-avx .......... 155.63 |================================================
core-avx-i .......... 156.08 |================================================
core-avx2 ........... 155.18 |================================================
test ................ 158.19 |=================================================
i7-3770K core-avx-i . 157.85 |=================================================
Q9300@3.33GHz ....... 84.66  |==========================

GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute > Higher Is Better
nocona .............. 115 |===========================================
core2 ............... 117 |============================================
corei7 .............. 116 |============================================
corei7-avx .......... 122 |==============================================
core-avx-i .......... 122 |==============================================
core-avx2 ........... 138 |====================================================
test ................ 132 |==================================================
i7-3770K core-avx-i . 138 |====================================================

GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute > Higher Is Better
nocona .............. 83  |================================
core2 ............... 84  |================================
corei7 .............. 84  |================================
corei7-avx .......... 96  |=====================================
core-avx-i .......... 96  |=====================================
core-avx2 ........... 136 |====================================================
test ................ 83  |================================
i7-3770K core-avx-i . 95  |====================================

GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute > Higher Is Better
nocona .............. 157 |=============================================
core2 ............... 160 |==============================================
corei7 .............. 160 |==============================================
corei7-avx .......... 166 |===============================================
core-avx-i .......... 167 |================================================
core-avx2 ........... 182 |====================================================
test ................ 161 |==============================================
i7-3770K core-avx-i . 167 |================================================

GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
nocona .............. 118 |==================================================
core2 ............... 120 |===================================================
corei7 .............. 120 |===================================================
corei7-avx .......... 119 |==================================================
core-avx-i .......... 120 |===================================================
core-avx2 ........... 121 |===================================================
test ................ 123 |====================================================
i7-3770K core-avx-i . 116 |=================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
nocona .............. 1517.03 |===========================================
core2 ............... 1564.22 |=============================================
corei7 .............. 1560.18 |============================================
corei7-avx .......... 1404.92 |========================================
core-avx-i .......... 1630.12 |==============================================
core-avx2 ........... 1282.30 |====================================
test ................ 1686.65 |================================================
i7-3770K core-avx-i . 1677.67 |================================================
Q9300@3.33GHz ....... 1190.98 |==================================

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
nocona .............. 76.98  |===================================
core2 ............... 79.03  |====================================
corei7 .............. 79.64  |=====================================
corei7-avx .......... 80.91  |=====================================
core-avx-i .......... 81.06  |=====================================
core-avx2 ........... 80.66  |=====================================
test ................ 59.51  |===========================
i7-3770K core-avx-i . 64.08  |==============================
Q9300@3.33GHz ....... 106.30 |=================================================

Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds < Lower Is Better
nocona .............. 97.89  |=================================
core2 ............... 97.63  |=================================
corei7 .............. 97.77  |=================================
corei7-avx .......... 98.10  |=================================
core-avx-i .......... 97.85  |=================================
core-avx2 ........... 97.25  |=================================
test ................ 89.94  |===============================
i7-3770K core-avx-i . 89.90  |===============================
Q9300@3.33GHz ....... 144.27 |=================================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
nocona .............. 23.07 |==========================
core2 ............... 22.95 |==========================
corei7 .............. 22.95 |==========================
corei7-avx .......... 22.84 |==========================
core-avx-i .......... 22.83 |==========================
core-avx2 ........... 17.02 |===================
test ................ 27.78 |===============================
i7-3770K core-avx-i . 28.18 |================================
Q9300@3.33GHz ....... 44.15 |==================================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
nocona .............. 26  |========
core2 ............... 26  |========
corei7 .............. 26  |========
corei7-avx .......... 26  |========
core-avx-i .......... 26  |========
core-avx2 ........... 24  |=======
test ................ 87  |==========================
i7-3770K core-avx-i . 25  |========
Q9300@3.33GHz ....... 172 |====================================================

FFmpeg 1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
nocona .............. 12.94 |====================================
core2 ............... 13.16 |====================================
corei7 .............. 12.93 |====================================
corei7-avx .......... 12.86 |====================================
core-avx-i .......... 13.00 |====================================
core-avx2 ........... 13.01 |====================================
test ................ 11.86 |=================================
i7-3770K core-avx-i . 11.89 |=================================
Q9300@3.33GHz ....... 18.06 |==================================================

Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second > Higher Is Better
nocona .............. 24888.11 |==============================================
core2 ............... 25606.17 |===============================================
corei7 .............. 25490.14 |===============================================
corei7-avx .......... 25580.44 |===============================================
core-avx-i .......... 25549.84 |===============================================
core-avx2 ........... 25644.10 |===============================================
test ................ 23897.32 |============================================
i7-3770K core-avx-i . 23771.72 |============================================
Q9300@3.33GHz ....... 12787.34 |=======================
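
Because some results are "Lower Is Better" and others "Higher Is Better", the raw numbers are awkward to compare across tests. The following is a minimal Python sketch, not part of the original result file, that converts a few of the nocona and core-avx2 results on the i7-4770K into a percent gain or loss. The names RESULTS, relative_gain, and summarize are hypothetical and exist only for this illustration; the figures are copied from the results in this file.

```python
# Illustrative only: (nocona result, core-avx2 result, higher_is_better),
# copied from the i7-4770K rows of the result tables in this file.
RESULTS = {
    "C-Ray 1.1 (seconds)": (23.07, 17.02, False),
    "GraphicsMagick Sharpen (iterations/min)": (83, 136, True),
    "Himeno 3.0 (MFLOPS)": (1517.03, 1282.30, True),
    "SciMark 2.0 FFT (Mflops)": (245.07, 226.57, True),
}


def relative_gain(baseline, tuned, higher_is_better):
    """Percent improvement of `tuned` over `baseline`; negative means a regression."""
    if higher_is_better:
        ratio = tuned / baseline
    else:
        # For time-based results, a smaller value is the faster one.
        ratio = baseline / tuned
    return (ratio - 1.0) * 100.0


def summarize(results):
    """Map each test name to its gain, rounded to one decimal place."""
    return {
        name: round(relative_gain(base, tuned, hib), 1)
        for name, (base, tuned, hib) in results.items()
    }


if __name__ == "__main__":
    for name, gain in summarize(RESULTS).items():
        print(f"{name}: {gain:+.1f}%")
```

Treating time-based results as baseline/tuned keeps the sign convention the same for every test: a positive number means core-avx2 was faster than nocona, a negative number means it was slower.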