Intel Haswell GCC 4.8 core-avx2 Tuning

Testing an Intel Core i7-4770K with different CFLAGS/CXXFLAGS settings to examine the core-avx2 (Haswell) optimizations in GCC 4.8.1. Each run label names the GCC x86 CPU target used for that build. Benchmarks by Michael Larabel of Phoronix for a future article. The run labeled "test" was made on a different system (Intel Core i7-3770K, Ivy Bridge), so its results are not directly comparable to the others.

nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2 (identical system for all six runs):

  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores), Motherboard: Intel DH87RL, Chipset: Intel Haswell DRAM, Memory: 15360MB, Disk: 240GB OCZ VERTEX3, Graphics: Intel Haswell IGP, Audio: Intel Haswell HDMI, Monitor: VA2431, Network: Intel Connection I217-V

  OS: Ubuntu 13.04, Kernel: 3.10.0-999-generic (x86_64), Desktop: Unity 7.0.0, Display Server: X Server 1.13.3, Display Driver: intel 2.21.9, OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c), Compiler: GCC 4.8.1 + LLVM 3.2, File-System: ext4, Screen Resolution: 1920x1080

test:

  Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

  OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600
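Binaries built for the core-avx2 target may use AVX2 and FMA instructions, which Haswell provides but the Ivy Bridge Core i7-3770K used for the "test" run does not. As a minimal sketch, assuming a Linux host that exposes /proc/cpuinfo, a check like the following reports whether a machine has those two flags. The function names and the reduced flag set are hypothetical simplifications, not part of the Phoronix Test Suite.

```python
from pathlib import Path

# Flags this sketch treats as required for code built for core-avx2.
# Assumption: GCC's core-avx2 target enables further extensions (e.g. BMI/BMI2),
# so passing this check is necessary but not sufficient.
REQUIRED_FLAGS = {"avx2", "fma"}


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def can_run_core_avx2(cpuinfo_text: str) -> bool:
    """True if every flag in REQUIRED_FLAGS is reported by the CPU."""
    return REQUIRED_FLAGS <= cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("core-avx2 binaries supported:", can_run_core_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (this sketch assumes Linux)")
```

On a Haswell machine such as the i7-4770K the check should pass; on an Ivy Bridge part it should fail, because neither flag is present.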
SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
nocona ..... 615.33 |==========================================================
core2 ...... 616.21 |==========================================================
corei7 ..... 616.65 |==========================================================
corei7-avx . 616.65 |==========================================================
core-avx-i . 615.76 |==========================================================
core-avx2 .. 596.16 |========================================================
test ....... 553.48 |====================================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
nocona ..... 245.07 |==========================================
core2 ...... 250.93 |===========================================
corei7 ..... 249.11 |===========================================
corei7-avx . 251.86 |===========================================
core-avx-i . 247.35 |==========================================
core-avx2 .. 226.57 |=======================================
test ....... 339.88 |==========================================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
nocona ..... 1825.73 |============================================
core2 ...... 1859.97 |============================================
corei7 ..... 1863.19 |=============================================
corei7-avx . 1851.10 |============================================
core-avx-i . 1824.28 |============================================
core-avx2 .. 1817.03 |===========================================
test ....... 2386.29 |=========================================================

Botan 1.10.3
Test: Tiger
Mbytes/s > Higher Is Better
nocona ..... 438.78 |==========================================================
core2 ...... 438.87 |==========================================================
corei7 ..... 427.31 |========================================================
corei7-avx . 442.47 |==========================================================
core-avx-i . 440.37 |==========================================================
core-avx2 .. 424.56 |========================================================

Botan 1.10.3
Test: AES-256
Mbytes/s > Higher Is Better
nocona ..... 157.97 |==========================================================
core2 ...... 158.35 |==========================================================
corei7 ..... 157.96 |==========================================================
corei7-avx . 158.19 |==========================================================
core-avx-i . 158.31 |==========================================================
core-avx2 .. 158.43 |==========================================================

Botan 1.10.3
Test: CAST-256
Mbytes/s > Higher Is Better
nocona ..... 95.48 |===========================================================
core2 ...... 95.80 |===========================================================
corei7 ..... 95.54 |===========================================================
corei7-avx . 95.77 |===========================================================
core-avx-i . 95.79 |===========================================================
core-avx2 .. 95.76 |===========================================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
nocona ..... 10.16 |========================================================
core2 ...... 10.14 |========================================================
corei7 ..... 10.22 |=========================================================
corei7-avx . 10.62 |===========================================================
core-avx-i . 10.45 |==========================================================
core-avx2 .. 10.55 |===========================================================
test ....... 10.13 |========================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
nocona ..... 1517.03 |===================================================
core2 ...... 1564.22 |=====================================================
corei7 ..... 1560.18 |=====================================================
corei7-avx . 1404.92 |===============================================
core-avx-i . 1630.12 |=======================================================
core-avx2 .. 1282.30 |===========================================
test ....... 1686.65 |=========================================================

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
nocona ..... 76.98 |========================================================
core2 ...... 79.03 |==========================================================
corei7 ..... 79.64 |==========================================================
corei7-avx . 80.91 |===========================================================
core-avx-i . 81.06 |===========================================================
core-avx2 .. 80.66 |===========================================================
test ....... 59.51 |===========================================

Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds < Lower Is Better
nocona ..... 97.89 |===========================================================
core2 ...... 97.63 |===========================================================
corei7 ..... 97.77 |===========================================================
corei7-avx . 98.10 |===========================================================
core-avx-i . 97.85 |===========================================================
core-avx2 .. 97.25 |==========================================================
test ....... 89.94 |======================================================

GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute > Higher Is Better
nocona ..... 115 |===================================================
core2 ...... 117 |====================================================
corei7 ..... 116 |===================================================
corei7-avx . 122 |======================================================
core-avx-i . 122 |======================================================
core-avx2 .. 138 |=============================================================
test ....... 132 |==========================================================

GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute > Higher Is Better
nocona ..... 83 |=====================================
core2 ...... 84 |======================================
corei7 ..... 84 |======================================
corei7-avx . 96 |===========================================
core-avx-i . 96 |===========================================
core-avx2 .. 136 |=============================================================
test ....... 83 |=====================================

GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute > Higher Is Better
nocona ..... 157 |=====================================================
core2 ...... 160 |======================================================
corei7 ..... 160 |======================================================
corei7-avx . 166 |========================================================
core-avx-i . 167 |========================================================
core-avx2 .. 182 |=============================================================
test ....... 161 |======================================================

GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
nocona ..... 118 |===========================================================
core2 ...... 120 |============================================================
corei7 ..... 120 |============================================================
corei7-avx . 119 |===========================================================
core-avx-i . 120 |============================================================
core-avx2 .. 121 |============================================================
test ....... 123 |=============================================================

x264 2013-06-08
H.264 Video Encoding
Frames Per Second > Higher Is Better
nocona ..... 156.80 |=========================================================
core2 ...... 156.74 |=========================================================
corei7 ..... 156.06 |=========================================================
corei7-avx . 155.63 |=========================================================
core-avx-i . 156.08 |=========================================================
core-avx2 .. 155.18 |=========================================================
test ....... 158.19 |==========================================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
nocona ..... 23.07 |=================================================
core2 ...... 22.95 |=================================================
corei7 ..... 22.95 |=================================================
corei7-avx . 22.84 |=================================================
core-avx-i . 22.83 |================================================
core-avx2 .. 17.02 |====================================
test ....... 27.78 |===========================================================

TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
nocona ..... 122.02 |================================================
core2 ...... 121.58 |===============================================
corei7 ..... 123.14 |================================================
corei7-avx . 117.71 |==============================================
core-avx-i . 116.54 |=============================================
core-avx2 .. 119.78 |===============================================
test ....... 148.75 |==========================================================

FFmpeg 1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
nocona ..... 12.94 |==========================================================
core2 ...... 13.16 |===========================================================
corei7 ..... 12.93 |==========================================================
corei7-avx . 12.86 |==========================================================
core-avx-i . 13.00 |==========================================================
core-avx2 .. 13.01 |==========================================================
test ....... 11.86 |=====================================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
nocona ..... 26 |===================
core2 ...... 26 |===================
corei7 ..... 26 |===================
corei7-avx . 26 |===================
core-avx-i . 26 |===================
core-avx2 .. 24 |=================
test ....... 87 |==============================================================

Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second > Higher Is Better
nocona ..... 24888.11 |======================================================
core2 ...... 25606.17 |========================================================
corei7 ..... 25490.14 |========================================================
corei7-avx . 25580.44 |========================================================
core-avx-i . 25549.84 |========================================================
core-avx2 .. 25644.10 |========================================================
test ....... 23897.32 |====================================================
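The core-avx2 results are mixed: GraphicsMagick Sharpen and C-Ray improve sharply over the other targets, while Himeno and the SciMark FFT regress. A small sketch of how the relative difference between two runs can be computed from the numbers above, inverting the ratio for time-based (lower-is-better) tests so that a positive value always means core-avx2 did better. The comparison against core-avx-i, the selection of tests, and the helper name relative_gain are illustrative choices, not part of the original results.

```python
# Percent change of core-avx2 over core-avx-i for a few results copied
# from the tables above. Positive means core-avx2 performed better.


def relative_gain(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Return the candidate's improvement over the baseline, in percent.

    Throughput metrics (higher is better): candidate / baseline - 1.
    Time metrics (lower is better): baseline / candidate - 1, i.e. the speedup.
    """
    ratio = candidate / baseline if higher_is_better else baseline / candidate
    return (ratio - 1.0) * 100.0


# (test name, core-avx-i result, core-avx2 result, higher_is_better)
RESULTS = [
    ("GraphicsMagick Sharpen (iterations/min)", 96, 136, True),
    ("C-Ray total time (s)", 22.83, 17.02, False),
    ("Himeno Poisson solver (MFLOPS)", 1630.12, 1282.30, True),
    ("SciMark 2.0 FFT (Mflops)", 247.35, 226.57, True),
]

if __name__ == "__main__":
    for name, avx_i, avx2, higher in RESULTS:
        print(f"{name:42s} {relative_gain(avx_i, avx2, higher):+6.1f}%")
```

With these inputs the script reports about +41.7% for Sharpen, +34.1% for C-Ray, -21.3% for Himeno and -8.4% for the FFT test. Using a ratio rather than a raw difference keeps results with different units comparable.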