Intel Haswell GCC 4.8 core-avx2 Tuning

Testing Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 Haswell GCC 4.8.1 compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.


nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2 (identical system for all six configurations):

    Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores), Motherboard: Intel DH87RL, Chipset: Intel Haswell DRAM, Memory: 15360MB, Disk: 240GB OCZ VERTEX3, Graphics: Intel Haswell IGP, Audio: Intel Haswell HDMI, Monitor: VA2431, Network: Intel Connection I217-V

    OS: Ubuntu 13.04, Kernel: 3.10.0-999-generic (x86_64), Desktop: Unity 7.0.0, Display Server: X Server 1.13.3, Display Driver: intel 2.21.9, OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c), Compiler: GCC 4.8.1 + LLVM 3.2, File-System: ext4, Screen Resolution: 1920x1080

test:

    Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores), Motherboard: ASRock Z77 Pro4-M, Memory: 16384MB, Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080, Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz), Monitor: LCD3090WQXi

    OS: Gentoo Base 2.2, Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64), Desktop: KDE, Display Server: X Server 1.14.2.902 (1.14.3 RC 2), Display Driver: radeon 7.2.99, OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4, Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn, File-System: ext4, Screen Resolution: 2560x1600

3820 @ 4.5:

    Processor: Intel Core i7-3820 @ 4.20GHz (8 Cores), Motherboard: Gigabyte X79-UD3, Chipset: Intel Xeon E5/Core, Memory: 16384MB, Disk: 250GB Samsung SSD 840 + 80GB TOSHIBA MK8052GS + 640GB Western Digital WD6401AALS-0, Graphics: eVGA NVIDIA GeForce GTX 650 Ti 2048MB (928/2700MHz), Audio: Realtek ALC898, Network: Intel 82579V Gigabit Connection

    OS: Linux, Kernel: 3.10.10-1-ARCH (x86_64), Desktop: Cinnamon 1.8.8, Display Server: X Server 1.14.2, Display Driver: NVIDIA 325.15, OpenGL: 4.3.0 NVIDIA 325.15, Compiler: GCC 4.8.1 20130725, File-System: btrfs


Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
nocona ..... 10.16
core2 ...... 10.14
corei7 ..... 10.22
corei7-avx . 10.62
core-avx-i . 10.45
core-avx2 .. 10.55
test ....... 10.13
3820 @ 4.5 . 9.95


Botan 1.10.3
Test: Tiger
Mbytes/s > Higher Is Better
nocona ..... 438.78
core2 ...... 438.87
corei7 ..... 427.31
corei7-avx . 442.47
core-avx-i . 440.37
core-avx2 .. 424.56


Botan 1.10.3
Test: AES-256
Mbytes/s > Higher Is Better
nocona ..... 157.97
core2 ...... 158.35
corei7 ..... 157.96
corei7-avx . 158.19
core-avx-i . 158.31
core-avx2 .. 158.43


Botan 1.10.3
Test: CAST-256
Mbytes/s > Higher Is Better
nocona ..... 95.48
core2 ...... 95.80
corei7 ..... 95.54
corei7-avx . 95.77
core-avx-i . 95.79
core-avx2 .. 95.76


SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
nocona ..... 615.33
core2 ...... 616.21
corei7 ..... 616.65
corei7-avx . 616.65
core-avx-i . 615.76
core-avx2 .. 596.16
test ....... 553.48
3820 @ 4.5 . 549.20


SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
nocona ..... 245.07
core2 ...... 250.93
corei7 ..... 249.11
corei7-avx . 251.86
core-avx-i . 247.35
core-avx2 .. 226.57
test ....... 339.88
3820 @ 4.5 . 223.45


SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
nocona ..... 1825.73
core2 ...... 1859.97
corei7 ..... 1863.19
corei7-avx . 1851.10
core-avx-i . 1824.28
core-avx2 .. 1817.03
test ....... 2386.29
3820 @ 4.5 . 2542.25


TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
nocona ..... 122.02
core2 ...... 121.58
corei7 ..... 123.14
corei7-avx . 117.71
core-avx-i . 116.54
core-avx2 .. 119.78
test ....... 148.75
3820 @ 4.5 . 130.61


x264 2013-06-08
H.264 Video Encoding
Frames Per Second > Higher Is Better
nocona ..... 156.80
core2 ...... 156.74
corei7 ..... 156.06
corei7-avx . 155.63
core-avx-i . 156.08
core-avx2 .. 155.18
test ....... 158.19
3820 @ 4.5 . 161.39


GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute > Higher Is Better
nocona ..... 115
core2 ...... 117
corei7 ..... 116
corei7-avx . 122
core-avx-i . 122
core-avx2 .. 138
test ....... 132
3820 @ 4.5 . 144


GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute > Higher Is Better
nocona ..... 83
core2 ...... 84
corei7 ..... 84
corei7-avx . 96
core-avx-i . 96
core-avx2 .. 136
test ....... 83
3820 @ 4.5 . 92


GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute > Higher Is Better
nocona ..... 157
core2 ...... 160
corei7 ..... 160
corei7-avx . 166
core-avx-i . 167
core-avx2 .. 182
test ....... 161
3820 @ 4.5 . 177


GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
nocona ..... 118
core2 ...... 120
corei7 ..... 120
corei7-avx . 119
core-avx-i . 120
core-avx2 .. 121
test ....... 123
3820 @ 4.5 . 138


Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
nocona ..... 1517.03
core2 ...... 1564.22
corei7 ..... 1560.18
corei7-avx . 1404.92
core-avx-i . 1630.12
core-avx2 .. 1282.30
test ....... 1686.65
3820 @ 4.5 . 1735.83


Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds < Lower Is Better
nocona ..... 76.98
core2 ...... 79.03
corei7 ..... 79.64
corei7-avx . 80.91
core-avx-i . 81.06
core-avx2 .. 80.66
test ....... 59.51
3820 @ 4.5 . 55.02


Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds < Lower Is Better
nocona ..... 97.89
core2 ...... 97.63
corei7 ..... 97.77
corei7-avx . 98.10
core-avx-i . 97.85
core-avx2 .. 97.25
test ....... 89.94
3820 @ 4.5 . 80.48


C-Ray 1.1
Total Time
Seconds < Lower Is Better
nocona ..... 23.07
core2 ...... 22.95
corei7 ..... 22.95
corei7-avx . 22.84
core-avx-i . 22.83
core-avx2 .. 17.02
test ....... 27.78
3820 @ 4.5 . 26.59


Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
nocona ..... 26
core2 ...... 26
corei7 ..... 26
corei7-avx . 26
core-avx-i . 26
core-avx2 .. 24
test ....... 87
3820 @ 4.5 . 90


FFmpeg 1.1
H.264 HD To NTSC DV
Seconds < Lower Is Better
nocona ..... 12.94
core2 ...... 13.16
corei7 ..... 12.93
corei7-avx . 12.86
core-avx-i . 13.00
core-avx2 .. 13.01
test ....... 11.86


Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second > Higher Is Better
nocona ..... 24888.11
core2 ...... 25606.17
corei7 ..... 25490.14
corei7-avx . 25580.44
core-avx-i . 25549.84
core-avx2 .. 25644.10
test ....... 23897.32
3820 @ 4.5 . 29308.24
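

Because the results mix "Higher Is Better" and "Lower Is Better" units, comparing core-avx2 against the nocona baseline takes a small amount of arithmetic. The following is a minimal sketch, not part of the original result file: the names RESULTS, relative_gain and summarize are hypothetical, and the choice of nocona as baseline is an assumption made for illustration. The figures are copied from a handful of the tables above, and for timed tests the gain is taken as the inverse ratio of the run times.

```python
# Selected results copied from the tables above.
# Each entry: (nocona value, core-avx2 value, higher_is_better).
# The selection and the nocona baseline are illustrative assumptions.
RESULTS = {
    "C-Ray 1.1 (seconds)": (23.07, 17.02, False),
    "GraphicsMagick Sharpen (iter/min)": (83, 136, True),
    "GraphicsMagick Blur (iter/min)": (115, 138, True),
    "Himeno 3.0 (MFLOPS)": (1517.03, 1282.30, True),
    "SciMark FFT (Mflops)": (245.07, 226.57, True),
}


def relative_gain(baseline, candidate, higher_is_better):
    """Percent change of candidate versus baseline; positive means faster."""
    if higher_is_better:
        # Throughput-style metric: more is better.
        return (candidate / baseline - 1.0) * 100.0
    # Time-style metric: less is better, so invert the ratio.
    return (baseline / candidate - 1.0) * 100.0


def summarize(results):
    """Map each test name to its gain, rounded to one decimal place."""
    return {
        name: round(relative_gain(base, cand, hib), 1)
        for name, (base, cand, hib) in results.items()
    }


if __name__ == "__main__":
    for name, gain in summarize(RESULTS).items():
        print(f"{name:36s} {gain:+6.1f}%")
```

Read this way, the sign of each figure indicates whether core-avx2 helped or hurt for that test on this system; it says nothing about run-to-run variance, which the result file does not report.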