Intel Haswell GCC 4.8 core-avx2 Tuning
Testing Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 Haswell GCC 4.8.1 compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.
HTML result view exported from: https://openbenchmarking.org/result/1309165-SO-1309136DA64&sro&grs .
nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2:
  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard: Intel DH87RL
  Chipset: Intel Haswell DRAM
  Memory: 15360MB
  Disk: 240GB OCZ VERTEX3
  Graphics: Intel Haswell IGP
  Audio: Intel Haswell HDMI
  Monitor: VA2431
  Network: Intel Connection I217-V
  OS: Ubuntu 13.04
  Kernel: 3.10.0-999-generic (x86_64)
  Desktop: Unity 7.0.0
  Display Server: X Server 1.13.3
  Display Driver: intel 2.21.9
  OpenGL: 3.0 Mesa 9.2.0-devel (git-a2e3b1c)
  Compiler: GCC 4.8.1 + LLVM 3.2
  File-System: ext4
  Screen Resolution: 1920x1080

test (only the fields listed in the export):
  Processor: Intel Core i7-3770K @ 3.90GHz (8 Cores)
  Motherboard: ASRock Z77 Pro4-M
  Memory: 16384MB
  Disk: 256GB OCZ VECTOR + 2 x 1000GB SAMSUNG HD103UJ + 80GB INTEL SSDSA2M080
  Graphics: Gallium 0.4 on AMD TAHITI 3072MB (810/1250MHz)
  Monitor: LCD3090WQXi
  OS: Gentoo Base 2.2
  Kernel: 3.11.0-drmfixes20130912-core-avx-i (x86_64)
  Desktop: KDE
  Display Server: X Server 1.14.2.902 (1.14.3 RC 2)
  Display Driver: radeon 7.2.99
  OpenGL: 3.0 Mesa 9.3.0-devel (git-f4e35f8) Gallium 0.4
  Compiler: GCC 4.8.1 + Clang 3.4 + LLVM 3.4svn
  Screen Resolution: 2560x1600

3820 @ 4.5 (only the fields listed in the export):
  Processor: Intel Core i7-3820 @ 4.20GHz (8 Cores)
  Motherboard: Gigabyte X79-UD3
  Chipset: Intel Xeon E5/Core
  Disk: 250GB Samsung SSD 840 + 80GB TOSHIBA MK8052GS + 640GB Western Digital WD6401AALS-0
  Graphics: eVGA NVIDIA GeForce GTX 650 Ti 2048MB (928/2700MHz)
  Audio: Realtek ALC898
  Network: Intel 82579V Gigabit Connection
  OS: Linux
  Kernel: 3.10.10-1-ARCH (x86_64)
  Desktop: Cinnamon 1.8.8
  Display Server: X Server 1.14.2
  Display Driver: NVIDIA 325.15
  OpenGL: 4.3.0 NVIDIA 325.15
  Compiler: GCC 4.8.1 20130725
  File-System: btrfs

Compiler Details
  - nocona: --enable-checking=release --enable-languages=c,c++,fortran
  - core2: --enable-checking=release --enable-languages=c,c++,fortran
  - corei7: --enable-checking=release --enable-languages=c,c++,fortran
  - corei7-avx: --enable-checking=release --enable-languages=c,c++,fortran
  - core-avx-i: --enable-checking=release --enable-languages=c,c++,fortran
  - core-avx2: --enable-checking=release --enable-languages=c,c++,fortran
  - test: --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.1 --build=x86_64-pc-linux-gnu --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1 --disable-altivec --disable-fixed-point --disable-isl-version-check --disable-libgcj --disable-libssp --disable-lto --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-languages=c,c++,fortran --enable-libgomp --enable-libmudflap --enable-libstdcxx-time --enable-multilib --enable-nls --enable-obsolete --enable-secureplt --enable-shared --enable-targets=all --enable-threads=posix --host=x86_64-pc-linux-gnu --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.1/include --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/man --with-cloog --with-multilib-list=m32,m64 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.1/python
  - 3820 @ 4.5: --disable-cloog-version-check --disable-install-libiberty --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror --enable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-cloog-backend=isl --enable-gnu-unique-object --enable-gold --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-ld=default --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-linker-hash-style=gnu --with-plugin-ld=ld.gold

Processor Details
  - nocona: Scaling Governor: acpi-cpufreq ondemand
  - core2: Scaling Governor: acpi-cpufreq ondemand
  - corei7: Scaling Governor: acpi-cpufreq ondemand
  - corei7-avx: Scaling Governor: acpi-cpufreq ondemand
  - core-avx-i: Scaling Governor: acpi-cpufreq ondemand
  - core-avx2: Scaling Governor: acpi-cpufreq ondemand
  - test: Scaling Governor: intel_pstate powersave
  - 3820 @ 4.5: Scaling Governor: intel_pstate powersave

Results overview (a "-" marks a test with no result listed for that configuration):

nocona      core2       corei7      corei7-avx  core-avx-i  core-avx2   test        3820 @ 4.5  Test
26          26          26          26          26          24          87          90          smallpt: Global Illumination Renderer; 100 Samples
83          84          84          96          96          136         83          92          graphics-magick: Sharpen
23.07       22.95       22.95       22.84       22.83       17.02       27.78       26.59       c-ray: Total Time
245.07      250.93      249.11      251.86      247.35      226.57      339.88      223.45      scimark2: Fast Fourier Transform
76.98       79.03       79.64       80.91       81.06       80.66       59.51       55.02       build-imagemagick: Time To Compile
1825.73     1859.97     1863.19     1851.10     1824.28     1817.03     2386.29     2542.25     scimark2: Dense LU Matrix Factorization
1517.03     1564.22     1560.18     1404.92     1630.12     1282.30     1686.65     1735.83     himeno: Poisson Pressure Solver
122.02      121.58      123.14      117.71      116.54      119.78      148.75      130.61      ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping
115         117         116         122         122         138         132         144         graphics-magick: Blur
24888.11    25606.17    25490.14    25580.44    25549.84    25644.10    23897.32    29308.24    apache: Static Web Page Serving
97.89       97.63       97.77       98.10       97.85       97.25       89.94       80.48       build-linux-kernel: Time To Compile
118         120         120         119         120         121         123         138         graphics-magick: Local Adaptive Thresholding
157         160         160         166         167         182         161         177         graphics-magick: Resizing
615.33      616.21      616.65      616.65      615.76      596.16      553.48      549.20      scimark2: Monte Carlo
12.94       13.16       12.93       12.86       13.00       13.01       11.86       -           ffmpeg: H.264 HD To NTSC DV
10.16       10.14       10.22       10.62       10.45       10.55       10.13       9.95        hmmer: Pfam Database Search
438.78      438.87      427.31      442.47      440.37      424.56      -           -           botan: Tiger
156.80      156.74      156.06      155.63      156.08      155.18      158.19      161.39      x264: H.264 Video Encoding
95.48       95.80       95.54       95.77       95.79       95.76       -           -           botan: CAST-256
157.97      158.35      157.96      158.19      158.31      158.43      -           -           botan: AES-256

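As a rough aid for reading the overview figures, the following Python sketch computes how far core-avx2 is ahead of or behind nocona on a few of the tests. The selection of tests, the function name `improvement_pct`, and the `higher_is_better` flags (taken from each test's "More Is Better" / "Fewer Is Better" label) are illustrative choices and not part of the exported result. The numbers are copied from the overview.

```python
# Illustrative sketch: relative difference between two -march settings.
# Values are copied from the results overview (i7-4770K, nocona vs core-avx2).
# The structure and names below are assumptions made for this example.

RESULTS = {
    # test name: (nocona, core-avx2, higher_is_better)
    "c-ray: Total Time (Seconds)": (23.07, 17.02, False),
    "graphics-magick: Sharpen (Iterations Per Minute)": (83, 136, True),
    "himeno: Poisson Pressure Solver (MFLOPS)": (1517.03, 1282.30, True),
    "scimark2: Fast Fourier Transform (Mflops)": (245.07, 226.57, True),
}


def improvement_pct(baseline, candidate, higher_is_better):
    """Percent by which candidate beats baseline; negative means it is slower."""
    if higher_is_better:
        ratio = candidate / baseline
    else:
        # For time-based results a smaller number is the better one,
        # so the ratio is inverted to keep "positive = faster".
        ratio = baseline / candidate
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    for name, (nocona, core_avx2, higher_is_better) in RESULTS.items():
        delta = improvement_pct(nocona, core_avx2, higher_is_better)
        print(f"{name}: core-avx2 vs nocona {delta:+.1f}%")
```

Keeping the sign convention as "positive means the candidate is faster" lets time-based and throughput-based tests be read side by side. Differences of a similar size to the reported SE values are better treated as noise.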
Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     90        0.00    3
core-avx-i     26        0.33    3  -O3 -march=core-avx-i
core-avx2      24        0.33    3  -O3 -march=core-avx2
core2          26        0.00    3  -O3 -march=core2
corei7         26        0.33    3  -O3 -march=corei7
corei7-avx     26        0.33    3  -O3 -march=corei7-avx
nocona         26        0.00    3  -O3 -march=nocona
test           87        0.00    3
1. (CXX) g++ options: -fopenmp

GraphicsMagick 1.3.16
Operation: Sharpen
Iterations Per Minute, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     92        0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp
core-avx-i     96        0.00    3  -O3 -march=core-avx-i -ljbig
core-avx2      136       0.00    3  -O3 -march=core-avx2 -ljbig
core2          84        0.00    3  -O3 -march=core2 -ljbig
corei7         84        0.00    3  -O3 -march=corei7 -ljbig
corei7-avx     96        0.00    3  -O3 -march=corei7-avx -ljbig
nocona         83        0.00    3  -O3 -march=nocona -ljbig
test           83        0.00    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

C-Ray 1.1
Total Time
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     26.59     0.02    3
core-avx-i     22.83     0.00    3  -march=core-avx-i
core-avx2      17.02     0.00    3  -march=core-avx2
core2          22.95     0.01    3  -march=core2
corei7         22.95     0.01    3  -march=corei7
corei7-avx     22.84     0.01    3  -march=corei7-avx
nocona         23.07     0.02    3  -march=nocona
test           27.78     0.00    3
1. (CC) gcc options: -lm -lpthread -O3

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops, More Is Better
Configuration  Result    SE +/-  N
3820 @ 4.5     223.45    2.69    4
core-avx-i     247.35    2.13    4
core-avx2      226.57    2.02    4
core2          250.93    0.67    4
corei7         249.11    0.86    4
corei7-avx     251.86    1.22    4
nocona         245.07    2.50    4
test           339.88    0.34    4

Timed ImageMagick Compilation 6.8.1-10
Time To Compile
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N
3820 @ 4.5     55.02     0.08    3
core-avx-i     81.06     0.31    3
core-avx2      80.66     0.32    3
core2          79.03     0.06    3
corei7         79.64     0.18    3
corei7-avx     80.91     0.10    3
nocona         76.98     0.04    3
test           59.51     0.08    3

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
Configuration  Result    SE +/-  N
3820 @ 4.5     2542.25   38.45   5
core-avx-i     1824.28   23.35   4
core-avx2      1817.03   28.95   4
core2          1859.97   5.53    4
corei7         1863.19   3.12    4
corei7-avx     1851.10   22.67   4
nocona         1825.73   21.90   4
test           2386.29   3.08    4

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     1735.83   18.21   3
core-avx-i     1630.12   1.05    3  -march=core-avx-i
core-avx2      1282.30   19.87   6  -march=core-avx2
core2          1564.22   3.07    3  -march=core2
corei7         1560.18   0.75    3  -march=corei7
corei7-avx     1404.92   105.46  6  -march=corei7-avx
nocona         1517.03   1.20    3  -march=nocona
test           1686.65   0.98    3
1. (CC) gcc options: -O3

TTSIOD 3D Renderer 2.2z
Phong Rendering With Soft-Shadow Mapping
FPS, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     130.61    0.03    3  -flto -lpthread
core-avx-i     116.54    0.36    3  -march=core-avx-i -flto
core-avx2      119.78    0.09    3  -march=core-avx2 -flto
core2          121.58    0.76    3  -march=core2 -flto
corei7         123.14    0.45    3  -march=corei7 -flto
corei7-avx     117.71    0.66    3  -march=corei7-avx -flto
nocona         122.02    0.39    3  -march=nocona -flto
test           148.75    0.27    3  -lpthread
1. (CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++

GraphicsMagick 1.3.16
Operation: Blur
Iterations Per Minute, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     144       0.33    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp
core-avx-i     122       0.67    3  -O3 -march=core-avx-i -ljbig
core-avx2      138       0.88    3  -O3 -march=core-avx2 -ljbig
core2          117       0.00    3  -O3 -march=core2 -ljbig
corei7         116       1.00    3  -O3 -march=corei7 -ljbig
corei7-avx     122       1.00    3  -O3 -march=corei7-avx -ljbig
nocona         115       0.00    3  -O3 -march=nocona -ljbig
test           132       0.00    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

Apache Benchmark 2.4.3
Static Web Page Serving
Requests Per Second, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     29308.24  162.25  3  -O2
core-avx-i     25549.84  126.25  3  -O3 -march=core-avx-i
core-avx2      25644.10  170.85  3  -O3 -march=core-avx2
core2          25606.17  229.80  3  -O3 -march=core2
corei7         25490.14  193.34  3  -O3 -march=corei7
corei7-avx     25580.44  178.37  3  -O3 -march=corei7-avx
nocona         24888.11  107.43  3  -O3 -march=nocona
test           23897.32  94.17   3  -O2
1. (CC) gcc options: -shared -fPIC -pthread

Timed Linux Kernel Compilation 3.1
Time To Compile
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N
3820 @ 4.5     80.48     0.50    3
core-avx-i     97.85     0.76    3
core-avx2      97.25     0.60    3
core2          97.63     0.54    3
corei7         97.77     0.69    3
corei7-avx     98.10     0.54    3
nocona         97.89     0.59    3
test           89.94     0.79    3

GraphicsMagick 1.3.16
Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     138       0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp
core-avx-i     120       0.33    3  -O3 -march=core-avx-i -ljbig
core-avx2      121       0.00    3  -O3 -march=core-avx2 -ljbig
core2          120       0.33    3  -O3 -march=core2 -ljbig
corei7         120       0.00    3  -O3 -march=corei7 -ljbig
corei7-avx     119       0.00    3  -O3 -march=corei7-avx -ljbig
nocona         118       0.33    3  -O3 -march=nocona -ljbig
test           123       0.33    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick 1.3.16
Operation: Resizing
Iterations Per Minute, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     177       0.00    3  -O2 -llcms2 -ltiff -lfreetype -ljasper -lxml2 -lgomp
core-avx-i     167       0.00    3  -O3 -march=core-avx-i -ljbig
core-avx2      182       0.33    3  -O3 -march=core-avx2 -ljbig
core2          160       0.00    3  -O3 -march=core2 -ljbig
corei7         160       0.00    3  -O3 -march=corei7 -ljbig
corei7-avx     166       0.33    3  -O3 -march=corei7-avx -ljbig
nocona         157       0.33    3  -O3 -march=nocona -ljbig
test           161       0.33    3  -O2 -llcms2 -ltiff -lfreetype -lxml2 -lrt
1. (CC) gcc options: -std=gnu99 -fopenmp -pthread -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark 2.0
Computational Test: Monte Carlo
Mflops, More Is Better
Configuration  Result    SE +/-  N
3820 @ 4.5     549.20    9.67    8
core-avx-i     615.76    0.44    4
core-avx2      596.16    20.17   8
core2          616.21    0.51    4
corei7         616.65    0.44    4
corei7-avx     616.65    0.44    4
nocona         615.33    0.72    4
test           553.48    0.00    4

FFmpeg 1.1
H.264 HD To NTSC DV
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N  Flags
core-avx-i     13.00     0.05    3  -march=core-avx-i
core-avx2      13.01     0.12    3  -march=core-avx2
core2          13.16     0.09    3  -march=core2
corei7         12.93     0.01    3  -march=corei7
corei7-avx     12.86     0.06    3  -march=corei7-avx
nocona         12.94     0.07    3  -march=nocona
test           11.86     0.06    3  -lva -lpthread -lrt
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds, Fewer Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     9.95      0.17    4  -O2
core-avx-i     10.45     0.04    3  -O3 -march=core-avx-i
core-avx2      10.55     0.17    3  -O3 -march=core-avx2
core2          10.14     0.01    3  -O3 -march=core2
corei7         10.22     0.06    3  -O3 -march=corei7
corei7-avx     10.62     0.02    3  -O3 -march=corei7-avx
nocona         10.16     0.06    3  -O3 -march=nocona
test           10.13     0.03    3  -O2
1. (CC) gcc options: -pthread -lhmmer -lsquid -lm

Botan 1.10.3
Test: Tiger
Mbytes/s, More Is Better
Configuration  Result
core-avx-i     440.37
core-avx2      424.56
core2          438.87
corei7         427.31
corei7-avx     442.47
nocona         438.78
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

x264 2013-06-08
H.264 Video Encoding
Frames Per Second, More Is Better
Configuration  Result    SE +/-  N  Flags
3820 @ 4.5     161.39    1.34    5  -lavformat -lavcodec -lavutil -lswscale
core-avx-i     156.08    0.50    5  -march=core-avx-i
core-avx2      155.18    0.90    5  -march=core-avx2
core2          156.74    0.55    5  -march=core2
corei7         156.06    0.51    5  -march=corei7
corei7-avx     155.63    0.20    5  -march=corei7-avx
nocona         156.80    0.30    5  -march=nocona
test           158.19    0.52    5  -lavformat -lavcodec -lavutil -lswscale
1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Botan 1.10.3
Test: CAST-256
Mbytes/s, More Is Better
Configuration  Result
core-avx-i     95.79
core-avx2      95.76
core2          95.80
corei7         95.54
corei7-avx     95.77
nocona         95.48
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan 1.10.3
Test: AES-256
Mbytes/s, More Is Better
Configuration  Result
core-avx-i     158.31
core-avx2      158.43
core2          158.35
corei7         157.96
corei7-avx     158.19
nocona         157.97
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Phoronix Test Suite v10.8.4