Intel Haswell GCC 4.8 core-avx2 Tuning

Testing an Intel Core i7-4770K with different CFLAGS/CXXFLAGS to look at the core-avx2 (Haswell) compiler optimizations in GCC 4.8.1. Benchmarks by Michael Larabel of Phoronix for a future article.
HTML result view exported from: https://openbenchmarking.org/result/1306150-PTS-INTELHAS05&grs&sor
System Details

Result identifiers: nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2

Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
Motherboard:        Intel DH87RL
Chipset:            Intel Haswell DRAM
Memory:             15360MB
Disk:               240GB OCZ VERTEX3
Graphics:           Intel Haswell IGP
Audio:              Intel Haswell HDMI
Monitor:            VA2431
Network:            Intel Connection I217-V
OS:                 Ubuntu 13.04
Kernel:             3.10.0-999-generic (x86_64)
Desktop:            Unity 7.0.0
Display Server:     X Server 1.13.3
Display Driver:     intel 2.21.9
OpenGL:             3.0 Mesa 9.2.0-devel (git-a2e3b1c)
Compiler:           GCC 4.8.1 + LLVM 3.2
File-System:        ext4
Screen Resolution:  1920x1080

Compiler Details: --enable-checking=release --enable-languages=c,c++,fortran
Processor Details: Scaling Governor: acpi-cpufreq ondemand
Results Overview

Test                                                          nocona     core2    corei7  corei7-avx  core-avx-i  core-avx2
graphics-magick: Sharpen                                          83        84        84          96          96        136
c-ray: Total Time                                              23.07     22.95     22.95       22.84       22.83      17.02
himeno: Poisson Pressure Solver                              1517.03   1564.22   1560.18     1404.92     1630.12    1282.30
graphics-magick: Blur                                            115       117       116         122         122        138
graphics-magick: Resizing                                        157       160       160         166         167        182
scimark2: Fast Fourier Transform                              245.07    250.93    249.11      251.86      247.35     226.57
smallpt: Global Illumination Renderer; 100 Samples                26        26        26          26          26         24
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping     122.02    121.58    123.14      117.71      116.54     119.78
build-imagemagick: Time To Compile                             76.98     79.03     79.64       80.91       81.06      80.66
hmmer: Pfam Database Search                                    10.16     10.14     10.22       10.62       10.45      10.55
botan: Tiger                                                  438.78    438.87    427.31      442.47      440.37     424.56
scimark2: Monte Carlo                                         615.33    616.21    616.65      616.65      615.76     596.16
apache: Static Web Page Serving                             24888.11  25606.17  25490.14    25580.44    25549.84   25644.10
graphics-magick: Local Adaptive Thresholding                     118       120       120         119         120        121
scimark2: Dense LU Matrix Factorization                      1825.73   1859.97   1863.19     1851.10     1824.28    1817.03
ffmpeg: H.264 HD To NTSC DV                                    12.94     13.16     12.93       12.86       13.00      13.01
x264: H.264 Video Encoding                                    156.80    156.74    156.06      155.63      156.08     155.18
build-linux-kernel: Time To Compile                            97.89     97.63     97.77       98.10       97.85      97.25
botan: CAST-256                                                95.48     95.80     95.54       95.77       95.79      95.76
botan: AES-256                                                157.97    158.35    157.96      158.19      158.31     158.43
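To read the overview as relative gains rather than raw scores, the numbers can be turned into percent changes against a baseline. The following is a minimal sketch, not part of the exported result: the helper name `percent_change` and the choice of nocona as the baseline are assumptions made for illustration, and only four rows are copied from the results above. For "Fewer Is Better" tests the ratio is inverted so that a positive value always means core-avx2 did better.

```python
def percent_change(baseline, candidate, higher_is_better=True):
    """Improvement of candidate over baseline in percent (positive = better).

    Hypothetical helper for illustration; for lower-is-better metrics the
    ratio is inverted so the result reads as a speedup.
    """
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    return (baseline / candidate - 1.0) * 100.0


# (test, nocona result, core-avx2 result, higher_is_better), copied from the overview
results = [
    ("graphics-magick: Sharpen", 83, 136, True),
    ("c-ray: Total Time", 23.07, 17.02, False),
    ("himeno: Poisson Pressure Solver", 1517.03, 1282.30, True),
    ("scimark2: Fast Fourier Transform", 245.07, 226.57, True),
]

changes = {name: percent_change(base, cand, hib) for name, base, cand, hib in results}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% with -march=core-avx2 vs -march=nocona")
```

Other baselines (for example core2 or corei7-avx) can be substituted by copying the corresponding column instead.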
GraphicsMagick 1.3.16 - Operation: Sharpen
Iterations Per Minute, More Is Better
  -march=core-avx2     136   SE +/- 0.00, N = 3
  -march=core-avx-i     96   SE +/- 0.00, N = 3
  -march=corei7-avx     96   SE +/- 0.00, N = 3
  -march=corei7         84   SE +/- 0.00, N = 3
  -march=core2          84   SE +/- 0.00, N = 3
  -march=nocona         83   SE +/- 0.00, N = 3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

C-Ray 1.1 - Total Time
Seconds, Fewer Is Better
  -march=core-avx2     17.02   SE +/- 0.00, N = 3
  -march=core-avx-i    22.83   SE +/- 0.00, N = 3
  -march=corei7-avx    22.84   SE +/- 0.01, N = 3
  -march=core2         22.95   SE +/- 0.01, N = 3
  -march=corei7        22.95   SE +/- 0.01, N = 3
  -march=nocona        23.07   SE +/- 0.02, N = 3
(CC) gcc options: -lm -lpthread -O3

Himeno Benchmark 3.0 - Poisson Pressure Solver
MFLOPS, More Is Better
  -march=core-avx-i    1630.12   SE +/- 1.05, N = 3
  -march=core2         1564.22   SE +/- 3.07, N = 3
  -march=corei7        1560.18   SE +/- 0.75, N = 3
  -march=nocona        1517.03   SE +/- 1.20, N = 3
  -march=corei7-avx    1404.92   SE +/- 105.46, N = 6
  -march=core-avx2     1282.30   SE +/- 19.87, N = 6
(CC) gcc options: -O3

GraphicsMagick 1.3.16 - Operation: Blur
Iterations Per Minute, More Is Better
  -march=core-avx2     138   SE +/- 0.88, N = 3
  -march=core-avx-i    122   SE +/- 0.67, N = 3
  -march=corei7-avx    122   SE +/- 1.00, N = 3
  -march=core2         117   SE +/- 0.00, N = 3
  -march=corei7        116   SE +/- 1.00, N = 3
  -march=nocona        115   SE +/- 0.00, N = 3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick 1.3.16 - Operation: Resizing
Iterations Per Minute, More Is Better
  -march=core-avx2     182   SE +/- 0.33, N = 3
  -march=core-avx-i    167   SE +/- 0.00, N = 3
  -march=corei7-avx    166   SE +/- 0.33, N = 3
  -march=corei7        160   SE +/- 0.00, N = 3
  -march=core2         160   SE +/- 0.00, N = 3
  -march=nocona        157   SE +/- 0.33, N = 3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark 2.0 - Computational Test: Fast Fourier Transform
Mflops, More Is Better
  corei7-avx    251.86   SE +/- 1.22, N = 4
  core2         250.93   SE +/- 0.67, N = 4
  corei7        249.11   SE +/- 0.86, N = 4
  core-avx-i    247.35   SE +/- 2.13, N = 4
  nocona        245.07   SE +/- 2.50, N = 4
  core-avx2     226.57   SE +/- 2.02, N = 4

Smallpt 1.0 - Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  -march=core-avx2     24   SE +/- 0.33, N = 3
  -march=nocona        26   SE +/- 0.00, N = 3
  -march=core2         26   SE +/- 0.00, N = 3
  -march=corei7        26   SE +/- 0.33, N = 3
  -march=corei7-avx    26   SE +/- 0.33, N = 3
  -march=core-avx-i    26   SE +/- 0.33, N = 3
(CXX) g++ options: -fopenmp -O3

TTSIOD 3D Renderer 2.2z - Phong Rendering With Soft-Shadow Mapping
FPS, More Is Better
  -march=corei7        123.14   SE +/- 0.45, N = 3
  -march=nocona        122.02   SE +/- 0.39, N = 3
  -march=core2         121.58   SE +/- 0.76, N = 3
  -march=core-avx2     119.78   SE +/- 0.09, N = 3
  -march=corei7-avx    117.71   SE +/- 0.66, N = 3
  -march=core-avx-i    116.54   SE +/- 0.36, N = 3
(CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++

Timed ImageMagick Compilation 6.8.1-10 - Time To Compile
Seconds, Fewer Is Better
  nocona        76.98   SE +/- 0.04, N = 3
  core2         79.03   SE +/- 0.06, N = 3
  corei7        79.64   SE +/- 0.18, N = 3
  core-avx2     80.66   SE +/- 0.32, N = 3
  corei7-avx    80.91   SE +/- 0.10, N = 3
  core-avx-i    81.06   SE +/- 0.31, N = 3

Timed HMMer Search 2.3.2 - Pfam Database Search
Seconds, Fewer Is Better
  -march=core2         10.14   SE +/- 0.01, N = 3
  -march=nocona        10.16   SE +/- 0.06, N = 3
  -march=corei7        10.22   SE +/- 0.06, N = 3
  -march=core-avx-i    10.45   SE +/- 0.04, N = 3
  -march=core-avx2     10.55   SE +/- 0.17, N = 3
  -march=corei7-avx    10.62   SE +/- 0.02, N = 3
(CC) gcc options: -O3 -pthread -lhmmer -lsquid -lm

Botan 1.10.3 - Test: Tiger
Mbytes/s, More Is Better
  corei7-avx    442.47
  core-avx-i    440.37
  core2         438.87
  nocona        438.78
  corei7        427.31
  core-avx2     424.56
(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

SciMark 2.0 - Computational Test: Monte Carlo
Mflops, More Is Better
  corei7-avx    616.65   SE +/- 0.44, N = 4
  corei7        616.65   SE +/- 0.44, N = 4
  core2         616.21   SE +/- 0.51, N = 4
  core-avx-i    615.76   SE +/- 0.44, N = 4
  nocona        615.33   SE +/- 0.72, N = 4
  core-avx2     596.16   SE +/- 20.17, N = 8

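The core-avx2 Monte Carlo result carries a much larger SE (20.17, N = 8) than the other runs, so it is worth checking whether its lower score is distinguishable from run-to-run noise. The sketch below assumes the reported "SE" is the standard error of the mean and that the runs are independent; the function name and the "about two combined standard errors" rule of thumb are illustrative assumptions, not something stated in the result file.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Distance between two means, in units of their combined standard error.

    Assumes both SE values are standard errors of the mean from independent runs.
    """
    combined_se = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    return abs(mean_a - mean_b) / combined_se


# SciMark Monte Carlo: corei7-avx vs core-avx2, values copied from the result above
gap = gap_in_standard_errors(616.65, 0.44, 596.16, 20.17)

# Sample standard deviation implied by SE and N for the core-avx2 run (SE = s / sqrt(N))
implied_stddev = 20.17 * math.sqrt(8)

print(f"gap: {gap:.2f} combined standard errors")
print(f"implied standard deviation of core-avx2 runs: {implied_stddev:.1f} Mflops")

# Rule of thumb (assumption): gaps below roughly 2 are hard to separate from noise
print("likely noise" if gap < 2.0 else "likely a real difference")
```

Under these assumptions the gap comes out at roughly one combined standard error, which suggests the Monte Carlo deficit for core-avx2 should be read with caution.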
Apache Benchmark 2.4.3 - Static Web Page Serving
Requests Per Second, More Is Better
  -march=core-avx2     25644.10   SE +/- 170.85, N = 3
  -march=core2         25606.17   SE +/- 229.80, N = 3
  -march=corei7-avx    25580.44   SE +/- 178.37, N = 3
  -march=core-avx-i    25549.84   SE +/- 126.25, N = 3
  -march=corei7        25490.14   SE +/- 193.34, N = 3
  -march=nocona        24888.11   SE +/- 107.43, N = 3
(CC) gcc options: -shared -fPIC -pthread -O3

GraphicsMagick 1.3.16 - Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
  -march=core-avx2     121   SE +/- 0.00, N = 3
  -march=core-avx-i    120   SE +/- 0.33, N = 3
  -march=corei7        120   SE +/- 0.00, N = 3
  -march=core2         120   SE +/- 0.33, N = 3
  -march=corei7-avx    119   SE +/- 0.00, N = 3
  -march=nocona        118   SE +/- 0.33, N = 3
(CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
  corei7        1863.19   SE +/- 3.12, N = 4
  core2         1859.97   SE +/- 5.53, N = 4
  corei7-avx    1851.10   SE +/- 22.67, N = 4
  nocona        1825.73   SE +/- 21.90, N = 4
  core-avx-i    1824.28   SE +/- 23.35, N = 4
  core-avx2     1817.03   SE +/- 28.95, N = 4

FFmpeg 1.1 - H.264 HD To NTSC DV
Seconds, Fewer Is Better
  -march=corei7-avx    12.86   SE +/- 0.06, N = 3
  -march=corei7        12.93   SE +/- 0.01, N = 3
  -march=nocona        12.94   SE +/- 0.07, N = 3
  -march=core-avx-i    13.00   SE +/- 0.05, N = 3
  -march=core-avx2     13.01   SE +/- 0.12, N = 3
  -march=core2         13.16   SE +/- 0.09, N = 3
(CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

x264 2013-06-08 - H.264 Video Encoding
Frames Per Second, More Is Better
  -march=nocona        156.80   SE +/- 0.30, N = 5
  -march=core2         156.74   SE +/- 0.55, N = 5
  -march=core-avx-i    156.08   SE +/- 0.50, N = 5
  -march=corei7        156.06   SE +/- 0.51, N = 5
  -march=corei7-avx    155.63   SE +/- 0.20, N = 5
  -march=core-avx2     155.18   SE +/- 0.90, N = 5
(CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Timed Linux Kernel Compilation 3.1 - Time To Compile
Seconds, Fewer Is Better
  core-avx2     97.25   SE +/- 0.60, N = 3
  core2         97.63   SE +/- 0.54, N = 3
  corei7        97.77   SE +/- 0.69, N = 3
  core-avx-i    97.85   SE +/- 0.76, N = 3
  nocona        97.89   SE +/- 0.59, N = 3
  corei7-avx    98.10   SE +/- 0.54, N = 3

Botan 1.10.3 - Test: CAST-256
Mbytes/s, More Is Better
  core2         95.80
  core-avx-i    95.79
  corei7-avx    95.77
  core-avx2     95.76
  corei7        95.54
  nocona        95.48
(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan 1.10.3 - Test: AES-256
Mbytes/s, More Is Better
  core-avx2     158.43
  core2         158.35
  core-avx-i    158.31
  corei7-avx    158.19
  nocona        157.97
  corei7        157.96
(CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

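Because the individual tests pull in different directions, a single summary figure can help when comparing two -march settings. One common choice is the geometric mean of per-test ratios; the sketch below applies it to core-avx2 versus nocona using the numbers reported in this result file. The aggregation method, the nocona baseline, and the helper names are assumptions for illustration and are not part of the exported results; "Fewer Is Better" tests are inverted so a ratio above 1 always favours core-avx2.

```python
import math

# (test, nocona, core-avx2, higher_is_better), copied from the results in this file
RESULTS = [
    ("graphics-magick: Sharpen", 83, 136, True),
    ("c-ray: Total Time", 23.07, 17.02, False),
    ("himeno: Poisson Pressure Solver", 1517.03, 1282.30, True),
    ("graphics-magick: Blur", 115, 138, True),
    ("graphics-magick: Resizing", 157, 182, True),
    ("scimark2: Fast Fourier Transform", 245.07, 226.57, True),
    ("smallpt: Global Illumination Renderer", 26, 24, False),
    ("ttsiod-renderer: Phong Rendering", 122.02, 119.78, True),
    ("build-imagemagick: Time To Compile", 76.98, 80.66, False),
    ("hmmer: Pfam Database Search", 10.16, 10.55, False),
    ("botan: Tiger", 438.78, 424.56, True),
    ("scimark2: Monte Carlo", 615.33, 596.16, True),
    ("apache: Static Web Page Serving", 24888.11, 25644.10, True),
    ("graphics-magick: Local Adaptive Thresholding", 118, 121, True),
    ("scimark2: Dense LU Matrix Factorization", 1825.73, 1817.03, True),
    ("ffmpeg: H.264 HD To NTSC DV", 12.94, 13.01, False),
    ("x264: H.264 Video Encoding", 156.80, 155.18, True),
    ("build-linux-kernel: Time To Compile", 97.89, 97.25, False),
    ("botan: CAST-256", 95.48, 95.76, True),
    ("botan: AES-256", 157.97, 158.43, True),
]


def normalized_ratio(baseline, candidate, higher_is_better):
    """Ratio > 1 means the candidate beat the baseline, whatever the metric direction."""
    return candidate / baseline if higher_is_better else baseline / candidate


ratios = [normalized_ratio(base, cand, hib) for _, base, cand, hib in RESULTS]

# Geometric mean computed via the mean of logarithms
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

print(f"core-avx2 vs nocona, geometric mean over {len(ratios)} tests: {geomean:.3f}")
```

A geometric mean weights every test equally, so a few large wins (Sharpen, C-Ray) and a few regressions (Himeno, FFT) can offset each other; the per-test figures remain the more informative view.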
Phoronix Test Suite v10.8.5