Intel Haswell GCC 4.8 core-avx2 Tuning

Testing the Intel Core i7 4770K with different CFLAGS/CXXFLAGS to look at the GCC 4.8.1 core-avx2 (Haswell) compiler optimizations. Benchmarks by Michael Larabel of Phoronix for a future article.
HTML result view exported from: https://openbenchmarking.org/result/1306150-PTS-INTELHAS05&grt .

System details (shared by all runs: nocona, core2, corei7, corei7-avx, core-avx-i, core-avx2)

  Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard:        Intel DH87RL
  Chipset:            Intel Haswell DRAM
  Memory:             15360MB
  Disk:               240GB OCZ VERTEX3
  Graphics:           Intel Haswell IGP
  Audio:              Intel Haswell HDMI
  Monitor:            VA2431
  Network:            Intel Connection I217-V
  OS:                 Ubuntu 13.04
  Kernel:             3.10.0-999-generic (x86_64)
  Desktop:            Unity 7.0.0
  Display Server:     X Server 1.13.3
  Display Driver:     intel 2.21.9
  OpenGL:             3.0 Mesa 9.2.0-devel (git-a2e3b1c)
  Compiler:           GCC 4.8.1 + LLVM 3.2
  File-System:        ext4
  Screen Resolution:  1920x1080

Compiler Details - --enable-checking=release --enable-languages=c,c++,fortran
Processor Details - Scaling Governor: acpi-cpufreq ondemand

Result overview

Test                                                            nocona      core2     corei7 corei7-avx core-avx-i  core-avx2
apache: Static Web Page Serving                               24888.11   25606.17   25490.14   25580.44   25549.84   25644.10
botan: Tiger                                                    438.78     438.87     427.31     442.47     440.37     424.56
botan: AES-256                                                  157.97     158.35     157.96     158.19     158.31     158.43
botan: CAST-256                                                  95.48      95.80      95.54      95.77      95.79      95.76
c-ray: Total Time                                                23.07      22.95      22.95      22.84      22.83      17.02
ffmpeg: H.264 HD To NTSC DV                                      12.94      13.16      12.93      12.86      13.00      13.01
graphics-magick: Blur                                              115        117        116        122        122        138
graphics-magick: Sharpen                                            83         84         84         96         96        136
graphics-magick: Resizing                                          157        160        160        166        167        182
graphics-magick: Local Adaptive Thresholding                       118        120        120        119        120        121
himeno: Poisson Pressure Solver                                1517.03    1564.22    1560.18    1404.92    1630.12    1282.30
scimark2: Monte Carlo                                           615.33     616.21     616.65     616.65     615.76     596.16
scimark2: Fast Fourier Transform                                245.07     250.93     249.11     251.86     247.35     226.57
scimark2: Dense LU Matrix Factorization                        1825.73    1859.97    1863.19    1851.10    1824.28    1817.03
smallpt: Global Illumination Renderer; 100 Samples                  26         26         26         26         26         24
hmmer: Pfam Database Search                                      10.16      10.14      10.22      10.62      10.45      10.55
build-imagemagick: Time To Compile                               76.98      79.03      79.64      80.91      81.06      80.66
build-linux-kernel: Time To Compile                              97.89      97.63      97.77      98.10      97.85      97.25
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping       122.02     121.58     123.14     117.71     116.54     119.78
x264: H.264 Video Encoding                                      156.80     156.74     156.06     155.63     156.08     155.18
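
One rough way to read the overview is to express each core-avx2 result as a relative gain over the nocona baseline. The sketch below does that arithmetic in Python. The values are copied from the overview, and the higher-is-better flags follow the "More Is Better" / "Fewer Is Better" labels in the per-test results. The names RESULTS, relative_gain, and gains are illustrative only and are not part of the Phoronix Test Suite.

```python
# Overview values copied from the result table: (nocona, core-avx2, higher_is_better).
RESULTS = {
    "apache: Static Web Page Serving": (24888.11, 25644.10, True),
    "botan: Tiger": (438.78, 424.56, True),
    "botan: AES-256": (157.97, 158.43, True),
    "botan: CAST-256": (95.48, 95.76, True),
    "c-ray: Total Time": (23.07, 17.02, False),
    "ffmpeg: H.264 HD To NTSC DV": (12.94, 13.01, False),
    "graphics-magick: Blur": (115, 138, True),
    "graphics-magick: Sharpen": (83, 136, True),
    "graphics-magick: Resizing": (157, 182, True),
    "graphics-magick: Local Adaptive Thresholding": (118, 121, True),
    "himeno: Poisson Pressure Solver": (1517.03, 1282.30, True),
    "scimark2: Monte Carlo": (615.33, 596.16, True),
    "scimark2: Fast Fourier Transform": (245.07, 226.57, True),
    "scimark2: Dense LU Matrix Factorization": (1825.73, 1817.03, True),
    "smallpt: Global Illumination Renderer; 100 Samples": (26, 24, False),
    "hmmer: Pfam Database Search": (10.16, 10.55, False),
    "build-imagemagick: Time To Compile": (76.98, 80.66, False),
    "build-linux-kernel: Time To Compile": (97.89, 97.25, False),
    "ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping": (122.02, 119.78, True),
    "x264: H.264 Video Encoding": (156.80, 155.18, True),
}


def relative_gain(baseline, candidate, higher_is_better):
    """Gain of candidate over baseline in percent; positive means faster."""
    # For time-based tests (fewer is better) the ratio is inverted,
    # so a shorter run time still shows up as a positive gain.
    ratio = candidate / baseline if higher_is_better else baseline / candidate
    return (ratio - 1.0) * 100.0


def gains():
    """Map each test name to the core-avx2 gain over nocona, in percent."""
    return {name: relative_gain(b, c, hib) for name, (b, c, hib) in RESULTS.items()}


if __name__ == "__main__":
    # Print the tests from largest gain to largest regression.
    for name, gain in sorted(gains().items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:60s} {gain:+7.1f}%")
```

With these numbers, GraphicsMagick Sharpen shows the largest gain (about 64%), C-Ray gains about 36%, and Himeno regresses by about 15%. Small single-digit differences should be read together with the standard errors reported in the per-test results.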

Apache Benchmark 2.4.3, Static Web Page Serving
Requests Per Second, More Is Better
  nocona        24888.11   SE +/- 107.43, N = 3
  core2         25606.17   SE +/- 229.80, N = 3
  corei7        25490.14   SE +/- 193.34, N = 3
  corei7-avx    25580.44   SE +/- 178.37, N = 3
  core-avx-i    25549.84   SE +/- 126.25, N = 3
  core-avx2     25644.10   SE +/- 170.85, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -shared -fPIC -pthread -O3

Botan 1.10.3, Test: Tiger
Mbytes/s, More Is Better
  nocona        438.78
  core2         438.87
  corei7        427.31
  corei7-avx    442.47
  core-avx-i    440.37
  core-avx2     424.56
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan 1.10.3, Test: AES-256
Mbytes/s, More Is Better
  nocona        157.97
  core2         158.35
  corei7        157.96
  corei7-avx    158.19
  core-avx-i    158.31
  core-avx2     158.43
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

Botan 1.10.3, Test: CAST-256
Mbytes/s, More Is Better
  nocona        95.48
  core2         95.80
  corei7        95.54
  corei7-avx    95.77
  core-avx-i    95.79
  core-avx2     95.76
1. (CXX) g++ options: -m64 -ldl -lpthread -lrt -O2

C-Ray 1.1, Total Time
Seconds, Fewer Is Better
  nocona        23.07   SE +/- 0.02, N = 3
  core2         22.95   SE +/- 0.01, N = 3
  corei7        22.95   SE +/- 0.01, N = 3
  corei7-avx    22.84   SE +/- 0.01, N = 3
  core-avx-i    22.83   SE +/- 0.00, N = 3
  core-avx2     17.02   SE +/- 0.00, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -lm -lpthread -O3

FFmpeg 1.1, H.264 HD To NTSC DV
Seconds, Fewer Is Better
  nocona        12.94   SE +/- 0.07, N = 3
  core2         13.16   SE +/- 0.09, N = 3
  corei7        12.93   SE +/- 0.01, N = 3
  corei7-avx    12.86   SE +/- 0.06, N = 3
  core-avx-i    13.00   SE +/- 0.05, N = 3
  core-avx2     13.01   SE +/- 0.12, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -MMD -MF -MT

GraphicsMagick 1.3.16, Operation: Blur
Iterations Per Minute, More Is Better
  nocona        115   SE +/- 0.00, N = 3
  core2         117   SE +/- 0.00, N = 3
  corei7        116   SE +/- 1.00, N = 3
  corei7-avx    122   SE +/- 1.00, N = 3
  core-avx-i    122   SE +/- 0.67, N = 3
  core-avx2     138   SE +/- 0.88, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick 1.3.16, Operation: Sharpen
Iterations Per Minute, More Is Better
  nocona         83   SE +/- 0.00, N = 3
  core2          84   SE +/- 0.00, N = 3
  corei7         84   SE +/- 0.00, N = 3
  corei7-avx     96   SE +/- 0.00, N = 3
  core-avx-i     96   SE +/- 0.00, N = 3
  core-avx2     136   SE +/- 0.00, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick 1.3.16, Operation: Resizing
Iterations Per Minute, More Is Better
  nocona        157   SE +/- 0.33, N = 3
  core2         160   SE +/- 0.00, N = 3
  corei7        160   SE +/- 0.00, N = 3
  corei7-avx    166   SE +/- 0.33, N = 3
  core-avx-i    167   SE +/- 0.00, N = 3
  core-avx2     182   SE +/- 0.33, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

GraphicsMagick 1.3.16, Operation: Local Adaptive Thresholding
Iterations Per Minute, More Is Better
  nocona        118   SE +/- 0.33, N = 3
  core2         120   SE +/- 0.33, N = 3
  corei7        120   SE +/- 0.00, N = 3
  corei7-avx    119   SE +/- 0.00, N = 3
  core-avx-i    120   SE +/- 0.33, N = 3
  core-avx2     121   SE +/- 0.00, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -std=gnu99 -fopenmp -O3 -pthread -ljbig -ljpeg -lXext -lSM -lICE -lX11 -llzma -lbz2 -lz -lm -lpthread

Himeno Benchmark 3.0, Poisson Pressure Solver
MFLOPS, More Is Better
  nocona        1517.03   SE +/- 1.20, N = 3
  core2         1564.22   SE +/- 3.07, N = 3
  corei7        1560.18   SE +/- 0.75, N = 3
  corei7-avx    1404.92   SE +/- 105.46, N = 6
  core-avx-i    1630.12   SE +/- 1.05, N = 3
  core-avx2     1282.30   SE +/- 19.87, N = 6
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -O3
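
The Himeno runs differ widely in their reported standard error, so a difference in means is not equally trustworthy for every pair of configurations. As a rough heuristic (not a formal significance test, given only 3 to 6 samples per run), the sketch below divides the difference of two means by their combined standard error. It assumes the SE values in the export are listed in the same order as the configuration names. The names HIMENO and separation are illustrative.

```python
import math

# Himeno results copied from the export: (mean MFLOPS, standard error, N).
HIMENO = {
    "nocona": (1517.03, 1.20, 3),
    "core2": (1564.22, 3.07, 3),
    "corei7": (1560.18, 0.75, 3),
    "corei7-avx": (1404.92, 105.46, 6),
    "core-avx-i": (1630.12, 1.05, 3),
    "core-avx2": (1282.30, 19.87, 6),
}


def separation(a, b):
    """Difference of the two means, in units of their combined standard error."""
    mean_a, se_a, _ = HIMENO[a]
    mean_b, se_b, _ = HIMENO[b]
    # Standard errors of independent runs combine in quadrature.
    return (mean_a - mean_b) / math.hypot(se_a, se_b)


if __name__ == "__main__":
    for a, b in [("core-avx-i", "core-avx2"), ("corei7", "corei7-avx")]:
        print(f"{a} vs {b}: {separation(a, b):.2f} combined standard errors apart")
```

Under these assumptions, core-avx-i and core-avx2 are roughly 17 combined standard errors apart, while corei7 and corei7-avx are only about 1.5 apart, because the corei7-avx run has a standard error of 105.46. The apparent corei7-avx regression is therefore much less certain than the core-avx2 one.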

SciMark 2.0, Computational Test: Monte Carlo
Mflops, More Is Better
  nocona        615.33   SE +/- 0.72, N = 4
  core2         616.21   SE +/- 0.51, N = 4
  corei7        616.65   SE +/- 0.44, N = 4
  corei7-avx    616.65   SE +/- 0.44, N = 4
  core-avx-i    615.76   SE +/- 0.44, N = 4
  core-avx2     596.16   SE +/- 20.17, N = 8

SciMark 2.0, Computational Test: Fast Fourier Transform
Mflops, More Is Better
  nocona        245.07   SE +/- 2.50, N = 4
  core2         250.93   SE +/- 0.67, N = 4
  corei7        249.11   SE +/- 0.86, N = 4
  corei7-avx    251.86   SE +/- 1.22, N = 4
  core-avx-i    247.35   SE +/- 2.13, N = 4
  core-avx2     226.57   SE +/- 2.02, N = 4

SciMark 2.0, Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
  nocona        1825.73   SE +/- 21.90, N = 4
  core2         1859.97   SE +/- 5.53, N = 4
  corei7        1863.19   SE +/- 3.12, N = 4
  corei7-avx    1851.10   SE +/- 22.67, N = 4
  core-avx-i    1824.28   SE +/- 23.35, N = 4
  core-avx2     1817.03   SE +/- 28.95, N = 4

Smallpt 1.0, Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  nocona        26   SE +/- 0.00, N = 3
  core2         26   SE +/- 0.00, N = 3
  corei7        26   SE +/- 0.33, N = 3
  corei7-avx    26   SE +/- 0.33, N = 3
  core-avx-i    26   SE +/- 0.33, N = 3
  core-avx2     24   SE +/- 0.33, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CXX) g++ options: -fopenmp -O3

Timed HMMer Search 2.3.2, Pfam Database Search
Seconds, Fewer Is Better
  nocona        10.16   SE +/- 0.06, N = 3
  core2         10.14   SE +/- 0.01, N = 3
  corei7        10.22   SE +/- 0.06, N = 3
  corei7-avx    10.62   SE +/- 0.02, N = 3
  core-avx-i    10.45   SE +/- 0.04, N = 3
  core-avx2     10.55   SE +/- 0.17, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -O3 -pthread -lhmmer -lsquid -lm

Timed ImageMagick Compilation 6.8.1-10, Time To Compile
Seconds, Fewer Is Better
  nocona        76.98   SE +/- 0.04, N = 3
  core2         79.03   SE +/- 0.06, N = 3
  corei7        79.64   SE +/- 0.18, N = 3
  corei7-avx    80.91   SE +/- 0.10, N = 3
  core-avx-i    81.06   SE +/- 0.31, N = 3
  core-avx2     80.66   SE +/- 0.32, N = 3

Timed Linux Kernel Compilation 3.1, Time To Compile
Seconds, Fewer Is Better
  nocona        97.89   SE +/- 0.59, N = 3
  core2         97.63   SE +/- 0.54, N = 3
  corei7        97.77   SE +/- 0.69, N = 3
  corei7-avx    98.10   SE +/- 0.54, N = 3
  core-avx-i    97.85   SE +/- 0.76, N = 3
  core-avx2     97.25   SE +/- 0.60, N = 3

TTSIOD 3D Renderer 2.2z, Phong Rendering With Soft-Shadow Mapping
FPS, More Is Better
  nocona        122.02   SE +/- 0.39, N = 3
  core2         121.58   SE +/- 0.76, N = 3
  corei7        123.14   SE +/- 0.45, N = 3
  corei7-avx    117.71   SE +/- 0.66, N = 3
  core-avx-i    116.54   SE +/- 0.36, N = 3
  core-avx2     119.78   SE +/- 0.09, N = 3
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CXX) g++ options: -O3 -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -lstdc++

x264 2013-06-08, H.264 Video Encoding
Frames Per Second, More Is Better
  nocona        156.80   SE +/- 0.30, N = 5
  core2         156.74   SE +/- 0.55, N = 5
  corei7        156.06   SE +/- 0.51, N = 5
  corei7-avx    155.63   SE +/- 0.20, N = 5
  core-avx-i    156.08   SE +/- 0.50, N = 5
  core-avx2     155.18   SE +/- 0.90, N = 5
Per-run flags: -march=nocona -march=core2 -march=corei7 -march=corei7-avx -march=core-avx-i -march=core-avx2
1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Phoronix Test Suite v10.8.5