LLVM Clang 3.3 vs. GCC 4.8 - Intel Core-AVX2 Haswell

Intel Core i7-4770K testing with an Intel DH87RL motherboard, looking at GCC 4.7, GCC 4.8, LLVM Clang 3.2, and LLVM Clang 3.3 compiler performance with core-avx2 Haswell optimizations. These Intel Core i7 Haswell core-avx2 compiler benchmarks are for a future article on Phoronix by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1307173-UT-1306206SO69.

System Details

GCC 4.7.3, GCC 4.8.1, LLVM Clang 3.2, LLVM Clang 3.3 (same system, compiler varies):
  Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard:        Intel DH87RL
  Chipset:            Intel 4th Gen Core DRAM
  Memory:             15360MB
  Disk:               240GB OCZ VERTEX3
  Graphics:           Intel Haswell Desktop
  Audio:              Intel Haswell HDMI
  Monitor:            VA2431
  Network:            Intel Connection I217-V
  OS:                 Ubuntu 13.10
  Kernel:             3.10.0-999-generic (x86_64)
  Desktop:            Unity 7.0.0
  Display Server:     X Server 1.13.3
  Display Driver:     intel 2.21.9
  OpenGL:             3.0 Mesa 9.1.3
  Compiler:           GCC 4.7.3 (GCC 4.7.3); GCC 4.8.1 (GCC 4.8.1); Clang 3.2 + LLVM 3.2svn (LLVM Clang 3.2); Clang 3.3 + LLVM 3.3 (LLVM Clang 3.3)
  File-System:        ext4
  Screen Resolution:  1920x1080

AMD FX-8120 @ 4.0:
  Processor:          AMD FX-8120 Eight-Core @ 4.00GHz (8 Cores)
  Motherboard:        ASUS Crosshair V Formula
  Chipset:            AMD ATI RD890 bridge
  Memory:             8192MB
  Disk:               1000GB Western Digital WD10EALX-009
  Graphics:           AMD Radeon HD 7800 2048MB (1050/1450MHz)
  Audio:              Realtek ALC889
  Monitor:            B2430L
  Network:            Intel 82583V Gigabit Connection
  OS:                 Slackware 14.0
  Kernel:             3.9.9 (x86_64)
  Display Server:     X Server 1.13.4
  Display Driver:     fglrx 13.10.10
  OpenGL:             4.2.12337
  Compiler:           GCC 4.8.1 + Clang 3.3 + LLVM 3.3

Compiler Details

- GCC 4.7.3: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
- GCC 4.8.1: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
- LLVM Clang 3.2: Optimized build; Built Jun 20 2013 (14:54:23); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64
- LLVM Clang 3.3: Optimized build; Built Jun 20 2013 (12:21:18); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64
- AMD FX-8120 @ 4.0: --build=x86_64-slackware-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-checking=release --enable-java-home --enable-languages=ada,c,c++,fortran,go,java,lto,objc --enable-libssp --enable-lto --enable-multilib --enable-objc-gc --enable-shared --enable-threads=posix --host=x86_64-slackware-linux --mandir=/usr/man --target=x86_64-slackware-linux --verbose --with-antlr-jar=/home/slackware/slackbuilds/gcc/antlr-runtime-3.4.jar --with-arch-directory=amd64 --with-gnu-ld --with-java-home=/usr/lib64/jvm/jre --with-jvm-jar-dir=/usr/lib64/jvm/jvm-exports --with-jvm-root-dir=/usr/lib64/jvm --with-python-dir=/lib64/python2.7/site-packages

Processor Details

- GCC 4.7.3: Scaling Governor: acpi-cpufreq ondemand
- GCC 4.8.1: Scaling Governor: acpi-cpufreq ondemand
- LLVM Clang 3.2: Scaling Governor: acpi-cpufreq ondemand
- LLVM Clang 3.3: Scaling Governor: acpi-cpufreq ondemand
- AMD FX-8120 @ 4.0: Scaling Governor: acpi-cpufreq conservative

Result Overview (a dash marks a test with no result for that configuration)

Test                                                 GCC 4.7.3   GCC 4.8.1   LLVM Clang 3.2   LLVM Clang 3.3   AMD FX-8120 @ 4.0
hmmer: Pfam Database Search                          10.41       10.30       11.92            10.94            13.07
mafft: Multiple Sequence Alignment                   5.27        5.17        5.96             6.19             6.46
blake2: Phoronix Test Suite v4.8.0m0                 5.32        5.30        7.54             7.45             9.14
scimark2: Monte Carlo                                450.21      615.32      615.04           619.77           359.42
scimark2: Fast Fourier Transform                     251.67      245.00      246.79           237.86           70.04
scimark2: Sparse Matrix Multiply                     1177.86     1123.80     1234.19          1263.29          999.20
scimark2: Dense LU Matrix Factorization              1861.55     1773.35     1774.03          1827.34          1750.14
scimark2: Jacobi Successive Over-Relaxation          1163.52     1164.63     1670.77          1666.24          724.80
tscp: AI Chess Performance                           631626      599455      624323           624749           303125
x264: H.264 Video Encoding                           155.33      156.34      155.35           153.15           148.51
himeno: Poisson Pressure Solver                      1663.97     1593.09     1532.51          1419.90          635.55
build-imagemagick: Time To Compile                   93.67       78.61       31.94            34.35            -
build-php: Time To Compile                           32.92       33.30       19.59            21.03            35.72
c-ray: Total Time                                    21.45       17.02       27.46            27.03            27.19
primesieve: 1e12 Prime Number Generation             79.36       79.24       -                326.85           172.43
smallpt: Global Illumination Renderer; 100 Samples   25          25          140              142              132
encode-mp3: WAV To MP3                               -           -           12.74            14.45            20.94
ffmpeg: H.264 HD To NTSC DV                          12.86       12.81       12.77            13.18            -
tachyon: Total Time                                  -           -           10.44            10.98            19.58
apache: Static Web Page Serving                      25743.99    25786.15    25888.95         25295.82         18928.33

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)

GCC 4.7.3            10.41   SE +/- 0.02, N = 3   -march=core-avx2 -O3
GCC 4.8.1            10.30   SE +/- 0.00, N = 3   -march=core-avx2 -O3
LLVM Clang 3.2       11.92   SE +/- 0.00, N = 3   -march=core-avx2 -O3
LLVM Clang 3.3       10.94   SE +/- 0.18, N = 4   -march=core-avx2 -O3
AMD FX-8120 @ 4.0    13.07   SE +/- 0.07, N = 3   -O2

(CC) gcc options: -pthread -lhmmer -lsquid -lm
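For a "fewer is better" result such as this one, a quick way to compare configurations is to express each time as a percentage above the fastest entry. The following is a minimal sketch using the five Pfam Database Search times reported above; the function name `percent_slower_than_best` is illustrative and not part of the Phoronix Test Suite.

```python
# Timed HMMer Search results in seconds, copied from the Pfam Database Search result above.
hmmer_seconds = {
    "GCC 4.7.3": 10.41,
    "GCC 4.8.1": 10.30,
    "LLVM Clang 3.2": 11.92,
    "LLVM Clang 3.3": 10.94,
    "AMD FX-8120 @ 4.0": 13.07,
}


def percent_slower_than_best(results):
    """Map each name to how many percent slower it is than the fastest entry.

    Only meaningful for metrics where a smaller value is better (for example seconds).
    """
    best = min(results.values())
    return {name: (value / best - 1.0) * 100.0 for name, value in results.items()}


slowdown = percent_slower_than_best(hmmer_seconds)
# Print the configurations from fastest to slowest.
for name, pct in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{name:<18} {hmmer_seconds[name]:6.2f} s  +{pct:.1f}%")
```

For "more is better" metrics such as Mflops, the ratio would need to be inverted (best is the maximum, and the comparison becomes best / value).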

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 6.864 (Seconds, Fewer Is Better)

GCC 4.7.3            5.27   SE +/- 0.02, N = 3
GCC 4.8.1            5.17   SE +/- 0.03, N = 3
LLVM Clang 3.2       5.96   SE +/- 0.10, N = 3
LLVM Clang 3.3       6.19   SE +/- 0.12, N = 3
AMD FX-8120 @ 4.0    6.46   SE +/- 0.16, N = 6

(CC) gcc options: -O3 -lm -lpthread
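Each result in this file is reported with a standard error (SE) and a run count (N). As a rough illustration of what such a figure means, the sketch below computes the standard error of the mean as the sample standard deviation divided by the square root of N. This assumes the usual textbook definition; the export does not state whether the Phoronix Test Suite uses the sample or the population deviation. The run times are hypothetical, since individual runs are not included in this file.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation divided by sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds; NOT taken from the result file.
example_runs = [5.25, 5.27, 5.31]

mean = statistics.mean(example_runs)
se = standard_error(example_runs)
print(f"{mean:.2f} s, SE +/- {se:.2f}, N = {len(example_runs)}")
```

A small SE relative to the gap between two configurations suggests the difference is unlikely to be run-to-run noise.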

BLAKE2

Phoronix Test Suite v4.8.0m0

BLAKE2 20121223 (Cycles Per Byte, Fewer Is Better)

GCC 4.7.3            5.32   SE +/- 0.00, N = 3
GCC 4.8.1            5.30   SE +/- 0.01, N = 3
LLVM Clang 3.2       7.54   SE +/- 0.11, N = 3
LLVM Clang 3.3       7.45   SE +/- 0.13, N = 4
AMD FX-8120 @ 4.0    9.14   SE +/- 0.00, N = 3

(CC) gcc options: -std=gnu99 -O3 -march=native -lcrypto -lz

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)

GCC 4.7.3            450.21   SE +/- 0.55, N = 4
GCC 4.8.1            615.32   SE +/- 0.00, N = 4
LLVM Clang 3.2       615.04   SE +/- 5.67, N = 4
LLVM Clang 3.3       619.77   SE +/- 0.89, N = 4
AMD FX-8120 @ 4.0    359.42   SE +/- 0.58, N = 4

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)

GCC 4.7.3            251.67   SE +/- 0.68, N = 4
GCC 4.8.1            245.00   SE +/- 0.34, N = 4
LLVM Clang 3.2       246.79   SE +/- 1.60, N = 4
LLVM Clang 3.3       237.86   SE +/- 0.98, N = 4
AMD FX-8120 @ 4.0     70.04   SE +/- 0.63, N = 4

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)

GCC 4.7.3            1177.86   SE +/- 0.85, N = 4
GCC 4.8.1            1123.80   SE +/- 5.13, N = 4
LLVM Clang 3.2       1234.19   SE +/- 30.06, N = 4
LLVM Clang 3.3       1263.29   SE +/- 5.09, N = 4
AMD FX-8120 @ 4.0     999.20   SE +/- 7.67, N = 4

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)

GCC 4.7.3            1861.55   SE +/- 1.88, N = 4
GCC 4.8.1            1773.35   SE +/- 1.48, N = 4
LLVM Clang 3.2       1774.03   SE +/- 36.01, N = 4
LLVM Clang 3.3       1827.34   SE +/- 22.59, N = 4
AMD FX-8120 @ 4.0    1750.14   SE +/- 6.33, N = 4

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)

GCC 4.7.3            1163.52   SE +/- 1.28, N = 4
GCC 4.8.1            1164.63   SE +/- 1.11, N = 4
LLVM Clang 3.2       1670.77   SE +/- 0.00, N = 4
LLVM Clang 3.3       1666.24   SE +/- 1.85, N = 4
AMD FX-8120 @ 4.0     724.80   SE +/- 0.99, N = 4

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)

GCC 4.7.3            631626   SE +/- 0.00, N = 5
GCC 4.8.1            599455   SE +/- 520.60, N = 5
LLVM Clang 3.2       624323   SE +/- 141.40, N = 5
LLVM Clang 3.3       624749   SE +/- 480.04, N = 5
AMD FX-8120 @ 4.0    303125   SE +/- 190.28, N = 5

x264

H.264 Video Encoding

x264 2013-06-08 (Frames Per Second, More Is Better)

GCC 4.7.3            155.33   SE +/- 0.33, N = 5   -march=core-avx2
GCC 4.8.1            156.34   SE +/- 0.40, N = 5   -march=core-avx2
LLVM Clang 3.2       155.35   SE +/- 0.60, N = 5   -march=core-avx2
LLVM Clang 3.3       153.15   SE +/- 0.58, N = 5   -march=core-avx2
AMD FX-8120 @ 4.0    148.51   SE +/- 0.66, N = 5

(CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0 (MFLOPS, More Is Better)

GCC 4.7.3            1663.97   SE +/- 0.19, N = 3    -march=core-avx2
GCC 4.8.1            1593.09   SE +/- 20.45, N = 3   -march=core-avx2
LLVM Clang 3.2       1532.51   SE +/- 2.02, N = 3    -march=core-avx2
LLVM Clang 3.3       1419.90   SE +/- 13.80, N = 3   -march=core-avx2
AMD FX-8120 @ 4.0     635.55   SE +/- 1.68, N = 3

(CC) gcc options: -O3

Timed ImageMagick Compilation

Time To Compile

Timed ImageMagick Compilation 6.8.1-10 (Seconds, Fewer Is Better)

GCC 4.7.3         93.67   SE +/- 0.60, N = 3
GCC 4.8.1         78.61   SE +/- 0.07, N = 3
LLVM Clang 3.2    31.94   SE +/- 0.12, N = 3
LLVM Clang 3.3    34.35   SE +/- 0.37, N = 3

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 5.2.9 (Seconds, Fewer Is Better)

GCC 4.7.3            32.92   SE +/- 0.14, N = 3   -march=core-avx2 -O3
GCC 4.8.1            33.30   SE +/- 0.46, N = 3   -march=core-avx2 -O3
LLVM Clang 3.2       19.59   SE +/- 0.03, N = 3   -march=core-avx2 -O3 -lpthread
LLVM Clang 3.3       21.03   SE +/- 0.20, N = 3   -march=core-avx2 -O3
AMD FX-8120 @ 4.0    35.72   SE +/- 0.09, N = 3   -O2

(CC) gcc options: -pedantic -ldl -lz -lm

C-Ray

Total Time

C-Ray 1.1 (Seconds, Fewer Is Better)

GCC 4.7.3            21.45   SE +/- 0.01, N = 3   -march=core-avx2
GCC 4.8.1            17.02   SE +/- 0.01, N = 3   -march=core-avx2
LLVM Clang 3.2       27.46   SE +/- 0.02, N = 3   -march=core-avx2
LLVM Clang 3.3       27.03   SE +/- 0.00, N = 3   -march=core-avx2
AMD FX-8120 @ 4.0    27.19   SE +/- 0.02, N = 3

(CC) gcc options: -lm -lpthread -O3

Primesieve

1e12 Prime Number Generation

Primesieve 4.2 (Seconds, Fewer Is Better)

GCC 4.7.3             79.36   SE +/- 0.06, N = 3
GCC 4.8.1             79.24   SE +/- 0.08, N = 3
LLVM Clang 3.3       326.85   SE +/- 2.08, N = 3
AMD FX-8120 @ 4.0    172.43   SE +/- 0.05, N = 3

Per-configuration flags: -fopenmp (listed for three of the four configurations)
(CXX) g++ options: -O2

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 (Seconds, Fewer Is Better)

GCC 4.7.3             25   SE +/- 0.00, N = 3   -march=core-avx2 -O3
GCC 4.8.1             25   SE +/- 0.33, N = 3   -march=core-avx2 -O3
LLVM Clang 3.2       140   SE +/- 0.33, N = 3   -march=core-avx2 -O3
LLVM Clang 3.3       142   SE +/- 1.33, N = 3   -march=core-avx2 -O3
AMD FX-8120 @ 4.0    132   SE +/- 0.33, N = 3

(CXX) g++ options: -fopenmp

LAME MP3 Encoding

WAV To MP3

LAME MP3 Encoding 3.99.3 (Seconds, Fewer Is Better)

LLVM Clang 3.2       12.74   SE +/- 0.01, N = 5   -march=core-avx2
LLVM Clang 3.3       14.45   SE +/- 0.01, N = 5   -march=core-avx2
AMD FX-8120 @ 4.0    20.94   SE +/- 0.02, N = 5   -fomit-frame-pointer -ffast-math -lncurses

(CC) gcc options: -pipe -O3 -lm

FFmpeg

H.264 HD To NTSC DV

FFmpeg 1.1 (Seconds, Fewer Is Better)

GCC 4.7.3         12.86   SE +/- 0.08, N = 3   -fno-tree-vectorize -MF -MT
GCC 4.8.1         12.81   SE +/- 0.06, N = 3   -fno-tree-vectorize -MF -MT
LLVM Clang 3.2    12.77   SE +/- 0.04, N = 3   -Qunused-arguments
LLVM Clang 3.3    13.18   SE +/- 0.12, N = 3   -Qunused-arguments

(CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -march=core-avx2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -MMD

Tachyon

Total Time

Tachyon 0.98.9 (Seconds, Fewer Is Better)

LLVM Clang 3.2       10.44   SE +/- 0.06, N = 3
LLVM Clang 3.3       10.98   SE +/- 0.24, N = 6
AMD FX-8120 @ 4.0    19.58   SE +/- 0.02, N = 3

(CC) gcc options: -m32 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.3 (Requests Per Second, More Is Better)

GCC 4.7.3            25743.99   SE +/- 51.73, N = 3    -march=core-avx2 -O3
GCC 4.8.1            25786.15   SE +/- 36.32, N = 3    -march=core-avx2 -O3
LLVM Clang 3.2       25888.95   SE +/- 32.41, N = 3    -march=core-avx2 -O3
LLVM Clang 3.3       25295.82   SE +/- 280.78, N = 3   -march=core-avx2 -O3
AMD FX-8120 @ 4.0    18928.33   SE +/- 334.57, N = 3   -O2

(CC) gcc options: -shared -fPIC -pthread


Phoronix Test Suite v10.8.4