LLVM Clang 3.3 vs. GCC 4.8 - Intel Core-AVX2 Haswell
Intel Core i7-4770K testing with an Intel DH87RL motherboard, comparing GCC 4.7, GCC 4.8, LLVM Clang 3.2, and LLVM Clang 3.3 compiler performance with core-avx2 Haswell optimizations. These Intel Core i7 Haswell core-avx2 compiler benchmarks are for a future article on Phoronix by Michael Larabel.
HTML result view exported from: https://openbenchmarking.org/result/1307173-UT-1306206SO69

GCC 4.7.3:
  Processor: Intel Core i7-4770K @ 3.50GHz (8 Cores)
  Motherboard: Intel DH87RL
  Chipset: Intel 4th Gen Core DRAM
  Memory: 15360MB
  Disk: 240GB OCZ VERTEX3
  Graphics: Intel Haswell Desktop
  Audio: Intel Haswell HDMI
  Monitor: VA2431
  Network: Intel Connection I217-V
  OS: Ubuntu 13.10
  Kernel: 3.10.0-999-generic (x86_64)
  Desktop: Unity 7.0.0
  Display Server: X Server 1.13.3
  Display Driver: intel 2.21.9
  OpenGL: 3.0 Mesa 9.1.3
  Compiler: GCC 4.7.3
  File-System: ext4
  Screen Resolution: 1920x1080

GCC 4.8.1: same system as GCC 4.7.3, with Compiler: GCC 4.8.1
LLVM Clang 3.2: same system as GCC 4.7.3, with Compiler: Clang 3.2 + LLVM 3.2svn
LLVM Clang 3.3: same system as GCC 4.7.3, with Compiler: Clang 3.3 + LLVM 3.3

AMD FX-8120 @ 4.0:
  Processor: AMD FX-8120 Eight-Core @ 4.00GHz (8 Cores)
  Motherboard: ASUS Crosshair V Formula
  Chipset: AMD ATI RD890 bridge
  Memory: 8192MB
  Disk: 1000GB Western Digital WD10EALX-009
  Graphics: AMD Radeon HD 7800 2048MB (1050/1450MHz)
  Audio: Realtek ALC889
  Monitor: B2430L
  Network: Intel 82583V Gigabit Connection
  OS: Slackware 14.0
  Kernel: 3.9.9 (x86_64)
  Display Server: X Server 1.13.4
  Display Driver: fglrx 13.10.10
  OpenGL: 4.2.12337
  Compiler: GCC 4.8.1 + Clang 3.3 + LLVM 3.3

Compiler Details
  - GCC 4.7.3: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
  - GCC 4.8.1: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
  - LLVM Clang 3.2: Optimized build; Built Jun 20 2013 (14:54:23); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64
  - LLVM Clang 3.3: Optimized build; Built Jun 20 2013 (12:21:18); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64
  - AMD FX-8120 @ 4.0: --build=x86_64-slackware-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-checking=release --enable-java-home --enable-languages=ada,c,c++,fortran,go,java,lto,objc --enable-libssp --enable-lto --enable-multilib --enable-objc-gc --enable-shared --enable-threads=posix --host=x86_64-slackware-linux --mandir=/usr/man --target=x86_64-slackware-linux --verbose --with-antlr-jar=/home/slackware/slackbuilds/gcc/antlr-runtime-3.4.jar --with-arch-directory=amd64 --with-gnu-ld --with-java-home=/usr/lib64/jvm/jre --with-jvm-jar-dir=/usr/lib64/jvm/jvm-exports --with-jvm-root-dir=/usr/lib64/jvm --with-python-dir=/lib64/python2.7/site-packages

Processor Details
  - GCC 4.7.3: Scaling Governor: acpi-cpufreq ondemand
  - GCC 4.8.1: Scaling Governor: acpi-cpufreq ondemand
  - LLVM Clang 3.2: Scaling Governor: acpi-cpufreq ondemand
  - LLVM Clang 3.3: Scaling Governor: acpi-cpufreq ondemand
  - AMD FX-8120 @ 4.0: Scaling Governor: acpi-cpufreq conservative

Test                                                 GCC 4.7.3  GCC 4.8.1  LLVM Clang 3.2  LLVM Clang 3.3  AMD FX-8120 @ 4.0
hmmer: Pfam Database Search                              10.41      10.30           11.92           10.94              13.07
mafft: Multiple Sequence Alignment                        5.27       5.17            5.96            6.19               6.46
blake2: Phoronix Test Suite v4.8.0m0                      5.32       5.30            7.54            7.45               9.14
scimark2: Monte Carlo                                   450.21     615.32          615.04          619.77             359.42
scimark2: Fast Fourier Transform                        251.67     245.00          246.79          237.86              70.04
scimark2: Sparse Matrix Multiply                       1177.86    1123.80         1234.19         1263.29             999.20
scimark2: Dense LU Matrix Factorization                1861.55    1773.35         1774.03         1827.34            1750.14
scimark2: Jacobi Successive Over-Relaxation            1163.52    1164.63         1670.77         1666.24             724.80
tscp: AI Chess Performance                              631626     599455          624323          624749             303125
x264: H.264 Video Encoding                              155.33     156.34          155.35          153.15             148.51
himeno: Poisson Pressure Solver                        1663.97    1593.09         1532.51         1419.90             635.55
build-imagemagick: Time To Compile                       93.67      78.61           31.94           34.35                  -
build-php: Time To Compile                               32.92      33.30           19.59           21.03              35.72
c-ray: Total Time                                        21.45      17.02           27.46           27.03              27.19
primesieve: 1e12 Prime Number Generation                 79.36      79.24               -          326.85             172.43
smallpt: Global Illumination Renderer; 100 Samples          25         25             140             142                132
encode-mp3: WAV To MP3                                       -          -           12.74           14.45              20.94
ffmpeg: H.264 HD To NTSC DV                              12.86      12.81           12.77           13.18                  -
tachyon: Total Time                                          -          -           10.44           10.98              19.58
apache: Static Web Page Serving                       25743.99   25786.15        25888.95        25295.82           18928.33
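
Because some tests report seconds (fewer is better) and others report throughput (more is better), raw numbers in the overview are not directly comparable across rows. A minimal sketch of one way to normalize them follows, assuming a simple ratio is an acceptable summary; the helper names `speedup` and `compare` are hypothetical and not part of the Phoronix Test Suite, and the three rows are copied from the overview figures above.

```python
def speedup(candidate: float, baseline: float, higher_is_better: bool) -> float:
    """Return candidate's performance relative to baseline (> 1 means candidate wins)."""
    if candidate <= 0 or baseline <= 0:
        raise ValueError("results must be positive")
    # For time-like metrics a smaller number is better, so invert the ratio.
    return candidate / baseline if higher_is_better else baseline / candidate


# (test, GCC 4.8.1 result, LLVM Clang 3.3 result, higher_is_better)
ROWS = [
    ("hmmer (seconds)", 10.30, 10.94, False),
    ("scimark2 Jacobi SOR (Mflops)", 1164.63, 1666.24, True),
    ("c-ray (seconds)", 17.02, 27.03, False),
]


def compare(rows):
    """Map each test name to Clang 3.3 performance relative to GCC 4.8.1."""
    return {name: speedup(clang, gcc, hib) for name, gcc, clang, hib in rows}


if __name__ == "__main__":
    for name, ratio in compare(ROWS).items():
        print(f"{name}: Clang 3.3 is {ratio:.2f}x GCC 4.8.1")
```

On these three rows the ratios come out at roughly 0.94x, 1.43x, and 0.63x, which shows how the lead changes hands between the two compilers from test to test.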

Timed HMMer Search 2.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
  GCC 4.7.3          10.41  SE +/- 0.02, N = 3  -march=core-avx2 -O3
  GCC 4.8.1          10.30  SE +/- 0.00, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.2     11.92  SE +/- 0.00, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.3     10.94  SE +/- 0.18, N = 4  -march=core-avx2 -O3
  AMD FX-8120 @ 4.0  13.07  SE +/- 0.07, N = 3  -O2
  1. (CC) gcc options: -pthread -lhmmer -lsquid -lm

Timed MAFFT Alignment 6.864 - Multiple Sequence Alignment (Seconds, Fewer Is Better)
  GCC 4.7.3          5.27  SE +/- 0.02, N = 3
  GCC 4.8.1          5.17  SE +/- 0.03, N = 3
  LLVM Clang 3.2     5.96  SE +/- 0.10, N = 3
  LLVM Clang 3.3     6.19  SE +/- 0.12, N = 3
  AMD FX-8120 @ 4.0  6.46  SE +/- 0.16, N = 6
  1. (CC) gcc options: -O3 -lm -lpthread

BLAKE2 20121223 - Phoronix Test Suite v4.8.0m0 (Cycles Per Byte, Fewer Is Better)
  GCC 4.7.3          5.32  SE +/- 0.00, N = 3
  GCC 4.8.1          5.30  SE +/- 0.01, N = 3
  LLVM Clang 3.2     7.54  SE +/- 0.11, N = 3
  LLVM Clang 3.3     7.45  SE +/- 0.13, N = 4
  AMD FX-8120 @ 4.0  9.14  SE +/- 0.00, N = 3
  1. (CC) gcc options: -std=gnu99 -O3 -march=native -lcrypto -lz

SciMark 2.0 - Computational Test: Monte Carlo (Mflops, More Is Better)
  GCC 4.7.3          450.21  SE +/- 0.55, N = 4
  GCC 4.8.1          615.32  SE +/- 0.00, N = 4
  LLVM Clang 3.2     615.04  SE +/- 5.67, N = 4
  LLVM Clang 3.3     619.77  SE +/- 0.89, N = 4
  AMD FX-8120 @ 4.0  359.42  SE +/- 0.58, N = 4

SciMark 2.0 - Computational Test: Fast Fourier Transform (Mflops, More Is Better)
  GCC 4.7.3          251.67  SE +/- 0.68, N = 4
  GCC 4.8.1          245.00  SE +/- 0.34, N = 4
  LLVM Clang 3.2     246.79  SE +/- 1.60, N = 4
  LLVM Clang 3.3     237.86  SE +/- 0.98, N = 4
  AMD FX-8120 @ 4.0   70.04  SE +/- 0.63, N = 4

SciMark 2.0 - Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
  GCC 4.7.3          1177.86  SE +/- 0.85, N = 4
  GCC 4.8.1          1123.80  SE +/- 5.13, N = 4
  LLVM Clang 3.2     1234.19  SE +/- 30.06, N = 4
  LLVM Clang 3.3     1263.29  SE +/- 5.09, N = 4
  AMD FX-8120 @ 4.0   999.20  SE +/- 7.67, N = 4

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
  GCC 4.7.3          1861.55  SE +/- 1.88, N = 4
  GCC 4.8.1          1773.35  SE +/- 1.48, N = 4
  LLVM Clang 3.2     1774.03  SE +/- 36.01, N = 4
  LLVM Clang 3.3     1827.34  SE +/- 22.59, N = 4
  AMD FX-8120 @ 4.0  1750.14  SE +/- 6.33, N = 4

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
  GCC 4.7.3          1163.52  SE +/- 1.28, N = 4
  GCC 4.8.1          1164.63  SE +/- 1.11, N = 4
  LLVM Clang 3.2     1670.77  SE +/- 0.00, N = 4
  LLVM Clang 3.3     1666.24  SE +/- 1.85, N = 4
  AMD FX-8120 @ 4.0   724.80  SE +/- 0.99, N = 4

TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
  GCC 4.7.3          631626  SE +/- 0.00, N = 5
  GCC 4.8.1          599455  SE +/- 520.60, N = 5
  LLVM Clang 3.2     624323  SE +/- 141.40, N = 5
  LLVM Clang 3.3     624749  SE +/- 480.04, N = 5
  AMD FX-8120 @ 4.0  303125  SE +/- 190.28, N = 5

x264 2013-06-08 - H.264 Video Encoding (Frames Per Second, More Is Better)
  GCC 4.7.3          155.33  SE +/- 0.33, N = 5  -march=core-avx2
  GCC 4.8.1          156.34  SE +/- 0.40, N = 5  -march=core-avx2
  LLVM Clang 3.2     155.35  SE +/- 0.60, N = 5  -march=core-avx2
  LLVM Clang 3.3     153.15  SE +/- 0.58, N = 5  -march=core-avx2
  AMD FX-8120 @ 4.0  148.51  SE +/- 0.66, N = 5
  1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize
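
The x264 results sit close together, so the reported SE values matter when reading them. The sketch below is one rough way to check whether two results are separated by more than their run-to-run noise. It assumes that SE is the standard error of the mean and that the runs are independent; the function name `clearly_different` and the threshold `k = 2` are illustrative choices, not something the export defines.

```python
import math


def clearly_different(mean_a, se_a, mean_b, se_b, k=2.0):
    """Return True if the gap between two means exceeds k combined standard errors."""
    # Standard errors of independent means combine in quadrature.
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) > k * combined_se


# (frames per second, SE) copied from the x264 result
GCC_473 = (155.33, 0.33)
GCC_481 = (156.34, 0.40)
CLANG_33 = (153.15, 0.58)

if __name__ == "__main__":
    print("GCC 4.7.3 vs GCC 4.8.1:", clearly_different(*GCC_473, *GCC_481))
    print("GCC 4.8.1 vs Clang 3.3:", clearly_different(*GCC_481, *CLANG_33))
```

With k = 2, the 1.01 FPS gap between GCC 4.7.3 and GCC 4.8.1 falls inside the threshold, while the 3.19 FPS gap between GCC 4.8.1 and LLVM Clang 3.3 falls outside it. With only five runs per result this is a coarse screen rather than a formal significance test.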

Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
  GCC 4.7.3          1663.97  SE +/- 0.19, N = 3  -march=core-avx2
  GCC 4.8.1          1593.09  SE +/- 20.45, N = 3  -march=core-avx2
  LLVM Clang 3.2     1532.51  SE +/- 2.02, N = 3  -march=core-avx2
  LLVM Clang 3.3     1419.90  SE +/- 13.80, N = 3  -march=core-avx2
  AMD FX-8120 @ 4.0   635.55  SE +/- 1.68, N = 3
  1. (CC) gcc options: -O3

Timed ImageMagick Compilation 6.8.1-10 - Time To Compile (Seconds, Fewer Is Better)
  GCC 4.7.3       93.67  SE +/- 0.60, N = 3
  GCC 4.8.1       78.61  SE +/- 0.07, N = 3
  LLVM Clang 3.2  31.94  SE +/- 0.12, N = 3
  LLVM Clang 3.3  34.35  SE +/- 0.37, N = 3

Timed PHP Compilation 5.2.9 - Time To Compile (Seconds, Fewer Is Better)
  GCC 4.7.3          32.92  SE +/- 0.14, N = 3
  GCC 4.8.1          33.30  SE +/- 0.46, N = 3
  LLVM Clang 3.2     19.59  SE +/- 0.03, N = 3
  LLVM Clang 3.3     21.03  SE +/- 0.20, N = 3
  AMD FX-8120 @ 4.0  35.72  SE +/- 0.09, N = 3
  Per-result flags (as exported): -march=core-avx2 -O3 -march=core-avx2 -O3 -march=core-avx2 -O3 -lpthread -march=core-avx2 -O3 -O2
  1. (CC) gcc options: -pedantic -ldl -lz -lm

C-Ray 1.1 - Total Time (Seconds, Fewer Is Better)
  GCC 4.7.3          21.45  SE +/- 0.01, N = 3  -march=core-avx2
  GCC 4.8.1          17.02  SE +/- 0.01, N = 3  -march=core-avx2
  LLVM Clang 3.2     27.46  SE +/- 0.02, N = 3  -march=core-avx2
  LLVM Clang 3.3     27.03  SE +/- 0.00, N = 3  -march=core-avx2
  AMD FX-8120 @ 4.0  27.19  SE +/- 0.02, N = 3
  1. (CC) gcc options: -lm -lpthread -O3

Primesieve 4.2 - 1e12 Prime Number Generation (Seconds, Fewer Is Better)
  GCC 4.7.3           79.36  SE +/- 0.06, N = 3
  GCC 4.8.1           79.24  SE +/- 0.08, N = 3
  LLVM Clang 3.3     326.85  SE +/- 2.08, N = 3
  AMD FX-8120 @ 4.0  172.43  SE +/- 0.05, N = 3
  Per-result flags (as exported): -fopenmp -fopenmp -fopenmp
  1. (CXX) g++ options: -O2

Smallpt 1.0 - Global Illumination Renderer; 100 Samples (Seconds, Fewer Is Better)
  GCC 4.7.3           25  SE +/- 0.00, N = 3  -march=core-avx2 -O3
  GCC 4.8.1           25  SE +/- 0.33, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.2     140  SE +/- 0.33, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.3     142  SE +/- 1.33, N = 3  -march=core-avx2 -O3
  AMD FX-8120 @ 4.0  132  SE +/- 0.33, N = 3
  1. (CXX) g++ options: -fopenmp

LAME MP3 Encoding 3.99.3 - WAV To MP3 (Seconds, Fewer Is Better)
  LLVM Clang 3.2     12.74  SE +/- 0.01, N = 5
  LLVM Clang 3.3     14.45  SE +/- 0.01, N = 5
  AMD FX-8120 @ 4.0  20.94  SE +/- 0.02, N = 5
  Per-result flags (as exported): -march=core-avx2 -march=core-avx2 -fomit-frame-pointer -ffast-math -lncurses
  1. (CC) gcc options: -pipe -O3 -lm

FFmpeg 1.1 - H.264 HD To NTSC DV (Seconds, Fewer Is Better)
  GCC 4.7.3       12.86  SE +/- 0.08, N = 3  -fno-tree-vectorize -MF -MT
  GCC 4.8.1       12.81  SE +/- 0.06, N = 3  -fno-tree-vectorize -MF -MT
  LLVM Clang 3.2  12.77  SE +/- 0.04, N = 3  -Qunused-arguments
  LLVM Clang 3.3  13.18  SE +/- 0.12, N = 3  -Qunused-arguments
  1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -march=core-avx2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -MMD

Tachyon 0.98.9 - Total Time (Seconds, Fewer Is Better)
  LLVM Clang 3.2     10.44  SE +/- 0.06, N = 3
  LLVM Clang 3.3     10.98  SE +/- 0.24, N = 6
  AMD FX-8120 @ 4.0  19.58  SE +/- 0.02, N = 3
  1. (CC) gcc options: -m32 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread

Apache Benchmark 2.4.3 - Static Web Page Serving (Requests Per Second, More Is Better)
  GCC 4.7.3          25743.99  SE +/- 51.73, N = 3  -march=core-avx2 -O3
  GCC 4.8.1          25786.15  SE +/- 36.32, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.2     25888.95  SE +/- 32.41, N = 3  -march=core-avx2 -O3
  LLVM Clang 3.3     25295.82  SE +/- 280.78, N = 3  -march=core-avx2 -O3
  AMD FX-8120 @ 4.0  18928.33  SE +/- 334.57, N = 3  -O2
  1. (CC) gcc options: -shared -fPIC -pthread

Phoronix Test Suite v10.8.4