LLVM Clang 3.3 vs. GCC 4.8 - Intel Core-AVX2 Haswell

Intel Core i7-4770K testing with an Intel DH87RL motherboard, looking at GCC 4.7, GCC 4.8, LLVM Clang 3.2, and LLVM Clang 3.3 compiler performance with core-avx2 Haswell optimizations. Intel Core i7 Haswell core-avx2 compiler benchmarks for a future article on Phoronix by Michael Larabel.
HTML result view exported from: https://openbenchmarking.org/result/1306206-SO-LLVMCLANG08&sor .

Tested configurations: GCC 4.7.3, GCC 4.8.1, LLVM Clang 3.2, LLVM Clang 3.3

Processor:          Intel Core i7-4770K @ 3.50GHz (8 Cores)
Motherboard:        Intel DH87RL
Chipset:            Intel 4th Gen Core DRAM
Memory:             15360MB
Disk:               240GB OCZ VERTEX3
Graphics:           Intel Haswell Desktop
Audio:              Intel Haswell HDMI
Monitor:            VA2431
Network:            Intel Connection I217-V
OS:                 Ubuntu 13.10
Kernel:             3.10.0-999-generic (x86_64)
Desktop:            Unity 7.0.0
Display Server:     X Server 1.13.3
Display Driver:     intel 2.21.9
OpenGL:             3.0 Mesa 9.1.3
File-System:        ext4
Screen Resolution:  1920x1080

Compiler (per configuration):
  GCC 4.7.3:       GCC 4.7.3
  GCC 4.8.1:       GCC 4.8.1
  LLVM Clang 3.2:  Clang 3.2 + LLVM 3.2svn
  LLVM Clang 3.3:  Clang 3.3 + LLVM 3.3

Compiler Details
  - GCC 4.7.3: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
  - GCC 4.8.1: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
  - LLVM Clang 3.2: Optimized build; Built Jun 20 2013 (14:54:23); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64
  - LLVM Clang 3.3: Optimized build; Built Jun 20 2013 (12:21:18); Default target: x86_64-unknown-linux-gnu; Host CPU: x86-64

Processor Details
  - Scaling Governor: acpi-cpufreq ondemand

Result overview (a dash marks a test with no result for that compiler)

Test                                                GCC 4.7.3   GCC 4.8.1   LLVM Clang 3.2   LLVM Clang 3.3
hmmer: Pfam Database Search                             10.41       10.30            11.92            10.94
mafft: Multiple Sequence Alignment                       5.27        5.17             5.96             6.19
blake2: Phoronix Test Suite v4.8.0m0                     5.32        5.30             7.54             7.45
scimark2: Monte Carlo                                  450.21      615.32           615.04           619.77
scimark2: Fast Fourier Transform                       251.67      245.00           246.79           237.86
scimark2: Sparse Matrix Multiply                      1177.86     1123.80          1234.19          1263.29
scimark2: Dense LU Matrix Factorization               1861.55     1773.35          1774.03          1827.34
scimark2: Jacobi Successive Over-Relaxation           1163.52     1164.63          1670.77          1666.24
tscp: AI Chess Performance                             631626      599455           624323           624749
x264: H.264 Video Encoding                             155.33      156.34           155.35           153.15
himeno: Poisson Pressure Solver                       1663.97     1593.09          1532.51          1419.90
build-imagemagick: Time To Compile                      93.67       78.61            31.94            34.35
build-php: Time To Compile                              32.92       33.30            19.59            21.03
c-ray: Total Time                                       21.45       17.02            27.46            27.03
primesieve: 1e12 Prime Number Generation                79.36       79.24                -           326.85
smallpt: Global Illumination Renderer; 100 Samples         25          25              140              142
encode-mp3: WAV To MP3                                      -           -            12.74            14.45
ffmpeg: H.264 HD To NTSC DV                             12.86       12.81            12.77            13.18
tachyon: Total Time                                         -           -            10.44            10.98
apache: Static Web Page Serving                      25743.99    25786.15         25888.95         25295.82

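The overview mixes "fewer is better" results (seconds, cycles per byte) with "more is better" results (Mflops, frames per second), so raw differences are easy to misread. A minimal Python sketch of a direction-aware comparison between GCC 4.8.1 and LLVM Clang 3.3 follows. The figures are copied from this export; the function name and the choice of four tests are illustrative assumptions, not part of the Phoronix Test Suite.

```python
# Values copied from the result overview; helper name and test selection
# are illustrative assumptions, not part of the Phoronix Test Suite.
RESULTS = {
    # test: (GCC 4.8.1, LLVM Clang 3.3, higher_is_better)
    "hmmer (s)": (10.30, 10.94, False),
    "scimark2 Jacobi SOR (Mflops)": (1164.63, 1666.24, True),
    "c-ray (s)": (17.02, 27.03, False),
    "build-imagemagick (s)": (78.61, 34.35, False),
}


def clang_advantage_percent(gcc, clang, higher_is_better):
    """Return how much faster Clang is than GCC, in percent.

    A positive value means Clang 3.3 is ahead, a negative value means
    GCC 4.8.1 is ahead. For time-like metrics the ratio is inverted so
    that the sign always has the same meaning.
    """
    if higher_is_better:
        return (clang / gcc - 1.0) * 100.0
    return (gcc / clang - 1.0) * 100.0


for name, (gcc, clang, higher_is_better) in RESULTS.items():
    delta = clang_advantage_percent(gcc, clang, higher_is_better)
    print(f"{name:32s} {delta:+7.1f}%")
```

Inverting the ratio for time-based tests keeps the sign convention uniform, which is one reasonable choice among several; a plain ratio or a geometric mean across tests would be equally defensible.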
Timed HMMer Search 2.3.2 - Pfam Database Search
Seconds, Fewer Is Better
  GCC 4.8.1        10.30   SE +/- 0.00, N = 3
  GCC 4.7.3        10.41   SE +/- 0.02, N = 3
  LLVM Clang 3.3   10.94   SE +/- 0.18, N = 4
  LLVM Clang 3.2   11.92   SE +/- 0.00, N = 3
1. (CC) gcc options: -march=core-avx2 -O3 -pthread -lhmmer -lsquid -lm

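Each result in this export is reported as a mean together with "SE +/- x, N = y", where N is the number of runs. The sketch below shows the conventional standard error of the mean (sample standard deviation divided by the square root of N). Whether the Phoronix Test Suite computes its SE in exactly this way is an assumption here, and the run times are hypothetical because the export does not include per-run data.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical run times in seconds; NOT taken from this export.
runs = [10.9, 11.3, 10.6, 11.0]

print(
    f"mean = {statistics.mean(runs):.2f}, "
    f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}"
)
```

A small SE relative to the gap between two compilers suggests the difference is not just run-to-run noise; where the SE is large (for example the Apache result for LLVM Clang 3.3), small gaps deserve more caution.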
Timed MAFFT Alignment 6.864 - Multiple Sequence Alignment
Seconds, Fewer Is Better
  GCC 4.8.1        5.17   SE +/- 0.03, N = 3
  GCC 4.7.3        5.27   SE +/- 0.02, N = 3
  LLVM Clang 3.2   5.96   SE +/- 0.10, N = 3
  LLVM Clang 3.3   6.19   SE +/- 0.12, N = 3
1. (CC) gcc options: -O3 -lm -lpthread

BLAKE2 20121223 - Phoronix Test Suite v4.8.0m0
Cycles Per Byte, Fewer Is Better
  GCC 4.8.1        5.30   SE +/- 0.01, N = 3
  GCC 4.7.3        5.32   SE +/- 0.00, N = 3
  LLVM Clang 3.3   7.45   SE +/- 0.13, N = 4
  LLVM Clang 3.2   7.54   SE +/- 0.11, N = 3
1. (CC) gcc options: -std=gnu99 -O3 -march=native -lcrypto -lz

SciMark 2.0 - Computational Test: Monte Carlo
Mflops, More Is Better
  LLVM Clang 3.3   619.77   SE +/- 0.89, N = 4
  GCC 4.8.1        615.32   SE +/- 0.00, N = 4
  LLVM Clang 3.2   615.04   SE +/- 5.67, N = 4
  GCC 4.7.3        450.21   SE +/- 0.55, N = 4

SciMark 2.0 - Computational Test: Fast Fourier Transform
Mflops, More Is Better
  GCC 4.7.3        251.67   SE +/- 0.68, N = 4
  LLVM Clang 3.2   246.79   SE +/- 1.60, N = 4
  GCC 4.8.1        245.00   SE +/- 0.34, N = 4
  LLVM Clang 3.3   237.86   SE +/- 0.98, N = 4

SciMark 2.0 - Computational Test: Sparse Matrix Multiply
Mflops, More Is Better
  LLVM Clang 3.3   1263.29   SE +/- 5.09, N = 4
  LLVM Clang 3.2   1234.19   SE +/- 30.06, N = 4
  GCC 4.7.3        1177.86   SE +/- 0.85, N = 4
  GCC 4.8.1        1123.80   SE +/- 5.13, N = 4

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
  GCC 4.7.3        1861.55   SE +/- 1.88, N = 4
  LLVM Clang 3.3   1827.34   SE +/- 22.59, N = 4
  LLVM Clang 3.2   1774.03   SE +/- 36.01, N = 4
  GCC 4.8.1        1773.35   SE +/- 1.48, N = 4

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation
Mflops, More Is Better
  LLVM Clang 3.2   1670.77   SE +/- 0.00, N = 4
  LLVM Clang 3.3   1666.24   SE +/- 1.85, N = 4
  GCC 4.8.1        1164.63   SE +/- 1.11, N = 4
  GCC 4.7.3        1163.52   SE +/- 1.28, N = 4

TSCP 1.81 - AI Chess Performance
Nodes Per Second, More Is Better
  GCC 4.7.3        631626   SE +/- 0.00, N = 5
  LLVM Clang 3.3   624749   SE +/- 480.04, N = 5
  LLVM Clang 3.2   624323   SE +/- 141.40, N = 5
  GCC 4.8.1        599455   SE +/- 520.60, N = 5

x264 2013-06-08 - H.264 Video Encoding
Frames Per Second, More Is Better
  GCC 4.8.1        156.34   SE +/- 0.40, N = 5
  LLVM Clang 3.2   155.35   SE +/- 0.60, N = 5
  GCC 4.7.3        155.33   SE +/- 0.33, N = 5
  LLVM Clang 3.3   153.15   SE +/- 0.58, N = 5
1. (CC) gcc options: -ldl -m64 -lm -lpthread -O3 -ffast-math -march=core-avx2 -std=gnu99 -fomit-frame-pointer -fno-tree-vectorize

Himeno Benchmark 3.0 - Poisson Pressure Solver
MFLOPS, More Is Better
  GCC 4.7.3        1663.97   SE +/- 0.19, N = 3
  GCC 4.8.1        1593.09   SE +/- 20.45, N = 3
  LLVM Clang 3.2   1532.51   SE +/- 2.02, N = 3
  LLVM Clang 3.3   1419.90   SE +/- 13.80, N = 3
1. (CC) gcc options: -O3 -march=core-avx2

Timed ImageMagick Compilation 6.8.1-10 - Time To Compile
Seconds, Fewer Is Better
  LLVM Clang 3.2   31.94   SE +/- 0.12, N = 3
  LLVM Clang 3.3   34.35   SE +/- 0.37, N = 3
  GCC 4.8.1        78.61   SE +/- 0.07, N = 3
  GCC 4.7.3        93.67   SE +/- 0.60, N = 3

Timed PHP Compilation 5.2.9 - Time To Compile
Seconds, Fewer Is Better
  LLVM Clang 3.2   19.59   SE +/- 0.03, N = 3
  LLVM Clang 3.3   21.03   SE +/- 0.20, N = 3
  GCC 4.7.3        32.92   SE +/- 0.14, N = 3
  GCC 4.8.1        33.30   SE +/- 0.46, N = 3
1. (CC) gcc options: -march=core-avx2 -O3 -pedantic -ldl -lz -lm
Per-result extra option (one result, compiler not identified in the export): -lpthread

C-Ray 1.1 - Total Time
Seconds, Fewer Is Better
  GCC 4.8.1        17.02   SE +/- 0.01, N = 3
  GCC 4.7.3        21.45   SE +/- 0.01, N = 3
  LLVM Clang 3.3   27.03   SE +/- 0.00, N = 3
  LLVM Clang 3.2   27.46   SE +/- 0.02, N = 3
1. (CC) gcc options: -lm -lpthread -O3 -march=core-avx2

Primesieve 4.2 - 1e12 Prime Number Generation
Seconds, Fewer Is Better
  GCC 4.8.1        79.24    SE +/- 0.08, N = 3
  GCC 4.7.3        79.36    SE +/- 0.06, N = 3
  LLVM Clang 3.3   326.85   SE +/- 2.08, N = 3
1. (CXX) g++ options: -O2
Per-result extra option (two of the three results, compilers not identified in the export): -fopenmp

Smallpt 1.0 - Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  GCC 4.7.3        25    SE +/- 0.00, N = 3
  GCC 4.8.1        25    SE +/- 0.33, N = 3
  LLVM Clang 3.2   140   SE +/- 0.33, N = 3
  LLVM Clang 3.3   142   SE +/- 1.33, N = 3
1. (CXX) g++ options: -fopenmp -march=core-avx2 -O3

LAME MP3 Encoding 3.99.3 - WAV To MP3
Seconds, Fewer Is Better
  LLVM Clang 3.2   12.74   SE +/- 0.01, N = 5
  LLVM Clang 3.3   14.45   SE +/- 0.01, N = 5
1. (CC) gcc options: -pipe -march=core-avx2 -O3 -lm

FFmpeg 1.1 - H.264 HD To NTSC DV
Seconds, Fewer Is Better
  LLVM Clang 3.2   12.77   SE +/- 0.04, N = 3   (extra options: -Qunused-arguments)
  GCC 4.8.1        12.81   SE +/- 0.06, N = 3   (extra options: -fno-tree-vectorize -MF -MT)
  GCC 4.7.3        12.86   SE +/- 0.08, N = 3   (extra options: -fno-tree-vectorize -MF -MT)
  LLVM Clang 3.3   13.18   SE +/- 0.12, N = 3   (extra options: -Qunused-arguments)
1. (CC) gcc options: -lavdevice -lavfilter -lavformat -lavcodec -lswresample -lswscale -lavutil -ldl -lasound -lSDL -lm -pthread -lbz2 -march=core-avx2 -O3 -std=c99 -fomit-frame-pointer -fno-math-errno -fno-signed-zeros -MMD

Tachyon 0.98.9 - Total Time
Seconds, Fewer Is Better
  LLVM Clang 3.2   10.44   SE +/- 0.06, N = 3
  LLVM Clang 3.3   10.98   SE +/- 0.24, N = 6
1. (CC) gcc options: -m32 -O3 -fomit-frame-pointer -ffast-math -ltachyon -lm -lpthread

Apache Benchmark 2.4.3 - Static Web Page Serving
Requests Per Second, More Is Better
  LLVM Clang 3.2   25888.95   SE +/- 32.41, N = 3
  GCC 4.8.1        25786.15   SE +/- 36.32, N = 3
  GCC 4.7.3        25743.99   SE +/- 51.73, N = 3
  LLVM Clang 3.3   25295.82   SE +/- 280.78, N = 3
1. (CC) gcc options: -shared -fPIC -pthread -march=core-avx2 -O3

Phoronix Test Suite v10.8.5