GCC vs. LLVM Clang 3.8 3.9 Compiler Benchmarking

Benchmarks by Michael Larabel for a future article on Phoronix looking at early GCC 7 compiler performance compared to GCC 6 and GCC 5, and to LLVM Clang 3.8 and Clang 3.9.
HTML result view exported from: https://openbenchmarking.org/result/1609139-LO-GCCCLANG151 .
System Details

Processor:          Intel Xeon E5-2609 v4 @ 1.70GHz (8 Cores)
Motherboard:        MSI X99A WORKSTATION (MS-7A54) v1.0
Chipset:            Intel Xeon E7 v4/Xeon
Memory:             16384MB
Disk:               3 x 120GB TOSHIBA-TR150
Graphics:           LLVMpipe
Audio:              Realtek ALC1150
Network:            Intel Connection
OS:                 Ubuntu 16.04
Kernel:             4.8.0-999-generic (x86_64) 20160908
Desktop:            Unity 7.4.0
Display Server:     X Server 1.18.3
Display Driver:     modesetting 1.18.3
OpenGL:             3.3 Mesa 11.2.0 Gallium 0.4
File-System:        ext4
Screen Resolution:  1024x768

Compiler, by result:
  GCC 5.4.0:           GCC 5.4.0
  GCC 6.2.0:           GCC 6.2.0
  GCC 7.0.0 20160904:  GCC 7.0.0 20160904
  Clang 3.8.0:         Clang 3.8.0-2ubuntu4
  Clang 3.9.0:         Clang 3.9.0-svn279689-1~exp1

Environment Details - LIBGL_ALWAYS_SOFTWARE=1
Compiler Details - GCC 5.4.0, GCC 6.2.0, GCC 7.0.0 20160904: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
Disk Details - CFQ / data=ordered,errors=remount-ro,relatime,rw
Processor Details - Scaling Governor: intel_pstate powersave

Results Overview

Test                                                GCC 5.4.0     GCC 6.2.0     GCC 7.0.0 20160904  Clang 3.8.0   Clang 3.9.0
lammps: Rhodopsin Protein                           72.00         -             -                   63.22         62.30
fftw: Float + SSE - 1D FFT Size 4096                10966         10936         10951               10357         10331
fftw: Float + SSE - 2D FFT Size 4096                7410.50       7515.98       7517.94             6841.34       6923.64
hmmer: Pfam Database Search                         12.18         12.18         12.22               12.37         13.76
mafft: Multiple Sequence Alignment                  5.99          6.41          6.45                7.74          7.27
fhourstones: Complex Connect-4 Solving              6954.67       6884.93       6905.30             7164.37       6563.47
scimark2: Composite                                 695.70        738.21        800.64              1111.78       1115.00
scimark2: Monte Carlo                               304.10        304.07        304.03              126.46        126.43
scimark2: Fast Fourier Transform                    228.18        232.19        239.27              223.71        226.01
scimark2: Sparse Matrix Multiply                    1047.40       1280.02       1290.94             1414.48       1448.22
scimark2: Dense LU Matrix Factorization             1327.15       1302.74       1596.90             2936.85       2917.96
scimark2: Jacobi Successive Over-Relaxation         571.65        572.05        572.06              857.42        856.35
himeno: Poisson Pressure Solver                     893.82        1091.32       1095.90             748.09        841.55
ebizzy: Phoronix Test Suite v6.6.0                  155360        155273        155692              152747        138671
build-imagemagick: Time To Compile                  67.21         94.51         81.27               74.34         78.45
build-php: Time To Compile                          35.31         36.25         36.52               28.45         35.25
c-ray: Total Time                                   21.30         21.25         23.36               39.34         44.42
smallpt: Global Illumination Renderer; 100 Samples  47            47            30                  -             -
bullet: Raytests                                    5.94          5.98          5.96                6.20          6.21
bullet: 3000 Fall                                   9.13          9.11          9.01                9.57          9.83
bullet: 136 Ragdolls                                6.45          6.46          6.40                6.89          6.92
encode-flac: WAV To FLAC                            13.45         13.44         13.45               13.26         13.15
encode-mp3: WAV To MP3                              22.84         22.42         22.03               26.94         27.46
n-queens: Elapsed Time                              57.53         56.21         52.54               -             -
openssl: RSA 4096-bit Performance                   569.37        569.27        570.20              566.80        511.07
pgbench: Buffer Test - Normal Load - Read Write     4468.82       4411.75       4437.25             4411.72       4356.38
pgbench: Buffer Test - Single Thread - Read Write   552.55        546.21        549.49              538.44        531.79
redis: GET                                          1434042.92    1408567.41    1393442.96          1342892.96    1348061.04
redis: SET                                          969029.10     972159.31     965269.60           925939.11     933433.87
hint: FLOAT                                         177738053.32  166219905.04  165733740.33        141990405.69  143252876.20
apache: Static Web Page Serving                     27564.72      28383.22      29512.13            29812.18      27344.42

LAMMPS Molecular Dynamics Simulator 1.0
Test: Rhodopsin Protein
Loop Time, Fewer Is Better
  GCC 5.4.0: 72.00 (SE +/- 0.12, N = 3)
  Clang 3.8.0: 63.22 (SE +/- 0.14, N = 3)
  Clang 3.9.0: 62.30 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -lfftw -lmpich

FFTW 3.3.4
Build: Float + SSE - Size: 1D FFT Size 4096
Mflops, More Is Better
  GCC 5.4.0: 10966 (SE +/- 68.86, N = 5)
  GCC 6.2.0: 10936 (SE +/- 44.65, N = 5)
  GCC 7.0.0 20160904: 10951 (SE +/- 50.75, N = 5)
  Clang 3.8.0: 10357 (SE +/- 40.65, N = 5)
  Clang 3.9.0: 10331 (SE +/- 39.29, N = 5)
1. (CC) gcc options: -O3 -march=native -lm

FFTW 3.3.4
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops, More Is Better
  GCC 5.4.0: 7410.50 (SE +/- 43.96, N = 5)
  GCC 6.2.0: 7515.98 (SE +/- 22.70, N = 5)
  GCC 7.0.0 20160904: 7517.94 (SE +/- 15.43, N = 5)
  Clang 3.8.0: 6841.34 (SE +/- 36.82, N = 5)
  Clang 3.9.0: 6923.64 (SE +/- 16.42, N = 5)
1. (CC) gcc options: -O3 -march=native -lm

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds, Fewer Is Better
  GCC 5.4.0: 12.18 (SE +/- 0.03, N = 3)
  GCC 6.2.0: 12.18 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 12.22 (SE +/- 0.02, N = 3)
  Clang 3.8.0: 12.37 (SE +/- 0.02, N = 3)
  Clang 3.9.0: 13.76 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed MAFFT Alignment 6.864
Multiple Sequence Alignment
Seconds, Fewer Is Better
  GCC 5.4.0: 5.99 (SE +/- 0.03, N = 3)
  GCC 6.2.0: 6.41 (SE +/- 0.19, N = 6)
  GCC 7.0.0 20160904: 6.45 (SE +/- 0.10, N = 6)
  Clang 3.8.0: 7.74 (SE +/- 0.14, N = 6)
  Clang 3.9.0: 7.27 (SE +/- 0.17, N = 6)
1. (CC) gcc options: -O3 -lm -lpthread

Fhourstones 3.1
Complex Connect-4 Solving
Kpos / sec, More Is Better
  GCC 5.4.0: 6954.67 (SE +/- 5.88, N = 3)
  GCC 6.2.0: 6884.93 (SE +/- 11.54, N = 3)
  GCC 7.0.0 20160904: 6905.30 (SE +/- 10.00, N = 3)
  Clang 3.8.0: 7164.37 (SE +/- 0.87, N = 3)
  Clang 3.9.0: 6563.47 (SE +/- 2.59, N = 3)
1. (CC) gcc options: -O3

SciMark 2.0
Computational Test: Composite
Mflops, More Is Better
  GCC 5.4.0: 695.70 (SE +/- 2.13, N = 4)
  GCC 6.2.0: 738.21 (SE +/- 0.14, N = 4)
  GCC 7.0.0 20160904: 800.64 (SE +/- 0.09, N = 4)
  Clang 3.8.0: 1111.78 (SE +/- 0.09, N = 4)
  Clang 3.9.0: 1115.00 (SE +/- 0.69, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0
Computational Test: Monte Carlo
Mflops, More Is Better
  GCC 5.4.0: 304.10 (SE +/- 0.00, N = 4)
  GCC 6.2.0: 304.07 (SE +/- 0.00, N = 4)
  GCC 7.0.0 20160904: 304.03 (SE +/- 0.00, N = 4)
  Clang 3.8.0: 126.46 (SE +/- 0.02, N = 4)
  Clang 3.9.0: 126.43 (SE +/- 0.15, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops, More Is Better
  GCC 5.4.0: 228.18 (SE +/- 0.13, N = 4)
  GCC 6.2.0: 232.19 (SE +/- 0.38, N = 4)
  GCC 7.0.0 20160904: 239.27 (SE +/- 0.16, N = 4)
  Clang 3.8.0: 223.71 (SE +/- 0.36, N = 4)
  Clang 3.9.0: 226.01 (SE +/- 1.03, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops, More Is Better
  GCC 5.4.0: 1047.40 (SE +/- 2.22, N = 4)
  GCC 6.2.0: 1280.02 (SE +/- 0.17, N = 4)
  GCC 7.0.0 20160904: 1290.94 (SE +/- 0.35, N = 4)
  Clang 3.8.0: 1414.48 (SE +/- 0.32, N = 4)
  Clang 3.9.0: 1448.22 (SE +/- 2.31, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops, More Is Better
  GCC 5.4.0: 1327.15 (SE +/- 9.92, N = 4)
  GCC 6.2.0: 1302.74 (SE +/- 0.23, N = 4)
  GCC 7.0.0 20160904: 1596.90 (SE +/- 0.14, N = 4)
  Clang 3.8.0: 2936.85 (SE +/- 0.55, N = 4)
  Clang 3.9.0: 2917.96 (SE +/- 1.74, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops, More Is Better
  GCC 5.4.0: 571.65 (SE +/- 0.41, N = 4)
  GCC 6.2.0: 572.05 (SE +/- 0.00, N = 4)
  GCC 7.0.0 20160904: 572.06 (SE +/- 0.01, N = 4)
  Clang 3.8.0: 857.42 (SE +/- 0.01, N = 4)
  Clang 3.9.0: 856.35 (SE +/- 0.04, N = 4)
1. (CXX) g++ options: -O3 -march=native

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS, More Is Better
  GCC 5.4.0: 893.82 (SE +/- 0.49, N = 3)
  GCC 6.2.0: 1091.32 (SE +/- 1.34, N = 3)
  GCC 7.0.0 20160904: 1095.90 (SE +/- 0.65, N = 3)
  Clang 3.8.0: 748.09 (SE +/- 0.07, N = 3)
  Clang 3.9.0: 841.55 (SE +/- 1.07, N = 3)
1. (CC) gcc options: -O3 -march=native -mavx2

ebizzy 0.3
Phoronix Test Suite v6.6.0
Records/s, More Is Better
  GCC 5.4.0: 155360 (SE +/- 87.83, N = 3)
  GCC 6.2.0: 155273 (SE +/- 68.34, N = 3)
  GCC 7.0.0 20160904: 155692 (SE +/- 360.05, N = 3)
  Clang 3.8.0: 152747 (SE +/- 2308.58, N = 4)
  Clang 3.9.0: 138671 (SE +/- 392.88, N = 3)
1. (CC) gcc options: -pthread -lpthread -O3 -march=native

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds, Fewer Is Better
  GCC 5.4.0: 67.21 (SE +/- 0.01, N = 3)
  GCC 6.2.0: 94.51 (SE +/- 0.17, N = 3)
  GCC 7.0.0 20160904: 81.27 (SE +/- 0.14, N = 3)
  Clang 3.8.0: 74.34 (SE +/- 0.05, N = 3)
  Clang 3.9.0: 78.45 (SE +/- 0.22, N = 3)

Timed PHP Compilation 5.2.9
Time To Compile
Seconds, Fewer Is Better
  GCC 5.4.0: 35.31 (SE +/- 0.06, N = 3)
  GCC 6.2.0: 36.25 (SE +/- 0.09, N = 3)
  GCC 7.0.0 20160904: 36.52 (SE +/- 0.05, N = 3)
  Clang 3.8.0: 28.45 (SE +/- 0.17, N = 3)
  Clang 3.9.0: 35.25 (SE +/- 0.40, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

C-Ray 1.1
Total Time
Seconds, Fewer Is Better
  GCC 5.4.0: 21.30 (SE +/- 0.01, N = 3)
  GCC 6.2.0: 21.25 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 23.36 (SE +/- 0.01, N = 3)
  Clang 3.8.0: 39.34 (SE +/- 0.08, N = 3)
  Clang 3.9.0: 44.42 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=native

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds, Fewer Is Better
  GCC 5.4.0: 47 (SE +/- 0.33, N = 3)
  GCC 6.2.0: 47 (SE +/- 0.88, N = 3)
  GCC 7.0.0 20160904: 30 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native

Bullet Physics Engine 2.81
Test: Raytests
Seconds, Fewer Is Better
  GCC 5.4.0: 5.94 (SE +/- 0.00, N = 3)
  GCC 6.2.0: 5.98 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 5.96 (SE +/- 0.02, N = 3)
  Clang 3.8.0: 6.20 (SE +/- 0.01, N = 3)
  Clang 3.9.0: 6.21 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds, Fewer Is Better
  GCC 5.4.0: 9.13 (SE +/- 0.01, N = 3)
  GCC 6.2.0: 9.11 (SE +/- 0.02, N = 3)
  GCC 7.0.0 20160904: 9.01 (SE +/- 0.00, N = 3)
  Clang 3.8.0: 9.57 (SE +/- 0.02, N = 3)
  Clang 3.9.0: 9.83 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds, Fewer Is Better
  GCC 5.4.0: 6.45 (SE +/- 0.00, N = 3)
  GCC 6.2.0: 6.46 (SE +/- 0.00, N = 3)
  GCC 7.0.0 20160904: 6.40 (SE +/- 0.00, N = 3)
  Clang 3.8.0: 6.89 (SE +/- 0.01, N = 3)
  Clang 3.9.0: 6.92 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

FLAC Audio Encoding 1.3.1
WAV To FLAC
Seconds, Fewer Is Better
  GCC 5.4.0: 13.45 (SE +/- 0.01, N = 5)
  GCC 6.2.0: 13.44 (SE +/- 0.01, N = 5)
  GCC 7.0.0 20160904: 13.45 (SE +/- 0.01, N = 5)
  Clang 3.8.0: 13.26 (SE +/- 0.01, N = 5)
  Clang 3.9.0: 13.15 (SE +/- 0.01, N = 5)
-fvisibility=hidden -fvisibility=hidden -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -lm

LAME MP3 Encoding 3.99.3
WAV To MP3
Seconds, Fewer Is Better
  GCC 5.4.0: 22.84 (SE +/- 0.01, N = 5)
  GCC 6.2.0: 22.42 (SE +/- 0.03, N = 5)
  GCC 7.0.0 20160904: 22.03 (SE +/- 0.01, N = 5)
  Clang 3.8.0: 26.94 (SE +/- 0.04, N = 5)
  Clang 3.9.0: 27.46 (SE +/- 0.04, N = 5)
1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm

N-Queens 1.0
Elapsed Time
Seconds, Fewer Is Better
  GCC 5.4.0: 57.53 (SE +/- 0.01, N = 3)
  GCC 6.2.0: 56.21 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 52.54 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -static -fopenmp -O3 -march=native

OpenSSL 1.0.1g
RSA 4096-bit Performance
Signs Per Second, More Is Better
  GCC 5.4.0: 569.37 (SE +/- 0.58, N = 3)
  GCC 6.2.0: 569.27 (SE +/- 1.31, N = 3)
  GCC 7.0.0 20160904: 570.20 (SE +/- 0.25, N = 3)
  Clang 3.8.0: 566.80 (SE +/- 1.57, N = 3)
  Clang 3.9.0: 511.07 (SE +/- 2.68, N = 3)
1. (CC) gcc options: -m64 -O3 -lssl -lcrypto -ldl

PostgreSQL pgbench 9.4.3
Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS, More Is Better
  GCC 5.4.0: 4468.82 (SE +/- 33.84, N = 3)
  GCC 6.2.0: 4411.75 (SE +/- 67.37, N = 5)
  GCC 7.0.0 20160904: 4437.25 (SE +/- 85.01, N = 3)
  Clang 3.8.0: 4411.72 (SE +/- 78.63, N = 3)
  Clang 3.9.0: 4356.38 (SE +/- 68.11, N = 3)
-pthreads -mthreads -pthreads -mthreads
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 9.4.3
Scaling: Buffer Test - Test: Single Thread - Mode: Read Write
TPS, More Is Better
  GCC 5.4.0: 552.55 (SE +/- 5.89, N = 3)
  GCC 6.2.0: 546.21 (SE +/- 1.53, N = 3)
  GCC 7.0.0 20160904: 549.49 (SE +/- 0.66, N = 3)
  Clang 3.8.0: 538.44 (SE +/- 14.62, N = 6)
  Clang 3.9.0: 531.79 (SE +/- 1.42, N = 3)
-pthreads -mthreads -pthreads -mthreads
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Redis 3.0.1
Test: GET
Requests Per Second, More Is Better
  GCC 5.4.0: 1434042.92 (SE +/- 2474.75, N = 3)
  GCC 6.2.0: 1408567.41 (SE +/- 9042.38, N = 3)
  GCC 7.0.0 20160904: 1393442.96 (SE +/- 5158.54, N = 3)
  Clang 3.8.0: 1342892.96 (SE +/- 2621.93, N = 3)
  Clang 3.9.0: 1348061.04 (SE +/- 15364.63, N = 3)
-std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl

Redis 3.0.1
Test: SET
Requests Per Second, More Is Better
  GCC 5.4.0: 969029.10 (SE +/- 4216.02, N = 3)
  GCC 6.2.0: 972159.31 (SE +/- 3624.82, N = 3)
  GCC 7.0.0 20160904: 965269.60 (SE +/- 2999.32, N = 3)
  Clang 3.8.0: 925939.11 (SE +/- 2474.98, N = 3)
  Clang 3.9.0: 933433.87 (SE +/- 2866.48, N = 3)
-std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl

Hierarchical INTegration 1.0
Test: FLOAT
QUIPs, More Is Better
  GCC 5.4.0: 177738053.32 (SE +/- 355658.03, N = 3)
  GCC 6.2.0: 166219905.04 (SE +/- 165686.10, N = 3)
  GCC 7.0.0 20160904: 165733740.33 (SE +/- 291268.73, N = 3)
  Clang 3.8.0: 141990405.69 (SE +/- 15968.48, N = 3)
  Clang 3.9.0: 143252876.20 (SE +/- 131098.46, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Apache Benchmark 2.4.7
Static Web Page Serving
Requests Per Second, More Is Better
  GCC 5.4.0: 27564.72 (SE +/- 80.36, N = 3)
  GCC 6.2.0: 28383.22 (SE +/- 236.47, N = 3)
  GCC 7.0.0 20160904: 29512.13 (SE +/- 81.85, N = 3)
  Clang 3.8.0: 29812.18 (SE +/- 52.16, N = 3)
  Clang 3.9.0: 27344.42 (SE +/- 518.63, N = 3)
1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native

Phoronix Test Suite v10.8.5