GCC vs. LLVM Clang 3.8 3.9 Compiler Benchmarking
Benchmarks by Michael Larabel for a future article on Phoronix looking at early GCC 7 compiler performance compared to GCC 6 and GCC 5, and to LLVM Clang 3.8 and Clang 3.9.
HTML result view exported from: https://openbenchmarking.org/result/1609139-LO-GCCCLANG151&grt&sor
Configurations tested: GCC 5.4.0, GCC 6.2.0, GCC 7.0.0 20160904, Clang 3.8.0, Clang 3.9.0

Processor: Intel Xeon E5-2609 v4 @ 1.70GHz (8 Cores)
Motherboard: MSI X99A WORKSTATION (MS-7A54) v1.0
Chipset: Intel Xeon E7 v4/Xeon
Memory: 16384MB
Disk: 3 x 120GB TOSHIBA-TR150
Graphics: LLVMpipe
Audio: Realtek ALC1150
Network: Intel Connection
OS: Ubuntu 16.04
Kernel: 4.8.0-999-generic (x86_64) 20160908
Desktop: Unity 7.4.0
Display Server: X Server 1.18.3
Display Driver: modesetting 1.18.3
OpenGL: 3.3 Mesa 11.2.0 Gallium 0.4
Compiler (one per configuration; the export lists a single value for every other field): GCC 5.4.0, GCC 6.2.0, GCC 7.0.0 20160904, Clang 3.8.0-2ubuntu4, Clang 3.9.0-svn279689-1~exp1
File-System: ext4
Screen Resolution: 1024x768

Environment Details: LIBGL_ALWAYS_SOFTWARE=1
Compiler Details (GCC 5.4.0, GCC 6.2.0, GCC 7.0.0 20160904): --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
Disk Details: CFQ / data=ordered,errors=remount-ro,relatime,rw
Processor Details: Scaling Governor: intel_pstate powersave

Results overview (- = no result for that compiler in the export)

Test | GCC 5.4.0 | GCC 6.2.0 | GCC 7.0.0 20160904 | Clang 3.8.0 | Clang 3.9.0
apache: Static Web Page Serving | 27564.72 | 28383.22 | 29512.13 | 29812.18 | 27344.42
bullet: Raytests | 5.94 | 5.98 | 5.96 | 6.20 | 6.21
bullet: 3000 Fall | 9.13 | 9.11 | 9.01 | 9.57 | 9.83
bullet: 136 Ragdolls | 6.45 | 6.46 | 6.40 | 6.89 | 6.92
c-ray: Total Time | 21.30 | 21.25 | 23.36 | 39.34 | 44.42
ebizzy: Phoronix Test Suite v6.6.0 | 155360 | 155273 | 155692 | 152747 | 138671
fftw: Float + SSE - 1D FFT Size 4096 | 10966 | 10936 | 10951 | 10357 | 10331
fftw: Float + SSE - 2D FFT Size 4096 | 7410.50 | 7515.98 | 7517.94 | 6841.34 | 6923.64
fhourstones: Complex Connect-4 Solving | 6954.67 | 6884.93 | 6905.30 | 7164.37 | 6563.47
encode-flac: WAV To FLAC | 13.45 | 13.44 | 13.45 | 13.26 | 13.15
hint: FLOAT | 177738053.32 | 166219905.04 | 165733740.33 | 141990405.69 | 143252876.20
himeno: Poisson Pressure Solver | 893.82 | 1091.32 | 1095.90 | 748.09 | 841.55
encode-mp3: WAV To MP3 | 22.84 | 22.42 | 22.03 | 26.94 | 27.46
lammps: Rhodopsin Protein | 72.00 | - | - | 63.22 | 62.30
n-queens: Elapsed Time | 57.53 | 56.21 | 52.54 | - | -
openssl: RSA 4096-bit Performance | 569.37 | 569.27 | 570.20 | 566.80 | 511.07
pgbench: Buffer Test - Normal Load - Read Write | 4468.82 | 4411.75 | 4437.25 | 4411.72 | 4356.38
pgbench: Buffer Test - Single Thread - Read Write | 552.55 | 546.21 | 549.49 | 538.44 | 531.79
redis: GET | 1434042.92 | 1408567.41 | 1393442.96 | 1342892.96 | 1348061.04
redis: SET | 969029.10 | 972159.31 | 965269.60 | 925939.11 | 933433.87
scimark2: Composite | 695.70 | 738.21 | 800.64 | 1111.78 | 1115.00
scimark2: Monte Carlo | 304.10 | 304.07 | 304.03 | 126.46 | 126.43
scimark2: Fast Fourier Transform | 228.18 | 232.19 | 239.27 | 223.71 | 226.01
scimark2: Sparse Matrix Multiply | 1047.40 | 1280.02 | 1290.94 | 1414.48 | 1448.22
scimark2: Dense LU Matrix Factorization | 1327.15 | 1302.74 | 1596.90 | 2936.85 | 2917.96
scimark2: Jacobi Successive Over-Relaxation | 571.65 | 572.05 | 572.06 | 857.42 | 856.35
smallpt: Global Illumination Renderer; 100 Samples | 47 | 47 | 30 | - | -
hmmer: Pfam Database Search | 12.18 | 12.18 | 12.22 | 12.37 | 13.76
build-imagemagick: Time To Compile | 67.21 | 94.51 | 81.27 | 74.34 | 78.45
mafft: Multiple Sequence Alignment | 5.99 | 6.41 | 6.45 | 7.74 | 7.27
build-php: Time To Compile | 35.31 | 36.25 | 36.52 | 28.45 | 35.25

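As a reading aid, the overview figures can be normalised against one compiler so that "more is better" and "fewer is better" tests point the same way. The sketch below does this for four tests copied from the overview, with GCC 5.4.0 as the baseline. The choice of tests, the baseline, and the helper names (`relative_to_baseline`, `geometric_mean`) are illustrative assumptions and not part of the OpenBenchmarking export.

```python
import math

# Four results copied from the overview: name -> (higher_is_better, {compiler: value}).
# The selection of tests is arbitrary and only for illustration.
RESULTS = {
    "c-ray: Total Time (s)": (False, {
        "GCC 5.4.0": 21.30, "GCC 6.2.0": 21.25, "GCC 7.0.0 20160904": 23.36,
        "Clang 3.8.0": 39.34, "Clang 3.9.0": 44.42}),
    "himeno: Poisson Pressure Solver (MFLOPS)": (True, {
        "GCC 5.4.0": 893.82, "GCC 6.2.0": 1091.32, "GCC 7.0.0 20160904": 1095.90,
        "Clang 3.8.0": 748.09, "Clang 3.9.0": 841.55}),
    "encode-mp3: WAV To MP3 (s)": (False, {
        "GCC 5.4.0": 22.84, "GCC 6.2.0": 22.42, "GCC 7.0.0 20160904": 22.03,
        "Clang 3.8.0": 26.94, "Clang 3.9.0": 27.46}),
    "scimark2: Composite (Mflops)": (True, {
        "GCC 5.4.0": 695.70, "GCC 6.2.0": 738.21, "GCC 7.0.0 20160904": 800.64,
        "Clang 3.8.0": 1111.78, "Clang 3.9.0": 1115.00}),
}

BASELINE = "GCC 5.4.0"  # assumed baseline; any compiler present in every test works


def relative_to_baseline(higher_is_better, values, baseline=BASELINE):
    """Return {compiler: ratio}, where a ratio above 1.0 means faster than the baseline."""
    base = values[baseline]
    return {
        compiler: (value / base if higher_is_better else base / value)
        for compiler, value in values.items()
    }


def geometric_mean(ratios):
    """Geometric mean, the usual way to average performance ratios."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Per-test ratios, then one geometric mean per compiler across the four tests.
per_test = {name: relative_to_baseline(hib, vals) for name, (hib, vals) in RESULTS.items()}
for compiler in RESULTS["c-ray: Total Time (s)"][1]:
    overall = geometric_mean(per_test[name][compiler] for name in per_test)
    print(f"{compiler}: {overall:.3f}x vs {BASELINE}")
```

A geometric mean over four hand-picked tests says nothing about the full 31-test set; extending `RESULTS` with the remaining rows (and skipping tests a compiler did not complete) would be needed for a broader summary.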
Apache Benchmark 2.4.7, Static Web Page Serving (Requests Per Second, More Is Better)
Clang 3.8.0: 29812.18 (SE +/- 52.16, N = 3)
GCC 7.0.0 20160904: 29512.13 (SE +/- 81.85, N = 3)
GCC 6.2.0: 28383.22 (SE +/- 236.47, N = 3)
GCC 5.4.0: 27564.72 (SE +/- 80.36, N = 3)
Clang 3.9.0: 27344.42 (SE +/- 518.63, N = 3)
1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native

Bullet Physics Engine 2.81, Test: Raytests (Seconds, Fewer Is Better)
GCC 5.4.0: 5.94 (SE +/- 0.00, N = 3)
GCC 7.0.0 20160904: 5.96 (SE +/- 0.02, N = 3)
GCC 6.2.0: 5.98 (SE +/- 0.01, N = 3)
Clang 3.8.0: 6.20 (SE +/- 0.01, N = 3)
Clang 3.9.0: 6.21 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81, Test: 3000 Fall (Seconds, Fewer Is Better)
GCC 7.0.0 20160904: 9.01 (SE +/- 0.00, N = 3)
GCC 6.2.0: 9.11 (SE +/- 0.02, N = 3)
GCC 5.4.0: 9.13 (SE +/- 0.01, N = 3)
Clang 3.8.0: 9.57 (SE +/- 0.02, N = 3)
Clang 3.9.0: 9.83 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine 2.81, Test: 136 Ragdolls (Seconds, Fewer Is Better)
GCC 7.0.0 20160904: 6.40 (SE +/- 0.00, N = 3)
GCC 5.4.0: 6.45 (SE +/- 0.00, N = 3)
GCC 6.2.0: 6.46 (SE +/- 0.00, N = 3)
Clang 3.8.0: 6.89 (SE +/- 0.01, N = 3)
Clang 3.9.0: 6.92 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

C-Ray 1.1, Total Time (Seconds, Fewer Is Better)
GCC 6.2.0: 21.25 (SE +/- 0.01, N = 3)
GCC 5.4.0: 21.30 (SE +/- 0.01, N = 3)
GCC 7.0.0 20160904: 23.36 (SE +/- 0.01, N = 3)
Clang 3.8.0: 39.34 (SE +/- 0.08, N = 3)
Clang 3.9.0: 44.42 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -lpthread -O3 -march=native

ebizzy 0.3, Phoronix Test Suite v6.6.0 (Records/s, More Is Better)
GCC 7.0.0 20160904: 155692 (SE +/- 360.05, N = 3)
GCC 5.4.0: 155360 (SE +/- 87.83, N = 3)
GCC 6.2.0: 155273 (SE +/- 68.34, N = 3)
Clang 3.8.0: 152747 (SE +/- 2308.58, N = 4)
Clang 3.9.0: 138671 (SE +/- 392.88, N = 3)
1. (CC) gcc options: -pthread -lpthread -O3 -march=native

FFTW 3.3.4, Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better)
GCC 5.4.0: 10966 (SE +/- 68.86, N = 5)
GCC 7.0.0 20160904: 10951 (SE +/- 50.75, N = 5)
GCC 6.2.0: 10936 (SE +/- 44.65, N = 5)
Clang 3.8.0: 10357 (SE +/- 40.65, N = 5)
Clang 3.9.0: 10331 (SE +/- 39.29, N = 5)
1. (CC) gcc options: -O3 -march=native -lm

FFTW 3.3.4, Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
GCC 7.0.0 20160904: 7517.94 (SE +/- 15.43, N = 5)
GCC 6.2.0: 7515.98 (SE +/- 22.70, N = 5)
GCC 5.4.0: 7410.50 (SE +/- 43.96, N = 5)
Clang 3.9.0: 6923.64 (SE +/- 16.42, N = 5)
Clang 3.8.0: 6841.34 (SE +/- 36.82, N = 5)
1. (CC) gcc options: -O3 -march=native -lm

Fhourstones 3.1, Complex Connect-4 Solving (Kpos / sec, More Is Better)
Clang 3.8.0: 7164.37 (SE +/- 0.87, N = 3)
GCC 5.4.0: 6954.67 (SE +/- 5.88, N = 3)
GCC 7.0.0 20160904: 6905.30 (SE +/- 10.00, N = 3)
GCC 6.2.0: 6884.93 (SE +/- 11.54, N = 3)
Clang 3.9.0: 6563.47 (SE +/- 2.59, N = 3)
1. (CC) gcc options: -O3

FLAC Audio Encoding 1.3.1, WAV To FLAC (Seconds, Fewer Is Better)
Clang 3.9.0: 13.15 (SE +/- 0.01, N = 5)
Clang 3.8.0: 13.26 (SE +/- 0.01, N = 5)
GCC 6.2.0: 13.44 (SE +/- 0.01, N = 5)
GCC 5.4.0: 13.45 (SE +/- 0.01, N = 5)
GCC 7.0.0 20160904: 13.45 (SE +/- 0.01, N = 5)
Per-result extra flags (as exported; not attributable to specific results): -fvisibility=hidden -fvisibility=hidden -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -lm

Hierarchical INTegration 1.0, Test: FLOAT (QUIPs, More Is Better)
GCC 5.4.0: 177738053.32 (SE +/- 355658.03, N = 3)
GCC 6.2.0: 166219905.04 (SE +/- 165686.10, N = 3)
GCC 7.0.0 20160904: 165733740.33 (SE +/- 291268.73, N = 3)
Clang 3.9.0: 143252876.20 (SE +/- 131098.46, N = 3)
Clang 3.8.0: 141990405.69 (SE +/- 15968.48, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Himeno Benchmark 3.0, Poisson Pressure Solver (MFLOPS, More Is Better)
GCC 7.0.0 20160904: 1095.90 (SE +/- 0.65, N = 3)
GCC 6.2.0: 1091.32 (SE +/- 1.34, N = 3)
GCC 5.4.0: 893.82 (SE +/- 0.49, N = 3)
Clang 3.9.0: 841.55 (SE +/- 1.07, N = 3)
Clang 3.8.0: 748.09 (SE +/- 0.07, N = 3)
1. (CC) gcc options: -O3 -march=native -mavx2

LAME MP3 Encoding 3.99.3, WAV To MP3 (Seconds, Fewer Is Better)
GCC 7.0.0 20160904: 22.03 (SE +/- 0.01, N = 5)
GCC 6.2.0: 22.42 (SE +/- 0.03, N = 5)
GCC 5.4.0: 22.84 (SE +/- 0.01, N = 5)
Clang 3.8.0: 26.94 (SE +/- 0.04, N = 5)
Clang 3.9.0: 27.46 (SE +/- 0.04, N = 5)
1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm

LAMMPS Molecular Dynamics Simulator 1.0, Test: Rhodopsin Protein (Loop Time, Fewer Is Better)
Clang 3.9.0: 62.30 (SE +/- 0.09, N = 3)
Clang 3.8.0: 63.22 (SE +/- 0.14, N = 3)
GCC 5.4.0: 72.00 (SE +/- 0.12, N = 3)
1. (CXX) g++ options: -lfftw -lmpich

N-Queens 1.0, Elapsed Time (Seconds, Fewer Is Better)
GCC 7.0.0 20160904: 52.54 (SE +/- 0.01, N = 3)
GCC 6.2.0: 56.21 (SE +/- 0.01, N = 3)
GCC 5.4.0: 57.53 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -static -fopenmp -O3 -march=native

OpenSSL 1.0.1g, RSA 4096-bit Performance (Signs Per Second, More Is Better)
GCC 7.0.0 20160904: 570.20 (SE +/- 0.25, N = 3)
GCC 5.4.0: 569.37 (SE +/- 0.58, N = 3)
GCC 6.2.0: 569.27 (SE +/- 1.31, N = 3)
Clang 3.8.0: 566.80 (SE +/- 1.57, N = 3)
Clang 3.9.0: 511.07 (SE +/- 2.68, N = 3)
1. (CC) gcc options: -m64 -O3 -lssl -lcrypto -ldl

PostgreSQL pgbench 9.4.3, Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better)
GCC 5.4.0: 4468.82 (SE +/- 33.84, N = 3)
GCC 7.0.0 20160904: 4437.25 (SE +/- 85.01, N = 3)
GCC 6.2.0: 4411.75 (SE +/- 67.37, N = 5)
Clang 3.8.0: 4411.72 (SE +/- 78.63, N = 3)
Clang 3.9.0: 4356.38 (SE +/- 68.11, N = 3)
Per-result extra flags (as exported; not attributable to specific results): -pthreads -mthreads -pthreads -mthreads
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench 9.4.3, Scaling: Buffer Test - Test: Single Thread - Mode: Read Write (TPS, More Is Better)
GCC 5.4.0: 552.55 (SE +/- 5.89, N = 3)
GCC 7.0.0 20160904: 549.49 (SE +/- 0.66, N = 3)
GCC 6.2.0: 546.21 (SE +/- 1.53, N = 3)
Clang 3.8.0: 538.44 (SE +/- 14.62, N = 6)
Clang 3.9.0: 531.79 (SE +/- 1.42, N = 3)
Per-result extra flags (as exported; not attributable to specific results): -pthreads -mthreads -pthreads -mthreads
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Redis 3.0.1, Test: GET (Requests Per Second, More Is Better)
GCC 5.4.0: 1434042.92 (SE +/- 2474.75, N = 3)
GCC 6.2.0: 1408567.41 (SE +/- 9042.38, N = 3)
GCC 7.0.0 20160904: 1393442.96 (SE +/- 5158.54, N = 3)
Clang 3.9.0: 1348061.04 (SE +/- 15364.63, N = 3)
Clang 3.8.0: 1342892.96 (SE +/- 2621.93, N = 3)
Per-result extra flags (as exported; not attributable to specific results): -std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl

Redis 3.0.1, Test: SET (Requests Per Second, More Is Better)
GCC 6.2.0: 972159.31 (SE +/- 3624.82, N = 3)
GCC 5.4.0: 969029.10 (SE +/- 4216.02, N = 3)
GCC 7.0.0 20160904: 965269.60 (SE +/- 2999.32, N = 3)
Clang 3.9.0: 933433.87 (SE +/- 2866.48, N = 3)
Clang 3.8.0: 925939.11 (SE +/- 2474.98, N = 3)
Per-result extra flags (as exported; not attributable to specific results): -std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl

SciMark 2.0, Computational Test: Composite (Mflops, More Is Better)
Clang 3.9.0: 1115.00 (SE +/- 0.69, N = 4)
Clang 3.8.0: 1111.78 (SE +/- 0.09, N = 4)
GCC 7.0.0 20160904: 800.64 (SE +/- 0.09, N = 4)
GCC 6.2.0: 738.21 (SE +/- 0.14, N = 4)
GCC 5.4.0: 695.70 (SE +/- 2.13, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better)
GCC 5.4.0: 304.10 (SE +/- 0.00, N = 4)
GCC 6.2.0: 304.07 (SE +/- 0.00, N = 4)
GCC 7.0.0 20160904: 304.03 (SE +/- 0.00, N = 4)
Clang 3.8.0: 126.46 (SE +/- 0.02, N = 4)
Clang 3.9.0: 126.43 (SE +/- 0.15, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better)
GCC 7.0.0 20160904: 239.27 (SE +/- 0.16, N = 4)
GCC 6.2.0: 232.19 (SE +/- 0.38, N = 4)
GCC 5.4.0: 228.18 (SE +/- 0.13, N = 4)
Clang 3.9.0: 226.01 (SE +/- 1.03, N = 4)
Clang 3.8.0: 223.71 (SE +/- 0.36, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
Clang 3.9.0: 1448.22 (SE +/- 2.31, N = 4)
Clang 3.8.0: 1414.48 (SE +/- 0.32, N = 4)
GCC 7.0.0 20160904: 1290.94 (SE +/- 0.35, N = 4)
GCC 6.2.0: 1280.02 (SE +/- 0.17, N = 4)
GCC 5.4.0: 1047.40 (SE +/- 2.22, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
Clang 3.8.0: 2936.85 (SE +/- 0.55, N = 4)
Clang 3.9.0: 2917.96 (SE +/- 1.74, N = 4)
GCC 7.0.0 20160904: 1596.90 (SE +/- 0.14, N = 4)
GCC 5.4.0: 1327.15 (SE +/- 9.92, N = 4)
GCC 6.2.0: 1302.74 (SE +/- 0.23, N = 4)
1. (CXX) g++ options: -O3 -march=native

SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
Clang 3.8.0: 857.42 (SE +/- 0.01, N = 4)
Clang 3.9.0: 856.35 (SE +/- 0.04, N = 4)
GCC 7.0.0 20160904: 572.06 (SE +/- 0.01, N = 4)
GCC 6.2.0: 572.05 (SE +/- 0.00, N = 4)
GCC 5.4.0: 571.65 (SE +/- 0.41, N = 4)
1. (CXX) g++ options: -O3 -march=native

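SciMark 2.0 reports a composite score alongside five kernel scores. In this result set the composite matches the arithmetic mean of the five kernels to within rounding, which the following sketch checks using the figures from the SciMark sections. The dictionary and function names are illustrative, and the 0.01 Mflops tolerance is an assumption chosen to absorb two-decimal rounding.

```python
# Kernel scores in Mflops, in the order:
# Monte Carlo, Fast Fourier Transform, Sparse Matrix Multiply,
# Dense LU Matrix Factorization, Jacobi Successive Over-Relaxation.
KERNELS = {
    "GCC 5.4.0":          [304.10, 228.18, 1047.40, 1327.15, 571.65],
    "GCC 6.2.0":          [304.07, 232.19, 1280.02, 1302.74, 572.05],
    "GCC 7.0.0 20160904": [304.03, 239.27, 1290.94, 1596.90, 572.06],
    "Clang 3.8.0":        [126.46, 223.71, 1414.48, 2936.85, 857.42],
    "Clang 3.9.0":        [126.43, 226.01, 1448.22, 2917.96, 856.35],
}

# Composite scores as reported in the export.
COMPOSITE = {
    "GCC 5.4.0": 695.70,
    "GCC 6.2.0": 738.21,
    "GCC 7.0.0 20160904": 800.64,
    "Clang 3.8.0": 1111.78,
    "Clang 3.9.0": 1115.00,
}

TOLERANCE = 0.01  # assumed allowance for rounding to two decimals


def kernel_mean(scores):
    """Arithmetic mean of the kernel scores."""
    return sum(scores) / len(scores)


def composite_matches(compiler, tolerance=TOLERANCE):
    """True if the reported composite equals the kernel mean within the tolerance."""
    return abs(kernel_mean(KERNELS[compiler]) - COMPOSITE[compiler]) < tolerance


for compiler in KERNELS:
    mean = kernel_mean(KERNELS[compiler])
    status = "matches" if composite_matches(compiler) else "differs from"
    print(f"{compiler}: kernel mean {mean:.2f} {status} composite {COMPOSITE[compiler]:.2f}")
```

Because the composite is an unweighted mean, the large Clang lead in Dense LU and Jacobi SOR outweighs its much lower Monte Carlo score in the composite ranking.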
Smallpt 1.0, Global Illumination Renderer; 100 Samples (Seconds, Fewer Is Better)
GCC 7.0.0 20160904: 30 (SE +/- 0.00, N = 3)
GCC 5.4.0: 47 (SE +/- 0.33, N = 3)
GCC 6.2.0: 47 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -march=native

Timed HMMer Search 2.3.2, Pfam Database Search (Seconds, Fewer Is Better)
GCC 5.4.0: 12.18 (SE +/- 0.03, N = 3)
GCC 6.2.0: 12.18 (SE +/- 0.01, N = 3)
GCC 7.0.0 20160904: 12.22 (SE +/- 0.02, N = 3)
Clang 3.8.0: 12.37 (SE +/- 0.02, N = 3)
Clang 3.9.0: 13.76 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed ImageMagick Compilation 6.9.0, Time To Compile (Seconds, Fewer Is Better)
GCC 5.4.0: 67.21 (SE +/- 0.01, N = 3)
Clang 3.8.0: 74.34 (SE +/- 0.05, N = 3)
Clang 3.9.0: 78.45 (SE +/- 0.22, N = 3)
GCC 7.0.0 20160904: 81.27 (SE +/- 0.14, N = 3)
GCC 6.2.0: 94.51 (SE +/- 0.17, N = 3)

Timed MAFFT Alignment 6.864, Multiple Sequence Alignment (Seconds, Fewer Is Better)
GCC 5.4.0: 5.99 (SE +/- 0.03, N = 3)
GCC 6.2.0: 6.41 (SE +/- 0.19, N = 6)
GCC 7.0.0 20160904: 6.45 (SE +/- 0.10, N = 6)
Clang 3.9.0: 7.27 (SE +/- 0.17, N = 6)
Clang 3.8.0: 7.74 (SE +/- 0.14, N = 6)
1. (CC) gcc options: -O3 -lm -lpthread

Timed PHP Compilation 5.2.9, Time To Compile (Seconds, Fewer Is Better)
Clang 3.8.0: 28.45 (SE +/- 0.17, N = 3)
Clang 3.9.0: 35.25 (SE +/- 0.40, N = 3)
GCC 5.4.0: 35.31 (SE +/- 0.06, N = 3)
GCC 6.2.0: 36.25 (SE +/- 0.09, N = 3)
GCC 7.0.0 20160904: 36.52 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Phoronix Test Suite v10.8.5