GCC vs. LLVM Clang 3.8 3.9 Compiler Benchmarking
Benchmarks by Michael Larabel for a future article on Phoronix looking at early GCC 7 compiler performance compared to GCC 6 and GCC 5, and to LLVM Clang 3.8 and Clang 3.9.
HTML result view exported from: https://openbenchmarking.org/result/1609139-LO-GCCCLANG151&sor&grr .
System details (identical for all five results except the compiler):
  Processor:          Intel Xeon E5-2609 v4 @ 1.70GHz (8 Cores)
  Motherboard:        MSI X99A WORKSTATION (MS-7A54) v1.0
  Chipset:            Intel Xeon E7 v4/Xeon
  Memory:             16384MB
  Disk:               3 x 120GB TOSHIBA-TR150
  Graphics:           LLVMpipe
  Audio:              Realtek ALC1150
  Network:            Intel Connection
  OS:                 Ubuntu 16.04
  Kernel:             4.8.0-999-generic (x86_64) 20160908
  Desktop:            Unity 7.4.0
  Display Server:     X Server 1.18.3
  Display Driver:     modesetting 1.18.3
  OpenGL:             3.3 Mesa 11.2.0 Gallium 0.4
  File-System:        ext4
  Screen Resolution:  1024x768
Compiler, per result:
  GCC 5.4.0:          GCC 5.4.0
  GCC 6.2.0:          GCC 6.2.0
  GCC 7.0.0 20160904: GCC 7.0.0 20160904
  Clang 3.8.0:        Clang 3.8.0-2ubuntu4
  Clang 3.9.0:        Clang 3.9.0-svn279689-1~exp1
Environment Details - LIBGL_ALWAYS_SOFTWARE=1
Compiler Details - GCC 5.4.0, GCC 6.2.0, GCC 7.0.0 20160904: --disable-multilib --enable-checking=release --enable-languages=c,c++,fortran
Disk Details - CFQ / data=ordered,errors=remount-ro,relatime,rw
Processor Details - Scaling Governor: intel_pstate powersave
Result overview (- = no result recorded for that compiler):
Test                                                GCC 5.4.0     GCC 6.2.0     GCC 7.0.0 20160904  Clang 3.8.0   Clang 3.9.0
apache: Static Web Page Serving                     27564.72      28383.22      29512.13            29812.18      27344.42
hint: FLOAT                                         177738053.32  166219905.04  165733740.33        141990405.69  143252876.20
redis: SET                                          969029.10     972159.31     965269.60           925939.11     933433.87
redis: GET                                          1434042.92    1408567.41    1393442.96          1342892.96    1348061.04
pgbench: Buffer Test - Single Thread - Read Write   552.55        546.21        549.49              538.44        531.79
pgbench: Buffer Test - Normal Load - Read Write     4468.82       4411.75       4437.25             4411.72       4356.38
openssl: RSA 4096-bit Performance                   569.37        569.27        570.20              566.80        511.07
n-queens: Elapsed Time                              57.53         56.21         52.54               -             -
encode-mp3: WAV To MP3                              22.84         22.42         22.03               26.94         27.46
encode-flac: WAV To FLAC                            13.45         13.44         13.45               13.26         13.15
bullet: 136 Ragdolls                                6.45          6.46          6.40                6.89          6.92
bullet: 3000 Fall                                   9.13          9.11          9.01                9.57          9.83
bullet: Raytests                                    5.94          5.98          5.96                6.20          6.21
smallpt: Global Illumination Renderer; 100 Samples  47            47            30                  -             -
c-ray: Total Time                                   21.30         21.25         23.36               39.34         44.42
build-php: Time To Compile                          35.31         36.25         36.52               28.45         35.25
build-imagemagick: Time To Compile                  67.21         94.51         81.27               74.34         78.45
ebizzy: Phoronix Test Suite v6.6.0                  155360        155273        155692              152747        138671
himeno: Poisson Pressure Solver                     893.82        1091.32       1095.90             748.09        841.55
scimark2: Jacobi Successive Over-Relaxation         571.65        572.05        572.06              857.42        856.35
scimark2: Dense LU Matrix Factorization             1327.15       1302.74       1596.90             2936.85       2917.96
scimark2: Sparse Matrix Multiply                    1047.40       1280.02       1290.94             1414.48       1448.22
scimark2: Fast Fourier Transform                    228.18        232.19        239.27              223.71        226.01
scimark2: Monte Carlo                               304.10        304.07        304.03              126.46        126.43
scimark2: Composite                                 695.70        738.21        800.64              1111.78       1115.00
fhourstones: Complex Connect-4 Solving              6954.67       6884.93       6905.30             7164.37       6563.47
mafft: Multiple Sequence Alignment                  5.99          6.41          6.45                7.74          7.27
hmmer: Pfam Database Search                         12.18         12.18         12.22               12.37         13.76
fftw: Float + SSE - 2D FFT Size 4096                7410.50       7515.98       7517.94             6841.34       6923.64
fftw: Float + SSE - 1D FFT Size 4096                10966         10936         10951               10357         10331
lammps: Rhodopsin Protein                           72.00         -             -                   63.22         62.30
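The overview mixes tests where more is better (for example MFLOPS) with tests where fewer is better (seconds), so raw ratios cannot be compared directly. The following is a minimal Python sketch, not part of the original export, that normalizes two of the reported results against GCC 5.4.0 so that a value above 1.0 always means "faster than the baseline". The function name `speedup` and the data layout are illustrative assumptions; the numbers are copied from the results in this document.

```python
def speedup(value, baseline, higher_is_better):
    """Return how many times faster `value` is than `baseline` (> 1.0 means faster)."""
    if higher_is_better:
        return value / baseline
    return baseline / value


# (higher_is_better, {compiler: reported result}) for two tests from this document.
RESULTS = {
    "c-ray: Total Time": (False, {
        "GCC 5.4.0": 21.30, "GCC 6.2.0": 21.25, "GCC 7.0.0 20160904": 23.36,
        "Clang 3.8.0": 39.34, "Clang 3.9.0": 44.42,
    }),
    "himeno: Poisson Pressure Solver": (True, {
        "GCC 5.4.0": 893.82, "GCC 6.2.0": 1091.32, "GCC 7.0.0 20160904": 1095.90,
        "Clang 3.8.0": 748.09, "Clang 3.9.0": 841.55,
    }),
}


def relative_to(baseline_name):
    """Build {test: {compiler: speedup versus the baseline compiler}}."""
    table = {}
    for test, (higher_is_better, values) in RESULTS.items():
        base = values[baseline_name]
        table[test] = {
            compiler: speedup(value, base, higher_is_better)
            for compiler, value in values.items()
        }
    return table


if __name__ == "__main__":
    for test, row in relative_to("GCC 5.4.0").items():
        print(test)
        for compiler, factor in row.items():
            print(f"  {compiler}: {factor:.2f}x")
```

Read this way, Clang 3.9.0 runs C-Ray at a bit under half the speed of GCC 5.4.0, while GCC 7.0.0 20160904 is a little over 1.2 times as fast as GCC 5.4.0 on Himeno.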
Apache Benchmark 2.4.7 - Static Web Page Serving (Requests Per Second, More Is Better)
  Clang 3.8.0: 29812.18 (SE +/- 52.16, N = 3)
  GCC 7.0.0 20160904: 29512.13 (SE +/- 81.85, N = 3)
  GCC 6.2.0: 28383.22 (SE +/- 236.47, N = 3)
  GCC 5.4.0: 27564.72 (SE +/- 80.36, N = 3)
  Clang 3.9.0: 27344.42 (SE +/- 518.63, N = 3)
  1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native
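Each result is reported with a standard error (SE) over N runs, which gives a rough sense of whether a gap between two compilers is larger than run-to-run noise. The sketch below is an illustration rather than anything the Phoronix Test Suite itself reports. It assumes the SE values are listed in the same order as the results, treats the runs as independent, and uses a hypothetical helper name, `gap_in_standard_errors`. With only N = 3 runs per compiler it is a heuristic, not a formal significance test.

```python
import math


def gap_in_standard_errors(mean_a, se_a, mean_b, se_b):
    """Difference of two means divided by the standard error of that difference."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / combined_se


# Apache requests per second, copied from the results above (mean, SE).
clang_38 = (29812.18, 52.16)
gcc_7 = (29512.13, 81.85)
gcc_54 = (27564.72, 80.36)
clang_39 = (27344.42, 518.63)

if __name__ == "__main__":
    # Clang 3.8.0 vs. GCC 7.0.0 20160904: a gap of roughly three combined SEs.
    print(gap_in_standard_errors(*clang_38, *gcc_7))
    # GCC 5.4.0 vs. Clang 3.9.0: a gap well under one combined SE.
    print(gap_in_standard_errors(*gcc_54, *clang_39))
```

Under these assumptions, the lead of Clang 3.8.0 over GCC 7.0.0 20160904 is several times the combined standard error, whereas the gap between GCC 5.4.0 and Clang 3.9.0 is smaller than the noise in the Clang 3.9.0 runs.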
Hierarchical INTegration 1.0 - Test: FLOAT (QUIPs, More Is Better)
  GCC 5.4.0: 177738053.32 (SE +/- 355658.03, N = 3)
  GCC 6.2.0: 166219905.04 (SE +/- 165686.10, N = 3)
  GCC 7.0.0 20160904: 165733740.33 (SE +/- 291268.73, N = 3)
  Clang 3.9.0: 143252876.20 (SE +/- 131098.46, N = 3)
  Clang 3.8.0: 141990405.69 (SE +/- 15968.48, N = 3)
  1. (CC) gcc options: -O3 -march=native -lm
Redis 3.0.1 - Test: SET (Requests Per Second, More Is Better)
  GCC 6.2.0: 972159.31 (SE +/- 3624.82, N = 3)
  GCC 5.4.0: 969029.10 (SE +/- 4216.02, N = 3)
  GCC 7.0.0 20160904: 965269.60 (SE +/- 2999.32, N = 3)
  Clang 3.9.0: 933433.87 (SE +/- 2866.48, N = 3)
  Clang 3.8.0: 925939.11 (SE +/- 2474.98, N = 3)
  Per-result options (as listed): -std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl
Redis 3.0.1 - Test: GET (Requests Per Second, More Is Better)
  GCC 5.4.0: 1434042.92 (SE +/- 2474.75, N = 3)
  GCC 6.2.0: 1408567.41 (SE +/- 9042.38, N = 3)
  GCC 7.0.0 20160904: 1393442.96 (SE +/- 5158.54, N = 3)
  Clang 3.9.0: 1348061.04 (SE +/- 15364.63, N = 3)
  Clang 3.8.0: 1342892.96 (SE +/- 2621.93, N = 3)
  Per-result options (as listed): -std=gnu99 -pipe -g3 -O3 -funroll-loops -march=native
  1. (CC) gcc options: -ggdb -rdynamic -lm -pthread -ldl
PostgreSQL pgbench 9.4.3 - Scaling: Buffer Test - Test: Single Thread - Mode: Read Write (TPS, More Is Better)
  GCC 5.4.0: 552.55 (SE +/- 5.89, N = 3)
  GCC 7.0.0 20160904: 549.49 (SE +/- 0.66, N = 3)
  GCC 6.2.0: 546.21 (SE +/- 1.53, N = 3)
  Clang 3.8.0: 538.44 (SE +/- 14.62, N = 6)
  Clang 3.9.0: 531.79 (SE +/- 1.42, N = 3)
  Per-result options (as listed): -pthreads -mthreads; -pthreads -mthreads
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
PostgreSQL pgbench 9.4.3 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better)
  GCC 5.4.0: 4468.82 (SE +/- 33.84, N = 3)
  GCC 7.0.0 20160904: 4437.25 (SE +/- 85.01, N = 3)
  GCC 6.2.0: 4411.75 (SE +/- 67.37, N = 5)
  Clang 3.8.0: 4411.72 (SE +/- 78.63, N = 3)
  Clang 3.9.0: 4356.38 (SE +/- 68.11, N = 3)
  Per-result options (as listed): -pthreads -mthreads; -pthreads -mthreads
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -pthread -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
OpenSSL 1.0.1g - RSA 4096-bit Performance (Signs Per Second, More Is Better)
  GCC 7.0.0 20160904: 570.20 (SE +/- 0.25, N = 3)
  GCC 5.4.0: 569.37 (SE +/- 0.58, N = 3)
  GCC 6.2.0: 569.27 (SE +/- 1.31, N = 3)
  Clang 3.8.0: 566.80 (SE +/- 1.57, N = 3)
  Clang 3.9.0: 511.07 (SE +/- 2.68, N = 3)
  1. (CC) gcc options: -m64 -O3 -lssl -lcrypto -ldl
N-Queens 1.0 - Elapsed Time (Seconds, Fewer Is Better)
  GCC 7.0.0 20160904: 52.54 (SE +/- 0.01, N = 3)
  GCC 6.2.0: 56.21 (SE +/- 0.01, N = 3)
  GCC 5.4.0: 57.53 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -static -fopenmp -O3 -march=native
LAME MP3 Encoding 3.99.3 - WAV To MP3 (Seconds, Fewer Is Better)
  GCC 7.0.0 20160904: 22.03 (SE +/- 0.01, N = 5)
  GCC 6.2.0: 22.42 (SE +/- 0.03, N = 5)
  GCC 5.4.0: 22.84 (SE +/- 0.01, N = 5)
  Clang 3.8.0: 26.94 (SE +/- 0.04, N = 5)
  Clang 3.9.0: 27.46 (SE +/- 0.04, N = 5)
  1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm
FLAC Audio Encoding 1.3.1 - WAV To FLAC (Seconds, Fewer Is Better)
  Clang 3.9.0: 13.15 (SE +/- 0.01, N = 5)
  Clang 3.8.0: 13.26 (SE +/- 0.01, N = 5)
  GCC 6.2.0: 13.44 (SE +/- 0.01, N = 5)
  GCC 5.4.0: 13.45 (SE +/- 0.01, N = 5)
  GCC 7.0.0 20160904: 13.45 (SE +/- 0.01, N = 5)
  Per-result options (as listed): -fvisibility=hidden; -fvisibility=hidden; -fvisibility=hidden
  1. (CXX) g++ options: -O3 -march=native -lm
Bullet Physics Engine 2.81 - Test: 136 Ragdolls (Seconds, Fewer Is Better)
  GCC 7.0.0 20160904: 6.40 (SE +/- 0.00, N = 3)
  GCC 5.4.0: 6.45 (SE +/- 0.00, N = 3)
  GCC 6.2.0: 6.46 (SE +/- 0.00, N = 3)
  Clang 3.8.0: 6.89 (SE +/- 0.01, N = 3)
  Clang 3.9.0: 6.92 (SE +/- 0.01, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: 3000 Fall (Seconds, Fewer Is Better)
  GCC 7.0.0 20160904: 9.01 (SE +/- 0.00, N = 3)
  GCC 6.2.0: 9.11 (SE +/- 0.02, N = 3)
  GCC 5.4.0: 9.13 (SE +/- 0.01, N = 3)
  Clang 3.8.0: 9.57 (SE +/- 0.02, N = 3)
  Clang 3.9.0: 9.83 (SE +/- 0.02, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Bullet Physics Engine 2.81 - Test: Raytests (Seconds, Fewer Is Better)
  GCC 5.4.0: 5.94 (SE +/- 0.00, N = 3)
  GCC 7.0.0 20160904: 5.96 (SE +/- 0.02, N = 3)
  GCC 6.2.0: 5.98 (SE +/- 0.01, N = 3)
  Clang 3.8.0: 6.20 (SE +/- 0.01, N = 3)
  Clang 3.9.0: 6.21 (SE +/- 0.05, N = 3)
  1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU
Smallpt 1.0 - Global Illumination Renderer; 100 Samples (Seconds, Fewer Is Better)
  GCC 7.0.0 20160904: 30 (SE +/- 0.00, N = 3)
  GCC 5.4.0: 47 (SE +/- 0.33, N = 3)
  GCC 6.2.0: 47 (SE +/- 0.88, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -march=native
C-Ray 1.1 - Total Time (Seconds, Fewer Is Better)
  GCC 6.2.0: 21.25 (SE +/- 0.01, N = 3)
  GCC 5.4.0: 21.30 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 23.36 (SE +/- 0.01, N = 3)
  Clang 3.8.0: 39.34 (SE +/- 0.08, N = 3)
  Clang 3.9.0: 44.42 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -lm -lpthread -O3 -march=native
Timed PHP Compilation 5.2.9 - Time To Compile (Seconds, Fewer Is Better)
  Clang 3.8.0: 28.45 (SE +/- 0.17, N = 3)
  Clang 3.9.0: 35.25 (SE +/- 0.40, N = 3)
  GCC 5.4.0: 35.31 (SE +/- 0.06, N = 3)
  GCC 6.2.0: 36.25 (SE +/- 0.09, N = 3)
  GCC 7.0.0 20160904: 36.52 (SE +/- 0.05, N = 3)
  1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm
Timed ImageMagick Compilation 6.9.0 - Time To Compile (Seconds, Fewer Is Better)
  GCC 5.4.0: 67.21 (SE +/- 0.01, N = 3)
  Clang 3.8.0: 74.34 (SE +/- 0.05, N = 3)
  Clang 3.9.0: 78.45 (SE +/- 0.22, N = 3)
  GCC 7.0.0 20160904: 81.27 (SE +/- 0.14, N = 3)
  GCC 6.2.0: 94.51 (SE +/- 0.17, N = 3)
ebizzy 0.3 - Phoronix Test Suite v6.6.0 (Records/s, More Is Better)
  GCC 7.0.0 20160904: 155692 (SE +/- 360.05, N = 3)
  GCC 5.4.0: 155360 (SE +/- 87.83, N = 3)
  GCC 6.2.0: 155273 (SE +/- 68.34, N = 3)
  Clang 3.8.0: 152747 (SE +/- 2308.58, N = 4)
  Clang 3.9.0: 138671 (SE +/- 392.88, N = 3)
  1. (CC) gcc options: -pthread -lpthread -O3 -march=native
Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
  GCC 7.0.0 20160904: 1095.90 (SE +/- 0.65, N = 3)
  GCC 6.2.0: 1091.32 (SE +/- 1.34, N = 3)
  GCC 5.4.0: 893.82 (SE +/- 0.49, N = 3)
  Clang 3.9.0: 841.55 (SE +/- 1.07, N = 3)
  Clang 3.8.0: 748.09 (SE +/- 0.07, N = 3)
  1. (CC) gcc options: -O3 -march=native -mavx2
SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
  Clang 3.8.0: 857.42 (SE +/- 0.01, N = 4)
  Clang 3.9.0: 856.35 (SE +/- 0.04, N = 4)
  GCC 7.0.0 20160904: 572.06 (SE +/- 0.01, N = 4)
  GCC 6.2.0: 572.05 (SE +/- 0.00, N = 4)
  GCC 5.4.0: 571.65 (SE +/- 0.41, N = 4)
  1. (CXX) g++ options: -O3 -march=native
SciMark 2.0 - Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
  Clang 3.8.0: 2936.85 (SE +/- 0.55, N = 4)
  Clang 3.9.0: 2917.96 (SE +/- 1.74, N = 4)
  GCC 7.0.0 20160904: 1596.90 (SE +/- 0.14, N = 4)
  GCC 5.4.0: 1327.15 (SE +/- 9.92, N = 4)
  GCC 6.2.0: 1302.74 (SE +/- 0.23, N = 4)
  1. (CXX) g++ options: -O3 -march=native
SciMark 2.0 - Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
  Clang 3.9.0: 1448.22 (SE +/- 2.31, N = 4)
  Clang 3.8.0: 1414.48 (SE +/- 0.32, N = 4)
  GCC 7.0.0 20160904: 1290.94 (SE +/- 0.35, N = 4)
  GCC 6.2.0: 1280.02 (SE +/- 0.17, N = 4)
  GCC 5.4.0: 1047.40 (SE +/- 2.22, N = 4)
  1. (CXX) g++ options: -O3 -march=native
SciMark 2.0 - Computational Test: Fast Fourier Transform (Mflops, More Is Better)
  GCC 7.0.0 20160904: 239.27 (SE +/- 0.16, N = 4)
  GCC 6.2.0: 232.19 (SE +/- 0.38, N = 4)
  GCC 5.4.0: 228.18 (SE +/- 0.13, N = 4)
  Clang 3.9.0: 226.01 (SE +/- 1.03, N = 4)
  Clang 3.8.0: 223.71 (SE +/- 0.36, N = 4)
  1. (CXX) g++ options: -O3 -march=native
SciMark 2.0 - Computational Test: Monte Carlo (Mflops, More Is Better)
  GCC 5.4.0: 304.10 (SE +/- 0.00, N = 4)
  GCC 6.2.0: 304.07 (SE +/- 0.00, N = 4)
  GCC 7.0.0 20160904: 304.03 (SE +/- 0.00, N = 4)
  Clang 3.8.0: 126.46 (SE +/- 0.02, N = 4)
  Clang 3.9.0: 126.43 (SE +/- 0.15, N = 4)
  1. (CXX) g++ options: -O3 -march=native
SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
  Clang 3.9.0: 1115.00 (SE +/- 0.69, N = 4)
  Clang 3.8.0: 1111.78 (SE +/- 0.09, N = 4)
  GCC 7.0.0 20160904: 800.64 (SE +/- 0.09, N = 4)
  GCC 6.2.0: 738.21 (SE +/- 0.14, N = 4)
  GCC 5.4.0: 695.70 (SE +/- 2.13, N = 4)
  1. (CXX) g++ options: -O3 -march=native
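The SciMark 2.0 composite score is the arithmetic mean of the five kernel scores (Jacobi SOR, Dense LU, Sparse Matrix Multiply, FFT, Monte Carlo). That explains why Clang leads the composite despite trailing badly on Monte Carlo: its large Dense LU and SOR advantages dominate the average. The short Python check below, with variable names chosen only for illustration, recomputes the composite from the per-kernel values reported in this document and compares it with the reported composite. Small differences are expected because the published per-kernel figures are already rounded.

```python
# Per-kernel SciMark 2.0 scores in Mflops, copied from this document, in the order
# Jacobi SOR, Dense LU, Sparse Matrix Multiply, FFT, Monte Carlo.
KERNELS = {
    "GCC 5.4.0": [571.65, 1327.15, 1047.40, 228.18, 304.10],
    "GCC 6.2.0": [572.05, 1302.74, 1280.02, 232.19, 304.07],
    "GCC 7.0.0 20160904": [572.06, 1596.90, 1290.94, 239.27, 304.03],
    "Clang 3.8.0": [857.42, 2936.85, 1414.48, 223.71, 126.46],
    "Clang 3.9.0": [856.35, 2917.96, 1448.22, 226.01, 126.43],
}

# Composite scores as reported in this document.
REPORTED = {
    "GCC 5.4.0": 695.70,
    "GCC 6.2.0": 738.21,
    "GCC 7.0.0 20160904": 800.64,
    "Clang 3.8.0": 1111.78,
    "Clang 3.9.0": 1115.00,
}


def composite(scores):
    """Arithmetic mean of the kernel scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for compiler, scores in KERNELS.items():
        recomputed = composite(scores)
        delta = recomputed - REPORTED[compiler]
        print(f"{compiler}: recomputed {recomputed:.2f}, "
              f"reported {REPORTED[compiler]:.2f}, delta {delta:+.3f}")
```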
Fhourstones 3.1 - Complex Connect-4 Solving (Kpos / sec, More Is Better)
  Clang 3.8.0: 7164.37 (SE +/- 0.87, N = 3)
  GCC 5.4.0: 6954.67 (SE +/- 5.88, N = 3)
  GCC 7.0.0 20160904: 6905.30 (SE +/- 10.00, N = 3)
  GCC 6.2.0: 6884.93 (SE +/- 11.54, N = 3)
  Clang 3.9.0: 6563.47 (SE +/- 2.59, N = 3)
  1. (CC) gcc options: -O3
Timed MAFFT Alignment 6.864 - Multiple Sequence Alignment (Seconds, Fewer Is Better)
  GCC 5.4.0: 5.99 (SE +/- 0.03, N = 3)
  GCC 6.2.0: 6.41 (SE +/- 0.19, N = 6)
  GCC 7.0.0 20160904: 6.45 (SE +/- 0.10, N = 6)
  Clang 3.9.0: 7.27 (SE +/- 0.17, N = 6)
  Clang 3.8.0: 7.74 (SE +/- 0.14, N = 6)
  1. (CC) gcc options: -O3 -lm -lpthread
Timed HMMer Search 2.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
  GCC 5.4.0: 12.18 (SE +/- 0.03, N = 3)
  GCC 6.2.0: 12.18 (SE +/- 0.01, N = 3)
  GCC 7.0.0 20160904: 12.22 (SE +/- 0.02, N = 3)
  Clang 3.8.0: 12.37 (SE +/- 0.02, N = 3)
  Clang 3.9.0: 13.76 (SE +/- 0.04, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm
FFTW 3.3.4 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  GCC 7.0.0 20160904: 7517.94 (SE +/- 15.43, N = 5)
  GCC 6.2.0: 7515.98 (SE +/- 22.70, N = 5)
  GCC 5.4.0: 7410.50 (SE +/- 43.96, N = 5)
  Clang 3.9.0: 6923.64 (SE +/- 16.42, N = 5)
  Clang 3.8.0: 6841.34 (SE +/- 36.82, N = 5)
  1. (CC) gcc options: -O3 -march=native -lm
FFTW 3.3.4 - Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better)
  GCC 5.4.0: 10966 (SE +/- 68.86, N = 5)
  GCC 7.0.0 20160904: 10951 (SE +/- 50.75, N = 5)
  GCC 6.2.0: 10936 (SE +/- 44.65, N = 5)
  Clang 3.8.0: 10357 (SE +/- 40.65, N = 5)
  Clang 3.9.0: 10331 (SE +/- 39.29, N = 5)
  1. (CC) gcc options: -O3 -march=native -lm
LAMMPS Molecular Dynamics Simulator 1.0 - Test: Rhodopsin Protein (Loop Time, Fewer Is Better)
  Clang 3.9.0: 62.30 (SE +/- 0.09, N = 3)
  Clang 3.8.0: 63.22 (SE +/- 0.14, N = 3)
  GCC 5.4.0: 72.00 (SE +/- 0.12, N = 3)
  1. (CXX) g++ options: -lfftw -lmpich
Phoronix Test Suite v10.8.5