LLVM Clang 3.8 Compiler Tuning Optimization Levels

More LLVM Clang 3.8 compiler optimization tuning benchmarks by Michael Larabel.

Configurations tested: -O3 -march=native, -O0, -O1, -O2, -O3, -Oz, -Ofast, -Ofast -march=native

All eight configurations were run on the same system:

  Processor: Intel Xeon E5-2687W v3 @ 3.50GHz (20 Cores), Motherboard: MSI X99S SLI PLUS (MS-7885) v1.0, Chipset: Intel Xeon E7 v3/Xeon, Memory: 16384MB, Disk: PNY CS1211 120GB + 80GB INTEL SSDSCKGW08, Graphics: AMD FirePro V7900 2048MB, Audio: Realtek ALC892, Monitor: ASUS PB278, Network: Intel Connection

  OS: Ubuntu 16.04, Kernel: 4.5.0-040500rc1-generic (x86_64) 20160124, Desktop: Unity 7.4.0, Display Server: X Server 1.17.3, Display Driver: radeon 7.6.1, OpenGL: 3.3 Mesa 11.0.8 Gallium 0.4, Compiler: Clang 3.8.0 (SVN 259676) + LLVM 3.8.0, File-System: ext4, Screen Resolution: 2560x1440

FLAC Audio Encoding 1.3.1
WAV To FLAC
Seconds < Lower Is Better
-O3 -march=native .... 7.11  |======
-O0 .................. 57.44 |=================================================
-O1 .................. 9.12  |========
-O2 .................. 8.65  |=======
-O3 .................. 8.66  |=======
-Oz .................. 12.27 |==========
-Ofast ............... 8.74  |=======
-Ofast -march=native . 7.50  |======

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O3 -march=native .... 1342.94 |============================================
-O0 .................. 284.58  |=========
-O1 .................. 1334.47 |===========================================
-O2 .................. 1359.01 |============================================
-O3 .................. 1354.29 |============================================
-Oz .................. 1002.24 |=================================
-Ofast ............... 1390.24 |=============================================
-Ofast -march=native . 1442.19 |===============================================

GraphicsMagick 1.3.19
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
-O3 -march=native .... 84 |====================================================
-O0 .................. 18 |===========
-O1 .................. 70 |===========================================
-O2 .................. 81 |==================================================
-O3 .................. 80 |==================================================
-Oz .................. 71 |============================================
-Ofast ............... 81 |==================================================
-Ofast -march=native . 84 |====================================================

Timed PHP Compilation 5.2.9
Time To Compile
Seconds < Lower Is Better
-O3 -march=native .... 15.93 |=================================================
-O0 .................. 4.24  |=============
-O1 .................. 8.93  |===========================
-O2 .................. 11.82 |====================================
-O3 .................. 15.79 |================================================
-Oz .................. 9.55  |=============================
-Ofast ............... 15.91 |=================================================
-Ofast -march=native . 16.07 |=================================================

Hierarchical INTegration 1.0
Test: FLOAT
QUIPs > Higher Is Better
-O3 -march=native .... 268576816.54 |===================================
-O0 .................. 112554681.82 |===============
-O1 .................. 266823791.59 |===================================
-O2 .................. 322378022.36 |==========================================
-O3 .................. 321885788.84 |==========================================
-Oz .................. 314508487.41 |=========================================
-Ofast ............... 298029220.70 |=======================================
-Ofast -march=native . 251257760.35 |=================================

LAME MP3 Encoding 3.99.3
WAV To MP3
Seconds < Lower Is Better
-O3 -march=native .... 15.22 |====================
-O0 .................. 37.10 |=================================================
-O1 .................. 15.68 |=====================
-O2 .................. 13.95 |==================
-O3 .................. 14.38 |===================
-Oz .................. 16.88 |======================
-Ofast ............... 14.10 |===================
-Ofast -march=native . 15.20 |====================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
-O3 -march=native .... 12.78 |=======================
-O0 .................. 27.58 |=================================================
-O1 .................. 16.30 |=============================
-O2 .................. 19.81 |===================================
-O3 .................. 13.21 |=======================
-Oz .................. 19.70 |===================================
-Ofast ............... 12.35 |======================
-Ofast -march=native . 11.22 |====================

Timed Apache Compilation 2.4.7
Time To Compile
Seconds < Lower Is Better
-O3 -march=native .... 21.87 |=================================================
-O0 .................. 9.25  |=====================
-O1 .................. 17.83 |========================================
-O2 .................. 21.27 |================================================
-O3 .................. 21.67 |=================================================
-Oz .................. 19.22 |===========================================
-Ofast ............... 21.59 |================================================
-Ofast -march=native . 21.51 |================================================

GraphicsMagick 1.3.19
Operation: Sharpen
Iterations Per Minute > Higher Is Better
-O3 -march=native .... 107 |===============================================
-O0 .................. 61  |===========================
-O1 .................. 111 |=================================================
-O2 .................. 113 |==================================================
-O3 .................. 113 |==================================================
-Oz .................. 108 |================================================
-Ofast ............... 109 |================================================
-Ofast -march=native . 115 |===================================================

GraphicsMagick 1.3.19
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
-O3 -march=native .... 150 |===================================================
-O0 .................. 83  |============================
-O1 .................. 150 |===================================================
-O2 .................. 150 |===================================================
-O3 .................. 150 |===================================================
-Oz .................. 144 |=================================================
-Ofast ............... 150 |===================================================
-Ofast -march=native . 150 |===================================================

GraphicsMagick 1.3.19
Operation: Blur
Iterations Per Minute > Higher Is Better
-O3 -march=native .... 117 |==================================================
-O0 .................. 67  |=============================
-O1 .................. 116 |==================================================
-O2 .................. 119 |===================================================
-O3 .................. 119 |===================================================
-Oz .................. 113 |================================================
-Ofast ............... 118 |===================================================
-Ofast -march=native . 118 |===================================================

PostgreSQL pgbench 9.4.3
Scaling: Buffer Test - Test: Single Thread - Mode: Read Write
TPS > Higher Is Better
-O3 -march=native . 362.58 |=================================================
-O0 ............... 284.51 |======================================
-O1 ............... 348.86 |===============================================
-O2 ............... 379.00 |===================================================
-O3 ............... 351.75 |===============================================
-Oz ............... 357.32 |================================================

Redis 3.0.1
Test: LPUSH
Requests Per Second > Higher Is Better
-O3 -march=native .... 575188.23 |===========================================
-O0 .................. 456864.10 |===================================
-O1 .................. 585266.59 |============================================
-O2 .................. 570314.27 |===========================================
-O3 .................. 587392.23 |============================================
-Oz .................. 589078.56 |============================================
-Ofast ............... 595839.81 |=============================================
-Ofast -march=native . 592685.14 |=============================================

Redis 3.0.1
Test: SET
Requests Per Second > Higher Is Better
-O3 -march=native .... 580208.84 |=============================================
-O0 .................. 468101.81 |====================================
-O1 .................. 575054.87 |============================================
-O2 .................. 579653.25 |=============================================
-O3 .................. 582801.37 |=============================================
-Oz .................. 585438.60 |=============================================
-Ofast ............... 567622.52 |============================================
-Ofast -march=native . 572157.50 |============================================

Redis 3.0.1
Test: SADD
Requests Per Second > Higher Is Better
-O3 -march=native .... 587747.73 |============================================
-O0 .................. 481943.16 |====================================
-O1 .................. 587106.65 |============================================
-O2 .................. 599041.21 |=============================================
-O3 .................. 592835.98 |============================================
-Oz .................. 581864.31 |===========================================
-Ofast ............... 589115.87 |============================================
-Ofast -march=native . 602725.54 |=============================================

Redis 3.0.1
Test: GET
Requests Per Second > Higher Is Better
-O3 -march=native .... 642443.67 |=============================================
-O0 .................. 527734.53 |=====================================
-O1 .................. 625334.40 |===========================================
-O2 .................. 624757.29 |===========================================
-O3 .................. 649056.62 |=============================================
-Oz .................. 632178.77 |============================================
-Ofast ............... 630964.63 |============================================
-Ofast -march=native . 625437.44 |===========================================

Redis 3.0.1
Test: LPOP
Requests Per Second > Higher Is Better
-O3 -march=native .... 624901.33 |===========================================
-O0 .................. 541334.66 |=====================================
-O1 .................. 642965.38 |============================================
-O2 .................. 633016.31 |============================================
-O3 .................. 642800.77 |============================================
-Oz .................. 631316.90 |============================================
-Ofast ............... 652420.02 |=============================================
-Ofast -march=native . 632522.27 |============================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
-O3 -march=native .... 4901.04 |===============================================
-O0 .................. 4133.95 |========================================
-O1 .................. 4187.61 |========================================
-O2 .................. 4137.16 |========================================
-O3 .................. 4127.10 |========================================
-Oz .................. 4181.89 |========================================
-Ofast ............... 4185.32 |========================================
-Ofast -march=native . 4834.50 |==============================================

PostgreSQL pgbench 9.4.3
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Write
TPS > Higher Is Better
-O3 -march=native . 5342.15 |==================================================
-O0 ............... 5176.52 |================================================
-O1 ............... 5209.08 |=================================================
-O2 ............... 5050.83 |===============================================
-O3 ............... 4630.05 |===========================================
-Oz ............... 4911.10 |==============================================

PostgreSQL pgbench 9.4.3
Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS > Higher Is Better
-O3 -march=native . 5092.20 |==================================================
-O0 ............... 5007.75 |=================================================
-O1 ............... 5049.34 |==================================================
-O2 ............... 4968.10 |=================================================
-O3 ............... 4597.65 |=============================================
-Oz ............... 4736.91 |===============================================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
-O3 -march=native .... 2613.77 |============================================
-O0 .................. 2807.96 |===============================================
-O1 .................. 2776.50 |==============================================
-O2 .................. 2798.01 |===============================================
-O3 .................. 2766.53 |==============================================
-Oz .................. 2806.62 |===============================================
-Ofast ............... 2811.52 |===============================================
-Ofast -march=native . 2570.60 |===========================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
-O3 -march=native .... 12 |================================================
-O2 .................. 12 |================================================
-O3 .................. 12 |================================================
-Oz .................. 13 |====================================================
-Ofast ............... 12 |================================================
-Ofast -march=native . 12 |================================================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
-O3 -march=native .... 1918.30 |===============================================
-O0 .................. 1803.18 |============================================
-O1 .................. 1807.18 |============================================
-O2 .................. 1802.36 |============================================
-O3 .................. 1793.50 |============================================
-Oz .................. 1812.18 |============================================
-Ofast ............... 1814.25 |============================================
-Ofast -march=native . 1893.59 |==============================================

Apache Benchmark 2.4.7
Static Web Page Serving
Requests Per Second > Higher Is Better
-O3 -march=native .... 23355.46 |==============================================
-O0 .................. 23021.42 |=============================================
-O1 .................. 23380.23 |==============================================
-O2 .................. 23360.85 |==============================================
-O3 .................. 23283.07 |==============================================
-Oz .................. 23342.16 |==============================================
-Ofast ............... 23470.93 |==============================================
-Ofast -march=native . 22668.60 |============================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
-O3 -march=native .... 362.39 |================================================
-O0 .................. 358.28 |===============================================
-O1 .................. 357.19 |===============================================
-O2 .................. 363.59 |================================================
-O3 .................. 358.05 |===============================================
-Oz .................. 357.50 |===============================================
-Ofast ............... 360.54 |================================================
-Ofast -march=native . 359.77 |===============================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
-O3 -march=native .... 235.58 |===============================================
-O0 .................. 238.14 |================================================
-O1 .................. 237.83 |================================================
-O2 .................. 234.76 |===============================================
-O3 .................. 237.63 |================================================
-Oz .................. 237.11 |================================================
-Ofast ............... 236.54 |================================================
-Ofast -march=native . 237.83 |================================================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
-O3 -march=native .... 1478.72 |===============================================
-O0 .................. 1477.57 |===============================================
-O1 .................. 1476.74 |===============================================
-O2 .................. 1478.30 |===============================================
-O3 .................. 1478.19 |===============================================
-Oz .................. 1477.78 |===============================================
-Ofast ............... 1477.31 |===============================================
-Ofast -march=native . 1465.27 |===============================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
-O3 -march=native .... 14.80 |=============================================
-O0 .................. 12.18 |=====================================
-O1 .................. 15.99 |=================================================
-O2 .................. 14.02 |===========================================
-O3 .................. 15.13 |==============================================
-Oz .................. 15.09 |==============================================
-Ofast ............... 10.93 |=================================
-Ofast -march=native . 12.12 |=====================================
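
The raw figures are easier to compare once each configuration is expressed relative to the unoptimized -O0 build. The following is a minimal Python sketch of that arithmetic, using the FLAC Audio Encoding and Himeno numbers reported above. The names `flac_seconds`, `himeno_mflops`, and `speedup_vs_baseline` are hypothetical helpers introduced for illustration; they are not part of the Phoronix Test Suite.

```python
# Results copied from the benchmark output above.
# FLAC Audio Encoding 1.3.1: seconds, lower is better.
flac_seconds = {
    "-O3 -march=native": 7.11,
    "-O0": 57.44,
    "-O1": 9.12,
    "-O2": 8.65,
    "-O3": 8.66,
    "-Oz": 12.27,
    "-Ofast": 8.74,
    "-Ofast -march=native": 7.50,
}

# Himeno Benchmark 3.0: MFLOPS, higher is better.
himeno_mflops = {
    "-O3 -march=native": 1342.94,
    "-O0": 284.58,
    "-O1": 1334.47,
    "-O2": 1359.01,
    "-O3": 1354.29,
    "-Oz": 1002.24,
    "-Ofast": 1390.24,
    "-Ofast -march=native": 1442.19,
}


def speedup_vs_baseline(results, baseline="-O0", lower_is_better=True):
    """Return each configuration's speedup factor relative to the baseline.

    A factor above 1.0 means the configuration beat the baseline.
    """
    base = results[baseline]
    if lower_is_better:
        # Times: a smaller value is faster, so divide baseline by value.
        return {cfg: base / value for cfg, value in results.items()}
    # Throughput: a larger value is faster, so divide value by baseline.
    return {cfg: value / base for cfg, value in results.items()}


if __name__ == "__main__":
    flac = speedup_vs_baseline(flac_seconds)
    for cfg, factor in sorted(flac.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{cfg:<22} {factor:5.2f}x")
```

For throughput metrics such as Himeno's MFLOPS, pass `lower_is_better=False`. The two compilation-time tests (PHP and Apache) measure how long Clang takes to build the code, so -O0 is the fastest configuration there and most factors fall below 1.0.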