GCC 9 Compiler Tuning

Intel Core i9-7980XE compiler benchmarks by Michael Larabel for a future article.

-O0, -Og, -O1, -O2, -O2 -ftree-vectorize -ftree-slp-vectorize, -O3, -O3 -march=native (identical system for every configuration):

	Processor: Intel Core i9-7980XE @ 4.20GHz (18 Cores / 36 Threads), Motherboard: ASUS PRIME X299-A (1602 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 16384MB, Disk: 15GB Ultra USB 3.0 + Samsung SSD 970 EVO 500GB, Graphics: NVIDIA NV120 12GB, Audio: Realtek ALC1220, Monitor: ASUS PB278, Network: Intel I219-V

	OS: Clear Linux OS 27030, Kernel: 4.19.13-680.native (x86_64), Desktop: GNOME Shell 3.30.2, Display Server: X Server 1.20.3, Display Driver: nouveau 1.0.15, OpenGL: 4.3 Mesa 19.0.0-devel, Compiler: GCC 9.0.0 20181228 + Clang 7.0.1 + LLVM 7.0.1, File-System: ext4, Screen Resolution: 2560x1440

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
-O0 ....................................... 7.37 |=============================
-Og ....................................... 5.05 |====================
-O1 ....................................... 4.69 |==================
-O2 ....................................... 4.21 |=================
-O2 -ftree-vectorize -ftree-slp-vectorize . 4.18 |================
-O3 ....................................... 4.09 |================
-O3 -march=native ......................... 4.12 |================

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
-O0 ....................................... 696  |=======
-Og ....................................... 1809 |===================
-O1 ....................................... 2094 |======================
-O2 ....................................... 2019 |======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 2357 |=========================
-O3 ....................................... 2453 |==========================
-O3 -march=native ......................... 2710 |=============================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
-O0 ....................................... 152 |=====
-Og ....................................... 301 |=========
-O1 ....................................... 865 |===========================
-O2 ....................................... 933 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 924 |=============================
-O3 ....................................... 944 |=============================
-O3 -march=native ......................... 972 |==============================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
-O0 ....................................... 341 |=================
-Og ....................................... 595 |=============================
-O1 ....................................... 601 |=============================
-O2 ....................................... 612 |==============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 575 |============================
-O3 ....................................... 572 |============================
-O3 -march=native ......................... 584 |=============================

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
-O0 ....................................... 799  |=======
-Og ....................................... 3261 |===========================
-O1 ....................................... 3342 |============================
-O2 ....................................... 3404 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3420 |=============================
-O3 ....................................... 3400 |============================
-O3 -march=native ......................... 3467 |=============================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
-O0 ....................................... 980  |====
-Og ....................................... 3659 |=================
-O1 ....................................... 4415 |====================
-O2 ....................................... 3938 |==================
-O2 -ftree-vectorize -ftree-slp-vectorize . 5632 |==========================
-O3 ....................................... 5742 |==========================
-O3 -march=native ......................... 6373 |=============================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
-O0 ....................................... 1208 |================
-Og ....................................... 1230 |=================
-O1 ....................................... 1250 |=================
-O2 ....................................... 1210 |================
-O2 -ftree-vectorize -ftree-slp-vectorize . 1234 |=================
-O3 ....................................... 1607 |======================
-O3 -march=native ......................... 2156 |=============================

x264 2018-09-25
H.264 Video Encoding
Frames Per Second > Higher Is Better
-O0 ....................................... 96.58  |==================
-Og ....................................... 135.54 |==========================
-O1 ....................................... 140.42 |===========================
-O2 ....................................... 140.00 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 139.00 |==========================
-O3 ....................................... 142.00 |===========================
-O3 -march=native ......................... 139.00 |==========================

x265 2.8
H.265 Video Encoding
Frames Per Second > Higher Is Better
-O0 ....................................... 59.26 |============================
-Og ....................................... 59.31 |============================
-O1 ....................................... 59.12 |============================
-O2 ....................................... 59.07 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 59.06 |============================
-O3 ....................................... 59.12 |============================
-O3 -march=native ......................... 59.91 |============================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O0 ....................................... 452  |====
-Og ....................................... 1515 |==============
-O1 ....................................... 1477 |=============
-O2 ....................................... 3044 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3000 |===========================
-O3 ....................................... 2988 |===========================
-O3 -march=native ......................... 3210 |=============================

ebizzy 0.3
Records/s > Higher Is Better
-O0 ....................................... 598738 |========================
-Og ....................................... 628510 |==========================
-O1 ....................................... 663780 |===========================
-O2 ....................................... 595906 |========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 641913 |==========================
-O3 ....................................... 612542 |=========================
-O3 -march=native ......................... 648338 |==========================

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
-O0 ....................................... 5.82  |======
-Og ....................................... 8.69  |=========
-O1 ....................................... 17.73 |===================
-O2 ....................................... 23.63 |=========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 24.39 |==========================
-O3 ....................................... 26.30 |============================
-O3 -march=native ......................... 26.22 |============================

Timed PHP Compilation 7.1.9
Time To Compile
Seconds < Lower Is Better
-O0 ....................................... 13.65 |=======
-Og ....................................... 17.68 |=========
-O1 ....................................... 22.99 |============
-O2 ....................................... 40.49 |======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 41.22 |======================
-O3 ....................................... 52.07 |============================
-O3 -march=native ......................... 52.55 |============================

C-Ray 1.1
Total Time - 4K, 16 Rays Per Pixel
Seconds < Lower Is Better
-O0 ....................................... 127.58 |===========================
-Og ....................................... 88.50  |===================
-O1 ....................................... 87.42  |===================
-O2 ....................................... 78.69  |=================
-O2 -ftree-vectorize -ftree-slp-vectorize . 79.00  |=================
-O3 ....................................... 44.15  |=========
-O3 -march=native ......................... 33.61  |=======

Smallpt 1.0
Global Illumination Renderer; 128 Samples
Seconds < Lower Is Better
-O0 ....................................... 70.69 |============================
-Og ....................................... 9.26  |====
-O1 ....................................... 12.51 |=====
-O2 ....................................... 11.82 |=====
-O2 -ftree-vectorize -ftree-slp-vectorize . 11.89 |=====
-O3 ....................................... 11.87 |=====
-O3 -march=native ......................... 6.07  |==

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
-O0 ....................................... 65.32 |============================
-Og ....................................... 54.40 |=======================
-O1 ....................................... 56.56 |========================
-O2 ....................................... 53.26 |=======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 51.48 |======================
-O3 ....................................... 50.03 |=====================
-O3 -march=native ......................... 30.81 |=============

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
-O0 ....................................... 3.98 |=============================
-Og ....................................... 3.99 |=============================
-O1 ....................................... 3.97 |=============================
-O2 ....................................... 3.95 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3.99 |=============================
-O3 ....................................... 3.98 |=============================
-O3 -march=native ......................... 3.57 |==========================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
-O0 ....................................... 4.75 |=============================
-Og ....................................... 4.70 |=============================
-O1 ....................................... 4.72 |=============================
-O2 ....................................... 4.68 |=============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 4.72 |=============================
-O3 ....................................... 4.67 |=============================
-O3 -march=native ......................... 3.93 |========================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
-O0 ....................................... 4.22 |=============================
-Og ....................................... 4.09 |============================
-O1 ....................................... 4.14 |============================
-O2 ....................................... 4.09 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 4.22 |=============================
-O3 ....................................... 4.16 |=============================
-O3 -march=native ......................... 3.89 |===========================

XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
-O0 ....................................... 108.34 |===========================
-Og ....................................... 78.08  |===================
-O1 ....................................... 76.44  |===================
-O2 ....................................... 75.01  |===================
-O2 -ftree-vectorize -ftree-slp-vectorize . 75.55  |===================
-O3 ....................................... 72.66  |==================
-O3 -march=native ......................... 73.06  |==================

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
-O0 ....................................... 18.34 |============================
-Og ....................................... 10.54 |================
-O1 ....................................... 10.08 |===============
-O2 ....................................... 10.19 |================
-O2 -ftree-vectorize -ftree-slp-vectorize . 10.31 |================
-O3 ....................................... 10.31 |================
-O3 -march=native ......................... 10.29 |================

dav1d 0.1
Video Input: Summer Nature 4K
Seconds < Lower Is Better
-O0 ....................................... 78.20 |===========================
-Og ....................................... 79.84 |===========================
-O1 ....................................... 78.12 |==========================
-O2 ....................................... 82.55 |============================
-O2 -ftree-vectorize -ftree-slp-vectorize . 81.51 |============================
-O3 ....................................... 80.78 |===========================
-O3 -march=native ......................... 80.95 |===========================

dav1d 0.1
Video Input: Summer Nature 1080p
Seconds < Lower Is Better
-O0 ....................................... 22.57 |============================
-Og ....................................... 19.40 |========================
-O1 ....................................... 19.17 |========================
-O2 ....................................... 18.63 |=======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 20.00 |=========================
-O3 ....................................... 19.75 |=========================
-O3 -march=native ......................... 19.67 |========================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
-O0 ....................................... 68.11 |============================
-Og ....................................... 12.27 |=====
-O1 ....................................... 11.29 |=====
-O2 ....................................... 10.38 |====
-O2 -ftree-vectorize -ftree-slp-vectorize . 10.47 |====
-O3 ....................................... 10.46 |====
-O3 -march=native ......................... 9.24  |====

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
-O0 ....................................... 31.87 |============================
-Og ....................................... 13.64 |============
-O1 ....................................... 11.76 |==========
-O2 ....................................... 11.73 |==========
-O2 -ftree-vectorize -ftree-slp-vectorize . 10.20 |=========
-O3 ....................................... 10.05 |=========
-O3 -march=native ......................... 9.24  |========

m-queens 1.2
Time To Solve
Seconds < Lower Is Better
-O0 ....................................... 104.39 |===========================
-Og ....................................... 57.89  |===============
-O1 ....................................... 50.57  |=============
-O2 ....................................... 49.54  |=============
-O2 -ftree-vectorize -ftree-slp-vectorize . 49.55  |=============
-O3 ....................................... 49.49  |=============
-O3 -march=native ......................... 48.48  |=============

Cpuminer-Opt 3.8.8.1
Algorithm: lbry
kH/s - Hash Speed > Higher Is Better
-Og ....................................... 45080 |=======================
-O1 ....................................... 52880 |============================
-O2 ....................................... 50180 |==========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 53133 |============================
-O3 ....................................... 53793 |============================
-O3 -march=native ......................... 53657 |============================

Cpuminer-Opt 3.8.8.1
Algorithm: skein
kH/s - Hash Speed > Higher Is Better
-Og ....................................... 51990 |=======================
-O1 ....................................... 62157 |============================
-O2 ....................................... 59763 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 62853 |============================
-O3 ....................................... 62230 |============================
-O3 -march=native ......................... 61980 |============================

PostgreSQL pgbench 10.3
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
-O0 ....................................... 314089 |=================
-Og ....................................... 429547 |=======================
-O1 ....................................... 452185 |========================
-O2 ....................................... 495259 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 499438 |===========================
-O3 ....................................... 503005 |===========================
-O3 -march=native ......................... 503111 |===========================

Redis 4.0.8
Test: GET
Requests Per Second > Higher Is Better
-O0 ....................................... 3424738 |==========================
-Og ....................................... 3223612 |========================
-O1 ....................................... 3356679 |=========================
-O2 ....................................... 3060926 |=======================
-O2 -ftree-vectorize -ftree-slp-vectorize . 3148044 |========================
-O3 ....................................... 3229670 |=========================
-O3 -march=native ......................... 3258540 |=========================

Redis 4.0.8
Test: SET
Requests Per Second > Higher Is Better
-O0 ....................................... 2390460 |==========================
-Og ....................................... 2322162 |=========================
-O1 ....................................... 2202227 |========================
-O2 ....................................... 2323211 |=========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 2305436 |=========================
-O3 ....................................... 2349704 |==========================
-O3 -march=native ......................... 2291356 |=========================

NGINX Benchmark 1.9.9
Static Web Page Serving
Requests Per Second > Higher Is Better
-O0 ....................................... 45556 |===========================
-Og ....................................... 37865 |======================
-O1 ....................................... 37172 |======================
-O2 ....................................... 45790 |===========================
-O2 -ftree-vectorize -ftree-slp-vectorize . 47967 |============================
-O3 ....................................... 47523 |============================
-O3 -march=native ......................... 47670 |============================
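
The timings above can be turned into speedup factors relative to -O0. What follows is a minimal Python sketch and not part of the original result file: the RESULTS dictionary and the speedup_vs_baseline function are hypothetical names, and the only data used are a few of the C-Ray 1.1 and Smallpt 1.0 timings reported above. It assumes a lower-is-better metric (seconds); for higher-is-better results the ratio would have to be inverted.

```python
# Seconds (lower is better), copied from the C-Ray 1.1 and Smallpt 1.0 results.
# RESULTS and speedup_vs_baseline are illustrative names, not Phoronix Test Suite APIs.
RESULTS = {
    "C-Ray 1.1": {
        "-O0": 127.58,
        "-O2": 78.69,
        "-O3": 44.15,
        "-O3 -march=native": 33.61,
    },
    "Smallpt 1.0": {
        "-O0": 70.69,
        "-O2": 11.82,
        "-O3": 11.87,
        "-O3 -march=native": 6.07,
    },
}


def speedup_vs_baseline(times, baseline="-O0"):
    """Return baseline_time / time per flag set (valid for lower-is-better metrics)."""
    base = times[baseline]
    return {flags: base / seconds for flags, seconds in times.items()}


if __name__ == "__main__":
    for test, times in RESULTS.items():
        for flags, factor in speedup_vs_baseline(times).items():
            print(f"{test:12s} {flags:20s} {factor:5.2f}x")
```

A single ratio per test keeps each benchmark on its own scale, which matters here because the tests report different units (seconds, Mflops, requests per second).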