Xeon Gold GCC 8.2 RC1 PGO

GCC 8.2 RC1 compiler Profile Guided Optimization (PGO) benchmarks for a future article on Phoronix.com.

-O3 -march=native:

  Processor: 2 x Intel Xeon Gold 6138 @ 3.70GHz (40 Cores / 80 Threads), Motherboard: TYAN S7106 (V1.01 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 96256MB, Disk: 256GB Samsung SSD 850 + 2000GB Seagate ST2000DM006-2DM1 + 2 x 120GB TOSHIBA-TR150, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Intel I210 Gigabit Connection

  OS: Ubuntu 18.04, Kernel: 4.15.0-23-generic (x86_64), Compiler: GCC 8.1.1 20180719, File-System: ext4, Screen Resolution: 1920x1080

-O3 -march=native - PGO:

  Processor: 2 x Intel Xeon Gold 6138 @ 3.70GHz (40 Cores / 80 Threads), Motherboard: TYAN S7106 (V1.01 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 96256MB, Disk: 256GB Samsung SSD 850 + 2000GB Seagate ST2000DM006-2DM1 + 2 x 120GB TOSHIBA-TR150, Graphics: ASPEED ASPEED Family, Monitor: VE228, Network: Intel I210 Gigabit Connection

  OS: Ubuntu 18.04, Kernel: 4.15.0-23-generic (x86_64), Compiler: GCC 8.1.1 20180719, File-System: ext4, Screen Resolution: 1920x1080

SciMark 2.0
Computational Test: Composite
Mflops > Higher Is Better
-O3 -march=native ....... 2316.77 |============================================
-O3 -march=native - PGO . 2190.87 |==========================================

SciMark 2.0
Computational Test: Fast Fourier Transform
Mflops > Higher Is Better
-O3 -march=native ....... 669.07 |=============================================
-O3 -march=native - PGO . 666.03 |=============================================

SciMark 2.0
Computational Test: Jacobi Successive Over-Relaxation
Mflops > Higher Is Better
-O3 -march=native ....... 1788.57 |============================================
-O3 -march=native - PGO . 1765.21 |===========================================

SciMark 2.0
Computational Test: Monte Carlo
Mflops > Higher Is Better
-O3 -march=native ....... 777.44 |=============================================
-O3 -march=native - PGO . 262.32 |===============

SciMark 2.0
Computational Test: Sparse Matrix Multiply
Mflops > Higher Is Better
-O3 -march=native ....... 2869.66 |============================================
-O3 -march=native - PGO . 2862.24 |============================================

SciMark 2.0
Computational Test: Dense LU Matrix Factorization
Mflops > Higher Is Better
-O3 -march=native ....... 5479.09 |============================================
-O3 -march=native - PGO . 5398.53 |===========================================

VP9 libvpx Encoding 1.7.0
vpxenc
Frames Per Second > Higher Is Better
-O3 -march=native ....... 13.24 |==============================================
-O3 -march=native - PGO . 13.21 |==============================================

FFTW 3.3.6
Build: Stock - Size: 1D FFT Size 512
Mflops > Higher Is Better
-O3 -march=native . 9527.23 |==================================================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 512
Mflops > Higher Is Better
-O3 -march=native . 7878.00 |==================================================

PolyBench-C 4.2
Test: 3 Matrix Multiplications
Seconds < Lower Is Better
-O3 -march=native ....... 3.37 |===============================================
-O3 -march=native - PGO . 3.28 |==============================================

PolyBench-C 4.2
Test: Correlation Computation
Seconds < Lower Is Better
-O3 -march=native ....... 5.99 |===============================================
-O3 -march=native - PGO . 5.96 |===============================================

PolyBench-C 4.2
Test: Covariance Computation
Seconds < Lower Is Better
-O3 -march=native ....... 5.92 |===============================================
-O3 -march=native - PGO . 5.86 |===============================================

SQLite 3.22
Timed SQLite Insertions
Seconds < Lower Is Better
-O3 -march=native . 84.89 |====================================================

OpenSSL 1.1.0f
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
-O3 -march=native ....... 7928.90 |============================================
-O3 -march=native - PGO . 7911.87 |============================================

AOBench
Size: 2048 x 2048 - Total Time
Seconds < Lower Is Better
-O3 -march=native ....... 43.38 |==============================================
-O3 -march=native - PGO . 41.24 |============================================

7-Zip Compression 16.02
Compress Speed Test
MIPS > Higher Is Better
-O3 -march=native ....... 142656 |=============================================
-O3 -march=native - PGO . 141223 |=============================================

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
-O3 -march=native . 122.23 |===================================================

m-queens 1.1
Time To Solve
Seconds < Lower Is Better
-O3 -march=native ....... 29.66 |==============================================
-O3 -march=native - PGO . 28.86 |=============================================

C-Ray 1.1
Total Time
Seconds < Lower Is Better
-O3 -march=native ....... 2.65 |===============================================
-O3 -march=native - PGO . 2.20 |=======================================

Smallpt 1.0
Global Illumination Renderer; 100 Samples
Seconds < Lower Is Better
-O3 -march=native ....... 3 |==================================================
-O3 -march=native - PGO . 3 |==================================================

Crafty 25.2
Elapsed Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....... 7336681 |============================================
-O3 -march=native - PGO . 7395646 |============================================

Compile Bench 0.6
Test: Initial Create
MB/s > Higher Is Better
-O3 -march=native ....... 479.95 |=============================================
-O3 -march=native - PGO . 482.28 |=============================================

Compile Bench 0.6
Test: Compile
MB/s > Higher Is Better
-O3 -march=native ....... 1655.64 |============================================
-O3 -march=native - PGO . 1633.75 |===========================================

Compile Bench 0.6
Test: Read Compiled Tree
MB/s > Higher Is Better
-O3 -march=native ....... 2289.46 |============================================
-O3 -march=native - PGO . 2284.67 |============================================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....... 71599364 |=========================================
-O3 -march=native - PGO . 74633445 |===========================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
-O3 -march=native ....... 1239469 |========================================
-O3 -march=native - PGO . 1353383 |============================================

Redis 4.0.8
Test: SET
Requests Per Second > Higher Is Better
-O3 -march=native . 1526558.67 |===============================================

Redis 4.0.8
Test: GET
Requests Per Second > Higher Is Better
-O3 -march=native . 1782445.62 |===============================================

Redis 4.0.8
Test: LPUSH
Requests Per Second > Higher Is Better
-O3 -march=native . 1550944.21 |===============================================

Redis 4.0.8
Test: LPOP
Requests Per Second > Higher Is Better
-O3 -march=native . 1543751.87 |===============================================

Redis 4.0.8
Test: SADD
Requests Per Second > Higher Is Better
-O3 -march=native . 1665771.85 |===============================================

PostgreSQL pgbench 10.3
Scaling: Buffer Test - Test: Single Thread - Mode: Read Write
TPS > Higher Is Better
-O3 -march=native . 366.35 |===================================================

PostgreSQL pgbench 10.3
Scaling: Buffer Test - Test: Single Thread - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native . 17350.80 |=================================================

PostgreSQL pgbench 10.3
Scaling: Buffer Test - Test: Normal Load - Mode: Read Write
TPS > Higher Is Better
-O3 -march=native . 2984.39 |==================================================

PostgreSQL pgbench 10.3
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native . 604019.27 |================================================

libjpeg-turbo tjbench 1.5.3
Test: Decompression Throughput
Megapixels/sec > Higher Is Better
-O3 -march=native . 160.34 |===================================================

Apache Benchmark 2.4.29
Static Web Page Serving
Requests Per Second > Higher Is Better
-O3 -march=native . 21804.00 |=================================================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
-O3 -march=native . 10.36 |====================================================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
-O3 -march=native . 11.05 |====================================================

ebizzy 0.3
Records/s > Higher Is Better
-O3 -march=native ....... 1000303 |===========================================
-O3 -march=native - PGO . 1029708 |============================================

GraphicsMagick 1.3.28
Operation: HWB Color Space
Iterations Per Minute > Higher Is Better
-O3 -march=native . 219 |======================================================

GraphicsMagick 1.3.28
Operation: Blur
Iterations Per Minute > Higher Is Better
-O3 -march=native . 161 |======================================================

GraphicsMagick 1.3.28
Operation: Local Adaptive Thresholding
Iterations Per Minute > Higher Is Better
-O3 -march=native . 88 |=======================================================

GraphicsMagick 1.3.28
Operation: Resizing
Iterations Per Minute > Higher Is Better
-O3 -march=native . 186 |======================================================

GraphicsMagick 1.3.28
Operation: Sharpen
Iterations Per Minute > Higher Is Better
-O3 -march=native . 196 |======================================================

Timed HMMer Search 2.3.2
Pfam Database Search
Seconds < Lower Is Better
-O3 -march=native . 12.68 |====================================================

Bullet Physics Engine 2.81
Test: 3000 Fall
Seconds < Lower Is Better
-O3 -march=native . 4.38 |=====================================================

Bullet Physics Engine 2.81
Test: 1000 Stack
Seconds < Lower Is Better
-O3 -march=native . 4.87 |=====================================================

Bullet Physics Engine 2.81
Test: 136 Ragdolls
Seconds < Lower Is Better
-O3 -march=native . 2.78 |=====================================================

Bullet Physics Engine 2.81
Test: 1000 Convex
Seconds < Lower Is Better
-O3 -march=native . 4.78 |=====================================================

Bullet Physics Engine 2.81
Test: Prim Trimesh
Seconds < Lower Is Better
-O3 -march=native . 1.02 |=====================================================

Bullet Physics Engine 2.81
Test: Convex Trimesh
Seconds < Lower Is Better
-O3 -march=native . 1.21 |=====================================================

Bullet Physics Engine 2.81
Test: Raytests
Seconds < Lower Is Better
-O3 -march=native . 2.74 |=====================================================
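
The paired results above mix "Higher Is Better" and "Lower Is Better" metrics, so comparing the two builds means normalizing for direction. The following is a minimal sketch, not part of the original result file: the helper name `pgo_gain_percent` and the choice of which tests to include are illustrative assumptions, and the numbers are copied from the listings above. For time-based tests it assumes the speedup is expressed as baseline time divided by PGO time.

```python
# Illustrative comparison of a few paired results from the listing above.
# Each entry: (baseline -O3 -march=native, PGO build, higher_is_better).
RESULTS = {
    "SciMark 2.0 Monte Carlo (Mflops)": (777.44, 262.32, True),
    "C-Ray 1.1 (seconds)": (2.65, 2.20, False),
    "TSCP 1.81 (nodes/s)": (1239469, 1353383, True),
    "Stockfish 9 (nodes/s)": (71599364, 74633445, True),
}


def pgo_gain_percent(baseline, pgo, higher_is_better):
    """Return the PGO gain in percent; a negative value means PGO was slower.

    Hypothetical helper: for lower-is-better metrics the gain is computed
    from the time ratio baseline / pgo, which is one common convention.
    """
    if higher_is_better:
        return (pgo / baseline - 1.0) * 100.0
    return (baseline / pgo - 1.0) * 100.0


for name, (base, pgo, higher_is_better) in RESULTS.items():
    gain = pgo_gain_percent(base, pgo, higher_is_better)
    print(f"{name}: {gain:+.1f}%")
```

Tests that list only the -O3 -march=native build have no PGO counterpart in this file and are left out of the comparison.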