GCC 10 AMD Threadripper 3960X PGO Optimization

AMD Ryzen Threadripper GCC 10 PGO benchmarks by Michael Larabel for a future article.

GCC 10:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

GCC 10 - PGO:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

SQLite 3.30.1
Threads / Copies: 1
Seconds < Lower Is Better
GCC 10 . 14.18 |===============================================================

HPC Challenge 1.5.0
Test / Class: G-HPL
GFLOPS > Higher Is Better
GCC 10 ....... 63.63 |=========================================================
GCC 10 - PGO . 63.48 |=========================================================

HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOPS > Higher Is Better
GCC 10 ....... 10.49 |========================================================
GCC 10 - PGO . 10.64 |=========================================================

HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOP/s > Higher Is Better
GCC 10 ....... 10.49 |========================================================
GCC 10 - PGO . 10.64 |=========================================================

HPC Challenge 1.5.0
Test / Class: EP-DGEMM
GFLOPS > Higher Is Better
GCC 10 ....... 32.93 |=========================================================
GCC 10 - PGO . 32.68 |=========================================================

HPC Challenge 1.5.0
Test / Class: G-Ptrans
GB/s > Higher Is Better
GCC 10 ....... 5.47737 |=======================================================
GCC 10 - PGO . 5.52421 |=======================================================

HPC Challenge 1.5.0
Test / Class: EP-STREAM Triad
GB/s > Higher Is Better
GCC 10 ....... 1.79750 |=======================================================
GCC 10 - PGO . 1.79549 |=======================================================

HPC Challenge 1.5.0
Test / Class: G-Random Access
GUP/s > Higher Is Better
GCC 10 ....... 0.14278 |=======================================================
GCC 10 - PGO . 0.14161 |=======================================================

HPC Challenge 1.5.0
Test / Class: Random Ring Latency
usecs < Lower Is Better
GCC 10 ....... 0.45863 |=======================================================
GCC 10 - PGO . 0.45680 |=======================================================

HPC Challenge 1.5.0
Test / Class: Random Ring Bandwidth
GB/s > Higher Is Better
GCC 10 ....... 3.40678 |=======================================================
GCC 10 - PGO . 3.36248 |======================================================

HPC Challenge 1.5.0
Test / Class: Max Ping Pong Bandwidth
MB/s > Higher Is Better
GCC 10 ....... 22977.00 |=====================================================
GCC 10 - PGO . 23248.09 |======================================================

miniFE 2.2
Problem Size: Small
CG Mflops > Higher Is Better
GCC 10 ....... 7740.10 |=======================================================
GCC 10 - PGO . 7728.25 |=======================================================

FFTW 3.3.6
Build: Stock - Size: 1D FFT Size 32
Mflops > Higher Is Better
GCC 10 . 10443 |===============================================================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 32
Mflops > Higher Is Better
GCC 10 . 10512 |===============================================================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
GCC 10 . 6687.3 |==============================================================

FFTW 3.3.6
Build: Float + SSE - Size: 1D FFT Size 32
Mflops > Higher Is Better
GCC 10 . 15396 |===============================================================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 32
Mflops > Higher Is Better
GCC 10 . 45404 |===============================================================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
GCC 10 . 22667 |===============================================================

Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
GCC 10 . 70.01 |===============================================================

QMCPACK 3.8
Total Execution Time - Seconds < Lower Is Better
GCC 10 ....... 1878 |=========================================================
GCC 10 - PGO . 1896 |==========================================================

BYTE Unix Benchmark 3.6
Computational Test: Dhrystone 2
LPS > Higher Is Better
GCC 10 ....... 48055276.3 |====================================================
GCC 10 - PGO . 48511999.7 |====================================================

Crafty 25.2
Elapsed Time
Nodes Per Second > Higher Is Better
GCC 10 ....... 9234824 |=======================================================
GCC 10 - PGO . 9309942 |=======================================================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
GCC 10 ....... 1346651 |================================================
GCC 10 - PGO . 1533348 |=======================================================

MKL-DNN DNNL 1.1
Harness: Deconvolution Batch deconv_1d - Data Type: f32
ms < Lower Is Better
GCC 10 ....... 2.32419 |=======================================================
GCC 10 - PGO . 2.31720 |=======================================================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_alexnet - Data Type: f32
ms < Lower Is Better
GCC 10 ....... 123.99 |========================================================
GCC 10 - PGO . 123.30 |========================================================

MKL-DNN DNNL 1.1
Harness: Recurrent Neural Network Training - Data Type: f32
ms < Lower Is Better
GCC 10 ....... 194.25 |========================================================
GCC 10 - PGO . 195.32 |========================================================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32
ms < Lower Is Better
GCC 10 ....... 52.33 |========================================================
GCC 10 - PGO . 53.41 |=========================================================

TTSIOD 3D Renderer 2.3b
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
GCC 10 . 938.47 |==============================================================

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
GCC 10 ....... 8.567282 |================================================
GCC 10 - PGO . 9.669028 |======================================================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
GCC 10 ....... 4684.30 |=====================================================
GCC 10 - PGO . 4848.09 |=======================================================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
GCC 10 ....... 79359613 |======================================================
GCC 10 - PGO . 78501983 |=====================================================

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
GCC 10 . 16.47 |===============================================================

XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
GCC 10 . 19.87 |===============================================================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
GCC 10 . 7.719 |===============================================================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
GCC 10 . 7.297 |===============================================================

Radiance Benchmark 5.0
Test: Serial
Seconds < Lower Is Better
GCC 10 ....... 555.94 |========================================================
GCC 10 - PGO . 556.48 |========================================================

Radiance Benchmark 5.0
Test: SMP Parallel
Seconds < Lower Is Better
GCC 10 ....... 171.30 |========================================================
GCC 10 - PGO . 172.35 |========================================================

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
GCC 10 ....... 7180.6 |========================================================
GCC 10 - PGO . 7072.4 |=======================================================

ASKAP 2018-11-10
Test: tConvolve MT - Gridding
Million Grid Points Per Second > Higher Is Better
GCC 10 ....... 1947.24 |=======================================================
GCC 10 - PGO . 1947.23 |=======================================================

ASKAP 2018-11-10
Test: tConvolve MT - Degridding
Million Grid Points Per Second > Higher Is Better
GCC 10 ....... 3359.12 |=======================================================
GCC 10 - PGO . 3359.70 |=======================================================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Gridding
Million Grid Points Per Second > Higher Is Better
GCC 10 ....... 5433.80 |=======================================================
GCC 10 - PGO . 5361.35 |======================================================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Degridding
Million Grid Points Per Second > Higher Is Better
GCC 10 ....... 4096.25 |=======================================================
GCC 10 - PGO . 3995.25 |======================================================

GROMACS 2019.4
Water Benchmark
Ns Per Day > Higher Is Better
GCC 10 . 2.505 |===============================================================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
GCC 10 . 669039.84 |===========================================================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only
TPS > Higher Is Better
GCC 10 . 676349.71 |===========================================================

SQLite Speedtest 3.30
Timed Time - Size 1,000
Seconds < Lower Is Better
GCC 10 . 57.26 |===============================================================

Facebook RocksDB 6.3.6
Test: Random Fill
Op/s > Higher Is Better
GCC 10 ....... 938039 |========================================================
GCC 10 - PGO . 921081 |=======================================================

Facebook RocksDB 6.3.6
Test: Random Read
Op/s > Higher Is Better
GCC 10 ....... 145207827 |=====================================================
GCC 10 - PGO . 144768694 |=====================================================

Facebook RocksDB 6.3.6
Test: Sequential Fill
Op/s > Higher Is Better
GCC 10 ....... 1019862 |=======================================================
GCC 10 - PGO . 1020657 |=======================================================

Facebook RocksDB 6.3.6
Test: Random Fill Sync
Op/s > Higher Is Better
GCC 10 ....... 24588 |=========================================================
GCC 10 - PGO . 24588 |=========================================================

Facebook RocksDB 6.3.6
Test: Read While Writing
Op/s > Higher Is Better
GCC 10 ....... 4889956 |=======================================================
GCC 10 - PGO . 4868440 |=======================================================
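
The paired results above can be compared as a percent change that accounts for each test's "Higher Is Better" or "Lower Is Better" direction. The following is a minimal sketch of that arithmetic, not part of the Phoronix Test Suite output: the helper names `pgo_change_percent` and `summarize` are hypothetical, and the four sample rows are copied from the results above purely for illustration.

```python
# Hypothetical helpers for illustration; not part of the Phoronix Test Suite.

def pgo_change_percent(baseline, pgo, higher_is_better):
    """Return the PGO change in percent; a positive value means PGO did better."""
    if higher_is_better:
        return (pgo - baseline) / baseline * 100.0
    # For "Lower Is Better" tests, a smaller PGO value counts as an improvement.
    return (baseline - pgo) / baseline * 100.0


# (test, GCC 10 result, GCC 10 - PGO result, higher_is_better),
# values copied from the results above.
RESULTS = [
    ("TSCP 1.81 (Nodes Per Second)", 1346651, 1533348, True),
    ("ACES DGEMM 1.0 (GFLOP/s)", 8.567282, 9.669028, True),
    ("Himeno Benchmark 3.0 (MFLOPS)", 4684.30, 4848.09, True),
    ("QMCPACK 3.8 (Seconds)", 1878, 1896, False),
]


def summarize(results):
    """Return (test, percent change) pairs rounded to two decimals."""
    return [
        (name, round(pgo_change_percent(base, pgo, higher), 2))
        for name, base, pgo, higher in results
    ]


if __name__ == "__main__":
    for name, change in summarize(RESULTS):
        print(f"{name}: {change:+.2f}%")
```

Tests that list only a GCC 10 result have no PGO counterpart and are left out of such a comparison.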