Threadripper 3960X GCC 10 LTO Testing
AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.

All three configurations (-O3 -march=native -flto; -O3 -march=native -flto -fwhole-program; -O3 -march=native) ran on the same system:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

SQLite 3.30.1
Threads / Copies: 1
Seconds < Lower Is Better
-O3 -march=native -flto ................. 14.23 |==============================
-O3 -march=native -flto -fwhole-program . 14.28 |==============================
-O3 -march=native ....................... 14.20 |==============================

HPC Challenge 1.5.0
Test / Class: G-HPL
GFLOPS > Higher Is Better
-O3 -march=native -flto ................. 63.68 |==============================
-O3 -march=native -flto -fwhole-program . 63.72 |==============================
-O3 -march=native ....................... 63.54 |==============================

HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOPS > Higher Is Better
-O3 -march=native -flto ................. 15.22 |=============================
-O3 -march=native -flto -fwhole-program . 15.98 |==============================
-O3 -march=native ....................... 15.20 |=============================

HPC Challenge 1.5.0
Test / Class: EP-DGEMM
GFLOPS > Higher Is Better
-O3 -march=native -flto ................. 32.76 |=============================
-O3 -march=native -flto -fwhole-program . 33.60 |==============================
-O3 -march=native ....................... 32.50 |=============================

HPC Challenge 1.5.0
Test / Class: G-Ptrans
GB/s > Higher Is Better
-O3 -march=native -flto ................. 5.78791 |============================
-O3 -march=native -flto -fwhole-program . 5.80751 |============================
-O3 -march=native ....................... 5.79663 |============================

HPC Challenge 1.5.0
Test / Class: EP-STREAM Triad
GB/s > Higher Is Better
-O3 -march=native -flto ................. 1.68874 |============================
-O3 -march=native -flto -fwhole-program . 1.69770 |============================
-O3 -march=native ....................... 1.68426 |============================

HPC Challenge 1.5.0
Test / Class: G-Random Access
GUP/s > Higher Is Better
-O3 -march=native -flto ................. 0.16722 |============================
-O3 -march=native -flto -fwhole-program . 0.16679 |============================
-O3 -march=native ....................... 0.16431 |============================

HPC Challenge 1.5.0
Test / Class: Random Ring Latency
usecs < Lower Is Better
-O3 -march=native -flto ................. 0.45234 |===========================
-O3 -march=native -flto -fwhole-program . 0.46971 |============================
-O3 -march=native ....................... 0.45479 |===========================

HPC Challenge 1.5.0
Test / Class: Random Ring Bandwidth
GB/s > Higher Is Better
-O3 -march=native -flto ................. 3.33117 |===========================
-O3 -march=native -flto -fwhole-program . 3.31499 |===========================
-O3 -march=native ....................... 3.42188 |============================

HPC Challenge 1.5.0
Test / Class: Max Ping Pong Bandwidth
MB/s > Higher Is Better
-O3 -march=native -flto ................. 22951.43 |===========================
-O3 -march=native -flto -fwhole-program . 23030.71 |===========================
-O3 -march=native ....................... 22614.63 |===========================

miniFE 2.2
Problem Size: Small
CG Mflops > Higher Is Better
-O3 -march=native -flto ................. 7720.98 |============================
-O3 -march=native -flto -fwhole-program . 7735.16 |============================
-O3 -march=native ....................... 7745.54 |============================

FFTW 3.3.6
Build: Stock - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native -flto ................. 11201 |==========================
-O3 -march=native -flto -fwhole-program . 11069 |=========================
-O3 -march=native ....................... 13149 |==============================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native -flto ................. 12677 |=============================
-O3 -march=native -flto -fwhole-program . 12595 |=============================
-O3 -march=native ....................... 13217 |==============================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native -flto ................. 8969.7 |=============================
-O3 -march=native -flto -fwhole-program . 8540.7 |============================
-O3 -march=native ....................... 8209.1 |===========================

FFTW 3.3.6
Build: Float + SSE - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native -flto ................. 15488 |==============================
-O3 -march=native -flto -fwhole-program . 15271 |==============================
-O3 -march=native ....................... 15513 |==============================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native -flto ................. 46350 |==============================
-O3 -march=native -flto -fwhole-program . 45734 |==============================
-O3 -march=native ....................... 45957 |==============================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native -flto ................. 24505 |==============================
-O3 -march=native -flto -fwhole-program . 24289 |==============================
-O3 -march=native ....................... 23281 |=============================

Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
-O3 -march=native -flto ................. 69.13 |=============================
-O3 -march=native -flto -fwhole-program . 68.02 |============================
-O3 -march=native ....................... 71.83 |==============================

QMCPACK 3.8
Total Execution Time - Seconds < Lower Is Better
-O3 -march=native -flto ................. 1895.1 |=============================
-O3 -march=native -flto -fwhole-program . 1900.3 |=============================
-O3 -march=native ....................... 1878.0 |=============================

BYTE Unix Benchmark 3.6
Computational Test: Dhrystone 2
LPS > Higher Is Better
-O3 -march=native -flto ................. 67070357.6 |=========================
-O3 -march=native -flto -fwhole-program . 64880292.8 |========================
-O3 -march=native ....................... 47812720.8 |==================

Crafty 25.2
Elapsed Time
Nodes Per Second > Higher Is Better
-O3 -march=native -flto ................. 9209287 |============================
-O3 -march=native -flto -fwhole-program . 8978346 |===========================
-O3 -march=native ....................... 9240651 |============================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
-O3 -march=native -flto ................. 1422472 |============================
-O3 -march=native -flto -fwhole-program . 1418074 |============================
-O3 -march=native ....................... 1350615 |===========================

MKL-DNN DNNL 1.1
Harness: Deconvolution Batch deconv_1d - Data Type: f32
ms < Lower Is Better
-O3 -march=native -flto ................. 2.31627 |============================
-O3 -march=native -flto -fwhole-program . 2.31590 |============================
-O3 -march=native ....................... 2.30735 |============================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_alexnet - Data Type: f32
ms < Lower Is Better
-O3 -march=native -flto ................. 126.97 |=============================
-O3 -march=native -flto -fwhole-program . 125.13 |=============================
-O3 -march=native ....................... 126.14 |=============================

MKL-DNN DNNL 1.1
Harness: Recurrent Neural Network Training - Data Type: f32
ms < Lower Is Better
-O3 -march=native -flto ................. 194.67 |=============================
-O3 -march=native -flto -fwhole-program . 195.07 |=============================
-O3 -march=native ....................... 194.17 |=============================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32
ms < Lower Is Better
-O3 -march=native -flto ................. 52.83 |==============================
-O3 -march=native -flto -fwhole-program . 53.25 |==============================
-O3 -march=native ....................... 52.23 |=============================

TTSIOD 3D Renderer 2.3b
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
-O3 -march=native -flto ................. 950.33 |=============================
-O3 -march=native -flto -fwhole-program . 950.18 |=============================
-O3 -march=native ....................... 946.11 |=============================

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
-O3 -march=native -flto ................. 8.761475 |===========================
-O3 -march=native -flto -fwhole-program . 8.625776 |===========================
-O3 -march=native ....................... 8.781111 |===========================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O3 -march=native -flto ................. 4766.40 |===========================
-O3 -march=native -flto -fwhole-program . 4684.70 |===========================
-O3 -march=native ....................... 4893.66 |============================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
-O3 -march=native -flto ................. 79613988 |==========================
-O3 -march=native -flto -fwhole-program . 81375940 |===========================
-O3 -march=native ....................... 79813741 |==========================

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
-O3 -march=native -flto ................. 75.25 |==============================
-O3 -march=native -flto -fwhole-program . 74.87 |==============================
-O3 -march=native ....................... 15.34 |======

XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
-O3 -march=native -flto ................. 19.87 |==============================
-O3 -march=native -flto -fwhole-program . 19.83 |==============================
-O3 -march=native ....................... 20.02 |==============================

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
-O3 -march=native -flto ................. 10.168 |=============================
-O3 -march=native -flto -fwhole-program . 10.140 |=============================
-O3 -march=native ....................... 9.994 |=============================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
-O3 -march=native -flto ................. 8.044 |==============================
-O3 -march=native -flto -fwhole-program . 8.073 |==============================
-O3 -march=native ....................... 8.079 |==============================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
-O3 -march=native -flto ................. 6.622 |==============================
-O3 -march=native -flto -fwhole-program . 6.697 |==============================
-O3 -march=native ....................... 6.710 |==============================

Radiance Benchmark 5.0
Test: Serial
Seconds < Lower Is Better
-O3 -march=native -flto ................. 556.14 |=============================
-O3 -march=native -flto -fwhole-program . 555.69 |=============================
-O3 -march=native ....................... 559.02 |=============================

Radiance Benchmark 5.0
Test: SMP Parallel
Seconds < Lower Is Better
-O3 -march=native -flto ................. 174.53 |=============================
-O3 -march=native -flto -fwhole-program . 170.63 |============================
-O3 -march=native ....................... 168.59 |============================

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
-O3 -march=native -flto . 7182.9 |=============================================
-O3 -march=native ....... 7178.7 |=============================================

ASKAP 2018-11-10
Test: tConvolve MT - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native -flto ................. 1949.81 |============================
-O3 -march=native -flto -fwhole-program . 1948.42 |============================
-O3 -march=native ....................... 1946.68 |============================

ASKAP 2018-11-10
Test: tConvolve MT - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native -flto ................. 3366.19 |============================
-O3 -march=native -flto -fwhole-program . 3363.82 |============================
-O3 -march=native ....................... 3369.74 |============================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native -flto ................. 5435.31 |============================
-O3 -march=native -flto -fwhole-program . 5433.80 |============================
-O3 -march=native ....................... 5471.50 |============================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native -flto ................. 4117.58 |============================
-O3 -march=native -flto -fwhole-program . 4117.58 |============================
-O3 -march=native ....................... 4138.92 |============================

GROMACS 2019.4
Water Benchmark
Ns Per Day > Higher Is Better
-O3 -march=native -flto ................. 2.517 |==============================
-O3 -march=native -flto -fwhole-program . 2.515 |==============================
-O3 -march=native ....................... 2.514 |==============================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native -flto . 701920.06 |==========================================
-O3 -march=native ....... 670670.78 |========================================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native -flto . 703431.18 |==========================================
-O3 -march=native ....... 676431.71 |========================================

SQLite Speedtest 3.30
Timed Time - Size 1,000
Seconds < Lower Is Better
-O3 -march=native -flto ................. 56.44 |==============================
-O3 -march=native -flto -fwhole-program . 56.22 |=============================
-O3 -march=native ....................... 57.37 |==============================

Facebook RocksDB 6.3.6
Test: Random Fill
Op/s > Higher Is Better
-O3 -march=native -flto ................. 916114 |=============================
-O3 -march=native -flto -fwhole-program . 919822 |=============================
-O3 -march=native ....................... 927119 |=============================

Facebook RocksDB 6.3.6
Test: Random Read
Op/s > Higher Is Better
-O3 -march=native -flto ................. 147319777 |==========================
-O3 -march=native -flto -fwhole-program . 141628836 |=========================
-O3 -march=native ....................... 145113856 |==========================

Facebook RocksDB 6.3.6
Test: Sequential Fill
Op/s > Higher Is Better
-O3 -march=native -flto ................. 1010840 |============================
-O3 -march=native -flto -fwhole-program . 1019279 |============================
-O3 -march=native ....................... 1018223 |============================

Facebook RocksDB 6.3.6
Test: Random Fill Sync
Op/s > Higher Is Better
-O3 -march=native -flto ................. 24409 |==============================
-O3 -march=native -flto -fwhole-program . 24495 |==============================
-O3 -march=native ....................... 24502 |==============================

Facebook RocksDB 6.3.6
Test: Read While Writing
Op/s > Higher Is Better
-O3 -march=native -flto ................. 4901767 |============================
-O3 -march=native -flto -fwhole-program . 4898750 |============================
-O3 -march=native ....................... 4868266 |============================

NGINX Benchmark 1.9.9
Static Web Page Serving
Requests Per Second > Higher Is Better
-O3 -march=native -flto ................. 43673.89 |===========================
-O3 -march=native -flto -fwhole-program . 43510.97 |===========================
-O3 -march=native ....................... 43138.29 |===========================
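The results file reports only raw numbers, and which direction counts as "better" changes from test to test. As a reading aid, the Python sketch below is not part of the original export. It converts a few of the results above into a percentage gain for the -O3 -march=native -flto build relative to the -O3 -march=native baseline. For lower-is-better metrics it inverts the ratio, so a positive number always means the LTO build came out ahead. The `Result` class, its field names, and the choice of tests are hypothetical, chosen only for illustration. The numbers themselves are copied from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark result (hypothetical structure, for illustration only)."""
    test: str
    higher_is_better: bool
    baseline: float  # -O3 -march=native
    lto: float       # -O3 -march=native -flto

    def lto_gain_percent(self) -> float:
        """Percent by which -flto beats the baseline; negative means it lost."""
        if self.higher_is_better:
            ratio = self.lto / self.baseline
        else:
            # For times, smaller is better, so invert the ratio.
            ratio = self.baseline / self.lto
        return (ratio - 1.0) * 100.0


# Values copied from the result file above.
RESULTS = [
    Result("BYTE Dhrystone 2 (LPS)", True, 47812720.8, 67070357.6),
    Result("Timed MrBayes (s)", False, 71.83, 69.13),
    Result("FFTW Stock 1D FFT Size 32 (Mflops)", True, 13149, 11201),
    Result("PostgreSQL pgbench Normal Load RO (TPS)", True, 670670.78, 701920.06),
    Result("Timed ImageMagick Compilation (s)", False, 15.34, 75.25),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.test:<42} {r.lto_gain_percent():+6.1f}%")
```

Run as a script, the sketch reports a gain of roughly 40% for Dhrystone 2 and a loss of about 15% for the stock FFTW 1D size-32 case. The ImageMagick figure is strongly negative because that test times the build itself. It therefore measures the extra compile-time cost of link-time optimization, not the speed of the resulting binary.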