Threadripper 3960X GCC 10 LTO Testing

AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.

-O3 -march=native:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

-O3 -march=native -flto:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

-O3 -march=native -flto -fwhole-program:

  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723

  OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

SQLite 3.30.1
Threads / Copies: 1
Seconds < Lower Is Better
-O3 -march=native ....................... 14.20 |==============================
-O3 -march=native -flto ................. 14.23 |==============================
-O3 -march=native -flto -fwhole-program . 14.28 |==============================

HPC Challenge 1.5.0
Test / Class: G-HPL
GFLOPS > Higher Is Better
-O3 -march=native ....................... 63.54 |==============================
-O3 -march=native -flto ................. 63.68 |==============================
-O3 -march=native -flto -fwhole-program . 63.72 |==============================

HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOPS > Higher Is Better
-O3 -march=native ....................... 15.20 |=============================
-O3 -march=native -flto ................. 15.22 |=============================
-O3 -march=native -flto -fwhole-program . 15.98 |==============================

HPC Challenge 1.5.0
Test / Class: EP-DGEMM
GFLOPS > Higher Is Better
-O3 -march=native ....................... 32.50 |=============================
-O3 -march=native -flto ................. 32.76 |=============================
-O3 -march=native -flto -fwhole-program . 33.60 |==============================

HPC Challenge 1.5.0
Test / Class: G-Ptrans
GB/s > Higher Is Better
-O3 -march=native ....................... 5.79663 |============================
-O3 -march=native -flto ................. 5.78791 |============================
-O3 -march=native -flto -fwhole-program . 5.80751 |============================

HPC Challenge 1.5.0
Test / Class: EP-STREAM Triad
GB/s > Higher Is Better
-O3 -march=native ....................... 1.68426 |============================
-O3 -march=native -flto ................. 1.68874 |============================
-O3 -march=native -flto -fwhole-program . 1.69770 |============================

HPC Challenge 1.5.0
Test / Class: G-Random Access
GUP/s > Higher Is Better
-O3 -march=native ....................... 0.16431 |============================
-O3 -march=native -flto ................. 0.16722 |============================
-O3 -march=native -flto -fwhole-program . 0.16679 |============================

HPC Challenge 1.5.0
Test / Class: Random Ring Latency
usecs < Lower Is Better
-O3 -march=native ....................... 0.45479 |===========================
-O3 -march=native -flto ................. 0.45234 |===========================
-O3 -march=native -flto -fwhole-program . 0.46971 |============================

HPC Challenge 1.5.0
Test / Class: Random Ring Bandwidth
GB/s > Higher Is Better
-O3 -march=native ....................... 3.42188 |============================
-O3 -march=native -flto ................. 3.33117 |===========================
-O3 -march=native -flto -fwhole-program . 3.31499 |===========================

HPC Challenge 1.5.0
Test / Class: Max Ping Pong Bandwidth
MB/s > Higher Is Better
-O3 -march=native ....................... 22614.63 |===========================
-O3 -march=native -flto ................. 22951.43 |===========================
-O3 -march=native -flto -fwhole-program . 23030.71 |===========================

miniFE 2.2
Problem Size: Small
CG Mflops > Higher Is Better
-O3 -march=native ....................... 7745.54 |============================
-O3 -march=native -flto ................. 7720.98 |============================
-O3 -march=native -flto -fwhole-program . 7735.16 |============================

FFTW 3.3.6
Build: Stock - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 13149 |==============================
-O3 -march=native -flto ................. 11201 |==========================
-O3 -march=native -flto -fwhole-program . 11069 |=========================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 13217 |==============================
-O3 -march=native -flto ................. 12677 |=============================
-O3 -march=native -flto -fwhole-program . 12595 |=============================

FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native ....................... 8209.1 |===========================
-O3 -march=native -flto ................. 8969.7 |=============================
-O3 -march=native -flto -fwhole-program . 8540.7 |============================

FFTW 3.3.6
Build: Float + SSE - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 15513 |==============================
-O3 -march=native -flto ................. 15488 |==============================
-O3 -march=native -flto -fwhole-program . 15271 |==============================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 45957 |==============================
-O3 -march=native -flto ................. 46350 |==============================
-O3 -march=native -flto -fwhole-program . 45734 |==============================

FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native ....................... 23281 |=============================
-O3 -march=native -flto ................. 24505 |==============================
-O3 -march=native -flto -fwhole-program . 24289 |==============================

Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
-O3 -march=native ....................... 71.83 |==============================
-O3 -march=native -flto ................. 69.13 |=============================
-O3 -march=native -flto -fwhole-program . 68.02 |============================

QMCPACK 3.8
Total Execution Time - Seconds < Lower Is Better
-O3 -march=native ....................... 1878.0 |=============================
-O3 -march=native -flto ................. 1895.1 |=============================
-O3 -march=native -flto -fwhole-program . 1900.3 |=============================

BYTE Unix Benchmark 3.6
Computational Test: Dhrystone 2
LPS > Higher Is Better
-O3 -march=native ....................... 47812720.8 |==================
-O3 -march=native -flto ................. 67070357.6 |=========================
-O3 -march=native -flto -fwhole-program . 64880292.8 |========================

Crafty 25.2
Elapsed Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 9240651 |============================
-O3 -march=native -flto ................. 9209287 |============================
-O3 -march=native -flto -fwhole-program . 8978346 |===========================

TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 1350615 |===========================
-O3 -march=native -flto ................. 1422472 |============================
-O3 -march=native -flto -fwhole-program . 1418074 |============================

MKL-DNN DNNL 1.1
Harness: Deconvolution Batch deconv_1d - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 2.30735 |============================
-O3 -march=native -flto ................. 2.31627 |============================
-O3 -march=native -flto -fwhole-program . 2.31590 |============================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_alexnet - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 126.14 |=============================
-O3 -march=native -flto ................. 126.97 |=============================
-O3 -march=native -flto -fwhole-program . 125.13 |=============================

MKL-DNN DNNL 1.1
Harness: Recurrent Neural Network Training - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 194.17 |=============================
-O3 -march=native -flto ................. 194.67 |=============================
-O3 -march=native -flto -fwhole-program . 195.07 |=============================

MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 52.23 |=============================
-O3 -march=native -flto ................. 52.83 |==============================
-O3 -march=native -flto -fwhole-program . 53.25 |==============================

TTSIOD 3D Renderer 2.3b
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
-O3 -march=native ....................... 946.11 |=============================
-O3 -march=native -flto ................. 950.33 |=============================
-O3 -march=native -flto -fwhole-program . 950.18 |=============================

ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
-O3 -march=native ....................... 8.781111 |===========================
-O3 -march=native -flto ................. 8.761475 |===========================
-O3 -march=native -flto -fwhole-program . 8.625776 |===========================

Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O3 -march=native ....................... 4893.66 |============================
-O3 -march=native -flto ................. 4766.40 |===========================
-O3 -march=native -flto -fwhole-program . 4684.70 |===========================

Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 79813741 |==========================
-O3 -march=native -flto ................. 79613988 |==========================
-O3 -march=native -flto -fwhole-program . 81375940 |===========================

Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
-O3 -march=native ....................... 15.34 |======
-O3 -march=native -flto ................. 75.25 |==============================
-O3 -march=native -flto -fwhole-program . 74.87 |==============================

XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
-O3 -march=native ....................... 20.02 |==============================
-O3 -march=native -flto ................. 19.87 |==============================
-O3 -march=native -flto -fwhole-program . 19.83 |==============================

Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
-O3 -march=native ....................... 9.994 |=============================
-O3 -march=native -flto ................. 10.168 |=============================
-O3 -march=native -flto -fwhole-program . 10.140 |=============================

FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
-O3 -march=native ....................... 8.079 |==============================
-O3 -march=native -flto ................. 8.044 |==============================
-O3 -march=native -flto -fwhole-program . 8.073 |==============================

LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
-O3 -march=native ....................... 6.710 |==============================
-O3 -march=native -flto ................. 6.622 |==============================
-O3 -march=native -flto -fwhole-program . 6.697 |==============================

Radiance Benchmark 5.0
Test: Serial
Seconds < Lower Is Better
-O3 -march=native ....................... 559.02 |=============================
-O3 -march=native -flto ................. 556.14 |=============================
-O3 -march=native -flto -fwhole-program . 555.69 |=============================

Radiance Benchmark 5.0
Test: SMP Parallel
Seconds < Lower Is Better
-O3 -march=native ....................... 168.59 |============================
-O3 -march=native -flto ................. 174.53 |=============================
-O3 -march=native -flto -fwhole-program . 170.63 |============================

OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
-O3 -march=native ....... 7178.7 |=============================================
-O3 -march=native -flto . 7182.9 |=============================================

ASKAP 2018-11-10
Test: tConvolve MT - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 1946.68 |============================
-O3 -march=native -flto ................. 1949.81 |============================
-O3 -march=native -flto -fwhole-program . 1948.42 |============================

ASKAP 2018-11-10
Test: tConvolve MT - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 3369.74 |============================
-O3 -march=native -flto ................. 3366.19 |============================
-O3 -march=native -flto -fwhole-program . 3363.82 |============================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 5471.50 |============================
-O3 -march=native -flto ................. 5435.31 |============================
-O3 -march=native -flto -fwhole-program . 5433.80 |============================

ASKAP 2018-11-10
Test: tConvolve OpenMP - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 4138.92 |============================
-O3 -march=native -flto ................. 4117.58 |============================
-O3 -march=native -flto -fwhole-program . 4117.58 |============================

GROMACS 2019.4
Water Benchmark
Ns Per Day > Higher Is Better
-O3 -march=native ....................... 2.514 |==============================
-O3 -march=native -flto ................. 2.517 |==============================
-O3 -march=native -flto -fwhole-program . 2.515 |==============================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native ....... 670670.78 |========================================
-O3 -march=native -flto . 701920.06 |==========================================

PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native ....... 676431.71 |========================================
-O3 -march=native -flto . 703431.18 |==========================================

SQLite Speedtest 3.30
Timed Time - Size 1,000
Seconds < Lower Is Better
-O3 -march=native ....................... 57.37 |==============================
-O3 -march=native -flto ................. 56.44 |==============================
-O3 -march=native -flto -fwhole-program . 56.22 |=============================

Facebook RocksDB 6.3.6
Test: Random Fill
Op/s > Higher Is Better
-O3 -march=native ....................... 927119 |=============================
-O3 -march=native -flto ................. 916114 |=============================
-O3 -march=native -flto -fwhole-program . 919822 |=============================

Facebook RocksDB 6.3.6
Test: Random Read
Op/s > Higher Is Better
-O3 -march=native ....................... 145113856 |==========================
-O3 -march=native -flto ................. 147319777 |==========================
-O3 -march=native -flto -fwhole-program . 141628836 |=========================

Facebook RocksDB 6.3.6
Test: Sequential Fill
Op/s > Higher Is Better
-O3 -march=native ....................... 1018223 |============================
-O3 -march=native -flto ................. 1010840 |============================
-O3 -march=native -flto -fwhole-program . 1019279 |============================

Facebook RocksDB 6.3.6
Test: Random Fill Sync
Op/s > Higher Is Better
-O3 -march=native ....................... 24502 |==============================
-O3 -march=native -flto ................. 24409 |==============================
-O3 -march=native -flto -fwhole-program . 24495 |==============================

Facebook RocksDB 6.3.6
Test: Read While Writing
Op/s > Higher Is Better
-O3 -march=native ....................... 4868266 |============================
-O3 -march=native -flto ................. 4901767 |============================
-O3 -march=native -flto -fwhole-program . 4898750 |============================

NGINX Benchmark 1.9.9
Static Web Page Serving
Requests Per Second > Higher Is Better
-O3 -march=native ....................... 43138.29 |===========================
-O3 -march=native -flto ................. 43673.89 |===========================
-O3 -march=native -flto -fwhole-program . 43510.97 |===========================
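
The results mix "Higher Is Better" and "Lower Is Better" metrics, so comparing a build against the plain -O3 -march=native baseline requires normalizing the direction first. The following is a minimal sketch of that arithmetic. The function name `relative_gain` and the `RESULTS` layout are assumptions made for illustration and are not part of the Phoronix Test Suite output. The sample values are copied from the baseline and -flto rows of the results above.

```python
def relative_gain(baseline, candidate, higher_is_better):
    """Return the percentage improvement of candidate over baseline.

    A positive value means the candidate build did better than the
    baseline, a negative value means it did worse.
    """
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    # For time-based results a smaller number is better, so invert the ratio.
    return (baseline / candidate - 1.0) * 100.0


# (baseline, -flto, higher_is_better), copied from the results above.
RESULTS = {
    "BYTE Unix Benchmark, Dhrystone 2 (LPS)": (47812720.8, 67070357.6, True),
    "FFTW Stock 1D FFT Size 32 (Mflops)": (13149, 11201, True),
    "Timed MrBayes Analysis (Seconds)": (71.83, 69.13, False),
    "Timed ImageMagick Compilation (Seconds)": (15.34, 75.25, False),
}

if __name__ == "__main__":
    for name, (base, lto, higher) in RESULTS.items():
        print(f"{name}: {relative_gain(base, lto, higher):+.1f}%")
```

For time-based results the ratio is inverted, so a negative figure for the ImageMagick compilation test reflects the longer build time under -flto rather than slower generated code.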