AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 1912225-PTS-THREADRI29
Threadripper 3960X GCC 10 LTO Testing
Configurations tested (identical hardware and software; only the compiler flags differ):
-O3 -march=native
-O3 -march=native -flto
-O3 -march=native -flto -fwhole-program
Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723
OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160
ACES DGEMM 1.0
Sustained Floating-Point Rate
GFLOP/s > Higher Is Better
-O3 -march=native ....................... 8.781111 |===========================
-O3 -march=native -flto ................. 8.761475 |===========================
-O3 -march=native -flto -fwhole-program . 8.625776 |===========================
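Every result row in this export has the same layout: the configuration label, a dot leader, the numeric value, and an ASCII bar after a "|". A minimal sketch for turning such rows back into numbers, assuming the rows keep exactly that layout (the regular expression and the `parse_rows` helper are illustrative, not part of the Phoronix Test Suite):

```python
import re
from typing import Dict

# Matches rows such as
# "-O3 -march=native ....................... 8.781111 |=====".
# Assumption: label, a run of dots, the value, then "|" and "=" bar characters.
ROW = re.compile(r"^(?P<label>.+?) \.+ (?P<value>-?\d+(?:\.\d+)?) \|=*\s*$")


def parse_rows(block: str) -> Dict[str, float]:
    """Map each configuration label to its numeric result; other lines are skipped."""
    results: Dict[str, float] = {}
    for line in block.splitlines():
        match = ROW.match(line.strip())
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


aces = parse_rows("""ACES DGEMM 1.0
GFLOP/s > Higher Is Better
-O3 -march=native ....................... 8.781111 |===
-O3 -march=native -flto ................. 8.761475 |===
-O3 -march=native -flto -fwhole-program . 8.625776 |===""")
print(aces)
```

Header and unit lines carry no dot leader, so they fall through the match and are ignored.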
ASKAP 2018-11-10
Test: tConvolve MT - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 1946.68 |============================
-O3 -march=native -flto ................. 1949.81 |============================
-O3 -march=native -flto -fwhole-program . 1948.42 |============================
ASKAP 2018-11-10
Test: tConvolve MT - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 3369.74 |============================
-O3 -march=native -flto ................. 3366.19 |============================
-O3 -march=native -flto -fwhole-program . 3363.82 |============================
ASKAP 2018-11-10
Test: tConvolve OpenMP - Gridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 5471.50 |============================
-O3 -march=native -flto ................. 5435.31 |============================
-O3 -march=native -flto -fwhole-program . 5433.80 |============================
ASKAP 2018-11-10
Test: tConvolve OpenMP - Degridding
Million Grid Points Per Second > Higher Is Better
-O3 -march=native ....................... 4138.92 |============================
-O3 -march=native -flto ................. 4117.58 |============================
-O3 -march=native -flto -fwhole-program . 4117.58 |============================
BYTE Unix Benchmark 3.6
Computational Test: Dhrystone 2
LPS > Higher Is Better
-O3 -march=native ....................... 47812720.8 |==================
-O3 -march=native -flto ................. 67070357.6 |=========================
-O3 -march=native -flto -fwhole-program . 64880292.8 |========================
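Dhrystone 2 shows the largest runtime gap in this file, but tests here report in both directions ("Higher Is Better" for LPS or GFLOP/s, "Lower Is Better" for seconds). A minimal sketch for putting both on one scale, assuming a ratio above 1.0 should always mean "the candidate flags did better than the baseline"; the helper names are hypothetical:

```python
def normalized_ratio(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Ratio where values above 1.0 mean the candidate beat the baseline."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("results must be positive")
    # Flip the division for time-style metrics so both directions read the same way.
    return candidate / baseline if higher_is_better else baseline / candidate


def percent_change(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Signed percentage: positive = improvement, negative = regression."""
    return (normalized_ratio(baseline, candidate, higher_is_better) - 1.0) * 100.0


# Dhrystone 2 (LPS, higher is better): -O3 -march=native vs. -O3 -march=native -flto
print(f"{percent_change(47812720.8, 67070357.6, True):+.1f}%")  # +40.3%
# Timed ImageMagick Compilation (seconds, lower is better), same two configurations
print(f"{percent_change(15.34, 75.25, False):+.1f}%")  # -79.6%
```

The second line reflects that the -flto build of ImageMagick took roughly 4.9 times as long as the non-LTO build.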
Crafty 25.2
Elapsed Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 9240651 |============================
-O3 -march=native -flto ................. 9209287 |============================
-O3 -march=native -flto -fwhole-program . 8978346 |===========================
Facebook RocksDB 6.3.6
Test: Random Fill
Op/s > Higher Is Better
-O3 -march=native ....................... 927119 |=============================
-O3 -march=native -flto ................. 916114 |=============================
-O3 -march=native -flto -fwhole-program . 919822 |=============================
Facebook RocksDB 6.3.6
Test: Random Read
Op/s > Higher Is Better
-O3 -march=native ....................... 145113856 |==========================
-O3 -march=native -flto ................. 147319777 |==========================
-O3 -march=native -flto -fwhole-program . 141628836 |=========================
Facebook RocksDB 6.3.6
Test: Sequential Fill
Op/s > Higher Is Better
-O3 -march=native ....................... 1018223 |============================
-O3 -march=native -flto ................. 1010840 |============================
-O3 -march=native -flto -fwhole-program . 1019279 |============================
Facebook RocksDB 6.3.6
Test: Random Fill Sync
Op/s > Higher Is Better
-O3 -march=native ....................... 24502 |==============================
-O3 -march=native -flto ................. 24409 |==============================
-O3 -march=native -flto -fwhole-program . 24495 |==============================
Facebook RocksDB 6.3.6
Test: Read While Writing
Op/s > Higher Is Better
-O3 -march=native ....................... 4868266 |============================
-O3 -march=native -flto ................. 4901767 |============================
-O3 -march=native -flto -fwhole-program . 4898750 |============================
FFTW 3.3.6
Build: Stock - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 13149 |==============================
-O3 -march=native -flto ................. 11201 |==========================
-O3 -march=native -flto -fwhole-program . 11069 |=========================
FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 13217 |==============================
-O3 -march=native -flto ................. 12677 |=============================
-O3 -march=native -flto -fwhole-program . 12595 |=============================
FFTW 3.3.6
Build: Stock - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native ....................... 8209.1 |===========================
-O3 -march=native -flto ................. 8969.7 |=============================
-O3 -march=native -flto -fwhole-program . 8540.7 |============================
FFTW 3.3.6
Build: Float + SSE - Size: 1D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 15513 |==============================
-O3 -march=native -flto ................. 15488 |==============================
-O3 -march=native -flto -fwhole-program . 15271 |==============================
FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 32
Mflops > Higher Is Better
-O3 -march=native ....................... 45957 |==============================
-O3 -march=native -flto ................. 46350 |==============================
-O3 -march=native -flto -fwhole-program . 45734 |==============================
FFTW 3.3.6
Build: Float + SSE - Size: 2D FFT Size 4096
Mflops > Higher Is Better
-O3 -march=native ....................... 23281 |=============================
-O3 -march=native -flto ................. 24505 |==============================
-O3 -march=native -flto -fwhole-program . 24289 |==============================
FLAC Audio Encoding 1.3.2
WAV To FLAC
Seconds < Lower Is Better
-O3 -march=native ....................... 8.079 |==============================
-O3 -march=native -flto ................. 8.044 |==============================
-O3 -march=native -flto -fwhole-program . 8.073 |==============================
GROMACS 2019.4
Water Benchmark
Ns Per Day > Higher Is Better
-O3 -march=native ....................... 2.514 |==============================
-O3 -march=native -flto ................. 2.517 |==============================
-O3 -march=native -flto -fwhole-program . 2.515 |==============================
Himeno Benchmark 3.0
Poisson Pressure Solver
MFLOPS > Higher Is Better
-O3 -march=native ....................... 4893.66 |============================
-O3 -march=native -flto ................. 4766.40 |===========================
-O3 -march=native -flto -fwhole-program . 4684.70 |===========================
HPC Challenge 1.5.0
Test / Class: G-HPL
GFLOPS > Higher Is Better
-O3 -march=native ....................... 63.54 |==============================
-O3 -march=native -flto ................. 63.68 |==============================
-O3 -march=native -flto -fwhole-program . 63.72 |==============================
HPC Challenge 1.5.0
Test / Class: G-Ffte
GFLOPS > Higher Is Better
-O3 -march=native ....................... 15.20 |=============================
-O3 -march=native -flto ................. 15.22 |=============================
-O3 -march=native -flto -fwhole-program . 15.98 |==============================
HPC Challenge 1.5.0
Test / Class: EP-DGEMM
GFLOPS > Higher Is Better
-O3 -march=native ....................... 32.50 |=============================
-O3 -march=native -flto ................. 32.76 |=============================
-O3 -march=native -flto -fwhole-program . 33.60 |==============================
HPC Challenge 1.5.0
Test / Class: G-Ptrans
GB/s > Higher Is Better
-O3 -march=native ....................... 5.79663 |============================
-O3 -march=native -flto ................. 5.78791 |============================
-O3 -march=native -flto -fwhole-program . 5.80751 |============================
HPC Challenge 1.5.0
Test / Class: EP-STREAM Triad
GB/s > Higher Is Better
-O3 -march=native ....................... 1.68426 |============================
-O3 -march=native -flto ................. 1.68874 |============================
-O3 -march=native -flto -fwhole-program . 1.69770 |============================
HPC Challenge 1.5.0
Test / Class: G-Random Access
GUP/s > Higher Is Better
-O3 -march=native ....................... 0.16431 |============================
-O3 -march=native -flto ................. 0.16722 |============================
-O3 -march=native -flto -fwhole-program . 0.16679 |============================
HPC Challenge 1.5.0
Test / Class: Random Ring Latency
usecs < Lower Is Better
-O3 -march=native ....................... 0.45479 |===========================
-O3 -march=native -flto ................. 0.45234 |===========================
-O3 -march=native -flto -fwhole-program . 0.46971 |============================
HPC Challenge 1.5.0
Test / Class: Random Ring Bandwidth
GB/s > Higher Is Better
-O3 -march=native ....................... 3.42188 |============================
-O3 -march=native -flto ................. 3.33117 |===========================
-O3 -march=native -flto -fwhole-program . 3.31499 |===========================
HPC Challenge 1.5.0
Test / Class: Max Ping Pong Bandwidth
MB/s > Higher Is Better
-O3 -march=native ....................... 22614.63 |===========================
-O3 -march=native -flto ................. 22951.43 |===========================
-O3 -march=native -flto -fwhole-program . 23030.71 |===========================
LAME MP3 Encoding 3.100
WAV To MP3
Seconds < Lower Is Better
-O3 -march=native ....................... 6.710 |==============================
-O3 -march=native -flto ................. 6.622 |==============================
-O3 -march=native -flto -fwhole-program . 6.697 |==============================
miniFE 2.2
Problem Size: Small
CG Mflops > Higher Is Better
-O3 -march=native ....................... 7745.54 |============================
-O3 -march=native -flto ................. 7720.98 |============================
-O3 -march=native -flto -fwhole-program . 7735.16 |============================
MKL-DNN DNNL 1.1
Harness: Deconvolution Batch deconv_1d - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 2.30735 |============================
-O3 -march=native -flto ................. 2.31627 |============================
-O3 -march=native -flto -fwhole-program . 2.31590 |============================
MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_alexnet - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 126.14 |=============================
-O3 -march=native -flto ................. 126.97 |=============================
-O3 -march=native -flto -fwhole-program . 125.13 |=============================
MKL-DNN DNNL 1.1
Harness: Recurrent Neural Network Training - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 194.17 |=============================
-O3 -march=native -flto ................. 194.67 |=============================
-O3 -march=native -flto -fwhole-program . 195.07 |=============================
MKL-DNN DNNL 1.1
Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32
ms < Lower Is Better
-O3 -march=native ....................... 52.23 |=============================
-O3 -march=native -flto ................. 52.83 |==============================
-O3 -march=native -flto -fwhole-program . 53.25 |==============================
NGINX Benchmark 1.9.9
Static Web Page Serving
Requests Per Second > Higher Is Better
-O3 -march=native ....................... 43138.29 |===========================
-O3 -march=native -flto ................. 43673.89 |===========================
-O3 -march=native -flto -fwhole-program . 43510.97 |===========================
OpenSSL 1.1.1
RSA 4096-bit Performance
Signs Per Second > Higher Is Better
-O3 -march=native ....... 7178.7 |=============================================
-O3 -march=native -flto . 7182.9 |=============================================
PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Normal Load - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native ....... 670670.78 |========================================
-O3 -march=native -flto . 701920.06 |==========================================
PostgreSQL pgbench 12.0
Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only
TPS > Higher Is Better
-O3 -march=native ....... 676431.71 |========================================
-O3 -march=native -flto . 703431.18 |==========================================
QMCPACK 3.8
Total Execution Time
Seconds < Lower Is Better
-O3 -march=native ....................... 1878.0 |=============================
-O3 -march=native -flto ................. 1895.1 |=============================
-O3 -march=native -flto -fwhole-program . 1900.3 |=============================
Radiance Benchmark 5.0
Test: Serial
Seconds < Lower Is Better
-O3 -march=native ....................... 559.02 |=============================
-O3 -march=native -flto ................. 556.14 |=============================
-O3 -march=native -flto -fwhole-program . 555.69 |=============================
Radiance Benchmark 5.0
Test: SMP Parallel
Seconds < Lower Is Better
-O3 -march=native ....................... 168.59 |============================
-O3 -march=native -flto ................. 174.53 |=============================
-O3 -march=native -flto -fwhole-program . 170.63 |============================
SQLite 3.30.1
Threads / Copies: 1
Seconds < Lower Is Better
-O3 -march=native ....................... 14.20 |==============================
-O3 -march=native -flto ................. 14.23 |==============================
-O3 -march=native -flto -fwhole-program . 14.28 |==============================
SQLite Speedtest 3.30
Timed Time - Size 1,000
Seconds < Lower Is Better
-O3 -march=native ....................... 57.37 |==============================
-O3 -march=native -flto ................. 56.44 |==============================
-O3 -march=native -flto -fwhole-program . 56.22 |=============================
Stockfish 9
Total Time
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 79813741 |==========================
-O3 -march=native -flto ................. 79613988 |==========================
-O3 -march=native -flto -fwhole-program . 81375940 |===========================
Timed ImageMagick Compilation 6.9.0
Time To Compile
Seconds < Lower Is Better
-O3 -march=native ....................... 15.34 |======
-O3 -march=native -flto ................. 75.25 |==============================
-O3 -march=native -flto -fwhole-program . 74.87 |==============================
Timed MrBayes Analysis 3.2.7
Primate Phylogeny Analysis
Seconds < Lower Is Better
-O3 -march=native ....................... 71.83 |==============================
-O3 -march=native -flto ................. 69.13 |=============================
-O3 -march=native -flto -fwhole-program . 68.02 |============================
TSCP 1.81
AI Chess Performance
Nodes Per Second > Higher Is Better
-O3 -march=native ....................... 1350615 |===========================
-O3 -march=native -flto ................. 1422472 |============================
-O3 -march=native -flto -fwhole-program . 1418074 |============================
TTSIOD 3D Renderer 2.3b
Phong Rendering With Soft-Shadow Mapping
FPS > Higher Is Better
-O3 -march=native ....................... 946.11 |=============================
-O3 -march=native -flto ................. 950.33 |=============================
-O3 -march=native -flto -fwhole-program . 950.18 |=============================
XZ Compression 5.2.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9
Seconds < Lower Is Better
-O3 -march=native ....................... 20.02 |==============================
-O3 -march=native -flto ................. 19.87 |==============================
-O3 -march=native -flto -fwhole-program . 19.83 |==============================
Zstd Compression 1.3.4
Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
Seconds < Lower Is Better
-O3 -march=native ....................... 9.994 |=============================
-O3 -march=native -flto ................. 10.168 |=============================
-O3 -march=native -flto -fwhole-program . 10.140 |=============================
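To condense results like these into one figure per configuration, per-test ratios against the -O3 -march=native baseline are usually combined with a geometric mean rather than an arithmetic one. The sketch below assumes the ratios are already normalized so that above 1.0 means "better than baseline"; it is an illustration, not a reproduction of any summary the Phoronix Test Suite computes. Note that OpenSSL and PostgreSQL pgbench have no -fwhole-program results in this file, so any whole-program summary would have to leave them out.

```python
import math
from typing import Iterable


def geometric_mean(ratios: Iterable[float]) -> float:
    """Geometric mean of per-test ratios (each > 0; above 1.0 = candidate better)."""
    values = list(ratios)
    if not values:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in values):
        raise ValueError("ratios must be positive")
    # Averaging in log space makes a 2x gain and a 2x loss exactly symmetric.
    return math.exp(math.fsum(math.log(r) for r in values) / len(values))


# Illustrative ratios, not taken from this result file:
# the geometric mean is ~1.0, while the arithmetic mean (2.0 + 0.5) / 2 would report 1.25.
print(geometric_mean([2.0, 0.5]))
```

The log-space average is why a single outlier such as the Dhrystone jump cannot dominate the summary the way it would in an arithmetic mean.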