Threadripper 3960X GCC 10 LTO Testing
AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:

  phoronix-test-suite benchmark 1912225-PTS-THREADRI29

HTML result view exported from: https://openbenchmarking.org/result/1912225-PTS-THREADRI29&sro&grs
Tested configurations:
  -O3 -march=native
  -O3 -march=native -flto
  -O3 -march=native -flto -fwhole-program

System details (shared by all three configurations):
  Processor:          AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads)
  Motherboard:        MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS)
  Chipset:            AMD Starship/Matisse
  Memory:             32768MB
  Disk:               1000GB Sabrent Rocket 4.0 1TB
  Graphics:           Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz)
  Audio:              AMD Baffin HDMI/DP
  Monitor:            ASUS VP28U
  Network:            Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723
  OS:                 Ubuntu 19.10
  Kernel:             5.4.0-nvme-hwmon (x86_64)
  Desktop:            GNOME Shell 3.34.1
  Display Server:     X Server 1.20.5
  Display Driver:     modesetting 1.20.5
  OpenGL:             4.5 Mesa 19.2.1 (LLVM 9.0.0)
  Compiler:           GCC 10.0.0 20191208
  File-System:        ext4
  Screen Resolution:  3840x2160

Environment Details:
  -O3 -march=native: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
  -O3 -march=native -flto: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
  -O3 -march=native -flto -fwhole-program: CXXFLAGS="-O3 -march=native -flto -fwhole-program" CFLAGS="-O3 -march=native -flto -fwhole-program"

Compiler Details: --disable-multilib --enable-checking=release
Disk Details: NONE / errors=remount-ro,relatime,rw
Processor Details: Scaling Governor: acpi-cpufreq ondemand; CPU Microcode: 0x8301025
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
  tsx_async_abort: Not affected

Threadripper 3960X GCC 10 LTO Testing - result overview
  (1) -O3 -march=native
  (2) -O3 -march=native -flto
  (3) -O3 -march=native -flto -fwhole-program
Units and direction (more or fewer is better) are listed with each test's detailed results. "-" marks a test with no result for that configuration.

Test                                                        (1)             (2)             (3)
build-imagemagick: Time To Compile                          15.336          75.246          74.873
byte: Dhrystone 2                                           47812720.8      67070357.6      64880292.8
fftw: Stock - 1D FFT Size 32                                13149           11201           11069
fftw: Stock - 2D FFT Size 4096                              8209.1          8969.7          8540.7
tscp: AI Chess Performance                                  1350615         1422472         1418074
fftw: Float + SSE - 2D FFT Size 4096                        23281           24505           24289
hpcc: G-Ffte                                                15.19630        15.21737        15.97783
fftw: Stock - 2D FFT Size 32                                13217           12677           12595
pgbench: Buffer Test - Normal Load - Read Only              670670.777447   701920.062619   -
himeno: Poisson Pressure Solver                             4893.663462     4766.396934     4684.701975
rocksdb: Rand Read                                          145113856       147319777       141628836
pgbench: Buffer Test - Heavy Contention - Read Only         676431.707071   703431.182912   -
radiance: SMP Parallel                                      168.585         174.534         170.631
hpcc: EP-DGEMM                                              32.49590        32.75707        33.59693
hpcc: Rand Ring Bandwidth                                   3.42188         3.33117         3.31499
crafty: Elapsed Time                                        9240651         9209287         8978346
stockfish: Total Time                                       79813741        79613988        81375940
sqlite-speedtest: Timed Time - Size 1,000                   57.367          56.442          56.224
mkl-dnn: Convolution Batch conv_googlenet_v3 - f32          52.2339         52.8269         53.2454
hpcc: Max Ping Pong Bandwidth                               22614.628       22951.428       23030.709
mt-dgemm: Sustained Floating-Point Rate                     8.781111        8.761475        8.625776
hpcc: G-Rand Access                                         0.16431         0.16722         0.16679
compress-zstd: ubuntu-16.04.3-server-i386.img, Level 19     9.994           10.168          10.140
fftw: Float + SSE - 1D FFT Size 32                          15513           15488           15271
mkl-dnn: Convolution Batch conv_alexnet - f32               126.141         126.971         125.128
fftw: Float + SSE - 2D FFT Size 32                          45957           46350           45734
encode-mp3: WAV To MP3                                      6.710           6.622           6.697
nginx: Static Web Page Serving                              43138.29        43673.89        43510.97
rocksdb: Rand Fill                                          927119          916114          919822
qmcpack                                                     1878            1895.1          1900.3
compress-xz: ubuntu-16.04.3-server-i386.img, Level 9        20.017          19.867          19.829
rocksdb: Seq Fill                                           1018223         1010840         1019279
hpcc: EP-STREAM Triad                                       1.68426         1.68874         1.69770
askap: tConvolve OpenMP - Gridding                          5471.5          5435.31         5433.8
rocksdb: Read While Writing                                 4868266         4901767         4898750
radiance: Serial                                            559.022         556.135         555.693
sqlite: Threads / Copies: 1                                 14.203          14.232          14.281
askap: tConvolve OpenMP - Degridding                        4138.92         4117.58         4117.58
mkl-dnn: Recurrent Neural Network Training - f32            194.173         194.665         195.071
ttsiod-renderer: Phong Rendering With Soft-Shadow Mapping   946.107         950.330         950.184
encode-flac: WAV To FLAC                                    8.079           8.044           8.073
mkl-dnn: Deconvolution Batch deconv_1d - f32                2.30735         2.31627         2.31590
rocksdb: Rand Fill Sync                                     24502           24409           24495
hpcc: G-Ptrans                                              5.79663         5.78791         5.80751
minife: Small                                               7745.54         7720.98         7735.16
hpcc: G-HPL                                                 63.54490        63.68487        63.71647
askap: tConvolve MT - Degridding                            3369.74         3366.19         3363.82
askap: tConvolve MT - Gridding                              1946.68         1949.81         1948.42
gromacs: Water Benchmark                                    2.514           2.517           2.515
openssl: RSA 4096-bit Performance                           7178.7          7182.9          -
mrbayes: Primate Phylogeny Analysis                         71.834          69.132          68.016
hpcc: Rand Ring Latency                                     0.45479         0.45234         0.46971

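Reading the overview as relative gains requires normalizing for direction: for "more is better" metrics the ratio is candidate/baseline, for "fewer is better" metrics (seconds, ms) it is baseline/candidate, so a ratio above 1.0 always means the LTO build came out ahead. The Python sketch below does this for a few rows copied from this result file. The choice of rows and the geometric-mean summary are illustrative assumptions, not how OpenBenchmarking.org computes its own summaries.

```python
import math


def speedup(baseline: float, candidate: float, higher_is_better: bool) -> float:
    """Ratio > 1.0 means `candidate` beat `baseline`, whatever the metric's direction."""
    return candidate / baseline if higher_is_better else baseline / candidate


def geometric_mean(ratios: list[float]) -> float:
    """Geometric mean computed through logarithms, a common way to average ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# (test, -O3 -march=native, -O3 -march=native -flto, higher_is_better)
# Values copied from this result file; the subset is an arbitrary illustration.
RESULTS = [
    ("build-imagemagick: Time To Compile (s)", 15.336, 75.246, False),
    ("byte: Dhrystone 2 (LPS)", 47812720.8, 67070357.6, True),
    ("fftw: Stock - 1D FFT Size 32 (Mflops)", 13149, 11201, True),
    ("tscp: AI Chess Performance (nodes/s)", 1350615, 1422472, True),
]

if __name__ == "__main__":
    ratios = []
    for name, base, lto, higher_is_better in RESULTS:
        ratio = speedup(base, lto, higher_is_better)
        ratios.append(ratio)
        print(f"{name}: {ratio:.3f}x")
    print(f"geometric mean of the ratios above: {geometric_mean(ratios):.3f}x")
```

Dhrystone 2 comes out at roughly 1.40x with -flto, while the ImageMagick build-time row lands near 0.20x, meaning the LTO build took about 4.9 times as long. A geometric mean treats a 2x gain and a 2x loss symmetrically (they cancel to 1.0), which an arithmetic mean of ratios does not; it is still only as representative as the rows fed into it.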
Timed ImageMagick Compilation 6.9.0 - Time To Compile (Seconds, Fewer Is Better)
  -O3 -march=native                         15.34  (SE +/- 0.16, N = 3)
  -O3 -march=native -flto                   75.25  (SE +/- 0.02, N = 3)
  -O3 -march=native -flto -fwhole-program   74.87  (SE +/- 0.25, N = 3)

BYTE Unix Benchmark 3.6 - Computational Test: Dhrystone 2 (LPS, More Is Better)
  -O3 -march=native                         47812720.8  (SE +/- 256405.96, N = 3)
  -O3 -march=native -flto                   67070357.6  (SE +/- 508456.63, N = 3)
  -O3 -march=native -flto -fwhole-program   64880292.8  (SE +/- 276889.23, N = 3)
  1. (CC) gcc options: -O3 -march=native

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native                         13149  (SE +/- 27.74, N = 3)
  -O3 -march=native -flto                   11201  (SE +/- 55.87, N = 3)
  -O3 -march=native -flto -fwhole-program   11069  (SE +/- 3.21, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  -O3 -march=native                         8209.1  (SE +/- 138.82, N = 3)
  -O3 -march=native -flto                   8969.7  (SE +/- 155.20, N = 3)
  -O3 -march=native -flto -fwhole-program   8540.7  (SE +/- 111.43, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
  -O3 -march=native                         1350615  (SE +/- 1620.83, N = 5)
  -O3 -march=native -flto                   1422472  (SE +/- 1795.91, N = 5)
  -O3 -march=native -flto -fwhole-program   1418074  (SE +/- 1462.93, N = 5)
  1. (CC) gcc options: -O3 -march=native

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  -O3 -march=native                         23281  (SE +/- 190.93, N = 3)
  -O3 -march=native -flto                   24505  (SE +/- 153.78, N = 3)
  -O3 -march=native -flto -fwhole-program   24289  (SE +/- 143.81, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

HPC Challenge 1.5.0 - Test / Class: G-Ffte (GFLOP/s, More Is Better)
  -O3 -march=native                         15.20  (SE +/- 0.07, N = 3)
  -O3 -march=native -flto                   15.22  (SE +/- 0.45, N = 3)
  -O3 -march=native -flto -fwhole-program   15.98  (SE +/- 0.37, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native                         13217  (SE +/- 6.49, N = 3)
  -O3 -march=native -flto                   12677  (SE +/- 14.19, N = 3)
  -O3 -march=native -flto -fwhole-program   12595  (SE +/- 72.89, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better)
  -O3 -march=native                         670670.78  (SE +/- 654.90, N = 3)
  -O3 -march=native -flto                   701920.06  (SE +/- 375.35, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
  -O3 -march=native                         4893.66  (SE +/- 32.73, N = 3)
  -O3 -march=native -flto                   4766.40  (SE +/- 22.94, N = 3)
  -O3 -march=native -flto -fwhole-program   4684.70  (SE +/- 63.96, N = 4)
  1. (CC) gcc options: -O3 -march=native -mavx2

Facebook RocksDB 6.3.6 - Test: Random Read (Op/s, More Is Better)
  -O3 -march=native                         145113856  (SE +/- 154500.46, N = 3)
  -O3 -march=native -flto                   147319777  (SE +/- 1236531.46, N = 3)
  -O3 -march=native -flto -fwhole-program   141628836  (SE +/- 84413.94, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only (TPS, More Is Better)
  -O3 -march=native                         676431.71  (SE +/- 3573.89, N = 3)
  -O3 -march=native -flto                   703431.18  (SE +/- 5480.07, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Radiance Benchmark 5.0 - Test: SMP Parallel (Seconds, Fewer Is Better)
  -O3 -march=native                         168.59
  -O3 -march=native -flto                   174.53
  -O3 -march=native -flto -fwhole-program   170.63

HPC Challenge 1.5.0 - Test / Class: EP-DGEMM (GFLOPS, More Is Better)
  -O3 -march=native                         32.50  (SE +/- 0.16, N = 3)
  -O3 -march=native -flto                   32.76  (SE +/- 0.64, N = 3)
  -O3 -march=native -flto -fwhole-program   33.60  (SE +/- 0.27, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
  -O3 -march=native                         3.42188  (SE +/- 0.03737, N = 3)
  -O3 -march=native -flto                   3.33117  (SE +/- 0.02086, N = 3)
  -O3 -march=native -flto -fwhole-program   3.31499  (SE +/- 0.08613, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

Crafty 25.2 - Elapsed Time (Nodes Per Second, More Is Better)
  -O3 -march=native                         9240651  (SE +/- 19679.93, N = 3)
  -O3 -march=native -flto                   9209287  (SE +/- 13239.35, N = 3)
  -O3 -march=native -flto -fwhole-program   8978346  (SE +/- 5847.06, N = 3)
  1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm

Stockfish 9 - Total Time (Nodes Per Second, More Is Better)
  -O3 -march=native                         79813741  (SE +/- 1040327.44, N = 3)
  -O3 -march=native -flto                   79613988  (SE +/- 774628.71, N = 3)
  -O3 -march=native -flto -fwhole-program   81375940  (SE +/- 729395.20, N = 3)
  1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, Fewer Is Better)
  -O3 -march=native                         57.37  (SE +/- 0.11, N = 3)
  -O3 -march=native -flto                   56.44  (SE +/- 0.10, N = 3)
  -O3 -march=native -flto -fwhole-program   56.22  (SE +/- 0.06, N = 3)
  1. (CC) gcc options: -O3 -march=native -ldl -lz -lpthread

MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native                         52.23  (SE +/- 0.18, N = 3; MIN: 51.26)
  -O3 -march=native -flto                   52.83  (SE +/- 0.34, N = 3; MIN: 51.44)
  -O3 -march=native -flto -fwhole-program   53.25  (SE +/- 0.50, N = 3; MIN: 51.66)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
  -O3 -march=native                         22614.63  (SE +/- 238.70, N = 3)
  -O3 -march=native -flto                   22951.43  (SE +/- 301.57, N = 3)
  -O3 -march=native -flto -fwhole-program   23030.71  (SE +/- 509.86, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
  -O3 -march=native                         8.781111  (SE +/- 0.116188, N = 3)
  -O3 -march=native -flto                   8.761475  (SE +/- 0.132221, N = 3)
  -O3 -march=native -flto -fwhole-program   8.625776  (SE +/- 0.076167, N = 15)
  1. (CC) gcc options: -O3 -march=native -fopenmp

HPC Challenge 1.5.0 - Test / Class: G-Random Access (GUP/s, More Is Better)
  -O3 -march=native                         0.16431  (SE +/- 0.00098, N = 3)
  -O3 -march=native -flto                   0.16722  (SE +/- 0.00027, N = 3)
  -O3 -march=native -flto -fwhole-program   0.16679  (SE +/- 0.00041, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

Zstd Compression 1.3.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
  -O3 -march=native                         9.994  (SE +/- 0.119, N = 5)
  -O3 -march=native -flto                   10.168  (SE +/- 0.025, N = 3)
  -O3 -march=native -flto -fwhole-program   10.140  (SE +/- 0.092, N = 3)
  1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native                         15513  (SE +/- 84.89, N = 3)
  -O3 -march=native -flto                   15488  (SE +/- 25.21, N = 3)
  -O3 -march=native -flto -fwhole-program   15271  (SE +/- 111.55, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_alexnet - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native                         126.14  (SE +/- 0.75, N = 3; MIN: 124.08)
  -O3 -march=native -flto                   126.97  (SE +/- 0.18, N = 3; MIN: 125.84)
  -O3 -march=native -flto -fwhole-program   125.13  (SE +/- 1.81, N = 3; MIN: 120.75)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native                         45957  (SE +/- 23.68, N = 3)
  -O3 -march=native -flto                   46350  (SE +/- 82.72, N = 3)
  -O3 -march=native -flto -fwhole-program   45734  (SE +/- 49.75, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm

LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better)
  -O3 -march=native                         6.710  (SE +/- 0.016, N = 3)
  -O3 -march=native -flto                   6.622  (SE +/- 0.067, N = 3)
  -O3 -march=native -flto -fwhole-program   6.697  (SE +/- 0.007, N = 3)
  1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm

NGINX Benchmark 1.9.9 - Static Web Page Serving (Requests Per Second, More Is Better)
  -O3 -march=native                         43138.29  (SE +/- 628.72, N = 3)
  -O3 -march=native -flto                   43673.89  (SE +/- 294.79, N = 3)
  -O3 -march=native -flto -fwhole-program   43510.97  (SE +/- 103.71, N = 3)
  1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native

Facebook RocksDB 6.3.6 - Test: Random Fill (Op/s, More Is Better)
  -O3 -march=native                         927119  (SE +/- 7063.47, N = 3)
  -O3 -march=native -flto                   916114  (SE +/- 8977.10, N = 3)
  -O3 -march=native -flto -fwhole-program   919822  (SE +/- 2437.07, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

QMCPACK 3.8 (Total Execution Time - Seconds, Fewer Is Better)
  -O3 -march=native                         1878.0
  -O3 -march=native -flto                   1895.1
  -O3 -march=native -flto -fwhole-program   1900.3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm

XZ Compression 5.2.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9 (Seconds, Fewer Is Better)
  -O3 -march=native                         20.02  (SE +/- 0.03, N = 3)
  -O3 -march=native -flto                   19.87  (SE +/- 0.06, N = 3)
  -O3 -march=native -flto -fwhole-program   19.83  (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -pthread -fvisibility=hidden -O3 -march=native

Facebook RocksDB 6.3.6 - Test: Sequential Fill (Op/s, More Is Better)
  -O3 -march=native                         1018223  (SE +/- 3160.54, N = 3)
  -O3 -march=native -flto                   1010840  (SE +/- 923.75, N = 3)
  -O3 -march=native -flto -fwhole-program   1019279  (SE +/- 1987.24, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

HPC Challenge 1.5.0 - Test / Class: EP-STREAM Triad (GB/s, More Is Better)
  -O3 -march=native                         1.68426  (SE +/- 0.00200, N = 3)
  -O3 -march=native -flto                   1.68874  (SE +/- 0.00593, N = 3)
  -O3 -march=native -flto -fwhole-program   1.69770  (SE +/- 0.01665, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

ASKAP 2018-11-10 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native                         5471.50  (SE +/- 37.73, N = 3)
  -O3 -march=native -flto                   5435.31  (SE +/- 64.06, N = 3)
  -O3 -march=native -flto -fwhole-program   5433.80  (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -lpthread

Facebook RocksDB 6.3.6 - Test: Read While Writing (Op/s, More Is Better)
  -O3 -march=native                         4868266  (SE +/- 30555.74, N = 3)
  -O3 -march=native -flto                   4901767  (SE +/- 9839.07, N = 3)
  -O3 -march=native -flto -fwhole-program   4898750  (SE +/- 22374.25, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

Radiance Benchmark 5.0 - Test: Serial (Seconds, Fewer Is Better)
  -O3 -march=native                         559.02
  -O3 -march=native -flto                   556.14
  -O3 -march=native -flto -fwhole-program   555.69

SQLite 3.30.1 - Threads / Copies: 1 (Seconds, Fewer Is Better)
  -O3 -march=native                         14.20  (SE +/- 0.01, N = 3)
  -O3 -march=native -flto                   14.23  (SE +/- 0.05, N = 3)
  -O3 -march=native -flto -fwhole-program   14.28  (SE +/- 0.05, N = 3)
  1. (CC) gcc options: -O3 -march=native -lz -lm -ldl -lpthread

ASKAP 2018-11-10 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native                         4138.92  (SE +/- 21.33, N = 3)
  -O3 -march=native -flto                   4117.58  (SE +/- 21.33, N = 3)
  -O3 -march=native -flto -fwhole-program   4117.58  (SE +/- 21.33, N = 3)
  1. (CXX) g++ options: -lpthread

MKL-DNN DNNL 1.1 - Harness: Recurrent Neural Network Training - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native                         194.17  (SE +/- 0.42, N = 3; MIN: 192.29)
  -O3 -march=native -flto                   194.67  (SE +/- 0.33, N = 3; MIN: 192.64)
  -O3 -march=native -flto -fwhole-program   195.07  (SE +/- 0.25, N = 3; MIN: 193.44)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

TTSIOD 3D Renderer 2.3b - Phong Rendering With Soft-Shadow Mapping (FPS, More Is Better)
  -O3 -march=native                         946.11  (SE +/- 3.96, N = 3)
  -O3 -march=native -flto                   950.33  (SE +/- 0.32, N = 3)
  -O3 -march=native -flto -fwhole-program   950.18  (SE +/- 1.61, N = 3)
  1. (CXX) g++ options: -O3 -march=native -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -fopenmp -fwhole-program -lstdc++

FLAC Audio Encoding 1.3.2 - WAV To FLAC (Seconds, Fewer Is Better)
  -O3 -march=native                         8.079  (SE +/- 0.005, N = 5)
  -O3 -march=native -flto                   8.044  (SE +/- 0.006, N = 5)
  -O3 -march=native -flto -fwhole-program   8.073  (SE +/- 0.004, N = 5)
  1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm

MKL-DNN DNNL 1.1 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native                         2.30735  (SE +/- 0.00889, N = 3; MIN: 2.22)
  -O3 -march=native -flto                   2.31627  (SE +/- 0.00389, N = 4; MIN: 2.25)
  -O3 -march=native -flto -fwhole-program   2.31590  (SE +/- 0.00798, N = 3; MIN: 2.24)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl

Facebook RocksDB 6.3.6 - Test: Random Fill Sync (Op/s, More Is Better)
  -O3 -march=native                         24502  (SE +/- 29.87, N = 3)
  -O3 -march=native -flto                   24409  (SE +/- 31.00, N = 3)
  -O3 -march=native -flto -fwhole-program   24495  (SE +/- 34.81, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread

HPC Challenge 1.5.0 - Test / Class: G-Ptrans (GB/s, More Is Better)
  -O3 -march=native                         5.79663  (SE +/- 0.02501, N = 3)
  -O3 -march=native -flto                   5.78791  (SE +/- 0.01376, N = 3)
  -O3 -march=native -flto -fwhole-program   5.80751  (SE +/- 0.01388, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
  -O3 -march=native                         7745.54  (SE +/- 18.94, N = 3)
  -O3 -march=native -flto                   7720.98  (SE +/- 12.76, N = 3)
  -O3 -march=native -flto -fwhole-program   7735.16  (SE +/- 8.54, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
  -O3 -march=native                         63.54  (SE +/- 0.05, N = 3)
  -O3 -march=native -flto                   63.68  (SE +/- 0.14, N = 3)
  -O3 -march=native -flto -fwhole-program   63.72  (SE +/- 0.17, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

ASKAP 2018-11-10 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native                         3369.74  (SE +/- 3.60, N = 3)
  -O3 -march=native -flto                   3366.19  (SE +/- 2.36, N = 3)
  -O3 -march=native -flto -fwhole-program   3363.82  (SE +/- 1.18, N = 3)
  1. (CXX) g++ options: -lpthread

ASKAP 2018-11-10 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native                         1946.68  (SE +/- 6.64, N = 3)
  -O3 -march=native -flto                   1949.81  (SE +/- 2.80, N = 3)
  -O3 -march=native -flto -fwhole-program   1948.42  (SE +/- 3.43, N = 3)
  1. (CXX) g++ options: -lpthread

GROMACS 2019.4 - Water Benchmark (Ns Per Day, More Is Better)
  -O3 -march=native                         2.514  (SE +/- 0.004, N = 3)
  -O3 -march=native -flto                   2.517  (SE +/- 0.001, N = 3)
  -O3 -march=native -flto -fwhole-program   2.515  (SE +/- 0.004, N = 3)
  1. (CXX) g++ options: -mavx2 -mfma -O3 -march=native -std=c++11 -funroll-all-loops -pthread -lrt -lpthread -lm

OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better)
  -O3 -march=native                         7178.7  (SE +/- 21.22, N = 3)
  -O3 -march=native -flto                   7182.9  (SE +/- 21.57, N = 3)
  1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl

Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
  -O3 -march=native                         71.83  (SE +/- 1.38, N = 13)
  -O3 -march=native -flto                   69.13  (SE +/- 0.87, N = 4)
  -O3 -march=native -flto -fwhole-program   68.02  (SE +/- 0.14, N = 3)
  1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -march=native -lm

HPC Challenge 1.5.0 - Test / Class: Random Ring Latency (usecs, Fewer Is Better)
  -O3 -march=native                         0.45479  (SE +/- 0.00087, N = 3)
  -O3 -march=native -flto                   0.45234  (SE +/- 0.00082, N = 3)
  -O3 -march=native -flto -fwhole-program   0.46971  (SE +/- 0.01816, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3

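Many of the gaps between configurations in this file are comparable to the run-to-run noise reported next to them (SE +/- ..., N = ...). One rough way to check is to express the difference between two means in units of their combined standard error, treating the runs as independent. This is an assumption on our part, not a criterion the Phoronix Test Suite applies. The Python sketch uses the MrBayes and Random Ring Latency numbers from this result file (-O3 -march=native versus -O3 -march=native -flto -fwhole-program).

```python
import math


def separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference between two means in units of their combined standard error.

    Assumes independent runs, so SE of the difference = sqrt(se_a**2 + se_b**2).
    """
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values copied from this result file: baseline vs -flto -fwhole-program.
mrbayes = separation(71.83, 1.38, 68.02, 0.14)
ring_latency = separation(0.45479, 0.00087, 0.46971, 0.01816)

# Rough rule of thumb (an assumption, not a PTS criterion): a gap below about
# 2 combined SEs is hard to separate from run-to-run noise.
for name, z in (("MrBayes", mrbayes), ("Random Ring Latency", ring_latency)):
    print(f"{name}: {z:.2f} combined SEs")
```

MrBayes comes out roughly 2.7 combined SEs apart, so its approximately 5% gain from -flto -fwhole-program stands out from the noise; Random Ring Latency is under one combined SE, so its apparent regression cannot be told apart from noise. With only three to fifteen runs per result, a proper t-test would be more defensible than this shortcut.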
Phoronix Test Suite v10.8.4