AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.
-O3 -march=native
  Environment Notes: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
  Compiler Notes: --disable-multilib --enable-checking=release
  Disk Notes: NONE / errors=remount-ro,relatime,rw
  Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301025
  Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + tsx_async_abort: Not affected
-O3 -march=native -flto
  Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
  Compiler Notes: --disable-multilib --enable-checking=release
  Disk Notes: NONE / errors=remount-ro,relatime,rw
  Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301025
  Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + tsx_async_abort: Not affected
-O3 -march=native -flto -fwhole-program
  Environment Notes: CXXFLAGS="-O3 -march=native -flto -fwhole-program" CFLAGS="-O3 -march=native -flto -fwhole-program"
  Compiler Notes: --disable-multilib --enable-checking=release
  Disk Notes: NONE / errors=remount-ro,relatime,rw
  Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301025
  Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + tsx_async_abort: Not affected

Test system
  Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads)
  Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 32768MB
  Disk: 1000GB Sabrent Rocket 4.0 1TB
  Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz)
  Audio: AMD Baffin HDMI/DP
  Monitor: ASUS VP28U
  Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723
  OS: Ubuntu 19.10
  Kernel: 5.4.0-nvme-hwmon (x86_64)
  Desktop: GNOME Shell 3.34.1
  Display Server: X Server 1.20.5
  Display Driver: modesetting 1.20.5
  OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0)
  Compiler: GCC 10.0.0 20191208
  File-System: ext4
  Screen Resolution: 3840x2160
FFTW
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions.
FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native -flto -fwhole-program: 11069 (SE +/- 3.21, N = 3)
  -O3 -march=native -flto: 11201 (SE +/- 55.87, N = 3)
  -O3 -march=native: 13149 (SE +/- 27.74, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
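For reference, FFTW's forward transform of N inputs x[n] is X[k] = sum over n = 0..N-1 of x[n] * exp(-2*pi*i*k*n/N). The Python sketch below evaluates that definition directly in O(N^2) time. It is only meant to show what the "1D FFT Size 32" style tests compute. FFTW itself uses fast O(N log N) algorithms, and `naive_dft` is an illustrative name, not part of FFTW's API.

```python
import cmath


def naive_dft(x):
    """Evaluate X[k] = sum_n x[n] * exp(-2j*pi*k*n/N) directly, in O(N^2) time."""
    size = len(x)
    spectrum = []
    for k in range(size):
        total = 0j
        for n, value in enumerate(x):
            total += value * cmath.exp(-2j * cmath.pi * k * n / size)
        spectrum.append(total)
    return spectrum


# A unit impulse has a flat spectrum: every output bin has magnitude 1.
print([round(abs(v), 6) for v in naive_dft([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
```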
FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  -O3 -march=native: 8209.1 (SE +/- 138.82, N = 3)
  -O3 -march=native -flto -fwhole-program: 8540.7 (SE +/- 111.43, N = 3)
  -O3 -march=native -flto: 8969.7 (SE +/- 155.20, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
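The charts give only absolute figures. A small helper like the one below converts them into percentages relative to the plain -O3 -march=native baseline. It is a sketch, and both the helper's name and its sign convention are assumptions, not something OpenBenchmarking.org defines. For "Fewer Is Better" results it compares baseline/value, so a positive number always means the configuration did better. Applied to the 2D FFT Size 4096 result above, it shows LTO helping here, even though LTO lost in the Stock 1D FFT Size 32 case.

```python
def percent_change(value, baseline, higher_is_better=True):
    """Return how much better (+) or worse (-) `value` is than `baseline`, in percent."""
    ratio = value / baseline if higher_is_better else baseline / value
    return (ratio - 1.0) * 100.0


baseline = 8209.1  # FFTW Stock, 2D FFT Size 4096, -O3 -march=native (Mflops)
for label, mflops in [("-flto", 8969.7), ("-flto -fwhole-program", 8540.7)]:
    print(label, round(percent_change(mflops, baseline), 1))
# -flto 9.3
# -flto -fwhole-program 4.0
```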
TSCP
This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark.
TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
  -O3 -march=native: 1350615 (SE +/- 1620.83, N = 5)
  -O3 -march=native -flto -fwhole-program: 1418074 (SE +/- 1462.93, N = 5)
  -O3 -march=native -flto: 1422472 (SE +/- 1795.91, N = 5)
  1. (CC) gcc options: -O3 -march=native
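Each result reports "SE +/- x, N = y", a standard error over N runs. This page does not state the exact formula OpenBenchmarking.org uses. Assuming the usual definition, the sample standard deviation divided by sqrt(N), it can be reproduced from raw per-run figures as below. The run values in the example are made up for illustration.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    return statistics.stdev(samples) / math.sqrt(len(samples))


runs = [10.0, 12.0, 14.0]  # hypothetical per-run results, N = 3
print(f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}")  # SE +/- 1.15, N = 3
```

With only N = 3 or N = 5 runs, a gap of one or two standard errors between configurations is hard to distinguish from run-to-run noise.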
FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  -O3 -march=native: 23281 (SE +/- 190.93, N = 3)
  -O3 -march=native -flto -fwhole-program: 24289 (SE +/- 143.81, N = 3)
  -O3 -march=native -flto: 24505 (SE +/- 153.78, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
HPC Challenge
HPC Challenge (HPCC) is a cluster-focused benchmark consisting of the HPL Linpack TPP benchmark, DGEMM, STREAM, PTRANS, RandomAccess, FFT, and communication bandwidth and latency tests. This HPC Challenge test profile attempts to ship with standard yet versatile configuration/input files, though they can be modified.
HPC Challenge 1.5.0 - Test / Class: G-Ffte (GFLOP/s, More Is Better)
  -O3 -march=native: 15.20 (SE +/- 0.07, N = 3)
  -O3 -march=native -flto: 15.22 (SE +/- 0.45, N = 3)
  -O3 -march=native -flto -fwhole-program: 15.98 (SE +/- 0.37, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native -flto -fwhole-program: 12595 (SE +/- 72.89, N = 3)
  -O3 -march=native -flto: 12677 (SE +/- 14.19, N = 3)
  -O3 -march=native: 13217 (SE +/- 6.49, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
PostgreSQL pgbench
This is a simple benchmark of PostgreSQL using pgbench.
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better)
  -O3 -march=native: 670670.78 (SE +/- 654.90, N = 3)
  -O3 -march=native -flto: 701920.06 (SE +/- 375.35, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
Himeno Benchmark
The Himeno benchmark is a linear solver for the pressure Poisson equation using the point-Jacobi method.
Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
  -O3 -march=native -flto -fwhole-program: 4684.70 (SE +/- 63.96, N = 4)
  -O3 -march=native -flto: 4766.40 (SE +/- 22.94, N = 3)
  -O3 -march=native: 4893.66 (SE +/- 32.73, N = 3)
  1. (CC) gcc options: -O3 -march=native -mavx2
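In the point-Jacobi method, each sweep updates every grid point using only its neighbours' values from the previous iterate. The Himeno kernel itself works on a 3D grid with a larger stencil, coefficient arrays and a relaxation factor. The sketch below is a deliberately simplified 2D, 5-point version for d2p/dx2 + d2p/dy2 = f with fixed boundary values, meant only to show the update rule. `jacobi_step` is a hypothetical name.

```python
def jacobi_step(p, rhs, h):
    """One point-Jacobi sweep for d2p/dx2 + d2p/dy2 = rhs on a uniform 2D grid.

    Interior points are computed only from the previous iterate `p`
    (that is what makes it Jacobi rather than Gauss-Seidel); boundary
    values are held fixed. Returns the new grid and the sum of squared
    changes, which serves as a simple convergence measure.
    """
    rows, cols = len(p), len(p[0])
    new_p = [row[:] for row in p]
    change = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            value = 0.25 * (p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                            - h * h * rhs[i][j])
            change += (value - p[i][j]) ** 2
            new_p[i][j] = value
    return new_p, change


# 6 x 6 grid: boundary fixed at 1.0, interior starts at 0.0, zero right-hand side.
n = 6
p = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)] for i in range(n)]
rhs = [[0.0] * n for _ in range(n)]
for _ in range(200):
    p, change = jacobi_step(p, rhs, h=1.0)
print(round(p[2][3], 4))  # 1.0: the interior relaxes to the boundary value
```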
Facebook RocksDB
This is a benchmark of Facebook's RocksDB, an embeddable persistent key-value store for fast storage based on Google's LevelDB.
Facebook RocksDB 6.3.6 - Test: Random Read (Op/s, More Is Better)
  -O3 -march=native -flto -fwhole-program: 141628836 (SE +/- 84413.94, N = 3)
  -O3 -march=native: 145113856 (SE +/- 154500.46, N = 3)
  -O3 -march=native -flto: 147319777 (SE +/- 1236531.46, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
PostgreSQL pgbench 12.0 - Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only (TPS, More Is Better)
  -O3 -march=native: 676431.71 (SE +/- 3573.89, N = 3)
  -O3 -march=native -flto: 703431.18 (SE +/- 5480.07, N = 3)
  1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
HPC Challenge 1.5.0 - Test / Class: EP-DGEMM (GFLOPS, More Is Better)
  -O3 -march=native: 32.50 (SE +/- 0.16, N = 3)
  -O3 -march=native -flto: 32.76 (SE +/- 0.64, N = 3)
  -O3 -march=native -flto -fwhole-program: 33.60 (SE +/- 0.27, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
HPC Challenge 1.5.0 - Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
  -O3 -march=native -flto -fwhole-program: 3.31499 (SE +/- 0.08613, N = 3)
  -O3 -march=native -flto: 3.33117 (SE +/- 0.02086, N = 3)
  -O3 -march=native: 3.42188 (SE +/- 0.03737, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
Crafty
This is a performance test of Crafty, an advanced open-source chess engine.
Crafty 25.2 - Elapsed Time (Nodes Per Second, More Is Better)
  -O3 -march=native -flto -fwhole-program: 8978346 (SE +/- 5847.06, N = 3)
  -O3 -march=native -flto: 9209287 (SE +/- 13239.35, N = 3)
  -O3 -march=native: 9240651 (SE +/- 19679.93, N = 3)
  1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm
Stockfish
This is a test of Stockfish, an advanced C++11 chess engine that can scale up to 128 CPU cores.
Stockfish 9 - Total Time (Nodes Per Second, More Is Better)
  -O3 -march=native -flto: 79613988 (SE +/- 774628.71, N = 3)
  -O3 -march=native: 79813741 (SE +/- 1040327.44, N = 3)
  -O3 -march=native -flto -fwhole-program: 81375940 (SE +/- 729395.20, N = 3)
  1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -flto -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt
SQLite Speedtest
This is a benchmark of SQLite's speedtest1 benchmark program with an increased problem size of 1,000.
SQLite Speedtest 3.30 - Timed Time - Size 1,000 (Seconds, Fewer Is Better)
  -O3 -march=native: 57.37 (SE +/- 0.11, N = 3)
  -O3 -march=native -flto: 56.44 (SE +/- 0.10, N = 3)
  -O3 -march=native -flto -fwhole-program: 56.22 (SE +/- 0.06, N = 3)
  1. (CC) gcc options: -O3 -march=native -ldl -lz -lpthread
MKL-DNN DNNL
This is a test of Intel MKL-DNN (DNNL, the Deep Neural Network Library), an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported.
MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native -flto -fwhole-program: 53.25 (SE +/- 0.50, N = 3; MIN: 51.66)
  -O3 -march=native -flto: 52.83 (SE +/- 0.34, N = 3; MIN: 51.44)
  -O3 -march=native: 52.23 (SE +/- 0.18, N = 3; MIN: 51.26)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
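The benchdnn convolution harness times layers such as conv_googlenet_v3 and conv_alexnet. At its core, each output element is a sum of products between a kernel window and the input. As in most deep-learning libraries, "convolution" here means cross-correlation, so the kernel is not flipped. The sketch below is a single-channel, stride-1, no-padding version in plain Python. Real layers add batch, channel, stride and padding dimensions, and DNNL uses heavily optimized kernels rather than this direct loop. `conv2d_valid` is an illustrative name.

```python
def conv2d_valid(image, kernel):
    """Direct 2D 'valid' convolution as used in neural networks.

    No kernel flip (i.e. cross-correlation), one input and one output
    channel, stride 1, no padding: output size is (H - kh + 1) x (W - kw + 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = acc
    return out


print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))
# [[6.0, 8.0], [12.0, 14.0]]
```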
HPC Challenge 1.5.0 - Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
  -O3 -march=native: 22614.63 (SE +/- 238.70, N = 3)
  -O3 -march=native -flto: 22951.43 (SE +/- 301.57, N = 3)
  -O3 -march=native -flto -fwhole-program: 23030.71 (SE +/- 509.86, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
ACES DGEMM
This is a multi-threaded DGEMM benchmark.
ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
  -O3 -march=native -flto -fwhole-program: 8.625776 (SE +/- 0.076167, N = 15)
  -O3 -march=native -flto: 8.761475 (SE +/- 0.132221, N = 3)
  -O3 -march=native: 8.781111 (SE +/- 0.116188, N = 3)
  1. (CC) gcc options: -O3 -march=native -fopenmp
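DGEMM is the BLAS double-precision general matrix multiply, C <- alpha*A*B + beta*C. The reference sketch below shows the operation and the conventional 2*m*n*k floating-point operation count, which turns a run time into a GFLOP/s figure. Whether ACES DGEMM counts operations exactly this way is an assumption. The helper names and the example sizes are hypothetical.

```python
def dgemm(alpha, a, b, beta, c):
    """Reference DGEMM on nested lists: return alpha * A @ B + beta * C.

    A is m x k, B is k x n and C is m x n; C is not modified.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j] for j in range(n)]
        for i in range(m)
    ]


def gemm_gflops(m, n, k, seconds):
    """Rate in GFLOP/s, counting one multiply and one add per term: 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9


print(dgemm(1.0, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.0, [[0, 0], [0, 0]]))
# [[19.0, 22.0], [43.0, 50.0]]
print(gemm_gflops(1000, 1000, 1000, 0.25))  # 8.0
```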
HPC Challenge 1.5.0 - Test / Class: G-Random Access (GUP/s, More Is Better)
  -O3 -march=native: 0.16431 (SE +/- 0.00098, N = 3)
  -O3 -march=native -flto -fwhole-program: 0.16679 (SE +/- 0.00041, N = 3)
  -O3 -march=native -flto: 0.16722 (SE +/- 0.00027, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
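RandomAccess is reported in giga-updates per second (GUP/s). It measures how fast pseudo-random 64-bit values can be XORed into pseudo-random slots of a large table. The sketch below models that update loop in plain Python. It uses a shift-and-XOR generator with feedback value 0x7, modeled on the HPCC reference code. The real benchmark's table sizes, seeding and parallel decomposition differ, and the seed and helper names here are hypothetical. Because XOR is its own inverse, replaying the same update stream restores the table, which makes a convenient correctness check for the sketch.

```python
POLY = 0x7                # feedback value XORed in when the top bit shifts out
MASK64 = (1 << 64) - 1    # keep values within 64 bits


def next_random(ran):
    """Advance the 64-bit stream: shift left one bit, XOR POLY if the top bit was set."""
    top_bit = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if top_bit else ran


def apply_updates(table, num_updates, seed=0x9E3779B97F4A7C15):
    """XOR pseudo-random values into pseudo-random slots; len(table) must be a power of two."""
    mask = len(table) - 1
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        table[ran & mask] ^= ran
    return table


def gups(num_updates, seconds):
    """Giga-updates per second."""
    return num_updates / seconds / 1e9


table = apply_updates(list(range(16)), 100)
apply_updates(table, 100)          # same stream again undoes every XOR
print(table == list(range(16)))    # True
```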
Zstd Compression
This test measures the time needed to compress a sample file (an Ubuntu file-system image) using Zstd compression.
Zstd Compression 1.3.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
  -O3 -march=native -flto: 10.168 (SE +/- 0.025, N = 3)
  -O3 -march=native -flto -fwhole-program: 10.140 (SE +/- 0.092, N = 3)
  -O3 -march=native: 9.994 (SE +/- 0.119, N = 5)
  1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native -flto -fwhole-program: 15271 (SE +/- 111.55, N = 3)
  -O3 -march=native -flto: 15488 (SE +/- 25.21, N = 3)
  -O3 -march=native: 15513 (SE +/- 84.89, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
MKL-DNN DNNL 1.1 - Harness: Convolution Batch conv_alexnet - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native -flto: 126.97 (SE +/- 0.18, N = 3; MIN: 125.84)
  -O3 -march=native: 126.14 (SE +/- 0.75, N = 3; MIN: 124.08)
  -O3 -march=native -flto -fwhole-program: 125.13 (SE +/- 1.81, N = 3; MIN: 120.75)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
  -O3 -march=native -flto -fwhole-program: 45734 (SE +/- 49.75, N = 3)
  -O3 -march=native: 45957 (SE +/- 23.68, N = 3)
  -O3 -march=native -flto: 46350 (SE +/- 82.72, N = 3)
  1. (CC) gcc options: -pthread -O3 -march=native -lm
LAME MP3 Encoding
LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format.
LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better)
  -O3 -march=native: 6.710 (SE +/- 0.016, N = 3)
  -O3 -march=native -flto -fwhole-program: 6.697 (SE +/- 0.007, N = 3)
  -O3 -march=native -flto: 6.622 (SE +/- 0.067, N = 3)
  1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm
NGINX Benchmark
This is a test of ab, the Apache Benchmark program, running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests being carried out concurrently.
NGINX Benchmark 1.9.9 - Static Web Page Serving (Requests Per Second, More Is Better)
  -O3 -march=native: 43138.29 (SE +/- 628.72, N = 3)
  -O3 -march=native -flto -fwhole-program: 43510.97 (SE +/- 103.71, N = 3)
  -O3 -march=native -flto: 43673.89 (SE +/- 294.79, N = 3)
  1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native
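Because the workload is fixed at 2,000,000 requests, each requests-per-second figure translates directly into wall-clock time for the run. The arithmetic below does that conversion for two of the configurations; the helper names are hypothetical. The 500-way concurrency does not enter that division. By Little's law, though, it roughly determines the mean time each request spends in flight: concurrency divided by throughput.

```python
TOTAL_REQUESTS = 2_000_000
CONCURRENCY = 500


def implied_runtime_s(total_requests, requests_per_second):
    """Wall-clock seconds needed to complete a fixed number of requests."""
    return total_requests / requests_per_second


def mean_latency_ms(concurrency, requests_per_second):
    """Little's law: requests in flight / throughput = mean time per request."""
    return concurrency / requests_per_second * 1000.0


for label, rps in [("-O3 -march=native", 43138.29), ("-O3 -march=native -flto", 43673.89)]:
    print(label, round(implied_runtime_s(TOTAL_REQUESTS, rps), 2), "s,",
          round(mean_latency_ms(CONCURRENCY, rps), 2), "ms")
# -O3 -march=native 46.36 s, 11.59 ms
# -O3 -march=native -flto 45.79 s, 11.45 ms
```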
Facebook RocksDB 6.3.6 - Test: Random Fill (Op/s, More Is Better)
  -O3 -march=native -flto: 916114 (SE +/- 8977.10, N = 3)
  -O3 -march=native -flto -fwhole-program: 919822 (SE +/- 2437.07, N = 3)
  -O3 -march=native: 927119 (SE +/- 7063.47, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
QMCPACK
QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code; this benchmark uses MPI to run the H2O example.
QMCPACK 3.8 (Total Execution Time - Seconds, Fewer Is Better)
  -O3 -march=native -flto -fwhole-program: 1900.3
  -O3 -march=native -flto: 1895.1
  -O3 -march=native: 1878.0
  1. (CXX) g++ options: -O3 -march=native -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm
XZ Compression
This test measures the time needed to compress a sample file (an Ubuntu file-system image) using XZ compression.
XZ Compression 5.2.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9 (Seconds, Fewer Is Better)
  -O3 -march=native: 20.02 (SE +/- 0.03, N = 3)
  -O3 -march=native -flto: 19.87 (SE +/- 0.06, N = 3)
  -O3 -march=native -flto -fwhole-program: 19.83 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -pthread -fvisibility=hidden -O3 -march=native
Facebook RocksDB 6.3.6 - Test: Sequential Fill (Op/s, More Is Better)
  -O3 -march=native -flto: 1010840 (SE +/- 923.75, N = 3)
  -O3 -march=native: 1018223 (SE +/- 3160.54, N = 3)
  -O3 -march=native -flto -fwhole-program: 1019279 (SE +/- 1987.24, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
HPC Challenge 1.5.0 - Test / Class: EP-STREAM Triad (GB/s, More Is Better)
  -O3 -march=native: 1.68426 (SE +/- 0.00200, N = 3)
  -O3 -march=native -flto: 1.68874 (SE +/- 0.00593, N = 3)
  -O3 -march=native -flto -fwhole-program: 1.69770 (SE +/- 0.01665, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
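STREAM Triad is the kernel a[i] = b[i] + q*c[i]. When converting a run time into bandwidth, STREAM counts two 8-byte loads and one 8-byte store per element, i.e. 24 bytes. The plain-Python sketch below shows the kernel and that byte accounting. It is far too slow to measure real memory bandwidth, and the helper names are illustrative.

```python
from array import array


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_gbs(n, seconds, bytes_per_element=8):
    """Bytes counted per element: load b[i], load c[i], store a[i] (3 * 8 for doubles)."""
    return 3 * bytes_per_element * n / seconds / 1e9


a = array("d", [0.0, 0.0, 0.0])
triad(a, array("d", [1.0, 2.0, 3.0]), array("d", [1.0, 1.0, 1.0]), 3.0)
print(list(a))  # [4.0, 5.0, 6.0]
```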
ASKAP
This is a CUDA benchmark of ATNF's ASKAP Benchmark, currently using the tConvolveCuda sub-test.
ASKAP 2018-11-10 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native -flto -fwhole-program: 5433.80 (SE +/- 0.00, N = 3)
  -O3 -march=native -flto: 5435.31 (SE +/- 64.06, N = 3)
  -O3 -march=native: 5471.50 (SE +/- 37.73, N = 3)
  1. (CXX) g++ options: -lpthread
Facebook RocksDB 6.3.6 - Test: Read While Writing (Op/s, More Is Better)
  -O3 -march=native: 4868266 (SE +/- 30555.74, N = 3)
  -O3 -march=native -flto -fwhole-program: 4898750 (SE +/- 22374.25, N = 3)
  -O3 -march=native -flto: 4901767 (SE +/- 9839.07, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
SQLite
This is a simple benchmark of SQLite. At present this test profile just measures the time to perform a pre-defined number of insertions on an indexed database.
SQLite 3.30.1 - Threads / Copies: 1 (Seconds, Fewer Is Better)
  -O3 -march=native -flto -fwhole-program: 14.28 (SE +/- 0.05, N = 3)
  -O3 -march=native -flto: 14.23 (SE +/- 0.05, N = 3)
  -O3 -march=native: 14.20 (SE +/- 0.01, N = 3)
  1. (CC) gcc options: -O3 -march=native -lz -lm -ldl -lpthread
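The SQLite profile times a fixed number of insertions into an indexed table. The schema and row count it uses are not given here, so the sketch below is a hypothetical stand-in built on Python's standard sqlite3 module. It creates an in-memory table with a secondary index and performs all inserts in one transaction. An on-disk database that commits each insert separately would typically spend much of its time waiting on synchronous writes instead.

```python
import sqlite3
import time


def time_indexed_inserts(num_rows):
    """Insert num_rows rows into an indexed in-memory table; return (seconds, row_count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)")
        conn.execute("CREATE INDEX items_name ON items (name)")
        start = time.perf_counter()
        with conn:  # one transaction, committed when the block exits
            conn.executemany(
                "INSERT INTO items (name, value) VALUES (?, ?)",
                ((f"row-{i}", i) for i in range(num_rows)),
            )
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
        return elapsed, count
    finally:
        conn.close()


seconds, rows = time_indexed_inserts(10_000)
print(f"{rows} rows in {seconds:.3f} s")
```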
ASKAP 2018-11-10 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native -flto: 4117.58 (SE +/- 21.33, N = 3)
  -O3 -march=native -flto -fwhole-program: 4117.58 (SE +/- 21.33, N = 3)
  -O3 -march=native: 4138.92 (SE +/- 21.33, N = 3)
  1. (CXX) g++ options: -lpthread
MKL-DNN DNNL 1.1 - Harness: Recurrent Neural Network Training - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native -flto -fwhole-program: 195.07 (SE +/- 0.25, N = 3; MIN: 193.44)
  -O3 -march=native -flto: 194.67 (SE +/- 0.33, N = 3; MIN: 192.64)
  -O3 -march=native: 194.17 (SE +/- 0.42, N = 3; MIN: 192.29)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
TTSIOD 3D Renderer
A portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks with many different rendering modes. This version does not use OpenGL but is entirely CPU/software based.
TTSIOD 3D Renderer 2.3b - Phong Rendering With Soft-Shadow Mapping (FPS, More Is Better)
  -O3 -march=native: 946.11 (SE +/- 3.96, N = 3)
  -O3 -march=native -flto -fwhole-program: 950.18 (SE +/- 1.61, N = 3)
  -O3 -march=native -flto: 950.33 (SE +/- 0.32, N = 3)
  1. (CXX) g++ options: -O3 -march=native -fomit-frame-pointer -ffast-math -mtune=native -flto -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -fopenmp -fwhole-program -lstdc++
FLAC Audio Encoding
This test times how long it takes to encode a sample WAV file to FLAC format five times.
FLAC Audio Encoding 1.3.2 - WAV To FLAC (Seconds, Fewer Is Better)
  -O3 -march=native: 8.079 (SE +/- 0.005, N = 5)
  -O3 -march=native -flto -fwhole-program: 8.073 (SE +/- 0.004, N = 5)
  -O3 -march=native -flto: 8.044 (SE +/- 0.006, N = 5)
  1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm
MKL-DNN DNNL 1.1 - Harness: Deconvolution Batch deconv_1d - Data Type: f32 (ms, Fewer Is Better)
  -O3 -march=native -flto: 2.31627 (SE +/- 0.00389, N = 4; MIN: 2.25)
  -O3 -march=native -flto -fwhole-program: 2.31590 (SE +/- 0.00798, N = 3; MIN: 2.24)
  -O3 -march=native: 2.30735 (SE +/- 0.00889, N = 3; MIN: 2.22)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
Facebook RocksDB 6.3.6 - Test: Random Fill Sync (Op/s, More Is Better)
  -O3 -march=native -flto: 24409 (SE +/- 31.00, N = 3)
  -O3 -march=native -flto -fwhole-program: 24495 (SE +/- 34.81, N = 3)
  -O3 -march=native: 24502 (SE +/- 29.87, N = 3)
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
HPC Challenge 1.5.0 - Test / Class: G-Ptrans (GB/s, More Is Better)
  -O3 -march=native -flto: 5.78791 (SE +/- 0.01376, N = 3)
  -O3 -march=native: 5.79663 (SE +/- 0.02501, N = 3)
  -O3 -march=native -flto -fwhole-program: 5.80751 (SE +/- 0.01388, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
miniFE
miniFE is a proxy application for unstructured implicit finite element codes.
miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
  -O3 -march=native -flto: 7720.98 (SE +/- 12.76, N = 3)
  -O3 -march=native -flto -fwhole-program: 7735.16 (SE +/- 8.54, N = 3)
  -O3 -march=native: 7745.54 (SE +/- 18.94, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
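miniFE reports the rate of its conjugate-gradient (CG) solve. CG solves A x = b for a symmetric positive-definite matrix A using only matrix-vector products, dot products and vector updates. The sketch below uses dense nested lists for brevity, whereas miniFE assembles a sparse matrix from a finite element mesh. The function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def conjugate_gradient(matrix, b, tol=1e-10, max_iter=1000):
    """Solve matrix @ x = b for a symmetric positive-definite matrix, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)        # residual b - A x for the initial guess x = 0
    p = list(r)        # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(matrix, p)
        alpha = rs_old / dot(p, ap)                        # step length along p
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the old direction (A-conjugate).
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return x


print([round(v, 6) for v in conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])])
# [0.090909, 0.636364]
```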
HPC Challenge 1.5.0 - Test / Class: G-HPL (GFLOPS, More Is Better)
  -O3 -march=native: 63.54 (SE +/- 0.05, N = 3)
  -O3 -march=native -flto: 63.68 (SE +/- 0.14, N = 3)
  -O3 -march=native -flto -fwhole-program: 63.72 (SE +/- 0.17, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
ASKAP 2018-11-10 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native -flto -fwhole-program: 3363.82 (SE +/- 1.18, N = 3)
  -O3 -march=native -flto: 3366.19 (SE +/- 2.36, N = 3)
  -O3 -march=native: 3369.74 (SE +/- 3.60, N = 3)
  1. (CXX) g++ options: -lpthread
ASKAP 2018-11-10 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
  -O3 -march=native: 1946.68 (SE +/- 6.64, N = 3)
  -O3 -march=native -flto -fwhole-program: 1948.42 (SE +/- 3.43, N = 3)
  -O3 -march=native -flto: 1949.81 (SE +/- 2.80, N = 3)
  1. (CXX) g++ options: -lpthread
GROMACS
This test runs the GROMACS molecular dynamics package on the CPU with the water_GMX50 data.
GROMACS 2019.4 - Water Benchmark (Ns Per Day, More Is Better)
  -O3 -march=native: 2.514 (SE +/- 0.004, N = 3)
  -O3 -march=native -flto -fwhole-program: 2.515 (SE +/- 0.004, N = 3)
  -O3 -march=native -flto: 2.517 (SE +/- 0.001, N = 3)
  1. (CXX) g++ options: -mavx2 -mfma -O3 -march=native -std=c++11 -funroll-all-loops -pthread -lrt -lpthread -lm
OpenSSL
OpenSSL is an open-source toolkit that implements the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL.
OpenSSL 1.1.1 - RSA 4096-bit Performance (Signs Per Second, More Is Better)
  -O3 -march=native: 7178.7 (SE +/- 21.22, N = 3)
  -O3 -march=native -flto: 7182.9 (SE +/- 21.57, N = 3)
  1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl
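The OpenSSL figure counts RSA signatures. Set aside the padding schemes and the optimizations real implementations use, such as the Chinese Remainder Theorem and constant-time arithmetic. What remains is that a signature is a modular exponentiation with the private exponent, s = m^d mod n, and verification checks s^e mod n = m. The toy sketch below uses the classic textbook key p = 61, q = 53 and is insecure by design. It only shows the sign and verify operations, and why signing, which uses the large private exponent, is the expensive operation being measured.

```python
# Textbook RSA key (p = 61, q = 53): far too small to be secure, for illustration only.
P, Q = 61, 53
N = P * Q                  # modulus 3233
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent
D = pow(E, -1, PHI)        # private exponent via modular inverse (Python 3.8+)


def rsa_sign(message, d=D, n=N):
    """Unpadded 'textbook' signature: message^d mod n (message must be < n)."""
    return pow(message, d, n)


def rsa_verify(message, signature, e=E, n=N):
    """Check that signature^e mod n recovers the message."""
    return pow(signature, e, n) == message


sig = rsa_sign(65)
print(D, rsa_verify(65, sig), rsa_verify(66, sig))  # 2753 True False
```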
Timed MrBayes Analysis
This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny.
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
  -O3 -march=native: 71.83 (SE +/- 1.38, N = 13)
  -O3 -march=native -flto: 69.13 (SE +/- 0.87, N = 4)
  -O3 -march=native -flto -fwhole-program: 68.02 (SE +/- 0.14, N = 3)
  1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -march=native -lm
HPC Challenge 1.5.0 - Test / Class: Random Ring Latency (usecs, Fewer Is Better)
  -O3 -march=native -flto -fwhole-program: 0.46971 (SE +/- 0.01816, N = 3)
  -O3 -march=native: 0.45479 (SE +/- 0.00087, N = 3)
  -O3 -march=native -flto: 0.45234 (SE +/- 0.00082, N = 3)
  1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
  2. ATLAS + Open MPI 3.1.3
Test runs
  -O3 -march=native: testing initiated at 21 December 2019 11:30 by user pts.
  -O3 -march=native -flto: testing initiated at 20 December 2019 20:56 by user pts.
  -O3 -march=native -flto -fwhole-program: testing initiated at 21 December 2019 06:49 by user pts.