AMD Ryzen Threadripper 3960X GCC 10 LTO benchmarking by Michael Larabel for a future article.
Configurations tested:
-O3 -march=native -flto: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
-O3 -march=native -flto -fwhole-program: CXXFLAGS="-O3 -march=native -flto -fwhole-program" CFLAGS="-O3 -march=native -flto -fwhole-program"
-O3 -march=native: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

System:
Processor: AMD Ryzen Threadripper 3960X 24-Core @ 3.80GHz (24 Cores / 48 Threads), Motherboard: MSI Creator TRX40 (MS-7C59) v1.0 (1.12N1 BIOS), Chipset: AMD Starship/Matisse, Memory: 32768MB, Disk: 1000GB Sabrent Rocket 4.0 1TB, Graphics: Gigabyte AMD Radeon 540/540X/550/550X / RX 540X/550/550X 2GB (1206/1750MHz), Audio: AMD Baffin HDMI/DP, Monitor: ASUS VP28U, Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Device 2723
OS: Ubuntu 19.10, Kernel: 5.4.0-nvme-hwmon (x86_64), Desktop: GNOME Shell 3.34.1, Display Server: X Server 1.20.5, Display Driver: modesetting 1.20.5, OpenGL: 4.5 Mesa 19.2.1 (LLVM 9.0.0), Compiler: GCC 10.0.0 20191208, File-System: ext4, Screen Resolution: 3840x2160

Notes common to all three configurations:
Compiler Notes: --disable-multilib --enable-checking=release
Disk Notes: NONE / errors=remount-ro,relatime,rw
Processor Notes: Scaling Governor: acpi-cpufreq ondemand - CPU Microcode: 0x8301025
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling + tsx_async_abort: Not affected
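Each result in this file lists the three configurations side by side, and some metrics are "More Is Better" while others are "Fewer Is Better". The sketch below shows one way to turn two reported values into a signed percentage, where a positive value always means the candidate did better. It assumes the plain -O3 -march=native build is the baseline. The helper name `relative_gain` is hypothetical. The sample inputs are the FFTW 3.3.6 Stock 1D FFT Size 32 and QMCPACK results reported in this file.

```python
def relative_gain(candidate: float, baseline: float, higher_is_better: bool) -> float:
    """Percent improvement of `candidate` over `baseline` (hypothetical helper).

    Positive means the candidate configuration did better and negative means worse,
    whether the metric is "More Is Better" or "Fewer Is Better".
    """
    if baseline <= 0 or candidate <= 0:
        raise ValueError("benchmark results must be positive")
    if higher_is_better:
        return (candidate / baseline - 1.0) * 100.0
    # For time-like metrics, a smaller candidate value is the improvement.
    return (baseline / candidate - 1.0) * 100.0


if __name__ == "__main__":
    # FFTW 3.3.6, Stock, 1D FFT Size 32 (Mflops, More Is Better): -flto vs -O3 -march=native
    print(f"FFTW: {relative_gain(11201, 13149, True):+.1f}%")
    # QMCPACK 3.8 (Seconds, Fewer Is Better): -flto vs -O3 -march=native
    print(f"QMCPACK: {relative_gain(1895.1, 1878.0, False):+.1f}%")
```

With these inputs the -flto build comes out about 14.8% behind the baseline on that FFTW test and about 0.9% behind on QMCPACK.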
HPC Challenge
HPC Challenge (HPCC) is a cluster-focused benchmark consisting of the HPL Linpack TPP benchmark, DGEMM, STREAM, PTRANS, RandomAccess, FFT, and communication bandwidth and latency tests. This test profile attempts to ship with standard yet versatile configuration/input files, though these can be modified.
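The STREAM component is reported in this file as "EP-STREAM Triad". It times the vector kernel a[i] = b[i] + q*c[i] and converts the elapsed time into bandwidth. The sketch below implements that kernel in pure Python, along with the conventional STREAM byte count of three 8-byte doubles per element (two reads, one write). It assumes double-precision arrays and decimal gigabytes. It only illustrates the arithmetic; interpreted Python is far too slow to measure real memory bandwidth. `triad` and `triad_bandwidth_gbs` are hypothetical names.

```python
import time
from array import array


def triad(a: array, b: array, c: array, q: float) -> None:
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_gbs(n: int, seconds: float) -> float:
    """STREAM-style accounting: 3 doubles (24 bytes) per element, decimal GB/s."""
    bytes_moved = 3 * 8 * n
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    n = 1_000_000
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Interpreter-bound, so this measures Python overhead, not DRAM bandwidth.
    print(f"a[0] = {a[0]}, ~{triad_bandwidth_gbs(n, elapsed):.3f} GB/s")
```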
HPC Challenge 1.5.0, Test / Class: G-HPL (GFLOPS, More Is Better)
-O3 -march=native -flto: 63.68 (SE +/- 0.14, N = 3)
-O3 -march=native -flto -fwhole-program: 63.72 (SE +/- 0.17, N = 3)
-O3 -march=native: 63.54 (SE +/- 0.05, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. ATLAS + Open MPI 3.1.3
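Each result carries an "SE +/- x, N = n" annotation. Assuming this is the usual standard error of the mean (sample standard deviation divided by the square root of the number of runs), it can be reproduced as shown below. The per-run samples are not included in this export, so the three run values in the example are hypothetical.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


if __name__ == "__main__":
    runs = [63.50, 63.70, 63.84]  # hypothetical G-HPL runs in GFLOPS, not from this export
    print(f"mean {statistics.mean(runs):.2f}, SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```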
QMCPACK
QMCPACK is a modern, high-performance, open-source Quantum Monte Carlo (QMC) simulation code. This benchmark runs its H2O example code using MPI.
QMCPACK 3.8, Total Execution Time (Seconds, Fewer Is Better)
-O3 -march=native -flto: 1895.1
-O3 -march=native -flto -fwhole-program: 1900.3
-O3 -march=native: 1878.0
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CXX) g++ options: -O3 -march=native -fopenmp -fomit-frame-pointer -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -lm
FFTW
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions.
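For reference, the forward transform FFTW computes is X[k] = sum over n = 0..N-1 of x[n] * exp(-2*pi*i*k*n/N), left unnormalized. The sketch below evaluates that definition directly in O(N^2) time. FFTW itself uses O(N log N) algorithms chosen by its planner, so this shows only the mathematics, not FFTW's implementation. N = 32 matches the smaller test sizes in this file. `dft` is a hypothetical helper name.

```python
import cmath


def dft(x: list[complex]) -> list[complex]:
    """Direct 1D DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N), unnormalized."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 31  # N = 32, like the "FFT Size 32" tests
    spectrum = dft(impulse)
    # A unit impulse has a flat spectrum: every X[k] equals 1.
    print(all(abs(v - 1.0) < 1e-12 for v in spectrum))
```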
FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
-O3 -march=native -flto: 24505 (SE +/- 153.78, N = 3)
-O3 -march=native -flto -fwhole-program: 24289 (SE +/- 143.81, N = 3)
-O3 -march=native: 23281 (SE +/- 190.93, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -pthread -O3 -march=native -lm
Timed MrBayes Analysis
This test performs a Bayesian analysis of a set of primate genome sequences to estimate their phylogeny.
Timed MrBayes Analysis 3.2.7, Primate Phylogeny Analysis (Seconds, Fewer Is Better)
-O3 -march=native -flto: 69.13 (SE +/- 0.87, N = 4)
-O3 -march=native -flto -fwhole-program: 68.02 (SE +/- 0.14, N = 3)
-O3 -march=native: 71.83 (SE +/- 1.38, N = 13)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -march=native -lm
FFTW
FFTW 3.3.6, Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
-O3 -march=native -flto: 8969.7 (SE +/- 155.20, N = 3)
-O3 -march=native -flto -fwhole-program: 8540.7 (SE +/- 111.43, N = 3)
-O3 -march=native: 8209.1 (SE +/- 138.82, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -pthread -O3 -march=native -lm
MKL-DNN DNNL
This is a test of Intel MKL-DNN (DNNL, the Deep Neural Network Library), an Intel-optimized library for deep neural networks, using its built-in benchdnn functionality. The result is the total perf time reported.
MKL-DNN DNNL 1.1, Harness: Convolution Batch conv_googlenet_v3 - Data Type: f32 (ms, Fewer Is Better)
-O3 -march=native -flto: 52.83 (SE +/- 0.34, N = 3; MIN: 51.44)
-O3 -march=native -flto -fwhole-program: 53.25 (SE +/- 0.50, N = 3; MIN: 51.66)
-O3 -march=native: 52.23 (SE +/- 0.18, N = 3; MIN: 51.26)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
PostgreSQL pgbench
This is a simple benchmark of PostgreSQL using pgbench.
PostgreSQL pgbench 12.0, Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better)
-O3 -march=native -flto: 701920.06 (SE +/- 375.35, N = 3)
-O3 -march=native: 670670.78 (SE +/- 654.90, N = 3)

PostgreSQL pgbench 12.0, Scaling: Buffer Test - Test: Heavy Contention - Mode: Read Only (TPS, More Is Better)
-O3 -march=native -flto: 703431.18 (SE +/- 5480.07, N = 3)
-O3 -march=native: 676431.71 (SE +/- 3573.89, N = 3)

Notes for both pgbench results:
Configuration-specific flags: -flto
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm
ASKAP
This test profile runs ATNF's ASKAP benchmark, which the profile describes as a CUDA benchmark currently using the tConvolveCuda sub-test. The results in this file come from the tConvolve MT and tConvolve OpenMP sub-tests.
ASKAP 2018-11-10, Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
-O3 -march=native -flto: 3366.19 (SE +/- 2.36, N = 3)
-O3 -march=native -flto -fwhole-program: 3363.82 (SE +/- 1.18, N = 3)
-O3 -march=native: 3369.74 (SE +/- 3.60, N = 3)

ASKAP 2018-11-10, Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
-O3 -march=native -flto: 1949.81 (SE +/- 2.80, N = 3)
-O3 -march=native -flto -fwhole-program: 1948.42 (SE +/- 3.43, N = 3)
-O3 -march=native: 1946.68 (SE +/- 6.64, N = 3)

Notes for both tConvolve MT results:
1. (CXX) g++ options: -lpthread
GROMACS
This test runs the GROMACS molecular dynamics package on the CPU with the water_GMX50 data.
GROMACS 2019.4, Water Benchmark (Ns Per Day, More Is Better)
-O3 -march=native -flto: 2.517 (SE +/- 0.001, N = 3)
-O3 -march=native -flto -fwhole-program: 2.515 (SE +/- 0.004, N = 3)
-O3 -march=native: 2.514 (SE +/- 0.004, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CXX) g++ options: -mavx2 -mfma -O3 -march=native -std=c++11 -funroll-all-loops -pthread -lrt -lpthread -lm
Himeno Benchmark
The Himeno benchmark is a linear solver for the pressure Poisson equation using the point-Jacobi method.
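Point-Jacobi updates every grid point using only its neighbours' values from the previous sweep. The sketch below applies it in 2D to laplacian(p) = f on a square grid with zero boundary values and uniform spacing h. The real Himeno kernel works on a 3D grid with a larger stencil and coefficient arrays, so this illustrates only the iteration scheme. `jacobi_poisson_2d` is a hypothetical name.

```python
def jacobi_poisson_2d(f: list[list[float]], h: float, iterations: int) -> list[list[float]]:
    """Point-Jacobi for laplacian(p) = f with p = 0 on the boundary.

    Five-point discretization:
        (p[i-1][j] + p[i+1][j] + p[i][j-1] + p[i][j+1] - 4*p[i][j]) / h^2 = f[i][j]
    so each sweep sets p_new[i][j] = (sum of 4 old neighbours - h^2 * f[i][j]) / 4.
    """
    n = len(f)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        p_new = [row[:] for row in p]  # boundary rows/columns stay 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                neighbours = p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                p_new[i][j] = (neighbours - h * h * f[i][j]) / 4.0
        p = p_new  # Jacobi: the whole sweep reads only the previous iterate
    return p


if __name__ == "__main__":
    n = 17
    h = 1.0 / (n - 1)
    source = [[-1.0] * n for _ in range(n)]  # constant source term
    p = jacobi_poisson_2d(source, h, iterations=500)
    print(f"p at centre after 500 sweeps: {p[n // 2][n // 2]:.5f}")
```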
Himeno Benchmark 3.0, Poisson Pressure Solver (MFLOPS, More Is Better)
-O3 -march=native -flto: 4766.40 (SE +/- 22.94, N = 3)
-O3 -march=native -flto -fwhole-program: 4684.70 (SE +/- 63.96, N = 4)
-O3 -march=native: 4893.66 (SE +/- 32.73, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native -mavx2
ACES DGEMM
This is a multi-threaded DGEMM benchmark.
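DGEMM computes C = alpha*A*B + beta*C in double precision. GFLOP/s figures for it are conventionally derived from 2*m*n*k floating-point operations (one multiply and one add per inner-product term) divided by the elapsed time. Below is a naive triple-loop sketch of that operation and of the rate arithmetic. It is not the code used by ACES DGEMM or HPC Challenge. `dgemm` and `gflops` are hypothetical names.

```python
import time


def dgemm(alpha: float, a: list[list[float]], b: list[list[float]],
          beta: float, c: list[list[float]]) -> None:
    """In-place C = alpha * A @ B + beta * C for row-major lists (m x k, k x n, m x n)."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply + one add per term
            c[i][j] = alpha * acc + beta * c[i][j]


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Conventional DGEMM rate: 2*m*n*k flops per call, in units of 1e9 flop/s."""
    return 2.0 * m * n * k / seconds / 1e9


if __name__ == "__main__":
    size = 64
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    c = [[0.0] * size for _ in range(size)]
    start = time.perf_counter()
    dgemm(1.0, a, b, 0.0, c)
    elapsed = time.perf_counter() - start
    print(f"c[0][0] = {c[0][0]}, {gflops(size, size, size, elapsed):.4f} GFLOP/s (pure Python)")
```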
ACES DGEMM 1.0, Sustained Floating-Point Rate (GFLOP/s, More Is Better)
-O3 -march=native -flto: 8.761475 (SE +/- 0.132221, N = 3)
-O3 -march=native -flto -fwhole-program: 8.625776 (SE +/- 0.076167, N = 15)
-O3 -march=native: 8.781111 (SE +/- 0.116188, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native -fopenmp
Facebook RocksDB
This is a benchmark of Facebook's RocksDB, an embeddable persistent key-value store for fast storage, based on Google's LevelDB.
Facebook RocksDB 6.3.6, Test: Random Fill Sync (Op/s, More Is Better)
-O3 -march=native -flto: 24409 (SE +/- 31.00, N = 3)
-O3 -march=native -flto -fwhole-program: 24495 (SE +/- 34.81, N = 3)
-O3 -march=native: 24502 (SE +/- 29.87, N = 3)

Facebook RocksDB 6.3.6, Test: Random Fill (Op/s, More Is Better)
-O3 -march=native -flto: 916114 (SE +/- 8977.10, N = 3)
-O3 -march=native -flto -fwhole-program: 919822 (SE +/- 2437.07, N = 3)
-O3 -march=native: 927119 (SE +/- 7063.47, N = 3)

Facebook RocksDB 6.3.6, Test: Random Read (Op/s, More Is Better)
-O3 -march=native -flto: 147319777 (SE +/- 1236531.46, N = 3)
-O3 -march=native -flto -fwhole-program: 141628836 (SE +/- 84413.94, N = 3)
-O3 -march=native: 145113856 (SE +/- 154500.46, N = 3)

Facebook RocksDB 6.3.6, Test: Read While Writing (Op/s, More Is Better)
-O3 -march=native -flto: 4901767 (SE +/- 9839.07, N = 3)
-O3 -march=native -flto -fwhole-program: 4898750 (SE +/- 22374.25, N = 3)
-O3 -march=native: 4868266 (SE +/- 30555.74, N = 3)

Notes for these RocksDB results:
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
SQLite Speedtest
This is a benchmark of SQLite's speedtest1 benchmark program with an increased problem size of 1,000.
SQLite Speedtest 3.30, Timed Time - Size 1,000 (Seconds, Fewer Is Better)
-O3 -march=native -flto: 56.44 (SE +/- 0.10, N = 3)
-O3 -march=native -flto -fwhole-program: 56.22 (SE +/- 0.06, N = 3)
-O3 -march=native: 57.37 (SE +/- 0.11, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native -ldl -lz -lpthread
Stockfish
This is a test of Stockfish, an advanced open-source C++11 chess engine that can scale up to 128 CPU cores.
Stockfish 9, Total Time (Nodes Per Second, More Is Better)
-O3 -march=native -flto: 79613988 (SE +/- 774628.71, N = 3)
-O3 -march=native -flto -fwhole-program: 81375940 (SE +/- 729395.20, N = 3)
-O3 -march=native: 79813741 (SE +/- 1040327.44, N = 3)
Configuration-specific flags: -fwhole-program
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -flto -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt
Facebook RocksDB
Facebook RocksDB 6.3.6, Test: Sequential Fill (Op/s, More Is Better)
-O3 -march=native -flto: 1010840 (SE +/- 923.75, N = 3)
-O3 -march=native -flto -fwhole-program: 1019279 (SE +/- 1987.24, N = 3)
-O3 -march=native: 1018223 (SE +/- 3160.54, N = 3)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fno-builtin-memcmp -fno-rtti -rdynamic -lpthread
NGINX Benchmark
This is a test of ab, the Apache Benchmark program, running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests running concurrently.
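Because the request count is fixed, a requests-per-second figure converts directly into wall-clock time. The worked sketch below uses the 2,000,000-request / 500-concurrency parameters from the description and the -O3 -march=native -flto NGINX result reported in this file. It also applies Little's law (average concurrency = throughput * mean latency) as an idealized per-request latency estimate. That estimate assumes all 500 concurrent slots stay busy and is not a figure reported in this file. The function names are hypothetical.

```python
def run_duration_s(total_requests: int, requests_per_second: float) -> float:
    """Wall-clock time implied by a fixed request count and a measured throughput."""
    return total_requests / requests_per_second


def mean_latency_ms(concurrency: int, requests_per_second: float) -> float:
    """Little's law L = lambda * W, so W = L / lambda (assumes a saturated client)."""
    return concurrency / requests_per_second * 1000.0


if __name__ == "__main__":
    rps = 43673.89  # NGINX 1.9.9 static page result, -O3 -march=native -flto
    print(f"~{run_duration_s(2_000_000, rps):.1f} s for 2,000,000 requests")
    print(f"~{mean_latency_ms(500, rps):.2f} ms mean time per request at 500 concurrent")
```

At that throughput the full 2,000,000-request run takes roughly 46 seconds.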
NGINX Benchmark 1.9.9, Static Web Page Serving (Requests Per Second, More Is Better)
-O3 -march=native -flto: 43673.89 (SE +/- 294.79, N = 3)
-O3 -march=native -flto -fwhole-program: 43510.97 (SE +/- 103.71, N = 3)
-O3 -march=native: 43138.29 (SE +/- 628.72, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -lpthread -lcrypt -lcrypto -lz -O3 -march=native
MKL-DNN DNNL
MKL-DNN DNNL 1.1, Harness: Recurrent Neural Network Training - Data Type: f32 (ms, Fewer Is Better)
-O3 -march=native -flto: 194.67 (SE +/- 0.33, N = 3; MIN: 192.64)
-O3 -march=native -flto -fwhole-program: 195.07 (SE +/- 0.25, N = 3; MIN: 193.44)
-O3 -march=native: 194.17 (SE +/- 0.42, N = 3; MIN: 192.29)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
miniFE
MiniFE is a finite element mini-application that mimics unstructured implicit finite element codes.
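The miniFE result is reported in CG Mflops, the rate of its conjugate-gradient (CG) solve of the assembled linear system. The dense CG sketch below is for a symmetric positive-definite matrix and shows what each step of that phase does: one matrix-vector product, two dot products, and three vector updates. miniFE itself uses a sparse matrix format. `conjugate_gradient`, `dot`, and `matvec` are hypothetical names.

```python
def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(a: list[list[float]], v: list[float]) -> list[float]:
    return [dot(row, v) for row in a]


def conjugate_gradient(a: list[list[float]], b: list[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> list[float]:
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = b[:]  # residual b - A x, with x = 0
    p = r[:]  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(a, p)                # one mat-vec per iteration
        alpha = rs_old / dot(p, ap)      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]  # next conjugate direction
        rs_old = rs_new
    return x


if __name__ == "__main__":
    # Tridiagonal (2, -1) SPD matrix as a stand-in for an assembled stiffness matrix
    n = 8
    a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    print([round(v, 3) for v in conjugate_gradient(a, [1.0] * n)])
```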
miniFE 2.2, Problem Size: Small (CG Mflops, More Is Better)
-O3 -march=native -flto: 7720.98 (SE +/- 12.76, N = 3)
-O3 -march=native -flto -fwhole-program: 7735.16 (SE +/- 8.54, N = 3)
-O3 -march=native: 7745.54 (SE +/- 18.94, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
MKL-DNN DNNL
MKL-DNN DNNL 1.1, Harness: Deconvolution Batch deconv_1d - Data Type: f32 (ms, Fewer Is Better)
-O3 -march=native -flto: 2.31627 (SE +/- 0.00389, N = 4; MIN: 2.25)
-O3 -march=native -flto -fwhole-program: 2.31590 (SE +/- 0.00798, N = 3; MIN: 2.24)
-O3 -march=native: 2.30735 (SE +/- 0.00889, N = 3; MIN: 2.22)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
Crafty
This is a performance test of Crafty, an advanced open-source chess engine.
Crafty 25.2, Elapsed Time (Nodes Per Second, More Is Better)
-O3 -march=native -flto: 9209287 (SE +/- 13239.35, N = 3)
-O3 -march=native -flto -fwhole-program: 8978346 (SE +/- 5847.06, N = 3)
-O3 -march=native: 9240651 (SE +/- 19679.93, N = 3)
1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm
TTSIOD 3D Renderer
The TTSIOD 3D Renderer is a portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks and offers many different rendering modes. This version does not use OpenGL; it is entirely CPU/software based.
TTSIOD 3D Renderer 2.3b, Phong Rendering With Soft-Shadow Mapping (FPS, More Is Better)
-O3 -march=native -flto: 950.33 (SE +/- 0.32, N = 3)
-O3 -march=native -flto -fwhole-program: 950.18 (SE +/- 1.61, N = 3)
-O3 -march=native: 946.11 (SE +/- 3.96, N = 3)
1. (CXX) g++ options: -O3 -march=native -flto -fomit-frame-pointer -ffast-math -mtune=native -msse -mrecip -mfpmath=sse -msse2 -mssse3 -lSDL -fopenmp -fwhole-program -lstdc++
OpenSSL
OpenSSL is an open-source toolkit that implements the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL.
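An RSA signature is a modular exponentiation with the private exponent, s = m^d mod n. Verification checks that s^e mod n = m. The "signs per second" figure counts how many of these private-key operations OpenSSL completes with a 4096-bit modulus. The toy sketch below uses the classic tiny textbook key (p = 61, q = 53) and Python's built-in three-argument `pow`. Real OpenSSL signing adds hashing, padding (for example PKCS#1 v1.5 or PSS), and CRT acceleration, none of which is shown. This toy is insecure and must not be used for real cryptography. The function names are hypothetical.

```python
def make_toy_key() -> tuple[int, int, int]:
    """Textbook toy RSA key (p = 61, q = 53); insecure, illustration only."""
    p, q = 61, 53
    n = p * q                  # modulus
    phi = (p - 1) * (q - 1)    # Euler's totient of n
    e = 17                     # public exponent, coprime with phi
    d = pow(e, -1, phi)        # private exponent via modular inverse (Python 3.8+)
    return n, e, d


def sign(message: int, d: int, n: int) -> int:
    """Private-key operation: the kind of work a "sign" counts."""
    return pow(message, d, n)


def verify(signature: int, message: int, e: int, n: int) -> bool:
    """Public-key check: s^e mod n must recover the message."""
    return pow(signature, e, n) == message


if __name__ == "__main__":
    n, e, d = make_toy_key()
    s = sign(65, d, n)
    print(f"n={n} e={e} d={d} signature={s} valid={verify(s, 65, e, n)}")
```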
OpenSSL 1.1.1, RSA 4096-bit Performance (Signs Per Second, More Is Better)
-O3 -march=native -flto: 7182.9 (SE +/- 21.57, N = 3)
-O3 -march=native: 7178.7 (SE +/- 21.22, N = 3)
Configuration-specific flags: -flto
1. (CC) gcc options: -pthread -m64 -O3 -march=native -lssl -lcrypto -ldl
XZ Compression
This test measures the time needed to compress a sample file (an Ubuntu file-system image) using XZ compression.
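The sketch below is a rough Python analogue of this test. It uses Python's standard `lzma` module (which wraps liblzma from the XZ Utils project) at preset 9, matching compression level 9. The test itself runs on the Ubuntu image named in the result, so absolute times will differ, and the input here is synthetic. `time_xz_compress` is a hypothetical name.

```python
import lzma
import time


def time_xz_compress(data: bytes, preset: int = 9) -> tuple[float, bytes]:
    """Compress `data` into the .xz container format; return (seconds, compressed)."""
    start = time.perf_counter()
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    return time.perf_counter() - start, compressed


if __name__ == "__main__":
    sample = b"ubuntu-16.04.3-server-i386.img stand-in " * 50_000  # synthetic, ~2 MB
    seconds, packed = time_xz_compress(sample)
    assert lzma.decompress(packed) == sample  # lossless round trip
    print(f"{len(sample)} -> {len(packed)} bytes in {seconds:.2f} s")
```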
XZ Compression 5.2.4, Compressing ubuntu-16.04.3-server-i386.img, Compression Level 9 (Seconds, Fewer Is Better)
-O3 -march=native -flto: 19.87 (SE +/- 0.06, N = 3)
-O3 -march=native -flto -fwhole-program: 19.83 (SE +/- 0.03, N = 3)
-O3 -march=native: 20.02 (SE +/- 0.03, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -pthread -fvisibility=hidden -O3 -march=native
MKL-DNN DNNL
MKL-DNN DNNL 1.1, Harness: Convolution Batch conv_alexnet - Data Type: f32 (ms, Fewer Is Better)
-O3 -march=native -flto: 126.97 (SE +/- 0.18, N = 3; MIN: 125.84)
-O3 -march=native -flto -fwhole-program: 125.13 (SE +/- 1.81, N = 3; MIN: 120.75)
-O3 -march=native: 126.14 (SE +/- 0.75, N = 3; MIN: 124.08)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -msse4.1 -fPIC -fopenmp -pie -lpthread -ldl
SQLite
This is a simple benchmark of SQLite. At present, this test profile measures the time to perform a pre-defined number of insertions on an indexed database.
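The sketch below shows the kind of workload described: timed insertions into an indexed table. It uses Python's standard `sqlite3` module and an in-memory database. The test profile's actual schema, row count, and transaction settings are not given here, so the table layout, the `ROWS` count, and the single-transaction batching are all hypothetical. An in-memory database also avoids the disk syncs an on-disk run would incur.

```python
import sqlite3
import time

ROWS = 100_000  # hypothetical row count, not the test profile's value


def timed_indexed_inserts(rows: int = ROWS) -> tuple[float, int]:
    """Insert `rows` rows into an indexed table; return (seconds, final row count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, k INTEGER, v TEXT)")
        conn.execute("CREATE INDEX t_k ON t (k)")  # each insert must also update this index
        start = time.perf_counter()
        with conn:  # commits once after all inserts, i.e. a single transaction
            conn.executemany(
                "INSERT INTO t (k, v) VALUES (?, ?)",
                ((i * 7919 % rows, f"value-{i}") for i in range(rows)),
            )
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
        return elapsed, count
    finally:
        conn.close()


if __name__ == "__main__":
    seconds, count = timed_indexed_inserts()
    print(f"{count} rows in {seconds:.3f} s")
```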
SQLite 3.30.1, Threads / Copies: 1 (Seconds, Fewer Is Better)
-O3 -march=native -flto: 14.23 (SE +/- 0.05, N = 3)
-O3 -march=native -flto -fwhole-program: 14.28 (SE +/- 0.05, N = 3)
-O3 -march=native: 14.20 (SE +/- 0.01, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native -lz -lm -ldl -lpthread
FLAC Audio Encoding
This test times how long it takes to encode a sample WAV file to FLAC format five times.
FLAC Audio Encoding 1.3.2, WAV To FLAC (Seconds, Fewer Is Better)
-O3 -march=native -flto: 8.044 (SE +/- 0.006, N = 5)
-O3 -march=native -flto -fwhole-program: 8.073 (SE +/- 0.004, N = 5)
-O3 -march=native: 8.079 (SE +/- 0.005, N = 5)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -logg -lm
Zstd Compression
This test measures the time needed to compress a sample file (an Ubuntu file-system image) using Zstd compression.
Zstd Compression 1.3.4, Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
-O3 -march=native -flto: 10.168 (SE +/- 0.025, N = 3)
-O3 -march=native -flto -fwhole-program: 10.140 (SE +/- 0.092, N = 3)
-O3 -march=native: 9.994 (SE +/- 0.119, N = 5)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma
ASKAP
ASKAP 2018-11-10, Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
-O3 -march=native -flto: 4117.58 (SE +/- 21.33, N = 3)
-O3 -march=native -flto -fwhole-program: 4117.58 (SE +/- 21.33, N = 3)
-O3 -march=native: 4138.92 (SE +/- 21.33, N = 3)

ASKAP 2018-11-10, Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
-O3 -march=native -flto: 5435.31 (SE +/- 64.06, N = 3)
-O3 -march=native -flto -fwhole-program: 5433.80 (SE +/- 0.00, N = 3)
-O3 -march=native: 5471.50 (SE +/- 37.73, N = 3)

Notes for both tConvolve OpenMP results:
1. (CXX) g++ options: -lpthread
LAME MP3 Encoding
LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format.
LAME MP3 Encoding 3.100, WAV To MP3 (Seconds, Fewer Is Better)
-O3 -march=native -flto: 6.622 (SE +/- 0.067, N = 3)
-O3 -march=native -flto -fwhole-program: 6.697 (SE +/- 0.007, N = 3)
-O3 -march=native: 6.710 (SE +/- 0.016, N = 3)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -ffast-math -funroll-loops -fschedule-insns2 -fbranch-count-reg -fforce-addr -pipe -march=native -lm
FFTW
FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
-O3 -march=native -flto: 46350 (SE +/- 82.72, N = 3)
-O3 -march=native -flto -fwhole-program: 45734 (SE +/- 49.75, N = 3)
-O3 -march=native: 45957 (SE +/- 23.68, N = 3)

FFTW 3.3.6, Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
-O3 -march=native -flto: 11201 (SE +/- 55.87, N = 3)
-O3 -march=native -flto -fwhole-program: 11069 (SE +/- 3.21, N = 3)
-O3 -march=native: 13149 (SE +/- 27.74, N = 3)

FFTW 3.3.6, Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
-O3 -march=native -flto: 12677 (SE +/- 14.19, N = 3)
-O3 -march=native -flto -fwhole-program: 12595 (SE +/- 72.89, N = 3)
-O3 -march=native: 13217 (SE +/- 6.49, N = 3)

FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
-O3 -march=native -flto: 15488 (SE +/- 25.21, N = 3)
-O3 -march=native -flto -fwhole-program: 15271 (SE +/- 111.55, N = 3)
-O3 -march=native: 15513 (SE +/- 84.89, N = 3)

Notes for these FFTW results:
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -pthread -O3 -march=native -lm
TSCP
This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark.
TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
-O3 -march=native -flto: 1422472 (SE +/- 1795.91, N = 5)
-O3 -march=native -flto -fwhole-program: 1418074 (SE +/- 1462.93, N = 5)
-O3 -march=native: 1350615 (SE +/- 1620.83, N = 5)
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -O3 -march=native
HPC Challenge
HPC Challenge 1.5.0, Test / Class: Max Ping Pong Bandwidth (MB/s, More Is Better)
-O3 -march=native -flto: 22951.43 (SE +/- 301.57, N = 3)
-O3 -march=native -flto -fwhole-program: 23030.71 (SE +/- 509.86, N = 3)
-O3 -march=native: 22614.63 (SE +/- 238.70, N = 3)

HPC Challenge 1.5.0, Test / Class: Random Ring Bandwidth (GB/s, More Is Better)
-O3 -march=native -flto: 3.33117 (SE +/- 0.02086, N = 3)
-O3 -march=native -flto -fwhole-program: 3.31499 (SE +/- 0.08613, N = 3)
-O3 -march=native: 3.42188 (SE +/- 0.03737, N = 3)

HPC Challenge 1.5.0, Test / Class: Random Ring Latency (usecs, Fewer Is Better)
-O3 -march=native -flto: 0.45234 (SE +/- 0.00082, N = 3)
-O3 -march=native -flto -fwhole-program: 0.46971 (SE +/- 0.01816, N = 3)
-O3 -march=native: 0.45479 (SE +/- 0.00087, N = 3)

HPC Challenge 1.5.0, Test / Class: G-Random Access (GUP/s, More Is Better)
-O3 -march=native -flto: 0.16722 (SE +/- 0.00027, N = 3)
-O3 -march=native -flto -fwhole-program: 0.16679 (SE +/- 0.00041, N = 3)
-O3 -march=native: 0.16431 (SE +/- 0.00098, N = 3)

HPC Challenge 1.5.0, Test / Class: EP-STREAM Triad (GB/s, More Is Better)
-O3 -march=native -flto: 1.68874 (SE +/- 0.00593, N = 3)
-O3 -march=native -flto -fwhole-program: 1.69770 (SE +/- 0.01665, N = 3)
-O3 -march=native: 1.68426 (SE +/- 0.00200, N = 3)

HPC Challenge 1.5.0, Test / Class: G-Ptrans (GB/s, More Is Better)
-O3 -march=native -flto: 5.78791 (SE +/- 0.01376, N = 3)
-O3 -march=native -flto -fwhole-program: 5.80751 (SE +/- 0.01388, N = 3)
-O3 -march=native: 5.79663 (SE +/- 0.02501, N = 3)

HPC Challenge 1.5.0, Test / Class: EP-DGEMM (GFLOPS, More Is Better)
-O3 -march=native -flto: 32.76 (SE +/- 0.64, N = 3)
-O3 -march=native -flto -fwhole-program: 33.60 (SE +/- 0.27, N = 3)
-O3 -march=native: 32.50 (SE +/- 0.16, N = 3)

HPC Challenge 1.5.0, Test / Class: G-Ffte (GFLOP/s, More Is Better)
-O3 -march=native -flto: 15.22 (SE +/- 0.45, N = 3)
-O3 -march=native -flto -fwhole-program: 15.98 (SE +/- 0.37, N = 3)
-O3 -march=native: 15.20 (SE +/- 0.07, N = 3)

Notes for these HPC Challenge results:
Configuration-specific flags: -flto; -flto -fwhole-program
1. (CC) gcc options: -lblas -lm -pthread -lmpi -fomit-frame-pointer -O3 -march=native -funroll-loops
2. ATLAS + Open MPI 3.1.3
Test runs:
-O3 -march=native -flto: Testing initiated at 20 December 2019 20:56 by user pts.
-O3 -march=native -flto -fwhole-program: Testing initiated at 21 December 2019 06:49 by user pts.
-O3 -march=native: Testing initiated at 21 December 2019 11:30 by user pts.