GCC 9.1 PGO Optimizations AMD Threadripper

GCC 9 PGO (Profile Guided Optimizations) benchmarks on the AMD Ryzen Threadripper 2990WX by Michael Larabel.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 1905138-HV-GCC91COMP58
Runs

-O3 -march=native: May 12 2019, test duration 1 Hour, 59 Minutes
-O3 -march=native + PGO: May 12 2019, test duration 1 Hour, 57 Minutes


Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH EXTREME (1701 BIOS)
Chipset: AMD 17h
Memory: 32768MB
Disk: Samsung SSD 970 EVO 500GB
Graphics: AMD Radeon RX 64 8GB (1590/800MHz)
Audio: Realtek ALC1220
Monitor: ASUS VP28U
Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad
OS: Ubuntu 18.04
Kernel: 4.18.0-18-generic (x86_64)
Desktop: GNOME Shell 3.28.3
Display Server: X Server 1.20.1
Display Driver: amdgpu 18.1.0
OpenGL: 4.5 Mesa 18.2.8 (LLVM 7.0.0)
Compiler: GCC 9.1.0
File-System: ext4
Screen Resolution: 3840x2160

System Logs

- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
- --disable-multilib --enable-checking=release
- Scaling Governor: acpi-cpufreq ondemand
- Python 2.7.15rc1 + Python 3.6.7
- __user pointer sanitization + Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + SSB disabled via prctl and seccomp

-O3 -march=native vs. -O3 -march=native + PGO Comparison

Relative difference between the two configurations, per test:

t-test1, Threads: 1: 198.8%
SciMark, Monte Carlo: 168.6%
CppPerformanceBenchmarks, Ctype: 17.3%
TSCP, AI Chess Performance: 12.7%
CppPerformanceBenchmarks, Function Objects: 9.3%
AOBench, 2048 x 2048 - Total Time: 6.1%
High Performance Conjugate Gradient: 5.8%
AOM AV1, AV1 Video Encoding: 4.5%
Stockfish, Total Time: 4.2%
SciMark, Composite: 4%
CppPerformanceBenchmarks, Stepanov Vector: 3.3%
SciMark, Fast Fourier Transform: 3.1%
SciMark, Sparse Matrix Multiply: 2.6%
CppPerformanceBenchmarks, Rand Numbers: 2.3%
CppPerformanceBenchmarks, Stepanov Abstraction: 2.2%
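The percentages in the comparison above appear to be the amount by which the larger of the two results exceeds the smaller, regardless of which configuration won. The following is a minimal Python sketch under that assumption. The function name `relative_gain` is hypothetical and is not part of the Phoronix Test Suite; the (baseline, PGO) pairs are copied from the results on this page.

```python
def relative_gain(a, b):
    """Percentage by which the larger of two results exceeds the smaller.

    Assumption: this mirrors how the comparison percentages are derived;
    it is not taken from the Phoronix Test Suite source code.
    """
    hi, lo = max(a, b), min(a, b)
    return (hi / lo - 1.0) * 100.0


# (baseline, PGO) pairs copied from the results on this page.
results = {
    "t-test1, Threads: 1 (seconds)": (28.74, 9.62),
    "SciMark Monte Carlo (Mflops)": (728, 271),
    "CppPerformanceBenchmarks Ctype (seconds)": (34.02, 29.01),
    "TSCP (nodes per second)": (1109114, 1250067),
}

for name, (baseline, pgo) in results.items():
    print(f"{name}: {relative_gain(baseline, pgo):.1f}%")
```

For these four pairs the sketch yields 198.8%, 168.6%, 17.3% and 12.7%, matching the comparison. Note that the percentage alone does not say which configuration won: for Monte Carlo the higher Mflops figure belongs to the non-PGO build.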

Results

Test                                                  -O3 -march=native    -O3 -march=native + PGO
t-test1: 1                                            28.74                9.62
hpcg                                                  0.91                 0.86
mafft: Multiple Sequence Alignment                    2.63                 2.64
scimark2: Composite                                   2555                 2456
scimark2: Monte Carlo                                 728                  271
scimark2: Fast Fourier Transform                      261                  269
scimark2: Sparse Matrix Multiply                      3220                 3139
scimark2: Dense LU Matrix Factorization               6356                 6408
scimark2: Jacobi Successive Over-Relaxation           2208                 2195
tscp: AI Chess Performance                            1109114              1250067
aom-av1: AV1 Video Encoding                           0.22                 0.23
x265: H.265 1080p Video Encoding                      33.79                33.88
himeno: Poisson Pressure Solver                       1321                 1307
stockfish: Total Time                                 67841877             65102152
c-ray: Total Time - 4K, 16 Rays Per Pixel             17.96                17.73
smallpt: Global Illumination Renderer; 128 Samples    3.83                 3.82
aobench: 2048 x 2048 - Total Time                     39.10                36.86
cpp-perf-bench: Atol                                  69.31                68.92
cpp-perf-bench: Ctype                                 34.02                29.01
cpp-perf-bench: Math Library                          353                  348
cpp-perf-bench: Rand Numbers                          1027                 1051
cpp-perf-bench: Stepanov Vector                       75.27                72.87
cpp-perf-bench: Function Objects                      15.46                14.14
cpp-perf-bench: Stepanov Abstraction                  28.36                27.75
mcperf: Add                                           47774                (no result)

t-test1

This is a test of t-test1 for basic memory allocator benchmarks. Note this test profile is currently very basic and the overall time does include the warmup time of the custom t-test1 compilation. Improvements welcome. Learn more via the OpenBenchmarking.org test page.

t-test1 2017-01-13, Threads: 1 (Seconds, Fewer Is Better)
-O3 -march=native: 28.74 (SE +/- 0.39, N = 3)
-O3 -march=native + PGO: 9.62 (SE +/- 0.01, N = 3)
Additional flag noted: -fprofile-correction
(CC) gcc options: -pthread -O3 -march=native
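Each result on this page is reported with a standard error (SE) and a run count N. As a rough illustration, the sketch below uses the conventional definition of the standard error of the mean: sample standard deviation divided by the square root of N. Whether the Phoronix Test Suite uses exactly this formula is an assumption. The per-run timings are hypothetical, since the page does not list individual runs, and `standard_error` is an illustrative helper name.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N).

    Assumption: the conventional definition, not confirmed against the
    Phoronix Test Suite implementation.
    """
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-run timings in seconds (NOT taken from this result file).
runs = [10.0, 10.5, 11.0]

print(f"mean = {statistics.mean(runs):.2f}, "
      f"SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
```

A small SE relative to the mean suggests the runs were consistent; a large one, as with the mcperf result further down, suggests a noisy measurement.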

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a new scientific benchmark from Sandia National Labs focused on supercomputer testing with modern real-world workloads compared to HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
-O3 -march=native: 0.91 (SE +/- 0.03, N = 12)
-O3 -march=native + PGO: 0.86 (SE +/- 0.03, N = 15)

Timed MAFFT Alignment

This test performs an alignment of 100 pyruvate decarboxylase sequences. Learn more via the OpenBenchmarking.org test page.

Timed MAFFT Alignment 7.392, Multiple Sequence Alignment (Seconds, Fewer Is Better)
-O3 -march=native: 2.63 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 2.64 (SE +/- 0.02, N = 3)
(CC) gcc options: -std=c99 -O3 -lm -lpthread

SciMark

This test runs the ANSI C version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This test is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. Learn more via the OpenBenchmarking.org test page.

SciMark 2.0 (Mflops, More Is Better)

Computational Test: Composite
-O3 -march=native: 2555 (SE +/- 2.91, N = 3)
-O3 -march=native + PGO: 2456 (SE +/- 9.91, N = 3)

Computational Test: Monte Carlo
-O3 -march=native: 728 (SE +/- 0.27, N = 3)
-O3 -march=native + PGO: 271 (SE +/- 0.21, N = 3)

Computational Test: Fast Fourier Transform
-O3 -march=native: 261 (SE +/- 0.21, N = 3)
-O3 -march=native + PGO: 269 (SE +/- 0.07, N = 3)

Computational Test: Sparse Matrix Multiply
-O3 -march=native: 3220 (SE +/- 9.00, N = 3)
-O3 -march=native + PGO: 3139 (SE +/- 7.39, N = 3)

Computational Test: Dense LU Matrix Factorization
-O3 -march=native: 6356 (SE +/- 11.42, N = 3)
-O3 -march=native + PGO: 6408 (SE +/- 46.62, N = 3)

Computational Test: Jacobi Successive Over-Relaxation
-O3 -march=native: 2208 (SE +/- 0.39, N = 3)
-O3 -march=native + PGO: 2195 (SE +/- 0.45, N = 3)

Additional flag noted on all SciMark results: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

TSCP

This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.

TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
-O3 -march=native: 1109114 (SE +/- 2184.70, N = 5)
-O3 -march=native + PGO: 1250067 (SE +/- 1133.15, N = 5)
Additional flag noted: -fprofile-correction
(CC) gcc options: -O3 -march=native

AOM AV1

This is a simple test of the AOMedia AV1 encoder run on the CPU with a sample video file. Learn more via the OpenBenchmarking.org test page.

AOM AV1 2019-02-11, AV1 Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native: 0.22 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 0.23 (SE +/- 0.00, N = 3)
Additional flag noted: -fprofile-correction
(CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

x265

This is a simple test of the x265 encoder run on the CPU with a sample 1080p video file. Learn more via the OpenBenchmarking.org test page.

x265 3.0, H.265 1080p Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native: 33.79 (SE +/- 0.07, N = 3)
-O3 -march=native + PGO: 33.88 (SE +/- 0.09, N = 3)
Additional flag noted: -fprofile-correction
(CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma

Himeno Benchmark

The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method. Learn more via the OpenBenchmarking.org test page.

Himeno Benchmark 3.0, Poisson Pressure Solver (MFLOPS, More Is Better)
-O3 -march=native: 1321 (SE +/- 3.47, N = 3)
-O3 -march=native + PGO: 1307 (SE +/- 0.26, N = 3)
Additional flag noted: -fprofile-correction
(CC) gcc options: -O3 -march=native -mavx2

Stockfish

This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores. Learn more via the OpenBenchmarking.org test page.

Stockfish 9, Total Time (Nodes Per Second, More Is Better)
-O3 -march=native: 67841877 (SE +/- 458502.78, N = 3)
-O3 -march=native + PGO: 65102152 (SE +/- 173283.29, N = 3)
Additional flag noted: -fprofile-correction
(CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

C-Ray

This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.

C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
-O3 -march=native: 17.96 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 17.73 (SE +/- 0.01, N = 3)
Additional flag noted: -fprofile-correction
(CC) gcc options: -lm -lpthread -O3 -march=native

Smallpt

Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. Learn more via the OpenBenchmarking.org test page.

Smallpt 1.0, Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better)
-O3 -march=native: 3.83 (SE +/- 0.04, N = 3)
-O3 -march=native + PGO: 3.82 (SE +/- 0.02, N = 3)
Additional flag noted: -fprofile-correction
(CXX) g++ options: -fopenmp -O3 -march=native

AOBench

AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.

AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
-O3 -march=native: 39.10 (SE +/- 0.01, N = 3)
-O3 -march=native + PGO: 36.86 (SE +/- 0.01, N = 3)
Additional flag noted: -fprofile-correction
(CC) gcc options: -lm -O3 -march=native

CppPerformanceBenchmarks

CppPerformanceBenchmarks is a set of C++ compiler performance benchmarks. Learn more via the OpenBenchmarking.org test page.

CppPerformanceBenchmarks 9 (Seconds, Fewer Is Better)

Test: Atol
-O3 -march=native: 69.31 (SE +/- 0.17, N = 3)
-O3 -march=native + PGO: 68.92 (SE +/- 0.37, N = 3)

Test: Ctype
-O3 -march=native: 34.02 (SE +/- 0.01, N = 3)
-O3 -march=native + PGO: 29.01 (SE +/- 0.00, N = 3)

Test: Math Library
-O3 -march=native: 353 (SE +/- 0.93, N = 3)
-O3 -march=native + PGO: 348 (SE +/- 0.21, N = 3)

Test: Random Numbers
-O3 -march=native: 1027 (SE +/- 0.03, N = 3)
-O3 -march=native + PGO: 1051 (SE +/- 0.03, N = 3)

Test: Stepanov Vector
-O3 -march=native: 75.27 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 72.87 (SE +/- 0.01, N = 3)

Test: Function Objects
-O3 -march=native: 15.46 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 14.14 (SE +/- 0.02, N = 3)

Test: Stepanov Abstraction
-O3 -march=native: 28.36 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 27.75 (SE +/- 0.04, N = 3)

Additional flag noted on all CppPerformanceBenchmarks results: -fprofile-correction
(CXX) g++ options: -O3 -march=native -std=c++11

Memcached mcperf

This is a test of twmperf/mcperf with memcached. Learn more via the OpenBenchmarking.org test page.

Memcached mcperf 1.5.10, Method: Add (Operations Per Second, More Is Better)
-O3 -march=native: 47774 (SE +/- 2440.20, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic