GCC 9.1 PGO Optimizations AMD Threadripper

AMD Ryzen Threadripper 2990WX GCC 9 PGO (Profile Guided Optimizations) benchmarks by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1905138-HV-GCC91COMP58&sor&gru.

GCC 9.1 PGO Optimizations AMD Threadripper, system details

Configurations: -O3 -march=native, -O3 -march=native + PGO

Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH EXTREME (1701 BIOS)
Chipset: AMD 17h
Memory: 32768MB
Disk: Samsung SSD 970 EVO 500GB
Graphics: AMD Radeon RX 64 8GB (1590/800MHz)
Audio: Realtek ALC1220
Monitor: ASUS VP28U
Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad
OS: Ubuntu 18.04
Kernel: 4.18.0-18-generic (x86_64)
Desktop: GNOME Shell 3.28.3
Display Server: X Server 1.20.1
Display Driver: amdgpu 18.1.0
OpenGL: 4.5 Mesa 18.2.8 (LLVM 7.0.0)
Compiler: GCC 9.1.0
File-System: ext4
Screen Resolution: 3840x2160

Environment Details: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Processor Details: Scaling Governor: acpi-cpufreq ondemand
Python Details: Python 2.7.15rc1 + Python 3.6.7
Security Details: __user pointer sanitization + Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + SSB disabled via prctl and seccomp

GCC 9.1 PGO Optimizations AMD Threadripper, result overview

Test                                                -O3 -march=native  -O3 -march=native + PGO
aom-av1: AV1 Video Encoding                         0.22               0.23
x265: H.265 1080p Video Encoding                    33.79              33.88
hpcg                                                0.91               0.86
scimark2: Composite                                 2555               2456
scimark2: Monte Carlo                               728                271
scimark2: Fast Fourier Transform                    261                269
scimark2: Sparse Matrix Multiply                    3220               3139
scimark2: Dense LU Matrix Factorization             6356               6408
scimark2: Jacobi Successive Over-Relaxation         2208               2195
himeno: Poisson Pressure Solver                     1321               1307
tscp: AI Chess Performance                          1109114            1250067
stockfish: Total Time                               67841877           65102152
mcperf: Add                                         47774              n/a
t-test1: 1                                          28.74              9.62
mafft: Multiple Sequence Alignment                  2.63               2.64
c-ray: Total Time - 4K, 16 Rays Per Pixel           17.96              17.73
smallpt: Global Illumination Renderer; 128 Samples  3.83               3.82
aobench: 2048 x 2048 - Total Time                   39.10              36.86
cpp-perf-bench: Atol                                69.31              68.92
cpp-perf-bench: Ctype                               34.02              29.01
cpp-perf-bench: Math Library                        353                348
cpp-perf-bench: Rand Numbers                        1027               1051
cpp-perf-bench: Stepanov Vector                     75.27              72.87
cpp-perf-bench: Function Objects                    15.46              14.14
cpp-perf-bench: Stepanov Abstraction                28.36              27.75
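The overview mixes units, and some tests are higher-is-better while others are lower-is-better, so the raw numbers are hard to compare at a glance. As a rough sketch, the Python snippet below turns a handful of the values above into a signed percentage per test, where a positive number means the PGO build was faster. The function name `pgo_gain_percent`, the `RESULTS` layout, and the selection of tests are illustrative assumptions and are not part of the Phoronix Test Suite.

```python
# Values copied from the result overview: (baseline, pgo, higher_is_better).
# The selection of tests here is arbitrary and only for illustration.
RESULTS = {
    "tscp: AI Chess Performance": (1109114, 1250067, True),
    "scimark2: Monte Carlo": (728, 271, True),
    "t-test1: 1": (28.74, 9.62, False),
    "cpp-perf-bench: Ctype": (34.02, 29.01, False),
    "cpp-perf-bench: Rand Numbers": (1027, 1051, False),
}


def pgo_gain_percent(baseline, pgo, higher_is_better):
    """Return the PGO change in percent; positive means the PGO build was faster."""
    # For throughput-style results a larger score is better; for timings a
    # smaller one is, so the ratio is inverted to keep the sign consistent.
    ratio = pgo / baseline if higher_is_better else baseline / pgo
    return (ratio - 1.0) * 100.0


def summarize(results):
    """Map each test name to its PGO gain in percent."""
    return {name: pgo_gain_percent(*values) for name, values in results.items()}


if __name__ == "__main__":
    for name, gain in summarize(RESULTS).items():
        print(f"{name}: {gain:+.1f}%")
```

Under this convention, TSCP comes out at roughly a 13 percent gain with PGO and t-test1 at roughly three times the baseline speed, while SciMark Monte Carlo shows a large regression.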

AOM AV1

AV1 Video Encoding

AOM AV1 2019-02-11, AV1 Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native + PGO: 0.23 (SE +/- 0.00, N = 3)
-O3 -march=native: 0.22 (SE +/- 0.00, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread

x265

H.265 1080p Video Encoding

x265 3.0, H.265 1080p Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native + PGO: 33.88 (SE +/- 0.09, N = 3)
-O3 -march=native: 33.79 (SE +/- 0.07, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
-O3 -march=native: 0.91 (SE +/- 0.03, N = 12)
-O3 -march=native + PGO: 0.86 (SE +/- 0.03, N = 15)

SciMark

Computational Test: Composite

SciMark 2.0, Computational Test: Composite (Mflops, More Is Better)
-O3 -march=native: 2555 (SE +/- 2.91, N = 3)
-O3 -march=native + PGO: 2456 (SE +/- 9.91, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better)
-O3 -march=native: 728 (SE +/- 0.27, N = 3)
-O3 -march=native + PGO: 271 (SE +/- 0.21, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better)
-O3 -march=native + PGO: 269 (SE +/- 0.07, N = 3)
-O3 -march=native: 261 (SE +/- 0.21, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
-O3 -march=native: 3220 (SE +/- 9.00, N = 3)
-O3 -march=native + PGO: 3139 (SE +/- 7.39, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
-O3 -march=native + PGO: 6408 (SE +/- 46.62, N = 3)
-O3 -march=native: 6356 (SE +/- 11.42, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
-O3 -march=native: 2208 (SE +/- 0.39, N = 3)
-O3 -march=native + PGO: 2195 (SE +/- 0.45, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0, Poisson Pressure Solver (MFLOPS, More Is Better)
-O3 -march=native: 1321 (SE +/- 3.47, N = 3)
-O3 -march=native + PGO: 1307 (SE +/- 0.26, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -mavx2

TSCP

AI Chess Performance

TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
-O3 -march=native + PGO: 1250067 (SE +/- 1133.15, N = 5)
-O3 -march=native: 1109114 (SE +/- 2184.70, N = 5)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -O3 -march=native

Stockfish

Total Time

Stockfish 9, Total Time (Nodes Per Second, More Is Better)
-O3 -march=native: 67841877 (SE +/- 458502.78, N = 3)
-O3 -march=native + PGO: 65102152 (SE +/- 173283.29, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Memcached mcperf

Method: Add

Memcached mcperf 1.5.10, Method: Add (Operations Per Second, More Is Better)
-O3 -march=native: 47774 (SE +/- 2440.20, N = 12)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

t-test1

Threads: 1

t-test1 2017-01-13, Threads: 1 (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 9.62 (SE +/- 0.01, N = 3)
-O3 -march=native: 28.74 (SE +/- 0.39, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -pthread -O3 -march=native

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392, Multiple Sequence Alignment (Seconds, Fewer Is Better)
-O3 -march=native: 2.63 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 2.64 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread
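Each result is reported with a standard error (SE) over N runs, which helps judge whether a small gap such as the MAFFT one is meaningful. The Python sketch below compares the gap between two means against their combined standard error. The helper name `within_noise` and the threshold `k = 2.0` are assumptions chosen for illustration, not something the Phoronix Test Suite reports, and the SE figures are the rounded values printed on the charts.

```python
import math


def within_noise(mean_a, se_a, mean_b, se_b, k=2.0):
    """Return True if the two means differ by no more than k combined standard errors.

    The combined standard error of a difference of two independent means is
    sqrt(se_a**2 + se_b**2). The default k = 2.0 is an illustrative choice.
    """
    combined_se = math.hypot(se_a, se_b)
    return abs(mean_a - mean_b) <= k * combined_se


if __name__ == "__main__":
    # Timed MAFFT Alignment: 2.63 s (SE 0.00) vs 2.64 s (SE 0.02)
    print("mafft within noise:", within_noise(2.63, 0.00, 2.64, 0.02))
    # t-test1: 28.74 s (SE 0.39) vs 9.62 s (SE 0.01)
    print("t-test1 within noise:", within_noise(28.74, 0.39, 9.62, 0.01))
```

With the figures from this page, the MAFFT gap falls inside that band while the t-test1 gap is far outside it.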

C-Ray

Total Time - 4K, 16 Rays Per Pixel

C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 17.73 (SE +/- 0.01, N = 3)
-O3 -march=native: 17.96 (SE +/- 0.02, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -lm -lpthread -O3 -march=native

Smallpt

Global Illumination Renderer; 128 Samples

Smallpt 1.0, Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 3.82 (SE +/- 0.02, N = 3)
-O3 -march=native: 3.83 (SE +/- 0.04, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -fopenmp -O3 -march=native

AOBench

Size: 2048 x 2048 - Total Time

AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 36.86 (SE +/- 0.01, N = 3)
-O3 -march=native: 39.10 (SE +/- 0.01, N = 3)
Additional option shown: -fprofile-correction
1. (CC) gcc options: -lm -O3 -march=native

CppPerformanceBenchmarks

Test: Atol

CppPerformanceBenchmarks 9, Test: Atol (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 68.92 (SE +/- 0.37, N = 3)
-O3 -march=native: 69.31 (SE +/- 0.17, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Ctype

CppPerformanceBenchmarks 9, Test: Ctype (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 29.01 (SE +/- 0.00, N = 3)
-O3 -march=native: 34.02 (SE +/- 0.01, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Math Library

CppPerformanceBenchmarks 9, Test: Math Library (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 348 (SE +/- 0.21, N = 3)
-O3 -march=native: 353 (SE +/- 0.93, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Random Numbers

CppPerformanceBenchmarks 9, Test: Random Numbers (Seconds, Fewer Is Better)
-O3 -march=native: 1027 (SE +/- 0.03, N = 3)
-O3 -march=native + PGO: 1051 (SE +/- 0.03, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Stepanov Vector

CppPerformanceBenchmarks 9, Test: Stepanov Vector (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 72.87 (SE +/- 0.01, N = 3)
-O3 -march=native: 75.27 (SE +/- 0.02, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Function Objects

CppPerformanceBenchmarks 9, Test: Function Objects (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 14.14 (SE +/- 0.02, N = 3)
-O3 -march=native: 15.46 (SE +/- 0.00, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11

CppPerformanceBenchmarks

Test: Stepanov Abstraction

CppPerformanceBenchmarks 9, Test: Stepanov Abstraction (Seconds, Fewer Is Better)
-O3 -march=native + PGO: 27.75 (SE +/- 0.04, N = 3)
-O3 -march=native: 28.36 (SE +/- 0.02, N = 3)
Additional option shown: -fprofile-correction
1. (CXX) g++ options: -O3 -march=native -std=c++11


Phoronix Test Suite v10.8.4