GCC 9.1 PGO Optimizations AMD Threadripper

AMD Ryzen Threadripper 2990WX GCC 9 PGO (Profile Guided Optimizations) benchmarks by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1905138-HV-GCC91COMP58&rdt&grs.

System details for the two configurations compared, -O3 -march=native and -O3 -march=native + PGO:

Processor: AMD Ryzen Threadripper 2990WX 32-Core @ 3.00GHz (32 Cores / 64 Threads)
Motherboard: ASUS ROG ZENITH EXTREME (1701 BIOS)
Chipset: AMD 17h
Memory: 32768MB
Disk: Samsung SSD 970 EVO 500GB
Graphics: AMD Radeon RX 64 8GB (1590/800MHz)
Audio: Realtek ALC1220
Monitor: ASUS VP28U
Network: Intel I211 + Qualcomm Atheros QCA6174 802.11ac + Wilocity Wil6200 802.11ad
OS: Ubuntu 18.04
Kernel: 4.18.0-18-generic (x86_64)
Desktop: GNOME Shell 3.28.3
Display Server: X Server 1.20.1
Display Driver: amdgpu 18.1.0
OpenGL: 4.5 Mesa 18.2.8 (LLVM 7.0.0)
Compiler: GCC 9.1.0
File-System: ext4
Screen Resolution: 3840x2160

Environment Details: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Processor Details: Scaling Governor: acpi-cpufreq ondemand
Python Details: Python 2.7.15rc1 + Python 3.6.7
Security Details: __user pointer sanitization + Full AMD retpoline IBPB: conditional STIBP: disabled RSB filling + SSB disabled via prctl and seccomp

Results overview

Test                                                -O3 -march=native  -O3 -march=native + PGO
t-test1: 1                                          28.74              9.62
scimark2: Monte Carlo                               728                271
cpp-perf-bench: Ctype                               34.02              29.01
tscp: AI Chess Performance                          1109114            1250067
cpp-perf-bench: Function Objects                    15.46              14.14
aobench: 2048 x 2048 - Total Time                   39.10              36.86
aom-av1: AV1 Video Encoding                         0.22               0.23
stockfish: Total Time                               67841877           65102152
scimark2: Composite                                 2555               2456
cpp-perf-bench: Stepanov Vector                     75.27              72.87
scimark2: Fast Fourier Transform                    261                269
scimark2: Sparse Matrix Multiply                    3220               3139
cpp-perf-bench: Rand Numbers                        1027               1051
cpp-perf-bench: Stepanov Abstraction                28.36              27.75
cpp-perf-bench: Math Library                        353                348
c-ray: Total Time - 4K, 16 Rays Per Pixel           17.96              17.73
himeno: Poisson Pressure Solver                     1321               1307
scimark2: Dense LU Matrix Factorization             6356               6408
scimark2: Jacobi Successive Over-Relaxation         2208               2195
cpp-perf-bench: Atol                                69.31              68.92
mafft: Multiple Sequence Alignment                  2.63               2.64
x265: H.265 1080p Video Encoding                    33.79              33.88
smallpt: Global Illumination Renderer; 128 Samples  3.83               3.82
mcperf: Add                                         47774              (no result)
hpcg                                                0.91               0.86
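The overview mixes "Fewer Is Better" and "More Is Better" units, so the raw numbers are awkward to compare across tests. A minimal sketch of one way to normalize them into a PGO speedup factor follows. The function names and the choice of the four sample tests are illustrative assumptions, not part of the exported result; the values are copied from the figures above.

```python
# Sample rows copied from the overview figures above.
# Tuple layout (an assumption of this sketch): (baseline, pgo, higher_is_better)
RESULTS = {
    "t-test1: 1": (28.74, 9.62, False),                # Seconds, Fewer Is Better
    "scimark2: Monte Carlo": (728, 271, True),         # Mflops, More Is Better
    "tscp: AI Chess Performance": (1109114, 1250067, True),  # Nodes Per Second
    "hpcg": (0.91, 0.86, True),                        # GFLOP/s, More Is Better
}


def pgo_speedup(baseline, pgo, higher_is_better):
    """Return the PGO speedup factor; values above 1.0 mean PGO was faster."""
    if higher_is_better:
        return pgo / baseline
    return baseline / pgo


def report(results):
    """Format one 'name: factor' line per test."""
    lines = []
    for name, (baseline, pgo, higher) in results.items():
        factor = pgo_speedup(baseline, pgo, higher)
        lines.append(f"{name}: {factor:.2f}x")
    return lines


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

A factor below 1.0 marks a regression under PGO, which is the case for the Monte Carlo and HPCG rows sampled here.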

t-test1

Threads: 1

t-test1 2017-01-13, Threads: 1 (Seconds, Fewer Is Better)
-O3 -march=native: 28.74 (SE +/- 0.39, N = 3)
-O3 -march=native + PGO: 9.62 (SE +/- 0.01, N = 3)
Compiler options (CC, gcc): -pthread -O3 -march=native
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Monte Carlo

SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better)
-O3 -march=native: 728 (SE +/- 0.27, N = 3)
-O3 -march=native + PGO: 271 (SE +/- 0.21, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Ctype

CppPerformanceBenchmarks 9, Test: Ctype (Seconds, Fewer Is Better)
-O3 -march=native: 34.02 (SE +/- 0.01, N = 3)
-O3 -march=native + PGO: 29.01 (SE +/- 0.00, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

TSCP

AI Chess Performance

TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
-O3 -march=native: 1109114 (SE +/- 2184.70, N = 5)
-O3 -march=native + PGO: 1250067 (SE +/- 1133.15, N = 5)
Compiler options (CC, gcc): -O3 -march=native
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Function Objects

CppPerformanceBenchmarks 9, Test: Function Objects (Seconds, Fewer Is Better)
-O3 -march=native: 15.46 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 14.14 (SE +/- 0.02, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

AOBench

Size: 2048 x 2048 - Total Time

AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
-O3 -march=native: 39.10 (SE +/- 0.01, N = 3)
-O3 -march=native + PGO: 36.86 (SE +/- 0.01, N = 3)
Compiler options (CC, gcc): -lm -O3 -march=native
Additional flag shown on the chart: -fprofile-correction

AOM AV1

AV1 Video Encoding

AOM AV1 2019-02-11, AV1 Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native: 0.22 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 0.23 (SE +/- 0.00, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11 -U_FORTIFY_SOURCE -lm -lpthread
Additional flag shown on the chart: -fprofile-correction

Stockfish

Total Time

Stockfish 9, Total Time (Nodes Per Second, More Is Better)
-O3 -march=native: 67841877 (SE +/- 458502.78, N = 3)
-O3 -march=native + PGO: 65102152 (SE +/- 173283.29, N = 3)
Compiler options (CXX, g++): -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Composite

SciMark 2.0, Computational Test: Composite (Mflops, More Is Better)
-O3 -march=native: 2555 (SE +/- 2.91, N = 3)
-O3 -march=native + PGO: 2456 (SE +/- 9.91, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Stepanov Vector

CppPerformanceBenchmarks 9, Test: Stepanov Vector (Seconds, Fewer Is Better)
-O3 -march=native: 75.27 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 72.87 (SE +/- 0.01, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better)
-O3 -march=native: 261 (SE +/- 0.21, N = 3)
-O3 -march=native + PGO: 269 (SE +/- 0.07, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
-O3 -march=native: 3220 (SE +/- 9.00, N = 3)
-O3 -march=native + PGO: 3139 (SE +/- 7.39, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Random Numbers

CppPerformanceBenchmarks 9, Test: Random Numbers (Seconds, Fewer Is Better)
-O3 -march=native: 1027 (SE +/- 0.03, N = 3)
-O3 -march=native + PGO: 1051 (SE +/- 0.03, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Stepanov Abstraction

CppPerformanceBenchmarks 9, Test: Stepanov Abstraction (Seconds, Fewer Is Better)
-O3 -march=native: 28.36 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 27.75 (SE +/- 0.04, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Math Library

CppPerformanceBenchmarks 9, Test: Math Library (Seconds, Fewer Is Better)
-O3 -march=native: 353 (SE +/- 0.93, N = 3)
-O3 -march=native + PGO: 348 (SE +/- 0.21, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

C-Ray

Total Time - 4K, 16 Rays Per Pixel

C-Ray 1.1, Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
-O3 -march=native: 17.96 (SE +/- 0.02, N = 3)
-O3 -march=native + PGO: 17.73 (SE +/- 0.01, N = 3)
Compiler options (CC, gcc): -lm -lpthread -O3 -march=native
Additional flag shown on the chart: -fprofile-correction

Himeno Benchmark

Poisson Pressure Solver

Himeno Benchmark 3.0, Poisson Pressure Solver (MFLOPS, More Is Better)
-O3 -march=native: 1321 (SE +/- 3.47, N = 3)
-O3 -march=native + PGO: 1307 (SE +/- 0.26, N = 3)
Compiler options (CC, gcc): -O3 -march=native -mavx2
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
-O3 -march=native: 6356 (SE +/- 11.42, N = 3)
-O3 -march=native + PGO: 6408 (SE +/- 46.62, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
-O3 -march=native: 2208 (SE +/- 0.39, N = 3)
-O3 -march=native + PGO: 2195 (SE +/- 0.45, N = 3)
Compiler options (CC, gcc): -O3 -march=native -lm
Additional flag shown on the chart: -fprofile-correction

CppPerformanceBenchmarks

Test: Atol

CppPerformanceBenchmarks 9, Test: Atol (Seconds, Fewer Is Better)
-O3 -march=native: 69.31 (SE +/- 0.17, N = 3)
-O3 -march=native + PGO: 68.92 (SE +/- 0.37, N = 3)
Compiler options (CXX, g++): -O3 -march=native -std=c++11
Additional flag shown on the chart: -fprofile-correction

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392, Multiple Sequence Alignment (Seconds, Fewer Is Better)
-O3 -march=native: 2.63 (SE +/- 0.00, N = 3)
-O3 -march=native + PGO: 2.64 (SE +/- 0.02, N = 3)
Compiler options (CC, gcc): -std=c99 -O3 -lm -lpthread

x265

H.265 1080p Video Encoding

x265 3.0, H.265 1080p Video Encoding (Frames Per Second, More Is Better)
-O3 -march=native: 33.79 (SE +/- 0.07, N = 3)
-O3 -march=native + PGO: 33.88 (SE +/- 0.09, N = 3)
Compiler options (CXX, g++): -O3 -march=native -rdynamic -lpthread -lrt -ldl -lnuma
Additional flag shown on the chart: -fprofile-correction

Smallpt

Global Illumination Renderer; 128 Samples

Smallpt 1.0, Global Illumination Renderer; 128 Samples (Seconds, Fewer Is Better)
-O3 -march=native: 3.83 (SE +/- 0.04, N = 3)
-O3 -march=native + PGO: 3.82 (SE +/- 0.02, N = 3)
Compiler options (CXX, g++): -fopenmp -O3 -march=native
Additional flag shown on the chart: -fprofile-correction

Memcached mcperf

Method: Add

Memcached mcperf 1.5.10, Method: Add (Operations Per Second, More Is Better)
-O3 -march=native: 47774 (SE +/- 2440.20, N = 12)
Compiler options (CC, gcc): -O3 -march=native -lm -rdynamic

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
-O3 -march=native: 0.91 (SE +/- 0.03, N = 12)
-O3 -march=native + PGO: 0.86 (SE +/- 0.03, N = 15)
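Several results differ by less than their reported standard errors, so it can help to express each gap in units of the combined SE before reading much into it. The sketch below is a rough screening heuristic, not a formal significance test. The function name is hypothetical, and the root-sum-of-squares combination assumes the two runs are independent, which the export does not state.

```python
import math


def difference_in_se_units(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    The combined SE is sqrt(se_a^2 + se_b^2), which assumes the two
    measurements are independent (an assumption of this sketch).
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined == 0:
        # Both SEs were reported as 0.00; the ratio is undefined, so treat any
        # nonzero gap as unbounded and an exact tie as zero.
        return math.inf if mean_a != mean_b else 0.0
    return abs(mean_a - mean_b) / combined


if __name__ == "__main__":
    # HPCG: 0.91 (SE 0.03) vs 0.86 (SE 0.03), values from the result above.
    print("hpcg:", difference_in_se_units(0.91, 0.03, 0.86, 0.03))
    # t-test1: 28.74 (SE 0.39) vs 9.62 (SE 0.01).
    print("t-test1:", difference_in_se_units(28.74, 0.39, 9.62, 0.01))
```

With only 3 to 15 samples per result, a small ratio suggests the gap may be run-to-run noise, while a large one (as for t-test1) is unlikely to be.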


Phoronix Test Suite v10.8.5