Xeon Gold GCC 8.2 RC1 PGO

GCC 8.2 RC1 compiler Profile Guided Optimization (PGO) benchmarks for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1807196-RA-XEONGOLDG95&grt&sor.

Xeon Gold GCC 8.2 RC1 PGO

Configurations tested: -O3 -march=native; -O3 -march=native - PGO

Processor: 2 x Intel Xeon Gold 6138 @ 3.70GHz (40 Cores / 80 Threads)
Motherboard: TYAN S7106 (V1.01 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 96256MB
Disk: 256GB Samsung SSD 850 + 2000GB Seagate ST2000DM006-2DM1 + 2 x 120GB TOSHIBA-TR150
Graphics: ASPEED ASPEED Family
Monitor: VE228
Network: Intel I210 Gigabit Connection
OS: Ubuntu 18.04
Kernel: 4.15.0-23-generic (x86_64)
Compiler: GCC 8.1.1 20180719
File-System: ext4
Screen Resolution: 1920x1080

Environment Details: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Disk Details: CFQ / data=ordered,relatime,rw
Processor Details: Scaling Governor: intel_pstate powersave
Python Details: Python 2.7.15rc1 + Python 3.6.5
Security Details: KPTI + __user pointer sanitization + Full generic retpoline IBPB IBRS_FW Protection

Xeon Gold GCC 8.2 RC1 PGO

Test | -O3 -march=native | -O3 -march=native - PGO
compress-7zip: Compress Speed Test | 142656 | 141223
aobench: 2048 x 2048 - Total Time | 43.38 | 41.24
apache: Static Web Page Serving | 21804.00 | -
bullet: 3000 Fall | 4.38 | -
bullet: 1000 Stack | 4.87 | -
bullet: 136 Ragdolls | 2.78 | -
bullet: 1000 Convex | 4.78 | -
bullet: Prim Trimesh | 1.02 | -
bullet: Convex Trimesh | 1.21 | -
bullet: Raytests | 2.74 | -
c-ray: Total Time | 2.65 | 2.20
compilebench: Initial Create | 479.95 | 482.28
compilebench: Compile | 1655.64 | 1633.75
compilebench: Read Compiled Tree | 2289.46 | 2284.67
crafty: Elapsed Time | 7336681 | 7395646
ebizzy | 1000303 | 1029708
fftw: Stock - 1D FFT Size 512 | 9527.23 | -
fftw: Stock - 2D FFT Size 512 | 7878.00 | -
encode-flac: WAV To FLAC | 11.05 | -
graphics-magick: HWB Color Space | 219 | -
graphics-magick: Blur | 161 | -
graphics-magick: Local Adaptive Thresholding | 88 | -
graphics-magick: Resizing | 186 | -
graphics-magick: Sharpen | 196 | -
encode-mp3: WAV To MP3 | 10.36 | -
tjbench: Decompression Throughput | 160.34 | -
m-queens: Time To Solve | 29.66 | 28.86
openssl: RSA 4096-bit Performance | 7928.90 | 7911.87
polybench-c: 3 Matrix Multiplications | 3.37 | 3.28
polybench-c: Correlation Computation | 5.99 | 5.96
polybench-c: Covariance Computation | 5.92 | 5.86
pgbench: Buffer Test - Single Thread - Read Write | 366.35 | -
pgbench: Buffer Test - Single Thread - Read Only | 17350.80 | -
pgbench: Buffer Test - Normal Load - Read Write | 2984.39 | -
pgbench: Buffer Test - Normal Load - Read Only | 604019.27 | -
redis: SET | 1526558.67 | -
redis: GET | 1782445.62 | -
redis: LPUSH | 1550944.21 | -
redis: LPOP | 1543751.87 | -
redis: SADD | 1665771.85 | -
scimark2: Composite | 2316.77 | 2190.87
scimark2: Fast Fourier Transform | 669.07 | 666.03
scimark2: Jacobi Successive Over-Relaxation | 1788.57 | 1765.21
scimark2: Monte Carlo | 777.44 | 262.32
scimark2: Sparse Matrix Multiply | 2869.66 | 2862.24
scimark2: Dense LU Matrix Factorization | 5479.09 | 5398.53
smallpt: Global Illumination Renderer; 100 Samples | 3 | 3
sqlite: Timed SQLite Insertions | 84.89 | -
stockfish: Total Time | 71599364 | 74633445
hmmer: Pfam Database Search | 12.68 | -
tscp: AI Chess Performance | 1239469 | 1353383
vpxenc: vpxenc | 13.24 | 13.21
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 122.23 | -
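The two configurations can be compared as a relative change per test. The following is a minimal sketch rather than anything produced by the Phoronix Test Suite: the function name `pgo_change` and the `RESULTS` layout are illustrative assumptions, the numbers are copied from the results on this page, and the direction flag follows each test's "More Is Better" / "Fewer Is Better" label.

```python
def pgo_change(baseline, pgo, higher_is_better):
    """Return how much better the PGO run did than the baseline, in percent.

    A positive value means PGO was better; a negative value means it was worse.
    For "Fewer Is Better" tests the sign is flipped so that a shorter time
    counts as an improvement.
    """
    if higher_is_better:
        return (pgo - baseline) / baseline * 100.0
    return (baseline - pgo) / baseline * 100.0


# (baseline, PGO, higher_is_better) -- a hypothetical layout for illustration;
# the values are taken from the results on this page.
RESULTS = {
    "c-ray: Total Time": (2.65, 2.20, False),
    "tscp: AI Chess Performance": (1239469, 1353383, True),
    "stockfish: Total Time": (71599364, 74633445, True),
    "scimark2: Monte Carlo": (777.44, 262.32, True),
}

if __name__ == "__main__":
    for name, (baseline, pgo, higher_is_better) in RESULTS.items():
        change = pgo_change(baseline, pgo, higher_is_better)
        print(f"{name}: {change:+.1f}%")
```

Normalizing the sign this way keeps throughput and timing tests on one scale, so a regression such as the SciMark Monte Carlo result stands out as a negative number regardless of the unit.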

7-Zip Compression

Compress Speed Test

7-Zip Compression 16.02 (MIPS, More Is Better)
-O3 -march=native: 142656 (SE +/- 1581.90, N = 3)
-O3 -march=native - PGO: 141223 (SE +/- 2678.08, N = 3)
1. (CXX) g++ options: -pipe -lpthread

AOBench

Size: 2048 x 2048 - Total Time

AOBench (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 41.24 (SE +/- 0.47, N = 3)
-O3 -march=native: 43.38 (SE +/- 0.66, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -lm -O3 -march=native

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.29 (Requests Per Second, More Is Better)
-O3 -march=native: 21804.00 (SE +/- 102.18, N = 3)
1. (CC) gcc options: -shared -fPIC -pthread -O3 -march=native

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 4.38 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 4.87 (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 2.78 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 4.78 (SE +/- 0.08, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 1.02 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 1.21 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
-O3 -march=native: 2.74 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

C-Ray

Total Time

C-Ray 1.1 (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 2.20 (SE +/- 0.00, N = 3)
-O3 -march=native: 2.65 (SE +/- 0.06, N = 6)
Additional option: -fprofile-correction
1. (CC) gcc options: -lm -lpthread -O3 -march=native

Compile Bench

Test: Initial Create

Compile Bench 0.6 (MB/s, More Is Better)
-O3 -march=native - PGO: 482.28 (SE +/- 6.85, N = 3)
-O3 -march=native: 479.95 (SE +/- 6.42, N = 6)

Compile Bench

Test: Compile

Compile Bench 0.6 (MB/s, More Is Better)
-O3 -march=native: 1655.64 (SE +/- 20.32, N = 3)
-O3 -march=native - PGO: 1633.75 (SE +/- 8.19, N = 3)

Compile Bench

Test: Read Compiled Tree

Compile Bench 0.6 (MB/s, More Is Better)
-O3 -march=native: 2289.46 (SE +/- 32.86, N = 3)
-O3 -march=native - PGO: 2284.67 (SE +/- 28.64, N = 3)

Crafty

Elapsed Time

Crafty 25.2 (Nodes Per Second, More Is Better)
-O3 -march=native - PGO: 7395646 (SE +/- 26393.81, N = 3)
-O3 -march=native: 7336681 (SE +/- 36654.73, N = 3)
1. (CC) gcc options: -pthread -lstdc++ -fprofile-use -lm

ebizzy

ebizzy 0.3 (Records/s, More Is Better)
-O3 -march=native - PGO: 1029708 (SE +/- 15560.63, N = 3)
-O3 -march=native: 1000303 (SE +/- 14948.14, N = 5)
Additional option: -fprofile-correction
1. (CC) gcc options: -pthread -lpthread -O3 -march=native

FFTW

Build: Stock - Size: 1D FFT Size 512

FFTW 3.3.6 (Mflops, More Is Better)
-O3 -march=native: 9527.23 (SE +/- 163.90, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Stock - Size: 2D FFT Size 512

FFTW 3.3.6 (Mflops, More Is Better)
-O3 -march=native: 7878.00 (SE +/- 36.75, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 (Seconds, Fewer Is Better)
-O3 -march=native: 11.05 (SE +/- 0.06, N = 5)
1. (CXX) g++ options: -O3 -march=native -fvisibility=hidden -lm

GraphicsMagick

Operation: HWB Color Space

GraphicsMagick 1.3.28 (Iterations Per Minute, More Is Better)
-O3 -march=native: 219 (SE +/- 1.45, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Blur

GraphicsMagick 1.3.28 (Iterations Per Minute, More Is Better)
-O3 -march=native: 161 (SE +/- 0.33, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Local Adaptive Thresholding

GraphicsMagick 1.3.28 (Iterations Per Minute, More Is Better)
-O3 -march=native: 88
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Resizing

GraphicsMagick 1.3.28 (Iterations Per Minute, More Is Better)
-O3 -march=native: 186 (SE +/- 0.88, N = 3)
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.28 (Iterations Per Minute, More Is Better)
-O3 -march=native: 196
1. (CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

LAME MP3 Encoding

WAV To MP3

LAME MP3 Encoding 3.100 (Seconds, Fewer Is Better)
-O3 -march=native: 10.36 (SE +/- 0.14, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 (Megapixels/sec, More Is Better)
-O3 -march=native: 160.34 (SE +/- 3.01, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

m-queens

Time To Solve

m-queens 1.1 (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 28.86 (SE +/- 0.04, N = 3)
-O3 -march=native: 29.66 (SE +/- 0.11, N = 3)
Additional option: -fprofile-correction
1. (CXX) g++ options: -fopenmp -O3 -march=native -O2

OpenSSL

RSA 4096-bit Performance

OpenSSL 1.1.0f (Signs Per Second, More Is Better)
-O3 -march=native: 7928.90 (SE +/- 61.67, N = 3)
-O3 -march=native - PGO: 7911.87 (SE +/- 50.78, N = 3)
Additional option: -lssl
1. (CC) gcc options: -O3 -pthread -m64 -lcrypto -ldl

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 3.28 (SE +/- 0.05, N = 3)
-O3 -march=native: 3.37 (SE +/- 0.05, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 5.96 (SE +/- 0.10, N = 3)
-O3 -march=native: 5.99 (SE +/- 0.03, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native

PolyBench-C

Test: Covariance Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
-O3 -march=native - PGO: 5.86 (SE +/- 0.11, N = 3)
-O3 -march=native: 5.92 (SE +/- 0.08, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Write

PostgreSQL pgbench 10.3 (TPS, More Is Better)
-O3 -march=native: 366.35 (SE +/- 3.05, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Only

PostgreSQL pgbench 10.3 (TPS, More Is Better)
-O3 -march=native: 17350.80 (SE +/- 194.54, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Write

PostgreSQL pgbench 10.3 (TPS, More Is Better)
-O3 -march=native: 2984.39 (SE +/- 87.76, N = 6)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Only

PostgreSQL pgbench 10.3 (TPS, More Is Better)
-O3 -march=native: 604019.27 (SE +/- 3801.72, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

Redis

Test: SET

Redis 4.0.8 (Requests Per Second, More Is Better)
-O3 -march=native: 1526558.67 (SE +/- 41218.58, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: GET

Redis 4.0.8 (Requests Per Second, More Is Better)
-O3 -march=native: 1782445.62 (SE +/- 69578.92, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: LPUSH

Redis 4.0.8 (Requests Per Second, More Is Better)
-O3 -march=native: 1550944.21 (SE +/- 53817.21, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: LPOP

Redis 4.0.8 (Requests Per Second, More Is Better)
-O3 -march=native: 1543751.87 (SE +/- 51072.66, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SADD

Redis 4.0.8 (Requests Per Second, More Is Better)
-O3 -march=native: 1665771.85 (SE +/- 68674.39, N = 6)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

SciMark

Computational Test: Composite

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 2316.77 (SE +/- 19.54, N = 3)
-O3 -march=native - PGO: 2190.87 (SE +/- 18.71, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 669.07 (SE +/- 11.67, N = 3)
-O3 -march=native - PGO: 666.03 (SE +/- 13.38, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 1788.57 (SE +/- 4.61, N = 3)
-O3 -march=native - PGO: 1765.21 (SE +/- 0.35, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 777.44 (SE +/- 2.96, N = 3)
-O3 -march=native - PGO: 262.32 (SE +/- 0.47, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 2869.66 (SE +/- 19.55, N = 3)
-O3 -march=native - PGO: 2862.24 (SE +/- 13.47, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)
-O3 -march=native: 5479.09 (SE +/- 95.81, N = 3)
-O3 -march=native - PGO: 5398.53 (SE +/- 94.72, N = 3)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native -lm

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 (Seconds, Fewer Is Better)
-O3 -march=native: 3
-O3 -march=native - PGO: 3
Additional option: -fprofile-correction
1. (CXX) g++ options: -fopenmp -O3 -march=native

SQLite

Timed SQLite Insertions

SQLite 3.22 (Seconds, Fewer Is Better)
-O3 -march=native: 84.89 (SE +/- 0.30, N = 3)
1. (CC) gcc options: -O3 -march=native -lz -ldl -lpthread

Stockfish

Total Time

Stockfish 9 (Nodes Per Second, More Is Better)
-O3 -march=native - PGO: 74633445 (SE +/- 565015.90, N = 3)
-O3 -march=native: 71599364 (SE +/- 662704.17, N = 3)
Additional option: -fprofile-correction
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)
-O3 -march=native: 12.68 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)
-O3 -march=native - PGO: 1353383 (SE +/- 16207.22, N = 5)
-O3 -march=native: 1239469 (SE +/- 14073.87, N = 5)
Additional option: -fprofile-correction
1. (CC) gcc options: -O3 -march=native

VP9 libvpx Encoding

vpxenc

VP9 libvpx Encoding 1.7.0 (Frames Per Second, More Is Better)
-O3 -march=native: 13.24 (SE +/- 0.06, N = 3)
-O3 -march=native - PGO: 13.21 (SE +/- 0.02, N = 3)
Additional option: -fprofile-correction
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 (Seconds, Fewer Is Better)
-O3 -march=native: 122.23 (SE +/- 0.40, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lz


Phoronix Test Suite v10.8.5