Xeon Gold GCC 8.2 RC1 PGO

GCC 8.2 RC1 compiler Profile Guided Optimization (PGO) benchmarks for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1807196-RA-XEONGOLDG95.

System details (configurations tested: "-O3 -march=native" and "-O3 -march=native - PGO")

Processor: 2 x Intel Xeon Gold 6138 @ 3.70GHz (40 Cores / 80 Threads)
Motherboard: TYAN S7106 (V1.01 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 96256MB
Disk: 256GB Samsung SSD 850 + 2000GB Seagate ST2000DM006-2DM1 + 2 x 120GB TOSHIBA-TR150
Graphics: ASPEED ASPEED Family
Monitor: VE228
Network: Intel I210 Gigabit Connection
OS: Ubuntu 18.04
Kernel: 4.15.0-23-generic (x86_64)
Compiler: GCC 8.1.1 20180719
File-System: ext4
Screen Resolution: 1920x1080

Environment Details: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"
Compiler Details: --disable-multilib --enable-checking=release
Disk Details: CFQ / data=ordered,relatime,rw
Processor Details: Scaling Governor: intel_pstate powersave
Python Details: Python 2.7.15rc1 + Python 3.6.5
Security Details: KPTI + __user pointer sanitization + Full generic retpoline IBPB IBRS_FW Protection

Result overview ("-" means no result listed for that configuration)

Test | -O3 -march=native | -O3 -march=native - PGO
scimark2: Composite | 2316.77 | 2190.87
scimark2: Fast Fourier Transform | 669.07 | 666.03
scimark2: Jacobi Successive Over-Relaxation | 1788.57 | 1765.21
scimark2: Monte Carlo | 777.44 | 262.32
scimark2: Sparse Matrix Multiply | 2869.66 | 2862.24
scimark2: Dense LU Matrix Factorization | 5479.09 | 5398.53
vpxenc: vpxenc | 13.24 | 13.21
fftw: Stock - 1D FFT Size 512 | 9527.23 | -
fftw: Stock - 2D FFT Size 512 | 7878.00 | -
polybench-c: 3 Matrix Multiplications | 3.37 | 3.28
polybench-c: Correlation Computation | 5.99 | 5.96
polybench-c: Covariance Computation | 5.92 | 5.86
sqlite: Timed SQLite Insertions | 84.89 | -
openssl: RSA 4096-bit Performance | 7928.90 | 7911.87
aobench: 2048 x 2048 - Total Time | 43.38 | 41.24
compress-7zip: Compress Speed Test | 142656 | 141223
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 122.23 | -
m-queens: Time To Solve | 29.66 | 28.86
c-ray: Total Time | 2.65 | 2.20
smallpt: Global Illumination Renderer; 100 Samples | 3 | 3
crafty: Elapsed Time | 7336681 | 7395646
compilebench: Initial Create | 479.95 | 482.28
compilebench: Compile | 1655.64 | 1633.75
compilebench: Read Compiled Tree | 2289.46 | 2284.67
stockfish: Total Time | 71599364 | 74633445
tscp: AI Chess Performance | 1239469 | 1353383
redis: SET | 1526558.67 | -
redis: GET | 1782445.62 | -
redis: LPUSH | 1550944.21 | -
redis: LPOP | 1543751.87 | -
redis: SADD | 1665771.85 | -
pgbench: Buffer Test - Single Thread - Read Write | 366.35 | -
pgbench: Buffer Test - Single Thread - Read Only | 17350.80 | -
pgbench: Buffer Test - Normal Load - Read Write | 2984.39 | -
pgbench: Buffer Test - Normal Load - Read Only | 604019.27 | -
tjbench: Decompression Throughput | 160.34 | -
apache: Static Web Page Serving | 21804.00 | -
encode-mp3: WAV To MP3 | 10.36 | -
encode-flac: WAV To FLAC | 11.05 | -
ebizzy | 1000303 | 1029708
graphics-magick: HWB Color Space | 219 | -
graphics-magick: Blur | 161 | -
graphics-magick: Local Adaptive Thresholding | 88 | -
graphics-magick: Resizing | 186 | -
graphics-magick: Sharpen | 196 | -
hmmer: Pfam Database Search | 12.68 | -
bullet: 3000 Fall | 4.38 | -
bullet: 1000 Stack | 4.87 | -
bullet: 136 Ragdolls | 2.78 | -
bullet: 1000 Convex | 4.78 | -
bullet: Prim Trimesh | 1.02 | -
bullet: Convex Trimesh | 1.21 | -
bullet: Raytests | 2.74 | -
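To put the two columns on a common scale, the relative change between them can be computed per test. The following is only a minimal sketch: the helper name `pgo_gain_percent` and the choice of three sample rows are illustrative assumptions, not part of the Phoronix Test Suite. The values are copied from the results in this document, and the direction ("more is better" versus "fewer is better") is taken from each test's graph label.

```python
# Sample rows copied from the results in this document:
# (baseline value, PGO value, True if "more is better").
results = {
    "scimark2: Monte Carlo (Mflops)": (777.44, 262.32, True),
    "c-ray: Total Time (seconds)": (2.65, 2.20, False),
    "tscp: AI Chess Performance (nodes/s)": (1239469, 1353383, True),
}


def pgo_gain_percent(baseline, pgo, higher_is_better):
    """Hypothetical helper: percent change of the PGO build vs. the baseline.

    Positive means the PGO build did better, negative means it did worse,
    regardless of whether the metric is a rate or a time.
    """
    if higher_is_better:
        return (pgo / baseline - 1.0) * 100.0
    # For timings, a smaller PGO value is the improvement.
    return (baseline / pgo - 1.0) * 100.0


for name, (base, pgo, hib) in results.items():
    print(f"{name}: {pgo_gain_percent(base, pgo, hib):+.1f}%")
```

A difference smaller than the reported standard errors (for example vpxenc at 13.24 versus 13.21) is better read as noise than as a gain or a loss.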

SciMark

Computational Test: Composite

SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
-O3 -march=native: 2316.77 (SE +/- 19.54, N = 3)
-O3 -march=native - PGO: 2190.87 (SE +/- 18.71, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 - Computational Test: Fast Fourier Transform (Mflops, More Is Better)
-O3 -march=native: 669.07 (SE +/- 11.67, N = 3)
-O3 -march=native - PGO: 666.03 (SE +/- 13.38, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
-O3 -march=native: 1788.57 (SE +/- 4.61, N = 3)
-O3 -march=native - PGO: 1765.21 (SE +/- 0.35, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0 - Computational Test: Monte Carlo (Mflops, More Is Better)
-O3 -march=native: 777.44 (SE +/- 2.96, N = 3)
-O3 -march=native - PGO: 262.32 (SE +/- 0.47, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 - Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
-O3 -march=native: 2869.66 (SE +/- 19.55, N = 3)
-O3 -march=native - PGO: 2862.24 (SE +/- 13.47, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
-O3 -march=native: 5479.09 (SE +/- 95.81, N = 3)
-O3 -march=native - PGO: 5398.53 (SE +/- 94.72, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native -lm

VP9 libvpx Encoding

vpxenc

VP9 libvpx Encoding 1.7.0 - vpxenc (Frames Per Second, More Is Better)
-O3 -march=native: 13.24 (SE +/- 0.06, N = 3)
-O3 -march=native - PGO: 13.21 (SE +/- 0.02, N = 3)
Additional option shown on graph: -fprofile-correction
(CXX) g++ options: -m64 -lm -lpthread -O3 -march=native -fPIC -U_FORTIFY_SOURCE

FFTW

Build: Stock - Size: 1D FFT Size 512

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 512 (Mflops, More Is Better)
-O3 -march=native: 9527.23 (SE +/- 163.90, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Stock - Size: 2D FFT Size 512

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 512 (Mflops, More Is Better)
-O3 -march=native: 7878.00 (SE +/- 36.75, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 - Test: 3 Matrix Multiplications (Seconds, Fewer Is Better)
-O3 -march=native: 3.37 (SE +/- 0.05, N = 3)
-O3 -march=native - PGO: 3.28 (SE +/- 0.05, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 - Test: Correlation Computation (Seconds, Fewer Is Better)
-O3 -march=native: 5.99 (SE +/- 0.03, N = 3)
-O3 -march=native - PGO: 5.96 (SE +/- 0.10, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native

PolyBench-C

Test: Covariance Computation

PolyBench-C 4.2 - Test: Covariance Computation (Seconds, Fewer Is Better)
-O3 -march=native: 5.92 (SE +/- 0.08, N = 3)
-O3 -march=native - PGO: 5.86 (SE +/- 0.11, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native

SQLite

Timed SQLite Insertions

SQLite 3.22 - Timed SQLite Insertions (Seconds, Fewer Is Better)
-O3 -march=native: 84.89 (SE +/- 0.30, N = 3)
(CC) gcc options: -O3 -march=native -lz -ldl -lpthread

OpenSSL

RSA 4096-bit Performance

OpenSSL 1.1.0f - RSA 4096-bit Performance (Signs Per Second, More Is Better)
-O3 -march=native: 7928.90 (SE +/- 61.67, N = 3)
-O3 -march=native - PGO: 7911.87 (SE +/- 50.78, N = 3)
Additional option shown on graph: -lssl
(CC) gcc options: -O3 -pthread -m64 -lcrypto -ldl

AOBench

Size: 2048 x 2048 - Total Time

AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
-O3 -march=native: 43.38 (SE +/- 0.66, N = 3)
-O3 -march=native - PGO: 41.24 (SE +/- 0.47, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -lm -O3 -march=native

7-Zip Compression

Compress Speed Test

7-Zip Compression 16.02 - Compress Speed Test (MIPS, More Is Better)
-O3 -march=native: 142656 (SE +/- 1581.90, N = 3)
-O3 -march=native - PGO: 141223 (SE +/- 2678.08, N = 3)
(CXX) g++ options: -pipe -lpthread

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
-O3 -march=native: 122.23 (SE +/- 0.40, N = 3)
(CC) gcc options: -O3 -march=native -pthread -lz

m-queens

Time To Solve

m-queens 1.1 - Time To Solve (Seconds, Fewer Is Better)
-O3 -march=native: 29.66 (SE +/- 0.11, N = 3)
-O3 -march=native - PGO: 28.86 (SE +/- 0.04, N = 3)
Additional option shown on graph: -fprofile-correction
(CXX) g++ options: -fopenmp -O3 -march=native -O2

C-Ray

Total Time

C-Ray 1.1 - Total Time (Seconds, Fewer Is Better)
-O3 -march=native: 2.65 (SE +/- 0.06, N = 6)
-O3 -march=native - PGO: 2.20 (SE +/- 0.00, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -lm -lpthread -O3 -march=native

Smallpt

Global Illumination Renderer; 100 Samples

Smallpt 1.0 - Global Illumination Renderer; 100 Samples (Seconds, Fewer Is Better)
-O3 -march=native: 3
-O3 -march=native - PGO: 3
Additional option shown on graph: -fprofile-correction
(CXX) g++ options: -fopenmp -O3 -march=native

Crafty

Elapsed Time

Crafty 25.2 - Elapsed Time (Nodes Per Second, More Is Better)
-O3 -march=native: 7336681 (SE +/- 36654.73, N = 3)
-O3 -march=native - PGO: 7395646 (SE +/- 26393.81, N = 3)
(CC) gcc options: -pthread -lstdc++ -fprofile-use -lm

Compile Bench

Test: Initial Create

Compile Bench 0.6 - Test: Initial Create (MB/s, More Is Better)
-O3 -march=native: 479.95 (SE +/- 6.42, N = 6)
-O3 -march=native - PGO: 482.28 (SE +/- 6.85, N = 3)

Compile Bench

Test: Compile

Compile Bench 0.6 - Test: Compile (MB/s, More Is Better)
-O3 -march=native: 1655.64 (SE +/- 20.32, N = 3)
-O3 -march=native - PGO: 1633.75 (SE +/- 8.19, N = 3)

Compile Bench

Test: Read Compiled Tree

Compile Bench 0.6 - Test: Read Compiled Tree (MB/s, More Is Better)
-O3 -march=native: 2289.46 (SE +/- 32.86, N = 3)
-O3 -march=native - PGO: 2284.67 (SE +/- 28.64, N = 3)

Stockfish

Total Time

Stockfish 9 - Total Time (Nodes Per Second, More Is Better)
-O3 -march=native: 71599364 (SE +/- 662704.17, N = 3)
-O3 -march=native - PGO: 74633445 (SE +/- 565015.90, N = 3)
Additional option shown on graph: -fprofile-correction
(CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

TSCP

AI Chess Performance

TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
-O3 -march=native: 1239469 (SE +/- 14073.87, N = 5)
-O3 -march=native - PGO: 1353383 (SE +/- 16207.22, N = 5)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -O3 -march=native

Redis

Test: SET

Redis 4.0.8 - Test: SET (Requests Per Second, More Is Better)
-O3 -march=native: 1526558.67 (SE +/- 41218.58, N = 6)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: GET

Redis 4.0.8 - Test: GET (Requests Per Second, More Is Better)
-O3 -march=native: 1782445.62 (SE +/- 69578.92, N = 6)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: LPUSH

Redis 4.0.8 - Test: LPUSH (Requests Per Second, More Is Better)
-O3 -march=native: 1550944.21 (SE +/- 53817.21, N = 6)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: LPOP

Redis 4.0.8 - Test: LPOP (Requests Per Second, More Is Better)
-O3 -march=native: 1543751.87 (SE +/- 51072.66, N = 6)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SADD

Redis 4.0.8 - Test: SADD (Requests Per Second, More Is Better)
-O3 -march=native: 1665771.85 (SE +/- 68674.39, N = 6)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Write

PostgreSQL pgbench 10.3 - Scaling: Buffer Test - Test: Single Thread - Mode: Read Write (TPS, More Is Better)
-O3 -march=native: 366.35 (SE +/- 3.05, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Single Thread - Mode: Read Only

PostgreSQL pgbench 10.3 - Scaling: Buffer Test - Test: Single Thread - Mode: Read Only (TPS, More Is Better)
-O3 -march=native: 17350.80 (SE +/- 194.54, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Write

PostgreSQL pgbench 10.3 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Write (TPS, More Is Better)
-O3 -march=native: 2984.39 (SE +/- 87.76, N = 6)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

PostgreSQL pgbench

Scaling: Buffer Test - Test: Normal Load - Mode: Read Only

PostgreSQL pgbench 10.3 - Scaling: Buffer Test - Test: Normal Load - Mode: Read Only (TPS, More Is Better)
-O3 -march=native: 604019.27 (SE +/- 3801.72, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O3 -march=native -lpgcommon -lpgport -lpq -lpthread -lrt -lcrypt -ldl -lm

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
-O3 -march=native: 160.34 (SE +/- 3.01, N = 3)
(CC) gcc options: -O3 -march=native -lm

Apache Benchmark

Static Web Page Serving

Apache Benchmark 2.4.29 - Static Web Page Serving (Requests Per Second, More Is Better)
-O3 -march=native: 21804.00 (SE +/- 102.18, N = 3)
(CC) gcc options: -shared -fPIC -pthread -O3 -march=native

LAME MP3 Encoding

WAV To MP3

LAME MP3 Encoding 3.100 - WAV To MP3 (Seconds, Fewer Is Better)
-O3 -march=native: 10.36 (SE +/- 0.14, N = 3)
(CC) gcc options: -O3 -march=native -lm

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 - WAV To FLAC (Seconds, Fewer Is Better)
-O3 -march=native: 11.05 (SE +/- 0.06, N = 5)
(CXX) g++ options: -O3 -march=native -fvisibility=hidden -lm

ebizzy

ebizzy 0.3 (Records/s, More Is Better)
-O3 -march=native: 1000303 (SE +/- 14948.14, N = 5)
-O3 -march=native - PGO: 1029708 (SE +/- 15560.63, N = 3)
Additional option shown on graph: -fprofile-correction
(CC) gcc options: -pthread -lpthread -O3 -march=native

GraphicsMagick

Operation: HWB Color Space

GraphicsMagick 1.3.28 - Operation: HWB Color Space (Iterations Per Minute, More Is Better)
-O3 -march=native: 219 (SE +/- 1.45, N = 3)
(CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Blur

GraphicsMagick 1.3.28 - Operation: Blur (Iterations Per Minute, More Is Better)
-O3 -march=native: 161 (SE +/- 0.33, N = 3)
(CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Local Adaptive Thresholding

GraphicsMagick 1.3.28 - Operation: Local Adaptive Thresholding (Iterations Per Minute, More Is Better)
-O3 -march=native: 88
(CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Resizing

GraphicsMagick 1.3.28 - Operation: Resizing (Iterations Per Minute, More Is Better)
-O3 -march=native: 186 (SE +/- 0.88, N = 3)
(CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

GraphicsMagick

Operation: Sharpen

GraphicsMagick 1.3.28 - Operation: Sharpen (Iterations Per Minute, More Is Better)
-O3 -march=native: 196
(CC) gcc options: -fopenmp -O3 -march=native -pthread -lXext -lSM -lICE -lX11 -lz -lm -ldl -lpthread

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
-O3 -march=native: 12.68 (SE +/- 0.04, N = 3)
(CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 - Test: 3000 Fall (Seconds, Fewer Is Better)
-O3 -march=native: 4.38 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 - Test: 1000 Stack (Seconds, Fewer Is Better)
-O3 -march=native: 4.87 (SE +/- 0.09, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 - Test: 136 Ragdolls (Seconds, Fewer Is Better)
-O3 -march=native: 2.78 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 - Test: 1000 Convex (Seconds, Fewer Is Better)
-O3 -march=native: 4.78 (SE +/- 0.08, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 - Test: Prim Trimesh (Seconds, Fewer Is Better)
-O3 -march=native: 1.02 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 - Test: Convex Trimesh (Seconds, Fewer Is Better)
-O3 -march=native: 1.21 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 - Test: Raytests (Seconds, Fewer Is Better)
-O3 -march=native: 2.74 (SE +/- 0.05, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic -lglut -lGL -lGLU


Phoronix Test Suite v10.8.4