PGI Compiler 18.10 Benchmarks vs. GCC vs. LLVM Clang

PGI compiler benchmarks for a future article on Phoronix.com.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 1812180-SK-PGICOMPIL33
Run Management

Result Identifier     Date Run            Test Duration
GCC 8.2.0             December 17 2018    28 Minutes
LLVM Clang 7.0        December 17 2018    58 Minutes
PGI Compiler 18.10    December 17 2018    58 Minutes
Average                                   48 Minutes



Processor: Intel Core i9-7980XE @ 4.20GHz (18 Cores / 36 Threads)
Motherboard: ASUS PRIME X299-A (1602 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 16384MB
Disk: 240GB Force MP510
Graphics: NVIDIA NV120 12GB
Audio: Realtek ALC1220
Monitor: ASUS PB278
Network: Intel Connection
OS: Ubuntu 18.10
Kernel: 4.20.0-999-generic (x86_64) 20181206
Desktop: GNOME Shell 3.30.1
Display Server: X Server 1.20.1
Display Driver: modesetting 1.20.1
OpenGL: 4.3 Mesa 18.2.2
Compilers: GCC 8.2.0; PGI Compiler 18.10-1
File-System: ext4
Screen Resolution: 2560x1440

System Logs

- CXXFLAGS=-O3 CXXFLAGS_OVERRIDE=-O3 CFLAGS=-O3 CFLAGS_OVERRIDE=-O3
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- NONE / errors=remount-ro,relatime,rw
- Scaling Governor: intel_pstate powersave
- __user pointer sanitization + disabled STIBP: disabled + PTE Inversion; VMX: vulnerable
- PGI Compiler 18.10: Python 2.7.15+ + Python 3.6.7

Result Overview (chart): relative performance of GCC 8.2.0, LLVM Clang 7.0, and PGI Compiler 18.10 across C-Ray, AOBench, TSCP, BlogBench, SciMark, Timed HMMer Search, PolyBench-C, and High Performance Conjugate Gradient.

Test                                                  GCC 8.2.0   LLVM Clang 7.0   PGI Compiler 18.10
blogbench: Read                                          709382           962631               752310
hpcg                                                       1.34             1.33                 1.34
polybench-c: Covariance Computation                        4.86             4.87                 7.23
polybench-c: Correlation Computation                       4.86             4.88                 4.81
polybench-c: 3 Matrix Multiplications                      2.86             2.83                 3.35
hmmer: Pfam Database Search                                9.99             8.28                 8.51
scimark2: Composite                                        2518             2478                 1932
scimark2: Monte Carlo                                       950              717                  591
scimark2: Fast Fourier Transform                            773              778                  768
scimark2: Sparse Matrix Multiply                           3405             3164                 2243
scimark2: Dense LU Matrix Factorization                    5796             6069                 4622
scimark2: Jacobi Successive Over-Relaxation                1665             1662                 1438
tscp: AI Chess Performance                              1440321          1629136              1199430
c-ray: Total Time - 4K, 16 Rays Per Pixel                103.10            52.79                39.24
aobench: 2048 x 2048 - Total Time                         32.18            33.81                24.15
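The results above mix metrics where more is better (final score, GFLOP/s, Mflops, nodes per second) with metrics where fewer is better (seconds), so comparing compilers across tests needs a normalization step. The following is a minimal Python sketch, assuming GCC 8.2.0 as the baseline and a plain geometric mean over all 15 rows. The function names, the choice of baseline, and the decision to count both the SciMark composite and its sub-tests are illustrative assumptions, not necessarily how OpenBenchmarking.org computes its own overview.

```python
import math

COMPILERS = ("GCC 8.2.0", "LLVM Clang 7.0", "PGI Compiler 18.10")

# (test, more_is_better, GCC 8.2.0, LLVM Clang 7.0, PGI Compiler 18.10)
# Values are copied from the result table; the direction flag follows the
# "More Is Better" / "Fewer Is Better" label of each test.
RESULTS = [
    ("blogbench: Read", True, 709382, 962631, 752310),
    ("hpcg", True, 1.34, 1.33, 1.34),
    ("polybench-c: Covariance Computation", False, 4.86, 4.87, 7.23),
    ("polybench-c: Correlation Computation", False, 4.86, 4.88, 4.81),
    ("polybench-c: 3 Matrix Multiplications", False, 2.86, 2.83, 3.35),
    ("hmmer: Pfam Database Search", False, 9.99, 8.28, 8.51),
    ("scimark2: Composite", True, 2518, 2478, 1932),
    ("scimark2: Monte Carlo", True, 950, 717, 591),
    ("scimark2: Fast Fourier Transform", True, 773, 778, 768),
    ("scimark2: Sparse Matrix Multiply", True, 3405, 3164, 2243),
    ("scimark2: Dense LU Matrix Factorization", True, 5796, 6069, 4622),
    ("scimark2: Jacobi Successive Over-Relaxation", True, 1665, 1662, 1438),
    ("tscp: AI Chess Performance", True, 1440321, 1629136, 1199430),
    ("c-ray: Total Time - 4K, 16 Rays Per Pixel", False, 103.10, 52.79, 39.24),
    ("aobench: 2048 x 2048 - Total Time", False, 32.18, 33.81, 24.15),
]


def relative_to_baseline(more_is_better, values):
    """Express each value as a ratio to values[0]; > 1.0 always means faster."""
    baseline = values[0]
    if more_is_better:
        return [v / baseline for v in values]
    return [baseline / v for v in values]


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Per-test ratios relative to GCC 8.2.0 (hypothetical choice of baseline).
RELATIVE = {
    test: relative_to_baseline(more_is_better, values)
    for test, more_is_better, *values in RESULTS
}

# One summary number per compiler across all rows.
OVERALL = {
    compiler: geometric_mean([ratios[i] for ratios in RELATIVE.values()])
    for i, compiler in enumerate(COMPILERS)
}

for test, ratios in RELATIVE.items():
    print(f"{test:45s}" + "".join(f"{r:8.2f}" for r in ratios))
for compiler, gm in OVERALL.items():
    print(f"geometric mean, {compiler}: {gm:.3f}")
```

For C-Ray, for instance, this puts PGI Compiler 18.10 at about 2.63 times the GCC 8.2.0 result (103.10 / 39.24).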

BlogBench

BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. It mimics the behavior of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of these blogs are generated locally with fake content and pictures. Learn more via the OpenBenchmarking.org test page.

BlogBench 1.1 - Test: Read (Final Score, More Is Better)
GCC 8.2.0: 709382 (SE +/- 12212.43, N = 3)
LLVM Clang 7.0: 962631 (SE +/- 75775.18, N = 9)
PGI Compiler 18.10: 752310 (SE +/- 18545.35, N = 9)
1. (CC) gcc options: -O3
Two of the three results also list -pthread.

High Performance Conjugate Gradient

HPCG is the High Performance Conjugate Gradient, a new scientific benchmark from Sandia National Laboratories aimed at testing supercomputers with workloads that are closer to modern real-world applications than those of HPCC. Learn more via the OpenBenchmarking.org test page.

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
GCC 8.2.0: 1.34 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 1.33 (SE +/- 0.01, N = 3)
PGI Compiler 18.10: 1.34 (SE +/- 0.01, N = 3)

PolyBench-C

PolyBench-C is a C-language polyhedral benchmark suite made at the Ohio State University. Learn more via the OpenBenchmarking.org test page.

PolyBench-C 4.2 - Test: Covariance Computation (Seconds, Fewer Is Better)
GCC 8.2.0: 4.86 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 4.87 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 7.23 (SE +/- 0.74, N = 12)
1. (CC) gcc options: -O3

PolyBench-C 4.2 - Test: Correlation Computation (Seconds, Fewer Is Better)
GCC 8.2.0: 4.86 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 4.88 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 4.81 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O3

PolyBench-C 4.2 - Test: 3 Matrix Multiplications (Seconds, Fewer Is Better)
GCC 8.2.0: 2.86 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 2.83 (SE +/- 0.02, N = 3)
PGI Compiler 18.10: 3.35 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O3
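Each result is reported with an "SE +/-" figure and a run count N, and the PGI Covariance Computation result stands out with SE +/- 0.74 over N = 12 runs. A rough way to judge how noisy a result is, sketched below in Python, is to express the standard error as a percentage of the reported value. This assumes that SE is the standard error of the mean over the N runs; the 5% cutoff and all names in the code are hypothetical choices for illustration, not settings taken from the Phoronix Test Suite.

```python
import math

# (label, reported value in seconds, SE, N) for the Covariance Computation test.
COVARIANCE = [
    ("GCC 8.2.0", 4.86, 0.01, 3),
    ("LLVM Clang 7.0", 4.87, 0.00, 3),
    ("PGI Compiler 18.10", 7.23, 0.74, 12),
]

# Hypothetical cutoff: flag results whose SE exceeds 5% of the reported value.
NOISE_THRESHOLD_PERCENT = 5.0


def relative_se_percent(value, se):
    """Standard error as a percentage of the reported value."""
    return 100.0 * se / value


def implied_stddev(se, n):
    """Sample standard deviation implied by SE = s / sqrt(N).

    Only meaningful under the assumption that SE is the standard
    error of the mean over N runs.
    """
    return se * math.sqrt(n)


noisy = []
for label, value, se, n in COVARIANCE:
    rel = relative_se_percent(value, se)
    if rel > NOISE_THRESHOLD_PERCENT:
        noisy.append(label)
    print(f"{label:20s} value={value:5.2f}s  SE={rel:5.2f}%  "
          f"implied stddev={implied_stddev(se, n):.2f}s  N={n}")

print("flagged as noisy:", noisy)
```

Under these assumptions only the PGI Compiler 18.10 result is flagged, with a standard error of roughly 10% of the reported time, so the 7.23-second figure deserves more caution than the GCC and Clang results for the same test.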

Timed HMMer Search

This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of the Drosophila Sevenless protein. Learn more via the OpenBenchmarking.org test page.

Timed HMMer Search 2.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
GCC 8.2.0: 9.99 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 8.28 (SE +/- 0.11, N = 3)
PGI Compiler 18.10: 8.51 (SE +/- 0.09, N = 3)
1. (CC) gcc options: -O3 -lhmmer -lsquid -lm
Two of the three results also list -pthread.

SciMark

This test runs the ANSI C version of SciMark 2.0, a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. The test is made up of Fast Fourier Transform, Jacobi Successive Over-Relaxation, Monte Carlo, Sparse Matrix Multiply, and Dense LU Matrix Factorization benchmarks. Learn more via the OpenBenchmarking.org test page.

SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
GCC 8.2.0: 2518 (SE +/- 0.69, N = 3)
LLVM Clang 7.0: 2478 (SE +/- 1.38, N = 3)
PGI Compiler 18.10: 1932 (SE +/- 0.39, N = 3)
1. (CC) gcc options: -O3 -lm

SciMark 2.0 - Computational Test: Monte Carlo (Mflops, More Is Better)
GCC 8.2.0: 950 (SE +/- 0.09, N = 3)
LLVM Clang 7.0: 717 (SE +/- 0.15, N = 3)
PGI Compiler 18.10: 591 (SE +/- 0.53, N = 3)
1. (CC) gcc options: -O3 -lm

SciMark 2.0 - Computational Test: Fast Fourier Transform (Mflops, More Is Better)
GCC 8.2.0: 773 (SE +/- 1.66, N = 3)
LLVM Clang 7.0: 778 (SE +/- 1.63, N = 3)
PGI Compiler 18.10: 768 (SE +/- 2.80, N = 3)
1. (CC) gcc options: -O3 -lm

SciMark 2.0 - Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
GCC 8.2.0: 3405 (SE +/- 2.59, N = 3)
LLVM Clang 7.0: 3164 (SE +/- 1.62, N = 3)
PGI Compiler 18.10: 2243 (SE +/- 0.36, N = 3)
1. (CC) gcc options: -O3 -lm

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
GCC 8.2.0: 5796 (SE +/- 0.98, N = 3)
LLVM Clang 7.0: 6069 (SE +/- 4.29, N = 3)
PGI Compiler 18.10: 4622 (SE +/- 0.84, N = 3)
1. (CC) gcc options: -O3 -lm

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
GCC 8.2.0: 1665 (SE +/- 0.37, N = 3)
LLVM Clang 7.0: 1662 (SE +/- 0.69, N = 3)
PGI Compiler 18.10: 1438 (SE +/- 0.21, N = 3)
1. (CC) gcc options: -O3 -lm
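The SciMark Composite figure can be cross-checked against the five kernel results reported above. The Python sketch below assumes the composite is the arithmetic mean of the five kernel scores and allows a tolerance of 1 Mflops, since the published figures are rounded to whole numbers; the dictionary layout and function name are illustrative.

```python
# Kernel scores in Mflops, copied from the SciMark 2.0 results for each compiler.
KERNELS = {
    "GCC 8.2.0": {
        "Monte Carlo": 950,
        "Fast Fourier Transform": 773,
        "Sparse Matrix Multiply": 3405,
        "Dense LU Matrix Factorization": 5796,
        "Jacobi Successive Over-Relaxation": 1665,
    },
    "LLVM Clang 7.0": {
        "Monte Carlo": 717,
        "Fast Fourier Transform": 778,
        "Sparse Matrix Multiply": 3164,
        "Dense LU Matrix Factorization": 6069,
        "Jacobi Successive Over-Relaxation": 1662,
    },
    "PGI Compiler 18.10": {
        "Monte Carlo": 591,
        "Fast Fourier Transform": 768,
        "Sparse Matrix Multiply": 2243,
        "Dense LU Matrix Factorization": 4622,
        "Jacobi Successive Over-Relaxation": 1438,
    },
}

# Composite scores as reported in the result file.
REPORTED_COMPOSITE = {
    "GCC 8.2.0": 2518,
    "LLVM Clang 7.0": 2478,
    "PGI Compiler 18.10": 1932,
}


def composite(kernel_scores):
    """Arithmetic mean of the kernel scores (assumed composite definition)."""
    return sum(kernel_scores.values()) / len(kernel_scores)


for compiler, scores in KERNELS.items():
    computed = composite(scores)
    reported = REPORTED_COMPOSITE[compiler]
    # Inputs and output are rounded to whole Mflops, so allow 1 Mflops of slack.
    status = "consistent" if abs(computed - reported) <= 1.0 else "mismatch"
    print(f"{compiler:20s} computed={computed:7.1f}  reported={reported}  {status}")
```

With the figures above, the computed means are 2517.8, 2478.0, and 1932.4 Mflops, which agree with the reported composites after rounding. Most of PGI's composite deficit therefore comes from the Sparse Matrix Multiply and Dense LU kernels, where the absolute gaps are largest.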

TSCP

This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.

TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
GCC 8.2.0: 1440321 (SE +/- 922.23, N = 5)
LLVM Clang 7.0: 1629136 (SE +/- 15151.69, N = 5)
PGI Compiler 18.10: 1199430 (SE +/- 521.80, N = 5)
1. (CC) gcc options: -O3

C-Ray

This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.

C-Ray 1.1 - Total Time - 4K, 16 Rays Per Pixel (Seconds, Fewer Is Better)
GCC 8.2.0: 103.10 (SE +/- 0.66, N = 3)
LLVM Clang 7.0: 52.79 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 39.24 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -lm -lpthread -O3

AOBench

AOBench is a lightweight ambient occlusion renderer written in C. The test profile uses a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.

AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
GCC 8.2.0: 32.18 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 33.81 (SE +/- 0.03, N = 3)
PGI Compiler 18.10: 24.15 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -O3