PGI Compiler 18.10 Benchmarks vs. GCC vs. LLVM Clang

PGI compiler benchmarks for a future article on Phoronix.com.

HTML result view exported from: https://openbenchmarking.org/result/1812180-SK-PGICOMPIL33&sro&grs.

System Details

Configurations: GCC 8.2.0, LLVM Clang 7.0, PGI Compiler 18.10

Processor: Intel Core i9-7980XE @ 4.20GHz (18 Cores / 36 Threads)
Motherboard: ASUS PRIME X299-A (1602 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 16384MB
Disk: 240GB Force MP510
Graphics: NVIDIA NV120 12GB
Audio: Realtek ALC1220
Monitor: ASUS PB278
Network: Intel Connection
OS: Ubuntu 18.10
Kernel: 4.20.0-999-generic (x86_64) 20181206
Desktop: GNOME Shell 3.30.1
Display Server: X Server 1.20.1
Display Driver: modesetting 1.20.1
OpenGL: 4.3 Mesa 18.2.2
Compiler: GCC 8.2.0 (PGI Compiler 18.10-1 for the PGI Compiler 18.10 configuration)
File-System: ext4
Screen Resolution: 2560x1440

Environment Details
- CXXFLAGS=-O3 CXXFLAGS_OVERRIDE=-O3 CFLAGS=-O3 CFLAGS_OVERRIDE=-O3

Compiler Details
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Disk Details
- NONE / errors=remount-ro,relatime,rw

Processor Details
- Scaling Governor: intel_pstate powersave

Security Details
- __user pointer sanitization + disabled STIBP: disabled + PTE Inversion; VMX: vulnerable

Python Details
- PGI Compiler 18.10: Python 2.7.15+ + Python 3.6.7

Result Overview

Test                                          GCC 8.2.0   LLVM Clang 7.0   PGI Compiler 18.10
c-ray: Total Time - 4K, 16 Rays Per Pixel     103.10      52.79            39.24
scimark2: Monte Carlo                         950         717              591
scimark2: Sparse Matrix Multiply              3405        3164             2243
aobench: 2048 x 2048 - Total Time             32.18       33.81            24.15
tscp: AI Chess Performance                    1440321     1629136          1199430
scimark2: Dense LU Matrix Factorization       5796        6069             4622
scimark2: Composite                           2518        2478             1932
hmmer: Pfam Database Search                   9.99        8.28             8.51
polybench-c: 3 Matrix Multiplications         2.86        2.83             3.35
scimark2: Jacobi Successive Over-Relaxation   1665        1662             1438
polybench-c: Correlation Computation          4.86        4.88             4.81
scimark2: Fast Fourier Transform              773         778              768
hpcg:                                         1.34        1.33             1.34
polybench-c: Covariance Computation           4.86        4.87             7.23
blogbench: Read                               709382      962631           752310
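The overview mixes units where lower is better (seconds) with units where higher is better (Mflops, nodes per second, GFLOP/s, score), so the raw numbers are hard to compare at a glance. The following is a minimal sketch, not part of the original export, that rescales each result against the GCC 8.2.0 run so that a value above 1.0 always means faster than GCC. The names `RESULTS`, `COMPILERS`, and `relative_to_baseline` are illustrative assumptions; the figures are copied from the overview.

```python
# Values copied from the result overview, in the order
# (GCC 8.2.0, LLVM Clang 7.0, PGI Compiler 18.10).
# The boolean records whether a higher number is better for that test.
COMPILERS = ("GCC 8.2.0", "LLVM Clang 7.0", "PGI Compiler 18.10")

RESULTS = {
    "c-ray: Total Time - 4K, 16 Rays Per Pixel": ((103.10, 52.79, 39.24), False),
    "scimark2: Monte Carlo": ((950, 717, 591), True),
    "scimark2: Sparse Matrix Multiply": ((3405, 3164, 2243), True),
    "aobench: 2048 x 2048 - Total Time": ((32.18, 33.81, 24.15), False),
    "tscp: AI Chess Performance": ((1440321, 1629136, 1199430), True),
    "scimark2: Dense LU Matrix Factorization": ((5796, 6069, 4622), True),
    "scimark2: Composite": ((2518, 2478, 1932), True),
    "hmmer: Pfam Database Search": ((9.99, 8.28, 8.51), False),
    "polybench-c: 3 Matrix Multiplications": ((2.86, 2.83, 3.35), False),
    "scimark2: Jacobi Successive Over-Relaxation": ((1665, 1662, 1438), True),
    "polybench-c: Correlation Computation": ((4.86, 4.88, 4.81), False),
    "scimark2: Fast Fourier Transform": ((773, 778, 768), True),
    "hpcg": ((1.34, 1.33, 1.34), True),
    "polybench-c: Covariance Computation": ((4.86, 4.87, 7.23), False),
    "blogbench: Read": ((709382, 962631, 752310), True),
}


def relative_to_baseline(values, higher_is_better):
    """Scale results so the first entry is 1.0 and larger always means faster."""
    baseline = values[0]
    if higher_is_better:
        return tuple(v / baseline for v in values)
    # For timings, invert the ratio so that a shorter time yields a larger value.
    return tuple(baseline / v for v in values)


for test, (values, higher_is_better) in RESULTS.items():
    ratios = relative_to_baseline(values, higher_is_better)
    summary = ", ".join(
        f"{name} {ratio:.2f}x" for name, ratio in zip(COMPILERS, ratios)
    )
    print(f"{test}: {summary}")
```

Choosing GCC as the baseline is arbitrary; any of the three configurations could take the first position in each tuple.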

C-Ray

Total Time - 4K, 16 Rays Per Pixel

C-Ray 1.1 (Seconds, Fewer Is Better)
GCC 8.2.0: 103.10 (SE +/- 0.66, N = 3)
LLVM Clang 7.0: 52.79 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 39.24 (SE +/- 0.02, N = 3)
(CC) gcc options: -lm -lpthread -O3

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 950 (SE +/- 0.09, N = 3)
LLVM Clang 7.0: 717 (SE +/- 0.15, N = 3)
PGI Compiler 18.10: 591 (SE +/- 0.53, N = 3)
(CC) gcc options: -O3 -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 3405 (SE +/- 2.59, N = 3)
LLVM Clang 7.0: 3164 (SE +/- 1.62, N = 3)
PGI Compiler 18.10: 2243 (SE +/- 0.36, N = 3)
(CC) gcc options: -O3 -lm

AOBench

Size: 2048 x 2048 - Total Time

AOBench (Seconds, Fewer Is Better)
GCC 8.2.0: 32.18 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 33.81 (SE +/- 0.03, N = 3)
PGI Compiler 18.10: 24.15 (SE +/- 0.01, N = 3)
(CC) gcc options: -lm -O3

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)
GCC 8.2.0: 1440321 (SE +/- 922.23, N = 5)
LLVM Clang 7.0: 1629136 (SE +/- 15151.69, N = 5)
PGI Compiler 18.10: 1199430 (SE +/- 521.80, N = 5)
(CC) gcc options: -O3

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 5796 (SE +/- 0.98, N = 3)
LLVM Clang 7.0: 6069 (SE +/- 4.29, N = 3)
PGI Compiler 18.10: 4622 (SE +/- 0.84, N = 3)
(CC) gcc options: -O3 -lm

SciMark

Computational Test: Composite

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 2518 (SE +/- 0.69, N = 3)
LLVM Clang 7.0: 2478 (SE +/- 1.38, N = 3)
PGI Compiler 18.10: 1932 (SE +/- 0.39, N = 3)
(CC) gcc options: -O3 -lm

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 9.99 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 8.28 (SE +/- 0.11, N = 3)
PGI Compiler 18.10: 8.51 (SE +/- 0.09, N = 3)
(CC) gcc options: -O3 -lhmmer -lsquid -lm
Additional per-configuration option shown for two of the results: -pthread

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 2.86 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 2.83 (SE +/- 0.02, N = 3)
PGI Compiler 18.10: 3.35 (SE +/- 0.01, N = 3)
(CC) gcc options: -O3

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 1665 (SE +/- 0.37, N = 3)
LLVM Clang 7.0: 1662 (SE +/- 0.69, N = 3)
PGI Compiler 18.10: 1438 (SE +/- 0.21, N = 3)
(CC) gcc options: -O3 -lm

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 4.86 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 4.88 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 4.81 (SE +/- 0.01, N = 3)
(CC) gcc options: -O3

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 773 (SE +/- 1.66, N = 3)
LLVM Clang 7.0: 778 (SE +/- 1.63, N = 3)
PGI Compiler 18.10: 768 (SE +/- 2.80, N = 3)
(CC) gcc options: -O3 -lm

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
GCC 8.2.0: 1.34 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 1.33 (SE +/- 0.01, N = 3)
PGI Compiler 18.10: 1.34 (SE +/- 0.01, N = 3)

PolyBench-C

Test: Covariance Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 4.86 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 4.87 (SE +/- 0.00, N = 3)
PGI Compiler 18.10: 7.23 (SE +/- 0.74, N = 12)
(CC) gcc options: -O3

BlogBench

Test: Read

BlogBench 1.1 (Final Score, More Is Better)
GCC 8.2.0: 709382 (SE +/- 12212.43, N = 3)
LLVM Clang 7.0: 962631 (SE +/- 75775.18, N = 9)
PGI Compiler 18.10: 752310 (SE +/- 18545.35, N = 9)
(CC) gcc options: -O3
Additional per-configuration option shown for two of the results: -pthread
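Each result above carries a standard error (SE) and a run count (N), and a few results combine a large SE with a raised run count. As a rough sketch, and not something the export itself computes, the SE can be expressed as a percentage of the reported mean to spot the noisier measurements. The 5% cutoff and the names `SAMPLES`, `THRESHOLD_PERCENT`, and `relative_standard_error` are assumptions chosen for illustration; the (mean, SE) pairs are copied from the results above.

```python
# (mean, standard error) pairs copied from the results in this document.
SAMPLES = {
    "C-Ray, GCC 8.2.0": (103.10, 0.66),
    "PolyBench-C Covariance, PGI Compiler 18.10": (7.23, 0.74),
    "BlogBench Read, GCC 8.2.0": (709382, 12212.43),
    "BlogBench Read, LLVM Clang 7.0": (962631, 75775.18),
    "BlogBench Read, PGI Compiler 18.10": (752310, 18545.35),
}

# Assumed cutoff for calling a result noisy; not taken from the source.
THRESHOLD_PERCENT = 5.0


def relative_standard_error(mean, standard_error):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * standard_error / mean


# Collect the results whose relative standard error exceeds the cutoff.
noisy = [
    name
    for name, (mean, se) in SAMPLES.items()
    if relative_standard_error(mean, se) > THRESHOLD_PERCENT
]

for name, (mean, se) in SAMPLES.items():
    print(f"{name}: {relative_standard_error(mean, se):.2f}% of mean")
print("Above threshold:", noisy)
```

Differences between compilers that are smaller than this spread, such as the BlogBench gap between GCC 8.2.0 and PGI Compiler 18.10, are worth treating with caution.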


Phoronix Test Suite v10.8.5