AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389.

System Details (tested configurations: GCC 8.2.0, LLVM Clang 7.0, AMD AOCC 1.3)
Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD Family 17h
Memory: 516096MB
Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
Graphics: Matrox Matrox G200eW3
Monitor: VE228
Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
OS: Ubuntu 18.10
Kernel: 4.19.0-041900-generic (x86_64)
Compiler: GCC 8.2.0 (GCC 8.2.0 run); Clang 7.0.0-3 + LLVM 7.0.0 (LLVM Clang 7.0 run); Clang 7.0.0 (AMD AOCC 1.3 run)
File-System: ext4
Screen Resolution: 1600x1200

Environment Details
- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1

Python Details
- Python 2.7.15+ + Python 3.6.7

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

Results Overview
Test | GCC 8.2.0 | LLVM Clang 7.0 | AMD AOCC 1.3
rodinia: OpenMP Streamcluster | 23.33 | 21.81 | 22.41
hpcg | 0.50 | 0.50 | 0.60
polybench-c: Correlation Computation | 5.36 | 5.53 | 5.33
polybench-c: 3 Matrix Multiplications | 3.70 | 4.13 | 3.65
fftw: Float + SSE - 1D FFT Size 2048 | 27361 | 26574 | 26619
fftw: Float + SSE - 2D FFT Size 2048 | 14829 | 14580 | 15056
hmmer: Pfam Database Search | 6.52 | 6.12 | 6.69
mafft: Multiple Sequence Alignment | 3.66 | 3.40 | 3.53
fhourstones: Complex Connect-4 Solving | 10431 | 10905 | 10799
scimark2: Composite | 1796 | 1925 | 1766
scimark2: Monte Carlo | 556 | 552 | 552
scimark2: Fast Fourier Transform | 231 | 218 | 226
scimark2: Sparse Matrix Multiply | 2554 | 2513 | 2827
scimark2: Dense LU Matrix Factorization | 3953 | 4911 | 3795
scimark2: Jacobi Successive Over-Relaxation | 1688 | 1429 | 1430
tscp: AI Chess Performance | 853137 | 911886 | 892960
x264: H.264 Video Encoding | 143 | 147 | 145
stockfish: Total Time | 102909858 | 107358539 | 107516593
swet: Average | 493058080 | 212870591 | 230056272
build-apache: Time To Compile | 25.56 | 25.07 | 36.18
build-llvm: Time To Compile | 150 | 145 | 198
build-php: Time To Compile | 69.33 | 109 | 179
primesieve: 1e12 Prime Number Generation | 6.25 | 5.78 | 5.81
aobench: 2048 x 2048 - Total Time | 44.51 | 48.46 | 45.19
bullet: Raytests | 3.14 | 3.15 | 3.14
bullet: 3000 Fall | 5.07 | 5.23 | 5.07
bullet: 1000 Stack | 5.81 | 6.15 | 5.87
bullet: 1000 Convex | 5.37 | 5.26 | 5.23
bullet: 136 Ragdolls | 3.11 | 3.29 | 3.16
bullet: Prim Trimesh | 1.14 | 1.13 | 1.10
bullet: Convex Trimesh | 1.35 | 1.35 | 1.33
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 13.48 | 14.29 | 13.47
encode-flac: WAV To FLAC | 13.32 | 12.23 | 12.84
mencoder: AVI To LAVC | 22.29 | 22.33 | 22.20
minion: Graceful | 56.41 | 55.67 | 55.19
minion: Solitaire | 86.81 | 83.51 | 86.13
minion: Quasigroup | 147 | 142 | 142
aircrack-ng | 81752 | 83710 | 83435
tjbench: Decompression Throughput | 147 | 141 | 145
redis: GET | 2235376 | 2150558 | 2150203
redis: SET | 1419694 | 1364289 | 1405199
mcperf: Get | 63050 | 58078 | 67604
mcperf: Set | 62454 | 57082 | 43395
hint: DOUBLE | 615241438 | 618373040 | 624671701

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4 (Seconds, Fewer Is Better)
GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
Per-result (CXX) g++ options shown on the chart: -O2 -lOpenCL; -O3 -fopenmp; -O3 -fopenmp

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)
LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)
AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O3 -march=native

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, More Is Better)
GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
1. (CC) gcc options: -O3

SciMark

Computational Test: Composite

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)
GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
1. (CC) gcc options: -O3 -march=native

x264

H.264 Video Encoding

x264 2018-09-25 (Frames Per Second, More Is Better)
GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
Per-result option shown on the chart: -mstack-alignment=64 (listed for two of the three results)
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

Stockfish

Total Time

Stockfish 9 (Nodes Per Second, More Is Better)
GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Swet

Average

Swet 1.5.16 (Operations Per Second, More Is Better)
GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
1. (CC) gcc options: -lm -lpthread -lcurses -lrt

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 (Seconds, Fewer Is Better)
GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)
AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1 (Seconds, Fewer Is Better)
GCC 8.2.0: 150
LLVM Clang 7.0: 145
AMD AOCC 1.3: 198

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9 (Seconds, Fewer Is Better)
GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Primesieve

1e12 Prime Number Generation

Primesieve 7.1 (Seconds, Fewer Is Better)
GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -lpthread

AOBench

Size: 2048 x 2048 - Total Time

AOBench (Seconds, Fewer Is Better)
GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -O3 -march=native

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
Per-result options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 (Seconds, Fewer Is Better)
GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 (Seconds, Fewer Is Better)
GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
Per-result option shown on the chart: -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -logg -lm

Mencoder

AVI To LAVC

Mencoder 1.3.0 (Seconds, Fewer Is Better)
GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
1. (CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

Minion

Benchmark: Graceful

Minion 1.8 (Seconds, Fewer Is Better)
GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Solitaire

Minion 1.8 (Seconds, Fewer Is Better)
GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Quasigroup

Minion 1.8 (Seconds, Fewer Is Better)
GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
1. (CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 (Megapixels/sec, More Is Better)
GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Redis

Test: GET

Redis 4.0.8 (Requests Per Second, More Is Better)
GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SET

Redis 4.0.8 (Requests Per Second, More Is Better)
GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0 (QUIPs, More Is Better)
GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
1. (CC) gcc options: -O3 -march=native -lm


Phoronix Test Suite v10.8.4