AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389&rdt&grr.

System Details (only the Compiler entry differs between the three configurations)
- Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
- Motherboard: Dell 02MJ3T (1.2.5 BIOS)
- Chipset: AMD Family 17h
- Memory: 516096MB
- Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
- Graphics: Matrox Matrox G200eW3
- Monitor: VE228
- Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
- OS: Ubuntu 18.10
- Kernel: 4.19.0-041900-generic (x86_64)
- File-System: ext4
- Screen Resolution: 1600x1200
- Compiler (AMD AOCC 1.3): Clang 7.0.0
- Compiler (LLVM Clang 7.0): Clang 7.0.0-3 + LLVM 7.0.0
- Compiler (GCC 8.2.0): GCC 8.2.0

Environment Details
- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Python Details
- Python 2.7.15+ + Python 3.6.7

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

AMD AOCC 1.3  LLVM Clang 7.0  GCC 8.2.0  Test
624671701     618373040       615241438  hint: DOUBLE
0.60          0.50            0.50       hpcg
10799         10905           10431      fhourstones: Complex Connect-4 Solving
142           142             147        minion: Quasigroup
15056         14580           14829      fftw: Float + SSE - 2D FFT Size 2048
179           109             69.33      build-php: Time To Compile
86.13         83.51           86.81      minion: Solitaire
22.41         21.81           23.33      rodinia: OpenMP Streamcluster
107516593     107358539       102909858  stockfish: Total Time
43395         57082           62454      mcperf: Set
67604         58078           63050      mcperf: Get
55.19         55.67           56.41      minion: Graceful
13.47         14.29           13.48      compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19
198           145             150        build-llvm: Time To Compile
45.19         48.46           44.51      aobench: 2048 x 2048 - Total Time
230056272     212870591       493058080  swet: Average
1766          1925            1796       scimark2: Composite
83435         83710           81752      aircrack-ng
36.18         25.07           25.56      build-apache: Time To Compile
22.20         22.33           22.29      mencoder: AVI To LAVC
12.84         12.23           13.32      encode-flac: WAV To FLAC
1405199       1364289         1419694    redis: SET
2150203       2150558         2235376    redis: GET
3.53          3.40            3.66       mafft: Multiple Sequence Alignment
26619         26574           27361      fftw: Float + SSE - 1D FFT Size 2048
3.65          4.13            3.70       polybench-c: 3 Matrix Multiplications
6.69          6.12            6.52       hmmer: Pfam Database Search
145           141             147        tjbench: Decompression Throughput
5.81          5.78            6.25       primesieve: 1e12 Prime Number Generation
3.14          3.15            3.14       bullet: Raytests
5.33          5.53            5.36       polybench-c: Correlation Computation
145           147             143        x264: H.264 Video Encoding
892960        911886          853137     tscp: AI Chess Performance
1.33          1.35            1.35       bullet: Convex Trimesh
1.10          1.13            1.14       bullet: Prim Trimesh
3.16          3.29            3.11       bullet: 136 Ragdolls
5.23          5.26            5.37       bullet: 1000 Convex
5.87          6.15            5.81       bullet: 1000 Stack
5.07          5.23            5.07       bullet: 3000 Fall
1430          1429            1688       scimark2: Jacobi Successive Over-Relaxation
3795          4911            3953       scimark2: Dense LU Matrix Factorization
2827          2513            2554       scimark2: Sparse Matrix Multiply
226           218             231        scimark2: Fast Fourier Transform
552           552             556        scimark2: Monte Carlo
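
The overview mixes "More Is Better" and "Fewer Is Better" units, so the raw figures cannot be averaged directly. A minimal sketch of one way to put them on a common scale follows: each result is expressed as a ratio against GCC 8.2.0 (inverted for time-based tests) and the ratios are combined with a geometric mean. The choice of GCC as baseline, the geometric mean, the four-test subset, and the helper names `relative_score`, `geometric_mean`, and `summarize` are assumptions made for illustration and are not part of the exported result file; the numbers themselves are copied from it.

```python
from math import prod

# (test, higher_is_better, AMD AOCC 1.3, LLVM Clang 7.0, GCC 8.2.0)
# Values copied from this result file; the subset is arbitrary.
RESULTS = [
    ("hpcg (GFLOP/s)", True, 0.60, 0.50, 0.50),
    ("build-php (seconds)", False, 179.00, 109.00, 69.33),
    ("scimark2: Composite (Mflops)", True, 1766, 1925, 1796),
    ("primesieve (seconds)", False, 5.81, 5.78, 6.25),
]


def relative_score(value, baseline, higher_is_better):
    """Return a ratio where > 1.0 means `value` is better than `baseline`."""
    return value / baseline if higher_is_better else baseline / value


def geometric_mean(ratios):
    """Geometric mean of a non-empty sequence of positive ratios."""
    ratios = list(ratios)
    return prod(ratios) ** (1.0 / len(ratios))


def summarize(results):
    """Geometric-mean score of AOCC and Clang relative to GCC (hypothetical helper)."""
    aocc = [relative_score(a, g, hib) for _, hib, a, _, g in results]
    clang = [relative_score(c, g, hib) for _, hib, _, c, g in results]
    return {
        "AMD AOCC 1.3": geometric_mean(aocc),
        "LLVM Clang 7.0": geometric_mean(clang),
    }


if __name__ == "__main__":
    for name, score in summarize(RESULTS).items():
        print(f"{name}: {score:.3f}x GCC 8.2.0")
```

A score above 1.0 means the compiler came out ahead of GCC on the chosen subset; a different subset or baseline would give a different summary.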

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0 (QUIPs, More Is Better)
- AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
- LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
- GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
1. (CC) gcc options: -O3 -march=native -lm

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
- AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)
- LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)
- GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)
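
Each result carries a standard error (SE) and a run count (N), which gives a rough way to judge whether two compilers really differ on a test. The sketch below divides the gap between two means by their combined standard error. It assumes the runs are independent and roughly normally distributed, and it uses 2.0 as a rule-of-thumb threshold; both are assumptions for illustration, as is the helper name `separation`. The inputs are the HPCG and Mencoder figures from this result file.

```python
from math import sqrt


def separation(mean_a, se_a, mean_b, se_b):
    """Gap between two means in units of their combined standard error."""
    return abs(mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


# HPCG (GFLOP/s): AMD AOCC 1.3 vs. LLVM Clang 7.0
hpcg = separation(0.60, 0.03, 0.50, 0.02)

# Mencoder (seconds): AMD AOCC 1.3 vs. LLVM Clang 7.0
mencoder = separation(22.20, 0.10, 22.33, 0.04)

THRESHOLD = 2.0  # rule of thumb, not a value taken from the result file

if __name__ == "__main__":
    for name, value in (("hpcg", hpcg), ("mencoder", mencoder)):
        verdict = "likely a real gap" if value > THRESHOLD else "within noise"
        print(f"{name}: {value:.2f} combined SEs apart ({verdict})")
```

With these inputs the HPCG gap exceeds the threshold while the Mencoder gap does not, which matches the impression given by the raw numbers.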

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, More Is Better)
- AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
- LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
- GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
1. (CC) gcc options: -O3

Minion

Benchmark: Quasigroup

Minion 1.8 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
- LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
- GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
- AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
- LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
- GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
- LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
- GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Minion

Benchmark: Solitaire

Minion 1.8 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
- LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
- GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
- LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
- GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
Additional options shown on the chart: -O3 -fopenmp; -O3 -fopenmp; -O2 -lOpenCL
1. (CXX) g++ options:

Stockfish

Total Time

Stockfish 9 (Nodes Per Second, More Is Better)
- AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
- LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
- GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
- AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
- LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
- GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic
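
The mcperf results have much larger standard errors than most other tests in this file, which is easier to see when the SE is expressed as a percentage of the mean. A small sketch of that calculation, using the Set figures above, follows. The helper name `relative_se` and the dictionary layout are illustrative assumptions, not something the result file defines.

```python
def relative_se(mean, se):
    """Standard error as a percentage of the mean."""
    return 100.0 * se / mean


# Memcached mcperf, Method: Set (Operations Per Second): (mean, SE)
MCPERF_SET = {
    "AMD AOCC 1.3": (43395, 532.03),
    "LLVM Clang 7.0": (57082, 4241.39),
    "GCC 8.2.0": (62454, 2709.38),
}

if __name__ == "__main__":
    for compiler, (mean, se) in MCPERF_SET.items():
        print(f"{compiler}: SE is {relative_se(mean, se):.1f}% of the mean")
```

A result whose relative SE is several percent deserves more caution than one where it is a fraction of a percent, whatever the ranking of the means suggests.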

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
- AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
- LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
- GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

Minion

Benchmark: Graceful

Minion 1.8 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
- LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
- GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
- LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
- GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 198
- LLVM Clang 7.0: 145
- GCC 8.2.0: 150

AOBench

Size: 2048 x 2048 - Total Time

AOBench (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
- LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
- GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -lm -O3 -march=native

Swet

Average

Swet 1.5.16 (Operations Per Second, More Is Better)
- AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
- LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
- GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
1. (CC) gcc options: -lm -lpthread -lcurses -lrt

SciMark

Computational Test: Composite

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
- LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
- GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
- AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
- LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
- GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
1. (CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)
- LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)
- GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)

Mencoder

AVI To LAVC

Mencoder 1.3.0 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
- LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
- GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
1. (CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
- LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
- GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
Additional options shown on the chart: -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -logg -lm

Redis

Test: SET

Redis 4.0.8 (Requests Per Second, More Is Better)
- AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
- LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
- GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: GET

Redis 4.0.8 (Requests Per Second, More Is Better)
- AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
- LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
- GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
- LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
- AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
- LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
- GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
- LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
- GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O3 -march=native

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
- LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
- GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 (Megapixels/sec, More Is Better)
- AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
- LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
- GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Primesieve

1e12 Prime Number Generation

Primesieve 7.1 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
- LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
- GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -lpthread

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
- LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
- LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
- GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native

x264

H.264 Video Encoding

x264 2018-09-25 (Frames Per Second, More Is Better)
- AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
- LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
- GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
Additional options shown on the chart: -mstack-alignment=64; -mstack-alignment=64
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)
- AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
- LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
- GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
1. (CC) gcc options: -O3 -march=native

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
- LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
- LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
- LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
- LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
- LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
- AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
- LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
- GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
Additional options shown on the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
- LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
- GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
- LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
- GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
- LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
- GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
- LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
- GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)
- AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
- LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
- GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -lm


Phoronix Test Suite v10.8.5