AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389&grt&rdt.

System Details

Results: AMD AOCC 1.3, LLVM Clang 7.0, GCC 8.2.0

Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD Family 17h
Memory: 516096MB
Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
Graphics: Matrox Matrox G200eW3
Monitor: VE228
Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
OS: Ubuntu 18.10
Kernel: 4.19.0-041900-generic (x86_64)
File-System: ext4
Screen Resolution: 1600x1200
Compiler (AMD AOCC 1.3): Clang 7.0.0
Compiler (LLVM Clang 7.0): Clang 7.0.0-3 + LLVM 7.0.0
Compiler (GCC 8.2.0): GCC 8.2.0

Environment Details
- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v

Python Details
- Python 2.7.15+ + Python 3.6.7

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

Results Overview

Test | AMD AOCC 1.3 | LLVM Clang 7.0 | GCC 8.2.0
aircrack-ng | 83435 | 83710 | 81752
aobench: 2048 x 2048 - Total Time | 45.19 | 48.46 | 44.51
bullet: Raytests | 3.14 | 3.15 | 3.14
bullet: 3000 Fall | 5.07 | 5.23 | 5.07
bullet: 1000 Stack | 5.87 | 6.15 | 5.81
bullet: 1000 Convex | 5.23 | 5.26 | 5.37
bullet: 136 Ragdolls | 3.16 | 3.29 | 3.11
bullet: Prim Trimesh | 1.10 | 1.13 | 1.14
bullet: Convex Trimesh | 1.33 | 1.35 | 1.35
fftw: Float + SSE - 1D FFT Size 2048 | 26619 | 26574 | 27361
fftw: Float + SSE - 2D FFT Size 2048 | 15056 | 14580 | 14829
fhourstones: Complex Connect-4 Solving | 10799 | 10905 | 10431
encode-flac: WAV To FLAC | 12.84 | 12.23 | 13.32
hint: DOUBLE | 624671701 | 618373040 | 615241438
hpcg | 0.60 | 0.50 | 0.50
tjbench: Decompression Throughput | 145 | 141 | 147
mcperf: Get | 67604 | 58078 | 63050
mcperf: Set | 43395 | 57082 | 62454
mencoder: AVI To LAVC | 22.20 | 22.33 | 22.29
minion: Graceful | 55.19 | 55.67 | 56.41
minion: Solitaire | 86.13 | 83.51 | 86.81
minion: Quasigroup | 142 | 142 | 147
polybench-c: Correlation Computation | 5.33 | 5.53 | 5.36
polybench-c: 3 Matrix Multiplications | 3.65 | 4.13 | 3.70
primesieve: 1e12 Prime Number Generation | 5.81 | 5.78 | 6.25
redis: GET | 2150203 | 2150558 | 2235376
redis: SET | 1405199 | 1364289 | 1419694
rodinia: OpenMP Streamcluster | 22.41 | 21.81 | 23.33
scimark2: Composite | 1766 | 1925 | 1796
scimark2: Monte Carlo | 552 | 552 | 556
scimark2: Fast Fourier Transform | 226 | 218 | 231
scimark2: Sparse Matrix Multiply | 2827 | 2513 | 2554
scimark2: Dense LU Matrix Factorization | 3795 | 4911 | 3953
scimark2: Jacobi Successive Over-Relaxation | 1430 | 1429 | 1688
stockfish: Total Time | 107516593 | 107358539 | 102909858
swet: Average | 230056272 | 212870591 | 493058080
build-apache: Time To Compile | 36.18 | 25.07 | 25.56
hmmer: Pfam Database Search | 6.69 | 6.12 | 6.52
build-llvm: Time To Compile | 198 | 145 | 150
mafft: Multiple Sequence Alignment | 3.53 | 3.40 | 3.66
build-php: Time To Compile | 179 | 109 | 69.33
tscp: AI Chess Performance | 892960 | 911886 | 853137
x264: H.264 Video Encoding | 145 | 147 | 143
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 13.47 | 14.29 | 13.48
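The overview mixes "More Is Better" and "Fewer Is Better" units, so raw numbers are hard to compare across tests. The following is a minimal sketch of one way to normalize a few of the results above against a baseline. The names `OVERVIEW_SAMPLE`, `relative_to_baseline`, and `format_report` are hypothetical, the choice of GCC 8.2.0 as the baseline is an assumption made for illustration, and none of this is part of the Phoronix Test Suite itself.

```python
# Values copied from the overview, in the order
# (AMD AOCC 1.3, LLVM Clang 7.0, GCC 8.2.0).
# The second field records whether a larger number is better for that test.
OVERVIEW_SAMPLE = {
    "aircrack-ng (k/s)": ((83435, 83710, 81752), True),
    "aobench (seconds)": ((45.19, 48.46, 44.51), False),
    "build-php (seconds)": ((179.00, 109.00, 69.33), False),
}

COMPILERS = ("AMD AOCC 1.3", "LLVM Clang 7.0", "GCC 8.2.0")


def relative_to_baseline(values, higher_is_better, baseline_index=2):
    """Return each result as a ratio to the baseline; above 1.0 means better.

    baseline_index=2 selects GCC 8.2.0, an arbitrary choice for this sketch.
    """
    base = values[baseline_index]
    if higher_is_better:
        return [value / base for value in values]
    # For time-based results a smaller number is better, so invert the ratio.
    return [base / value for value in values]


def format_report(sample=OVERVIEW_SAMPLE):
    """Build one text line per test with the normalized ratios."""
    lines = []
    for test, (values, higher_is_better) in sample.items():
        ratios = relative_to_baseline(values, higher_is_better)
        cells = ", ".join(
            f"{name}: {ratio:.3f}" for name, ratio in zip(COMPILERS, ratios)
        )
        lines.append(f"{test} -> {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report())
```

Inverting the ratio for time-based tests keeps "above 1.0 is better than the baseline" true for every row, which makes it possible to scan results with different units side by side.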

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
(CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread
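Each result in this export carries a standard error (SE) and a run count (N). As a rough reading aid, the sketch below expresses the gap between two results in units of their combined standard error, using the Aircrack-ng figures above. The helper name `separation_in_se` is hypothetical, and the combination rule assumes the runs are independent. It is a heuristic for spotting differences that sit within run-to-run noise, not a formal significance test and not something the result page computes.

```python
import math


def separation_in_se(mean_a, se_a, mean_b, se_b):
    """Return |mean_a - mean_b| divided by the combined standard error.

    Assumes independent measurements, so the standard errors are
    combined as sqrt(se_a**2 + se_b**2).
    """
    combined = math.hypot(se_a, se_b)
    if combined == 0:
        # Both standard errors were reported as zero.
        return 0.0 if mean_a == mean_b else math.inf
    return abs(mean_a - mean_b) / combined


# (mean in k/s, reported SE) copied from the Aircrack-ng result.
AOCC = (83435, 215.70)
CLANG = (83710, 47.61)
GCC = (81752, 39.19)

if __name__ == "__main__":
    print("AOCC vs Clang:", round(separation_in_se(*AOCC, *CLANG), 2))
    print("AOCC vs GCC:  ", round(separation_in_se(*AOCC, *GCC), 2))
```

A small ratio, such as the one for AOCC versus Clang here, suggests the two results are close to the measurement noise, while a large ratio, such as AOCC versus GCC, suggests a gap that the reported SE alone does not explain.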

AOBench

Size: 2048 x 2048 - Total Time

AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
(CC) gcc options: -lm -O3 -march=native

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81, Test: Raytests (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81, Test: 3000 Fall (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81, Test: 1000 Stack (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81, Test: 1000 Convex (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81, Test: 136 Ragdolls (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81, Test: Prim Trimesh (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81, Test: Convex Trimesh (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
Additional options listed for an individual result (assignment not preserved in this export): -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 2048 (Mflops, More Is Better)
AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 2048 (Mflops, More Is Better)
AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1, Complex Connect-4 Solving (Kpos / sec, More Is Better)
AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
(CC) gcc options: -O3

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2, WAV To FLAC (Seconds, Fewer Is Better)
AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
Additional options listed for an individual result (assignment not preserved in this export): -fvisibility=hidden
(CXX) g++ options: -O3 -march=native -logg -lm

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0, Test: DOUBLE (QUIPs, More Is Better)
AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
(CC) gcc options: -O3 -march=native -lm

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)
LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)
GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3, Test: Decompression Throughput (Megapixels/sec, More Is Better)
AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
(CC) gcc options: -O3 -march=native -lm

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10, Method: Get (Operations Per Second, More Is Better)
AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10, Method: Set (Operations Per Second, More Is Better)
AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Mencoder

AVI To LAVC

Mencoder 1.3.0, AVI To LAVC (Seconds, Fewer Is Better)
AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
(CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

Minion

Benchmark: Graceful

Minion 1.8, Benchmark: Graceful (Seconds, Fewer Is Better)
AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Solitaire

Minion 1.8, Benchmark: Solitaire (Seconds, Fewer Is Better)
AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Quasigroup

Minion 1.8, Benchmark: Quasigroup (Seconds, Fewer Is Better)
AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2, Test: Correlation Computation (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
(CC) gcc options: -O3 -march=native

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2, Test: 3 Matrix Multiplications (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
(CC) gcc options: -O3 -march=native

Primesieve

1e12 Prime Number Generation

Primesieve 7.1, 1e12 Prime Number Generation (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
(CXX) g++ options: -O3 -march=native -lpthread

Redis

Test: GET

Redis 4.0.8, Test: GET (Requests Per Second, More Is Better)
AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SET

Redis 4.0.8, Test: SET (Requests Per Second, More Is Better)
AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4, Test: OpenMP Streamcluster (Seconds, Fewer Is Better)
AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
Options listed for individual results (assignment not preserved in this export): -O3 -fopenmp; -O3 -fopenmp; -O2 -lOpenCL
(CXX) g++ options: none listed

SciMark

Computational Test: Composite

SciMark 2.0, Computational Test: Composite (Mflops, More Is Better)
AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better)
AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better)
AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
(CC) gcc options: -O3 -march=native -lm

Stockfish

Total Time

Stockfish 9, Total Time (Nodes Per Second, More Is Better)
AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
(CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Swet

Average

Swet 1.5.16, Average (Operations Per Second, More Is Better)
AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
(CC) gcc options: -lm -lpthread -lcurses -lrt

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)
LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)
GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2, Pfam Database Search (Seconds, Fewer Is Better)
AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
(CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 198
LLVM Clang 7.0: 145
GCC 8.2.0: 150

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392, Multiple Sequence Alignment (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
(CC) gcc options: -std=c99 -O3 -lm -lpthread

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
(CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

TSCP

AI Chess Performance

TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
(CC) gcc options: -O3 -march=native

x264

H.264 Video Encoding

x264 2018-09-25, H.264 Video Encoding (Frames Per Second, More Is Better)
AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
Additional option listed for two individual results (assignment not preserved in this export): -mstack-alignment=64
(CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4, Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
(CC) gcc options: -O3 -march=native -pthread -lz -llzma


Phoronix Test Suite v10.8.5