AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389&sro.

Tested configurations: GCC 8.2.0, LLVM Clang 7.0, AMD AOCC 1.3

Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD Family 17h
Memory: 516096MB
Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
Graphics: Matrox Matrox G200eW3
Monitor: VE228
Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
OS: Ubuntu 18.10
Kernel: 4.19.0-041900-generic (x86_64)
Compiler: GCC 8.2.0 (GCC 8.2.0 run); Clang 7.0.0-3 + LLVM 7.0.0 (LLVM Clang 7.0 run); Clang 7.0.0 (AMD AOCC 1.3 run)
File-System: ext4
Screen Resolution: 1600x1200

Environment Details
- CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1

Python Details
- Python 2.7.15+ + Python 3.6.7

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

Results overview

Test | GCC 8.2.0 | LLVM Clang 7.0 | AMD AOCC 1.3
rodinia: OpenMP Streamcluster | 23.33 | 21.81 | 22.41
hpcg | 0.50 | 0.50 | 0.60
polybench-c: Correlation Computation | 5.36 | 5.53 | 5.33
polybench-c: 3 Matrix Multiplications | 3.70 | 4.13 | 3.65
fftw: Float + SSE - 1D FFT Size 2048 | 27361 | 26574 | 26619
fftw: Float + SSE - 2D FFT Size 2048 | 14829 | 14580 | 15056
hmmer: Pfam Database Search | 6.52 | 6.12 | 6.69
mafft: Multiple Sequence Alignment | 3.66 | 3.40 | 3.53
fhourstones: Complex Connect-4 Solving | 10431 | 10905 | 10799
scimark2: Composite | 1796 | 1925 | 1766
scimark2: Monte Carlo | 556 | 552 | 552
scimark2: Fast Fourier Transform | 231 | 218 | 226
scimark2: Sparse Matrix Multiply | 2554 | 2513 | 2827
scimark2: Dense LU Matrix Factorization | 3953 | 4911 | 3795
scimark2: Jacobi Successive Over-Relaxation | 1688 | 1429 | 1430
tscp: AI Chess Performance | 853137 | 911886 | 892960
x264: H.264 Video Encoding | 143 | 147 | 145
stockfish: Total Time | 102909858 | 107358539 | 107516593
swet: Average | 493058080 | 212870591 | 230056272
build-apache: Time To Compile | 25.56 | 25.07 | 36.18
build-llvm: Time To Compile | 150 | 145 | 198
build-php: Time To Compile | 69.33 | 109 | 179
primesieve: 1e12 Prime Number Generation | 6.25 | 5.78 | 5.81
aobench: 2048 x 2048 - Total Time | 44.51 | 48.46 | 45.19
bullet: Raytests | 3.14 | 3.15 | 3.14
bullet: 3000 Fall | 5.07 | 5.23 | 5.07
bullet: 1000 Stack | 5.81 | 6.15 | 5.87
bullet: 1000 Convex | 5.37 | 5.26 | 5.23
bullet: 136 Ragdolls | 3.11 | 3.29 | 3.16
bullet: Prim Trimesh | 1.14 | 1.13 | 1.10
bullet: Convex Trimesh | 1.35 | 1.35 | 1.33
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 13.48 | 14.29 | 13.47
encode-flac: WAV To FLAC | 13.32 | 12.23 | 12.84
mencoder: AVI To LAVC | 22.29 | 22.33 | 22.20
minion: Graceful | 56.41 | 55.67 | 55.19
minion: Solitaire | 86.81 | 83.51 | 86.13
minion: Quasigroup | 147 | 142 | 142
aircrack-ng | 81752 | 83710 | 83435
tjbench: Decompression Throughput | 147 | 141 | 145
redis: GET | 2235376 | 2150558 | 2150203
redis: SET | 1419694 | 1364289 | 1405199
mcperf: Get | 63050 | 58078 | 67604
mcperf: Set | 62454 | 57082 | 43395
hint: DOUBLE | 615241438 | 618373040 | 624671701
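For readers who want a single figure out of the overview, one common approach is a geometric mean of per-test ratios against a baseline. The sketch below is illustrative only: it uses a hand-picked subset of the results above, treats GCC 8.2.0 as the baseline, and the names `RESULTS`, `speedup_vs_gcc`, and `geomean` are hypothetical helpers, not part of the Phoronix Test Suite. The "higher is better" flags are taken from the per-test sections of this export.

```python
from math import prod

# Subset of the overview: (GCC 8.2.0, LLVM Clang 7.0, AMD AOCC 1.3, higher_is_better).
# Values are copied from this result file; the choice of subset is arbitrary.
RESULTS = {
    "rodinia: OpenMP Streamcluster": (23.33, 21.81, 22.41, False),
    "hpcg": (0.50, 0.50, 0.60, True),
    "fftw: Float + SSE - 1D FFT Size 2048": (27361, 26574, 26619, True),
    "hmmer: Pfam Database Search": (6.52, 6.12, 6.69, False),
    "scimark2: Composite": (1796, 1925, 1766, True),
    "primesieve: 1e12 Prime Number Generation": (6.25, 5.78, 5.81, False),
}


def speedup_vs_gcc(gcc, other, higher_is_better):
    """Return a ratio where values above 1.0 mean 'other' beat GCC."""
    return other / gcc if higher_is_better else gcc / other


def geomean(values):
    """Geometric mean of an iterable of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


clang = geomean(speedup_vs_gcc(g, c, hib) for g, c, _, hib in RESULTS.values())
aocc = geomean(speedup_vs_gcc(g, a, hib) for g, _, a, hib in RESULTS.values())

print(f"LLVM Clang 7.0 vs. GCC 8.2.0 (subset geomean): {clang:.3f}x")
print(f"AMD AOCC 1.3 vs. GCC 8.2.0 (subset geomean): {aocc:.3f}x")
```

Because the subset is arbitrary, the printed ratios say nothing about the full result set; extending `RESULTS` to every row would be needed before drawing any conclusion.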

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4, Test: OpenMP Streamcluster (Seconds, Fewer Is Better)
AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
Per-result options noted on the graph, in source order: -O3 -fopenmp; -O2 -lOpenCL; -O3 -fopenmp
(CXX) g++ options: none listed
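The "SE +/-" and "N" figures can be used for a rough check of whether two results differ by more than run-to-run noise. The sketch below assumes that "SE" is the standard error of the mean and that runs are independent; both are assumptions, and `z_score` and `implied_std_dev` are hypothetical helper names. It compares the AMD AOCC 1.3 and LLVM Clang 7.0 Streamcluster results, read in the order the graph lists them.

```python
from math import sqrt


def z_score(mean_a, se_a, mean_b, se_b):
    """Difference of two means in units of their combined standard error."""
    return (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


def implied_std_dev(se, n):
    """Sample standard deviation implied by a standard error and run count."""
    return se * sqrt(n)


# (mean seconds, SE, N) for Rodinia OpenMP Streamcluster, from this result file.
aocc = (22.41, 0.36, 12)
clang = (21.81, 0.48, 12)

z = z_score(aocc[0], aocc[1], clang[0], clang[1])
print(f"AOCC vs. Clang difference: {z:.2f} combined standard errors")
print(f"Implied AOCC run-to-run std dev: {implied_std_dev(aocc[1], aocc[2]):.2f} s")
```

A difference of roughly one combined standard error, as here, is small relative to the run-to-run variation, so this pair is better read as a near tie than as a clear win.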

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)
GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)
LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2, Test: Correlation Computation (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
(CC) gcc options: -O3 -march=native

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2, Test: 3 Matrix Multiplications (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
(CC) gcc options: -O3 -march=native

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6, Build: Float + SSE - Size: 1D FFT Size 2048 (Mflops, More Is Better)
AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6, Build: Float + SSE - Size: 2D FFT Size 2048 (Mflops, More Is Better)
AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2, Pfam Database Search (Seconds, Fewer Is Better)
AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
(CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392, Multiple Sequence Alignment (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
(CC) gcc options: -std=c99 -O3 -lm -lpthread

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1, Complex Connect-4 Solving (Kpos / sec, More Is Better)
AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
(CC) gcc options: -O3

SciMark

Computational Test: Composite

SciMark 2.0, Computational Test: Composite (Mflops, More Is Better)
AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0, Computational Test: Monte Carlo (Mflops, More Is Better)
AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0, Computational Test: Fast Fourier Transform (Mflops, More Is Better)
AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0, Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0, Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0, Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
(CC) gcc options: -O3 -march=native -lm

TSCP

AI Chess Performance

TSCP 1.81, AI Chess Performance (Nodes Per Second, More Is Better)
AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
(CC) gcc options: -O3 -march=native

x264

H.264 Video Encoding

x264 2018-09-25, H.264 Video Encoding (Frames Per Second, More Is Better)
AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
Per-result option noted on two of the three results: -mstack-alignment=64
(CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

Stockfish

Total Time

Stockfish 9, Total Time (Nodes Per Second, More Is Better)
AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
(CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Swet

Average

Swet 1.5.16, Average (Operations Per Second, More Is Better)
AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
(CC) gcc options: -lm -lpthread -lcurses -lrt

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)
GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 198
GCC 8.2.0: 150
LLVM Clang 7.0: 145

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9, Time To Compile (Seconds, Fewer Is Better)
AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
(CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Primesieve

1e12 Prime Number Generation

Primesieve 7.1, 1e12 Prime Number Generation (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
(CXX) g++ options: -O3 -march=native -lpthread

AOBench

Size: 2048 x 2048 - Total Time

AOBench, Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
(CC) gcc options: -lm -O3 -march=native

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81, Test: Raytests (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81, Test: 3000 Fall (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81, Test: 1000 Stack (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81, Test: 1000 Convex (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81, Test: 136 Ragdolls (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81, Test: Prim Trimesh (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81, Test: Convex Trimesh (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
Per-result options noted on one result: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4, Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
(CC) gcc options: -O3 -march=native -pthread -lz -llzma

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2, WAV To FLAC (Seconds, Fewer Is Better)
AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
Per-result option noted on one result: -fvisibility=hidden
(CXX) g++ options: -O3 -march=native -logg -lm

Mencoder

AVI To LAVC

Mencoder 1.3.0, AVI To LAVC (Seconds, Fewer Is Better)
AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
(CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

Minion

Benchmark: Graceful

Minion 1.8, Benchmark: Graceful (Seconds, Fewer Is Better)
AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Solitaire

Minion 1.8, Benchmark: Solitaire (Seconds, Fewer Is Better)
AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Quasigroup

Minion 1.8, Benchmark: Quasigroup (Seconds, Fewer Is Better)
AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
(CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3, Test: Decompression Throughput (Megapixels/sec, More Is Better)
AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
(CC) gcc options: -O3 -march=native -lm

Redis

Test: GET

Redis 4.0.8, Test: GET (Requests Per Second, More Is Better)
AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SET

Redis 4.0.8, Test: SET (Requests Per Second, More Is Better)
AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10, Method: Get (Operations Per Second, More Is Better)
AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10, Method: Set (Operations Per Second, More Is Better)
AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0, Test: DOUBLE (QUIPs, More Is Better)
AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
(CC) gcc options: -O3 -march=native -lm


Phoronix Test Suite v10.8.4