AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future Phoronix.com article by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389&grw&sor.

System Details (GCC 8.2.0, LLVM Clang 7.0 and AMD AOCC 1.3 runs):
Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD Family 17h
Memory: 516096MB
Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
Graphics: Matrox G200eW3
Monitor: VE228
Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
OS: Ubuntu 18.10
Kernel: 4.19.0-041900-generic (x86_64)
Compiler: GCC 8.2.0 (GCC 8.2.0 run); Clang 7.0.0-3 + LLVM 7.0.0 (LLVM Clang 7.0 run); Clang 7.0.0 (AMD AOCC 1.3 run)
File-System: ext4
Screen Resolution: 1600x1200

Environment Details: CXXFLAGS="-O3 -march=native" CFLAGS="-O3 -march=native"

Compiler Details:
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1

Python Details: Python 2.7.15+ + Python 3.6.7

Security Details: __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

Result summary. Each value is in that test's own unit. The individual results below give the unit and say whether more or fewer is better.

Test                                         GCC 8.2.0   LLVM Clang 7.0  AMD AOCC 1.3
bullet: Raytests                             3.14        3.15            3.14
bullet: 3000 Fall                            5.07        5.23            5.07
bullet: 1000 Stack                           5.81        6.15            5.87
bullet: 1000 Convex                          5.37        5.26            5.23
bullet: 136 Ragdolls                         3.11        3.29            3.16
bullet: Prim Trimesh                         1.14        1.13            1.10
bullet: Convex Trimesh                       1.35        1.35            1.33
tscp: AI Chess Performance                   853137      911886          892960
scimark2: Composite                          1796        1925            1766
scimark2: Monte Carlo                        556         552             552
scimark2: Fast Fourier Transform             231         218             226
scimark2: Sparse Matrix Multiply             2554        2513            2827
scimark2: Dense LU Matrix Factorization      3953        4911            3795
scimark2: Jacobi Successive Over-Relaxation  1688        1429            1430
hint: DOUBLE                                 615241438   618373040       624671701
minion: Graceful                             56.41       55.67           55.19
minion: Solitaire                            86.81       83.51           86.13
minion: Quasigroup                           147         142             142
encode-flac: WAV To FLAC                     13.32       12.23           12.84
tjbench: Decompression Throughput            147         141             145
fftw: Float + SSE - 1D FFT Size 2048         27361       26574           26619
fftw: Float + SSE - 2D FFT Size 2048         14829       14580           15056
hmmer: Pfam Database Search                  6.52        6.12            6.69
mafft: Multiple Sequence Alignment           3.66        3.40            3.53
hpcg                                         0.50        0.50            0.60
rodinia: OpenMP Streamcluster                23.33       21.81           22.41
mencoder: AVI To LAVC                        22.29       22.33           22.20
aircrack-ng                                  81752       83710           83435
primesieve: 1e12 Prime Number Generation     6.25        5.78            5.81
build-apache: Time To Compile                25.56       25.07           36.18
stockfish: Total Time                        102909858   107358539       107516593
build-llvm: Time To Compile                  150         145             198
build-php: Time To Compile                   69.33       109             179
compress-zstd: Compression Level 19          13.48       14.29           13.47
swet: Average                                493058080   212870591       230056272
aobench: 2048 x 2048 - Total Time            44.51       48.46           45.19
x264: H.264 Video Encoding                   143         147             145
mcperf: Get                                  63050       58078           67604
mcperf: Set                                  62454       57082           43395
redis: GET                                   2235376     2150558         2150203
redis: SET                                   1419694     1364289         1405199
fhourstones: Complex Connect-4 Solving       10431       10905           10799
polybench-c: Correlation Computation         5.36        5.53            5.33
polybench-c: 3 Matrix Multiplications        3.70        4.13            3.65
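The tests report different units. Some are "more is better" and others are "fewer is better", so a cross-test comparison needs a normalization step first. Below is a minimal Python sketch that uses four results copied from this file. It treats GCC 8.2.0 as an arbitrary baseline and turns each value into a ratio where anything above 1.0 beats GCC. It then combines those ratios with a geometric mean. The function names, the data layout and the choice of tests are hypothetical and for illustration only. This is not how OpenBenchmarking.org computes its own summaries.

```python
from math import prod

# Hypothetical data layout: (test name, higher_is_better, {compiler: value}).
# Values are copied from the individual results in this file.
RESULTS = [
    ("build-php: Time To Compile", False,
     {"GCC 8.2.0": 69.33, "LLVM Clang 7.0": 109.0, "AMD AOCC 1.3": 179.0}),
    ("scimark2: Dense LU Matrix Factorization", True,
     {"GCC 8.2.0": 3953, "LLVM Clang 7.0": 4911, "AMD AOCC 1.3": 3795}),
    ("stockfish: Total Time", True,
     {"GCC 8.2.0": 102909858, "LLVM Clang 7.0": 107358539, "AMD AOCC 1.3": 107516593}),
    ("polybench-c: 3 Matrix Multiplications", False,
     {"GCC 8.2.0": 3.70, "LLVM Clang 7.0": 4.13, "AMD AOCC 1.3": 3.65}),
]

BASELINE = "GCC 8.2.0"  # arbitrary reference point, chosen only for this sketch


def relative_to_baseline(higher_is_better, values, baseline=BASELINE):
    """Return {compiler: ratio vs. baseline}; a ratio above 1.0 means better."""
    base = values[baseline]
    if higher_is_better:
        return {name: v / base for name, v in values.items()}
    # "Fewer is better" metrics (e.g. seconds): invert so that > 1.0 still means better.
    return {name: base / v for name, v in values.items()}


def geometric_mean(xs):
    """Geometric mean of positive ratios."""
    xs = list(xs)
    return prod(xs) ** (1.0 / len(xs))


if __name__ == "__main__":
    per_test = [relative_to_baseline(hib, vals) for _, hib, vals in RESULTS]
    for (name, _, _), rel in zip(RESULTS, per_test):
        print(name, {c: round(r, 3) for c, r in rel.items()})
    for compiler in RESULTS[0][2]:
        gm = geometric_mean(rel[compiler] for rel in per_test)
        print(f"{compiler}: geometric mean vs. {BASELINE} = {gm:.3f}")
```

A geometric mean over four hand-picked tests says little about the full set. The same two functions could be applied to every row of the results instead.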

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
Notes: -lglut -lGL -lGLU
(CXX) g++ options: -O3 -march=native -rdynamic

TSCP

AI Chess Performance

TSCP 1.81 (Nodes Per Second, More Is Better)
LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
(CC) gcc options: -O3 -march=native

SciMark

Computational Test: Composite

SciMark 2.0 (Mflops, More Is Better)
LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Monte Carlo

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 (Mflops, More Is Better)
AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 (Mflops, More Is Better)
LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
(CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 (Mflops, More Is Better)
GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
(CC) gcc options: -O3 -march=native -lm

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0 (QUIPs, More Is Better)
AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
(CC) gcc options: -O3 -march=native -lm

Minion

Benchmark: Graceful

Minion 1.8 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Solitaire

Minion 1.8 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Minion

Benchmark: Quasigroup

Minion 1.8 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
(CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
Notes: -fvisibility=hidden
(CXX) g++ options: -O3 -march=native -logg -lm

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 (Megapixels/sec, More Is Better)
GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
(CC) gcc options: -O3 -march=native -lm

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6 (Mflops, More Is Better)
AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
(CC) gcc options: -pthread -O3 -march=native -lm

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
(CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
(CC) gcc options: -std=c99 -O3 -lm -lpthread

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)
LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)
GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
Notes (CXX, g++ options): -O3 -fopenmp; -O3 -fopenmp; -O2 -lOpenCL

Mencoder

AVI To LAVC

Mencoder 1.3.0 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
(CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
(CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread

Primesieve

1e12 Prime Number Generation

Primesieve 7.1 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
(CXX) g++ options: -O3 -march=native -lpthread

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)
GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)
AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)

Stockfish

Total Time

Stockfish 9 (Nodes Per Second, More Is Better)
AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
(CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1 (Seconds, Fewer Is Better)
LLVM Clang 7.0: 145
GCC 8.2.0: 150
AMD AOCC 1.3: 198

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9 (Seconds, Fewer Is Better)
GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
(CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
(CC) gcc options: -O3 -march=native -pthread -lz -llzma

Swet

Average

Swet 1.5.16 (Operations Per Second, More Is Better)
GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
(CC) gcc options: -lm -lpthread -lcurses -lrt

AOBench

Size: 2048 x 2048 - Total Time

AOBench (Seconds, Fewer Is Better)
GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
(CC) gcc options: -lm -O3 -march=native

x264

H.264 Video Encoding

x264 2018-09-25 (Frames Per Second, More Is Better)
LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
Notes (listed twice): -mstack-alignment=64
(CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10 (Operations Per Second, More Is Better)
GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
(CC) gcc options: -O3 -march=native -lm -rdynamic

Redis

Test: GET

Redis 4.0.8 (Requests Per Second, More Is Better)
GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: SET

Redis 4.0.8 (Requests Per Second, More Is Better)
GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
(CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 (Kpos / sec, More Is Better)
LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
(CC) gcc options: -O3

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
(CC) gcc options: -O3 -march=native

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 (Seconds, Fewer Is Better)
AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
(CC) gcc options: -O3 -march=native


Phoronix Test Suite v10.8.5