AMD AOCC 1.3 Compiler Benchmarks vs. GCC vs. LLVM Clang

2 x AMD EPYC 7601 compiler benchmarks for a future article on Phoronix.com by Michael Larabel.

HTML result view exported from: https://openbenchmarking.org/result/1811203-SK-AMDAOCC1389&grs.

System Details (shared by the GCC 8.2.0, LLVM Clang 7.0, and AMD AOCC 1.3 runs except where noted)

Processor: 2 x AMD EPYC 7601 32-Core @ 3.10GHz (64 Cores / 128 Threads)
Motherboard: Dell 02MJ3T (1.2.5 BIOS)
Chipset: AMD Family 17h
Memory: 516096MB
Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860
Graphics: Matrox Matrox G200eW3
Monitor: VE228
Network: Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA
OS: Ubuntu 18.10
Kernel: 4.19.0-041900-generic (x86_64)
Compiler (GCC 8.2.0 run): GCC 8.2.0
Compiler (LLVM Clang 7.0 run): Clang 7.0.0-3 + LLVM 7.0.0
Compiler (AMD AOCC 1.3 run): Clang 7.0.0
File-System: ext4
Screen Resolution: 1600x1200

Environment Details
- CXXFLAGS=-O3 -march=native CFLAGS=-O3 -march=native

Compiler Details
- GCC 8.2.0: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v
- LLVM Clang 7.0: Optimized build; Default target: x86_64-pc-linux-gnu; Host CPU: znver1
- AMD AOCC 1.3: Optimized build with assertions; Default target: x86_64-unknown-linux-gnu; Host CPU: znver1

Python Details
- Python 2.7.15+ + Python 3.6.7

Security Details
- __user pointer sanitization + Full AMD retpoline IBPB + SSB disabled via prctl and seccomp

Result Summary

Test | GCC 8.2.0 | LLVM Clang 7.0 | AMD AOCC 1.3
build-php: Time To Compile | 69.33 | 109 | 179
swet: Average | 493058080 | 212870591 | 230056272
build-apache: Time To Compile | 25.56 | 25.07 | 36.18
build-llvm: Time To Compile | 150 | 145 | 198
scimark2: Dense LU Matrix Factorization | 3953 | 4911 | 3795
scimark2: Jacobi Successive Over-Relaxation | 1688 | 1429 | 1430
polybench-c: 3 Matrix Multiplications | 3.70 | 4.13 | 3.65
scimark2: Sparse Matrix Multiply | 2554 | 2513 | 2827
hmmer: Pfam Database Search | 6.52 | 6.12 | 6.69
scimark2: Composite | 1796 | 1925 | 1766
encode-flac: WAV To FLAC | 13.32 | 12.23 | 12.84
aobench: 2048 x 2048 - Total Time | 44.51 | 48.46 | 45.19
primesieve: 1e12 Prime Number Generation | 6.25 | 5.78 | 5.81
tscp: AI Chess Performance | 853137 | 911886 | 892960
scimark2: Fast Fourier Transform | 231 | 218 | 226
bullet: 1000 Stack | 5.81 | 6.15 | 5.87
bullet: 136 Ragdolls | 3.11 | 3.29 | 3.16
fhourstones: Complex Connect-4 Solving | 10431 | 10905 | 10799
stockfish: Total Time | 102909858 | 107358539 | 107516593
tjbench: Decompression Throughput | 147 | 141 | 145
redis: SET | 1419694 | 1364289 | 1405199
redis: GET | 2235376 | 2150558 | 2150203
minion: Solitaire | 86.81 | 83.51 | 86.13
polybench-c: Correlation Computation | 5.36 | 5.53 | 5.33
bullet: Prim Trimesh | 1.14 | 1.13 | 1.10
minion: Quasigroup | 147 | 142 | 142
fftw: Float + SSE - 2D FFT Size 2048 | 14829 | 14580 | 15056
bullet: 3000 Fall | 5.07 | 5.23 | 5.07
fftw: Float + SSE - 1D FFT Size 2048 | 27361 | 26574 | 26619
x264: H.264 Video Encoding | 143 | 147 | 145
bullet: 1000 Convex | 5.37 | 5.26 | 5.23
aircrack-ng | 81752 | 83710 | 83435
minion: Graceful | 56.41 | 55.67 | 55.19
hint: DOUBLE | 615241438 | 618373040 | 624671701
bullet: Convex Trimesh | 1.35 | 1.35 | 1.33
scimark2: Monte Carlo | 556 | 552 | 552
mencoder: AVI To LAVC | 22.29 | 22.33 | 22.20
bullet: Raytests | 3.14 | 3.15 | 3.14
mcperf: Set | 62454 | 57082 | 43395
mcperf: Get | 63050 | 58078 | 67604
compress-zstd: Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 | 13.48 | 14.29 | 13.47
mafft: Multiple Sequence Alignment | 3.66 | 3.40 | 3.53
hpcg | 0.50 | 0.50 | 0.60
rodinia: OpenMP Streamcluster | 23.33 | 21.81 | 22.41
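Because the tests mix "Fewer Is Better" and "More Is Better" units, the raw summary figures are hard to compare at a glance. The following is a minimal sketch, not part of the Phoronix Test Suite, of one way to normalize a few of the results against GCC 8.2.0 as the baseline. The helper name `relative_to_baseline`, the `RESULTS` layout, and the choice of tests are illustrative assumptions; only the numbers are taken from this export.

```python
# Results copied from this export, ordered (GCC 8.2.0, LLVM Clang 7.0, AMD AOCC 1.3).
# The second tuple element records whether a higher value is better for that test.
# The selection of tests is arbitrary and only illustrates the arithmetic.
RESULTS = {
    "build-php: Time To Compile (s)": ((69.33, 109.0, 179.0), False),
    "scimark2: Dense LU Matrix Factorization (Mflops)": ((3953, 4911, 3795), True),
    "mcperf: Set (ops/s)": ((62454, 57082, 43395), True),
    "hpcg (GFLOP/s)": ((0.50, 0.50, 0.60), True),
}


def relative_to_baseline(values, higher_is_better):
    """Express each result as a ratio against the first entry (hypothetical helper).

    A ratio above 1.0 means faster than the baseline, whatever the unit direction.
    """
    base = values[0]
    if higher_is_better:
        return tuple(v / base for v in values)
    # For timings, invert the ratio so that a shorter time yields a value above 1.0.
    return tuple(base / v for v in values)


if __name__ == "__main__":
    for name, (values, higher_is_better) in RESULTS.items():
        _, clang, aocc = relative_to_baseline(values, higher_is_better)
        print(f"{name}: Clang {clang:.2f}x, AOCC {aocc:.2f}x of GCC")
```

Ratios like these ignore the reported standard errors, so small differences between compilers should not be over-interpreted.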

Timed PHP Compilation

Time To Compile

Timed PHP Compilation 7.1.9 - Time To Compile (Seconds, Fewer Is Better)
GCC 8.2.0: 69.33 (SE +/- 0.45, N = 3)
LLVM Clang 7.0: 109.00 (SE +/- 0.33, N = 3)
AMD AOCC 1.3: 179.00 (SE +/- 0.18, N = 3)
1. (CC) gcc options: -O3 -march=native -pedantic -ldl -lz -lm

Swet

Average

Swet 1.5.16 - Average (Operations Per Second, More Is Better)
GCC 8.2.0: 493058080 (SE +/- 1295596.84, N = 3)
LLVM Clang 7.0: 212870591 (SE +/- 476119.15, N = 3)
AMD AOCC 1.3: 230056272 (SE +/- 235162.13, N = 3)
1. (CC) gcc options: -lm -lpthread -lcurses -lrt

Timed Apache Compilation

Time To Compile

Timed Apache Compilation 2.4.7 - Time To Compile (Seconds, Fewer Is Better)
GCC 8.2.0: 25.56 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 25.07 (SE +/- 0.06, N = 3)
AMD AOCC 1.3: 36.18 (SE +/- 0.07, N = 3)

Timed LLVM Compilation

Time To Compile

Timed LLVM Compilation 6.0.1 - Time To Compile (Seconds, Fewer Is Better)
GCC 8.2.0: 150
LLVM Clang 7.0: 145
AMD AOCC 1.3: 198

SciMark

Computational Test: Dense LU Matrix Factorization

SciMark 2.0 - Computational Test: Dense LU Matrix Factorization (Mflops, More Is Better)
GCC 8.2.0: 3953 (SE +/- 14.86, N = 3)
LLVM Clang 7.0: 4911 (SE +/- 12.20, N = 3)
AMD AOCC 1.3: 3795 (SE +/- 38.71, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

SciMark

Computational Test: Jacobi Successive Over-Relaxation

SciMark 2.0 - Computational Test: Jacobi Successive Over-Relaxation (Mflops, More Is Better)
GCC 8.2.0: 1688 (SE +/- 0.09, N = 3)
LLVM Clang 7.0: 1429 (SE +/- 0.86, N = 3)
AMD AOCC 1.3: 1430 (SE +/- 0.19, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

PolyBench-C

Test: 3 Matrix Multiplications

PolyBench-C 4.2 - Test: 3 Matrix Multiplications (Seconds, Fewer Is Better)
GCC 8.2.0: 3.70 (SE +/- 0.06, N = 3)
LLVM Clang 7.0: 4.13 (SE +/- 0.05, N = 9)
AMD AOCC 1.3: 3.65 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native

SciMark

Computational Test: Sparse Matrix Multiply

SciMark 2.0 - Computational Test: Sparse Matrix Multiply (Mflops, More Is Better)
GCC 8.2.0: 2554 (SE +/- 20.51, N = 3)
LLVM Clang 7.0: 2513 (SE +/- 9.44, N = 3)
AMD AOCC 1.3: 2827 (SE +/- 4.73, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Timed HMMer Search

Pfam Database Search

Timed HMMer Search 2.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
GCC 8.2.0: 6.52 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 6.12 (SE +/- 0.05, N = 3)
AMD AOCC 1.3: 6.69 (SE +/- 0.02, N = 3)
1. (CC) gcc options: -O3 -march=native -pthread -lhmmer -lsquid -lm

SciMark

Computational Test: Composite

SciMark 2.0 - Computational Test: Composite (Mflops, More Is Better)
GCC 8.2.0: 1796 (SE +/- 2.36, N = 3)
LLVM Clang 7.0: 1925 (SE +/- 1.80, N = 3)
AMD AOCC 1.3: 1766 (SE +/- 7.20, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

FLAC Audio Encoding

WAV To FLAC

FLAC Audio Encoding 1.3.2 - WAV To FLAC (Seconds, Fewer Is Better)
GCC 8.2.0: 13.32 (SE +/- 0.05, N = 5)
LLVM Clang 7.0: 12.23 (SE +/- 0.07, N = 5)
AMD AOCC 1.3: 12.84 (SE +/- 0.10, N = 5)
Per-result option noted in the chart: -fvisibility=hidden
1. (CXX) g++ options: -O3 -march=native -logg -lm

AOBench

Size: 2048 x 2048 - Total Time

AOBench - Size: 2048 x 2048 - Total Time (Seconds, Fewer Is Better)
GCC 8.2.0: 44.51 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 48.46 (SE +/- 0.06, N = 3)
AMD AOCC 1.3: 45.19 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -lm -O3 -march=native

Primesieve

1e12 Prime Number Generation

Primesieve 7.1 - 1e12 Prime Number Generation (Seconds, Fewer Is Better)
GCC 8.2.0: 6.25 (SE +/- 0.07, N = 3)
LLVM Clang 7.0: 5.78 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.81 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -march=native -lpthread

TSCP

AI Chess Performance

TSCP 1.81 - AI Chess Performance (Nodes Per Second, More Is Better)
GCC 8.2.0: 853137 (SE +/- 1603.67, N = 5)
LLVM Clang 7.0: 911886 (SE +/- 675.29, N = 5)
AMD AOCC 1.3: 892960 (SE +/- 290.00, N = 5)
1. (CC) gcc options: -O3 -march=native

SciMark

Computational Test: Fast Fourier Transform

SciMark 2.0 - Computational Test: Fast Fourier Transform (Mflops, More Is Better)
GCC 8.2.0: 231 (SE +/- 0.08, N = 3)
LLVM Clang 7.0: 218 (SE +/- 7.33, N = 3)
AMD AOCC 1.3: 226 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Bullet Physics Engine

Test: 1000 Stack

Bullet Physics Engine 2.81 - Test: 1000 Stack (Seconds, Fewer Is Better)
GCC 8.2.0: 5.81 (SE +/- 0.01, N = 3)
LLVM Clang 7.0: 6.15 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.87 (SE +/- 0.01, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Bullet Physics Engine

Test: 136 Ragdolls

Bullet Physics Engine 2.81 - Test: 136 Ragdolls (Seconds, Fewer Is Better)
GCC 8.2.0: 3.11 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.29 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.16 (SE +/- 0.00, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Fhourstones

Complex Connect-4 Solving

Fhourstones 3.1 - Complex Connect-4 Solving (Kpos / sec, More Is Better)
GCC 8.2.0: 10431 (SE +/- 22.28, N = 3)
LLVM Clang 7.0: 10905 (SE +/- 9.52, N = 3)
AMD AOCC 1.3: 10799 (SE +/- 3.26, N = 3)
1. (CC) gcc options: -O3

Stockfish

Total Time

Stockfish 9 - Total Time (Nodes Per Second, More Is Better)
GCC 8.2.0: 102909858 (SE +/- 862000.88, N = 3)
LLVM Clang 7.0: 107358539 (SE +/- 655582.47, N = 3)
AMD AOCC 1.3: 107516593 (SE +/- 457949.04, N = 3)
1. (CXX) g++ options: -m64 -lpthread -O3 -march=native -fno-exceptions -std=c++11 -pedantic -msse -msse3 -mpopcnt -flto

libjpeg-turbo tjbench

Test: Decompression Throughput

libjpeg-turbo tjbench 1.5.3 - Test: Decompression Throughput (Megapixels/sec, More Is Better)
GCC 8.2.0: 147 (SE +/- 0.03, N = 3)
LLVM Clang 7.0: 141 (SE +/- 0.13, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.04, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Redis

Test: SET

Redis 4.0.8 - Test: SET (Requests Per Second, More Is Better)
GCC 8.2.0: 1419694 (SE +/- 20200.62, N = 3)
LLVM Clang 7.0: 1364289 (SE +/- 4668.91, N = 3)
AMD AOCC 1.3: 1405199 (SE +/- 5730.03, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Redis

Test: GET

Redis 4.0.8 - Test: GET (Requests Per Second, More Is Better)
GCC 8.2.0: 2235376 (SE +/- 41799.23, N = 3)
LLVM Clang 7.0: 2150558 (SE +/- 4615.00, N = 3)
AMD AOCC 1.3: 2150203 (SE +/- 36000.16, N = 3)
1. (CC) gcc options: -ggdb -rdynamic -lm -ldl -pthread

Minion

Benchmark: Solitaire

Minion 1.8 - Benchmark: Solitaire (Seconds, Fewer Is Better)
GCC 8.2.0: 86.81 (SE +/- 0.47, N = 3)
LLVM Clang 7.0: 83.51 (SE +/- 0.26, N = 3)
AMD AOCC 1.3: 86.13 (SE +/- 0.18, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

PolyBench-C

Test: Correlation Computation

PolyBench-C 4.2 - Test: Correlation Computation (Seconds, Fewer Is Better)
GCC 8.2.0: 5.36 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 5.53 (SE +/- 0.01, N = 3)
AMD AOCC 1.3: 5.33 (SE +/- 0.01, N = 3)
1. (CC) gcc options: -O3 -march=native

Bullet Physics Engine

Test: Prim Trimesh

Bullet Physics Engine 2.81 - Test: Prim Trimesh (Seconds, Fewer Is Better)
GCC 8.2.0: 1.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.13 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 1.10 (SE +/- 0.00, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Minion

Benchmark: Quasigroup

Minion 1.8 - Benchmark: Quasigroup (Seconds, Fewer Is Better)
GCC 8.2.0: 147 (SE +/- 0.68, N = 3)
LLVM Clang 7.0: 142 (SE +/- 0.87, N = 3)
AMD AOCC 1.3: 142 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

FFTW

Build: Float + SSE - Size: 2D FFT Size 2048

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 2048 (Mflops, More Is Better)
GCC 8.2.0: 14829 (SE +/- 174.65, N = 3)
LLVM Clang 7.0: 14580 (SE +/- 81.26, N = 3)
AMD AOCC 1.3: 15056 (SE +/- 182.77, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

Bullet Physics Engine

Test: 3000 Fall

Bullet Physics Engine 2.81 - Test: 3000 Fall (Seconds, Fewer Is Better)
GCC 8.2.0: 5.07 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.23 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.07 (SE +/- 0.00, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

FFTW

Build: Float + SSE - Size: 1D FFT Size 2048

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 2048 (Mflops, More Is Better)
GCC 8.2.0: 27361 (SE +/- 27.09, N = 3)
LLVM Clang 7.0: 26574 (SE +/- 10.81, N = 3)
AMD AOCC 1.3: 26619 (SE +/- 87.85, N = 3)
1. (CC) gcc options: -pthread -O3 -march=native -lm

x264

H.264 Video Encoding

x264 2018-09-25 - H.264 Video Encoding (Frames Per Second, More Is Better)
GCC 8.2.0: 143 (SE +/- 0.32, N = 3)
LLVM Clang 7.0: 147 (SE +/- 1.13, N = 3)
AMD AOCC 1.3: 145 (SE +/- 0.38, N = 3)
Per-result option noted in the chart for two of the results: -mstack-alignment=64
1. (CC) gcc options: -ldl -lavformat -lavcodec -lavutil -lswscale -m64 -lm -lpthread -O3 -ffast-math -march=native -std=gnu99 -fPIC -fomit-frame-pointer -fno-tree-vectorize

Bullet Physics Engine

Test: 1000 Convex

Bullet Physics Engine 2.81 - Test: 1000 Convex (Seconds, Fewer Is Better)
GCC 8.2.0: 5.37 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 5.26 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 5.23 (SE +/- 0.01, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Aircrack-ng

Aircrack-ng 1.3 (k/s, More Is Better)
GCC 8.2.0: 81752 (SE +/- 39.19, N = 3)
LLVM Clang 7.0: 83710 (SE +/- 47.61, N = 3)
AMD AOCC 1.3: 83435 (SE +/- 215.70, N = 3)
1. (CXX) g++ options: -O3 -fvisibility=hidden -masm=intel -march=native -lpthread -lcrypto -lz -ldl -lm -pthread

Minion

Benchmark: Graceful

Minion 1.8 - Benchmark: Graceful (Seconds, Fewer Is Better)
GCC 8.2.0: 56.41 (SE +/- 0.11, N = 3)
LLVM Clang 7.0: 55.67 (SE +/- 0.14, N = 3)
AMD AOCC 1.3: 55.19 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -std=gnu++11 -O3 -fomit-frame-pointer -rdynamic

Hierarchical INTegration

Test: DOUBLE

Hierarchical INTegration 1.0 - Test: DOUBLE (QUIPs, More Is Better)
GCC 8.2.0: 615241438 (SE +/- 9792012.87, N = 4)
LLVM Clang 7.0: 618373040 (SE +/- 4810737.51, N = 3)
AMD AOCC 1.3: 624671701 (SE +/- 3584521.65, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Bullet Physics Engine

Test: Convex Trimesh

Bullet Physics Engine 2.81 - Test: Convex Trimesh (Seconds, Fewer Is Better)
GCC 8.2.0: 1.35 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 1.35 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 1.33 (SE +/- 0.00, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

SciMark

Computational Test: Monte Carlo

SciMark 2.0 - Computational Test: Monte Carlo (Mflops, More Is Better)
GCC 8.2.0: 556 (SE +/- 0.02, N = 3)
LLVM Clang 7.0: 552 (SE +/- 0.07, N = 3)
AMD AOCC 1.3: 552 (SE +/- 0.47, N = 3)
1. (CC) gcc options: -O3 -march=native -lm

Mencoder

AVI To LAVC

Mencoder 1.3.0 - AVI To LAVC (Seconds, Fewer Is Better)
GCC 8.2.0: 22.29 (SE +/- 0.05, N = 3)
LLVM Clang 7.0: 22.33 (SE +/- 0.04, N = 3)
AMD AOCC 1.3: 22.20 (SE +/- 0.10, N = 3)
1. (CC) gcc options: -fpie -pie -lncurses -lrt -lpng -lz -ljpeg -lasound -ldl -lpthread -lfreetype -lfontconfig -lbz2 -lmad -lvorbisenc -lvorbis -logg -rdynamic -lm

Bullet Physics Engine

Test: Raytests

Bullet Physics Engine 2.81 - Test: Raytests (Seconds, Fewer Is Better)
GCC 8.2.0: 3.14 (SE +/- 0.00, N = 3)
LLVM Clang 7.0: 3.15 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.14 (SE +/- 0.00, N = 3)
Per-result options noted in the chart: -lglut -lGL -lGLU
1. (CXX) g++ options: -O3 -march=native -rdynamic

Memcached mcperf

Method: Set

Memcached mcperf 1.5.10 - Method: Set (Operations Per Second, More Is Better)
GCC 8.2.0: 62454 (SE +/- 2709.38, N = 12)
LLVM Clang 7.0: 57082 (SE +/- 4241.39, N = 12)
AMD AOCC 1.3: 43395 (SE +/- 532.03, N = 3)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

Memcached mcperf

Method: Get

Memcached mcperf 1.5.10 - Method: Get (Operations Per Second, More Is Better)
GCC 8.2.0: 63050 (SE +/- 1117.87, N = 12)
LLVM Clang 7.0: 58078 (SE +/- 2318.15, N = 12)
AMD AOCC 1.3: 67604 (SE +/- 1087.27, N = 3)
1. (CC) gcc options: -O3 -march=native -lm -rdynamic

Zstd Compression

Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19

Zstd Compression 1.3.4 - Compressing ubuntu-16.04.3-server-i386.img, Compression Level 19 (Seconds, Fewer Is Better)
GCC 8.2.0: 13.48 (SE +/- 0.46, N = 12)
LLVM Clang 7.0: 14.29 (SE +/- 0.36, N = 12)
AMD AOCC 1.3: 13.47 (SE +/- 0.27, N = 12)
1. (CC) gcc options: -O3 -march=native -pthread -lz -llzma

Timed MAFFT Alignment

Multiple Sequence Alignment

Timed MAFFT Alignment 7.392 - Multiple Sequence Alignment (Seconds, Fewer Is Better)
GCC 8.2.0: 3.66 (SE +/- 0.07, N = 12)
LLVM Clang 7.0: 3.40 (SE +/- 0.00, N = 3)
AMD AOCC 1.3: 3.53 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.0 (GFLOP/s, More Is Better)
GCC 8.2.0: 0.50 (SE +/- 0.02, N = 9)
LLVM Clang 7.0: 0.50 (SE +/- 0.02, N = 9)
AMD AOCC 1.3: 0.60 (SE +/- 0.03, N = 6)

Rodinia

Test: OpenMP Streamcluster

Rodinia 2.4 - Test: OpenMP Streamcluster (Seconds, Fewer Is Better)
GCC 8.2.0: 23.33 (SE +/- 0.53, N = 9)
LLVM Clang 7.0: 21.81 (SE +/- 0.48, N = 12)
AMD AOCC 1.3: 22.41 (SE +/- 0.36, N = 12)
Per-result options as listed in the chart: -O2 -lOpenCL; -O3 -fopenmp; -O3 -fopenmp
1. (CXX) g++ options:


Phoronix Test Suite v10.8.4