AMD EPYC Compiler Tuning

GCC 9 compiler tuning benchmarks by Michael Larabel for a future article on Phoronix.com.

-O0

Environment Notes: CXXFLAGS="-O0" CFLAGS="-O0"

-Og

Environment Notes: CXXFLAGS="-Og" CFLAGS="-Og"

-O1

Environment Notes: CXXFLAGS="-O1" CFLAGS="-O1"

-O2

Environment Notes: CXXFLAGS="-O2" CFLAGS="-O2"

-O2 -ftree-vectorize -ftree-slp-vectorize

Environment Notes: CXXFLAGS="-O2 -ftree-vectorize -ftree-slp-vectorize" CFLAGS="-O2 -ftree-vectorize -ftree-slp-vectorize"

-O2 -march=znver1

Environment Notes: CXXFLAGS="-O2 -march=znver1" CFLAGS="-O2 -march=znver1"

-O2 -flto

Environment Notes: CXXFLAGS="-O2 -flto" CFLAGS="-O2 -flto"

-O3

Environment Notes: CXXFLAGS="-O3" CFLAGS="-O3"

-O3 -march=znver1

Environment Notes: CXXFLAGS="-O3 -march=znver1" CFLAGS="-O3 -march=znver1"

-O3 -march=znver1 -flto

Environment Notes: CXXFLAGS="-O3 -march=znver1 -flto" CFLAGS="-O3 -march=znver1 -flto"

-Ofast -march=znver1

Environment Notes: CXXFLAGS="-Ofast -march=znver1" CFLAGS="-Ofast -march=znver1"

All configurations

Processor: 2 x AMD EPYC 7601 32-Core (64 Cores / 128 Threads), Motherboard: Dell 02MJ3T (1.2.5 BIOS), Chipset: AMD Family 17h, Memory: 16 x 32 GB DDR4-2400MT/s 36ASF4G72PZ-2G6D2, Disk: 120GB SSDSCKJB120G7R + 20 x 500GB Samsung SSD 860, Graphics: Matrox G200eW3, Monitor: VE228, Network: 2 x Broadcom BCM57416 NetXtreme-E 10GBase-T RDMA + 2 x Broadcom NetXtreme BCM5720 PCIe

OS: Ubuntu 18.04, Kernel: 5.0.0-050000rc6-generic (x86_64) 20190210, Desktop: GNOME Shell 3.28.3, Display Server: X Server, Compiler: GCC 9.0.1 20190210, File-System: ext4, Screen Resolution: 1600x1200

Compiler Notes: --disable-multilib --enable-checking=release
Security Notes: __user pointer sanitization + Full AMD retpoline, IBPB: conditional, STIBP: disabled, RSB filling + SSB disabled via prctl and seccomp
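Because every configuration passes its flags through CFLAGS and CXXFLAGS, reproducing one by hand mostly comes down to getting those strings right, with the individual flags separated by spaces. The Python sketch below is an assumption about how one could script that; it is not how the Phoronix Test Suite does it. `CONFIGS` and `build_env` are hypothetical names, and the returned dictionary is meant to be handed to a build command through `subprocess.run(..., env=...)`.

```python
import os

# Configuration labels from this result file; each label doubles as its flag string.
CONFIGS = [
    "-O0",
    "-Og",
    "-O1",
    "-O2",
    "-O2 -ftree-vectorize -ftree-slp-vectorize",
    "-O2 -march=znver1",
    "-O2 -flto",
    "-O3",
    "-O3 -march=znver1",
    "-O3 -march=znver1 -flto",
    "-Ofast -march=znver1",
]


def build_env(flags: str, base=None) -> dict:
    """Return a copy of an environment with CFLAGS and CXXFLAGS both set to `flags`.

    The flags stay one space-separated string, which is the form make and
    autoconf-style builds expect when they read these variables.
    """
    env = dict(os.environ if base is None else base)
    normalized = " ".join(flags.split())  # collapse any stray whitespace
    env["CFLAGS"] = normalized
    env["CXXFLAGS"] = normalized
    return env


if __name__ == "__main__":
    for label in CONFIGS:
        env = build_env(label, base={})
        print(f'{label:45} CFLAGS="{env["CFLAGS"]}"')
```

With GCC, `-flto` also has to be present when linking; builds that do not pass CFLAGS to the link step may need it in LDFLAGS as well, which this sketch leaves out.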

FLAC Audio Encoding

This test times how long it takes to encode a sample WAV file to FLAC format five times. Learn more via the OpenBenchmarking.org test page.
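The FLAC result is the wall-clock time for five back-to-back encodes. A minimal Python sketch of that style of measurement follows; `time_runs` and `summarize` are hypothetical helpers, and the workload is a stand-in rather than the test profile's actual encoder command.

```python
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Call `workload` `runs` times and return each call's wall-clock duration in seconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> str:
    """Format the total, mean and sample standard deviation of the timings."""
    total = sum(samples)
    mean = statistics.mean(samples)
    # statistics.stdev needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return f"total {total:.3f} s, mean {mean:.3f} s, stdev {stdev:.3f} s over {len(samples)} runs"


if __name__ == "__main__":
    # Hypothetical stand-in workload; a real harness would launch the encoder here.
    timings = time_runs(lambda: sum(i * i for i in range(200_000)))
    print(summarize(timings))
```

Reporting the spread next to the total makes it easier to judge whether a gap between two flag sets is larger than run-to-run noise.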

FFTW

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. Learn more via the OpenBenchmarking.org test page.
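FFTW's forward transform computes X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N). The Python sketch below evaluates that definition directly in O(N^2) and, for comparison, with a textbook recursive radix-2 Cooley-Tukey FFT in O(N log N). It only illustrates why fast algorithms matter for this benchmark; FFTW's planner and generated kernels are far more elaborate, and none of FFTW's API is used here.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT in O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

Both functions return the same spectrum up to rounding; only the operation count differs.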

Timed PHP Compilation

This test times how long it takes to build PHP 5 with the Zend engine. Learn more via the OpenBenchmarking.org test page.

SciMark

This test runs the ANSI C version of SciMark 2.0, a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. It is made up of Fast Fourier Transform, Jacobi Successive Over-Relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. Learn more via the OpenBenchmarking.org test page.
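Of the SciMark kernels, Monte Carlo is the easiest to show compactly: it estimates pi by sampling random points in the unit square and counting how many land inside the quarter circle. The Python sketch below follows that idea; it is not SciMark's C source, and `monte_carlo_pi`, the random generator and the seed are arbitrary choices for illustration.

```python
import random


def monte_carlo_pi(samples: int, seed: int = 12345) -> float:
    """Estimate pi from the fraction of uniform points in the unit square inside the quarter circle."""
    rng = random.Random(seed)  # seeded so repeated runs give identical estimates
    under_curve = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            under_curve += 1
    # The quarter circle covers pi/4 of the unit square, so scale the hit fraction by 4.
    return 4.0 * under_curve / samples
```

The error shrinks only as 1/sqrt(samples), so the kernel is dominated by random-number generation and a simple compare, which is what makes it a useful contrast to the memory-bound kernels.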

C-Ray

This is a test of C-Ray, a simple raytracer designed to test floating-point CPU performance. This test is multi-threaded (16 threads per core), shoots 8 rays per pixel for anti-aliasing, and generates a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.
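The floating-point work in a ray tracer like C-Ray is dominated by intersection tests between rays and scene primitives. The Python sketch below assumes spheres as the primitive (it is not C-Ray's C code) and solves the ray-sphere quadratic to find the nearest hit; `ray_sphere` and the tuple vector type are hypothetical names.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def ray_sphere(origin: Vec, direction: Vec, center: Vec, radius: float) -> Optional[float]:
    """Return the nearest positive distance t along the ray to the sphere, or None on a miss.

    Solves |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    """
    oc = sub(origin, center)
    a = dot(direction, direction)
    b = 2.0 * dot(direction, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # no real roots: the ray misses the sphere
    sqrt_disc = math.sqrt(disc)
    # Try the nearer root first; skip hits behind (or at) the ray origin.
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None
```

With 8 rays per pixel at 1600 x 1200, a renderer built on this would run the test roughly 15 million times per object per frame before any shading or reflection rays.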

LAME MP3 Encoding

LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.

Timed ImageMagick Compilation

This test times how long it takes to build ImageMagick. Learn more via the OpenBenchmarking.org test page.

Himeno Benchmark

The Himeno benchmark is a linear solver for the pressure Poisson equation using a point-Jacobi method. Learn more via the OpenBenchmarking.org test page.
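Himeno itself sweeps a 3-D grid with a larger stencil and a relaxation factor; the Python sketch below shrinks the idea to a 2-D, 5-point discretization to show what a point-Jacobi sweep is. The grid layout, the spacing `h` and the name `jacobi_poisson_2d` are assumptions for illustration.

```python
from typing import List

Grid = List[List[float]]


def jacobi_poisson_2d(p: Grid, rhs: Grid, h: float, iterations: int) -> float:
    """Point-Jacobi sweeps for the 2-D Poisson equation laplacian(p) = rhs, updating p in place.

    Boundary values of p are held fixed. Every interior point is recomputed from the
    previous iterate only, which is what makes this Jacobi rather than Gauss-Seidel.
    Returns the sum of squared changes in the last sweep as a simple convergence measure.
    """
    rows, cols = len(p), len(p[0])
    change = 0.0
    for _ in range(iterations):
        new = [row[:] for row in p]  # separate output grid for this sweep
        change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # 5-point discretization: (neighbours - 4*p) / h^2 = rhs, solved for p.
                value = 0.25 * (
                    p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                    - h * h * rhs[i][j]
                )
                change += (value - p[i][j]) ** 2
                new[i][j] = value
        p[:] = new
    return change
```

Each point update reads only neighbouring values, so the kernel is bound by memory bandwidth more than by arithmetic, which is why Himeno results respond to data layout more than to instruction selection.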

Timed Apache Compilation

This test times how long it takes to build the Apache HTTP Server. Learn more via the OpenBenchmarking.org test page.

GraphicsMagick

This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system's CPU. Learn more via the OpenBenchmarking.org test page.

AOBench

AOBench is a lightweight ambient occlusion renderer, written in C. The test profile uses a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.

PostgreSQL pgbench

This is a simple benchmark of PostgreSQL using pgbench. Learn more via the OpenBenchmarking.org test page.

Timed HMMer Search

This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of the Drosophila Sevenless protein. Learn more via the OpenBenchmarking.org test page.
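HMMer scores sequences against profile hidden Markov models using dynamic-programming algorithms such as Viterbi. The Python sketch below runs log-space Viterbi on a hypothetical two-state HMM; real Pfam profiles have match, insert and delete states at every model position, which this deliberately leaves out, and `viterbi` is an illustrative name rather than HMMer's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable state path and its log-probability.

    Works in log space so long sequences do not underflow; all probabilities
    used must be non-zero because math.log(0) raises an error.
    """
    log = math.log
    # score[s]: best log-probability of any path ending in state s at the current position.
    score = {s: log(start[s]) + log(emit[s][observations[0]]) for s in states}
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        pointers: Dict[str, str] = {}
        new_score: Dict[str, float] = {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + log(trans[r][s]))
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + log(trans[best_prev][s]) + log(emit[s][obs])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

The work grows as sequence length times the number of state transitions, and a database search repeats it for every profile, which is why the benchmark is compute-bound.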

x264

This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. Learn more via the OpenBenchmarking.org test page.

libjpeg-turbo tjbench

tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo. Learn more via the OpenBenchmarking.org test page.

John The Ripper

This is a benchmark of John The Ripper, which is a password cracker. Learn more via the OpenBenchmarking.org test page.

Bullet Physics Engine

This is a benchmark of the Bullet Physics Engine. Learn more via the OpenBenchmarking.org test page.

Hierarchical INTegration

This test runs the U.S. Department of Energy's Ames Laboratory Hierarchical INTegration (HINT) benchmark. Learn more via the OpenBenchmarking.org test page.

SVT-VP9

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9/WebM format using a sample 1080p video. Learn more via the OpenBenchmarking.org test page.

SVT-AV1

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-AV1 CPU-based multi-threaded video encoder for the AV1 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.

x265

This is a simple test of the x265 encoder run on the CPU with a sample 1080p video file. Learn more via the OpenBenchmarking.org test page.

Stockfish

This is a test of Stockfish, an advanced C++11 chess engine whose benchmark can scale up to 128 CPU cores. Learn more via the OpenBenchmarking.org test page.

TSCP

This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. Learn more via the OpenBenchmarking.org test page.

ctx_clock

Ctx_clock is a simple test program to measure the context switch time in clock cycles. Learn more via the OpenBenchmarking.org test page.
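ctx_clock reports the switch cost in clock cycles. A different, coarser approach is the classic pipe ping-pong between processes, which lmbench's lat_ctx uses in a ring-of-processes form. The Python sketch below is that alternative, not ctx_clock's method: it is POSIX-only because it relies on `os.fork`, it reports wall-clock nanoseconds rather than cycles, and `pingpong_switch_ns` is a hypothetical name.

```python
import os
import time


def pingpong_switch_ns(round_trips: int = 10_000) -> float:
    """Rough per-switch cost in nanoseconds from a one-byte pipe ping-pong between two processes.

    Both processes are pinned to one CPU when the platform allows it, so each round
    trip needs two context switches. The figure still includes pipe read/write
    syscall overhead, so treat it as an upper bound rather than a cycle count.
    """
    can_pin = hasattr(os, "sched_setaffinity")
    if can_pin:
        saved_cpus = os.sched_getaffinity(0)
        os.sched_setaffinity(0, {min(saved_cpus)})  # the forked child inherits this mask
    try:
        to_child_r, to_child_w = os.pipe()
        to_parent_r, to_parent_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: echo every byte straight back, then exit without running cleanup handlers.
            try:
                os.close(to_child_w)
                os.close(to_parent_r)
                for _ in range(round_trips):
                    os.write(to_parent_w, os.read(to_child_r, 1))
            finally:
                os._exit(0)
        os.close(to_child_r)
        os.close(to_parent_w)
        start = time.perf_counter_ns()
        for _ in range(round_trips):
            os.write(to_child_w, b"x")
            os.read(to_parent_r, 1)
        elapsed = time.perf_counter_ns() - start
        os.waitpid(pid, 0)
        os.close(to_child_w)
        os.close(to_parent_r)
    finally:
        if can_pin:
            os.sched_setaffinity(0, saved_cpus)  # restore the caller's original CPU mask
    return elapsed / (2 * round_trips)


if __name__ == "__main__":
    print(f"~{pingpong_switch_ns():.0f} ns per switch (upper bound)")
```

Pinning matters: without it the two processes can sit on different cores, and the measurement turns into cross-core wake-up latency instead of a switch on one core.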

Zstd Compression

This test measures the time needed to compress a sample file (an Ubuntu file-system image) using Zstd compression. Learn more via the OpenBenchmarking.org test page.

51 Results Shown

FLAC Audio Encoding
FFTW
Timed PHP Compilation
SciMark:
  Sparse Matrix Multiply
  Composite
C-Ray
LAME MP3 Encoding
FFTW
Timed ImageMagick Compilation
Himeno Benchmark
Timed Apache Compilation
GraphicsMagick:
  Sharpen
  Enhanced
  HWB Color Space
  Swirl
  Noise-Gaussian
SciMark:
  Jacobi Successive Over-Relaxation
  Monte Carlo
GraphicsMagick
AOBench
GraphicsMagick
PostgreSQL pgbench
Timed HMMer Search
x264
PostgreSQL pgbench
libjpeg-turbo tjbench
PostgreSQL pgbench
SciMark
PostgreSQL pgbench
John The Ripper
Bullet Physics Engine
Hierarchical INTegration
Bullet Physics Engine
SVT-VP9
Bullet Physics Engine
VP9 libvpx Encoding
SVT-AV1
VP9 libvpx Encoding
x265
Bullet Physics Engine:
  3000 Fall
  Raytests
Stockfish
Bullet Physics Engine:
  Convex Trimesh
  Prim Trimesh
SVT-AV1
Hierarchical INTegration
TSCP
ctx_clock
Zstd Compression
John The Ripper
SciMark

-O0

Testing initiated at 15 February 2019 20:24 by user root.

-Og

Testing initiated at 19 February 2019 05:54 by user root.

-O1

Testing initiated at 16 February 2019 07:27 by user root.

-O2

Testing initiated at 16 February 2019 15:25 by user root.

-O2 -ftree-vectorize -ftree-slp-vectorize

Testing initiated at 18 February 2019 05:50 by user root.

-O2 -march=znver1

Testing initiated at 17 February 2019 07:16 by user root.

-O2 -flto

Testing initiated at 18 February 2019 20:32 by user root.

-O3

Testing initiated at 16 February 2019 21:58 by user root.

-O3 -march=znver1

Testing initiated at 15 February 2019 13:07 by user root.

-O3 -march=znver1 -flto

Testing initiated at 18 February 2019 12:40 by user root.

-Ofast -march=znver1

Testing initiated at 17 February 2019 14:18 by user root.
