GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake

Xeon Platinum 8380 compiler benchmarks by Michael Larabel looking at GCC 11 against LLVM Clang 12 for some initial holiday weekend tests...

GCC 11.1

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads), Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS), Chipset: Intel Device 0998, Memory: 16 x 32 GB DDR4-3200MT/s Hynix HMA84GR7CJR4N-XN, Disk: 800GB INTEL SSDPF21Q800GB, Graphics: ASPEED, Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP

OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: GCC 11.1.1 20210428, File-System: xfs, Screen Resolution: 1024x768

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Compiler Notes: --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Clang 12.0

OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: Clang 12.0.0, File-System: xfs, Screen Resolution: 1024x768

Kernel Notes: Transparent Huge Pages: madvise
Environment Notes: CXXFLAGS="-O3 -march=native -flto" CFLAGS="-O3 -march=native -flto"
Processor Notes: Scaling Governor: intel_pstate performance - CPU Microcode: 0xd000270
Python Notes: Python 3.9.5
Security Notes: SELinux + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Kvazaar

This is a test of Kvazaar as a CPU-based H.265 video encoder written in the C programming language and optimized in Assembly. Kvazaar is the winner of the 2016 ACM Open-Source Software Competition and developed at the Ultra Video Group, Tampere University, Finland. Learn more via the OpenBenchmarking.org test page.

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample 1080p YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-VP9

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-VP9 CPU-based multi-threaded video encoder for the VP9 video format with a sample YUV input video file. Learn more via the OpenBenchmarking.org test page.

x265

This is a simple test of the x265 encoder run on the CPU with 1080p and 4K options for H.265 video encode performance with x265. Learn more via the OpenBenchmarking.org test page.

GNU GMP GMPbench

GMPbench is a test of the GNU Multiple Precision Arithmetic (GMP) Library. GMPbench is a single-threaded integer benchmark that leverages the GMP library to stress the CPU with widening integer multiplication. Learn more via the OpenBenchmarking.org test page.

GraphicsMagick

This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests on a sample 6000x4000 pixel JPEG image. Learn more via the OpenBenchmarking.org test page.

Coremark

This is a test of EEMBC CoreMark processor benchmark. Learn more via the OpenBenchmarking.org test page.

Zstd Compression

This test measures the time needed to compress/decompress a sample file (a FreeBSD disk image - FreeBSD-12.2-RELEASE-amd64-memstick.img) using Zstd compression with options for different compression levels / settings. Learn more via the OpenBenchmarking.org test page.

libjpeg-turbo tjbench

tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo, a JPEG image codec library optimized for SIMD instructions on modern CPU architectures. Learn more via the OpenBenchmarking.org test page.

Himeno Benchmark

The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method. Learn more via the OpenBenchmarking.org test page.

Crypto++

Crypto++ is a C++ class library of cryptographic algorithms. Learn more via the OpenBenchmarking.org test page.

Liquid-DSP

LiquidSDR's Liquid-DSP is a software-defined radio (SDR) digital signal processing library. This test profile runs a multi-threaded benchmark of this SDR/DSP library focused on embedded platform usage. Learn more via the OpenBenchmarking.org test page.

OpenSSL

OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. Learn more via the OpenBenchmarking.org test page.

Darmstadt Automotive Parallel Heterogeneous Suite

DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite with OpenCL / CUDA / OpenMP test cases for these automotive benchmarks for evaluating programming models in context to vehicle autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.

Kripke

Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures effect the implementation and performance of Sn transport. Kripke is developed by LLNL. Learn more via the OpenBenchmarking.org test page.

PostgreSQL pgbench

This is a benchmark of PostgreSQL using pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

WebP Image Encode

This is a test of Google's libwebp with the cwebp image encode utility and using a sample 6000x4000 pixel JPEG image as the input. Learn more via the OpenBenchmarking.org test page.

Caffe

This is a benchmark of the Caffe deep learning framework and currently supports the AlexNet and Googlenet model and execution on both CPUs and NVIDIA GPUs. Learn more via the OpenBenchmarking.org test page.

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI initiative. Learn more via the OpenBenchmarking.org test page.

PostgreSQL pgbench

This is a benchmark of PostgreSQL using pgbench for facilitating the database benchmarks. Learn more via the OpenBenchmarking.org test page.

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

TNN

TNN is an open-source deep learning reasoning framework developed by Tencent. Learn more via the OpenBenchmarking.org test page.

Timed MrBayes Analysis

This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

C-Ray

This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. Learn more via the OpenBenchmarking.org test page.

Primesieve

Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. Learn more via the OpenBenchmarking.org test page.

AOBench

AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. Learn more via the OpenBenchmarking.org test page.

Bullet Physics Engine

This is a benchmark of the Bullet Physics Engine. Learn more via the OpenBenchmarking.org test page.

FLAC Audio Encoding

This test times how long it takes to encode a sample WAV file to FLAC format five times. Learn more via the OpenBenchmarking.org test page.

LAME MP3 Encoding

LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. Learn more via the OpenBenchmarking.org test page.

Opus Codec Encoding

Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus. Learn more via the OpenBenchmarking.org test page.

eSpeak-NG Speech Engine

This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. This test profile is now tracking the eSpeak-NG version of eSpeak. Learn more via the OpenBenchmarking.org test page.

Gcrypt Library

Libgcrypt is a general purpose cryptographic library developed as part of the GnuPG project. This is a benchmark of libgcrypt's integrated benchmark and is measuring the time to run the benchmark command with a cipher/mac/hash repetition count set for 50 times as simple, high level look at the overall crypto performance of the system under test. Learn more via the OpenBenchmarking.org test page.

WebP2 Image Encode

This is a test of Google's libwebp2 library with the WebP2 image encode utility and using a sample 6000x4000 pixel JPEG image as the input, similar to the WebP/libwebp test profile. WebP2 is currently experimental and under heavy development as ultimately the successor to WebP. WebP2 supports 10-bit HDR, more efficienct lossy compression, improved lossless compression, animation support, and full multi-threading support compared to WebP. Learn more via the OpenBenchmarking.org test page.

ASTC Encoder

ASTC Encoder (astcenc) is for the Adaptive Scalable Texture Compression (ASTC) format commonly used with OpenGL, OpenGL ES, and Vulkan graphics APIs. This test profile does a coding test of both compression/decompression. Learn more via the OpenBenchmarking.org test page.

WavPack Audio Encoding

This test times how long it takes to encode a sample WAV file to WavPack format with very high quality settings. Learn more via the OpenBenchmarking.org test page.

113 Results Shown

Kvazaar:
Bosphorus 4K - Very Fast
Bosphorus 4K - Ultra Fast
Bosphorus 1080p - Very Fast
Bosphorus 1080p - Ultra Fast
SVT-AV1:
Preset 4 - Bosphorus 4K
Preset 8 - Bosphorus 4K
Preset 4 - Bosphorus 1080p
Preset 8 - Bosphorus 1080p
SVT-HEVC:
1 - Bosphorus 1080p
7 - Bosphorus 1080p
10 - Bosphorus 1080p
SVT-VP9:
VMAF Optimized - Bosphorus 1080p
PSNR/SSIM Optimized - Bosphorus 1080p
Visual Quality Optimized - Bosphorus 1080p
x265:
Bosphorus 4K
Bosphorus 1080p
GNU GMP GMPbench
GraphicsMagick:
Rotate
Sharpen
Enhanced
Resizing
Coremark
Zstd Compression:
8 - Compression Speed
8 - Decompression Speed
19 - Compression Speed
19 - Decompression Speed
8, Long Mode - Compression Speed
8, Long Mode - Decompression Speed
19, Long Mode - Compression Speed
19, Long Mode - Decompression Speed
libjpeg-turbo tjbench
Himeno Benchmark
Crypto++
Liquid-DSP:
1 - 256 - 57
160 - 256 - 57
OpenSSL
Darmstadt Automotive Parallel Heterogeneous Suite:
OpenMP - NDT Mapping
OpenMP - Points2Image
OpenMP - Euclidean Cluster
Kripke
PostgreSQL pgbench:
100 - 250 - Read Only
100 - 250 - Read Write
WebP Image Encode:
Default
Quality 100
Quality 100, Lossless
Quality 100, Highest Compression
Quality 100, Lossless, Highest Compression
Caffe:
AlexNet - CPU - 200
GoogleNet - CPU - 200
oneDNN:
IP Shapes 1D - f32 - CPU
IP Shapes 3D - f32 - CPU
IP Shapes 1D - u8s8f32 - CPU
IP Shapes 3D - u8s8f32 - CPU
IP Shapes 1D - bf16bf16bf16 - CPU
IP Shapes 3D - bf16bf16bf16 - CPU
Convolution Batch Shapes Auto - f32 - CPU
Deconvolution Batch shapes_3d - f32 - CPU
Convolution Batch Shapes Auto - u8s8f32 - CPU
Deconvolution Batch shapes_1d - u8s8f32 - CPU
Deconvolution Batch shapes_3d - u8s8f32 - CPU
Recurrent Neural Network Training - f32 - CPU
Recurrent Neural Network Inference - f32 - CPU
Recurrent Neural Network Training - u8s8f32 - CPU
Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
Recurrent Neural Network Inference - u8s8f32 - CPU
Matrix Multiply Batch Shapes Transformer - f32 - CPU
Recurrent Neural Network Training - bf16bf16bf16 - CPU
Recurrent Neural Network Inference - bf16bf16bf16 - CPU
Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU
Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
PostgreSQL pgbench:
100 - 250 - Read Only - Average Latency
100 - 250 - Read Write - Average Latency
NCNN:
CPU - mobilenet
CPU-v2-v2 - mobilenet-v2
CPU-v3-v3 - mobilenet-v3
CPU - shufflenet-v2
CPU - mnasnet
CPU - efficientnet-b0
CPU - blazeface
CPU - googlenet
CPU - vgg16
CPU - resnet18
CPU - yolov4-tiny
CPU - squeezenet_ssd
CPU - regnety_400m
TNN:
CPU - MobileNet v2
CPU - SqueezeNet v1.1
Timed MrBayes Analysis
C-Ray
Primesieve
AOBench
Bullet Physics Engine:
3000 Fall
1000 Stack
1000 Convex
136 Ragdolls
Prim Trimesh
Convex Trimesh
FLAC Audio Encoding
LAME MP3 Encoding
Opus Codec Encoding
eSpeak-NG Speech Engine
Gcrypt Library
WebP2 Image Encode:
Default
Quality 75, Compression Effort 7
Quality 95, Compression Effort 7
Quality 100, Compression Effort 5
Quality 100, Lossless Compression
ASTC Encoder:
Medium
Thorough
Exhaustive
WavPack Audio Encoding

GCC 11.1

OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: GCC 11.1.1 20210428, File-System: xfs, Screen Resolution: 1024x768

Testing initiated at 28 May 2021 11:02 by user .

Clang 12.0

OS: Fedora 34, Kernel: 5.12.6-300.fc34.x86_64 (x86_64), Compiler: Clang 12.0.0, File-System: xfs, Screen Resolution: 1024x768

Testing initiated at 28 May 2021 19:40 by user .

GCC 11 vs. LLVM Clang 12 Benchmarks On Xeon Ice Lake

View

Statistics

Graph Settings

Multi-Way Comparison

Table

Run Management

GCC 11.1

Clang 12.0

Kvazaar

SVT-AV1

SVT-HEVC

SVT-VP9

x265

GNU GMP GMPbench

GraphicsMagick

Coremark

Zstd Compression

libjpeg-turbo tjbench

Himeno Benchmark

Crypto++

Liquid-DSP

OpenSSL

Darmstadt Automotive Parallel Heterogeneous Suite

Kripke

PostgreSQL pgbench

WebP Image Encode

Caffe

oneDNN

PostgreSQL pgbench

NCNN

TNN

Timed MrBayes Analysis

C-Ray

Primesieve

AOBench

Bullet Physics Engine

FLAC Audio Encoding

LAME MP3 Encoding

Opus Codec Encoding

eSpeak-NG Speech Engine

Gcrypt Library

WebP2 Image Encode

ASTC Encoder

WavPack Audio Encoding

113 Results Shown

GCC 11.1

Clang 12.0