Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article: two Intel Xeon Platinum 8380 processors tested with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 22.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2308099-NE-XEONPLATI49
This result file spans the following test categories: AV1 (2 tests), Bioinformatics (2), C/C++ Compiler Tests (5), CPU Massive (10), Creator Workloads (12), Encoding (5), Fortran Tests (4), Game Development (3), HPC - High Performance Computing (12), Machine Learning (6), Molecular Dynamics (3), MPI Benchmarks (4), Multi-Core (14), NVIDIA GPU Compute (3), Intel oneAPI (6), OpenMPI Tests (10), Python Tests (5), Renderers (2), Scientific Computing (5), Server CPU Tests (6), Video Encoding (5).


Result Runs

- 0xd000390: run August 06 2023; test duration 11 hours, 56 minutes
- 0xd0003a5: run August 08 2023; test duration 15 hours, 40 minutes
- Average test duration: 13 hours, 48 minutes


Xeon Platinum 8380 AVX-512 Workloads Performance: System Details

- Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
- Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
- Chipset: Intel Ice Lake IEH
- Memory: 512GB
- Disk: 7682GB INTEL SSDPF2KX076TZ
- Graphics: ASPEED
- Monitor: VE228
- Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
- OS: Ubuntu 22.10
- Kernels: 6.5.0-060500rc4daily20230804-generic (x86_64) for 0xd000390; 6.5.0-rc5-phx-tues (x86_64) for 0xd0003a5
- Desktop: GNOME Shell 43.0
- Display Server: X Server 1.21.1.3
- Vulkan: 1.3.224
- Compiler: GCC 12.2.0
- File-System: ext4
- Screen Resolution: 1920x1080

System Logs

- Transparent Huge Pages: madvise
- Compiler configure: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- 0xd000390: Scaling Governor: intel_pstate performance (EPP: performance); CPU Microcode: 0xd000390
- 0xd0003a5: Scaling Governor: intel_pstate performance (EPP: performance); CPU Microcode: 0xd0003a5
- Python 3.10.7
- Security (0xd000390): itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Mitigation of Clear buffers, SMT vulnerable; retbleed: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, RSB filling, PBRSB-eIBRS: SW sequence; srbds: Not affected; tsx_async_abort: Not affected
- Security (0xd0003a5): gather_data_sampling: Mitigation of Microcode; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Mitigation of Clear buffers, SMT vulnerable; retbleed: Not affected; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, RSB filling, PBRSB-eIBRS: SW sequence; srbds: Not affected; tsx_async_abort: Not affected

0xd000390 vs. 0xd0003a5 Comparison

(Per-test delta graph omitted.) The largest swings between the two microcode revisions were in OSPRay's particle volume workloads (particle_volume/scivis/real_time 52.3%, particle_volume/ao/real_time 50.3%), Cpuminer-Opt Garlicoin (32.2%), QMCPACK FeCO6_b3lyp_gms (20.8%), and the OSPRay gravity_spheres_volume/dim_512 scenes (roughly 11.5%). Most other affected results, spanning Neural Magic DeepSparse, NCNN, TensorFlow, ONNX Runtime, simdjson, Embree, OpenVKL, HeFFTe, miniBUDE, SPECFEM3D, oneDNN, and VP9 libvpx encoding, shifted by roughly 2% to 10%.

Result Overview

(The full side-by-side results table is omitted here; individual results follow below.) Benchmarks covered include miniBUDE, CloverLeaf, libxsmm, Laghos, HeFFTe, Palabos, Xcompact3d Incompact3d, Remhos, SPECFEM3D, simdjson, Embree, OpenVKL, OSPRay, oneDNN, Cpuminer-Opt, GROMACS, TensorFlow, Neural Magic DeepSparse, OpenVINO, ONNX Runtime, MrBayes, QMCPACK, dav1d, SVT-AV1, SVT-HEVC, VP9 libvpx, VVenC, Intel Open Image Denoise, NCNN, and Blender.

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.
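As a rough sanity check on the units, miniBUDE reports each run both as GFInst/s and as billions of atom-pair interactions per second; dividing the two figures from this result file suggests on the order of 25 floating-point instructions per interaction. A minimal sketch (the 25-instruction figure is inferred from these numbers, not a documented miniBUDE constant):

```python
# Relate miniBUDE's two reported units for the same run.
# Inputs are taken from the BM1/BM2 results in this file; the implied
# instructions-per-interaction count is an inference, not a documented figure.

def instructions_per_interaction(gfinst_per_s: float, giga_interactions_per_s: float) -> float:
    """GFInst/s divided by billion interactions/s gives FP instructions per pair."""
    return gfinst_per_s / giga_interactions_per_s

# 0xd000390, OpenMP - BM1: 2353.39 GFInst/s and 94.14 billion interactions/s
ratio = instructions_per_interaction(2353.39, 94.14)
print(f"~{ratio:.1f} FP instructions per atom-pair interaction")
```

The BM2 figures (2526.89 GFInst/s, 101.08 billion interactions/s) give the same ratio, which is why the two metrics track each other exactly in the graphs below.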

miniBUDE 20210901 (more is better):

- OpenMP - BM1 (GFInst/s): 0xd000390 = 2353.39 (SE +/- 3.60, N = 3, min 2346.18, max 2357.09); 0xd0003a5 = 2372.42 (SE +/- 10.90, N = 3, min 2360.68, max 2394.19)
- OpenMP - BM1 (Billion Interactions/s): 0xd000390 = 94.14 (SE +/- 0.14, N = 3, min 93.85, max 94.28); 0xd0003a5 = 94.90 (SE +/- 0.44, N = 3, min 94.43, max 95.77)
- OpenMP - BM2 (GFInst/s): 0xd000390 = 2526.89 (SE +/- 9.78, N = 3, min 2508.00, max 2540.77); 0xd0003a5 = 2601.21 (SE +/- 9.53, N = 3, min 2589.09, max 2620.01)
- OpenMP - BM2 (Billion Interactions/s): 0xd000390 = 101.08 (SE +/- 0.39, N = 3, min 100.32, max 101.63); 0xd0003a5 = 104.05 (SE +/- 0.38, N = 3, min 103.56, max 104.80)

Compiler: (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version, benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.

CloverLeaf, Lagrangian-Eulerian Hydrodynamics (Seconds, fewer is better):

- 0xd000390 = 12.04 (SE +/- 0.06, N = 3, min 11.93, max 12.14); 0xd0003a5 = 11.98 (SE +/- 0.09, N = 3, min 11.80, max 12.08)

Compiler: (F9X) gfortran options: -O3 -march=native -funroll-loops -fopenmp

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.
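The M N K values in the libxsmm results below are square GEMM dimensions, so the reported GFLOPS/s can be converted into an implied matrix-multiplies-per-second rate using the standard 2*M*N*K flop count for C += A*B. A small sketch under that assumption (libxsmm's internal flop accounting may differ):

```python
# Convert libxsmm's GFLOPS/s into an implied small-GEMM rate,
# assuming the standard 2*M*N*K flop count for one C += A*B GEMM.

def gemm_flops(m: int, n: int, k: int) -> int:
    """Flop count for one M x N x K multiply-accumulate GEMM."""
    return 2 * m * n * k

def gemms_per_second(gflops_per_s: float, m: int, n: int, k: int) -> float:
    return gflops_per_s * 1e9 / gemm_flops(m, n, k)

# M = N = K = 128 at the 0xd000390 result of ~1941.1 GFLOPS/s:
rate = gemms_per_second(1941.1, 128, 128, 128)
print(f"{gemm_flops(128, 128, 128):,} flops per GEMM, ~{rate:,.0f} GEMMs/s")
```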

libxsmm 2-1.17-3645 (GFLOPS/s, more is better):

- M N K: 128: 0xd000390 = 1941.1 (SE +/- 32.53, N = 7, min 1808.5, max 2050.2); 0xd0003a5 = 1978.9 (SE +/- 20.92, N = 3, min 1938.1, max 2007.3)
- M N K: 256: 0xd000390 = 594.6 (SE +/- 2.33, N = 3, min 590.9, max 598.9); 0xd0003a5 = 600.2 (SE +/- 2.54, N = 3, min 596.2, max 604.9)
- M N K: 32: 0xd000390 = 604.7 (SE +/- 8.71, N = 15, min 522.0, max 641.8); 0xd0003a5 = 609.2 (SE +/- 4.92, N = 3, min 602.8, max 618.9)
- M N K: 64: 0xd000390 = 1098.8 (SE +/- 8.60, N = 3, min 1081.8, max 1109.6); 0xd0003a5 = 1177.1 (SE +/- 13.09, N = 15, min 1080.8, max 1235.2)

Compiler: (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

Laghos 3.1 (Major Kernels Total Rate, more is better):

- Triple Point Problem: 0xd000390 = 256.27 (SE +/- 0.32, N = 3, min 255.86, max 256.90); 0xd0003a5 = 256.87 (SE +/- 0.94, N = 3, min 255.04, max 258.16)
- Sedov Blast Wave, ube_922_hex.mesh: 0xd000390 = 385.89 (SE +/- 0.23, N = 3, min 385.44, max 386.16); 0xd0003a5 = 386.08 (SE +/- 0.80, N = 3, min 384.76, max 387.52)

Compiler: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.
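The GFLOP/s figures below can be related back to transforms per second if one assumes the conventional 5*N*log2(N) flop estimate for a complex-to-complex FFT over N total grid points. HeFFTe's exact accounting may differ (and r2c transforms cost roughly half), so treat this as an order-of-magnitude sketch:

```python
# Estimate FFT throughput in transforms/s from a GFLOP/s figure,
# assuming the conventional 5*N*log2(N) flop count for a c2c FFT.
import math

def c2c_fft_flops(nx: int, ny: int, nz: int) -> float:
    """Conventional flop estimate for a 3-D c2c FFT over N = nx*ny*nz points."""
    n = nx * ny * nz
    return 5.0 * n * math.log2(n)

def transforms_per_second(gflops_per_s: float, nx: int, ny: int, nz: int) -> float:
    return gflops_per_s * 1e9 / c2c_fft_flops(nx, ny, nz)

# c2c, FFTW backend, float, 128^3 at the 0xd000390 result of ~154.73 GFLOP/s:
print(f"~{transforms_per_second(154.73, 128, 128, 128):,.0f} transforms/s")
```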

HeFFTe 2.3 (GFLOP/s, more is better):

- c2c, FFTW, float, 128: 0xd000390 = 154.73 (SE +/- 1.53, N = 5, min 151.93, max 160.45); 0xd0003a5 = 159.10 (SE +/- 1.01, N = 3, min 157.87, max 161.10)
- c2c, FFTW, float, 256: 0xd000390 = 101.98 (SE +/- 0.27, N = 3, min 101.59, max 102.50); 0xd0003a5 = 102.92 (SE +/- 0.20, N = 3, min 102.55, max 103.23)
- r2c, FFTW, float, 128: 0xd000390 = 195.20 (SE +/- 0.92, N = 3, min 193.69, max 196.86); 0xd0003a5 = 198.87 (SE +/- 1.08, N = 3, min 196.71, max 200.09)
- r2c, FFTW, float, 256: 0xd000390 = 224.42 (SE +/- 2.25, N = 3, min 219.98, max 227.31); 0xd0003a5 = 226.78 (SE +/- 2.85, N = 3, min 221.50, max 231.26)
- c2c, FFTW, double, 128: 0xd000390 = 93.09 (SE +/- 0.15, N = 3, min 92.83, max 93.33); 0xd0003a5 = 93.92 (SE +/- 0.28, N = 3, min 93.37, max 94.31)
- c2c, FFTW, double, 256: 0xd000390 = 46.35 (SE +/- 0.25, N = 3, min 45.85, max 46.65); 0xd0003a5 = 46.98 (SE +/- 0.61, N = 3, min 45.76, max 47.63)
- r2c, FFTW, double, 128: 0xd000390 = 144.94 (SE +/- 1.81, N = 4, min 140.60, max 148.82); 0xd0003a5 = 153.79 (SE +/- 1.71, N = 3, min 151.39, max 157.09)
- r2c, FFTW, double, 256: 0xd000390 = 93.75 (SE +/- 1.09, N = 3, min 91.58, max 95.11); 0xd0003a5 = 94.86 (SE +/- 1.13, N = 3, min 92.94, max 96.86)

Compiler: (CXX) g++ options: -O3

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.
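"Mega Site Updates Per Second" counts lattice sites updated per second, so it can be converted to simulation timesteps per second once the total site count is known. A sketch assuming the Grid Size parameter is the edge length of a cubic Cavity3D domain (a plausible reading, not confirmed by this result file):

```python
# Convert Palabos MSUPS to lattice timesteps/s, assuming the benchmark's
# Grid Size parameter is the edge length of a cubic domain (an assumption).

def timesteps_per_second(msups: float, edge: int) -> float:
    """Mega Site Updates/s divided by the edge^3 site count."""
    sites = edge ** 3
    return msups * 1e6 / sites

# Grid Size 400 at the 0xd000390 result of ~388.48 MSUPS:
print(f"~{timesteps_per_second(388.48, 400):.2f} timesteps/s")
```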

Palabos 2.3 (Mega Site Updates Per Second, more is better):

- Grid Size 100: 0xd000390 = 312.20 (SE +/- 0.89, N = 3, min 310.41, max 313.10); 0xd0003a5 = 312.53 (SE +/- 0.20, N = 3, min 312.24, max 312.91)
- Grid Size 400: 0xd000390 = 388.48 (SE +/- 0.94, N = 3, min 386.65, max 389.75); 0xd0003a5 = 393.84 (SE +/- 0.06, N = 3, min 393.72, max 393.93)
- Grid Size 500: 0xd000390 = 413.21 (SE +/- 0.42, N = 3, min 412.59, max 414.01); 0xd0003a5 = 417.48 (SE +/- 0.72, N = 3, min 416.07, max 418.42)

Compiler: (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite-difference high-performance code for solving the incompressible Navier-Stokes equations along with as many scalar transport equations as needed. Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d 2021-03-11, Input: input.i3d 193 Cells Per Direction (Seconds; fewer is better)
  0xd000390: 11.02 (SE +/- 0.02, N = 3; min 11.00 / max 11.06)
  0xd0003a5: 11.00 (SE +/- 0.02, N = 3; min 10.98 / max 11.03)

Compiler: (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

Remhos 1.0, Test: Sample Remap Example (Seconds; fewer is better)
  0xd000390: 12.25 (SE +/- 0.03, N = 3; min 12.20 / max 12.28)
  0xd0003a5: 12.40 (SE +/- 0.04, N = 3; min 12.34 / max 12.47)

Compiler: (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi

SPECFEM3D

SPECFEM3D simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution of SPECFEM3D, using a variety of its built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D 4.0 (Seconds; fewer is better)

Model: Mount St. Helens
  0xd000390: 13.15 (SE +/- 0.18, N = 3; min 12.94 / max 13.52)
  0xd0003a5: 12.95 (SE +/- 0.12, N = 3; min 12.75 / max 13.18)

Model: Layered Halfspace
  0xd000390: 29.50 (SE +/- 0.05, N = 3; min 29.40 / max 29.59)
  0xd0003a5: 29.37 (SE +/- 0.18, N = 3; min 29.04 / max 29.65)

Model: Tomographic Model
  0xd000390: 14.57 (SE +/- 0.02, N = 3; min 14.54 / max 14.62)
  0xd0003a5: 14.15 (SE +/- 0.15, N = 3; min 13.86 / max 14.36)

Model: Homogeneous Halfspace
  0xd000390: 18.02 (SE +/- 0.15, N = 3; min 17.86 / max 18.33)
  0xd0003a5: 17.75 (SE +/- 0.16, N = 3; min 17.54 / max 18.06)

Model: Water-layered Halfspace
  0xd000390: 31.15 (SE +/- 0.24, N = 3; min 30.77 / max 31.58)
  0xd0003a5: 31.38 (SE +/- 0.31, N = 5; min 30.75 / max 32.41)

Compiler: (F9X) gfortran options: -O2 -fopenmp -std=f2003 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

simdjson

This is a benchmark of simdjson, a high-performance JSON parser. simdjson aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.
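The GB/s figure in these results is simply input bytes parsed per second of wall time. A hedged stand-in sketch of that measurement using the Python standard library's json module (not simdjson itself; absolute numbers will be far lower than simdjson's, but the metric is computed the same way):

```python
import json
import time

def parse_throughput_gbps(raw: bytes, iterations: int = 50) -> float:
    """Parse the same document repeatedly and report GB/s of input consumed."""
    start = time.perf_counter()
    for _ in range(iterations):
        json.loads(raw)
    elapsed = time.perf_counter() - start
    return len(raw) * iterations / elapsed / 1e9

# A small synthetic tweet-like document (illustrative, not one of the
# benchmark's actual input files such as Kostya or TopTweet):
doc = json.dumps({"tweets": [{"id": i, "text": "x" * 64} for i in range(200)]}).encode()
print(f"{parse_throughput_gbps(doc):.3f} GB/s")
```

The benchmark's named tests (Kostya, TopTweet, LargeRandom, and so on) differ mainly in the shape and size of the input document being parsed.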

simdjson 2.0 (GB/s; more is better)

Throughput Test: Kostya
  0xd000390: 2.61 (SE +/- 0.00, N = 3; min 2.61 / max 2.61)
  0xd0003a5: 2.87 (SE +/- 0.00, N = 3; min 2.87 / max 2.88)

Throughput Test: TopTweet
  0xd000390: 5.60 (SE +/- 0.03, N = 3; min 5.56 / max 5.65)
  0xd0003a5: 5.75 (SE +/- 0.01, N = 3; min 5.74 / max 5.76)

Throughput Test: LargeRandom
  0xd000390: 0.85 (SE +/- 0.00, N = 3; min 0.85 / max 0.85)
  0xd0003a5: 0.96 (SE +/- 0.00, N = 3; min 0.96 / max 0.96)

Throughput Test: PartialTweets
  0xd000390: 4.62 (SE +/- 0.01, N = 3; min 4.60 / max 4.64)
  0xd0003a5: 4.77 (SE +/- 0.01, N = 3; min 4.76 / max 4.78)

Throughput Test: DistinctUserID
  0xd000390: 5.52 (SE +/- 0.02, N = 3; min 5.50 / max 5.56)
  0xd0003a5: 5.71 (SE +/- 0.00, N = 3; min 5.71 / max 5.71)

Compiler: (CXX) g++ options: -O3

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 4.1, Binary: Pathtracer ISPC (Frames Per Second; more is better)

Model: Crown
  0xd000390: 88.19 (SE +/- 0.07, N = 3; run min 88.09 / max 88.34; MIN: 85.24 / MAX: 92.72)
  0xd0003a5: 83.46 (SE +/- 0.32, N = 3; run min 83.01 / max 84.09; MIN: 80.27 / MAX: 87.69)

Model: Asian Dragon
  0xd000390: 104.68 (SE +/- 0.43, N = 3; run min 104.05 / max 105.52; MIN: 101.90 / MAX: 109.48)
  0xd0003a5: 101.10 (SE +/- 0.19, N = 3; run min 100.74 / max 101.38; MIN: 98.59 / MAX: 105.53)

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and is part of the Intel oneAPI Rendering Toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 1.3.1, Benchmark: vklBenchmark ISPC (Items / Sec; more is better)
  0xd000390: 912 (SE +/- 1.53, N = 3; run min 909 / max 914; MIN: 140 / MAX: 7236)
  0xd0003a5: 856 (SE +/- 0.88, N = 3; run min 854 / max 857; MIN: 137 / MAX: 7211)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.12 (Items Per Second; more is better)

Benchmark: particle_volume/ao/real_time
  0xd000390: 24.75 (SE +/- 0.11, N = 3; min 24.55 / max 24.93)
  0xd0003a5: 16.46 (SE +/- 0.01, N = 3; min 16.44 / max 16.48)

Benchmark: particle_volume/scivis/real_time
  0xd000390: 24.95 (SE +/- 0.01, N = 3; min 24.93 / max 24.97)
  0xd0003a5: 16.38 (SE +/- 0.01, N = 3; min 16.37 / max 16.40)

Benchmark: particle_volume/pathtracer/real_time
  0xd000390: 150.28 (SE +/- 0.54, N = 3; min 149.29 / max 151.15)
  0xd0003a5: 136.85 (SE +/- 0.45, N = 3; min 136.07 / max 137.64)

Benchmark: gravity_spheres_volume/dim_512/ao/real_time
  0xd000390: 21.08 (SE +/- 0.05, N = 3; min 21.02 / max 21.18)
  0xd0003a5: 18.89 (SE +/- 0.02, N = 3; min 18.86 / max 18.93)

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
  0xd000390: 20.58 (SE +/- 0.08, N = 3; min 20.46 / max 20.72)
  0xd0003a5: 18.46 (SE +/- 0.03, N = 3; min 18.42 / max 18.51)

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN 3.1, Engine: CPU, Data Type: bf16bf16bf16 (ms; fewer is better)

Harness: IP Shapes 3D
  0xd000390: 2.59271 (SE +/- 0.03265, N = 15; min 2.30 / max 2.77; MIN: 1.91)
  0xd0003a5: 2.57437 (SE +/- 0.02489, N = 15; min 2.40 / max 2.76; MIN: 1.99)

Harness: Convolution Batch Shapes Auto
  0xd000390: 2.06936 (SE +/- 0.00038, N = 3; min 2.07 / max 2.07; MIN: 2.03)
  0xd0003a5: 2.06967 (SE +/- 0.00191, N = 3; min 2.07 / max 2.07; MIN: 2.03)

Harness: Deconvolution Batch shapes_1d
  0xd000390: 3.90360 (SE +/- 0.00202, N = 3; min 3.90 / max 3.91; MIN: 3.68)
  0xd0003a5: 3.91322 (SE +/- 0.00080, N = 3; min 3.91 / max 3.91; MIN: 3.69)

Harness: Deconvolution Batch shapes_3d
  0xd000390: 3.62526 (SE +/- 0.00896, N = 3; min 3.61 / max 3.64; MIN: 3.54)
  0xd0003a5: 3.62266 (SE +/- 0.00622, N = 3; min 3.62 / max 3.64; MIN: 3.53)

Harness: Recurrent Neural Network Training
  0xd000390: 832.45 (SE +/- 16.59, N = 15; min 739.17 / max 943.33; MIN: 714.88)
  0xd0003a5: 816.51 (SE +/- 14.20, N = 12; min 759.67 / max 902.57; MIN: 723.44)

Harness: Recurrent Neural Network Inference
  0xd000390: 524.38 (SE +/- 7.19, N = 3; min 510.41 / max 534.32; MIN: 499.75)
  0xd0003a5: 521.74 (SE +/- 2.82, N = 3; min 517.01 / max 526.75; MIN: 505.72)

Compiler: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt 3.20.3 (kH/s; more is better)

Algorithm: Magi
  0xd000390: 2309.47 (SE +/- 3.91, N = 3; min 2301.77 / max 2314.44)
  0xd0003a5: 2308.66 (SE +/- 1.08, N = 3; min 2306.60 / max 2310.23)

Algorithm: x25x
  0xd000390: 2659.17 (SE +/- 4.75, N = 3; min 2651.79 / max 2668.04)
  0xd0003a5: 2659.55 (SE +/- 5.54, N = 3; min 2649.70 / max 2668.88)

Algorithm: scrypt
  0xd000390: 2319.31 (SE +/- 1.08, N = 3; min 2317.62 / max 2321.33)
  0xd0003a5: 2321.74 (SE +/- 8.35, N = 3; min 2308.54 / max 2337.21)

Algorithm: Deepcoin
  0xd000390: 64677 (SE +/- 89.69, N = 3; min 64550 / max 64850)
  0xd0003a5: 64897 (SE +/- 187.02, N = 3; min 64690 / max 65270)

Algorithm: Blake-2 S
  0xd000390: 4462327 (SE +/- 8325.80, N = 3; min 4446020 / max 4473400)
  0xd0003a5: 4466653 (SE +/- 8676.85, N = 3; min 4453340 / max 4482950)

Algorithm: Garlicoin
  0xd000390: 29203.00 (SE +/- 330.17, N = 3; min 28660 / max 29800)
  0xd0003a5: 22086.25 (SE +/- 3833.17, N = 12; min 1428.76 / max 33020)

Algorithm: Skeincoin
  0xd000390: 613333 (SE +/- 1652.89, N = 3; min 610640 / max 616340)
  0xd0003a5: 617130 (SE +/- 3788.20, N = 3; min 612640 / max 624660)

Algorithm: Myriad-Groestl
  0xd000390: 43127 (SE +/- 386.00, N = 15; min 40650 / max 46540)
  0xd0003a5: 43450 (SE +/- 406.32, N = 3; min 42680 / max 44060)

Algorithm: LBC, LBRY Credits
  0xd000390: 421660 (SE +/- 313.42, N = 3; min 421230 / max 422270)
  0xd0003a5: 423130 (SE +/- 860.95, N = 3; min 422200 / max 424850)

Algorithm: Quad SHA-256, Pyrite
  0xd000390: 921730 (SE +/- 3352.41, N = 3; min 918160 / max 928430)
  0xd0003a5: 926277 (SE +/- 1690.96, N = 3; min 923140 / max 928940)

Algorithm: Triple SHA-256, Onecoin
  0xd000390: 1332237 (SE +/- 7105.40, N = 3; min 1324150 / max 1346400)
  0xd0003a5: 1333117 (SE +/- 7235.75, N = 3; min 1322790 / max 1347060)

Compiler: (CXX) g++ options: -O2 -lcurl -lz -lpthread -lssl -lcrypto -lgmp

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.
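GROMACS reports performance in nanoseconds of simulated time per day of wall clock. A minimal sketch of that unit conversion (the timing numbers below are illustrative, not taken from this run's logs):

```python
def ns_per_day(simulated_ns: float, wall_seconds: float) -> float:
    """Convert simulated nanoseconds over a wall-clock interval into the
    ns/day figure GROMACS reports: simulated time scaled to a full day."""
    return simulated_ns / wall_seconds * 86_400.0

# e.g. 0.1 ns of simulated time in ~935 s of wall time is roughly the
# ~9.2 ns/day range seen in the water_GMX50_bare result below:
print(round(ns_per_day(0.1, 935.0), 2))  # -> 9.24
```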

GROMACS 2023, Implementation: MPI CPU, Input: water_GMX50_bare (Ns Per Day; more is better)
  0xd000390: 9.234 (SE +/- 0.021, N = 3; min 9.19 / max 9.26)
  0xd0003a5: 9.094 (SE +/- 0.026, N = 3; min 9.04 / max 9.13)

Compiler: (CXX) g++ options: -O3

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.
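The images/sec metric in these results is total images processed divided by wall time, in the style of tf_cnn_benchmarks. A minimal sketch of the conversion (batch counts and timing are hypothetical, for illustration only):

```python
def images_per_second(batch_size: int, num_batches: int, elapsed_s: float) -> float:
    """Throughput as images/sec: total images processed over wall time."""
    return batch_size * num_batches / elapsed_s

# Illustrative: 100 batches of 256 images in ~35.4 s is ~723 images/sec,
# in the range of the batch-256 AlexNet result below (timing is made up):
print(round(images_per_second(256, 100, 35.4), 1))  # -> 723.2
```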

TensorFlow 2.12, Device: CPU (images/sec; more is better)

Batch Size: 256, Model: AlexNet
  0xd000390: 723.27 (SE +/- 2.99, N = 3; min 719.10 / max 729.07)
  0xd0003a5: 781.25 (SE +/- 4.27, N = 3; min 776.40 / max 789.77)

Batch Size: 512, Model: AlexNet
  0xd000390: 760.15 (SE +/- 3.77, N = 3; min 752.64 / max 764.55)
  0xd0003a5: 839.41 (SE +/- 2.61, N = 3; min 835.77 / max 844.47)

Batch Size: 256, Model: GoogLeNet
  0xd000390: 309.63 (SE +/- 1.89, N = 3; min 306.14 / max 312.63)
  0xd0003a5: 321.72 (SE +/- 2.10, N = 3; min 317.53 / max 323.84)

Batch Size: 256, Model: ResNet-50
  0xd000390: 83.89 (SE +/- 0.37, N = 3; min 83.45 / max 84.63)
  0xd0003a5: 86.93 (SE +/- 0.80, N = 9; min 84.03 / max 90.03)

Batch Size: 512, Model: GoogLeNet
  0xd000390: 317.27 (SE +/- 0.87, N = 3; min 316.02 / max 318.94)
  0xd0003a5: 323.79 (SE +/- 0.43, N = 3; min 323.22 / max 324.64)

Batch Size: 512, Model: ResNet-50
  0xd000390: 85.97 (SE +/- 1.13, N = 3; min 83.72 / max 87.24)
  0xd0003a5: 84.84 (SE +/- 1.17, N = 3; min 82.71 / max 86.76)

Neural Magic DeepSparse

Neural Magic DeepSparse 1.5, Scenario: Asynchronous Multi-Stream (items/sec: more is better; ms/batch: fewer is better)

Model: NLP Document Classification, oBERT base uncased on IMDB
  0xd000390: 72.07 items/sec (SE +/- 0.09, N = 3; min 71.97 / max 72.24); 551.12 ms/batch (SE +/- 0.77, N = 3; min 550.25 / max 552.66)
  0xd0003a5: 71.89 items/sec (SE +/- 0.10, N = 3; min 71.69 / max 72.01); 553.27 ms/batch (SE +/- 1.19, N = 3; min 551.33 / max 555.44)

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8
  0xd000390: 2301.16 items/sec (SE +/- 1.14, N = 3; min 2299.03 / max 2302.92); 17.35 ms/batch (SE +/- 0.01, N = 3; min 17.34 / max 17.37)
  0xd0003a5: 2068.30 items/sec (SE +/- 3.15, N = 3; min 2063.88 / max 2074.40); 19.31 ms/batch (SE +/- 0.03, N = 3; min 19.25 / max 19.35)

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased
  0xd000390: 922.93 items/sec (SE +/- 11.47, N = 3; min 902.30 / max 941.92); 43.30 ms/batch (SE +/- 0.53, N = 3; min 42.43 / max 44.27)
  0xd0003a5: 930.75 items/sec (SE +/- 1.22, N = 3; min 928.58 / max 932.79); 42.94 ms/batch (SE +/- 0.06, N = 3; min 42.85 / max 43.04)

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90
  0xd000390: 230.05 items/sec (SE +/- 0.46, N = 3; min 229.32 / max 230.89); 173.79 ms/batch (SE +/- 0.34, N = 3; min 173.17 / max 174.33)
  0xd0003a5: 231.43 items/sec (SE +/- 2.37, N = 3; min 228.94 / max 236.16); 172.56 ms/batch (SE +/- 1.64, N = 3; min 169.30 / max 174.42)

Model: ResNet-50, Baseline
  0xd000390: 1006.34 items/sec (SE +/- 0.56, N = 3; min 1005.29 / max 1007.19); 39.70 ms/batch (SE +/- 0.02, N = 3; min 39.68 / max 39.73)
  0xd0003a5: 1013.82 items/sec (SE +/- 0.31, N = 3; min 1013.43 / max 1014.43); 39.41 ms/batch (SE +/- 0.01, N = 3; min 39.39 / max 39.43)

Model: ResNet-50, Sparse INT8
  0xd000390: 7633.26 items/sec (SE +/- 8.14, N = 3; min 7617.56 / max 7644.82); 5.2210 ms/batch (SE +/- 0.0056, N = 3; min 5.21 / max 5.23)
  0xd0003a5: 7092.50 items/sec (SE +/- 2.44, N = 3; min 7087.81 / max 7096.03); 5.6210 ms/batch (SE +/- 0.0018, N = 3; min 5.62 / max 5.62)

Model: CV Detection, YOLOv5s COCO
  0xd000390: 422.12 items/sec (SE +/- 0.36, N = 3; min 421.71 / max 422.84); 94.61 ms/batch (SE +/- 0.07, N = 3; min 94.48 / max 94.70)
  0xd0003a5: 412.05 items/sec (SE +/- 1.24, N = 3; min 410.22 / max 414.41); 96.94 ms/batch (SE +/- 0.29, N = 3; min 96.40 / max 97.40)

Model: BERT-Large, NLP Question Answering
  0xd000390: 98.34 items/sec (SE +/- 0.12, N = 3; min 98.11 / max 98.47); 405.97 ms/batch (SE +/- 0.42, N = 3; min 405.31 / max 406.76)
  0xd0003a5: 94.26 items/sec (SE +/- 0.51, N = 3; min 93.26 / max 94.95); 421.75 ms/batch (SE +/- 2.15, N = 3; min 418.23 / max 425.64)

Model: CV Classification, ResNet-50 ImageNet
  0xd000390: 1005.58 items/sec (SE +/- 1.41, N = 3)
  0xd0003a5: 987.55 items/sec (SE +/- 2.17, N = 3)
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a52004006008001000Min: 1003.93 / Avg: 1005.58 / Max: 1008.39Min: 983.81 / Avg: 987.55 / Max: 991.34

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5918273645SE +/- 0.06, N = 3SE +/- 0.10, N = 339.7340.45
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5816243240Min: 39.61 / Avg: 39.73 / Max: 39.79Min: 40.29 / Avg: 40.45 / Max: 40.63

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a590180270360450SE +/- 0.21, N = 3SE +/- 4.97, N = 4426.95398.72
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a580160240320400Min: 426.57 / Avg: 426.95 / Max: 427.3Min: 383.81 / Avg: 398.72 / Max: 404.35

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a520406080100SE +/- 0.07, N = 3SE +/- 1.28, N = 493.54100.19
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a520406080100Min: 93.42 / Avg: 93.54 / Max: 93.64Min: 98.8 / Avg: 100.19 / Max: 104.02

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5140280420560700SE +/- 2.45, N = 3SE +/- 7.47, N = 3644.44620.25
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5110220330440550Min: 640.17 / Avg: 644.44 / Max: 648.66Min: 610.87 / Avg: 620.25 / Max: 635.02

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51428425670SE +/- 0.23, N = 3SE +/- 0.78, N = 362.0164.44
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51326395265Min: 61.62 / Avg: 62.01 / Max: 62.42Min: 62.91 / Avg: 64.44 / Max: 65.43

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a520406080100SE +/- 0.32, N = 3SE +/- 0.18, N = 384.0283.20
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51632486480Min: 83.39 / Avg: 84.02 / Max: 84.4Min: 82.85 / Avg: 83.2 / Max: 83.46

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5100200300400500SE +/- 1.00, N = 3SE +/- 1.39, N = 3474.69478.85
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a580160240320400Min: 473.52 / Avg: 474.69 / Max: 476.67Min: 476.4 / Avg: 478.85 / Max: 481.21

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a52004006008001000SE +/- 1.88, N = 3SE +/- 1.83, N = 31051.79869.04
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a52004006008001000Min: 1049.78 / Avg: 1051.79 / Max: 1055.54Min: 865.38 / Avg: 869.04 / Max: 870.89

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51020304050SE +/- 0.07, N = 3SE +/- 0.10, N = 337.9945.96
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5918273645Min: 37.86 / Avg: 37.99 / Max: 38.07Min: 45.85 / Avg: 45.96 / Max: 46.15

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a570140210280350SE +/- 0.57, N = 3SE +/- 1.05, N = 3307.87293.12
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a560120180240300Min: 306.76 / Avg: 307.87 / Max: 308.64Min: 291.74 / Avg: 293.12 / Max: 295.19

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5306090120150SE +/- 0.25, N = 3SE +/- 0.60, N = 3129.81136.18
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5306090120150Min: 129.5 / Avg: 129.81 / Max: 130.3Min: 134.98 / Avg: 136.18 / Max: 136.91

OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51632486480SE +/- 0.15, N = 3SE +/- 0.12, N = 372.1871.54
OpenBenchmarking.orgitems/sec, More Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a51428425670Min: 71.92 / Avg: 72.18 / Max: 72.43Min: 71.39 / Avg: 71.54 / Max: 71.78

OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5120240360480600SE +/- 0.62, N = 3SE +/- 1.12, N = 3551.82555.09
OpenBenchmarking.orgms/batch, Fewer Is BetterNeural Magic DeepSparse 1.5Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream0xd0003900xd0003a5100200300400500Min: 550.61 / Avg: 551.82 / Max: 552.65Min: 552.85 / Avg: 555.09 / Max: 556.34
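
Each result above is reported as a run average together with an "SE +/-" figure over N runs: the standard error of the mean, i.e. the sample standard deviation divided by sqrt(N). A minimal sketch of that calculation, using the published min and max for the ResNet-50 Baseline throughput; the middle run is hypothetical, since only min, average, and max are published:

```python
import statistics

# Three-run sample for ResNet-50 Baseline items/sec: real min and max,
# hypothetical middle run (only min/avg/max appear in the results).
runs = [1005.29, 1006.54, 1007.19]

n = len(runs)
avg = statistics.mean(runs)
# Standard error of the mean: sample standard deviation / sqrt(N).
se = statistics.stdev(runs) / n ** 0.5

print(f"Avg: {avg:.2f}  SE +/- {se:.2f}  (N = {n})")
```

With these values the sketch reproduces the published Avg of 1006.34 and SE of +/- 0.56 for that result.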

OpenVINO

This is a test of Intel OpenVINO, a toolkit for optimizing and deploying neural network inference, using its built-in benchmarking support to measure the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.3 - Device: CPU
(FPS: more is better; average latency in ms: fewer is better; values are run averages, N = 3)

Model                                             FPS                     Latency (ms)
                                          0xd000390  0xd0003a5      0xd000390  0xd0003a5
Face Detection FP16                           24.04      24.18         827.51     823.09
Person Detection FP16                         13.29      13.25        1490.76    1496.19
Person Detection FP32                         13.03      13.04        1517.69    1519.84
Vehicle Detection FP16                      1121.56    1117.09          17.80      17.87
Face Detection FP16-INT8                      95.42      95.54         209.31     209.02
Vehicle Detection FP16-INT8                 4419.17    4442.98           4.51       4.49
Weld Porosity Detection FP16                2344.97    2362.84          33.89      33.63
Machine Translation EN To DE FP16            251.00     255.09          79.47      78.18
Weld Porosity Detection FP16-INT8           9396.52    9419.77           8.50       8.48
Person Vehicle Bike Detection FP16          2039.63    2070.72           9.77       9.63
Age Gender Recognition Retail 0013 FP16    59274.06   59377.96           1.33       1.33
Age Gender Recognition Retail 0013
  FP16-INT8                                67604.00   67754.34           1.16       1.16

(CXX) g++ options: -isystem -fsigned-char -ffunction-sections -fdata-sections -msse4.1 -msse4.2 -O3 -fno-strict-overflow -fwrapv -fPIC -fvisibility=hidden -Os -std=c++11 -MD -MT -MF
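
To condense the microcode delta into a single number, one can take the geometric mean of the per-model FPS ratios, which is the same aggregation OpenBenchmarking.org offers via its "Show Overall Geometric Mean" option. A minimal sketch using the OpenVINO CPU averages reported above:

```python
import math

# FPS pairs (0xd000390, 0xd0003a5) from the OpenVINO CPU results above.
fps = {
    "Face Detection FP16":                          (24.04, 24.18),
    "Person Detection FP16":                        (13.29, 13.25),
    "Person Detection FP32":                        (13.03, 13.04),
    "Vehicle Detection FP16":                       (1121.56, 1117.09),
    "Face Detection FP16-INT8":                     (95.42, 95.54),
    "Vehicle Detection FP16-INT8":                  (4419.17, 4442.98),
    "Weld Porosity Detection FP16":                 (2344.97, 2362.84),
    "Machine Translation EN To DE FP16":            (251.00, 255.09),
    "Weld Porosity Detection FP16-INT8":            (9396.52, 9419.77),
    "Person Vehicle Bike Detection FP16":           (2039.63, 2070.72),
    "Age Gender Recognition Retail 0013 FP16":      (59274.06, 59377.96),
    "Age Gender Recognition Retail 0013 FP16-INT8": (67604.00, 67754.34),
}

# The geometric mean of per-model ratios summarizes relative throughput
# across workloads whose absolute FPS differ by orders of magnitude.
ratios = [new / old for old, new in fps.values()]
geo = math.exp(sum(map(math.log, ratios)) / len(ratios))
print(f"0xd0003a5 vs. 0xd000390, geometric mean of FPS ratios: {geo:.4f}")
```

A geometric mean near 1.00 indicates the newer microcode is effectively throughput-neutral for the OpenVINO workloads.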

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inference and training accelerator. This test profile runs ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: GPT-2 - Device: CPU - Executor: Standard0xd0003900xd0003a54080120160200SE +/- 0.08, N = 3SE +/- 4.37, N = 15180.16190.571. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: GPT-2 - Device: CPU - Executor: Standard0xd0003900xd0003a5306090120150Min: 180.08 / Avg: 180.16 / Max: 180.32Min: 171.28 / Avg: 190.57 / Max: 212.671. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: yolov4 - Device: CPU - Executor: Standard0xd0003900xd0003a53691215SE +/- 0.11, N = 7SE +/- 0.13, N = 1511.6611.151. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: yolov4 - Device: CPU - Executor: Standard0xd0003900xd0003a53691215Min: 11.3 / Avg: 11.66 / Max: 11.98Min: 10.41 / Avg: 11.15 / Max: 12.011. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: bertsquad-12 - Device: CPU - Executor: Standard0xd0003900xd0003a548121620SE +/- 0.08, N = 3SE +/- 0.26, N = 1216.7116.311. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.14Model: bertsquad-12 - Device: CPU - Executor: Standard0xd0003900xd0003a548121620Min: 16.57 / Avg: 16.71 / Max: 16.84Min: 13.58 / Avg: 16.31 / Max: 16.871. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.14 (Inferences Per Second, More Is Better)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  0xd000390: 696.73 (SE +/- 3.08, N = 3; Min: 690.83 / Max: 701.24)
  0xd0003a5: 664.54 (SE +/- 12.01, N = 15; Min: 610.70 / Max: 715.69)

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  0xd000390: 9.08067 (SE +/- 0.10283, N = 15; Min: 8.33 / Max: 9.47)
  0xd0003a5: 8.59908 (SE +/- 0.23491, N = 15; Min: 7.47 / Max: 9.52)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  0xd000390: 39.12 (SE +/- 0.46, N = 3; Min: 38.44 / Max: 40.00)
  0xd0003a5: 38.72 (SE +/- 0.38, N = 15; Min: 35.67 / Max: 40.33)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  0xd000390: 221.02 (SE +/- 1.39, N = 3; Min: 218.26 / Max: 222.64)
  0xd0003a5: 216.48 (SE +/- 1.91, N = 8; Min: 211.90 / Max: 224.15)

Model: super-resolution-10 - Device: CPU - Executor: Standard
  0xd000390: 158.36 (SE +/- 0.13, N = 3; Min: 158.10 / Max: 158.51)
  0xd0003a5: 158.01 (SE +/- 10.25, N = 15; Min: 115.82 / Max: 200.11)

1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt
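Each result above reports an average together with "SE +/-" and N, and a companion Min/Avg/Max spread. The SE shown is the standard error of the mean across the N runs. A minimal sketch of that computation with Python's standard library (the sample values below are hypothetical, not taken from this result file):

```python
import math
import statistics

def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))

# Hypothetical three-run sample, for illustration only
runs = [690.0, 697.0, 701.0]
print(f"Avg: {statistics.mean(runs):.2f}  SE +/- {standard_error(runs):.2f}")
```

A large SE relative to the average (e.g. super-resolution-10 on 0xd0003a5 above) flags a noisy result.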

Timed MrBayes Analysis

This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

Timed MrBayes Analysis 3.2.7, Primate Phylogeny Analysis (Seconds, Fewer Is Better)
  0xd000390: 166.53 (SE +/- 1.36, N = 3; Min: 164.75 / Max: 169.20)
  0xd0003a5: 165.42 (SE +/- 0.98, N = 3; Min: 163.96 / Max: 167.28)
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

QMCPACK

QMCPACK is a modern, high-performance, open-source, production-level many-body ab initio Quantum Monte Carlo (QMC) code for computing the electronic structure of atoms, molecules, and solids. This benchmark runs its example inputs (including the H2O example) and makes use of MPI. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

QMCPACK 3.16 (Total Execution Time - Seconds, Fewer Is Better)

Input: Li2_STO_ae
  0xd000390: 124.23 (SE +/- 1.55, N = 3; Min: 121.55 / Max: 126.91)
  0xd0003a5: 123.26 (SE +/- 1.03, N = 3; Min: 121.46 / Max: 125.01)

Input: simple-H2O
  0xd000390: 39.56 (SE +/- 0.12, N = 3; Min: 39.39 / Max: 39.80)
  0xd0003a5: 41.25 (SE +/- 0.02, N = 3; Min: 41.20 / Max: 41.28)

Input: FeCO6_b3lyp_gms
  0xd000390: 147.51 (SE +/- 0.12, N = 3; Min: 147.28 / Max: 147.64)
  0xd0003a5: 178.19 (SE +/- 0.31, N = 3; Min: 177.74 / Max: 178.79)

Input: FeCO6_b3lyp_gms
  0xd000390: 268.56 (SE +/- 3.60, N = 3; Min: 261.97 / Max: 274.38)
  0xd0003a5: 263.23 (SE +/- 2.19, N = 3; Min: 259.25 / Max: 266.81)

1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
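The largest swing in this section is QMCPACK's FeCO6_b3lyp_gms run: 147.51 seconds on 0xd000390 versus 178.19 seconds on 0xd0003a5. For seconds-based (fewer-is-better) results, the relative change between the two microcode revisions can be computed with a small helper; this is a generic reading aid for the tables, not part of the Phoronix Test Suite itself:

```python
def percent_change(baseline, value, lower_is_better=True):
    """Relative change of value vs baseline, in percent; positive = improvement."""
    if lower_is_better:
        return (baseline - value) / baseline * 100.0
    return (value - baseline) / baseline * 100.0

# QMCPACK FeCO6_b3lyp_gms total execution time (seconds) from the table above;
# a negative result means 0xd0003a5 is slower than 0xd000390 here.
print(f"{percent_change(147.51, 178.19):.1f}%")
```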

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

dav1d 1.2.1 (FPS, More Is Better)

Video Input: Chimera 1080p
  0xd000390: 515.81 (SE +/- 0.80, N = 3; Min: 514.88 / Max: 517.40)
  0xd0003a5: 514.58 (SE +/- 0.51, N = 3; Min: 513.61 / Max: 515.33)

Video Input: Summer Nature 4K
  0xd000390: 281.36 (SE +/- 1.06, N = 3; Min: 279.24 / Max: 282.49)
  0xd0003a5: 280.84 (SE +/- 0.81, N = 3; Min: 279.23 / Max: 281.74)

1. (CC) gcc options: -pthread -lm

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT) effort; development has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based, multi-threaded encoder for the AV1 video format, and this test encodes a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-AV1 1.6 (Frames Per Second, More Is Better)

Encoder Mode: Preset 8 - Input: Bosphorus 4K
  0xd000390: 67.17 (SE +/- 0.47, N = 3; Min: 66.24 / Max: 67.70)
  0xd0003a5: 66.46 (SE +/- 0.46, N = 3; Min: 65.62 / Max: 67.19)

Encoder Mode: Preset 12 - Input: Bosphorus 4K
  0xd000390: 180.97 (SE +/- 1.29, N = 3; Min: 178.73 / Max: 183.19)
  0xd0003a5: 177.72 (SE +/- 1.20, N = 3; Min: 175.43 / Max: 179.50)

Encoder Mode: Preset 13 - Input: Bosphorus 4K
  0xd000390: 175.10 (SE +/- 0.79, N = 3; Min: 173.72 / Max: 176.46)
  0xd0003a5: 177.20 (SE +/- 2.02, N = 3; Min: 174.87 / Max: 181.22)

1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format, encoding a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)

Tuning: 1 - Input: Bosphorus 4K
  0xd000390: 10.46 (SE +/- 0.06, N = 3; Min: 10.39 / Max: 10.58)
  0xd0003a5: 10.45 (SE +/- 0.01, N = 3; Min: 10.43 / Max: 10.47)

Tuning: 7 - Input: Bosphorus 4K
  0xd000390: 138.75 (SE +/- 0.60, N = 3; Min: 137.68 / Max: 139.76)
  0xd0003a5: 138.49 (SE +/- 0.44, N = 3; Min: 137.61 / Max: 138.95)

Tuning: 10 - Input: Bosphorus 4K
  0xd000390: 184.38 (SE +/- 2.03, N = 3; Min: 181.00 / Max: 188.03)
  0xd0003a5: 182.74 (SE +/- 0.38, N = 3; Min: 181.98 / Max: 183.21)

1. (CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9 video format. Learn more via the OpenBenchmarking.org test page.

VP9 libvpx Encoding 1.13, Speed: Speed 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  0xd000390: 12.63 (SE +/- 0.12, N = 3; Min: 12.39 / Max: 12.80)
  0xd0003a5: 12.33 (SE +/- 0.13, N = 3; Min: 12.10 / Max: 12.54)
1. (CXX) g++ options: -m64 -lm -lpthread -O3 -fPIC -U_FORTIFY_SOURCE -std=gnu++11

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

VVenC 1.9 (Frames Per Second, More Is Better)

Video Input: Bosphorus 4K - Video Preset: Fast
  0xd000390: 5.722 (SE +/- 0.033, N = 3; Min: 5.69 / Max: 5.79)
  0xd0003a5: 5.705 (SE +/- 0.029, N = 3; Min: 5.65 / Max: 5.75)

Video Input: Bosphorus 4K - Video Preset: Faster
  0xd000390: 10.36 (SE +/- 0.09, N = 3; Min: 10.18 / Max: 10.49)
  0xd0003a5: 10.42 (SE +/- 0.06, N = 3; Min: 10.30 / Max: 10.49)

1. (CXX) g++ options: -O3 -flto=auto -fno-fat-lto-objects

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 2.0 (Images / Sec, More Is Better)

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
  0xd000390: 3.03 (SE +/- 0.00, N = 3; Min: 3.02 / Max: 3.03)
  0xd0003a5: 3.03 (SE +/- 0.00, N = 3; Min: 3.02 / Max: 3.03)

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
  0xd000390: 1.46 (SE +/- 0.00, N = 3; Min: 1.46 / Max: 1.46)
  0xd0003a5: 1.46 (SE +/- 0.00, N = 3; Min: 1.46 / Max: 1.47)

NCNN

NCNN is a high-performance neural network inference framework developed by Tencent, optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.

NCNN 20230517 (ms, Fewer Is Better; Min/Max give the spread across the N runs, MIN/MAX are the extremes reported by the benchmark itself)

Target: CPU - Model: mobilenet
  0xd000390: 15.46 (SE +/- 0.09, N = 3; Min: 15.28 / Max: 15.56; MIN: 14.85 / MAX: 106.56)
  0xd0003a5: 15.66 (SE +/- 0.07, N = 3; Min: 15.56 / Max: 15.80; MIN: 15.24 / MAX: 38.75)

Target: CPU-v2-v2 - Model: mobilenet-v2
  0xd000390: 8.03 (SE +/- 0.09, N = 3; Min: 7.90 / Max: 8.21; MIN: 7.80 / MAX: 31.34)
  0xd0003a5: 7.96 (SE +/- 0.04, N = 3; Min: 7.90 / Max: 8.03; MIN: 7.75 / MAX: 31.75)

Target: CPU-v3-v3 - Model: mobilenet-v3
  0xd000390: 8.88 (SE +/- 0.05, N = 3; Min: 8.79 / Max: 8.95; MIN: 8.69 / MAX: 32.33)
  0xd0003a5: 8.76 (SE +/- 0.12, N = 3; Min: 8.54 / Max: 8.95; MIN: 8.30 / MAX: 32.60)

Target: CPU - Model: shufflenet-v2
  0xd000390: 9.89 (SE +/- 0.10, N = 3; Min: 9.74 / Max: 10.09; MIN: 9.61 / MAX: 33.59)
  0xd0003a5: 9.76 (SE +/- 0.13, N = 3; Min: 9.51 / Max: 9.91; MIN: 9.32 / MAX: 33.40)

Target: CPU - Model: mnasnet
  0xd000390: 7.57 (SE +/- 0.10, N = 3; Min: 7.45 / Max: 7.77; MIN: 7.28 / MAX: 30.26)
  0xd0003a5: 7.41 (SE +/- 0.04, N = 3; Min: 7.35 / Max: 7.49; MIN: 7.15 / MAX: 31.17)

Target: CPU - Model: efficientnet-b0
  0xd000390: 11.62 (SE +/- 0.38, N = 3; Min: 11.09 / Max: 12.35; MIN: 10.82 / MAX: 21.03)
  0xd0003a5: 11.71 (SE +/- 0.11, N = 3; Min: 11.52 / Max: 11.91; MIN: 11.15 / MAX: 19.87)

Target: CPU - Model: blazeface
  0xd000390: 4.35 (SE +/- 0.06, N = 3; Min: 4.25 / Max: 4.47; MIN: 4.16 / MAX: 4.97)
  0xd0003a5: 4.54 (SE +/- 0.07, N = 3; Min: 4.47 / Max: 4.69; MIN: 4.37 / MAX: 5.17)

Target: CPU - Model: googlenet
  0xd000390: 15.36 (SE +/- 0.17, N = 3; Min: 15.03 / Max: 15.62; MIN: 14.60 / MAX: 182.22)
  0xd0003a5: 16.58 (SE +/- 0.48, N = 3; Min: 15.78 / Max: 17.44; MIN: 15.29 / MAX: 39.63)

Target: CPU - Model: vgg16
  0xd000390: 23.86 (SE +/- 0.25, N = 3; Min: 23.36 / Max: 24.17; MIN: 23.05 / MAX: 47.41)
  0xd0003a5: 25.71 (SE +/- 0.19, N = 3; Min: 25.46 / Max: 26.09; MIN: 24.88 / MAX: 62.96)

Target: CPU - Model: resnet18
  0xd000390: 8.97 (SE +/- 0.14, N = 3; Min: 8.82 / Max: 9.25; MIN: 8.63 / MAX: 27.27)
  0xd0003a5: 9.42 (SE +/- 0.09, N = 2; Min: 9.33 / Max: 9.51; MIN: 9.21 / MAX: 32.80)

Target: CPU - Model: alexnet
  0xd000390: 5.39 (SE +/- 0.16, N = 3; Min: 5.07 / Max: 5.59; MIN: 4.83 / MAX: 151.52)
  0xd0003a5: 5.46 (SE +/- 0.15, N = 3; Min: 5.19 / Max: 5.70; MIN: 5.01 / MAX: 29.14)

Target: CPU - Model: resnet50
  0xd000390: 17.15 (SE +/- 0.52, N = 3; Min: 16.38 / Max: 18.14; MIN: 16.19 / MAX: 41.83)
  0xd0003a5: 18.51 (SE +/- 0.69, N = 3; Min: 17.62 / Max: 19.87; MIN: 17.32 / MAX: 299.93)

Target: CPU - Model: yolov4-tiny
  0xd000390: 23.71 (SE +/- 0.22, N = 3; Min: 23.27 / Max: 23.95; MIN: 22.57 / MAX: 46.11)
  0xd0003a5: 24.10 (SE +/- 0.17, N = 3; Min: 23.77 / Max: 24.31; MIN: 23.25 / MAX: 51.47)

Target: CPU - Model: squeezenet_ssd
  0xd000390: 15.34 (SE +/- 0.12, N = 3; Min: 15.16 / Max: 15.58; MIN: 14.63 / MAX: 39.65)
  0xd0003a5: 16.10 (SE +/- 0.25, N = 3; Min: 15.77 / Max: 16.58; MIN: 15.43 / MAX: 48.02)

Target: CPU - Model: regnety_400m
  0xd000390: 45.54 (SE +/- 7.69, N = 3; Min: 37.74 / Max: 60.92; MIN: 36.01 / MAX: 3343.68)
  0xd0003a5: 38.85 (SE +/- 0.50, N = 3; Min: 37.87 / Max: 39.51; MIN: 37.13 / MAX: 233.44)

Target: CPU - Model: vision_transformer
  0xd000390: 46.92 (SE +/- 1.17, N = 3; Min: 44.59 / Max: 48.21; MIN: 43.11 / MAX: 881.49)
  0xd0003a5: 45.23 (SE +/- 0.31, N = 3; Min: 44.70 / Max: 45.76; MIN: 43.43 / MAX: 73.39)

Target: CPU - Model: FastestDet
  0xd000390: 10.20 (SE +/- 0.64, N = 3; Min: 9.48 / Max: 11.48; MIN: 9.01 / MAX: 500.17)
  0xd0003a5: 9.71 (SE +/- 0.07, N = 3; Min: 9.63 / Max: 9.84; MIN: 9.22 / MAX: 27.29)

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Blender

Blender 3.6 (Seconds, Fewer Is Better)

Blend File: BMW27 - Compute: CPU-Only
  0xd000390: 23.83 (SE +/- 0.06, N = 3; Min: 23.74 / Max: 23.93)
  0xd0003a5: 23.72 (SE +/- 0.03, N = 3; Min: 23.69 / Max: 23.77)

Blend File: Fishy Cat - Compute: CPU-Only
  0xd000390: 30.74 (SE +/- 0.06, N = 3; Min: 30.67 / Max: 30.87)
  0xd0003a5: 30.90 (SE +/- 0.02, N = 3; Min: 30.86 / Max: 30.94)
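The result viewer's "Show Overall Geometric Mean" option condenses the many results above into a single figure. For scores normalized so that higher is better, the geometric mean is the nth root of the product, usually computed via logarithms for numerical stability. A minimal sketch of the idea, not the Phoronix Test Suite's actual implementation:

```python
import math

def geometric_mean(values):
    """Geometric mean via logs: exp(mean(log(v))) avoids overflow on long products."""
    assert all(v > 0 for v in values), "geometric mean requires positive values"
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Two normalized scores of 2.0x and 8.0x average to 4.0x (not the arithmetic 5.0x),
# so one outlier result cannot dominate the overall figure.
print(geometric_mean([2.0, 8.0]))
```

This is why the geometric mean, rather than the arithmetic mean, is the standard aggregate for cross-test comparisons like this one.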

165 Results Shown

miniBUDE:
  OpenMP - BM1:
    GFInst/s
    Billion Interactions/s
  OpenMP - BM2:
    GFInst/s
    Billion Interactions/s
CloverLeaf
libxsmm:
  128
  256
  32
  64
Laghos:
  Triple Point Problem
  Sedov Blast Wave, ube_922_hex.mesh
HeFFTe - Highly Efficient FFT for Exascale:
  c2c - FFTW - float - 128
  c2c - FFTW - float - 256
  r2c - FFTW - float - 128
  r2c - FFTW - float - 256
  c2c - FFTW - double - 128
  c2c - FFTW - double - 256
  r2c - FFTW - double - 128
  r2c - FFTW - double - 256
Palabos:
  100
  400
  500
Xcompact3d Incompact3d
Remhos
SPECFEM3D:
  Mount St. Helens
  Layered Halfspace
  Tomographic Model
  Homogeneous Halfspace
  Water-layered Halfspace
simdjson:
  Kostya
  TopTweet
  LargeRand
  PartialTweets
  DistinctUserID
Embree:
  Pathtracer ISPC - Crown
  Pathtracer ISPC - Asian Dragon
OpenVKL
OSPRay:
  particle_volume/ao/real_time
  particle_volume/scivis/real_time
  particle_volume/pathtracer/real_time
  gravity_spheres_volume/dim_512/ao/real_time
  gravity_spheres_volume/dim_512/scivis/real_time
oneDNN:
  IP Shapes 3D - bf16bf16bf16 - CPU
  Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
  Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
  Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
  Recurrent Neural Network Training - bf16bf16bf16 - CPU
  Recurrent Neural Network Inference - bf16bf16bf16 - CPU
Cpuminer-Opt:
  Magi
  x25x
  scrypt
  Deepcoin
  Blake-2 S
  Garlicoin
  Skeincoin
  Myriad-Groestl
  LBC, LBRY Credits
  Quad SHA-256, Pyrite
  Triple SHA-256, Onecoin
GROMACS
TensorFlow:
  CPU - 256 - AlexNet
  CPU - 512 - AlexNet
  CPU - 256 - GoogLeNet
  CPU - 256 - ResNet-50
  CPU - 512 - GoogLeNet
  CPU - 512 - ResNet-50
Neural Magic DeepSparse:
  NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  ResNet-50, Baseline - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
  NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
OpenVINO:
  Face Detection FP16 - CPU:
    FPS
    ms
  Person Detection FP16 - CPU:
    FPS
    ms
  Person Detection FP32 - CPU:
    FPS
    ms
  Vehicle Detection FP16 - CPU:
    FPS
    ms
  Face Detection FP16-INT8 - CPU:
    FPS
    ms
  Vehicle Detection FP16-INT8 - CPU:
    FPS
    ms
  Weld Porosity Detection FP16 - CPU:
    FPS
    ms
  Machine Translation EN To DE FP16 - CPU:
    FPS
    ms
  Weld Porosity Detection FP16-INT8 - CPU:
    FPS
    ms
  Person Vehicle Bike Detection FP16 - CPU:
    FPS
    ms
  Age Gender Recognition Retail 0013 FP16 - CPU:
    FPS
    ms
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    FPS
    ms
ONNX Runtime:
  GPT-2 - CPU - Standard
  yolov4 - CPU - Standard
  bertsquad-12 - CPU - Standard
  CaffeNet 12-int8 - CPU - Standard
  fcn-resnet101-11 - CPU - Standard
  ArcFace ResNet-100 - CPU - Standard
  ResNet50 v1-12-int8 - CPU - Standard
  super-resolution-10 - CPU - Standard
Timed MrBayes Analysis
QMCPACK:
  Li2_STO_ae
  simple-H2O
  FeCO6_b3lyp_gms
  FeCO6_b3lyp_gms
dav1d:
  Chimera 1080p
  Summer Nature 4K
SVT-AV1:
  Preset 8 - Bosphorus 4K
  Preset 12 - Bosphorus 4K
  Preset 13 - Bosphorus 4K
SVT-HEVC:
  1 - Bosphorus 4K
  7 - Bosphorus 4K
  10 - Bosphorus 4K
VP9 libvpx Encoding
VVenC:
  Bosphorus 4K - Fast
  Bosphorus 4K - Faster
Intel Open Image Denoise:
  RT.ldr_alb_nrm.3840x2160 - CPU-Only
  RTLightmap.hdr.4096x4096 - CPU-Only
NCNN:
  CPU - mobilenet
  CPU-v2-v2 - mobilenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU - shufflenet-v2
  CPU - mnasnet
  CPU - efficientnet-b0
  CPU - blazeface
  CPU - googlenet
  CPU - vgg16
  CPU - resnet18
  CPU - alexnet
  CPU - resnet50
  CPU - yolov4-tiny
  CPU - squeezenet_ssd
  CPU - regnety_400m
  CPU - vision_transformer
  CPU - FastestDet
Blender:
  BMW27 - CPU-Only
  Fishy Cat - CPU-Only