Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with a Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED on Ubuntu 22.10 via the Phoronix Test Suite.

0xd000390

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads), Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS), Chipset: Intel Ice Lake IEH, Memory: 512GB, Disk: 7682GB INTEL SSDPF2KX076TZ, Graphics: ASPEED, Monitor: VE228, Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP

OS: Ubuntu 22.10, Kernel: 6.5.0-060500rc4daily20230804-generic (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.3, Vulkan: 1.3.224, Compiler: GCC 12.2.0, File-System: ext4, Screen Resolution: 1920x1080

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd000390
Python Notes: Python 3.10.7
Security Notes: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

0xd0003a5

OS: Ubuntu 22.10, Kernel: 6.5.0-rc5-phx-tues (x86_64), Desktop: GNOME Shell 43.0, Display Server: X Server 1.21.1.3, Vulkan: 1.3.224, Compiler: GCC 12.2.0, File-System: ext4, Screen Resolution: 1920x1080

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5
Python Notes: Python 3.10.7
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as a open-source, cross-platform, high performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Model Zoo. Learn more via the OpenBenchmarking.org test page.

Result

Inference Time Cost (ms)

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library that offers high-performance volume computation kernels and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

oneDNN

This is a test of the Intel oneDNN as an Intel-optimized library for Deep Neural Networks and making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime

Result

Inference Time Cost (ms)

Result

Inference Time Cost (ms)

Result

Inference Time Cost (ms)

Result

Inference Time Cost (ms)

Result

Inference Time Cost (ms)

QMCPACK

QMCPACK is a modern high-performance open-source Quantum Monte Carlo (QMC) simulation code making use of MPI for this benchmark of the H20 example code. QMCPACK is an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids. QMCPACK is supported by the U.S. Department of Energy. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime

Result

Inference Time Cost (ms)

libxsmm

TensorFlow

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Timed MrBayes Analysis

This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime

Result

Inference Time Cost (ms)

QMCPACK

Palabos

The Palabos library is a framework for general purpose Computational Fluid Dynamics (CFD). Palabos uses a kernel based on the Lattice Boltzmann method. This test profile uses the Palabos MPI-based Cavity3D benchmark. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

OSPRay

Neural Magic DeepSparse

Palabos

QMCPACK

Palabos

NCNN

NCNN is a high performance neural network inference framework optimized for mobile and other platforms developed by Tencent. Learn more via the OpenBenchmarking.org test page.

VVenC

VVenC is the Fraunhofer Versatile Video Encoder as a fast/efficient H.266/VVC encoder. The vvenc encoder makes use of SIMD Everywhere (SIMDe). The vvenc software is published under the Clear BSD License. Learn more via the OpenBenchmarking.org test page.

OSPRay

TensorFlow

Cpuminer-Opt

Laghos

Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping. Learn more via the OpenBenchmarking.org test page.

oneDNN

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

TensorFlow

Neural Magic DeepSparse

OpenVINO

This is a test of the Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support and analyzing the throughput and latency for various models. Learn more via the OpenBenchmarking.org test page.

VVenC

OpenVINO

Neural Magic DeepSparse

OpenVINO

SVT-HEVC

This is a test of the Intel Open Visual Cloud Scalable Video Technology SVT-HEVC CPU-based multi-threaded video encoder for the HEVC / H.265 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

simdjson

Neural Magic DeepSparse

OSPRay

Neural Magic DeepSparse

simdjson

VP9 libvpx Encoding

This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP9 video format. Learn more via the OpenBenchmarking.org test page.

miniBUDE

MiniBUDE is a mini application for the the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D

simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra. This test profile currently relies on CPU-based execution for SPECFEM3D and using a variety of their built-in examples/models for benchmarking. Learn more via the OpenBenchmarking.org test page.

oneDNN

Neural Magic DeepSparse

Laghos

Neural Magic DeepSparse

QMCPACK

TensorFlow

Neural Magic DeepSparse

libxsmm

SPECFEM3D

Blender

libxsmm

Cpuminer-Opt

GROMACS

The GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package testing with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

Blender

oneDNN

SPECFEM3D

dav1d

Dav1d is an open-source, speedy AV1 video decoder supporting modern SIMD CPU features. This test profile times how long it takes to decode sample AV1 video content. Learn more via the OpenBenchmarking.org test page.

SPECFEM3D

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

dav1d

SVT-AV1

This is a benchmark of the SVT-AV1 open-source video encoder/decoder. SVT-AV1 was originally developed by Intel as part of their Open Visual Cloud / Scalable Video Technology (SVT). Development of SVT-AV1 has since moved to the Alliance for Open Media as part of upstream AV1 development. SVT-AV1 is a CPU-based multi-threaded video encoder for the AV1 video format with a sample YUV video file. Learn more via the OpenBenchmarking.org test page.

Remhos

Remhos (REMap High-Order Solver) is a miniapp that solves the pure advection equations that are used to perform monotonic and conservative discontinuous field interpolation (remap) as part of the Eulerian phase in Arbitrary Lagrangian Eulerian (ALE) simulations. Learn more via the OpenBenchmarking.org test page.

CloverLeaf

CloverLeaf is a Lagrangian-Eulerian hydrodynamics benchmark. This test profile currently makes use of CloverLeaf's OpenMP version and benchmarked with the clover_bm.in input file (Problem 5). Learn more via the OpenBenchmarking.org test page.

Xcompact3d Incompact3d

Xcompact3d Incompact3d is a Fortran-MPI based, finite difference high-performance code for solving the incompressible Navier-Stokes equation and as many as you need scalar transport equations. Learn more via the OpenBenchmarking.org test page.

miniBUDE

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Embree

SVT-HEVC

oneDNN

SVT-HEVC

SVT-AV1

HeFFTe - Highly Efficient FFT for Exascale

HeFFTe is the Highly Efficient FFT for Exascale software developed as part of the Exascale Computing Project. This test profile uses HeFFTe's built-in speed benchmarks under a variety of configuration options and currently catering to CPU/processor testing. Learn more via the OpenBenchmarking.org test page.

oneDNN

HeFFTe - Highly Efficient FFT for Exascale

173 Results Shown

TensorFlow:
CPU - 512 - ResNet-50
CPU - 256 - ResNet-50
libxsmm
ONNX Runtime:
fcn-resnet101-11 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
OpenVKL
oneDNN
ONNX Runtime:
yolov4 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
GPT-2 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
ArcFace ResNet-100 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
CaffeNet 12-int8 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
super-resolution-10 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
QMCPACK
ONNX Runtime:
bertsquad-12 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
libxsmm
TensorFlow
OSPRay
Timed MrBayes Analysis
ONNX Runtime:
ResNet50 v1-12-int8 - CPU - Standard:
Inference Time Cost (ms)
Inferences Per Second
QMCPACK
Palabos
Cpuminer-Opt
OSPRay
Neural Magic DeepSparse:
BERT-Large, NLP Question Answering - Asynchronous Multi-Stream:
ms/batch
items/sec
Palabos
QMCPACK
Palabos
NCNN:
CPU - FastestDet
CPU - vision_transformer
CPU - regnety_400m
CPU - squeezenet_ssd
CPU - yolov4-tiny
CPU - resnet50
CPU - alexnet
CPU - resnet18
CPU - vgg16
CPU - googlenet
CPU - blazeface
CPU - efficientnet-b0
CPU - mnasnet
CPU - shufflenet-v2
CPU-v3-v3 - mobilenet-v3
CPU-v2-v2 - mobilenet-v2
CPU - mobilenet
VVenC
OSPRay
TensorFlow
Cpuminer-Opt
Laghos
oneDNN
simdjson:
PartialTweets
DistinctUserID
TopTweet
TensorFlow
Neural Magic DeepSparse:
CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
OpenVINO:
Person Detection FP16 - CPU:
ms
FPS
Person Detection FP32 - CPU:
ms
FPS
Face Detection FP16 - CPU:
ms
FPS
Face Detection FP16-INT8 - CPU:
ms
FPS
Machine Translation EN To DE FP16 - CPU:
ms
FPS
Person Vehicle Bike Detection FP16 - CPU:
ms
FPS
VVenC
OpenVINO:
Weld Porosity Detection FP16 - CPU:
ms
FPS
Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
ms
FPS
Neural Magic DeepSparse:
BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
OpenVINO:
Weld Porosity Detection FP16-INT8 - CPU:
ms
FPS
Age Gender Recognition Retail 0013 FP16 - CPU:
ms
FPS
Vehicle Detection FP16-INT8 - CPU:
ms
FPS
Vehicle Detection FP16 - CPU:
ms
FPS
SVT-HEVC
simdjson
Neural Magic DeepSparse:
CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
OSPRay:
gravity_spheres_volume/dim_512/scivis/real_time
gravity_spheres_volume/dim_512/ao/real_time
Neural Magic DeepSparse:
NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream:
ms/batch
items/sec
simdjson
VP9 libvpx Encoding
miniBUDE:
OpenMP - BM2:
Billion Interactions/s
GFInst/s
SPECFEM3D
oneDNN
Neural Magic DeepSparse:
NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
ms/batch
items/sec
NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream:
ms/batch
items/sec
Laghos
Neural Magic DeepSparse:
NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream:
ms/batch
items/sec
QMCPACK
TensorFlow
Neural Magic DeepSparse:
CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream:
ms/batch
items/sec
CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
ms/batch
items/sec
ResNet-50, Baseline - Asynchronous Multi-Stream:
ms/batch
items/sec
ResNet-50, Sparse INT8 - Asynchronous Multi-Stream:
ms/batch
items/sec
libxsmm
SPECFEM3D
Blender
libxsmm
Cpuminer-Opt:
LBC, LBRY Credits
scrypt
Skeincoin
Blake-2 S
Magi
Deepcoin
Triple SHA-256, Onecoin
x25x
Quad SHA-256, Pyrite
GROMACS
Blender
oneDNN
SPECFEM3D
dav1d
SPECFEM3D:
Tomographic Model
Mount St. Helens
Intel Open Image Denoise
dav1d
SVT-AV1
Remhos
CloverLeaf
Xcompact3d Incompact3d
miniBUDE:
OpenMP - BM1:
Billion Interactions/s
GFInst/s
Embree
Intel Open Image Denoise
Embree
SVT-HEVC
oneDNN
SVT-HEVC
SVT-AV1:
Preset 12 - Bosphorus 4K
Preset 13 - Bosphorus 4K
HeFFTe - Highly Efficient FFT for Exascale
oneDNN
HeFFTe - Highly Efficient FFT for Exascale:
r2c - FFTW - double - 256
c2c - FFTW - float - 256
r2c - FFTW - float - 256
c2c - FFTW - float - 128
r2c - FFTW - double - 128
c2c - FFTW - double - 128
r2c - FFTW - float - 128

0xd000390

Testing initiated at 6 August 2023 17:43 by user phoronix.

0xd0003a5

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0xd0003a5
Python Notes: Python 3.10.7
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

Testing initiated at 8 August 2023 14:24 by user phoronix.

Xeon Platinum 8380 AVX-512 Workloads

View

Statistics

Graph Settings

Multi-Way Comparison

Table

Run Management

0xd000390

0xd0003a5

TensorFlow

libxsmm

ONNX Runtime

OpenVKL

oneDNN

ONNX Runtime

QMCPACK

ONNX Runtime

libxsmm

TensorFlow

OSPRay

Timed MrBayes Analysis

ONNX Runtime

QMCPACK

Palabos

Cpuminer-Opt

OSPRay

Neural Magic DeepSparse

Palabos

QMCPACK

Palabos

NCNN

VVenC

OSPRay

TensorFlow

Cpuminer-Opt

Laghos

oneDNN

simdjson

TensorFlow

Neural Magic DeepSparse

OpenVINO

VVenC

OpenVINO

Neural Magic DeepSparse

OpenVINO

SVT-HEVC

simdjson

Neural Magic DeepSparse

OSPRay

Neural Magic DeepSparse

simdjson

VP9 libvpx Encoding

miniBUDE

SPECFEM3D

oneDNN

Neural Magic DeepSparse

Laghos

Neural Magic DeepSparse

QMCPACK

TensorFlow

Neural Magic DeepSparse

libxsmm

SPECFEM3D

Blender

libxsmm

Cpuminer-Opt

GROMACS

Blender

oneDNN

SPECFEM3D

dav1d

SPECFEM3D

Intel Open Image Denoise

dav1d

SVT-AV1

Remhos

CloverLeaf

Xcompact3d Incompact3d

miniBUDE

Embree