Xeon Platinum 8380 AVX-512 Workloads

Benchmarks for a future article. 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 22.10 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2308099-NE-XEONPLATI49&grs&rdt.

OSPRay

Benchmark: particle_volume/scivis/real_time

OSPRay

Benchmark: particle_volume/ao/real_time

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream

QMCPACK

Input: FeCO6_b3lyp_gms

simdjson

Throughput Test: LargeRandom

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream

TensorFlow

Device: CPU - Batch Size: 512 - Model: AlexNet

simdjson

Throughput Test: Kostya

OSPRay

Benchmark: particle_volume/pathtracer/real_time

TensorFlow

Device: CPU - Batch Size: 256 - Model: AlexNet

NCNN

Target: CPU - Model: googlenet

NCNN

Target: CPU - Model: vgg16

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream

libxsmm

M N K: 64

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream

OpenVKL

Benchmark: vklBenchmark ISPC

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128

Embree

Binary: Pathtracer ISPC - Model: Crown

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

NCNN

Target: CPU - Model: resnet18

NCNN

Target: CPU - Model: squeezenet_ssd

Neural Magic DeepSparse

Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

NCNN

Target: CPU - Model: blazeface

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

QMCPACK

Input: simple-H2O

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

TensorFlow

Device: CPU - Batch Size: 256 - Model: GoogLeNet

Neural Magic DeepSparse

Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream

NCNN

Target: CPU - Model: vision_transformer

TensorFlow

Device: CPU - Batch Size: 256 - Model: ResNet-50

Embree

Binary: Pathtracer ISPC - Model: Asian Dragon

simdjson

Throughput Test: DistinctUserID

simdjson

Throughput Test: PartialTweets

SPECFEM3D

Model: Tomographic Model

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE

Implementation: OpenMP - Input Deck: BM2

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128

simdjson

Throughput Test: TopTweet

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

Neural Magic DeepSparse

Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream

VP9 libvpx Encoding

Speed: Speed 5 - Input: Bosphorus 4K

NCNN

Target: CPU - Model: mnasnet

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

TensorFlow

Device: CPU - Batch Size: 512 - Model: GoogLeNet

QMCPACK

Input: FeCO6_b3lyp_gms

libxsmm

M N K: 128

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

SVT-AV1

Encoder Mode: Preset 12 - Input: Bosphorus 4K

Neural Magic DeepSparse

Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

NCNN

Target: CPU - Model: yolov4-tiny

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

SPECFEM3D

Model: Mount St. Helens

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

SPECFEM3D

Model: Homogeneous Halfspace

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

OpenVINO

Model: Person Vehicle Bike Detection FP16 - Device: CPU

Palabos

Grid Size: 400

NCNN

Target: CPU-v3-v3 - Model: mobilenet-v3

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256

NCNN

Target: CPU - Model: shufflenet-v2

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

NCNN

Target: CPU - Model: alexnet

NCNN

Target: CPU - Model: mobilenet

Remhos

Test: Sample Remap Example

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

HeFFTe - Highly Efficient FFT for Exascale

Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256

Palabos

Grid Size: 500

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

libxsmm

M N K: 256

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256

SVT-HEVC

Tuning: 10 - Input: Bosphorus 4K

HeFFTe - Highly Efficient FFT for Exascale

Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

NCNN

Target: CPU-v2-v2 - Model: mobilenet-v2

Neural Magic DeepSparse

Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream

miniBUDE

Implementation: OpenMP - Input Deck: BM1

miniBUDE

Implementation: OpenMP - Input Deck: BM1

QMCPACK

Input: Li2_STO_ae

NCNN

Target: CPU - Model: efficientnet-b0

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

OpenVINO

Model: Weld Porosity Detection FP16 - Device: CPU

SPECFEM3D

Model: Water-layered Halfspace

Cpuminer-Opt

Algorithm: Myriad-Groestl

libxsmm

M N K: 32

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

Timed MrBayes Analysis

Primate Phylogeny Analysis

Cpuminer-Opt

Algorithm: Skeincoin

Neural Magic DeepSparse

Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream

Neural Magic DeepSparse

Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

OpenVINO

Model: Face Detection FP16 - Device: CPU

Blender

Blend File: Fishy Cat - Compute: CPU-Only

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

CloverLeaf

Lagrangian-Eulerian Hydrodynamics

Cpuminer-Opt

Algorithm: Quad SHA-256, Pyrite

VVenC

Video Input: Bosphorus 4K - Video Preset: Faster

Blender

Blend File: BMW27 - Compute: CPU-Only

OpenVINO

Model: Vehicle Detection FP16-INT8 - Device: CPU

SPECFEM3D

Model: Layered Halfspace

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

OpenVINO

Model: Vehicle Detection FP16 - Device: CPU

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

OpenVINO

Model: Person Detection FP16 - Device: CPU

Cpuminer-Opt

Algorithm: LBC, LBRY Credits

Cpuminer-Opt

Algorithm: Deepcoin

OpenVINO

Model: Person Detection FP16 - Device: CPU

VVenC

Video Input: Bosphorus 4K - Video Preset: Fast

Neural Magic DeepSparse

Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

dav1d

Video Input: Chimera 1080p

OpenVINO

Model: Weld Porosity Detection FP16-INT8 - Device: CPU

Laghos

Test: Triple Point Problem

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

SVT-HEVC

Tuning: 7 - Input: Bosphorus 4K

dav1d

Video Input: Summer Nature 4K

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

OpenVINO

Model: Person Detection FP32 - Device: CPU

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

Palabos

Grid Size: 100

Cpuminer-Opt

Algorithm: scrypt

Cpuminer-Opt

Algorithm: Blake-2 S

SVT-HEVC

Tuning: 1 - Input: Bosphorus 4K

OpenVINO

Model: Person Detection FP32 - Device: CPU

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

Cpuminer-Opt

Algorithm: Triple SHA-256, Onecoin

Laghos

Test: Sedov Blast Wave, ube_922_hex.mesh

Cpuminer-Opt

Algorithm: Magi

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

Cpuminer-Opt

Algorithm: x25x

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU

OpenVINO

Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU

NCNN

Target: CPU - Model: FastestDet

NCNN

Target: CPU - Model: regnety_400m

NCNN

Target: CPU - Model: resnet50

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

Cpuminer-Opt

Algorithm: Garlicoin

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

Phoronix Test Suite v10.8.5