AMD EPYC Turin AVX-512 Comparison

AMD EPYC 9755 AVX-512 comparison by Michael Larabel for a future article.

AVX-512 Off

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xb002110
Python Notes: Python 3.12.2
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
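
The Security Notes line above packs one `name: status` entry per vulnerability, joined by ` + `. As a minimal sketch (the helper name `parse_security_notes` is hypothetical, and the sample string is a shortened excerpt of the line above), it can be split into a mapping to see which entries report an active mitigation:

```python
def parse_security_notes(notes: str) -> dict[str, str]:
    """Split a 'name: status + name: status' string into a dict."""
    entries = {}
    for item in notes.split(" + "):
        # Split on the first ': ' only; statuses such as the spectre_v2
        # entry contain further colons of their own.
        name, _, status = item.partition(": ")
        entries[name.strip()] = status.strip()
    return entries


# Shortened excerpt of the Security Notes line, for illustration only.
sample = (
    "gather_data_sampling: Not affected + "
    "spec_store_bypass: Mitigation of SSB disabled via prctl + "
    "spectre_v2: Mitigation of Enhanced / Automatic IBRS; "
    "IBPB: conditional; STIBP: always-on"
)

parsed = parse_security_notes(sample)
mitigated = [name for name, status in parsed.items()
             if status.startswith("Mitigation")]
print(mitigated)
```

Splitting on the first `": "` keeps the full spectre_v2 status text intact as a single value.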

AVX-512 256b DP

AVX-512 512b DP

Processor: AMD EPYC 9755 128-Core @ 2.70GHz (128 Cores / 256 Threads), Motherboard: AMD VOLCANO (RVOT1000D BIOS), Chipset: AMD Device 153a, Memory: 12 x 64GB DDR5-6000MT/s Samsung M321R8GA0PB1-CCPKC, Disk: 2 x 1920GB KIOXIA KCD8XPUG1T92, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

OS: Ubuntu 24.04, Kernel: 6.10.0-phx (x86_64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200
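
The three runs differ only in their AVX-512 mode, and this page does not say how the modes were switched. As a rough sketch for checking which AVX-512 feature flags a Linux system advertises, the `flags` line of `/proc/cpuinfo` can be filtered. The helper name `avx512_flags` and the sample text are illustrative assumptions, not data from the tested system:

```python
def avx512_flags(cpuinfo_text: str) -> list[str]:
    """Return the sorted AVX-512 feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            return sorted(f for f in flags.split() if f.startswith("avx512"))
    return []  # no flags line found


# Illustrative sample only; not taken from the EPYC 9755 under test.
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f avx512dq avx512_vnni\n"
print(avx512_flags(sample))
```

On a real system the same function could be fed `open("/proc/cpuinfo").read()`; an empty list would indicate that no AVX-512 flags are exposed.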

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; Items / Sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Xmrig

Xmrig is an open-source cross-platform CPU/GPU miner for RandomX, KawPow, CryptoNight and AstroBWT. This test profile is set up to measure the Xmrig CPU mining performance. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; H/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries, if complementary metrics are desired. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; images/sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

ONNX Runtime

[Graphs, repeated for each of four configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp; Result; Inference Time Cost (ms)]

simdjson

[Graphs: Result; GB/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

SVT-AV1

[Graphs: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

TensorFlow

[Graphs: Result; images/sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; Score Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Mobile Neural Network

simdjson

[Graphs: Result; GB/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

simdjson

[Graphs: Result; GB/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

[Graphs, repeated for each of two configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

oneDNN

This is a test of Intel oneDNN, an Intel-optimized library for deep neural networks, making use of its built-in benchdnn functionality. The result is the total perf time reported. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

simdjson

[Graphs: Result; GB/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

ONNX Runtime

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp; Result; Inference Time Cost (ms)]

simdjson

[Graphs: Result; GB/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay

[Graphs: Result; Items Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay Studio

[Graphs, repeated for each of two configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

libxsmm

Libxsmm is an open-source library for specialized dense and sparse matrix operations and deep learning primitives. Libxsmm supports making use of Intel AMX, AVX-512, and other modern CPU instruction set capabilities. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; GFLOPS/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay

[Graphs: Result; Items Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Y-Cruncher

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OpenVINO

This is a test of Intel OpenVINO, a toolkit around neural networks, using its built-in benchmarking support to analyze the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay

[Graphs: Result; Items Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OpenVINO

[Graphs, repeated for each of nine configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

ONNX Runtime

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp; Result; Inference Time Cost (ms)]

OpenVINO

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay Studio

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OpenVINO

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay Studio

[Graphs, repeated for each of two configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; batches/sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OSPRay Studio

[Graphs, repeated for each of two configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

PyTorch

[Graphs: Result; batches/sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

SVT-AV1

[Graphs: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Y-Cruncher

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

SVT-AV1

[Graphs: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

NAMD

[Graphs: Result; ns/day Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

PyTorch

[Graphs: Result; batches/sec Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

SVT-AV1

[Graphs: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; Billion Interactions/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; Ns Per Day Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs (and GPUs via SYCL) and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Y-Cruncher

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

SMHasher

SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

[Graphs: Result; cycles/hash]

NAMD

[Graphs: Result; ns/day Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

OpenFOAM

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

oneDNN

[Graphs, repeated for each of two configurations: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

Embree

[Graphs, repeated for each of two configurations: Result; Frames Per Second Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

oneDNN

[Graphs: Result; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

miniBUDE

[Graphs: Result; Billion Interactions/s Per Watt; CPU Peak Freq (Highest CPU Core Frequency); CPU Power Consumption; CPU Temp]

CPU Temperature Monitor

CPU Power Consumption Monitor

CPU Peak Freq (Highest CPU Core Frequency) Monitor
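
Many of the results above are also reported "Per Watt". This page does not spell out the formula, so the following is only a sketch under the assumption that a per-watt figure is the benchmark result divided by the mean CPU power sampled during the run. The function name `per_watt` and all numbers are made up for illustration:

```python
from statistics import mean


def per_watt(result: float, power_samples_watts: list[float]) -> float:
    """Divide a benchmark result by the mean of the sampled power draw.

    Assumption: efficiency = result / average watts over the run.
    """
    if not power_samples_watts:
        raise ValueError("need at least one power sample")
    avg_watts = mean(power_samples_watts)
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return result / avg_watts


# Made-up numbers: 1200 items/sec at an average of 300 W.
demo = per_watt(1200.0, [290.0, 300.0, 310.0])
print(demo)  # 4.0
```

Under this assumption a run that scores higher but draws proportionally more power shows no per-watt gain, which is why the power and frequency monitors are listed alongside each result.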

92 Results Shown

OpenVKL
Xmrig
TensorFlow
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  fcn-resnet101-11 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  ArcFace ResNet-100 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
  bertsquad-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
simdjson
SVT-AV1
TensorFlow
Numpy Benchmark
Mobile Neural Network:
  resnet-v2-50
  mobilenetV3
simdjson
OpenFOAM
simdjson
OSPRay Studio:
  2 - 4K - 32 - Path Tracer - CPU
  1 - 4K - 32 - Path Tracer - CPU
oneDNN
simdjson
ONNX Runtime:
  super-resolution-10 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
simdjson
OSPRay
OSPRay Studio:
  3 - 4K - 16 - Path Tracer - CPU
  3 - 4K - 32 - Path Tracer - CPU
libxsmm
OSPRay
Y-Cruncher
OpenVINO:
  Face Detection FP16-INT8 - CPU:
    ms
    FPS
OSPRay
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    ms
    FPS
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
    FPS
  Road Segmentation ADAS FP16-INT8 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
    FPS
  Face Detection Retail FP16-INT8 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16-INT8 - CPU:
    ms
    FPS
ONNX Runtime:
  GPT-2 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
OpenVINO:
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
OSPRay Studio
OpenVINO:
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
OSPRay Studio:
  1 - 4K - 1 - Path Tracer - CPU
  2 - 4K - 1 - Path Tracer - CPU
PyTorch
OSPRay Studio:
  2 - 4K - 16 - Path Tracer - CPU
  1 - 4K - 16 - Path Tracer - CPU
PyTorch
SVT-AV1
Y-Cruncher
SVT-AV1
NAMD
PyTorch
SVT-AV1
miniBUDE:
  OpenMP - BM2:
    Billion Interactions/s
    GFInst/s
GROMACS
Embree
Y-Cruncher
SMHasher:
  FarmHash32 x86_64 AVX:
    cycles/hash
    MiB/sec
NAMD
OpenFOAM
oneDNN:
  IP Shapes 3D - CPU
  Convolution Batch Shapes Auto - CPU
Embree:
  Pathtracer ISPC - Crown
  Pathtracer ISPC - Asian Dragon
oneDNN
miniBUDE:
  OpenMP - BM1:
    Billion Interactions/s
    GFInst/s
CPU Temperature Monitor:
  Phoronix Test Suite System Monitoring:
    Celsius
    Watts
    Megahertz

AVX-512 Off

Testing initiated at 29 September 2024 13:32 by user phoronix.

AVX-512 256b DP

Testing initiated at 28 September 2024 15:39 by user phoronix.

AVX-512 512b DP

Testing initiated at 30 September 2024 02:28 by user phoronix.
