AMD EPYC 9684X 3D V-Cache
AMD EPYC 9684X 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2307207-NE-UPLOAD92587&grr.
WRF
Input: conus 2.5km
OpenFOAM
Input: drivaerFastback, Large Mesh Size - Execution Time
OpenFOAM
Input: drivaerFastback, Large Mesh Size - Mesh Time
PETSc
Test: Streams
High Performance Conjugate Gradient
X Y Z: 144 144 144 - RT: 60
High Performance Conjugate Gradient
X Y Z: 192 192 192 - RT: 60
Whisper.cpp
Model: ggml-medium.en - Input: 2016 State of the Union
libxsmm
M N K: 128
High Performance Conjugate Gradient
X Y Z: 160 160 160 - RT: 60
High Performance Conjugate Gradient
X Y Z: 104 104 104 - RT: 60
Whisper.cpp
Model: ggml-small.en - Input: 2016 State of the Union
TensorFlow
Device: CPU - Batch Size: 256 - Model: ResNet-50
Palabos
Grid Size: 400
LeelaChessZero
Backend: BLAS
TensorFlow
Device: CPU - Batch Size: 512 - Model: GoogLeNet
TensorFlow
Device: CPU - Batch Size: 512 - Model: ResNet-50
LeelaChessZero
Backend: Eigen
Whisper.cpp
Model: ggml-base.en - Input: 2016 State of the Union
ASKAP
Test: tConvolve MT - Degridding
ASKAP
Test: tConvolve MT - Gridding
libxsmm
M N K: 256
LAMMPS Molecular Dynamics Simulator
Model: 20k Atoms
Timed Linux Kernel Compilation
Build: allmodconfig
Monte Carlo Simulations of Ionised Nebulae
Input: Dust 2D tau100.0
Timed LLVM Compilation
Build System: Unix Makefiles
Palabos
Grid Size: 500
Palabos
Grid Size: 1000
OSPRay
Benchmark: particle_volume/pathtracer/real_time
Numpy Benchmark
Stress-NG
Test: Socket Activity
Stress-NG
Test: Context Switching
Stress-NG
Test: IO_uring
Blender
Blend File: Barbershop - Compute: CPU-Only
OSPRay
Benchmark: particle_volume/scivis/real_time
Timed Gem5 Compilation
Time To Compile
OSPRay Studio
Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
asmFish
1024 Hash Memory, 26 Depth
Stress-NG
Test: Cloning
Stress-NG
Test: SENDFILE
Ngspice
Circuit: C2670
OSPRay Studio
Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
OSPRay Studio
Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
OSPRay Studio
Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
Timed LLVM Compilation
Build System: Ninja
OSPRay Studio
Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
OSPRay Studio
Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
Timed Node.js Compilation
Time To Compile
Ngspice
Circuit: C7552
OpenFOAM
Input: drivaerFastback, Medium Mesh Size - Execution Time
OpenFOAM
Input: drivaerFastback, Medium Mesh Size - Mesh Time
OSPRay Studio
Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
Palabos
Grid Size: 100
OSPRay Studio
Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
OSPRay Studio
Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
OSPRay
Benchmark: particle_volume/ao/real_time
Timed Godot Game Engine Compilation
Time To Compile
PyHPC Benchmarks
Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing
Zstd Compression
Compression Level: 19, Long Mode - Decompression Speed
Zstd Compression
Compression Level: 19, Long Mode - Compression Speed
TensorFlow
Device: CPU - Batch Size: 64 - Model: ResNet-50
TensorFlow
Device: CPU - Batch Size: 256 - Model: GoogLeNet
Stockfish
Total Time
Laghos
Test: Sedov Blast Wave, ube_922_hex.mesh
Zstd Compression
Compression Level: 19 - Decompression Speed
Zstd Compression
Compression Level: 19 - Compression Speed
OpenVINO
Model: Person Detection FP32 - Device: CPU
OpenVINO
Model: Person Detection FP32 - Device: CPU
OpenVINO
Model: Person Detection FP16 - Device: CPU
OpenVINO
Model: Person Detection FP16 - Device: CPU
OpenVINO
Model: Face Detection FP16 - Device: CPU
OpenVINO
Model: Face Detection FP16 - Device: CPU
OpenVINO
Model: Face Detection FP16-INT8 - Device: CPU
OpenVINO
Model: Face Detection FP16-INT8 - Device: CPU
Zstd Compression
Compression Level: 12 - Decompression Speed
Zstd Compression
Compression Level: 12 - Compression Speed
Zstd Compression
Compression Level: 3, Long Mode - Decompression Speed
Zstd Compression
Compression Level: 3, Long Mode - Compression Speed
Zstd Compression
Compression Level: 3 - Decompression Speed
Zstd Compression
Compression Level: 3 - Compression Speed
Zstd Compression
Compression Level: 8, Long Mode - Decompression Speed
Zstd Compression
Compression Level: 8, Long Mode - Compression Speed
Zstd Compression
Compression Level: 8 - Decompression Speed
Zstd Compression
Compression Level: 8 - Compression Speed
OpenVINO
Model: Person Vehicle Bike Detection FP16 - Device: CPU
OpenVINO
Model: Person Vehicle Bike Detection FP16 - Device: CPU
OpenVINO
Model: Machine Translation EN To DE FP16 - Device: CPU
OpenVINO
Model: Machine Translation EN To DE FP16 - Device: CPU
OpenVINO
Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
OpenVINO
Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
OpenVINO
Model: Vehicle Detection FP16-INT8 - Device: CPU
OpenVINO
Model: Vehicle Detection FP16-INT8 - Device: CPU
OpenVINO
Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
OpenVINO
Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
OpenVINO
Model: Weld Porosity Detection FP16-INT8 - Device: CPU
OpenVINO
Model: Weld Porosity Detection FP16-INT8 - Device: CPU
OpenVINO
Model: Vehicle Detection FP16 - Device: CPU
OpenVINO
Model: Vehicle Detection FP16 - Device: CPU
OpenVINO
Model: Weld Porosity Detection FP16 - Device: CPU
OpenVINO
Model: Weld Porosity Detection FP16 - Device: CPU
OSPRay
Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
OSPRay
Benchmark: gravity_spheres_volume/dim_512/ao/real_time
OSPRay
Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Neural Magic DeepSparse
Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream
Blender
Blend File: Pabellon Barcelona - Compute: CPU-Only
Neural Magic DeepSparse
Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream
TensorFlow
Device: CPU - Batch Size: 32 - Model: ResNet-50
Laghos
Test: Triple Point Problem
Neural Magic DeepSparse
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
TensorFlow
Device: CPU - Batch Size: 512 - Model: AlexNet
Neural Magic DeepSparse
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream
Blender
Blend File: Classroom - Compute: CPU-Only
Neural Magic DeepSparse
Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream
Neural Magic DeepSparse
Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream
Timed Linux Kernel Compilation
Build: defconfig
GPAW
Input: Carbon Nanotube
TensorFlow
Device: CPU - Batch Size: 16 - Model: ResNet-50
7-Zip Compression
Test: Decompression Rating
7-Zip Compression
Test: Compression Rating
Timed PHP Compilation
Time To Compile
Stress-NG
Test: CPU Cache
srsRAN Project
Test: PUSCH Processor Benchmark, Throughput Total
Stress-NG
Test: Atomic
Stress-NG
Test: MMAP
Stress-NG
Test: Pthread
Stress-NG
Test: MEMFD
Liquid-DSP
Threads: 192 - Buffer Length: 256 - Filter Length: 512
Stress-NG
Test: Malloc
Liquid-DSP
Threads: 128 - Buffer Length: 256 - Filter Length: 512
Stress-NG
Test: Zlib
Stress-NG
Test: Matrix 3D Math
Liquid-DSP
Threads: 64 - Buffer Length: 256 - Filter Length: 512
Stress-NG
Test: Glibc Qsort Data Sorting
Stress-NG
Test: Vector Math
Stress-NG
Test: Forking
Stress-NG
Test: NUMA
Stress-NG
Test: Futex
Stress-NG
Test: Glibc C String Functions
Stress-NG
Test: Poll
Stress-NG
Test: Wide Vector Math
Stress-NG
Test: AVL Tree
Stress-NG
Test: Memory Copying
Stress-NG
Test: Floating Point
Stress-NG
Test: Function Call
Liquid-DSP
Threads: 192 - Buffer Length: 256 - Filter Length: 57
Stress-NG
Test: Fused Multiply-Add
Stress-NG
Test: Crypto
Stress-NG
Test: Pipe
Stress-NG
Test: Hash
Stress-NG
Test: Vector Floating Point
Stress-NG
Test: Vector Shuffle
Stress-NG
Test: Mutex
Liquid-DSP
Threads: 192 - Buffer Length: 256 - Filter Length: 32
Liquid-DSP
Threads: 128 - Buffer Length: 256 - Filter Length: 57
Stress-NG
Test: Semaphores
Stress-NG
Test: CPU Stress
Stress-NG
Test: System V Message Passing
Stress-NG
Test: Matrix Math
Liquid-DSP
Threads: 128 - Buffer Length: 256 - Filter Length: 32
Liquid-DSP
Threads: 64 - Buffer Length: 256 - Filter Length: 57
Liquid-DSP
Threads: 64 - Buffer Length: 256 - Filter Length: 32
ASKAP
Test: tConvolve MPI - Gridding
ASKAP
Test: tConvolve MPI - Degridding
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 512
NAMD
ATPase Simulation - 327,506 Atoms
PyHPC Benchmarks
Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State
TensorFlow
Device: CPU - Batch Size: 64 - Model: GoogLeNet
Algebraic Multi-Grid Benchmark
GROMACS
Implementation: MPI CPU - Input: water_GMX50_bare
TensorFlow
Device: CPU - Batch Size: 256 - Model: AlexNet
SPECFEM3D
Model: Water-layered Halfspace
SPECFEM3D
Model: Layered Halfspace
Blender
Blend File: Fishy Cat - Compute: CPU-Only
Remhos
Test: Sample Remap Example
LULESH
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 512
Xmrig
Variant: Monero - Hash Count: 1M
srsRAN Project
Test: Downlink Processor Benchmark
OpenFOAM
Input: drivaerFastback, Small Mesh Size - Execution Time
OpenFOAM
Input: drivaerFastback, Small Mesh Size - Mesh Time
Xmrig
Variant: Wownero - Hash Count: 1M
Embree
Binary: Pathtracer ISPC - Model: Asian Dragon Obj
TensorFlow
Device: CPU - Batch Size: 16 - Model: GoogLeNet
SPECFEM3D
Model: Tomographic Model
SPECFEM3D
Model: Mount St. Helens
SPECFEM3D
Model: Homogeneous Halfspace
NAS Parallel Benchmarks
Test / Class: EP.D
Monte Carlo Simulations of Ionised Nebulae
Input: Gas HII40
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 512
TensorFlow
Device: CPU - Batch Size: 64 - Model: AlexNet
ASKAP
Test: Hogbom Clean OpenMP
TensorFlow
Device: CPU - Batch Size: 32 - Model: GoogLeNet
CloverLeaf
Lagrangian-Eulerian Hydrodynamics
Blender
Blend File: BMW27 - Compute: CPU-Only
NAS Parallel Benchmarks
Test / Class: BT.C
ASTC Encoder
Preset: Thorough
ASTC Encoder
Preset: Exhaustive
NAS Parallel Benchmarks
Test / Class: SP.C
miniFE
Problem Size: Small
TensorFlow
Device: CPU - Batch Size: 16 - Model: AlexNet
Xcompact3d Incompact3d
Input: input.i3d 129 Cells Per Direction
Google Draco
Model: Lion
Google Draco
Model: Church Facade
ASKAP
Test: tConvolve OpenMP - Degridding
ASKAP
Test: tConvolve OpenMP - Gridding
NAS Parallel Benchmarks
Test / Class: IS.D
Xcompact3d Incompact3d
Input: input.i3d 193 Cells Per Direction
TensorFlow
Device: CPU - Batch Size: 32 - Model: AlexNet
Embree
Binary: Pathtracer ISPC - Model: Crown
ACES DGEMM
Sustained Floating-Point Rate
NAS Parallel Benchmarks
Test / Class: LU.C
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 512
libxsmm
M N K: 32
ASTC Encoder
Preset: Medium
libxsmm
M N K: 64
Embree
Binary: Pathtracer ISPC - Model: Asian Dragon
NAS Parallel Benchmarks
Test / Class: FT.C
NAS Parallel Benchmarks
Test / Class: CG.C
NAS Parallel Benchmarks
Test / Class: MG.C
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 256
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 256
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 256
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 256
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: double - X Y Z: 128
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: double - X Y Z: 128
HeFFTe - Highly Efficient FFT for Exascale
Test: c2c - Backend: FFTW - Precision: float - X Y Z: 128
HeFFTe - Highly Efficient FFT for Exascale
Test: r2c - Backend: FFTW - Precision: float - X Y Z: 128
Phoronix Test Suite v10.8.5