eoy2024

Benchmarks for a future article. Testing of an AMD EPYC 4484PX 12-Core processor on a Supermicro AS-3015A-I H13SAE-MF v1.00 motherboard (BIOS 2.1) with ASPEED graphics, running Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412086-NE-EOY20243255&grs&rdt.

LiteRT

Model: NASNet Mobile

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

BYTE Unix Benchmark

Computational Test: System Call

Apache Cassandra

Test: Writes

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024

LiteRT

Model: DeepLab V3

LiteRT

Model: Quantized COCO SSD MobileNet v1

oneDNN

Harness: IP Shapes 3D - Engine: CPU

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

BYTE Unix Benchmark

Computational Test: Pipe

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

Primesieve

Length: 1e12

ASTC Encoder

Preset: Thorough

ASTC Encoder

Preset: Medium

ASTC Encoder

Preset: Fast

ASTC Encoder

Preset: Exhaustive

ASTC Encoder

Preset: Very Thorough

Primesieve

Length: 1e13

Etcpak

Benchmark: Multi-Threaded - Configuration: ETC2

BYTE Unix Benchmark

Computational Test: Whetstone Double

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512

OSPRay

Benchmark: particle_volume/scivis/real_time

Rustls

Benchmark: handshake - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

BYTE Unix Benchmark

Computational Test: Dhrystone 2

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

Blender

Blend File: BMW27 - Compute: CPU-Only

OSPRay

Benchmark: particle_volume/ao/real_time

Stockfish

Chess Benchmark

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

Blender

Blend File: Classroom - Compute: CPU-Only

OSPRay

Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time

OpenSSL

Algorithm: AES-128-GCM

OpenSSL

Algorithm: AES-256-GCM

OSPRay

Benchmark: gravity_spheres_volume/dim_512/scivis/real_time

POV-Ray

Trace Time

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Rustls

Benchmark: handshake - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

OSPRay

Benchmark: gravity_spheres_volume/dim_512/ao/real_time

ACES DGEMM

Sustained Floating-Point Rate

OpenSSL

Algorithm: ChaCha20

OpenSSL

Algorithm: ChaCha20-Poly1305

Blender

Blend File: Barbershop - Compute: CPU-Only

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

Rustls

Benchmark: handshake - Suite: TLS13_CHACHA20_POLY1305_SHA256

7-Zip Compression

Test: Decompression Rating

Blender

Blend File: Junkshop - Compute: CPU-Only

ONNX Runtime

Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard

RELION

Test: Basic - Device: CPU

Stockfish

Chess Benchmark

Renaissance

Test: Genetic Algorithm Using Jenetics + Futures

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 4K

Build2

Time To Compile

XNNPACK

Model: FP16MobileNetV1

XNNPACK

Model: FP32MobileNetV3Small

simdjson

Throughput Test: PartialTweets

x265

Video Input: Bosphorus 4K

SVT-AV1

Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

simdjson

Throughput Test: DistinctUserID

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 4K

OSPRay

Benchmark: particle_volume/pathtracer/real_time

XNNPACK

Model: FP32MobileNetV3Large

NAMD

Input: ATPase with 327,506 Atoms

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 1080p

XNNPACK

Model: FP16MobileNetV3Small

Rustls

Benchmark: handshake-ticket - Suite: TLS13_CHACHA20_POLY1305_SHA256

XNNPACK

Model: QS8MobileNetV2

Rustls

Benchmark: handshake-resume - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

SVT-AV1

Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit

Rustls

Benchmark: handshake-ticket - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

Whisperfile

Model Size: Small

Rustls

Benchmark: handshake-resume - Suite: TLS13_CHACHA20_POLY1305_SHA256

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 1080p

NAMD

Input: STMV with 1,066,628 Atoms

7-Zip Compression

Test: Compression Rating

Rustls

Benchmark: handshake-resume - Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384

Whisper.cpp

Model: ggml-medium.en - Input: 2016 State of the Union

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 1080p

PyPerformance

Benchmark: async_tree_io

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

SVT-AV1

Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit

Timed Eigen Compilation

Time To Compile

Rustls

Benchmark: handshake-ticket - Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

Renaissance

Test: Gaussian Mixture Model

x265

Video Input: Bosphorus 1080p

Whisperfile

Model Size: Medium

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

Renaissance

Test: Apache Spark PageRank

Apache CouchDB

Bulk Size: 300 - Inserts: 1000 - Rounds: 30

Whisperfile

Model Size: Tiny

simdjson

Throughput Test: Kostya

Numpy Benchmark

Apache CouchDB

Bulk Size: 500 - Inserts: 1000 - Rounds: 30

Renaissance

Test: Scala Dotty

Apache CouchDB

Bulk Size: 300 - Inserts: 3000 - Rounds: 30

QuantLib

Size: XXS

CP2K Molecular Dynamics

Input: H20-64

Renaissance

Test: Akka Unbalanced Cobwebbed Tree

Llama.cpp

Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128

Apache CouchDB

Bulk Size: 100 - Inserts: 3000 - Rounds: 30

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

Apache CouchDB

Bulk Size: 500 - Inserts: 3000 - Rounds: 30

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

XNNPACK

Model: FP32MobileNetV2

Whisper.cpp

Model: ggml-small.en - Input: 2016 State of the Union

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 1080p

Renaissance

Test: Random Forest

PyPerformance

Benchmark: asyncio_tcp_ssl

Apache CouchDB

Bulk Size: 100 - Inserts: 1000 - Rounds: 30

ONNX Runtime

Model: ZFNet-512 - Device: CPU - Executor: Standard

Renaissance

Test: Apache Spark Bayes

QuantLib

Size: S

Renaissance

Test: Finagle HTTP Requests

GROMACS

Input: water_GMX50_bare

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

SVT-AV1

Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit

Whisper.cpp

Model: ggml-base.en - Input: 2016 State of the Union

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024

CP2K Molecular Dynamics

Input: H20-256

LiteRT

Model: Inception V4

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128

Renaissance

Test: ALS Movie Lens

FinanceBench

Benchmark: Bonds OpenMP

PyPerformance

Benchmark: python_startup

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16

Gcrypt Library

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU

XNNPACK

Model: FP16MobileNetV2

Renaissance

Test: Savina Reactors.IO

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128

PyPerformance

Benchmark: gc_collect

FinanceBench

Benchmark: Repo OpenMP

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024

XNNPACK

Model: FP16MobileNetV3Large

PyPerformance

Benchmark: raytrace

PyPerformance

Benchmark: chaos

PyPerformance

Benchmark: regex_compile

PyPerformance

Benchmark: crypto_pyaes

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128

simdjson

Throughput Test: TopTweet

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16

PyPerformance

Benchmark: json_loads

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

LiteRT

Model: Mobilenet Quant

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128

CP2K Molecular Dynamics

Input: Fayalite-FIST

PyPerformance

Benchmark: xml_etree

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128

LiteRT

Model: Mobilenet Float

Renaissance

Test: In-Memory Database Shootout

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16

PyPerformance

Benchmark: pickle_pure_python

PyPerformance

Benchmark: django_template

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16

PyPerformance

Benchmark: asyncio_websockets

PyPerformance

Benchmark: go

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128

Y-Cruncher

Pi Digits To Calculate: 500M

XNNPACK

Model: FP32MobileNetV1

LiteRT

Model: SqueezeNet

PyPerformance

Benchmark: pathlib

PyPerformance

Benchmark: float

Llama.cpp

Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048

Llama.cpp

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512

PyPerformance

Benchmark: nbody

Y-Cruncher

Pi Digits To Calculate: 1B

simdjson

Throughput Test: LargeRandom

LiteRT

Model: Inception ResNet V2

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512

Llamafile

Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512

Llamafile

Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512

Llamafile

Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512

Llamafile

Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI

Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI

Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token

OpenVINO GenAI

Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

ONNX Runtime

Model: ZFNet-512 - Device: CPU - Executor: Standard

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

Phoronix Test Suite v10.8.5