general: 2 x AMD EPYC 9274F 24-Core testing with an ASUS ESC8000A-E12 K14PG-D24 (1201 BIOS) and an ASUS NVIDIA H100 NVL 94GB on Ubuntu 22.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2501312-NE-GENERAL1651&grs.
System details (Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core):
Processor: 2 x AMD EPYC 9274F 24-Core @ 4.05GHz (48 Cores / 96 Threads)
Motherboard: ASUS ESC8000A-E12 K14PG-D24 (1201 BIOS)
Chipset: AMD Device 14a4
Memory: 1136GB
Disk: 240GB MR9540-8i
Graphics: ASUS NVIDIA H100 NVL 94GB
Network: 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb
OS: Ubuntu 22.04
Kernel: 6.8.0-51-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.7.33
Vulkan: 1.3.289
Compiler: GCC 11.4.0 + CUDA 12.6
File-System: ext4
Screen Resolution: 1920x1200

Notes:
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Disk details: MQ-DEADLINE scheduler / relatime,rw,stripe=16 / Block Size: 4096
- Processor details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0xa101148
- Python: 3.10.12
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Tests (result row: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core):
llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 2048 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 1024 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 512 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Prompt Processing 256 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 2048 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 1024 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 512 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Prompt Processing 256 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 128 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024 llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16
llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 llama-cpp: NVIDIA CUDA - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 llama-cpp: NVIDIA CUDA - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 llama-cpp: NVIDIA CUDA - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
ai-benchmark: Device AI Score ai-benchmark: Device Training Score ai-benchmark: Device Inference Score
pybench: Total For Average Test Times
ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPUv2-yolov3 - mobilenetv2-yolov3 ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU - mobilenet
ncnn: CPU - vision_transformer ncnn: CPU - regnety_400m ncnn: CPU - squeezenet_ssd ncnn: CPU - yolov4-tiny ncnn: CPU - resnet50 ncnn: CPU - alexnet ncnn: CPU - resnet18 ncnn: CPU - vgg16 ncnn: CPU - googlenet ncnn: CPU - blazeface ncnn: CPU - efficientnet-b0 ncnn: CPU - mnasnet ncnn: CPU - shufflenet-v2 ncnn: CPU-v2-v2 - mobilenet-v2
spacy: en_core_web_lg
stress-ng: Hyperbolic Trigonometric Math stress-ng: POSIX Regular Expressions stress-ng: System V Message Passing stress-ng: Glibc Qsort Data Sorting stress-ng: Glibc C String Functions stress-ng: Integer Bit Operations stress-ng: Bessel Math Operations stress-ng: Vector Floating Point stress-ng: Bitonic Integer Sort stress-ng: Trigonometric Math stress-ng: Fused Multiply-Add stress-ng: Radix String Sort stress-ng: Fractal Generator stress-ng: Context Switching stress-ng: Wide Vector Math stress-ng: Logarithmic Math stress-ng: Jpeg Compression stress-ng: Exponential Math stress-ng: Socket Activity stress-ng: Mixed Scheduler stress-ng: Vector Shuffle stress-ng: Memory Copying stress-ng: Matrix 3D Math stress-ng: Floating Point stress-ng: x86_64 RdRand stress-ng: Function Call stress-ng: Integer Math stress-ng: AVX-512 VNNI stress-ng: Vector Math stress-ng: Matrix Math stress-ng: Semaphores stress-ng: Power Math stress-ng: CPU Stress stress-ng: CPU Cache stress-ng: SENDFILE stress-ng: AVL Tree stress-ng: Pthread stress-ng: Forking stress-ng: Cloning stress-ng: Malloc stress-ng: Atomic stress-ng: Mutex stress-ng: MEMFD stress-ng: Futex stress-ng: Zlib stress-ng: Poll stress-ng: Pipe stress-ng: NUMA stress-ng: MMAP stress-ng: Hash
pytorch: NVIDIA CUDA GPU - 512 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 256 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 64 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 32 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 16 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 1 - Efficientnet_v2_l pytorch: NVIDIA CUDA GPU - 512 - ResNet-152 pytorch: NVIDIA CUDA GPU - 256 - ResNet-152 pytorch: NVIDIA CUDA GPU - 64 - ResNet-152 pytorch: NVIDIA CUDA GPU - 512 - ResNet-50 pytorch: NVIDIA CUDA GPU - 32 - ResNet-152 pytorch: NVIDIA CUDA GPU - 256 - ResNet-50 pytorch: NVIDIA CUDA GPU - 16 - ResNet-152 pytorch: NVIDIA CUDA GPU - 64 - ResNet-50 pytorch: NVIDIA CUDA GPU - 32 - ResNet-50 pytorch: NVIDIA CUDA GPU - 16 - ResNet-50 pytorch: NVIDIA CUDA GPU - 1 - ResNet-152 pytorch: NVIDIA CUDA GPU - 1 - ResNet-50 pytorch: CPU - 512 - Efficientnet_v2_l pytorch: CPU - 256 - Efficientnet_v2_l pytorch: CPU - 64 - Efficientnet_v2_l pytorch: CPU - 32 - Efficientnet_v2_l pytorch: CPU - 16 - Efficientnet_v2_l pytorch: CPU - 1 - Efficientnet_v2_l pytorch: CPU - 512 - ResNet-152 pytorch: CPU - 256 - ResNet-152 pytorch: CPU - 64 - ResNet-152 pytorch: CPU - 512 - ResNet-50 pytorch: CPU - 32 - ResNet-152 pytorch: CPU - 256 - ResNet-50 pytorch: CPU - 16 - ResNet-152 pytorch: CPU - 64 - ResNet-50 pytorch: CPU - 32 - ResNet-50 pytorch: CPU - 16 - ResNet-50 pytorch: CPU - 1 - ResNet-152 pytorch: CPU - 1 - ResNet-50
tensorflow-lite: Mobilenet Quant tensorflow-lite: Mobilenet Float tensorflow-lite: NASNet Mobile tensorflow-lite: Inception V4 tensorflow-lite: SqueezeNet
litert: Quantized COCO SSD MobileNet v1 litert: Inception ResNet V2 litert: Mobilenet Float litert: NASNet Mobile litert: Inception V4 litert: SqueezeNet litert: DeepLab V3
intel-mpi: IMB-MPI1 Sendrecv intel-mpi: IMB-MPI1 Sendrecv intel-mpi: IMB-MPI1 Exchange intel-mpi: IMB-MPI1 Exchange intel-mpi: IMB-P2P PingPong
rbenchmark: cython-bench: N-Queens numpy:
onednn: Recurrent Neural Network Inference - CPU onednn: Recurrent Neural Network Training - CPU onednn: Deconvolution Batch shapes_3d - CPU onednn: Deconvolution Batch shapes_1d - CPU onednn: Convolution Batch Shapes Auto - CPU onednn: IP Shapes 3D - CPU onednn: IP Shapes 1D - CPU
build-llvm: Unix Makefiles build-llvm: Ninja build-linux-kernel: allmodconfig build-linux-kernel: defconfig build-gcc: Time To Compile
epoch: Cone mrbayes: Primate Phylogeny Analysis
npb: SP.C npb: SP.B npb: MG.C npb: LU.C npb: IS.D npb: FT.C npb: EP.D npb: EP.C npb: CG.C npb: BT.C
hpl:
glibc-bench: pthread_once glibc-bench: sincos glibc-bench: ffsll glibc-bench: atanh glibc-bench: asinh glibc-bench: tanh glibc-bench: sqrt glibc-bench: sinh glibc-bench: modf glibc-bench: log2 glibc-bench: sin glibc-bench: pow glibc-bench: ffs glibc-bench: exp glibc-bench: cos
compilebench: Read Compiled Tree compilebench: Initial Create compilebench: Compile
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 ncnn: Vulkan GPU - FastestDet ncnn: CPU - FastestDet ncnn: CPUv2-yolov3 - mobilenetv2-yolov3 ncnn: CPU-v3-v3 - mobilenet-v3 ncnn: CPU - mobilenet spacy: en_core_web_trf stress-ng: IO_uring stress-ng: Crypto tensorflow-lite: Inception ResNet V2 litert: Mobilenet Quant intel-mpi: IMB-MPI1 PingPong
Result values (Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core, in the order of the tests above):
12288 6144 3072 1536 32768 16384 8192 4096 5.00 32768 16384 4.92 24.12 8192 4096 24.04 56.82 32768 16384 57.78 8192 4096 42.16 41.91 3121.14 3150.78 3144.42 84.98 211.93 212.92 18308.61 13052.60 8390.83 63.08 151.83 44.00 42.88 43.54 18283.75 13019.73 8352.57 18.58 147.63 45.57 43.85 43.87 17.56 6576 3380 3196 738 56.65 46.67 23.72 31.30 23.15 22.63 6.58 12.46 41.41 22.25 6.35 16.99 11.69 16.56 13.33 11.77 23.15 57.21 45.73 23.98 31.75 22.82 6.32 12.69 42.12 22.65 6.39 16.71 11.52 16.76 11.85 15338 293490.26 444669.60 14211326.19 1590.91 56864254.14 9623549.83 36898.14 182971.03 672.51 151938.42 47688547.66 2158.48 393.67 15485954.92 2734553.44 374205.22 66045.55 198051.5 20499.66 38187.39 42786.90 13628.14 16067.53 20022.51 20427852.82 45580.45 4586743.65 6540887.21 407859.79 310400.31 61709090.58 119629.76 145536.61 771976.41 916359.11 1194.06 135182.40 62039.40 8039.79 106614259.32 257.02 14723637.91 3377.15 2352726.22 6888.22 5524301.71 21174901.61 400.80 12391.70 12298718.43 46.72 47.34 47.80 46.77 47.32 51.52 102.98 102.81 102.21 283.02 102.43 282.30 102.11 279.19 282.75 279.17 101.94 287.46 6.09 6.07 6.18 6.05 5.99 8.92 11.61 11.45 11.51 30.05 12.29 29.01 11.27 29.06 29.03 29.77 13.97 34.93 4098.48 3350.72 69069.7 37181.9 5088.14 5503.91 45507.6 3345.85 75527.3 38062.4 5245.27 8793.87 79.47 3393.59 108.05 6073.65 28744589 0.1446 17.977 565.16 398.494 669.683 1.10328 8.66502 0.673369 0.310889 0.827742 198.897 135.289 258.637 30.299 913.620 384.19 81.583 108596.64 146367.34 119325.63 199460.27 4064.08 102149.25 7273.96 6692.95 50419.73 176096.16 631.30 5.65547 38.9805 5.67055 28.1757 22.5830 26.9726 8.24574 22.9550 6.61957 10.4215 63.1227 35.5607 5.67032 15.1361 70.8158 2657.42 399.79 1613.38 193.81 17.29 17.79 23.31 14.36 23.31 2751 174744.18 402622178.28 68557.9 3255.76 3739.53
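Each per-test result below is reported as a mean with "SE +/- x, N = y", i.e. the standard error of the mean over N benchmark runs. As a minimal sketch of that calculation (the three run values here are hypothetical, not taken from this result file):

```python
import statistics

# Hypothetical three-run sample (N = 3); not values from this result file.
runs = [24.08, 24.12, 24.16]

n = len(runs)
mean = statistics.mean(runs)
# Standard error of the mean: sample standard deviation divided by sqrt(N).
se = statistics.stdev(runs) / n ** 0.5

print(f"{mean:.2f} (SE +/- {se:.2f}, N = {n})")
```

A smaller SE relative to the mean indicates more repeatable runs; the Phoronix Test Suite also re-runs tests with high variance, which is presumably why some results below show N = 9, 12, or 15.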
Llamafile 0.8.16 (OpenBenchmarking.org; Tokens Per Second, More Is Better)
System: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 2048: 12288 (SE +/- 0.00, N = 3)
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 1024: 6144 (SE +/- 0.00, N = 3)
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 512: 3072 (SE +/- 0.00, N = 3)
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Prompt Processing 256: 1536 (SE +/- 0.00, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 2048: 32768 (SE +/- 0.00, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 1024: 16384 (SE +/- 0.00, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 512: 8192 (SE +/- 0.00, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Prompt Processing 256: 4096 (SE +/- 0.00, N = 3)
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 128: 5.00 (SE +/- 0.02, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048: 32768 (SE +/- 0.00, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024: 16384 (SE +/- 0.00, N = 3)
Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16: 4.92 (SE +/- 0.03, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128: 24.12 (SE +/- 0.04, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512: 8192 (SE +/- 0.00, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256: 4096 (SE +/- 0.00, N = 3)
Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16: 24.04 (SE +/- 0.09, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128: 56.82 (SE +/- 0.59, N = 3)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048: 32768 (SE +/- 0.00, N = 3)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024: 16384 (SE +/- 0.00, N = 3)
Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16: 57.78 (SE +/- 0.61, N = 4)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512: 8192 (SE +/- 0.00, N = 3)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256: 4096 (SE +/- 0.00, N = 3)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128: 42.16 (SE +/- 0.30, N = 3)
Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16: 41.91 (SE +/- 0.21, N = 3)
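One pattern in the llamafile prompt-processing numbers above: for each model the reported tokens-per-second figure scales exactly linearly with the prompt length (for wizardcoder, 12288/6144/3072/1536 over 2048/1024/512/256-token prompts is a constant factor of 6; for mistral-7b the factor is 16, with SE +/- 0.00 throughout). A quick check of that arithmetic from the reported values:

```python
# Reported llamafile prompt-processing results (tokens/sec) keyed by prompt
# length, copied from the result tables above.
wizardcoder = {2048: 12288, 1024: 6144, 512: 3072, 256: 1536}
mistral = {2048: 32768, 1024: 16384, 512: 8192, 256: 4096}

def rate_factors(results):
    # Tokens/sec divided by prompt length; constant when scaling is exactly linear.
    return {length: tps / length for length, tps in results.items()}

print(rate_factors(wizardcoder))  # every factor is 6.0
print(rate_factors(mistral))      # every factor is 16.0
```

Because the four prompt sizes collapse to one constant per model, the prompt-processing results for a given model here effectively carry a single rate rather than four independent data points.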
Llama.cpp b4397 (OpenBenchmarking.org; Tokens Per Second, More Is Better)
System: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core; 1. (CXX) g++ options: -O3
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048: 3121.14 (SE +/- 0.10, N = 3)
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024: 3150.78 (SE +/- 0.63, N = 3)
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512: 3144.42 (SE +/- 0.28, N = 3)
Backend: NVIDIA CUDA - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128: 84.98 (SE +/- 0.01, N = 3)
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048: 211.93 (SE +/- 2.80, N = 3)
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024: 212.92 (SE +/- 2.68, N = 15)
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048: 18308.61 (SE +/- 1.84, N = 3)
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024: 13052.60 (SE +/- 1.13, N = 3)
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512: 8390.83 (SE +/- 0.48, N = 3)
Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128: 63.08 (SE +/- 0.35, N = 3)
Backend: NVIDIA CUDA - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128: 151.83 (SE +/- 0.10, N = 3)
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048: 44.00 (SE +/- 0.84, N = 9)
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024: 42.88 (SE +/- 0.38, N = 12)
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512: 43.54 (SE +/- 0.69, N = 12)
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048: 18283.75 (SE +/- 0.46, N = 3)
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024: 13019.73 (SE +/- 5.36, N = 3)
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512: 8352.57 (SE +/- 0.80, N = 3)
Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128: 18.58 (SE +/- 0.03, N = 3)
Backend: NVIDIA CUDA - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128: 147.63 (SE +/- 0.09, N = 3)
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048: 45.57 (SE +/- 0.99, N = 7)
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024: 43.85 (SE +/- 0.75, N = 12)
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512: 43.87 (SE +/- 0.40, N = 15)
Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128: 17.56 (SE +/- 0.02, N = 3)
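The Llama.cpp results above allow a direct CUDA-vs-CPU comparison on the same build: for Mistral-7B-Instruct-v0.3-Q8_0, the H100 NVL processes the 2048-token prompt at 18308.61 tokens/sec versus 44.00 on CPU BLAS, while text generation is 151.83 versus 18.58. A small sketch computing those ratios from the reported means (standard errors ignored for this rough comparison):

```python
# Mean tokens/sec copied from the Llama.cpp b4397 results above
# (Mistral-7B-Instruct-v0.3-Q8_0).
cuda = {"pp2048": 18308.61, "pp1024": 13052.60, "pp512": 8390.83, "tg128": 151.83}
cpu_blas = {"pp2048": 44.00, "pp1024": 42.88, "pp512": 43.54, "tg128": 18.58}

for test in cuda:
    speedup = cuda[test] / cpu_blas[test]
    print(f"{test}: {speedup:.1f}x faster on CUDA")
```

The gap is far larger for batched prompt processing (hundreds of times) than for sequential text generation (under 10x), which is consistent with generation being memory-bandwidth-bound rather than compute-bound.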
AI Benchmark Alpha 0.1.2 (OpenBenchmarking.org; Score, More Is Better)
System: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core
Device AI Score: 6576
Device Training Score: 3380
Device Inference Score: 3196

PyBench 2018-02-16 (OpenBenchmarking.org; Milliseconds, Fewer Is Better)
System: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core
Total For Average Test Times: 738 (SE +/- 3.06, N = 3)
NCNN 20241226 (OpenBenchmarking.org; ms, Fewer Is Better)
System: Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core; 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Target: Vulkan GPU - Model: vision_transformer: 56.65 (SE +/- 0.48, N = 3; MIN: 54.43 / MAX: 62.81)
Target: Vulkan GPU - Model: regnety_400m: 46.67 (SE +/- 0.91, N = 3; MIN: 45.2 / MAX: 88.32)
Target: Vulkan GPU - Model: squeezenet_ssd: 23.72 (SE +/- 0.30, N = 3; MIN: 23.06 / MAX: 29.24)
Target: Vulkan GPU - Model: yolov4-tiny: 31.30 (SE +/- 0.69, N = 3; MIN: 29.76 / MAX: 37.03)
Target: Vulkan GPUv2-yolov3 - Model: mobilenetv2-yolov3: 23.15 (SE +/- 0.17, N = 3; MIN: 22.82 / MAX: 28.93)
Target: Vulkan GPU - Model: resnet50: 22.63 (SE +/- 0.47, N = 3; MIN: 21.57 / MAX: 30.39)
Target: Vulkan GPU - Model: alexnet: 6.58 (SE +/- 0.16, N = 3; MIN: 6.19 / MAX: 11.02)
Target: Vulkan GPU - Model: resnet18: 12.46 (SE +/- 0.18, N = 3; MIN: 11.95 / MAX: 13.13)
Target: Vulkan GPU - Model: vgg16: 41.41 (SE +/- 1.07, N = 3; MIN: 39.19 / MAX: 48.36)
Target: Vulkan GPU - Model: googlenet: 22.25 (SE +/- 0.24, N = 3; MIN: 21.82 / MAX: 28.73)
Target: Vulkan GPU - Model: blazeface: 6.35 (SE +/- 0.05, N = 3; MIN: 6.16 / MAX: 11.26)
Target: Vulkan GPU - Model: efficientnet-b0: 16.99 (SE +/- 0.15, N = 3; MIN: 16.56 / MAX: 22.2)
Target: Vulkan GPU - Model: mnasnet: 11.69 (SE +/- 0.22, N = 3; MIN: 11.09 / MAX: 19.42)
Target: Vulkan GPU - Model: shufflenet-v2: 16.56 (SE +/- 0.09, N = 3; MIN: 16.28 / MAX: 17.55)
NCNN Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 OpenBenchmarking.org ms, Fewer Is Better NCNN 20241226 Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core 3 6 9 12 15 SE +/- 0.23, N = 3 13.33 MIN: 12.65 / MAX: 18.55 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 OpenBenchmarking.org ms, Fewer Is Better NCNN 20241226 Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core 3 6 9 12 15 SE +/- 0.40, N = 3 11.77 MIN: 11.14 / MAX: 13.15 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mobilenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20241226 Target: Vulkan GPU - Model: mobilenet Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core 6 12 18 24 30 SE +/- 0.17, N = 3 23.15 MIN: 22.82 / MAX: 28.93 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; ms, fewer is better)

Model                 Mean      SE      N     Min        Max
vision_transformer    57.21     0.39    12    52.00      568.96
regnety_400m          45.73     0.27    12    -425.63    85.35
squeezenet_ssd        23.98     0.32    12    21.56      65.28
yolov4-tiny           31.75     0.47    12    28.18      38.62
resnet50              22.82     0.25    12    21.27      32.64
alexnet                6.32     0.07    12     5.67      19.42
resnet18              12.69     0.15    12    11.43      28.06
vgg16                 42.12     0.44    12    38.09      61.81
googlenet             22.65     0.26    12    20.21      53.24
blazeface              6.39     0.05    12     5.90       7.26
efficientnet-b0       16.71     0.14    12    14.94      30.14
mnasnet               11.52     0.07    12    10.69      16.83
shufflenet-v2         16.76     0.05    12    16.18      40.16
mobilenet-v2          11.85     0.07    12    11.19      16.92

1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
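Each NCNN figure above is reported as a mean over N timed runs together with its standard error and observed min/max. A minimal sketch of that reduction, with the actual NCNN forward pass replaced by a stand-in sleep (the `run` callable is hypothetical, not the NCNN API):

```python
import statistics
import time

def benchmark(run, n=3):
    # Time `run` n times and reduce to the statistics reported above:
    # mean, standard error of the mean, and observed min/max (all in ms).
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        run()
        samples.append((time.perf_counter() - t0) * 1000.0)
    mean = statistics.fmean(samples)
    se = statistics.stdev(samples) / len(samples) ** 0.5 if n > 1 else 0.0
    return mean, se, min(samples), max(samples)

# Stand-in workload: ~10 ms of sleep instead of a real NCNN inference.
mean_ms, se_ms, min_ms, max_ms = benchmark(lambda: time.sleep(0.010), n=3)
```

The standard error (sample standard deviation divided by sqrt(N)) is what the "SE +/-" columns denote, which is why the N = 12 CPU runs tend to have tighter SE values than the N = 3 Vulkan runs despite similar spreads.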
spaCy 3.4.1 - Model: en_core_web_lg - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; tokens/sec, more is better)

Result: 15338 tokens/sec (SE +/- 33.20, N = 3)
Stress-NG 0.18.09 - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; Bogo Ops/s, more is better)

Test                              Mean            SE           N
Hyperbolic Trigonometric Math     293490.26       78.25        3
POSIX Regular Expressions         444669.60       185.23       3
System V Message Passing          14211326.19     27048.08     3
Glibc Qsort Data Sorting          1590.91         0.21         3
Glibc C String Functions          56864254.14     270620.44    3
Integer Bit Operations            9623549.83      1390.58      3
Bessel Math Operations            36898.14        2.68         3
Vector Floating Point             182971.03       489.30       3
Bitonic Integer Sort              672.51          0.35         3
Trigonometric Math                151938.42       47.69        3
Fused Multiply-Add                47688547.66     16313.26     3
Radix String Sort                 2158.48         14.11        3
Fractal Generator                 393.67          0.18         3
Context Switching                 15485954.92     27120.62     3
Wide Vector Math                  2734553.44      706.91       3
Logarithmic Math                  374205.22       588.05       3
Jpeg Compression                  66045.55        141.87       3
Exponential Math                  198051.50       1213.20      3
Socket Activity                   20499.66        5.10         3
Mixed Scheduler                   38187.39        172.62       3
Vector Shuffle                    42786.90        211.62       3
Memory Copying                    13628.14        18.97        3
Matrix 3D Math                    16067.53        58.20        3
Floating Point                    20022.51        76.48        3
x86_64 RdRand                     20427852.82     68905.39     3
Function Call                     45580.45        71.73        3
Integer Math                      4586743.65      19349.76     3
AVX-512 VNNI                      6540887.21      25442.23     3
Vector Math                       407859.79       2733.46      3
Matrix Math                       310400.31       26.15        3
Semaphores                        61709090.58     563816.56    3
Power Math                        119629.76       87.83        3
CPU Stress                        145536.61       271.68       3
CPU Cache                         771976.41       2865.93      3
SENDFILE                          916359.11       380.91       3
AVL Tree                          1194.06         1.00         3
Pthread                           135182.40       113.35       3
Forking                           62039.40        359.02       3
Cloning                           8039.79         68.82        3
Malloc                            106614259.32    781395.83    3
Atomic                            257.02          0.39         3
Mutex                             14723637.91     89617.88     3
MEMFD                             3377.15         3.40         3
Futex                             2352726.22      20321.47     3
Zlib                              6888.22         2.93         3
Poll                              5524301.71      2718.46      3
Pipe                              21174901.61     278876.82    12
NUMA                              400.80          2.65         3
MMAP                              12391.70        89.55        3
Hash                              12298718.43     46080.65     3

1. (CXX) g++ options: -O2 -std=gnu99 -lc -lm
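A quick sanity check on the Stress-NG results above: dividing each standard error by its mean gives the relative run-to-run noise, which stays well under 1% for these runs. A small sketch with three values copied from the data above:

```python
# (mean Bogo Ops/s, standard error) pairs copied from the Stress-NG results above
results = {
    "Hyperbolic Trigonometric Math": (293490.26, 78.25),
    "Glibc Qsort Data Sorting": (1590.91, 0.21),
    "Semaphores": (61709090.58, 563816.56),
}

# relative standard error, in percent of the mean
rse_pct = {name: se / mean * 100.0 for name, (mean, se) in results.items()}
```

Even the noisiest of the three (Semaphores, a heavily kernel-bound test) lands around 0.9%, so the three-run sample sizes are adequate for comparing these results.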
PyTorch 2.2.1 - Device: NVIDIA CUDA GPU - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; batches/sec, more is better)

Model               Batch   Mean      SE      N    Min       Max
Efficientnet_v2_l   512     46.72     0.07    3    39.24     48.63
Efficientnet_v2_l   256     47.34     0.50    3    39.26     50.11
Efficientnet_v2_l   64      47.80     0.44    3    2.23      49.81
Efficientnet_v2_l   32      46.77     0.61    3    36.59     48.40
Efficientnet_v2_l   16      47.32     0.24    3    36.64     48.53
Efficientnet_v2_l   1       51.52     0.62    4    36.53     55.15
ResNet-152          512     102.98    0.47    3    73.53     107.10
ResNet-152          256     102.81    0.62    3    -2.40     105.21
ResNet-152          64      102.21    0.84    3    73.94     104.89
ResNet-152          32      102.43    1.05    3    75.28     104.92
ResNet-152          16      102.11    0.30    3    74.46     105.04
ResNet-152          1       101.94    1.00    3    72.80     104.88
ResNet-50           512     283.02    3.04    3    168.78    291.41
ResNet-50           256     282.30    3.61    3    166.48    290.65
ResNet-50           64      279.19    1.76    3    167.62    285.62
ResNet-50           32      282.75    2.42    3    166.76    292.39
ResNet-50           16      279.17    2.06    3    168.32    287.01
ResNet-50           1       287.46    3.16    3    131.87    296.46
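The batches/sec numbers above come from timing repeated forward passes after a warm-up phase (important on CUDA, where the first iterations pay kernel-compilation and cache-fill costs). A minimal throughput loop sketch, with the model call replaced by a stand-in sleep rather than a real PyTorch forward pass:

```python
import time

def batches_per_sec(step, warmup=2, iters=20):
    # Warm up first (CUDA kernel launches, caches), then time a fixed
    # number of steps and report the steady-state rate.
    for _ in range(warmup):
        step()
    t0 = time.perf_counter()
    for _ in range(iters):
        step()
    return iters / (time.perf_counter() - t0)

# Stand-in for model(batch): ~5 ms of sleep per "batch".
rate = batches_per_sec(lambda: time.sleep(0.005))
```

Skipping the warm-up is one common way to get the occasional very low minimum visible in the table (e.g. the 2.23 batches/sec floor on Efficientnet_v2_l at batch 64).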
PyTorch 2.2.1 - Device: CPU - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; batches/sec, more is better)

Model               Batch   Mean     SE      N     Min      Max
Efficientnet_v2_l   512     6.09     0.06    3     1.78     6.47
Efficientnet_v2_l   256     6.07     0.06    6     3.72     6.69
Efficientnet_v2_l   64      6.18     0.06    3     4.86     6.54
Efficientnet_v2_l   32      6.05     0.04    3     4.81     6.36
Efficientnet_v2_l   16      5.99     0.05    3     4.56     6.36
Efficientnet_v2_l   1       8.92     0.13    3     5.44     9.23
ResNet-152          512     11.61    0.09    12    6.77     12.65
ResNet-152          256     11.45    0.03    3     10.75    12.11
ResNet-152          64      11.51    0.12    3     7.49     12.29
ResNet-152          32      12.29    0.13    3     7.38     12.68
ResNet-152          16      11.27    0.11    12    6.86     12.45
ResNet-152          1       13.97    0.11    3     8.70     15.19
ResNet-50           512     30.05    0.44    12    17.64    35.18
ResNet-50           256     29.01    0.32    5     16.97    32.29
ResNet-50           64      29.06    0.18    3     16.50    32.60
ResNet-50           32      29.03    0.21    3     19.55    32.27
ResNet-50           16      29.77    0.40    15    13.43    34.10
ResNet-50           1       34.93    0.39    15    13.13    41.07
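Putting the two PyTorch device runs side by side: at batch size 512 the H100 NVL delivers roughly an 8-9x throughput advantage over the dual EPYC 9274F CPUs, narrowing somewhat on the more memory-bound Efficientnet_v2_l. An illustrative cross-check, with the batch-512 means copied from the results above:

```python
# batch-512 batches/sec, copied from the PyTorch results above
gpu = {"ResNet-50": 283.02, "ResNet-152": 102.98, "Efficientnet_v2_l": 46.72}
cpu = {"ResNet-50": 30.05, "ResNet-152": 11.61, "Efficientnet_v2_l": 6.09}

# GPU-over-CPU throughput ratio per model
speedup = {model: gpu[model] / cpu[model] for model in gpu}
```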
TensorFlow Lite 2022-05-18 - Ubuntu 22.04 - 2 x AMD EPYC 9274F 24-Core (OpenBenchmarking.org; microseconds, fewer is better)

Model             Mean       SE        N
Mobilenet Quant   4098.48    12.05     3
Mobilenet Float   3350.72    26.31     15
NASNet Mobile     69069.7    562.50    3
Inception V4      37181.9    358.95    3
SqueezeNet        5088.14    29.41     3
LiteRT 2024-10-15 (Microseconds, fewer is better):
- Model: Quantized COCO SSD MobileNet v1: 5503.91 [SE +/- 71.29, N = 13]
- Model: Inception ResNet V2: 45507.6 [SE +/- 312.59, N = 15]
- Model: Mobilenet Float: 3345.85 [SE +/- 39.32, N = 3]
- Model: NASNet Mobile: 75527.3 [SE +/- 699.04, N = 7]
- Model: Inception V4: 38062.4 [SE +/- 455.18, N = 3]
- Model: SqueezeNet: 5245.27 [SE +/- 33.72, N = 3]
- Model: DeepLab V3: 8793.87 [SE +/- 57.34, N = 15]
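The TensorFlow Lite and LiteRT figures are per-inference latencies in microseconds, while results such as the PyTorch one are throughput numbers. To compare the two styles, a latency can be inverted into single-stream throughput; a small sketch (function name is illustrative, not from either harness):

```python
def us_to_per_sec(latency_us: float) -> float:
    """Convert a per-inference latency in microseconds to
    inferences per second (1 s = 1,000,000 us)."""
    return 1_000_000 / latency_us

# E.g. the ~3345.85 us Mobilenet Float latency above corresponds to
# roughly 299 single-stream inferences per second.
print(round(us_to_per_sec(3345.85), 1))
```

Note this assumes one inference at a time; batched or parallel execution would yield higher aggregate throughput than this simple inversion suggests.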
Intel MPI Benchmarks 2019.3:
- Test: IMB-MPI1 Sendrecv (Average usec, fewer is better): 79.47 [SE +/- 0.66, N = 3, MIN: 0.64 / MAX: 1174.19]
- Test: IMB-MPI1 Sendrecv (Average Mbytes/sec, more is better): 3393.59 [SE +/- 22.79, N = 3, MAX: 13584.14]
- Test: IMB-MPI1 Exchange (Average usec, fewer is better): 108.05 [SE +/- 3.64, N = 3, MIN: 1.13 / MAX: 1949.29]
- Test: IMB-MPI1 Exchange (Average Mbytes/sec, more is better): 6073.65 [SE +/- 18.92, N = 3, MAX: 26527.08]
- Test: IMB-P2P PingPong (Average Msg/sec, more is better): 28744589 [SE +/- 68029.72, N = 3, MIN: 15552 / MAX: 71388566]
1. (CXX) g++ options: -O0 -pedantic -fopenmp -lmpi_cxx -lmpi
R Benchmark (Seconds, fewer is better): 0.1446 [SE +/- 0.0014, N = 3] 1. R scripting front-end version 4.1.2 (2021-11-01)
Cython Benchmark 0.29.21 - Test: N-Queens (Seconds, fewer is better): 17.98 [SE +/- 0.14, N = 3]
Numpy Benchmark (Score, more is better): 565.16 [SE +/- 0.45, N = 3]
oneDNN 3.6 (ms, fewer is better):
- Harness: Recurrent Neural Network Inference - Engine: CPU: 398.49 [SE +/- 0.88, N = 3, MIN: 390.22]
- Harness: Recurrent Neural Network Training - Engine: CPU: 669.68 [SE +/- 1.20, N = 3, MIN: 653.95]
- Harness: Deconvolution Batch shapes_3d - Engine: CPU: 1.10328 [SE +/- 0.00100, N = 3, MIN: 1.05]
- Harness: Deconvolution Batch shapes_1d - Engine: CPU: 8.66502 [SE +/- 0.03403, N = 3, MIN: 7.33]
- Harness: Convolution Batch Shapes Auto - Engine: CPU: 0.673369 [SE +/- 0.001462, N = 3, MIN: 0.64]
- Harness: IP Shapes 3D - Engine: CPU: 0.310889 [SE +/- 0.002020, N = 3, MIN: 0.28]
- Harness: IP Shapes 1D - Engine: CPU: 0.827742 [SE +/- 0.005050, N = 3, MIN: 0.77]
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Timed LLVM Compilation 16.0 - Build System: Unix Makefiles (Seconds, fewer is better): 198.90 [SE +/- 1.10, N = 3]
Timed LLVM Compilation 16.0 - Build System: Ninja (Seconds, fewer is better): 135.29 [SE +/- 1.19, N = 3]
Timed Linux Kernel Compilation 6.8 - Build: allmodconfig (Seconds, fewer is better): 258.64 [SE +/- 0.84, N = 3]
Timed Linux Kernel Compilation 6.8 - Build: defconfig (Seconds, fewer is better): 30.30 [SE +/- 0.22, N = 14]
Timed GCC Compilation 13.2 - Time To Compile (Seconds, fewer is better): 913.62 [SE +/- 0.79, N = 3]
Epoch 4.19.4 - Epoch3D Deck: Cone (Seconds, fewer is better): 384.19 [SE +/- 4.25, N = 3] 1. (F9X) gfortran options: -O3 -std=f2003 -Jobj -lsdf -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, fewer is better): 81.58 [SE +/- 0.77, N = 3] 1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4a -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
NAS Parallel Benchmarks 3.4 (Total Mop/s, more is better):
- Test / Class: SP.C: 108596.64 [SE +/- 153.92, N = 3]
- Test / Class: SP.B: 146367.34 [SE +/- 1283.80, N = 8]
- Test / Class: MG.C: 119325.63 [SE +/- 947.46, N = 3]
- Test / Class: LU.C: 199460.27 [SE +/- 202.80, N = 3]
- Test / Class: IS.D: 4064.08 [SE +/- 15.41, N = 3]
- Test / Class: FT.C: 102149.25 [SE +/- 426.91, N = 3]
- Test / Class: EP.D: 7273.96 [SE +/- 66.24, N = 7]
- Test / Class: EP.C: 6692.95 [SE +/- 96.53, N = 15]
- Test / Class: CG.C: 50419.73 [SE +/- 396.73, N = 15]
- Test / Class: BT.C: 176096.16 [SE +/- 248.09, N = 3]
1. (F9X) gfortran options: -O3 -march=native -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz 2. Open MPI 4.1.2
HPL Linpack 2.3 (GFLOPS, more is better): 631.30 [SE +/- 0.39, N = 3] 1. (CC) gcc options: -O2 -lopenblas -lm -lmpi
Glibc Benchmarks 2.39 (ns, fewer is better):
- Benchmark: pthread_once: 5.65547 [SE +/- 0.00045, N = 3]
- Benchmark: sincos: 38.98 [SE +/- 0.01, N = 3]
- Benchmark: ffsll: 5.67055 [SE +/- 0.00011, N = 3]
- Benchmark: atanh: 28.18 [SE +/- 0.00, N = 3]
- Benchmark: asinh: 22.58 [SE +/- 0.00, N = 3]
- Benchmark: tanh: 26.97 [SE +/- 0.00, N = 3]
- Benchmark: sqrt: 8.24574 [SE +/- 0.00536, N = 3]
- Benchmark: sinh: 22.96 [SE +/- 0.02, N = 3]
- Benchmark: modf: 6.61957 [SE +/- 0.00034, N = 3]
- Benchmark: log2: 10.42 [SE +/- 0.00, N = 3]
- Benchmark: sin: 63.12 [SE +/- 0.00, N = 3]
- Benchmark: pow: 35.56 [SE +/- 0.13, N = 3]
- Benchmark: ffs: 5.67032 [SE +/- 0.00021, N = 3]
- Benchmark: exp: 15.14 [SE +/- 0.00, N = 3]
- Benchmark: cos: 70.82 [SE +/- 0.01, N = 3]
1. (CC) gcc options: -pie -nostdlib -nostartfiles -lgcc -lgcc_s
Compile Bench 0.6 (MB/s, more is better):
- Test: Read Compiled Tree: 2657.42 [SE +/- 6.72, N = 3]
- Test: Initial Create: 399.79 [SE +/- 4.73, N = 3]
- Test: Compile: 1613.38 [SE +/- 11.78, N = 3]
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, more is better): 193.81 [SE +/- 6.45, N = 12] 1. (CXX) g++ options: -O3
NCNN 20241226 (ms, fewer is better):
- Target: Vulkan GPU - Model: FastestDet: 17.29 [SE +/- 0.64, N = 3, MIN: 15.96 / MAX: 23.34]
- Target: CPU - Model: FastestDet: 17.79 [SE +/- 0.32, N = 12, MIN: 16.23 / MAX: 24.59]
- Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3: 23.31 [SE +/- 0.42, N = 12, MIN: 20.72 / MAX: 49.06]
- Target: CPU-v3-v3 - Model: mobilenet-v3: 14.36 [SE +/- 1.23, N = 12, MIN: 12.51 / MAX: 1452]
- Target: CPU - Model: mobilenet: 23.31 [SE +/- 0.42, N = 12, MIN: 20.72 / MAX: 49.06]
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
spaCy 3.4.1 - Model: en_core_web_trf (tokens/sec, more is better): 2751 [SE +/- 96.13, N = 3]
Stress-NG 0.18.09 (Bogo Ops/s, more is better):
- Test: IO_uring: 174744.18 [SE +/- 4811.73, N = 12]
- Test: Crypto: 402622178.28 [SE +/- 64894044.34, N = 15]
1. (CXX) g++ options: -O2 -std=gnu99 -lc -lm
TensorFlow Lite 2022-05-18 - Model: Inception ResNet V2 (Microseconds, fewer is better): 68557.9 [SE +/- 1796.42, N = 15]
LiteRT 2024-10-15 - Model: Mobilenet Quant (Microseconds, fewer is better): 3255.76 [SE +/- 51.59, N = 15]
Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 PingPong (Average Mbytes/sec, more is better): 3739.53 [SE +/- 61.28, N = 15, MIN: 3.8 / MAX: 14414.3] 1. (CXX) g++ options: -O0 -pedantic -fopenmp -lmpi_cxx -lmpi
Phoronix Test Suite v10.8.5