Nvidia

KVM testing on Ubuntu 24.04 via the Phoronix Test Suite.

ASPEED - 2 x Intel Xeon Gold 6226R

Processor: 2 x Intel Xeon Gold 6226R @ 3.90GHz (32 Cores / 64 Threads), Motherboard: (5.14 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 512GB, Disk: 2 x 8002GB INTEL SSDPE2KX080T8, Graphics: ASPEED 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: 27B2G5, Network: 2 x Intel X722 for 1GbE + 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb

OS: Ubuntu 24.04, Kernel: 6.8.0-38-generic (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.131, Compiler: GCC 13.2.0 + CUDA 12.4, File-System: ext4, Screen Resolution: 1920x1080

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x5003605
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.04.57.00.08
Python Notes: Python 3.8.13
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + reg_file_data_sampling: Not affected + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: SW loop KVM: SW loop + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled

5x A5000 kw-dl580-3-4 NVIDIA

Processor: 4 x Intel Xeon E7-4880 v2 (60 Cores / 120 Threads), Motherboard: QEMU Standard PC (Q35 + ICH9 2009) (edk2-20240813-1.fc40 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 4 GB RAM, Disk: 21GB VIRTUAL-DISK, Graphics: Red Hat QXL paravirtual graphic card 22GB, Audio: QEMU Generic, Network: 2 x Red Hat Virtio 1.0 device

OS: Ubuntu 24.04, Kernel: 6.8.0-45-generic (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.131, Compiler: GCC 13.2.0 + CUDA 12.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0x715
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.6d.00.0d
Python Notes: Python 3.12.3
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion; VMX: flush not necessary SMT vulnerable + mds: Mitigation of Clear buffers; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines; IBPB: conditional; IBRS_FW; STIBP: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Retpoline + srbds: Not affected + tsx_async_abort: Not affected

PlaidML

This test profile uses the PlaidML deep learning framework, developed by Intel, to run various benchmarks. Learn more via the OpenBenchmarking.org test page.

FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'
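Every PlaidML failure on the VM reports `ModuleNotFoundError: No module named 'tensorflow'`. A minimal pre-flight sketch, assuming it runs under the same interpreter the test profile invokes (Python 3.12.3 per the system notes), can report which modules are missing before a benchmark is launched. The helper `missing_modules` is hypothetical and is not part of the Phoronix Test Suite.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    # find_spec() returns None for an uninstalled top-level module.
    # Dotted names would import their parent package first, so keep names top-level.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "tensorflow" is the module named in the error messages above.
    absent = missing_modules(["tensorflow"])
    if absent:
        print("Missing Python modules:", ", ".join(absent))
    else:
        print("All required Python modules are importable.")
```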

NeatBench

NeatBench benchmarks the cross-platform Neat Video software on the CPU, with optional GPU (OpenCL / CUDA) acceleration. Learn more via the OpenBenchmarking.org test page.

Acceleration: GPU

ASPEED - 2 x Intel Xeon Gold 6226R: The test run did not produce a result (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test run did not produce a result.

SHOC Scalable HeterOgeneous Computing

This is the CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

Target: OpenCL - Benchmark: Triad

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Reduction

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Bus Speed Download

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Bus Speed Readback

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Texture Read Bandwidth

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

cl-mem

A basic OpenCL memory benchmark. Learn more via the OpenBenchmarking.org test page.
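The cl-mem results are reported as Copy, Read, and Write bandwidth. As a rough illustration of how such a figure is derived (bytes moved divided by elapsed time), the sketch below times a bulk copy in host memory with Python. It does not use OpenCL, it counts each copied byte once (an assumption; cl-mem's own accounting may differ), and the function name is hypothetical.

```python
import time


def copy_bandwidth_gbs(size_bytes=64 * 1024 * 1024, repeats=5):
    """Best-of-N host-memory copy bandwidth in GB/s (1 GB = 1e9 bytes)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # equal-length slice assignment performs one bulk copy
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return size_bytes / max(best, 1e-9) / 1e9


if __name__ == "__main__":
    print(f"Host copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Keeping the best of several repeats filters out scheduling noise, which is the usual way peak-bandwidth figures are reported.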

ViennaCL

ViennaCL is an open-source linear algebra library written in C++ with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

Mixbench

A benchmark suite for GPUs on mixed operational intensity kernels. Learn more via the OpenBenchmarking.org test page.
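Mixbench sweeps kernels across a range of operational intensities (floating-point operations per byte of memory traffic). Results of this kind are commonly read against the roofline model, in which attainable throughput is the lesser of peak compute and intensity times memory bandwidth. The sketch below shows that bound; the peak and bandwidth figures are hypothetical placeholders, not measured A5000 values, and the function names are illustrative.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound for a kernel with the given intensity (FLOP/byte)."""
    # FLOP/byte * GB/s = GFLOP/s; memory-bound below the ridge, compute-bound above.
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOP/byte) at which the memory and compute limits meet."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    PEAK_GFLOPS = 1000.0   # hypothetical device peak
    BANDWIDTH_GBS = 500.0  # hypothetical memory bandwidth
    for intensity in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
        bound = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(f"{intensity:5.2f} FLOP/byte -> {bound:7.1f} GFLOP/s")
    print("Ridge point:", ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS), "FLOP/byte")
```

Separate peak values would apply to the single- and double-precision Mixbench runs, since GPUs typically have very different FP32 and FP64 peaks.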

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: FFT SP

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: GEMM SGEMM_N

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Max SP Flops

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

clpeak

ViennaCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Mixbench

clpeak

Hashcat

Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.
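Hashcat's benchmark mode reports throughput in hashes per second for each hash type (MD5, SHA1, and the others in the results list). The single-threaded CPU sketch below computes the same kind of metric with Python's `hashlib`; it only illustrates the unit, runs far slower than a GPU, and its candidate scheme and function name are hypothetical.

```python
import hashlib
import time


def md5_hashes_per_second(count=200_000):
    """Hash `count` distinct 8-byte candidates with MD5 and return hashes/s."""
    start = time.perf_counter()
    for i in range(count):
        # Each candidate is simply the loop counter encoded as 8 bytes.
        hashlib.md5(i.to_bytes(8, "little")).digest()
    elapsed = time.perf_counter() - start
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    # RFC 1321 test vector: MD5("abc").
    assert hashlib.md5(b"abc").hexdigest() == "900150983cd24fb0d6963f7d28e17f72"
    print(f"{md5_hashes_per_second():,.0f} MD5 H/s (single CPU thread)")
```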

LuxCoreRender

LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine driven by neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times). E: ./lczero: line 4: ./lc0: No such file or directory

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

FAHBench

FAHBench is a Folding@Home benchmark on the GPU. Learn more via the OpenBenchmarking.org test page.

MandelGPU

MandelGPU is an OpenCL benchmark; this test uses the OpenCL float4 rendering kernel with a maximum of 4096 iterations. Learn more via the OpenBenchmarking.org test page.
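The core of any Mandelbrot benchmark is the escape-time loop: iterate z = z^2 + c starting from z = 0 until |z| exceeds 2 or the iteration cap is reached. The scalar Python sketch below uses the 4096-iteration cap; it is an assumption that MandelGPU's float4 kernel evaluates the same recurrence with vector types across several pixels at once, and none of its actual code is reproduced here. The function names are hypothetical.

```python
def escape_iterations(c: complex, max_iter: int = 4096) -> int:
    """Iterations before |z| exceeds 2 under z -> z*z + c, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        # Compare the squared magnitude with 4 to avoid a square root.
        if z.real * z.real + z.imag * z.imag > 4.0:
            return n + 1
    return max_iter


def render(width=48, height=16, max_iter=4096):
    """Coarse ASCII view: '#' marks points that never escape within max_iter."""
    rows = []
    for y in range(height):
        im = 1.2 - 2.4 * y / (height - 1)
        row = ""
        for x in range(width):
            re = -2.0 + 2.6 * x / (width - 1)
            inside = escape_iterations(complex(re, im), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Points inside the set always run the full 4096 iterations, which is why the iteration cap dominates the benchmark's cost.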

OpenCL Device: GPU

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status.

ArrayFire

Test: Conjugate Gradient OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test run did not produce a result (reported three times). E: ./arrayfire: 7: ./cg_opencl: not found

5x A5000 kw-dl580-3-4 NVIDIA: The test run did not produce a result. E: ./arrayfire: 7: ./cg_opencl: not found

FinanceBench

FinanceBench is a collection of financial program benchmarks that supports GPU benchmarking via OpenCL and CPU benchmarking via OpenMP. The FinanceBench test cases focus on the Black-Scholes-Merton process with an analytic European option engine, the QMC (Sobol) Monte Carlo method (equity option example), a fixed-rate bond with a flat forward curve, and a repo (securities repurchase agreement). FinanceBench was originally written by the Cavazos Lab at the University of Delaware. Learn more via the OpenBenchmarking.org test page.
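One of the FinanceBench cases prices European options with the analytic Black-Scholes-Merton formula. The sketch below is a plain-Python version of that closed-form formula, written for illustration with a continuous dividend yield; it is not FinanceBench's OpenCL or OpenMP code, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_price(spot, strike, rate, vol, t, dividend=0.0, call=True):
    """Black-Scholes-Merton price of a European option (t in years)."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate - dividend + 0.5 * vol * vol) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    disc_spot = spot * math.exp(-dividend * t)   # dividend-discounted spot
    disc_strike = strike * math.exp(-rate * t)   # present value of the strike
    if call:
        return disc_spot * norm_cdf(d1) - disc_strike * norm_cdf(d2)
    return disc_strike * norm_cdf(-d2) - disc_spot * norm_cdf(-d1)


if __name__ == "__main__":
    # At-the-money one-year call, 5% rate, 20% volatility.
    print(f"{bsm_price(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")  # ≈ 10.45
```

Because every option is priced independently, the formula parallelizes trivially, which is what makes it a convenient GPU versus CPU comparison.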

NCNN

NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.

RedShift Demo

This is a test of MAXON's RedShift demo build that currently requires NVIDIA GPU acceleration. Learn more via the OpenBenchmarking.org test page.

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times). E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found

Rodinia

Rodinia is a suite focused on accelerating compute-intensive applications with accelerators. The included applications support the CUDA, OpenMP, and OpenCL parallel models. This profile currently uses select OpenCL, NVIDIA CUDA, and OpenMP test binaries. Learn more via the OpenBenchmarking.org test page.

79 Results Shown

PlaidML:
No - Inference - IMDB LSTM - OpenCL
No - Inference - Mobilenet - OpenCL
Yes - Inference - Mobilenet - OpenCL
No - Inference - DenseNet 201 - OpenCL
SHOC Scalable HeterOgeneous Computing:
OpenCL - Triad
OpenCL - Reduction
OpenCL - Bus Speed Download
OpenCL - Bus Speed Readback
OpenCL - Texture Read Bandwidth
cl-mem:
Copy
Read
Write
ViennaCL:
CPU BLAS - sCOPY
CPU BLAS - sAXPY
CPU BLAS - sDOT
CPU BLAS - dCOPY
CPU BLAS - dAXPY
CPU BLAS - dDOT
CPU BLAS - dGEMV-N
CPU BLAS - dGEMV-T
OpenCL BLAS - sCOPY
OpenCL BLAS - sAXPY
OpenCL BLAS - sDOT
OpenCL BLAS - dCOPY
OpenCL BLAS - dAXPY
OpenCL BLAS - dDOT
OpenCL BLAS - dGEMV-N
OpenCL BLAS - dGEMV-T
clpeak
Mixbench:
OpenCL - Double Precision
OpenCL - Single Precision
SHOC Scalable HeterOgeneous Computing:
OpenCL - S3D
OpenCL - FFT SP
OpenCL - GEMM SGEMM_N
OpenCL - Max SP Flops
clpeak:
Single-Precision Float
Double-Precision Double
ViennaCL:
CPU BLAS - dGEMM-NN
CPU BLAS - dGEMM-NT
CPU BLAS - dGEMM-TN
CPU BLAS - dGEMM-TT
OpenCL BLAS - dGEMM-NN
OpenCL BLAS - dGEMM-NT
OpenCL BLAS - dGEMM-TT
OpenCL BLAS - dGEMM-TN
SHOC Scalable HeterOgeneous Computing
Mixbench
clpeak
Hashcat:
MD5
SHA1
7-Zip
SHA-512
TrueCrypt RIPEMD160 + XTS
LuxCoreRender:
DLSC - GPU
Danish Mood - GPU
Orange Juice - GPU
LuxCore Benchmark - GPU
Rainbow Colors and Prism - GPU
FAHBench
FinanceBench
NCNN:
Vulkan GPU - mobilenet
Vulkan GPU-v2-v2 - mobilenet-v2
Vulkan GPU-v3-v3 - mobilenet-v3
Vulkan GPU - shufflenet-v2
Vulkan GPU - mnasnet
Vulkan GPU - efficientnet-b0
Vulkan GPU - blazeface
Vulkan GPU - googlenet
Vulkan GPU - vgg16
Vulkan GPU - resnet18
Vulkan GPU - alexnet
Vulkan GPU - resnet50
Vulkan GPU - yolov4-tiny
Vulkan GPU - squeezenet_ssd
Vulkan GPU - regnety_400m
Vulkan GPU - vision_transformer
Vulkan GPU - FastestDet
Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
Rodinia

ASPEED - 2 x Intel Xeon Gold 6226R

Testing initiated at 19 July 2024 23:42 by user malogica.

5x A5000 kw-dl580-3-4 NVIDIA

Testing initiated at 7 October 2024 22:58 by user root.
