Nvidia

KVM testing on Ubuntu 24.04 via the Phoronix Test Suite.

ASPEED - 2 x Intel Xeon Gold 6226R

Processor: 2 x Intel Xeon Gold 6226R @ 3.90GHz (32 Cores / 64 Threads), Motherboard: (5.14 BIOS), Chipset: Intel Sky Lake-E DMI3 Registers, Memory: 512GB, Disk: 2 x 8002GB INTEL SSDPE2KX080T8, Graphics: ASPEED 16GB, Audio: NVIDIA GA104 HD Audio, Monitor: 27B2G5, Network: 2 x Intel X722 for 1GbE + 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb

OS: Ubuntu 24.04, Kernel: 6.8.0-38-generic (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.131, Compiler: GCC 13.2.0 + CUDA 12.4, File-System: ext4, Screen Resolution: 1920x1080

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x5003605
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.04.57.00.08
Python Notes: Python 3.8.13
Security Notes: gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + reg_file_data_sampling: Not affected + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: SW loop KVM: SW loop + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled

5x A5000 kw-dl580-3-4 NVIDIA

Processor: 4 x Intel Xeon E7-4880 v2 (60 Cores / 120 Threads), Motherboard: QEMU Standard PC (Q35 + ICH9 2009) (edk2-20240813-1.fc40 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 4 GB RAM, Disk: 21GB VIRTUAL-DISK, Graphics: Red Hat QXL paravirtual graphic card 22GB, Audio: QEMU Generic, Network: 2 x Red Hat Virtio 1.0 device

OS: Ubuntu 24.04, Kernel: 6.8.0-45-generic (x86_64), Display Server: X Server, Display Driver: NVIDIA, OpenCL: OpenCL 3.0 CUDA 12.4.131, Compiler: GCC 13.2.0 + CUDA 12.0, File-System: ext4, Screen Resolution: 1024x768, System Layer: KVM

Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Notes: CPU Microcode: 0x715
Graphics Notes: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.6d.00.0d
Python Notes: Python 3.12.3
Security Notes: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion; VMX: flush not necessary SMT vulnerable + mds: Mitigation of Clear buffers; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines; IBPB: conditional; IBRS_FW; STIBP: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Retpoline + srbds: Not affected + tsx_async_abort: Not affected
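
The Security Notes lines are easier to compare between the two systems once split into per-vulnerability entries. A minimal sketch, assuming entries are separated by " + " and each entry is "name: status" (the function name parse_security_notes is hypothetical, not part of the Phoronix Test Suite):

```python
def parse_security_notes(notes):
    """Split a 'Security Notes' string into a {vulnerability: status} dict.

    Assumes entries are joined by ' + ' and that the first ': ' in each
    entry separates the vulnerability name from its status.
    """
    result = {}
    for entry in notes.split(" + "):
        name, _, status = entry.partition(": ")
        result[name.strip()] = status.strip()
    return result


# Excerpt copied from the KVM guest's Security Notes line above.
guest_notes = (
    "meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations"
    " + reg_file_data_sampling: Not affected + retbleed: Not affected"
)

parsed = parse_security_notes(guest_notes)
# Entries whose status reports that no mitigation is in place.
unmitigated = [name for name, status in parsed.items() if "No mitigations" in status]
print(unmitigated)
```

Splitting on the first ": " only keeps statuses such as "Unknown: No mitigations" intact.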

NCNN

NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing

The CUDA and OpenCL version of Vetter's Scalable HeterOgeneous Computing benchmark suite. SHOC provides a number of different benchmark programs for evaluating the performance and stability of compute devices. Learn more via the OpenBenchmarking.org test page.

Target: OpenCL - Benchmark: Max SP Flops

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

LuxCoreRender

LuxCoreRender is an open-source 3D physically based renderer formerly known as LuxRender. LuxCoreRender supports CPU-based rendering as well as GPU acceleration via OpenCL, NVIDIA CUDA, and NVIDIA OptiX interfaces. Learn more via the OpenBenchmarking.org test page.

NeatBench

NeatBench is a benchmark of the cross-platform Neat Video software on the CPU, with optional GPU (OpenCL / CUDA) support. Learn more via the OpenBenchmarking.org test page.

Acceleration: GPU

ASPEED - 2 x Intel Xeon Gold 6226R: The test run did not produce a result (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test run did not produce a result.

FAHBench

FAHBench is a Folding@Home benchmark on the GPU. Learn more via the OpenBenchmarking.org test page.

ViennaCL

ViennaCL is an open-source linear algebra library written in C++ and with support for OpenCL and OpenMP. This test profile makes use of ViennaCL's built-in benchmarks. Learn more via the OpenBenchmarking.org test page.

Hashcat

Hashcat is an open-source, advanced password recovery tool supporting GPU acceleration with OpenCL, NVIDIA CUDA, and Radeon ROCm. Learn more via the OpenBenchmarking.org test page.

PlaidML

This test profile uses the PlaidML deep learning framework developed by Intel to run various benchmarks. Learn more via the OpenBenchmarking.org test page.

FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'
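
The ModuleNotFoundError above means the Python interpreter used by the test could not import tensorflow. A minimal pre-flight sketch, assuming the check runs in the same interpreter as the benchmark (the helper missing_modules is hypothetical and not part of the PlaidML test profile):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "tensorflow" is the module named in the error above.
print(missing_modules(["tensorflow"]))
```

An empty list means the module can be found. The check takes top-level module names only, since find_spec can raise for a dotted name whose parent package is absent.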

PlaidML

FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

Rodinia

Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes select OpenCL, NVIDIA CUDA and OpenMP test binaries at the moment. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

PlaidML

FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

clpeak

Clpeak is designed to test the peak capabilities of OpenCL devices. Learn more via the OpenBenchmarking.org test page.

PlaidML

FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ModuleNotFoundError: No module named 'tensorflow'

Mixbench

A benchmark suite for GPUs on mixed operational intensity kernels. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: S3D

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

cl-mem

A basic OpenCL memory benchmark. Learn more via the OpenBenchmarking.org test page.

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Triad

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Bus Speed Readback

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Reduction

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

FinanceBench

FinanceBench is a collection of financial program benchmarks with support for benchmarking on the GPU via OpenCL and on the CPU via OpenMP. The test cases cover the Black-Scholes-Merton process with an analytic European option engine, the QMC (Sobol) Monte Carlo method (equity option example), a fixed-rate bond with a flat forward curve, and a securities repurchase agreement (repo). FinanceBench was originally written by the Cavazos Lab at the University of Delaware. Learn more via the OpenBenchmarking.org test page.

MandelGPU

MandelGPU is an OpenCL benchmark and this test runs with the OpenCL rendering float4 kernel with a maximum of 4096 iterations. Learn more via the OpenBenchmarking.org test page.

OpenCL Device: GPU

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times).

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status.

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

Target: OpenCL - Benchmark: Bus Speed Download

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./shoc: 3: ./bin/shocdriver: not found

ArrayFire

Test: Conjugate Gradient OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test run did not produce a result (reported three times). E: ./arrayfire: 7: ./cg_opencl: not found

5x A5000 kw-dl580-3-4 NVIDIA: The test run did not produce a result. E: ./arrayfire: 7: ./cg_opencl: not found

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

Backend: OpenCL

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times). E: ./lczero: line 4: ./lc0: No such file or directory

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./lczero: line 4: ./lc0: No such file or directory

RedShift Demo

This is a test of MAXON's RedShift demo build that currently requires NVIDIA GPU acceleration. Learn more via the OpenBenchmarking.org test page.

ASPEED - 2 x Intel Xeon Gold 6226R: The test quit with a non-zero exit status (reported three times). E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found

5x A5000 kw-dl580-3-4 NVIDIA: The test quit with a non-zero exit status. E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found
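
The distinct "E:" messages in the failures above fall into two groups: a file the test script tried to execute was not found, or a Python module could not be imported. A rough sketch of that grouping, assuming the message formats seen in this result file (the regular expressions and category names are illustrative, not a Phoronix Test Suite feature):

```python
import re
from collections import Counter

# Patterns inferred from the error lines in this result file only.
MISSING_MODULE = re.compile(r"No module named '([^']+)'")
MISSING_FILE = re.compile(r"(\S+): (?:not found|No such file or directory)\s*$")


def classify(error_line):
    """Return (category, detail) for a single 'E:' line."""
    match = MISSING_MODULE.search(error_line)
    if match:
        return ("missing_python_module", match.group(1))
    match = MISSING_FILE.search(error_line)
    if match:
        return ("missing_file", match.group(1))
    return ("other", error_line)


# The five distinct error messages reported above.
errors = [
    "E: ./shoc: 3: ./bin/shocdriver: not found",
    "E: ModuleNotFoundError: No module named 'tensorflow'",
    "E: ./arrayfire: 7: ./cg_opencl: not found",
    "E: ./lczero: line 4: ./lc0: No such file or directory",
    "E: ./redshift: 3: /usr/redshift/bin/redshiftBenchmark: not found",
]

summary = Counter(category for category, _ in map(classify, errors))
for line in errors:
    print(classify(line))
```

Four of the five distinct messages point to a missing executable, which suggests the corresponding tests did not finish installing, although the log does not confirm the cause.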

79 Results Shown

NCNN
SHOC Scalable HeterOgeneous Computing
NCNN:
Vulkan GPU - FastestDet
Vulkan GPU - vision_transformer
Vulkan GPU - regnety_400m
Vulkan GPU - squeezenet_ssd
Vulkan GPU - yolov4-tiny
Vulkan GPU - resnet50
Vulkan GPU - alexnet
Vulkan GPU - resnet18
Vulkan GPU - vgg16
Vulkan GPU - googlenet
Vulkan GPU - blazeface
Vulkan GPU - efficientnet-b0
Vulkan GPU - mnasnet
Vulkan GPU - shufflenet-v2
Vulkan GPU-v3-v3 - mobilenet-v3
Vulkan GPU-v2-v2 - mobilenet-v2
Vulkan GPU - mobilenet
LuxCoreRender:
DLSC - GPU
LuxCore Benchmark - GPU
Orange Juice - GPU
FAHBench
ViennaCL:
CPU BLAS - dGEMM-TT
CPU BLAS - dGEMM-TN
CPU BLAS - dGEMM-NT
CPU BLAS - dGEMM-NN
CPU BLAS - dGEMV-T
CPU BLAS - dGEMV-N
CPU BLAS - dDOT
CPU BLAS - dAXPY
CPU BLAS - dCOPY
CPU BLAS - sDOT
CPU BLAS - sAXPY
CPU BLAS - sCOPY
Hashcat
LuxCoreRender
PlaidML
Hashcat:
SHA1
SHA-512
ViennaCL
Hashcat:
TrueCrypt RIPEMD160 + XTS
7-Zip
ViennaCL:
OpenCL BLAS - dGEMM-TT
OpenCL BLAS - dGEMM-NT
OpenCL BLAS - dGEMM-NN
OpenCL BLAS - dGEMV-T
OpenCL BLAS - dGEMV-N
OpenCL BLAS - dDOT
OpenCL BLAS - dAXPY
OpenCL BLAS - dCOPY
OpenCL BLAS - sDOT
OpenCL BLAS - sAXPY
OpenCL BLAS - sCOPY
LuxCoreRender
Rodinia
SHOC Scalable HeterOgeneous Computing
PlaidML
clpeak
PlaidML:
Yes - Inference - Mobilenet - OpenCL
No - Inference - Mobilenet - OpenCL
clpeak
Mixbench:
OpenCL - Integer
OpenCL - Single Precision
OpenCL - Double Precision
SHOC Scalable HeterOgeneous Computing:
OpenCL - GEMM SGEMM_N
OpenCL - S3D
cl-mem:
Copy
Read
Write
SHOC Scalable HeterOgeneous Computing
clpeak
SHOC Scalable HeterOgeneous Computing
clpeak
SHOC Scalable HeterOgeneous Computing:
OpenCL - Bus Speed Readback
OpenCL - Reduction
FinanceBench
SHOC Scalable HeterOgeneous Computing:
OpenCL - MD5 Hash
OpenCL - Bus Speed Download

ASPEED - 2 x Intel Xeon Gold 6226R

Testing initiated at 19 July 2024 23:42 by user malogica.

5x A5000 kw-dl580-3-4 NVIDIA

Testing initiated at 7 October 2024 22:58 by user root.
