a40-ml KVM testing on Ubuntu 22.04 via the Phoronix Test Suite.

System: NVIDIA A40 - 80 x Intel Xeon

  Processor: 80 x Intel Xeon (Icelake) (80 Cores), Motherboard: Nutanix AHV (0.0.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 8 x 16 GB RAM Red Hat, Disk: 8796GB VDISK, Graphics: NVIDIA A40 48GB, Network: Red Hat Virtio device

  OS: Ubuntu 22.04, Kernel: 6.5.0-45-generic (x86_64), Display Driver: NVIDIA, Vulkan: 1.3.255, Compiler: GCC 11.4.0, File-System: ext4, Screen Resolution: 1280x1024, System Layer: KVM

All results below are for this single system.

SHOC Scalable HeterOgeneous Computing 2020-04-17, Target: OpenCL (higher is better)
  GEMM SGEMM_N: 6124.24 GFLOPS
  Reduction: 334.48 GB/s
  MD5 Hash: 41.27 GHash/s
  FFT SP: 1822.04 GFLOPS
  Triad: 24.42 GB/s
  S3D: 347.34 GFLOPS
  Texture Read Bandwidth: 1935.67 GB/s
  Bus Speed Readback: 26.40 GB/s
  Bus Speed Download: 25.33 GB/s
  Max SP Flops: 37081.3 GFLOPS

TensorFlow Lite 2022-05-18, Microseconds (lower is better)
  Mobilenet Quant: 3085.49
  Mobilenet Float: 1857.44
  NASNet Mobile: 32777.4
  Inception V4: 21435.0
  SqueezeNet: 2726.29
  Inception ResNet V2: 40922.8

LiteRT 2024-10-15, Microseconds (lower is better)
  Quantized COCO SSD MobileNet v1: 3369.02
  Inception ResNet V2: 25768.9
  Mobilenet Quant: 1719.85
  Mobilenet Float: 1791.25
  NASNet Mobile: 37733.5
  Inception V4: 21955.2
  SqueezeNet: 2809.08
  DeepLab V3: 4389.58

oneDNN 3.6, Engine: CPU, ms per harness (lower is better)
  Recurrent Neural Network Inference: 766.91
  Recurrent Neural Network Training: 1129.72
  Deconvolution Batch shapes_3d: 1.32330
  Deconvolution Batch shapes_1d: 7.21383
  Convolution Batch Shapes Auto: 3.11680
  IP Shapes 3D: 1.61857
  IP Shapes 1D: 0.786227

Other results
  RNNoise 0.2, Input: 26 Minute Long Talking Sample: 16.92 seconds (lower is better)
  R Benchmark: 0.1618 seconds (lower is better)
  DeepSpeech 0.6, Acceleration: CPU: 98.10 seconds (lower is better)
  Numpy Benchmark: 359.47 Score (higher is better)

Tests listed without a result value
  TensorFlow 2.16.1, images/sec (higher is better): all 48 combinations of Device: GPU or CPU; Batch Size: 1, 16, 32, 64, 256 or 512; Model: ResNet-50, GoogLeNet, AlexNet or VGG-16.
  PyTorch 2.2.1, batches/sec (higher is better): all 36 combinations of Device: NVIDIA CUDA GPU or CPU; Batch Size: 1, 16, 32, 64, 256 or 512; Model: Efficientnet_v2_l, ResNet-152 or ResNet-50.
  LeelaChessZero 0.31.1, Backend: BLAS, Nodes Per Second (higher is better).
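The lists above give raw values only. As a small illustration of how they can be compared, the sketch below converts the microsecond figures for the six models that appear under both TensorFlow Lite 2022-05-18 and LiteRT 2024-10-15 into inferences per second and a speedup ratio. It also expresses SGEMM_N as a share of SHOC's own Max SP Flops figure. The sketch assumes that each value is one mean per-inference latency and that both suites ran comparable model files; the result list states neither. The dictionary and function names are hypothetical. No run-to-run variance is reported, so small gaps (for example Mobilenet Float) may not be meaningful.

```python
"""Illustrative arithmetic on the single-run results listed above (a sketch, not PTS output)."""

# Latencies in microseconds, copied from the lists above. ASSUMPTION: each value is a
# mean per-inference latency and the two suites used comparable model files.
TFLITE_US = {  # TensorFlow Lite 2022-05-18
    "Mobilenet Quant": 3085.49,
    "Mobilenet Float": 1857.44,
    "NASNet Mobile": 32777.4,
    "Inception V4": 21435.0,
    "SqueezeNet": 2726.29,
    "Inception ResNet V2": 40922.8,
}
LITERT_US = {  # LiteRT 2024-10-15, same models only
    "Mobilenet Quant": 1719.85,
    "Mobilenet Float": 1791.25,
    "NASNet Mobile": 37733.5,
    "Inception V4": 21955.2,
    "SqueezeNet": 2809.08,
    "Inception ResNet V2": 25768.9,
}

# SHOC OpenCL results in GFLOPS.
SHOC_GEMM_SGEMM_N = 6124.24
SHOC_MAX_SP_FLOPS = 37081.3


def per_second(latency_us: float) -> float:
    """Convert a latency in microseconds to inferences per second."""
    return 1_000_000.0 / latency_us


def speedup(old_us: float, new_us: float) -> float:
    """Ratio of old to new latency; a value above 1 means the newer run was faster."""
    return old_us / new_us


def main() -> None:
    # One line per model shared by both suites.
    for model, old in TFLITE_US.items():
        new = LITERT_US[model]
        print(f"{model:20s} {per_second(old):8.1f} -> {per_second(new):8.1f} inf/s "
              f"(x{speedup(old, new):.2f})")
    # Share of SHOC's measured Max SP Flops reached by the SGEMM_N kernel.
    share = SHOC_GEMM_SGEMM_N / SHOC_MAX_SP_FLOPS
    print(f"SGEMM_N reaches {share:.1%} of the Max SP Flops figure")


if __name__ == "__main__":
    main()
```

Run with `python3`. It prints one line per shared model, then the SGEMM_N share, which is about 16.5%. Three models come out slower under LiteRT: NASNet Mobile, Inception V4 and SqueezeNet. Mobilenet Quant and Inception ResNet V2 come out markedly faster.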