a40-ml

KVM testing on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2412124-NE-A40ML481010&grs.

a40-ml system details: NVIDIA A40 - 80 x Intel Xeon

Processor: 80 x Intel Xeon (Icelake) (80 Cores)
Motherboard: Nutanix AHV (0.0.0 BIOS)
Chipset: Intel 440FX 82441FX PMC
Memory: 8 x 16 GB RAM Red Hat
Disk: 8796GB VDISK
Graphics: NVIDIA A40 48GB
Network: Red Hat Virtio device
OS: Ubuntu 22.04
Kernel: 6.5.0-45-generic (x86_64)
Display Driver: NVIDIA
Vulkan: 1.3.255
Compiler: GCC 11.4.0
File-System: ext4
Screen Resolution: 1280x1024
System Layer: KVM

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- CPU Microcode: 0x1
- ??.??.??.??.??
- Python 3.10.12
- gather_data_sampling: Unknown: Dependent on hypervisor status
  + itlb_multihit: Not affected
  + l1tf: Not affected
  + mds: Not affected
  + meltdown: Not affected
  + mmio_stale_data: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown
  + retbleed: Not affected
  + spec_rstack_overflow: Not affected
  + spec_store_bypass: Mitigation of SSB disabled via prctl
  + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: Syscall hardening KVM: SW loop
  + srbds: Not affected
  + tsx_async_abort: Not affected

a40-ml results summary: NVIDIA A40 - 80 x Intel Xeon

shoc: OpenCL - GEMM SGEMM_N: 6124.24
shoc: OpenCL - Reduction: 334.478
shoc: OpenCL - MD5 Hash: 41.2709
shoc: OpenCL - FFT SP: 1822.04
shoc: OpenCL - Triad: 24.4174
shoc: OpenCL - S3D: 347.337
tensorflow-lite: Mobilenet Quant: 3085.49
tensorflow-lite: Mobilenet Float: 1857.44
tensorflow-lite: NASNet Mobile: 32777.4
tensorflow-lite: Inception V4: 21435.0
tensorflow-lite: SqueezeNet: 2726.29
litert: Quantized COCO SSD MobileNet v1: 3369.02
litert: Inception ResNet V2: 25768.9
litert: Mobilenet Quant: 1719.85
litert: Mobilenet Float: 1791.25
litert: NASNet Mobile: 37733.5
litert: Inception V4: 21955.2
litert: SqueezeNet: 2809.08
litert: DeepLab V3: 4389.58
rnnoise: 26 Minute Long Talking Sample: 16.922
rbenchmark: 0.1618
deepspeech: CPU: 98.09820
numpy: 359.47
onednn: Recurrent Neural Network Inference - CPU: 766.906
onednn: Recurrent Neural Network Training - CPU: 1129.72
onednn: Deconvolution Batch shapes_3d - CPU: 1.32330
onednn: Convolution Batch Shapes Auto - CPU: 3.11680
onednn: IP Shapes 3D - CPU: 1.61857
onednn: IP Shapes 1D - CPU: 0.786227
shoc: OpenCL - Texture Read Bandwidth: 1935.67
shoc: OpenCL - Bus Speed Readback: 26.4022
shoc: OpenCL - Bus Speed Download: 25.3330
shoc: OpenCL - Max SP Flops: 37081.3
tensorflow-lite: Inception ResNet V2: 40922.8
onednn: Deconvolution Batch shapes_1d - CPU: 7.21383
lczero: BLAS: (no value listed)

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 6124.24 (SE +/- 27.83, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 334.48 (SE +/- 0.54, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

GHash/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 41.27 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 1822.04 (SE +/- 0.32, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 24.42 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 347.34 (SE +/- 0.20, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

TensorFlow Lite

Model: Mobilenet Quant

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 3085.49 (SE +/- 31.88, N = 5)

TensorFlow Lite

Model: Mobilenet Float

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 1857.44 (SE +/- 8.77, N = 3)

TensorFlow Lite

Model: NASNet Mobile

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 32777.4 (SE +/- 229.61, N = 12)

TensorFlow Lite

Model: Inception V4

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 21435.0 (SE +/- 92.43, N = 3)

TensorFlow Lite

Model: SqueezeNet

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 2726.29 (SE +/- 15.01, N = 3)

LiteRT

Model: Quantized COCO SSD MobileNet v1

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 3369.02 (SE +/- 6.32, N = 3)

LiteRT

Model: Inception ResNet V2

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 25768.9 (SE +/- 190.27, N = 3)

LiteRT

Model: Mobilenet Quant

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 1719.85 (SE +/- 18.75, N = 3)

LiteRT

Model: Mobilenet Float

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 1791.25 (SE +/- 17.09, N = 3)

LiteRT

Model: NASNet Mobile

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 37733.5 (SE +/- 372.70, N = 3)

LiteRT

Model: Inception V4

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 21955.2 (SE +/- 210.50, N = 3)

LiteRT

Model: SqueezeNet

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 2809.08 (SE +/- 23.23, N = 3)

LiteRT

Model: DeepLab V3

Microseconds, Fewer Is Better (LiteRT 2024-10-15)
NVIDIA A40 - 80 x Intel Xeon: 4389.58 (SE +/- 50.26, N = 3)

RNNoise

Input: 26 Minute Long Talking Sample

Seconds, Fewer Is Better (RNNoise 0.2)
NVIDIA A40 - 80 x Intel Xeon: 16.92 (SE +/- 0.17, N = 15)
1. (CC) gcc options: -O2 -pedantic -fvisibility=hidden

R Benchmark

Seconds, Fewer Is Better (R Benchmark)
NVIDIA A40 - 80 x Intel Xeon: 0.1618 (SE +/- 0.0011, N = 3)
1. R scripting front-end version 4.1.2 (2021-11-01)

DeepSpeech

Acceleration: CPU

Seconds, Fewer Is Better (DeepSpeech 0.6)
NVIDIA A40 - 80 x Intel Xeon: 98.10 (SE +/- 0.53, N = 3)

Numpy Benchmark

Score, More Is Better (Numpy Benchmark)
NVIDIA A40 - 80 x Intel Xeon: 359.47 (SE +/- 1.09, N = 3)

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 766.91 (SE +/- 5.38, N = 15; MIN: 692.95)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 1129.72 (SE +/- 6.18, N = 3; MIN: 1084.9)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 1.32330 (SE +/- 0.01177, N = 3; MIN: 1.19)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 3.11680 (SE +/- 0.03515, N = 3; MIN: 2.86)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: IP Shapes 3D - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 1.61857 (SE +/- 0.02310, N = 3; MIN: 1.47)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN

Harness: IP Shapes 1D - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 0.786227 (SE +/- 0.003359, N = 3; MIN: 0.71)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 1935.67 (SE +/- 1.74, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 26.40 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

GB/s, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 25.33 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

GFLOPS, More Is Better (SHOC Scalable HeterOgeneous Computing 2020-04-17)
NVIDIA A40 - 80 x Intel Xeon: 37081.3 (SE +/- 0.19, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

TensorFlow Lite

Model: Inception ResNet V2

Microseconds, Fewer Is Better (TensorFlow Lite 2022-05-18)
NVIDIA A40 - 80 x Intel Xeon: 40922.8 (SE +/- 4628.73, N = 12)
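
The TensorFlow Lite Inception ResNet V2 result has a much larger standard error (4628.73 on a mean of 40922.8, N = 12) than the other results in this file. One way to compare run-to-run noise across tests with different units is the relative standard error, SE divided by the mean. The sketch below computes it for three results copied from this report. The function name and the 5% cutoff are illustrative assumptions, not Phoronix Test Suite settings.

```python
# Relative standard error (SE / mean) for a few results from this report.
# The (mean, SE) pairs are copied from the result lines in this file.
results = {
    "tensorflow-lite: Inception ResNet V2": (40922.8, 4628.73),
    "tensorflow-lite: Inception V4": (21435.0, 92.43),
    "shoc: OpenCL - GEMM SGEMM_N": (6124.24, 27.83),
}

# Hypothetical cutoff chosen only for illustration.
NOISY_THRESHOLD_PERCENT = 5.0


def relative_se_percent(mean, se):
    """Return the standard error as a percentage of the mean."""
    return 100.0 * se / mean


noisy = []
for name, (mean, se) in results.items():
    rse = relative_se_percent(mean, se)
    if rse > NOISY_THRESHOLD_PERCENT:
        noisy.append(name)
    print(f"{name}: relative SE = {rse:.2f}%")
```

With these inputs only the Inception ResNet V2 run exceeds the cutoff, at roughly 11% of its mean, so that figure may deserve a re-run before it is compared against other systems.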

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

ms, Fewer Is Better (oneDNN 3.6)
NVIDIA A40 - 80 x Intel Xeon: 7.21383 (SE +/- 0.13056, N = 15; MIN: 5.8)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl


Phoronix Test Suite v10.8.5