Nvidia

KVM testing on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2410088-NE-2407208NE20&rdt&grt.

System details

ASPEED - 2 x Intel Xeon Gold 6226R
Processor: 2 x Intel Xeon Gold 6226R @ 3.90GHz (32 Cores / 64 Threads)
Motherboard: (5.14 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 512GB
Disk: 2 x 8002GB INTEL SSDPE2KX080T8
Graphics: ASPEED 16GB
Audio: NVIDIA GA104 HD Audio
Monitor: 27B2G5
Network: 2 x Intel X722 for 1GbE + 2 x Broadcom BCM57414 NetXtreme-E 10Gb/25Gb
OS: Ubuntu 24.04
Kernel: 6.8.0-38-generic (x86_64)
Display Server: X Server
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.131
Compiler: GCC 13.2.0 + CUDA 12.4
File-System: ext4
Screen Resolution: 1920x1080

5x A5000 kw-dl580-3-4 NVIDIA
Fields not listed here are not shown separately for this system in the export.
Processor: 4 x Intel Xeon E7-4880 v2 (60 Cores / 120 Threads)
Motherboard: QEMU Standard PC (Q35 + ICH9 2009) (edk2-20240813-1.fc40 BIOS)
Chipset: Intel 82G33/G31/P35/P31 + ICH9
Memory: 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 16 GB + 4 GB RAM
Disk: 21GB VIRTUAL-DISK
Graphics: Red Hat QXL paravirtual graphic card 22GB
Audio or Monitor (column not recoverable from the export): QEMU Generic
Network: 2 x Red Hat Virtio 1.0 device
Kernel: 6.8.0-45-generic (x86_64)
Compiler: GCC 13.2.0 + CUDA 12.0
Screen Resolution: 1024x768
System Layer: KVM

Kernel Details
Transparent Huge Pages: madvise

Compiler Details
--build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
ASPEED - 2 x Intel Xeon Gold 6226R: Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x5003605
5x A5000 kw-dl580-3-4 NVIDIA: CPU Microcode: 0x715

Graphics Details
ASPEED - 2 x Intel Xeon Gold 6226R: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.04.57.00.08
5x A5000 kw-dl580-3-4 NVIDIA: BAR1 / Visible vRAM Size: 256 MiB - vBIOS Version: 94.02.6d.00.0d

Python Details
ASPEED - 2 x Intel Xeon Gold 6226R: Python 3.8.13
5x A5000 kw-dl580-3-4 NVIDIA: Python 3.12.3

Security Details
ASPEED - 2 x Intel Xeon Gold 6226R: gather_data_sampling: Mitigation of Microcode + itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + reg_file_data_sampling: Not affected + retbleed: Mitigation of Enhanced IBRS + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: SW loop KVM: SW loop + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
5x A5000 kw-dl580-3-4 NVIDIA: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Mitigation of PTE Inversion; VMX: flush not necessary SMT vulnerable + mds: Mitigation of Clear buffers; SMT Host state unknown + meltdown: Mitigation of PTI + mmio_stale_data: Unknown: No mitigations + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines; IBPB: conditional; IBRS_FW; STIBP: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: Retpoline + srbds: Not affected + tsx_async_abort: Not affected

Result overview

Test | ASPEED - 2 x Intel Xeon Gold 6226R | 5x A5000 kw-dl580-3-4 NVIDIA
cl-mem: Copy | 283.4 | 328.2
cl-mem: Read | 380.1 | 584.4
cl-mem: Write | 376.4 | 547.4
clpeak: Integer Compute INT | 9617.49 | 13718.21
clpeak: Single-Precision Float | 18602.65 | 26836.38
clpeak: Double-Precision Double | 365.83 | 483.57
clpeak: Global Memory Bandwidth | 377.04 | 582.46
fahbench | 240.1385 | 185.2797
financebench: Black-Scholes OpenCL | 10.460 | 8.049
hashcat: MD5 | 156178112500 | 137067950000
hashcat: SHA1 | 91940033333 | 75698633333
hashcat: 7-Zip | 4224700 | 3519467
hashcat: SHA-512 | 13308000000 | 10923433333
hashcat: TrueCrypt RIPEMD160 + XTS | 3454200 | 2853933
luxcorerender: DLSC - GPU | 57.44 | 48.60
luxcorerender: Danish Mood - GPU | 35.82 | 24.72
luxcorerender: Orange Juice - GPU | 50.64 | 34.46
luxcorerender: LuxCore Benchmark - GPU | 26.70 | 16.69
luxcorerender: Rainbow Colors and Prism - GPU | 122.06 | 76.39
mixbench: OpenCL - Integer | 11601.09 | 14663.50
mixbench: OpenCL - Double Precision | 309.39 | 391.91
mixbench: OpenCL - Single Precision | 18670.28 | 27753.40
ncnn: Vulkan GPU - mobilenet | 18.93 | 77.05
ncnn: Vulkan GPU - mobilenet-v2 | 8.37 | 40.16
ncnn: Vulkan GPU - mobilenet-v3 | 8.52 | 40.31
ncnn: Vulkan GPU - shufflenet-v2 | 9.72 | 47.98
ncnn: Vulkan GPU - mnasnet | 7.30 | 39.58
ncnn: Vulkan GPU - efficientnet-b0 | 11.02 | 57.12
ncnn: Vulkan GPU - blazeface | 4.13 | 21.06
ncnn: Vulkan GPU - googlenet | 18.15 | 90.21
ncnn: Vulkan GPU - vgg16 | 45.73 | 140.86
ncnn: Vulkan GPU - resnet18 | 10.92 | 43.28
ncnn: Vulkan GPU - alexnet | 7.97 | 28.16
ncnn: Vulkan GPU - resnet50 | 21.90 | 93.89
ncnn: Vulkan GPU - yolov4-tiny | 33.43 | 106.30
ncnn: Vulkan GPU - squeezenet_ssd | 20.32 | 77.37
ncnn: Vulkan GPU - regnety_400m | 32.77 | 227.14
ncnn: Vulkan GPU - vision_transformer | 58.46 | 228.16
ncnn: Vulkan GPU - FastestDet | 10.39 | 50.17
ncnn: Vulkan GPU - mobilenetv2-yolov3 | | 77.05
plaidml: No - Inference - IMDB LSTM - OpenCL | 751.93 |
plaidml: No - Inference - Mobilenet - OpenCL | 1898.70 |
plaidml: Yes - Inference - Mobilenet - OpenCL | 2201.61 |
plaidml: No - Inference - DenseNet 201 - OpenCL | 179.21 |
rodinia: OpenCL Particle Filter | 7.105 | 6.694
shoc: OpenCL - S3D | 211.904 |
shoc: OpenCL - Triad | 12.1173 |
shoc: OpenCL - FFT SP | 1094.66 |
shoc: OpenCL - MD5 Hash | 22.5655 |
shoc: OpenCL - Reduction | 324.182 |
shoc: OpenCL - GEMM SGEMM_N | 3630.55 |
shoc: OpenCL - Max SP Flops | 21619.9 |
shoc: OpenCL - Bus Speed Download | 12.3250 |
shoc: OpenCL - Bus Speed Readback | 13.1527 |
shoc: OpenCL - Texture Read Bandwidth | 1998.58 |
viennacl: CPU BLAS - sCOPY | 228 | 299.3
viennacl: CPU BLAS - sAXPY | 382 | 392.9
viennacl: CPU BLAS - sDOT | 261 | 199.8
viennacl: CPU BLAS - dCOPY | 117.9 | 46.1
viennacl: CPU BLAS - dAXPY | 183 | 67.8
viennacl: CPU BLAS - dDOT | 175 | 56.6
viennacl: CPU BLAS - dGEMV-N | 108 | 51.6
viennacl: CPU BLAS - dGEMV-T | 196 | 191.3
viennacl: CPU BLAS - dGEMM-NN | 59.7 | 57.0
viennacl: CPU BLAS - dGEMM-NT | 59.4 | 56.1
viennacl: CPU BLAS - dGEMM-TN | 62.2 | 58.3
viennacl: CPU BLAS - dGEMM-TT | 58.3 | 57.2
viennacl: OpenCL BLAS - sCOPY | 266 | 309
viennacl: OpenCL BLAS - sAXPY | 345 | 403
viennacl: OpenCL BLAS - sDOT | 312 | 291
viennacl: OpenCL BLAS - dCOPY | 359 | 473
viennacl: OpenCL BLAS - dAXPY | 385 | 534
viennacl: OpenCL BLAS - dDOT | 383 | 477
viennacl: OpenCL BLAS - dGEMV-N | 170 | 164
viennacl: OpenCL BLAS - dGEMV-T | 317 | 324
viennacl: OpenCL BLAS - dGEMM-NN | 344 | 440
viennacl: OpenCL BLAS - dGEMM-NT | 348 | 443
viennacl: OpenCL BLAS - dGEMM-TT | 340 | 442
viennacl: OpenCL BLAS - dGEMM-TN | | 443

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 283.4 (SE +/- 0.07, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 328.2 (SE +/- 0.32, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 380.1 (SE +/- 0.03, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 584.4 (SE +/- 0.43, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 376.4 (SE +/- 0.10, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 547.4 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 9617.49 (SE +/- 87.03, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 13718.21 (SE +/- 26.25, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 18602.65 (SE +/- 76.05, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 26836.38 (SE +/- 0.40, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 365.83 (SE +/- 0.36, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 483.57 (SE +/- 1.64, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 377.04 (SE +/- 0.01, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 582.46 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3

FAHBench

FAHBench 2.3.2 (Ns Per Day, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 240.14 (SE +/- 0.31, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 185.28 (SE +/- 2.24, N = 3)

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 10.460 (SE +/- 0.016, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 8.049 (SE +/- 0.107, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

Hashcat

Benchmark: MD5

Hashcat 6.2.4 (H/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 156178112500 (SE +/- 31239876948.73, N = 16)
5x A5000 kw-dl580-3-4 NVIDIA: 137067950000 (SE +/- 23688235109.70, N = 16)

Hashcat

Benchmark: SHA1

Hashcat 6.2.4 (H/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 91940033333 (SE +/- 113680610.09, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 75698633333 (SE +/- 133865807.60, N = 3)

Hashcat

Benchmark: 7-Zip

Hashcat 6.2.4 (H/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 4224700 (SE +/- 9462.73, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 3519467 (SE +/- 9837.57, N = 3)

Hashcat

Benchmark: SHA-512

Hashcat 6.2.4 (H/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 13308000000 (SE +/- 26463244.95, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 10923433333 (SE +/- 27986087.81, N = 3)

Hashcat

Benchmark: TrueCrypt RIPEMD160 + XTS

Hashcat 6.2.4 (H/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 3454200 (SE +/- 1039.23, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 2853933 (SE +/- 4272.52, N = 3)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 57.44 (SE +/- 5.22, N = 12; MAX: 65.49)
5x A5000 kw-dl580-3-4 NVIDIA: 48.60 (SE +/- 0.60, N = 3; MIN: 29.6 / MAX: 52.28)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 35.82 (SE +/- 0.26, N = 3; MIN: 12.8 / MAX: 46.64)
5x A5000 kw-dl580-3-4 NVIDIA: 24.72 (SE +/- 0.08, N = 3; MIN: 2.11 / MAX: 34.96)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 50.64 (SE +/- 0.09, N = 3; MIN: 44.98 / MAX: 64.06)
5x A5000 kw-dl580-3-4 NVIDIA: 34.46 (SE +/- 0.29, N = 15; MIN: 0.45 / MAX: 47.01)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 26.70 (SE +/- 2.44, N = 12; MAX: 43.01)
5x A5000 kw-dl580-3-4 NVIDIA: 16.69 (SE +/- 1.54, N = 12; MAX: 33.09)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 (M samples/sec, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 122.06 (SE +/- 0.62, N = 3; MIN: 106.98 / MAX: 141.43)
5x A5000 kw-dl580-3-4 NVIDIA: 76.39 (SE +/- 2.42, N = 12; MIN: 61.59 / MAX: 129.05)

Mixbench

Backend: OpenCL - Benchmark: Integer

Mixbench 2020-06-23 (GIOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 11601.09 (SE +/- 4.62, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 14663.50 (SE +/- 174.42, N = 15)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Double Precision

Mixbench 2020-06-23 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 309.39 (SE +/- 0.92, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 391.91 (SE +/- 4.99, N = 15)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

Mixbench

Backend: OpenCL - Benchmark: Single Precision

Mixbench 2020-06-23 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 18670.28 (SE +/- 27.29, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 27753.40 (SE +/- 327.84, N = 15)
1. (CXX) g++ options: -lm -lstdc++ -lOpenCL -lrt -O2

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 18.93 (SE +/- 0.22, N = 12; MIN: 17.49 / MAX: 22.25)
5x A5000 kw-dl580-3-4 NVIDIA: 77.05 (SE +/- 2.12, N = 9; MIN: 37.56 / MAX: 842.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v2

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 8.37 (SE +/- 0.10, N = 12; MIN: 7.43 / MAX: 30.63)
5x A5000 kw-dl580-3-4 NVIDIA: 40.16 (SE +/- 1.67, N = 9; MIN: 19.36 / MAX: 731.71)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v3

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 8.52 (SE +/- 0.06, N = 12; MIN: 7.93 / MAX: 87.35)
5x A5000 kw-dl580-3-4 NVIDIA: 40.31 (SE +/- 1.76, N = 9; MIN: 19.09 / MAX: 864.81)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 9.72 (SE +/- 0.08, N = 12; MIN: 8.97 / MAX: 16.96)
5x A5000 kw-dl580-3-4 NVIDIA: 47.98 (SE +/- 2.22, N = 9; MIN: 22.19 / MAX: 949.82)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 7.30 (SE +/- 0.12, N = 12; MIN: 6.57 / MAX: 71.25)
5x A5000 kw-dl580-3-4 NVIDIA: 39.58 (SE +/- 1.96, N = 9; MIN: 17.71 / MAX: 715.88)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 11.02 (SE +/- 0.13, N = 12; MIN: 9.86 / MAX: 77.38)
5x A5000 kw-dl580-3-4 NVIDIA: 57.12 (SE +/- 1.20, N = 9; MIN: 26.93 / MAX: 1232.95)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 4.13 (SE +/- 0.07, N = 12; MIN: 3.7 / MAX: 4.65)
5x A5000 kw-dl580-3-4 NVIDIA: 21.06 (SE +/- 1.40, N = 9; MIN: 10.19 / MAX: 621.35)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 18.15 (SE +/- 0.33, N = 12; MIN: 15.73 / MAX: 36.32)
5x A5000 kw-dl580-3-4 NVIDIA: 90.21 (SE +/- 3.17, N = 9; MIN: 41.1 / MAX: 1221.73)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 45.73 (SE +/- 0.42, N = 12; MIN: 41.81 / MAX: 716.68)
5x A5000 kw-dl580-3-4 NVIDIA: 140.86 (SE +/- 1.79, N = 9; MIN: 71.64 / MAX: 393.59)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 10.92 (SE +/- 0.11, N = 12; MIN: 10.17 / MAX: 12.21)
5x A5000 kw-dl580-3-4 NVIDIA: 43.28 (SE +/- 1.32, N = 9; MIN: 20.79 / MAX: 504.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 7.97 (SE +/- 0.10, N = 12; MIN: 7.31 / MAX: 10.33)
5x A5000 kw-dl580-3-4 NVIDIA: 28.16 (SE +/- 0.75, N = 9; MIN: 13.44 / MAX: 256.84)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 21.90 (SE +/- 0.24, N = 12; MIN: 20.15 / MAX: 31.1)
5x A5000 kw-dl580-3-4 NVIDIA: 93.89 (SE +/- 2.01, N = 9; MIN: 44.98 / MAX: 1038.18)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 33.43 (SE +/- 0.49, N = 12; MIN: 29.11 / MAX: 256.67)
5x A5000 kw-dl580-3-4 NVIDIA: 106.30 (SE +/- 1.69, N = 9; MIN: 52.73 / MAX: 505.17)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 20.32 (SE +/- 0.34, N = 12; MIN: 18.17 / MAX: 30.87)
5x A5000 kw-dl580-3-4 NVIDIA: 77.37 (SE +/- 3.32, N = 9; MIN: 35.36 / MAX: 1176.72)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 32.77 (SE +/- 0.23, N = 12)
5x A5000 kw-dl580-3-4 NVIDIA: 227.14 (SE +/- 27.31, N = 9; MIN: 93.69 / MAX: 5948.7)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 58.46 (SE +/- 0.72, N = 12; MIN: 52.56 / MAX: 125.76)
5x A5000 kw-dl580-3-4 NVIDIA: 228.16 (SE +/- 3.18, N = 9; MIN: 122.48 / MAX: 1174.54)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 (ms, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 10.39 (SE +/- 0.41, N = 11; MIN: 8.52 / MAX: 30.47)
5x A5000 kw-dl580-3-4 NVIDIA: 50.17 (SE +/- 1.46, N = 9; MIN: 22.35 / MAX: 948.14)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenetv2-yolov3

NCNN 20230517 (ms, Fewer Is Better)
5x A5000 kw-dl580-3-4 NVIDIA: 77.05 (SE +/- 2.12, N = 9; MIN: 37.56 / MAX: 842.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

PlaidML

FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL

PlaidML (FPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 751.93 (SE +/- 1.13, N = 3)

PlaidML

FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL

PlaidML (FPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 1898.70 (SE +/- 2.99, N = 3)

PlaidML

FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL

PlaidML (FPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 2201.61 (SE +/- 0.40, N = 3)

PlaidML

FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL

PlaidML (FPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 179.21 (SE +/- 0.34, N = 3)

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 (Seconds, Fewer Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 7.105 (SE +/- 0.096, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 6.694 (SE +/- 0.079, N = 15)
1. (CXX) g++ options, in export order: -m64 -lm -lcuda -lcudart -lcudadevrt -lcudart_static -lrt -lpthread -ldl / -O2 -lOpenCL

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 211.90 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 12.12 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 1094.66 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GHash/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 22.57 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 324.18 (SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: GEMM SGEMM_N

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 3630.55 (SE +/- 44.25, N = 4)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GFLOPS, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 21619.9 (SE +/- 305.65, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 12.33 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 13.15 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

SHOC Scalable HeterOgeneous Computing

Target: OpenCL - Benchmark: Texture Read Bandwidth

SHOC Scalable HeterOgeneous Computing 2020-04-17 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 1998.58 (SE +/- 4.68, N = 3)
1. (CXX) g++ options: -O2 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -lmpi_cxx -lmpi

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 228.0 (SE +/- 3.89, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 299.3 (SE +/- 34.09, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 382.0 (SE +/- 5.93, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 392.9 (SE +/- 53.50, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 261.0 (SE +/- 1.23, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 199.8 (SE +/- 23.95, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 117.9 (SE +/- 2.79, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 46.1 (SE +/- 1.21, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 183.0 (SE +/- 1.77, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 67.8 (SE +/- 2.44, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 175.0 (SE +/- 1.40, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 56.6 (SE +/- 2.95, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 108.0 (SE +/- 0.64, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 51.6 (SE +/- 1.70, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 196.0 (SE +/- 1.22, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 191.3 (SE +/- 20.15, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 59.7 (SE +/- 1.20, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 57.0 (SE +/- 0.22, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 59.4 (SE +/- 1.11, N = 14)
5x A5000 kw-dl580-3-4 NVIDIA: 56.1 (SE +/- 0.35, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 62.2 (SE +/- 1.35, N = 14)
5x A5000 kw-dl580-3-4 NVIDIA: 58.3 (SE +/- 0.13, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 58.3 (SE +/- 1.46, N = 15)
5x A5000 kw-dl580-3-4 NVIDIA: 57.2 (SE +/- 0.14, N = 12)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 266 (SE +/- 0.67, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 309 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 345 (SE +/- 0.33, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 403 (SE +/- 1.53, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 312 (SE +/- 0.67, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 291 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 359 (SE +/- 0.58, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 473 (SE +/- 1.45, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 385 (SE +/- 0.00, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 534 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 383 (SE +/- 0.33, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 477 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 170 (SE +/- 0.00, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 164 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 317 (SE +/- 1.76, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 324 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 344 (SE +/- 1.45, N = 3)
5x A5000 kw-dl580-3-4 NVIDIA: 440 (SE +/- 1.15, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 348
5x A5000 kw-dl580-3-4 NVIDIA: 443
SE +/- 2.03, N = 3 (the export lists a single error value for this chart without attributing it to a system)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
ASPEED - 2 x Intel Xeon Gold 6226R: 340
5x A5000 kw-dl580-3-4 NVIDIA: 442
SE +/- 2.03, N = 3 (the export lists a single error value for this chart without attributing it to a system)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
5x A5000 kw-dl580-3-4 NVIDIA: 443 (SE +/- 1.76, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
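
Comparing the two systems

The results above mix "More Is Better" and "Fewer Is Better" metrics, so a raw ratio means different things from test to test. The following is a minimal Python sketch, not part of the Phoronix Test Suite, that normalizes both directions into one signed percentage. The function name `relative_advantage` is hypothetical, and the sample values are copied from the cl-mem Copy, FinanceBench and NCNN mobilenet results in this export. The two systems differ in hardware as well as in the KVM layer, so the percentages describe the two configurations as tested rather than virtualization overhead alone.

```python
def relative_advantage(first: float, second: float, more_is_better: bool) -> float:
    """Return how much better the second result is than the first, in percent.

    Positive values favor the second system, negative values favor the first.
    For "Fewer Is Better" metrics the ratio is inverted so the sign keeps
    the same meaning.
    """
    if first <= 0 or second <= 0:
        raise ValueError("benchmark results must be positive")
    ratio = second / first if more_is_better else first / second
    return (ratio - 1.0) * 100.0


# First argument: "ASPEED - 2 x Intel Xeon Gold 6226R"
# Second argument: "5x A5000 kw-dl580-3-4 NVIDIA" (System Layer: KVM)
copy_gain = relative_advantage(283.4, 328.2, more_is_better=True)      # cl-mem Copy, GB/s
bs_gain = relative_advantage(10.460, 8.049, more_is_better=False)      # FinanceBench, ms
mobilenet_gain = relative_advantage(18.93, 77.05, more_is_better=False)  # NCNN mobilenet, ms

for name, gain in [
    ("cl-mem Copy", copy_gain),
    ("FinanceBench Black-Scholes", bs_gain),
    ("NCNN mobilenet", mobilenet_gain),
]:
    print(f"{name}: {gain:+.1f}%")
```

With these inputs the second system comes out ahead on cl-mem Copy and FinanceBench and well behind on NCNN mobilenet, which matches the direction of the raw numbers.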


Phoronix Test Suite v10.8.5