dnn

qemu testing on Ubuntu 22.04 via the Phoronix Test Suite.

dnn:

  Processor: AMD Ryzen 9 7950X 16-Core (28 Cores), Motherboard: QEMU Standard PC (Q35 + ICH9 2009) (0.0.0 BIOS), Chipset: Intel 82G33/G31/P35/P31 + ICH9, Memory: 46GB, Disk: 2164GB, Graphics: NVIDIA GeForce RTX 4090 24GB, Audio: QEMU Generic, Monitor: DP1080P60, Network: 2 x Red Hat Virtio device

  OS: Ubuntu 22.04, Kernel: 5.19.0-43-generic (x86_64), Desktop: GNOME Shell 42.5, Display Server: X Server 1.21.1.4, Display Driver: NVIDIA 530.41.03, OpenGL: 4.6.0, OpenCL: OpenCL 3.0 CUDA 12.1.98, Vulkan: 1.3.236, Compiler: GCC 11.3.0 + CUDA 11.5, File-System: ext4, Screen Resolution: 1920x1080, System Layer: qemu

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Integer

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Integer

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Double Precision

Mixbench 2020-06-23
Backend: OpenCL - Benchmark: Single Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Half Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Double Precision

Mixbench 2020-06-23
Backend: NVIDIA CUDA - Benchmark: Single Precision

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: S3D

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Triad

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: FFT SP

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: MD5 Hash

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Reduction

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: GEMM SGEMM_N

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Max SP Flops

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Download

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Bus Speed Readback

SHOC Scalable HeterOgeneous Computing 2020-04-17
Target: OpenCL - Benchmark: Texture Read Bandwidth

VkFFT 1.1.1
Benchmark Score > Higher Is Better
dnn . 99244

PlaidML
FP16: No - Mode: Training - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Inference - Network: IMDB LSTM - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: Yes - Mode: Inference - Network: Mobilenet - Device: OpenCL
Examples Per Second > Higher Is Better

PlaidML
FP16: No - Mode: Inference - Network: DenseNet 201 - Device: OpenCL
Examples Per Second > Higher Is Better

Libplacebo 5.229.1
FPS > Higher Is Better

NeatBench 5
Acceleration: GPU
FPS > Higher Is Better
dnn . 4090

cl-mem 2017-01-13
Benchmark: Copy
GB/s > Higher Is Better
dnn . 392.9

cl-mem 2017-01-13
Benchmark: Read
GB/s > Higher Is Better
dnn . 886.1

cl-mem 2017-01-13
Benchmark: Write
GB/s > Higher Is Better
dnn . 801.6

ViennaCL 1.7.1
Test: CPU BLAS - sCOPY
GB/s > Higher Is Better
dnn . 205

ViennaCL 1.7.1
Test: CPU BLAS - sAXPY
GB/s > Higher Is Better
dnn . 309

ViennaCL 1.7.1
Test: CPU BLAS - sDOT
GB/s > Higher Is Better
dnn . 294

ViennaCL 1.7.1
Test: CPU BLAS - dCOPY
GB/s > Higher Is Better
dnn . 62.4

ViennaCL 1.7.1
Test: CPU BLAS - dAXPY
GB/s > Higher Is Better
dnn . 92.9

ViennaCL 1.7.1
Test: CPU BLAS - dDOT
GB/s > Higher Is Better
dnn . 95.2

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-N
GB/s > Higher Is Better
dnn . 111

ViennaCL 1.7.1
Test: CPU BLAS - dGEMV-T
GB/s > Higher Is Better
dnn . 127

ViennaCL 1.7.1
Test: OpenCL BLAS - sCOPY
GB/s > Higher Is Better
dnn . 436

ViennaCL 1.7.1
Test: OpenCL BLAS - sAXPY
GB/s > Higher Is Better
dnn . 568

ViennaCL 1.7.1
Test: OpenCL BLAS - dCOPY
GB/s > Higher Is Better
dnn . 651

ViennaCL 1.7.1
Test: OpenCL BLAS - dAXPY
GB/s > Higher Is Better
dnn . 763

ViennaCL 1.7.1
Test: OpenCL BLAS - dDOT
GB/s > Higher Is Better
dnn . 642

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-N
GB/s > Higher Is Better
dnn . 218

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMV-T
GB/s > Higher Is Better
dnn . 435

ViennaCL 1.7.1
Test: OpenCL BLAS - sDOT
GB/s > Higher Is Better
dnn . 439

clpeak 1.1.2
OpenCL Test: Global Memory Bandwidth
GBPS > Higher Is Better
dnn . 873.14

vkpeak 20210424
fp32-scalar
GFLOPS > Higher Is Better
dnn . 44603.45

vkpeak 20210424
fp32-vec4
GFLOPS > Higher Is Better
dnn . 58898.23

vkpeak 20210424
fp16-scalar
GFLOPS > Higher Is Better
dnn . 44487.32

vkpeak 20210424
fp16-vec4
GFLOPS > Higher Is Better
dnn . 88257.98

vkpeak 20210424
fp64-scalar
GFLOPS > Higher Is Better
dnn . 1406.30

vkpeak 20210424
fp64-vec4
GFLOPS > Higher Is Better
dnn . 1407.53

clpeak 1.1.2
OpenCL Test: Single-Precision Float
GFLOPS > Higher Is Better
dnn . 79707.10

clpeak 1.1.2
OpenCL Test: Double-Precision Double
GFLOPS > Higher Is Better
dnn . 1396.96

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
dnn . 63.2

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
dnn . 60.7

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
dnn . 70.9

ViennaCL 1.7.1
Test: CPU BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
dnn . 66.4

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NN
GFLOPs/s > Higher Is Better
dnn . 1160

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-NT
GFLOPs/s > Higher Is Better
dnn . 1277

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TN
GFLOPs/s > Higher Is Better
dnn . 1297

ViennaCL 1.7.1
Test: OpenCL BLAS - dGEMM-TT
GFLOPs/s > Higher Is Better
dnn . 1347

vkpeak 20210424
int32-scalar
GIOPS > Higher Is Better
dnn . 44596.59

vkpeak 20210424
int32-vec4
GIOPS > Higher Is Better
dnn . 44374.76

vkpeak 20210424
int16-scalar
GIOPS > Higher Is Better
dnn . 29662.99

vkpeak 20210424
int16-vec4
GIOPS > Higher Is Better
dnn . 39483.86

clpeak 1.1.2
OpenCL Test: Integer Compute INT
GIOPS > Higher Is Better
dnn . 40347.55

Hashcat 6.2.4
Benchmark: MD5
H/s > Higher Is Better

Hashcat 6.2.4
Benchmark: SHA1
H/s > Higher Is Better

Hashcat 6.2.4
Benchmark: 7-Zip
H/s > Higher Is Better

Hashcat 6.2.4
Benchmark: SHA-512
H/s > Higher Is Better

Hashcat 6.2.4
Benchmark: TrueCrypt RIPEMD160 + XTS
H/s > Higher Is Better

IndigoBench 4.4
Acceleration: OpenCL GPU - Scene: Bedroom
M samples/s > Higher Is Better
dnn . 35.51

IndigoBench 4.4
Acceleration: OpenCL GPU - Scene: Supercar
M samples/s > Higher Is Better
dnn . 79.45

LuxCoreRender 2.6
Scene: DLSC - Acceleration: GPU
M samples/sec > Higher Is Better
dnn . 25.83

LuxCoreRender 2.6
Scene: Danish Mood - Acceleration: GPU
M samples/sec > Higher Is Better
dnn . 19.98

LuxCoreRender 2.6
Scene: Orange Juice - Acceleration: GPU
M samples/sec > Higher Is Better
dnn . 20.19

LuxCoreRender 2.6
Scene: LuxCore Benchmark - Acceleration: GPU
M samples/sec > Higher Is Better
dnn . 20.80

LuxCoreRender 2.6
Scene: Rainbow Colors and Prism - Acceleration: GPU
M samples/sec > Higher Is Better
dnn . 44.93

LeelaChessZero 0.28
Backend: OpenCL
Nodes Per Second > Higher Is Better
dnn . 31199

FAHBench 2.3.2
Ns Per Day > Higher Is Better
dnn . 430.65

GROMACS 2023
Implementation: NVIDIA CUDA GPU - Input: water_GMX50_bare
Ns Per Day > Higher Is Better

MandelGPU 1.3pts1
OpenCL Device: GPU
Samples/sec > Higher Is Better
dnn . 830513186.5

OctaneBench 2020.1
Total Score
Score > Higher Is Better
dnn . 1281.89

NAMD CUDA 2.14
ATPase Simulation - 327,506 Atoms
days/ns < Lower Is Better
dnn . 0.07371

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 100
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 200
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: AlexNet - Acceleration: NVIDIA CUDA - Iterations: 1000
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 100
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 200
Milli-Seconds < Lower Is Better

Caffe 2020-02-13
Model: GoogleNet - Acceleration: NVIDIA CUDA - Iterations: 1000
Milli-Seconds < Lower Is Better

VkResample 1.0
Upscale: 2x - Precision: Double
ms < Lower Is Better
dnn . 55.28

VkResample 1.0
Upscale: 2x - Precision: Single
ms < Lower Is Better
dnn . 7.911

ArrayFire 3.7
Test: Conjugate Gradient OpenCL
ms < Lower Is Better
dnn . 0.9286

FinanceBench 2016-07-25
Benchmark: Black-Scholes OpenCL
ms < Lower Is Better
dnn . 2.967

NCNN 20220729
Target: Vulkan GPU - Model: mobilenet
ms < Lower Is Better
dnn . 3.03

NCNN 20220729
Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
ms < Lower Is Better
dnn . 1.02

NCNN 20220729
Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
ms < Lower Is Better
dnn . 1.29

NCNN 20220729
Target: Vulkan GPU - Model: shufflenet-v2
ms < Lower Is Better
dnn . 1.23

NCNN 20220729
Target: Vulkan GPU - Model: mnasnet
ms < Lower Is Better
dnn . 1.01

NCNN 20220729
Target: Vulkan GPU - Model: efficientnet-b0
ms < Lower Is Better
dnn . 1.94

NCNN 20220729
Target: Vulkan GPU - Model: blazeface
ms < Lower Is Better
dnn . 0.76

NCNN 20220729
Target: Vulkan GPU - Model: googlenet
ms < Lower Is Better
dnn . 1.67

NCNN 20220729
Target: Vulkan GPU - Model: vgg16
ms < Lower Is Better
dnn . 1.68

NCNN 20220729
Target: Vulkan GPU - Model: alexnet
ms < Lower Is Better
dnn . 1.02

NCNN 20220729
Target: Vulkan GPU - Model: yolov4-tiny
ms < Lower Is Better
dnn . 4.49

NCNN 20220729
Target: Vulkan GPU - Model: squeezenet_ssd
ms < Lower Is Better
dnn . 3.17

NCNN 20220729
Target: Vulkan GPU - Model: regnety_400m
ms < Lower Is Better
dnn . 1.50

NCNN 20220729
Target: Vulkan GPU - Model: vision_transformer
ms < Lower Is Better
dnn . 121.14

NCNN 20220729
Target: Vulkan GPU - Model: FastestDet
ms < Lower Is Better
dnn . 1.47

NCNN 20220729
Target: Vulkan GPU - Model: resnet18
ms < Lower Is Better
dnn . 0.86

NCNN 20220729
Target: Vulkan GPU - Model: resnet50
ms < Lower Is Better
dnn . 1.52

RealSR-NCNN 20200818
Scale: 4x - TAA: No
Seconds < Lower Is Better
dnn . 4.496

RealSR-NCNN 20200818
Scale: 4x - TAA: Yes
Seconds < Lower Is Better
dnn . 19.75

Waifu2x-NCNN Vulkan 20200818
Scale: 2x - Denoise: 3 - TAA: No
Seconds < Lower Is Better

Waifu2x-NCNN Vulkan 20200818
Scale: 2x - Denoise: 3 - TAA: Yes
Seconds < Lower Is Better
dnn . 2.281

Betsy GPU Compressor 1.1 Beta
Codec: ETC1 - Quality: Highest
Seconds < Lower Is Better

Betsy GPU Compressor 1.1 Beta
Codec: ETC2 RGB - Quality: Highest
Seconds < Lower Is Better

RedShift Demo 3.0
Seconds < Lower Is Better

Rodinia 3.1
Test: OpenCL Particle Filter
Seconds < Lower Is Better
dnn . 2.173

Blender 3.5
Blend File: BMW27 - Compute: NVIDIA OptiX
Seconds < Lower Is Better
dnn . 13.47

Blender 3.5
Blend File: Classroom - Compute: NVIDIA OptiX
Seconds < Lower Is Better
dnn . 7.34

Blender 3.5
Blend File: Fishy Cat - Compute: NVIDIA OptiX
Seconds < Lower Is Better
dnn . 5.58

Blender 3.5
Blend File: Barbershop - Compute: NVIDIA OptiX
Seconds < Lower Is Better
dnn . 30.60

Blender 3.5
Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX
Seconds < Lower Is Better
dnn . 8.37