dnn

qemu testing on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2306107-NE-DNN37749368&grw.

System details (dnn)

Processor: AMD Ryzen 9 7950X 16-Core (28 Cores)
Motherboard: QEMU Standard PC (Q35 + ICH9 2009) (0.0.0 BIOS)
Chipset: Intel 82G33/G31/P35/P31 + ICH9
Memory: 46GB
Disk: 2164GB
Graphics: NVIDIA GeForce RTX 4090 24GB
Audio: QEMU Generic
Monitor: DP1080P60
Network: 2 x Red Hat Virtio device
OS: Ubuntu 22.04
Kernel: 5.19.0-43-generic (x86_64)
Desktop: GNOME Shell 42.5
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA 530.41.03
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 12.1.98
Vulkan: 1.3.236
Compiler: GCC 11.3.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1080
System Layer: qemu

- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- CPU Microcode: 0xa601203
- BAR1 / Visible vRAM Size: 32768 MiB - vBIOS Version: 95.02.3c.40.1a - GPU Compute Cores: 16384
- Python 3.10.6
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Results overview

Test | dnn
lczero: OpenCL | 31199
ncnn: Vulkan GPU - mobilenet | 3.03
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 1.02
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 1.29
ncnn: Vulkan GPU - shufflenet-v2 | 1.23
ncnn: Vulkan GPU - mnasnet | 1.01
ncnn: Vulkan GPU - efficientnet-b0 | 1.94
ncnn: Vulkan GPU - blazeface | 0.76
ncnn: Vulkan GPU - googlenet | 1.67
ncnn: Vulkan GPU - vgg16 | 1.68
ncnn: Vulkan GPU - alexnet | 1.02
ncnn: Vulkan GPU - yolov4-tiny | 4.49
ncnn: Vulkan GPU - squeezenet_ssd | 3.17
ncnn: Vulkan GPU - regnety_400m | 1.50
ncnn: Vulkan GPU - vision_transformer | 121.14
ncnn: Vulkan GPU - FastestDet | 1.47
ncnn: Vulkan GPU - resnet18 | 0.86
ncnn: Vulkan GPU - resnet50 | 1.52
rodinia: OpenCL Particle Filter | 2.173
arrayfire: Conjugate Gradient OpenCL | 0.9286
blender: BMW27 - NVIDIA OptiX | 13.47
blender: Classroom - NVIDIA OptiX | 7.34
blender: Fishy Cat - NVIDIA OptiX | 5.58
blender: Barbershop - NVIDIA OptiX | 30.60
blender: Pabellon Barcelona - NVIDIA OptiX | 8.37
neatbench: GPU | 4090
indigobench: OpenCL GPU - Bedroom | 35.509
indigobench: OpenCL GPU - Supercar | 79.451
luxcorerender: DLSC - GPU | 25.83
luxcorerender: Danish Mood - GPU | 19.98
luxcorerender: Orange Juice - GPU | 20.19
luxcorerender: LuxCore Benchmark - GPU | 20.80
luxcorerender: Rainbow Colors and Prism - GPU | 44.93
fahbench | 430.6458
namd-cuda: ATPase Simulation - 327,506 Atoms | 0.07371
octanebench: Total Score | 1281.892068
financebench: Black-Scholes OpenCL | 2.967
cl-mem: Copy | 392.9
cl-mem: Read | 886.1
cl-mem: Write | 801.6
clpeak: Integer Compute INT | 40347.55
clpeak: Single-Precision Float | 79707.10
clpeak: Double-Precision Double | 1396.96
clpeak: Global Memory Bandwidth | 873.14
mandelgpu: GPU | 830513186.5
viennacl: CPU BLAS - sCOPY | 205
viennacl: CPU BLAS - sAXPY | 309
viennacl: CPU BLAS - sDOT | 294
viennacl: CPU BLAS - dCOPY | 62.4
viennacl: CPU BLAS - dAXPY | 92.9
viennacl: CPU BLAS - dDOT | 95.2
viennacl: CPU BLAS - dGEMV-N | 111
viennacl: CPU BLAS - dGEMV-T | 127
viennacl: CPU BLAS - dGEMM-NN | 63.2
viennacl: CPU BLAS - dGEMM-NT | 60.7
viennacl: CPU BLAS - dGEMM-TN | 70.9
viennacl: CPU BLAS - dGEMM-TT | 66.4
viennacl: OpenCL BLAS - sCOPY | 436
viennacl: OpenCL BLAS - sAXPY | 568
viennacl: OpenCL BLAS - dCOPY | 651
viennacl: OpenCL BLAS - dAXPY | 763
viennacl: OpenCL BLAS - dDOT | 642
viennacl: OpenCL BLAS - dGEMV-N | 218
viennacl: OpenCL BLAS - dGEMV-T | 435
viennacl: OpenCL BLAS - dGEMM-NN | 1160
viennacl: OpenCL BLAS - dGEMM-NT | 1277
viennacl: OpenCL BLAS - dGEMM-TN | 1297
viennacl: OpenCL BLAS - dGEMM-TT | 1347
viennacl: OpenCL BLAS - sDOT | 439
realsr-ncnn: 4x - No | 4.496
realsr-ncnn: 4x - Yes | 19.747
vkfft | 99244
vkpeak: fp32-scalar | 44603.45
vkpeak: fp32-vec4 | 58898.23
vkpeak: fp16-scalar | 44487.32
vkpeak: fp16-vec4 | 88257.98
vkpeak: fp64-scalar | 1406.30
vkpeak: fp64-vec4 | 1407.53
vkpeak: int32-scalar | 44596.59
vkpeak: int32-vec4 | 44374.76
vkpeak: int16-scalar | 29662.99
vkpeak: int16-vec4 | 39483.86
vkresample: 2x - Double | 55.278
vkresample: 2x - Single | 7.911
waifu2x-ncnn: 2x - 3 - Yes | 2.281

LeelaChessZero

Backend: OpenCL

LeelaChessZero 0.28 - Nodes Per Second, More Is Better
dnn: 31199 (SE +/- 315.85, N = 3)
1. (CXX) g++ options: -flto -pthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20220729 - ms, Fewer Is Better
dnn: 3.03 (SE +/- 0.02, N = 3; MIN: 2.23 / MAX: 70.11)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.02 (SE +/- 0.10, N = 3; MIN: 0.76 / MAX: 3.43)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.29 (SE +/- 0.03, N = 3; MIN: 1.12 / MAX: 7.86)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.23 (SE +/- 0.02, N = 3; MIN: 1.11 / MAX: 2.43)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.01 (SE +/- 0.03, N = 3; MIN: 0.87 / MAX: 4.4)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.94 (SE +/- 0.11, N = 3; MIN: 1.56 / MAX: 72.48)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20220729 - ms, Fewer Is Better
dnn: 0.76 (SE +/- 0.06, N = 3; MIN: 0.6 / MAX: 2.67)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.67 (SE +/- 0.08, N = 3; MIN: 1.27 / MAX: 62.38)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.68 (SE +/- 0.12, N = 2; MIN: 1.45 / MAX: 67.03)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.02 (SE +/- 0.05, N = 3; MIN: 0.89 / MAX: 4.19)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20220729 - ms, Fewer Is Better
dnn: 4.49 (SE +/- 0.08, N = 3; MIN: 3.65 / MAX: 66.58)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20220729 - ms, Fewer Is Better
dnn: 3.17 (SE +/- 0.22, N = 3; MIN: 2.32 / MAX: 68.2)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.50 (SE +/- 0.06, N = 3; MIN: 1.31 / MAX: 3.73)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20220729 - ms, Fewer Is Better
dnn: 121.14 (SE +/- 7.90, N = 3; MIN: 89.4 / MAX: 315.41)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.47 (SE +/- 0.12, N = 3; MIN: 1.28 / MAX: 64.75)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20220729 - ms, Fewer Is Better
dnn: 0.86 (SE +/- 0.06, N = 2; MIN: 0.76 / MAX: 3.95)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20220729 - ms, Fewer Is Better
dnn: 1.52 (MIN: 1.43 / MAX: 2.44)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Rodinia

Test: OpenCL Particle Filter

Rodinia 3.1 - Seconds, Fewer Is Better
dnn: 2.173
1. (CXX) g++ options: -O2 -lOpenCL

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.7 - ms, Fewer Is Better
dnn: 0.9286 (SE +/- 0.0086, N = 7)
1. (CXX) g++ options: -rdynamic

Blender

Blend File: BMW27 - Compute: NVIDIA OptiX

Blender 3.5 - Seconds, Fewer Is Better
dnn: 13.47 (SE +/- 9.91, N = 12)

Blender

Blend File: Classroom - Compute: NVIDIA OptiX

Blender 3.5 - Seconds, Fewer Is Better
dnn: 7.34 (SE +/- 0.03, N = 3)

Blender

Blend File: Fishy Cat - Compute: NVIDIA OptiX

Blender 3.5 - Seconds, Fewer Is Better
dnn: 5.58 (SE +/- 0.05, N = 14)

Blender

Blend File: Barbershop - Compute: NVIDIA OptiX

Blender 3.5 - Seconds, Fewer Is Better
dnn: 30.60 (SE +/- 0.08, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: NVIDIA OptiX

Blender 3.5 - Seconds, Fewer Is Better
dnn: 8.37 (SE +/- 0.03, N = 3)

NeatBench

Acceleration: GPU

NeatBench 5 - FPS, More Is Better
dnn: 4090 (SE +/- 0.00, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Bedroom

IndigoBench 4.4 - M samples/s, More Is Better
dnn: 35.51 (SE +/- 0.02, N = 3)

IndigoBench

Acceleration: OpenCL GPU - Scene: Supercar

IndigoBench 4.4 - M samples/s, More Is Better
dnn: 79.45 (SE +/- 0.01, N = 3)

LuxCoreRender

Scene: DLSC - Acceleration: GPU

LuxCoreRender 2.6 - M samples/sec, More Is Better
dnn: 25.83 (SE +/- 0.01, N = 3; MIN: 24.42 / MAX: 26.13)

LuxCoreRender

Scene: Danish Mood - Acceleration: GPU

LuxCoreRender 2.6 - M samples/sec, More Is Better
dnn: 19.98 (SE +/- 0.12, N = 3; MIN: 7.42 / MAX: 23.23)

LuxCoreRender

Scene: Orange Juice - Acceleration: GPU

LuxCoreRender 2.6 - M samples/sec, More Is Better
dnn: 20.19 (SE +/- 0.08, N = 3; MIN: 18.16 / MAX: 27.9)

LuxCoreRender

Scene: LuxCore Benchmark - Acceleration: GPU

LuxCoreRender 2.6 - M samples/sec, More Is Better
dnn: 20.80 (SE +/- 0.04, N = 3; MIN: 9.16 / MAX: 25.17)

LuxCoreRender

Scene: Rainbow Colors and Prism - Acceleration: GPU

LuxCoreRender 2.6 - M samples/sec, More Is Better
dnn: 44.93 (SE +/- 0.03, N = 3; MIN: 38.13 / MAX: 47.42)

FAHBench

FAHBench 2.3.2 - Ns Per Day, More Is Better
dnn: 430.65 (SE +/- 0.68, N = 3)

NAMD CUDA

ATPase Simulation - 327,506 Atoms

NAMD CUDA 2.14 - days/ns, Fewer Is Better
dnn: 0.07371 (SE +/- 0.00099, N = 15)

OctaneBench

Total Score

OctaneBench 2020.1 - Score, More Is Better
dnn: 1281.89

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 - ms, Fewer Is Better
dnn: 2.967 (SE +/- 0.031, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 - GB/s, More Is Better
dnn: 392.9 (SE +/- 4.80, N = 15)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

cl-mem 2017-01-13 - GB/s, More Is Better
dnn: 886.1 (SE +/- 1.47, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 - GB/s, More Is Better
dnn: 801.6 (SE +/- 1.50, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 - GIOPS, More Is Better
dnn: 40347.55 (SE +/- 461.36, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 - GFLOPS, More Is Better
dnn: 79707.10 (SE +/- 2.35, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 - GFLOPS, More Is Better
dnn: 1396.96 (SE +/- 2.09, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 - GBPS, More Is Better
dnn: 873.14 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3

MandelGPU

OpenCL Device: GPU

MandelGPU 1.3pts1 - Samples/sec, More Is Better
dnn: 830513186.5 (SE +/- 1842939.53, N = 3)
1. (CC) gcc options: -O3 -lm -ftree-vectorize -funroll-loops -lglut -lOpenCL -lGL

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 205 (SE +/- 1.86, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 309 (SE +/- 3.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 294 (SE +/- 4.26, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 62.4 (SE +/- 0.79, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 92.9 (SE +/- 0.80, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 95.2 (SE +/- 0.72, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 111 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 127 (SE +/- 1.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 63.2 (SE +/- 0.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 60.7 (SE +/- 0.59, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 70.9 (SE +/- 1.29, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 66.4 (SE +/- 0.44, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 436 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 568 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 651 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 763 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 642 (SE +/- 6.74, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 218 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 435 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 1160 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 1277 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 1297 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 - GFLOPs/s, More Is Better
dnn: 1347 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 - GB/s, More Is Better
dnn: 439 (SE +/- 0.50, N = 2)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

RealSR-NCNN

Scale: 4x - TAA: No

RealSR-NCNN 20200818 - Seconds, Fewer Is Better
dnn: 4.496 (SE +/- 0.045, N = 3)

RealSR-NCNN

Scale: 4x - TAA: Yes

RealSR-NCNN 20200818 - Seconds, Fewer Is Better
dnn: 19.75 (SE +/- 0.01, N = 3)

VkFFT

VkFFT 1.1.1 - Benchmark Score, More Is Better
dnn: 99244 (SE +/- 8508.53, N = 9)
1. (CXX) g++ options: -O3

vkpeak

fp32-scalar

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 44603.45 (SE +/- 56.62, N = 3)

vkpeak

fp32-vec4

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 58898.23 (SE +/- 53.24, N = 3)

vkpeak

fp16-scalar

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 44487.32 (SE +/- 39.52, N = 3)

vkpeak

fp16-vec4

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 88257.98 (SE +/- 82.01, N = 3)

vkpeak

fp64-scalar

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 1406.30 (SE +/- 1.05, N = 3)

vkpeak

fp64-vec4

vkpeak 20210424 - GFLOPS, More Is Better
dnn: 1407.53 (SE +/- 0.36, N = 3)

vkpeak

int32-scalar

vkpeak 20210424 - GIOPS, More Is Better
dnn: 44596.59 (SE +/- 23.36, N = 3)

vkpeak

int32-vec4

vkpeak 20210424 - GIOPS, More Is Better
dnn: 44374.76 (SE +/- 25.21, N = 3)

vkpeak

int16-scalar

vkpeak 20210424 - GIOPS, More Is Better
dnn: 29662.99 (SE +/- 22.90, N = 3)

vkpeak

int16-vec4

vkpeak 20210424 - GIOPS, More Is Better
dnn: 39483.86 (SE +/- 3.33, N = 3)

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 - ms, Fewer Is Better
dnn: 55.28 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 - ms, Fewer Is Better
dnn: 7.911 (SE +/- 0.008, N = 3)
1. (CXX) g++ options: -O3

Waifu2x-NCNN Vulkan

Scale: 2x - Denoise: 3 - TAA: Yes

Waifu2x-NCNN Vulkan 20200818 - Seconds, Fewer Is Better
dnn: 2.281 (SE +/- 0.024, N = 4)


Phoronix Test Suite v10.8.5