ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&grs&rdt&rro.

ngc smoke run: system configuration (shared by runs a, b, c, d)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details: BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02
Python Details: Python 3.10.12
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

ngc smoke run: result overview

Test | a | b | c | d
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
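Because runs a to d repeat the same tests on the same system, the overview is mostly useful for judging run-to-run variation. The following is a minimal sketch of one way to quantify that, using the four "viennacl: CPU BLAS - dCOPY" results from the overview. The helper name `spread_percent` is hypothetical and is not part of the Phoronix Test Suite.

```python
# Results for "viennacl: CPU BLAS - dCOPY" (GB/s), copied from the overview.
dcopy_gbps = {"a": 2027, "b": 1948, "c": 1920, "d": 1917}


def spread_percent(results):
    """Return (max - min) / min * 100 for a mapping of run label -> result.

    Hypothetical helper for illustration; it only measures the gap between
    the best and worst run and ignores the per-run standard error.
    """
    low = min(results.values())
    high = max(results.values())
    return (high - low) / low * 100.0


if __name__ == "__main__":
    print(f"dCOPY spread across runs: {spread_percent(dcopy_gbps):.1f}%")
```

The same function can be applied to any row of the overview. Rows with a spread of a fraction of a percent, as the clpeak and cl-mem rows appear to have, indicate results that are effectively identical between runs.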

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 1917 (SE +/- 14.53, N = 3)
c: 1920 (SE +/- 41.63, N = 3)
b: 1948 (SE +/- 17.15, N = 5)
a: 2027 (SE +/- 44.85, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 141 (SE +/- 2.65, N = 3)
c: 139 (SE +/- 2.67, N = 3)
b: 137 (SE +/- 1.29, N = 5)
a: 135 (SE +/- 0.88, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 (ms, Fewer Is Better)
d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 (ms, Fewer Is Better)
d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 418 (SE +/- 8.51, N = 3)
c: 405 (SE +/- 1.86, N = 3)
b: 408 (SE +/- 2.90, N = 5)
a: 411 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 45007 (SE +/- 571.37, N = 3)
c: 45071 (SE +/- 475.72, N = 3)
b: 43731 (SE +/- 441.36, N = 3)
a: 44489 (SE +/- 479.16, N = 15)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT R2C / C2R

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 43048 (SE +/- 289.59, N = 15)
c: 42581 (SE +/- 552.29, N = 3)
b: 41809 (SE +/- 460.34, N = 3)
a: 42397 (SE +/- 298.99, N = 3)
(CXX) g++ options: -O3

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 136 (SE +/- 1.20, N = 3)
c: 140 (SE +/- 4.73, N = 3)
b: 138 (SE +/- 2.18, N = 5)
a: 137 (SE +/- 1.53, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 21320 (SE +/- 152.14, N = 15)
c: 21094 (SE +/- 282.81, N = 3)
b: 21000 (SE +/- 195.51, N = 3)
a: 20810 (SE +/- 188.78, N = 3)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 190310 (SE +/- 479.86, N = 3)
c: 189944 (SE +/- 1666.00, N = 3)
b: 186082 (SE +/- 1095.80, N = 3)
a: 185774 (SE +/- 1557.78, N = 3)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 192507 (SE +/- 720.67, N = 3)
c: 190909 (SE +/- 583.76, N = 3)
b: 190037 (SE +/- 521.00, N = 3)
a: 194497 (SE +/- 2261.14, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 (ms, Fewer Is Better)
d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 2857 (SE +/- 23.33, N = 3)
c: 2907 (SE +/- 3.33, N = 3)
b: 2892 (SE +/- 29.56, N = 5)
a: 2920 (SE +/- 20.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 696 (SE +/- 10.68, N = 3)
c: 691 (SE +/- 1.20, N = 3)
b: 699 (SE +/- 3.65, N = 5)
a: 686 (SE +/- 17.19, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 (ms, Fewer Is Better)
d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 1830 (SE +/- 10.00, N = 3)
c: 1837 (SE +/- 3.33, N = 3)
b: 1806 (SE +/- 29.93, N = 5)
a: 1803 (SE +/- 23.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 (ms, Fewer Is Better)
d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 (ms, Fewer Is Better)
d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 (ms, Fewer Is Better)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 (ms, Fewer Is Better)
d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 (ms, Fewer Is Better)
d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 426 (SE +/- 3.93, N = 3)
c: 427 (SE +/- 2.33, N = 3)
b: 427 (SE +/- 1.33, N = 3)
a: 420 (SE +/- 2.60, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 (ms, Fewer Is Better)
d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 (ms, Fewer Is Better)
d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 7070 (SE +/- 45.09, N = 3)
c: 7000 (SE +/- 15.28, N = 3)
b: 7067 (SE +/- 46.31, N = 3)
a: 7027 (SE +/- 17.64, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 (ms, Fewer Is Better)
d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 (ms, Fewer Is Better)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 124 (SE +/- 0.33, N = 3)
c: 124 (SE +/- 0.33, N = 3)
b: 125 (SE +/- 0.77, N = 5)
a: 125 (SE +/- 0.88, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 7053 (SE +/- 35.28, N = 3)
c: 7037 (SE +/- 31.80, N = 3)
b: 7093 (SE +/- 58.97, N = 3)
a: 7057 (SE +/- 31.80, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, Fewer Is Better)
d: 4.339 (SE +/- 0.016, N = 3)
c: 4.351 (SE +/- 0.010, N = 3)
b: 4.373 (SE +/- 0.004, N = 3)
a: 4.347 (SE +/- 0.010, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 1247 (SE +/- 3.33, N = 3)
c: 1243 (SE +/- 3.33, N = 3)
b: 1238 (SE +/- 2.00, N = 5)
a: 1247 (SE +/- 3.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 140 (SE +/- 0.58, N = 3)
c: 141 (SE +/- 0.58, N = 3)
b: 140 (SE +/- 0.93, N = 5)
a: 141 (SE +/- 1.20, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 3920 (SE +/- 15.28, N = 3)
c: 3917 (SE +/- 16.67, N = 3)
b: 3924 (SE +/- 9.80, N = 5)
a: 3943 (SE +/- 14.53, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 151969 (SE +/- 377.55, N = 3)
c: 152866 (SE +/- 136.79, N = 3)
b: 151910 (SE +/- 506.81, N = 3)
a: 151912 (SE +/- 190.55, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

NCNN 20230517 (ms, Fewer Is Better)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, Fewer Is Better)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 663 (SE +/- 3.51, N = 3)
c: 666 (SE +/- 4.33, N = 3)
b: 664 (SE +/- 5.77, N = 5)
a: 667 (SE +/- 4.18, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 17942 (SE +/- 168.45, N = 7)
c: 17886 (SE +/- 147.24, N = 3)
b: 17967 (SE +/- 196.01, N = 5)
a: 17867 (SE +/- 131.79, N = 3)
(CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 553 (SE +/- 1.00, N = 3)
c: 552 (SE +/- 1.15, N = 3)
b: 552 (SE +/- 0.33, N = 3)
a: 550 (SE +/- 0.88, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.9 (ms, Fewer Is Better)
d: 2.997 (SE +/- 0.003, N = 3)
c: 2.998 (SE +/- 0.005, N = 3)
b: 2.983 (SE +/- 0.005, N = 3)
a: 2.997 (SE +/- 0.003, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 (ms, Fewer Is Better)
d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 (ms, Fewer Is Better)
d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 81.4 (SE +/- 0.21, N = 3)
c: 81.2 (SE +/- 0.26, N = 3)
b: 81.5 (SE +/- 0.12, N = 3)
a: 81.2 (SE +/- 0.13, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 283 (SE +/- 0.67, N = 3)
c: 283 (SE +/- 0.33, N = 3)
b: 282 (SE +/- 0.33, N = 3)
a: 282 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 307 (SE +/- 0.33, N = 3)
c: 308 (SE +/- 0.33, N = 3)
b: 308 (SE +/- 0.00, N = 3)
a: 308 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
d: 58299 (SE +/- 21.83, N = 3)
c: 58256 (SE +/- 17.34, N = 3)
b: 58253 (SE +/- 46.92, N = 3)
a: 58405 (SE +/- 150.19, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 (ms, Fewer Is Better)
d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 7070 (SE +/- 0.00, N = 3)
c: 7057 (SE +/- 3.33, N = 3)
b: 7070 (SE +/- 11.55, N = 3)
a: 7070 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, More Is Better)
d: 7540 (SE +/- 0.00, N = 3)
c: 7537 (SE +/- 3.33, N = 3)
b: 7537 (SE +/- 8.82, N = 3)
a: 7527 (SE +/- 3.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 604 (SE +/- 0.58, N = 3)
c: 604 (SE +/- 0.88, N = 3)
b: 604 (SE +/- 0.58, N = 3)
a: 603 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 799 (SE +/- 0.88, N = 3)
c: 799 (SE +/- 1.20, N = 3)
b: 798 (SE +/- 0.33, N = 3)
a: 799 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, More Is Better)
d: 2352.1 (SE +/- 3.80, N = 3)
c: 2354.9 (SE +/- 0.99, N = 3)
b: 2353.4 (SE +/- 0.88, N = 3)
a: 2354.9 (SE +/- 1.31, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, More Is Better)
d: 32933.63 (SE +/- 1.51, N = 3)
c: 32941.99 (SE +/- 18.62, N = 3)
b: 32961.21 (SE +/- 0.74, N = 3)
a: 32959.17 (SE +/- 0.74, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, More Is Better)
d: 33129.34 (SE +/- 8.15, N = 3)
c: 33146.12 (SE +/- 0.26, N = 3)
b: 33144.74 (SE +/- 0.09, N = 3)
a: 33119.10 (SE +/- 2.54, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, More Is Better)
d: 64520.97 (SE +/- 0.91, N = 3)
c: 64547.25 (SE +/- 0.56, N = 3)
b: 64547.74 (SE +/- 0.85, N = 3)
a: 64545.62 (SE +/- 0.43, N = 3)
(CXX) g++ options: -O3

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, More Is Better)
d: 308.5 (SE +/- 0.03, N = 3)
c: 308.6 (SE +/- 0.07, N = 3)
b: 308.5 (SE +/- 0.12, N = 3)
a: 308.6 (SE +/- 0.03, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (ms, Fewer Is Better)
d: 24.30 (SE +/- 0.00, N = 3)
c: 24.29 (SE +/- 0.00, N = 3)
b: 24.29 (SE +/- 0.00, N = 3)
a: 24.30 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, More Is Better)
d: 1046.0 (SE +/- 0.03, N = 3)
c: 1046.1 (SE +/- 0.03, N = 3)
b: 1045.9 (SE +/- 0.20, N = 3)
a: 1045.9 (SE +/- 0.00, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (ms, Fewer Is Better)
d: 5.230 (SE +/- 0.002, N = 3)
c: 5.230 (SE +/- 0.001, N = 3)
b: 5.231 (SE +/- 0.001, N = 3)
a: 5.230 (SE +/- 0.000, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, More Is Better)
d: 3484.32 (SE +/- 0.04, N = 3)
c: 3483.95 (SE +/- 0.27, N = 3)
b: 3484.06 (SE +/- 0.20, N = 3)
a: 3483.99 (SE +/- 0.33, N = 3)
(CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, More Is Better)
d: 316 (SE +/- 0.33, N = 3)
c: 316 (SE +/- 0.33, N = 3)
b: 316 (SE +/- 0.33, N = 3)
a: 316 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL


Phoronix Test Suite v10.8.5