ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&grs&sor&rro.

ngc smoke run: system details
Runs: a, b, c, d (one shared configuration listed)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Graphics Details
- BAR1 / Visible vRAM Size: N/A
- vBIOS Version: 96.00.7e.00.02

Python Details
- Python 3.10.12

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

ngc smoke run: results overview

Test | a | b | c | d
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
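Since runs a through d were taken on the same system, one way to read the overview is to look at how far apart the four runs are for a given test. The following is a minimal sketch, not part of the Phoronix Test Suite: the helper name `spread_percent` and the dictionary layout are assumptions made for illustration, and the numbers are copied from the overview for two tests.

```python
# Values copied from the overview for two tests (runs a, b, c, d).
# The dictionary layout is an illustrative assumption, not a PTS format.
results = {
    "viennacl: CPU BLAS - dCOPY": {"a": 2027, "b": 1948, "c": 1920, "d": 1917},
    "clpeak: Global Memory Bandwidth": {
        "a": 3483.99, "b": 3484.06, "c": 3483.95, "d": 3484.32,
    },
}


def spread_percent(runs):
    """Return (max - min) / min * 100 across the runs of one test."""
    lowest = min(runs.values())
    highest = max(runs.values())
    return (highest - lowest) / lowest * 100.0


if __name__ == "__main__":
    for name, runs in results.items():
        print(f"{name}: {spread_percent(runs):.2f}% spread")
```

With these inputs the CPU dCOPY result varies by several percent between runs, while the clpeak bandwidth figure is nearly identical across all four.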

ViennaCL

Test: CPU BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, more is better)
d: 1917 (SE +/- 14.53, N = 3)
c: 1920 (SE +/- 41.63, N = 3)
b: 1948 (SE +/- 17.15, N = 5)
a: 2027 (SE +/- 44.85, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
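The SE figures give a rough way to judge whether two runs really differ. The sketch below compares runs a and d of the dCOPY test by expressing their difference in units of the combined standard error. It assumes the SE values are standard errors of the mean, that the runs are independent, and that each SE belongs to the run in the same position in the chart; the function name `difference_in_se_units` is hypothetical.

```python
import math


def difference_in_se_units(mean_x, se_x, mean_y, se_y):
    """Absolute difference of two means divided by their combined SE.

    Assumes independent runs, so the standard errors add in quadrature.
    """
    combined_se = math.sqrt(se_x ** 2 + se_y ** 2)
    return abs(mean_x - mean_y) / combined_se


# dCOPY: run a (2027 GB/s, SE 44.85) against run d (1917 GB/s, SE 14.53).
ratio = difference_in_se_units(2027, 44.85, 1917, 14.53)
print(f"a vs d differ by {ratio:.2f} combined standard errors")
```

With only three samples per run, a ratio of about 2.3 is suggestive rather than conclusive.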

ViennaCL

Test: CPU BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
a: 135 (SE +/- 0.88, N = 3)
b: 137 (SE +/- 1.29, N = 5)
c: 139 (SE +/- 2.67, N = 3)
d: 141 (SE +/- 2.65, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: vision_transformer

NCNN 20230517 (ms, fewer is better)
b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

NCNN 20230517 (ms, fewer is better)
d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, more is better)
c: 405 (SE +/- 1.86, N = 3)
b: 408 (SE +/- 2.90, N = 5)
a: 411 (SE +/- 0.33, N = 3)
d: 418 (SE +/- 8.51, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.3.4 (Benchmark Score, more is better)
b: 43731 (SE +/- 441.36, N = 3)
a: 44489 (SE +/- 479.16, N = 15)
d: 45007 (SE +/- 571.37, N = 3)
c: 45071 (SE +/- 475.72, N = 3)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT R2C / C2R

VkFFT 1.3.4 (Benchmark Score, more is better)
b: 41809 (SE +/- 460.34, N = 3)
a: 42397 (SE +/- 298.99, N = 3)
c: 42581 (SE +/- 552.29, N = 3)
d: 43048 (SE +/- 289.59, N = 15)
(CXX) g++ options: -O3

ViennaCL

Test: CPU BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
d: 136 (SE +/- 1.20, N = 3)
a: 137 (SE +/- 1.53, N = 3)
b: 138 (SE +/- 2.18, N = 5)
c: 140 (SE +/- 4.73, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.3.4 (Benchmark Score, more is better)
a: 20810 (SE +/- 188.78, N = 3)
b: 21000 (SE +/- 195.51, N = 3)
c: 21094 (SE +/- 282.81, N = 3)
d: 21320 (SE +/- 152.14, N = 15)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.3.4 (Benchmark Score, more is better)
a: 185774 (SE +/- 1557.78, N = 3)
b: 186082 (SE +/- 1095.80, N = 3)
c: 189944 (SE +/- 1666.00, N = 3)
d: 190310 (SE +/- 479.86, N = 3)
(CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.3.4 (Benchmark Score, more is better)
b: 190037 (SE +/- 521.00, N = 3)
c: 190909 (SE +/- 583.76, N = 3)
d: 192507 (SE +/- 720.67, N = 3)
a: 194497 (SE +/- 2261.14, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: blazeface

NCNN 20230517 (ms, fewer is better)
b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, more is better)
d: 2857 (SE +/- 23.33, N = 3)
b: 2892 (SE +/- 29.56, N = 5)
c: 2907 (SE +/- 3.33, N = 3)
a: 2920 (SE +/- 20.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, more is better)
a: 686 (SE +/- 17.19, N = 3)
c: 691 (SE +/- 1.20, N = 3)
d: 696 (SE +/- 10.68, N = 3)
b: 699 (SE +/- 3.65, N = 5)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2

NCNN 20230517 (ms, fewer is better)
b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, more is better)
a: 1803 (SE +/- 23.33, N = 3)
b: 1806 (SE +/- 29.93, N = 5)
d: 1830 (SE +/- 10.00, N = 3)
c: 1837 (SE +/- 3.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: alexnet

NCNN 20230517 (ms, fewer is better)
c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

NCNN 20230517 (ms, fewer is better)
d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3

NCNN 20230517 (ms, fewer is better)
c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

NCNN 20230517 (ms, fewer is better)
b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

NCNN 20230517 (ms, fewer is better)
c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, more is better)
a: 420 (SE +/- 2.60, N = 3)
d: 426 (SE +/- 3.93, N = 3)
b: 427 (SE +/- 1.33, N = 3)
c: 427 (SE +/- 2.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: FastestDet

NCNN 20230517 (ms, fewer is better)
d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

NCNN 20230517 (ms, fewer is better)
d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
c: 7000 (SE +/- 15.28, N = 3)
a: 7027 (SE +/- 17.64, N = 3)
b: 7067 (SE +/- 46.31, N = 3)
d: 7070 (SE +/- 45.09, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

NCNN 20230517 (ms, fewer is better)
c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

NCNN 20230517 (ms, fewer is better)
b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
c: 124 (SE +/- 0.33, N = 3)
d: 124 (SE +/- 0.33, N = 3)
a: 125 (SE +/- 0.88, N = 3)
b: 125 (SE +/- 0.77, N = 5)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
c: 7037 (SE +/- 31.80, N = 3)
d: 7053 (SE +/- 35.28, N = 3)
a: 7057 (SE +/- 31.80, N = 3)
b: 7093 (SE +/- 58.97, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

FinanceBench

Benchmark: Black-Scholes OpenCL

FinanceBench 2016-07-25 (ms, fewer is better)
b: 4.373 (SE +/- 0.004, N = 3)
c: 4.351 (SE +/- 0.010, N = 3)
a: 4.347 (SE +/- 0.010, N = 3)
d: 4.339 (SE +/- 0.016, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - dDOT

ViennaCL 1.7.1 (GB/s, more is better)
b: 1238 (SE +/- 2.00, N = 5)
c: 1243 (SE +/- 3.33, N = 3)
a: 1247 (SE +/- 3.33, N = 3)
d: 1247 (SE +/- 3.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

ViennaCL 1.7.1 (GFLOPs/s, more is better)
b: 140 (SE +/- 0.93, N = 5)
d: 140 (SE +/- 0.58, N = 3)
a: 141 (SE +/- 1.20, N = 3)
c: 141 (SE +/- 0.58, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

ViennaCL 1.7.1 (GB/s, more is better)
c: 3917 (SE +/- 16.67, N = 3)
d: 3920 (SE +/- 15.28, N = 3)
b: 3924 (SE +/- 9.80, N = 5)
a: 3943 (SE +/- 14.53, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.3.4 (Benchmark Score, more is better)
b: 151910 (SE +/- 506.81, N = 3)
a: 151912 (SE +/- 190.55, N = 3)
d: 151969 (SE +/- 377.55, N = 3)
c: 152866 (SE +/- 136.79, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3

NCNN 20230517 (ms, fewer is better)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet

NCNN 20230517 (ms, fewer is better)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: CPU BLAS - sDOT

ViennaCL 1.7.1 (GB/s, more is better)
d: 663 (SE +/- 3.51, N = 3)
b: 664 (SE +/- 5.77, N = 5)
c: 666 (SE +/- 4.33, N = 3)
a: 667 (SE +/- 4.18, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.3.4 (Benchmark Score, more is better)
a: 17867 (SE +/- 131.79, N = 3)
c: 17886 (SE +/- 147.24, N = 3)
d: 17942 (SE +/- 168.45, N = 7)
b: 17967 (SE +/- 196.01, N = 5)
(CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - dDOT

ViennaCL 1.7.1 (GB/s, more is better)
a: 550 (SE +/- 0.88, N = 3)
b: 552 (SE +/- 0.33, N = 3)
c: 552 (SE +/- 1.15, N = 3)
d: 553 (SE +/- 1.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ArrayFire

Test: Conjugate Gradient OpenCL

ArrayFire 3.9 (ms, fewer is better)
c: 2.998 (SE +/- 0.005, N = 3)
d: 2.997 (SE +/- 0.003, N = 3)
a: 2.997 (SE +/- 0.003, N = 3)
b: 2.983 (SE +/- 0.005, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: mnasnet

NCNN 20230517 (ms, fewer is better)
d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

NCNN 20230517 (ms, fewer is better)
d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMV-N

ViennaCL 1.7.1 (GB/s, more is better)
a: 81.2 (SE +/- 0.13, N = 3)
c: 81.2 (SE +/- 0.26, N = 3)
d: 81.4 (SE +/- 0.21, N = 3)
b: 81.5 (SE +/- 0.12, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

ViennaCL 1.7.1 (GB/s, more is better)
a: 282 (SE +/- 0.00, N = 3)
b: 282 (SE +/- 0.33, N = 3)
c: 283 (SE +/- 0.33, N = 3)
d: 283 (SE +/- 0.67, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

ViennaCL 1.7.1 (GB/s, more is better)
d: 307 (SE +/- 0.33, N = 3)
a: 308 (SE +/- 0.00, N = 3)
b: 308 (SE +/- 0.00, N = 3)
c: 308 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.3.4 (Benchmark Score, more is better)
b: 58253 (SE +/- 46.92, N = 3)
c: 58256 (SE +/- 17.34, N = 3)
d: 58299 (SE +/- 21.83, N = 3)
a: 58405 (SE +/- 150.19, N = 3)
(CXX) g++ options: -O3

NCNN

Target: Vulkan GPU - Model: vgg16

NCNN 20230517 (ms, fewer is better)
d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
c: 7057 (SE +/- 3.33, N = 3)
a: 7070 (SE +/- 0.00, N = 3)
b: 7070 (SE +/- 11.55, N = 3)
d: 7070 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

ViennaCL 1.7.1 (GFLOPs/s, more is better)
a: 7527 (SE +/- 3.33, N = 3)
b: 7537 (SE +/- 8.82, N = 3)
c: 7537 (SE +/- 3.33, N = 3)
d: 7540 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

ViennaCL 1.7.1 (GB/s, more is better)
a: 603 (SE +/- 0.33, N = 3)
b: 604 (SE +/- 0.58, N = 3)
c: 604 (SE +/- 0.88, N = 3)
d: 604 (SE +/- 0.58, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

ViennaCL 1.7.1 (GB/s, more is better)
b: 798 (SE +/- 0.33, N = 3)
a: 799 (SE +/- 0.33, N = 3)
c: 799 (SE +/- 1.20, N = 3)
d: 799 (SE +/- 0.88, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

cl-mem

Benchmark: Write

cl-mem 2017-01-13 (GB/s, more is better)
d: 2352.1 (SE +/- 3.80, N = 3)
b: 2353.4 (SE +/- 0.88, N = 3)
a: 2354.9 (SE +/- 1.31, N = 3)
c: 2354.9 (SE +/- 0.99, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

clpeak

OpenCL Test: Double-Precision Double

clpeak 1.1.2 (GFLOPS, more is better)
d: 32933.63 (SE +/- 1.51, N = 3)
c: 32941.99 (SE +/- 18.62, N = 3)
a: 32959.17 (SE +/- 0.74, N = 3)
b: 32961.21 (SE +/- 0.74, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute INT

clpeak 1.1.2 (GIOPS, more is better)
a: 33119.10 (SE +/- 2.54, N = 3)
d: 33129.34 (SE +/- 8.15, N = 3)
b: 33144.74 (SE +/- 0.09, N = 3)
c: 33146.12 (SE +/- 0.26, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

clpeak 1.1.2 (GFLOPS, more is better)
d: 64520.97 (SE +/- 0.91, N = 3)
a: 64545.62 (SE +/- 0.43, N = 3)
c: 64547.25 (SE +/- 0.56, N = 3)
b: 64547.74 (SE +/- 0.85, N = 3)
(CXX) g++ options: -O3

cl-mem

Benchmark: Copy

cl-mem 2017-01-13 (GB/s, more is better)
b: 308.5 (SE +/- 0.12, N = 3)
d: 308.5 (SE +/- 0.03, N = 3)
a: 308.6 (SE +/- 0.03, N = 3)
c: 308.6 (SE +/- 0.07, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Double

VkResample 1.0 (ms, fewer is better)
d: 24.30 (SE +/- 0.00, N = 3)
a: 24.30 (SE +/- 0.00, N = 3)
b: 24.29 (SE +/- 0.00, N = 3)
c: 24.29 (SE +/- 0.00, N = 3)
(CXX) g++ options: -O3

cl-mem

Benchmark: Read

cl-mem 2017-01-13 (GB/s, more is better)
a: 1045.9 (SE +/- 0.00, N = 3)
b: 1045.9 (SE +/- 0.20, N = 3)
d: 1046.0 (SE +/- 0.03, N = 3)
c: 1046.1 (SE +/- 0.03, N = 3)
(CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Single

VkResample 1.0 (ms, fewer is better)
b: 5.231 (SE +/- 0.001, N = 3)
d: 5.230 (SE +/- 0.002, N = 3)
c: 5.230 (SE +/- 0.001, N = 3)
a: 5.230 (SE +/- 0.000, N = 3)
(CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

clpeak 1.1.2 (GBPS, more is better)
c: 3483.95 (SE +/- 0.27, N = 3)
a: 3483.99 (SE +/- 0.33, N = 3)
b: 3484.06 (SE +/- 0.20, N = 3)
d: 3484.32 (SE +/- 0.04, N = 3)
(CXX) g++ options: -O3

ViennaCL

Test: OpenCL BLAS - sCOPY

ViennaCL 1.7.1 (GB/s, more is better)
a: 316 (SE +/- 0.33, N = 3)
b: 316 (SE +/- 0.33, N = 3)
c: 316 (SE +/- 0.33, N = 3)
d: 316 (SE +/- 0.33, N = 3)
(CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL


Phoronix Test Suite v10.8.5