ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54.

ngc smoke run: system details (runs a, b, c, d)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Graphics Details: BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02

Python Details: Python 3.10.12

Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

ngc smoke run: result overview

Test | a | b | c | d
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
ncnn: Vulkan GPU - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
ncnn: Vulkan GPU - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
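
Because runs a, b, c and d share one system configuration, the differences between their columns give a rough sense of run-to-run variation. The sketch below is one way to quantify that, assuming a simple (max - min) / mean spread is an acceptable measure. The `relative_spread` helper is hypothetical and not part of the Phoronix Test Suite; the numbers are copied from the overview for three tests.

```python
# Values for runs a, b, c, d copied from the result overview.
results = {
    "vkfft: FFT + iFFT R2C / C2R": [42397, 41809, 42581, 43048],
    "cl-mem: Copy": [308.6, 308.5, 308.6, 308.5],
    "ncnn: Vulkan GPU - regnety_400m": [14.78, 14.74, 14.77, 15.22],
}


def relative_spread(values):
    """Return (max - min) / mean of the values, as a percentage.

    Hypothetical helper for illustration; not a Phoronix Test Suite metric.
    """
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100.0


spreads = {name: relative_spread(vals) for name, vals in results.items()}

for name, pct in spreads.items():
    print(f"{name}: {pct:.2f}% spread across runs a-d")
```

A spread of a few percent or less suggests the four runs are effectively repeats of each other; a larger spread is a hint to look at the per-test standard error and sample count before comparing columns.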

VkFFT

Test: FFT + iFFT R2C / C2R

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 42397 (SE +/- 298.99, N = 3)
b: 41809 (SE +/- 460.34, N = 3)
c: 42581 (SE +/- 552.29, N = 3)
d: 43048 (SE +/- 289.59, N = 15)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 151912 (SE +/- 190.55, N = 3)
b: 151910 (SE +/- 506.81, N = 3)
c: 152866 (SE +/- 136.79, N = 3)
d: 151969 (SE +/- 377.55, N = 3)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 17867 (SE +/- 131.79, N = 3)
b: 17967 (SE +/- 196.01, N = 5)
c: 17886 (SE +/- 147.24, N = 3)
d: 17942 (SE +/- 168.45, N = 7)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 58405 (SE +/- 150.19, N = 3)
b: 58253 (SE +/- 46.92, N = 3)
c: 58256 (SE +/- 17.34, N = 3)
d: 58299 (SE +/- 21.83, N = 3)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 185774 (SE +/- 1557.78, N = 3)
b: 186082 (SE +/- 1095.80, N = 3)
c: 189944 (SE +/- 1666.00, N = 3)
d: 190310 (SE +/- 479.86, N = 3)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 44489 (SE +/- 479.16, N = 15)
b: 43731 (SE +/- 441.36, N = 3)
c: 45071 (SE +/- 475.72, N = 3)
d: 45007 (SE +/- 571.37, N = 3)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 20810 (SE +/- 188.78, N = 3)
b: 21000 (SE +/- 195.51, N = 3)
c: 21094 (SE +/- 282.81, N = 3)
d: 21320 (SE +/- 152.14, N = 15)
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 194497 (SE +/- 2261.14, N = 3)
b: 190037 (SE +/- 521.00, N = 3)
c: 190909 (SE +/- 583.76, N = 3)
d: 192507 (SE +/- 720.67, N = 3)
1. (CXX) g++ options: -O3

cl-mem

Benchmark: Copy

GB/s, More Is Better (cl-mem 2017-01-13)
a: 308.6 (SE +/- 0.03, N = 3)
b: 308.5 (SE +/- 0.12, N = 3)
c: 308.6 (SE +/- 0.07, N = 3)
d: 308.5 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

GB/s, More Is Better (cl-mem 2017-01-13)
a: 1045.9 (SE +/- 0.00, N = 3)
b: 1045.9 (SE +/- 0.20, N = 3)
c: 1046.1 (SE +/- 0.03, N = 3)
d: 1046.0 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

GB/s, More Is Better (cl-mem 2017-01-13)
a: 2354.9 (SE +/- 1.31, N = 3)
b: 2353.4 (SE +/- 0.88, N = 3)
c: 2354.9 (SE +/- 0.99, N = 3)
d: 2352.1 (SE +/- 3.80, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Double

ms, Fewer Is Better (VkResample 1.0)
a: 24.30 (SE +/- 0.00, N = 3)
b: 24.29 (SE +/- 0.00, N = 3)
c: 24.29 (SE +/- 0.00, N = 3)
d: 24.30 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

ms, Fewer Is Better (VkResample 1.0)
a: 5.230 (SE +/- 0.000, N = 3)
b: 5.231 (SE +/- 0.001, N = 3)
c: 5.230 (SE +/- 0.001, N = 3)
d: 5.230 (SE +/- 0.002, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute INT

GIOPS, More Is Better (clpeak 1.1.2)
a: 33119.10 (SE +/- 2.54, N = 3)
b: 33144.74 (SE +/- 0.09, N = 3)
c: 33146.12 (SE +/- 0.26, N = 3)
d: 33129.34 (SE +/- 8.15, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

GFLOPS, More Is Better (clpeak 1.1.2)
a: 64545.62 (SE +/- 0.43, N = 3)
b: 64547.74 (SE +/- 0.85, N = 3)
c: 64547.25 (SE +/- 0.56, N = 3)
d: 64520.97 (SE +/- 0.91, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

GFLOPS, More Is Better (clpeak 1.1.2)
a: 32959.17 (SE +/- 0.74, N = 3)
b: 32961.21 (SE +/- 0.74, N = 3)
c: 32941.99 (SE +/- 18.62, N = 3)
d: 32933.63 (SE +/- 1.51, N = 3)
1. (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

GBPS, More Is Better (clpeak 1.1.2)
a: 3483.99 (SE +/- 0.33, N = 3)
b: 3484.06 (SE +/- 0.20, N = 3)
c: 3483.95 (SE +/- 0.27, N = 3)
d: 3484.32 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3

ArrayFire

Test: Conjugate Gradient OpenCL

ms, Fewer Is Better (ArrayFire 3.9)
a: 2.997 (SE +/- 0.003, N = 3)
b: 2.983 (SE +/- 0.005, N = 3)
c: 2.998 (SE +/- 0.005, N = 3)
d: 2.997 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -O3

FinanceBench

Benchmark: Black-Scholes OpenCL

ms, Fewer Is Better (FinanceBench 2016-07-25)
a: 4.347 (SE +/- 0.010, N = 3)
b: 4.373 (SE +/- 0.004, N = 3)
c: 4.351 (SE +/- 0.010, N = 3)
d: 4.339 (SE +/- 0.016, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - sCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 2920 (SE +/- 20.00, N = 3)
b: 2892 (SE +/- 29.56, N = 5)
c: 2907 (SE +/- 3.33, N = 3)
d: 2857 (SE +/- 23.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 3943 (SE +/- 14.53, N = 3)
b: 3924 (SE +/- 9.80, N = 5)
c: 3917 (SE +/- 16.67, N = 3)
d: 3920 (SE +/- 15.28, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

GB/s, More Is Better (ViennaCL 1.7.1)
a: 667 (SE +/- 4.18, N = 3)
b: 664 (SE +/- 5.77, N = 5)
c: 666 (SE +/- 4.33, N = 3)
d: 663 (SE +/- 3.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 2027 (SE +/- 44.85, N = 3)
b: 1948 (SE +/- 17.15, N = 5)
c: 1920 (SE +/- 41.63, N = 3)
d: 1917 (SE +/- 14.53, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 1803 (SE +/- 23.33, N = 3)
b: 1806 (SE +/- 29.93, N = 5)
c: 1837 (SE +/- 3.33, N = 3)
d: 1830 (SE +/- 10.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

GB/s, More Is Better (ViennaCL 1.7.1)
a: 1247 (SE +/- 3.33, N = 3)
b: 1238 (SE +/- 2.00, N = 5)
c: 1243 (SE +/- 3.33, N = 3)
d: 1247 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

GB/s, More Is Better (ViennaCL 1.7.1)
a: 411 (SE +/- 0.33, N = 3)
b: 408 (SE +/- 2.90, N = 5)
c: 405 (SE +/- 1.86, N = 3)
d: 418 (SE +/- 8.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

GB/s, More Is Better (ViennaCL 1.7.1)
a: 686 (SE +/- 17.19, N = 3)
b: 699 (SE +/- 3.65, N = 5)
c: 691 (SE +/- 1.20, N = 3)
d: 696 (SE +/- 10.68, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 135 (SE +/- 0.88, N = 3)
b: 137 (SE +/- 1.29, N = 5)
c: 139 (SE +/- 2.67, N = 3)
d: 141 (SE +/- 2.65, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 125 (SE +/- 0.88, N = 3)
b: 125 (SE +/- 0.77, N = 5)
c: 124 (SE +/- 0.33, N = 3)
d: 124 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 141 (SE +/- 1.20, N = 3)
b: 140 (SE +/- 0.93, N = 5)
c: 141 (SE +/- 0.58, N = 3)
d: 140 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 137 (SE +/- 1.53, N = 3)
b: 138 (SE +/- 2.18, N = 5)
c: 140 (SE +/- 4.73, N = 3)
d: 136 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 316 (SE +/- 0.33, N = 3)
b: 316 (SE +/- 0.33, N = 3)
c: 316 (SE +/- 0.33, N = 3)
d: 316 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 420 (SE +/- 2.60, N = 3)
b: 427 (SE +/- 1.33, N = 3)
c: 427 (SE +/- 2.33, N = 3)
d: 426 (SE +/- 3.93, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

GB/s, More Is Better (ViennaCL 1.7.1)
a: 282 (SE +/- 0.00, N = 3)
b: 282 (SE +/- 0.33, N = 3)
c: 283 (SE +/- 0.33, N = 3)
d: 283 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 603 (SE +/- 0.33, N = 3)
b: 604 (SE +/- 0.58, N = 3)
c: 604 (SE +/- 0.88, N = 3)
d: 604 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 799 (SE +/- 0.33, N = 3)
b: 798 (SE +/- 0.33, N = 3)
c: 799 (SE +/- 1.20, N = 3)
d: 799 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

GB/s, More Is Better (ViennaCL 1.7.1)
a: 550 (SE +/- 0.88, N = 3)
b: 552 (SE +/- 0.33, N = 3)
c: 552 (SE +/- 1.15, N = 3)
d: 553 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

GB/s, More Is Better (ViennaCL 1.7.1)
a: 81.2 (SE +/- 0.13, N = 3)
b: 81.5 (SE +/- 0.12, N = 3)
c: 81.2 (SE +/- 0.26, N = 3)
d: 81.4 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

GB/s, More Is Better (ViennaCL 1.7.1)
a: 308 (SE +/- 0.00, N = 3)
b: 308 (SE +/- 0.00, N = 3)
c: 308 (SE +/- 0.33, N = 3)
d: 307 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 7057 (SE +/- 31.80, N = 3)
b: 7093 (SE +/- 58.97, N = 3)
c: 7037 (SE +/- 31.80, N = 3)
d: 7053 (SE +/- 35.28, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 7527 (SE +/- 3.33, N = 3)
b: 7537 (SE +/- 8.82, N = 3)
c: 7537 (SE +/- 3.33, N = 3)
d: 7540 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 7027 (SE +/- 17.64, N = 3)
b: 7067 (SE +/- 46.31, N = 3)
c: 7000 (SE +/- 15.28, N = 3)
d: 7070 (SE +/- 45.09, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
a: 7070 (SE +/- 0.00, N = 3)
b: 7070 (SE +/- 11.55, N = 3)
c: 7057 (SE +/- 3.33, N = 3)
d: 7070 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
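
The ViennaCL dGEMM tests are reported for both the CPU BLAS and OpenCL BLAS paths in the same unit, so their ratio can be computed directly. The sketch below does only that arithmetic, assuming the mean over runs a to d is a fair summary of each path. The dictionary names are hypothetical and the values are copied from the ViennaCL results in this report.

```python
from statistics import mean

# GFLOPs/s for runs a, b, c, d, copied from the ViennaCL 1.7.1 results.
cpu_gflops = {
    "dGEMM-NN": [135, 137, 139, 141],
    "dGEMM-NT": [125, 125, 124, 124],
    "dGEMM-TN": [141, 140, 141, 140],
    "dGEMM-TT": [137, 138, 140, 136],
}
opencl_gflops = {
    "dGEMM-NN": [7057, 7093, 7037, 7053],
    "dGEMM-NT": [7527, 7537, 7537, 7540],
    "dGEMM-TN": [7027, 7067, 7000, 7070],
    "dGEMM-TT": [7070, 7070, 7057, 7070],
}

# Ratio of mean OpenCL throughput to mean CPU throughput for each variant.
ratios = {
    name: mean(opencl_gflops[name]) / mean(cpu_gflops[name])
    for name in cpu_gflops
}

for name, ratio in ratios.items():
    print(f"{name}: OpenCL BLAS is about {ratio:.1f}x the CPU BLAS result")
```

The ratio describes only these two ViennaCL code paths on this system; it says nothing about other BLAS implementations.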

NCNN

Target: Vulkan GPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20230517)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20230517)
a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20230517)
a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20230517)
a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20230517)
a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20230517)
a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

ms, Fewer Is Better (NCNN 20230517)
a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

ms, Fewer Is Better (NCNN 20230517)
a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

ms, Fewer Is Better (NCNN 20230517)
a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

ms, Fewer Is Better (NCNN 20230517)
a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

ms, Fewer Is Better (NCNN 20230517)
a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

ms, Fewer Is Better (NCNN 20230517)
a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenetv2-yolov3

ms, Fewer Is Better (NCNN 20230517)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20230517)
a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20230517)
a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20230517)
a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

ms, Fewer Is Better (NCNN 20230517)
a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20230517)
a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread


Phoronix Test Suite v10.8.4