ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&sor.

ngc smoke run: system configuration (identical for runs a, b, c and d)

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Graphics Details
- BAR1 / Visible vRAM Size: N/A
- vBIOS Version: 96.00.7e.00.02

Python Details
- Python 3.10.12

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

ngc smoke run: result overview

Test | Unit | a | b | c | d
vkfft: FFT + iFFT R2C / C2R | Score | 42397 | 41809 | 42581 | 43048
vkfft: FFT + iFFT C2C 1D batched in half precision | Score | 151912 | 151910 | 152866 | 151969
vkfft: FFT + iFFT C2C Bluestein in single precision | Score | 17867 | 17967 | 17886 | 17942
vkfft: FFT + iFFT C2C 1D batched in double precision | Score | 58405 | 58253 | 58256 | 58299
vkfft: FFT + iFFT C2C 1D batched in single precision | Score | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C multidimensional in single precision | Score | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | Score | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | Score | 194497 | 190037 | 190909 | 192507
cl-mem: Copy | GB/s | 308.6 | 308.5 | 308.6 | 308.5
cl-mem: Read | GB/s | 1045.9 | 1045.9 | 1046.1 | 1046.0
cl-mem: Write | GB/s | 2354.9 | 2353.4 | 2354.9 | 2352.1
vkresample: 2x - Double | ms | 24.296 | 24.294 | 24.290 | 24.297
vkresample: 2x - Single | ms | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Integer Compute INT | GIOPS | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | GFLOPS | 64545.62 | 64547.74 | 64547.25 | 64520.97
clpeak: Double-Precision Double | GFLOPS | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Global Memory Bandwidth | GBPS | 3483.99 | 3484.06 | 3483.95 | 3484.32
arrayfire: Conjugate Gradient OpenCL | ms | 2.997 | 2.983 | 2.998 | 2.997
financebench: Black-Scholes OpenCL | ms | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - sCOPY | GB/s | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - sAXPY | GB/s | 3943 | 3924 | 3917 | 3920
viennacl: CPU BLAS - sDOT | GB/s | 667 | 664 | 666 | 663
viennacl: CPU BLAS - dCOPY | GB/s | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dAXPY | GB/s | 1803 | 1806 | 1837 | 1830
viennacl: CPU BLAS - dDOT | GB/s | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMV-N | GB/s | 411 | 408 | 405 | 418
viennacl: CPU BLAS - dGEMV-T | GB/s | 686 | 699 | 691 | 696
viennacl: CPU BLAS - dGEMM-NN | GFLOPs/s | 135 | 137 | 139 | 141
viennacl: CPU BLAS - dGEMM-NT | GFLOPs/s | 125 | 125 | 124 | 124
viennacl: CPU BLAS - dGEMM-TN | GFLOPs/s | 141 | 140 | 141 | 140
viennacl: CPU BLAS - dGEMM-TT | GFLOPs/s | 137 | 138 | 140 | 136
viennacl: OpenCL BLAS - sCOPY | GB/s | 316 | 316 | 316 | 316
viennacl: OpenCL BLAS - sAXPY | GB/s | 420 | 427 | 427 | 426
viennacl: OpenCL BLAS - sDOT | GB/s | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dCOPY | GB/s | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | GB/s | 799 | 798 | 799 | 799
viennacl: OpenCL BLAS - dDOT | GB/s | 550 | 552 | 552 | 553
viennacl: OpenCL BLAS - dGEMV-N | GB/s | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - dGEMV-T | GB/s | 308 | 308 | 308 | 307
viennacl: OpenCL BLAS - dGEMM-NN | GFLOPs/s | 7057 | 7093 | 7037 | 7053
viennacl: OpenCL BLAS - dGEMM-NT | GFLOPs/s | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dGEMM-TN | GFLOPs/s | 7027 | 7067 | 7000 | 7070
viennacl: OpenCL BLAS - dGEMM-TT | GFLOPs/s | 7070 | 7070 | 7057 | 7070
ncnn: Vulkan GPU - mobilenet | ms | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - mobilenet-v2 | ms | 2.13 | 2.16 | 2.12 | 2.12
ncnn: Vulkan GPU - mobilenet-v3 | ms | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - shufflenet-v2 | ms | 2.29 | 2.29 | 2.27 | 2.27
ncnn: Vulkan GPU - mnasnet | ms | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - efficientnet-b0 | ms | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - blazeface | ms | 1.75 | 1.78 | 1.74 | 1.77
ncnn: Vulkan GPU - googlenet | ms | 4.16 | 4.23 | 4.23 | 4.21
ncnn: Vulkan GPU - vgg16 | ms | 5.26 | 5.25 | 5.25 | 5.26
ncnn: Vulkan GPU - resnet18 | ms | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU - alexnet | ms | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet50 | ms | 4.27 | 4.28 | 4.32 | 4.32
ncnn: Vulkan GPU - mobilenetv2-yolov3 | ms | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - yolov4-tiny | ms | 6.79 | 6.80 | 6.81 | 6.82
ncnn: Vulkan GPU - squeezenet_ssd | ms | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - regnety_400m | ms | 14.78 | 14.74 | 14.77 | 15.22
ncnn: Vulkan GPU - vision_transformer | ms | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - FastestDet | ms | 3.09 | 3.10 | 3.08 | 3.12
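Each chart states whether more or less is better, so ranking runs a to d requires knowing the direction per test. The following is a minimal Python sketch, not part of the Phoronix Test Suite; `best_and_spread` is a hypothetical helper, and "spread" is defined here as (max - min) / min in percent purely for illustration. The input values are copied from the result overview.

```python
# Sketch: rank runs a-d for one test, honoring the chart direction.
# best_and_spread is a hypothetical helper; "spread" here means
# (max - min) / min in percent, an illustrative metric only.

def best_and_spread(results, higher_is_better):
    """Return (best_run, spread_percent) for a dict of run -> value."""
    pick = max if higher_is_better else min
    best = pick(results, key=results.get)
    lo, hi = min(results.values()), max(results.values())
    return best, (hi - lo) / lo * 100.0


# Values copied from the result overview.
vkfft_r2c = {"a": 42397, "b": 41809, "c": 42581, "d": 43048}     # Score, More Is Better
ncnn_regnety = {"a": 14.78, "b": 14.74, "c": 14.77, "d": 15.22}  # ms, Fewer Is Better

for name, res, higher in [("vkfft R2C / C2R", vkfft_r2c, True),
                          ("ncnn regnety_400m", ncnn_regnety, False)]:
    run, spread = best_and_spread(res, higher)
    print(f"{name}: best run {run}, spread {spread:.2f}%")
# vkfft R2C / C2R: best run d, spread 2.96%
# ncnn regnety_400m: best run b, spread 3.26%
```

Passing the direction explicitly keeps the helper usable for both score-style charts (highest wins) and latency charts (lowest wins).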

VkFFT

Test: FFT + iFFT R2C / C2R

Benchmark Score, More Is Better (VkFFT 1.3.4)
d: 43048 (SE +/- 289.59, N = 15)
c: 42581 (SE +/- 552.29, N = 3)
a: 42397 (SE +/- 298.99, N = 3)
b: 41809 (SE +/- 460.34, N = 3)
Compiler: (CXX) g++ options: -O3
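Each result carries an "SE +/- x, N = n" annotation. Assuming this is the standard error of the mean over the N trial runs, that is, the sample standard deviation divided by sqrt(N) (an assumption; the export does not state the formula), it can be sketched in Python as below. The trial values are hypothetical, since the export reports only per-run means.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


trials = [7050.0, 7070.0, 7090.0]  # hypothetical per-trial results, not from this export
print(f"mean = {statistics.mean(trials):.2f}, "
      f"SE +/- {standard_error(trials):.2f}, N = {len(trials)}")
# mean = 7070.00, SE +/- 11.55, N = 3
```

Because the SE shrinks with sqrt(N), the same trial-to-trial scatter yields a smaller SE for a run repeated 15 times than for one repeated 3 times.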

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
c: 152866 (SE +/- 136.79, N = 3)
d: 151969 (SE +/- 377.55, N = 3)
a: 151912 (SE +/- 190.55, N = 3)
b: 151910 (SE +/- 506.81, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
b: 17967 (SE +/- 196.01, N = 5)
d: 17942 (SE +/- 168.45, N = 7)
c: 17886 (SE +/- 147.24, N = 3)
a: 17867 (SE +/- 131.79, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 58405 (SE +/- 150.19, N = 3)
d: 58299 (SE +/- 21.83, N = 3)
c: 58256 (SE +/- 17.34, N = 3)
b: 58253 (SE +/- 46.92, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
d: 190310 (SE +/- 479.86, N = 3)
c: 189944 (SE +/- 1666.00, N = 3)
b: 186082 (SE +/- 1095.80, N = 3)
a: 185774 (SE +/- 1557.78, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
c: 45071 (SE +/- 475.72, N = 3)
d: 45007 (SE +/- 571.37, N = 3)
a: 44489 (SE +/- 479.16, N = 15)
b: 43731 (SE +/- 441.36, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

Benchmark Score, More Is Better (VkFFT 1.3.4)
d: 21320 (SE +/- 152.14, N = 15)
c: 21094 (SE +/- 282.81, N = 3)
b: 21000 (SE +/- 195.51, N = 3)
a: 20810 (SE +/- 188.78, N = 3)
Compiler: (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

Benchmark Score, More Is Better (VkFFT 1.3.4)
a: 194497 (SE +/- 2261.14, N = 3)
d: 192507 (SE +/- 720.67, N = 3)
c: 190909 (SE +/- 583.76, N = 3)
b: 190037 (SE +/- 521.00, N = 3)
Compiler: (CXX) g++ options: -O3

cl-mem

Benchmark: Copy

GB/s, More Is Better (cl-mem 2017-01-13)
c: 308.6 (SE +/- 0.07, N = 3)
a: 308.6 (SE +/- 0.03, N = 3)
d: 308.5 (SE +/- 0.03, N = 3)
b: 308.5 (SE +/- 0.12, N = 3)
Compiler: (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Read

GB/s, More Is Better (cl-mem 2017-01-13)
c: 1046.1 (SE +/- 0.03, N = 3)
d: 1046.0 (SE +/- 0.03, N = 3)
b: 1045.9 (SE +/- 0.20, N = 3)
a: 1045.9 (SE +/- 0.00, N = 3)
Compiler: (CC) gcc options: -O2 -flto -lOpenCL

cl-mem

Benchmark: Write

GB/s, More Is Better (cl-mem 2017-01-13)
c: 2354.9 (SE +/- 0.99, N = 3)
a: 2354.9 (SE +/- 1.31, N = 3)
b: 2353.4 (SE +/- 0.88, N = 3)
d: 2352.1 (SE +/- 3.80, N = 3)
Compiler: (CC) gcc options: -O2 -flto -lOpenCL

VkResample

Upscale: 2x - Precision: Double

ms, Fewer Is Better (VkResample 1.0)
c: 24.29 (SE +/- 0.00, N = 3)
b: 24.29 (SE +/- 0.00, N = 3)
a: 24.30 (SE +/- 0.00, N = 3)
d: 24.30 (SE +/- 0.00, N = 3)
Compiler: (CXX) g++ options: -O3

VkResample

Upscale: 2x - Precision: Single

ms, Fewer Is Better (VkResample 1.0)
a: 5.230 (SE +/- 0.000, N = 3)
c: 5.230 (SE +/- 0.001, N = 3)
d: 5.230 (SE +/- 0.002, N = 3)
b: 5.231 (SE +/- 0.001, N = 3)
Compiler: (CXX) g++ options: -O3

clpeak

OpenCL Test: Integer Compute INT

GIOPS, More Is Better (clpeak 1.1.2)
c: 33146.12 (SE +/- 0.26, N = 3)
b: 33144.74 (SE +/- 0.09, N = 3)
d: 33129.34 (SE +/- 8.15, N = 3)
a: 33119.10 (SE +/- 2.54, N = 3)
Compiler: (CXX) g++ options: -O3

clpeak

OpenCL Test: Single-Precision Float

GFLOPS, More Is Better (clpeak 1.1.2)
b: 64547.74 (SE +/- 0.85, N = 3)
c: 64547.25 (SE +/- 0.56, N = 3)
a: 64545.62 (SE +/- 0.43, N = 3)
d: 64520.97 (SE +/- 0.91, N = 3)
Compiler: (CXX) g++ options: -O3

clpeak

OpenCL Test: Double-Precision Double

GFLOPS, More Is Better (clpeak 1.1.2)
b: 32961.21 (SE +/- 0.74, N = 3)
a: 32959.17 (SE +/- 0.74, N = 3)
c: 32941.99 (SE +/- 18.62, N = 3)
d: 32933.63 (SE +/- 1.51, N = 3)
Compiler: (CXX) g++ options: -O3

clpeak

OpenCL Test: Global Memory Bandwidth

GBPS, More Is Better (clpeak 1.1.2)
d: 3484.32 (SE +/- 0.04, N = 3)
b: 3484.06 (SE +/- 0.20, N = 3)
a: 3483.99 (SE +/- 0.33, N = 3)
c: 3483.95 (SE +/- 0.27, N = 3)
Compiler: (CXX) g++ options: -O3

ArrayFire

Test: Conjugate Gradient OpenCL

ms, Fewer Is Better (ArrayFire 3.9)
b: 2.983 (SE +/- 0.005, N = 3)
a: 2.997 (SE +/- 0.003, N = 3)
d: 2.997 (SE +/- 0.003, N = 3)
c: 2.998 (SE +/- 0.005, N = 3)
Compiler: (CXX) g++ options: -O3

FinanceBench

Benchmark: Black-Scholes OpenCL

ms, Fewer Is Better (FinanceBench 2016-07-25)
d: 4.339 (SE +/- 0.016, N = 3)
a: 4.347 (SE +/- 0.010, N = 3)
c: 4.351 (SE +/- 0.010, N = 3)
b: 4.373 (SE +/- 0.004, N = 3)
Compiler: (CXX) g++ options: -O3 -march=native -fopenmp

ViennaCL

Test: CPU BLAS - sCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 2920 (SE +/- 20.00, N = 3)
c: 2907 (SE +/- 3.33, N = 3)
b: 2892 (SE +/- 29.56, N = 5)
d: 2857 (SE +/- 23.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 3943 (SE +/- 14.53, N = 3)
b: 3924 (SE +/- 9.80, N = 5)
d: 3920 (SE +/- 15.28, N = 3)
c: 3917 (SE +/- 16.67, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - sDOT

GB/s, More Is Better (ViennaCL 1.7.1)
a: 667 (SE +/- 4.18, N = 3)
c: 666 (SE +/- 4.33, N = 3)
b: 664 (SE +/- 5.77, N = 5)
d: 663 (SE +/- 3.51, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
a: 2027 (SE +/- 44.85, N = 3)
b: 1948 (SE +/- 17.15, N = 5)
c: 1920 (SE +/- 41.63, N = 3)
d: 1917 (SE +/- 14.53, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
c: 1837 (SE +/- 3.33, N = 3)
d: 1830 (SE +/- 10.00, N = 3)
b: 1806 (SE +/- 29.93, N = 5)
a: 1803 (SE +/- 23.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dDOT

GB/s, More Is Better (ViennaCL 1.7.1)
d: 1247 (SE +/- 3.33, N = 3)
a: 1247 (SE +/- 3.33, N = 3)
c: 1243 (SE +/- 3.33, N = 3)
b: 1238 (SE +/- 2.00, N = 5)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-N

GB/s, More Is Better (ViennaCL 1.7.1)
d: 418 (SE +/- 8.51, N = 3)
a: 411 (SE +/- 0.33, N = 3)
b: 408 (SE +/- 2.90, N = 5)
c: 405 (SE +/- 1.86, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMV-T

GB/s, More Is Better (ViennaCL 1.7.1)
b: 699 (SE +/- 3.65, N = 5)
d: 696 (SE +/- 10.68, N = 3)
c: 691 (SE +/- 1.20, N = 3)
a: 686 (SE +/- 17.19, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
d: 141 (SE +/- 2.65, N = 3)
c: 139 (SE +/- 2.67, N = 3)
b: 137 (SE +/- 1.29, N = 5)
a: 135 (SE +/- 0.88, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
b: 125 (SE +/- 0.77, N = 5)
a: 125 (SE +/- 0.88, N = 3)
d: 124 (SE +/- 0.33, N = 3)
c: 124 (SE +/- 0.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
c: 141 (SE +/- 0.58, N = 3)
a: 141 (SE +/- 1.20, N = 3)
d: 140 (SE +/- 0.58, N = 3)
b: 140 (SE +/- 0.93, N = 5)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: CPU BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
c: 140 (SE +/- 4.73, N = 3)
b: 138 (SE +/- 2.18, N = 5)
a: 137 (SE +/- 1.53, N = 3)
d: 136 (SE +/- 1.20, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
d: 316 (SE +/- 0.33, N = 3)
c: 316 (SE +/- 0.33, N = 3)
b: 316 (SE +/- 0.33, N = 3)
a: 316 (SE +/- 0.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
c: 427 (SE +/- 2.33, N = 3)
b: 427 (SE +/- 1.33, N = 3)
d: 426 (SE +/- 3.93, N = 3)
a: 420 (SE +/- 2.60, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - sDOT

GB/s, More Is Better (ViennaCL 1.7.1)
d: 283 (SE +/- 0.67, N = 3)
c: 283 (SE +/- 0.33, N = 3)
b: 282 (SE +/- 0.33, N = 3)
a: 282 (SE +/- 0.00, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dCOPY

GB/s, More Is Better (ViennaCL 1.7.1)
d: 604 (SE +/- 0.58, N = 3)
c: 604 (SE +/- 0.88, N = 3)
b: 604 (SE +/- 0.58, N = 3)
a: 603 (SE +/- 0.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dAXPY

GB/s, More Is Better (ViennaCL 1.7.1)
d: 799 (SE +/- 0.88, N = 3)
c: 799 (SE +/- 1.20, N = 3)
a: 799 (SE +/- 0.33, N = 3)
b: 798 (SE +/- 0.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dDOT

GB/s, More Is Better (ViennaCL 1.7.1)
d: 553 (SE +/- 1.00, N = 3)
c: 552 (SE +/- 1.15, N = 3)
b: 552 (SE +/- 0.33, N = 3)
a: 550 (SE +/- 0.88, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-N

GB/s, More Is Better (ViennaCL 1.7.1)
b: 81.5 (SE +/- 0.12, N = 3)
d: 81.4 (SE +/- 0.21, N = 3)
c: 81.2 (SE +/- 0.26, N = 3)
a: 81.2 (SE +/- 0.13, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMV-T

GB/s, More Is Better (ViennaCL 1.7.1)
c: 308 (SE +/- 0.33, N = 3)
b: 308 (SE +/- 0.00, N = 3)
a: 308 (SE +/- 0.00, N = 3)
d: 307 (SE +/- 0.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
b: 7093 (SE +/- 58.97, N = 3)
a: 7057 (SE +/- 31.80, N = 3)
d: 7053 (SE +/- 35.28, N = 3)
c: 7037 (SE +/- 31.80, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-NT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
d: 7540 (SE +/- 0.00, N = 3)
c: 7537 (SE +/- 3.33, N = 3)
b: 7537 (SE +/- 8.82, N = 3)
a: 7527 (SE +/- 3.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TN

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
d: 7070 (SE +/- 45.09, N = 3)
b: 7067 (SE +/- 46.31, N = 3)
a: 7027 (SE +/- 17.64, N = 3)
c: 7000 (SE +/- 15.28, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL

Test: OpenCL BLAS - dGEMM-TT

GFLOPs/s, More Is Better (ViennaCL 1.7.1)
d: 7070 (SE +/- 0.00, N = 3)
b: 7070 (SE +/- 11.55, N = 3)
a: 7070 (SE +/- 0.00, N = 3)
c: 7057 (SE +/- 3.33, N = 3)
Compiler: (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

NCNN

Target: Vulkan GPU - Model: mobilenet

ms, Fewer Is Better (NCNN 20230517)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v2

ms, Fewer Is Better (NCNN 20230517)
c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenet-v3

ms, Fewer Is Better (NCNN 20230517)
a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: shufflenet-v2

ms, Fewer Is Better (NCNN 20230517)
c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mnasnet

ms, Fewer Is Better (NCNN 20230517)
b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: efficientnet-b0

ms, Fewer Is Better (NCNN 20230517)
a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: blazeface

ms, Fewer Is Better (NCNN 20230517)
c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: googlenet

ms, Fewer Is Better (NCNN 20230517)
a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vgg16

ms, Fewer Is Better (NCNN 20230517)
b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet18

ms, Fewer Is Better (NCNN 20230517)
a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: alexnet

ms, Fewer Is Better (NCNN 20230517)
d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: resnet50

ms, Fewer Is Better (NCNN 20230517)
a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: mobilenetv2-yolov3

ms, Fewer Is Better (NCNN 20230517)
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: yolov4-tiny

ms, Fewer Is Better (NCNN 20230517)
a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: squeezenet_ssd

ms, Fewer Is Better (NCNN 20230517)
a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: regnety_400m

ms, Fewer Is Better (NCNN 20230517)
b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: vision_transformer

ms, Fewer Is Better (NCNN 20230517)
d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN

Target: Vulkan GPU - Model: FastestDet

ms, Fewer Is Better (NCNN 20230517)
c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
Compiler: (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread


Phoronix Test Suite v10.8.4