ngc smoke run

ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&rdt
System Details
Runs: a, b, c, d

Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v

Processor Details
- Scaling Governor: cppc_cpufreq performance (Boost: Disabled)

Graphics Details
- BAR1 / Visible vRAM Size: N/A
- vBIOS Version: 96.00.7e.00.02

Python Details
- Python 3.10.12

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of __user pointer sanitization
- spectre_v2: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
Result Overview

Test | a | b | c | d
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
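Since runs a through d were taken on the same system, one quick way to read the overview is to look at how far the four runs spread for each test. The sketch below is only an illustration: the helper name `spread_percent` is hypothetical, the four tests are an arbitrary sample, and the values are copied from the overview above.

```python
from statistics import mean

# Values copied from the overview above, in run order a, b, c, d.
results = {
    "vkfft: FFT + iFFT R2C / C2R": [42397, 41809, 42581, 43048],
    "cl-mem: Copy": [308.6, 308.5, 308.6, 308.5],
    "viennacl: CPU BLAS - dCOPY": [2027, 1948, 1920, 1917],
    "ncnn: Vulkan GPU - vision_transformer": [31.52, 32.32, 31.92, 31.13],
}


def spread_percent(values):
    """Return (max - min) / mean of the runs, as a percentage."""
    return (max(values) - min(values)) / mean(values) * 100.0


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.2f}% spread across runs")
```

A small spread suggests the runs are effectively repeats of each other for that test; a larger one may simply reflect run-to-run noise and is worth checking against the reported standard errors before drawing conclusions.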
VkFFT 1.3.4, Test: FFT + iFFT R2C / C2R
Benchmark Score, More Is Better
a: 42397 (SE +/- 298.99, N = 3)
b: 41809 (SE +/- 460.34, N = 3)
c: 42581 (SE +/- 552.29, N = 3)
d: 43048 (SE +/- 289.59, N = 15)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, More Is Better
a: 151912 (SE +/- 190.55, N = 3)
b: 151910 (SE +/- 506.81, N = 3)
c: 152866 (SE +/- 136.79, N = 3)
d: 151969 (SE +/- 377.55, N = 3)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score, More Is Better
a: 17867 (SE +/- 131.79, N = 3)
b: 17967 (SE +/- 196.01, N = 5)
c: 17886 (SE +/- 147.24, N = 3)
d: 17942 (SE +/- 168.45, N = 7)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score, More Is Better
a: 58405 (SE +/- 150.19, N = 3)
b: 58253 (SE +/- 46.92, N = 3)
c: 58256 (SE +/- 17.34, N = 3)
d: 58299 (SE +/- 21.83, N = 3)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score, More Is Better
a: 185774 (SE +/- 1557.78, N = 3)
b: 186082 (SE +/- 1095.80, N = 3)
c: 189944 (SE +/- 1666.00, N = 3)
d: 190310 (SE +/- 479.86, N = 3)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score, More Is Better
a: 44489 (SE +/- 479.16, N = 15)
b: 43731 (SE +/- 441.36, N = 3)
c: 45071 (SE +/- 475.72, N = 3)
d: 45007 (SE +/- 571.37, N = 3)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score, More Is Better
a: 20810 (SE +/- 188.78, N = 3)
b: 21000 (SE +/- 195.51, N = 3)
c: 21094 (SE +/- 282.81, N = 3)
d: 21320 (SE +/- 152.14, N = 15)
1. (CXX) g++ options: -O3

VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score, More Is Better
a: 194497 (SE +/- 2261.14, N = 3)
b: 190037 (SE +/- 521.00, N = 3)
c: 190909 (SE +/- 583.76, N = 3)
d: 192507 (SE +/- 720.67, N = 3)
1. (CXX) g++ options: -O3
cl-mem 2017-01-13, Benchmark: Copy
GB/s, More Is Better
a: 308.6 (SE +/- 0.03, N = 3)
b: 308.5 (SE +/- 0.12, N = 3)
c: 308.6 (SE +/- 0.07, N = 3)
d: 308.5 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem 2017-01-13, Benchmark: Read
GB/s, More Is Better
a: 1045.9 (SE +/- 0.00, N = 3)
b: 1045.9 (SE +/- 0.20, N = 3)
c: 1046.1 (SE +/- 0.03, N = 3)
d: 1046.0 (SE +/- 0.03, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem 2017-01-13, Benchmark: Write
GB/s, More Is Better
a: 2354.9 (SE +/- 1.31, N = 3)
b: 2353.4 (SE +/- 0.88, N = 3)
c: 2354.9 (SE +/- 0.99, N = 3)
d: 2352.1 (SE +/- 3.80, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL
VkResample 1.0, Upscale: 2x - Precision: Double
ms, Fewer Is Better
a: 24.30 (SE +/- 0.00, N = 3)
b: 24.29 (SE +/- 0.00, N = 3)
c: 24.29 (SE +/- 0.00, N = 3)
d: 24.30 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3

VkResample 1.0, Upscale: 2x - Precision: Single
ms, Fewer Is Better
a: 5.230 (SE +/- 0.000, N = 3)
b: 5.231 (SE +/- 0.001, N = 3)
c: 5.230 (SE +/- 0.001, N = 3)
d: 5.230 (SE +/- 0.002, N = 3)
1. (CXX) g++ options: -O3
clpeak 1.1.2, OpenCL Test: Integer Compute INT
GIOPS, More Is Better
a: 33119.10 (SE +/- 2.54, N = 3)
b: 33144.74 (SE +/- 0.09, N = 3)
c: 33146.12 (SE +/- 0.26, N = 3)
d: 33129.34 (SE +/- 8.15, N = 3)
1. (CXX) g++ options: -O3

clpeak 1.1.2, OpenCL Test: Single-Precision Float
GFLOPS, More Is Better
a: 64545.62 (SE +/- 0.43, N = 3)
b: 64547.74 (SE +/- 0.85, N = 3)
c: 64547.25 (SE +/- 0.56, N = 3)
d: 64520.97 (SE +/- 0.91, N = 3)
1. (CXX) g++ options: -O3

clpeak 1.1.2, OpenCL Test: Double-Precision Double
GFLOPS, More Is Better
a: 32959.17 (SE +/- 0.74, N = 3)
b: 32961.21 (SE +/- 0.74, N = 3)
c: 32941.99 (SE +/- 18.62, N = 3)
d: 32933.63 (SE +/- 1.51, N = 3)
1. (CXX) g++ options: -O3

clpeak 1.1.2, OpenCL Test: Global Memory Bandwidth
GBPS, More Is Better
a: 3483.99 (SE +/- 0.33, N = 3)
b: 3484.06 (SE +/- 0.20, N = 3)
c: 3483.95 (SE +/- 0.27, N = 3)
d: 3484.32 (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3
ArrayFire 3.9, Test: Conjugate Gradient OpenCL
ms, Fewer Is Better
a: 2.997 (SE +/- 0.003, N = 3)
b: 2.983 (SE +/- 0.005, N = 3)
c: 2.998 (SE +/- 0.005, N = 3)
d: 2.997 (SE +/- 0.003, N = 3)
1. (CXX) g++ options: -O3

FinanceBench 2016-07-25, Benchmark: Black-Scholes OpenCL
ms, Fewer Is Better
a: 4.347 (SE +/- 0.010, N = 3)
b: 4.373 (SE +/- 0.004, N = 3)
c: 4.351 (SE +/- 0.010, N = 3)
d: 4.339 (SE +/- 0.016, N = 3)
1. (CXX) g++ options: -O3 -march=native -fopenmp
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY
GB/s, More Is Better
a: 2920 (SE +/- 20.00, N = 3)
b: 2892 (SE +/- 29.56, N = 5)
c: 2907 (SE +/- 3.33, N = 3)
d: 2857 (SE +/- 23.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - sAXPY
GB/s, More Is Better
a: 3943 (SE +/- 14.53, N = 3)
b: 3924 (SE +/- 9.80, N = 5)
c: 3917 (SE +/- 16.67, N = 3)
d: 3920 (SE +/- 15.28, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - sDOT
GB/s, More Is Better
a: 667 (SE +/- 4.18, N = 3)
b: 664 (SE +/- 5.77, N = 5)
c: 666 (SE +/- 4.33, N = 3)
d: 663 (SE +/- 3.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dCOPY
GB/s, More Is Better
a: 2027 (SE +/- 44.85, N = 3)
b: 1948 (SE +/- 17.15, N = 5)
c: 1920 (SE +/- 41.63, N = 3)
d: 1917 (SE +/- 14.53, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dAXPY
GB/s, More Is Better
a: 1803 (SE +/- 23.33, N = 3)
b: 1806 (SE +/- 29.93, N = 5)
c: 1837 (SE +/- 3.33, N = 3)
d: 1830 (SE +/- 10.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dDOT
GB/s, More Is Better
a: 1247 (SE +/- 3.33, N = 3)
b: 1238 (SE +/- 2.00, N = 5)
c: 1243 (SE +/- 3.33, N = 3)
d: 1247 (SE +/- 3.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N
GB/s, More Is Better
a: 411 (SE +/- 0.33, N = 3)
b: 408 (SE +/- 2.90, N = 5)
c: 405 (SE +/- 1.86, N = 3)
d: 418 (SE +/- 8.51, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T
GB/s, More Is Better
a: 686 (SE +/- 17.19, N = 3)
b: 699 (SE +/- 3.65, N = 5)
c: 691 (SE +/- 1.20, N = 3)
d: 696 (SE +/- 10.68, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN
GFLOPs/s, More Is Better
a: 135 (SE +/- 0.88, N = 3)
b: 137 (SE +/- 1.29, N = 5)
c: 139 (SE +/- 2.67, N = 3)
d: 141 (SE +/- 2.65, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT
GFLOPs/s, More Is Better
a: 125 (SE +/- 0.88, N = 3)
b: 125 (SE +/- 0.77, N = 5)
c: 124 (SE +/- 0.33, N = 3)
d: 124 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN
GFLOPs/s, More Is Better
a: 141 (SE +/- 1.20, N = 3)
b: 140 (SE +/- 0.93, N = 5)
c: 141 (SE +/- 0.58, N = 3)
d: 140 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT
GFLOPs/s, More Is Better
a: 137 (SE +/- 1.53, N = 3)
b: 138 (SE +/- 2.18, N = 5)
c: 140 (SE +/- 4.73, N = 3)
d: 136 (SE +/- 1.20, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: OpenCL BLAS - sCOPY
GB/s, More Is Better
a: 316 (SE +/- 0.33, N = 3)
b: 316 (SE +/- 0.33, N = 3)
c: 316 (SE +/- 0.33, N = 3)
d: 316 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - sAXPY
GB/s, More Is Better
a: 420 (SE +/- 2.60, N = 3)
b: 427 (SE +/- 1.33, N = 3)
c: 427 (SE +/- 2.33, N = 3)
d: 426 (SE +/- 3.93, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - sDOT
GB/s, More Is Better
a: 282 (SE +/- 0.00, N = 3)
b: 282 (SE +/- 0.33, N = 3)
c: 283 (SE +/- 0.33, N = 3)
d: 283 (SE +/- 0.67, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dCOPY
GB/s, More Is Better
a: 603 (SE +/- 0.33, N = 3)
b: 604 (SE +/- 0.58, N = 3)
c: 604 (SE +/- 0.88, N = 3)
d: 604 (SE +/- 0.58, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dAXPY
GB/s, More Is Better
a: 799 (SE +/- 0.33, N = 3)
b: 798 (SE +/- 0.33, N = 3)
c: 799 (SE +/- 1.20, N = 3)
d: 799 (SE +/- 0.88, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dDOT
GB/s, More Is Better
a: 550 (SE +/- 0.88, N = 3)
b: 552 (SE +/- 0.33, N = 3)
c: 552 (SE +/- 1.15, N = 3)
d: 553 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-N
GB/s, More Is Better
a: 81.2 (SE +/- 0.13, N = 3)
b: 81.5 (SE +/- 0.12, N = 3)
c: 81.2 (SE +/- 0.26, N = 3)
d: 81.4 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMV-T
GB/s, More Is Better
a: 308 (SE +/- 0.00, N = 3)
b: 308 (SE +/- 0.00, N = 3)
c: 308 (SE +/- 0.33, N = 3)
d: 307 (SE +/- 0.33, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NN
GFLOPs/s, More Is Better
a: 7057 (SE +/- 31.80, N = 3)
b: 7093 (SE +/- 58.97, N = 3)
c: 7037 (SE +/- 31.80, N = 3)
d: 7053 (SE +/- 35.28, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-NT
GFLOPs/s, More Is Better
a: 7527 (SE +/- 3.33, N = 3)
b: 7537 (SE +/- 8.82, N = 3)
c: 7537 (SE +/- 3.33, N = 3)
d: 7540 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TN
GFLOPs/s, More Is Better
a: 7027 (SE +/- 17.64, N = 3)
b: 7067 (SE +/- 46.31, N = 3)
c: 7000 (SE +/- 15.28, N = 3)
d: 7070 (SE +/- 45.09, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL

ViennaCL 1.7.1, Test: OpenCL BLAS - dGEMM-TT
GFLOPs/s, More Is Better
a: 7070 (SE +/- 0.00, N = 3)
b: 7070 (SE +/- 11.55, N = 3)
c: 7057 (SE +/- 3.33, N = 3)
d: 7070 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet
ms, Fewer Is Better
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
ms, Fewer Is Better
a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
ms, Fewer Is Better
a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: shufflenet-v2
ms, Fewer Is Better
a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: mnasnet
ms, Fewer Is Better
a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: efficientnet-b0
ms, Fewer Is Better
a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: blazeface
ms, Fewer Is Better
a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: googlenet
ms, Fewer Is Better
a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: vgg16
ms, Fewer Is Better
a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: resnet18
ms, Fewer Is Better
a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: alexnet
ms, Fewer Is Better
a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: resnet50
ms, Fewer Is Better
a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
ms, Fewer Is Better
a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: yolov4-tiny
ms, Fewer Is Better
a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: squeezenet_ssd
ms, Fewer Is Better
a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m
ms, Fewer Is Better
a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer
ms, Fewer Is Better
a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

NCNN 20230517, Target: Vulkan GPU - Model: FastestDet
ms, Fewer Is Better
a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
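Each result above carries a standard error (SE) and a sample count (N). A rough way to judge whether two runs really differ is to express the gap between their means in units of the combined standard error, sqrt(SE_x^2 + SE_y^2). The sketch below assumes the two runs are independent; the helper name `se_gap` is hypothetical, and the inputs are copied from the VkFFT R2C / C2R and ViennaCL CPU BLAS dCOPY results. With N as small as 3 this is a heuristic, not a formal significance test.

```python
from math import sqrt


def se_gap(mean_x, se_x, mean_y, se_y):
    """Gap between two means in units of their combined standard error.

    Assumes the two runs are independent, so their variances add.
    """
    return abs(mean_x - mean_y) / sqrt(se_x ** 2 + se_y ** 2)


# (mean, SE) pairs copied from the results above.
comparisons = {
    "VkFFT R2C / C2R, run a vs d": (42397, 298.99, 43048, 289.59),
    "ViennaCL CPU BLAS dCOPY, run a vs d": (2027, 44.85, 1917, 14.53),
}

for label, args in comparisons.items():
    print(f"{label}: {se_gap(*args):.2f} combined SEs apart")
```

As a common rule of thumb, gaps well under about two combined standard errors are hard to distinguish from run-to-run noise.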
Phoronix Test Suite v10.8.5