ngc smoke run
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&sor&grs .
System configuration for runs a, b, c, d:
Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
Memory: 1 x 480GB DRAM-6400MT/s
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
Graphics: NVIDIA GH200 480GB
Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
OS: Ubuntu 22.04
Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.4.89
Compiler: GCC 11.4.0 + CUDA 11.5
File-System: ext4
Screen Resolution: 1920x1200
Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details: Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details: BAR1 / Visible vRAM Size: N/A - vBIOS Version: 96.00.7e.00.02
Python Details: Python 3.10.12
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Test | a | b | c | d
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
ViennaCL 1.7.1 Test: CPU BLAS - dCOPY (GB/s, More Is Better): a: 2027 (SE +/- 44.85, N = 3); b: 1948 (SE +/- 17.15, N = 5); c: 1920 (SE +/- 41.63, N = 3); d: 1917 (SE +/- 14.53, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
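The SE and N figures reported with each result can be used to judge whether the gap between two runs is larger than the run-to-run noise. The following is a minimal sketch in Python, not part of the Phoronix Test Suite output: the helper names `spread_percent` and `gap_in_standard_errors` are hypothetical, the numbers are copied from the ViennaCL CPU BLAS dCOPY result above, and combining the two standard errors in quadrature assumes the runs are independent.

```python
import math

# Values copied from the ViennaCL "CPU BLAS - dCOPY" result (GB/s, more is better).
# Each entry maps a run label to (mean, standard error, sample count).
dcopy = {
    "a": (2027.0, 44.85, 3),
    "b": (1948.0, 17.15, 5),
    "c": (1920.0, 41.63, 3),
    "d": (1917.0, 14.53, 3),
}


def spread_percent(results):
    """Gap between the best and worst mean, as a percentage of the worst."""
    means = [mean for mean, _se, _n in results.values()]
    return (max(means) - min(means)) / min(means) * 100.0


def gap_in_standard_errors(results, x, y):
    """Difference of two means divided by their combined standard error.

    Assumption: the two runs are independent, so their standard errors
    are combined in quadrature.
    """
    mean_x, se_x, _ = results[x]
    mean_y, se_y, _ = results[y]
    return abs(mean_x - mean_y) / math.sqrt(se_x ** 2 + se_y ** 2)


if __name__ == "__main__":
    print(f"spread between best and worst run: {spread_percent(dcopy):.1f}%")
    print(f"a vs d: {gap_in_standard_errors(dcopy, 'a', 'd'):.1f} combined standard errors")
```

With only three to five samples per run, a gap of a few combined standard errors is weak evidence of a real difference, which fits runs a to d sharing one system configuration.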
ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better): d: 141 (SE +/- 2.65, N = 3); c: 139 (SE +/- 2.67, N = 3); b: 137 (SE +/- 1.29, N = 5); a: 135 (SE +/- 0.88, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517 Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better): d: 31.13 (SE +/- 0.08, N = 3, MIN: 30.31 / MAX: 64.21); a: 31.52 (SE +/- 0.28, N = 3, MIN: 30.23 / MAX: 62.77); c: 31.92 (SE +/- 0.60, N = 3, MIN: 30.27 / MAX: 57.34); b: 32.32 (SE +/- 0.98, N = 3, MIN: 30.14 / MAX: 67.5). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better): b: 14.74 (SE +/- 0.15, N = 3, MIN: 14 / MAX: 20.37); c: 14.77 (SE +/- 0.24, N = 3, MIN: 13.51 / MAX: 18.11); a: 14.78 (SE +/- 0.13, N = 3, MIN: 13.74 / MAX: 17.76); d: 15.22 (SE +/- 0.19, N = 3, MIN: 14.15 / MAX: 21.54). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: CPU BLAS - dGEMV-N (GB/s, More Is Better): d: 418 (SE +/- 8.51, N = 3); a: 411 (SE +/- 0.33, N = 3); b: 408 (SE +/- 2.90, N = 5); c: 405 (SE +/- 1.86, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkFFT 1.3.4 Test: FFT + iFFT C2C multidimensional in single precision (Benchmark Score, More Is Better): c: 45071 (SE +/- 475.72, N = 3); d: 45007 (SE +/- 571.37, N = 3); a: 44489 (SE +/- 479.16, N = 15); b: 43731 (SE +/- 441.36, N = 3). (CXX) g++ options: -O3
VkFFT 1.3.4 Test: FFT + iFFT R2C / C2R (Benchmark Score, More Is Better): d: 43048 (SE +/- 289.59, N = 15); c: 42581 (SE +/- 552.29, N = 3); a: 42397 (SE +/- 298.99, N = 3); b: 41809 (SE +/- 460.34, N = 3). (CXX) g++ options: -O3
ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better): c: 140 (SE +/- 4.73, N = 3); b: 138 (SE +/- 2.18, N = 5); a: 137 (SE +/- 1.53, N = 3); d: 136 (SE +/- 1.20, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkFFT 1.3.4 Test: FFT + iFFT C2C Bluestein benchmark in double precision (Benchmark Score, More Is Better): d: 21320 (SE +/- 152.14, N = 15); c: 21094 (SE +/- 282.81, N = 3); b: 21000 (SE +/- 195.51, N = 3); a: 20810 (SE +/- 188.78, N = 3). (CXX) g++ options: -O3
VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in single precision (Benchmark Score, More Is Better): d: 190310 (SE +/- 479.86, N = 3); c: 189944 (SE +/- 1666.00, N = 3); b: 186082 (SE +/- 1095.80, N = 3); a: 185774 (SE +/- 1557.78, N = 3). (CXX) g++ options: -O3
VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling (Benchmark Score, More Is Better): a: 194497 (SE +/- 2261.14, N = 3); d: 192507 (SE +/- 720.67, N = 3); c: 190909 (SE +/- 583.76, N = 3); b: 190037 (SE +/- 521.00, N = 3). (CXX) g++ options: -O3
NCNN 20230517 Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better): c: 1.74 (SE +/- 0.03, N = 3, MIN: 1.6 / MAX: 2.93); a: 1.75 (SE +/- 0.01, N = 3, MIN: 1.68 / MAX: 3.09); d: 1.77 (SE +/- 0.02, N = 3, MIN: 1.64 / MAX: 3.01); b: 1.78 (SE +/- 0.02, N = 3, MIN: 1.67 / MAX: 7.1). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: CPU BLAS - sCOPY (GB/s, More Is Better): a: 2920 (SE +/- 20.00, N = 3); c: 2907 (SE +/- 3.33, N = 3); b: 2892 (SE +/- 29.56, N = 5); d: 2857 (SE +/- 23.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: CPU BLAS - dGEMV-T (GB/s, More Is Better): b: 699 (SE +/- 3.65, N = 5); d: 696 (SE +/- 10.68, N = 3); c: 691 (SE +/- 1.20, N = 3); a: 686 (SE +/- 17.19, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517 Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better): c: 2.12 (SE +/- 0.04, N = 3, MIN: 1.96 / MAX: 3.59); d: 2.12 (SE +/- 0.05, N = 3, MIN: 1.96 / MAX: 5.5); a: 2.13 (SE +/- 0.03, N = 3, MIN: 1.99 / MAX: 3.7); b: 2.16 (SE +/- 0.02, N = 3, MIN: 2.04 / MAX: 8.11). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: CPU BLAS - dAXPY (GB/s, More Is Better): c: 1837 (SE +/- 3.33, N = 3); d: 1830 (SE +/- 10.00, N = 3); b: 1806 (SE +/- 29.93, N = 5); a: 1803 (SE +/- 23.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517 Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better): d: 1.62 (SE +/- 0.00, N = 3, MIN: 1.5 / MAX: 4.84); a: 1.63 (SE +/- 0.02, N = 3, MIN: 1.49 / MAX: 2.81); b: 1.63 (SE +/- 0.02, N = 3, MIN: 1.5 / MAX: 2.94); c: 1.65 (SE +/- 0.04, N = 3, MIN: 1.44 / MAX: 4.74). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better): a: 2.16 (SE +/- 0.02, N = 3, MIN: 2.04 / MAX: 3.59); c: 2.17 (SE +/- 0.02, N = 3, MIN: 2.04 / MAX: 3.54); b: 2.18 (SE +/- 0.01, N = 3, MIN: 2.05 / MAX: 5.44); d: 2.20 (SE +/- 0.01, N = 3, MIN: 2.08 / MAX: 3.65). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better): a: 2.26 (SE +/- 0.03, N = 3, MIN: 2.11 / MAX: 3.89); b: 2.27 (SE +/- 0.01, N = 3, MIN: 2.16 / MAX: 5.3); d: 2.27 (SE +/- 0.04, N = 3, MIN: 2.11 / MAX: 4.59); c: 2.30 (SE +/- 0.05, N = 3, MIN: 2.1 / MAX: 3.74). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better): a: 3.49 (SE +/- 0.02, N = 3, MIN: 3.27 / MAX: 5.07); c: 3.52 (SE +/- 0.04, N = 3, MIN: 3.22 / MAX: 6.65); d: 3.53 (SE +/- 0.04, N = 3, MIN: 3.25 / MAX: 6.52); b: 3.55 (SE +/- 0.03, N = 3, MIN: 3.33 / MAX: 8.63). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better): a: 4.16 (SE +/- 0.02, N = 3, MIN: 3.99 / MAX: 5.76); d: 4.21 (SE +/- 0.01, N = 3, MIN: 4.03 / MAX: 5.75); b: 4.23 (SE +/- 0.05, N = 3, MIN: 4.01 / MAX: 6.82); c: 4.23 (SE +/- 0.06, N = 3, MIN: 4 / MAX: 6.41). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: OpenCL BLAS - sAXPY (GB/s, More Is Better): c: 427 (SE +/- 2.33, N = 3); b: 427 (SE +/- 1.33, N = 3); d: 426 (SE +/- 3.93, N = 3); a: 420 (SE +/- 2.60, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517 Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better): c: 3.08 (SE +/- 0.06, N = 3, MIN: 2.91 / MAX: 4.57); a: 3.09 (SE +/- 0.02, N = 3, MIN: 2.95 / MAX: 4.64); b: 3.10 (SE +/- 0.04, N = 3, MIN: 2.92 / MAX: 4.71); d: 3.12 (SE +/- 0.03, N = 3, MIN: 2.97 / MAX: 4.56). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better): a: 4.27 (SE +/- 0.02, N = 3, MIN: 4.05 / MAX: 6.65); b: 4.28 (SE +/- 0.03, N = 3, MIN: 4.05 / MAX: 7.65); c: 4.32 (SE +/- 0.04, N = 3, MIN: 4.05 / MAX: 8.1); d: 4.32 (SE +/- 0.02, N = 3, MIN: 4.1 / MAX: 7.58). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better): d: 7070 (SE +/- 45.09, N = 3); b: 7067 (SE +/- 46.31, N = 3); a: 7027 (SE +/- 17.64, N = 3); c: 7000 (SE +/- 15.28, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
NCNN 20230517 Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better): a: 5.43 (SE +/- 0.03, N = 3, MIN: 5.18 / MAX: 8.53); d: 5.43 (SE +/- 0.04, N = 3, MIN: 5.17 / MAX: 7.39); b: 5.47 (SE +/- 0.07, N = 3, MIN: 5.22 / MAX: 11.66); c: 5.48 (SE +/- 0.07, N = 3, MIN: 5.16 / MAX: 11.04). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better): c: 2.27 (SE +/- 0.02, N = 3, MIN: 2.13 / MAX: 5.55); d: 2.27 (SE +/- 0.04, N = 3, MIN: 2.1 / MAX: 5.57); a: 2.29 (SE +/- 0.01, N = 3, MIN: 2.13 / MAX: 3.94); b: 2.29 (SE +/- 0.02, N = 3, MIN: 2.12 / MAX: 3.59). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better): b: 125 (SE +/- 0.77, N = 5); a: 125 (SE +/- 0.88, N = 3); d: 124 (SE +/- 0.33, N = 3); c: 124 (SE +/- 0.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better): b: 7093 (SE +/- 58.97, N = 3); a: 7057 (SE +/- 31.80, N = 3); d: 7053 (SE +/- 35.28, N = 3); c: 7037 (SE +/- 31.80, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
FinanceBench 2016-07-25 Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better): d: 4.339 (SE +/- 0.016, N = 3); a: 4.347 (SE +/- 0.010, N = 3); c: 4.351 (SE +/- 0.010, N = 3); b: 4.373 (SE +/- 0.004, N = 3). (CXX) g++ options: -O3 -march=native -fopenmp
ViennaCL 1.7.1 Test: CPU BLAS - dDOT (GB/s, More Is Better): d: 1247 (SE +/- 3.33, N = 3); a: 1247 (SE +/- 3.33, N = 3); c: 1243 (SE +/- 3.33, N = 3); b: 1238 (SE +/- 2.00, N = 5). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better): c: 141 (SE +/- 0.58, N = 3); a: 141 (SE +/- 1.20, N = 3); d: 140 (SE +/- 0.58, N = 3); b: 140 (SE +/- 0.93, N = 5). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: CPU BLAS - sAXPY (GB/s, More Is Better): a: 3943 (SE +/- 14.53, N = 3); b: 3924 (SE +/- 9.80, N = 5); d: 3920 (SE +/- 15.28, N = 3); c: 3917 (SE +/- 16.67, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in half precision (Benchmark Score, More Is Better): c: 152866 (SE +/- 136.79, N = 3); d: 151969 (SE +/- 377.55, N = 3); a: 151912 (SE +/- 190.55, N = 3); b: 151910 (SE +/- 506.81, N = 3). (CXX) g++ options: -O3
NCNN 20230517 Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, Fewer Is Better): a: 4.89 (SE +/- 0.02, N = 3, MIN: 4.76 / MAX: 7.99); d: 4.91 (SE +/- 0.01, N = 3, MIN: 4.79 / MAX: 6.53); b: 4.92 (SE +/- 0.03, N = 3, MIN: 4.77 / MAX: 7.57); c: 4.92 (SE +/- 0.03, N = 3, MIN: 4.74 / MAX: 6.99). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better): a: 4.89 (SE +/- 0.02, N = 3, MIN: 4.76 / MAX: 7.99); d: 4.91 (SE +/- 0.01, N = 3, MIN: 4.79 / MAX: 6.53); b: 4.92 (SE +/- 0.03, N = 3, MIN: 4.77 / MAX: 7.57); c: 4.92 (SE +/- 0.03, N = 3, MIN: 4.74 / MAX: 6.99). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: CPU BLAS - sDOT (GB/s, More Is Better): a: 667 (SE +/- 4.18, N = 3); c: 666 (SE +/- 4.33, N = 3); b: 664 (SE +/- 5.77, N = 5); d: 663 (SE +/- 3.51, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkFFT 1.3.4 Test: FFT + iFFT C2C Bluestein in single precision (Benchmark Score, More Is Better): b: 17967 (SE +/- 196.01, N = 5); d: 17942 (SE +/- 168.45, N = 7); c: 17886 (SE +/- 147.24, N = 3); a: 17867 (SE +/- 131.79, N = 3). (CXX) g++ options: -O3
ViennaCL 1.7.1 Test: OpenCL BLAS - dDOT (GB/s, More Is Better): d: 553 (SE +/- 1.00, N = 3); c: 552 (SE +/- 1.15, N = 3); b: 552 (SE +/- 0.33, N = 3); a: 550 (SE +/- 0.88, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ArrayFire 3.9 Test: Conjugate Gradient OpenCL (ms, Fewer Is Better): b: 2.983 (SE +/- 0.005, N = 3); a: 2.997 (SE +/- 0.003, N = 3); d: 2.997 (SE +/- 0.003, N = 3); c: 2.998 (SE +/- 0.005, N = 3). (CXX) g++ options: -O3
NCNN 20230517 Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better): b: 2.03 (SE +/- 0.01, N = 3, MIN: 1.94 / MAX: 3.48); a: 2.04 (SE +/- 0.02, N = 3, MIN: 1.89 / MAX: 3.56); c: 2.04 (SE +/- 0.04, N = 3, MIN: 1.86 / MAX: 3.85); d: 2.04 (SE +/- 0.03, N = 3, MIN: 1.87 / MAX: 6.48). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better): a: 6.79 (SE +/- 0.01, N = 3, MIN: 6.66 / MAX: 8.49); b: 6.80 (SE +/- 0.03, N = 3, MIN: 6.64 / MAX: 8.25); c: 6.81 (SE +/- 0.03, N = 3, MIN: 6.42 / MAX: 12.5); d: 6.82 (SE +/- 0.02, N = 3, MIN: 6.69 / MAX: 8.1). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better): b: 81.5 (SE +/- 0.12, N = 3); d: 81.4 (SE +/- 0.21, N = 3); c: 81.2 (SE +/- 0.26, N = 3); a: 81.2 (SE +/- 0.13, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - sDOT (GB/s, More Is Better): d: 283 (SE +/- 0.67, N = 3); c: 283 (SE +/- 0.33, N = 3); b: 282 (SE +/- 0.33, N = 3); a: 282 (SE +/- 0.00, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better): c: 308 (SE +/- 0.33, N = 3); b: 308 (SE +/- 0.00, N = 3); a: 308 (SE +/- 0.00, N = 3); d: 307 (SE +/- 0.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in double precision (Benchmark Score, More Is Better): a: 58405 (SE +/- 150.19, N = 3); d: 58299 (SE +/- 21.83, N = 3); c: 58256 (SE +/- 17.34, N = 3); b: 58253 (SE +/- 46.92, N = 3). (CXX) g++ options: -O3
NCNN 20230517 Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better): b: 5.25 (SE +/- 0.01, N = 3, MIN: 5.07 / MAX: 7.45); c: 5.25 (SE +/- 0.03, N = 3, MIN: 4.94 / MAX: 11.68); a: 5.26 (SE +/- 0.02, N = 3, MIN: 5.08 / MAX: 8.11); d: 5.26 (SE +/- 0.02, N = 3, MIN: 5.07 / MAX: 7.08). (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better): d: 7070 (SE +/- 0.00, N = 3); b: 7070 (SE +/- 11.55, N = 3); a: 7070 (SE +/- 0.00, N = 3); c: 7057 (SE +/- 3.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better): d: 7540 (SE +/- 0.00, N = 3); c: 7537 (SE +/- 3.33, N = 3); b: 7537 (SE +/- 8.82, N = 3); a: 7527 (SE +/- 3.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - dCOPY (GB/s, More Is Better): d: 604 (SE +/- 0.58, N = 3); c: 604 (SE +/- 0.88, N = 3); b: 604 (SE +/- 0.58, N = 3); a: 603 (SE +/- 0.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 Test: OpenCL BLAS - dAXPY (GB/s, More Is Better): d: 799 (SE +/- 0.88, N = 3); c: 799 (SE +/- 1.20, N = 3); a: 799 (SE +/- 0.33, N = 3); b: 798 (SE +/- 0.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
cl-mem 2017-01-13 Benchmark: Write (GB/s, More Is Better): c: 2354.9 (SE +/- 0.99, N = 3); a: 2354.9 (SE +/- 1.31, N = 3); b: 2353.4 (SE +/- 0.88, N = 3); d: 2352.1 (SE +/- 3.80, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
clpeak 1.1.2 OpenCL Test: Double-Precision Double (GFLOPS, More Is Better): b: 32961.21 (SE +/- 0.74, N = 3); a: 32959.17 (SE +/- 0.74, N = 3); c: 32941.99 (SE +/- 18.62, N = 3); d: 32933.63 (SE +/- 1.51, N = 3). (CXX) g++ options: -O3
clpeak 1.1.2 OpenCL Test: Integer Compute INT (GIOPS, More Is Better): c: 33146.12 (SE +/- 0.26, N = 3); b: 33144.74 (SE +/- 0.09, N = 3); d: 33129.34 (SE +/- 8.15, N = 3); a: 33119.10 (SE +/- 2.54, N = 3). (CXX) g++ options: -O3
clpeak 1.1.2 OpenCL Test: Single-Precision Float (GFLOPS, More Is Better): b: 64547.74 (SE +/- 0.85, N = 3); c: 64547.25 (SE +/- 0.56, N = 3); a: 64545.62 (SE +/- 0.43, N = 3); d: 64520.97 (SE +/- 0.91, N = 3). (CXX) g++ options: -O3
cl-mem 2017-01-13 Benchmark: Copy (GB/s, More Is Better): c: 308.6 (SE +/- 0.07, N = 3); a: 308.6 (SE +/- 0.03, N = 3); d: 308.5 (SE +/- 0.03, N = 3); b: 308.5 (SE +/- 0.12, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
VkResample 1.0 Upscale: 2x - Precision: Double (ms, Fewer Is Better): c: 24.29 (SE +/- 0.00, N = 3); b: 24.29 (SE +/- 0.00, N = 3); a: 24.30 (SE +/- 0.00, N = 3); d: 24.30 (SE +/- 0.00, N = 3). (CXX) g++ options: -O3
cl-mem 2017-01-13 Benchmark: Read (GB/s, More Is Better): c: 1046.1 (SE +/- 0.03, N = 3); d: 1046.0 (SE +/- 0.03, N = 3); b: 1045.9 (SE +/- 0.20, N = 3); a: 1045.9 (SE +/- 0.00, N = 3). (CC) gcc options: -O2 -flto -lOpenCL
VkResample 1.0 Upscale: 2x - Precision: Single (ms, Fewer Is Better): a: 5.230 (SE +/- 0.000, N = 3); c: 5.230 (SE +/- 0.001, N = 3); d: 5.230 (SE +/- 0.002, N = 3); b: 5.231 (SE +/- 0.001, N = 3). (CXX) g++ options: -O3
clpeak 1.1.2 OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better): d: 3484.32 (SE +/- 0.04, N = 3); b: 3484.06 (SE +/- 0.20, N = 3); a: 3483.99 (SE +/- 0.33, N = 3); c: 3483.95 (SE +/- 0.27, N = 3). (CXX) g++ options: -O3
ViennaCL 1.7.1 Test: OpenCL BLAS - sCOPY (GB/s, More Is Better): d: 316 (SE +/- 0.33, N = 3); c: 316 (SE +/- 0.33, N = 3); b: 316 (SE +/- 0.33, N = 3); a: 316 (SE +/- 0.33, N = 3). (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
Phoronix Test Suite v10.8.5