vulkan tests

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 23.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2308016-PTS-VULKANTE10&grt&sro&rro .
System configuration (runs a, b, c)

  Processor:          AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard:        Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset:            AMD Starship/Matisse
  Memory:             128GB
  Disk:               Samsung SSD 970 EVO Plus 500GB
  Graphics:           AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio:              AMD Navi 10 HDMI Audio
  Monitor:            DELL P2415Q
  Network:            Intel I211 + Intel Wi-Fi 6 AX200
  OS:                 Ubuntu 23.04
  Kernel:             6.2.0-26-generic (x86_64)
  Desktop:            GNOME Shell 44.0
  Display Server:     X Server + Wayland
  OpenGL:             4.6 Mesa 23.0.2 (LLVM 15.0.7 DRM 3.49)
  Compiler:           GCC 12.2.0
  File-System:        ext4
  Screen Resolution:  3840x2160

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-Pa930Z/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-Pa930Z/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
  - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
  - CPU Microcode: 0x830107a

Graphics Details
  - BAR1 / Visible vRAM Size: 256 MB
  - vBIOS Version: 113-D1820201-101

Security Details
  - itlb_multihit: Not affected
  - l1tf: Not affected
  - mds: Not affected
  - meltdown: Not affected
  - mmio_stale_data: Not affected
  - retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
  - spec_store_bypass: Mitigation of SSB disabled via prctl
  - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  - spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  - srbds: Not affected
  - tsx_async_abort: Not affected
Results overview

        a          b          c  Test
    25.21      25.78      25.96  ncnn: CPU - mobilenet
    13.83      14.49      16.01  ncnn: CPU-v2-v2 - mobilenet-v2
    12.73      13.45      13.60  ncnn: CPU-v3-v3 - mobilenet-v3
    14.08      14.38      14.12  ncnn: CPU - shufflenet-v2
    13.13      12.90      15.27  ncnn: CPU - mnasnet
    18.00      19.69      19.06  ncnn: CPU - efficientnet-b0
     5.89       7.12       6.95  ncnn: CPU - blazeface
    30.87      31.55      31.64  ncnn: CPU - googlenet
    45.64      47.53      46.88  ncnn: CPU - vgg16
    19.07      19.40      18.91  ncnn: CPU - resnet18
    11.76      13.97      12.37  ncnn: CPU - alexnet
    35.20      39.03      38.18  ncnn: CPU - resnet50
    37.72      37.38      37.14  ncnn: CPU - yolov4-tiny
    22.30      22.27      22.89  ncnn: CPU - squeezenet_ssd
    45.43      48.03      47.02  ncnn: CPU - regnety_400m
    73.47      74.11      75.96  ncnn: CPU - vision_transformer
    14.66      13.70      16.03  ncnn: CPU - FastestDet
    24.66      25.25      25.37  ncnn: Vulkan GPU - mobilenet
    15.71      14.78      13.63  ncnn: Vulkan GPU-v2-v2 - mobilenet-v2
    12.97      13.25      12.57  ncnn: Vulkan GPU-v3-v3 - mobilenet-v3
    14.75      14.27      13.29  ncnn: Vulkan GPU - shufflenet-v2
    14.18      13.72      12.93  ncnn: Vulkan GPU - mnasnet
    18.42      19.47      18.02  ncnn: Vulkan GPU - efficientnet-b0
     7.22       7.09       6.08  ncnn: Vulkan GPU - blazeface
    31.43      31.24      31.46  ncnn: Vulkan GPU - googlenet
    47.52      48.13      47.50  ncnn: Vulkan GPU - vgg16
    19.20      19.42      19.04  ncnn: Vulkan GPU - resnet18
    15.22      15.34      13.07  ncnn: Vulkan GPU - alexnet
    38.91      39.84      38.43  ncnn: Vulkan GPU - resnet50
    36.72      37.79      37.36  ncnn: Vulkan GPU - yolov4-tiny
    21.81      22.01      21.78  ncnn: Vulkan GPU - squeezenet_ssd
    46.73      46.22      46.10  ncnn: Vulkan GPU - regnety_400m
    75.17      72.26      72.70  ncnn: Vulkan GPU - vision_transformer
    14.25      14.53      13.88  ncnn: Vulkan GPU - FastestDet
    25525      25447      25451  vkfft: FFT + iFFT R2C / C2R
   100903     101063     100839  vkfft: FFT + iFFT C2C 1D batched in half precision
     7268       7258       7259  vkfft: FFT + iFFT C2C Bluestein in single precision
    18431      18428      18426  vkfft: FFT + iFFT C2C 1D batched in double precision
    58944      58947      58938  vkfft: FFT + iFFT C2C 1D batched in single precision
    16993      16976      16967  vkfft: FFT + iFFT C2C multidimensional in single precision
     3066       3064       3064  vkfft: FFT + iFFT C2C Bluestein benchmark in double precision
    64140      64139      64131  vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling
  7836.61    7856.19    7866.83  vkpeak: fp32-scalar
  7781.34    7808.56    7808.48  vkpeak: fp32-vec4
  7853.66    7860.29    7866.33  vkpeak: fp16-scalar
 12516.94   12550.91   12559.31  vkpeak: fp16-vec4
   493.36     493.01     493.51  vkpeak: fp64-scalar
   493.28     492.96     493.57  vkpeak: fp64-vec4
  1214.41    1213.48    1214.40  vkpeak: int32-scalar
  1574.82    1574.60    1575.91  vkpeak: int32-vec4
  7856.21    7852.03    7857.01  vkpeak: int16-scalar
 12531.35   12550.38   12563.14  vkpeak: int16-vec4
   12.770     12.771     12.775  vkresample: 2x - Single
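The three runs can be compared by their relative spread. The following is a minimal sketch, not part of the Phoronix Test Suite output: `spread_pct` is a hypothetical helper name, and the three sample rows are copied from the overview table above.

```python
# Run-to-run spread for a few rows copied from the results overview.
# spread_pct is a hypothetical helper, not a Phoronix Test Suite function.

def spread_pct(values):
    """Return (max - min) / min of the given results, as a percentage."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0

# Results for runs a, b, c, in that order.
rows = {
    "ncnn: CPU - mobilenet": (25.21, 25.78, 25.96),
    "vkfft: FFT + iFFT R2C / C2R": (25525, 25447, 25451),
    "vkpeak: fp32-scalar": (7836.61, 7856.19, 7866.83),
}

for name, results in rows.items():
    print(f"{name}: {spread_pct(results):.2f}% spread across a/b/c")
```

For these rows the ncnn result varies by roughly 3% between runs, while the vkfft and vkpeak results stay within about 0.5%.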
NCNN 20230517 - Target: CPU (ms, Fewer Is Better)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Target: CPU - Model: mobilenet
  c: 25.96  SE +/- 0.21, N = 3  MIN: 22.99 / MAX: 30.78
  b: 25.78  SE +/- 0.28, N = 3  MIN: 22.57 / MAX: 80.73
  a: 25.21  SE +/- 0.05, N = 3  MIN: 21.72 / MAX: 91.8

Target: CPU-v2-v2 - Model: mobilenet-v2
  c: 16.01  SE +/- 1.00, N = 3  MIN: 11.37 / MAX: 25.15
  b: 14.49  SE +/- 1.52, N = 3  MIN: 10.12 / MAX: 23.01
  a: 13.83  SE +/- 1.66, N = 3  MIN: 9.93 / MAX: 25.72

Target: CPU-v3-v3 - Model: mobilenet-v3
  c: 13.60  SE +/- 0.59, N = 3  MIN: 9.51 / MAX: 51.49
  b: 13.45  SE +/- 0.98, N = 3  MIN: 9.44 / MAX: 19.16
  a: 12.73  SE +/- 1.00, N = 3  MIN: 9.51 / MAX: 20.96

Target: CPU - Model: shufflenet-v2
  c: 14.12  SE +/- 1.04, N = 3  MIN: 11.12 / MAX: 22.91
  b: 14.38  SE +/- 0.93, N = 3  MIN: 11.05 / MAX: 20.51
  a: 14.08  SE +/- 1.17, N = 3  MIN: 10.99 / MAX: 18.54

Target: CPU - Model: mnasnet
  c: 15.27  SE +/- 0.65, N = 3  MIN: 10.33 / MAX: 25.75
  b: 12.90  SE +/- 1.05, N = 3  MIN: 9.19 / MAX: 19.69
  a: 13.13  SE +/- 1.54, N = 3  MIN: 9.27 / MAX: 22.76

Target: CPU - Model: efficientnet-b0
  c: 19.06  SE +/- 1.07, N = 3  MIN: 13.85 / MAX: 27.95
  b: 19.69  SE +/- 1.68, N = 3  MIN: 13.76 / MAX: 27.29
  a: 18.00  SE +/- 1.70, N = 3  MIN: 13.54 / MAX: 69.84

Target: CPU - Model: blazeface
  c: 6.95  SE +/- 0.44, N = 3  MIN: 5.22 / MAX: 9.49
  b: 7.12  SE +/- 0.58, N = 3  MIN: 5.21 / MAX: 13.65
  a: 5.89  SE +/- 0.59, N = 3  MIN: 4.78 / MAX: 9.52

Target: CPU - Model: googlenet
  c: 31.64  SE +/- 0.41, N = 3  MIN: 26.58 / MAX: 37.83
  b: 31.55  SE +/- 0.22, N = 3  MIN: 29.53 / MAX: 39
  a: 30.87  SE +/- 0.21, N = 3  MIN: 27.29 / MAX: 90.04

Target: CPU - Model: vgg16
  c: 46.88  SE +/- 0.66, N = 3  MIN: 43.93 / MAX: 59.11
  b: 47.53  SE +/- 0.34, N = 3  MIN: 44.76 / MAX: 58.97
  a: 45.64  SE +/- 0.74, N = 3  MIN: 42.07 / MAX: 55.26

Target: CPU - Model: resnet18
  c: 18.91  SE +/- 0.03, N = 3  MIN: 16.19 / MAX: 66.7
  b: 19.40  SE +/- 0.21, N = 3  MIN: 16.25 / MAX: 94.41
  a: 19.07  SE +/- 0.16, N = 3  MIN: 16.37 / MAX: 82.83

Target: CPU - Model: alexnet
  c: 12.37  SE +/- 0.24, N = 3  MIN: 10.09 / MAX: 15.86
  b: 13.97  SE +/- 0.45, N = 3  MIN: 11.94 / MAX: 26.76
  a: 11.76  SE +/- 0.62, N = 3  MIN: 9.49 / MAX: 19.16

Target: CPU - Model: resnet50
  c: 38.18  SE +/- 1.02, N = 3  MIN: 32.56 / MAX: 45.61
  b: 39.03  SE +/- 0.53, N = 3  MIN: 34.39 / MAX: 46.86
  a: 35.20  SE +/- 0.28, N = 3  MIN: 30.71 / MAX: 97.13

Target: CPU - Model: yolov4-tiny
  c: 37.14  SE +/- 0.31, N = 3  MIN: 33.68 / MAX: 43.48
  b: 37.38  SE +/- 0.58, N = 3  MIN: 33.26 / MAX: 45.88
  a: 37.72  SE +/- 0.32, N = 3  MIN: 34.14 / MAX: 48.53

Target: CPU - Model: squeezenet_ssd
  c: 22.89  SE +/- 0.30, N = 3  MIN: 20.98 / MAX: 29.54
  b: 22.27  SE +/- 0.39, N = 3  MIN: 20.98 / MAX: 27.69
  a: 22.30  SE +/- 0.47, N = 3  MIN: 20.67 / MAX: 28.29

Target: CPU - Model: regnety_400m
  c: 47.02  SE +/- 0.33, N = 3  MIN: 41.08 / MAX: 93.32
  b: 48.03  SE +/- 0.70, N = 3  MIN: 41.67 / MAX: 101.38
  a: 45.43  SE +/- 1.49, N = 3  MIN: 39.85 / MAX: 90.63

Target: CPU - Model: vision_transformer
  c: 75.96  SE +/- 2.22, N = 3  MIN: 71.06 / MAX: 93.63
  b: 74.11  SE +/- 2.67, N = 3  MIN: 70.44 / MAX: 113.62
  a: 73.47  SE +/- 2.54, N = 3  MIN: 70.04 / MAX: 83.77

Target: CPU - Model: FastestDet
  c: 16.03  SE +/- 0.49, N = 3  MIN: 11.61 / MAX: 24.19
  b: 13.70  SE +/- 0.53, N = 3  MIN: 11.46 / MAX: 17.73
  a: 14.66  SE +/- 1.04, N = 3  MIN: 11.69 / MAX: 19.64
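The reported standard errors give a rough way to judge whether two runs really differ. The sketch below is an illustration only, not something the Phoronix Test Suite computes: `se_ratio` is a hypothetical helper, combining the two standard errors in quadrature assumes the runs are independent, and with N = 3 the result is a coarse screen rather than a significance test. The inputs are the run c and run a figures for the CPU mobilenet and mobilenet-v2 tests above.

```python
import math

def se_ratio(mean_x, se_x, mean_y, se_y):
    """Difference of two means divided by their combined standard error.

    Hypothetical helper: assumes the two runs are independent, so the
    standard errors add in quadrature.
    """
    return abs(mean_x - mean_y) / math.sqrt(se_x ** 2 + se_y ** 2)

# CPU mobilenet, run c vs run a: small SE on both sides.
print(f"mobilenet:    {se_ratio(25.96, 0.21, 25.21, 0.05):.2f}")

# CPU mobilenet-v2, run c vs run a: larger gap, but much larger SE.
print(f"mobilenet-v2: {se_ratio(16.01, 1.00, 13.83, 1.66):.2f}")
```

The mobilenet gap is about 3.5 combined standard errors, while the larger mobilenet-v2 gap is only about 1.1, so the second difference is hard to tell apart from run-to-run noise.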
NCNN 20230517 - Target: Vulkan GPU (ms, Fewer Is Better)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Target: Vulkan GPU - Model: mobilenet
  c: 25.37  SE +/- 0.04, N = 3   MIN: 22.3 / MAX: 30.62
  b: 25.25  SE +/- 0.39, N = 12  MIN: 21.74 / MAX: 83.58
  a: 24.66  SE +/- 0.20, N = 3   MIN: 22.49 / MAX: 28.96

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
  c: 13.63  SE +/- 0.74, N = 3   MIN: 10.24 / MAX: 27.87
  b: 14.78  SE +/- 0.85, N = 12  MIN: 10.2 / MAX: 70.67
  a: 15.71  SE +/- 3.00, N = 3   MIN: 10.21 / MAX: 26.19

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
  c: 12.57  SE +/- 0.32, N = 3   MIN: 9.6 / MAX: 17.01
  b: 13.25  SE +/- 0.62, N = 12  MIN: 9.52 / MAX: 53.34
  a: 12.97  SE +/- 1.36, N = 3   MIN: 9.71 / MAX: 20.92

Target: Vulkan GPU - Model: shufflenet-v2
  c: 13.29  SE +/- 0.21, N = 3   MIN: 11.24 / MAX: 22.87
  b: 14.27  SE +/- 0.46, N = 12  MIN: 10.94 / MAX: 22.32
  a: 14.75  SE +/- 1.24, N = 3   MIN: 11.39 / MAX: 23.44

Target: Vulkan GPU - Model: mnasnet
  c: 12.93  SE +/- 1.40, N = 3   MIN: 9.34 / MAX: 22.14
  b: 13.72  SE +/- 0.74, N = 12  MIN: 9.31 / MAX: 23.03
  a: 14.18  SE +/- 2.23, N = 3   MIN: 9.68 / MAX: 23.38

Target: Vulkan GPU - Model: efficientnet-b0
  c: 18.02  SE +/- 0.52, N = 3   MIN: 13.61 / MAX: 27.87
  b: 19.47  SE +/- 0.83, N = 12  MIN: 13.59 / MAX: 97.32
  a: 18.42  SE +/- 0.44, N = 3   MIN: 14.24 / MAX: 27.56

Target: Vulkan GPU - Model: blazeface
  c: 6.08  SE +/- 0.20, N = 3   MIN: 5.01 / MAX: 7.58
  b: 7.09  SE +/- 0.38, N = 12  MIN: 4.88 / MAX: 78.29
  a: 7.22  SE +/- 0.32, N = 3   MIN: 5.7 / MAX: 10.52

Target: Vulkan GPU - Model: googlenet
  c: 31.46  SE +/- 0.10, N = 3   MIN: 26.64 / MAX: 38.17
  b: 31.24  SE +/- 0.16, N = 12  MIN: 27.01 / MAX: 86.35
  a: 31.43  SE +/- 0.18, N = 3   MIN: 26.65 / MAX: 84.75

Target: Vulkan GPU - Model: vgg16
  c: 47.50  SE +/- 0.08, N = 3   MIN: 45.06 / MAX: 62.77
  b: 48.13  SE +/- 0.15, N = 12  MIN: 44.54 / MAX: 88.13
  a: 47.52  SE +/- 0.06, N = 3   MIN: 45.43 / MAX: 59.56

Target: Vulkan GPU - Model: resnet18
  c: 19.04  SE +/- 0.17, N = 3   MIN: 16.24 / MAX: 22.76
  b: 19.42  SE +/- 0.09, N = 12  MIN: 16.13 / MAX: 88.96
  a: 19.20  SE +/- 0.06, N = 3   MIN: 16.68 / MAX: 25.67

Target: Vulkan GPU - Model: alexnet
  c: 13.07  SE +/- 0.28, N = 3   MIN: 10.71 / MAX: 16.27
  b: 15.34  SE +/- 0.17, N = 12  MIN: 12.01 / MAX: 62
  a: 15.22  SE +/- 0.36, N = 3   MIN: 11.93 / MAX: 21.72

Target: Vulkan GPU - Model: resnet50
  c: 38.43  SE +/- 0.59, N = 3   MIN: 33.37 / MAX: 44.58
  b: 39.84  SE +/- 0.24, N = 12  MIN: 32.55 / MAX: 100.04
  a: 38.91  SE +/- 0.49, N = 3   MIN: 33.52 / MAX: 45.34

Target: Vulkan GPU - Model: yolov4-tiny
  c: 37.36  SE +/- 0.31, N = 3   MIN: 33.45 / MAX: 45.21
  b: 37.79  SE +/- 0.27, N = 12  MIN: 33.15 / MAX: 109.38
  a: 36.72  SE +/- 0.55, N = 3   MIN: 32.78 / MAX: 42.58

Target: Vulkan GPU - Model: squeezenet_ssd
  c: 21.78  SE +/- 0.20, N = 3   MIN: 20.62 / MAX: 26.09
  b: 22.01  SE +/- 0.08, N = 12  MIN: 20.55 / MAX: 81.08
  a: 21.81  SE +/- 0.21, N = 3   MIN: 20.68 / MAX: 25.84

Target: Vulkan GPU - Model: regnety_400m
  c: 46.10  SE +/- 0.73, N = 3   MIN: 39.52 / MAX: 107.36
  b: 46.22  SE +/- 0.51, N = 12  MIN: 39.54 / MAX: 121.82
  a: 46.73  SE +/- 0.37, N = 3   MIN: 39.79 / MAX: 151.9

Target: Vulkan GPU - Model: vision_transformer
  c: 72.70  SE +/- 1.92, N = 3   MIN: 69.83 / MAX: 164
  b: 72.26  SE +/- 0.80, N = 12  MIN: 69.21 / MAX: 182.16
  a: 75.17  SE +/- 2.14, N = 3   MIN: 69.97 / MAX: 147.06

Target: Vulkan GPU - Model: FastestDet
  c: 13.88  SE +/- 0.66, N = 3   MIN: 11.52 / MAX: 18.83
  b: 14.53  SE +/- 0.63, N = 12  MIN: 11.05 / MAX: 27.1
  a: 14.25  SE +/- 1.49, N = 3   MIN: 11.35 / MAX: 22.25
VkFFT 1.2.31 (Benchmark Score, More Is Better)
1. (CXX) g++ options: -O3

Test: FFT + iFFT R2C / C2R
  c: 25451  SE +/- 2.03, N = 3
  b: 25447  SE +/- 4.33, N = 3
  a: 25525  SE +/- 73.86, N = 3

Test: FFT + iFFT C2C 1D batched in half precision
  c: 100839  SE +/- 219.62, N = 3
  b: 101063  SE +/- 43.64, N = 3
  a: 100903  SE +/- 98.36, N = 3

Test: FFT + iFFT C2C Bluestein in single precision
  c: 7259  SE +/- 4.70, N = 3
  b: 7258  SE +/- 2.67, N = 3
  a: 7268  SE +/- 15.24, N = 3

Test: FFT + iFFT C2C 1D batched in double precision
  c: 18426  SE +/- 1.86, N = 3
  b: 18428  SE +/- 3.61, N = 3
  a: 18431  SE +/- 6.17, N = 3

Test: FFT + iFFT C2C 1D batched in single precision
  c: 58938  SE +/- 1.20, N = 3
  b: 58947  SE +/- 3.33, N = 3
  a: 58944  SE +/- 1.33, N = 3

Test: FFT + iFFT C2C multidimensional in single precision
  c: 16967  SE +/- 9.53, N = 3
  b: 16976  SE +/- 12.01, N = 3
  a: 16993  SE +/- 51.10, N = 3

Test: FFT + iFFT C2C Bluestein benchmark in double precision
  c: 3064  SE +/- 1.53, N = 3
  b: 3064  SE +/- 1.00, N = 3
  a: 3066  SE +/- 4.33, N = 3

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
  c: 64131  SE +/- 7.69, N = 3
  b: 64139  SE +/- 5.46, N = 3
  a: 64140  SE +/- 6.43, N = 3
vkpeak 20230730 (More Is Better)

fp32-scalar (GFLOPS)
  c: 7866.83  SE +/- 1.93, N = 3
  b: 7856.19  SE +/- 1.09, N = 3
  a: 7836.61  SE +/- 0.62, N = 3

fp32-vec4 (GFLOPS)
  c: 7808.48  SE +/- 8.31, N = 3
  b: 7808.56  SE +/- 7.17, N = 3
  a: 7781.34  SE +/- 7.57, N = 3

fp16-scalar (GFLOPS)
  c: 7866.33  SE +/- 1.85, N = 3
  b: 7860.29  SE +/- 0.75, N = 3
  a: 7853.66  SE +/- 1.36, N = 3

fp16-vec4 (GFLOPS)
  c: 12559.31  SE +/- 1.25, N = 3
  b: 12550.91  SE +/- 1.15, N = 3
  a: 12516.94  SE +/- 1.96, N = 3

fp64-scalar (GFLOPS)
  c: 493.51  SE +/- 0.07, N = 3
  b: 493.01  SE +/- 0.03, N = 3
  a: 493.36  SE +/- 0.10, N = 3

fp64-vec4 (GFLOPS)
  c: 493.57  SE +/- 0.10, N = 3
  b: 492.96  SE +/- 0.03, N = 3
  a: 493.28  SE +/- 0.11, N = 3

int32-scalar (GIOPS)
  c: 1214.40  SE +/- 0.32, N = 3
  b: 1213.48  SE +/- 0.05, N = 3
  a: 1214.41  SE +/- 0.27, N = 3

int32-vec4 (GIOPS)
  c: 1575.91  SE +/- 0.21, N = 3
  b: 1574.60  SE +/- 0.05, N = 3
  a: 1574.82  SE +/- 0.17, N = 3

int16-scalar (GIOPS)
  c: 7857.01  SE +/- 0.96, N = 3
  b: 7852.03  SE +/- 0.51, N = 3
  a: 7856.21  SE +/- 0.44, N = 3

int16-vec4 (GIOPS)
  c: 12563.14  SE +/- 1.54, N = 3
  b: 12550.38  SE +/- 0.77, N = 3
  a: 12531.35  SE +/- 0.58, N = 3
VkResample 1.0 (ms, Fewer Is Better)
1. (CXX) g++ options: -O3

Upscale: 2x - Precision: Single
  c: 12.78  SE +/- 0.00, N = 3
  b: 12.77  SE +/- 0.00, N = 3
  a: 12.77  SE +/- 0.00, N = 3
Phoronix Test Suite v10.8.5