llama ncnn 9950X
AMD Ryzen 9 9950X 16-Core testing with an ASRock X870E Taichi (3.12.AS02 BIOS) and AMD Radeon RX 7800 XT 16GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412293-SYST-LLAMANC79
System details (runs a, b, c, d)
  Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
  Motherboard: ASRock X870E Taichi (3.12.AS02 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 16GB DDR5-6000MT/s F5-6000J2836G16G
  Disk: Western Digital WD_BLACK SN850X 2000GB + 32GB Flash Drive
  Graphics: AMD Radeon RX 7800 XT 16GB
  Audio: AMD Navi 31 HDMI/DP
  Monitor: DELL U2723QE
  Network: Realtek Device 8126 + MEDIATEK Device 0717
  OS: Ubuntu 24.04
  Kernel: 6.12.3-061203-generic (x86_64)
  Desktop: GNOME Shell 46.0
  Display Server: X Server 1.21.1.11 + Wayland
  OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.59)
  Compiler: GCC 13.3.0
  File-System: ext4
  Screen Resolution: 3840x2160
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - CPU Microcode: 0xb404023
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Result overview (ncnn: ms, fewer is better; llama-cpp: tokens per second, more is better)
       a       b       c       d  Test
    6.84    6.77    6.94    6.79  ncnn: CPU - mobilenet
    2.45    2.47    2.56    2.59  ncnn: CPU-v2-v2 - mobilenet-v2
    2.37    2.34    2.40    2.45  ncnn: CPU-v3-v3 - mobilenet-v3
    2.19    2.20    2.23    2.24  ncnn: CPU - shufflenet-v2
    2.20    2.32    2.43    2.53  ncnn: CPU - mnasnet
    3.09    3.13    3.21    3.23  ncnn: CPU - efficientnet-b0
    0.92    0.94    0.96    0.99  ncnn: CPU - blazeface
    5.78    5.84    5.85    5.95  ncnn: CPU - googlenet
   23.65   23.52   24.77   23.64  ncnn: CPU - vgg16
    4.05    4.00    3.99    4.05  ncnn: CPU - resnet18
    3.30    3.26    3.20    3.30  ncnn: CPU - alexnet
    9.01    9.01    9.10    9.02  ncnn: CPU - resnet50
    6.84    6.77    6.94    6.79  ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
   10.46   10.54   10.41   10.73  ncnn: CPU - yolov4-tiny
    5.48    5.49    5.76    5.67  ncnn: CPU - squeezenet_ssd
    6.34    6.34    6.62    6.64  ncnn: CPU - regnety_400m
   26.57   26.82   26.56   26.49  ncnn: CPU - vision_transformer
    2.46    2.66    1.80    2.64  ncnn: CPU - FastestDet
    6.92    6.79    6.41    6.81  ncnn: Vulkan GPU - mobilenet
    2.56    2.53    2.57    2.61  ncnn: Vulkan GPU-v2-v2 - mobilenet-v2
    2.40    2.40    2.42    2.41  ncnn: Vulkan GPU-v3-v3 - mobilenet-v3
    2.27    2.25    2.26    2.27  ncnn: Vulkan GPU - shufflenet-v2
    2.42    2.40    2.43    2.45  ncnn: Vulkan GPU - mnasnet
    3.19    3.21    3.20    3.30  ncnn: Vulkan GPU - efficientnet-b0
    0.97    0.95    0.98    0.97  ncnn: Vulkan GPU - blazeface
    6.01    5.91    5.94    5.97  ncnn: Vulkan GPU - googlenet
   24.01   23.62   23.98   24.15  ncnn: Vulkan GPU - vgg16
    3.99    4.04    3.98    4.00  ncnn: Vulkan GPU - resnet18
    3.20    3.29    3.19    3.33  ncnn: Vulkan GPU - alexnet
    8.97    9.01    8.91    8.94  ncnn: Vulkan GPU - resnet50
    6.92    6.79    6.41    6.81  ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
   10.88   10.59   10.25    9.73  ncnn: Vulkan GPU - yolov4-tiny
    5.53    5.47    5.13    5.58  ncnn: Vulkan GPU - squeezenet_ssd
    6.51    6.44    6.46    6.52  ncnn: Vulkan GPU - regnety_400m
   27.89   26.59   26.85   26.31  ncnn: Vulkan GPU - vision_transformer
    2.74    2.48    2.07    1.81  ncnn: Vulkan GPU - FastestDet
    9.25    9.25    9.24    9.26  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
   90.47   91.73   87.44   92.82  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
   91.86   90.59   94.63   92.60  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
   88.94   89.40   89.60   90.50  llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
    9.74    9.75    9.80    9.75  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
   89.96   89.92   90.09   92.08  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
   91.63   92.01   91.73   93.49  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
   88.19   90.55   90.44   89.71  llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
   65.97   65.92   66.05   65.86  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  412.57  418.20  414.72  410.23  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  396.01  395.63  391.87  397.20  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  372.75  372.04  371.89  371.23  llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
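To judge how much the four runs differ on a given test, one option is the relative spread, (max - min) / mean. The following is a minimal sketch, not part of the Phoronix Test Suite output: the helper name `spread_percent` is hypothetical, and the three rows are copied from the result table above purely as examples.

```python
from statistics import mean

# Values copied from the result table above, in run order a, b, c, d.
results = {
    "ncnn: CPU - FastestDet (ms)": [2.46, 2.66, 1.80, 2.64],
    "ncnn: CPU - vision_transformer (ms)": [26.57, 26.82, 26.56, 26.49],
    "llama-cpp: Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 (tokens/s)": [9.25, 9.25, 9.24, 9.26],
}


def spread_percent(values):
    """Hypothetical helper: (max - min) / mean of the runs, as a percentage."""
    return (max(values) - min(values)) / mean(values) * 100.0


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.1f}% spread across runs")
```

A small spread suggests the runs agree closely on that test; a large one (as with FastestDet here) suggests the result is noisy and should be read with caution.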
NCNN 20241226 - Target: CPU - Model: mobilenet (ms, Fewer Is Better)
  a: 6.84 (MIN: 6.66 / MAX: 13.62), b: 6.77 (MIN: 6.49 / MAX: 28.5), c: 6.94 (MIN: 6.9 / MAX: 7.15), d: 6.79 (MIN: 6.74 / MAX: 6.92)
  SE +/- 0.06, N = 3; SE +/- 0.05, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  a: 2.45 (MIN: 2.22 / MAX: 3.55), b: 2.47 (MIN: 2.12 / MAX: 4.06), c: 2.56 (MIN: 2.54 / MAX: 3.26), d: 2.59 (MIN: 2.57 / MAX: 3.32)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  a: 2.37 (MIN: 2.26 / MAX: 3.84), b: 2.34 (MIN: 2.07 / MAX: 3.58), c: 2.40 (MIN: 2.37 / MAX: 3.08), d: 2.45 (MIN: 2.43 / MAX: 3.25)
  SE +/- 0.04, N = 3; SE +/- 0.04, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  a: 2.19 (MIN: 2.08 / MAX: 3.1), b: 2.20 (MIN: 2.06 / MAX: 6.27), c: 2.23 (MIN: 2.21 / MAX: 2.91), d: 2.24 (MIN: 2.21 / MAX: 2.51)
  SE +/- 0.04, N = 3; SE +/- 0.02, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
  a: 2.20 (MIN: 1.99 / MAX: 3.11), b: 2.32 (MIN: 1.95 / MAX: 3.84), c: 2.43 (MIN: 2.4 / MAX: 3.29), d: 2.53 (MIN: 2.47 / MAX: 9.55)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  a: 3.09 (MIN: 2.93 / MAX: 10.74), b: 3.13 (MIN: 2.84 / MAX: 4.74), c: 3.21 (MIN: 3.17 / MAX: 4.38), d: 3.23 (MIN: 3.2 / MAX: 4.39)
  SE +/- 0.10, N = 3; SE +/- 0.04, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: blazeface (ms, Fewer Is Better)
  a: 0.92 (MIN: 0.85 / MAX: 1.88), b: 0.94 (MIN: 0.82 / MAX: 1.84), c: 0.96 (MIN: 0.95 / MAX: 1.07), d: 0.99 (MIN: 0.98 / MAX: 1.23)
  SE +/- 0.04, N = 3; SE +/- 0.02, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
  a: 5.78 (MIN: 5.59 / MAX: 8.98), b: 5.84 (MIN: 5.54 / MAX: 9.4), c: 5.85 (MIN: 5.8 / MAX: 6.04), d: 5.95 (MIN: 5.83 / MAX: 6.11)
  SE +/- 0.09, N = 3; SE +/- 0.04, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vgg16 (ms, Fewer Is Better)
  a: 23.65 (MIN: 22.18 / MAX: 73.49), b: 23.52 (MIN: 22.14 / MAX: 92.6), c: 24.77 (MIN: 22.23 / MAX: 104.18), d: 23.64 (MIN: 22.18 / MAX: 78.61)
  SE +/- 0.46, N = 3; SE +/- 0.18, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet18 (ms, Fewer Is Better)
  a: 4.05 (MIN: 3.94 / MAX: 7.83), b: 4.00 (MIN: 3.91 / MAX: 4.2), c: 3.99 (MIN: 3.91 / MAX: 4.15), d: 4.05 (MIN: 3.97 / MAX: 4.17)
  SE +/- 0.02, N = 3; SE +/- 0.01, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: alexnet (ms, Fewer Is Better)
  a: 3.30 (MIN: 3.18 / MAX: 3.81), b: 3.26 (MIN: 3.18 / MAX: 3.61), c: 3.20 (MIN: 3.17 / MAX: 3.32), d: 3.30 (MIN: 3.27 / MAX: 3.41)
  SE +/- 0.05, N = 3; SE +/- 0.03, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
  a: 9.01 (MIN: 8.86 / MAX: 16.7), b: 9.01 (MIN: 8.81 / MAX: 16.39), c: 9.10 (MIN: 9 / MAX: 9.35), d: 9.02 (MIN: 8.97 / MAX: 9.17)
  SE +/- 0.06, N = 3; SE +/- 0.02, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
  a: 6.84 (MIN: 6.66 / MAX: 13.62), b: 6.77 (MIN: 6.49 / MAX: 28.5), c: 6.94 (MIN: 6.9 / MAX: 7.15), d: 6.79 (MIN: 6.74 / MAX: 6.92)
  SE +/- 0.06, N = 3; SE +/- 0.05, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: yolov4-tiny (ms, Fewer Is Better)
  a: 10.46 (MIN: 9.62 / MAX: 56.72), b: 10.54 (MIN: 10.21 / MAX: 26.19), c: 10.41 (MIN: 9.46 / MAX: 19.13), d: 10.73 (MIN: 10.49 / MAX: 15.56)
  SE +/- 0.27, N = 3; SE +/- 0.05, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  a: 5.48 (MIN: 5.26 / MAX: 9.58), b: 5.49 (MIN: 5.09 / MAX: 5.87), c: 5.76 (MIN: 5.72 / MAX: 5.91), d: 5.67 (MIN: 5.6 / MAX: 5.83)
  SE +/- 0.08, N = 3; SE +/- 0.07, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: regnety_400m (ms, Fewer Is Better)
  a: 6.34 (MIN: 6.05 / MAX: 12.6), b: 6.34 (MIN: 5.91 / MAX: 8.04), c: 6.62 (MIN: 6.29 / MAX: 34.38), d: 6.64 (MIN: 6.58 / MAX: 6.78)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vision_transformer (ms, Fewer Is Better)
  a: 26.57 (MIN: 25.83 / MAX: 50.83), b: 26.82 (MIN: 26.28 / MAX: 68.9), c: 26.56 (MIN: 26.46 / MAX: 26.89), d: 26.49 (MIN: 26.38 / MAX: 26.84)
  SE +/- 0.03, N = 3; SE +/- 0.12, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: FastestDet (ms, Fewer Is Better)
  a: 2.46 (MIN: 1.77 / MAX: 4.58), b: 2.66 (MIN: 2.45 / MAX: 2.99), c: 1.80 (MIN: 1.77 / MAX: 1.88), d: 2.64 (MIN: 2.61 / MAX: 2.75)
  SE +/- 0.33, N = 3; SE +/- 0.04, N = 9
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
  a: 6.92 (MIN: 6.72 / MAX: 58.33), b: 6.79 (MIN: 6.33 / MAX: 38.29), c: 6.41 (MIN: 6.38 / MAX: 6.51), d: 6.81 (MIN: 6.76 / MAX: 6.98)
  SE +/- 0.08, N = 3; SE +/- 0.06, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, Fewer Is Better)
  a: 2.56 (MIN: 2.52 / MAX: 3.79), b: 2.53 (MIN: 2.2 / MAX: 3.58), c: 2.57 (MIN: 2.55 / MAX: 3.69), d: 2.61 (MIN: 2.59 / MAX: 3.3)
  SE +/- 0.01, N = 3; SE +/- 0.03, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, Fewer Is Better)
  a: 2.40 (MIN: 2.36 / MAX: 3.96), b: 2.40 (MIN: 2.2 / MAX: 3.83), c: 2.42 (MIN: 2.39 / MAX: 3.74), d: 2.41 (MIN: 2.38 / MAX: 3.27)
  SE +/- 0.01, N = 3; SE +/- 0.02, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  a: 2.27 (MIN: 2.22 / MAX: 7.29), b: 2.25 (MIN: 2.13 / MAX: 4.56), c: 2.26 (MIN: 2.24 / MAX: 2.74), d: 2.27 (MIN: 2.24 / MAX: 3.53)
  SE +/- 0.01, N = 3; SE +/- 0.01, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
  a: 2.42 (MIN: 2.39 / MAX: 4.45), b: 2.40 (MIN: 2.1 / MAX: 9.89), c: 2.43 (MIN: 2.4 / MAX: 3.58), d: 2.45 (MIN: 2.43 / MAX: 2.9)
  SE +/- 0.01, N = 3; SE +/- 0.03, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  a: 3.19 (MIN: 3.14 / MAX: 4.7), b: 3.21 (MIN: 2.98 / MAX: 12.56), c: 3.20 (MIN: 3.17 / MAX: 4.07), d: 3.30 (MIN: 3.25 / MAX: 5.07)
  SE +/- 0.01, N = 3; SE +/- 0.02, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
  a: 0.97 (MIN: 0.95 / MAX: 1.59), b: 0.95 (MIN: 0.84 / MAX: 1.96), c: 0.98 (MIN: 0.97 / MAX: 1), d: 0.97 (MIN: 0.96 / MAX: 1.08)
  SE +/- 0.01, N = 3; SE +/- 0.01, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
  a: 6.01 (MIN: 5.82 / MAX: 52.97), b: 5.91 (MIN: 5.57 / MAX: 7.74), c: 5.94 (MIN: 5.88 / MAX: 6.31), d: 5.97 (MIN: 5.91 / MAX: 6.16)
  SE +/- 0.10, N = 3; SE +/- 0.04, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
  a: 24.01 (MIN: 22.2 / MAX: 71.88), b: 23.62 (MIN: 22.14 / MAX: 85.86), c: 23.98 (MIN: 22.14 / MAX: 76.35), d: 24.15 (MIN: 22.23 / MAX: 116.39)
  SE +/- 0.41, N = 3; SE +/- 0.11, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
  a: 3.99 (MIN: 3.91 / MAX: 9.9), b: 4.04 (MIN: 3.88 / MAX: 28.94), c: 3.98 (MIN: 3.92 / MAX: 4.16), d: 4.00 (MIN: 3.94 / MAX: 4.14)
  SE +/- 0.03, N = 3; SE +/- 0.02, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
  a: 3.20 (MIN: 3.17 / MAX: 3.59), b: 3.29 (MIN: 3.16 / MAX: 4.4), c: 3.19 (MIN: 3.17 / MAX: 3.38), d: 3.33 (MIN: 3.3 / MAX: 3.58)
  SE +/- 0.00, N = 3; SE +/- 0.02, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
  a: 8.97 (MIN: 8.83 / MAX: 34.28), b: 9.01 (MIN: 8.53 / MAX: 29.89), c: 8.91 (MIN: 8.83 / MAX: 9.33), d: 8.94 (MIN: 8.87 / MAX: 9.18)
  SE +/- 0.03, N = 3; SE +/- 0.03, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
  a: 6.92 (MIN: 6.72 / MAX: 58.33), b: 6.79 (MIN: 6.33 / MAX: 38.29), c: 6.41 (MIN: 6.38 / MAX: 6.51), d: 6.81 (MIN: 6.76 / MAX: 6.98)
  SE +/- 0.08, N = 3; SE +/- 0.06, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
  a: 10.88 (MIN: 10.45 / MAX: 67.19), b: 10.59 (MIN: 9.55 / MAX: 64.48), c: 10.25 (MIN: 9.54 / MAX: 84.53), d: 9.73 (MIN: 9.47 / MAX: 16.01)
  SE +/- 0.21, N = 3; SE +/- 0.06, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  a: 5.53 (MIN: 5.43 / MAX: 5.83), b: 5.47 (MIN: 5.09 / MAX: 5.97), c: 5.13 (MIN: 5.08 / MAX: 5.62), d: 5.58 (MIN: 5.53 / MAX: 5.71)
  SE +/- 0.03, N = 3; SE +/- 0.06, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
  a: 6.51 (MIN: 6.33 / MAX: 29.36), b: 6.44 (MIN: 6.07 / MAX: 23.61), c: 6.46 (MIN: 6.4 / MAX: 6.61), d: 6.52 (MIN: 6.49 / MAX: 6.69)
  SE +/- 0.04, N = 3; SE +/- 0.03, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
  a: 27.89 (MIN: 26.33 / MAX: 415.2), b: 26.59 (MIN: 25.99 / MAX: 55.19), c: 26.85 (MIN: 26.56 / MAX: 37.88), d: 26.31 (MIN: 25.93 / MAX: 42.01)
  SE +/- 1.18, N = 3; SE +/- 0.09, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
  a: 2.74 (MIN: 2.59 / MAX: 2.9), b: 2.48 (MIN: 1.79 / MAX: 95.79), c: 2.07 (MIN: 2.05 / MAX: 2.32), d: 1.81 (MIN: 1.79 / MAX: 1.89)
  SE +/- 0.06, N = 3; SE +/- 0.09, N = 15
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  a: 9.25, b: 9.25, c: 9.24, d: 9.26
  SE +/- 0.00, N = 3; SE +/- 0.00, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  a: 90.47, b: 91.73, c: 87.44, d: 92.82
  SE +/- 0.87, N = 6; SE +/- 0.93, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  a: 91.86, b: 90.59, c: 94.63, d: 92.60
  SE +/- 0.48, N = 3; SE +/- 0.58, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  a: 88.94, b: 89.40, c: 89.60, d: 90.50
  SE +/- 0.42, N = 3; SE +/- 0.55, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  a: 9.74, b: 9.75, c: 9.80, d: 9.75
  SE +/- 0.02, N = 3; SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  a: 89.96, b: 89.92, c: 90.09, d: 92.08
  SE +/- 0.46, N = 3; SE +/- 0.79, N = 8
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  a: 91.63, b: 92.01, c: 91.73, d: 93.49
  SE +/- 1.24, N = 3; SE +/- 1.02, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  a: 88.19, b: 90.55, c: 90.44, d: 89.71
  SE +/- 0.49, N = 3; SE +/- 0.47, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  a: 65.97, b: 65.92, c: 66.05, d: 65.86
  SE +/- 0.06, N = 3; SE +/- 0.03, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  a: 412.57, b: 418.20, c: 414.72, d: 410.23
  SE +/- 3.32, N = 3; SE +/- 3.52, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  a: 396.01, b: 395.63, c: 391.87, d: 397.20
  SE +/- 2.61, N = 3; SE +/- 2.77, N = 3
  1. (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  a: 372.75, b: 372.04, c: 371.89, d: 371.23
  SE +/- 0.84, N = 3; SE +/- 0.66, N = 3
  1. (CXX) g++ options: -O3
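The "SE +/- x, N = n" annotations in the results above give a standard error over n trials. As a rough illustration only, the sketch below assumes the conventional definition (sample standard deviation divided by the square root of N); whether the Phoronix Test Suite computes it exactly this way is an assumption. The per-trial values are not part of this export, so the `trials` list is hypothetical.

```python
import math
from statistics import stdev


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-trial timings in ms; the export lists only the
# aggregated result, SE, and N, not the individual trials.
trials = [6.78, 6.84, 6.90]

print(f"mean = {sum(trials) / len(trials):.2f} ms, "
      f"SE +/- {standard_error(trials):.2f}, N = {len(trials)}")
```

Under this definition, a run with more trials (larger N) reports a smaller SE for the same trial-to-trial scatter.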
Phoronix Test Suite v10.8.5