ncnn llama
AMD Ryzen Threadripper 7980X 64-Cores testing with a System76 Thelio Major (FA Z5 BIOS) and AMD Radeon RX 6700 XT 12GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412293-PTS-NCNNLLAM81
System configuration for runs a, b, c and d:
Processor: AMD Ryzen Threadripper 7980X 64-Cores @ 5.37GHz (64 Cores / 128 Threads)
Motherboard: System76 Thelio Major (FA Z5 BIOS)
Chipset: AMD Device 14a4
Memory: 4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2
Disk: 1000GB CT1000T700SSD5
Graphics: AMD Radeon RX 6700 XT 12GB
Audio: AMD Device 14cc
Monitor: DELL P2415Q
Network: Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E
OS: Ubuntu 24.04
Kernel: 6.12.3-061203-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.0.9-0ubuntu0.2 (LLVM 17.0.6 DRM 3.59)
Compiler: GCC 13.3.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance); CPU Microcode: 0xa108105
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of Safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result overview for runs a, b, c and d.

NCNN 20241226, inference time in ms (fewer is better):
      a      b      c      d   Target - Model
  16.27  16.04  15.88  17.05   CPU - mobilenet
   7.94   7.80   7.84   7.83   CPU-v2-v2 - mobilenet-v2
   8.57   8.32   8.55   8.60   CPU-v3-v3 - mobilenet-v3
  10.62  10.57  10.51  10.71   CPU - shufflenet-v2
   7.85   7.80   7.77   7.87   CPU - mnasnet
  11.36  11.08  11.27  11.32   CPU - efficientnet-b0
   4.32   4.31   4.28   4.62   CPU - blazeface
  15.39  15.01  15.27  15.32   CPU - googlenet
  31.59  30.24  31.39  32.29   CPU - vgg16
   8.51   8.40   8.44   8.59   CPU - resnet18
   4.71   4.34   4.37   4.44   CPU - alexnet
  15.26  14.86  14.78  16.45   CPU - resnet50
  16.27  16.04  15.88  17.05   CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  25.05  24.59  24.92  25.31   CPU - yolov4-tiny
  18.31  17.54  18.04  19.18   CPU - squeezenet_ssd
  33.07  33.24  32.82  33.20   CPU - regnety_400m
  43.49  40.76  43.02  42.14   CPU - vision_transformer
  12.72  12.92  12.78  12.23   CPU - FastestDet
  15.53  16.01  15.90  16.27   Vulkan GPU - mobilenet
   7.65   7.84   7.86   7.91   Vulkan GPU-v2-v2 - mobilenet-v2
   8.47   8.08   8.55   8.53   Vulkan GPU-v3-v3 - mobilenet-v3
  10.64  10.31  10.69  10.75   Vulkan GPU - shufflenet-v2
   7.84   7.21   7.83   7.81   Vulkan GPU - mnasnet
  11.12  10.71  11.27  11.24   Vulkan GPU - efficientnet-b0
   4.25   4.16   4.29   4.28   Vulkan GPU - blazeface
  15.13  14.79  15.31  15.32   Vulkan GPU - googlenet
  30.77  30.24  31.58  30.64   Vulkan GPU - vgg16
   8.41   8.49   8.40   8.42   Vulkan GPU - resnet18
   4.35   4.30   4.38   4.34   Vulkan GPU - alexnet
  14.67  15.05  14.70  15.18   Vulkan GPU - resnet50
  15.53  16.01  15.90  16.27   Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  24.20  23.73  24.89  24.50   Vulkan GPU - yolov4-tiny
  17.62  16.84  18.05  17.69   Vulkan GPU - squeezenet_ssd
  32.46  32.64  33.30  32.83   Vulkan GPU - regnety_400m
  42.37  41.61  42.21  41.14   Vulkan GPU - vision_transformer
  12.74  12.51  11.19  12.63   Vulkan GPU - FastestDet

Llama.cpp b4397, backend CPU BLAS, tokens per second (more is better):
       a       b       c       d   Model - Test
   15.78   15.65   15.81   15.79   Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
   68.70   70.27   69.09   70.54   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
   70.15   70.02   70.15   69.11   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
   69.77   68.58   69.22   69.96   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
   16.60   16.66   16.52   16.67   Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
   70.76   68.65   68.65   69.66   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
   69.79   70.30   69.62   68.76   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
   69.39   69.99   69.54   69.74   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
   75.57   75.40   75.88   75.13   granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  268.35  272.33  283.43  283.50   granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  275.79  278.00  268.94  277.51   granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  250.90  242.95  250.67  260.63   granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
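The overview table makes run-to-run variation easy to spot. The sketch below computes the relative spread for a few rows copied from that table. Relative spread here means (max - min) / min across runs a to d, expressed as a percentage. Phoronix Test Suite does not report this metric; the choice of metric and the helper name `relative_spread` are illustrative assumptions. Because the metric ignores direction, it applies equally to ms results (fewer is better) and tokens/s results (more is better).

```python
# Rows copied from the overview table (runs a, b, c, d).
RESULTS = {
    "ncnn: CPU - mobilenet (ms)": [16.27, 16.04, 15.88, 17.05],
    "ncnn: CPU - resnet50 (ms)": [15.26, 14.86, 14.78, 16.45],
    "ncnn: Vulkan GPU - FastestDet (ms)": [12.74, 12.51, 11.19, 12.63],
    "llama-cpp: granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 (tokens/s)": [
        250.90, 242.95, 250.67, 260.63,
    ],
}


def relative_spread(values):
    """Hypothetical helper: (max - min) / min across runs, as a percentage."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo * 100.0


for name, values in RESULTS.items():
    print(f"{name}: {relative_spread(values):.1f}% spread across runs")
```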
NCNN 20241226 - Target: CPU - Model: mobilenet (ms, fewer is better; SE +/- 0.20, N = 4)
  a 16.27 (MIN: 15.82 / MAX: 17.16); b 16.04 (MIN: 15.95 / MAX: 16.58); c 15.88 (MIN: 15.76 / MAX: 24.56); d 17.05 (MIN: 15.91 / MAX: 274.26)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
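A run's MAX can sit far above its reported result. In this chart, run d's mobilenet MAX is 274.26 ms against its 17.05 ms result. The sketch below flags such runs. It assumes the charted value is the per-run average. The 2x threshold (`OUTLIER_FACTOR`) is a hypothetical choice made for illustration; the Phoronix Test Suite applies no such rule.

```python
# (result, MIN, MAX) in ms per run for NCNN CPU mobilenet, copied from the chart above.
RUNS = {
    "a": (16.27, 15.82, 17.16),
    "b": (16.04, 15.95, 16.58),
    "c": (15.88, 15.76, 24.56),
    "d": (17.05, 15.91, 274.26),
}

OUTLIER_FACTOR = 2.0  # hypothetical threshold, for illustration only


def flag_outlier_runs(runs, factor=OUTLIER_FACTOR):
    """Return the names of runs whose MAX exceeds `factor` times their reported result."""
    return sorted(name for name, (avg, _lo, hi) in runs.items() if hi > factor * avg)


print(flag_outlier_runs(RUNS))
```

Lowering the factor to 1.5 also flags run c, whose MAX is 24.56 ms.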
NCNN 20241226 - Target: CPU-v2-v2 - Model: mobilenet-v2 (ms, fewer is better; SE +/- 0.04, N = 4)
  a 7.94 (MIN: 7.74 / MAX: 13.85); b 7.80 (MIN: 7.7 / MAX: 9.02); c 7.84 (MIN: 7.72 / MAX: 8.87); d 7.83 (MIN: 7.73 / MAX: 11.33)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU-v3-v3 - Model: mobilenet-v3 (ms, fewer is better; SE +/- 0.03, N = 4)
  a 8.57 (MIN: 8.42 / MAX: 11.02); b 8.32 (MIN: 8.24 / MAX: 9.54); c 8.55 (MIN: 8.45 / MAX: 10.15); d 8.60 (MIN: 8.49 / MAX: 12.58)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: shufflenet-v2 (ms, fewer is better; SE +/- 0.05, N = 4)
  a 10.62 (MIN: 10.47 / MAX: 11.33); b 10.57 (MIN: 10.52 / MAX: 11.08); c 10.51 (MIN: 10.46 / MAX: 11.05); d 10.71 (MIN: 10.66 / MAX: 11.25)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: mnasnet (ms, fewer is better; SE +/- 0.01, N = 4)
  a 7.85 (MIN: 7.75 / MAX: 8.45); b 7.80 (MIN: 7.71 / MAX: 8.32); c 7.77 (MIN: 7.7 / MAX: 8.35); d 7.87 (MIN: 7.79 / MAX: 8.5)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: efficientnet-b0 (ms, fewer is better; SE +/- 0.03, N = 4)
  a 11.36 (MIN: 11.2 / MAX: 14.79); b 11.08 (MIN: 10.99 / MAX: 11.6); c 11.27 (MIN: 11.22 / MAX: 11.82); d 11.32 (MIN: 11.24 / MAX: 11.98)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: blazeface (ms, fewer is better; SE +/- 0.02, N = 4)
  a 4.32 (MIN: 4.25 / MAX: 11.58); b 4.31 (MIN: 4.24 / MAX: 10.24); c 4.28 (MIN: 4.25 / MAX: 4.63); d 4.62 (MIN: 4.28 / MAX: 71.02)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: googlenet (ms, fewer is better; SE +/- 0.04, N = 4)
  a 15.39 (MIN: 15.16 / MAX: 16.08); b 15.01 (MIN: 14.89 / MAX: 15.67); c 15.27 (MIN: 15.13 / MAX: 16.55); d 15.32 (MIN: 15.21 / MAX: 15.93)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vgg16 (ms, fewer is better; SE +/- 0.25, N = 4)
  a 31.59 (MIN: 30.46 / MAX: 154.3); b 30.24 (MIN: 29.7 / MAX: 30.87); c 31.39 (MIN: 30.66 / MAX: 39.08); d 32.29 (MIN: 30.58 / MAX: 64.17)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet18 (ms, fewer is better; SE +/- 0.09, N = 4)
  a 8.51 (MIN: 8.33 / MAX: 9.52); b 8.40 (MIN: 8.35 / MAX: 9.02); c 8.44 (MIN: 8.37 / MAX: 13.03); d 8.59 (MIN: 8.55 / MAX: 9.18)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: alexnet (ms, fewer is better; SE +/- 0.35, N = 4)
  a 4.71 (MIN: 4.18 / MAX: 227.25); b 4.34 (MIN: 4.18 / MAX: 4.78); c 4.37 (MIN: 4.21 / MAX: 4.92); d 4.44 (MIN: 4.27 / MAX: 5.77)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: resnet50 (ms, fewer is better; SE +/- 0.23, N = 4)
  a 15.26 (MIN: 14.7 / MAX: 23.32); b 14.86 (MIN: 14.69 / MAX: 23.77); c 14.78 (MIN: 14.7 / MAX: 15.35); d 16.45 (MIN: 15.23 / MAX: 37.14)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
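The "SE +/- x, N = n" figures can be converted into an approximate 95% confidence half-width. The sketch below applies that conversion to the resnet50 chart's SE +/- 0.23, N = 4. It rests on two assumptions that this export does not state. First, the SE is taken to be the sample standard deviation of the N trials divided by sqrt(N). Second, the trial results are taken to be roughly normal. Under those assumptions, the half-width is the two-sided Student's t critical value with N - 1 degrees of freedom multiplied by the SE. The table below hardcodes only the two N values that appear in this result file.

```python
# Two-sided 95% Student's t critical values, keyed by degrees of freedom (N - 1).
# Only N = 3 (Vulkan GPU and llama.cpp charts) and N = 4 (CPU charts) occur here.
T_CRIT_95 = {2: 4.303, 3: 3.182}


def ci95_half_width(standard_error: float, n: int) -> float:
    """Approximate 95% CI half-width, assuming SE = sample stdev / sqrt(n)."""
    return T_CRIT_95[n - 1] * standard_error


# resnet50 on the CPU target: SE +/- 0.23, N = 4.
print(f"roughly +/- {ci95_half_width(0.23, 4):.2f} ms")
```

With only three or four trials, the t multiplier is large, so the interval is roughly three to four times the SE.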
NCNN 20241226 - Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, fewer is better; SE +/- 0.20, N = 4)
  a 16.27 (MIN: 15.82 / MAX: 17.16); b 16.04 (MIN: 15.95 / MAX: 16.58); c 15.88 (MIN: 15.76 / MAX: 24.56); d 17.05 (MIN: 15.91 / MAX: 274.26)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: yolov4-tiny (ms, fewer is better; SE +/- 0.08, N = 4)
  a 25.05 (MIN: 24.63 / MAX: 44.44); b 24.59 (MIN: 24.35 / MAX: 25.01); c 24.92 (MIN: 24.55 / MAX: 25.65); d 25.31 (MIN: 25.06 / MAX: 26.03)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: squeezenet_ssd (ms, fewer is better; SE +/- 0.15, N = 4)
  a 18.31 (MIN: 17.83 / MAX: 26.62); b 17.54 (MIN: 17.31 / MAX: 18.18); c 18.04 (MIN: 17.83 / MAX: 18.56); d 19.18 (MIN: 18.04 / MAX: 165.28)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: regnety_400m (ms, fewer is better; SE +/- 0.17, N = 4)
  a 33.07 (MIN: 32.56 / MAX: 188.49); b 33.24 (MIN: 32.54 / MAX: 166.34); c 32.82 (MIN: 32.65 / MAX: 48.88); d 33.20 (MIN: 33.01 / MAX: 35.05)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: vision_transformer (ms, fewer is better; SE +/- 0.56, N = 4)
  a 43.49 (MIN: 39.99 / MAX: 695.96); b 40.76 (MIN: 40.33 / MAX: 48.85); c 43.02 (MIN: 40.17 / MAX: 328); d 42.14 (MIN: 39.94 / MAX: 176.01)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: CPU - Model: FastestDet (ms, fewer is better; SE +/- 0.09, N = 4)
  a 12.72 (MIN: 12.35 / MAX: 21.71); b 12.92 (MIN: 12.82 / MAX: 13.42); c 12.78 (MIN: 12.65 / MAX: 21.72); d 12.23 (MIN: 12.15 / MAX: 12.93)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
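A per-inference latency in ms can be restated as single-stream throughput using 1000 / latency. This reading assumes one inference at a time with no batching or overlap. That is an assumption about how the numbers are interpreted, not something the charts state. The values below are run a's CPU results from the charts above, and `inferences_per_second` is a hypothetical helper.

```python
def inferences_per_second(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


# Run a, NCNN CPU target, copied from the charts above.
CPU_RUN_A_MS = {"blazeface": 4.32, "resnet50": 15.26, "vision_transformer": 43.49}

for model, ms in CPU_RUN_A_MS.items():
    print(f"{model}: {ms} ms -> {inferences_per_second(ms):.1f} inferences/s")
```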
NCNN 20241226 - Target: Vulkan GPU - Model: mobilenet (ms, fewer is better; SE +/- 0.06, N = 3)
  a 15.53 (MIN: 14.75 / MAX: 166.8); b 16.01 (MIN: 14.83 / MAX: 287.48); c 15.90 (MIN: 15.82 / MAX: 16.54); d 16.27 (MIN: 15.66 / MAX: 147.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 (ms, fewer is better; SE +/- 0.20, N = 3)
  a 7.65 (MIN: 7.14 / MAX: 10.2); b 7.84 (MIN: 7.73 / MAX: 9.01); c 7.86 (MIN: 7.75 / MAX: 8.98); d 7.91 (MIN: 7.81 / MAX: 9.11)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 (ms, fewer is better; SE +/- 0.09, N = 3)
  a 8.47 (MIN: 8.06 / MAX: 43.04); b 8.08 (MIN: 7.98 / MAX: 9.07); c 8.55 (MIN: 8.45 / MAX: 9.56); d 8.53 (MIN: 8.44 / MAX: 9.69)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, fewer is better; SE +/- 0.12, N = 3)
  a 10.64 (MIN: 10.32 / MAX: 18.32); b 10.31 (MIN: 10.25 / MAX: 10.71); c 10.69 (MIN: 10.63 / MAX: 11.3); d 10.75 (MIN: 10.69 / MAX: 11.29)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: mnasnet (ms, fewer is better; SE +/- 0.32, N = 3)
  a 7.84 (MIN: 7.19 / MAX: 145.21); b 7.21 (MIN: 7.15 / MAX: 7.74); c 7.83 (MIN: 7.76 / MAX: 8.2); d 7.81 (MIN: 7.73 / MAX: 8.34)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, fewer is better; SE +/- 0.17, N = 3)
  a 11.12 (MIN: 10.7 / MAX: 11.97); b 10.71 (MIN: 10.64 / MAX: 11.31); c 11.27 (MIN: 11.21 / MAX: 11.91); d 11.24 (MIN: 11.18 / MAX: 11.81)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: blazeface (ms, fewer is better; SE +/- 0.05, N = 3)
  a 4.25 (MIN: 4.11 / MAX: 4.76); b 4.16 (MIN: 4.13 / MAX: 4.7); c 4.29 (MIN: 4.26 / MAX: 4.61); d 4.28 (MIN: 4.26 / MAX: 4.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: googlenet (ms, fewer is better; SE +/- 0.19, N = 3)
  a 15.13 (MIN: 14.67 / MAX: 15.95); b 14.79 (MIN: 14.64 / MAX: 23.47); c 15.31 (MIN: 15.17 / MAX: 15.97); d 15.32 (MIN: 15.15 / MAX: 29.15)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vgg16 (ms, fewer is better; SE +/- 0.53, N = 3)
  a 30.77 (MIN: 29.22 / MAX: 55.24); b 30.24 (MIN: 29.83 / MAX: 34.6); c 31.58 (MIN: 30.78 / MAX: 32.25); d 30.64 (MIN: 29.89 / MAX: 31.45)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet18 (ms, fewer is better; SE +/- 0.01, N = 3)
  a 8.41 (MIN: 8.33 / MAX: 9.68); b 8.49 (MIN: 8.44 / MAX: 8.86); c 8.40 (MIN: 8.35 / MAX: 9.01); d 8.42 (MIN: 8.37 / MAX: 8.64)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: alexnet (ms, fewer is better; SE +/- 0.00, N = 3)
  a 4.35 (MIN: 4.16 / MAX: 4.98); b 4.30 (MIN: 4.15 / MAX: 4.78); c 4.38 (MIN: 4.19 / MAX: 12.97); d 4.34 (MIN: 4.18 / MAX: 4.75)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: resnet50 (ms, fewer is better; SE +/- 0.04, N = 3)
  a 14.67 (MIN: 14.53 / MAX: 15.27); b 15.05 (MIN: 14.98 / MAX: 15.58); c 14.70 (MIN: 14.62 / MAX: 15.29); d 15.18 (MIN: 15.09 / MAX: 15.94)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 (ms, fewer is better; SE +/- 0.06, N = 3)
  a 15.53 (MIN: 14.75 / MAX: 166.8); b 16.01 (MIN: 14.83 / MAX: 287.48); c 15.90 (MIN: 15.82 / MAX: 16.54); d 16.27 (MIN: 15.66 / MAX: 147.92)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: yolov4-tiny (ms, fewer is better; SE +/- 0.42, N = 3)
  a 24.20 (MIN: 23.13 / MAX: 25.24); b 23.73 (MIN: 23.48 / MAX: 24.3); c 24.89 (MIN: 24.57 / MAX: 25.81); d 24.50 (MIN: 24.15 / MAX: 25.19)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, fewer is better; SE +/- 0.39, N = 3)
  a 17.62 (MIN: 16.8 / MAX: 168.75); b 16.84 (MIN: 16.71 / MAX: 25.09); c 18.05 (MIN: 17.95 / MAX: 18.63); d 17.69 (MIN: 17.57 / MAX: 18.43)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: regnety_400m (ms, fewer is better; SE +/- 0.38, N = 3)
  a 32.46 (MIN: 31.48 / MAX: 39.76); b 32.64 (MIN: 31.4 / MAX: 315.52); c 33.30 (MIN: 32.59 / MAX: 177.33); d 32.83 (MIN: 32.53 / MAX: 42.14)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: vision_transformer (ms, fewer is better; SE +/- 0.63, N = 3)
  a 42.37 (MIN: 40.41 / MAX: 257.66); b 41.61 (MIN: 40.26 / MAX: 214.45); c 42.21 (MIN: 39.99 / MAX: 306); d 41.14 (MIN: 40.24 / MAX: 165.9)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20241226 - Target: Vulkan GPU - Model: FastestDet (ms, fewer is better; SE +/- 0.04, N = 3)
  a 12.74 (MIN: 12.58 / MAX: 14.39); b 12.51 (MIN: 12.42 / MAX: 13.37); c 11.19 (MIN: 11.09 / MAX: 11.81); d 12.63 (MIN: 12.53 / MAX: 13.14)
  (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
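For most models, the Vulkan GPU latencies sit close to the CPU latencies. One way to quantify the difference is the ratio CPU ms / Vulkan ms for the same model and run. A value above 1 means the Vulkan GPU target finished faster. The sketch below uses run a's values for three models copied from the charts above. The helper `vulkan_speedup` is hypothetical, and the ratio says nothing about why the two targets differ or agree.

```python
# Run a results (ms) for the same models on both NCNN targets, copied from the charts above.
CPU_MS = {"vgg16": 31.59, "resnet50": 15.26, "FastestDet": 12.72}
VULKAN_MS = {"vgg16": 30.77, "resnet50": 14.67, "FastestDet": 12.74}


def vulkan_speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Ratio > 1 means the Vulkan GPU target was faster than the CPU target."""
    return cpu_ms / gpu_ms


for model in CPU_MS:
    print(f"{model}: {vulkan_speedup(CPU_MS[model], VULKAN_MS[model]):.2f}x")
```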
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.00, N = 3)
  a 15.78; b 15.65; c 15.81; d 15.79
  (CXX) g++ options: -O3
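A text-generation rate in tokens per second can be restated as average milliseconds per token, or as the time needed for a fixed number of tokens. The sketch below applies both conversions to the Tulu Text Generation 128 results above. It assumes a constant generation rate and ignores prompt processing and startup cost. Both helpers are hypothetical names used only for illustration.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_tokens(n_tokens: int, tokens_per_second: float) -> float:
    """Time to produce n_tokens at a constant rate (ignores prompt processing and startup)."""
    return n_tokens / tokens_per_second


# Text Generation 128, Llama-3.1-Tulu-3-8B-Q8_0, runs a-d, from the chart above.
TG128 = {"a": 15.78, "b": 15.65, "c": 15.81, "d": 15.79}

for run, tps in TG128.items():
    print(f"run {run}: {ms_per_token(tps):.1f} ms/token, "
          f"{seconds_for_tokens(128, tps):.2f} s for 128 tokens")
```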
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 0.36, N = 3)
  a 68.70; b 70.27; c 69.09; d 70.54
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 0.11, N = 3)
  a 70.15; b 70.02; c 70.15; d 69.11
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 0.24, N = 3)
  a 69.77; b 68.58; c 69.22; d 69.96
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.07, N = 3)
  a 16.60; b 16.66; c 16.52; d 16.67
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 0.22, N = 3)
  a 70.76; b 68.65; c 68.65; d 69.66
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 0.33, N = 3)
  a 69.79; b 70.30; c 69.62; d 68.76
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 0.26, N = 3)
  a 69.39; b 69.99; c 69.54; d 69.74
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (tokens per second, more is better; SE +/- 0.21, N = 3)
  a 75.57; b 75.40; c 75.88; d 75.13
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (tokens per second, more is better; SE +/- 3.83, N = 3)
  a 268.35; b 272.33; c 283.43; d 283.50
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 (tokens per second, more is better; SE +/- 1.45, N = 3)
  a 275.79; b 278.00; c 268.94; d 277.51
  (CXX) g++ options: -O3
Llama.cpp b4397 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (tokens per second, more is better; SE +/- 1.47, N = 3)
  a 250.90; b 242.95; c 250.67; d 260.63
  (CXX) g++ options: -O3
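One way to see how prompt-processing throughput changes with prompt length is to compare the Prompt Processing 2048 results with the Prompt Processing 512 results for each model. The sketch below averages runs a to d before taking the percent change. Averaging across runs is a choice made here, not a summary statistic reported by the Phoronix Test Suite. All values are copied from the charts above.

```python
from statistics import mean

# Runs a-d, tokens per second, copied from the Prompt Processing charts above.
PROMPT_PROCESSING = {
    "Llama-3.1-Tulu-3-8B-Q8_0": {
        512: [68.70, 70.27, 69.09, 70.54],
        2048: [69.77, 68.58, 69.22, 69.96],
    },
    "Mistral-7B-Instruct-v0.3-Q8_0": {
        512: [70.76, 68.65, 68.65, 69.66],
        2048: [69.39, 69.99, 69.54, 69.74],
    },
    "granite-3.0-3b-a800m-instruct-Q8_0": {
        512: [268.35, 272.33, 283.43, 283.50],
        2048: [250.90, 242.95, 250.67, 260.63],
    },
}


def percent_change(old: float, new: float) -> float:
    """Signed percent change from old to new."""
    return (new - old) / old * 100.0


for model, by_length in PROMPT_PROCESSING.items():
    delta = percent_change(mean(by_length[512]), mean(by_length[2048]))
    print(f"{model}: {delta:+.1f}% tokens/s from 512 to 2048 prompt tokens")
```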
Phoronix Test Suite v10.8.5