ncnn llama arm

ARMv8 Neoverse-N1 testing with a System76 Thelio Astra (3.02 BIOS) and NVIDIA RTX A400/PCIe 4GB on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2412300-PTS-NCNNLLAM46
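
For scripted runs, the comparison command above can be wrapped in a small launcher. This is a minimal sketch: `build_command` and `run_comparison` are hypothetical helper names, and only the `phoronix-test-suite benchmark <result-id>` form quoted above is assumed.

```python
import shutil
import subprocess

# Result file identifier quoted in this document.
RESULT_ID = "2412300-PTS-NCNNLLAM46"


def build_command(result_id=RESULT_ID):
    """Return the argument list for the comparison run."""
    return ["phoronix-test-suite", "benchmark", result_id]


def run_comparison(result_id=RESULT_ID):
    """Run the comparison if the tool is installed; return its exit code."""
    if shutil.which("phoronix-test-suite") is None:
        raise FileNotFoundError("phoronix-test-suite is not on PATH")
    return subprocess.run(build_command(result_id), check=False).returncode


# Only print the command here; call run_comparison() to actually benchmark.
print(" ".join(build_command()))
```

Calling `run_comparison()` starts the benchmark itself, and each run in this result file took close to 15 hours.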

Run Management

Result Identifier  Date Run          Test Duration
a                  December 29 2024  14 Hours, 51 Minutes
b                  December 30 2024  14 Hours, 51 Minutes
c                  December 30 2024  14 Hours, 52 Minutes
d                  December 30 2024  14 Hours, 51 Minutes


ncnn llama arm (OpenBenchmarking.org, Phoronix Test Suite)

Processor: ARMv8 Neoverse-N1 @ 3.00GHz (128 Cores)
Motherboard: System76 Thelio Astra (3.02 BIOS)
Chipset: Ampere Computing LLC Altra PCI Root Complex A
Memory: 8 x 32GB DDR4-3200MT/s Micron 18ASF4G72PDZ-3G2F1
Disk: 1024GB KINGSTON SKC3000S1024G
Graphics: NVIDIA RTX A400/PCIe 4GB
Audio: NVIDIA Device 2291
Monitor: DELL P2415Q
Network: 2 x Intel X550 + Intel I210
OS: Ubuntu 24.04
Kernel: 6.8.0-48-generic-64k (aarch64)
Desktop: GNOME Shell 46.0
Display Server: X Server
Display Driver: NVIDIA 550.120
OpenGL: 4.6.0
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Ncnn Llama Arm Benchmarks

System Logs
- Transparent Huge Pages: madvise
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v
- Scaling Governor: cppc_cpufreq schedutil (Boost: Disabled)
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 BHB + srbds: Not affected + tsx_async_abort: Not affected

Result Overview (Phoronix Test Suite): chart of the relative results of runs a, b, c and d across the Llama.cpp and NCNN tests, on a 100% to 103% scale.
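
A relative overview of this kind can be derived by expressing each run as a percentage of the weakest run for a test. The sketch below does that for the Mistral-7B-Instruct-v0.3-Q8_0 Prompt Processing 512 results from this file. How the Phoronix Test Suite normalizes its overview chart is not stated here, so the `normalize` helper and its choice of baseline are assumptions made for illustration.

```python
def normalize(results, higher_is_better=True):
    """Express each run as a percentage of the weakest run (weakest = 100%)."""
    if higher_is_better:
        baseline = min(results.values())
        return {run: 100.0 * value / baseline for run, value in results.items()}
    # For latency-style metrics the weakest run is the largest value.
    baseline = max(results.values())
    return {run: 100.0 * baseline / value for run, value in results.items()}


# Tokens per second, more is better (values taken from this result file).
mistral_pp512 = {"a": 70.55, "b": 70.70, "c": 67.77, "d": 67.57}

relative = normalize(mistral_pp512)
for run, pct in relative.items():
    print(f"{run}: {pct:.1f}%")
```

For the NCNN latency results (ms, fewer is better), pass `higher_is_better=False` so that the slowest run becomes the 100% baseline.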

ncnn llama arm: result summary (ncnn in ms, llama-cpp in tokens per second)

a        b        c        d        Test
132.32   132.57   133.30   132.77   ncnn: CPU - mobilenet
109.39   110.23   109.86   109.62   ncnn: CPU-v2-v2 - mobilenet-v2
126.91   126.22   126.50   126.24   ncnn: CPU-v3-v3 - mobilenet-v3
124.14   122.90   123.54   122.95   ncnn: CPU - shufflenet-v2
107.34   107.58   107.83   106.92   ncnn: CPU - mnasnet
180.13   180.43   180.77   179.82   ncnn: CPU - efficientnet-b0
86.92    86.72    86.93    86.83    ncnn: CPU - blazeface
167.09   166.67   168.01   166.96   ncnn: CPU - googlenet
64.26    64.22    64.62    64.32    ncnn: CPU - vgg16
74.65    74.71    74.77    74.86    ncnn: CPU - resnet18
37.61    38.03    37.77    37.96    ncnn: CPU - alexnet
152.92   153.31   153.04   152.99   ncnn: CPU - resnet50
132.32   132.57   133.30   132.77   ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
80.31    80.15    80.63    80.07    ncnn: CPU - yolov4-tiny
197.60   199.41   198.44   198.01   ncnn: CPU - squeezenet_ssd
881.00   879.25   880.92   882.32   ncnn: CPU - regnety_400m
214.96   215.28   215.13   215.79   ncnn: CPU - vision_transformer
155.86   155.88   156.04   156.43   ncnn: CPU - FastestDet
132.64   131.57   132.63   131.96   ncnn: Vulkan GPU - mobilenet
109.33   110.25   109.94   110.18   ncnn: Vulkan GPU-v2-v2 - mobilenet-v2
126.22   126.98   125.86   126.02   ncnn: Vulkan GPU-v3-v3 - mobilenet-v3
122.92   123.67   123.34   123.07   ncnn: Vulkan GPU - shufflenet-v2
107.35   107.14   108.18   107.43   ncnn: Vulkan GPU - mnasnet
180.53   180.89   180.51   180.31   ncnn: Vulkan GPU - efficientnet-b0
86.65    86.84    86.81    86.84    ncnn: Vulkan GPU - blazeface
166.62   166.82   167.89   166.97   ncnn: Vulkan GPU - googlenet
64.74    64.66    63.93    64.54    ncnn: Vulkan GPU - vgg16
75.04    74.56    74.67    74.61    ncnn: Vulkan GPU - resnet18
37.96    37.98    38.01    38.10    ncnn: Vulkan GPU - alexnet
153.16   152.86   153.40   153.57   ncnn: Vulkan GPU - resnet50
132.64   131.57   132.63   131.96   ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
81.40    80.73    80.76    80.87    ncnn: Vulkan GPU - yolov4-tiny
198.87   199.52   198.24   198.08   ncnn: Vulkan GPU - squeezenet_ssd
880.29   881.65   882.11   880.20   ncnn: Vulkan GPU - regnety_400m
214.77   215.84   215.40   214.85   ncnn: Vulkan GPU - vision_transformer
155.28   156.44   156.25   156.59   ncnn: Vulkan GPU - FastestDet
2.01     2.01     2.01     2.01     llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
70.53    69.70    69.11    70.01    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
65.80    65.98    66.05    65.88    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
63.45    63.07    63.42    63.49    llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
2.02     2.02     2.02     2.01     llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
70.55    70.70    67.77    67.57    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
65.84    66.00    66.03    66.33    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
63.46    63.62    63.21    63.65    llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
1.23     1.23     1.23     1.23     llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
146.54   144.15   146.19   144.66   llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
142.75   142.81   142.21   143.29   llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
132.64   133.69   134.47   133.99   llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048

NCNN

NCNN 20241226, CPU targets (ms, Fewer Is Better)
Each cell shows the result for that run, followed by (MIN / MAX).
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

a (MIN / MAX)             b (MIN / MAX)             c (MIN / MAX)             d (MIN / MAX)             Target - Model
132.32 (118.31 / 145.96)  132.57 (121.66 / 144.19)  133.30 (121.6 / 144.86)   132.77 (122.76 / 145.31)  CPU - mobilenet
109.39 (93.75 / 119.38)   110.23 (101.03 / 122.74)  109.86 (98.09 / 125.52)   109.62 (98.76 / 121.97)   CPU-v2-v2 - mobilenet-v2
126.91 (116.22 / 145.12)  126.22 (115.5 / 139.47)   126.50 (116.04 / 138.05)  126.24 (113.89 / 141.42)  CPU-v3-v3 - mobilenet-v3
124.14 (111.87 / 134.45)  122.90 (94.55 / 134.47)   123.54 (113.8 / 134.17)   122.95 (112.64 / 136.2)   CPU - shufflenet-v2
107.34 (98.34 / 117.66)   107.58 (93.27 / 119.41)   107.83 (98.86 / 118.06)   106.92 (96.32 / 119.8)    CPU - mnasnet
180.13 (166.39 / 193.3)   180.43 (169.36 / 197.54)  180.77 (167.36 / 196.56)  179.82 (166 / 195.57)     CPU - efficientnet-b0
86.92 (77.11 / 95.04)     86.72 (78.08 / 98.61)     86.93 (76.41 / 98.46)     86.83 (78.35 / 97.05)     CPU - blazeface
167.09 (147.27 / 183.49)  166.67 (153.38 / 185.72)  168.01 (155.2 / 183.45)   166.96 (154.63 / 183.25)  CPU - googlenet
64.26 (57.02 / 71.75)     64.22 (56.62 / 72.87)     64.62 (52.84 / 73.43)     64.32 (58.78 / 72.7)      CPU - vgg16
74.65 (66.91 / 83.42)     74.71 (67.35 / 83.24)     74.77 (66.93 / 84.49)     74.86 (64.71 / 82.13)     CPU - resnet18
37.61 (28.86 / 42.07)     38.03 (28.66 / 43.52)     37.77 (32.68 / 43.76)     37.96 (33.05 / 42.3)      CPU - alexnet
152.92 (140.29 / 167.66)  153.31 (136.16 / 169.49)  153.04 (140.84 / 166.34)  152.99 (140.02 / 171.3)   CPU - resnet50
132.32 (118.31 / 145.96)  132.57 (121.66 / 144.19)  133.30 (121.6 / 144.86)   132.77 (122.76 / 145.31)  CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
80.31 (69.86 / 94.45)     80.15 (71.05 / 89.27)     80.63 (71.05 / 90.36)     80.07 (66.6 / 90.26)      CPU - yolov4-tiny
197.60 (181.37 / 218.06)  199.41 (182.64 / 217.36)  198.44 (183.79 / 218.73)  198.01 (184.03 / 215.59)  CPU - squeezenet_ssd
881.00 (843.97 / 924.17)  879.25 (837.64 / 922.01)  880.92 (837 / 925.11)     882.32 (845.07 / 927.56)  CPU - regnety_400m
214.96 (197.54 / 234.31)  215.28 (200.57 / 230.07)  215.13 (195.26 / 230.44)  215.79 (196.29 / 232.51)  CPU - vision_transformer
155.86 (143.01 / 170.02)  155.88 (144.26 / 169.94)  156.04 (143.75 / 168.99)  156.43 (141.28 / 169.49)  CPU - FastestDet
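
Run-to-run consistency across a, b, c and d can be summarized as the range of the four results relative to their mean. The following is a small sketch using three of the CPU results above; `spread_percent` is a helper name introduced here for illustration, not a Phoronix Test Suite metric.

```python
from statistics import mean


def spread_percent(values):
    """Range of the run results as a percentage of their mean."""
    return 100.0 * (max(values) - min(values)) / mean(values)


# NCNN CPU results in ms for runs a, b, c, d (from this result file).
cpu_results_ms = {
    "mobilenet": [132.32, 132.57, 133.30, 132.77],
    "alexnet": [37.61, 38.03, 37.77, 37.96],
    "regnety_400m": [881.00, 879.25, 880.92, 882.32],
}

spreads = {model: spread_percent(v) for model, v in cpu_results_ms.items()}
for model, pct in spreads.items():
    print(f"{model}: {pct:.2f}% spread across runs")
```

This measures only the variation between the four reported results; the MIN / MAX columns describe variation within a single run and are not used here.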

NCNN 20241226, Vulkan GPU targets (ms, Fewer Is Better)
Each cell shows the result for that run, followed by (MIN / MAX).
(CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

a (MIN / MAX)             b (MIN / MAX)             c (MIN / MAX)             d (MIN / MAX)             Target - Model
132.64 (121.96 / 143.93)  131.57 (119.66 / 143.37)  132.63 (114.6 / 142.36)   131.96 (120.24 / 145.06)  Vulkan GPU - mobilenet
109.33 (99.92 / 123.78)   110.25 (100.5 / 122.31)   109.94 (99.96 / 119.9)    110.18 (97.9 / 119.92)    Vulkan GPU-v2-v2 - mobilenet-v2
126.22 (108.33 / 138.25)  126.98 (116.7 / 140.75)   125.86 (110.17 / 137.94)  126.02 (116.59 / 142.64)  Vulkan GPU-v3-v3 - mobilenet-v3
122.92 (108.49 / 133.34)  123.67 (113.16 / 137.42)  123.34 (109.54 / 134.48)  123.07 (112.44 / 135.86)  Vulkan GPU - shufflenet-v2
107.35 (95.52 / 120.21)   107.14 (96.13 / 116.9)    108.18 (99.67 / 118.41)   107.43 (87.55 / 118.24)   Vulkan GPU - mnasnet
180.53 (164.16 / 203.89)  180.89 (169.72 / 196.01)  180.51 (162.72 / 194.01)  180.31 (168.13 / 195.76)  Vulkan GPU - efficientnet-b0
86.65 (77.8 / 95.84)      86.84 (78.56 / 102.01)    86.81 (76.65 / 95.94)     86.84 (77.91 / 96.4)      Vulkan GPU - blazeface
166.62 (141.81 / 182.36)  166.82 (153.53 / 180.69)  167.89 (156.82 / 185.85)  166.97 (148.9 / 181.42)   Vulkan GPU - googlenet
64.74 (54.96 / 71.39)     64.66 (54.51 / 72.3)      63.93 (55.94 / 71.38)     64.54 (56.97 / 70.24)     Vulkan GPU - vgg16
75.04 (65.11 / 83.57)     74.56 (63.29 / 82.89)     74.67 (65.47 / 83.22)     74.61 (65.97 / 82.84)     Vulkan GPU - resnet18
37.96 (31.6 / 43.78)      37.98 (32.11 / 43.17)     38.01 (29.7 / 44.14)      38.10 (29.75 / 43.81)     Vulkan GPU - alexnet
153.16 (138.71 / 165.9)   152.86 (141.06 / 165.1)   153.40 (137.69 / 167.65)  153.57 (140.65 / 166.21)  Vulkan GPU - resnet50
132.64 (121.96 / 143.93)  131.57 (119.66 / 143.37)  132.63 (114.6 / 142.36)   131.96 (120.24 / 145.06)  Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
81.40 (72.84 / 90.16)     80.73 (71.73 / 89.82)     80.76 (69.87 / 89.54)     80.87 (72.19 / 88.94)     Vulkan GPU - yolov4-tiny
198.87 (184.68 / 215.27)  199.52 (185.61 / 213.96)  198.24 (175.09 / 220.18)  198.08 (181.64 / 213.56)  Vulkan GPU - squeezenet_ssd
880.29 (847.49 / 930.34)  881.65 (848.4 / 926.66)   882.11 (831.28 / 930.35)  880.20 (849.34 / 921.74)  Vulkan GPU - regnety_400m
214.77 (200 / 230.6)      215.84 (196.95 / 231.27)  215.40 (200.85 / 230.97)  214.85 (200.63 / 227.82)  Vulkan GPU - vision_transformer
155.28 (130.03 / 167.54)  156.44 (145.85 / 170.32)  156.25 (143.75 / 168.72)  156.59 (143.76 / 169.9)   Vulkan GPU - FastestDet
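
The CPU and Vulkan GPU targets report close latencies for the same model in this result file. A minimal sketch of quantifying the gap, using run a for three models; `percent_diff` is a hypothetical helper, and the choice of models is arbitrary.

```python
def percent_diff(cpu_ms, vulkan_ms):
    """Signed difference of the Vulkan GPU result relative to CPU, in percent.

    Positive means the Vulkan GPU target was slower (ms, fewer is better).
    """
    return 100.0 * (vulkan_ms - cpu_ms) / cpu_ms


# (CPU ms, Vulkan GPU ms) for run a, taken from this result file.
run_a = {
    "mobilenet": (132.32, 132.64),
    "vgg16": (64.26, 64.74),
    "regnety_400m": (881.00, 880.29),
}

diffs = {model: percent_diff(cpu, vk) for model, (cpu, vk) in run_a.items()}
for model, d in diffs.items():
    print(f"{model}: {d:+.2f}%")
```

For these three models the difference stays under 1% in either direction. The data alone does not say why the two targets land so close together.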

Llama.cpp

Llama.cpp b4397, Backend: CPU BLAS (Tokens Per Second, More Is Better)
(CXX) g++ options: -O3

a       b       c       d       Model - Test
2.01    2.01    2.01    2.01    Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
70.53   69.70   69.11   70.01   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
65.80   65.98   66.05   65.88   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
63.45   63.07   63.42   63.49   Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
2.02    2.02    2.02    2.01    Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
70.55   70.70   67.77   67.57   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
65.84   66.00   66.03   66.33   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
63.46   63.62   63.21   63.65   Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
1.23    1.23    1.23    1.23    granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
146.54  144.15  146.19  144.66  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
142.75  142.81  142.21  143.29  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
132.64  133.69  134.47  133.99  granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
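
Tokens per second converts directly into wall-clock time for a given token count. The sketch below applies that arithmetic to run a of the Llama-3.1-Tulu-3-8B-Q8_0 results. `seconds_for` is a hypothetical helper, and the calculation assumes the reported rate holds steady over the whole test.

```python
def seconds_for(tokens, tokens_per_second):
    """Time in seconds to process `tokens` at a steady rate."""
    return tokens / tokens_per_second


# Run a, Llama-3.1-Tulu-3-8B-Q8_0, CPU BLAS backend (from this result file).
tg128 = seconds_for(128, 2.01)    # Text Generation 128
pp512 = seconds_for(512, 70.53)   # Prompt Processing 512

print(f"Text Generation 128: about {tg128:.1f} s")
print(f"Prompt Processing 512: about {pp512:.1f} s")
```

At these rates, generating 128 tokens takes roughly a minute, while processing a 512-token prompt takes a few seconds.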

48 Results Shown

NCNN:
  CPU - mobilenet
  CPU-v2-v2 - mobilenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU - shufflenet-v2
  CPU - mnasnet
  CPU - efficientnet-b0
  CPU - blazeface
  CPU - googlenet
  CPU - vgg16
  CPU - resnet18
  CPU - alexnet
  CPU - resnet50
  CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  CPU - yolov4-tiny
  CPU - squeezenet_ssd
  CPU - regnety_400m
  CPU - vision_transformer
  CPU - FastestDet
  Vulkan GPU - mobilenet
  Vulkan GPU-v2-v2 - mobilenet-v2
  Vulkan GPU-v3-v3 - mobilenet-v3
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mnasnet
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - blazeface
  Vulkan GPU - googlenet
  Vulkan GPU - vgg16
  Vulkan GPU - resnet18
  Vulkan GPU - alexnet
  Vulkan GPU - resnet50
  Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - regnety_400m
  Vulkan GPU - vision_transformer
  Vulkan GPU - FastestDet
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048