llama ncnn 9950X

AMD Ryzen 9 9950X 16-Core testing with an ASRock X870E Taichi (3.12.AS02 BIOS) and AMD Radeon RX 7800 XT 16GB on Ubuntu 24.04 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2412293-SYST-LLAMANC79

Run Management

Result Identifier | Date Run | Test Duration
a | December 29 2024 | 1 Hour, 50 Minutes
b | December 29 2024 | 5 Hours, 30 Minutes
c | December 29 2024 | 36 Minutes
d | December 29 2024 | 36 Minutes
Average | | 2 Hours, 8 Minutes
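
The final duration entry (2 Hours, 8 Minutes) matches the arithmetic mean of the four run durations listed above it. A minimal sketch of that check follows; the helper name `format_duration` is hypothetical and is not part of the Phoronix Test Suite.

```python
# Run durations as listed in the run table, converted to minutes.
durations_min = {
    "a": 1 * 60 + 50,  # 1 Hour, 50 Minutes
    "b": 5 * 60 + 30,  # 5 Hours, 30 Minutes
    "c": 36,           # 36 Minutes
    "d": 36,           # 36 Minutes
}


def format_duration(total_minutes: int) -> str:
    """Render minutes in the 'H Hours, M Minutes' style used on the page."""
    hours, minutes = divmod(total_minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours} Hour{'s' if hours != 1 else ''}")
    if minutes:
        parts.append(f"{minutes} Minute{'s' if minutes != 1 else ''}")
    return ", ".join(parts)


# Arithmetic mean of the four runs, rounded to whole minutes.
mean_minutes = round(sum(durations_min.values()) / len(durations_min))
print(format_duration(mean_minutes))
```

Run b accounts for most of the total, which is consistent with the larger sample counts (N = 9 and N = 15) that appear in the NCNN results.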



llama ncnn 9950X - Llama Ncnn 9950X Benchmarks (OpenBenchmarking.org, Phoronix Test Suite)

Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
Motherboard: ASRock X870E Taichi (3.12.AS02 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16GB DDR5-6000MT/s F5-6000J2836G16G
Disk: Western Digital WD_BLACK SN850X 2000GB + 32GB Flash Drive
Graphics: AMD Radeon RX 7800 XT 16GB
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Realtek Device 8126 + MEDIATEK Device 0717
OS: Ubuntu 24.04
Kernel: 6.12.3-061203-generic (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server 1.21.1.11 + Wayland
OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.59)
Compiler: GCC 13.3.0
File-System: ext4
Screen Resolution: 3840x2160

System Logs
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance)
- CPU Microcode: 0xb404023
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

[Chart: Result Overview (Phoronix Test Suite). Relative results of runs a, b, c, and d across the NCNN and Llama.cpp tests; axis from 100% to 139%.]

llama ncnn 9950X (OpenBenchmarking.org)

Test | a | b | c | d
ncnn: CPU - mobilenet | 6.84 | 6.77 | 6.94 | 6.79
ncnn: CPU-v2-v2 - mobilenet-v2 | 2.45 | 2.47 | 2.56 | 2.59
ncnn: CPU-v3-v3 - mobilenet-v3 | 2.37 | 2.34 | 2.40 | 2.45
ncnn: CPU - shufflenet-v2 | 2.19 | 2.20 | 2.23 | 2.24
ncnn: CPU - mnasnet | 2.20 | 2.32 | 2.43 | 2.53
ncnn: CPU - efficientnet-b0 | 3.09 | 3.13 | 3.21 | 3.23
ncnn: CPU - blazeface | 0.92 | 0.94 | 0.96 | 0.99
ncnn: CPU - googlenet | 5.78 | 5.84 | 5.85 | 5.95
ncnn: CPU - vgg16 | 23.65 | 23.52 | 24.77 | 23.64
ncnn: CPU - resnet18 | 4.05 | 4.00 | 3.99 | 4.05
ncnn: CPU - alexnet | 3.30 | 3.26 | 3.20 | 3.30
ncnn: CPU - resnet50 | 9.01 | 9.01 | 9.10 | 9.02
ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 6.84 | 6.77 | 6.94 | 6.79
ncnn: CPU - yolov4-tiny | 10.46 | 10.54 | 10.41 | 10.73
ncnn: CPU - squeezenet_ssd | 5.48 | 5.49 | 5.76 | 5.67
ncnn: CPU - regnety_400m | 6.34 | 6.34 | 6.62 | 6.64
ncnn: CPU - vision_transformer | 26.57 | 26.82 | 26.56 | 26.49
ncnn: CPU - FastestDet | 2.46 | 2.66 | 1.80 | 2.64
ncnn: Vulkan GPU - mobilenet | 6.92 | 6.79 | 6.41 | 6.81
ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 | 2.56 | 2.53 | 2.57 | 2.61
ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 | 2.40 | 2.40 | 2.42 | 2.41
ncnn: Vulkan GPU - shufflenet-v2 | 2.27 | 2.25 | 2.26 | 2.27
ncnn: Vulkan GPU - mnasnet | 2.42 | 2.40 | 2.43 | 2.45
ncnn: Vulkan GPU - efficientnet-b0 | 3.19 | 3.21 | 3.20 | 3.30
ncnn: Vulkan GPU - blazeface | 0.97 | 0.95 | 0.98 | 0.97
ncnn: Vulkan GPU - googlenet | 6.01 | 5.91 | 5.94 | 5.97
ncnn: Vulkan GPU - vgg16 | 24.01 | 23.62 | 23.98 | 24.15
ncnn: Vulkan GPU - resnet18 | 3.99 | 4.04 | 3.98 | 4.00
ncnn: Vulkan GPU - alexnet | 3.20 | 3.29 | 3.19 | 3.33
ncnn: Vulkan GPU - resnet50 | 8.97 | 9.01 | 8.91 | 8.94
ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 | 6.92 | 6.79 | 6.41 | 6.81
ncnn: Vulkan GPU - yolov4-tiny | 10.88 | 10.59 | 10.25 | 9.73
ncnn: Vulkan GPU - squeezenet_ssd | 5.53 | 5.47 | 5.13 | 5.58
ncnn: Vulkan GPU - regnety_400m | 6.51 | 6.44 | 6.46 | 6.52
ncnn: Vulkan GPU - vision_transformer | 27.89 | 26.59 | 26.85 | 26.31
ncnn: Vulkan GPU - FastestDet | 2.74 | 2.48 | 2.07 | 1.81
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | 9.25 | 9.25 | 9.24 | 9.26
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | 90.47 | 91.73 | 87.44 | 92.82
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | 91.86 | 90.59 | 94.63 | 92.60
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | 88.94 | 89.40 | 89.60 | 90.50
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | 9.74 | 9.75 | 9.80 | 9.75
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | 89.96 | 89.92 | 90.09 | 92.08
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | 91.63 | 92.01 | 91.73 | 93.49
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | 88.19 | 90.55 | 90.44 | 89.71
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | 65.97 | 65.92 | 66.05 | 65.86
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | 412.57 | 418.20 | 414.72 | 410.23
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 | 396.01 | 395.63 | 391.87 | 397.20
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | 372.75 | 372.04 | 371.89 | 371.23

NCNN

NCNN 20241226, Target: CPU (OpenBenchmarking.org; ms, Fewer Is Better)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Target: CPU - Model: mobilenet
  a: 6.84 (MIN: 6.66 / MAX: 13.62)
  b: 6.77 (MIN: 6.49 / MAX: 28.5)
  c: 6.94 (MIN: 6.9 / MAX: 7.15)
  d: 6.79 (MIN: 6.74 / MAX: 6.92)
  SE +/- 0.06, N = 3; SE +/- 0.05, N = 9

Target: CPU-v2-v2 - Model: mobilenet-v2
  a: 2.45 (MIN: 2.22 / MAX: 3.55)
  b: 2.47 (MIN: 2.12 / MAX: 4.06)
  c: 2.56 (MIN: 2.54 / MAX: 3.26)
  d: 2.59 (MIN: 2.57 / MAX: 3.32)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9

Target: CPU-v3-v3 - Model: mobilenet-v3
  a: 2.37 (MIN: 2.26 / MAX: 3.84)
  b: 2.34 (MIN: 2.07 / MAX: 3.58)
  c: 2.40 (MIN: 2.37 / MAX: 3.08)
  d: 2.45 (MIN: 2.43 / MAX: 3.25)
  SE +/- 0.04, N = 3; SE +/- 0.04, N = 9

Target: CPU - Model: shufflenet-v2
  a: 2.19 (MIN: 2.08 / MAX: 3.1)
  b: 2.20 (MIN: 2.06 / MAX: 6.27)
  c: 2.23 (MIN: 2.21 / MAX: 2.91)
  d: 2.24 (MIN: 2.21 / MAX: 2.51)
  SE +/- 0.04, N = 3; SE +/- 0.02, N = 9

Target: CPU - Model: mnasnet
  a: 2.20 (MIN: 1.99 / MAX: 3.11)
  b: 2.32 (MIN: 1.95 / MAX: 3.84)
  c: 2.43 (MIN: 2.4 / MAX: 3.29)
  d: 2.53 (MIN: 2.47 / MAX: 9.55)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9

Target: CPU - Model: efficientnet-b0
  a: 3.09 (MIN: 2.93 / MAX: 10.74)
  b: 3.13 (MIN: 2.84 / MAX: 4.74)
  c: 3.21 (MIN: 3.17 / MAX: 4.38)
  d: 3.23 (MIN: 3.2 / MAX: 4.39)
  SE +/- 0.10, N = 3; SE +/- 0.04, N = 9

Target: CPU - Model: blazeface
  a: 0.92 (MIN: 0.85 / MAX: 1.88)
  b: 0.94 (MIN: 0.82 / MAX: 1.84)
  c: 0.96 (MIN: 0.95 / MAX: 1.07)
  d: 0.99 (MIN: 0.98 / MAX: 1.23)
  SE +/- 0.04, N = 3; SE +/- 0.02, N = 9

Target: CPU - Model: googlenet
  a: 5.78 (MIN: 5.59 / MAX: 8.98)
  b: 5.84 (MIN: 5.54 / MAX: 9.4)
  c: 5.85 (MIN: 5.8 / MAX: 6.04)
  d: 5.95 (MIN: 5.83 / MAX: 6.11)
  SE +/- 0.09, N = 3; SE +/- 0.04, N = 9

Target: CPU - Model: vgg16
  a: 23.65 (MIN: 22.18 / MAX: 73.49)
  b: 23.52 (MIN: 22.14 / MAX: 92.6)
  c: 24.77 (MIN: 22.23 / MAX: 104.18)
  d: 23.64 (MIN: 22.18 / MAX: 78.61)
  SE +/- 0.46, N = 3; SE +/- 0.18, N = 9

Target: CPU - Model: resnet18
  a: 4.05 (MIN: 3.94 / MAX: 7.83)
  b: 4.00 (MIN: 3.91 / MAX: 4.2)
  c: 3.99 (MIN: 3.91 / MAX: 4.15)
  d: 4.05 (MIN: 3.97 / MAX: 4.17)
  SE +/- 0.02, N = 3; SE +/- 0.01, N = 9

Target: CPU - Model: alexnet
  a: 3.30 (MIN: 3.18 / MAX: 3.81)
  b: 3.26 (MIN: 3.18 / MAX: 3.61)
  c: 3.20 (MIN: 3.17 / MAX: 3.32)
  d: 3.30 (MIN: 3.27 / MAX: 3.41)
  SE +/- 0.05, N = 3; SE +/- 0.03, N = 9

Target: CPU - Model: resnet50
  a: 9.01 (MIN: 8.86 / MAX: 16.7)
  b: 9.01 (MIN: 8.81 / MAX: 16.39)
  c: 9.10 (MIN: 9 / MAX: 9.35)
  d: 9.02 (MIN: 8.97 / MAX: 9.17)
  SE +/- 0.06, N = 3; SE +/- 0.02, N = 9

Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
  a: 6.84 (MIN: 6.66 / MAX: 13.62)
  b: 6.77 (MIN: 6.49 / MAX: 28.5)
  c: 6.94 (MIN: 6.9 / MAX: 7.15)
  d: 6.79 (MIN: 6.74 / MAX: 6.92)
  SE +/- 0.06, N = 3; SE +/- 0.05, N = 9

Target: CPU - Model: yolov4-tiny
  a: 10.46 (MIN: 9.62 / MAX: 56.72)
  b: 10.54 (MIN: 10.21 / MAX: 26.19)
  c: 10.41 (MIN: 9.46 / MAX: 19.13)
  d: 10.73 (MIN: 10.49 / MAX: 15.56)
  SE +/- 0.27, N = 3; SE +/- 0.05, N = 9

Target: CPU - Model: squeezenet_ssd
  a: 5.48 (MIN: 5.26 / MAX: 9.58)
  b: 5.49 (MIN: 5.09 / MAX: 5.87)
  c: 5.76 (MIN: 5.72 / MAX: 5.91)
  d: 5.67 (MIN: 5.6 / MAX: 5.83)
  SE +/- 0.08, N = 3; SE +/- 0.07, N = 9

Target: CPU - Model: regnety_400m
  a: 6.34 (MIN: 6.05 / MAX: 12.6)
  b: 6.34 (MIN: 5.91 / MAX: 8.04)
  c: 6.62 (MIN: 6.29 / MAX: 34.38)
  d: 6.64 (MIN: 6.58 / MAX: 6.78)
  SE +/- 0.11, N = 3; SE +/- 0.06, N = 9

Target: CPU - Model: vision_transformer
  a: 26.57 (MIN: 25.83 / MAX: 50.83)
  b: 26.82 (MIN: 26.28 / MAX: 68.9)
  c: 26.56 (MIN: 26.46 / MAX: 26.89)
  d: 26.49 (MIN: 26.38 / MAX: 26.84)
  SE +/- 0.03, N = 3; SE +/- 0.12, N = 9

Target: CPU - Model: FastestDet
  a: 2.46 (MIN: 1.77 / MAX: 4.58)
  b: 2.66 (MIN: 2.45 / MAX: 2.99)
  c: 1.80 (MIN: 1.77 / MAX: 1.88)
  d: 2.64 (MIN: 2.61 / MAX: 2.75)
  SE +/- 0.33, N = 3; SE +/- 0.04, N = 9

NCNN 20241226, Target: Vulkan GPU (OpenBenchmarking.org; ms, Fewer Is Better)
1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Target: Vulkan GPU - Model: mobilenet
  a: 6.92 (MIN: 6.72 / MAX: 58.33)
  b: 6.79 (MIN: 6.33 / MAX: 38.29)
  c: 6.41 (MIN: 6.38 / MAX: 6.51)
  d: 6.81 (MIN: 6.76 / MAX: 6.98)
  SE +/- 0.08, N = 3; SE +/- 0.06, N = 15

Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2
  a: 2.56 (MIN: 2.52 / MAX: 3.79)
  b: 2.53 (MIN: 2.2 / MAX: 3.58)
  c: 2.57 (MIN: 2.55 / MAX: 3.69)
  d: 2.61 (MIN: 2.59 / MAX: 3.3)
  SE +/- 0.01, N = 3; SE +/- 0.03, N = 15

Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3
  a: 2.40 (MIN: 2.36 / MAX: 3.96)
  b: 2.40 (MIN: 2.2 / MAX: 3.83)
  c: 2.42 (MIN: 2.39 / MAX: 3.74)
  d: 2.41 (MIN: 2.38 / MAX: 3.27)
  SE +/- 0.01, N = 3; SE +/- 0.02, N = 15

Target: Vulkan GPU - Model: shufflenet-v2
  a: 2.27 (MIN: 2.22 / MAX: 7.29)
  b: 2.25 (MIN: 2.13 / MAX: 4.56)
  c: 2.26 (MIN: 2.24 / MAX: 2.74)
  d: 2.27 (MIN: 2.24 / MAX: 3.53)
  SE +/- 0.01, N = 3; SE +/- 0.01, N = 15

Target: Vulkan GPU - Model: mnasnet
  a: 2.42 (MIN: 2.39 / MAX: 4.45)
  b: 2.40 (MIN: 2.1 / MAX: 9.89)
  c: 2.43 (MIN: 2.4 / MAX: 3.58)
  d: 2.45 (MIN: 2.43 / MAX: 2.9)
  SE +/- 0.01, N = 3; SE +/- 0.03, N = 15

Target: Vulkan GPU - Model: efficientnet-b0
  a: 3.19 (MIN: 3.14 / MAX: 4.7)
  b: 3.21 (MIN: 2.98 / MAX: 12.56)
  c: 3.20 (MIN: 3.17 / MAX: 4.07)
  d: 3.30 (MIN: 3.25 / MAX: 5.07)
  SE +/- 0.01, N = 3; SE +/- 0.02, N = 15

Target: Vulkan GPU - Model: blazeface
  a: 0.97 (MIN: 0.95 / MAX: 1.59)
  b: 0.95 (MIN: 0.84 / MAX: 1.96)
  c: 0.98 (MIN: 0.97 / MAX: 1)
  d: 0.97 (MIN: 0.96 / MAX: 1.08)
  SE +/- 0.01, N = 3; SE +/- 0.01, N = 15

Target: Vulkan GPU - Model: googlenet
  a: 6.01 (MIN: 5.82 / MAX: 52.97)
  b: 5.91 (MIN: 5.57 / MAX: 7.74)
  c: 5.94 (MIN: 5.88 / MAX: 6.31)
  d: 5.97 (MIN: 5.91 / MAX: 6.16)
  SE +/- 0.10, N = 3; SE +/- 0.04, N = 15

Target: Vulkan GPU - Model: vgg16
  a: 24.01 (MIN: 22.2 / MAX: 71.88)
  b: 23.62 (MIN: 22.14 / MAX: 85.86)
  c: 23.98 (MIN: 22.14 / MAX: 76.35)
  d: 24.15 (MIN: 22.23 / MAX: 116.39)
  SE +/- 0.41, N = 3; SE +/- 0.11, N = 15

Target: Vulkan GPU - Model: resnet18
  a: 3.99 (MIN: 3.91 / MAX: 9.9)
  b: 4.04 (MIN: 3.88 / MAX: 28.94)
  c: 3.98 (MIN: 3.92 / MAX: 4.16)
  d: 4.00 (MIN: 3.94 / MAX: 4.14)
  SE +/- 0.03, N = 3; SE +/- 0.02, N = 15

Target: Vulkan GPU - Model: alexnet
  a: 3.20 (MIN: 3.17 / MAX: 3.59)
  b: 3.29 (MIN: 3.16 / MAX: 4.4)
  c: 3.19 (MIN: 3.17 / MAX: 3.38)
  d: 3.33 (MIN: 3.3 / MAX: 3.58)
  SE +/- 0.00, N = 3; SE +/- 0.02, N = 15

Target: Vulkan GPU - Model: resnet50
  a: 8.97 (MIN: 8.83 / MAX: 34.28)
  b: 9.01 (MIN: 8.53 / MAX: 29.89)
  c: 8.91 (MIN: 8.83 / MAX: 9.33)
  d: 8.94 (MIN: 8.87 / MAX: 9.18)
  SE +/- 0.03, N = 3; SE +/- 0.03, N = 15

Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3
  a: 6.92 (MIN: 6.72 / MAX: 58.33)
  b: 6.79 (MIN: 6.33 / MAX: 38.29)
  c: 6.41 (MIN: 6.38 / MAX: 6.51)
  d: 6.81 (MIN: 6.76 / MAX: 6.98)
  SE +/- 0.08, N = 3; SE +/- 0.06, N = 15

Target: Vulkan GPU - Model: yolov4-tiny
  a: 10.88 (MIN: 10.45 / MAX: 67.19)
  b: 10.59 (MIN: 9.55 / MAX: 64.48)
  c: 10.25 (MIN: 9.54 / MAX: 84.53)
  d: 9.73 (MIN: 9.47 / MAX: 16.01)
  SE +/- 0.21, N = 3; SE +/- 0.06, N = 15

Target: Vulkan GPU - Model: squeezenet_ssd
  a: 5.53 (MIN: 5.43 / MAX: 5.83)
  b: 5.47 (MIN: 5.09 / MAX: 5.97)
  c: 5.13 (MIN: 5.08 / MAX: 5.62)
  d: 5.58 (MIN: 5.53 / MAX: 5.71)
  SE +/- 0.03, N = 3; SE +/- 0.06, N = 15

Target: Vulkan GPU - Model: regnety_400m
  a: 6.51 (MIN: 6.33 / MAX: 29.36)
  b: 6.44 (MIN: 6.07 / MAX: 23.61)
  c: 6.46 (MIN: 6.4 / MAX: 6.61)
  d: 6.52 (MIN: 6.49 / MAX: 6.69)
  SE +/- 0.04, N = 3; SE +/- 0.03, N = 15

Target: Vulkan GPU - Model: vision_transformer
  a: 27.89 (MIN: 26.33 / MAX: 415.2)
  b: 26.59 (MIN: 25.99 / MAX: 55.19)
  c: 26.85 (MIN: 26.56 / MAX: 37.88)
  d: 26.31 (MIN: 25.93 / MAX: 42.01)
  SE +/- 1.18, N = 3; SE +/- 0.09, N = 15

Target: Vulkan GPU - Model: FastestDet
  a: 2.74 (MIN: 2.59 / MAX: 2.9)
  b: 2.48 (MIN: 1.79 / MAX: 95.79)
  c: 2.07 (MIN: 2.05 / MAX: 2.32)
  d: 1.81 (MIN: 1.79 / MAX: 1.89)
  SE +/- 0.06, N = 3; SE +/- 0.09, N = 15
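
Runs a through d come from the same result file for a single system, so differences between them are presumably run-to-run variation, not hardware differences. The sketch below quantifies that variation for three tests, using averages copied from this page. The metric (slowest minus fastest, relative to fastest) and the name `spread_percent` are assumptions made for illustration; this is not necessarily how the Phoronix Test Suite computes its own spread or overview figures.

```python
# Average inference times in ms, copied from this page, in run order a, b, c, d.
results_ms = {
    "CPU - resnet50": [9.01, 9.01, 9.10, 9.02],
    "CPU - FastestDet": [2.46, 2.66, 1.80, 2.64],
    "Vulkan GPU - FastestDet": [2.74, 2.48, 2.07, 1.81],
}


def spread_percent(values):
    """Gap between the slowest and fastest run, as a percentage of the fastest."""
    fastest, slowest = min(values), max(values)
    return (slowest - fastest) / fastest * 100.0


spreads = {name: spread_percent(values) for name, values in results_ms.items()}
for name, pct in spreads.items():
    print(f"{name}: {pct:.1f}% spread across runs")
```

Under this metric resnet50 is stable at about 1%, while both FastestDet results differ by roughly half between the fastest and slowest run, so single-run FastestDet numbers from this file should be read with caution.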

Llama.cpp

Llama.cpp b4397, Backend: CPU BLAS (OpenBenchmarking.org; Tokens Per Second, More Is Better)
1. (CXX) g++ options: -O3

Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
  a: 9.25 | b: 9.25 | c: 9.24 | d: 9.26
  SE +/- 0.00, N = 3; SE +/- 0.00, N = 3

Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
  a: 90.47 | b: 91.73 | c: 87.44 | d: 92.82
  SE +/- 0.87, N = 6; SE +/- 0.93, N = 3

Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
  a: 91.86 | b: 90.59 | c: 94.63 | d: 92.60
  SE +/- 0.48, N = 3; SE +/- 0.58, N = 3

Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
  a: 88.94 | b: 89.40 | c: 89.60 | d: 90.50
  SE +/- 0.42, N = 3; SE +/- 0.55, N = 3

Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
  a: 9.74 | b: 9.75 | c: 9.80 | d: 9.75
  SE +/- 0.02, N = 3; SE +/- 0.01, N = 3

Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
  a: 89.96 | b: 89.92 | c: 90.09 | d: 92.08
  SE +/- 0.46, N = 3; SE +/- 0.79, N = 8

Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
  a: 91.63 | b: 92.01 | c: 91.73 | d: 93.49
  SE +/- 1.24, N = 3; SE +/- 1.02, N = 3

Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
  a: 88.19 | b: 90.55 | c: 90.44 | d: 89.71
  SE +/- 0.49, N = 3; SE +/- 0.47, N = 3

Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
  a: 65.97 | b: 65.92 | c: 66.05 | d: 65.86
  SE +/- 0.06, N = 3; SE +/- 0.03, N = 3

Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512
  a: 412.57 | b: 418.20 | c: 414.72 | d: 410.23
  SE +/- 3.32, N = 3; SE +/- 3.52, N = 3

Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024
  a: 396.01 | b: 395.63 | c: 391.87 | d: 397.20
  SE +/- 2.61, N = 3; SE +/- 2.77, N = 3

Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
  a: 372.75 | b: 372.04 | c: 371.89 | d: 371.23
  SE +/- 0.84, N = 3; SE +/- 0.66, N = 3
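
Tokens-per-second figures can be easier to interpret as wall-clock time. The sketch below converts the run a results for Llama-3.1-Tulu-3-8B-Q8_0 into approximate seconds per test. It assumes the reported rate is the average throughput over exactly the token count named in the test, and it ignores model load time and other overhead. The helper `seconds_for` is hypothetical and is not part of llama.cpp.

```python
# Run "a" throughput for Llama-3.1-Tulu-3-8B-Q8_0, in tokens per second,
# keyed by (test name, token count) as listed on this page.
run_a_tokens_per_second = {
    ("Text Generation", 128): 9.25,
    ("Prompt Processing", 512): 90.47,
    ("Prompt Processing", 1024): 91.86,
    ("Prompt Processing", 2048): 88.94,
}


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Approximate time to process `tokens` at a constant throughput."""
    return tokens / tokens_per_second


estimates = {
    test: seconds_for(test[1], rate)
    for test, rate in run_a_tokens_per_second.items()
}
for (name, tokens), seconds in estimates.items():
    print(f"{name} {tokens}: about {seconds:.2f} s")
```

Under these assumptions, generating 128 tokens takes longer than processing a 1024-token prompt, because generation runs at roughly a tenth of the prompt-processing rate on this CPU.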

48 Results Shown

NCNN:
  CPU - mobilenet
  CPU-v2-v2 - mobilenet-v2
  CPU-v3-v3 - mobilenet-v3
  CPU - shufflenet-v2
  CPU - mnasnet
  CPU - efficientnet-b0
  CPU - blazeface
  CPU - googlenet
  CPU - vgg16
  CPU - resnet18
  CPU - alexnet
  CPU - resnet50
  CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  CPU - yolov4-tiny
  CPU - squeezenet_ssd
  CPU - regnety_400m
  CPU - vision_transformer
  CPU - FastestDet
  Vulkan GPU - mobilenet
  Vulkan GPU-v2-v2 - mobilenet-v2
  Vulkan GPU-v3-v3 - mobilenet-v3
  Vulkan GPU - shufflenet-v2
  Vulkan GPU - mnasnet
  Vulkan GPU - efficientnet-b0
  Vulkan GPU - blazeface
  Vulkan GPU - googlenet
  Vulkan GPU - vgg16
  Vulkan GPU - resnet18
  Vulkan GPU - alexnet
  Vulkan GPU - resnet50
  Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3
  Vulkan GPU - yolov4-tiny
  Vulkan GPU - squeezenet_ssd
  Vulkan GPU - regnety_400m
  Vulkan GPU - vision_transformer
  Vulkan GPU - FastestDet
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048