9654 llama ncnn

Tests for a future article. 2 x AMD EPYC 9654 96-Core testing with a AMD Titanite_4G (RTI1007B BIOS) and ASPEED on Ubuntu 24.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2412296-NE-9654LLAMA11
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results
Show Result Confidence Charts
Allow Limiting Results To Certain Suite(s)

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Toggle/Hide
Result
Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
a
December 29 2024
  1 Hour, 49 Minutes
b
December 29 2024
  1 Hour, 53 Minutes
Invert Behavior (Only Show Selected Data)
  1 Hour, 51 Minutes
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


9654 llama ncnnOpenBenchmarking.orgPhoronix Test Suite2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)AMD Titanite_4G (RTI1007B BIOS)AMD Device 14a41520GB3201GB Micron_7450_MTFDKCB3T2TFS + 257GB Flash DriveASPEEDBroadcom NetXtreme BCM5720 PCIeUbuntu 24.106.11.0-13-generic (x86_64)GNOME Shell 47.0X ServerGCC 14.2.0ext41920x1200ProcessorMotherboardChipsetMemoryDiskGraphicsNetworkOSKernelDesktopDisplay ServerCompilerFile-SystemScreen Resolution9654 Llama Ncnn BenchmarksSystem Logs- Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: acpi-cpufreq performance (Boost: Enabled) - CPU Microcode: 0xa101148 - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

a vs. b ComparisonPhoronix Test SuiteBaseline+4%+4%+8%+8%+12%+12%16%5.6%4.1%2.5%CPU-v3-v3 - mobilenet-v3CPU - vgg1612%CPU - googlenet11.3%CPU - mnasnet10.3%CPU - resnet188.9%CPU - yolov4-tiny8.5%CPU - resnet507.9%CPU - mobilenet7.8%CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov37.8%CPU - alexnet7.2%CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - P.P.2CPU - squeezenet_ssd4.4%CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - P.P.5CPU-v2-v2 - mobilenet-v23.3%CPU - efficientnet-b02.9%CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - P.P.2NCNNNCNNNCNNNCNNNCNNNCNNNCNNNCNNNCNNNCNNLlama.cppNCNNLlama.cppNCNNNCNNLlama.cppab

9654 llama ncnnncnn: CPU - mobilenetncnn: CPU-v2-v2 - mobilenet-v2ncnn: CPU-v3-v3 - mobilenet-v3ncnn: CPU - shufflenet-v2ncnn: CPU - mnasnetncnn: CPU - efficientnet-b0ncnn: CPU - blazefacencnn: CPU - googlenetncnn: CPU - vgg16ncnn: CPU - resnet18ncnn: CPU - alexnetncnn: CPU - resnet50ncnn: CPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3ncnn: CPU - yolov4-tinyncnn: CPU - squeezenet_ssdncnn: CPU - regnety_400mncnn: CPU - vision_transformerncnn: CPU - FastestDetllama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048ab34.8424.5931.6830.6922.3635.7916.3640.0660.4925.4710.5643.434.844451.3142.5171.835.4720.7480.2675.6174.7521.5475.1579.8677.2335.6168.94166.49162.3437.5625.3927.331.2224.6736.8216.3244.667.7627.7311.3246.8537.5647.7453.55141.3671.4935.6820.5579.3175.7278.9621.4178.2578.8579.1736.04169.42164.59161.38OpenBenchmarking.org

NCNN

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mobilenetba91827364537.5634.84MIN: 37.35 / MAX: 39.06MIN: 34.6 / MAX: 36.421. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v2-v2 - Model: mobilenet-v2ba61218243025.3924.59MIN: 25.06 / MAX: 27.15MIN: 24.4 / MAX: 26.241. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU-v3-v3 - Model: mobilenet-v3ba71421283527.3031.68MIN: 27.04 / MAX: 38.58MIN: 27.33 / MAX: 469.661. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: shufflenet-v2ba71421283531.2230.69MIN: 30.69 / MAX: 67.53MIN: 29.52 / MAX: 40.421. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: mnasnetba61218243024.6722.36MIN: 24.37 / MAX: 28.09MIN: 22.21 / MAX: 24.511. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: efficientnet-b0ba81624324036.8235.79MIN: 36.52 / MAX: 51.88MIN: 35.61 / MAX: 37.491. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: blazefaceba4812162016.3216.36MIN: 16.09 / MAX: 34.43MIN: 16.24 / MAX: 17.791. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: googlenetba102030405044.6040.06MIN: 44.28 / MAX: 46.21MIN: 39.88 / MAX: 41.521. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vgg16ba153045607567.7660.49MIN: 66.94 / MAX: 70.22MIN: 59.38 / MAX: 67.141. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet18ba71421283527.7325.47MIN: 27.44 / MAX: 29.28MIN: 25.27 / MAX: 26.991. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: alexnetba369121511.3210.56MIN: 10.59 / MAX: 13.13MIN: 10.2 / MAX: 11.971. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: resnet50ba112233445546.8543.40MIN: 46.41 / MAX: 50.7MIN: 42.52 / MAX: 195.11. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3ba91827364537.5634.84MIN: 37.35 / MAX: 39.06MIN: 34.6 / MAX: 36.421. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: yolov4-tinyba112233445547.7444.00MIN: 46.8 / MAX: 49.69MIN: 42.75 / MAX: 45.751. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: squeezenet_ssdba122436486053.5551.30MIN: 52.9 / MAX: 69.09MIN: 50.89 / MAX: 53.011. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: regnety_400mba306090120150141.36142.51MIN: 139.15 / MAX: 147.1MIN: 141.15 / MAX: 211.341. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: vision_transformerba163248648071.4971.80MIN: 68.86 / MAX: 74.1MIN: 68.53 / MAX: 76.021. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

OpenBenchmarking.orgms, Fewer Is BetterNCNN 20241226Target: CPU - Model: FastestDetba81624324035.6835.47MIN: 35.42 / MAX: 37.42MIN: 35.2 / MAX: 37.171. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128ba51015202520.5520.741. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512ba2040608010079.3180.261. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024ba2040608010075.7275.611. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048ba2040608010078.9674.751. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128ba51015202521.4121.541. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512ba2040608010078.2575.151. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024ba2040608010078.8579.861. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048ba2040608010079.1777.231. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128ba81624324036.0435.601. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512ba4080120160200169.42168.941. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024ba4080120160200164.59166.491. (CXX) g++ options: -O3

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4397Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048ba4080120160200161.38162.341. (CXX) g++ options: -O3