dddda AMD Ryzen AI 9 365 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2412110-NE-DDDDA678639&grr .
dddda Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c AMD Ryzen AI 9 365 @ 4.31GHz (10 Cores / 20 Threads) ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) AMD Device 1507 4 x 6GB LPDDR5-7500MT/s Micron MT62F1536M32D4DS-026 1024GB MTFDKBA1T0QFM-1BD1AABGB AMD Radeon 512MB AMD Rembrandt Radeon HD Audio MEDIATEK Device 7925 Ubuntu 24.10 6.12.0-rc7-phx-eraps (x86_64) GNOME Shell 47.0 X Server + Wayland 4.6 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 DRM 3.59) GCC 14.2.0 ext4 2880x1800 OpenBenchmarking.org Kernel Details - amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Python Details - Python 3.12.7 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; PBRSB-eIBRS: Not affected; BHI: Not affected; ERAPS hardware RSB flush + srbds: Not affected + tsx_async_abort: Not affected
dddda llamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048 openvino: Noise Suppression Poconet-Like FP16 - CPU openvino: Noise Suppression Poconet-Like FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Person Vehicle Bike Detection FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Road Segmentation ADAS FP16 - CPU openvino: Vehicle Detection FP16 - CPU openvino: Vehicle Detection FP16 - CPU llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 openvino: Machine Translation EN To DE FP16 - CPU openvino: Machine Translation EN To DE FP16 - CPU llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128 openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Handwritten English Recognition FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU openvino: Road Segmentation ADAS FP16-INT8 - CPU llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024 openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Token openvino-genai: Gemma-7b-int4-ov - CPU - Time To First Token openvino-genai: Gemma-7b-int4-ov - CPU llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128 llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 openvino: Face Detection FP16 - CPU openvino: Face Detection FP16 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Face Detection FP16-INT8 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP16 - CPU openvino: Person Detection FP32 - CPU openvino: Person Detection FP32 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Person Re-Identification Retail FP16 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Vehicle Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Face Detection Retail FP16-INT8 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Handwritten English Recognition FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Weld Porosity Detection FP16-INT8 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Face Detection Retail FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino: Age Gender Recognition Retail 0013 FP16 - CPU openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Token openvino-genai: Falcon-7b-instruct-int4-ov - CPU llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128 llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024 llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16 llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16 openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU openvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPU - Time Per Output Token openvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPU - Time To First Token openvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPU llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512 llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256 a b c 0.08 33.24 33.64 126.06 32768 16.30 607.91 9.13 437.20 31.72 126.02 14.09 283.78 36.83 36.83 35.89 81.62 48.99 35.93 13.69 39.27 254.39 23.49 170.05 16384 102.99 204.63 9.71 32768 26.24 10.23 10.81 865.46 4.61 467.21 8.54 90.96 43.95 90.68 44.07 7.37 540.95 8.54 467.17 6.27 1587.70 40.66 245.49 0.53 17776.30 22.37 446.41 11.25 886.77 4.19 950.04 0.77 12337.97 70.17 146 14.25 33.98 13.95 8192 147.47 16384 26.51 4096 34.27 50.42 93.7 19.83 31.81 34.26 31.43 158.11 8192 61.76 4096 0.07 34.19 34.47 153.71 32768 14.88 664.33 8.28 481.89 29.22 136.7 12.7 314.25 40.74 40.1 36.1 73.35 54.49 36.41 13.85 35.29 282.93 22.72 175.74 16384 112.38 215.32 8.9 32768 26.44 10.28 10.83 867.81 4.59 449 8.89 91.66 43.6 91.14 43.86 7.15 557.7 7.87 507.16 5.82 1712.41 37.4 266.88 0.49 18962.53 20.71 481.79 10.46 953.24 3.85 1032.88 0.75 12742.09 83.28 181.63 12.01 33.5 11.99 8192 142.57 16384 22.91 4096 29.99 53.82 100.02 18.58 32.93 35.97 30.36 157.49 8192 60.43 4096 0.07 33.65 33.81 137.25 32768 14.96 661.16 8.4 474.85 29.99 133.18 12.78 312.32 38.9 38.45 36.19 74.77 53.46 35.66 13.89 35.9 278.1 21.89 182.39 16384 105.93 204.71 9.44 32768 26.31 10.31 10.83 882.69 4.52 497.6 8.03 91.4 43.73 92.28 43.31 7.11 561.16 8.04 496.31 5.94 1676.72 39.51 252.75 0.5 18885.08 21.76 458.73 10.83 920.13 4.23 939.9 0.74 12864.63 71.15 148.94 14.06 33.65 12.17 8192 147.12 16384 22.72 4096 30.14 54.9 103.28 18.22 31.87 34.11 31.38 158.18 8192 61.36 4096 OpenBenchmarking.org
Llamafile Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16 a b c 0.018 0.036 0.054 0.072 0.09 SE +/- 0.00, N = 3 0.08 0.07 0.07
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 a b c 8 16 24 32 40 SE +/- 0.06, N = 3 33.24 34.19 33.65 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 a b c 8 16 24 32 40 SE +/- 0.15, N = 3 33.64 34.47 33.81 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 a b c 30 60 90 120 150 SE +/- 1.74, N = 15 126.06 153.71 137.25 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048 a b c 7K 14K 21K 28K 35K SE +/- 0.00, N = 3 32768 32768 32768
OpenVINO Model: Noise Suppression Poconet-Like FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Noise Suppression Poconet-Like FP16 - Device: CPU a b c 4 8 12 16 20 SE +/- 0.12, N = 15 16.30 14.88 14.96 MIN: 8.05 / MAX: 48.27 MIN: 9.08 / MAX: 23.51 MIN: 9.32 / MAX: 24.49 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Noise Suppression Poconet-Like FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Noise Suppression Poconet-Like FP16 - Device: CPU a b c 140 280 420 560 700 SE +/- 4.56, N = 15 607.91 664.33 661.16 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Vehicle Bike Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Person Vehicle Bike Detection FP16 - Device: CPU a b c 3 6 9 12 15 SE +/- 0.06, N = 15 9.13 8.28 8.40 MIN: 4.67 / MAX: 38.31 MIN: 5.61 / MAX: 29.51 MIN: 5.46 / MAX: 31.1 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Vehicle Bike Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Person Vehicle Bike Detection FP16 - Device: CPU a b c 100 200 300 400 500 SE +/- 2.99, N = 15 437.20 481.89 474.85 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Road Segmentation ADAS FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Road Segmentation ADAS FP16 - Device: CPU a b c 7 14 21 28 35 SE +/- 0.21, N = 15 31.72 29.22 29.99 MIN: 14.27 / MAX: 66.49 MIN: 22.58 / MAX: 35.18 MIN: 14.48 / MAX: 36.5 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Road Segmentation ADAS FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Road Segmentation ADAS FP16 - Device: CPU a b c 30 60 90 120 150 SE +/- 0.86, N = 15 126.02 136.70 133.18 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Vehicle Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Vehicle Detection FP16 - Device: CPU a b c 4 8 12 16 20 SE +/- 0.15, N = 15 14.09 12.70 12.78 MIN: 6.67 / MAX: 44.25 MIN: 6.35 / MAX: 22.55 MIN: 9.47 / MAX: 20.55 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Vehicle Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Vehicle Detection FP16 - Device: CPU a b c 70 140 210 280 350 SE +/- 3.12, N = 15 283.78 314.25 312.32 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 a b c 9 18 27 36 45 SE +/- 0.29, N = 10 36.83 40.74 38.90 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 a b c 9 18 27 36 45 SE +/- 0.29, N = 9 36.83 40.10 38.45 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 a b c 8 16 24 32 40 SE +/- 0.50, N = 3 35.89 36.10 36.19 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Machine Translation EN To DE FP16 - Device: CPU a b c 20 40 60 80 100 SE +/- 0.57, N = 12 81.62 73.35 74.77 MIN: 51.47 / MAX: 126.22 MIN: 56.95 / MAX: 89.78 MIN: 50.72 / MAX: 91.82 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Machine Translation EN To DE FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Machine Translation EN To DE FP16 - Device: CPU a b c 12 24 36 48 60 SE +/- 0.34, N = 12 48.99 54.49 53.46 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 a b c 8 16 24 32 40 SE +/- 0.38, N = 3 35.93 36.41 35.66 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128 a b c 4 8 12 16 20 SE +/- 0.14, N = 3 13.69 13.85 13.89
OpenVINO Model: Handwritten English Recognition FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Handwritten English Recognition FP16-INT8 - Device: CPU a b c 9 18 27 36 45 SE +/- 0.33, N = 9 39.27 35.29 35.90 MIN: 19.02 / MAX: 80 MIN: 23.48 / MAX: 69.74 MIN: 22.3 / MAX: 71.26 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Handwritten English Recognition FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Handwritten English Recognition FP16-INT8 - Device: CPU a b c 60 120 180 240 300 SE +/- 2.11, N = 9 254.39 282.93 278.10 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Road Segmentation ADAS FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Road Segmentation ADAS FP16-INT8 - Device: CPU a b c 6 12 18 24 30 SE +/- 0.22, N = 7 23.49 22.72 21.89 MIN: 12 / MAX: 52.69 MIN: 17.07 / MAX: 30.25 MIN: 16.13 / MAX: 43.35 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Road Segmentation ADAS FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Road Segmentation ADAS FP16-INT8 - Device: CPU a b c 40 80 120 160 200 SE +/- 1.58, N = 7 170.05 175.74 182.39 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024 a b c 4K 8K 12K 16K 20K SE +/- 0.00, N = 3 16384 16384 16384
OpenVINO GenAI Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token a b c 30 60 90 120 150 102.99 112.38 105.93
OpenVINO GenAI Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token a b c 50 100 150 200 250 204.63 215.32 204.71
OpenVINO GenAI Model: Gemma-7b-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Gemma-7b-int4-ov - Device: CPU a b c 3 6 9 12 15 9.71 8.90 9.44
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048 a b c 7K 14K 21K 28K 35K SE +/- 0.00, N = 3 32768 32768 32768
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128 a b c 6 12 18 24 30 SE +/- 0.22, N = 3 26.24 26.44 26.31
Llama.cpp Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 a b c 3 6 9 12 15 SE +/- 0.06, N = 3 10.23 10.28 10.31 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llama.cpp Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 a b c 3 6 9 12 15 SE +/- 0.04, N = 3 10.81 10.83 10.83 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Face Detection FP16 - Device: CPU a b c 200 400 600 800 1000 SE +/- 4.05, N = 3 865.46 867.81 882.69 MIN: 681.88 / MAX: 935.79 MIN: 432.97 / MAX: 929.79 MIN: 811.64 / MAX: 914.61 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Face Detection FP16 - Device: CPU a b c 1.0373 2.0746 3.1119 4.1492 5.1865 SE +/- 0.02, N = 3 4.61 4.59 4.52 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Face Detection FP16-INT8 - Device: CPU a b c 110 220 330 440 550 SE +/- 4.94, N = 3 467.21 449.00 497.60 MIN: 389.62 / MAX: 499.44 MIN: 400.92 / MAX: 475.03 MIN: 415.79 / MAX: 524.4 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Face Detection FP16-INT8 - Device: CPU a b c 2 4 6 8 10 SE +/- 0.09, N = 3 8.54 8.89 8.03 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Person Detection FP16 - Device: CPU a b c 20 40 60 80 100 SE +/- 0.10, N = 3 90.96 91.66 91.40 MIN: 45.37 / MAX: 129.41 MIN: 70.22 / MAX: 107.21 MIN: 73 / MAX: 111.22 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Person Detection FP16 - Device: CPU a b c 10 20 30 40 50 SE +/- 0.04, N = 3 43.95 43.60 43.73 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Person Detection FP32 - Device: CPU a b c 20 40 60 80 100 SE +/- 0.24, N = 3 90.68 91.14 92.28 MIN: 42.08 / MAX: 127.52 MIN: 74.95 / MAX: 112.27 MIN: 71.08 / MAX: 120.33 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Detection FP32 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Person Detection FP32 - Device: CPU a b c 10 20 30 40 50 SE +/- 0.12, N = 3 44.07 43.86 43.31 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Re-Identification Retail FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Person Re-Identification Retail FP16 - Device: CPU a b c 2 4 6 8 10 SE +/- 0.06, N = 3 7.37 7.15 7.11 MIN: 3.58 / MAX: 36.29 MIN: 4.45 / MAX: 34.36 MIN: 4.3 / MAX: 30.31 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Person Re-Identification Retail FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Person Re-Identification Retail FP16 - Device: CPU a b c 120 240 360 480 600 SE +/- 4.09, N = 3 540.95 557.70 561.16 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Vehicle Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Vehicle Detection FP16-INT8 - Device: CPU a b c 2 4 6 8 10 SE +/- 0.06, N = 3 8.54 7.87 8.04 MIN: 4.72 / MAX: 37.1 MIN: 4.58 / MAX: 16.38 MIN: 4.77 / MAX: 36.82 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Vehicle Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Vehicle Detection FP16-INT8 - Device: CPU a b c 110 220 330 440 550 SE +/- 3.54, N = 3 467.17 507.16 496.31 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection Retail FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Face Detection Retail FP16-INT8 - Device: CPU a b c 2 4 6 8 10 SE +/- 0.02, N = 3 6.27 5.82 5.94 MIN: 2.63 / MAX: 36.96 MIN: 2.57 / MAX: 33.51 MIN: 2.83 / MAX: 34.92 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection Retail FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Face Detection Retail FP16-INT8 - Device: CPU a b c 400 800 1200 1600 2000 SE +/- 5.90, N = 3 1587.70 1712.41 1676.72 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Handwritten English Recognition FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Handwritten English Recognition FP16 - Device: CPU a b c 9 18 27 36 45 SE +/- 0.04, N = 3 40.66 37.40 39.51 MIN: 19.98 / MAX: 76.17 MIN: 23.82 / MAX: 72.21 MIN: 24.79 / MAX: 75.37 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Handwritten English Recognition FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Handwritten English Recognition FP16 - Device: CPU a b c 60 120 180 240 300 SE +/- 0.26, N = 3 245.49 266.88 252.75 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU a b c 0.1193 0.2386 0.3579 0.4772 0.5965 SE +/- 0.00, N = 3 0.53 0.49 0.50 MIN: 0.2 / MAX: 27.59 MIN: 0.21 / MAX: 7.16 MIN: 0.2 / MAX: 24.4 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU a b c 4K 8K 12K 16K 20K SE +/- 74.62, N = 3 17776.30 18962.53 18885.08 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Weld Porosity Detection FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Weld Porosity Detection FP16 - Device: CPU a b c 5 10 15 20 25 SE +/- 0.32, N = 3 22.37 20.71 21.76 MIN: 9.45 / MAX: 55.55 MIN: 9.11 / MAX: 43.48 MIN: 9.51 / MAX: 51.32 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Weld Porosity Detection FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Weld Porosity Detection FP16 - Device: CPU a b c 100 200 300 400 500 SE +/- 6.37, N = 3 446.41 481.79 458.73 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Weld Porosity Detection FP16-INT8 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Weld Porosity Detection FP16-INT8 - Device: CPU a b c 3 6 9 12 15 SE +/- 0.13, N = 3 11.25 10.46 10.83 MIN: 4.28 / MAX: 41.13 MIN: 4.65 / MAX: 39.08 MIN: 5.05 / MAX: 41.09 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Weld Porosity Detection FP16-INT8 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Weld Porosity Detection FP16-INT8 - Device: CPU a b c 200 400 600 800 1000 SE +/- 10.32, N = 3 886.77 953.24 920.13 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection Retail FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Face Detection Retail FP16 - Device: CPU a b c 0.9518 1.9036 2.8554 3.8072 4.759 SE +/- 0.02, N = 3 4.19 3.85 4.23 MIN: 2.09 / MAX: 31 MIN: 2.03 / MAX: 29.88 MIN: 2.29 / MAX: 29.37 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Face Detection Retail FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Face Detection Retail FP16 - Device: CPU a b c 200 400 600 800 1000 SE +/- 5.37, N = 3 950.04 1032.88 939.90 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU OpenBenchmarking.org ms, Fewer Is Better OpenVINO 2024.5 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU a b c 0.1733 0.3466 0.5199 0.6932 0.8665 SE +/- 0.01, N = 3 0.77 0.75 0.74 MIN: 0.28 / MAX: 28.44 MIN: 0.28 / MAX: 25.7 MIN: 0.29 / MAX: 21.6 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU OpenBenchmarking.org FPS, More Is Better OpenVINO 2024.5 Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU a b c 3K 6K 9K 12K 15K SE +/- 70.01, N = 3 12337.97 12742.09 12864.63 1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs
OpenVINO GenAI Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token a b c 20 40 60 80 100 70.17 83.28 71.15
OpenVINO GenAI Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token a b c 40 80 120 160 200 146.00 181.63 148.94
OpenVINO GenAI Model: Falcon-7b-instruct-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Falcon-7b-instruct-int4-ov - Device: CPU a b c 4 8 12 16 20 14.25 12.01 14.06
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128 a b c 8 16 24 32 40 SE +/- 0.12, N = 3 33.98 33.50 33.65
Llamafile Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16 a b c 4 8 12 16 20 SE +/- 0.15, N = 13 13.95 11.99 12.17
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512 a b c 2K 4K 6K 8K 10K SE +/- 0.00, N = 3 8192 8192 8192
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024 a b c 30 60 90 120 150 SE +/- 0.34, N = 3 147.47 142.57 147.12 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024 a b c 4K 8K 12K 16K 20K SE +/- 0.00, N = 3 16384 16384 16384
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16 a b c 6 12 18 24 30 SE +/- 0.28, N = 12 26.51 22.91 22.72
Llamafile Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256 a b c 900 1800 2700 3600 4500 SE +/- 0.00, N = 3 4096 4096 4096
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16 a b c 8 16 24 32 40 SE +/- 0.44, N = 15 34.27 29.99 30.14
OpenVINO GenAI Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token a b c 12 24 36 48 60 50.42 53.82 54.90
OpenVINO GenAI Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token a b c 20 40 60 80 100 93.70 100.02 103.28
OpenVINO GenAI Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU a b c 5 10 15 20 25 19.83 18.58 18.22
OpenVINO GenAI Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time Per Output Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time Per Output Token a b c 8 16 24 32 40 31.81 32.93 31.87
OpenVINO GenAI Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time To First Token OpenBenchmarking.org ms, Fewer Is Better OpenVINO GenAI 2024.5 Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time To First Token a b c 8 16 24 32 40 34.26 35.97 34.11
OpenVINO GenAI Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU OpenBenchmarking.org tokens/s, More Is Better OpenVINO GenAI 2024.5 Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU a b c 7 14 21 28 35 31.43 30.36 31.38
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 a b c 30 60 90 120 150 SE +/- 1.19, N = 3 158.11 157.49 158.18 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512 a b c 2K 4K 6K 8K 10K SE +/- 0.00, N = 3 8192 8192 8192
Llama.cpp Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b4154 Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 a b c 14 28 42 56 70 SE +/- 0.13, N = 3 61.76 60.43 61.36 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
Llamafile Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.16 Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256 a b c 900 1800 2700 3600 4500 SE +/- 0.00, N = 3 4096 4096 4096
Phoronix Test Suite v10.8.5