dddda

AMD Ryzen AI 9 365 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.10 via the Phoronix Test Suite.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2412110-NE-DDDDA678639
Jump To Table - Results

View

Do Not Show Noisy Results
Do Not Show Results With Incomplete Data
Do Not Show Results With Little Change/Spread
List Notable Results
Show Result Confidence Charts
Allow Limiting Results To Certain Suite(s)

Statistics

Show Overall Harmonic Mean(s)
Show Overall Geometric Mean
Show Wins / Losses Counts (Pie Chart)
Normalize Results
Remove Outliers Before Calculating Averages

Graph Settings

Force Line Graphs Where Applicable
Convert To Scalar Where Applicable
Prefer Vertical Bar Graphs

Multi-Way Comparison

Condense Multi-Option Tests Into Single Result Graphs

Table

Show Detailed System Result Table

Run Management

Highlight
Result
Toggle/Hide
Result
Result
Identifier
Performance Per
Dollar
Date
Run
  Test
  Duration
a
December 11
  8 Hours, 8 Minutes
b
December 11
  1 Hour, 55 Minutes
c
December 11
  1 Hour, 54 Minutes
Invert Behavior (Only Show Selected Data)
  3 Hours, 59 Minutes

Only show results where is faster than
Only show results matching title/arguments (delimit multiple options with a comma):
Do not show results matching title/arguments (delimit multiple options with a comma):


ddddaOpenBenchmarking.orgPhoronix Test SuiteAMD Ryzen AI 9 365 @ 4.31GHz (10 Cores / 20 Threads)ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS)AMD Device 15074 x 6GB LPDDR5-7500MT/s Micron MT62F1536M32D4DS-0261024GB MTFDKBA1T0QFM-1BD1AABGBAMD Radeon 512MBAMD Rembrandt Radeon HD AudioMEDIATEK Device 7925Ubuntu 24.106.12.0-rc7-phx-eraps (x86_64)GNOME Shell 47.0X Server + Wayland4.6 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 DRM 3.59)GCC 14.2.0ext42880x1800ProcessorMotherboardChipsetMemoryDiskGraphicsAudioNetworkOSKernelDesktopDisplay ServerOpenGLCompilerFile-SystemScreen ResolutionDddda BenchmarksSystem Logs- amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v - Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced - Python 3.12.7- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; PBRSB-eIBRS: Not affected; BHI: Not affected; ERAPS hardware RSB flush + srbds: Not affected + tsx_async_abort: Not affected

abcResult OverviewPhoronix Test Suite100%102%105%107%110%OpenVINO GenAIOpenVINOLlamafileLlama.cpp

ddddallamafile: wizardcoder-python-34b-v1.0.Q6_K - Text Generation 16llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 2048openvino: Noise Suppression Poconet-Like FP16 - CPUopenvino: Noise Suppression Poconet-Like FP16 - CPUopenvino: Person Vehicle Bike Detection FP16 - CPUopenvino: Person Vehicle Bike Detection FP16 - CPUopenvino: Road Segmentation ADAS FP16 - CPUopenvino: Road Segmentation ADAS FP16 - CPUopenvino: Vehicle Detection FP16 - CPUopenvino: Vehicle Detection FP16 - CPUllama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024openvino: Machine Translation EN To DE FP16 - CPUopenvino: Machine Translation EN To DE FP16 - CPUllama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 128openvino: Handwritten English Recognition FP16-INT8 - CPUopenvino: Handwritten English Recognition FP16-INT8 - CPUopenvino: Road Segmentation ADAS FP16-INT8 - CPUopenvino: Road Segmentation ADAS FP16-INT8 - CPUllamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 1024openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Tokenopenvino-genai: Gemma-7b-int4-ov - CPU - Time To First Tokenopenvino-genai: Gemma-7b-int4-ov - CPUllamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 128llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128openvino: Face Detection FP16 - CPUopenvino: Face Detection FP16 - CPUopenvino: Face Detection FP16-INT8 - CPUopenvino: Face Detection FP16-INT8 - CPUopenvino: Person Detection FP16 - CPUopenvino: Person Detection FP16 - CPUopenvino: Person Detection FP32 - CPUopenvino: Person Detection FP32 - CPUopenvino: Person Re-Identification Retail FP16 - CPUopenvino: Person Re-Identification Retail FP16 - CPUopenvino: Vehicle Detection FP16-INT8 - CPUopenvino: Vehicle Detection FP16-INT8 - CPUopenvino: Face Detection Retail FP16-INT8 - CPUopenvino: Face Detection Retail FP16-INT8 - CPUopenvino: Handwritten English Recognition FP16 - CPUopenvino: Handwritten English Recognition FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUopenvino: Age Gender Recognition Retail 0013 FP16-INT8 - CPUopenvino: Weld Porosity Detection FP16 - CPUopenvino: Weld Porosity Detection FP16 - CPUopenvino: Weld Porosity Detection FP16-INT8 - CPUopenvino: Weld Porosity Detection FP16-INT8 - CPUopenvino: Face Detection Retail FP16 - CPUopenvino: Face Detection Retail FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16 - CPUopenvino: Age Gender Recognition Retail 0013 FP16 - CPUopenvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Tokenopenvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Tokenopenvino-genai: Falcon-7b-instruct-int4-ov - CPUllamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128llamafile: mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 1024llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024llamafile: Llama-3.2-3B-Instruct.Q6_K - Text Generation 16llamafile: Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Tokenopenvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Tokenopenvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPUopenvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPU - Time Per Output Tokenopenvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPU - Time To First Tokenopenvino-genai: TinyLlama-1.1B-Chat-v1.0 - CPUllama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 512llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 256abc0.0833.2433.64126.063276816.30607.919.13437.2031.72126.0214.09283.7836.8336.8335.8981.6248.9935.9313.6939.27254.3923.49170.0516384102.99204.639.713276826.2410.2310.81865.464.61467.218.5490.9643.9590.6844.077.37540.958.54467.176.271587.7040.66245.490.5317776.3022.37446.4111.25886.774.19950.040.7712337.9770.1714614.2533.9813.958192147.471638426.51409634.2750.4293.719.8331.8134.2631.43158.11819261.7640960.0734.1934.47153.713276814.88664.338.28481.8929.22136.712.7314.2540.7440.136.173.3554.4936.4113.8535.29282.9322.72175.7416384112.38215.328.93276826.4410.2810.83867.814.594498.8991.6643.691.1443.867.15557.77.87507.165.821712.4137.4266.880.4918962.5320.71481.7910.46953.243.851032.880.7512742.0983.28181.6312.0133.511.998192142.571638422.91409629.9953.82100.0218.5832.9335.9730.36157.49819260.4340960.0733.6533.81137.253276814.96661.168.4474.8529.99133.1812.78312.3238.938.4536.1974.7753.4635.6613.8935.9278.121.89182.3916384105.93204.719.443276826.3110.3110.83882.694.52497.68.0391.443.7392.2843.317.11561.168.04496.315.941676.7239.51252.750.518885.0821.76458.7310.83920.134.23939.90.7412864.6371.15148.9414.0633.6512.178192147.121638422.72409630.1454.9103.2818.2231.8734.1131.38158.18819261.364096OpenBenchmarking.org

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: wizardcoder-python-34b-v1.0.Q6_K - Test: Text Generation 16cba0.0180.0360.0540.0720.09SE +/- 0.00, N = 30.070.070.08

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048cba816243240SE +/- 0.06, N = 333.6534.1933.241. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048cba816243240SE +/- 0.15, N = 333.8134.4733.641. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048cba306090120150SE +/- 1.74, N = 15137.25153.71126.061. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 2048cba7K14K21K28K35KSE +/- 0.00, N = 3327683276832768

OpenVINO

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Noise Suppression Poconet-Like FP16 - Device: CPUcba48121620SE +/- 0.12, N = 1514.9614.8816.30MIN: 9.32 / MAX: 24.49MIN: 9.08 / MAX: 23.51MIN: 8.05 / MAX: 48.271. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Noise Suppression Poconet-Like FP16 - Device: CPUcba140280420560700SE +/- 4.56, N = 15661.16664.33607.911. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Person Vehicle Bike Detection FP16 - Device: CPUcba3691215SE +/- 0.06, N = 158.408.289.13MIN: 5.46 / MAX: 31.1MIN: 5.61 / MAX: 29.51MIN: 4.67 / MAX: 38.311. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Person Vehicle Bike Detection FP16 - Device: CPUcba100200300400500SE +/- 2.99, N = 15474.85481.89437.201. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Road Segmentation ADAS FP16 - Device: CPUcba714212835SE +/- 0.21, N = 1529.9929.2231.72MIN: 14.48 / MAX: 36.5MIN: 22.58 / MAX: 35.18MIN: 14.27 / MAX: 66.491. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Road Segmentation ADAS FP16 - Device: CPUcba306090120150SE +/- 0.86, N = 15133.18136.70126.021. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Vehicle Detection FP16 - Device: CPUcba48121620SE +/- 0.15, N = 1512.7812.7014.09MIN: 9.47 / MAX: 20.55MIN: 6.35 / MAX: 22.55MIN: 6.67 / MAX: 44.251. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Vehicle Detection FP16 - Device: CPUcba70140210280350SE +/- 3.12, N = 15312.32314.25283.781. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512cba918273645SE +/- 0.29, N = 1038.9040.7436.831. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512cba918273645SE +/- 0.29, N = 938.4540.1036.831. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024cba816243240SE +/- 0.50, N = 336.1936.1035.891. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Machine Translation EN To DE FP16 - Device: CPUcba20406080100SE +/- 0.57, N = 1274.7773.3581.62MIN: 50.72 / MAX: 91.82MIN: 56.95 / MAX: 89.78MIN: 51.47 / MAX: 126.221. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Machine Translation EN To DE FP16 - Device: CPUcba1224364860SE +/- 0.34, N = 1253.4654.4948.991. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024cba816243240SE +/- 0.38, N = 335.6636.4135.931. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 128cba48121620SE +/- 0.14, N = 313.8913.8513.69

OpenVINO

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Handwritten English Recognition FP16-INT8 - Device: CPUcba918273645SE +/- 0.33, N = 935.9035.2939.27MIN: 22.3 / MAX: 71.26MIN: 23.48 / MAX: 69.74MIN: 19.02 / MAX: 801. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Handwritten English Recognition FP16-INT8 - Device: CPUcba60120180240300SE +/- 2.11, N = 9278.10282.93254.391. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Road Segmentation ADAS FP16-INT8 - Device: CPUcba612182430SE +/- 0.22, N = 721.8922.7223.49MIN: 16.13 / MAX: 43.35MIN: 17.07 / MAX: 30.25MIN: 12 / MAX: 52.691. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Road Segmentation ADAS FP16-INT8 - Device: CPUcba4080120160200SE +/- 1.58, N = 7182.39175.74170.051. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 1024cba4K8K12K16K20KSE +/- 0.00, N = 3163841638416384

OpenVINO GenAI

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Tokencba306090120150105.93112.38102.99

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Gemma-7b-int4-ov - Device: CPU - Time To First Tokencba50100150200250204.71215.32204.63

OpenBenchmarking.orgtokens/s, More Is BetterOpenVINO GenAI 2024.5Model: Gemma-7b-int4-ov - Device: CPUcba36912159.448.909.71

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 2048cba7K14K21K28K35KSE +/- 0.00, N = 3327683276832768

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 128cba612182430SE +/- 0.22, N = 326.3126.4426.24

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128cba3691215SE +/- 0.06, N = 310.3110.2810.231. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128cba3691215SE +/- 0.04, N = 310.8310.8310.811. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Face Detection FP16 - Device: CPUcba2004006008001000SE +/- 4.05, N = 3882.69867.81865.46MIN: 811.64 / MAX: 914.61MIN: 432.97 / MAX: 929.79MIN: 681.88 / MAX: 935.791. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Face Detection FP16 - Device: CPUcba1.03732.07463.11194.14925.1865SE +/- 0.02, N = 34.524.594.611. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Face Detection FP16-INT8 - Device: CPUcba110220330440550SE +/- 4.94, N = 3497.60449.00467.21MIN: 415.79 / MAX: 524.4MIN: 400.92 / MAX: 475.03MIN: 389.62 / MAX: 499.441. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Face Detection FP16-INT8 - Device: CPUcba246810SE +/- 0.09, N = 38.038.898.541. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Person Detection FP16 - Device: CPUcba20406080100SE +/- 0.10, N = 391.4091.6690.96MIN: 73 / MAX: 111.22MIN: 70.22 / MAX: 107.21MIN: 45.37 / MAX: 129.411. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Person Detection FP16 - Device: CPUcba1020304050SE +/- 0.04, N = 343.7343.6043.951. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Person Detection FP32 - Device: CPUcba20406080100SE +/- 0.24, N = 392.2891.1490.68MIN: 71.08 / MAX: 120.33MIN: 74.95 / MAX: 112.27MIN: 42.08 / MAX: 127.521. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Person Detection FP32 - Device: CPUcba1020304050SE +/- 0.12, N = 343.3143.8644.071. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Person Re-Identification Retail FP16 - Device: CPUcba246810SE +/- 0.06, N = 37.117.157.37MIN: 4.3 / MAX: 30.31MIN: 4.45 / MAX: 34.36MIN: 3.58 / MAX: 36.291. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Person Re-Identification Retail FP16 - Device: CPUcba120240360480600SE +/- 4.09, N = 3561.16557.70540.951. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Vehicle Detection FP16-INT8 - Device: CPUcba246810SE +/- 0.06, N = 38.047.878.54MIN: 4.77 / MAX: 36.82MIN: 4.58 / MAX: 16.38MIN: 4.72 / MAX: 37.11. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Vehicle Detection FP16-INT8 - Device: CPUcba110220330440550SE +/- 3.54, N = 3496.31507.16467.171. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Face Detection Retail FP16-INT8 - Device: CPUcba246810SE +/- 0.02, N = 35.945.826.27MIN: 2.83 / MAX: 34.92MIN: 2.57 / MAX: 33.51MIN: 2.63 / MAX: 36.961. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Face Detection Retail FP16-INT8 - Device: CPUcba400800120016002000SE +/- 5.90, N = 31676.721712.411587.701. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Handwritten English Recognition FP16 - Device: CPUcba918273645SE +/- 0.04, N = 339.5137.4040.66MIN: 24.79 / MAX: 75.37MIN: 23.82 / MAX: 72.21MIN: 19.98 / MAX: 76.171. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Handwritten English Recognition FP16 - Device: CPUcba60120180240300SE +/- 0.26, N = 3252.75266.88245.491. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPUcba0.11930.23860.35790.47720.5965SE +/- 0.00, N = 30.500.490.53MIN: 0.2 / MAX: 24.4MIN: 0.21 / MAX: 7.16MIN: 0.2 / MAX: 27.591. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPUcba4K8K12K16K20KSE +/- 74.62, N = 318885.0818962.5317776.301. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Weld Porosity Detection FP16 - Device: CPUcba510152025SE +/- 0.32, N = 321.7620.7122.37MIN: 9.51 / MAX: 51.32MIN: 9.11 / MAX: 43.48MIN: 9.45 / MAX: 55.551. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Weld Porosity Detection FP16 - Device: CPUcba100200300400500SE +/- 6.37, N = 3458.73481.79446.411. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Weld Porosity Detection FP16-INT8 - Device: CPUcba3691215SE +/- 0.13, N = 310.8310.4611.25MIN: 5.05 / MAX: 41.09MIN: 4.65 / MAX: 39.08MIN: 4.28 / MAX: 41.131. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Weld Porosity Detection FP16-INT8 - Device: CPUcba2004006008001000SE +/- 10.32, N = 3920.13953.24886.771. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Face Detection Retail FP16 - Device: CPUcba0.95181.90362.85543.80724.759SE +/- 0.02, N = 34.233.854.19MIN: 2.29 / MAX: 29.37MIN: 2.03 / MAX: 29.88MIN: 2.09 / MAX: 311. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Face Detection Retail FP16 - Device: CPUcba2004006008001000SE +/- 5.37, N = 3939.901032.88950.041. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO 2024.5Model: Age Gender Recognition Retail 0013 FP16 - Device: CPUcba0.17330.34660.51990.69320.8665SE +/- 0.01, N = 30.740.750.77MIN: 0.29 / MAX: 21.6MIN: 0.28 / MAX: 25.7MIN: 0.28 / MAX: 28.441. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenBenchmarking.orgFPS, More Is BetterOpenVINO 2024.5Model: Age Gender Recognition Retail 0013 FP16 - Device: CPUcba3K6K9K12K15KSE +/- 70.01, N = 312864.6312742.0912337.971. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO GenAI

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Tokencba2040608010071.1583.2870.17

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Tokencba4080120160200148.94181.63146.00

OpenBenchmarking.orgtokens/s, More Is BetterOpenVINO GenAI 2024.5Model: Falcon-7b-instruct-int4-ov - Device: CPUcba4812162014.0612.0114.25

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 128cba816243240SE +/- 0.12, N = 333.6533.5033.98

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: mistral-7b-instruct-v0.2.Q5_K_M - Test: Text Generation 16cba48121620SE +/- 0.15, N = 1312.1711.9913.95

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 512cba2K4K6K8K10KSE +/- 0.00, N = 3819281928192

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 1024cba306090120150SE +/- 0.34, N = 3147.12142.57147.471. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 1024cba4K8K12K16K20KSE +/- 0.00, N = 3163841638416384

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Text Generation 16cba612182430SE +/- 0.28, N = 1222.7222.9126.51

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: Llama-3.2-3B-Instruct.Q6_K - Test: Prompt Processing 256cba9001800270036004500SE +/- 0.00, N = 3409640964096

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Text Generation 16cba816243240SE +/- 0.44, N = 1530.1429.9934.27

OpenVINO GenAI

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Tokencba122436486054.9053.8250.42

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Tokencba20406080100103.28100.0293.70

OpenBenchmarking.orgtokens/s, More Is BetterOpenVINO GenAI 2024.5Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPUcba51015202518.2218.5819.83

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time Per Output Tokencba81624324031.8732.9331.81

OpenBenchmarking.orgms, Fewer Is BetterOpenVINO GenAI 2024.5Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPU - Time To First Tokencba81624324034.1135.9734.26

OpenBenchmarking.orgtokens/s, More Is BetterOpenVINO GenAI 2024.5Model: TinyLlama-1.1B-Chat-v1.0 - Device: CPUcba71421283531.3830.3631.43

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512cba306090120150SE +/- 1.19, N = 3158.18157.49158.111. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 512cba2K4K6K8K10KSE +/- 0.00, N = 3819281928192

Llama.cpp

OpenBenchmarking.orgTokens Per Second, More Is BetterLlama.cpp b4154Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128cba1428425670SE +/- 0.13, N = 361.3660.4361.761. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llamafile

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.8.16Model: TinyLlama-1.1B-Chat-v1.0.BF16 - Test: Prompt Processing 256cba9001800270036004500SE +/- 0.00, N = 3409640964096

79 Results Shown

Llamafile
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048
Llamafile
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
  Road Segmentation ADAS FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16 - CPU:
    ms
    FPS
Llama.cpp:
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
Llama.cpp
Llamafile
OpenVINO:
  Handwritten English Recognition FP16-INT8 - CPU:
    ms
    FPS
  Road Segmentation ADAS FP16-INT8 - CPU:
    ms
    FPS
Llamafile
OpenVINO GenAI:
  Gemma-7b-int4-ov - CPU - Time Per Output Token
  Gemma-7b-int4-ov - CPU - Time To First Token
  Gemma-7b-int4-ov - CPU
Llamafile:
  TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 2048
  Llama-3.2-3B-Instruct.Q6_K - Text Generation 128
Llama.cpp:
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
OpenVINO:
  Face Detection FP16 - CPU:
    ms
    FPS
  Face Detection FP16-INT8 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    ms
    FPS
  Person Detection FP32 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
  Face Detection Retail FP16-INT8 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16-INT8 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
  Face Detection Retail FP16 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
    FPS
OpenVINO GenAI:
  Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token
  Falcon-7b-instruct-int4-ov - CPU - Time To First Token
  Falcon-7b-instruct-int4-ov - CPU
Llamafile:
  TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 128
  mistral-7b-instruct-v0.2.Q5_K_M - Text Generation 16
  Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 512
Llama.cpp
Llamafile:
  TinyLlama-1.1B-Chat-v1.0.BF16 - Prompt Processing 1024
  Llama-3.2-3B-Instruct.Q6_K - Text Generation 16
  Llama-3.2-3B-Instruct.Q6_K - Prompt Processing 256
  TinyLlama-1.1B-Chat-v1.0.BF16 - Text Generation 16
OpenVINO GenAI:
  Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token
  Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token
  Phi-3-mini-128k-instruct-int4-ov - CPU
  TinyLlama-1.1B-Chat-v1.0 - CPU - Time Per Output Token
  TinyLlama-1.1B-Chat-v1.0 - CPU - Time To First Token
  TinyLlama-1.1B-Chat-v1.0 - CPU
Llama.cpp
Llamafile
Llama.cpp
Llamafile