AMD EPYC Turin AI/ML Tuning Guide

AMD EPYC 9655P benchmarks comparing stock performance against the recommendations of AMD's EPYC 9005 BIOS and workload tuning guide for AI/ML workloads (https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58467_amd-epyc-9005-tg-bios-and-workload.pdf). Benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command:

  phoronix-test-suite benchmark 2411286-NE-AMDEPYCTU24
Result runs (identifier - date run - test duration):
  Stock - November 28 - 5 Hours, 44 Minutes
  AI/ML Tuning Recommendations - November 28 - 6 Hours, 7 Minutes

AMD EPYC Turin AI/ML Tuning Guide Benchmarks (OpenBenchmarking.org / Phoronix Test Suite)

Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Logs:
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
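
Two of the OS-level settings recorded in the system logs above, the Transparent Huge Pages mode and the CPU frequency scaling governor, are among the knobs tuning guides like AMD's commonly adjust. A minimal sketch for reading them back from userspace, assuming a Linux system with the standard sysfs layout (the file paths below are the usual kernel interfaces, not taken from this result file):

    # Hedged sketch: read back the kernel tunables noted in the system logs
    # above. Assumes the standard Linux sysfs paths.
    from pathlib import Path

    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled").read_text().strip()
    gov = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor").read_text().strip()
    boost = Path("/sys/devices/system/cpu/cpufreq/boost")  # exposed by acpi-cpufreq

    print("Transparent Huge Pages:", thp)   # e.g. "always [madvise] never"
    print("cpu0 scaling governor:", gov)    # e.g. "performance"
    if boost.exists():
        print("Boost enabled:", boost.read_text().strip() == "1")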

[Comparison chart: Stock vs. AI/ML Tuning Recommendations, per-test percentage deltas. The tuned configuration leads by roughly 2% to 10.3% across the OpenVINO, OpenVINO GenAI, oneDNN, Llama.cpp, PyTorch, ONNX Runtime, XNNPACK, LiteRT, TensorFlow, Whisper.cpp, and Whisperfile results, topped by OpenVINO Weld Porosity Detection FP16-INT8 (10.3% lower latency, 10% higher throughput), while a few OpenVINO GenAI time-to-first-token results favor Stock (up to 6.2%).]

AMD EPYC Turin AI/ML Tuning Guide - Detailed Results

Test | Stock | AI/ML Tuning Recommendations
openvino: Weld Porosity Detection FP16-INT8 - CPU (ms) | 6.72 | 6.09
openvino: Weld Porosity Detection FP16-INT8 - CPU (FPS) | 14035.76 | 15437.79
openvino: Person Vehicle Bike Detection FP16 - CPU (ms) | 7.09 | 6.66
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 (Tokens/s) | 72.40 | 77.05
openvino: Person Vehicle Bike Detection FP16 - CPU (FPS) | 6720.98 | 7144.84
openvino: Face Detection Retail FP16-INT8 - CPU (FPS) | 23587.02 | 25050.47
openvino: Face Detection Retail FP16-INT8 - CPU (ms) | 3.94 | 3.71
onednn: Deconvolution Batch shapes_3d - CPU (ms) | 0.718484 | 0.677050
onednn: Convolution Batch Shapes Auto - CPU (ms) | 0.341475 | 0.321833
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 (Tokens/s) | 144.53 | 152.92
onednn: IP Shapes 1D - CPU (ms) | 0.535874 | 0.507355
pytorch: CPU - 512 - ResNet-152 (batches/sec) | 20.60 | 21.71
onednn: Recurrent Neural Network Inference - CPU (ms) | 276.379 | 262.517
openvino: Vehicle Detection FP16-INT8 - CPU (ms) | 5.77 | 5.49
openvino: Vehicle Detection FP16-INT8 - CPU (FPS) | 8270.99 | 8691.00
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 (Tokens/s) | 96.99 | 101.77
pytorch: CPU - 256 - ResNet-152 (batches/sec) | 20.78 | 21.79
onednn: Recurrent Neural Network Training - CPU (ms) | 425.718 | 406.453
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 (Tokens/s) | 72.99 | 76.41
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (ms) | 0.45 | 0.43
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 (Tokens/s) | 97.19 | 101.71
openvino: Road Segmentation ADAS FP16-INT8 - CPU (ms) | 18.99 | 18.16
onednn: IP Shapes 3D - CPU (ms) | 0.265564 | 0.254123
openvino: Road Segmentation ADAS FP16-INT8 - CPU (FPS) | 2517.89 | 2630.42
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (FPS) | 140497.34 | 146329.80
pytorch: CPU - 512 - ResNet-50 (batches/sec) | 51.48 | 53.34
openvino: Machine Translation EN To DE FP16 - CPU (FPS) | 852.83 | 881.22
openvino: Machine Translation EN To DE FP16 - CPU (ms) | 56.24 | 54.43
onnx: ResNet50 v1-12-int8 - CPU - Standard (Inferences/s) | 276.076 | 285.104
whisper-cpp: ggml-small.en - 2016 State of the Union (Seconds) | 221.62918 | 214.62792
whisperfile: Small (Seconds) | 90.67175 | 88.03640
pytorch: CPU - 256 - ResNet-50 (batches/sec) | 51.61 | 53.13
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 (Tokens/s) | 92.82 | 95.42
openvino: Person Re-Identification Retail FP16 - CPU (ms) | 4.53 | 4.41
xnnpack: FP16MobileNetV1 (us) | 4634 | 4513
openvino: Person Re-Identification Retail FP16 - CPU (FPS) | 10525.86 | 10800.12
openvino: Handwritten English Recognition FP16-INT8 - CPU (FPS) | 3559.87 | 3649.63
openvino: Handwritten English Recognition FP16-INT8 - CPU (ms) | 26.94 | 26.28
litert: Mobilenet Float (Microseconds) | 4438.52 | 4335.67
xnnpack: QS8MobileNetV2 (us) | 10042 | 9813
openvino: Noise Suppression Poconet-Like FP16 - CPU (ms) | 13.68 | 13.38
openvino: Noise Suppression Poconet-Like FP16 - CPU (FPS) | 6747.48 | 6891.64
openvino: Person Detection FP16 - CPU (FPS) | 716.20 | 730.73
openvino: Person Detection FP16 - CPU (ms) | 66.89 | 65.56
tensorflow: CPU - 512 - ResNet-50 (images/sec) | 231.18 | 235.35
whisperfile: Medium (Seconds) | 200.48292 | 197.24095
litert: SqueezeNet (Microseconds) | 7035.74 | 6926.88
xnnpack: FP32MobileNetV2 (us) | 9203 | 9062
xnnpack: FP32MobileNetV1 (us) | 4539 | 4471
tensorflow: CPU - 256 - ResNet-50 (images/sec) | 204.33 | 207.38
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU (tokens/s) | 55.63 | 56.46
whisperfile: Tiny (Seconds) | 31.88560 | 31.43573
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 (Tokens/s) | 48.07 | 48.73
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 (Tokens/s) | 45.84 | 46.45
whisper-cpp: ggml-medium.en - 2016 State of the Union (Seconds) | 454.28794 | 449.70262
xnnpack: FP32MobileNetV3Small (us) | 10488 | 10387
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 (Tokens/s) | 149.00 | 150.29
onednn: Deconvolution Batch shapes_1d - CPU (ms) | 6.70897 | 6.65482
xnnpack: FP16MobileNetV2 (us) | 9092 | 9028
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 (Tokens/s) | 306.51 | 308.32
openvino-genai: Gemma-7b-int4-ov - CPU (tokens/s) | 37.84 | 38.00
numpy: Numpy Benchmark (Score) | 885.50 | 887.75
openvino-genai: Falcon-7b-instruct-int4-ov - CPU (tokens/s) | 51.01 | 51.12
litert: Inception V4 (Microseconds) | 43898.9 | 43824.9
xnnpack: FP16MobileNetV3Small (us) | 10966 | 10323
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (Inference Time Cost, ms) | 165.094 | 158.528
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (Inferences/s) | 6.09788 | 6.35626
onnx: ResNet50 v1-12-int8 - CPU - Standard (Inference Time Cost, ms) | 3.62341 | 3.50697
litert: NASNet Mobile (Microseconds) | 733737 | 689396
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 (Tokens/s) | 154.60 | 155.73
openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Token (ms) | 26.43 | 26.31
openvino-genai: Gemma-7b-int4-ov - CPU - Time To First Token (ms) | 36.13 | 35.43
openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token (ms) | 19.60 | 19.56
openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Token (ms) | 29.07 | 30.70
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token (ms) | 17.98 | 17.71
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token (ms) | 24.17 | 25.66

OpenVINO

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 6.09 (SE +/- 0.00, N = 3; MIN: 2.21 / MAX: 23.86)
  Stock: 6.72 (SE +/- 0.00, N = 3; MIN: 2.22 / MAX: 22.18)
  (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 15437.79 (SE +/- 14.35, N = 3)
  Stock: 14035.76 (SE +/- 9.10, N = 3)

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 6.66 (SE +/- 0.01, N = 3; MIN: 3.65 / MAX: 22.09)
  Stock: 7.09 (SE +/- 0.01, N = 3; MIN: 4.15 / MAX: 20.16)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 77.05 (SE +/- 0.77, N = 5)
  Stock: 72.40 (SE +/- 0.83, N = 3)
  (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 7144.84 (SE +/- 8.11, N = 3)
  Stock: 6720.98 (SE +/- 13.33, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 25050.47 (SE +/- 17.84, N = 3)
  Stock: 23587.02 (SE +/- 17.66, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16-INT8 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 3.71 (SE +/- 0.00, N = 3; MIN: 1.71 / MAX: 19.33)
  Stock: 3.94 (SE +/- 0.01, N = 3; MIN: 1.76 / MAX: 17.65)

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 0.677050 (SE +/- 0.000482, N = 9; MIN: 0.58)
  Stock: 0.718484 (SE +/- 0.001205, N = 9; MIN: 0.62)
  (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 0.321833 (SE +/- 0.001024, N = 7; MIN: 0.31)
  Stock: 0.341475 (SE +/- 0.000295, N = 7; MIN: 0.32)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 152.92 (SE +/- 1.38, N = 3)
  Stock: 144.53 (SE +/- 0.97, N = 3)

oneDNN

oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 0.507355 (SE +/- 0.001097, N = 4; MIN: 0.46)
  Stock: 0.535874 (SE +/- 0.001151, N = 4; MIN: 0.49)

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.
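
As a rough illustration of what such a throughput measurement does, the sketch below times batched ResNet-152 CPU inference by hand; it is not the test profile's actual harness, and the model constructor and batch size are illustrative assumptions (the profile itself tests batch sizes of 256 and 512):

    # Hand-rolled sketch of a batched CPU inference throughput measurement;
    # the actual test profile uses the pytorch-benchmark package instead.
    import time
    import torch
    from torchvision.models import resnet152

    model = resnet152().eval()              # random weights are fine for timing
    batch = torch.randn(16, 3, 224, 224)    # illustrative; the profile uses 256/512

    with torch.inference_mode():
        for _ in range(3):                  # warm-up passes
            model(batch)
        iters = 10
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        elapsed = time.perf_counter() - start

    print(f"{iters / elapsed:.2f} batches/sec")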

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec; more is better)
  AI/ML Tuning Recommendations: 21.71 (SE +/- 0.18, N = 3; MIN: 20.13 / MAX: 22.23)
  Stock: 20.60 (SE +/- 0.10, N = 3; MIN: 19.3 / MAX: 21.02)

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 262.52 (SE +/- 0.27, N = 3; MIN: 257.81)
  Stock: 276.38 (SE +/- 0.68, N = 3; MIN: 269.7)

OpenVINO

OpenVINO 2024.5 - Model: Vehicle Detection FP16-INT8 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 5.49 (SE +/- 0.01, N = 3; MIN: 2.47 / MAX: 19.56)
  Stock: 5.77 (SE +/- 0.00, N = 3; MIN: 2.28 / MAX: 21.36)

OpenVINO 2024.5 - Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 8691.00 (SE +/- 7.22, N = 3)
  Stock: 8270.99 (SE +/- 3.96, N = 3)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 101.77 (SE +/- 1.16, N = 3)
  Stock: 96.99 (SE +/- 0.34, N = 3)

PyTorch


PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152 (batches/sec; more is better)
  AI/ML Tuning Recommendations: 21.79 (SE +/- 0.27, N = 3; MIN: 20.38 / MAX: 22.54)
  Stock: 20.78 (SE +/- 0.04, N = 3; MIN: 19.72 / MAX: 21.04)

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 406.45 (SE +/- 0.37, N = 3; MIN: 400.13)
  Stock: 425.72 (SE +/- 0.52, N = 3; MIN: 419.47)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 76.41 (SE +/- 0.99, N = 3)
  Stock: 72.99 (SE +/- 0.98, N = 3)

OpenVINO

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 0.43 (SE +/- 0.00, N = 3; MIN: 0.15 / MAX: 23.94)
  Stock: 0.45 (SE +/- 0.01, N = 3; MIN: 0.16 / MAX: 25.14)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 101.71 (SE +/- 0.67, N = 15)
  Stock: 97.19 (SE +/- 1.13, N = 4)

OpenVINO

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 18.16 (SE +/- 0.02, N = 3; MIN: 7.68 / MAX: 40.38)
  Stock: 18.99 (SE +/- 0.06, N = 3; MIN: 9.19 / MAX: 39)

oneDNN

oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 0.254123 (SE +/- 0.000491, N = 5; MIN: 0.24)
  Stock: 0.265564 (SE +/- 0.000944, N = 5; MIN: 0.24)

OpenVINO

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 2630.42 (SE +/- 2.42, N = 3)
  Stock: 2517.89 (SE +/- 7.45, N = 3)

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 146329.80 (SE +/- 528.15, N = 3)
  Stock: 140497.34 (SE +/- 313.81, N = 3)

PyTorch


PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec; more is better)
  AI/ML Tuning Recommendations: 53.34 (SE +/- 0.26, N = 3; MIN: 49 / MAX: 54.59)
  Stock: 51.48 (SE +/- 0.15, N = 3; MIN: 46.04 / MAX: 52.41)

OpenVINO

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 881.22 (SE +/- 0.40, N = 3)
  Stock: 852.83 (SE +/- 0.22, N = 3)

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 54.43 (SE +/- 0.03, N = 3; MIN: 28.33 / MAX: 92.29)
  Stock: 56.24 (SE +/- 0.02, N = 3; MIN: 29.3 / MAX: 94.69)

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second; more is better)
  AI/ML Tuning Recommendations: 285.10 (SE +/- 1.46, N = 3)
  Stock: 276.08 (SE +/- 2.53, N = 7)
  (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union (Seconds; fewer is better)
  AI/ML Tuning Recommendations: 214.63 (SE +/- 0.68, N = 3)
  Stock: 221.63 (SE +/- 2.33, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

Whisperfile

Whisperfile 20Aug24 - Model Size: Small (Seconds; fewer is better)
  AI/ML Tuning Recommendations: 88.04 (SE +/- 0.57, N = 3)
  Stock: 90.67 (SE +/- 0.71, N = 3)

PyTorch


PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec; more is better)
  AI/ML Tuning Recommendations: 53.13 (SE +/- 0.28, N = 3; MIN: 46.57 / MAX: 54.35)
  Stock: 51.61 (SE +/- 0.15, N = 3; MIN: 45.56 / MAX: 52.57)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 95.42 (SE +/- 0.43, N = 6)
  Stock: 92.82 (SE +/- 0.49, N = 6)

OpenVINO

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 4.41 (SE +/- 0.00, N = 3; MIN: 2.45 / MAX: 17.46)
  Stock: 4.53 (SE +/- 0.01, N = 3; MIN: 1.95 / MAX: 23.94)

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV1 (us; fewer is better)
  AI/ML Tuning Recommendations: 4513 (SE +/- 10.73, N = 3)
  Stock: 4634 (SE +/- 15.14, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

OpenVINO

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 10800.12 (SE +/- 6.08, N = 3)
  Stock: 10525.86 (SE +/- 12.77, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 3649.63 (SE +/- 3.35, N = 3)
  Stock: 3559.87 (SE +/- 2.96, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 26.28 (SE +/- 0.02, N = 3; MIN: 15.86 / MAX: 40.89)
  Stock: 26.94 (SE +/- 0.02, N = 3; MIN: 15.65 / MAX: 45.15)

LiteRT

LiteRT 2024-10-15 - Model: Mobilenet Float (Microseconds; fewer is better)
  AI/ML Tuning Recommendations: 4335.67 (SE +/- 7.66, N = 3)
  Stock: 4438.52 (SE +/- 11.59, N = 3)

XNNPACK

XNNPACK b7b048 - Model: QS8MobileNetV2 (us; fewer is better)
  AI/ML Tuning Recommendations: 9813 (SE +/- 87.21, N = 3)
  Stock: 10042 (SE +/- 154.48, N = 3)

OpenVINO

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 13.38 (SE +/- 0.02, N = 3; MIN: 6.98 / MAX: 34.62)
  Stock: 13.68 (SE +/- 0.02, N = 3; MIN: 7.09 / MAX: 36.01)

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 6891.64 (SE +/- 9.90, N = 3)
  Stock: 6747.48 (SE +/- 11.12, N = 3)

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (FPS; more is better)
  AI/ML Tuning Recommendations: 730.73 (SE +/- 0.74, N = 3)
  Stock: 716.20 (SE +/- 0.66, N = 3)

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 65.56 (SE +/- 0.07, N = 3; MIN: 32.6 / MAX: 131.97)
  Stock: 66.89 (SE +/- 0.06, N = 3; MIN: 34.58 / MAX: 130)

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note that the Phoronix Test Suite also offers pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries as a complementary metric. Learn more via the OpenBenchmarking.org test page.
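
For a sense of the images/sec metric being reported, here is a rough standalone sketch using the ResNet50 bundled with tf.keras; the actual test profile drives tf_cnn_benchmarks.py rather than this code, and the batch size below is an illustrative assumption:

    # Rough standalone sketch of an images/sec measurement with tf.keras;
    # the actual test profile drives tf_cnn_benchmarks.py instead.
    import time
    import numpy as np
    import tensorflow as tf

    model = tf.keras.applications.ResNet50(weights=None)       # random weights suffice
    batch = np.random.rand(32, 224, 224, 3).astype("float32")  # illustrative batch size

    model.predict(batch, verbose=0)         # warm-up / graph build
    iters = 10
    start = time.perf_counter()
    for _ in range(iters):
        model.predict(batch, verbose=0)
    elapsed = time.perf_counter() - start

    print(f"{iters * batch.shape[0] / elapsed:.2f} images/sec")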

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec; more is better)
  AI/ML Tuning Recommendations: 235.35 (SE +/- 0.20, N = 3)
  Stock: 231.18 (SE +/- 0.28, N = 3)

Whisperfile

Whisperfile 20Aug24 - Model Size: Medium (Seconds; fewer is better)
  AI/ML Tuning Recommendations: 197.24 (SE +/- 0.71, N = 3)
  Stock: 200.48 (SE +/- 0.88, N = 3)

LiteRT

LiteRT 2024-10-15 - Model: SqueezeNet (Microseconds; fewer is better)
  AI/ML Tuning Recommendations: 6926.88 (SE +/- 31.55, N = 3)
  Stock: 7035.74 (SE +/- 31.31, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV2 (us; fewer is better)
  AI/ML Tuning Recommendations: 9062 (SE +/- 25.31, N = 3)
  Stock: 9203 (SE +/- 32.13, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV1 (us; fewer is better)
  AI/ML Tuning Recommendations: 4471 (SE +/- 54.67, N = 3)
  Stock: 4539 (SE +/- 42.62, N = 3)

TensorFlow


TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec; more is better)
  AI/ML Tuning Recommendations: 207.38 (SE +/- 0.72, N = 3)
  Stock: 204.33 (SE +/- 0.57, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU (tokens/s; more is better)
  AI/ML Tuning Recommendations: 56.46 (SE +/- 0.14, N = 4)
  Stock: 55.63 (SE +/- 0.16, N = 4)

Whisperfile

Whisperfile 20Aug24 - Model Size: Tiny (Seconds; fewer is better)
  AI/ML Tuning Recommendations: 31.44 (SE +/- 0.25, N = 3)
  Stock: 31.89 (SE +/- 0.26, N = 3)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 48.73 (SE +/- 0.05, N = 4)
  Stock: 48.07 (SE +/- 0.07, N = 4)

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 46.45 (SE +/- 0.09, N = 4)
  Stock: 45.84 (SE +/- 0.05, N = 4)

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union (Seconds; fewer is better)
  AI/ML Tuning Recommendations: 449.70 (SE +/- 1.24, N = 3)
  Stock: 454.29 (SE +/- 1.45, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV3Small (us; fewer is better)
  AI/ML Tuning Recommendations: 10387 (SE +/- 28.22, N = 3)
  Stock: 10488 (SE +/- 11.35, N = 3)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 150.29 (SE +/- 1.36, N = 15)
  Stock: 149.00 (SE +/- 2.06, N = 3)

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms; fewer is better)
  AI/ML Tuning Recommendations: 6.65482 (SE +/- 0.01789, N = 3; MIN: 3.91)
  Stock: 6.70897 (SE +/- 0.03430, N = 3; MIN: 6.07)

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV2 (us; fewer is better)
  AI/ML Tuning Recommendations: 9028 (SE +/- 94.57, N = 3)
  Stock: 9092 (SE +/- 89.20, N = 3)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 308.32 (SE +/- 2.23, N = 15)
  Stock: 306.51 (SE +/- 2.62, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU (tokens/s; more is better)
  AI/ML Tuning Recommendations: 38.00 (SE +/- 0.09, N = 3)
  Stock: 37.84 (SE +/- 0.13, N = 3)

Numpy Benchmark

This is a test to gauge general NumPy performance. Learn more via the OpenBenchmarking.org test page.
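
As a toy illustration of this style of measurement, the sketch below times a few common NumPy kernels; the real benchmark's kernel mix and scoring formula are its own:

    # Toy illustration of timing common NumPy kernels; the actual benchmark's
    # kernel selection and scoring differ.
    import time
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.random((2048, 2048))
    b = rng.random((2048, 2048))

    for name, fn in [("matmul", lambda: a @ b),
                     ("fft2", lambda: np.fft.fft2(a)),
                     ("svd", lambda: np.linalg.svd(a[:512, :512]))]:
        start = time.perf_counter()
        fn()
        print(f"{name}: {time.perf_counter() - start:.3f} s")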

Numpy Benchmark (Score; more is better)
  AI/ML Tuning Recommendations: 887.75 (SE +/- 0.72, N = 3)
  Stock: 885.50 (SE +/- 1.94, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU (tokens/s; more is better)
  AI/ML Tuning Recommendations: 51.12 (SE +/- 0.19, N = 3)
  Stock: 51.01 (SE +/- 0.09, N = 3)

LiteRT

LiteRT 2024-10-15 - Model: Inception V4 (Microseconds; fewer is better)
  AI/ML Tuning Recommendations: 43824.9 (SE +/- 159.42, N = 3)
  Stock: 43898.9 (SE +/- 47.06, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV3Small (us; fewer is better)
  AI/ML Tuning Recommendations: 10323 (SE +/- 21.33, N = 3)
  Stock: 10966 (SE +/- 410.82, N = 3)

ONNX Runtime

ONNX Runtime 1.19 - System Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 98.5 / Avg 460.1 / Max 509.9
  Stock: Min 98.9 / Avg 429.9 / Max 477.0

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 77.0 / Avg 270.8 / Max 300.9
  Stock: Min 3.0 / Avg 248.9 / Max 279.9

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms; fewer is better)
  AI/ML Tuning Recommendations: 158.53 (SE +/- 3.81, N = 15)
  Stock: 165.09 (SE +/- 3.66, N = 15)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second; more is better)
  AI/ML Tuning Recommendations: 6.35626 (SE +/- 0.14383, N = 15)
  Stock: 6.09788 (SE +/- 0.13191, N = 15)

LiteRT

LiteRT 2024-10-15 - System Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 97.7 / Avg 390.9 / Max 414.6
  Stock: Min 97.1 / Avg 339.2 / Max 364.8

LiteRT 2024-10-15 - CPU Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 99.2 / Avg 261.1 / Max 279.6
  Stock: Min 3.7 / Avg 221.3 / Max 238.5

LiteRT 2024-10-15 - Model: NASNet Mobile (Microseconds; fewer is better)
  AI/ML Tuning Recommendations: 689396 (SE +/- 22050.37, N = 12)
  Stock: 733737 (SE +/- 17324.48, N = 15)

Llama.cpp

Llama.cpp b4154 - System Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 97.3 / Avg 433.6 / Max 515.3
  Stock: Min 97.9 / Avg 371.5 / Max 450.0

Llama.cpp b4154 - CPU Power Consumption Monitor (Watts; fewer is better)
  AI/ML Tuning Recommendations: Min 97.3 / Avg 292.2 / Max 349.7
  Stock: Min 14.5 / Avg 244.4 / Max 300.1

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second; more is better)
  AI/ML Tuning Recommendations: 155.73 (SE +/- 3.23, N = 12)
  Stock: 154.60 (SE +/- 2.60, N = 12)

75 Results Shown

OpenVINO:
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
Llama.cpp
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU
  Face Detection Retail FP16-INT8 - CPU
  Face Detection Retail FP16-INT8 - CPU
oneDNN:
  Deconvolution Batch shapes_3d - CPU
  Convolution Batch Shapes Auto - CPU
Llama.cpp
oneDNN
PyTorch
oneDNN
OpenVINO:
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
Llama.cpp
PyTorch
oneDNN
Llama.cpp
OpenVINO
Llama.cpp
OpenVINO
oneDNN
OpenVINO:
  Road Segmentation ADAS FP16-INT8 - CPU
  Age Gender Recognition Retail 0013 FP16 - CPU
PyTorch
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    FPS
    ms
ONNX Runtime
Whisper.cpp
Whisperfile
PyTorch
Llama.cpp
OpenVINO
XNNPACK
OpenVINO:
  Person Re-Identification Retail FP16 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
LiteRT
XNNPACK
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    FPS
    ms
TensorFlow
Whisperfile
LiteRT
XNNPACK:
  FP32MobileNetV2
  FP32MobileNetV1
TensorFlow
OpenVINO GenAI
Whisperfile
Llama.cpp:
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
Whisper.cpp
XNNPACK
Llama.cpp
oneDNN
XNNPACK
Llama.cpp
OpenVINO GenAI
Numpy Benchmark
OpenVINO GenAI
LiteRT
XNNPACK
ONNX Runtime:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
LiteRT:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
  NASNet Mobile
Llama.cpp:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512