AMD EPYC Turin AI/ML Tuning Guide

AMD EPYC 9655P tested following the AMD tuning guide for AI/ML workloads (https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58467_amd-epyc-9005-tg-bios-and-workload.pdf). Benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411286-NE-AMDEPYCTU24

Run Management

Result Identifier: Stock
Date Run: November 28 2024
Test Duration: 5 Hours, 44 Minutes

Result Identifier: AI/ML Tuning Recommendations
Date Run: November 28 2024
Test Duration: 6 Hours, 7 Minutes

Average Test Duration: 5 Hours, 55 Minutes


AMD EPYC Turin AI/ML Tuning Guide Benchmarks

Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Logs
- Transparent Huge Pages: madvise
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
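When comparing another machine against this result file, it can help to confirm that it reports the same Transparent Huge Pages mode and scaling governor as the system logs above. The following is a minimal sketch, assuming a Linux system that exposes the standard sysfs files; the helper names `active_option` and `read_if_present` are hypothetical and not part of the Phoronix Test Suite.

```python
import re
from pathlib import Path
from typing import Optional


def active_option(sysfs_text: str) -> str:
    """Return the bracketed (currently active) choice from a sysfs
    multi-choice string such as 'always [madvise] never'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active option found in {sysfs_text!r}")
    return match.group(1)


def read_if_present(path: str) -> Optional[str]:
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Standard Linux locations; cpu0 is used as a representative core (assumption).
thp = read_if_present("/sys/kernel/mm/transparent_hugepage/enabled")
governor = read_if_present("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")

print("Transparent Huge Pages:", active_option(thp) if thp else "unavailable")
print("Scaling Governor:", governor or "unavailable")
```

On a system configured like the one in this result file, the two lines would be expected to report `madvise` and `performance`.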

[Figure: Stock vs. AI/ML Tuning Recommendations Comparison (Phoronix Test Suite). Bar chart of per-test percentage differences between the two runs relative to the baseline, covering OpenVINO, LiteRT, Llama.cpp, XNNPACK, oneDNN, PyTorch, ONNX Runtime, Whisper.cpp, Whisperfile, and OpenVINO GenAI tests. The listed differences range from 2% to 10.3%, with OpenVINO Weld Porosity Detection FP16-INT8 showing the largest.]
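The percentages in the comparison chart can be approximated from the raw results. The sketch below assumes the difference is computed as a ratio against the Stock run, inverted for "Fewer Is Better" metrics; that formula is an assumption about how the chart is normalized, although it is consistent with the roughly 10% figures shown for Weld Porosity Detection. The function name `percent_gain` is hypothetical.

```python
def percent_gain(stock: float, tuned: float, higher_is_better: bool) -> float:
    """Percentage advantage of the tuned run over the stock run.

    For 'Fewer Is Better' metrics (ms, seconds) the ratio is inverted so
    that a positive number always means the tuned run did better.
    """
    ratio = tuned / stock if higher_is_better else stock / tuned
    return (ratio - 1.0) * 100.0


# (label, stock, tuned, higher_is_better) copied from the results in this file.
results = [
    ("OpenVINO Weld Porosity Detection FP16-INT8 (ms)", 6.72, 6.09, False),
    ("OpenVINO Weld Porosity Detection FP16-INT8 (FPS)", 14035.76, 15437.79, True),
    ("Llama.cpp Mistral-7B-Instruct-v0.3-Q8_0 Prompt Processing 512", 72.40, 77.05, True),
]

for label, stock, tuned, higher_is_better in results:
    print(f"{label}: {percent_gain(stock, tuned, higher_is_better):+.1f}%")
```

A negative value from this function would indicate a test where the Stock configuration was faster.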

AMD EPYC Turin AI/ML Tuning Guide

Test | Stock | AI/ML Tuning Recommendations
openvino: Weld Porosity Detection FP16-INT8 - CPU (ms) | 6.72 | 6.09
openvino: Weld Porosity Detection FP16-INT8 - CPU (FPS) | 14035.76 | 15437.79
openvino: Person Vehicle Bike Detection FP16 - CPU (ms) | 7.09 | 6.66
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 512 | 72.4 | 77.05
openvino: Person Vehicle Bike Detection FP16 - CPU (FPS) | 6720.98 | 7144.84
openvino: Face Detection Retail FP16-INT8 - CPU (FPS) | 23587.02 | 25050.47
openvino: Face Detection Retail FP16-INT8 - CPU (ms) | 3.94 | 3.71
onednn: Deconvolution Batch shapes_3d - CPU | 0.718484 | 0.677050
onednn: Convolution Batch Shapes Auto - CPU | 0.341475 | 0.321833
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048 | 144.53 | 152.92
onednn: IP Shapes 1D - CPU | 0.535874 | 0.507355
pytorch: CPU - 512 - ResNet-152 | 20.60 | 21.71
onednn: Recurrent Neural Network Inference - CPU | 276.379 | 262.517
openvino: Vehicle Detection FP16-INT8 - CPU (ms) | 5.77 | 5.49
openvino: Vehicle Detection FP16-INT8 - CPU (FPS) | 8270.99 | 8691.00
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 1024 | 96.99 | 101.77
pytorch: CPU - 256 - ResNet-152 | 20.78 | 21.79
onednn: Recurrent Neural Network Training - CPU | 425.718 | 406.453
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 512 | 72.99 | 76.41
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (ms) | 0.45 | 0.43
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 1024 | 97.19 | 101.71
openvino: Road Segmentation ADAS FP16-INT8 - CPU (ms) | 18.99 | 18.16
onednn: IP Shapes 3D - CPU | 0.265564 | 0.254123
openvino: Road Segmentation ADAS FP16-INT8 - CPU (FPS) | 2517.89 | 2630.42
openvino: Age Gender Recognition Retail 0013 FP16 - CPU (FPS) | 140497.34 | 146329.80
pytorch: CPU - 512 - ResNet-50 | 51.48 | 53.34
openvino: Machine Translation EN To DE FP16 - CPU (FPS) | 852.83 | 881.22
openvino: Machine Translation EN To DE FP16 - CPU (ms) | 56.24 | 54.43
onnx: ResNet50 v1-12-int8 - CPU - Standard (Inferences Per Second) | 276.076 | 285.104
whisper-cpp: ggml-small.en - 2016 State of the Union | 221.62918 | 214.62792
whisperfile: Small | 90.67175 | 88.03640
pytorch: CPU - 256 - ResNet-50 | 51.61 | 53.13
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Text Generation 128 | 92.82 | 95.42
openvino: Person Re-Identification Retail FP16 - CPU (ms) | 4.53 | 4.41
xnnpack: FP16MobileNetV1 | 4634 | 4513
openvino: Person Re-Identification Retail FP16 - CPU (FPS) | 10525.86 | 10800.12
openvino: Handwritten English Recognition FP16-INT8 - CPU (FPS) | 3559.87 | 3649.63
openvino: Handwritten English Recognition FP16-INT8 - CPU (ms) | 26.94 | 26.28
litert: Mobilenet Float | 4438.52 | 4335.67
xnnpack: QS8MobileNetV2 | 10042 | 9813
openvino: Noise Suppression Poconet-Like FP16 - CPU (ms) | 13.68 | 13.38
openvino: Noise Suppression Poconet-Like FP16 - CPU (FPS) | 6747.48 | 6891.64
openvino: Person Detection FP16 - CPU (FPS) | 716.20 | 730.73
openvino: Person Detection FP16 - CPU (ms) | 66.89 | 65.56
tensorflow: CPU - 512 - ResNet-50 | 231.18 | 235.35
whisperfile: Medium | 200.48292 | 197.24095
litert: SqueezeNet | 7035.74 | 6926.88
xnnpack: FP32MobileNetV2 | 9203 | 9062
xnnpack: FP32MobileNetV1 | 4539 | 4471
tensorflow: CPU - 256 - ResNet-50 | 204.33 | 207.38
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU | 55.63 | 56.46
whisperfile: Tiny | 31.88560 | 31.43573
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128 | 48.07 | 48.73
llama-cpp: CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128 | 45.84 | 46.45
whisper-cpp: ggml-medium.en - 2016 State of the Union | 454.28794 | 449.70262
xnnpack: FP32MobileNetV3Small | 10488 | 10387
llama-cpp: CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Prompt Processing 2048 | 149.00 | 150.29
onednn: Deconvolution Batch shapes_1d - CPU | 6.70897 | 6.65482
xnnpack: FP16MobileNetV2 | 9092 | 9028
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 2048 | 306.51 | 308.32
openvino-genai: Gemma-7b-int4-ov - CPU | 37.84 | 38.00
numpy | 885.50 | 887.75
openvino-genai: Falcon-7b-instruct-int4-ov - CPU | 51.01 | 51.12
litert: Inception V4 | 43898.9 | 43824.9
xnnpack: FP16MobileNetV3Small | 10966 | 10323
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (Inference Time Cost, ms) | 165.094 | 158.528
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (Inferences Per Second) | 6.09788 | 6.35626
onnx: ResNet50 v1-12-int8 - CPU - Standard (Inference Time Cost, ms) | 3.62341 | 3.50697
litert: NASNet Mobile | 733737 | 689396
llama-cpp: CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512 | 154.60 | 155.73
openvino-genai: Gemma-7b-int4-ov - CPU - Time Per Output Token | 26.43 | 26.31
openvino-genai: Gemma-7b-int4-ov - CPU - Time To First Token | 36.13 | 35.43
openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token | 19.60 | 19.56
openvino-genai: Falcon-7b-instruct-int4-ov - CPU - Time To First Token | 29.07 | 30.70
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token | 17.98 | 17.71
openvino-genai: Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token | 24.17 | 25.66
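One common way to summarize a mixed table like this is a geometric mean of per-test speedup ratios, which keeps tests with large absolute values (FPS in the thousands) from dominating tests measured in fractions of a millisecond. The sketch below is illustrative only: it uses a hand-picked handful of rows from this result file rather than all of them, and the helper names `speedup` and `geometric_mean` are hypothetical.

```python
import math

# (test, stock, tuned, higher_is_better) copied from this result file.
# This is a small illustrative subset, not the full set of tests.
rows = [
    ("openvino: Weld Porosity Detection FP16-INT8 (FPS)", 14035.76, 15437.79, True),
    ("onednn: IP Shapes 1D (ms)", 0.535874, 0.507355, False),
    ("pytorch: CPU - 512 - ResNet-152 (batches/sec)", 20.60, 21.71, True),
    ("whisperfile: Small (seconds)", 90.67, 88.04, False),
    ("numpy (score)", 885.50, 887.75, True),
]


def speedup(stock: float, tuned: float, higher_is_better: bool) -> float:
    """Ratio > 1.0 means the tuned run was faster, whatever the unit."""
    return tuned / stock if higher_is_better else stock / tuned


def geometric_mean(values):
    """Geometric mean computed via logs to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = [speedup(s, t, hib) for _, s, t, hib in rows]
print(f"geometric mean speedup over {len(rows)} tests: {geometric_mean(ratios):.3f}x")
```

Because the subset is arbitrary, the printed figure should not be read as the overall result of this comparison.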

OpenVINO

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
Stock: 6.72 (SE +/- 0.00, N = 3; MIN: 2.22 / MAX: 22.18)
AI/ML Tuning Recommendations: 6.09 (SE +/- 0.00, N = 3; MIN: 2.21 / MAX: 23.86)

OpenVINO 2024.5 - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
FPS, More Is Better
Stock: 14035.76 (SE +/- 9.10, N = 3)
AI/ML Tuning Recommendations: 15437.79 (SE +/- 14.35, N = 3)

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU
ms, Fewer Is Better
Stock: 7.09 (SE +/- 0.01, N = 3; MIN: 4.15 / MAX: 20.16)
AI/ML Tuning Recommendations: 6.66 (SE +/- 0.01, N = 3; MIN: 3.65 / MAX: 22.09)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
Stock: 72.40 (SE +/- 0.83, N = 3)
AI/ML Tuning Recommendations: 77.05 (SE +/- 0.77, N = 5)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5 - Model: Person Vehicle Bike Detection FP16 - Device: CPU
FPS, More Is Better
Stock: 6720.98 (SE +/- 13.33, N = 3)
AI/ML Tuning Recommendations: 7144.84 (SE +/- 8.11, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16-INT8 - Device: CPU
FPS, More Is Better
Stock: 23587.02 (SE +/- 17.66, N = 3)
AI/ML Tuning Recommendations: 25050.47 (SE +/- 17.84, N = 3)

OpenVINO 2024.5 - Model: Face Detection Retail FP16-INT8 - Device: CPU
ms, Fewer Is Better
Stock: 3.94 (SE +/- 0.01, N = 3; MIN: 1.76 / MAX: 17.65)
AI/ML Tuning Recommendations: 3.71 (SE +/- 0.00, N = 3; MIN: 1.71 / MAX: 19.33)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU
ms, Fewer Is Better
Stock: 0.718484 (SE +/- 0.001205, N = 9; MIN: 0.62)
AI/ML Tuning Recommendations: 0.677050 (SE +/- 0.000482, N = 9; MIN: 0.58)

oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU
ms, Fewer Is Better
Stock: 0.341475 (SE +/- 0.000295, N = 7; MIN: 0.32)
AI/ML Tuning Recommendations: 0.321833 (SE +/- 0.001024, N = 7; MIN: 0.31)

1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
Stock: 144.53 (SE +/- 0.97, N = 3)
AI/ML Tuning Recommendations: 152.92 (SE +/- 1.38, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

oneDNN

oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU
ms, Fewer Is Better
Stock: 0.535874 (SE +/- 0.001151, N = 4; MIN: 0.49)
AI/ML Tuning Recommendations: 0.507355 (SE +/- 0.001097, N = 4; MIN: 0.46)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-152
batches/sec, More Is Better
Stock: 20.60 (SE +/- 0.10, N = 3; MIN: 19.3 / MAX: 21.02)
AI/ML Tuning Recommendations: 21.71 (SE +/- 0.18, N = 3; MIN: 20.13 / MAX: 22.23)

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU
ms, Fewer Is Better
Stock: 276.38 (SE +/- 0.68, N = 3; MIN: 269.7)
AI/ML Tuning Recommendations: 262.52 (SE +/- 0.27, N = 3; MIN: 257.81)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

OpenVINO

OpenVINO 2024.5 - Model: Vehicle Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
Stock: 5.77 (SE +/- 0.00, N = 3; MIN: 2.28 / MAX: 21.36)
AI/ML Tuning Recommendations: 5.49 (SE +/- 0.01, N = 3; MIN: 2.47 / MAX: 19.56)

OpenVINO 2024.5 - Model: Vehicle Detection FP16-INT8 - Device: CPU
FPS, More Is Better
Stock: 8270.99 (SE +/- 3.96, N = 3)
AI/ML Tuning Recommendations: 8691.00 (SE +/- 7.22, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
Stock: 96.99 (SE +/- 0.34, N = 3)
AI/ML Tuning Recommendations: 101.77 (SE +/- 1.16, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

PyTorch

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-152
batches/sec, More Is Better
Stock: 20.78 (SE +/- 0.04, N = 3; MIN: 19.72 / MAX: 21.04)
AI/ML Tuning Recommendations: 21.79 (SE +/- 0.27, N = 3; MIN: 20.38 / MAX: 22.54)

oneDNN

oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU
ms, Fewer Is Better
Stock: 425.72 (SE +/- 0.52, N = 3; MIN: 419.47)
AI/ML Tuning Recommendations: 406.45 (SE +/- 0.37, N = 3; MIN: 400.13)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512
Tokens Per Second, More Is Better
Stock: 72.99 (SE +/- 0.98, N = 3)
AI/ML Tuning Recommendations: 76.41 (SE +/- 0.99, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
ms, Fewer Is Better
Stock: 0.45 (SE +/- 0.01, N = 3; MIN: 0.16 / MAX: 25.14)
AI/ML Tuning Recommendations: 0.43 (SE +/- 0.00, N = 3; MIN: 0.15 / MAX: 23.94)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024
Tokens Per Second, More Is Better
Stock: 97.19 (SE +/- 1.13, N = 4)
AI/ML Tuning Recommendations: 101.71 (SE +/- 0.67, N = 15)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU
ms, Fewer Is Better
Stock: 18.99 (SE +/- 0.06, N = 3; MIN: 9.19 / MAX: 39)
AI/ML Tuning Recommendations: 18.16 (SE +/- 0.02, N = 3; MIN: 7.68 / MAX: 40.38)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

oneDNN

oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU
ms, Fewer Is Better
Stock: 0.265564 (SE +/- 0.000944, N = 5; MIN: 0.24)
AI/ML Tuning Recommendations: 0.254123 (SE +/- 0.000491, N = 5; MIN: 0.24)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

OpenVINO

OpenVINO 2024.5 - Model: Road Segmentation ADAS FP16-INT8 - Device: CPU
FPS, More Is Better
Stock: 2517.89 (SE +/- 7.45, N = 3)
AI/ML Tuning Recommendations: 2630.42 (SE +/- 2.42, N = 3)

OpenVINO 2024.5 - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
FPS, More Is Better
Stock: 140497.34 (SE +/- 313.81, N = 3)
AI/ML Tuning Recommendations: 146329.80 (SE +/- 528.15, N = 3)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

PyTorch

PyTorch 2.2.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50
batches/sec, More Is Better
Stock: 51.48 (SE +/- 0.15, N = 3; MIN: 46.04 / MAX: 52.41)
AI/ML Tuning Recommendations: 53.34 (SE +/- 0.26, N = 3; MIN: 49 / MAX: 54.59)

OpenVINO

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU
FPS, More Is Better
Stock: 852.83 (SE +/- 0.22, N = 3)
AI/ML Tuning Recommendations: 881.22 (SE +/- 0.40, N = 3)

OpenVINO 2024.5 - Model: Machine Translation EN To DE FP16 - Device: CPU
ms, Fewer Is Better
Stock: 56.24 (SE +/- 0.02, N = 3; MIN: 29.3 / MAX: 94.69)
AI/ML Tuning Recommendations: 54.43 (SE +/- 0.03, N = 3; MIN: 28.33 / MAX: 92.29)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

ONNX Runtime

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
Stock: 276.08 (SE +/- 2.53, N = 7)
AI/ML Tuning Recommendations: 285.10 (SE +/- 1.46, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-small.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
Stock: 221.63 (SE +/- 2.33, N = 3)
AI/ML Tuning Recommendations: 214.63 (SE +/- 0.68, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

Whisperfile

Whisperfile 20Aug24 - Model Size: Small
Seconds, Fewer Is Better
Stock: 90.67 (SE +/- 0.71, N = 3)
AI/ML Tuning Recommendations: 88.04 (SE +/- 0.57, N = 3)

PyTorch

PyTorch 2.2.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50
batches/sec, More Is Better
Stock: 51.61 (SE +/- 0.15, N = 3; MIN: 45.56 / MAX: 52.57)
AI/ML Tuning Recommendations: 53.13 (SE +/- 0.28, N = 3; MIN: 46.57 / MAX: 54.35)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
Stock: 92.82 (SE +/- 0.49, N = 6)
AI/ML Tuning Recommendations: 95.42 (SE +/- 0.43, N = 6)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU
ms, Fewer Is Better
Stock: 4.53 (SE +/- 0.01, N = 3; MIN: 1.95 / MAX: 23.94)
AI/ML Tuning Recommendations: 4.41 (SE +/- 0.00, N = 3; MIN: 2.45 / MAX: 17.46)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV1
us, Fewer Is Better
Stock: 4634 (SE +/- 15.14, N = 3)
AI/ML Tuning Recommendations: 4513 (SE +/- 10.73, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

OpenVINO

OpenVINO 2024.5 - Model: Person Re-Identification Retail FP16 - Device: CPU
FPS, More Is Better
Stock: 10525.86 (SE +/- 12.77, N = 3)
AI/ML Tuning Recommendations: 10800.12 (SE +/- 6.08, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU
FPS, More Is Better
Stock: 3559.87 (SE +/- 2.96, N = 3)
AI/ML Tuning Recommendations: 3649.63 (SE +/- 3.35, N = 3)

OpenVINO 2024.5 - Model: Handwritten English Recognition FP16-INT8 - Device: CPU
ms, Fewer Is Better
Stock: 26.94 (SE +/- 0.02, N = 3; MIN: 15.65 / MAX: 45.15)
AI/ML Tuning Recommendations: 26.28 (SE +/- 0.02, N = 3; MIN: 15.86 / MAX: 40.89)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

LiteRT

LiteRT 2024-10-15 - Model: Mobilenet Float
Microseconds, Fewer Is Better
Stock: 4438.52 (SE +/- 11.59, N = 3)
AI/ML Tuning Recommendations: 4335.67 (SE +/- 7.66, N = 3)

XNNPACK

XNNPACK b7b048 - Model: QS8MobileNetV2
us, Fewer Is Better
Stock: 10042 (SE +/- 154.48, N = 3)
AI/ML Tuning Recommendations: 9813 (SE +/- 87.21, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

OpenVINO

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU
ms, Fewer Is Better
Stock: 13.68 (SE +/- 0.02, N = 3; MIN: 7.09 / MAX: 36.01)
AI/ML Tuning Recommendations: 13.38 (SE +/- 0.02, N = 3; MIN: 6.98 / MAX: 34.62)

OpenVINO 2024.5 - Model: Noise Suppression Poconet-Like FP16 - Device: CPU
FPS, More Is Better
Stock: 6747.48 (SE +/- 11.12, N = 3)
AI/ML Tuning Recommendations: 6891.64 (SE +/- 9.90, N = 3)

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU
FPS, More Is Better
Stock: 716.20 (SE +/- 0.66, N = 3)
AI/ML Tuning Recommendations: 730.73 (SE +/- 0.74, N = 3)

OpenVINO 2024.5 - Model: Person Detection FP16 - Device: CPU
ms, Fewer Is Better
Stock: 66.89 (SE +/- 0.06, N = 3; MIN: 34.58 / MAX: 130)
AI/ML Tuning Recommendations: 65.56 (SE +/- 0.07, N = 3; MIN: 32.6 / MAX: 131.97)

1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.16.1 - Device: CPU - Batch Size: 512 - Model: ResNet-50
images/sec, More Is Better
Stock: 231.18 (SE +/- 0.28, N = 3)
AI/ML Tuning Recommendations: 235.35 (SE +/- 0.20, N = 3)

Whisperfile

Whisperfile 20Aug24 - Model Size: Medium
Seconds, Fewer Is Better
Stock: 200.48 (SE +/- 0.88, N = 3)
AI/ML Tuning Recommendations: 197.24 (SE +/- 0.71, N = 3)

LiteRT

LiteRT 2024-10-15 - Model: SqueezeNet
Microseconds, Fewer Is Better
Stock: 7035.74 (SE +/- 31.31, N = 3)
AI/ML Tuning Recommendations: 6926.88 (SE +/- 31.55, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV2
us, Fewer Is Better
Stock: 9203 (SE +/- 32.13, N = 3)
AI/ML Tuning Recommendations: 9062 (SE +/- 25.31, N = 3)

XNNPACK b7b048 - Model: FP32MobileNetV1
us, Fewer Is Better
Stock: 4539 (SE +/- 42.62, N = 3)
AI/ML Tuning Recommendations: 4471 (SE +/- 54.67, N = 3)

1. (CXX) g++ options: -O3 -lrt -lm

TensorFlow

TensorFlow 2.16.1 - Device: CPU - Batch Size: 256 - Model: ResNet-50
images/sec, More Is Better
Stock: 204.33 (SE +/- 0.57, N = 3)
AI/ML Tuning Recommendations: 207.38 (SE +/- 0.72, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU
tokens/s, More Is Better
Stock: 55.63 (SE +/- 0.16, N = 4)
AI/ML Tuning Recommendations: 56.46 (SE +/- 0.14, N = 4)

Whisperfile

Whisperfile 20Aug24 - Model Size: Tiny
Seconds, Fewer Is Better
Stock: 31.89 (SE +/- 0.26, N = 3)
AI/ML Tuning Recommendations: 31.44 (SE +/- 0.25, N = 3)

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
Stock: 48.07 (SE +/- 0.07, N = 4)
AI/ML Tuning Recommendations: 48.73 (SE +/- 0.05, N = 4)

Llama.cpp b4154 - Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128
Tokens Per Second, More Is Better
Stock: 45.84 (SE +/- 0.05, N = 4)
AI/ML Tuning Recommendations: 46.45 (SE +/- 0.09, N = 4)

1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Whisper.cpp

Whisper.cpp 1.6.2 - Model: ggml-medium.en - Input: 2016 State of the Union
Seconds, Fewer Is Better
Stock: 454.29 (SE +/- 1.45, N = 3)
AI/ML Tuning Recommendations: 449.70 (SE +/- 1.24, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

XNNPACK

XNNPACK b7b048 - Model: FP32MobileNetV3Small
us, Fewer Is Better
Stock: 10488 (SE +/- 11.35, N = 3)
AI/ML Tuning Recommendations: 10387 (SE +/- 28.22, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
Stock: 149.00 (SE +/- 2.06, N = 3)
AI/ML Tuning Recommendations: 150.29 (SE +/- 1.36, N = 15)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
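Small differences such as the Prompt Processing 2048 result above are easier to judge against the reported standard errors. The sketch below expresses the gap between two means in units of their combined standard error; treating anything under about 2 as "within noise" is a rule-of-thumb assumption rather than something the Phoronix Test Suite defines, and the function name `difference_in_se_units` is hypothetical.

```python
import math


def difference_in_se_units(mean_a: float, se_a: float,
                           mean_b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined
    standard error, sqrt(se_a^2 + se_b^2)."""
    combined_se = math.hypot(se_a, se_b)
    if combined_se == 0.0:
        # Both runs report zero spread; any difference is "infinitely" clear.
        return math.inf if mean_a != mean_b else 0.0
    return abs(mean_b - mean_a) / combined_se


# Values copied from this result file.
mistral_pp2048 = difference_in_se_units(149.00, 2.06, 150.29, 1.36)
weld_porosity_fps = difference_in_se_units(14035.76, 9.10, 15437.79, 14.35)

print(f"Mistral Prompt Processing 2048: {mistral_pp2048:.2f} combined SEs apart")
print(f"Weld Porosity Detection FPS:    {weld_porosity_fps:.2f} combined SEs apart")
```

Under that assumption the Mistral Prompt Processing 2048 gap is smaller than one combined standard error, while the Weld Porosity Detection gap is far larger than the run-to-run spread.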

oneDNN

oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU
ms, Fewer Is Better
Stock: 6.70897 (SE +/- 0.03430, N = 3; MIN: 6.07)
AI/ML Tuning Recommendations: 6.65482 (SE +/- 0.01789, N = 3; MIN: 3.91)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV2
us, Fewer Is Better
Stock: 9092 (SE +/- 89.20, N = 3)
AI/ML Tuning Recommendations: 9028 (SE +/- 94.57, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

Llama.cpp

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048
Tokens Per Second, More Is Better
Stock: 306.51 (SE +/- 2.62, N = 3)
AI/ML Tuning Recommendations: 308.32 (SE +/- 2.23, N = 15)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Gemma-7b-int4-ov - Device: CPU
tokens/s, More Is Better
Stock: 37.84 (SE +/- 0.13, N = 3)
AI/ML Tuning Recommendations: 38.00 (SE +/- 0.09, N = 3)

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark
Score, More Is Better
Stock: 885.50 (SE +/- 1.94, N = 3)
AI/ML Tuning Recommendations: 887.75 (SE +/- 0.72, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5 - Model: Falcon-7b-instruct-int4-ov - Device: CPU
tokens/s, More Is Better
Stock: 51.01 (SE +/- 0.09, N = 3)
AI/ML Tuning Recommendations: 51.12 (SE +/- 0.19, N = 3)

LiteRT

LiteRT 2024-10-15 - Model: Inception V4
Microseconds, Fewer Is Better
Stock: 43898.9 (SE +/- 47.06, N = 3)
AI/ML Tuning Recommendations: 43824.9 (SE +/- 159.42, N = 3)

XNNPACK

XNNPACK b7b048 - Model: FP16MobileNetV3Small
us, Fewer Is Better
Stock: 10966 (SE +/- 410.82, N = 3)
AI/ML Tuning Recommendations: 10323 (SE +/- 21.33, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

ONNX Runtime

ONNX Runtime 1.19 - System Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 98.9 / Avg 429.9 / Max 477.0
  AI/ML Tuning Recommendations: Min 98.5 / Avg 460.1 / Max 509.9

ONNX Runtime 1.19 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 3.0 / Avg 248.9 / Max 279.9
  AI/ML Tuning Recommendations: Min 77.0 / Avg 270.8 / Max 300.9

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
  Stock: 165.09 (SE +/- 3.66, N = 15)
  AI/ML Tuning Recommendations: 158.53 (SE +/- 3.81, N = 15)
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  Stock: 6.09788 (SE +/- 0.13191, N = 15)
  AI/ML Tuning Recommendations: 6.35626 (SE +/- 0.14383, N = 15)
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
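
As a rough way to read the ONNX Runtime throughput and power figures together, the sketch below computes the relative change from the Stock run to the AI/ML Tuning Recommendations run. The helper name `pct_change` is hypothetical, and the comparison assumes the CPU Power Consumption Monitor averages listed for ONNX Runtime (248.9 W and 270.8 W) are representative of this ResNet101_DUC_HDC-12 test.

```python
def pct_change(stock: float, tuned: float) -> float:
    """Relative change from the Stock run to the tuned run, in percent."""
    return (tuned - stock) / stock * 100.0


# Values copied from the ONNX Runtime 1.19 results on this page.
throughput = pct_change(6.09788, 6.35626)  # Inferences Per Second
cpu_power = pct_change(248.9, 270.8)       # average CPU power, Watts (assumed to match this test)

print(f"Throughput change:    {throughput:+.1f}%")
print(f"Avg CPU power change: {cpu_power:+.1f}%")
```

Under that assumption, a throughput gain smaller than the accompanying power increase would point to lower performance per Watt for the tuned configuration on this test.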

LiteRT

LiteRT 2024-10-15 - System Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 97.1 / Avg 339.2 / Max 364.8
  AI/ML Tuning Recommendations: Min 97.7 / Avg 390.9 / Max 414.6

LiteRT 2024-10-15 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 3.7 / Avg 221.3 / Max 238.5
  AI/ML Tuning Recommendations: Min 99.2 / Avg 261.1 / Max 279.6

LiteRT 2024-10-15 - Model: NASNet Mobile (Microseconds, Fewer Is Better)
  Stock: 733737 (SE +/- 17324.48, N = 15)
  AI/ML Tuning Recommendations: 689396 (SE +/- 22050.37, N = 12)

Llama.cpp

Llama.cpp b4154 - System Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 97.9 / Avg 371.5 / Max 450.0
  AI/ML Tuning Recommendations: Min 97.3 / Avg 433.6 / Max 515.3

Llama.cpp b4154 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
  Stock: Min 14.5 / Avg 244.4 / Max 300.1
  AI/ML Tuning Recommendations: Min 97.3 / Avg 292.2 / Max 349.7

Llama.cpp b4154 - Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  Stock: 154.60 (SE +/- 2.60, N = 12)
  AI/ML Tuning Recommendations: 155.73 (SE +/- 3.23, N = 12)
  1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas
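
Many of the Stock and tuned results on this page differ by less than their reported standard errors. The sketch below is one simple way to check that for the Llama.cpp Prompt Processing 512 result. The function name `within_noise` and the two-standard-error threshold are assumptions chosen for illustration, not something the Phoronix Test Suite reports, and the check treats the two runs as independent.

```python
import math


def within_noise(a: float, se_a: float, b: float, se_b: float, k: float = 2.0) -> bool:
    """Return True if |a - b| is below k combined standard errors.

    The combined standard error assumes the two runs are independent.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(a - b) < k * combined_se


# Llama.cpp b4154, Prompt Processing 512 (Tokens Per Second), values from this page.
stock, stock_se = 154.60, 2.60
tuned, tuned_se = 155.73, 3.23

pp512_in_noise = within_noise(stock, stock_se, tuned, tuned_se)
print("Difference within run-to-run noise:", pp512_in_noise)
```

A result flagged this way is better read as "no measurable change" than as a small win for either configuration.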

75 Results Shown

OpenVINO:
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
  Person Vehicle Bike Detection FP16 - CPU:
    ms
Llama.cpp
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU
  Face Detection Retail FP16-INT8 - CPU
  Face Detection Retail FP16-INT8 - CPU
oneDNN:
  Deconvolution Batch shapes_3d - CPU
  Convolution Batch Shapes Auto - CPU
Llama.cpp
oneDNN
PyTorch
oneDNN
OpenVINO:
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
Llama.cpp
PyTorch
oneDNN
Llama.cpp
OpenVINO
Llama.cpp
OpenVINO
oneDNN
OpenVINO:
  Road Segmentation ADAS FP16-INT8 - CPU
  Age Gender Recognition Retail 0013 FP16 - CPU
PyTorch
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    FPS
    ms
ONNX Runtime
Whisper.cpp
Whisperfile
PyTorch
Llama.cpp
OpenVINO
XNNPACK
OpenVINO:
  Person Re-Identification Retail FP16 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
  Handwritten English Recognition FP16-INT8 - CPU
LiteRT
XNNPACK
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    FPS
    ms
TensorFlow
Whisperfile
LiteRT
XNNPACK:
  FP32MobileNetV2
  FP32MobileNetV1
TensorFlow
OpenVINO GenAI
Whisperfile
Llama.cpp:
  CPU BLAS - Mistral-7B-Instruct-v0.3-Q8_0 - Text Generation 128
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Text Generation 128
Whisper.cpp
XNNPACK
Llama.cpp
oneDNN
XNNPACK
Llama.cpp
OpenVINO GenAI
Numpy Benchmark
OpenVINO GenAI
LiteRT
XNNPACK
ONNX Runtime:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
LiteRT:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
LiteRT
Llama.cpp:
  System Power Consumption Monitor
  CPU Power Consumption Monitor
Llama.cpp