AMD EPYC Turin AI/ML Tuning Guide

AMD EPYC 9655P benchmarked out of the box ("Stock") and after applying the AMD EPYC 9005 BIOS/workload tuning guide recommendations for AI/ML workloads: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/58467_amd-epyc-9005-tg-bios-and-workload.pdf. Benchmarks by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2411286-NE-AMDEPYCTU24
Result identifiers:
- Stock (run November 28; test duration: 5 hours, 44 minutes)
- AI/ML Tuning Recommendations (run November 28; test duration: 6 hours, 7 minutes)


System Details

Processor: AMD EPYC 9655P 96-Core @ 2.60GHz (96 Cores / 192 Threads)
Motherboard: Supermicro Super Server H13SSL-N v1.01 (3.0 BIOS)
Chipset: AMD 1Ah
Memory: 12 x 64GB DDR5-6000MT/s Micron MTC40F2046S1RC64BDY QSFF
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 24.10
Kernel: 6.12.0-rc7-linux-pm-next-phx (x86_64)
Desktop: GNOME Shell 47.0
Display Server: X Server
Compiler: GCC 14.2.0
File-System: ext4
Screen Resolution: 1024x768

System Notes
- Transparent Huge Pages: madvise
- Compiler configuration: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xb002116
- Python 3.12.7
- Security: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected

[Chart: Stock vs. AI/ML Tuning Recommendations per-test comparison, baseline to +10.3%. The tuned configuration leads in the vast majority of tests, topped by OpenVINO Weld Porosity Detection FP16-INT8 at roughly 10%, with most other gains in the 2% to 7% range across OpenVINO, OpenVINO GenAI, oneDNN, Llama.cpp, PyTorch, ONNX Runtime, TensorFlow, LiteRT, XNNPACK, Whisper.cpp, and Whisperfile results. A few OpenVINO GenAI time-to-first-token results (Gemma-7b, Falcon-7b, Phi-3-mini) favor the Stock configuration.]

[Summary table: all 76 Stock vs. AI/ML Tuning Recommendations results; the individual results are broken out in the sections below.]

Whisper.cpp

Whisper.cpp 1.6.2, Model: ggml-medium.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 449.70 (SE +/- 1.24, N = 3)
  Stock: 454.29 (SE +/- 1.45, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

ONNX Runtime

ONNX Runtime 1.19, Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
  AI/ML Tuning Recommendations: 158.53 (SE +/- 3.81, N = 15)
  Stock: 165.09 (SE +/- 3.66, N = 15)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19, Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AI/ML Tuning Recommendations: 6.35626 (SE +/- 0.14383, N = 15)
  Stock: 6.09788 (SE +/- 0.13191, N = 15)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). Note with the Phoronix Test Suite there is also pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries if desired for complementary metrics. Learn more via the OpenBenchmarking.org test page.
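As a rough illustration of what the images/sec figures below measure, here is a minimal Python sketch that times forward passes of a Keras ResNet-50 on CPU. This is not the tf_cnn_benchmarks.py harness used by the test profile; the model constructor, batch size, and iteration counts here are illustrative assumptions.

    import time
    import numpy as np
    import tensorflow as tf

    # Illustrative stand-in for the ResNet-50 inference path that tf_cnn_benchmarks exercises.
    model = tf.keras.applications.ResNet50(weights=None)   # random weights; timing only
    batch = np.random.rand(64, 224, 224, 3).astype("float32")  # batch size chosen arbitrarily

    for _ in range(3):  # warm-up so graph tracing is excluded from the timing
        model.predict(batch, verbose=0)

    runs = 10
    start = time.perf_counter()
    for _ in range(runs):
        model.predict(batch, verbose=0)
    elapsed = time.perf_counter() - start

    print(f"{runs * batch.shape[0] / elapsed:.1f} images/sec")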

TensorFlow 2.16.1, Device: CPU - Batch Size: 512 - Model: ResNet-50 (images/sec, More Is Better)
  AI/ML Tuning Recommendations: 235.35 (SE +/- 0.20, N = 3)
  Stock: 231.18 (SE +/- 0.28, N = 3)

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 150.29 (SE +/- 1.36, N = 15)
  Stock: 149.00 (SE +/- 2.06, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

LiteRT

LiteRT 2024-10-15, Model: NASNet Mobile (Microseconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 689396 (SE +/- 22050.37, N = 12)
  Stock: 733737 (SE +/- 17324.48, N = 15)

Whisper.cpp

Whisper.cpp 1.6.2, Model: ggml-small.en - Input: 2016 State of the Union (Seconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 214.63 (SE +/- 0.68, N = 3)
  Stock: 221.63 (SE +/- 2.33, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vnni

XNNPACK

XNNPACK b7b048, Model: QS8MobileNetV2 (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 9813 (SE +/- 87.21, N = 3)
  Stock: 10042 (SE +/- 154.48, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV3Small (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 10323 (SE +/- 21.33, N = 3)
  Stock: 10966 (SE +/- 410.82, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV2 (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 9028 (SE +/- 94.57, N = 3)
  Stock: 9092 (SE +/- 89.20, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV1 (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 4513 (SE +/- 10.73, N = 3)
  Stock: 4634 (SE +/- 15.14, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV3Small (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 10387 (SE +/- 28.22, N = 3)
  Stock: 10488 (SE +/- 11.35, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV2 (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 9062 (SE +/- 25.31, N = 3)
  Stock: 9203 (SE +/- 32.13, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV1 (us, Fewer Is Better)
  AI/ML Tuning Recommendations: 4471 (SE +/- 54.67, N = 3)
  Stock: 4539 (SE +/- 42.62, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

Whisperfile

Whisperfile 20Aug24, Model Size: Medium (Seconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 197.24 (SE +/- 0.71, N = 3)
  Stock: 200.48 (SE +/- 0.88, N = 3)

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 101.71 (SE +/- 0.67, N = 15)
  Stock: 97.19 (SE +/- 1.13, N = 4)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

TensorFlow

TensorFlow 2.16.1, Device: CPU - Batch Size: 256 - Model: ResNet-50 (images/sec, More Is Better)
  AI/ML Tuning Recommendations: 207.38 (SE +/- 0.72, N = 3)
  Stock: 204.33 (SE +/- 0.57, N = 3)

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 308.32 (SE +/- 2.23, N = 15)
  Stock: 306.51 (SE +/- 2.62, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

ONNX Runtime

ONNX Runtime 1.19, Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
  AI/ML Tuning Recommendations: 3.50697 (SE +/- 0.01794, N = 3)
  Stock: 3.62341 (SE +/- 0.03395, N = 7)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19, Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  AI/ML Tuning Recommendations: 285.10 (SE +/- 1.46, N = 3)
  Stock: 276.08 (SE +/- 2.53, N = 7)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

PyTorch

This is a benchmark of PyTorch making use of pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Learn more via the OpenBenchmarking.org test page.
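For context on the batches/sec metric, below is a minimal sketch of the pytorch-benchmark usage pattern, assuming the package's documented benchmark() entry point; the small sample batch and num_runs value are arbitrary choices (the results here were gathered at batch sizes of 256 and 512).

    import torch
    from torchvision.models import resnet152
    from pytorch_benchmark import benchmark  # pip install pytorch-benchmark

    model = resnet152(weights=None)        # random weights are fine for throughput timing
    sample = torch.randn(8, 3, 224, 224)   # (batch, channels, height, width)
    results = benchmark(model, sample, num_runs=100)  # times inference, reports throughput
    print(results)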

PyTorch 2.2.1, Device: CPU - Batch Size: 512 - Model: ResNet-152 (batches/sec, More Is Better)
  AI/ML Tuning Recommendations: 21.71 (SE +/- 0.18, N = 3; MIN: 20.13 / MAX: 22.23)
  Stock: 20.60 (SE +/- 0.10, N = 3; MIN: 19.3 / MAX: 21.02)

PyTorch 2.2.1, Device: CPU - Batch Size: 256 - Model: ResNet-152 (batches/sec, More Is Better)
  AI/ML Tuning Recommendations: 21.79 (SE +/- 0.27, N = 3; MIN: 20.38 / MAX: 22.54)
  Stock: 20.78 (SE +/- 0.04, N = 3; MIN: 19.72 / MAX: 21.04)

Numpy Benchmark

This is a test to obtain the general Numpy performance. Learn more via the OpenBenchmarking.org test page.
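The Numpy score aggregates timings of common array kernels into a single figure. Below is a toy example of the kind of operation such a suite exercises, timing a dense matrix multiply; the sizes and iteration count are illustrative, not the suite's actual workloads.

    import time
    import numpy as np

    a = np.random.rand(2048, 2048)
    b = np.random.rand(2048, 2048)

    runs = 10
    start = time.perf_counter()
    for _ in range(runs):
        np.dot(a, b)  # dense matmul, a classic kernel for NumPy throughput tests
    elapsed = time.perf_counter() - start
    print(f"avg matmul time: {elapsed / runs * 1000:.1f} ms")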

Numpy Benchmark (Score, More Is Better)
  AI/ML Tuning Recommendations: 887.75 (SE +/- 0.72, N = 3)
  Stock: 885.50 (SE +/- 1.94, N = 3)

Whisperfile

Whisperfile 20Aug24, Model Size: Small (Seconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 88.04 (SE +/- 0.57, N = 3)
  Stock: 90.67 (SE +/- 0.71, N = 3)

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 155.73 (SE +/- 3.23, N = 12)
  Stock: 154.60 (SE +/- 2.60, N = 12)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

Llama.cpp b4154, Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 2048 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 152.92 (SE +/- 1.38, N = 3)
  Stock: 144.53 (SE +/- 0.97, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

oneDNN

oneDNN 3.6, Harness: Recurrent Neural Network Training - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 406.45 (SE +/- 0.37, N = 3; MIN: 400.13)
  Stock: 425.72 (SE +/- 0.52, N = 3; MIN: 419.47)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN 3.6, Harness: Recurrent Neural Network Inference - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 262.52 (SE +/- 0.27, N = 3; MIN: 257.81)
  Stock: 276.38 (SE +/- 0.68, N = 3; MIN: 269.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

OpenVINO

OpenVINO 2024.5, Model: Noise Suppression Poconet-Like FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 13.38 (SE +/- 0.02, N = 3; MIN: 6.98 / MAX: 34.62)
  Stock: 13.68 (SE +/- 0.02, N = 3; MIN: 7.09 / MAX: 36.01)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Noise Suppression Poconet-Like FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 6891.64 (SE +/- 9.90, N = 3)
  Stock: 6747.48 (SE +/- 11.12, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Machine Translation EN To DE FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 54.43 (SE +/- 0.03, N = 3; MIN: 28.33 / MAX: 92.29)
  Stock: 56.24 (SE +/- 0.02, N = 3; MIN: 29.3 / MAX: 94.69)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Machine Translation EN To DE FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 881.22 (SE +/- 0.40, N = 3)
  Stock: 852.83 (SE +/- 0.22, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Person Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 65.56 (SE +/- 0.07, N = 3; MIN: 32.6 / MAX: 131.97)
  Stock: 66.89 (SE +/- 0.06, N = 3; MIN: 34.58 / MAX: 130)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Person Detection FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 730.73 (SE +/- 0.74, N = 3)
  Stock: 716.20 (SE +/- 0.66, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 1024 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 101.77 (SE +/- 1.16, N = 3)
  Stock: 96.99 (SE +/- 0.34, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO

OpenVINO 2024.5, Model: Person Vehicle Bike Detection FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 6.66 (SE +/- 0.01, N = 3; MIN: 3.65 / MAX: 22.09)
  Stock: 7.09 (SE +/- 0.01, N = 3; MIN: 4.15 / MAX: 20.16)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Person Vehicle Bike Detection FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 7144.84 (SE +/- 8.11, N = 3)
  Stock: 6720.98 (SE +/- 13.33, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Person Re-Identification Retail FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 4.41 (SE +/- 0.00, N = 3; MIN: 2.45 / MAX: 17.46)
  Stock: 4.53 (SE +/- 0.01, N = 3; MIN: 1.95 / MAX: 23.94)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Person Re-Identification Retail FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 10800.12 (SE +/- 6.08, N = 3)
  Stock: 10525.86 (SE +/- 12.77, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 18.16 (SE +/- 0.02, N = 3; MIN: 7.68 / MAX: 40.38)
  Stock: 18.99 (SE +/- 0.06, N = 3; MIN: 9.19 / MAX: 39)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Road Segmentation ADAS FP16-INT8 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 2630.42 (SE +/- 2.42, N = 3)
  Stock: 2517.89 (SE +/- 7.45, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 0.43 (SE +/- 0.00, N = 3; MIN: 0.15 / MAX: 23.94)
  Stock: 0.45 (SE +/- 0.01, N = 3; MIN: 0.16 / MAX: 25.14)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 146329.80 (SE +/- 528.15, N = 3)
  Stock: 140497.34 (SE +/- 313.81, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Face Detection Retail FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 3.71 (SE +/- 0.00, N = 3; MIN: 1.71 / MAX: 19.33)
  Stock: 3.94 (SE +/- 0.01, N = 3; MIN: 1.76 / MAX: 17.65)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Face Detection Retail FP16-INT8 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 25050.47 (SE +/- 17.84, N = 3)
  Stock: 23587.02 (SE +/- 17.66, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Vehicle Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 5.49 (SE +/- 0.01, N = 3; MIN: 2.47 / MAX: 19.56)
  Stock: 5.77 (SE +/- 0.00, N = 3; MIN: 2.28 / MAX: 21.36)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Vehicle Detection FP16-INT8 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 8691.00 (SE +/- 7.22, N = 3)
  Stock: 8270.99 (SE +/- 3.96, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Handwritten English Recognition FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 26.28 (SE +/- 0.02, N = 3; MIN: 15.86 / MAX: 40.89)
  Stock: 26.94 (SE +/- 0.02, N = 3; MIN: 15.65 / MAX: 45.15)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Handwritten English Recognition FP16-INT8 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 3649.63 (SE +/- 3.35, N = 3)
  Stock: 3559.87 (SE +/- 2.96, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Weld Porosity Detection FP16-INT8 - Device: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 6.09 (SE +/- 0.00, N = 3; MIN: 2.21 / MAX: 23.86)
  Stock: 6.72 (SE +/- 0.00, N = 3; MIN: 2.22 / MAX: 22.18)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

OpenVINO 2024.5, Model: Weld Porosity Detection FP16-INT8 - Device: CPU (FPS, More Is Better)
  AI/ML Tuning Recommendations: 15437.79 (SE +/- 14.35, N = 3)
  Stock: 14035.76 (SE +/- 9.10, N = 3)
1. (CXX) g++ options: -fPIC -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -shared -ldl -lstdc++fs

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 77.05 (SE +/- 0.77, N = 5)
  Stock: 72.40 (SE +/- 0.83, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

LiteRT

LiteRT 2024-10-15, Model: Inception V4 (Microseconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 43824.9 (SE +/- 159.42, N = 3)
  Stock: 43898.9 (SE +/- 47.06, N = 3)

LiteRT 2024-10-15, Model: Mobilenet Float (Microseconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 4335.67 (SE +/- 7.66, N = 3)
  Stock: 4438.52 (SE +/- 11.59, N = 3)

LiteRT 2024-10-15, Model: SqueezeNet (Microseconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 6926.88 (SE +/- 31.55, N = 3)
  Stock: 7035.74 (SE +/- 31.31, N = 3)

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Prompt Processing 512 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 76.41 (SE +/- 0.99, N = 3)
  Stock: 72.99 (SE +/- 0.98, N = 3)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

PyTorch

PyTorch 2.2.1, Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better)
  AI/ML Tuning Recommendations: 53.13 (SE +/- 0.28, N = 3; MIN: 46.57 / MAX: 54.35)
  Stock: 51.61 (SE +/- 0.15, N = 3; MIN: 45.56 / MAX: 52.57)

PyTorch 2.2.1, Device: CPU - Batch Size: 512 - Model: ResNet-50 (batches/sec, More Is Better)
  AI/ML Tuning Recommendations: 53.34 (SE +/- 0.26, N = 3; MIN: 49 / MAX: 54.59)
  Stock: 51.48 (SE +/- 0.15, N = 3; MIN: 46.04 / MAX: 52.41)

OpenVINO GenAI

OpenVINO GenAI 2024.5, Model: Gemma-7b-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 26.31 (SE +/- 0.06, N = 3)
  Stock: 26.43 (SE +/- 0.09, N = 3)

OpenVINO GenAI 2024.5, Model: Gemma-7b-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 35.43 (SE +/- 0.08, N = 3)
  Stock: 36.13 (SE +/- 0.20, N = 3)

OpenVINO GenAI 2024.5, Model: Gemma-7b-int4-ov - Device: CPU (tokens/s, More Is Better)
  AI/ML Tuning Recommendations: 38.00 (SE +/- 0.09, N = 3)
  Stock: 37.84 (SE +/- 0.13, N = 3)

Whisperfile

Whisperfile 20Aug24, Model Size: Tiny (Seconds, Fewer Is Better)
  AI/ML Tuning Recommendations: 31.44 (SE +/- 0.25, N = 3)
  Stock: 31.89 (SE +/- 0.26, N = 3)

OpenVINO GenAI

OpenVINO GenAI 2024.5, Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 19.56 (SE +/- 0.07, N = 3)
  Stock: 19.60 (SE +/- 0.04, N = 3)

OpenVINO GenAI 2024.5, Model: Falcon-7b-instruct-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better)
  Stock: 29.07 (SE +/- 0.03, N = 3)
  AI/ML Tuning Recommendations: 30.70 (SE +/- 0.18, N = 3)

OpenVINO GenAI 2024.5, Model: Falcon-7b-instruct-int4-ov - Device: CPU (tokens/s, More Is Better)
  AI/ML Tuning Recommendations: 51.12 (SE +/- 0.19, N = 3)
  Stock: 51.01 (SE +/- 0.09, N = 3)

oneDNN

oneDNN 3.6, Harness: Deconvolution Batch shapes_1d - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 6.65482 (SE +/- 0.01789, N = 3; MIN: 3.91)
  Stock: 6.70897 (SE +/- 0.03430, N = 3; MIN: 6.07)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Llama-3.1-Tulu-3-8B-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 46.45 (SE +/- 0.09, N = 4)
  Stock: 45.84 (SE +/- 0.05, N = 4)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

oneDNN

oneDNN 3.6, Harness: IP Shapes 1D - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 0.507355 (SE +/- 0.001097, N = 4; MIN: 0.46)
  Stock: 0.535874 (SE +/- 0.001151, N = 4; MIN: 0.49)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 48.73 (SE +/- 0.05, N = 4)
  Stock: 48.07 (SE +/- 0.07, N = 4)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

OpenVINO GenAI

OpenVINO GenAI 2024.5, Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time Per Output Token (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 17.71 (SE +/- 0.04, N = 4)
  Stock: 17.98 (SE +/- 0.05, N = 4)

OpenVINO GenAI 2024.5, Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU - Time To First Token (ms, Fewer Is Better)
  Stock: 24.17 (SE +/- 0.14, N = 4)
  AI/ML Tuning Recommendations: 25.66 (SE +/- 0.07, N = 4)

OpenVINO GenAI 2024.5, Model: Phi-3-mini-128k-instruct-int4-ov - Device: CPU (tokens/s, More Is Better)
  AI/ML Tuning Recommendations: 56.46 (SE +/- 0.14, N = 4)
  Stock: 55.63 (SE +/- 0.16, N = 4)

oneDNN

oneDNN 3.6, Harness: IP Shapes 3D - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 0.254123 (SE +/- 0.000491, N = 5; MIN: 0.24)
  Stock: 0.265564 (SE +/- 0.000944, N = 5; MIN: 0.24)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

Llama.cpp

Llama.cpp b4154, Backend: CPU BLAS - Model: granite-3.0-3b-a800m-instruct-Q8_0 - Test: Text Generation 128 (Tokens Per Second, More Is Better)
  AI/ML Tuning Recommendations: 95.42 (SE +/- 0.43, N = 6)
  Stock: 92.82 (SE +/- 0.49, N = 6)
1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -fopenmp -march=native -mtune=native -lopenblas

oneDNN

oneDNN 3.6, Harness: Convolution Batch Shapes Auto - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 0.321833 (SE +/- 0.001024, N = 7; MIN: 0.31)
  Stock: 0.341475 (SE +/- 0.000295, N = 7; MIN: 0.32)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

oneDNN 3.6, Harness: Deconvolution Batch shapes_3d - Engine: CPU (ms, Fewer Is Better)
  AI/ML Tuning Recommendations: 0.677050 (SE +/- 0.000482, N = 9; MIN: 0.58)
  Stock: 0.718484 (SE +/- 0.001205, N = 9; MIN: 0.62)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl

76 Results Shown

Whisper.cpp
ONNX Runtime:
  ResNet101_DUC_HDC-12 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
TensorFlow
Llama.cpp
LiteRT
Whisper.cpp
XNNPACK:
  QS8MobileNetV2
  FP16MobileNetV3Small
  FP16MobileNetV2
  FP16MobileNetV1
  FP32MobileNetV3Small
  FP32MobileNetV2
  FP32MobileNetV1
Whisperfile
Llama.cpp
TensorFlow
Llama.cpp
ONNX Runtime:
  ResNet50 v1-12-int8 - CPU - Standard:
    Inference Time Cost (ms)
    Inferences Per Second
PyTorch:
  CPU - 512 - ResNet-152
  CPU - 256 - ResNet-152
Numpy Benchmark
Whisperfile
Llama.cpp:
  CPU BLAS - granite-3.0-3b-a800m-instruct-Q8_0 - Prompt Processing 512
  CPU BLAS - Llama-3.1-Tulu-3-8B-Q8_0 - Prompt Processing 2048
oneDNN:
  Recurrent Neural Network Training - CPU
  Recurrent Neural Network Inference - CPU
OpenVINO:
  Noise Suppression Poconet-Like FP16 - CPU:
    ms
    FPS
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Person Detection FP16 - CPU:
    ms
    FPS
Llama.cpp
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU:
    ms
    FPS
  Person Re-Identification Retail FP16 - CPU:
    ms
    FPS
  Road Segmentation ADAS FP16-INT8 - CPU:
    ms
    FPS
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
    FPS
  Face Detection Retail FP16-INT8 - CPU:
    ms
    FPS
  Vehicle Detection FP16-INT8 - CPU:
    ms
    FPS
  Handwritten English Recognition FP16-INT8 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16-INT8 - CPU:
    ms
    FPS
Llama.cpp
LiteRT:
  Inception V4
  Mobilenet Float
  SqueezeNet
Llama.cpp
PyTorch:
  CPU - 256 - ResNet-50
  CPU - 512 - ResNet-50
OpenVINO GenAI:
  Gemma-7b-int4-ov - CPU - Time Per Output Token
  Gemma-7b-int4-ov - CPU - Time To First Token
  Gemma-7b-int4-ov - CPU
Whisperfile
OpenVINO GenAI:
  Falcon-7b-instruct-int4-ov - CPU - Time Per Output Token
  Falcon-7b-instruct-int4-ov - CPU - Time To First Token
  Falcon-7b-instruct-int4-ov - CPU
oneDNN
Llama.cpp
oneDNN
Llama.cpp
OpenVINO GenAI:
  Phi-3-mini-128k-instruct-int4-ov - CPU - Time Per Output Token
  Phi-3-mini-128k-instruct-int4-ov - CPU - Time To First Token
  Phi-3-mini-128k-instruct-int4-ov - CPU
oneDNN
Llama.cpp
oneDNN:
  Convolution Batch Shapes Auto - CPU
  Deconvolution Batch shapes_3d - CPU