m6i.8xlarge
amazon testing on Ubuntu 22.04 via the Phoronix Test Suite.

m6i.8xlarge:

  Processor: Intel Xeon Platinum 8375C (16 Cores / 32 Threads)
  Motherboard: Amazon EC2 m6i.8xlarge (1.0 BIOS)
  Chipset: Intel 440FX 82441FX PMC
  Memory: 1 x 128 GB DDR4-3200MT/s
  Disk: 537GB Amazon Elastic Block Store
  Graphics: EFI VGA
  Network: Amazon Elastic

  OS: Ubuntu 22.04
  Kernel: 6.5.0-1017-aws (x86_64)
  Vulkan: 1.3.255
  Compiler: GCC 11.4.0
  File-System: ext4
  Screen Resolution: 800x600
  System Layer: amazon

ONNX Runtime 1.17 - Device: CPU - Inferences Per Second (higher is better)

  GPT-2 - Executor: Parallel ...................... 132.96
  GPT-2 - Executor: Standard ...................... 160.21
  yolov4 - Executor: Parallel ..................... 11.34
  yolov4 - Executor: Standard ..................... 15.44
  T5 Encoder - Executor: Parallel ................. 190.74
  T5 Encoder - Executor: Standard ................. 255.15
  bertsquad-12 - Executor: Parallel ............... 14.78
  bertsquad-12 - Executor: Standard ............... 23.33
  CaffeNet 12-int8 - Executor: Parallel ........... 554.93
  CaffeNet 12-int8 - Executor: Standard ........... 808.41
  fcn-resnet101-11 - Executor: Parallel ........... 1.74709
  fcn-resnet101-11 - Executor: Standard ........... 3.92245
  ArcFace ResNet-100 - Executor: Parallel ......... 34.70
  ArcFace ResNet-100 - Executor: Standard ......... 44.47
  ResNet50 v1-12-int8 - Executor: Parallel ........ 287.50
  ResNet50 v1-12-int8 - Executor: Standard ........ 332.25
  super-resolution-10 - Executor: Parallel ........ 98.62
  super-resolution-10 - Executor: Standard ........ 131.79
  Faster R-CNN R-50-FPN-int8 - Executor: Parallel . 4.55004
  Faster R-CNN R-50-FPN-int8 - Executor: Standard . 4.88161
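The Parallel/Standard split above refers to ONNX Runtime's session execution mode. As a minimal sketch of how the two configurations are selected through the Python API (the "gpt2.onnx" path is a placeholder; the test profile's exact session options are not shown in this output):

    import onnxruntime as ort

    def make_session(model_path: str, parallel: bool) -> ort.InferenceSession:
        """Build a CPU session using either the Standard (sequential)
        or Parallel executor, mirroring the two columns above."""
        opts = ort.SessionOptions()
        # Standard = ORT_SEQUENTIAL, Parallel = ORT_PARALLEL
        opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                               else ort.ExecutionMode.ORT_SEQUENTIAL)
        return ort.InferenceSession(model_path, sess_options=opts,
                                    providers=["CPUExecutionProvider"])

    # Placeholder model path, not taken from the benchmark output.
    sess = make_session("gpt2.onnx", parallel=True)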
OpenCV 4.7 - ms (lower is better)

  Core ......................... 100806
  Video ........................ 26336
  Graph API .................... 248216
  Stitching .................... 351122
  Features 2D .................. 72733
  Image Processing ............. 141105
  Object Detection ............. 28606
  DNN - Deep Neural Network .... 32175

Whisper.cpp 1.6.2 - Input: 2016 State of the Union - Seconds (lower is better)

  Model: ggml-base.en .......... 131.72
  Model: ggml-small.en ......... 347.07
  Model: ggml-medium.en ........ 977.35

Llama.cpp b3067 - Tokens Per Second (higher is better)

  Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf ... 12.49
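Not part of the PTS output, but a useful back-of-envelope check on the Llama.cpp figure: token generation for a dense GGUF model is typically memory-bandwidth bound, since each generated token streams the full weight set from RAM. Assuming Q8_0's block layout of roughly 1.06 bytes per weight:

    # Rough bandwidth estimate implied by the measured 12.49 tokens/s.
    params = 8.0e9               # Llama 3 8B, approximate parameter count
    bytes_per_weight = 34 / 32   # Q8_0: 32 one-byte weights + 2-byte scale per block
    model_bytes = params * bytes_per_weight          # ~8.5 GB of weights
    tokens_per_second = 12.49                        # measured above
    bandwidth_gbs = model_bytes * tokens_per_second / 1e9
    print(f"~{bandwidth_gbs:.0f} GB/s effective memory bandwidth")  # ~106 GB/s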
oneDNN 3.4 - Engine: CPU - ms (lower is better)

  IP Shapes 1D .......................... 1.10033
  IP Shapes 3D .......................... 2.05162
  Convolution Batch Shapes Auto ......... 3.13406
  Deconvolution Batch shapes_1d ......... 5.67627
  Deconvolution Batch shapes_3d ......... 2.71994
  Recurrent Neural Network Training ..... 1559.67
  Recurrent Neural Network Inference .... 840.25

OpenVINO 2024.0 - Device: CPU - FPS (higher is better) / ms (lower is better)

  Face Detection FP16 ........................... 6.86 FPS / 1158.75 ms
  Person Detection FP16 ......................... 52.40 FPS / 152.58 ms
  Person Detection FP32 ......................... 52.39 FPS / 152.55 ms
  Vehicle Detection FP16 ........................ 348.67 FPS / 22.91 ms
  Face Detection FP16-INT8 ...................... 26.28 FPS / 303.90 ms
  Face Detection Retail FP16 .................... 1268.18 FPS / 6.28 ms
  Road Segmentation ADAS FP16 ................... 163.45 FPS / 48.91 ms
  Vehicle Detection FP16-INT8 ................... 1158.41 FPS / 6.88 ms
  Weld Porosity Detection FP16 .................. 679.27 FPS / 23.49 ms
  Face Detection Retail FP16-INT8 ............... 3313.29 FPS / 4.82 ms
  Road Segmentation ADAS FP16-INT8 .............. 330.42 FPS / 24.18 ms
  Machine Translation EN To DE FP16 ............. 84.26 FPS / 94.91 ms
  Weld Porosity Detection FP16-INT8 ............. 2590.00 FPS / 6.16 ms
  Person Vehicle Bike Detection FP16 ............ 570.28 FPS / 14.00 ms
  Noise Suppression Poconet-Like FP16 ........... 765.42 FPS / 20.77 ms
  Handwritten English Recognition FP16 .......... 276.19 FPS / 57.90 ms
  Person Re-Identification Retail FP16 .......... 936.38 FPS / 17.05 ms
  Age Gender Recognition Retail 0013 FP16 ....... 17231.32 FPS / 0.92 ms
  Handwritten English Recognition FP16-INT8 ..... 316.14 FPS / 50.58 ms
  Age Gender Recognition Retail 0013 FP16-INT8 .. 36965.98 FPS / 0.42 ms
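For context, a minimal sketch of compiling one of the above networks for CPU with the OpenVINO 2024 Python API. The model path is a placeholder, and the THROUGHPUT performance hint is an assumption about how FPS-oriented runs are configured; the exact settings used by the test profile are not shown in this output:

    import openvino as ov

    core = ov.Core()
    # Placeholder path for one of the Open Model Zoo networks listed above.
    model = core.read_model("face-detection.xml")
    # THROUGHPUT lets the runtime choose stream/batch counts aimed at the
    # FPS-style numbers; a LATENCY hint would target the ms side instead.
    compiled = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
    request = compiled.create_infer_request()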
ONNX Runtime 1.17 - Device: CPU - Inference Time Cost, ms (lower is better)

  GPT-2 - Executor: Parallel ...................... 7.51583
  GPT-2 - Executor: Standard ...................... 6.28573
  yolov4 - Executor: Parallel ..................... 88.19
  yolov4 - Executor: Standard ..................... 64.75
  T5 Encoder - Executor: Parallel ................. 5.24137
  T5 Encoder - Executor: Standard ................. 3.91611
  bertsquad-12 - Executor: Parallel ............... 67.65
  bertsquad-12 - Executor: Standard ............... 42.86
  CaffeNet 12-int8 - Executor: Parallel ........... 1.80072
  CaffeNet 12-int8 - Executor: Standard ........... 1.23548
  fcn-resnet101-11 - Executor: Parallel ........... 572.45
  fcn-resnet101-11 - Executor: Standard ........... 257.87
  ArcFace ResNet-100 - Executor: Parallel ......... 28.82
  ArcFace ResNet-100 - Executor: Standard ......... 22.87
  ResNet50 v1-12-int8 - Executor: Parallel ........ 3.47731
  ResNet50 v1-12-int8 - Executor: Standard ........ 3.00865
  super-resolution-10 - Executor: Parallel ........ 10.14
  super-resolution-10 - Executor: Standard ........ 8.00676
  Faster R-CNN R-50-FPN-int8 - Executor: Parallel . 219.80
  Faster R-CNN R-50-FPN-int8 - Executor: Standard . 204.97
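The time-cost figures above are, to a close approximation, the reciprocals of the throughput figures in the first ONNX Runtime table. A quick consistency check on a few of the reported pairs:

    # Inference time (ms) should be roughly 1000 / (inferences per second).
    pairs = {  # model / executor: (inferences per second, reported ms)
        "GPT-2 / Standard": (160.21, 6.28573),
        "yolov4 / Standard": (15.44, 64.75),
        "fcn-resnet101-11 / Parallel": (1.74709, 572.45),
    }
    for name, (ips, reported_ms) in pairs.items():
        derived_ms = 1000.0 / ips
        print(f"{name}: derived {derived_ms:.2f} ms vs reported {reported_ms:.2f} ms")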