m6i.8xlarge Amazon testing on Ubuntu 22.04 via the Phoronix Test Suite.

m6i.8xlarge:
  Processor: Intel Xeon Platinum 8375C (16 Cores / 32 Threads)
  Motherboard: Amazon EC2 m6i.8xlarge (1.0 BIOS)
  Chipset: Intel 440FX 82441FX PMC
  Memory: 1 x 128 GB DDR4-3200MT/s
  Disk: 322GB Amazon Elastic Block Store
  Graphics: EFI VGA
  Network: Amazon Elastic
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1017-aws (x86_64)
  Vulkan: 1.3.255
  Compiler: GCC 11.4.0
  File-System: ext4
  Screen Resolution: 800x600
  System Layer: amazon

ONNX Runtime 1.17 - Device: CPU - Inferences Per Second (higher is better)

  Model                           Executor: Parallel   Executor: Standard
  GPT-2                                       128.29               152.48
  yolov4                                       11.43                13.52
  T5 Encoder                                  193.01               219.93
  bertsquad-12                                 15.07                18.45
  CaffeNet 12-int8                            557.47               662.60
  fcn-resnet101-11                           1.83424              2.99347
  ArcFace ResNet-100                           35.19                35.39
  ResNet50 v1-12-int8                         254.41               279.86
  super-resolution-10                          97.76               139.17
  Faster R-CNN R-50-FPN-int8                 4.47456              4.86425
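The Parallel and Standard executor columns presumably correspond to ONNX Runtime's parallel and (default) sequential execution modes. A minimal Python sketch of how those modes are selected, assuming onnxruntime is installed; the model path, input dict, and thread count below are placeholders, not part of the published result file:

  import onnxruntime as ort

  def make_session(model_path: str, parallel: bool) -> ort.InferenceSession:
      """Build a CPU InferenceSession in either parallel or sequential mode."""
      opts = ort.SessionOptions()
      # "Parallel" in the table above maps to the parallel executor,
      # "Standard" to the default sequential executor.
      opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                             else ort.ExecutionMode.ORT_SEQUENTIAL)
      # Thread count is an assumption; the test profile tunes this itself.
      opts.intra_op_num_threads = 32
      return ort.InferenceSession(model_path, opts,
                                  providers=["CPUExecutionProvider"])

  # Hypothetical usage ("gpt2.onnx" and `inputs` are placeholders):
  # sess = make_session("gpt2.onnx", parallel=False)
  # outputs = sess.run(None, inputs)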
Llama.cpp b3067 - Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf - Tokens Per Second (higher is better)

  m6i.8xlarge: 11.73

ONNX Runtime 1.17 - Device: CPU - Inference Time Cost in ms (lower is better)

  Model                           Executor: Parallel   Executor: Standard
  GPT-2                                      7.78959              6.60049
  yolov4                                       87.52                75.99
  T5 Encoder                                 5.18002              4.61890
  bertsquad-12                                 66.37                55.88
  CaffeNet 12-int8                           1.79240              1.54130
  fcn-resnet101-11                            546.06               357.87
  ArcFace ResNet-100                           28.42                28.25
  ResNet50 v1-12-int8                        3.92952              3.58213
  super-resolution-10                          10.23              7.56041
  Faster R-CNN R-50-FPN-int8                  223.50               205.58
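Since each ONNX Runtime configuration runs one inference at a time, the throughput and latency tables above should be roughly reciprocal (inferences per second ≈ 1000 / inference time in ms), and the reported figures agree to within a few percent. A quick check using values copied from the tables:

  # Sanity check: throughput vs. latency from the two ONNX Runtime tables.
  results = {
      # model: (Standard inf/s, Standard ms) -- values copied from the report
      "GPT-2":            (152.48, 6.60049),
      "yolov4":           (13.52, 75.99),
      "T5 Encoder":       (219.93, 4.61890),
      "fcn-resnet101-11": (2.99347, 357.87),
  }

  for model, (throughput, latency_ms) in results.items():
      implied = 1000.0 / latency_ms  # throughput implied by the latency
      print(f"{model:18s} reported {throughput:8.2f} inf/s, "
            f"implied {implied:8.2f} inf/s")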
Whisper.cpp 1.6.2 - Input: 2016 State of the Union - Seconds (lower is better)

  Model              Seconds
  ggml-base.en        134.98
  ggml-small.en       350.88
  ggml-medium.en      994.88
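The Whisper.cpp figures are wall-clock transcription times for the full speech. A hedged sketch of taking a comparable measurement outside the Phoronix Test Suite, assuming a locally built whisper.cpp `main` binary, a downloaded ggml-base.en.bin model, and a 16 kHz WAV of the input; every path below is a placeholder:

  import subprocess
  import time

  # All paths are assumptions; adjust to a local whisper.cpp build and inputs.
  WHISPER_BIN = "./main"                   # whisper.cpp example binary
  MODEL = "models/ggml-base.en.bin"
  AUDIO = "sotu_2016.wav"                  # 16 kHz mono WAV input

  start = time.perf_counter()
  subprocess.run([WHISPER_BIN, "-m", MODEL, "-f", AUDIO, "-t", "32"],
                 check=True, capture_output=True)
  elapsed = time.perf_counter() - start
  print(f"Transcription took {elapsed:.2f} s")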