nlp-benchmarks

AWS EC2 Amazon Linux 2023 Benchmarking

c6i.2xlarge:

  Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads), Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB DDR4-3200MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

m7i-flex.2xlarge:

  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads), Motherboard: Amazon EC2 m7i-flex.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

c7a.2xlarge:

  Processor: AMD EPYC 9R14 (8 Cores), Motherboard: Amazon EC2 c7a.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

r7a.xlarge:

  Processor: AMD EPYC 9R14 (4 Cores), Motherboard: Amazon EC2 r7a.xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 5.93669 |=====================================
m7i-flex.2xlarge . 5.84261 |====================================
c7a.2xlarge ...... 7.35101 |==============================================
r7a.xlarge ....... 8.22691 |===================================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 12.66180 |==================================================
m7i-flex.2xlarge . 8.10007 |================================
c7a.2xlarge ...... 5.01437 |====================
r7a.xlarge ....... 9.85441 |=======================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 8.09803 |========================================
m7i-flex.2xlarge . 7.94878 |=======================================
c7a.2xlarge ...... 5.10330 |=========================
r7a.xlarge ....... 10.16790 |==================================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 6.24391 |=========================================
m7i-flex.2xlarge . 7.19462 |===============================================
c7a.2xlarge ...... 7.74619 |===================================================
r7a.xlarge ....... 7.38732 |=================================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2.038360 |===============================================
m7i-flex.2xlarge . 1.016327 |=======================
c7a.2xlarge ...... 1.094290 |=========================
r7a.xlarge ....... 2.180290 |==================================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 1.772860 |===================================
m7i-flex.2xlarge . 1.003269 |====================
c7a.2xlarge ...... 1.264050 |=========================
r7a.xlarge ....... 2.529820 |==================================================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2496.84 |=============================================
m7i-flex.2xlarge . 2382.46 |===========================================
c7a.2xlarge ...... 1482.35 |==========================
r7a.xlarge ....... 2856.84 |===================================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 33.19200 |==================================================
m7i-flex.2xlarge . 3.68993 |======
c7a.2xlarge ...... 3.28245 |=====
r7a.xlarge ....... 5.40784 |========

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 46.69860 |==================================================
m7i-flex.2xlarge . 1.76350 |==
c7a.2xlarge ...... 6.90200 |=======
r7a.xlarge ....... 13.73180 |===============

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 34.71660 |==================================================
m7i-flex.2xlarge . 3.11075 |====
c7a.2xlarge ...... 3.12339 |====
r7a.xlarge ....... 6.25907 |=========

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2492.29 |============================================
m7i-flex.2xlarge . 2389.64 |===========================================
c7a.2xlarge ...... 1480.93 |==========================
r7a.xlarge ....... 2857.77 |===================================================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2501.50 |=============================================
m7i-flex.2xlarge . 2318.31 |=========================================
c7a.2xlarge ...... 1478.51 |==========================
r7a.xlarge ....... 2862.86 |===================================================

Numpy Benchmark
Score > Higher Is Better
c6i.2xlarge ...... 374.99 |=================================
m7i-flex.2xlarge . 438.25 |======================================
c7a.2xlarge ...... 590.10 |====================================================
r7a.xlarge ....... 595.01 |====================================================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 26.78 |============================
m7i-flex.2xlarge . 29.89 |================================
c7a.2xlarge ...... 50.16 |=====================================================
r7a.xlarge ....... 33.39 |===================================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 10.57 |============================
m7i-flex.2xlarge . 12.10 |================================
c7a.2xlarge ...... 20.00 |=====================================================
r7a.xlarge ....... 13.19 |===================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.96 |===========================
m7i-flex.2xlarge . 18.84 |===============================
c7a.2xlarge ...... 31.89 |=====================================================
r7a.xlarge ....... 19.35 |================================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.81 |===========================
m7i-flex.2xlarge . 18.34 |===============================
c7a.2xlarge ...... 31.29 |=====================================================
r7a.xlarge ....... 19.32 |=================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.36 |==========================
m7i-flex.2xlarge . 7.36 |==============================
c7a.2xlarge ...... 12.95 |=====================================================
r7a.xlarge ....... 7.68 |===============================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.38 |==========================
m7i-flex.2xlarge . 7.55 |===============================
c7a.2xlarge ...... 13.03 |=====================================================
r7a.xlarge ....... 7.68 |===============================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 7.99 |====================================
m7i-flex.2xlarge . 8.99 |=========================================
c7a.2xlarge ...... 11.74 |=====================================================
r7a.xlarge ....... 8.44 |======================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.06 |=========================
m7i-flex.2xlarge . 5.38 |=================================
c7a.2xlarge ...... 8.75 |======================================================
r7a.xlarge ....... 5.66 |===================================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.04 |=========================
m7i-flex.2xlarge . 5.47 |==================================
c7a.2xlarge ...... 8.71 |======================================================
r7a.xlarge ....... 5.67 |===================================

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 1.77 |===========
m7i-flex.2xlarge . 8.43 |======================================================
c7a.2xlarge ...... 5.16 |=================================
r7a.xlarge ....... 2.62 |=================

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2252.04 |===================================================
m7i-flex.2xlarge . 474.46 |===========
c7a.2xlarge ...... 774.23 |==================
r7a.xlarge ....... 764.30 |=================

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 6.53 |=====================
m7i-flex.2xlarge . 16.57 |=====================================================
c7a.2xlarge ...... 9.76 |===============================
r7a.xlarge ....... 4.90 |================

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 610.59 |====================================================
m7i-flex.2xlarge . 241.21 |=====================
c7a.2xlarge ...... 409.78 |===================================
r7a.xlarge ....... 408.02 |===================================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 22.26 |======================
m7i-flex.2xlarge . 53.87 |====================================================
c7a.2xlarge ...... 54.51 |=====================================================
r7a.xlarge ....... 30.77 |==============================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 179.53 |====================================================
m7i-flex.2xlarge . 74.20 |=====================
c7a.2xlarge ...... 73.34 |=====================
r7a.xlarge ....... 64.98 |===================

PyBench 2018-02-16
Total For Average Test Times
Milliseconds < Lower Is Better
c6i.2xlarge ...... 1000 |======================================================
m7i-flex.2xlarge . 736 |========================================
c7a.2xlarge ...... 887 |================================================
r7a.xlarge ....... 887 |================================================