nlp-benchmarks
AWS EC2 Amazon Linux 2023 Benchmarking

c6i.2xlarge:

  Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads), Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB DDR4-3200MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

m7i-flex.2xlarge:

  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads), Motherboard: Amazon EC2 m7i-flex.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

c7a.2xlarge:

  Processor: AMD EPYC 9R14 (8 Cores), Motherboard: Amazon EC2 c7a.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 1.77 |===========
m7i-flex.2xlarge . 8.43 |======================================================
c7a.2xlarge ...... 5.16 |=================================

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2252.04 |===================================================
m7i-flex.2xlarge . 474.46 |===========
c7a.2xlarge ...... 774.23 |==================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 46.69860 |==================================================
m7i-flex.2xlarge . 1.76350 |==
c7a.2xlarge ...... 6.90200 |=======

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 6.53 |=====================
m7i-flex.2xlarge . 16.57 |=====================================================
c7a.2xlarge ...... 9.76 |===============================

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 610.59 |====================================================
m7i-flex.2xlarge . 241.21 |=====================
c7a.2xlarge ...... 409.78 |===================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 12.66180 |==================================================
m7i-flex.2xlarge . 8.10007 |================================
c7a.2xlarge ...... 5.01437 |====================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 22.26 |======================
m7i-flex.2xlarge . 53.87 |====================================================
c7a.2xlarge ...... 54.51 |=====================================================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 179.53 |====================================================
m7i-flex.2xlarge . 74.20 |=====================
c7a.2xlarge ...... 73.34 |=====================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.04 |=========================
m7i-flex.2xlarge . 5.47 |==================================
c7a.2xlarge ...... 8.71 |======================================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.06 |=========================
m7i-flex.2xlarge . 5.38 |=================================
c7a.2xlarge ...... 8.75 |======================================================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.38 |==========================
m7i-flex.2xlarge . 7.55 |===============================
c7a.2xlarge ...... 13.03 |=====================================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.36 |==========================
m7i-flex.2xlarge . 7.36 |==============================
c7a.2xlarge ...... 12.95 |=====================================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2.038360 |==================================================
m7i-flex.2xlarge . 1.016327 |=========================
c7a.2xlarge ...... 1.094290 |===========================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 34.71660 |==================================================
m7i-flex.2xlarge . 3.11075 |====
c7a.2xlarge ...... 3.12339 |====

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 33.19200 |==================================================
m7i-flex.2xlarge . 3.68993 |======
c7a.2xlarge ...... 3.28245 |=====

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.96 |===========================
m7i-flex.2xlarge . 18.84 |===============================
c7a.2xlarge ...... 31.89 |=====================================================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 10.57 |============================
m7i-flex.2xlarge . 12.10 |================================
c7a.2xlarge ...... 20.00 |=====================================================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 26.78 |============================
m7i-flex.2xlarge . 29.89 |================================
c7a.2xlarge ...... 50.16 |=====================================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 1.772860 |==================================================
m7i-flex.2xlarge . 1.003269 |============================
c7a.2xlarge ...... 1.264050 |====================================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2501.50 |===================================================
m7i-flex.2xlarge . 2318.31 |===============================================
c7a.2xlarge ...... 1478.51 |==============================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2496.84 |===================================================
m7i-flex.2xlarge . 2382.46 |=================================================
c7a.2xlarge ...... 1482.35 |==============================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2492.29 |===================================================
m7i-flex.2xlarge . 2389.64 |=================================================
c7a.2xlarge ...... 1480.93 |==============================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 8.09803 |===================================================
m7i-flex.2xlarge . 7.94878 |==================================================
c7a.2xlarge ...... 5.10330 |================================

Numpy Benchmark
Score > Higher Is Better
c6i.2xlarge ...... 374.99 |=================================
m7i-flex.2xlarge . 438.25 |=======================================
c7a.2xlarge ...... 590.10 |====================================================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 7.99 |====================================
m7i-flex.2xlarge . 8.99 |=========================================
c7a.2xlarge ...... 11.74 |=====================================================

PyBench 2018-02-16
Total For Average Test Times
Milliseconds < Lower Is Better
c6i.2xlarge ...... 1000 |======================================================
m7i-flex.2xlarge . 736 |========================================
c7a.2xlarge ...... 887 |================================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 5.93669 |=========================================
m7i-flex.2xlarge . 5.84261 |=========================================
c7a.2xlarge ...... 7.35101 |===================================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 6.24391 |=========================================
m7i-flex.2xlarge . 7.19462 |===============================================
c7a.2xlarge ...... 7.74619 |===================================================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.81 |===========================
m7i-flex.2xlarge . 18.34 |===============================
c7a.2xlarge ...... 31.29 |=====================================================
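Each OpenVINO model appears twice in this result file, once as FPS and once as latency in ms. Little's law links the two numbers: requests in flight = throughput x latency. The Python sketch below multiplies the reported values to estimate how many inference requests were in flight during each run. The values come from the OpenVINO charts in this file. `OPENVINO_RESULTS` and `implied_concurrency` are hypothetical names chosen for illustration; they are not part of the Phoronix Test Suite or OpenVINO.

```python
# OpenVINO results copied from this result file: (FPS, latency in ms).
OPENVINO_RESULTS = {
    "Face Detection FP16": {
        "c6i.2xlarge": (1.77, 2252.04),
        "m7i-flex.2xlarge": (8.43, 474.46),
        "c7a.2xlarge": (5.16, 774.23),
    },
    "Face Detection FP16-INT8": {
        "c6i.2xlarge": (6.53, 610.59),
        "m7i-flex.2xlarge": (16.57, 241.21),
        "c7a.2xlarge": (9.76, 409.78),
    },
    "Machine Translation EN To DE FP16": {
        "c6i.2xlarge": (22.26, 179.53),
        "m7i-flex.2xlarge": (53.87, 74.20),
        "c7a.2xlarge": (54.51, 73.34),
    },
}


def implied_concurrency(fps: float, latency_ms: float) -> float:
    """Little's law L = lambda * W, with latency converted from ms to seconds."""
    return fps * latency_ms / 1000.0


for model, systems in OPENVINO_RESULTS.items():
    for system, (fps, latency_ms) in systems.items():
        print(f"{model:36} {system:18} {implied_concurrency(fps, latency_ms):.2f}")
```

Every model/instance pair prints 3.99 or 4.00. This suggests that the harness kept a fixed number of requests (about four) in flight, although the export does not say so. If that holds, each ms chart carries the same ranking as its FPS chart, inverted.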
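One common way to condense these charts into a single figure per instance is the geometric mean of per-test speedups over a baseline. The sketch below has three design choices:

- c6i.2xlarge is the baseline.
- Lower-is-better results are inverted, so a value above 1.0 always means faster.
- Each OpenVINO model is counted once, using FPS only, because its FPS and ms charts describe the same model on the same device.

The test selection, the baseline, and the hypothetical `relative_scores` helper are assumptions of this sketch. The benchmark run itself does not report such a summary.

```python
import math

SYSTEMS = ("c6i.2xlarge", "m7i-flex.2xlarge", "c7a.2xlarge")

# (test, higher_is_better, c6i, m7i-flex, c7a), copied from this result file.
RESULTS = [
    ("OpenVINO Face Detection FP16 FPS", True, 1.77, 8.43, 5.16),
    ("OpenVINO Face Detection FP16-INT8 FPS", True, 6.53, 16.57, 9.76),
    ("OpenVINO Machine Translation EN To DE FP16 FPS", True, 22.26, 53.87, 54.51),
    ("oneDNN Deconv shapes_1d bf16bf16bf16", False, 46.69860, 1.76350, 6.90200),
    ("oneDNN Deconv shapes_1d f32", False, 12.66180, 8.10007, 5.01437),
    ("oneDNN Deconv shapes_1d u8s8f32", False, 2.038360, 1.016327, 1.094290),
    ("oneDNN Deconv shapes_3d bf16bf16bf16", False, 34.71660, 3.11075, 3.12339),
    ("oneDNN Deconv shapes_3d f32", False, 8.09803, 7.94878, 5.10330),
    ("oneDNN Deconv shapes_3d u8s8f32", False, 1.772860, 1.003269, 1.264050),
    ("oneDNN Conv Auto bf16bf16bf16", False, 33.19200, 3.68993, 3.28245),
    ("oneDNN Conv Auto f32", False, 5.93669, 5.84261, 7.35101),
    ("oneDNN Conv Auto u8s8f32", False, 6.24391, 7.19462, 7.74619),
    ("oneDNN RNN Inference bf16bf16bf16", False, 2501.50, 2318.31, 1478.51),
    ("oneDNN RNN Inference f32", False, 2496.84, 2382.46, 1482.35),
    ("oneDNN RNN Inference u8s8f32", False, 2492.29, 2389.64, 1480.93),
    ("PyTorch BS1 ResNet-50", True, 26.78, 29.89, 50.16),
    ("PyTorch BS16 ResNet-50", True, 15.96, 18.84, 31.89),
    ("PyTorch BS32 ResNet-50", True, 15.81, 18.34, 31.29),
    ("PyTorch BS1 ResNet-152", True, 10.57, 12.10, 20.00),
    ("PyTorch BS16 ResNet-152", True, 6.36, 7.36, 12.95),
    ("PyTorch BS32 ResNet-152", True, 6.38, 7.55, 13.03),
    ("PyTorch BS1 Efficientnet_v2_l", True, 7.99, 8.99, 11.74),
    ("PyTorch BS16 Efficientnet_v2_l", True, 4.06, 5.38, 8.75),
    ("PyTorch BS32 Efficientnet_v2_l", True, 4.04, 5.47, 8.71),
    ("Numpy Benchmark", True, 374.99, 438.25, 590.10),
    ("PyBench Total For Average Test Times", False, 1000, 736, 887),
]


def relative_scores(results, baseline=0):
    """Geometric mean of per-test speedups versus the baseline system.

    Lower-is-better results are inverted so that a speedup > 1.0 always
    means the system was faster than the baseline on that test.
    """
    log_sums = [0.0] * len(SYSTEMS)
    for _name, higher_is_better, *values in results:
        base = values[baseline]
        for i, value in enumerate(values):
            speedup = value / base if higher_is_better else base / value
            log_sums[i] += math.log(speedup)
    # exp(mean(log x)) is the geometric mean of the speedups.
    return {s: math.exp(total / len(results)) for s, total in zip(SYSTEMS, log_sums)}


for system, score in relative_scores(RESULTS).items():
    print(f"{system:18} {score:.2f}x")
```

A geometric mean weights every test equally. In the bf16 oneDNN results, c6i.2xlarge trails the other two instances by factors of roughly 7 to 26, so those few tests pull its relative standing down sharply. A per-framework breakdown (PyTorch only, oneDNN only) may be more informative than a single number.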