nlp-benchmarks
AWS EC2 Amazon Linux 2023 Benchmarking

c6i.2xlarge:

  Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads), Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB DDR4-3200MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

m7i-flex.2xlarge:

  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads), Motherboard: Amazon EC2 m7i-flex.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

c7a.2xlarge:

  Processor: AMD EPYC 9R14 (8 Cores), Motherboard: Amazon EC2 c7a.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

r7a.xlarge:

  Processor: AMD EPYC 9R14 (4 Cores), Motherboard: Amazon EC2 r7a.xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

m7i.2xlarge:

  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads), Motherboard: Amazon EC2 m7i.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

r7iz.xlarge:

  Processor: Intel Xeon Gold 6455B (2 Cores / 4 Threads), Motherboard: Amazon EC2 r7iz.xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 32GB 4800MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic

  OS: Amazon Linux 2023, Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.06 |=========================
m7i-flex.2xlarge . 5.38 |=================================
c7a.2xlarge ...... 8.75 |======================================================
r7a.xlarge ....... 5.66 |===================================
m7i.2xlarge ...... 5.26 |================================
r7iz.xlarge ...... 4.63 |=============================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 4.04 |=========================
m7i-flex.2xlarge . 5.47 |==================================
c7a.2xlarge ...... 8.71 |======================================================
r7a.xlarge ....... 5.67 |===================================
m7i.2xlarge ...... 5.28 |=================================
r7iz.xlarge ...... 4.63 |=============================
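The PyTorch entries in this report are steady-state CPU inference throughput figures in batches per second. As a rough, hedged sketch of how such a figure can be measured (this is not the harness that produced the numbers here; the warmup and iteration counts and the synthetic 224x224 input are assumptions for illustration only), one can time repeated forward passes over a synthetic batch:

    # Minimal sketch of a CPU batches/sec measurement for torchvision models.
    # Not the benchmark harness used for this report; warmup/iteration counts
    # and the synthetic ImageNet-sized input are illustrative assumptions.
    import time
    import torch
    import torchvision.models as models

    def batches_per_sec(model_fn, batch_size, iters=20, warmup=5):
        model = model_fn(weights=None).eval()      # random weights; accuracy is irrelevant for throughput
        x = torch.randn(batch_size, 3, 224, 224)   # synthetic input batch
        with torch.no_grad():
            for _ in range(warmup):                # let threads and allocators settle
                model(x)
            start = time.perf_counter()
            for _ in range(iters):
                model(x)
            elapsed = time.perf_counter() - start
        return iters / elapsed                     # batches processed per second

    if __name__ == "__main__":
        for name, fn in [("ResNet-50", models.resnet50),
                         ("ResNet-152", models.resnet152),
                         ("Efficientnet_v2_l", models.efficientnet_v2_l)]:
            for bs in (1, 16, 32):
                print(f"{name}, batch {bs}: {batches_per_sec(fn, bs):.2f} batches/sec")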
PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.81 |===========================
m7i-flex.2xlarge . 18.34 |===============================
c7a.2xlarge ...... 31.29 |=====================================================
r7a.xlarge ....... 19.32 |=================================
m7i.2xlarge ...... 19.43 |=================================
r7iz.xlarge ...... 15.30 |==========================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.36 |==========================
m7i-flex.2xlarge . 7.36 |==============================
c7a.2xlarge ...... 12.95 |=====================================================
r7a.xlarge ....... 7.68 |===============================
m7i.2xlarge ...... 7.65 |===============================
r7iz.xlarge ...... 6.37 |==========================

PyTorch 2.1
Device: CPU - Batch Size: 32 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 6.38 |==========================
m7i-flex.2xlarge . 7.55 |===============================
c7a.2xlarge ...... 13.03 |=====================================================
r7a.xlarge ....... 7.68 |===============================
m7i.2xlarge ...... 7.66 |===============================
r7iz.xlarge ...... 6.39 |==========================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l
batches/sec > Higher Is Better
c6i.2xlarge ...... 7.99 |====================================
m7i-flex.2xlarge . 8.99 |=========================================
c7a.2xlarge ...... 11.74 |=====================================================
r7a.xlarge ....... 8.44 |======================================
m7i.2xlarge ...... 8.84 |========================================
r7iz.xlarge ...... 8.17 |=====================================

Numpy Benchmark
Score > Higher Is Better
c6i.2xlarge ...... 374.99 |=================================
m7i-flex.2xlarge . 438.25 |======================================
c7a.2xlarge ...... 590.10 |====================================================
r7a.xlarge ....... 595.01 |====================================================
m7i.2xlarge ...... 452.50 |========================================
r7iz.xlarge ...... 554.28 |================================================

PyTorch 2.1
Device: CPU - Batch Size: 16 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 15.96 |===========================
m7i-flex.2xlarge . 18.84 |===============================
c7a.2xlarge ...... 31.89 |=====================================================
r7a.xlarge ....... 19.35 |================================
m7i.2xlarge ...... 19.62 |=================================
r7iz.xlarge ...... 15.92 |==========================

PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-152
batches/sec > Higher Is Better
c6i.2xlarge ...... 10.57 |============================
m7i-flex.2xlarge . 12.10 |================================
c7a.2xlarge ...... 20.00 |=====================================================
r7a.xlarge ....... 13.19 |===================================
m7i.2xlarge ...... 12.48 |=================================
r7iz.xlarge ...... 11.07 |=============================
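The Numpy Benchmark entry above is a single aggregate score produced by the test profile's fixed mix of NumPy kernels. Purely as an illustration of CPU-bound NumPy timing (the kernels and array sizes below are assumptions and do not reproduce that score), a few representative operations can be timed like this:

    # Hedged sketch: time a few representative NumPy kernels on the CPU.
    # These kernels and sizes are illustrative assumptions, not the workload
    # behind the "Numpy Benchmark" score reported above.
    import time
    import numpy as np

    def best_time(fn, repeats=5):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

    rng = np.random.default_rng(0)
    a = rng.standard_normal((1024, 1024))
    b = rng.standard_normal((1024, 1024))
    v = rng.standard_normal(1_000_000)

    kernels = {
        "matmul 1024x1024": lambda: a @ b,
        "svd 1024x1024": lambda: np.linalg.svd(a, compute_uv=False),
        "fft 1M": lambda: np.fft.fft(v),
        "sort 1M": lambda: np.sort(v),
    }

    for name, fn in kernels.items():
        print(f"{name}: {best_time(fn) * 1000:.1f} ms (best of 5)")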
oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2496.84 |==========================================
m7i-flex.2xlarge . 2382.46 |========================================
c7a.2xlarge ...... 1482.35 |=========================
r7a.xlarge ....... 2856.84 |================================================
m7i.2xlarge ...... 2320.94 |=======================================
r7iz.xlarge ...... 3062.69 |===================================================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2492.29 |=========================================
m7i-flex.2xlarge . 2389.64 |========================================
c7a.2xlarge ...... 1480.93 |=========================
r7a.xlarge ....... 2857.77 |================================================
m7i.2xlarge ...... 2310.90 |======================================
r7iz.xlarge ...... 3064.81 |===================================================

oneDNN 3.3
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2501.50 |==========================================
m7i-flex.2xlarge . 2318.31 |=======================================
c7a.2xlarge ...... 1478.51 |=========================
r7a.xlarge ....... 2862.86 |================================================
m7i.2xlarge ...... 2303.44 |======================================
r7iz.xlarge ...... 3062.84 |===================================================

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2252.04 |===================================================
m7i-flex.2xlarge . 474.46 |===========
c7a.2xlarge ...... 774.23 |==================
r7a.xlarge ....... 764.30 |=================
m7i.2xlarge ...... 511.54 |============
r7iz.xlarge ...... 341.08 |========

OpenVINO 2023.2.dev
Model: Face Detection FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 1.77 |===========
m7i-flex.2xlarge . 8.43 |======================================================
c7a.2xlarge ...... 5.16 |=================================
r7a.xlarge ....... 2.62 |=================
m7i.2xlarge ...... 7.80 |==================================================
r7iz.xlarge ...... 5.86 |======================================

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 610.59 |====================================================
m7i-flex.2xlarge . 241.21 |=====================
c7a.2xlarge ...... 409.78 |===================================
r7a.xlarge ....... 408.02 |===================================
m7i.2xlarge ...... 275.65 |=======================
r7iz.xlarge ...... 186.64 |================

OpenVINO 2023.2.dev
Model: Face Detection FP16-INT8 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 6.53 |=====================
m7i-flex.2xlarge . 16.57 |=====================================================
c7a.2xlarge ...... 9.76 |===============================
r7a.xlarge ....... 4.90 |================
m7i.2xlarge ...... 14.51 |==============================================
r7iz.xlarge ...... 10.72 |==================================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
ms < Lower Is Better
c6i.2xlarge ...... 179.53 |====================================================
m7i-flex.2xlarge . 74.20 |=====================
c7a.2xlarge ...... 73.34 |=====================
r7a.xlarge ....... 64.98 |===================
m7i.2xlarge ...... 80.94 |=======================
r7iz.xlarge ...... 58.03 |=================

OpenVINO 2023.2.dev
Model: Machine Translation EN To DE FP16 - Device: CPU
FPS > Higher Is Better
c6i.2xlarge ...... 22.26 |======================
m7i-flex.2xlarge . 53.87 |====================================================
c7a.2xlarge ...... 54.51 |=====================================================
r7a.xlarge ....... 30.77 |==============================
m7i.2xlarge ...... 49.39 |================================================
r7iz.xlarge ...... 34.45 |=================================
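The OpenVINO rows report the same workloads in two units: mean inference latency in ms and throughput in FPS. Below is a minimal sketch of collecting both on the CPU with the OpenVINO Python API; the model path is a hypothetical placeholder, the loop count is arbitrary, a static input shape is assumed, and the published numbers above come from OpenVINO's own benchmarking harness rather than this script.

    # Hedged sketch: single-stream CPU latency/throughput via openvino.runtime.
    # MODEL_XML is a hypothetical placeholder path; substitute a real IR model.
    # Assumes the model has one input with a static shape.
    import time
    import numpy as np
    from openvino.runtime import Core

    MODEL_XML = "model.xml"  # placeholder, not a file from this report

    core = Core()
    model = core.read_model(MODEL_XML)
    compiled = core.compile_model(model, "CPU")
    request = compiled.create_infer_request()

    input_port = compiled.input(0)
    x = np.random.rand(*list(input_port.shape)).astype(np.float32)  # random input tensor

    latencies_ms = []
    for _ in range(50):
        start = time.perf_counter()
        request.infer({input_port: x})
        latencies_ms.append((time.perf_counter() - start) * 1000)

    print(f"mean latency: {sum(latencies_ms) / len(latencies_ms):.2f} ms")
    print(f"throughput:   {1000 * len(latencies_ms) / sum(latencies_ms):.2f} FPS")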
PyTorch 2.1
Device: CPU - Batch Size: 1 - Model: ResNet-50
batches/sec > Higher Is Better
c6i.2xlarge ...... 26.78 |============================
m7i-flex.2xlarge . 29.89 |================================
c7a.2xlarge ...... 50.16 |=====================================================
r7a.xlarge ....... 33.39 |===================================
m7i.2xlarge ...... 31.36 |=================================
r7iz.xlarge ...... 27.43 |=============================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 2.038360 |===============================================
m7i-flex.2xlarge . 1.016327 |=======================
c7a.2xlarge ...... 1.094290 |=========================
r7a.xlarge ....... 2.180290 |==================================================
m7i.2xlarge ...... 1.030110 |========================
r7iz.xlarge ...... 1.427360 |=================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 12.66180 |==================================================
m7i-flex.2xlarge . 8.10007 |================================
c7a.2xlarge ...... 5.01437 |====================
r7a.xlarge ....... 9.85441 |=======================================
m7i.2xlarge ...... 8.61718 |==================================
r7iz.xlarge ...... 11.19730 |============================================

oneDNN 3.3
Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 46.69860 |==================================================
m7i-flex.2xlarge . 1.76350 |==
c7a.2xlarge ...... 6.90200 |=======
r7a.xlarge ....... 13.73180 |===============
m7i.2xlarge ...... 1.79360 |==
r7iz.xlarge ...... 2.37705 |===

PyBench 2018-02-16
Total For Average Test Times
Milliseconds < Lower Is Better
c6i.2xlarge ...... 1000 |======================================================
m7i-flex.2xlarge . 736 |========================================
c7a.2xlarge ...... 887 |================================================
r7a.xlarge ....... 887 |================================================
m7i.2xlarge ...... 815 |============================================
r7iz.xlarge ...... 691 |=====================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 5.93669 |=====================================
m7i-flex.2xlarge . 5.84261 |====================================
c7a.2xlarge ...... 7.35101 |==============================================
r7a.xlarge ....... 8.22691 |===================================================
m7i.2xlarge ...... 5.88019 |====================================
r7iz.xlarge ...... 7.66745 |================================================

oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 6.24391 |===============================
m7i-flex.2xlarge . 7.19462 |===================================
c7a.2xlarge ...... 7.74619 |======================================
r7a.xlarge ....... 7.38732 |====================================
m7i.2xlarge ...... 6.61974 |================================
r7iz.xlarge ...... 10.19330 |==================================================
oneDNN 3.3
Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 33.19200 |==================================================
m7i-flex.2xlarge . 3.68993 |======
c7a.2xlarge ...... 3.28245 |=====
r7a.xlarge ....... 5.40784 |========
m7i.2xlarge ...... 3.37610 |=====
r7iz.xlarge ...... 4.99403 |========

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 1.772860 |===================================
m7i-flex.2xlarge . 1.003269 |====================
c7a.2xlarge ...... 1.264050 |=========================
r7a.xlarge ....... 2.529820 |==================================================
m7i.2xlarge ...... 1.066020 |=====================
r7iz.xlarge ...... 1.457260 |=============================

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 34.71660 |==================================================
m7i-flex.2xlarge . 3.11075 |====
c7a.2xlarge ...... 3.12339 |====
r7a.xlarge ....... 6.25907 |=========
m7i.2xlarge ...... 3.20792 |=====
r7iz.xlarge ...... 4.09311 |======

oneDNN 3.3
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms < Lower Is Better
c6i.2xlarge ...... 8.09803 |====================================
m7i-flex.2xlarge . 7.94878 |===================================
c7a.2xlarge ...... 5.10330 |=======================
r7a.xlarge ....... 10.16790 |=============================================
m7i.2xlarge ...... 8.34462 |=====================================
r7iz.xlarge ...... 11.24930 |==================================================
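All of the results above are in the Phoronix Test Suite's plain-text output format. As a hedged sketch of reproducing a comparable run from Python (the profile names below are assumptions inferred from the version strings in this report, e.g. PyTorch 2.1, oneDNN 3.3, OpenVINO 2023.2.dev, and should be verified against the suite's test listing), the suite can be driven through subprocess:

    # Hedged sketch: drive the Phoronix Test Suite from Python. The profile
    # names are assumptions inferred from this report's version strings; run
    # `phoronix-test-suite list-available-tests` to confirm the exact names.
    import subprocess

    TESTS = ["pts/pytorch", "pts/numpy", "pts/onednn", "pts/openvino", "pts/pybench"]

    for test in TESTS:
        # "benchmark" installs the test if needed and then runs it.
        subprocess.run(["phoronix-test-suite", "benchmark", test], check=True)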