nlp-benchmarks

AWS EC2 Amazon Linux 2023 Benchmarking

HTML result view exported from: https://openbenchmarking.org/result/2402103-NE-2402012NE79&sor&grs.

nlp-benchmarks system details

Instances tested: c6i.2xlarge, m7i-flex.2xlarge, c7a.2xlarge, r7a.xlarge, m7i.2xlarge, r7iz.xlarge

c6i.2xlarge
  Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads)
  Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS)
  Chipset: Intel 440FX 82441FX PMC
  Memory: 1 x 16GB DDR4-3200MT/s
  Disk: 215GB Amazon Elastic Block Store
  Network: Amazon Elastic
  OS: Amazon Linux 2023
  Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64)
  Compiler: GCC 11.4.1 20230605
  File-System: xfs
  System Layer: amazon

For the remaining instances the export lists only the following fields.

m7i-flex.2xlarge
  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads)
  Motherboard: Amazon EC2 m7i-flex.2xlarge (1.0 BIOS)
  Memory: 1 x 32GB 4800MT/s
  Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64)

c7a.2xlarge
  Processor: AMD EPYC 9R14 (8 Cores)
  Motherboard: Amazon EC2 c7a.2xlarge (1.0 BIOS)
  Memory: 1 x 16GB 4800MT/s

r7a.xlarge
  Processor: AMD EPYC 9R14 (4 Cores)
  Motherboard: Amazon EC2 r7a.xlarge (1.0 BIOS)
  Memory: 1 x 32GB 4800MT/s

m7i.2xlarge
  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads)
  Motherboard: Amazon EC2 m7i.2xlarge (1.0 BIOS)

r7iz.xlarge
  Processor: Intel Xeon Gold 6455B (2 Cores / 4 Threads)
  Motherboard: Amazon EC2 r7iz.xlarge (1.0 BIOS)

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-amazon-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-arch_64=x86-64-v2 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
- c6i.2xlarge: CPU Microcode: 0xd0003a5
- m7i-flex.2xlarge: CPU Microcode: 0x2b000571
- c7a.2xlarge: CPU Microcode: 0xa10113e
- r7a.xlarge: CPU Microcode: 0xa10113e
- m7i.2xlarge: CPU Microcode: 0x2b000571
- r7iz.xlarge: CPU Microcode: 0x2b000571

Python Details
- Python 3.11.6

Security Details
- c6i.2xlarge: gather_data_sampling: Unknown: Dependent on hypervisor status + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT Host state unknown + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- m7i-flex.2xlarge: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- c7a.2xlarge: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- r7a.xlarge: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: disabled RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
- m7i.2xlarge: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
- r7iz.xlarge: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected

nlp-benchmarks results overview

Values for each test are listed in the order: c6i.2xlarge, m7i-flex.2xlarge, c7a.2xlarge, r7a.xlarge, m7i.2xlarge, r7iz.xlarge

openvino: Face Detection FP16 - CPU (ms): 2252.04, 474.46, 774.23, 764.30, 511.54, 341.08
openvino: Face Detection FP16 - CPU (FPS): 1.77, 8.43, 5.16, 2.62, 7.80, 5.86
openvino: Face Detection FP16-INT8 - CPU (FPS): 6.53, 16.57, 9.76, 4.9, 14.51, 10.72
openvino: Face Detection FP16-INT8 - CPU (ms): 610.59, 241.21, 409.78, 408.02, 275.65, 186.64
openvino: Machine Translation EN To DE FP16 - CPU (ms): 179.53, 74.20, 73.34, 64.98, 80.94, 58.03
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU (ms): 46.6986, 1.76350, 6.90200, 13.7318, 1.79360, 2.37705
onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms): 12.6618, 8.10007, 5.01437, 9.85441, 8.61718, 11.1973
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms): 1.77286, 1.003269, 1.26405, 2.52982, 1.06602, 1.45726
openvino: Machine Translation EN To DE FP16 - CPU (FPS): 22.26, 53.87, 54.51, 30.77, 49.39, 34.45
onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms): 8.09803, 7.94878, 5.10330, 10.1679, 8.34462, 11.2493
pytorch: CPU - 32 - Efficientnet_v2_l (batches/sec): 4.04, 5.47, 8.71, 5.67, 5.28, 4.63
pytorch: CPU - 16 - Efficientnet_v2_l (batches/sec): 4.06, 5.38, 8.75, 5.66, 5.26, 4.63
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms): 2.03836, 1.016327, 1.09429, 2.18029, 1.03011, 1.42736
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms): 2501.50, 2318.31, 1478.51, 2862.86, 2303.44, 3062.84
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms): 2492.29, 2389.64, 1480.93, 2857.77, 2310.90, 3064.81
onednn: Recurrent Neural Network Inference - f32 - CPU (ms): 2496.84, 2382.46, 1482.35, 2856.84, 2320.94, 3062.69
pytorch: CPU - 32 - ResNet-50 (batches/sec): 15.81, 18.34, 31.29, 19.32, 19.43, 15.30
pytorch: CPU - 32 - ResNet-152 (batches/sec): 6.38, 7.55, 13.03, 7.68, 7.66, 6.39
pytorch: CPU - 16 - ResNet-152 (batches/sec): 6.36, 7.36, 12.95, 7.68, 7.65, 6.37
pytorch: CPU - 16 - ResNet-50 (batches/sec): 15.96, 18.84, 31.89, 19.35, 19.62, 15.92
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU (ms): 34.7166, 3.11075, 3.12339, 6.25907, 3.20792, 4.09311
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU (ms): 33.1920, 3.68993, 3.28245, 5.40784, 3.37610, 4.99403
pytorch: CPU - 1 - ResNet-152 (batches/sec): 10.57, 12.10, 20.00, 13.19, 12.48, 11.07
pytorch: CPU - 1 - ResNet-50 (batches/sec): 26.78, 29.89, 50.16, 33.39, 31.36, 27.43
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms): 6.24391, 7.19462, 7.74619, 7.38732, 6.61974, 10.1933
numpy (Score): 374.99, 438.25, 590.10, 595.01, 452.50, 554.28
pytorch: CPU - 1 - Efficientnet_v2_l (batches/sec): 7.99, 8.99, 11.74, 8.44, 8.84, 8.17
pybench: Total For Average Test Times (Milliseconds): 1000, 736, 887, 887, 815, 691
onednn: Convolution Batch Shapes Auto - f32 - CPU (ms): 5.93669, 5.84261, 7.35101, 8.22691, 5.88019, 7.66745
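Because the overview mixes "fewer is better" and "more is better" metrics, raw values are hard to compare across tests. The following is a minimal sketch of one way to normalize them against a baseline instance. The choice of c6i.2xlarge as baseline and the helper name `relative_to_baseline` are illustrative assumptions, not part of the export; the numbers are copied from the results in this document.

```python
# Two result rows copied from this export. Picking c6i.2xlarge as the
# baseline is an assumption made for illustration only.
FACE_DETECTION_FP16_MS = {  # OpenVINO, ms, fewer is better
    "c6i.2xlarge": 2252.04,
    "m7i-flex.2xlarge": 474.46,
    "c7a.2xlarge": 774.23,
    "r7a.xlarge": 764.30,
    "m7i.2xlarge": 511.54,
    "r7iz.xlarge": 341.08,
}
RESNET50_BATCH1 = {  # PyTorch, batches/sec, more is better
    "c6i.2xlarge": 26.78,
    "m7i-flex.2xlarge": 29.89,
    "c7a.2xlarge": 50.16,
    "r7a.xlarge": 33.39,
    "m7i.2xlarge": 31.36,
    "r7iz.xlarge": 27.43,
}


def relative_to_baseline(values, baseline, fewer_is_better):
    """Return a speedup factor per instance; above 1.0 means faster than the baseline."""
    base = values[baseline]
    if fewer_is_better:
        # For latencies, a smaller value is better, so invert the ratio.
        return {name: base / value for name, value in values.items()}
    return {name: value / base for name, value in values.items()}


if __name__ == "__main__":
    rows = [
        ("Face Detection FP16 (ms)", FACE_DETECTION_FP16_MS, True),
        ("ResNet-50, batch size 1 (batches/sec)", RESNET50_BATCH1, False),
    ]
    for title, values, fewer in rows:
        print(title)
        speedups = relative_to_baseline(values, "c6i.2xlarge", fewer)
        for name, factor in sorted(speedups.items(), key=lambda kv: -kv[1]):
            print(f"  {name}: {factor:.2f}x")
```

Expressing every row as a speedup factor puts latency and throughput results on the same scale, which makes it easier to see where one instance type pulls ahead.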

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev. Unit: ms, fewer is better.
r7iz.xlarge: 341.08 (SE +/- 0.94, N = 3; MIN: 269.26 / MAX: 532.11)
m7i-flex.2xlarge: 474.46 (SE +/- 5.97, N = 3; MIN: 427.39 / MAX: 558.54)
m7i.2xlarge: 511.54 (SE +/- 2.36, N = 3; MIN: 337.11 / MAX: 534.54)
r7a.xlarge: 764.30 (SE +/- 0.15, N = 3; MIN: 761.82 / MAX: 783.23)
c7a.2xlarge: 774.23 (SE +/- 0.28, N = 3; MIN: 767.78 / MAX: 795.49)
c6i.2xlarge: 2252.04 (SE +/- 3.74, N = 3; MIN: 2202.18 / MAX: 2317.24)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev. Unit: FPS, more is better.
m7i-flex.2xlarge: 8.43 (SE +/- 0.10, N = 3)
m7i.2xlarge: 7.80 (SE +/- 0.04, N = 3)
r7iz.xlarge: 5.86 (SE +/- 0.02, N = 3)
c7a.2xlarge: 5.16 (SE +/- 0.00, N = 3)
r7a.xlarge: 2.62 (SE +/- 0.00, N = 3)
c6i.2xlarge: 1.77 (SE +/- 0.01, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
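The Face Detection FP16 test is reported both as latency (ms) and as throughput (FPS). As a rough consistency check, the two can be combined with Little's law (items in flight = throughput x latency). The sketch below assumes the reported latency is the average per-request latency at steady state; the export does not state how many inference requests run concurrently, so the result is only an implied figure. The function name `implied_in_flight` is hypothetical.

```python
# Face Detection FP16 results copied from this export.
FPS = {
    "c6i.2xlarge": 1.77,
    "m7i-flex.2xlarge": 8.43,
    "c7a.2xlarge": 5.16,
    "r7a.xlarge": 2.62,
    "m7i.2xlarge": 7.80,
    "r7iz.xlarge": 5.86,
}
LATENCY_MS = {
    "c6i.2xlarge": 2252.04,
    "m7i-flex.2xlarge": 474.46,
    "c7a.2xlarge": 774.23,
    "r7a.xlarge": 764.30,
    "m7i.2xlarge": 511.54,
    "r7iz.xlarge": 341.08,
}


def implied_in_flight(fps, latency_ms):
    """Little's law: requests in flight = throughput (1/s) * latency (s)."""
    return fps * latency_ms / 1000.0


# Assumption: latency is an average per-request value measured at steady state.
IN_FLIGHT = {name: implied_in_flight(FPS[name], LATENCY_MS[name]) for name in FPS}

if __name__ == "__main__":
    for name, value in IN_FLIGHT.items():
        print(f"{name}: about {value:.2f} requests in flight")
```

If the product comes out close to a whole number for every instance, the latency and FPS figures are describing the same runs consistently; a large deviation would suggest the two metrics were measured differently.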

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev. Unit: FPS, more is better.
m7i-flex.2xlarge: 16.57 (SE +/- 0.16, N = 3)
m7i.2xlarge: 14.51 (SE +/- 0.02, N = 3)
r7iz.xlarge: 10.72 (SE +/- 0.01, N = 3)
c7a.2xlarge: 9.76 (SE +/- 0.00, N = 3)
c6i.2xlarge: 6.53 (SE +/- 0.04, N = 3)
r7a.xlarge: 4.90 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev. Unit: ms, fewer is better.
r7iz.xlarge: 186.64 (SE +/- 0.12, N = 3; MIN: 178.68 / MAX: 210.7)
m7i-flex.2xlarge: 241.21 (SE +/- 2.35, N = 3; MIN: 91.81 / MAX: 374.94)
m7i.2xlarge: 275.65 (SE +/- 0.26, N = 3; MIN: 267.85 / MAX: 299.94)
r7a.xlarge: 408.02 (SE +/- 0.14, N = 3; MIN: 406.22 / MAX: 426.04)
c7a.2xlarge: 409.78 (SE +/- 0.07, N = 3; MIN: 407.35 / MAX: 424.19)
c6i.2xlarge: 610.59 (SE +/- 3.28, N = 3; MIN: 497.95 / MAX: 643.84)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev. Unit: ms, fewer is better.
r7iz.xlarge: 58.03 (SE +/- 0.05, N = 3; MIN: 52.43 / MAX: 74.85)
r7a.xlarge: 64.98 (SE +/- 0.40, N = 3; MIN: 61.57 / MAX: 90.41)
c7a.2xlarge: 73.34 (SE +/- 0.02, N = 3; MIN: 65.17 / MAX: 80.62)
m7i-flex.2xlarge: 74.20 (SE +/- 0.25, N = 3; MIN: 34.84 / MAX: 96.18)
m7i.2xlarge: 80.94 (SE +/- 0.04, N = 3; MIN: 69.04 / MAX: 153.39)
c6i.2xlarge: 179.53 (SE +/- 0.44, N = 3; MIN: 100.26 / MAX: 344)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
m7i-flex.2xlarge: 1.76350 (SE +/- 0.00827, N = 3)
m7i.2xlarge: 1.79360 (SE +/- 0.00405, N = 3)
r7iz.xlarge: 2.37705 (SE +/- 0.00607, N = 3)
c7a.2xlarge: 6.90200 (SE +/- 0.00810, N = 3)
r7a.xlarge: 13.73180 (SE +/- 0.02961, N = 3; MIN: 13.51)
c6i.2xlarge: 46.69860 (SE +/- 0.01032, N = 3; MIN: 46.42)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 5.01437 (SE +/- 0.00229, N = 3; MIN: 4.93)
m7i-flex.2xlarge: 8.10007 (SE +/- 0.01302, N = 3; MIN: 6.7)
m7i.2xlarge: 8.61718 (SE +/- 0.08548, N = 12; MIN: 7.87)
r7a.xlarge: 9.85441 (SE +/- 0.00467, N = 3; MIN: 9.75)
r7iz.xlarge: 11.19730 (SE +/- 0.00848, N = 3; MIN: 10.84)
c6i.2xlarge: 12.66180 (SE +/- 0.07954, N = 3; MIN: 10.8)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
m7i-flex.2xlarge: 1.003269 (SE +/- 0.011545, N = 4; MIN: 0.84)
m7i.2xlarge: 1.066020 (SE +/- 0.003802, N = 3; MIN: 1)
c7a.2xlarge: 1.264050 (SE +/- 0.004187, N = 3; MIN: 1.25)
r7iz.xlarge: 1.457260 (SE +/- 0.001934, N = 3; MIN: 1.41)
c6i.2xlarge: 1.772860 (SE +/- 0.013170, N = 3; MIN: 1.73)
r7a.xlarge: 2.529820 (SE +/- 0.006412, N = 3; MIN: 2.5)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev. Unit: FPS, more is better.
c7a.2xlarge: 54.51 (SE +/- 0.02, N = 3)
m7i-flex.2xlarge: 53.87 (SE +/- 0.19, N = 3)
m7i.2xlarge: 49.39 (SE +/- 0.02, N = 3)
r7iz.xlarge: 34.45 (SE +/- 0.03, N = 3)
r7a.xlarge: 30.77 (SE +/- 0.18, N = 3)
c6i.2xlarge: 22.26 (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 5.10330 (SE +/- 0.00261, N = 3; MIN: 5.05)
m7i-flex.2xlarge: 7.94878 (SE +/- 0.08711, N = 3; MIN: 6.79)
c6i.2xlarge: 8.09803 (SE +/- 0.01712, N = 3; MIN: 8.02)
m7i.2xlarge: 8.34462 (SE +/- 0.02786, N = 3; MIN: 8.05)
r7a.xlarge: 10.16790 (SE +/- 0.01406, N = 3; MIN: 10.11)
r7iz.xlarge: 11.24930 (SE +/- 0.03556, N = 3; MIN: 10.98)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 8.71 (SE +/- 0.02, N = 3; MIN: 5.53 / MAX: 8.82)
r7a.xlarge: 5.67 (SE +/- 0.01, N = 3; MIN: 4.37 / MAX: 5.73)
m7i-flex.2xlarge: 5.47 (SE +/- 0.01, N = 3; MIN: 2.31 / MAX: 6.19)
m7i.2xlarge: 5.28 (SE +/- 0.02, N = 3; MIN: 4.01 / MAX: 5.41)
r7iz.xlarge: 4.63 (SE +/- 0.01, N = 3; MIN: 4.28 / MAX: 4.69)
c6i.2xlarge: 4.04 (SE +/- 0.02, N = 3; MIN: 3.5 / MAX: 4.36)

PyTorch

Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 8.75 (SE +/- 0.01, N = 3; MIN: 5.72 / MAX: 8.84)
r7a.xlarge: 5.66 (SE +/- 0.01, N = 3; MIN: 4.27 / MAX: 5.72)
m7i-flex.2xlarge: 5.38 (SE +/- 0.01, N = 3; MIN: 2.12 / MAX: 6.08)
m7i.2xlarge: 5.26 (SE +/- 0.01, N = 3; MIN: 3.36 / MAX: 5.38)
r7iz.xlarge: 4.63 (SE +/- 0.01, N = 3; MIN: 3.17 / MAX: 4.69)
c6i.2xlarge: 4.06 (SE +/- 0.03, N = 3; MIN: 3.29 / MAX: 4.32)

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
m7i-flex.2xlarge: 1.016327 (SE +/- 0.013619, N = 15; MIN: 0.79)
m7i.2xlarge: 1.030110 (SE +/- 0.000901, N = 3; MIN: 0.94)
c7a.2xlarge: 1.094290 (SE +/- 0.002971, N = 3; MIN: 1.08)
r7iz.xlarge: 1.427360 (SE +/- 0.003845, N = 3; MIN: 1.35)
c6i.2xlarge: 2.038360 (SE +/- 0.002826, N = 3; MIN: 1.97)
r7a.xlarge: 2.180290 (SE +/- 0.000469, N = 3; MIN: 2.15)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 1478.51 (SE +/- 1.00, N = 3; MIN: 1472.49)
m7i.2xlarge: 2303.44 (SE +/- 1.63, N = 3; MIN: 2276.46)
m7i-flex.2xlarge: 2318.31 (SE +/- 15.72, N = 3; MIN: 2205.71)
c6i.2xlarge: 2501.50 (SE +/- 2.60, N = 3; MIN: 2476.86)
r7a.xlarge: 2862.86 (SE +/- 3.75, N = 3; MIN: 2849.87)
r7iz.xlarge: 3062.84 (SE +/- 2.69, N = 3; MIN: 3033.8)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 1480.93 (SE +/- 2.09, N = 3; MIN: 1474.24)
m7i.2xlarge: 2310.90 (SE +/- 2.59, N = 3; MIN: 2291.18)
m7i-flex.2xlarge: 2389.64 (SE +/- 24.80, N = 3; MIN: 2260.99)
c6i.2xlarge: 2492.29 (SE +/- 2.42, N = 3; MIN: 2460.06)
r7a.xlarge: 2857.77 (SE +/- 1.65, N = 3; MIN: 2848.47)
r7iz.xlarge: 3064.81 (SE +/- 9.60, N = 3; MIN: 3031.6)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 1482.35 (SE +/- 1.52, N = 3; MIN: 1476.16)
m7i.2xlarge: 2320.94 (SE +/- 7.66, N = 3; MIN: 2290.27)
m7i-flex.2xlarge: 2382.46 (SE +/- 26.45, N = 4; MIN: 2219.69)
c6i.2xlarge: 2496.84 (SE +/- 7.01, N = 3; MIN: 2465.66)
r7a.xlarge: 2856.84 (SE +/- 2.56, N = 3; MIN: 2845.85)
r7iz.xlarge: 3062.69 (SE +/- 2.90, N = 3; MIN: 3036.53)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-50

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 31.29 (SE +/- 0.42, N = 15; MIN: 20.73 / MAX: 33.27)
m7i.2xlarge: 19.43 (SE +/- 0.21, N = 5; MIN: 13.34 / MAX: 20.17)
r7a.xlarge: 19.32 (SE +/- 0.15, N = 10; MIN: 13.74 / MAX: 19.98)
m7i-flex.2xlarge: 18.34 (SE +/- 0.29, N = 15; MIN: 4.11 / MAX: 22.15)
c6i.2xlarge: 15.81 (SE +/- 0.17, N = 4; MIN: 9.11 / MAX: 17.08)
r7iz.xlarge: 15.30 (SE +/- 0.21, N = 15; MIN: 10.53 / MAX: 16.18)

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 13.03 (SE +/- 0.05, N = 3; MIN: 10.5 / MAX: 13.2)
r7a.xlarge: 7.68 (SE +/- 0.01, N = 3; MIN: 6.31 / MAX: 7.76)
m7i.2xlarge: 7.66 (SE +/- 0.01, N = 3; MIN: 4.36 / MAX: 7.82)
m7i-flex.2xlarge: 7.55 (SE +/- 0.02, N = 3; MIN: 2.93 / MAX: 8.62)
r7iz.xlarge: 6.39 (SE +/- 0.01, N = 3; MIN: 5.8 / MAX: 6.46)
c6i.2xlarge: 6.38 (SE +/- 0.01, N = 3; MIN: 3.7 / MAX: 6.57)

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-152

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 12.95 (SE +/- 0.09, N = 3; MIN: 4.2 / MAX: 13.21)
r7a.xlarge: 7.68 (SE +/- 0.02, N = 3; MIN: 6.45 / MAX: 7.76)
m7i.2xlarge: 7.65 (SE +/- 0.02, N = 3; MIN: 6.27 / MAX: 7.82)
m7i-flex.2xlarge: 7.36 (SE +/- 0.08, N = 5; MIN: 2.24 / MAX: 8.66)
r7iz.xlarge: 6.37 (SE +/- 0.01, N = 3; MIN: 5.79 / MAX: 6.45)
c6i.2xlarge: 6.36 (SE +/- 0.05, N = 3; MIN: 5.43 / MAX: 6.61)

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 31.89 (SE +/- 0.24, N = 3; MIN: 23.54 / MAX: 32.58)
m7i.2xlarge: 19.62 (SE +/- 0.08, N = 3; MIN: 15.92 / MAX: 20.23)
r7a.xlarge: 19.35 (SE +/- 0.18, N = 3; MIN: 15.18 / MAX: 19.82)
m7i-flex.2xlarge: 18.84 (SE +/- 0.17, N = 3; MIN: 4.37 / MAX: 21.77)
c6i.2xlarge: 15.96 (SE +/- 0.12, N = 3; MIN: 11.65 / MAX: 17.13)
r7iz.xlarge: 15.92 (SE +/- 0.04, N = 3; MIN: 14.8 / MAX: 16.24)

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
m7i-flex.2xlarge: 3.11075 (SE +/- 0.01261, N = 3)
c7a.2xlarge: 3.12339 (SE +/- 0.00414, N = 3)
m7i.2xlarge: 3.20792 (SE +/- 0.00738, N = 3)
r7iz.xlarge: 4.09311 (SE +/- 0.02647, N = 3)
r7a.xlarge: 6.25907 (SE +/- 0.01228, N = 3; MIN: 6.18)
c6i.2xlarge: 34.71660 (SE +/- 0.00790, N = 3; MIN: 34.6)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c7a.2xlarge: 3.28245 (SE +/- 0.00275, N = 3)
m7i.2xlarge: 3.37610 (SE +/- 0.00955, N = 3)
m7i-flex.2xlarge: 3.68993 (SE +/- 0.03984, N = 3)
r7iz.xlarge: 4.99403 (SE +/- 0.00443, N = 3)
r7a.xlarge: 5.40784 (SE +/- 0.02244, N = 3; MIN: 5.32)
c6i.2xlarge: 33.19200 (SE +/- 0.00337, N = 3; MIN: 33.13)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 20.00 (SE +/- 0.05, N = 3; MIN: 15.6 / MAX: 20.36)
r7a.xlarge: 13.19 (SE +/- 0.02, N = 3; MIN: 10.96 / MAX: 13.33)
m7i.2xlarge: 12.48 (SE +/- 0.09, N = 3; MIN: 8.7 / MAX: 12.92)
m7i-flex.2xlarge: 12.10 (SE +/- 0.10, N = 3; MIN: 2.89 / MAX: 13.92)
r7iz.xlarge: 11.07 (SE +/- 0.02, N = 3; MIN: 9.42 / MAX: 11.22)
c6i.2xlarge: 10.57 (SE +/- 0.01, N = 3; MIN: 9.04 / MAX: 10.77)

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 50.16 (SE +/- 0.32, N = 3; MIN: 33.27 / MAX: 51.43)
r7a.xlarge: 33.39 (SE +/- 0.04, N = 3; MIN: 24.56 / MAX: 33.78)
m7i.2xlarge: 31.36 (SE +/- 0.15, N = 3; MIN: 26.54 / MAX: 32.62)
m7i-flex.2xlarge: 29.89 (SE +/- 0.05, N = 3; MIN: 7.96 / MAX: 34.56)
r7iz.xlarge: 27.43 (SE +/- 0.09, N = 3; MIN: 21.64 / MAX: 28.06)
c6i.2xlarge: 26.78 (SE +/- 0.13, N = 3; MIN: 13.67 / MAX: 27.8)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
c6i.2xlarge: 6.24391 (SE +/- 0.08834, N = 3; MIN: 5.78)
m7i.2xlarge: 6.61974 (SE +/- 0.02083, N = 3; MIN: 6.24)
m7i-flex.2xlarge: 7.19462 (SE +/- 0.03815, N = 3; MIN: 6.66)
r7a.xlarge: 7.38732 (SE +/- 0.01020, N = 3; MIN: 7.27)
c7a.2xlarge: 7.74619 (SE +/- 0.01540, N = 3; MIN: 7.55)
r7iz.xlarge: 10.19330 (SE +/- 0.00372, N = 3; MIN: 9.93)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

Numpy Benchmark

Unit: Score, more is better.
r7a.xlarge: 595.01 (SE +/- 0.90, N = 3)
c7a.2xlarge: 590.10 (SE +/- 1.02, N = 3)
r7iz.xlarge: 554.28 (SE +/- 0.21, N = 3)
m7i.2xlarge: 452.50 (SE +/- 0.98, N = 3)
m7i-flex.2xlarge: 438.25 (SE +/- 3.53, N = 3)
c6i.2xlarge: 374.99 (SE +/- 1.37, N = 3)

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1. Unit: batches/sec, more is better.
c7a.2xlarge: 11.74 (SE +/- 0.02, N = 3; MIN: 8.95 / MAX: 11.9)
m7i-flex.2xlarge: 8.99 (SE +/- 0.09, N = 12; MIN: 3.08 / MAX: 10.39)
m7i.2xlarge: 8.84 (SE +/- 0.04, N = 3; MIN: 7.72 / MAX: 9.09)
r7a.xlarge: 8.44 (SE +/- 0.01, N = 3; MIN: 2.94 / MAX: 8.53)
r7iz.xlarge: 8.17 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 8.27)
c6i.2xlarge: 7.99 (SE +/- 0.02, N = 3; MIN: 6.59 / MAX: 8.31)

PyBench

Total For Average Test Times

PyBench 2018-02-16. Unit: Milliseconds, fewer is better.
r7iz.xlarge: 691 (SE +/- 1.00, N = 3)
m7i-flex.2xlarge: 736 (SE +/- 3.18, N = 3)
m7i.2xlarge: 815 (SE +/- 2.33, N = 3)
c7a.2xlarge: 887 (SE +/- 0.67, N = 3)
r7a.xlarge: 887 (SE +/- 1.33, N = 3)
c6i.2xlarge: 1000 (SE +/- 0.33, N = 3)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.3. Unit: ms, fewer is better.
m7i-flex.2xlarge: 5.84261 (SE +/- 0.02435, N = 3; MIN: 5)
m7i.2xlarge: 5.88019 (SE +/- 0.00970, N = 3; MIN: 5.62)
c6i.2xlarge: 5.93669 (SE +/- 0.03862, N = 3; MIN: 5.66)
c7a.2xlarge: 7.35101 (SE +/- 0.01338, N = 3; MIN: 7.22)
r7iz.xlarge: 7.66745 (SE +/- 0.00852, N = 3; MIN: 7.48)
r7a.xlarge: 8.22691 (SE +/- 0.00015, N = 3; MIN: 8.1)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl


Phoronix Test Suite v10.8.5