nlp-benchmarks

m7i.2xlarge NLP benchmarking

HTML result view exported from: https://openbenchmarking.org/result/2402084-NE-2402012NE30&grs.

c6i.2xlarge
  Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads)
  Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS)
  Chipset: Intel 440FX 82441FX PMC
  Memory: 1 x 16GB DDR4-3200MT/s
  Disk: 215GB Amazon Elastic Block Store
  Network: Amazon Elastic
  OS: Amazon Linux 2023
  Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64)
  Compiler: GCC 11.4.1 20230605
  File-System: xfs
  System Layer: amazon

m7i-flex.2xlarge (the export lists only these four values for this system)
  Processor: Intel Xeon Platinum 8488C (4 Cores / 8 Threads)
  Motherboard: Amazon EC2 m7i-flex.2xlarge (1.0 BIOS)
  Memory: 1 x 32GB 4800MT/s
  Kernel: 6.1.72-96.166.amzn2023.x86_64 (x86_64)

Kernel Details
  - Transparent Huge Pages: madvise

Compiler Details
  - --build=x86_64-amazon-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-arch_64=x86-64-v2 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
  - c6i.2xlarge: CPU Microcode: 0xd0003a5
  - m7i-flex.2xlarge: CPU Microcode: 0x2b000571

Python Details
  - Python 3.11.6

Security Details
  - c6i.2xlarge:
      gather_data_sampling: Unknown: Dependent on hypervisor status
      itlb_multihit: Not affected
      l1tf: Not affected
      mds: Not affected
      meltdown: Not affected
      mmio_stale_data: Mitigation of Clear buffers; SMT Host state unknown
      retbleed: Not affected
      spec_rstack_overflow: Not affected
      spec_store_bypass: Mitigation of SSB disabled via prctl
      spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
      spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
      srbds: Not affected
      tsx_async_abort: Not affected
  - m7i-flex.2xlarge:
      gather_data_sampling: Not affected
      itlb_multihit: Not affected
      l1tf: Not affected
      mds: Not affected
      meltdown: Not affected
      mmio_stale_data: Not affected
      retbleed: Not affected
      spec_rstack_overflow: Not affected
      spec_store_bypass: Mitigation of SSB disabled via prctl
      spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
      spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
      srbds: Not affected
      tsx_async_abort: Not affected

c6i.2xlarge  m7i-flex.2xlarge  Unit          Test
33.1920      3.68993           ms            onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
1.77         8.43              FPS           openvino: Face Detection FP16 - CPU
2252.04      474.46            ms            openvino: Face Detection FP16 - CPU
46.6986      1.76350           ms            onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
6.53         16.57             FPS           openvino: Face Detection FP16-INT8 - CPU
610.59       241.21            ms            openvino: Face Detection FP16-INT8 - CPU
22.26        53.87             FPS           openvino: Machine Translation EN To DE FP16 - CPU
179.53       74.20             ms            openvino: Machine Translation EN To DE FP16 - CPU
2.03836      1.016327          ms            onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU
34.7166      3.11075           ms            onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
1.77286      1.003269          ms            onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU
12.6618      8.10007           ms            onednn: Deconvolution Batch shapes_1d - f32 - CPU
1000         736               Milliseconds  pybench: Total For Average Test Times
4.04         5.47              batches/sec   pytorch: CPU - 32 - Efficientnet_v2_l
4.06         5.38              batches/sec   pytorch: CPU - 16 - Efficientnet_v2_l
6.38         7.55              batches/sec   pytorch: CPU - 32 - ResNet-152
15.96        18.84             batches/sec   pytorch: CPU - 16 - ResNet-50
374.99       438.25            Score         numpy
6.36         7.36              batches/sec   pytorch: CPU - 16 - ResNet-152
6.24391      7.19462           ms            onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU
10.57        12.10             batches/sec   pytorch: CPU - 1 - ResNet-152
7.99         8.99              batches/sec   pytorch: CPU - 1 - Efficientnet_v2_l
26.78        29.89             batches/sec   pytorch: CPU - 1 - ResNet-50
2501.50      2318.31           ms            onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
2496.84      2382.46           ms            onednn: Recurrent Neural Network Inference - f32 - CPU
2492.29      2389.64           ms            onednn: Recurrent Neural Network Inference - u8s8f32 - CPU
8.09803      7.94878           ms            onednn: Deconvolution Batch shapes_3d - f32 - CPU
5.93669      5.84261           ms            onednn: Convolution Batch Shapes Auto - f32 - CPU
15.81        18.34             batches/sec   pytorch: CPU - 32 - ResNet-50
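
The results in this export mix "Fewer Is Better" metrics (ms) with "More Is Better" metrics (FPS, batches/sec, Score), so any speedup figure has to orient the ratio per test. The following is a minimal sketch in Python using four value pairs copied from this document. The function name `speedup` and the dictionary layout are illustrative assumptions and are not part of Phoronix Test Suite.

```python
# Illustrative helper (not part of Phoronix Test Suite): orient each ratio so
# that a value above 1.0 always means m7i-flex.2xlarge did better.

def speedup(c6i: float, m7i_flex: float, fewer_is_better: bool) -> float:
    """Return how many times better m7i-flex.2xlarge scored than c6i.2xlarge."""
    if c6i <= 0 or m7i_flex <= 0:
        raise ValueError("results must be positive")
    # For time-like metrics a smaller number wins, so the ratio is inverted.
    return c6i / m7i_flex if fewer_is_better else m7i_flex / c6i


# (c6i.2xlarge, m7i-flex.2xlarge, fewer_is_better), values copied from this export.
results = {
    "onednn conv auto bf16 (ms)": (33.1920, 3.68993, True),
    "openvino face detection FP16 (FPS)": (1.77, 8.43, False),
    "pybench total (ms)": (1000, 736, True),
    "onednn conv auto u8s8f32 (ms)": (6.24391, 7.19462, True),
}

ratios = {name: speedup(*row) for name, row in results.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The u8s8f32 convolution ratio comes out below 1.0, since that is a test where c6i.2xlarge posted the lower time.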

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      33.19200 (SE +/- 0.00337, N = 3; MIN: 33.13)
m7i-flex.2xlarge: 3.68993 (SE +/- 0.03984, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev, FPS (more is better)
c6i.2xlarge:      1.77 (SE +/- 0.01, N = 3)
m7i-flex.2xlarge: 8.43 (SE +/- 0.10, N = 3)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16 - Device: CPU

OpenVINO 2023.2.dev, ms (fewer is better)
c6i.2xlarge:      2252.04 (SE +/- 3.74, N = 3; MIN: 2202.18 / MAX: 2317.24)
m7i-flex.2xlarge: 474.46 (SE +/- 5.97, N = 3; MIN: 427.39 / MAX: 558.54)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      46.69860 (SE +/- 0.01032, N = 3; MIN: 46.42)
m7i-flex.2xlarge: 1.76350 (SE +/- 0.00827, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev, FPS (more is better)
c6i.2xlarge:      6.53 (SE +/- 0.04, N = 3)
m7i-flex.2xlarge: 16.57 (SE +/- 0.16, N = 3)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Face Detection FP16-INT8 - Device: CPU

OpenVINO 2023.2.dev, ms (fewer is better)
c6i.2xlarge:      610.59 (SE +/- 3.28, N = 3; MIN: 497.95 / MAX: 643.84)
m7i-flex.2xlarge: 241.21 (SE +/- 2.35, N = 3; MIN: 91.81 / MAX: 374.94)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev, FPS (more is better)
c6i.2xlarge:      22.26 (SE +/- 0.06, N = 3)
m7i-flex.2xlarge: 53.87 (SE +/- 0.19, N = 3)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

OpenVINO

Model: Machine Translation EN To DE FP16 - Device: CPU

OpenVINO 2023.2.dev, ms (fewer is better)
c6i.2xlarge:      179.53 (SE +/- 0.44, N = 3; MIN: 100.26 / MAX: 344)
m7i-flex.2xlarge: 74.20 (SE +/- 0.25, N = 3; MIN: 34.84 / MAX: 96.18)
(CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      2.038360 (SE +/- 0.002826, N = 3; MIN: 1.97)
m7i-flex.2xlarge: 1.016327 (SE +/- 0.013619, N = 15; MIN: 0.79)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      34.71660 (SE +/- 0.00790, N = 3; MIN: 34.6)
m7i-flex.2xlarge: 3.11075 (SE +/- 0.01261, N = 3)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      1.772860 (SE +/- 0.013170, N = 3; MIN: 1.73)
m7i-flex.2xlarge: 1.003269 (SE +/- 0.011545, N = 4; MIN: 0.84)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      12.66180 (SE +/- 0.07954, N = 3; MIN: 10.8)
m7i-flex.2xlarge: 8.10007 (SE +/- 0.01302, N = 3; MIN: 6.7)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyBench

Total For Average Test Times

PyBench 2018-02-16, Milliseconds (fewer is better)
c6i.2xlarge:      1000 (SE +/- 0.33, N = 3)
m7i-flex.2xlarge: 736 (SE +/- 3.18, N = 3)

PyTorch

Device: CPU - Batch Size: 32 - Model: Efficientnet_v2_l

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      4.04 (SE +/- 0.02, N = 3; MIN: 3.5 / MAX: 4.36)
m7i-flex.2xlarge: 5.47 (SE +/- 0.01, N = 3; MIN: 2.31 / MAX: 6.19)

PyTorch

Device: CPU - Batch Size: 16 - Model: Efficientnet_v2_l

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      4.06 (SE +/- 0.03, N = 3; MIN: 3.29 / MAX: 4.32)
m7i-flex.2xlarge: 5.38 (SE +/- 0.01, N = 3; MIN: 2.12 / MAX: 6.08)

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-152

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      6.38 (SE +/- 0.01, N = 3; MIN: 3.7 / MAX: 6.57)
m7i-flex.2xlarge: 7.55 (SE +/- 0.02, N = 3; MIN: 2.93 / MAX: 8.62)

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-50

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      15.96 (SE +/- 0.12, N = 3; MIN: 11.65 / MAX: 17.13)
m7i-flex.2xlarge: 18.84 (SE +/- 0.17, N = 3; MIN: 4.37 / MAX: 21.77)

Numpy Benchmark

Numpy Benchmark, Score (more is better)
c6i.2xlarge:      374.99 (SE +/- 1.37, N = 3)
m7i-flex.2xlarge: 438.25 (SE +/- 3.53, N = 3)

PyTorch

Device: CPU - Batch Size: 16 - Model: ResNet-152

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      6.36 (SE +/- 0.05, N = 3; MIN: 5.43 / MAX: 6.61)
m7i-flex.2xlarge: 7.36 (SE +/- 0.08, N = 5; MIN: 2.24 / MAX: 8.66)

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      6.24391 (SE +/- 0.08834, N = 3; MIN: 5.78)
m7i-flex.2xlarge: 7.19462 (SE +/- 0.03815, N = 3; MIN: 6.66)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-152

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      10.57 (SE +/- 0.01, N = 3; MIN: 9.04 / MAX: 10.77)
m7i-flex.2xlarge: 12.10 (SE +/- 0.10, N = 3; MIN: 2.89 / MAX: 13.92)

PyTorch

Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      7.99 (SE +/- 0.02, N = 3; MIN: 6.59 / MAX: 8.31)
m7i-flex.2xlarge: 8.99 (SE +/- 0.09, N = 12; MIN: 3.08 / MAX: 10.39)

PyTorch

Device: CPU - Batch Size: 1 - Model: ResNet-50

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      26.78 (SE +/- 0.13, N = 3; MIN: 13.67 / MAX: 27.8)
m7i-flex.2xlarge: 29.89 (SE +/- 0.05, N = 3; MIN: 7.96 / MAX: 34.56)

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      2501.50 (SE +/- 2.60, N = 3; MIN: 2476.86)
m7i-flex.2xlarge: 2318.31 (SE +/- 15.72, N = 3; MIN: 2205.71)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      2496.84 (SE +/- 7.01, N = 3; MIN: 2465.66)
m7i-flex.2xlarge: 2382.46 (SE +/- 26.45, N = 4; MIN: 2219.69)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      2492.29 (SE +/- 2.42, N = 3; MIN: 2460.06)
m7i-flex.2xlarge: 2389.64 (SE +/- 24.80, N = 3; MIN: 2260.99)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      8.09803 (SE +/- 0.01712, N = 3; MIN: 8.02)
m7i-flex.2xlarge: 7.94878 (SE +/- 0.08711, N = 3; MIN: 6.79)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 3.3, ms (fewer is better)
c6i.2xlarge:      5.93669 (SE +/- 0.03862, N = 3; MIN: 5.66)
m7i-flex.2xlarge: 5.84261 (SE +/- 0.02435, N = 3; MIN: 5)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
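
The two f32 convolution results are close (5.93669 ms vs 5.84261 ms), so the reported standard errors matter when reading the gap. The following is a rough sketch in Python that expresses a difference in units of the combined standard error. It assumes the two systems' runs are independent and takes the reported SE values as-is; with N = 3 per system it is only an informal check and not a formal significance test. The function name `gap_in_se_units` is hypothetical.

```python
import math


def gap_in_se_units(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Absolute difference of two means divided by their combined standard error."""
    # Standard errors of independent estimates combine in quadrature.
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        raise ValueError("combined standard error is zero")
    return abs(mean_a - mean_b) / combined_se


# oneDNN Convolution Batch Shapes Auto, f32 (ms), values from this export.
conv_f32 = gap_in_se_units(5.93669, 0.03862, 5.84261, 0.02435)

# oneDNN Convolution Batch Shapes Auto, bf16bf16bf16 (ms), for contrast.
conv_bf16 = gap_in_se_units(33.19200, 0.00337, 3.68993, 0.03984)

print(f"f32 gap:  {conv_f32:.1f} combined SEs")
print(f"bf16 gap: {conv_bf16:.1f} combined SEs")
```

The f32 gap is only about two combined standard errors, while the bf16 gap is several hundred, so the bf16 difference is far less likely to be run-to-run noise.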

PyTorch

Device: CPU - Batch Size: 32 - Model: ResNet-50

PyTorch 2.1, batches/sec (more is better)
c6i.2xlarge:      15.81 (SE +/- 0.17, N = 4; MIN: 9.11 / MAX: 17.08)
m7i-flex.2xlarge: 18.34 (SE +/- 0.29, N = 15; MIN: 4.11 / MAX: 22.15)


Phoronix Test Suite v10.8.5