nlp-benchmarks: c6i.2xlarge NLP benchmarking on AL2023
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2402012-NE-NLPBENCHM92

c6i.2xlarge:
Processor: Intel Xeon Platinum 8375C (4 Cores / 8 Threads), Motherboard: Amazon EC2 c6i.2xlarge (1.0 BIOS), Chipset: Intel 440FX 82441FX PMC, Memory: 1 x 16GB DDR4-3200MT/s, Disk: 215GB Amazon Elastic Block Store, Network: Amazon Elastic
OS: Amazon Linux 2023, Kernel: 6.1.61-85.141.amzn2023.x86_64 (x86_64), Compiler: GCC 11.4.1 20230605, File-System: xfs, System Layer: amazon
Kernel Notes: Transparent Huge Pages: madvise
Compiler Notes: --build=x86_64-amazon-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,lto --enable-multilib --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=x86-64 --with-arch_64=x86-64-v2 --with-gcc-major-version-only --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver
Processor Notes: CPU Microcode: 0xd0003a5
Python Notes: Python 3.11.6
Security Notes: gather_data_sampling: Unknown: Dependent on hypervisor status + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT Host state unknown + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Not affected
oneDNN

This is a benchmark of Intel oneDNN, an Intel-optimized library for Deep Neural Networks, using its built-in benchdnn functionality. The result reported is the total perf time. Intel oneDNN was formerly known as DNNL (Deep Neural Network Library) and MKL-DNN before being rebranded as part of the Intel oneAPI toolkit. Learn more via the OpenBenchmarking.org test page.
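As a rough illustration of what these harnesses do under the hood, the sketch below drives benchdnn in performance mode from Python. The binary location, the batch-file path, and the exact flag spellings (--conv, --mode=P, --dt, --batch) are assumptions based on benchdnn's documented CLI; they vary across oneDNN versions and may differ from what the PTS wrapper actually passes:

    import subprocess

    # Hedged sketch: run benchdnn's convolution driver in performance mode
    # (--mode=P) on the f32 "shapes_auto" batch file, roughly mirroring the
    # "Convolution Batch Shapes Auto - f32" harness below. Paths and flag
    # spellings are assumptions, not the profile's exact invocation.
    subprocess.run(
        ["./benchdnn", "--conv", "--mode=P", "--dt=f32",
         "--batch=inputs/conv/shapes_auto"],
        check=True,
    )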
oneDNN 3.3 - Engine: CPU - ms, Fewer Is Better (c6i.2xlarge)

  Harness                             Data Type      Result (ms)  SE +/-   N  MIN
  Convolution Batch Shapes Auto       f32                5.93669  0.03862  3  5.66
  Deconvolution Batch shapes_1d       f32               12.66     0.08     3  10.8
  Deconvolution Batch shapes_3d       f32                8.09803  0.01712  3  8.02
  Convolution Batch Shapes Auto       u8s8f32            6.24391  0.08834  3  5.78
  Deconvolution Batch shapes_1d       u8s8f32            2.03836  0.00283  3  1.97
  Deconvolution Batch shapes_3d       u8s8f32            1.77286  0.01317  3  1.73
  Recurrent Neural Network Inference  f32             2496.84     7.01     3  2465.66
  Convolution Batch Shapes Auto       bf16bf16bf16      33.19     0.00     3  33.13
  Deconvolution Batch shapes_1d       bf16bf16bf16      46.70     0.01     3  46.42
  Deconvolution Batch shapes_3d       bf16bf16bf16      34.72     0.01     3  34.6
  Recurrent Neural Network Inference  u8s8f32         2492.29     2.42     3  2460.06
  Recurrent Neural Network Inference  bf16bf16bf16    2501.50     2.60     3  2476.86

All oneDNN tests built with: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
OpenVINO

This is a benchmark of Intel OpenVINO, a toolkit for optimizing and deploying neural networks, using its built-in benchmarking support to measure the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.
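The profile itself relies on OpenVINO's bundled benchmarking support; as a minimal sketch of measuring throughput the same general way, the snippet below compiles a model for CPU with the OpenVINO 2023.x Python API and times repeated synchronous inferences. The model path is a placeholder and the iteration count is arbitrary:

    import time
    import numpy as np
    from openvino.runtime import Core

    core = Core()
    # Placeholder model path; any OpenVINO IR model works here.
    compiled = core.compile_model("face-detection-fp16.xml", "CPU")
    request = compiled.create_infer_request()
    dummy = np.zeros(list(compiled.input(0).shape), dtype=np.float32)

    n = 100
    t0 = time.perf_counter()
    for _ in range(n):          # synchronous inference loop
        request.infer({0: dummy})
    print(f"{n / (time.perf_counter() - t0):.2f} FPS")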
OpenVINO 2023.2.dev - Device: CPU (c6i.2xlarge)

  Model                              Throughput (FPS, more is better)  Latency (ms, fewer is better)
  Face Detection FP16                 1.77 (SE +/- 0.01, N = 3)        2252.04 (SE +/- 3.74, N = 3; MIN: 2202.18 / MAX: 2317.24)
  Face Detection FP16-INT8            6.53 (SE +/- 0.04, N = 3)         610.59 (SE +/- 3.28, N = 3; MIN: 497.95 / MAX: 643.84)
  Machine Translation EN To DE FP16  22.26 (SE +/- 0.06, N = 3)         179.53 (SE +/- 0.44, N = 3; MIN: 100.26 / MAX: 344)

All OpenVINO tests built with: (CXX) g++ options: -fsigned-char -ffunction-sections -fdata-sections -O3 -fno-strict-overflow -fwrapv -pie
PyBench

This test profile reports the total of the average timed test results from PyBench. PyBench reports average times for different micro-tests such as BuiltinFunctionCalls and NestedForLoops; the total provides a rough estimate of Python's average performance on a given system. This test profile runs PyBench for 20 rounds each time. Learn more via the OpenBenchmarking.org test page.
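For context, the sketch below imitates PyBench's approach in miniature: time a small workload repeatedly, average over rounds, and report milliseconds. The workload and iteration counts are arbitrary stand-ins, not PyBench's actual tests:

    import timeit

    # Hedged sketch: average a micro-benchmark over 20 rounds, echoing the
    # profile's 20-round PyBench runs. The workload is an arbitrary stand-in.
    rounds = 20
    times = [timeit.timeit("len('abc')", number=100_000) for _ in range(rounds)]
    print(f"average round time: {sum(times) / rounds * 1000:.2f} ms")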
PyBench 2018-02-16 - Total For Average Test Times - Milliseconds, Fewer Is Better (c6i.2xlarge)

  Total For Average Test Times: 1000 (SE +/- 0.33, N = 3)
PyTorch

This is a benchmark of PyTorch using pytorch-benchmark [https://github.com/LukasHedegaard/pytorch-benchmark]. Currently this test profile is geared toward CPU-based testing. Learn more via the OpenBenchmarking.org test page.
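As a minimal sketch of what these numbers measure, the snippet below times eager-mode CPU inference of ResNet-50 at batch size 1 using plain torch/torchvision calls; it approximates the methodology rather than reproducing the pytorch-benchmark package's exact harness, and the warm-up and iteration counts are arbitrary:

    import time
    import torch
    import torchvision.models as models

    # Hedged sketch: CPU inference throughput for ResNet-50 at batch size 1.
    model = models.resnet50().eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        for _ in range(5):       # warm-up iterations
            model(x)
        n = 20
        t0 = time.perf_counter()
        for _ in range(n):       # timed batches
            model(x)
    print(f"{n / (time.perf_counter() - t0):.2f} batches/sec")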
PyTorch 2.1 - Device: CPU - batches/sec, More Is Better (c6i.2xlarge)

  Model              Batch Size  Result (batches/sec)  SE +/-  N  MIN / MAX
  ResNet-50                   1  26.78                 0.13    3  13.67 / 27.8
  ResNet-152                  1  10.57                 0.01    3  9.04 / 10.77
  ResNet-50                  16  15.96                 0.12    3  11.65 / 17.13
  ResNet-50                  32  15.81                 0.17    4  9.11 / 17.08
  ResNet-152                 16   6.36                 0.05    3  5.43 / 6.61
  ResNet-152                 32   6.38                 0.01    3  3.7 / 6.57
  Efficientnet_v2_l           1   7.99                 0.02    3  6.59 / 8.31
  Efficientnet_v2_l          16   4.06                 0.03    3  3.29 / 4.32
  Efficientnet_v2_l          32   4.04                 0.02    3  3.5 / 4.36
Testing initiated at 1 February 2024 17:47 by user ec2-user.