Amazon EC2 m6i.8xlarge

KVM testing on Ubuntu 20.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2108184-TJ-2108173TJ20&gru.

System details

m6i.8xlarge
Processor: Intel Xeon Platinum 8375C (16 Cores / 32 Threads)
Motherboard: Amazon EC2 m6i.8xlarge (1.0 BIOS)
Chipset: Intel 440FX 82441FX PMC
Memory: 124GB
Disk: 86GB Amazon Elastic Block Store
Network: Amazon Elastic
OS: Ubuntu 20.04
Kernel: 5.4.0-1045-aws (x86_64)
Vulkan: 1.0.2
Compiler: GCC 9.3.0
File-System: ext4
System Layer: KVM

m5.8xlarge (the export lists only these fields for this instance)
Processor: Intel Xeon Platinum 8259CL (16 Cores / 32 Threads)
Motherboard: Amazon EC2 m5.8xlarge (1.0 BIOS)
Memory: 126GB

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- m6i.8xlarge: CPU Microcode: 0xd0002b1
- m5.8xlarge: CPU Microcode: 0x5003005

Python Details
- Python 3.8.10

Security Details
- m6i.8xlarge: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
- m5.8xlarge: itlb_multihit: KVM: Vulnerable + l1tf: Mitigation of PTE Inversion + mds: Vulnerable: Clear buffers attempted no microcode; SMT Host state unknown + meltdown: Mitigation of PTI + spec_store_bypass: Vulnerable + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Full generic retpoline STIBP: disabled RSB filling + srbds: Not affected + tsx_async_abort: Not affected

Results overview

m6i.8xlarge   m5.8xlarge    Test
2.58          1.96          ospray: San Miguel - Path Tracer
2.69          1.95          ospray: XFrog Forest - Path Tracer
7.41          5.35          ospray: NASA Streamlines - Path Tracer
500           333.33        ospray: Magnetic Reconnection - Path Tracer
19.633        13.371        svt-av1: Preset 8 - Bosphorus 4K
12.95         9.15          svt-hevc: 1 - Bosphorus 1080p
187.95        136.96        svt-hevc: 7 - Bosphorus 1080p
387.27        278.60        svt-hevc: 10 - Bosphorus 1080p
15.9234       13.0930       hpcg
1.06          0.78          oidn: RT.hdr_alb_nrm.3840x2160
1.06          0.78          oidn: RT.ldr_alb_nrm.3840x2160
0.50          0.37          oidn: RTLightmap.hdr.4096x4096
81            60            openvkl: vklBenchmark ISPC
31            24            openvkl: vklBenchmark Scalar
2478.8        1716.6        quantlib
2.676         1.817         gromacs: MPI CPU - water_GMX50_bare
440121.66     334665.99     nginx: 200
414507.60     322639.42     nginx: 500
401818.94     312237.31     nginx: 1000
162518.62     101689.84     apache: 200
162463.56     117933.04     apache: 500
169650.54     109815.03     apache: 1000
8427.7683     6183.6747     lulesh
0.91309       1.31458       namd: ATPase Simulation - 327,506 Atoms
43.29548      61.98233      pennant: sedovbig
20.34373      28.34791      pennant: leblancbig
1.72547       2.31361       onednn: IP Shapes 1D - f32 - CPU
1.83133       3.05679       onednn: IP Shapes 3D - f32 - CPU
4.34038       5.82554       onednn: IP Shapes 1D - bf16bf16bf16 - CPU
1.75314       2.71186       onednn: IP Shapes 3D - bf16bf16bf16 - CPU
2.89177       4.02131       onednn: Convolution Batch Shapes Auto - f32 - CPU
6.60245       7.11450       onednn: Deconvolution Batch shapes_1d - f32 - CPU
2.36006       3.32751       onednn: Deconvolution Batch shapes_3d - f32 - CPU
1262.00       1646.37       onednn: Recurrent Neural Network Training - f32 - CPU
683.612       891.903       onednn: Recurrent Neural Network Inference - f32 - CPU
8.37531       10.6066       onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU
11.1922       13.8517       onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU
11.5419       14.3692       onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU
0.598267      0.785254      onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU
1259.81       1643.55       onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU
681.687       892.717       onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU
1.83916       2.36659       onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
3786.262      4457.139      tnn: CPU - DenseNet
349.677       462.357       tnn: CPU - MobileNet v2
70.554        103.470       tnn: CPU - SqueezeNet v2
357.658       436.690       tnn: CPU - SqueezeNet v1.1
10.3834732    13.4503864    incompact3d: input.i3d 129 Cells Per Direction
41.3520927    54.9849396    incompact3d: input.i3d 193 Cells Per Direction
39.307        53.761        build-ffmpeg: Time To Compile
53.906        72.458        build-linux-kernel: Time To Compile
428.628       577.688       build-llvm: Unix Makefiles
253.246       332.863       build-nodejs: Time To Compile
90.463        133.741       yafaray: Total Time For Sample Scene
239.740087    298.612547    appleseed: Emily
133.278786    164.39384     appleseed: Disney Material
132.091256    166.450262    appleseed: Material Tester
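The overview mixes metrics where more is better (FPS, requests per second) with metrics where fewer is better (seconds, ms). A minimal sketch of one way to normalize a few rows into an m6i.8xlarge-over-m5.8xlarge ratio follows. The row selection and the helper names `speedup` and `geometric_mean` are illustrative assumptions, not part of the Phoronix Test Suite output; the values are copied from the per-test results in this export.

```python
from math import prod

# (test, m6i.8xlarge, m5.8xlarge, higher_is_better)
# Values copied from this export; the choice of rows is arbitrary.
ROWS = [
    ("ospray: San Miguel - Path Tracer", 2.58, 1.96, True),
    ("svt-av1: Preset 8 - Bosphorus 4K", 19.63, 13.37, True),
    ("namd: ATPase Simulation - 327,506 Atoms", 0.91309, 1.31458, False),
    ("build-linux-kernel: Time To Compile", 53.91, 72.46, False),
]


def speedup(m6i, m5, higher_is_better):
    """Return how many times faster m6i is than m5 (values > 1 favor m6i)."""
    return m6i / m5 if higher_is_better else m5 / m6i


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    ratios = []
    for name, m6i, m5, higher in ROWS:
        ratio = speedup(m6i, m5, higher)
        ratios.append(ratio)
        print(f"{name}: {ratio:.2f}x")
    print(f"geometric mean over {len(ratios)} rows: {geometric_mean(ratios):.2f}x")
```

Inverting the ratio for fewer-is-better metrics keeps every row on the same scale, so a value above 1 always favors m6i.8xlarge.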

OSPray

Demo: San Miguel - Renderer: Path Tracer

OSPray 1.8.5 (FPS, More Is Better)
m6i.8xlarge: 2.58 (SE +/- 0.00, N = 3; MIN: 2.53 / MAX: 2.6)
m5.8xlarge: 1.96 (SE +/- 0.00, N = 3; MIN: 1.94)

OSPray

Demo: XFrog Forest - Renderer: Path Tracer

OSPray 1.8.5 (FPS, More Is Better)
m6i.8xlarge: 2.69 (SE +/- 0.00, N = 3; MIN: 2.6 / MAX: 2.7)
m5.8xlarge: 1.95 (SE +/- 0.00, N = 3; MIN: 1.94 / MAX: 1.96)

OSPray

Demo: NASA Streamlines - Renderer: Path Tracer

OSPray 1.8.5 (FPS, More Is Better)
m6i.8xlarge: 7.41 (SE +/- 0.00, N = 3; MIN: 7.25 / MAX: 7.58)
m5.8xlarge: 5.35 (SE +/- 0.00, N = 3; MIN: 5.24 / MAX: 5.43)

OSPray

Demo: Magnetic Reconnection - Renderer: Path Tracer

OSPray 1.8.5 (FPS, More Is Better)
m6i.8xlarge: 500.00 (MAX: 1000)
m5.8xlarge: 333.33 (MIN: 250)
SE +/- 0.00, N = 3 (the export gives a single standard-error entry for this chart)

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 0.8.7 (Frames Per Second, More Is Better)
m6i.8xlarge: 19.63 (SE +/- 0.01, N = 3)
m5.8xlarge: 13.37 (SE +/- 0.01, N = 3)
(CXX) g++ options: -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq -pie

SVT-HEVC

Tuning: 1 - Input: Bosphorus 1080p

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
m6i.8xlarge: 12.95 (SE +/- 0.02, N = 3)
m5.8xlarge: 9.15 (SE +/- 0.02, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 7 - Input: Bosphorus 1080p

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
m6i.8xlarge: 187.95 (SE +/- 0.11, N = 3)
m5.8xlarge: 136.96 (SE +/- 0.21, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

SVT-HEVC

Tuning: 10 - Input: Bosphorus 1080p

SVT-HEVC 1.5.0 (Frames Per Second, More Is Better)
m6i.8xlarge: 387.27 (SE +/- 1.33, N = 3)
m5.8xlarge: 278.60 (SE +/- 0.62, N = 3)
(CC) gcc options: -fPIE -fPIC -O3 -O2 -pie -rdynamic -lpthread -lrt

High Performance Conjugate Gradient

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
m6i.8xlarge: 15.92 (SE +/- 0.01, N = 3)
m5.8xlarge: 13.09 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -pthread -lmpi_cxx -lmpi

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160

Intel Open Image Denoise 1.4.0 (Images / Sec, More Is Better)
m6i.8xlarge: 1.06 (SE +/- 0.00, N = 3)
m5.8xlarge: 0.78 (SE +/- 0.00, N = 3)

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160

Intel Open Image Denoise 1.4.0 (Images / Sec, More Is Better)
m6i.8xlarge: 1.06 (SE +/- 0.00, N = 3)
m5.8xlarge: 0.78 (SE +/- 0.00, N = 3)

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096

Intel Open Image Denoise 1.4.0 (Images / Sec, More Is Better)
m6i.8xlarge: 0.50 (SE +/- 0.00, N = 3)
m5.8xlarge: 0.37 (SE +/- 0.00, N = 3)

OpenVKL

Benchmark: vklBenchmark ISPC

OpenVKL 1.0 (Items / Sec, More Is Better)
m6i.8xlarge: 81 (MIN: 6 / MAX: 1934)
m5.8xlarge: 60 (MIN: 4 / MAX: 1457)

OpenVKL

Benchmark: vklBenchmark Scalar

OpenVKL 1.0 (Items / Sec, More Is Better)
m6i.8xlarge: 31 (MIN: 2 / MAX: 931)
m5.8xlarge: 24 (MIN: 2 / MAX: 667)

QuantLib

QuantLib 1.21 (MFLOPS, More Is Better)
m6i.8xlarge: 2478.8 (SE +/- 4.97, N = 3)
m5.8xlarge: 1716.6 (SE +/- 0.78, N = 3)
(CXX) g++ options: -O3 -march=native -rdynamic

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2021.2 (Ns Per Day, More Is Better)
m6i.8xlarge: 2.676 (SE +/- 0.004, N = 3)
m5.8xlarge: 1.817 (SE +/- 0.002, N = 3)
(CXX) g++ options: -O3 -pthread

nginx

Concurrent Requests: 200

nginx 1.21.1 (Requests Per Second, More Is Better)
m6i.8xlarge: 440121.66 (SE +/- 600.47, N = 3)
m5.8xlarge: 334665.99 (SE +/- 59.66, N = 3)
(CC) gcc options: -ldl -lpthread -lcrypt -lz -O3 -march=native

nginx

Concurrent Requests: 500

nginx 1.21.1 (Requests Per Second, More Is Better)
m6i.8xlarge: 414507.60 (SE +/- 2040.23, N = 3)
m5.8xlarge: 322639.42 (SE +/- 143.26, N = 3)
(CC) gcc options: -ldl -lpthread -lcrypt -lz -O3 -march=native

nginx

Concurrent Requests: 1000

nginx 1.21.1 (Requests Per Second, More Is Better)
m6i.8xlarge: 401818.94 (SE +/- 2242.18, N = 3)
m5.8xlarge: 312237.31 (SE +/- 1189.81, N = 3)
(CC) gcc options: -ldl -lpthread -lcrypt -lz -O3 -march=native

Apache HTTP Server

Concurrent Requests: 200

Apache HTTP Server 2.4.48 (Requests Per Second, More Is Better)
m6i.8xlarge: 162518.62 (SE +/- 812.36, N = 3)
m5.8xlarge: 101689.84 (SE +/- 267.71, N = 3)
(CC) gcc options: -shared -fPIC -O2 -pthread

Apache HTTP Server

Concurrent Requests: 500

Apache HTTP Server 2.4.48 (Requests Per Second, More Is Better)
m6i.8xlarge: 162463.56 (SE +/- 1515.80, N = 7)
m5.8xlarge: 117933.04 (SE +/- 155.99, N = 3)
(CC) gcc options: -shared -fPIC -O2 -pthread

Apache HTTP Server

Concurrent Requests: 1000

Apache HTTP Server 2.4.48 (Requests Per Second, More Is Better)
m6i.8xlarge: 169650.54 (SE +/- 1307.55, N = 10)
m5.8xlarge: 109815.03 (SE +/- 1046.25, N = 3)
(CC) gcc options: -shared -fPIC -O2 -pthread

LULESH

LULESH 2.0.3 (z/s, More Is Better)
m6i.8xlarge: 8427.77 (SE +/- 23.16, N = 3)
m5.8xlarge: 6183.67 (SE +/- 10.80, N = 3)
(CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

NAMD

ATPase Simulation - 327,506 Atoms

NAMD 2.14 (days/ns, Fewer Is Better)
m6i.8xlarge: 0.91309 (SE +/- 0.00097, N = 3)
m5.8xlarge: 1.31458 (SE +/- 0.00352, N = 3)

Pennant

Test: sedovbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
m6i.8xlarge: 43.30 (SE +/- 0.06, N = 3)
m5.8xlarge: 61.98 (SE +/- 0.00, N = 3)
(CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Pennant

Test: leblancbig

Pennant 1.0.1 (Hydro Cycle Time - Seconds, Fewer Is Better)
m6i.8xlarge: 20.34 (SE +/- 0.01, N = 3)
m5.8xlarge: 28.35 (SE +/- 0.02, N = 3)
(CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

oneDNN

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1.72547 (SE +/- 0.00183, N = 3; MIN: 1.52)
m5.8xlarge: 2.31361 (SE +/- 0.00021, N = 3; MIN: 2.26)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1.83133 (SE +/- 0.00113, N = 3; MIN: 1.78)
m5.8xlarge: 3.05679 (SE +/- 0.00364, N = 3; MIN: 3)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 4.34038 (SE +/- 0.00222, N = 3; MIN: 4.13)
m5.8xlarge: 5.82554 (SE +/- 0.00191, N = 3; MIN: 5.74)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1.75314 (SE +/- 0.00108, N = 3; MIN: 1.7)
m5.8xlarge: 2.71186 (SE +/- 0.00021, N = 3; MIN: 2.66)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 2.89177 (SE +/- 0.00091, N = 3; MIN: 2.82)
m5.8xlarge: 4.02131 (SE +/- 0.00415, N = 3; MIN: 3.94)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 6.60245 (SE +/- 0.01111, N = 3; MIN: 3.59)
m5.8xlarge: 7.11450 (SE +/- 0.01645, N = 3; MIN: 4.07)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 2.36006 (SE +/- 0.00094, N = 3; MIN: 2.18)
m5.8xlarge: 3.32751 (SE +/- 0.00287, N = 3; MIN: 3.29)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1262.00 (SE +/- 0.36, N = 3; MIN: 1237.58)
m5.8xlarge: 1646.37 (SE +/- 1.52, N = 3; MIN: 1641.32)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 683.61 (SE +/- 0.39, N = 3; MIN: 668.66)
m5.8xlarge: 891.90 (SE +/- 0.58, N = 3; MIN: 888.42)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 8.37531 (SE +/- 0.00038, N = 3; MIN: 8.35)
m5.8xlarge: 10.60660 (SE +/- 0.00721, N = 3; MIN: 10.39)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 11.19 (SE +/- 0.01, N = 3; MIN: 11.14)
m5.8xlarge: 13.85 (SE +/- 0.00, N = 3; MIN: 13.7)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 11.54 (SE +/- 0.00, N = 3; MIN: 11.5)
m5.8xlarge: 14.37 (SE +/- 0.02, N = 3; MIN: 14.32)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 0.598267 (SE +/- 0.000278, N = 3; MIN: 0.54)
m5.8xlarge: 0.785254 (SE +/- 0.000465, N = 3; MIN: 0.75)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1259.81 (SE +/- 0.80, N = 3; MIN: 1231.21)
m5.8xlarge: 1643.55 (SE +/- 3.08, N = 3; MIN: 1634.9)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 681.69 (SE +/- 0.16, N = 3; MIN: 667.85)
m5.8xlarge: 892.72 (SE +/- 0.44, N = 3; MIN: 889.58)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU

oneDNN 2.1.2 (ms, Fewer Is Better)
m6i.8xlarge: 1.83916 (SE +/- 0.00072, N = 3; MIN: 1.76)
m5.8xlarge: 2.36659 (SE +/- 0.00424, N = 3; MIN: 2.28)
(CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

TNN

Target: CPU - Model: DenseNet

TNN 0.3 (ms, Fewer Is Better)
m6i.8xlarge: 3786.26 (SE +/- 0.64, N = 3; MIN: 3733.1 / MAX: 3883.97)
m5.8xlarge: 4457.14 (SE +/- 3.11, N = 3; MIN: 4358.21 / MAX: 4557.14)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: MobileNet v2

TNN 0.3 (ms, Fewer Is Better)
m6i.8xlarge: 349.68 (SE +/- 0.22, N = 3; MIN: 348.47 / MAX: 351.38)
m5.8xlarge: 462.36 (SE +/- 0.09, N = 3; MIN: 460.64 / MAX: 467.45)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v2

TNN 0.3 (ms, Fewer Is Better)
m6i.8xlarge: 70.55 (SE +/- 0.08, N = 3; MIN: 70.14 / MAX: 71.88)
m5.8xlarge: 103.47 (SE +/- 0.01, N = 3; MIN: 103.34 / MAX: 103.68)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

TNN

Target: CPU - Model: SqueezeNet v1.1

TNN 0.3 (ms, Fewer Is Better)
m6i.8xlarge: 357.66 (SE +/- 0.06, N = 3; MIN: 356.91 / MAX: 361)
m5.8xlarge: 436.69 (SE +/- 0.03, N = 3; MIN: 436.06 / MAX: 437.36)
(CXX) g++ options: -fopenmp -pthread -fvisibility=hidden -fvisibility=default -O3 -rdynamic -ldl

Xcompact3d Incompact3d

Input: input.i3d 129 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
m6i.8xlarge: 10.38 (SE +/- 0.13, N = 4)
m5.8xlarge: 13.45 (SE +/- 0.07, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
m6i.8xlarge: 41.35 (SE +/- 0.35, N = 3)
m5.8xlarge: 54.98 (SE +/- 0.43, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -pthread -lmpi_usempif08 -lmpi_mpifh -lmpi

Timed FFmpeg Compilation

Time To Compile

Timed FFmpeg Compilation 4.4 (Seconds, Fewer Is Better)
m6i.8xlarge: 39.31 (SE +/- 0.15, N = 3)
m5.8xlarge: 53.76 (SE +/- 0.12, N = 3)

Timed Linux Kernel Compilation

Time To Compile

Timed Linux Kernel Compilation 5.10.20 (Seconds, Fewer Is Better)
m6i.8xlarge: 53.91 (SE +/- 0.49, N = 3)
m5.8xlarge: 72.46 (SE +/- 0.70, N = 3)

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 12.0 (Seconds, Fewer Is Better)
m6i.8xlarge: 428.63 (SE +/- 2.32, N = 3)
m5.8xlarge: 577.69 (SE +/- 3.92, N = 3)

Timed Node.js Compilation

Time To Compile

Timed Node.js Compilation 15.11 (Seconds, Fewer Is Better)
m6i.8xlarge: 253.25 (SE +/- 0.12, N = 3)
m5.8xlarge: 332.86 (SE +/- 0.40, N = 3)

YafaRay

Total Time For Sample Scene

YafaRay 3.5.1 (Seconds, Fewer Is Better)
m6i.8xlarge: 90.46 (SE +/- 1.25, N = 3)
m5.8xlarge: 133.74 (SE +/- 2.77, N = 12)
(CXX) g++ options: -std=c++11 -pthread -O3 -ffast-math -rdynamic -ldl -lImath -lIlmImf -lIex -lHalf -lz -lIlmThread -lxml2 -lfreetype

Appleseed

Scene: Emily

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
m6i.8xlarge: 239.74
m5.8xlarge: 298.61

Appleseed

Scene: Disney Material

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
m6i.8xlarge: 133.28
m5.8xlarge: 164.39

Appleseed

Scene: Material Tester

Appleseed 2.0 Beta (Seconds, Fewer Is Better)
m6i.8xlarge: 132.09
m5.8xlarge: 166.45


Phoronix Test Suite v10.8.4