2 x Intel Xeon Gold 5220R testing with a TYAN S7106 (V2.01.B40 BIOS) and ASPEED on Ubuntu 20.04 via the Phoronix Test Suite.
Compare your own system(s) to this result file with the
Phoronix Test Suite by running the command:
phoronix-test-suite benchmark 2212200-NE-XEONRGOLD29 xeon r gold onednn 3.0 - Phoronix Test Suite xeon r gold onednn 3.0 2 x Intel Xeon Gold 5220R testing with a TYAN S7106 (V2.01.B40 BIOS) and ASPEED on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2212200-NE-XEONRGOLD29&grs&rdt .
xeon r gold onednn 3.0 Processor Motherboard Chipset Memory Disk Graphics Monitor Network OS Kernel Desktop Display Server Compiler File-System Screen Resolution a b c 2 x Intel Xeon Gold 5220R @ 3.90GHz (36 Cores / 72 Threads) TYAN S7106 (V2.01.B40 BIOS) Intel Sky Lake-E DMI3 Registers 94GB 500GB Samsung SSD 860 ASPEED VE228 2 x Intel I210 + 2 x QLogic cLOM8214 1/10GbE Ubuntu 20.04 6.1.0-phx (x86_64) GNOME Shell 3.36.9 X Server 1.20.13 GCC 9.4.0 ext4 1920x1080 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x5003302 Security Details - itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Mitigation of Clear buffers; SMT vulnerable + retbleed: Mitigation of Stuffing + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: conditional RSB filling PBRSB-eIBRS: SW sequence + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
xeon r gold onednn 3.0 onednn: IP Shapes 1D - u8s8f32 - CPU onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 1D - bf16bf16bf16 - CPU onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU onednn: IP Shapes 3D - bf16bf16bf16 - CPU onednn: IP Shapes 3D - u8s8f32 - CPU onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU onednn: Recurrent Neural Network Training - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - u8s8f32 - CPU onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onednn: Convolution Batch Shapes Auto - f32 - CPU onednn: IP Shapes 1D - f32 - CPU onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU onednn: Recurrent Neural Network Inference - f32 - CPU onednn: Recurrent Neural Network Training - f32 - CPU onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU a b c 1.73090 3.81228 5.78540 2.34752 0.215615 9.59277 8.14539 3.11322 1.29656 0.517990 9.21597 0.583986 1409.74 796.627 802.635 7.59329 1.75250 7.07070 0.696550 2.76971 1400.41 800.124 1400.64 6.41676 1.67674 3.77586 5.80604 2.41696 0.221767 9.61333 7.93565 3.0436 1.27572 0.510599 9.06402 0.581283 1392.36 805.569 802.953 7.58018 1.74919 7.10229 0.693376 2.76176 1399 799.425 1397.35 6.42421 1.80903 3.94714 5.97953 2.35576 0.219184 9.86147 8.11243 3.06163 1.26769 0.507931 9.14855 0.576626 1401.03 798.111 811.321 7.51633 1.76469 7.04447 0.695969 2.75773 1404.71 797.837 1397.94 6.43035 OpenBenchmarking.org
oneDNN Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU a b c 0.407 0.814 1.221 1.628 2.035 SE +/- 0.01351, N = 15 1.73090 1.67674 1.80903 MIN: 1.48 MIN: 1.51 MIN: 1.63 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU a b c 0.8881 1.7762 2.6643 3.5524 4.4405 SE +/- 0.00890, N = 3 3.81228 3.77586 3.94714 MIN: 3.69 MIN: 3.68 MIN: 3.68 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU a b c 1.3454 2.6908 4.0362 5.3816 6.727 SE +/- 0.01411, N = 3 5.78540 5.80604 5.97953 MIN: 5.54 MIN: 5.57 MIN: 5.57 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU a b c 0.5438 1.0876 1.6314 2.1752 2.719 SE +/- 0.02805, N = 3 2.34752 2.41696 2.35576 MIN: 2.04 MIN: 2.12 MIN: 2.03 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU a b c 0.0499 0.0998 0.1497 0.1996 0.2495 SE +/- 0.002809, N = 3 0.215615 0.221767 0.219184 MIN: 0.19 MIN: 0.19 MIN: 0.19 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU a b c 3 6 9 12 15 SE +/- 0.01211, N = 3 9.59277 9.61333 9.86147 MIN: 9.46 MIN: 9.47 MIN: 9.38 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU a b c 2 4 6 8 10 SE +/- 0.01873, N = 3 8.14539 7.93565 8.11243 MIN: 6.65 MIN: 6.58 MIN: 6.66 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU a b c 0.7005 1.401 2.1015 2.802 3.5025 SE +/- 0.02930, N = 3 3.11322 3.04360 3.06163 MIN: 2.94 MIN: 2.93 MIN: 2.96 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU a b c 0.2917 0.5834 0.8751 1.1668 1.4585 SE +/- 0.01585, N = 3 1.29656 1.27572 1.26769 MIN: 1.16 MIN: 1.16 MIN: 1.15 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU a b c 0.1165 0.233 0.3495 0.466 0.5825 SE +/- 0.006679, N = 3 0.517990 0.510599 0.507931 MIN: 0.48 MIN: 0.48 MIN: 0.48 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU a b c 3 6 9 12 15 SE +/- 0.03595, N = 3 9.21597 9.06402 9.14855 MIN: 8.85 MIN: 8.87 MIN: 8.86 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU a b c 0.1314 0.2628 0.3942 0.5256 0.657 SE +/- 0.003133, N = 3 0.583986 0.581283 0.576626 MIN: 0.54 MIN: 0.55 MIN: 0.54 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU a b c 300 600 900 1200 1500 SE +/- 14.53, N = 3 1409.74 1392.36 1401.03 MIN: 1368.85 MIN: 1371.11 MIN: 1372.31 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU a b c 200 400 600 800 1000 SE +/- 1.64, N = 3 796.63 805.57 798.11 MIN: 775.72 MIN: 777.68 MIN: 775.8 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU a b c 200 400 600 800 1000 SE +/- 2.09, N = 3 802.64 802.95 811.32 MIN: 775.47 MIN: 773.52 MIN: 778.11 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU a b c 2 4 6 8 10 SE +/- 0.02200, N = 3 7.59329 7.58018 7.51633 MIN: 7.35 MIN: 7.38 MIN: 7.37 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU a b c 0.3971 0.7942 1.1913 1.5884 1.9855 SE +/- 0.00424, N = 3 1.75250 1.74919 1.76469 MIN: 1.65 MIN: 1.64 MIN: 1.66 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU a b c 2 4 6 8 10 SE +/- 0.02943, N = 3 7.07070 7.10229 7.04447 MIN: 6.85 MIN: 6.86 MIN: 6.86 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU a b c 0.1567 0.3134 0.4701 0.6268 0.7835 SE +/- 0.002022, N = 3 0.696550 0.693376 0.695969 MIN: 0.68 MIN: 0.68 MIN: 0.68 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU a b c 0.6232 1.2464 1.8696 2.4928 3.116 SE +/- 0.00721, N = 3 2.76971 2.76176 2.75773 MIN: 2.71 MIN: 2.71 MIN: 2.71 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU a b c 300 600 900 1200 1500 SE +/- 4.30, N = 3 1400.41 1399.00 1404.71 MIN: 1369.82 MIN: 1359.32 MIN: 1369.91 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU a b c 200 400 600 800 1000 SE +/- 2.50, N = 3 800.12 799.43 797.84 MIN: 779.81 MIN: 776 MIN: 776.86 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU a b c 300 600 900 1200 1500 SE +/- 2.56, N = 3 1400.64 1397.35 1397.94 MIN: 1370.89 MIN: 1370.32 MIN: 1370.93 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.0 Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU a b c 2 4 6 8 10 SE +/- 0.00754, N = 3 6.41676 6.42421 6.43035 MIN: 6.32 MIN: 6.32 MIN: 6.32 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Phoronix Test Suite v10.8.4