10900K sysbench onednn
Intel Core i9-10900K testing with a Gigabyte Z490 AORUS MASTER (F3 BIOS) and Gigabyte AMD Radeon RX 5500/5500M / Pro 5500M 8GB on Ubuntu 20.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2103134-PTS-10900KSY85&sor&gru
System configuration (identical for result identifiers 1, 2, 3 and 4)
  Processor: Intel Core i9-10900K @ 5.30GHz (10 Cores / 20 Threads)
  Motherboard: Gigabyte Z490 AORUS MASTER (F3 BIOS)
  Chipset: Intel Comet Lake PCH
  Memory: 16GB
  Disk: Samsung SSD 970 EVO 250GB
  Graphics: Gigabyte AMD Radeon RX 5500/5500M / Pro 5500M 8GB (1900/875MHz)
  Audio: Realtek ALC1220
  Monitor: ASUS MG28U
  Network: Intel + Intel Wi-Fi 6 AX201
  OS: Ubuntu 20.10
  Kernel: 5.11.0-051100rc2daily20210106-generic (x86_64) 20210105
  Desktop: GNOME Shell 3.38.1
  Display Server: X Server 1.20.9
  OpenGL: 4.6 Mesa 20.2.1 (LLVM 11.0.0)
  Vulkan: 1.2.131
  Compiler: GCC 10.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave; CPU Microcode: 0xe0; Thermald 2.3
Security Details:
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
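The Kernel Details and Processor Details lines correspond to Linux sysfs settings. The sketch below checks the same values on another machine. It assumes the standard sysfs paths shown here. Whether the Phoronix Test Suite reads exactly these files is also an assumption, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def selected_option(sysfs_text: str) -> Optional[str]:
    """Return the bracketed (active) entry of a sysfs multi-choice file.

    For example, "always [madvise] never" -> "madvise".
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_sysfs(path: str) -> Optional[str]:
    """Read a sysfs file, returning None if it is missing or unreadable here."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    thp = read_sysfs("/sys/kernel/mm/transparent_hugepage/enabled")
    driver = read_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver")
    governor = read_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
    print("Transparent Huge Pages:", selected_option(thp) if thp else "unavailable")
    print("Scaling Governor:", driver, governor)
```

On the system in this result, the expected readings are `madvise` and `intel_pstate powersave`.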
Results overview (sysbench: More Is Better; oneDNN: ms, Fewer Is Better; every oneDNN test uses Engine: CPU)

Test                                                             Run 1     Run 2     Run 3     Run 4
sysbench: CPU (events/s)                                      26017.89  26029.63  26028.92  26019.11
sysbench: RAM / Memory (MiB/s)                                34129.11  34243.57  34315.67  34146.51
onednn: IP Shapes 1D - f32                                     3.30456   3.30545   3.30228   3.30052
onednn: IP Shapes 3D - f32                                     12.2150   12.1785   12.1638   12.1712
onednn: IP Shapes 1D - u8s8f32                                 1.17248   1.17465   1.17258   1.17424
onednn: IP Shapes 3D - u8s8f32                                 2.45003   2.42769   2.47206   2.45468
onednn: Convolution Batch Shapes Auto - f32                    21.3621   21.3339   21.3277   21.3298
onednn: Deconvolution Batch shapes_1d - f32                    6.67724   6.85465   6.62239   6.59741
onednn: Deconvolution Batch shapes_3d - f32                    4.82270   4.81353   4.82441   4.83458
onednn: Convolution Batch Shapes Auto - u8s8f32                17.9280   17.6531   17.6899   17.7791
onednn: Deconvolution Batch shapes_1d - u8s8f32                1.49138   1.49445   1.49196   1.49279
onednn: Deconvolution Batch shapes_3d - u8s8f32                3.76517   3.76150   3.77453   3.74976
onednn: Recurrent Neural Network Training - f32                2843.83   2837.76   2854.28   2818.99
onednn: Recurrent Neural Network Inference - f32               1745.36   1729.05   1742.27   1764.34
onednn: Recurrent Neural Network Training - u8s8f32            2847.38   2837.57   2818.43   2844.56
onednn: Recurrent Neural Network Inference - u8s8f32           1779.52   1732.40   1754.38   1744.70
onednn: Matrix Multiply Batch Shapes Transformer - f32         3.94807   3.92455   3.89827   3.89701
onednn: Recurrent Neural Network Training - bf16bf16bf16       2834.49   2836.92   2836.90   2832.25
onednn: Recurrent Neural Network Inference - bf16bf16bf16      1748.03   1742.26   1725.15   1749.05
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32     1.91115   1.93276   1.91319   1.93351
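Because all four runs share one configuration, the interesting question is how much they disagree. The sketch below takes the per-run values from the results overview and reports two things for each test: which run was best and the worst-to-best spread in percent. The direction flag reflects More Is Better for sysbench and Fewer Is Better for oneDNN. The dictionary and function names are hypothetical.

```python
# Per-run results (runs 1..4) copied from the results overview.
# The flag is True when higher is better (sysbench), False when lower is better (oneDNN, ms).
RESULTS = {
    "sysbench: CPU": (True, [26017.89, 26029.63, 26028.92, 26019.11]),
    "sysbench: RAM / Memory": (True, [34129.11, 34243.57, 34315.67, 34146.51]),
    "onednn: IP Shapes 1D - f32": (False, [3.30456, 3.30545, 3.30228, 3.30052]),
    "onednn: IP Shapes 3D - f32": (False, [12.2150, 12.1785, 12.1638, 12.1712]),
    "onednn: IP Shapes 1D - u8s8f32": (False, [1.17248, 1.17465, 1.17258, 1.17424]),
    "onednn: IP Shapes 3D - u8s8f32": (False, [2.45003, 2.42769, 2.47206, 2.45468]),
    "onednn: Convolution Batch Shapes Auto - f32": (False, [21.3621, 21.3339, 21.3277, 21.3298]),
    "onednn: Deconvolution Batch shapes_1d - f32": (False, [6.67724, 6.85465, 6.62239, 6.59741]),
    "onednn: Deconvolution Batch shapes_3d - f32": (False, [4.82270, 4.81353, 4.82441, 4.83458]),
    "onednn: Convolution Batch Shapes Auto - u8s8f32": (False, [17.9280, 17.6531, 17.6899, 17.7791]),
    "onednn: Deconvolution Batch shapes_1d - u8s8f32": (False, [1.49138, 1.49445, 1.49196, 1.49279]),
    "onednn: Deconvolution Batch shapes_3d - u8s8f32": (False, [3.76517, 3.76150, 3.77453, 3.74976]),
    "onednn: Recurrent Neural Network Training - f32": (False, [2843.83, 2837.76, 2854.28, 2818.99]),
    "onednn: Recurrent Neural Network Inference - f32": (False, [1745.36, 1729.05, 1742.27, 1764.34]),
    "onednn: Recurrent Neural Network Training - u8s8f32": (False, [2847.38, 2837.57, 2818.43, 2844.56]),
    "onednn: Recurrent Neural Network Inference - u8s8f32": (False, [1779.52, 1732.40, 1754.38, 1744.70]),
    "onednn: Matrix Multiply Batch Shapes Transformer - f32": (False, [3.94807, 3.92455, 3.89827, 3.89701]),
    "onednn: Recurrent Neural Network Training - bf16bf16bf16": (False, [2834.49, 2836.92, 2836.90, 2832.25]),
    "onednn: Recurrent Neural Network Inference - bf16bf16bf16": (False, [1748.03, 1742.26, 1725.15, 1749.05]),
    "onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32": (False, [1.91115, 1.93276, 1.91319, 1.93351]),
}


def summarize(higher_is_better, values):
    """Return (best run number, worst-to-best spread as a percentage of the smaller value)."""
    best = max(values) if higher_is_better else min(values)
    best_run = values.index(best) + 1  # runs are numbered from 1
    spread_pct = (max(values) - min(values)) / min(values) * 100.0
    return best_run, spread_pct


if __name__ == "__main__":
    rows = [(name, *summarize(hib, vals)) for name, (hib, vals) in RESULTS.items()]
    # Print the least consistent tests first.
    for name, best_run, spread in sorted(rows, key=lambda r: r[2], reverse=True):
        print(f"{name:<60} best: run {best_run}  spread: {spread:.2f}%")
```

Under this measure, Deconvolution Batch shapes_1d (f32) varies the most, at roughly 3.9% between its fastest and slowest run. The two sysbench tests agree to within about 0.6%.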
Sysbench 1.0.20 - Test: CPU
Events Per Second, More Is Better
  Run 2: 26029.63 (SE +/- 5.92, N = 3)
  Run 3: 26028.92 (SE +/- 1.44, N = 3)
  Run 4: 26019.11 (SE +/- 1.33, N = 3)
  Run 1: 26017.89 (SE +/- 2.59, N = 3)
  (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
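The sysbench CPU test counts one "event" as one pass of a trial-division prime search over every integer from 3 up to `--cpu-max-prime`, which defaults to 10000. The Python re-creation below follows that logic, assuming the default limit. It only illustrates the work done per event: interpreted Python is far slower than sysbench's C loop, and the thread count PTS used is not stated in this export. `cpu_event` is a hypothetical name.

```python
import math
import time


def cpu_event(max_prime=10000):
    """One sysbench-style CPU event: trial-divide every c in [3, max_prime)."""
    n = 0
    for c in range(3, max_prime):
        # sysbench compares against sqrt() of a double; for integer l,
        # l <= sqrt(c) is equivalent to l <= isqrt(c).
        t = math.isqrt(c)
        l = 2
        while l <= t:
            if c % l == 0:
                break
            l += 1
        if l > t:  # no divisor found, so c is prime
            n += 1
    return n


if __name__ == "__main__":
    start = time.perf_counter()
    events = 0
    while time.perf_counter() - start < 0.5:
        cpu_event()
        events += 1
    elapsed = time.perf_counter() - start
    print(f"single-threaded Python: ~{events / elapsed:.1f} events per second")
```

Each event finds the 1228 primes between 3 and 9999. The reported Events Per Second is therefore the number of such passes completed per second.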
Sysbench 1.0.20 - Test: RAM / Memory
MiB/sec, More Is Better
  Run 3: 34315.67 (SE +/- 108.17, N = 3)
  Run 2: 34243.57 (SE +/- 99.86, N = 3)
  Run 4: 34146.51 (SE +/- 148.70, N = 3)
  Run 1: 34129.11 (SE +/- 68.06, N = 3)
  (CC) gcc options: -pthread -O2 -funroll-loops -rdynamic -ldl -laio -lm
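A rough way to ask whether the fastest and slowest memory runs really differ is to divide the gap between their means by the combined standard error. The sketch below does this using the values and SEs from the RAM / Memory results. With only N = 3 per run, this is a heuristic rather than a proper significance test. The helper name is hypothetical.

```python
import math


def diff_in_se_units(mean_a, se_a, mean_b, se_b):
    """Gap between two means, expressed in units of their combined standard error."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# RAM / Memory: run 3 (fastest) versus run 1 (slowest).
ratio = diff_in_se_units(34315.67, 108.17, 34129.11, 68.06)
print(f"run 3 - run 1 = {ratio:.2f} combined standard errors")
```

The gap comes out to roughly 1.5 combined standard errors. That is consistent with the runs being effectively equal, given how few samples each one has.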
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 4: 3.30052 (SE +/- 0.01635, N = 3; MIN: 3.03)
  Run 3: 3.30228 (SE +/- 0.00348, N = 3; MIN: 3.02)
  Run 1: 3.30456 (SE +/- 0.00272, N = 3; MIN: 3.03)
  Run 2: 3.30545 (SE +/- 0.00232, N = 3; MIN: 3.03)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
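Each result reports a mean together with "SE +/- x, N = n". The sketch below shows the usual definition of the standard error of the mean: the sample standard deviation divided by sqrt(N). The export does not state which estimator PTS uses, so the sample (n - 1) form here is an assumption. The per-trial timings are hypothetical, because only the summary statistics are published.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean, assuming the sample (n - 1) standard deviation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-trial timings (ms); the export only reports the mean, SE and N.
trials = [3.29, 3.31, 3.30]
print(f"mean = {statistics.mean(trials):.5f} ms, "
      f"SE +/- {standard_error(trials):.5f}, N = {len(trials)}")
```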
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 3: 12.16 (SE +/- 0.01, N = 3; MIN: 12.05)
  Run 4: 12.17 (SE +/- 0.01, N = 3; MIN: 12.06)
  Run 2: 12.18 (SE +/- 0.01, N = 3; MIN: 12.08)
  Run 1: 12.22 (SE +/- 0.01, N = 3; MIN: 12.1)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 1: 1.17248 (SE +/- 0.00221, N = 3; MIN: 1.16)
  Run 3: 1.17258 (SE +/- 0.00090, N = 3; MIN: 1.16)
  Run 4: 1.17424 (SE +/- 0.00107, N = 3; MIN: 1.16)
  Run 2: 1.17465 (SE +/- 0.00194, N = 3; MIN: 1.16)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
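The IP Shapes 1D harness was run with both the f32 and the u8s8f32 data types. Dividing one time by the other gives a quick per-run speedup of the u8s8f32 variant over f32. The sketch below uses the per-run means from the two IP Shapes 1D results. It only compares the published numbers and makes no claim about which kernels oneDNN selected. The variable and function names are hypothetical.

```python
# Mean times in ms for IP Shapes 1D, runs 1..4 (from the f32 and u8s8f32 results).
F32 = [3.30456, 3.30545, 3.30228, 3.30052]
U8S8F32 = [1.17248, 1.17465, 1.17258, 1.17424]


def speedups(baseline, candidate):
    """Per-run speedup of `candidate` over `baseline` (both in time units, lower is better)."""
    return [b / c for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    for run, s in enumerate(speedups(F32, U8S8F32), start=1):
        print(f"Run {run}: u8s8f32 is {s:.2f}x faster than f32")
```

Every run lands at about 2.8x. The ratio is stable because both variants are individually consistent across runs.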
oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 2: 2.42769 (SE +/- 0.01028, N = 3; MIN: 2.31)
  Run 1: 2.45003 (SE +/- 0.03980, N = 3; MIN: 2.34)
  Run 4: 2.45468 (SE +/- 0.01102, N = 3; MIN: 2.31)
  Run 3: 2.47206 (SE +/- 0.01612, N = 3; MIN: 2.33)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 3: 21.33 (SE +/- 0.01, N = 3; MIN: 21.25)
  Run 4: 21.33 (SE +/- 0.02, N = 3; MIN: 21.24)
  Run 2: 21.33 (SE +/- 0.02, N = 3; MIN: 21.24)
  Run 1: 21.36 (SE +/- 0.01, N = 3; MIN: 21.29)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 4: 6.59741 (SE +/- 0.11275, N = 15; MIN: 3.47)
  Run 3: 6.62239 (SE +/- 0.09531, N = 15; MIN: 3.47)
  Run 1: 6.67724 (SE +/- 0.09219, N = 15; MIN: 3.45)
  Run 2: 6.85465 (SE +/- 0.13332, N = 15; MIN: 3.45)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
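This is the only result with N = 15 instead of 3. Its MIN values (about 3.45 ms) also sit far below the means (about 6.6 ms), which points to noisy trials. The larger N suggests extra trials were run, although the export does not say why. Assuming SE = s / sqrt(N), the per-trial standard deviation can be recovered as SE * sqrt(N). The sketch below applies that to each run. The function name is hypothetical.

```python
import math


def stdev_from_se(se, n):
    """Recover the standard deviation from a standard error, assuming SE = s / sqrt(N)."""
    return se * math.sqrt(n)


# Deconvolution Batch shapes_1d, f32: (run, mean ms, SE), with N = 15 for every run.
ROWS = [
    (4, 6.59741, 0.11275),
    (3, 6.62239, 0.09531),
    (1, 6.67724, 0.09219),
    (2, 6.85465, 0.13332),
]

if __name__ == "__main__":
    for run, mean, se in ROWS:
        sd = stdev_from_se(se, 15)
        print(f"Run {run}: sd = {sd:.3f} ms ({100 * sd / mean:.1f}% of the mean)")
```

Under that assumption, the per-trial spread is roughly 5 to 8% of the mean. Most other oneDNN results stay well under 1%.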
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 2: 4.81353 (SE +/- 0.00575, N = 3; MIN: 4.71)
  Run 1: 4.82270 (SE +/- 0.00884, N = 3; MIN: 4.79)
  Run 3: 4.82441 (SE +/- 0.00815, N = 3; MIN: 4.75)
  Run 4: 4.83458 (SE +/- 0.00473, N = 3; MIN: 4.73)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 2: 17.65 (SE +/- 0.19, N = 3; MIN: 17.18)
  Run 3: 17.69 (SE +/- 0.13, N = 3; MIN: 17.18)
  Run 4: 17.78 (SE +/- 0.08, N = 3; MIN: 17.16)
  Run 1: 17.93 (SE +/- 0.07, N = 3; MIN: 17.28)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 1: 1.49138 (SE +/- 0.00072, N = 3; MIN: 1.48)
  Run 3: 1.49196 (SE +/- 0.00095, N = 3; MIN: 1.48)
  Run 4: 1.49279 (SE +/- 0.00086, N = 3; MIN: 1.48)
  Run 2: 1.49445 (SE +/- 0.00187, N = 3; MIN: 1.48)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 4: 3.74976 (SE +/- 0.00653, N = 3; MIN: 3.72)
  Run 2: 3.76150 (SE +/- 0.00158, N = 3; MIN: 3.74)
  Run 1: 3.76517 (SE +/- 0.00296, N = 3; MIN: 3.74)
  Run 3: 3.77453 (SE +/- 0.00173, N = 3; MIN: 3.75)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 4: 2818.99 (SE +/- 16.37, N = 3; MIN: 2765.73)
  Run 2: 2837.76 (SE +/- 5.58, N = 3; MIN: 2798.07)
  Run 1: 2843.83 (SE +/- 10.02, N = 3; MIN: 2768.24)
  Run 3: 2854.28 (SE +/- 23.35, N = 3; MIN: 2775.53)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 2: 1729.05 (SE +/- 21.18, N = 3; MIN: 1683.97)
  Run 3: 1742.27 (SE +/- 21.30, N = 3; MIN: 1683.83)
  Run 1: 1745.36 (SE +/- 15.68, N = 3; MIN: 1705.83)
  Run 4: 1764.34 (SE +/- 6.63, N = 3; MIN: 1689.05)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 3: 2818.43 (SE +/- 6.65, N = 3; MIN: 2766.26)
  Run 2: 2837.57 (SE +/- 15.45, N = 3; MIN: 2767.4)
  Run 4: 2844.56 (SE +/- 5.94, N = 3; MIN: 2768.09)
  Run 1: 2847.38 (SE +/- 3.12, N = 3; MIN: 2765.94)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 2: 1732.40 (SE +/- 13.35, N = 3; MIN: 1680.02)
  Run 4: 1744.70 (SE +/- 6.63, N = 3; MIN: 1681.33)
  Run 3: 1754.38 (SE +/- 19.63, N = 3; MIN: 1691.38)
  Run 1: 1779.52 (SE +/- 8.77, N = 3; MIN: 1736.71)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
  Run 4: 3.89701 (SE +/- 0.00827, N = 3; MIN: 3.72)
  Run 3: 3.89827 (SE +/- 0.00147, N = 3; MIN: 3.76)
  Run 2: 3.92455 (SE +/- 0.00768, N = 3; MIN: 3.77)
  Run 1: 3.94807 (SE +/- 0.01417, N = 3; MIN: 3.79)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  Run 4: 2832.25 (SE +/- 11.43, N = 3; MIN: 2770.76)
  Run 1: 2834.49 (SE +/- 12.24, N = 3; MIN: 2770.1)
  Run 3: 2836.90 (SE +/- 6.62, N = 3; MIN: 2764.69)
  Run 2: 2836.92 (SE +/- 0.84, N = 3; MIN: 2770.47)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
  Run 3: 1725.15 (SE +/- 2.77, N = 3; MIN: 1683.55)
  Run 2: 1742.26 (SE +/- 26.98, N = 3; MIN: 1671.61)
  Run 1: 1748.03 (SE +/- 14.77, N = 3; MIN: 1687.97)
  Run 4: 1749.05 (SE +/- 27.03, N = 3; MIN: 1667.38)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
  Run 1: 1.91115 (SE +/- 0.00175, N = 3; MIN: 1.88)
  Run 3: 1.91319 (SE +/- 0.00284, N = 3; MIN: 1.88)
  Run 2: 1.93276 (SE +/- 0.01672, N = 3; MIN: 1.88)
  Run 4: 1.93351 (SE +/- 0.01806, N = 3; MIN: 1.88)
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
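This export has no single overall oneDNN score. A geometric mean across the 18 oneDNN timings is one common way to summarize tests whose scales differ by three orders of magnitude, from about 1 ms up to about 2800 ms. The sketch below computes one per run from the results overview. The choice of geometric mean is this sketch's assumption, not a figure from the result page, and the names are hypothetical.

```python
import math

# oneDNN mean times (ms) per run, in results-overview order (18 tests each).
ONEDNN = {
    1: [3.30456, 12.2150, 1.17248, 2.45003, 21.3621, 6.67724, 4.82270, 17.9280, 1.49138,
        3.76517, 2843.83, 1745.36, 2847.38, 1779.52, 3.94807, 2834.49, 1748.03, 1.91115],
    2: [3.30545, 12.1785, 1.17465, 2.42769, 21.3339, 6.85465, 4.81353, 17.6531, 1.49445,
        3.76150, 2837.76, 1729.05, 2837.57, 1732.40, 3.92455, 2836.92, 1742.26, 1.93276],
    3: [3.30228, 12.1638, 1.17258, 2.47206, 21.3277, 6.62239, 4.82441, 17.6899, 1.49196,
        3.77453, 2854.28, 1742.27, 2818.43, 1754.38, 3.89827, 2836.90, 1725.15, 1.91319],
    4: [3.30052, 12.1712, 1.17424, 2.45468, 21.3298, 6.59741, 4.83458, 17.7791, 1.49279,
        3.74976, 2818.99, 1764.34, 2844.56, 1744.70, 3.89701, 2832.25, 1749.05, 1.93351],
}


def geomean(values):
    """Geometric mean via the mean of logarithms (all values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    gms = {run: geomean(times) for run, times in ONEDNN.items()}
    best = min(gms.values())  # lower time is better for oneDNN
    for run, gm in sorted(gms.items()):
        print(f"Run {run}: geometric mean {gm:.4f} ms ({100 * (gm / best - 1):.2f}% above best)")
```

A geometric mean weights each test's relative change equally. With an arithmetic mean, the RNN tests would dominate the summary simply because their times are the largest.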
Phoronix Test Suite v10.8.5