onednn onnx alderlake
Intel Core i9-12900K testing with an ASUS ROG STRIX Z690-E GAMING WIFI (1003 BIOS) and Gigabyte AMD Radeon RX 6800 XT 16GB on Ubuntu 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2203319-NE-ONEDNNONN20&gru&sro .
onednn onnx alderlake
Runs: A, B, C, D

Processor: Intel Core i9-12900K @ 5.20GHz (16 Cores / 24 Threads)
Motherboard: ASUS ROG STRIX Z690-E GAMING WIFI (1003 BIOS)
Chipset: Intel Device 7aa7
Memory: 32GB
Disk: 1000GB Western Digital WDS100T1X0E-00AFY0 + 2000GB
Graphics: Gigabyte AMD Radeon RX 6800 XT 16GB (2575/1000MHz)
Audio: Intel Device 7ad0
Monitor: ASUS VP28U
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 21.10
Kernel: 5.17.0-phx (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server 1.20.13 + Wayland
OpenGL: 4.6 Mesa 22.1.0-devel (git-ae710f3 2022-03-26 impish-oibaf-ppa) (LLVM 13.0.1 DRM 3.46)
Vulkan: 1.3.207
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x18 - Thermald 2.4.6
Python Details - Python 3.9.7
Security Details - itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected
onednn onnx alderlake: result overview

Test                                                                   A         B         C         D
onnx: GPT-2 - CPU - Parallel                                           7989      8110      8046      8095
onnx: GPT-2 - CPU - Standard                                           11035     11077     11079     11123
onnx: yolov4 - CPU - Parallel                                          628       632       631       633
onnx: yolov4 - CPU - Standard                                          670       664       666       669
onnx: bertsquad-12 - CPU - Parallel                                    925       937       933       964
onnx: bertsquad-12 - CPU - Standard                                    931       916       918       926
onnx: fcn-resnet101-11 - CPU - Parallel                                111       111       111       111
onnx: fcn-resnet101-11 - CPU - Standard                                96        95        95        95
onnx: ArcFace ResNet-100 - CPU - Parallel                              363       364       362       364
onnx: ArcFace ResNet-100 - CPU - Standard                              1908      1913      1896      1893
onnx: super-resolution-10 - CPU - Parallel                             4518      4504      4444      4506
onnx: super-resolution-10 - CPU - Standard                             5332      5349      5328      5384
onednn: IP Shapes 1D - f32 - CPU                                       2.63724   2.65135   2.63180   2.63287
onednn: IP Shapes 3D - f32 - CPU                                       3.91213   3.81887   3.88487   3.88717
onednn: IP Shapes 1D - u8s8f32 - CPU                                   1.08279   1.05578   1.05754   1.05923
onednn: IP Shapes 3D - u8s8f32 - CPU                                   0.882082  0.851627  0.881799  0.885057
onednn: Convolution Batch Shapes Auto - f32 - CPU                      5.90865   5.90320   5.90578   5.90930
onednn: Deconvolution Batch shapes_1d - f32 - CPU                      8.27680   8.38411   8.21415   8.05084
onednn: Deconvolution Batch shapes_3d - f32 - CPU                      5.24536   5.24601   5.24648   5.24826
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU                  6.05436   6.04615   6.05662   6.04798
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU                  1.35376   1.35332   1.34061   1.34296
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU                  2.21737   2.22008   2.22075   2.21930
onednn: Recurrent Neural Network Training - f32 - CPU                  2883.12   2875.93   2881.76   2879.92
onednn: Recurrent Neural Network Inference - f32 - CPU                 1614.54   1611.80   1612.97   1614.78
onednn: Recurrent Neural Network Training - u8s8f32 - CPU              2881.96   2885.96   2882.62   2880.35
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU             1617.55   1612.80   1611.27   1614.27
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU           1.24567   1.25510   1.28432   1.31081
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU         2884.94   2881.54   2879.53   2882.22
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU        1615.08   1630.53   1618.78   1616.11
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU       1.033654  0.826339  0.842998  1.011141
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU
ONNX Runtime 1.11
Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 7989 (SE +/- 58.12, N = 3)
B: 8110 (SE +/- 47.85, N = 3)
C: 8046 (SE +/- 40.10, N = 3)
D: 8095 (SE +/- 24.57, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 11035 (SE +/- 13.90, N = 3)
B: 11077 (SE +/- 11.10, N = 3)
C: 11079 (SE +/- 13.33, N = 3)
D: 11123 (SE +/- 41.07, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 628 (SE +/- 2.75, N = 3)
B: 632 (SE +/- 2.03, N = 3)
C: 631 (SE +/- 2.05, N = 3)
D: 633 (SE +/- 1.17, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 670 (SE +/- 2.33, N = 3)
B: 664 (SE +/- 4.18, N = 3)
C: 666 (SE +/- 2.74, N = 3)
D: 669 (SE +/- 3.53, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 925 (SE +/- 8.08, N = 3)
B: 937 (SE +/- 10.12, N = 5)
C: 933 (SE +/- 12.55, N = 3)
D: 964 (SE +/- 6.62, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 931 (SE +/- 5.01, N = 3)
B: 916 (SE +/- 3.92, N = 3)
C: 918 (SE +/- 9.18, N = 3)
D: 926 (SE +/- 2.60, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 111 (SE +/- 0.60, N = 3)
B: 111 (SE +/- 0.29, N = 3)
C: 111 (SE +/- 0.17, N = 3)
D: 111 (SE +/- 0.17, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 96 (SE +/- 0.33, N = 3)
B: 95 (SE +/- 0.33, N = 3)
C: 95 (SE +/- 0.33, N = 3)
D: 95 (SE +/- 1.00, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 363 (SE +/- 0.17, N = 3)
B: 364 (SE +/- 0.50, N = 3)
C: 362 (SE +/- 0.17, N = 3)
D: 364 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 1908 (SE +/- 5.36, N = 3)
B: 1913 (SE +/- 2.93, N = 3)
C: 1896 (SE +/- 11.79, N = 3)
D: 1893 (SE +/- 4.07, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Minute, More Is Better
A: 4518 (SE +/- 53.96, N = 3)
B: 4504 (SE +/- 63.24, N = 3)
C: 4444 (SE +/- 28.86, N = 3)
D: 4506 (SE +/- 57.47, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
A: 5332 (SE +/- 20.95, N = 3)
B: 5349 (SE +/- 31.64, N = 3)
C: 5328 (SE +/- 28.50, N = 3)
D: 5384 (SE +/- 52.89, N = 3)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

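The ONNX Runtime results report each model under both the Parallel and the Standard executor, so the two can be compared directly. The following is a minimal sketch, not part of the exported result file, that computes the Standard-to-Parallel throughput ratio for run A. The dictionary layout and the helper name `standard_over_parallel` are illustrative assumptions; the numbers are copied from the run A results reported here.

```python
# Inferences per minute for run A, copied from the ONNX Runtime results.
# The dict layout and helper name are illustrative, not part of the export.
run_a = {
    "GPT-2": {"Parallel": 7989, "Standard": 11035},
    "yolov4": {"Parallel": 628, "Standard": 670},
    "bertsquad-12": {"Parallel": 925, "Standard": 931},
    "fcn-resnet101-11": {"Parallel": 111, "Standard": 96},
    "ArcFace ResNet-100": {"Parallel": 363, "Standard": 1908},
    "super-resolution-10": {"Parallel": 4518, "Standard": 5332},
}


def standard_over_parallel(scores):
    """Throughput of the Standard executor relative to the Parallel one."""
    return scores["Standard"] / scores["Parallel"]


ratios = {model: standard_over_parallel(s) for model, s in run_a.items()}

# Print models from the largest Standard advantage to the smallest.
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<22} {ratio:.2f}x")
```

A ratio above 1 means the Standard executor delivered more inferences per minute than the Parallel executor for that model on this system; a ratio below 1 means the opposite.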
oneDNN 2.6
Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 2.63724 (SE +/- 0.00542, N = 3, MIN: 2.5)
B: 2.65135 (SE +/- 0.01495, N = 3, MIN: 2.51)
C: 2.63180 (SE +/- 0.00440, N = 3, MIN: 2.49)
D: 2.63287 (SE +/- 0.00281, N = 3, MIN: 2.51)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 3.91213 (SE +/- 0.00073, N = 3, MIN: 3.89)
B: 3.81887 (SE +/- 0.00339, N = 3, MIN: 3.78)
C: 3.88487 (SE +/- 0.00218, N = 3, MIN: 3.86)
D: 3.88717 (SE +/- 0.00582, N = 3, MIN: 3.85)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 1.08279 (SE +/- 0.01605, N = 15, MIN: 1.01)
B: 1.05578 (SE +/- 0.00867, N = 3, MIN: 1.01)
C: 1.05754 (SE +/- 0.00884, N = 3, MIN: 1.01)
D: 1.05923 (SE +/- 0.00217, N = 3, MIN: 1.02)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 0.882082 (SE +/- 0.000523, N = 3, MIN: 0.87)
B: 0.851627 (SE +/- 0.003743, N = 3, MIN: 0.83)
C: 0.881799 (SE +/- 0.002403, N = 3, MIN: 0.86)
D: 0.885057 (SE +/- 0.006655, N = 3, MIN: 0.86)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 5.90865 (SE +/- 0.00465, N = 3, MIN: 5.79)
B: 5.90320 (SE +/- 0.00395, N = 3, MIN: 5.83)
C: 5.90578 (SE +/- 0.00383, N = 3, MIN: 5.82)
D: 5.90930 (SE +/- 0.00197, N = 3, MIN: 5.75)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 8.27680 (SE +/- 0.10247, N = 15, MIN: 4.02)
B: 8.38411 (SE +/- 0.11946, N = 14, MIN: 3.99)
C: 8.21415 (SE +/- 0.09831, N = 15, MIN: 4.09)
D: 8.05084 (SE +/- 0.05693, N = 3, MIN: 4.24)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 5.24536 (SE +/- 0.00223, N = 3, MIN: 5.18)
B: 5.24601 (SE +/- 0.00260, N = 3, MIN: 5.17)
C: 5.24648 (SE +/- 0.00169, N = 3, MIN: 5.19)
D: 5.24826 (SE +/- 0.00317, N = 3, MIN: 5.17)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 6.05436 (SE +/- 0.00353, N = 3, MIN: 5.99)
B: 6.04615 (SE +/- 0.00195, N = 3, MIN: 5.97)
C: 6.05662 (SE +/- 0.00242, N = 3, MIN: 5.9)
D: 6.04798 (SE +/- 0.00248, N = 3, MIN: 5.95)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 1.35376 (SE +/- 0.01690, N = 3, MIN: 1.28)
B: 1.35332 (SE +/- 0.01445, N = 3, MIN: 1.28)
C: 1.34061 (SE +/- 0.00457, N = 3, MIN: 1.28)
D: 1.34296 (SE +/- 0.00783, N = 3, MIN: 1.28)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 2.21737 (SE +/- 0.00202, N = 3, MIN: 2.2)
B: 2.22008 (SE +/- 0.00095, N = 3, MIN: 2.19)
C: 2.22075 (SE +/- 0.00145, N = 3, MIN: 2.2)
D: 2.21930 (SE +/- 0.00390, N = 3, MIN: 2.19)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 2883.12 (SE +/- 3.43, N = 3, MIN: 2869.72)
B: 2875.93 (SE +/- 3.97, N = 3, MIN: 2865.18)
C: 2881.76 (SE +/- 2.26, N = 3, MIN: 2866.35)
D: 2879.92 (SE +/- 3.46, N = 3, MIN: 2866.64)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 1614.54 (SE +/- 2.06, N = 3, MIN: 1608.19)
B: 1611.80 (SE +/- 2.11, N = 3, MIN: 1603.03)
C: 1612.97 (SE +/- 0.75, N = 3, MIN: 1606.37)
D: 1614.78 (SE +/- 1.49, N = 3, MIN: 1607.96)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 2881.96 (SE +/- 6.09, N = 3, MIN: 2865.7)
B: 2885.96 (SE +/- 2.20, N = 3, MIN: 2873.23)
C: 2882.62 (SE +/- 1.28, N = 3, MIN: 2872.99)
D: 2880.35 (SE +/- 2.59, N = 3, MIN: 2869.74)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 1617.55 (SE +/- 5.04, N = 3, MIN: 1606.77)
B: 1612.80 (SE +/- 1.77, N = 3, MIN: 1604.84)
C: 1611.27 (SE +/- 1.60, N = 3, MIN: 1602.83)
D: 1614.27 (SE +/- 0.67, N = 3, MIN: 1608.32)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
A: 1.24567 (SE +/- 0.01358, N = 3, MIN: 1.17)
B: 1.25510 (SE +/- 0.01551, N = 3, MIN: 1.17)
C: 1.28432 (SE +/- 0.01243, N = 3, MIN: 1.19)
D: 1.31081 (SE +/- 0.02113, N = 15, MIN: 1.17)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
A: 2884.94 (SE +/- 4.56, N = 3, MIN: 2874.53)
B: 2881.54 (SE +/- 4.30, N = 3, MIN: 2866.57)
C: 2879.53 (SE +/- 2.50, N = 3, MIN: 2866.7)
D: 2882.22 (SE +/- 0.97, N = 3, MIN: 2871.87)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
A: 1615.08 (SE +/- 3.13, N = 3, MIN: 1605.12)
B: 1630.53 (SE +/- 22.13, N = 3, MIN: 1601.77)
C: 1618.78 (SE +/- 3.52, N = 3, MIN: 1609.4)
D: 1616.11 (SE +/- 2.17, N = 3, MIN: 1608.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

oneDNN 2.6
Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
ms, Fewer Is Better
A: 1.033654 (SE +/- 0.086658, N = 12, MIN: 0.71)
B: 0.826339 (SE +/- 0.001870, N = 3, MIN: 0.71)
C: 0.842998 (SE +/- 0.006474, N = 3, MIN: 0.72)
D: 1.011141 (SE +/- 0.091776, N = 12, MIN: 0.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

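Each result is reported with a standard error (SE) and a sample count (N), which gives a rough way to judge whether two runs really differ. The sketch below is an illustration rather than something the Phoronix Test Suite produces: it takes the means and SE values of the u8s8f32 Matrix Multiply Batch Shapes Transformer test and expresses the SE as a percentage of the mean, plus the gap between two runs in units of their combined SE. The helper names `relative_se` and `separation` are hypothetical, and treating the runs as independent samples is an assumption.

```python
import math

# (mean in ms, standard error) per run, copied from the u8s8f32
# Matrix Multiply Batch Shapes Transformer results.
results = {
    "A": (1.033654, 0.086658),
    "B": (0.826339, 0.001870),
    "C": (0.842998, 0.006474),
    "D": (1.011141, 0.091776),
}


def relative_se(mean, se):
    """Standard error expressed as a percentage of the mean."""
    return 100.0 * se / mean


def separation(a, b):
    """Absolute difference of two means in units of their combined SE.

    Assumes the two runs are independent, so the SEs add in quadrature.
    """
    (mean_a, se_a), (mean_b, se_b) = a, b
    return abs(mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


for run, (mean, se) in results.items():
    print(f"{run}: {mean:.6f} ms, SE = {relative_se(mean, se):.1f}% of mean")

print(f"A vs B: {separation(results['A'], results['B']):.2f} combined SE apart")
print(f"B vs C: {separation(results['B'], results['C']):.2f} combined SE apart")
```

Runs A and D carry a much larger relative SE than B and C in this test, so a gap between A and B should be read with more caution than the raw means alone suggest.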
Phoronix Test Suite v10.8.5