onednn onnx threadripper AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS) and AMD Radeon RX 5700 8GB on Pop 21.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2203314-PTS-ONEDNNON39
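A run like this can be repeated or compared against locally with the Phoronix Test Suite. A minimal sketch follows; the profile names pts/onednn and pts/onnx are an assumption, as the export does not state which profiles were installed:

```shell
# Sketch: reproduce this style of run with the Phoronix Test Suite.
# Profile names (pts/onednn, pts/onnx) are assumed, not taken from this export.
phoronix-test-suite benchmark pts/onednn pts/onnx

# Alternatively, benchmark against the published result ID, which fetches
# the same test selection for a side-by-side comparison:
phoronix-test-suite benchmark 2203314-PTS-ONEDNNON39
```

Either invocation prompts interactively for test options and uploads or saves results when the run completes.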
System Details (shared by configurations A, B, C, and D):

Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F4p BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: DELL P2415Q
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Pop 21.10
Kernel: 5.17.0-rc1-sched-core-phx (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.1)
Vulkan: 1.2.182
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-ZPT0kp/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301039
Python Details: Python 3.9.7
Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Full AMD retpoline IBPB: conditional STIBP: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
Results Summary (ONNX Runtime results in Inferences Per Minute, more is better; oneDNN results in ms, fewer is better):

Test                                                                    A          B          C          D
onnx: GPT-2 - CPU - Parallel                                         3461       3512       3529       3495
onnx: GPT-2 - CPU - Standard                                         4219       4710       4823       4441
onnx: yolov4 - CPU - Parallel                                         361        362        362        361
onnx: yolov4 - CPU - Standard                                         293        293        295        300
onnx: bertsquad-12 - CPU - Parallel                                   424        425        432        421
onnx: bertsquad-12 - CPU - Standard                                   531        647        646        642
onnx: fcn-resnet101-11 - CPU - Parallel                                82         80         81         81
onnx: fcn-resnet101-11 - CPU - Standard                               153        156        153        157
onnx: ArcFace ResNet-100 - CPU - Parallel                            1088       1072       1079       1079
onnx: ArcFace ResNet-100 - CPU - Standard                             995       1010       1017        991
onnx: super-resolution-10 - CPU - Parallel                           3815       3780       3784       3731
onnx: super-resolution-10 - CPU - Standard                           7323       6401       7560       7375
onednn: IP Shapes 1D - f32 - CPU                                   2.00953    1.96420    1.99383    1.91681
onednn: IP Shapes 3D - f32 - CPU                                   5.54387    6.27072    6.28663    6.40806
onednn: IP Shapes 1D - u8s8f32 - CPU                               2.18433    2.35161    2.37403    2.42176
onednn: IP Shapes 3D - u8s8f32 - CPU                               1.13774    1.11774    1.11927    1.10705
onednn: Convolution Batch Shapes Auto - f32 - CPU                  0.941266   0.910056   0.904110   0.928404
onednn: Deconvolution Batch shapes_1d - f32 - CPU                  6.68005    6.82166    6.85039    6.90607
onednn: Deconvolution Batch shapes_3d - f32 - CPU                  2.10617    2.11025    2.11212    2.11146
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU              6.39430    6.43330    6.44689    6.44330
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU              1.52619    1.49871    1.56000    1.45511
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU              0.979135   0.992713   0.987020   0.984819
onednn: Recurrent Neural Network Training - f32 - CPU              4959.96    5003.99    4964.59    4882.28
onednn: Recurrent Neural Network Inference - f32 - CPU             1260.20    1251.24    1221.12    1211.44
onednn: Recurrent Neural Network Training - u8s8f32 - CPU          4954.34    5034.83    5003.38    4950.97
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU         1236.75    1246.07    1238.60    1223.53
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU       7.59165    6.93013    7.57941    7.04981
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU     5028.06    4997.82    5011.49    4884.90
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU    1208.523   1242.44    1250.99    1254.39
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU   11.6037    11.3449    11.9035    11.75215
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU  (no result values recorded)
ONNX Runtime 1.11 (OpenBenchmarking.org Inferences Per Minute, More Is Better)
All ONNX Runtime tests: 1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt

Model: GPT-2 - Device: CPU - Executor: Parallel
  A: 3461 (SE +/- 7.49, N = 3)
  B: 3512 (SE +/- 0.44, N = 3)
  C: 3529 (SE +/- 6.29, N = 3)
  D: 3495 (SE +/- 4.21, N = 3)

Model: GPT-2 - Device: CPU - Executor: Standard
  A: 4219 (SE +/- 60.06, N = 12)
  B: 4710 (SE +/- 30.47, N = 3)
  C: 4823 (SE +/- 17.68, N = 3)
  D: 4441 (SE +/- 44.12, N = 12)

Model: yolov4 - Device: CPU - Executor: Parallel
  A: 361 (SE +/- 0.50, N = 3)
  B: 362 (SE +/- 0.29, N = 3)
  C: 362 (SE +/- 0.33, N = 3)
  D: 361 (SE +/- 0.17, N = 3)

Model: yolov4 - Device: CPU - Executor: Standard
  A: 293 (SE +/- 3.18, N = 4)
  B: 293 (SE +/- 1.42, N = 3)
  C: 295 (SE +/- 1.64, N = 3)
  D: 300 (SE +/- 1.04, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Parallel
  A: 424 (SE +/- 2.93, N = 3)
  B: 425 (SE +/- 1.44, N = 3)
  C: 432 (SE +/- 1.80, N = 3)
  D: 421 (SE +/- 0.93, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Standard
  A: 531 (SE +/- 3.69, N = 3)
  B: 647 (SE +/- 0.67, N = 3)
  C: 646 (SE +/- 2.02, N = 3)
  D: 642 (SE +/- 1.32, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  A: 82 (SE +/- 0.17, N = 3)
  B: 80 (SE +/- 0.17, N = 3)
  C: 81 (SE +/- 0.17, N = 3)
  D: 81 (SE +/- 0.17, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  A: 153 (SE +/- 0.33, N = 3)
  B: 156 (SE +/- 0.44, N = 3)
  C: 153 (SE +/- 0.73, N = 3)
  D: 157 (SE +/- 0.44, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  A: 1088 (SE +/- 5.11, N = 3)
  B: 1072 (SE +/- 4.36, N = 3)
  C: 1079 (SE +/- 7.42, N = 3)
  D: 1079 (SE +/- 2.33, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  A: 995 (SE +/- 5.53, N = 3)
  B: 1010 (SE +/- 5.18, N = 3)
  C: 1017 (SE +/- 3.28, N = 3)
  D: 991 (SE +/- 7.44, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Parallel
  A: 3815 (SE +/- 34.71, N = 3)
  B: 3780 (SE +/- 19.55, N = 3)
  C: 3784 (SE +/- 26.17, N = 3)
  D: 3731 (SE +/- 42.45, N = 4)

Model: super-resolution-10 - Device: CPU - Executor: Standard
  A: 7323 (SE +/- 58.03, N = 3)
  B: 6401 (SE +/- 409.51, N = 12)
  C: 7560 (SE +/- 47.29, N = 3)
  D: 7375 (SE +/- 46.97, N = 3)
oneDNN 2.6 (OpenBenchmarking.org ms, Fewer Is Better)
All oneDNN tests: 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread

Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU
  A: 2.00953 (SE +/- 0.02176, N = 15, MIN: 1.59)
  B: 1.96420 (SE +/- 0.06425, N = 12, MIN: 1.46)
  C: 1.99383 (SE +/- 0.05487, N = 15, MIN: 1.39)
  D: 1.91681 (SE +/- 0.06721, N = 15, MIN: 1.25)

Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU
  A: 5.54387 (SE +/- 0.01655, N = 3, MIN: 5.3)
  B: 6.27072 (SE +/- 0.00710, N = 3, MIN: 6.08)
  C: 6.28663 (SE +/- 0.06711, N = 3, MIN: 6.01)
  D: 6.40806 (SE +/- 0.07364, N = 3, MIN: 6.17)

Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU
  A: 2.18433 (SE +/- 0.09135, N = 12, MIN: 1.28)
  B: 2.35161 (SE +/- 0.08755, N = 15, MIN: 1.4)
  C: 2.37403 (SE +/- 0.10516, N = 12, MIN: 1.6)
  D: 2.42176 (SE +/- 0.08567, N = 15, MIN: 1.53)

Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU
  A: 1.13774 (SE +/- 0.00225, N = 3, MIN: 1.04)
  B: 1.11774 (SE +/- 0.00878, N = 9, MIN: 1)
  C: 1.11927 (SE +/- 0.00210, N = 3, MIN: 1.04)
  D: 1.10705 (SE +/- 0.01354, N = 3, MIN: 1.03)

Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU
  A: 0.941266 (SE +/- 0.010786, N = 3, MIN: 0.86)
  B: 0.910056 (SE +/- 0.010535, N = 15, MIN: 0.8)
  C: 0.904110 (SE +/- 0.009118, N = 3, MIN: 0.84)
  D: 0.928404 (SE +/- 0.009748, N = 15, MIN: 0.83)

Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU
  A: 6.68005 (SE +/- 0.04611, N = 3, MIN: 5.95)
  B: 6.82166 (SE +/- 0.03741, N = 3, MIN: 6.16)
  C: 6.85039 (SE +/- 0.03051, N = 3, MIN: 6.18)
  D: 6.90607 (SE +/- 0.03878, N = 3, MIN: 6.2)

Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
  A: 2.10617 (SE +/- 0.00321, N = 3, MIN: 2.05)
  B: 2.11025 (SE +/- 0.00652, N = 3, MIN: 2.05)
  C: 2.11212 (SE +/- 0.00512, N = 3, MIN: 2.06)
  D: 2.11146 (SE +/- 0.00569, N = 3, MIN: 2.06)

Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU
  A: 6.39430 (SE +/- 0.01609, N = 3, MIN: 6.31)
  B: 6.43330 (SE +/- 0.02160, N = 3, MIN: 6.33)
  C: 6.44689 (SE +/- 0.00816, N = 3, MIN: 6.34)
  D: 6.44330 (SE +/- 0.01318, N = 3, MIN: 6.36)

Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU
  A: 1.52619 (SE +/- 0.00789, N = 3, MIN: 1.38)
  B: 1.49871 (SE +/- 0.04182, N = 12, MIN: 0.93)
  C: 1.56000 (SE +/- 0.00131, N = 3, MIN: 1.41)
  D: 1.45511 (SE +/- 0.02072, N = 3, MIN: 1.3)

Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU
  A: 0.979135 (SE +/- 0.001712, N = 3, MIN: 0.93)
  B: 0.992713 (SE +/- 0.001082, N = 3, MIN: 0.93)
  C: 0.987020 (SE +/- 0.004648, N = 3, MIN: 0.91)
  D: 0.984819 (SE +/- 0.000768, N = 3, MIN: 0.92)

Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
  A: 4959.96 (SE +/- 28.21, N = 3, MIN: 4866.16)
  B: 5003.99 (SE +/- 8.67, N = 3, MIN: 4937.34)
  C: 4964.59 (SE +/- 48.98, N = 3, MIN: 4820.84)
  D: 4882.28 (SE +/- 59.27, N = 15, MIN: 4219.23)

Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
  A: 1260.20 (SE +/- 1.48, N = 3, MIN: 1234.63)
  B: 1251.24 (SE +/- 15.69, N = 3, MIN: 1199.03)
  C: 1221.12 (SE +/- 2.45, N = 3, MIN: 1196.31)
  D: 1211.44 (SE +/- 9.49, N = 3, MIN: 1174.61)

Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
  A: 4954.34 (SE +/- 40.36, N = 9, MIN: 4613.84)
  B: 5034.83 (SE +/- 24.43, N = 3, MIN: 4959.6)
  C: 5003.38 (SE +/- 9.55, N = 3, MIN: 4934.99)
  D: 4950.97 (SE +/- 19.43, N = 3, MIN: 4866.67)

Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
  A: 1236.75 (SE +/- 9.32, N = 3, MIN: 1201.87)
  B: 1246.07 (SE +/- 10.61, N = 15, MIN: 1122.63)
  C: 1238.60 (SE +/- 2.90, N = 3, MIN: 1199.98)
  D: 1223.53 (SE +/- 5.02, N = 3, MIN: 1198.19)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU
  A: 7.59165 (SE +/- 0.31854, N = 15, MIN: 5.06)
  B: 6.93013 (SE +/- 0.38999, N = 15, MIN: 4.81)
  C: 7.57941 (SE +/- 0.41090, N = 15, MIN: 4.43)
  D: 7.04981 (SE +/- 0.36772, N = 12, MIN: 4.64)

Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
  A: 5028.06 (SE +/- 3.11, N = 3, MIN: 4972.23)
  B: 4997.82 (SE +/- 9.21, N = 3, MIN: 4933.96)
  C: 5011.49 (SE +/- 10.61, N = 3, MIN: 4942.23)
  D: 4884.90 (SE +/- 80.48, N = 13, MIN: 4023.92)

Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
  A: 1208.52 (SE +/- 37.60, N = 12, MIN: 796.32)
  B: 1242.44 (SE +/- 8.78, N = 3, MIN: 1204.89)
  C: 1250.99 (SE +/- 14.38, N = 3, MIN: 1202.91)
  D: 1254.39 (SE +/- 10.63, N = 3, MIN: 1209.18)

Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU
  A: 11.60 (SE +/- 0.23, N = 15, MIN: 9.98)
  B: 11.34 (SE +/- 0.07, N = 3, MIN: 10.78)
  C: 11.90 (SE +/- 0.32, N = 15, MIN: 8.22)
  D: 11.75 (SE +/- 0.39, N = 12, MIN: 8.18)
Phoronix Test Suite v10.8.5