8380 sun: 2 x Intel Xeon Platinum 8380 testing with an Intel M50CYP2SB2U motherboard (SE5C6200.86B.0022.D08.2103221623 BIOS) and ASPEED graphics on Ubuntu 20.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2204037-NE-8380SUN3435&gru .
8380 sun - system details (configurations A, B, C, and D ran on the same hardware and software):

Processor: 2 x Intel Xeon Platinum 8380 @ 3.40GHz (80 Cores / 160 Threads)
Motherboard: Intel M50CYP2SB2U (SE5C6200.86B.0022.D08.2103221623 BIOS)
Chipset: Intel Device 0998
Memory: 512GB
Disk: 3841GB Micron_9300_MTFDHAL3T8TDP
Graphics: ASPEED
Monitor: VE228
Network: 2 x Intel X710 for 10GBASE-T + 2 x Intel E810-C for QSFP
OS: Ubuntu 20.04
Kernel: 5.15.11-051511-generic (x86_64)
Desktop: GNOME Shell 3.36.9
Display Server: X Server 1.20.13
Vulkan: 1.0.2
Compiler: GCC 9.3.0 + Clang 10.0.0-4ubuntu1
File-System: ext4
Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate performance (EPP: performance); CPU Microcode: 0xd0002a0
Java Details: OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced IBRS, IBPB: conditional, RSB filling; srbds: Not affected; tsx_async_abort: Not affected
8380 sun - results summary (units taken from the per-test details below):

| Test | Unit | A | B | C | D |
| perf-bench: Memcpy 1MB | GB/sec | 16.846117 | 15.93434 | 16.267801 | 15.954044 |
| perf-bench: Memset 1MB | GB/sec | 58.850905 | 53.171926 | 58.448906 | 59.788673 |
| java-jmh: Throughput | Ops/s | 117974431957.74 | 117397403601.04 | 117539633814.56 | 117793385377.77 |
| perf-bench: Epoll Wait | ops/sec | 3190 | 3376 | 3518 | 3345 |
| perf-bench: Futex Hash | ops/sec | 2990938 | 2990356 | 2985843 | 2984853 |
| perf-bench: Sched Pipe | ops/sec | 157484 | 159641 | 159064 | 158896 |
| perf-bench: Futex Lock-Pi | ops/sec | 47 | 44 | 49 | 50 |
| perf-bench: Syscall Basic | ops/sec | 13955440 | 13985008 | 13976056 | 13925517 |
| onednn: IP Shapes 1D - f32 - CPU | ms | 0.874369 | 0.858012 | 0.863419 | 0.855213 |
| onednn: IP Shapes 3D - f32 - CPU | ms | 1.27751 | 1.2813 | 1.29049 | 1.28979 |
| onednn: IP Shapes 1D - u8s8f32 - CPU | ms | 1.30484 | 1.29761 | 1.28310 | 1.27539 |
| onednn: IP Shapes 3D - u8s8f32 - CPU | ms | 0.444806 | 0.438798 | 0.443744 | 0.445376 |
| onednn: IP Shapes 1D - bf16bf16bf16 - CPU | ms | 2.98315 | 3.00061 | 2.99363 | 2.99408 |
| onednn: IP Shapes 3D - bf16bf16bf16 - CPU | ms | 1.81335 | 1.81013 | 1.82023 | 1.82082 |
| onednn: Convolution Batch Shapes Auto - f32 - CPU | ms | 1.38472 | 1.39272 | 1.38829 | 1.38183 |
| onednn: Deconvolution Batch shapes_1d - f32 - CPU | ms | 7.24738 | 7.23701 | 7.26969 | 7.29653 |
| onednn: Deconvolution Batch shapes_3d - f32 - CPU | ms | 0.884168 | 0.888675 | 0.888288 | 0.884572 |
| onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | ms | 1.13824 | 1.12915 | 1.14503 | 1.15270 |
| onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | ms | 0.372274 | 0.3691 | 0.364478 | 0.372693 |
| onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | ms | 0.194293 | 0.196251 | 0.194502 | 0.191428 |
| onednn: Recurrent Neural Network Training - f32 - CPU | ms | 627.389 | 625.616 | 621.734 | 622.500 |
| onednn: Recurrent Neural Network Inference - f32 - CPU | ms | 379.171 | 379.589 | 379.634 | 378.233 |
| onednn: Recurrent Neural Network Training - u8s8f32 - CPU | ms | 618.676 | 626.982 | 618.143 | 617.792 |
| onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | ms | 2.12309 | 2.11786 | 2.10492 | 2.11735 |
| onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | ms | 3.78575 | 3.79039 | 3.76245 | 3.77449 |
| onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | ms | 3.62162 | 3.61967 | 3.60012 | 3.60258 |
| onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | ms | 378.855 | 383.208 | 379.206 | 378.641 |
| onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | ms | 0.230657 | 0.234187 | 0.232623 | 0.232562 |
| onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | ms | 636.287 | 621.88 | 618.823 | 617.729 |
| onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | ms | 378.959 | 386.179 | 379.585 | 377.234 |
| onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | ms | 0.177939 | 0.170109 | 0.172350 | 0.172182 |
| onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | ms | 2.08241 | 2.09074 | 2.05510 | 2.08466 |
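Since A, B, C, and D are four runs of the same configuration, the summary values mainly show run-to-run variance. A minimal sketch (assuming only the values reported above) computes the relative spread, (max - min) / mean, for two of the metrics:

```python
# Relative spread ((max - min) / mean) across configs A-D,
# using values copied from the summary table above.
def rel_spread_pct(values):
    """Return (max - min) / mean as a percentage."""
    mean = sum(values) / len(values)
    return 100.0 * (max(values) - min(values)) / mean

futex_hash = [2990938, 2990356, 2985843, 2984853]  # ops/sec, A-D
memset_1mb = [58.85, 53.17, 58.45, 59.79]          # GB/sec, A-D

print(f"Futex Hash spread: {rel_spread_pct(futex_hash):.2f}%")   # ~0.20%
print(f"Memset 1MB spread: {rel_spread_pct(memset_1mb):.2f}%")   # ~11.50%
```

Futex Hash is highly repeatable (about 0.2% spread), while Memset 1MB varies by over 11% across runs, with run B's 53.17 GB/sec the outlier; this is consistent with the larger standard errors reported for that test below.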
perf-bench - Benchmark: Memcpy 1MB (GB/sec, more is better)
  A: 16.85   B: 15.93   C: 16.27   D: 15.95
  SE as reported: +/- 0.16 (N = 3); +/- 0.18 (N = 3)
  1. (CC) gcc options: -pthread -shared -lunwind-x86_64 -lunwind -llzma -Xlinker -export-dynamic -O6 -ggdb3 -funwind-tables -std=gnu99 -fPIC -lnuma

perf-bench - Benchmark: Memset 1MB (GB/sec, more is better)
  A: 58.85   B: 53.17   C: 58.45   D: 59.79
  SE as reported: +/- 0.55 (N = 15); +/- 1.02 (N = 3)

Java JMH - Throughput (Ops/s, more is better)
  A: 117974431957.74   B: 117397403601.04   C: 117539633814.56   D: 117793385377.77

perf-bench - Benchmark: Epoll Wait (ops/sec, more is better)
  A: 3190   B: 3376   C: 3518   D: 3345
  SE as reported: +/- 26.58 (N = 3); +/- 25.11 (N = 3)

perf-bench - Benchmark: Futex Hash (ops/sec, more is better)
  A: 2990938   B: 2990356   C: 2985843   D: 2984853
  SE as reported: +/- 2180.19 (N = 3); +/- 4824.86 (N = 3)

perf-bench - Benchmark: Sched Pipe (ops/sec, more is better)
  A: 157484   B: 159641   C: 159064   D: 158896
  SE as reported: +/- 545.51 (N = 3); +/- 451.65 (N = 3)

perf-bench - Benchmark: Futex Lock-Pi (ops/sec, more is better)
  A: 47   B: 44   C: 49   D: 50
  SE as reported: +/- 1.92 (N = 15); +/- 2.63 (N = 15)

perf-bench - Benchmark: Syscall Basic (ops/sec, more is better)
  A: 13955440   B: 13985008   C: 13976056   D: 13925517
  SE as reported: +/- 4397.07 (N = 3); +/- 37556.12 (N = 3)

All perf-bench tests above were built with the gcc options listed under Memcpy 1MB.
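The "SE +/- x, N = y" figures above are standard errors of the mean over N runs (sample standard deviation divided by sqrt(N)). A minimal sketch of that calculation, using hypothetical per-run throughputs since the export reports only the mean and SE, not the raw runs:

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample stdev (Bessel-corrected) / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / math.sqrt(n)

# Hypothetical per-run throughputs in GB/sec (illustrative, not the actual data).
runs = [16.70, 16.85, 17.00]
print(f"mean = {sum(runs) / len(runs):.2f}, "
      f"SE +/- {standard_error(runs):.4f}, N = {len(runs)}")
# -> mean = 16.85, SE +/- 0.0866, N = 3
```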
oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 0.874369 (MIN: 0.81)   B: 0.858012 (MIN: 0.8)   C: 0.863419 (MIN: 0.8)   D: 0.855213 (MIN: 0.8)
  SE as reported: +/- 0.001707 (N = 3); +/- 0.002275 (N = 3)
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -lpthread -ldl

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 1.27751 (MIN: 1.24)   B: 1.28130 (MIN: 1.25)   C: 1.29049 (MIN: 1.25)   D: 1.28979 (MIN: 1.25)
  SE as reported: +/- 0.00528 (N = 3); +/- 0.00317 (N = 3)

oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 1.30484 (MIN: 0.94)   B: 1.29761 (MIN: 1.02)   C: 1.28310 (MIN: 0.93)   D: 1.27539 (MIN: 0.89)
  SE as reported: +/- 0.02105 (N = 3); +/- 0.01852 (N = 3)

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 0.444806 (MIN: 0.41)   B: 0.438798 (MIN: 0.41)   C: 0.443744 (MIN: 0.41)   D: 0.445376 (MIN: 0.41)
  SE as reported: +/- 0.001544 (N = 3); +/- 0.000362 (N = 3)

oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 2.98315 (MIN: 2.86)   B: 3.00061 (MIN: 2.87)   C: 2.99363 (MIN: 2.86)   D: 2.99408 (MIN: 2.85)
  SE as reported: +/- 0.00300 (N = 3); +/- 0.00450 (N = 3)

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 1.81335 (MIN: 1.68)   B: 1.81013 (MIN: 1.68)   C: 1.82023 (MIN: 1.68)   D: 1.82082 (MIN: 1.68)
  SE as reported: +/- 0.00151 (N = 3); +/- 0.00600 (N = 3)

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 1.38472 (MIN: 1.28)   B: 1.39272 (MIN: 1.24)   C: 1.38829 (MIN: 1.26)   D: 1.38183 (MIN: 1.26)
  SE as reported: +/- 0.00571 (N = 3); +/- 0.00232 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 7.24738 (MIN: 6.76)   B: 7.23701 (MIN: 6.73)   C: 7.26969 (MIN: 6.69)   D: 7.29653 (MIN: 6.67)
  SE as reported: +/- 0.00587 (N = 3); +/- 0.00979 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 0.884168 (MIN: 0.84)   B: 0.888675 (MIN: 0.84)   C: 0.888288 (MIN: 0.84)   D: 0.884572 (MIN: 0.84)
  SE as reported: +/- 0.002781 (N = 3); +/- 0.001577 (N = 3)

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 1.13824 (MIN: 0.95)   B: 1.12915 (MIN: 0.96)   C: 1.14503 (MIN: 0.97)   D: 1.15270 (MIN: 0.97)
  SE as reported: +/- 0.00239 (N = 3); +/- 0.00521 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 0.372274 (MIN: 0.33)   B: 0.369100 (MIN: 0.33)   C: 0.364478 (MIN: 0.33)   D: 0.372693 (MIN: 0.33)
  SE as reported: +/- 0.000878 (N = 3); +/- 0.003422 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 0.194293 (MIN: 0.18)   B: 0.196251 (MIN: 0.18)   C: 0.194502 (MIN: 0.18)   D: 0.191428 (MIN: 0.18)
  SE as reported: +/- 0.001404 (N = 3); +/- 0.000487 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 627.39 (MIN: 596.88)   B: 625.62 (MIN: 597.43)   C: 621.73 (MIN: 594.79)   D: 622.50 (MIN: 596.58)
  SE as reported: +/- 1.66 (N = 3); +/- 1.46 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 379.17 (MIN: 363.09)   B: 379.59 (MIN: 361.95)   C: 379.63 (MIN: 361.75)   D: 378.23 (MIN: 360.8)
  SE as reported: +/- 1.29 (N = 3); +/- 0.51 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 618.68 (MIN: 594.98)   B: 626.98 (MIN: 598.45)   C: 618.14 (MIN: 592.21)   D: 617.79 (MIN: 594.21)
  SE as reported: +/- 1.17 (N = 3); +/- 0.39 (N = 3)

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 2.12309 (MIN: 2.03)   B: 2.11786 (MIN: 2.03)   C: 2.10492 (MIN: 2.03)   D: 2.11735 (MIN: 2.03)
  SE as reported: +/- 0.00132 (N = 3); +/- 0.00242 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 3.78575 (MIN: 3.54)   B: 3.79039 (MIN: 3.53)   C: 3.76245 (MIN: 3.52)   D: 3.77449 (MIN: 3.53)
  SE as reported: +/- 0.00243 (N = 3); +/- 0.00723 (N = 3)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 3.62162 (MIN: 3.51)   B: 3.61967 (MIN: 3.51)   C: 3.60012 (MIN: 3.51)   D: 3.60258 (MIN: 3.51)
  SE as reported: +/- 0.00386 (N = 3); +/- 0.00989 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 378.86 (MIN: 362.36)   B: 383.21 (MIN: 366.5)   C: 379.21 (MIN: 358.89)   D: 378.64 (MIN: 362.01)
  SE as reported: +/- 2.30 (N = 3); +/- 0.30 (N = 3)

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 0.230657 (MIN: 0.21)   B: 0.234187 (MIN: 0.22)   C: 0.232623 (MIN: 0.21)   D: 0.232562 (MIN: 0.22)
  SE as reported: +/- 0.001971 (N = 3); +/- 0.000626 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 636.29 (MIN: 595.54)   B: 621.88 (MIN: 595.66)   C: 618.82 (MIN: 594.97)   D: 617.73 (MIN: 594.74)
  SE as reported: +/- 0.48 (N = 3); +/- 0.57 (N = 3)

oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 378.96 (MIN: 363.09)   B: 386.18 (MIN: 363.76)   C: 379.59 (MIN: 358.8)   D: 377.23 (MIN: 359.79)
  SE as reported: +/- 1.61 (N = 3); +/- 1.09 (N = 3)

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 0.177939 (MIN: 0.16)   B: 0.170109 (MIN: 0.15)   C: 0.172350 (MIN: 0.16)   D: 0.172182 (MIN: 0.15)
  SE as reported: +/- 0.000739 (N = 3); +/- 0.001576 (N = 3)

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 2.08241 (MIN: 1.86)   B: 2.09074 (MIN: 1.85)   C: 2.05510 (MIN: 1.81)   D: 2.08466 (MIN: 1.8)
  SE as reported: +/- 0.01164 (N = 3); +/- 0.03558 (N = 3)

All oneDNN tests above were built with the g++ options listed under the first harness.
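Because the oneDNN figures are latencies (ms, fewer is better), comparing configurations in higher-is-better terms means dividing the slower latency by the faster one. A sketch using the Recurrent Neural Network Training bf16bf16bf16 values reported above:

```python
# Relative speedup for a fewer-is-better (latency) metric:
# speedup = slow_latency / fast_latency.
rnn_train_bf16 = {"A": 636.29, "B": 621.88, "C": 618.82, "D": 617.73}  # ms

fast = min(rnn_train_bf16, key=rnn_train_bf16.get)  # lowest latency wins
slow = max(rnn_train_bf16, key=rnn_train_bf16.get)
speedup = rnn_train_bf16[slow] / rnn_train_bf16[fast]
print(f"{fast} is {speedup:.3f}x faster than {slow} "
      f"({(speedup - 1) * 100:.1f}% lower latency)")
```

For this test the worst-to-best delta across the four runs is about 3%, which is larger than the reported standard errors and mostly attributable to run A.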
Phoronix Test Suite v10.8.4