10980XE onednn onnx: Intel Core i9-10980XE testing with an ASRock X299 Steel Legend (P1.30 BIOS) and NVIDIA GeForce GTX 1080 Ti 11GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2204024-PTS-10980XEO27&sor
System Details (identical for all four configurations A, B, C, D)

Processor: Intel Core i9-10980XE @ 4.80GHz (18 Cores / 36 Threads)
Motherboard: ASRock X299 Steel Legend (P1.30 BIOS)
Chipset: Intel Sky Lake-E DMI3 Registers
Memory: 32GB
Disk: Samsung SSD 970 PRO 512GB
Graphics: NVIDIA GeForce GTX 1080 Ti 11GB
Audio: Realtek ALC1220
Monitor: ASUS VP28U
Network: Intel I219-V + Intel I211
OS: Ubuntu 22.04
Kernel: 5.15.0-17-generic (x86_64)
Desktop: GNOME Shell 40.5
Display Server: X Server 1.20.13
Display Driver: NVIDIA 495.46
OpenGL: 4.6.0
OpenCL: OpenCL 3.0 CUDA 11.5.103
Vulkan: 1.2.186
Compiler: GCC 11.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_cpufreq schedutil; CPU Microcode: 0x5003102
Java Details: OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1)
Python Details: Python 3.9.9
Security Details: itlb_multihit: KVM: Mitigation of VMX disabled; l1tf: Not affected; mds: Not affected; meltdown: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling; srbds: Not affected; tsx_async_abort: Mitigation of TSX disabled
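To compare a local machine against this result, the Phoronix Test Suite can re-run the same test selection from the OpenBenchmarking.org result ID. A minimal sketch (assumes `phoronix-test-suite` is installed; the actual run is interactive, so it is left commented out):

```shell
# Derive the OpenBenchmarking.org result ID from the result URL above.
RESULT_URL="https://openbenchmarking.org/result/2204024-PTS-10980XEO27"
RESULT_ID="${RESULT_URL##*/}"   # strip everything up to the last slash

# Re-run the same tests locally and merge with this result for comparison
# (uncomment to launch; prompts for options follow):
# phoronix-test-suite benchmark "$RESULT_ID"

echo "$RESULT_ID"
```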
Result overview: every test (fast-cli, speedtest-cli, perf-bench, oneDNN, Java JMH, ONNX Runtime) was run on configurations A, B, C, and D; the per-test values are given in full in the sections below.
fast-cli Internet Download Speed (Mbit/s, more is better): C: 390, B: 370, A: 370, D: 360
fast-cli Internet Upload Speed (Mbit/s, more is better): C: 7.9, B: 6.9, D: 6.8, A: 6.7
fast-cli Internet Latency (ms, fewer is better): A: 8, B: 8, D: 10, C: 13
fast-cli Internet Loaded Latency (Bufferbloat) (ms, fewer is better): B: 64, C: 64, D: 70, A: 73
speedtest-cli 2.1.3 Internet Download Speed (Mbit/s, more is better): D: 327.12, A: 324.65, B: 321.74, C: 280.18
speedtest-cli 2.1.3 Internet Upload Speed (Mbit/s, more is better): A: 9.47, B: 9.22, D: 8.95, C: 8.40
speedtest-cli 2.1.3 Internet Latency (ms, fewer is better): B: 15.49, D: 22.68, A: 23.87, C: 28.78
perf-bench Epoll Wait (ops/sec, more is better): C: 35556, D: 35205, A: 31157, B: 30615
perf-bench Futex Hash (ops/sec, more is better): C: 4513944, D: 4499081, A: 4494394, B: 4493415
perf-bench Memcpy 1MB (GB/sec, more is better): C: 17.78, A: 17.38, D: 17.28, B: 16.77
perf-bench Memset 1MB (GB/sec, more is better): C: 69.50, D: 68.90, B: 68.32, A: 66.06
perf-bench Sched Pipe (ops/sec, more is better): C: 111014, A: 86982, B: 78301, D: 76237
perf-bench Futex Lock-Pi (ops/sec, more is better): A: 236, D: 231, C: 222, B: 215
perf-bench Syscall Basic (ops/sec, more is better): D: 17504146, B: 17221544, C: 17145832, A: 17140545
1. (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma (all perf-bench tests)
oneDNN 2.6 (all results in ms, fewer is better; Engine: CPU)
IP Shapes 1D - f32: C: 87.06 (MIN: 49.86), A: 87.07 (MIN: 32.83), B: 89.02 (MIN: 33.04), D: 90.32 (MIN: 70.91)
IP Shapes 3D - f32: C: 57.25 (MIN: 14.92), A: 58.08 (MIN: 42.18), B: 58.29 (MIN: 50.37), D: 58.44 (MIN: 39.61)
IP Shapes 1D - u8s8f32: A: 83.90 (MIN: 35.65), B: 88.36 (MIN: 57.57), D: 89.01 (MIN: 64.34), C: 89.52 (MIN: 54.63)
IP Shapes 3D - u8s8f32: B: 44.22 (MIN: 32.09), D: 44.40 (MIN: 6.12), A: 44.43 (MIN: 19.44), C: 45.83 (MIN: 24.2)
IP Shapes 1D - bf16bf16bf16: D: 91.70 (MIN: 42.74), C: 93.28 (MIN: 59.18), A: 93.31 (MIN: 46.42), B: 93.43 (MIN: 43.74)
IP Shapes 3D - bf16bf16bf16: D: 55.58 (MIN: 30.28), A: 55.93 (MIN: 29.18), B: 56.06 (MIN: 16.51), C: 56.73 (MIN: 32.47)
Convolution Batch Shapes Auto - f32: A: 24.06 (MIN: 12.14), D: 24.35 (MIN: 14.81), B: 24.38 (MIN: 14.3), C: 24.42 (MIN: 14.77)
Deconvolution Batch shapes_1d - f32: A: 99.18 (MIN: 6.42), C: 107.71 (MIN: 23.72), B: 109.81 (MIN: 24.19), D: 112.62 (MIN: 65.19)
Deconvolution Batch shapes_3d - f32: B: 18.70 (MIN: 11.49), D: 18.94 (MIN: 16.51), A: 19.09 (MIN: 2.83), C: 19.67 (MIN: 3.02)
Convolution Batch Shapes Auto - u8s8f32: A: 23.74 (MIN: 13.82), B: 23.77 (MIN: 22.07), D: 23.79 (MIN: 10.19), C: 24.06 (MIN: 13.26)
Deconvolution Batch shapes_1d - u8s8f32: C: 0.774919, B: 50.1553, A: 50.2996, D: 54.0262 (MINs as exported: 10.62, 9.93, 10.84; one MIN missing in the export)
Deconvolution Batch shapes_3d - u8s8f32: C: 1.62907, A: 9.33345, D: 9.35235, B: 9.40424 (MINs as exported: 0.66, 8.08, 8.8; one MIN missing in the export)
Recurrent Neural Network Training - f32: C: 18304.9 (MIN: 16470), B: 23366.8 (MIN: 19714.8), D: 23371.5 (MIN: 20963.9), A: 23584.2 (MIN: 21773.9)
Recurrent Neural Network Inference - f32: C: 26033.7 (MIN: 20193.5), A: 26069.6 (MIN: 20933.3), B: 26549.2 (MIN: 21084.1), D: 26872.5 (MIN: 20170)
Recurrent Neural Network Training - u8s8f32: C: 22788.5 (MIN: 19335.2), B: 22790.9 (MIN: 19530.9), A: 23516.1 (MIN: 21274.9), D: 23844.1 (MIN: 22253.5)
Convolution Batch Shapes Auto - bf16bf16bf16: B: 25.89 (MIN: 11.87), D: 25.94 (MIN: 12.27), C: 26.18 (MIN: 12.79), A: 27.00 (MIN: 12.07)
Deconvolution Batch shapes_1d - bf16bf16bf16: A: 116.51 (MIN: 32.08), D: 118.15 (MIN: 16.46), C: 130.96 (MIN: 35.29), B: 135.47 (MIN: 51.31)
Deconvolution Batch shapes_3d - bf16bf16bf16: A: 22.40 (MIN: 17.08), C: 22.76 (MIN: 19.46), D: 23.02 (MIN: 19.25), B: 23.90 (MIN: 19.53)
Recurrent Neural Network Inference - u8s8f32: B: 27247.5 (MIN: 22350), A: 28356.5 (MIN: 24027), C: 28624.4 (MIN: 25457.4), D: 28758.8 (MIN: 23747)
Matrix Multiply Batch Shapes Transformer - f32: C: 26.99 (MIN: 1.4), D: 31.27 (MIN: 11.61), A: 35.20 (MIN: 15.38), B: 36.36 (MIN: 25.91)
Recurrent Neural Network Training - bf16bf16bf16: C: 22605.3 (MIN: 19157.8), B: 22726.7 (MIN: 19307.6), A: 22990.4 (MIN: 19958.1), D: 23319.6 (MIN: 20370.9)
Recurrent Neural Network Inference - bf16bf16bf16: A: 27155.0 (MIN: 21711.8), B: 27353.6 (MIN: 23193.9), D: 28429.2 (MIN: 24528.9), C: 28513.6 (MIN: 24411.3)
Matrix Multiply Batch Shapes Transformer - u8s8f32: A: 30.78 (MIN: 21.88), C: 30.88 (MIN: 10.41), D: 33.38 (MIN: 15.35), B: 34.52 (MIN: 16.34)
Matrix Multiply Batch Shapes Transformer - bf16bf16bf16: D: 37.55 (MIN: 24.18), C: 37.78 (MIN: 17.54), A: 37.79 (MIN: 20.51), B: 39.05 (MIN: 28.78)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread (all oneDNN tests)
Java JMH Throughput (Ops/s, more is better): B: 33247767440.99, D: 33246121586.08, A: 33246015808.70, C: 33236796629.92
ONNX Runtime 1.11 (Inferences Per Minute, more is better; Device: CPU)
GPT-2 - Parallel: C: 6114, D: 6072, A: 5799, B: 5795
GPT-2 - Standard: C: 9595, D: 9591, B: 8812, A: 8749
yolov4 - Parallel: D: 449, B: 448, C: 447, A: 444
yolov4 - Standard: C: 680, A: 665, B: 663, D: 569
bertsquad-12 - Parallel: D: 796, C: 786, A: 782, B: 774
bertsquad-12 - Standard: C: 923, B: 911, D: 897, A: 897
fcn-resnet101-11 - Parallel: B: 102, D: 101, C: 101, A: 101
fcn-resnet101-11 - Standard: D: 148, A: 148, B: 119 (no result for C)
ArcFace ResNet-100 - Parallel: D: 1491, B: 1474, A: 1464 (no result for C)
ArcFace ResNet-100 - Standard: A: 2053, D: 2014, B: 1538 (no result for C)
super-resolution-10 - Parallel: B: 5975, D: 5969, A: 5890 (no result for C)
super-resolution-10 - Standard: D: 10255, B: 9884, A: 6168 (no result for C)
1. (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt (all ONNX Runtime tests)
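A quick way to gauge how much the configurations differ on a given test is to compute the spread between the fastest and slowest result. A minimal Python sketch using two of the result sets above (values copied verbatim from the tables; the helper name `spread` is ours, not part of any benchmark tool):

```python
def spread(results):
    """Return (best, worst, ratio) for a dict of config -> score, higher is better."""
    best = max(results, key=results.get)
    worst = min(results, key=results.get)
    return best, worst, results[best] / results[worst]

# ONNX Runtime super-resolution-10, CPU, Standard executor (Inferences Per Minute)
super_res_standard = {"A": 6168, "B": 9884, "D": 10255}

# perf-bench Sched Pipe (ops/sec)
sched_pipe = {"A": 86982, "B": 78301, "C": 111014, "D": 76237}

for name, results in [("super-resolution-10 Standard", super_res_standard),
                      ("Sched Pipe", sched_pipe)]:
    best, worst, ratio = spread(results)
    print(f"{name}: {best} is {ratio:.2f}x faster than {worst}")
```

For "fewer is better" tests (the oneDNN timings), the same idea applies with `min`/`max` swapped, since a lower time is the better result.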
Phoronix Test Suite v10.8.4