10980XE onednn onnx
Intel Core i9-10980XE testing with an ASRock X299 Steel Legend (P1.30 BIOS) motherboard and an NVIDIA GeForce GTX 1080 Ti 11GB graphics card on Ubuntu 22.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2204024-PTS-10980XEO27.
System details (identical for runs A, B, C, and D):
  Processor: Intel Core i9-10980XE @ 4.80GHz (18 Cores / 36 Threads)
  Motherboard: ASRock X299 Steel Legend (P1.30 BIOS)
  Chipset: Intel Sky Lake-E DMI3 Registers
  Memory: 32GB
  Disk: Samsung SSD 970 PRO 512GB
  Graphics: NVIDIA GeForce GTX 1080 Ti 11GB
  Audio: Realtek ALC1220
  Monitor: ASUS VP28U
  Network: Intel I219-V + Intel I211
  OS: Ubuntu 22.04
  Kernel: 5.15.0-17-generic (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server 1.20.13
  Display Driver: NVIDIA 495.46
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 11.5.103
  Vulkan: 1.2.186
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

  Kernel Details: Transparent Huge Pages: madvise
  Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  Processor Details: Scaling Governor: intel_cpufreq schedutil - CPU Microcode: 0x5003102
  Java Details: OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1)
  Python Details: Python 3.9.9
  Security Details: itlb_multihit: KVM: Mitigation of VMX disabled + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Mitigation of TSX disabled
Results overview (values shown as A / B / C / D; n/a = not run; Mbit/s, GB/sec, ops/sec, Ops/s, and Inferences/min are more-is-better; ms is fewer-is-better):
  fast-cli: Internet Download Speed (Mbit/s): 370 / 370 / 390 / 360
  fast-cli: Internet Upload Speed (Mbit/s): 6.7 / 6.9 / 7.9 / 6.8
  fast-cli: Internet Latency (ms): 8 / 8 / 13 / 10
  fast-cli: Internet Loaded Latency, Bufferbloat (ms): 73 / 64 / 64 / 70
  speedtest-cli: Internet Download Speed (Mbit/s): 324.65 / 321.74 / 280.18 / 327.12
  speedtest-cli: Internet Upload Speed (Mbit/s): 9.47 / 9.22 / 8.4 / 8.95
  speedtest-cli: Internet Latency (ms): 23.87 / 15.487 / 28.782 / 22.677
  perf-bench: Epoll Wait (ops/sec): 31157 / 30615 / 35556 / 35205
  perf-bench: Futex Hash (ops/sec): 4494394 / 4493415 / 4513944 / 4499081
  perf-bench: Memcpy 1MB (GB/sec): 17.382564 / 16.766498 / 17.779377 / 17.277106
  perf-bench: Memset 1MB (GB/sec): 66.057732 / 68.321136 / 69.500836 / 68.902923
  perf-bench: Sched Pipe (ops/sec): 86982 / 78301 / 111014 / 76237
  perf-bench: Futex Lock-Pi (ops/sec): 236 / 215 / 222 / 231
  perf-bench: Syscall Basic (ops/sec): 17140545 / 17221544 / 17145832 / 17504146
  onednn: IP Shapes 1D - f32 - CPU (ms): 87.0693 / 89.0246 / 87.0626 / 90.3226
  onednn: IP Shapes 3D - f32 - CPU (ms): 58.0758 / 58.2895 / 57.2458 / 58.4361
  onednn: IP Shapes 1D - u8s8f32 - CPU (ms): 83.8984 / 88.3586 / 89.5153 / 89.0099
  onednn: IP Shapes 3D - u8s8f32 - CPU (ms): 44.4304 / 44.2168 / 45.8347 / 44.3999
  onednn: IP Shapes 1D - bf16bf16bf16 - CPU (ms): 93.3073 / 93.432 / 93.2753 / 91.696
  onednn: IP Shapes 3D - bf16bf16bf16 - CPU (ms): 55.9293 / 56.0613 / 56.7312 / 55.5827
  onednn: Convolution Batch Shapes Auto - f32 - CPU (ms): 24.0564 / 24.384 / 24.4218 / 24.3468
  onednn: Deconvolution Batch shapes_1d - f32 - CPU (ms): 99.18 / 109.808 / 107.714 / 112.623
  onednn: Deconvolution Batch shapes_3d - f32 - CPU (ms): 19.0856 / 18.6987 / 19.6687 / 18.9448
  onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU (ms): 23.7444 / 23.7723 / 24.0619 / 23.7889
  onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU (ms): 50.2996 / 50.1553 / 0.774919 / 54.0262
  onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU (ms): 9.33345 / 9.40424 / 1.62907 / 9.35235
  onednn: Recurrent Neural Network Training - f32 - CPU (ms): 23584.2 / 23366.8 / 18304.9 / 23371.5
  onednn: Recurrent Neural Network Inference - f32 - CPU (ms): 26069.6 / 26549.2 / 26033.7 / 26872.5
  onednn: Recurrent Neural Network Training - u8s8f32 - CPU (ms): 23516.1 / 22790.9 / 22788.5 / 23844.1
  onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU (ms): 27.0016 / 25.8871 / 26.1804 / 25.9415
  onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU (ms): 116.514 / 135.473 / 130.958 / 118.154
  onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU (ms): 22.3996 / 23.9021 / 22.7645 / 23.0195
  onednn: Recurrent Neural Network Inference - u8s8f32 - CPU (ms): 28356.5 / 27247.5 / 28624.4 / 28758.8
  onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU (ms): 35.2021 / 36.3562 / 26.991 / 31.2678
  onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU (ms): 22990.4 / 22726.7 / 22605.3 / 23319.6
  onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU (ms): 27155 / 27353.6 / 28513.6 / 28429.2
  onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU (ms): 30.7784 / 34.5221 / 30.8774 / 33.3777
  onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU (ms): 37.7937 / 39.0485 / 37.7823 / 37.5531
  java-jmh: Throughput (Ops/s): 33246015808.702 / 33247767440.989 / 33236796629.92 / 33246121586.081
  onnx: GPT-2 - CPU - Parallel (Inferences/min): 5799 / 5795 / 6114 / 6072
  onnx: GPT-2 - CPU - Standard (Inferences/min): 8749 / 8812 / 9595 / 9591
  onnx: yolov4 - CPU - Parallel (Inferences/min): 444 / 448 / 447 / 449
  onnx: yolov4 - CPU - Standard (Inferences/min): 665 / 663 / 680 / 569
  onnx: bertsquad-12 - CPU - Parallel (Inferences/min): 782 / 774 / 786 / 796
  onnx: bertsquad-12 - CPU - Standard (Inferences/min): 897 / 911 / 923 / 897
  onnx: fcn-resnet101-11 - CPU - Parallel (Inferences/min): 101 / 102 / 101 / 101
  onnx: fcn-resnet101-11 - CPU - Standard (Inferences/min): 148 / 119 / n/a / 148
  onnx: ArcFace ResNet-100 - CPU - Parallel (Inferences/min): 1464 / 1474 / n/a / 1491
  onnx: ArcFace ResNet-100 - CPU - Standard (Inferences/min): 2053 / 1538 / n/a / 2014
  onnx: super-resolution-10 - CPU - Parallel (Inferences/min): 5890 / 5975 / n/a / 5969
  onnx: super-resolution-10 - CPU - Standard (Inferences/min): 6168 / 9884 / n/a / 10255
fast-cli - Internet Download Speed (Mbit/s; more is better): A 370 / B 370 / C 390 / D 360
fast-cli - Internet Upload Speed (Mbit/s; more is better): A 6.7 / B 6.9 / C 7.9 / D 6.8
fast-cli - Internet Latency (ms; fewer is better): A 8 / B 8 / C 13 / D 10
fast-cli - Internet Loaded Latency / Bufferbloat (ms; fewer is better): A 73 / B 64 / C 64 / D 70
speedtest-cli 2.1.3 - Internet Download Speed (Mbit/s; more is better): A 324.65 / B 321.74 / C 280.18 / D 327.12
speedtest-cli 2.1.3 - Internet Upload Speed (Mbit/s; more is better): A 9.47 / B 9.22 / C 8.40 / D 8.95
speedtest-cli 2.1.3 - Internet Latency (ms; fewer is better): A 23.87 / B 15.49 / C 28.78 / D 22.68
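For reference, the speedtest-cli figures above come from the same measurement logic the pip package exposes as a Python module, so a comparable run takes only a few lines. A minimal sketch, assuming the speedtest-cli package (module name: speedtest) is installed; server choice and link conditions will differ from the original runs, so results are only loosely comparable:

    # Minimal sketch: reproduce the three speedtest-cli metrics above
    # (download/upload in Mbit/s, latency in ms).
    import speedtest

    st = speedtest.Speedtest()
    st.get_best_server()              # picks the lowest-latency server, sets ping
    download = st.download() / 1e6    # bits/s -> Mbit/s
    upload = st.upload() / 1e6

    print(f"Download: {download:.2f} Mbit/s")
    print(f"Upload:   {upload:.2f} Mbit/s")
    print(f"Latency:  {st.results.ping:.2f} ms")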
perf-bench (more is better):
  Epoll Wait (ops/sec): A 31157 / B 30615 / C 35556 / D 35205
  Futex Hash (ops/sec): A 4494394 / B 4493415 / C 4513944 / D 4499081
  Memcpy 1MB (GB/sec): A 17.38 / B 16.77 / C 17.78 / D 17.28
  Memset 1MB (GB/sec): A 66.06 / B 68.32 / C 69.50 / D 68.90
  Sched Pipe (ops/sec): A 86982 / B 78301 / C 111014 / D 76237
  Futex Lock-Pi (ops/sec): A 236 / B 215 / C 222 / D 231
  Syscall Basic (ops/sec): A 17140545 / B 17221544 / C 17145832 / D 17504146
  Compiler notes (all perf-bench tests): (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma
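The perf-bench rows are the stock microbenchmarks shipped with the kernel's perf tool, so any of them can be re-run directly (e.g. `perf bench syscall basic` or `perf bench futex hash`). A minimal sketch that shells out to perf and scrapes the ops/sec figure; the exact output layout varies between perf versions, so the regex is an assumption:

    # Minimal sketch: run one perf-bench microbenchmark and extract ops/sec.
    # Assumes `perf` is installed and its default output prints a line
    # containing "ops/sec" (true on recent kernels, but not guaranteed).
    import re
    import subprocess

    out = subprocess.run(
        ["perf", "bench", "syscall", "basic"],
        capture_output=True, text=True, check=True,
    ).stdout

    match = re.search(r"([\d,.]+)\s+ops/sec", out)
    if match:
        print("syscall basic:", match.group(1), "ops/sec")
    else:
        print(out)  # format changed; fall back to the raw report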
oneDNN 2.6 (Engine: CPU; ms; fewer is better; per-run minimum in parentheses):
  IP Shapes 1D, f32: A 87.07 (32.83) / B 89.02 (33.04) / C 87.06 (49.86) / D 90.32 (70.91)
  IP Shapes 3D, f32: A 58.08 (42.18) / B 58.29 (50.37) / C 57.25 (14.92) / D 58.44 (39.61)
  IP Shapes 1D, u8s8f32: A 83.90 (35.65) / B 88.36 (57.57) / C 89.52 (54.63) / D 89.01 (64.34)
  IP Shapes 3D, u8s8f32: A 44.43 (19.44) / B 44.22 (32.09) / C 45.83 (24.2) / D 44.40 (6.12)
  IP Shapes 1D, bf16bf16bf16: A 93.31 (46.42) / B 93.43 (43.74) / C 93.28 (59.18) / D 91.70 (42.74)
  IP Shapes 3D, bf16bf16bf16: A 55.93 (29.18) / B 56.06 (16.51) / C 56.73 (32.47) / D 55.58 (30.28)
  Convolution Batch Shapes Auto, f32: A 24.06 (12.14) / B 24.38 (14.3) / C 24.42 (14.77) / D 24.35 (14.81)
  Deconvolution Batch shapes_1d, f32: A 99.18 (6.42) / B 109.81 (24.19) / C 107.71 (23.72) / D 112.62 (65.19)
  Deconvolution Batch shapes_3d, f32: A 19.09 (2.83) / B 18.70 (11.49) / C 19.67 (3.02) / D 18.94 (16.51)
  Convolution Batch Shapes Auto, u8s8f32: A 23.74 (13.82) / B 23.77 (22.07) / C 24.06 (13.26) / D 23.79 (10.19)
  Deconvolution Batch shapes_1d, u8s8f32: A 50.2996 / B 50.1553 / C 0.774919 / D 54.0262 (only three minima reported: 9.93, 10.62, 10.84)
  Deconvolution Batch shapes_3d, u8s8f32: A 9.33345 / B 9.40424 / C 1.62907 / D 9.35235 (only three minima reported: 0.66, 8.8, 8.08)
  Recurrent Neural Network Training, f32: A 23584.2 (21773.9) / B 23366.8 (19714.8) / C 18304.9 (16470) / D 23371.5 (20963.9)
  Recurrent Neural Network Inference, f32: A 26069.6 (20933.3) / B 26549.2 (21084.1) / C 26033.7 (20193.5) / D 26872.5 (20170)
  Recurrent Neural Network Training, u8s8f32: A 23516.1 (21274.9) / B 22790.9 (19530.9) / C 22788.5 (19335.2) / D 23844.1 (22253.5)
  Convolution Batch Shapes Auto, bf16bf16bf16: A 27.00 (12.07) / B 25.89 (11.87) / C 26.18 (12.79) / D 25.94 (12.27)
  Deconvolution Batch shapes_1d, bf16bf16bf16: A 116.51 (32.08) / B 135.47 (51.31) / C 130.96 (35.29) / D 118.15 (16.46)
  Deconvolution Batch shapes_3d, bf16bf16bf16: A 22.40 (17.08) / B 23.90 (19.53) / C 22.76 (19.46) / D 23.02 (19.25)
  Recurrent Neural Network Inference, u8s8f32: A 28356.5 (24027) / B 27247.5 (22350) / C 28624.4 (25457.4) / D 28758.8 (23747)
  Matrix Multiply Batch Shapes Transformer, f32: A 35.20 (15.38) / B 36.36 (25.91) / C 26.99 (1.4) / D 31.27 (11.61)
  Recurrent Neural Network Training, bf16bf16bf16: A 22990.4 (19958.1) / B 22726.7 (19307.6) / C 22605.3 (19157.8) / D 23319.6 (20370.9)
  Recurrent Neural Network Inference, bf16bf16bf16: A 27155.0 (21711.8) / B 27353.6 (23193.9) / C 28513.6 (24411.3) / D 28429.2 (24528.9)
  Matrix Multiply Batch Shapes Transformer, u8s8f32: A 30.78 (21.88) / B 34.52 (16.34) / C 30.88 (10.41) / D 33.38 (15.35)
  Matrix Multiply Batch Shapes Transformer, bf16bf16bf16: A 37.79 (20.51) / B 39.05 (28.78) / C 37.78 (17.54) / D 37.55 (24.18)
  Compiler notes (all oneDNN tests): (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread
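Two of the C-run results above look implausible: the u8s8f32 deconvolution rows come in at 0.77 ms and 1.63 ms where the other runs report roughly 50 ms and 9.4 ms, and C's f32 RNN training time is also well below the rest. When eyeballing a result file like this, a quick per-test median check flags such outliers; a minimal sketch over a few rows hand-copied from above (illustrative only):

    # Minimal sketch: flag per-run outliers in the oneDNN results by
    # comparing each run's time to the per-test median. Values are
    # hand-copied from this result file (ms, fewer is better).
    from statistics import median

    results = {  # test -> (A, B, C, D)
        "Deconv shapes_1d u8s8f32": (50.2996, 50.1553, 0.774919, 54.0262),
        "Deconv shapes_3d u8s8f32": (9.33345, 9.40424, 1.62907, 9.35235),
        "RNN Training f32":         (23584.2, 23366.8, 18304.9, 23371.5),
        "MatMul Transformer f32":   (35.2021, 36.3562, 26.9910, 31.2678),
    }

    for test, times in results.items():
        med = median(times)
        for run, t in zip("ABCD", times):
            if abs(t - med) / med > 0.20:  # more than 20% off the median
                print(f"{test}: run {run} = {t} ms vs median {med:.4f} ms")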
Java JMH - Throughput (Ops/s; more is better): A 33246015808.70 / B 33247767440.99 / C 33236796629.92 / D 33246121586.08
ONNX Runtime 1.11 (Device: CPU; Inferences Per Minute; more is better; n/a = not run):
  GPT-2, Parallel executor: A 5799 / B 5795 / C 6114 / D 6072
  GPT-2, Standard executor: A 8749 / B 8812 / C 9595 / D 9591
  yolov4, Parallel executor: A 444 / B 448 / C 447 / D 449
  yolov4, Standard executor: A 665 / B 663 / C 680 / D 569
  bertsquad-12, Parallel executor: A 782 / B 774 / C 786 / D 796
  bertsquad-12, Standard executor: A 897 / B 911 / C 923 / D 897
  fcn-resnet101-11, Parallel executor: A 101 / B 102 / C 101 / D 101
  fcn-resnet101-11, Standard executor: A 148 / B 119 / C n/a / D 148
  ArcFace ResNet-100, Parallel executor: A 1464 / B 1474 / C n/a / D 1491
  ArcFace ResNet-100, Standard executor: A 2053 / B 1538 / C n/a / D 2014
  super-resolution-10, Parallel executor: A 5890 / B 5975 / C n/a / D 5969
  super-resolution-10, Standard executor: A 6168 / B 9884 / C n/a / D 10255
  Compiler notes (all ONNX Runtime tests): (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
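The Parallel/Standard executor split in the ONNX Runtime rows presumably corresponds to the runtime's two execution modes, which the Python API exposes as ORT_PARALLEL and ORT_SEQUENTIAL. A minimal timing sketch in that spirit; the model path is a placeholder, float32 input is assumed, and the loop is far cruder than the harness used above:

    # Minimal sketch: time a model under both ONNX Runtime execution
    # modes and report inferences per minute. "super-resolution-10.onnx"
    # is a hypothetical local path, not part of this result file.
    import time
    import numpy as np
    import onnxruntime as ort

    def inferences_per_minute(model_path, mode, seconds=10):
        opts = ort.SessionOptions()
        opts.execution_mode = mode
        sess = ort.InferenceSession(model_path, sess_options=opts,
                                    providers=["CPUExecutionProvider"])
        inp = sess.get_inputs()[0]
        # Substitute 1 for any symbolic/dynamic dimensions.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        x = np.random.rand(*shape).astype(np.float32)
        deadline = time.time() + seconds
        n = 0
        while time.time() < deadline:
            sess.run(None, {inp.name: x})
            n += 1
        return n * 60 / seconds

    for label, mode in [("Parallel", ort.ExecutionMode.ORT_PARALLEL),
                        ("Standard", ort.ExecutionMode.ORT_SEQUENTIAL)]:
        ipm = inferences_per_minute("super-resolution-10.onnx", mode)
        print(f"{label}: {ipm:.0f} inferences/min")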
Phoronix Test Suite v10.8.4