10980XE onednn onnx
Intel Core i9-10980XE testing with an ASRock X299 Steel Legend (P1.30 BIOS) and NVIDIA GeForce GTX 1080 Ti 11GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2204024-PTS-10980XEO27&grs&sro.
System details (identical for configurations A, B, C, and D):
  Processor: Intel Core i9-10980XE @ 4.80GHz (18 Cores / 36 Threads)
  Motherboard: ASRock X299 Steel Legend (P1.30 BIOS)
  Chipset: Intel Sky Lake-E DMI3 Registers
  Memory: 32GB
  Disk: Samsung SSD 970 PRO 512GB
  Graphics: NVIDIA GeForce GTX 1080 Ti 11GB
  Audio: Realtek ALC1220
  Monitor: ASUS VP28U
  Network: Intel I219-V + Intel I211
  OS: Ubuntu 22.04
  Kernel: 5.15.0-17-generic (x86_64)
  Desktop: GNOME Shell 40.5
  Display Server: X Server 1.20.13
  Display Driver: NVIDIA 495.46
  OpenGL: 4.6.0
  OpenCL: OpenCL 3.0 CUDA 11.5.103
  Vulkan: 1.2.186
  Compiler: GCC 11.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-pWTZs6/gcc-11-11.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_cpufreq schedutil - CPU Microcode: 0x5003102
Java Details: OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1)
Python Details: Python 3.9.9
Security Details:
  itlb_multihit: KVM: Mitigation of VMX disabled
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Mitigation of TSX disabled
Result overview (configurations A, B, C, and D): the per-test results follow below.
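The result URL above sorts by geometric result summary ("grs"), i.e. a relative comparison across all tests. As a minimal sketch of how such a comparison can be computed (this is not the exact openbenchmarking.org formula), the snippet below normalizes two sample results from this report, config D against config A, inverting the ratio for "fewer is better" tests, and takes the geometric mean:

```python
from math import prod

# Sample results from this report: (config A value, config D value, higher_is_better)
results = [
    (50.2996, 54.0262, False),  # oneDNN Deconvolution shapes_1d u8s8f32, ms (fewer is better)
    (8749.0, 9591.0, True),     # ONNX Runtime GPT-2 Standard, inferences/min (more is better)
]

def relative_speedup(a: float, d: float, higher_is_better: bool) -> float:
    """Ratio > 1.0 means config D outperforms config A on this test."""
    return d / a if higher_is_better else a / d

ratios = [relative_speedup(a, d, hib) for a, d, hib in results]
# Geometric mean keeps one dominant test from swamping the summary.
geo_mean = prod(ratios) ** (1 / len(ratios))
print(f"D vs A geometric mean: {geo_mean:.4f}")  # ~1.01: D is roughly on par with A here
```

Extending `results` with every row of the tables below would reproduce a summary in the same spirit as the "grs" ordering.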
Compiler notes (common to all runs of each suite):
  oneDNN: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -std=c++11 -pie -ldl -lpthread
  ONNX Runtime: (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto -fno-fat-lto-objects -ldl -lrt
  perf-bench: (CC) gcc options: -O6 -ggdb3 -funwind-tables -std=gnu99 -lunwind-x86_64 -lunwind -llzma -Xlinker -lpthread -lrt -lm -ldl -lelf -lslang -lz -lnuma

oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 50.2996  B: 50.1553  C: 0.774919  D: 54.0262  (MIN: 9.93 / 10.62 / 10.84)

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 9.33345  B: 9.40424  C: 1.62907  D: 9.35235  (MIN: 0.66 / 8.8 / 8.08)

speedtest-cli 2.1.3 - Internet Latency (ms, fewer is better)
  A: 23.87  B: 15.49  C: 28.78  D: 22.68

ONNX Runtime 1.11 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 6168  B: 9884  D: 10255

fast-cli - Internet Latency (ms, fewer is better)
  A: 8  B: 8  C: 13  D: 10

perf-bench - Benchmark: Sched Pipe (ops/sec, more is better)
  A: 86982  B: 78301  C: 111014  D: 76237

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 35.20  B: 36.36  C: 26.99  D: 31.27  (MIN: 15.38 / 25.91 / 1.4 / 11.61)

ONNX Runtime 1.11 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 2053  B: 1538  D: 2014

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 23584.2  B: 23366.8  C: 18304.9  D: 23371.5  (MIN: 21773.9 / 19714.8 / 16470 / 20963.9)

ONNX Runtime 1.11 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 148  B: 119  D: 148

ONNX Runtime 1.11 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 665  B: 663  C: 680  D: 569

fast-cli - Internet Upload Speed (Mbit/s, more is better)
  A: 6.7  B: 6.9  C: 7.9  D: 6.8

speedtest-cli 2.1.3 - Internet Download Speed (Mbit/s, more is better)
  A: 324.65  B: 321.74  C: 280.18  D: 327.12
oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 116.51  B: 135.47  C: 130.96  D: 118.15  (MIN: 32.08 / 51.31 / 35.29 / 16.46)

perf-bench - Benchmark: Epoll Wait (ops/sec, more is better)
  A: 31157  B: 30615  C: 35556  D: 35205

fast-cli - Internet Loaded Latency (Bufferbloat) (ms, fewer is better)
  A: 73  B: 64  C: 64  D: 70

oneDNN 2.6 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 99.18  B: 109.81  C: 107.71  D: 112.62  (MIN: 6.42 / 24.19 / 23.72 / 65.19)

speedtest-cli 2.1.3 - Internet Upload Speed (Mbit/s, more is better)
  A: 9.47  B: 9.22  C: 8.40  D: 8.95

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 30.78  B: 34.52  C: 30.88  D: 33.38  (MIN: 21.88 / 16.34 / 10.41 / 15.35)

perf-bench - Benchmark: Futex Lock-Pi (ops/sec, more is better)
  A: 236  B: 215  C: 222  D: 231

ONNX Runtime 1.11 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 8749  B: 8812  C: 9595  D: 9591

fast-cli - Internet Download Speed (Mbit/s, more is better)
  A: 370  B: 370  C: 390  D: 360

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 22.40  B: 23.90  C: 22.76  D: 23.02  (MIN: 17.08 / 19.53 / 19.46 / 19.25)

oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 83.90  B: 88.36  C: 89.52  D: 89.01  (MIN: 35.65 / 57.57 / 54.63 / 64.34)

perf-bench - Benchmark: Memcpy 1MB (GB/sec, more is better)
  A: 17.38  B: 16.77  C: 17.78  D: 17.28
oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 28356.5  B: 27247.5  C: 28624.4  D: 28758.8  (MIN: 24027 / 22350 / 25457.4 / 23747)

ONNX Runtime 1.11 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 5799  B: 5795  C: 6114  D: 6072

perf-bench - Benchmark: Memset 1MB (GB/sec, more is better)
  A: 66.06  B: 68.32  C: 69.50  D: 68.90

oneDNN 2.6 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 19.09  B: 18.70  C: 19.67  D: 18.94  (MIN: 2.83 / 11.49 / 3.02 / 16.51)

oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 27155.0  B: 27353.6  C: 28513.6  D: 28429.2  (MIN: 21711.8 / 23193.9 / 24411.3 / 24528.9)

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 23516.1  B: 22790.9  C: 22788.5  D: 23844.1  (MIN: 21274.9 / 19530.9 / 19335.2 / 22253.5)

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 27.00  B: 25.89  C: 26.18  D: 25.94  (MIN: 12.07 / 11.87 / 12.79 / 12.27)

oneDNN 2.6 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 37.79  B: 39.05  C: 37.78  D: 37.55  (MIN: 20.51 / 28.78 / 17.54 / 24.18)

oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 87.07  B: 89.02  C: 87.06  D: 90.32  (MIN: 32.83 / 33.04 / 49.86 / 70.91)

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 44.43  B: 44.22  C: 45.83  D: 44.40  (MIN: 19.44 / 32.09 / 24.2 / 6.12)

oneDNN 2.6 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 26069.6  B: 26549.2  C: 26033.7  D: 26872.5  (MIN: 20933.3 / 21084.1 / 20193.5 / 20170)

oneDNN 2.6 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 22990.4  B: 22726.7  C: 22605.3  D: 23319.6  (MIN: 19958.1 / 19307.6 / 19157.8 / 20370.9)
ONNX Runtime 1.11 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Minute, more is better)
  A: 897  B: 911  C: 923  D: 897

ONNX Runtime 1.11 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 782  B: 774  C: 786  D: 796

perf-bench - Benchmark: Syscall Basic (ops/sec, more is better)
  A: 17140545  B: 17221544  C: 17145832  D: 17504146

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 58.08  B: 58.29  C: 57.25  D: 58.44  (MIN: 42.18 / 50.37 / 14.92 / 39.61)

oneDNN 2.6 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 55.93  B: 56.06  C: 56.73  D: 55.58  (MIN: 29.18 / 16.51 / 32.47 / 30.28)

oneDNN 2.6 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, fewer is better)
  A: 93.31  B: 93.43  C: 93.28  D: 91.70  (MIN: 46.42 / 43.74 / 59.18 / 42.74)

ONNX Runtime 1.11 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 1464  B: 1474  D: 1491

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, fewer is better)
  A: 24.06  B: 24.38  C: 24.42  D: 24.35  (MIN: 12.14 / 14.3 / 14.77 / 14.81)

ONNX Runtime 1.11 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 5890  B: 5975  D: 5969

oneDNN 2.6 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, fewer is better)
  A: 23.74  B: 23.77  C: 24.06  D: 23.79  (MIN: 13.82 / 22.07 / 13.26 / 10.19)

ONNX Runtime 1.11 - Model: yolov4 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 444  B: 448  C: 447  D: 449

ONNX Runtime 1.11 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inferences Per Minute, more is better)
  A: 101  B: 102  C: 101  D: 101

perf-bench - Benchmark: Futex Hash (ops/sec, more is better)
  A: 4494394  B: 4493415  C: 4513944  D: 4499081

Java JMH - Throughput (Ops/s, more is better)
  A: 33246015808.70  B: 33247767440.99  C: 33236796629.92  D: 33246121586.08
Phoronix Test Suite v10.8.4