onnx new
AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Pop 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402035-NE-ONNXNEW6040&grs&rdt .
onnx new (runs a, b, c, d)
  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 4 x 32GB DDR4-3000MT/s CMK64GX4M2D3000C16
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200
  OS: Pop 22.04
  Kernel: 6.6.6-76060606-generic (x86_64)
  Desktop: GNOME Shell 42.5
  Display Server: X Server 1.21.1.4
  OpenGL: 4.6 Mesa 23.3.2-1pop0~1704238321~22.04~36f1d0e (LLVM 15.0.7 DRM 3.54)
  Vulkan: 1.3.267
  Compiler: GCC 11.4.0
  File-System: ext4
  Screen Resolution: 3840x2160
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830107a
Python Details - Python 3.10.12
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
onnx new

Inferences Per Second, More Is Better
Test                                                a         b         c         d
onnx: T5 Encoder - CPU - Standard                   101.553   86.5898   76.0924   73.8745
onnx: T5 Encoder - CPU - Parallel                   158.943   132.114   119.847   115.971
onnx: GPT-2 - CPU - Parallel                        106.626   91.0383   81.4970   79.6284
onnx: GPT-2 - CPU - Standard                        84.988    74.7827   65.1860   63.7474
onnx: CaffeNet 12-int8 - CPU - Standard             262.065   231.654   223.993   226.280
onnx: yolov4 - CPU - Parallel                       3.41805   3.30419   3.30267   3.25386
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard   24.796    25.5728   25.8677   25.4426
onnx: ArcFace ResNet-100 - CPU - Parallel           7.4745    7.31859   7.21143   7.16765
onnx: bertsquad-12 - CPU - Parallel                 4.56634   4.42556   4.39522   4.40892
onnx: yolov4 - CPU - Standard                       5.98899   5.87005   5.82904   5.81121
onnx: fcn-resnet101-11 - CPU - Standard             2.98135   2.91718   2.91581   2.90364
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel   21.3852   21.0124   20.8684   20.8534
onnx: bertsquad-12 - CPU - Standard                 8.69295   8.63703   8.58440   8.48632
onnx: CaffeNet 12-int8 - CPU - Parallel             199.462   195.479   197.531   198.965
onnx: ResNet50 v1-12-int8 - CPU - Standard          119.152   118.037   117.048   117.504
onnx: ResNet50 v1-12-int8 - CPU - Parallel          48.2263   47.7198   47.5911   48.1204
onnx: fcn-resnet101-11 - CPU - Parallel             0.789445  0.798150  0.788713  0.795279
onnx: super-resolution-10 - CPU - Standard          89.5931   88.7364   89.0029   88.9273
onnx: ArcFace ResNet-100 - CPU - Standard           15.5479   15.4415   15.4128   15.4263
onnx: super-resolution-10 - CPU - Parallel          84.3254   84.2686   83.7961   84.0513

Inference Time Cost (ms), Fewer Is Better
Test                                                a         b         c         d
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard   40.3241   39.0979   38.6552   39.3208
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel   46.7559   47.5860   47.9147   47.9502
onnx: super-resolution-10 - CPU - Standard          11.1599   11.2692   11.2341   11.2437
onnx: super-resolution-10 - CPU - Parallel          11.8559   11.8641   11.9310   11.8946
onnx: ResNet50 v1-12-int8 - CPU - Standard          8.38855   8.46928   8.54047   8.50747
onnx: ResNet50 v1-12-int8 - CPU - Parallel          20.7291   20.9515   21.0075   20.7765
onnx: ArcFace ResNet-100 - CPU - Standard           64.313    64.7543   64.8772   64.8208
onnx: ArcFace ResNet-100 - CPU - Parallel           133.782   136.658   138.664   139.512
onnx: fcn-resnet101-11 - CPU - Standard             335.413   342.798   342.953   344.390
onnx: fcn-resnet101-11 - CPU - Parallel             1266.7    1253.22   1267.89   1257.91
onnx: CaffeNet 12-int8 - CPU - Standard             3.81235   4.31444   4.46068   4.41593
onnx: CaffeNet 12-int8 - CPU - Parallel             5.0096    5.11408   5.06250   5.02241
onnx: bertsquad-12 - CPU - Standard                 115.028   115.771   116.484   117.851
onnx: bertsquad-12 - CPU - Parallel                 218.988   225.956   227.636   226.820
onnx: T5 Encoder - CPU - Standard                   9.84382   11.5448   13.1389   13.5336
onnx: T5 Encoder - CPU - Parallel                   6.2888    7.56668   8.34118   8.62179
onnx: yolov4 - CPU - Standard                       166.968   170.353   171.549   172.083
onnx: yolov4 - CPU - Parallel                       292.555   302.645   302.824   307.339
onnx: GPT-2 - CPU - Standard                        11.7589   13.3654   15.3358   15.6783
onnx: GPT-2 - CPU - Parallel                        9.36878   10.9739   12.2602   12.5484
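To make the spread between runs easier to read, the following minimal sketch computes the percent change of runs b, c and d against run a for three of the throughput results listed above. The choice of run a as the baseline, the selection of tests, and the helper names `percent_change` and `changes_vs_run` are assumptions made for illustration; they are not part of the Phoronix Test Suite export.

```python
# Inferences per second for runs a-d, copied from the summary values above.
THROUGHPUT = {
    "T5 Encoder - CPU - Standard": {"a": 101.553, "b": 86.5898, "c": 76.0924, "d": 73.8745},
    "GPT-2 - CPU - Standard": {"a": 84.988, "b": 74.7827, "c": 65.1860, "d": 63.7474},
    "super-resolution-10 - CPU - Standard": {"a": 89.5931, "b": 88.7364, "c": 89.0029, "d": 88.9273},
}


def percent_change(baseline: float, value: float) -> float:
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def changes_vs_run(results: dict, baseline_run: str = "a") -> dict:
    """Map each test to the percent change of every other run against baseline_run."""
    out = {}
    for test, runs in results.items():
        base = runs[baseline_run]
        out[test] = {
            run: percent_change(base, value)
            for run, value in runs.items()
            if run != baseline_run
        }
    return out


if __name__ == "__main__":
    for test, deltas in changes_vs_run(THROUGHPUT).items():
        row = ", ".join(f"{run}: {delta:+.1f}%" for run, delta in deltas.items())
        print(f"{test}: {row}")
```

For this subset, run d is roughly 27% (T5 Encoder) and 25% (GPT-2) below run a, while super-resolution-10 changes by less than 1%.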
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 101.55   b: 86.59   c: 76.09   d: 73.87
  SE +/- 0.07, N = 3; SE +/- 0.23, N = 3; SE +/- 0.39, N = 3
  1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
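Each result reports a figure of the form "SE +/- x, N = n". The export does not say which estimator is used; assuming the conventional standard error of the mean (sample standard deviation divided by the square root of the number of trials), a minimal sketch looks like this. The trial values are hypothetical, since the export contains only the aggregated figures.

```python
import math
import statistics


def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-trial throughputs (inferences per second); not from the export.
hypothetical_trials = [101.4, 101.6, 101.7]

print(
    f"mean = {statistics.mean(hypothetical_trials):.2f}, "
    f"SE +/- {standard_error(hypothetical_trials):.2f}, "
    f"N = {len(hypothetical_trials)}"
)
```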
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 158.94   b: 132.11   c: 119.85   d: 115.97
  SE +/- 0.84, N = 3; SE +/- 0.08, N = 3; SE +/- 1.26, N = 3
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 106.63   b: 91.04   c: 81.50   d: 79.63
  SE +/- 0.26, N = 3; SE +/- 0.14, N = 3; SE +/- 0.24, N = 3
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 84.99   b: 74.78   c: 65.19   d: 63.75
  SE +/- 0.62, N = 3; SE +/- 0.72, N = 3; SE +/- 0.21, N = 3
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 262.07   b: 231.65   c: 223.99   d: 226.28
  SE +/- 2.43, N = 4; SE +/- 0.47, N = 3; SE +/- 0.82, N = 3
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 3.41805   b: 3.30419   c: 3.30267   d: 3.25386
  SE +/- 0.01341, N = 3; SE +/- 0.02977, N = 3; SE +/- 0.01952, N = 3
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 24.80   b: 25.57   c: 25.87   d: 25.44
  SE +/- 0.13, N = 3; SE +/- 0.13, N = 3; SE +/- 0.21, N = 9
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 7.47450   b: 7.31859   c: 7.21143   d: 7.16765
  SE +/- 0.07142, N = 3; SE +/- 0.01975, N = 3; SE +/- 0.02570, N = 3
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 4.56634   b: 4.42556   c: 4.39522   d: 4.40892
  SE +/- 0.01003, N = 3; SE +/- 0.03613, N = 9; SE +/- 0.02513, N = 3
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 5.98899   b: 5.87005   c: 5.82904   d: 5.81121
  SE +/- 0.01409, N = 3; SE +/- 0.00419, N = 3; SE +/- 0.02691, N = 3
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 2.98135   b: 2.91718   c: 2.91581   d: 2.90364
  SE +/- 0.00899, N = 3; SE +/- 0.00183, N = 3; SE +/- 0.00422, N = 3
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 21.39   b: 21.01   c: 20.87   d: 20.85
  SE +/- 0.02, N = 3; SE +/- 0.05, N = 3; SE +/- 0.08, N = 3
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 8.69295   b: 8.63703   c: 8.58440   d: 8.48632
  SE +/- 0.01301, N = 3; SE +/- 0.01460, N = 3; SE +/- 0.07957, N = 3
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 199.46   b: 195.48   c: 197.53   d: 198.97
  SE +/- 2.06, N = 5; SE +/- 1.45, N = 15; SE +/- 0.56, N = 3
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 119.15   b: 118.04   c: 117.05   d: 117.50
  SE +/- 0.78, N = 3; SE +/- 0.34, N = 3; SE +/- 0.50, N = 3
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 48.23   b: 47.72   c: 47.59   d: 48.12
  SE +/- 0.26, N = 3; SE +/- 0.21, N = 3; SE +/- 0.10, N = 3
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 0.789445   b: 0.798150   c: 0.788713   d: 0.795279
  SE +/- 0.009251, N = 3; SE +/- 0.001362, N = 3; SE +/- 0.011220, N = 3
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 89.59   b: 88.74   c: 89.00   d: 88.93
  SE +/- 0.77, N = 3; SE +/- 0.28, N = 3; SE +/- 0.31, N = 3
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inferences Per Second, More Is Better
  a: 15.55   b: 15.44   c: 15.41   d: 15.43
  SE +/- 0.02, N = 3; SE +/- 0.03, N = 3; SE +/- 0.05, N = 3
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inferences Per Second, More Is Better
  a: 84.33   b: 84.27   c: 83.80   d: 84.05
  SE +/- 0.23, N = 3; SE +/- 0.03, N = 3; SE +/- 0.11, N = 3
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 40.32   b: 39.10   c: 38.66   d: 39.32
  SE +/- 0.19, N = 3; SE +/- 0.20, N = 3; SE +/- 0.34, N = 9
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 46.76   b: 47.59   c: 47.91   d: 47.95
  SE +/- 0.06, N = 3; SE +/- 0.12, N = 3; SE +/- 0.20, N = 3
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 11.16   b: 11.27   c: 11.23   d: 11.24
  SE +/- 0.10, N = 3; SE +/- 0.04, N = 3; SE +/- 0.04, N = 3
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 11.86   b: 11.86   c: 11.93   d: 11.89
  SE +/- 0.03, N = 3; SE +/- 0.00, N = 3; SE +/- 0.02, N = 3
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 8.38855   b: 8.46928   c: 8.54047   d: 8.50747
  SE +/- 0.05653, N = 3; SE +/- 0.02479, N = 3; SE +/- 0.03634, N = 3
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 20.73   b: 20.95   c: 21.01   d: 20.78
  SE +/- 0.11, N = 3; SE +/- 0.09, N = 3; SE +/- 0.04, N = 3
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 64.31   b: 64.75   c: 64.88   d: 64.82
  SE +/- 0.09, N = 3; SE +/- 0.12, N = 3; SE +/- 0.19, N = 3
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 133.78   b: 136.66   c: 138.66   d: 139.51
  SE +/- 1.33, N = 3; SE +/- 0.38, N = 3; SE +/- 0.50, N = 3
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 335.41   b: 342.80   c: 342.95   d: 344.39
  SE +/- 1.06, N = 3; SE +/- 0.21, N = 3; SE +/- 0.50, N = 3
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 1266.70   b: 1253.22   c: 1267.89   d: 1257.91
  SE +/- 14.56, N = 3; SE +/- 2.19, N = 3; SE +/- 17.69, N = 3
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 3.81235   b: 4.31444   c: 4.46068   d: 4.41593
  SE +/- 0.04592, N = 4; SE +/- 0.00933, N = 3; SE +/- 0.01606, N = 3
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 5.00960   b: 5.11408   c: 5.06250   d: 5.02241
  SE +/- 0.05402, N = 5; SE +/- 0.03696, N = 15; SE +/- 0.01436, N = 3
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 115.03   b: 115.77   c: 116.48   d: 117.85
  SE +/- 0.17, N = 3; SE +/- 0.20, N = 3; SE +/- 1.11, N = 3
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 218.99   b: 225.96   c: 227.64   d: 226.82
  SE +/- 0.51, N = 3; SE +/- 1.88, N = 9; SE +/- 1.30, N = 3
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 9.84382   b: 11.54480   c: 13.13890   d: 13.53360
  SE +/- 0.00918, N = 3; SE +/- 0.03970, N = 3; SE +/- 0.07065, N = 3
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 6.28880   b: 7.56668   c: 8.34118   d: 8.62179
  SE +/- 0.04827, N = 3; SE +/- 0.00537, N = 3; SE +/- 0.09288, N = 3
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 166.97   b: 170.35   c: 171.55   d: 172.08
  SE +/- 0.41, N = 3; SE +/- 0.12, N = 3; SE +/- 0.80, N = 3
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 292.56   b: 302.65   c: 302.82   d: 307.34
  SE +/- 1.22, N = 3; SE +/- 2.72, N = 3; SE +/- 1.84, N = 3
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better
  a: 11.76   b: 13.37   c: 15.34   d: 15.68
  SE +/- 0.11, N = 3; SE +/- 0.17, N = 3; SE +/- 0.05, N = 3
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better
  a: 9.36878   b: 10.97390   c: 12.26020   d: 12.54840
  SE +/- 0.03121, N = 3; SE +/- 0.02097, N = 3; SE +/- 0.03777, N = 3
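Every model and executor combination appears twice in these results, once as inferences per second and once as inference time cost in milliseconds. The two figures appear to be close to reciprocal (1000 / ms is roughly the reported inferences per second). The sketch below checks that for three run-a results copied from above. The helper names and the 1% tolerance are assumptions for illustration, and small gaps are expected if each metric is averaged separately across trials.

```python
# (inferences per second, inference time cost in ms) for run a, from the results above.
RUN_A = {
    "T5 Encoder - CPU - Standard": (101.553, 9.84382),
    "GPT-2 - CPU - Parallel": (106.626, 9.36878),
    "fcn-resnet101-11 - CPU - Parallel": (0.789445, 1266.7),
}


def implied_throughput(latency_ms: float) -> float:
    """Inferences per second implied by a per-inference latency in milliseconds."""
    return 1000.0 / latency_ms


def relative_gap(reported_ips: float, latency_ms: float) -> float:
    """Relative difference between the reported and the implied throughput."""
    implied = implied_throughput(latency_ms)
    return abs(implied - reported_ips) / reported_ips


if __name__ == "__main__":
    for test, (ips, ms) in RUN_A.items():
        print(
            f"{test}: reported {ips:.3f}/s, "
            f"implied {implied_throughput(ms):.3f}/s, "
            f"gap {relative_gap(ips, ms) * 100:.2f}%"
        )
```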
Phoronix Test Suite v10.8.5