ONNX Runtime 1.14 Threadripper

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 23.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2302119-NE-ONNXRUNTI95&gru&rdt .
ONNX Runtime 1.14 Threadripper - System Details (shared by runs a, b, and c)

  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 128GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 23.04
  Kernel: 6.2.0-060200rc7daily20230206-generic (x86_64)
  Desktop: GNOME Shell 43.2
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 22.2.5 (LLVM 15.0.6 DRM 3.49)
  Compiler: GCC 12.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details:
  Transparent Huge Pages: madvise

Compiler Details:
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
  Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled)
  CPU Microcode: 0x8301055

Python Details:
  Python 3.11.1

Security Details:
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
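Note: the kernel and processor tunables listed above (Transparent Huge Pages mode, scaling governor, CPU microcode) can be read back on a similar Linux system from sysfs and procfs. The following is an illustrative sketch, not part of the original report; it assumes the standard paths used by recent kernels.

```python
# Illustrative sketch: read back the tunables listed in the system details above.
from pathlib import Path

def read_first_line(path):
    """Return the first line of a file, or None if it is missing/unreadable."""
    try:
        return Path(path).read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None

# Transparent Huge Pages mode; the active choice is shown in brackets,
# e.g. "always [madvise] never".
thp = read_first_line("/sys/kernel/mm/transparent_hugepage/enabled")

# CPUfreq scaling governor for CPU0 (e.g. "schedutil").
governor = read_first_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")

# CPU microcode revision as reported by the kernel (e.g. "0x8301055").
microcode = None
try:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("microcode"):
                microcode = line.split(":", 1)[1].strip()
                break
except OSError:
    pass

print(f"THP: {thp}")
print(f"Scaling governor: {governor}")
print(f"Microcode: {microcode}")
```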
ONNX Runtime 1.14 Threadripper - Results Overview

ONNX Runtime 1.14, Inferences Per Second (more is better):

  Model - Device - Executor                          a          b          c
  GPT-2 - CPU - Parallel                             108.782    80.5879    80.0299
  GPT-2 - CPU - Standard                             85.9929    63.9007    63.9074
  yolov4 - CPU - Parallel                            2.85784    2.77160    2.73687
  yolov4 - CPU - Standard                            4.81490    4.64766    4.65903
  bertsquad-12 - CPU - Parallel                      5.82423    5.74155    5.45685
  bertsquad-12 - CPU - Standard                      8.26847    8.16566    8.18333
  CaffeNet 12-int8 - CPU - Parallel                  171.131    171.318    169.072
  CaffeNet 12-int8 - CPU - Standard                  235.285    223.589    222.362
  fcn-resnet101-11 - CPU - Parallel                  0.814278   0.814379   0.805340
  fcn-resnet101-11 - CPU - Standard                  2.86976    2.86004    2.85854
  ArcFace ResNet-100 - CPU - Parallel                7.24304    7.27339    7.33023
  ArcFace ResNet-100 - CPU - Standard                13.2769    13.2565    13.2664
  ResNet50 v1-12-int8 - CPU - Parallel               51.4787    51.6498    51.0841
  ResNet50 v1-12-int8 - CPU - Standard               115.481    115.658    114.712
  super-resolution-10 - CPU - Parallel               86.3233    86.1680    87.1965
  super-resolution-10 - CPU - Standard               86.5947    86.1633    86.0900
  Faster R-CNN R-50-FPN-int8 - CPU - Parallel        21.4277    21.4906    21.5037
  Faster R-CNN R-50-FPN-int8 - CPU - Standard        25.7234    25.8279    25.7349

ONNX Runtime 1.14, Inference Time Cost in ms (fewer is better):

  Model - Device - Executor                          a          b          c
  GPT-2 - CPU - Parallel                             9.18459    12.4015    12.4876
  GPT-2 - CPU - Standard                             11.6246    15.6449    15.6436
  yolov4 - CPU - Parallel                            349.962    360.820    365.441
  yolov4 - CPU - Standard                            207.686    215.160    214.634
  bertsquad-12 - CPU - Parallel                      171.916    174.569    183.284
  bertsquad-12 - CPU - Standard                      120.939    122.460    122.197
  CaffeNet 12-int8 - CPU - Parallel                  5.84251    5.83457    5.91310
  CaffeNet 12-int8 - CPU - Standard                  4.24929    4.47101    4.49502
  fcn-resnet101-11 - CPU - Parallel                  1228.37    1228.24    1242.17
  fcn-resnet101-11 - CPU - Standard                  348.457    349.642    349.826
  ArcFace ResNet-100 - CPU - Parallel                138.204    137.548    136.457
  ArcFace ResNet-100 - CPU - Standard                75.3166    75.4319    75.3766
  ResNet50 v1-12-int8 - CPU - Parallel               19.4241    19.3620    19.5839
  ResNet50 v1-12-int8 - CPU - Standard               8.65766    8.64411    8.71611
  super-resolution-10 - CPU - Parallel               11.5836    11.6046    11.4672
  super-resolution-10 - CPU - Standard               11.5478    11.6058    11.6156
  Faster R-CNN R-50-FPN-int8 - CPU - Parallel        46.6658    46.5290    46.5003
  Faster R-CNN R-50-FPN-int8 - CPU - Standard        38.8813    38.7169    38.8606
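The two tables above describe the same runs in two units, so inferences per second and inference time cost should be roughly reciprocal. The short sketch below checks this with one value pair from the overview table; small deviations are expected because the two metrics are averaged separately across samples.

```python
# Quick sanity-check sketch: inferences per second vs. inference time cost (ms)
# for the same run should be roughly reciprocal. Values are taken from the
# overview table above (GPT-2 - CPU - Parallel, run "a").
reported_ips = 108.782   # inferences per second, run "a"
reported_ms = 9.18459    # inference time cost in ms, run "a"

derived_ms = 1000.0 / reported_ips
print(f"1000 / {reported_ips} = {derived_ms:.3f} ms (reported: {reported_ms} ms)")
# -> about 9.193 ms vs. the reported 9.18459 ms
```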
Detailed Results: ONNX Runtime 1.14, Inferences Per Second (more is better)

All per-model results below (this section and the inference-time section that follows) were built with: (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt

Model: GPT-2 - Device: CPU - Executor: Parallel
  a: 108.78 (SE +/- 0.16, N = 3)
  b: 80.59 (SE +/- 0.48, N = 3)
  c: 80.03 (SE +/- 0.35, N = 3)

Model: GPT-2 - Device: CPU - Executor: Standard
  a: 85.99 (SE +/- 0.06, N = 3)
  b: 63.90 (SE +/- 0.10, N = 3)
  c: 63.91 (SE +/- 0.15, N = 3)

Model: yolov4 - Device: CPU - Executor: Parallel
  a: 2.85784 (SE +/- 0.02529, N = 3)
  b: 2.77160 (SE +/- 0.01620, N = 3)
  c: 2.73687 (SE +/- 0.02646, N = 3)

Model: yolov4 - Device: CPU - Executor: Standard
  a: 4.81490 (SE +/- 0.00971, N = 3)
  b: 4.64766 (SE +/- 0.01149, N = 3)
  c: 4.65903 (SE +/- 0.00954, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Parallel
  a: 5.82423 (SE +/- 0.05655, N = 15)
  b: 5.74155 (SE +/- 0.07458, N = 15)
  c: 5.45685 (SE +/- 0.05278, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Standard
  a: 8.26847 (SE +/- 0.02175, N = 3)
  b: 8.16566 (SE +/- 0.00657, N = 3)
  c: 8.18333 (SE +/- 0.01448, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  a: 171.13 (SE +/- 2.16, N = 3)
  b: 171.32 (SE +/- 0.44, N = 3)
  c: 169.07 (SE +/- 1.38, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  a: 235.29 (SE +/- 2.31, N = 3)
  b: 223.59 (SE +/- 2.07, N = 3)
  c: 222.36 (SE +/- 0.47, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  a: 0.814278 (SE +/- 0.008965, N = 3)
  b: 0.814379 (SE +/- 0.009200, N = 3)
  c: 0.805340 (SE +/- 0.010935, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  a: 2.86976 (SE +/- 0.00149, N = 3)
  b: 2.86004 (SE +/- 0.00108, N = 3)
  c: 2.85854 (SE +/- 0.00456, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  a: 7.24304 (SE +/- 0.06246, N = 15)
  b: 7.27339 (SE +/- 0.07899, N = 5)
  c: 7.33023 (SE +/- 0.08929, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  a: 13.28 (SE +/- 0.04, N = 3)
  b: 13.26 (SE +/- 0.01, N = 3)
  c: 13.27 (SE +/- 0.03, N = 3)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  a: 51.48 (SE +/- 0.34, N = 3)
  b: 51.65 (SE +/- 0.51, N = 3)
  c: 51.08 (SE +/- 0.34, N = 15)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  a: 115.48 (SE +/- 0.08, N = 3)
  b: 115.66 (SE +/- 0.20, N = 3)
  c: 114.71 (SE +/- 0.74, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Parallel
  a: 86.32 (SE +/- 0.43, N = 3)
  b: 86.17 (SE +/- 0.49, N = 3)
  c: 87.20 (SE +/- 0.21, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Standard
  a: 86.59 (SE +/- 0.19, N = 3)
  b: 86.16 (SE +/- 0.34, N = 3)
  c: 86.09 (SE +/- 0.29, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  a: 21.43 (SE +/- 0.08, N = 3)
  b: 21.49 (SE +/- 0.06, N = 3)
  c: 21.50 (SE +/- 0.06, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  a: 25.72 (SE +/- 0.29, N = 3)
  b: 25.83 (SE +/- 0.15, N = 3)
  c: 25.73 (SE +/- 0.24, N = 3)
Detailed Results: ONNX Runtime 1.14, Inference Time Cost in ms (fewer is better)

Model: GPT-2 - Device: CPU - Executor: Parallel
  a: 9.18459 (SE +/- 0.01376, N = 3)
  b: 12.40150 (SE +/- 0.07467, N = 3)
  c: 12.48760 (SE +/- 0.05496, N = 3)

Model: GPT-2 - Device: CPU - Executor: Standard
  a: 11.62 (SE +/- 0.01, N = 3)
  b: 15.64 (SE +/- 0.02, N = 3)
  c: 15.64 (SE +/- 0.04, N = 3)

Model: yolov4 - Device: CPU - Executor: Parallel
  a: 349.96 (SE +/- 3.09, N = 3)
  b: 360.82 (SE +/- 2.12, N = 3)
  c: 365.44 (SE +/- 3.54, N = 3)

Model: yolov4 - Device: CPU - Executor: Standard
  a: 207.69 (SE +/- 0.42, N = 3)
  b: 215.16 (SE +/- 0.53, N = 3)
  c: 214.63 (SE +/- 0.44, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Parallel
  a: 171.92 (SE +/- 1.66, N = 15)
  b: 174.57 (SE +/- 2.24, N = 15)
  c: 183.28 (SE +/- 1.76, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Standard
  a: 120.94 (SE +/- 0.32, N = 3)
  b: 122.46 (SE +/- 0.10, N = 3)
  c: 122.20 (SE +/- 0.22, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  a: 5.84251 (SE +/- 0.07295, N = 3)
  b: 5.83457 (SE +/- 0.01470, N = 3)
  c: 5.91310 (SE +/- 0.04868, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  a: 4.24929 (SE +/- 0.04197, N = 3)
  b: 4.47101 (SE +/- 0.04177, N = 3)
  c: 4.49502 (SE +/- 0.00943, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  a: 1228.37 (SE +/- 13.43, N = 3)
  b: 1228.24 (SE +/- 14.04, N = 3)
  c: 1242.17 (SE +/- 17.10, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  a: 348.46 (SE +/- 0.18, N = 3)
  b: 349.64 (SE +/- 0.13, N = 3)
  c: 349.83 (SE +/- 0.56, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  a: 138.20 (SE +/- 1.20, N = 15)
  b: 137.55 (SE +/- 1.47, N = 5)
  c: 136.46 (SE +/- 1.65, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  a: 75.32 (SE +/- 0.20, N = 3)
  b: 75.43 (SE +/- 0.04, N = 3)
  c: 75.38 (SE +/- 0.20, N = 3)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  a: 19.42 (SE +/- 0.13, N = 3)
  b: 19.36 (SE +/- 0.19, N = 3)
  c: 19.58 (SE +/- 0.12, N = 15)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  a: 8.65766 (SE +/- 0.00596, N = 3)
  b: 8.64411 (SE +/- 0.01468, N = 3)
  c: 8.71611 (SE +/- 0.05600, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Parallel
  a: 11.58 (SE +/- 0.06, N = 3)
  b: 11.60 (SE +/- 0.07, N = 3)
  c: 11.47 (SE +/- 0.03, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Standard
  a: 11.55 (SE +/- 0.03, N = 3)
  b: 11.61 (SE +/- 0.05, N = 3)
  c: 11.62 (SE +/- 0.04, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  a: 46.67 (SE +/- 0.16, N = 3)
  b: 46.53 (SE +/- 0.13, N = 3)
  c: 46.50 (SE +/- 0.13, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  a: 38.88 (SE +/- 0.45, N = 3)
  b: 38.72 (SE +/- 0.23, N = 3)
  c: 38.86 (SE +/- 0.37, N = 3)
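The "Executor: Parallel" and "Executor: Standard" variants correspond to ONNX Runtime's two graph execution modes (parallel vs. sequential node execution). The benchmark itself runs ONNX Runtime's native C++ test harness, as the g++ options above indicate; the sketch below only illustrates the equivalent switch in the Python API, with a hypothetical model path and an all-zeros placeholder input.

```python
# Illustrative sketch (assumptions: "model.onnx" path and input contents are
# hypothetical) showing the execution-mode switch behind "Parallel" vs "Standard".
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# ORT_PARALLEL lets independent graph branches run concurrently ("Parallel");
# ORT_SEQUENTIAL executes nodes one at a time ("Standard").
opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL
# Thread counts are tunable as well; defaults are used here.
# opts.intra_op_num_threads = 64
# opts.inter_op_num_threads = 4

sess = ort.InferenceSession("model.onnx", sess_options=opts,
                            providers=["CPUExecutionProvider"])

inp = sess.get_inputs()[0]
# Replace dynamic (non-integer) dimensions with 1 and feed an all-zeros tensor.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.zeros(shape, dtype=np.float32)
outputs = sess.run(None, {inp.name: data})
```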
Phoronix Test Suite v10.8.5