ONNX Runtime 1.14 Threadripper: AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 23.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2302119-NE-ONNXRUNTI95.
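This result file is a data export and does not include the harness source. As a rough illustration of what each "Executor: Parallel" versus "Executor: Standard" configuration measures, a minimal ONNX Runtime CPU benchmarking sketch in Python could look like the following; the model path, input name, input shape, warm-up count, and iteration count are placeholders, not the values used by the actual test profile:

    # Minimal sketch of a CPU inference-throughput measurement with ONNX Runtime.
    # Assumptions: onnxruntime >= 1.14 is installed, "model.onnx" is a placeholder
    # path, and the dummy input name/shape must be adapted to the real model.
    import time
    import numpy as np
    import onnxruntime as ort

    def benchmark(model_path, input_feed, executor="Standard", iterations=100):
        opts = ort.SessionOptions()
        # "Parallel" vs "Standard" in the tables below is interpreted here as the
        # session execution mode: ORT_PARALLEL runs independent graph branches
        # concurrently, ORT_SEQUENTIAL (the default) runs one node at a time.
        opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL
                               if executor == "Parallel"
                               else ort.ExecutionMode.ORT_SEQUENTIAL)
        sess = ort.InferenceSession(model_path, opts,
                                    providers=["CPUExecutionProvider"])

        for _ in range(10):                      # warm-up runs, not timed
            sess.run(None, input_feed)

        start = time.perf_counter()
        for _ in range(iterations):
            sess.run(None, input_feed)
        elapsed = time.perf_counter() - start

        mean_ms = 1000.0 * elapsed / iterations
        return mean_ms, 1000.0 / mean_ms         # latency (ms), inferences/sec

    # Example with a placeholder input; real input names and shapes depend on the model.
    feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
    # latency_ms, ips = benchmark("model.onnx", feed, executor="Parallel")

The actual runs were driven by the Phoronix Test Suite's ONNX Runtime test profile; the sketch above only mirrors the Parallel/Standard split reported in the tables below.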
ONNX Runtime 1.14 Threadripper system configuration (runs a, b, and c used the same hardware and software):

  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 128GB
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 23.04
  Kernel: 6.2.0-060200rc7daily20230206-generic (x86_64)
  Desktop: GNOME Shell 43.2
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 22.2.5 (LLVM 15.0.6 DRM 3.49)
  Compiler: GCC 12.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

  Kernel Details: Transparent Huge Pages: madvise
  Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301055
  Python Details: Python 3.11.1
  Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT enabled with STIBP protection; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
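When reproducing a comparable environment, the Processor Details and Security Details fields above can be inspected directly on a Linux host. A small sketch, assuming a standard sysfs layout (paths may differ on other systems):

    # Sketch: read the CPU frequency scaling governor and kernel vulnerability
    # mitigations, mirroring the "Processor Details" and "Security Details" fields.
    from pathlib import Path

    governor = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor").read_text().strip()
    print("Scaling governor:", governor)

    vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
    for entry in sorted(vuln_dir.iterdir()):
        print(f"{entry.name}: {entry.read_text().strip()}")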
ONNX Runtime 1.14 Threadripper result summary for runs a / b / c. Inferences Per Second (IPS): higher is better; Inference Time Cost (ms): lower is better.

  GPT-2 - CPU - Parallel: IPS 108.782 / 80.5879 / 80.0299; Time (ms) 9.18459 / 12.4015 / 12.4876
  GPT-2 - CPU - Standard: IPS 85.9929 / 63.9007 / 63.9074; Time (ms) 11.6246 / 15.6449 / 15.6436
  yolov4 - CPU - Parallel: IPS 2.85784 / 2.77160 / 2.73687; Time (ms) 349.962 / 360.820 / 365.441
  yolov4 - CPU - Standard: IPS 4.81490 / 4.64766 / 4.65903; Time (ms) 207.686 / 215.160 / 214.634
  bertsquad-12 - CPU - Parallel: IPS 5.82423 / 5.74155 / 5.45685; Time (ms) 171.916 / 174.569 / 183.284
  bertsquad-12 - CPU - Standard: IPS 8.26847 / 8.16566 / 8.18333; Time (ms) 120.939 / 122.460 / 122.197
  CaffeNet 12-int8 - CPU - Parallel: IPS 171.131 / 171.318 / 169.072; Time (ms) 5.84251 / 5.83457 / 5.91310
  CaffeNet 12-int8 - CPU - Standard: IPS 235.285 / 223.589 / 222.362; Time (ms) 4.24929 / 4.47101 / 4.49502
  fcn-resnet101-11 - CPU - Parallel: IPS 0.814278 / 0.814379 / 0.805340; Time (ms) 1228.37 / 1228.24 / 1242.17
  fcn-resnet101-11 - CPU - Standard: IPS 2.86976 / 2.86004 / 2.85854; Time (ms) 348.457 / 349.642 / 349.826
  ArcFace ResNet-100 - CPU - Parallel: IPS 7.24304 / 7.27339 / 7.33023; Time (ms) 138.204 / 137.548 / 136.457
  ArcFace ResNet-100 - CPU - Standard: IPS 13.2769 / 13.2565 / 13.2664; Time (ms) 75.3166 / 75.4319 / 75.3766
  ResNet50 v1-12-int8 - CPU - Parallel: IPS 51.4787 / 51.6498 / 51.0841; Time (ms) 19.4241 / 19.3620 / 19.5839
  ResNet50 v1-12-int8 - CPU - Standard: IPS 115.481 / 115.658 / 114.712; Time (ms) 8.65766 / 8.64411 / 8.71611
  super-resolution-10 - CPU - Parallel: IPS 86.3233 / 86.1680 / 87.1965; Time (ms) 11.5836 / 11.6046 / 11.4672
  super-resolution-10 - CPU - Standard: IPS 86.5947 / 86.1633 / 86.0900; Time (ms) 11.5478 / 11.6058 / 11.6156
  Faster R-CNN R-50-FPN-int8 - CPU - Parallel: IPS 21.4277 / 21.4906 / 21.5037; Time (ms) 46.6658 / 46.5290 / 46.5003
  Faster R-CNN R-50-FPN-int8 - CPU - Standard: IPS 25.7234 / 25.8279 / 25.7349; Time (ms) 38.8813 / 38.7169 / 38.8606
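The two metrics reported for every configuration are reciprocals of one another: inferences per second is approximately 1000 divided by the mean inference time in milliseconds, with small differences from rounding and per-run averaging. A quick check against the table above:

    # Quick consistency check, assuming the two reported metrics are reciprocal:
    # inferences per second ~= 1000 / mean inference time in ms.
    def ips_from_latency_ms(latency_ms):
        return 1000.0 / latency_ms

    # GPT-2 - CPU - Parallel, run a: ~108.88 vs the reported 108.782
    print(ips_from_latency_ms(9.18459))
    # fcn-resnet101-11 - CPU - Parallel, run a: ~0.814 vs the reported 0.814278
    print(ips_from_latency_ms(1228.37))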
ONNX Runtime 1.14 detailed results. Each entry lists the per-run mean for a, b, and c with its standard error (SE) and sample count (N). Inferences Per Second: more is better; Inference Time Cost (ms): fewer is better. All binaries were built with (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt.

Model: GPT-2 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 108.78 (SE +/- 0.16, N = 3); b 80.59 (SE +/- 0.48, N = 3); c 80.03 (SE +/- 0.35, N = 3)
  Inference Time Cost (ms): a 9.18459 (SE +/- 0.01376, N = 3); b 12.40150 (SE +/- 0.07467, N = 3); c 12.48760 (SE +/- 0.05496, N = 3)

Model: GPT-2 - Device: CPU - Executor: Standard
  Inferences Per Second: a 85.99 (SE +/- 0.06, N = 3); b 63.90 (SE +/- 0.10, N = 3); c 63.91 (SE +/- 0.15, N = 3)
  Inference Time Cost (ms): a 11.62 (SE +/- 0.01, N = 3); b 15.64 (SE +/- 0.02, N = 3); c 15.64 (SE +/- 0.04, N = 3)

Model: yolov4 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 2.85784 (SE +/- 0.02529, N = 3); b 2.77160 (SE +/- 0.01620, N = 3); c 2.73687 (SE +/- 0.02646, N = 3)
  Inference Time Cost (ms): a 349.96 (SE +/- 3.09, N = 3); b 360.82 (SE +/- 2.12, N = 3); c 365.44 (SE +/- 3.54, N = 3)

Model: yolov4 - Device: CPU - Executor: Standard
  Inferences Per Second: a 4.81490 (SE +/- 0.00971, N = 3); b 4.64766 (SE +/- 0.01149, N = 3); c 4.65903 (SE +/- 0.00954, N = 3)
  Inference Time Cost (ms): a 207.69 (SE +/- 0.42, N = 3); b 215.16 (SE +/- 0.53, N = 3); c 214.63 (SE +/- 0.44, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 5.82423 (SE +/- 0.05655, N = 15); b 5.74155 (SE +/- 0.07458, N = 15); c 5.45685 (SE +/- 0.05278, N = 3)
  Inference Time Cost (ms): a 171.92 (SE +/- 1.66, N = 15); b 174.57 (SE +/- 2.24, N = 15); c 183.28 (SE +/- 1.76, N = 3)

Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inferences Per Second: a 8.26847 (SE +/- 0.02175, N = 3); b 8.16566 (SE +/- 0.00657, N = 3); c 8.18333 (SE +/- 0.01448, N = 3)
  Inference Time Cost (ms): a 120.94 (SE +/- 0.32, N = 3); b 122.46 (SE +/- 0.10, N = 3); c 122.20 (SE +/- 0.22, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 171.13 (SE +/- 2.16, N = 3); b 171.32 (SE +/- 0.44, N = 3); c 169.07 (SE +/- 1.38, N = 3)
  Inference Time Cost (ms): a 5.84251 (SE +/- 0.07295, N = 3); b 5.83457 (SE +/- 0.01470, N = 3); c 5.91310 (SE +/- 0.04868, N = 3)

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inferences Per Second: a 235.29 (SE +/- 2.31, N = 3); b 223.59 (SE +/- 2.07, N = 3); c 222.36 (SE +/- 0.47, N = 3)
  Inference Time Cost (ms): a 4.24929 (SE +/- 0.04197, N = 3); b 4.47101 (SE +/- 0.04177, N = 3); c 4.49502 (SE +/- 0.00943, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 0.814278 (SE +/- 0.008965, N = 3); b 0.814379 (SE +/- 0.009200, N = 3); c 0.805340 (SE +/- 0.010935, N = 3)
  Inference Time Cost (ms): a 1228.37 (SE +/- 13.43, N = 3); b 1228.24 (SE +/- 14.04, N = 3); c 1242.17 (SE +/- 17.10, N = 3)

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inferences Per Second: a 2.86976 (SE +/- 0.00149, N = 3); b 2.86004 (SE +/- 0.00108, N = 3); c 2.85854 (SE +/- 0.00456, N = 3)
  Inference Time Cost (ms): a 348.46 (SE +/- 0.18, N = 3); b 349.64 (SE +/- 0.13, N = 3); c 349.83 (SE +/- 0.56, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 7.24304 (SE +/- 0.06246, N = 15); b 7.27339 (SE +/- 0.07899, N = 5); c 7.33023 (SE +/- 0.08929, N = 3)
  Inference Time Cost (ms): a 138.20 (SE +/- 1.20, N = 15); b 137.55 (SE +/- 1.47, N = 5); c 136.46 (SE +/- 1.65, N = 3)

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inferences Per Second: a 13.28 (SE +/- 0.04, N = 3); b 13.26 (SE +/- 0.01, N = 3); c 13.27 (SE +/- 0.03, N = 3)
  Inference Time Cost (ms): a 75.32 (SE +/- 0.20, N = 3); b 75.43 (SE +/- 0.04, N = 3); c 75.38 (SE +/- 0.20, N = 3)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 51.48 (SE +/- 0.34, N = 3); b 51.65 (SE +/- 0.51, N = 3); c 51.08 (SE +/- 0.34, N = 15)
  Inference Time Cost (ms): a 19.42 (SE +/- 0.13, N = 3); b 19.36 (SE +/- 0.19, N = 3); c 19.58 (SE +/- 0.12, N = 15)

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inferences Per Second: a 115.48 (SE +/- 0.08, N = 3); b 115.66 (SE +/- 0.20, N = 3); c 114.71 (SE +/- 0.74, N = 3)
  Inference Time Cost (ms): a 8.65766 (SE +/- 0.00596, N = 3); b 8.64411 (SE +/- 0.01468, N = 3); c 8.71611 (SE +/- 0.05600, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 86.32 (SE +/- 0.43, N = 3); b 86.17 (SE +/- 0.49, N = 3); c 87.20 (SE +/- 0.21, N = 3)
  Inference Time Cost (ms): a 11.58 (SE +/- 0.06, N = 3); b 11.60 (SE +/- 0.07, N = 3); c 11.47 (SE +/- 0.03, N = 3)

Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inferences Per Second: a 86.59 (SE +/- 0.19, N = 3); b 86.16 (SE +/- 0.34, N = 3); c 86.09 (SE +/- 0.29, N = 3)
  Inference Time Cost (ms): a 11.55 (SE +/- 0.03, N = 3); b 11.61 (SE +/- 0.05, N = 3); c 11.62 (SE +/- 0.04, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inferences Per Second: a 21.43 (SE +/- 0.08, N = 3); b 21.49 (SE +/- 0.06, N = 3); c 21.50 (SE +/- 0.06, N = 3)
  Inference Time Cost (ms): a 46.67 (SE +/- 0.16, N = 3); b 46.53 (SE +/- 0.13, N = 3); c 46.50 (SE +/- 0.13, N = 3)

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inferences Per Second: a 25.72 (SE +/- 0.29, N = 3); b 25.83 (SE +/- 0.15, N = 3); c 25.73 (SE +/- 0.24, N = 3)
  Inference Time Cost (ms): a 38.88 (SE +/- 0.45, N = 3); b 38.72 (SE +/- 0.23, N = 3); c 38.86 (SE +/- 0.37, N = 3)
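Each entry above reports a mean with its standard error over N runs. As an illustrative sketch (not the Phoronix Test Suite's own implementation), the same statistics can be derived from repeated per-run samples like this:

    # Sketch: compute a mean and standard error of the mean over N repeated runs,
    # matching the "value (SE +/- x, N = y)" format used in the detailed results.
    import math

    def mean_and_standard_error(samples):
        n = len(samples)
        mean = sum(samples) / n
        variance = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
        return mean, math.sqrt(variance / n), n                     # SE of the mean

    # Hypothetical per-run throughput samples for one configuration:
    mean, se, n = mean_and_standard_error([108.6, 108.9, 108.8])
    print(f"{mean:.2f} (SE +/- {se:.2f}, N = {n})")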
Phoronix Test Suite v10.8.5