onnx runtime 1.14 threadripper: AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and an AMD Radeon RX 5700 8GB on Ubuntu 23.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2302119-NE-ONNXRUNTI95&grs&rdt.
onnx runtime 1.14 threadripper - system configuration (identical for runs a, b, and c):

Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
Chipset: AMD Starship/Matisse
Memory: 128GB
Disk: Samsung SSD 970 EVO Plus 500GB
Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
Audio: AMD Navi 10 HDMI Audio
Monitor: DELL P2415Q
Network: Intel I211 + Intel Wi-Fi 6 AX200
OS: Ubuntu 23.04
Kernel: 6.2.0-060200rc7daily20230206-generic (x86_64)
Desktop: GNOME Shell 43.2
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 22.2.5 (LLVM 15.0.6 DRM 3.49)
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 3840x2160

OpenBenchmarking.org notes:
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x8301055
Python Details - Python 3.11.1
Security Details - itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT enabled with STIBP protection; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
onnx runtime 1.14 threadripper - result summary (values for runs a / b / c):

Inferences Per Second (more is better):
GPT-2 - CPU - Parallel: 108.782 / 80.5879 / 80.0299
GPT-2 - CPU - Standard: 85.9929 / 63.9007 / 63.9074
bertsquad-12 - CPU - Parallel: 5.82423 / 5.74155 / 5.45685
CaffeNet 12-int8 - CPU - Standard: 235.285 / 223.589 / 222.362
yolov4 - CPU - Parallel: 2.85784 / 2.77160 / 2.73687
yolov4 - CPU - Standard: 4.81490 / 4.64766 / 4.65903
CaffeNet 12-int8 - CPU - Parallel: 171.131 / 171.318 / 169.072
bertsquad-12 - CPU - Standard: 8.26847 / 8.16566 / 8.18333
ArcFace ResNet-100 - CPU - Parallel: 7.24304 / 7.27339 / 7.33023
super-resolution-10 - CPU - Parallel: 86.3233 / 86.1680 / 87.1965
fcn-resnet101-11 - CPU - Parallel: 0.814278 / 0.814379 / 0.805340
ResNet50 v1-12-int8 - CPU - Parallel: 51.4787 / 51.6498 / 51.0841
ResNet50 v1-12-int8 - CPU - Standard: 115.481 / 115.658 / 114.712
super-resolution-10 - CPU - Standard: 86.5947 / 86.1633 / 86.0900
Faster R-CNN R-50-FPN-int8 - CPU - Standard: 25.7234 / 25.8279 / 25.7349
fcn-resnet101-11 - CPU - Standard: 2.86976 / 2.86004 / 2.85854
Faster R-CNN R-50-FPN-int8 - CPU - Parallel: 21.4277 / 21.4906 / 21.5037
ArcFace ResNet-100 - CPU - Standard: 13.2769 / 13.2565 / 13.2664

Inference Time Cost in ms (fewer is better):
Faster R-CNN R-50-FPN-int8 - CPU - Standard: 38.8813 / 38.7169 / 38.8606
Faster R-CNN R-50-FPN-int8 - CPU - Parallel: 46.6658 / 46.5290 / 46.5003
super-resolution-10 - CPU - Standard: 11.5478 / 11.6058 / 11.6156
super-resolution-10 - CPU - Parallel: 11.5836 / 11.6046 / 11.4672
ResNet50 v1-12-int8 - CPU - Standard: 8.65766 / 8.64411 / 8.71611
ResNet50 v1-12-int8 - CPU - Parallel: 19.4241 / 19.3620 / 19.5839
ArcFace ResNet-100 - CPU - Standard: 75.3166 / 75.4319 / 75.3766
ArcFace ResNet-100 - CPU - Parallel: 138.204 / 137.548 / 136.457
fcn-resnet101-11 - CPU - Standard: 348.457 / 349.642 / 349.826
fcn-resnet101-11 - CPU - Parallel: 1228.37 / 1228.24 / 1242.17
CaffeNet 12-int8 - CPU - Standard: 4.24929 / 4.47101 / 4.49502
CaffeNet 12-int8 - CPU - Parallel: 5.84251 / 5.83457 / 5.91310
bertsquad-12 - CPU - Standard: 120.939 / 122.460 / 122.197
bertsquad-12 - CPU - Parallel: 171.916 / 174.569 / 183.284
yolov4 - CPU - Standard: 207.686 / 215.160 / 214.634
yolov4 - CPU - Parallel: 349.962 / 360.820 / 365.441
GPT-2 - CPU - Standard: 11.6246 / 15.6449 / 15.6436
GPT-2 - CPU - Parallel: 9.18459 / 12.4015 / 12.4876
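The Standard and Parallel executor labels in these results correspond to ONNX Runtime's sequential and parallel graph-execution modes, selected through SessionOptions. Below is a minimal sketch of that distinction using the ONNX Runtime Python API; it is not the harness used by the Phoronix test profile, and the model file name, run count, and zero-filled dummy inputs are placeholder assumptions:

import time
import numpy as np
import onnxruntime as ort  # assumes the onnxruntime CPU package is installed

def inferences_per_second(model_path, parallel, runs=50):
    opts = ort.SessionOptions()
    # "Standard" = sequential graph execution; "Parallel" lets independent
    # branches of the graph execute concurrently.
    opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                           else ort.ExecutionMode.ORT_SEQUENTIAL)
    sess = ort.InferenceSession(model_path, sess_options=opts,
                                providers=["CPUExecutionProvider"])
    # Build zero-filled dummy inputs, substituting 1 for dynamic dimensions.
    feeds = {}
    for inp in sess.get_inputs():
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        dtype = np.int64 if "int64" in inp.type else np.float32
        feeds[inp.name] = np.zeros(shape, dtype=dtype)
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feeds)
    return runs / (time.perf_counter() - start)

# "model.onnx" is a placeholder path; any of the ONNX models in this
# comparison (GPT-2, yolov4, bertsquad-12, ...) could be substituted here.
for label, parallel in (("Standard", False), ("Parallel", True)):
    print(label, inferences_per_second("model.onnx", parallel))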
Detailed per-test results follow. All tests were built with the same compiler options: (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt.

ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 108.78 (SE +/- 0.16, N = 3); b: 80.59 (SE +/- 0.48, N = 3); c: 80.03 (SE +/- 0.35, N = 3)
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 85.99 (SE +/- 0.06, N = 3); b: 63.90 (SE +/- 0.10, N = 3); c: 63.91 (SE +/- 0.15, N = 3)
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 5.82423 (SE +/- 0.05655, N = 15); b: 5.74155 (SE +/- 0.07458, N = 15); c: 5.45685 (SE +/- 0.05278, N = 3)
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 235.29 (SE +/- 2.31, N = 3); b: 223.59 (SE +/- 2.07, N = 3); c: 222.36 (SE +/- 0.47, N = 3)
ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 2.85784 (SE +/- 0.02529, N = 3); b: 2.77160 (SE +/- 0.01620, N = 3); c: 2.73687 (SE +/- 0.02646, N = 3)
ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 4.81490 (SE +/- 0.00971, N = 3); b: 4.64766 (SE +/- 0.01149, N = 3); c: 4.65903 (SE +/- 0.00954, N = 3)
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 171.13 (SE +/- 2.16, N = 3); b: 171.32 (SE +/- 0.44, N = 3); c: 169.07 (SE +/- 1.38, N = 3)
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 8.26847 (SE +/- 0.02175, N = 3); b: 8.16566 (SE +/- 0.00657, N = 3); c: 8.18333 (SE +/- 0.01448, N = 3)
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 7.24304 (SE +/- 0.06246, N = 15); b: 7.27339 (SE +/- 0.07899, N = 5); c: 7.33023 (SE +/- 0.08929, N = 3)
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 86.32 (SE +/- 0.43, N = 3); b: 86.17 (SE +/- 0.49, N = 3); c: 87.20 (SE +/- 0.21, N = 3)
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 0.814278 (SE +/- 0.008965, N = 3); b: 0.814379 (SE +/- 0.009200, N = 3); c: 0.805340 (SE +/- 0.010935, N = 3)
ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 51.48 (SE +/- 0.34, N = 3); b: 51.65 (SE +/- 0.51, N = 3); c: 51.08 (SE +/- 0.34, N = 15)
ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 115.48 (SE +/- 0.08, N = 3); b: 115.66 (SE +/- 0.20, N = 3); c: 114.71 (SE +/- 0.74, N = 3)
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 86.59 (SE +/- 0.19, N = 3); b: 86.16 (SE +/- 0.34, N = 3); c: 86.09 (SE +/- 0.29, N = 3)
ONNX Runtime 1.14 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 25.72 (SE +/- 0.29, N = 3); b: 25.83 (SE +/- 0.15, N = 3); c: 25.73 (SE +/- 0.24, N = 3)
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 2.86976 (SE +/- 0.00149, N = 3); b: 2.86004 (SE +/- 0.00108, N = 3); c: 2.85854 (SE +/- 0.00456, N = 3)
ONNX Runtime 1.14 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, more is better): a: 21.43 (SE +/- 0.08, N = 3); b: 21.49 (SE +/- 0.06, N = 3); c: 21.50 (SE +/- 0.06, N = 3)
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, more is better): a: 13.28 (SE +/- 0.04, N = 3); b: 13.26 (SE +/- 0.01, N = 3); c: 13.27 (SE +/- 0.03, N = 3)
ONNX Runtime 1.14 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 38.88 (SE +/- 0.45, N = 3); b: 38.72 (SE +/- 0.23, N = 3); c: 38.86 (SE +/- 0.37, N = 3)
ONNX Runtime 1.14 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 46.67 (SE +/- 0.16, N = 3); b: 46.53 (SE +/- 0.13, N = 3); c: 46.50 (SE +/- 0.13, N = 3)
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 11.55 (SE +/- 0.03, N = 3); b: 11.61 (SE +/- 0.05, N = 3); c: 11.62 (SE +/- 0.04, N = 3)
ONNX Runtime 1.14 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 11.58 (SE +/- 0.06, N = 3); b: 11.60 (SE +/- 0.07, N = 3); c: 11.47 (SE +/- 0.03, N = 3)
ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 8.65766 (SE +/- 0.00596, N = 3); b: 8.64411 (SE +/- 0.01468, N = 3); c: 8.71611 (SE +/- 0.05600, N = 3)
ONNX Runtime 1.14 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 19.42 (SE +/- 0.13, N = 3); b: 19.36 (SE +/- 0.19, N = 3); c: 19.58 (SE +/- 0.12, N = 15)
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 75.32 (SE +/- 0.20, N = 3); b: 75.43 (SE +/- 0.04, N = 3); c: 75.38 (SE +/- 0.20, N = 3)
ONNX Runtime 1.14 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 138.20 (SE +/- 1.20, N = 15); b: 137.55 (SE +/- 1.47, N = 5); c: 136.46 (SE +/- 1.65, N = 3)
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 348.46 (SE +/- 0.18, N = 3); b: 349.64 (SE +/- 0.13, N = 3); c: 349.83 (SE +/- 0.56, N = 3)
ONNX Runtime 1.14 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 1228.37 (SE +/- 13.43, N = 3); b: 1228.24 (SE +/- 14.04, N = 3); c: 1242.17 (SE +/- 17.10, N = 3)
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 4.24929 (SE +/- 0.04197, N = 3); b: 4.47101 (SE +/- 0.04177, N = 3); c: 4.49502 (SE +/- 0.00943, N = 3)
ONNX Runtime 1.14 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 5.84251 (SE +/- 0.07295, N = 3); b: 5.83457 (SE +/- 0.01470, N = 3); c: 5.91310 (SE +/- 0.04868, N = 3)
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 120.94 (SE +/- 0.32, N = 3); b: 122.46 (SE +/- 0.10, N = 3); c: 122.20 (SE +/- 0.22, N = 3)
ONNX Runtime 1.14 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 171.92 (SE +/- 1.66, N = 15); b: 174.57 (SE +/- 2.24, N = 15); c: 183.28 (SE +/- 1.76, N = 3)
ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 207.69 (SE +/- 0.42, N = 3); b: 215.16 (SE +/- 0.53, N = 3); c: 214.63 (SE +/- 0.44, N = 3)
ONNX Runtime 1.14 - Model: yolov4 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 349.96 (SE +/- 3.09, N = 3); b: 360.82 (SE +/- 2.12, N = 3); c: 365.44 (SE +/- 3.54, N = 3)
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost in ms, fewer is better): a: 11.62 (SE +/- 0.01, N = 3); b: 15.64 (SE +/- 0.02, N = 3); c: 15.64 (SE +/- 0.04, N = 3)
ONNX Runtime 1.14 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, fewer is better): a: 9.18459 (SE +/- 0.01376, N = 3); b: 12.40150 (SE +/- 0.07467, N = 3); c: 12.48760 (SE +/- 0.05496, N = 3)
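For a rough single-number comparison of runs a, b, and c, the 18 inferences-per-second results in the summary table can be collapsed into a per-run geometric mean. The sketch below is only an illustrative reduction of the table values, not OpenBenchmarking.org's own summary calculation:

import math

# Inferences-per-second values for runs a, b, and c, copied from the
# result summary above (same test order in each list).
a = [108.782, 85.9929, 5.82423, 235.285, 2.85784, 4.81490, 171.131, 8.26847,
     7.24304, 86.3233, 0.814278, 51.4787, 115.481, 86.5947, 25.7234, 2.86976,
     21.4277, 13.2769]
b = [80.5879, 63.9007, 5.74155, 223.589, 2.77160, 4.64766, 171.318, 8.16566,
     7.27339, 86.1680, 0.814379, 51.6498, 115.658, 86.1633, 25.8279, 2.86004,
     21.4906, 13.2565]
c = [80.0299, 63.9074, 5.45685, 222.362, 2.73687, 4.65903, 169.072, 8.18333,
     7.33023, 87.1965, 0.805340, 51.0841, 114.712, 86.0900, 25.7349, 2.85854,
     21.5037, 13.2664]

def geomean(values):
    # Geometric mean: exponential of the average of the logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))

for name, run in (("a", a), ("b", b), ("c", c)):
    print(name, round(geomean(run), 2))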
Phoronix Test Suite v10.8.5