onnx 2024: AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and AMD Radeon RX 7900 XT 20GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402035-PTS-ONNX202487&sor&grr.
System details for configurations a, b, c, d and e (all five runs used the same system):
  Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 16 GB DRAM-6000MT/s G Skill F5-6000J3038F16G
  Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
  Graphics: AMD Radeon RX 7900 XT 20GB (2025/1249MHz)
  Audio: AMD Navi 31 HDMI/DP
  Monitor: DELL U2723QE
  Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
  OS: Ubuntu 23.10
  Kernel: 6.7.0-060700-generic (x86_64)
  Desktop: GNOME Shell 45.2
  Display Server: X Server 1.21.1.7 + Wayland
  OpenGL: 4.6 Mesa 24.1~git2401150600.33b77e~oibaf~m (git-33b77ec 2024-01-15 mantic-oibaf-ppa) (LLVM 16.0.6 DRM 3.56)
  Compiler: GCC 13.2.0 + LLVM 16.0.6
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa601203
Python Details - Python 3.11.6
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Vulnerable: Safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
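The scaling governor and EPP setting reported in the Processor Details can noticeably influence CPU inference results. As a side note (not part of the original run), here is a minimal Python sketch that reads the same cpufreq knobs from sysfs; it assumes a Linux system whose driver exposes an energy_performance_preference file (as amd-pstate-epp does), and it was not used to produce the data in this result file.

# Sketch: read the CPU frequency scaling settings reported above from sysfs.
# Assumes Linux cpufreq sysfs; the EPP file only exists for EPP-capable drivers.
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def read_knob(name: str) -> str:
    path = CPUFREQ / name
    return path.read_text().strip() if path.exists() else "n/a"

if __name__ == "__main__":
    print("driver   :", read_knob("scaling_driver"))
    print("governor :", read_knob("scaling_governor"))                 # e.g. "powersave"
    print("EPP      :", read_knob("energy_performance_preference"))    # e.g. "balance_performance"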
Results summary. Each row lists Inference Time Cost (ms) first, then Inferences Per Second, with values in configuration order a / b / c / d / e ("-" = no result reported for that configuration):

Faster R-CNN R-50-FPN-int8 - CPU - Standard: 15.2012 / 15.2712 / - / 16.4681 / 16.5767 ms; 65.778 / 65.4763 / - / 61.4089 / 60.9887 inferences/sec
ArcFace ResNet-100 - CPU - Standard: 21.3455 / 21.3386 / 21.2529 / 22.2643 / 22.2438 ms; 46.8456 / 46.8603 / 47.0498 / 45.1120 / 45.2489 inferences/sec
super-resolution-10 - CPU - Standard: 4.82607 / 4.82062 / 4.98387 / 5.33607 / 6.27994 ms; 207.196 / 207.43 / 200.636 / 192.925 / 165.760 inferences/sec
GPT-2 - CPU - Standard: 6.14139 / 6.28555 / 6.12878 / 6.32885 / 6.21146 ms; 162.765 / 159.018 / 163.099 / 158.345 / 161.009 inferences/sec
ResNet50 v1-12-int8 - CPU - Standard: 2.12926 / 2.12591 / 2.58167 / 2.19267 / 2.43146 ms; 469.555 / 470.276 / 387.272 / 456.821 / 413.531 inferences/sec
fcn-resnet101-11 - CPU - Standard: 308.143 / 303.431 / 302.036 / 304.535 / 423.305 ms; 3.24523 / 3.29562 / 3.31084 / 3.28368 / 2.47468 inferences/sec
bertsquad-12 - CPU - Standard: 47.7613 / 64.4411 / 47.6607 / 47.8807 / 55.5537 ms; 20.9365 / 15.5174 / 20.9808 / 20.8843 / 18.3880 inferences/sec
yolov4 - CPU - Standard: 76.1794 / 92.0615 / 75.5056 / 76.8741 / 84.7111 ms; 13.1265 / 10.862 / 13.2437 / 13.0096 / 11.9183 inferences/sec
CaffeNet 12-int8 - CPU - Standard: 0.828027 / 0.887052 / 1.02652 / 0.995978 / 0.828290 ms; 1207.09 / 1126.85 / 973.694 / 1012.625 / 1206.81 inferences/sec
Faster R-CNN R-50-FPN-int8 - CPU - Parallel: 20.8658 / 20.9805 / - / 20.6537 / 20.8741 ms; 47.9216 / 47.6597 / - / 48.4158 / 47.9037 inferences/sec
fcn-resnet101-11 - CPU - Parallel: 533.163 / 548.407 / 523.98 / 537.387 / 542.057 ms; 1.87559 / 1.82346 / 1.90846 / 1.86086 / 1.84548 inferences/sec
GPT-2 - CPU - Parallel: 7.21183 / 7.13996 / 7.1134 / 7.15826 / 7.10485 ms; 138.591 / 139.978 / 140.497 / 139.621 / 140.666 inferences/sec
bertsquad-12 - CPU - Parallel: 65.0969 / 60.845 / 61.9341 / 64.1141 / 63.4679 ms; 15.3613 / 16.4348 / 16.1458 / 15.5979 / 15.7559 inferences/sec
yolov4 - CPU - Parallel: 82.8393 / 85.0315 / 82.0351 / 84.4855 / 84.8671 ms; 12.0713 / 11.76 / 12.1896 / 11.8377 / 11.7858 inferences/sec
T5 Encoder - CPU - Standard: 5.14336 / 5.12505 / 5.68968 / 5.16511 / 5.23397 ms; 194.397 / 195.094 / 175.7 / 193.584 / 191.073 inferences/sec
T5 Encoder - CPU - Parallel: 5.47142 / 5.49686 / 5.47609 / 5.41860 / 5.47131 ms; 182.744 / 181.895 / 182.59 / 184.529 / 182.748 inferences/sec
ArcFace ResNet-100 - CPU - Parallel: 27.0802 / 27.2993 / 27.2137 / 27.6822 / 27.1137 ms; 36.9262 / 36.6296 / 36.7446 / 36.1231 / 36.8815 inferences/sec
CaffeNet 12-int8 - CPU - Parallel: 1.09815 / 1.10576 / 1.1258 / 1.11678 / 1.11950 ms; 909.7 / 903.541 / 887.446 / 894.706 / 892.467 inferences/sec
ResNet50 v1-12-int8 - CPU - Parallel: 2.61147 / 2.62849 / 2.65494 / 2.66409 / 2.68514 ms; 382.831 / 380.349 / 376.531 / 375.294 / 372.404 inferences/sec
super-resolution-10 - CPU - Parallel: 7.74287 / 7.71312 / 7.78802 / 7.76130 / 7.75298 ms; 129.14 / 129.638 / 128.391 / 128.833 / 128.977 inferences/sec
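One common way to compare the five configurations across all of these tests at once is a geometric mean of their Inferences Per Second results. Below is a minimal Python sketch of that calculation; it copies in only three rows from the summary table above as an example, and it is not part of the original Phoronix output.

# Sketch: rank configurations by the geometric mean of their Inferences Per
# Second results. Only a few rows from the summary table are included here;
# extend the dict with the remaining rows for a full comparison.
from math import prod

results_ips = {  # test name -> {configuration: inferences per second}
    "GPT-2 - CPU - Standard":               {"a": 162.765, "b": 159.018, "c": 163.099, "d": 158.345, "e": 161.009},
    "ResNet50 v1-12-int8 - CPU - Standard": {"a": 469.555, "b": 470.276, "c": 387.272, "d": 456.821, "e": 413.531},
    "T5 Encoder - CPU - Standard":          {"a": 194.397, "b": 195.094, "c": 175.700, "d": 193.584, "e": 191.073},
}

def geomean(values):
    return prod(values) ** (1.0 / len(values))

for config in "abcde":
    scores = [per_test[config] for per_test in results_ips.values() if config in per_test]
    print(f"{config}: geometric mean = {geomean(scores):.1f} inferences/sec over {len(scores)} tests")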
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: a: 15.20, b: 15.27, d: 16.47, e: 16.58 (SE +/- 0.49, N = 15; SE +/- 0.48, N = 15)
  Inferences Per Second, More Is Better: a: 65.78, b: 65.48, d: 61.41, e: 60.99 (SE +/- 1.66, N = 15; SE +/- 1.64, N = 15)
  Compiler notes, applying to every ONNX Runtime 1.17 result in this section: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
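The Standard and Parallel executor labels in these results correspond to ONNX Runtime's sequential and parallel execution modes. As a rough illustration of how that option is chosen in the Python API (the model path and thread counts below are placeholders, not the settings used by the Phoronix test profile), the sketch also shows how a per-inference time in ms maps to the Inferences Per Second figure reported alongside it:

# Sketch: selecting ONNX Runtime's sequential ("Standard") or parallel executor
# via SessionOptions, then timing inference. "model.onnx" and the thread counts
# are placeholders, not the benchmark's actual configuration.
import time
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL   # ORT_SEQUENTIAL corresponds to "Standard"
opts.intra_op_num_threads = 16   # threads within a single operator
opts.inter_op_num_threads = 2    # threads across independent operators (parallel mode)

sess = ort.InferenceSession("model.onnx", sess_options=opts,
                            providers=["CPUExecutionProvider"])

inp = sess.get_inputs()[0]
# Replace dynamic dimensions with 1 for this illustrative random input.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.rand(*shape).astype(np.float32)

start = time.perf_counter()
for _ in range(100):
    sess.run(None, {inp.name: x})
elapsed_ms = (time.perf_counter() - start) * 1000 / 100
print(f"{elapsed_ms:.3f} ms/inference  ->  {1000 / elapsed_ms:.2f} inferences/sec")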
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: c: 21.25, b: 21.34, a: 21.35, e: 22.24, d: 22.26 (SE +/- 0.52, N = 15; SE +/- 0.42, N = 15)
  Inferences Per Second, More Is Better: c: 47.05, b: 46.86, a: 46.85, e: 45.25, d: 45.11 (SE +/- 0.90, N = 15; SE +/- 0.76, N = 15)
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: b: 4.82062, a: 4.82607, c: 4.98387, d: 5.33607, e: 6.27994 (SE +/- 0.27617, N = 15; SE +/- 0.33493, N = 15)
  Inferences Per Second, More Is Better: b: 207.43, a: 207.20, c: 200.64, d: 192.93, e: 165.76 (SE +/- 7.65, N = 15; SE +/- 8.80, N = 15)
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: c: 6.12878, a: 6.14139, e: 6.21146, b: 6.28555, d: 6.32885 (SE +/- 0.04554, N = 12; SE +/- 0.08957, N = 15)
  Inferences Per Second, More Is Better: c: 163.10, a: 162.77, e: 161.01, b: 159.02, d: 158.35 (SE +/- 1.14, N = 12; SE +/- 2.09, N = 15)
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: b: 2.12591, a: 2.12926, d: 2.19267, e: 2.43146, c: 2.58167 (SE +/- 0.02641, N = 15; SE +/- 0.05369, N = 12)
  Inferences Per Second, More Is Better: b: 470.28, a: 469.56, d: 456.82, e: 413.53, c: 387.27 (SE +/- 5.17, N = 15; SE +/- 9.69, N = 12)
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: c: 302.04, b: 303.43, d: 304.54, a: 308.14, e: 423.31 (SE +/- 0.23, N = 3; SE +/- 22.36, N = 15)
  Inferences Per Second, More Is Better: c: 3.31084, b: 3.29562, d: 3.28368, a: 3.24523, e: 2.47468 (SE +/- 0.00251, N = 3; SE +/- 0.15192, N = 15)
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: c: 47.66, a: 47.76, d: 47.88, e: 55.55, b: 64.44 (SE +/- 0.05, N = 3; SE +/- 2.20, N = 15)
  Inferences Per Second, More Is Better: c: 20.98, a: 20.94, d: 20.88, e: 18.39, b: 15.52 (SE +/- 0.02, N = 3; SE +/- 0.70, N = 15)
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: c: 75.51, a: 76.18, d: 76.87, e: 84.71, b: 92.06 (SE +/- 0.61, N = 3; SE +/- 2.22, N = 15)
  Inferences Per Second, More Is Better: c: 13.24, a: 13.13, d: 13.01, e: 11.92, b: 10.86 (SE +/- 0.10, N = 3; SE +/- 0.31, N = 15)
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: a: 0.828027, e: 0.828290, b: 0.887052, d: 0.995978, c: 1.026520 (SE +/- 0.002451, N = 3; SE +/- 0.024341, N = 15)
  Inferences Per Second, More Is Better: a: 1207.09, e: 1206.81, b: 1126.85, d: 1012.63, c: 973.69 (SE +/- 3.57, N = 3; SE +/- 26.76, N = 15)
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: d: 20.65, a: 20.87, e: 20.87, b: 20.98 (SE +/- 0.10, N = 3; SE +/- 0.07, N = 3)
  Inferences Per Second, More Is Better: d: 48.42, a: 47.92, e: 47.90, b: 47.66 (SE +/- 0.24, N = 3; SE +/- 0.16, N = 3)
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: c: 523.98, a: 533.16, d: 537.39, e: 542.06, b: 548.41 (SE +/- 0.70, N = 3; SE +/- 5.91, N = 4)
  Inferences Per Second, More Is Better: c: 1.90846, a: 1.87559, d: 1.86086, e: 1.84548, b: 1.82346 (SE +/- 0.00241, N = 3; SE +/- 0.02014, N = 4)
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: e: 7.10485, c: 7.11340, b: 7.13996, d: 7.15826, a: 7.21183 (SE +/- 0.00746, N = 3; SE +/- 0.02149, N = 3)
  Inferences Per Second, More Is Better: e: 140.67, c: 140.50, b: 139.98, d: 139.62, a: 138.59 (SE +/- 0.15, N = 3; SE +/- 0.42, N = 3)
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: b: 60.85, c: 61.93, e: 63.47, d: 64.11, a: 65.10 (SE +/- 0.20, N = 3; SE +/- 0.38, N = 3)
  Inferences Per Second, More Is Better: b: 16.43, c: 16.15, e: 15.76, d: 15.60, a: 15.36 (SE +/- 0.05, N = 3; SE +/- 0.09, N = 3)
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: c: 82.04, a: 82.84, d: 84.49, e: 84.87, b: 85.03 (SE +/- 0.71, N = 3; SE +/- 0.95, N = 3)
  Inferences Per Second, More Is Better: c: 12.19, a: 12.07, d: 11.84, e: 11.79, b: 11.76 (SE +/- 0.10, N = 3; SE +/- 0.13, N = 3)
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
  Inference Time Cost (ms), Fewer Is Better: b: 5.12505, a: 5.14336, d: 5.16511, e: 5.23397, c: 5.68968 (SE +/- 0.02251, N = 3; SE +/- 0.05921, N = 3)
  Inferences Per Second, More Is Better: b: 195.09, a: 194.40, d: 193.58, e: 191.07, c: 175.70 (SE +/- 0.84, N = 3; SE +/- 2.15, N = 3)
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: d: 5.41860, e: 5.47131, a: 5.47142, c: 5.47609, b: 5.49686 (SE +/- 0.02247, N = 3; SE +/- 0.00976, N = 3)
  Inferences Per Second, More Is Better: d: 184.53, e: 182.75, a: 182.74, c: 182.59, b: 181.90 (SE +/- 0.76, N = 3; SE +/- 0.33, N = 3)
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: a: 27.08, e: 27.11, c: 27.21, b: 27.30, d: 27.68 (SE +/- 0.11, N = 3; SE +/- 0.04, N = 3)
  Inferences Per Second, More Is Better: a: 36.93, e: 36.88, c: 36.74, b: 36.63, d: 36.12 (SE +/- 0.15, N = 3; SE +/- 0.05, N = 3)
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: a: 1.09815, b: 1.10576, d: 1.11678, e: 1.11950, c: 1.12580 (SE +/- 0.01085, N = 3; SE +/- 0.00979, N = 3)
  Inferences Per Second, More Is Better: a: 909.70, b: 903.54, d: 894.71, e: 892.47, c: 887.45 (SE +/- 8.54, N = 3; SE +/- 7.71, N = 3)
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: a: 2.61147, b: 2.62849, c: 2.65494, d: 2.66409, e: 2.68514 (SE +/- 0.01841, N = 3; SE +/- 0.03182, N = 3)
  Inferences Per Second, More Is Better: a: 382.83, b: 380.35, c: 376.53, d: 375.29, e: 372.40 (SE +/- 2.58, N = 3; SE +/- 4.36, N = 3)
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), Fewer Is Better: b: 7.71312, a: 7.74287, e: 7.75298, d: 7.76130, c: 7.78802 (SE +/- 0.03494, N = 3; SE +/- 0.00637, N = 3)
  Inferences Per Second, More Is Better: b: 129.64, a: 129.14, e: 128.98, d: 128.83, c: 128.39 (SE +/- 0.58, N = 3; SE +/- 0.11, N = 3)
Phoronix Test Suite v10.8.5