onnx 2024: AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and AMD Radeon RX 7900 XT 20GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402035-PTS-ONNX202487&sor&gru .
System Details (runs a, b, c, d, and e share this configuration):

  Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 16 GB DRAM-6000MT/s G Skill F5-6000J3038F16G
  Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
  Graphics: AMD Radeon RX 7900 XT 20GB (2025/1249MHz)
  Audio: AMD Navi 31 HDMI/DP
  Monitor: DELL U2723QE
  Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
  OS: Ubuntu 23.10
  Kernel: 6.7.0-060700-generic (x86_64)
  Desktop: GNOME Shell 45.2
  Display Server: X Server 1.21.1.7 + Wayland
  OpenGL: 4.6 Mesa 24.1~git2401150600.33b77e~oibaf~m (git-33b77ec 2024-01-15 mantic-oibaf-ppa) (LLVM 16.0.6 DRM 3.56)
  Compiler: GCC 13.2.0 + LLVM 16.0.6
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xa601203

Python Details: Python 3.11.6

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Vulnerable: Safe RET no microcode
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
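The Executor: Parallel / Standard split that recurs throughout these results appears to correspond to ONNX Runtime's parallel and sequential execution modes. As a rough illustration (not the harness the Phoronix Test Suite itself uses), here is a minimal Python sketch that times a model under both modes; the model file name, the input name, and the input shape are placeholders that would have to match a real model:

    # Minimal sketch of timing an ONNX model under ONNX Runtime's two CPU
    # execution modes (sequential ~ "Standard", parallel ~ "Parallel").
    # "model.onnx", the "input" name, and the shape are placeholders.
    import time
    import numpy as np
    import onnxruntime as ort

    def inferences_per_second(model_path, feed, parallel, runs=100):
        opts = ort.SessionOptions()
        opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                               else ort.ExecutionMode.ORT_SEQUENTIAL)
        sess = ort.InferenceSession(model_path, opts,
                                    providers=["CPUExecutionProvider"])
        start = time.perf_counter()
        for _ in range(runs):
            sess.run(None, feed)  # run with all outputs requested
        return runs / (time.perf_counter() - start)

    feed = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}  # placeholder
    for parallel in (False, True):
        label = "Parallel" if parallel else "Standard"
        print(label, round(inferences_per_second("model.onnx", feed, parallel), 2))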
Results summary for runs a through e (all tests on CPU; blank = test did not run):

Inferences Per Second (more is better):

  Model - Executor                              a          b          c          d          e
  GPT-2 - Parallel                        138.591    139.978    140.497    139.621    140.666
  GPT-2 - Standard                        162.765    159.018    163.099    158.345    161.009
  yolov4 - Parallel                       12.0713    11.76      12.1896    11.8377    11.7858
  yolov4 - Standard                       13.1265    10.862     13.2437    13.0096    11.9183
  T5 Encoder - Parallel                   182.744    181.895    182.59     184.529    182.748
  T5 Encoder - Standard                   194.397    195.094    175.7      193.584    191.073
  bertsquad-12 - Parallel                 15.3613    16.4348    16.1458    15.5979    15.7559
  bertsquad-12 - Standard                 20.9365    15.5174    20.9808    20.8843    18.3880
  CaffeNet 12-int8 - Parallel             909.7      903.541    887.446    894.706    892.467
  CaffeNet 12-int8 - Standard             1207.09    1126.85    973.694    1012.625   1206.81
  fcn-resnet101-11 - Parallel             1.87559    1.82346    1.90846    1.86086    1.84548
  fcn-resnet101-11 - Standard             3.24523    3.29562    3.31084    3.28368    2.47468
  ArcFace ResNet-100 - Parallel           36.9262    36.6296    36.7446    36.1231    36.8815
  ArcFace ResNet-100 - Standard           46.8456    46.8603    47.0498    45.1120    45.2489
  ResNet50 v1-12-int8 - Parallel          382.831    380.349    376.531    375.294    372.404
  ResNet50 v1-12-int8 - Standard          469.555    470.276    387.272    456.821    413.531
  super-resolution-10 - Parallel          129.14     129.638    128.391    128.833    128.977
  super-resolution-10 - Standard          207.196    207.43     200.636    192.925    165.760
  Faster R-CNN R-50-FPN-int8 - Parallel   47.9216    47.6597               48.4158    47.9037
  Faster R-CNN R-50-FPN-int8 - Standard   65.778     65.4763               61.4089    60.9887

Inference Time Cost in ms (fewer is better):

  Model - Executor                              a          b          c          d          e
  GPT-2 - Parallel                        7.21183    7.13996    7.1134     7.15826    7.10485
  GPT-2 - Standard                        6.14139    6.28555    6.12878    6.32885    6.21146
  yolov4 - Parallel                       82.8393    85.0315    82.0351    84.4855    84.8671
  yolov4 - Standard                       76.1794    92.0615    75.5056    76.8741    84.7111
  T5 Encoder - Parallel                   5.47142    5.49686    5.47609    5.41860    5.47131
  T5 Encoder - Standard                   5.14336    5.12505    5.68968    5.16511    5.23397
  bertsquad-12 - Parallel                 65.0969    60.845     61.9341    64.1141    63.4679
  bertsquad-12 - Standard                 47.7613    64.4411    47.6607    47.8807    55.5537
  CaffeNet 12-int8 - Parallel             1.09815    1.10576    1.1258     1.11678    1.11950
  CaffeNet 12-int8 - Standard             0.828027   0.887052   1.02652    0.995978   0.828290
  fcn-resnet101-11 - Parallel             533.163    548.407    523.98     537.387    542.057
  fcn-resnet101-11 - Standard             308.143    303.431    302.036    304.535    423.305
  ArcFace ResNet-100 - Parallel           27.0802    27.2993    27.2137    27.6822    27.1137
  ArcFace ResNet-100 - Standard           21.3455    21.3386    21.2529    22.2643    22.2438
  ResNet50 v1-12-int8 - Parallel          2.61147    2.62849    2.65494    2.66409    2.68514
  ResNet50 v1-12-int8 - Standard          2.12926    2.12591    2.58167    2.19267    2.43146
  super-resolution-10 - Parallel          7.74287    7.71312    7.78802    7.76130    7.75298
  super-resolution-10 - Standard          4.82607    4.82062    4.98387    5.33607    6.27994
  Faster R-CNN R-50-FPN-int8 - Parallel   20.8658    20.9805               20.6537    20.8741
  Faster R-CNN R-50-FPN-int8 - Standard   15.2012    15.2712               16.4681    16.5767
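The two halves of the summary are the same measurements in two units: the Inference Time Cost figures are, to within run-averaging noise, the reciprocal of the Inferences Per Second figures. A quick check in Python against run a's GPT-2 - Parallel numbers from the table above:

    # The ms figure is roughly 1000 / (inferences per second). Run a,
    # GPT-2 - Parallel: 138.591 inf/s and 7.21183 ms were reported.
    ips = 138.591
    print(1000.0 / ips)  # ~7.21548; the small gap vs. the reported 7.21183
                         # presumably reflects the two metrics being
                         # averaged over their sampling runs separately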
Detailed per-test results (ONNX Runtime 1.17, via OpenBenchmarking.org). All test binaries were built with (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt.

Inferences Per Second, more is better (runs ordered best first; the standard errors shown on each graph follow in parentheses):

Model: GPT-2 - Device: CPU - Executor: Parallel
  e: 140.67, c: 140.50, b: 139.98, d: 139.62, a: 138.59 (SE +/- 0.15, N = 3; SE +/- 0.42, N = 3)
Model: GPT-2 - Device: CPU - Executor: Standard
  c: 163.10, a: 162.77, e: 161.01, b: 159.02, d: 158.35 (SE +/- 1.14, N = 12; SE +/- 2.09, N = 15)
Model: yolov4 - Device: CPU - Executor: Parallel
  c: 12.19, a: 12.07, d: 11.84, e: 11.79, b: 11.76 (SE +/- 0.10, N = 3; SE +/- 0.13, N = 3)
Model: yolov4 - Device: CPU - Executor: Standard
  c: 13.24, a: 13.13, d: 13.01, e: 11.92, b: 10.86 (SE +/- 0.10, N = 3; SE +/- 0.31, N = 15)
Model: T5 Encoder - Device: CPU - Executor: Parallel
  d: 184.53, e: 182.75, a: 182.74, c: 182.59, b: 181.90 (SE +/- 0.76, N = 3; SE +/- 0.33, N = 3)
Model: T5 Encoder - Device: CPU - Executor: Standard
  b: 195.09, a: 194.40, d: 193.58, e: 191.07, c: 175.70 (SE +/- 0.84, N = 3; SE +/- 2.15, N = 3)
Model: bertsquad-12 - Device: CPU - Executor: Parallel
  b: 16.43, c: 16.15, e: 15.76, d: 15.60, a: 15.36 (SE +/- 0.05, N = 3; SE +/- 0.09, N = 3)
Model: bertsquad-12 - Device: CPU - Executor: Standard
  c: 20.98, a: 20.94, d: 20.88, e: 18.39, b: 15.52 (SE +/- 0.02, N = 3; SE +/- 0.70, N = 15)
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  a: 909.70, b: 903.54, d: 894.71, e: 892.47, c: 887.45 (SE +/- 8.54, N = 3; SE +/- 7.71, N = 3)
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  a: 1207.09, e: 1206.81, b: 1126.85, d: 1012.63, c: 973.69 (SE +/- 3.57, N = 3; SE +/- 26.76, N = 15)
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  c: 1.90846, a: 1.87559, d: 1.86086, e: 1.84548, b: 1.82346 (SE +/- 0.00241, N = 3; SE +/- 0.02014, N = 4)
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  c: 3.31084, b: 3.29562, d: 3.28368, a: 3.24523, e: 2.47468 (SE +/- 0.00251, N = 3; SE +/- 0.15192, N = 15)
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  a: 36.93, e: 36.88, c: 36.74, b: 36.63, d: 36.12 (SE +/- 0.15, N = 3; SE +/- 0.05, N = 3)
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  c: 47.05, b: 46.86, a: 46.85, e: 45.25, d: 45.11 (SE +/- 0.90, N = 15; SE +/- 0.76, N = 15)
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  a: 382.83, b: 380.35, c: 376.53, d: 375.29, e: 372.40 (SE +/- 2.58, N = 3; SE +/- 4.36, N = 3)
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  b: 470.28, a: 469.56, d: 456.82, e: 413.53, c: 387.27 (SE +/- 5.17, N = 15; SE +/- 9.69, N = 12)
Model: super-resolution-10 - Device: CPU - Executor: Parallel
  b: 129.64, a: 129.14, e: 128.98, d: 128.83, c: 128.39 (SE +/- 0.58, N = 3; SE +/- 0.11, N = 3)
Model: super-resolution-10 - Device: CPU - Executor: Standard
  b: 207.43, a: 207.20, c: 200.64, d: 192.93, e: 165.76 (SE +/- 7.65, N = 15; SE +/- 8.80, N = 15)
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  d: 48.42, a: 47.92, e: 47.90, b: 47.66 (SE +/- 0.24, N = 3; SE +/- 0.16, N = 3)
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  a: 65.78, b: 65.48, d: 61.41, e: 60.99 (SE +/- 1.66, N = 15; SE +/- 1.64, N = 15)
Inference Time Cost (ms), fewer is better (runs ordered best first):

Model: GPT-2 - Device: CPU - Executor: Parallel
  e: 7.10485, c: 7.11340, b: 7.13996, d: 7.15826, a: 7.21183 (SE +/- 0.00746, N = 3; SE +/- 0.02149, N = 3)
Model: GPT-2 - Device: CPU - Executor: Standard
  c: 6.12878, a: 6.14139, e: 6.21146, b: 6.28555, d: 6.32885 (SE +/- 0.04554, N = 12; SE +/- 0.08957, N = 15)
Model: yolov4 - Device: CPU - Executor: Parallel
  c: 82.04, a: 82.84, d: 84.49, e: 84.87, b: 85.03 (SE +/- 0.71, N = 3; SE +/- 0.95, N = 3)
Model: yolov4 - Device: CPU - Executor: Standard
  c: 75.51, a: 76.18, d: 76.87, e: 84.71, b: 92.06 (SE +/- 0.61, N = 3; SE +/- 2.22, N = 15)
Model: T5 Encoder - Device: CPU - Executor: Parallel
  d: 5.41860, e: 5.47131, a: 5.47142, c: 5.47609, b: 5.49686 (SE +/- 0.02247, N = 3; SE +/- 0.00976, N = 3)
Model: T5 Encoder - Device: CPU - Executor: Standard
  b: 5.12505, a: 5.14336, d: 5.16511, e: 5.23397, c: 5.68968 (SE +/- 0.02251, N = 3; SE +/- 0.05921, N = 3)
Model: bertsquad-12 - Device: CPU - Executor: Parallel
  b: 60.85, c: 61.93, e: 63.47, d: 64.11, a: 65.10 (SE +/- 0.20, N = 3; SE +/- 0.38, N = 3)
Model: bertsquad-12 - Device: CPU - Executor: Standard
  c: 47.66, a: 47.76, d: 47.88, e: 55.55, b: 64.44 (SE +/- 0.05, N = 3; SE +/- 2.20, N = 15)
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  a: 1.09815, b: 1.10576, d: 1.11678, e: 1.11950, c: 1.12580 (SE +/- 0.01085, N = 3; SE +/- 0.00979, N = 3)
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  a: 0.828027, e: 0.828290, b: 0.887052, d: 0.995978, c: 1.026520 (SE +/- 0.002451, N = 3; SE +/- 0.024341, N = 15)
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  c: 523.98, a: 533.16, d: 537.39, e: 542.06, b: 548.41 (SE +/- 0.70, N = 3; SE +/- 5.91, N = 4)
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  c: 302.04, b: 303.43, d: 304.54, a: 308.14, e: 423.31 (SE +/- 0.23, N = 3; SE +/- 22.36, N = 15)
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  a: 27.08, e: 27.11, c: 27.21, b: 27.30, d: 27.68 (SE +/- 0.11, N = 3; SE +/- 0.04, N = 3)
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  c: 21.25, b: 21.34, a: 21.35, e: 22.24, d: 22.26 (SE +/- 0.52, N = 15; SE +/- 0.42, N = 15)
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  a: 2.61147, b: 2.62849, c: 2.65494, d: 2.66409, e: 2.68514 (SE +/- 0.01841, N = 3; SE +/- 0.03182, N = 3)
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  b: 2.12591, a: 2.12926, d: 2.19267, e: 2.43146, c: 2.58167 (SE +/- 0.02641, N = 15; SE +/- 0.05369, N = 12)
Model: super-resolution-10 - Device: CPU - Executor: Parallel
  b: 7.71312, a: 7.74287, e: 7.75298, d: 7.76130, c: 7.78802 (SE +/- 0.03494, N = 3; SE +/- 0.00637, N = 3)
Model: super-resolution-10 - Device: CPU - Executor: Standard
  b: 4.82062, a: 4.82607, c: 4.98387, d: 5.33607, e: 6.27994 (SE +/- 0.27617, N = 15; SE +/- 0.33493, N = 15)
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  d: 20.65, a: 20.87, e: 20.87, b: 20.98 (SE +/- 0.10, N = 3; SE +/- 0.07, N = 3)
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  a: 15.20, b: 15.27, d: 16.47, e: 16.58 (SE +/- 0.49, N = 15; SE +/- 0.48, N = 15)
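The "SE +/- x, N = n" figures attached to the results above are standard errors of the mean over N sampling runs. As a reminder of the convention (this is the standard formula, not Phoronix Test Suite source code), a small Python sketch with made-up trial values:

    # Standard error of the mean: sample standard deviation / sqrt(N).
    # The trial values here are hypothetical, just to show the arithmetic.
    import math
    import statistics

    trials = [140.1, 140.9, 141.0]  # hypothetical per-trial averages
    se = statistics.stdev(trials) / math.sqrt(len(trials))
    print(f"SE +/- {se:.2f}, N = {len(trials)}")  # SE +/- 0.28, N = 3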
Phoronix Test Suite v10.8.5