onnx 2024: AMD Ryzen 9 7950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS) and AMD Radeon RX 7900 XT 20GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402035-PTS-ONNX202487&sor&grs .
onnx 2024 - System Details (identical for configurations a, b, c, d, e)
Processor: AMD Ryzen 9 7950X 16-Core @ 5.88GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (1416 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 16 GB DRAM-6000MT/s G Skill F5-6000J3038F16G
Disk: 2000GB Samsung SSD 980 PRO 2TB + 4001GB Western Digital WD_BLACK SN850X 4000GB
Graphics: AMD Radeon RX 7900 XT 20GB (2025/1249MHz)
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6 AX210/AX211/AX411
OS: Ubuntu 23.10
Kernel: 6.7.0-060700-generic (x86_64)
Desktop: GNOME Shell 45.2
Display Server: X Server 1.21.1.7 + Wayland
OpenGL: 4.6 Mesa 24.1~git2401150600.33b77e~oibaf~m (git-33b77ec 2024-01-15 mantic-oibaf-ppa) (LLVM 16.0.6 DRM 3.56)
Compiler: GCC 13.2.0 + LLVM 16.0.6
File-System: ext4
Screen Resolution: 3840x2160
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa601203
Python Details - Python 3.11.6
Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Vulnerable: Safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
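The scaling governor and EPP values listed under Processor Details can be read back from the Linux cpufreq sysfs interface. Below is a minimal Python sketch for verifying those settings; it uses the standard cpufreq sysfs paths and is not part of the Phoronix Test Suite harness:

    # Minimal sketch: read back the cpufreq settings reported under "Processor Details".
    # Assumes a Linux system exposing the cpufreq sysfs interface (amd-pstate-epp here).
    from pathlib import Path

    CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

    def read(name: str) -> str:
        p = CPUFREQ / name
        return p.read_text().strip() if p.exists() else "n/a"

    print("driver  :", read("scaling_driver"))                 # reported here: amd-pstate-epp
    print("governor:", read("scaling_governor"))               # reported here: powersave
    print("EPP     :", read("energy_performance_preference"))  # reported here: balance_performance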
onnx 2024 - Results Overview (values per configuration a / b / c / d / e; n/a = no result recorded for that configuration)
T5 Encoder - CPU - Standard (Inferences Per Second): 194.397 / 195.094 / 175.7 / 193.584 / 191.073
bertsquad-12 - CPU - Parallel (Inferences Per Second): 15.3613 / 16.4348 / 16.1458 / 15.5979 / 15.7559
fcn-resnet101-11 - CPU - Parallel (Inferences Per Second): 1.87559 / 1.82346 / 1.90846 / 1.86086 / 1.84548
yolov4 - CPU - Parallel (Inferences Per Second): 12.0713 / 11.76 / 12.1896 / 11.8377 / 11.7858
GPT-2 - CPU - Standard (Inferences Per Second): 162.765 / 159.018 / 163.099 / 158.345 / 161.009
ResNet50 v1-12-int8 - CPU - Parallel (Inferences Per Second): 382.831 / 380.349 / 376.531 / 375.294 / 372.404
CaffeNet 12-int8 - CPU - Parallel (Inferences Per Second): 909.7 / 903.541 / 887.446 / 894.706 / 892.467
ArcFace ResNet-100 - CPU - Parallel (Inferences Per Second): 36.9262 / 36.6296 / 36.7446 / 36.1231 / 36.8815
Faster R-CNN R-50-FPN-int8 - CPU - Parallel (Inferences Per Second): 47.9216 / 47.6597 / n/a / 48.4158 / 47.9037
GPT-2 - CPU - Parallel (Inferences Per Second): 138.591 / 139.978 / 140.497 / 139.621 / 140.666
T5 Encoder - CPU - Parallel (Inferences Per Second): 182.744 / 181.895 / 182.59 / 184.529 / 182.748
super-resolution-10 - CPU - Parallel (Inferences Per Second): 129.14 / 129.638 / 128.391 / 128.833 / 128.977
Faster R-CNN R-50-FPN-int8 - CPU - Standard (Inference Time Cost, ms): 15.2012 / 15.2712 / n/a / 16.4681 / 16.5767
Faster R-CNN R-50-FPN-int8 - CPU - Standard (Inferences Per Second): 65.778 / 65.4763 / n/a / 61.4089 / 60.9887
Faster R-CNN R-50-FPN-int8 - CPU - Parallel (Inference Time Cost, ms): 20.8658 / 20.9805 / n/a / 20.6537 / 20.8741
super-resolution-10 - CPU - Standard (Inference Time Cost, ms): 4.82607 / 4.82062 / 4.98387 / 5.33607 / 6.27994
super-resolution-10 - CPU - Standard (Inferences Per Second): 207.196 / 207.43 / 200.636 / 192.925 / 165.760
super-resolution-10 - CPU - Parallel (Inference Time Cost, ms): 7.74287 / 7.71312 / 7.78802 / 7.76130 / 7.75298
ResNet50 v1-12-int8 - CPU - Standard (Inference Time Cost, ms): 2.12926 / 2.12591 / 2.58167 / 2.19267 / 2.43146
ResNet50 v1-12-int8 - CPU - Standard (Inferences Per Second): 469.555 / 470.276 / 387.272 / 456.821 / 413.531
ResNet50 v1-12-int8 - CPU - Parallel (Inference Time Cost, ms): 2.61147 / 2.62849 / 2.65494 / 2.66409 / 2.68514
ArcFace ResNet-100 - CPU - Standard (Inference Time Cost, ms): 21.3455 / 21.3386 / 21.2529 / 22.2643 / 22.2438
ArcFace ResNet-100 - CPU - Standard (Inferences Per Second): 46.8456 / 46.8603 / 47.0498 / 45.1120 / 45.2489
ArcFace ResNet-100 - CPU - Parallel (Inference Time Cost, ms): 27.0802 / 27.2993 / 27.2137 / 27.6822 / 27.1137
fcn-resnet101-11 - CPU - Standard (Inference Time Cost, ms): 308.143 / 303.431 / 302.036 / 304.535 / 423.305
fcn-resnet101-11 - CPU - Standard (Inferences Per Second): 3.24523 / 3.29562 / 3.31084 / 3.28368 / 2.47468
fcn-resnet101-11 - CPU - Parallel (Inference Time Cost, ms): 533.163 / 548.407 / 523.98 / 537.387 / 542.057
CaffeNet 12-int8 - CPU - Standard (Inference Time Cost, ms): 0.828027 / 0.887052 / 1.02652 / 0.995978 / 0.828290
CaffeNet 12-int8 - CPU - Standard (Inferences Per Second): 1207.09 / 1126.85 / 973.694 / 1012.625 / 1206.81
CaffeNet 12-int8 - CPU - Parallel (Inference Time Cost, ms): 1.09815 / 1.10576 / 1.1258 / 1.11678 / 1.11950
bertsquad-12 - CPU - Standard (Inference Time Cost, ms): 47.7613 / 64.4411 / 47.6607 / 47.8807 / 55.5537
bertsquad-12 - CPU - Standard (Inferences Per Second): 20.9365 / 15.5174 / 20.9808 / 20.8843 / 18.3880
bertsquad-12 - CPU - Parallel (Inference Time Cost, ms): 65.0969 / 60.845 / 61.9341 / 64.1141 / 63.4679
T5 Encoder - CPU - Standard (Inference Time Cost, ms): 5.14336 / 5.12505 / 5.68968 / 5.16511 / 5.23397
T5 Encoder - CPU - Parallel (Inference Time Cost, ms): 5.47142 / 5.49686 / 5.47609 / 5.41860 / 5.47131
yolov4 - CPU - Standard (Inference Time Cost, ms): 76.1794 / 92.0615 / 75.5056 / 76.8741 / 84.7111
yolov4 - CPU - Standard (Inferences Per Second): 13.1265 / 10.862 / 13.2437 / 13.0096 / 11.9183
yolov4 - CPU - Parallel (Inference Time Cost, ms): 82.8393 / 85.0315 / 82.0351 / 84.4855 / 84.8671
GPT-2 - CPU - Standard (Inference Time Cost, ms): 6.14139 / 6.28555 / 6.12878 / 6.32885 / 6.21146
GPT-2 - CPU - Parallel (Inference Time Cost, ms): 7.21183 / 7.13996 / 7.1134 / 7.15826 / 7.10485
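The "Executor: Standard" and "Executor: Parallel" variants in these tests correspond to ONNX Runtime's sequential and parallel execution modes on the CPU execution provider. The following is a minimal Python sketch of such a CPU timing run; the model path, input data, and iteration count are placeholders, and the actual pts/onnx test profile harness may differ:

    # Minimal sketch (not the pts harness): time a model on the CPU execution
    # provider with ONNX Runtime's sequential ("Standard") or parallel executor.
    import time
    import numpy as np
    import onnxruntime as ort

    def run_benchmark(model_path: str, parallel: bool, iterations: int = 100) -> float:
        so = ort.SessionOptions()
        so.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                             else ort.ExecutionMode.ORT_SEQUENTIAL)
        sess = ort.InferenceSession(model_path, sess_options=so,
                                    providers=["CPUExecutionProvider"])
        # Placeholder input: a random tensor shaped like the model's first input
        # (the real test profiles feed model-specific data and dtypes).
        inp = sess.get_inputs()[0]
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        x = np.random.rand(*shape).astype(np.float32)
        start = time.perf_counter()
        for _ in range(iterations):
            sess.run(None, {inp.name: x})
        elapsed = time.perf_counter() - start
        return iterations / elapsed  # inferences per second

    # Example (hypothetical model file): run_benchmark("super-resolution-10.onnx", parallel=False)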
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
b: 195.09, a: 194.40, d: 193.58, e: 191.07, c: 175.70 [SE +/- 0.84, N = 3; SE +/- 2.15, N = 3]
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt (this footnote applies to all ONNX Runtime results below)
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
b: 16.43, c: 16.15, e: 15.76, d: 15.60, a: 15.36 [SE +/- 0.05, N = 3; SE +/- 0.09, N = 3]
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
c: 1.90846, a: 1.87559, d: 1.86086, e: 1.84548, b: 1.82346 [SE +/- 0.00241, N = 3; SE +/- 0.02014, N = 4]
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
c: 12.19, a: 12.07, d: 11.84, e: 11.79, b: 11.76 [SE +/- 0.10, N = 3; SE +/- 0.13, N = 3]
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
c: 163.10, a: 162.77, e: 161.01, b: 159.02, d: 158.35 [SE +/- 1.14, N = 12; SE +/- 2.09, N = 15]
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
a: 382.83, b: 380.35, c: 376.53, d: 375.29, e: 372.40 [SE +/- 2.58, N = 3; SE +/- 4.36, N = 3]
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
a: 909.70, b: 903.54, d: 894.71, e: 892.47, c: 887.45 [SE +/- 8.54, N = 3; SE +/- 7.71, N = 3]
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
a: 36.93, e: 36.88, c: 36.74, b: 36.63, d: 36.12 [SE +/- 0.15, N = 3; SE +/- 0.05, N = 3]
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
d: 48.42, a: 47.92, e: 47.90, b: 47.66 [SE +/- 0.24, N = 3; SE +/- 0.16, N = 3]
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
e: 140.67, c: 140.50, b: 139.98, d: 139.62, a: 138.59 [SE +/- 0.15, N = 3; SE +/- 0.42, N = 3]
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
d: 184.53, e: 182.75, a: 182.74, c: 182.59, b: 181.90 [SE +/- 0.76, N = 3; SE +/- 0.33, N = 3]
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
b: 129.64, a: 129.14, e: 128.98, d: 128.83, c: 128.39 [SE +/- 0.58, N = 3; SE +/- 0.11, N = 3]
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
a: 15.20, b: 15.27, d: 16.47, e: 16.58 [SE +/- 0.49, N = 15; SE +/- 0.48, N = 15]
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
a: 65.78, b: 65.48, d: 61.41, e: 60.99 [SE +/- 1.66, N = 15; SE +/- 1.64, N = 15]
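The two Faster R-CNN R-50-FPN-int8 "Standard" results above are the same runs expressed in two units: inference time cost in milliseconds and its reciprocal, inferences per second (roughly 1000 / ms). A quick check using the configuration "a" figures from the results overview table:

    # Sanity check: "Inferences Per Second" is approximately the reciprocal of the
    # "Inference Time Cost (ms)" metric, using configuration "a" values from above.
    time_ms = 15.2012        # Faster R-CNN R-50-FPN-int8, CPU, Standard (ms)
    ips_reported = 65.778    # same test reported as inferences per second

    ips_derived = 1000.0 / time_ms
    print(f"derived: {ips_derived:.3f} IPS, reported: {ips_reported} IPS")
    # derived: 65.784 IPS vs reported: 65.778 IPS (small gap from per-run averaging/rounding)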
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
d: 20.65, a: 20.87, e: 20.87, b: 20.98 [SE +/- 0.10, N = 3; SE +/- 0.07, N = 3]
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
b: 4.82062, a: 4.82607, c: 4.98387, d: 5.33607, e: 6.27994 [SE +/- 0.27617, N = 15; SE +/- 0.33493, N = 15]
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
b: 207.43, a: 207.20, c: 200.64, d: 192.93, e: 165.76 [SE +/- 7.65, N = 15; SE +/- 8.80, N = 15]
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
b: 7.71312, a: 7.74287, e: 7.75298, d: 7.76130, c: 7.78802 [SE +/- 0.03494, N = 3; SE +/- 0.00637, N = 3]
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
b: 2.12591, a: 2.12926, d: 2.19267, e: 2.43146, c: 2.58167 [SE +/- 0.02641, N = 15; SE +/- 0.05369, N = 12]
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
b: 470.28, a: 469.56, d: 456.82, e: 413.53, c: 387.27 [SE +/- 5.17, N = 15; SE +/- 9.69, N = 12]
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
a: 2.61147, b: 2.62849, c: 2.65494, d: 2.66409, e: 2.68514 [SE +/- 0.01841, N = 3; SE +/- 0.03182, N = 3]
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
c: 21.25, b: 21.34, a: 21.35, e: 22.24, d: 22.26 [SE +/- 0.52, N = 15; SE +/- 0.42, N = 15]
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
c: 47.05, b: 46.86, a: 46.85, e: 45.25, d: 45.11 [SE +/- 0.90, N = 15; SE +/- 0.76, N = 15]
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
a: 27.08, e: 27.11, c: 27.21, b: 27.30, d: 27.68 [SE +/- 0.11, N = 3; SE +/- 0.04, N = 3]
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
c: 302.04, b: 303.43, d: 304.54, a: 308.14, e: 423.31 [SE +/- 0.23, N = 3; SE +/- 22.36, N = 15]
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
c: 3.31084, b: 3.29562, d: 3.28368, a: 3.24523, e: 2.47468 [SE +/- 0.00251, N = 3; SE +/- 0.15192, N = 15]
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
c: 523.98, a: 533.16, d: 537.39, e: 542.06, b: 548.41 [SE +/- 0.70, N = 3; SE +/- 5.91, N = 4]
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
a: 0.828027, e: 0.828290, b: 0.887052, d: 0.995978, c: 1.026520 [SE +/- 0.002451, N = 3; SE +/- 0.024341, N = 15]
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
a: 1207.09, e: 1206.81, b: 1126.85, d: 1012.63, c: 973.69 [SE +/- 3.57, N = 3; SE +/- 26.76, N = 15]
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
a: 1.09815, b: 1.10576, d: 1.11678, e: 1.11950, c: 1.12580 [SE +/- 0.01085, N = 3; SE +/- 0.00979, N = 3]
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
c: 47.66, a: 47.76, d: 47.88, e: 55.55, b: 64.44 [SE +/- 0.05, N = 3; SE +/- 2.20, N = 15]
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
c: 20.98, a: 20.94, d: 20.88, e: 18.39, b: 15.52 [SE +/- 0.02, N = 3; SE +/- 0.70, N = 15]
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
b: 60.85, c: 61.93, e: 63.47, d: 64.11, a: 65.10 [SE +/- 0.20, N = 3; SE +/- 0.38, N = 3]
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
b: 5.12505, a: 5.14336, d: 5.16511, e: 5.23397, c: 5.68968 [SE +/- 0.02251, N = 3; SE +/- 0.05921, N = 3]
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
d: 5.41860, e: 5.47131, a: 5.47142, c: 5.47609, b: 5.49686 [SE +/- 0.02247, N = 3; SE +/- 0.00976, N = 3]
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
c: 75.51, a: 76.18, d: 76.87, e: 84.71, b: 92.06 [SE +/- 0.61, N = 3; SE +/- 2.22, N = 15]
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
c: 13.24, a: 13.13, d: 13.01, e: 11.92, b: 10.86 [SE +/- 0.10, N = 3; SE +/- 0.31, N = 15]
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
c: 82.04, a: 82.84, d: 84.49, e: 84.87, b: 85.03 [SE +/- 0.71, N = 3; SE +/- 0.95, N = 3]
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost (ms), Fewer Is Better)
c: 6.12878, a: 6.14139, e: 6.21146, b: 6.28555, d: 6.32885 [SE +/- 0.04554, N = 12; SE +/- 0.08957, N = 15]
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inference Time Cost (ms), Fewer Is Better)
e: 7.10485, c: 7.11340, b: 7.13996, d: 7.15826, a: 7.21183 [SE +/- 0.00746, N = 3; SE +/- 0.02149, N = 3]
Phoronix Test Suite v10.8.5