onnx new: AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) motherboard and AMD Radeon RX 5700 8GB graphics on Pop!_OS 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402035-NE-ONNXNEW6040&sro&grr .
onnx new - System Configuration (identical for runs a, b, c, d)

Processor:          AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
Motherboard:        Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
Chipset:            AMD Starship/Matisse
Memory:             4 x 32GB DDR4-3000MT/s CMK64GX4M2D3000C16
Disk:               Samsung SSD 970 EVO Plus 500GB
Graphics:           AMD Radeon RX 5700 8GB (1750/875MHz)
Audio:              AMD Navi 10 HDMI Audio
Monitor:            DELL P2415Q
Network:            Intel I211 + Intel Wi-Fi 6 AX200
OS:                 Pop 22.04
Kernel:             6.6.6-76060606-generic (x86_64)
Desktop:            GNOME Shell 42.5
Display Server:     X Server 1.21.1.4
OpenGL:             4.6 Mesa 23.3.2-1pop0~1704238321~22.04~36f1d0e (LLVM 15.0.7 DRM 3.54)
Vulkan:             1.3.267
Compiler:           GCC 11.4.0
File-System:        ext4
Screen Resolution:  3840x2160

Kernel Details:     Transparent Huge Pages: madvise
Compiler Details:   --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details:  Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0x830107a
Python Details:     Python 3.10.12
Security Details:   gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Mitigation of untrained return thunk, SMT enabled with STIBP protection; spec_rstack_overflow: Mitigation of Safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Retpolines, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
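The Security Details entries above are the kernel's own per-vulnerability mitigation strings. On Linux these are exposed through sysfs and can be re-read at any time; a minimal sketch (the sysfs path is the standard kernel interface, the output formatting is illustrative only):

    # Print the kernel's CPU vulnerability/mitigation status, i.e. the data
    # summarized in the Security Details field above (Linux sysfs interface).
    from pathlib import Path

    VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

    def mitigation_status() -> dict[str, str]:
        """Map each vulnerability name (e.g. 'spectre_v2') to the kernel's status string."""
        return {p.name: p.read_text().strip() for p in sorted(VULN_DIR.iterdir())}

    if __name__ == "__main__":
        for name, status in mitigation_status().items():
            print(f"{name}: {status}")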
onnx new - Result Summary. All tests: ONNX Runtime 1.17 on CPU; ms = Inference Time Cost in milliseconds (fewer is better), IPS = Inferences Per Second (more is better). Runs a, b, c, d used the identical configuration listed above.

Model - Executor                              Metric      a           b           c           d
CaffeNet 12-int8 - Parallel                   ms          5.0096      5.11408     5.06250     5.02241
CaffeNet 12-int8 - Parallel                   IPS         199.462     195.479     197.531     198.965
Faster R-CNN R-50-FPN-int8 - Standard         ms          40.3241     39.0979     38.6552     39.3208
Faster R-CNN R-50-FPN-int8 - Standard         IPS         24.796      25.5728     25.8677     25.4426
bertsquad-12 - Parallel                       ms          218.988     225.956     227.636     226.820
bertsquad-12 - Parallel                       IPS         4.56634     4.42556     4.39522     4.40892
CaffeNet 12-int8 - Standard                   ms          3.81235     4.31444     4.46068     4.41593
CaffeNet 12-int8 - Standard                   IPS         262.065     231.654     223.993     226.280
Faster R-CNN R-50-FPN-int8 - Parallel         ms          46.7559     47.5860     47.9147     47.9502
Faster R-CNN R-50-FPN-int8 - Parallel         IPS         21.3852     21.0124     20.8684     20.8534
fcn-resnet101-11 - Parallel                   ms          1266.70     1253.22     1267.89     1257.91
fcn-resnet101-11 - Parallel                   IPS         0.789445    0.798150    0.788713    0.795279
GPT-2 - Parallel                              ms          9.36878     10.9739     12.2602     12.5484
GPT-2 - Parallel                              IPS         106.626     91.0383     81.4970     79.6284
GPT-2 - Standard                              ms          11.7589     13.3654     15.3358     15.6783
GPT-2 - Standard                              IPS         84.988      74.7827     65.1860     63.7474
yolov4 - Parallel                             ms          292.555     302.645     302.824     307.339
yolov4 - Parallel                             IPS         3.41805     3.30419     3.30267     3.25386
fcn-resnet101-11 - Standard                   ms          335.413     342.798     342.953     344.390
fcn-resnet101-11 - Standard                   IPS         2.98135     2.91718     2.91581     2.90364
yolov4 - Standard                             ms          166.968     170.353     171.549     172.083
yolov4 - Standard                             IPS         5.98899     5.87005     5.82904     5.81121
bertsquad-12 - Standard                       ms          115.028     115.771     116.484     117.851
bertsquad-12 - Standard                       IPS         8.69295     8.63703     8.58440     8.48632
ArcFace ResNet-100 - Parallel                 ms          133.782     136.658     138.664     139.512
ArcFace ResNet-100 - Parallel                 IPS         7.4745      7.31859     7.21143     7.16765
T5 Encoder - Parallel                         ms          6.2888      7.56668     8.34118     8.62179
T5 Encoder - Parallel                         IPS         158.943     132.114     119.847     115.971
ArcFace ResNet-100 - Standard                 ms          64.313      64.7543     64.8772     64.8208
ArcFace ResNet-100 - Standard                 IPS         15.5479     15.4415     15.4128     15.4263
T5 Encoder - Standard                         ms          9.84382     11.5448     13.1389     13.5336
T5 Encoder - Standard                         IPS         101.553     86.5898     76.0924     73.8745
ResNet50 v1-12-int8 - Parallel                ms          20.7291     20.9515     21.0075     20.7765
ResNet50 v1-12-int8 - Parallel                IPS         48.2263     47.7198     47.5911     48.1204
ResNet50 v1-12-int8 - Standard                ms          8.38855     8.46928     8.54047     8.50747
ResNet50 v1-12-int8 - Standard                IPS         119.152     118.037     117.048     117.504
super-resolution-10 - Parallel                ms          11.8559     11.8641     11.9310     11.8946
super-resolution-10 - Parallel                IPS         84.3254     84.2686     83.7961     84.0513
super-resolution-10 - Standard                ms          11.1599     11.2692     11.2341     11.2437
super-resolution-10 - Standard                IPS         89.5931     88.7364     89.0029     88.9273

Common build note for all ONNX Runtime results below: 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
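The Executor column distinguishes ONNX Runtime's two execution modes: Standard runs graph nodes sequentially (parallelism comes only from within individual operators), while Parallel also lets independent nodes execute concurrently. The benchmark drives ONNX Runtime's native tooling, but the setting being compared corresponds to SessionOptions.execution_mode in the Python API; a minimal sketch, assuming a placeholder model.onnx with a fixed-shape float32 input:

    # Illustrative only: select ONNX Runtime's "Parallel" vs sequential ("Standard")
    # execution mode via SessionOptions. "model.onnx" is a placeholder path.
    import numpy as np
    import onnxruntime as ort

    def make_session(model_path: str, parallel: bool) -> ort.InferenceSession:
        opts = ort.SessionOptions()
        opts.execution_mode = (
            ort.ExecutionMode.ORT_PARALLEL if parallel else ort.ExecutionMode.ORT_SEQUENTIAL
        )
        return ort.InferenceSession(model_path, sess_options=opts,
                                    providers=["CPUExecutionProvider"])

    sess = make_session("model.onnx", parallel=True)   # placeholder model file
    inp = sess.get_inputs()[0]
    # Feed zeros of the declared input shape (dynamic dimensions fall back to 1).
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    out = sess.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})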
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 5.00960, b = 5.11408, c = 5.06250, d = 5.02241 (SE +/- 0.05402, N = 5; SE +/- 0.03696, N = 15; SE +/- 0.01436, N = 3)
  Inferences Per Second, more is better: a = 199.46, b = 195.48, c = 197.53, d = 198.97 (SE +/- 2.06, N = 5; SE +/- 1.45, N = 15; SE +/- 0.56, N = 3)
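Each benchmark below is reported twice: mean inference latency in milliseconds (fewer is better) and throughput in inferences per second (more is better). For these single-stream runs the two figures are roughly reciprocal; for run a above, 1000 / 5.0096 ms is about 199.6 inferences/s against the 199.46 reported. A one-line check:

    # Sanity check: latency (ms per inference) and throughput (inferences/s)
    # are approximately reciprocal for these single-stream results.
    latency_ms = 5.0096                                # run a, CaffeNet 12-int8, CPU, Parallel
    print(f"{1000.0 / latency_ms:.1f} inferences/s")   # ~199.6 vs. 199.46 reported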
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 40.32, b = 39.10, c = 38.66, d = 39.32 (SE +/- 0.19, N = 3; SE +/- 0.20, N = 3; SE +/- 0.34, N = 9)
  Inferences Per Second, more is better: a = 24.80, b = 25.57, c = 25.87, d = 25.44 (SE +/- 0.13, N = 3; SE +/- 0.13, N = 3; SE +/- 0.21, N = 9)
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 218.99, b = 225.96, c = 227.64, d = 226.82 (SE +/- 0.51, N = 3; SE +/- 1.88, N = 9; SE +/- 1.30, N = 3)
  Inferences Per Second, more is better: a = 4.56634, b = 4.42556, c = 4.39522, d = 4.40892 (SE +/- 0.01003, N = 3; SE +/- 0.03613, N = 9; SE +/- 0.02513, N = 3)
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 3.81235, b = 4.31444, c = 4.46068, d = 4.41593 (SE +/- 0.04592, N = 4; SE +/- 0.00933, N = 3; SE +/- 0.01606, N = 3)
  Inferences Per Second, more is better: a = 262.07, b = 231.65, c = 223.99, d = 226.28 (SE +/- 2.43, N = 4; SE +/- 0.47, N = 3; SE +/- 0.82, N = 3)
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 46.76, b = 47.59, c = 47.91, d = 47.95 (SE +/- 0.06, N = 3; SE +/- 0.12, N = 3; SE +/- 0.20, N = 3)
  Inferences Per Second, more is better: a = 21.39, b = 21.01, c = 20.87, d = 20.85 (SE +/- 0.02, N = 3; SE +/- 0.05, N = 3; SE +/- 0.08, N = 3)
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 1266.70, b = 1253.22, c = 1267.89, d = 1257.91 (SE +/- 14.56, N = 3; SE +/- 2.19, N = 3; SE +/- 17.69, N = 3)
  Inferences Per Second, more is better: a = 0.789445, b = 0.798150, c = 0.788713, d = 0.795279 (SE +/- 0.009251, N = 3; SE +/- 0.001362, N = 3; SE +/- 0.011220, N = 3)
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 9.36878, b = 10.97390, c = 12.26020, d = 12.54840 (SE +/- 0.03121, N = 3; SE +/- 0.02097, N = 3; SE +/- 0.03777, N = 3)
  Inferences Per Second, more is better: a = 106.63, b = 91.04, c = 81.50, d = 79.63 (SE +/- 0.26, N = 3; SE +/- 0.14, N = 3; SE +/- 0.24, N = 3)
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 11.76, b = 13.37, c = 15.34, d = 15.68 (SE +/- 0.11, N = 3; SE +/- 0.17, N = 3; SE +/- 0.05, N = 3)
  Inferences Per Second, more is better: a = 84.99, b = 74.78, c = 65.19, d = 63.75 (SE +/- 0.62, N = 3; SE +/- 0.72, N = 3; SE +/- 0.21, N = 3)
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 292.56, b = 302.65, c = 302.82, d = 307.34 (SE +/- 1.22, N = 3; SE +/- 2.72, N = 3; SE +/- 1.84, N = 3)
  Inferences Per Second, more is better: a = 3.41805, b = 3.30419, c = 3.30267, d = 3.25386 (SE +/- 0.01341, N = 3; SE +/- 0.02977, N = 3; SE +/- 0.01952, N = 3)
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 335.41, b = 342.80, c = 342.95, d = 344.39 (SE +/- 1.06, N = 3; SE +/- 0.21, N = 3; SE +/- 0.50, N = 3)
  Inferences Per Second, more is better: a = 2.98135, b = 2.91718, c = 2.91581, d = 2.90364 (SE +/- 0.00899, N = 3; SE +/- 0.00183, N = 3; SE +/- 0.00422, N = 3)
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 166.97, b = 170.35, c = 171.55, d = 172.08 (SE +/- 0.41, N = 3; SE +/- 0.12, N = 3; SE +/- 0.80, N = 3)
  Inferences Per Second, more is better: a = 5.98899, b = 5.87005, c = 5.82904, d = 5.81121 (SE +/- 0.01409, N = 3; SE +/- 0.00419, N = 3; SE +/- 0.02691, N = 3)
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 115.03, b = 115.77, c = 116.48, d = 117.85 (SE +/- 0.17, N = 3; SE +/- 0.20, N = 3; SE +/- 1.11, N = 3)
  Inferences Per Second, more is better: a = 8.69295, b = 8.63703, c = 8.58440, d = 8.48632 (SE +/- 0.01301, N = 3; SE +/- 0.01460, N = 3; SE +/- 0.07957, N = 3)
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 133.78, b = 136.66, c = 138.66, d = 139.51 (SE +/- 1.33, N = 3; SE +/- 0.38, N = 3; SE +/- 0.50, N = 3)
  Inferences Per Second, more is better: a = 7.47450, b = 7.31859, c = 7.21143, d = 7.16765 (SE +/- 0.07142, N = 3; SE +/- 0.01975, N = 3; SE +/- 0.02570, N = 3)
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 6.28880, b = 7.56668, c = 8.34118, d = 8.62179 (SE +/- 0.04827, N = 3; SE +/- 0.00537, N = 3; SE +/- 0.09288, N = 3)
  Inferences Per Second, more is better: a = 158.94, b = 132.11, c = 119.85, d = 115.97 (SE +/- 0.84, N = 3; SE +/- 0.08, N = 3; SE +/- 1.26, N = 3)
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 64.31, b = 64.75, c = 64.88, d = 64.82 (SE +/- 0.09, N = 3; SE +/- 0.12, N = 3; SE +/- 0.19, N = 3)
  Inferences Per Second, more is better: a = 15.55, b = 15.44, c = 15.41, d = 15.43 (SE +/- 0.02, N = 3; SE +/- 0.03, N = 3; SE +/- 0.05, N = 3)
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 9.84382, b = 11.54480, c = 13.13890, d = 13.53360 (SE +/- 0.00918, N = 3; SE +/- 0.03970, N = 3; SE +/- 0.07065, N = 3)
  Inferences Per Second, more is better: a = 101.55, b = 86.59, c = 76.09, d = 73.87 (SE +/- 0.07, N = 3; SE +/- 0.23, N = 3; SE +/- 0.39, N = 3)
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 20.73, b = 20.95, c = 21.01, d = 20.78 (SE +/- 0.11, N = 3; SE +/- 0.09, N = 3; SE +/- 0.04, N = 3)
  Inferences Per Second, more is better: a = 48.23, b = 47.72, c = 47.59, d = 48.12 (SE +/- 0.26, N = 3; SE +/- 0.21, N = 3; SE +/- 0.10, N = 3)
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 8.38855, b = 8.46928, c = 8.54047, d = 8.50747 (SE +/- 0.05653, N = 3; SE +/- 0.02479, N = 3; SE +/- 0.03634, N = 3)
  Inferences Per Second, more is better: a = 119.15, b = 118.04, c = 117.05, d = 117.50 (SE +/- 0.78, N = 3; SE +/- 0.34, N = 3; SE +/- 0.50, N = 3)
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better: a = 11.86, b = 11.86, c = 11.93, d = 11.89 (SE +/- 0.03, N = 3; SE +/- 0.00, N = 3; SE +/- 0.02, N = 3)
  Inferences Per Second, more is better: a = 84.33, b = 84.27, c = 83.80, d = 84.05 (SE +/- 0.23, N = 3; SE +/- 0.03, N = 3; SE +/- 0.11, N = 3)
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better: a = 11.16, b = 11.27, c = 11.23, d = 11.24 (SE +/- 0.10, N = 3; SE +/- 0.04, N = 3; SE +/- 0.04, N = 3)
  Inferences Per Second, more is better: a = 89.59, b = 88.74, c = 89.00, d = 88.93 (SE +/- 0.77, N = 3; SE +/- 0.28, N = 3; SE +/- 0.31, N = 3)
Phoronix Test Suite v10.8.5