zstd onnx runtime raptor lake: Intel Core i9-13900K testing with an ASUS PRIME Z790-P WIFI (0602 BIOS) and AMD Radeon RX 6800/6800 XT / 6900 graphics on Ubuntu 23.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2302118-NE-ZSTDONNXR37
System Details (identical configuration across runs a, b, c, d)

Processor: Intel Core i9-13900K @ 4.00GHz (24 Cores / 32 Threads)
Motherboard: ASUS PRIME Z790-P WIFI (0602 BIOS)
Chipset: Intel Device 7a27
Memory: 32GB
Disk: 1000GB Western Digital WDS100T1X0E-00AFY0
Graphics: AMD Radeon RX 6800/6800 XT / 6900 (2475/1000MHz)
Audio: Realtek ALC897
Monitor: ASUS VP28U
Network: Intel Device 7a70
OS: Ubuntu 23.04
Kernel: 5.19.0-21-generic (x86_64)
Desktop: GNOME Shell 43.2
Display Server: X Server + Wayland
Compiler: GCC 12.2.0
File-System: ext4
Screen Resolution: 3840x2160
OpenGL: 4.6 Mesa 22.2.5 (LLVM 15.0.6 DRM 3.47)

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-AKimc9/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: intel_pstate powersave (EPP: balance_performance); CPU Microcode: 0x10e; Thermald 2.5.2
Python Details: Python 3.11.1
Security Details: itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence; srbds: Not affected; tsx_async_abort: Not affected
Result Summary (runs a | b | c | d)

Zstd Compression 1.5.4 (MB/s):
compress-zstd: 3 - Compression Speed | 3282.8 | 3337.1 | 3351.9 | 3303.8
compress-zstd: 3 - Decompression Speed | 2320.9 | 2319.0 | 2316.5 | 2320.3
compress-zstd: 8 - Compression Speed | 768.5 | 768.1 | 770.7 | 770.6
compress-zstd: 8 - Decompression Speed | 2544.0 | 2544.5 | 2544.5 | 2538.4
compress-zstd: 12 - Compression Speed | 257.6 | 255.2 | 254.9 | 254.3
compress-zstd: 12 - Decompression Speed | 2527.1 | 2534.8 | 2533.2 | 2524.2
compress-zstd: 19 - Compression Speed | 20.6 | 20.6 | 20.6 | 20.6
compress-zstd: 19 - Decompression Speed | 2177.4 | 2173.8 | 2173.5 | 2174.9
compress-zstd: 3, Long Mode - Compression Speed | 1208.6 | 1180.7 | 1173.2 | 1166.8
compress-zstd: 3, Long Mode - Decompression Speed | 2355.6 | 2354.4 | 2354.1 | 2351.0
compress-zstd: 8, Long Mode - Compression Speed | 716.6 | 716.3 | 729.0 | 721.1
compress-zstd: 8, Long Mode - Decompression Speed | 2550.3 | 2549.1 | 2547.5 | 2545.3
compress-zstd: 19, Long Mode - Compression Speed | 11.5 | 11.5 | 11.5 | 11.5
compress-zstd: 19, Long Mode - Decompression Speed | 2089.3 | 2087.4 | 2093.5 | 2091.5

ONNX Runtime 1.14 (each model/executor reports Inferences Per Second, then Inference Time Cost in ms):
onnx: GPT-2 - CPU - Parallel (inf/s) | 136.268 | 140.148 | 138.427 | 143.703
onnx: GPT-2 - CPU - Parallel (ms) | 7.33381 | 7.13149 | 7.21976 | 6.95506
onnx: GPT-2 - CPU - Standard (inf/s) | 179.346 | 181.731 | 179.114 | 178.588
onnx: GPT-2 - CPU - Standard (ms) | 5.57297 | 5.50076 | 5.58038 | 5.59669
onnx: yolov4 - CPU - Parallel (inf/s) | 15.1539 | 15.3524 | 15.5423 | 15.6649
onnx: yolov4 - CPU - Parallel (ms) | 65.9875 | 65.1514 | 64.3384 | 63.8345
onnx: yolov4 - CPU - Standard (inf/s) | 16.3677 | 16.1776 | 17.4667 | 16.0208
onnx: yolov4 - CPU - Standard (ms) | 61.0943 | 61.8791 | 57.2505 | 62.4174
onnx: bertsquad-12 - CPU - Parallel (inf/s) | 20.2501 | 20.1574 | 20.1612 | 20.3158
onnx: bertsquad-12 - CPU - Parallel (ms) | 49.3807 | 49.6100 | 49.599 | 49.2215
onnx: bertsquad-12 - CPU - Standard (inf/s) | 20.9839 | 21.2196 | 21.0028 | 21.11
onnx: bertsquad-12 - CPU - Standard (ms) | 47.6544 | 47.1291 | 47.6113 | 47.3696
onnx: CaffeNet 12-int8 - CPU - Parallel (inf/s) | 307.253 | 306.758 | 303.328 | 306.918
onnx: CaffeNet 12-int8 - CPU - Parallel (ms) | 3.25366 | 3.25890 | 3.29575 | 3.25716
onnx: CaffeNet 12-int8 - CPU - Standard (inf/s) | 968.274 | 962.233 | 958.033 | 1003.24
onnx: CaffeNet 12-int8 - CPU - Standard (ms) | 1.03194 | 1.03846 | 1.04295 | 0.996063
onnx: fcn-resnet101-11 - CPU - Parallel (inf/s) | 2.33535 | 2.35087 | 2.27124 | 2.31074
onnx: fcn-resnet101-11 - CPU - Parallel (ms) | 428.199 | 425.424 | 440.287 | 432.76
onnx: fcn-resnet101-11 - CPU - Standard (inf/s) | 3.03777 | 3.03000 | 3.03244 | 3.02577
onnx: fcn-resnet101-11 - CPU - Standard (ms) | 329.187 | 330.034 | 329.766 | 330.493
onnx: ArcFace ResNet-100 - CPU - Parallel (inf/s) | 9.60869 | 10.1666 | 10.3221 | 10.5553
onnx: ArcFace ResNet-100 - CPU - Parallel (ms) | 104.071 | 98.3894 | 96.879 | 94.7374
onnx: ArcFace ResNet-100 - CPU - Standard (inf/s) | 9.42226 | 9.39511 | 9.42361 | 9.38318
onnx: ArcFace ResNet-100 - CPU - Standard (ms) | 106.131 | 106.439 | 106.116 | 106.573
onnx: ResNet50 v1-12-int8 - CPU - Parallel (inf/s) | 225.126 | 223.444 | 228.077 | 225.858
onnx: ResNet50 v1-12-int8 - CPU - Parallel (ms) | 4.44106 | 4.47496 | 4.38369 | 4.42664
onnx: ResNet50 v1-12-int8 - CPU - Standard (inf/s) | 307.673 | 314.390 | 322.408 | 320.3
onnx: ResNet50 v1-12-int8 - CPU - Standard (ms) | 3.24947 | 3.18126 | 3.10117 | 3.12147
onnx: super-resolution-10 - CPU - Parallel (inf/s) | 91.8196 | 91.0069 | 91.0558 | 95.6224
onnx: super-resolution-10 - CPU - Parallel (ms) | 10.8901 | 10.9883 | 10.9816 | 10.457
onnx: super-resolution-10 - CPU - Standard (inf/s) | 93.2845 | 93.1303 | 93.2817 | 93.2876
onnx: super-resolution-10 - CPU - Standard (ms) | 10.7196 | 10.7374 | 10.72 | 10.7193
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel (inf/s) | 48.7596 | 49.2392 | 49.0429 | 48.9069
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel (ms) | 20.5066 | 20.3072 | 20.3885 | 20.4455
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (inf/s) | 60.1419 | 63.3598 | 63.1179 | 59.2699
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (ms) | 16.6259 | 15.7825 | 15.8423 | 16.8704
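Since runs a through d use identical hardware and software, spread between them reflects run-to-run noise rather than configuration differences. As a minimal Python sketch (not part of the Phoronix Test Suite), one might quantify that spread for a single metric, here the Zstd level-3 compression speeds from the table above:

from statistics import mean, stdev

# Zstd level-3 compression speeds (MB/s) for runs a-d, copied from the
# summary table above.
runs = {"a": 3282.8, "b": 3337.1, "c": 3351.9, "d": 3303.8}

values = list(runs.values())
spread = stdev(values) / mean(values) * 100  # relative standard deviation
print(f"mean = {mean(values):.1f} MB/s, run-to-run spread = {spread:.2f}%")
# -> roughly a 1% spread, i.e. the four runs agree closely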
Zstd Compression 1.5.4 (MB/s, more is better; all tests compiled with (CC) gcc options: -O3 -pthread -lz)

Compression Level: 3 - Compression Speed (SE +/- 26.62, N = 3): a: 3282.8 | b: 3337.1 | c: 3351.9 | d: 3303.8
Compression Level: 3 - Decompression Speed (SE +/- 0.85, N = 3): a: 2320.9 | b: 2319.0 | c: 2316.5 | d: 2320.3
Compression Level: 8 - Compression Speed (SE +/- 7.57, N = 3): a: 768.5 | b: 768.1 | c: 770.7 | d: 770.6
Compression Level: 8 - Decompression Speed (SE +/- 0.52, N = 3): a: 2544.0 | b: 2544.5 | c: 2544.5 | d: 2538.4
Compression Level: 12 - Compression Speed (SE +/- 0.49, N = 3): a: 257.6 | b: 255.2 | c: 254.9 | d: 254.3
Compression Level: 12 - Decompression Speed (SE +/- 0.71, N = 3): a: 2527.1 | b: 2534.8 | c: 2533.2 | d: 2524.2
Compression Level: 19 - Compression Speed (SE +/- 0.00, N = 3): a: 20.6 | b: 20.6 | c: 20.6 | d: 20.6
Compression Level: 19 - Decompression Speed (SE +/- 0.43, N = 3): a: 2177.4 | b: 2173.8 | c: 2173.5 | d: 2174.9
Compression Level: 3, Long Mode - Compression Speed (SE +/- 6.62, N = 3): a: 1208.6 | b: 1180.7 | c: 1173.2 | d: 1166.8
Compression Level: 3, Long Mode - Decompression Speed (SE +/- 0.45, N = 3): a: 2355.6 | b: 2354.4 | c: 2354.1 | d: 2351.0
Compression Level: 8, Long Mode - Compression Speed (SE +/- 0.94, N = 3): a: 716.6 | b: 716.3 | c: 729.0 | d: 721.1
Compression Level: 8, Long Mode - Decompression Speed (SE +/- 0.43, N = 3): a: 2550.3 | b: 2549.1 | c: 2547.5 | d: 2545.3
Compression Level: 19, Long Mode - Compression Speed (SE +/- 0.00, N = 3): a: 11.5 | b: 11.5 | c: 11.5 | d: 11.5
Compression Level: 19, Long Mode - Decompression Speed (SE +/- 2.22, N = 3): a: 2089.3 | b: 2087.4 | c: 2093.5 | d: 2091.5
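For context on what these Zstd numbers measure: the benchmark times bulk compression and decompression at a fixed level and reports throughput in MB/s. Below is a rough Python sketch of an equivalent measurement using the third-party python-zstandard bindings; the input path is a placeholder, and this will not match the tuned C harness above exactly.

import time
import zstandard  # third-party bindings ("pip install zstandard"), not stdlib

# Placeholder input file; the PTS test profile supplies its own corpus.
with open("testfile.bin", "rb") as f:
    data = f.read()

# Level 3, matching the first result group above.
cctx = zstandard.ZstdCompressor(level=3)
start = time.perf_counter()
compressed = cctx.compress(data)
elapsed = time.perf_counter() - start
print(f"compress:   {len(data) / elapsed / 1e6:.1f} MB/s, "
      f"ratio {len(data) / len(compressed):.2f}")

dctx = zstandard.ZstdDecompressor()
start = time.perf_counter()
restored = dctx.decompress(compressed)
elapsed = time.perf_counter() - start
print(f"decompress: {len(restored) / elapsed / 1e6:.1f} MB/s")

The "Long Mode" rows enable zstd's long-distance matching (the --long flag on the zstd CLI); mapping that option onto these bindings is omitted from the sketch.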
ONNX Runtime 1.14 (all tests compiled with (CXX) g++ options: -ffunction-sections -fdata-sections -march=native -mtune=native -O3 -flto=auto -fno-fat-lto-objects -ldl -lrt)

Each model/executor combination reports Inferences Per Second (more is better) and Inference Time Cost in ms (fewer is better).

Model: GPT-2 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.59, N = 3): a: 136.27 | b: 140.15 | c: 138.43 | d: 143.70
  Inference time, ms (SE +/- 0.03000, N = 3): a: 7.33381 | b: 7.13149 | c: 7.21976 | d: 6.95506
Model: GPT-2 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 1.09, N = 3): a: 179.35 | b: 181.73 | c: 179.11 | d: 178.59
  Inference time, ms (SE +/- 0.03285, N = 3): a: 5.57297 | b: 5.50076 | c: 5.58038 | d: 5.59669
Model: yolov4 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.18, N = 3): a: 15.15 | b: 15.35 | c: 15.54 | d: 15.66
  Inference time, ms (SE +/- 0.74, N = 3): a: 65.99 | b: 65.15 | c: 64.34 | d: 63.83
Model: yolov4 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.15, N = 15): a: 16.37 | b: 16.18 | c: 17.47 | d: 16.02
  Inference time, ms (SE +/- 0.53, N = 15): a: 61.09 | b: 61.88 | c: 57.25 | d: 62.42
Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.09, N = 3): a: 20.25 | b: 20.16 | c: 20.16 | d: 20.32
  Inference time, ms (SE +/- 0.22, N = 3): a: 49.38 | b: 49.61 | c: 49.60 | d: 49.22
Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.14, N = 3): a: 20.98 | b: 21.22 | c: 21.00 | d: 21.11
  Inference time, ms (SE +/- 0.30, N = 3): a: 47.65 | b: 47.13 | c: 47.61 | d: 47.37
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.45, N = 3): a: 307.25 | b: 306.76 | c: 303.33 | d: 306.92
  Inference time, ms (SE +/- 0.00481, N = 3): a: 3.25366 | b: 3.25890 | c: 3.29575 | d: 3.25716
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 1.19, N = 3): a: 968.27 | b: 962.23 | c: 958.03 | d: 1003.24
  Inference time, ms (SE +/- 0.001294, N = 3): a: 1.031940 | b: 1.038460 | c: 1.042950 | d: 0.996063
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.01857, N = 3): a: 2.33535 | b: 2.35087 | c: 2.27124 | d: 2.31074
  Inference time, ms (SE +/- 3.33, N = 3): a: 428.20 | b: 425.42 | c: 440.29 | d: 432.76
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.00583, N = 3): a: 3.03777 | b: 3.03000 | c: 3.03244 | d: 3.02577
  Inference time, ms (SE +/- 0.64, N = 3): a: 329.19 | b: 330.03 | c: 329.77 | d: 330.49
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.12519, N = 3): a: 9.60869 | b: 10.16660 | c: 10.32210 | d: 10.55530
  Inference time, ms (SE +/- 1.20, N = 3): a: 104.07 | b: 98.39 | c: 96.88 | d: 94.74
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.02522, N = 3): a: 9.42226 | b: 9.39511 | c: 9.42361 | d: 9.38318
  Inference time, ms (SE +/- 0.29, N = 3): a: 106.13 | b: 106.44 | c: 106.12 | d: 106.57
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 1.64, N = 3): a: 225.13 | b: 223.44 | c: 228.08 | d: 225.86
  Inference time, ms (SE +/- 0.03313, N = 3): a: 4.44106 | b: 4.47496 | c: 4.38369 | d: 4.42664
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 3.92, N = 3): a: 307.67 | b: 314.39 | c: 322.41 | d: 320.30
  Inference time, ms (SE +/- 0.04013, N = 3): a: 3.24947 | b: 3.18126 | c: 3.10117 | d: 3.12147
Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.56, N = 3): a: 91.82 | b: 91.01 | c: 91.06 | d: 95.62
  Inference time, ms (SE +/- 0.07, N = 3): a: 10.89 | b: 10.99 | c: 10.98 | d: 10.46
Model: super-resolution-10 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.16, N = 3): a: 93.28 | b: 93.13 | c: 93.28 | d: 93.29
  Inference time, ms (SE +/- 0.02, N = 3): a: 10.72 | b: 10.74 | c: 10.72 | d: 10.72
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
  Inferences/sec (SE +/- 0.02, N = 3): a: 48.76 | b: 49.24 | c: 49.04 | d: 48.91
  Inference time, ms (SE +/- 0.01, N = 3): a: 20.51 | b: 20.31 | c: 20.39 | d: 20.45
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
  Inferences/sec (SE +/- 0.34, N = 3): a: 60.14 | b: 63.36 | c: 63.12 | d: 59.27
  Inference time, ms (SE +/- 0.09, N = 3): a: 16.63 | b: 15.78 | c: 15.84 | d: 16.87
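The Parallel and Standard executor labels above map to ONNX Runtime's two execution modes. A minimal sketch of how a comparable CPU session could be set up with the onnxruntime Python package (the model path is a placeholder; the PTS harness drives the compiled C++ benchmark directly):

import onnxruntime as ort  # pip install onnxruntime

opts = ort.SessionOptions()
# "Executor: Parallel" in these results corresponds to ORT_PARALLEL,
# which may run independent graph branches concurrently;
# "Executor: Standard" corresponds to ORT_SEQUENTIAL.
opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL

sess = ort.InferenceSession(
    "model.onnx",                        # placeholder, e.g. a GPT-2 or yolov4 ONNX model
    sess_options=opts,
    providers=["CPUExecutionProvider"],  # CPU-only, as in these results
)

# The two reported metrics are reciprocal views of the same measurement:
# e.g. run a's GPT-2 Parallel result of 136.27 inferences/sec is
# 1000 / 136.27 ≈ 7.34 ms of inference time cost.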
Phoronix Test Suite v10.8.4