9950X onnx svt
AMD Ryzen 9 9950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS) and AMD Radeon RX 7900 GRE 16GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408225-NE-9950XONNX44&grt&sor.
System Details (identical for all five runs, a through e)

Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32GB DDR5-6400MT/s Corsair CMK64GX5M2B6400C32
Disk: 2000GB Corsair MP700 PRO
Graphics: AMD Radeon RX 7900 GRE 16GB
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6E
OS: Ubuntu 24.04
Kernel: 6.10.0-phx (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.2~git2406040600.8112d4~oibaf~n (git-8112d44 2024-06-04 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xb40401a
Python Details: Python 3.12.3
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Result Overview (values listed in run order a / b / c / d / e)

ONNX Runtime 1.19 - Inferences Per Second (More Is Better) and Inference Time Cost in ms (Fewer Is Better):
GPT-2 - CPU - Parallel: 150.865 / 151.245 / 149.566 / 150.289 / 151.197 inf/s; 6.62484 / 6.60878 / 6.6827 / 6.65046 / 6.61099 ms
GPT-2 - CPU - Standard: 161.414 / 156.717 / 157.628 / 157.689 / 157.65 inf/s; 6.19550 / 6.37911 / 6.34248 / 6.33978 / 6.34139 ms
yolov4 - CPU - Parallel: 7.71368 / 7.68899 / 8.03773 / 8.62837 / 7.82444 inf/s; 129.644 / 130.124 / 124.41 / 115.894 / 127.801 ms
yolov4 - CPU - Standard: 13.7840 / 13.8825 / 13.914 / 13.948 / 13.705 inf/s; 72.5475 / 72.0326 / 71.8677 / 71.6925 / 72.9637 ms
ZFNet-512 - CPU - Parallel: 71.0823 / 70.7958 / 71.5953 / 70.5538 / 71.6359 inf/s; 14.0673 / 14.1239 / 13.9657 / 14.1704 / 13.9577 ms
ZFNet-512 - CPU - Standard: 127.185 / 125.625 / 125.997 / 127.169 / 126.016 inf/s; 7.86110 / 7.95942 / 7.93481 / 7.86194 / 7.93399 ms
T5 Encoder - CPU - Parallel: 190.549 / 190.482 / 187.735 / 190.604 / 191.361 inf/s; 5.24748 / 5.24914 / 5.32595 / 5.2459 / 5.22498 ms
T5 Encoder - CPU - Standard: 190.624 / 192.676 / 192.823 / 183.018 / 193.253 inf/s; 5.24802 / 5.18948 / 5.18564 / 5.46322 / 5.17395 ms
bertsquad-12 - CPU - Parallel: 11.7750 / 11.6016 / 11.9832 / 11.7088 / 11.7526 inf/s; 84.9286 / 86.1980 / 83.4468 / 85.4013 / 85.0842 ms
bertsquad-12 - CPU - Standard: 25.1022 / 25.0524 / 25.0458 / 23.4098 / 25.0243 inf/s; 39.8348 / 39.9139 / 39.924 / 42.7142 / 39.9589 ms
CaffeNet 12-int8 - CPU - Parallel: 245.278 / 243.727 / 242.238 / 245.867 / 239.032 inf/s; 4.07620 / 4.1004 / 4.12702 / 4.06632 / 4.1821 ms
CaffeNet 12-int8 - CPU - Standard: 851.244 / 852.418 / 855.082 / 854.139 / 861.654 inf/s; 1.17405 / 1.17235 / 1.16879 / 1.17003 / 1.15972 ms
fcn-resnet101-11 - CPU - Parallel: 2.22498 / 2.25003 / 2.24795 / 2.2368 / 2.23767 inf/s; 449.481 / 444.436 / 444.847 / 447.063 / 446.891 ms
fcn-resnet101-11 - CPU - Standard: 4.95375 / 4.92894 / 4.98053 / 4.93277 / 4.93075 inf/s; 201.866 / 202.881 / 200.78 / 202.724 / 202.807 ms
ArcFace ResNet-100 - CPU - Parallel: 24.4794 / 24.4461 / 24.2813 / 24.7729 / 24.2594 inf/s; 40.8495 / 40.9041 / 41.1806 / 40.3648 / 41.2192 ms
ArcFace ResNet-100 - CPU - Standard: 55.7411 / 56.0511 / 56.0119 / 56.034 / 55.9068 inf/s; 17.9386 / 17.8392 / 17.8517 / 17.8447 / 17.8852 ms
ResNet50 v1-12-int8 - CPU - Parallel: 152.641 / 152.853 / 149.511 / 149.296 / 149.907 inf/s; 6.55079 / 6.54126 / 6.68702 / 6.69723 / 6.66991 ms
ResNet50 v1-12-int8 - CPU - Standard: 578.940 / 573.425 / 571.193 / 574.317 / 575.415 inf/s; 1.72670 / 1.7433 / 1.74974 / 1.74031 / 1.7372 ms
super-resolution-10 - CPU - Parallel: 221.936 / 220.47 / 214.475 / 223.209 / 217.917 inf/s; 4.50538 / 4.53509 / 4.66209 / 4.4796 / 4.58833 ms
super-resolution-10 - CPU - Standard: 225.156 / 224.14 / 223.785 / 224.24 / 220.163 inf/s; 4.44116 / 4.4613 / 4.46838 / 4.45929 / 4.54187 ms
ResNet101_DUC_HDC-12 - CPU - Parallel: 1.41808 / 1.41079 / 1.42067 / 1.40886 / 1.43691 inf/s; 705.209 / 708.821 / 703.891 / 709.79 / 695.937 ms
ResNet101_DUC_HDC-12 - CPU - Standard: 2.55959 / 2.55502 / 2.57815 / 2.55793 / 2.55063 inf/s; 390.73 / 391.384 / 387.873 / 390.94 / 392.057 ms
Faster R-CNN R-50-FPN-int8 - CPU - Parallel: 41.3330 / 41.3018 / 40.8941 / 41.0832 / 40.8234 inf/s; 24.1918 / 24.2099 / 24.4512 / 24.3388 / 24.4935 ms
Faster R-CNN R-50-FPN-int8 - CPU - Standard: 61.4985 / 60.4839 / 62.198 / 62.1495 / 60.315 inf/s; 16.2609 / 16.5315 / 16.076 / 16.0884 / 16.578 ms

SVT-AV1 2.2 - Frames Per Second (More Is Better):
Preset 3 - Bosphorus 4K: 11.814 / 11.821 / 11.794 / 11.818 / 11.744
Preset 5 - Bosphorus 4K: 44.104 / 43.974 / 43.769 / 43.874 / 43.739
Preset 8 - Bosphorus 4K: 98.657 / 99.255 / 99.222 / 98.242 / 98.06
Preset 13 - Bosphorus 4K: 257.201 / 258.647 / 257.418 / 255.705 / 256.152
Preset 3 - Bosphorus 1080p: 37.313 / 37.209 / 37.198 / 37.151 / 37.197
Preset 5 - Bosphorus 1080p: 128.707 / 127.961 / 128.368 / 128.103 / 127.605
Preset 8 - Bosphorus 1080p: 302.419 / 304.327 / 303.057 / 303.122 / 301.639
Preset 13 - Bosphorus 1080p: 1009.640 / 1011.236 / 1012.768 / 1011.433 / 1010.763
Preset 3 - Beauty 4K 10-bit: 1.760 / 1.758 / 1.76 / 1.759 / 1.759
Preset 5 - Beauty 4K 10-bit: 7.574 / 7.588 / 7.583 / 7.54 / 7.541
Preset 8 - Beauty 4K 10-bit: 10.593 / 10.609 / 10.527 / 10.525 / 10.737
Preset 13 - Beauty 4K 10-bit: 19.400 / 19.404 / 19.419 / 19.488 / 19.487

Whisperfile 20Aug24 - Seconds (Fewer Is Better):
Tiny: 29.63488 / 30.31093 / 30.03969 / 30.03582 / 30.37277
Small: 127.97316 / 128.23627 / 129.62575 / 129.79648 / 129.38091
Medium: 328.97184 / 327.12841 / 329.23828 / 328.28944 / 328.01325
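The ONNX Runtime results distinguish a "Parallel" and a "Standard" executor. In ONNX Runtime's Python API this corresponds to SessionOptions.execution_mode (ORT_PARALLEL versus ORT_SEQUENTIAL). The sketch below shows how such a CPU benchmark could be timed to yield the two metrics reported per test (inferences per second and ms per inference); it is a minimal approximation rather than the actual Phoronix test profile, and the model path and dummy input are placeholder assumptions.

import time
import numpy as np
import onnxruntime as ort

def bench(model_path, parallel, runs=100):
    so = ort.SessionOptions()
    # "Parallel" vs "Standard" executor as labeled in the results above.
    so.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                         else ort.ExecutionMode.ORT_SEQUENTIAL)
    sess = ort.InferenceSession(model_path, sess_options=so,
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Placeholder input: assumes a float32 model; symbolic dims become 1.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    feed = {inp.name: np.random.rand(*shape).astype(np.float32)}
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    elapsed = time.perf_counter() - start
    # The two metrics reported per test: throughput and per-inference latency.
    return runs / elapsed, 1000.0 * elapsed / runs

ips, ms = bench("model.onnx", parallel=True)   # "model.onnx" is hypothetical
print(f"{ips:.2f} inferences/sec, {ms:.5f} ms per inference")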
All ONNX Runtime 1.19 results below were compiled with: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): b 151.25, e 151.20, a 150.87, d 150.29, c 149.57 (SE +/- 0.53, N = 3; SE +/- 0.19, N = 3)
Inference Time Cost in ms (Fewer Is Better): b 6.60878, e 6.61099, a 6.62484, d 6.65046, c 6.68270 (SE +/- 0.02289, N = 3; SE +/- 0.00827, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): a 161.41, d 157.69, e 157.65, c 157.63, b 156.72 (SE +/- 1.98, N = 3; SE +/- 0.41, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 6.19550, d 6.33978, e 6.34139, c 6.34248, b 6.37911 (SE +/- 0.07694, N = 3; SE +/- 0.01653, N = 3)
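Every score in these charts is a mean over N runs with a standard error attached (the "SE +/- x, N = 3" labels). As a quick reference, the standard error of the mean is the sample standard deviation divided by the square root of N; the sketch below uses made-up run values, not numbers taken from this result file.

import math
import statistics

runs = [151.8, 150.9, 151.0]   # hypothetical per-run scores, N = 3
mean = statistics.mean(runs)
se = statistics.stdev(runs) / math.sqrt(len(runs))   # SE = stdev / sqrt(N)
print(f"{mean:.2f} +/- {se:.2f} (N = {len(runs)})")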
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): d 8.62837, c 8.03773, e 7.82444, a 7.71368, b 7.68899 (SE +/- 0.04367, N = 3; SE +/- 0.06383, N = 9)
Inference Time Cost in ms (Fewer Is Better): d 115.89, c 124.41, e 127.80, a 129.64, b 130.12 (SE +/- 0.73, N = 3; SE +/- 1.08, N = 9)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): d 13.95, c 13.91, b 13.88, a 13.78, e 13.71 (SE +/- 0.05, N = 3; SE +/- 0.05, N = 3)
Inference Time Cost in ms (Fewer Is Better): d 71.69, c 71.87, b 72.03, a 72.55, e 72.96 (SE +/- 0.27, N = 3; SE +/- 0.26, N = 3)
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): e 71.64, c 71.60, a 71.08, b 70.80, d 70.55 (SE +/- 0.42, N = 3; SE +/- 0.36, N = 3)
Inference Time Cost in ms (Fewer Is Better): e 13.96, c 13.97, a 14.07, b 14.12, d 14.17 (SE +/- 0.08, N = 3; SE +/- 0.07, N = 3)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): a 127.19, d 127.17, e 126.02, c 126.00, b 125.63 (SE +/- 0.57, N = 3; SE +/- 0.96, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 7.86110, d 7.86194, e 7.93399, c 7.93481, b 7.95942 (SE +/- 0.03518, N = 3; SE +/- 0.06114, N = 3)
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): e 191.36, d 190.60, a 190.55, b 190.48, c 187.74 (SE +/- 0.96, N = 3; SE +/- 0.37, N = 3)
Inference Time Cost in ms (Fewer Is Better): e 5.22498, d 5.24590, a 5.24748, b 5.24914, c 5.32595 (SE +/- 0.02621, N = 3; SE +/- 0.01022, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): e 193.25, c 192.82, b 192.68, a 190.62, d 183.02 (SE +/- 0.25, N = 3; SE +/- 1.93, N = 6)
Inference Time Cost in ms (Fewer Is Better): e 5.17395, c 5.18564, b 5.18948, a 5.24802, d 5.46322 (SE +/- 0.00692, N = 3; SE +/- 0.05529, N = 6)
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): c 11.98, a 11.78, e 11.75, d 11.71, b 11.60 (SE +/- 0.07, N = 3; SE +/- 0.07, N = 3)
Inference Time Cost in ms (Fewer Is Better): c 83.45, a 84.93, e 85.08, d 85.40, b 86.20 (SE +/- 0.50, N = 3; SE +/- 0.50, N = 3)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): a 25.10, b 25.05, c 25.05, e 25.02, d 23.41 (SE +/- 0.04, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 39.83, b 39.91, c 39.92, e 39.96, d 42.71 (SE +/- 0.07, N = 3)
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): d 245.87, a 245.28, b 243.73, c 242.24, e 239.03 (SE +/- 1.89, N = 3)
Inference Time Cost in ms (Fewer Is Better): d 4.06632, a 4.07620, b 4.10040, c 4.12702, e 4.18210 (SE +/- 0.03127, N = 3)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): e 861.65, c 855.08, d 854.14, b 852.42, a 851.24 (SE +/- 2.64, N = 3)
Inference Time Cost in ms (Fewer Is Better): e 1.15972, c 1.16879, d 1.17003, b 1.17235, a 1.17405 (SE +/- 0.00364, N = 3)
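The "-int8" models here and below (CaffeNet 12-int8, ResNet50 v1-12-int8, Faster R-CNN R-50-FPN-int8) are 8-bit quantized variants. For context, the sketch below shows one common way to produce an int8 ONNX model with ONNX Runtime's dynamic quantizer; it illustrates the technique in general, is not necessarily how these model zoo files were generated, and "model.onnx" is a placeholder name.

from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",        # placeholder fp32 source model
    model_output="model-int8.onnx",  # quantized copy written here
    weight_type=QuantType.QInt8,     # store weights as signed 8-bit
)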
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): b 2.25003, c 2.24795, e 2.23767, d 2.23680, a 2.22498 (SE +/- 0.01511, N = 3)
Inference Time Cost in ms (Fewer Is Better): b 444.44, c 444.85, e 446.89, d 447.06, a 449.48 (SE +/- 3.07, N = 3)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): c 4.98053, a 4.95375, d 4.93277, e 4.93075, b 4.92894 (SE +/- 0.00675, N = 3)
Inference Time Cost in ms (Fewer Is Better): c 200.78, a 201.87, d 202.72, e 202.81, b 202.88 (SE +/- 0.28, N = 3)
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): d 24.77, a 24.48, b 24.45, c 24.28, e 24.26 (SE +/- 0.09, N = 3)
Inference Time Cost in ms (Fewer Is Better): d 40.36, a 40.85, b 40.90, c 41.18, e 41.22 (SE +/- 0.15, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): b 56.05, d 56.03, c 56.01, e 55.91, a 55.74 (SE +/- 0.11, N = 3)
Inference Time Cost in ms (Fewer Is Better): b 17.84, d 17.84, c 17.85, e 17.89, a 17.94 (SE +/- 0.04, N = 3)
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): b 152.85, a 152.64, e 149.91, c 149.51, d 149.30 (SE +/- 0.85, N = 3)
Inference Time Cost in ms (Fewer Is Better): b 6.54126, a 6.55079, e 6.66991, c 6.68702, d 6.69723 (SE +/- 0.03664, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): a 578.94, e 575.42, d 574.32, b 573.43, c 571.19 (SE +/- 2.14, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 1.72670, e 1.73720, d 1.74031, b 1.74330, c 1.74974 (SE +/- 0.00641, N = 3)
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): d 223.21, a 221.94, b 220.47, e 217.92, c 214.48 (SE +/- 1.21, N = 3)
Inference Time Cost in ms (Fewer Is Better): d 4.47960, a 4.50538, b 4.53509, e 4.58833, c 4.66209 (SE +/- 0.02467, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): a 225.16, d 224.24, b 224.14, c 223.79, e 220.16 (SE +/- 0.09, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 4.44116, d 4.45929, b 4.46130, c 4.46838, e 4.54187 (SE +/- 0.00178, N = 3)
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): e 1.43691, c 1.42067, a 1.41808, b 1.41079, d 1.40886 (SE +/- 0.00694, N = 3)
Inference Time Cost in ms (Fewer Is Better): e 695.94, c 703.89, a 705.21, b 708.82, d 709.79 (SE +/- 3.47, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): c 2.57815, a 2.55959, d 2.55793, b 2.55502, e 2.55063 (SE +/- 0.02001, N = 3)
Inference Time Cost in ms (Fewer Is Better): c 387.87, a 390.73, d 390.94, b 391.38, e 392.06 (SE +/- 3.03, N = 3)
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second (More Is Better): a 41.33, b 41.30, d 41.08, c 40.89, e 40.82 (SE +/- 0.10, N = 3)
Inference Time Cost in ms (Fewer Is Better): a 24.19, b 24.21, d 24.34, c 24.45, e 24.49 (SE +/- 0.06, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second (More Is Better): c 62.20, d 62.15, a 61.50, b 60.48, e 60.32 (SE +/- 0.49, N = 3)
Inference Time Cost in ms (Fewer Is Better): c 16.08, d 16.09, a 16.26, b 16.53, e 16.58 (SE +/- 0.13, N = 3)
All SVT-AV1 2.2 results below were compiled with: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K
Frames Per Second (More Is Better): b 11.82, d 11.82, a 11.81, c 11.79, e 11.74 (SE +/- 0.00, N = 3; SE +/- 0.04, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K
Frames Per Second (More Is Better): a 44.10, b 43.97, d 43.87, c 43.77, e 43.74 (SE +/- 0.06, N = 3; SE +/- 0.13, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K
Frames Per Second (More Is Better): b 99.26, c 99.22, a 98.66, d 98.24, e 98.06 (SE +/- 0.23, N = 3; SE +/- 0.16, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K
Frames Per Second (More Is Better): b 258.65, c 257.42, a 257.20, e 256.15, d 255.71 (SE +/- 0.18, N = 3; SE +/- 0.27, N = 3)
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p
Frames Per Second (More Is Better): a 37.31, b 37.21, c 37.20, e 37.20, d 37.15 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p
Frames Per Second (More Is Better): a 128.71, c 128.37, d 128.10, b 127.96, e 127.61 (SE +/- 0.11, N = 3; SE +/- 0.17, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p
Frames Per Second (More Is Better): b 304.33, d 303.12, c 303.06, a 302.42, e 301.64 (SE +/- 0.56, N = 3; SE +/- 0.64, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p
Frames Per Second (More Is Better): c 1012.77, d 1011.43, b 1011.24, e 1010.76, a 1009.64 (SE +/- 1.10, N = 3; SE +/- 2.29, N = 3)
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit
Frames Per Second (More Is Better): c 1.760, a 1.760, e 1.759, d 1.759, b 1.758 (SE +/- 0.005, N = 3; SE +/- 0.003, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit
Frames Per Second (More Is Better): b 7.588, c 7.583, a 7.574, e 7.541, d 7.540 (SE +/- 0.052, N = 3; SE +/- 0.022, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit
Frames Per Second (More Is Better): e 10.74, b 10.61, a 10.59, c 10.53, d 10.53 (SE +/- 0.04, N = 3; SE +/- 0.03, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit
Frames Per Second (More Is Better): d 19.49, e 19.49, c 19.42, b 19.40, a 19.40 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
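SVT-AV1's preset scale (0 through 13) trades encode speed for compression efficiency, which is why Preset 3 lands near 1.8 FPS on Beauty 4K 10-bit while Preset 13 reaches roughly 19.5 FPS. The sketch below drives the stock SvtAv1EncApp CLI for one such run; the file names are placeholders and the actual Phoronix test profile may pass additional flags.

import subprocess

def encode(preset: int, src: str, out: str) -> None:
    # --preset selects the speed/efficiency trade-off (lower = slower).
    subprocess.run(
        ["SvtAv1EncApp", "--preset", str(preset), "-i", src, "-b", out],
        check=True,
    )

# Hypothetical input file; Bosphorus and Beauty are common test clips.
encode(13, "Bosphorus_3840x2160.y4m", "bosphorus_p13.ivf")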
Whisperfile 20Aug24 - Model Size: Tiny
Seconds (Fewer Is Better): a 29.63, d 30.04, c 30.04, b 30.31, e 30.37 (SE +/- 0.12, N = 3)

Whisperfile 20Aug24 - Model Size: Small
Seconds (Fewer Is Better): a 127.97, b 128.24, e 129.38, c 129.63, d 129.80 (SE +/- 0.21, N = 3)

Whisperfile 20Aug24 - Model Size: Medium
Seconds (Fewer Is Better): b 327.13, e 328.01, d 328.29, a 328.97, c 329.24 (SE +/- 0.20, N = 3)
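Whisperfile packages Whisper speech-to-text models as single self-contained executables, and these scores are end-to-end transcription times in seconds, growing with model size from Tiny to Medium. A rough sketch of timing such a run follows; the binary name, the -m/-f flags (borrowed from whisper.cpp conventions), and the audio file are all assumptions rather than details taken from the Phoronix test profile.

import subprocess
import time

start = time.perf_counter()
subprocess.run(
    ["./whisperfile", "-m", "ggml-tiny.bin", "-f", "speech.wav"],  # assumed CLI
    check=True,
)
print(f"transcription took {time.perf_counter() - start:.2f} seconds")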
Phoronix Test Suite v10.8.5