9950X onnx svt: AMD Ryzen 9 9950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS) motherboard and an AMD Radeon RX 7900 GRE 16GB graphics card on Ubuntu 24.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408225-NE-9950XONNX44&gru&sro.
System Details (identical across runs a, b, c, d, e):
  Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 32GB DDR5-6400MT/s Corsair CMK64GX5M2B6400C32
  Disk: 2000GB Corsair MP700 PRO
  Graphics: AMD Radeon RX 7900 GRE 16GB
  Audio: AMD Navi 31 HDMI/DP
  Monitor: DELL U2723QE
  Network: Intel I225-V + Intel Wi-Fi 6E
  OS: Ubuntu 24.04
  Kernel: 6.10.0-phx (x86_64)
  Desktop: GNOME Shell 46.0
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 24.2~git2406040600.8112d4~oibaf~n (git-8112d44 2024-06-04 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57)
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xb40401a
Python Details: Python 3.12.3
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
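The Security Details above mirror the Linux kernel's per-vulnerability reporting under sysfs. A minimal sketch of reading those fields directly (Linux-only; this is a common way to gather such data, not necessarily the exact mechanism the Phoronix Test Suite uses):

    # Read the kernel's CPU vulnerability status files (Linux-only).
    # Each file under this directory holds one line, e.g.
    # "spectre_v2: Mitigation of Enhanced / Automatic IBRS; ...".
    from pathlib import Path

    vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
    for entry in sorted(vuln_dir.iterdir()):
        print(f"{entry.name}: {entry.read_text().strip()}")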
Result Overview (values for runs a / b / c / d / e):

SVT-AV1 (Frames Per Second, more is better)
  Preset 3 - Bosphorus 4K: 11.814 / 11.821 / 11.794 / 11.818 / 11.744
  Preset 5 - Bosphorus 4K: 44.104 / 43.974 / 43.769 / 43.874 / 43.739
  Preset 8 - Bosphorus 4K: 98.657 / 99.255 / 99.222 / 98.242 / 98.06
  Preset 13 - Bosphorus 4K: 257.201 / 258.647 / 257.418 / 255.705 / 256.152
  Preset 3 - Bosphorus 1080p: 37.313 / 37.209 / 37.198 / 37.151 / 37.197
  Preset 5 - Bosphorus 1080p: 128.707 / 127.961 / 128.368 / 128.103 / 127.605
  Preset 8 - Bosphorus 1080p: 302.419 / 304.327 / 303.057 / 303.122 / 301.639
  Preset 13 - Bosphorus 1080p: 1009.640 / 1011.236 / 1012.768 / 1011.433 / 1010.763
  Preset 3 - Beauty 4K 10-bit: 1.760 / 1.758 / 1.76 / 1.759 / 1.759
  Preset 5 - Beauty 4K 10-bit: 7.574 / 7.588 / 7.583 / 7.54 / 7.541
  Preset 8 - Beauty 4K 10-bit: 10.593 / 10.609 / 10.527 / 10.525 / 10.737
  Preset 13 - Beauty 4K 10-bit: 19.400 / 19.404 / 19.419 / 19.488 / 19.487

ONNX Runtime, Device: CPU (Inferences Per Second, more is better)
  GPT-2 - Parallel: 150.865 / 151.245 / 149.566 / 150.289 / 151.197
  GPT-2 - Standard: 161.414 / 156.717 / 157.628 / 157.689 / 157.65
  yolov4 - Parallel: 7.71368 / 7.68899 / 8.03773 / 8.62837 / 7.82444
  yolov4 - Standard: 13.7840 / 13.8825 / 13.914 / 13.948 / 13.705
  ZFNet-512 - Parallel: 71.0823 / 70.7958 / 71.5953 / 70.5538 / 71.6359
  ZFNet-512 - Standard: 127.185 / 125.625 / 125.997 / 127.169 / 126.016
  T5 Encoder - Parallel: 190.549 / 190.482 / 187.735 / 190.604 / 191.361
  T5 Encoder - Standard: 190.624 / 192.676 / 192.823 / 183.018 / 193.253
  bertsquad-12 - Parallel: 11.7750 / 11.6016 / 11.9832 / 11.7088 / 11.7526
  bertsquad-12 - Standard: 25.1022 / 25.0524 / 25.0458 / 23.4098 / 25.0243
  CaffeNet 12-int8 - Parallel: 245.278 / 243.727 / 242.238 / 245.867 / 239.032
  CaffeNet 12-int8 - Standard: 851.244 / 852.418 / 855.082 / 854.139 / 861.654
  fcn-resnet101-11 - Parallel: 2.22498 / 2.25003 / 2.24795 / 2.2368 / 2.23767
  fcn-resnet101-11 - Standard: 4.95375 / 4.92894 / 4.98053 / 4.93277 / 4.93075
  ArcFace ResNet-100 - Parallel: 24.4794 / 24.4461 / 24.2813 / 24.7729 / 24.2594
  ArcFace ResNet-100 - Standard: 55.7411 / 56.0511 / 56.0119 / 56.034 / 55.9068
  ResNet50 v1-12-int8 - Parallel: 152.641 / 152.853 / 149.511 / 149.296 / 149.907
  ResNet50 v1-12-int8 - Standard: 578.940 / 573.425 / 571.193 / 574.317 / 575.415
  super-resolution-10 - Parallel: 221.936 / 220.47 / 214.475 / 223.209 / 217.917
  super-resolution-10 - Standard: 225.156 / 224.14 / 223.785 / 224.24 / 220.163
  ResNet101_DUC_HDC-12 - Parallel: 1.41808 / 1.41079 / 1.42067 / 1.40886 / 1.43691
  ResNet101_DUC_HDC-12 - Standard: 2.55959 / 2.55502 / 2.57815 / 2.55793 / 2.55063
  Faster R-CNN R-50-FPN-int8 - Parallel: 41.3330 / 41.3018 / 40.8941 / 41.0832 / 40.8234
  Faster R-CNN R-50-FPN-int8 - Standard: 61.4985 / 60.4839 / 62.198 / 62.1495 / 60.315

ONNX Runtime, Device: CPU (Inference Time Cost in ms, fewer is better)
  GPT-2 - Parallel: 6.62484 / 6.60878 / 6.6827 / 6.65046 / 6.61099
  GPT-2 - Standard: 6.19550 / 6.37911 / 6.34248 / 6.33978 / 6.34139
  yolov4 - Parallel: 129.644 / 130.124 / 124.41 / 115.894 / 127.801
  yolov4 - Standard: 72.5475 / 72.0326 / 71.8677 / 71.6925 / 72.9637
  ZFNet-512 - Parallel: 14.0673 / 14.1239 / 13.9657 / 14.1704 / 13.9577
  ZFNet-512 - Standard: 7.86110 / 7.95942 / 7.93481 / 7.86194 / 7.93399
  T5 Encoder - Parallel: 5.24748 / 5.24914 / 5.32595 / 5.2459 / 5.22498
  T5 Encoder - Standard: 5.24802 / 5.18948 / 5.18564 / 5.46322 / 5.17395
  bertsquad-12 - Parallel: 84.9286 / 86.1980 / 83.4468 / 85.4013 / 85.0842
  bertsquad-12 - Standard: 39.8348 / 39.9139 / 39.924 / 42.7142 / 39.9589
  CaffeNet 12-int8 - Parallel: 4.07620 / 4.1004 / 4.12702 / 4.06632 / 4.1821
  CaffeNet 12-int8 - Standard: 1.17405 / 1.17235 / 1.16879 / 1.17003 / 1.15972
  fcn-resnet101-11 - Parallel: 449.481 / 444.436 / 444.847 / 447.063 / 446.891
  fcn-resnet101-11 - Standard: 201.866 / 202.881 / 200.78 / 202.724 / 202.807
  ArcFace ResNet-100 - Parallel: 40.8495 / 40.9041 / 41.1806 / 40.3648 / 41.2192
  ArcFace ResNet-100 - Standard: 17.9386 / 17.8392 / 17.8517 / 17.8447 / 17.8852
  ResNet50 v1-12-int8 - Parallel: 6.55079 / 6.54126 / 6.68702 / 6.69723 / 6.66991
  ResNet50 v1-12-int8 - Standard: 1.72670 / 1.7433 / 1.74974 / 1.74031 / 1.7372
  super-resolution-10 - Parallel: 4.50538 / 4.53509 / 4.66209 / 4.4796 / 4.58833
  super-resolution-10 - Standard: 4.44116 / 4.4613 / 4.46838 / 4.45929 / 4.54187
  ResNet101_DUC_HDC-12 - Parallel: 705.209 / 708.821 / 703.891 / 709.79 / 695.937
  ResNet101_DUC_HDC-12 - Standard: 390.73 / 391.384 / 387.873 / 390.94 / 392.057
  Faster R-CNN R-50-FPN-int8 - Parallel: 24.1918 / 24.2099 / 24.4512 / 24.3388 / 24.4935
  Faster R-CNN R-50-FPN-int8 - Standard: 16.2609 / 16.5315 / 16.076 / 16.0884 / 16.578

Whisperfile (Seconds, fewer is better)
  Tiny: 29.63488 / 30.31093 / 30.03969 / 30.03582 / 30.37277
  Small: 127.97316 / 128.23627 / 129.62575 / 129.79648 / 129.38091
  Medium: 328.97184 / 327.12841 / 329.23828 / 328.28944 / 328.01325
SVT-AV1 2.2 detailed results (Frames Per Second, more is better). All SVT-AV1 binaries: (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
  Preset 3 - Bosphorus 4K: a: 11.81, b: 11.82, c: 11.79, d: 11.82, e: 11.74 (SE +/- 0.04, N = 3; SE +/- 0.00, N = 3)
  Preset 5 - Bosphorus 4K: a: 44.10, b: 43.97, c: 43.77, d: 43.87, e: 43.74 (SE +/- 0.06, N = 3; SE +/- 0.13, N = 3)
  Preset 8 - Bosphorus 4K: a: 98.66, b: 99.26, c: 99.22, d: 98.24, e: 98.06 (SE +/- 0.16, N = 3; SE +/- 0.23, N = 3)
  Preset 13 - Bosphorus 4K: a: 257.20, b: 258.65, c: 257.42, d: 255.71, e: 256.15 (SE +/- 0.27, N = 3; SE +/- 0.18, N = 3)
  Preset 3 - Bosphorus 1080p: a: 37.31, b: 37.21, c: 37.20, d: 37.15, e: 37.20 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
  Preset 5 - Bosphorus 1080p: a: 128.71, b: 127.96, c: 128.37, d: 128.10, e: 127.61 (SE +/- 0.11, N = 3; SE +/- 0.17, N = 3)
  Preset 8 - Bosphorus 1080p: a: 302.42, b: 304.33, c: 303.06, d: 303.12, e: 301.64 (SE +/- 0.64, N = 3; SE +/- 0.56, N = 3)
  Preset 13 - Bosphorus 1080p: a: 1009.64, b: 1011.24, c: 1012.77, d: 1011.43, e: 1010.76 (SE +/- 2.29, N = 3; SE +/- 1.10, N = 3)
  Preset 3 - Beauty 4K 10-bit: a: 1.760, b: 1.758, c: 1.760, d: 1.759, e: 1.759 (SE +/- 0.005, N = 3; SE +/- 0.003, N = 3)
  Preset 5 - Beauty 4K 10-bit: a: 7.574, b: 7.588, c: 7.583, d: 7.540, e: 7.541 (SE +/- 0.022, N = 3; SE +/- 0.052, N = 3)
  Preset 8 - Beauty 4K 10-bit: a: 10.59, b: 10.61, c: 10.53, d: 10.53, e: 10.74 (SE +/- 0.03, N = 3; SE +/- 0.04, N = 3)
  Preset 13 - Beauty 4K 10-bit: a: 19.40, b: 19.40, c: 19.42, d: 19.49, e: 19.49 (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
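Each result above carries a note of the form "SE +/- x, N = y", the standard error of the mean across N benchmark runs. A minimal sketch of that statistic under the usual definition (sample standard deviation over the square root of the run count); the three per-run FPS values below are hypothetical, not taken from this result file:

    # Standard error of the mean over N runs: stdev / sqrt(N).
    from statistics import mean, stdev

    runs = [11.77, 11.81, 11.85]          # hypothetical FPS from three runs
    se = stdev(runs) / len(runs) ** 0.5   # sample std dev divided by sqrt(N)
    print(f"{mean(runs):.2f} FPS, SE +/- {se:.2f}, N = {len(runs)}")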
ONNX Runtime 1.19 detailed results, Device: CPU (Inferences Per Second, more is better). All ONNX Runtime binaries: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
  GPT-2 - Parallel: a: 150.87, b: 151.25, c: 149.57, d: 150.29, e: 151.20 (SE +/- 0.19, N = 3; SE +/- 0.53, N = 3)
  GPT-2 - Standard: a: 161.41, b: 156.72, c: 157.63, d: 157.69, e: 157.65 (SE +/- 1.98, N = 3; SE +/- 0.41, N = 3)
  yolov4 - Parallel: a: 7.71368, b: 7.68899, c: 8.03773, d: 8.62837, e: 7.82444 (SE +/- 0.04367, N = 3; SE +/- 0.06383, N = 9)
  yolov4 - Standard: a: 13.78, b: 13.88, c: 13.91, d: 13.95, e: 13.71 (SE +/- 0.05, N = 3; SE +/- 0.05, N = 3)
  ZFNet-512 - Parallel: a: 71.08, b: 70.80, c: 71.60, d: 70.55, e: 71.64 (SE +/- 0.42, N = 3; SE +/- 0.36, N = 3)
  ZFNet-512 - Standard: a: 127.19, b: 125.63, c: 126.00, d: 127.17, e: 126.02 (SE +/- 0.57, N = 3; SE +/- 0.96, N = 3)
  T5 Encoder - Parallel: a: 190.55, b: 190.48, c: 187.74, d: 190.60, e: 191.36 (SE +/- 0.96, N = 3; SE +/- 0.37, N = 3)
  T5 Encoder - Standard: a: 190.62, b: 192.68, c: 192.82, d: 183.02, e: 193.25 (SE +/- 1.93, N = 6; SE +/- 0.25, N = 3)
  bertsquad-12 - Parallel: a: 11.78, b: 11.60, c: 11.98, d: 11.71, e: 11.75 (SE +/- 0.07, N = 3; SE +/- 0.07, N = 3)
  bertsquad-12 - Standard: a: 25.10, b: 25.05, c: 25.05, d: 23.41, e: 25.02 (SE +/- 0.04, N = 3)
  CaffeNet 12-int8 - Parallel: a: 245.28, b: 243.73, c: 242.24, d: 245.87, e: 239.03 (SE +/- 1.89, N = 3)
  CaffeNet 12-int8 - Standard: a: 851.24, b: 852.42, c: 855.08, d: 854.14, e: 861.65 (SE +/- 2.64, N = 3)
  fcn-resnet101-11 - Parallel: a: 2.22498, b: 2.25003, c: 2.24795, d: 2.23680, e: 2.23767 (SE +/- 0.01511, N = 3)
  fcn-resnet101-11 - Standard: a: 4.95375, b: 4.92894, c: 4.98053, d: 4.93277, e: 4.93075 (SE +/- 0.00675, N = 3)
  ArcFace ResNet-100 - Parallel: a: 24.48, b: 24.45, c: 24.28, d: 24.77, e: 24.26 (SE +/- 0.09, N = 3)
  ArcFace ResNet-100 - Standard: a: 55.74, b: 56.05, c: 56.01, d: 56.03, e: 55.91 (SE +/- 0.11, N = 3)
  ResNet50 v1-12-int8 - Parallel: a: 152.64, b: 152.85, c: 149.51, d: 149.30, e: 149.91 (SE +/- 0.85, N = 3)
  ResNet50 v1-12-int8 - Standard: a: 578.94, b: 573.43, c: 571.19, d: 574.32, e: 575.42 (SE +/- 2.14, N = 3)
  super-resolution-10 - Parallel: a: 221.94, b: 220.47, c: 214.48, d: 223.21, e: 217.92 (SE +/- 1.21, N = 3)
  super-resolution-10 - Standard: a: 225.16, b: 224.14, c: 223.79, d: 224.24, e: 220.16 (SE +/- 0.09, N = 3)
  ResNet101_DUC_HDC-12 - Parallel: a: 1.41808, b: 1.41079, c: 1.42067, d: 1.40886, e: 1.43691 (SE +/- 0.00694, N = 3)
  ResNet101_DUC_HDC-12 - Standard: a: 2.55959, b: 2.55502, c: 2.57815, d: 2.55793, e: 2.55063 (SE +/- 0.02001, N = 3)
  Faster R-CNN R-50-FPN-int8 - Parallel: a: 41.33, b: 41.30, c: 40.89, d: 41.08, e: 40.82 (SE +/- 0.10, N = 3)
  Faster R-CNN R-50-FPN-int8 - Standard: a: 61.50, b: 60.48, c: 62.20, d: 62.15, e: 60.32 (SE +/- 0.49, N = 3)
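The same ONNX Runtime workloads are reported below as per-inference latency (Inference Time Cost in ms, fewer is better). Throughput and latency are roughly reciprocal views of the same runs; a minimal sketch converting the run "a" GPT-2 Parallel throughput from the overview table:

    # Convert throughput (inferences/sec) to per-inference latency (ms).
    ips = 150.865               # run a, GPT-2, CPU, Parallel (inferences/sec)
    latency_ms = 1000.0 / ips   # milliseconds per inference
    print(f"{ips} inf/s ~= {latency_ms:.3f} ms/inference")  # ~6.628 ms
    # The table below reports 6.62484 ms for the same case; the small gap
    # suggests the two statistics are aggregated separately rather than
    # derived from one another.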
ONNX Runtime 1.19 detailed results, Device: CPU (Inference Time Cost in ms, fewer is better). All ONNX Runtime binaries: (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
  GPT-2 - Parallel: a: 6.62484, b: 6.60878, c: 6.68270, d: 6.65046, e: 6.61099 (SE +/- 0.00827, N = 3; SE +/- 0.02289, N = 3)
  GPT-2 - Standard: a: 6.19550, b: 6.37911, c: 6.34248, d: 6.33978, e: 6.34139 (SE +/- 0.07694, N = 3; SE +/- 0.01653, N = 3)
  yolov4 - Parallel: a: 129.64, b: 130.12, c: 124.41, d: 115.89, e: 127.80 (SE +/- 0.73, N = 3; SE +/- 1.08, N = 9)
  yolov4 - Standard: a: 72.55, b: 72.03, c: 71.87, d: 71.69, e: 72.96 (SE +/- 0.26, N = 3; SE +/- 0.27, N = 3)
  ZFNet-512 - Parallel: a: 14.07, b: 14.12, c: 13.97, d: 14.17, e: 13.96 (SE +/- 0.08, N = 3; SE +/- 0.07, N = 3)
  ZFNet-512 - Standard: a: 7.86110, b: 7.95942, c: 7.93481, d: 7.86194, e: 7.93399 (SE +/- 0.03518, N = 3; SE +/- 0.06114, N = 3)
  T5 Encoder - Parallel: a: 5.24748, b: 5.24914, c: 5.32595, d: 5.24590, e: 5.22498 (SE +/- 0.02621, N = 3; SE +/- 0.01022, N = 3)
  T5 Encoder - Standard: a: 5.24802, b: 5.18948, c: 5.18564, d: 5.46322, e: 5.17395 (SE +/- 0.05529, N = 6; SE +/- 0.00692, N = 3)
  bertsquad-12 - Parallel: a: 84.93, b: 86.20, c: 83.45, d: 85.40, e: 85.08 (SE +/- 0.50, N = 3; SE +/- 0.50, N = 3)
  bertsquad-12 - Standard: a: 39.83, b: 39.91, c: 39.92, d: 42.71, e: 39.96 (SE +/- 0.07, N = 3)
  CaffeNet 12-int8 - Parallel: a: 4.07620, b: 4.10040, c: 4.12702, d: 4.06632, e: 4.18210 (SE +/- 0.03127, N = 3)
  CaffeNet 12-int8 - Standard: a: 1.17405, b: 1.17235, c: 1.16879, d: 1.17003, e: 1.15972 (SE +/- 0.00364, N = 3)
  fcn-resnet101-11 - Parallel: a: 449.48, b: 444.44, c: 444.85, d: 447.06, e: 446.89 (SE +/- 3.07, N = 3)
  fcn-resnet101-11 - Standard: a: 201.87, b: 202.88, c: 200.78, d: 202.72, e: 202.81 (SE +/- 0.28, N = 3)
  ArcFace ResNet-100 - Parallel: a: 40.85, b: 40.90, c: 41.18, d: 40.36, e: 41.22 (SE +/- 0.15, N = 3)
  ArcFace ResNet-100 - Standard: a: 17.94, b: 17.84, c: 17.85, d: 17.84, e: 17.89 (SE +/- 0.04, N = 3)
  ResNet50 v1-12-int8 - Parallel: a: 6.55079, b: 6.54126, c: 6.68702, d: 6.69723, e: 6.66991 (SE +/- 0.03664, N = 3)
  ResNet50 v1-12-int8 - Standard: a: 1.72670, b: 1.74330, c: 1.74974, d: 1.74031, e: 1.73720 (SE +/- 0.00641, N = 3)
  super-resolution-10 - Parallel: a: 4.50538, b: 4.53509, c: 4.66209, d: 4.47960, e: 4.58833 (SE +/- 0.02467, N = 3)
  super-resolution-10 - Standard: a: 4.44116, b: 4.46130, c: 4.46838, d: 4.45929, e: 4.54187 (SE +/- 0.00178, N = 3)
  ResNet101_DUC_HDC-12 - Parallel: a: 705.21, b: 708.82, c: 703.89, d: 709.79, e: 695.94 (SE +/- 3.47, N = 3)
  ResNet101_DUC_HDC-12 - Standard: a: 390.73, b: 391.38, c: 387.87, d: 390.94, e: 392.06 (SE +/- 3.03, N = 3)
  Faster R-CNN R-50-FPN-int8 - Parallel: a: 24.19, b: 24.21, c: 24.45, d: 24.34, e: 24.49 (SE +/- 0.06, N = 3)
  Faster R-CNN R-50-FPN-int8 - Standard: a: 16.26, b: 16.53, c: 16.08, d: 16.09, e: 16.58 (SE +/- 0.13, N = 3)
Whisperfile 20Aug24 detailed results (Seconds, fewer is better)
  Model Size: Tiny: a: 29.63, b: 30.31, c: 30.04, d: 30.04, e: 30.37 (SE +/- 0.12, N = 3)
  Model Size: Small: a: 127.97, b: 128.24, c: 129.63, d: 129.80, e: 129.38 (SE +/- 0.21, N = 3)
  Model Size: Medium: a: 328.97, b: 327.13, c: 329.24, d: 328.29, e: 328.01 (SE +/- 0.20, N = 3)
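When OpenBenchmarking.org condenses a multi-test comparison like this one into a single composite figure, it typically uses a geometric mean, which keeps any single test from dominating the result. A minimal sketch, assuming hypothetical per-test ratios against a baseline run (values illustrative, not from this file):

    # Geometric mean of per-test ratios: the nth root of their product.
    from math import prod

    scores = [1.00, 1.02, 0.97]                 # hypothetical "more is better" ratios vs. a baseline
    gmean = prod(scores) ** (1 / len(scores))   # nth root of the product
    print(f"geometric mean ratio: {gmean:.3f}")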
Phoronix Test Suite v10.8.5