9950X onnx svt
AMD Ryzen 9 9950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS) and AMD Radeon RX 7900 GRE 16GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408225-NE-9950XONNX44&rdt&grs
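To rerun this comparison locally, the Phoronix Test Suite can benchmark a published OpenBenchmarking.org result ID directly. A minimal sketch in Python, assuming phoronix-test-suite is installed and on PATH (the result ID comes from the URL above):

    # Minimal sketch: re-run this published comparison with the Phoronix
    # Test Suite. Assumes the `phoronix-test-suite` CLI is installed.
    import subprocess

    RESULT_ID = "2408225-NE-9950XONNX44"  # taken from the URL above

    # `phoronix-test-suite benchmark <id>` fetches the test profiles used
    # in a public result and runs them for side-by-side comparison.
    subprocess.run(["phoronix-test-suite", "benchmark", RESULT_ID], check=True)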
System Details (identical for runs a-e):
Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (2204 BIOS)
Chipset: AMD Device 14d8
Memory: 2 x 32GB DDR5-6400MT/s Corsair CMK64GX5M2B6400C32
Disk: 2000GB Corsair MP700 PRO
Graphics: AMD Radeon RX 7900 GRE 16GB
Audio: AMD Navi 31 HDMI/DP
Monitor: DELL U2723QE
Network: Intel I225-V + Intel Wi-Fi 6E
OS: Ubuntu 24.04
Kernel: 6.10.0-phx (x86_64)
Desktop: GNOME Shell 46.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 24.2~git2406040600.8112d4~oibaf~n (git-8112d44 2024-06-04 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xb40401a
Python Details: Python 3.12.3
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; reg_file_data_sampling: Not affected; retbleed: Not affected; spec_rstack_overflow: Not affected; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected; srbds: Not affected; tsx_async_abort: Not affected
Result Overview (runs a-e on the same system)
Units: onnx = Inferences Per Second (more is better); svt-av1 = Frames Per Second (more is better); whisperfile = Seconds (fewer is better). The second group restates the ONNX Runtime tests as Inference Time Cost in ms (fewer is better).

Test | a | b | c | d | e
onnx: yolov4 - CPU - Parallel | 7.71368 | 7.68899 | 8.03773 | 8.62837 | 7.82444
onnx: bertsquad-12 - CPU - Standard | 25.1022 | 25.0524 | 25.0458 | 23.4098 | 25.0243
onnx: T5 Encoder - CPU - Standard | 190.624 | 192.676 | 192.823 | 183.018 | 193.253
onnx: super-resolution-10 - CPU - Parallel | 221.936 | 220.47 | 214.475 | 223.209 | 217.917
onnx: bertsquad-12 - CPU - Parallel | 11.7750 | 11.6016 | 11.9832 | 11.7088 | 11.7526
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | 61.4985 | 60.4839 | 62.198 | 62.1495 | 60.315
onnx: GPT-2 - CPU - Standard | 161.414 | 156.717 | 157.628 | 157.689 | 157.65
onnx: CaffeNet 12-int8 - CPU - Parallel | 245.278 | 243.727 | 242.238 | 245.867 | 239.032
whisperfile: Tiny | 29.63488 | 30.31093 | 30.03969 | 30.03582 | 30.37277
onnx: ResNet50 v1-12-int8 - CPU - Parallel | 152.641 | 152.853 | 149.511 | 149.296 | 149.907
onnx: super-resolution-10 - CPU - Standard | 225.156 | 224.14 | 223.785 | 224.24 | 220.163
onnx: ArcFace ResNet-100 - CPU - Parallel | 24.4794 | 24.4461 | 24.2813 | 24.7729 | 24.2594
svt-av1: Preset 8 - Beauty 4K 10-bit | 10.593 | 10.609 | 10.527 | 10.525 | 10.737
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | 1.41808 | 1.41079 | 1.42067 | 1.40886 | 1.43691
onnx: T5 Encoder - CPU - Parallel | 190.549 | 190.482 | 187.735 | 190.604 | 191.361
onnx: yolov4 - CPU - Standard | 13.7840 | 13.8825 | 13.914 | 13.948 | 13.705
onnx: ZFNet-512 - CPU - Parallel | 71.0823 | 70.7958 | 71.5953 | 70.5538 | 71.6359
whisperfile: Small | 127.97316 | 128.23627 | 129.62575 | 129.79648 | 129.38091
onnx: ResNet50 v1-12-int8 - CPU - Standard | 578.940 | 573.425 | 571.193 | 574.317 | 575.415
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel | 41.3330 | 41.3018 | 40.8941 | 41.0832 | 40.8234
onnx: ZFNet-512 - CPU - Standard | 127.185 | 125.625 | 125.997 | 127.169 | 126.016
onnx: CaffeNet 12-int8 - CPU - Standard | 851.244 | 852.418 | 855.082 | 854.139 | 861.654
svt-av1: Preset 8 - Bosphorus 4K | 98.657 | 99.255 | 99.222 | 98.242 | 98.06
svt-av1: Preset 13 - Bosphorus 4K | 257.201 | 258.647 | 257.418 | 255.705 | 256.152
onnx: fcn-resnet101-11 - CPU - Parallel | 2.22498 | 2.25003 | 2.24795 | 2.2368 | 2.23767
onnx: GPT-2 - CPU - Parallel | 150.865 | 151.245 | 149.566 | 150.289 | 151.197
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | 2.55959 | 2.55502 | 2.57815 | 2.55793 | 2.55063
onnx: fcn-resnet101-11 - CPU - Standard | 4.95375 | 4.92894 | 4.98053 | 4.93277 | 4.93075
svt-av1: Preset 8 - Bosphorus 1080p | 302.419 | 304.327 | 303.057 | 303.122 | 301.639
svt-av1: Preset 5 - Bosphorus 1080p | 128.707 | 127.961 | 128.368 | 128.103 | 127.605
svt-av1: Preset 5 - Bosphorus 4K | 44.104 | 43.974 | 43.769 | 43.874 | 43.739
svt-av1: Preset 3 - Bosphorus 4K | 11.814 | 11.821 | 11.794 | 11.818 | 11.744
whisperfile: Medium | 328.97184 | 327.12841 | 329.23828 | 328.28944 | 328.01325
svt-av1: Preset 5 - Beauty 4K 10-bit | 7.574 | 7.588 | 7.583 | 7.54 | 7.541
onnx: ArcFace ResNet-100 - CPU - Standard | 55.7411 | 56.0511 | 56.0119 | 56.034 | 55.9068
svt-av1: Preset 13 - Beauty 4K 10-bit | 19.400 | 19.404 | 19.419 | 19.488 | 19.487
svt-av1: Preset 3 - Bosphorus 1080p | 37.313 | 37.209 | 37.198 | 37.151 | 37.197
svt-av1: Preset 13 - Bosphorus 1080p | 1009.640 | 1011.236 | 1012.768 | 1011.433 | 1010.763
svt-av1: Preset 3 - Beauty 4K 10-bit | 1.760 | 1.758 | 1.76 | 1.759 | 1.759

ONNX Runtime inference time cost (ms, fewer is better):
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | 16.2609 | 16.5315 | 16.076 | 16.0884 | 16.578
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel | 24.1918 | 24.2099 | 24.4512 | 24.3388 | 24.4935
onnx: ResNet101_DUC_HDC-12 - CPU - Standard | 390.73 | 391.384 | 387.873 | 390.94 | 392.057
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel | 705.209 | 708.821 | 703.891 | 709.79 | 695.937
onnx: super-resolution-10 - CPU - Standard | 4.44116 | 4.4613 | 4.46838 | 4.45929 | 4.54187
onnx: super-resolution-10 - CPU - Parallel | 4.50538 | 4.53509 | 4.66209 | 4.4796 | 4.58833
onnx: ResNet50 v1-12-int8 - CPU - Standard | 1.72670 | 1.7433 | 1.74974 | 1.74031 | 1.7372
onnx: ResNet50 v1-12-int8 - CPU - Parallel | 6.55079 | 6.54126 | 6.68702 | 6.69723 | 6.66991
onnx: ArcFace ResNet-100 - CPU - Standard | 17.9386 | 17.8392 | 17.8517 | 17.8447 | 17.8852
onnx: ArcFace ResNet-100 - CPU - Parallel | 40.8495 | 40.9041 | 41.1806 | 40.3648 | 41.2192
onnx: fcn-resnet101-11 - CPU - Standard | 201.866 | 202.881 | 200.78 | 202.724 | 202.807
onnx: fcn-resnet101-11 - CPU - Parallel | 449.481 | 444.436 | 444.847 | 447.063 | 446.891
onnx: CaffeNet 12-int8 - CPU - Standard | 1.17405 | 1.17235 | 1.16879 | 1.17003 | 1.15972
onnx: CaffeNet 12-int8 - CPU - Parallel | 4.07620 | 4.1004 | 4.12702 | 4.06632 | 4.1821
onnx: bertsquad-12 - CPU - Standard | 39.8348 | 39.9139 | 39.924 | 42.7142 | 39.9589
onnx: bertsquad-12 - CPU - Parallel | 84.9286 | 86.1980 | 83.4468 | 85.4013 | 85.0842
onnx: T5 Encoder - CPU - Standard | 5.24802 | 5.18948 | 5.18564 | 5.46322 | 5.17395
onnx: T5 Encoder - CPU - Parallel | 5.24748 | 5.24914 | 5.32595 | 5.2459 | 5.22498
onnx: ZFNet-512 - CPU - Standard | 7.86110 | 7.95942 | 7.93481 | 7.86194 | 7.93399
onnx: ZFNet-512 - CPU - Parallel | 14.0673 | 14.1239 | 13.9657 | 14.1704 | 13.9577
onnx: yolov4 - CPU - Standard | 72.5475 | 72.0326 | 71.8677 | 71.6925 | 72.9637
onnx: yolov4 - CPU - Parallel | 129.644 | 130.124 | 124.41 | 115.894 | 127.801
onnx: GPT-2 - CPU - Standard | 6.19550 | 6.37911 | 6.34248 | 6.33978 | 6.34139
onnx: GPT-2 - CPU - Parallel | 6.62484 | 6.60878 | 6.6827 | 6.65046 | 6.61099
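Runs a through e are repeats on the same system, so per-test spread across runs is a quick data-quality check (the result URL above sorts by greatest result spread). A minimal sketch of that check in Python; the rows are a hand-copied subset of the table above, and max/min spread as the metric is an illustrative choice, not something computed by OpenBenchmarking here:

    # Minimal sketch: quantify run-to-run spread across the five
    # same-system runs (a-e). Values are copied from the overview table.
    results = {
        "onnx: yolov4 - CPU - Parallel":        [7.71368, 7.68899, 8.03773, 8.62837, 7.82444],
        "onnx: bertsquad-12 - CPU - Standard":  [25.1022, 25.0524, 25.0458, 23.4098, 25.0243],
        "svt-av1: Preset 13 - Bosphorus 1080p": [1009.640, 1011.236, 1012.768, 1011.433, 1010.763],
    }

    for test, vals in sorted(results.items(),
                             key=lambda kv: max(kv[1]) / min(kv[1]),
                             reverse=True):
        spread = (max(vals) / min(vals) - 1.0) * 100.0
        print(f"{test}: {spread:.1f}% spread across runs")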
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 7.71368  b: 7.68899  c: 8.03773  d: 8.62837  e: 7.82444  (SE +/- 0.04367, N = 3; SE +/- 0.06383, N = 9)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
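The Standard/Parallel executor split in these ONNX Runtime tests corresponds to the execution mode set on the session options in ONNX Runtime's Python API. A minimal sketch, where the model path and thread count are placeholders rather than values taken from this result file:

    # Minimal sketch of the Standard vs. Parallel executor distinction in
    # ONNX Runtime; "yolov4.onnx" and the thread count are placeholders.
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    # ORT_SEQUENTIAL corresponds to the "Standard" executor in these tests,
    # ORT_PARALLEL to the "Parallel" executor.
    opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL
    opts.intra_op_num_threads = 16  # e.g. one thread per physical core

    sess = ort.InferenceSession("yolov4.onnx", opts,
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Dynamic dimensions are reported as strings/None; substitute 1 here.
    x = np.zeros([d if isinstance(d, int) else 1 for d in inp.shape],
                 dtype=np.float32)
    outputs = sess.run(None, {inp.name: x})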
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 25.10  b: 25.05  c: 25.05  d: 23.41  e: 25.02  (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
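The "SE +/- x, N = y" annotations attached to the results are the standard error of the mean over the N trials the Phoronix Test Suite ran for a configuration. A worked sketch with invented trial values, not numbers recovered from this result file:

    # How an "SE +/- x, N = y" annotation is computed: standard error of
    # the mean over N repeated trials. Trial values here are invented.
    import math
    import statistics

    trials = [25.08, 25.14, 25.09]  # hypothetical N = 3 trial results
    n = len(trials)
    se = statistics.stdev(trials) / math.sqrt(n)
    print(f"mean = {statistics.mean(trials):.2f}, SE +/- {se:.2f}, N = {n}")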
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 190.62  b: 192.68  c: 192.82  d: 183.02  e: 193.25  (SE +/- 1.93, N = 6; SE +/- 0.25, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 221.94  b: 220.47  c: 214.48  d: 223.21  e: 217.92  (SE +/- 1.21, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 11.78  b: 11.60  c: 11.98  d: 11.71  e: 11.75  (SE +/- 0.07, N = 3; SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 61.50  b: 60.48  c: 62.20  d: 62.15  e: 60.32  (SE +/- 0.49, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 161.41  b: 156.72  c: 157.63  d: 157.69  e: 157.65  (SE +/- 1.98, N = 3; SE +/- 0.41, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 245.28  b: 243.73  c: 242.24  d: 245.87  e: 239.03  (SE +/- 1.89, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Whisperfile 20Aug24 - Model Size: Tiny
Seconds, Fewer Is Better
a: 29.63  b: 30.31  c: 30.04  d: 30.04  e: 30.37  (SE +/- 0.12, N = 3)
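Whisperfile packages whisper.cpp as a single portable executable, and the benchmark times a full transcription of a sample clip. A hedged invocation sketch; the binary, model, and audio file names are placeholders, and the -m/-f flags follow whisper.cpp conventions, so verify against your build's --help:

    # Hedged sketch of timing a whisperfile transcription; all file names
    # are placeholders and the flags follow whisper.cpp conventions.
    import subprocess
    import time

    start = time.perf_counter()
    subprocess.run(["./whisper-tiny.en.llamafile",
                    "-m", "whisper-tiny.en.bin",  # model weights
                    "-f", "sample.wav"],          # 16 kHz mono input
                   check=True)
    print(f"transcription took {time.perf_counter() - start:.2f} s")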
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 152.64  b: 152.85  c: 149.51  d: 149.30  e: 149.91  (SE +/- 0.85, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 225.16  b: 224.14  c: 223.79  d: 224.24  e: 220.16  (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 24.48  b: 24.45  c: 24.28  d: 24.77  e: 24.26  (SE +/- 0.09, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit
Frames Per Second, More Is Better
a: 10.59  b: 10.61  c: 10.53  d: 10.53  e: 10.74  (SE +/- 0.03, N = 3; SE +/- 0.04, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
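Each SVT-AV1 result is the frame rate the encoder sustains at the given preset, where lower presets are slower but compress more efficiently. A minimal sketch of an equivalent standalone SvtAv1EncApp invocation; the input and output file names are placeholders, not the exact clips the test profile ships:

    # Minimal sketch of a standalone SVT-AV1 encode at the preset being
    # measured; file names are placeholders. --preset trades encode speed
    # against compression efficiency (lower = slower, better compression).
    import subprocess

    subprocess.run(["SvtAv1EncApp",
                    "--preset", "8",
                    "-i", "Beauty_3840x2160_10bit.y4m",  # source clip
                    "-b", "beauty_p8.ivf"],              # output bitstream
                   check=True)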
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 1.41808  b: 1.41079  c: 1.42067  d: 1.40886  e: 1.43691  (SE +/- 0.00694, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 190.55  b: 190.48  c: 187.74  d: 190.60  e: 191.36  (SE +/- 0.96, N = 3; SE +/- 0.37, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 13.78  b: 13.88  c: 13.91  d: 13.95  e: 13.71  (SE +/- 0.05, N = 3; SE +/- 0.05, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 71.08  b: 70.80  c: 71.60  d: 70.55  e: 71.64  (SE +/- 0.42, N = 3; SE +/- 0.36, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Whisperfile 20Aug24 - Model Size: Small
Seconds, Fewer Is Better
a: 127.97  b: 128.24  c: 129.63  d: 129.80  e: 129.38  (SE +/- 0.21, N = 3)
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 578.94  b: 573.43  c: 571.19  d: 574.32  e: 575.42  (SE +/- 2.14, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 41.33  b: 41.30  c: 40.89  d: 41.08  e: 40.82  (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 127.19  b: 125.63  c: 126.00  d: 127.17  e: 126.02  (SE +/- 0.57, N = 3; SE +/- 0.96, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 851.24  b: 852.42  c: 855.08  d: 854.14  e: 861.65  (SE +/- 2.64, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K
Frames Per Second, More Is Better
a: 98.66  b: 99.26  c: 99.22  d: 98.24  e: 98.06  (SE +/- 0.16, N = 3; SE +/- 0.23, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K
Frames Per Second, More Is Better
a: 257.20  b: 258.65  c: 257.42  d: 255.71  e: 256.15  (SE +/- 0.27, N = 3; SE +/- 0.18, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 2.22498  b: 2.25003  c: 2.24795  d: 2.23680  e: 2.23767  (SE +/- 0.01511, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second, More Is Better
a: 150.87  b: 151.25  c: 149.57  d: 150.29  e: 151.20  (SE +/- 0.19, N = 3; SE +/- 0.53, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 2.55959  b: 2.55502  c: 2.57815  d: 2.55793  e: 2.55063  (SE +/- 0.02001, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 4.95375  b: 4.92894  c: 4.98053  d: 4.93277  e: 4.93075  (SE +/- 0.00675, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
a: 302.42  b: 304.33  c: 303.06  d: 303.12  e: 301.64  (SE +/- 0.64, N = 3; SE +/- 0.56, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
a: 128.71  b: 127.96  c: 128.37  d: 128.10  e: 127.61  (SE +/- 0.11, N = 3; SE +/- 0.17, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K
Frames Per Second, More Is Better
a: 44.10  b: 43.97  c: 43.77  d: 43.87  e: 43.74  (SE +/- 0.06, N = 3; SE +/- 0.13, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K
Frames Per Second, More Is Better
a: 11.81  b: 11.82  c: 11.79  d: 11.82  e: 11.74  (SE +/- 0.04, N = 3; SE +/- 0.00, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisperfile 20Aug24 - Model Size: Medium
Seconds, Fewer Is Better
a: 328.97  b: 327.13  c: 329.24  d: 328.29  e: 328.01  (SE +/- 0.20, N = 3)
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit
Frames Per Second, More Is Better
a: 7.574  b: 7.588  c: 7.583  d: 7.540  e: 7.541  (SE +/- 0.022, N = 3; SE +/- 0.052, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second, More Is Better
a: 55.74  b: 56.05  c: 56.01  d: 56.03  e: 55.91  (SE +/- 0.11, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit
Frames Per Second, More Is Better
a: 19.40  b: 19.40  c: 19.42  d: 19.49  e: 19.49  (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
a: 37.31  b: 37.21  c: 37.20  d: 37.15  e: 37.20  (SE +/- 0.01, N = 3; SE +/- 0.01, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p
Frames Per Second, More Is Better
a: 1009.64  b: 1011.24  c: 1012.77  d: 1011.43  e: 1010.76  (SE +/- 2.29, N = 3; SE +/- 1.10, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit
Frames Per Second, More Is Better
a: 1.760  b: 1.758  c: 1.760  d: 1.759  e: 1.759  (SE +/- 0.005, N = 3; SE +/- 0.003, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 16.26  b: 16.53  c: 16.08  d: 16.09  e: 16.58  (SE +/- 0.13, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
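From here on, the ONNX Runtime results are restated as per-inference latency, which tracks the reciprocal of the throughput figures above: run a's 61.4985 inferences per second for this model works out to 1000 / 61.4985, about 16.26 ms, matching the value reported here. A short consistency check over runs a-e, using the throughput values from the earlier chart:

    # The ms figures are the reciprocal of the inferences-per-second
    # figures: e.g. 1000 / 61.4985 inf/s (run a) ~ 16.26 ms.
    for ips in [61.4985, 60.4839, 62.198, 62.1495, 60.315]:  # runs a-e
        print(f"{ips:8.4f} inf/s -> {1000.0 / ips:6.2f} ms")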
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 24.19  b: 24.21  c: 24.45  d: 24.34  e: 24.49  (SE +/- 0.06, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 390.73  b: 391.38  c: 387.87  d: 390.94  e: 392.06  (SE +/- 3.03, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 705.21  b: 708.82  c: 703.89  d: 709.79  e: 695.94  (SE +/- 3.47, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 4.44116  b: 4.46130  c: 4.46838  d: 4.45929  e: 4.54187  (SE +/- 0.00178, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 4.50538  b: 4.53509  c: 4.66209  d: 4.47960  e: 4.58833  (SE +/- 0.02467, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 1.72670  b: 1.74330  c: 1.74974  d: 1.74031  e: 1.73720  (SE +/- 0.00641, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 6.55079  b: 6.54126  c: 6.68702  d: 6.69723  e: 6.66991  (SE +/- 0.03664, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 17.94  b: 17.84  c: 17.85  d: 17.84  e: 17.89  (SE +/- 0.04, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 40.85  b: 40.90  c: 41.18  d: 40.36  e: 41.22  (SE +/- 0.15, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 201.87  b: 202.88  c: 200.78  d: 202.72  e: 202.81  (SE +/- 0.28, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 449.48  b: 444.44  c: 444.85  d: 447.06  e: 446.89  (SE +/- 3.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 1.17405  b: 1.17235  c: 1.16879  d: 1.17003  e: 1.15972  (SE +/- 0.00364, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 4.07620  b: 4.10040  c: 4.12702  d: 4.06632  e: 4.18210  (SE +/- 0.03127, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 39.83  b: 39.91  c: 39.92  d: 42.71  e: 39.96  (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 84.93  b: 86.20  c: 83.45  d: 85.40  e: 85.08  (SE +/- 0.50, N = 3; SE +/- 0.50, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 5.24802  b: 5.18948  c: 5.18564  d: 5.46322  e: 5.17395  (SE +/- 0.05529, N = 6; SE +/- 0.00692, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 5.24748  b: 5.24914  c: 5.32595  d: 5.24590  e: 5.22498  (SE +/- 0.02621, N = 3; SE +/- 0.01022, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 7.86110  b: 7.95942  c: 7.93481  d: 7.86194  e: 7.93399  (SE +/- 0.03518, N = 3; SE +/- 0.06114, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 14.07  b: 14.12  c: 13.97  d: 14.17  e: 13.96  (SE +/- 0.08, N = 3; SE +/- 0.07, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 72.55  b: 72.03  c: 71.87  d: 71.69  e: 72.96  (SE +/- 0.26, N = 3; SE +/- 0.27, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 129.64  b: 130.12  c: 124.41  d: 115.89  e: 127.80  (SE +/- 0.73, N = 3; SE +/- 1.08, N = 9)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard
Inference Time Cost (ms), Fewer Is Better
a: 6.19550  b: 6.37911  c: 6.34248  d: 6.33978  e: 6.34139  (SE +/- 0.07694, N = 3; SE +/- 0.01653, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel
Inference Time Cost (ms), Fewer Is Better
a: 6.62484  b: 6.60878  c: 6.68270  d: 6.65046  e: 6.61099  (SE +/- 0.00827, N = 3; SE +/- 0.02289, N = 3)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5