onnx 119
AMD Ryzen Threadripper 7980X 64-Cores testing with a System76 Thelio Major (FA Z5 BIOS) and AMD Radeon Pro W7900 on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408227-PTS-ONNX119192&sor&grr
System Details (identical for runs a, b, c, d, and e)

  Processor:         AMD Ryzen Threadripper 7980X 64-Cores @ 7.79GHz (64 Cores / 128 Threads)
  Motherboard:       System76 Thelio Major (FA Z5 BIOS)
  Chipset:           AMD Device 14a4
  Memory:            4 x 32GB DDR5-4800MT/s Micron MTC20F1045S1RC48BA2
  Disk:              1000GB CT1000T700SSD5
  Graphics:          AMD Radeon Pro W7900
  Audio:             AMD Device 14cc
  Monitor:           DELL P2415Q
  Network:           Aquantia AQC113C NBase-T/IEEE + Realtek RTL8125 2.5GbE + Intel Wi-Fi 6E
  OS:                Ubuntu 24.10
  Kernel:            6.8.0-31-generic (x86_64)
  Desktop:           GNOME Shell
  Display Server:    X Server + Wayland
  OpenGL:            4.6 Mesa 24.0.9-0ubuntu2 (LLVM 17.0.6 DRM 3.57)
  Compiler:          GCC 14.2.0
  File-System:       ext4
  Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-F5tscv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-F5tscv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xa108105

Python Details: Python 3.12.5

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of Safe RET
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
Result Overview

SVT-AV1 values are frames per second (FPS), higher is better. Each ONNX Runtime test is reported twice: as Inference Time Cost in milliseconds (ms), lower is better, and as Inferences Per Second (inf/s), higher is better.

Test (unit) | a | b | c | d | e
svt-av1: Preset 3 - Beauty 4K 10-bit (FPS) | 1.864 | 1.857 | 1.867 | 1.874 | 1.867
onnx: fcn-resnet101-11 - CPU - Parallel (ms) | 908.296 | 913.539 | 913.494 | 873.467 | 889.231
onnx: fcn-resnet101-11 - CPU - Parallel (inf/s) | 1.10358 | 1.09825 | 1.09469 | 1.14486 | 1.12456
onnx: fcn-resnet101-11 - CPU - Standard (ms) | 257.990 | 252.335 | 291.444 | 288.124 | 276.923
onnx: fcn-resnet101-11 - CPU - Standard (inf/s) | 3.92686 | 4.00404 | 3.43115 | 3.47069 | 3.61108
svt-av1: Preset 3 - Bosphorus 4K (FPS) | 13.398 | 13.350 | 13.53 | 13.381 | 13.413
svt-av1: Preset 5 - Beauty 4K 10-bit (FPS) | 7.775 | 7.801 | 7.811 | 7.809 | 7.765
onnx: ZFNet-512 - CPU - Standard (ms) | 12.2268 | 12.2939 | 12.1186 | 12.2426 | 12.7923
onnx: ZFNet-512 - CPU - Standard (inf/s) | 81.7752 | 81.4158 | 82.4953 | 81.6673 | 78.1552
onnx: yolov4 - CPU - Parallel (ms) | 189.677 | 195.128 | 188.5 | 199.707 | 195.366
onnx: yolov4 - CPU - Parallel (inf/s) | 5.27246 | 5.12789 | 5.30486 | 5.00717 | 5.11842
onnx: bertsquad-12 - CPU - Standard (ms) | 78.9209 | 78.5721 | 75.8716 | 76.9365 | 78.4717
onnx: bertsquad-12 - CPU - Standard (inf/s) | 12.6730 | 12.7377 | 13.1796 | 12.9971 | 12.7428
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel (ms) | 34.8288 | 34.6450 | 36.1605 | 36.2111 | 36.0242
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel (inf/s) | 28.7259 | 28.8766 | 27.6526 | 27.6139 | 27.7572
onnx: CaffeNet 12-int8 - CPU - Parallel (ms) | 4.88925 | 4.99490 | 5.61236 | 4.69742 | 4.74627
onnx: CaffeNet 12-int8 - CPU - Parallel (inf/s) | 204.479 | 200.840 | 178.126 | 212.81 | 210.622
onnx: bertsquad-12 - CPU - Parallel (ms) | 147.814 | 145.784 | 144.559 | 140.724 | 142.422
onnx: bertsquad-12 - CPU - Parallel (inf/s) | 6.76854 | 6.86262 | 6.9174 | 7.10593 | 7.02121
svt-av1: Preset 8 - Beauty 4K 10-bit (FPS) | 10.844 | 10.827 | 10.852 | 10.873 | 10.965
svt-av1: Preset 3 - Bosphorus 1080p (FPS) | 36.605 | 36.553 | 36.664 | 36.707 | 36.456
onnx: super-resolution-10 - CPU - Standard (ms) | 9.94497 | 9.93654 | 9.91284 | 9.52165 | 10.0592
onnx: super-resolution-10 - CPU - Standard (inf/s) | 100.5927 | 100.6830 | 100.876 | 105.021 | 99.4083
onnx: yolov4 - CPU - Standard (ms) | 114.152 | 114.827 | 110.418 | 113.726 | 110.033
onnx: yolov4 - CPU - Standard (inf/s) | 8.76007 | 8.71251 | 9.05619 | 8.79279 | 9.08788
svt-av1: Preset 13 - Beauty 4K 10-bit (FPS) | 19.520 | 19.417 | 19.396 | 19.442 | 19.441
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel (ms) | 621.226 | 625.060 | 617.6 | 630.805 | 621.076
onnx: ResNet101_DUC_HDC-12 - CPU - Parallel (inf/s) | 1.60987 | 1.59999 | 1.61916 | 1.58526 | 1.6101
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (ms) | 372.789 | 370.177 | 371.599 | 374.577 | 371.175
onnx: ResNet101_DUC_HDC-12 - CPU - Standard (inf/s) | 2.68248 | 2.70145 | 2.69105 | 2.66966 | 2.69413
onnx: GPT-2 - CPU - Parallel (ms) | 6.21628 | 6.18911 | 6.15989 | 6.17185 | 6.15756
onnx: GPT-2 - CPU - Parallel (inf/s) | 160.706 | 161.407 | 162.175 | 161.862 | 162.245
onnx: ZFNet-512 - CPU - Parallel (ms) | 18.7297 | 18.3266 | 19.7054 | 18.4128 | 18.3751
onnx: ZFNet-512 - CPU - Parallel (inf/s) | 53.4068 | 54.5779 | 50.7421 | 54.304 | 54.4155
onnx: GPT-2 - CPU - Standard (ms) | 9.51832 | 9.45128 | 9.46853 | 9.50891 | 9.28112
onnx: GPT-2 - CPU - Standard (inf/s) | 105.039 | 105.791 | 105.59 | 105.146 | 107.723
onnx: T5 Encoder - CPU - Parallel (ms) | 2.97938 | 2.94936 | 2.9469 | 2.97608 | 2.95124
onnx: T5 Encoder - CPU - Parallel (inf/s) | 335.503 | 338.928 | 339.175 | 335.859 | 338.703
onnx: T5 Encoder - CPU - Standard (ms) | 7.09442 | 7.03694 | 7.08004 | 7.23734 | 7.21415
onnx: T5 Encoder - CPU - Standard (inf/s) | 140.958 | 142.114 | 141.231 | 138.164 | 138.605
onnx: ArcFace ResNet-100 - CPU - Parallel (ms) | 79.3400 | 80.1594 | 80.7021 | 79.2916 | 77.8483
onnx: ArcFace ResNet-100 - CPU - Parallel (inf/s) | 12.6039 | 12.4756 | 12.3908 | 12.6112 | 12.845
onnx: ArcFace ResNet-100 - CPU - Standard (ms) | 26.0803 | 27.0582 | 25.7836 | 25.5012 | 27.2604
onnx: ArcFace ResNet-100 - CPU - Standard (inf/s) | 38.3476 | 36.9658 | 38.7816 | 39.2101 | 36.68
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (ms) | 22.4415 | 22.4918 | 22.7032 | 22.0783 | 22.668
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (inf/s) | 44.5600 | 44.4566 | 44.0419 | 45.2883 | 44.1099
onnx: CaffeNet 12-int8 - CPU - Standard (ms) | 2.19841 | 2.17339 | 2.26126 | 2.38355 | 2.27019
onnx: CaffeNet 12-int8 - CPU - Standard (inf/s) | 454.723 | 460.016 | 442.117 | 419.421 | 440.354
onnx: ResNet50 v1-12-int8 - CPU - Parallel (ms) | 13.4748 | 13.4482 | 13.6746 | 13.6088 | 13.4522
onnx: ResNet50 v1-12-int8 - CPU - Parallel (inf/s) | 74.2061 | 74.3624 | 73.1208 | 73.4749 | 74.3302
onnx: ResNet50 v1-12-int8 - CPU - Standard (ms) | 4.97187 | 4.96437 | 5.14489 | 4.90503 | 4.93285
onnx: ResNet50 v1-12-int8 - CPU - Standard (inf/s) | 201.103 | 201.424 | 194.343 | 203.835 | 202.692
onnx: super-resolution-10 - CPU - Parallel (ms) | 7.61417 | 7.66242 | 7.6459 | 7.5618 | 7.61659
onnx: super-resolution-10 - CPU - Parallel (inf/s) | 131.313 | 130.489 | 130.766 | 132.221 | 131.269
svt-av1: Preset 5 - Bosphorus 4K (FPS) | 45.614 | 45.343 | 45.662 | 45.165 | 45.41
svt-av1: Preset 5 - Bosphorus 1080p (FPS) | 119.221 | 119.229 | 119.768 | 118.798 | 119.923
svt-av1: Preset 8 - Bosphorus 4K (FPS) | 96.210 | 96.423 | 96.06 | 97.046 | 96.775
svt-av1: Preset 13 - Bosphorus 4K (FPS) | 231.111 | 233.248 | 232.396 | 228.834 | 228.97
svt-av1: Preset 8 - Bosphorus 1080p (FPS) | 271.676 | 269.879 | 266.006 | 269.428 | 268.76
svt-av1: Preset 13 - Bosphorus 1080p (FPS) | 740.001 | 740.790 | 764.003 | 783.122 | 749.257

Compiler notes: all SVT-AV1 2.2 results were built with (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq. All ONNX Runtime 1.19 results were built with (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt.
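Since all five runs (a-e) used the same system, the interesting quantity in the overview above is run-to-run spread. A minimal Python sketch (not part of the original result file) of how one might quantify it for a single row, using values copied from the Preset 13 - Bosphorus 1080p line of the table:

    # Quantifying run-to-run spread for one row of the overview table.
    from statistics import mean, stdev

    # svt-av1: Preset 13 - Bosphorus 1080p (FPS), runs a-e from the table:
    runs = {"a": 740.001, "b": 740.790, "c": 764.003, "d": 783.122, "e": 749.257}

    values = list(runs.values())
    avg = mean(values)
    cv = 100.0 * stdev(values) / avg  # coefficient of variation, in percent
    print(f"mean = {avg:.2f} FPS, CV = {cv:.2f}%")  # roughly 755 FPS, ~2.4%

By this measure most rows stay within a few percent across runs, while some Standard-executor ONNX rows (for example fcn-resnet101-11, spanning roughly 252-291 ms) spread closer to 15%.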
SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  d: 1.874 | e: 1.867 | c: 1.867 | a: 1.864 | b: 1.857  (SE +/- 0.002, N = 3; SE +/- 0.004, N = 3)
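Each result above (and below) carries an "SE +/- x, N = y" annotation. A minimal sketch of the conventional standard-error calculation such annotations usually denote, assuming the standard definition (sample standard deviation over the square root of the trial count); the trial values here are hypothetical:

    # Sketch of the assumed standard-error formula behind "SE +/- x, N = y":
    # SE = sample standard deviation / sqrt(number of trials).
    from math import sqrt
    from statistics import stdev

    def standard_error(samples: list[float]) -> float:
        """Standard error of the mean for a list of trial results."""
        return stdev(samples) / sqrt(len(samples))

    # Hypothetical per-trial FPS values for one N = 3 run:
    trials = [1.862, 1.864, 1.866]
    print(f"SE +/- {standard_error(trials):.3f}, N = {len(trials)}")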
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  d: 873.47 | e: 889.23 | a: 908.30 | c: 913.49 | b: 913.54  (SE +/- 11.68, N = 15; SE +/- 13.79, N = 15)
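The ONNX Runtime results distinguish an "Executor: Parallel" and an "Executor: Standard" mode. In ONNX Runtime's Python API this corresponds naturally to the session's execution-mode option; the sketch below is an assumption about how such a session could be configured, not the test profile's actual harness, and the model file name and input shape are placeholders:

    # Assumed mapping of the "Parallel" / "Standard" executor labels onto
    # ONNX Runtime's execution-mode session option (sketch only).
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    # "Parallel" executor; use ort.ExecutionMode.ORT_SEQUENTIAL for "Standard".
    opts.execution_mode = ort.ExecutionMode.ORT_PARALLEL

    sess = ort.InferenceSession("fcn-resnet101-11.onnx", opts,
                                providers=["CPUExecutionProvider"])
    x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
    outputs = sess.run(None, {sess.get_inputs()[0].name: x})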
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  d: 1.14486 | e: 1.12456 | a: 1.10358 | b: 1.09825 | c: 1.09469  (SE +/- 0.01462, N = 15; SE +/- 0.01709, N = 15)
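The two fcn-resnet101-11 Parallel results above are two views of the same measurement: throughput in inferences per second is approximately 1000 divided by the latency in milliseconds. A quick check against run a, using the full-precision values from the overview table:

    # The throughput result is (approximately) the reciprocal of the latency
    # result: inferences per second = 1000 / inference time in ms.
    latency_ms = 908.296          # run a, fcn-resnet101-11 - CPU - Parallel
    throughput = 1000.0 / latency_ms
    print(f"{throughput:.5f} inf/s")
    # Prints ~1.10096; the result file reports 1.10358 for run a, presumably
    # because each trial is inverted before averaging (an assumption here).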
ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  b: 252.34 | a: 257.99 | e: 276.92 | d: 288.12 | c: 291.44  (SE +/- 6.89, N = 15; SE +/- 7.70, N = 15)

ONNX Runtime 1.19 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  b: 4.00404 | a: 3.92686 | e: 3.61108 | d: 3.47069 | c: 3.43115  (SE +/- 0.10773, N = 15; SE +/- 0.12199, N = 15)

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  c: 13.53 | e: 13.41 | a: 13.40 | d: 13.38 | b: 13.35  (SE +/- 0.02, N = 3; SE +/- 0.04, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  c: 7.811 | d: 7.809 | b: 7.801 | a: 7.775 | e: 7.765  (SE +/- 0.008, N = 3; SE +/- 0.029, N = 3)
ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  c: 12.12 | a: 12.23 | d: 12.24 | b: 12.29 | e: 12.79  (SE +/- 0.06, N = 3; SE +/- 0.11, N = 15)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  c: 82.50 | a: 81.78 | d: 81.67 | b: 81.42 | e: 78.16  (SE +/- 0.40, N = 3; SE +/- 0.72, N = 15)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  c: 188.50 | a: 189.68 | b: 195.13 | e: 195.37 | d: 199.71  (SE +/- 1.34, N = 3; SE +/- 1.30, N = 15)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  c: 5.30486 | a: 5.27246 | b: 5.12789 | e: 5.11842 | d: 5.00717  (SE +/- 0.03706, N = 3; SE +/- 0.03454, N = 15)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  c: 75.87 | d: 76.94 | e: 78.47 | b: 78.57 | a: 78.92  (SE +/- 0.63, N = 15; SE +/- 0.83, N = 3)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  c: 13.18 | d: 13.00 | e: 12.74 | b: 12.74 | a: 12.67  (SE +/- 0.10, N = 15; SE +/- 0.13, N = 3)
ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  b: 34.65 | a: 34.83 | e: 36.02 | c: 36.16 | d: 36.21  (SE +/- 0.39, N = 5; SE +/- 0.26, N = 11)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  b: 28.88 | a: 28.73 | e: 27.76 | c: 27.65 | d: 27.61  (SE +/- 0.32, N = 5; SE +/- 0.21, N = 11)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  d: 4.69742 | e: 4.74627 | a: 4.88925 | b: 4.99490 | c: 5.61236  (SE +/- 0.03038, N = 3; SE +/- 0.09195, N = 12)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  d: 212.81 | e: 210.62 | a: 204.48 | b: 200.84 | c: 178.13  (SE +/- 1.28, N = 3; SE +/- 3.47, N = 12)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  d: 140.72 | e: 142.42 | c: 144.56 | b: 145.78 | a: 147.81  (SE +/- 1.59, N = 5; SE +/- 1.16, N = 9)

ONNX Runtime 1.19 - Model: bertsquad-12 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  d: 7.10593 | e: 7.02121 | c: 6.91740 | b: 6.86262 | a: 6.76854  (SE +/- 0.07644, N = 5; SE +/- 0.05485, N = 9)
SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  e: 10.97 | d: 10.87 | c: 10.85 | a: 10.84 | b: 10.83  (SE +/- 0.04, N = 3; SE +/- 0.05, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 3 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  d: 36.71 | c: 36.66 | a: 36.61 | b: 36.55 | e: 36.46  (SE +/- 0.03, N = 3; SE +/- 0.01, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  d: 9.52165 | c: 9.91284 | b: 9.93654 | a: 9.94497 | e: 10.05920  (SE +/- 0.10599, N = 5; SE +/- 0.08968, N = 6)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  d: 105.02 | c: 100.88 | b: 100.68 | a: 100.59 | e: 99.41  (SE +/- 1.11, N = 5; SE +/- 0.94, N = 6)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  e: 110.03 | c: 110.42 | d: 113.73 | a: 114.15 | b: 114.83  (SE +/- 0.25, N = 3; SE +/- 1.43, N = 4)

ONNX Runtime 1.19 - Model: yolov4 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  e: 9.08788 | c: 9.05619 | d: 8.79279 | a: 8.76007 | b: 8.71251  (SE +/- 0.01899, N = 3; SE +/- 0.10839, N = 4)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit (Frames Per Second, More Is Better)
  a: 19.52 | d: 19.44 | e: 19.44 | b: 19.42 | c: 19.40  (SE +/- 0.03, N = 3; SE +/- 0.04, N = 3)
ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  c: 617.60 | e: 621.08 | a: 621.23 | b: 625.06 | d: 630.81  (SE +/- 4.43, N = 3; SE +/- 4.33, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  c: 1.61916 | e: 1.61010 | a: 1.60987 | b: 1.59999 | d: 1.58526  (SE +/- 0.01149, N = 3; SE +/- 0.01116, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  b: 370.18 | e: 371.18 | c: 371.60 | a: 372.79 | d: 374.58  (SE +/- 1.22, N = 3; SE +/- 0.46, N = 3)

ONNX Runtime 1.19 - Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  b: 2.70145 | e: 2.69413 | c: 2.69105 | a: 2.68248 | d: 2.66966  (SE +/- 0.00886, N = 3; SE +/- 0.00331, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  e: 6.15756 | c: 6.15989 | d: 6.17185 | b: 6.18911 | a: 6.21628  (SE +/- 0.01421, N = 3; SE +/- 0.00960, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  e: 162.25 | c: 162.18 | d: 161.86 | b: 161.41 | a: 160.71  (SE +/- 0.37, N = 3; SE +/- 0.25, N = 3)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  b: 18.33 | e: 18.38 | d: 18.41 | a: 18.73 | c: 19.71  (SE +/- 0.25, N = 3; SE +/- 0.27, N = 3)

ONNX Runtime 1.19 - Model: ZFNet-512 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  b: 54.58 | e: 54.42 | d: 54.30 | a: 53.41 | c: 50.74  (SE +/- 0.73, N = 3; SE +/- 0.75, N = 3)
ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  e: 9.28112 | b: 9.45128 | c: 9.46853 | d: 9.50891 | a: 9.51832  (SE +/- 0.05188, N = 3; SE +/- 0.01924, N = 3)

ONNX Runtime 1.19 - Model: GPT-2 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  e: 107.72 | b: 105.79 | c: 105.59 | d: 105.15 | a: 105.04  (SE +/- 0.58, N = 3; SE +/- 0.21, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  c: 2.94690 | b: 2.94936 | e: 2.95124 | d: 2.97608 | a: 2.97938  (SE +/- 0.01351, N = 3; SE +/- 0.00618, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  c: 339.18 | b: 338.93 | e: 338.70 | d: 335.86 | a: 335.50  (SE +/- 1.56, N = 3; SE +/- 0.69, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  b: 7.03694 | c: 7.08004 | a: 7.09442 | e: 7.21415 | d: 7.23734  (SE +/- 0.05397, N = 3; SE +/- 0.04465, N = 3)

ONNX Runtime 1.19 - Model: T5 Encoder - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  b: 142.11 | c: 141.23 | a: 140.96 | e: 138.61 | d: 138.16  (SE +/- 1.08, N = 3; SE +/- 0.88, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  e: 77.85 | d: 79.29 | a: 79.34 | b: 80.16 | c: 80.70  (SE +/- 0.34, N = 3; SE +/- 0.51, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  e: 12.85 | d: 12.61 | a: 12.60 | b: 12.48 | c: 12.39  (SE +/- 0.05, N = 3; SE +/- 0.08, N = 3)
ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  d: 25.50 | c: 25.78 | a: 26.08 | b: 27.06 | e: 27.26  (SE +/- 0.27, N = 3; SE +/- 0.33, N = 3)

ONNX Runtime 1.19 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  d: 39.21 | c: 38.78 | a: 38.35 | b: 36.97 | e: 36.68  (SE +/- 0.40, N = 3; SE +/- 0.45, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  d: 22.08 | a: 22.44 | b: 22.49 | e: 22.67 | c: 22.70  (SE +/- 0.16, N = 3; SE +/- 0.07, N = 3)

ONNX Runtime 1.19 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  d: 45.29 | a: 44.56 | b: 44.46 | e: 44.11 | c: 44.04  (SE +/- 0.33, N = 3; SE +/- 0.14, N = 3)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  b: 2.17339 | a: 2.19841 | c: 2.26126 | e: 2.27019 | d: 2.38355  (SE +/- 0.01799, N = 3; SE +/- 0.00686, N = 3)

ONNX Runtime 1.19 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  b: 460.02 | a: 454.72 | c: 442.12 | e: 440.35 | d: 419.42  (SE +/- 3.83, N = 3; SE +/- 1.43, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  b: 13.45 | e: 13.45 | a: 13.47 | d: 13.61 | c: 13.67  (SE +/- 0.13, N = 3; SE +/- 0.04, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  b: 74.36 | e: 74.33 | a: 74.21 | d: 73.47 | c: 73.12  (SE +/- 0.72, N = 3; SE +/- 0.25, N = 3)
ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inference Time Cost in ms, Fewer Is Better)
  d: 4.90503 | e: 4.93285 | b: 4.96437 | a: 4.97187 | c: 5.14489  (SE +/- 0.03731, N = 3; SE +/- 0.02497, N = 3)

ONNX Runtime 1.19 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard (Inferences Per Second, More Is Better)
  d: 203.84 | e: 202.69 | b: 201.42 | a: 201.10 | c: 194.34  (SE +/- 1.52, N = 3; SE +/- 1.01, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inference Time Cost in ms, Fewer Is Better)
  d: 7.56180 | a: 7.61417 | e: 7.61659 | c: 7.64590 | b: 7.66242  (SE +/- 0.01157, N = 3; SE +/- 0.02614, N = 3)

ONNX Runtime 1.19 - Model: super-resolution-10 - Device: CPU - Executor: Parallel (Inferences Per Second, More Is Better)
  d: 132.22 | a: 131.31 | e: 131.27 | c: 130.77 | b: 130.49  (SE +/- 0.20, N = 3; SE +/- 0.45, N = 3)
SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  c: 45.66 | a: 45.61 | e: 45.41 | b: 45.34 | d: 45.17  (SE +/- 0.18, N = 3; SE +/- 0.22, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 5 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  e: 119.92 | c: 119.77 | b: 119.23 | a: 119.22 | d: 118.80  (SE +/- 0.35, N = 3; SE +/- 0.52, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  d: 97.05 | e: 96.78 | b: 96.42 | a: 96.21 | c: 96.06  (SE +/- 0.79, N = 3; SE +/- 0.26, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
  b: 233.25 | c: 232.40 | a: 231.11 | e: 228.97 | d: 228.83  (SE +/- 2.31, N = 3; SE +/- 1.99, N = 8)

SVT-AV1 2.2 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  a: 271.68 | b: 269.88 | d: 269.43 | e: 268.76 | c: 266.01  (SE +/- 0.46, N = 3; SE +/- 0.65, N = 3)

SVT-AV1 2.2 - Encoder Mode: Preset 13 - Input: Bosphorus 1080p (Frames Per Second, More Is Better)
  d: 783.12 | c: 764.00 | e: 749.26 | b: 740.79 | a: 740.00  (SE +/- 8.40, N = 4; SE +/- 7.99, N = 4)
Phoronix Test Suite v10.8.5