newa AMD Ryzen AI 9 HX 370 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408236-NE-NEWA8391444&grs&rdt .
newa Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d AMD Ryzen AI 9 HX 370 @ 4.37GHz (12 Cores / 24 Threads) ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) AMD Device 1507 4 x 8GB LPDDR5-7500MT/s Samsung K3KL9L90CM-MGCT 1024GB MTFDKBA1T0QFM-1BD1AABGB AMD Radeon 512MB AMD Rembrandt Radeon HD Audio MEDIATEK Device 7925 Ubuntu 24.04 6.10.0-phx (x86_64) GNOME Shell 46.0 X Server 1.21.1.11 + Wayland 4.6 Mesa 24.3~git2407230600.74b4c9~oibaf~n (git-74b4c91 2024-07-23 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57) GCC 13.2.0 ext4 2880x1800 OpenBenchmarking.org Kernel Details - amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Python Details - Python 3.12.3 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
newa svt-av1: Preset 8 - Bosphorus 4K svt-av1: Preset 13 - Bosphorus 4K svt-av1: Preset 5 - Bosphorus 4K onnx: yolov4 - CPU - Standard svt-av1: Preset 8 - Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 1080p onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: bertsquad-12 - CPU - Standard svt-av1: Preset 5 - Bosphorus 1080p onnx: super-resolution-10 - CPU - Parallel onnx: ZFNet-512 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Parallel onnx: T5 Encoder - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Parallel svt-av1: Preset 3 - Bosphorus 1080p onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Standard svt-av1: Preset 3 - Beauty 4K 10-bit svt-av1: Preset 13 - Beauty 4K 10-bit onnx: GPT-2 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel svt-av1: Preset 3 - Bosphorus 4K onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: yolov4 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Parallel svt-av1: Preset 5 - Beauty 4K 10-bit whisperfile: Tiny onnx: super-resolution-10 - CPU - Standard svt-av1: Preset 8 - Beauty 4K 10-bit onnx: GPT-2 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel whisperfile: Small whisperfile: Medium onnx: T5 Encoder - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: ZFNet-512 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: GPT-2 - CPU - Parallel a b c d 27.656 106.946 13.234 5.70412 103.496 440.225 0.517125 7.59373 42.754 68.0893 42.6632 0.921778 146.145 136.319 11.37 19.4691 0.473424 86.2463 200.226 0.543 10.675 122.886 1.13181 502.181 69.9235 3.309 23.5779 3.61793 10.9385 2.697 54.40568 70.9505 4.011 102.72 35.4456 5.0416 262.4125 765.35919 161.643 28.2094 42.4095 1933.76 2112.26 14.0923 14.6852 4.99245 14.2975 51.3606 91.4167 883.536 1084.85 1.99007 7.33407 131.682 198.343 6.18373 6.84002 11.5926 23.4363 175.308 276.395 8.12987 9.72552 32.393 120.17 14.347 4.89908 112.095 481.646 0.488444 6.66153 45.54 64.5072 37.5739 0.865881 129.186 131.268 12.116 19.1579 0.435521 78.8507 194.575 0.561 9.634 110.405 1.03114 510.917 65.8763 3.463 22.8422 3.3069 10.2217 2.601 56.77183 66.6108 3.854 95.5163 34.2971 4.72998 271.82741 782.83619 162.322 29.1542 43.775 2047.31 2296.09 15.0098 15.4995 5.13748 15.178 52.1952 97.826 969.782 1154.89 1.95616 7.61522 150.111 211.407 6.15789 7.73761 12.679 26.6113 204.116 302.388 9.04782 10.459 25.738 96.951 11.827 5.17554 94.108 413.091 0.452827 6.69892 40.065 59.9789 38.1591 0.812788 135.739 120.849 10.744 17.3137 0.421012 76.8775 179.429 0.503 9.588 118.035 1.02034 462.592 64.1998 3.151 22.033 3.58762 10.0325 2.488 58.68302 65.8213 3.729 97.8331 33.016 4.73966 277.32688 785.54988 160.706 30.2848 45.3831 2208.35 2375.22 15.1903 16.6696 5.57084 15.5736 57.7547 99.6709 980.058 1230.33 2.16038 8.27242 149.273 210.979 6.22044 7.36381 13.0048 26.2028 193.212 278.728 8.46444 10.2101 26.037 98.812 11.896 5.90294 97.178 430.237 0.517357 7.21263 43.018 68.1302 42.3636 0.902388 135.925 135.844 10.846 19.305 0.453065 84.9334 200.893 0.514 10.036 118.382 1.10029 506.738 70.6805 3.234 24.1733 3.55341 10.2254 2.533 55.21668 70.8415 3.906 99.0373 35.2048 4.88633 268.06159 761.26344 160.622 28.402 41.3647 1932.89 2207.18 14.1137 14.6757 4.97589 14.1424 51.7975 97.7898 908.846 1108.16 1.97224 7.35765 138.641 204.648 6.22279 7.35405 11.7716 23.602 169.404 281.414 8.43814 10.0859 OpenBenchmarking.org
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 4K a b c d 8 16 24 32 40 27.66 32.39 25.74 26.04 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 4K a b c d 30 60 90 120 150 106.95 120.17 96.95 98.81 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 4K a b c d 4 8 12 16 20 13.23 14.35 11.83 11.90 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d 1.3282 2.6564 3.9846 5.3128 6.641 5.70412 4.89908 5.17554 5.90294 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 1080p a b c d 30 60 90 120 150 103.50 112.10 94.11 97.18 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 1080p a b c d 100 200 300 400 500 440.23 481.65 413.09 430.24 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 0.1164 0.2328 0.3492 0.4656 0.582 0.517125 0.488444 0.452827 0.517357 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 7.59373 6.66153 6.69892 7.21263 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 1080p a b c d 10 20 30 40 50 42.75 45.54 40.07 43.02 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 15 30 45 60 75 68.09 64.51 59.98 68.13 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel a b c d 10 20 30 40 50 42.66 37.57 38.16 42.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d 0.2074 0.4148 0.6222 0.8296 1.037 0.921778 0.865881 0.812788 0.902388 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d 30 60 90 120 150 146.15 129.19 135.74 135.93 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d 30 60 90 120 150 136.32 131.27 120.85 135.84 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 1080p a b c d 3 6 9 12 15 11.37 12.12 10.74 10.85 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d 5 10 15 20 25 19.47 19.16 17.31 19.31 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d 0.1065 0.213 0.3195 0.426 0.5325 0.473424 0.435521 0.421012 0.453065 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 20 40 60 80 100 86.25 78.85 76.88 84.93 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d 40 80 120 160 200 200.23 194.58 179.43 200.89 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit a b c d 0.1262 0.2524 0.3786 0.5048 0.631 0.543 0.561 0.503 0.514 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit a b c d 3 6 9 12 15 10.675 9.634 9.588 10.036 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 30 60 90 120 150 122.89 110.41 118.04 118.38 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d 0.2547 0.5094 0.7641 1.0188 1.2735 1.13181 1.03114 1.02034 1.10029 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d 110 220 330 440 550 502.18 510.92 462.59 506.74 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d 16 32 48 64 80 69.92 65.88 64.20 70.68 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 4K a b c d 0.7792 1.5584 2.3376 3.1168 3.896 3.309 3.463 3.151 3.234 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d 6 12 18 24 30 23.58 22.84 22.03 24.17 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d 0.814 1.628 2.442 3.256 4.07 3.61793 3.30690 3.58762 3.55341 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d 3 6 9 12 15 10.94 10.22 10.03 10.23 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit a b c d 0.6068 1.2136 1.8204 2.4272 3.034 2.697 2.601 2.488 2.533 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Whisperfile Model Size: Tiny OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Tiny a b c d 13 26 39 52 65 54.41 56.77 58.68 55.22
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d 16 32 48 64 80 70.95 66.61 65.82 70.84 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit a b c d 0.9025 1.805 2.7075 3.61 4.5125 4.011 3.854 3.729 3.906 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d 20 40 60 80 100 102.72 95.52 97.83 99.04 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d 8 16 24 32 40 35.45 34.30 33.02 35.20 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d 1.1344 2.2688 3.4032 4.5376 5.672 5.04160 4.72998 4.73966 4.88633 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Whisperfile Model Size: Small OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Small a b c d 60 120 180 240 300 262.41 271.83 277.33 268.06
Whisperfile Model Size: Medium OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Medium a b c d 200 400 600 800 1000 765.36 782.84 785.55 761.26
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d 40 80 120 160 200 161.64 162.32 160.71 160.62 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a b c d 7 14 21 28 35 28.21 29.15 30.28 28.40 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a b c d 10 20 30 40 50 42.41 43.78 45.38 41.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 500 1000 1500 2000 2500 1933.76 2047.31 2208.35 1932.89 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a b c d 500 1000 1500 2000 2500 2112.26 2296.09 2375.22 2207.18 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a b c d 4 8 12 16 20 14.09 15.01 15.19 14.11 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 4 8 12 16 20 14.69 15.50 16.67 14.68 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a b c d 1.2534 2.5068 3.7602 5.0136 6.267 4.99245 5.13748 5.57084 4.97589 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a b c d 4 8 12 16 20 14.30 15.18 15.57 14.14 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a b c d 13 26 39 52 65 51.36 52.20 57.75 51.80 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a b c d 20 40 60 80 100 91.42 97.83 99.67 97.79 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a b c d 200 400 600 800 1000 883.54 969.78 980.06 908.85 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a b c d 300 600 900 1200 1500 1084.85 1154.89 1230.33 1108.16 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a b c d 0.4861 0.9722 1.4583 1.9444 2.4305 1.99007 1.95616 2.16038 1.97224 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a b c d 2 4 6 8 10 7.33407 7.61522 8.27242 7.35765 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard a b c d 30 60 90 120 150 131.68 150.11 149.27 138.64 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel a b c d 50 100 150 200 250 198.34 211.41 210.98 204.65 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 6.18373 6.15789 6.22044 6.22279 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel a b c d 2 4 6 8 10 6.84002 7.73761 7.36381 7.35405 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 3 6 9 12 15 11.59 12.68 13.00 11.77 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel a b c d 6 12 18 24 30 23.44 26.61 26.20 23.60 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard a b c d 40 80 120 160 200 175.31 204.12 193.21 169.40 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel a b c d 70 140 210 280 350 276.40 302.39 278.73 281.41 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 3 6 9 12 15 8.12987 9.04782 8.46444 8.43814 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel a b c d 3 6 9 12 15 9.72552 10.45900 10.21010 10.08590 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5