new satty AMD Ryzen AI 9 365 testing with a ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and AMD Radeon 512MB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408252-NE-NEWSATTY701&sor&grw .
new satty Processor Motherboard Chipset Memory Disk Graphics Audio Network OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c d AMD Ryzen AI 9 365 @ 4.31GHz (10 Cores / 20 Threads) ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) AMD Device 1507 4 x 6GB LPDDR5-7500MT/s Micron MT62F1536M32D4DS-026 1024GB MTFDKBA1T0QFM-1BD1AABGB AMD Radeon 512MB AMD Rembrandt Radeon HD Audio MEDIATEK Device 7925 Ubuntu 24.04 6.10.0-phx (x86_64) GNOME Shell 46.0 X Server + Wayland 4.6 Mesa 24.3~git2407280600.a211a5~oibaf~n (git-a211a51 2024-07-28 noble-oibaf-ppa) (LLVM 17.0.6 DRM 3.57) GCC 13.2.0 ext4 2880x1800 OpenBenchmarking.org Kernel Details - amdgpu.dcdebugmask=0x600 - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - Platform Profile: balanced - CPU Microcode: 0xb204011 - ACPI Profile: balanced Python Details - Python 3.12.3 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected + srbds: Not affected + tsx_async_abort: Not affected
new satty whisperfile: Tiny whisperfile: Small whisperfile: Medium onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: ZFNet-512 - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard svt-av1: Preset 3 - Bosphorus 4K svt-av1: Preset 5 - Bosphorus 4K svt-av1: Preset 8 - Bosphorus 4K svt-av1: Preset 13 - Bosphorus 4K svt-av1: Preset 3 - Bosphorus 1080p svt-av1: Preset 5 - Bosphorus 1080p svt-av1: Preset 8 - Bosphorus 1080p svt-av1: Preset 13 - Bosphorus 1080p svt-av1: Preset 3 - Beauty 4K 10-bit svt-av1: Preset 5 - Beauty 4K 10-bit svt-av1: Preset 8 - Beauty 4K 10-bit svt-av1: Preset 13 - Beauty 4K 10-bit simdjson: Kostya simdjson: TopTweet simdjson: LargeRand simdjson: PartialTweets simdjson: DistinctUserID onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: ZFNet-512 - CPU - Parallel onnx: ZFNet-512 - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: ResNet101_DUC_HDC-12 - CPU - Parallel onnx: ResNet101_DUC_HDC-12 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard a b c d 52.71359 259.90777 754.78056 96.8538 130.748 4.36533 6.48303 42.5418 93.206 118.516 166.699 5.52731 8.60423 135.85 565.244 0.892606 1.16523 11.5819 22.927 74.6918 216.667 70.1183 80.9921 0.465907 0.534553 26.2276 39.7871 3.728 14.781 30.826 113.827 11.911 43.914 106.46 474.771 0.568 2.573 3.694 6.164 4.51 6.88 1.25 8.74 7.09 10.3168 7.6417 229.072 154.246 23.5033 10.7266 8.43551 5.99672 180.913 116.218 7.35929 1.76819 1120.31 858.199 86.3383 43.6148 13.3853 4.61373 14.2597 12.3454 2146.34 1870.72 38.1251 25.1311 52.63452 261.74247 751.4815 95.5885 128.263 4.15031 6.23528 41.976 91.4483 120.311 167.567 5.50029 8.48368 139.674 565.483 0.868291 1.15557 11.7777 22.9872 75.9022 222.382 69.1277 75.8635 0.445551 0.533521 26.4566 39.4192 3.491 13.308 29.745 109.45 11.437 41.951 100.454 455.785 0.565 2.557 3.606 6.158 4.42 6.91 1.25 6.74 6.91 10.4521 7.78951 240.94 160.374 23.8202 10.9323 8.30904 5.96555 181.802 117.87 7.1579 1.76734 1151.68 865.312 84.9029 43.5 13.1725 4.49478 14.4642 13.1798 2244.4 1874.34 37.795 25.3656 52.49648 257.37884 722.53519 97.0868 127.426 4.36559 6.92476 44.791 91.3322 118.211 168.371 5.8995 8.99805 144.829 572.861 0.895635 1.22009 11.6564 23.7636 77.3006 226.04 68.938 78.9522 0.456424 0.526244 26.1799 39.7086 3.796 14.881 31.636 116.143 12.089 44.658 107.482 476.223 0.573 2.63 3.708 6.135 4.41 6.94 1.25 6.59 6.9 10.2922 7.84183 229.058 144.406 22.3234 10.9462 8.45631 5.93691 169.5 111.131 6.90258 1.74432 1116.52 819.557 85.7866 42.0785 12.9344 4.42207 14.5036 12.6641 2190.94 1900.25 38.1944 25.1788 54.71240 269.17700 752.93123 94.3693 120.741 3.95583 5.89878 42.5568 83.0327 117.354 164.449 5.42253 7.96736 139.865 519.453 0.834423 1.06101 11.1720 19.7127 72.5395 193.668 64.0751 71.9372 0.420335 0.471409 25.7982 38.0561 3.410 12.942 29.258 112.326 11.564 42.091 101.169 460.254 0.550 2.481 3.523 6.143 4.46 7.02 1.24 6.83 7.04 10.5900 8.27490 252.883 169.616 23.4969 12.0469 8.52036 6.07861 184.512 125.545 7.14811 1.92558 1198.57 942.938 89.5086 50.7730 13.7852 5.16479 15.6076 13.9008 2379.56 2122.37 38.7603 26.2808 OpenBenchmarking.org
Whisperfile Model Size: Tiny OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Tiny c b a d 12 24 36 48 60 SE +/- 0.38, N = 3 52.50 52.63 52.71 54.71
Whisperfile Model Size: Small OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Small c a b d 60 120 180 240 300 SE +/- 3.09, N = 3 257.38 259.91 261.74 269.18
Whisperfile Model Size: Medium OpenBenchmarking.org Seconds, Fewer Is Better Whisperfile 20Aug24 Model Size: Medium c b d a 160 320 480 640 800 SE +/- 2.92, N = 3 722.54 751.48 752.93 754.78
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel c a b d 20 40 60 80 100 SE +/- 0.87, N = 3 97.09 96.85 95.59 94.37 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 30 60 90 120 150 SE +/- 0.43, N = 3 130.75 128.26 127.43 120.74 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel c a b d 0.9823 1.9646 2.9469 3.9292 4.9115 SE +/- 0.05493, N = 3 4.36559 4.36533 4.15031 3.95583 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard c a b d 2 4 6 8 10 SE +/- 0.04221, N = 12 6.92476 6.48303 6.23528 5.89878 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel c d a b 10 20 30 40 50 SE +/- 0.27, N = 3 44.79 42.56 42.54 41.98 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 20 40 60 80 100 SE +/- 0.57, N = 12 93.21 91.45 91.33 83.03 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel b a c d 30 60 90 120 150 SE +/- 1.16, N = 3 120.31 118.52 118.21 117.35 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard c b a d 40 80 120 160 200 SE +/- 0.63, N = 3 168.37 167.57 166.70 164.45 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel c a b d 1.3274 2.6548 3.9822 5.3096 6.637 SE +/- 0.03573, N = 14 5.89950 5.52731 5.50029 5.42253 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard c a b d 3 6 9 12 15 SE +/- 0.09828, N = 3 8.99805 8.60423 8.48368 7.96736 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel c d b a 30 60 90 120 150 SE +/- 0.82, N = 3 144.83 139.87 139.67 135.85 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard c b a d 120 240 360 480 600 SE +/- 4.51, N = 12 572.86 565.48 565.24 519.45 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel c a b d 0.2015 0.403 0.6045 0.806 1.0075 SE +/- 0.006409, N = 3 0.895635 0.892606 0.868291 0.834423 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard c a b d 0.2745 0.549 0.8235 1.098 1.3725 SE +/- 0.01056, N = 6 1.22009 1.16523 1.15557 1.06101 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel b c a d 3 6 9 12 15 SE +/- 0.04, N = 3 11.78 11.66 11.58 11.17 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard c b a d 6 12 18 24 30 SE +/- 0.16, N = 15 23.76 22.99 22.93 19.71 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel c b a d 20 40 60 80 100 SE +/- 0.61, N = 3 77.30 75.90 74.69 72.54 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard c b a d 50 100 150 200 250 SE +/- 1.31, N = 15 226.04 222.38 216.67 193.67 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 16 32 48 64 80 SE +/- 0.67, N = 3 70.12 69.13 68.94 64.08 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a c b d 20 40 60 80 100 SE +/- 0.54, N = 3 80.99 78.95 75.86 71.94 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a c b d 0.1048 0.2096 0.3144 0.4192 0.524 SE +/- 0.004361, N = 3 0.465907 0.456424 0.445551 0.420335 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 0.1203 0.2406 0.3609 0.4812 0.6015 SE +/- 0.003579, N = 10 0.534553 0.533521 0.526244 0.471409 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel b a c d 6 12 18 24 30 SE +/- 0.09, N = 3 26.46 26.23 26.18 25.80 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a c b d 9 18 27 36 45 SE +/- 0.45, N = 3 39.79 39.71 39.42 38.06 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 4K c a b d 0.8541 1.7082 2.5623 3.4164 4.2705 SE +/- 0.037, N = 9 3.796 3.728 3.491 3.410 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 4K c a b d 4 8 12 16 20 SE +/- 0.15, N = 3 14.88 14.78 13.31 12.94 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 4K c a b d 7 14 21 28 35 SE +/- 0.19, N = 15 31.64 30.83 29.75 29.26 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 4K OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 4K c a d b 30 60 90 120 150 SE +/- 0.95, N = 3 116.14 113.83 112.33 109.45 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 3 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Bosphorus 1080p c a d b 3 6 9 12 15 SE +/- 0.05, N = 3 12.09 11.91 11.56 11.44 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 5 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Bosphorus 1080p c a d b 10 20 30 40 50 SE +/- 0.37, N = 8 44.66 43.91 42.09 41.95 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Bosphorus 1080p c a d b 20 40 60 80 100 SE +/- 0.41, N = 3 107.48 106.46 101.17 100.45 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Bosphorus 1080p OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Bosphorus 1080p c a d b 100 200 300 400 500 SE +/- 3.20, N = 3 476.22 474.77 460.25 455.79 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 3 - Input: Beauty 4K 10-bit c a b d 0.1289 0.2578 0.3867 0.5156 0.6445 SE +/- 0.004, N = 3 0.573 0.568 0.565 0.550 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit c a b d 0.5918 1.1836 1.7754 2.3672 2.959 SE +/- 0.016, N = 3 2.630 2.573 2.557 2.481 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit c a b d 0.8343 1.6686 2.5029 3.3372 4.1715 SE +/- 0.010, N = 3 3.708 3.694 3.606 3.523 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit OpenBenchmarking.org Frames Per Second, More Is Better SVT-AV1 2.2 Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit a b d c 2 4 6 8 10 SE +/- 0.006, N = 3 6.164 6.158 6.143 6.135 1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
simdjson Throughput Test: Kostya OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: Kostya a d b c 1.0148 2.0296 3.0444 4.0592 5.074 SE +/- 0.02, N = 3 4.51 4.46 4.42 4.41 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: TopTweet OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: TopTweet d c b a 2 4 6 8 10 SE +/- 0.02, N = 3 7.02 6.94 6.91 6.88 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: LargeRandom OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: LargeRandom c b a d 0.2813 0.5626 0.8439 1.1252 1.4065 SE +/- 0.00, N = 3 1.25 1.25 1.25 1.24 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: PartialTweets OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: PartialTweets a d b c 2 4 6 8 10 SE +/- 0.02, N = 3 8.74 6.83 6.74 6.59 1. (CXX) g++ options: -O3 -lrt
simdjson Throughput Test: DistinctUserID OpenBenchmarking.org GB/s, More Is Better simdjson 3.10 Throughput Test: DistinctUserID a d b c 2 4 6 8 10 SE +/- 0.08, N = 3 7.09 7.04 6.91 6.90 1. (CXX) g++ options: -O3 -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Parallel c a b d 3 6 9 12 15 SE +/- 0.10, N = 3 10.29 10.32 10.45 10.59 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: GPT-2 - Device: CPU - Executor: Standard a b c d 2 4 6 8 10 SE +/- 0.02926, N = 3 7.64170 7.78951 7.84183 8.27490 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Parallel c a b d 60 120 180 240 300 SE +/- 3.53, N = 3 229.06 229.07 240.94 252.88 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: yolov4 - Device: CPU - Executor: Standard c a b d 40 80 120 160 200 SE +/- 1.19, N = 12 144.41 154.25 160.37 169.62 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Parallel c d a b 6 12 18 24 30 SE +/- 0.15, N = 3 22.32 23.50 23.50 23.82 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ZFNet-512 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ZFNet-512 - Device: CPU - Executor: Standard a b c d 3 6 9 12 15 SE +/- 0.08, N = 12 10.73 10.93 10.95 12.05 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Parallel b a c d 2 4 6 8 10 SE +/- 0.08432, N = 3 8.30904 8.43551 8.45631 8.52036 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: T5 Encoder - Device: CPU - Executor: Standard c b a d 2 4 6 8 10 SE +/- 0.02319, N = 3 5.93691 5.96555 5.99672 6.07861 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Parallel c a b d 40 80 120 160 200 SE +/- 1.19, N = 14 169.50 180.91 181.80 184.51 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: bertsquad-12 - Device: CPU - Executor: Standard c a b d 30 60 90 120 150 SE +/- 1.55, N = 3 111.13 116.22 117.87 125.55 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel c d b a 2 4 6 8 10 SE +/- 0.04235, N = 3 6.90258 7.14811 7.15790 7.35929 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard c b a d 0.4333 0.8666 1.2999 1.7332 2.1665 SE +/- 0.01672, N = 12 1.74432 1.76734 1.76819 1.92558 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel c a b d 300 600 900 1200 1500 SE +/- 9.16, N = 3 1116.52 1120.31 1151.68 1198.57 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard c a b d 200 400 600 800 1000 SE +/- 9.14, N = 6 819.56 858.20 865.31 942.94 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel b c a d 20 40 60 80 100 SE +/- 0.32, N = 3 84.90 85.79 86.34 89.51 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard c b a d 11 22 33 44 55 SE +/- 0.41, N = 15 42.08 43.50 43.61 50.77 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel c b a d 4 8 12 16 20 SE +/- 0.12, N = 3 12.93 13.17 13.39 13.79 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard c b a d 1.1621 2.3242 3.4863 4.6484 5.8105 SE +/- 0.03387, N = 15 4.42207 4.49478 4.61373 5.16479 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Parallel a b c d 4 8 12 16 20 SE +/- 0.16, N = 3 14.26 14.46 14.50 15.61 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: super-resolution-10 - Device: CPU - Executor: Standard a c b d 4 8 12 16 20 SE +/- 0.11, N = 3 12.35 12.66 13.18 13.90 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel a c b d 500 1000 1500 2000 2500 SE +/- 24.44, N = 3 2146.34 2190.94 2244.40 2379.56 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard a b c d 500 1000 1500 2000 2500 SE +/- 15.92, N = 10 1870.72 1874.34 1900.25 2122.37 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel b a c d 9 18 27 36 45 SE +/- 0.13, N = 3 37.80 38.13 38.19 38.76 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.19 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a c b d 6 12 18 24 30 SE +/- 0.31, N = 3 25.13 25.18 25.37 26.28 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5