onnx tr Tests for a future article. AMD Ryzen Threadripper PRO 5965WX 24-Cores testing with a ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS) and ASUS NVIDIA NV106 2GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402042-NE-ONNXTR51458&gru&rdt .
onnx tr Processor Motherboard Chipset Memory Disk Graphics Audio Monitor Network OS Kernel Desktop Display Server Display Driver OpenGL Compiler File-System Screen Resolution a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c AMD Ryzen Threadripper PRO 5965WX 24-Cores @ 3.80GHz (24 Cores / 48 Threads) ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS) AMD Starship/Matisse 8 x 16GB DDR4-2133MT/s Corsair CMK32GX4M2E3200C16 2048GB SOLIDIGM SSDPFKKW020X7 ASUS NVIDIA NV106 2GB AMD Starship/Matisse VA2431 2 x Intel X550 + Intel Wi-Fi 6 AX200 Ubuntu 23.10 6.5.0-13-generic (x86_64) GNOME Shell 45.0 X Server + Wayland nouveau 4.3 Mesa 23.2.1-1ubuntu3 GCC 13.2.0 ext4 1920x1080 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa008205 Python Details - Python 3.11.6 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
onnx tr onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard onnx: GPT-2 - CPU - Parallel onnx: GPT-2 - CPU - Standard onnx: yolov4 - CPU - Parallel onnx: yolov4 - CPU - Standard onnx: T5 Encoder - CPU - Parallel onnx: T5 Encoder - CPU - Standard onnx: bertsquad-12 - CPU - Parallel onnx: bertsquad-12 - CPU - Standard onnx: CaffeNet 12-int8 - CPU - Parallel onnx: CaffeNet 12-int8 - CPU - Standard onnx: fcn-resnet101-11 - CPU - Parallel onnx: fcn-resnet101-11 - CPU - Standard onnx: ArcFace ResNet-100 - CPU - Parallel onnx: ArcFace ResNet-100 - CPU - Standard onnx: ResNet50 v1-12-int8 - CPU - Parallel onnx: ResNet50 v1-12-int8 - CPU - Standard onnx: super-resolution-10 - CPU - Parallel onnx: super-resolution-10 - CPU - Standard onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 131.026 137.643 9.54753 10.3402 181.528 201.076 12.5316 13.0918 687.278 794.133 1.49191 2.33114 29.9463 29.0392 233.825 263.28 121.569 99.8424 30.7052 35.0904 7.62456 7.26082 104.735 96.707 5.5073 4.97262 79.7953 76.3802 1.4533 1.25873 670.278 428.971 33.3916 34.4339 4.27552 3.79772 8.22445 10.0152 32.5651 28.4942 137.542 145.105 9.15942 9.1818 173.722 197.039 12.7284 11.9124 686.138 662.193 1.44815 2.32498 29.2791 32.674 233.757 254.395 119.333 95.0154 31.137 36.4098 7.26342 6.8871 109.173 108.907 5.75502 5.07413 78.5612 83.943 1.45579 1.50952 690.533 430.108 34.1523 30.6032 4.27661 3.93035 8.37853 10.524 32.1133 27.4625 131.261 146.780 9.49134 9.70516 176.896 193.937 12.4706 13.3462 692.180 763.035 1.48414 2.95286 30.4140 32.2767 236.736 275.560 123.175 7.61116 6.86398 105.355 103.6459 5.65170 5.15764 80.2103 75.7265 1.44343 1.31876 673.788 347.232 32.8852 31.1380 4.22286 3.64067 8.11762 OpenBenchmarking.org
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: GPT-2 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 30 60 90 120 150 SE +/- 0.66, N = 3 131.03 137.54 131.26 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: GPT-2 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 30 60 90 120 150 SE +/- 3.42, N = 15 137.64 145.11 146.78 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: yolov4 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 3 6 9 12 15 SE +/- 0.00732, N = 3 9.54753 9.15942 9.49134 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: yolov4 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 3 6 9 12 15 SE +/- 0.19825, N = 15 10.34020 9.18180 9.70516 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: T5 Encoder - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 40 80 120 160 200 SE +/- 0.64, N = 3 181.53 173.72 176.90 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: T5 Encoder - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 40 80 120 160 200 SE +/- 2.01, N = 5 201.08 197.04 193.94 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: bertsquad-12 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 3 6 9 12 15 SE +/- 0.16, N = 3 12.53 12.73 12.47 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: bertsquad-12 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 3 6 9 12 15 SE +/- 0.37, N = 15 13.09 11.91 13.35 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 150 300 450 600 750 SE +/- 8.43, N = 3 687.28 686.14 692.18 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 200 400 600 800 1000 SE +/- 15.86, N = 15 794.13 662.19 763.04 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.3357 0.6714 1.0071 1.3428 1.6785 SE +/- 0.00223, N = 3 1.49191 1.44815 1.48414 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.6644 1.3288 1.9932 2.6576 3.322 SE +/- 0.11743, N = 15 2.33114 2.32498 2.95286 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 7 14 21 28 35 SE +/- 0.32, N = 3 29.95 29.28 30.41 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 8 16 24 32 40 SE +/- 0.61, N = 15 29.04 32.67 32.28 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 50 100 150 200 250 SE +/- 0.73, N = 3 233.83 233.76 236.74 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 60 120 180 240 300 SE +/- 4.77, N = 12 263.28 254.40 275.56 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: super-resolution-10 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 30 60 90 120 150 SE +/- 0.59, N = 3 121.57 119.33 123.18 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: super-resolution-10 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 20 40 60 80 100 99.84 95.02 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 7 14 21 28 35 30.71 31.14 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inferences Per Second, More Is Better ONNX Runtime 1.17 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 8 16 24 32 40 35.09 36.41 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: GPT-2 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 2 4 6 8 10 SE +/- 0.03837, N = 3 7.62456 7.26342 7.61116 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: GPT-2 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: GPT-2 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 2 4 6 8 10 SE +/- 0.16982, N = 15 7.26082 6.88710 6.86398 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: yolov4 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 20 40 60 80 100 SE +/- 0.08, N = 3 104.74 109.17 105.36 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: yolov4 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: yolov4 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 20 40 60 80 100 SE +/- 2.14, N = 15 96.71 108.91 103.65 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: T5 Encoder - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 1.2949 2.5898 3.8847 5.1796 6.4745 SE +/- 0.02036, N = 3 5.50730 5.75502 5.65170 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: T5 Encoder - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: T5 Encoder - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 1.1605 2.321 3.4815 4.642 5.8025 SE +/- 0.05411, N = 5 4.97262 5.07413 5.15764 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: bertsquad-12 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 20 40 60 80 100 SE +/- 1.00, N = 3 79.80 78.56 80.21 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: bertsquad-12 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: bertsquad-12 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 20 40 60 80 100 SE +/- 2.09, N = 15 76.38 83.94 75.73 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.3276 0.6552 0.9828 1.3104 1.638 SE +/- 0.01776, N = 3 1.45330 1.45579 1.44343 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.3396 0.6792 1.0188 1.3584 1.698 SE +/- 0.02979, N = 15 1.25873 1.50952 1.31876 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 150 300 450 600 750 SE +/- 1.01, N = 3 670.28 690.53 673.79 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: fcn-resnet101-11 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: fcn-resnet101-11 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 90 180 270 360 450 SE +/- 15.41, N = 15 428.97 430.11 347.23 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 8 16 24 32 40 SE +/- 0.34, N = 3 33.39 34.15 32.89 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 8 16 24 32 40 SE +/- 0.59, N = 15 34.43 30.60 31.14 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.9622 1.9244 2.8866 3.8488 4.811 SE +/- 0.01316, N = 3 4.27552 4.27661 4.22286 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 0.8843 1.7686 2.6529 3.5372 4.4215 SE +/- 0.06571, N = 12 3.79772 3.93035 3.64067 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: super-resolution-10 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS c 2 4 6 8 10 SE +/- 0.03894, N = 3 8.22445 8.37853 8.11762 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: super-resolution-10 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: super-resolution-10 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 3 6 9 12 15 10.02 10.52 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 8 16 24 32 40 32.57 32.11 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
ONNX Runtime Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard OpenBenchmarking.org Inference Time Cost (ms), Fewer Is Better ONNX Runtime 1.17 Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard a AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS 7 14 21 28 35 28.49 27.46 1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5