onnx tr: Tests for a future article. AMD Ryzen Threadripper PRO 5965WX 24-Cores testing with an ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS) and ASUS NVIDIA NV106 2GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402042-NE-ONNXTR51458&sro&grr
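This result can be compared against a local system by passing the result ID to the Phoronix Test Suite. The invocation below follows OpenBenchmarking's usual comparison workflow and is shown as an untested sketch:

    phoronix-test-suite benchmark 2402042-NE-ONNXTR51458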
System Details (common to result identifiers a, c, and "AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS"):

  Processor: AMD Ryzen Threadripper PRO 5965WX 24-Cores @ 3.80GHz (24 Cores / 48 Threads)
  Motherboard: ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 8 x 16GB DDR4-2133MT/s Corsair CMK32GX4M2E3200C16
  Disk: 2048GB SOLIDIGM SSDPFKKW020X7
  Graphics: ASUS NVIDIA NV106 2GB
  Audio: AMD Starship/Matisse
  Monitor: VA2431
  Network: 2 x Intel X550 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 23.10
  Kernel: 6.5.0-13-generic (x86_64)
  Desktop: GNOME Shell 45.0
  Display Server: X Server + Wayland
  Display Driver: nouveau 4.3
  OpenGL: Mesa 23.2.1-1ubuntu3
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1920x1080

  Kernel Details: Transparent Huge Pages: madvise
  Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
  Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled); CPU Microcode: 0xa008205
  Python Details: Python 3.11.6
  Security Details:
    gather_data_sampling: Not affected
    itlb_multihit: Not affected
    l1tf: Not affected
    mds: Not affected
    meltdown: Not affected
    mmio_stale_data: Not affected
    retbleed: Not affected
    spec_rstack_overflow: Mitigation of safe RET no microcode
    spec_store_bypass: Mitigation of SSB disabled via prctl
    spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
    srbds: Not affected
    tsx_async_abort: Not affected
ONNX Runtime 1.17 Results Summary

Result identifiers a, c, and "AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS" (abbreviated ASUS below) all ran on the system described above. Metrics: ms = Inference Time Cost in milliseconds (fewer is better); inf/s = Inferences Per Second (more is better). A blank entry means the test was not run for that identifier.

Model - Device - Executor                     Metric   a         ASUS      c
GPT-2 - CPU - Standard                        ms       7.26082   6.8871    6.86398
GPT-2 - CPU - Standard                        inf/s    137.643   145.105   146.780
fcn-resnet101-11 - CPU - Standard             ms       428.971   430.108   347.232
fcn-resnet101-11 - CPU - Standard             inf/s    2.33114   2.32498   2.95286
bertsquad-12 - CPU - Standard                 ms       76.3802   83.943    75.7265
bertsquad-12 - CPU - Standard                 inf/s    13.0918   11.9124   13.3462
yolov4 - CPU - Standard                       ms       96.707    108.907   103.6459
yolov4 - CPU - Standard                       inf/s    10.3402   9.1818    9.70516
ArcFace ResNet-100 - CPU - Standard           ms       34.4339   30.6032   31.1380
ArcFace ResNet-100 - CPU - Standard           inf/s    29.0392   32.674    32.2767
CaffeNet 12-int8 - CPU - Standard             ms       1.25873   1.50952   1.31876
CaffeNet 12-int8 - CPU - Standard             inf/s    794.133   662.193   763.035
ResNet50 v1-12-int8 - CPU - Standard          ms       3.79772   3.93035   3.64067
ResNet50 v1-12-int8 - CPU - Standard          inf/s    263.28    254.395   275.560
T5 Encoder - CPU - Standard                   ms       4.97262   5.07413   5.15764
T5 Encoder - CPU - Standard                   inf/s    201.076   197.039   193.937
fcn-resnet101-11 - CPU - Parallel             ms       670.278   690.533   673.788
fcn-resnet101-11 - CPU - Parallel             inf/s    1.49191   1.44815   1.48414
GPT-2 - CPU - Parallel                        ms       7.62456   7.26342   7.61116
GPT-2 - CPU - Parallel                        inf/s    131.026   137.542   131.261
bertsquad-12 - CPU - Parallel                 ms       79.7953   78.5612   80.2103
bertsquad-12 - CPU - Parallel                 inf/s    12.5316   12.7284   12.4706
yolov4 - CPU - Parallel                       ms       104.735   109.173   105.355
yolov4 - CPU - Parallel                       inf/s    9.54753   9.15942   9.49134
T5 Encoder - CPU - Parallel                   ms       5.5073    5.75502   5.65170
T5 Encoder - CPU - Parallel                   inf/s    181.528   173.722   176.896
ArcFace ResNet-100 - CPU - Parallel           ms       33.3916   34.1523   32.8852
ArcFace ResNet-100 - CPU - Parallel           inf/s    29.9463   29.2791   30.4140
CaffeNet 12-int8 - CPU - Parallel             ms       1.4533    1.45579   1.44343
CaffeNet 12-int8 - CPU - Parallel             inf/s    687.278   686.138   692.180
ResNet50 v1-12-int8 - CPU - Parallel          ms       4.27552   4.27661   4.22286
ResNet50 v1-12-int8 - CPU - Parallel          inf/s    233.825   233.757   236.736
super-resolution-10 - CPU - Parallel          ms       8.22445   8.37853   8.11762
super-resolution-10 - CPU - Parallel          inf/s    121.569   119.333   123.175
Faster R-CNN R-50-FPN-int8 - CPU - Standard   ms       28.4942   27.4625
Faster R-CNN R-50-FPN-int8 - CPU - Standard   inf/s    35.0904   36.4098
Faster R-CNN R-50-FPN-int8 - CPU - Parallel   ms       32.5651   32.1133
Faster R-CNN R-50-FPN-int8 - CPU - Parallel   inf/s    30.7052   31.137
super-resolution-10 - CPU - Standard          ms       10.0152   10.524
super-resolution-10 - CPU - Standard          inf/s    99.8424   95.0154
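The Standard and Parallel executor columns correspond to ONNX Runtime's sequential and inter-op-parallel execution modes. A minimal Python sketch of timing a model under each mode with the onnxruntime API follows; "model.onnx", the run count, and the synthetic fp32 input are placeholder assumptions, and the PTS harness differs in its sampling details:

    # Hedged sketch: times one model under ONNX Runtime's sequential
    # ("Standard") or parallel ("Parallel") executor. "model.onnx", the
    # run count, and the synthetic input are placeholder assumptions.
    import time
    import numpy as np
    import onnxruntime as ort

    def bench(model_path, executor="Standard", runs=100):
        so = ort.SessionOptions()
        so.execution_mode = (ort.ExecutionMode.ORT_SEQUENTIAL
                             if executor == "Standard"
                             else ort.ExecutionMode.ORT_PARALLEL)
        sess = ort.InferenceSession(model_path, sess_options=so,
                                    providers=["CPUExecutionProvider"])
        inp = sess.get_inputs()[0]
        # Replace symbolic/dynamic dimensions with 1 for this sketch.
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        x = np.random.rand(*shape).astype(np.float32)
        sess.run(None, {inp.name: x})                 # warm-up run
        t0 = time.perf_counter()
        for _ in range(runs):
            sess.run(None, {inp.name: x})
        dt = (time.perf_counter() - t0) / runs
        return dt * 1000.0, 1.0 / dt                  # ms, inferences/sec

    ms, ips = bench("model.onnx", executor="Parallel")
    print(f"{ms:.5f} ms / {ips:.2f} inferences per second")

Note that the two reported metrics are reciprocals of the same measurement: inf/s = 1000 / ms.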
Detailed Results
(ASUS = "AMD Ryzen Threadripper PRO 5965WX 24-Cores - ASUS". All ONNX Runtime 1.17 binaries built with (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt.)

ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 0.16982, N = 15): ASUS 6.88710 / a 7.26082 / c 6.86398
  Inferences Per Second, more is better (SE +/- 3.42, N = 15): ASUS 145.11 / a 137.64 / c 146.78
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 15.41, N = 15): ASUS 430.11 / a 428.97 / c 347.23
  Inferences Per Second, more is better (SE +/- 0.11743, N = 15): ASUS 2.32498 / a 2.33114 / c 2.95286
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 2.09, N = 15): ASUS 83.94 / a 76.38 / c 75.73
  Inferences Per Second, more is better (SE +/- 0.37, N = 15): ASUS 11.91 / a 13.09 / c 13.35
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 2.14, N = 15): ASUS 108.91 / a 96.71 / c 103.65
  Inferences Per Second, more is better (SE +/- 0.19825, N = 15): ASUS 9.18180 / a 10.34020 / c 9.70516
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 0.59, N = 15): ASUS 30.60 / a 34.43 / c 31.14
  Inferences Per Second, more is better (SE +/- 0.61, N = 15): ASUS 32.67 / a 29.04 / c 32.28
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 0.02979, N = 15): ASUS 1.50952 / a 1.25873 / c 1.31876
  Inferences Per Second, more is better (SE +/- 15.86, N = 15): ASUS 662.19 / a 794.13 / c 763.04
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 0.06571, N = 12): ASUS 3.93035 / a 3.79772 / c 3.64067
  Inferences Per Second, more is better (SE +/- 4.77, N = 12): ASUS 254.40 / a 263.28 / c 275.56
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
  Inference Time Cost (ms), fewer is better (SE +/- 0.05411, N = 5): ASUS 5.07413 / a 4.97262 / c 5.15764
  Inferences Per Second, more is better (SE +/- 2.01, N = 5): ASUS 197.04 / a 201.08 / c 193.94
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 1.01, N = 3): ASUS 690.53 / a 670.28 / c 673.79
  Inferences Per Second, more is better (SE +/- 0.00223, N = 3): ASUS 1.44815 / a 1.49191 / c 1.48414
ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.03837, N = 3): ASUS 7.26342 / a 7.62456 / c 7.61116
  Inferences Per Second, more is better (SE +/- 0.66, N = 3): ASUS 137.54 / a 131.03 / c 131.26
ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 1.00, N = 3): ASUS 78.56 / a 79.80 / c 80.21
  Inferences Per Second, more is better (SE +/- 0.16, N = 3): ASUS 12.73 / a 12.53 / c 12.47
ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.08, N = 3): ASUS 109.17 / a 104.74 / c 105.36
  Inferences Per Second, more is better (SE +/- 0.00732, N = 3): ASUS 9.15942 / a 9.54753 / c 9.49134
ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.02036, N = 3): ASUS 5.75502 / a 5.50730 / c 5.65170
  Inferences Per Second, more is better (SE +/- 0.64, N = 3): ASUS 173.72 / a 181.53 / c 176.90
ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.34, N = 3): ASUS 34.15 / a 33.39 / c 32.89
  Inferences Per Second, more is better (SE +/- 0.32, N = 3): ASUS 29.28 / a 29.95 / c 30.41
ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.01776, N = 3): ASUS 1.45579 / a 1.45330 / c 1.44343
  Inferences Per Second, more is better (SE +/- 8.43, N = 3): ASUS 686.14 / a 687.28 / c 692.18
ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.01316, N = 3): ASUS 4.27661 / a 4.27552 / c 4.22286
  Inferences Per Second, more is better (SE +/- 0.73, N = 3): ASUS 233.76 / a 233.83 / c 236.74
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Parallel
  Inference Time Cost (ms), fewer is better (SE +/- 0.03894, N = 3): ASUS 8.37853 / a 8.22445 / c 8.11762
  Inferences Per Second, more is better (SE +/- 0.59, N = 3): ASUS 119.33 / a 121.57 / c 123.18
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard (not run on c)
  Inference Time Cost (ms), fewer is better: ASUS 27.46 / a 28.49
  Inferences Per Second, more is better: ASUS 36.41 / a 35.09
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel (not run on c)
  Inference Time Cost (ms), fewer is better: ASUS 32.11 / a 32.57
  Inferences Per Second, more is better: ASUS 31.14 / a 30.71
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard (not run on c)
  Inference Time Cost (ms), fewer is better: ASUS 10.52 / a 10.02
  Inferences Per Second, more is better: ASUS 95.02 / a 99.84
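For interpreting the SE figures quoted throughout, a minimal Python sketch of the standard-error computation, assuming the conventional SE = s / sqrt(N) with sample standard deviation s (the sample values are hypothetical, not taken from this result file):

    # Hedged sketch: the SE figures above are standard errors of the mean
    # over N recorded runs; these per-run latencies are hypothetical.
    import math
    import statistics

    samples = [6.91, 7.02, 6.85, 7.40, 6.88]      # hypothetical per-run ms
    n = len(samples)
    se = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    print(f"mean = {statistics.mean(samples):.3f} ms, SE +/- {se:.3f}, N = {n}")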
Phoronix Test Suite v10.8.5