9684x-ne Tests for a future article. 2 x AMD EPYC 9684X 96-Core testing with an AMD Titanite_4G (RTI1007B BIOS) motherboard and ASPEED graphics on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402188-NE-9684XNE0007&sro&gru.
9684x-ne System Details (shared by test runs a, b, and c)

Processor: 2 x AMD EPYC 9684X 96-Core @ 2.55GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1007B BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 23.10
Kernel: 6.6.0-060600-generic (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 800x600

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq performance (Boost: Enabled); CPU Microcode: 0xa10113e
Python Details: Python 3.11.6
Security Details: gather_data_sampling: Not affected; itlb_multihit: Not affected; l1tf: Not affected; mds: Not affected; meltdown: Not affected; mmio_stale_data: Not affected; retbleed: Not affected; spec_rstack_overflow: Mitigation of safe RET; spec_store_bypass: Mitigation of SSB disabled via prctl; spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization; spectre_v2: Mitigation of Enhanced / Automatic IBRS, IBPB: conditional, STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected; srbds: Not affected; tsx_async_abort: Not affected
9684x-ne Results Summary (values for runs a / b / c; "-" indicates the test was not run on that configuration)

oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only [Images / Sec]: 3.49 / 3.49 / 3.49
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only [Images / Sec]: 3.49 / 3.49 / 3.49
oidn: RTLightmap.hdr.4096x4096 - CPU-Only [Images / Sec]: 1.68 / 1.65 / 1.67
onnx: GPT-2 - CPU - Parallel [Inferences Per Second]: 110.687 / 111.541 / -
onnx: GPT-2 - CPU - Standard [Inferences Per Second]: 159.717 / 122.039 / 122.148
onnx: yolov4 - CPU - Parallel [Inferences Per Second]: 6.36075 / 6.37602 / -
onnx: yolov4 - CPU - Standard [Inferences Per Second]: 7.03904 / 7.04966 / 7.0734
onnx: T5 Encoder - CPU - Parallel [Inferences Per Second]: 284.322 / 281.887 / -
onnx: T5 Encoder - CPU - Standard [Inferences Per Second]: 235.262 / 237.889 / 235.807
onnx: bertsquad-12 - CPU - Parallel [Inferences Per Second]: 8.72812 / 8.73027 / -
onnx: bertsquad-12 - CPU - Standard [Inferences Per Second]: 10.7281 / 10.6571 / 10.8732
onnx: CaffeNet 12-int8 - CPU - Parallel [Inferences Per Second]: 331.596 / 314.567 / -
onnx: CaffeNet 12-int8 - CPU - Standard [Inferences Per Second]: 446.93 / 430.574 / 479.844
onnx: fcn-resnet101-11 - CPU - Parallel [Inferences Per Second]: 0.880576 / 0.881007 / -
onnx: fcn-resnet101-11 - CPU - Standard [Inferences Per Second]: 5.05401 / 5.64928 / 5.03581
onnx: ArcFace ResNet-100 - CPU - Parallel [Inferences Per Second]: 15.5773 / 15.4584 / -
onnx: ArcFace ResNet-100 - CPU - Standard [Inferences Per Second]: 23.4132 / - / 24.0256
onnx: ResNet50 v1-12-int8 - CPU - Parallel [Inferences Per Second]: 101.632 / - / -
onnx: ResNet50 v1-12-int8 - CPU - Standard [Inferences Per Second]: 147.311 / - / -
onnx: super-resolution-10 - CPU - Parallel [Inferences Per Second]: 91.8638 / - / -
onnx: super-resolution-10 - CPU - Standard [Inferences Per Second]: 97.3769 / - / -
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel [Inferences Per Second]: 22.9007 / - / -
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard [Inferences Per Second]: 44.5379 / - / -
gromacs: MPI CPU - water_GMX50_bare [Ns Per Day]: 24.055 / 24.159 / 24.074
namd: ATPase with 327,506 Atoms [ns/day]: 20.90323 / 20.88550 / 20.88433
namd: STMV with 1,066,628 Atoms [ns/day]: 6.36963 / 6.49427 / 6.47040
onnx: GPT-2 - CPU - Parallel [Inference Time Cost, ms]: 9.02332 / 8.9547 / -
onnx: GPT-2 - CPU - Standard [Inference Time Cost, ms]: 6.25834 / 8.19077 / 8.18389
onnx: yolov4 - CPU - Parallel [Inference Time Cost, ms]: 157.207 / 156.831 / -
onnx: yolov4 - CPU - Standard [Inference Time Cost, ms]: 142.061 / 141.847 / 141.371
onnx: T5 Encoder - CPU - Parallel [Inference Time Cost, ms]: 3.51487 / 3.54536 / -
onnx: T5 Encoder - CPU - Standard [Inference Time Cost, ms]: 4.2498 / 4.20289 / 4.23992
onnx: bertsquad-12 - CPU - Parallel [Inference Time Cost, ms]: 114.566 / 114.538 / -
onnx: bertsquad-12 - CPU - Standard [Inference Time Cost, ms]: 93.2099 / 93.8308 / 91.9517
onnx: CaffeNet 12-int8 - CPU - Parallel [Inference Time Cost, ms]: 3.01302 / 3.17604 / -
onnx: CaffeNet 12-int8 - CPU - Standard [Inference Time Cost, ms]: 2.23695 / 2.32191 / 2.08344
onnx: fcn-resnet101-11 - CPU - Parallel [Inference Time Cost, ms]: 1135.61 / 1135.06 / -
onnx: fcn-resnet101-11 - CPU - Standard [Inference Time Cost, ms]: 197.859 / 177.01 / 198.574
onnx: ArcFace ResNet-100 - CPU - Parallel [Inference Time Cost, ms]: 64.1922 / 64.686 / -
onnx: ArcFace ResNet-100 - CPU - Standard [Inference Time Cost, ms]: 42.7093 / - / 41.6199
onnx: ResNet50 v1-12-int8 - CPU - Parallel [Inference Time Cost, ms]: 9.8368 / - / -
onnx: ResNet50 v1-12-int8 - CPU - Standard [Inference Time Cost, ms]: 6.78768 / - / -
onnx: super-resolution-10 - CPU - Parallel [Inference Time Cost, ms]: 10.8828 / - / -
onnx: super-resolution-10 - CPU - Standard [Inference Time Cost, ms]: 10.2686 / - / -
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallel [Inference Time Cost, ms]: 43.6621 / - / -
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard [Inference Time Cost, ms]: 22.4507 / - / -
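Run-to-run variation differs sharply by workload: most rows above are tightly clustered across a, b, and c, while a few (notably ONNX GPT-2 Standard and fcn-resnet101-11 Standard) spread by double-digit percentages. A minimal Python sketch of that spread calculation, using values copied from the table above (the spread() helper is illustrative only, not something the Phoronix Test Suite reports):

```python
# Relative spread across runs a/b/c, using values copied from the summary table.
# spread() is an illustrative helper, not part of the benchmark output.
def spread(values):
    return (max(values) - min(values)) / min(values) * 100.0

gpt2_standard_ips = [159.717, 122.039, 122.148]  # onnx: GPT-2 - CPU - Standard (Inferences Per Second)
gromacs_ns_day = [24.055, 24.159, 24.074]        # gromacs: MPI CPU - water_GMX50_bare (Ns Per Day)

print(f"GPT-2 Standard spread: {spread(gpt2_standard_ips):.1f}%")  # ~30.9%
print(f"GROMACS water spread:  {spread(gromacs_ns_day):.2f}%")     # ~0.43%
```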
Intel Open Image Denoise 2.2 (Images / Sec; More Is Better)
Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only | a: 3.49, b: 3.49, c: 3.49
Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only | a: 3.49, b: 3.49, c: 3.49
Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only | a: 1.68, b: 1.65, c: 1.67
ONNX Runtime 1.17 (Inferences Per Second; More Is Better)
Model: GPT-2 - Device: CPU - Executor: Parallel | a: 110.69, b: 111.54
Model: GPT-2 - Device: CPU - Executor: Standard | a: 159.72, b: 122.04, c: 122.15
Model: yolov4 - Device: CPU - Executor: Parallel | a: 6.36075, b: 6.37602
Model: yolov4 - Device: CPU - Executor: Standard | a: 7.03904, b: 7.04966, c: 7.07340
Model: T5 Encoder - Device: CPU - Executor: Parallel | a: 284.32, b: 281.89
Model: T5 Encoder - Device: CPU - Executor: Standard | a: 235.26, b: 237.89, c: 235.81
Model: bertsquad-12 - Device: CPU - Executor: Parallel | a: 8.72812, b: 8.73027
Model: bertsquad-12 - Device: CPU - Executor: Standard | a: 10.73, b: 10.66, c: 10.87
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel | a: 331.60, b: 314.57
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard | a: 446.93, b: 430.57, c: 479.84
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel | a: 0.880576, b: 0.881007
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard | a: 5.05401, b: 5.64928, c: 5.03581
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel | a: 15.58, b: 15.46
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard | a: 23.41, c: 24.03
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel | a: 101.63
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard | a: 147.31
Model: super-resolution-10 - Device: CPU - Executor: Parallel | a: 91.86
Model: super-resolution-10 - Device: CPU - Executor: Standard | a: 97.38
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel | a: 22.90
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard | a: 44.54
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
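The Parallel and Standard executor labels correspond to ONNX Runtime's parallel and sequential graph execution modes. The benchmark drives ONNX Runtime through its own harness; the sketch below only illustrates the same toggle via the Python API, and the model file, iteration count, and random input are placeholder assumptions rather than the settings used in these runs:

```python
# Sketch of "Standard" (sequential) vs "Parallel" execution modes in ONNX Runtime.
# The model path, iteration count, and random input are placeholders; they are not
# the configuration the Phoronix Test Suite onnx profile uses.
import time
import numpy as np
import onnxruntime as ort

def run(model_path: str, parallel: bool, iterations: int = 100) -> None:
    opts = ort.SessionOptions()
    opts.execution_mode = (ort.ExecutionMode.ORT_PARALLEL if parallel
                           else ort.ExecutionMode.ORT_SEQUENTIAL)
    sess = ort.InferenceSession(model_path, opts, providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Fill any dynamic dimensions with 1; assumes a float32 input model.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    data = np.random.rand(*shape).astype(np.float32)
    start = time.perf_counter()
    for _ in range(iterations):
        sess.run(None, {inp.name: data})
    elapsed = time.perf_counter() - start
    mode = "Parallel" if parallel else "Standard"
    print(f"{mode}: {iterations / elapsed:.2f} inferences/sec, "
          f"{1000 * elapsed / iterations:.3f} ms per inference")

run("super-resolution-10.onnx", parallel=False)
run("super-resolution-10.onnx", parallel=True)
```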
GROMACS 2024 (Ns Per Day; More Is Better)
Implementation: MPI CPU - Input: water_GMX50_bare | a: 24.06, b: 24.16, c: 24.07
1. (CXX) g++ options: -O3 -lm
NAMD 3.0b6 (ns/day; More Is Better)
Input: ATPase with 327,506 Atoms | a: 20.90, b: 20.89, c: 20.88
Input: STMV with 1,066,628 Atoms | a: 6.36963, b: 6.49427, c: 6.47040
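For context, ns/day converts directly into wall-clock time for a target trajectory length: using the run a figures above, 100 ns of the ATPase system would take roughly 100 / 20.90 ≈ 4.8 days, and 100 ns of STMV roughly 100 / 6.37 ≈ 15.7 days.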
ONNX Runtime 1.17 (Inference Time Cost in ms; Fewer Is Better)
Model: GPT-2 - Device: CPU - Executor: Parallel | a: 9.02332, b: 8.95470
Model: GPT-2 - Device: CPU - Executor: Standard | a: 6.25834, b: 8.19077, c: 8.18389
Model: yolov4 - Device: CPU - Executor: Parallel | a: 157.21, b: 156.83
Model: yolov4 - Device: CPU - Executor: Standard | a: 142.06, b: 141.85, c: 141.37
Model: T5 Encoder - Device: CPU - Executor: Parallel | a: 3.51487, b: 3.54536
Model: T5 Encoder - Device: CPU - Executor: Standard | a: 4.24980, b: 4.20289, c: 4.23992
Model: bertsquad-12 - Device: CPU - Executor: Parallel | a: 114.57, b: 114.54
Model: bertsquad-12 - Device: CPU - Executor: Standard | a: 93.21, b: 93.83, c: 91.95
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel | a: 3.01302, b: 3.17604
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard | a: 2.23695, b: 2.32191, c: 2.08344
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel | a: 1135.61, b: 1135.06
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard | a: 197.86, b: 177.01, c: 198.57
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel | a: 64.19, b: 64.69
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard | a: 42.71, c: 41.62
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel | a: 9.8368
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard | a: 6.78768
Model: super-resolution-10 - Device: CPU - Executor: Parallel | a: 10.88
Model: super-resolution-10 - Device: CPU - Executor: Standard | a: 10.27
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel | a: 43.66
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard | a: 22.45
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
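The inference time cost figures above are approximately the reciprocal of the inferences-per-second figures reported earlier; the two are recorded separately by the benchmark, so they do not match exactly. A quick Python check against three run a values copied from the results:

```python
# Check that inference time (ms) is roughly 1000 / (inferences per second).
# Values are copied from run "a" above; small mismatches are expected because
# the two metrics are reported separately by the benchmark.
pairs = {
    "GPT-2 - Standard":            (159.717, 6.25834),
    "CaffeNet 12-int8 - Standard": (446.93, 2.23695),
    "fcn-resnet101-11 - Standard": (5.05401, 197.859),
}
for name, (ips, ms) in pairs.items():
    print(f"{name}: 1000/{ips} = {1000.0 / ips:.3f} ms (reported: {ms} ms)")
```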
Phoronix Test Suite v10.8.5