onnx new

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Pop 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402035-NE-ONNXNEW6040&sor&grr.

onnx newProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerOpenGLVulkanCompilerFile-SystemScreen ResolutionabcdAMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)AMD Starship/Matisse4 x 32GB DDR4-3000MT/s CMK64GX4M2D3000C16Samsung SSD 970 EVO Plus 500GBAMD Radeon RX 5700 8GB (1750/875MHz)AMD Navi 10 HDMI AudioDELL P2415QIntel I211 + Intel Wi-Fi 6 AX200Pop 22.046.6.6-76060606-generic (x86_64)GNOME Shell 42.5X Server 1.21.1.44.6 Mesa 23.3.2-1pop0~1704238321~22.04~36f1d0e (LLVM 15.0.7 DRM 3.54)1.3.267GCC 11.4.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830107aPython Details- Python 3.10.12Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

onnx newonnx: CaffeNet 12-int8 - CPU - Parallelonnx: CaffeNet 12-int8 - CPU - Parallelonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardonnx: bertsquad-12 - CPU - Parallelonnx: bertsquad-12 - CPU - Parallelonnx: CaffeNet 12-int8 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Parallelonnx: GPT-2 - CPU - Parallelonnx: GPT-2 - CPU - Parallelonnx: GPT-2 - CPU - Standardonnx: GPT-2 - CPU - Standardonnx: yolov4 - CPU - Parallelonnx: yolov4 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Parallelonnx: T5 Encoder - CPU - Parallelonnx: T5 Encoder - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Parallelonnx: ResNet50 v1-12-int8 - CPU - Parallelonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Parallelonnx: super-resolution-10 - CPU - Parallelonnx: super-resolution-10 - CPU - Standardonnx: super-resolution-10 - CPU - Standardabcd5.0096199.46240.324124.796218.9884.566343.81235262.06546.755921.38521266.70.7894459.36878106.62611.758984.988292.5553.41805335.4132.98135166.9685.98899115.0288.69295133.7827.47456.2888158.94364.31315.54799.84382101.55320.729148.22638.38855119.15211.855984.325411.159989.59315.11408195.47939.097925.5728225.9564.425564.31444231.65447.586021.01241253.220.79815010.973991.038313.365474.7827302.6453.30419342.7982.91718170.3535.87005115.7718.63703136.6587.318597.56668132.11464.754315.441511.544886.589820.951547.71988.46928118.03711.864184.268611.269288.73645.06250197.53138.655225.8677227.6364.395224.46068223.99347.914720.86841267.890.78871312.260281.497015.335865.1860302.8243.30267342.9532.91581171.5495.82904116.4848.58440138.6647.211438.34118119.84764.877215.412813.138976.092421.007547.59118.54047117.04811.931083.796111.234189.00295.02241198.96539.320825.4426226.8204.408924.41593226.28047.950220.85341257.910.79527912.548479.628415.678363.7474307.3393.25386344.3902.90364172.0835.81121117.8518.48632139.5127.167658.62179115.97164.820815.426313.533673.874520.776548.12048.50747117.50411.894684.051311.243788.9273OpenBenchmarking.org

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Paralleladcb1.15072.30143.45214.60285.7535SE +/- 0.01436, N = 3SE +/- 0.03696, N = 15SE +/- 0.05402, N = 55.009605.022415.062505.114081. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Paralleladcb4080120160200SE +/- 0.56, N = 3SE +/- 1.45, N = 15SE +/- 2.06, N = 5199.46198.97197.53195.481. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardcbda918273645SE +/- 0.20, N = 3SE +/- 0.19, N = 3SE +/- 0.34, N = 938.6639.1039.3240.321. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardcbda612182430SE +/- 0.13, N = 3SE +/- 0.13, N = 3SE +/- 0.21, N = 925.8725.5725.4424.801. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelabdc50100150200250SE +/- 0.51, N = 3SE +/- 1.30, N = 3SE +/- 1.88, N = 9218.99225.96226.82227.641. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelabdc1.02742.05483.08224.10965.137SE +/- 0.01003, N = 3SE +/- 0.02513, N = 3SE +/- 0.03613, N = 94.566344.425564.408924.395221. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardabdc1.00372.00743.01114.01485.0185SE +/- 0.04592, N = 4SE +/- 0.01606, N = 3SE +/- 0.00933, N = 33.812354.314444.415934.460681. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standardabdc60120180240300SE +/- 2.43, N = 4SE +/- 0.82, N = 3SE +/- 0.47, N = 3262.07231.65226.28223.991. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelabcd1122334455SE +/- 0.06, N = 3SE +/- 0.12, N = 3SE +/- 0.20, N = 346.7647.5947.9147.951. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelabcd510152025SE +/- 0.02, N = 3SE +/- 0.05, N = 3SE +/- 0.08, N = 321.3921.0120.8720.851. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelbdac30060090012001500SE +/- 14.56, N = 3SE +/- 17.69, N = 3SE +/- 2.19, N = 31253.221257.911266.701267.891. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelbdac0.17960.35920.53880.71840.898SE +/- 0.009251, N = 3SE +/- 0.011220, N = 3SE +/- 0.001362, N = 30.7981500.7952790.7894450.7887131. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Parallelabcd3691215SE +/- 0.03121, N = 3SE +/- 0.02097, N = 3SE +/- 0.03777, N = 39.3687810.9739012.2602012.548401. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Parallelabcd20406080100SE +/- 0.26, N = 3SE +/- 0.14, N = 3SE +/- 0.24, N = 3106.6391.0481.5079.631. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardabcd48121620SE +/- 0.11, N = 3SE +/- 0.17, N = 3SE +/- 0.05, N = 311.7613.3715.3415.681. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardabcd20406080100SE +/- 0.62, N = 3SE +/- 0.72, N = 3SE +/- 0.21, N = 384.9974.7865.1963.751. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Parallelabcd70140210280350SE +/- 1.22, N = 3SE +/- 2.72, N = 3SE +/- 1.84, N = 3292.56302.65302.82307.341. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Parallelabcd0.76911.53822.30733.07643.8455SE +/- 0.01341, N = 3SE +/- 0.02977, N = 3SE +/- 0.01952, N = 33.418053.304193.302673.253861. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardabcd70140210280350SE +/- 1.06, N = 3SE +/- 0.21, N = 3SE +/- 0.50, N = 3335.41342.80342.95344.391. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardabcd0.67081.34162.01242.68323.354SE +/- 0.00899, N = 3SE +/- 0.00183, N = 3SE +/- 0.00422, N = 32.981352.917182.915812.903641. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardabcd4080120160200SE +/- 0.41, N = 3SE +/- 0.12, N = 3SE +/- 0.80, N = 3166.97170.35171.55172.081. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standardabcd1.34752.6954.04255.396.7375SE +/- 0.01409, N = 3SE +/- 0.00419, N = 3SE +/- 0.02691, N = 35.988995.870055.829045.811211. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabcd306090120150SE +/- 0.17, N = 3SE +/- 0.20, N = 3SE +/- 1.11, N = 3115.03115.77116.48117.851. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabcd246810SE +/- 0.01301, N = 3SE +/- 0.01460, N = 3SE +/- 0.07957, N = 38.692958.637038.584408.486321. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabcd306090120150SE +/- 1.33, N = 3SE +/- 0.38, N = 3SE +/- 0.50, N = 3133.78136.66138.66139.511. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabcd246810SE +/- 0.07142, N = 3SE +/- 0.01975, N = 3SE +/- 0.02570, N = 37.474507.318597.211437.167651. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Parallelabcd246810SE +/- 0.04827, N = 3SE +/- 0.00537, N = 3SE +/- 0.09288, N = 36.288807.566688.341188.621791. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Parallelabcd4080120160200SE +/- 0.84, N = 3SE +/- 0.08, N = 3SE +/- 1.26, N = 3158.94132.11119.85115.971. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardabdc1428425670SE +/- 0.09, N = 3SE +/- 0.19, N = 3SE +/- 0.12, N = 364.3164.7564.8264.881. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardabdc48121620SE +/- 0.02, N = 3SE +/- 0.05, N = 3SE +/- 0.03, N = 315.5515.4415.4315.411. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabcd3691215SE +/- 0.00918, N = 3SE +/- 0.03970, N = 3SE +/- 0.07065, N = 39.8438211.5448013.1389013.533601. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabcd20406080100SE +/- 0.07, N = 3SE +/- 0.23, N = 3SE +/- 0.39, N = 3101.5586.5976.0973.871. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Paralleladbc510152025SE +/- 0.04, N = 3SE +/- 0.11, N = 3SE +/- 0.09, N = 320.7320.7820.9521.011. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Paralleladbc1122334455SE +/- 0.10, N = 3SE +/- 0.26, N = 3SE +/- 0.21, N = 348.2348.1247.7247.591. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardabdc246810SE +/- 0.05653, N = 3SE +/- 0.03634, N = 3SE +/- 0.02479, N = 38.388558.469288.507478.540471. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardabdc306090120150SE +/- 0.78, N = 3SE +/- 0.50, N = 3SE +/- 0.34, N = 3119.15118.04117.50117.051. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabdc3691215SE +/- 0.03, N = 3SE +/- 0.02, N = 3SE +/- 0.00, N = 311.8611.8611.8911.931. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabdc20406080100SE +/- 0.23, N = 3SE +/- 0.11, N = 3SE +/- 0.03, N = 384.3384.2784.0583.801. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardacdb3691215SE +/- 0.04, N = 3SE +/- 0.04, N = 3SE +/- 0.10, N = 311.1611.2311.2411.271. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardacdb20406080100SE +/- 0.28, N = 3SE +/- 0.31, N = 3SE +/- 0.77, N = 389.5989.0088.9388.741. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.4