lll onnx

AMD Ryzen Threadripper 7980X 64-Cores testing with a ASUS Pro WS TRX50-SAGE WIFI (0607 BIOS) and NAVI32 16GB on Pop 22.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402037-NE-LLLONNX5586&sor&grt.

lll onnxProcessorMotherboardChipsetMemoryDiskGraphicsAudioMonitorNetworkOSKernelDesktopDisplay ServerOpenGLVulkanCompilerFile-SystemScreen ResolutionabcdAMD Ryzen Threadripper 7980X 64-Cores @ 8.21GHz (64 Cores / 128 Threads)ASUS Pro WS TRX50-SAGE WIFI (0607 BIOS)AMD Device 14a44 x 32GB DRAM-6400MT/s F5-6400R3239G32GQ1000GB Western Digital WDS100T1X0E-00AFY0NAVI32 16GB (2124/1218MHz)Realtek ALC1220DELL U2723QEAquantia Device 04c0 + Intel Device 125b + MEDIATEK Device 0616Pop 22.046.6.6-76060606-generic (x86_64)GNOME Shell 42.5X Server4.6 Mesa 23.3.2-1pop0~1704238321~22.04~36f1d0e (LLVM 15.0.7 DRM 3.54)1.3.267GCC 11.4.0ext43840x2160OpenBenchmarking.orgKernel Details- Transparent Huge Pages: madviseCompiler Details- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details- Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance) - CPU Microcode: 0xa108105 Python Details- Python 3.10.12Security Details- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected

lll onnxllamafile: llava-v1.5-7b-q4 - CPUllamafile: mistral-7b-instruct-v0.2.Q8_0 - CPUllamafile: wizardcoder-python-34b-v1.0.Q6_K - CPUcompress-lz4: 1 - Compression Speedcompress-lz4: 1 - Decompression Speedcompress-lz4: 3 - Compression Speedcompress-lz4: 3 - Decompression Speedcompress-lz4: 9 - Compression Speedcompress-lz4: 9 - Decompression Speedonnx: GPT-2 - CPU - Parallelonnx: GPT-2 - CPU - Parallelonnx: GPT-2 - CPU - Standardonnx: GPT-2 - CPU - Standardonnx: yolov4 - CPU - Parallelonnx: yolov4 - CPU - Parallelonnx: yolov4 - CPU - Standardonnx: yolov4 - CPU - Standardonnx: T5 Encoder - CPU - Parallelonnx: T5 Encoder - CPU - Parallelonnx: T5 Encoder - CPU - Standardonnx: T5 Encoder - CPU - Standardonnx: bertsquad-12 - CPU - Parallelonnx: bertsquad-12 - CPU - Parallelonnx: bertsquad-12 - CPU - Standardonnx: bertsquad-12 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Parallelonnx: CaffeNet 12-int8 - CPU - Parallelonnx: CaffeNet 12-int8 - CPU - Standardonnx: CaffeNet 12-int8 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Parallelonnx: fcn-resnet101-11 - CPU - Standardonnx: fcn-resnet101-11 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Parallelonnx: ArcFace ResNet-100 - CPU - Standardonnx: ArcFace ResNet-100 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Parallelonnx: ResNet50 v1-12-int8 - CPU - Parallelonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: ResNet50 v1-12-int8 - CPU - Standardonnx: super-resolution-10 - CPU - Parallelonnx: super-resolution-10 - CPU - Parallelonnx: super-resolution-10 - CPU - Standardonnx: super-resolution-10 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: Faster R-CNN R-50-FPN-int8 - CPU - Parallelonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardonnx: Faster R-CNN R-50-FPN-int8 - CPU - Standardabcd2716.85.98994.46415.8151.285718.149.76029.7178.5335.59535161.476.1911811.373187.923511.136389.7946394.9652.53056251.6173.9736513.403974.602916.821959.4439752.4751.32744768.4871.300873.33783299.5945.03178198.73535.288628.336642.407123.5796263.063.80037302.6913.30338140.3187.12533143.6396.9615935.672328.03146.481621.512327.1216.765.97988.756383.7152.155737.449.595995.6178.9665.58189162.1696.1645311.381287.874210.740993.1459397.5482.51439250.6213.9896013.145676.069815.960262.6829771.5141.29988702.1441.424033.42174292.3515.39325187.88834.324229.141238.514825.9625267.1163.74911312.2233.21958140.1717.13343146.3046.8350736.599927.329449.037320.599326.9316.725.97991.616396.8156.435884.550.156088.1178.8045.58651163.1016.1346811.206889.238310.850792.2031394.7642.53183249.1444.0144413.294675.224915.848763.0958746.4841.33991694.3071.440093.44674290.2205.80848174.83234.201229.237638.908425.6994261.1943.82755300.1623.3311138.4557.22117144.6616.9124137.217826.867146.573921.4727.2316.825.97992.066398.3149.86563349.936019.3179.9075.55278159.6656.2608711.657485.779411.138589.7763397.9452.5119248.9964.0155712.780478.242215.928662.7791738.6471.35233788.871.26733.43087291.4694.79144208.70333.970629.436138.768725.7921252.8423.95402298.8083.34628138.377.22558145.8626.8554737.047926.990346.654521.4325OpenBenchmarking.org

Llamafile

Test: llava-v1.5-7b-q4 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: llava-v1.5-7b-q4 - Acceleration: CPUdbac612182430SE +/- 0.04, N = 327.2327.1227.0026.93

Llamafile

Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPUdabc48121620SE +/- 0.04, N = 316.8216.8016.7616.72

Llamafile

Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU

OpenBenchmarking.orgTokens Per Second, More Is BetterLlamafile 0.6Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPUadcb1.34552.6914.03655.3826.7275SE +/- 0.00, N = 35.985.975.975.97

LZ4 Compression

Compression Level: 1 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 1 - Compression Speedadcb2004006008001000SE +/- 1.01, N = 3SE +/- 1.22, N = 3994.40992.06991.61988.751. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 1 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 1 - Decompression Speedadcb14002800420056007000SE +/- 3.17, N = 3SE +/- 8.24, N = 36415.86398.36396.86383.71. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 3 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 3 - Compression Speedcbad306090120150SE +/- 1.40, N = 3SE +/- 0.29, N = 3156.43152.15151.28149.861. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 3 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 3 - Decompression Speedcbad13002600390052006500SE +/- 48.06, N = 3SE +/- 8.21, N = 35884.55737.45718.15633.01. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 9 - Compression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 9 - Compression Speedcdab1122334455SE +/- 0.05, N = 3SE +/- 0.10, N = 350.1549.9349.7049.591. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 9 - Decompression Speed

OpenBenchmarking.orgMB/s, More Is BetterLZ4 Compression 1.9.4Compression Level: 9 - Decompression Speedcadb13002600390052006500SE +/- 53.86, N = 3SE +/- 25.15, N = 36088.16029.76019.35995.61. (CC) gcc options: -O3

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Paralleldbca4080120160200SE +/- 0.92, N = 3SE +/- 0.37, N = 3179.91178.97178.80178.531. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Paralleldbca1.2592.5183.7775.0366.295SE +/- 0.02848, N = 3SE +/- 0.01172, N = 35.552785.581895.586515.595351. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardcbad4080120160200SE +/- 1.38, N = 15SE +/- 0.40, N = 3163.10162.17161.47159.671. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: GPT-2 - Device: CPU - Executor: Standardcbad246810SE +/- 0.05038, N = 15SE +/- 0.01518, N = 36.134686.164536.191186.260871. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Paralleldbac3691215SE +/- 0.10, N = 3SE +/- 0.08, N = 311.6611.3811.3711.211. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Paralleldbac20406080100SE +/- 0.75, N = 3SE +/- 0.67, N = 385.7887.8787.9289.241. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standarddacb3691215SE +/- 0.12, N = 5SE +/- 0.10, N = 711.1411.1410.8510.741. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: yolov4 - Device: CPU - Executor: Standarddacb20406080100SE +/- 1.02, N = 5SE +/- 0.84, N = 789.7889.7992.2093.151. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Paralleldbac90180270360450SE +/- 1.85, N = 3SE +/- 0.91, N = 3397.95397.55394.97394.761. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Paralleldbac0.56971.13941.70912.27882.8485SE +/- 0.01182, N = 3SE +/- 0.00584, N = 32.511902.514392.530562.531831. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabcd50100150200250SE +/- 1.13, N = 3SE +/- 3.25, N = 3251.62250.62249.14249.001. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: T5 Encoder - Device: CPU - Executor: Standardabcd0.90351.8072.71053.6144.5175SE +/- 0.01804, N = 3SE +/- 0.05232, N = 33.973653.989604.014444.015571. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelacbd3691215SE +/- 0.10, N = 3SE +/- 0.03, N = 313.4013.2913.1512.781. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Parallelacbd20406080100SE +/- 0.58, N = 3SE +/- 0.20, N = 374.6075.2276.0778.241. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabdc48121620SE +/- 0.14, N = 7SE +/- 0.03, N = 316.8215.9615.9315.851. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: bertsquad-12 - Device: CPU - Executor: Standardabdc1428425670SE +/- 0.53, N = 7SE +/- 0.10, N = 359.4462.6862.7863.101. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallelbacd170340510680850SE +/- 13.34, N = 15SE +/- 7.62, N = 15771.51752.48746.48738.651. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallelbacd0.30430.60860.91291.21721.5215SE +/- 0.02174, N = 15SE +/- 0.01282, N = 151.299881.327441.339911.352331. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standarddabc2004006008001000SE +/- 5.52, N = 3SE +/- 5.80, N = 3788.87768.49702.14694.311. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: CaffeNet 12-int8 - Device: CPU - Executor: Standarddabc0.3240.6480.9721.2961.62SE +/- 0.01127, N = 3SE +/- 0.01205, N = 31.267301.300871.424031.440091. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelcdba0.77551.5512.32653.1023.8775SE +/- 0.04332, N = 3SE +/- 0.04541, N = 33.446743.430873.421743.337831. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Parallelcdba70140210280350SE +/- 3.69, N = 3SE +/- 3.93, N = 3290.22291.47292.35299.591. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardcbad1.30692.61383.92075.22766.5345SE +/- 0.20779, N = 12SE +/- 0.17026, N = 155.808485.393255.031784.791441. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: fcn-resnet101-11 - Device: CPU - Executor: Standardcbad50100150200250SE +/- 6.80, N = 12SE +/- 5.60, N = 15174.83187.89198.74208.701. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabcd816243240SE +/- 0.41, N = 335.2934.3234.2033.971. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallelabcd714212835SE +/- 0.35, N = 328.3429.1429.2429.441. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardacdb1020304050SE +/- 0.08, N = 342.4138.9138.7738.511. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ArcFace ResNet-100 - Device: CPU - Executor: Standardacdb612182430SE +/- 0.05, N = 323.5825.7025.7925.961. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallelbacd60120180240300SE +/- 2.98, N = 15267.12263.06261.19252.841. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallelbacd0.88971.77942.66913.55884.4485SE +/- 0.04174, N = 153.749113.800373.827553.954021. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardbacd70140210280350SE +/- 6.34, N = 15312.22302.69300.16298.811. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standardbacd0.75291.50582.25873.01163.7645SE +/- 0.06060, N = 153.219583.303383.331103.346281. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabcd306090120150SE +/- 0.93, N = 3140.32140.17138.46138.371. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Parallelabcd246810SE +/- 0.04722, N = 37.125337.133437.221177.225581. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardbdca306090120150SE +/- 0.70, N = 3146.30145.86144.66143.641. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: super-resolution-10 - Device: CPU - Executor: Standardbdca246810SE +/- 0.03257, N = 36.835076.855476.912416.961591. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelcdba918273645SE +/- 0.46, N = 337.2237.0536.6035.671. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallelcdba714212835SE +/- 0.35, N = 326.8726.9927.3328.031. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInferences Per Second, More Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardbdca1122334455SE +/- 1.42, N = 1549.0446.6546.5746.481. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

OpenBenchmarking.orgInference Time Cost (ms), Fewer Is BetterONNX Runtime 1.17Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standardbdca510152025SE +/- 0.51, N = 1520.6021.4321.4721.511. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.4