ldld: AMD Ryzen 7 PRO 5850U testing with a LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS) and AMD Radeon Vega / Mobile 1GB graphics on Fedora Linux 39 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402198-NE-LDLD6476438&grs&rdt .
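To compare another system against these numbers, an OpenBenchmarking.org result file can be replayed by its identifier (the ID in the URL above) with the Phoronix Test Suite; assuming a standard installation, the usual invocation is:

    phoronix-test-suite benchmark 2402198-NE-LDLD6476438

This downloads the result file, offers to install the contained test profiles, and appends the local run to the result for side-by-side comparison.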
ldld - System Details

Runs a, b, and c used identical hardware and software:

  Processor:         AMD Ryzen 7 PRO 5850U @ 4.51GHz (8 Cores / 16 Threads)
  Motherboard:       LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS)
  Chipset:           AMD Renoir/Cezanne
  Memory:            2 x 16GB LPDDR4-4266MT/s Micron MT53E2G32D4NQ-046
  Disk:              1024GB SAMSUNG MZVLB1T0HBLR-000L7
  Graphics:          AMD Radeon Vega / Mobile 1GB
  Audio:             AMD Renoir Radeon HD Audio
  Network:           Realtek RTL8111/8168/8411 + MEDIATEK MT7921 802.11ax PCI
  OS:                Fedora Linux 39
  Kernel:            6.5.8-300.fc39.x86_64 (x86_64)
  Desktop:           GNOME Shell 45.0
  Display Server:    X Server + Wayland
  OpenGL:            4.6 Mesa 23.2.1 (LLVM 16.0.6 DRM 3.54)
  Compiler:          GCC 13.2.1 20230918
  File-System:       btrfs
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-multilib --enable-offload-defaulted --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-build-config=bootstrap-lto --with-gcc-major-version-only --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: performance) - Platform Profile: balanced - CPU Microcode: 0xa50000d - ACPI Profile: balanced

Graphics Details: BAR1 / Visible vRAM Size: 1024 MB

Python Details: Python 3.12.0

Security Details: SELinux enabled
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Mitigation of safe RET no microcode
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
ldld - Result Overview (values for runs a / b / c)

Test                                                                          a          b          c
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (inf/s)                  21.6643    17.7654    17.6607
compress-lz4: 3 - Decompression Speed (MB/s)                              4203.1     3836.5     3895.6
dav1d: Chimera 1080p 10-bit (FPS)                                          306.07     279.41     284.84
compress-lz4: 1 - Decompression Speed (MB/s)                              4566.3     4200.9     4218.9
compress-lz4: 1 - Compression Speed (MB/s)                                 779.42     724.19     718.46
compress-lz4: 3 - Compression Speed (MB/s)                                 119.47     110.84     110.79
dav1d: Chimera 1080p (FPS)                                                 373.16     346.73     348.76
onnx: super-resolution-10 - CPU - Standard (inf/s)                          57.3281    54.1641    53.9127
compress-lz4: 9 - Decompression Speed (MB/s)                              4258.6     4006.1     4027.6
dav1d: Summer Nature 1080p (FPS)                                           529.59     498.53     499.42
onnx: CaffeNet 12-int8 - CPU - Standard (inf/s)                            233.2      223.242    221.478
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only (images/s)                         0.21       0.20       0.20
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only (images/s)                         0.21       0.20       0.20
compress-lz4: 9 - Compression Speed (MB/s)                                  39.06      37.65      37.32
onnx: bertsquad-12 - CPU - Standard (inf/s)                                  7.80252    7.46483    7.48038
onnx: yolov4 - CPU - Standard (inf/s)                                        4.72981    4.52820    4.53743
onnx: ResNet50 v1-12-int8 - CPU - Standard (inf/s)                          97.1324    93.2427    94.0366
dav1d: Summer Nature 4K (FPS)                                              124.8      119.84     120.36
namd: STMV with 1,066,628 Atoms (ns/day)                                     0.09658    0.09286    0.09314
onnx: fcn-resnet101-11 - CPU - Standard (inf/s)                              0.784697   0.754855   0.755065
namd: ATPase with 327,506 Atoms (ns/day)                                     0.32043    0.30829    0.30993
vkfft: FFT + iFFT R2C / C2R (score)                                       3267       3346       3392
vkfft: FFT + iFFT C2C 1D batched in double precision (score)              2683       2610       2671
vkfft: FFT + iFFT C2C Bluestein in single precision (score)               1540       1556       1579
gromacs: MPI CPU - water_GMX50_bare (ns/day)                                 0.595      0.583      0.585
vkfft: FFT + iFFT C2C multidimensional in single precision (score)        3308       3344       3368
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision (score)      912        926        928
onnx: ArcFace ResNet-100 - CPU - Standard (inf/s)                           17.5391    17.3014    17.2633
onnx: GPT-2 - CPU - Standard (inf/s)                                        73.9314    73.1053    72.8414
vkfft: FFT + iFFT C2C 1D batched in half precision (score)               11894      12032      12045
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling      6453       6525       6465
vkfft: FFT + iFFT C2C 1D batched in single precision (score)              6126       6078       6141
onnx: T5 Encoder - CPU - Standard (inf/s)                                   89.0517    88.7391    88.7159
oidn: RTLightmap.hdr.4096x4096 - CPU-Only (images/s)                         0.10       0.10       0.10
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (ms)                      46.1547    56.3668    56.6173
onnx: super-resolution-10 - CPU - Standard (ms)                             17.4409    18.4601    18.546
onnx: ResNet50 v1-12-int8 - CPU - Standard (ms)                             10.2927    10.7225    10.6316
onnx: ArcFace ResNet-100 - CPU - Standard (ms)                              57.0123    57.7965    57.9229
onnx: fcn-resnet101-11 - CPU - Standard (ms)                              1274.37    1324.75    1324.38
onnx: CaffeNet 12-int8 - CPU - Standard (ms)                                 4.28653    4.47781    4.51337
onnx: bertsquad-12 - CPU - Standard (ms)                                   128.158    133.956    133.677
onnx: T5 Encoder - CPU - Standard (ms)                                      11.2255    11.2655    11.2683
onnx: yolov4 - CPU - Standard (ms)                                         211.42     220.834    220.384
onnx: GPT-2 - CPU - Standard (ms)                                           13.517     13.6698    13.7188

Note: each ONNX test appears twice, first as throughput (inferences/sec, higher is better) and again near the bottom as inference time cost (ms, lower is better).
ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.19, N = 15)
  a: 21.66   b: 17.77   c: 17.66
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
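As a minimal sketch of what such a throughput figure measures, the loop below times repeated CPU inference with the onnxruntime Python API. This is not the suite's own harness; the model filename and the input shape are placeholders for illustration.

    import time
    import numpy as np
    import onnxruntime as ort  # pip install onnxruntime

    # Placeholder model path; the suite runs the quantized Faster R-CNN R-50-FPN model.
    sess = ort.InferenceSession("faster_rcnn_r50_fpn_int8.onnx",
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Input shape is an assumption (a single 3-channel image tensor).
    x = np.random.rand(3, 800, 800).astype(np.float32)

    n = 30
    t0 = time.perf_counter()
    for _ in range(n):
        sess.run(None, {inp.name: x})  # run all outputs for one input feed
    dt = time.perf_counter() - t0
    print(f"{n / dt:.2f} inferences/sec ({1000 * dt / n:.2f} ms per inference)")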
LZ4 Compression 1.9.4 - Compression Level: 3 - Decompression Speed
MB/s, more is better (SE +/- 19.26, N = 3)
  a: 4203.1   b: 3836.5   c: 3895.6
1. (CC) gcc options: -O3
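The suite benchmarks the reference C implementation of LZ4 in memory. Purely to illustrate what "level 3 decompression speed" means, here is a rough sketch using the python-lz4 binding; the corpus filename is a placeholder, and a real benchmark would loop and average rather than time a single pass:

    import time
    import lz4.frame  # pip install lz4

    data = open("corpus.bin", "rb").read()  # placeholder test corpus
    blob = lz4.frame.compress(data, compression_level=3)

    t0 = time.perf_counter()
    out = lz4.frame.decompress(blob)
    dt = time.perf_counter() - t0
    assert out == data
    print(f"ratio {len(data) / len(blob):.2f}, "
          f"{len(out) / dt / 1e6:.1f} MB/s decompression")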
dav1d 1.4 - Video Input: Chimera 1080p 10-bit
FPS, more is better (SE +/- 2.30, N = 15)
  a: 306.07   b: 279.41   c: 284.84
1. (CC) gcc options: -pthread
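A comparable single-stream measurement can be taken with dav1d's own command-line decoder, which reports the achieved frame rate when the decoded output is discarded; the input filename below is a placeholder:

    dav1d -i chimera_1080p_10bit.ivf -o /dev/null --muxer null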
LZ4 Compression 1.9.4 - Compression Level: 1 - Decompression Speed
MB/s, more is better (SE +/- 39.91, N = 3)
  a: 4566.3   b: 4200.9   c: 4218.9
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4 - Compression Level: 1 - Compression Speed
MB/s, more is better (SE +/- 6.53, N = 3)
  a: 779.42   b: 724.19   c: 718.46
1. (CC) gcc options: -O3

LZ4 Compression 1.9.4 - Compression Level: 3 - Compression Speed
MB/s, more is better (SE +/- 0.32, N = 3)
  a: 119.47   b: 110.84   c: 110.79
1. (CC) gcc options: -O3

dav1d 1.4 - Video Input: Chimera 1080p
FPS, more is better (SE +/- 2.21, N = 3)
  a: 373.16   b: 346.73   c: 348.76
1. (CC) gcc options: -pthread

ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.17, N = 3)
  a: 57.33   b: 54.16   c: 53.91
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LZ4 Compression 1.9.4 - Compression Level: 9 - Decompression Speed
MB/s, more is better (SE +/- 6.24, N = 3)
  a: 4258.6   b: 4006.1   c: 4027.6
1. (CC) gcc options: -O3

dav1d 1.4 - Video Input: Summer Nature 1080p
FPS, more is better (SE +/- 4.79, N = 6)
  a: 529.59   b: 498.53   c: 499.42
1. (CC) gcc options: -pthread

ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.80, N = 3)
  a: 233.20   b: 223.24   c: 221.48
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Intel Open Image Denoise 2.2 - Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, more is better (SE +/- 0.00, N = 3)
  a: 0.21   b: 0.20   c: 0.20

Intel Open Image Denoise 2.2 - Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only
Images / Sec, more is better (SE +/- 0.00, N = 3)
  a: 0.21   b: 0.20   c: 0.20
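For context on the Open Image Denoise run names: "RT" is the library's ray-tracing denoising filter, "ldr"/"hdr" indicate the dynamic range of the input color image, "alb_nrm" means auxiliary albedo and normal buffers are supplied alongside it, and the trailing numbers are the image resolution. These runs use the CPU-only device, where throughput at 4K is well under one image per second.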
LZ4 Compression 1.9.4 - Compression Level: 9 - Compression Speed
MB/s, more is better (SE +/- 0.11, N = 3)
  a: 39.06   b: 37.65   c: 37.32
1. (CC) gcc options: -O3

ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.00345, N = 3)
  a: 7.80252   b: 7.46483   c: 7.48038
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.00718, N = 3)
  a: 4.72981   b: 4.52820   c: 4.53743
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.21, N = 3)
  a: 97.13   b: 93.24   c: 94.04
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

dav1d 1.4 - Video Input: Summer Nature 4K
FPS, more is better (SE +/- 0.11, N = 3)
  a: 124.80   b: 119.84   c: 120.36
1. (CC) gcc options: -pthread

NAMD 3.0b6 - Input: STMV with 1,066,628 Atoms
ns/day, more is better (SE +/- 0.00005, N = 3)
  a: 0.09658   b: 0.09286   c: 0.09314
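NAMD reports ns/day, the span of simulated trajectory produced per day of wall-clock time, so higher is better; at roughly 0.097 ns/day, the 1.07-million-atom STMV system needs about ten days of wall time per simulated nanosecond on this laptop. A typical multicore invocation of the NAMD 3 binary looks like the following, where the configuration filename is a placeholder:

    namd3 +p16 stmv.namd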
ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.000621, N = 3)
  a: 0.784697   b: 0.754855   c: 0.755065
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

NAMD 3.0b6 - Input: ATPase with 327,506 Atoms
ns/day, more is better (SE +/- 0.00004, N = 3)
  a: 0.32043   b: 0.30829   c: 0.30993

VkFFT 1.3.4 - Test: FFT + iFFT R2C / C2R
Benchmark Score, more is better (SE +/- 2.65, N = 3)
  a: 3267   b: 3346   c: 3392
1. (CXX) g++ options: -O3
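VkFFT is a GPU FFT library, and the Phoronix VkFFT profile runs its built-in benchmark through the Vulkan backend, so these "Benchmark Score" figures presumably reflect the integrated Radeon Vega GPU rather than the Zen 3 cores; the score is VkFFT's own aggregate throughput metric, higher being better.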
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in double precision
Benchmark Score, more is better (SE +/- 25.37, N = 6)
  a: 2683   b: 2610   c: 2671
1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein in single precision
Benchmark Score, more is better (SE +/- 10.48, N = 3)
  a: 1540   b: 1556   c: 1579
1. (CXX) g++ options: -O3

GROMACS 2024 - Implementation: MPI CPU - Input: water_GMX50_bare
ns/day, more is better (SE +/- 0.001, N = 3)
  a: 0.595   b: 0.583   c: 0.585
1. (CXX) g++ options: -O3 -lm
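GROMACS reports the same ns/day metric as NAMD; water_GMX50_bare is a standard water-box input used for CPU scaling comparisons. Assuming a prepared portable run input (.tpr, filename here a placeholder), the benchmark boils down to an invocation along the lines of:

    gmx mdrun -nt 16 -s water.tpr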
VkFFT 1.3.4 - Test: FFT + iFFT C2C multidimensional in single precision
Benchmark Score, more is better (SE +/- 11.68, N = 3)
  a: 3308   b: 3344   c: 3368
1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein benchmark in double precision
Benchmark Score, more is better (SE +/- 7.54, N = 3)
  a: 912   b: 926   c: 928
1. (CXX) g++ options: -O3

ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.05, N = 3)
  a: 17.54   b: 17.30   c: 17.26
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.04, N = 3)
  a: 73.93   b: 73.11   c: 72.84
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in half precision
Benchmark Score, more is better (SE +/- 36.70, N = 3)
  a: 11894   b: 12032   c: 12045
1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling
Benchmark Score, more is better (SE +/- 17.74, N = 3)
  a: 6453   b: 6525   c: 6465
1. (CXX) g++ options: -O3

VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision
Benchmark Score, more is better (SE +/- 73.26, N = 3)
  a: 6126   b: 6078   c: 6141
1. (CXX) g++ options: -O3

ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second, more is better (SE +/- 0.15, N = 3)
  a: 89.05   b: 88.74   c: 88.72
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Intel Open Image Denoise 2.2 - Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only
Images / Sec, more is better (SE +/- 0.00, N = 3)
  a: 0.10   b: 0.10   c: 0.10

ONNX Runtime 1.17 - Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.55, N = 15)
  a: 46.15   b: 56.37   c: 56.62
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
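The inference-time graphs here and below are the reciprocal view of the throughput graphs above: time per inference in ms is approximately 1000 divided by inferences per second. For run a of Faster R-CNN, 1000 / 21.66 ≈ 46.2 ms, matching the 46.15 ms reported, and the lower throughput of runs b and c likewise inverts into their higher latencies.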
ONNX Runtime 1.17 - Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.06, N = 3)
  a: 17.44   b: 18.46   c: 18.55
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.02, N = 3)
  a: 10.29   b: 10.72   c: 10.63
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.16, N = 3)
  a: 57.01   b: 57.80   c: 57.92
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 1.09, N = 3)
  a: 1274.37   b: 1324.75   c: 1324.38
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.01608, N = 3)
  a: 4.28653   b: 4.47781   c: 4.51337
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: bertsquad-12 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.06, N = 3)
  a: 128.16   b: 133.96   c: 133.68
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.02, N = 3)
  a: 11.23   b: 11.27   c: 11.27
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.35, N = 3)
  a: 211.42   b: 220.83   c: 220.38
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.17 - Model: GPT-2 - Device: CPU - Executor: Standard
Inference Time Cost (ms), fewer is better (SE +/- 0.01, N = 3)
  a: 13.52   b: 13.67   c: 13.72
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Phoronix Test Suite v10.8.5