ldld
AMD Ryzen 7 PRO 5850U testing with a LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS) and AMD Radeon Vega / Mobile 1GB on Fedora Linux 39 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402198-NE-LDLD6476438&gru .
System details (one configuration listed for runs a, b, c)
Processor: AMD Ryzen 7 PRO 5850U @ 4.51GHz (8 Cores / 16 Threads)
Motherboard: LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS)
Chipset: AMD Renoir/Cezanne
Memory: 2 x 16GB LPDDR4-4266MT/s Micron MT53E2G32D4NQ-046
Disk: 1024GB SAMSUNG MZVLB1T0HBLR-000L7
Graphics: AMD Radeon Vega / Mobile 1GB
Audio: AMD Renoir Radeon HD Audio
Network: Realtek RTL8111/8168/8411 + MEDIATEK MT7921 802.11ax PCI
OS: Fedora Linux 39
Kernel: 6.5.8-300.fc39.x86_64 (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 23.2.1 (LLVM 16.0.6 DRM 3.54)
Compiler: GCC 13.2.1 20230918
File-System: btrfs
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-multilib --enable-offload-defaulted --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-build-config=bootstrap-lto --with-gcc-major-version-only --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: performance)
- Platform Profile: balanced
- CPU Microcode: 0xa50000d
- ACPI Profile: balanced

Graphics Details
- BAR1 / Visible vRAM Size: 1024 MB

Python Details
- Python 3.12.0

Security Details
- SELinux
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected
Result overview
Test | Unit | a | b | c
vkfft: FFT + iFFT R2C / C2R | Benchmark Score | 3267 | 3346 | 3392
vkfft: FFT + iFFT C2C 1D batched in half precision | Benchmark Score | 11894 | 12032 | 12045
vkfft: FFT + iFFT C2C Bluestein in single precision | Benchmark Score | 1540 | 1556 | 1579
vkfft: FFT + iFFT C2C 1D batched in double precision | Benchmark Score | 2683 | 2610 | 2671
vkfft: FFT + iFFT C2C 1D batched in single precision | Benchmark Score | 6126 | 6078 | 6141
vkfft: FFT + iFFT C2C multidimensional in single precision | Benchmark Score | 3308 | 3344 | 3368
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | Benchmark Score | 912 | 926 | 928
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | Benchmark Score | 6453 | 6525 | 6465
dav1d: Chimera 1080p | FPS | 373.16 | 346.73 | 348.76
dav1d: Summer Nature 4K | FPS | 124.8 | 119.84 | 120.36
dav1d: Summer Nature 1080p | FPS | 529.59 | 498.53 | 499.42
dav1d: Chimera 1080p 10-bit | FPS | 306.07 | 279.41 | 284.84
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | Images / Sec | 0.21 | 0.20 | 0.20
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | Images / Sec | 0.21 | 0.20 | 0.20
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | Images / Sec | 0.10 | 0.10 | 0.10
onnx: GPT-2 - CPU - Standard | Inferences Per Second | 73.9314 | 73.1053 | 72.8414
onnx: yolov4 - CPU - Standard | Inferences Per Second | 4.72981 | 4.52820 | 4.53743
onnx: T5 Encoder - CPU - Standard | Inferences Per Second | 89.0517 | 88.7391 | 88.7159
onnx: bertsquad-12 - CPU - Standard | Inferences Per Second | 7.80252 | 7.46483 | 7.48038
onnx: CaffeNet 12-int8 - CPU - Standard | Inferences Per Second | 233.2 | 223.242 | 221.478
onnx: fcn-resnet101-11 - CPU - Standard | Inferences Per Second | 0.784697 | 0.754855 | 0.755065
onnx: ArcFace ResNet-100 - CPU - Standard | Inferences Per Second | 17.5391 | 17.3014 | 17.2633
onnx: ResNet50 v1-12-int8 - CPU - Standard | Inferences Per Second | 97.1324 | 93.2427 | 94.0366
onnx: super-resolution-10 - CPU - Standard | Inferences Per Second | 57.3281 | 54.1641 | 53.9127
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | Inferences Per Second | 21.6643 | 17.7654 | 17.6607
compress-lz4: 1 - Compression Speed | MB/s | 779.42 | 724.19 | 718.46
compress-lz4: 1 - Decompression Speed | MB/s | 4566.3 | 4200.9 | 4218.9
compress-lz4: 3 - Compression Speed | MB/s | 119.47 | 110.84 | 110.79
compress-lz4: 3 - Decompression Speed | MB/s | 4203.1 | 3836.5 | 3895.6
compress-lz4: 9 - Compression Speed | MB/s | 39.06 | 37.65 | 37.32
compress-lz4: 9 - Decompression Speed | MB/s | 4258.6 | 4006.1 | 4027.6
gromacs: MPI CPU - water_GMX50_bare | Ns Per Day | 0.595 | 0.583 | 0.585
namd: ATPase with 327,506 Atoms | ns/day | 0.32043 | 0.30829 | 0.30993
namd: STMV with 1,066,628 Atoms | ns/day | 0.09658 | 0.09286 | 0.09314
onnx: GPT-2 - CPU - Standard | Inference Time Cost (ms) | 13.517 | 13.6698 | 13.7188
onnx: yolov4 - CPU - Standard | Inference Time Cost (ms) | 211.42 | 220.834 | 220.384
onnx: T5 Encoder - CPU - Standard | Inference Time Cost (ms) | 11.2255 | 11.2655 | 11.2683
onnx: bertsquad-12 - CPU - Standard | Inference Time Cost (ms) | 128.158 | 133.956 | 133.677
onnx: CaffeNet 12-int8 - CPU - Standard | Inference Time Cost (ms) | 4.28653 | 4.47781 | 4.51337
onnx: fcn-resnet101-11 - CPU - Standard | Inference Time Cost (ms) | 1274.37 | 1324.75 | 1324.38
onnx: ArcFace ResNet-100 - CPU - Standard | Inference Time Cost (ms) | 57.0123 | 57.7965 | 57.9229
onnx: ResNet50 v1-12-int8 - CPU - Standard | Inference Time Cost (ms) | 10.2927 | 10.7225 | 10.6316
onnx: super-resolution-10 - CPU - Standard | Inference Time Cost (ms) | 17.4409 | 18.4601 | 18.546
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard | Inference Time Cost (ms) | 46.1547 | 56.3668 | 56.6173
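The overview lists three runs per test without stating how far apart they are. As a rough aid, the following Python sketch computes the percentage change of runs b and c relative to run a for a few rows copied from the overview. The choice of run a as the baseline and the helper names (`percent_change`, `summarize`) are assumptions made for illustration; they are not part of the Phoronix Test Suite output.

```python
# A handful of (a, b, c) values copied from the result overview.
# All four rows are "more is better" metrics.
results = {
    "dav1d: Chimera 1080p (FPS)": (373.16, 346.73, 348.76),
    "compress-lz4: 1 - Compression Speed (MB/s)": (779.42, 724.19, 718.46),
    "onnx: Faster R-CNN R-50-FPN-int8 (inferences/s)": (21.6643, 17.7654, 17.6607),
    "vkfft: FFT + iFFT R2C / C2R (score)": (3267, 3346, 3392),
}


def percent_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize(table):
    """Map each test name to (change of b vs a, change of c vs a)."""
    return {
        name: (percent_change(a, b), percent_change(a, c))
        for name, (a, b, c) in table.items()
    }


# Print one line per test; run a is the (assumed) baseline.
for name, (delta_b, delta_c) in summarize(results).items():
    print(f"{name}: b {delta_b:+.1f}%, c {delta_c:+.1f}%")
```

For "fewer is better" rows such as the ONNX inference time cost, the sign has to be read the other way round: a positive change there means a slower run.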
VkFFT 1.3.4
Benchmark Score, More Is Better. (CXX) g++ options: -O3
Test | a | b | c | SE +/-, N (as reported)
FFT + iFFT R2C / C2R | 3267 | 3346 | 3392 | SE +/- 2.65, N = 3
FFT + iFFT C2C 1D batched in half precision | 11894 | 12032 | 12045 | SE +/- 36.70, N = 3
FFT + iFFT C2C Bluestein in single precision | 1540 | 1556 | 1579 | SE +/- 10.48, N = 3
FFT + iFFT C2C 1D batched in double precision | 2683 | 2610 | 2671 | SE +/- 25.37, N = 6
FFT + iFFT C2C 1D batched in single precision | 6126 | 6078 | 6141 | SE +/- 73.26, N = 3
FFT + iFFT C2C multidimensional in single precision | 3308 | 3344 | 3368 | SE +/- 11.68, N = 3
FFT + iFFT C2C Bluestein benchmark in double precision | 912 | 926 | 928 | SE +/- 7.54, N = 3
FFT + iFFT C2C 1D batched in single precision, no reshuffling | 6453 | 6525 | 6465 | SE +/- 17.74, N = 3
dav1d 1.4
FPS, More Is Better. (CC) gcc options: -pthread
Video Input | a | b | c | SE +/-, N (as reported)
Chimera 1080p | 373.16 | 346.73 | 348.76 | SE +/- 2.21, N = 3
Summer Nature 4K | 124.80 | 119.84 | 120.36 | SE +/- 0.11, N = 3
Summer Nature 1080p | 529.59 | 498.53 | 499.42 | SE +/- 4.79, N = 6
Chimera 1080p 10-bit | 306.07 | 279.41 | 284.84 | SE +/- 2.30, N = 15
Intel Open Image Denoise 2.2
Images / Sec, More Is Better. Device: CPU-Only
Run | a | b | c | SE +/-, N (as reported)
RT.hdr_alb_nrm.3840x2160 | 0.21 | 0.20 | 0.20 | SE +/- 0.00, N = 3
RT.ldr_alb_nrm.3840x2160 | 0.21 | 0.20 | 0.20 | SE +/- 0.00, N = 3
RTLightmap.hdr.4096x4096 | 0.10 | 0.10 | 0.10 | SE +/- 0.00, N = 3
ONNX Runtime 1.17 (throughput)
Inferences Per Second, More Is Better. Device: CPU, Executor: Standard
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Model | a | b | c | SE +/-, N (as reported)
GPT-2 | 73.93 | 73.11 | 72.84 | SE +/- 0.04, N = 3
yolov4 | 4.72981 | 4.52820 | 4.53743 | SE +/- 0.00718, N = 3
T5 Encoder | 89.05 | 88.74 | 88.72 | SE +/- 0.15, N = 3
bertsquad-12 | 7.80252 | 7.46483 | 7.48038 | SE +/- 0.00345, N = 3
CaffeNet 12-int8 | 233.20 | 223.24 | 221.48 | SE +/- 0.80, N = 3
fcn-resnet101-11 | 0.784697 | 0.754855 | 0.755065 | SE +/- 0.000621, N = 3
ArcFace ResNet-100 | 17.54 | 17.30 | 17.26 | SE +/- 0.05, N = 3
ResNet50 v1-12-int8 | 97.13 | 93.24 | 94.04 | SE +/- 0.21, N = 3
super-resolution-10 | 57.33 | 54.16 | 53.91 | SE +/- 0.17, N = 3
Faster R-CNN R-50-FPN-int8 | 21.66 | 17.77 | 17.66 | SE +/- 0.19, N = 15
LZ4 Compression 1.9.4
MB/s, More Is Better. (CC) gcc options: -O3
Compression Level and Metric | a | b | c | SE +/-, N (as reported)
1 - Compression Speed | 779.42 | 724.19 | 718.46 | SE +/- 6.53, N = 3
1 - Decompression Speed | 4566.3 | 4200.9 | 4218.9 | SE +/- 39.91, N = 3
3 - Compression Speed | 119.47 | 110.84 | 110.79 | SE +/- 0.32, N = 3
3 - Decompression Speed | 4203.1 | 3836.5 | 3895.6 | SE +/- 19.26, N = 3
9 - Compression Speed | 39.06 | 37.65 | 37.32 | SE +/- 0.11, N = 3
9 - Decompression Speed | 4258.6 | 4006.1 | 4027.6 | SE +/- 6.24, N = 3
GROMACS 2024
Ns Per Day, More Is Better. (CXX) g++ options: -O3 -lm
Implementation and Input | a | b | c | SE +/-, N (as reported)
MPI CPU - water_GMX50_bare | 0.595 | 0.583 | 0.585 | SE +/- 0.001, N = 3
NAMD 3.0b6
ns/day, More Is Better
Input | a | b | c | SE +/-, N (as reported)
ATPase with 327,506 Atoms | 0.32043 | 0.30829 | 0.30993 | SE +/- 0.00004, N = 3
STMV with 1,066,628 Atoms | 0.09658 | 0.09286 | 0.09314 | SE +/- 0.00005, N = 3
ONNX Runtime 1.17 (latency)
Inference Time Cost (ms), Fewer Is Better. Device: CPU, Executor: Standard
(CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
Model | a | b | c | SE +/-, N (as reported)
GPT-2 | 13.52 | 13.67 | 13.72 | SE +/- 0.01, N = 3
yolov4 | 211.42 | 220.83 | 220.38 | SE +/- 0.35, N = 3
T5 Encoder | 11.23 | 11.27 | 11.27 | SE +/- 0.02, N = 3
bertsquad-12 | 128.16 | 133.96 | 133.68 | SE +/- 0.06, N = 3
CaffeNet 12-int8 | 4.28653 | 4.47781 | 4.51337 | SE +/- 0.01608, N = 3
fcn-resnet101-11 | 1274.37 | 1324.75 | 1324.38 | SE +/- 1.09, N = 3
ArcFace ResNet-100 | 57.01 | 57.80 | 57.92 | SE +/- 0.16, N = 3
ResNet50 v1-12-int8 | 10.29 | 10.72 | 10.63 | SE +/- 0.02, N = 3
super-resolution-10 | 17.44 | 18.46 | 18.55 | SE +/- 0.06, N = 3
Faster R-CNN R-50-FPN-int8 | 46.15 | 56.37 | 56.62 | SE +/- 0.55, N = 15
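Each ONNX Runtime model appears twice in this export, once as inferences per second and once as inference time cost in milliseconds. The export does not say how the two are related; the sketch below assumes the latency is approximately 1000 divided by the throughput and checks that assumption against four run a pairs copied from the results. The function names are hypothetical helpers, not ONNX Runtime or Phoronix Test Suite APIs.

```python
# (inferences per second, reported inference time cost in ms) for run a,
# copied from the result overview.
pairs = {
    "GPT-2": (73.9314, 13.517),
    "yolov4": (4.72981, 211.42),
    "T5 Encoder": (89.0517, 11.2255),
    "Faster R-CNN R-50-FPN-int8": (21.6643, 46.1547),
}


def implied_latency_ms(inferences_per_second):
    """Latency implied by throughput, assuming one inference at a time."""
    return 1000.0 / inferences_per_second


def relative_gap(inferences_per_second, reported_ms):
    """Relative difference between implied and reported latency."""
    implied = implied_latency_ms(inferences_per_second)
    return abs(implied - reported_ms) / reported_ms


# Show implied vs reported latency and the relative gap for each model.
for model, (throughput, reported_ms) in pairs.items():
    gap = relative_gap(throughput, reported_ms)
    print(f"{model}: implied {implied_latency_ms(throughput):.3f} ms, "
          f"reported {reported_ms} ms, gap {gap:.4%}")
```

For these four models the implied and reported latencies differ by well under 1 percent, so the two ONNX Runtime blocks can be read as two views of the same measurements rather than as independent results.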
Phoronix Test Suite v10.8.5