ldld

AMD Ryzen 7 PRO 5850U testing with a LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS) and AMD Radeon Vega / Mobile 1GB on Fedora Linux 39 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2402198-NE-LDLD6476438&grs&rdt.

ldld (runs a, b, c)

Processor: AMD Ryzen 7 PRO 5850U @ 4.51GHz (8 Cores / 16 Threads)
Motherboard: LENOVO ThinkPad T14s Gen 2a 20XF004WUS (R1NET57W 1.27 BIOS)
Chipset: AMD Renoir/Cezanne
Memory: 2 x 16GB LPDDR4-4266MT/s Micron MT53E2G32D4NQ-046
Disk: 1024GB SAMSUNG MZVLB1T0HBLR-000L7
Graphics: AMD Radeon Vega / Mobile 1GB
Audio: AMD Renoir Radeon HD Audio
Network: Realtek RTL8111/8168/8411 + MEDIATEK MT7921 802.11ax PCI
OS: Fedora Linux 39
Kernel: 6.5.8-300.fc39.x86_64 (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server + Wayland
OpenGL: 4.6 Mesa 23.2.1 (LLVM 16.0.6 DRM 3.54)
Compiler: GCC 13.2.1 20230918
File-System: btrfs
Screen Resolution: 3840x2160

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-redhat-linux --disable-libunwind-exceptions --enable-__cxa_atexit --enable-bootstrap --enable-cet --enable-checking=release --enable-gnu-indirect-function --enable-gnu-unique-object --enable-initfini-array --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,m2,lto --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-multilib --enable-offload-defaulted --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --mandir=/usr/share/man --with-arch_32=i686 --with-build-config=bootstrap-lto --with-gcc-major-version-only --with-libstdcxx-zoneinfo=/usr/share/zoneinfo --with-linker-hash-style=gnu --with-tune=generic --without-cuda-driver

Processor Details
- Scaling Governor: amd-pstate-epp powersave (EPP: performance)
- Platform Profile: balanced
- CPU Microcode: 0xa50000d
- ACPI Profile: balanced

Graphics Details
- BAR1 / Visible vRAM Size: 1024 MB

Python Details
- Python 3.12.0

Security Details
- SELinux
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Mitigation of safe RET no microcode
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected
- srbds: Not affected
- tsx_async_abort: Not affected

ldld results overview (values listed as a, b, c)

onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard: 21.6643, 17.7654, 17.6607
compress-lz4: 3 - Decompression Speed: 4203.1, 3836.5, 3895.6
dav1d: Chimera 1080p 10-bit: 306.07, 279.41, 284.84
compress-lz4: 1 - Decompression Speed: 4566.3, 4200.9, 4218.9
compress-lz4: 1 - Compression Speed: 779.42, 724.19, 718.46
compress-lz4: 3 - Compression Speed: 119.47, 110.84, 110.79
dav1d: Chimera 1080p: 373.16, 346.73, 348.76
onnx: super-resolution-10 - CPU - Standard: 57.3281, 54.1641, 53.9127
compress-lz4: 9 - Decompression Speed: 4258.6, 4006.1, 4027.6
dav1d: Summer Nature 1080p: 529.59, 498.53, 499.42
onnx: CaffeNet 12-int8 - CPU - Standard: 233.2, 223.242, 221.478
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only: 0.21, 0.20, 0.20
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only: 0.21, 0.20, 0.20
compress-lz4: 9 - Compression Speed: 39.06, 37.65, 37.32
onnx: bertsquad-12 - CPU - Standard: 7.80252, 7.46483, 7.48038
onnx: yolov4 - CPU - Standard: 4.72981, 4.52820, 4.53743
onnx: ResNet50 v1-12-int8 - CPU - Standard: 97.1324, 93.2427, 94.0366
dav1d: Summer Nature 4K: 124.8, 119.84, 120.36
namd: STMV with 1,066,628 Atoms: 0.09658, 0.09286, 0.09314
onnx: fcn-resnet101-11 - CPU - Standard: 0.784697, 0.754855, 0.755065
namd: ATPase with 327,506 Atoms: 0.32043, 0.30829, 0.30993
vkfft: FFT + iFFT R2C / C2R: 3267, 3346, 3392
vkfft: FFT + iFFT C2C 1D batched in double precision: 2683, 2610, 2671
vkfft: FFT + iFFT C2C Bluestein in single precision: 1540, 1556, 1579
gromacs: MPI CPU - water_GMX50_bare: 0.595, 0.583, 0.585
vkfft: FFT + iFFT C2C multidimensional in single precision: 3308, 3344, 3368
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision: 912, 926, 928
onnx: ArcFace ResNet-100 - CPU - Standard: 17.5391, 17.3014, 17.2633
onnx: GPT-2 - CPU - Standard: 73.9314, 73.1053, 72.8414
vkfft: FFT + iFFT C2C 1D batched in half precision: 11894, 12032, 12045
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling: 6453, 6525, 6465
vkfft: FFT + iFFT C2C 1D batched in single precision: 6126, 6078, 6141
onnx: T5 Encoder - CPU - Standard: 89.0517, 88.7391, 88.7159
oidn: RTLightmap.hdr.4096x4096 - CPU-Only: 0.10, 0.10, 0.10
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (inference time, ms): 46.1547, 56.3668, 56.6173
onnx: super-resolution-10 - CPU - Standard (inference time, ms): 17.4409, 18.4601, 18.546
onnx: ResNet50 v1-12-int8 - CPU - Standard (inference time, ms): 10.2927, 10.7225, 10.6316
onnx: ArcFace ResNet-100 - CPU - Standard (inference time, ms): 57.0123, 57.7965, 57.9229
onnx: fcn-resnet101-11 - CPU - Standard (inference time, ms): 1274.37, 1324.75, 1324.38
onnx: CaffeNet 12-int8 - CPU - Standard (inference time, ms): 4.28653, 4.47781, 4.51337
onnx: bertsquad-12 - CPU - Standard (inference time, ms): 128.158, 133.956, 133.677
onnx: T5 Encoder - CPU - Standard (inference time, ms): 11.2255, 11.2655, 11.2683
onnx: yolov4 - CPU - Standard (inference time, ms): 211.42, 220.834, 220.384
onnx: GPT-2 - CPU - Standard (inference time, ms): 13.517, 13.6698, 13.7188
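As a reading aid, the spread between runs can be expressed as a percentage of run a. The Python sketch below uses a few rows copied from the overview above; the helper name percent_vs_a and the choice of run a as the baseline are illustrative assumptions, not part of the Phoronix Test Suite output.

```python
# Values copied from the results overview (runs a, b, c).
# Only a handful of rows are included to keep the sketch short.
results = {
    "onnx: Faster R-CNN R-50-FPN-int8 (inferences/s)": (21.6643, 17.7654, 17.6607),
    "compress-lz4: 3 - Decompression Speed (MB/s)": (4203.1, 3836.5, 3895.6),
    "dav1d: Chimera 1080p 10-bit (FPS)": (306.07, 279.41, 284.84),
    "vkfft: FFT + iFFT R2C / C2R (score)": (3267, 3346, 3392),
}


def percent_vs_a(values):
    """Return the difference of runs b and c from run a, in percent of a.

    Hypothetical helper: run a is treated as the baseline by assumption.
    """
    a, b, c = values
    return tuple(round((x - a) / a * 100, 2) for x in (b, c))


for name, values in results.items():
    b_pct, c_pct = percent_vs_a(values)
    print(f"{name}: b {b_pct:+.2f}%, c {c_pct:+.2f}%")
```

A negative percentage means the run scored lower than run a. For the rows used here, all of which are "More Is Better" metrics, that means it was slower.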

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 21.66, b: 17.77, c: 17.66
SE +/- 0.19, N = 15
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LZ4 Compression

Compression Level: 3 - Decompression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 4203.1, b: 3836.5, c: 3895.6
SE +/- 19.26, N = 3
1. (CC) gcc options: -O3

dav1d

Video Input: Chimera 1080p 10-bit

dav1d 1.4 (FPS, More Is Better)
a: 306.07, b: 279.41, c: 284.84
SE +/- 2.30, N = 15
1. (CC) gcc options: -pthread

LZ4 Compression

Compression Level: 1 - Decompression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 4566.3, b: 4200.9, c: 4218.9
SE +/- 39.91, N = 3
1. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 1 - Compression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 779.42, b: 724.19, c: 718.46
SE +/- 6.53, N = 3
1. (CC) gcc options: -O3

LZ4 Compression

Compression Level: 3 - Compression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 119.47, b: 110.84, c: 110.79
SE +/- 0.32, N = 3
1. (CC) gcc options: -O3

dav1d

Video Input: Chimera 1080p

dav1d 1.4 (FPS, More Is Better)
a: 373.16, b: 346.73, c: 348.76
SE +/- 2.21, N = 3
1. (CC) gcc options: -pthread

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 57.33, b: 54.16, c: 53.91
SE +/- 0.17, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LZ4 Compression

Compression Level: 9 - Decompression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 4258.6, b: 4006.1, c: 4027.6
SE +/- 6.24, N = 3
1. (CC) gcc options: -O3

dav1d

Video Input: Summer Nature 1080p

dav1d 1.4 (FPS, More Is Better)
a: 529.59, b: 498.53, c: 499.42
SE +/- 4.79, N = 6
1. (CC) gcc options: -pthread

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 233.20, b: 223.24, c: 221.48
SE +/- 0.80, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Intel Open Image Denoise

Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.2 (Images / Sec, More Is Better)
a: 0.21, b: 0.20, c: 0.20
SE +/- 0.00, N = 3

Intel Open Image Denoise

Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only

Intel Open Image Denoise 2.2 (Images / Sec, More Is Better)
a: 0.21, b: 0.20, c: 0.20
SE +/- 0.00, N = 3

LZ4 Compression

Compression Level: 9 - Compression Speed

LZ4 Compression 1.9.4 (MB/s, More Is Better)
a: 39.06, b: 37.65, c: 37.32
SE +/- 0.11, N = 3
1. (CC) gcc options: -O3

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 7.80252, b: 7.46483, c: 7.48038
SE +/- 0.00345, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 4.72981, b: 4.52820, c: 4.53743
SE +/- 0.00718, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 97.13, b: 93.24, c: 94.04
SE +/- 0.21, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

dav1d

Video Input: Summer Nature 4K

dav1d 1.4 (FPS, More Is Better)
a: 124.80, b: 119.84, c: 120.36
SE +/- 0.11, N = 3
1. (CC) gcc options: -pthread

NAMD

Input: STMV with 1,066,628 Atoms

NAMD 3.0b6 (ns/day, More Is Better)
a: 0.09658, b: 0.09286, c: 0.09314
SE +/- 0.00005, N = 3

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 0.784697, b: 0.754855, c: 0.755065
SE +/- 0.000621, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

NAMD

Input: ATPase with 327,506 Atoms

NAMD 3.0b6 (ns/day, More Is Better)
a: 0.32043, b: 0.30829, c: 0.30993
SE +/- 0.00004, N = 3

VkFFT

Test: FFT + iFFT R2C / C2R

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 3267, b: 3346, c: 3392
SE +/- 2.65, N = 3
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in double precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 2683, b: 2610, c: 2671
SE +/- 25.37, N = 6
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 1540, b: 1556, c: 1579
SE +/- 10.48, N = 3
1. (CXX) g++ options: -O3

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2024 (Ns Per Day, More Is Better)
a: 0.595, b: 0.583, c: 0.585
SE +/- 0.001, N = 3
1. (CXX) g++ options: -O3 -lm

VkFFT

Test: FFT + iFFT C2C multidimensional in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 3308, b: 3344, c: 3368
SE +/- 11.68, N = 3
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C Bluestein benchmark in double precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 912, b: 926, c: 928
SE +/- 7.54, N = 3
1. (CXX) g++ options: -O3

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 17.54, b: 17.30, c: 17.26
SE +/- 0.05, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 73.93, b: 73.11, c: 72.84
SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

VkFFT

Test: FFT + iFFT C2C 1D batched in half precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 11894, b: 12032, c: 12045
SE +/- 36.70, N = 3
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 6453, b: 6525, c: 6465
SE +/- 17.74, N = 3
1. (CXX) g++ options: -O3

VkFFT

Test: FFT + iFFT C2C 1D batched in single precision

VkFFT 1.3.4 (Benchmark Score, More Is Better)
a: 6126, b: 6078, c: 6141
SE +/- 73.26, N = 3
1. (CXX) g++ options: -O3

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inferences Per Second, More Is Better)
a: 89.05, b: 88.74, c: 88.72
SE +/- 0.15, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Intel Open Image Denoise

Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only

Intel Open Image Denoise 2.2 (Images / Sec, More Is Better)
a: 0.10, b: 0.10, c: 0.10
SE +/- 0.00, N = 3

ONNX Runtime

Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 46.15, b: 56.37, c: 56.62
SE +/- 0.55, N = 15
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
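The throughput and latency figures reported for the same model look like reciprocal views of one measurement. A minimal sketch, assuming the reported time is the mean cost of a single inference in milliseconds, so that throughput is roughly 1000 / ms; the helper implied_ips is hypothetical, and the values are copied from this result file for the Faster R-CNN R-50-FPN-int8 model.

```python
# (inferences per second, inference time in ms) per run, copied from the results.
faster_rcnn = {
    "a": (21.6643, 46.1547),
    "b": (17.7654, 56.3668),
    "c": (17.6607, 56.6173),
}


def implied_ips(latency_ms):
    """Throughput implied by a per-inference latency, assuming one inference at a time."""
    return 1000.0 / latency_ms


for run, (reported_ips, latency_ms) in faster_rcnn.items():
    estimate = implied_ips(latency_ms)
    gap_pct = abs(estimate - reported_ips) / reported_ips * 100
    print(f"run {run}: reported {reported_ips:.2f}/s, implied {estimate:.2f}/s, gap {gap_pct:.2f}%")
```

Small gaps are expected if the two metrics are averaged separately over the samples, so the check only indicates whether the pair of numbers is mutually consistent.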

ONNX Runtime

Model: super-resolution-10 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 17.44, b: 18.46, c: 18.55
SE +/- 0.06, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 10.29, b: 10.72, c: 10.63
SE +/- 0.02, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 57.01, b: 57.80, c: 57.92
SE +/- 0.16, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: fcn-resnet101-11 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 1274.37, b: 1324.75, c: 1324.38
SE +/- 1.09, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 4.28653, b: 4.47781, c: 4.51337
SE +/- 0.01608, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: bertsquad-12 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 128.16, b: 133.96, c: 133.68
SE +/- 0.06, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: T5 Encoder - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 11.23, b: 11.27, c: 11.27
SE +/- 0.02, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: yolov4 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 211.42, b: 220.83, c: 220.38
SE +/- 0.35, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime

Model: GPT-2 - Device: CPU - Executor: Standard

ONNX Runtime 1.17 (Inference Time Cost (ms), Fewer Is Better)
a: 13.52, b: 13.67, c: 13.72
SE +/- 0.01, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt


Phoronix Test Suite v10.8.5