dnn
Benchmarks for a future article. AMD Ryzen AI 9 HX 370 testing with an ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS) and llvmpipe on Ubuntu 24.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410169-NE-DNN04230192&grs
dnn: system details
Runs: a, b

    Processor:          AMD Ryzen AI 9 HX 370 @ 4.37GHz (12 Cores / 24 Threads)
    Motherboard:        ASUS Zenbook S 16 UM5606WA_UM5606WA UM5606WA v1.0 (UM5606WA.308 BIOS)
    Chipset:            AMD Device 1507
    Memory:             4 x 8GB LPDDR5-7500MT/s Samsung K3KL9L90CM-MGCT
    Disk:               1024GB MTFDKBA1T0QFM-1BD1AABGB
    Graphics:           llvmpipe
    Audio:              AMD Rembrandt Radeon HD Audio
    Network:            MEDIATEK Device 7925
    OS:                 Ubuntu 24.10
    Kernel:             6.11.0-rc6-phx (x86_64)
    Desktop:            GNOME Shell 47.0
    Display Server:     X Server + Wayland
    OpenGL:             4.5 Mesa 24.2.3-1ubuntu1 (LLVM 19.1.0 256 bits)
    Compiler:           GCC 14.2.0
    File-System:        ext4
    Screen Resolution:  2880x1800

Kernel Details
    - Transparent Huge Pages: madvise

Compiler Details
    - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-14-zdkDXv/gcc-14-14.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
    - Scaling Governor: amd-pstate-epp powersave (Boost: Enabled EPP: balance_performance)
    - Platform Profile: balanced
    - CPU Microcode: 0xb204011
    - ACPI Profile: balanced

Security Details
    - gather_data_sampling: Not affected
    - itlb_multihit: Not affected
    - l1tf: Not affected
    - mds: Not affected
    - meltdown: Not affected
    - mmio_stale_data: Not affected
    - reg_file_data_sampling: Not affected
    - retbleed: Not affected
    - spec_rstack_overflow: Not affected
    - spec_store_bypass: Mitigation of SSB disabled via prctl
    - spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
    - spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
    - srbds: Not affected
    - tsx_async_abort: Not affected
dnn: result overview (all tests: fewer is better)

    Test                                                Unit                  a          b
    onednn: Recurrent Neural Network Inference - CPU    ms              2194.27    1616.68
    litert: DeepLab V3                                  Microseconds    4268.97     3750.1
    litert: Mobilenet Quant                             Microseconds    1976.79     1801.7
    litert: Inception ResNet V2                         Microseconds    39825.1    37957.1
    litert: Quantized COCO SSD MobileNet v1             Microseconds    2844.21    2983.71
    onednn: IP Shapes 1D - CPU                          ms              2.76848    2.68462
    xnnpack: QS8MobileNetV2                             us                 1160       1129
    xnnpack: FP32MobileNetV3Large                       us                 2267       2215
    litert: Inception V4                                Microseconds    49706.2    50678.3
    xnnpack: FP16MobileNetV3Large                       us                 2583       2536
    onednn: Recurrent Neural Network Training - CPU     ms              3085.55    3030.55
    xnnpack: FP32MobileNetV3Small                       us                 1103       1084
    xnnpack: FP16MobileNetV3Small                       us                 1267       1248
    xnnpack: FP32MobileNetV1                            us                 2387       2353
    xnnpack: FP32MobileNetV2                            us                 1932       1908
    litert: NASNet Mobile                               Microseconds    12624.7    12537.6
    onednn: Deconvolution Batch shapes_1d - CPU         ms              5.20977    5.17517
    onednn: Convolution Batch Shapes Auto - CPU         ms               8.4918    8.43828
    xnnpack: FP16MobileNetV2                            us                 2368       2358
    xnnpack: FP16MobileNetV1                            us                 3214       3203
    litert: Mobilenet Float                             Microseconds    2260.57    2266.86
    onednn: IP Shapes 3D - CPU                          ms              3.56923    3.57302
    litert: SqueezeNet                                  Microseconds    3784.36    3787.29
    onednn: Deconvolution Batch shapes_3d - CPU         ms              6.37788    6.37874
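The gap between runs a and b is easier to read as a percentage than as raw timings. The following is a minimal Python sketch, not part of the Phoronix Test Suite output, that computes the per-test percent change of b relative to a and a geometric mean of the b/a ratios. The names RESULTS, pct_change and geomean_ratio are hypothetical, and the choice of a geometric mean as the overall summary is an assumption made here for illustration. The values are copied from the overview above; since every test is "Fewer Is Better", a negative change means run b was faster.

```python
import math

# (a, b) pairs copied from the result overview; every test is "Fewer Is Better".
RESULTS = {
    "onednn: Recurrent Neural Network Inference - CPU": (2194.27, 1616.68),
    "litert: DeepLab V3": (4268.97, 3750.1),
    "litert: Mobilenet Quant": (1976.79, 1801.7),
    "litert: Inception ResNet V2": (39825.1, 37957.1),
    "litert: Quantized COCO SSD MobileNet v1": (2844.21, 2983.71),
    "onednn: IP Shapes 1D - CPU": (2.76848, 2.68462),
    "xnnpack: QS8MobileNetV2": (1160, 1129),
    "xnnpack: FP32MobileNetV3Large": (2267, 2215),
    "litert: Inception V4": (49706.2, 50678.3),
    "xnnpack: FP16MobileNetV3Large": (2583, 2536),
    "onednn: Recurrent Neural Network Training - CPU": (3085.55, 3030.55),
    "xnnpack: FP32MobileNetV3Small": (1103, 1084),
    "xnnpack: FP16MobileNetV3Small": (1267, 1248),
    "xnnpack: FP32MobileNetV1": (2387, 2353),
    "xnnpack: FP32MobileNetV2": (1932, 1908),
    "litert: NASNet Mobile": (12624.7, 12537.6),
    "onednn: Deconvolution Batch shapes_1d - CPU": (5.20977, 5.17517),
    "onednn: Convolution Batch Shapes Auto - CPU": (8.4918, 8.43828),
    "xnnpack: FP16MobileNetV2": (2368, 2358),
    "xnnpack: FP16MobileNetV1": (3214, 3203),
    "litert: Mobilenet Float": (2260.57, 2266.86),
    "onednn: IP Shapes 3D - CPU": (3.56923, 3.57302),
    "litert: SqueezeNet": (3784.36, 3787.29),
    "onednn: Deconvolution Batch shapes_3d - CPU": (6.37788, 6.37874),
}


def pct_change(a, b):
    """Percent change of run b relative to run a (negative = b took less time)."""
    return (b - a) / a * 100.0


def geomean_ratio(results):
    """Geometric mean of b/a over all tests (below 1.0 = b faster on balance)."""
    logs = [math.log(b / a) for a, b in results.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # List tests from the largest improvement in b to the largest regression.
    ordered = sorted(RESULTS.items(), key=lambda kv: pct_change(*kv[1]))
    for name, (a, b) in ordered:
        print(f"{name:<50} {pct_change(a, b):+7.2f}%")
    print(f"geometric mean of b/a: {geomean_ratio(RESULTS):.4f}")
```

Sorting by percent change puts the tests where the two runs differ most at the top and bottom of the list, which makes it easier to separate real differences from run-to-run noise.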
oneDNN 3.6 - Harness: Recurrent Neural Network Inference - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 2194.27 (MIN: 2132.82)
    b: 1616.68 (MIN: 1589)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
LiteRT 2024-10-15 - Model: DeepLab V3
    Unit: Microseconds, Fewer Is Better
    a: 4268.97
    b: 3750.10
LiteRT 2024-10-15 - Model: Mobilenet Quant
    Unit: Microseconds, Fewer Is Better
    a: 1976.79
    b: 1801.70
LiteRT 2024-10-15 - Model: Inception ResNet V2
    Unit: Microseconds, Fewer Is Better
    a: 39825.1
    b: 37957.1
LiteRT 2024-10-15 - Model: Quantized COCO SSD MobileNet v1
    Unit: Microseconds, Fewer Is Better
    a: 2844.21
    b: 2983.71
oneDNN 3.6 - Harness: IP Shapes 1D - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 2.76848 (MIN: 2.35)
    b: 2.68462 (MIN: 2.35)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
XNNPACK b7b048 - Model: QS8MobileNetV2
    Unit: us, Fewer Is Better
    a: 1160
    b: 1129
    1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP32MobileNetV3Large
    Unit: us, Fewer Is Better
    a: 2267
    b: 2215
    1. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15 - Model: Inception V4
    Unit: Microseconds, Fewer Is Better
    a: 49706.2
    b: 50678.3
XNNPACK b7b048 - Model: FP16MobileNetV3Large
    Unit: us, Fewer Is Better
    a: 2583
    b: 2536
    1. (CXX) g++ options: -O3 -lrt -lm
oneDNN 3.6 - Harness: Recurrent Neural Network Training - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 3085.55 (MIN: 3048.43)
    b: 3030.55 (MIN: 3004.46)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
XNNPACK b7b048 - Model: FP32MobileNetV3Small
    Unit: us, Fewer Is Better
    a: 1103
    b: 1084
    1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP16MobileNetV3Small
    Unit: us, Fewer Is Better
    a: 1267
    b: 1248
    1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP32MobileNetV1
    Unit: us, Fewer Is Better
    a: 2387
    b: 2353
    1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP32MobileNetV2
    Unit: us, Fewer Is Better
    a: 1932
    b: 1908
    1. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15 - Model: NASNet Mobile
    Unit: Microseconds, Fewer Is Better
    a: 12624.7
    b: 12537.6
oneDNN 3.6 - Harness: Deconvolution Batch shapes_1d - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 5.20977 (MIN: 4.16)
    b: 5.17517 (MIN: 4.43)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6 - Harness: Convolution Batch Shapes Auto - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 8.49180 (MIN: 8.23)
    b: 8.43828 (MIN: 8.21)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
XNNPACK b7b048 - Model: FP16MobileNetV2
    Unit: us, Fewer Is Better
    a: 2368
    b: 2358
    1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048 - Model: FP16MobileNetV1
    Unit: us, Fewer Is Better
    a: 3214
    b: 3203
    1. (CXX) g++ options: -O3 -lrt -lm
LiteRT 2024-10-15 - Model: Mobilenet Float
    Unit: Microseconds, Fewer Is Better
    a: 2260.57
    b: 2266.86
oneDNN 3.6 - Harness: IP Shapes 3D - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 3.56923 (MIN: 3.5)
    b: 3.57302 (MIN: 3.49)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
LiteRT 2024-10-15 - Model: SqueezeNet
    Unit: Microseconds, Fewer Is Better
    a: 3784.36
    b: 3787.29
oneDNN 3.6 - Harness: Deconvolution Batch shapes_3d - Engine: CPU
    Unit: ms, Fewer Is Better
    a: 6.37788 (MIN: 5.6)
    b: 6.37874 (MIN: 5.57)
    1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Phoronix Test Suite v10.8.5