litert-onednn-xnnpack: benchmarks for a future article. Testing of 2 x AMD EPYC 9575F 64-Core processors on an AMD VOLCANO motherboard (RVOT1000D BIOS) with ASPEED graphics, running Ubuntu 24.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410162-NE-LITERTONE15&rdt&grw .
System configuration (shared by runs a, aa and b):
  Processor: 2 x AMD EPYC 9575F 64-Core @ 5.01GHz (128 Cores / 256 Threads)
  Motherboard: AMD VOLCANO (RVOT1000D BIOS)
  Chipset: AMD Device 153a
  Memory: 1520GB
  Disk: 2 x 3841GB SAMSUNG MZWLO3T8HCLS-00A07
  Graphics: ASPEED
  Network: Broadcom NetXtreme BCM5720 PCIe
  OS: Ubuntu 24.04
  Kernel: 6.8.12-powercap-1ah-patched (x86_64)
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details:
  a: Scaling Governor: amd-pstate-epp powersave (EPP: power); CPU Microcode: 0xb002110
  aa: Scaling Governor: amd-pstate-epp performance (EPP: performance); CPU Microcode: 0xb002110
  b: Scaling Governor: amd-pstate-epp performance (EPP: performance); CPU Microcode: 0xb002110

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
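The Processor Details show one visible difference between the runs: run a used the powersave governor with EPP set to power, while aa and b used performance. You can confirm the same settings on a Linux machine before a run by reading the cpufreq files in sysfs. The Python sketch below makes two assumptions. First, it assumes the standard cpufreq sysfs layout (scaling_driver, scaling_governor and energy_performance_preference under /sys/devices/system/cpu/cpuN/cpufreq). Second, it assumes that the PTS string "amd-pstate-epp powersave" combines the driver name and the governor name. The helper read_cpufreq_policy is a hypothetical name for illustration and is not part of the Phoronix Test Suite.

```python
from pathlib import Path

# Assumption: standard Linux cpufreq sysfs layout, one directory per CPU.
# PTS's "Scaling Governor: amd-pstate-epp powersave" appears to combine the
# scaling_driver and scaling_governor values; this helper is hypothetical.
CPUFREQ_FILES = {
    "driver": "scaling_driver",
    "governor": "scaling_governor",
    "epp": "energy_performance_preference",
}


def read_cpufreq_policy(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> dict:
    """Return the scaling driver, governor and EPP hint for one CPU.

    Any file that does not exist (no cpufreq support, or a driver without
    an EPP interface) is reported as None instead of raising.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    policy = {}
    for key, filename in CPUFREQ_FILES.items():
        path = base / filename
        # Strip the trailing newline that sysfs attributes end with.
        policy[key] = path.read_text().strip() if path.is_file() else None
    return policy
```

On a configuration like run a, read_cpufreq_policy() would be expected to return governor "powersave" and EPP "power". On aa and b, it would be expected to return "performance" for both.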
Results overview (fewer is better for every test; "-" = not run on that configuration):

Test                                              Unit  a           aa          b
LiteRT: DeepLab V3                                us    28110.1     23838.8     22624.7
LiteRT: SqueezeNet                                us    9607.40     9561.25     9771.82
LiteRT: Inception V4                              us    65970.9     68014       66603.9
LiteRT: NASNet Mobile                             us    180002      1338560     190987
LiteRT: Mobilenet Float                           us    5922.59     5619.35     6222.55
LiteRT: Mobilenet Quant                           us    -           82918.4     36194.8
LiteRT: Inception ResNet V2                       us    -           132102      90362.0
LiteRT: Quantized COCO SSD MobileNet v1           us    -           10521.1     10308.1
XNNPACK: FP32MobileNetV1                          us    -           7130        7699
XNNPACK: FP32MobileNetV2                          us    -           16796       14325
XNNPACK: FP32MobileNetV3Large                     us    -           22775       22324
XNNPACK: FP32MobileNetV3Small                     us    -           21224       17188
XNNPACK: FP16MobileNetV1                          us    -           7483        10504
XNNPACK: FP16MobileNetV2                          us    -           18525       14703
XNNPACK: FP16MobileNetV3Large                     us    -           27319       27079
XNNPACK: FP16MobileNetV3Small                     us    -           25785       17479
XNNPACK: QS8MobileNetV2                           us    -           17257       25089
oneDNN: IP Shapes 1D - CPU                        ms    0.631954    0.632095    0.634881
oneDNN: IP Shapes 3D - CPU                        ms    0.451791    0.458506    0.453833
oneDNN: Convolution Batch Shapes Auto - CPU       ms    0.258396    0.259777    0.257735
oneDNN: Deconvolution Batch shapes_1d - CPU       ms    17.8179     15.3918     15.7465
oneDNN: Deconvolution Batch shapes_3d - CPU       ms    0.429801    0.425348    0.430174
oneDNN: Recurrent Neural Network Training - CPU   ms    417.017     418.838     415.615
oneDNN: Recurrent Neural Network Inference - CPU  ms    387.459     395.488     389.988
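One way to read the overview table is to express each run relative to a single baseline. The Python sketch below copies a handful of means from the table and treats run b as the baseline. Because every metric here is "fewer is better", a positive percentage means the run took longer than b. The names RESULTS, pct_vs_baseline and fastest are illustrative only, and the choice of b as baseline is arbitrary.

```python
# A few mean results copied from the overview table; None marks "not run".
# Every metric is "fewer is better" (us or ms), so lower means faster.
RESULTS = {
    "LiteRT: DeepLab V3": {"a": 28110.1, "aa": 23838.8, "b": 22624.7},
    "LiteRT: NASNet Mobile": {"a": 180002, "aa": 1338560, "b": 190987},
    "LiteRT: Mobilenet Quant": {"a": None, "aa": 82918.4, "b": 36194.8},
    "XNNPACK: FP16MobileNetV1": {"a": None, "aa": 7483, "b": 10504},
    "oneDNN: Deconvolution Batch shapes_1d": {"a": 17.8179, "aa": 15.3918, "b": 15.7465},
}


def pct_vs_baseline(row: dict, baseline: str = "b") -> dict:
    """Percent change of each run's time relative to the baseline run.

    Positive means slower than the baseline, negative means faster.
    Runs without a result are skipped.
    """
    base = row[baseline]
    return {
        run: (value - base) / base * 100.0
        for run, value in row.items()
        if value is not None and run != baseline
    }


def fastest(row: dict) -> str:
    """Name of the run with the lowest time (fewer is better)."""
    present = {run: value for run, value in row.items() if value is not None}
    return min(present, key=present.get)


for test, row in RESULTS.items():
    deltas = ", ".join(f"{run} {pct:+.1f}%" for run, pct in pct_vs_baseline(row).items())
    print(f"{test}: fastest = {fastest(row)}; vs b: {deltas}")
```

With these numbers, NASNet Mobile on aa comes out at roughly +601% versus b, far outside the spread of the other two runs.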
LiteRT 2024-10-15, microseconds (fewer is better):
  DeepLab V3: a 28110.1, aa 23838.8, b 22624.7 (SE +/- 3193.55, N = 12; SE +/- 1635.57, N = 15)
  SqueezeNet: a 9607.40, aa 9561.25, b 9771.82 (SE +/- 109.45, N = 15; SE +/- 126.71, N = 3)
  Inception V4: a 65970.9, aa 68014.0, b 66603.9 (SE +/- 5520.67, N = 15; SE +/- 746.34, N = 15)
  NASNet Mobile: a 180002, aa 1338560, b 190987 (SE +/- 16545.28, N = 12; SE +/- 16019.04, N = 12)
  Mobilenet Float: a 5922.59, aa 5619.35, b 6222.55 (SE +/- 119.72, N = 13; SE +/- 104.76, N = 15)
  Mobilenet Quant (aa and b only): aa 82918.4, b 36194.8 (SE +/- 5253.23, N = 15)
  Inception ResNet V2 (aa and b only): aa 132102.0, b 90362.0 (SE +/- 2643.69, N = 12)
  Quantized COCO SSD MobileNet v1 (aa and b only): aa 10521.1, b 10308.1 (SE +/- 144.17, N = 3)
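The SE figures under each LiteRT chart are easier to compare once they are turned back into a sample standard deviation. The Python sketch below assumes that "SE" is the standard error of the mean, so SE = s / sqrt(N) and s = SE * sqrt(N). It only uses the (SE, N) pairs as printed, because the export does not label which run each pair belongs to. LITERT_SE and sample_std_from_se are hypothetical names.

```python
import math

# (SE, N) pairs exactly as printed under the LiteRT charts, in export order.
# The export does not label which run (a, aa or b) each pair belongs to.
LITERT_SE = {
    "DeepLab V3": [(3193.55, 12), (1635.57, 15)],
    "SqueezeNet": [(109.45, 15), (126.71, 3)],
    "Inception V4": [(5520.67, 15), (746.34, 15)],
    "NASNet Mobile": [(16545.28, 12), (16019.04, 12)],
    "Mobilenet Float": [(119.72, 13), (104.76, 15)],
    "Mobilenet Quant": [(5253.23, 15)],
    "Inception ResNet V2": [(2643.69, 12)],
    "Quantized COCO SSD MobileNet v1": [(144.17, 3)],
}


def sample_std_from_se(se: float, n: int) -> float:
    """Recover the sample standard deviation, assuming SE = s / sqrt(N)."""
    if n < 2:
        raise ValueError("a standard error needs at least two samples")
    return se * math.sqrt(n)


for model, pairs in LITERT_SE.items():
    stds = ", ".join(f"{sample_std_from_se(se, n):.0f} us (N = {n})" for se, n in pairs)
    print(f"{model}: s ~ {stds}")
```

For example, the first DeepLab V3 pair (SE 3193.55, N = 12) corresponds to a standard deviation of about 11063 us.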
XNNPACK b7b048, us (fewer is better), runs aa and b only:
  FP32MobileNetV1: aa 7130, b 7699
  FP32MobileNetV2: aa 16796, b 14325
  FP32MobileNetV3Large: aa 22775, b 22324
  FP32MobileNetV3Small: aa 21224, b 17188
  FP16MobileNetV1: aa 7483, b 10504
  FP16MobileNetV2: aa 18525, b 14703
  FP16MobileNetV3Large: aa 27319, b 27079
  FP16MobileNetV3Small: aa 25785, b 17479
  QS8MobileNetV2: aa 17257, b 25089
  Built with: (CXX) g++ options: -O3 -lrt -lm
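XNNPACK runs the same MobileNet architectures at several precisions. Dividing the FP32 time by the FP16 or QS8 time therefore gives a per-run speedup ratio, where a value above 1 would mean the lower-precision variant finished faster. The Python sketch below does this for the aa and b values. The names XNNPACK, MODELS and speedup are hypothetical, chosen only for this illustration.

```python
# XNNPACK b7b048 mean times in us (fewer is better), runs aa and b only.
XNNPACK = {
    "aa": {
        "FP32MobileNetV1": 7130, "FP32MobileNetV2": 16796,
        "FP32MobileNetV3Large": 22775, "FP32MobileNetV3Small": 21224,
        "FP16MobileNetV1": 7483, "FP16MobileNetV2": 18525,
        "FP16MobileNetV3Large": 27319, "FP16MobileNetV3Small": 25785,
        "QS8MobileNetV2": 17257,
    },
    "b": {
        "FP32MobileNetV1": 7699, "FP32MobileNetV2": 14325,
        "FP32MobileNetV3Large": 22324, "FP32MobileNetV3Small": 17188,
        "FP16MobileNetV1": 10504, "FP16MobileNetV2": 14703,
        "FP16MobileNetV3Large": 27079, "FP16MobileNetV3Small": 17479,
        "QS8MobileNetV2": 25089,
    },
}
MODELS = ("MobileNetV1", "MobileNetV2", "MobileNetV3Large", "MobileNetV3Small")


def speedup(times: dict, prefix: str, model: str) -> float:
    """FP32 time divided by the time at another precision.

    A value above 1 means the `prefix` variant (e.g. "FP16") was faster.
    """
    return times["FP32" + model] / times[prefix + model]


for run, times in XNNPACK.items():
    for model in MODELS:
        print(f"{run} {model}: FP16 vs FP32 = {speedup(times, 'FP16', model):.2f}x")
    print(f"{run} MobileNetV2: QS8 vs FP32 = {speedup(times, 'QS8', 'MobileNetV2'):.2f}x")
```

For these results, every ratio comes out below 1. In both runs, the FP32 variants were faster than their FP16 and QS8 counterparts.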
oneDNN 3.6, Engine: CPU, ms (fewer is better):
  IP Shapes 1D: a 0.631954, aa 0.632095, b 0.634881; MIN a 0.57, aa 0.57, b 0.57 (SE +/- 0.001909, N = 3; SE +/- 0.002596, N = 3)
  IP Shapes 3D: a 0.451791, aa 0.458506, b 0.453833; MIN a 0.39, aa 0.4, b 0.4 (SE +/- 0.002024, N = 3; SE +/- 0.000988, N = 3)
  Convolution Batch Shapes Auto: a 0.258396, aa 0.259777, b 0.257735; MIN a 0.24, aa 0.25, b 0.25 (SE +/- 0.000187, N = 3; SE +/- 0.000249, N = 3)
  Deconvolution Batch shapes_1d: a 17.82, aa 15.39, b 15.75; MIN a 14.97, aa 13.31, b 13.64 (SE +/- 0.01, N = 3; SE +/- 0.07, N = 3)
  Deconvolution Batch shapes_3d: a 0.429801, aa 0.425348, b 0.430174; MIN a 0.41, aa 0.41, b 0.41 (SE +/- 0.004175, N = 3; SE +/- 0.005431, N = 3)
  Recurrent Neural Network Training: a 417.02, aa 418.84, b 415.62; MIN a 406.83, aa 408.29, b 408.78 (SE +/- 1.82, N = 3; SE +/- 1.16, N = 3)
  Recurrent Neural Network Inference: a 387.46, aa 395.49, b 389.99; MIN a 376.31, aa 382.73, b 376.48 (SE +/- 2.03, N = 3; SE +/- 3.25, N = 3)
  Built with: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
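The gap between each oneDNN mean and its MIN value gives a rough sense of how much a harness varied within a run. The Python sketch below computes that gap as a percentage and sorts it from largest to smallest. It assumes that the three MIN entries under each chart follow the same a, aa, b order as the means. ONEDNN, ROWS and mean_over_min_pct are illustrative names only.

```python
# oneDNN 3.6 results in ms as (mean, MIN) per run, copied from the charts.
# Assumption: the MIN entries follow the same a, aa, b order as the means.
ONEDNN = {
    "IP Shapes 1D": {"a": (0.631954, 0.57), "aa": (0.632095, 0.57), "b": (0.634881, 0.57)},
    "IP Shapes 3D": {"a": (0.451791, 0.39), "aa": (0.458506, 0.4), "b": (0.453833, 0.4)},
    "Convolution Batch Shapes Auto": {"a": (0.258396, 0.24), "aa": (0.259777, 0.25), "b": (0.257735, 0.25)},
    "Deconvolution Batch shapes_1d": {"a": (17.82, 14.97), "aa": (15.39, 13.31), "b": (15.75, 13.64)},
    "Deconvolution Batch shapes_3d": {"a": (0.429801, 0.41), "aa": (0.425348, 0.41), "b": (0.430174, 0.41)},
    "Recurrent Neural Network Training": {"a": (417.02, 406.83), "aa": (418.84, 408.29), "b": (415.62, 408.78)},
    "Recurrent Neural Network Inference": {"a": (387.46, 376.31), "aa": (395.49, 382.73), "b": (389.99, 376.48)},
}


def mean_over_min_pct(mean: float, minimum: float) -> float:
    """How far the reported mean sits above the MIN value, in percent."""
    return (mean - minimum) / minimum * 100.0


# (gap %, harness, run) for every result, largest gap first.
ROWS = sorted(
    (
        (mean_over_min_pct(mean, low), harness, run)
        for harness, runs in ONEDNN.items()
        for run, (mean, low) in runs.items()
    ),
    reverse=True,
)

for pct, harness, run in ROWS:
    print(f"{harness} [{run}]: mean is {pct:.1f}% above MIN")
```

By this measure, Deconvolution Batch shapes_1d on run a shows the widest gap (about 19%). Both Recurrent Neural Network harnesses stay within a few percent.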
Phoronix Test Suite v10.8.5