litert-onednn-xnnpack

Benchmarks for a future article. 2 x AMD EPYC 9575F 64-Core testing with an AMD VOLCANO (RVOT1000D BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.

a:

    Processor: 2 x AMD EPYC 9575F 64-Core @ 5.01GHz (128 Cores / 256 Threads), Motherboard: AMD VOLCANO (RVOT1000D BIOS), Chipset: AMD Device 153a, Memory: 1520GB, Disk: 2 x 3841GB SAMSUNG MZWLO3T8HCLS-00A07, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

    OS: Ubuntu 24.04, Kernel: 6.8.12-powercap-1ah-patched (x86_64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

aa:

    Processor: 2 x AMD EPYC 9575F 64-Core @ 5.01GHz (128 Cores / 256 Threads), Motherboard: AMD VOLCANO (RVOT1000D BIOS), Chipset: AMD Device 153a, Memory: 1520GB, Disk: 2 x 3841GB SAMSUNG MZWLO3T8HCLS-00A07, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

    OS: Ubuntu 24.04, Kernel: 6.8.12-powercap-1ah-patched (x86_64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

b:

    Processor: 2 x AMD EPYC 9575F 64-Core @ 5.01GHz (128 Cores / 256 Threads), Motherboard: AMD VOLCANO (RVOT1000D BIOS), Chipset: AMD Device 153a, Memory: 1520GB, Disk: 2 x 3841GB SAMSUNG MZWLO3T8HCLS-00A07, Graphics: ASPEED, Network: Broadcom NetXtreme BCM5720 PCIe

    OS: Ubuntu 24.04, Kernel: 6.8.12-powercap-1ah-patched (x86_64), Compiler: GCC 13.2.0, File-System: ext4, Screen Resolution: 1920x1200

LiteRT 2024-10-15
    Model: Inception V4
    Microseconds < Lower Is Better
    a .. 65970.9 |===============================================================
    aa . 68014.0 |=================================================================
    b .. 66603.9 |================================================================

LiteRT 2024-10-15
    Model: Mobilenet Float
    Microseconds < Lower Is Better
    a .. 5922.59 |==============================================================
    aa . 5619.35 |===========================================================
    b .. 6222.55 |=================================================================

LiteRT 2024-10-15
    Model: DeepLab V3
    Microseconds < Lower Is Better
    a .. 28110.1 |=================================================================
    aa . 23838.8 |=======================================================
    b .. 22624.7 |====================================================

LiteRT 2024-10-15
    Model: NASNet Mobile
    Microseconds < Lower Is Better
    a .. 180002 |=========
    aa . 1338560 |=================================================================
    b .. 190987 |=========

LiteRT 2024-10-15
    Model: Mobilenet Quant
    Microseconds < Lower Is Better
    aa . 82918.4 |=================================================================
    b .. 36194.8 |============================

LiteRT 2024-10-15
    Model: Inception ResNet V2
    Microseconds < Lower Is Better
    aa . 132102.0 |================================================================
    b .. 90362.0 |============================================

LiteRT 2024-10-15
    Model: SqueezeNet
    Microseconds < Lower Is Better
    a .. 9607.40 |================================================================
    aa . 9561.25 |================================================================
    b .. 9771.82 |=================================================================

XNNPACK b7b048
    Model: QS8MobileNetV2
    us < Lower Is Better
    aa . 17257 |==============================================
    b .. 25089 |===================================================================

XNNPACK b7b048
    Model: FP16MobileNetV3Small
    us < Lower Is Better
    aa . 25785 |===================================================================
    b .. 17479 |=============================================

XNNPACK b7b048
    Model: FP16MobileNetV3Large
    us < Lower Is Better
    aa . 27319 |===================================================================
    b .. 27079 |==================================================================

XNNPACK b7b048
    Model: FP16MobileNetV2
    us < Lower Is Better
    aa . 18525 |===================================================================
    b .. 14703 |=====================================================

XNNPACK b7b048
    Model: FP16MobileNetV1
    us < Lower Is Better
    aa . 7483 |================================================
    b .. 10504 |===================================================================

XNNPACK b7b048
    Model: FP32MobileNetV3Small
    us < Lower Is Better
    aa . 21224 |===================================================================
    b .. 17188 |======================================================

XNNPACK b7b048
    Model: FP32MobileNetV3Large
    us < Lower Is Better
    aa . 22775 |===================================================================
    b .. 22324 |==================================================================

XNNPACK b7b048
    Model: FP32MobileNetV2
    us < Lower Is Better
    aa . 16796 |===================================================================
    b .. 14325 |=========================================================

XNNPACK b7b048
    Model: FP32MobileNetV1
    us < Lower Is Better
    aa . 7130 |===============================================================
    b .. 7699 |====================================================================

oneDNN 3.6
    Harness: Recurrent Neural Network Training - Engine: CPU
    ms < Lower Is Better
    a .. 417.02 |==================================================================
    aa . 418.84 |==================================================================
    b .. 415.62 |=================================================================

oneDNN 3.6
    Harness: Recurrent Neural Network Inference - Engine: CPU
    ms < Lower Is Better
    a .. 387.46 |=================================================================
    aa . 395.49 |==================================================================
    b .. 389.99 |=================================================================

LiteRT 2024-10-15
    Model: Quantized COCO SSD MobileNet v1
    Microseconds < Lower Is Better
    aa . 10521.1 |=================================================================
    b .. 10308.1 |================================================================

oneDNN 3.6
    Harness: Deconvolution Batch shapes_1d - Engine: CPU
    ms < Lower Is Better
    a .. 17.82 |===================================================================
    aa . 15.39 |==========================================================
    b .. 15.75 |===========================================================

oneDNN 3.6
    Harness: IP Shapes 1D - Engine: CPU
    ms < Lower Is Better
    a .. 0.631954 |================================================================
    aa . 0.632095 |================================================================
    b .. 0.634881 |================================================================

oneDNN 3.6
    Harness: IP Shapes 3D - Engine: CPU
    ms < Lower Is Better
    a .. 0.451791 |===============================================================
    aa . 0.458506 |================================================================
    b .. 0.453833 |===============================================================

oneDNN 3.6
    Harness: Convolution Batch Shapes Auto - Engine: CPU
    ms < Lower Is Better
    a .. 0.258396 |================================================================
    aa . 0.259777 |================================================================
    b .. 0.257735 |===============================================================

oneDNN 3.6
    Harness: Deconvolution Batch shapes_3d - Engine: CPU
    ms < Lower Is Better
    a .. 0.429801 |================================================================
    aa . 0.425348 |===============================================================
    b .. 0.430174 |================================================================
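
Since every test here is "Lower Is Better", one way to compare the a, aa and b runs is to divide each value by the fastest value for that test. The following is a minimal Python sketch of that arithmetic, not part of the Phoronix Test Suite: the names `ROW`, `parse_row` and `relative_to_best` are hypothetical helpers, and the regular expression assumes result rows shaped like `a .. 180002 |=====` as they appear in this file.

```python
import re

# Assumed row shape: run label, dot padding, numeric value, then an ASCII bar.
# Example: "aa . 1338560 |=====". This pattern is an assumption based on this file.
ROW = re.compile(r"^\s*(\S+)\s+\.+\s+([0-9.]+)\s+\|=*\s*$")


def parse_row(line):
    """Return (run_label, value) for a result row, or None for any other line."""
    m = ROW.match(line)
    if m is None:
        return None
    return m.group(1), float(m.group(2))


def relative_to_best(rows):
    """Map each run label to value / best value (1.0 = fastest run)."""
    best = min(rows.values())  # lower is better, so the minimum is the best
    return {label: value / best for label, value in rows.items()}


# NASNet Mobile rows copied from the LiteRT results in this file (microseconds).
nasnet = dict(
    parse_row(line)
    for line in [
        "a .. 180002 |=========",
        "aa . 1338560 |=================================================================",
        "b .. 190987 |=========",
    ]
)
ratios = relative_to_best(nasnet)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x the fastest run")
```

Applied to the NASNet Mobile rows, this shows run aa taking roughly 7.4 times as long as run a, while run b is within about 6% of a. The same helper can be pointed at any other block of rows in the file.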