litert xnnpack

AMD Ryzen Threadripper 3990X 64-Core testing with a Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS) and AMD Radeon RX 5700 8GB on Pop 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410157-PTS-LITERTXN77&grr&rdt .
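For reproducibility, the Phoronix Test Suite can typically re-run the same test selection directly against this public result identifier (a hedged example; it assumes the suite is installed and the referenced test profiles are still available from OpenBenchmarking.org):

    phoronix-test-suite benchmark 2410157-PTS-LITERTXN77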
Test configurations a, b, and d all ran on the same system:

  Processor: AMD Ryzen Threadripper 3990X 64-Core @ 2.90GHz (64 Cores / 128 Threads)
  Motherboard: Gigabyte TRX40 AORUS PRO WIFI (F6 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 4 x 32GB DDR4-3000MT/s CMK64GX4M2D3000C16
  Disk: Samsung SSD 970 EVO Plus 500GB
  Graphics: AMD Radeon RX 5700 8GB
  Audio: AMD Navi 10 HDMI Audio
  Monitor: DELL P2415Q
  Network: Intel I211 + Intel Wi-Fi 6 AX200
  OS: Pop 22.04
  Kernel: 6.8.0-76060800daily20240311-generic (x86_64)
  Desktop: GNOME Shell 42.5
  Display Server: X Server 1.21.1.4
  OpenGL: 4.6 Mesa 24.0.3-1pop1~1711635559~22.04~7a9f319 (LLVM 15.0.7 DRM 3.57)
  Vulkan: 1.3.274
  Compiler: GCC 11.4.0
  File-System: ext4
  Screen Resolution: 1920x1080

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830107a
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection + spec_rstack_overflow: Mitigation of Safe RET + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
Result Summary (all values in microseconds; fewer is better):

  Test                                       a          b          d
  XNNPACK: QS8MobileNetV2                    5696       5691       6038
  XNNPACK: FP16MobileNetV3Small              6357       5842       6057
  XNNPACK: FP16MobileNetV3Large              9055       7559       7944
  XNNPACK: FP16MobileNetV2                   5908       5983       6324
  XNNPACK: FP16MobileNetV1                   3621       3649       3825
  XNNPACK: FP32MobileNetV3Small              5398       5543       7369
  XNNPACK: FP32MobileNetV3Large              9298       9174       11231
  XNNPACK: FP32MobileNetV2                   6964       6934       6889
  XNNPACK: FP32MobileNetV1                   4280       4238       4243
  LiteRT: Mobilenet Quant                    3191.88    3089.77    3084.06
  LiteRT: SqueezeNet                         6423.64    6349.14    6274.44
  LiteRT: Inception V4                       56528.6    57577.7    56815.6
  LiteRT: DeepLab V3                         10506.6    10343.9    10484.8
  LiteRT: Quantized COCO SSD MobileNet v1    5332.12    5384.80    5207.07
  LiteRT: Inception ResNet V2                59856.0    58681.4    59960.4
  LiteRT: NASNet Mobile                      54458.3    54646.9    53315.5
  LiteRT: Mobilenet Float                    4243.70    4187.56    4173.74
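As an illustrative post-processing step (not part of the exported result), the table can be collapsed into one lower-is-better figure per configuration. The Python sketch below, using only the values printed above, normalizes each test to the fastest of the three runs and reports the geometric mean of those ratios.

    # Illustrative sketch: geometric-mean summary of the result table above.
    # All values are latencies in microseconds, so lower ratios are better.
    from math import prod

    results = {  # copied from the summary table
        "a": [5696, 6357, 9055, 5908, 3621, 5398, 9298, 6964, 4280,
              3191.88, 6423.64, 56528.6, 10506.6, 5332.12, 59856.0, 54458.3, 4243.70],
        "b": [5691, 5842, 7559, 5983, 3649, 5543, 9174, 6934, 4238,
              3089.77, 6349.14, 57577.7, 10343.9, 5384.80, 58681.4, 54646.9, 4187.56],
        "d": [6038, 6057, 7944, 6324, 3825, 7369, 11231, 6889, 4243,
              3084.06, 6274.44, 56815.6, 10484.8, 5207.07, 59960.4, 53315.5, 4173.74],
    }

    best = [min(vals) for vals in zip(*results.values())]  # fastest run per test

    for cfg, vals in results.items():
        ratios = [v / b for v, b in zip(vals, best)]       # 1.0 == fastest
        geomean = prod(ratios) ** (1.0 / len(ratios))
        print(f"{cfg}: geometric mean slowdown vs. per-test best = {geomean:.3f}x")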
XNNPACK b7b048, Model: QS8MobileNetV2 (us, fewer is better)
  a: 5696 (SE +/- 22.24, N = 4)
  b: 5691 (SE +/- 27.75, N = 3)
  d: 6038 (SE +/- 292.13, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV3Small (us, fewer is better)
  a: 6357 (SE +/- 506.60, N = 4)
  b: 5842 (SE +/- 43.47, N = 3)
  d: 6057 (SE +/- 252.57, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV3Large (us, fewer is better)
  a: 9055 (SE +/- 1200.15, N = 4)
  b: 7559 (SE +/- 47.34, N = 3)
  d: 7944 (SE +/- 426.05, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV2 (us, fewer is better)
  a: 5908 (SE +/- 39.38, N = 4)
  b: 5983 (SE +/- 27.67, N = 3)
  d: 6324 (SE +/- 312.09, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP16MobileNetV1 (us, fewer is better)
  a: 3621 (SE +/- 16.80, N = 4)
  b: 3649 (SE +/- 11.67, N = 3)
  d: 3825 (SE +/- 169.19, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV3Small (us, fewer is better)
  a: 5398 (SE +/- 47.52, N = 4)
  b: 5543 (SE +/- 121.75, N = 3)
  d: 7369 (SE +/- 1111.98, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV3Large (us, fewer is better)
  a: 9298 (SE +/- 69.99, N = 4)
  b: 9174 (SE +/- 54.24, N = 3)
  d: 11231 (SE +/- 1046.96, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV2 (us, fewer is better)
  a: 6964 (SE +/- 15.96, N = 4)
  b: 6934 (SE +/- 41.70, N = 3)
  d: 6889 (SE +/- 15.63, N = 3)
  (CXX) g++ options: -O3 -lrt -lm

XNNPACK b7b048, Model: FP32MobileNetV1 (us, fewer is better)
  a: 4280 (SE +/- 46.30, N = 4)
  b: 4238 (SE +/- 34.33, N = 3)
  d: 4243 (SE +/- 41.77, N = 3)
  (CXX) g++ options: -O3 -lrt -lm
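The SE figures in the entries above are presumably the standard error of the mean over the N recorded runs, i.e. the sample standard deviation divided by the square root of N. A minimal sketch of that calculation with made-up per-run latencies, not the suite's actual samples:

    # Illustrative sketch of the "SE +/- x, N = n" figures above: standard
    # error of the mean (sample standard deviation / sqrt(N)).
    from math import sqrt
    from statistics import mean, stdev

    runs_us = [5674.0, 5688.0, 5702.0, 5720.0]  # hypothetical per-run latencies
    n = len(runs_us)
    se = stdev(runs_us) / sqrt(n)
    print(f"{mean(runs_us):.0f} us, SE +/- {se:.2f}, N = {n}")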
LiteRT 2024-10-15, Model: Mobilenet Quant (Microseconds, fewer is better)
  a: 3191.88 (SE +/- 108.69, N = 15)
  b: 3089.77 (SE +/- 81.43, N = 12)
  d: 3084.06 (SE +/- 99.43, N = 12)

LiteRT 2024-10-15, Model: SqueezeNet (Microseconds, fewer is better)
  a: 6423.64 (SE +/- 148.43, N = 15)
  b: 6349.14 (SE +/- 97.62, N = 15)
  d: 6274.44 (SE +/- 24.67, N = 3)

LiteRT 2024-10-15, Model: Inception V4 (Microseconds, fewer is better)
  a: 56528.6 (SE +/- 639.55, N = 3)
  b: 57577.7 (SE +/- 670.44, N = 15)
  d: 56815.6 (SE +/- 464.00, N = 3)

LiteRT 2024-10-15, Model: DeepLab V3 (Microseconds, fewer is better)
  a: 10506.6 (SE +/- 163.08, N = 15)
  b: 10343.9 (SE +/- 84.63, N = 3)
  d: 10484.8 (SE +/- 84.63, N = 3)

LiteRT 2024-10-15, Model: Quantized COCO SSD MobileNet v1 (Microseconds, fewer is better)
  a: 5332.12 (SE +/- 47.15, N = 15)
  b: 5384.80 (SE +/- 72.48, N = 3)
  d: 5207.07 (SE +/- 72.45, N = 3)

LiteRT 2024-10-15, Model: Inception ResNet V2 (Microseconds, fewer is better)
  a: 59856.0 (SE +/- 665.58, N = 3)
  b: 58681.4 (SE +/- 317.66, N = 3)
  d: 59960.4 (SE +/- 319.38, N = 3)

LiteRT 2024-10-15, Model: NASNet Mobile (Microseconds, fewer is better)
  a: 54458.3 (SE +/- 347.90, N = 3)
  b: 54646.9 (SE +/- 560.60, N = 3)
  d: 53315.5 (SE +/- 390.67, N = 3)

LiteRT 2024-10-15, Model: Mobilenet Float (Microseconds, fewer is better)
  a: 4243.70 (SE +/- 22.62, N = 3)
  b: 4187.56 (SE +/- 24.52, N = 3)
  d: 4173.74 (SE +/- 6.16, N = 3)
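LiteRT is the continuation of TensorFlow Lite, so latencies of the same order can be reproduced outside the suite by timing repeated interpreter invocations. The sketch below uses the TensorFlow Lite Python interpreter; the model filename, thread count, and iteration counts are assumptions, and the benchmark binary driven by the test profile may measure things differently.

    # Illustrative sketch: time repeated interpreter invocations to get a mean
    # per-inference latency in microseconds, loosely mirroring the LiteRT numbers.
    # The model file and thread count are assumptions, not the test profile's setup.
    import time
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_1.0_224.tflite",  # hypothetical model
                                      num_threads=64)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"],
                           np.array(np.random.random_sample(inp["shape"]), dtype=np.float32))

    for _ in range(10):            # warm-up invocations
        interpreter.invoke()

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        interpreter.invoke()
    elapsed = time.perf_counter() - start
    print(f"mean latency: {elapsed / iters * 1e6:.1f} us over {iters} runs")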
Phoronix Test Suite v10.8.5