I'm trying to run SSD MobileNet V2 FPNLite 320x320 (models/tf2_detection_zoo.md at master · tensorflow/models · GitHub) on the IMX8MP EVK and am having trouble with the NNAPI delegate.
Here is the benchmark result.
Inference takes around 130 ms, which is much slower than SSD MobileNet V2 without FPN (25 ms).
The NNAPI delegate appears to be split into 3 partitions because the PACK ops (nearest_neighbor_upsampling/stack) are executed on the CPU.
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
QUANTIZE 0.000 0.637 0.614 0.484% 0.484% 0.000 1 [tfl.quantize]:0
TfLiteNnapiDelegate 0.614 21.112 21.080 16.626% 17.110% 0.000 1 …
PACK 21.695 0.022 0.022 0.018% 17.127% 0.000 1 [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/stack]:70
PACK 21.718 0.025 0.026 0.021% 17.148% 0.000 1 [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/stack_1]:71
TfLiteNnapiDelegate 21.745 6.269 6.243 4.924% 22.072% 0.000 1 …
PACK 27.990 0.054 0.053 0.042% 22.115% 0.000 1 [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling_1/stack]:80
PACK 28.044 0.142 0.087 0.069% 22.183% 0.000 1 [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling_1/stack_1]:81
TfLiteNnapiDelegate 28.131 30.216 29.673 23.404% 45.587% 0.000 1 [Squeeze1, convert_scores]:111
DEQUANTIZE 57.806 0.190 0.194 0.153% 45.740% 0.000 1 [Squeeze11]:102
DEQUANTIZE 58.000 4.820 4.055 3.198% 48.938% 0.000 1 [convert_scores1]:107
TFLite_Detection_PostProcess 62.057 66.738 64.740 51.062% 100.000% 0.000 1 [StatefulPartitionedCall:31, StatefulPartitionedCall:32, StatefulPartitionedCall:33, StatefulPartitionedCall:34]:108
=====================================================================
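To make the breakdown easier to read, here is a minimal Python sketch that sums the avg ms column per node type. The rows are copied from the Run Order table above with the remaining columns dropped; the helper name summarize_run_order is my own and is not part of the benchmark tool.

```python
from collections import defaultdict

# Node type, start, first, avg ms: copied from the Run Order table above.
RUN_ORDER = """\
QUANTIZE 0.000 0.637 0.614
TfLiteNnapiDelegate 0.614 21.112 21.080
PACK 21.695 0.022 0.022
PACK 21.718 0.025 0.026
TfLiteNnapiDelegate 21.745 6.269 6.243
PACK 27.990 0.054 0.053
PACK 28.044 0.142 0.087
TfLiteNnapiDelegate 28.131 30.216 29.673
DEQUANTIZE 57.806 0.190 0.194
DEQUANTIZE 58.000 4.820 4.055
TFLite_Detection_PostProcess 62.057 66.738 64.740
"""


def summarize_run_order(text):
    """Return (total avg ms per node type, number of rows per node type)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        node_type, avg_ms = fields[0], float(fields[3])
        totals[node_type] += avg_ms
        counts[node_type] += 1
    return dict(totals), dict(counts)


if __name__ == "__main__":
    totals, counts = summarize_run_order(RUN_ORDER)
    # Print node types from most to least expensive.
    for node_type, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{node_type:30s} x{counts[node_type]}  {ms:8.3f} ms")
```

On these numbers the three TfLiteNnapiDelegate partitions add up to about 57 ms and the four PACK ops to about 0.19 ms, while TFLite_Detection_PostProcess alone accounts for about 65 ms.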
I found that VSI_NN_OP_UPSAMPLE is listed as supported on p. 115 of the i.MX Machine Learning User's Guide (nxp.com).
So I'm wondering why PACK (nearest_neighbor_upsampling/stack) cannot be executed on the NPU.
I'd like to know how to work around this and accelerate SSD MobileNet V2 FPNLite with NNAPI.
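For context on where the PACK ops may come from: the op names in the profile (nearest_neighbor_upsampling/stack and stack_1) suggest that the upsampling is built from stack operations followed by a reshape, rather than from a single resize op. That is an assumption based only on those names. The NumPy sketch below (function names are hypothetical) shows that stacking twice and reshaping gives the same result as plain nearest-neighbor repetition.

```python
import numpy as np


def upsample_by_stack(x, scale=2):
    """Nearest-neighbor upsampling of an NHWC array via stack + reshape."""
    b, h, w, c = x.shape
    y = np.stack([x] * scale, axis=3)  # (b, h, w, scale, c)
    y = np.stack([y] * scale, axis=2)  # (b, h, scale, w, scale, c)
    return y.reshape(b, h * scale, w * scale, c)


def upsample_by_repeat(x, scale=2):
    """Reference: repeat every row and every column `scale` times."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)


if __name__ == "__main__":
    x = np.arange(1 * 2 * 2 * 1, dtype=np.float32).reshape(1, 2, 2, 1)
    same = np.array_equal(upsample_by_stack(x), upsample_by_repeat(x))
    print("stack + reshape matches repeat:", same)
```

If the model is built this way, the converter would emit PACK ops for the two stack calls instead of a resize op, which would be consistent with the two PACK rows per upsampling step in the table.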
Hello,
I faced a similar issue with a UNet-based segmentation model. For some reason, an upsampling layer with "nearest" interpolation (RESIZE_NEAREST_NEIGHBOR) does not work on the NPU, although it works on the CPU with XNNPACK. After I changed the interpolation to "bilinear" (RESIZE_BILINEAR), the model works fine on both the NPU (VX delegate) and the CPU (XNNPACK). I tried to reproduce the problem (and the solution) here: https://github.com/waseemh40/upsampling_segmentation_demo_imx8mp
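A minimal Keras sketch of the change, assuming the model uses tf.keras.layers.UpSampling2D. The input shape, layer sizes and function names are illustrative only and are not taken from the linked repository.

```python
import tensorflow as tf


def build_upsampling_block(interpolation="bilinear"):
    """Tiny stand-in for one decoder stage: a conv followed by 2x upsampling."""
    inputs = tf.keras.Input(shape=(80, 80, 16))  # illustrative shape
    x = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")(inputs)
    # "nearest" corresponds to the RESIZE_NEAREST_NEIGHBOR case described above,
    # "bilinear" to the RESIZE_BILINEAR case.
    x = tf.keras.layers.UpSampling2D(size=(2, 2), interpolation=interpolation)(x)
    return tf.keras.Model(inputs, x)


def to_tflite(model):
    """Convert a Keras model to a TFLite flatbuffer (float, no quantization)."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    return converter.convert()


if __name__ == "__main__":
    model = build_upsampling_block("bilinear")
    print(model.output_shape)
```

Calling to_tflite(model) and running the result through the benchmark tool should show whether the whole graph stays in one delegate partition. Switching the interpolation changes the numerical output of the layer, so accuracy should be rechecked after the change.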
-Waseem