IMX8MP EVK / tflite PACK(upsampling) NNAPI delegate

Kei-Ueda
Contributor I

I'm trying to use SSD MobileNet V2 FPNLite 320x320 (from the TensorFlow 2 Detection Model Zoo, models/tf2_detection_zoo.md in tensorflow/models on GitHub) on the IMX8MP EVK, and I'm running into trouble with the NNAPI delegate.

Here is the benchmark result.

Inference takes around 130 ms, which is much slower than SSD MobileNet V2 without FPN (25 ms).

The NNAPI delegate appears to be split into three partitions, separated by PACK ops (nearest_neighbor_upsampling/stack) that run on the CPU.

============================== Run Order ==============================
[node type]                    [start]   [first]  [avg ms]       [%]    [cdf%]  [mem KB]  [times called]  [Name]
QUANTIZE                         0.000     0.637     0.614    0.484%    0.484%     0.000               1  [tfl.quantize]:0
TfLiteNnapiDelegate              0.614    21.112    21.080   16.626%   17.110%     0.000               1  …
PACK                            21.695     0.022     0.022    0.018%   17.127%     0.000               1  [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/stack]:70
PACK                            21.718     0.025     0.026    0.021%   17.148%     0.000               1  [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/stack_1]:71
TfLiteNnapiDelegate             21.745     6.269     6.243    4.924%   22.072%     0.000               1  …
PACK                            27.990     0.054     0.053    0.042%   22.115%     0.000               1  [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling_1/stack]:80
PACK                            28.044     0.142     0.087    0.069%   22.183%     0.000               1  [ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling_1/stack_1]:81
TfLiteNnapiDelegate             28.131    30.216    29.673   23.404%   45.587%     0.000               1  [Squeeze1, convert_scores]:111
DEQUANTIZE                      57.806     0.190     0.194    0.153%   45.740%     0.000               1  [Squeeze11]:102
DEQUANTIZE                      58.000     4.820     4.055    3.198%   48.938%     0.000               1  [convert_scores1]:107
TFLite_Detection_PostProcess    62.057    66.738    64.740   51.062%  100.000%     0.000               1  [StatefulPartitionedCall:31, StatefulPartitionedCall:32, StatefulPartitionedCall:33, StatefulPartitionedCall:34]:108
=====================================================================
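Summing the [avg ms] column by node type makes the split easier to read. The Python sketch below only re-adds the figures copied from the Run Order table; the numbers come from the benchmark, and the grouping is purely for illustration.

```python
from collections import defaultdict

# (node type, avg ms) pairs copied from the Run Order table above
RUN_ORDER = [
    ("QUANTIZE", 0.614),
    ("TfLiteNnapiDelegate", 21.080),
    ("PACK", 0.022),
    ("PACK", 0.026),
    ("TfLiteNnapiDelegate", 6.243),
    ("PACK", 0.053),
    ("PACK", 0.087),
    ("TfLiteNnapiDelegate", 29.673),
    ("DEQUANTIZE", 0.194),
    ("DEQUANTIZE", 4.055),
    ("TFLite_Detection_PostProcess", 64.740),
]


def totals_by_type(rows):
    """Sum the average time (ms) of all nodes that share a node type."""
    totals = defaultdict(float)
    for node_type, avg_ms in rows:
        totals[node_type] += avg_ms
    return dict(totals)


if __name__ == "__main__":
    totals = totals_by_type(RUN_ORDER)
    overall = sum(totals.values())
    # Print node types from most to least expensive, with their share of the total
    for node_type, ms in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{node_type:30s} {ms:8.3f} ms  {100 * ms / overall:6.2f}%")
    print(f"{'total':30s} {overall:8.3f} ms")
```

By this arithmetic, the four PACK nodes add up to less than 0.2 ms, while TFLite_Detection_PostProcess on the CPU accounts for about 65 ms, roughly half of the ~127 ms total. Any extra cost caused by splitting the delegate into three partitions would most likely appear inside the TfLiteNnapiDelegate rows rather than in the PACK rows.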

I found a description of VSI_NN_OP_UPSAMPLE support in the i.MX Machine Learning User's Guide (nxp.com), p. 115.

So I'm wondering why PACK (nearest_neighbor_upsampling/stack) cannot be executed on the NPU.

I'd like to know how to work around this so that SSD MobileNet V2 FPNLite can be accelerated with NNAPI.
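The layer names in the table (nearest_neighbor_upsampling/stack and stack_1) suggest that the FPN top-down path does not use a resize op at all. As far as can be told from those names, the upsampling is built from stack and reshape, so the converted graph contains PACK (and RESHAPE) instead of RESIZE_NEAREST_NEIGHBOR, and an NPU upsample kernel such as VSI_NN_OP_UPSAMPLE would presumably never be matched to it. The NumPy sketch below (the function name is hypothetical) shows that this stack-then-reshape pattern is numerically identical to 2x nearest-neighbour upsampling of an NHWC tensor.

```python
import numpy as np


def stack_reshape_upsample(x, scale=2):
    """Nearest-neighbour upsampling of an NHWC tensor built from stack + reshape.

    Hypothetical helper mirroring the stack/stack_1 pattern in the layer names;
    each np.stack here would become one PACK op after TFLite conversion.
    """
    n, h, w, c = x.shape
    y = np.stack([x] * scale, axis=3)  # (n, h, w, scale, c): duplicate along width
    y = np.stack([y] * scale, axis=2)  # (n, h, scale, w, scale, c): duplicate along height
    return y.reshape(n, h * scale, w * scale, c)


if __name__ == "__main__":
    x = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
    # Reference: repeat every pixel `scale` times along H and then along W
    ref = np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)
    print(np.array_equal(stack_reshape_upsample(x), ref))
```

If that is what the model does, getting this step onto the NPU means changing how the upsampling is expressed in the model, not changing delegate settings.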

waseem
Contributor I

Hello,

I faced a similar issue with a U-Net-based segmentation model. For some reason, an upsampling layer with "nearest" interpolation (RESIZE_NEAREST_NEIGHBOR) does not run on the NPU, although it works on the CPU with XNNPACK. I changed the interpolation to "bilinear" (RESIZE_BILINEAR), and the model then works fine on both the NPU (VX delegate) and the CPU (XNNPACK). I tried to reproduce the problem (and the solution) here: https://github.com/waseemh40/upsampling_segmentation_demo_imx8mp
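Switching from nearest to bilinear is not numerically neutral: the upsampled values change, so a model trained with nearest-neighbour upsampling should at least be re-validated (or fine-tuned) after the switch. Below is a minimal 1-D sketch in plain Python. It assumes half-pixel-centre sampling with clamping at the edges, which is the convention TF2 bilinear resizing is assumed to follow here; both function names are hypothetical.

```python
import math


def nearest_upsample_1d(x, scale):
    """Nearest-neighbour: every input sample is simply repeated `scale` times."""
    return [x[o // scale] for o in range(len(x) * scale)]


def bilinear_upsample_1d(x, scale):
    """Linear interpolation with half-pixel centres, clamped at the borders."""
    n = len(x)
    out = []
    for o in range(n * scale):
        src = (o + 0.5) / scale - 0.5   # map output index to input coordinate
        lo_f = math.floor(src)
        lo = max(lo_f, 0)               # clamp left neighbour to the first sample
        hi = min(math.ceil(src), n - 1) # clamp right neighbour to the last sample
        frac = src - lo_f
        out.append(x[lo] + (x[hi] - x[lo]) * frac)
    return out


if __name__ == "__main__":
    row = [0.0, 4.0]
    print(nearest_upsample_1d(row, 2))   # [0.0, 0.0, 4.0, 4.0]
    print(bilinear_upsample_1d(row, 2))  # [0.0, 1.0, 3.0, 4.0]
```

Only the interior samples differ in this tiny example, but in a real network the change affects every upsampled feature map, which is why accuracy is worth re-checking after the swap.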

-Waseem

Kei-Ueda
Contributor I

Hello,

I apologize for the delay in replying.
I had also found and resolved this issue myself, but your information gives me more confidence.
Thank you for your helpful comment.

-K.Ueda
