How to use the NPU delegate for running inference from Python

amrithkrish
Contributor I

Hello team,

I have a custom TFLite model for object detection, and I want to run inference with it on an i.MX 8M Plus board.

The Python script I have written performs the inference with the default delegate ("XNNPACK delegate for CPU").

I would like to use the NPU for running the inference on the board, so I tried changing the delegate to libvx_delegate in the tflite.Interpreter in my script, as shown:

 

delegate = tflite.load_delegate('/usr/lib/libvx_delegate.so')
ModelInterpreter = tflite.Interpreter(model_path=ModelPath, experimental_delegates=[delegate])

 

However, when I print the delegate object, it shows as:

 

Delegate used :  <tflite_runtime.interpreter.Delegate object at 0xffff70472390>

 

 

What should I change or add in my Python script so that it uses the NPU for inference?

Thanks in advance!

amrithkrish
Contributor I

Hi Brian,

I checked the code that was shared and implemented a similar Python script. However, my custom model takes around 35 seconds to perform one inference when libvx_delegate.so is used (I excluded the warm-up time; see the timing sketch after the questions).

So my questions are:

1. Is this expected?

2. Does the model itself have any influence on the inference time (for example, the model size)?

3. Can this time be reduced by using NNStreamer instead of running the inference from Python?
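
For reference, this is roughly how I excluded the warm-up when timing (a minimal sketch; ModelInterpreter is assumed to be the interpreter created with the VX delegate, as in my first post):

import time

# The first invoke includes the delegate's one-time graph compilation (warm-up).
start = time.perf_counter()
ModelInterpreter.invoke()
print("Warm-up invoke: %.2f s" % (time.perf_counter() - start))

# Steady-state latency, averaged over several runs.
runs = 10
start = time.perf_counter()
for _ in range(runs):
    ModelInterpreter.invoke()
print("Average inference: %.3f s" % ((time.perf_counter() - start) / runs))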

 

Thanks in advance!

brian14
NXP TechSupport

Thank you for your reply.

1. Is this expected?

It depends on your model, but 35 seconds per inference is very poor performance.

2. Does the model itself have any influence on the inference time (for example, the model size)?

Yes. In your implementation the model itself accounts for most of the time, so properties such as its size and its operations affect the inference time.

3. Can this time be reduced by using NNStreamer instead of running the inference from Python?

It seems that the problem is with your model's optimization. I suggest reviewing your model in detail by profiling it and running benchmarks.
Please have a look at the i.MX Machine Learning User's Guide, specifically section 9, NN Execution on Hardware Accelerators:
i.MX Machine Learning User's Guide (nxp.com)
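
As a concrete starting point, the BSP images ship TensorFlow Lite's benchmark_model tool, which can run your model through the VX delegate and report per-run latency (the tool's install location varies between BSP releases, so the path below is an assumption):

./benchmark_model --graph=your_model.tflite --external_delegate_path=/usr/lib/libvx_delegate.so

Comparing the numbers with and without --external_delegate_path shows how much of the time comes from the model itself versus the delegate.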

Have a great day!

amrithkrish
Contributor I

Hi @brian14 

I have one more doubt. When I try to run my custom YOLOv3 model using the inference script you mentioned before, I get a "Segmentation fault", but the script runs fine with other models.

Why is this error produced for this particular model? Could the size of the model be a factor?

 

Thanks in advance!

 

brian14
NXP TechSupport

Hi @amrithkrish,

Based on your error, it is possible that the crash is related to the model size.
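
One way to narrow this down (a minimal sketch; the model path is a placeholder) is to load the model on the CPU first, without any delegate, and inspect its size and input tensors. If this also crashes, the problem is in the model itself rather than in the NPU delegate:

import os
import tflite_runtime.interpreter as tflite

MODEL_PATH = 'custom_yolov3.tflite'  # placeholder path

# Very large models can exhaust memory on the board; report the file size first.
print('Model size: %.1f MB' % (os.path.getsize(MODEL_PATH) / 1e6))

# Load without a delegate to separate model issues from delegate issues.
interpreter = tflite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
for detail in interpreter.get_input_details():
    print(detail['name'], detail['shape'], detail['dtype'])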

 

amrithkrish
Contributor I

Thank you for the information.

 

 

brian14
NXP TechSupport

Hi @amrithkrish

Thank you for contacting NXP Support!

I have reviewed your Python code, and it seems that you are correctly loading the external delegate.
However, I'm not sure why you are trying to print the delegate.

If you want to obtain the output from your model using the NPU, you will need to implement your code following this example:

tflite-vx-delegate-imx/examples/python/label_image.py at lf-6.1.36_2.1.0 · nxp-imx/tflite-vx-delegat...
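
In outline, that example reduces to the following (a minimal sketch using the standard tflite_runtime API; the model path and the zero-filled dummy input are placeholders, and real pre- and post-processing are model-specific):

import numpy as np
import tflite_runtime.interpreter as tflite

MODEL_PATH = 'model.tflite'  # placeholder

# Load the VX delegate so supported operations are dispatched to the NPU.
delegate = tflite.load_delegate('/usr/lib/libvx_delegate.so')
interpreter = tflite.Interpreter(model_path=MODEL_PATH,
                                 experimental_delegates=[delegate])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the model's expected shape and dtype.
data = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], data)
interpreter.invoke()  # the first invoke also compiles the graph for the NPU (warm-up)
output = interpreter.get_tensor(output_details[0]['index'])
print(output.shape)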

I hope this information will be helpful.

Have a great day!

amrithkrish
Contributor I

Hi Brian,

Thank you for your support.

I printed the delegate to check whether it shows as an external delegate (as expected when using the NPU).

I shall look into the code you shared.

Thanks once again!

 
