How to use the NPU delegate for running inference from Python

amrithkrish
Contributor I

Hello team,

I have a custom TFLite model for object detection, and I want to run inference with it on an i.MX 8M Plus board.

The Python script I have written performs the inference with the default delegate ("XNNPACK delegate for CPU").

I would like to use the NPU for running the inference on the board, so I tried changing the delegate to libvx_delegate in the tflite.Interpreter in my script, as shown:

 

delegate = tflite.load_delegate('/usr/lib/libvx_delegate.so')
ModelInterpreter = tflite.Interpreter(model_path=ModelPath, experimental_delegates=[delegate])

 

However, when I print the delegate object, it shows as:

 

Delegate used :  <tflite_runtime.interpreter.Delegate object at 0xffff70472390>

 

 

What should I change or add in my Python script so that it uses the NPU for inference?

Thanks in advance!

amrithkrish
Contributor I

Hi Brian,

I checked the code that was shared and implemented a similar Python script. However, my custom model takes around 35 seconds to perform one inference when libvx_delegate.so is used (I excluded the warm-up time; see the timing sketch after the questions).

So my questions are:

1. Is this expected?

2. Does the model itself have any influence on the inference time (for example, the model size)?

3. Can this time be reduced by using NNStreamer instead of running the inference from Python?
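
For reference, this is roughly how I excluded the warm-up when timing (a minimal sketch; ModelInterpreter is assumed to be the interpreter created with the VX delegate, as in my first post):

import time

# The first invoke includes the delegate's one-time graph compilation (warm-up).
start = time.perf_counter()
ModelInterpreter.invoke()
print("Warm-up invoke: %.2f s" % (time.perf_counter() - start))

# Steady-state latency, averaged over several runs.
runs = 10
start = time.perf_counter()
for _ in range(runs):
    ModelInterpreter.invoke()
print("Average inference: %.3f s" % ((time.perf_counter() - start) / runs))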

 

Thanks in advance!

brian14
NXP TechSupport

Thank you for your reply.

1. Is this expected?

It depends on your model, but 35 seconds per inference is very poor performance.

2. Does the model itself have any influence on the inference time (for example, the model size)?

Yes. In your implementation the model itself accounts for most of the time, so properties such as its size and its operations affect the inference time.

3. Can this time be reduced by using NNStreamer instead of running the inference from Python?

It seems that the problem is with your model's optimization. I suggest reviewing your model in detail by profiling it and running benchmarks.
Please have a look at the i.MX Machine Learning User's Guide, specifically section 9, NN Execution on Hardware Accelerators:
i.MX Machine Learning User's Guide (nxp.com)
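
As a concrete starting point, the BSP images ship TensorFlow Lite's benchmark_model tool, which can run your model through the VX delegate and report per-run latency (the tool's install location varies between BSP releases, so the path below is an assumption):

./benchmark_model --graph=your_model.tflite --external_delegate_path=/usr/lib/libvx_delegate.so

Comparing the numbers with and without --external_delegate_path shows how much of the time comes from the model itself versus the delegate.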

Have a great day!

amrithkrish
Contributor I

Hi @brian14 

I have one more doubt. When I try to run my custom YOLOv3 model using the inference script you mentioned before, I get a "Segmentation fault", but the script runs fine with other models.

Why is this error produced for this particular model? Could the size of the model be a factor?

 

Thanks in advance!

 

brian14
NXP TechSupport

Hi @amrithkrish,

Based on your error, it is possible that the crash is related to the model size.
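
One way to narrow this down (a minimal sketch; the model path is a placeholder) is to load the model on the CPU first, without any delegate, and inspect its size and input tensors. If this also crashes, the problem is in the model itself rather than in the NPU delegate:

import os
import tflite_runtime.interpreter as tflite

MODEL_PATH = 'custom_yolov3.tflite'  # placeholder path

# Very large models can exhaust memory on the board; report the file size first.
print('Model size: %.1f MB' % (os.path.getsize(MODEL_PATH) / 1e6))

# Load without a delegate to separate model issues from delegate issues.
interpreter = tflite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
for detail in interpreter.get_input_details():
    print(detail['name'], detail['shape'], detail['dtype'])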

 

amrithkrish
Contributor I

Thank you for the information.

 

 

brian14
NXP TechSupport

Hi @amrithkrish

Thank you for contacting NXP Support!

I have reviewed your Python code, and it seems that you are correctly loading the external delegate.
However, I'm not sure why you are trying to print the delegate.

If you want to obtain the output from your model using the NPU, you will need to implement your code following this example:

tflite-vx-delegate-imx/examples/python/label_image.py at lf-6.1.36_2.1.0 · nxp-imx/tflite-vx-delegat...
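
In outline, that example reduces to the following (a minimal sketch using the standard tflite_runtime API; the model path and the zero-filled dummy input are placeholders, and real pre- and post-processing are model-specific):

import numpy as np
import tflite_runtime.interpreter as tflite

MODEL_PATH = 'model.tflite'  # placeholder

# Load the VX delegate so supported operations are dispatched to the NPU.
delegate = tflite.load_delegate('/usr/lib/libvx_delegate.so')
interpreter = tflite.Interpreter(model_path=MODEL_PATH,
                                 experimental_delegates=[delegate])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with the model's expected shape and dtype.
data = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])

interpreter.set_tensor(input_details[0]['index'], data)
interpreter.invoke()  # the first invoke also compiles the graph for the NPU (warm-up)
output = interpreter.get_tensor(output_details[0]['index'])
print(output.shape)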

I hope this information will be helpful.

Have a great day!

amrithkrish
Contributor I

Hi Brian,

Thank you for your support.

I printed the delegate to check whether it shows as an external delegate (as expected when using the NPU).

I shall look into the code you shared.

Thanks once again!

 
