Issue with TensorFlow Lite and NPU on iMX93 Module Board


1,091 Views
esmamirhan
Contributor I
Hello,

I am using the i.MX93 with SDK Linux_6.1.55_2.2.0.

I am encountering an issue while running a TensorFlow Lite object detection model on a custom board with the i.MX93 module. When I attempt to start inference with this model, the i.MX93 gets stuck.

Additionally, I have observed that the device consistently gets stuck whenever I access the NPU, even when using the inference_runner and interpreter_runner utilities provided by NXP.

Do you have any ideas or suggestions on what could be causing this problem?

Example usage:

import tflite_runtime.interpreter as tflite

# Load the model with the Ethos-U delegate so inference runs on the NPU
interpreter = tflite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
interpreter.allocate_tensors()

tflite.Interpreter() works fine, but allocate_tensors() is where the CPU gets stuck.
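For reference, one way to confirm the hang is specific to the NPU path is to load the same model without the delegate and run it purely on the CPU; a minimal sketch (the model path is a placeholder):

import numpy as np
import tflite_runtime.interpreter as tflite

# CPU-only interpreter: no experimental_delegates, so /dev/ethosu0 is never touched
interpreter = tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
# Dummy input matching the model's input shape and dtype
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print("CPU-only inference completed")

If this runs cleanly, the model and the tflite_runtime installation are fine, and the problem is isolated to the Ethos-U delegate or the NPU driver.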

 

Thank you in advance!
2 Replies

1,012 Views
Bio_TICFSL
NXP TechSupport

Hello,

Please send your test code so we can check it.

Regards


975 Views
esmamirhan
Contributor I
Hello,

Thank you for your reply! We have resolved the model loading issue by fixing the ethosu shared-memory region in the kernel, and the device no longer crashes. However, we are now encountering a different issue with the NPU on our custom board, and we would appreciate your assistance.

The NPU is unable to run inference correctly. For example, the following test code works as expected on the i.MX 93 EVK, producing the output shown below:

Test code:

import tflite_runtime.interpreter as tflite
import numpy as np

def run_inference(interpreter, image):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Quantize the input to the model's int8 input range
    input_scale, input_zero_point = input_details[0]["quantization"]
    image = image / input_scale + input_zero_point
    image = image.astype(np.int8)

    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()

    output = interpreter.get_tensor(output_details[0]['index'])
    return output

# Load the model on the NPU via the Ethos-U delegate
interpreter = tflite.Interpreter(
    model_path="TFLITE_MODEL_PATH",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
interpreter.allocate_tensors()

# Run ten back-to-back inferences on a dummy all-zero image
for i in range(10):
    image = np.zeros((416, 416, 3), dtype="uint8")
    image = np.expand_dims(image, axis=0)
    print(f"Running for {i}'th time")
    output = run_inference(interpreter, image)

Output on the i.MX 93 EVK:
root@imx93evk:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Running for 2'th time
Running for 3'th time
Running for 4'th time
Running for 5'th time
Running for 6'th time
Running for 7'th time
Running for 8'th time
Running for 9'th time

However, on our custom board, it gets stuck during the second inference. Additionally, if we attempt to run the script again, the NPU does not perform any inference until the device is rebooted. The output is as follows:

 

root@imx93-dtsis:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Traceback (most recent call last):
  File "/demo.py", line 36, in <module>
    main()
  File "/demo.py", line 33, in main
    output = run_inference(interpreter, image)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/demo.py", line 14, in run_inference
    interpreter.invoke()
  File "/usr/lib64/python3.11/site-packages/tflite_runtime/interpreter.py", line 917, in invoke
    self._interpreter.Invoke()
RuntimeError: Ethos_u inference failed
Node number 3 (EthosuDelegate) failed to invoke.
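One detail from the INFO lines above: the delegate accepts options at load time (it prints their names: device_name, timeout, and so on). If the hang is the NPU job timing out, shortening the timeout makes the failure surface quickly instead of blocking for the default 60 s; a sketch, assuming the option keys match the names in the INFO printout:

import tflite_runtime.interpreter as tflite

# Option names are taken from the delegate's own INFO printout above;
# the timeout value is in nanoseconds (default 60000000000 = 60 s)
delegate = tflite.load_delegate(
    "/usr/lib/libethosu_delegate.so",
    options={"device_name": "/dev/ethosu0",
             "timeout": 5000000000})  # fail after 5 s instead of 60 s
interpreter = tflite.Interpreter(model_path="TFLITE_MODEL_PATH",
                                 experimental_delegates=[delegate])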

Since we are using 1 GB of LPDDR4, the NPU shared-memory size is aligned to 128 MB according to the link below.

Note that we fixed the ethosu region in the kernel as described in your docs, like the following:
[screenshot: image_2025_01_09T12_11_17_480Z.png]
 
Could this reduced allocation be the cause of the issue? If not, do you have any suggestions for resolving it?
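In the meantime, one way to double-check what the kernel actually reserved is to read the reserved-memory nodes from the live device tree; a sketch, assuming it is exposed under /sys/firmware/devicetree (typical on ARM64 Linux) and that address and size cells are 64-bit:

import glob
import struct

# List the reserved-memory regions the kernel picked up from the device tree.
# Assumes 2 address cells and 2 size cells (64-bit each), which is common on ARM64.
for reg_path in glob.glob("/sys/firmware/devicetree/base/reserved-memory/*/reg"):
    raw = open(reg_path, "rb").read()
    if len(raw) >= 16:
        base, size = struct.unpack(">QQ", raw[:16])  # device tree cells are big-endian
        print(f"{reg_path}: base=0x{base:x}, size={size // (1024 * 1024)} MiB")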
 

Thank you in advance!