Issue with TensorFlow Lite and NPU on iMX93 Module Board

esmamirhan — Fri, 27 Dec 2024 11:50:02 GMT

Hello,

I am using iMX93 with SDK Linux_6.1.55_2.2.0.

I am encountering an issue while running a TensorFlow Lite object detection model on a custom board with the iMX93 module. When I attempt to start inference with this model, the iMX93 gets stuck.

Additionally, I have observed that the device consistently gets stuck whenever I access the NPU, even when using the inference_runner and interpreter_runner utilities provided by NXP.

Do you have any ideas or suggestions on what could be causing this problem?

Example usage:

import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
interpreter.allocate_tensors()

Creating the tflite.Interpreter works, but allocate_tensors() causes the CPU to get stuck.

Thank you in advance!

Re: Issue with TensorFlow Lite and NPU on iMX93 Module Board

Bio_TICFSL — Thu, 02 Jan 2025 14:37:38 GMT

Hello,

Please send your test code so we can check it.

Regards

Re: Issue with TensorFlow Lite and NPU on iMX93 Module Board

esmamirhan — Thu, 09 Jan 2025 12:43:23 GMT

Hello,

Thank you for your reply! We have resolved the model loading issue by fixing the ethosu shared memory region in the kernel, and the device is no longer crashing. However, we are now encountering a different issue with the NPU on our custom board, and we would appreciate your assistance.

The NPU is unable to run inference correctly. For example, the following test code works as expected on the i.MX 93 EVK:

Test code:

import tflite_runtime.interpreter as tflite
import numpy as np


def run_inference(interpreter, image):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Quantize the input using the model's scale and zero point.
    input_scale, input_zero_point = input_details[0]["quantization"]
    image = image / input_scale + input_zero_point
    image = image.astype(np.int8)

    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])
    return output


# Load the model with the Ethos-U delegate so supported nodes run on the NPU.
interpreter = tflite.Interpreter(
    model_path="TFLITE_MODEL_PATH",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
interpreter.allocate_tensors()

# Run ten inferences on an all-zero 416x416 RGB image.
for i in range(10):
    image = np.zeros((416, 416, 3), dtype="uint8")
    image = np.expand_dims(image, axis=0)
    print(f"Running for {i}'th time")
    output = run_inference(interpreter, image)
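A side note on the quantization step in the test code: casting straight to int8 wraps around silently if a scaled value falls outside [-128, 127], and the division fails if the input tensor reports a scale of 0 (which, as far as I know, is what TensorFlow Lite reports for a non-quantized tensor). This is unlikely to be related to the NPU failure, since the test image is all zeros, but a more defensive version could look like the sketch below. `quantize_input` is a hypothetical helper name, not part of tflite_runtime, and the rounding step is my addition (the original truncates).

```python
import numpy as np


def quantize_input(image, scale, zero_point):
    """Quantize an image to int8 using the tensor's scale and zero point.

    Hypothetical helper: rounds to the nearest integer and clips to the
    int8 range instead of letting the cast wrap around.
    """
    if scale == 0:
        # Assumption: a scale of 0 means the tensor is not quantized.
        raise ValueError("input tensor reports no quantization parameters")
    scaled = np.round(image.astype(np.float32) / scale) + zero_point
    return np.clip(scaled, -128, 127).astype(np.int8)
```

In the test code, this would replace the two lines that compute and cast `image` inside `run_inference`.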

Output on the i.MX 93 EVK:

root@imx93evk:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Running for 2'th time
Running for 3'th time
Running for 4'th time
Running for 5'th time
Running for 6'th time
Running for 7'th time
Running for 8'th time
Running for 9'th time

However, on our custom board, it gets stuck during the second inference. Additionally, if we attempt to run the script again, the NPU does not perform any inference until the device is rebooted. The output is as follows:

root@imx93-dtsis:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Traceback (most recent call last):
  File "/demo.py", line 36, in <module>
    main()
  File "/demo.py", line 33, in main
    output = run_inference(interpreter, image)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/demo.py", line 14, in run_inference
    interpreter.invoke()
  File "/usr/lib64/python3.11/site-packages/tflite_runtime/interpreter.py", line 917, in invoke
    self._interpreter.Invoke()
RuntimeError: Ethos_u inference failed
Node number 3 (EthosuDelegate) failed to invoke.
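To pin down which call fails and how long it blocks before failing, each invoke() can be wrapped and timed. The sketch below is only a diagnostic aid under a few assumptions: `invoke_with_report` is a hypothetical helper name, the input tensor has already been set with set_tensor(), and failures surface as RuntimeError as in the traceback above. If the failing run takes about as long as the delegate's configured timeout, that would point to the NPU never signalling completion rather than rejecting the job immediately.

```python
import time
from typing import Any, Dict, List


def invoke_with_report(interpreter: Any, runs: int = 10) -> List[Dict[str, Any]]:
    """Call interpreter.invoke() up to `runs` times and record each outcome.

    Stops at the first failure, since the thread reports that the NPU
    performs no further inference after a failure until a reboot.
    """
    report = []
    for i in range(runs):
        start = time.monotonic()
        try:
            interpreter.invoke()
            error = None
        except RuntimeError as exc:  # raised by tflite_runtime when invoke fails
            error = str(exc)
        elapsed = time.monotonic() - start
        report.append({"run": i, "ok": error is None,
                       "seconds": elapsed, "error": error})
        if error is not None:
            break
    return report
```

Usage, after allocate_tensors() and set_tensor(): `for row in invoke_with_report(interpreter): print(row)`.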

Since we are using 1 GB of LPDDR4, we sized the NPU shared memory region at 128 MB, following the link below.

https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/Config-Tool-Introduction-and-Use/ta-p/1986035

Note that we fixed the ethosu region in the kernel as described in your docs, as follows:

Could this reduced allocation be the cause of the issue? If not, do you have any suggestions for resolving it?

Thank you in advance!
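For the question of whether the 128 MB region is too small, a rough first check is to add up the sizes of the tensors the interpreter reports and compare the total with the reserved size. The sketch below does this from the output of get_tensor_details(). `estimate_tensor_bytes` is a hypothetical helper name, and the result is only an order-of-magnitude estimate: it does not account for how the ethosu driver actually uses the reserved region, which this thread does not describe.

```python
import math

import numpy as np


def estimate_tensor_bytes(tensor_details):
    """Sum the byte sizes of all tensors listed in `tensor_details`.

    Expects a list of dicts with "shape" and "dtype" keys, as returned
    by interpreter.get_tensor_details(). A scalar (empty shape) counts
    as one element.
    """
    total = 0
    for detail in tensor_details:
        count = math.prod(int(dim) for dim in detail["shape"])
        total += count * np.dtype(detail["dtype"]).itemsize
    return total
```

Usage: `print(estimate_tensor_bytes(interpreter.get_tensor_details()) / 2**20, "MiB")`. A total far below 128 MB would suggest looking beyond the region size alone.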
