Hello,
Thank you for your reply! We have resolved the model loading issue by fixing the ethosu shared-memory region in the kernel, and the device no longer crashes. However, we are now encountering a different issue with the NPU on our custom board, and we would appreciate your assistance.
The NPU is unable to run inference correctly on our custom board. For example, the following test code works as expected on the i.MX 93 EVK, producing the output shown below it.
Test code:
import tflite_runtime.interpreter as tflite
import numpy as np

def run_inference(interpreter, image):
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    # Quantize the input to the model's int8 scale/zero-point
    input_scale, input_zero_point = input_details[0]["quantization"]
    image = image / input_scale + input_zero_point
    image = image.astype(np.int8)
    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])
    return output

interpreter = tflite.Interpreter(
    model_path="TFLITE_MODEL_PATH",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libethosu_delegate.so")])
interpreter.allocate_tensors()

for i in range(10):
    # Dummy all-zero 416x416 RGB frame with a batch dimension
    image = np.zeros((416, 416, 3), dtype="uint8")
    image = np.expand_dims(image, axis=0)
    print(f"Running for {i}'th time")
    output = run_inference(interpreter, image)
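As a side note, the raw int8 tensor returned by run_inference can be mapped back to real values with the inverse of the input quantization used above. This is only a sketch; dequantize_output is a hypothetical helper, not part of the posted script:

```python
import numpy as np

def dequantize_output(output, output_details):
    # Map raw int8 output back to real values using the output tensor's
    # scale/zero-point, mirroring the input quantization in run_inference.
    scale, zero_point = output_details[0]["quantization"]
    if scale == 0:
        # Tensor is not quantized; just promote to float32
        return output.astype(np.float32)
    return (output.astype(np.float32) - zero_point) * scale
```

It would be called as dequantize_output(output, interpreter.get_output_details()) after run_inference returns.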
Output on the i.MX 93 EVK:
root@imx93evk:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Running for 2'th time
Running for 3'th time
Running for 4'th time
Running for 5'th time
Running for 6'th time
Running for 7'th time
Running for 8'th time
Running for 9'th time
However, on our custom board, it gets stuck during the second inference and eventually fails. Additionally, if we run the script again after that, the NPU does not perform any inference at all until the device is rebooted. The output is as follows:
root@imx93-dtsis:/# python3 demo.py
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EthosuDelegate: 1 nodes delegated out of 3 nodes with 1 partitions.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Running for 0'th time
Running for 1'th time
Traceback (most recent call last):
  File "/demo.py", line 36, in <module>
    main()
  File "/demo.py", line 33, in main
    output = run_inference(interpreter, image)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/demo.py", line 14, in run_inference
    interpreter.invoke()
  File "/usr/lib64/python3.11/site-packages/tflite_runtime/interpreter.py", line 917, in invoke
    self._interpreter.Invoke()
RuntimeError: Ethos_u inference failed
Node number 3 (EthosuDelegate) failed to invoke.
Since we are using 1 GB of LPDDR4, the NPU shared-memory size is aligned to 128 MB according to the link below.
Note that we fixed the ethosu reserved-memory region in the kernel as described in your docs, like the following:

Could this reduced allocation be the cause of the issue? If not, do you have any suggestions for resolving it?
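In case it helps us reproduce faster on the custom board: the delegate's startup log above suggests a configurable timeout, so a shorter value might make a stuck invoke fail quickly instead of blocking for 60 s. This is only a sketch; the "timeout" option key and its nanosecond unit are assumptions inferred from the log line "timeout set to 60000000000":

```python
def ethosu_delegate_options(timeout_ns=5_000_000_000):
    # Build an options dict for tflite.load_delegate(...).
    # "timeout" key and nanosecond unit are assumed from the delegate log;
    # 5_000_000_000 ns = 5 s instead of the default 60 s.
    return {"timeout": str(timeout_ns)}
```

Usage on the board would look like tflite.load_delegate("/usr/lib/libethosu_delegate.so", options=ethosu_delegate_options()).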
Thank you in advance!