Hello,
I use an nnstreamer pipeline to run a detection model on video streams on an i.MX8MP (scarthgap 6.6.23).
It works fine at first. But after a random delay (anywhere from a few tens of seconds to a few minutes), the inference output rate suddenly drops dramatically (typically from 15 to 0.3 FPS on our proprietary model).
At the same time, I observe that the load of the 2nd GPU (GC8000) rises to 100% and stays at 100% until the pipeline stops.
I also notice weird display glitches on the Weston desktop, which disappear when the pipeline stops.
For IP reasons, I can't share my model, but I managed to reproduce the bug with a YOLOv5 tflite model publicly available on Kaggle at https://www.kaggle.com/models/kaggle/yolo-v5?select=1.tflite.
Steps to reproduce the problem:
# Download the model from the Kaggle website (https://www.kaggle.com/models/kaggle/yolo-v5?select=1.tflite)
# This provides the 1.tflite file
# Sorry, I did not find any way to get a direct download URL on Kaggle's website. You'll have to download it manually from your browser (no need to register on Kaggle's website)
# Optionally: Enable VX Caching
export VIV_VX_CACHE_BINARY_GRAPH_DIR=/root/.cache/vxdelegate/
export VIV_VX_ENABLE_CACHE_GRAPH_BINARY=1
# Optionally: Enable nnshark
export GST_DEBUG="GST_TRACER:7"
export GST_TRACERS="live"
# Run the pipeline
gst-launch-1.0 videotestsrc \
! video/x-raw, format=YUY2, width=320, height=320, framerate=20/1 \
! queue max-size-buffers=1 max-size-bytes=0 max-size-time=0 leaky=downstream \
! videoconvert n-threads=4 \
! video/x-raw, format=RGB \
! tensor_converter set-timestamp=false \
! tensor_transform mode=dimchg option=0:2 \
! tensor_transform mode=arithmetic option=typecast:float32,div:255 \
! tensor_filter framework=tensorflow-lite model=1.tflite custom=Delegate:External,ExtDelegateLib:libvx_delegate.so \
! fakesink
With this setup, the pipeline initially infers at a rate of ~0.85 FPS (inference time of 1.195 s reported in nnshark) and the GC8000 load is between 10 and 40%.
After a few minutes (~2 min 30 s on my setup), the bug happens. The rate drops to ~0.05 FPS (inference time of 19.181 s) and the GC8000 load is stuck at 100%.
This can be observed:
Thank you in advance for any help!
Replying to my own question:
The processor heats up considerably when the NPU is under heavy load.
For some reason, my development board was not equipped with a heat sink.
Installing a heat sink on the processor seems to have solved my problem. Adding a fan may also be necessary for production use (still to be checked).
So, for the moment, I consider my problem solved.
Hopefully, this post can help someone with the same problem.
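For anyone who wants to check whether overheating is involved on their own board, here is a minimal sketch that prints the temperature of each thermal zone exposed by the standard Linux sysfs thermal interface (/sys/class/thermal). The helper names millideg_to_c and log_thermal_zones are my own, and I am assuming the BSP exposes the SoC sensors there with readings in millidegrees Celsius; zone names and counts will vary by board and kernel.

```shell
#!/bin/sh
# Convert a sysfs millidegree Celsius reading to degrees with one decimal
# (truncated, not rounded). Assumes a non-negative reading.
millideg_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Print one line per readable thermal zone: time, zone type, temperature.
# Zones without a readable temp file are skipped.
log_thermal_zones() {
    for zone in /sys/class/thermal/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s %s %s C\n' \
            "$(date +%T)" \
            "$(cat "$zone/type")" \
            "$(millideg_to_c "$(cat "$zone/temp")")"
    done
}

# Take one sample now.
log_thermal_zones
```

Calling log_thermal_zones every few seconds from a second shell while the pipeline runs (for example in a while loop with sleep 5) should show whether the temperature keeps climbing up to the moment the FPS drops.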