i.mx6solo GPU[0] hang when running qt app


254 views
yiqing
Contributor I

Hello all,

I've been encountering random GUI freezes in a Qt (5.15.2) application running on the i.MX 6Solo platform with the Linux-4.1.1.5-r0 kernel. I've already spent two months trying to resolve this issue without success, so I would greatly appreciate your help.

The issue is rare: we receive about 5 to 10 fault reports per week across thousands of deployed devices, and I have never been able to reproduce it in my lab.

Analysis of the fault logs shows that each incident lasts about 5 to 6 minutes. During that time the GUI process produces no log output and no heartbeat. The logs show that the watchdog application restarts the GUI process several times, but this does not clear the fault. Another prominent characteristic of the fault is that GPU usage stays at 100% until the fault ends. When the fault clears, the kernel logs the following:

user.warn kernel: [galcore]: GPU[0] hang, automatic recovery.

user.warn kernel: [galcore]: recovery done

Afterwards, the GPU usage returns to normal, and the GUI process resumes normal operation.
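To line up incidents across many devices, a small scanner could pick the galcore hang and recovery messages out of the collected logs. This is a minimal sketch that assumes the messages appear exactly as quoted above (other BSP versions may word them differently); `gc_classify_line` and `gc_scan_log` are hypothetical helper names, not part of galcore.

```c
#include <stdio.h>
#include <string.h>

/* Kinds of galcore events of interest when scanning a kernel log. */
typedef enum {
    GC_EVENT_NONE = 0,
    GC_EVENT_HANG,      /* "GPU[0] hang, automatic recovery." */
    GC_EVENT_RECOVERED  /* "recovery done" */
} gc_event;

/* Classify one log line by matching the galcore messages quoted in the post.
 * Assumption: the messages are logged verbatim with a "[galcore]" tag. */
gc_event gc_classify_line(const char *line)
{
    if (line == NULL || strstr(line, "[galcore]") == NULL)
        return GC_EVENT_NONE;
    if (strstr(line, "hang, automatic recovery") != NULL)
        return GC_EVENT_HANG;
    if (strstr(line, "recovery done") != NULL)
        return GC_EVENT_RECOVERED;
    return GC_EVENT_NONE;
}

/* Scan a stream (e.g. a saved syslog file) and print every galcore
 * hang/recovery line prefixed with its line number. Lines longer than
 * the buffer are read in pieces and counted as separate lines.
 * Returns the number of matching lines. */
size_t gc_scan_log(FILE *in, FILE *out)
{
    char buf[1024];
    size_t lineno = 0, hits = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        lineno++;
        if (gc_classify_line(buf) != GC_EVENT_NONE) {
            hits++;
            fprintf(out, "%zu: %s", lineno, buf);
        }
    }
    return hits;
}
```

Calling `gc_scan_log(stdin, stdout)` over a saved log prints each matching line with its line number, which makes it easier to correlate every recovery with the surrounding watchdog restarts and heartbeat gaps.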

From what we have read, most reported GPU hangs are caused by power supply problems or excessive core temperature. According to our logs, the core temperature stays around 60 degrees Celsius with little fluctuation. We have no voltage measurements yet, but the hardware has not changed in the past three years and no similar faults occurred before, which makes a voltage problem unlikely.

The fault is strongly correlated with recent software changes, so I believe the cause is more likely in the software. In the latest release, to achieve certain special effects, I added manual OpenGL API calls to the Qt display framework and used custom shaders for drawing. After the issue first appeared, I moved the shader code into QML to avoid risks related to resource leaks or OpenGL scheduling, but the problem still persists.

What factors at the software level can lead to a GPU hang? Also, the galcore source code shows that the recovery time for a GPU hang is 20 seconds, so why does this fault last 5 to 6 minutes?

Additionally, my current mitigation is to reboot the system whenever a GUI process heartbeat timeout coincides with abnormal GPU usage. This does recover from the fault but causes a 5-second black screen. Is there a way to reset the GPU on its own?
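The reboot trigger described above combines two conditions. A minimal sketch of that decision logic in C, assuming the watchdog already knows the age of the last GUI heartbeat and keeps a short history of GPU load samples (how those are collected on this BSP is not shown); `recovery_policy` and `should_trigger_recovery` are hypothetical names and the thresholds are placeholders:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds; tune them to the device's normal behaviour. */
typedef struct {
    double heartbeat_timeout_s; /* GUI heartbeat considered lost after this */
    int    gpu_busy_pct;        /* load at or above this counts as "busy" */
    size_t min_busy_samples;    /* consecutive busy samples required (>= 1) */
} recovery_policy;

/* Return true only when the GUI heartbeat has timed out AND the most recent
 * `min_busy_samples` GPU load samples are all at or above the busy threshold.
 * `loads` holds n samples ordered oldest-first. */
bool should_trigger_recovery(const recovery_policy *p,
                             double heartbeat_age_s,
                             const int *loads, size_t n)
{
    if (p == NULL || heartbeat_age_s < p->heartbeat_timeout_s)
        return false;
    /* Not enough history (or a zero-sample policy): do not act. */
    if (p->min_busy_samples == 0 || n < p->min_busy_samples)
        return false;
    for (size_t i = n - p->min_busy_samples; i < n; i++) {
        if (loads[i] < p->gpu_busy_pct)
            return false;
    }
    return true;
}
```

Requiring both conditions keeps the watchdog from rebooting on a GUI crash unrelated to the GPU, and requiring several consecutive busy samples avoids reacting to a short load spike.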

Best Regards!

YiQing

2 Replies

201 views
Bio_TICFSL
NXP TechSupport

Hello,

Please check the processor temperature. Overheating could cause its components to hang.
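On many i.MX6 BSPs the SoC temperature can be read from the Linux thermal sysfs interface, which reports millidegrees Celsius. A minimal sketch in C, assuming the `imx_thermal` zone is exposed as `/sys/class/thermal/thermal_zone0/temp` on the target (verify the zone index on the actual board); `parse_millicelsius` and `read_soc_temp_c` are hypothetical helper names:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a thermal-zone "temp" file, which holds an integer in
 * millidegrees Celsius (e.g. "60123\n" -> 60.123). Returns 0 on success. */
int parse_millicelsius(const char *text, double *out_c)
{
    char *end;
    errno = 0;
    long mc = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    *out_c = mc / 1000.0;
    return 0;
}

/* Read the SoC temperature in degrees Celsius from a sysfs path such as
 * "/sys/class/thermal/thermal_zone0/temp" (assumed path, see above).
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_soc_temp_c(const char *path, double *out_c)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    return parse_millicelsius(buf, out_c);
}
```

Logging this value alongside the GUI heartbeat makes it possible to check whether temperature changes in the minutes before each hang.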

Also, please check that the processor is powered correctly, i.e., that the supply voltage levels are within specification and free of noise.

Also, please make sure you are running the latest BSP, as it may include updates to the GPU driver.

Regards


183 views
yiqing
Contributor I
Thank you very much for your reply!

I think temperature or power supply problems are unlikely causes, so upgrading the BSP may be the way forward. However, a BSP upgrade requires substantial development and testing effort and will take considerable time.
Is there a temporary workaround until that work is complete? As I asked earlier, is there a way to reset the GPU on its own?

Regards