Hi,
- Are you running any application or modifying any GPU related registers before this hang can occur?
No, we don't modify any GPU registers. The error happens at boot, before we start any graphical backend.
- How frequent is the issue?
The issue is intermittent; it occurs in roughly 10% of boots.
- Are you facing issues with the 'gckOS_ReadRegisterEx' function every time you hit this issue, or does the hang happen at a different point in the code each time?
Before we added more debug output, the hang was always found in the "gckOS_ReadRegisterEx" function (in the readl() call). However, we now have several logs where the hang occurs in the "gckOS_WriteRegisterEx" function (in the writel() call).
- Can you provide us some more debug logs inside the function 'gckOS_ReadRegisterEx'?
When the device hangs, it always fails in the readl() or writel() call. I cannot provide debug lines inside readl() or writel(), because the system does not boot if I add them. I assume these are low-level functions that can run in interrupt context, where I cannot print messages.
Attached you can find several logs from different boot sequences where the device hangs. In many cases there is a kernel dump after a few minutes, and the boot process then continues. It seems that the CPU is stuck in a read/write operation and exits on a timeout or something similar. Note that all of the attached kernel dumps are from a 4.9 kernel based on your imx_4.9.88_2.0.0_ga tag, because the product is based on this kernel.
- console_output_failing_on_boot_debug_4_9.txt: Boot with debug messages. The device hangs after the Galcore driver starts. The kernel dump shows the CPU stuck in the gckOS_ReadRegisterEx function.
- console_output_failing_on_boot_more_debug_4_9.txt: Boot with debug messages and timestamps. The device hangs after the Galcore driver starts and continues after 2 min. The kernel dump shows the CPU stuck in the gckOS_WriteRegisterEx function.
- console_output_failing_on_boot_more_debug_4_9_log2.txt: Boot with debug messages and timestamps. The device hangs after the Galcore driver starts and continues after 9 min. The kernel dump shows the CPU stuck in the gckOS_WriteRegisterEx function.
In parallel, we found an issue in the imx_4.9.88_2.0.0_ga tag related to the i.MX6QP/DP platforms. The commit "MLK-16266-02 ARM: imx: Enhance the code to support new TO for imx6qp" introduced a comparison in the file drivers/clk/imx/clk-imx6q.c that uses the function clk_on_imx6q(), which checks for compatibility with "fsl,imx6q", to identify the i.MX6QP/DP platforms:
static inline int clk_on_imx6q(void)
{
	return of_machine_is_compatible("fsl,imx6q");
}
But these platforms do not have that machine compatible string. The i.MX6QP/DP platforms have "fsl,imx6qp", so the function clk_on_imx6qp() should be used instead:
static inline int clk_on_imx6qp(void)
{
	return of_machine_is_compatible("fsl,imx6qp");
}
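To illustrate the mismatch, here is a minimal user-space sketch (not kernel code). It assumes a hypothetical root "compatible" list for an i.MX6QP board; the board string "vendor,example-imx6qp-board" and the helper machine_is_compatible() are made up for illustration, and the helper only approximates of_machine_is_compatible() as a whole-string match against each entry:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical root "compatible" list of an i.MX6QP board (assumption:
 * the list contains "fsl,imx6qp" but not "fsl,imx6q"). */
static const char *const imx6qp_board_compat[] = {
	"vendor,example-imx6qp-board",
	"fsl,imx6qp",
	NULL,
};

/* Simplified stand-in for of_machine_is_compatible(): returns true only
 * when one entry of the list equals the requested string as a whole.
 * A prefix such as "fsl,imx6q" does not match "fsl,imx6qp". */
static bool machine_is_compatible(const char *const *compat, const char *want)
{
	for (size_t i = 0; compat[i] != NULL; i++) {
		if (strcmp(compat[i], want) == 0)
			return true;
	}
	return false;
}

/* Same checks as the two kernel helpers, but taking the list explicitly. */
static bool clk_on_imx6q(const char *const *compat)
{
	return machine_is_compatible(compat, "fsl,imx6q");
}

static bool clk_on_imx6qp(const char *const *compat)
{
	return machine_is_compatible(compat, "fsl,imx6qp");
}
```

With a list like this, clk_on_imx6q(imx6qp_board_compat) is false and clk_on_imx6qp(imx6qp_board_compat) is true, which is why we think a code path guarded by clk_on_imx6q() is skipped on our i.MX6QP/DP boards.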
Attached you can find the patch 0001-ARM-imx-clk-imx6q-fix-clocks-initialization-to-i.MX6.patch, which fixes this on top of imx_4.9.88_2.0.0_ga. Please have a look at this issue and let us know whether the patch is correct and whether there is anything else wrong with the clock initialization. Without that patch, for example, the GPU clocks are not initialized, which may be related to our boot issue in __ResetGPU().
Thanks,
Arturo.