Watchdog reboot on overloading GPU in SCM IMX6Q

parthasarathyr
Contributor III

Hi,

I am using the Linux imx-4.1.15-2.0.0_ga release with a Yocto krogoth build.

I stress-tested the GPU with the glmark2 application to check whether the IMX6 SCM overheats while the GPU is running.

The IMX6 SCM very quickly reaches the 85 degree threshold and reduces the GPU clock to 1/64. I launched multiple instances of glmark2 to overheat the module.
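
For anyone reproducing this test, the threshold check described above can be sketched in C. This is a minimal sketch, not the BSP's own thermal code: it assumes the SoC temperature is exposed through the standard Linux thermal sysfs file (for example /sys/class/thermal/thermal_zone0/temp, in millidegrees Celsius; the zone index is an assumption and may differ on a given board). The function names and the 85000 constant are illustrative, with the constant taken from the 85 degree figure in this post.

```c
#include <stdio.h>
#include <stdlib.h>

/* 85 degrees Celsius, the threshold reported in this post, in millidegrees. */
#define GPU_THROTTLE_MILLI_C 85000L

/* Parse a sysfs thermal reading such as "84500\n".
 * Returns 0 on success, -1 if the text does not start with a number. */
int parse_milli_celsius(const char *text, long *out)
{
    char *end;
    long value = strtol(text, &end, 10);

    if (end == text)
        return -1;
    *out = value;
    return 0;
}

/* Non-zero when the reading is at or above the assumed throttle threshold. */
int at_or_above_throttle(long milli_c)
{
    return milli_c >= GPU_THROTTLE_MILLI_C;
}

/* Read one temperature value from a sysfs file.
 * The path is supplied by the caller; which zone maps to the SoC
 * is an assumption to verify on the target. */
int read_zone_temp(const char *path, long *out)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_milli_celsius(buf, out);
}
```

Calling read_zone_temp() in a loop while glmark2 runs, and logging when at_or_above_throttle() first returns non-zero, would show how quickly the module reaches the threshold before the system becomes non-responsive.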

During this test I occasionally get a system reboot, which happens after the system has been non-responsive for some time. The display goes blank when the system enters the non-responsive state.

Is there a recent patch that fixes this issue?

Is there any known issue specific to the SCM module?

There is a separate release for SCM: https://community.nxp.com/docs/DOC-333955

Will this make any difference in the BSP/kernel, or fix this issue?

Looking forward to a response. In the meantime I am planning to build the above release and test it.

Thanks,

Partha

1 Solution
michaelguntli
Contributor IV

alejandrolozano, juangutierrez: maybe you can help with the patch "WA GPU 3D OT" to fix the arbitration within the memory controller?

Contention between the GPU2D/GPU3D QoS transfers and the non-QoS transfers from the core may be overloading one of the bus fabrics (PL301) on the way to the MMDC. The MMDC (memory controller) prioritizes traffic carrying the QoS flag and blocks non-QoS data traffic (CPU). If the GPU load gets heavy, an overrun error may occur due to a logic race and a lack of resources on the bus fabric.

juangutierrez
NXP Employee

Can you try adding this to the board_late_init function of the board file?


--- a/board/freescale/mx6dqscm/mx6dqscm.c
+++ b/board/freescale/mx6dqscm/mx6dqscm.c
@@ -1259,6 +1259,12 @@ int board_late_init(void)
 #ifdef CONFIG_ENV_IS_IN_MMC
 	board_late_mmc_env_init();
 #endif
+
+	printf("Limit GPU 3D OT=1 WA\n");
+
+	writel(0x3, 0x00C43108);
+	writel(0x3, 0x00C48108);
+
 	return 0;
 }
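
The two writes in the patch can be spelled out with named constants, which may make the value 0x3 easier to read. This is only a sketch under assumptions: it assumes that both addresses are NIC-301 fn_mod registers (as a later reply in this thread suggests) and that bit 0 is read_iss_override and bit 1 is write_iss_override, which would limit reads and writes to one outstanding transaction each ("OT=1"). Neither point is confirmed here, so check them against the ARM NIC-301 TRM and the i.MX6 reference manual. The macro, struct, and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Register addresses copied from the patch; the names are hypothetical. */
#define GPU3D_FN_MOD_A 0x00C43108u
#define GPU3D_FN_MOD_B 0x00C48108u

/* ASSUMPTION: NIC-301 fn_mod bit layout, not confirmed in this thread. */
#define FN_MOD_READ_ISS_OVERRIDE  (1u << 0)
#define FN_MOD_WRITE_ISS_OVERRIDE (1u << 1)

/* One pending register write: the value to store and where to store it. */
struct reg_write {
    uintptr_t addr;
    uint32_t value;
};

/* Build an fn_mod value from the two assumed override bits. */
uint32_t fn_mod_value(int limit_reads, int limit_writes)
{
    uint32_t value = 0;

    if (limit_reads)
        value |= FN_MOD_READ_ISS_OVERRIDE;
    if (limit_writes)
        value |= FN_MOD_WRITE_ISS_OVERRIDE;
    return value;
}

/* Fill out[] with the two writes the patch performs and return
 * how many entries were written. */
size_t gpu3d_ot_workaround(struct reg_write out[2])
{
    uint32_t value = fn_mod_value(1, 1); /* 0x3, as in the patch */

    out[0].addr = GPU3D_FN_MOD_A;
    out[0].value = value;
    out[1].addr = GPU3D_FN_MOD_B;
    out[1].value = value;
    return 2;
}
```

In U-Boot, each entry would be passed to writel(value, addr) from board_late_init, exactly as the patch does. Keeping the list separate from the write itself lets the values be checked on a host machine without touching hardware.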

takayuki_ishii
Contributor IV

Hello juangutierrez,

Our customer has a white-out problem, and it was fixed after applying this patch.

Before applying this patch to their system, we want to know how it affects the system.

 * What is changed by setting this register value?

   I think this register is the fn_mod register of the CoreLink™ Network Interconnect (NIC-301). Is it mapped at 0x00C43108/0x00C48108 in the i.MX6?

   If so, the ARM document (NIC-301 Technical Reference Manual, DDI0397I) describes it as having 2 bits, but only the values 0x0 and 0x1 are documented; there is no information about 0x3.

   If not, which register is mapped there, and what is the effect of setting 0x3?

 * What impact does it have on the system?

 * You apply this patch during U-Boot initialization.

   If it is applied during kernel initialization instead, would that cause any problems?

I am looking forward to hearing from you soon.

Best regards,

Ishii.

Elvis
Contributor I

I encountered the same problem. After the patch was applied, the fault was resolved, but performance dropped by 20%. Are there any other solutions that keep the system from crashing without reducing performance?

art
NXP Employee

Q. Is there a recent patch that fixes this issue? Is there any known issue specific to the SCM module?

A. There seems to be nothing to fix in software, since overheating can cause unpredictable behaviour of the product. The only possible "fix" is to provide better cooling for the SCM.

Q. There is a separate release for SCM: Yocto BSP for SCM-i.MX L4.1.15-2.0.0_ga

A. Yes, you have to use this build for SCM operation.


Have a great day,
Artur
