
MPC8378 watchdog reset problem

Question asked by Christopher Stutts on Mar 6, 2017
Latest reply on Mar 6, 2017 by Pavel Chubakov

We based our design on the 8377RDB reference design and have upgraded from Linux 2.x to 4.6. I've found a problem with the standard 83xx watchdog driver when it is configured to (a) perform a hardware reset rather than raise a machine check exception, and (b) not keep itself alive once the application holding /dev/watchdog open stops feeding it (as opposed to having a kthread do the feeding). In that configuration, a reset occurs correctly _except_ during heavy NAND flash activity.


I see nothing in the errata about this, though there is a comment in the NAND controller driver about the controller's fragility. If I erase the JFFS2 partitions on our device, then at bootup, when the partitions are mounted, some time-consuming partition initialization runs in the background. If I starve the watchdog during that window (easy with a short, non-standard watchdog period compiled into the driver), the board hangs instead of resetting, and a power cycle is required. Ftrace and even top suggest that the only thing different about that several-second window, compared with any other time, is NAND activity: NAND opcodes issued to the controller and LBC IRQs. Top shows the JFFS2 garbage-collection kthread busy, and ftrace shows activity within a second of the hang. Because ftrace sends its data out over UDP, I may be missing whatever happens closest to the event; the system may or may not be waiting on a NAND controller command. There is some gigabit activity as well, but across that window, NAND activity is the only thing common to every hang.


When the watchdog is configured to raise a machine check exception instead of a hardware reset, the chip behaves as documented.
