How to properly trigger a window watchdog

Matthias_LEHMANN · 04-08-2024

Hi everybody,

We're using the S32K142 in safety-relevant applications. The MCU is supervised by an external window watchdog: in one application that's the NXP TJA1128, in another we're using a watchdog from another vendor.

I found hundreds of articles, forum posts and application notes about how watchdogs work in general and what's so special about a window watchdog. What's missing, however, is guidance or examples on how to properly trigger a watchdog from a SW/MCU point of view.

Our SW is a bare-metal system, i.e. there's no OS or scheduler: all tasks are executed cyclically in the required order, coordinated by timers. There are a few event-triggered tasks that inflate the cycle time, e.g. receiving, checking (CRC) and storing a new SW image for a firmware update.

Servicing a classic watchdog is more or less trivial: the watchdog is triggered at the end of every cycle (or, using a counter, after every x completed cycles). The special tasks that interrupt normal cyclic execution also trigger the watchdog at sufficiently short intervals to prevent an erroneous reset.
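A minimal sketch of that classic scheme in C, for illustration only. The wdg_trigger() hook is hypothetical (on a real board it would toggle the watchdog input pin or send the trigger command; here it just counts), the divider of 4 cycles is an arbitrary assumption, and the firmware-update job is modeled as a chunked loop that triggers the watchdog on its own.

```c
#include <stdint.h>

/* Hypothetical hardware hook: on a real board this would toggle the
 * watchdog input pin or send the trigger command. Here it only counts. */
static uint32_t wdg_trigger_count = 0u;

static void wdg_trigger(void)
{
    wdg_trigger_count++;
}

/* Assumed divider: trigger after every CYCLES_PER_TRIGGER completed cycles. */
#define CYCLES_PER_TRIGGER 4u

static uint32_t cycles_since_trigger = 0u;

/* Call once at the end of every normal cyclic pass. */
void wdg_end_of_cycle(void)
{
    cycles_since_trigger++;
    if (cycles_since_trigger >= CYCLES_PER_TRIGGER) {
        cycles_since_trigger = 0u;
        wdg_trigger();
    }
}

/* Long event-triggered job (e.g. receiving, CRC-checking and storing a new
 * SW image), processed in chunks. It triggers the watchdog itself so that
 * the timeout cannot expire while the normal cycle is held up. */
void fw_update_job(uint32_t chunks, uint32_t chunks_per_trigger)
{
    if (chunks_per_trigger == 0u) {
        chunks_per_trigger = 1u; /* guard against division by zero */
    }
    for (uint32_t i = 1u; i <= chunks; i++) {
        /* ... receive / check / store one chunk ... */
        if ((i % chunks_per_trigger) == 0u) {
            wdg_trigger();
        }
    }
}
```

Calling wdg_end_of_cycle() ten times produces two triggers; a 10-chunk update job with chunks_per_trigger = 3 adds three more.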

With a window watchdog, things get more delicate. Our current solution is essentially the same approach as for classic watchdogs, plus an additional SW module acting as a "pre-filter" for the watchdog. This module collects the trigger requests of all other SW functions and discards every request made before the lower window has expired. In my view this essentially "masks" the lower window, i.e. we're not really using the additional functionality a window watchdog provides over a classic watchdog.
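The pre-filter described above might look roughly like this. The window limits (20 ms open, 80 ms close), the millisecond timestamp passed in as now_ms and the wdg_hw_trigger() hook are all assumptions for illustration, not the poster's actual module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window, in ms since the last trigger that reached the hardware:
 * earlier than WIN_OPEN_MS is "too early", later than WIN_CLOSE_MS is
 * "too late". Only the external watchdog enforces WIN_CLOSE_MS. */
#define WIN_OPEN_MS  20u
#define WIN_CLOSE_MS 80u

static uint32_t last_trigger_ms = 0u;
static uint32_t hw_trigger_count = 0u;

/* Hypothetical hardware hook (pin toggle / SPI frame on a real board). */
static void wdg_hw_trigger(void)
{
    hw_trigger_count++;
}

/* Pre-filter: any SW function may call this with the current time.
 * Requests made while the window is still closed are discarded.
 * Returns true if the request was forwarded to the watchdog. */
bool wdg_request(uint32_t now_ms)
{
    uint32_t elapsed = now_ms - last_trigger_ms; /* unsigned: wrap-safe */
    if (elapsed < WIN_OPEN_MS) {
        return false; /* too early: silently dropped, masking the lower window */
    }
    wdg_hw_trigger();
    last_trigger_ms = now_ms;
    return true;
}
```

Because early requests are simply dropped, the external watchdog sees only in-window triggers as long as some request arrives before the window closes, so SW that runs too fast goes unnoticed. What is still checked is the MCU's own time base (the source of now_ms) against the watchdog's independent oscillator.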

So it all boils down to the exam question: how should a window watchdog be serviced by the application SW (assuming no OS is present)? We could probably profile the cycle length and its statistical distribution, then use a counter to trigger the watchdog every x-th cycle. But then there are these special functions which mess everything up...
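If the cycle time has been profiled, the condition for "trigger every x-th cycle" can be written down directly: x consecutive cycles must never take less than the window-open time and never more than the window-close time. A sketch, assuming the profiled minimum and maximum really bound every cycle and that all times are in microseconds relative to the previous trigger; find_trigger_divider() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t x_min;    /* smallest admissible cycles per trigger */
    uint32_t x_max;    /* largest admissible cycles per trigger */
    bool     feasible; /* true if x_min <= x_max */
} trigger_divider_t;

/* All times in microseconds; window limits measured from the previous
 * trigger. Triggering every x-th cycle is safe if
 *     x * t_cycle_min >= t_open    (never too early)
 *     x * t_cycle_max <= t_close   (never too late)
 * Preconditions (assumed): 0 < t_cycle_min <= t_cycle_max. */
trigger_divider_t find_trigger_divider(uint32_t t_cycle_min, uint32_t t_cycle_max,
                                       uint32_t t_open, uint32_t t_close)
{
    trigger_divider_t r;

    /* ceil(t_open / t_cycle_min), written without overflow */
    r.x_min = t_open / t_cycle_min + ((t_open % t_cycle_min) != 0u ? 1u : 0u);
    if (r.x_min == 0u) {
        r.x_min = 1u; /* at least one full cycle between triggers */
    }

    /* floor(t_close / t_cycle_max) */
    r.x_max = t_close / t_cycle_max;

    r.feasible = (r.x_min <= r.x_max);
    return r;
}
```

With cycles between 4 ms and 6 ms and a 20 ms to 50 ms window, any x from 5 to 8 works; if an event-triggered task stretches the worst case to 12 ms, no x exists, which is exactly the problem with the special functions. The window limits themselves should be taken at their worst-case datasheet tolerances.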

Bonus question: what additional failure modes does a window watchdog cover? Certain functional-safety standards (e.g. ISO 25119-2, Annex C, Table C.8) rate the diagnostic coverage of a classic (external) watchdog as "low", i.e. 60%, while a window watchdog is rated at 90%. Is there an overview of which failure modes are detected by classic and by window watchdogs? What can make the SW run too fast (apart from an excessively high clock frequency, which even our setup described above covers)? The SW getting stuck is probably most often caused by programming errors, but that's already covered by a classic watchdog.

Apologies for the lengthy question...

\\// Matthias

antoinedubois · 07-12-2024

Hello Matthias,

Sorry for the delayed answer.

The failure modes that a windowed watchdog covers can vary with your SW architecture, but in general it is there to monitor your scheduler and detect slow drift in its periodic tick (due to SW or HW failures).
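To make the difference concrete, here is a small model of what each watchdog type accepts. The numbers are assumed for illustration only: a nominal 50 ms trigger period, a 40 ms to 60 ms window, and a 60 ms timeout for the classic watchdog.

```c
#include <stdint.h>

typedef enum { WDG_OK, WDG_TOO_EARLY, WDG_TOO_LATE } wdg_verdict_t;

/* Window watchdog: the interval since the previous trigger must lie
 * within [t_open, t_close]. */
wdg_verdict_t window_wdg_check(uint32_t interval_ms, uint32_t t_open, uint32_t t_close)
{
    if (interval_ms < t_open) {
        return WDG_TOO_EARLY;
    }
    if (interval_ms > t_close) {
        return WDG_TOO_LATE;
    }
    return WDG_OK;
}

/* Classic watchdog: only the timeout is checked; any earlier trigger is fine. */
wdg_verdict_t classic_wdg_check(uint32_t interval_ms, uint32_t timeout_ms)
{
    return (interval_ms > timeout_ms) ? WDG_TOO_LATE : WDG_OK;
}
```

A trigger arriving every 35 ms (for example because the scheduler's time base runs fast, or because part of the cycle is skipped and the loop completes early) passes the classic check but is flagged WDG_TOO_EARLY by the window; a 65 ms period is caught by both.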

For your bare-metal system, if you want to add some determinism, you may still want a periodic task scheduler (e.g. based on NXP's PIT). At the end of the MCU's safety-related function, the PIT-driven task would update the watchdog (answering challenge questions) to verify that your system stays deterministic and neither accelerates nor slowly drifts.
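A rough sketch of such a timer-driven safety task with a challenge/response update. Everything here is hypothetical: the ISR name, the 10-tick period, the answer rule (question XOR 0xA5) and the use of a checkpoint signature (a program-flow-monitoring technique) to decide whether the answer is correct. A real challenge/response watchdog defines its own question and answer format in its datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Incremented by the periodic timer interrupt (assumed 1 ms tick). */
static volatile uint32_t pit_ticks = 0u;

/* Hypothetical ISR name: hook it to the periodic timer channel's interrupt. */
void periodic_timer_isr(void)
{
    pit_ticks++;
}

#define SAFETY_TASK_PERIOD_TICKS 10u  /* assumed 10 ms task period */
#define GOOD_SIGNATURE 0x380u         /* fold of checkpoints 1, 2, 3 via checkpoint() */

static uint32_t flow_signature;
static uint32_t last_run_tick = 0u;

/* Each checkpoint folds its ID into a running signature, so a skipped or
 * reordered step gives a different final value. */
static void checkpoint(uint32_t id)
{
    flow_signature = (flow_signature * 31u) ^ id;
}

static void safety_task(uint8_t question, uint8_t *answer)
{
    flow_signature = 0u;
    checkpoint(1u); /* e.g. read and validate inputs */
    checkpoint(2u); /* e.g. run the safety function */
    checkpoint(3u); /* e.g. write outputs */

    if (flow_signature == GOOD_SIGNATURE) {
        *answer = (uint8_t)(question ^ 0xA5u);  /* hypothetical answer rule */
    } else {
        *answer = (uint8_t)~(question ^ 0xA5u); /* deliberately wrong answer */
    }
}

/* Polled from the bare-metal main loop. Runs the safety task once per
 * period and returns true when a new answer is ready to send. */
bool scheduler_poll(uint8_t wdg_question, uint8_t *wdg_answer)
{
    if ((uint32_t)(pit_ticks - last_run_tick) < SAFETY_TASK_PERIOD_TICKS) {
        return false;
    }
    last_run_tick += SAFETY_TASK_PERIOD_TICKS; /* fixed tick grid, no accumulated drift */
    safety_task(wdg_question, wdg_answer);
    return true;
}
```

If a checkpoint is skipped or executed out of order, the signature differs and the task sends a wrong answer, so the watchdog reacts even if the timing still looks correct; if the timer tick drifts, the answers arrive outside the window instead.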

This is just one example; windowed watchdogs are also useful against many other systematic failures of SW resources, which is why they are recommended in real-time safety-critical systems.