Could you change the Subject to make it describe your problem better?
How do you clear all pending interrupts?
They should (meaning "must") clear when the CPU resets. That's a "hard reset".
If MQX is attempting a "soft restart" as a result of a watchdog interrupt, then "there's your problem".
But the BWT (Backup Watchdog Timer) and your other external watchdog both drive Reset. That should be guaranteed to reset the CPU.
These "hard resets" will reset all the masking bits in the Interrupt Controller, and more importantly, they will reset all of the internal devices that can cause the interrupts.
While on the subject of interrupts, a big trap on all of the MCF52 series chips is that you have to make sure you obey the Note on "16.3.6 Interrupt Control Registers (ICRnx)":
"It is the responsibility of the software to program the ICRnx registers with unique and non-overlapping level
and priority definitions. Failure to program the ICRnx registers in this manner can result in undefined
behavior."
That is very important, but slightly wrong. There are two interrupt controllers in there, "INTC0" and "INTC1". You only have to guarantee unique Level and Priority values within one interrupt controller. It is OK to have an interrupt in INTC0 that has the same priority and level as in INTC1.
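Here is a rough sketch of a check you could run at startup (or in a debug build) to catch duplicate Level/Priority settings. It assumes the ICR layout I remember, with the level in bits 5..3 and the priority in bits 2..0, and that level 0 means "source disabled"; check both against the Reference Manual for your part. The function name and the idea of copying the ICRs into an array first are mine, not anything from MQX. Call it once per interrupt controller, since the uniqueness only has to hold inside each one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ICR layout (verify in the Reference Manual):
 * bits 5..3 = interrupt level, bits 2..0 = priority within the level. */
#define ICR_LEVEL_PRIORITY_MASK 0x3Fu
#define ICR_LEVEL_MASK          0x38u

/* Returns true if no two enabled sources in this ONE controller share
 * the same level/priority pair. "icr" is a copy of that controller's
 * ICR values; entries with level 0 are treated as disabled and skipped. */
bool icr_values_unique(const uint8_t *icr, size_t count)
{
    bool seen[64] = { false };

    for (size_t i = 0; i < count; i++) {
        uint8_t lp = icr[i] & ICR_LEVEL_PRIORITY_MASK;

        if ((lp & ICR_LEVEL_MASK) == 0) {
            continue;               /* level 0: assumed disabled */
        }
        if (seen[lp]) {
            return false;           /* duplicate level/priority pair */
        }
        seen[lp] = true;
    }
    return true;
}
```

Run it on the INTC0 values and then separately on the INTC1 values. The same pair showing up once in each controller is fine.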
I can see three possibilities if the CPU is actually wedged (which I don't think it is, see later).
Maybe you have some external hardware on the board that isn't being reset by the internal watchdog (which drives RSTOUT) or by the external watchdog (which drives RSTIN, and then the CPU drives RSTOUT). So maybe the external hardware is stuck in a state that causes the software to lock up and crash.
Maybe the crystal didn't start, or is running at a harmonic. Crystals are tricky things, and it is very easy to get the design wrong. You should google for "crystal margin test" which finds this good description:
http://www.nxp.com/assets/documents/data/en/application-notes/AN3208.pdf
More likely is that the CPU is locked up and wedged. Maybe it got some ESD and has now got a combination of junctions turned on that simply can't be fixed without a complete power removal. Maybe you've got undershoot and overshoot on external buses that have got the CPU or an external chip in a bad state. Do you have anything on the mini-flexbus?
I think you're on the wrong track with interrupts. After a Reset there can't be any interrupts enabled. So there's nothing to reset before MQX starts.
Do you have a loader or bootstrap that runs first? Does it turn any interrupts on and maybe leave them on by accident?
Do you have any external indication that the software is actually running? Can you change the code so it turns an LED on or something very early on? You could blink it before MQX starts, and have code in your Application that flashes it after MQX has started. Then when it happens in the field you know the difference between "CPU is dead" and "code is locked up".
The backup Watchdog is DISABLED by power-on reset. So if it is resetting the CPU then the CPU must have enabled it. So the CPU is at least running the code that enables the backup Watchdog.
The problem with the watchdog is that it doesn't leave much of a record as to why it bit. It DOES set the BWT bit in the RSR though. You can test this when your code starts up and then do "something different" if you know this has happened. One thing you can do that is useful is to have some periodic interrupt (like a timer interrupt) "look back" on the Stack and copy the interrupted program counter to a reserved location in the SRAM. This is "where the code was when the timer last went off". Then on reset, if RSR[BWT] is set you print that stored value, save it to EEPROM or send it somewhere. If you can get that program counter value it might tell you where the code was stuck on the last interrupt before the reset happened. Looking at the source it might be obvious why it was stuck.
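A minimal sketch of that "breadcrumb" idea follows. Everything named here (pc_crumb_t, crumb_record, crumb_recover, the magic value) is made up for illustration. It assumes you can place the structure in SRAM that your startup code does not zero, that you can dig the interrupted program counter out of the exception frame in your timer ISR (how you do that under MQX depends on its ISR wrapper), and that you pass in the RSR value and the BWT bit mask yourself. I have deliberately not hard-coded the bit position; take it from the Reference Manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRUMB_MAGIC 0x50434C47u   /* arbitrary "this record is valid" marker */

/* Place one of these in SRAM that is NOT cleared by the startup code. */
typedef struct {
    uint32_t magic;
    uint32_t last_pc;
} pc_crumb_t;

/* Call from the periodic timer ISR with the interrupted program counter. */
void crumb_record(volatile pc_crumb_t *crumb, uint32_t interrupted_pc)
{
    crumb->last_pc = interrupted_pc;
    crumb->magic   = CRUMB_MAGIC;
}

/* Call once early after reset. "rsr" is the value read from the reset
 * status register and "bwt_mask" is the BWT bit (both supplied by the
 * caller). Returns true and fills *pc_out only if the watchdog caused
 * the reset and a valid record exists. The record is always invalidated
 * so it cannot be reported twice. */
bool crumb_recover(volatile pc_crumb_t *crumb, uint8_t rsr,
                   uint8_t bwt_mask, uint32_t *pc_out)
{
    bool valid = ((rsr & bwt_mask) != 0) && (crumb->magic == CRUMB_MAGIC);

    if (valid) {
        *pc_out = crumb->last_pc;
    }
    crumb->magic = 0;
    return valid;
}
```

If crumb_recover() returns true, print the value, save it to EEPROM or send it somewhere, then look it up in the map file.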
If that doesn't do it (like it was stuck somewhere with all interrupts disabled), then you add a software watchdog. I would suggest that you program a timer at the same time as programming the BWT, set to a time a bit shorter than the BWT. Program it to interrupt at IPL7. That's an unmaskable interrupt. Change your watchdog patting code so it pats BOTH of them. I think you can set up the "Core Watchdog" to do this (generate an interrupt). Then if the soft one goes off you know the hard one is about to go off and the code is stuck somewhere. Dump the stack to somewhere non-volatile so you can get it back and inspect it later.
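Something along these lines is what I have in mind for the "early warning" side. It is only a sketch: the two pat hooks are hypothetical function pointers standing in for whatever services the BWT and restarts your chosen timer, and the stack bounds have to come from your linker file or from MQX. The capture routine only copies words into a buffer; writing that buffer to flash or EEPROM (or keeping it in non-cleared SRAM for the next boot to find) is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_WORDS 32u
#define DUMP_MAGIC 0x57444F47u    /* arbitrary marker, ASCII "WDOG" */

typedef struct {
    uint32_t magic;               /* DUMP_MAGIC when the dump is valid */
    uint32_t count;               /* number of words actually captured */
    uint32_t words[DUMP_WORDS];
} stack_dump_t;

typedef void (*pat_fn)(void);     /* hypothetical hardware hooks */

/* Replace the existing watchdog pat with this so both get serviced. */
void watchdog_pat_both(pat_fn restart_early_timer, pat_fn pat_backup_watchdog)
{
    restart_early_timer();        /* soft watchdog, shorter timeout */
    pat_backup_watchdog();        /* hard watchdog (BWT) */
}

/* Call from the level 7 early-warning interrupt. Copies up to DUMP_WORDS
 * longwords starting at "sp", never reading at or beyond "stack_top". */
void early_warning_capture(stack_dump_t *dump, const uint32_t *sp,
                           const uint32_t *stack_top)
{
    uint32_t n = 0;

    while (n < DUMP_WORDS && &sp[n] < stack_top) {
        dump->words[n] = sp[n];
        n++;
    }
    dump->count = n;
    dump->magic = DUMP_MAGIC;     /* set last, once the data is in place */
}
```

The hard watchdog still goes off shortly afterwards, so keep the handler short and don't rely on anything that needs interrupts to be running.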
Good luck.
Tom