Kinetis K10 restarts on its own

peterkrause · ‎05-29-2015

Hello.

I'm currently developing an application using a Kinetis MK10DX256VLH7 (CodeWarrior 10.6). I have a big problem with spontaneous restarts of the MCU. Sometimes it runs for many hours before it restarts, and sometimes it only runs for a few minutes or less. I can't trigger the restart deliberately, so I can't debug it.

What can I do to find the restart source or problem?

Thanks.

mjbcswitzerland · ‎05-29-2015

Hello Peter

It is important to know whether you have the watchdog enabled or not. If enabled it is probably due to a watchdog reset, which could have two reasons:

1. The code is failing and not retriggering the watchdog
2. The watchdog servicing has an error and causes a random reset

Note that you can read the last cause of a reset from the chip to see what the cause was (including due to undervoltage).

In case 1 it is useful to disable the watchdog, connect the debugger and let the processor run until it fails. Typically the code will be spinning in a loop (or an interrupt) and this is then visible by pausing with the debugger.

In case 2 a typical reason is that the watchdog retrigger sequence is not protected against interrupts - an interrupt arriving during the retrigger sequence will cause the watchdog to fire. The solution is to disable interrupts before performing the watchdog retrigger and then re-enable them afterwards.
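As a minimal sketch of the protected retrigger described above (not Mark's or µTasker's actual code): the function name `watchdog_retrigger`, the helpers `irq_save_and_disable` / `irq_restore`, and passing the register as a pointer are all assumptions made for illustration. The two key values are the K-series WDOG_REFRESH sequence; check the reference manual for your exact part. On a non-ARM host the interrupt helpers are stubs so the logic can be compiled and exercised off-target.

```c
#include <stdint.h>

/* Refresh keys for the Kinetis K-series WDOG_REFRESH register
 * (verify against the reference manual for your device). */
#define WDOG_REFRESH_KEY1 0xA602u
#define WDOG_REFRESH_KEY2 0xB480u

#if defined(__arm__) && defined(__GNUC__)
/* Read PRIMASK, then mask interrupts. Returns the previous PRIMASK value. */
static inline uint32_t irq_save_and_disable(void)
{
    uint32_t primask;
    __asm volatile("mrs %0, primask\n\tcpsid i" : "=r"(primask) : : "memory");
    return primask;
}

/* Restore the PRIMASK value saved by irq_save_and_disable(). */
static inline void irq_restore(uint32_t primask)
{
    __asm volatile("msr primask, %0" : : "r"(primask) : "memory");
}
#else
/* Host stubs (hypothetical): only track whether "interrupts" are masked. */
static int sim_irq_masked = 0;

static uint32_t irq_save_and_disable(void)
{
    uint32_t previous = (uint32_t)sim_irq_masked;
    sim_irq_masked = 1;
    return previous;
}

static void irq_restore(uint32_t previous)
{
    sim_irq_masked = (int)previous;
}
#endif

/* Write both refresh keys back to back with interrupts masked, so that no
 * interrupt can separate the two writes. */
void watchdog_retrigger(volatile uint16_t *wdog_refresh)
{
    uint32_t saved = irq_save_and_disable();
    *wdog_refresh = WDOG_REFRESH_KEY1;
    *wdog_refresh = WDOG_REFRESH_KEY2;
    irq_restore(saved);
}
```

Restoring the previous interrupt state, rather than unconditionally re-enabling, keeps the routine safe to call from a context where interrupts were already disabled.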

Regards

Mark

Kinetis: µTasker Kinetis support

K20: µTasker Kinetis TWR-K20D72M support / µTasker Kinetis FRDM-K20D50M support / µTasker Kinetis TWR-K20D50M support / µTasker Teensy3.1 support

For the complete "out-of-the-box" Kinetis experience and faster time to market

davidgraham · ‎07-24-2017

Thanks ...# 2 was a problem for me.

peterkrause · ‎05-29-2015

Hello Mark,

the watchdog is not implemented at the moment, I'm currently not using it.

You mentioned that I can read back the last reset source after reset. How can I do that?

Regards

Peter

mjbcswitzerland · ‎05-29-2015

Peter

Read RCM_SRS0 or MC_SRSH (depending on which register your chip has - probably the first one) to see the reset cause.
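A rough sketch of turning the two register values into readable flag names, for example before sending them out over a UART at start-up. The bit positions are taken from memory of the K-series RCM chapter and should be checked against the reference manual for the exact part; `describe_reset` and the table names are hypothetical. The function takes the register values as arguments, so on the target you would pass in the values read from RCM_SRS0 and RCM_SRS1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint8_t mask;
    const char *name;
} reset_flag_t;

/* RCM_SRS0 flags (assumed bit layout, verify in the reference manual). */
static const reset_flag_t srs0_flags[] = {
    {0x80, "POR"}, {0x40, "PIN"}, {0x20, "WDOG"}, {0x08, "LOL"},
    {0x04, "LOC"}, {0x02, "LVD"}, {0x01, "WAKEUP"},
};

/* RCM_SRS1 flags (assumed bit layout, verify in the reference manual). */
static const reset_flag_t srs1_flags[] = {
    {0x20, "SACKERR"}, {0x10, "EZPT"}, {0x08, "MDM_AP"},
    {0x04, "SW"}, {0x02, "LOCKUP"}, {0x01, "JTAG"},
};

/* Append the name of every flag set in 'value' to the string in 'out'. */
static void append_flags(char *out, size_t len, uint8_t value,
                         const reset_flag_t *flags, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (value & flags[i].mask) {
            size_t used = strlen(out);
            snprintf(out + used, len - used, "%s%s",
                     used ? " " : "", flags[i].name);
        }
    }
}

/* Build a space-separated list of reset causes from the two SRS values. */
void describe_reset(uint8_t srs0, uint8_t srs1, char *out, size_t len)
{
    if (len == 0) {
        return;
    }
    out[0] = '\0';
    append_flags(out, len, srs0, srs0_flags,
                 sizeof srs0_flags / sizeof srs0_flags[0]);
    append_flags(out, len, srs1, srs1_flags,
                 sizeof srs1_flags / sizeof srs1_flags[0]);
}
```

With this layout, a core lockup reset would show up as the LOCKUP flag in RCM_SRS1.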

Regards

Mark

peterkrause · ‎05-29-2015

Hello Mark,

that looks very good. I hope that the register will give me an idea what's going wrong.

Thanks. I will let you know if I find the problem.

Regards

Peter

peterkrause · ‎05-29-2015

Ok, I sent out the content of the RCM_SRS0 and RCM_SRS1 registers via UART at device start-up. It's very difficult to get a valid result because sometimes the MCU runs for a very long time. Nevertheless I managed to get three resets in the last hour. In all cases it was a Core Lockup Reset.

What is the best way to figure out which was the source of the lockup?

Regards

Peter

mjbcswitzerland · ‎05-29-2015

Peter

A lock-up occurs when either an NMI or hard fault takes place but then results in another error that can't be recovered from.

This probably means that you don't have a proper hard fault handler installed so I would add this first. It can be any routine but I prefer to just use:

void __interrupt fnHardFault(void) // __interrupt has no meaning apart from making it clear that it is an interrupt handler

{

}

You can then set a break point in the interrupt handler and wait for it to be called (it will usually be due to code accessing non-existent memory or trying to write to a NULL pointer, or general run-away code).

When the break point is hit switch the debugger to disassemble mode (instruction stepping) and simply step back out of the interrupt and you will see the operation that caused it to fire. Then you can usually see what went wrong.
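As an alternative to stepping out of the handler by hand, a commonly used Cortex-M pattern is to copy the exception stack frame into a global so that the faulting address (the stacked PC) can be read directly in the debugger. This is a sketch under assumptions, not the handler from the post: it assumes a GCC-style toolchain on a Cortex-M3/M4, and `hard_fault_capture`, `fault_frame_t` and `g_fault_frame` are hypothetical names. Only the handler name `fnHardFault` comes from the thread. The assembly part is compiled only for ARM targets; the C part can be built anywhere.

```c
#include <stdint.h>

/* Registers pushed automatically by the core on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Inspect this in the debugger after a fault; 'pc' is the stacked
 * program counter, i.e. the address of the faulting instruction. */
volatile fault_frame_t g_fault_frame;

/* Copy the eight stacked words into g_fault_frame. */
void hard_fault_capture(const uint32_t *stacked)
{
    g_fault_frame.r0   = stacked[0];
    g_fault_frame.r1   = stacked[1];
    g_fault_frame.r2   = stacked[2];
    g_fault_frame.r3   = stacked[3];
    g_fault_frame.r12  = stacked[4];
    g_fault_frame.lr   = stacked[5];
    g_fault_frame.pc   = stacked[6];
    g_fault_frame.xpsr = stacked[7];
}

#if defined(__arm__) && defined(__GNUC__)
/* Bit 2 of EXC_RETURN (in LR) tells which stack holds the frame:
 * 0 = main stack (MSP), 1 = process stack (PSP). */
__attribute__((naked)) void fnHardFault(void)
{
    __asm volatile(
        "tst   lr, #4              \n"
        "ite   eq                  \n"
        "mrseq r0, msp             \n"
        "mrsne r0, psp             \n"
        "bl    hard_fault_capture  \n"
        "1:    b 1b                \n" /* park here; set a break point */
    );
}
#endif
```

Looking up `g_fault_frame.pc` in the map file or disassembly then points at the instruction that faulted, provided the stack itself was still intact when the fault occurred.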

In the worst case it will be random code, which means that the fault was the result of a previous fault (then trace may be useful to be able to look back to the initial fault).
If you happen to see what looks like valid code it can be an indication that you have the Flash clock set too high (max. 25MHz) because this can make program operation unreliable. (In fact I have also seen Flash running slightly below the maximum Flash speed randomly causing faults to occur, but I understand that this was due to poor power supply design in that particular case).
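The Flash clock limit can be checked with a little arithmetic. On K-series parts the Flash clock is MCGOUTCLK divided by (OUTDIV4 + 1), where OUTDIV4 is a 4-bit field of SIM_CLKDIV1. The sketch below assumes that field sits in bits 19..16 (verify in the reference manual); the function names and the 72 MHz figure in the usage note are illustrative assumptions, not values from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_CLOCK_MAX_HZ         25000000u
#define SIM_CLKDIV1_OUTDIV4_SHIFT  16u
#define SIM_CLKDIV1_OUTDIV4_MASK   (0xFu << SIM_CLKDIV1_OUTDIV4_SHIFT)

/* Flash clock = MCGOUTCLK / (OUTDIV4 + 1). */
uint32_t flash_clock_hz(uint32_t mcgoutclk_hz, uint32_t sim_clkdiv1)
{
    uint32_t outdiv4 = (sim_clkdiv1 & SIM_CLKDIV1_OUTDIV4_MASK)
                       >> SIM_CLKDIV1_OUTDIV4_SHIFT;
    return mcgoutclk_hz / (outdiv4 + 1u);
}

/* True when the resulting Flash clock does not exceed the 25 MHz limit. */
bool flash_clock_ok(uint32_t mcgoutclk_hz, uint32_t sim_clkdiv1)
{
    return flash_clock_hz(mcgoutclk_hz, sim_clkdiv1) <= FLASH_CLOCK_MAX_HZ;
}
```

For example, assuming a 72 MHz MCGOUTCLK, OUTDIV4 = 2 gives 72 MHz / 3 = 24 MHz, which is within the limit, whereas OUTDIV4 = 1 would give 36 MHz, which is not.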

Finally you should look at the watchdog concept since it is usually the first part of a design and not an after-thought in a project.

Regards

Mark

peterkrause · ‎05-31-2015

Hello Mark,

thanks for your detailed information.

I hadn't implemented a special hard fault handler yet. Currently it is the default handler with a single instruction: __asm("bkpt");

As you recommended, I will implement a custom handler to see what's going wrong.

I also re-checked the flash clock. Flash is running at 24 MHz.

Regards

Peter

peterkrause · ‎06-01-2015

Hello Mark,

it seems that I found the bug. It was "deeply hidden" in one of our algorithm modules. Under very special circumstances some wrong indices are calculated. The result of this error is an access to a non-existent memory location. This faulty scenario did not occur very often, so I'm lucky that I found the bug.

Thanks for the help. I'm sure this was not the last bug of this type, but next time I'll be better prepared to find it.

Regards

Peter