HardFault in callFlashRunCommand

knovinger · ‎11-21-2018

I'm working on a K64 project with MCUXpresso and I'm having issues with an intermittent HardFault. I say intermittent because it comes and goes with various code changes. It is, however, very consistent once it occurs. It also depends on optimization (i.e. it does not occur when debugging with the default Debug config but does occur with the default Release config).

With the help of a few other forum and blog posts...

https://mcuoneclipse.com/2012/11/24/debugging-hard-faults-on-arm-cortex-m/

https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html

https://community.nxp.com/thread/306244

www.keil.com/appnotes/files/apnt209.pdf

...I've managed to at least see which instruction is causing the hard fault. What I could use now is some help deciphering the status registers captured in the hard fault handler.

When the hard fault breakpoint hits, this is the state everything is in...

From here, I used stacked_lr (0x76ad) to locate the instruction prior to the break...
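For readers following along: values such as stacked_lr and stacked_pc come from the eight-word frame that a Cortex-M core pushes onto the stack on exception entry (r0-r3, r12, lr, pc, xPSR, in that order). Below is a minimal sketch of reading that frame. The type and function names are illustrative and are not taken from the SDK or from the linked guides.

```c
#include <stdint.h>

/* Basic exception frame pushed by a Cortex-M core on exception entry.
 * Field names are illustrative; the word order is fixed by the architecture. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} stacked_frame_t;

/* Copy the eight stacked words (starting at the captured stack pointer)
 * into a struct so they can be inspected by name. */
static stacked_frame_t read_stacked_frame(const uint32_t *sp)
{
    stacked_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* After a BL/BLX, LR holds the return address with bit 0 set (Thumb state).
 * Clearing bit 0 gives the address to look up in the disassembly. */
static uint32_t return_address(uint32_t stacked_lr)
{
    return stacked_lr & ~UINT32_C(1);
}
```

With the value from this post, return_address(0x76ad) gives 0x76ac, the instruction following the call.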

At this point, I'm running under the assumption that the fault occurs when accessing the FTFE (FTFx = 0x40020000) register with callFlashRunCommand.

Looking at the values of the hard fault status registers and comparing to the register definitions found in the Keil appnote, I see that bit 17 of the _CFSR is set, which means an invalid state. Specifically...
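As a cross-check of the bit decoding described above, here is a small sketch that tests the UsageFault flags in a captured CFSR value. The UFSR occupies bits 31:16 of the CFSR, so UFSR bit 1 (INVSTATE) appears as CFSR bit 17. The macro and function names are my own and are not CMSIS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* UsageFault flags as they appear in the 32-bit CFSR
 * (the UFSR is the upper halfword, bits 31:16). Names are illustrative. */
#define CFSR_UNDEFINSTR (UINT32_C(1) << 16) /* undefined instruction           */
#define CFSR_INVSTATE   (UINT32_C(1) << 17) /* execution with EPSR.T bit clear */
#define CFSR_INVPC      (UINT32_C(1) << 18) /* invalid EXC_RETURN / PC load    */
#define CFSR_NOCP       (UINT32_C(1) << 19) /* coprocessor access not allowed  */
#define CFSR_UNALIGNED  (UINT32_C(1) << 24) /* unaligned access trapped        */
#define CFSR_DIVBYZERO  (UINT32_C(1) << 25) /* divide by zero trapped          */

/* True if any bit of 'mask' is set in the captured CFSR value. */
static bool cfsr_has(uint32_t cfsr, uint32_t mask)
{
    return (cfsr & mask) != 0;
}
```

A CFSR value of 0x00020000 has only INVSTATE set. One common cause of INVSTATE is a BX or BLX to an address whose bit 0 is clear, which asks the core to leave Thumb state.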

Based on this information, I find the PC contains 0x20001274. Looking at this location, I find the following...

From this, I'm assuming the issue is the first instruction of s_flashRunCommand.

As this is part of the flash driver, I'm not sure what all is going on here. Regardless, I must have either an array out-of-bounds and/or a pointer referencing the wrong address somewhere due to the fact it will work with one code modification but not another. So, as I make changes I'm just moving the problem around. That said, it's become very difficult to track down as one change may make it seem the problem is corrected while another, unrelated (one would think) change makes it surface.

If I could get some help deciphering what the register snapshot is telling me, I would appreciate it.

Some relevant details...

MCUXpresso IDE version 10.2.1

Kinetis MK64FN1M0xxx12

Custom Hardware

As for the SDK, I'm using v2.4.2, however, due to another issue with flash swap (SWAP does not work from UPPER on SDK 2.4.1 flash driver), I've downgraded the flash drivers to version 2.1.0.

jingpan · ‎11-26-2018

Hi Kevin,

It seems very strange in your screenshot that s_flashRunCommand is located at an odd address (0x20001273). This is why you always see the hard fault. ARM opcodes must be halfword aligned. In NXP's driver, s_flashRunCommand is defined as

static uint32_t s_ftfxRunCommand[kFTFx_RamFuncMaxSizeInWords];

It should not be allocated at an odd address.

The flash swap bug has been reported. It works after changing the parameter from False to True.

Regards,

Jing

mjbcswitzerland · ‎11-27-2018

Jing

It is normal to have the RAM code at an odd address - this is Thumb2 alignment and an even address would fail.

Kevin

Are you disabling interrupts when the flashing routine is called? If not, any code that an interrupt tries to execute from the flash block being worked on will fail.

The hard fault handling methods tend to be over-complicated. Simply set a break-point in your hard fault handler (the handler should 'return' and not spin in a loop - an empty routine is fine). When it is hit, switch the debugger display to disassemble mode and single step the code so that it "steps out" and back to the instruction that sent it there. If you return to the line strh r0, [r0, #0] see what is in r0 since that is the only thing that could cause that line to fail.

Regards

Mark

jingpan · ‎11-27-2018

Hi Mark,

Yes, disabling all interrupts is good advice. But you say that Thumb2 alignment is odd... Are you sure?

An odd value (address+1) must be written to the PC when jumping. But the code itself must be aligned on an even address.
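The distinction between where Thumb code sits and the value used to call it can be sketched as below. This is only an illustration of the arithmetic, with helper names of my own choosing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb instructions must start on a halfword (2-byte) boundary. */
static bool is_halfword_aligned(uintptr_t code_addr)
{
    return (code_addr & 1u) == 0;
}

/* Value to branch to: the even code address with bit 0 set,
 * which tells the core to stay in Thumb state. */
static uintptr_t thumb_call_address(uintptr_t code_addr)
{
    return code_addr | 1u;
}

/* Inverse: strip bit 0 from a function pointer value to find
 * where the instructions actually start in memory. */
static uintptr_t thumb_code_address(uintptr_t call_addr)
{
    return call_addr & ~(uintptr_t)1;
}
```

For example, code stored at 0x20001274 is called through the value 0x20001275, and a function pointer holding 0x20001275 refers to instructions that begin at 0x20001274.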

Regards

Jing

mjbcswitzerland · ‎11-27-2018

Jing

I made a mistake and you are correct. The code must be aligned on short word boundaries and called with its address + 1, as you have stated (and not the other way around as I incorrectly wrote).

Therefore you are probably right that the source of difficulty is that the RAM buffer used for the flashing code is not guaranteed to be aligned and so sometimes works and sometimes fails (depending on other variables and how it happens to fall).

Kevin

Check how your code is allocating the RAM for the flashing code. It must be managed to always be correctly aligned!
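One way to read "managed to always be correctly aligned" at the C level: a byte array carries no alignment guarantee beyond 1, whereas an array of a wider type, or a byte array with an explicit C11 alignment request, does. The sketch below shows both options. The buffer names and the size are hypothetical and are not the SDK's definitions.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_FUNC_MAX_BYTES 64u /* hypothetical size, for illustration only */

/* Option 1: a byte buffer with an explicit alignment request (C11). */
static alignas(4) uint8_t s_ramFuncBytes[RAM_FUNC_MAX_BYTES];

/* Option 2: an array of 32-bit words, which the compiler places
 * at the natural alignment of uint32_t. */
static uint32_t s_ramFuncWords[RAM_FUNC_MAX_BYTES / 4u];

/* True if pointer p is a multiple of 'alignment' (a power of two). */
static bool is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1u)) == 0;
}
```

A check such as is_aligned(s_ramFuncBytes, 2) before copying the routine into RAM would catch a misplaced buffer early instead of leaving it to surface as a hard fault.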

Regards

Mark

knovinger · ‎11-27-2018

Jing/Mark, thank you for your responses.

Mark, I tried your suggestion of just setting a breakpoint inside the default Hardfault_Handler.

From what I can tell, it appears to be failing at the following line...

where r3=0x20001274 and r4 = 0x0.

You asked about r0 in the line previous. The value of r0 is 0x40020000.

In regards to the RAM allocation, can you clarify your statement "It must be managed to always be correctly aligned!"? Is this in reference to the IDE/compiler's configuration or how the C code is written (i.e. struct definitions, etc.)?

mjbcswitzerland · ‎11-27-2018

Kevin

Stepping out of the hard fault is showing you about the same location that you had concluded before (maybe one line further (?)). What normally happens is that you land back at the line that faulted, and when you step again it makes exactly the same error and jumps back to the hard fault.

This also lets you move the program counter to the previous or next line to see whether the next instruction is OK, change register values to work out which one is bad, etc. (essentially much easier and more flexible than the various hard fault call stack debugging guides, although it does need a debugger attached of course).

I don't see any risk in either of the two instructions, so I do think that you have an alignment problem: the processor is not executing the instructions that you see and is therefore failing for a more random reason. I have copied the way that the flashing routine is located in SRAM in the uTasker project. Note that the code buffer is constructed of unsigned shorts (Thumb2 instruction length and alignment boundary) and so is naturally always correctly aligned. If your buffer is an unsigned char buffer there is a risk, and it will be hard to control its alignment.

Regards

Mark

        #define PROG_WORD_SIZE 30                                        // adequate space for the small program
        int i = 0;
        unsigned char *ptrThumb2 = (unsigned char *)fnFlashRoutine;
        static unsigned short usProgSpace[PROG_WORD_SIZE] = {0};         // make space for the routine in static memory (this will have an even boundary)

        ptrThumb2 =  (unsigned char *)(((CAST_POINTER_ARITHMETIC)ptrThumb2) & ~0x1); // thumb 2 address
        while (i < PROG_WORD_SIZE) {                                     // copy program to SRAM
            usProgSpace[i++] = *(unsigned short *)ptrThumb2;
            ptrThumb2 += sizeof (unsigned short);
        }
        ptrThumb2 = (unsigned char *)usProgSpace;
        ptrThumb2++;                                                     // create a thumb 2 call
        fnRAM_code = (void(*)(volatile unsigned char *))(ptrThumb2);


Notice that fnFlashRoutine is always a pointer to an odd address although the code itself sits at an even one, hence the trickery of aligning the code on even addresses and the call address on odd addresses.