iMXRT1052 hardfault

mark82 · ‎07-03-2023

Hello to everyone.

I've been struggling for days with a hardfault problem.
I have a bunch of preproduction custom boards with:

- RT1052 MCU
- external SDRAM chip
- external SPI Flash (quad)
- i2c IO expander
- audio codec (with SAI and i2c lines). i2c bus for audio is different from i2c bus for IO expander.

All compiler optimizations are turned off.

On the firmware side I use FreeRTOS v10.4.3.
Here's my flow in brief:
- Initialization of the above peripherals
- Tasks init and FreeRTOS scheduler start

Hardfault occurs on some boards only!
On the faulty boards, it occurs at the very same point when communicating over i2c and precisely just after the master STOP condition.
I2C transactions are OK, though (analyzed with protocol analyzer).
After some digging, the reason for the hardfault is a bad value popped from the stack into the program counter,
so the CPU tried to execute an undefined instruction. The attached screenshot (hf_memory) shows the stack from which the wrong PC is popped.
I verified in the assembly code that, within the function that triggers the hardfault, LR is pushed at the beginning and later popped into PC.
The curious thing is that the expected LR (link register) and the wrong PC differ by only one bit (apart from the LSB):
PC = 0x6024a8fc
LR = 0x6004a8fd
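
The comparison above can be reproduced with a small host-side check. This is only a sketch: the helper names `differing_bits` and `single_flipped_bit` are made up for illustration and are not part of any SDK. It XORs the two values, masks out bit 0 (the Thumb bit, which is expected to be set in LR), and reports which single bit differs.

```c
#include <stdint.h>

/* Mask of the bits that differ between the popped PC and the expected LR.
 * Bit 0 is ignored because LR carries the Thumb state bit there. */
static uint32_t differing_bits(uint32_t popped_pc, uint32_t expected_lr)
{
    return (popped_pc ^ expected_lr) & ~UINT32_C(1);
}

/* Index of the flipped bit if exactly one bit differs, otherwise -1. */
static int single_flipped_bit(uint32_t popped_pc, uint32_t expected_lr)
{
    uint32_t diff = differing_bits(popped_pc, expected_lr);

    /* Zero bits differ, or more than one bit differs. */
    if (diff == 0 || (diff & (diff - 1)) != 0) {
        return -1;
    }

    int index = 0;
    while ((diff >> index) != 1u) {
        index++;
    }
    return index;
}
```

With the values above, `differing_bits(0x6024a8fc, 0x6004a8fd)` gives 0x00200000, so the only mismatch is bit 21.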

Other curious things:
- The i2c transaction that causes the HARDFAULT is executed within a FreeRTOS task. If I execute that line before the scheduler starts, no problem occurs.
- If I skip that i2c transaction for IO expander, the next i2c transaction for AUDIO Codec initialization within the audio task of FreeRTOS triggers the same hardfault with the same bad PC value.
- All faulty boards have the same behaviour (same bad PC value)
- If I add some code somewhere, the bad PC is different but it always differs by only one bit with respect to LR.
- If I turn on compiler optimizations (I tried only for speed so far), the hardfault is not triggered.
- If I do this (enable wait mode for the TCM instead of fast mode):
((FLEXRAM_Type *)FLEXRAM_BASE)->TCM_CTRL |= FLEXRAM_TCM_CTRL_TCM_RWAIT_EN(1) | FLEXRAM_TCM_CTRL_TCM_WWAIT_EN(1);
the hardfault doesn't occur.

I also read through the latest errata, but nothing came up.

I no longer know what to do here...

Any suggestions?

mark82 · ‎07-14-2023

Hello.

We were only able to add one 220 nF ceramic bypass capacitor on the bottom side, at a point where we could scratch a via on the input of the internal regulator (where the voltage is about 1.1 V).

Marco

jingpan · ‎07-14-2023

Hi @mark82 ,

Sounds like VDD_SOC_INx. It's the digital power supply. But there are several VDD_SOC_INx pins.

Regards,

jingpan · ‎07-10-2023

Hi @mark82 ,

LR always keeps the real address + 1 (bit 0 is set to indicate Thumb state). This is defined by ARM. So I don't think the picture you posted shows a problem.

Such problems are usually very hard to debug. But it seems more likely to be a hardware or software settings problem. Have you made any new progress?

Regards,

Jing

mark82 · ‎07-12-2023

Hello jingpan. Thanks for your reply.
It is OK for the LSB to be different, but the strange thing is that bit 21 is always different.

Anyway, after more digging, we found that we can get other types of hardfaults, BUT always on the same value of PC. We contacted one of our assistant engineers, who provided us with the microcontrollers, and he pointed out that the board itself lacks some bypass capacitors on the bottom layer, right next to the MCU. Scratching the solder mask and fitting just one of those seems to make the problem disappear.
Also, without intervening on PCB, lowering the CPU core clock helps.
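
For comparing faults across boards, something like the sketch below could be used to log each fault in a uniform way. It is only an illustration under stated assumptions: `fault_record_t`, `make_fault_record`, `classify_cfsr` and `same_fault_site` are hypothetical names, and the code that captures the stack frame pointer and reads SCB->CFSR inside the real HardFault handler is not shown. It relies only on the standard Cortex-M layout: the basic exception frame is R0, R1, R2, R3, R12, LR, PC, xPSR, and CFSR holds the MemManage, BusFault and UsageFault status in bits 7:0, 15:8 and 31:16.

```c
#include <stdint.h>

/* Fields of the Cortex-M Configurable Fault Status Register (SCB->CFSR). */
#define CFSR_MMFSR_MASK UINT32_C(0x000000FF) /* MemManage status, bits 7:0    */
#define CFSR_BFSR_MASK  UINT32_C(0x0000FF00) /* BusFault status, bits 15:8    */
#define CFSR_UFSR_MASK  UINT32_C(0xFFFF0000) /* UsageFault status, bits 31:16 */

/* Word offsets inside the basic exception stack frame. */
enum { FRAME_LR = 5, FRAME_PC = 6 };

/* Fault classes as a bitmask, since several status fields can be set at once. */
enum {
    FAULT_NONE      = 0,
    FAULT_MEMMANAGE = 1 << 0,
    FAULT_BUS       = 1 << 1,
    FAULT_USAGE     = 1 << 2
};

/* Hypothetical record type: what gets logged for each fault. */
typedef struct {
    uint32_t stacked_pc;
    uint32_t stacked_lr;
    uint32_t cfsr;
} fault_record_t;

/* Build a record from the 8-word stacked frame and a CFSR snapshot. */
static fault_record_t make_fault_record(const uint32_t frame[8], uint32_t cfsr)
{
    fault_record_t rec = { frame[FRAME_PC], frame[FRAME_LR], cfsr };
    return rec;
}

/* Report which fault status fields have any bit set. */
static int classify_cfsr(uint32_t cfsr)
{
    int classes = FAULT_NONE;
    if (cfsr & CFSR_MMFSR_MASK) classes |= FAULT_MEMMANAGE;
    if (cfsr & CFSR_BFSR_MASK)  classes |= FAULT_BUS;
    if (cfsr & CFSR_UFSR_MASK)  classes |= FAULT_USAGE;
    return classes;
}

/* True when two faults were taken at the same stacked PC. */
static int same_fault_site(const fault_record_t *a, const fault_record_t *b)
{
    return a->stacked_pc == b->stacked_pc;
}
```

Two records with different `classify_cfsr` results but `same_fault_site` returning true would match what we see: different fault types, same PC.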

We still have another undesired effect, though, which is still to be investigated.
Anyway, does this honestly sound like a valid reason for a hardfault? Have you ever had to support such a situation?

Regards,
Marco

jingpan · ‎07-12-2023

Hi @mark82 ,

To be honest, this is the first time I have seen improper capacitor values cause such a phenomenon. But it's true that omitting these capacitors can make the system unstable.

Which cap did you add?

Regards,

Jing
