i.MX RT1060 HardFault UNDEFINSTR


MulattoKid
Contributor IV

Hi,

I'm working on a project with an i.MX RT1060 using external XIP NOR flash and ThreadX+FileX+LevelX. I'm experiencing HardFaults that the MCU reports as an UNDEFINSTR error (reported through the UsageFault Status Register).
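For readers following along, a minimal sketch of how the UsageFault Status Register bits can be decoded. It assumes the ARMv7-M layout, where the UFSR is the upper halfword of the CFSR at 0xE000ED28; the helper names `ufsr_from_cfsr` and `print_ufsr_flags` are hypothetical and not part of any SDK.

```c
#include <stdint.h>
#include <stdio.h>

/* UFSR bit positions as defined for ARMv7-M (Cortex-M7). */
#define UFSR_UNDEFINSTR (1u << 0)  /* undefined instruction */
#define UFSR_INVSTATE   (1u << 1)  /* invalid EPSR state, e.g. Thumb bit clear */
#define UFSR_INVPC      (1u << 2)  /* invalid PC load / bad EXC_RETURN */
#define UFSR_NOCP       (1u << 3)  /* coprocessor access not permitted */
#define UFSR_UNALIGNED  (1u << 8)  /* unaligned access trap */
#define UFSR_DIVBYZERO  (1u << 9)  /* divide-by-zero trap */

/* Hypothetical helper: the UFSR occupies bits [31:16] of the CFSR. */
static uint16_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}

/* Hypothetical helper: print the name of every flag set in the UFSR. */
static void print_ufsr_flags(uint16_t ufsr)
{
    static const struct { uint16_t mask; const char *name; } flags[] = {
        { UFSR_UNDEFINSTR, "UNDEFINSTR" }, { UFSR_INVSTATE,  "INVSTATE"  },
        { UFSR_INVPC,      "INVPC"      }, { UFSR_NOCP,      "NOCP"      },
        { UFSR_UNALIGNED,  "UNALIGNED"  }, { UFSR_DIVBYZERO, "DIVBYZERO" },
    };
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; ++i) {
        if (ufsr & flags[i].mask) {
            printf("%s\n", flags[i].name);
        }
    }
}
```

On the target, the input would typically be the CMSIS register `SCB->CFSR`, read inside the fault handler before anything clears it.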

The backtrace points to code in LevelX, but the faulting instruction reported by the HardFault handler (and the call stack reported through VSCode + the Cortex-Debug extension) is just a regular STR instruction storing something onto the stack. I also tried replicating this code by having it store to an invalid address, but that results in a Bus Fault, not a Usage Fault. There also doesn't seem to be any other strange instruction close to the faulting address.
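As an illustration of where a handler gets the faulting address from, here is a sketch of reading the basic exception frame that ARMv7-M hardware pushes on exception entry. It assumes no FPU state was stacked; the function and enum names are hypothetical.

```c
#include <stdint.h>

/* Word offsets inside the basic (non-FPU) ARMv7-M exception frame,
 * counted upward from the stack pointer active when the fault was taken. */
enum {
    FRAME_R0 = 0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

/* EXC_RETURN bit 2 tells which stack holds the frame: 1 = PSP, 0 = MSP. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0;
}

/* Stacked PC: the address a HardFault handler typically reports. */
static uint32_t stacked_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

/* Stacked LR: return address of the function that was executing. */
static uint32_t stacked_lr(const uint32_t *frame)
{
    return frame[FRAME_LR];
}
```

A handler would pass the MSP or PSP value (chosen with `frame_is_on_psp`) as `frame`. Comparing the stacked PC with the disassembly is a way to cross-check what the debugger's backtrace shows.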

What I'm wondering is if it might be that the MCU attempts to read data from flash while the flash is being written to/erased by LevelX, which isn't allowed. A hypothesis would be data/instruction prefetching.

We've placed the code from mflash_drv.c, fsl_flexspi.c and fsl_cache.c in RAM, as the SDK examples do. I see that interrupts are disabled inside the mflash driver, but I can't see anything that would prevent prefetching from reading flash while write/erase operations are being performed.

I'm wondering if this sounds familiar or if there's anything we're missing with our setup to avoid illegal reads?

Thanks,
Daniel

4 Replies

diego_charles
NXP TechSupport

Hi @MulattoKid 

Thank you for reaching out! 

Just to get more context, are you experiencing the faults whenever opening, writing or closing a file? For example, if you run the typical FileX test loop, do the faults occur in the same place?

Secondly, I presume that you took the levelx_flash SDK example as a reference. What is the difference between the demo and your implementation?

A suggestion to narrow this down is to link the whole application to SRAM (you might need to disable some parts of your implementation if the internal SRAM footprint is not enough). If there are no faults, the problem might be XIP execution and the FileX/LevelX driver.

Diego

MulattoKid
Contributor IV

I now have the scenario where the following code generates a HardFault:

 

60086e76 <_lx_nor_flash_driver_write>:
60086e76:	b5f8      	push	{r3, r4, r5, r6, r7, lr}
60086e78:	af00      	add	r7, sp, #0
60086e7a:	4686      	mov	lr, r0
60086e7c:	4608      	mov	r0, r1
60086e7e:	4611      	mov	r1, r2
60086e80:	461a      	mov	r2, r3 <---- HardFault happens here
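As a cross-check that the opcode halfwords in this listing really are plain register moves, here is a small decoder sketch for the 16-bit Thumb `MOV Rd, Rm` encoding (T1). The type and function names are hypothetical.

```c
#include <stdint.h>

/* Result of decoding one halfword as Thumb "MOV Rd, Rm" (encoding T1). */
typedef struct {
    int      valid;  /* 1 if the halfword matches the MOV T1 pattern */
    unsigned rd;     /* destination register number, 0..15 */
    unsigned rm;     /* source register number, 0..15 */
} thumb_mov_t;

/* Bit layout: 0100 0110 D Rm[3:0] Rd[2:0]; the destination is D:Rd. */
static thumb_mov_t decode_thumb_mov(uint16_t hw)
{
    thumb_mov_t m = { 0, 0u, 0u };
    if ((hw & 0xFF00u) == 0x4600u) {
        m.valid = 1;
        m.rm = (hw >> 3) & 0xFu;
        m.rd = ((hw >> 4) & 0x8u) | (hw & 0x7u);  /* bit 7 becomes bit 3 */
    }
    return m;
}
```

Fed with the halfwords above, 0x461a decodes to rd = 2, rm = 3 (mov r2, r3) and 0x4686 to rd = 14, rm = 0 (mov lr, r0), which matches the disassembler's output.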

 

Before the PUSH instruction the registers look like this:

Screenshot from 2025-01-30 09-31-46.png

and before the MOV instruction that triggers the fault the registers look like this:

Screenshot from 2025-01-30 09-33-16.png

I don't see any issues:

  • Before the PUSH instruction LR has the address + 1 to the instruction that called BL to branch to _lx_nor_flash_driver_write
  • PC is correct
  • SP matches PSP
  • XPSR has bit 24 set, indicating Thumb mode is active
  • All instructions are on addresses that are 2-byte aligned, as required by Thumb
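The register checks in the list above can be written down as small predicates. This is only a sketch: the function names are hypothetical, and the bit positions are those of ARMv7-M (the T bit is xPSR bit 24, and bit 0 of a return address in LR marks Thumb state).

```c
#include <stdint.h>

/* xPSR bit 24 (T) must be set on Cortex-M; a clear T bit leads to INVSTATE. */
static int xpsr_thumb_bit_set(uint32_t xpsr)
{
    return (int)((xpsr >> 24) & 1u);
}

/* A return address written to LR by BL has bit 0 set to indicate Thumb. */
static int lr_has_thumb_bit(uint32_t lr)
{
    return (int)(lr & 1u);
}

/* Address actually branched to when returning through LR. */
static uint32_t lr_return_address(uint32_t lr)
{
    return lr & ~1u;
}

/* Thumb instructions must start on a halfword boundary. */
static int is_halfword_aligned(uint32_t addr)
{
    return (addr & 1u) == 0u;
}
```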

I've disabled all interrupts, and the ThreadX thread from which this is running is the only thread that's been created at this point, and it's the first code the thread runs. The thread has 64KiB of stack [0x2026db70, 0x2027db70] and as the images show the SP is still close to the top of the stack before the push (0x2027d998).
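The stack figures quoted above can be checked with a few lines of arithmetic. This is a sketch using the bounds and SP value from this post, assuming a full-descending stack; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Thread stack bounds quoted in the post (64 KiB, full-descending). */
#define STACK_LOW  0x2026db70u
#define STACK_HIGH 0x2027db70u

/* SP is valid anywhere from the lowest address up to the empty-stack top. */
static int sp_in_stack(uint32_t sp)
{
    return sp >= STACK_LOW && sp <= STACK_HIGH;
}

/* Bytes already consumed below the top of the stack. */
static uint32_t stack_bytes_used(uint32_t sp)
{
    return STACK_HIGH - sp;
}

/* Bytes still available before the stack would overflow. */
static uint32_t stack_bytes_free(uint32_t sp)
{
    return sp - STACK_LOW;
}
```

With SP = 0x2027d998, only 0x1D8 (472) bytes are in use, and the six-register PUSH lowers SP by 24 bytes to 0x2027d980, so a stack overflow at this point looks unlikely.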

If I enter _lx_nor_flash_driver_write from the source code view and step over the initialization of the function call, I get a HardFault. If I instead step over individual instructions in the disassembly view, the core seems to have crashed: I'm unable to step over the MOV instruction that the fault points to, and I never reach our HardFault_Handler. The system seems to have hung.

As previously mentioned, adding or removing other seemingly unrelated code can make the HardFault go away and come back, even without the address of the instruction where the HardFault occurs changing.

MulattoKid
Contributor IV

I've now got a version of our code that deterministically generates a HardFault, specifically UNDEFINSTR.

  • I've commented out any writing and erasing of flash, so I doubt that reading from flash as it's being erased/written is the problem
  • I've tested on multiple custom HW boards, so it's not a problem with a specific board
  • By commenting in/out or adding NOP instructions the HardFault can appear in slightly different places, but usually in some part of FileX/LevelX
  • I've tried changing our MPU config to an invalid one, which generated a DACCVIOL fault. That makes sense and is different from the UNDEFINSTR fault I'm normally getting, so it doesn't seem to be an invalid access
  • The instructions on which the fault is generated are often a CMP and BEQ/BLS pair, but the addresses of the instructions seem to be correctly aligned to either 2 or 4 bytes depending on whether it's a normal or wide instruction (we're compiling with -mthumb)
  • I've dumped the region of flash where the code resides and compared it to the code we flash. It matches, so there hasn't been any corruption. The memory view when debugging in VSCode also shows the flash contents as correct (this is running from external flash [XIP])
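On the alignment point in the list above: in Thumb-2, wide (32-bit) instructions only need halfword alignment, just like 16-bit ones, so a wide instruction at an address that is not a multiple of 4 is not suspicious by itself. A sketch for telling the two sizes apart from the first halfword (the function name is hypothetical):

```c
#include <stdint.h>

/* A halfword starts a 32-bit Thumb-2 instruction when its top five bits
 * are 0b11101, 0b11110 or 0b11111; every other value is a complete
 * 16-bit instruction. */
static int thumb_is_32bit(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}
```

Walking a flash dump with this predicate from a known function entry is one way to confirm that the faulting address is a real instruction boundary and not the middle of a wide instruction.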

I'm starting to run out of ideas for things to investigate...do you have any suggestions?

Thanks,
Daniel

MulattoKid
Contributor IV

Hi @diego_charles,

Thanks for your quick reply!

I'm not the engineer who implemented this part, so it's a bit hard for me to say how it differs from the sample, but I believe the FileX and LevelX drivers were mostly copy-pasted from the sample.

In my last automated test to trigger the HardFault, it changed location and cause: it was now on an instruction calling _fx_utility_logical_sector_flush, with the fault reported as bit 3 in the UsageFault Status Register (NOCP, a coprocessor usage fault).

It seems it can happen at different places in the code, but we've only seen it in FileX and LevelX functions, and in call chains that are related to reading/writing from/to flash. I'll investigate if we can do as you suggest and run from RAM instead, but it'll require quite a lot of modifications.

However, in general I'm curious how the SDK ensures no data from flash is prefetched when doing erase/write operations on flash when running XIP? As mentioned in my original post I don't think disabling interrupts and having an instruction barrier is enough? I suppose that by placing the code in RAM the risk is quite low, as it's likely that the erase/write has finished before it becomes relevant to retrieve code/data from flash again, but I'd think it's possible?
