Hi,
I'm working on a project with an i.MX RT1060 using external XIP NOR flash and ThreadX + FileX + LevelX. I'm experiencing HardFaults that the MCU reports as an UNDEFINSTR error (reported through the UsageFault Status Register).
The backtrace points to code in LevelX, but the faulting instruction reported by the HardFault handler (and the call stack reported through VSCode + the Cortex-Debug extension) is just a regular STR instruction storing something onto the stack. I also tried replicating this code by having it store to an invalid address, but that results in a Bus Fault, not a Usage Fault. There also doesn't seem to be any other strange instruction close to the faulting address.
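For reference, on ARMv7-M the UsageFault bits sit in the upper halfword of CFSR (0xE000ED28). A minimal sketch of decoding them, assuming the raw CFSR value has already been captured in the HardFault handler; the helper names `ufsr_from_cfsr` and `ufsr_describe` are hypothetical and not part of any SDK:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* UFSR is bits [31:16] of CFSR on ARMv7-M. */
#define CFSR_UFSR_SHIFT  16u
#define UFSR_UNDEFINSTR  (1u << 0)  /* undefined instruction */
#define UFSR_INVSTATE    (1u << 1)  /* invalid EPSR state (e.g. Thumb bit clear) */
#define UFSR_INVPC       (1u << 2)  /* invalid PC load / EXC_RETURN */
#define UFSR_NOCP        (1u << 3)  /* coprocessor access not permitted */
#define UFSR_UNALIGNED   (1u << 8)  /* unaligned access trap */
#define UFSR_DIVBYZERO   (1u << 9)  /* divide-by-zero trap */

/* Extract the UsageFault status bits from a raw CFSR value. */
static uint16_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> CFSR_UFSR_SHIFT);
}

/* Write the names of all set UFSR flags, space separated, into buf. */
static const char *ufsr_describe(uint16_t ufsr, char *buf, size_t len)
{
    static const struct { uint16_t mask; const char *name; } flags[] = {
        { UFSR_UNDEFINSTR, "UNDEFINSTR" }, { UFSR_INVSTATE,  "INVSTATE"  },
        { UFSR_INVPC,      "INVPC"      }, { UFSR_NOCP,      "NOCP"      },
        { UFSR_UNALIGNED,  "UNALIGNED"  }, { UFSR_DIVBYZERO, "DIVBYZERO" },
    };
    size_t used = 0;

    if (len == 0) {
        return buf;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if ((ufsr & flags[i].mask) != 0 && used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", flags[i].name);
            if (n < 0) {
                break;
            }
            used += (size_t)n;
        }
    }
    return buf;
}
```

Logging the decoded flags together with the stacked PC on every fault makes it easier to see whether the cause really changes between runs.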
What I'm wondering is if it might be that the MCU attempts to read data from flash while the flash is being written to/erased by LevelX, which isn't allowed. A hypothesis would be data/instruction prefetching.
We've put the code for the mflash_drv.c, fsl_flexspi.c and fsl_cache.c into RAM as done by examples in the SDK. I see that interrupts are disabled inside the mflash driver, but I'm not sure I can see anything that would prohibit prefetching from reading from flash while write/erase operations are being performed.
I'm wondering if this sounds familiar or if there's anything we're missing with our setup to avoid illegal reads?
Thanks,
Daniel
Hi @MulattoKid
Thank you for reaching out!
Just to get more context: do the faults occur whenever you open, write, or close a file? For example, if you run the typical FileX test loop, do the faults occur in the same place?
Secondly, I presume that you took the levelx_flash SDK example as a reference. What is the difference between the demo and your implementation?
A suggestion to narrow this down is to link the whole application to SRAM (you might need to disable some parts of your implementation if the internal SRAM footprint is not enough). If there are no faults, the problem might be XIP execution together with the FileX/LevelX driver.
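Before relinking everything, it may also be worth confirming from the map file that every function on the flash program/erase path really ended up in RAM. A minimal sketch of that check, assuming the FlexSPI XIP window starts at 0x60000000 (consistent with the addresses in this thread) and a hypothetical 8 MiB flash size; `addr_in_xip` and `first_xip_resident` are illustrative names only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: XIP window base as seen in this thread; size is a placeholder
 * that has to be adjusted to the flash part actually fitted. */
#define XIP_BASE  0x60000000u
#define XIP_SIZE  0x00800000u  /* hypothetical 8 MiB */

/* True if a code address (Thumb bit ignored) lies inside the XIP window. */
static bool addr_in_xip(uint32_t addr)
{
    addr &= ~1u;  /* function pointers carry the Thumb bit in bit 0 */
    return addr >= XIP_BASE && (addr - XIP_BASE) < XIP_SIZE;
}

/* Index of the first address that still lives in XIP flash, or -1 if none. */
static int first_xip_resident(const uint32_t *addrs, int count)
{
    for (int i = 0; i < count; i++) {
        if (addr_in_xip(addrs[i])) {
            return i;
        }
    }
    return -1;
}
```

The list passed to `first_xip_resident` would hold the addresses of the flash driver functions and anything they call while the flash is busy, taken from the map file or from function pointers at run time.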
Diego
I now have the scenario where the following code generates a HardFault:
60086e76 <_lx_nor_flash_driver_write>:
60086e76: b5f8 push {r3, r4, r5, r6, r7, lr}
60086e78: af00 add r7, sp, #0
60086e7a: 4686 mov lr, r0
60086e7c: 4608 mov r0, r1
60086e7e: 4611 mov r1, r2
60086e80: 461a mov r2, r3 <---- HardFault happens here
Before the PUSH instruction the registers look like this:
and before the MOV instruction that triggers the fault the registers look like this:
I don't see any issues:
I've disabled all interrupts, and the ThreadX thread from which this is running is the only thread that's been created at this point, and it's the first code the thread runs. The thread has 64KiB of stack [0x2026db70, 0x2027db70] and as the images show the SP is still close to the top of the stack before the push (0x2027d998).
If I enter _lx_nor_flash_driver_write from the source code view and step over the initialization of the function call, I get a HardFault. If I instead step over individual instructions from the disassembly view, the core seems to have crashed: I'm unable to step over the MOV instruction that the fault points to, and I don't get into our HardFault_Handler. The system seems to have hung.
As previously mentioned, adding or removing other seemingly unrelated code can make the HardFault go away and come back, even without the address of the instruction where the HardFault occurs changing.
I've now got a version of our code that deterministically generates a HardFault, specifically UNDEFINSTR.
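As a sanity check on the listing above: 0x461a is a well-formed Thumb "MOV Rd, Rm" (encoding T1), so an UNDEFINSTR reported at that address suggests the core may not have seen the same halfword that the image contains. A minimal sketch of the decode, with `decode_thumb_mov_t1` being a hypothetical helper for comparing the listing against a halfword read back through the debugger:

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb MOV (register), encoding T1: 0100 0110 D Rm[3:0] Rd[2:0],
 * where the destination register number is D:Rd. */
typedef struct {
    uint8_t rd;  /* destination register number, 0..15 */
    uint8_t rm;  /* source register number, 0..15 */
} thumb_mov_t;

/* Returns true and fills *out if hw is a MOV (register) T1 encoding. */
static bool decode_thumb_mov_t1(uint16_t hw, thumb_mov_t *out)
{
    if ((hw & 0xFF00u) != 0x4600u) {
        return false;  /* some other instruction */
    }
    out->rm = (uint8_t)((hw >> 3) & 0xFu);
    out->rd = (uint8_t)(((hw >> 4) & 0x8u) | (hw & 0x7u));
    return true;
}
```

Applied to the listing, 0x461a decodes to rd = 2, rm = 3 (mov r2, r3) and 0x4686 to rd = 14, rm = 0 (mov lr, r0). If the halfword read back from 0x60086e80 at the time of the fault differs from 0x461a, that would point at the flash read path rather than at the code itself.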
I'm starting to run out of ideas for things to investigate...do you have any suggestions?
Thanks,
Daniel
Hi @diego_charles,
Thanks for your quick reply!
I'm not the engineer who implemented this part, so it's a bit hard for me to say how it differs from the sample, but I believe the FileX and LevelX drivers were mostly copy-pasted from the sample.
In my last automated test to trigger the HardFault, the location and cause changed: it now occurred on an instruction calling _fx_utility_logical_sector_flush, with the fault reported as bit 3 of the UsageFault Status Register (NOCP, the coprocessor usage fault).
It seems it can happen at different places in the code, but we've only seen it in FileX and LevelX functions, and in call chains that are related to reading/writing from/to flash. I'll investigate if we can do as you suggest and run from RAM instead, but it'll require quite a lot of modifications.
However, in general I'm curious how the SDK ensures that no data from flash is prefetched while erase/write operations are in progress when running XIP. As mentioned in my original post, I don't think disabling interrupts and having an instruction barrier is enough. I suppose that by placing the code in RAM the risk is quite low, as the erase/write has likely finished before it becomes relevant to fetch code/data from flash again, but I'd think it's still possible?
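To make the question concrete, this is the ordering I would expect around any program/erase call. It is only a sketch under my own assumptions, not a description of what the SDK does: the hook table and every name in it are hypothetical, and on target each hook (and `flash_op_guarded` itself) would have to run from RAM. The `demo_*` hooks only record the call order so the sequence can be checked on a host.

```c
#include <stdint.h>

/* Hypothetical hook table; on target these would wrap the real operations. */
typedef struct {
    uint32_t (*irq_disable)(void);        /* returns the previous mask */
    void     (*irq_restore)(uint32_t);    /* restores that mask */
    int      (*flash_op)(void *ctx);      /* program or erase, 0 on success */
    int      (*wait_idle)(void);          /* poll until flash is not busy */
    void     (*invalidate_caches)(void);  /* drop stale cache/prefetch data */
    void     (*barrier)(void);            /* e.g. DSB followed by ISB */
} flash_guard_hooks_t;

/* Run one flash operation with no XIP access allowed until it has finished. */
static int flash_op_guarded(const flash_guard_hooks_t *h, void *ctx)
{
    uint32_t mask = h->irq_disable();
    h->barrier();                 /* finish outstanding accesses first */
    int rc = h->flash_op(ctx);
    int idle = h->wait_idle();    /* do not return while flash is busy */
    h->invalidate_caches();       /* nothing stale may survive the op */
    h->barrier();
    h->irq_restore(mask);
    return rc != 0 ? rc : idle;
}

/* Host-side demo hooks: each appends one letter to a trace string. */
static char g_trace[16];
static unsigned g_len;

static void note(char c)
{
    if (g_len < sizeof g_trace - 1) {
        g_trace[g_len++] = c;
        g_trace[g_len] = '\0';
    }
}

static uint32_t demo_irq_disable(void)  { note('D'); return 0u; }
static void demo_irq_restore(uint32_t m) { (void)m; note('R'); }
static int  demo_flash_op(void *ctx)    { (void)ctx; note('F'); return 0; }
static int  demo_wait_idle(void)        { note('W'); return 0; }
static void demo_invalidate(void)       { note('I'); }
static void demo_barrier(void)          { note('B'); }

/* Runs the guarded sequence with the demo hooks and returns the call trace. */
static const char *flash_guard_demo(void)
{
    static const flash_guard_hooks_t hooks = {
        demo_irq_disable, demo_irq_restore, demo_flash_op,
        demo_wait_idle, demo_invalidate, demo_barrier,
    };
    g_len = 0;
    g_trace[0] = '\0';
    (void)flash_op_guarded(&hooks, 0);
    return g_trace;
}
```

What I can't tell from the driver source is whether anything in this sequence also stops a prefetch or speculative read that is issued by hardware rather than by an instruction, which is the case I'm worried about.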