Problem Statement: access to RAM by the flash peripheral during FLASH_DRV_Program() results in an IBUSERR
Background:
Observable behavior:
Processor hard faults during flash write procedure.
Problem 1. Initial investigation and fix:
The IBUSERR resulted from an access to a const flash_ssd_config_t * configuration structure instance that was declared static. The bus error was precise and the load instruction was explicit, occurring at flash_driver.c:95.
The static storage-class specifier on pSSDConfig was removed and the hard fault ceased to occur.
Problem 2. Subsequent investigation and fix:
A month later a similar problem occurred. This time, the bus error was imprecise. A stack dump and some guessing implicated the memory pointed to by pData in the function below:
status_t FLASH_DRV_Program(const flash_ssd_config_t * pSSDConfig, uint32_t dest, uint32_t size, const uint8_t * pData);
It too was declared static. Removing the static storage-class specifier again caused the problem to disappear.
Observations and Reproducibility:
Everything I found online about IBUSERR described it as the result of an illegal load, for example: https://wiki.segger.com/Cortex-M_Fault#Illegal_Function_Execution where illegal means a reserved address.
However, using readelf I found that removing the static storage-class specifier doesn't change the address of the given symbol in RAM. The only change in the readelf output is that the Bind column of the symbol changes from LOCAL to GLOBAL. Therefore, whether or not an access is illegal depends on more than just the address.
Reading:
I'm reading the ARMv7-M Architecture Reference Manual, which mentions that a MemManage exception might be triggered on illegal accesses. I've reverted the codebase to the state at which Problem 2 was observed and added a MemManage_Handler, but have not yet been able to reproduce the IBUSERR.
Question:
Hi,
This is not a problem of attributes. It is related to the callback function and a Read-While-Write error.
Mentioned line of code:
(pSSDConfig->CallBack)();\
... is a call to the callback function. Because you got IBUSERR, it's not a problem with RAM or with the pSSDConfig structure. It's a problem of instruction fetch, not data access. Either the callback address is not valid or the callback function is placed in program flash memory.
The program flash of the S32K142 consists of one read partition only. That means that while you program or erase program flash, code can run only from RAM or from data flash. If you access program flash (either by instruction fetch or by data access) during a program or erase operation, it will lead to a bus error. So, make sure that the callback function is placed in RAM and that it does not access program flash.
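As a rough illustration of that last point, a callback can be pinned to a RAM section with a GCC section attribute. This is only a sketch under assumptions: the section name ".code_ram" must match whatever the project's linker script actually defines, RAM_FUNC and flash_wait_callback are hypothetical names, and the watchdog refresh is replaced by a simple counter so that the snippet also builds on a host.

```c
#include <stdint.h>

/* Assumption: the linker script defines a RAM-resident code section named
 * ".code_ram" and the startup code copies it from flash to RAM.
 * On non-ARM builds the macro expands to nothing so the sketch compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define RAM_FUNC __attribute__((section(".code_ram"), noinline))
#else
#define RAM_FUNC
#endif

/* Hypothetical stand-in for a watchdog refresh. The counter lives in RAM,
 * so touching it does not read program flash. */
static volatile uint32_t g_callback_count;

/* Hypothetical callback to be stored in pSSDConfig->CallBack. Everything it
 * calls (including compiler helper routines) must also reside in RAM. */
RAM_FUNC void flash_wait_callback(void)
{
    g_callback_count++;
}
```

Whether the function really ended up in RAM still has to be confirmed in the map file or with readelf, because a section name that the linker script does not know may simply be placed in flash.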
Let me also explain how to find the root cause when IBUSERR occurs:
For test purposes, I can jump to an invalid address (somewhere beyond the end of the flash, for example):
typedef void (*func_ptr)(void); // pointer to function type
(*(func_ptr)0x00080000)();
When running this code, the fault handler is triggered and I can see that IBUSERR is set:
Now it's time to check the stack content. You can take a look at Figure 2 in:
https://www.nxp.com/docs/en/application-note/AN12201.pdf
... which shows the stack frame. What I can see in my debugger:
This stack frame is created when the exception is triggered. The most interesting entries are the program counter PC (captured at the moment the exception is triggered) and the link register LR, which in this case holds the address of the instruction right after the one that caused the error (i.e. the return address).
This is what I can see at address 0x6FC (bit 0 of the stacked value is set because of the Thumb instruction set, so the value is 0x6FD):
Now I can see that it was a jump to the address stored in r3. This can also be seen in the stack frame: r3 still contains this address, and PC also shows that this was the problem.
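For reference, the layout of that stack frame can be written down as a C structure. The sketch below assumes the basic eight-word ARMv7-M frame (no floating-point context); the type and helper names are hypothetical and not part of any SDK.

```c
#include <stdint.h>

/* Basic ARMv7-M exception frame, lowest address first, as pushed by the
 * core on exception entry (no FPU context assumed). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR at the time of the fault */
    uint32_t pc;   /* address of the instruction that faulted */
    uint32_t xpsr;
} exception_frame_t;

/* Hypothetical helper: clear the Thumb bit (bit 0) so the value can be
 * looked up directly in a disassembly listing. */
static inline uint32_t code_address(uint32_t value)
{
    return value & ~UINT32_C(1);
}

/* Hypothetical heuristic: if the faulting PC equals r3, an indirect call
 * through r3 (for example "blx r3") probably went to a bad target. */
static inline int jumped_via_r3(const exception_frame_t *frame)
{
    return code_address(frame->pc) == code_address(frame->r3);
}
```

With a stacked LR of 0x6FD, code_address() yields 0x6FC, which is the address to inspect in the disassembly.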
And now I can check also this address:
Here I can see that the bus error was triggered because I jumped to an unimplemented address space.
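The first step of this procedure, checking which fault flag is set, can also be done in code. The sketch below classifies a Configurable Fault Status Register value using the BusFault bit positions from the ARMv7-M architecture (CFSR bits 15:8). The enum and function names are hypothetical; on the target the value would be read from the CFSR at address 0xE000ED28, here it is passed in as a parameter.

```c
#include <stdint.h>

/* BusFault status flags inside CFSR (ARMv7-M, bits 15:8). */
#define CFSR_IBUSERR      (UINT32_C(1) << 8)   /* instruction fetch error */
#define CFSR_PRECISERR    (UINT32_C(1) << 9)   /* precise data access error */
#define CFSR_IMPRECISERR  (UINT32_C(1) << 10)  /* imprecise data access error */
#define CFSR_BUSFAULT_MSK UINT32_C(0x0000FF00)

typedef enum {
    BUS_FAULT_NONE,
    BUS_FAULT_INSTRUCTION_FETCH,
    BUS_FAULT_PRECISE_DATA,
    BUS_FAULT_IMPRECISE_DATA,
    BUS_FAULT_OTHER              /* e.g. stacking or unstacking errors */
} bus_fault_kind_t;

/* Hypothetical helper: map a CFSR value to the kind of bus fault. */
static bus_fault_kind_t classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_IBUSERR) {
        return BUS_FAULT_INSTRUCTION_FETCH;
    }
    if (cfsr & CFSR_PRECISERR) {
        return BUS_FAULT_PRECISE_DATA;
    }
    if (cfsr & CFSR_IMPRECISERR) {
        return BUS_FAULT_IMPRECISE_DATA;
    }
    if (cfsr & CFSR_BUSFAULT_MSK) {
        return BUS_FAULT_OTHER;
    }
    return BUS_FAULT_NONE;
}
```

A result of BUS_FAULT_INSTRUCTION_FETCH points at the code being executed (such as a callback target) rather than at the data being read or written.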
Regards,
Lukas
Thanks Lukas.
This troubleshooting is made difficult by the intermittent nature of the failure as well as the handful of possible mistakes I've made. This statement of yours is very helpful:
Either the callback address is not valid or the callback function is placed to program flash memory.
1. Either the callback address is not valid
Perhaps guilty. Yesterday I found a function with a 1kB buffer as a stack variable. The stack itself is only 1kB. It was a leaf function so it didn't result in stack corruption. The buffer overflowed into the heap (also 1kB). However, this application doesn't use heap memory. This is a real bug but it doesn't seem to be directly related to the failure.
2. or the callback program is placed to [in?] program flash memory
Definitely guilty. pSSDConfig->Callback was initially placed in program flash. I misunderstood the documentation as simply a warning that it not be placed in a program flash sector subject to read/write. I now understand the meaning of the documentation to be that this callback (in my case a watchdog trigger) must not be placed in program flash at all.
When troubleshooting Problem 1 I recall applying __attribute((section(".ram"))) to the pSSDConfig->Callback function to place it in RAM, and finding that it had no effect. Because the IBUSERR still occurred I removed the __attribute.
Yesterday I again placed pSSDConfig->Callback in RAM. This time I used readelf to check that the function was in fact placed in RAM. I found that it was not, and that my linker script had no .ram section. In that particular linker script it was called .code_ram. I applied __attribute((section(".code_ram"))) and confirmed that the function was relocated. It is likely that I never actually relocated the function during the troubleshooting of Problem 1.
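A runtime check along the following lines might catch a misnamed section earlier than a manual readelf inspection. It is only a sketch: the bounds below are assumed placeholder values and must be replaced with the RAM region from the actual linker script, and address_in_code_ram is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED placeholder bounds for the RAM region that holds code.
 * Take the real values from the project's linker script. */
#define CODE_RAM_START UINT32_C(0x1FFFC000)
#define CODE_RAM_END   UINT32_C(0x20003000)  /* exclusive */

/* Hypothetical helper: true if a code address lies inside the RAM region. */
static bool address_in_code_ram(uintptr_t addr)
{
    addr &= ~(uintptr_t)1;  /* ignore the Thumb bit of a function pointer */
    return addr >= CODE_RAM_START && addr < CODE_RAM_END;
}
```

On the target this could be called as address_in_code_ram((uintptr_t)pSSDConfig->CallBack) before starting a flash operation, refusing to proceed if it returns false.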
I'm now able to do an A-B test by removing __attribute((section(".code_ram"))) from pSSDConfig->Callback and observing an intermittent IBUSERR. It doesn't occur on every flash operation.
There still remains an issue where FLASH_DRV_VerifySection(...) occasionally returns STATUS_ERROR. I will proceed under the assumption that this is not related to the issue described in this thread.
Summary:
Thanks also Lukas for your note on how to find the root cause of an IBUSERR. I will return to it in future.