access to static variable in RAM by flash peripheral results in IBUSERR

driftregion
Contributor I

Problem Statement: Access to RAM by the flash peripheral during FLASH_DRV_Program() results in an IBUSERR.

Background:

  • processor: S32K142
  • toolchain: arm-none-eabi-gcc, version 10

Observable behavior:

Processor hard faults during flash write procedure.

Problem 1. Initial investigation and fix:

An IBUSERR results from accessing a const flash_ssd_config_t * configuration structure instance that was declared static. The bus error was precise and the load instruction was explicit, occurring at flash_driver.c:95:

DISABLE_CHECK_RAMSECTION_FUNCTION_CALL\
(pSSDConfig->CallBack)();\
ENABLE_CHECK_RAMSECTION_FUNCTION_CALL\
 

The static storage-class specifier on pSSDConfig was removed and the hard fault ceased to occur.

Problem 2. Subsequent investigation and fix:

A month later a similar problem occurred. This time, the bus error was imprecise. A stack dump and some guessing implicated the memory pointed to by pData in the function below:

status_t FLASH_DRV_Program(const flash_ssd_config_t * pSSDConfig, uint32_t dest, uint32_t size, const uint8_t * pData);

It too was declared static. Removing the static storage-class specifier again made the problem disappear.

Observations and Reproducibility:

Everything I found about IBUSERR online was the result of an illegal load, for example: https://wiki.segger.com/Cortex-M_Fault#Illegal_Function_Execution where illegal means a reserved address.

However, using readelf I found that removing the static storage-class specifier doesn't change the address of the given symbol in RAM. The only change in the readelf output is that the Bind column of the symbol changes from LOCAL to GLOBAL. Therefore, whether or not an access is illegal seems to depend on more than just the address.

  • There seems to be a strong relationship between the binding (LOCAL/GLOBAL) of the memory passed to FLASH_DRV_Program() and the occurrence of an IBUSERR.
  • Some additional input beyond the binding of the memory passed to FLASH_DRV_Program() also contributes to the IBUSERR. For example, when the codebase is reverted to the state immediately after the fix for Problem 1, no IBUSERR has been observed. When the codebase is in the state at which Problem 2 was observed, the IBUSERR does not occur on every write. The frequency of occurrence is sometimes high and sometimes low.

Reading:

I'm reading the ARMv7-M Architecture Reference Manual:

  • A3.5 Memory types
  • A3.6 Access rights

which mentions that a MemManage exception might be triggered on illegal accesses.  I've reverted the codebase to the state at which Problem 2 was observed and added a MemManage_Handler but have not yet been able to successfully reproduce the IBUSERR.

Question:

  • What attribute other than its value makes an address illegal such that loading it results in an IBUSERR?
lukaszadrapa
NXP TechSupport

Hi,

This is not a problem of attributes. It is related to the callback function and a Read-While-Write error.

Mentioned line of code:

(pSSDConfig->CallBack)();\

... is a call to the callback function. Because you got an IBUSERR, the problem is not the RAM or the pSSDConfig structure. It is a problem of instruction fetch, not data access. Either the callback address is not valid or the callback function is placed to program flash memory.

The program flash of the S32K142 consists of a single read partition. That means that while you program or erase program flash, code can run only from RAM or from data flash. Any access to program flash (either instruction fetch or data access) during a program or erase operation leads to a bus error. So make sure that the callback function is placed in RAM and that it does not access program flash.
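As a rough sketch of that rule, a sanity check on the callback address could look like the following. The program flash window (256 KB starting at address 0) is an assumption for the S32K142 and should be checked against the reference manual, and both helper names are hypothetical, not part of the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: S32K142 program flash occupies 256 KB starting at address 0.
 * Verify both values against the reference manual for your part. */
#define PFLASH_BASE 0x00000000u
#define PFLASH_SIZE 0x00040000u

/* Hypothetical helper: true if addr lies inside program flash.
 * Bit 0 (the Thumb bit of a function pointer) is ignored. */
static inline bool addr_in_program_flash(uint32_t addr)
{
    addr &= ~1u;
    return (addr - PFLASH_BASE) < PFLASH_SIZE;
}

/* Hypothetical pre-flight check: a callback invoked while program flash is
 * being programmed or erased must not be fetched from program flash. */
static inline bool callback_safe_during_flash_op(uint32_t callback_addr)
{
    return !addr_in_program_flash(callback_addr);
}
```

On the target, the argument would be something like (uint32_t)pSSDConfig->CallBack, checked before calling FLASH_DRV_Program(). The check only rules out program flash. It does not prove that the address is valid, and it says nothing about what the callback itself reads.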

Let me also explain how to find the root cause when an IBUSERR occurs:

For test purposes, I can jump to an invalid address (somewhere beyond the end of flash, for example):

typedef void (*func_ptr)(void); // pointer to function type

(*(func_ptr)0x00080000)();

When running this code, the fault handler is triggered and I can see that IBUSERR is set:

[Screenshot: debugger view showing IBUSERR set]
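The same check can be done in code instead of a debugger view. This is a minimal sketch assuming the standard ARMv7-M layout of the Configurable Fault Status Register (CFSR at 0xE000ED28, with the BusFault bits in bits 8 to 15). The enum and the function name are hypothetical.

```c
#include <stdint.h>

/* CFSR address and BusFault bit positions as defined by ARMv7-M. */
#define SCB_CFSR_ADDR    0xE000ED28u
#define CFSR_IBUSERR     (1u << 8)   /* bus error on instruction fetch */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */

/* Hypothetical classification of the BusFault cause. */
typedef enum {
    BUSFAULT_NONE,
    BUSFAULT_INSTRUCTION_FETCH,
    BUSFAULT_PRECISE_DATA,
    BUSFAULT_IMPRECISE_DATA
} busfault_kind_t;

/* Takes the CFSR value as a parameter so the logic can be exercised off-target.
 * On the target: classify_busfault(*(volatile uint32_t *)SCB_CFSR_ADDR). */
static inline busfault_kind_t classify_busfault(uint32_t cfsr)
{
    if (cfsr & CFSR_IBUSERR)     { return BUSFAULT_INSTRUCTION_FETCH; }
    if (cfsr & CFSR_PRECISERR)   { return BUSFAULT_PRECISE_DATA; }
    if (cfsr & CFSR_IMPRECISERR) { return BUSFAULT_IMPRECISE_DATA; }
    return BUSFAULT_NONE;
}
```

A result of BUSFAULT_INSTRUCTION_FETCH points at the address being executed, not at the data being loaded or stored.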

Now it's time to check the stack content. You can take a look at Figure 2 in:

https://www.nxp.com/docs/en/application-note/AN12201.pdf

... which shows the stack frame. What I can see in my debugger:

[Screenshot: stacked exception frame in the debugger]

This stack frame is created when the exception is triggered. The most interesting entries are the program counter PC (captured at the moment the exception is triggered) and the link register LR, which in this case holds the address of the instruction right after the one that caused the error (i.e. the return address).
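For reference, the basic frame described here can be written down as a small sketch. It assumes the eight-word ARMv7-M exception frame without floating-point context. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Basic ARMv7-M exception frame, in the order the core pushes it
 * (lowest address first). Assumes no floating-point context is stacked. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Hypothetical helper: copy the eight stacked words starting at the stack
 * pointer that was active when the fault was taken. */
static inline exception_frame_t read_exception_frame(const uint32_t *sp)
{
    exception_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                            sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Code addresses held in registers have bit 0 set in Thumb state;
 * clear it to get the real instruction address. */
static inline uint32_t strip_thumb_bit(uint32_t addr)
{
    return addr & ~1u;
}
```

With a stacked LR of 0x6FD, strip_thumb_bit() gives 0x6FC, the return address to look up in the disassembly.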

Here is what I can see at this address, 0x6FC (the stacked value is 0x6FD because the last bit is set for the Thumb instruction set):

[Screenshot: disassembly around address 0x6FC]

Now I can see that it was a jump to the address stored in r3. This can also be seen in the stack frame: r3 still contains this address, and PC also shows that this was the problem.

And now I can also check this address:

[Screenshot: debugger view of the target address]

Here I can see that the bus error was triggered because I jumped to unimplemented address space.

Regards,

Lukas

driftregion
Contributor I

Thanks Lukas.

This troubleshooting is made difficult by the intermittent nature of the failure as well as the handful of possible mistakes I've made. This statement by you is very helpful:

Either the callback address is not valid or the callback function is placed to program flash memory.

1. Either the callback address is not valid

Perhaps guilty. Yesterday I found a function with a 1kB buffer as a stack variable. The stack itself is only 1kB. It was a leaf function so it didn't result in stack corruption. The buffer overflowed into the heap (also 1kB). However, this application doesn't use heap memory. This is a real bug but it doesn't seem to be directly related to the failure.

2. or the callback program is placed to [in?] program flash memory

Definitely guilty. pSSDConfig->CallBack was initially placed in program flash. I misunderstood the documentation as simply a warning that it not be placed in a program flash sector subject to read/write. I now understand the documentation to mean that this callback (in my case a watchdog trigger) must not be placed in program flash at all.

When troubleshooting Problem 1 I recall applying __attribute((section(".ram"))) to pSSDConfig->CallBack to place it in RAM and finding it to have no effect. Because the IBUSERR still occurred, I removed the attribute.

Yesterday I again tried to place pSSDConfig->CallBack in RAM. This time I used readelf to check that the function was in fact placed in RAM. I found that it was not, and that my linker script had no .ram section. In that particular linker script the section was called .code_ram. I applied __attribute((section(".code_ram"))) and confirmed that the function was relocated. It is likely that I never actually relocated the function during the troubleshooting of Problem 1.
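A minimal sketch of that placement, assuming a GCC toolchain, a linker script that defines a section named .code_ram, and startup code that copies that section into RAM before it is used. The macro, the function name and the counter are hypothetical stand-ins for a real watchdog trigger.

```c
#include <stdint.h>

/* Apply the section attribute only when building for ARM with GCC, so the
 * same file still compiles on a host for unit tests. The section name must
 * match the linker script exactly (".code_ram" in this thread). */
#if defined(__GNUC__) && defined(__arm__)
#define CODE_RAM __attribute__((section(".code_ram"), noinline))
#else
#define CODE_RAM
#endif

/* Stand-in for the watchdog refresh: a counter that lives in RAM. */
static volatile uint32_t wdog_refresh_count;

void flash_wdog_callback(void);

/* Hypothetical callback to assign to pSSDConfig->CallBack. It touches only
 * RAM, so it causes no program flash access while it runs. */
CODE_RAM void flash_wdog_callback(void)
{
    wdog_refresh_count++;
}
```

A misspelled section name does not necessarily cause a build error, so it is worth confirming the final address of the function with readelf after every change to the attribute or the linker script.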

I'm now able to do an A-B test by removing __attribute((section(".code_ram"))) from pSSDConfig->CallBack and observing an intermittent IBUSERR. It doesn't occur on every flash operation.

There still remains an issue where FLASH_DRV_VerifySection(...) occasionally returns STATUS_ERROR. I will proceed under the assumption that this is not related to the issue described in this thread.

Summary:

  • the GNU ld option --orphan-handling=error (passed through GCC as -Wl,--orphan-handling=error) may be able to flag input sections that the linker script does not place, although it may not yet be supported in arm-none-eabi-gcc
  • the GCC flag -Wframe-larger-than=1024 may be used to warn about functions whose stack frame exceeds 1024 bytes, e.g. because of large local variables
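To illustrate the second point, here is a sketch of the kind of function that flag is meant to catch, with the large buffer moved to static storage. The function name and buffer size are made up for illustration. With the buffer declared as a local variable instead, GCC would be expected to report the frame size when -Wframe-larger-than=1024 is given.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 1024u

/* Declared as a local variable, this buffer alone would consume the whole
 * 1 kB stack mentioned in this thread. Static storage keeps it off the
 * stack, at the cost of making the function non-reentrant. */
static uint8_t scratch[SCRATCH_SIZE];

/* Hypothetical example function: fills the scratch buffer with one byte
 * value and returns the number of bytes written. */
size_t fill_scratch(uint8_t value)
{
    memset(scratch, value, sizeof scratch);
    return sizeof scratch;
}
```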

Thanks also Lukas for your note on how to find the root cause of an IBUSERR. I will return to it in future.
