Bus Fault Exception on accessing internal flash


bjoernjohanness
Contributor III

We're using a Kinetis K61 which causes a Bus Fault Exception (exception 5) when accessing a location in the internal flash (address 0xC0010).

As far as I can tell from the UM, the Kinetis internal flash does not have any error detection or correction, correct? Yet I get the exception. How? I've been searching high and low for register flags that would give me more detail, but there seem to be none. I've tried turning off the cache for the whole internal flash area, without any difference. The area is corrupt, but how can the K61 know that without ECC? BTW, the debugger also fails to read that same address. Mind-boggling :smileyhappy:

Here is the exact instruction that causes it, followed by a printout from my UnhandledException handler, which captures state at the time of the crash:

0x0809EB58 E8B15018  LDM           r1!,{r3-r4,r12,lr}

-------------------------------------------------------------------------------------

10:32:33.479> >Unhandled Exception!

10:32:33.479> SP:   0x1FFFCAF8

10:32:33.479> PC:   0x0809EB58(0x5018E8B1)

10:32:33.479> LR:   0x080C905B

10:32:33.479> PSR:  0x61000000

10:32:33.479> IRQ:  0 BASEPRI: 0x00

10:32:33.479> TASK: 30

10:32:33.479> ICSR: 0x00000805->Act:5,Pend:0

10:32:33.479> CFSR: 0x00008200 FSTAT:80

10:32:33.479> FAR:  0x000C0010<-Valid

10:32:33.479> SHCSR: 0x00070002

10:32:33.479> HFSR:  0x00000000 MCG: 0x0040026C

10:32:33.479> DFSR:  0x00000000 PMC: 0x00042231

10:32:33.479>   R0:  0x1FFFCB60 R7: 0x00000004

10:32:33.479>   R1:  0x000C0008 R8: 0x1FFFCB60

10:32:33.662>   R2:  0x00000000 R9: 0x000C0028

10:32:33.662>   R3:  0x00000038 R10: 0xAAAAAA1E

10:32:33.662>   R4:  0x6946794D R11: 0x00000000

10:32:33.662>   R5:  0x00000020 R12: 0x00000000

10:32:33.662>   R6:  0x000C0008 PSR: 0x61000000

10:32:33.662>  (LR:  0xFFFFFFFD PSR: 0x60000005)

10:32:33.662>

10:32:33.662> DDRCR30:00000001 IABR[0]:00000000

10:32:33.662> MCM_ISR:00020000 IABR[1]:00000000

10:32:33.662> MCM_FAD:28AA0020 IABR[2]:00000000

10:32:33.662> MCM_FAT:000001B3 IABR[3]:00000000

10:32:33.662> MCM_FDR:A8118003 MPU_CES:0001

10:32:33.662> DMA0_ES:00000000 MPU_ED0:00000000,EA0:00000000

10:32:33.662> DMA0_ER:00000000 MPU_ED1:00000000,EA1:00000000

10:32:33.662> SPI0_SR:C0004100 MPU_ED2:00000000,EA2:00000000

10:32:33.662> SPI1_SR:C2020303 MPU_ED3:00000000,EA3:00000000

10:32:33.662> SPI2_SR:00000000 MPU_ED4:00000000,EA4:00000000
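
For readers decoding the dump above, here is a minimal C sketch, assuming the standard ARMv7-M bit layout of the CFSR and SHCSR registers. The macro and function names are made up for illustration and do not come from any vendor header. Applied to the values in the dump, CFSR 0x00008200 has PRECISERR and BFARVALID set (a precise data bus error whose address is latched in BFAR, printed as FAR above), and SHCSR 0x00070002 shows the BusFault handler both enabled and active.

```c
#include <stdbool.h>
#include <stdint.h>

/* BusFault status bits inside CFSR (bits 15:8), ARMv7-M layout. */
#define CFSR_IBUSERR      (1u << 8)   /* instruction bus error */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error */
#define CFSR_UNSTKERR     (1u << 11)  /* fault on exception return unstacking */
#define CFSR_STKERR       (1u << 12)  /* fault on exception entry stacking */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address */

/* BusFault bits inside SHCSR. */
#define SHCSR_BUSFAULTACT (1u << 1)   /* BusFault handler is active */
#define SHCSR_BUSFAULTENA (1u << 17)  /* BusFault handler is enabled */

/* True when the fault is a precise data access error with a valid BFAR. */
bool cfsr_is_precise_data_fault(uint32_t cfsr)
{
    const uint32_t need = CFSR_PRECISERR | CFSR_BFARVALID;
    return (cfsr & need) == need;
}

/* True when BusFault is enabled; otherwise it escalates to HardFault. */
bool shcsr_busfault_enabled(uint32_t shcsr)
{
    return (shcsr & SHCSR_BUSFAULTENA) != 0u;
}

/* True when the BusFault handler is the one currently running. */
bool shcsr_busfault_active(uint32_t shcsr)
{
    return (shcsr & SHCSR_BUSFAULTACT) != 0u;
}
```

None of these bits says "flash" or "ECC"; they only say that a data read at the BFAR address was answered with a bus error.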

20 Replies

GlebPlekhotko
Contributor II

A little addendum. 

I've checked the FLASH->INT_STATUS register: its ECC_ERR bit becomes set immediately after the FLASH_Init function call, even when the microcontroller has just been erased with Segger J-Flash. A call to the FLASH_Erase function does not change a thing. If nothing is written there, why is there an ECC error? Could somebody explain?

Here is a code snippet:

 

#define FS_START_ADDR 0x0007B600

uint32_t flashBuf[FLASH_PAGE_SIZE / sizeof(uint32_t)];

void flashMemTest(void)
{
    uint32_t flashWord;
    uint32_t flashStatus = FLASH->INT_STATUS;  // Value is 0x00000004

    FLASH_Init(&flashConfig);
    flashConfig.modeConfig.readSingleWord.readWithEccOff = kFLASH_ReadWithEccOff;
    flashStatus = FLASH->INT_STATUS;  // Value is 0x0000000C
    FLASH->INT_CLR_STATUS = 0x0000000F;

    FLASH_Erase(&flashConfig, FS_START_ADDR, FLASH_PAGE_SIZE, kFLASH_ApiEraseKey);
    flashStatus = FLASH->INT_STATUS;  // Value is 0x0000000C
    FLASH->INT_CLR_STATUS = 0x0000000F;

    flashBuf[0] = 0x11223344;
    FLASH_Program(&flashConfig, FS_START_ADDR, (uint8_t *)flashBuf, sizeof flashBuf);
    flashStatus = FLASH->INT_STATUS;    // Value is 0x0000000C
    FLASH->INT_CLR_STATUS = 0x0000000F;
    flashWord = *((uint32_t*)FS_START_ADDR + 0);

    flashBuf[1] = 0x55667788;
    FLASH_Program(&flashConfig, FS_START_ADDR, (uint8_t *)flashBuf, sizeof flashBuf);
    flashStatus = FLASH->INT_STATUS;  // Value is 0x0000000C
    flashWord = *((uint32_t*)FS_START_ADDR + 1);

    while(1);
}

 

 

Note that there is no bus fault in the example above.

GlebPlekhotko
Contributor II

Almost nine years later it is my turn to run into this issue.  

Now it is the LPC5528 and, just like the original author of this thread, I find that accessing flash with an LDR instruction triggers a "Bus Fault" or "Hard Fault" exception under certain conditions. Although I'm fairly sure the issue cannot currently be "fixed", I'm still looking forward to feedback from the support team, because the available documentation offers no clear explanation, or spreads it across so many documents that it is hard to piece together. So maybe the following few lines will help future readers who run into this issue.

First, how I actually ran into it. It is an outcome of my attempts to save erase/program cycles, which I've already described in this thread. Long story short, I was going to erase a page once and then fill the resulting pool of erased space until it ran out, then erase the page and start again from the beginning. That is how I hit the issue described here.

Note that the exact type of exception raised depends on your configuration. The "Bus Fault" exception fires only if it is enabled in the SHCSR register, which is part of the System Control Block of the Cortex-M33. Otherwise a "Hard Fault" exception takes its place. The issue can be reproduced using the following sequences.

Sequence 1 — Bus Fault — Read after erase

 

 

#define FS_START_ADDR 0x0007B600

uint32_t flashBuf[FLASH_PAGE_SIZE / sizeof(uint32_t)];

void flashMemTest(void)
{
  uint32_t flashWord;

  FLASH_Init(&flashConfig);
  flashConfig.modeConfig.readSingleWord.readWithEccOff = kFLASH_ReadWithEccOff;

  FLASH_Erase(&flashConfig, FS_START_ADDR, FLASH_PAGE_SIZE, kFLASH_ApiEraseKey);
  flashWord = *((uint32_t*)FS_START_ADDR);  // BUS FAULT

  while(1);
}

 

 

 

Sequence 2 — Bus Fault — Read after the second program

 

 

#define FS_START_ADDR 0x0007B600

uint32_t flashBuf[FLASH_PAGE_SIZE / sizeof(uint32_t)];

void flashMemTest(void)
{
    uint32_t flashWord;

    FLASH_Init(&flashConfig);
    flashConfig.modeConfig.readSingleWord.readWithEccOff = kFLASH_ReadWithEccOff;

    FLASH_Erase(&flashConfig, FS_START_ADDR, FLASH_PAGE_SIZE, kFLASH_ApiEraseKey);

    flashBuf[0] = 0x00000000;
    FLASH_Program(&flashConfig, FS_START_ADDR, (uint8_t *)flashBuf, sizeof flashBuf);
    flashWord = *((uint32_t*)FS_START_ADDR + 0);

    flashBuf[1] = 0x11223344;
    FLASH_Program(&flashConfig, FS_START_ADDR, (uint8_t *)flashBuf, sizeof flashBuf);
    flashWord = *((uint32_t*)FS_START_ADDR + 1);  // BUS FAULT
}

 

 

 

From my standpoint, the ECC module seems to be responsible for this behavior. In particular, if in the second sequence one changes "flashBuf[0] = 0x00000000" to "flashBuf[0] = 0x11223344", no fault is triggered. It is also very confusing that a freshly erased page cannot be accessed. It would be fine to have it populated with 0x00 (which it actually is) or 0xFF.
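
To illustrate why an ECC-style check could behave this way, here is a toy C model. It is only a sketch under stated assumptions: the real check code of the device is not described anywhere in this thread, the 8-bit XOR check below is invented for illustration, and the model assumes cells that erase to 1 and can only be programmed from 1 to 0 (on a part whose erased state reads as 0 the argument is mirrored). All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One toy flash word: 32 data bits plus 8 check bits stored alongside. */
typedef struct {
    uint32_t data;
    uint8_t  check;
} toy_word_t;

/* Invented check code: XOR of the four data bytes. */
uint8_t toy_check(uint32_t d)
{
    return (uint8_t)(d ^ (d >> 8) ^ (d >> 16) ^ (d >> 24));
}

/* Erase sets every cell, data and check alike, to 1. */
void toy_erase(toy_word_t *w)
{
    w->data  = 0xFFFFFFFFu;
    w->check = 0xFFu;
}

/* Programming can only clear bits, so new values are ANDed into the cells. */
void toy_program(toy_word_t *w, uint32_t value)
{
    w->data  &= value;
    w->check &= toy_check(value);
}

/* A read is "clean" only if the stored check bits match the stored data. */
bool toy_read_ok(const toy_word_t *w)
{
    return toy_check(w->data) == w->check;
}
```

In this model a freshly erased word already fails the check, because all-ones data does not pair with all-ones check bits. A word programmed once passes, programming the identical value again still passes, and programming a different value on top leaves data and check bits that no longer agree. That mirrors the pattern described above, but it is not evidence of what the hardware actually does.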

Nevertheless, accessing flash through the Flash API works under any conditions: no faults, no errors. See the following example:

Sequence 3 — No Fault — Read using Flash API

 

 

#define FS_START_ADDR 0x0007B600

uint32_t flashBuf[FLASH_PAGE_SIZE / sizeof(uint32_t)];

void flashMemTest(void)
{
    uint32_t flashWord;

    FLASH_Init(&flashConfig);
    flashConfig.modeConfig.readSingleWord.readWithEccOff = kFLASH_ReadWithEccOff;

    FLASH_Erase(&flashConfig, FS_START_ADDR, FLASH_PAGE_SIZE, kFLASH_ApiEraseKey);
    FLASH_Read(&flashConfig, FS_START_ADDR, (uint8_t *)&flashWord, sizeof flashWord);

    while(1);
}

 

 

 

Reading the flash memory using the "FLASH_Read" function succeeds in any condition. One may use it as a workaround.

Although the use case I've presented above is not a typical one and is, I believe, strongly discouraged by the developers, I'm still curious about the root cause of the issue. Is it really the ECC module, or something else? Can it be disabled? Where can I get more information?

If anyone has a clue, please share.

aescov
Contributor I

Hi 

Was it ever clarified how to cope with this situation? We are experiencing the same type of behavior on a K26. We do not do double writes on purpose, but in rare situations we get a hard fault from a flash access.

What is the recommended way to handle this in software?

Best regards

Anders

julienblanc
Contributor I

We are encountering the same issue. While I understand that stressing the flash puts it in a bad state, it is extremely annoying to have no way to detect why the hard fault is happening. The risk of confusing this specific fault (reading from a corrupted flash) with other faults is real.

How did you solve your issue?

Kan_Li
NXP TechSupport

Hi Bjoern Johannesson,

Would you please tell me the part number and mask set for that K61? I tried to reproduce this issue on a TWR-K70, but it seems to work as expected.

[Image: 1.png]

I am wondering whether you have more K61s, so you can check whether the problem also occurs on other parts. If you have just one K61, would you please check the result of performing another load instruction, such as "LDR Rt, [Rn, #offset]" (load register with word)?

Thanks for your patience!


Have a great day,
Kan

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

bjoernjohanness
Contributor III

Hi Kan, thanks for your reply!

The exception isn't coming from the executed instruction, it's because it reads from the internal flash address that's "corrupt". I do have several units and I can reproduce at will, here's a step-by-step:

  1. Erase address 0xC0010
  2. Program address 0xC0010 with a value, e.g. 0x12345678
  3. Now program address 0xC0010 with a different value, e.g. 0xC0CAC01A (I know this isn’t “allowed” but it triggers the issue)
  4. Read the address 0xC0010 (if read by microprocessor, e.g. using instruction above, this will cause the Bus Fault Exception)


Let me know if you can reproduce, thanks!! //bjoern

Kan_Li
NXP TechSupport

Hi Bjoern,

I am sorry, but if you want to program a different value to the same address, you first have to erase the sector that contains this address; otherwise it will cause a bus fault exception, as specified in the spec. So it is not a hardware issue at all. I am wondering why you did it that way. Would you please clarify?


Have a great day,
Kan


bjoernjohanness
Contributor III

Kan, thanks for your reply.

I know programming the same location twice is not allowed without erasing it in between (and we wouldn't normally do that). However, it could still happen and our SW needs to be able to handle it. Therefore we need to have a way to detect it.

Three questions:

  • You mention "it would cause bus fault exception, as specified in the spec", but I cannot find anything like that. Which specification, and where in it, are you referring to?
  • Once it has occurred (in the bus fault exception handler), how do I detect that it was an internal flash read operation that caused the exception? Which register flag indicates this?
  • I can understand getting a bus fault while re-writing an address (without an erase in between), but how does the internal flash "know" to signal a bus exception later (after a reset etc.)? Does it have ECC (not mentioned anywhere)?

ndavies
Contributor V

You may also want to look at this thread: Tracking down Hard Faults. It links to ways to deal with the hard faults.

bjoernjohanness
Contributor III

Norm, I really appreciate your input :smileygrin: I have something similar in my SW.


The problem here is that this behavior (Bus Fault) is undocumented and has no flags indicating the source of the fault. There's no explicit flag to clear, hence when I try to return from the exception handler the Bus Fault is still active and I end up back in the exception handler :smileysad:


The BFAR register hints at the source but doesn't explicitly tell me that the flash is corrupt and won't allow me to clear the source of the Bus Fault so I can return.

//bjoern
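
One commonly used way to return from such a fault is to recognize it in the handler, record it, clear the status bits, and step the stacked PC past the faulting load. The helpers below are a minimal sketch of that idea, not a vendor-endorsed method. The function names are hypothetical, the flash base and size are parameters you must supply for your part, and the CFSR layout is assumed to be the standard ARMv7-M one.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_PRECISERR     (1u << 9)    /* precise data bus error */
#define CFSR_BFARVALID     (1u << 15)   /* BFAR holds the faulting address */
#define CFSR_BUSFAULT_MASK 0x0000FF00u  /* all BusFault status bits */

/* Heuristic: a precise data fault whose BFAR lies inside internal flash. */
bool fault_is_flash_read(uint32_t cfsr, uint32_t bfar,
                         uint32_t flash_base, uint32_t flash_size)
{
    const uint32_t need = CFSR_PRECISERR | CFSR_BFARVALID;
    if ((cfsr & need) != need) {
        return false;
    }
    return (bfar >= flash_base) && ((bfar - flash_base) < flash_size);
}

/* Size in bytes of the Thumb instruction whose first halfword is given.
 * Halfwords starting with 0b11101, 0b11110 or 0b11111 begin a 32-bit
 * encoding; everything else is a 16-bit instruction. */
uint32_t thumb_insn_size(uint16_t first_halfword)
{
    const uint16_t top5 = (uint16_t)(first_halfword >> 11);
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* CFSR bits are write-1-to-clear: writing this value back to CFSR clears
 * the BusFault bits that are currently set and leaves the others alone. */
uint32_t cfsr_busfault_clear_value(uint32_t cfsr)
{
    return cfsr & CFSR_BUSFAULT_MASK;
}
```

In a real handler the stacked PC is word 6 of the exception frame (R0, R1, R2, R3, R12, LR, PC, xPSR), so the handler would add thumb_insn_size() of the halfword at that PC to it, write cfsr_busfault_clear_value() back to CFSR, and set a flag for the caller. The destination registers of the skipped load hold no meaningful data, so the caller has to treat the read as failed. For the LDM in the original report (first halfword 0xE8B1) the instruction size comes out as 4 bytes.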

DavidS
NXP Employee

Hi Bjoern,

I have found looking at the IPSR and ICSR registers helpful when you can set a breakpoint in your exception routine.

What debugger tool are you using?

Can you set a breakpoint in the exception routine?

Can you look at the callstack and go "backwards" to see what routine and "C" code specifically was trying to execute?

Are you running MQX or baremetal code or other?

What hardware are you running on?

The more you can show the better we can understand the issue.

Regards,

David

bjoernjohanness
Contributor III

Hi David, thanks for replying!

I agree those registers are good, if you look at my initial post those are in my crash-report :smileyhappy: Unfortunately there's nothing indicating the source of the bus fault and that is also my issue (I mean I know it's caused by reading from internal flash but there's no flag explicitly indicating that).

Yes, I can easily catch it in the exception handler; the routine that's causing the bus fault is memcpy(). The root cause is reading from internal flash address 0x1C0010, which I've corrupted on purpose. Nowhere can I find anything saying that reading internal flash could EVER cause a Bus Fault.

It shouldn't matter in regards to my issue but my board is similar design to Tower K70, running modified Quadros RTOS and I use uVision.

DavidS
NXP Employee

Hi Bjoern,

It is not documented well, and I will see if we can improve that somehow with a long answer, but...

the short answer is that you are generating a flash overprogramming error, which corrupts the data when a previously programmed location is programmed again without an erase.

When you do the second program command (writing 0xC0CAC01A) you should see the FSTAT[MGSTAT0] bit set, indicating there is a problem.

Regards,

David

bjoernjohanness
Contributor III

David,

Alright, sounds good. I'd really appreciate having that mentioned in an official document :smileyhappy: Thanks!

Does this mean that the K61 internal flash has some sort of ECC? If not, how can it detect corrupt data even after a reset? Let me guess: each phrase (8 bytes) has its own ECC, which would explain the "whole-phrase-programming-only" requirement.

Your comment on FSTAT[MGSTAT0]: that only applies to the actual programming, right? I.e. not when reading that corrupt data later (let's say following a reset)?

DavidS
NXP Employee

Hi Bjoern,

Sorry for the delay. A slight cold set me back.

Correct, FSTAT[MGSTAT0] is only set during the write to previously written flash location(s), not during a read.

Our flash does have smarts (IP, i.e. intellectual property) that are not documented, which notify the user of the programming error. Your code should test for that condition.

Regards,

David
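
A test for that condition could look roughly like the following sketch. It assumes the usual Kinetis FTFE FSTAT bit layout (CCIF in bit 7, RDCOLERR in bit 6, ACCERR in bit 5, FPVIOL in bit 4, MGSTAT0 in bit 0); the enum and function names are invented for illustration, so check the bit positions against the reference manual of your part.

```c
#include <stdint.h>

#define FSTAT_MGSTAT0  (1u << 0)  /* error detected while running the command */
#define FSTAT_FPVIOL   (1u << 4)  /* protection violation */
#define FSTAT_ACCERR   (1u << 5)  /* access error */
#define FSTAT_RDCOLERR (1u << 6)  /* read collision error */
#define FSTAT_CCIF     (1u << 7)  /* command complete */

typedef enum {
    FLASH_CMD_BUSY,   /* command still running, FSTAT not final yet */
    FLASH_CMD_OK,     /* command finished without any error flag */
    FLASH_CMD_ERROR   /* command finished and at least one error flag is set */
} flash_cmd_result_t;

/* Classify an FSTAT value sampled after launching a flash command. */
flash_cmd_result_t flash_cmd_result(uint8_t fstat)
{
    const uint8_t errors = (uint8_t)(FSTAT_MGSTAT0 | FSTAT_FPVIOL |
                                     FSTAT_ACCERR | FSTAT_RDCOLERR);
    if ((fstat & FSTAT_CCIF) == 0u) {
        return FLASH_CMD_BUSY;
    }
    return ((fstat & errors) != 0u) ? FLASH_CMD_ERROR : FLASH_CMD_OK;
}
```

The crash report at the top of the thread prints FSTAT:80, which this sketch classifies as FLASH_CMD_OK. That is consistent with the statement above that the flag reflects the program command only and says nothing about a later read.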

bjoernjohanness
Contributor III

Interesting! Is there a reason it's not documented? It seems like an extremely vital thing to have documented.

The actual programming is not an issue for me; I do test for that condition during normal programming. I understand that we "shouldn't" program the same area twice without an erase, and that's not really the issue. I only use this method to trigger the undocumented bus fault. I imagine that the same bus fault would be triggered once the flash gets worn out, for example.

It's a MAJOR issue if reading from internal flash can cause an undocumented bus fault. (Actually, any undocumented fault/exception is a MAJOR issue.) I mean, how are we supposed to handle it if it's not described? Sure, the read data is corrupt, but that shouldn't render the microcontroller completely useless, right? It needs to be handled gracefully, but as far as I can tell you can't even mask the exception. I hope Freescale understands how serious this is.

Thanks for your help David!! //bjoern

DavidS
NXP Employee

Hi Bjoern,

Once you have written to the same flash location without an erase step between them, you are running the device out of specification.  When operating the device out of specification then we cannot guarantee the operations of the device.

The solution is when you have tested for the error and found the error, then make sure you have erased the sector before flashing again.

Note that continued double (or more) writes to the same flash location will overstress the device.  The Reference Manual has the following in the Flash Chapter:

CAUTION

A flash memory location must be in the erased state before being programmed. Cumulative programming of bits (back-to-back program operations without an intervening erase) within a flash memory location is not allowed. Re-programming of existing 0s to 0 is not allowed as this overstresses the device.

Regards,

David

bjoernjohanness
Contributor III

Yes, you are right and I've seen that too.


In this particular case I do over-program but the same result is expected due to (A) power-loss, (B) wear-out or (C) aging; whenever the error-checking is incorrect, right? Let's focus on those three scenarios instead so we don't operate out-of-spec (no more over-programming).

Over time, we can conclude they will happen. And since it will happen, we will have to handle it in SW. Therefore, we need at the very least:

  1. Documentation: ECC is not even mentioned, and it obviously has a huge impact on our SW design
  2. An indication of what caused the exception (an FTFE ECC error flag etc.)
  3. A way to disable the exception

Kan_Li
NXP TechSupport

Hi Bjoern,

The spec I was referring to is the RM; you can find the following statement in the chapter for the FMC:

[Image: 1.png]

When a bus fault happens, you can find the reason and address information in the registers of the ARM Cortex-M4 core; please refer to the following for details.

[Image: 2.png]

Hope that helps,


Have a great day,
Kan

bjoernjohanness
Contributor III

Kan,

Thanks for clarifying the reference :smileyhappy: and pointing out BFAR.

From RM: "A write operation to program flash memory or to FlexNVM used as data flash memory results in a bus error"

=> That reference only talks about write-operations to the flash, right? Actually, it's not really related to my read-operation causing a Bus Fault but instead pointing out that you can't perform write-operations to flash.

Nowhere can I find anything saying that reading internal flash could EVER cause a Bus Fault. This is undocumented behavior, and there's no flag anywhere indicating what caused the Bus Fault; one could possibly argue that BFAR hints at it implicitly.

//bjoern
