MPC5566 bus error occurring only for flash LAS block 0


andrew_henderso
Contributor II

Hello all. I inherited an MPC5566 application that uses the two 16K flash LAS blocks (at 0x0000 and 0x1C000) for a ping-pong buffering scheme. Erases/writes of these ping-pong buffers do not occur very often, so the expected endurance of the flash blocks we're using is quite long. We've had a few isolated reports from customers of boards refusing to boot after running for periods ranging from weeks (~2000 writes) to years (~60000 writes). The RMA'd boards that I have examined all show the same symptom: LAS block 0 (the "ping" buffer) gives a bus error on access. When I use a Lauterbach JTAG debugger to examine the flash memory of a returned board, all of the bytes in block 0 are shown as "??". The Lauterbach Trace32 application refuses to dump bytes from this block and reports a bus error. I am able to dump bytes from all other flash blocks and have verified via CRC checks that they are as expected.

I've tried to reproduce the error condition here by using flash locking/unlocking and creating ECC conditions per application note AN5200 ("Error Correcting Codes Implemented on MPC55xx and MPC56xx Devices"). But, in both of these cases, I can still access the flash data in the block without receiving a bus error via the JTAG. The bad boards coming in from the field give me a bus error on any data access in block 0. If I use Trace32 to delete the corrupt/bad block 0, the bus errors go away and the system behavior returns to normal (the firmware detects the "0xFF" pattern of the empty block and repairs the missing "ping" buffer using the data from the "pong" buffer). 
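The recovery behavior described above (an erased "ping" block is rebuilt from "pong") can be extended to cover a block that is unreadable rather than blank. The following is only a minimal sketch of that decision logic, not the actual firmware: the state and action names are hypothetical, and it assumes some boot-time check has already classified each buffer as valid, blank (all 0xFF), or corrupt (read faults or CRC mismatch).

```c
/* Hypothetical states a boot-time check could assign to each buffer. */
typedef enum { BUF_VALID, BUF_BLANK, BUF_CORRUPT } buf_state_t;

/* Hypothetical recovery actions; none of these names come from the thread. */
typedef enum {
    ACT_NONE,                  /* both copies usable, nothing to repair    */
    ACT_COPY_PONG_TO_PING,     /* ping is blank: program it from pong      */
    ACT_COPY_PING_TO_PONG,     /* pong is blank: program it from ping      */
    ACT_ERASE_PING_THEN_COPY,  /* ping is corrupt: erase it, then copy     */
    ACT_ERASE_PONG_THEN_COPY,  /* pong is corrupt: erase it, then copy     */
    ACT_LOAD_DEFAULTS          /* no valid copy left                       */
} recovery_action_t;

/* Pick a repair action from the state of the two buffers. */
recovery_action_t choose_recovery(buf_state_t ping, buf_state_t pong)
{
    if (ping == BUF_VALID && pong == BUF_VALID) {
        return ACT_NONE;
    }
    if (pong == BUF_VALID) {
        return (ping == BUF_BLANK) ? ACT_COPY_PONG_TO_PING
                                   : ACT_ERASE_PING_THEN_COPY;
    }
    if (ping == BUF_VALID) {
        return (pong == BUF_BLANK) ? ACT_COPY_PING_TO_PONG
                                   : ACT_ERASE_PONG_THEN_COPY;
    }
    return ACT_LOAD_DEFAULTS;
}
```

The "erase, then copy" branch mirrors what the manual Trace32 erase achieves on the returned boards: once the corrupt block is erased, the existing blank-block repair path can take over.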

When accessing block 0 gives bus errors, the MPC5566 Boot Assist Module is still able to launch our bootloader (located in LAS block 1 at 0x4000) as it normally does, so it isn't like there is an RCHW in the bad block 0 that is getting in the way. BAM is able to recognize that block 0 does not contain a valid RCHW and moves on to booting from block 1 (which has a valid RCHW at the start of it).

My questions are:

1. Has anyone seen behavior similar to this? A single flash block giving bus errors when you try to view it via JTAG?

2. Is there a particular flash control register that I should be looking at to diagnose why only one flash block is acting like this? Looking through the MPC5566 reference manual, I'm not seeing anything that would disable reading for a single block. You can lock a block against erase and write, and you can disable the flash as a whole, but I didn't see anything for blocking reads of an individual block.

3. How can I recreate this issue by disabling/locking that first flash block programmatically? What register writes might help me to do this on command? Right now, I'm limited to the few boards showing the issue that are coming in from the field, so I can't experiment as much as I'd like to track down the root cause in the firmware.

Thank you for your help! 

5 Replies

davidtosenovjan
NXP TechSupport

Hi, if I understand you correctly, you see the whole of LAS block 0 (addresses 0x0000_0000 - 0x0000_3FFF) full of ECC errors. Is this correct? Could you share a screenshot of the TRACE32 dump window?

andrew_henderso
Contributor II

Sure thing. Here is a snapshot of the memory boundary between LAS block 0 and block 1:

memdump.png

davidtosenovjan
NXP TechSupport

This sort of error is usually caused by an unexpected reset during a flash erase or flash program operation.

In this case it will be the erase operation. An erase actually consists of four sub-operations: Program, Erase, Compaction and Soft Program.

You can find the details in AN4521, Figure 1:

https://www.nxp.com/docs/en/application-note/AN4521.pdf

Even though this app note covers a different flash type than the one used in the MPC5566, the behavior will be very similar.

To sum up: this can simply happen, and your SW must account for the possibility that a flash block may be invalid after POR.
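One way to handle a possibly invalid block after POR is to probe it through a guarded read before trusting its contents. The sketch below is an illustration under stated assumptions, not NXP-provided code: `guarded_read_fn` is a hypothetical interface that returns false when a read faults (on the target this would have to be backed by an exception handler that records the fault and resumes), and `sim_read` is only a host-side stand-in so the logic can be exercised off-target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guarded reader: returns false if the read at addr faulted. */
typedef bool (*guarded_read_fn)(uint32_t addr, uint32_t *out, void *ctx);

/* Walk a block word by word; report the first faulting address, if any. */
bool block_is_readable(guarded_read_fn rd, void *ctx,
                       uint32_t base, size_t len_bytes, uint32_t *first_bad)
{
    for (size_t off = 0; off < len_bytes; off += sizeof(uint32_t)) {
        uint32_t value;
        if (!rd(base + (uint32_t)off, &value, ctx)) {
            if (first_bad != NULL) {
                *first_bad = base + (uint32_t)off;
            }
            return false;
        }
    }
    return true;
}

/* Host-side simulation only: a flash image with one faulting address range. */
typedef struct {
    const uint32_t *words;      /* simulated flash contents              */
    uint32_t base;              /* address of words[0]                   */
    uint32_t bad_start;         /* reads in [bad_start, bad_end) "fault" */
    uint32_t bad_end;
} sim_flash_t;

bool sim_read(uint32_t addr, uint32_t *out, void *ctx)
{
    const sim_flash_t *f = ctx;
    if (addr >= f->bad_start && addr < f->bad_end) {
        return false;           /* simulate an uncorrectable ECC fault */
    }
    *out = f->words[(addr - f->base) / sizeof(uint32_t)];
    return true;
}
```

A block that fails this probe can then be treated as invalid and scheduled for erase, instead of letting the first ordinary read of it take the bus error.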

 

Regarding ECC error injection: using Lauterbach TRACE32, you can fill a portion of flash memory with ECC errors by overprogramming it with script commands like these:

flash.reprogram 0x30000--0x3FFFF
data.set ea:0x30000 %quad 0x0045000000000000
flash.reprogram OFF

flash.program 0x30000--0x3FFFF
data.set ea:0x30000 %quad 0x0058000000000000
flash.program OFF

data.dump ea:0x30000--0x3FFFF

Hope it helps

andrew_henderso
Contributor II

Hello David. I played around with the Trace32 commands that you provided and verified that the ECC errors are what is causing the "??" on the data dump that we are seeing. I had previously tried writing those same "0x0045..." and "0x0058..." values programmatically from within the bootloader to trigger ECC errors (after reading appnote AN5200), but the second write would fail and I didn't see the "??" on the dump. Writing those values through Trace32 is able to trigger it, though. 

It looks like the core issue is that we're not verifying the block erase after it occurs, and on very rare occasions it is failing. I'll review our firmware to determine the best method of detection/correction for this, but the data in the AN4521 app note is a good place for us to start our investigation.
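A post-erase verification step of the kind mentioned above could look roughly like the following. This is a minimal sketch, not our firmware: the names are hypothetical, and `controller_reported_good` is assumed to be a pass/fail status sampled from the flash controller once the erase completes (for example the PEG bit in FLASH_MCR, if I am reading the reference manual correctly). On the target, the blank check itself could fault if the block contains ECC errors, so the reads would need to be guarded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the post-erase check. */
typedef enum { ERASE_OK, ERASE_STATUS_FAILED, ERASE_NOT_BLANK } erase_result_t;

/* Returns true if every 32-bit word in the range reads back as erased. */
bool flash_range_is_blank(const volatile uint32_t *base, size_t len_bytes)
{
    for (size_t i = 0; i < len_bytes / sizeof(uint32_t); i++) {
        if (base[i] != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}

/* Combine the controller's status with an independent blank check. */
erase_result_t verify_erase(bool controller_reported_good,
                            const volatile uint32_t *base, size_t len_bytes)
{
    if (!controller_reported_good) {
        return ERASE_STATUS_FAILED;   /* controller flagged the erase as bad */
    }
    if (!flash_range_is_blank(base, len_bytes)) {
        return ERASE_NOT_BLANK;       /* status looked fine, contents do not */
    }
    return ERASE_OK;
}
```

Anything other than `ERASE_OK` would be a reason to retry the erase before programming new data into the block.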

Thank you for your help!

davidtosenovjan
NXP TechSupport

OK. I would just note that recovering from depleted bits (section 4.2) is an issue specific to the C90FL flash type.

MPC5566 uses H7Fa flash and depletion issue will not happen there.

In other respects, AN4521 may be valid for the rest of the MPC55xx internal flashes as well.
