MPC-5125 Micron NAND Flash Boot Problem

Tim562
Senior Contributor I

Hi All,

     I am seeing occasional failures of our products to boot and run after a power-on or reset event. I am using MQX 3.8.1.1 with the (mostly) stock bootstrap/bootloader. The processor retrieves its boot code from a Micron MT29F16G08ABABAWP NAND flash part. The original NAND flash part used on the Freescale tower module went EOL, so I chose this replacement and tweaked the bootloader code for the different internal architecture of the newer part. Some units, after a variable time in service (hours, days, weeks, or months), fail to boot after a power-up event. When this happens, the only cure seems to be to rewrite the bootstrap/bootloader code to the NAND flash device. Once this is done, the unit boots normally again for some variable time. If the flash chip is replaced, the problem sometimes continues, but more often it seems to be cured.

 

     Is anybody else having similar problems with their NAND flash parts? I've begun to wonder whether NAND flash is an appropriate technology for booting a processor, given that cells can go bad in the blocks that the processor reads at boot time. Since the boot process always goes to the same locations in the flash to read boot code, the code stored at those locations (Blocks[0] and up) always needs to be correct. I had assumed that since SLC NAND flash is good for around 100,000 writes, and it is only written once or twice in our products, a flash cell failure wouldn't be a problem. Maybe I have this wrong? Is anybody else using NAND flash to boot their processors, or seeing anything similar? Thanks!

 

Best Regards,

Tim

Tim562
Senior Contributor I

When I observe the communications between the processor and the NAND flash in a target that is trying (unsuccessfully) to boot, I see an endless series of page read commands. At first I thought this was because the processor wasn't finding the required bootloader data. But the processor manual indicates that the boot logic will attempt to read a bootstrap utility from flash page[0]; failing that, it will look at page[256], then page[512], and finally page[768]. If all of those reads fail, the manual says the processor will abandon the attempt to boot, so I should see no more flash page reads, right? I'm guessing that when the manual says "fails to read" it means an ECC failure, because how else would the boot logic determine whether data read from the flash constitutes a valid bootstrap utility?
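The search order described above can be modelled roughly as follows. This is only a sketch of the behaviour as the post reads the manual, not the actual boot ROM logic: `find_boot_page`, `page_read_fn`, and the simulated reader `sim_read_ok` are hypothetical names, and "read OK" is assumed to mean "no uncorrectable ECC error".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Candidate pages in the order given in the post. */
static const uint32_t boot_candidate_pages[] = { 0u, 256u, 512u, 768u };

/* Hypothetical callback: true if the page was read without an
 * uncorrectable error (assumed meaning of a successful read). */
typedef bool (*page_read_fn)(uint32_t page);

/* Returns the first candidate page that reads cleanly, or -1 if all
 * four attempts fail (the point where boot would be abandoned). */
static int32_t find_boot_page(page_read_fn read_ok)
{
    const size_t count =
        sizeof boot_candidate_pages / sizeof boot_candidate_pages[0];

    for (size_t i = 0; i < count; ++i) {
        if (read_ok(boot_candidate_pages[i])) {
            return (int32_t)boot_candidate_pages[i];
        }
    }
    return -1;
}

/* Simulated flash for experimenting: only one page reads cleanly. */
static uint32_t sim_good_page = 0u;

static bool sim_read_ok(uint32_t page)
{
    return page == sim_good_page;
}
```

In this model the search makes at most four reads and then stops, so an endless stream of page reads on the bus would have to come from somewhere else, such as code that runs after the bootstrap has already been found.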

Could the continuous page reads I'm seeing indicate that the bootstrap utility was found, is running, and is attempting (unsuccessfully) to read a valid bootloader? The bootloader has a 32-bit checksum value recorded when it is written, so if the data weren't correct, the bootstrap code could detect that. If it can't read a valid bootloader, does it reboot the processor so that the whole process starts again?
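For illustration, a checksum check of this kind might look like the sketch below. The post does not say which 32-bit checksum the MQX bootstrap uses, so a plain byte sum modulo 2^32 is assumed here, and the function names `image_checksum32` and `image_is_valid` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: simple additive checksum over the image bytes, wrapping
 * at 32 bits. The real bootstrap may use a different algorithm. */
static uint32_t image_checksum32(const uint8_t *data, size_t len)
{
    uint32_t sum = 0u;

    for (size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}

/* Compares the checksum of the image as read back from flash against
 * the value recorded when the image was written. */
static bool image_is_valid(const uint8_t *data, size_t len,
                           uint32_t stored_checksum)
{
    return image_checksum32(data, len) == stored_checksum;
}
```

A bootstrap that resets the processor whenever such a check fails would restart the whole boot sequence each time, which would look like a continuous run of page reads on the bus.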

I am able to correct this problem by just rewriting the bootstrap/bootloader to the flash, but it will eventually fail again. Any thoughts? Thanks!

Best,

Tim

Tim562
Senior Contributor I

Hi All, in case anyone following this finds themselves in a similar situation, here was the resolution in my case.

The constant NANDFLASH_PHYSICAL_PAGE_SIZE in the ...mqx\source\bsp\twrmpc5125\bsp_priv.h file was changed from 4096 to 2112. The 2112 value represents a 2048-byte data area plus 4 bytes for bad block markers plus 60 bytes for ECC data: 2048 + 4 + 60 = 2112. It is necessary to add 64 bytes to the 2048-byte page, not just the 60 bytes of ECC data, because the Freescale NFC performs 64-bit read/write operations. When reads and writes are set up, they need to be specified to start and end on an 8-byte (64-bit) boundary. The thinking is that when the page reads were not configured to fall on an 8-byte boundary, the ECC data delivered to the ECC engine from those reads might have been incorrect. I'm not really sure how the NFC handles reads and writes that are not specified on an 8-byte boundary. If the ECC engine has incorrect ECC data, it will end up "correcting" page data that doesn't need to be corrected (in other words, corrupting it). Hope this helps,
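The arithmetic and the alignment rule can be written down as a small sketch. Only NANDFLASH_PHYSICAL_PAGE_SIZE and the byte counts come from the post; the other macro names and the helper `nfc_transfer_is_aligned` are hypothetical and are not part of the MQX BSP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte counts as given in the post; macro names are illustrative. */
#define NAND_DATA_BYTES       2048u  /* main data area             */
#define NAND_BAD_BLOCK_BYTES  4u     /* bad block markers          */
#define NAND_ECC_BYTES        60u    /* ECC data                   */
#define NFC_TRANSFER_ALIGN    8u     /* NFC works in 64-bit units  */

/* 2048 + 4 + 60 = 2112 */
#define NANDFLASH_PHYSICAL_PAGE_SIZE \
    (NAND_DATA_BYTES + NAND_BAD_BLOCK_BYTES + NAND_ECC_BYTES)

/* True when a transfer both starts and ends on an 8-byte boundary. */
static bool nfc_transfer_is_aligned(uint32_t start, uint32_t length)
{
    return (start % NFC_TRANSFER_ALIGN == 0u) &&
           ((start + length) % NFC_TRANSFER_ALIGN == 0u);
}
```

With these numbers, a transfer of 2112 bytes starting at offset 0 ends on an 8-byte boundary, while one covering only the 2048 data bytes plus the 60 ECC bytes (2108 bytes) does not.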

Best,

Tim
