Why does Vybrid NAND driver skip ECC check for 1st page in every erase block?

Solved

dmitrykonyshev
Contributor II

Hi All,

I noticed that the U-Boot and Linux drivers for Vybrid disable hardware ECC calculation for the 1st page of an erase block. Here is a related excerpt from drivers/mtd/nand/fsl_nfc.c:

        switch (command) {
        case NAND_CMD_PAGEPROG:
                if (!(prv->page%0x40) && !prv->pg_boot)
                        nfc_set_field(mtd, NFC_FLASH_CONFIG,
                                CONFIG_ECC_MODE_MASK,
                                CONFIG_ECC_MODE_SHIFT, ECC_BYPASS);

Can someone please explain the purpose of this?

Regards,

Dmitry

1 Solution
kef2
Senior Contributor V

Hi Dmitry,

I believe it is a bug. Perhaps there was some smart plan to keep manufacturer-marked bad blocks untouched, or something like that. In the thread "Can I get working sources of U-BOOT 2013.07, nand?" you can find how I fixed the fsl_nfc.c driver. Please let me know if you see any problems with it.

Regards,

Edward

9 Replies
kef2
Senior Contributor V

Bill,

The current U-Boot 2013.07 from Timesys doesn't turn off HW ECC for the boot page, and it also doesn't turn off HW ECC for any other page U-Boot is commanded to flash. The boot ROM doesn't mind whether the FCB page is HW ECC protected or not. It just reads the page containing the FCB struct and software-ECC checks it against the FCB ECC data stored on the same page; the OOB doesn't matter. I wonder why and how the page%0x40 test in U-Boot's fsl_nfc.c doesn't switch off HW ECC for any page U-Boot writes, but it is a fact. I HW ECC checked the whole flash N times. HW ECC fails only on empty pages.

Edward

billpringlemeir
Contributor V

Right, well I was referring to the history of the driver.  I think that the u-boot versions are copies of the Linux versions.  You, I and possibly Dmitry are all seeing that the check is not needed.  I was just pointing out that the check is some vestige from a port of a Linux driver which does (try to) support HW_ECC.  I was also referring to the TimeSys Linux driver as well when it comes to HW_ECC support.

Edward wrote: "HW ECC fails only on empty pages."

This is a well known issue.  The driver needs to handle this.  However, it is problematic.

[RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections.  - Stephen Agner

     [RFC 2/5] mtd:fsl_nfc: Add hardware 45 byte BHC-ECC support for 24 bit corrections. - Bill Pringlemeir

The driver I posted to the MTD mailing list handles hardware ECC and erased pages that have no stuck at zero errors.

You can see in fsl_nfc.c line 666 that the driver has the ECC status check disabled.  So the TimeSys version with HW_ECC just accepts everything.  As time progresses and pages actually fail, the driver will report everything is fine to the higher layers.  What happens will depend on the filesystem in use.

The u-boot driver is in u-boot.imx tree as well.  I haven't used it yet, but I believe it supports both software and hardware ECC.

kef2
Senior Contributor V

Hi Bill,

I'm quite new to Linux (kernel) programming, and I'm too old to start reading all the mailing lists on Earth. It looks like I'm reinventing the wheel and you have already written an fsl_nfc.c that supports HW ECC on Vybrid, is that right? If so, could you please share your code?

It looks like my fsl_nfc.c version needs to take big-endian MCFs into account, since they seem to use the same driver, right? There would have to be #if/#else blocks for big- and little-endian machines, or special #if/#else blocks for Vybrid, MCF, etc.

No, the U-Boot versions use a similar driver, but not the same one, at least in Timesys U-Boot. Dmitry's excerpt is from the Linux driver.

I'm puzzled by u-boot driver, I see in it these lines:

if (!prv->pg_boot) {

  if (hardware_ecc)
   nfc_set_field(mtd, NFC_FLASH_CONFIG,
    CONFIG_ECC_MODE_MASK,
    CONFIG_ECC_MODE_SHIFT, ECC_45_BYTE);
  else
   /* set ECC BY_PASS */
   nfc_set_field(mtd, NFC_FLASH_CONFIG,
    CONFIG_ECC_MODE_MASK,
    CONFIG_ECC_MODE_SHIFT, ECC_BYPASS);

  if (!(page%0x40))
   nfc_set_field(mtd, NFC_FLASH_CONFIG,
    CONFIG_ECC_MODE_MASK,
    CONFIG_ECC_MODE_SHIFT, ECC_BYPASS);
}

switch (command) {
case NAND_CMD_PAGEPROG:
  fsl_nfc_send_cmd(mtd,
    PROGRAM_PAGE_CMD_BYTE1,
    PROGRAM_PAGE_CMD_BYTE2,
    PROGRAM_PAGE_CMD_CODE);
  break;

It looks like some pages must be written with HW ECC bypassed, but flash checks using MQX code show that no page fails ECC except unprogrammed pages.
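Read literally, the excerpt above can be modelled as a small pure function. This is only a sketch: model_select_ecc and the enum values are hypothetical stand-ins (only the names ECC_BYPASS and ECC_45_BYTE come from the driver), and it says nothing about why the flashed pages still pass the ECC check. It shows that when pg_boot is clear, the page%0x40 test runs last and so overrides the hardware_ecc choice for the first page of every 64-page block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's ECC mode values. The numeric
 * values are not taken from the driver; only the names mirror it. */
enum ecc_mode { MODEL_ECC_BYPASS, MODEL_ECC_45_BYTE };

/* Model of the mode selection in the U-Boot excerpt.
 * `current` is whatever mode was configured before the excerpt runs;
 * it is left untouched when pg_boot is set, as in the excerpt. */
static enum ecc_mode model_select_ecc(enum ecc_mode current, bool pg_boot,
                                      bool hardware_ecc, uint32_t page)
{
    enum ecc_mode mode = current;

    if (!pg_boot) {
        mode = hardware_ecc ? MODEL_ECC_45_BYTE : MODEL_ECC_BYPASS;

        /* Evaluated last, so it wins for pages 0x00, 0x40, 0x80, ... */
        if (!(page % 0x40))
            mode = MODEL_ECC_BYPASS;
    }
    return mode;
}
```

With hardware_ecc set, model_select_ecc(MODEL_ECC_45_BYTE, false, true, 0x40) yields MODEL_ECC_BYPASS, while page 0x41 keeps MODEL_ECC_45_BYTE.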

Thanks for the links to the erased-page ECC discussion. Interesting. An erased page with a couple of broken 0 bits could be programmed and read back correctly with the help of ECC. But which is better:

- An erased page has some broken 0 bit(s), and we signal an unrecoverable error for that page.

- A page is programmed to all 1s except for a few 0s. ECC fails with an unrecoverable error, the 0-bit count says everything is fine, and the driver flips all the 0s to 1s and gives two false reports: ECC OK, and data = all 1s.

I think the first is better. A robust solution would be to not allow unprogrammed pages, but that would slow down NAND support very much and shorten NAND lifetime...
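The zero-bit counting discussed above might look roughly like the sketch below. The function names and the max_zero_bits threshold are hypothetical and are not taken from any of the drivers mentioned in this thread. It also makes the second case in the list visible: a page that really was programmed to almost all 1s is handled exactly like an erased page with a few flipped bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the bits that read back as 0 in a buffer. */
static size_t count_zero_bits(const uint8_t *buf, size_t len)
{
    size_t zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t inverted = (uint8_t)~buf[i];   /* 1 bits mark the zeros */

        while (inverted) {
            inverted &= (uint8_t)(inverted - 1); /* clear lowest set bit */
            zeros++;
        }
    }
    return zeros;
}

/* Hypothetical erased-page heuristic: if at most max_zero_bits bits are 0,
 * report the page as erased and hand back all 0xff. The data itself gives
 * no way to tell an erased page from one programmed to nearly all 1s. */
static bool treat_as_erased(uint8_t *buf, size_t len, size_t max_zero_bits)
{
    if (count_zero_bits(buf, len) > max_zero_bits)
        return false;

    memset(buf, 0xff, len);
    return true;
}
```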

Regards,

Edward

billpringlemeir
Contributor V

Dmitry wrote: "Can someone please explain the purpose of this?"

My best explanation is related to the NAND boot pages. To write a NAND boot page, the controller turns all ECC off and does a fixed ECC check in the data. However, it seems that the controller ignores the OOB data in this mode, and it is perfectly fine to leave the ECC on. There may be some other (non-Vybrid) version of the controller that expects the OOB data to be zero. I think the condition '!(prv->page%0x40) && !prv->pg_boot' at least shows that it has something to do with NAND boot. I also found that it was not needed.

As well as the TimeSys U-Boot and Linux drivers, there is now support in the u-boot.imx tree, done by Stephen Agner. I also posted some patches to the Linux MTD mailing list (and ARM Linux) for an older version of mainline Linux. That driver is fairly easy to update to current mainline Linux; it is approximately 4x faster than the TimeSys version. Some of the updates Stephen made to the U-Boot version should be merged into this code. In particular, it makes assumptions about how the compiler will inline things in order to improve performance.

In fact, the way hardware ECC is written in the TimeSys Linux driver, if hardware ECC fails the error is just ignored. So hardware ECC is like no ECC at all, as far as I recall. This was due to a documentation erratum about the position of the hardware ECC status. For the Vybrid, it is +4 and for big endian it is +7. The latest Vybrid manual corrects this.

sergei_p
Contributor III

Hello Freescale Support, Edward,

Edward, thanks for the reference to your version of the driver. I tried it, but unfortunately it doesn't work for me.

We are running Linux 3.0.15 on a VF6-based System-on-Module which has a Spansion S34ML04G1 device attached (2k page + 64 bytes OOB). The device is 8 bits wide, so I made the respective changes to the driver.

The test I'm trying is to erase and mount a JFFS2 partition on a running system:

~ # cp /bin/busybox /mnt
~ # ummxc_nand: ECC uncorrectable errors on page 3ff80!
mtd->read(0x940 bytes from 0x69a0074) returned ECC error
Data CRC failed on REF_PRISTINE data node at 0x069a0074: Read 0xe361caf6, calculated 0xeb8a5b88

~ # nanddump -o -p -s 0x069a0000 /dev/mtd10
ECC failemxc_nand: ECC uncorrectable errors on page 3ff80!
d: 1
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x069a0000 and ending at 0x069e0000...
ECC: 1 uncorrectable bitflip(s) at offset 0x069a0000
...
0x069a07f0: 4b d3 08 e6 e7 12 76 e2 9b 60 e7 75 85 32 4e da
  OOB Data: ff ff ff ff 85 19 03 20 08 00 00 00 ff ff ff ff
  OOB Data: ae ff ff ff cb f7 4d f7 b6 fd 7b df e7 c5 f3 f8
  OOB Data: f3 ef e7 eb ea 79 7f 73 d7 c5 fb 8d 5f be 78 3d
  OOB Data: e7 fb fb f7 db d2 ed d7 ed b6 ff 6d fd ea fa e7
...
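A quick arithmetic check on the numbers in the log, sketched in C. The geometry constants come from the nanddump output; the helper names are hypothetical, and treating 3ff80 as a hexadecimal page index is an assumption about how the driver prints it. Under that reading, both the failing offset 0x069a0000 and page 0x3ff80 fall on the first page of an erase block, which is the page that the page%0x40 test in the driver singles out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Geometry as reported by nanddump in the log above. */
enum { NAND_BLOCK_BYTES = 131072, NAND_PAGE_BYTES = 2048 };

/* 131072 / 2048 = 64 = 0x40 pages per erase block. */
static uint32_t pages_per_block(void)
{
    return NAND_BLOCK_BYTES / NAND_PAGE_BYTES;
}

/* True if a byte offset into the partition starts an erase block. */
static bool is_block_start(uint32_t byte_offset)
{
    return byte_offset % NAND_BLOCK_BYTES == 0;
}

/* True if a page index is the first page of its erase block. */
static bool is_first_page_of_block(uint32_t page)
{
    return page % pages_per_block() == 0;
}
```

Here is_block_start(0x069a0000) and is_first_page_of_block(0x3ff80) both hold, while the data node at 0x069a0074 lies 0x74 bytes into that same first page.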

It looks like the ECC45 layout you are using is close enough to my case.

On a related note, Bill's driver doesn't work with JFFS2 at all (the partition fails to mount), whereas with your version of the ECC45 layout a partition did mount (although it showed similar errors on re-mounting after writing a file).

Freescale support - could you please clarify the ECC layout used by the VF6 NFC controller?

Any help will be much appreciated.

Regards,

Sergei

kef2
Senior Contributor V

Hello Sergei,

I've switched from JFFS2 to UBIFS because of the enormously long JFFS2 mount time, and I've had no problems (yet?) using HW ECC with UBI. Could you try UBIFS?

I just tried it with JFFS2. You are right. It looks like JFFS2 doesn't request a block erase after it writes a clean marker to the first page in the block. Thanks for making me aware. I still believe that suppressing ECC on the first page in the block, just because JFFS2 doesn't want to erase, is a bug. Certainly it is not a good idea to skip ECC protection. What could be done to fix it:

- JFFS2 could be forced to erase a block that was made dirty by a "clean mark", or perhaps avoid using "clean marks" at all; I don't know if the second is possible.

- A config option in the driver to check whether a page was erased before writing it. If it is dirty, erase the whole block, merge in the new data, and reprogram the pages that were not empty. This would slow down the driver quite a lot, even if done only for the first page in the block.

- ???? Any more ideas, anyone?

Edward

sergei_p
Contributor III

Hi Edward,

Thanks for the feedback. I tried UBIFS and it works fine. However, I think that UBIFS, unlike JFFS2, doesn't use OOB data, so the ECC layout seems irrelevant.

I will think over the JFFS2 thing and will let you know.

Thanks!

Regards,

Sergei

kef2
Senior Contributor V

For JFFS2 purposes (only), I think it is possible to modify the driver to do separate reads and writes to the data and "OOB" areas. The 2048+64 page can be split, for example, into a 2048+4+46 part and a 14-byte part. The 14-byte part could be used for OOB spare bytes with 0 or 8 bytes of ECC protection. The 2048+4+46 area could be used for data, the bad block mark and the 45/46-byte ECC.
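The split described above can be written down as offsets into the raw 2112-byte page. This is only a sketch of the arithmetic: the constant names are hypothetical, and placing the 14-byte part last is an assumption, since the post gives the sizes but not the order on flash.

```c
/* Sizes from the proposed split of a 2048+64 byte page. */
enum {
    PAGE_DATA_BYTES = 2048,       /* main data */
    BAD_BLOCK_BYTES = 4,          /* bad block mark */
    MAIN_ECC_BYTES  = 46,         /* room for the 45/46-byte ECC */
    SPARE_BYTES     = 14,         /* separately accessed "OOB" part */
    RAW_PAGE_BYTES  = 2048 + 64   /* full page including OOB */
};

/* Offsets, assuming the parts are laid out in the order listed above. */
enum {
    BAD_BLOCK_OFFSET = PAGE_DATA_BYTES,
    MAIN_ECC_OFFSET  = BAD_BLOCK_OFFSET + BAD_BLOCK_BYTES,
    SPARE_OFFSET     = MAIN_ECC_OFFSET + MAIN_ECC_BYTES
};

/* The two parts must account for every byte of the raw page. */
_Static_assert(SPARE_OFFSET + SPARE_BYTES == RAW_PAGE_BYTES,
               "split must cover the whole 2048+64 byte page");
```

Under these assumptions the protected part occupies bytes 0 to 2097 and the 14-byte spare part occupies bytes 2098 to 2111.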

Edward
