Hi Victor,
We don't know for sure whether both issues have the same cause. Some more details:
1.) The unstable state was also reproduced on the EVK with the following scenario:
- The device generates a watchdog reset while a file is being unzipped to the DDR2.
- Then the i.MX28 sometimes hangs, and only a hard power-on reset can restart the device.
=> Before the reset is triggered, the DDR2 is supplied via
VDD5V - 4P2 LinReg - DC-DC-Conv - DCDC-VDDA
=> After the reset is triggered, the DDR2 is supplied from the VDDA LinReg
VDD5V - VDDA LinReg
What happens:
A chip reset resets everything (default configuration). In this case the reset forces the device to restart
and execute the BootROM again. The device will therefore boot using the LinReg first.
The DDR2 keeps operating because the i.MX28 does not reset it. For a short time it therefore continues to draw up to 175 mA (the EVK DDR2 maximum burst write current), which drains the charge on VDDA. The voltage collapses until the DDR2 resets.
As a consequence, VDDD, which supplies the core, does not get the current needed to reset all registers properly. This explains why in many cases we saw the LinReg not even supplying the voltages defined for reset.
Please ask your Freescale FAE for more details about the i.MX28 PMU.
2.) Some notes from my previous documentation about the ubifs issue
a.) The UBIFS recovery error was detected with kernel 2.6.35 as well as kernel 3.9.11
b.) http://lists.infradead.org/pipermail/linux-mtd/2012-March/040378.html
Your driver does not protect the empty space. Normally the driver corrects bit-flips using ECC, but
some systems do not do this for empty space, i.e., for the flash regions which have been erased but
have never been written. UBIFS expects to see all 0xFFs there, and if it doesn't, it reports about
corrupt empty space. UBIFS expects empty space to be protected by ECC. But if the ECC of an empty page is not all 0xFF, this needs a hack in the NAND driver.
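To illustrate what such a driver hack usually amounts to, here is a minimal Python sketch (not the actual driver code): a page that is almost entirely 0xFF is treated as erased and handed upward as clean 0xFF data. The function names and the `max_bitflips` threshold are assumptions for illustration only; a real driver would tie the threshold to its ECC strength.

```python
def count_zero_bits(page: bytes) -> int:
    """Number of bits that read back as 0 in a page that should be all 0xFF."""
    return sum(8 - bin(b).count("1") for b in page)


def looks_erased(page: bytes, max_bitflips: int) -> bool:
    """True if the page has at most max_bitflips cleared bits (hypothetical threshold)."""
    return count_zero_bits(page) <= max_bitflips


def read_empty_space(page: bytes, max_bitflips: int) -> bytes:
    """Return clean 0xFF data for a page that only shows a few bit-flips."""
    if looks_erased(page, max_bitflips):
        return b"\xff" * len(page)
    return page  # real data or too many flips: leave it to normal ECC handling


# Example: a 2048-byte erased page with a single flipped bit.
page = bytearray(b"\xff" * 2048)
page[0x2C] = 0xBF
print(count_zero_bits(bytes(page)), looks_erased(bytes(page), 4))
```

With a filter like this in the read path, UBIFS would see all 0xFF in erased regions even when single bits have flipped there.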
c.) UBIFS corruption during power failure
This error can be seen on any hardware that uses NAND flash. The idea is to keep a map of logical erase blocks (LEB) to the corresponding physical erase blocks (PEB). This information itself must be stored in the flash. If this table is corrupted, we cannot guarantee that the data can be recovered correctly. Problems can occur if the power is cut while this table is being updated in the header of the flash (linking LEB and PEB). If the power is cut while writing to the flash (i.e. while states change from 1 to 0), some bits might not be exactly 0 and are therefore more sensitive to read disturbance, which can then exceed the ECC capability.
The unstable bits section of the UBIFS documentation (http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits) explains well what might happen.
d.) ubi_io_read: error -74 (ECC error)
Memory Technology Device (MTD) Subsystem for Linux.
e.) memory dump from affected devices
- one or more single-bit errors inside a block
- the distance between errors is exactly 4096 bytes = 2 pages
- the affected block carries the copy flag
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bbc0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bbe0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc00: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc20: FFFFFFFF FFFFFFFF FFFFFFFF FFBFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc40: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc60: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
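A dump like the one above can be scanned automatically for words that are not 0xFFFFFFFF. The following Python sketch assumes the syslog format shown here (a hex byte offset followed by eight 32-bit words per line); the function names are made up for illustration.

```python
import re

# Matches "... kernel: <hex offset>: <8-digit hex words ...>"
LINE_RE = re.compile(r"kernel:\s+([0-9a-fA-F]+):((?:\s+[0-9A-Fa-f]{8})+)\s*$")


def find_bitflips(lines):
    """Return (byte_offset, word_value, cleared_bits) for every word != 0xFFFFFFFF."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a dump line
        base = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            value = int(word, 16)
            if value != 0xFFFFFFFF:
                cleared = bin(value ^ 0xFFFFFFFF).count("1")
                hits.append((base + 4 * i, value, cleared))
    return hits


def distances(hits):
    """Byte distances between consecutive hits, e.g. to check for a 4096-byte pattern."""
    offsets = [h[0] for h in hits]
    return [b - a for a, b in zip(offsets, offsets[1:])]


sample = [
    "Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc00: " + " ".join(["FFFFFFFF"] * 8),
    "Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc20: FFFFFFFF FFFFFFFF FFFFFFFF "
    "FFBFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF",
]
for offset, value, cleared in find_bitflips(sample):
    print(hex(offset), hex(value), cleared)
```

Running `distances()` over the hits of a complete block dump makes it easy to confirm the 4096-byte spacing.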
UBI uses the @copy_flag field to indicate that this logical eraseblock is a copy. UBI also calculates data CRC when the data is moved and stores it at the @data_crc field of the copy (P1). So when UBI needs to pick one physical eraseblock of two (P or P1), the @copy_flag of the newer one (P1) is examined. If it is cleared, the situation is simple and the newer one is picked. If it is set, the data CRC of the copy (P1) is examined. If the CRC checksum is correct, this physical eraseblock is selected (P1). Otherwise the older one (P) is selected.
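The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not UBI's code: the `Peb` class and `pick_peb` are hypothetical, the sequence number used to decide which block is newer is an assumption, and `zlib.crc32` merely stands in for the CRC that UBI actually computes.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Peb:
    sqnum: int        # assumed: higher value means newer
    copy_flag: bool   # stands for @copy_flag
    data: bytes
    data_crc: int     # stands for @data_crc


def pick_peb(a: Peb, b: Peb) -> Peb:
    """Choose between two PEBs that map to the same logical eraseblock."""
    older, newer = sorted((a, b), key=lambda p: p.sqnum)
    if not newer.copy_flag:
        return newer                      # not a copy: the newer one wins
    if zlib.crc32(newer.data) == newer.data_crc:
        return newer                      # copy completed and data is intact
    return older                          # copy is damaged: fall back to P


p = Peb(sqnum=1, copy_flag=False, data=b"payload", data_crc=0)
p1 = Peb(sqnum=2, copy_flag=True, data=b"payl\x00ad", data_crc=zlib.crc32(b"payload"))
print(pick_peb(p, p1).sqnum)  # CRC of the copy does not match, so P is chosen
```

In this picture, a bit-flip inside a block that carries the copy flag makes the CRC check fail, so the older block is the only candidate left.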
f.) A reboot does not help; the only remedy is to re-burn the flash
- same thing on our side
That's all. As written before, we don't know the exact cause of the UBIFS recovery error.
Regards,
Birger