UBIFS: recovery needed

bba · ‎03-29-2012

Hello,

Normally the UBIFS recovery process is started automatically on mount, if necessary.

mount -o sync -t ubifs ubi0:data /mnt/data

UBIFS: recovery needed
UBIFS: recovery completed
UBIFS: mounted UBI device 0, volume 2, name "data"
UBIFS: file system size:   33013760 bytes (32240 KiB, 31 MiB, 260 LEBs)
UBIFS: journal size:       1650688 bytes (1612 KiB, 1 MiB, 13 LEBs)
UBIFS: media format:       w4/r0 (latest is w4/r0)
UBIFS: default compressor: lzo
UBIFS: reserved for root: 1559321 bytes (1522 KiB)

Here everything works fine. But sometimes the recovery process fails:

mount -o sync -t ubifs ubi0:firmware /mnt/firmware

UBIFS: recovery needed
UBI error: ubi_io_read: error -74 while reading 34816 bytes from PEB 467:96256, read 34816 bytes
UBIFS error (pid 772): ubifs_recover_leb: corrupt empty space LEB 253:114688, corruption starts at 811
UBIFS error (pid 772): ubifs_scanned_corruption: corruption at LEB 253:811
UBIFS error (pid 772): ubifs_recover_leb: LEB 253 scanning failed
mount: mounting ubi0:firmware on /mnt/firmware failed: Structure needs cleaning
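
A note for reading the log above: the numbers in these messages are negative Linux errno values. The small Python sketch below decodes the two that show up in this thread (-74 here, -117 in a later post). The table is hand-written from the Linux errno numbering, the notes are a summary of how MTD/UBI/UBIFS use these codes rather than text from the kernel, and `describe` is a hypothetical helper name.

```python
# Linux errno numbers seen in this thread (asm-generic numbering).
# The notes are a summary of how MTD/UBI/UBIFS use them, not kernel text.
LINUX_ERRNO = {
    74: ("EBADMSG", "MTD/UBI read hit an ECC error it could not correct"),
    117: ("EUCLEAN", "'Structure needs cleaning'; UBIFS uses it for corruption"),
}


def describe(code: int):
    """Map a (possibly negative) kernel error code to (name, note)."""
    return LINUX_ERRNO.get(abs(code), ("unknown", "not in this small table"))


if __name__ == "__main__":
    for code in (-74, -117):
        name, note = describe(code)
        print(f"{code}: {name} - {note}")
```

So "Structure needs cleaning" in the mount message is simply the text for EUCLEAN, and the -74 reported by ubi_io_read is the ECC failure underneath it.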

I suppose this is similar to the following mailing list entry:

http://lists.infradead.org/pipermail/linux-mtd/2009-March/024953.html

According to the UBIFS documentation, chapter "Power-cuts tolerance", the problem was seen on NOR flash with kernel version 2.6.27 and should be solved in later versions.

http://www.linux-mtd.infradead.org/doc/ubifs.html

Any idea how we can solve this problem?

Kind regards,

Birger

P.S. We also get this error on the rootfs UBIFS partition, which is mounted by the kernel via the kernel command line.

CONFIG_PKG_BOOT_STREAM_CMDLINE1="noinitrd console=ttyAM0,115200 ubi.mtd=1 root=ubi0:rootfs0 rootfstype=ubifs rw gpmi"

mxs-rtc mxs-rtc.0: setting system clock to 1970-01-17 07:02:25 UTC (1407745)
UBIFS: recovery needed
UBI error: ubi_io_read: error -74 while reading 73728 bytes from PEB 108:57344, read 73728 bytes
UBIFS error (pid 1): ubifs_recover_leb: corrupt empty space LEB 326:53248, corruption starts at 3653
UBIFS error (pid 1): ubifs_scanned_corruption: corruption at LEB 326:3653
UBIFS error (pid 1): ubifs_recover_leb: LEB 326 scanning failed
VFS: Cannot open root device "ubi0:rootfs0" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
1f00           10240 mtdblock0 (driver?)
1f01          120832 mtdblock1 (driver?)
fe00           41044 ubiblka (driver?)
fe08           41044 ubiblkb (driver?)
fe10           32860 ubiblkc (driver?)
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Backtrace:
[<c0031260>] (dump_backtrace+0x0/0x114) from [<c02eb078>] (dump_stack+0x18/0x1c)
r7:c0028a24 r6:00008000 r5:c7c15000 r4:c0402818

mustafacayir · ‎08-05-2023

I have exactly the same problem with Linux kernel version 5.4 on the i.MX 6ULL platform. Did anyone find a solution?

bba · ‎04-12-2016

Thanks Rodney, will try it on our hardware.

w294875848 · ‎08-02-2020

Hi Birger,

Did you solve this issue?

I get this error on kernel 5.4.24 on an i.MX6UL, and the patch above cannot be applied to this version.

marui · ‎02-16-2016

Hi All,

I am also facing the same issue. How can it be fixed? Thanks!

rsa · ‎03-23-2016

The fix for this can be found in the gpmi patch from Elie De Brauwer:

[PATCH v1] mtd: gpmi: Deal with bitflips in erased regions

depending on your kernel version you may need to apply the patch manually (it worked for me on a 3.7.1 kernel version)
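
To illustrate the general idea behind such a fix (this is not the code from the patch, which is C inside the GPMI/BCH driver): when a page fails ECC but contains only a handful of 0 bits, the driver can assume it is an erased page with bitflips and hand all-0xFF data to the upper layers. A minimal Python sketch, assuming a 2048-byte page and a hypothetical `max_bitflips` threshold:

```python
def zero_bits(page: bytes) -> int:
    """Count bits that read as 0; a cleanly erased NAND page has none."""
    return sum(8 - bin(b).count("1") for b in page)


def fix_if_erased(page: bytes, max_bitflips: int):
    """Return an all-0xFF page if `page` looks erased, otherwise None.

    `max_bitflips` is an assumed tuning value for this sketch, not a
    number taken from the patch or from any datasheet.
    """
    if zero_bits(page) <= max_bitflips:
        return b"\xff" * len(page)
    return None


if __name__ == "__main__":
    page = bytearray(b"\xff" * 2048)   # assumed page size
    page[3] = 0xDF                     # one flipped bit, as in the dumps here
    print(zero_bits(bytes(page)))
    print(fix_if_erased(bytes(page), 4) == b"\xff" * 2048)
```

With this in place, UBIFS would see clean 0xFF empty space instead of a read error, which is what it expects from erased regions.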

dmitryv · ‎10-16-2015

Hello All,

Did anybody find a solution for UBI recovery?

I see the same issue as discussed in this thread.

UBIFS: recovery needed

UBI error: ubi_io_read: error -74 while reading 126976 bytes from PEB 481:4096, read 126976 bytes

UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 578:59199

UBIFS error (pid 1): ubifs_scanned_corruption: corruption at LEB 578:59199

00000000: ffffffdf ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................

...

000000a0: ffffffff dfffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ................................

..

UBIFS error (pid 1): ubifs_scan: LEB 578 scanning failed

UBIFS error (pid 1): do_commit: commit failed, error -117

UBIFS warning (pid 1): ubifs_ro_mode: switched to read-only mode, error -117

UBIFS: recovery completed
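
The two damaged words in the dump above can be located mechanically. The following Python sketch (the function name `flipped_words` is made up for this example) parses hexdump lines of the form shown in the log and lists every 32-bit word that is not 0xFFFFFFFF, together with its offset and the number of cleared bits. Byte order inside a word depends on how the kernel printed it, so only 4-byte-aligned offsets are reported.

```python
import re

# Matches "<hex offset>: <8-digit hex words...>"; anything else is skipped.
LINE = re.compile(r"\s*([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{8}\s*)+)")


def flipped_words(dump: str):
    """Return (offset, word, cleared_bits) for each word != 0xFFFFFFFF."""
    found = []
    for line in dump.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # log text, "..." separators, blank lines
        base = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            value = int(word, 16)
            if value != 0xFFFFFFFF:
                cleared = bin(value ^ 0xFFFFFFFF).count("1")
                found.append((base + 4 * i, word, cleared))
    return found


if __name__ == "__main__":
    sample = (
        "00000000: ffffffdf ffffffff ffffffff ffffffff "
        "ffffffff ffffffff ffffffff ffffffff ....\n"
        "...\n"
        "000000a0: ffffffff dfffffff ffffffff ffffffff "
        "ffffffff ffffffff ffffffff ffffffff ....\n"
    )
    for offset, word, cleared in flipped_words(sample):
        print(f"offset 0x{offset:x}: {word} ({cleared} bit(s) cleared)")
```

For the two lines quoted in this post it reports one cleared bit in the word at offset 0x0 and one in the word at offset 0xa4, i.e. isolated single-bit flips in otherwise erased space.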

xiaobozhang · ‎04-16-2013

I got exactly the same problem after power fail:

mount -t ubifs -o sync ubi1:data1 /mnt/data1

UBIFS: recovery needed

UBI error: ubi_io_read: error -74 while reading 253952 bytes from PEB 463:8192, read 253952 bytes

UBIFS error (pid 972): ubifs_recover_leb: corrupt empty space LEB 942:57344, corruption starts at 7936

UBIFS error (pid 972): ubifs_scanned_corruption: corruption at LEB 942:7936

UBIFS error (pid 972): ubifs_scanned_corruption: first 8192 bytes from LEB 942:7936

UBIFS error (pid 972): ubifs_recover_leb: LEB 942 scanning failed

mount: mounting ubi1:data1 on /mnt/data1 failed: Structure needs cleaning

I doubt it was caused by the unstable bits issue, because rebooting the board did not clear the error either. Running ubirmvol and then ubimkvol made this partition work again. Still waiting for a solution...

ofer_livny · ‎01-15-2014

Hi

I know this is an old thread, but I too got the exact same error after power failure.

I am using iMX6Q, with Linux 3.0.35.

Did you manage to get to the bottom of this?

victorz · ‎02-18-2015

hi, everyone here,

any update for this problem?

bba · ‎02-18-2015

Hi all, we did not find the root cause of this issue; normally the UBIFS recovery works fine.

But this issue was only observed on devices where the DDR2 is supplied by the i.MX28 internal power supply. Here we found another issue: our DDR2 needs too much current (in burst mode!). Sometimes the i.MX28 falls into an undefined state during a reboot cycle. From our FAE we got the following answer:

"My assumption is that the internal current (100mA) limiter is set when a reset is generated internally, and before the DDR2 is reset (asynchronous reset). This means that the current limiter is active while the device is writing to DDR2. When writing a DDR2, the current drawn exceed by far 100mA (up to 175mA with the Elpida populated on the EVK). This leads to the internal Core supply attrition during reset, ending up into an undefined state (VDDA collapse, VDDD cannot supply the core correctly)."

On devices with an external power supply for the DDR2 we have not seen this issue again. So maybe both issues have the same cause.

victorz · ‎02-18-2015

hi, birger, it is nice to hear something about this.

yes, it might be some undefined state of iMX28,

but I cannot connect the DDR2 problem with my current one, which seems more like a NAND flash error to me.

I'll just share my case below, FYI:

CPU: iMX28

DDR2

NAND: Samsung

Kernel: 2.6.35 with Freescale patches

then, exactly same messages as yours,

UBIFS: recovery needed

UBI error: ubi_io_read: error -74 while reading 36864 bytes from PEB 740:94208, read 36864 bytes

UBIFS error (pid 1): ubifs_recover_leb: corrupt empty space LEB 507:90112, corruption starts at 211

UBIFS error (pid 1): ubifs_scanned_corruption: corruption at LEB 507:211

UBIFS error (pid 1): ubifs_recover_leb: LEB 507 scanning failed

VFS: Cannot open root device "ubi0:rootfs0" or unknown-block(0,0)

Please append a correct "root=" boot option; here are the available partitions:

Tracing this error, I found some useful information:

1. in ubi_io_read, indeed ECC check fails 9 times over the threshold of NAND Flash

2. After dumping LEB 507, it seems some bits flipped on the NAND flash, from 0xffff to 0xffdf.

3. With a workaround applied so that the rootfs mounts, it still fails in the ubi_check_node function with a bad CRC, also on LEB 507.

So far, I can confirm that this problem is a low-probability event, and all occurrences came after a long time (about 1 year) of running.

Rebooting does not help; only re-burning the flash does.

I would also be surprised if bit flips were the answer; I don't think NAND flash is that weak.

But unstable bits are also too mysterious for me.

So, could you please offer any further advice or discussion?

thanks a lot.

bba · ‎02-19-2015

Hi Victor,

we don't know for sure whether both issues have the same cause. Some more details:

1.) The unstable state was also reproduced on the EVK with the following scenario:

- The device generates a watchdog reset while a file is being unzipped to the DDR2.

- Then sometimes the i.MX28 hangs and only a hard power-on reset can restart the device.

=> Before the reset is triggered, the DDR2 is supplied from

VDD5V - 4P2 LinReg - DC-DC-Conv - DCDC-VDDA

=> After Reset is triggered, the DDR2 is supplied from VDDA LinReg

VDD5V - VDDA LinReg

What happens:

A reset chip resets everything (default configuration). In this case a reset forces the device to restart and execute the BootROM again. The device will therefore boot first using the LinReg.

The DDR2 continues to operate, as the i.MX28 does not reset the DDR2. So the DDR2 keeps running for a short time, drawing 175mA (EVK DDR2 max burst write current), which leads to charge attrition on VDDA. The voltage collapses until it resets the DDR2.

As a consequence, VDDD, which supplies the core, does not get the current needed to properly reset all registers. This explains why in many cases we saw the LinReg not even supplying the reset-defined voltages.

Please ask your Freescale FAE for more details about the i.MX28 PMU.

2.) Some notes from my previous documentation about the ubifs issue

a.) The UBIFS recovery error was detected using kernel 2.6.35 as well as kernel 3.9.11

b.) http://lists.infradead.org/pipermail/linux-mtd/2012-March/040378.html

Your driver does not protect the empty space. Normally the driver corrects bit-flips using ECC, but some systems do not do this for empty space, i.e., for the flash regions which have been erased but have never been written. UBIFS expects to see all 0xFFs there, and if it doesn't, it reports corrupt empty space. UBIFS expects empty space to be protected by ECC. But if the ECC of an empty page is not 0xFF, this needs a hack in the NAND driver.

c.) UBIFS corruption during power failure

This error will be seen on any hardware that uses NAND flash. The idea is to keep a map from Logical Erase Blocks to Physical Erase Blocks. This information itself must be stored in the flash. If this table is corrupted, then we cannot guarantee that the data can be recovered correctly. Problems might happen when the power is cut while this table is being updated in the header of the flash (linking LEB and PEB). If the power cut happens while writing to the flash (i.e. states changing from 1 to 0), some bits might not be exactly 0 and are therefore more sensitive to read disturbance, which can then exceed the ECC capability.

The unstable bit issues part of the document (http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits) explains well what might happen.

d.) ubi_io_read: error -74 (ECC error)

Memory Technology Device (MTD) Subsystem for Linux.

e.) memory dump from affected devices

- one or more one-bit errors inside a block

- the distance between errors is exactly 4096 bytes = 2 pages

- the affected block carries the copy flag

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bbc0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bbe0: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc00: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc20: FFFFFFFF FFFFFFFF FFFFFFFF FFBFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc40: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

Jan 1 01:26:07 dnt8196 user.crit kernel: 1bc60: FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF

UBI uses the @copy_flag field to indicate that this logical eraseblock is a copy. UBI also calculates data CRC when the data is moved and stores it at the @data_crc field of the copy (P1). So when UBI needs to pick one physical eraseblock of two (P or P1), the @copy_flag of the newer one (P1) is examined. If it is cleared, the situation is simple and the newer one is picked. If it is set, the data CRC of the copy (P1) is examined. If the CRC checksum is correct, this physical eraseblock is selected (P1). Otherwise the older one (P) is selected.

f.) Rebooting does not help; only re-burning the flash does

- same thing on our side
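
The @copy_flag rule quoted under e.) can be written down as a short decision function. This is only a sketch in Python: the `Peb` class and `pick_peb` are invented for illustration, and `zlib.crc32` stands in for UBI's own CRC over the data, whose exact parameters and header layout are not reproduced here.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Peb:
    """Hypothetical stand-in for one physical eraseblock and its header."""
    name: str
    data: bytes
    copy_flag: bool = False
    data_crc: int = 0


def pick_peb(older: Peb, newer: Peb) -> Peb:
    """Choose between two PEBs that claim the same logical eraseblock."""
    if not newer.copy_flag:
        return newer                  # not a copy: the newer one wins
    if zlib.crc32(newer.data) == newer.data_crc:
        return newer                  # copy finished and is intact
    return older                      # copy was interrupted or corrupted


if __name__ == "__main__":
    p = Peb("P", b"payload")
    p1_good = Peb("P1", b"payload", True, zlib.crc32(b"payload"))
    p1_bad = Peb("P1", b"payl\xdfad", True, zlib.crc32(b"payload"))
    print(pick_peb(p, p1_good).name)
    print(pick_peb(p, p1_bad).name)
```

In the second case the copy's data no longer matches its stored CRC, so the older block P is kept, which is the fallback the quoted text describes.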

That's all - as written before we don't know the exact reason for the ubifs recovery error.

Regards,

Birger

yagabey1 · ‎08-08-2023

Hi, it's been a long time since the last message in this thread. We are having the same UBIFS corruption issue on both the 5.4 and 4.1 kernels. Despite conducting extensive tests and altering various parameters such as input voltage and the timing of power cuts, the issue appears to be quite random, and we have been unable to identify a consistent pattern. Furthermore, our recovery efforts have been unsuccessful, and the only solution we have found so far is to re-burn the flash. Could you please advise on any available patches or methods that might either prevent this or help in recovering from it?

Thanks, Kind Regards
