IMX6ULL fails to mount UBIFS after power-cycle

PiVo · ‎12-15-2023

Dear readers,

We are experiencing an issue with mounting UBIFS. On a (very) limited number of products we see that it can occur that the filesystem fails to mount during boot. This issue is not (yet) reproducible by attempting excessive amounts of power-cuts while writing. When the issue occurs the kernel plainly bails out without any error notable messages:

[ 333.690893] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started, PID 4206
[ 333.740652] UBIFS (ubi0:0): recovery needed
[ 334.197053] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops

Our system is running on a system with kernel 5.4.24-2.1.0, using NAND type MT29F1G08ABAFAH4-AAT:F.

After enabling dynamic_debug in the kernel, I'm getting the following logging. Can anybody help me out decoding this, to get an idea of what is happening (has happened)?

Jan 1 01:48:48 kernel: [ 234.574942] ubifs_change_lp: UBIFS DBG lp (pid 687): LEB 130, free 0, dirty 2496, flags 19
Jan 1 01:48:48 kernel: [ 234.583293] ubifs_lpt_lookup_dirty: UBIFS DBG lp (pid 687): LEB 130, free 2048, dirty 608, flags 3
Jan 1 01:48:48 kernel: [ 234.592327] ubifs_wbuf_seek_nolock: UBIFS DBG io (pid 687): LEB 130:126976, jhead 1 (base)
Jan 1 01:48:48 kernel: [ 234.600665] ubifs_replay_journal: UBIFS DBG mnt (pid 687): finished, log head LEB 4:108544, max_sqnum 32724, highest_inum 206
Jan 1 01:48:48 kernel: [ 234.614759] kill_orphans: UBIFS DBG rcvry (pid 687): LEB 8
Jan 1 01:48:48 kernel: [ 234.620366] ubifs_start_scan: UBIFS DBG scan (pid 687): scan LEB 8:0
Jan 1 01:48:48 kernel: [ 234.645085] ubifs_scan: UBIFS DBG scan (pid 687): look at LEB 8:0 (126976 bytes left)
Jan 1 01:48:48 kernel: [ 234.653078] ubifs_scan_a_node: UBIFS DBG scan (pid 687): scanning orphan node at LEB 8:0
Jan 1 01:48:48 kernel: [ 234.661768] ubifs_scan: UBIFS DBG scan (pid 687): look at LEB 8:40 (126936 bytes left)
Jan 1 01:48:48 kernel: [ 234.669828] ubifs_scan_a_node: UBIFS DBG scan (pid 687): scanning padding node at LEB 8:40
Jan 1 01:48:48 kernel: [ 234.678193] ubifs_scan_a_node: UBIFS DBG scan (pid 687): 1980 bytes padded at LEB 8:40, offset now 2048
Jan 1 01:48:48 kernel: [ 234.687669] ubifs_scan: UBIFS DBG scan (pid 687): look at LEB 8:2048 (124928 bytes left)
Jan 1 01:48:48 kernel: [ 234.696317] ubifs_scan_a_node: UBIFS DBG scan (pid 687): hit empty space at LEB 8:2048
Jan 1 01:48:48 kernel: [ 234.704373] ubifs_end_scan: UBIFS DBG scan (pid 687): stop scanning LEB 8 at offset 2048
Jan 1 01:48:48 kernel: [ 234.725848] ubifs_lookup_level0: UBIFS DBG tnc (pid 687): search key (181, inode)
Jan 1 01:48:48 kernel: [ 234.733528] ubifs_lookup_level0: UBIFS DBG tnc (pid 687): found 0, lvl 0, n -1
Jan 1 01:48:48 kernel: [ 234.743305] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" stops

Thank you in advance,
Best regards, Pieter

PiVo · ‎12-19-2023

Hi Zhiming,

Thanks for your reply! Our issue is a little different, the kernel emits 0 errors or warnings when trying to mount. There are no messages indicating read errors or so. After re-performing an ubiformat, everything is fine and we can perform numerous "dirty" reboots. However in a real-life product that is of course not an option. The filesystem can not go corrupt in our case.

The NAND chips are practically new and we've seen this happen on some new systems. Since UBIFS is supposed to be fault tolerant, this issue is likely caused somewhere between the GPMI and the NAND. The error is also that un-common that no error messages pop up. I did not find any other reference with this kind of behavior.

I made a dump of the mtd and I will sniff through that to find any areas that are different to see what might have caused this.

BR, Pieter

Zhiming_Liu · ‎12-18-2023

Hello @PiVo

Here i find some reference cases may help you

https://community.nxp.com/t5/i-MX-Processors/UBIFS-recovery-needed/td-p/149849

https://community.toradex.com/t/vf50-kernel-panic-starting-with-ubifs-error-ubi0-2-pid-1/12911/4http...

This issue could relate to NAND or kernel. You can try to replace another MT29F1G08ABAFAH4-AAT:F NAND to cross-verify this issue.