The imx6ull file system reports an sqnum error

mark123 · 09-20-2024

Hello, NXP engineers,

At present, our device reports the log error below after running for some time. I don't know what causes it: is it a bit flip, or a problem with the ECC error-correction algorithm? From what I found when searching, this problem seems to be common, and some say it can only be fixed by updating the NAND driver. Do you have a patch for 4.1.15?

Kernel version: L4.1.15

Nand: MT29F2G08ABAEAWP

UBIFS (ubi0:0): recovery needed
UBIFS error (ubi0:0 pid 1): replay_log_leb: bad sqnum 266064, commit sqnum 266897
UBIFS error (ubi0:0 pid 1): replay_log_leb: log error detected while replaying the log at LEB 4:2048
magic 0x6101831
crc 0x8667d5fb
node_type 7 (master node)
group_type 0 (no node group)
sqnum 266064
len 512
highest_inum 7018
commit number 5900
flags 0x3
log_lnum 5
root_lnum 1067
root_offs 113648
root_len 68
gc_lnum 539
ihead_lnum 1067
ihead_offs 114688
index_size 1018344
lpt_lnum 9
lpt_offs 86109
nhead_lnum 9
nhead_offs 88064
ltab_lnum 9
ltab_offs 86016
lsave_lnum 0
lsave_offs 0
lscan_lnum 1065
leb_cnt 1868
empty_lebs 1028
idx_lebs 186
total_free 130656256
total_dirty 39837896
total_used 64154960
total_dead 442368
total_dark 7469664
List of all partitions:
0100 65536 ram0 (driver?)
0101 65536 ram1 (driver?)
0102 65536 ram2 (driver?)
0103 65536 ram3 (driver?)
0104 65536 ram4 (driver?)
0105 65536 ram5 (driver?)
0106 65536 ram6 (driver?)
0107 65536 ram7 (driver?)
0108 65536 ram8 (driver?)
0109 65536 ram9 (driver?)
010a 65536 ram10 (driver?)
010b 65536 ram11 (driver?)
010c 65536 ram12 (driver?)
010d 65536 ram13 (driver?)
010e 65536 ram14 (driver?)
010f 65536 ram15 (driver?)
1f00 5120 mtdblock0 (driver?)
1f01 1024 mtdblock1 (driver?)
1f02 10240 mtdblock2 (driver?)
1f03 1024 mtdblock3 (driver?)
1f04 244736 mtdblock4 (driver?)
No filesystem could mount root, tried: ubifs
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
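For context on the second error line: every UBIFS node header carries a 64-bit sequence number (sqnum) that grows with each node written, and while replaying the journal log UBIFS expects the nodes it replays to be no older than the commit-start node it began from (the "commit sqnum"). Here the node dumped after the error, reported at LEB 4:2048, has sqnum 266064, which is 833 lower than the commit sqnum 266897. The following is a simplified, hypothetical model of that rule, loosely based on the check in fs/ubifs/replay.c; the struct, enum and function names are invented for illustration and this is not the kernel code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified view of one node found while scanning a log LEB. */
struct log_node {
    uint64_t sqnum; /* sequence number from the node header */
    int offs;       /* byte offset of the node inside the LEB */
};

enum replay_result {
    REPLAY_OK = 0,         /* every node is at least as new as the commit start */
    REPLAY_END_OF_LOG = 1, /* LEB only holds older, already-committed data */
    REPLAY_BAD_SQNUM = -1  /* an older node follows newer ones: log is inconsistent */
};

/*
 * Sketch of the sqnum rule applied to one log LEB (assumption: simplified,
 * ignores node types and recovery). cs_sqnum is the sqnum of the
 * commit-start node. On REPLAY_BAD_SQNUM, *bad_offs gets the node offset.
 */
static enum replay_result check_log_leb(const struct log_node *nodes, size_t n,
                                        uint64_t cs_sqnum, int *bad_offs)
{
    if (n == 0)
        return REPLAY_END_OF_LOG;

    /* An old FIRST node means the LEB was un-mapped but not erased:
     * this is treated as the normal end of the log, not an error. */
    if (nodes[0].sqnum < cs_sqnum)
        return REPLAY_END_OF_LOG;

    /* Any LATER node older than the commit start is an error. */
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].sqnum < cs_sqnum) {
            *bad_offs = nodes[i].offs;
            return REPLAY_BAD_SQNUM;
        }
    }
    return REPLAY_OK;
}
```

With a hypothetical newer node at offset 0 followed by the 266064 node at offset 2048, the sketch returns REPLAY_BAD_SQNUM for offset 2048, the same shape as the error above; a LEB whose very first node is stale is instead treated as the end of the log.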

Zhiming_Liu · 09-23-2024

Hi @mark123

Please refer to this patch:

https://patchwork.ozlabs.org/project/linux-mtd/patch/1417461159-2972-1-git-send-email-boris.brezillo...

Best Regards
Zhiming

mark123 · 09-23-2024

Hi Zhiming,
Thank you very much for your answer; I believe this patch will help. I still have one doubt. The patch you sent does address bit flips, but the replay_log_leb error above is raised in fs/ubifs/replay.c. It looks as if something went wrong while the log was being written or committed (the error appears randomly after a power-off). Could a bit flip cause this kind of log failure?
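On the bit-flip question, one relevant mechanism is the node CRC. Each UBIFS node header holds a CRC32 that covers everything after the 8-byte magic/CRC prefix, and it is understood to be computed with the kernel's crc32_le() helper, seed 0xFFFFFFFF and no final inversion. The sketch below is a standalone re-implementation under those assumptions (the function names are hypothetical, not kernel APIs). It shows that flipping any single bit in the covered bytes changes the CRC, so a one-bit change to a node, including its sqnum field, is detectable when the node is checked.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise reflected CRC32 (polynomial 0xEDB88320) with a caller-supplied
 * seed and no final XOR, i.e. the same convention as the kernel's crc32_le(). */
static uint32_t crc32_le_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Node CRC as UBIFS is assumed to compute it: skip the 4-byte magic and
 * 4-byte crc fields, cover the rest of the node (sqnum, len, type, payload). */
static uint32_t node_crc_sketch(const uint8_t *node, size_t node_len)
{
    return crc32_le_sketch(0xFFFFFFFFu, node + 8, node_len - 8);
}
```

Because a CRC32 detects every single-bit error, a single bit flip cannot turn one valid node into another valid node with a different sqnum; whether multi-bit corruption or an interrupted write is involved is a separate question.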

Zhiming_Liu · 09-25-2024

Hi

I found v7 of this patch: https://lists.infradead.org/pipermail/linux-mtd/2014-January/051357.html

The BCH block typically used with a GPMI block on an i.MX28/i.MX6 is only
able to correct bitflips on data actually streamed through the block.
When erasing a block the data does not stream through the BCH block
and therefore no ECC data is written to the NAND chip. This causes
gpmi_ecc_read_page to return failure as soon as a single non-1-bit is
found in an erased page. Typically causing problems at higher levels
(ubifs corrupted empty space warnings).
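The idea in the quoted patch description can be illustrated with a small sketch. Because an erased page has no ECC bytes, the BCH engine cannot correct it, so a driver can instead count the bits that read as 0 in a page that should be all 0xFF and accept it as an erased page if that count is within the ECC strength. The function names, the max_bitflips threshold and the return convention below are hypothetical; this is not the gpmi-nand code from the patch, and a real driver would more likely apply the threshold per ECC chunk than per page.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Count bits that read as 0 in a buffer that should be all 1s when erased. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
    unsigned int zeros = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t inv = (uint8_t)~buf[i]; /* set bits here mark 0 bits in buf */
        while (inv) {
            inv &= (uint8_t)(inv - 1);  /* clear the lowest set bit */
            zeros++;
        }
    }
    return zeros;
}

/*
 * Hypothetical erased-page check.
 * Returns the number of bitflips (0..max_bitflips) after restoring the
 * buffer to 0xFF, or -1 if there are too many 0 bits for an erased page,
 * in which case the read should remain an uncorrectable ECC error.
 */
static int check_erased_page(uint8_t *buf, size_t len, unsigned int max_bitflips)
{
    unsigned int zeros = count_zero_bits(buf, len);
    if (zeros > max_bitflips)
        return -1;
    memset(buf, 0xFF, len); /* hand clean erased data to the upper layers */
    return (int)zeros;
}
```

For example, a 2048-byte page with two stray 0 bits and a threshold of 8 would be reported as an erased page with 2 bitflips, instead of failing the read and triggering UBIFS "corrupted empty space" warnings.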

Best Regards
Zhiming

mark123 · 09-25-2024

Thanks, I have applied the patch, but I don't know whether the problem will still occur, because judging from the error I posted, it does not look like an erased-page error or a bit flip. It looks like a problem in the UBIFS log area, and it happens randomly. I don't know what causes it; do you have any suggestions for troubleshooting?