Hi All
We have been running kernel 2.6.35.x for some time now on our custom i.MX53 boards (booting from NAND).
Recently we started upgrading the kernel to a more recent version (currently 4.4.75).
All fine so far.
In the lab I'm using several boards (10+ pieces) to test-drive the new kernel.
9 out of the 10 boards run fine with the new kernel; 1 board fails to recognize the NAND flash (8-bit, 2 chips, hardware ECC enabled, Micron MT29F16G08ABACAWP) with the new kernel (with the old kernel all seems to be fine).
The reason for this failure is that when reading the ONFI parameter page (command NAND_CMD_PARAM), the bytes read from the NAND chip are offset by one byte.
For 9 of the 10 boards, the data read back starts (as specified) with ONFI.
For the failing CPU/board it starts with NFI (the O is missing). All 256 bytes are shifted by one byte, or put differently, the first byte is missing. If the first byte were there, everything would be OK, so the data is not rubbish.
Reading the manufacturer ID (0x2c) and chip ID (0x48) works; reading the ONFI parameter page fails (with the 4.4.x kernel).
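To make the symptom concrete, here is a minimal Python sketch that estimates how many leading bytes are missing from a raw parameter-page readout by locating the "ONFI" signature. It assumes the readout contains several consecutive 256-byte copies of the parameter page; the function and constant names are hypothetical and not taken from the kernel or the mxc_nand driver.

```python
from typing import Optional

ONFI_SIGNATURE = b"ONFI"
PARAM_PAGE_SIZE = 256  # assumed size of one parameter page copy, in bytes


def missing_leading_bytes(raw: bytes) -> Optional[int]:
    """Estimate how many bytes were lost at the start of a readout.

    Returns 0 if the buffer starts with the signature, N if the first
    signature found sits N bytes before a 256-byte boundary, and None
    if no signature is present at all.
    """
    pos = raw.find(ONFI_SIGNATURE)
    if pos < 0:
        return None
    # A signature at offset 255 means the readout began one byte late.
    return (PARAM_PAGE_SIZE - pos) % PARAM_PAGE_SIZE
```

Fed with the bytes dumped from a failing board, a result of 1 would match the "NFI" start described above. This only locates the signature; it says nothing about why the byte is lost.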
I have swapped the flash chips, and the error stays with the CPU/board.
(Note: putting the old 2.6.35.x kernel back makes everything work fine again. So it must be related to the new kernel's drivers, but it could also be a silicon bug triggered by some corner case, if you ask me. I have been digging into this for more than a week.)
I've been cross-checking the errata but cannot find anything that fits.
I've also triple-checked each NFC register: all registers are set up correctly (comparing the good and bad boards gives the same register settings).
Any clues or hints to get me going? As said, I've been searching for a week, with no luck so far in understanding or solving the issue.
Just for info, the type of NAND used (2 chips, 8-bit mode):
nand: device found, Manufacturer ID: 0x2c, Chip ID: 0x48
nand: Micron MT29F16G08ABACAWP
nand: 2048 MiB, SLC, erase size: 512 KiB, page size: 4096, OOB size: 224
Best Regards
Noel
Hi Fabio,
I've been testing kernel 4.12 (in a hurry).
For the failing board I get the same error output:
[ 1.646968] nand: Could not find valid ONFI parameter page; aborting
[ 1.653593] nand: No NAND device found
[ 1.660295] libphy: Fixed MDIO Bus: probed
[ 1.670961] fec 63fec000.ethernet: 63fec000.ethernet supply phy not found, using dummy regulator
[ 1.680084] kworker/u2:0 (76) used greatest stack depth: 5864 bytes left
[ 1.712652] kworker/u2:0 (89) used greatest stack depth: 5472 bytes left
[ 1.725373] random: fast init done
At a quick look I end up with the same error: "Could not find valid ONFI parameter page; aborting".
Looking closer, the reason for this error is that the bytes in the ONFI block are shifted by one byte (or, as said before, the first byte is missing).
* For a good board I get this dump of the ONFI parameters read back:
[ 2.909684] NAND_CMD_PARAM- data[0] = 0x4F => O
[ 2.914510] NAND_CMD_PARAM- data[1] = 0x4E => N
[ 2.919232] NAND_CMD_PARAM- data[2] = 0x46 => F
[ 2.923986] NAND_CMD_PARAM- data[3] = 0x49 => I
[ 2.928706] NAND_CMD_PARAM- data[4] = 0x1E
[ 2.933456] NAND_CMD_PARAM- data[5] = 0x00
[ 2.938175] NAND_CMD_PARAM- data[6] = 0x58
..
... some bytes/lines are stripped here
..
[ 4.149180] NAND_CMD_PARAM- data[254] = 0x20 (crc is/or should be here on this offset)
[ 4.154101] NAND_CMD_PARAM- data[255] = 0x12 (crc is/or should be here on this offset)
* For the bad one (failing on kernel 4.12 / 4.4.x, but working on 2.6.35) I get this dump of the ONFI parameters read back:
[ 1.819926] NAND_CMD_PARAM- data[0] = 0x4E =>N
[ 1.824666] NAND_CMD_PARAM- data[1] = 0x46 => F
[ 1.829405] NAND_CMD_PARAM- data[2] = 0x49 => I
[ 1.834143] NAND_CMD_PARAM- data[3] = 0x1E
[ 1.838882] NAND_CMD_PARAM- data[4] = 0x00
[ 1.843619] NAND_CMD_PARAM- data[5] = 0x58
..
... some bytes/lines are stripped here
..
[ 3.053545] NAND_CMD_PARAM- data[253] = 0x20 ????? ( crc byte also on the wrong offset!!!)
[ 3.058458] NAND_CMD_PARAM- data[254] = 0x12 (crc is/or should be here on this offset)
[ 3.063371] NAND_CMD_PARAM- data[255] = 0x4F ( O of the second ONFI parameter block)
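For reference, a minimal Python sketch of the CRC check that rejects the shifted page. It assumes the ONFI parameter-page CRC-16 as I understand it from the ONFI specification and the kernel's onfi_crc16() helper: polynomial 0x8005, initial value 0x4F4E, computed over bytes 0 to 253 and stored little-endian in bytes 254 and 255. The Python function names are hypothetical.

```python
def onfi_crc16(data: bytes, crc: int = 0x4F4E) -> int:
    """Bitwise CRC-16 (polynomial 0x8005, MSB first), seeded with 0x4F4E."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def copy_is_valid(page: bytes) -> bool:
    """Check one 256-byte parameter page copy against its stored CRC."""
    if len(page) < 256:
        return False
    stored = int.from_bytes(page[254:256], "little")
    return onfi_crc16(page[:254]) == stored
```

In the good dump the stored value read this way is 0x1220 (data[254] = 0x20, data[255] = 0x12). In the bad dump those two bytes sit at offsets 253 and 254, and the checksummed range no longer starts with the O, so a check of this kind is expected to fail for every copy.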
The strange thing is that 9 out of 10 boards are OK, but 1 out of 10 is bad on these recent kernels.
When running the older 2.6.35 kernel, even this bad board works fine. (Boards ran for multiple months on the old 2.6.35 kernel with no issues; we have 1500+ of these boards in the field on 2.6.35.)
I've been digging into these recent kernels for multiple months (on i.MX and AT91 based devices), but this is one of those issues that doesn't feel good. Believe it or not, it's not the first time I'm porting a kernel to a board, so I have some experience with this!
Just for info: I've been logging and debugging everything I can think of in the recent "mxc_nand" driver, but the very strange thing is that, for this one badly behaving board/CPU, the bytes read by the i.MX NAND flash controller are shifted by one byte in the RAM area of the NAND flash controller.
As said before, when double-checking all registers of the NAND controller, everything seems to be initialized correctly, which means I'm running out of things to check. What worries me most is that even this badly behaving board runs fine on the older 2.6.35 kernel!
Best Regards
Noel
Hi Fabio,
>> Is the GPMI pins IOMUX configuration the same on 4.12 versus 2.6.35?
I double-checked the IOMUX config a minute ago: yes, the NAND-related IOMUX configuration is the same for both the old (2.6.35) and new (4.4.x) kernels.
But I keep getting the same error: the ONFI parameter page readout is shifted by one byte (1 byte is lost).
(For a detailed description, see the earlier posts.)
Best Regards
Noel
FYI:
a) As a test (one of the many I've tried), I overrode the ONFI parameter readout in code, faking a correct ONFI parameter configuration. The NAND is then detected, but reading it fails with ECC errors all over.
b) In U-Boot the NAND is fine: U-Boot is loaded and booted from NAND (and with the older 2.6.35 kernel the NAND is fine too).
Hi Noel,
Is the GPMI pins IOMUX configuration the same on 4.12 versus 2.6.35?
Since you reproduced the issue on 4.12, I would recommend that you post the details above to the linux-mtd@lists.infradead.org mailing list.
Regards,
Fabio Estevam
Hi Noel,
Could you please try with kernel 4.12?
If the problem also happens, then please report it to the linux-mtd@lists.infradead.org mailing list.
Thanks