IMX53 - NAND - Kernel 4.4.xx - ONE BYTE OFFSET IN READS

Noel_V
Contributor III

Hi All

We have been running kernel 2.6.35.x for some time now on our custom i.MX53 boards (booting from NAND).

Recently we started upgrading to a more recent kernel (currently 4.4.75).

All fine so far.

In the lab I'm using different boards (10+ pieces) to test-drive the new kernel.

9 out of 10 boards run fine with the new kernel. 1 board fails to recognize the NAND flash (8-bit, 2 chips, hardware ECC enabled, Micron MT29F16G08ABACAWP) with the new kernel; with the old kernel everything seems fine.

The failure occurs when reading the ONFI parameter page (command NAND_CMD_PARAM): there is a one-byte offset in the bytes read from the NAND chip.

On 9 of the 10 boards the data read back starts, as specified, with "ONFI".

On the failing CPU/board it starts with "NFI" (the "O" is missing). All 256 bytes are shifted by one byte; in other words, the first byte is missing. If the first byte were there, everything would be OK, so the data is not rubbish.

Reading the Manufacturer ID (0x2c) and Chip ID (0x48) works; reading the ONFI parameter page fails (with the 4.4.x kernel).
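To put a number on the symptom described above, here is a minimal Python sketch (not driver code) that searches a raw read-back buffer for the "ONFI" signature and reports how many leading bytes appear to be missing. It assumes the chip returns several 256-byte copies of the parameter page back to back; the function name `missing_leading_bytes` and the synthetic buffers are made up for illustration.

```python
from typing import Optional

ONFI_SIGNATURE = b"ONFI"
PARAM_PAGE_SIZE = 256  # size of one parameter page copy, as in the dumps


def missing_leading_bytes(raw: bytes) -> Optional[int]:
    """Return how many bytes seem to be lost before raw[0].

    Returns None when no signature is found at all. A signature that
    happens to occur inside the page body would fool this simple search.
    """
    idx = raw.find(ONFI_SIGNATURE)
    if idx < 0:
        return None
    # Signature at offset 0 means nothing is missing; at offset 255 it
    # means the stream started one byte too late.
    return (PARAM_PAGE_SIZE - idx) % PARAM_PAGE_SIZE


# Synthetic stand-in for three identical copies (not real chip contents).
page = ONFI_SIGNATURE + bytes(PARAM_PAGE_SIZE - len(ONFI_SIGNATURE))
stream = page * 3

print(missing_leading_bytes(stream))      # aligned stream
print(missing_leading_bytes(stream[1:]))  # first byte dropped
```

Fed with the bytes dumped from the controller buffer, a result of 1 would match the "O is missing" behaviour seen on the failing board.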

I have swapped the flash chips, and the error stays with the CPU/board.

(Note: putting the old 2.6.35.x kernel back makes everything work again. So it must be related to the new kernel's drivers, but if you ask me it could also be a silicon bug triggered by some exceptional condition. I've been digging into this for more than a week.)

I've been cross-checking the errata but cannot find anything that fits.

I've also triple-checked every NFC register; all are set up correctly (comparing the good and bad boards shows the same register settings).

Any clues or hints to get me going? As said, I've been searching for a week, with no luck so far in understanding or solving the issue.

Just for info, the type of NAND used (2 chips, 8-bit mode):

nand: device found, Manufacturer ID: 0x2c, Chip ID: 0x48
nand: Micron MT29F16G08ABACAWP
nand: 2048 MiB, SLC, erase size: 512 KiB, page size: 4096, OOB size: 224

Best Regards

Noel

0 Kudos
1 Solution
904 Views
Noel_V
Contributor III

Hi Fabio ..

I've tested kernel 4.12 (in a hurry).

And for the failing board, I get the same error output.

[    1.646968] nand: Could not find valid ONFI parameter page; aborting
[    1.653593] nand: No NAND device found
[    1.660295] libphy: Fixed MDIO Bus: probed
[    1.670961] fec 63fec000.ethernet: 63fec000.ethernet supply phy not found, using dummy regulator
[    1.680084] kworker/u2:0 (76) used greatest stack depth: 5864 bytes left
[    1.712652] kworker/u2:0 (89) used greatest stack depth: 5472 bytes left
[    1.725373] random: fast init done

A quick look shows that I end up with the same error: "Could not find valid ONFI parameter page; aborting".

The reason for this error is that the bytes in the ONFI block are shifted by one byte (or, as said before, the first byte is missing).

* For a good board I get this dump of the ONFI parameter read-back:

[    2.909684] NAND_CMD_PARAM- data[0] = 0x4F => O
[    2.914510] NAND_CMD_PARAM- data[1] = 0x4E => N
[    2.919232] NAND_CMD_PARAM- data[2] = 0x46 => F
[    2.923986] NAND_CMD_PARAM- data[3] = 0x49 => I
[    2.928706] NAND_CMD_PARAM- data[4] = 0x1E
[    2.933456] NAND_CMD_PARAM- data[5] = 0x00
[    2.938175] NAND_CMD_PARAM- data[6] = 0x58

..

... some bytes/lines are stripped here

..

[    4.149180] NAND_CMD_PARAM- data[254] = 0x20 (crc is/or should be here on this offset)
[    4.154101] NAND_CMD_PARAM- data[255] = 0x12 (crc is/or should be here on this offset)
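For reference, the first bytes of the good dump can be decoded with a small Python sketch like the one below. It assumes the usual ONFI layout as I understand it: a 4-byte signature followed by a 16-bit little-endian revision bitmap, where bits 1 to 5 stand for ONFI 1.0, 2.0, 2.1, 2.2 and 2.3. The bit table and the helper name `decode_header` are assumptions for illustration, so check them against the ONFI specification before relying on them.

```python
# Assumed revision bitmap: bit number -> ONFI version string.
ONFI_REVISION_BITS = {1: "1.0", 2: "2.0", 3: "2.1", 4: "2.2", 5: "2.3"}


def decode_header(page: bytes):
    """Return (signature, list of supported ONFI versions) from bytes 0..5."""
    signature = page[0:4]
    revision = int.from_bytes(page[4:6], "little")
    versions = [name for bit, name in ONFI_REVISION_BITS.items()
                if revision & (1 << bit)]
    return signature, versions


# data[0]..data[5] exactly as printed in the good dump above.
good = bytes([0x4F, 0x4E, 0x46, 0x49, 0x1E, 0x00])
print(decode_header(good))  # (b'ONFI', ['1.0', '2.0', '2.1', '2.2'])
```

When the stream is shifted by one byte, both the signature and the revision field are read from the wrong offsets, so neither check can pass.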

* For the bad one (failing on kernel 4.12 / 4.4.x, but working on 2.6.35) I get this dump of the ONFI parameter read-back:

[    1.819926] NAND_CMD_PARAM- data[0] = 0x4E =>N
[    1.824666] NAND_CMD_PARAM- data[1] = 0x46 => F
[    1.829405] NAND_CMD_PARAM- data[2] = 0x49 => I
[    1.834143] NAND_CMD_PARAM- data[3] = 0x1E
[    1.838882] NAND_CMD_PARAM- data[4] = 0x00
[    1.843619] NAND_CMD_PARAM- data[5] = 0x58

..

... some bytes/lines are stripped here

..

[    3.053545] NAND_CMD_PARAM- data[253] = 0x20 ????? ( crc byte also on the wrong offset!!!)
[    3.058458] NAND_CMD_PARAM- data[254] = 0x12 (crc is/or should be here on this offset)
[    3.063371] NAND_CMD_PARAM- data[255] = 0x4F ( O of the second ONFI parameter block)
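The two dumps above also explain why the kernel rejects the page: the CRC ends up at the wrong offset. The following Python sketch models that check. It assumes the ONFI CRC-16 as I understand it (polynomial 0x8005, initial value 0x4F4E, computed over bytes 0..253, stored little-endian in bytes 254..255). The helpers `page_is_valid` and `make_page` are hypothetical names, and the page contents are synthetic test data, not the real Micron parameter page.

```python
ONFI_CRC_INIT = 0x4F4E  # assumed ONFI CRC start value
ONFI_CRC_POLY = 0x8005  # assumed ONFI CRC polynomial


def onfi_crc16(data: bytes, crc: int = ONFI_CRC_INIT) -> int:
    """Bitwise CRC-16, MSB first, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ ONFI_CRC_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def page_is_valid(page: bytes) -> bool:
    """Check signature and CRC of one 256-byte parameter page copy."""
    if len(page) != 256 or page[:4] != b"ONFI":
        return False
    stored = int.from_bytes(page[254:256], "little")
    return onfi_crc16(page[:254]) == stored


def make_page(body: bytes) -> bytes:
    """Build a synthetic page with a matching CRC (test data only)."""
    payload = (b"ONFI" + body).ljust(254, b"\x00")
    return payload + onfi_crc16(payload).to_bytes(2, "little")


page = make_page(b"\x1e\x00\x58")
stream = page * 3            # copies stored back to back
shifted = stream[1:257]      # what the failing board effectively sees

print(page_is_valid(page))     # True
print(page_is_valid(shifted))  # False
print(chr(shifted[255]))       # O (start of the next copy)
```

In the good dump, data[254] = 0x20 and data[255] = 0x12 would then read as a stored CRC of 0x1220. In the shifted dump the same two bytes sit at offsets 253 and 254, and offset 255 already holds the "O" of the next copy.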

The strange thing is that 9 out of 10 boards are OK, but 1 out of 10 is bad on these recent kernels.

With the older 2.6.35 kernel everything works fine, even on this bad board. (These boards ran for multiple months on the old 2.6.35 kernel with no issues, and we have 1500+ of them out in the field on 2.6.35.)

I've been digging into these recent kernels for multiple months (on i.MX and AT91 based devices), but this is one of those issues that doesn't feel right. Believe it or not, it's not the first time I'm porting a kernel to a board, so I have some experience with this!

Just for info: I've logged and debugged everything I can think of in the recent "mxc_nand" driver. The very strange thing is that, for this one badly behaving board/CPU, the bytes read by the i.MX NAND flash controller arrive shifted by one byte in the controller's RAM area.

As said before, all registers of the NAND controller appear to be initialized correctly when I double-check them, which means I'm running out of things to check. What worries me most is that even this badly behaving board runs fine on the older 2.6.35 kernel!

Best Regards

Noel

0 Kudos
5 Replies
904 Views
Noel_V
Contributor III

Hi Fabio,

>> Is the GPMI pins IOMUX configuration the same on 4.12 versus 2.6.35?

I double-checked the IOMUX config a minute ago. Yes, the NAND-related IOMUX configuration is the same for both the old (2.6.35) and new (4.4.x) kernels.

But I keep getting the same error: the ONFI parameter page readout is shifted by one byte (1 byte is lost).

(For a detailed description, see the earlier posts.)

Best Regards

Noel

FYI:

a) As a test (one of the many I've tried), I overrode the ONFI parameter readout in code, faking a correct ONFI parameter configuration. The NAND is then detected, but reading it fails with ECC errors all over.

b) U-Boot has no problem with the NAND: U-Boot is loaded and booted from NAND (and NAND is fine under the older 2.6.35 kernel too).

0 Kudos
904 Views
fabio_estevam
NXP Employee

Hi Noel,

Is the GPMI pins IOMUX configuration the same on 4.12 versus 2.6.35?

Since you reproduced the issue on 4.12, I would recommend posting the details above to the linux-mtd@lists.infradead.org mailing list.

Regards,

Fabio Estevam

0 Kudos
904 Views
igorpadykov
NXP Employee

Hi Noel

just for reference

https://community.freescale.com/message/311702#311702

Best regards
igor

0 Kudos
904 Views
fabio_estevam
NXP Employee

Hi Noel,

Could you please try with kernel 4.12?

If the problem also happens there, then please report it to the linux-mtd@lists.infradead.org mailing list.

Thanks

0 Kudos