EEPROM record system worn out with 28 dropped sectors


1,498 Views
shis
Contributor I


Hello All,

I have been puzzled by a problem with the EEPROM of the MC9S12XET256 for quite a long time. I'm trying to find out what causes the EEEPROM to wear out.

At first, I'm sure the XET256 was partitioned with ERPART=16 and DFPART=0 (all 4K of EEE RAM is used and the whole D-Flash is allocated to the EEEPROM). Then, a few days later, the EEEPROM couldn't restore data any longer. When I read the values at addresses such as 0x10_0000, 0x10_0100 ... 0x10_7F00 (the heads of the data record sectors), about 28 sector heads had the value 0xFFFF. According to AN3490, section 4.4 Record System Status, "The number of dropped EEE NVM sectors that can be managed by the EEE is limited to a maximum of 24". I think this may be the reason the EEPROM didn't work. Meanwhile, ERPART and DFPART also turned to 0xFFFF, but I could not read those values at 0x12_0000 and 0x12_0004 (both read 0x0, with EEEIFRON = 1). However, when I repartitioned the EEPROM with the Full Partition D-Flash command (0x0F) via BDM, the EEPROM recovered. I have also tried switching the power off while reading and writing the EEPROM, which might corrupt the record system, but that did not reproduce the fault.
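The header scan described above can be sketched in C as follows. This is only an illustration, assuming 128 sectors of 256 bytes starting at global address 0x10_0000 (as the addresses in the post suggest) and treating a header word of 0xFFFF as the suspicious value. The names `sector_header_addr` and `count_erased_headers` are hypothetical, and the function works on a dump of header words already copied out of D-Flash rather than on the hardware itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DFLASH_BASE    0x100000UL  /* first sector header, from the post */
#define SECTOR_SIZE    0x100UL     /* headers at 0x10_0000, 0x10_0100, ... */
#define SECTOR_COUNT   128U        /* last header at 0x10_7F00 */
#define HEADER_ERASED  0xFFFFU     /* value seen in the suspicious heads */

/* Global address of the header of a given sector (hypothetical helper). */
uint32_t sector_header_addr(unsigned sector)
{
    assert(sector < SECTOR_COUNT);
    return (uint32_t)(DFLASH_BASE + (unsigned long)sector * SECTOR_SIZE);
}

/* Counts header words equal to 0xFFFF in a dump of n sector headers. */
unsigned count_erased_headers(const uint16_t *headers, size_t n)
{
    unsigned erased = 0;
    size_t i;

    assert(headers != NULL);
    for (i = 0; i < n; ++i) {
        if (headers[i] == HEADER_ERASED) {
            ++erased;
        }
    }
    return erased;
}
```

On the target, the dump would be filled by reading the word at `sector_header_addr(s)` for each sector `s`; how that far read is written depends on the compiler and memory model.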

My question is:

1) What causes the EEPROM to appear "pseudo worn out"?

2) Is there any method to reproduce the fault, and how?

0 Kudos
10 Replies

1,163 Views
RadekS
NXP Employee

Hi Richard,

Is your question somehow connected to this thread https://community.freescale.com/message/590798 ?

The problem could be with flash timing. If you select the wrong divider, flash cells may be erased only weakly. Since the whole D-Flash is erased in your case, I suppose this is not the root cause. Anyway, please check your FCLKDIV value.

ERPART and DFPART can be updated only by the (Full) Partition commands. Prior to partitioning, the default values are 0xFFFF. ERPART and DFPART are programmed before the D-Flash is formatted. The (Full) Partition command validates that ERPART and DFPART are higher than 0.

Therefore, I am not sure how such an error could be produced.

If you did not erase the flash IFR again before executing the new Full Partition D-Flash command and this command was successful, it means that the flash IFR and D-Flash were erased by a mass erase command, and you probably just read the ERPART and DFPART values incorrectly (some problem with the debugger memory map?). Did you secure the MCU, or did you use the unsecure command (mass erase) prior to the EEPROM failure?

Please also check whether the TEST pin is connected to GND. The behavior of the MCU in test mode could be different.

Do you know the MCU mask set?

Unfortunately, the described behavior does not make sense to me right now, and I am not sure what went wrong.


I hope it helps you.

Have a great day,
RadekS

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

0 Kudos

1,163 Views
shis
Contributor I

Thank you for your reply.

The person who asked that question is my colleague: https://community.freescale.com/message/590798

(1) I have measured the crystal oscillator frequency with an oscilloscope. The frequency is 4 MHz, and the FCLKDIV value is 0x3.

(2) Sorry for my poor English; I will try to describe my problem more explicitly. The values of ERPART and DFPART were acquired using the EEPROM Emulation Query command (0x15); both are 0xFFFF. However, when I read addresses 0x120000 and 0x120004 directly, both of them are zero. The code is as below:

        

MMCCTL1_EEEIFRON = 1;                                  // make the EEE IFR visible in the memory map
ERPART = *(unsigned int far *)0x120000;                // read ERPART directly, not via command 0x15; the value is 0
DFPART = *(unsigned int far *)0x120004;                // read DFPART directly as well; the value is 0
ERPART_duplicate = *(unsigned int far *)0x120002;      // duplicate ERPART; the value is 0x10
DFPART_duplicate = *(unsigned int far *)0x120006;      // duplicate DFPART; the value is 0x0

(3) The MCU is unsecured, and I did not use the unsecure command on this board, because when I reflash the application via BDM there is no such prompt. What would happen if I had used the unsecure command (mass erase) before the EEPROM failure? I would be very grateful if you could give me some details about this.

(4) Also, the TEST pin is connected to ground.

(5) I have a question: what does it mean if the Flash ECC Error Results Register (FECCR) reads 0x0a12?

(I) I have repartitioned the bad board via BDM without erasing the Flash and EEPROM. (First, I opened HI-WAVE (True-Time Simulator & Real-Time Debugger) and connected the board to the BDM. Second, I reset the board with the "Reset Target" button. Third, I pressed the "Start/Continue" button. After that, I executed the Full Partition D-Flash command and then queried ERPART and DFPART; they were 0x10 and 0x0, no longer 0xFFFF, so the board was OK again.)

(II) Then I enabled and disabled the EEPROM of the bad board (repartitioned as mentioned above) repeatedly and queried the dead sector count with the EEPROM Emulation Query command. Unfortunately, the value has risen from 4 to 25 during the last 2 days. For comparison, I also ran a normal board, whose dead sector count remains 0.

(III) It seems that I didn't repair the bad board. I want to know how to repair the bad board thoroughly, and why the two boards give different test results with the same code.

I'm looking forward to your help again.

Have a good day,

Richard

0 Kudos

1,163 Views
RadekS
NXP Employee

Hi Richard,

Thank you for the additional information. Now it is clearer; however, I still don't have a full explanation.

Mass erase (the unsecure command) is the only way to revert the MCU flash to the erased state (as we typically deliver our MCUs). So, if you would like to execute the (Full) Partition command a second time, you have to erase all flash blocks, including the IFR (possible only in special mode), prior to re-partitioning.

Idea: Please also check the power supply levels (mainly VDD, VDDF, ...) during programming (and code execution, ...). I have already seen a case where the wrong capacitor was assembled on VDD (just 470 pF instead of 470 nF), which led to interruptions during programming (such an issue was quite difficult to find).

The FECCR register reports where the MCU detected an ECC error. You can step ECCRIX from 0 to 7 and read the FECCR register each time to get the full record. In your case, 0x0a is the parity bits read from the flash block and 0x12 is the upper byte of the global address.

This partially explains why you read 0x00 instead of 0x10. It seems that there is an ECC error in ERPART (0x120000) and you read either a damaged value or a value already "fixed" by ECC (for a single-bit ECC error).
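The FECCR decoding described above can be sketched as a small C helper. This is a hedged illustration, not the reference manual's definition: it takes the ECCRIX = 0 word as parity bits in the high byte and the upper address byte in the low byte (as stated in this reply), and it assumes that the ECCRIX = 1 word carries the lower 16 bits of the global address. The type `ecc_fault_t` and the function names are hypothetical, and the value 0x0004 is inferred from the address 0x120004 reported later in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  parity;       /* parity bits read from the flash block */
    uint32_t global_addr;  /* global address where the ECC error was seen */
} ecc_fault_t;

/* feccr_ix0: FECCR value read with ECCRIX = 0
 * feccr_ix1: FECCR value read with ECCRIX = 1 (assumed: address bits 15..0) */
ecc_fault_t decode_feccr(uint16_t feccr_ix0, uint16_t feccr_ix1)
{
    ecc_fault_t f;

    f.parity = (uint8_t)(feccr_ix0 >> 8);
    f.global_addr = ((uint32_t)(feccr_ix0 & 0x00FFU) << 16) | feccr_ix1;
    return f;
}

/* Example with the values mentioned in this thread. */
void decode_feccr_example(void)
{
    ecc_fault_t f = decode_feccr(0x0A12U, 0x0004U);

    assert(f.parity == 0x0A);
    assert(f.global_addr == 0x120004UL);
}
```

Keeping the decode separate from the register reads makes it easy to check against values captured in a debugger before trusting it on the target.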

So, it seems that at least the IFR record is damaged, and therefore the emulation doesn't work correctly. Currently I have no idea how to damage this IFR record intentionally. The source of this issue may also be damage from manufacturing, an ESD event, or damage by a particle (like a neutron, ...) or cosmic radiation (a very rare effect, but still possible).

Unfortunately, this cannot be detected without deep analysis. So, if the voltage levels are OK and mass erase with re-partitioning does not help, please initiate the quality incident process with your distributor/sales representative and send the MCU to Freescale for analysis.

http://www.freescale.com/about/technology-leadership/quality/problem-resolution-process:QUALITY_PROB...


I hope it helps you.

Have a great day,
RadekS


0 Kudos

1,163 Views
shis
Contributor I

Hi Radek,

Thank you !

(1) The capacitor used for VDD is 330 nF.

(2) When I set ECCRIX to 0x01, I got the address 0x120004. You are right, the value of ERPART was damaged. I don't know whether this could cause the EEPROM to break down. I supposed that if the IFR record is damaged, the duplicate data of ERPART and DFPART would be used by the EEPROM controller.

(3) More details to share:

  (a) First, the bad board was mass erased via BDM and then repartitioned successfully (ERPART = 0x10, DFPART = 0x0).

  (b) After that, the EEPROM could restore data again. According to AN3490, the record system is composed of sectors, and each sector starts with a header field, which defines its status and erase count. So I checked the erase count in each sector header across the whole D-Flash (the sector headers are at 0x100000, 0x100100, 0x100200, ... up to 0x107F00, so the erase counts are located at 0x100002, 0x100102, ... 0x107F02) and found that some sector erase counts can suddenly change from 0x012F to 0xFFFF. Meanwhile, the Flash Error Status Register (FERSTAT) turned to 0xC1. I suppose it is the erase and program errors that make a sector dropped or dead. When the number of dead sectors rises to 25, the EEPROM cannot be used anymore; am I right? I also want to know how the EEE erase error and program error are generated.

  (c) Lastly, I used BDM to erase the board again, and the erase count of each sector went back to zero. Before erasing, some erase count values were 0xFFFE or 0xFFFF. I wonder how the EEPROM keeps track of its remaining lifetime if it is erased via BDM?

Have a good day,

Richard

0 Kudos

1,163 Views
RadekS
NXP Employee

Hi Richard,

Thank you for more details and tests.

I am more and more convinced that the root cause of this issue was that the first partitioning was interrupted ("there are about 28 sector heads with the value of 0xFFFF").

It seems that even the write into the IFR was not finished correctly, which led to the weird behavior and a few additional symptoms that confused me.

In this case, the IFR record was not damaged after successful programming; it was already wrong from the beginning (or rather, from partitioning). Since there was more than one error (the D-Flash was not formatted, ...), it is hard to say how the EEEPROM should behave in that case.

Your explanation about the number of dead sectors is interesting, and it currently makes sense to me.

Just a note: the EEEPROM signals an error when the number of dropped sectors exceeds 25% of the total sectors allocated for EEE NVM, i.e. if it is greater than 0.25 * (128 - DFPART).
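As a worked example of the 25% rule above, the check can be written in integer arithmetic. This is a minimal sketch that only restates the formula from this reply; the function names are hypothetical, and the thread does not reconcile this threshold with the maximum of 24 dropped sectors quoted from AN3490.

```c
#include <assert.h>
#include <stdint.h>

#define DFLASH_SECTORS 128U  /* total D-Flash sectors, from the formula above */

/* Number of D-Flash sectors left for EEE NVM after DFPART user sectors. */
unsigned eee_nvm_sectors(unsigned dfpart)
{
    assert(dfpart <= DFLASH_SECTORS);
    return DFLASH_SECTORS - dfpart;
}

/* Nonzero when dropped > 0.25 * (128 - DFPART), written without
 * floating point as 4 * dropped > (128 - DFPART). */
int eee_dropped_limit_exceeded(unsigned dropped, unsigned dfpart)
{
    return 4U * dropped > eee_nvm_sectors(dfpart);
}
```

With DFPART = 0, all 128 sectors belong to EEE NVM, so by this formula the limit is exceeded only once more than 32 sectors are dropped.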

In fact, there is a CPU core in the background which executes the EEEPROM firmware. Unfortunately, I am not the author of this firmware and the code isn't publicly available, so my knowledge of how program errors are generated is limited.

The counter in the sector header is just a basic tool for estimating the lifetime of the current software implementation. Unfortunately, it has no connection to the real endurance of the D-Flash. The counter can be erased by mass erase. However, the D-Flash is typically partitioned just once.


I hope it helps you.

Have a great day,
RadekS


0 Kudos

1,163 Views
shis
Contributor I

Hi Radek,

Over the past two days, I have repartitioned the EEPROM, and the process was interrupted by an unintentional reset; the test result seems to be different from that of the bad board. Using the query command, the bad board returns a dead sector count, while the good board does not! So it is not because the partitioning was interrupted; it seems to be an erase or program error occurring when the Enable EEPROM command is used. If you know something about the errors caused by EEE erase or program, please share it with me.

One more question: suppose some sectors (more than 24) have been "used up"; will the EEPROM be flagged as "worn out"? In this situation, if I repartition the EEPROM so that all of the D-Flash sectors are allocated as user D-Flash, can the D-Flash be used to store data without any error?

Have a good weekend!

Richard

0 Kudos

1,163 Views
RadekS
NXP Employee

Hi Richard,

I am afraid that your premise cannot be relied on. The Partition command could have been interrupted at any time during its execution (I guess that it was in a very early phase), and the resulting behavior may differ significantly (depending on the resulting status).

It is like trying to find out who broke a pen from the width of the line it draws.

I suppose that your attempt to simulate this error again is interesting, but the results will be almost worthless.

We can simply say that the behavior of the EEE is defined only prior to partitioning and after partitioning. Since the partitioning command did not finish successfully, we cannot assume that the MCU will work according to the official specification.

The most important information is whether the MCU can be re-partitioned and whether it works correctly after that operation.


I hope it helps you.

Have a great day,
RadekS


0 Kudos

1,163 Views
shis
Contributor I

Hi Radek,

That's really very weird: the board DOES NOT work well even after it has been re-partitioned.

Although the bad board was re-partitioned successfully after the mass erase via BDM, when the EEE Enable command is issued every 20 ms, at first it works well, with no erase or program errors. However, one or two days later, more and more dropped sectors appear, and when the number of dead sectors rises to 28, the EEPROM becomes invalid again (the data is acquired with the EEE Query command). It seems the EEE of the MCU has been partially and permanently destroyed. I'm just confused about what is happening to the EEE.

However, the good board, which is also erased and programmed every 20 ms, still functions normally.

More information: the bad board is mounted close to the engine, about 15 cm away, with a metal shield in between! Could high temperature be the cause? More and more bad boards with EEE failure have been found recently, so I'm really eager to solve this problem as soon as possible.

Have a good day,

Richard

0 Kudos

1,163 Views
RadekS
NXP Employee

Hi Richard,

Thank you for information.

So, it looks more like a system issue ("More and more bad boards with EEE failure are found recently").

Could you please create a service request and send me the schematic of the MCU surroundings and at least the part of your software project related to EEE?

https://www.freescale.com/webapp/servicerequest.create_SR.framework

I will briefly check it for any obvious or potential issues.

In the meantime, you could also initiate the quality incident process with your distributor/sales representative and send the MCU to Freescale for analysis.

http://www.freescale.com/about/technology-leadership/quality/problem-resolution-process:QUALITY_PROB....


I hope it helps you.

Have a great day,
RadekS


0 Kudos

1,163 Views
shis
Contributor I

Hi Radek,

It may be impossible to deliver the code and schematic to you for reasons of confidentiality. Sorry, and thank you for your sincere suggestion!

Although more boards have problems with EEE, they have all been in use for more than a year. So, supposing it is due to a design defect, what might cause this behavior? I also swapped the CPUs between boards: the bad CPU is still bad, and the good one remains good.

0 Kudos