ECC double bit error for EEPROM

charudattaingal · ‎06-20-2017

An ECC double bit error is set for some EEPROM locations, and because of that the S12ZVL micro generates a machine exception (Interrupt 5).

How can I confirm that this ECC double bit error is set for EEPROM memory?

I didn't see any flag for this in the S12ZVL manual.

Where can I see the corrupt address location of EEPROM?

I didn't see any register in the S12ZVL manual that holds the address of the memory location that caused the ECC double bit error for EEPROM/Flash.

I want to catch the machine exception generated for an ECC double bit fault while reading from EEPROM/Flash memory.

In the machine exception handler I want to correct the corrupt location.

Requesting assistance for the same.

RadekS · ‎09-25-2019

Hi Charudatta,

This thread is quite old, but I would like to share with you and other community users some workaround code for your original second question:

Where can I see the corrupt address location of EEPROM?

See the document with example code: S12Z machine exception caused by ECC issue – address detection

Best regards

Radek

kef2 · ‎04-06-2018

Hi

I’ve just started with S12Z, and the machine exception on an uncorrectable EEPROM ECC error looked very suspicious. The standard approach of rebooting on a machine exception could be fine for RAM or for damaged code in flash, but it is very wrong for EEPROM! Yes, it’s bad when you can’t trust the data you read, but what happens when you reboot? You read the same broken data and just enter a boot loop. It is quite unlikely that a double ECC failure occurs because a write to EEPROM and a power outage happen simultaneously, but the chances of this are not 0.0%! Then what? RAM ECC reports the address of the failure. EEPROM doesn’t. There’s no flash module command to check EEPROM integrity without triggering a machine exception! The FSTAT.MGSTATx bits are useless for this. To solve the problem, proper code should somehow register the EEPROM address it is going to read, and only then read the EEPROM. Then, on a machine exception, if MMCECH.TGT==3, erase the faulty registered sector of EEPROM, and only then reboot or find a way to return. Is there a better approach to this very big problem?
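The "register the address, then read" idea can be sketched roughly as below. This is only a host-side simulation under stated assumptions, not the example code referred to in this thread: the EEPROM is a plain array, the 4-byte sector size is assumed, and every name (`ee_read_byte`, `ee_pending_addr`, `ee_on_machine_exception`, and so on) is hypothetical. The only value taken from the discussion is MMCECH.TGT==3 for EEPROM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names are hypothetical. The EEPROM is a plain array so that the
   sketch runs on a host; on target these would map to the real EEPROM
   and to the flash driver's erase-sector command. */
#define EE_SIZE        128u
#define EE_SECTOR_SIZE 4u        /* assumed sector size in bytes */
#define EE_NO_ACCESS   0xFFFFu   /* marker: no EEPROM read in flight */
#define TGT_EEPROM     3u        /* MMCECH.TGT value named in the thread */

static uint8_t ee_mem[EE_SIZE];

/* On target this variable should live in RAM that is not re-initialized
   before the exception handler has inspected it. */
static volatile uint16_t ee_pending_addr = EE_NO_ACCESS;

/* Register the address first, then read, then clear the registration. */
uint8_t ee_read_byte(uint16_t addr)
{
    uint8_t value;
    assert(addr < EE_SIZE);
    ee_pending_addr = addr;          /* stored before the access */
    value = ee_mem[addr];            /* on target: this read may trap */
    ee_pending_addr = EE_NO_ACCESS;  /* read completed without a trap */
    return value;
}

/* Simulated sector erase: fills the containing sector with 0xFF. */
static void ee_erase_sector(uint16_t addr)
{
    uint16_t base = (uint16_t)(addr - (addr % EE_SECTOR_SIZE));
    memset(&ee_mem[base], 0xFF, EE_SECTOR_SIZE);
}

/* Body of a machine exception handler; `tgt` stands in for MMCECH.TGT.
   Returns true if a registered EEPROM sector was erased. */
bool ee_on_machine_exception(uint8_t tgt)
{
    uint16_t addr = ee_pending_addr;
    if (tgt != TGT_EEPROM || addr == EE_NO_ACCESS) {
        return false;                /* not an EEPROM read we tracked */
    }
    ee_erase_sector(addr);
    ee_pending_addr = EE_NO_ACCESS;
    return true;                     /* caller then resets or restarts */
}
```

On real hardware the handler cannot simply return, so after the erase it would still have to reset the MCU or jump to a restart point.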

It would be best if triggering of the machine exception were configurable on/off for each ECC-protected resource! Some applications may need to use flash for data storage. Registering each access to a big array in flash would be a big performance drop…

BTW, FERSTAT.DFDF has to trigger a machine exception as well, but it doesn't! FERSTAT.DFDF is useless. How should a developer test the problem without the forbidden programming of locations that are not erased?

Regards

Edward

RadekS · ‎04-06-2018

Hi Edward,

Thank you for your opinion and ideas.

You are right, an MCU reset is not a solution to the P-Flash or EEPROM ECC issue itself. It is just one of the ways to leave the machine exception “ISR”.

You are right, the flash controller does not have a specific command for checking integrity by ECC without a machine exception. The ECC checksum is checked automatically during Flash/EEPROM reads and also during some Flash/EEPROM commands (signaled by the FSTAT.MGSTATx bits).

As you correctly mentioned, proper code should somehow register the EEPROM address. I used EEPROM_Read_Byte()/EEPROM_Read_Word() functions in my example code, which fulfil this condition (the address is a function parameter). In case of any issue, the volatile/static keywords may be used so that the compiler/linker places this variable in a specific memory area.

The example code does not solve data recovery itself. Erasing the faulty EEPROM word is one of the ways to “solve” the EEPROM ECC issue. The solution to an ECC issue is strongly application dependent, therefore I didn’t include it in the example code. In some cases, erasing the variable may be enough; sometimes recovery from a backup may be the correct way; in other cases just reporting a critical fault without a repair attempt is the best solution. It strongly depends on the application requirements.

Yes, I agree that making the machine exception configurable on/off for each ECC-protected resource may be a good feature; however, I am not sure whether it fits the original MCU security requirements. Anyway, thank you for a good idea.

FERSTAT.DFDF is the Double Bit Fault Detect Flag. I guess that you rather mean FCNFG.FDFD, Force Double Bit Fault Detect. When FDFD is set, any Flash array read operation will force the DFDF flag in the FERSTAT register to be set, but it will not cause a machine exception. Forcing the DFDF status bit by setting FDFD affects only the DFDF status bit value and does not result in an invalid access. Since the double bit ECC issue is only one of the possible machine exception sources, FERSTAT.DFDF is not useless.

Since it has been some time since I worked in this area, I will have to check your concerns about the FERSTAT.DFDF bit on hardware. Could you please provide more details related to this topic?

Best regards

Radek

kef2 · ‎04-06-2018

Hi Radek,

The example code does not solve data recovery itself. Erasing the faulty EEPROM word is one of the ways to “solve” the EEPROM ECC issue. The solution to an ECC issue is strongly application dependent, therefore I didn’t include it in the example code. In some cases, erasing the variable may be enough; sometimes recovery from a backup may be the correct way; in other cases just reporting a critical fault without a repair attempt is the best solution. It strongly depends on the application requirements.

At least an EEPROM sector erase on machine exception would fix a potential boot loop instead of leaving the unit dead. Yes, not all data is equally critical. If it's some collected knowledge, then it can easily be collected again. If it's some critical constant, then well, a dead unit makes sense. But if it's a constant, then it is not likely to be damaged by power loss during EEPROM erase/program.

Yes, I meant FCNFG.FDFD. The idea of this bit in FCNFG is, as I see it, to simulate an ECC failure to validate the error handler. It worked on older MCUs, which weren't triggering a machine exception on FDFD. IMO this bit is useless in S12Z. It should either trigger an exception, as a real double bit failure would (the question is for which flash/EEPROM address; perhaps any), or it can be removed because it no longer helps to simulate a real-life situation. Making the machine exception maskable would put life back into FCNFG.FDFD.

It's so good that S12ZVCA includes EEPROM! Simulating it in flash, which I did many times without problems on different MCUs, could be tricky because of the nonmaskable machine exception! I think I'd use an external EEPROM.

Could you please provide more details related to this topic?

No. A solution is found, and I need to cope with other tasks. Thanks

Edward

RadekS · ‎04-06-2018

Hi Edward,

Thanks for the clarification.

Yes, an EEPROM sector erase on machine exception would fix a potential boot loop, but without knowing the meaning of the EEPROM data it is useless and potentially dangerous from my point of view. Some of our example code is used by customers without proper analysis. If I used automatic data erasing in the example code, I am convinced that sooner or later some customer would come with a question about unwanted erased EEPROM content… in such a case, back analysis is impossible because the data are already reprogrammed.

So, EEPROM data recovery is a purely application-specific topic and there isn’t any universal solution that we can recommend.

Well, you are right. FDFD is useless for testing the machine exception on S12Z derivatives. Therefore, I created this example code.

The biggest value of this example code is in the selected patterns for cumulative writes, which allow simulating defined ECC issue scenarios without direct knowledge of the ECC polynomial.

Using an external serial EEPROM with ECC is an interesting idea. When I looked at one datasheet (the first offered by Google), I was surprised that the ECC runs fully in the background with no error signals. So, you have no idea whether you read correct data, corrected data or damaged data (due to a double bit ECC issue). It is hard to say whether running MCU code with wrong data is better and safer than a boot loop.

Best regards

Radek

kef2 · ‎04-06-2018

Well, remote (CAN, LIN, etc.) diagnostics is still better than a dead unit that has to be replaced for an unknown reason. One may save in the log that EEPROM failed and at which address, keep the node alive, and signal an error code. What would an autopsy reveal if the double error happened due to a write at power loss? Why a machine exception if a simple interrupt would be better? Machine exception due to RAM ECC: OK. Machine exception on instruction fetch from flash/EEPROM: OK. But data fetches from EEPROM/flash should be less restrictive.

Regarding external EEPROM, you may still write checksums at a different address to validate the data later. I meant that in the case of a missing internal EEPROM, instead of fighting the machine exception while implementing simulated EEPROM in flash, I would rather go to external EEPROM. Erasable flash sectors are big, and the chance of losing a sector on an exception is not 0, so every byte or word would need a dedicated flash sector.
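The checksum idea for external EEPROM could look roughly like this. It is a minimal sketch, assuming a CRC-16 (CCITT-FALSE variant: polynomial 0x1021, initial value 0xFFFF) is an acceptable checksum for the application; the thread does not prescribe any particular checksum, and the function names are hypothetical. The CRC would be stored at a different address from the payload and compared after each read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
   no bit reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    assert(data != NULL || len == 0u);
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Compares a payload read back from EEPROM against the CRC that was
   stored separately when the payload was written. */
bool record_is_valid(const uint8_t *payload, size_t len, uint16_t stored_crc)
{
    return crc16_ccitt(payload, len) == stored_crc;
}
```

A failed comparison only says that payload and checksum disagree; it does not tell which of the two was damaged, so the recovery policy remains up to the application.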

Edward

charudattaingal · ‎06-20-2017

By looking at the MMCECH and MMCECL registers I can find out the status and source of the ECC error.

Where can I get the address of the corrupt EEPROM sector so that I can erase it and recover the EEPROM memory?

RadekS · ‎06-20-2017

Hi Charudatta,
As you already found, the MMCEC registers will tell you the source of the machine exception, and the MMCPC registers will tell you the CPU’s program counter value at the time the access violation occurred.
Unfortunately, there is no register that directly holds the address of the corrupted data.
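
Decoding the two error-code bytes might be sketched as follows. The field layout here is an assumption that is not stated in this thread and should be checked against the MMC chapter of the reference manual: ITR and TGT are taken as the upper and lower nibble of MMCECH, ACC and ERR as the upper and lower nibble of MMCECL, and ERR == 2 is taken to mean an uncorrectable ECC error. The TGT value 3 for EEPROM is the one quoted elsewhere in this thread. The type and function names are hypothetical, and the register values are passed in as plain bytes so the sketch runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four 4-bit fields. */
typedef struct {
    uint8_t itr;  /* initiator of the access (assumed MMCECH[7:4]) */
    uint8_t tgt;  /* target of the access    (assumed MMCECH[3:0]) */
    uint8_t acc;  /* access type             (assumed MMCECL[7:4]) */
    uint8_t err;  /* error type              (assumed MMCECL[3:0]) */
} mmc_error_t;

#define MMC_TGT_EEPROM 3u  /* value quoted in the thread */
#define MMC_ERR_ECC    2u  /* assumption: uncorrectable ECC error */

/* Splits the two register bytes into their nibble fields. */
void mmc_decode(uint8_t mmcech, uint8_t mmcecl, mmc_error_t *out)
{
    assert(out != NULL);
    out->itr = (uint8_t)(mmcech >> 4);
    out->tgt = (uint8_t)(mmcech & 0x0Fu);
    out->acc = (uint8_t)(mmcecl >> 4);
    out->err = (uint8_t)(mmcecl & 0x0Fu);
}

/* True when the decoded fields indicate an ECC fault on an EEPROM access. */
bool mmc_is_eeprom_ecc_fault(const mmc_error_t *e)
{
    assert(e != NULL);
    return e->tgt == MMC_TGT_EEPROM && e->err == MMC_ERR_ECC;
}
```

This only identifies the kind of fault; it still does not give the address of the corrupted data.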

I hope it helps you.

Have a great day,
Radek

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

charudattaingal · ‎06-21-2017

Hello Team,

Thanks for quick response.

An ECC double bit error for EEPROM memory will generate a machine exception, and by default this results in a reset.

I want to write my own machine exception handler, which will erase the ECC double bit fault location or the complete sector of EEPROM memory.

As stated in the reference manual, a machine exception causes the CPU not to perform any stack operations, so it is not possible to return to application code by simply using an RTI instruction.

Once the ECC double bit error is set, the device will always generate a reset while reading the corrupted memory location.

I want to handle this situation in the machine exception handler or at initialization.

How can I achieve the above-mentioned task?

It would be good to have example code for this.

Please suggest.

RadekS · ‎06-21-2017

Hi Charudatta,

An MCU reset at the end of the machine exception is just one of the options for leaving it.

You may also leave the machine exception by jumping to some function like _Startup().

The simple example code for S12Z EEPROM is located here:

https://community.nxp.com/docs/DOC-333064

NVM driver:

http://www.nxp.com/assets/downloads/data/en/device-drivers/S12Z_NVM_SSD_v100.exe

So, I can imagine an application where the EEPROM initialization data are stored in P-Flash.

The EEPROM will contain a flag that is set when the EEPROM initialization data are loaded into EEPROM (during the first run). You may simply erase this flag in the machine exception handler and reset the MCU (jump to _Startup()). In that case, the initialization code will check the flag and load the data from flash to EEPROM, as in the first-run case.
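
The flag scheme described above might be sketched like this. It is a host-side simulation only: the EEPROM and the P-Flash defaults are plain arrays, and the addresses, the flag value 0xA5, and all `cfg_` names are hypothetical. On target, the array writes would be replaced by erase and program commands of the NVM driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; arrays stand in for EEPROM and P-Flash. */
#define CFG_EE_SIZE    16u
#define CFG_FLAG_ADDR  0u
#define CFG_DATA_ADDR  4u
#define CFG_DATA_LEN   8u
#define CFG_FLAG_VALID 0xA5u   /* "initialization data loaded" marker */
#define CFG_ERASED     0xFFu   /* value of an erased EEPROM byte */

static const uint8_t cfg_flash_defaults[CFG_DATA_LEN] = {1, 2, 3, 4, 5, 6, 7, 8};
static uint8_t cfg_ee[CFG_EE_SIZE];

/* Called from the machine exception handler before the restart. */
void cfg_invalidate(void)
{
    cfg_ee[CFG_FLAG_ADDR] = CFG_ERASED;   /* on target: erase the flag sector */
}

/* Called during initialization. Returns true if defaults were reloaded. */
bool cfg_init(void)
{
    assert(CFG_DATA_ADDR + CFG_DATA_LEN <= CFG_EE_SIZE);
    if (cfg_ee[CFG_FLAG_ADDR] == CFG_FLAG_VALID) {
        return false;                     /* EEPROM content is trusted */
    }
    /* On target: erase the data sectors first, then program them. */
    memcpy(&cfg_ee[CFG_DATA_ADDR], cfg_flash_defaults, CFG_DATA_LEN);
    cfg_ee[CFG_FLAG_ADDR] = CFG_FLAG_VALID;  /* flag is written last */
    return true;
}
```

Writing the flag last means that a power loss during the reload leaves the flag invalid, so the next start repeats the reload. Note that this restores defaults and discards whatever the application had stored, which may or may not be acceptable.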

I hope it helps you.

Have a great day,
Radek


charudattaingal · ‎06-21-2017

Hello Radek,

Appreciate your quick response.

Is there any way to disable the ECC double bit fault detection algorithm in the S12ZVL micro?

RadekS · ‎06-21-2017

Hi Charudatta,

I am afraid that the ECC double bit fault detection cannot be disabled.

“Each uncorrectable memory corruption, which is detected during a S12ZCPU or ADC access triggers a machine exception.”

I hope it helps you.

Have a great day,
Radek


charudattaingal · ‎07-03-2017

Hello Team,

I want to create an ECC double bit fault for EEPROM.

It will be really helpful for testing the exception handler implemented as discussed above.

Please let me know how I can create an ECC double bit fault for EEPROM.

Best Regards,

Charudatta

RadekS · ‎07-04-2017

Hi Charudatta,

Please look at this example code: https://community.nxp.com/docs/DOC-334381

I hope it helps you.

Best regards

Radek