RAM/FLASH ECC Error handling

rajivbandodkar
Contributor II

Hi,

I am using the S12ZVL32 controller with ECC enabled for Flash (both EEPROM and program Flash) and for RAM.

RAM ECC single bit fault: As I understand it, the RAM content is corrected automatically by the ECC logic; please correct me if I am wrong. In that case, why would I need to enable the single bit fault interrupt for RAM ECC? How should this fault be handled?

RAM ECC double bit fault: There is no interrupt for a double bit fault, so how do I detect one other than by polling the flag register? How should this fault be handled?

EEPROM/Flash single bit fault: As I understand it, the faulty location itself is not corrected by the ECC logic, although the data read is correct. Can I find out which location or sector is faulty? For an EEPROM fault, the affected sector (or the whole EEPROM) can be erased and programmed again. For Flash, is there any way to recover from the fault? How should this fault be handled?

EEPROM/Flash double bit fault: There is no interrupt for a double bit fault, so how do I detect one other than by polling the flag register? How should this fault be handled?

Rajiv Bandodkar.

RadekS
NXP Employee

Hi Rajiv,

RAM ECC single bit fault:

The ECC check is performed only during a read access and during a non-aligned memory write access.

So a write to address 0x1000 (aligned access) does not check ECC; it just writes the new value together with its new ECC. A write to address 0x1001 (non-aligned access) does perform an ECC check, because it is executed as a read-modify-write operation. Any read operation on RAM performs an ECC check.

According to RM:
“If a single bit ECC error was detected, then the SBEEIF flag is set.”, “If the logic detects a single bit ECC error, then the module corrects the data, so that the access initiator module receives correct data. In parallel, the logic writes the corrected data back to the memory, so that this read access repairs the single bit ECC error. This automatic ECC read repair function is disabled by setting the ECCDRR bit.”

So this automatic repair mechanism is controlled by the ECCDCMD_ECCDRR bit (repair is enabled by default).

Whether you need to enable the single bit ECC fault interrupt depends on your application's safety requirements.

If you enable this interrupt, you should at least clear the ECCIF_SBEEIF flag in the handler. Additionally, you may record the event and/or report the error to a supervising system.
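As a minimal sketch of such a handler, the following host-side C code counts single bit events and clears the flag. The register is simulated with a plain variable so the code can be built without the device header. The bit position of SBEEIF, the write-1-to-clear behavior, and all names (sim_eccif, sbe_event_count, ram_ecc_single_bit_handler) are assumptions for illustration and should be checked against the reference manual.

```c
#include <stdint.h>

/* Assumption: SBEEIF is bit 0 of ECCIF and is cleared by writing 1 to it. */
#define SBEEIF_MASK 0x01u

/* Hypothetical stand-ins: on the target, the device header provides ECCIF. */
volatile uint8_t  sim_eccif;        /* simulated flag register           */
volatile uint16_t sbe_event_count;  /* number of corrected errors seen   */

/* Models a write-1-to-clear register write. */
void sim_write_eccif(uint8_t value)
{
    sim_eccif = (uint8_t)(sim_eccif & (uint8_t)~value);
}

/* Body of the single bit ECC interrupt handler: record, then clear. */
void ram_ecc_single_bit_handler(void)
{
    if (sim_eccif & SBEEIF_MASK) {
        if (sbe_event_count < UINT16_MAX) {
            sbe_event_count++;          /* record the event (saturating) */
        }
        sim_write_eccif(SBEEIF_MASK);   /* clear the flag */
    }
}
```

The counter could later be reported to a supervising system; how and when to report is an application decision.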

EeProm/Flash Single Bit fault:

Yes, the faulty location itself is not corrected by the ECC logic, but the data you read has already been corrected by ECC.

The only way to fix a single bit ECC fault in EEPROM/Flash is to read the affected sector, erase it, and write the data back, so that the data and ECC values are stored correctly again.

In this respect, EEPROM and Flash differ only in sector size and in the slightly different commands used. In either case, at least the critical part of the Flash erase/program operation must be executed from RAM, because the S12ZVL contains just one Flash block. By the critical part, I mean writing 1 to the FSTAT_CCIF bit and then waiting until the command finishes (FSTAT_CCIF is set again). If the interrupt vector table and interrupt routines are in Flash, interrupts must be disabled during this critical part.
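The read, erase, and write-back sequence can be sketched on a host with a simulated memory array. This is only an illustration of the order of operations: the sector size, the array, and the helper names (sim_nvm, sim_erase_sector, sim_program, refresh_sector) are hypothetical, ECC itself is not modeled, and on the target the erase and program steps would be Flash commands launched from RAM as described above.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  8u   /* hypothetical sector size in bytes */
#define SECTOR_COUNT 4u   /* hypothetical number of sectors    */

uint8_t sim_nvm[SECTOR_COUNT * SECTOR_SIZE];   /* simulated EEPROM/Flash */

/* Erase sets every byte of the sector to 0xFF. */
void sim_erase_sector(uint32_t sector)
{
    memset(&sim_nvm[sector * SECTOR_SIZE], 0xFF, SECTOR_SIZE);
}

/* Programming can only clear bits, so it is modeled as a bitwise AND. */
void sim_program(uint32_t offset, const uint8_t *src, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        sim_nvm[offset + i] &= src[i];
    }
}

/* Refresh one sector: read it (corrected data), erase it, write it back.
   Returns 0 on success, -1 for a bad sector index, -2 on verify failure. */
int refresh_sector(uint32_t sector)
{
    uint8_t copy[SECTOR_SIZE];
    uint32_t offset = sector * SECTOR_SIZE;

    if (sector >= SECTOR_COUNT) {
        return -1;
    }
    memcpy(copy, &sim_nvm[offset], SECTOR_SIZE);  /* 1. read into RAM   */
    sim_erase_sector(sector);                     /* 2. erase sector    */
    sim_program(offset, copy, SECTOR_SIZE);       /* 3. write data back */
    return (memcmp(copy, &sim_nvm[offset], SECTOR_SIZE) == 0) ? 0 : -2;
}
```

Holding the copy in RAM means a reset between the erase and the write-back loses the sector content, so a real application may want a backup location for data it cannot regenerate.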

RAM/EeProm/Flash Double Bit faults:

You are right that there isn’t any interrupt enable bit for such faults. However, the double bit ECC faults are uncorrectable errors and they will trigger Machine exception (interrupt vector 5).

The S12ZCPU machine exceptions are triggered upon detection of illegal memory accesses and uncorrectable ECC errors.

Note: The Machine exception is not an interrupt. No stack frame is created for a Machine exception, so simply executing "RTI" (which expects a stack frame) at the end of the Machine exception routine will result in a crash.

The difference is explained here:

http://www.nxp.com/files/microcontrollers/doc/ref_manual/S12ZCPU_RM_V1.pdf

("Chapter 7, Exceptions").

Therefore, the correct recovery action is to reset the MCU (optionally after signaling the error).

Some very simple example code of the Machine exception routine may be found here:

https://community.nxp.com/docs/DOC-330312

I hope it helps you.

Have a great day,
Radek

mihir_rajput
Contributor III

Hi RadekS,

The MM9Z1_638D1 datasheet describes a Double Bit Fault Detect Interrupt Flag (DFDIF) and a Single Bit Fault Detect Interrupt Flag (SFDIF) in the FERSTAT register.

But the FERCNFG register only has a Single Bit Fault Detect Interrupt Enable (SFDIE) bit.

Also, I don't understand the difference between the handling of double bit fault detection and an ECC double bit error.

Do both trigger the Machine exception? Can you shed some light on DFDIF?

Thanks. 

RadekS
NXP Employee

Hi Mihir,

First, "double bit fault detection" and "ECC double bit error" are the same thing. There is no difference.

The ECC checksum check is performed when:

  1. we read data from Flash/EEPROM
  2. we write data into Flash/EEPROM

The ECC state machine can fix a single-bit error while the corrupted data is being read. Since we read data that is already fixed, this is not a really serious issue. Therefore, we may decide whether to enable the interrupt and service single bit ECC errors.

The ECC state machine cannot fix more than a single-bit error (detected as a double-bit ECC error) while the corrupted data is being read. So we read corrupted data, which is a serious issue. The machine exception is triggered, and this cannot be disabled.

Additional feature:

The DFDIF or SFDIF flags may be set intentionally by writing to the FCNFG register, for the purpose of testing the interrupt routine. This works for a single bit ECC error (it invokes the corresponding interrupt routine, if enabled).

The MM9Z1_638D1 datasheet also mentions interrupt generation for the DFDIF flag. For now, I am not sure whether setting the FDFD bit will trigger the machine exception routine. This is not mentioned in other S12Z reference manuals. Unfortunately, I cannot check it right now because I do not have an MM9Z1_638 evaluation board at hand (I am in training this week).

I hope it helps you.

Best regards

Radek

mihir_rajput
Contributor III

I was able to simulate Double Bit ECC error on EEPROM. I took inspiration from your sample code on FlashECC.

But I failed at testing single bit ECC error. Not sure what I am missing here.

Let me know if you have any ideas.

 

void testEEPROMDoubleBitECCError(void)
{
    // (1) Initialise Flash
    uint8_t* destination = (uint8_t*)(TEST_ADDRESS);
    uint32_t destinationU32 = (uint32_t)destination;
    uint32_t status = FlashInit(&test_ssdConfig);
    if (FTM_OK != status)
    {
        return;
    }

    // (2) EnableInterrupts
    EnableInterrupts;

    // (3) Erase EEPROM
    status = EraseSector(&test_ssdConfig, destinationU32, 1, FlashCommandSequence);
    if (FTM_OK != status)
    {
        return;
    }

    // (4) Write EEPROM
    buffer[0] = 0xF4FFFFFFU;
    status = ProgramData(&test_ssdConfig, destinationU32, 4, (unsigned long) buffer, FlashCommandSequence);

    buffer[0] = 0xF8FFFFFFU;
    status = ProgramData(&test_ssdConfig, destinationU32, 4, (unsigned long) buffer, FlashCommandSequence);

    // (5) Read Flash
    __asm(NOP);
    result = *(unsigned int *)TEST_ADDRESS;
    __asm(NOP); // trigger ISR
}

void testEEPROMSingleBitECCError(void)
{
    // (1) Initialise Flash
    uint8_t* destination = (uint8_t*)(TEST_ADDRESS);
    uint32_t destinationU32 = (uint32_t)destination;
    uint32_t status = FlashInit(&test_ssdConfig);
    if (FTM_OK != status)
    {
        return;
    }

    FCNFG_IGNSF = 0;

    // (2) Enable single bit ECC Interrupt
    FERCNFG_SFDIE = 1;

    // (3) EnableInterrupts
    EnableInterrupts;

    // (4) Erase EEPROM
    status = EraseSector(&test_ssdConfig, destinationU32, 1, FlashCommandSequence);
    if (FTM_OK != status)
    {
        return;
    }

    // (5) Write EEPROM again without erase
    buffer[0] = 0xFFFFFFFFU;
    status = ProgramData(&test_ssdConfig, destinationU32, 4, (unsigned long) buffer, FlashCommandSequence);

    buffer[0] = 0xFFFFFFF7U;
    status = ProgramData(&test_ssdConfig, destinationU32, 4, (unsigned long) buffer, FlashCommandSequence);

    buffer[0] = 0xFFFFFFF6U;
    status = ProgramData(&test_ssdConfig, destinationU32, 4, (unsigned long) buffer, FlashCommandSequence);

    // (6) Read Flash at byte offsets 0..3.
    // The offset is parenthesized so that it is added to the address,
    // not to the value that was read.
    __asm(NOP);
    result = *(unsigned int *)(TEST_ADDRESS);
    __asm(NOP);
    result = *(unsigned int *)(TEST_ADDRESS + 1);
    __asm(NOP);
    result = *(unsigned int *)(TEST_ADDRESS + 2);
    __asm(NOP);
    result = *(unsigned int *)(TEST_ADDRESS + 3);
    // trigger ISR
    __asm(NOP);
}

RadekS
NXP Employee

Hi Mihir,

If possible, please use the patterns from the example code.

For a single-bit ECC error in EEPROM, use a cumulative write where the first write into the word is 0xFF00 and the second is 0xFF04.

Why:

Every time we write into EEPROM, the ECC checksum is calculated and written into memory together with the data.

So you cannot pick an arbitrary single bit change in the data value, because the cumulative write also applies to the ECC checksum.

Your case:

0xFFF7: ECC checksum is 0x26

0xFFF6: ECC checksum is 0x35

The final values after the cumulative write are:

0xFFF6 with ECC checksum 0x24

As you can see, the checksum does not fit the data, and the final behavior for various data combinations is rather "random" (no error/single bit error/double bit error).
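The numbers above are consistent with a cumulative write behaving as a bitwise AND of the old and new contents, for the data and for the ECC checksum alike. The sketch below models only that; it assumes programming without an erase can only clear bits, and the type and function names (nvm_word_t, cumulative_write) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one EEPROM word plus its stored ECC checksum. */
typedef struct {
    uint16_t data;
    uint8_t  ecc;
} nvm_word_t;

/* Assumption: writing without a prior erase can only clear bits, so the
   stored result is the bitwise AND of old and new content. The ECC
   checksum cells are subject to the same rule as the data cells. */
nvm_word_t cumulative_write(nvm_word_t stored, nvm_word_t written)
{
    nvm_word_t result;
    result.data = (uint16_t)(stored.data & written.data);
    result.ecc  = (uint8_t)(stored.ecc & written.ecc);
    return result;
}
```

With the values from this case, writing 0xFFF6 (ECC 0x35) over 0xFFF7 (ECC 0x26) leaves data 0xFFF6 with ECC 0x24, which is not the checksum that belongs to 0xFFF6.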

The patterns in the example code were selected carefully.

BTW: writing 0xFFFFFFFFU makes no sense.

EEPROM patterns from example code:

Single bit ECC error for read, single bit ECC error for verify

                Word0   ECC
  first write   FF00    36
  second write  FF04    20
  final state   FF00    20
  fixed value   FF04

Single bit ECC error for read, double bit ECC error for verify

                Word0   ECC
  first write   FFF4    20
  second write  FFFB    29
  final state   FFF0    20
  fixed value   FFF4

Double bit ECC error for read, double bit ECC error for verify

                Word0   ECC
  first write   FFF3    30
  second write  FFFC    39
  final state   FFF0    30

I hope it helps you

Best regards

Radek

janiosimon
Contributor I

Hello Radek,

I would like to understand the ECC calculation. MC9S12ZVLRM.pdf contains this table:

pastedImage_1.png

If I understand it correctly, the ECC can be calculated as:

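/* Select the data bits that contribute to each ECC bit */
ecc[0] = data & 0x443F;
ecc[1] = data & 0x13C7;
ecc[2] = data & 0xE1D1;
ecc[3] = data & 0xEE60;
ecc[4] = data & 0x3E8A;
ecc[5] = data & 0x993C;

/* Reduce each selection to its parity (XOR of all 16 bits) */
for (j = 0; j < 6; j++)
{
    tmp = 0;
    for (i = 0; i < 16; i++)
    {
        tmp ^= ((ecc[j] >> i) & 0x01);
    }
    ecc[j] = tmp;
}

/* Pack the six parity bits into one value */
eccFinal = 0;
for (j = 0; j < 6; j++)
{
    eccFinal |= ecc[j] << j;
}

/* Invert and keep only the six ECC bits */
eccFinal = ~eccFinal & 0x3F;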

I would expect eccFinal to match the NXP ECC values, but these are the results:

  data    ECC (NXP)   eccFinal
  FF00    36          2D
  FF04    20          0E
  FFF4    20          1A
  FFFB    29          1C
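For anyone who wants to reproduce the eccFinal column, here is a self-contained sketch of the same calculation. The six masks are the ones quoted in this post; the function names (parity16, ecc6_from_masks) and the final 6-bit mask are my own choices, and the sketch makes no claim about how the hardware computes the EEPROM checksum.

```c
#include <stdint.h>

/* Bit-selection masks as quoted in the post, one per ECC bit. */
static const uint16_t ECC_MASKS[6] = {
    0x443Fu, 0x13C7u, 0xE1D1u, 0xEE60u, 0x3E8Au, 0x993Cu
};

/* Parity of a 16-bit value: 1 if an odd number of bits are set. */
static uint8_t parity16(uint16_t v)
{
    v ^= (uint16_t)(v >> 8);
    v ^= (uint16_t)(v >> 4);
    v ^= (uint16_t)(v >> 2);
    v ^= (uint16_t)(v >> 1);
    return (uint8_t)(v & 1u);
}

/* Pack the six parities into one value, then invert within 6 bits. */
uint8_t ecc6_from_masks(uint16_t data)
{
    uint8_t ecc = 0;
    for (unsigned j = 0; j < 6u; j++) {
        ecc |= (uint8_t)(parity16((uint16_t)(data & ECC_MASKS[j])) << j);
    }
    return (uint8_t)(~ecc & 0x3Fu);
}
```

For the four data words listed above this returns 0x2D, 0x0E, 0x1A and 0x1C, that is, the eccFinal column and not the NXP column.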

Please correct me if my interpretation is wrong. 

Thank you for answer. 

Jan

RadekS
NXP Employee

Hello Jan,

This is just a small misunderstanding.

The ECC bit table in the RM is for SRAM. The SRAM allows the ECC to be read and written in debug mode.

The EEPROM uses a different table for the ECC checksum calculation.

That table is confidential. The EEPROM/PFLASH ECC cannot be read directly by the user.

I hope it helps you.

Have a great day,
Radek

mihir_rajput
Contributor III

ECC.PNG

Is there a reason why DFDIE was deleted?
