S32K116 EIM non-correctable error clears SRAM and causes program crash

912 Views
Helloyt
Contributor II

Hi,

I tried to test the behavior of the ERM after a double-bit (non-correctable) ECC error occurs. To my understanding, on the S32K116 a double-bit ECC error first triggers the HardFault_Handler, and the user can add code to the HardFault_Handler to handle the error flag in the ERM.

But when I injected a double-bit ECC error into SRAM_U, the SRAM was directly cleared and the vector table in SRAM was reset to 0x00000000. After the HardFault_Handler was triggered, the PC (core register) jumped to 0x00000000 to fetch code, and then the program crashed.

As you know, the stack is located in the S32K116's SRAM_U, so the double-bit ECC error is first triggered in EIM_DRV_Init(). The following is the debug info before enabling the EIM module:

  • pic1

Helloyt_3-1689907127129.png

I redefined NMI_Handler and HardFault_Handler in the following code, so their function addresses in SRAM are no longer the default value (WDG_Handler, 0x46d):

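/* Forward declaration: Fault_Handler is installed before it is defined. */
void Fault_Handler(void);

void ECC_check_init(void)
{
    /* Configure the ERM so ECC events are reported. */
    ERM_DRV_Init(INST_ERM1, ERM_CHANNEL_COUNT0, erm1_InitConfig0);

    /* Route the ERM fault interrupt to the ECC error callback. */
    INT_SYS_EnableIRQ(ERM_fault_IRQn);
    INT_SYS_InstallHandler(ERM_fault_IRQn,erm_error_cbk,NULL);

    /* Replace the default core exception handlers in the RAM vector table. */
    INT_SYS_InstallHandler(NonMaskableInt_IRQn,Fault_Handler,NULL);
    INT_SYS_InstallHandler(HardFault_IRQn,Fault_Handler,NULL);
    INT_SYS_InstallHandler(SVCall_IRQn,Fault_Handler,NULL);
    INT_SYS_InstallHandler(PendSV_IRQn,Fault_Handler,NULL);
}

/* Trap here so the fault can be inspected with the debugger. */
void Fault_Handler(void)
{
    while(1);
}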

  • pic2

Helloyt_4-1689907269437.png

 

To describe the problem in detail, the view has to be switched to disassembly at the line just before the double-bit ECC error is triggered, as shown in pic2 (the program also crashes when no breakpoints are set).

The SRAM was cleared and the xPSR register was 0x81000000, which means no exception was active. At the same time (or before the SRAM was cleared), ERM NCE0 was set to 1.

  • pic3

Helloyt_5-1689907431501.png
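
The xPSR readings quoted in this post can be decoded by hand. The following is a minimal sketch, assuming the ARMv6-M layout used by the Cortex-M0+ core, where the active exception number (IPSR) sits in the low 6 bits of xPSR. The function names are hypothetical and only illustrate the decoding; this is not code from the original project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: ARMv6-M (Cortex-M0+), where IPSR is xPSR[5:0]. */
#define XPSR_EXCEPTION_MASK 0x3Fu

/* Return the active exception number encoded in an xPSR value. */
uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_EXCEPTION_MASK;
}

/* Map an exception number to a name (ARMv6-M system exceptions only). */
const char *exception_name(uint32_t n)
{
    switch (n) {
    case 0u:  return "Thread mode";
    case 2u:  return "NMI";
    case 3u:  return "HardFault";
    case 11u: return "SVCall";
    case 14u: return "PendSV";
    case 15u: return "SysTick";
    default:  return (n >= 16u) ? "External interrupt" : "Reserved";
    }
}

/* Self-check with the two xPSR values reported in the thread. */
void check_xpsr_values_from_thread(void)
{
    assert(xpsr_exception_number(0x81000000u) == 0u);
    assert(strcmp(exception_name(0u), "Thread mode") == 0);
    assert(xpsr_exception_number(0x81000003u) == 3u);
    assert(strcmp(exception_name(3u), "HardFault") == 0);
}
```

With this decoding, 0x81000000 corresponds to Thread mode (no active exception), while a low field of 3 would indicate that HardFault is active.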

Then, continuing to single-step, xPSR became 0x81000003. It seems a HardFault exception was triggered and the program crashed.

I guess that after xPSR changed to 0x81000003, the core went to SRAM address 0x2000000C to fetch the entry of the HardFault_Handler, but as you can see, the SRAM had been cleared for some reason. In the end, the program crashed because of the invalid address.
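
The address guessed above follows from how a Cortex-M core locates a handler: vector table base plus four bytes per exception number. Below is a small sketch of that arithmetic, assuming the vector table base (VTOR) is 0x20000000 as implied by the post. The helper names are hypothetical. It also checks the Thumb bit, since a Cortex-M vector must have bit 0 set, which a cleared entry of 0x00000000 does not.

```c
#include <assert.h>
#include <stdint.h>

/* Each vector table entry is one 32-bit word. */
#define VECTOR_ENTRY_SIZE 4u

/* Address of the vector table entry for a given exception number. */
uint32_t vector_entry_address(uint32_t vtor, uint32_t exception_number)
{
    return vtor + VECTOR_ENTRY_SIZE * exception_number;
}

/* A Cortex-M vector must have bit 0 set (Thumb state).
 * A cleared entry (0x00000000) therefore cannot be a valid handler. */
int vector_is_plausible(uint32_t vector)
{
    return (vector & 1u) != 0u;
}

/* Self-check with the values mentioned in the thread.
 * Assumption: VTOR = 0x20000000 (vector table copied to SRAM_U). */
void check_hardfault_vector_example(void)
{
    assert(vector_entry_address(0x20000000u, 3u) == 0x2000000Cu);
    assert(!vector_is_plausible(0x00000000u)); /* cleared SRAM entry   */
    assert(vector_is_plausible(0x0000046Du));  /* default handler addr */
}
```

HardFault is exception number 3, so its entry sits at offset 0x0C from the table base, which matches the 0x2000000C address in the post.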

So, what caused the SRAM to be cleared, and what should I do to handle a double-bit error in the hard fault handler?

1 Solution
833 Views
danielmartynek
NXP TechSupport
NXP TechSupport

Hi @Helloyt,

When the MCU detects a non-correctable ECC error, it fetches the HardFault_Handler vector (or BusFault_Handler if enabled).

But because the vector is in SRAM_U, it detects another non-correctable ECC error during the vector fetching.

And this escalates the hard fault exception to Core Lockup.

 

Can you leave the vector table in the flash?

 

Declare the symbol __flash_vector_table__ in the startup.h file, and define it in the linker file.

danielmartynek_0-1690272393492.png

 

void HardFault_Handler(void){
   while(1){}
}

 

 

 

Regards,

Daniel

 


6 Replies

831 Views
Helloyt
Contributor II

That's amazing! The code runs well!

Thanks for your patience and wisdom.

Helloyt_0-1690274553151.png

Helloyt_1-1690274588315.png

 

 

854 Views
danielmartynek
NXP TechSupport
NXP TechSupport

Hi @Helloyt,

Thank you for the detailed description.

I understand that VTOR points to the SRAM_U region. Therefore, once the EIM is enabled on SRAM_U and a non-correctable error is injected (by reading SRAM_U data or unstacking), the core fetches the HardFault_Handler vector from SRAM_U, and this causes Core Lockup, which is a system reset source on the MCU.

Can you read the RCM_SRS[LOCKUP] flag?
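
Checking the flag amounts to testing one bit of the reset status register after the MCU restarts. The sketch below is only an illustration: it assumes LOCKUP is bit 9 of the System Reset Status Register, which should be confirmed against the S32K1xx reference manual or the device header. The macro and function names are hypothetical. The register value is passed in as a parameter so the logic can be tried on a host; on the target it would be read from the RCM peripheral.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: LOCKUP is bit 9 of the System Reset Status Register.
 * Verify against the reference manual / device header before use. */
#define SRS_LOCKUP_BIT_ASSUMED (1u << 9)

/* Return nonzero if the given reset status value reports a core lockup reset. */
int reset_caused_by_lockup(uint32_t srs_value)
{
    return (srs_value & SRS_LOCKUP_BIT_ASSUMED) != 0u;
}

/* Self-check with made-up register values (not taken from the thread). */
void check_lockup_decode_example(void)
{
    assert(reset_caused_by_lockup(0x00000200u));  /* only bit 9 set   */
    assert(!reset_caused_by_lockup(0x00000082u)); /* bit 9 not set    */
}
```

Reading the value early in startup, before anything else can trigger another reset, keeps the reported cause tied to the reset being investigated.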

 

Regards,

Daniel

840 Views
Helloyt
Contributor II

Hi @danielmartynek,

I think you mean monitoring the LOCKUP flag in the System Reset Status Register (RCM_SRS). The following pictures show the change in RCM_SRS[LOCKUP]:

  • pic1: before the ECC error occurs

Helloyt_1-1690252660894.png

  • pic2: a reset occurred

Helloyt_2-1690252863166.png

 

Picture 2 shows that the core locked up, but I am not sure whether the NULL vector caused the core lockup, or whether the double-bit ECC error caused the core lockup together with the SRAM being cleared.

P.S. I found that this time the ECC error was triggered after the EIM global enable, instead of after the EIM channel enable as it was yesterday, with only small code changes.

 

893 Views
danielmartynek
NXP TechSupport
NXP TechSupport

Hi @Helloyt,

Can you post the images in a better resolution?

 

Thank you,

BR, Daniel

864 Views
Helloyt
Contributor II

Hi @danielmartynek, the pictures above are shown here at a better resolution; please refer to the problem description above for details.

  • pic1: before injecting the double-bit ECC error

Helloyt_0-1690161765181.png

  • pic2: jump into EIM_DRV_Init()

Helloyt_1-1690162022962.png

  • pic3: switch the view to assembly code while monitoring the SRAM content

Helloyt_2-1690162266612.png

  • pic4: the double-bit ECC error occurs; the SRAM has been cleared and the hard fault exception has not yet been triggered

Helloyt_0-1690162859967.png

  • pic5: the hard fault is triggered, but the SRAM content (vector address) is NULL, so the program crashes

Helloyt_1-1690163106315.png

 

Normally, to reflect the real register behavior, breakpoints should not be set along the way, so the two pictures below show the same program without breakpoints in the middle of the process.

  • pic1: before injecting the double-bit ECC error

Helloyt_0-1690161765181.png

  • pic2: the MCU resets and the program runs to the same breakpoint

Helloyt_0-1690161765181.png
