S32K116 EIM non-correctable error clears SRAM and causes program crash

Helloyt · ‎07-20-2023

Hi,

I am testing how the ERM behaves after a double-bit (non-correctable) ECC error occurs. To my understanding, on the S32K116 a double-bit ECC error first triggers the HardFault_Handler, and the user can add code to the HardFault_Handler to handle the error flag in the ERM.

But when I injected a double-bit ECC error into SRAM_U, the SRAM was directly cleared, with the vector table in SRAM reset to 0x00000000. After the hard fault was triggered, the PC (core register) jumped to 0x00000000 to fetch code, and then the program crashed.

As you know, the stack is located in the S32K116's SRAM_U, therefore the double-bit ECC error is triggered first in EIM_DRV_Init(). The following is the debug info before enabling the EIM module:

pic1

I redefined NMI_Handler and HardFault_Handler in the following code, so their function addresses in SRAM are no longer the default value (WDG_Handler, 0x46d):

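/* Forward declaration: Fault_Handler is referenced before it is defined. */
void Fault_Handler(void);

void ECC_check_init(void)
{
    ERM_DRV_Init(INST_ERM1, ERM_CHANNEL_COUNT0, erm1_InitConfig0);

    /* Install the ERM handler (erm_error_cbk, defined elsewhere) before enabling its interrupt. */
    INT_SYS_InstallHandler(ERM_fault_IRQn, erm_error_cbk, NULL);
    INT_SYS_EnableIRQ(ERM_fault_IRQn);

    /* Route the core exceptions to one common handler; this writes the vector table in SRAM. */
    INT_SYS_InstallHandler(NonMaskableInt_IRQn, Fault_Handler, NULL);
    INT_SYS_InstallHandler(HardFault_IRQn, Fault_Handler, NULL);
    INT_SYS_InstallHandler(SVCall_IRQn, Fault_Handler, NULL);
    INT_SYS_InstallHandler(PendSV_IRQn, Fault_Handler, NULL);
}

/* Park the core so the fault state can be inspected with the debugger. */
void Fault_Handler(void)
{
    while (1) { }
}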

pic2

To describe the problem in detail: the code line just before the ECC double-bit error is triggered has to be viewed in disassembly, as you can see in pic2 (the program also crashes when no breakpoints are set).

The SRAM was cleared and the xPSR register was 0x81000000, which means no interrupt had been triggered. At the same time (or before the SRAM was cleared), the ERM NCE0 flag was set to 1.

pic3

Then, continuing to single-step, xPSR == 0x81000003. It seems a HardFault exception was triggered and the program crashed.

I guess that after xPSR changes to 0x81000003, the core goes to SRAM address 0x2000000c to fetch the entry address of the HardFault_Handler, but as you can see, the SRAM was cleared for some reason. In the end, the program crashed because of the invalid address.
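
For reference, the arithmetic behind that guess can be sketched in C. This is only an illustration under assumptions: the helper names are hypothetical (they are not SDK functions), the vector table base is assumed to be 0x20000000 (the start of SRAM_U), and the exception number is taken from the low 6 bits of xPSR (the IPSR field on Armv6-M).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the active exception number (IPSR)
 * from an xPSR value. On Armv6-M the IPSR field is bits [5:0];
 * 0 means thread mode, 3 means HardFault. */
static uint32_t active_exception(uint32_t xpsr)
{
    return xpsr & 0x3Fu;
}

/* Hypothetical helper: address of the vector table entry that the
 * core reads for a given exception. Each entry is one 32-bit word. */
static uint32_t vector_address(uint32_t vtor, uint32_t exception_number)
{
    /* Cortex-M0+ has at most 16 system exceptions + 32 interrupts. */
    assert(exception_number < 48u);
    return vtor + 4u * exception_number;
}
```

With xPSR = 0x81000003, active_exception() gives 3 (HardFault), and vector_address(0x20000000u, 3u) gives 0x2000000C, which matches the SRAM address watched in the debugger. With xPSR = 0x81000000 the result is 0, i.e. no exception is active.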

So, what caused the SRAM to be cleared, and what should I do to handle a double-bit error in the HardFault handler?

danielmartynek · ‎07-25-2023

Hi @Helloyt,

When the MCU detects a non-correctable ECC error, it fetches the HardFault_Handler vector (or BusFault_Handler if enabled).

But because the vector is in SRAM_U, it detects another non-correctable ECC error during the vector fetching.

And this escalates the hard fault exception to Core Lockup.

Can you leave the vector table in the flash?

Declare symbol

__flash_vector_table__

in the startup.h file.

And define in the linker file.

void HardFault_Handler(void){
   while(1){}
}

Regards,

Daniel

Helloyt · ‎07-25-2023

That's amazing! The code runs well!

Thanks for your patience and wisdom.

danielmartynek · ‎07-24-2023

Hi @Helloyt,

Thank you for the detailed description.

I understand that VTOR is in the SRAM_U region. Therefore, once the EIM is enabled on SRAM_U and a non-correctable error is injected (by reading SRAM_U data or unstacking), it fetches the HardFault_Handler from SRAM_U and this causes Core Lockup which is a system reset source on the MCU.

Can you read the RCM_SRS[LOCKUP] flag?
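
A minimal sketch of such a check in C, operating on a raw copy of the register value. The names below are hypothetical, and the bit position (bit 9) is an assumption that should be verified against the RCM_SRS_LOCKUP_MASK definition in the device header or the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: LOCKUP is bit 9 of RCM_SRS on this device.
 * Verify against the device header before relying on it. */
#define SRS_LOCKUP_BIT   9u
#define SRS_LOCKUP_FLAG  (1u << SRS_LOCKUP_BIT)

/* Hypothetical helper: returns true if the given RCM_SRS value
 * reports that the last reset was caused by a core lockup. */
static bool reset_was_core_lockup(uint32_t srs_value)
{
    return (srs_value & SRS_LOCKUP_FLAG) != 0u;
}
```

On the target, the argument would be the value read from the RCM SRS register early after reset (for example RCM->SRS, if the device header exposes it that way), before anything else overwrites the information of interest.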

Regards,

Daniel

Helloyt · ‎07-24-2023

Hi @danielmartynek,

I think you mean monitoring the LOCKUP flag in the System Reset Status Register (RCM_SRS). The following pictures show the change of RCM_SRS[LOCKUP]:

pic1: before the ECC error occurs

pic2: a reset occurred

Picture 2 shows that the core locked up, but I am not sure whether the NULL vector caused the core lockup, or whether the double-bit ECC error caused the core lockup together with the SRAM being cleared.

P.S. I found that this time the ECC error was triggered after the EIM global enable, instead of after the EIM channel enable as it was yesterday, with only a small code change.

danielmartynek · ‎07-21-2023

Hi @Helloyt,

Can you post the images in a better resolution?

Thank you,

BR, Daniel

Helloyt · ‎07-23-2023

Hi @danielmartynek, the pictures from above are shown more clearly here; please see the problem description above for details.

pic1: before injecting the double-bit ECC error

pic2: jump into EIM_DRV_Init()

pic3: switch to the assembly view while monitoring the SRAM content

pic4: the double-bit ECC error occurs; the SRAM has been cleared and the HardFault interrupt has not yet been triggered

pic5: the HardFault is triggered, but the SRAM content (vector address) is NULL, thus the program crashed

Usually, to reflect the real register behavior, breakpoints should not be set while stepping, so the 2 pictures below show the same program without breakpoints in the middle of the process:

pic1: before injecting the double-bit ECC error

pic2: the MCU resets and the program runs to the same breakpoint