ECC RAM Fault Injection & ECC RAM Punctual Fault Management

FabioG · 05-17-2024

Hi,

In RM rev 8 it is written:

1) So we enabled the FCCU reaction for unrecoverable ECC faults and disabled the ERM: SM1.DATA_ECC seems to focus on FCCU usage ...

2) When developing the SRAM fault injection, we took a look at the example package S32K3_SAF_1.0.4_EVAL_D2312.exe. The fault injection snippets follow at the end of this post.

We see that, after configuring and injecting a fault into SRAM1 (EMCEM_EIM_CH_1) and triggering it by reading the SRAM1 address 0x2042D000, a Hard Fault exception happens. We have verified this behaviour on target.

a) I can't find the place in the manual that describes this behaviour (Hard Fault). Can you tell me where it is?

b) Does this Hard Fault exception also happen with multibit ECC errors at runtime? Par. 20.5 above talks about an FCCU or ERM interrupt response and NOT about a Hard Fault exception (due to a precise bus fault error). If yes, how can it be compatible with an FCCU reaction? How should we manage it?

2) In SRAM(0,1) fault injection, after eMcem_InjectFault(...SRAM_INJ_CH..), it is necessary to trigger the process by reading from an SRAM address. To trigger a DCACHE or ICACHE fault after injecting into its fault channel, what trigger should I use? I don't have any data address to use as a trigger, so how should I trigger it?

3) On an S32K344 MCU in lockstep mode, should I trigger only the CM0 caches? I suppose that in lockstep mode both the CM0 and CM1 cores act from/to the same memory space, don't they?

Best Regards,

Fabio

#if defined(S32K344) || defined(S32K358)    
    Platform_InstallIrqHandler(ERM_1_IRQn, &ERM_1_ISR, NULL_PTR);
    Platform_SetIrqPriority(ERM_1_IRQn, 0U);
    IntCtrl_Ip_ClearPending(ERM_1_IRQn);
    Platform_SetIrq(ERM_1_IRQn, TRUE);
    
#if defined(S32K344)    
    IP_ERM->CR0 = ERM_CR0_ENCIE0(1);

    if (eMcem_SetupInjectionChannel(EMCEM_EIM_CH_1, 0, 1 ) != (Std_ReturnType)E_OK)
    {
        while (1U);
    }
    /* Once the EIM fault is injected, the STM0 interrupt may be serviced before the read of 0x2042D000 and would then read from the affected memory section ahead of the intentional read, so disable it first. */
    Platform_SetIrq(STM0_IRQn, FALSE);
    if (eMcem_InjectFault(EMCEM_EIM_CH_1) != (Std_ReturnType)E_OK)
    {
        while (1U);
    }
#elif defined(S32K358)
    IP_ERM_0->CR0 = ERM_CR0_ENCIE0(1);

    if (eMcem_SetupInjectionChannel(EMCEM_EIM_2_CH_0, 0, 1 ) != (Std_ReturnType)E_OK)
    {
        while (1U);
    }
    /* Once the EIM fault is injected, the STM0 interrupt may be serviced before the read of 0x2042D000 and would then read from the affected memory section ahead of the intentional read, so disable it first. */
    Platform_SetIrq(STM0_IRQn, FALSE);
    if (eMcem_InjectFault(EMCEM_EIM_2_CH_0) != (Std_ReturnType)E_OK)
    {
        while (1U);
    }
#endif
    
    /* Address 0x2042D000 is in SRAM0 on the S32K358 and in SRAM1 on the S32K344, and is not used by the demo application. */
    u32FaultyEccAddress = (0x2042D000);
    u32Read = *(uint32 *)(u32FaultyEccAddress);
    MCAL_DATA_SYNC_BARRIER();

#if defined(S32K344) 
    /* Clearing EIM_EIMCR_GEIEN in the BusFault handler also disables the EIM interrupt trigger, hence this workaround. */
    if( u32FaultyEccAddress == 0xAA55AA55 )
    {
        u32FaultyEccAddress = 0;
        /*Invoke ERM_IRQ*/
        S32_NVIC->ISPR[(uint32)(ERM_1_IRQn) >> 5U] = (uint32)(1UL << ((uint32)(ERM_1_IRQn) & (uint32)0x1FU));
    }
    

    IP_EIM->EIMCR = EIM_EIMCR_GEIEN(0);
    IP_EIM->EICHEN = EIM_EICHEN_EICH1EN(0); 
    IP_ERM->CR0 = ERM_CR0_ENCIE0(0);
#endif
#if defined(S32K344) 
    eMcem_ClearFaults(EMCEM_EIM_CH_1);

And the Hard Fault / EIM_IRQ snippets:

void HardFault_Handler(void)
{
    /* Check if HardFault is forced */
    if( SCB.HFSR & 0x40000000UL )
    {
        if( 0U != SCB.CFSR.B.BFSR )
        {
            /* BusFault exception occurred while it was masked */
            BusFault_Handler();
        }
        else
        {
            while(1U);
        }
    }
    else
    {
        while(TRUE){};
    }
}
void BusFault_Handler(void)
{
        /* ERM ECC multibit error injected*/
        if (u32FaultyEccAddress == S32_SCB->BFAR)
        {
            /* Clear all error flags in the register (write-one-to-clear). */
            S32_SCB->CFSR = S32_SCB->CFSR;
            S32_SCB->BFAR = 0;
#if defined(S32K344) 
            IP_EIM->EIMCR = EIM_EIMCR_GEIEN(0);
#elif defined(S32K358) || defined(S32K388)
            IP_EIM_2->EIMCR = EIM_EIMCR_GEIEN(0);
#endif
            u32FaultyEccAddress = 0xAA55AA55;

        }
}

void ERM_1_ISR(void)
{
#if defined(S32K358) || defined(S32K388)
    IP_EIM_0->EIMCR = EIM_EIMCR_GEIEN(0);
    IP_EIM_0->EICHEN = EIM_EICHEN_EICH1EN(0);
    IP_ERM_0->CR0 = ERM_CR0_ENCIE0(0);
#endif
    /*Implement code here.*/
}

#ifdef __cplusplus
}

petervlna · 05-20-2024

Hello,

a) I can't find the place in the manual that describes this behavior (Hard Fault). Can you tell me where it is?

This is described in the core reference manual:
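Independently of the manual text, the escalation can be checked from the core's fault status registers. On an Armv7-M core, a bus fault that occurs while the BusFault exception is not enabled is escalated to HardFault, with the FORCED bit set in HFSR and the cause recorded in the BFSR byte of CFSR. The following is only a minimal sketch of that decoding, not code from the SAF package: `classify_hard_fault` and `fault_kind_t` are hypothetical names, and the raw register values are passed in as plain integers so the logic can be checked off-target.

```c
#include <stdint.h>

/* Armv7-M architectural bit positions. */
#define HFSR_FORCED     (UINT32_C(1) << 30)   /* configurable fault escalated to HardFault */
#define CFSR_BFSR_MASK  (UINT32_C(0xFF) << 8) /* BusFault status byte inside CFSR */
#define CFSR_PRECISERR  (UINT32_C(1) << 9)    /* precise data bus error */
#define CFSR_BFARVALID  (UINT32_C(1) << 15)   /* BFAR holds the faulting address */

/* Hypothetical classification used only for this illustration. */
typedef enum {
    FAULT_NOT_ESCALATED_BUS = 0, /* not a forced HardFault, or no BusFault bits set */
    FAULT_BUS_OTHER,             /* escalated BusFault, but not the expected access */
    FAULT_BUS_PRECISE_AT_ADDR    /* precise BusFault at the expected address */
} fault_kind_t;

/* Classify a HardFault from raw HFSR/CFSR/BFAR values (hypothetical helper). */
fault_kind_t classify_hard_fault(uint32_t hfsr, uint32_t cfsr,
                                 uint32_t bfar, uint32_t expected_addr)
{
    if ((hfsr & HFSR_FORCED) == 0U) {
        return FAULT_NOT_ESCALATED_BUS;
    }
    if ((cfsr & CFSR_BFSR_MASK) == 0U) {
        return FAULT_NOT_ESCALATED_BUS;
    }
    if (((cfsr & CFSR_PRECISERR) != 0U) &&
        ((cfsr & CFSR_BFARVALID) != 0U) &&
        (bfar == expected_addr)) {
        return FAULT_BUS_PRECISE_AT_ADDR;
    }
    return FAULT_BUS_OTHER;
}
```

On target, the three register values would be read from the SCB inside the HardFault handler, and `expected_addr` would be the address used to trigger the injected fault (0x2042D000 in the snippet above).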

b) Does this hard fault exception also happen with multibit ECC errors at runtime? Par. 20.5 above talks about an FCCU or ERM interrupt response and NOT about a hard fault exception (due to a precise bus fault error). If yes, how can it be compatible with an FCCU reaction? How should we manage it?

It can be triggered by a multibit ECC error. I also expect you have some other cause for the hard fault, such as reset escalation due to ECC. Make sure that you execute the ECC error injection only once and stop after reset, to prevent cyclic behavior.
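One way to run the injection only once is a marker that is checked and set before injecting. This is just a sketch under assumptions: `ecc_injection_claim` and `INJECT_DONE_MAGIC` are hypothetical names, and the marker is assumed to live in memory that survives the reset in question and is not the RAM block under test (for example a no-init section in a different RAM).

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary marker value (hypothetical); any unlikely bit pattern works. */
#define INJECT_DONE_MAGIC UINT32_C(0x1E77C0DE)

/*
 * Returns true the first time it is called for a given marker and false
 * afterwards. The marker is written BEFORE the caller injects the fault,
 * so a reset caused by the injection still leaves it set.
 */
bool ecc_injection_claim(volatile uint32_t *marker)
{
    if (*marker == INJECT_DONE_MAGIC) {
        return false; /* already injected once: skip to avoid a reset loop */
    }
    *marker = INJECT_DONE_MAGIC;
    return true;
}
```

The injection sequence would then be wrapped in `if (ecc_injection_claim(&marker)) { ... }`. After a power-on reset the marker content is undefined, so a real test would also need a defined way to clear it.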

2) In SRAM(0,1) fault injection, after eMcem_InjectFault(...SRAM_INJ_CH..), it is necessary to trigger the process by reading from an SRAM address.

Yes, as the ECC reporting is active on reads.

To trigger a DCACHE or ICACHE fault after injecting into its fault channel, what trigger should I use? I don't have any data address to use as a trigger, so how should I trigger it?

Any read from the cache will trigger ECC if the data is corrupt.

3) On an S32K344 MCU in lockstep mode, should I trigger only the CM0 caches? I suppose that in lockstep mode both the CM0 and CM1 cores act from/to the same memory space, don't they?

Good question. I expect that only one would be enough, since if there is a mismatch the RCCU will be triggered.

Best regards,

Peter

FabioG · 05-21-2024

A) I asked:

"To trigger a DCACHE or ICACHE fault after injecting into its fault channel, what trigger should I use? I don't have any data address to use as a trigger, so how should I trigger it?"

1) You replied:

"Any read from the cache will trigger ECC if the data is corrupt."

My new question:

Sorry, but I don't think I know how to read from the cache! I think I can write only to SRAM or TCM, but the cache is managed by the MCU in a transparent way... Probably I am missing something... How can I write to the cache?

B) If the ECC multibit fault is managed by the FCCU, should I skip any related handling in the Hard Fault exception caused by the same multibit error?

Best Regards,

Fabio

petervlna · 05-23-2024

Hello,

Sorry, but I don't think I know how to read from the cache! I think I can write only to SRAM or TCM, but the cache is managed by the MCU in a transparent way... Probably I am missing something... How can I write to the cache?

As with any cache, the read is done by the core. So you simply make the core read from the cache by executing the same instruction multiple times, for example in a while(1) or for(;;) loop.
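For the data side, the same idea can be sketched as a loop that loads the same address several times. This is only an illustration, and `read_repeatedly` is a hypothetical helper: it assumes the address is in a cacheable region with the D-cache enabled, so that the first load can allocate the line and later loads can be served from the cache. Whether a given load actually hits the cache depends on the MPU and cache configuration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Load the same word 'count' times and return the last value read
 * (0 if count is 0). The volatile qualifier keeps the compiler from
 * collapsing the loop into a single load.
 */
uint32_t read_repeatedly(const volatile uint32_t *addr, size_t count)
{
    uint32_t last = 0U;
    for (size_t i = 0U; i < count; ++i) {
        last = *addr;
    }
    return last;
}
```

For the instruction cache no explicit read is needed: the loop body itself is fetched repeatedly while it runs.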

B) If the ECC multibit fault is managed by the FCCU, should I skip any related handling in the Hard Fault exception caused by the same multibit error?

This is application specific. It depends on how you wish to handle the fault.

Some applications require redundancy, and some do not.

Best regards,

Peter