FCCU Recovery & Reaction Types with FS26 SBC

JRodrigues · 10-02-2024

Hello everyone,

I'm working with the FCCU module on the S32K344 microcontroller and integrating it with the FS26. Although this was already discussed here in the forum, I am still having trouble understanding the fault recovery and reaction types.

From the Safety Manual I got this:

R1: Enters the Alarm state and asserts the Alarm interrupt. If the timeout expires, enters the Fault state, indicates the error on EOUT and asserts the NMI "so that the application can attempt to recover the fault"
R2: Enters the Fault state, indicates the error on EOUT and asserts the NMI "so that the application can attempt to recover the fault"
R3: Enters the Fault state, indicates the error on the EOUT pins and gives the chip a functional reset

However, the thread "S32K344 Fault maps, Fault Reactions and SPD eMCEM" suggests that R3 actually gives the chip a power-on reset. So, which assumption is correct?
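
For comparison, the three reaction types as quoted from the Safety Manual above can be written down as a small host-side C model. This is only a sketch of the quoted wording, not FCCU or eMcem driver code: every name in it (`fccu_model_t`, `fccu_model_raise`, `fccu_model_alarm_timeout`, the enums) is hypothetical, R3 is modelled with a functional reset as in the quote, and no NMI is modelled for R3 because the quoted text does not mention one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side model; none of these names exist in the FCCU or
 * eMcem drivers. It mirrors only the three descriptions quoted above. */
typedef enum { REACTION_R1, REACTION_R2, REACTION_R3 } reaction_t;
typedef enum { STATE_NORMAL, STATE_ALARM, STATE_FAULT } fccu_state_t;

typedef struct {
    fccu_state_t state;
    bool alarm_irq;        /* Alarm interrupt asserted */
    bool nmi;              /* Fault-state NMI asserted */
    bool eout_error;       /* EOUT indicates an error */
    bool functional_reset; /* Functional reset requested */
} fccu_model_t;

/* Immediate effect of a fault configured with reaction type r. */
fccu_model_t fccu_model_raise(reaction_t r)
{
    fccu_model_t m = { STATE_NORMAL, false, false, false, false };

    switch (r) {
    case REACTION_R1:
        m.state = STATE_ALARM;
        m.alarm_irq = true;
        break;
    case REACTION_R2:
        m.state = STATE_FAULT;
        m.eout_error = true;
        m.nmi = true;
        break;
    case REACTION_R3:
        m.state = STATE_FAULT;
        m.eout_error = true;
        m.functional_reset = true; /* NMI not mentioned for R3: left false */
        break;
    default:
        assert(0 && "unknown reaction type");
        break;
    }
    return m;
}

/* Alarm timeout expiry: only has an effect while in the Alarm state. */
fccu_model_t fccu_model_alarm_timeout(fccu_model_t m)
{
    if (m.state == STATE_ALARM) {
        m.state = STATE_FAULT;
        m.eout_error = true;
        m.nmi = true;
    }
    return m;
}
```

In this model, R1 followed by an alarm timeout ends in the same signals as R2, and R3 differs from R2 only in requesting a reset instead of asserting the NMI.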

Also, while on this topic, what is the intent of "so that the application can attempt to recover the fault" in the R2 type? I assumed that once the module enters the Fault state it is in a "critical" state, and the objective is to store the fault information and reset the chip (by setting the reaction type to short reset reaction). Am I wrong?

Moreover, in the List of Fault Sources table from AN13323, the first fault groups (NCF[0] to NCF[3]) have Functional Reset as the recommended recovery mechanism. Knowing the source module, I configured them as Hardware-Recoverable with a Short Reset Reaction. Is this correct, or do I need to set any other configuration (Activate EOUT Pins, Enable Fault State NMI, etc.)?

For the NCF[4] fault group, the recommended recovery mechanism is "Interrupt followed by SBC initiated POR recovery initiated in interrupt service routine", so I assume it is either R1 or R2 (depending on the fault configuration) plus R3. My question is: how do I configure that in this table in Design Studio?

Best Regards,

JRodrigues

petervlna · 10-03-2024

Hello,

It seems that R3 actually gives the chip a Power-on-Reset. So, what is the correct assumption?

Indeed, it can be confusing, but if you look at Safety Manual chapters 2.7.8 "The R3 fault-reaction type" and 2.7.9 "Diagram: FCCU fault-reaction type processes", you will see:

So the chip itself cannot issue a power-on reset. It can only signal to the outside world that one is needed; an SBC can then cut power to the chip or perform a power-on reset.

However, the chip itself will also issue a functional reset, as this is what it can control. This covers the case where the SBC is not meant to perform a power-on reset or fails to react.

"so that the application can attempt to recover the fault" in the R2 type?

Some faults are not critical for the system as a whole and only affect a small area of functionality, so the system can keep running.

Think of it like a safe state in a car, where the car is still able to run, but with limited functionality, so you can drive to a service shop and don't need to tow it.

Some faults can be recovered, for example a voltage drop on the ADC caused by a weak power supply, induction, etc. Once the issue fades, the ADC works fine again and you don't need to reset the whole system.

Knowing the Source Module i configured them to be Hardware-Recoverable and the reaction type to be a Short Reset Reaction. Is it correct or do i need to set any other configuration (Activate EOUT Pins, Enable Fault State NMI, etc)?

This is dependent on your application. Each application will have different requirements.

In general, for NCF[0] to NCF[3] you will need at least a reset, as ECC faults in flash require re-programming the flash in order to clear them.

For NCF[4] fault group the recommended recovery mechanism is "Interrupt followed by SBC initiated POR recovery initiated in interrupt service routine", so i assume it is either R1 or R2 (depending on the fault configuration) + R3, but my question is: how do i configure that in this table on Design Studio?

I am not familiar with this in S32DS, but I assume you simply check all the options, as you need EOUT to be active, as well as the Alarm with timeout and the NMI.

So if you are able to solve it in the ALARM interrupt, you recover from there. If not, EOUT signals the issue and the SBC takes action.
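
As a rough illustration of that alarm-then-escalate behaviour, here is a host-side C sketch. It is not driver code and does not use any real FCCU register or eMcem call; the names `alarm_window`, `alarm_outcome_t` and the tick-based timing are assumptions made up for the example.

```c
#include <assert.h>

/* Hypothetical names; a host-side sketch, not FCCU driver code. */
typedef enum {
    OUTCOME_RECOVERED_IN_ALARM, /* handled before the timeout, no EOUT */
    OUTCOME_ESCALATED_TO_EOUT   /* timeout expired, SBC has to react   */
} alarm_outcome_t;

/*
 * fault_duration_ticks: how many ticks the fault source stays active.
 * alarm_timeout_ticks:  length of the alarm window in ticks.
 */
alarm_outcome_t alarm_window(unsigned fault_duration_ticks,
                             unsigned alarm_timeout_ticks)
{
    for (unsigned tick = 0; tick < alarm_timeout_ticks; ++tick) {
        if (tick >= fault_duration_ticks) {
            /* Fault source is gone, so a clear in the alarm handler sticks. */
            return OUTCOME_RECOVERED_IN_ALARM;
        }
        /* Fault still present: a clear would be re-asserted, keep waiting. */
    }
    return OUTCOME_ESCALATED_TO_EOUT;
}

/* Usage: a short glitch is recovered, a long-lasting fault escalates. */
void alarm_window_examples(void)
{
    assert(alarm_window(2, 10) == OUTCOME_RECOVERED_IN_ALARM);
    assert(alarm_window(50, 10) == OUTCOME_ESCALATED_TO_EOUT);
}
```

The point of the sketch is only the ordering: software gets the alarm window to fix things, and the EOUT path to the SBC is the fallback once that window is used up.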

Best regards,

Peter

JRodrigues · 10-03-2024

Hello @petervlna,

Thank you for your quick reply!

However the chip itself will also issue functional reset, as this is what it can control. It is for the case where the SBC is not desired to do power on reset or will fail to react.

I have a question on this. The FS26 has a "microcontroller recovery strategy": when a fault is indicated through the EOUT pins, it opens the watchdog window to perform a fault recovery strategy, and the goal is to "avoid resetting the microcontroller while it is trying to recover the application from a failure event". So, in this case, what type of reaction is this? R1, R2 or R3?

I thought of indicating the error through the EOUT pins, setting the reaction to a short reset and expecting a functional reset to happen, which would make the program start again from the beginning (the main function) and clear all faults in "eMcem_Init()". However, even though the functional reset happens, the program doesn't restart. The FS26 eventually closes that watchdog window and opens its initialization phase again and, since I can't initialize it because the program didn't restart, it eventually issues a POR. I wanted to avoid this POR.

(I'm sorry if it's confusing. Feel free to ask for more details or for help understanding what I am trying to say.)

Some fault can be recovered, like voltage drop on ADC for example, where it can be caused by weak power supply, or induction,etc... once this issue fade the ADC works fine again, and you don't need to reset whole system.

So for some fault groups I can call the "eMcem_ClearFaults(nFaultId)" API function inside the NMI handler after storing the fault information?

So if you are able to solve it in ALARM interrupt, then you will recover from there. If not then EOUT will signal out the issue and SBC will take actions.

But with that fault group being Voltage Related Errors, does it make sense to go to the ALARM interrupt, since it clears the bit that sets the fault through SW?

I assumed that since it is a voltage-related problem, even if I try to clear it through SW, it will be set again because it's a HW issue. Is my assumption wrong?

petervlna · 10-04-2024

Hello,

So, in this case, what type of reaction is this? R1, R2 or R3?

I assume it depends on your application needs. If you select R3, a functional reset is issued, and it doesn't matter whether the external SBC issues a power-on reset: your device will be reset.

Not the whole device, just the part that a functional reset covers.

I would use R2 for this scenario, as on the alarm timeout you will get a reset from the Fault state anyway.

R3 is basically used in scenarios where the uC is not able to continue execution and requires a reset.

So for some fault groups i can call the "eMcem_ClearFaults(nFaultId)" API function

I am not familiar with these drivers, but I can answer in general terms how the uC works. For the API specifics, please refer to your driver documentation.

I assumed that since it is a problem related to voltages, even if i try to clear it through SW, it will be set again because it's an HW issue. Am i assuming it wrong?

Well, sometimes yes and sometimes no. If you have a transient voltage drop on non-vital supplies, like the ADC, then after the glitch is gone you can continue without resetting the whole device.

If you have a voltage drop on the core or memories, for example, it is highly unlikely that you can execute anything in an interrupt with an unstable core voltage, and you will need a safe state such as reset.
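
One way to picture the "sometimes yes, sometimes no" answer is a clear-then-recheck loop: clear the fault in software, read the status again, and escalate if it keeps coming back. The C sketch below is a host-side illustration under stated assumptions. `fault_source_t`, `try_sw_recovery` and the fake supply monitor are hypothetical names, not eMcem types or calls, and in a real project the two callbacks would have to be mapped to whatever the driver documentation provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical abstraction of one fault source; not an eMcem type. */
typedef struct {
    bool (*is_active)(void *ctx); /* read the (latched) fault status */
    void (*clear)(void *ctx);     /* software clear of the fault     */
    void *ctx;
} fault_source_t;

typedef enum { ACTION_CONTINUE, ACTION_REQUEST_RESET } action_t;

/* Clear, then re-check. If the fault keeps coming back, escalate. */
action_t try_sw_recovery(const fault_source_t *src, unsigned max_attempts)
{
    assert(src != NULL && src->is_active != NULL && src->clear != NULL);

    for (unsigned i = 0; i < max_attempts; ++i) {
        src->clear(src->ctx);
        if (!src->is_active(src->ctx)) {
            return ACTION_CONTINUE; /* transient glitch, fault stayed cleared */
        }
    }
    return ACTION_REQUEST_RESET;    /* persistent HW issue */
}

/* Fake supply monitor for host testing: the glitch lasts a number of polls. */
typedef struct {
    unsigned glitch_polls_left;
    bool latched;
} fake_supply_t;

bool fake_is_active(void *ctx)
{
    fake_supply_t *s = ctx;
    if (s->glitch_polls_left > 0) {
        s->glitch_polls_left--;
        s->latched = true; /* hardware re-asserts while the glitch lasts */
    }
    return s->latched;
}

void fake_clear(void *ctx)
{
    fake_supply_t *s = ctx;
    s->latched = false;
}
```

With a fake glitch that lasts one poll, `try_sw_recovery` returns `ACTION_CONTINUE` on the second attempt; with one that outlasts `max_attempts`, it returns `ACTION_REQUEST_RESET`. This only makes sense for supplies the core does not depend on. For a core or memory voltage drop, the code itself may not execute reliably, which is the case described above where a reset is needed.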

best regards,

Peter

JRodrigues · 10-04-2024

Hello @petervlna,

I'm sorry for insisting on this, but I am still confused by the different reaction types, since I see differing opinions on this from NXP employees and in the available documentation.

Just to clarify: in terms of the configuration in Design Studio, what makes the difference between the R2 and R3 types is the "Fault State NMI Enable" option, which is enabled in R2 but disabled in R3. Is this correct?

Also, the SW-recoverable and HW-recoverable recovery types aren't directly related to these reaction types, correct? If I am wrong, how are they related in terms of Design Studio configuration?

Moreover, if I don't set the reaction type to short reset reaction, but call "Mcu_PerformReset()" inside the NMI handler, isn't that the same as setting the reaction type to short reset reaction?

Finally, we've discussed how I could clear some faults inside the NMI handler, but this is another topic with differing opinions. The documentation states that once we enter the NMI handler, the SW recovery stores the fault information and initiates a functional reset, and there is no fault clearance. So, what is the correct assumption considering the Device Safety Flow below?

Best Regards,

JRodrigues

petervlna

Hello,

Just to clarify: In terms of the configuration in Design Studio, what makes the difference between both R2 and R3 types is the "Fault State NMI Enable" option, which is enabled in R2 but disabled in R3. Is this correct?

The differences between the reactions in SAF are described in the Safety Manual:

I do not know if the NMI is disabled in R3; most probably it is enabled as a backup in case the reset fails.

This is rather a question for the SAF team, which will require a ticket at NXP.com, as the information is not public.

Also, the SW or HW Recoverable recovery-types aren't directly related to these reaction-types, correct? If i am wrong, how are they related in terms of Design Studio configurations?

You configure the SW-recoverable and HW-recoverable faults in the FCCU according to your application needs.

Moreover, if i don't select the reaction-type to be short reset reaction, but call "Mcu_PerformReset()" inside the NMI Handler, isn't it the same as setting the reaction-type to be short reset reaction?

OK, and what if your core does not have a stable voltage? How will you then execute the interrupt on a voltage violation? Can you trust your RAM and core with an unstable voltage?

I think it is better to use the HW short reset, which is wired directly from the FCCU to the RGM.

But it all depends on the requirements of the standard and how you intend to meet them.

once we enter the NMI Handler the SW Recovery will store the fault information and initiate a functional reset and there's no fault clearance. So, what is the correct assumption considering the Device Safety Flow below?

Again, this really depends on the safety level / standard you want to achieve.

Sure, you can use the NMI for fault analysis, as this is common for automotive products.

Best regards,

Peter

JRodrigues

Hello @petervlna,

I assumed the "Fault State NMI" is disabled for the R3 type because its definition in the Safety Manual doesn't mention "Asserts the fault interrupt" as the R2 definition does, but I will speak with someone from the SAF team to clarify this.

Thank you for your help!

Best Regards,

JRodrigues