FCCU Eout Pin

HeebeomPark · ‎10-12-2023

We are not using the FCCU Eout pin. Instead of the Eout pin, we plan to use the NMI and trigger a reset from the NMI handler.

Our understanding is that the Eout pin gives the system a chance to recover from the fault. In our use case, there could therefore be more resets.

To be sure, we would like to predict the probability of a reset caused by a fault notification from the FCCU. We would like to compare how much more often a reset could occur in the no-Eout-pin use case than in the Eout-pin use case.

Is there any way to estimate this quantitatively?

davidfantl · ‎02-13-2024

Hello,

Per the latest response (and previous responses) on the topic of FCCU Eout Pin, we are considering this topic closed. Thank You.

HeebeomPark · ‎02-02-2024

Dear Dave Fantl,

Please share the following:

- The failure modes covered by SM3.FCCU_MON [PMIC and FCCU Eout]

- The cut-sets [2nd/3rd order] mitigated by SM3.FCCU_MON [PMIC and FCCU Eout]

Also, how does using SM3.FCCU_MON [PMIC and FCCU Eout] relate to reaching ASIL D?

Many thanks,

Yashwant_Singh · ‎02-11-2024

Hello Heebeom,

1. PMIC is responsible for monitoring the EOUT and other signals to generate a system state indication signal.

2. The system uses the system state indication signal (or signals) to ensure a reliable transition into a system safe state. The system waits for a grace period before performing that transition to allow the chip to apply a recovery measure without being disturbed.

3. SM3.FCCU_MON is the monitoring of the FCCU output pins by the PMIC. The EOUTs indicate the error state of the MCU to the PMIC. As such, it is an integral part of the safe state handling (see Chapter 5 of the Safety Manual) and an intrinsic element of the fault reaction and fault recovery flow assumed by all safety mechanisms (see sections 2.7 and 2.8 within the Safety Concept chapter of the Safety Manual) and by the overall Safety Concept.

4. Any fault occurrence has to be serviced within the FTTI. In case of a fault reaction timeout, EOUT is asserted so that the external system monitoring the EOUT signal knows that the system safe state has to be achieved within the FTTI.

5. A chip functional reset (R3 reaction) is the recommended reaction in case of critical faults such as core lockups, uncorrectable SRAM/FLASH errors, etc. The R3 reaction is the assertion of a chip functional reset along with EOUT assertion.

6. Now, in your scenario:

The NMIs are used to trigger a reset, but the critical faults might render the application incapable of doing so (a catch-22 situation). Such critical faults include those of the XBAR, SRAM, FLASH, etc. These are the common cause failures we are trying to caution you about.
Without EOUT monitoring, the timeout of your fault handling has to be managed accordingly.
This is an impact analysis that you have to perform for your programming of the fault handling, in the context of your intended processing for the selected faults.

7. NXP recommends an R3 reaction for the faults considered in point 5 (see the fault map in the Reference Manual). This avoids the complications mentioned for your approach.

8. The dependent failure analysis (and hence the cut sets) is performed for the chip’s architecture with respect to the assumed safety concept. NXP's safety concept already recommends the reaction for faults; dependencies associated with the fault recovery concept are therefore not repeated as part of the dependent failure analysis performed for this device. Instead, the related dependencies are listed in the AoU(s) associated with this concept (e.g. master safety core) and are not considered for the dependent failure analysis. Deviating from this concept invalidates these assumptions and requires a corresponding analysis, as already highlighted.

9. In other words, the common cause failures discussed above are relevant for your context alone and its related fault handling. You need to determine the related cut-sets yourself and justify the mitigation measures to cover them.

10. For all the reasons mentioned above, EOUT monitoring cannot be avoided irrespective of the target ASIL.
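
The grace period in point 2 and the timeout in point 4 can be pictured with a small simulation of an external monitor. This is only a sketch under assumptions: the class `EoutMonitor`, its state names, and the 10 ms grace period are hypothetical and do not describe any particular PMIC or the Safety Manual's exact flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EoutMonitor:
    """Hypothetical model of an external device watching the EOUT signal."""
    grace_period_ms: int                   # must be chosen to fit inside the FTTI
    fault_since_ms: Optional[int] = None   # time at which EOUT first signalled a fault

    def step(self, now_ms: int, eout_fault: bool) -> str:
        if not eout_fault:
            # MCU recovered (or never faulted): clear the timer.
            self.fault_since_ms = None
            return "NORMAL"
        if self.fault_since_ms is None:
            self.fault_since_ms = now_ms
        if now_ms - self.fault_since_ms < self.grace_period_ms:
            # Leave the MCU undisturbed while it applies its recovery measure.
            return "GRACE"
        # Recovery did not complete in time: force the system safe state.
        return "SAFE_STATE"


monitor = EoutMonitor(grace_period_ms=10)
trace = [(0, False), (5, True), (12, True), (15, True)]
states = [monitor.step(t, fault) for t, fault in trace]
print(states)  # ['NORMAL', 'GRACE', 'GRACE', 'SAFE_STATE']
```

The point of the sketch is that the decision to enter the safe state is taken outside the MCU, so it does not depend on the faulty device still being able to execute code.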

Hoping this finally clears it.

Thanks!

-Yashwant

HeebeomPark · ‎02-12-2024

Dear Yashwant,

Thank you very much for your answer. However, the answer does not address the original question we asked about the failure modes and cut-sets covered by SM3.FCCU_MON.

Because NXP performs the safety analysis for the internal components of the MCU, only NXP knows the cut-sets. VNE cannot know the internal cut-sets of the NXP MCU. We kindly ask you to provide the information we requested.

Also, regarding the statement in point 8, "related dependencies are listed in the AoU(s) associated with this concept (e.g. master safety core)", can you specify which AoU requirements are meant? Can you provide the identification number(s) of the corresponding AoU(s)?

Many thanks,

HeebeomPark · ‎11-08-2023

We are going to use the NMI instead of the Eout pin for ASIL D. However, can you be more specific about the dependent failure? If we use the NMI instead of the Eout pin, why can it not be used for ASIL D? Can you show the architecture, the failure, and so on in more detail?

Yashwant_Singh · ‎11-20-2023

Hello,

The failure of the CGM to correctly generate the core_clock can be a common cause failure for both the AXBS and the M7 cores themselves. Another example would be the failure of the CGM to correctly generate the AIPS_PLAT clock, which would render the various on-chip register interfaces unusable for the application cores.

In the event of such faults, the application core would not be able to trigger a reset, as either the interconnect or the register interfaces are not clocked correctly. Therefore, the application will not be able to assert a functional reset by reacting to the NMI sent by the FCCU as a fault reaction.

Although there are measures in place, such as independent clock monitoring, to take care of such dependent failures, as mentioned in my previous response it is not possible to “quantify” such dependent failures to accurately estimate the probability of reset assertion.

It is recommended to assert the EOUT pins because, if the internal functional reset generation within the chip is not successful, the PMIC will assert a destructive reset to the SoC via the RESET_B pin when the watchdog timer window in the PMIC expires. Compared with the non-EOUT case, this certainly raises the probability of reset assertion, as it avoids dependent failures. This is why we recommend EOUT assertion (irrespective of the ASIL target, actually).
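
The argument above is qualitative, and a small truth-table style model may make it easier to follow. This is only a sketch under assumptions: the element names mirror the two clock examples given in this post, the function names are hypothetical, and the external monitor is assumed to be independent of the on-chip clocks. It does not attempt to quantify anything.

```python
def nmi_reset_possible(failed):
    """SW path: the core must run the NMI handler and reach the reset registers."""
    return "core_clock" not in failed and "aips_plat_clock" not in failed


def reset_outcome(failed, eout_monitored):
    """Return which path, if any, ends up resetting the device."""
    if nmi_reset_possible(failed):
        return "functional reset via NMI handler"
    if eout_monitored:
        # Assumption: the external monitor does not share the failed clock,
        # so it can still act when its watchdog window expires.
        return "destructive reset via RESET_B from external monitor"
    return "no reset"


for failed in [set(), {"core_clock"}, {"aips_plat_clock"}]:
    for eout in (False, True):
        print(sorted(failed), "| EOUT monitored:", eout, "->", reset_outcome(failed, eout))
```

In this toy model, a single clock failure leaves the NMI-only configuration without any reset path, while the EOUT-monitored configuration still reaches a reset.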

Hoping this helps!

Thanks!

-Yashwant

Yashwant_Singh · ‎12-05-2023

Posting on behalf of Michael

As you can extract from the safety manual of the K3, you are changing one of the fundamental aspects of the K3 Safety Concept:

Without the observation of the FCCU EOUT pads, you cannot identify all situations in which the device enters the FaultRec operation mode.
→ This impacts the coverage via external HW as specified in chapter 2 of the SM.
⇒ There is a rather obvious change in the coverage of common cause failures that are assumed to be covered by the external SW related HW.
When you plan to trigger a reset within the NMI handler, you are assuming that the NMI handler (which is SW) is capable of managing all the faults that are otherwise handled by a fully HW-based fault management that only involves the FCCU and RGM.
⇒ As some of the AoU identify, there are sometimes requirements for a proper management of faults that may result in a catch-22 situation: you are assuming a working core for performing the fault management, but some elements that the core requires for proper code execution (e.g. interconnect, memory I/F, etc.) are actually compromised.
As Yashwant showed in the example given for such a situation (which is just one of many), you may not be able to respond with SW when one of the clocks used by the core (in his example the AIPS_PLAT clock) is compromised. Note that this is just an example; there are many more of these dependencies.

ISO 26262 specifies for such situations the need for a safety impact analysis, to be performed by the party that is changing such an assumption, which must include the in-context specifics of the scenario. This is work that must and can only be done by Veoneer, since only you have the specifics associated with this condition and scenario. This is a responsibility NXP cannot take over.
As you are certainly aware, this requires including at least the following aspects:

- Which faults are programmed to be recovered by the NMI and not a reset [a topic that is dependent on your system setup]
- The capabilities of the external HW (other System Basis Chip) that is observing the device, and the impact of not observing the FCCU EOUT
- Any potential impact of the faults managed via the first bullet on the execution of the core assigned as Master Safety Core by Veoneer

These are all aspects defined by Veoneer.

It is not clear to me how you expect NXP to answer the question about a ‘probability of the reset due to the fault notification by the FCCU’ when the majority of the related information depends on programmable aspects (first bullet), on your specific HW (second bullet), or even on your intended processing in the context of the selected faults. This is a detailed investigation that should/must have been done as part of the impact analysis stated above. You cannot shortcut this.

Hope this clarifies the question.

Thanks!

-Yashwant

HeebeomPark · ‎12-11-2023

Regarding the common cause failures, can you provide the following?

- The cut-sets [at least 2nd/3rd order] that can be covered/mitigated by using the Eout pin so that the PMIC triggers a reset

Yanchen_Shang · ‎11-30-2023

Hello Yashwant,

Thanks for the explanation.

For your information, an NXP PMIC is not used in the module; another System Basis Chip is used with the NXP MCU. Could you let us know what the effect will be if the EOUT pin is not used for the S32K388 MCU? Thanks.

HeebeomPark · ‎11-20-2023

Thank you for your answer.

However, in S32K388-289pins_2022_R2.1.xlsx, the relevant SM "SM3.FCCU_MON" is not listed for the part that you explained as mitigating the common cause failure. The SM is only used for PADs.

Also, should it be specified in the safety manual?

Many thanks,

Heebeom

Yashwant_Singh · ‎11-24-2023

Hello,

The SM3.FCCU_MON is the external monitoring of Fault Collection and Control Unit Outputs and in the FMEDA it is used for the context of Random hardware failures and not the dependent failures.
The off-chip (and hence independent) safety-related hardware observes the EOUT signal and asserts the reset externally if the application fails to do so within the FTTI. Hence the higher probability of reset assertion compared with the non-EOUT case.
Please refer to assumptions 59529 and 59496 for more details on EOUT monitoring, and to section 2.7.8, "The R3 fault-reaction type", in the safety manual.

Hoping this helps!

Thanks!

-Yashwant

HeebeomPark · ‎11-28-2023

Dear Yashwant,

Thank you for your kind answer. Your answers below are prefixed with [Answer from Yashwant_Nr]; my comments are prefixed with [Question from HB_Nr].

[Answer from Yashwant_1]The SM3.FCCU_MON is the external monitoring of Fault Collection and Control Unit Outputs and in the FMEDA it is used for the context of Random hardware failures and not the dependent failures.

[Question from HB_1] In this case, could the external watchdog, as an indirect detection measure, cover the following failure modes in the FMEDA? Also, if the FCCU Eout pin is not used, can the following portion of the FMEDA be marked "Not applicable"?

Yashwant_Singh · ‎10-24-2023

Hello,

The probability of reset assertion (or failure to assert the reset) via the SW route can be estimated using the failure rate of the respective modules being utilized in the reset trigger path. The failure rates of FCCU, CORE, NVIC, Interconnects and RGM can be derived from the FMEDA to mathematically compute the probability of a successful reset assertion.

However, configuring the FCCU to assert functional resets for the faults under consideration would bypass the core, interconnects, etc., and so has a higher probability of enabling the system safe state.
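
As a rough illustration of the estimate described above, the sketch below treats each reset path as a series system with constant, independent failure rates. Everything numeric here is a placeholder assumption: the FIT values and the 10,000 hour interval are invented for the example and must be replaced by figures from the FMEDA, and the function name is hypothetical. The model covers random hardware failures only and says nothing about dependent failures.

```python
import math

FIT = 1e-9  # 1 FIT = one failure per 1e9 device-hours


def path_failure_probability(fit_rates, hours):
    """Probability that at least one element of a series path fails within `hours`."""
    total_rate = sum(fit_rates.values()) * FIT  # failures per hour
    return 1.0 - math.exp(-total_rate * hours)


# Placeholder FIT values for illustration only; real figures come from the FMEDA.
sw_reset_path = {"FCCU": 1.0, "CORE": 5.0, "NVIC": 0.5, "Interconnect": 2.0, "RGM": 0.5}
hw_reset_path = {"FCCU": 1.0, "RGM": 0.5}  # FCCU configured to request the reset directly

hours = 10_000
p_sw = path_failure_probability(sw_reset_path, hours)
p_hw = path_failure_probability(hw_reset_path, hours)
print(f"SW (NMI handler) path failure probability: {p_sw:.3e}")
print(f"HW (FCCU to RGM) path failure probability:  {p_hw:.3e}")
```

With fewer elements in the path, the hardware route has the lower failure probability in this model, which is the effect described in the paragraph above.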

Regarding EOUT vs. non-EOUT, it must also be noted that, apart from the random hardware failures, there is also the case of dependent failures, such as a common clock or power supply for the modules in the reset assertion path, and currently there is no way to quantify such dependent failures.

We would strongly recommend to make use of FCCU EOUT pins if the MCU is targeted for ASIL-D application to avoid dependent failures. If the MCU is targeted for ASIL-B application, then maybe you can be more flexible and decide not to use FCCU EOUT.

Thanks!

-Yashwant

HeebeomPark · ‎01-18-2024

Dear Yashwant,

For ASIL D, it was strongly recommended that the FCCU Eout pin be used.

Can you explain why ASIL D requires this in terms of diagnostic coverage, and for which failure modes?

Many thanks,

Yashwant_Singh · ‎01-18-2024

Hello Heebeom,

In the initial post of this thread the question was more around FCCU and the device was not mentioned (which was ascertained later in this thread and in other parallel ones to be K3xx) so my initial response was more general in nature.

The intention behind "We would strongly recommend making use of FCCU EOUT pins if the MCU is targeted for ASIL-D application to avoid dependent failures. If the MCU is targeted for ASIL-B application, then maybe you can be more flexible and decide not to use FCCU EOUT" was to cover both the K1xx (targeting ASIL B) and the K3xx (targeting ASIL D), where the latter has the FCCU/EOUT indication available and the former doesn't.

The K1xx relies upon the external system monitoring the MCU's error flags/status registers for error indication.

Hoping this clears it up.

Thanks!

-Yashwant

HeebeomPark · ‎01-18-2024

Dear Yashwant,

Thank you very much. We are going to use the S32K3xx. For ASIL B, is there then more flexibility not to use the FCCU Eout pin? And for ASIL D, must the Eout pin be used?

Many thanks,

Yashwant_Singh · ‎01-18-2024

Hello Heebeom,

In S32K3xx EOUT monitoring is applicable for both ASIL B and ASIL D use cases. This is because the overall chip architecture and susceptibility to dependent failures remains the same.

The major difference between ASIL B and D is the use of M7 cores in split lock and lockstep respectively.

Thanks!

-Yashwant

HeebeomPark · ‎01-22-2024

Thank you very much.

Can you explain it with a diagram? So when lockstep is used, is the FCCU a mitigation measure? In detail, for which failure modes is the FCCU needed to reach ASIL D?

According to the provided FMEDA, the FCCU Eout pin mechanism [SM3.FCCU_MON] is used for the FCCU pins and PADs. It is not used for the core. To understand the usage of the PMIC with the FCCU Eout pin, we need more information.