MPC5777M: using xBIST for critical safety applications

emmanuel_sirand · ‎12-16-2020

My concern is about xBIST of MPC5777M for critical safety applications.

I would like to launch LBIST/MBIST, as recommended in the Reference Manual, the Safety Manual and AN5131, in order to detect internal failures of the MPC5777M peripherals, memories and blocks as early as possible.

But the current MPC5777M documentation (listed above) does not describe the technical details and verification activities in enough depth to guarantee that:

- it can really detect internal failures (announced coverage of up to 90% and 94% for LBIST, depending on the boundary-scan patterns used; not known for MBIST)

- it will not affect the availability of the MPC5777M when it is performed offline and/or online (we also intend to use it as a power-up test without applications running, if permitted), even if we configure all LBIST/MBIST failures as recoverable

Indeed, the worst case would be to "lose" the MPC5777M after launching xBIST: the disadvantage of xBIST would then far outweigh its benefits.

Moreover, as most of xBIST is described in AN5131 rather than in a "specification document" such as the data sheet or reference manual, I wonder how this critical feature, which depends on boundary-scan patterns (provided in the 4 offline/online codes in AN5131), has been tested by NXP to guarantee that it will run in all environments and with all customer configurations.

Our applications are safety-critical and non-automotive.

Please advise. Thanks.

sandeepkumarbom · ‎01-19-2021

Hi Emmanuel,

The verification that the BIST detects internal failures is not something we provide to our customers. It is performed within the NXP team and establishes the overall coverage provided by the BIST. It is hard to guarantee that coverage when the customer has options that impact it (see Table 8 of app note AN5131). The app note is the only definition we specify; it is hard to put something about this in the datasheet. If the customer wants to use a particular code, the NXP team can help validate the option.

One nomenclature issue: boundary scan typically refers to the scanning of the pins. The term is usually not used for internal scan chains.

Regarding the availability of the MPC5777M when running the BIST, whether offline or online, it is recommended to check the type of failure before starting any safety application. If it is configured to run within the power-up test procedure without any application, it can still flag failures such as bad memory regions, if any, and go into the recommended safe state. So I would say this is always a benefit.

Let me know if you have any further questions.

Thanks,

Sandeep

emmanuel_sirand · ‎01-21-2021

Hi Sandeep,

Thanks for your feedback.

More precisely, here are the topics that I would like to understand more:

MPC5777M OFFLINE BIST

My opinion: As ONLINE BIST can test all LBIST & MBIST partitions, we plan to use ONLINE BIST. As OFFLINE BIST cannot test all LBIST partitions, and to avoid multiple tests (OFFLINE & ONLINE) that may require additional verification on our side to justify correct execution to the authorities, I plan to use ONLINE only.

My question: Is it mandatory to execute OFFLINE BIST first (as recommended in AN5131)? Does BIST work correctly if OFFLINE has not been run previously (i.e. ONLINE only)?

MPC5777M ONLINE BIST during shutdown

My opinion: For safety-critical applications, it is more convenient to run (ONLINE) BIST at power-up, before SW execution. The complete environment is then in a known safe state when the critical applications start running. We then perform CBIT (Continuous Built-In Test) to monitor important features of the MPC5777M (collected in the FCCU) as well as external interfaces (loopbacks, activity checks, range checks, …). At power-up, before launching ONLINE BIST, we plan to log in external NVM that we are about to run the BIST (so we expect a reset), then wait in an infinite loop so as not to interfere with the BIST.
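To make the intended power-up bookkeeping concrete, here is a minimal sketch of the "log before BIST, expect a reset" logic. It only models the decision taken at each boot; the marker values, the `boot_decide` function and the way the NVM byte is passed in are hypothetical and do not come from AN5131 or the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker values kept in external NVM (not from AN5131). */
enum { BIST_MARK_NONE = 0x00u, BIST_MARK_PENDING = 0xA5u };

typedef enum {
    BOOT_LAUNCH_BIST,   /* record the intent, start the BIST, then idle */
    BOOT_CHECK_RESULTS  /* the reset was expected: go and read the results */
} boot_step_t;

/* Decide what to do at boot from the marker found in NVM.
 * The marker is updated in place so that the next boot sees the new state.
 * Any value other than BIST_MARK_PENDING is treated as "no BIST pending". */
boot_step_t boot_decide(uint8_t *nvm_marker)
{
    assert(nvm_marker != NULL);

    if (*nvm_marker == BIST_MARK_PENDING) {
        *nvm_marker = BIST_MARK_NONE;
        return BOOT_CHECK_RESULTS;
    }
    *nvm_marker = BIST_MARK_PENDING;
    return BOOT_LAUNCH_BIST;
}
```

On the first boot the function returns `BOOT_LAUNCH_BIST` and leaves the pending marker behind; after the reset caused by the BIST it returns `BOOT_CHECK_RESULTS` and clears the marker, so an unexpected reset can be told apart from the expected one.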

My question: Is it mandatory to execute ONLINE BIST during system shutdown (as recommended in AN5131)? Does ONLINE BIST work correctly at power-up? If yes, which core is recommended for configuring the STCU2: the IO core (only used for initialization) or Core 0 (used for critical applications, as is Core 1)?

BIST Sanction

My opinion: We prefer to configure all LBIST and MBIST (ONLINE) partition failures as recoverable, so there will be no reset and no reset escalation. As the BIST mechanism and BIST patterns are not defined in a "specification" document (RM or data sheet), I cannot justify that there will be no spurious (false) failure. So we prefer to use our SW to monitor the FCCU BIST flags and STCU2 results and take the appropriate action. If a memory or logic partition that we use fails, SW will log the BIST partition error (in external NVM) and will hold in reset.
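The software sanction described above could be sketched roughly as follows. This is only an illustration: the bit masks, the `bist_sanction` function and the action names are hypothetical, reading the real FCCU/STCU2 status is left out, and the "log only" branch for a failing partition that the application does not use is an assumption of the sketch, not something stated in the documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    BIST_ACT_CONTINUE,     /* no partition failed */
    BIST_ACT_LOG_ONLY,     /* only unused partitions failed (assumed policy) */
    BIST_ACT_LOG_AND_HOLD  /* a used partition failed: log, then hold in reset */
} bist_action_t;

/* failed_mask: one bit per partition that reported a failure (hypothetical layout).
 * used_mask:   one bit per partition the application depends on.
 * log_mask:    receives the value that would be written to external NVM. */
bist_action_t bist_sanction(uint32_t failed_mask, uint32_t used_mask,
                            uint32_t *log_mask)
{
    assert(log_mask != NULL);

    *log_mask = failed_mask;
    if (failed_mask == 0u) {
        return BIST_ACT_CONTINUE;
    }
    if ((failed_mask & used_mask) != 0u) {
        return BIST_ACT_LOG_AND_HOLD;
    }
    return BIST_ACT_LOG_ONLY;
}
```

Keeping the decision in a pure function like this makes it easy to review and unit-test separately from the register accesses.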

My question: What do you think of this behavior? Why do the Safety Manual and AN5131 recommend configuring BIST errors as unrecoverable, while the code examples attached to AN5131 do the contrary, i.e. configure them as recoverable? Don’t you trust the STCU2 sanction?

LBIST mechanism:

Is LBIST identical to production tests (NXP or TSMC) ? How close are they?

Could you please explain the LBIST behavior a bit? How can a single LBIST partition (=5) test functions as different as SDADC_[0,2,4,6,8], SARADC_[0,4,B], BAR, CRC_0, DSPI_[0,1,4,6,12], IIC_0, PSI5_0, SENT_0, LINFlex_[0,1,14,16], CAN, PBRIDGE_0, MEMU, WKPU, NAR, SPU, PIT?

Could you please explain the value of LBIST registers :

STCU2_LB_CTRL: SCEN_ON/OFF = 5 cycles
STCU2_LB_PCS = pattern stop
STCU2_LB_PRPG : PRPG start value
STCU2_LB_MISREL : expected pattern

MBIST mechanism :

My question: Is MBIST equivalent to a write/read test of 0/1 patterns at all memory addresses? Are all memory cells tested? If we run MBIST in full test mode (or reduced RunBIST mode), can we remove the legacy SW write/read test of the whole memory (patterns AA, 55, incremental), as it would already be covered by MBIST?
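For reference, the legacy SW test I am referring to is of the following kind. This is a simplified sketch, not our production code: the function names are hypothetical, the test is destructive for the tested region, and it says nothing about what the MBIST algorithm itself does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write base + i*step to every byte, then read everything back.
 * Returns 1 if all bytes match, 0 on the first mismatch. */
static int fill_and_verify(volatile uint8_t *mem, size_t len,
                           uint8_t base, uint8_t step)
{
    for (size_t i = 0; i < len; i++) {
        mem[i] = (uint8_t)(base + (uint8_t)(i * step));
    }
    for (size_t i = 0; i < len; i++) {
        if (mem[i] != (uint8_t)(base + (uint8_t)(i * step))) {
            return 0;
        }
    }
    return 1;
}

/* Legacy-style pattern test: 0xAA, 0x55, then an incrementing byte pattern.
 * Returns 1 on pass, 0 on fail. Destroys the previous content of the region. */
int ram_pattern_test(volatile uint8_t *mem, size_t len)
{
    assert(mem != NULL);

    return fill_and_verify(mem, len, 0xAAu, 0u)   /* every byte 0xAA */
        && fill_and_verify(mem, len, 0x55u, 0u)   /* every byte 0x55 */
        && fill_and_verify(mem, len, 0x00u, 1u);  /* 0, 1, 2, ... wrapping at 256 */
}
```

Such a test only checks that each byte holds the value just written to it, which is why I am asking whether MBIST covers at least the same ground.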

Is MBIST identical to production test at NXP (or TSMC) ?

MBIST recoverable / unrecoverable errors:

My opinion: We can configure the STCU2 logic and memory errors as recoverable or unrecoverable on a per-partition basis only. The STCU2 cannot distinguish correctable from non-correctable errors (for SECDED memories). My understanding is that all single or multiple memory errors generate unrecoverable or recoverable errors depending on the configuration of the partition, not on the nature of the error (single, multiple or other). To distinguish the nature of a memory error, SW has to analyze the MEMU error status, as documented in AN5131.

Please confirm my understanding.

So it seems that the following sentence in §11.1 of AN5131 is confusing: “In general the device should be configured such that if there is an LBIST failure, or MBIST detects uncorrectable failures, the STCU2 will cause a destructive reset, causing execution of the self-test again.”

It should read: “In general the device should be configured such that if there is an LBIST failure or MBIST failure, the STCU2 will cause a destructive reset, causing execution of the self-test again.”

Please confirm or explain the sentence of AN5131.

BIST results analysis:

My opinion: In case of an LBIST error (for a specific partition), we cannot detect which block inside the partition is faulty. For example, LBIST partition 5 contains more than 10 different blocks, and we plan to use only some of them. So we have to apply the strongest sanction (reset) to all blocks of the same partition, because a potential error could affect a critical block that we use.

Is there a way to detect which block has failed (by analysis of pattern results or other) ?

Please confirm.

Thanks again!

Emmanuel

sandeepkumarbom · ‎02-08-2021

Here are the answers to the rest of the questions:

Is LBIST identical to production tests (NXP or TSMC) ? How close are they?

[Sandeep] - Yes the partitions provided in the App note are guaranteed configuration and test done at NXP.

Could you please explain the value of LBIST registers :

STCU2_LB_CTRL: SCEN_ON/OFF = 5 cycles

[Sandeep] This register is for LBIST controller. registers define the Controller to run sequentially or in sequence.

STCU2_LB_PCS = pattern stop

[Sandeep] This defines the pattern count.

STCU2_LB_PRPG : PRPG start value

[Sandeep] This defines the seed value.

STCU2_LB_MISREL : expected pattern

[Sandeep] This defines the MISR expected value - low or high
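To illustrate how a seed, a pattern count and an expected MISR value relate to each other in general, here is a small software model of a logic BIST loop. It is a generic textbook sketch, not the STCU2 implementation: the 16-bit polynomial, the `good_logic`/`faulty_logic` functions standing in for the circuit under test, and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (feedback taps 0xB400, arbitrary choice). */
static uint16_t lfsr_step(uint16_t s)
{
    uint16_t lsb = (uint16_t)(s & 1u);
    s = (uint16_t)(s >> 1);
    if (lsb != 0u) {
        s ^= 0xB400u;
    }
    return s;
}

/* MISR step: advance the signature like an LFSR, then fold in the response. */
static uint16_t misr_step(uint16_t sig, uint16_t response)
{
    return (uint16_t)(lfsr_step(sig) ^ response);
}

typedef uint16_t (*logic_fn)(uint16_t);

/* Apply pattern_count pseudo-random patterns, starting from seed, to the
 * logic under test and compact all responses into one signature. */
uint16_t lbist_run(uint16_t seed, uint32_t pattern_count, logic_fn logic)
{
    assert(logic != NULL);

    uint16_t prpg = seed;
    uint16_t sig = 0u;
    for (uint32_t i = 0; i < pattern_count; i++) {
        sig = misr_step(sig, logic(prpg));
        prpg = lfsr_step(prpg);
    }
    return sig;
}

/* Stand-in for a fault-free block (arbitrary combinational function). */
uint16_t good_logic(uint16_t in)
{
    return (uint16_t)((in << 3) ^ (in >> 2) ^ 0x1234u);
}

/* Same block with output bit 4 stuck at 1. */
uint16_t faulty_logic(uint16_t in)
{
    return (uint16_t)(good_logic(in) | 0x0010u);
}
```

In this model the "expected MISR" is simply the signature obtained once from the fault-free logic with a given seed and pattern count; the self-test passes only if a later run with the same seed and count produces the same signature. Changing the seed or the pattern count changes the expected value, which is why these values have to be taken together as one validated set.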

Is MBIST identical to production test at NXP (or TSMC) ? [Sandeep] - Yes the partitions provided in the App note are guaranteed configuration and test done at NXP.

MBIST recoverable / unrecoverable errors:

My opinion: We can configure the STCU2 logic and memory errors as recoverable or unrecoverable on a per-partition basis only. The STCU2 cannot distinguish correctable from non-correctable errors (for SECDED memories). My understanding is that all single or multiple memory errors generate unrecoverable or recoverable errors depending on the configuration of the partition, not on the nature of the error (single, multiple or other). To distinguish the nature of a memory error, SW has to analyze the MEMU error status, as documented in AN5131.

Please confirm my understanding.

So it seems that the following sentence in §11.1 of AN5131 is confusing: “In general the device should be configured such that if there is an LBIST failure, or MBIST detects uncorrectable failures, the STCU2 will cause a destructive reset, causing execution of the self-test again.”

It should read: “In general the device should be configured such that if there is an LBIST failure or MBIST failure, the STCU2 will cause a destructive reset, causing execution of the self-test again.”

Please confirm or explain the sentence of AN5131.

[Sandeep] - Your understanding is correct. You can read the address from the MEMU and take the appropriate reaction.

My opinion: In case of an LBIST error (for a specific partition), we cannot detect which block inside the partition is faulty. For example, LBIST partition 5 contains more than 10 different blocks, and we plan to use only some of them. So we have to apply the strongest sanction (reset) to all blocks of the same partition, because a potential error could affect a critical block that we use.

Is there a way to detect which block has failed (by analysis of pattern results or other)?

[Sandeep] - please refer to the attached EB.

sandeepkumarbom · ‎02-08-2021

Hi Emmanuel,

Sorry for the late response to these questions. I had to check with some other folks before responding. Here is some feedback on your questions below:

My question: Is it mandatory to execute OFFLINE BIST first (as recommended in AN5131)? Does BIST work correctly if OFFLINE has not been run previously (i.e. ONLINE only)?

[Sandeep] - It is not necessary. It is a recommendation so that you perform the BIST before starting the safety application. Running offline BIST ensures that no critical error in the system causes an issue. You can run online BIST without offline BIST.

My question: Is it mandatory to execute ONLINE BIST during system shutdown (as recommended in AN5131)? Does ONLINE BIST work correctly at power-up? If yes, which core is recommended for configuring the STCU2: the IO core (only used for initialization) or Core 0 (used for critical applications, as is Core 1)?

[Sandeep] - It is not mandatory to execute ONLINE BIST during system shutdown. It is a recommendation because at shutdown you can perform all the checks without interference from the application or from any device configuration. Online testing is intended for a full BIST of the MCU. The major difference between online and offline is that online BIST can be configured, as per the system requirements, as to when it runs and performs all the BIST checks. Some customers have timing requirements for the application startup, so in that case some checks can be performed at shutdown via online BIST.

But running online BIST at power-up is not possible. Online BIST is not configurable as a power-up test.

My question: What do you think of this behavior? Why do the Safety Manual and AN5131 recommend configuring BIST errors as unrecoverable, while the code examples attached to AN5131 do the contrary, i.e. configure them as recoverable? Don’t you trust the STCU2 sanction?

[Sandeep] - Your described behavior should work. My only worry is that, for part of the cycle, the software is still accountable for any unrecoverable faults resulting from failures of memory or logic partitions. This is one reason to use offline BIST to do part of the BIST before starting the application.

emmanuel_sirand · ‎02-09-2021

Hi Sandeep,

Many thanks for your answers! Could you please clarify and confirm my understanding:

The document that you provided, EB823, refers to the MPC577xK and not the MPC5777M, so the LBIST and MBIST partitions are different. I therefore could not find any relevant answer about how to detect which particular peripheral (FlexRay, CAN, DSPI, …) failed during LBIST, as the STCU2_LBSSW registers (for online) provide the failure status for the overall partition, which may contain many different peripherals. MBIST provides more information, as 78 different status bits make it possible to detect which memory has failed (SRAM, NAR, HSM, IMEM, …) and often more: which sector of the memory has failed.

My question is: Is there a way to detect which particular peripheral has failed inside a given LBIST partition? For instance, by accessing the peripheral itself? Is the peripheral failure logged anywhere?
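For comparison, on the MBIST side the per-memory status bits can be turned into a list of failing indices with something like the sketch below. It is only an illustration under assumptions: the function name, the packing of the bits into 32-bit words (least significant bit first) and the `pass_level` parameter (whether a 1 or a 0 means "passed") are hypothetical and must be checked against the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan n_bits status bits packed into 32-bit words and collect the indices
 * of the bits that differ from pass_level (0 or 1).
 * Up to out_cap indices are stored in out; the return value is the total
 * number of failing bits, which may exceed out_cap. */
size_t bist_failed_indices(const uint32_t *status_words, size_t n_bits,
                           unsigned pass_level, uint16_t *out, size_t out_cap)
{
    assert(status_words != NULL);
    assert(out != NULL || out_cap == 0u);

    size_t n_failed = 0;
    for (size_t i = 0; i < n_bits; i++) {
        unsigned bit = (unsigned)((status_words[i / 32u] >> (i % 32u)) & 1u);
        if (bit != pass_level) {
            if (n_failed < out_cap) {
                out[n_failed] = (uint16_t)i;
            }
            n_failed++;
        }
    }
    return n_failed;
}
```

With 78 status bits this would be called on three words with `n_bits = 78`; each returned index would then be mapped to a memory through a table built from AN5131. Nothing equivalent seems possible for LBIST, since only one status bit exists per partition.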

You said “Yes the partitions provided in the App note are guaranteed configuration and test done at NXP.”

We plan to use online “medium” self-test level, as described in AN5131 (48ms duration).

My question is: Could you please confirm that this BIST configuration is tested during NXP production tests, over the full operating temperature range (Ta -40/+125C or Tj -40/+150C), on each device before shipping to the customer (i.e. unitary tests), and not only during NXP V&V tests on a batch of devices/samples?

Thanks again

Emmanuel
