
S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

Hi PFE experts

Customer: LGE/Mobis

Platform: S32G2

Module: PFE slave driver 1.6.0

Our customer is using BSP36, PFE slave driver 1.6.0 and PFE MCAL driver 1.3.0.

The customer is implementing an ungraceful A53 reset: if the A53 doesn't send a heartbeat, the M7 force-resets the A53.

After 4~5 consecutive reset tests, PFE FW outputs the error log below.

[Attached image: 이미지 (1).png, screenshot of the PFE FW error log]

They are using the following sequence:

/* 1. Disable Interrupt router for GIC500 */
disable_a53_interrupt_routing();

/* 2. Remove the PFE logical interface */
pfe_logif_disable();

/* 3. Flush the slave HIF internal Rx bd */
Eth_43_PFE_ChannelBdFlushRx(PFE_PHY_IF_ID_HIF1);

/* 4. Clear PFE Port coherency register */
REG_WRITE32(0x4007CA00, 0x0);

/* 5. Turn off A53 cores/partition */
Bl_DisableCore(0, 1);
Bl_DisableCore(1, 1);
Bl_DisableCore(2, 1);
Bl_DisableCore(3, 1);
Bl_DisablePartition(1);

/* 6. Enable Interrupt router for GIC500 */
enable_a53_interrupt_routing();

/* 7. Turn on A53 cores/partition */
Bl_StartApplication();

I set up a similar environment but could not reproduce the issue; the logs from my test are normal.

Could you help analyze under what conditions this error occurs?

I appreciate your support.

Best Regards,

Leo

PFE

Re: S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

This confuses us as well. From reading the code, I think this path is for the use case where one EMAC is connected to multiple HIF channels of the same driver. I believe that use case is no longer recommended, and it should definitely not be used in slave mode.

I can see that this command does not give an error, but I don't know whether it could cause this error:
libfci_cli logif-update -i emac2 --egress hif

Re: S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

Hi @Sebastian_Raizer

What confuses me is that tx_port == PFE_PHY_IF_ID_HIF. Under what conditions is tx_port equal to PFE_PHY_IF_ID_HIF? I will ask the customer to share their PFE configuration.

BR,

Leo

Re: S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

Hi @Sebastian_Raizer

Thanks for your answer. I will confirm with the customer how they configure the PFE.

BR,

Leo

Re: S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

Additional info from my PFE colleagues.
Normally with PFE Linux slave, the function fp_replica_hif_rx_scaling() is executed but the condition if(PFE_PHY_IF_ID_HIF == tx_port) is false so this code is skipped and the function returns with new_port = tx_port;.
For frames targeted at the PFE Linux slave driver, the value of tx_port should be 7 in this case (HIF1). But somehow the value here is 3, which I think is a special value used in the multi-interface mode of the PFE Linux standalone driver. This value should not be used for a Linux slave. It might hint at a misconfiguration of the bridge or the flexible parser.

Re: S32G: PFE FW error reported after 4~5 consecutive A53 ungraceful resets

Hello, this error comes from the FW in function fp_replica_hif_rx_scaling(). It's the infinite loop protection for the while loop in this function. It tries to find the next active HIF to send the packet to but cannot find one (it just got disabled by the BD flush).
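To make the behaviour discussed in this thread easier to reason about, here is a minimal C model of it. It is not the PFE firmware source. The function name, the PORT_NONE return value, the channel count of 4, the round-robin search order and the HIF0 ID of 6 are all assumptions made for illustration; only the values 3 (generic HIF) and 7 (HIF1) and the general shape (pass-through unless tx_port is the generic HIF ID, otherwise a bounded search for an active HIF) come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative IDs. 3 and 7 are the values quoted in this thread;
 * the HIF0 base value of 6 is an assumption. */
#define MODEL_PHY_IF_ID_HIF   3   /* generic "HIF" ID */
#define MODEL_PHY_IF_ID_HIF0  6   /* assumed base of the per-channel IDs */
#define MODEL_HIF_CHANNELS    4U  /* assumed number of HIF channels */
#define MODEL_PORT_NONE       (-1) /* hypothetical "no active HIF found" result */

/* Hypothetical model of the port selection, NOT the real firmware code.
 * hif_enabled[i] says whether HIF channel i is currently active.
 * next_idx keeps the (assumed) round-robin position between calls. */
static int model_hif_rx_scaling(int tx_port,
                                const bool hif_enabled[MODEL_HIF_CHANNELS],
                                unsigned *next_idx)
{
    assert(hif_enabled != NULL && next_idx != NULL);

    /* Normal Linux slave case: tx_port is a concrete channel such as
     * HIF1 (7), so the port is returned unchanged. */
    if (tx_port != MODEL_PHY_IF_ID_HIF) {
        return tx_port;
    }

    /* Generic HIF ID: look for the next active channel. The loop is
     * bounded so it cannot spin forever when every channel is disabled. */
    for (unsigned tries = 0U; tries < MODEL_HIF_CHANNELS; tries++) {
        unsigned idx = *next_idx % MODEL_HIF_CHANNELS;
        *next_idx = (idx + 1U) % MODEL_HIF_CHANNELS;
        if (hif_enabled[idx]) {
            return MODEL_PHY_IF_ID_HIF0 + (int)idx;
        }
    }

    /* No active channel: this is the point where, per the thread,
     * the loop protection reports the error. */
    return MODEL_PORT_NONE;
}
```

In this model the error path is reachable only when tx_port carries the generic HIF ID and no channel is active at that moment, which matches the two conditions raised in the replies: an unexpected tx_port value and a HIF that was just disabled by the BD flush.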

To investigate further, check whether there is a specific option that enables traffic spreading. Is the Linux driver really configured with only 1 HIF? I'm not sure whether this FW code is always executed or only in some specific configurations. The FW should determine that the frame is aimed at a disabled interface and drop it.

Another point is that the Linux driver must not be active while Eth_43_PFE_ChannelBdFlushRx() is called. If the kernel is in a panic state, this holds, but depending on how they trigger the heartbeat loss, it might not. To be safe, they could turn off the Linux partition before calling the BD flush API, or make sure the kernel is in a panic or another state where nothing is executing.
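As a sketch of that suggestion, the customer's sequence could be reordered so that the A53 cores and partition are switched off before the Rx BD flush. The code below only records the order of steps. Each step stands in for the corresponding call in the customer's sequence (named in the comments), and the helper names are hypothetical. It moves only the core/partition shutdown ahead of the flush; whether any other step, such as the coherency register write, should also move has not been confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per step of the customer's recovery sequence. */
enum step {
    STEP_IRQ_ROUTING_OFF,  /* disable_a53_interrupt_routing() */
    STEP_LOGIF_DISABLE,    /* pfe_logif_disable() */
    STEP_CORES_OFF,        /* Bl_DisableCore(0..3, 1) */
    STEP_PARTITION_OFF,    /* Bl_DisablePartition(1) */
    STEP_BD_FLUSH_RX,      /* Eth_43_PFE_ChannelBdFlushRx(PFE_PHY_IF_ID_HIF1) */
    STEP_COHERENCY_CLEAR,  /* REG_WRITE32(0x4007CA00, 0x0) */
    STEP_IRQ_ROUTING_ON,   /* enable_a53_interrupt_routing() */
    STEP_START_APP,        /* Bl_StartApplication() */
    STEP_COUNT
};

static enum step call_log[STEP_COUNT];
static size_t call_count = 0;

/* Hypothetical stand-in for a real call: just remembers the order. */
static void record(enum step s)
{
    assert(call_count < (size_t)STEP_COUNT);
    call_log[call_count++] = s;
}

/* Reordered sequence: Linux is fully stopped before the BD flush,
 * so the slave driver cannot touch the HIF rings during the flush. */
static void recover_a53_reordered(void)
{
    call_count = 0;
    record(STEP_IRQ_ROUTING_OFF);
    record(STEP_LOGIF_DISABLE);
    record(STEP_CORES_OFF);
    record(STEP_PARTITION_OFF);
    record(STEP_BD_FLUSH_RX);
    record(STEP_COHERENCY_CLEAR);
    record(STEP_IRQ_ROUTING_ON);
    record(STEP_START_APP);
}

/* Returns the position of a step in the recorded order, or -1. */
static int step_position(enum step s)
{
    for (size_t i = 0; i < call_count; i++) {
        if (call_log[i] == s) {
            return (int)i;
        }
    }
    return -1;
}
```

A quick check such as step_position(STEP_PARTITION_OFF) < step_position(STEP_BD_FLUSH_RX) can then be used to assert the intended ordering.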

I assume they use bridge mode, and it looks like traffic is flowing during the recovery process. Did you set this up in your environment as well?