Hi PFE experts
Customer: LGE/Mobis
Platform: S32G2
Module: PFE slave driver 1.6.0
Our customer is using BSP36, PFE slave driver 1.6.0 and PFE MCAL driver 1.3.0.
The customer is implementing an A53 ungraceful reset: if the A53 stops sending heartbeats, the M7 forces a reset of the A53.
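For reference, the heartbeat supervision on the M7 side can be sketched roughly as below. This is a minimal illustration, assuming the A53 periodically increments a sequence number that the M7 polls (for example in shared memory). The names `hb_monitor_t`, `hb_init()` and `hb_tick()` and the miss threshold are hypothetical and are not taken from the customer's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor state kept on the M7. */
typedef struct {
    uint32_t last_seq;     /* last heartbeat sequence number seen from the A53 */
    uint32_t missed_ticks; /* consecutive monitor ticks without a new heartbeat */
    uint32_t max_missed;   /* threshold after which the A53 is considered hung */
} hb_monitor_t;

void hb_init(hb_monitor_t *m, uint32_t max_missed)
{
    assert(m != NULL);
    m->last_seq = 0;
    m->missed_ticks = 0;
    m->max_missed = max_missed;
}

/* Called periodically on the M7 with the sequence number the A53 last wrote.
 * Returns true when the A53 should be force-reset. */
bool hb_tick(hb_monitor_t *m, uint32_t current_seq)
{
    assert(m != NULL);
    if (current_seq != m->last_seq) {
        /* New heartbeat: remember it and restart the miss counter. */
        m->last_seq = current_seq;
        m->missed_ticks = 0;
        return false;
    }
    m->missed_ticks++;
    return m->missed_ticks >= m->max_missed;
}
```

With a threshold of 3, three consecutive ticks without a new sequence number trigger the reset.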
After 4 to 5 consecutive reset tests, the PFE FW outputs the error log below.
They use the following sequence:
/* 1. Disable Interrupt router for GIC500 */
disable_a53_interrupt_routing();
/* 2. Remove the PFE logical interface */
/* 3. Flush the slave HIF internal Rx bd */
/* 4. Clear PFE Port coherency register */
/* 5. Turn off A53 cores/partition */
/* 6. Enable Interrupt router for GIC500 */
/* 7. Turn on A53 cores/partition */
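The recovery sequence can be sketched as an ordered table of steps that stops at the first failure. All function names except `disable_a53_interrupt_routing()` are hypothetical placeholders, and even that one is only a stub here (its real return type is not shown). The stubs merely record the order in which they run; in a real build, step 3 would wrap `Eth_43_PFE_ChannelBdFlushRx()`, whose exact signature is not given in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Step identifiers, numbered as in the sequence above. */
enum recovery_step {
    STEP_DISABLE_IRQ_ROUTING = 1,
    STEP_REMOVE_LOGIF,
    STEP_FLUSH_HIF_RX_BD,
    STEP_CLEAR_PORT_COHERENCY,
    STEP_A53_OFF,
    STEP_ENABLE_IRQ_ROUTING,
    STEP_A53_ON
};

#define MAX_LOG 8
static int step_log[MAX_LOG]; /* order in which the steps ran */
static size_t step_count;

/* Records that a step ran; returns 0 (success). */
static int record(int step)
{
    assert(step_count < MAX_LOG);
    step_log[step_count++] = step;
    return 0;
}

/* Hypothetical stubs standing in for the real platform/MCAL calls. */
static int disable_a53_interrupt_routing(void) { return record(STEP_DISABLE_IRQ_ROUTING); }
static int remove_pfe_logif(void)              { return record(STEP_REMOVE_LOGIF); }
static int flush_slave_hif_rx_bd(void)         { return record(STEP_FLUSH_HIF_RX_BD); }
static int clear_pfe_port_coherency(void)      { return record(STEP_CLEAR_PORT_COHERENCY); }
static int a53_partition_off(void)             { return record(STEP_A53_OFF); }
static int enable_a53_interrupt_routing(void)  { return record(STEP_ENABLE_IRQ_ROUTING); }
static int a53_partition_on(void)              { return record(STEP_A53_ON); }

typedef int (*step_fn)(void);

/* Runs the recovery steps in order, stopping at the first failure.
 * Returns 0 on success, or the number (1..7) of the step that failed. */
int a53_ungraceful_recovery(void)
{
    static const step_fn steps[] = {
        disable_a53_interrupt_routing, remove_pfe_logif,
        flush_slave_hif_rx_bd,         clear_pfe_port_coherency,
        a53_partition_off,             enable_a53_interrupt_routing,
        a53_partition_on,
    };
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        if (steps[i]() != 0)
            return (int)(i + 1);
    }
    return 0;
}
```

Keeping the steps in a single table makes the ordering explicit and easy to change during testing.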
I set up a similar environment but could not reproduce the issue; the logs from my test are normal.
Could you help analyze under what conditions this error can occur?
I appreciate your support.
Best Regards,
Leo
This confuses us as well. From reading the code, I think this path is for the use case where one EMAC is connected to multiple HIF channels of the same driver. However, I believe this use case is no longer recommended, and it should definitely not be used in slave mode.
I can see that the following command does not return an error, but I don't know whether it could cause this error:
libfci_cli logif-update -i emac2 --egress hif
What confuses me is that tx_port = PFE_PHY_IF_ID_HIF. Under what condition is tx_port equal to PFE_PHY_IF_ID_HIF? I will ask the customer to share their PFE configuration.
BR,
Leo
Thanks for your answer. I will confirm with the customer how they configure the PFE.
BR,
Leo
Hello, this error comes from the FW function fp_replica_hif_rx_scaling(). It is the infinite-loop protection for the while loop in that function: the loop tries to find the next active HIF to send the packet to but cannot find one (the HIF was just disabled by the BD flush).
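To illustrate the failure mode, here is a minimal sketch of a bounded round-robin search for the next active HIF. It is not the actual fp_replica_hif_rx_scaling() code; `next_active_hif()`, the `enabled[]` array and `NUM_HIF` are assumptions for illustration. Once the BD flush disables the last active HIF, the loop bound is reached and the function reports failure, which corresponds to the point where the FW logs the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_HIF 4U /* assumption: number of HIF channels */

/* Returns the next enabled HIF channel after 'current' (round-robin, the
 * current channel itself is checked last), or -1 when none is enabled.
 * The 'guard' bound is the infinite-loop protection: without it the loop
 * would spin forever once the last active HIF has been disabled. */
int next_active_hif(const bool enabled[NUM_HIF], unsigned current)
{
    assert(enabled != NULL);
    assert(current < NUM_HIF);
    unsigned candidate = current;
    unsigned guard = 0;
    while (guard++ < NUM_HIF) {
        candidate = (candidate + 1U) % NUM_HIF;
        if (enabled[candidate])
            return (int)candidate;
    }
    return -1; /* no active HIF: this is where an error would be reported */
}
```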
To investigate further, check whether there is a specific option that enables traffic spreading. Is the Linux driver really configured with only one HIF? I'm not sure whether this FW code is always executed or only in specific configurations. The FW should determine that the frame is aimed at a disabled interface and drop it.
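The expected behavior, dropping a frame aimed at a disabled interface, could look like the following sketch. `hif_egress_action()`, `NUM_HIF` and the drop counter are hypothetical names; the point is that the target HIF is checked before any search for an alternative channel starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_HIF 4U /* assumption: number of HIF channels */

typedef enum { EGRESS_FORWARD, EGRESS_DROP } egress_action_t;

/* Decides what to do with a frame destined for HIF channel 'target'.
 * A frame aimed at a disabled (or nonexistent) channel is counted and
 * dropped instead of triggering a search for another channel. */
egress_action_t hif_egress_action(const bool enabled[NUM_HIF], unsigned target,
                                  uint32_t *drop_count)
{
    assert(enabled != NULL && drop_count != NULL);
    if (target >= NUM_HIF || !enabled[target]) {
        (*drop_count)++;
        return EGRESS_DROP;
    }
    return EGRESS_FORWARD;
}
```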
Another point is that the Linux driver must not be active while Eth_43_PFE_ChannelBdFlushRx() is called. If the kernel is in panic, this is guaranteed, but depending on how they trigger the heartbeat loss, it might not be. To ensure this, it may be safer to turn off the Linux partition before calling the BD flush API, or to make sure the kernel is in panic or another state where nothing is executed.
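This precondition can be expressed as a simple state check before the flush. The sketch assumes the M7 can tell which state the Linux partition is in; the `linux_state_t` values, `bd_flush_allowed()`, `guarded_bd_flush()` and the flush stub are hypothetical, with the stub standing in for a wrapper around Eth_43_PFE_ChannelBdFlushRx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the Linux (A53) partition state as seen from the M7. */
typedef enum {
    LINUX_RUNNING,      /* kernel and PFE slave driver may still access the HIF */
    LINUX_HUNG,         /* heartbeat lost, but the cores may still execute code */
    LINUX_PANIC,        /* kernel panic: nothing else is executed */
    LINUX_PARTITION_OFF /* A53 cores/partition turned off */
} linux_state_t;

/* True only when the Linux driver cannot be active during the flush. */
bool bd_flush_allowed(linux_state_t s)
{
    return s == LINUX_PANIC || s == LINUX_PARTITION_OFF;
}

/* Placeholder for a wrapper around Eth_43_PFE_ChannelBdFlushRx(). */
static int hif_rx_bd_flush_stub(void)
{
    return 0;
}

/* Calls 'flush' only if the precondition holds; returns -1 otherwise so the
 * caller can turn off the A53 partition first and retry. */
int guarded_bd_flush(linux_state_t s, int (*flush)(void))
{
    assert(flush != NULL);
    if (!bd_flush_allowed(s))
        return -1;
    return flush();
}
```

Note that a lost heartbeat alone (`LINUX_HUNG`) is deliberately not enough to allow the flush.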
I assume they use bridge mode, and it looks like traffic is flowing during the recovery process. Did you set this up in your environment as well?