eTSEC (P1020 & P2020) stops receiving

pvilian · 09-20-2018

Hello

I have a problem with the eTSEC controller (on both P1020 and P2020): under heavy load with malformed packets (mostly CRC errors, but also short-frame errors), it stops receiving and does not recover even after the malformed packets stop arriving. No more interrupts are generated, and the RBPTR0 register (RxBD ring 0 pointer) is no longer updated. I am using the Linux "gianfar" driver, which configures two Rx queues on the P1020 (it supports two groups, one group per core) and only one Rx queue on the P2020. However, the problem occurs on both eTSEC versions.

The only way to recover Rx functionality is to stop Rx DMA (MACCFG1[Rx_EN] = 0) and start it again (MACCFG1[Rx_EN] = 1).
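The recovery step amounts to clearing and then setting a single MACCFG1 bit while leaving the others untouched. The sketch below is a minimal illustration, assuming the Rx_EN mask 0x00000004 that the Linux gianfar driver defines as MACCFG1_RX_EN. The reg_ops hooks, the simulated register, and etsec_rx_restart() are hypothetical names; on real hardware the hooks would wrap ioread32be()/iowrite32be() on the mapped MACCFG1 register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Rx_EN mask as defined in the Linux gianfar driver
 * (MACCFG1_RX_EN); check it against the P1020/P2020 reference manual. */
#define MACCFG1_RX_EN 0x00000004u

/* Hypothetical register-access hooks. On hardware they would wrap
 * ioread32be()/iowrite32be() on the mapped MACCFG1 register. */
struct reg_ops {
    uint32_t (*read)(void *ctx);
    void (*write)(void *ctx, uint32_t val);
    void *ctx;
};

/* Stop the receiver (Rx_EN = 0), then restart it (Rx_EN = 1),
 * leaving every other MACCFG1 bit as it was. */
static void etsec_rx_restart(const struct reg_ops *ops)
{
    assert(ops != NULL && ops->read != NULL && ops->write != NULL);
    uint32_t v = ops->read(ops->ctx);
    ops->write(ops->ctx, v & ~MACCFG1_RX_EN);
    /* A real driver may need to wait for the receiver to go idle here. */
    ops->write(ops->ctx, v | MACCFG1_RX_EN);
}

/* Simulated register (test double) that records every write. */
#define SIM_LOG_LEN 8
struct sim_reg {
    uint32_t value;
    uint32_t log[SIM_LOG_LEN];
    int nwrites;
};

static uint32_t sim_read(void *ctx)
{
    return ((struct sim_reg *)ctx)->value;
}

static void sim_write(void *ctx, uint32_t val)
{
    struct sim_reg *r = ctx;
    if (r->nwrites < SIM_LOG_LEN)
        r->log[r->nwrites++] = val;
    r->value = val;
}
```

Passing the register accessors as hooks keeps the stop/start sequence testable without hardware. With a starting value of 0x00000005 (Rx_EN and Tx_EN set), the sequence writes 0x00000001 and then 0x00000005.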

This investigation was triggered by customer reports in which it is Tx, not Rx, that stops under heavy load with malformed packets. I was not able to reproduce the Tx stop, but we can easily reproduce the Rx stop.

A possible hint to the actual root cause may be RSTAT bit 6, which the eTSEC sets just before the error condition occurs (RSTAT reads 0x02000080 when the problem occurs).
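RSTAT uses Power Architecture bit numbering, in which bit 0 is the most significant bit. The helpers below (illustrative names) decode the reported value 0x02000080 into bits 6 and 24. Assuming the usual eTSEC layout, where RXF0 is bit 24 (mask 0x00000080, matching gianfar's RSTAT_CLEAR_RXF0), the only unexplained bit is bit 6, mask 0x02000000.

```c
#include <assert.h>
#include <stdint.h>

/* Power Architecture register diagrams number bits from the MSB:
 * bit 0 is 0x80000000 and bit 31 is 0x00000001. */
static uint32_t ppc_bit_mask(unsigned n)
{
    assert(n < 32);
    return 1u << (31u - n);
}

/* Store the MSB-first bit numbers of all set bits in reg into out[]
 * (in ascending order) and return how many were found. */
static unsigned ppc_set_bits(uint32_t reg, unsigned out[32])
{
    unsigned count = 0;
    for (unsigned n = 0; n < 32; n++) {
        if (reg & ppc_bit_mask(n))
            out[count++] = n;
    }
    return count;
}
```

Counting from the LSB instead would report bits 25 and 7 for the same value, so the MSB-first convention is the one consistent with "bit 6".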

That bit is not documented, and I would be very interested to know what it means when the hardware sets it. It is a write-1-to-clear (w1c) bit, as I can clear it just by writing 1 to it.
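Because RSTAT is w1c, clearing only bit 6 means writing just its mask. A read-modify-write that writes back the value read would also clear every other pending status bit (here, 0x80). The model below illustrates this. RSTAT_BIT6 and w1c_write() are illustrative names, and on hardware the write would go to the mapped RSTAT register instead of a variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask of the undocumented RSTAT bit 6 (MSB-first numbering). */
#define RSTAT_BIT6 0x02000000u

/* Simulated write to a w1c register: each 1 in 'written' clears the
 * matching status bit, and 0s leave bits unchanged. */
static void w1c_write(uint32_t *reg, uint32_t written)
{
    assert(reg != NULL);
    *reg &= ~written;
}
```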

Looking at the hardware counters implemented in the eTSEC, I can see that both RPKT and RDRP keep incrementing (by the same amount!), which suggests to me that the MAC is still receiving frames but dropping them because Rx DMA is hung.
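The counter observation could be turned into a periodic check. This is a hypothetical heuristic, not a documented detection method. The idea is to sample RPKT, RDRP and RBPTR0 at intervals and flag a stall when frames keep arriving, every one of them is dropped, and the ring pointer does not move. The struct and function names are illustrative, and the sketch assumes 32-bit counters; narrower hardware counters would need masking before subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One sample of the values discussed above: the RPKT and RDRP
 * counters and the RBPTR0 ring pointer. */
struct rx_sample {
    uint32_t rpkt;
    uint32_t rdrp;
    uint32_t rbptr0;
};

/* Hypothetical stall check between two samples. Unsigned subtraction
 * tolerates a single wrap of a 32-bit counter (assumption). */
static bool rx_looks_stalled(const struct rx_sample *prev,
                             const struct rx_sample *cur)
{
    assert(prev != NULL && cur != NULL);
    uint32_t d_rpkt = cur->rpkt - prev->rpkt;
    uint32_t d_rdrp = cur->rdrp - prev->rdrp;
    return d_rpkt > 0 && d_rpkt == d_rdrp && cur->rbptr0 == prev->rbptr0;
}
```

If such a check proved reliable, a driver could poll it from a timer and apply the MACCFG1[Rx_EN] toggle described above. This approach is untested against the failure reported here.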

I couldn't find a documented errata for this behavior.

I need a way to recover Rx functionality without manual intervention. Some performance degradation is of course expected while packets are malformed (due to collisions), but our customers expect Rx to work normally once malformed packets stop arriving.

Because Rx is asynchronous, I don't see a way for the driver to know when to restart the interface: no Rx error is triggered when this condition occurs.

Any help is appreciated.

Best regards,

Vilian

norihiromichiga · 03-25-2024

Hello Vilian,

I'm an FAE at a distributor of NXP products in Japan. I hope this message reaches you.

We are facing an unexpected Rx stop issue on the P2020, and we are interested in the issue you reported in this thread.

If possible, could you explain your test conditions in more detail, including the actual malformed packets you used?

Regards,

Norihiro Michigami

AVNET

bpe · 09-24-2018

The observations described here are typical for problems with the interface
clocks. If the controller sees an irregularity on an input clock and takes
it as a false edge, the FIFO may lock. There is no way to detect this
situation by reading registers because the hardware "trusts" the clocks
and has no means to detect deviations from the expected waveforms.
Another reason for Rx lockups can be improper behaviour of Rx_DV,
e.g. spikes, incorrect transition timing, etc. The suggestion is
to make sure all MAC/PHY interface signals fully satisfy the requirements
of the processor Hardware Specification and the specification of the
MAC/PHY interface and are free of any irregularities.

Have a great day,
Platon
