I have a problem with the eTSEC controller (on both P1020 and P2020): under heavy load with malformed packets (mostly CRC errors, but also short-frame errors), it stops receiving and does not recover even after the malformed packets stop coming. No more interrupts are generated and the RBPTR0 register (RxBD ring 0 pointer) is no longer updated. I am using the Linux "gianfar" driver, which configures 2 Rx queues on P1020 (as it supports 2 groups, one group per core) and only one Rx queue on P2020. However, the problem occurs on both eTSEC versions.
The only way to recover Rx functionality is to stop Rx DMA (MACCFG1[Rx_EN] = 0) and start it again (MACCFG1[Rx_EN] = 1).
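For reference, the manual recovery described above amounts to the read-modify-write sequence below. This is only a minimal sketch under stated assumptions: `rx_en_toggle` is a hypothetical helper, the register is modelled as a plain pointer rather than accessed through the driver's own I/O accessors, and the `MACCFG1_RX_EN` mask value is an assumption that should be checked against the reference manual.

```c
#include <stdint.h>

/* Assumed mask for MACCFG1[Rx_EN]; verify against the reference manual. */
#define MACCFG1_RX_EN 0x00000004u

/*
 * Hypothetical helper: clear Rx_EN, then set it again.
 * 'maccfg1' stands in for the memory-mapped MACCFG1 register; a real
 * driver would use its own register accessors and locking here.
 */
static void rx_en_toggle(volatile uint32_t *maccfg1)
{
    uint32_t v = *maccfg1;

    *maccfg1 = v & ~MACCFG1_RX_EN;  /* Rx_EN = 0 */
    *maccfg1 = v | MACCFG1_RX_EN;   /* Rx_EN = 1 */
}
```

All other MACCFG1 bits are written back unchanged, so only the receive enable is cycled.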
All these investigations were triggered by customer issues in which it is Tx that stops under heavy load with malformed packets. I was not able to reproduce the Tx stop, but we can easily reproduce the Rx stop.
A possible hint at the actual root cause may be RSTAT bit 6, which the eTSEC sets just before the error condition occurs (RSTAT has the value 0x02000080 when the problem occurs).
That bit is not documented, and I would be very interested to know what it means when set by the hardware. It is a w1c (write-1-to-clear) bit, as I can clear it just by writing 1 to it.
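To make the bit position explicit: the observed value 0x02000080 matches "bit 6" only if bits are numbered from the most significant end (bit 0 = MSB), which gives the mask 0x02000000. The sketch below shows that arithmetic; the helper names `msb0_bit32` and `rstat_bit6_set` are hypothetical and not part of the gianfar driver.

```c
#include <stdint.h>

/* Mask for bit n of a 32-bit register when bit 0 is the MSB. */
static uint32_t msb0_bit32(unsigned int n)
{
    return UINT32_C(1) << (31u - n);
}

/* Hypothetical check: is the undocumented RSTAT bit 6 set in 'rstat'? */
static int rstat_bit6_set(uint32_t rstat)
{
    return (rstat & msb0_bit32(6)) != 0;
}
```

Since the bit behaves as w1c, writing `msb0_bit32(6)` back to RSTAT should clear it without touching the other status bits, as any bits written as 0 are left alone.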
Looking at the hardware counters implemented in the eTSEC, I can see that both RPKT and RDRP are incremented (by the same amount!), which suggests to me that the MAC is still receiving frames but is dropping them because Rx DMA is hung.
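The symptom described above (RPKT and RDRP advancing by the same amount while RBPTR0 stays put) can be written down as a simple check between two samples. This is a rough sketch, not driver code: the `rx_snapshot` struct and `rx_looks_stalled` function are hypothetical, and it assumes the counters are read as full 32-bit values that did not wrap between the two samples.

```c
#include <stdbool.h>
#include <stdint.h>

/* One sample of the values mentioned above (hypothetical layout). */
struct rx_snapshot {
    uint32_t rpkt;    /* RPKT counter value */
    uint32_t rdrp;    /* RDRP counter value */
    uint32_t rbptr0;  /* RBPTR0 (RxBD ring 0 pointer) */
};

/*
 * Returns true when frames arrived between the two samples, every one
 * of them was counted as dropped, and the ring pointer did not move.
 * Assumes no counter wrap between 'prev' and 'cur'.
 */
static bool rx_looks_stalled(const struct rx_snapshot *prev,
                             const struct rx_snapshot *cur)
{
    uint32_t received = cur->rpkt - prev->rpkt;
    uint32_t dropped = cur->rdrp - prev->rdrp;

    return received != 0 && dropped == received &&
           cur->rbptr0 == prev->rbptr0;
}
```

This is only a heuristic: an idle link gives no signal either way, and a ring that happens to wrap to the same RBPTR0 value between samples would look identical to a stall.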
I couldn't find a documented erratum for this behavior.
I need a way to recover Rx functionality without manual intervention. Of course, performance degradation is expected while packets are malformed (due to collisions), but our customers expect Rx to work well once malformed packets stop coming.
As Rx is asynchronous, I don't see a way for the driver to restart the interface: no Rx error is triggered when this condition occurs.
Any help is appreciated.