eTSEC (P1020 & P2020) stops receiving


pvilian
Contributor I

Hello

I have a problem with the eTSEC controller (on both P1020 and P2020): under heavy load with malformed packets (mostly CRC errors, but also short-frame errors), it stops receiving and does not recover even after the malformed packets stop arriving. No more interrupts are generated, and the RBPTR0 register (RxBD ring 0 pointer) is no longer updated. I am using the Linux "gianfar" driver, which configures two Rx queues on the P1020 (as it supports two groups, one group per core) and only one Rx queue on the P2020. However, the problem occurs on both eTSEC versions.

The only way to recover Rx functionality is to stop Rx DMA (MACCFG1[Rx_EN] = 0) and start it again (MACCFG1[Rx_EN] = 1).
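The manual recovery described above can be sketched as a small helper. This is only an illustration: `etsec_rx_restart` is a hypothetical name, the mask value 0x00000004 is assumed to match `MACCFG1_RX_EN` in the Linux gianfar header, and a real driver would use the platform's MMIO accessors and would probably need to wait for the receiver to go idle between the two writes.

```c
#include <stdint.h>

/* Assumed bit position of MACCFG1[Rx_EN] (as in the Linux gianfar header). */
#define MACCFG1_RX_EN 0x00000004u

/* Hypothetical helper: clear Rx_EN, then set it again.
 * 'maccfg1' points at the MACCFG1 register (or a stand-in for testing).
 * All other bits of the register are preserved. */
static void etsec_rx_restart(volatile uint32_t *maccfg1)
{
    uint32_t v = *maccfg1;

    *maccfg1 = v & ~MACCFG1_RX_EN;   /* MACCFG1[Rx_EN] = 0 */
    *maccfg1 = v | MACCFG1_RX_EN;    /* MACCFG1[Rx_EN] = 1 */
}
```

Passing a pointer rather than a fixed address keeps the sketch testable against a plain variable.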

This investigation was triggered by customer reports in which it is Tx that stops under heavy load with malformed packets. I was not able to reproduce the Tx stop, but we can easily reproduce the Rx stop.

 

A possible hint to the actual root cause may be RSTAT bit 6, which the eTSEC sets just before the error condition occurs (RSTAT reads 0x02000080 when the problem occurs).
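For readers less used to Power Architecture bit numbering, where bit 0 is the most significant bit, the following sketch decomposes a register value into its set bit numbers. The helper names are hypothetical. Applied to 0x02000080 it yields bits 6 and 24; as far as I know, the mask 0x00000080 (bit 24) is what the Linux gianfar header calls `RSTAT_RXF0`, but treat that mapping as an assumption to check against the reference manual.

```c
#include <stdint.h>

/* Mask for bit 'n' in MSB-first numbering: bit 0 is 0x80000000. */
static uint32_t ppc_bit(unsigned int n)
{
    return UINT32_C(1) << (31u - n);
}

/* Store the MSB-first bit numbers that are set in 'val' into 'bits'
 * (ascending order) and return how many were found. */
static int ppc_bits_set(uint32_t val, unsigned int bits[32])
{
    int count = 0;

    for (unsigned int n = 0; n < 32; n++)
        if (val & ppc_bit(n))
            bits[count++] = n;
    return count;
}
```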

That bit is not documented, and I would be very interested to know what it means when set by the hardware. It is a w1c (write-1-to-clear) bit, as I can clear it just by writing 1 to it.

Looking at the hardware counters implemented in the eTSEC, I can see that both RPKT and RDRP are incremented (by the same amount!), which suggests to me that the MAC is still receiving frames but dropping them because the Rx DMA has hung.
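The symptom pattern described so far (RPKT and RDRP advancing by the same amount while RBPTR0 stays put) could in principle be turned into a periodic software check. The sketch below is only a heuristic built on those observations, not a confirmed detection method: the struct and function names are hypothetical, the counter values are assumed to have been read by the caller, and counter rollover is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the values discussed in the post. */
struct etsec_rx_sample {
    uint32_t rpkt;    /* RPKT counter value */
    uint32_t rdrp;    /* RDRP counter value */
    uint32_t rbptr0;  /* RBPTR0 (RxBD ring 0 pointer) */
};

/* Returns true when, between two samples, frames arrived, every one of
 * them was counted as dropped, and the ring pointer did not move.
 * Rollover of the hardware counters is ignored in this sketch. */
static bool etsec_rx_looks_stalled(const struct etsec_rx_sample *prev,
                                   const struct etsec_rx_sample *cur)
{
    uint32_t received = cur->rpkt - prev->rpkt;
    uint32_t dropped = cur->rdrp - prev->rdrp;

    return received != 0 &&
           dropped == received &&
           cur->rbptr0 == prev->rbptr0;
}
```

A periodic task could take a sample, compare it with the previous one, and toggle MACCFG1[Rx_EN] when the check fires; requiring the condition to hold over several consecutive intervals would reduce the chance of a false positive.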

I couldn't find a documented erratum for this behavior.

I need a way to recover Rx functionality without manual intervention. Of course, performance degradation is expected while packets are malformed (due to collisions), but our customers expect Rx to work well once malformed packets stop arriving.

Because Rx is asynchronous, I don't see how the driver could decide to restart the interface: no Rx error is raised when this condition occurs.

Any help is appreciated.

Best regards,

Vilian

1 Reply

bpe
NXP Employee

The observations described here are typical of problems with the interface
clocks. If the controller sees an irregularity on an input clock and takes
it as a false edge, the FIFO may lock up. There is no way to detect this
situation by reading registers, because the hardware "trusts" the clocks
and has no means of detecting deviations from the expected waveforms.
Another cause of Rx lockups can be improper behaviour of Rx_DV,
e.g. spikes, incorrect transition timing, etc. The suggestion is
to make sure all MAC/PHY interface signals fully satisfy the requirements
of the processor Hardware Specification and the specification of the
MAC/PHY interface, and are free of any irregularities.


Have a great day,
Platon
