Hello,
We've found that under heavy Ethernet rx load, the MCF5485 DMA, which we use to receive frames from the FEC, sometimes fails to update the next rx BD in the table. That is, the frame is received, but the BD status and length fields remain unchanged. We use the latest MCD DMA API from Freescale (v0.3, 2004-04-26) to work with the DMA.
To demonstrate the issue, here are snapshots of the relevant rx BDT area, taken on entry to three consecutive rx interrupt handler calls:
- last rx (OK):
BD # | BD [ST & LEN] | BD [ADR] | Comments |
---|---|---|---|
143 | 0x18800252 | 0x19EF650 | Just received frame |
144 | 0x90000600 | 0x19EFE50 | Wait for the next frame here |
145 | 0x90000600 | 0x19F0650 |
- next rx ('missed' frame):
BD # | BD [ST & LEN] | BD [ADR] | Comments |
---|---|---|---|
143 | 0x08800252 | 0x19EF650 | Previously received and processed frame |
144 | 0x90000600 | 0x19EFE50 | Frame was actually received into the buffer at [ADR], but [ST & LEN] wasn't updated!!! |
145 | 0x90000600 | 0x19F0650 |
- next rx (after miss):
BD # | BD [ST & LEN] | BD [ADR] | Comments |
---|---|---|---|
143 | 0x08800252 | 0x19EF650 | Previously received and processed frame |
144 | 0x90000600 | 0x19EFE50 | 'Missed' frame |
145 | 0x18800223 | 0x19F0650 | Just received frame (after the missed one) |
Observations are as follows:
- the DMA RX interrupt does fire when the 'missed' frame is received,
- the 'missed' rx frame itself is actually written to memory (into the buffer pointed to by the BD),
- the FEC MIB counters show no errors when a frame is 'missed',
- a 'miss' may happen at any location in the BDT (we have 192 rx BDs in the table),
- switching the D-Cache off doesn't help (though we locate the BDT and buffers in a non-cacheable area anyway),
- moving the BDT from external SDRAM to local SRAM doesn't help,
- the FEC driver and the relevant rx BD processing code have been reviewed and rechecked ~1000 times.
Any thoughts? Has anyone else run into this FEC/MCD behaviour?
Thanks in advance,
Yuri
Have you checked the Chip Errata? The FEC suffers from SECF064 (early silicon), SECF067, SECF010, SECF175.
You might be suffering from:
SECF175: Simultaneous FIFO Read/Write Results in Data Corruption
- Fix Plan: Will not be fixed
- Workaround: Use TCP/IP to recover from the data corruption (doesn't apply for your problem though)
Try and make sure your FIFO pointers are aligned if you can.
Where is the "BD" documented in the Manual? Is it in the FEC, DMA or FIFO section? From the Reference Manual I can't work out in my head how the FEC works at all. The FEC chapter says that the "31.4.2.1 Receive Frame Status Word (RFSW)" is in the FIFO after the data. But that isn't the BD (although some bits from there might end up in the BD).
I've seen various "simultaneous access" bugs in hardware before. I suspect that some other asynchronous operation like handling a Transmit interrupt (or some other DMA operation) is reading or writing a controller register at the same time that the hardware is about to try and update the BD and it all goes wrong. I'd suggest keeping an in-memory circular event log, timestamped to the microsecond (or better). I suspect you'll find the "DMA Rx Interrupt" that happens on the "missed frame" happening just after some other operation where your code read or wrote a FEC, FIFO or DMA register. Look for a "time correlation" and then try to make it worse (like reading a register 20 times instead of once).
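To make the event-log suggestion concrete, here is a minimal sketch in C. Everything in it is hypothetical (the event IDs, the `evlog_*` names, and especially `evlog_now_us()`, which is only a placeholder you would replace with a read of a free-running hardware timer). It assumes logging is cheap enough to call from your interrupt handlers and register accessors.

```c
#include <stdint.h>
#include <stdio.h>

#define EVLOG_SIZE 256u   /* power of two, so the index wraps with a mask */

enum evlog_id {           /* hypothetical event IDs, choose your own */
    EV_RX_IRQ = 1,
    EV_TX_IRQ,
    EV_REG_READ,
    EV_REG_WRITE
};

struct evlog_entry {
    uint32_t timestamp_us; /* time of the event in microseconds */
    uint16_t id;           /* one of enum evlog_id */
    uint32_t arg;          /* free-form: BD index, register value, ... */
};

static struct evlog_entry evlog[EVLOG_SIZE];
static volatile uint32_t evlog_head;   /* total number of events logged */

/* PLACEHOLDER time source: replace with a hardware timer read on target. */
static uint32_t fake_clock_us;
static uint32_t evlog_now_us(void) { return fake_clock_us; }

/* Record one event, overwriting the oldest entry once the log is full.
 * Not reentrant: if interrupts can nest, mask them around this call. */
static void evlog_add(uint16_t id, uint32_t arg)
{
    struct evlog_entry *e = &evlog[evlog_head & (EVLOG_SIZE - 1u)];
    e->timestamp_us = evlog_now_us();
    e->id = id;
    e->arg = arg;
    evlog_head++;
}

/* Print the retained events, oldest first. */
static void evlog_dump(void)
{
    uint32_t count = (evlog_head < EVLOG_SIZE) ? evlog_head : EVLOG_SIZE;
    uint32_t i;
    for (i = evlog_head - count; i != evlog_head; i++) {
        const struct evlog_entry *e = &evlog[i & (EVLOG_SIZE - 1u)];
        printf("%10lu us  id=%u  arg=0x%08lX\n",
               (unsigned long)e->timestamp_us, (unsigned)e->id,
               (unsigned long)e->arg);
    }
}
```

Call `evlog_add()` at the top of each interrupt handler and around each FEC, FIFO or DMA register access, then call `evlog_dump()` when a 'missed' BD is spotted and look at what happened in the few microseconds before the rx interrupt.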
If you can characterise the problem reliably enough to recognise it, you probably have enough information to confidently work around it. If you want to report it to Freescale you may need to provide software demonstrating the problem running on their evaluation board.
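As a starting point for recognising the problem in software, here is a rough sketch in C based only on the snapshots in the original post. It assumes that the upper 16 bits of the [ST & LEN] word are status and the lower 16 bits are length, and that the top bit (0x80000000) means "still waiting for a frame". Both are inferred from the posted values and should be checked against the MCD DMA API headers. The `rx_bd` struct and the function names are hypothetical, not part of any Freescale API.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BD_COUNT  192u          /* ring size quoted in the original post */
#define RX_BD_EMPTY  0x80000000u   /* ASSUMPTION: "waiting for frame" bit,
                                      inferred from the posted snapshots */

/* Hypothetical BD layout matching the two columns in the snapshots. */
struct rx_bd {
    volatile uint32_t st_len;  /* ASSUMED: status in bits 31..16, length in 15..0 */
    volatile uint32_t addr;    /* buffer address */
};

static inline uint16_t bd_status(uint32_t st_len)
{
    return (uint16_t)(st_len >> 16);
}

static inline uint16_t bd_length(uint32_t st_len)
{
    return (uint16_t)(st_len & 0xFFFFu);
}

/* Returns true when the BD at index cur still looks empty while the BD
 * after it has already been filled, which is the pattern shown in the
 * third snapshot. It can only fire once a later frame has arrived. */
static bool rx_bd_looks_skipped(const struct rx_bd *ring, uint32_t cur)
{
    uint32_t next = (cur + 1u) % RX_BD_COUNT;
    bool cur_empty  = (ring[cur].st_len  & RX_BD_EMPTY) != 0u;
    bool next_empty = (ring[next].st_len & RX_BD_EMPTY) != 0u;
    return cur_empty && !next_empty;
}
```

With the posted values, 0x18800252 splits into status 0x1880 and length 0x0252, and the check fires for BD 144 only in the third snapshot. Note that the true length of a skipped frame is not available from the BD, so a workaround would have to get it some other way or drop the frame and let the upper layers retransmit.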
Search for "MCF5485 FEC" and go through some of the other reports. Note that the MCF5485 FEC looks to be derived from the MPC5200 chip, so it would be worth reading that chip's Manual, Errata and any posts on problems with that chip too.
https://community.freescale.com/message/20009
https://community.freescale.com/message/12666#12666
https://community.freescale.com/message/441783#441783
Tom