I'd like to say "congratulations on finding that problem", but I don't believe it. You may have made the problem go away by changing your code, but your explanation isn't the real fix.
> I have been unable to find in the MCF5235 documentation any mention
> that the FEC and CPU might clash when reading BDs.
That's because they can't.
You initially had the the buffers in external SDRAM, and that is certainly single-ported. The CPU gains bus access through the Bus Arbiter and performs a memory cycle. Before the FEC can read either a Buffer Descriptor or a Buffer, it has to go through the same thing. It has to gain exclusive access. There is no such thing as "attempting to read and failing". These are serious chips with a long history of professional and complicated bus arbitration.
The CPU is accessing SDRAM all of the time. There can be NO difference between the FEC wanting access to a BD and the CPU wanting access to that memory address, OR ANY OTHER MEMORY ADDRESS. The CPU is accessing memory all the time, so if there was any sort of "collision", the address wouldn't matter.
Unless you had the data cache enabled, in which you could have stale data problems, but you don't have it enabled.
So that can't be the problem.
Unless... If you have the Buffer Descriptors in the on-chip SRAM then it is possible that the CPU can be accessing it via its direct port while the FEC is doing likewise through the "Back Door". But the "any address" still applies. Also, if you read "Table 6-1. RAMBAR Field Descriptions" you'll find Freescale thought about this. You can select which device (CPU or FEC) has priority in the upper and lower 32k banks of the SRAM. The lower priority device WAITS until the other one has finished. So it can't "collide" there either.
Freescale have got this wrong when they've bought in some hardware designs that don't follow their own well-established practices. For instance, the LCD Controller in the MCF5329 has a "read to clear" interrupt status register. If this is read at the same time that the hardware is trying to set a new status bit and that access "collides", then the chip update fails and you get missing interrupts. But that's a shared register between a state machine and the CPU and not the arbitrated-for memory where the BDs are. Since it was bought in it didn't follow the standard "write one to clear" and wasn't changed to match. Details on that here:
https://community.freescale.com/message/85870#85870
One way it could go wrong would be if your "looping through the BDs" code was not just READING them, but was WRITING them at the same time. Specifically, if you temporarily wrote the BD with the "E" bit clear during that loop it would cause this problem as the FEC would read the BD, see the "E" bit clear and then overflow. It has lost the packet and it doesn't look again until you hit RDAR. Were you doing that as part of your "flag as processed" handling?
Your code changes are valid. That's the normal way to do it. You have your pointers starting from the last one, and looping "forward" until it has processed all of the non-empty BDs. And you don't touch them at all between the time you write to RDAR and when you get a receive interrupt. Or you can, but you probably shouldn't.
> flagged as having already been processed
Flagged? Are you storing this "flag" in a separate bit in the BD? I'd hope RO1 or RO2? You shouldn't need this as you should only need one pointer (or array index) and the "E" bit.
I'll warn you of another thing people get wrong. There's a simple and logical way to handle interrupts in most Motorola/Freescale CPUs. It is quite simple and magic. A lot of other manufacturers have got this wrong, and people who are familiar with these broken solutions tend to write broken interrupt code on the Coldfire parts. I'm not saying you've done this (yet :-), and this probably isn't your problem, but it may cause you other problems, so. It may also help others reading this.
A broken CPU design that has you read an interrupt status register and then write it back with a bit cleared to ack that interrupt causes all sorts of problems. The controller might have set another bit between your read and write, so you can then lose that bit - unless they add some complicated "we won't clear bits we've set between your read and write" which is opaque and magic, goes wrong and doesn't allow multiple readers or writers, or the read/write to be interrupted. "Read to clear" is also a pain as you have to remember all the set bits even if you don't want to handle them then. And if the register has interrupt bits from different peripherals (I'm looking at you, Zilog) then it gets horrible.
The Freescale way is "write a ONE to a bit to clear it". This is NOT clear in the FEC chapter. The EIR bits are marked as "RW" when they should be marked as "R W1C" as is done in manuals for other products. You have to have closely read the first paragraph in "19.2.4.1 Ethernet Interrupt Event Register (EIR)" to spot this if you weren't expecting it.
So your FEC interrupt routine should be coded as:
uint32_t my_eir = EIR; /* Read all status bits */
EIR = my_eir; /* Clear all the SET ones we've just read *FIRST* */
/* NOW handle the set bits we've just read. */
if (my_eir & EIR_RXF) handle_rxf();
if (my_eir & EIR_TXF) handle_txf();
if (my_eir & EIR_ERROR_BITS) handle_errors();
And so on for any other bits you're interested in.
You don't clear the interrupt bits AFTER servicing them as that causes a race condition that will miss them occasionally..
You can always write individual "one" bits back to EIR just before the individual service routines, but the above example is better.
The same applies to almost all interrupt service routines on these chips (except the MCF5329 LCDC one).
Tom