How can we prevent a receive fifo overrun on an MCF5235 FEC

Ahlan
Contributor III

We have an MCF5235 @ 150MHz connected to a Texas TLK100 transceiver using MII

Everything works "mostly"

Occasionally we lose an Ethernet frame.

The OV bit is set in the receive buffer and the count of "receive fifo overruns" is incremented.

This happens even on 10Mb Ethernet.

The receive buffer is in DRAM.

 

The fifo is at its default setting of 0x500 for receive (out of 0x600)

 

Has anyone any idea what we must do in order to prevent receive fifo overruns?

 

Best wishes,

Ahlan

Ahlan
Contributor III

Dear Tom,

Thank you for all your help and many suggestions.

Fortunately (at long last) I have been able to solve the problem.

It appears that if the CPU reads a BD at the same time as the FEC attempts to read the same BD, the FEC is locked out. Instead of retrying it sets the OV bit in the current BD (the one prior to the one it attempted to read) and discards the packet.

The clue to this behaviour was the fact that it was always the last BD that was flagged OV irrespective of how many BDs there were.

The problem in my code was that when we received a receive interrupt I looped through the BDs looking for BDs that were not empty and not flagged as having already been processed. In a similar manner, when buffers were freed, I looped through the BDs looking for a non-empty processed BD to reset to Empty. These loops always started with the first BD and worked forward. Therefore at some point, albeit very rarely, the FEC having just used the last BD would attempt to access the first BD in the ring at the same time as the CPU was looping through the BDs.

Because the first BD was always read as a starting point for these searches it was the most likely to clash with the FEC.

My solution is that instead of looping through the ring from start to finish, I now have two pointers: one pointing to the next BD expected to have data and another pointing to the next BD to be reused (i.e. set to Empty).

This pointer system works because the FEC does not search the BDs looking for the next free but has an internal pointer that it simply increments. My new algorithm takes this into account as it knows that it too only has to start searching from the point it left off.
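For illustration, that two-pointer scheme can be sketched as below. This is my sketch, not the actual driver: the BD layout is simplified, the names (`fec_rx_service`, `fec_rx_recycle`) are invented, and on real hardware the recycle step would also write RDAR.

```c
#include <stdint.h>

#define NUM_RXBDS 8
#define RX_BD_E   0x8000u   /* Empty: BD owned by the FEC       */
#define RX_BD_W   0x2000u   /* Wrap: last BD in the ring        */

/* Simplified receive buffer descriptor (illustrative layout). */
struct fec_bd {
    volatile uint16_t status;
    volatile uint16_t length;
    volatile uint32_t buffer;   /* physical address of data buffer */
};

static struct fec_bd rx_ring[NUM_RXBDS];
static unsigned rx_next;    /* next BD the FEC is expected to fill      */
static unsigned rx_free;    /* next processed BD to hand back (Empty)   */

/* Consume filled BDs starting at rx_next only, so the CPU never scans
 * BDs the FEC may be about to fetch. Returns the frames consumed. */
unsigned fec_rx_service(void)
{
    unsigned consumed = 0;
    while (consumed < NUM_RXBDS && !(rx_ring[rx_next].status & RX_BD_E)) {
        /* ...hand the frame in rx_ring[rx_next] to the stack here... */
        consumed++;
        rx_next = (rx_next + 1) % NUM_RXBDS;
    }
    return consumed;
}

/* Recycle one processed BD: mark it Empty, preserving the Wrap bit.
 * On real hardware a write to RDAR would follow this. */
void fec_rx_recycle(void)
{
    uint16_t wrap = rx_ring[rx_free].status & RX_BD_W;
    rx_ring[rx_free].status = (uint16_t)(RX_BD_E | wrap);
    rx_free = (rx_free + 1) % NUM_RXBDS;
}
```

Because both indices only ever advance, the CPU never touches the BD the FEC's own internal pointer is sitting on, which is the whole point of the fix.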

Using this algorithm I no longer have any Fifo overruns.

Unfortunately I have been unable to find in the MCF5235 documentation any mention that the FEC and CPU might clash when reading BDs.

Once again thank you for all your help,

Best wishes,

Ahlan

TomE
Specialist II

> How can we prevent a receive fifo overrun?

The simple answer is "don't connect it to a live Ethernet network".

What software drivers are you using? Are you using interrupts or polling (don't laugh, we used to do that)?

Check all of the FEC Errata items to make sure they're not causing your problems. The FEC has more problems at 10MHz than at 100MHz. If it is overflowing at 10MHz then it is more likely one of these bugs than a real "memory wasn't fast enough" overrun.

Check the Arbiter programming. Check the FEC has highest priority. See if M2_P_EN has any effect. Play around with other settings. Are your Descriptors in SRAM or DRAM? You should put them in SRAM if you can.

> The fifo is at its default setting of 0x500 for receive (out of 0x600)

What register values do you mean by that? I assume you have the EMRBR set to 0x600 (1536), but the FRSR defaults to 0x100 and a maximum value of 0x300. Do you mean "0x600 - 0x500 = 0x100", or is the "default setting" something in the software configuration?

> The OV bit is set in the receive buffer

Do you mean in the FEC's Receive Buffer DESCRIPTOR or are you referring to a software data structure that is reporting the overrun?

What happens if you don't read and clear the Buffer Descriptors in time? Some Ethernet controllers can tell you the difference between "didn't get the memory in time to write the data" and the separate "didn't have a free buffer to write the data into". The FEC seems to set the OV bit for both of these cases. So your problem might have been that your code didn't remove and free buffers fast enough. You should check for this condition by tracking the number of free descriptors you have. You can always increase the number of descriptors and buffers to give more time. How many do you have currently?

You should also look for bottlenecks or stalls in your code that are preventing it from emptying the receive rings in time. Ideally the rings should get serviced under interrupts with the buffers put on a receive queue, and free buffers put back on the ring from a pool.

> Occasionally we lose an Ethernet frame.

I suspect that occasionally your software is doing something longwinded and time consuming and isn't allowing the rings to be serviced. Also, it is likely that occasionally there is a "packet storm" on your network from other devices generating lots of Broadcast or Multicast packets, and this spike in network traffic is overloading your device. We've had "Avahi Multicast Storms" here with PCs telling each other about printers every now and then. You should run Wireshark and see what your network traffic is like around the time when you lost a message.

Can your embedded code handle back-to-back small Ethernet packets? A minimum-size frame occupies 672 bits on the wire, so at 100Mb/s one can arrive every 6.72us! You may not be able to service the interrupts that fast, let alone do anything useful. At 10Mb/s they may come in every 67us. If you only have 8 descriptors you'll overrun in 600us of not freeing them.
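As a sanity check on those numbers: a minimum-size frame is 64 bytes, plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap, i.e. 672 bits on the wire. A quick back-of-envelope helper (not driver code):

```c
/* Minimum-size Ethernet frame occupancy on the wire:
 * 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap. */
#define WIRE_BITS ((64 + 8 + 12) * 8)   /* 672 bits */

/* Interval between back-to-back minimum frames, in microseconds,
 * for a link speed given in Mbit/s (= bits per microsecond). */
double min_frame_interval_us(double mbit_per_s)
{
    return WIRE_BITS / mbit_per_s;
}
```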

What's wrong with losing messages? You should be using higher level protocols (TCP) that recover from this. You can lose Ethernet frames anywhere in the system.

Tom

Ahlan
Contributor III

>How can we prevent a receive fifo overrun?

Dear Tom – Thank you for your detailed reply. :-)

>The simple answer is "don't connect it to a live Ethernet network".

I am not connected to a live Ethernet network but a closed network in which we have full control of the traffic.

>What software drivers are you using? Are you using interrupts or polling (don't laugh, we used to do that)?

We have written all the software – so any bugs are entirely mine. We use interrupts.

> Check all of the FEC Errata items to make sure they're not causing your problems. The FEC has more problems at 10MHz than at 100MHz. If it is overflowing at 10MHz then it is more likely one of these bugs than a real "memory wasn't fast enough" overrun.

I have checked the MCF5235 errata (rev 5 6/2011) which includes fixes and known problems concerning the FEC

>Check the Arbiter programming. Check the FEC has highest priority. See if M2_P_EN has any effect. Play around with other settings. Are your Descriptors in SRAM or DRAM? You should put them in SRAM if you can.

On the MCF5235 descriptors "must reside in memory external to the FEC" The FEC is on-chip so this seems to preclude my loading the descriptors in SRAM. When I try to load the descriptors in SRAM the processor freezes and then I can't even use the BDI interface.

> The fifo is at its default setting of 0x500 for receive (out of 0x600)

>What register values do you mean by that? I assume you have the EMRBR set to 0x600 (1536), but the FRSR defaults to 0x100 and a maximum value of 0x300. Do you mean "0x600 - 0x500 = 0x100", or is the "default setting" something in the software configuration"?

I meant that the read-only FRBR returns 0x600. This I understand to indicate how much FIFO the FEC has. FRSR defaults to 0x500, which is the start address of the receive FIFO, therefore I assume that the receive FIFO is 0x600-0x500=0x100. It appears that bit 10 is always one, so the lowest value that can be set is 0x400, making the receive FIFO 0x200. I tried doing this but then I had lots of Tx FIFO underruns.
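Spelling out that arithmetic (this split is my reading of FRBR/FRSR, not a statement from the manual):

```c
/* Receive-FIFO size implied by the FRBR and FRSR values, on the
 * assumption that FRSR marks where the receive FIFO starts and
 * FRBR is the total FIFO the FEC reports. */
unsigned rx_fifo_bytes(unsigned frbr, unsigned frsr)
{
    return frbr - frsr;   /* 0x600 - 0x500 = 0x100 at the defaults */
}
```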

> The OV bit is set in the receive buffer

>Do you mean in the FEC's Receive Buffer DESCRIPTOR or are you referring to a software data structure that is reporting the overrun?

I mean bit 1 in the Receive Buffer descriptor.

>What happens if you don't read and clear the Buffer Descriptors in time? Some Ethernet controllers can tell you the difference between "didn't get the memory in time to write the data" and the separate "didn't have a free buffer to write the data into". The FEC seems to set the OV bit for both of these cases. So your problem might have been that your code didn't remove and free buffers fast enough. You should check for this condition by tracking the number of free descriptors you have. You can always increase the number of descriptors and buffers t give more time. How many do you have currently?

There are sufficient buffers available. The descriptor that overruns is always the last descriptor in the ring, i.e. it has bit 13 (W) set as well as OV. At this time the first descriptor in the ring is empty, i.e. bit 15 (E) is set in the word pointed to by ERDSR.

>You should also look for bottlenecks or stalls in your code that is preventing it from emptying the receive rings in time. Ideally the rings should get serviced under interrupts with the buffers put on a receive queue, and free buffers put back on the ring from a pool.

The ring is serviced by interrupts.

> Occasionally we lose an Ethernet frame.

> I suspect that occasionally your software is doing something longwinded and time consuming and isn't allowing the rings to be services. Also, it is likely that occasionally there is a "packet storm" on your network from other devices generating lots of Broadcast or Multicast packets, and this spike in network traffic is overloading your device. We've had "Avahi Multicast Storms" here with PCs telling each other about printers every now and then. You should run Wireshark and see what your network traffic is like around the time when you lost a message.

I don't think our software is busy doing anything longwinded that prevents the rings from being serviced. I find it odd that the overrun is always on the last descriptor of the ring irrespective of how many descriptors we have. There are no bursts of traffic.

>Can your embedded code handle back-to-back small Ethernet packets? The maximum rate at 100MHz is 672 bits or one frame every 6.72us! You may not be able to service the interrupts that fast let alone do anything useful. At 10MHz they may come in at 67us. If you only have 8 descriptors you'll overrun in 600us of not freeing them.

I haven't tried this but we have far more descriptors and buffers than we need and at the time of the overrun there are free buffers. The next descriptor in the ring (the first) is flagged as empty.

>What's wrong with losing messages? You should be using higher level protocols (TCP) that recover from this. You can lose Ethernet frames anywhere in the system.

Of course we can recover but I would rather not lose frames if it can be avoided. We wrote the code, so unless it is a hardware problem for which there isn't a workaround then in principle we should be able to fix it.

Best wishes,

Ahlan

TomE
Specialist II

> On the MCF5235 descriptors "must reside in memory external to

> the FEC" The FEC is on-chip so this seems to preclude my

> loading the descriptors in SRAM.

I quote "proof by doing it the other way". We have a product that has the descriptor ring AND the buffers in SRAM, and it works fine. I just had to give the code a serious reworking as it polled the ring (and got hung up for 200ms sometimes), used 1600-byte buffers and only had EIGHT descriptors and buffers. I recoded it to have 64 ring entries and 256-byte buffers, and then handled the chained data buffers.

We do it that way so we don't have to handle all the cache-flushing needed if the rings or buffers are in cached memory. I then found our code had the data cache unintentionally DISABLED, so turned that on and it got seriously faster.

It has always been recommended (all the way back to the MPC860) to put the descriptors in Static RAM so they can be accessed faster by the hardware. A main memory access on these things takes 20-30 CPU clock cycles. Or over 80 on the i.MX53 I'm coding now.

> When I try to load the descriptors in SRAM the processor freezes

You're doing something wrong. Fix it. Have you remembered to enable the "Back Door"? This is in the DMA chapter but I think the same applies to FEC access.

The backdoor enable bit must be set in the SCM RAMBAR as well as the secondary port valid bit in the Core RAMBAR in order to enable backdoor accesses from the DMA to SRAM.
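A sketch of what that setup could look like in an init routine. Every address and bit position here is a placeholder chosen to show the shape of the fix; take the real values from the SCM and RAMBAR chapters of the MCF5235 Reference Manual:

```c
#include <stdint.h>

/* PLACEHOLDER address and bit positions -- verify against the manual. */
#define SCM_RAMBAR      (*(volatile uint32_t *)0x40000008)
#define SCM_RAMBAR_BDE  (1u << 9)   /* backdoor enable, SCM side       */
#define CORE_RAMBAR_SPV (1u << 9)   /* secondary port valid, core side */
#define CORE_RAMBAR_V   (1u << 0)   /* region valid                    */

void map_sram_for_fec(uint32_t sram_base)
{
    /* Bus-side view: SCM RAMBAR with the backdoor enabled so the
     * FEC/DMA can reach the on-chip SRAM. */
    SCM_RAMBAR = sram_base | SCM_RAMBAR_BDE;

    /* CPU-side view: the core RAMBAR is a control register and must be
     * written with MOVEC (typically from startup assembly), SPV set:
     *
     *     move.l  #(SRAM_BASE | SPV | V), %d0
     *     movec   %d0, %rambar
     */
    (void)(sram_base | CORE_RAMBAR_SPV | CORE_RAMBAR_V);
}
```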

> The descriptor that overruns is always the last descriptor in the ring.

That's a clue. Ours doesn't do that. I'd suspect an end-of-ring or ring-wrap bug in your code or in the ring settings. It looks like you're not hitting the RDAR register when you're freeing ring entries and the wrap happens, or something equivalent.

> Of course we can recover but I would rather not lose frames if it can be avoided.

Yes, agreed. I hate mysteries. You find other nasty things when you start turning the rocks over. You should make this bug optional for regression testing the rest of the protocol stack :-).

It is always useful looking at something that works to see if you're doing something different that may be wrong. Google for "mcf5235 EMRBR" and see what you find. You should also have a look at FNET to see if they handle this controller.

Tom

Ahlan
Contributor III

I quote "proof by doing it the other way". We have a product that has the descriptor ring AND the buffers in SRAM, and it works fine. I just had to give the code a serious reworking as it polled the ring (and got hung up for 200ms sometimes), used 1600-byte buffers and only had EIGHT descriptors and buffers. I recoded it to have 64 ring entries and 256-byte buffers, and then handled the chained data buffers.


I am pleased to hear that you have an MCF5235 FEC working – this means that I must be doing something wrong rather than it being a hardware fault. So all I need to do is find out what. ;-)


We do it that way so we don't have to handle all the cache-flushing needed if the rings or buffers are in cached memory. I then found our code had the data cache unintentionally DISABLED, so turned that on and it got seriously faster.

It has always been recommended (all the way back to the MPC860) to put the descriptors in Static RAM so they can be accessed faster by the hardware. A main memory access on these things takes 20-30 CPU clock cycles. Or over 80 on the i.MX53 I'm coding now.


I don't think it is a speed issue. As we get the same problem at 10Mbs.


> When I try to load the descriptors in SRAM the processor freezes
You're doing something wrong. Fix it. Have you remembered to enable the "Back Door"? This is in the DMA chapter but I think the same applies to FEC access.

The backdoor enable bit must be set in the SCM RAMBAR as well as the secondary port valid bit in the Core RAMBAR in order to enable backdoor accesses from the DMA to SRAM.


Thanks for that Tip! :-)

It is such a long time ago since we set up the SDRAM that I totally forgot that we had to enable BDE and SPV in their respective RAMBARs. Is there any reason why we shouldn't always have these enabled – is there a performance penalty? I.e. should we only set these if we are using the FEC with descriptors in SDRAM?


We now have the descriptors in SDRAM – unfortunately we still have exactly the same problem :-(


> The descriptor that overruns is always the last descriptor in the ring.

That's a clue. Ours doesn't do that. I'd suspect an end-of-ring or ring-wrap bug in your code or in the ring settings. It looks like you're not hitting the RDAR register when you're freeing ring entries and the wrap happens, or something equivalent.


This is the weird thing. We always have the overruns on the last descriptor in the ring. Its control = 0x2802 and all the other descriptors are 0x8000, i.e. Empty!

If we had a timing problem then I am confident that the overrun would occur on other descriptors and not only the last one in the ring.

Every time I set the Empty bit in a descriptor I write to RDAR immediately and unconditionally afterwards.


I am at a loss as to what "end of ring" or ring-wrap bug I could have written.

Any ideas?

Ahlan

TomE
Specialist II

> > I then found our code had the data cache unintentionally DISABLED, so turned that on and it got seriously faster.

Some figures on that to let you know how important the cache is. We're logging data to flash and then later downloading that over Ethernet to a PC using UDP. The investigation was to get that download faster.

Initial measurements: 3.3MB/s (Ethernet theory limit is 12 MB/s at 100MHz)

Enable Data Cache: 5.3 MB/s

Enable RAM Bursting: 5.6 MB/s

Improve memcpy(): 6.7 MB/s (unrolled burst copy)

16 byte aligned destination memcpy(): 7.0 MB/s

Fix inefficient pthread code: 8.0 MB/s

Rewrite UDP Checksum in assembly: 9.0 MB/s

> I totally forgot that we had to enable BDE and SPV in their respective RAMBAR.

> Is there any reason why we shouldn't always have these enabled – is there a performance penalty?


None that I know of. They may be present to support "security" in some platforms to stop someone using DMA to read a block of memory holding passwords or something.


> We always have the overruns on the last descriptor in the ring. Its control = 0x2802 and all the other descriptors are 0x8000. ie Empty


We set our rings up as:

rx_nbuf[i].status = RX_BD_E;                 0x8000

rx_nbuf[NUM_RXBDS - 1].status |= RX_BD_W;    0x8000 | 0x2000

0x2802 is RX_BD_W | RX_BD_L | RX_BD_OV, as expected.

Are your BDs aligned on a 16 byte boundary?
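One cheap way to guarantee (and assert at init time) that alignment is to let the compiler place the ring. A sketch with an illustrative 8-byte BD layout:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified BD layout, 8 bytes each with no padding; field names
 * are illustrative, not the exact hardware definition. */
struct fec_bd {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;    /* physical address of the data buffer */
};

#define NUM_RXBDS 8

/* The FEC requires the ring base written to ERDSR to be 16-byte
 * aligned; aligning the array (not the struct) preserves the
 * required back-to-back 8-byte BD layout. */
static struct fec_bd rx_ring[NUM_RXBDS] __attribute__((aligned(16)));

/* Sanity checks a driver can assert before programming ERDSR. */
int bd_ring_is_aligned(void)
{
    return ((uintptr_t)rx_ring % 16) == 0 && sizeof(struct fec_bd) == 8;
}
```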

Do you have the Data Cache enabled? If so, how are you handling cache flush on Buffer and BD handover? Try turning the data cache OFF and see if the problem goes away. Put the BDs in uncached (check for this) SRAM and see if it changes.

> Every time I set the Empty bit in a descriptor I write to RDAR immediately and unconditionally afterwards.


Set Empty, then FLUSH CACHE, then write to RDAR (unless not using cache in which case you should be, or put BDs in SRAM, but you still have to flush RX and TX Buffers).

> I am at a loss as to what "end of ring" or ring-wrap bug I could have written.

> Any ideas?

Well, I can't see your source code from here, and I don't want to. These sort of bugs are where the work gets really fun.

Keep staring at the code. Sleep on it. Take a long shower. Add lots of debug testing and logging. And mainly, look at someone else's code. Seriously. That one.

Tom

Ahlan
Contributor III

Dear Tom,

> > I then found our code had the data cache unintentionally DISABLED, so turned that on and it got seriously faster.

We found that we got the best performance for our code if we dedicated the whole of the cache to instructions, i.e. we have the instruction cache on and data cache off.

MOVE.L  #0x01400000, D0
MOVEC.L D0, CACR
NOP

-- Define Cacheable Range 0 .. 0FFFFFFH
MOVE.L #0x0000C000, D0
MOVEC.L D0, ACR0
MOVE.L  #0x0, D0
MOVEC.L D0, ACR1

-- Enable Cache
MOVE.L #0x80400000, D0
MOVEC.L D0, CACR

>>We set our rings up as:
rx_nbuf[i].status = RX_BD_E; 0x8000
rx_nbuf[NUM_RXBDS - 1].status |= RX_BD_W;    0x8000 | 0x2000

So do we.


>>Are your BDs aligned on a 16 byte boundary?

The rings start on a 16 bit boundary as do all the individual buffers (the memory area pointed to by the BD)


>Do you have the Data Cache enabled? If so, how are you handling cache flush on Buffer and BD handover? If so, turn the data cache OFF and see if the problem goes away. Put the BDs in uncached (check for this) SRAM and see if it changes.

We deliberately run with Data Cache disabled. I don't understand what you mean by putting the BDs in uncached SRAM. The only memory we have is DRAM and the on-chip SDRAM which I thought bypassed the cache.


>Set Empty, then FLUSH CACHE, then write to RDAR (unless not using cache in which case you should be, or put BDs in SRAM, but you still have to flush RX and TX Buffers).

I thought that flushing the cache was a very expensive operation. However the BDs are in SDRAM as are the buffers so surely I don't have to worry about flushing any caches.


>Keep staring at the code. Sleep on it. Take a long shower. Add lots of debug testing and logging. And mainly, look at someone else's code. Seriously. That one.

After lots of staring and countless long showers I am squeaky clean but alas still none the wiser. Have you any recommendation as to whose code I should take a look at?

I still find it decidedly odd that it is always the last BD that has OV set.


Ahlan

TomE
Specialist II

> We found that we got the best performance for our code if we dedicated the whole of the cache to instructions.


You must have functions or code paths that are larger than half the cache. If performance/speed is a requirement it might be worth finding what code paths are "busting the cache" and tightening them up a bit.

>>Are your BDs aligned on a 16 byte boundary?

> The rings start on a 16 bit boundary as do all the individual buffers

Is that a typo? They're both meant to be 16 BYTE aligned according to "19.2.5.1 Driver/DMA Operation with Buffer Descriptors".

> I don't understand what you mean by putting the BDs in uncached SRAM. The only memory

> we have is DRAM and the on-chip SDRAM which I thought bypassed the cache.


The on-chip memory is SRAM (Static RAM) and not SDRAM (Synchronous Dynamic RAM). Yes, it bypasses the cache. I still recommend you put the Descriptors in there to see if anything changes.


Do you have the "big pads" on the end of all your buffers to handle the "write off the end of the buffer" bugs? It might be writing off the end and clobbering some other data structure in memory after the last buffer.


Move/reorder/pad your memory around and see if the problem goes away. Add "1" to the size of all allocations (for "n" BDs allocate "n+1" and so on) in case there's an off-by-one bug in the code.


> I thought that flushing the cache was a very expensive operation.


It depends. If you set a cache to write-through then you don't need to flush it as there's no write-back required. The MCF5235 is only write-through, so it doesn't need to be flushed on writes. It does have to be invalidated for READS. The Reference Manual tells you it takes 512 clocks to invalidate the entire cache, or you can use CPUSHL to invalidate single lines. The full invalidate only takes a couple of microseconds. You don't have it enabled so it doesn't matter.


> Have you any recommendation as to whose code I should take a look at?


I answered that in my second post in this thread.


You should also sprinkle "magic volatile" keywords through your code in case the compiler is re-ordering some important operations. All your FEC Register definitions should be volatile and accessed through volatile pointers. Ditto the Buffer Descriptors. You could first recompile your code with the optimisation turned right down to see if it has any effect on this problem, and if that fixes it, start using volatiles and inspecting the assembly to see if you can see a reordering problem.
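To illustrate the volatile point: if the BDs are touched only through volatile-qualified pointers, the compiler must perform every status read and write, in program order. A minimal sketch with an illustrative BD layout:

```c
#include <stdint.h>

struct fec_bd {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;
};

#define RX_BD_E 0x8000u

/* Access the descriptor only through a volatile-qualified pointer so
 * the compiler cannot cache 'status' in a register or reorder the
 * accesses relative to other volatile operations (e.g. RDAR). */
int bd_is_empty(volatile struct fec_bd *bd)
{
    return (bd->status & RX_BD_E) != 0;
}

void bd_set_empty(volatile struct fec_bd *bd)
{
    bd->status |= RX_BD_E;
    /* on real hardware, the write to RDAR would follow here */
}
```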

Tom

Ahlan
Contributor III

>You must have functions or code paths that are larger than half the cache. If performance/speed is a requirement it might be worth finding what code paths are "busting the cache" and tightening them up a bit.

We bust the cache because we have a lot of context switches. Code gets interrupted by a higher priority task and, if the cache is small, by the time it continues the code will have been displaced.

>Is that a typo? They're both meant to be 16 BYTE aligned according to "19.2.5.1 Driver/DMA Operation with Buffer Descriptors".

Yes that was indeed a typo. The BDs are on 16 byte boundaries.

I tried your idea of experimenting with M2_P_EN although I don't understand what this is. Is there anything to do other than set this bit? I also tried fixed arbitration and parking on the highest priority. I am a bit confused by the way MPARK is described, in so much as certain bits are set on Reset that are documented as reserved and should be clear. They are supposed to be read as zero but aren't. If I write to MPARK should I set these to zero or leave them set? I tried setting MPARK to 0x32E15000 but this was ignored. Setting it to 0x2E05000 resulted in it being set to 0x2E15000. Not that it made the slightest bit of difference. Have you any experience with this register?

I also tried setting PRI0 and PRI1 in SRAM RAMBAR but again this seemed to have no effect.

>Do you have the "big pads" on the end of all your buffers to handle the "write off the end of the buffer" bugs? It might be writing off the end and clobbering some other data structure in memory after the last buffer.

> Move/reorder/pad you memory around and see if the problem goes away. Add "1" to the size of all allocations (for "n" BDs allocate "n+1" and so on) in case there's an off-by-one bug in the code.

A good idea. The buffers and BDs have already been moved from DRAM to SRAM, but this and padding are certainly good ideas.

> Have you any recommendation as to whose code I should take a look at?
> I answered that in my second post in this thread.

I was looking for a specific recommendation. What little code I have found wasn't really very inspiring. Perhaps I'm not very good at searching for code.

>You should also sprinkle "magic volatile" keywords through your code in case the compiler is re-ordering some important operations. All your FEC Register definitions should be volatile and accessed through volatile pointers. Ditto the Buffer Descriptors. You could first recompile your code with the optimisation turned right down to see if it has any effect on this problem, and if that fixes it, start using volatiles and inspecting the assembly to see if you can see a reordering problem.

I have placed volatile on all processor registers and the BDs. And because I don't entirely trust compilers I also checked the code as assembler.

Do you think the fault could be with the transceiver? We use the Texas TLK110. I wonder if the OV bit being set in the last BD doesn't mean something special. Is the FEC trying to tell me something and misusing the OV to tell me it. However it does correlate with IEEE_R_MACERR so perhaps not.

Ahlan

TomE
Specialist II

Going back to your original email:

> Occasionally we lose an Ethernet frame.

So I assume that isn't every time the code steps from the last to the first descriptor, but when it happens, that is where it happens.

So start trying to make it happen more often. Look for any correlation between "external" operations (network packets, traffic, something on the network) and internal (whatever else your code is doing).

Flood-ping the thing. That's always a good stress test. You may have a "controlled network" but if there's a PC on it then it will be sending weird packets every now and then as it tries to phone home to Redmond (or Russia or Fort Meade).

I'd write some code to detect when an overflow has happened, and then FREEZE the buffers so you can see the received message stream. Then run Wireshark at the same time and when the overflow happens, look for the "frozen data" in the Wireshark capture, and see if there's any pattern in that or in the following frames. Repeat and correlate. Repeat and correlate.

Put some of the internal processes/tasks in tight loops, or increase the scheduling interrupt tick rate and see if the overflows are triggered by a particular task or interrupt.

Add code to LOG (with timestamp) every interrupt and context switch and then freeze that on every overrun. Repeat and correlate.


Add code to LOG and timestamp every Ethernet interrupt and how many free BDs there are at every one (make the code traverse the list and count them). Look for things that delay the interrupts for so long that you run out of BDs. Then find why the interrupts didn't happen.
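One minimal shape for such a log, sketched here with the hardware timestamp source stubbed out (on the MCF5235 it would read a free-running timer register; the names are mine):

```c
#include <stdint.h>

#define LOG_SIZE 256            /* power of two so wrapping is a mask */

struct log_entry {
    uint32_t timestamp;
    uint16_t event;             /* e.g. interrupt vector number        */
    uint16_t arg;               /* e.g. free BDs counted at that point */
};

static struct log_entry log_ring[LOG_SIZE];
static unsigned log_head;

/* Stub: on real hardware this would read a free-running timer. */
static uint32_t (*log_clock)(void);

/* Cheap enough to call from every interrupt and context switch;
 * after an overrun, freeze the ring and inspect it. */
void log_event(uint16_t event, uint16_t arg)
{
    struct log_entry *e = &log_ring[log_head & (LOG_SIZE - 1)];
    e->timestamp = log_clock ? log_clock() : 0;
    e->event = event;
    e->arg = arg;
    log_head++;                 /* monotonic; masked on every use */
}
```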

> we have a lot of context switches.


So I'd then suspect a bug or hazard in the context switch code. Is the code correctly saving all the CPU registers and status on the switch? Is it locking out interrupts properly during the switch? Are you checking your maximum stack depth? What does the stack run into if it gets too deep?


Try adding IPL-7 interrupt disables around blocks of code in the Ethernet handler to see if that stops it from happening.


Are you using/abusing IPL7 interrupts anywhere? Are you using them for any peripheral interrupt? If you are, then try to get rid of them, use IPL6 instead. IPL7 is dangerous.


A continuing big problem for people using all MCF52xx CPUs (a problem fixed in the MCF53xx) is the absolute and MANUAL (the CPU won't help you here) requirement for absolutely unique LEVEL and PRIORITY levels for all interrupts. If you have any duplicates then things can go strangely wrong. Note you CAN have duplicate level/priority pairs as long as they're on different controllers, so CAN0 and CAN1 on separate controllers can have the same level/priority assignments for their buffers.


Are you using CAN? Each controller chews up 16 interrupt/priority pairs. Each TPU Interrupt has to be on a different level/priority. There are WAY more interrupts in this CPU than there are level/priority pairs. So if you're using CAN and/or the TPU and you aren't seriously running out of interrupts then you're not programming it properly.


(Edit) I see from your other posts that you are using both CAN and the TPU (or trying to assign the pins at least), so this limitation of the chip could be causing you problems.
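A driver can even check the uniqueness invariant at init time rather than by inspection. A sketch (the table contents in the test are invented for illustration):

```c
#include <stdint.h>

struct icr_assignment {
    uint8_t controller;   /* interrupt controller 0 or 1 */
    uint8_t level;        /* 1..7                        */
    uint8_t priority;     /* 0..7                        */
};

/* Return 1 if every (controller, level, priority) triple is unique --
 * the MCF52xx requirement that duplicates only matter within one
 * controller is captured by including the controller in the triple. */
int icr_table_is_unique(const struct icr_assignment *t, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = i + 1; j < n; j++)
            if (t[i].controller == t[j].controller &&
                t[i].level == t[j].level &&
                t[i].priority == t[j].priority)
                return 0;
    return 1;
}
```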


Are you getting any spurious/illegal interrupts? Would you know if you were or does your code silently recover? Another poster in this forum had a similar problem:


MCF5235 Interrupt Vector 191


When "intelligent" investigation doesn't work, hit it with a stick. Move all the data blocks around. Add padding blocks. Flood-ping it. Change the compiler optimisation. Look for anything that makes it better or worse.


> I am a bit confused by the way MPARK is described


Do you know enough to know which overruns you're getting? I doubt if memory bandwidth is the problem unless you've got priorities badly wrong or some other DMA stealing the bus (have you?). It looks like you're running out of buffer descriptors. If the bus is parked on a device then it gets immediate access to the bus. If parked on something else, then it takes an extra clock to switch the bus. Unless you've got a very high DMA load from something it is better to park on the CPU. You might want to run FIXED instead of Round Robin though.


> I tried setting MPARK to 0x32E15000 but this was ignored.


That should be an illegal value. There's something wrong with the manual though. It says the top 6 bits all read as zero and aren't writable, but default to "001100". Likewise bit 16 is "reserved read zero" but supposedly defaults to "1". The same chapter/values appear in the MCF5213 and MCF5271. That may be correct as they might be factory or diagnostic bits or something. Make sure you're only doing 32-bit reads and writes to these registers though, as Freescale have some misleading register definitions in their headers:


Re: PIT hw boo-boo. Read if you need accurate PIT (CF)


> I was looking for a specific recommendation.


I only know our code and I can't show you that. Have you looked at FNET?


> What little code I have found wasn't really very inspiring.


Who cares? It isn't there to be admired. If it works better than your code does it is a working example for you to find the significant difference.


> Do you think the fault could be with the transceiver?


Unlikely. Do you have different hardware? Do you have an Evaluation Board? Run your code on that.


I think you need to log events to an in-memory ring buffer with timestamps and then correlate with the overruns.
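The sort of thing I mean is only a few lines. Here is a minimal sketch (the read_timer() stub and the event codes are placeholders for whatever free-running timer and events your system actually has):

```c
#include <stdint.h>

/* Minimal sketch of a timestamped in-memory event log. read_timer()
 * is a stub standing in for whatever free-running counter the target
 * provides (a PIT or DMA timer, say); event codes are application-defined. */
static uint32_t read_timer(void) { return 0; }   /* replace with a real timer read */

#define LOG_SIZE 256    /* power of two, so the index wraps with a mask */

struct log_entry {
    uint32_t timestamp;
    uint16_t event;     /* e.g. "rx interrupt", "overrun seen", "RDAR written" */
    uint16_t arg;       /* optional detail, e.g. the BD index */
};

static struct log_entry log_ring[LOG_SIZE];
static volatile uint32_t log_head;

static void log_event(uint16_t event, uint16_t arg)
{
    uint32_t i = log_head++ & (LOG_SIZE - 1);   /* oldest entries get overwritten */

    log_ring[i].timestamp = read_timer();
    log_ring[i].event = event;
    log_ring[i].arg = arg;
}
```

Call log_event() from the receive interrupt, the buffer-free path and the error handler, then dump the ring from the debugger the next time an overrun is counted.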


Tom



TomE
Specialist II

I wrote:

> That should be an illegal value. There's something wrong with the manual though.


Sure is. I just happen to have an MCF5235 on my desk at the moment. The chip does not do what the manual says. But we should probably do as the manual says.


One thing that is almost working as expected is the part that says:


System software should guarantee that the programmed Mn_PRTY fields are unique, otherwise the hardware defaults to the initial-state priorities.

Writing any illegal values (like binary 01010101) results in the default values in those fields.

The problem is that the register diagram only lists "M3_PRTY, M2_PRTY, M0_PRTY" without any "M1_PRTY". Note 3 says there is an "M1_PRTY" and it has to be set properly, so OBVIOUSLY those are bits 16 and 17.

Bits 16 and 17 are documented as "Read as zero, write as zero", but if you do that the register doesn't work at all unless you use 1, 2 and 3 in the other PRTY slots.

Of the top six "read as zero, reset to 001100" bits, the top two always read as zero while the next four depend on each other. You can only write certain values, but not others. They're some sort of undocumented command bits.

Bit 24 is meant to be "Read as zero, write as zero, initialise to zero", but it isn't. Sometimes it reads back as "1" and can't be changed, and other times it reads back as zero.

Bit 25 is read/write as well.

I don't mind "secret diagnostic bits" in registers as long as they read and write as zero and don't do anything weird like these ones do. If they don't behave as expected, they should be documented better.

What this means to us programmers is that we should always WRITE constant values to this register and not read/modify/write it, as that can result in writing values that don't match the documentation.

Tom

Keywords for when I want to find this with a search later: MCF5235 11.3.3 Bus Master Park Register (MPARK) MPARK M3PRTY M2PRTY M1PRTY M0PRTY PRKLAST

Ahlan
Contributor III

Dear Tom,

Thank you for all your help and many suggestions.

Fortunately (at long last) I have been able to solve the problem.

It appears that if the CPU reads a BD at the same time as the FEC attempts to read the same BD, the FEC is locked out. Instead of retrying it sets the OV bit in the current BD (the one prior to the one it attempted to read) and discards the packet.

The clue to this behaviour was the fact that it was always the last BD that was flagged OV irrespective of how many BDs there were.

The problem in my code was that when we received a receive interrupt I looped through the BDs looking for BDs that were not empty and not flagged as having already been processed. In a similar manner, when buffers were freed, I looped through the BDs looking for a non-empty processed BD to reset to Empty. These loops always started with the first BD and worked forward. Therefore at some point, albeit very rarely, the FEC having just used the last BD would attempt to access the first BD in the ring at the same time as the CPU was looping through the BDs.

Because the first BD was always read as a starting point for these searches it was the most likely to clash with the FEC.

My solution is that instead of looping through the ring from start to finish, I now have two pointers: one pointing to the next BD expected to have data and another pointing to the next BD to be reused (i.e. set to Empty).

This pointer system works because the FEC does not search the BDs looking for the next free but has an internal pointer that it simply increments. My new algorithm takes this into account as it knows that it too only has to start searching from the point it left off.
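In outline the new scheme looks something like this (a simplified sketch, not our actual driver; NUM_RX_BDS, BD_EMPTY and the rx_bd layout are stand-ins for the real FEC descriptor definitions):

```c
#include <stdint.h>

/* Sketch of the two-index scheme: each index only advances from where
 * it last stopped, so the CPU never scans ahead into BDs the FEC is
 * still filling. The names and layout here are placeholders. */
#define NUM_RX_BDS 8
#define BD_EMPTY   0x8000u      /* "E" bit: descriptor owned by the FEC */

struct rx_bd {
    volatile uint16_t status;
    uint16_t length;
    uint8_t *buffer;
};

static struct rx_bd rx_ring[NUM_RX_BDS];
static unsigned next_filled;    /* next BD expected to contain received data */
static unsigned next_reuse;     /* next BD due to be handed back (set Empty) */

/* Receive interrupt path: consume filled BDs, stopping at the first
 * Empty one (the FEC's internal pointer is at or beyond it). */
static unsigned rx_poll(void)
{
    unsigned n = 0;

    while (!(rx_ring[next_filled].status & BD_EMPTY)) {
        /* hand rx_ring[next_filled] to the stack here */
        next_filled = (next_filled + 1) % NUM_RX_BDS;
        n++;
    }
    return n;
}

/* Buffer-free path: return the oldest consumed BD to the FEC, in order;
 * the real driver then writes RDAR so the FEC looks again. */
static void rx_recycle(void)
{
    rx_ring[next_reuse].status |= BD_EMPTY;
    next_reuse = (next_reuse + 1) % NUM_RX_BDS;
}
```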

Using this algorithm I no longer have any Fifo overruns.

Unfortunately I have been unable to find in the MCF5235 documentation any mention that the FEC and CPU might clash when reading BDs.

Once again thank you for all your help,

Best wishes,

Ahlan

TomE
Specialist II

I'd like to say "congratulations on finding that problem", but I don't believe it. You may have made the problem go away by changing your code, but your explanation isn't the real fix.

> I have been unable to find in the MCF5235 documentation any mention

> that the FEC and CPU might clash when reading BDs.

That's because they can't.

You initially had the buffers in external SDRAM, and that is certainly single-ported. The CPU gains bus access through the Bus Arbiter and performs a memory cycle. Before the FEC can read either a Buffer Descriptor or a Buffer, it has to go through the same thing. It has to gain exclusive access. There is no such thing as "attempting to read and failing". These are serious chips with a long history of professional and complicated bus arbitration.

The CPU is accessing SDRAM all of the time. There can be NO difference between the FEC wanting access to a BD and the CPU wanting access to that memory address, OR ANY OTHER MEMORY ADDRESS. The CPU is accessing memory all the time, so if there was any sort of "collision", the address wouldn't matter.

Unless you had the data cache enabled, in which case you could have stale data problems, but you don't have it enabled.

So that can't be the problem.

Unless... If you have the Buffer Descriptors in the on-chip SRAM then it is possible that the CPU can be accessing it via its direct port while the FEC is doing likewise through the "Back Door". But the "any address" still applies. Also, if you read "Table 6-1. RAMBAR Field Descriptions" you'll find Freescale thought about this. You can select which device (CPU or FEC) has priority in the upper and lower 32k banks of the SRAM. The lower priority device WAITS until the other one has finished. So it can't "collide" there either.

Freescale have got this wrong when they've bought in some hardware designs that don't follow their own well-established practices. For instance, the LCD Controller in the MCF5329 has a "read to clear" interrupt status register. If this is read at the same time that the hardware is trying to set a new status bit and that access "collides", then the chip update fails and you get missing interrupts. But that's a shared register between a state machine and the CPU and not the arbitrated-for memory where the BDs are. Since it was bought in it didn't follow the standard "write one to clear" and wasn't changed to match. Details on that here:

https://community.freescale.com/message/85870#85870

One way it could go wrong would be if your "looping through the BDs" code was not just READING them, but was WRITING them at the same time. Specifically, if you temporarily wrote the BD with the "E" bit clear during that loop it would cause this problem as the FEC would read the BD, see the "E" bit clear and then overflow. It has lost the packet and it doesn't look again until you hit RDAR. Were you doing that as part of your "flag as processed" handling?

Your code changes are valid. That's the normal way to do it. You have your pointers starting from the last one, and looping "forward" until it has processed all of the non-empty BDs. And you don't touch them at all between the time you write to RDAR and when you get a receive interrupt. Or you can, but you probably shouldn't.

> flagged as having already been processed

Flagged? Are you storing this "flag" in a separate bit in the BD? I'd hope RO1 or RO2? You shouldn't need this as you should only need one pointer (or array index) and the "E" bit.

I'll warn you of another thing people get wrong. There's a simple and logical way to handle interrupts in most Motorola/Freescale CPUs. It is quite simple and magic. A lot of other manufacturers have got this wrong, and people who are familiar with these broken solutions tend to write broken interrupt code on the Coldfire parts. I'm not saying you've done this (yet :-), and this probably isn't your problem, but it may cause you other problems. It may also help others reading this.

A broken CPU design that has you read an interrupt status register and then write it back with a bit cleared to ack that interrupt causes all sorts of problems. The controller might have set another bit between your read and write, so you can then lose that bit - unless they add some complicated "we won't clear bits we've set between your read and write" which is opaque and magic, goes wrong and doesn't allow multiple readers or writers, or the read/write to be interrupted. "Read to clear" is also a pain as you have to remember all the set bits even if you don't want to handle them then. And if the register has interrupt bits from different peripherals (I'm looking at you, Zilog) then it gets horrible.

The Freescale way is "write a ONE to a bit to clear it". This is NOT clear in the FEC chapter. The EIR bits are marked as "RW" when they should be marked as "R W1C" as is done in manuals for other products. You have to have closely read the first paragraph in "19.2.4.1 Ethernet Interrupt Event Register (EIR)" to spot this if you weren't expecting it.

So your FEC interrupt routine should be coded as:

uint32_t my_eir = EIR;  /* Read all status bits */

EIR = my_eir;           /* Clear all the SET ones we've just read *FIRST* */

/* NOW handle the set bits we've just read. */

if (my_eir & EIR_RXF) handle_rxf();

if (my_eir & EIR_TXF) handle_txf();

if (my_eir & EIR_ERROR_BITS) handle_errors();

And so on for any other bits you're interested in.

You don't clear the interrupt bits AFTER servicing them as that causes a race condition that will miss them occasionally.

You can always write individual "one" bits back to EIR just before the individual service routines, but the above example is better.

The same applies to almost all interrupt service routines on these chips (except the MCF5329 LCDC one).

Tom

Ahlan
Contributor III

> Unless... If you have the Buffer Descriptors in the on-chip SRAM then it is possible that the CPU can be accessing it via its direct port while the FEC is doing likewise through the "Back Door".

The BDs are in on-chip SRAM

> But the "any address" still applies. Also, if you read "Table 6-1. RAMBAR Field Descriptions" you'll find Freescale thought about this. You can select which device (CPU or FEC) has priority in the upper and lower 32k banks of the SRAM. The lower priority device WAITS until the other one has finished. So it can't "collide" there either.

I tried giving the FEC priority by setting RAMBAR to (BA << 16) + 0xE01 however this did not cure the problem.

> One way it could go wrong would be if your "looping through the BDs" code was not just READING them, but was WRITING them at the same time. Specifically, if you temporarily wrote the BD with the "E" bit clear during that loop it would cause this problem as the FEC would read the BD, see the "E" bit clear and then overflow. It has lost the packet and it doesn't look again until you hit RDAR. Were you doing that as part of your "flag as processed" handling?

The loops are searching, ie only reading.

> Flagged? Are you storing this "flag" in a separate bit in the BD? I'd hope RO1 or RO2? You shouldn't need this as you should only need one pointer (or array index) and the "E" bit.

The Flag is RO1 and is used as a sanity check.

> I'll warn you of another thing people get wrong. There's a simple and logical way to handle interrupts in most Motorola/Freescale CPUs. It is quite simple and magic…

Thanks for warning us but we know about this. The first thing we do in our interrupt routine is write to EIR to clear the interrupt. We don't have to read EIR because the MCF5235 has a separate vector for each interrupt source.

Best wishes,

Ahlan

TomE
Specialist II

>> But the "any address" still applies.

By that I mean you still can't have a "collision" on the BD itself.


It is possible that, since you're not running the CPU in the normal expected way (with the Data Cache enabled), with the right/wrong code the CPU can keep the data buses so busy that the FEC is getting starved and is really under-running. Your code that was looping through the BDs may have been this "worst case bus use" sort of code.


If that is the case, then that was a specific function that reliably caused this problem, but there could be other parts of your code that run less often that could cause the same problem. Memory copies, bus copies, Ethernet Checksum calculation and so on.


You could test this by writing some short functions that loop reading a block of SRAM or SDRAM to see if running these makes the overrun problem worse. You could run these in a low priority thread so they do that when there's nothing else to do. Then you'd start changing things to try and fix this.
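A rough sketch of that test function (the base pointer and word count are placeholders for real SRAM or SDRAM ranges on your target):

```c
#include <stdint.h>

/* Tight 32-bit read loop over a block of memory, to be called from an
 * idle or low-priority context to see whether extra bus traffic makes
 * the overrun rate worse. */
static uint32_t stress_read(volatile const uint32_t *base, unsigned words)
{
    uint32_t sum = 0;
    unsigned i;

    for (i = 0; i < words; i++)
        sum += base[i];     /* sequential 32-bit reads load the bus */

    return sum;             /* returning the sum stops the compiler removing the loop */
}
```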


What clock rate is the CPU running at? Do you have it running at its normal maximum speed or are you running it slower?


Do you have 32-bit wide SDRAM or 16 bit wide? The latter would double the bus loading.


Are you running with the debugger connected? That can sometimes have an effect by slowing the CPU down.


> I tried giving the FEC priority by setting RAMBAR to BA << 16 + 0xE01 however this did not cure the problem.


As you're only using it for Data you should check:


Table 6-1. RAMBAR Field Descriptions, "Address space masks (ASn)". That entry refers to "6.2.4 Power Management", which gives the mask for "Data Only" use as RAMBAR[7:0] = 0x35.


It is possible that the SRAM is being "hit" for every CPU instruction access if you don't have the other cycle types masked. It is possible (but I don't know) that masking this properly for Data Only may free it up so the FEC can get to it more often.
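As a sketch only: the hypothetical helper below just builds the register value, reusing the 0xE01 priority/valid bits you mentioned plus the 0x35 data-only mask. Check the field positions against Table 6-1 before trusting it, and remember RAMBAR is a CPU control register, so the value has to be written with a MOVEC instruction.

```c
#include <stdint.h>

/* Combine the SRAM base with the 0xE00 priority/back-door bits (from
 * the 0xE01 value above, minus its V bit) and the "Data Only" address
 * space mask 0x35, which already includes V. Field positions are
 * assumptions -- verify against Table 6-1 in the MCF5235 RM. */
static uint32_t rambar_data_only(uint32_t base)
{
    return base | 0xE00u | 0x35u;
}

/* The result must then be written to RAMBAR with MOVEC, e.g.:
 *   __asm__ volatile ("movec %0,%%rambar"
 *                     : : "r" (rambar_data_only(0x20000000u)));
 */
```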


I'd also try some more MPARK options, specifically fixed priority instead of round-robin.


Tom

