Why might MCF5235 Flexcan loose remote frames?

Ahlan · 08-26-2013

We program a message buffer to respond to a remote frame as described in 21.4.6.1 of the MCF5235 reference manual.

If this is the only use of the FlexCAN, then it appears to work perfectly.

If the program uses other MBs to send and receive messages, then occasionally remote frames are lost.

However, the MB is still set up correctly and is active, because if the remote frame is retransmitted (after a delay of 30 ms), the expected response is transmitted.

This suggests that we are doing something to the FlexCAN that somehow causes it to lose the remote frame.

The MB set up to respond to the remote frame is not accessed.

From my reading of the reference manual, I do not understand how we could cause the remote frame to be lost if we do not access the MB. We do not seem to lose any other frames, only remote frames, and then only rarely.

Can anyone give us a clue as to what we might be doing wrong?

Best wishes,

Ahlan

Ahlan · 08-28-2013

I wrote that "The MB set up to respond to the remote frame is not accessed".

By this I mean that we do not access the MB control/status word, which is the only thing I can think of that might cause the remote frame to be dropped.

I am sure of this because I programmed the debug module to raise interrupt 12 if the word in question was accessed.

(AATR=0xFF00, ABLR=ADR(_Control),ABHR=ADR(_Control)+1,TDR={TRC1,EBL,EAR})

I still have no idea what I am doing wrong.

Does nobody use remote frames?

Ahlan.

TomE · 08-29-2013

Here's another suggestion.

Find as many examples of FlexCAN code as you can, and see if you can spot any difference between the way they're accessing the control registers and the way you're doing it.

One place to start:

Linux/drivers/net/can/flexcan.c - Linux Cross Reference - Free Electrons

But be warned, that's a HORRIBLE driver. It delays reading the CAN messages until a soft interrupt (keyword NAPI) and thus drops messages on high-data-rate buses. It also only supports FlexCAN hardware that has the FIFO, which yours doesn't. But the point here is to look at how the registers are accessed and in what order.

Tom

Ahlan · 09-02-2013

I don't need to look at how others have programmed the remote frame, as this, by itself, works perfectly. I simply set up the reply data, set the MB code to 1010, and then leave it to reply automatically to every incoming remote frame request. My code does not access the MB control/status register thereafter. I have verified this using the debug module. If the FlexCAN is not used for anything else, then every remote frame request gets a reply.

Unfortunately, if I move the code into something that uses the FlexCAN, then after a long time a remote frame request will be missed. But only temporarily: the MB is still correctly programmed, because the next remote frame request will solicit a response.

I therefore think that I am doing something to the FlexCAN that somehow affects the MB set up for remote frames. However, I am not doing anything to the MB itself, and that is what stumps me. What can I possibly be doing that would cause an MB to miss an incoming frame, other than accessing the control/status word of the MB?

Either the FlexCAN on the MCF5235 has a bug, or there is a way of causing the FlexCAN to miss frames.

I wonder if anyone at Freescale or within the community has experienced similar problems or can suggest what it is that I am doing wrong.

With the exception of this problem our FlexCAN implementation works reliably even under heavy load.

Our current workaround is simply to retransmit the remote frame request, but I am not happy with this solution and would prefer to understand what is causing the problem.

Ahlan

TomE · 09-03-2013

> My code does not access the MB control/status register thereafter.

But it is accessing other registers, and it is those accesses that are causing the remote frames to be missed.

That's why I suggested:

>> Find as many examples of FlexCAN code as you can, and see if you can spot any difference between the way they're accessing the control registers versus the way you're doing it.

I'm talking about the way you're ACKing the interrupts, what you're doing with the control and status registers, and how you've got all the other registers and message buffers set up.

If you're doing anything markedly different from the other code examples, change it to the way they do it and see if it gets better.

If that doesn't work quickly, it is time to make it a lot WORSE. Set up a test where the remote frames happen a lot more often, and then start upping the frequency of all your other operations. See if it is a specific transmit or receive that is triggering the loss. That should give you a simple test case you can make "fail at will", which should then be simple enough to spot a pattern in.

I suspect that you're somehow programming a message buffer that looks like an "alias" of the incoming remote and is taking the match.

Tom

Ahlan · 10-15-2013

Basically we program the FlexCAN to respond to an incoming RTR.

We program the MB with code 01010 and that is all.

No interrupts, no other access to the FlexCAN.

If left alone the FlexCAN will respond to each and every RTR without problem.

However in reality this isn't very useful.

In our applications we use one byte of data and periodically toggle the most significant bit to show that the application is still working.

Unfortunately if we occasionally write to the MB data, FlexCAN appears to drop (ignore) the RTR.

That is, the RTR is transmitted on the CAN bus but the FlexCAN does not respond.
However, the MB is correctly programmed, because if another RTR is sent, FlexCAN responds by transmitting the expected response.

I understand that if we read the control/status word of a FlexCAN MB, then the MB is locked and will not participate in the arbitration process until it is unlocked, either by locking another MB or by reading the free-running timer.

However we are NOT accessing the MB control word. We are simply writing to data byte 0.

Have you any idea why this might cause FlexCAN to ignore an incoming RTR for the MB?
I can't find any reference in the MCF5235 RM describing this effect.

If it is a restriction then how can we asynchronously modify the data of a MB programmed to respond to an RTR without the possibility that an RTR will be missed?

Hoping someone can help us resolve this issue.

Ahlan

TomE · 10-15-2013

> Flexcan loose remote frames.

Because you have to tighten up those loose frames :-)

> However we are NOT accessing the MB control word. We are simply writing to data byte 0.

You are writing to a shared resource.

You're doing so in a way that violates some basic design assumptions, and is not following the Reference Manual.

Those assumptions are that the whole message buffer is either owned by the hardware, in which case you're not allowed to touch any of it, or owned by you and not in use by the hardware.

This sort of thing happens all the time when there's a register that the user and hardware have to access.

The silicon designers have to handle this either by making the location dual-ported so it can support simultaneous access, or by providing some "arbitration hardware" that decides who gets it.

Or in many cases, like this CAN controller and parts of Ethernet controllers, the "lock" for part of the hardware is assumed to be set by some other operation; in this case, writing the Control/Status word.

Sometimes the hardware designer gets it very wrong and doesn't provide a workaround. For instance, in the LCD Controller in the MCF5329 there's an "MCF_LCDC_LISR" register that reports the current interrupts. It is a read-to-clear register, and it is only meant to be read immediately after an interrupt has happened. I was polling it to see when the next interrupt happened, and sometimes it didn't work. If the chip is trying to set a bit in that register on the same clock cycle that it is being read, the "read" wins and the bit-set fails completely.

The MCF5235 Reference Manual states:

21.4.1 Transmit Process

The CPU prepares or changes an MB for transmission by executing the following steps:

1. Writing the control/status word to hold Tx MB inactive (CODE = 1000).

2. Writing the ID word.

3. Writing the data bytes.

4. Writing the control/status word (active CODE, LENGTH).

NOTE

The first and last steps are mandatory!

So you're not allowed to just "change a byte" when the MB is owned by the hardware. You have to force it inactive first.
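The four-step sequence quoted above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions: `can_mb_t` is a hypothetical in-memory model of one message buffer, not the MCF5235 register layout, and `mb_update_rtr_reply` is a hypothetical helper name. Only the CODE values 1000 (Tx inactive) and 1010 (respond to remote frame) come from this thread. On real hardware every field would be a volatile register access, and CODE and LENGTH would be written together as one control/status word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one FlexCAN message buffer (NOT the real MCF5235
 * register layout). On hardware, code and length share the control/status
 * word and every field is a volatile register access. */
typedef struct {
    uint8_t  code;     /* CODE field */
    uint8_t  length;   /* LENGTH field (number of data bytes) */
    uint32_t id;       /* ID word */
    uint8_t  data[8];  /* data bytes 0..7 */
} can_mb_t;

#define MB_CODE_TX_INACTIVE   0x8u  /* 1000: Tx MB inactive */
#define MB_CODE_TX_RTR_REPLY  0xAu  /* 1010: respond to remote frame */

/* Hypothetical helper: RM 21.4.1 transmit-process steps applied to an MB
 * that answers remote frames automatically. */
static void mb_update_rtr_reply(can_mb_t *mb, uint32_t id,
                                const uint8_t *data, uint8_t len)
{
    assert(len <= 8u);

    /* Step 1 (mandatory): hold the MB inactive. */
    mb->code = MB_CODE_TX_INACTIVE;

    /* Step 2: write the ID word. */
    mb->id = id;

    /* Step 3: write the data bytes. */
    memcpy(mb->data, data, len);

    /* Step 4 (mandatory): reactivate with the new LENGTH and active CODE. */
    mb->length = len;
    mb->code = MB_CODE_TX_RTR_REPLY;
}
```

This is the documented procedure, not a guaranteed fix: the poster reports in this thread that RTRs were still occasionally missed when following it.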

I was worried that the "inactive/write/active" sequence might make the buffer miss a match (like you're getting now), but this seems to be handled properly. The following section says a message will "wait" until an MB is unlocked:

21.4.4 Matching Process

If the last matching MB is locked, then the new message remains in the SMB, waiting for the MB to be unlocked (see Section 21.4.5.3, “Locking and Releasing Message Buffers”).

If you still have problems, it might be possible to have TWO matching MBs (with the alternate data in them) and then "ping-pong" their activations, having both of them enabled during the overlap. But it looks like "Inactivate/Write/Activate" should work.
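The two-MB "ping-pong" idea might look like the sketch below. It is untested on an MCF5235 and rests on assumptions: `can_mb_t`, `rtr_pingpong_t`, `mb_load` and `rtr_pingpong_update` are hypothetical names, the MBs are modelled as plain memory rather than the real register layout, and it is not known which of two matching MBs the FlexCAN picks during the overlap. The point is only that whenever both MBs are active, each holds a complete, valid payload. Retiring the old MB is still a write to an MB the hardware owns, but the other MB stays active while it happens.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MB model (NOT the MCF5235 register layout). */
typedef struct {
    uint8_t  code;
    uint8_t  length;
    uint32_t id;
    uint8_t  data[8];
} can_mb_t;

#define MB_CODE_TX_INACTIVE   0x8u  /* 1000: Tx MB inactive */
#define MB_CODE_TX_RTR_REPLY  0xAu  /* 1010: respond to remote frame */

/* Two MBs programmed to answer the same remote-frame ID. */
typedef struct {
    can_mb_t mb[2];
    int      active;  /* index of the MB holding the current data */
} rtr_pingpong_t;

/* Load an MB and activate it (inactive/write/activate). */
static void mb_load(can_mb_t *mb, uint32_t id, const uint8_t *data, uint8_t len)
{
    assert(len <= 8u);
    mb->code = MB_CODE_TX_INACTIVE;
    mb->id = id;
    memcpy(mb->data, data, len);
    mb->length = len;
    mb->code = MB_CODE_TX_RTR_REPLY;
}

/* Publish new reply data without ever leaving zero active reply MBs. */
static void rtr_pingpong_update(rtr_pingpong_t *pp, uint32_t id,
                                const uint8_t *data, uint8_t len)
{
    int spare = 1 - pp->active;

    /* Activate the spare with the new data; for a moment both MBs answer,
     * each with a complete payload (old or new). */
    mb_load(&pp->mb[spare], id, data, len);

    /* Retire the old MB; the spare keeps answering meanwhile. */
    pp->mb[pp->active].code = MB_CODE_TX_INACTIVE;
    pp->active = spare;
}
```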

Tom

Ahlan · 11-05-2013

Dear Tom,

Thanks for your help and explanation of how the FlexCAN probably works on the MCF5235.

However...

Originally we did exactly what the RM tells us to do in 21.4.1.

However when we do that we lose RTR requests.

We assumed that this was because of step 1.

That is, we assumed that making the MB inactive would remove it from the matching process.

In Table 21-14, code 1000 is described as inactive: the buffer is not ready for transmit and will participate in the arbitration process.

However, we assumed that was a typo and that what they meant to write was "does not participate".

Our experience is that the "Deactivate/Write/Activate" sequence does cause us to miss a match.

That is, it is not handled properly by the hardware. Section 21.4.4 talks about locking rather than making the MB inactive.

If we skip step 1 and don't make the MB inactive, then we lose far fewer RTR requests; the situation is much improved but is not perfect.

Inspired by your description of dual porting and of having to tell the hardware who has control of a shared resource, we modified the code to read the control word rather than setting it to Transmit Inactive (1000).

I hoped that this would lock the MB and that incoming RTRs would be held in the SMB, waiting for the MB to be unlocked (as described in 21.4.4).

Unfortunately this had no effect at all.
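For reference, the lock-by-read variant described above looks roughly like the sketch below; by the poster's account it did not help. It assumes the locking rule quoted in this thread (reading an MB's control/status word locks it; reading the free-running timer or locking another MB releases it). `can_mb_regs_t`, `flexcan_timer` and `mb_write_byte0_locked` are hypothetical names, and the "registers" here are plain memory, so on a host these reads lock nothing; on the real part they would be volatile accesses into the FlexCAN register space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FlexCAN registers (NOT the MCF5235 map).
 * Here they are plain memory, so the reads below lock nothing. */
typedef struct {
    volatile uint32_t cs;       /* control/status word */
    volatile uint32_t id;       /* ID word */
    volatile uint8_t  data[8];  /* data bytes 0..7 */
} can_mb_regs_t;

static volatile uint16_t flexcan_timer;  /* stand-in for the free-running timer */

/* Hypothetical helper: update data byte 0 of an RTR-reply MB while the MB
 * is (on real hardware) locked by a read of its control/status word. */
static void mb_write_byte0_locked(can_mb_regs_t *mb, uint8_t value)
{
    assert(mb);

    /* On FlexCAN, reading the control/status word locks the MB. */
    uint32_t cs = mb->cs;
    (void)cs;

    mb->data[0] = value;

    /* Reading the free-running timer releases the lock
     * (locking another MB would also release it). */
    uint16_t timer = flexcan_timer;
    (void)timer;
}
```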

So the upshot of all this is that we still have the problem.

Any other ideas?

Best wishes,

Ahlan

TomE · 11-05-2013

> Any other ideas?

Submit a Technical Request. Prepare to be disappointed with the initial "pattern-match" responses, but keep trying.

As well, go through your local Sales/Technical rep. We've had very good experience with ours. They can get through to people at Freescale who can get to the technical department.

Try programming TWO filters as RTRs and change one at a time.

Accept that RTRs simply don't work and use some other mechanism for your higher level protocol.

Accept (though this completely breaks the whole purpose of CAN, of course) that the RTR responses are unreliable, and add retries at the other end so it can handle a missing response, going for "two out of three" or whatever.
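Retries on the requesting node, which is also the poster's current workaround of re-sending the request after about 30 ms, amount to a simple loop. A minimal sketch, assuming a hypothetical callback type `rtr_request_fn` that transmits one remote frame and waits up to `timeout_ms` for the reply; the CAN driver behind it is not shown. `fake_node_t` and `fake_request` are a simulated responder for testing only. This is plain retrying, not a "two out of three" vote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: transmit a remote frame for `id`, wait up to
 * `timeout_ms` for the reply, and return true with the first data byte
 * in *out if a reply arrived. */
typedef bool (*rtr_request_fn)(uint32_t id, unsigned timeout_ms,
                               uint8_t *out, void *ctx);

/* Retry the remote request up to max_tries times. Returns the number of
 * attempts used, or 0 if every attempt timed out. */
static unsigned rtr_request_with_retries(rtr_request_fn request, void *ctx,
                                         uint32_t id, unsigned timeout_ms,
                                         unsigned max_tries, uint8_t *out)
{
    assert(request != 0);
    for (unsigned attempt = 1; attempt <= max_tries; ++attempt) {
        if (request(id, timeout_ms, out, ctx))
            return attempt;
    }
    return 0;
}

/* Simulated responder for testing: ignores the first `drops` requests. */
typedef struct {
    unsigned drops;
    uint8_t  reply;
} fake_node_t;

static bool fake_request(uint32_t id, unsigned timeout_ms, uint8_t *out, void *ctx)
{
    fake_node_t *node = ctx;
    (void)id;
    (void)timeout_ms;
    if (node->drops > 0) {
        node->drops--;
        return false;  /* simulated missed RTR */
    }
    *out = node->reply;
    return true;
}
```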

Tom

TomE · 08-28-2013

> Does nobody use remote frames?

That I can answer! At least 24 people do. Have you SEARCHED for an answer before posting? Type "remote frame" into the search box and see if any of the previous 24 threads that show up match your problem. Then ask Google.

Tom

Ahlan · 09-02-2013

Dear Tom,

If I search "remote frame" in inverted commas (to avoid results for other types of frame, e.g. Ethernet), there are only eleven threads, and one of them is mine. Unfortunately, none of the others deal with the issue I am asking for help with. :-(

A Google search finds advice against using remote frames, as some manufacturers have not implemented the feature correctly. Although these faulty implementations are normally concerned with how to interpret the data length of the remote frame, it is not inconceivable that Freescale have also not implemented the feature correctly, albeit in a different way.

One problem is that not many people seem to use the Remote Frame feature - possibly because of the faulty implementations.
