Recovering from FLEXCAN_RxOverflow

howardg · ‎03-20-2019

With the virgin, unmodified twrke18f_flexcan_interrupt_transfer example and a KE18 twr board I can easily overflow the CAN receiver but I cannot figure out how to recover the receiver short of doing a full reinit of the peripheral. Overflowing (overrun is more accurate but I'm using the terminology from the awful documentation) is expected and I'd prefer to recover the receiver with something less drastic than a full peripheral reset, but that's how I'm going to have to deploy the silly thing in the absence of a solution.

It's not clear why the example code doesn't anticipate error recovery, but it doesn't. Can some kind soul please offer a patch for the example code or point me to a better example or (heaven forbid) a solution or (no way) documentation copy edited by someone with English fluency?

I wish my hardware guys would move off of the Kinetis parts. I'm tempted to toss the boards onto their desks and make dealing with horrible documentation, half-baked examples, bloated, slow [0], constantly changing dev environment, and lousy drivers their problem.

[0] A Xeon @ 2.7 GHz w/ 32 GB, almost fully unloaded, but MCUX crawls.

mjbcswitzerland · ‎03-20-2019

Hi

It should not be possible to get an overrun since when the input is full it will not receive anything else (to protect what it already has in its buffer(s)). There are flags set to indicate that it happened (the flags can be cleared) but no further recovery should be needed.

I was tempted to point you to the uTasker project (we now have a turn-key CANopen solution integrated for Kinetis) and CAN simulation
- http://www.utasker.com/docs/uTasker/uTaskerCAN.PDF

- https://www.youtube.com/watch?v=Ha8cv_XEvco

but I realised it hasn't been used on the KE18F yet (although it has been prepared for it).
Its FlexCAN driver supports Coldfire (since 2006) and Kinetis (since 2011) and I haven't heard of any issues with receptions so I expect that the KE18 wouldn't be any different (its FlexCAN module is identical to other Kinetis parts apart from its clocking).

Do you have any specific details about what is getting stuck?

Regards

Mark

Complete Kinetis solutions, training and support: http://www.utasker.com/kinetis.html
Kinetis KE1x:
- http://www.utasker.com/kinetis/FRDM-KE15Z.html
- http://www.utasker.com/kinetis/TWR-KE18F.html

howardg · ‎03-21-2019

Thank you Mark, your reply+my reply contained the solution. The overflowing frame was being received correctly even though it was received with status RxOverflow, and the bug was as you predicted on my/the example code's end. Both my code and the example treated RxOverflow as requiring some sort of recovery when it recovers on its own, hands off. Treating RxOverflow as an exception instead of something to (possibly) note and move on from will cause no end of grief for those adhering to the example code.

Howard
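
A minimal sketch of the "note it and move on" handling described above. It is not the SDK callback: the status enum and state struct are stand-ins invented for this illustration (only the names kStatus_FLEXCAN_RxIdle and kStatus_FLEXCAN_RxOverflow come from this thread), and it assumes the application re-arms reception whenever a "frame ready" flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in status codes. The comments name the driver statuses discussed in
 * the thread; the enum itself and its values are hypothetical. */
typedef enum {
    STATUS_RX_IDLE,     /* stands in for kStatus_FLEXCAN_RxIdle */
    STATUS_RX_OVERFLOW, /* stands in for kStatus_FLEXCAN_RxOverflow */
    STATUS_OTHER        /* anything else the callback may be handed */
} rx_status_t;

/* Hypothetical application state shared between callback and main loop. */
typedef struct {
    bool rx_complete;        /* main loop consumes the frame and re-arms receive */
    uint32_t overflow_count; /* diagnostic only: frames delivered with overflow */
} rx_state_t;

/* Receive-event handler: an overflow still delivers a valid frame, so it is
 * counted and then handled exactly like a normal reception. */
static void on_rx_event(rx_state_t *st, rx_status_t status)
{
    assert(st != NULL);
    switch (status) {
    case STATUS_RX_OVERFLOW:
        st->overflow_count++;
        /* fall through: the frame is usable, so signal completion as usual */
    case STATUS_RX_IDLE:
        st->rx_complete = true;
        break;
    default:
        /* not a receive completion: leave the state untouched */
        break;
    }
}
```

If only the idle status sets the flag, the first overflow leaves the flag clear, the main loop never re-arms the mailbox, and reception appears stuck, which matches the symptom reported in this thread.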

howardg · ‎03-21-2019

Mark Butcher wrote:

Do you have any specific details about what is getting stuck?

Thanks for replying. Of course I will add detail. The driver and example code were generated with the online tool on 10 Feb 2019. It claims to be SDK 2.5.0. The target MCU is the MKE18F512xxx16. The firmware is the example named "twrke18f_flexcan_interrupt_transfer" that knows of only one rx message buffer #9, the hardware is a KE18 twr board, the CAN is a real (but @250 kbps) bus I've been using on my bench for years, and the CAN test tool that I break the KE18+drivers with is from Silverleaf. It generates arbitrary 8 data byte extended frames and has a button I can mouseclick somewhat quicker than 50ms to break things with :-)

The symptom of "stuck"iness is that the driver, on receiving a frame, happily keeps calling its callback with status kStatus_FLEXCAN_RxIdle if I give it at least 50ms between frames. But I can't give it 50ms between frames in the real world, it's not my call: It has to survive seeing back-to-back frames saturating the bus for an indeterminate length of time.
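
To put the 50ms figure in perspective, here is a rough back-of-the-envelope sketch of how long one 8-data-byte extended frame occupies a 250 kbps bus. The function names are made up for this illustration; the bit counts follow the classic CAN extended frame layout, and the worst case assumes the maximum number of stuff bits.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case stuff bits for n stuffable bits: one after the first five
 * identical bits, then one for every four further bits. */
static uint32_t max_stuff_bits(uint32_t stuffable_bits)
{
    assert(stuffable_bits > 0u);
    return (stuffable_bits - 1u) / 4u;
}

/* Bits on the wire for a classic CAN extended data frame, including the
 * 3-bit interframe space. */
static uint32_t ext_frame_bits(uint32_t data_bytes, int worst_case_stuffing)
{
    assert(data_bytes <= 8u);
    /* SOF 1 + base ID 11 + SRR 1 + IDE 1 + extended ID 18 + RTR 1
     * + reserved 2 + DLC 4 = 39 header bits, then data, then CRC 15 */
    uint32_t stuffable = 39u + 8u * data_bytes + 15u;
    /* CRC delimiter 1 + ACK slot 1 + ACK delimiter 1 + EOF 7 + interframe 3 */
    uint32_t fixed = 13u;
    return stuffable + fixed + (worst_case_stuffing ? max_stuff_bits(stuffable) : 0u);
}

/* Frame duration in whole microseconds at the given bit rate. */
static uint32_t frame_time_us(uint32_t bits, uint32_t bitrate_bps)
{
    assert(bitrate_bps > 0u);
    return (uint32_t)(((uint64_t)bits * 1000000u) / bitrate_bps);
}
```

With 8 data bytes this gives 131 to 160 bits per frame, so on a saturated 250 kbps bus a new frame can arrive roughly every 0.52 to 0.64 ms, on the order of 80 to 95 times faster than the 50ms spacing the unmodified example tolerates.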

So when the interframe delay is too short for the driver, the final frame my rcv callback receives arrives with status kStatus_FLEXCAN_RxOverflow, and that's the last frame the driver will send me. (The callback is lightly modified from the example: I ripped out the unused xmit stuff and added printfs to dump status so I could see it freeze up.) That final frame is indeed correct insofar as I can tell, but the RxOverflow condition was fatal. I can keep sending frames, and the hardware appears to ack because my CAN tool isn't retrying them forever, but they're not being passed to the callback so I don't see 'em.

Attempting to glean meaning from the documentation :-( is a challenge. Bless you for deciphering it, you deserve a medal for patience and fortitude, plus cash money from NXP because they saved a lot by failing to create usable documentation and resilient drivers for the 20 year old peripheral. The docs (such as they are) suggest that recovery is accomplished by reading the free running timer, reading the c/s word of another mailbox, reading the c/s word of the afflicted mailbox (they warn not to do it, doesn't matter), and writing empty (0x04) to the c/s word of the afflicted mailbox, which they say will un-jam it (it doesn't). They say nothing about reiniting the peripheral (that fixes it but it's a bit extreme).
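
For anyone poking at the c/s word while debugging this, a small sketch of pulling the mailbox CODE field out of it and writing EMPTY back. The helper names are invented for this illustration. The field position (bits 27..24) and the code values other than the 0x04 "empty" mentioned above are my reading of the FlexCAN mailbox layout, so treat them as assumptions and check them against the reference manual for the part.

```c
#include <assert.h>
#include <stdint.h>

/* Rx mailbox CODE values (assumed from the FlexCAN mailbox description). */
enum {
    MB_CODE_INACTIVE = 0x0,
    MB_CODE_FULL     = 0x2,
    MB_CODE_EMPTY    = 0x4, /* the "empty (0x04)" value mentioned above */
    MB_CODE_OVERRUN  = 0x6
};

/* Assumed position of the CODE field inside the 32-bit c/s word. */
#define MB_CS_CODE_SHIFT 24u
#define MB_CS_CODE_MASK  (0xFu << MB_CS_CODE_SHIFT)

/* Extract the 4-bit CODE field from a c/s word. */
static uint32_t mb_cs_get_code(uint32_t cs)
{
    return (cs & MB_CS_CODE_MASK) >> MB_CS_CODE_SHIFT;
}

/* Return a copy of the c/s word with only the CODE field replaced. */
static uint32_t mb_cs_set_code(uint32_t cs, uint32_t code)
{
    assert(code <= 0xFu);
    return (cs & ~MB_CS_CODE_MASK) | (code << MB_CS_CODE_SHIFT);
}

/* Human-readable name for printf-style debugging. */
static const char *mb_code_name(uint32_t code)
{
    if (code & 0x1u) {
        return "BUSY"; /* lowest CODE bit set: mailbox is being updated */
    }
    switch (code) {
    case MB_CODE_INACTIVE: return "INACTIVE";
    case MB_CODE_FULL:     return "FULL";
    case MB_CODE_EMPTY:    return "EMPTY";
    case MB_CODE_OVERRUN:  return "OVERRUN";
    default:               return "OTHER";
    }
}
```

On real hardware the c/s word would be read from and written to the mailbox register itself; the helpers here work on plain integers so the decoding can be checked on a host.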

Mark Butcher wrote:

It should not be possible to get an overrun since when the input is full it will not receive anything else (to protect what it already has in its buffer(s)). There are flags set to indicate that it happened (the flags can be cleared) but no further recovery should be needed.

What flags, the c/s status word, or are there other flags? I missed the other flags.

Thanks, Howard
