Mark Butcher wrote:
Do you have any specific details about what is getting stuck?
Thanks for replying. Of course I'll add detail. The driver and example code were generated with the online tool on 10 Feb 2019; it claims to be SDK 2.5.0. The target MCU is the MKE18F512xxx16. The firmware is the example named "twrke18f_flexcan_interrupt_transfer", which uses only one Rx message buffer (#9). The hardware is a KE18 Tower (TWR) board, the CAN bus is a real one (albeit only at 250 kbps) that I've been using on my bench for years, and the CAN test tool I break the KE18 and its drivers with is from Silverleaf. It generates arbitrary extended frames with 8 data bytes and has a button I can click at intervals somewhat shorter than 50 ms to break things with :-)
The "stuck" symptom is this: on receiving a frame, the driver happily keeps calling its callback with status kStatus_FLEXCAN_RxIdle as long as I leave at least 50 ms between frames. But I can't guarantee 50 ms between frames in the real world; it's not my call. It has to survive back-to-back frames saturating the bus for an indeterminate length of time.
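For scale, a back-of-the-envelope calculation (mine, not from the SDK docs): at 250 kbps a bit lasts 4 µs, and an extended frame with 8 data bytes is 131 bits on the wire without stuff bits (counting the 3-bit interframe space), or up to 160 bits with worst-case stuffing. That is roughly 0.52 to 0.64 ms per frame, so a saturated bus delivers about 1,560 to 1,900 frames per second, against the 20 per second that 50 ms spacing allows. The helper names below are hypothetical; it's a sketch of the arithmetic only, nothing driver-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Bits in a CAN 2.0B extended-ID data frame with `dlc` data bytes,
 * stuff bits excluded, 3-bit interframe space included:
 * SOF 1 + base ID 11 + SRR 1 + IDE 1 + extended ID 18 + RTR 1 + r1/r0 2
 * + DLC 4 + data 8*dlc + CRC 15 + CRC delim 1 + ACK slot 1 + ACK delim 1
 * + EOF 7 + IFS 3 = 67 + 8*dlc */
static uint32_t ext_frame_bits_unstuffed(uint32_t dlc)
{
    assert(dlc <= 8u);
    return 67u + 8u * dlc;
}

/* Worst-case stuff bits: stuffing covers SOF through the CRC sequence,
 * i.e. n = 54 + 8*dlc bits, with the usual worst-case bound (n - 1) / 4. */
static uint32_t ext_frame_stuff_bits_max(uint32_t dlc)
{
    assert(dlc <= 8u);
    return (54u + 8u * dlc - 1u) / 4u;
}

/* Time on the wire, in nanoseconds, for `bits` at `bitrate_bps`. */
static uint64_t frame_time_ns(uint32_t bits, uint32_t bitrate_bps)
{
    assert(bitrate_bps > 0u);
    return (uint64_t)bits * 1000000000u / bitrate_bps;
}

/* Whole frames per second on a bus saturated with frames of `bits` length. */
static uint32_t saturated_frames_per_s(uint32_t bits, uint32_t bitrate_bps)
{
    assert(bits > 0u);
    return bitrate_bps / bits;
}
```

With dlc = 8 the unstuffed length is 131 bits and the worst-case stuffing adds 29, so the receive path has to keep up with a new frame roughly every 0.5 ms, two orders of magnitude faster than the 50 ms that works today.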
So when the interframe delay is too short for the driver or the part to deal with, the last frame my receive callback gets arrives with status kStatus_FLEXCAN_RxOverflow, and that's the last frame the driver will ever give me. (The callback is lightly modified from the example: I ripped out the unused transmit code and added printfs to dump the status so I could see it freeze up.) As far as I can tell, that final frame is correct, but the RxOverflow condition is fatal. I can keep sending frames, and the hardware appears to ACK them, because my CAN tool isn't retrying them forever, but they're never passed to the callback, so I don't see them.
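A guess at the mechanism, which I have not verified against the SDK 2.5.0 sources: if the driver's interrupt handler ends the mailbox receive after every completion, overflow included, and the callback only tells the main loop to re-arm on kStatus_FLEXCAN_RxIdle, then after the first kStatus_FLEXCAN_RxOverflow nothing ever calls FLEXCAN_TransferReceiveNonBlocking again, which would match the symptom. The host-side model below is plain C with no SDK; every name in it is hypothetical, and the "driver ends the receive" behaviour is an assumption, not confirmed driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kStatus_FLEXCAN_RxIdle / kStatus_FLEXCAN_RxOverflow. */
typedef enum { RX_IDLE, RX_OVERFLOW } rx_status_t;

typedef struct {
    bool armed;          /* a receive is outstanding on the mailbox */
    uint32_t delivered;  /* frames handed to the callback */
    uint32_t dropped;    /* frames that arrived with no receive armed */
} rx_model_t;

/* Decides, from the status the callback saw, whether to re-arm the mailbox. */
typedef bool (*rearm_policy_t)(rx_status_t status);

/* Mirrors the behaviour described above: only RxIdle leads to a re-arm. */
static bool rearm_on_idle_only(rx_status_t status)
{
    return status == RX_IDLE;
}

/* Proposed change: an overflowed frame is still a completed receive. */
static bool rearm_on_idle_or_overflow(rx_status_t status)
{
    return status == RX_IDLE || status == RX_OVERFLOW;
}

/* One frame arrives. Assumption: the driver ends the receive after every
 * completion, so the callback sees the frame only if a receive is armed. */
static void model_frame(rx_model_t *m, rx_status_t status, rearm_policy_t rearm)
{
    if (!m->armed) {
        m->dropped++;    /* ACKed on the bus, never reported to the app */
        return;
    }
    m->armed = false;    /* assumed: driver aborts the mailbox receive */
    m->delivered++;      /* callback runs with `status` */
    if (rearm(status)) {
        m->armed = true; /* stands in for calling FLEXCAN_TransferReceiveNonBlocking() again */
    }
}
```

If this is what's happening, the fix in the real firmware would be to have the callback signal "receive complete" for kStatus_FLEXCAN_RxOverflow as well (perhaps counting it as a lost-frame event) so mailbox 9 is re-armed either way, rather than reinitializing the peripheral.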
Attempting to glean meaning from the documentation is a challenge :-( Bless you for deciphering it; you deserve a medal for patience and fortitude, plus cash from NXP, because they saved a lot by failing to create usable documentation and resilient drivers for a 20-year-old peripheral. The docs (such as they are) suggest that recovery is accomplished by reading the free-running timer, reading the C/S word of another mailbox, reading the C/S word of the afflicted mailbox (they warn not to do this; it doesn't matter), and writing EMPTY (0x04) to the C/S word of the afflicted mailbox, which they say will un-jam it (it doesn't). They say nothing about reinitializing the peripheral (that fixes it, but it's a bit extreme).
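To tell whether the mailbox hardware is really jammed or only the software has stopped listening, it may help to decode the CODE field of mailbox 9's C/S word in the debug output. A sketch, assuming the FlexCAN message buffer layout as I understand it from the reference manual (CODE in bits 27:24, IDE in bit 21) and the Rx code values INACTIVE 0b0000, FULL 0b0010, EMPTY 0b0100, OVERRUN 0b0110, RANSWER 0b1010, with the low bit meaning BUSY; the macro and function names are hypothetical, not SDK identifiers. Note that in FlexCAN reading a C/S word also locks that mailbox until the free-running timer is read, which is presumably why the documented sequence reads the timer. If CODE goes back to FULL when frames arrive after writing EMPTY but the callback stays silent, the problem is more likely in the driver's state (for example, that mailbox's interrupt having been disabled) than in the peripheral.

```c
#include <stdint.h>

/* Field positions in a FlexCAN message buffer C/S word, as I read the
 * reference manual; the names are hypothetical, not the SDK's. */
#define MB_CS_CODE_SHIFT 24u
#define MB_CS_CODE_MASK  (0xFu << MB_CS_CODE_SHIFT)
#define MB_CS_IDE_MASK   (1u << 21)

/* Rx mailbox CODE values. */
enum {
    RX_CODE_INACTIVE = 0x0,
    RX_CODE_FULL     = 0x2,
    RX_CODE_EMPTY    = 0x4,
    RX_CODE_OVERRUN  = 0x6,
    RX_CODE_RANSWER  = 0xA,
};

/* Extract the 4-bit CODE field from a C/S word. */
static uint32_t mb_cs_code(uint32_t cs)
{
    return (cs & MB_CS_CODE_MASK) >> MB_CS_CODE_SHIFT;
}

/* Readable name for an Rx mailbox CODE; the low bit set means BUSY
 * (hardware is moving a frame in or out). Not meaningful for Tx codes. */
static const char *mb_rx_code_name(uint32_t code)
{
    if (code & 0x1u) {
        return "BUSY";
    }
    switch (code) {
    case RX_CODE_INACTIVE: return "INACTIVE";
    case RX_CODE_EMPTY:    return "EMPTY";
    case RX_CODE_FULL:     return "FULL";
    case RX_CODE_OVERRUN:  return "OVERRUN";
    case RX_CODE_RANSWER:  return "RANSWER";
    default:               return "UNKNOWN";
    }
}

/* C/S value that re-arms an Rx mailbox: CODE = EMPTY, IDE kept so an
 * extended-ID mailbox stays extended; DLC and time stamp cleared. */
static uint32_t mb_cs_rearm_value(uint32_t cs)
{
    return (cs & MB_CS_IDE_MASK) | ((uint32_t)RX_CODE_EMPTY << MB_CS_CODE_SHIFT);
}
```

For example, a C/S word of 0x06281234 decodes as OVERRUN with IDE set and DLC 8, and the re-arm value computed from it is 0x04200000.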
It should not be possible to get an overrun, since when the input is full it will not receive anything else (to protect what it already has in its buffer(s)). There are flags set to indicate that it happened (the flags can be cleared), but no further recovery should be needed.
What flags? The C/S status word, or are there other flags? If there are others, I missed them.
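My own guess at candidates, unconfirmed: the CODE field itself reading OVERRUN, and mailbox 9's bit in the interrupt flag register (IFLAG1 on this part, if I read the manual right), which is write-1-to-clear. A generic trap with write-1-to-clear registers is clearing one bit with a read-modify-write (`|=`), which writes back every set bit and clears the other mailboxes' flags too. The sketch below models that on a plain variable; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of a write-1-to-clear register: each 1 written clears that bit,
 * each 0 written leaves it alone. */
static void w1c_write(volatile uint32_t *reg, uint32_t value)
{
    *reg &= ~value;
}

/* Correct: write only the bit for mailbox `mb`. */
static void clear_mb_flag(volatile uint32_t *iflag, unsigned mb)
{
    w1c_write(iflag, 1u << mb);
}

/* Buggy: the read-modify-write writes back every currently set bit,
 * so it clears the flags of all other pending mailboxes as well. */
static void clear_mb_flag_rmw(volatile uint32_t *iflag, unsigned mb)
{
    w1c_write(iflag, *iflag | (1u << mb));
}
```

On real hardware the equivalent of the correct version is a plain store of the single bit to the flag register, never `|=`.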
Thanks, Howard