
i.MX53: Linux 2.6.35 FlexCAN Fifo Handling Bug causing Delayed Receives.

Question asked by TomE on Nov 30, 2015
Latest reply on Dec 9, 2015 by TomE

We have a working system running Linux 3.4 on an i.MX53.


We've had to go back to 2.6.35 as that's the only OS that supports the video-in hardware on the chip.


Now I'm trying to get the rest of the hardware working as reliably as it does on Kernel 3.4.


The 2.6.35 FlexCAN driver defaults to using 32 receive MBs and 32 transmit MBs. This may work for very simple ID-based messaging, but anything that needs to send a stream of data over CAN (booting, flashing, configuring, debugging) requires the messages be received in the same order they are sent. The default setup can't do that. This has been covered before:


To send in the right order the driver can be told to use one TX MB. To receive in the right order it can be told to use the FIFO.


I tried that, and then the FIFO got in a very funny state. It seemed to be keeping 3 or 4 messages internally, and it would only signal one was available when the next one arrived. So it was behaving like the input and output pointers got misaligned.


The manual says the FIFO has to be handled like this:


i.MX53 Reference Manual

34.4.7 Rx FIFO

Upon receiving the interrupt, the ARM must read the frame
(accessing an message buffer in the 0x80 address) and then clear the
interrupt. The act of clearing the interrupt triggers
the FIFO engine to replace the message buffer in 0x80 with the next
frame in the queue, and then issue another interrupt to the ARM.


The Linux 3.4 FlexCAN FIFO driver obeys those rules (edited down to show the important instructions):


static void flexcan_read_fifo(const struct net_device *dev,
                  struct can_frame *cf) {
    reg_ctrl = flexcan_read(&mb->can_ctrl);
    reg_id = flexcan_read(&mb->can_id);
    *(__be32 *)(cf->data + 0) = cpu_to_be32(flexcan_read(&mb->data[0]));
    *(__be32 *)(cf->data + 4) = cpu_to_be32(flexcan_read(&mb->data[1]));
    /* mark as read */
    flexcan_write(FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, &regs->iflag1);
}


The Linux 2.6.35 FIFO driver does this:


void flexcan_mbm_isr(struct net_device *dev) {
    /* Read iflag1 and iflag2 and work out the masking */
    iflag1 = __raw_readl(flexcan->io_base + CAN_HW_REG_IFLAG1) &
             __raw_readl(flexcan->io_base + CAN_HW_REG_IMASK1);
    iflag2 = __raw_readl(flexcan->io_base + CAN_HW_REG_IFLAG2) &
             __raw_readl(flexcan->io_base + CAN_HW_REG_IMASK2);

    /* Clears ALL set interrupts, before the FIFO has been read! */
    __raw_writel(iflag1, flexcan->io_base + CAN_HW_REG_IFLAG1);
    __raw_writel(iflag2, flexcan->io_base + CAN_HW_REG_IFLAG2);

    if (flexcan->fifo) {
        /* This function reads one message only. */
        flexcan_fifo_isr(dev, iflag1);
        iflag1 &= 0xFFFFFF00;
        /* ... excerpt ends here ... */


It clears the interrupts and THEN reads the FIFO. That's completely the wrong order. Reading the manual, I'm surprised it ever delivers the second message. I'm guessing that receiving a new message into the FIFO sets the interrupt request again, which would explain why messages get delivered several frames late.


It should also probably loop reading the FIFO while there's something in it. Instead it reads ONE message, and then cycles through up to 56 Transmit message buffers before returning for another interrupt to read the next one. That's inefficient and is ignoring the highest priority interrupt. Thinking about this though, the state machine (that transfers the next message in) is probably running off a slow clock and may take a long time (relative to the CPU) to get the FIFO ready for the next read. So it may not be worth checking or waiting for. I may measure this later.
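
For illustration, here is a minimal userspace sketch of "read, then clear, then loop while the flag is set". It is a toy model of the FIFO as the manual text quoted above describes it, not driver code. The names `fifo_model`, `model_receive`, `model_read`, `model_clear_irq` and `drain_fifo` are all hypothetical, and the depth of 6 is an assumption. The model also advances instantly, so it says nothing about how long the real state machine takes to present the next frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 6   /* assumed depth for this toy model */

/* Hypothetical stand-in for the Rx FIFO; not a real register layout. */
struct fifo_model {
    uint32_t frames[FIFO_DEPTH]; /* frames[0] plays the buffer at 0x80 */
    size_t   count;              /* frames currently queued */
    int      irq;                /* "frame available" flag */
};

/* Hardware side: a frame arrives from the bus. Returns 0 if it is dropped. */
static int model_receive(struct fifo_model *m, uint32_t frame)
{
    if (m->count == FIFO_DEPTH)
        return 0;                /* overflow: frame is lost */
    m->frames[m->count++] = frame;
    m->irq = 1;
    return 1;
}

/* CPU side: read the frame currently visible in the output buffer. */
static uint32_t model_read(const struct fifo_model *m)
{
    assert(m->count > 0);
    return m->frames[0];
}

/* CPU side: clear the flag. Following the manual text, this advances the
 * FIFO to the next frame and raises the flag again if one is queued. */
static void model_clear_irq(struct fifo_model *m)
{
    m->irq = 0;
    if (m->count > 0) {
        for (size_t i = 1; i < m->count; i++)
            m->frames[i - 1] = m->frames[i];
        m->count--;
    }
    if (m->count > 0)
        m->irq = 1;
}

/* Handler sketch: read first, clear second, repeat while the flag is set.
 * Returns the number of frames copied to out[]. */
static size_t drain_fifo(struct fifo_model *m, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (m->irq && n < max) {
        out[n++] = model_read(m);
        model_clear_irq(m);
    }
    return n;
}
```

Queueing three frames with `model_receive` and then calling `drain_fifo` once hands back all three in arrival order and leaves the flag clear. Whether a loop like this pays off on the real part depends on how quickly the hardware re-raises the flag, which this model does not capture.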


This bug would show up in a protocol where two devices are exchanging commands and responses. The replies would have to be "pushed" through the FIFO by protocol retries. It might work, but would be very slow.


To trigger this condition, something else has to keep interrupts disabled for long enough that 2 (or 3 or 4) messages are in the FIFO before its interrupt gets serviced.
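
A rough userspace model of the symptom, as a sketch only. It assumes (as guessed above, not confirmed by the manual) that the flag is raised again only when a new frame arrives, and that the handler clears the flag and then takes one frame per interrupt. All names (`lag_model`, `lag_receive`, `lag_isr`, `simulate`) are hypothetical, and the depth of 6 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAG_DEPTH 6   /* assumed FIFO depth for this toy model */

struct lag_model {
    uint32_t frames[LAG_DEPTH];
    size_t   count;   /* frames currently queued */
    int      irq;     /* raised on arrival only (assumption) */
};

/* Hardware side: queue a frame and raise the flag. */
static void lag_receive(struct lag_model *m, uint32_t frame)
{
    if (m->count == LAG_DEPTH)
        return;                    /* overflow: frame is lost */
    m->frames[m->count++] = frame;
    m->irq = 1;
}

/* Handler in the 2.6.35 order: clear the flag, then read ONE frame.
 * Returns 1 if a frame was delivered, 0 if no interrupt was pending. */
static int lag_isr(struct lag_model *m, uint32_t *frame)
{
    if (!m->irq)
        return 0;
    m->irq = 0;                    /* cleared before the read */
    *frame = m->frames[0];
    for (size_t i = 1; i < m->count; i++)
        m->frames[i - 1] = m->frames[i];
    m->count--;
    return 1;                      /* flag stays clear even if frames remain */
}

/* `burst` frames arrive while interrupts are held off, then `stream` frames
 * arrive one at a time with the handler running after each. Delivered frames
 * go to out[]; the return value is how many frames are still stuck. */
static size_t simulate(size_t burst, size_t stream,
                       uint32_t *out, size_t *n_out)
{
    struct lag_model m = { .count = 0, .irq = 0 };
    uint32_t next = 1, frame;

    assert(burst <= LAG_DEPTH);
    *n_out = 0;
    for (size_t i = 0; i < burst; i++)
        lag_receive(&m, next++);
    if (lag_isr(&m, &frame))       /* interrupts enabled again */
        out[(*n_out)++] = frame;
    for (size_t i = 0; i < stream; i++) {
        lag_receive(&m, next++);
        if (lag_isr(&m, &frame))
            out[(*n_out)++] = frame;
    }
    return m.count;
}
```

With `simulate(3, 4, out, &n)`, seven frames go in, five come out in order, and two stay stuck until more traffic pushes them through. With `simulate(1, 4, out, &n)` nothing is left behind, which matches the problem only appearing after interrupts have been held off.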


I can't find any fixes for this anywhere, as the FlexCAN driver in 2.6.35 is an "orphan". By 2.6.38 it had been replaced by the mainstream driver, which gets it right. I don't have the option of using that driver as too much changed in the kernel.


Has anyone found and fixed this already (or found they had the problem and gave up)?


The Reference Manual doesn't have any instructions on when to clear message buffer interrupts for "normal" receive and transmit buffers. Since the buffer has to be locked to work on it, the order shouldn't matter. Except it should clear the interrupts before re-enabling a buffer for receiving or transmitting (and