imx53 FlexCAN interface

MarkRoy · ‎02-26-2013

Hello,

I've been trying to get the FlexCAN modules to work correctly on an i.MX536 processor. The board is a custom board similar to the QSB. I have connected the CAN2 interface on its ALT2 muxing option (pins E5 and E6). I have also double-checked the iomuxing and it all looks correct.

I am using the 11.09 Linux BSP (2.6.35.3) and have enabled the FlexCAN driver by adding/registering the platform data similar to how it is done in mx53_evk.c and mx53_ard.c for CAN2.

I have also compiled and installed canutils (as described in All Boards FlexCAN) and libsocketcan. When I boot, everything looks good, and I can see the driver attributes in /sys/devices/platform/FlexCAN.1/

Continuing with the All Boards FlexCAN instructions, I try to run "canconfig" but get "RTNETLINK answers: Operation not supported". However, since it seems to be just configuring the bitrate, I can set that myself through the attributes interface above (i.e. "echo 125000 > /sys/devices/platform/FlexCAN.1/bitrate"), and that seems to work.

Then I bring up the interface:

ifconfig can1 up

Next I attempt to send a CAN message using cansend as per the instructions in the aforementioned post:

cansend can1 -i0x100 11 22 33 44

Issuing this command seems to "work", but I see no output on my CAN2 TX pin when I look at it with the oscilloscope. Going back to the driver attributes, if I dump the transmit message buffers I see the message sitting in the buffer, but it doesn't look like it has been sent, since the timestamp portion of the message buffer structure has not been filled in. If I understand the reference manual correctly, the timestamp should be filled in automatically when the message is sent.

$ cat dump_xmit_mb

mb[32]::CS:0xc040000 ID:0x4000000 DATA[1~2]:0x21,0x16

...

Also, looking at the message in the buffer, the CODE field is 0b1100 with no RTR set, which indicates "Transmit data frame unconditionally once", and it should automatically return to "inactive" (0b0000) after the message has been sent.
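As a rough illustration of the reading above, the CS and ID words from dump_xmit_mb can be split into fields. This is only a sketch: the bit positions are assumed to match the FLEXCAN_MB_* macros in the mainline Linux flexcan.c driver and should be checked against the i.MX53 reference manual, and decode_mb is a hypothetical helper name.

```python
# Assumed layout (from the mainline flexcan.c macros, not verified against
# the i.MX53 reference manual): CODE 27..24, SRR 22, IDE 21, RTR 20,
# LENGTH 19..16, TIMESTAMP 15..0; standard ID in ID-word bits 28..18.

def decode_mb(cs, ident):
    """Split a FlexCAN message buffer CS word and ID word into fields."""
    ide = bool(cs & (1 << 21))
    if ide:
        can_id = ident & 0x1FFFFFFF      # 29-bit extended identifier
    else:
        can_id = (ident >> 18) & 0x7FF   # 11-bit standard identifier
    return {
        "code": (cs >> 24) & 0xF,
        "srr": bool(cs & (1 << 22)),
        "ide": ide,
        "rtr": bool(cs & (1 << 20)),
        "length": (cs >> 16) & 0xF,
        "timestamp": cs & 0xFFFF,
        "can_id": can_id,
    }

# Values taken from the mb[32] line of the dump.
mb = decode_mb(0x0C040000, 0x04000000)
print(mb)
```

Under these assumptions the buffer holds a 4-byte data frame for ID 0x100 with CODE 0b1100 and a timestamp of zero, which matches the description above.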

It seems to me from looking at all of this that the asynchronous portion of the module is working correctly, allowing me to set/read registers, etc., but for whatever reason the synchronous portion is not working, so the message buffer is not being processed and sent.

Dumping the registers for the module after trying to send a frame gives the following:

$ cat dump_reg

MCR::0x4087023f

CTRL::0x2a4c0c4

RXGMASK::0x0

RX14MASK:0x0

ECR::0x0

ESR::0x0

IMASK2::0xffffffff

IMASK1::0xffffffff

IFLAG2::0x0

IFLAG1::0x0

The only register of interest to me at this point is the MCR, which, compared against the reference manual description of the register, contains reasonable default settings.
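To make that comparison less error-prone, the MCR status bits can be decoded mechanically. The sketch below assumes the bit positions used by the FLEXCAN_MCR_* macros in the mainline Linux flexcan.c driver and a 6-bit MAXMB field; both are assumptions to confirm against the i.MX53 reference manual, and decode_mcr is a hypothetical helper.

```python
# Assumed bit positions (mainline flexcan.c naming); only the bits that
# indicate whether the module is disabled, frozen or not ready are listed.
MCR_FLAGS = {
    "MDIS": 31,      # module disable
    "FRZ": 30,       # freeze mode allowed
    "HALT": 28,      # halt request
    "NOT_RDY": 27,   # module not ready
    "SOFTRST": 25,   # soft reset in progress
    "FRZ_ACK": 24,   # module is in freeze mode
    "LPM_ACK": 20,   # module is in a low-power mode
}

def decode_mcr(mcr):
    """Return the selected MCR flags plus the MAXMB field."""
    flags = {name: bool(mcr & (1 << bit)) for name, bit in MCR_FLAGS.items()}
    flags["MAXMB"] = mcr & 0x3F   # assumed 6-bit field (64 message buffers)
    return flags

mcr = decode_mcr(0x4087023F)      # MCR value from dump_reg
print(mcr)
```

For the dumped value, this reports MDIS, HALT, NOT_RDY and FRZ_ACK all clear, which would be consistent with a module that is enabled and has left freeze mode.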

All of this leads me to believe that perhaps something is not right in the clocking of the module. In the platform data I have listed:

.root_clk_id = "lp_apm", which was copied from one of the other boards' configs. This is the only configuration I have done for the clocking of the module. I have not changed any of the settings pertaining to the CAN clocks in clock.c. If I look at /proc/cpu/clocks I can see lp_apm-0 listed with a rate of 24MHz, and both the can1 and can2 clocks are also listed with a rate of 24MHz.
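One way to cross-check the clocking is to derive the bit rate from the CTRL register and the clock rate. The sketch below assumes the CTRL field layout used by the mainline Linux flexcan.c macros (PRESDIV 31..24, PSEG1 21..19, PSEG2 18..16, PROPSEG 2..0) and that the 24 MHz rate from /proc/cpu/clocks is what actually feeds the CAN engine; flexcan_bitrate is a hypothetical helper.

```python
def flexcan_bitrate(ctrl, clk_hz):
    """Bit rate implied by a FlexCAN CTRL value and the CAN engine clock."""
    presdiv = (ctrl >> 24) & 0xFF
    pseg1 = (ctrl >> 19) & 0x7
    pseg2 = (ctrl >> 16) & 0x7
    propseg = ctrl & 0x7
    # One bit time = sync segment (1 tq) + propagation + phase 1 + phase 2.
    # Each register field stores its segment length minus one.
    tq_per_bit = 1 + (propseg + 1) + (pseg1 + 1) + (pseg2 + 1)
    return clk_hz / ((presdiv + 1) * tq_per_bit)

# CTRL from dump_reg, clock rate as reported in /proc/cpu/clocks.
bitrate = flexcan_bitrate(0x02A4C0C4, 24_000_000)
print(bitrate)
```

With the dumped CTRL value and a 24 MHz clock, this evaluates to 500000 bit/s rather than the 125000 written to the bitrate attribute, so it may be worth checking which bitrate was in effect when the dump was taken.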

Any advice is appreciated.

MarkRoy · ‎03-01-2013

I've figured out part of the problem. The RX line was not being pulled up by the IOMUXing, so it was permanently holding the bus. The solution was to modify the MX53_CAN_PAD_CTRL macro in iomux-mx53.h, adding PAD_CTL_PUS_100K_UP so that the pin is pulled up and the bus idles.

This seems to fix part of my problems: now I can see some activity on the bus when I try to transmit, but only the first message seems to be partially sent. I'm still debugging, so I will post back when I have figured the rest out.

Mark.

MarkRoy · ‎03-05-2013

OK, I have the driver working, but will need to make a couple of modifications to it. The problem that I am having is that when I transmit a CAN message on an empty bus, there is obviously no ACK received.

When this happens, the FlexCAN module does not do anything with the message in the buffer, but simply triggers an interrupt with the ack error flag set. However, since the message is still in the buffer, the FlexCAN module automatically retries the message, incrementing the error counter each time. When the error counter reaches 128, it no longer increments the error counter but keeps retrying.

The problem with the driver is that in this event, the interrupt fires so frequently that it effectively locks up the entire Linux kernel. Also, whatever process requested the transfer via the socket never resumes execution.

I plan on trying to fix this problem by modifying the driver so that when an ACK error is received and the error counter is at 128, it will abort any open transfers in the message buffers. This might not be the proper way to do things with the CAN protocol, but it's the only way I see around the problem. If the reason for the ACK error is that there are no other nodes on the bus, then it shouldn't matter, and the calling process can re-attempt the transfer at a later time.
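The decision described above can be sketched as a small predicate. This is illustrative logic only, not the driver code: should_abort_tx is a hypothetical name, and the ACK error flag at ESR bit 13 and the TX error counter in ECR bits 7..0 are assumptions taken from the mainline Linux flexcan.c definitions.

```python
ESR_ACK_ERR = 1 << 13        # assumed position of the ACK error flag in ESR
ERROR_PASSIVE_LIMIT = 128    # error counter value observed in the thread

def should_abort_tx(esr, ecr):
    """True when an ACK error is flagged and the TX error counter has
    reached the point where it stops incrementing."""
    tx_err_count = ecr & 0xFF    # assumed: TX error counter in ECR bits 7..0
    return bool(esr & ESR_ACK_ERR) and tx_err_count >= ERROR_PASSIVE_LIMIT

print(should_abort_tx(ESR_ACK_ERR, 128))   # ACK error at the limit
print(should_abort_tx(ESR_ACK_ERR, 64))    # ACK error, still counting up
```

In a real interrupt handler the abort itself would still have to be applied to the pending message buffers; the predicate only captures when to do so.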

JerryFan · ‎03-05-2013

Per the CAN specification, the transmitter expects an ACK from a receiving node; if there is no such node on the bus, a no-ACK error will be raised. Aborting all the transfers once a no-ACK error is raised is rather harsh. I recommend aborting only the transfer that caused the no-ACK error.

MarkRoy · ‎03-06-2013

The problem with that is that there is no way to identify which specific message buffer was not acknowledged, thus the only way to really abort is to abort all of them.

TomE · ‎11-30-2015

The "CAN Standards" or the "CAN general knowledge that seems to determine what the chips are meant to do" is fairly specific in that case (of a bus without any other devices). This happens in every modern car with the first device to power up when you turn the key.

The original Bosch Spec says under "Fault Confinement":

3. When a TRANSMITTER sends an ERROR FLAG the TRANSMIT ERROR COUNT is increased by 8.
Exception 1:
If the TRANSMITTER is ’error passive’ and detects an ACKNOWLEDGMENT ERROR because of not detecting a ’dominant’ ACK and does not detect a ’dominant’ bit while sending its PASSIVE ERROR FLAG.

So the first device wakes up, tries to send its first message, gets an ACK error and retries at (usually) 100% bus-busy. As you documented, it counts up to 128 and then keeps going. The chip is meant to keep quiet about this as it is totally expected. When the next device powers up, its controller ACKs the repeating message, the message counts as sent, and the sender drops out of error-passive and gets back to normal.
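The counting behaviour can be sketched with a toy model of a single transmitter on an otherwise empty bus. It is a simplification that only models rule 3 and Exception 1 as quoted above; the function name is made up for illustration.

```python
def lonely_transmitter_tec(attempts):
    """Transmit error count after a number of unacknowledged attempts."""
    tec = 0
    for _ in range(attempts):
        if tec < 128:
            # Error active: the ACK error triggers an error flag, so +8.
            tec += 8
        # Error passive: ACK error with no dominant bit seen during the
        # passive error flag falls under Exception 1, so no increment.
    return tec

print(lonely_transmitter_tec(16))
print(lonely_transmitter_tec(100000))
```

In this model the counter reaches 128 after 16 failed attempts and stays there, so the node never goes bus-off however long it keeps retrying.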

Unfortunately the FlexCAN implementation kills your CPU when this happens. It would make sense if in the "Exception 1" case above (Passive and Ack Error) the logic that decides to not increment the counter also decided to not set the ACK error bit.

But the error interrupts are useless unless something is monitoring them. The code creates an SKB with "frame->can_id = CAN_ERR_FLAG | CAN_ERR_CRTL" and feeds it up the pipe. Unless you've registered a socket listener to receive these "error frames" they're totally wasted cycles.
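For reference, registering such a listener from user space might look like the sketch below. It assumes a Linux system with SocketCAN and Python 3.3 or later; the constants are copied from linux/can.h and linux/can/error.h, the helper names are made up, and whether a given vendor kernel delivers error frames this way is not something the thread confirms.

```python
import socket
import struct

CAN_ERR_FLAG = 0x20000000    # linux/can.h: frame is an error frame
CAN_ERR_MASK = 0x1FFFFFFF    # linux/can.h: all error class bits
CAN_ERR_CRTL = 0x00000004    # linux/can/error.h: controller problems
CAN_ERR_ACK = 0x00000020     # linux/can/error.h: no ACK on transmission
CAN_FRAME_FMT = "=IB3x8s"    # struct can_frame: can_id, can_dlc, pad, data[8]

def open_error_listener(ifname):
    """Open a raw CAN socket that also receives all error frame classes."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_ERR_FILTER,
                    struct.pack("=I", CAN_ERR_MASK))
    sock.bind((ifname,))
    return sock

def parse_frame(raw):
    """Unpack one 16-byte can_frame and report whether it is an error frame."""
    can_id, dlc, data = struct.unpack(CAN_FRAME_FMT, raw)
    return {
        "is_error": bool(can_id & CAN_ERR_FLAG),
        "classes": can_id & CAN_ERR_MASK,
        "data": data[:dlc],
    }
```

A caller would then do something like sock = open_error_listener("can1") followed by parse_frame(sock.recv(16)) in a loop. Without a listener of this kind, nothing consumes the error frames the driver builds.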

In the Mainstream CAN code there's an option to tell the CAN driver that you don't care about error interrupts, and to not waste CPU time handling them. This is the "CAN_CTRLMODE_BERR_REPORTING" option. That seemed to come in around Kernel Release 3.7.

Check line 942 of the following:

http://lxr.free-electrons.com/source/drivers/net/can/flexcan.c?v=3.18

937         /*
938          * enable the "error interrupt" (FLEXCAN_CTRL_ERR_MSK),
939          * on most Flexcan cores, too. Otherwise we don't get
940          * any error warning or passive interrupts.
941          */
942         if (priv->devtype_data->features & FLEXCAN_HAS_BROKEN_ERR_STATE ||
943             priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
944                 reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
945         else
946                 reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;

In the code I'm using (based on the mainstream 3.4 driver) I've simply disabled the error interrupt completely.

Tom

MarkRoy · ‎02-26-2013

Digging into clock.c, I put in some debug printks and have verified that LP_APM is correctly selecting the oscillator clock output, and can_clk_sel is correctly selecting lp_apm as the source for CAN_CLK_ROOT. So clocking looks right.

MarkRoy · ‎02-26-2013

I just modified dump_reg to include the free running timer and have verified that the clock seems to be running correctly as the value of the timer is constantly changing.
