We are using an i.MX6 (quad and dual on different boards) and we are seeing infrequent instances where the CAN module does not properly initialize at power-on. This is under Yocto Linux using SocketCAN. Most of the time things come up and work as expected. However, sometimes the can0 interface cannot be activated. When I run canconfig can0, the state is STOPPED. I tell it to start, but it stays in the STOPPED state. A restart command replies that we are not in the BUS-OFF state, so a restart isn't required. I have also tried ifconfig down/up, and that doesn't get it working either. The only thing I have found that will recover it is a full reboot of the system.
It seems to happen more frequently when there is CAN traffic during system boot. I have a bench-top system where my CAN bus is normally idle and I never see the condition when that is the case. However, if I get CAN traffic running and then reboot, I may see it happen once or twice in a month. Similarly, when the target device is placed on a vehicle (where CAN is active when we boot), we see the same "stuck in STOP" behavior happen very infrequently. I am pretty sure something is going wrong at a low level because network manager usually issues the "Gained carrier" message for can0, but when this happens it does not. That and the fact that canconfig is showing STOPPED tells me that it isn't a problem in the application layer.
Any ideas on what is happening that causes the initialization at boot to fail and force the block into the STOPPED state? Or possibly how to "kick it in the pants" and get it going again without having to do a full reboot?
Thanks,
Rick
Hello all,
has there ever been any progress on this issue?
We are experiencing the same issue (CAN interface going into STOPPED and can't be restarted) on our hardware based on an Apalis iMX6 (imx6q). In our case it happens even without a physical connection to other CAN devices. On all of our platforms we are using a gateway process to distribute CAN messages between local processes and a socketcan interface. Some of those local processes may simulate devices which aren't available as physical entities. If I start 9 of them at the "same" time, the interface goes into STOPPED and can't be restarted via canconfig or "ip link".
The simulated devices are CANopen nodes, so the simulation processes send a bootup message once started plus a heartbeat every second. I can monitor the gateway process and verify that all simulations have been started and send those messages. However, none of this is carried to the outside as the interface is STOPPED.
The platform is a Toradex Apalis module running their LXDE image based on Linux kernel 4.14.170-3.0.4+gbaa6c24240a4. Needless to say, CAN transmission to the outside generally works.
Best regards,
Vitus
I think I have found the cause of the issue in the meantime. The flexcan module in kernel 4.14 calls pm_runtime_get_sync() on two occasions, similar to what other modules do. In contrast to those, however, it considers the call to have failed whenever it returns anything other than zero. That is too strict, because pm_runtime_get_sync() can return 1 when the device is already active, which is not an error. Other modules only fail if the return value is below zero, and then they call pm_runtime_put_noidle() so that the next call will succeed.
I've implemented a patch to change the behaviour of flexcan and so far the issue did not reappear. See attachment, applies to Toradex' branch toradex_4.14-2.3.x-imx.
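The attachment is not reproduced here. Purely as an illustration of the difference between the two return-value checks, here is a minimal user-space sketch. The stub_* functions, the enable_* functions and the two global variables are hypothetical stand-ins invented for this example; they only mimic the documented return convention of pm_runtime_get_sync() (negative on error, 0 on success, 1 if the device was already active) and are not the actual flexcan code or the actual patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel runtime PM calls, so that this
 * sketch compiles in user space. They are NOT the real kernel functions. */
int usage_count;      /* mimics the device's runtime PM usage counter */
int stub_result;      /* value the stubbed get_sync call will return */

int stub_pm_runtime_get_sync(void)
{
    usage_count++;            /* the counter is bumped even on failure */
    return stub_result;       /* <0: error, 0: resumed, 1: already active */
}

void stub_pm_runtime_put_noidle(void)
{
    usage_count--;            /* drop the reference without idling */
}

/* Strict check: any nonzero return is treated as a failure. */
int enable_strict(void)
{
    int err = stub_pm_runtime_get_sync();
    if (err)
        return err;           /* also taken when err == 1 */
    return 0;
}

/* Tolerant check: only negative values are failures. */
int enable_tolerant(void)
{
    int err = stub_pm_runtime_get_sync();
    if (err < 0) {
        stub_pm_runtime_put_noidle();   /* rebalance the counter */
        return err;
    }
    return 0;
}

/* Small self-check of both variants for the "already active" case. */
void check_already_active(void)
{
    stub_result = 1;
    assert(enable_strict() != 0);    /* reported as a failure */
    assert(enable_tolerant() == 0);  /* treated as success */
}
```

With stub_result set to 1, the strict variant reports an error although nothing went wrong, while the tolerant variant succeeds. With a negative stub_result, the tolerant variant returns the error and leaves usage_count balanced.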
Best regards,
Vitus
Hi Rick
If the issue happens when there is CAN traffic, the traffic may introduce additional
noise on the board and its power supplies. One can check the supplies (for example,
for ripple) with an oscilloscope, following the guidelines in the i.MX6 System
Development User's Guide. Noise can also affect the stability of the processor
clocks (crystal stability).
https://www.nxp.com/docs/en/user-guide/IMX6DQ6SDLHDG.pdf
On the software side, one can try the latest official NXP Linux release, L4.14.78_1.0.0:
linux/arch/arm/boot/dts/imx6q-sabreauto-flexcan1.dts
Linux L4.14.78_1.0.0 Documentation
Best regards
igor
It is not just CAN traffic, but CAN traffic during initialization of the CAN module. When the CAN module initializes properly (which is 99% of the time), then everything works fine and we can process CAN traffic for hours. So I don't believe that it is a general issue with power supply or clock stability. It is possible that there is some kind of a stability issue early in boot, but CAN seems to be the only module that is having problems. I am thinking it is something more like we get unexpected interrupts or other changes during init, and the driver perhaps marks the module as "bad" and doesn't allow it to be used by the system.
Thanks,
Rick
@rick_klaus, did you ever resolve this? I'm seeing the same issues. Every 30-40 power cycles the CAN interface is STOPPED, and will never start again until I cycle power again.
To narrow down the issue, one can enable the FlexCAN clocks in U-Boot (via the CCM_CCGR0 register)
and recheck section 26.7.1, "FLEXCAN Initialization Sequence", of the i.MX6DQ Reference Manual:
http://www.nxp.com/docs/en/reference-manual/IMX6DQRM.pdf
and try without "socketCAN", using just the NXP Linux described at the link.
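As a rough sketch of what enabling the FlexCAN clock gates means at register level, the helper below computes a new CCM_CCGR0 value with the four CAN gate fields switched on. The CCM base address, the CCGR0 offset and the bit positions (14, 16, 18, 20) are assumptions taken from my reading of the mainline i.MX6Q clock driver, and the function name is made up for this example, so please verify all of them against the CCM chapter of the reference manual before using them.

```c
#include <stdint.h>

/* Assumed addresses; verify against the i.MX6DQ Reference Manual. */
#define CCM_BASE            0x020C4000u
#define CCM_CCGR0_ADDR      (CCM_BASE + 0x68u)

/* Each clock gate is a 2-bit field; 0x3 keeps the clock on in all
 * modes except STOP mode. */
#define CG_ALWAYS_ON        0x3u

/* Assumed field positions of the CAN gates inside CCGR0. */
#define CAN1_IPG_SHIFT      14
#define CAN1_SERIAL_SHIFT   16
#define CAN2_IPG_SHIFT      18
#define CAN2_SERIAL_SHIFT   20

/* Hypothetical helper: return the CCGR0 value with all four FlexCAN
 * gate fields set to "always on", leaving every other field untouched. */
uint32_t ccgr0_enable_flexcan(uint32_t ccgr0)
{
    uint32_t mask = (CG_ALWAYS_ON << CAN1_IPG_SHIFT)
                  | (CG_ALWAYS_ON << CAN1_SERIAL_SHIFT)
                  | (CG_ALWAYS_ON << CAN2_IPG_SHIFT)
                  | (CG_ALWAYS_ON << CAN2_SERIAL_SHIFT);
    return ccgr0 | mask;
}
```

From the U-Boot prompt, the current register value can be read with the md command and the computed value written back with the mw command, using the CCGR0 address assumed above.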
Best regards
igor