i.MX6 Marsboard. FlexCan low performance

mikitadzivakov · ‎10-08-2014

Hi, I have a problem with the performance of the FlexCAN on the MarsBoard. Reading at a bitrate of 1000000 bit/s loses about 3% of the packets when the CAN bus is loaded to 18%.

I use Linux kernel version 3.0.15, the standard flexcan driver, and libraries from this source: All Boards FlexCAN. FIFO buffers are enabled by default.

I want to know: is this normal for this controller? I am asking how to improve the performance so that the board operates normally with a bus load of at least 60%.

Thanks.

TomE · ‎11-30-2014

We're running CAN at 1MBit/sec and had overrun problems on an i.MX53.

The problem is that the FlexCAN drivers have been written to use the NAPI interface. This "New API" dates from a decade or more ago. It is a "fast interrupt" scheme that defers handing data from the hardware to the higher-level drivers until a delayed thread runs (delayed until all other interrupts have completed). The problem with the FlexCAN implementation is that it also delays reading the HARDWARE registers (to read the data) until then.

Unfortunately, other drivers haven't been rewritten (as of Kernel 3.4, I'm looking at YOU, Ethernet driver), so that driver is free to read multiple Ethernet packets and forward them all the way up into the network stack during its interrupt, locking out the reading of the FlexCAN.

After a month or two of solid work I found that the NAPI calls were getting delayed by up to 6ms. Using "/sys/kernel/debug/tracing" I was able to find that we had been given a kernel that had SLUB debugging on, and that was slowing it down hugely. That sort of thing should never be on in production kernels! The kernel also had a bug in "mxc_cpufreq_init()" which meant it wouldn't run at all if SLUB debugging was turned off! Fixing THAT bug and turning SLUB debugging off improved things, but the real fix was to SERIOUSLY REWRITE flexcan.c so that it reads the hardware FIFO dry during the interrupt into a large internal software FIFO and delivers that during the NAPI thread.
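The idea of draining the hardware FIFO in the interrupt and delivering later can be sketched as a small user-space simulation. This is not the actual flexcan.c rewrite: every name here (`hw_frame_arrives`, `rx_interrupt`, `rx_poll`, `struct frame`) is hypothetical, the hardware FIFO is simulated in memory, and the depth of 6 frames and the ring size of 256 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_FIFO_DEPTH 6u    /* simulated hardware RX FIFO depth (assumption) */
#define SW_RING_SIZE  256u  /* software FIFO size, must be a power of two */

struct frame { uint32_t id; uint8_t len; uint8_t data[8]; };

/* ---- Simulated hardware FIFO (stands in for the controller) ---- */
static struct frame hw_fifo[HW_FIFO_DEPTH];
static unsigned hw_head, hw_count;
static unsigned hw_overruns;            /* frames lost because the FIFO was full */

/* A frame arrives from the bus; it is lost if the hardware FIFO is full. */
static void hw_frame_arrives(const struct frame *f)
{
    if (hw_count == HW_FIFO_DEPTH) {
        hw_overruns++;
        return;
    }
    hw_fifo[(hw_head + hw_count) % HW_FIFO_DEPTH] = *f;
    hw_count++;
}

/* Read one frame out of the hardware FIFO; returns 0 when it is empty. */
static int hw_read(struct frame *out)
{
    if (hw_count == 0)
        return 0;
    *out = hw_fifo[hw_head];
    hw_head = (hw_head + 1) % HW_FIFO_DEPTH;
    hw_count--;
    return 1;
}

/* ---- Software FIFO filled by the interrupt, emptied by the poll ---- */
static struct frame sw_ring[SW_RING_SIZE];
static unsigned sw_head, sw_tail;       /* free-running write/read counters */
static unsigned sw_drops;               /* frames dropped because the ring was full */

/* Interrupt handler: read the hardware FIFO dry, doing no other work. */
static void rx_interrupt(void)
{
    struct frame f;

    while (hw_read(&f)) {
        if (sw_head - sw_tail == SW_RING_SIZE) {
            sw_drops++;
            continue;
        }
        sw_ring[sw_head & (SW_RING_SIZE - 1u)] = f;
        sw_head++;
    }
}

/* Deferred poll: hand up to `budget` buffered frames to the caller. */
static unsigned rx_poll(struct frame *out, unsigned budget)
{
    unsigned done = 0;

    assert(out != NULL);
    while (done < budget && sw_tail != sw_head) {
        out[done++] = sw_ring[sw_tail & (SW_RING_SIZE - 1u)];
        sw_tail++;
    }
    return done;
}
```

In this sketch, a late `rx_poll()` costs nothing as long as `rx_interrupt()` runs before the simulated hardware FIFO fills; only the small hardware FIFO is exposed to interrupt latency, while the large software ring absorbs the delay of the deferred stage. The simulation is single-threaded; in a real driver the head and tail counters are shared between interrupt and poll contexts and would need appropriate barriers or locking.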

Linux isn't a "real time operating system". It can (and does) lock up for a long time and for all sorts of reasons. You can't expect it to keep up with any hardware that doesn't have huge internal buffering, or independent DMA. Flexcan doesn't have that and the drivers compound the problem.
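To put rough numbers on the buffering problem, here is a back-of-the-envelope sketch. It assumes standard (11-bit ID) frames with 8 data bytes, which occupy 111 bit times including the 3-bit interframe space and ignoring stuff bits; the actual frame mix on the original poster's bus is not known, and the function names are made up for illustration.

```c
#include <assert.h>

/* Assumption: 11-bit ID, 8 data bytes, no stuff bits.
 * 1 SOF + 11 ID + 1 RTR + 1 IDE + 1 r0 + 4 DLC + 64 data
 * + 15 CRC + 1 CRC delimiter + 1 ACK + 1 ACK delimiter + 7 EOF + 3 IFS = 111 */
#define BITS_PER_FRAME 111.0

/* Average frame rate for a given bitrate (bit/s) and bus load (0.0 to 1.0). */
static double frames_per_second(double bitrate, double bus_load)
{
    assert(bitrate > 0.0);
    assert(bus_load >= 0.0 && bus_load <= 1.0);
    return bitrate * bus_load / BITS_PER_FRAME;
}

/* Average number of frames that arrive while the reader is stalled. */
static double frames_during_stall(double bitrate, double bus_load, double stall_s)
{
    return frames_per_second(bitrate, bus_load) * stall_s;
}
```

Under these assumptions, `frames_per_second(1e6, 0.18)` is roughly 1620 frames per second, and `frames_during_stall(1e6, 0.18, 0.006)` is just under 10 frames for a 6ms stall. A back-to-back burst (`bus_load` of 1.0) delivers about 54 frames in the same 6ms. Either figure is more than a hardware FIFO of a few frames can hold, which is consistent with frames being lost even at a modest average load.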

Just to put the icing on it, the CPU supports a sophisticated and flexible interrupt priority system, so you can give sensitive peripherals priority over other ones. But unlike a CPU architecture made for embedded work, you can't easily have interrupts interrupting other interrupts (like you can on the ColdFire with 7 separate levels). So a low-priority interrupt can block any higher-priority ones. To put the icing on that, the Linux port we've got (based on the mainline) doesn't even have any code to enable interrupt priority. All interrupts are running at the same level. I had to rewrite parts of the kernel to fix that as well.

Tom

i.MX6_All

i.MX6Dual

Linux