iMX6 FlexCAN generates 'BUG: scheduling while atomic' on every CAN send

lsirobway
Contributor II

I have successfully installed the FlexCAN driver on an iMX6 Dual Lite, and I can communicate with it. However, every time I send data (either from my own code or through canutils 'cangen' or 'cansend'), a kernel BUG is triggered. As the stack trace shows, the BUG is raised when flexcan_chip_start() in flexcan.c calls usleep_range() from the restart timer callback:

GS920# ./scTest can0

can0 at index 2

Wrote 16 bytes

root@gs920:/media/sdcard/GS920# BUG: scheduling while atomic: swapper/0/0/0x00000100

1 lock held by swapper/0/0:

#0: (((&priv->restart_timer))){+.-...}, at: [<8002fe60>] call_timer_fn+0x0/0xe8

Modules linked in:

CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.15.7 #5

Backtrace:

[<80011ba4>] (dump_backtrace) from [<80011d40>] (show_stack+0x18/0x1c)

r6:00000000 r5:00000000 r4:00000000 r3:00000000

[<80011d28>] (show_stack) from [<805de120>] (dump_stack+0x88/0xa4)

[<805de098>] (dump_stack) from [<805db3ec>] (__schedule_bug+0x64/0x7c)

r5:80867268 r4:80867268

[<805db388>] (__schedule_bug) from [<805e09b0>] (__schedule+0x540/0x5f8)

r4:bf7c7b00 r3:00000002

[<805e0470>] (__schedule) from [<805e0b5c>] (schedule+0x38/0x88)

r10:8085c000 r9:00000000 r8:00000001 r7:00000000 r6:00004e20 r5:00000000

r4:00002710

[<805e0b24>] (schedule) from [<805e03c4>] (schedule_hrtimeout_range_clock+0xe0/0x158)

[<805e02e4>] (schedule_hrtimeout_range_clock) from [<805e0450>] (schedule_hrtimeout_range+0x14/0x18)

r10:808644c4 r8:bf335e00 r7:00000000 r6:c09b8000 r5:c09b8000 r4:bf335800

[<805e043c>] (schedule_hrtimeout_range) from [<8002fb58>] (usleep_range+0x50/0x58)

[<8002fb08>] (usleep_range) from [<803f2964>] (flexcan_chip_start+0x7c/0x440)

[<803f28e8>] (flexcan_chip_start) from [<803f3994>] (flexcan_set_mode+0x28/0x54)

r10:bf335800 r9:803f1680 r8:803f1680 r7:00000100 r6:00000001 r5:bf335800

r4:bf335800

[<803f396c>] (flexcan_set_mode) from [<803f1718>] (can_restart+0x98/0xd4)

r5:8085c000 r4:bf335800

[<803f1680>] (can_restart) from [<8002fed0>] (call_timer_fn+0x70/0xe8)

r5:8085c000 r4:8085de00

[<8002fe60>] (call_timer_fn) from [<80030810>] (run_timer_softirq+0x1b0/0x24c)

r10:8085e0c0 r8:bf335800 r7:00000000 r6:8085de40 r5:808b3500 r4:bf335e74

[<80030660>] (run_timer_softirq) from [<8002a1bc>] (__do_softirq+0x128/0x26c)

r10:00000001 r9:00000100 r8:00000001 r7:8085c000 r6:8085e080 r5:8085e084

r4:00000000

[<8002a094>] (__do_softirq) from [<8002a5f8>] (irq_exit+0xb0/0x104)

r10:805e82a8 r9:8085c000 r8:00000000 r7:f4000100 r6:00000000 r5:0000001d

r4:8085c000

[<8002a548>] (irq_exit) from [<8000f31c>] (handle_IRQ+0x44/0x9c)

r4:80858ffc r3:00000182

[<8000f2d8>] (handle_IRQ) from [<80008540>] (gic_handle_irq+0x30/0x64)

r6:8085df20 r5:80864a98 r4:f400010c r3:000000a0

[<80008510>] (gic_handle_irq) from [<80012864>] (__irq_svc+0x44/0x58)

Exception stack(0x8085df20 to 0x8085df68)

df20: 00000001 00000001 00000000 80867268 8085c000 80864588 80864538 8085c000

df40: 00000000 8085c000 805e82a8 8085df74 8085df38 8085df68 800633f0 8000f6a0

df60: 200e0013 ffffffff

r7:8085df54 r6:ffffffff r5:200e0013 r4:8000f6a0

[<8000f678>] (arch_cpu_idle) from [<8005c72c>] (cpu_startup_entry+0xfc/0x160)

[<8005c630>] (cpu_startup_entry) from [<805d81e8>] (rest_init+0xb0/0xd8)

r7:808045f8 r3:00000000

[<805d8138>] (rest_init) from [<807d3b4c>] (start_kernel+0x310/0x374)

r5:808b22c0 r4:80864638

[<807d383c>] (start_kernel) from [<10008074>] (0x10008074)

flexcan 2090000.flexcan can0: writing ctrl=0x0729a055

BUG: scheduling while atomic: swapper/0/0/0x00000100

1 lock held by swapper/0/0:

#0: (((&priv->restart_timer))){+.-...}, at: [<8002fe60>] call_timer_fn+0x0/0xe8

Modules linked in:

CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.15.7 #5

Backtrace:

[<80011ba4>] (dump_backtrace) from [<80011d40>] (show_stack+0x18/0x1c)

r6:00000000 r5:00000000 r4:00000000 r3:00000000

[<80011d28>] (show_stack) from [<805de120>] (dump_stack+0x88/0xa4)

[<805de098>] (dump_stack) from [<805db3ec>] (__schedule_bug+0x64/0x7c)

r5:80867268 r4:80867268

[<805db388>] (__schedule_bug) from [<805e09b0>] (__schedule+0x540/0x5f8)

r4:bf7c7b00 r3:00000002

[<805e0470>] (__schedule) from [<805e0b5c>] (schedule+0x38/0x88)

r10:8085c000 r9:00000000 r8:00000001 r7:00000000 r6:00004e20 r5:00000000

r4:00002710

[<805e0b24>] (schedule) from [<805e03c4>] (schedule_hrtimeout_range_clock+0xe0/0x158)

[<805e02e4>] (schedule_hrtimeout_range_clock) from [<805e0450>] (schedule_hrtimeout_range+0x14/0x18)

r10:808644c4 r8:bf335e00 r7:00000000 r6:c09b8000 r5:c09b8000 r4:bf335800

[<805e043c>] (schedule_hrtimeout_range) from [<8002fb58>] (usleep_range+0x50/0x58)

[<8002fb08>] (usleep_range) from [<803f2c50>] (flexcan_chip_start+0x368/0x440)

[<803f28e8>] (flexcan_chip_start) from [<803f3994>] (flexcan_set_mode+0x28/0x54)

r10:bf335800 r9:803f1680 r8:803f1680 r7:00000100 r6:00000001 r5:bf335800

r4:bf335800

[<803f396c>] (flexcan_set_mode) from [<803f1718>] (can_restart+0x98/0xd4)

r5:8085c000 r4:bf335800

[<803f1680>] (can_restart) from [<8002fed0>] (call_timer_fn+0x70/0xe8)

r5:8085c000 r4:8085de00

[<8002fe60>] (call_timer_fn) from [<80030810>] (run_timer_softirq+0x1b0/0x24c)

r10:8085e0c0 r8:bf335800 r7:00000000 r6:8085de40 r5:808b3500 r4:bf335e74

[<80030660>] (run_timer_softirq) from [<8002a1bc>] (__do_softirq+0x128/0x26c)

r10:00000001 r9:00000100 r8:00000001 r7:8085c000 r6:8085e080 r5:8085e084

r4:00000000

[<8002a094>] (__do_softirq) from [<8002a5f8>] (irq_exit+0xb0/0x104)

r10:805e82a8 r9:8085c000 r8:00000000 r7:f4000100 r6:00000000 r5:0000001d

r4:8085c000

[<8002a548>] (irq_exit) from [<8000f31c>] (handle_IRQ+0x44/0x9c)

r4:80858ffc r3:00000182

[<8000f2d8>] (handle_IRQ) from [<80008540>] (gic_handle_irq+0x30/0x64)

r6:8085df20 r5:80864a98 r4:f400010c r3:000000a0

[<80008510>] (gic_handle_irq) from [<80012864>] (__irq_svc+0x44/0x58)

Exception stack(0x8085df20 to 0x8085df68)

df20: 00000001 00000001 00000000 80867268 8085c000 80864588 80864538 8085c000

df40: 00000000 8085c000 805e82a8 8085df74 8085df38 8085df68 800633f0 8000f6a0

df60: 200e0013 ffffffff

r7:8085df54 r6:ffffffff r5:200e0013 r4:8000f6a0

[<8000f678>] (arch_cpu_idle) from [<8005c72c>] (cpu_startup_entry+0xfc/0x160)

[<8005c630>] (cpu_startup_entry) from [<805d81e8>] (rest_init+0xb0/0xd8)

r7:808045f8 r3:00000000

[<805d8138>] (rest_init) from [<807d3b4c>] (start_kernel+0x310/0x374)

r5:808b22c0 r4:80864638

[<807d383c>] (start_kernel) from [<10008074>] (0x10008074)


Does anyone know of a fix for this?

1 Solution
lsirobway
Contributor II

I have tested the bug fix (simply replacing all calls to usleep_range(x,y) in flexcan.c with udelay(x)) and it solves the problem.

This thread can be closed.
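
To illustrate why that substitution helps, here is a minimal userspace sketch in C. It is not the driver code and not the upstream patch: `in_atomic_context`, `mock_usleep_range()`, `mock_udelay()`, the two `chip_start_*` functions and `run_from_timer()` are all hypothetical stand-ins, and the 10/20 microsecond values are only illustrative. The idea it models is that a sleeping delay has to hand the CPU to the scheduler, which is not allowed inside a timer callback, while a busy-wait delay simply spins.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for "running in atomic (softirq/timer) context". */
static bool in_atomic_context;

/* Counts simulated "scheduling while atomic" reports. */
static int scheduling_bugs;

/* Microseconds elapsed between two CLOCK_MONOTONIC samples. */
static long elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000L +
           (b->tv_nsec - a->tv_nsec) / 1000L;
}

/* Mock of a sleeping delay: it needs the scheduler, so it must not be
 * used in atomic context. Here that case is only counted, not fatal. */
static void mock_usleep_range(unsigned long min_us, unsigned long max_us)
{
    assert(min_us <= max_us);
    if (in_atomic_context) {
        scheduling_bugs++;
        return;
    }
    struct timespec ts = {
        .tv_sec = (time_t)(min_us / 1000000UL),
        .tv_nsec = (long)(min_us % 1000000UL) * 1000L,
    };
    nanosleep(&ts, NULL);
}

/* Mock of a busy-wait delay: spins on the clock and never sleeps,
 * so it works in any context, at the cost of burning CPU time. */
static void mock_udelay(unsigned long us)
{
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (elapsed_us(&start, &now) < (long)us);
}

/* Chip start using a sleeping delay (the pattern that triggers the BUG). */
void chip_start_sleeping(void)
{
    mock_usleep_range(10, 20);
}

/* Chip start using the busy-wait replacement described in the post. */
void chip_start_busywait(void)
{
    mock_udelay(10);
}

/* Runs fn the way a timer callback would: with atomic context set.
 * Returns how many simulated scheduling bugs fn caused. */
int run_from_timer(void (*fn)(void))
{
    int before = scheduling_bugs;

    in_atomic_context = true;
    fn();
    in_atomic_context = false;
    return scheduling_bugs - before;
}
```

In this model, `run_from_timer(chip_start_sleeping)` returns 1 and `run_from_timer(chip_start_busywait)` returns 0. The trade-off is that a busy-wait holds the CPU for the whole delay, so it only makes sense for delays of a few microseconds.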

12 Replies
lsirobway
Contributor II

I should have added that the kernel version is 3.15.7 and the root filesystem is Yocto 1.6.1.

fabio_estevam
NXP Employee

Keith, do 3.16.2 or 3.17-rc4 show the same problem?

It would be nice to report this to linux-can.

You should also Cc the people listed by:

./scripts/get_maintainer.pl -f drivers/net/can/flexcan.c

lsirobway
Contributor II

I have reported this bug to the FlexCAN maintainers and they replied that they have already developed a patch which will be added to mainline shortly, and then backported to stable versions.

fabio_estevam
NXP Employee

Excellent!

lsirobway
Contributor II

OK, I have reported this to the maintainers.

fabio_estevam
NXP Employee

To which mailing list have you reported this issue?

lsirobway
Contributor II

linux-can@vger.kernel.org and all of the email addresses returned by ./scripts/get_maintainer.pl -f drivers/net/can/flexcan.c as suggested in your post.

alejandrolozan1
NXP Employee

Hi,

Have you tried kernel 3.10.17? I have not seen this issue with the L3.10.17_1.0.0_IMX6QDLS_BUNDLE.

https://www.freescale.com/webapp/Download?colCode=L3.10.17_1.0.0_IMX6QDLS_BUNDLE&appType=license&loc...

Best Regards,

Alejandro

lsirobway
Contributor II

No, I have only tried kernel version 3.15.7. Are you suggesting that Freescale introduced a scheduling bug after getting the FlexCAN driver to work in 3.10.17?

I have narrowed the problem down to the restart timeout that follows entry into the BUSOFF state. My board is not yet physically connected to a CAN bus, so the controller goes into BUSOFF. In normal operation, the driver should automatically leave BUSOFF once the restart timeout expires. I can vary the timeout period and, no matter how long it is, the BUG is triggered as soon as the timeout expires.

So a short-term workaround is to keep the controller from going into BUSOFF, but this is not a viable real-world solution.
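
The `0x00000100` at the end of the BUG line is the preempt count at the moment of the offending schedule() call, and it is consistent with the restart timer running in softirq context. Below is a small userspace decoder in C as a sketch. The field widths (8 preempt bits, 8 softirq bits, 4 hardirq bits) are assumed to match kernels of this era, so check them against the preempt headers of your own tree. The struct and function names are made up for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed field widths of the preempt count (3.x-era layout). */
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_BITS 4

#define SOFTIRQ_SHIFT PREEMPT_BITS
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define FIELD_MASK(bits) ((1UL << (bits)) - 1UL)

/* Hypothetical holder for the decoded fields. */
struct preempt_fields {
    unsigned long preempt_depth; /* preemption-disable nesting depth */
    unsigned long softirq;       /* softirq field; lowest bit = serving a softirq */
    unsigned long hardirq;       /* hard interrupt nesting */
};

/* Splits a raw preempt count into its three fields. */
struct preempt_fields decode_preempt_count(unsigned long count)
{
    struct preempt_fields f;

    f.preempt_depth = count & FIELD_MASK(PREEMPT_BITS);
    f.softirq = (count >> SOFTIRQ_SHIFT) & FIELD_MASK(SOFTIRQ_BITS);
    f.hardirq = (count >> HARDIRQ_SHIFT) & FIELD_MASK(HARDIRQ_BITS);
    return f;
}

/* Any non-zero field means sleeping is not allowed. */
bool is_atomic(unsigned long count)
{
    struct preempt_fields f = decode_preempt_count(count);

    return f.preempt_depth != 0 || f.softirq != 0 || f.hardirq != 0;
}

/* Prints the decoded fields of one preempt count value. */
void print_preempt_count(unsigned long count)
{
    struct preempt_fields f = decode_preempt_count(count);

    printf("0x%08lx: preempt=%lu softirq=%lu hardirq=%lu atomic=%s\n",
           count, f.preempt_depth, f.softirq, f.hardirq,
           is_atomic(count) ? "yes" : "no");
}
```

For the value in the trace, `decode_preempt_count(0x00000100UL)` gives a preempt depth of 0, a softirq field of 1 and a hardirq field of 0: not inside a hard interrupt handler, but serving a softirq, which matches the run_timer_softirq frame in the backtrace.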

fabio_estevam
NXP Employee

3.15.7 is not maintained by FSL. If you use this version, you should report the bug to the CAN developers, as I suggested previously.

lsirobway
Contributor II

OK, so putting together all the answers in this thread, it seems that:

  • 3.10.17 might not have a problem
  • 3.15.7 definitely has a problem, but it is not maintained by FSL
  • 3.16.2 and 3.17-rc4 are maintained by FSL, and may or may not have a problem

Is this a fair summary of the problem? If so, I will build all of these versions and test with each of them.
