Deadlock on i.MX6UL running NXP Linux kernel 4.1.15_2.0.1


felixradensky
Contributor IV

Hello,

I'm experiencing a system freeze on a Variscite i.MX6UL SOM running the NXP 4.1.15_2.0.1 kernel. The easiest way to reproduce the problem is to boot with the init=/bin/bash kernel argument and leave the system idle for several minutes. Enabling lockdep in the kernel reveals the following circular locking dependency in the NXP runtime-PM (busfreq) code:

[ INFO: possible circular locking dependency detected ]
4.1.15-224489-gfae15ad #11 Not tainted
-------------------------------------------------------
kworker/0:1/19 is trying to acquire lock:
 (bus_freq_mutex){+.+...}, at: [<80023eb8>] reduce_bus_freq_handler+0x18/0x28

but task is already holding lock:
 ((&(&low_bus_freq_handler)->work)){+.+...}, at: [<80040e84>] process_one_work+0xf4/0x44c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 ((&(&low_bus_freq_handler)->work)){+.+...}:
       [<80041cb8>] __cancel_work_timer+0x8c/0x1b4
       [<8002539c>] request_bus_freq+0xa4/0x148
       [<802c9808>] mxsfb_runtime_resume+0xc/0x14
       [<8032f64c>] __rpm_callback+0x2c/0x60
       [<8032f6a0>] rpm_callback+0x20/0x80
       [<803304f8>] rpm_resume+0x360/0x530
       [<8033072c>] __pm_runtime_resume+0x64/0x90
       [<802cb114>] mxsfb_probe+0x698/0x9e4
       [<8032a160>] platform_drv_probe+0x44/0xac
       [<803289fc>] driver_probe_device+0x174/0x2b0
       [<80328c10>] __driver_attach+0x94/0x98
       [<80326f1c>] bus_for_each_dev+0x68/0x9c
       [<8032815c>] bus_add_driver+0x148/0x1f0
       [<80329204>] driver_register+0x78/0xf8
       [<80009670>] do_one_initcall+0x8c/0x1d8
       [<80763d40>] kernel_init_freeable+0x124/0x1c4
       [<80563e3c>] kernel_init+0x8/0xe8
       [<8000f3f8>] ret_from_fork+0x14/0x3c

-> #0 (bus_freq_mutex){+.+...}:
       [<80568f0c>] mutex_lock_nested+0x68/0x408
       [<80023eb8>] reduce_bus_freq_handler+0x18/0x28
       [<80040ef4>] process_one_work+0x164/0x44c
       [<80041228>] worker_thread+0x4c/0x49c
       [<800469a0>] kthread+0xe8/0x100
       [<8000f3f8>] ret_from_fork+0x14/0x3c

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((&(&low_bus_freq_handler)->work));
                               lock(bus_freq_mutex);
                               lock((&(&low_bus_freq_handler)->work));
  lock(bus_freq_mutex);

 *** DEADLOCK ***

2 locks held by kworker/0:1/19:
 #0:  ("events"){.+.+.+}, at: [<80040e84>] process_one_work+0xf4/0x44c
 #1:  ((&(&low_bus_freq_handler)->work)){+.+...}, at: [<80040e84>] process_one_work+0xf4/0x44c

With CONFIG_PM disabled the problem is not reproducible. I can also reproduce the freeze on a Variscite i.MX6Q SOM running the same kernel version.

I'd appreciate any feedback on solving the problem.

Felix.

3 Replies

thomaslinder
Contributor II

Hi guys,

Anything new on this topic?

I see this behavior on the i.MX6 SoloX as well, so yet another processor is affected.

The workaround

echo 0 > /sys/devices/soc0/soc/soc:busfreq/enable

works. What exactly does this workaround do? Is it possible to run a system stably with it?

Is there any other solution?

Kernel version 4.9.123 (rev 6a71cbc089755afd6a86c005c22a1af6eab24a70) shows the same problem as 4.1.15.

Thanks, guys!

cheers

tom

davemcmordie
Contributor III

I am seeing virtually identical behaviour on 3.14.38 (trace below).  This suggests that this bug has been with us for a while now.  Thanks for the workaround.

======================================================
[ INFO: possible circular locking dependency detected ]
3.14.38+ #4 Not tainted
-------------------------------------------------------
kworker/0:3/702 is trying to acquire lock:
(bus_freq_mutex){+.+...}, at: [<80029480>] reduce_bus_freq_handler+0x20/0x30

but task is already holding lock:
((&(&low_bus_freq_handler)->work)){+.+...}, at: [<80049f74>] process_one_work+0x128/0x46c

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 ((&(&low_bus_freq_handler)->work)){+.+...}:
[<80049660>] flush_work+0x44/0x280
[<8004a8fc>] __cancel_work_timer+0x9c/0x1c8
[<8004aa54>] cancel_delayed_work_sync+0x14/0x18
[<8002af94>] request_bus_freq+0xac/0x19c
[<80416870>] gpmi_runtime_resume+0x24/0x2c
[<80394244>] pm_generic_runtime_resume+0x34/0x40
[<803983a8>] __rpm_callback+0x34/0x68
[<80398404>] rpm_callback+0x28/0x7c
[<803993b8>] rpm_resume+0x3b0/0x584
[<803995f8>] __pm_runtime_resume+0x6c/0x98
[<80418868>] gpmi_init+0x1c/0x140
[<804178f4>] gpmi_nand_probe+0x1ec/0x89c
[<803909e4>] platform_drv_probe+0x4c/0xac
[<8038ef40>] driver_probe_device+0x128/0x268
[<8038f168>] __driver_attach+0x9c/0xa0
[<8038d2dc>] bus_for_each_dev+0x68/0x9c
[<8038e9ac>] driver_attach+0x24/0x28
[<8038e624>] bus_add_driver+0x150/0x1f8
[<8038f7d0>] driver_register+0x80/0x100
[<80390928>] __platform_driver_register+0x5c/0x64
[<809d1428>] gpmi_nand_driver_init+0x1c/0x20
[<80008944>] do_one_initcall+0x110/0x160
[<80996d18>] kernel_init_freeable+0x148/0x1e8
[<807037b4>] kernel_init+0x10/0xf0
[<8000e848>] ret_from_fork+0x14/0x2c

-> #0 (bus_freq_mutex){+.+...}:
[<800722a4>] lock_acquire+0x68/0x7c
[<8070cd14>] mutex_lock_nested+0x78/0x464
[<80029480>] reduce_bus_freq_handler+0x20/0x30
[<80049ff0>] process_one_work+0x1a4/0x46c
[<8004aed8>] worker_thread+0x140/0x3f8
[<8005124c>] kthread+0xdc/0xf0
[<8000e848>] ret_from_fork+0x14/0x2c

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((&(&low_bus_freq_handler)->work));
                               lock(bus_freq_mutex);
                               lock((&(&low_bus_freq_handler)->work));
  lock(bus_freq_mutex);

*** DEADLOCK ***

2 locks held by kworker/0:3/702:
#0: ("events"){.+.+.+}, at: [<80049f74>] process_one_work+0x128/0x46c
#1: ((&(&low_bus_freq_handler)->work)){+.+...}, at: [<80049f74>] process_one_work+0x128/0x46c

stack backtrace:
CPU: 0 PID: 702 Comm: kworker/0:3 Not tainted 3.14.38+ #4
Workqueue: events reduce_bus_freq_handler
Backtrace:
[<80012254>] (dump_backtrace) from [<8001246c>] (show_stack+0x18/0x1c)
r6:00000000 r5:bd4ae038 r4:80a1cdd8 r3:bcf6ce00
[<80012454>] (show_stack) from [<80709d68>] (dump_stack+0x94/0xcc)
[<80709cd4>] (dump_stack) from [<80707240>] (print_circular_bug+0x300/0x30c)
r6:80b2eda0 r5:80b2ec90 r4:80b2ec90 r3:bcf6ce00
[<80706f40>] (print_circular_bug) from [<8007140c>] (__lock_acquire+0x16c0/0x1d74)
r10:00000002 r9:80b167b0 r8:80fb3784 r7:bcf6d290 r6:bcf6d290 r5:80f67464
r4:bcf6d278 r3:bcf6d278
[<8006fd4c>] (__lock_acquire) from [<800722a4>] (lock_acquire+0x68/0x7c)
r10:80a0aaec r9:80029480 r8:bcf6ce00 r7:00000001 r6:600e0013 r5:bd4ae000
r4:00000000
[<8007223c>] (lock_acquire) from [<8070cd14>] (mutex_lock_nested+0x78/0x464)
r7:bd4ae018 r6:80f67464 r5:00000000 r4:bd4ae000
[<8070cc9c>] (mutex_lock_nested) from [<80029480>] (reduce_bus_freq_handler+0x20/0x30)
r10:00000000 r9:bf7bfd00 r8:00000000 r7:bd4afea8 r6:bf7bc780 r5:80a7772c
r4:80a0aaec
[<80029460>] (reduce_bus_freq_handler) from [<80049ff0>] (process_one_work+0x1a4/0x46c)
r4:bd41ea80 r3:80029460
[<80049e4c>] (process_one_work) from [<8004aed8>] (worker_thread+0x140/0x3f8)
r10:bf7bc780 r9:bd4ae000 r8:bd4ae028 r7:bd4ae000 r6:bd41ea98 r5:bf7bc7b0
r4:bd41ea80
[<8004ad98>] (worker_thread) from [<8005124c>] (kthread+0xdc/0xf0)
r10:00000000 r9:00000000 r8:00000000 r7:8004ad98 r6:bd41ea80 r5:bcc112c0
r4:00000000
[<80051170>] (kthread) from [<8000e848>] (ret_from_fork+0x14/0x2c)
r7:00000000 r6:00000000 r5:80051170 r4:bcc112c0

felixradensky
Contributor IV

As a workaround, one can disable bus frequency scaling like this:

echo "0 1" > /sys/devices/platform/soc/soc:busfreq/enable

This works around the freeze I reported earlier, but fixing the code to avoid the deadlock would be better.
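If the workaround is acceptable for your product, it can be applied on every boot. A sketch assuming a systemd-based root filesystem (the unit name is made up, and the sysfs path and value vary by kernel version, as the commands in this thread show: older kernels expose /sys/devices/soc0/soc/soc:busfreq/enable):

```ini
# /etc/systemd/system/disable-busfreq.service  (hypothetical unit name)
[Unit]
Description=Disable i.MX busfreq scaling (deadlock workaround)

[Service]
Type=oneshot
# Adjust the sysfs path and value to match your kernel version.
ExecStart=/bin/sh -c 'echo 0 > /sys/devices/platform/soc/soc:busfreq/enable'

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable disable-busfreq.service`; on non-systemd rootfs images the same echo line can go in a boot script instead.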

Felix.
