We have a custom board based on the i.MX6Q processor, and we are observing random kernel crashes on kernel version 4.14.336. The crash dump follows:
Sep 10 09:39:52 user.emerg kernel: BUG: spinlock already unlocked on CPU#1, sh/8911
Sep 10 09:39:52 user.emerg kernel: lock: rcu_preempt_state+0x0/0x300, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
Sep 10 09:39:52 user.warn kernel: CPU: 1 PID: 8911 Comm: sh Tainted: G O 4.14.336 #1
Sep 10 09:39:52 user.warn kernel: Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Sep 10 09:39:52 user.warn kernel: [<8010ed8c>] (unwind_backtrace) from [<8010b598>] (show_stack+0x10/0x14)
Sep 10 09:39:52 user.warn kernel: [<8010b598>] (show_stack) from [<80610e84>] (dump_stack+0x84/0x98)
Sep 10 09:39:52 user.warn kernel: [<80610e84>] (dump_stack) from [<801663e0>] (do_raw_spin_unlock+0xc0/0x11c)
Sep 10 09:39:52 user.warn kernel: [<801663e0>] (do_raw_spin_unlock) from [<806175fc>] (_raw_spin_unlock_irqrestore+0xc/0x44)
Sep 10 09:39:52 user.warn kernel: [<806175fc>] (_raw_spin_unlock_irqrestore) from [<8017a2c0>] (note_gp_changes+0x70/0x98)
Sep 10 09:39:52 user.warn kernel: [<8017a2c0>] (note_gp_changes) from [<8017a610>] (rcu_process_callbacks+0xb4/0x4d4)
Sep 10 09:39:52 user.warn kernel: [<8017a610>] (rcu_process_callbacks) from [<801015f8>] (__do_softirq+0xd8/0x230)
Sep 10 09:39:52 user.warn kernel: [<801015f8>] (__do_softirq) from [<8012abe4>] (irq_exit+0xbc/0x104)
Sep 10 09:39:52 user.warn kernel: [<8012abe4>] (irq_exit) from [<8016b818>] (__handle_domain_irq+0x80/0xe8)
Sep 10 09:39:52 user.warn kernel: [<8016b818>] (__handle_domain_irq) from [<801014d8>] (gic_handle_irq+0x4c/0x90)
Sep 10 09:39:52 user.warn kernel: [<801014d8>] (gic_handle_irq) from [<8010bfcc>] (__irq_svc+0x6c/0xa8)
Sep 10 09:39:52 user.warn kernel: Exception stack(0xa8d81dd0 to 0xa8d81e18)
Sep 10 09:39:52 user.warn kernel: 1dc0: 00000001 bfb6a9e0 00000002 00000002
Sep 10 09:39:52 user.warn kernel: 1de0: 00000001 ffffe000 801dbca8 00000000 309cf75f 000000e2 a94bb388 a8d81e34
Sep 10 09:39:52 user.warn kernel: 1e00: 00000000 a8d81e20 801dbca8 80148594 800f0113 ffffffff
Sep 10 09:39:52 user.warn kernel: [<8010bfcc>] (__irq_svc) from [<80148594>] (preempt_count_add+0x64/0x14c)
Sep 10 09:39:52 user.warn kernel: [<80148594>] (preempt_count_add) from [<801dbca8>] (__lru_cache_add+0x10/0xd4)
Sep 10 09:39:52 user.warn kernel: [<801dbca8>] (__lru_cache_add) from [<801f91b4>] (handle_mm_fault+0x4d4/0xa18)
Sep 10 09:39:52 user.warn kernel: [<801f91b4>] (handle_mm_fault) from [<801130a0>] (do_page_fault+0x114/0x390)
Sep 10 09:39:52 user.warn kernel: [<801130a0>] (do_page_fault) from [<80101328>] (do_DataAbort+0x50/0xe8)
Sep 10 09:39:52 user.warn kernel: [<80101328>] (do_DataAbort) from [<8010c35c>] (__dabt_usr+0x3c/0x40)
Sep 10 09:39:52 user.warn kernel: Exception stack(0xa8d81fb0 to 0xa8d81ff8)
Sep 10 09:39:52 user.warn kernel: 1fa0: 004e2630 00000000 2e028c8c 50a73ffc
Sep 10 09:39:52 user.warn kernel: 1fc0: 004d9398 004da008 00000000 00000000 00000000 00000000 004d9398 00000000
Sep 10 09:39:52 user.warn kernel: 1fe0: 004e2630 7ea5b370 00430af9 76e3af06 600f0030 ffffffff
Has anyone else seen a similar issue? We need pointers to identify the root cause.
We are seeing the issue on 80-90% of the boards (6-7 boards) in our test bed. The interval between crashes varies from 1 to 2 days. We have not run a DDR stress test, because this product has existed for a couple of years and we had not seen this issue before. We did not notice it with the previous kernel version, and we are re-verifying with 4.14.327 on the same test bed configuration. If you think a DDR stress test will help analyze the issue, we can run one.
Hello,
Is your board able to pass the DDR stress test? An overnight test could be helpful in this case.
How often do you see this crash on your boards? How many boards present this issue?
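If it helps to answer the frequency question with numbers, here is a minimal sketch that counts kernel BUG lines per day in a saved log. It assumes the syslog line format shown in the dump above ("Mon DD HH:MM:SS facility.level kernel: BUG: ..."); the function name `count_bugs_per_day` is only illustrative and not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "Sep 10 09:39:52 user.emerg kernel: BUG: spinlock already unlocked on CPU#1, sh/8911"
# Assumption: every log line starts with "Mon DD HH:MM:SS facility.level kernel: ".
BUG_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ kernel: BUG: (?P<msg>.+)$"
)


def count_bugs_per_day(lines):
    """Return a Counter mapping 'Mon DD' to the number of kernel BUG lines seen."""
    counts = Counter()
    for line in lines:
        match = BUG_RE.match(line.strip())
        if match:
            counts[match.group("date")] += 1
    return counts
```

Passing an open log file (or any iterable of lines) from each board to `count_bugs_per_day` gives a per-day tally, which can then be compared between 4.14.336 and 4.14.327 runs.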
Best regards.