Hello Team,
I have a custom board based on the i.MX6, running Yocto Dunfell. The Linux kernel version used in the product is 5.4.161. An 8 GB eMMC is connected to SDHCI2.
During power-on, init mounted a couple of partitions from the eMMC and then reported a failure while attempting to mount a few more. We have observed this failure only a couple of times, and currently we do not have any known steps to reproduce it. In the debug logs, there is an SDHCI register dump followed by a CPU stall.
When I searched the internet for this kind of issue, I found a patch from the NXP team that looks related. This patch is not present in the Linux kernel used in our product. The link to the patch is https://patchwork.kernel.org/project/linux-mmc/patch/1460741387-23815-10-git-send-email-aisheng.dong...
I have two questions:
1) Does the SDHCI register dump indicate the cause of the failure?
2) Is the patch mentioned in the above link required for the i.MX6 SDHCI controller?
Here is the SDHCI register dump and stack trace. I have also attached the full boot logs.
[ 27.967507] mmc2: Got data interrupt 0x00100000 even though no data operation was in progress.
[ 27.976123] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 27.982565] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000002
[ 27.989007] mmc2: sdhci: Blk size: 0x00000200 | Blk cnt: 0x000000b8
[ 27.995449] mmc2: sdhci: Argument: 0x03200101 | Trn mode: 0x00000033
[ 28.001891] mmc2: sdhci: Present: 0x01ed8008 | Host ctl: 0x00000030
[ 28.008331] mmc2: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 28.014772] mmc2: sdhci: Wake-up: 0x00000008 | Clock: 0x0000001f
[ 28.021213] mmc2: sdhci: Timeout: 0x0000008f | Int stat: 0x00000000
[ 28.027654] mmc2: sdhci: Int enab: 0x117f100b | Sig enab: 0x117f100b
[ 28.034094] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
[ 28.040535] mmc2: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000a000
[ 28.046976] mmc2: sdhci: Cmd: 0x0000061b | Max curr: 0x00ffffff
[ 28.053416] mmc2: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xfff6dbff
[ 28.059857] mmc2: sdhci: Resp[2]: 0x320f5903 | Resp[3]: 0x00000900
[ 28.066296] mmc2: sdhci: Host ctl2: 0x00000008
[ 28.070740] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
[ 28.077180] mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 28.084750] mmc2: sdhci-esdhc-imx: cmd debug status: 0x2120
[ 28.090409] mmc2: sdhci-esdhc-imx: data debug status: 0x2200
[ 28.096155] mmc2: sdhci-esdhc-imx: trans debug status: 0x2300
[ 28.101988] mmc2: sdhci-esdhc-imx: dma debug status: 0x24e0
[ 28.107647] mmc2: sdhci-esdhc-imx: adma debug status: 0x2500
[ 28.113393] mmc2: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 28.119140] mmc2: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 28.125407] mmc2: sdhci: ============================================
[ 32.472944] usb_otg_vbus: disabling
[ 32.476439] CAN XCVR: disabling
[ 32.482406] DA9063_LDO7: disabling
[ 32.489856] DA9063_LDO8: disabling
[ 32.497283] DA9063_LDO9: disabling
[ 32.508828] DA9063_LDO5: disabling
[ 32.516257] DA9063_LDO6: disabling
[ 32.523683] DA9063_LDO10: disabling
[ 49.132870] rcu: INFO: rcu_sched self-detected stall on CPU
[ 49.138454] rcu: 3-....: (2099 ticks this GP) idle=a62/1/0x40000002 softirq=461/461 fqs=1050
[ 49.147068] (t=2100 jiffies g=-587 q=2)
[ 49.150992] NMI backtrace for cpu 3
[ 49.154487] CPU: 3 PID: 446 Comm: kworker/3:2H Tainted: G O 5.4.161-ts+imx-2.3.0-arc-proteus+g5e0f69a172c8 #1
[ 49.165702] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 49.172245] Workqueue: kblockd blk_mq_run_work_fn
[ 49.176973] [<80110300>] (unwind_backtrace) from [<8010b480>] (show_stack+0x10/0x14)
[ 49.184731] [<8010b480>] (show_stack) from [<80a33b40>] (dump_stack+0xc0/0xdc)
[ 49.191966] [<80a33b40>] (dump_stack) from [<80a14b38>] (nmi_cpu_backtrace+0x84/0xbc)
[ 49.199804] [<80a14b38>] (nmi_cpu_backtrace) from [<80a14c50>] (nmi_trigger_cpumask_backtrace+0xe0/0x134)
[ 49.209381] [<80a14c50>] (nmi_trigger_cpumask_backtrace) from [<80a269a0>] (rcu_dump_cpu_stacks+0x98/0xd0)
[ 49.219047] [<80a269a0>] (rcu_dump_cpu_stacks) from [<8018d220>] (rcu_sched_clock_irq+0x658/0x874)
[ 49.228016] [<8018d220>] (rcu_sched_clock_irq) from [<801939dc>] (update_process_times+0x2c/0x60)
[ 49.236894] [<801939dc>] (update_process_times) from [<801a5ac4>] (tick_sched_timer+0x5c/0xc0)
[ 49.245511] [<801a5ac4>] (tick_sched_timer) from [<80194a58>] (__hrtimer_run_queues+0x130/0x1cc)
[ 49.254302] [<80194a58>] (__hrtimer_run_queues) from [<801952d8>] (hrtimer_interrupt+0x12c/0x2dc)
[ 49.263181] [<801952d8>] (hrtimer_interrupt) from [<8010f398>] (twd_handler+0x2c/0x38)
[ 49.271107] [<8010f398>] (twd_handler) from [<80180524>] (handle_percpu_devid_irq+0x98/0x14c)
[ 49.279638] [<80180524>] (handle_percpu_devid_irq) from [<8017a528>] (generic_handle_irq+0x20/0x34)
[ 49.288687] [<8017a528>] (generic_handle_irq) from [<8017ab40>] (__handle_domain_irq+0x64/0xdc)
[ 49.297395] [<8017ab40>] (__handle_domain_irq) from [<804a24d0>] (gic_handle_irq+0x48/0x9c)
[ 49.305753] [<804a24d0>] (gic_handle_irq) from [<80101aac>] (__irq_svc+0x6c/0x90)
[ 49.313236] Exception stack(0xd8bb5c60 to 0xd8bb5ca8)
[ 49.318292] 5c60: fe8d8008 00000024 807e8ee4 00000000 d8749380 d8749000 d8749800 d8749000
[ 49.326472] 5c80: 00000000 00000000 00000000 00007e00 00000000 d8bb5cb0 807e8f20 807f0468
[ 49.334648] 5ca0: a0010013 ffffffff
[ 49.338145] [<80101aac>] (__irq_svc) from [<807f0468>] (esdhc_readl_le+0x20/0x18c)
[ 49.345725] [<807f0468>] (esdhc_readl_le) from [<807e8f20>] (sdhci_card_busy+0x3c/0x4c)
[ 49.353740] [<807e8f20>] (sdhci_card_busy) from [<807dc760>] (__mmc_switch+0x1c4/0x3d4)
[ 49.361750] [<807dc760>] (__mmc_switch) from [<807dca0c>] (mmc_flush_cache+0x68/0x94)
[ 49.369586] [<807dca0c>] (mmc_flush_cache) from [<807db2ac>] (_mmc_hw_reset+0x14/0x90)
[ 49.377507] [<807db2ac>] (_mmc_hw_reset) from [<807d6100>] (mmc_hw_reset+0x64/0x170)
[ 49.385255] [<807d6100>] (mmc_hw_reset) from [<807e6658>] (mmc_blk_reset+0x2c/0x120)
[ 49.393003] [<807e6658>] (mmc_blk_reset) from [<807e69d4>] (mmc_blk_mq_rw_recovery+0x288/0x3f8)
[ 49.401707] [<807e69d4>] (mmc_blk_mq_rw_recovery) from [<807e6bc4>] (mmc_blk_mq_complete_prev_req.part.0+0x80/0x23c)
[ 49.412233] [<807e6bc4>] (mmc_blk_mq_complete_prev_req.part.0) from [<807e6df4>] (mmc_blk_rw_wait+0x74/0x130)
[ 49.422151] [<807e6df4>] (mmc_blk_rw_wait) from [<807e7934>] (mmc_blk_mq_issue_rq+0x2d4/0x93c)
[ 49.430768] [<807e7934>] (mmc_blk_mq_issue_rq) from [<807e82ec>] (mmc_mq_queue_rq+0x130/0x268)
[ 49.439388] [<807e82ec>] (mmc_mq_queue_rq) from [<8042af10>] (blk_mq_dispatch_rq_list+0xbc/0x6b8)
[ 49.448269] [<8042af10>] (blk_mq_dispatch_rq_list) from [<8042ffa0>] (blk_mq_sched_dispatch_requests+0x104/0x1a4)
[ 49.458536] [<8042ffa0>] (blk_mq_sched_dispatch_requests) from [<80428be0>] (__blk_mq_run_hw_queue+0xe4/0x1a4)
[ 49.468549] [<80428be0>] (__blk_mq_run_hw_queue) from [<8014cacc>] (process_one_work+0x1d8/0x430)
[ 49.477428] [<8014cacc>] (process_one_work) from [<8014cd54>] (worker_thread+0x30/0x558)
[ 49.485525] [<8014cd54>] (worker_thread) from [<8015247c>] (kthread+0x108/0x144)
[ 49.492927] [<8015247c>] (kthread) from [<801010e8>] (ret_from_fork+0x14/0x2c)
[ 49.500148] Exception stack(0xd8bb5fb0 to 0xd8bb5ff8)
[ 49.505201] 5fa0: 00000000 00000000 00000000 00000000
[ 49.513381] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 49.521560] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 112.162869] rcu: INFO: rcu_sched self-detected stall on CPU
[ 112.168451] rcu: 3-....: (8364 ticks this GP) idle=a62/1/0x40000002 softirq=461/461 fqs=4183
[ 112.177062] (t=8404 jiffies g=-587 q=3)
[ 112.180985] NMI backtrace for cpu 3
[ 112.184479] CPU: 3 PID: 446 Comm: kworker/3:2H Tainted: G O 5.4.161-ts+imx-2.3.0-arc-proteus+g5e0f69a172c8 #1
[ 112.195693] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 112.202231] Workqueue: kblockd blk_mq_run_work_fn
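As a reading aid for the dump above, the interrupt value, the Cmd field and the Argument field can be decoded against the standard bit layouts. This is only a minimal sketch: the interrupt bit names are taken from the mainline kernel's drivers/mmc/host/sdhci.h, the CMD6 argument layout is the one the kernel uses when it builds a SWITCH command, and the helper names (decode_int_status, decode_command, decode_switch_arg) are made up for illustration. It assumes the values printed by the driver follow the standard SDHCI layout, and it does not by itself identify the root cause.

```python
# Interrupt status bits as defined in the mainline kernel's sdhci.h.
SDHCI_INT_BITS = [
    (0x00000001, "RESPONSE"),
    (0x00000002, "DATA_END"),
    (0x00000004, "BLK_GAP"),
    (0x00000008, "DMA_END"),
    (0x00000010, "SPACE_AVAIL"),
    (0x00000020, "DATA_AVAIL"),
    (0x00000040, "CARD_INSERT"),
    (0x00000080, "CARD_REMOVE"),
    (0x00000100, "CARD_INT"),
    (0x00001000, "RETUNE"),
    (0x00004000, "CQE"),
    (0x00008000, "ERROR"),
    (0x00010000, "TIMEOUT"),
    (0x00020000, "CRC"),
    (0x00040000, "END_BIT"),
    (0x00080000, "INDEX"),
    (0x00100000, "DATA_TIMEOUT"),
    (0x00200000, "DATA_CRC"),
    (0x00400000, "DATA_END_BIT"),
    (0x00800000, "BUS_POWER"),
    (0x01000000, "AUTO_CMD_ERR"),
    (0x02000000, "ADMA_ERROR"),
]


def decode_int_status(value):
    """Return the names of all interrupt bits set in value."""
    return [name for mask, name in SDHCI_INT_BITS if value & mask]


def decode_command(cmd_reg):
    """Split the 16-bit SDHCI Command register into (index, flags).

    Bits 13:8 hold the command index, bits 7:0 the response/check flags.
    """
    return (cmd_reg >> 8) & 0x3F, cmd_reg & 0xFF


def decode_switch_arg(arg):
    """Split a CMD6 (SWITCH) argument into its fields.

    Layout: access mode in bits 25:24, EXT_CSD byte index in 23:16,
    value in 15:8, command set in 2:0.
    """
    return {
        "access": (arg >> 24) & 0x3,
        "index": (arg >> 16) & 0xFF,
        "value": (arg >> 8) & 0xFF,
        "cmd_set": arg & 0x7,
    }


if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_int_status(0x00100000))
    print(decode_command(0x061B))
    print(decode_switch_arg(0x03200101))
```

With the values from the log, the interrupt decodes to DATA_TIMEOUT, the Cmd field to command index 6 (SWITCH), and the argument to a write-byte access to EXT_CSD byte 32 with value 1. In the kernel's mmc.h, byte 32 is EXT_CSD_FLUSH_CACHE, which is consistent with mmc_flush_cache and __mmc_switch appearing in the backtrace.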
Thanks for your time.
I see that you are using 5.4.161, which is not a version released by NXP. What is the result with our released BSP: