SDHCI REGISTER DUMP

santhosh2 · 12-20-2019

Hi,

I have encountered the error below and have no idea what causes it. Please help me find the root cause.

I have implemented a watchdog timer to reboot the board automatically if the kernel hangs, but in the error condition below the processor took 10 minutes to reboot itself.

[107434.234408] mmc0: Timeout waiting for hardware interrupt.
[107434.239900] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[107434.246428] mmc0: sdhci: Sys addr: 0xbbbde51c | Version: 0x00000002
[107434.252955] mmc0: sdhci: Blk size: 0x00000004 | Blk cnt: 0x00000001
[107434.259481] mmc0: sdhci: Argument: 0x9410d004 | Trn mode: 0x00000003
[107434.266008] mmc0: sdhci: Present:   0x01d88008 | Host ctl: 0x00000013
[107434.272534] mmc0: sdhci: Power:     0x00000002 | Blk gap: 0x00000080
[107434.279060] mmc0: sdhci: Wake-up:   0x00000008 | Clock:    0x0000001f
[107434.285587] mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000103
[107434.292112] mmc0: sdhci: Int enab: 0x107f110b | Sig enab: 0x107f110b
[107434.298639] mmc0: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000502
[107434.305165] mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
[107434.311690] mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
[107434.318217] mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]: 0x00000000
[107434.324744] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]: 0x00000000
[107434.331268] mmc0: sdhci: Host ctl2: 0x00000088
[107434.335798] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x78089208
[107434.342323] mmc0: sdhci: ============================================
[107434.349298] AR6000: SDIO bus operation failed! MMC stack returned : -110
[107434.356191] __HIFReadWrite, addr:0X000868, len:00000004, Write, Sync
[107474.169657] mmc0: Timeout waiting for hardware interrupt.
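As a reading aid for the dump above: the -110 returned by the MMC stack is -ETIMEDOUT, which matches the "Timeout waiting for hardware interrupt" messages. The C sketch below decodes the "Int stat" value using the standard SDHCI interrupt-status bit positions, with names taken from the SDHCI_INT_* macros in the Linux sdhci.h header (prefix dropped). It assumes the i.MX uSDHC reports this register in the standard SDHCI layout; the function names are hypothetical helpers, not part of any driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Name of one bit of the SDHCI interrupt status register, using the
 * SDHCI_INT_* macro names from Linux sdhci.h without the prefix.
 * Returns NULL for bits this sketch does not name. */
const char *sdhci_int_bit_name(unsigned int bit)
{
    switch (bit) {
    case 0:  return "RESPONSE";      /* command complete */
    case 1:  return "DATA_END";      /* transfer complete */
    case 2:  return "BLK_GAP";
    case 3:  return "DMA_END";
    case 4:  return "SPACE_AVAIL";
    case 5:  return "DATA_AVAIL";
    case 6:  return "CARD_INSERT";
    case 7:  return "CARD_REMOVE";
    case 8:  return "CARD_INT";      /* SDIO card interrupt */
    case 15: return "ERROR";         /* summary bit: some error bit is set */
    case 16: return "TIMEOUT";       /* command timeout */
    case 17: return "CRC";
    case 18: return "END_BIT";
    case 19: return "INDEX";
    case 20: return "DATA_TIMEOUT";
    case 21: return "DATA_CRC";
    case 22: return "DATA_END_BIT";
    case 23: return "BUS_POWER";
    case 24: return "ACMD12ERR";
    case 25: return "ADMA_ERROR";
    default: return NULL;
    }
}

/* Store the names of the set, named bits of status in names[] (at most
 * max entries, lowest bit first) and return how many were stored. */
size_t sdhci_decode_int_status(uint32_t status, const char **names, size_t max)
{
    size_t n = 0;

    for (unsigned int bit = 0; bit < 32 && n < max; bit++) {
        const char *name;

        if (!(status & (UINT32_C(1) << bit)))
            continue;
        name = sdhci_int_bit_name(bit);
        if (name != NULL)
            names[n++] = name;
    }
    return n;
}
```

Under that assumption, 0x00000103 decodes to RESPONSE (command complete), DATA_END (transfer complete) and CARD_INT (SDIO card interrupt), with no error bit (bit 15) set.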

igorpadykov · 12-20-2019

Hi Santhosh

Such an issue may occur if, during reset, not all board power supplies are briefly turned off. One can check the reset schematic against the i.MX8M Mini EVK, where signal GPIO1_IO02 (as WDOG_B) is connected to the PMIC:

i.MX 8M Mini Evaluation Kit LPDDR4 Design Files

Best regards
igor

santhosh2 · 12-27-2019

Hi,

I have enabled the watchdog, but the watchdog timer does not kick in when the processor stalls with a particular error.

I have encountered the error below:

[88877.842909] Task dump for CPU 0:
[88877.846135] v4l2src1:src    R running task        0 4591      1 0x00000203
[88877.853186] Call trace:
[88877.855642] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88877.860783] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88877.865660] Task dump for CPU 3:
[88877.868885] swapper/3       R running task        0     0      1 0x00000002
[88877.875935] Call trace:
[88877.878381] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88877.883522] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88877.888748] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88877.894060] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88877.898937] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88877.904598] [<ffff000008090430>] secondary_start_kernel+0x110/0x120
[88939.984535] INFO: rcu_preempt detected stalls on CPUs/tasks:
[88939.990210]    0-...: (18 GPs behind) idle=2d2/140000000000000/0 softirq=1394806/1394807 fqs=252946
[88939.999167]    3-...: (14 GPs behind) idle=13a/140000000000000/0 softirq=274054/274054 fqs=252946
[88940.007950]    (detected by 2, t=556682 jiffies, g=635261, c=635260, q=20647)
[88940.014914] Task dump for CPU 0:
[88940.018140] v4l2src1:src    R running task        0 4591      1 0x00000203
[88940.025191] Call trace:
[88940.027646] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.032785] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88940.037661] Task dump for CPU 3:
[88940.040886] swapper/3       R running task        0     0      1 0x00000002
[88940.047934] Call trace:
[88940.050380] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.055519] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88940.060745] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88940.066058] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88940.070937] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88940.076598] [<ffff000008090430>] secondary_start_kernel+0x110/0x120
[88940.832520] INFO: rcu_sched detected stalls on CPUs/tasks:
[88940.838019]    0-...: (1 GPs behind) idle=2d2/140000000000000/0 softirq=1360205/1394807 fqs=189277
[88940.846890]    3-...: (1 GPs behind) idle=13a/140000000000000/0 softirq=268734/274054 fqs=189277
[88940.855586]    (detected by 1, t=414882 jiffies, g=183, c=182, q=30)
[88940.861769] Task dump for CPU 0:
[88940.864998] v4l2src1:src    R running task        0 4591      1 0x00000203
[88940.872048] Call trace:
[88940.874504] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.879644] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88940.884522] Task dump for CPU 3:
[88940.887749] swapper/3       R running task        0     0      1 0x00000002
[88940.894799] Call trace:
[88940.897245] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.902386] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88940.907612] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88940.912927] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88940.917805] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88940.923464] [<ffff000008090430>] secondary_start_kernel+0x110/0x120

I have no idea what the error is. The board is not able to recover by itself even though the watchdog timer is implemented.

How can I resolve this issue?

Regards

Santhosh

igorpadykov · 12-27-2019

Hi Santhosh

Could you clarify whether your board design implements a reset circuit that briefly powers off the whole board, as implemented on the i.MX8M Mini EVK with signal GPIO1_IO02 (as WDOG_B) connected to the PMIC?

Best regards
igor

santhosh2 · 12-30-2019

Hi igor

I am using the NXP evaluation kit (i.MX8M Mini), and I am facing this issue on the evaluation board itself.

Regards

Santhosh

igorpadykov · 12-30-2019

Hi Santhosh

Please check with an oscilloscope whether the i.MX8M Mini EVK signal GPIO1_IO02 (as WDOG_B), which is connected to the PMIC, is toggling.

Best regards
igor

santhosh2 · 01-02-2020

Hi Igor,

We put the scope on the WDOG_B test point TP53 and measured a constant 1 V. Is there any other test you want us to do to find the root cause?

igorpadykov · 01-02-2020

One can try the Linux shutdown command with the Demo Images from

i.MX Software and Development Tools | NXP

Best regards
igor

santhosh2 · 01-02-2020

Igor,

We have customized the image for our end application use case. We have built the entire application around it, tested all the features of the system, and are now running an endurance test in which the device runs for more than 96 hours without power-off. At this stage we don't want to use the NXP prebuilt image, because we might need to redo the customization and re-run all the testing carried out over many months, which in turn could delay our product delivery. Based on the captured logs, please let us know what tests we can do to find the root cause, since the processor watchdog timer also does not kick in to reboot the system when it hangs.

A code snippet of my watchdog timer is below:

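#include <stdio.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Assumed to be set up in main(): fd = open("/dev/watchdog", O_WRONLY); time(&time1); */
static int fd = -1;
static time_t time1, time2;
static int sec;

/*
 * This function simply sends an IOCTL to the driver, which in turn ticks
 * the watchdog to reset its internal timer so it doesn't trigger
 * a reset.
 */
static void keep_alive(void)
{
    int dummy = 0;
    int ret;

    ret = ioctl(fd, WDIOC_KEEPALIVE, &dummy);
    time(&time2);
    sec = (int)difftime(time2, time1);   /* seconds since time1 was taken */
    if (!ret)
        printf("sec:%d\n", sec);
    else
        perror("WDIOC_KEEPALIVE");
}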

Note: the keep_alive() function is called every 5 seconds from main() to reset the watchdog timer.
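For reference, a minimal sketch of the surrounding watchdog setup using the standard Linux watchdog ioctls (WDIOC_SETTIMEOUT, WDIOC_KEEPALIVE) and the "magic close" character. The device path, the 60 s timeout and the rule of kicking at half the timeout are example choices, not values taken from this thread; the helper names are hypothetical.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Kick at half the timeout so that one late wake-up does not cause a reset.
 * The 1/2 ratio is a choice made for this sketch, not a kernel requirement. */
int kick_interval_s(int timeout_s)
{
    int interval = timeout_s / 2;
    return interval > 0 ? interval : 1;
}

/* Open the watchdog device (this starts the timer on most drivers) and
 * request a timeout in seconds. Returns the fd, or -1 if open fails. */
int wdt_open(const char *path, int timeout_s)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        perror("open watchdog");
        return -1;
    }
    /* The driver writes back the timeout it actually applied. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0)
        perror("WDIOC_SETTIMEOUT");
    else
        printf("watchdog timeout: %d s\n", timeout_s);
    return fd;
}

/* Feed the watchdog once; returns 0 on success. */
int wdt_kick(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}

/* Stop the watchdog with the magic close character 'V'. This only works if
 * the driver supports magic close and the kernel's nowayout is not set. */
void wdt_disarm(int fd)
{
    if (write(fd, "V", 1) != 1)
        perror("magic close");
    close(fd);
}
```

A main loop would call fd = wdt_open("/dev/watchdog", 60) once and then repeat wdt_kick(fd); sleep(kick_interval_s(60)); forever. Note that a successful kick only proves that user space is still scheduling this loop; it says nothing about whether other threads or drivers are healthy.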

Regards

Santhosh Kumar S

igorpadykov · 01-02-2020

Hi Santhosh

One can try to toggle GPIO1_IO02 (WDOG_B) via software, with the pad configured as a GPIO.
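A sketch of such a software toggle, using the Linux GPIO character-device ioctls from linux/gpio.h (v1 line-handle ABI). It assumes, hypothetically, that the GPIO1_IO02 pad has been muxed as a GPIO in the device tree, that GPIO1 appears as /dev/gpiochip0 and that the pad is line offset 2 (check with the gpioinfo tool from libgpiod if available). Because WDOG_B is active low, the sketch pulses the line low; if the reset circuit works, the board should reset during the pulse.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <linux/gpio.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Drive one GPIO line low for hold_ms milliseconds and then high again,
 * using the GPIO character-device line-handle (v1) ioctls.
 * chip: e.g. "/dev/gpiochip0"; offset: line number within that chip.
 * Returns 0 on success, -1 on failure. */
int gpio_pulse_low(const char *chip, unsigned int offset, long hold_ms)
{
    struct gpiohandle_request req;
    struct gpiohandle_data data;
    struct timespec hold = {
        .tv_sec = hold_ms / 1000,
        .tv_nsec = (hold_ms % 1000) * 1000000L,
    };
    int chip_fd, ret = -1;

    chip_fd = open(chip, O_RDWR);
    if (chip_fd < 0) {
        perror("open gpiochip");
        return -1;
    }

    memset(&req, 0, sizeof(req));
    req.lineoffsets[0] = offset;
    req.lines = 1;
    req.flags = GPIOHANDLE_REQUEST_OUTPUT;
    req.default_values[0] = 1;                  /* start released (high) */
    strncpy(req.consumer_label, "wdog_b-test", sizeof(req.consumer_label) - 1);

    if (ioctl(chip_fd, GPIO_GET_LINEHANDLE_IOCTL, &req) < 0) {
        perror("GPIO_GET_LINEHANDLE_IOCTL");
        goto out_chip;
    }

    memset(&data, 0, sizeof(data));
    data.values[0] = 0;                         /* assert: drive low */
    if (ioctl(req.fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL, &data) < 0)
        goto out_line;
    nanosleep(&hold, NULL);
    data.values[0] = 1;                         /* release: drive high */
    if (ioctl(req.fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL, &data) < 0)
        goto out_line;
    ret = 0;

out_line:
    close(req.fd);
out_chip:
    close(chip_fd);
    return ret;
}
```

For example, gpio_pulse_low("/dev/gpiochip0", 2, 100) would hold the (assumed) WDOG_B line low for 100 ms.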

Best regards
igor

santhosh2 · 01-03-2020

Igor,

When the application crashes, the processor is in a hang state and can be recovered only by a power-on reset (POR) or by a software reset through the watchdog timer. So I feel that toggling GPIO1_IO02 (WDOG_B) may not help, as the kernel is hung and all applications have crashed at this point.

Also, GPIO1_IO02 is connected to the WDOG input of the PMIC, so that will not help at all, because we need to reboot the processor to recover from the hang state.

Please advise how we can find the root cause of the issue.

Regards

Santhosh Kumar S

santhosh2 · 01-03-2020

Igor,

I found this link on the net for the issue that we are trying to solve:

https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

Please let me know whether it would help us in any way to find the root cause.
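One kernel-side option related to that document: the kernel.panic_on_rcu_stall sysctl turns an RCU stall warning into a panic, and kernel.panic = N makes the kernel reboot N seconds after a panic. The sketch below sets both from C by writing to /proc/sys. It assumes the running kernel provides /proc/sys/kernel/panic_on_rcu_stall (the sysctl was added in the 4.x series, so check that the file exists on your build); whether the panic and restart path completes on this particular hang is not guaranteed. The helper names are hypothetical.

```c
#include <stdio.h>

/* Write value to a sysctl file such as /proc/sys/kernel/panic.
 * Returns 0 on success, -1 on failure. */
int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL) {
        perror(path);
        return -1;
    }
    ok = fputs(value, f) >= 0;
    /* stdio buffers the data, so a rejected write shows up at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reboot reboot_delay_s seconds after a panic, then make the first RCU
 * stall warning panic. The reboot delay is set first so there is no window
 * in which a stall panics without a reboot being configured. */
int enable_reboot_on_rcu_stall(int reboot_delay_s)
{
    char buf[16];

    snprintf(buf, sizeof(buf), "%d\n", reboot_delay_s);
    if (write_sysctl("/proc/sys/kernel/panic", buf) != 0)
        return -1;
    return write_sysctl("/proc/sys/kernel/panic_on_rcu_stall", "1\n");
}
```

For example, enable_reboot_on_rcu_stall(10) asks for a reboot 10 seconds after the panic triggered by the first RCU stall. The same settings can be made from a shell with sysctl -w kernel.panic=10 and sysctl -w kernel.panic_on_rcu_stall=1.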

Regards

Santhosh

igorpadykov · 01-03-2020

One can try the wdog unit test:

wdog\test - imx-test - i.MX Driver Test Application Software

with the Demo Images from

i.MX Software and Development Tools | NXP

Best regards
igor

santhosh2 · 01-05-2020

Igor,

I have run the WDT unit test independently, and the board reboots when ioctl(fd, WDIOC_KEEPALIVE, 0); is not called at regular intervals. I have already tested all possible cases earlier.

Only in the SDHCI REGISTER DUMP case does the board not recover or reboot; it takes 10 to 30 minutes to reboot by itself.
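One possible explanation, offered as a hypothesis rather than a confirmed root cause: in the RCU log the stall on CPUs 0 and 3 was "detected by 2" and "detected by 1", so at least two CPUs were still running. A user-space loop that calls keep_alive() every 5 seconds can keep running on those CPUs and keep feeding the watchdog while the SDIO and v4l2 paths are stuck. A common mitigation is to feed the watchdog only while the application's worker threads are making progress. The sketch below shows that check; the heartbeat array, the 30 s limit and all names are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Seconds from CLOCK_MONOTONIC, which is not affected by wall-clock changes. */
double monotonic_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Return true only if every heartbeat in last_beat_s[0..n-1] is at most
 * max_age_s seconds old at time now_s. Each monitored worker (hypothetically
 * the capture thread and the thread doing SDIO I/O) stores
 * monotonic_seconds() in its slot after each successful iteration. */
bool may_kick_watchdog(const double *last_beat_s, int n, double now_s,
                       double max_age_s)
{
    for (int i = 0; i < n; i++) {
        if (now_s - last_beat_s[i] > max_age_s)
            return false;   /* a worker is stuck: stop feeding the watchdog */
    }
    return true;
}
```

In the keep-alive loop, the call would become if (may_kick_watchdog(beats, n, monotonic_seconds(), 30.0)) keep_alive();, where each worker updates its slot of beats with proper synchronization (not shown). A thread blocked inside a hung SDIO transfer then stops updating its heartbeat, the watchdog stops being fed, and the hardware reset fires.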

If possible, can you please check with a kernel team at NXP? I feel it has something to do with the CPU stall error described in the link below.

https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

Your help really matters in solving this issue.

Regards

Santhosh kumar
