Hi,
I have encountered the error below and have no clue what causes it. Please help me find the root cause.
I have implemented a watchdog timer to reboot automatically if the kernel hangs, but in the error condition below the processor took 10 minutes to reboot itself.
[107434.234408] mmc0: Timeout waiting for hardware interrupt.
[107434.239900] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[107434.246428] mmc0: sdhci: Sys addr: 0xbbbde51c | Version: 0x00000002
[107434.252955] mmc0: sdhci: Blk size: 0x00000004 | Blk cnt: 0x00000001
[107434.259481] mmc0: sdhci: Argument: 0x9410d004 | Trn mode: 0x00000003
[107434.266008] mmc0: sdhci: Present: 0x01d88008 | Host ctl: 0x00000013
[107434.272534] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[107434.279060] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x0000001f
[107434.285587] mmc0: sdhci: Timeout: 0x0000008f | Int stat: 0x00000103
[107434.292112] mmc0: sdhci: Int enab: 0x107f110b | Sig enab: 0x107f110b
[107434.298639] mmc0: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000502
[107434.305165] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x8000b407
[107434.311690] mmc0: sdhci: Cmd: 0x0000353a | Max curr: 0x00ffffff
[107434.318217] mmc0: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x00000000
[107434.324744] mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
[107434.331268] mmc0: sdhci: Host ctl2: 0x00000088
[107434.335798] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x78089208
[107434.342323] mmc0: sdhci: ============================================
[107434.349298] AR6000: SDIO bus operation failed! MMC stack returned : -110
[107434.356191] __HIFReadWrite, addr:0X000868, len:00000004, Write, Sync
[107474.169657] mmc0: Timeout waiting for hardware interrupt.
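Part of the register dump above can be decoded by hand. The following is a minimal sketch, assuming the standard SDHCI command register layout (command index in bits 13:8), the standard SDHCI interrupt status bits, and the SDIO CMD53 (IO_RW_EXTENDED) argument layout. The struct, macro and function names are hypothetical and are not taken from any driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for three standard SDHCI interrupt status bits. */
#define INT_CMD_COMPLETE   0x00000001u  /* bit 0: command complete  */
#define INT_XFER_COMPLETE  0x00000002u  /* bit 1: transfer complete */
#define INT_CARD_INT       0x00000100u  /* bit 8: card interrupt    */

/* Fields of an SDIO CMD53 (IO_RW_EXTENDED) argument. */
struct cmd53_arg {
    int      is_write;    /* bit 31: 1 = write, 0 = read */
    unsigned function;    /* bits 30:28: SDIO function number */
    int      block_mode;  /* bit 27: 1 = block mode, 0 = byte mode */
    int      incr_addr;   /* bit 26: 1 = incrementing address */
    uint32_t reg_addr;    /* bits 25:9: register address */
    unsigned count;       /* bits 8:0: byte or block count */
};

/* The SDHCI command register holds the command index in bits 13:8. */
static unsigned sdhci_cmd_index(uint32_t cmd_reg)
{
    return (cmd_reg >> 8) & 0x3f;
}

/* Split a CMD53 argument word into its fields. */
static struct cmd53_arg decode_cmd53_arg(uint32_t arg)
{
    struct cmd53_arg d;

    d.is_write   = (int)((arg >> 31) & 0x1);
    d.function   = (arg >> 28) & 0x7;
    d.block_mode = (int)((arg >> 27) & 0x1);
    d.incr_addr  = (int)((arg >> 26) & 0x1);
    d.reg_addr   = (arg >> 9) & 0x1ffff;
    d.count      = arg & 0x1ff;
    return d;
}

/* Print a one-line summary of the "Cmd" and "Argument" dump values. */
static void print_cmd53(uint32_t cmd_reg, uint32_t arg)
{
    struct cmd53_arg d = decode_cmd53_arg(arg);

    printf("CMD%u %s fn=%u addr=0x%x count=%u %s mode\n",
           sdhci_cmd_index(cmd_reg),
           d.is_write ? "write" : "read",
           d.function, (unsigned)d.reg_addr, d.count,
           d.block_mode ? "block" : "byte");
}
```

For the values in this log, print_cmd53(0x353a, 0x9410d004) describes a CMD53 byte-mode write of 4 bytes to function 1 at address 0x868, which agrees with the __HIFReadWrite line (addr 0x000868, len 4, Write). Int stat 0x00000103 has the command complete, transfer complete and card interrupt bits set at the time of the dump. That may mean the controller raised its interrupts but they were not serviced in time, though this is an inference and not something the thread confirms.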
Hi Santhosh
Such an issue may occur if not all board power supplies are briefly turned off during reset,
so check your reset schematic against the i.MX8M Mini EVK, where signal GPIO1_IO02
(as WDOG_B) is connected to the PMIC.
Best regards
igor
Hi,
I have enabled the watchdog, but the watchdog timer does not kick in when the processor stalls in one particular error condition.
I have encountered the error below:
[88877.842909] Task dump for CPU 0:
[88877.846135] v4l2src1:src R running task 0 4591 1 0x00000203
[88877.853186] Call trace:
[88877.855642] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88877.860783] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88877.865660] Task dump for CPU 3:
[88877.868885] swapper/3 R running task 0 0 1 0x00000002
[88877.875935] Call trace:
[88877.878381] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88877.883522] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88877.888748] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88877.894060] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88877.898937] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88877.904598] [<ffff000008090430>] secondary_start_kernel+0x110/0x120
[88939.984535] INFO: rcu_preempt detected stalls on CPUs/tasks:
[88939.990210] 0-...: (18 GPs behind) idle=2d2/140000000000000/0 softirq=1394806/1394807 fqs=252946
[88939.999167] 3-...: (14 GPs behind) idle=13a/140000000000000/0 softirq=274054/274054 fqs=252946
[88940.007950] (detected by 2, t=556682 jiffies, g=635261, c=635260, q=20647)
[88940.014914] Task dump for CPU 0:
[88940.018140] v4l2src1:src R running task 0 4591 1 0x00000203
[88940.025191] Call trace:
[88940.027646] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.032785] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88940.037661] Task dump for CPU 3:
[88940.040886] swapper/3 R running task 0 0 1 0x00000002
[88940.047934] Call trace:
[88940.050380] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.055519] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88940.060745] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88940.066058] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88940.070937] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88940.076598] [<ffff000008090430>] secondary_start_kernel+0x110/0x120
[88940.832520] INFO: rcu_sched detected stalls on CPUs/tasks:
[88940.838019] 0-...: (1 GPs behind) idle=2d2/140000000000000/0 softirq=1360205/1394807 fqs=189277
[88940.846890] 3-...: (1 GPs behind) idle=13a/140000000000000/0 softirq=268734/274054 fqs=189277
[88940.855586] (detected by 1, t=414882 jiffies, g=183, c=182, q=30)
[88940.861769] Task dump for CPU 0:
[88940.864998] v4l2src1:src R running task 0 4591 1 0x00000203
[88940.872048] Call trace:
[88940.874504] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.879644] [<ffff80007ac48d80>] 0xffff80007ac48d80
[88940.884522] Task dump for CPU 3:
[88940.887749] swapper/3 R running task 0 0 1 0x00000002
[88940.894799] Call trace:
[88940.897245] [<ffff000008085cf4>] __switch_to+0x94/0xd8
[88940.902386] [<ffff000008d8f804>] __schedule+0x19c/0x5e8
[88940.907612] [<ffff000008d9012c>] schedule_idle+0x1c/0x38
[88940.912927] [<ffff00000810d010>] do_idle+0xd0/0x1e0
[88940.917805] [<ffff00000810d2bc>] cpu_startup_entry+0x24/0x28
[88940.923464] [<ffff000008090430>] secondary_start_kernel+0x110/0x120
I have no clue what the error is. The board is not able to recover by itself even though the watchdog timer is implemented.
How can I resolve this issue?
Regards
Santhosh
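The "detected by" line of each RCU stall report carries a "t=" value, which the kernel's stallwarn documentation describes as the number of jiffies elapsed since the start of the grace period. The sketch below, with hypothetical function names, pulls that value out of a log line and converts it to seconds. The tick rate (CONFIG_HZ) of this kernel is not given in the thread, so the hz argument is an assumption that has to be checked against the actual kernel configuration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a stall summary line such as
 *   "[88940.007950] (detected by 2, t=556682 jiffies, g=..., c=..., q=...)".
 * Returns 0 on success and fills in the reporting CPU and the jiffies count,
 * or -1 if the line is not a stall summary line.
 */
static int parse_stall_summary(const char *line, int *cpu,
                               unsigned long *jiffies)
{
    const char *p = strstr(line, "(detected by ");

    if (p == NULL)
        return -1;
    if (sscanf(p, "(detected by %d, t=%lu jiffies", cpu, jiffies) != 2)
        return -1;
    return 0;
}

/* Convert jiffies to whole seconds; hz is the assumed kernel tick rate. */
static unsigned long jiffies_to_seconds(unsigned long jiffies,
                                        unsigned long hz)
{
    return hz ? jiffies / hz : 0;
}
```

As an illustration only: if the kernel were built with HZ=250, t=556682 jiffies would correspond to roughly 2226 seconds, and with HZ=1000 to roughly 556 seconds. Which of these applies depends on the kernel configuration of the board.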
Hi Santhosh
Could you clarify whether your board design implements a reset circuit that
briefly powers off the whole board, as is done on the
i.MX8M Mini EVK, where signal GPIO1_IO02 (as WDOG_B) is connected to the PMIC?
Best regards
igor
Hi Santhosh
Please check with an oscilloscope whether the GPIO1_IO02 (as WDOG_B) signal,
which on the i.MX8M Mini EVK is connected to the PMIC, is toggling.
Best regards
igor
One can try the Linux shutdown command with the Demo Images from
i.MX Software and Development Tools | NXP
Best regards
igor
Igor,
We have customized the image to meet our end application use case. We have built the entire application around it and tested all the features of the system, and we are now running an endurance test in which the device is subjected to more than 96 hours of testing without power-off. At this stage we don't want to use the NXP prebuilt image, as we might need to redo the customization and re-run all the testing that has been carried out over many months, which in turn may delay our product delivery. With reference to the captured logs, please let us know what tests we can do to find the root cause, since my processor's watchdog timer is also not kicking in to reboot the system when it is in the hung state.
The code snippet for my watchdog timer is below:
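#include <stdio.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* fd, time1, time2 and sec are globals defined elsewhere in the program. */

/*
 * This function simply sends an IOCTL to the driver, which in turn ticks
 * the Watchdog card to reset its internal timer so it doesn't trigger
 * a reset.
 */
static void keep_alive(void)
{
    int dummy;
    int ret;

    ret = ioctl(fd, WDIOC_KEEPALIVE, &dummy);
    time(&time2);
    sec = (int)difftime(time2, time1);
    if (!ret)
        printf("sec:%d\n", sec);
    else
        perror("WDIOC_KEEPALIVE"); /* report a failed keepalive instead of ignoring it */
}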
Note: the keep_alive function is called every 5 seconds from the main function to reset the watchdog timer.
Regards
Santhosh Kumar S
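One possible explanation, which is an assumption and is not confirmed anywhere in this thread: the RCU stall reports name only CPUs 0 and 3 and are printed by CPUs 1 and 2, so a feeder process scheduled on a still-running CPU could keep calling keep_alive() every 5 seconds and the watchdog would never expire. A common way to guard against that is to send the keepalive only while the application's own threads prove they are making progress. The sketch below illustrates the idea. The heartbeat array and both function names are hypothetical. WDIOC_KEEPALIVE is the same ioctl used in the snippet above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: returns true only if every monitored worker has
 * recorded a heartbeat no more than max_age_sec before `now`.
 * last_beat[i] is the time at which worker i last reported progress.
 */
static bool all_workers_alive(const time_t *last_beat, size_t n,
                              time_t now, double max_age_sec)
{
    for (size_t i = 0; i < n; i++) {
        if (difftime(now, last_beat[i]) > max_age_sec)
            return false;
    }
    return true;
}

/*
 * Feed the watchdog only while all workers are alive.
 * Returns 1 if the keepalive was sent, 0 if it was deliberately skipped
 * (so the hardware watchdog is left to expire), -1 if the ioctl failed.
 */
static int feed_if_healthy(int wdt_fd, const time_t *last_beat, size_t n,
                           time_t now, double max_age_sec)
{
    if (!all_workers_alive(last_beat, n, now, max_age_sec))
        return 0;
    return ioctl(wdt_fd, WDIOC_KEEPALIVE, 0) == 0 ? 1 : -1;
}
```

In such a design each worker thread, for example the capture thread that appears as v4l2src1:src in the task dump, would update its entry in last_beat whenever it completes a unit of work, and the main loop would call feed_if_healthy() in place of an unconditional keep_alive(). Whether this matches the actual failure here would still need to be verified on the board.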
Hi Santhosh
One can try toggling GPIO1_IO02 (WDOG_B) via software by configuring it as a GPIO.
Best regards
igor
Igor,
When the application crashes, the processor is in a hung state and can be recovered only by a POR or by a software reset using the watchdog timer. So I feel that toggling GPIO1_IO02 (WDOG_B) may not help, as the kernel is hung and all applications have crashed at this point.
Also, GPIO1_IO02 is connected to the PMIC's WDOG input, so toggling it will not help at all, as we need to reboot the processor to recover from the hung state.
Please advise how we can find the root cause of the issue.
Regards
Santhosh Kumar S
Igor,
I found this link on the net for the issue that we are trying to solve:
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
Let me know whether it would help us in any way to find the root cause.
Regards
Santhosh
One can try the wdog unit test
wdog\test - imx-test - i.MX Driver Test Application Software
with Demo Images from
i.MX Software and Development Tools | NXP
Best regards
igor
Igor,
I have run the WDT unit test independently, and the board reboots when ioctl(fd, WDIOC_KEEPALIVE, 0); is not called at regular intervals. I have already tested all possible cases earlier.
Only in the SDHCI REGISTER DUMP case does it fail to recover or reboot promptly. It takes 10 to 30 minutes to reboot itself.
If possible, can you please check with the kernel team at NXP? I feel it has something to do with the CPU stall error described in the link below.
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
Your help really matters in solving this issue.
Regards
Santhosh kumar