Kernel panic occurs when dumping the value of SAR ADC on BSP40.

723 Views
Jeff-CF-Huang
Contributor II

Hi Sir,

I retrieve the SAR ADC value using the instructions below.

echo 1 > /sys/bus/iio/devices/iio:device0/scan_elements/in_voltage4_en
echo 4096 > /sys/bus/iio/devices/iio:device0/buffer/length
echo 1 > /sys/bus/iio/devices/iio:device0/buffer/enable
hexdump -e '"iio0 :" 8/2 "%04x " "\n"' /dev/iio:device0 | head -512
echo 0 > /sys/bus/iio/devices/iio:device0/buffer/enable

When repeated, this causes a kernel panic. Could you help check it?

[ 389.508403] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 389.508421] rcu: 0-...0: (14 ticks this GP) idle=54b/0/0x3 softirq=4916/4917 fqs=2626
[ 389.508432] (detected by 6, t=5252 jiffies, g=9233, q=265)
[ 389.508439] Task dump for CPU 0:
[ 389.508442] task:swapper/0 state:R running task stack: 0 pid: 0 ppid: 0 flags:0x0000000a
[ 389.508458] Call trace:
[ 389.508460] __switch_to+0xf8/0x14c
[ 389.508480] 0x0
iio0 :0000 0001 0000 0001 0000 0002 0005 0000
iio0 :0001 0000 0001 0000 0001 0000 0001 0000
iio0 :0000 0000 0002 0005 0000 0001 0000 0001
iio0 :0000 0001 0000 0002 0005 0000 0001 0000
iio0 :0001 0000 0001 0000 0002 0006 0000 0002
iio0 :0006 0000 0001 0000 0002 0006 0000 0001
iio0 :0000 0002 0005 0000 0001 0000 0001 0000
[ 416.284408] mmc0: Timeout waiting for hardware interrupt.
[ 416.284418] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 416.284423] mmc0: sdhci: Sys addr: 0x00000010 | Version: 0x00000002
[ 416.284428] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000000
[ 416.284432] mmc0: sdhci: Argument: 0x000a2838 | Trn mode: 0x0000002b
[ 416.284438] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000013
[ 416.284443] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 416.284447] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x00000077
[ 416.284451] mmc0: sdhci: Timeout: 0x0000000f | Int stat: 0x00000003
[ 416.284455] mmc0: sdhci: Int enab: 0x117f100b | Sig enab: 0x117f100b
[ 416.284460] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 416.284465] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 416.284471] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00ffffff
[ 416.284475] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x01de677f
[ 416.284479] mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x00000900
[ 416.284483] mmc0: sdhci: Host ctl2: 0x00000000
[ 416.284487] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xd0001210
[ 416.284491] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 416.284496] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
[ 416.284502] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 416.284506] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 416.284510] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 416.284514] mmc0: sdhci-esdhc-imx: adma debug status: 0x2510
[ 416.284518] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 416.284522] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 416.284526] mmc0: sdhci: ============================================
[ 426.524405] mmc0: Timeout waiting for hardware interrupt.
[ 426.524410] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 426.524414] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00000002
[ 426.524420] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000000
[ 426.524424] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000023
[ 426.524428] mmc0: sdhci: Present: 0x01fd8009 | Host ctl: 0x00000013
[ 426.524432] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 426.524436] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x00000077
[ 426.524440] mmc0: sdhci: Timeout: 0x0000000f | Int stat: 0x00018000
[ 426.524444] mmc0: sdhci: Int enab: 0x117f100b | Sig enab: 0x117f100b
[ 426.524451] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 426.524455] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 426.524459] mmc0: sdhci: Cmd: 0x00000cdb | Max curr: 0x00ffffff
[ 426.524463] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x01de677f
[ 426.524467] mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x00000900
[ 426.524471] mmc0: sdhci: Host ctl2: 0x00000000
[ 426.524475] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
[ 426.524480] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 426.524483] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
[ 426.524487] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 426.524491] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 426.524495] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 426.524499] mmc0: sdhci-esdhc-imx: adma debug status: 0x2500
[ 426.524503] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 426.524507] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 426.524512] mmc0: sdhci: ============================================
[ 436.764407] mmc0: Timeout waiting for hardware cmd interrupt.
[ 436.764414] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 436.764418] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00000002
[ 436.764422] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000000
[ 436.764427] mmc0: sdhci: Argument: 0x50480000 | Trn mode: 0x00000023
[ 436.764431] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000013
[ 436.764437] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 436.764441] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x00000077
[ 436.764445] mmc0: sdhci: Timeout: 0x0000000f | Int stat: 0x00000001
[ 436.764449] mmc0: sdhci: Int enab: 0x117f100b | Sig enab: 0x117f100b
[ 436.764453] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 436.764458] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 436.764462] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 436.764466] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0x01de677f
[ 436.764471] mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x00000900
[ 436.764475] mmc0: sdhci: Host ctl2: 0x00000000
[ 436.764479] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
[ 436.764482] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 436.764485] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
[ 436.764489] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 436.764493] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 436.764497] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 436.764502] mmc0: sdhci-esdhc-imx: adma debug status: 0x2500
[ 436.764506] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 436.764510] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 436.764513] mmc0: sdhci: ============================================
[ 447.004405] mmc0: Timeout waiting for hardware cmd interrupt.
[ 447.004413] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 447.004417] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00000002
[ 447.004422] mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000000
[ 447.004427] mmc0: sdhci: Argument: 0x50480000 | Trn mode: 0x00000023
[ 447.004431] mmc0: sdhci: Present: 0x01fd8008 | Host ctl: 0x00000013
[ 447.004436] mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000080
[ 447.004439] mmc0: sdhci: Wake-up: 0x00000008 | Clock: 0x00000077
[ 447.004443] mmc0: sdhci: Timeout: 0x0000000f | Int stat: 0x00000001
[ 447.004447] mmc0: sdhci: Int enab: 0x117f100b | Sig enab: 0x117f100b
[ 447.004452] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
[ 447.004456] mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x0000b400
[ 447.004463] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00ffffff
[ 447.004468] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x01de677f
[ 447.004472] mmc0: sdhci: Resp[2]: 0x325b5900 | Resp[3]: 0x00000900
[ 447.004476] mmc0: sdhci: Host ctl2: 0x00000000
[ 447.004480] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000
[ 447.004483] mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP =========
[ 447.004487] mmc0: sdhci-esdhc-imx: cmd debug status: 0x2100
[ 447.004493] mmc0: sdhci-esdhc-imx: data debug status: 0x2200
[ 447.004497] mmc0: sdhci-esdhc-imx: trans debug status: 0x2300
[ 447.004501] mmc0: sdhci-esdhc-imx: dma debug status: 0x2400
[ 447.004505] mmc0: sdhci-esdhc-imx: adma debug status: 0x2500
[ 447.004509] mmc0: sdhci-esdhc-imx: fifo debug status: 0x2680
[ 447.004513] mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2750
[ 447.004516] mmc0: sdhci: ============================================
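
When the test runs unattended, it can help to check the kernel log automatically for the two symptoms shown above. The following is a minimal sketch, not part of the original report: the function name `has_stall` is hypothetical, and the patterns are taken only from the log lines quoted in this post, so other failure messages would not be caught.

```shell
#!/bin/sh
# has_stall: read kernel log text on stdin and succeed (exit 0) if it
# contains an RCU stall report or an mmc hardware-interrupt timeout.
# The patterns are assumptions derived from the log excerpt in this thread.
has_stall() {
    grep -E -q 'rcu: INFO: .* detected stalls|mmc[0-9]+: Timeout waiting for hardware'
}

# On the target, one possible use is:
#   dmesg | has_stall && echo "stall or mmc timeout found in kernel log"
```

If the system is already stalled, the check itself may not get scheduled, so this is only useful for catching the problem in logs collected afterwards or between iterations.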

0 Kudos
12 Replies

388 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the clarification.

We are currently discussing the robustness of our Linux ADC driver implementation with our internal development teams, and we look forward to improving it in a future release.

Thanks again for your findings.

 

Best Regards

Chenyin

 

0 Kudos

560 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the feedback.

  1. We ran the test method I described in my previous post for more than 30 minutes without finding any issues.
  2. Apart from this case, we have not seen similar issues in our recent work.

 

To make this clear and continue the discussion with our internal experts, could you clarify whether the test method I described in my previous post is acceptable, or whether you have to test with your original method?

Could you also clarify whether “Considering that other processes might use the ADC or the system might go into suspend mode, we will disable the ADC in advance after retrieving the value.” is the only reason you designed the test case this way (disabling the ADC after retrieving the value each time)?

Thanks in advance.

 

 

Best Regards

Chenyin

0 Kudos

558 Views
Jeff-CF-Huang
Contributor II

Hi Chenyin,

Yes, this is the only reason in our user scenario.
In our opinion, the test method you mentioned is a workaround solution.
We still prefer to use our original method, provided that disabling the ADC does not cause this issue.

Best regards,

Jeff Huang

0 Kudos

568 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the feedback.

I understand that:

“while [ 1 ]; do ./adc_stress.sh; sleep 0.1; done” is done in a process from a real scenario.

 

We previously suggested testing it the following way:

---------------------------------------------------------

echo 1 > /sys/bus/iio/devices/iio:device0/scan_elements/in_voltage4_en
echo 4096 > /sys/bus/iio/devices/iio:device0/buffer/length
echo 1 > /sys/bus/iio/devices/iio:device0/buffer/enable
while :
do
hexdump -e '"iio0 :" 8/2 "%04x " "\n"' /dev/iio:device0 | head -512
sleep 0.1;
done
echo 0 > /sys/bus/iio/devices/iio:device0/buffer/enable

-----------------------------------------------------------

May I know whether the reason for using the contents of ./adc_stress.sh instead of the method above is the one mentioned in your previous post: “Considering that other processes might use the ADC or the system might go into suspend mode, we will disable the ADC in advance after retrieving the value.”? Or are there other considerations?
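
One detail of the loop above is that the final `echo 0` is never reached, because `while :` runs until the script is interrupted. A minimal sketch of the same configure-once approach that still disables the buffer on exit could look like the following. The function names and the `IIO_SYSFS` / `IIO_CHARDEV` variables are hypothetical; the sysfs paths, buffer length, and hexdump format are the ones used in this thread. It does not address the case where the ADC must be disabled after every read.

```shell
#!/bin/sh
# Sketch: configure the ADC buffer once, read in a loop, and always
# disable the buffer when the script exits or is interrupted.
# Paths are the ones quoted in this thread; override them if needed.
IIO_SYSFS=${IIO_SYSFS:-/sys/bus/iio/devices/iio:device0}
IIO_CHARDEV=${IIO_CHARDEV:-/dev/iio:device0}

adc_enable() {
    echo 1 > "$IIO_SYSFS/scan_elements/in_voltage4_en"
    echo 4096 > "$IIO_SYSFS/buffer/length"
    echo 1 > "$IIO_SYSFS/buffer/enable"
}

adc_disable() {
    echo 0 > "$IIO_SYSFS/buffer/enable"
}

adc_read_once() {
    # Same dump command as in the thread: 512 lines of eight 16-bit samples.
    hexdump -e '"iio0 :" 8/2 "%04x " "\n"' "$IIO_CHARDEV" | head -512
}

adc_monitor() {
    trap 'adc_disable' EXIT       # runs on any exit from the script
    trap 'exit 130' INT TERM      # turn Ctrl-C / kill into a normal exit
    adc_enable
    while :; do
        adc_read_once
        sleep 0.1
    done
}

# On the target, start monitoring with: adc_monitor
```

The `trap` on `EXIT` is what guarantees the disable step; the `INT`/`TERM` trap only converts a signal into an ordinary exit so that the `EXIT` handler fires.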

Thanks

 

Best Regards

Chenyin

0 Kudos

563 Views
Jeff-CF-Huang
Contributor II

Hi Chenyin,

Is it still possible to encounter this issue when disabling the ADC?
The issue may lead to abnormalities in our automotive charging function.

Best regards,

Jeff Huang

0 Kudos

613 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Sorry for the delay.

I have reproduced this issue on a local board.

Since the log shows RCU stalls, the issue is likely related to an overwhelming rate of file system reads and updates. May I ask why you use such a script for stress testing? Does this usage correspond to any real use case?

I have also discussed this issue with an internal expert. In his opinion, it is more reasonable to move the ADC configuration out of the while loop and leave only hexdump inside the stress loop. May I know whether this is applicable to your case?

Thanks

 

Best Regards

Chenyin

0 Kudos

609 Views
Jeff-CF-Huang
Contributor II

Hi Chenyin,

In a real scenario, we will have a process to monitor the value of the channels every 100ms.
It will look like this:

while [ 1 ]; do ./adc_stress.sh; sleep 0.1; done

However, despite the 100ms delay, the issue still occurs.

Considering that other processes might use the ADC or the system might go into suspend mode, we will disable the ADC in advance after retrieving the value.


P.S.
We used the instructions below, and the issue can still be reproduced.

while [ 1 ]; do ./iio_generic_buffer -a -c 1 -N 0 -g -l 4096; done



Best regards,

Jeff Huang

0 Kudos

646 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the feedback.

I am investigating the issue and will post an update once there are any findings.

 

Best Regards

Chenyin

0 Kudos

675 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the kind clarification.

May I know how you run the test repeatedly? With a test tool or a simple infinite loop? Are there any delays between iterations?

Thanks.

 

Best Regards

Chenyin

0 Kudos

658 Views
Jeff-CF-Huang
Contributor II

Hi Chenyin,

We simply run the script on the console using the command below with the attached file.

while [ 1 ]; do ./adc_stress.sh; done

Best regards,

Jeff Huang

0 Kudos

685 Views
chenyin_h
NXP Employee

Hello, @Jeff-CF-Huang

Thanks for the question.

May I know which kind of board you are working with? A reference board or a custom board? Have you changed any settings from the BSP 40.0 default configuration?

I tried to reproduce your issue on a local RDB3 with BSP 40.0 (SD boot), but found no issue in three repeated tests.

Would you please double-check the test settings? Sorry for the inconvenience.

 

Best Regards

Chenyin

 

0 Kudos

681 Views
Jeff-CF-Huang
Contributor II

Hi Sir,

3 times is not enough for testing.
We will perform the test repeatedly over 30 minutes.
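
A time-bounded run like this can be scripted rather than stopped by hand. The following is a minimal sketch under stated assumptions: the helper `run_for` and the variable `RUN_FOR_COUNT` are hypothetical names, and `./adc_stress.sh` stands for the attached script, whose contents are not reproduced here.

```shell
#!/bin/sh
# run_for SECONDS CMD [ARGS...]
# Run CMD repeatedly until SECONDS have elapsed. The number of completed
# iterations is left in RUN_FOR_COUNT. Stops early if CMD fails.
run_for() {
    duration=$1
    shift
    end=$(( $(date +%s) + duration ))
    RUN_FOR_COUNT=0
    while [ "$(date +%s)" -lt "$end" ]; do
        if ! "$@"; then
            echo "command failed after $RUN_FOR_COUNT iterations" >&2
            return 1
        fi
        RUN_FOR_COUNT=$((RUN_FOR_COUNT + 1))
    done
    echo "completed $RUN_FOR_COUNT iterations in ${duration}s" >&2
}

# Example on the target (30 minutes = 1800 seconds):
#   run_for 1800 ./adc_stress.sh
```

The summary goes to stderr so that it does not mix with the hexdump output of the command under test.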

0 Kudos