A strange hang on custom imx6ull


1,710 views
changbaoma
Contributor III

Hi, NXP experts

I encountered a very strange hang on my custom i.MX6ULL board. The board has two RMII Ethernet interfaces, two USB host interfaces, and 1 GB of DDR3. The hardware design uses RMII with a 50 MHz clock driven from the MAC to the PHY. The 1 GB DDR3 has been calibrated with the NXP DDR tools, and an overnight stress test completed without errors. The calibrated DDR parameters have been integrated into U-Boot.

Our software is based on the L5.4.70_2.3.0 BSP. The board runs well, but the hang can be reproduced by bringing both eth0 and eth1 down with ifconfig and then waiting a short while. I am sure the CPU itself hangs: the shell stops accepting input and never recovers, and if imx-watchdog is enabled I see a watchdog reset.

The hang cannot be reproduced under any of the following conditions:

1. Not bringing both eth0 and eth1 down.

   Bringing down only one of the two does not lead to a hang.

2. Bringing both eth0 and eth1 down and then quickly running 'top -d 1'.

   The board hangs soon after quitting 'top -d 1'.

3. Having a USB device (e.g. a USB drive) enumerated on one or both of the USB interfaces.

   The board hangs soon after all USB devices are unplugged.

4. Forcing U-Boot to use the EVK board's default 512 MB DDR parameters. Only U-Boot is changed.

5. Changing the Linux kernel CPUfreq governor from 'ondemand' to 'performance'.
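Since conditions 2 and 5 both relate to how busy or how fast the CPU is, one way to narrow this down is to print the governor and the current frequency on the serial console after both interfaces are down. The following is only a minimal sketch: `snapshot_cpufreq` is a hypothetical helper name, it assumes the standard cpufreq sysfs files `scaling_governor` and `scaling_cur_freq` exist under `policy0`, and the polling loop itself may mask the hang in the same way 'top -d 1' does.

```shell
#!/bin/sh
# snapshot_cpufreq [POLICY_DIR]
# Print "<governor> <current frequency in kHz>" on one line.
# POLICY_DIR defaults to the first cpufreq policy; the parameter exists
# only so the helper can be pointed at a different policy directory.
snapshot_cpufreq() {
    policy_dir="${1:-/sys/devices/system/cpu/cpufreq/policy0}"
    printf '%s %s\n' \
        "$(cat "$policy_dir/scaling_governor")" \
        "$(cat "$policy_dir/scaling_cur_freq")"
}

# Example use on the board (commented out so that sourcing this file
# has no side effects):
#   ifconfig eth0 down
#   ifconfig eth1 down
#   while true; do snapshot_cpufreq; sleep 5; done
```

A long sleep interval keeps the extra CPU load small. If the last line printed before the hang always shows the lowest frequency, that would be consistent with condition 5.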

 

I cannot tell whether this is a software issue (such as the DDR parameters or the MAC/PHY driver) or a hardware issue (such as the power supply, although the hardware team cannot find any misbehaving supply).

 

Any suggestions for narrowing this down?

Any help is appreciated.

 

10 replies

1,701 views
igorpadykov
NXP Employee

Hi Changbao

 

You can try disabling OP-TEE as described in section 5.6.10, "OP-TEE enablement", of the attached Yocto guide, and then debug the hang using AN4553, "Using Open Source Debugging Tools for Linux on i.MX Processors":
https://www.nxp.com/docs/en/application-note/AN4553.pdf

 

Best regards
igor

1,694 views
changbaoma
Contributor III

Hello, @igorpadykov 

Thanks for your reply. OP-TEE is already disabled in my project.

Besides kernel GDB, do you have any other suggestions?

1,684 views
igorpadykov
NXP Employee

Please verify that the U-Boot version used in this case is imx_v2020.04, and try rebuilding everything from scratch:

https://source.codeaurora.org/external/imx/uboot-imx/tree/?h=imx_v2020.04_5.4.70_2.3.0

 

Best regards
igor

 

1,677 views
changbaoma
Contributor III

Hi, @igorpadykov 

I am using the NXP Yocto project Linux 5.4.70_2.3.0 release plus the Linux 5.4.70_2.3.4 patch, whose U-Boot version is already imx_v2020.04. I also tried the older U-Boot 2016 version, and it makes no difference.

Best regards

1,670 views
igorpadykov
NXP Employee

You can try other tests, for example checking for temperature dependency by heating or cooling the board.

 

Best regards
igor

1,662 views
changbaoma
Contributor III

The value read from /sys/devices/virtual/thermal/thermal_zone0/temp is 36500, which is in line with my room-temperature environment.
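For logging during a temperature run, the raw value can be converted to degrees: the thermal sysfs `temp` node reports millidegrees Celsius. A small sketch of the conversion, where `millideg_to_c` is a hypothetical helper name and `awk` is assumed to be available on the target:

```shell
#!/bin/sh
# millideg_to_c MILLIDEGREES
# Convert a thermal sysfs reading (millidegrees Celsius) to degrees
# Celsius with one decimal place.
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Example use on the board (commented out so that sourcing this file
# has no side effects):
#   millideg_to_c "$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)"
```

For the reading above, `millideg_to_c 36500` prints 36.5.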

1,651 views
igorpadykov
NXP Employee

Please test while heating or cooling the board, not at room temperature.

From the description, this may be caused by DDR errors, so I suggest running the DDR test at various temperatures, or running memtester.

 

Best regards
igor

1,637 views
changbaoma
Contributor III

Hello, @igorpadykov 

Today I put my i.MX6ULL board (with 1 GB DDR) into an environmental test chamber cycling between high and low temperature (chamber setting: 65~0℃) and ran memtester. Everything works fine as long as I do not bring eth0 and eth1 down. See below:

root@root:~# cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
ondemand
root@Orona:~# free
              total        used        free      shared  buff/cache   available
Mem:        1024968       23488      966972        8856       34508      978312
Swap:             0           0           0
root@root:~# memtester 900M 5
memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 900MB (943718400 bytes)
got 900MB (943718400 bytes), trying mlock ...locked.
Loop 1/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok

...


Loop 5/5:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok

Done.

 

The high/low temperature test with memtester will continue overnight. (Update: 24 hours have passed and the CPU still runs well.)

memtester also passes at room temperature.

A month ago, a 7x24-hour high/low temperature test without memtester also passed.

It seems that ambient temperature does not cause DDR errors or hang the CPU.

 

Any other suggestions? @igorpadykov 

1,596 views
igorpadykov
NXP Employee

From the description, the issue looks DDR-related. If the DDR timing/training parameters were not generated properly, this can lead to random hangs or failures in CAAM, PMIC, HDMI, and other modules.

 

Best regards
igor

1,567 views
changbaoma
Contributor III

Hi, @igorpadykov  

I may have found the root cause. I used the new v1.1 DDR config tool (https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/i-MX6UL-ULL-ULZ-DRAM-Register-Programmin... ) to create the DDR3 parameters instead of the old v0.01 version (https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/i-MX6ULL-DDR3-Script-Aid/ta-p/1127297). With the new DDR3 parameters, I can no longer reproduce the hang.

The main difference between the two sets of DDR3 parameters is the DDR refresh rate, as shown in this screenshot:

[Screenshot: changbaoma_0-1642121515080.png]

I don't know how this difference causes the hang.
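As a rough sanity check on a refresh setting, the average interval between refresh commands can be compared with the DDR3 requirement of 64 ms / 8192 = 7.8125 us (at up to 85℃). The sketch below is only arithmetic under stated assumptions: it assumes the controller issues a fixed number of refresh commands at each periodic trigger, `avg_refresh_interval_us` is a hypothetical helper name, and the trigger frequencies and counts in the examples are illustrative values, not the ones from the screenshot.

```shell
#!/bin/sh
# avg_refresh_interval_us TRIGGER_HZ REFRESHES_PER_TRIGGER
# Average time between refresh commands, in microseconds, assuming
# REFRESHES_PER_TRIGGER refresh commands are issued every 1/TRIGGER_HZ
# seconds. Both arguments are illustrative inputs, not register values.
avg_refresh_interval_us() {
    awk -v hz="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1000000 / hz / n }'
}

# Examples (commented out so that sourcing this file prints nothing):
#   avg_refresh_interval_us 64000 2   # 7.8125, matches the DDR3 requirement
#   avg_refresh_interval_us 64000 1   # 15.6250, refreshes too rarely
```

A setting whose average interval is longer than 7.8125 us refreshes the DRAM less often than DDR3 requires, which could explain data loss that only shows up when the system is mostly idle. Whether that is what happened here cannot be confirmed from the thread.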