i.MX8MQ hangs suddenly at any time


13,881 views
kunalkotecha1
Senior Contributor II

Hi NXP Team,

 

Issue:

We are seeing a total system hang at random times. It might hang within an hour, or it can take more than 2 days. Whenever the board hangs, we cannot enter anything on the console, nor can we ping the system (our board has Ethernet support). We also ran the DDR stress test for about 4-5 days and did not find any issue.
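Since the hang shows up as a loss of ping responses, the time to failure could be recorded from a second machine. The following is only a sketch under stated assumptions: it relies on the Linux `ping` utility (`-c` count, `-W` reply timeout in seconds), and the helper names `is_alive` and `seconds_until_hang` are made up for illustration.

```python
import subprocess
import time


def is_alive(host, timeout_s=1):
    """Send one ICMP echo request with the Linux `ping` tool; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def seconds_until_hang(probe, interval_s=5.0, misses_allowed=3,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it fails misses_allowed times in a row.

    Returns the seconds elapsed between the start of monitoring and the
    last successful probe. sleep and clock are injectable for testing.
    """
    start = clock()
    last_ok = start
    misses = 0
    while misses < misses_allowed:
        if probe():
            last_ok = clock()
            misses = 0
        else:
            misses += 1
        sleep(interval_s)
    return last_ok - start
```

On the monitoring host this could be called as `seconds_until_hang(lambda: is_alive("192.168.1.50"))`, where the IP address is a placeholder for the board's address. Requiring several consecutive misses avoids counting a single dropped packet as a hang.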

Reproducibility:

As mentioned above, the time to reproduce is not fixed. However, we are trying to reproduce the issue by running two instances of memtester: '(memtester 1024M &)' x 2.
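The reproduction step above could be scripted roughly as follows. This is a minimal sketch, assuming `memtester` is installed and on the PATH of the target; the helper names `build_commands` and `launch_all` and the log file naming are hypothetical and not part of the original setup.

```python
import subprocess


def build_commands(instances=2, size="1024M"):
    """Return one memtester argv list per instance (no loop count is passed)."""
    return [["memtester", size] for _ in range(instances)]


def launch_all(commands, log_prefix="memtester"):
    """Start every command in the background, each writing to its own log file."""
    procs = []
    for index, argv in enumerate(commands):
        log = open(f"{log_prefix}_{index}.log", "w")
        procs.append(subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT))
    return procs
```

On the target, `launch_all(build_commands())` would start both instances. Separate log files per instance make it easier to see which instance, if any, reported an error before the hang, although the last lines may be lost if the system freezes before they are flushed to storage.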

Setup:

1) Custom board based on i.MX8MQ

2) Two cores disabled; used as a dual-core system

3) NXP Linux release, version L5.4.3

 

Requesting your help with this and suggestions for debugging approaches.

 

Regards,

Kunal

11 replies

13,804 views
sinanakman
Senior Contributor III

Hi Kunal

If neither your DDR tool nor memtester reproduces the issue, and if the hang happens on all boards (presumably while running Linux), I would hook up a JTAG debugger, check whether the TAP is still alive when the system hangs, and try to see where it hangs. It could be an issue in the kernel or in one of the drivers that your custom board needs (perhaps for an interface designed differently than on the eval board).

Just my two dimes.

Regards

Sinan Akman


13,869 views
igorpadykov
NXP Employee

Hi Kunal

 

For such a case, it is recommended to check whether the latest RPA tool was used for configuration:

https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/i-MX-8M-Family-DDR-Tool-Release/ta-p/110...

- Try the latest Linux releases.

- Check whether the latest i.MX8MQ silicon revision was used.

- If the issue happens on just a few boards, it may be caused by poor soldering. One can try to resolder the chip.

- memtester tests (stresses) not only the memory but also the power supplies. If the power supplies are not good (instabilities, ripple), this may cause a board fault. It is therefore also recommended to check the power supplies using the i.MX 8MDQLQ Hardware Developer’s Guide.

 

Best regards
igor


13,821 views
kunalkotecha1
Senior Contributor II

Hi Igor,

 

We are using the latest RPA tool, and the stress test has been running successfully for more than 4 days. Do you think it can still be a DDR issue? The issue is seen on almost all boards. Do we still need to check the RPA tool version or whether the latest silicon revision is used?

Also, regarding memtester stressing the power supply: does it stress only the DDR power supply or the overall system's power supply?

Our team is following the Hardware Developer's Guide you suggested, but we will cross-check it again in case something was missed. We are also using the same PMIC as on the EVK. Apart from this, is there anything else that we should debug further?

 

Regards,

Kunal


13,753 views
kunalkotecha1
Senior Contributor II

Hi @igorpadykov,

 

We have tried to narrow down the issue further and found that the system does not hang with 3GB RAM. We see the hang only with 4GB RAM. I have attached the patch that we made in optee-os to enable 4GB RAM. Can you please review it and let us know whether it is correct?

 

Is there any specific reason why the 3GB system works properly and the 4GB system does not? Please help with this as soon as possible, as it is becoming critical for us.

 

Note: The system with 4GB hangs after a random number of minutes or hours.
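One difference between the 3GB and 4GB configurations is purely arithmetic and can be checked with a short calculation. The sketch below assumes the DRAM base address of 0x40000000 from the i.MX8MQ memory map; the function name `dram_layout` is made up for illustration. Whether this boundary has anything to do with the hang is not established in this thread.

```python
GIB = 1 << 30

# Assumption: DRAM starts at 0x4000_0000, as in the i.MX8MQ memory map.
DRAM_BASE = 0x4000_0000
ADDR_32BIT_LIMIT = 1 << 32


def dram_layout(size_gib, base=DRAM_BASE):
    """Return (exclusive end address, bytes located above the 32-bit limit)."""
    end = base + size_gib * GIB
    above = max(0, end - ADDR_32BIT_LIMIT)
    return end, above


for size in (3, 4):
    end, above = dram_layout(size)
    print(f"{size} GiB: end={end:#x}, above 32-bit limit: {above // GIB} GiB")
```

Under that assumption, 3GB of DRAM ends exactly at the 32-bit address limit, while a 4GB configuration places its top 1GB above it, so any component that handles only 32-bit physical addresses would behave differently in the two setups.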

 

Regards,

Kunal


13,714 views
kunalkotecha1
Senior Contributor II

Hi @igorpadykov,

 

We also tried the latest RPA (25) and ran the stress test with it. The stress test has now been running for 2 days. However, our board is still hanging. Can you please help with this?

 

@jan_spurek, can you please help me with this, if it seems related to DDR? Basically, our board hangs at random points in time, and we are running only memtester in Linux. The issue is seen only with a 4GB system. If we use 3GB, we don't see any hang.

 

Regards,

Kunal


13,711 views
jan_spurek
NXP Employee

Hi Kunal,

1. Could you please check if all the related patches are applied?

https://community.nxp.com/t5/iMX-and-Vybrid-Support/i-MX8-DDR-Related-Features-and-Patches-Summary/t...

I would focus mainly on those that deal with memory size modifications:

https://community.nxp.com/t5/iMX-and-Vybrid-Support/8MScale-8MScale-Mini-memory-size-change-TEE-entr...

https://community.nxp.com/t5/iMX-and-Vybrid-Support/8M-Scale-mini-845-board-4GB-memory-support-summa...

2. In the RPA, if you select only 1 frequency setpoint instead of 3 in "Number of frequency setpoints", does it help?

3. In the RPA, if you select "Option 1" instead of "Automatic" in "LPDDR4 MR4 manual de-rate workaround - Temperature Derating Options for errata e50125", does it help?

Best Regards,

Jan


13,641 views
kunalkotecha1
Senior Contributor II
 
We have tried the options mentioned in the document.
 
  1. Disabled OP-TEE in Yocto as per the mentioned steps, by disabling optee and clearing the TEE, ATF, and imx-boot. (Thus the 4GB-2GB_2GB.7z memory-related patch sets are not required, as per the documents.)
  2. In our current .dts file we have disabled the GPU, so we believe we don't need the CMA-related patch.
However, with the changes above the issue is still reproducible at our end.
 

Regards,

Kunal


13,627 views
kunalkotecha1
Senior Contributor II

Hi @jan_spurek,

 

To narrow down the issue further, we tried using an initramfs to bypass eMMC usage. We sometimes see a crash at runtime. We are running only the memtester utility. I have attached the crash file. Any suggestions are welcome.

 

Regards,

Kunal


13,667 views
kunalkotecha1
Senior Contributor II

Hi @jan_spurek,

 

We tried setting the number of frequency setpoints to 1 and selecting Option 1 instead of Automatic in the RPA. However, the result was the same: the system hung after about 30 minutes. We are now waiting for access to the links mentioned in your last comment, so that we can double-check the 4GB-related modifications.

 

Regards,

Kunal


13,703 views
kunalkotecha1
Senior Contributor II

Hi @jan_spurek,

 

Thank you for your prompt response. I am not able to access the 3 links that you suggested. Can you please provide access or send them by email?

 

We have not yet tried setting the frequency setpoint to 1 and Option 1 in the RPA. We will try this as well, after verifying whether the items in the links you suggested are already taken care of on our side.

 

Regards,

Kunal


13,562 views
oliver_chen
NXP Employee

Hi @kunalkotecha1 ,

Regarding your issue, I suggest the following.

1. If possible, I strongly suggest upgrading the BSP version to L5.4.70.

2. Please download the latest version, V3.20, of the DDR tool to generate the timing file.  https://community.nxp.com/docs/DOC-340179

3. Please follow this link to implement 4GB memory support:

   https://community.nxp.com/t5/iMX-and-Vybrid-Support/8M-Scale-845-850-board-4GB-memory-support-summar...

4. If the issue persists, please disable the auto-derating feature, apply workaround Option 1 in the RPA, and regenerate the timing file.

[Image: oliver_chen_0-1615969163204.png]

 

B.R

Oliver
