Hi NXP Team,
We are seeing a total system hang at random times. It may hang within an hour, or it can take more than 2 days. Whenever the board hangs, we cannot enter anything in the console, nor can we ping the system (our board has Ethernet support). We also ran the DDR stress test for about 4-5 days and did not find any issue.
As said above, the time to reproduce is not fixed. However, we are trying to reproduce the issue by running '(memtester 1024M &)' twice [2 instances of memtester].
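For anyone trying to reproduce this, the two-instance memtester run can be sketched as below. This is only a minimal sketch, not the script used on the board: the helper names, the heartbeat log path, and the 10-second interval are assumptions for illustration. The heartbeat log is meant to show roughly when the system stopped responding, so it has to live on storage that survives a power cycle.

```python
import os
import subprocess
import time


def memtester_commands(size="1024M", instances=2):
    """Return one argv list per memtester instance (no loop count, so each runs until stopped)."""
    return [["memtester", size] for _ in range(instances)]


def run_until_hang(log_path="/var/log/memtester_heartbeat.log", interval_s=10):
    """Start the memtester instances and append a timestamp every interval_s seconds.

    log_path is a hypothetical location; it must be on persistent storage.
    After a hang and power cycle, the last line approximates the time of the hang.
    """
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
        for cmd in memtester_commands()
    ]
    with open(log_path, "a") as log:
        # Keep logging while every memtester instance is still running.
        while all(p.poll() is None for p in procs):
            log.write(f"{time.time():.0f} alive\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to storage before a possible hang
            time.sleep(interval_s)
```

Calling `run_until_hang()` as root on the target starts the test; the function is not called automatically here.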
1) Custom board based on i.MX8MQ
2) Disabled 2 cores and using as dual core system
3) Linux OS by NXP version L5.4.3
We request your help on this, along with suggestions on how to approach it.
If your DDR tool and memtester are not reproducing the issue, and the hang happens on all boards (presumably while running Linux), I would hook up a JTAG debugger, check whether the TAP is still alive when the system hangs, and try to see where it hangs. It could be an issue in the kernel or in one of the drivers that your custom board needs (perhaps an interface designed differently from the eval board).
Just my two dimes.
For such a case, I would recommend the following:
- check whether the latest RPA tool was used
- try the latest Linux releases
- check whether the latest i.MX8MQ silicon revision was used
- if the issue happens on just a few boards, it may be caused by poor soldering. One can try to resolder the chip.
- memtester stresses not only the memory but also the power supplies. If the power supplies are not good (instabilities, ripple), this may cause a board fault. So it is also recommended to check the power supplies against the i.MX 8MDQLQ Hardware Developer’s Guide.
We are using the latest RPA tool, and the stress test has been running successfully for more than 4 days. Do you think it can still be a DDR issue? This issue is seen on almost all boards. Do we still need to check whether the latest RPA tool and the latest silicon revision are used?
Also, regarding memtester stressing the power supplies: does it stress only the DDR power supply, or the overall system's power supplies?
Our team is following the Hardware Developer's Guide you suggested, but we will cross-check again in case something was missed. We are also using the same PMIC as the EVK. Apart from this, is there anything else that we should debug further?
We have tried to narrow down the issue further and found that the system does not hang with 3GB RAM. We see the hang only with 4GB RAM. I have attached the patch that we made in optee-os to enable 4GB RAM. Can you please verify it and let us know whether it is correct?
Is there any specific reason why the 3GB system works properly and the 4GB system does not? Please help with this as soon as possible, as it is becoming critical for us.
Note: The 4GB system hangs after a random number of minutes or hours.
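One difference between the two configurations that may be worth checking is where the top of DRAM lands in the address map. The sketch below assumes the i.MX8MQ DRAM window starts at 0x40000000 (an assumption here; please confirm it against the reference manual memory map). Under that assumption, 3GB ends exactly at the 4 GiB boundary, while 4GB places its last 1 GiB above it, where 32-bit addresses no longer reach. The function name is hypothetical.

```python
DRAM_BASE = 0x4000_0000  # assumption: start of the DRAM window on i.MX8MQ
GIB = 1 << 30
FOUR_GIB = 1 << 32       # first address that does not fit in 32 bits


def dram_span(size_gib, base=DRAM_BASE):
    """Return (start, end_exclusive, bytes_above_4gib) for a DRAM size given in GiB."""
    start = base
    end = base + size_gib * GIB
    # Portion of the range that lies at or above the 4 GiB boundary.
    above = max(0, end - max(start, FOUR_GIB))
    return start, end, above


for size in (3, 4):
    start, end, above = dram_span(size)
    print(f"{size} GiB: {start:#x}..{end:#x}, {above // GIB} GiB above the 4 GiB boundary")
```

If the hang only shows up when memory above 0x100000000 is in use, code that assumes 32-bit physical addresses (in firmware, the optee-os configuration, or a driver) would be a candidate to review. This is a hypothesis to test, not a confirmed cause.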
We also tried the latest 25 RPA and left the stress test running with it. The stress test has now been running for 2 days. However, our board still hangs. Can you please help with this?
@jan_spurek, can you please help me with this, if it seems related to DDR? Basically, our board hangs at random points in time while we are only running memtester in Linux. This issue is seen with the 4GB system only. If we use 3GB, we don't see any hang.
1. Could you please check if all the related patches are applied?
I would focus mainly on those that deal with memory size modifications:
2. In the RPA, if you select only 1 frequency setpoint instead of 3 in "Number of frequency setpoints", does it help?
3. In the RPA, if you select "Option 1" instead of "Automatic" in "LPDDR4 MR4 manual de-rate workaround - Temperature Derating Options for errata e50125", does it help?
To narrow down the issue further, we tried using an initramfs to bypass eMMC usage. We sometimes see a crash at runtime. We are only running the memtester utility. I have attached the crash file. Any suggestions are welcome.
We tried setting the number of frequency setpoints to 1 and selecting Option 1 instead of Automatic in the RPA. However, the results were the same: the system hung after about 30 minutes. We are now waiting for access to the links mentioned in your last comment, so that we can double-check the 4GB-related modifications.
Thank you for your prompt response. I am not able to access the 3 links that you suggested. Can you please provide access or send them by email?
We have not yet tried setting the frequency setpoint to 1 and Option 1 in the RPA. We will try this as well, after verifying whether the changes in the links you suggested are already applied on our side.
Hi @kunalkotecha1 ,
Regarding your issue, I suggest the following:
1. If possible, I strongly suggest upgrading the BSP version to L5.4.70.
2. Please download the latest version (V3.20) of the DDR tool to generate the timing file. https://community.nxp.com/docs/DOC-340179
3. Please follow this link to implement the 4GB memory support
4. If the issue still exists, please disable the auto-derating feature, apply workaround Option 1 in the RPA, and re-generate the timing file.
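After applying the 4GB memory support, it can help to confirm that Linux actually reports close to 4 GiB before starting a long stress run. The following is a minimal sketch that parses the `MemTotal` line of `/proc/meminfo`. The function names and the 512 MiB allowance for memory reserved by firmware and the kernel are assumptions for illustration, not values from NXP documentation.

```python
import re


def mem_total_kib(meminfo_text):
    """Extract the MemTotal value (in KiB) from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1))


def looks_like_4gib(meminfo_text, reserved_slack_kib=512 * 1024):
    """Return True if MemTotal is within the assumed slack below 4 GiB."""
    expected_kib = 4 * 1024 * 1024
    total = mem_total_kib(meminfo_text)
    return expected_kib - reserved_slack_kib <= total <= expected_kib
```

On the target, this could be used as `looks_like_4gib(open("/proc/meminfo").read())`. A `False` result on a 4GB board would suggest that the memory size changes are not fully applied.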