DDR stress test causes system hang in Linux

johncoll
Contributor II

Hello NXPs,
Release: Yocto-Sumo (4.14.98_2.0.0_GA)
Board: i.MX8MQ EVK based custom board
We're working on an i.MX8MQ-based custom board with 2 GB of DDR4 RAM (the MCIMX8M-EVK has 3 GB of LPDDR4 RAM), and we're having an issue when stress-testing the RAM in Linux.
The system always hangs when the RAM stress test uses more than 760 MB.

Screenshot from 2020-02-26 22-05-56.png

I've already tested the RAM with the DDR Tool and with mtest in U-Boot, and everything PASSED. I then configured the Linux system to recognize 2 GB of RAM.
Can anyone suggest a solution for this issue? Any guidance or suggestions would be really helpful.
Best regards,
John.

Updated: added the hang log and the DDR Controller Configuration Spreadsheet.

1 Solution
johncoll
Contributor II

hello all,

There seem to be several ways to resolve this issue with updated patches. In our case, we resolved it by fixing the VPU power supply ripple, based on the Power Consumption Measurement document.

Best regards.

John

22 Replies
Russell_Kook
Contributor I

We are having the same problem with the i.MX8M Plus.
Our design references the i.MX8M Plus EVK. However, the package type we use is MIMX8ML6CVNKZAB. It passed the stress test of "mscale_ddr_tool_v3.30" without any problem.
The part name of the LPDDR we are using is MT53D1024M32D4DT-046 WT:D (4GB).

The Yocto version I use is 5.10-hardknott.

And I also corrected the memory size information of u-boot and ATF.

Linux boots fine, but memory tests using memtester or stress-ng fail.
If I set a test size larger than roughly 720 MB, the system crashes.
For example:
1. Running "memtester 50M 1000 &" repeatedly is fine up to the 14th instance. On the 15th, the system crashes.
2. "memtester 720M 1000 &" is fine, but "memtester 750M 1000 &" crashes the system.
3. "stress-ng --vm 1 --vm-bytes 720M" is fine, but "stress-ng --vm 1 --vm-bytes 750M" crashes the system.

 

Any advice would be appreciated.

BiyongSUN
NXP Employee

You need to use the Linux command "free" to see how much memory is available for memtester.
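
As a rough illustration of that advice, here is a minimal Python sketch that derives a memtester size from the "MemAvailable" field of /proc/meminfo (the same figure "free" reports as "available"). The function names and the 256 MiB headroom are assumptions made for this example, not values from NXP or from memtester.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in KiB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memtester_size_mib(meminfo_text, headroom_mib=256):
    """Return MemAvailable minus a safety margin, in MiB (headroom is a hypothetical default)."""
    available_kib = parse_meminfo(meminfo_text)["MemAvailable"]
    return max(available_kib // 1024 - headroom_mib, 0)


# Sample text only; on a target, pass open("/proc/meminfo").read() instead.
SAMPLE_MEMINFO = "MemTotal:        3763556 kB\nMemAvailable:    3413216 kB\n"
print(f"memtester {memtester_size_mib(SAMPLE_MEMINFO)}M 1")
```

Leaving some headroom keeps the test from pushing the system into out-of-memory handling, which would be a different failure from the hang discussed in this thread.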

Russell_Kook
Contributor I

root@imx8mpevk:~# free
        total     used      free  shared  buff/cache  available
Mem:  3763556    49684   3480184    8892      233688    3413216
Swap:       0        0         0

I typed this out because I do not have permission to upload a screenshot.

You can see that there is a lot of free memory left.

I have attached my iomem info below.

root@imx8mpevk:~# cat /proc/iomem
30200000-3020ffff : 30200000.gpio gpio@30200000
30210000-3021ffff : 30210000.gpio gpio@30210000
30220000-3022ffff : 30220000.gpio gpio@30220000
30230000-3023ffff : 30230000.gpio gpio@30230000
30240000-3024ffff : 30240000.gpio gpio@30240000
30260000-3026ffff : 30260000.tmu tmu@30260000
30280000-3028ffff : 30280000.watchdog watchdog@30280000
30330000-3033ffff : 30330000.pinctrl pinctrl@30330000
30350000-3035ffff : 30350000.efuse efuse@30350000
30380000-3038ffff : 30380000.clock-controller clock-controller@30380000
30660000-3066ffff : 30660000.pwm pwm@30660000
30670000-3067ffff : 30670000.pwm pwm@30670000
30680000-3068ffff : 30680000.pwm pwm@30680000
30690000-3069ffff : 30690000.pwm pwm@30690000
30890000-3089ffff : 30890000.serial serial@30890000
30a20000-30a2ffff : 30a20000.i2c i2c@30a20000
30a30000-30a3ffff : 30a30000.i2c i2c@30a30000
30aa0000-30aaffff : 30aa0000.mailbox mailbox@30aa0000
30b50000-30b5ffff : 30b50000.mmc mmc@30b50000
30b60000-30b6ffff : 30b60000.mmc mmc@30b60000
30bb0000-30bbffff : 30bb0000.spi fspi_base
30bd0000-30bdffff : 30bd0000.dma-controller dma-controller@30bd0000
30bf0000-30bfffff : 30bf0000.ethernet ethernet@30bf0000
32ec0000-32ecffff : 32ec0000.media-blk-ctrl media-blk-ctrl@32ec0000
33000000-33001fff : 33000000.dma-apbh dma-apbh@33000000
38300000-38300423 : hantrodec0
38310000-38310423 : hantrodec0
38320000-383207cf : hx280enc
3d800000-3dbfffff : 3d800000.ddr-pmu ddr-pmu@3d800000
40000000-943fffff : System RAM
40480000-418bffff : Kernel code
418c0000-41b3ffff : reserved
41b40000-41d1ffff : Kernel data
43000000-4300bfff : reserved
4501a000-498dbfff : reserved
94400000-a43fffff : reserved
a4400000-bfffffff : System RAM
ba000000-bfffffff : reserved
100000000-17fffffff : System RAM
17b000000-17effffff : reserved
17f180000-17f1dffff : reserved
17f1e0000-17f7e0fff : reserved
17f7e1000-17f83cfff : reserved
17f83f000-17f847fff : reserved
17f848000-17fffffff : reserved
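
To cross-check a listing like this against the "free" total, a small Python sketch (helper names are made up for this example) can sum the "System RAM" ranges. For the listing above, the three ranges add up to 3840 MiB, while "free" reports 3763556 KiB, about 3675 MiB.

```python
def system_ram_ranges(iomem_text):
    """Return inclusive (start, end) pairs for every 'System RAM' line."""
    ranges = []
    for line in iomem_text.splitlines():
        span, sep, name = line.partition(" : ")
        if not sep or name.strip() != "System RAM":
            continue
        start, end = (int(part, 16) for part in span.strip().split("-"))
        ranges.append((start, end))
    return ranges


def total_mib(ranges):
    """Sum the sizes of inclusive (start, end) ranges and convert to MiB."""
    return sum(end - start + 1 for start, end in ranges) // (1024 * 1024)


# A few lines copied from the listing above; nested entries are skipped by name.
SAMPLE_IOMEM = """\
40000000-943fffff : System RAM
40480000-418bffff : Kernel code
94400000-a43fffff : reserved
a4400000-bfffffff : System RAM
100000000-17fffffff : System RAM
"""
for start, end in system_ram_ranges(SAMPLE_IOMEM):
    print(f"{start:#x}-{end:#x}: {(end - start + 1) // (1024 * 1024)} MiB")
print("total:", total_mib(system_ram_ranges(SAMPLE_IOMEM)), "MiB")
```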

BiyongSUN
NXP Employee

So far, I haven't seen the problem.

 

root@imx8mqevk:~# free
        total     used      free  shared  buff/cache  available
Mem:  3037148   386204   2563424    9004       87520    2557760
Swap:       0        0         0
root@imx8mqevk:~#
root@imx8mqevk:~# uname -a
Linux imx8mqevk 4.14.98-imx_4.14.98_2.0.0_ga+g5d6cbea #1 SMP PREEMPT Sun Apr 14 10:53:57 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
root@imx8mqevk:~# memtester 2048M 1000
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 2048MB (2147483648 bytes)
got 2048MB (2147483648 bytes), trying mlock ...locked.


Loop 2/1000:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : testing 359

Russell_Kook
Contributor I

You don't seem to have carefully read what I wrote in the first place.

I'm talking about a problem that occurs on a newly developed board.

Russell_Kook
Contributor I

I have confirmed that this problem has been patched in u-boot released in 2021.4.
Enjoy debugging everyone.

alif_chen
Contributor I

Hello.

Is there any update on how to solve the issue?

We are currently facing the same issue on the i.MX6ULL and would like to refer to the solution.

Thank you

igorpadykov
NXP Employee

Hi John

If "there is not much difference between our board and the kit", could you try to reproduce the issue on the NXP reference board, using the software described at:

i.MX Software and Development Tools | NXP

Best regards
igor

johncoll
Contributor II

Hi Igor
My custom board is:
CPU: MIMX8MQ6CVAHZAB
RAM: 2GB DDR4 MT40A512M16JY-075E
Uboot: imx_v2018.03_4.14.98_2.0.0_ga
Kernel: imx_4.14.98_2.0.0_ga
My DDR is DDR4, not LPDDR4, so I used MX8M_DDR4_RPA_v9.xlsx and modified it to match our RAM configuration; you can find it in the attachment. I followed Chapter 4 of MSCALE_DDR_Tool_User_Guide.pdf to get U-Boot built and running. The system boots to Linux and works fine. Then, in Linux, I used the stress command to test memory with 760 MB.
I have tested the RAM with both the DDR Tool and U-Boot mtest, and both show PASS. What I see is that when an application tries to allocate (malloc) more than 760 MB of memory in Linux, the system hangs, for example with this command: 'stress --vm 1 --vm-bytes 760M'

igorpadykov
NXP Employee

Hi John

I suggest trying the standard Linux tool "memtester".

Best regards
igor

johncoll
Contributor II

Hi Igor

I used "memtester" and the same issue happened. This time, with "memtester", the system hangs if the requested memory is larger than 768 MB.

Screenshot from 2020-03-02 23-32-42.png

Best regards

John

igorpadykov
NXP Employee

Hi John

memtester (unlike the DDR tool) also stresses the power supplies, so errors seen with memtester point to power supply issues (ripple, instability). I recommend rechecking the board design against sect. 3.6, "Power connectivity/routing", of the i.MX8M Hardware Developer’s Guide.

Best regards
igor
-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

johncoll
Contributor II

Hi Igor

I double-checked with the hardware team, following the test cases in the Power Consumption Measurement document and comparing against the i.MX8MQ EVK board. There is not much difference between our board and the kit.

Today I built a new bl31.bin without OP-TEE, and the Linux system no longer hangs.

Does anyone have an idea what the root cause might be?

Best regards

John

haochengdong
Contributor II

We ran into the same problem and have no idea what causes it. Did you solve it?

alif_chen
Contributor I

Hello,

 

We ran into the same problem, too.

Do you have any idea how to solve it?

Thank you

alif_chen
Contributor I

I have OP-TEE enabled. I finally found that the problem comes from the reserved memory: it is not actually locked.

The following commit solved my problem.

 

https://source.codeaurora.org/external/imx/imx-optee-os/commit/?h=imx_5.4.47_2.2.0&id=995908f2a91c06...

alif_chen
Contributor I

Hi all,

 

Replying to my own question.

It turns out that optee-os revises the DDR layout in the device tree with a "reserved-memory" node.

However, the "#address-cells" and "#size-cells" values are wrong.

I traced the code in optee-os and found a commit that helps.

https://source.codeaurora.org/external/imx/imx-optee-os/commit/?h=imx_5.4.47_2.2.0&id=995908f2a91c06...

The issue was resolved by applying that patch.
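
To illustrate in general terms why wrong "#address-cells" / "#size-cells" values matter, here is a minimal Python sketch of how a device-tree "reg" property is split into (address, size) pairs. This is not the code from the commit above; the function name and the cell values are hypothetical and chosen only for the example.

```python
def decode_reg(cells, address_cells, size_cells):
    """Split a flat list of 32-bit 'reg' cells into (address, size) pairs."""
    def join(words):
        # Combine big-endian 32-bit cells into one integer.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    stride = address_cells + size_cells
    if stride == 0 or len(cells) % stride:
        raise ValueError("cell list does not fit the given cell counts")
    pairs = []
    for i in range(0, len(cells), stride):
        addr = join(cells[i:i + address_cells])
        size = join(cells[i + address_cells:i + stride])
        pairs.append((addr, size))
    return pairs


# Hypothetical reserved region, written with 2 address cells and 2 size cells.
REG = [0x0, 0xFE000000, 0x0, 0x02000000]

for a_cells, s_cells in ((2, 2), (1, 1)):
    decoded = decode_reg(REG, a_cells, s_cells)
    print(a_cells, s_cells, [(hex(a), hex(s)) for a, s in decoded])
```

With the intended 2/2 cell counts, the four cells describe one region. Read as 1/1, the same cells describe two different regions starting at address 0, so the range that was meant to be reserved is no longer described to the kernel.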

johncoll
Contributor II

Hello.

We have a new hardware version with some improvements to the power design, but the same problem is still there. Do you have any suggestions for debugging this issue?

igorpadykov
NXP Employee

Hi John

It may be worth using the latest MX8M_LPDDR4_RPA_v24 tool, testing with:

i.MX8 MSCALE SERIES DDR Tool Release (V3.10)

and trying the attached patch.

Best regards
igor