iMX8QM and LPDDR4 memory errors

2,939 views
emanuele79
Contributor III

Hello,
We are introducing the Nanya NT6AN512T32AV-J1I (https://www.nanya.com/en/Product/4404/NT6AN512T32AV-J1I) memory on our Apalis iMX8 (i.MX8QM) module as a replacement for the Micron MT53D512M32D2DS-046 IT:D (mt53d512m32d2ds-046-ait-d).

We have reviewed the timing specifications, and the Nanya part appears to be a drop-in replacement for the Micron one, as both seem to have the same timing requirements.

We tested several modules equipped with the Nanya memory using the same test setup as for the Micron version (Linux BSP + Memtester [memtester-4.6.0.tar.gz]). To be sure, we also re-ran the test on the Micron modules.

The Nanya memory performs well at other temperature ranges, but in the higher range (~50°C to ~80°C) we observed the following failures, mainly (but not only) during the Bit Spread test in Memtester. These are a subset of the total failures we had:

FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x00000000031a9a80.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x0000000009eb5440.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000172b040.
FAILURE: 0xfffffffffbffffff != 0xffffffffffffffff at offset 0x000000000792e570.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000a13fb00.
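
To make the failure pattern easier to read, the two values on each line can be XORed to show which bit differs. The following is a minimal sketch, not part of the original test setup; the helper name `flipped_bits` is hypothetical, and the mapping from a bit position in the 64-bit word to a physical DQ pin or byte lane depends on the controller and board layout, which is not covered here.

```python
import re

# Matches memtester lines of the form:
#   FAILURE: 0x<value> != 0x<value> at offset 0x<offset>.
FAILURE_RE = re.compile(
    r"FAILURE:\s*0x([0-9a-fA-F]+)\s*!=\s*0x([0-9a-fA-F]+)"
    r"\s*at offset\s*0x([0-9a-fA-F]+)"
)


def flipped_bits(line):
    """Return (offset, [differing bit positions]) or None if the line does not match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    first, second, offset = (int(group, 16) for group in match.groups())
    diff = first ^ second  # bits set here differ between the two reads
    bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
    return offset, bits


# The five failure lines quoted in this post.
log = """\
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x00000000031a9a80.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x0000000009eb5440.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000172b040.
FAILURE: 0xfffffffffbffffff != 0xffffffffffffffff at offset 0x000000000792e570.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000a13fb00.
"""

for line in log.splitlines():
    result = flipped_bits(line)
    if result is not None:
        offset, bits = result
        # bit // 8 is the byte index inside the 64-bit word, not a DRAM byte lane.
        print(f"offset 0x{offset:x}: bits {bits}, bytes {[b // 8 for b in bits]}")
```

For the five lines above, every failure differs in the same single bit (bit 26 of the 64-bit word), which points to one data bit rather than a random pattern.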

It is important to note that these errors appear only when we start testing at -40°C and ramp up to 85°C without rebooting. If we start the test at 85°C and ramp down to -40°C, we do not observe any errors. Because of this, we suspect that LPDDR4 training at -40°C is affecting the reliability of the memory (maybe there are signal integrity issues when training is done at -40°C).

We also dumped some DDR controller registers related to memory training, and we noticed significant differences between the two memory types (see attachments). We suspect that some termination or drive strength parameters may need to be tuned for the Nanya memory, but we are not sure which ones.

We have also attached the RPA (Register Programming Aid) Excel sheet containing the DDR controller configuration currently used for both memory types.

Any advice or suggestions are welcome, but specifically, we have the following questions:

  • Can the comparison of the training result registers help identify the root cause or highlight key differences between the two memory types?

  • Do you have any suggestions regarding the memory configuration to improve training and signal integrity?

Thank you in advance for your support.

 

30 replies

1,651 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Sorry, but there is no other option than keeping DQS2DQ training disabled during operation.

By the way, what was the result of the test I suggested earlier, changing the CA Vref through the MR12 value without changing any parameter other than MR12?

B.R

1,543 views
emanuele79
Contributor III

Hi @pengyong_zhang ,

I have tested all MR12 values between 0x40 and 0x47.
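
For reference, the MR12 values above can be decoded into CA Vref settings. The sketch below is only an illustration and assumes the standard JESD209-4 LPDDR4 encoding (OP[5:0] is the Vref(CA) setting, OP[6] selects the range, range 0 starts at 10.0% of VDD2, range 1 starts at 22.0%, and each step is 0.4%). The function name `decode_mr12` is hypothetical, and the encoding should be checked against the Nanya datasheet before relying on it.

```python
def decode_mr12(value):
    """Decode an LPDDR4 MR12 byte into (range, CA Vref in % of VDD2).

    Assumes the JESD209-4 LPDDR4 encoding; setting codes above 0b110010
    are treated as reserved.
    """
    code = value & 0x3F            # OP[5:0]: Vref(CA) setting
    vr_range = (value >> 6) & 0x1  # OP[6]: VR-CA range select
    if code > 0b110010:
        raise ValueError(f"reserved Vref(CA) setting: 0x{code:02x}")
    base_tenths = 100 if vr_range == 0 else 220  # 10.0% or 22.0%, in tenths
    return vr_range, (base_tenths + 4 * code) / 10


# The values tested in this post.
for mr12 in range(0x40, 0x48):
    vr_range, percent = decode_mr12(mr12)
    print(f"MR12=0x{mr12:02x}: range {vr_range}, {percent}% of VDD2")
```

Under that assumed encoding, 0x40 to 0x47 all select range 1 and cover 22.0% to 24.8% of VDD2, so the sweep stays in a narrow window at the bottom of range 1.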

Kind regards,
Emanuele

1,423 views
hongting_dong
NXP Employee

Hi @emanuele79 

I want to double-check some details.

1. Are you able to test booting with a higher voltage?

2. Could you get the VTSA results to see whether the Nanya margin is poor at cold or across temperature?

3. In your earlier tests, changing the ODT / drive strength did not help, right?

4. Does setting the CA Vref to range 0 help?

Thanks

Best Regards

1,109 views
emanuele79
Contributor III
Hi,
Could you provide me with the VTSA tool for the iMX8QM?
As far as I know, there is no such tool for this SoC.
Thanks,
best regards.

1,071 views
hongting_dong
NXP Employee

Hi @emanuele79 

  • Once you create a baseline and are comfortable with the Write Eye generation procedure, we recommend the following steps for temperature testing:
    • Cool the LP4 device under test to -40C (in our lab, we use a thermo-stream)
    • Once at -40C, power up the board and start the VTSA tool (at this point, the idea is that the DDR is initialized/trained at -40C)
    • Perform the Write Eye test on each byte lane
    • Once completed, increase the temperature to 25C without power cycling or restarting the board or tool, to maintain the trained values taken at -40C
    • Once at 25C, re-take the Write Eyes for each byte lane
    • Once completed, increase the temperature to 85C without power cycling or restarting the board or tool, to maintain the trained values taken at -40C
    • Once at 85C, re-take the Write Eyes for each byte lane
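
One possible way to summarize the eyes collected with the steps above is to check how much margin remains around the sample point trained at -40C as the eye moves with temperature. The sketch below is only an illustration: the data layout, the function names, and every number in it are hypothetical placeholders (not measurements and not the VTSA output format), and it assumes the trained sample point sits at the center of the -40C eye.

```python
# Hypothetical write-eye edges (left, right) in delay taps, per temperature
# and byte lane. Placeholder values for illustration only.
eyes = {
    -40: {0: (10, 40), 1: (12, 42)},
    25: {0: (14, 44), 1: (15, 45)},
    85: {0: (20, 50), 1: (22, 52)},
}


def trained_centres(eyes_at_training):
    """Assume training placed the sample point at the centre of each eye."""
    return {lane: (left + right) // 2
            for lane, (left, right) in eyes_at_training.items()}


def remaining_margins(all_eyes, centres):
    """Smallest distance from the trained point to either eye edge."""
    margins = {}
    for temperature, lanes in all_eyes.items():
        for lane, (left, right) in lanes.items():
            centre = centres[lane]
            margins[(temperature, lane)] = min(centre - left, right - centre)
    return margins


centres = trained_centres(eyes[-40])
for (temperature, lane), margin in sorted(remaining_margins(eyes, centres).items()):
    print(f"{temperature:>4}C lane {lane}: {margin} taps of margin")
```

A margin that shrinks steadily as the temperature rises away from the training point would be consistent with the eye drifting while the trained values stay fixed.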

Thanks

Best Regards

1,499 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Yes, you are right.

1,711 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

This is covered in our errata, ERR050102: DRAM: Periodic hardware based DQS2DQ calibration is not supported.

Description
If periodic hardware based DQS2DQ calibration is enabled, the resultant latency introduced can cause underrun conditions, or worst case a lock-up, in some of the key sub-systems, such as the display and imaging interfaces, impacting their performance capabilities.

Workaround
Currently DQS2DQ calibration only takes place on power up and when resuming from low power modes. To date no failures or stability issues have been observed across the full process, voltage and temperature ranges.

B.R

1,539 views
emanuele79
Contributor III

Hi @pengyong_zhang,

The RAM datasheet states:

DQS Interval Oscillator
As voltage and temperature change on the SDRAM die, the DQS clock tree delay will shift and may require re-training.

This suggests that DQS2DQ training should be supported and might be necessary in certain conditions.

Has NXP considered that disabling DQS2DQ could potentially cause malfunctions with some memory chips that rely on this training?

Would it be possible for NXP to help us identify a workaround or solution, and to assess whether the current errata can be resolved?

We have observed the following behavior:

  • If we start the test at 85 °C and let the temperature fall to -40 °C, we do not see any issues.
  • Conversely, if we start from a lower temperature (e.g., -10 °C) and increase it (e.g., up to 60 °C), we observe memory failures, even within this reduced range.
  • We also noticed that periodically suspending to RAM and resuming resolves the issue, likely because, as you mentioned, DQS2DQ training is performed during resume.
  • We enabled DQS2DQ training, but unfortunately, the results were still negative (and even worse).

Is this information useful for your analysis?

Best regards,
Emanuele

1,526 views
emanuele79
Contributor III
An additional note: the "DQS Interval Oscillator" datasheet chapter also appears in JEDEC Standard No. 209-4, "Low Power Double Data Rate 4 (LPDDR4)".
BR,
Emanuele

1,679 views
emanuele79
Contributor III

Hello @pengyong_zhang,

Is there any other option except having it completely disabled?

Kind regards,
Emanuele

1,809 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Does Nanya mean that this error is caused by the board not performing regular DQS2DQ training during operation?

B.R

1,804 views
emanuele79
Contributor III

Hi @pengyong_zhang ,

yes, it is.

Emanuele

1,869 views
emanuele79
Contributor III

Hello @pengyong_zhang , @hongting_dong ,

Nanya analyzed the behaviour and this is their conclusion:


Based on LA MRS setting confirmation, MR18/MR19 were disabled during memory test.
NTC suggest enabling MR18/MR19 on the platform because if the platform does not perform tDQS2DQ offset, the default settings may not meet NTC devices' requirements.

I couldn’t find any setting or information related to the MR18/MR19 registers in the RPA, SCFW, or the i.MX8QM Reference Manual.

Since these are read-only registers on the RAM side, my understanding is that the memory controller or the SCFW should read and use them somewhere at runtime.
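
To illustrate how these registers are meant to be used, the JEDEC DQS interval oscillator description gives tDQS2DQ = run time / (2 × count), where the count is read back from MR18 (LSB) and MR19 (MSB). The sketch below only shows that arithmetic; the function name `tdqs2dq_ps` and the example register values, run time, and clock period are hypothetical, and how (or whether) the i.MX8QM controller or SCFW can read these registers at runtime is exactly the open question here.

```python
def tdqs2dq_ps(mr18, mr19, run_time_clocks, tck_ps):
    """Estimate tDQS2DQ in picoseconds from a DQS oscillator reading.

    mr18 / mr19: oscillator count LSB / MSB read back from the DRAM.
    run_time_clocks: oscillator run time in CK cycles.
    tck_ps: CK period in picoseconds.
    """
    count = ((mr19 & 0xFF) << 8) | (mr18 & 0xFF)
    if count == 0:
        raise ValueError("oscillator count is zero (oscillator not run?)")
    run_time_ps = run_time_clocks * tck_ps
    return run_time_ps / (2 * count)


# Hypothetical example values, not measurements: 1024 CK cycles at a
# 625 ps clock period, with an oscillator count of 0x0320 (800).
print(tdqs2dq_ps(mr18=0x20, mr19=0x03, run_time_clocks=1024, tck_ps=625))
```

Comparing the value obtained right after training with one taken later at a different temperature would indicate how far tDQS2DQ has drifted since the trained delays were set.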

Could you help me identify how we might proceed?

Emanuele

2,192 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Keep ZPROG_ASYM_PD_DRV_DQ_48 and ZPROG_ASYM_PD_DRV_DQ_60 at their default values. Do not change them.

Then test different MR12 values step by step. Sorry, I cannot reproduce your problem because I do not have your test board and environment, so you will need to test this yourself and find the right Vref value. I also think the best approach is still to discuss this problem with Nanya and see whether they can reproduce it and give you a workaround.

B.R

2,317 views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Did you run the test with the changed MR12 register value? What was the result? When you run this test, do not change any parameter except MR12.

B.R

2,247 views
emanuele79
Contributor III

Hello @pengyong_zhang,

I have tested changing only MR12, from 0x47 to 0x40.

I also tested ZPROG_ASYM_PD_DRV_DQ_48 and ZPROG_ASYM_PD_DRV_DQ_60.

Every configuration failed.
I'm not really able to tell whether any of these tests was better or worse.

Kind regards,
Emanuele

2,340 views
emanuele79
Contributor III

Hello @pengyong_zhang , @hongting_dong ,

can you focus particularly on this fact:

> It is important to note that these errors appear only when we start testing at -40°C and ramp up to 85°C without rebooting.

As a workaround, can we retrigger memory training on a running system by patching the SCFW?
Given the kind of errors we reported (bit flips), is there any trained parameter that could help us identify the root cause?

Thank you in advance,
kind regards.

Emanuele

2,451 views
hongting_dong
NXP Employee

Hi @emanuele79 

Please also provide the following information:

Nanya failing device info:

MFG date code:

Lot number:

Also, have you tried Nanya's latest device part? As far as I remember, their parts have had some updates to ZQ calibration.

Best Regards

2,400 views
emanuele79
Contributor III

Hello,
the following is printed on the memory:

Nanya2447

NT6AN512T32AV-J1I

9423W1EF 3 TW

Let me know if this is the information you need.
Thanks!

Regards,
Emanuele

 
