iMX8QM and LPDDR4 memory errors


3,036 Views
emanuele79
Contributor III

Hello,
We are introducing the Nanya NT6AN512T32AV-J1I (https://www.nanya.com/en/Product/4404/NT6AN512T32AV-J1I) memory on our Apalis iMX8 (i.MX8QM) module as a replacement for the Micron MT53D512M32D2DS-046 IT:D (mt53d512m32d2ds-046-ait-d).

We have reviewed the timing specifications, and the Nanya part appears to be a drop-in replacement for the Micron one, as both seem to have the same timing requirements.

We tested several modules equipped with the Nanya memory using the same test setup as for the Micron version (Linux BSP + Memtester [memtester-4.6.0.tar.gz]). To be sure, we have also re-run the test on the Micron modules.

The Nanya memory performs well at different temperature ranges, but in the higher range (~50°C to ~80°C) we observed the following failures in Memtester, during the Bit Spread test among others (these are a subset of the total failures we had):

FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x00000000031a9a80.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x0000000009eb5440.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000172b040.
FAILURE: 0xfffffffffbffffff != 0xffffffffffffffff at offset 0x000000000792e570.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000a13fb00.
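
One way to read these lines is to XOR the two values in each failure and list the bit positions that differ. The sketch below does that for the five lines quoted above; the helper name `flipped_bits` is made up for illustration. It only reports which bit of the 64-bit word differs and does not attempt to map that bit to a physical DQ pin, since that depends on the controller's data mapping.

```python
import re

# Matches memtester lines such as:
# FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x00000000031a9a80.
FAILURE_RE = re.compile(
    r"FAILURE:\s*0x([0-9a-fA-F]+)\s*!=\s*0x([0-9a-fA-F]+)\s*at offset\s*0x([0-9a-fA-F]+)"
)


def flipped_bits(line):
    """Return (offset, [differing bit positions]) for one FAILURE line, or None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    first, second, offset = (int(group, 16) for group in match.groups())
    diff = first ^ second  # a set bit marks a position where the two values differ
    bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
    return offset, bits


LOG = """\
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x00000000031a9a80.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x0000000009eb5440.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000172b040.
FAILURE: 0xfffffffffbffffff != 0xffffffffffffffff at offset 0x000000000792e570.
FAILURE: 0xffffffffffffffff != 0xfffffffffbffffff at offset 0x000000000a13fb00.
"""

results = [flipped_bits(line) for line in LOG.splitlines()]
for offset, bits in results:
    print(f"offset 0x{offset:08x}: differing bits {bits}")
```

For these five lines the two values always differ in a single bit, bit 26 of the 64-bit word.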

It is important to note that these errors appear only when we start testing at -40°C and ramp up to 85°C without rebooting. If we start the test at 85°C and ramp down to -40°C, we do not observe any errors. Because of this, we suspect that LPDDR4 training at -40°C is affecting the reliability of the memory (maybe there are signal integrity issues when training is done at -40°C).

We also dumped some DDR controller registers related to memory training, and we noticed significant differences between the two memory types (see attachments). We suspect that some termination or drive strength parameters may need to be tuned for the Nanya memory, but we are not sure which ones.

We have also attached the RPA (Register Programming Aid) Excel sheet containing the DDR controller configuration currently used for both memory types.

Any advice or suggestions are welcome, but specifically, we have the following questions:

  • Can the comparison of the training result registers help identify the root cause or highlight key differences between the two memory types?

  • Do you have any suggestions on the memory configuration to improve training and signal integrity?

Thank you in advance for your support.

 

30 Replies

1,728 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Sorry about that, there is no other way except disabling DQS2DQ training during operation.

By the way, what was the result of the test I suggested earlier, changing the CA Vref through the MR12 value without changing any parameter other than MR12?

B.R


1,620 Views
emanuele79
Contributor III

Hi @pengyong_zhang ,

I have tested all MR12 values between 0x40 and 0x47.

Kind regards,
Emanuele


1,500 Views
hongting_dong
NXP Employee

Hi @emanuele79 

I want to double-check some details.

1. Are you able to test booting with a higher voltage?

2. Could you get the VTSA results to see whether the Nanya margin is poor at cold or across temperature?

3. So far, changing the ODT / drive strength has not helped, right?

4. If you set CA Vref to range 0, does it help?
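
For reference, here is a minimal sketch of how an MR12 value splits into a VREF(CA) range and level. It assumes the LPDDR4 encoding as I understand it from JESD209-4: OP[6] selects the range, OP[5:0] is the step code (0 to 50), each step is 0.4% of VDD2, Range 0 starts at 10.0% and Range 1 at 22.0%. These numbers are an assumption on my side and should be checked against the Nanya datasheet; the function name `decode_mr12` is only illustrative.

```python
def decode_mr12(value):
    """Decode an LPDDR4 MR12 byte into (range, VREF(CA) as % of VDD2).

    Assumed encoding (to be verified against the device datasheet):
      OP[6]   = VR-CA range select (0 = Range 0, 1 = Range 1)
      OP[5:0] = step code, 0..50, 0.4% per step
      Range 0 starts at 10.0%, Range 1 starts at 22.0%
    OP[7] is ignored here.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("MR12 is an 8-bit register")
    vr_range = (value >> 6) & 1
    code = value & 0x3F
    if code > 50:
        raise ValueError(f"step code {code} is reserved in the assumed encoding")
    base_percent = 10.0 if vr_range == 0 else 22.0
    return vr_range, round(base_percent + 0.4 * code, 1)


# The values mentioned in this thread, 0x40 to 0x47, all select Range 1.
for mr12 in range(0x40, 0x48):
    vr_range, percent = decode_mr12(mr12)
    print(f"MR12=0x{mr12:02x}: range {vr_range}, VREF(CA) = {percent}% of VDD2")
```

Under these assumptions, a sweep from 0x40 to 0x47 stays in Range 1 and covers only the lowest eight steps of that range; reaching Range 0 means clearing OP[6], i.e. values from 0x00 upward.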

Thanks

Best Regards


1,186 Views
emanuele79
Contributor III
Hi,
Could you provide me with the VTSA tool for the i.MX8QM?
As far as I know, there is no such tool for this SoC.
Thanks,
best regards.

1,148 Views
hongting_dong
NXP Employee

Hi @emanuele79 

  • Once you create a baseline and are comfortable with the Write Eye generation procedure, we recommend the following steps for temperature testing:
    • Cool the desired LP4 device down to -40C (in our lab, we use a thermo-stream)
    • Once at -40C, power up the board and start the VTSA tool (at this point, the idea is that the DDR is initialized/trained at -40C)
    • Perform the Write Eye test on each byte lane
    • Once completed, increase the temperature to 25C without power cycling/re-starting the board or tool, to maintain the trained values taken at -40C
    • Once at 25C, re-take the Write Eyes for each byte lane
    • Once completed, increase the temperature to 85C without power cycling/re-starting the board or tool, to maintain the trained values taken at -40C
    • Once at 85C, re-take the Write Eyes for each byte lane
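
The sequence above can be written down as an ordered checklist, which makes it harder to forget that training happens exactly once, at the cold point. This is only a sketch: the step names, the function `write_eye_plan` and the default of 4 byte lanes are placeholders, not part of the VTSA tool.

```python
def write_eye_plan(train_temp_c=-40, later_temps_c=(25, 85), byte_lanes=4):
    """Build the ordered list of steps as (action, temperature_c, byte_lane).

    byte_lane is None for steps that are not per-lane measurements.
    The board is powered up (and the DDR trained) only once, at train_temp_c.
    """
    steps = [
        ("soak", train_temp_c, None),
        ("power_up_and_train", train_temp_c, None),
    ]
    steps += [("write_eye", train_temp_c, lane) for lane in range(byte_lanes)]
    for temp_c in later_temps_c:
        # Keep the board and tool running so the values trained at
        # train_temp_c are still in use at the new temperature.
        steps.append(("ramp_without_power_cycle", temp_c, None))
        steps += [("write_eye", temp_c, lane) for lane in range(byte_lanes)]
    return steps


plan = write_eye_plan()
for action, temp_c, lane in plan:
    suffix = "" if lane is None else f", byte lane {lane}"
    print(f"{action} at {temp_c}C{suffix}")
```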

Thanks

Best Regards


1,576 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Yes, you are right.


1,788 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

In our errata, ERR050102: DRAM: Periodic hardware based DQS2DQ calibration is not supported.

Description
If periodic hardware based DQS2DQ calibration is enabled, the resultant latency introduced can cause underrun conditions,
or worst case a lock-up, in some of the key sub-systems, such as the display and imaging interfaces, impacting their
performance capabilities.
Workaround
Currently DQS2DQ calibration only takes place on power up and when resuming from low power modes. To date no failures or
stability issues have been observed across the full process, voltage and temperature ranges.

B.R


1,616 Views
emanuele79
Contributor III

Hi @pengyong_zhang,

the RAM datasheet reports:

DQS Interval Oscillator
As voltage and temperature change on the SDRAM die, the DQS clock tree delay will shift and may require re-training.

This suggests that DQS2DQ training should be supported and might be necessary in certain conditions.

Has NXP considered that disabling DQS2DQ could potentially cause malfunctions with some memory chips that rely on this training?

Would it be possible for NXP to help us identify a workaround or solution, and to assess whether the current errata can be resolved?

We have observed the following behavior:

  • If we start the test at 85 °C and let the temperature fall to -40 °C, we do not see any issues.
  • Conversely, if we start from a lower temperature (e.g., -10 °C) and increase it (e.g., up to 60 °C), we observe memory failures, even within this reduced range.
  • We also noticed that periodically suspending to RAM and resuming resolves the issue, likely because, as you mentioned, DQS2DQ training is performed during resume.
  • We enabled DQS2DQ training, but unfortunately, the results were still negative (and even worse).

Is this information useful for your analysis?

Best regards,
Emanuele


1,603 Views
emanuele79
Contributor III
An additional note: the "DQS Interval Oscillator" datasheet chapter also appears in JEDEC Standard No. 209-4: "Low Power Double Data Rate 4 (LPDDR4)".
BR,
Emanuele

1,756 Views
emanuele79
Contributor III

Hello @pengyong_zhang,

Is there any other option except having it completely disabled?

Kind regards,
Emanuele


1,886 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Does Nanya mean that this error is caused by the board not performing regular DQS2DQ training during operation?

B.R


1,881 Views
emanuele79
Contributor III

Hi @pengyong_zhang ,

yes, it is.

Emanuele


1,946 Views
emanuele79
Contributor III

Hello @pengyong_zhang , @hongting_dong ,

Nanya analyzed the behaviour and this is their conclusion:


Based on LA MRS setting confirmation, MR18/MR19 were disabled during memory test.
NTC suggest enabling MR18/MR19 on the platform because if the platform does not perform tDQS2DQ offset, the default settings may not meet NTC devices' requirements.

I couldn’t find any setting or information related to the MR18/MR19 registers in the RPA, SCFW, or the i.MX8QM Reference Manual.

Since these are read-only registers on the RAM side, my understanding is that the memory controller or the SCFW should read and use them somewhere at runtime.
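
To make the role of MR18/MR19 concrete, here is a minimal sketch of how a controller or firmware could turn the DQS interval oscillator count into a tDQS2DQ estimate. It assumes the relation from JESD209-4 as I read it, tDQS2DQ = run time / (2 × count), with MR18 holding the low byte and MR19 the high byte of the count, and the run time programmed through MR23. The function names and the example numbers are illustrative only, not measurements from our boards.

```python
def dqs_osc_count(mr18, mr19):
    """Combine MR18 (low byte) and MR19 (high byte) into the 16-bit count."""
    if not (0 <= mr18 <= 0xFF and 0 <= mr19 <= 0xFF):
        raise ValueError("MR18 and MR19 are 8-bit registers")
    return (mr19 << 8) | mr18


def tdqs2dq_ps(count, run_time_tck, tck_ps):
    """Estimate tDQS2DQ in picoseconds from one oscillator run.

    Assumed relation: tDQS2DQ = run time / (2 * count), where the run time
    is run_time_tck clock cycles of tck_ps picoseconds each.
    """
    if count <= 0:
        raise ValueError("oscillator count must be positive")
    return run_time_tck * tck_ps / (2 * count)


# Illustrative numbers: 1024 tCK run time at tCK = 625 ps (1600 MHz clock).
count_cold = dqs_osc_count(mr18=0x00, mr19=0x04)  # 1024 counts
count_hot = dqs_osc_count(mr18=0x80, mr19=0x03)   # 896 counts
cold = tdqs2dq_ps(count_cold, run_time_tck=1024, tck_ps=625)
hot = tdqs2dq_ps(count_hot, run_time_tck=1024, tck_ps=625)
print(f"tDQS2DQ cold: {cold:.1f} ps, hot: {hot:.1f} ps, drift: {hot - cold:.1f} ps")
```

Comparing the estimate taken at training time with a later one gives the drift that a periodic retraining, or a software offset, would have to compensate.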

Could you help me identify how we might proceed?

Emanuele


2,269 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Keep the ZPROG_ASYM_PD_DRV_DQ_48 and ZPROG_ASYM_PD_DRV_DQ_60 as default value. Do not change it.

Then gradually test different MR12 values. Sorry, I cannot reproduce your problem because I do not have your test board and environment, so you will need to test it yourself and find the right Vref value. I also think the best way forward is still to ask Nanya about this problem, to see whether they can reproduce it and give you a workaround.

B.R


2,394 Views
pengyong_zhang
NXP Employee

Hi @emanuele79 

Did you run the test with the MR12 register change? What was the result? When you run this test, do not change any parameter except MR12.

B.R


2,324 Views
emanuele79
Contributor III

Hello @pengyong_zhang,

I have tested changing only MR12, from 0x47 to 0x40.

I have also tested ZPROG_ASYM_PD_DRV_DQ_48 and ZPROG_ASYM_PD_DRV_DQ_60.

Every configuration failed.
I cannot really tell whether any of these tests was better or worse than the others.

Kind regards,
Emanuele


2,417 Views
emanuele79
Contributor III

Hello @pengyong_zhang , @hongting_dong ,

can you focus particularly on this fact:

> It is important to note that these errors appear only when we start testing at -40°C and ramp up to 85°C without rebooting.

As a workaround, can we retrigger memory training on a running system by patching the SCFW?
Given the kind of errors we reported (bit flips), is there any trained parameter that can help us identify the root problem?

Thank you in advance,
kind regards.

Emanuele


2,528 Views
hongting_dong
NXP Employee

Hi @emanuele79 

Please also provide the following info:

Nanya Failing device info :

MFD Date code: 

LOT number is : 

Also, have you tried Nanya's latest device revision? As far as I remember, their parts have had some updates to ZQ calibration.

Best Regards


2,477 Views
emanuele79
Contributor III

Hello,
the following is printed on the memory:

Nanya2447

NT6AN512T32AV-J1I

9423W1EF 3 TW

Let me know if this is the information you need.
Thanks!

Regards,
Emanuele

 
