We have a custom board with a Vybrid controller and LPDDR2 memory (IS43LD16640A).
We recently managed to get it working with the DRAM clock running at 400 MHz (WL=3, RL=6), but due to power concerns, we need to clock it down to 200 MHz (WL=2, RL=4). Both the MCU and DRAM run from the same clock (PLL1pfd3), so the resulting MCU frequency will be 198 MHz. Both setups use burst length (BL) 4.
Assuming this would be an easy task, we simply updated the DRAM controller timings and hoped for the best. Not surprisingly, this did not work.
Using a JLINK-debugger, we loaded the DRAM using a functional setup (running at 400 MHz), and then immediately reset the MCU and initialized the DRAM controller with the experimental setup (200 MHz). From this we were able to read back the pattern written, indicating that nothing is wrong with the read-operation.
Next we wrote an anti-pattern to the same region we just read. When reading it back, we observed that the first two bytes of every other burst were wrong.
We have previously observed that a 32bit write produces the following result:
write -> readback
0x8000_0000 -> 0x0000_aaaa
0x4000_0000 -> 0x0000_aaaa
0x2000_0000 -> 0x0000_aaaa
0x1000_0000 -> 0x0000_aaaa
[...]
0x0008_0000 -> 0x0000_aaaa
0x0004_0000 -> 0x0000_aaaa
0x0002_0000 -> 0x0000_aaaa
0x0001_0000 -> 0x0000_aaaa
0x0000_8000 -> 0x8000_aaaa
0x0000_4000 -> 0x4000_aaaa
0x0000_2000 -> 0x2000_aaaa
0x0000_1000 -> 0x1000_aaaa
[...]
0x0000_0008 -> 0x0008_aaaa
0x0000_0004 -> 0x0004_aaaa
0x0000_0002 -> 0x0002_aaaa
0x0000_0001 -> 0x0001_aaaa
It seems that the missing two bytes "flow over" to the next 32-bit word:
(gdb) x/16b addr
0x80000000: 0x00 0x00 0x55 0x55 0x55 0x55 0x00 0x80
0x80000008: 0xaa 0xaa 0xaa 0xaa 0xaa 0xaa 0x00 0x55
(gdb) p/x *addr
$59 = 0x55550000
(gdb) set *addr = 0xaabbccdd
(gdb) p/x *addr
$60 = 0xccdd0000
(gdb) x/16b addr
0x80000000: 0x00 0x00 0xdd 0xcc 0xbb 0xaa 0x00 0x80
0x80000008: 0xaa 0xaa 0xaa 0xaa 0xaa 0xaa 0x00 0x55
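The pattern in both listings can be summarised with a small model. This is only a sketch of the observed symptom, not of the controller itself, and the function name is made up for illustration: the low half-word of the written value ends up in the upper half of the word, while the lower half keeps whatever was there before.

```python
def readback_after_write(old_word: int, written: int) -> int:
    """Model of the observed fault (illustrative only).

    The low 16 bits of the written value land in the upper half of the
    32-bit word; the lower half keeps its previous contents. The upper
    16 bits of the written value spill into the next word and are lost
    from this one.
    """
    return ((written & 0xFFFF) << 16) | (old_word & 0xFFFF)


# Rows from the write/readback table (lower half previously held 0xaaaa).
for written in (0x8000_0000, 0x0001_0000, 0x0000_8000, 0x0000_0001):
    result = readback_after_write(0x0000_AAAA, written)
    print(f"{written:#010x} -> {result:#010x}")

# GDB session: the word read 0x55550000 before 0xaabbccdd was written.
print(f"{readback_after_write(0x5555_0000, 0xAABB_CCDD):#010x}")
```

The model reproduces every row quoted above, which is consistent with the write data arriving two bytes (one 16-bit beat) late.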
We are therefore wondering: which parameters specifically need to change when moving the DRAM clock from 400 MHz to 200 MHz?
Following is a diff between the functional 400MHz setup, and the experimental 200 MHz setup:
First, I probably owe you an apology for not thinking about this earlier. I was caught up in trying to figure out how the data was being shifted in the memory.
When I saw your register results above, the problem became obvious. The fact that your DLL_LOCK_VALUEs at 200 MHz are all 0x00 indicates that the DLLs think it takes zero delay elements to make up a full clock cycle. Then, when the DLL calculates the number of delay elements for the Write and Read DQS delays, whatever number you put in still comes out as zero delay. The DQS strobes are not properly delayed and data doesn't get latched in properly. I'm surprised that you actually had a board that worked in this case.
So I talked to the SOC design engineer. There are only 128 delay elements in the DLL. They are roughly 30 picoseconds each, so in full clock mode, the DLL can only handle a period of 3.84 nanoseconds, which equates to a frequency of ~ 260 MHz. Then the SOC engineer pointed out a register field that we had reserved titled PARAM_HALF_CLOCK_MODE, which is used for low frequencies. Apparently I did not realize what counted as low frequency when I was working on the last revision and reserved this field.
Setting PARAM_HALF_CLOCK_MODE to 1 lets the DLL sync on only a half clock cycle with its 128 delay elements, so clock periods of 7.68 nanoseconds can then be supported (130 MHz). Even lower frequencies can be supported operating the DLL in bypass, which is a different bit that you don't need to concern yourself with.
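The arithmetic behind those limits can be checked with a short sketch. The element count and per-element delay are the approximate figures quoted above; the function name is only illustrative.

```python
DELAY_ELEMENTS = 128    # delay elements in the master DLL (figure from this post)
ELEMENT_DELAY_PS = 30   # rough delay per element in picoseconds (figure from this post)


def min_lockable_mhz(half_clock_mode: bool) -> float:
    """Lowest clock frequency the delay line can span.

    In full clock mode the line must cover one whole period; in half
    clock mode it only has to cover half a period, so the longest
    supported period doubles.
    """
    span_ps = DELAY_ELEMENTS * ELEMENT_DELAY_PS
    max_period_ps = 2 * span_ps if half_clock_mode else span_ps
    return 1e6 / max_period_ps  # a period in ps converts to MHz as 1e6 / ps


print(f"full clock mode: {min_lockable_mhz(False):.1f} MHz")
print(f"half clock mode: {min_lockable_mhz(True):.1f} MHz")
```

A 200 MHz clock has a 5 ns period, which is longer than the 3.84 ns the line can span in full clock mode but well inside the 7.68 ns available in half clock mode.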
So the full solution to your problem is simply to set PHY03/PHY19/PHY35 bit 24 to 'b1.
I think that would make the full register setting = 0x01430115 (note that I cut the DLL starting point in half).
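As a quick sanity check, the bit position can be expressed in code. This is a minimal sketch; the helper names are hypothetical, and 0x00430115 is used only as "the suggested value without bit 24", not as a known previous setting.

```python
HALF_CLOCK_MODE_BIT = 24  # bit position given above for PHY03/PHY19/PHY35


def set_half_clock_mode(reg: int) -> int:
    """Return the register value with PARAM_HALF_CLOCK_MODE set."""
    return reg | (1 << HALF_CLOCK_MODE_BIT)


def half_clock_mode_enabled(reg: int) -> bool:
    """Report whether PARAM_HALF_CLOCK_MODE is set in a register value."""
    return bool((reg >> HALF_CLOCK_MODE_BIT) & 1)


print(half_clock_mode_enabled(0x01430115))
print(f"{set_half_clock_mode(0x00430115):#010x}")
```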
Please give that a try and let me know how it works for you.
The description of this field will be something like (ie draft)
Determines if the master delay line locks on a full clock cycle or a half clock cycle.
Within the Master DLL there are only 128 delay elements that can be used to determine a lock. For frequencies of operation below 300 MHz, it is necessary to limit the lock period to only a half clock cycle so that the master delay line does not become saturated.
For both LPDDR2 and DDR3:
If you change the frequency, you need to change all time-dependent/calibration parameters.
We have no reference design for LPDDR2, let alone at 200 MHz, so I cannot test it.
The registers you are referring to "have no meaning" for LPDDR2 according to the RM.
Nevertheless, we did try to manipulate these values as you suggested. As far as we can see, there is no difference from the results explained above.
Sorry for the delay; I am still overloaded with other topics.
Right. I'm sorry. Corrected.
LPDDR2 uses CR (WR,RL) and three sets of PHY data slices timing regs.
It looks like an incorrect write latency or an incorrect output enable window (PHY00 and PHY01, ...).
Write timing is tuned in PHY04/20/36. 0x20 is the center (you use 0x27); try values from 0x00 to 0x40.
Try changing WRLAT_ADJ in CR132.
Some other hints:
Please try to add ZQ calibration request during programming registers. CR154 -> 0x6828 7000
You can also try to increase drive strength of signals 0x00000140 -> 0x00000180 (0x000001C0)
Please change all DQ and DQS from 0x00000140 to 0x00010140 (0x00010180 or 0x000101C0)
Thank you for finally getting back to us.
The symptoms of our memory issue seem to have changed somewhat (see: [GDB] Vybrid LPDDR200 @ 200MHz - Pastebin.com ). As you can see, we are unable to write to the first two columns.
You say that it seems the output enable window may be incorrect. Could you suggest a way for us to find the correct window?
We assume by saying "change all DQ and DQS from 0x00000140 to 0x00010140 [...]", you mean PHY00/01/16/17/32/33? The bit you are suggesting we change is in a field listed as "RESERVED", which leads us to wonder why we should write to this field in the first place? Nevertheless, doing this change does not produce any significant results.
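To see exactly which bits each suggested pad value touches, the old and new values can be XORed. A minimal sketch with a hypothetical helper name; it makes no claim about what the fields are called in the RM.

```python
def changed_bits(before: int, after: int) -> list[int]:
    """Return the positions of the bits that differ between two 32-bit values."""
    diff = before ^ after
    return [bit for bit in range(32) if (diff >> bit) & 1]


# Suggested DQ/DQS change: only bit 16 differs.
print(changed_bits(0x00000140, 0x00010140))

# Suggested drive-strength changes.
print(changed_bits(0x00000140, 0x00000180))
print(changed_bits(0x00000140, 0x000001C0))
```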
Adjusting WRLAT_ADJ only seemed to make things worse. From our tests, we see that in addition to not being able to write to the two first columns, we were also unable to write to several of the first byte-positions following 0x80000000 (DRAM_START). It seemed like some kind of addressing issue.
The ZQ calibration did not seem to make a difference. Following are calculated values:
Increasing the drive strength did not seem to make a difference.
Mark is my colleague, I asked him to help. TheAdmiral please review my recommendations.
Thank you for your quick response.
Indeed we are using the 16 bit interface. We apologize if that was unclear before.
We have verified CR12 and CR49 to reflect WL=2, RL=4.
Setting CR132 to 0x204 seemed to have an immediate effect, and the first two columns were suddenly filled: [GDB] Vybrid LPDDR200 @ 200MHz, WRLATADJ=2, RDLATADJ=4 - Pastebin.com . However, as you can see, the write-readback of "0x12345678" at 0x8000_0000 came back somewhat shuffled. The addresses 0x8000_0001 and 0x8000_0002 also seem unwritable.
Setting CR132 to 0x203 (as suggested by RM), once again rendered the first two columns inaccessible: [GDB] Vybrid LPDDR200 @ 200MHz, WRLATADJ=2, RDLATADJ=3 - Pastebin.com
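For reference, the two CR132 values can be split into their fields. This sketch assumes WRLAT_ADJ sits in bits [15:8] and RDLAT_ADJ in bits [7:0], which is inferred from the values and test titles above rather than taken from the RM; the function names are illustrative.

```python
def make_cr132(wrlat_adj: int, rdlat_adj: int) -> int:
    """Compose a CR132 value (assumed layout: WRLAT_ADJ [15:8], RDLAT_ADJ [7:0])."""
    return ((wrlat_adj & 0xFF) << 8) | (rdlat_adj & 0xFF)


def split_cr132(value: int) -> tuple[int, int]:
    """Return (WRLAT_ADJ, RDLAT_ADJ) under the same assumed layout."""
    return (value >> 8) & 0xFF, value & 0xFF


print(f"{make_cr132(2, 4):#x}")   # WRLAT_ADJ = WL, RDLAT_ADJ = RL
print(split_cr132(0x203))         # WRLAT_ADJ = WL, RDLAT_ADJ = RL - 1
```

Under that layout, the only difference between the two tests is RDLAT_ADJ (4 versus 3); WRLAT_ADJ stays at 2 in both.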
Regarding the PHY00/01/16/17/32/33 output enable range: it would be helpful if we could get some pointers on how to adjust it, so as to avoid semi-qualified guesswork.
I see that Jiri tagged me in his last post. Unfortunately I didn't get the e-mail notification. I was just reviewing some notes from Jiri (He is leaving NXP), and I ran across this post. I am very sorry for the delay. But if you are still having troubles, I will do my best to help you.
First, I find it very surprising that you think you have a Write issue, but a setting of 0x204 vs. 0x203 helps you fix it. The parameter you are changing affects reads and not writes.
So, let's start with the assumption that changing the Read timing parameter is helping your case. The recommendation to set RDLAT_ADJ = RL - 1 was based on work with DDR3 and was necessary to enable the pads one cycle earlier, allowing the Gate signal a reasonable amount of time to de-assert before Read Data arrived. That logic is really not necessary for LPDDR2, so I might revise the note to make the recommendation for DDR3 only.
If you do want to keep RDLAT_ADJ = 3 (as per original setting), you may also consider setting a pull down resistor on the DQS IOMUX pads. Because the Gate signal does not work with LPDDR2, the work around is to put pull down resistors on the DQS strobes to keep them from spuriously strobing data because the LPDDR2 device has not taken positive control of the Byte lane. This can be accomplished by changing the following IOMUX settings:
0x400482c4 = 0x0001018C
0x400482c8 = 0x0001018C
The other possibility might be to add one or two to the current value of PHY_RDLAT (DDRMC_CT126 bits [13:8]), if you want to keep RDLAT_ADJ = 3. I think this has less chance to work than other options, but it is a possibility.
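Adding one or two to a bit field without disturbing its neighbours is a read-modify-write on bits [13:8]. A minimal sketch follows; the helper names and the example register values are hypothetical, and only the field position comes from the text above.

```python
def get_field(reg: int, msb: int, lsb: int) -> int:
    """Extract bits [msb:lsb] from a register value."""
    width = msb - lsb + 1
    return (reg >> lsb) & ((1 << width) - 1)


def set_field(reg: int, msb: int, lsb: int, value: int) -> int:
    """Return reg with bits [msb:lsb] replaced by value, other bits untouched."""
    width = msb - lsb + 1
    if value < 0 or value >> width:
        raise ValueError(f"value {value:#x} does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    return (reg & ~mask) | (value << lsb)


def bump_phy_rdlat(reg: int, step: int = 1) -> int:
    """Add `step` to PHY_RDLAT, bits [13:8] as stated above."""
    return set_field(reg, 13, 8, get_field(reg, 13, 8) + step)


# Hypothetical register contents, for illustration only.
print(f"{bump_phy_rdlat(0x12340677):#010x}")
print(f"{bump_phy_rdlat(0x00000600, step=2):#010x}")
```

The range check in `set_field` guards against the incremented value overflowing into bit 14.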
Now, if the actual problem is in the Write timing (or it could be a combination of Read and Write timings), here are the things you can do to fine tune the Write timings:
For registers PHY00, PHY16 and PHY32:
Field OE_START delays the enabling of the DQ pads from when the Controller sends the Write Enable signal. The initial recommendation is to add a 1/2 clock delay, but if you are seeing problems with the very beginning of a Write, maybe this field should be set to 0x0.
Field OE_END delays the disabling of the DQ pads from when the Controller stops sending the Write Enable signal (the signal should technically last BL/2 cycles). I think a setting of 0x7 is okay for now. Once things are working correctly, you might consider reducing it, just to tighten things up a bit. I don't think the problem is that the DQ data is cutting out too early in a burst.
For registers PHY01, PHY17 and PHY33:
Field OE_START delays the enabling of the DQS pads from when the Controller sends the Write Enable signal. The initial recommendation is to add a 1/2 clock delay, but if you are seeing problems with the very beginning of a Write, maybe this field should be set to 0x0.
Field OE_END delays the disabling of the DQS pads from when the Controller stops sending the Write Enable signal (the signal should technically last BL/2 cycles). I think a setting of 0x7 is okay for now. Once things are working correctly, you might consider reducing it, just to tighten things up a bit. I don't think the problem is that the DQS strobe is cutting out too early in a burst.
By the way, PHY32 and PHY33 are for the Command/Address signals. In practical terms, their settings have no effect, but I think for consistency, it would be better to set them to the same settings as the other two in the group.
Please let me know how these settings work for you and your boards. I am in the process of making editorial corrections to the Vybrid Reference Manual and would like to update it with any necessary changes for LPDDR2.
Thank you for your wonderfully detailed answer.
We have tested the changes you suggested, but unfortunately we do not see any significant changes in our write/readback.
Remember, the issues described in this thread are only present on some of our boards. The same boards do not seem to have any problems running the DRAM at 400 MHz.
Sorry my first round of suggestions did not work.
I understand that this problem is just on a couple of boards. I was trying to adjust timing parameters that might be on the hairy edge and would cause problems on outlier boards. For example, the pull-down resistors would take care of boards that might be noisier than the rest.
I would like to review the settings that you have for registers CR00 - CR161 and PHY00 - PHY52 if you could please attach those. Maybe I can come up with some other ideas.
Thanks again for your reply.
Here are the settings you asked for: [ARM] Vybrid LPDDR2@200MHz DS5-debug init-script - Pastebin.com
Edit: Please note that the register addresses have been swapped for a structure call. This is because I set the DRAM settings in U-Boot and have a native MOC-program that generates the printout in the link.
Also note that the PHY[0..36]-registers are grouped by data-slice.
I'm about halfway through reviewing your initialization parameters. I am seeing some errors, the biggest of which is CR12. There are only certain combinations of WL and RL that are allowed because of the way that MR2 is allowed to be programmed. WL = 2 and RL = 8 is definitely not allowed. At 200 MHz, the values I get are WL = 1 and RL = 3.
I'll let you know more tomorrow.
We are unsure about your previous statement. Our CR12 register is set to 0x00000208, which, according to the RM, translates to WL=2, RL=4.
Below is an extract from the CR12 register description. Notice the 1-bit offset on CASLAT_LIN.
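The two readings in question (RL = 8 versus RL = 4) follow from where CASLAT_LIN is taken to start. The sketch below decodes 0x00000208 both ways. It assumes the write latency sits in bits [12:8] and the CAS latency in the low byte; those positions are inferred from the values in this thread and should be checked against the RM. The function name is illustrative.

```python
CR12_VALUE = 0x00000208  # our current CR12 setting


def decode_cr12(value: int, caslat_lin_offset: int) -> tuple[int, int]:
    """Return (WL, RL) for a CR12 value.

    caslat_lin_offset = 1 follows the RM extract (field starts one bit up),
    caslat_lin_offset = 0 follows the reading where the field starts at bit 0.
    Field positions are assumptions, see the note above.
    """
    write_latency = (value >> 8) & 0x1F
    read_latency = (value & 0xFF) >> caslat_lin_offset
    return write_latency, read_latency


print(decode_cr12(CR12_VALUE, caslat_lin_offset=1))  # RM reading
print(decode_cr12(CR12_VALUE, caslat_lin_offset=0))  # offset-0 reading
```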
However the macros defined in U-Boot actually states that the offset of CASLAT_LIN is 0, contradicting the RM. We are aware of several of these inconsistencies between the RM and U-Boot. We are also aware of multiple inconsistencies between the RM and the example "Golden" (provided in the following thread: Executing from LPDDR2, and again in Vybrid LPDDR2-configuration - IS43LD16640A), and within the RM itself. Some of these inconsistencies are listed here:
I'm sorry this took so long. I took the opportunity to thoroughly update a register programming aid for Vybrid and LPDDR2, and then used it to check your register settings. It was quite a long process. I have attached the programming aid if you want to take a look. I have still left it with a draft revision number and would welcome any comments that you might have on it.
I went through your register settings and I have some comments. The ones in BOLD are the ones I feel might be having the most effect on you. The other ones are basically just differences between your settings and our current register settings.
Register CR12: If I read the datasheet correctly, for 200 MHz operation, WL = 1 and RL = 3. I would use these values regardless of the speed grade.
Register CR14: Regardless of what the datasheet says about tRP, the minimum value that this field supports is 6. That value has to change. Also, TRAS_MIN is 42 ns on my datasheet, which gives me 9 clock periods at 200 MHz. Not sure if your datasheet says different.
Register CR21: Are you sure your memory supports tRAS lockout and concurrent auto pre-charge? Even if it does, if you are having trouble, you may want to disable these two options.
Register CR22: The value of tDAL should equal the field values of tRP + tWR. From the other registers, I am getting 4 + 3 = 7. Not sure if a higher setting will cause problems, however.
Register CR28: There is only one chip select, so this field could be set to 0x00 to disable an unnecessary counter.
Register CR72: I would recommend performing ZQ calibrations during initializations. In general, Drive Strengths are weaker until they are properly calibrated.
Register CR77: This won’t lead to data corruption, but all of our testing is with DI_RD_INTLEAVE disabled. In general, we let Reads from the same port execute in the order they were received.
Register CR78: We recommend a Q_FULLNESS setting of 0x7 to allow the arbiter enough room to reorder commands.
Register CR91: This is only a performance hit. Not sure you need to add extra cycle delays here, except that R2W_SMCSDL does need to be set to a minimum of 2.
Register CR98 & CR99: WRLVL_DL_0/1 is not supported for LPDDR2. Best to leave these registers 0x00000000.
Register CR118: Bits [15:8] are reserved. They should be set to 0x00. I looked at the IP manual and there is no definition for these bits. Is this being read back this way?
Register CR132: The value for RDLAT_ADJ should match the value for RL. The reference manual is being changed to indicate RL-1 only for DDR3. The reason that one clock cycle is being subtracted is to allow enough room for Read Gate to work effectively. But there is no Read Gate for LPDDR2, so best to leave them equal.
Register CR138: This is missing from your settings. PHYDRAM_CK_EN should be set to 1 clock cycle to give pads time to turn on.
Register CR154: Recommend enabling the PAD_ZQ_HW_FOR to ZQ calibrate the processor pad (CR72 is for the LPDDR2 memory). Also, 31 cycles for sampling the comparator is probably not necessary, but shouldn’t have that much of an effect.
Register PHY02/18/34: A high value of RD_DL_SET adds unnecessary delay to completing a Read burst. We recommend a setting of 0x4. We also recommend a value of two delay elements (0x1) for WR_DB_ADJ to reduce “hunting” effects. Don’t set a value in bits [7:4]. We made this reserved for a reason: It could cause the PHY to time out. And don’t set any values for GATE_CLOSE_CFG and GATE_CFG. Read Gate is not used on LPDDR2 and should be left alone.
Register PHY03/19/35: A high value of DLL_PHASE_SET reduces the ability of the DLL to control jitter. Recommend setting to lowest value that still allows reliable DLL Lock. To my knowledge, we don’t have a customer needing more than 0x4.
Register PHY49: PHY_WRLV_DL sets the SDCLK strobe delay in relation to the CA[9:0] signals. 0x20 is a 90-degree phase shift. A value of 0x04 seems very low. Recommend something closer to 0x20. This may be the #1 cause of your problems. Bits [7:0] are reserved and should be set to 0x00.
Register PHY50: We do not recommend setting the EN_SW_HALF_CYCLE field, although it is probably not affecting anything in light of other settings.
Register PHY00/16: After things are working correctly, you might want to move these registers back to 0x00000013.
Register PHY01/17: After things are working correctly, you might want to move these registers back to 0x00000015.
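The cycle counts quoted for CR14 and CR22 can be reproduced with a short sketch. It assumes a 200 MHz DRAM clock and whole-nanosecond datasheet values; the function name is illustrative.

```python
def ns_to_cycles(t_ns: int, clock_mhz: int) -> int:
    """Round a timing in nanoseconds up to whole clock cycles.

    Integer arithmetic avoids floating-point rounding surprises:
    cycles = ceil(t_ns * clock_mhz / 1000).
    """
    return (t_ns * clock_mhz + 999) // 1000


CLOCK_MHZ = 200

# CR14: tRAS(min) = 42 ns is 8.4 periods of 5 ns, so it rounds up.
tras_min_cycles = ns_to_cycles(42, CLOCK_MHZ)

# CR22: tDAL is the sum of the tRP and tWR field values read from the settings.
trp_cycles, twr_cycles = 4, 3
tdal_cycles = trp_cycles + twr_cycles

print(tras_min_cycles, tdal_cycles)
```

The same helper can be reused for any other nanosecond-based parameter when moving between the 400 MHz and 200 MHz setups.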
Please let me know if you have any questions, or if nothing that I have brought out is making a difference for you.
PS. I should not have said anything about register CR12 earlier until I had fully vetted my comment. Sorry about any confusion.
Once again, thank you for your detailed response. We have looked over it, and we're sorry to say that we still do not have all cards running at 200 MHz.
We will try to address some of your remarks:
CR12: Setting WL=1, RL=3 did have a significant effect on our tests, but we cannot determine whether or not this was an improvement. The DRAM datasheet explicitly states WL=2, RL=4, so we have chosen to keep this setting.
CR22: The RM is not clear whether to use tRP for all banks or per-banks. We were previously using all banks, but have since changed to per-bank to comply with your remark.
CR132: Setting <RL/WL>_ADJ=<RL/WL> had significant effects on our tests. All columns are now filled, and the only problem that remains is what appears to be some addressing issues (see: Vybrid LPDDR2@200MHz, GATE_CFG and GATE_CLOSE_CFG != 0 - Pastebin.com )
PHY02/18/34: You state that GATE_CLOSE_CFG and GATE_CFG are unused for LPDDR2. Upon disabling these, we were not able to read back any correct data, and writes seemed unresponsive, only reading back what appears to be noise (see [GDB] Vybrid LPDDR2@200MHz GATE_CFG=GATE_CLOSE_CFG=0 - Pastebin.com ).
PHY03/19/35: We assume you meant DLL_PHASE_DET in this remark. By setting this field to 0x4, the DLL is unable to lock. 0x7 was previously suggested by jiri-b36968.
We have also implemented the rest of your remarks. They have not yielded any significant changes to our test results, but we have decided to keep them per your reasoning.
The current state of our DDR settings is now as follows: [ARM] Vybrid LPDDR2@200MHz DS5-debug init-script after remarks - Pastebin.com
Some interesting results here.
As for setting DLL_PHASE_DET to 0x7 just to make the DLL lock: what this tells me is that the board design probably has noisy power supplies that are causing the internal clocks to have a large amount of jitter. You may want to take a look at ripple on the bulk capacitors underneath the processor on the following supplies:
The problem with setting this field to 0x7 is that it reduces the amount of jitter that the DLL can reject. It’s a Catch-22 situation. There is too much jitter which forces the field to be set to a higher value, which then reduces the ability of the DLL to reject jitter.
Jitter could have a direct effect on address and command since the address and command signals are clocked in on both the rising and falling edges of the SDCLK signal (unlike DDR3), which makes the valid data window much smaller, and the positioning of the SDCLK signal much more critical.
As for the GATE_CLOSE_CFG and the GATE_CFG settings, I went back to the IP and it actually looks like these signals are used for LPDDR2 (unlike our other i.MX controllers, which completely take out the gate circuitry for LPDDR2). In particular, when the mobile bit is set [PHY_50, bit 13], the PHY will extend its internal GATE circuitry open signal by the amount in GATE_CLOSE_CFG. The actual Gate Close signal is then derived by counting the number of falling DQS edges, and when the PHY sees 2 edges (Burst Length = 4) then it will close the gate.
But I would like you to try an experiment: Set GATE_CLOSE_CFG to 0x3 and GATE_CFG to 0x0. In other words, set registers PHY_02 and PHY_18 to 0x00380018 and give that a try. One field may be necessary while the other is causing you trouble.
As for the WL/RL, I just wanted to make sure you are using the correct settings. They are based on the DDR-400 column of the datasheet, and not on the speed grade of the part. Here is an excerpt of the JEDEC standard that I am using as a reference. But at the same time, the extra cycles shouldn’t be causing you a problem.
Going forward, I would like to see a listing of your current register settings after you achieved this statement:
All columns are now filled, and the only problem that remains is what appears to be some addressing issues.
This will help me determine how to proceed. Can you tell me what you mean by some addressing issues?
I think the next step will be to find the best value for PHY_WRLV_DL in register PHY_49. This is the strobe adjust value for the Command/Address signals. Also, I am curious: Are you still setting a value in bits [7:0] of PHY_49? We have reserved this field because we don’t want it to adversely affect timings. Have you tried setting [7:0] to 0x00?
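When comparing candidate PHY_WRLV_DL values, it can help to think of them as phase shifts. The sketch below assumes the delay scales linearly with the setting, anchored on the statement earlier in this thread that 0x20 corresponds to 90 degrees; actual silicon may not be perfectly linear, and the function name is illustrative.

```python
QUARTER_CYCLE_SETTING = 0x20  # stated earlier: 0x20 is a 90-degree phase shift


def wrlv_dl_to_degrees(setting: int) -> float:
    """Approximate SDCLK phase shift for a PHY_WRLV_DL setting (linear assumption)."""
    return setting * 90.0 / QUARTER_CYCLE_SETTING


for setting in (0x04, 0x10, 0x20):
    print(f"PHY_WRLV_DL = {setting:#04x}: about {wrlv_dl_to_degrees(setting):.2f} degrees")
```

Under this assumption a setting of 0x04 shifts the clock by only about an eighth of the recommended quarter cycle.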