Write Leveling register WL_SW_RESx

3,842 Views
sugiyamatoshihi
Contributor V

Hi, @janspurek 

I checked the DDR timing with ddr_stress_tester_v2.60.

The result was 'Success: DDR Stress test completed!!!'. However, when I read the MMDCx_MPWLGCR register (0x021B0808), its value was 0x000000C0, so WL_SW_RES3 and WL_SW_RES2 were set to 1.

Questions

1. Does ddr_stress_tester_v2.60 perform only SW Write Leveling Calibration, and not HW calibration?

2. What does this register value mean?

3. What timing value is specified?

Best Regards,

Sugiyama

1 Solution
2,746 Views
TheAdmiral
NXP Employee

Hi Sugiyama-san,

When the Write Leveling calibration routine runs, it tells you whether you are running the HW version or the SW version. The code has both, and the determination is made when the source code is compiled. The same SW version could be compiled either way. This is an example printout of the standard test running Write Level HW calibration:

Start write leveling calibration...
running Write level HW calibration
Write leveling calibration completed, update the following registers in your initialization script
    MMDC_MPWLDECTRL0 ch0 (0x021b080c) = 0x0019001B
    MMDC_MPWLDECTRL1 ch0 (0x021b0810) = 0x0024001D

Register MPWLGCR does the following things:

Bit 0 will initiate HW Write Leveling Calibration routine.

Bits [11:8] will report if an error was detected on any Byte Lane during HW Write Leveling Calibration. These fields are meaningless when running the SW Write Leveling Calibration routine, because the coarse calibration routine is not conducted and the MPWLHWERR register is not updated, so there is no mechanism to report an error.

Bit 1 will initiate SW Write Leveling Calibration routine.

Bits [7:4] will report the results of the prime bit being sent back from the DDR3 device. These fields are meaningless when running the HW Write Leveling Calibration routine, because there is no way to determine the last delay value used in making the comparison. They may toggle high/low, but they provide no useful information.

Bit 2 sets an additional amount of delay to wait to issue the DQS pulse after Bit 1 is asserted.
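
As a rough illustration of the bit layout described above, the following Python sketch splits an MPWLGCR value into its fields. The function name and dictionary keys are made up for this example, and the assumption that byte lane 0 sits in the lowest bit of each 4-bit group is inferred from the original question (0x000000C0 was read as WL_SW_RES3 and WL_SW_RES2 being set).

```python
def decode_mpwlgcr(value):
    """Split an MPWLGCR value into the fields described in this post.

    Key names are illustrative only; lane 0 is assumed to be the lowest
    bit of each 4-bit group.
    """
    return {
        "hw_wl_en": value & 0x1,             # bit 0: start HW write leveling
        "sw_wl_en": (value >> 1) & 0x1,      # bit 1: start SW write leveling
        "sw_wl_cnt_en": (value >> 2) & 0x1,  # bit 2: extra wait before the DQS pulse
        # bits [7:4]: prime-bit result per byte lane (SW mode only)
        "wl_sw_res": [(value >> (4 + lane)) & 0x1 for lane in range(4)],
        # bits [11:8]: error flag per byte lane (HW mode only)
        "wl_hw_err": [(value >> (8 + lane)) & 0x1 for lane in range(4)],
    }


fields = decode_mpwlgcr(0x000000C0)
print(fields["wl_sw_res"])  # [0, 0, 1, 1] -> lanes 2 and 3 set
print(fields["wl_hw_err"])  # [0, 0, 0, 0] -> no HW error flagged
```

For the value in the question, only the SW result bits for lanes 2 and 3 are set, and according to the explanation above those bits carry no useful information after a HW calibration run.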

The timing value that the Write Leveling Calibration routine writes to registers MPWLDECTRL0/1 is the number of 1/256 fractions of a clock cycle by which the DQS strobe must be delayed so that it arrives at the same time as the SDCLK edge.
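
To make the units concrete, here is a small Python sketch that converts the example printout values into 1/256 tCK delays. It assumes each register holds two 16-bit lane fields (lower lane in the low half), with the absolute offset in bits [6:0] and a half-cycle flag in bit 8; that layout is inferred from the "half-cycle = 0x1 and ABS_OFFSET > 0x48" and 0x148 remarks in this thread, and any whole-cycle delay bits are ignored. The function names are hypothetical.

```python
def decode_wl_delay(field):
    """Return the delay, in 1/256 tCK units, encoded in one 16-bit lane field."""
    abs_offset = field & 0x7F        # assumed bits [6:0]
    half_cycle = (field >> 8) & 0x1  # assumed bit 8: adds half a clock (128/256)
    return half_cycle * 128 + abs_offset


def decode_mpwldectrl(value):
    """Split a 32-bit MPWLDECTRLn value into two per-lane delays."""
    return decode_wl_delay(value & 0xFFFF), decode_wl_delay(value >> 16)


print(decode_mpwldectrl(0x0019001B))  # (27, 25)
print(decode_mpwldectrl(0x0024001D))  # (29, 36)
print(decode_wl_delay(0x148))         # 200
```

Under these assumptions, the printout above corresponds to delays of roughly 10% to 14% of a clock cycle on the four byte lanes.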

Register MPWLHWERR is rather poorly named. Yes, if an entire byte field is either all high or all low (i.e., 0xFF or 0x00), then there is an error. But the register is also used to determine which eighth of a clock cycle should be used as the base starting point for the fine calibration phase. It is only used in the HW calibration mode. Each byte represents the results of 8 different DQS to SDCLK timing measurements made with different delays applied to the DQS strobe. The indications are as follows:

- 0 means SDCLK is low when DQS strobe edge arrives, 1 means that SDCLK is high when DQS strobe edge arrives.

Bit 0 - 0/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 1 - 1/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 2 - 2/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 3 - 3/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 4 - 4/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 5 - 5/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 6 - 6/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

Bit 7 - 7/8 tCK time delay applied to DQS strobe. Reports status of clock level when DQS edge arrives.

A value of 0x1F (00011111b) means that a low clock level was only seen when 5/8, 6/8, 7/8 tCK delays were added.

A value of 0x3E (00111110b) means that a low clock level was only seen when 0/8, 6/8, 7/8 tCK delays were added.

If bit 7 was low and bit 0 was high, then the fine calibration routine would start with a base delay of 7/8 tCK and would run through 8/8 tCK delay finding the exact 1/256 delay value at which the DQS edge arrives just before the SDCLK edge. This is the case of HW_WL2_DQ = 0x1F.

If bit 0 was low and bit 1 was high, then the fine calibration routine would start with a base delay of 0/8 tCK and would run through 1/8 tCK delay finding the exact 1/256 delay value at which the DQS edge arrives just before the SDCLK edge. This is the case of HW_WLn_DQ = 0x3E.
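
The per-bit reading above can be checked with a short Python sketch. It takes the MPWLHWERR value reported later in this thread (0x3E1F3E3E) and lists, for each byte lane, the coarse delays at which SDCLK was sampled low. The function name is made up, and the mapping of lane n to bits [8n+7:8n] is an assumption that matches the poster's remark that WL2_DQ was 0x1F.

```python
def coarse_levels(mpwlhwerr):
    """For each byte lane, list the n/8 tCK delays at which SDCLK was sampled low."""
    result = {}
    for lane in range(4):
        byte = (mpwlhwerr >> (8 * lane)) & 0xFF  # assumed: lane n in bits [8n+7:8n]
        # Bit n of the byte is the clock level seen with an n/8 tCK delay.
        result[lane] = [n for n in range(8) if ((byte >> n) & 1) == 0]
    return result


for lane, low_delays in coarse_levels(0x3E1F3E3E).items():
    print(lane, low_delays)
# 0 [0, 6, 7]
# 1 [0, 6, 7]
# 2 [5, 6, 7]
# 3 [0, 6, 7]
```

Lane 2 is the only one that still sees a high clock at 0/8 delay, which is why its fine calibration starts from 7/8 tCK instead of 0/8 tCK.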

Yes, the Stress Test checks the MPWLGCR HW_ERR fields and will report in the debug printout if an error was detected.

>> It is important to note that the determination of a HW_ERR is only made during the coarse calibration of the routine. So, if a byte field in the MPWLHWERR register does not equal either 0x00 or 0xFF, then no error will be reported. Even if the results look like 00011111, and it doesn't seem as if there was a '10' edge, the results must be considered like a circle, with bit 7 wrapping back around to bit 0.

Having said all that, the DDR Stress Test does something that we do not advertise to the users. The Stress Test itself looks at the values of the MPWLDECTRL0/1 fields before reporting results, and if it sees any field with a value greater than 200/256 delay (reported as half-cycle = 0x1 and ABS_OFFSET > 0x48), the DDR Stress Test will reset the Write Leveling delay for this lane to 0x000 and not report it in the log.

The reason the DDR Stress Test does this is that a delay of more than 78% of a clock cycle means that the DQS edge is arriving within the JEDEC tolerance of 25% of the clock edge. In most cases, DQS is arriving < 5% tCK of the SDCLK edge in the early case, and it does not make sense to delay the DQS strobe almost a full clock cycle and add extra latency to each Write burst just to make the two edges align exactly. In this case, we are guilty of making a decision for the customer without telling them we are doing it so that we don't have to provide the above explanation to every customer. They don't need to know it.
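
The reset rule described above could be sketched as follows. This is only an illustration of the stated threshold, not the Stress Test's actual source; the function and constant names are hypothetical, and the field layout (half-cycle flag in bit 8, absolute offset in bits [6:0]) is inferred from the "half-cycle = 0x1 and ABS_OFFSET > 0x48" wording.

```python
RESET_THRESHOLD = 200  # in 1/256 tCK: half-cycle (128) + ABS_OFFSET 0x48 (72)


def delay_256ths(field):
    """Delay in 1/256 tCK encoded by one lane field (assumed layout)."""
    half_cycle = (field >> 8) & 0x1
    abs_offset = field & 0x7F
    return half_cycle * 128 + abs_offset


def apply_stress_test_reset(field):
    """Zero a lane's write-leveling delay if it exceeds 200/256 tCK."""
    return 0x000 if delay_256ths(field) > RESET_THRESHOLD else field


print(hex(apply_stress_test_reset(0x01B)))  # 0x1b  (27/256, kept)
print(hex(apply_stress_test_reset(0x148)))  # 0x148 (exactly 200/256, kept)
print(hex(apply_stress_test_reset(0x17F)))  # 0x0   (255/256, reset)
```

The comparison is strict, following the "greater than 200/256" wording, so a field of exactly 0x148 is left unchanged in this sketch.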

Now to explain your results: In the left screen shot, you have added a capacitor to the SDCLK traces, effectively slowing the SDCLK signal. (You could have manually added some delay to the SDCLK trace using register MPSDCTRL.) So in the first part of the calibration routine, 0/8 delay caused the DQS strobe to arrive before the clock edge, 1/8, 2/8, 3/8, and 4/8 delay caused the DQS strobe to arrive after the rising edge of the clock, and 5/8, 6/8, and 7/8 delay caused the DQS strobe to arrive after the falling edge of the clock. Using this information, the Write Leveling routine starts with a base delay of 0/8 tCK and adds 1/256 tCK sequentially. In this case, it looks like 1/256 and 2/256 delays cause the DQS edge to arrive before the rising CLK edge and a 3/256 delay causes the DQS edge to arrive after the rising CLK edge. (The test already knows the results of 0/256 delay and of 32/256 delay.) Eventually in the fine tune routine, the algorithm will walk back and find the edge by decreasing the amount of delay, and then will fine tune itself to find the exact value.

>> Another important thing to note is that there are not 256 delay elements in a full clock cycle. The length of a delay element is fixed at ~16 picoseconds. So at 400 MHz with tCK = 2.5 nanoseconds, there will be ~156 delay elements. Therefore, not every increase of 1/256 will add another delay element.
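
The arithmetic behind that note can be sketched in Python. The 16 ps element length and 2.5 ns tCK come from the post; how the hardware actually maps a 1/256 step onto delay elements is not stated, so the floor mapping below is purely an assumption for illustration.

```python
TCK_PS = 2500    # tCK at 400 MHz, in picoseconds
ELEMENT_PS = 16  # approximate length of one delay element, in picoseconds


def elements_for(step):
    """Whole delay elements covering step/256 tCK (floor mapping is an assumption)."""
    return (step * TCK_PS) // (256 * ELEMENT_PS)


print(TCK_PS // ELEMENT_PS)  # 156 elements fit in one clock cycle

used = [elements_for(step) for step in range(256)]
no_change = sum(1 for a, b in zip(used, used[1:]) if a == b)
print(no_change)  # 100 of the 255 single-step increases add no element
```

With these numbers, roughly four out of every ten 1/256 increments would leave the physical delay unchanged, which is consistent with neighbouring settings sometimes producing identical results.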

For the example on the right, the first part of the calibration routine has determined that 0/8, 1/8, 2/8, 3/8, and 4/8 delays have caused the DQS edge to arrive after the rising edge of the SDCLK. Then 5/8, 6/8, and 7/8 delays have caused the DQS edge to arrive after the falling edge of the SDCLK. Moving into the Fine Tune portion of the Calibration routine, the algorithm already knows that 7/8 delay causes the DQS strobe to arrive before the SDCLK edge and the 8/8 = 0/8 delay causes the strobe edge to arrive after the SDCLK edge. So the fine tune routine works within these two bounds to find the best setting. In your screen shot, it looks like 225/256 delay through 255/256 delay all cause the DQS strobe to arrive before the SDCLK. Therefore, the algorithm concludes that a setting of 256/256 = 0/256 is best, based on the results of the coarse calibration routine. Actually, the algorithm may have concluded that 255/256 is best, but our automatic correction is going to reset it to 0/256.

Hopefully this clears up all of your questions.

Cheers,

Mark

12 Replies
2,746 Views
jan_spurek
NXP Employee

Hello Sugiyama,

1. DDR Stress Tester performs HW Write leveling Calibration.

2. WL_SW_RES is only a status flag that indicates the prime bit's setting - when set, meaning that we read back a "1" for the prime bit from the DDR chip. It does not indicate an error.

3. Could you please be more specific?

Best Regards,

Jan

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Jan,

Thank you for your answer.

1, 2. WL_SW_RES is described as the write-leveling software result, so I thought the calibration tool uses SW write leveling. Does this mean WL_SW_RES is set even though HW write leveling runs?

3. At which DQS delay does the rising edge capture CLK: 1/8, 2/8, ... 7/8?

4. I looked at the register in section 44.12.61, MMDC PHY Write Leveling HW Error Register (MMDCx_MPWLHWERR).

The value of this register is 0x3E1F3E3E. WL2_DQ = 0x1F is different from the others. How should this be interpreted?

5. Does ddr_stress_tester check WL_HW_ERRx and return an error if WL_HW_ERRx is set?

Best Regards,

Sugiyama

2,746 Views
jan_spurek
NXP Employee

Hello Sugiyama,

Does this mean WL_SW_RES is set even though HW write leveling runs?

I cannot confirm this. I would need to check with the design team to be sure. But it seems this is the case.

At which DQS delay does the rising edge capture CLK: 1/8, 2/8, ... 7/8?

In the first phase of the write leveling calibration, the MMDC performs calibration for all of these delays to roughly identify the interval where the actual delay is located. The results are stored in MMDC_MPWLHWERR, and the MMDC looks there for a transition from 0 to 1. If no transition is found, an error is indicated in MPWLGCR[HW_WL_ERR#] and the calibration ends with failure.

This should also answer question 4. Value 1F for Byte 2 has no 0 to 1 transition. Therefore the Stress Tester will fail with this result.

Best Regards,

Jan

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Jan,

Regarding question 4, the stress tester did not fail, and calibration completed successfully. That is strange.

I think the HW calibration runs automatically, but judging from the waveform, it may stop after only the first 1/8 delay detection.

OK_NG_compare.jpg

The left waveform is the success case. The right waveform is the failure case. There seems to be no fine tuning in the right case.

However, both were captured on the same line; in the left waveform, a high-capacitance probe was simply touching the clock line.

How should this case be interpreted?

Best Regards,

Sugiyama

2,746 Views
jan_spurek
NXP Employee

Hello Sugiyama,

Sorry for the delay, I was on vacation.

TheAdmiral, could you please share your analysis here?

Best Regards,

Jan

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Mark,

Thank you for your clear explanation. I understood most of it.

I attached a document summarizing my understanding. Do you think this document is correct?

Also, I would like to confirm your statements and the delay value consideration.

>> It is important to note that the determination of a HW_ERR is only made during the coarse calibration of the routine. So, if a byte field in the MPWLHWERR register does not equal either 0x00 or 0xFF, then no error will be reported. Even if the results look like 00011111, and it doesn't seem as if there was a '10' edge, the results must be considered like a circle, with bit 7 wrapping back around to bit 0.

<Question> I think it only outputs 8 DQS edges for coarse calibration, and fine calibration starts immediately after coarse calibration. I don't think the wrap case appeared. Is there any case where it wraps back?

You mentioned: "Actually, the algorithm may have concluded that 255/256 is best, but our automatic correction is going to reset it to 0/256."

I found a community comment at https://community.nxp.com/docs/DOC-105652. It says that if the returned delay value is greater than ¾ of a clock cycle, it will "zero out" the delay value.

<Question> Does this mean that if the delay value is over ¾ clock (0x140), it is changed to 0x000 because of the JEDEC tolerance?

Best Regards,

Sugiyama

2,746 Views
TheAdmiral
NXP Employee

Hi Sugiyama-san,

Your document looks to be correct.

<Question> I think it only outputs 8 DQS edges for coarse calibration, and fine calibration starts immediately after coarse calibration. I don't think the wrap case appeared. Is there any case where it wraps back?

Yes, the fine calibration portion begins immediately after the coarse calibration portion. My comment was mostly about how the fine calibration portion determines the starting point for beginning its search. In the following cases, the starting point is easy to determine. It is simply the coarse delay value used for the stage with the left most '0' result that happens after a '1' result:

00011110b  >> Fine calibration run starts with 0/8 tCK delay.

00111100b  >> Fine calibration run starts with 1/8 tCK delay.

01111000b  >> Fine calibration run starts with 2/8 tCK delay.

11110000b  >> Fine calibration run starts with 3/8 tCK delay.

11100001b  >> Fine calibration run starts with 4/8 tCK delay.

11000011b  >> Fine calibration run starts with 5/8 tCK delay.

10000111b  >> Fine calibration run starts with 6/8 tCK delay.

If the results reported back look like 00001111b, then the statement above causes confusion because people will say there is no '0' that follows a '1'. But the reality is that the left most '0' is the correct choice, for a delay of 7/8 tCK. That is because if another 1/8 delay were added, 8/8 tCK = 0/8 tCK and the result would be '1'. Then it would look like the left most '0' did follow a '1' bit. Maybe it helps to show what the results would look like if I applied 0/8 - 23/8 tCK delay:

00001111_00001111_00001111

Then you should be able to see that both a 7/8 tCK and a 15/8 tCK delay would be the correct starting point. For portable applications, it is always correct to choose the smallest value.
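
The selection rule, including the wrap from bit 7 back to bit 0, can be written as a few lines of Python. This is a sketch of the rule as described in this post, not the controller's actual logic, and the function name is made up. It looks for a delay whose sample is '0' while the next delay's sample is '1', treating the byte as circular, and returns the smallest such delay.

```python
def coarse_start(byte):
    """Return n such that fine calibration starts at n/8 tCK, or None if no edge exists."""
    for n in range(8):
        this_bit = (byte >> n) & 1
        next_bit = (byte >> ((n + 1) % 8)) & 1  # bit 7 wraps around to bit 0
        if this_bit == 0 and next_bit == 1:
            return n                            # smallest qualifying delay
    return None                                 # 0x00 or 0xFF: error case


for pattern in (0b00011110, 0b11110000, 0b10000111, 0b00001111):
    print(f"{pattern:08b}b -> {coarse_start(pattern)}/8 tCK")
# 00011110b -> 0/8 tCK
# 11110000b -> 3/8 tCK
# 10000111b -> 6/8 tCK
# 00001111b -> 7/8 tCK

print(coarse_start(0xFF), coarse_start(0x00))  # None None
```

The 0xFF and 0x00 inputs return None, which corresponds to the situation in which the controller sets the HW_ERR flag because no starting point can be found.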

One other note about immediately starting the fine calibration routine: If the fine calibration routine cannot find a starting point (because the result values were either 0xFF or 0x00), then the HW_ERR flag will be set. It's not that the controller will do this first. It happens because the controller is trying to do a fine calibration, but cannot find a correct starting point, so it just gives an error flag instead.

<Question> Does this mean that if the delay value is over ¾ clock (0x140), it is changed to 0x000 because of the JEDEC tolerance?

Yes, that is what it means. The actual set point is 0x148 just to make sure we are <25% away.

In practice, very few customers will have designs where SDCLK arrives >5% tCK ahead of the DQS strobe.

Cheers,

Mark

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Mark,

Thank you for the detailed explanation. Now I understand.

I would like to confirm one thing. Is the 0x8 difference between the set point 0x148 and 0x140 a margin value?

Is this margin intended as setup time from SDCLK to the DQS rising edge?

Best Regards,

Sugiyama

2,746 Views
TheAdmiral
NXP Employee

Hi Sugiyama,

First, unless a customer makes gross errors in laying out the DDR traces, no one is going to have a board in which the DQS strobe edge arrives 25% of tCK after the SDCLK edge. That would require making the DQS trace over 2.5 inches longer than the SDCLK trace.

But, if we said that we were going to reset a delay of exactly 25% to 0%, we would have customers complain for months that this, that or the other thing could make it 26% and cause a JEDEC violation (even though a JEDEC violation would have no practical effect on memory operation, i.e., no memory corruption).

So, just to keep customers from complaining and wasting our time, we picked a number that was a little less than 25%. We are actually resetting any value that is greater than 200/256 (0x148), mostly because 200 was a nice round number.

Picking the number 200 had nothing to do with considerations for setup time for SDCLK to DQS rising edge. There is no setup time consideration for this parameter. It is simply edge to edge, to try to keep the DQS byte lanes roughly aligned with the SDCLK, because the SDCLK strobe is the clock signal that runs the inner logic of the DDR devices.

Cheers,

Mark

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Mark,

Thank you for the detailed explanation.

I understand clearly now.

Best Regards,

Sugiyama

2,746 Views
sugiyamatoshihi
Contributor V

Hi, Jan,

I would like to ask more detailed questions about HW Write Leveling Calibration, so I will open a Technical Support case, because the design team might need to be involved.

Do you agree?

Best Regards,

Sugiyama
