Hi Sugiyama-san,
When the Write Leveling calibration routine runs, it tells you whether it is running the HW version or the SW version. The code contains both, and the choice is made when the source code is compiled, so the same software release can be built either way. Here is an example printout from the standard test running the HW Write Leveling calibration:
Start write leveling calibration...
running Write level HW calibration
Write leveling calibration completed, update the following registers in your initialization script
MMDC_MPWLDECTRL0 ch0 (0x021b080c) = 0x0019001B
MMDC_MPWLDECTRL1 ch0 (0x021b0810) = 0x0024001D
Register MPWLGCR does the following things:
Bit 0 will initiate HW Write Leveling Calibration routine.
Bits [11:8] report whether an error was detected on any byte lane during HW Write Leveling Calibration. These fields are meaningless when running the SW Write Leveling Calibration routine, because the coarse calibration routine is not run and the MPWLHWERR register is not updated, so there is no mechanism to report an error.
Bit 1 will initiate SW Write Leveling Calibration routine.
Bits [7:4] report the prime bit sent back from the DDR3 device. These fields are meaningless when running the HW Write Leveling Calibration routine, because there is no way to determine the last delay value used in making the comparison. They may toggle high/low, but they provide no useful information.
Bit 2 sets an additional amount of delay to wait to issue the DQS pulse after Bit 1 is asserted.
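As a rough illustration of the bit layout just described, here is a minimal Python sketch that splits a raw MPWLGCR value into those fields. The dictionary keys are my own labels, not official field names, and the mapping of bit 8 to byte lane 0 (and bit 4 to byte lane 0) is an assumption; check the reference manual for your part before relying on it.

```python
def decode_mpwlgcr(value: int) -> dict:
    """Split a raw MPWLGCR value into the fields described above.

    Key names are illustrative labels, not official field names.
    Lane ordering (lowest bit of each group = lane 0) is an assumption.
    """
    return {
        "hw_start": bool(value & 0x1),              # bit 0: start HW write leveling
        "sw_start": bool((value >> 1) & 0x1),       # bit 1: start SW write leveling
        "sw_extra_delay": bool((value >> 2) & 0x1), # bit 2: extra wait before DQS pulse
        # bits [7:4]: prime bit per lane (only meaningful in SW mode)
        "sw_result": [bool((value >> (4 + lane)) & 0x1) for lane in range(4)],
        # bits [11:8]: error flag per lane (only meaningful in HW mode)
        "hw_error": [bool((value >> (8 + lane)) & 0x1) for lane in range(4)],
    }


def lanes_with_hw_error(value: int) -> list:
    """Return the lane indices whose HW error flag is set."""
    flags = decode_mpwlgcr(value)["hw_error"]
    return [lane for lane, err in enumerate(flags) if err]
```

For example, a hypothetical readback of 0x00000200 would flag only lane 1 under the lane ordering assumed here.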
The timing values that the Write Leveling Calibration routine reports in registers MPWLDECTRL0/1 are the number of 1/256 fractions of a clock cycle by which the DQS strobe needs to be delayed so that it arrives at the same time as the SDCLK edge.
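To make the 1/256 units concrete, here is a small Python sketch that converts the two register values from the printout above into delays. It assumes each 16-bit half of the register holds one byte lane, with the absolute offset in bits [6:0] and a half-cycle flag in bit 8 that adds 128/256; those bit positions are my assumption and any other bits are ignored. Which half belongs to which byte lane is not stated here, so the function simply returns (low half, high half).

```python
def lane_delay_256ths(field: int) -> int:
    """Convert one 16-bit lane field to a delay in 1/256 tCK units.

    Assumed layout: bits [6:0] = absolute offset, bit 8 = half-cycle flag.
    """
    abs_offset = field & 0x7F
    half_cycle = (field >> 8) & 0x1
    return half_cycle * 128 + abs_offset


def decode_mpwldectrl(value: int) -> tuple:
    """Return (low-half delay, high-half delay) in 1/256 tCK units."""
    low = lane_delay_256ths(value & 0xFFFF)
    high = lane_delay_256ths((value >> 16) & 0xFFFF)
    return low, high


if __name__ == "__main__":
    for name, raw in (("MPWLDECTRL0", 0x0019001B), ("MPWLDECTRL1", 0x0024001D)):
        low, high = decode_mpwldectrl(raw)
        print(f"{name}: low half {low}/256 tCK, high half {high}/256 tCK")
```

With the values from the printout, MPWLDECTRL0 decodes to 27/256 and 25/256 tCK, and MPWLDECTRL1 to 29/256 and 36/256 tCK.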
Register MPWLHWERR is rather poorly named. Yes, if an entire byte field is either all high or all low (i.e., 0xFF or 0x00), then there is an error. But the register is also used to determine which eighth of a clock cycle should be used as the base starting point for the fine calibration phase. It is only used in the HW calibration mode. Each byte represents the results of 8 different DQS-to-SDCLK timing measurements made with different delays applied to the DQS strobe. The indications are as follows:
In each bit, 0 means SDCLK is low when the DQS strobe edge arrives, and 1 means SDCLK is high when the DQS strobe edge arrives. Each bit reports the clock level seen when the DQS edge arrives with the following delay applied to the DQS strobe:
Bit 0 - 0/8 tCK delay
Bit 1 - 1/8 tCK delay
Bit 2 - 2/8 tCK delay
Bit 3 - 3/8 tCK delay
Bit 4 - 4/8 tCK delay
Bit 5 - 5/8 tCK delay
Bit 6 - 6/8 tCK delay
Bit 7 - 7/8 tCK delay
A value of 0x1F (binary 00011111) means that a low clock level was seen only when 5/8, 6/8, and 7/8 tCK delays were added.
A value of 0x3E (binary 00111110) means that a low clock level was seen only when 0/8, 6/8, and 7/8 tCK delays were added.
If bit 7 was low and bit 0 was high, then the fine calibration routine would start with a base delay of 7/8 tCK and would run through 8/8 tCK delay finding the exact 1/256 delay value at which the DQS edge arrives just before the SDCLK edge. This is the case of HW_WL2_DQ = 0x1F.
If bit 0 was low and bit 1 was high, then the fine calibration routine would start with a base delay of 0/8 tCK and would run through 1/8 tCK delay finding the exact 1/256 delay value at which the DQS edge arrives just before the SDCLK edge. This is the case of HW_WLn_DQ = 0x3E.
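The rule described above (look for a low bit followed by a high bit, treating the byte as circular) can be sketched in a few lines of Python. This is only my reading of the description, not the actual hardware logic; the function name is made up, and if a byte contained more than one low-to-high transition this sketch would simply return the first one it finds.

```python
def coarse_base_eighth(byte_result: int):
    """Return the base delay (in eighths of tCK) for fine calibration.

    Finds bit i that is low while bit (i + 1) mod 8 is high, i.e. the
    last coarse step at which DQS still arrives before the rising
    SDCLK edge. Returns None for 0x00 or 0xFF, which the text above
    describes as the error cases.
    """
    byte_result &= 0xFF
    if byte_result in (0x00, 0xFF):
        return None  # no edge seen in any eighth: error
    for i in range(8):
        this_bit = (byte_result >> i) & 1
        next_bit = (byte_result >> ((i + 1) % 8)) & 1  # bit 7 wraps to bit 0
        if this_bit == 0 and next_bit == 1:
            return i
    return None


if __name__ == "__main__":
    for value in (0x1F, 0x3E, 0x00, 0xFF):
        print(f"0x{value:02X} -> base eighth: {coarse_base_eighth(value)}")
```

For the two examples in the text, 0x1F gives a base of 7/8 tCK and 0x3E gives a base of 0/8 tCK.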
Yes, the Stress Test checks the MPWLGCR HW_ERR fields and will report in the debug printout if an error was detected.
>> It is important to note that the determination of a HW_ERR is made only during the coarse calibration part of the routine. So, if a byte field in the MPWLHWERR register equals neither 0x00 nor 0xFF, then no error will be reported. Even if the results look like 00011111, and it doesn't seem as if there was a 10 edge, the results must be read as a circle, with bit 7 wrapping back around to bit 0.
Having said all that, the DDR Stress Test does something that we do not advertise to users. The Stress Test itself looks at the values of the MPWLDECTRL0/1 fields before reporting results, and if it sees any field with a value greater than 200/256 delay (reported as half-cycle = 0x1 and ABS_OFFSET > 0x48), it will reset the Write Leveling delay for that lane to 0x000 and not report this in the log.
The DDR Stress Test does this because a delay of more than 78% of a clock cycle means that the DQS edge is already arriving within the JEDEC tolerance of 25% of the clock edge. In most such cases, DQS is arriving within 5% of tCK of the SDCLK edge, on the early side, and it does not make sense to delay the DQS strobe by almost a full clock cycle and add extra latency to each Write burst just to make the two edges align exactly. In this case, we are guilty of making a decision for the customer without telling them we are doing it, so that we don't have to provide the above explanation to every customer. They don't need to know it.
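The silent correction described above amounts to a simple threshold check. Here is a minimal Python sketch of that check, working on a delay already expressed in 1/256 tCK units; the function and constant names are hypothetical and this is not the Stress Test's actual source.

```python
HALF_CYCLE_256THS = 128                      # half-cycle flag = 0x1
RESET_THRESHOLD = HALF_CYCLE_256THS + 0x48   # 128 + 72 = 200/256 tCK


def apply_stress_test_reset(delay_256ths: int) -> int:
    """Return the delay the Stress Test would report for one lane.

    Delays strictly greater than 200/256 tCK are replaced by 0,
    mirroring the behaviour described in the text above.
    """
    return 0 if delay_256ths > RESET_THRESHOLD else delay_256ths


if __name__ == "__main__":
    print(f"threshold = {RESET_THRESHOLD}/256 = {RESET_THRESHOLD / 256:.1%} of tCK")
    for delay in (27, 200, 201, 255):
        print(f"{delay}/256 -> {apply_stress_test_reset(delay)}/256")
```

Note that a lane measured at exactly 200/256 is left alone in this sketch, because the text says "greater than".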
Now to explain your results: in the left screenshot, you have added a capacitor to the SDCLK traces, effectively slowing the SDCLK signal. (You could instead have added some delay to the SDCLK trace manually using register MPSDCTRL.) So in the first part of the calibration routine, a 0/8 delay caused the DQS strobe to arrive before the clock edge; 1/8, 2/8, 3/8, and 4/8 delays caused the DQS strobe to arrive after the rising edge of the clock; and 5/8, 6/8, and 7/8 delays caused the DQS strobe to arrive after the falling edge of the clock. Using this information, the Write Leveling routine starts with a base delay of 0/8 tCK and adds 1/256 tCK sequentially. In this case, it looks like 1/256 and 2/256 delays cause the DQS edge to arrive before the rising CLK edge, and a 3/256 delay causes the DQS edge to arrive after the rising CLK edge. (The test already knows the results of 0/256 delay and of 32/256 delay.) Eventually, in the fine-tune routine, the algorithm will walk back and find the edge by decreasing the amount of delay, and then will fine-tune itself to find the exact value.
>> Another important thing to note is that there are not 256 delay elements in a full clock cycle. The length of a delay element is fixed at ~16 picoseconds. So at 400 MHz, with tCK = 2.5 nanoseconds, there will be ~156 delay elements per clock cycle. Therefore, not every increase of 1/256 will add another delay element.
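The arithmetic behind that note can be checked with a short Python sketch. The 16 ps element length and the 2.5 ns tCK come from the text; the way a 1/256 step is rounded to a whole number of delay elements is not documented here, so the rounding down used below is purely an assumption for illustration.

```python
ELEMENT_PS = 16  # approximate length of one delay element, in picoseconds


def elements_per_cycle(tck_ps: int) -> float:
    """Number of delay elements that fit in one clock cycle."""
    return tck_ps / ELEMENT_PS


def elements_for_step(step_256ths: int, tck_ps: int = 2500) -> int:
    """Whole delay elements used for a delay of step_256ths/256 tCK.

    Rounds down; the real hardware rounding is an assumption here.
    """
    return (step_256ths * tck_ps) // (256 * ELEMENT_PS)


if __name__ == "__main__":
    print(f"elements per cycle at tCK = 2.5 ns: {elements_per_cycle(2500)}")
    for step in range(5):
        print(f"{step}/256 tCK -> {elements_for_step(step)} element(s)")
```

Under this rounding assumption, 2/256 and 3/256 both map to one delay element, which is the kind of repeated setting the note above is warning about.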
For the example on the right, the first part of the calibration routine has determined that 0/8, 1/8, 2/8, 3/8, and 4/8 delays caused the DQS edge to arrive after the rising edge of SDCLK, and that 5/8, 6/8, and 7/8 delays caused the DQS edge to arrive after the falling edge of SDCLK. Moving into the fine-tune portion of the calibration routine, the algorithm already knows that a 7/8 delay causes the DQS strobe to arrive before the SDCLK edge and that an 8/8 = 0/8 delay causes the strobe edge to arrive after the SDCLK edge. So the fine-tune routine works within these two bounds to find the best setting. In your screenshot, it looks like delays of 225/256 through 255/256 all cause the DQS strobe to arrive before the SDCLK edge. Therefore, the algorithm concludes that a setting of 256/256 = 0/256 is best, based on the results of the coarse calibration routine. Actually, the algorithm may have concluded that 255/256 is best, but our automatic correction is going to reset it to 0/256.
Hopefully this clears up all of your questions.
Cheers,
Mark