Hi
I have a newly designed board based on the i.MX6DL and largely following the SABRE design. The main difference is that it has only RAM, eMMC and an RGMII interface to communicate with another board. The voltage is fixed at 1.125 V for VDDARM and VDDSOC. The RAM is 4 x MT41K256M16HA-125 chips powered from 1.3 V (one sample is running at 1.5 V). Total memory is 2 GB on a 64-bit bus. The DDR3 layout and routing were copied in full from the SABRE board.
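As a quick sanity check on the memory size and bus width, here is a minimal sketch that derives both from the part number. It assumes only that "256M16" means 256 Mi locations of 16 bits each; the variable names are illustrative.

```python
# Capacity check for 4 x MT41K256M16 (256 Mi locations x 16 bits each).
CHIPS = 4
WORDS_PER_CHIP = 256 * 1024 * 1024   # "256M" in the part number
BITS_PER_WORD = 16                   # "M16" = x16 data width

bits_per_chip = WORDS_PER_CHIP * BITS_PER_WORD
total_bytes = CHIPS * bits_per_chip // 8
bus_width_bits = CHIPS * BITS_PER_WORD

print(f"per chip : {bits_per_chip // 2**30} Gbit")
print(f"total    : {total_bytes // 2**30} GiB")
print(f"bus width: {bus_width_bits} bits")
```

Four x16 devices side by side fill the 64-bit bus, and 4 x 4 Gbit gives 2 GiB in total.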
Now I have the following problem.
The board is identified correctly by DDR Stress Tool v2.52, but calibration at 400 MHz fails with:
Starting DQS gating calibration
. . . . . . . . . . . . . . ERROR FOUND, we can't get suitable value !!!!
dram test fails for all values.
Error: failed during ddr calibration
When I read back values from RAM, I see the following data:
0x10000000: 0x00000000 0x00000000 0xFFFFFFFF 0xFFFFFFFF
0x10000010: 0x01010101 0x01010101 0xFEFEFEFE 0xFEFEFEFE
0x10000020: 0x02020202 0x02020202 0xFDFDFDFD 0xFDFDFDFD
0x10000030: 0x03030303 0x03030303 0xFCFCFCFC 0xFCFCFCFC
0x10000040: 0xFF00FF12 0x40FFFF90 0xFFBBCCFF 0x40FFFF90
0x10000050: 0xFFBBFFFF 0xFFFFFF00 0xFFB3FFFF 0xFFFFFFFF
0x10000060: 0xFFB3FFFF 0xFFFFFFFF 0xFFA2FFFF 0xFFFFFFFF
0x10000070: 0xFF20FFFF 0xFFFFFFFF 0xFF20FFFF 0xFFFFFFFF
0x10000080: 0xFFFFCC12 0xFFFFFFFF 0xFF20FFFF 0xFFFFFFFB
0x10000090: 0xFF00FFF3 0xFFFFFFFB 0xFF00FD73 0x79FBFFFA
0x100000A0: 0xFF00ED00 0x49FBFFF2 0xFFFFCCFF 0x49FBFFD2
0x100000B0: 0xFAFFCCFF 0x49FFFFD2 0xBAFBCCFF 0x48FFFF90
0x100000C0: 0xFF00FF12 0x40FFFF90 0xFFBBCCFF 0x40FFFF90
0x100000D0: 0xFFBBFFFF 0xFFFFFF00 0xFFB3FFFF 0xFFFFFFFF
0x100000E0: 0xFFFFFFFF 0xFFFFFF90 0xFFFFFFFF 0xFFFFFF90
0x100000F0: 0xFFFFFFFF 0xFFFFFFFF 0xFFFFFFFF 0xFFFFFFFF
I don't know what should be read back, but I presume the problem starts at 0x10000040 and that calibration fails because it doesn't see the expected data.
When I start the memory stress test at 400 MHz, it usually fails because a burst of 8 x 32-bit words has not been written. Memory around the failed location usually looks like this:
0x10000160: 0x10000160 0x10000164 0x10000168 0x1000016C
0x10000170: 0x10000170 0x10000174 0x10000178 0x1000017C
0x10000180: 0xBA2000F3 0x49FFFF90 0xBA20FF73 0x48FBFF90
0x10000190: 0x0000FFFF 0x48FFFF00 0xFFFFFFFF 0x40FBFFFF
0x100001A0: 0x100001A0 0x100001A4 0x100001A8 0x100001AC
0x100001B0: 0x100001B0 0x100001B4 0x100001B8 0x100001BC
0x100001C0: 0x100001C0 0x100001C4 0x100001C8 0x100001CC
0x100001D0: 0x100001D0 0x100001D4 0x100001D8 0x100001DC
0x100001E0: 0x100001E0 0x100001E4 0x100001E8 0x100001EC
0x100001F0: 0x100001F0 0x100001F4 0x100001F8 0x100001FC
0x10000200: 0x10000200 0x10000204 0x10000208 0x1000020C
0x10000210: 0x10000210 0x10000214 0x10000218 0x1000021C
0x10000220: 0x10000220 0x10000224 0x10000228 0x1000022C
0x10000230: 0x10000230 0x10000234 0x10000238 0x1000023C
0x10000240: 0x10000240 0x10000244 0x10000248 0x1000024C
0x10000250: 0x10000250 0x10000254 0x10000258 0x1000025C
The address of the first failure is different every time. For example, here is another run, where blocks of 8 x 32-bit words failed sequentially:
0x10000120: 0x10000120 0x10000124 0x10000128 0x1000012C
0x10000130: 0x10000130 0x10000134 0x10000138 0x1000013C
0x10000140: 0xFFFFFFFF 0x84FFFFF2 0xFF20FFFF 0x40FFFFF2
0x10000150: 0xFE20FFFF 0x94FBFFD2 0xFA20FFFF 0xFFFBFF90
0x10000160: 0xBA00FFF3 0xFF00FF90 0xBA00FFF3 0x49FBFF90
0x10000170: 0xFFFFFDFF 0x49FFFF90 0xFFFFFDFF 0x49FFFF90
0x10000180: 0x10000180 0x10000184 0x10000188 0x1000018C
0x10000190: 0x10000190 0x10000194 0x10000198 0x1000019C
0x100001A0: 0x100001A0 0x100001A4 0x100001A8 0x100001AC
0x100001B0: 0x100001B0 0x100001B4 0x100001B8 0x100001BC
0x100001C0: 0x100001C0 0x100001C4 0x100001C8 0x100001CC
0x100001D0: 0x100001D0 0x100001D4 0x100001D8 0x100001DC
0x100001E0: 0x100001E0 0x100001E4 0x100001E8 0x100001EC
0x100001F0: 0x100001F0 0x100001F4 0x100001F8 0x100001FC
0x10000200: 0x10000200 0x10000204 0x10000208 0x1000020C
0x10000210: 0x10000210 0x10000214 0x10000218 0x1000021C
If I write to the faulty locations with Stress Tool v2.52, the write always passes. Also, if I start the stress test again, it successfully overwrites what was not written before and fails at a different location, a few hundred words after the first fault.
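To compare runs without eyeballing the dumps, here is a minimal Python sketch that scans a dump and reports each run of corrupted words. It assumes, as the dumps above suggest, that the stress test writes each 32-bit word with its own address; the function name `corrupted_runs` is made up for illustration and is not part of the stress tool.

```python
def corrupted_runs(dump: str):
    """Return (start_address, word_count) for each run of consecutive bad words.

    Assumption: a good word holds its own address (pattern inferred from the dumps).
    """
    runs = []
    run_start, run_len, prev_addr = None, 0, None
    for line in dump.strip().splitlines():
        base_text, words_text = line.split(":")
        base = int(base_text, 16)
        for i, word in enumerate(words_text.split()):
            addr = base + 4 * i
            bad = int(word, 16) != addr
            contiguous = prev_addr is not None and addr == prev_addr + 4
            if bad and run_start is not None and contiguous:
                run_len += 1                      # extend the current run
            else:
                if run_start is not None:         # close the run in progress
                    runs.append((run_start, run_len))
                    run_start, run_len = None, 0
                if bad:                           # start a new run
                    run_start, run_len = addr, 1
            prev_addr = addr
    if run_start is not None:
        runs.append((run_start, run_len))
    return runs


SAMPLE = """
0x10000170: 0x10000170 0x10000174 0x10000178 0x1000017C
0x10000180: 0xBA2000F3 0x49FFFF90 0xBA20FF73 0x48FBFF90
0x10000190: 0x0000FFFF 0x48FFFF00 0xFFFFFFFF 0x40FBFFFF
0x100001A0: 0x100001A0 0x100001A4 0x100001A8 0x100001AC
"""

for start, count in corrupted_runs(SAMPLE):
    print(f"0x{start:08X}: {count} words ({count * 4} bytes)")
```

For the excerpt above this reports a single run of 8 words (32 bytes) starting at 0x10000180.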
My first thought was a signal integrity issue: either something critical was not copied from the reference design, or there is excessive noise on VREF.
DDR3 can also work with 4-word bursts, so maybe during the test it switches to 4-word mode, and 4 x 64 bits would give me 8 x 32 bits. To verify this, I reduced the bus width to 32 bits and got exactly the same effect.
These are the memory dumps for 32-bit mode:
0x100000C0: 0x100000C0 0x100000C4 0x100000C8 0x100000CC
0x100000D0: 0x100000D0 0x100000D4 0x100000D8 0x100000DC
0x100000E0: 0xFF00FF00 0xFE6BFFFF 0xBAB3FFFF 0xBAA0FFFF
0x100000F0: 0x00A0FFF3 0xFF20FF73 0xFF00FFFF 0xFF00FDFF
0x10000100: 0x10000100 0x10000104 0x10000108 0x1000010C
0x10000110: 0x10000110 0x10000114 0x10000118 0x1000011C
0x10000120: 0x10000120 0x10000124 0x10000128 0x1000012C
0x10000130: 0x10000130 0x10000134 0x10000138 0x1000013C
0x10000140: 0x10000140 0x10000144 0x10000148 0x1000014C
0x10000150: 0x10000150 0x10000154 0x10000158 0x1000015C
0x10000160: 0x10000160 0x10000164 0x10000168 0x1000016C
0x10000170: 0x10000170 0x10000174 0x10000178 0x1000017C
0x10000180: 0x10000180 0x10000184 0x10000188 0x1000018C
0x10000190: 0x10000190 0x10000194 0x10000198 0x1000019C
0x100001A0: 0x100001A0 0x100001A4 0x100001A8 0x100001AC
0x100001B0: 0x100001B0 0x100001B4 0x100001B8 0x100001BC
0x10000240: 0x10000240 0x10000244 0x10000248 0x1000024C
0x10000250: 0x10000250 0x10000254 0x10000258 0x1000025C
0x10000260: 0xFF00FF12 0xFFB3FFFF 0xFFA2FFFF 0xFEA0FFFF
0x10000270: 0xBAA0FFFF 0xBA20FFFF 0x0000FFFF 0xFF00FDFF
0x10000280: 0x10000280 0x10000284 0x10000288 0x1000028C
0x10000290: 0x10000290 0x10000294 0x10000298 0x1000029C
0x100002A0: 0x100002A0 0x100002A4 0x100002A8 0x100002AC
0x100002B0: 0x100002B0 0x100002B4 0x100002B8 0x100002BC
0x100002C0: 0x100002C0 0x100002C4 0x100002C8 0x100002CC
0x100002D0: 0x100002D0 0x100002D4 0x100002D8 0x100002DC
0x100002E0: 0x100002E0 0x100002E4 0x100002E8 0x100002EC
0x100002F0: 0x100002F0 0x100002F4 0x100002F8 0x100002FC
0x10000300: 0x10000300 0x10000304 0x10000308 0x1000030C
0x10000310: 0x10000310 0x10000314 0x10000318 0x1000031C
0x10000320: 0x10000320 0x10000324 0x10000328 0x1000032C
0x10000330: 0x10000330 0x10000334 0x10000338 0x1000033C
I see exactly the same 8-word corruption as before, and in my experience signal integrity problems are usually not tied so precisely to bus width. Then I reduced the bus to 16 bits, and I see a very similar 8-word corruption even in 16-bit mode:
0x10000100: 0x10000100 0x10000104 0x10000108 0x1000010C
0x10000110: 0x10000110 0x10000114 0x10000118 0x1000011C
0x10000120: 0xFF73FFFF 0xFF73FF00 0xFF73FF00 0xCC73FF00
0x10000130: 0xCCFF0000 0xCCFF00FF 0xCCFF00FF 0xFFFFFFFF
0x10000140: 0x10000140 0x10000144 0x10000148 0x1000014C
0x10000150: 0x10000150 0x10000154 0x10000158 0x1000015C
0x10000160: 0x10000160 0x10000164 0x10000168 0x1000016C
0x10000170: 0x10000170 0x10000174 0x10000178 0x1000017C
0x10000180: 0x10000180 0x10000184 0x10000188 0x1000018C
0x10000190: 0x10000190 0x10000194 0x10000198 0x1000019C
0x100001A0: 0x100001A0 0x100001A4 0x100001A8 0x100001AC
0x100001B0: 0x100001B0 0x100001B4 0x100001B8 0x100001BC
0x100001C0: 0x100001C0 0x100001C4 0x100001C8 0x100001CC
0x100001D0: 0x100001D0 0x100001D4 0x100001D8 0x100001DC
0x100001E0: 0x100001E0 0x100001E4 0x100001E8 0x100001EC
0x100001F0: 0x100001F0 0x100001F4 0x100001F8 0x100001FC
I am not an expert in DDR3, but from what I see this is something more than signal integrity. If something were wrong with the data lines, I would expect data corruption that is not aligned to bursts of 8 x 32-bit words. If something were wrong with the address lines, I would expect the amount of damaged data to grow as I decrease the bus width. The same applies to the control signals, which shouldn't depend on bus width, so the size of the corrupted area shouldn't depend on bus width either.
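The burst arithmetic behind this reasoning can be tabulated with a short sketch. It assumes the two DDR3 burst modes, BL8 and BC4 (burst chop), and takes the corrupted block as 8 x 32-bit words, as in the dumps; the helper name is illustrative.

```python
BURST_LENGTHS = {"BL8": 8, "BC4": 4}   # DDR3 burst length 8 and burst chop 4
CORRUPTED_BYTES = 8 * 4                # observed: 8 consecutive 32-bit words


def bytes_per_burst(bus_width_bits: int, beats: int) -> int:
    """Bytes moved by one burst on a bus of the given width."""
    return bus_width_bits // 8 * beats


for width in (64, 32, 16):
    for name, beats in BURST_LENGTHS.items():
        size = bytes_per_burst(width, beats)
        print(f"{width:2d}-bit bus, {name}: {size:2d} bytes per burst, "
              f"corrupted block = {CORRUPTED_BYTES / size:g} burst(s)")
```

The size of a burst in bytes halves each time the bus width halves, while the corrupted block in the dumps stays at 32 bytes, so it corresponds to a different number of bursts in each mode.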
I experimented with the termination options on DQ/ADDR/CMD/CK/DQS/ODT; this only made things worse, never better. I also experimented with WALAT, RALAT and CAS latency, which doesn't make much difference. Switching to DDR3-1333, 1066 and 800 doesn't change anything. Noise on VDDQ and VREF is difficult to measure; on the scope it looks like more than 1%, but that could be noise introduced by the oscilloscope probe.
Any ideas what it could be, or where else I can look? I don't have many test points apart from the clock.
Best regards,
Alex
Solved. VDDHIGH_CAP is supposed to be connected to 2V5 through a zero-ohm resistor, but the resistor had not been populated.
Hi Alex
What about the other tests, did they pass?
A DQS gating calibration error is not fatal and may be ignored if
the other tests pass.
It is highly recommended to check the documents below for a better understanding:
DDR test operation (please create a service request if you have problems with access)
Freescale i.MX6 DRAM Port Application Guide-DDR3 | NXP Community
DDR training slides p.23
https://community.nxp.com/docs/DOC-331528
If there are errors on particular bits, one can attach JTAG and check
the waveforms with an oscilloscope. Sometimes the drive strength (configured via
IOMUXC_SW_PAD_CTL_GRP_B2DS and similar registers) of particular
bits may be tweaked.
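As a rough illustration of what tweaking the drive strength means at the register level, here is a minimal Python sketch. It assumes the DSE field sits in bits [5:3] of the pad-control register and selects roughly 240 ohm / DSE, with 0 turning the driver off; please verify both the field position and the encoding against the i.MX6DL reference manual. The helper names are hypothetical, and no register addresses are given because they differ between i.MX6 variants.

```python
# Assumed encoding (check the reference manual): DSE occupies bits [5:3]
# and selects roughly 240 ohm / DSE; DSE = 0 disables the output driver.
DSE_SHIFT = 3
DSE_MASK = 0b111 << DSE_SHIFT


def set_dse(reg_value: int, dse: int) -> int:
    """Return reg_value with only the DSE field replaced."""
    if not 0 <= dse <= 7:
        raise ValueError("DSE must be in the range 0..7")
    return (reg_value & ~DSE_MASK) | (dse << DSE_SHIFT)


def approx_impedance_ohm(dse: int):
    """Approximate driver impedance for a DSE setting (None = driver off)."""
    return None if dse == 0 else 240 / dse


for dse in range(1, 8):
    print(f"DSE={dse}: register value 0x{set_dse(0, dse):02X}, "
          f"about {approx_impedance_ohm(dse):.0f} ohm")
```

Under these assumptions a register value of 0x30 corresponds to DSE = 6, or about 40 ohm. A higher DSE value means a stronger (lower impedance) driver.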
Best regards
igor
Hi Igor,
Thanks for your reply.
So far all memory tests are failing. Because my Windows installation doesn't like the old DDR test tool, I am using v2.52 over USB.
It fails immediately after I start the stress test. Without calibration it fails reliably after the first 8 x 32-bit words (at 0x10000020). After calibration it can get as far as 0x10000920.
I tried the drive strength settings; they can make things worse, but I don't see much improvement.
I checked the PCB against your documents:
1) Found a bulk cap on VDD_SNVS_CAP; removed it, made no difference
2) Stitching could be better: there are not many stitching vias on the edge of the board where the DDR sits, but there are a lot of stitching vias near the DDR chips themselves
3) Reference planes look OK so far, but I have checked only a few DDR3 signals and will do more today
4) Placement and routing of the decoupling caps match the SABRE board
5) I haven't finished the ripple measurements, but so far VDDARM and VDDSOC look OK
Is there anything else I can have a look at?
Best regards,
Alex