
DDR3 doesn't work with iMX6DL on new PCB

Question asked by Alex Kondabarov on Sep 6, 2016
Latest reply on Sep 23, 2016 by Alex Kondabarov

Hi

I have a newly designed board based on the iMX6DL and mostly following the Sabre design. The main difference is that it has only RAM, eMMC and an RGMII interface to communicate with another board. The voltage is fixed at 1.125V for VDDARM and VDDSOC. The RAM is 4 x MT41K256M16HA-125 chips powered from 1.3V (one sample is running on 1.5V). Total memory is 2GB on a 64-bit bus. The DDR3 layout and routing have been copied in full from the Sabre board.
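As a quick sanity check on the memory configuration, the arithmetic below works out the capacity and bus width from the part number. It is only a sketch: it assumes a single rank with each x16 chip on its own 16-bit lane, and the constant names are made up for illustration.

```python
# Capacity check for 4 x MT41K256M16 (256M words x 16 bits per chip).
# Assumption: single rank, each chip drives its own 16-bit data lane.
CHIPS = 4
WORDS_PER_CHIP = 256 * 2**20   # the "256M" in the part number
BITS_PER_WORD = 16             # the "M16" in the part number (x16 device)

chip_bits = WORDS_PER_CHIP * BITS_PER_WORD   # density of one chip, in bits
total_bytes = CHIPS * chip_bits // 8         # whole array, in bytes
bus_width_bits = CHIPS * BITS_PER_WORD       # combined data bus width

print(chip_bits // 2**30, "Gbit per chip")
print(total_bytes // 2**30, "GB total")
print(bus_width_bits, "bit data bus")
```

So each chip is a 4 Gbit part and the array totals 2 gigabytes, which is the size to enter in the DDR configuration.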

And now I have the following problem.

The board is identified correctly by DDR Stress Tool v2.52, but calibration at 400 MHz fails with the following message:

Starting DQS gating calibration

. . . . . . . . . . . . . . ERROR FOUND, we can't get suitable value !!!!

dram test fails for all values.

 

Error: failed during ddr calibration

 

When I read back values from RAM, I see the following data:

0x10000000:       0x00000000         0x00000000         0xFFFFFFFF         0xFFFFFFFF        

0x10000010:       0x01010101         0x01010101         0xFEFEFEFE         0xFEFEFEFE        

0x10000020:       0x02020202         0x02020202         0xFDFDFDFD      0xFDFDFDFD     

0x10000030:       0x03030303         0x03030303         0xFCFCFCFC       0xFCFCFCFC      

0x10000040:       0xFF00FF12         0x40FFFF90         0xFFBBCCFF       0x40FFFF90        

0x10000050:       0xFFBBFFFF        0xFFFFFF00         0xFFB3FFFF        0xFFFFFFFF        

0x10000060:       0xFFB3FFFF        0xFFFFFFFF         0xFFA2FFFF        0xFFFFFFFF        

0x10000070:       0xFF20FFFF         0xFFFFFFFF         0xFF20FFFF         0xFFFFFFFF        

0x10000080:       0xFFFFCC12        0xFFFFFFFF         0xFF20FFFF         0xFFFFFFFB       

0x10000090:       0xFF00FFF3         0xFFFFFFFB        0xFF00FD73        0x79FBFFFA       

0x100000A0:       0xFF00ED00        0x49FBFFF2        0xFFFFCCFF        0x49FBFFD2       

0x100000B0:       0xFAFFCCFF       0x49FFFFD2        0xBAFBCCFF       0x48FFFF90        

0x100000C0:       0xFF00FF12         0x40FFFF90         0xFFBBCCFF       0x40FFFF90        

0x100000D0:       0xFFBBFFFF        0xFFFFFF00         0xFFB3FFFF        0xFFFFFFFF        

0x100000E0:       0xFFFFFFFF         0xFFFFFF90         0xFFFFFFFF         0xFFFFFF90        

0x100000F0:       0xFFFFFFFF         0xFFFFFFFF         0xFFFFFFFF         0xFFFFFFFF

 

I don’t know what should be read back, but I presume that the problem starts at 0x10000040 and the calibration fails because it doesn’t see the expected data.

When I run the memory stress test at 400 MHz, it usually fails because a burst of 8x32-bit words has not been written. Memory around the failed location usually looks like this:

 

0x10000160:       0x10000160         0x10000164         0x10000168         0x1000016C       

0x10000170:       0x10000170         0x10000174         0x10000178         0x1000017C       

0x10000180:       0xBA2000F3        0x49FFFF90         0xBA20FF73        0x48FBFF90       

0x10000190:       0x0000FFFF         0x48FFFF00         0xFFFFFFFF         0x40FBFFFF       

0x100001A0:       0x100001A0        0x100001A4        0x100001A8        0x100001AC       

0x100001B0:       0x100001B0        0x100001B4        0x100001B8        0x100001BC       

0x100001C0:       0x100001C0        0x100001C4        0x100001C8        0x100001CC       

0x100001D0:       0x100001D0        0x100001D4        0x100001D8        0x100001DC

0x100001E0:       0x100001E0         0x100001E4         0x100001E8         0x100001EC       

0x100001F0:       0x100001F0         0x100001F4         0x100001F8         0x100001FC       

0x10000200:       0x10000200         0x10000204         0x10000208         0x1000020C       

0x10000210:       0x10000210         0x10000214         0x10000218         0x1000021C       

0x10000220:       0x10000220         0x10000224         0x10000228         0x1000022C       

0x10000230:       0x10000230         0x10000234         0x10000238         0x1000023C       

0x10000240:       0x10000240         0x10000244         0x10000248         0x1000024C       

0x10000250:       0x10000250         0x10000254         0x10000258         0x1000025C
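To locate the damaged blocks in a dump like this without reading it by eye, a small script can help. This is a minimal sketch, assuming (as the dump suggests) that the test writes each 32-bit word's own address as its data; the function names `parse_dump` and `corrupted_runs` are made up for illustration and are not part of the stress tool.

```python
import re

def parse_dump(text):
    """Parse 'ADDR: W0 W1 W2 W3' lines into a {address: value} dict of 32-bit words."""
    words = {}
    for line in text.splitlines():
        m = re.match(r"\s*0x([0-9A-Fa-f]+):\s*(.*)", line)
        if not m:
            continue  # skip blank or non-dump lines
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            words[base + 4 * i] = int(tok, 16)
    return words

def corrupted_runs(words):
    """Return (start_address, length_in_bytes) for each contiguous run
    of words whose value differs from their address (assumed test pattern)."""
    runs = []
    start = prev = None
    for addr in sorted(words):
        if words[addr] == addr:
            continue  # word holds the expected pattern
        if start is not None and addr == prev + 4:
            prev = addr  # extend the current run
        else:
            if start is not None:
                runs.append((start, prev + 4 - start))
            start = prev = addr  # open a new run
    if start is not None:
        runs.append((start, prev + 4 - start))
    return runs

# Excerpt copied from the dump above.
sample = """
0x10000170:       0x10000170         0x10000174         0x10000178         0x1000017C
0x10000180:       0xBA2000F3        0x49FFFF90         0xBA20FF73        0x48FBFF90
0x10000190:       0x0000FFFF         0x48FFFF00         0xFFFFFFFF         0x40FBFFFF
0x100001A0:       0x100001A0        0x100001A4        0x100001A8        0x100001AC
"""

runs = corrupted_runs(parse_dump(sample))
for start, length in runs:
    print(f"corrupted: 0x{start:08X}, {length} bytes ({length // 4} words)")
```

Running it on the dumps from different runs and bus widths makes it easy to compare the start address, alignment and size of each damaged block.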

 

The address of the first failure is always different. For example, here is another run, where 8x32-bit words failed sequentially:

0x10000120:       0x10000120         0x10000124         0x10000128         0x1000012C       

0x10000130:       0x10000130         0x10000134         0x10000138         0x1000013C       

0x10000140:       0xFFFFFFFF         0x84FFFFF2         0xFF20FFFF         0x40FFFFF2        

0x10000150:       0xFE20FFFF         0x94FBFFD2        0xFA20FFFF        0xFFFBFF90       

0x10000160:       0xBA00FFF3        0xFF00FF90         0xBA00FFF3        0x49FBFF90       

0x10000170:       0xFFFFFDFF        0x49FFFF90         0xFFFFFDFF        0x49FFFF90        

0x10000180:       0x10000180         0x10000184         0x10000188         0x1000018C       

0x10000190:       0x10000190         0x10000194         0x10000198         0x1000019C       

0x100001A0:       0x100001A0        0x100001A4        0x100001A8        0x100001AC       

0x100001B0:       0x100001B0        0x100001B4        0x100001B8        0x100001BC       

0x100001C0:       0x100001C0        0x100001C4        0x100001C8        0x100001CC       

0x100001D0:       0x100001D0        0x100001D4        0x100001D8        0x100001DC       

0x100001E0:       0x100001E0         0x100001E4         0x100001E8         0x100001EC       

0x100001F0:       0x100001F0         0x100001F4         0x100001F8         0x100001FC       

0x10000200:       0x10000200         0x10000204         0x10000208         0x1000020C       

0x10000210:       0x10000210         0x10000214         0x10000218         0x1000021C

If I try to write to the faulty locations with Stress Tool 2.52, it always passes. Also, if I start the stress test again, it successfully overwrites what was not written before and fails at a different location, a few hundred words after the first fault.

My first thought was that there is a signal integrity issue: either something critical has not been copied, or I have excessive noise on VREF.

DDR can work with 4-word bursts; maybe during the test it switches to 4-word mode, and 4x64 bits would give me 8x32 bits. To verify this, I reduced the bus width to 32 bits and got exactly the same effect.

These are memory dumps for 32-bit mode:

0x100000C0:       0x100000C0        0x100000C4        0x100000C8        0x100000CC       

0x100000D0:       0x100000D0        0x100000D4        0x100000D8        0x100000DC       

0x100000E0:       0xFF00FF00         0xFE6BFFFF        0xBAB3FFFF       0xBAA0FFFF      

0x100000F0:       0x00A0FFF3        0xFF20FF73         0xFF00FFFF         0xFF00FDFF       

0x10000100:       0x10000100         0x10000104         0x10000108         0x1000010C       

0x10000110:       0x10000110         0x10000114         0x10000118         0x1000011C       

0x10000120:       0x10000120         0x10000124         0x10000128         0x1000012C       

0x10000130:       0x10000130         0x10000134         0x10000138         0x1000013C       

0x10000140:       0x10000140         0x10000144         0x10000148         0x1000014C       

0x10000150:       0x10000150         0x10000154         0x10000158         0x1000015C       

0x10000160:       0x10000160         0x10000164         0x10000168         0x1000016C       

0x10000170:       0x10000170         0x10000174         0x10000178         0x1000017C       

0x10000180:       0x10000180         0x10000184         0x10000188         0x1000018C       

0x10000190:       0x10000190         0x10000194         0x10000198         0x1000019C       

0x100001A0:       0x100001A0        0x100001A4        0x100001A8        0x100001AC       

0x100001B0:       0x100001B0        0x100001B4        0x100001B8        0x100001BC

 

 

0x10000240:       0x10000240         0x10000244         0x10000248         0x1000024C       

0x10000250:       0x10000250         0x10000254         0x10000258         0x1000025C       

0x10000260:       0xFF00FF12         0xFFB3FFFF        0xFFA2FFFF        0xFEA0FFFF       

0x10000270:       0xBAA0FFFF       0xBA20FFFF        0x0000FFFF         0xFF00FDFF       

0x10000280:       0x10000280         0x10000284         0x10000288         0x1000028C       

0x10000290:       0x10000290         0x10000294         0x10000298         0x1000029C       

0x100002A0:       0x100002A0        0x100002A4        0x100002A8        0x100002AC       

0x100002B0:       0x100002B0        0x100002B4        0x100002B8        0x100002BC

0x100002C0:       0x100002C0        0x100002C4        0x100002C8        0x100002CC       

0x100002D0:       0x100002D0        0x100002D4        0x100002D8        0x100002DC       

0x100002E0:       0x100002E0         0x100002E4         0x100002E8         0x100002EC       

0x100002F0:       0x100002F0         0x100002F4         0x100002F8         0x100002FC       

0x10000300:       0x10000300         0x10000304         0x10000308         0x1000030C       

0x10000310:       0x10000310         0x10000314         0x10000318         0x1000031C       

0x10000320:       0x10000320         0x10000324         0x10000328         0x1000032C       

0x10000330:       0x10000330         0x10000334         0x10000338         0x1000033C       

 

I see exactly the same 8 words as before, and in my experience signal integrity is usually not tied so precisely to bus width. Then I reduced the bus to 16 bits, and I see a very similar 8-word corruption even in 16-bit mode.

 

0x10000100:       0x10000100         0x10000104         0x10000108         0x1000010C       

0x10000110:       0x10000110         0x10000114         0x10000118         0x1000011C       

0x10000120:       0xFF73FFFF         0xFF73FF00         0xFF73FF00         0xCC73FF00       

0x10000130:       0xCCFF0000        0xCCFF00FF        0xCCFF00FF        0xFFFFFFFF        

0x10000140:       0x10000140         0x10000144         0x10000148         0x1000014C       

0x10000150:       0x10000150         0x10000154         0x10000158         0x1000015C       

0x10000160:       0x10000160         0x10000164         0x10000168         0x1000016C       

0x10000170:       0x10000170         0x10000174         0x10000178         0x1000017C       

0x10000180:       0x10000180         0x10000184         0x10000188         0x1000018C       

0x10000190:       0x10000190         0x10000194         0x10000198         0x1000019C       

0x100001A0:       0x100001A0        0x100001A4        0x100001A8        0x100001AC       

0x100001B0:       0x100001B0        0x100001B4        0x100001B8        0x100001BC       

0x100001C0:       0x100001C0        0x100001C4        0x100001C8        0x100001CC       

0x100001D0:       0x100001D0        0x100001D4        0x100001D8        0x100001DC       

0x100001E0:       0x100001E0         0x100001E4         0x100001E8         0x100001EC       

0x100001F0:       0x100001F0         0x100001F4         0x100001F8         0x100001FC       

 

I am not an expert in DDR3, but from what I see this is something more than signal integrity. If something were wrong with the data lines, the corruption would not be aligned to bursts of 8x32-bit words. If something were wrong with the address lines, I would see more damaged data as I decrease the bus width. The same applies to the control signals: they shouldn’t depend on bus width, so the size of the corrupted area shouldn’t depend on bus width either.

I played with the termination options on DQ/ADDR/CMD/CK/DQS/ODT; it only made things worse, never better. I played with WALAT, RALAT and CAS latency, which doesn’t make much difference. Switching to DDR3-1333, 1066 and 800 doesn't change anything. Noise on VDDQ and VREF is difficult to measure; on the scope it looks like more than 1%, but that could be noise introduced by the oscilloscope probe.
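The bus-width argument can be put in numbers. DDR3 transfers data in bursts of 8 beats (BL8), or 4 beats with burst chop (BC4), so the bytes moved by one burst scale with the bus width. The sketch below compares that with the 32-byte (8x32-bit) damaged block seen in the dumps; `burst_bytes` and `OBSERVED_BYTES` are illustrative names only.

```python
# Size of one DDR3 burst for each bus width, compared with the observed damage.
OBSERVED_BYTES = 8 * 4  # 8 x 32-bit words, the corrupted block size in the dumps

def burst_bytes(bus_width_bits, burst_length):
    """Bytes moved by one burst: one bus-wide beat per burst position."""
    return (bus_width_bits // 8) * burst_length

for width in (64, 32, 16):
    for bl in (8, 4):  # BL8, and BC4 (burst chop)
        size = burst_bytes(width, bl)
        note = "matches observed block" if size == OBSERVED_BYTES else ""
        print(f"{width:2d}-bit bus, BL{bl}: {size:2d} bytes {note}")
```

Only the 64-bit bus with BC4 and the 32-bit bus with BL8 give 32 bytes; on a 16-bit bus a single burst is 16 or 8 bytes. A damaged block that stays at 32 bytes across all three widths therefore does not track the DRAM burst size.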

 

Any ideas what it could be, or where else I can look? I don’t have many test points apart from the clock.

 

Best regards,

 

Alex
