 
					
				
		
Hello,
First of all, sorry for the length of this message, but the problem is quite complex and I need to explain it in detail to make it clear.
I'm currently trying to interface an MCF5484 with DDR SDRAM and, despite numerous attempts, I can't get it working properly. I could use some help from people who have experience with this kind of design.
The context is the following:
I use two 16-bit 46V16M16 -6T DDR memories for a total of 64 Mbytes. They are similar to the memories fitted on the 5484 Lite kit development board (the only difference is the speed grade, -75 on the kit; the memory I use is faster). The schematics of my design mimic the development board for the SDRAM and CPU parts (no need to reinvent the wheel here), and the layout was done with respect to all the usual considerations of path length, decoupling, etc.
I use a small program called bdmctrl to send BDM commands to the ColdFire, and I wrote a small script to initialize all the registers of the SDRAM controller based on the operating mode described in the datasheets of the DDR and the MCF5484.
First problem :
In some way, the SDRAM seems to respond to read and write requests, but two problems still prevent me from using it normally. The first is a systematic misalignment of the data written to the memory.
To explain further: if I write the lower byte of the address at each address, I should read:
@0x1000 0000 : 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F
@0x1000 0010 : 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
while I'm reading the following :
@0x1000 0000 : 0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
@0x1000 0010 : 0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
The pattern is simple and seems to be the same through the whole memory depth (as far as I tested): bits 4 and 3 of the address are modified as follows (data meant to be written at an address with bit 4 = 0 and bit 3 = 0 ends up at the same address with bit 4 = 1 and bit 3 = 1, or in short 00 => 11):
00 => 11
01 => 00
10 => 01
11 => 10
Bits 4 and 3 of the address are translated by the SDRAM controller into bits C2 and C1 of the column address, so this problem seems to be linked to the burst mode. Indeed, when I move from Burst Length (BL) = 8 to BL = 2, the misalignment disappears, at the price of a severe performance loss which I cannot tolerate for my application. My guess is that the controller and/or the SDRAM mix things up regarding the interleaved/sequential nature of the burst. However, the SDRAM controller of the MCF5484 only supports sequential mode and the memory is also configured as sequential (bit 3 of the mode register). I tried interleaved mode with BL = 8 just in case: the data are still misaligned, but not in the same way.
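For anyone who wants to play with the pattern, here is a minimal Python sketch of the remapping described above. It only models the observed symptom (bits 4 and 3 of the address decremented modulo 4), not the controller itself; the function names are made up for illustration.

```python
def landing_address(addr: int) -> int:
    """Address where data intended for `addr` ends up, per the observed mapping."""
    group = (addr >> 3) & 0x3        # address bits 4 and 3
    shifted = (group - 1) & 0x3      # 00 => 11, 01 => 00, 10 => 01, 11 => 10
    return (addr & ~0x18) | (shifted << 3)


def simulate_fill(size: int = 32) -> list[int]:
    """Write the low byte of each address, applying the observed remapping."""
    memory = [0] * size
    for addr in range(size):
        memory[landing_address(addr)] = addr & 0xFF
    return memory


if __name__ == "__main__":
    mem = simulate_fill()
    for base in (0x00, 0x10):
        row = " ".join(f"0x{b:02X}" for b in mem[base:base + 16])
        print(f"@0x1000 00{base:02X} : {row}")
```

Running it prints the same two rows as the faulty read-back listed above, which confirms that a single rotation of address bits 4 and 3 explains the whole dump.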
I'm pretty stuck here because the script I wrote with the very same BDM commands sent to the CPU of the development card does not produce that misalignment of the data even for a BL=8.
Second problem :
Among the read data, some bits toggle for no good reason from one read to the next, without any write in between. If I read the same memory region several times in a row, some of the bits may toggle (with no particular pattern and in what seems to be a random order) but then return to the value they should have on the following read. Stranger still, some bytes suddenly change to 0xFF at random locations in the memory. This is something I cannot understand (with my limited knowledge of how SDRAM works internally), because the 32-bit data port is built from two separate 16-bit chips operating in parallel. If it were a timing problem, why would a single byte out of the 16 bits of a chip's port have an incorrect 0xFF value while the other byte has the correct value? Again, this problem doesn't appear on the development board.
This test was done with the memory filled with the content described above, the misalignment of the data being of course present. More than likely, these are two independent problems.
I tested a lot of things to get the design working but none of those succeeded :
Changing RDLAT from 0x05 to 0x08, as suggested in one of the previous posts about SDRAM issues (I use CASL = 2.5 with a clock of 100 MHz)
Relaxing all the timings other than RDLAT and WLAT
Disassembling the code of dBUG, the bootloader provided with the development board, to make sure all the registers are written with the same values as in my own script and that there are no differences with the values I calculated from the DDR timings
Verifying the layout, the schematics and the PCB as far as I could (the design of the PCB was outsourced, I only contributed to some of its supervision). We have almost no test points, and with the inner-layer stripline routing and the packages of the CPU (BGA) and the SDRAM (TSOP), it's almost impossible to measure anything.
Sacrificing young SDRAM memories to the EDA gods but that didn't work either
Any help would be appreciated
Thanks a lot,
Alexis
P.S. : Here is the code I used
open /dev/bdmcfg0
reset
sleep 100

# MBAR @0x8000 0000
write-ctrl 0x0C0F 0x80000000

# SDRAM Initialization

# SDRAMDS : all I/O are set in SSTL_2 Class II 15 mA compliant mode
write 0x80000004 0x000002AA 4

# SDCS0 : place 64 Mbytes starting at base address 0x10000000
write 0x80000020 0x10000019 4

# SDCFG1 : timing considerations for the memory
write 0x80000108 0x73711630 4

# SDCFG2 : timing considerations for the memory
write 0x8000010C 0x46370000 4

# Issue PALL
# SDCR
write 0x80000104 0xE10B0002 4

# Initialize extended mode register
# SDMR : Enable DLL and launch LEMR command
write 0x80000100 0x40010000 4

# Initialize mode register
# SDMR : Reset DLL and launch LMR command
write 0x80000100 0x058D0000 4

# Wait a bit for the DLL to start (minimum 200 clock cycles at 100 MHz => 2µs)
# => sleeping for 1 ms should be enough
sleep 1

# Issue PALL for the second time
# SDCR
write 0x80000104 0xE10B0002 4

# Perform two refresh cycles
# SDCR : first refresh
write 0x80000104 0xE10B0004 4
# SDCR : second refresh
write 0x80000104 0xE10B0004 4

# Write to register mode for normal operation
# SDMR
write 0x80000100 0x018D0000 4

# Write to SDCR to lock SDMR
write 0x80000104 0x610B0000 4

# Write to SDCR to enable the auto-refresh and enable all DQS
write 0x80000104 0x710B0F00 4
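As a sanity check on the SDMR values in the script, here is a small Python sketch that decodes them into mode register settings. The SDMR field positions used here (bank select in bits 31:30, address in bits 29:18, command bit 16) are an assumption that should be checked against the MCF548x reference manual; the mode register bit meanings follow the usual DDR SDRAM definition (A2:A0 burst length, A3 burst type, A6:A4 CAS latency, A8 DLL reset), which should likewise be confirmed in the memory datasheet.

```python
# Standard DDR SDRAM mode register encodings (confirm in the memory datasheet).
BURST_LENGTH = {0b001: 2, 0b010: 4, 0b011: 8}
CAS_LATENCY = {0b010: 2.0, 0b011: 3.0, 0b110: 2.5}


def decode_sdmr(value: int) -> dict:
    """Decode an SDMR write, assuming BNKAD=[31:30], AD=[29:18], CMD=[16]."""
    bank = (value >> 30) & 0x3    # assumed: 0 = mode register, 1 = extended mode register
    ad = (value >> 18) & 0xFFF    # assumed: drives A11..A0 during the command
    info = {"extended": bank == 1, "cmd": bool((value >> 16) & 0x1)}
    if bank == 0:
        info["burst_length"] = BURST_LENGTH.get(ad & 0x7)
        info["burst_type"] = "interleaved" if ad & 0x8 else "sequential"
        info["cas_latency"] = CAS_LATENCY.get((ad >> 4) & 0x7)
        info["dll_reset"] = bool(ad & 0x100)
    elif bank == 1:
        info["dll_enabled"] = not (ad & 0x1)   # A0 = 0 enables the DLL
    return info


if __name__ == "__main__":
    for word in (0x40010000, 0x058D0000, 0x018D0000):
        print(f"0x{word:08X} -> {decode_sdmr(word)}")
```

Under those assumptions, 0x058D0000 and 0x018D0000 both decode to BL = 8, sequential burst and CAS latency 2.5, with the DLL reset bit set only in the first one, which agrees with the comments in the script.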
 
					
				
		
Alexis,
I'll try to help... But this will probably take several posts and exchanges to get to the bottom of this. If you prefer to take this offline, I will understand, but if you can post more details then the community may benefit as a whole.
First I'll add a few pieces of background information for the broader audience. I suggest anyone following this thread review Alexis's previous posts, as I'll reference some of his questions. If you are already familiar with DDR, I suggest just skimming down to the bottom, where I've requested some more details so we can work through this problem.
DDR has a few simple truths. Data is aligned by byte lane. During a read cycle the DDR memories source data on each lane independently (within some limits). So on a 32-bit wide MCF548x bus, you'd see four byte lanes with four DM (data mask) signals and four bi-directional strobes (DQS). In theory, and mostly in practice, you can have some skew between byte lanes, but you need to minimize skew between the data lines within a lane. As long as your board traces are matched for DM, DQS, and DQ[x:y] in each byte lane, you're halfway there (I generally say a delta of <180 ps at 133 MHz is matched). The next truth is VREF... DDR controllers typically use SSTL I/O. Although really, at 100 MHz or less it doesn't provide much value. That's why the lower performance 520x and 53xx families use a CMOS-style driver just fine. Anyway... The MCF548x/7x family does use SSTL I/Os, and therefore the DDR and the ColdFire (referred to as CF from now on) both reference VREF to determine a high/low state. The goal is a clean VREF (limited ripple, limited DC offset) that is exactly the same at the DDR memory and at the CF.
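To make the "matched within a byte lane" rule concrete, here is a rough Python sketch that turns trace lengths into skew and compares it with the 180 ps figure. The trace lengths are invented for illustration, and the conversion assumes roughly 180 ps per inch of propagation delay on FR4-type material, a rule of thumb that the real stack-up may not match exactly.

```python
PS_PER_INCH = 180.0   # assumed FR4 propagation delay, rule of thumb
MAX_SKEW_PS = 180.0   # "matched" threshold quoted above for 133 MHz


def lane_skew_ps(lengths_in: dict[str, float]) -> float:
    """Worst-case delay difference between any two signals of one byte lane."""
    delays = [length * PS_PER_INCH for length in lengths_in.values()]
    return max(delays) - min(delays)


def lane_is_matched(lengths_in: dict[str, float]) -> bool:
    return lane_skew_ps(lengths_in) < MAX_SKEW_PS


if __name__ == "__main__":
    # Hypothetical trace lengths in inches for byte lane 0 (not from a real layout).
    lane0 = {"DQS0": 2.50, "DM0": 2.45, "DQ0": 2.60, "DQ7": 2.30}
    print(f"skew = {lane_skew_ps(lane0):.0f} ps, matched = {lane_is_matched(lane0)}")
```

The check is done per lane on purpose: DM, DQS and the eight DQ lines of one lane are compared with each other, not with the signals of another lane.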
Here are my trusty steps for validating a new DDR interface.
1. Clocking... A bad crossover at the DDR memory can cause sporadic shifts in data that will leave you baffled for weeks. Make sure you have a nice symmetric crossover... Meaning, if the rise and fall edges are not symmetrical, you'll have a high crossover (well above the 1.25 V ideal) followed by a low crossover, and then it repeats again. This creates some common-mode spikes that you don't want. If you've followed most appnotes, you should have terminated CLK and CLK_B with a resistor between the phases. I typically use something in the 100 to 120 ohm range. Place this resistor as close as possible to the DDR RAMs. If you can get a scope on these two signals, measure them at the resistor and attach the scope capture.
2. DQS... This is an easy one to get crossed up, and again it will cause some funny patterns when you have other issues on top of it. Make sure you double check the EDA guy and that each DQS line goes to the proper byte lane. I've seen examples of copy and paste where the schematic looks right but the exported netlist is wrong because of duplicate net names, and the layout engineer typically has no way of knowing you didn't mean it. In your case... Two 16-bit wide DDRs means that you should have 2 DQS signals routed to each memory. DQS[0:3] or [3:0], I can't remember which way it goes on this product family. It is in the user's manual. So for example... Let's assume DQS[0:3] goes with DQ[31:0]. Make sure that DQS[0:1] goes to the DDR chip with DQ[31:16] and that you get the right DQS line on the DDR. If memory serves me, you really have a DQS upper and lower... So double check the Micron data sheet to get the byte lanes correct. Scope capture... Grab a picture at the via of the BGA and at the TSOP.
3. DM... Same as DQS. Verify connections.
4. RDLAT... Read latency is a fancy word for "mask." Because the SSTL buffers are true comparators, the slightest variation in VREF compared to a data line that is terminated to mid-rail gives you extreme switching of the buffers (I/Os) internally. So RDLAT is used during read cycles to keep the CF inputs from latching data until the DQS preamble. The goal is to get RDLAT to fire (expire) within the preamble, because the preamble is a quiet period: the DDR memories start driving the strobe lines low in preparation for driving the first byte of data. The formulas in the manual are generally accurate, except when you have really LONG traces between the DDR and the CF or when you have extremely short traces between them. If you have 3, 4, 5, or even 6 inches of trace on a standard 4, 6, or 8 layer impedance-matched board, you should be fine. Meaning... at roughly 180 ps per inch of prop delay on FR4-type material, you have a round trip that is much less than the granularity of the RDLAT timer. RDLAT, in the simplest explanation, is nothing more than a fancy timer that counts cycles from an internal state in the CF to when we expect data to come back from the DDR. That is why CAS latency is part of the formula. I believe (again, it has been a few years since I worked on this directly) that RDLAT counts in 2x clocks. So if you are clocking the part at 100 MHz, its granularity is 200 MHz, or 5 ns. Again... It would take a lot of trace delay on the outbound command bus plus the return DQ bus to add up to an extra 5 ns and therefore the need to increase RDLAT.
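The arithmetic behind that last point can be sketched in a few lines of Python. It assumes, as stated above with some reservation, that RDLAT counts in 2x bus clocks, and it uses the 180 ps per inch rule of thumb; the function names are only for illustration.

```python
def rdlat_tick_ns(bus_clock_mhz: float) -> float:
    """Duration of one RDLAT count, assuming RDLAT counts in 2x bus clocks."""
    return 1000.0 / (2.0 * bus_clock_mhz)


def round_trip_ns(trace_inches: float, ps_per_inch: float = 180.0) -> float:
    """Outbound command delay plus return data delay for one trace length."""
    return 2.0 * trace_inches * ps_per_inch / 1000.0


def inches_per_tick(bus_clock_mhz: float, ps_per_inch: float = 180.0) -> float:
    """Trace length whose round trip would consume one full RDLAT count."""
    return rdlat_tick_ns(bus_clock_mhz) * 1000.0 / (2.0 * ps_per_inch)


if __name__ == "__main__":
    print(f"RDLAT tick at 100 MHz : {rdlat_tick_ns(100.0):.2f} ns")
    print(f"6 inch round trip     : {round_trip_ns(6.0):.2f} ns")
    print(f"inches per RDLAT tick : {inches_per_tick(100.0):.1f}")
```

With a 100 MHz bus, one count is 5 ns while a 6 inch trace costs about 2.16 ns round trip, so the trace would have to be close to 14 inches long before the board alone justified adding one count to RDLAT.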
Long story, and somewhat of a mini-appnote...
It sounds like we need to verify the robustness of your physical connections. Since you tell me that your script works on our EVB, I take your word that it is probably OK so far. And since you tell me that varying your RDLAT gives similar behaviour... I suggest we focus on the schematics and layout of the DDR bus.
1. Verify connections.
2. Verify trace lengths.
3. Verify the power supply for VTT and VREF... (Example... NEVER connect the two together... another common problem.)
I'll try to respond to followup posts, but I'm traveling right now...
Hope this helps.
-JWW
 
					
				
		
JWW,
I found the solution yesterday, just before reading your reply... bad timing I guess, I'm sorry about that :( It seems that your first recommendation was the one that did the trick for me: a problem with the clock.
By having a closer look at the schematic and the MCF5484 datasheet, I realized that the pull-up/pull-down resistors connected to the address lines (whose values are read during boot to define several parameters) were not chosen correctly for our 50 MHz clock. So the input clock frequency was the same on the EVB and on our PCB, but the internal bus and core frequencies were too high on our PCB... Due to the lack of measurement points I couldn't figure that out (the vias between the BGA and the TSOP are unfortunately placed under the TSOP package).
However, I'm surprised that the DDR was able to perform so well (data was read back, just not at the right address) even at such high frequencies... a total lack of response from the controller/DDR would probably have led me to a hardware problem faster.
I would really like to thank you for your detailed and very interesting answer. I'm truly sorry that I couldn't update the status of this thread in time to spare you writing such an exhaustive reply. However, I'm sure that the community will benefit from your advice.
I HIGHLY advise anyone having DDR problems to take some time to carefully read JWW's reply: this is very precious information that you won't find in the datasheet, the application notes or even on the web. I also recommend that the interested reader have a look at a previous thread focused on determining the RDlat value: http://forums.freescale.com/freescale/board/message?board.id=CFCOMM&message.id=2161
Thank you again for your help and time
Alexis
Edit: BTW, could you change the topic of this thread from "SRAM" to "SDRAM"? I forgot the D in the original post (which changes the focus of the discussion a bit ;-)) and I'm not able to modify it. This could be helpful to anyone searching the forum for the keyword SDRAM. Thanks :)
 
					
				
		
Hello again,
Sorry to bump this thread, but I'm still having problems and I'm running out of ideas to solve them. If any of you has even a hint about this particular issue, I would really appreciate your contribution :)
Thank you in advance !
 
					
				
		
Here is a small update on my current progress.
As a desperate attempt to solve the problem, I decided to change my investigation methodology: rather than trying to make the DDR on my PCB work properly, I tried to make the DDR of the development card fail and to reproduce the same erroneous behaviour as my PCB.
So I tried several things on the development card (with the bootloader erased from the flash) and it turns out that I was able to reproduce the two problems of my own PCB just by changing the value of Rdlat, the SDRAM read latency of the controller. Working with a CASL= 2.5, I observe different behaviours depending on the value of RDlat:
RDlat = 0x04 to 0x07 : it works fine, I'm able to read 0x00 at address 0x00 (see the first post) with no shift and no single bit error for the 128 first bytes of the memory
RDlat = 0x08 : I'm reading 0x04 at address 0x00, which makes sense because the controller likely misses the first DQS edges due to the masking caused by an overestimated value of RDlat.
RDlat = 0x09 : I'm reading 0x08 at address 0x00, meaning that the controller misses another DQS edge, so that the data are shifted a further 32 bits (one word of the DDR port) from where they should be.
Etc.
With RDlat values beyond 0x08, I also see some FF values appearing at random places and some bits toggling from one reading to another.
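The development board observations above fit a very simple model, sketched here in Python: every RDlat count beyond the last working value skips one more 32-bit beat of the burst. This is only a description of what was measured for 0x04 to 0x09; extending it to larger values is an assumption, and it does not cover the random 0xFF bytes or the toggling bits.

```python
LAST_GOOD_RDLAT = 0x07   # highest value that still read correctly on the dev board
BYTES_PER_BEAT = 4       # one 32-bit word on the DDR data port


def first_byte_read(rdlat: int) -> int:
    """Value expected at address 0x00 on the dev board for a given RDlat."""
    if rdlat < 0x04:
        raise ValueError("no observation reported below RDlat = 0x04")
    missed_beats = max(0, rdlat - LAST_GOOD_RDLAT)
    return (missed_beats * BYTES_PER_BEAT) & 0xFF


if __name__ == "__main__":
    for rdlat in range(0x04, 0x0A):
        print(f"RDlat = 0x{rdlat:02X} : 0x{first_byte_read(rdlat):02X} at address 0x00")
```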
However, modifying the value of RDlat on my own PCB (all the other parameters being kept the same as in the experiment on the development board) does not produce the same results at all! Beyond a value of 0x04, I always read 0x08 at address 0x00, the rest of the data being shifted as explained in my first post: only the number of wrong 0xFF values across successive reads seems to depend on the value of RDlat. I must say I'm pretty confused by this lack of consistency between the two boards, and I cannot understand why RDlat has no influence on my own board. Even more disturbing is the fact that RDlat only has a meaning for the SDRAM controller and is completely ignored by the SDRAM, which is supposed to have a constant latency (I mean a value independent of RDlat)... Could the controller be the source of the problem? I highly doubt it, but I have no reasonable explanation for a systematic shift of the data that is independent of the RDlat value.
Of course I read back the SDCFG1 register to make sure that the RDlat value was consistent with the value that I tried to write, but no problem there. I also captured one of the DQS signals at the DDR side on the termination resistor (with some difficulty, damn these SMD packages). Its shape and rise/fall times look acceptable, except for some small oscillations after the 4 pulses, but nothing to be worried about.
I feel like I'm close to the answer but I'm still missing something here...
As always, any help would be more than appreciated :)
More to come about the DDR summer jigsaw soon
Alexis
 
					
				
		
