Nigel,
The SDRAM controller really has no idea about CAS latency. That is an SDRAM memory concept. I know this sounds weird... but give me a chance to explain.
The SDRAM controller only cares about DQS edges... but because the DDR standard allows for a parallel-terminated bus, and the DQS lines are bi-directional, the SDRAM controller has to be told roughly where in "time" the SDRAM memory will start driving the DQS signal back to the controller. The point we care about is the "preamble": there is a range of acceptable values for the size of the preamble, and the results actually vary a little from vendor to vendor. The RD_LATENCY setting in the SDRAM controller is really a timer that tells the controller to wait, after issuing a read command, for an incoming DQS. This effectively masks incoming DQS edges from causing false data latches until the timer expires. The value of "6" is a value that on most systems allows the timer to expire such that the read preamble has started and the DQS bus is driven to a "low" state. The SDRAM controller then waits for the DQS edges in order to latch data. So you see, the SDRAM controller really doesn't care about CAS latency, other than it needs to know where in time the SDRAM memory will start driving the DQS lines. Ok... now you say, isn't that the same thing? Ah... this is the catch. (By the way, the previous example is true as well for a setting of "7" at CAS 2.5.)
The SDRAM controller's timer counts in 2x clocks. That is why bumping the counter to "7" is the typical setting for CAS 2.5: a CAS 2.5 setting causes the SDRAM to wait a half cycle longer before driving data, so a setting of "7" makes the SDRAM controller wait one extra 2x clock (half a 1x clock), which gives that little extra delay you need for CAS 2.5.
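If it helps to see that counting written out, here is a tiny sketch (Python, for illustration only). The mapping is just the rule of thumb above, with "6" and "7" as the baseline values from this post; real boards may need a count higher or lower, as discussed below.

```python
# Illustrative only: map a CAS latency to the nominal RD_LATENCY count,
# using the baseline from this post (CAS 2 -> "6", CAS 2.5 -> "7").
# The timer counts in 2x clocks, so each extra half 1x clock of CAS
# latency is one more count.

def rd_latency_count(cas_latency, base_cas=2.0, base_count=6):
    half_cycles = (cas_latency - base_cas) / 0.5   # extra half 1x clocks
    return base_count + int(half_cycles)           # one count per 2x clock

print(rd_latency_count(2.0))   # 6
print(rd_latency_count(2.5))   # 7
```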
The values of "6" and "7" work fine for CAS 2 and CAS 2.5 most of the time. The whole system depends on a few delays (the catch): the output delay of the command telling the SDRAM you want to read, the turnaround time inside the SDRAM plus the delay caused by the programmed CAS latency, and then the propagation delay back to the SDRAM controller.
So why aren't RD_LATENCY and CAS latency the same thing? Because... if you routed a board and placed the processor 10 inches from the SDRAM, you would incur something on the order of 1.8ns of trace delay due to propagation (assume 180ps per inch). On fast DDR systems (yours doesn't really fit this mold, but the controller is designed for faster systems on other ColdFire parts), an additional 1.8ns might cause the round trip from launching a read command to latching the data to take longer than "6" counts of the timer. In that case you can add an extra count. This allows the controller to adapt to a variety of trace lengths. The reason "6" and "7" work most of the time is that most people have neither very long nor extremely short trace lengths; at 180ps per inch it takes quite a bit of trace to add up to anything.
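To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 180ps-per-inch figure and the 10-inch example come from above; the 2x clock period and the fixed command/turnaround delay below are made-up placeholders, not values for any real part:

```python
PS_PER_INCH = 180  # propagation figure quoted in this post

def trace_delay_ps(inches):
    """One-way trace propagation delay."""
    return inches * PS_PER_INCH

def min_rd_latency(fixed_delay_ps, trace_inches, clk2x_period_ps):
    """Smallest timer count whose expiry falls at or after the round trip:
    command out + fixed SDRAM internal delay + DQS propagation back."""
    round_trip = fixed_delay_ps + 2 * trace_delay_ps(trace_inches)
    return -(-round_trip // clk2x_period_ps)  # ceiling division

print(trace_delay_ps(10))  # 1800 ps = 1.8 ns, the 10-inch example

# Hypothetical numbers: a 3 ns 2x clock and 15 ns of fixed delay.
print(min_rd_latency(15000, 1, 3000))   # 6  (short trace)
print(min_rd_latency(15000, 10, 3000))  # 7  (10-inch trace)
```

The point of the sketch is just that the same part, with the same CAS latency, can need a different RD_LATENCY count purely because of trace length.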
Now, for your system: did you try RD_LATENCY "5" before trying CAS 3?
What is your average trace length? My guess is that you have the opposite problem: your trace length is probably very short, since I think you increased CAS latency without changing RD_LATENCY. Am I correct? I suspect that the SDRAM is driving DQS before the timer expires, and you are missing the first edge. The SDRAM controller gets confused when it doesn't get enough DQS edges to match the burst. The cycle completes internally, but the controller is still waiting for more edges to clear its state machine. When you do a write cycle, the DQS output, driven by the processor during writes, will effectively reset the DQS read state machine. Then you can perform another read.
I hope this helps you and others reading this post.
-JWW