SDRAM Timing

I'm working on an LPC1788 project with external SDRAM.  If I set a feedback clock delay (FBCLKDLY) of approximately 2 ns, the memory works flawlessly at 78 MHz.  Without this delay, however, it is not reliable and will typically fail a memory test.
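For reference, this is roughly how a delay in nanoseconds maps to the register field. It is a minimal sketch that assumes FBCLKDLY is a 5-bit field at bits 12:8 of EMCDLYCTL and that the delay is roughly (FBCLKDLY + 1) * 250 ps; both should be checked against the User Manual. The helper names are made up for illustration, and nothing here touches hardware.

```c
#include <stdint.h>

/* Assumptions to verify against the User Manual: step size and field position. */
#define FBCLKDLY_STEP_PS 250u
#define FBCLKDLY_SHIFT   8u
#define FBCLKDLY_MASK    (0x1Fu << FBCLKDLY_SHIFT)

/* Hypothetical helper: nearest field code for a requested delay in ps,
 * assuming delay = (code + 1) * step. Clamped to the 5-bit range 0..31. */
static inline uint32_t fbclkdly_code_for_ps(uint32_t delay_ps)
{
    uint32_t steps = (delay_ps + FBCLKDLY_STEP_PS / 2u) / FBCLKDLY_STEP_PS;
    if (steps < 1u)  { steps = 1u; }
    if (steps > 32u) { steps = 32u; }
    return steps - 1u;
}

/* Hypothetical helper: returns reg with only the FBCLKDLY field replaced. */
static inline uint32_t emcdlyctl_with_fbclkdly(uint32_t reg, uint32_t code)
{
    return (reg & ~FBCLKDLY_MASK) | ((code << FBCLKDLY_SHIFT) & FBCLKDLY_MASK);
}
```

Under those assumptions, a request for 2000 ps gives a field code of 7, which would be merged into the current register value with a read-modify-write so the other delay fields are left alone.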

I probed the microcontroller pins with a scope to get a better understanding of the timing during a read, but I am only more confused.  The LPC1788 User Manual makes it appear that the microcontroller latches data on the rising edge of the clock.  If that is the case, there appears to be plenty of setup time (about 9 ns), while hold time looks marginal.  Based on the scope image, the 2 ns feedback delay I added moves the sampling point later, which should eat into the hold time and make things worse, not better.
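The margin arithmetic behind that reasoning can be sketched as below. The struct, the function names, and the model itself (a fixed data-valid window around the capturing edge, with the feedback delay simply moving the sampling instant later) are assumptions for illustration, not something taken from the User Manual. The data-valid-after-edge figure and the setup/hold requirements are not values I measured or looked up, so they are left as inputs.

```c
#include <stdint.h>

/* All times in picoseconds, relative to the clock edge assumed to capture
 * the read data (t = 0 at that edge, measured at the MCU pin). */
typedef struct {
    int32_t data_valid_before_ps; /* data stable this long before the edge  */
    int32_t data_valid_after_ps;  /* data stays stable this long after it   */
    int32_t setup_required_ps;    /* input setup requirement (datasheet)    */
    int32_t hold_required_ps;     /* input hold requirement (datasheet)     */
} read_window_t;

/* Clock period in ps for a frequency in kHz (integer division, rounds down). */
static inline int32_t period_ps(int32_t freq_khz)
{
    return (int32_t)(1000000000LL / freq_khz);
}

/* Delaying the sampling instant by delay_ps adds to the setup margin... */
static inline int32_t setup_margin_ps(const read_window_t *w, int32_t delay_ps)
{
    return w->data_valid_before_ps + delay_ps - w->setup_required_ps;
}

/* ...and subtracts the same amount from the hold margin. */
static inline int32_t hold_margin_ps(const read_window_t *w, int32_t delay_ps)
{
    return w->data_valid_after_ps - delay_ps - w->hold_required_ps;
}

/* Re-express the window relative to an edge offset_ps later (positive) or
 * earlier (negative) than the original reference edge. */
static inline read_window_t shift_reference(read_window_t w, int32_t offset_ps)
{
    w.data_valid_before_ps += offset_ps;
    w.data_valid_after_ps  -= offset_ps;
    return w;
}
```

With 9000 ps of setup measured against the rising edge, any delay only increases the setup margin and reduces the hold margin by the same amount. Calling shift_reference with an offset of minus half of period_ps(78000) re-references the same window to the preceding falling edge, which makes it possible to compare the two capture-edge hypotheses with identical numbers.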

It looks to me as though the microcontroller drives its outputs on the falling edge of the clock, to be captured by the SDRAM on the rising edge, but latches the read data lines on the falling edge of the clock as well (which would mean the User Manual diagram is incorrect or misleading).  Is my understanding correct here?  Or does anyone have any insight into when the LPC1788 actually latches SDRAM read data?

I feel uncomfortable using a feedback delay setting without understanding what it actually does, even if it somehow makes things work.