I am writing a bare-metal uSDHC driver for the iMX6Q, based on the SDK version, and am having an issue with reading (I haven't tried writing yet). I can send commands to the SDHC card and get the proper responses back, so sending commands is not the issue. I believe the code that sets up the read, including the ADMA2 setup, is also correct. I'm using CMD17 (READ_SINGLE_BLOCK) to read the MBR (sector 0) of a known-good 16 GB SD card on a known-good off-the-shelf (OTS) board from Boundary Devices. I initialize the 512-byte read buffer with the value 0x5a so I can tell which bytes the read actually wrote. This is what I get back (the buffer starts at 0x104098a0):

The 55 aa signature at the end is correct, and some of the 00 bytes may be as well. Since none of the bytes is still 0x5a, all 512 bytes were clearly read from somewhere.
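The sentinel check described above can be made mechanical. The sketch below uses hypothetical helper names (not from the SDK). It fills the buffer with 0x5a before the read, then counts how many bytes still hold the sentinel and checks the MBR boot signature 0x55 0xAA at offsets 510 and 511. A real MBR can legitimately contain 0x5a bytes, so the count is only a hint.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE   512u
#define FILL_PATTERN  0x5Au   /* sentinel written before the read */

/* Fill the buffer with the sentinel before issuing CMD17. */
static void fill_sentinel(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = FILL_PATTERN;
}

/* Count bytes that still hold the sentinel after the transfer.
 * Non-zero suggests part of the buffer was never written
 * (or the real data happens to contain 0x5a there). */
static size_t count_untouched(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == FILL_PATTERN)
            n++;
    return n;
}

/* An MBR ends with the boot signature 0x55 0xAA at offsets 510 and 511. */
static int has_mbr_signature(const uint8_t *sector)
{
    return sector[510] == 0x55u && sector[511] == 0xAAu;
}
```

Comparing the result byte by byte against a dump of the same sector taken on a PC is still the stronger check; the signature alone only proves the last two bytes landed correctly.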
Below is what the data should be:

Here are the registers just before sending the command:

Below are the registers after the read (note that INT_STATUS has already been cleared):

In my setup I use three ADMA2 descriptors. The first covers the bytes from the beginning of the buffer up to the first cache-line boundary. That data is read into a separate uncached buffer (in DRAM) and then copied into the real buffer. The second reads the rest of the data, up to the end of the last full 32-byte cache line. This data is DMA'd directly into physical memory, and the corresponding virtual address range is then invalidated so that the cache is reloaded from DRAM. The third descriptor reads whatever is left over and works like the first one.
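A minimal sketch of the head/middle/tail split described above, assuming a 32-byte cache line (the Cortex-A9 L1 line size) and using hypothetical names. It only computes the regions; the bounce-buffer copy and the cache invalidation by virtual address are left to the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumed cache line size in bytes */

/* One region of the destination buffer, covered by one ADMA2 descriptor. */
struct region {
    uintptr_t addr;   /* start address in the caller's buffer */
    size_t    len;    /* bytes; 0 means no descriptor is needed */
};

/* Split [addr, addr + len) into:
 *   head:   bytes before the first cache-line boundary (bounce buffer)
 *   middle: whole cache lines (DMA in place, then invalidate)
 *   tail:   bytes after the last full cache line (bounce buffer) */
static void split_by_cache_line(uintptr_t addr, size_t len,
                                struct region *head,
                                struct region *middle,
                                struct region *tail)
{
    uintptr_t end        = addr + len;
    uintptr_t first_line = (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t last_line  = end & ~(uintptr_t)(CACHE_LINE - 1);

    if (first_line >= last_line) {
        /* No full cache line inside the buffer: bounce all of it. */
        head->addr = addr;   head->len = len;
        middle->addr = end;  middle->len = 0;
        tail->addr = end;    tail->len = 0;
        return;
    }
    head->addr   = addr;        head->len   = first_line - addr;
    middle->addr = first_line;  middle->len = last_line - first_line;
    tail->addr   = last_line;   tail->len   = end - last_line;
}
```

For the buffer at 0x104098a0 the start is already line-aligned (0xa0 = 5 * 32), so with a 512-byte read both the head and tail regions come out empty.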
The ADMA2 descriptors are set up in uncached DRAM. In this example the buffer starts on a cache-line boundary, so only two of the descriptors are used (the third one is not needed). There don't appear to be any descriptor errors (ADMA_ERR_STATUS = 0), and the two descriptors used are at 0x184000C0 and 0x184000C8. ADMA_SYS_ADDR reads 0x184000D0, which indicates that both descriptors were executed. No errors are reported in INT_STATUS.
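For reference, here is a sketch of the 32-bit ADMA2 descriptor encoding as defined in the SD Host Controller specification, which the uSDHC ADMA2 table appears to follow. The macro and function names are hypothetical, and the bit layout should be checked against the ADMA2 descriptor table in the i.MX6 reference manual. It also reproduces the address arithmetic above: each descriptor is 8 bytes, so two descriptors starting at 0x184000C0 end at 0x184000D0.

```c
#include <stddef.h>
#include <stdint.h>

/* Attribute bits of a 32-bit ADMA2 descriptor (assumed SD Host
 * Controller layout; verify against the i.MX6 RM). */
#define ADMA2_VALID     (1u << 0)
#define ADMA2_END       (1u << 1)
#define ADMA2_INT       (1u << 2)
#define ADMA2_ACT_TRAN  (2u << 4)   /* Act2 = 1, Act1 = 0: transfer data */

/* One 8-byte descriptor: length + attributes, then the data address. */
struct adma2_desc {
    uint32_t attr_len;   /* [31:16] length in bytes, [5:0] attributes */
    uint32_t addr;       /* physical address of the data buffer */
};

/* Fill one transfer descriptor; 'last' sets the End bit. */
static void adma2_set(struct adma2_desc *d, uint32_t phys,
                      uint16_t len, int last)
{
    d->addr = phys;
    d->attr_len = ((uint32_t)len << 16) | ADMA2_ACT_TRAN | ADMA2_VALID
                | (last ? ADMA2_END : 0u);
}

/* Address just past the first n descriptors of a table at table_phys. */
static uint32_t adma2_next_addr(uint32_t table_phys, size_t n)
{
    return table_phys + (uint32_t)(n * sizeof(struct adma2_desc));
}
```

One property of this format is worth double-checking in the driver. A length field of 0 does not mean "empty": in the SD Host Controller ADMA2 format it encodes 65536 bytes. An unused head or tail region should therefore be left out of the table rather than given a zero-length descriptor.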
It looks like some kind of timing issue or a data-not-ready condition, although the card should be able to send data fast enough at 25 MHz. Both CIHB and CDIHB are checked, as you can see in the attached code.
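The sketch below shows the two polling steps: waiting for the inhibit bits before the command, and waiting for transfer complete before the buffer is touched. It assumes the bit positions below match the i.MX6 uSDHC register description (CIHB and CDIHB in PRES_STATE, TC and the data/DMA error bits in INT_STATUS), and the function names are hypothetical. The register pointers are passed in so the logic can be exercised without hardware. On the real controller, the INT_STATUS bits should only latch when the matching INT_STATUS_EN bits are set, which is worth confirming in the RM.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the i.MX6 RM. */
#define PRES_STATE_CIHB   (1u << 0)    /* command inhibit (CMD line) */
#define PRES_STATE_CDIHB  (1u << 1)    /* command inhibit (DATA line) */
#define INT_STATUS_TC     (1u << 1)    /* transfer complete */
#define INT_STATUS_DERR   (0x7u << 20) /* DTOE, DCE, DEBE */
#define INT_STATUS_DMAE   (1u << 28)   /* ADMA error */

/* Spin until both inhibit bits are clear.
 * Returns 0 when ready, -1 if max_polls is exhausted. */
static int usdhc_wait_idle(const volatile uint32_t *pres_state,
                           uint32_t max_polls)
{
    while (max_polls--) {
        if ((*pres_state & (PRES_STATE_CIHB | PRES_STATE_CDIHB)) == 0)
            return 0;
    }
    return -1;
}

/* After CMD17 is issued, spin until the data phase finishes.
 * Returns 0 on TC, -2 if a data or DMA error bit is set, -1 on timeout.
 * Invalidate the cache lines and read the buffer only after this returns 0. */
static int usdhc_wait_transfer(const volatile uint32_t *int_status,
                               uint32_t max_polls)
{
    while (max_polls--) {
        uint32_t s = *int_status;
        if (s & (INT_STATUS_DERR | INT_STATUS_DMAE))
            return -2;
        if (s & INT_STATUS_TC)
            return 0;
    }
    return -1;
}
```

Error bits are checked before TC so that a transfer that completes with an error is not reported as success.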
Any ideas on where to start looking for the reason the data is incorrect?