TSEC rx DMA putting in wrong spot?

I'm writing a driver for the TSEC.  Transmit works, but I'm chasing a bug on the receive side.

When setting up my receive buffer descriptors (RxBDs), I point each BD's buffer pointer at its corresponding array.

When I receive a packet, the DMA writes it 4 bytes before the address I put in the buffer pointer.
I.e., the DMA places the packet at 0x20d188 when I set my buffer pointer to 0x20d18c.
It does this for every packet I receive.
I noticed that the address the DMA actually writes to is 64-bit aligned.  I know the buffer descriptor needs to be 64-bit aligned.  Does the buffer that the BD's buffer pointer points to need to be 64-bit aligned as well?  The packet lands at the correct location when I align the buffer to 64 bits.
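
To make the arithmetic concrete, here is a minimal sketch that rounds an address down to an 8-byte boundary. The helper names `align_down` and `is_aligned` are made up for illustration, and the idea that the DMA drops the low three address bits is only an assumption that happens to match the addresses above, not something I have confirmed in the RM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align.
 * align must be a nonzero power of two. */
static uint32_t align_down(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return addr & ~(align - 1u);
}

/* Hypothetical helper: nonzero when addr is already a multiple of align. */
static int is_aligned(uint32_t addr, uint32_t align)
{
    return align_down(addr, align) == addr;
}
```

With these, `align_down(0x20d18c, 8)` gives 0x20d188, which is exactly where the packet shows up, and `is_aligned(0x20d18c, 8)` is false.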

Below is a snippet that creates the BDs and buffers and sets up the BDs:


#define NUM_OF_BD_BUFFER 10

typedef struct {
    uint16_t flags;  //Flags Fields see Section - table 15-1106 in RM
    uint16_t length; //Buffer length
    uint32_t bufptr; //Buffer Pointer
} txbd_t;

typedef struct {
    uint16_t flags;  //Flags Fields see Section - table 15-1108 in RM
    uint16_t length; //Buffer Length
    uint32_t bufptr; //Buffer Pointer
} rxbd_t;

typedef struct {
    //The TxBD and RxBD need to be 64 bit aligned
    //this is due to the tbase/rbase register for the DMA needing to be 64 bit aligned
    //note: aligned(64) is in bytes, so this is 64-byte alignment, which also satisfies 64-bit (8-byte) alignment
    txbd_t txbd[NUM_OF_BD_BUFFER] __attribute__ ((aligned (64)));
    rxbd_t rxbd[NUM_OF_BD_BUFFER] __attribute__ ((aligned (64)));

    tsec_t *regs;
    phy *phy;

    uint32_t rx_idx; //index of the current RX buffer
    uint32_t tx_idx; //index of the current TX buffer
    uint64_t mac_address;
    uint32_t ip_address;
} eth_device_t;

#define RX_BUFFER_SIZE 0x00001000 // size per receive buffer
//Why does it work when I align this to 64 bits?
uint8_t rx_buffers[NUM_OF_BD_BUFFER][RX_BUFFER_SIZE]; //__attribute__ ((aligned (64)));

static void setup_rx_buffer_bd(eth_device_t *dev)
{
    //point each rx bd at its receive buffer
    for (uint8_t i = 0; i < NUM_OF_BD_BUFFER; i++) {
        dev->rxbd[i].bufptr = (uint32_t)(&rx_buffers[i][0]);
    }

    //prepare the rx bds and mark them as empty
    for (uint8_t i = 0; i < NUM_OF_BD_BUFFER; i++) {
        //SEE Section for flags
        dev->rxbd[i].flags = RXBD_EMPTY | RXBD_INTERRUPT;
    }

    //the last bd wraps back to the start of the ring
    dev->rxbd[NUM_OF_BD_BUFFER - 1].flags |= RXBD_WRAP;

    /*
     * Set the RBASE register. RBASE points to the start of
     * the receive BD ring. Note that at least two BDs should
     * be present in a transmit or receive BD ring
     */
    dev->regs->rbase = (uint32_t)(&dev->rxbd[0]);
}
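
For reference, this is roughly the workaround that makes receive land in the right place, written as a standalone sketch. `RX_BUF_ALIGN` and `rx_buffers_aligned` are names invented for this example, and the 8-byte value is an assumption taken from the behaviour described above rather than from the RM. GCC's `aligned` attribute takes a byte count, so 8 here means 64 bits.

```c
#include <stdint.h>

#define NUM_OF_BD_BUFFER 10
#define RX_BUFFER_SIZE   0x00001000 /* size per receive buffer */
#define RX_BUF_ALIGN     8          /* assumed requirement: 8 bytes = 64 bits */

/* RX_BUFFER_SIZE is a multiple of RX_BUF_ALIGN, so aligning the start of
 * the array keeps every row (every receive buffer) aligned as well. */
static uint8_t rx_buffers[NUM_OF_BD_BUFFER][RX_BUFFER_SIZE]
    __attribute__ ((aligned (RX_BUF_ALIGN)));

/* Hypothetical sanity check: returns 1 when every receive buffer starts
 * on an RX_BUF_ALIGN boundary, 0 otherwise. */
static int rx_buffers_aligned(void)
{
    for (unsigned i = 0; i < NUM_OF_BD_BUFFER; i++) {
        if (((uintptr_t)&rx_buffers[i][0] % RX_BUF_ALIGN) != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling `rx_buffers_aligned()` once before filling in the bufptr fields would catch a misaligned buffer early instead of finding the packet 4 bytes off later.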