P2020 DMA won't burst over PCIe!?

We are using the P2020 CPU along with several FPGAs on one of our boards. For the last two weeks we have been trying to get the P2020 DMA engine to burst data over PCIe to a Xilinx Virtex-7 FPGA. I have scoped the signals in the FPGA and verified that the DMA is sending only a one-word payload in each TLP, which is very inefficient. We have checked the BWC register and it is set to 8. The DMA transfer size is set to 1024 bytes. How do we enable the DMA to burst at least 128 bytes per TLP? Is there any example code that sets up the DMA and PCIe controllers for such transfers? We suspect the P2020 configuration is wrong, but we have no idea where else to look.
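
To put a rough number on "very inefficient", here is a minimal sketch of the arithmetic. It is not P2020 setup code. It assumes that "one word" means 4 bytes (one DW) and that each memory-write TLP costs about 20 bytes of overhead on a Gen1/Gen2 link (2 bytes framing, 2 bytes sequence number, a 12-byte 3DW header, 4 bytes LCRC, no ECRC). The names `TLP_OVERHEAD_BYTES`, `tlp_efficiency`, and `tlp_count` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-TLP overhead in bytes: 2 framing + 2 sequence number
 * + 12 (3DW header) + 4 LCRC. A 4DW header or ECRC would add more. */
#define TLP_OVERHEAD_BYTES 20u

/* Fraction of the bytes on the link that are payload, for one TLP. */
double tlp_efficiency(size_t payload_bytes, size_t overhead_bytes)
{
    assert(payload_bytes > 0);
    return (double)payload_bytes / (double)(payload_bytes + overhead_bytes);
}

/* Number of TLPs needed to move transfer_bytes, rounding up. */
size_t tlp_count(size_t transfer_bytes, size_t payload_bytes)
{
    assert(payload_bytes > 0);
    return (transfer_bytes + payload_bytes - 1) / payload_bytes;
}
```

Under those assumptions, `tlp_efficiency(4, TLP_OVERHEAD_BYTES)` is 4/24, about 17%, while `tlp_efficiency(128, TLP_OVERHEAD_BYTES)` is 128/148, about 86%. For our 1024-byte transfer, `tlp_count(1024, 4)` is 256 TLPs versus `tlp_count(1024, 128)` at 8 TLPs.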