I went through and made some changes to the way my DMA transactions are handled, so for anyone working with DMA on the v4 chips, here are some useful tips:
- Make sure you are using the largest source and destination transfer size that your transaction allows. Your DMA function should start with some logic that works out what the source increment and source port size should be, based on the source address and transfer length, with similar logic for the destination.
- Remember that the source is independent of the destination; the DMA module does all of the transaction size conversion between the two. The logic above for the source should depend only on the transfer length, source start address and source port size. The destination depends only on the destination port size, transfer length and destination address.
- I found I was able to improve performance a fair bit by breaking my DMA transactions into unaligned and aligned chunks, at least for memory-to-memory transactions. For each full transaction, I first do a small transfer until the destination address is aligned on a 16-byte boundary (I actually just copy the bytes manually, since it's usually only a couple of bytes). Then I do the main part of the transaction, with the remaining length rounded down to a multiple of 16, which ensures that the destination port is fully utilized and uses 16-byte transfers. Finally, I do another small transfer for the remaining bytes. This lets me make the most of DMA without particularly trying to align my buffers. It gets me about 20-25% more speed on average.
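
For anyone who wants to see the tips above in code, here is a rough sketch in C. It is not my actual driver code: the names `pick_xfer_size`, `split_for_dest` and `struct dma_split` are made up for illustration, and `port_max` is an assumed parameter for the widest transfer a port supports (16 bytes in my case). It only does the address and length arithmetic; programming the DMA registers is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest power-of-two transfer size (in bytes)
 * that both the start address and the length allow, capped at port_max.
 * Call it once with the source address and once with the destination
 * address, since the two sides are independent of each other. */
size_t pick_xfer_size(uintptr_t addr, size_t len, size_t port_max)
{
    /* port_max is assumed to be a power of two (1, 2, 4, 8, 16). */
    assert(port_max != 0 && (port_max & (port_max - 1)) == 0);

    size_t size = port_max;
    while (size > 1 && ((addr % size) != 0 || (len % size) != 0))
        size /= 2;
    return size;
}

/* Hypothetical result type: byte counts for the three chunks. */
struct dma_split {
    size_t head; /* bytes copied manually until dst is 16-byte aligned */
    size_t body; /* multiple of 16, moved by DMA with 16-byte writes   */
    size_t tail; /* leftover bytes after the aligned part              */
};

/* Split a transfer of len bytes so that the middle chunk starts on a
 * 16-byte destination boundary and has a length that is a multiple of 16. */
struct dma_split split_for_dest(uintptr_t dst, size_t len)
{
    struct dma_split s;
    size_t misalign = dst % 16;

    s.head = misalign ? 16 - misalign : 0;
    if (s.head > len)
        s.head = len;                      /* transfer ends before alignment */
    s.body = (len - s.head) & ~(size_t)15; /* round down to multiple of 16   */
    s.tail = len - s.head - s.body;
    return s;
}
```

The idea is to call `split_for_dest` first, copy `head` bytes by hand, start the DMA for `body` bytes (using `pick_xfer_size` on the source address to choose the source transfer size), then handle `tail` with a small transfer or another manual copy.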
Still wondering if anyone out there uses round robin on their crossbar.
Message Edited by cmaryan on 2009-09-18 03:26 PM