I'm trying to get DMA working reliably on a 5445x, and I'm finding that a simple loop-based memcpy gives essentially identical performance, maybe even slightly better. At the moment I'm not interested in interrupt-driven DMA, just in a fast way of copying data, so a memcpy-type function works well enough. The only time DMA is notably faster is when the source and destination are both aligned so that I can use 16-byte transfers.
Is this consistent with what others have seen?
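For context, the alignment check I mean looks roughly like the sketch below. The name dma_xfer_width is just a hypothetical helper, and the set of transfer widths (16, 4, 2 and 1 bytes) is my assumption about what the DMA controller accepts, so it should be checked against the reference manual before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the largest transfer width (in bytes) that
 * divides the destination address, the source address and the length.
 * The candidate widths 16, 4, 2, 1 are an assumption, not taken from the
 * manual. The pointers are only inspected, never dereferenced.
 */
static size_t dma_xfer_width(const void *dst, const void *src, size_t len)
{
    static const size_t widths[] = { 16, 4, 2, 1 };
    uintptr_t bits;
    size_t i;

    assert(dst != NULL && src != NULL);

    /* A low bit set in any of the three values rules out wider transfers. */
    bits = (uintptr_t)dst | (uintptr_t)src | (uintptr_t)len;

    for (i = 0; i < sizeof widths / sizeof widths[0]; i++) {
        if ((bits & (uintptr_t)(widths[i] - 1)) == 0)
            return widths[i];
    }
    return 1; /* not reached: a width of 1 always matches */
}
```

Only when this returns 16 do I see DMA clearly beat the loop. In every other case I might as well fall back to memcpy.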
Regardless of the above, when using DMA, do you put the source and destination buffers outside of cached memory, or do you purge the data cache before you start the transfer?