Advice for predictable DMA timing?

I have a design where a State Configurable Timer (SCT) event triggers a single 16-bit DMA transfer from external SRAM to the internal RAM of the 4330.  The EMC for the external SRAM is set to zero wait states, and the core is running at 204 MHz.
At slower speeds this works without issue, but as I bring the SCT up to the speed we need (the DMA transfer needs to occur within 200 ns), I start to see unexpected "glitches" where the DMA transfer is delayed. This doesn't happen very often, but often enough to be a problem.
The code is running in the internal SRAM of the 4330 and interrupts are disabled.
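To put the deadline in perspective, here is a rough sketch of the cycle budget. The helper name is just for illustration, and it assumes the DMA and bus clocks match the 204 MHz core clock, which may not hold for every clock configuration:

```c
#include <stdint.h>

/* Hypothetical helper: number of whole clock cycles that fit in a time window. */
static uint32_t cycles_in_window(uint32_t clock_hz, uint32_t window_ns)
{
    /* Use a 64-bit intermediate: 204,000,000 * 200 does not fit in 32 bits. */
    return (uint32_t)(((uint64_t)clock_hz * window_ns) / 1000000000u);
}
```

For 204 MHz and a 200 ns window this works out to 40 whole cycles (40.8 before rounding down), so a stall of even a few tens of cycles on the bus would be enough to miss the deadline.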
I'm looking for any advice or tips.  Is this kind of "glitch" something to expect with DMA? Is there any way to avoid it?
Thank you