The Block Diagram shows the RGPIO "inside" the CPU block. That implies that DMA can't access it.
The documentation for the RGPIOBAR register states for the "V" bit that "Processor accesses of the RGPIO are enabled". The only other control bit determines if user-space access is allowed. That can be assumed to be Processor User Space access. So that seems to say "no".
Check the documentation for the SRAM. It is Dual Ported and has to have a dedicated "back door" to allow access from peripherals via a port on the Crossbar Switch. The RGPIO doesn't have this.
Look at the diagram of the Crossbar Switch in section "14.1 Overview". There are "Master" and "Slave" ports. Only Masters can access Slaves. The SRAM "backdoor" is a Slave port. The RGPIO device is on the Master side of that Crossbar, so it is impossible for any other devices to access it.
If you've ever timed the DMAC you'll probably find it is quite slow when compared with the CPU. So if you're finding it slow when accessing the GPIO registers, it is probably more to do with the DMAC than the GPIO access. Probably both.
Register access on these CPUs is very slow. Here are previous posts showing the V2 CPUs take 12 clocks and implying the V4 ones (your one) are worse than that:
https://community.nxp.com/message/62228#62228
https://community.nxp.com/message/307212#307212
https://community.freescale.com/message/42042#42042
http://permalink.gmane.org/gmane.comp.hardware.motorola.microcontrollers.coldfire/8056
And since gmane is not "permanent" any more, it is lucky that I copied the relevant text into two of the above posts.
Programming the DMAC is going to hit the same "slow register access problems" as getting to anything else.
> I have tried to change the RGPIO_DATA register from interrupt instead of DMA and it
> works but it is not so fast (2us) as i need.
This is a 250MHz CPU, so 2us is 500 clocks. If your interrupt service routine is taking longer than 500 CPU instructions to take the interrupt and write to one register then you've got something badly wrong there. The code is either stunningly inefficient, or the CPU is limping along at a fraction of its proper rate. So which one is it? Is the CPU running properly and at the right clock rate? Is the Cache enabled? What RAM are you using, internal or external? Where's the CPU Stack? It should be in the internal 64k Static RAM. That makes interrupts run faster.
You should be able to get from the interrupt to writing the pin in less than 50 instructions which should be able to execute in slightly more (due to cache loading) than 50 clocks, or 200 nanoseconds.
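To make the arithmetic above easy to repeat for other clock rates, here is a minimal sketch in C. The function names are mine, not from any NXP header, and it assumes one instruction per clock, which is only a rough approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any vendor library. */

/* Time in nanoseconds taken by a number of CPU clocks at cpu_hz. */
uint64_t clocks_to_ns(uint64_t clocks, uint64_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (clocks * 1000000000ULL) / cpu_hz;
}

/* Number of whole CPU clocks that fit into a budget of ns nanoseconds. */
uint64_t ns_to_clocks(uint64_t ns, uint64_t cpu_hz)
{
    assert(cpu_hz > 0);
    return (ns * cpu_hz) / 1000000000ULL;
}
```

With cpu_hz = 250000000, clocks_to_ns(50, ...) gives 200 and ns_to_clocks(2000, ...) gives 500, matching the figures quoted above.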
You should be using multi-level interrupts with the one you need to be fast at a higher CPU priority (say Level 6) than any of the other ones, and all of the rest of the system should be set up so that interrupt routines can interrupt other ones.
Are you running Linux on it? Give up any thought of "real time" if you are. Linux doesn't usually understand or support multi-level interrupts either.
If the "5th falling edge" is known to happen at a pretty constant time after the fourth edge, then you can take the interrupt on the 4th one (or the 3rd, or the rising edge before the 5th falling one) and then just sit in the interrupt routine, polling for that 5th falling edge. The signal you're polling should be connected to an RGPIO Input pin. You'll then be able to write to the RGPIO pin in two or three clocks (at 250MHz) after detecting that edge. If you can't hack that latency, then make that a low-priority interrupt so more important ones can interrupt it. But you then have to guarantee that all higher priority interrupts will complete in less than 2us.
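A rough sketch of that polling loop in C, for the variant where the interrupt is taken on the rising edge before the 5th falling one, so the line is still high when the routine starts. Everything here is an assumption for illustration: the function name, the masks and the max_spins timeout are made up, and the two pointers stand in for whatever RGPIO input and data register addresses your part actually uses. Check the Reference Manual for the real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: spin while the masked input bit is high, and as soon
 * as it reads low, set out_mask in the output register.
 * max_spins bounds the wait so a missing edge can't hang the system.
 * Returns true if the low level was seen and the output written.
 */
bool poll_falling_edge_then_set(const volatile uint16_t *in,
                                volatile uint16_t *out,
                                uint16_t in_mask, uint16_t out_mask,
                                uint32_t max_spins)
{
    assert(in && out);

    /* Line is expected high on entry; wait for it to drop. */
    while ((*in & in_mask) != 0) {
        if (max_spins-- == 0) {
            return false;   /* gave up: edge never arrived */
        }
    }

    /* Plain read-modify-write on the data register (assumed layout). */
    *out |= out_mask;
    return true;
}
```

If you take the interrupt on the 4th falling edge instead, the line is low on entry, so you'd need an extra loop first that waits for it to go high. The timeout counter costs a few clocks per spin; drop it if you're certain the edge always comes.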
Tom