Questions about DMA speed on rt1062

j_david_berger · ‎01-02-2020

I'm having a hard time controlling the speed of DMA on the rt1062. In general, it seems to cap out at much lower transfer rates than I expect.

I've attached my minimal example. The only other change to the project was to enable output on LPUART2 instead of LPUART1.

The behavior I'm seeing and don't quite understand is that transferring a full buffer in one major loop iteration is substantially faster than doing many iterations of just one uint32_t each. My expectation was that the two would take roughly the same amount of time.
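To make the two setups concrete, here is a minimal sketch of how they differ in terms of the eDMA TCD. The struct and function names are hypothetical and only model the NBYTES (bytes per service request) and CITER/BITER (major loop count) fields; this is not the attached example and not the SDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the two TCD fields that matter here. */
struct dma_plan {
    uint32_t minor_bytes; /* bytes moved per service request (NBYTES) */
    uint32_t major_iters; /* service requests needed to finish (CITER/BITER) */
};

/* Whole buffer in one minor loop: a single request moves everything. */
static struct dma_plan plan_single_request(uint32_t total_bytes)
{
    struct dma_plan p = { total_bytes, 1u };
    return p;
}

/* Buffer split into equal chunks: one request per chunk. */
static struct dma_plan plan_chunked(uint32_t total_bytes, uint32_t chunk_bytes)
{
    assert(chunk_bytes != 0u && total_bytes % chunk_bytes == 0u);
    struct dma_plan p = { chunk_bytes, total_bytes / chunk_bytes };
    return p;
}
```

For a 1024-byte buffer, plan_single_request(1024) needs one request, while plan_chunked(1024, 4) needs 256. If each request carries a fixed per-activation cost, that cost would be paid 256 times in the second case, which may account for the difference I'm measuring.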

This example demonstrates this behavior with 'always enabled' setting on, but I've seen this with XBAR sources too.

Another question I have is whether there is any documentation for clock limits, specifically for IPG_CLK_ROOT, which drives DMA / DMAMUX / XBAR. The clock tool in MCUXpresso doesn't let me set that clock above 150 MHz. However, if I set it to 300 MHz in code, it seems to work and does give me a speed increase on DMA, albeit with the same speed hit on multiple iterations that I see with the clock at 150 MHz.

As far as I can tell, I can transfer one uint32_t per 8 IPG_CLK_ROOT cycles if I do it all in one iteration, and it takes 15-16 clock cycles per uint32_t with multiple iterations.
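As a rough sanity check, the cycle counts above can be turned into transfer rates. This is only arithmetic on my own measurements (8 and 15-16 cycles per uint32_t); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Transfers per second for a given clock and measured cycles per transfer. */
static double transfers_per_second(double clk_hz, double cycles_per_transfer)
{
    assert(cycles_per_transfer > 0.0);
    return clk_hz / cycles_per_transfer;
}

/* Bytes per second when each transfer moves bytes_per_transfer bytes. */
static double bytes_per_second(double clk_hz, double cycles_per_transfer,
                               uint32_t bytes_per_transfer)
{
    return transfers_per_second(clk_hz, cycles_per_transfer)
           * (double)bytes_per_transfer;
}
```

With IPG_CLK_ROOT at 150 MHz, 8 cycles per uint32_t works out to 18.75 M transfers/s (75 MByte/s), and 16 cycles to 9.375 M transfers/s (37.5 MByte/s). At 300 MHz both figures double.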

My end goal is to transfer either 2 or 4 bytes at 30 MHz ± 1 MHz. Is this achievable with this chip? With IPG_CLK at 300 MHz, I can get _faster_ than that, but to achieve a normal bit rate, I need to be able to have PIT start a small 2/4 byte copy at regular intervals while still maintaining fast DMA operation.
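Put another way, a 30 MHz trigger rate leaves only a few IPG clock cycles per request. The sketch below computes that budget and compares it to a measured per-request cost; the helper names are hypothetical, and the costs I would feed into it are my own measurements, not datasheet values.

```c
#include <assert.h>

/* Clock cycles available between two consecutive triggers. */
static double cycle_budget(double clk_hz, double trigger_hz)
{
    assert(trigger_hz > 0.0);
    return clk_hz / trigger_hz;
}

/* Returns 1 if a request costing cycles_per_request fits in the budget. */
static int fits_budget(double clk_hz, double trigger_hz,
                       double cycles_per_request)
{
    return cycles_per_request <= cycle_budget(clk_hz, trigger_hz);
}
```

At 150 MHz the budget is 5 cycles per trigger and at 300 MHz it is 10. The 8 cycles I see for a single-iteration transfer fits only at 300 MHz, and the 15-16 cycles I see per separately triggered iteration fits at neither clock.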

mjbcswitzerland · ‎01-07-2020

Justin

Using IPG_CLK_ROOT > 150MHz may work at room temperature but is out of specification and will probably fail over the temperature range or sporadically. Overclocking should only be used for hobby/fun projects where reliability is not of big concern.

Regards

Mark

P.S. Note also that when I did memory-to-memory DMA testing on an i.MX RT 1021 I found quite a low rate of around 60 MByte/s: https://community.nxp.com/thread/518925 and memcpy() through the CPU was much faster. I don't know yet what the bottleneck is but I will be revisiting it at some point.

Hui_Ma · ‎01-07-2020

Thanks for Mark's info.

The maximum working clock frequency of the RT1060 eDMA is 150 MHz.

j.david.berger@gmail.com‌ The RT1060 uses the same eDMA IP as the K60_100MHz product. Please check chapter 22.4.4 of the K60_100MHz reference manual, which provides detailed information about eDMA performance.

Hope this helps.

B.R.

Mike

j_david_berger · ‎01-07-2020

The overclocking is more to narrow down which clock dictates transfer speeds. Thanks for the graphic though; I did miss that in the documentation. I wish there was more information on what exactly dictates those limits; I imagine there might be peripherals that require a 150 MHz maximum that I don't need and don't enable.

When you ran those tests was the DMA controller set up as one iteration of the full buffer size?

mjbcswitzerland · ‎01-07-2020

Justin

In the uTasker project memcpy() [uMemcpy()] generally performs the copy using eDMA in a single iteration of the buffer size (using the largest possible unit alignment* depending on how the buffers are located). I believe it was using 32 bit aligned transfers in the test. In the Kinetis version this gives a good speed increase over a CPU copy loop but in the i.MX RT version it is quite a lot slower and has presently been disabled by default. That actually means that the CPU copy is a lot faster on the i.MX RT than on Kinetis, not necessarily that the eDMA copy on the i.MX RT is slower than the same on a Kinetis. I expect that there is a method of controlling the bus bandwidth used by the eDMA and that it is somewhat throttled by default - if I identify it I will try with different settings. In the case of Kinetis the DMA channel(s) assigned to the memory-to-memory copy are given the lowest priority, with pre-emption by higher priority masters allowed, so that they don't starve peripherals, such as USB, during large transfers. The same setting is used in the i.MX RT version as far as I am aware.

Regards

Mark

*The eDMA in the i.MX RT supports additional, wider width transfers (64 bit and burst) which would help (when alignment allows) and this may be extended later.
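As an illustration of the alignment selection described above, here is a minimal sketch of how the widest usable transfer unit could be chosen. It is not the uTasker implementation; the function name is hypothetical and it only considers 1, 2 and 4 byte units.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the widest unit (4, 2 or 1 bytes) that can be used for the
 * whole copy: source address, destination address and length must all
 * be multiples of that unit.
 */
static unsigned widest_unit(const void *dst, const void *src, size_t len)
{
    /* OR the values together: any set low bit rules out that alignment. */
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src | (uintptr_t)len;

    if ((bits & 3u) == 0u) {
        return 4u; /* everything is 32 bit aligned */
    }
    if ((bits & 1u) == 0u) {
        return 2u; /* everything is 16 bit aligned */
    }
    return 1u;     /* fall back to byte transfers */
}
```

A copy between two word-aligned buffers with a length that is a multiple of 4 would use 32 bit units; an odd length or an odd address forces byte units for the whole transfer in this simple scheme.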

Hui_Ma · ‎01-07-2020

Hi,

The RT1060 eDMA module clock root is ipg_clk_root; please refer to the picture below for details:

The maximum IPG_CLK_ROOT frequency is 150 MHz, which means the maximum working clock frequency of the RT1060 eDMA is 150 MHz:

Thanks for the attention.

Have a great day,
Mike


j_david_berger · ‎01-07-2020

This doesn't explain the behavior shown here at all.

First, I'm not using DMA functionality on UART at all in this example.

Second, the question is why chunking one DMA transfer into multiple small moves is dramatically slower than doing one large transfer. Is this expected behavior? Is it insurmountable? If it is, that seems to radically limit how useful the DMA control is on the RT1060. Is this true of the entire RT line?

