DMA: M2M block transfer with different minor loop offsets for source and destination (K70)

mkjeldsen · ‎06-10-2014

Hi,

I have a question regarding memory-to-memory transfer on the K70: moving a block of memory from source to destination using different offsets after each line has been processed (minor loop) (Figure 1). What is the most effective solution for this?

Figure 1

The only method that seemed applicable from the documentation is to use DMA_TCDn_NBYTES_MLOFFYES (minor loop and offset enabled) and apply MLOFF (bits 10-29) to BOTH source (SADDR) and destination (DADDR). But I'm interested in using different offsets, and I haven't been able to find any usage examples for this scenario.

How does DMA_TCDn_DOFF / DMA_TCDn_SOFF relate to MLOFF in this case? Will using DOFF and SOFF in combination with DMA_TCDn_NBYTES_MLOFFNO (minor loop enabled and offset disabled) give us the same as using DMA_TCDn_NBYTES_MLOFFYES, which has a built-in offset?

Thank you in advance!

Best regards,

Martin

Hui_Ma · ‎06-10-2014

Hi Martin,

From the description, the DMA seems to move one contiguous block of source data to another contiguous block of destination addresses. If so, you can set the number of bytes to transfer in DMA_TCDn_NBYTES_MLOFFNO and set DMA_TCDn_SOFF/DMA_TCDn_DOFF to the source/destination address offset applied after each DMA read/write. This uses a single minor loop to transfer the whole block.

Hope it helps.

best regards,
Ma Hui

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

mkjeldsen · ‎06-11-2014

Hi Hui,

Thank you so much for your reply. I don't think I can use one single minor loop and thereby transfer all the data contiguously. My destination is a framebuffer for the LCD controller, and I'm moving small graphical elements of a constructed view in memory to that framebuffer. DMA_TCDn_NBYTES_MLOFFNO only works if I'm transferring graphical elements that take up the entire display.

This is why I have to be able to transfer a small block of memory, which is not contiguous, to the framebuffer. Figure 1 was meant to depict graphical elements; I may not have made this completely clear (sorry).

Does that make sense? Look forward to your thoughts on this.

Best regards,

Martin Kjeldsen

Hui_Ma · ‎06-11-2014

Hi,

If the source and destination addresses use the same offset, as the picture below shows:

You can use one major loop with 4 minor loops, each minor loop transferring 16 bytes of data. You need to set the DMA_TCD_NBYTES_MLOFFYES [SMLOE] and [DMLOE] bits and set the DMA_TCD_NBYTES_MLOFFYES [MLOFF] value to 0x30 (you also need to set the DMA_CR[EMLM] bit to enable minor loop mapping).

If the source and destination addresses do not use the same offset (as the picture below shows), you need to use 4 major loops and use channel linking to do the data transfer.

Hope it helps.

Best regards,

Ma Hui

mkjeldsen · ‎06-12-2014

Hi Hui,

Thanks for a great reply! Your method #2 looks exactly like what I'm looking for: source and destination with different offsets.

I suppose for X transfers (= X major loops), we would have to set up some kind of dynamic TCD configuration for each channel. And we would then simply use 1 major loop without minor loops? Would specifying a pool of Y channels to use for linking be a decent method? Is there a certain number of channels for linking you would suggest for an approach in which the number of major loops (lines) might vary from a small number up to the height of the display?

Or, for an unknown number of transfers, is it then much better to use a scatter/gather approach?

Thank you in advance,

Best regards,

Martin

Hui_Ma · ‎06-13-2014

Hi Martin,

There are two methods to change DMA TCD descriptors dynamically:

1> Use only one major loop (without minor loops), enable the end-of-major-loop interrupt, and modify the TCD in the interrupt handler for the next DMA transfer.

This approach is flexible, but it increases the core workload, because the core has to handle a DMA interrupt after each DMA transfer.

2> Use the DMA scatter/gather feature, which loads a new TCD automatically after the major loop finishes.

The disadvantage is that it is less flexible: many TCD descriptors have to be stored in memory.

Hope it helps.

Best regards,

Ma Hui

mkjeldsen · ‎06-13-2014

Hi Hui,

Again, thank you for your replies. I think we're getting closer to the right solution :)

Regarding 1>, is the reason you initially recommended channel linking that we're only able to modify offsets after a major loop? If so, and if we use the method where we modify the TCD during the DMA interrupt, would we still be required to use channel linking (we probably only need to use 2 channels switching back and forth)?

As I see it, the advantage of channel linking + dynamic TCD setup is that we only have to issue 1 software start for the first channel, and we would then continue in a channel loop until the last TCD has set linking to false and the transfer ends. This would be the same as continuously starting the next transfer inside the DMA interrupt and just using one channel. Is that correct?

Look forward to hearing your thoughts.

Thank you in advance!

Best regards,

Martin

Hui_Ma · ‎06-17-2014

Yes, I agree. Using two DMA channels together with channel linking will improve DMA transfer efficiency. When a channel finishes its major loop, it generates its major-loop-finished interrupt and triggers the linked channel. The DMA transfer stops once the TCD is no longer refreshed.

Hope it helps.

mkjeldsen · ‎06-24-2014

Hi Hui,

I tried out the approach you recommended: 2 channels linked to each other on end-of-major-loop. Now I can place elements at their correct locations in my destination framebuffer, but the method requires me to synchronize drawing the last line of the element based on the number of end-of-major-loop interrupts triggered:

    extern "C"
    __irq void DMA_IRQHandler( void )
    {
        clear interrupt for both channels

        if number of interrupts received == element.nrOfLines-1    // About to draw last line
            remove channel linking for transfer descriptors
        else if number of interrupts received == element.nrOfLines // We're done drawing
            let system know we're done with current drawing operation
        end
    }

This solution doesn't feel quite accurate enough and might be error prone. Do you see a smarter approach within this channel-linking context? For example, we might calculate the final src/dst addresses and check the _SADDR/_DADDR registers in our interrupt instead, but I'm not too fond of that either. I tried implementing this, but found that the following sequence is not atomic, so I couldn't trust it: end-of-major-loop, interrupt, check for last line address in _SADDR, disable channel linking. The DMA controller would already have written some more data before I could disable linking dynamically, which makes me think this dynamic major-loop linking method might be flawed.

To recap: is there really no way to apply minor loop offsets other than using the _MLOFFYES registers and reusing the same offset for both src and dst? And if not, and we resort to major loop offsets (by using _SLAST and _DLASTSGA), what is the best way to synchronize writing our last line? I suppose I'm looking for some kind of "transfer completed" flag on the interrupt, rather than just a major-loop-completed interrupt.

Look forward to hearing your thoughts. Thanks in advance for your valuable assistance!

Best regards,

Martin

Hui_Ma · ‎06-25-2014

Hi Martin,

The channel link only provides a DMA channel trigger signal. If the TCD of the linked channel has not been refreshed for the last line, the linked channel will not start transferring any data (for the last line).

I don't think the last line was affected by the DMA link; in fact, it was affected by the previous DMA transfer (element.nrOfLines-2).

You need to calculate how many DMA transfers are needed to transfer the drawing, given that two DMA channels are used.

Thank you for the attention.

best regards,

Ma Hui

mkjeldsen · ‎07-02-2014

Hi Hui,

While this solution looks good theoretically, it seems unlikely that we can 100% guarantee proper synchronization on the last line (element.nrOfLines-2), because we essentially synchronize in software. Before the code in the interrupt handler is done processing, the DMA controller might have triggered again. Or it might work fine, but it seems non-deterministic because there really are no guarantees in this parallel solution. For this channel-linking solution to work, we would probably need some mechanism to tell the DMA controller to trigger only X times in total.

I'm trying a scatter/gather solution currently, where the last line has INTMAJOR enabled instead. If you were to compare the performance of a channel linking vs scatter/gather solution, what would you say?

Thanks for your thoughts!

Best regards,

Martin

Hui_Ma · ‎07-03-2014

Hi Martin,

Two linked channels require just one DMA request, while scatter/gather mode with two TCDs requires two requests.

When channel linking or scatter/gather is enabled, a two-cycle delay is imposed on the next channel selection and startup.

As for performance, two linked channels need only one DMA request, which reduces request generation and response time.

Channel linking is therefore the more efficient option.

Hope it helps.

best regards,
Ma Hui


mkjeldsen · ‎07-04-2014

Hi Hui,

Thank you for a quick response. Knowing that there is a 2-cycle delay imposed on the next channel selection is very helpful. I have a scatter/gather solution that performs adequately at the moment. I went this way because I was not confident that I could disable channel linking in time on the last line.

I saw something strange using scatter/gather from internal flash: if the DMA controller was operating on source addresses in internal flash, the speed was really slow. If it was operating on source addresses in external RAM instead, the speed was a lot faster, by a factor of 1000 or more (i.e. it took about 21 seconds for a full-screen transfer of 800x480 pixels, 2 bytes per pixel).

Do you have any insights on this issue?

Thanks for the great support! :)

Best regards,

Martin

Hui_Ma · ‎07-10-2014

Hi Martin,

Sorry for the delayed reply.

Kinetis products include a Flash memory controller (FMC) module that enhances flash access speed.

I checked: the data size is about 750 KB, so the internal flash isn't a good choice.

I suggest checking the crossbar switch settings: give the DMA (master 2) the highest priority for accessing slave 0 (flash controller) and check whether the performance improves.

Hope it helps.

best regards,
Ma Hui


mkjeldsen · ‎07-27-2014

Hi Hui,

Thanks for your reply. Glad you could confirm what I was seeing in relation to internal flash. I won't be using the internal flash for DMA transfers, but in any case it should be good to improve the speed. I will try adjusting the crossbar switch settings.

Thank you

Best regards

Martin

ilkea · ‎07-11-2014

Hi Hui_Ma and Martin,

I hope this message finds you.

I am working in an MQX 4.1 and CW10.5 environment. I am sending data directly to FlexBus, and I need to do this via DMA.

I have searched a lot and am confused about how to implement this.

1) Should I handle the DMA interrupt with MQX or without MQX?

2) What would the DMA TCD register settings be in my case, and how do I set them (with or without MQX)?

3) How many DMA channels will I need?

I am sending bytes to two addresses: command bytes to 0x6000000 and data bytes to 0x6010000. I have to send displaybuffer[160][30] data to FlexBus without any corruption on screen and, of course, without halting the software :)

P.S.: I have found some sample code (maybe yours) but do not know how to embed it in my code; I guess it does not use MQX. How do I do the register settings and the DMA interrupt?

Thanks in advance

Hui_Ma · ‎07-13-2014

Hi,

There is an example of a DMA application with MQX in the Freescale MQX V4.1 installation folder [sai_dma_demo]:

Default path is : C:\Freescale\Freescale_MQX_4_1\mqx\examples\sai_dma_demo

Hope it helps.

best regards,
Ma Hui


ilkea · ‎07-13-2014

Hi Hui_Ma,

I reviewed the code but I did not understand how DMA is used there.

Regards

Hui_Ma · ‎07-15-2014

Hi,

Please check another code example showing how to use the DMA driver: the UART driver calls the DMA driver in the <serl_dma_kuart.c> file, located in the C:\Freescale\Freescale_MQX_4_1\mqx\source\io\serial\dma folder.

In addition, <MQX_IO_User_Guide.pdf> provides an introduction to the DMA driver API functions; the document is located in the C:\Freescale\Freescale_MQX_4_1\doc\mqx folder.

Hope it helps.

best regards
Ma Hui


mjbcswitzerland · ‎06-13-2014

Hi Martin

If I understand it correctly you are using eDMA for the transfer of data between two areas of memory (presumably SRAM or SDRAM).

Depending on your exact use, it is recommended to also consider the crossbar switch settings, because the eDMA has, by default, a higher fixed priority than USB, SDHC, LCD and Ethernet (in the K70). If you have (large) block transfers started by software, these tend to starve the lower-priority DMA masters using the same slaves (often SRAM or SDRAM).

As well as the eDMA operation itself, the crossbar operation may need to be set accordingly; modifying the bus slave(s) to operate in round-robin priority mode tends to be a good compromise to avoid negative effects on other peripherals due to such transfers.

Regards

Mark

mkjeldsen · ‎06-16-2014

Hi Mark,

That's some solid advice, thank you. There is indeed a chance that transfers will be quite large in size. I'll have a look at the crossbar switch interface.

Best regards,

Martin