FRDM-KL25Z: Bare Metal DMA Basics?

mahmoudsherrah · ‎04-08-2018

Apologies for the long post. I am trying to get my head around DMA on a KL25Z board in bare metal (i.e. without any support from Processor Expert, PDD, uTasker, etc.). However, there are some basics that I didn't find mentioned anywhere (or maybe I just missed them), so any guidance would be greatly appreciated.

Summary of what I'm trying to do: Whenever requested in my software, I want DMA (e.g. channel 0) to automatically grab the value in UART0_D (8 bits) and place it in an 8-bit variable in memory (e.g. uint8_t receivedByte), i.e. SAR is &UART0_D and DAR is &receivedByte. Note: I don't really care if the UART receive buffer is full or not, I just want to grab whatever value is in UART0_D.

FIRSTLY: DMA Peripheral

1- Is it possible to achieve this without the UART0 peripheral calling for DMA request?

2- Is it possible to achieve this without any interrupts of any kind?

3- If possible, do I have to create an array and pass it to the destination address register "DAR", or a pointer to my uint8_t variable is enough?

4- When I set the DAR register to &receivedByte, GCC gives me a warning of "assignment makes integer from pointer without a cast". Casting &receivedByte to (uint32_t)&receivedByte fixed it but I don't understand why. Isn't &receivedByte already a 32-bit address value?

Here is my proposed configuration of the 4 DMA registers that I think is relevant to my simple application:

1- DMA_SAR0 = (uint32_t)&UART0_D; //UART0_D is 32-bit address of UART0 data register

2- DMA_DAR0 = (uint32_t)&receivedByte;

3- DMA_DSR_BCR0:

BCR: this field should be equal to 1 (since I am trying to transfer only one byte?)

The rest: don't care?

4- DMA_DCR0:

ERQ: I believe it should be 0 since I don't want peripheral request as mentioned above.

SINC and DINC: should be 0 since I don't want SAR or DAR to be incremented after transfer (since I am trying to transfer only one byte)

SSIZE and DSIZE: should be 01 (i.e. 8-bits since that's what I am transferring (one-byte)).

SMOD and DMOD: should be 0 (since only 1 byte so no buffer needed)

The rest: don't care?

SECONDLY: DMAMUX Peripheral

1- How can the DMAMUX help me in any of this? Is it essential for any DMA operation? (I don't need periodic triggering or anything). If it is necessary, how should it be configured to serve my application?

2- Should I use DMAMUX0_CHCFG0 since I want to use channel 0, or is there a catch here?

3- What is meant by SOURCE in DMAMUX0_CHCFG0? Does it mean UART0 in my case? How do I determine its value?

As you can see, I am just trying to come up with a formal set of steps to run DMA in the most basic mode. Any help would be greatly appreciated.

mahmoudsherrah · ‎04-08-2018

Ok guys, I managed to get the most basic bare metal DMA working. Thought I would share my findings in case someone runs into a similar situation, or for those who are complete beginners to DMA.

Here are some of my findings: (I would appreciate your feedback anyway)

FIRSTLY: DMA Peripheral

1- Yes, it is possible to realize this simple application without UART0 requesting DMA. It turns out that this is explained in KL25z reference manual, section "22.4.3 Always-enabled DMA sources". Don't know why it was mentioned in DMAMUX section though...

2- Yes, it is possible to realize this simple application without any kind of interrupts

3- No, you do not have to create an array. Passing the address of a simple variable is enough. Just make sure to set data transfer size to 1 (BCR, mentioned in pseudo code later), so that it doesn't read/write unwanted memory locations next to your variable.

4- I still do not know why GCC complains about a missing cast. Feedback is welcome here.

SECONDLY: DMAMUX Peripheral

It turned out that I did not need DMAMUX at all. I still haven't read much about its practical importance. Feedback is also welcome here.

Finally, here is the pseudo code that implements my simple application:

SystemCoreClockUpdate(); //Updates clock to 48MHz, using CLOCK_SETUP as 1
//UART0 Initialization
   //Turn on clock to port A (which carries the UART0 pins) in SIM module
   //Enable UART0 clock in SIM module
   //Set PortA1 and A2 to UART alternative function in PORTx_PCRn module
   //Choose clock source for UART0 (48MHz) in SIM module
   //Calculate baudrate and write it to the UART0_BDH and UART0_BDL registers
   //Enable TX and RX in UART0_C2 register
//DMA Initialization
   SIM_SCGC7 |= (1<<8); //Enable clock to DMA
   DMA_SAR0 = (uint32_t)&UART0_D; //Set source address (from UART0_D)
   DMA_DAR0 = (uint32_t)&receivedByte; //Set destination address (to your uint8_t variable)
   DMA_DCR0 = (1<<20) | (1<<17); //Set SSIZE and DSIZE to 8-bit (i.e. one byte)

//Infinite Loop
   //Set BCR to 1 (i.e. one byte). Since BCR is decremented on every byte transferred,
   //it needs to be reloaded for the next 1-byte transfer.
   DMA_DSR_BCR0 |= (1<<0);
   DMA_DCR0 |= (1<<16); //Set START bit to start the DMA transfer
   //Wait for DMA to finish. Usually there is not much waiting since it is wicked fast,
   //but checking won't hurt!
   while((DMA_DSR_BCR0 & (1<<24)) == 0);
   //Clear DONE for the next cycle by writing 1 to it (i.e. inverted logic).
   //This is mandatory, as stated by the reference manual.
   DMA_DSR_BCR0 |= (1<<24);
   //Now receivedByte contains the value of UART0_D, you can use it however you like.
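
For readers who find the raw shifts hard to follow, the bit positions in the pseudo code above can be given names. This is only a sketch: the macro and function names are made up for illustration and are not the ones in the vendor header, and the bit positions are simply the ones used above.

```c
#include <stdint.h>

/* Illustrative names for the bit positions used in the pseudo code above. */
#define DCR_SSIZE_8BIT  (1u << 20)  /* SSIZE = 01: 8-bit source reads         */
#define DCR_DSIZE_8BIT  (1u << 17)  /* DSIZE = 01: 8-bit destination writes   */
#define DCR_START       (1u << 16)  /* software start of the transfer         */
#define DSR_BCR_DONE    (1u << 24)  /* set by hardware, cleared by writing 1  */

/* DCR value for a software-triggered byte copy with no address increment. */
uint32_t dcr_sw_byte_copy(void)
{
    return DCR_SSIZE_8BIT | DCR_DSIZE_8BIT;
}

/* Returns nonzero once the DONE flag is set in a value read from DSR_BCR. */
int dma_is_done(uint32_t dsr_bcr)
{
    return (dsr_bcr & DSR_BCR_DONE) != 0;
}
```

With these, the initialization line reads DMA_DCR0 = dcr_sw_byte_copy(); and the wait loop becomes while (!dma_is_done(DMA_DSR_BCR0));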

I would really appreciate your feedback regarding the missing points! Thanks a lot!

mahmoudsherrah · ‎04-30-2018

Hello again mjbcswitzerland‌,

So I was trying lately to transmit a buffer over UART using DMA (to send data to a terminal, for example, like a print function). I managed to achieve this but without any software control. In other words, I set up UART0 to request a DMA transfer on TDRE, and I set up DMA0 with my buffer as source and UART0_D as destination. I also set up the source for DMAMUX0_CHCFG0 as source #3 (UART0 Transmit). Once I enable the channel in DMAMUX, it sends the data as expected.

My main problem here is, once the data is sent, I get a DONE signal and I don't know the best way to "reset" the DMA environment again without reconfiguring all registers again. The only workaround I thought of was to enable DMAMUX, then the data is sent, then an interrupt fires to signal completion. In this ISR, I disable the channel and clear DONE.

Here is the exact sequence I used (questions will follow):

//Clock Enabling
SIM_SCGC7 |= (1 << 8); //Enable clock to DMA
SIM_SCGC6 |= (1 << 1); //Enable clock to DMAMUX
...
//Initialize UART0 as usual except turn on DMA
UART0_C5 |= (1 << 7 ); //Enable UART0 TX DMA request
...
//DMA Initialization
DMA_DAR0 = (uint32_t)&UART0_D; //Set destination address to UART data register
DMA_DSR_BCR0 |= 10; //Let's say 10 bytes need to be transferred
DMA_DCR0 |= (1 << 20) | (1 << 17); //Set SSIZE and DSIZE to 8-bit (i.e. one byte)
DMA_DCR0 |= (1 << 30); //Enable peripheral request (in this case UART0 TX)
DMA_DCR0 |= (1<<22); //Increment source (i.e. the tx buffer)
DMA_DCR0 |= (1<<29); //Enable cycle steal
DMA_DCR0 |= (1<<31); //Enable DONE interrupt
NVIC_EnableIRQ(DMA0_IRQn); //Enable DMA0 IRQ
//DMAMUX Initialization
DMAMUX0_CHCFG0 = 3; //Set DMAMUX0CHFG0 source to UART0 Transmit
//Application Calls Print function
DMAprint("1234567890");
//DMAprint Function
      //Copy message to buffer
      DMA_SAR0 = (uint32_t)&txBuffer; //Set source address every time the function is called, otherwise it goes out of buffer bounds
      DMA_DSR_BCR0 = 10;  //Let's say 10 bytes to transfer 1234567890
      DMAMUX0_CHCFG0 |= (1<<7); //Enable DMA channel 0
//DMA0 ISR
DMAMUX0_CHCFG0 &= (~(1<<7)); //Disable DMA channel 0.
DMA_DSR_BCR0 |= (1<<24); //clear DONE for next cycle by writing 1 to it

My questions:

1- Is there a cleaner way than my proposal to achieve this (maybe the hardware can automatically reset the configuration, without the DONE interrupt)? Or is it completely natural to manually reconfigure my DMA environment before sending another buffer?

2- I didn't use the START bit at all to initiate the DMA request, instead I used the DMAMUX0_CHCFG0 ENBL bit to start the transfer. Is this the correct way to do it? Is this what is meant by software triggered transfer?

3- I had to use cycle steal to achieve my application, but I didn't fully understand why. I believe the datasheet is not sufficiently describing the whole DMA process in enough detail.

As always, your input would be truly appreciated to clear up my confusion. Thanks a lot.

mjbcswitzerland · ‎05-01-2018

Hi Mahmoud

I would say that it is normal to use the DONE interrupt at the end of transmission. This allows a second buffer to be started if needed. The following is the KL25 DMA control I use on UART transmission as reference - note that the DMA in the KL25 is different to that in K parts (and some newer KL parts) so the DMA configuration is best realised in a subroutine that can adapt itself to the actual HW used.

Beware too that the 3 UARTs in the KL25 have different DMA mode control - the first channel uses the TDMAS bit in UART0_C5 register and the second and third use the TDMAS flag in their UARTx_MA1 register instead. The first channel should not have its Tx interrupt flag enabled but the other two NEED it enabled for the DMA trigger to operate....

1. Initialisation - basic configuration for buffer to UART Tx transmission triggered by the UART Tx trigger source:

uart_reg->UART_C2 &= ~(UART_C2_TIE | UART_C2_TCIE);          // ensure tx interrupt is not enabled
fnConfigDMA_buffer(UART_DMA_TX_CHANNEL[Channel], (DMAMUX_CHCFG_SOURCE_UART0_TX + (2 * Channel)), 0, 0, (void *)&(uart_reg->UART_D), (DMA_BYTES | DMA_DIRECTION_OUTPUT | DMA_SINGLE_CYCLE), _uart_tx_dma_Interrupt[Channel], UART_DMA_TX_INT_PRIORITY[Channel]);
if (Channel == 0) {
    uart_reg->UART_C5 |= UART_C5_TDMAS;                      // use DMA rather than interrupts for transmission
}
else {
    uart_reg->UART_MA1_C4 |= UART_C4_TDMAS;                  // use DMA rather than interrupts for transmission
    uart_reg->UART_C2 |= (UART_C2_TIE);                      // enable the tx dma request (DMA not yet enabled) rather than interrupt mode
}

2. Start the transmission (setting the buffer and its length, then enabling the request):

ptrDMA->DMA_DSR_BCR = (tx_length & DMA_DSR_BCR_BCR_MASK);            // the number of service requests (the number of bytes to be transferred)
ptrDMA->DMA_SAR = (unsigned long)ptrStart;                           // source is tty output buffer
ptrDMA->DMA_DCR |= DMA_DCR_ERQ;                                      // enable request source

3. On termination the DONE interrupt is used to check whether there is more data to transmit but otherwise needs to do nothing else.

As you see the DMA environment doesn't need to be reset in any way - it can be simply reused (new buffer and length set) each time it is needed. You just need to ensure that you don't disturb an active transmission by trying to configure a new buffer before it has completed.
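
A minimal sketch of this reuse pattern, assuming one channel's registers are laid out as SAR, DAR, DSR_BCR, DCR. The struct, the function names and the "byte count not yet zero means busy" check are assumptions made for illustration; this is not the uTasker code.

```c
#include <stdint.h>

/* One DMA channel's registers, in the order assumed here. */
typedef struct {
    volatile uint32_t SAR;      /* source address                  */
    volatile uint32_t DAR;      /* destination address             */
    volatile uint32_t DSR_BCR;  /* status flags and byte count     */
    volatile uint32_t DCR;      /* control                         */
} dma_channel_t;

#define DCR_ERQ       (1u << 30)   /* enable peripheral request              */
#define DSR_BCR_DONE  (1u << 24)   /* write 1 to clear                       */
#define BCR_MASK      0x00FFFFFFu  /* byte count field, assumed bits 23..0   */

/* Start sending one buffer. Returns 0 on success, or -1 if the previous
 * buffer still has bytes outstanding (hypothetical busy check). */
int dma_tx_start(dma_channel_t *ch, const uint8_t *buf, uint32_t len)
{
    if ((ch->DSR_BCR & BCR_MASK) != 0) {
        return -1;                          /* do not disturb an active transfer */
    }
    ch->DSR_BCR = len & BCR_MASK;           /* number of bytes to transfer */
    ch->SAR = (uint32_t)(uintptr_t)buf;     /* new source buffer           */
    ch->DCR |= DCR_ERQ;                     /* let the UART request bytes  */
    return 0;
}

/* To be called from the DONE interrupt: only the flag needs clearing. */
void dma_tx_done(dma_channel_t *ch)
{
    ch->DSR_BCR = DSR_BCR_DONE;
}
```

Everything else (destination address, sizes, cycle steal, DMAMUX source) is set once at initialisation and left alone between buffers.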

Cycle steal mode is correct for this type of transfer.

Regards

Mark

uTasker developer and supporter (+5'000 hours experience on +60 Kinetis derivatives in +80 product developments)
Kinetis: http://www.utasker.com/kinetis.html

mjbcswitzerland · ‎05-01-2018

Mahmoud

When the DMA interrupt fires it just needs to clear the interrupt flag with
ptrDMA->DMA_DSR_BCR = DMA_DSR_BCR_DONE; // clear DMA interrupt

There is a flag that can be used that automatically stops the DMA after the transmission has completed:
ptrDMA->DMA_DCR |= (DMA_DCR_EINT | DMA_DCR_D_REQ); // interrupt when the transmit buffer is empty and stop operation after full buffer has been transferred

Cycle steal mode is used when a single transfer is required per trigger - it wouldn't be used only if you wanted the DMA trigger to start a complete buffer copy, which wouldn't work with the UART since it would cause its output buffer to overrun.
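
As a rough sketch, the control word for such a UART transmit channel could be composed like this. The names are illustrative only. Bits 31, 30, 29, 22, 20 and 17 are the positions used earlier in this thread; the position of D_REQ (bit 7) is an assumption to be checked against the reference manual.

```c
#include <stdint.h>

#define DCR_EINT        (1u << 31)  /* interrupt when the transfer completes   */
#define DCR_ERQ         (1u << 30)  /* enable peripheral request               */
#define DCR_CS          (1u << 29)  /* cycle steal: one transfer per request   */
#define DCR_SINC        (1u << 22)  /* increment source (the tx buffer)        */
#define DCR_SSIZE_8BIT  (1u << 20)  /* 8-bit source reads                      */
#define DCR_DSIZE_8BIT  (1u << 17)  /* 8-bit destination writes                */
#define DCR_D_REQ       (1u << 7)   /* assumed position: clear ERQ when done   */

/* One-time configuration; ERQ is left clear until a buffer is ready. */
uint32_t dcr_uart_tx_config(void)
{
    return DCR_EINT | DCR_CS | DCR_SINC |
           DCR_SSIZE_8BIT | DCR_DSIZE_8BIT | DCR_D_REQ;
}

/* Value of DCR once a transmission has been started. */
uint32_t dcr_uart_tx_running(void)
{
    return dcr_uart_tx_config() | DCR_ERQ;
}
```

With D_REQ set, the hardware drops ERQ by itself at the end of the buffer, so the interrupt handler only has to clear DONE.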

Regards

Mark

mahmoudsherrah · ‎05-01-2018

I can't thank you enough. Legend!

mahmoudsherrah · ‎05-01-2018

Thanks a lot for the info!

1- So basically, you keep the DMAMUX channel enabled and instead use ptrDMA->DMA_DCR |= DMA_DCR_ERQ; to start the transmission. Correct?

2- I don't understand how come the DONE interrupt doesn't need to clear the DONE and ERQ bits. Without clearing both bits, no other transmission takes place (at least in my case).

3. On termination the DONE interrupt is used to check whether there is more data to transmit but otherwise needs to do nothing else.

3- Do you have an idea when Cycle steal mode is not needed then?

Cycle steal mode is correct for this type of transfer.

mjbcswitzerland · ‎04-09-2018

Mahmoud

Since the source and destination registers are declared as unsigned 32 bit words, whatever you copy to them needs to be of this type (uint32_t). Although the address of your array, variable or register is a 32 bit address, it is not a uint32_t but instead a uint32_t *, uint8_t * or similar, and so a cast is necessary to satisfy the C compiler.
If the source and destination registers were declared as void *, rather than as 32 bit values, it would be possible to copy without a cast but there may be other side-effects (e.g. it would not be possible for C code to increment them without using a cast).
Finally, the fact that a uint32_t and a register address are about the same is a HW fact which is true for this processor; it is not necessarily generally true and so it is not portable. C is warning about this fact and so the programmer is confirming to the compiler, with the cast, that the code is doing what is required for the particular instance.
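
A small sketch of the cast, written so that it also compiles on a host where pointers are wider than 32 bits. Going through uintptr_t first is standard C; the helper names are made up for illustration.

```c
#include <stdint.h>

uint8_t receivedByte;

/* Convert an object address to the integer type a 32-bit address
 * register expects. The explicit casts tell the compiler that the
 * pointer-to-integer conversion is intended. */
uint32_t addr_to_reg32(const volatile void *p)
{
    return (uint32_t)(uintptr_t)p;
}

/* The value that would be written to the destination address register. */
uint32_t dar_value(void)
{
    return addr_to_reg32(&receivedByte);
}
```

On the KL25Z itself, pointers are 32 bits wide, so nothing is lost in the conversion; the cast only documents the intent.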

The DMAMUX is there to connect DMA triggers which either start a single transaction or kick off multiple ones. What you are doing is in fact a software triggered operation which doesn't need to use the DMAMUX module.

In this particular case of triggering a single transfer via software I don't see the advantage of DMA since it is much faster (no configuration of the controller needed) if the SW just does "val = register_value".

Regards

Mark

mahmoudsherrah · ‎04-10-2018

Thanks for the tips Mark. Now I understand the bit about casting addresses.

I totally agree that DMA is overkill for one byte transfer, I just wanted to start from the most basic level and expand from there.

What I don't really get is DMAMUX. I read the datasheet but I feel the description is not very clear. I would appreciate a basic real life example where it can be useful (or mandatory?)

Many thanks.

mjbcswitzerland · ‎04-10-2018

Mahmoud

A simple example of where the DMAMUX comes in is when you want the transfer to be triggered when there is a new byte in the UART reception buffer. In this case you set up the DMA to copy the content of the UART Rx register to a buffer in SRAM and enable the operation but it doesn't yet start performing the transfer (as it does in the SW triggered case that you are presently testing).

In this case you configure the DMAMUX to connect the UART's buffer not empty flag to trigger the DMA transfer. Each time there is a new reception byte in the UART's Rx buffer it will thus cause it to be copied via DMA to the input buffer in SRAM. It allows high speed UART operation without CPU intervention so that the received data is available in the SRAM buffer for subsequent processing. The DMAMUX is responsible for triggering each required transfer - so in periods of no data reception nothing is taking place (in comparison, a SW triggered method also copies data when there is nothing received and so is a polling method which will have a very high overhead [inefficient] and limit the UART data rate since it will not always be able to keep up with it).
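
As a sketch only, the two values involved in such a reception setup might look like this. The names are illustrative. ENBL (bit 7) and the DCR bits 30, 29, 20 and 17 are the ones used elsewhere in this thread; source number 2 for UART0 receive and bit 19 for DINC are assumptions to be verified in the reference manual. The UART must also be told to raise a DMA request on reception, which is not shown.

```c
#include <stdint.h>

#define DMAMUX_ENBL          (1u << 7)  /* enable the DMAMUX channel           */
#define DMAMUX_SRC_UART0_RX  2u         /* assumed; 3 is UART0 Transmit        */

#define DCR_ERQ         (1u << 30)  /* enable peripheral request               */
#define DCR_CS          (1u << 29)  /* cycle steal: one byte per request       */
#define DCR_SSIZE_8BIT  (1u << 20)  /* 8-bit reads from the UART data register */
#define DCR_DINC        (1u << 19)  /* assumed position: increment destination */
#define DCR_DSIZE_8BIT  (1u << 17)  /* 8-bit writes to the SRAM buffer         */

/* Value for the DMAMUX channel configuration register. */
uint8_t dmamux_chcfg_uart0_rx(void)
{
    return (uint8_t)(DMAMUX_ENBL | DMAMUX_SRC_UART0_RX);
}

/* DCR value: fixed source (UART data register), incrementing destination. */
uint32_t dcr_uart_rx(void)
{
    return DCR_ERQ | DCR_CS | DCR_SSIZE_8BIT | DCR_DINC | DCR_DSIZE_8BIT;
}
```

Compared with the software triggered case, the START bit is never written; each received byte produces one request and therefore one transfer.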

Another example of DMAMUX use is to generate a waveform on a DAC output at a specific rate. It is possible to connect a timer via the DMAMUX to cause a transfer each time the reference timer overflows. SW triggered DMA is quite rare in comparison to the need to control the trigger, showing that the DMAMUX is required in many instances to be able to coordinate this.

Regards

Mark

mahmoudsherrah · ‎04-13-2018

Mark

Great answer. Just the type of example I was looking for!

Lastly, I assume that this muxing method is also better than using interrupt driven DMA? e.g. enable UART receive interrupt (RIE in UART0_C2), then initiate DMA transfer in the UART receive ISR?

Anyway, thanks a lot for the tips. I'll get to work on DMAMUX and see where it leads. I know I am reinventing the wheel, but I have to do this at least once bare metal on my own to get a feeling for what's going on :)

I'll just mark my initial reply as answer, since it contains the basic steps for enabling interrupt/request free DMA transfer.

Thanks again.

mjbcswitzerland · ‎04-13-2018

Mahmoud

>>Lastly, I assume that this muxing method is also better than using interrupt driven DMA? e.g. enable UART receive interrupt (RIE in UART0_C2), then initiate DMA transfer in the UART receive ISR?

Yes, the idea of the DMA is to save needing to do this, which would be even less efficient than pure interrupt driven operation.

Regards

Mark
