How do I use DMA to transfer an array to PORT pins at a known rate?

migry
Contributor I

I have searched the forum for answers, and several postings are relevant, but I am still confused.

I am using a Teensy 3.5 (MK64FX512VMD12 based).

I want to create a bit stream as if it were generated with a 1 MHz clock. Essentially I want to fake floppy disk write-data pulses.

I have figured out that one less-than-ideal way is to stream an array of bytes to one of the ports, and simply use one bit as my wanted data stream. I can pre-process the 512 bytes of the sector and expand them, using the MFM coding scheme, into a 512 (bytes per sector) x 8 (bits per byte) x 4 (1 us samples per bit) = 16 KB array in RAM.
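To illustrate what I mean, here is a rough sketch of that expansion. The MSB-first bit order, the clock-bit rule (clock pulse only between two zero data bits) and the "1 us pulse then 1 us gap" rendering of each 2 us MFM cell are assumptions on my part, so treat it as a buffer-layout illustration rather than a verified MFM encoder:

#include <stdint.h>

#define SECTOR_BYTES    512
#define SAMPLES_PER_BIT 4                                     // 4 x 1 us per data bit
#define STREAM_BYTES    (SECTOR_BYTES * 8 * SAMPLES_PER_BIT)  // = 16384

static uint8_t sector[SECTOR_BYTES];    // raw sector data
static uint8_t stream[STREAM_BYTES];    // DMA source buffer, one byte per microsecond

void encodeSector(void) {
  uint32_t n = 0;
  int prev = 0;                         // previous data bit, for the MFM clock rule
  for (int i = 0; i < SECTOR_BYTES; i++) {
    for (int b = 7; b >= 0; b--) {      // MSB first (assumption)
      int cur = (sector[i] >> b) & 1;
      int clk = (!prev && !cur) ? 1 : 0;   // MFM clock bit between two zero data bits
      stream[n++] = clk;                // clock cell: pulse (or not) for 1 us ...
      stream[n++] = 0;                  // ... then a 1 us gap
      stream[n++] = cur;                // data cell: pulse (or not) for 1 us ...
      stream[n++] = 0;                  // ... then a 1 us gap
      prev = cur;
    }
  }
}

Only bit 0 of each stream byte is used, which is exactly the wasteful part.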

I am pretty sure that there are much more efficient/better ways to do this, but I want to experiment with this method.

I have already experimented with a sketch which uses the Teensy DMAChannel library.

I have programmed a PIT for 1 MHz. I saw that the DMAMUX has a specific bit (TRIG) which selects this as the "trigger" for the DMA (to get the desired output data rate of 1 Mbit/s). What wasn't clear was the "source" field of the DMAMUX_CHCFGx register in this mode, where the TRIG bit is '1' to select the PIT as the trigger. I suspect (from experiment) that it needs to be DMAMUX_SOURCE_ALWAYS0 (a Teensy #define, which I think is 58 for this SoC) so that each PIT tick transfers a byte.
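In case it helps anyone else, this is roughly what that configuration looks like with the Teensy core. Register and macro names are the ones in kinetis.h, and it assumes the DMAChannel library handed out one of channels 0-3, since the periodic trigger only exists there and is paired with the same-numbered PIT channel:

#include <DMAChannel.h>

DMAChannel dma;   // the first DMAChannel created normally gets channel 0

void setupTrigger(void) {
  // PIT channel 0 at 1 MHz (the PIT runs from the bus clock, 60 MHz on a stock 120 MHz Teensy 3.5)
  SIM_SCGC6 |= SIM_SCGC6_PIT;                // clock-gate the PIT
  PIT_MCR = 0;                               // enable the PIT module
  PIT_LDVAL0 = (F_BUS / 1000000) - 1;        // 1 us period
  PIT_TCTRL0 = PIT_TCTRL_TEN;                // start the timer, no CPU interrupt needed

  // DMAMUX: an "always enabled" request source, gated by the PIT periodic trigger,
  // so only the PIT paces the transfers.  The library does not set TRIG itself.
  dma.triggerAtHardwareEvent(DMAMUX_SOURCE_ALWAYS0);
  (&DMAMUX0_CHCFG0)[dma.channel] |= DMAMUX_TRIG;   // gate requests with PITn (n = DMA channel)
}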

I found a simple example which transfers one byte (of 0xFF) from a memory location to the port toggle register (I used PORT C). The DMA transfer was size=1, length=1. When I enabled the DMA I got a nice 500 kHz square wave on PORT C bit 0. Clearly, since the DMA was only a single-byte transfer, it was being continually re-triggered. I need the behaviour where, after starting the DMA, it runs once and completes, as I would expect for a memory block DMA transfer.
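That single-byte experiment, written out with the DMAChannel library (setupTrigger() and dma are from the previous sketch, and pin 15 is PTC0 on the Teensy 3.5 pinout):

static uint8_t toggleValue = 0x01;     // toggle only bit 0 of PORT C

void setup() {
  pinMode(15, OUTPUT);                                // PTC0 as a GPIO output
  setupTrigger();                                     // PIT0 at 1 MHz + DMAMUX periodic trigger
  dma.source(toggleValue);                            // one byte, re-read on every request
  dma.destination(*(volatile uint8_t *)&GPIOC_PTOR);  // low byte of the PORT C toggle register
  dma.transferSize(1);                                // size = 1
  dma.transferCount(1);                               // length = 1
  dma.enable();                                       // toggles PTC0 at 1 MHz -> 500 kHz square wave
}

void loop() {}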

I did use the Teensy DMA library code and pointed the source to a 256-byte array in RAM. On the scope I saw the expected burst of data over 256 us, then a little gap, then another 256 us burst. Again the DMA transfer was re-running. Since the data stream in each burst was correct, this appears to indicate that when the DMA re-runs, the source address is reset?
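The equivalent with a buffer, which reproduces the repeating bursts: sourceBuffer() sets SOFF=1 and BITER=CITER=256, and also sets SLAST so the source address wraps back to the start of the array when the major loop completes, which is why each re-run sends the same (correct) data. Because nothing disables the channel at completion, the next PIT trigger simply starts it again:

static uint8_t pattern[256];    // pre-computed bit stream, one byte per microsecond

void startBurst(void) {
  dma.sourceBuffer(pattern, sizeof(pattern));          // SADDR, SOFF=1, BITER=CITER=256, SLAST=-256
  dma.destination(*(volatile uint8_t *)&GPIOC_PDOR);   // write the low byte of PORT C directly
  dma.enable();                                        // one byte per PIT trigger, repeating forever
}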

When I changed DMAMUX_CHCFGx by setting TRIG to zero and selecting PORT A as the source, toggling the appropriate PORT A input pin seemed to cause the PORT C I/O to change. So even though I was using an array, each byte still needed a trigger before it was sent.

I am aware of (but do not fully understand) the minor and major loop registers.

So I am now thinking that I simply need to detect when the burst of 256 has ended, and then stop the DMA so that it doesn't re-run?

I guess my confusion is that I assumed I could program the DMA to transfer 256 bytes, clocked by the PIT, start the DMA, and it would then self-terminate. As I write this I think that is incorrect, and that I need to step in to stop the DMA.

The Teensy DMAChannel library has a way to attach an interrupt at the completion of the DMA (dma0.interruptAtCompletion();). What condition causes this interrupt to be triggered? Completing a major loop? Completing all major loops?
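For reference, from the K64 eDMA chapter: interruptAtCompletion() sets the INTMAJOR bit in the TCD CSR, and the interrupt fires when the major loop finishes, i.e. once CITER has counted down through all 256 minor loops (there is only one major loop per activation). Rather than stopping the channel in software, disableOnCompletion() sets the DREQ bit so the eDMA clears its own request-enable at that point. A minimal one-shot version of the burst, continuing the sketches above:

volatile bool burstDone = false;

void dmaComplete(void) {
  dma.clearInterrupt();              // acknowledge the DMA interrupt
  burstDone = true;                  // e.g. flag that the sector has been sent
}

void startOneShotBurst(void) {
  dma.sourceBuffer(pattern, sizeof(pattern));
  dma.destination(*(volatile uint8_t *)&GPIOC_PDOR);
  dma.disableOnCompletion();         // TCD CSR DREQ: hardware stops after one major loop
  dma.interruptAtCompletion();       // TCD CSR INTMAJOR: interrupt at major-loop completion
  dma.attachInterrupt(dmaComplete);
  dma.enable();                      // sends 256 bytes, one per PIT trigger, then stops itself
}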

In the library the array length (256) gets written to BITER and CITER (all I know at the moment is that these are DMA TCD registers). NBYTES=1 (a byte transfer, I assume?). SOFF=1 (increment through the array byte by byte?). This makes sense.
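Those TCD fields can be read back through the library's TCD pointer, which is a handy way to see exactly what sourceBuffer() programmed (field names as declared in the Teensy core's DMAChannel.h; check them against your copy of the header):

void dumpTCD(void) {
  Serial.printf("SADDR=%08lX SOFF=%d NBYTES=%lu\n",
                (unsigned long)(uint32_t)dma.TCD->SADDR,
                (int)dma.TCD->SOFF,
                (unsigned long)dma.TCD->NBYTES);
  Serial.printf("DADDR=%08lX DOFF=%d\n",
                (unsigned long)(uint32_t)dma.TCD->DADDR,
                (int)dma.TCD->DOFF);
  Serial.printf("CITER=%u BITER=%u CSR=%04X\n",
                (unsigned)dma.TCD->CITER,
                (unsigned)dma.TCD->BITER,
                (unsigned)dma.TCD->CSR);
}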

So I get the idea that each trigger (i.e. the PIT, or even a PORT A pin) causes one DMA transfer, which in this case is the transfer of one byte from memory to the same PORT register address, causing the I/O pins to change. This also does something to the minor and major loop counts? At some point there is an interrupt (if enabled) which allows software to stop the transfer at that point?

Any clarification would be helpful.

I'm going to spend a few hours playing with the Teensy.

A pointer to a nice PDF/presentation of the TCD structure and the operation of the DMA would be appreciated. I searched but didn't find anything.

 

 

3 Replies

migry
Contributor I

I have managed to figure out how to use the DMA system, and it is very powerful. Nevertheless, in searching the web I have not found any really good documentation or training material.

It is my opinion that the NXP Reference Manual is simply a reference, and does not adequately explain how to use many of the modules and features of the SoC.

This is a pity. The NXP SoCs are extremely powerful and flexible, but as usual the difficulty of adequately documenting all these features means that you have to put in a lot of time and effort to figure things out yourself where the reference manual falls short.


migry
Contributor I

Thank you for the reply.

This is a very helpful app note. Although it is not directly applicable to my purposes, it does illustrate the features of the DMA system, which I now realise are very powerful.

I can see a way in which a modified version of the technique could be used to generate a data stream with pulse lengths of various sizes, just as my application requires. I could use an array of byte values which give the length of each successive high and low period. The downside is that this would be quite wasteful of memory, although TBH the NXP-based Teensy 3.5 has lots of RAM, so perhaps it is not a real issue.
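As a sketch of the data layout I have in mind (the names and example values are made up, not taken from the app note): each byte is the duration in microseconds of the next level, alternating high and low, and it can be expanded into the same 1-us-per-byte DMA buffer as before.

static const uint8_t runs[] = { 2, 2, 3, 1, 4, 4 };   // us: high 2, low 2, high 3, low 1, ...

size_t expandRuns(uint8_t *out, size_t outMax) {
  size_t n = 0;
  uint8_t level = 1;                                   // start with the line high
  for (size_t i = 0; i < sizeof(runs) && n < outMax; i++) {
    for (uint8_t t = 0; t < runs[i] && n < outMax; t++) {
      out[n++] = level;                                // one byte per microsecond
    }
    level ^= 1;                                        // alternate high/low
  }
  return n;                                            // number of 1 us samples produced
}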

I have been experimenting and have played with a number of possible solutions.

1) Use PIT-triggered DMA (to get 1 us pulse resolution) to transfer an 8-bit array to an 8-bit section of a 32-bit port. Works, but very wasteful of memory: for every byte I want to encode I have to store 32 bytes in the array, of which I only use 1 bit each.

2) Use the SPI with DMA in "CONT"inuous mode. This also worked; however, to use DMA with continuous mode you are forced to DMA 32-bit values, because the top bit of the PUSHR register must be set to '1' to indicate CONTinuous mode. Note that continuous mode means each SPI value is sent back to back with no filler or gap. Even when selecting 16-bit SPI frames, you are still wasting 50% of the array just to set the CONT top bit (see the PUSHR sketch after this list). Clocking is also less flexible, because the SPI clock is derived from the bus clock with fixed power-of-two dividers; I would need to overclock the 120 MHz Teensy to 128 MHz so that dividing down by 128 gives the 1 us pulse resolution I need.

3) Use PIT interrupts. Perhaps the simplest of all. I get the PIT to interrupt every 1 us and send the next bit immediately on entry to the interrupt routine. I then have less than 1 us to calculate the next bit, ready for the next interrupt (see the interrupt sketch after this list). Thanks to the fast 120 MHz CPU, I see that in the worst case it takes about 800 ns to calculate the next bit (MFM encoding).
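For option 2, this is the data format that forces the waste, sketched only as far as building the command words (the SPI and DMA setup is left out, and the macro names are the ones in the Teensy core's kinetis.h): every element streamed to SPI0_PUSHR has to be a full 32-bit command word with CONT set, even though only 16 bits of it are data.

static uint32_t pushrWords[256];

void buildPushrWords(const uint16_t *data, size_t count) {
  for (size_t i = 0; i < count && i < 256; i++) {
    pushrWords[i] = SPI_PUSHR_CONT        // keep CS asserted: frames sent back to back
                  | SPI_PUSHR_CTAS(0)     // use CTAR0 (16-bit frame size, clock dividers)
                  | data[i];              // the 16 bits actually wanted
  }
  // A DMA channel would then stream pushrWords[] to SPI0_PUSHR, paced by the
  // SPI transmit-FIFO request (the appropriate DMAMUX_SOURCE_* for SPI0 TX).
}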
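And option 3 as a minimal sketch. The weak pit1_isr() hook, the PIT registers and the NVIC macros are standard Teensy 3.x core items; computeNextBit() and the output pin are placeholders of mine, standing in for the real MFM calculation.

volatile uint8_t nextLevel = 0;

static inline uint8_t computeNextBit(void) {
  return 0;                                   // placeholder: real MFM bit calculation goes here
}

void pit1_isr(void) {                         // overrides the weak handler in the Teensy core
  PIT_TFLG1 = 1;                              // clear the interrupt flag
  digitalWriteFast(2, nextLevel);             // drive the level computed on the previous tick
  nextLevel = computeNextBit();               // use the rest of the 1 us to prepare the next one
}

void setup() {
  pinMode(2, OUTPUT);
  SIM_SCGC6 |= SIM_SCGC6_PIT;                 // clock-gate the PIT
  PIT_MCR = 0;
  PIT_LDVAL1 = (F_BUS / 1000000) - 1;         // 1 us period from the 60 MHz bus clock
  PIT_TCTRL1 = PIT_TCTRL_TIE | PIT_TCTRL_TEN; // interrupt enable + timer enable
  NVIC_ENABLE_IRQ(IRQ_PIT_CH1);
}

void loop() {}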


xiangjun_rong
NXP TechSupport

Hi,

Please refer to AN4419 (AN4419.pdf) and the accompanying AN4419SW software package.

The application note discusses how to write to a GPIO port using DMA, paced by the PIT; it can simulate a PWM signal with a variable duty cycle.

Hope it helps.

BR

XiangJun Rong