Using SPI/DMA on the K60.

simonbusman · ‎09-11-2014

The tx/rx buffers and DMA are configured 16-bit wide.

How does the SPI module get its PUSHR command bits?

Are there working examples that do more than 1 transfer in a minor loop?


egoodii · ‎09-11-2014

I'm not aware of any way to 'cheat' on this PUSHR loading. Including a 'command word' with EVERY 'SPI transaction word' leads to some very nice self-timed multi-transactions, but it comes at this price. You can check out my post where I DMA a bitmap through SPI to a monochrome graphics display, where I have the full 32-bit PUSHR word in each buffer location. I've seen a couple of other posts in here where they claim to 'get along without' the upper bits, at least by doing their own CS, but leaving the other controls to 'chance'(?) seems like a bad plan. None of those posts explains how these 'other controls' work out for them! Maybe you can use two DMA channels: one to expand a 'linear packed memory image' to a full-word buffer, followed by the SPI DMA?

simonbusman · ‎09-11-2014

Expanding the tx buffer to 32 bits and filling the upper 16 bits of each entry with the (same) command bits is the way to make this work. The main 'problem' is that this doubles the amount of memory needed for the tx buffer. Using another DMA channel to expand a 16-bit buffer into a 32-bit buffer would take even more RAM, since the 16-bit source buffer is still needed. I will test this.
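A minimal sketch of that expansion done by the CPU, assuming the K60 SPI PUSHR layout (bit 31 CONT, bits 30-28 CTAS, bit 27 EOQ, bit 26 CTCNT, bits 21-16 PCS, bits 15-0 TXDATA; please double-check against your reference manual). The PUSHR_* macros and pack_pushr() are hypothetical local names, not the vendor header's SPI_PUSHR_* definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the PUSHR command fields (upper 16 bits). */
#define PUSHR_CONT    ((uint32_t)1u << 31)                 /* keep CS asserted between frames */
#define PUSHR_CTAS(x) (((uint32_t)(x) & 0x7u) << 28)       /* clock/transfer attribute select */
#define PUSHR_EOQ     ((uint32_t)1u << 27)                 /* end of queue */
#define PUSHR_CTCNT   ((uint32_t)1u << 26)                 /* clear transfer counter */
#define PUSHR_PCS(x)  (((uint32_t)(x) & 0x3Fu) << 16)      /* chip-select mask */

/* Expand n 16-bit data words into n 32-bit PUSHR words that all carry
   the same command half-word in bits 31..16. */
static void pack_pushr(const uint16_t *data, uint32_t *out, size_t n, uint32_t cmd)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (cmd & 0xFFFF0000u) | data[i];
    }
}
```

For example, `pack_pushr(samples, txbuf, 256, PUSHR_CONT | PUSHR_CTAS(0) | PUSHR_PCS(1))` would produce 256 words ready for a 4-byte-per-request DMA into PUSHR.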

Can you provide a link to an example?

egoodii · ‎09-12-2014

My example where I 'blast' the full 32-bit buffer (bottom 8 bits used for screen commands and then screen data!) to my OLED display (at regular intervals) is here:

Re: DMA with SPI to read SD Card?

The 'upper command bits' are 'set and left alone' at initialization, and the bit-banding-using code that I use to bitblt and line-draw into this binary-display-array understands the packing into one byte in each successive Dword (and of course the X/Y shape inherent therein as well!).

Your usage would be a 'little different': you would have a DMA operation setting the 'lower words' (the lower 16 bits of each buffer entry).
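To make the 'DMA fills the lower words' idea concrete, here is a hypothetical host-side model of one eDMA channel's address arithmetic (edma_model and edma_model_run are illustrative names, not the K60 register interface). It assumes equal source and destination access sizes and a little-endian target, as on the Cortex-M4, so a 16-bit write at offset 0 of each 32-bit entry lands in its lower half.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *saddr; /* like TCD SADDR */
    int16_t soff;         /* like TCD SOFF: added after each source read */
    uint8_t *daddr;       /* like TCD DADDR */
    int16_t doff;         /* like TCD DOFF: added after each destination write */
    uint32_t size;        /* bytes per access (SSIZE = DSIZE here), must be non-zero */
    uint32_t nbytes;      /* minor-loop byte count (NBYTES_MLNO) */
    uint16_t citer;       /* major-loop iteration count (CITER) */
    int32_t slast;        /* added to saddr once the major loop completes */
    int32_t dlast;        /* added to daddr once the major loop completes */
} edma_model;

/* Run a whole major loop as if every request arrived back to back. */
static void edma_model_run(edma_model *t)
{
    for (uint16_t it = 0; it < t->citer; it++) {                     /* one request each */
        for (uint32_t done = 0; done < t->nbytes; done += t->size) { /* minor loop */
            memcpy(t->daddr, t->saddr, t->size);
            t->saddr += t->soff;
            t->daddr += t->doff;
        }
    }
    t->saddr += t->slast; /* end-of-major-loop adjustments */
    t->daddr += t->dlast;
}
```

On the real part the matching TCD would presumably use SSIZE = DSIZE = 1 (16-bit), SOFF = 2, DOFF = 4, be started by software (for example the TCD CSR START bit) since memory has no request line, and rely on the upper halves of the 32-bit buffer having been written once with the command bits beforehand.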

simonbusman · ‎09-14-2014

I have 1 board with 2 K60s interconnected via SPI. With the help of your example I now have this running under full DMA control, both master and slave, rx/tx at the same time. Software tunes CITER/BITER to make sure the 4 major DMA loops are in sync. 2 questions:

In your example, DMA_TCD2_NBYTES_MLNO = 0x04. Why did you choose 0x04?

You do regular updates of: SPI0_RSER_TFFF_RE_MASK | SPI_RSER_TFFF_DIRS_MASK. Why do you do that?

egoodii · ‎09-16-2014

DMA_TCD2_NBYTES_MLNO = 0x04: I expect each DMA request to need exactly one Dword transfer. What would you recommend?

SPI0_RSER_TFFF_RE_MASK | SPI_RSER_TFFF_DIRS_MASK: I don't know; I guess I never made the effort to check whether I needed to...

simonbusman · ‎09-17-2014

I want to transfer 256 16-bit words. The way DMA/SPI is integrated, one needs to convert this table of 256 16-bit words into a table of 256 32-bit words with the SPI commands added in the upper 16 bits of each 32-bit word. Now I want to configure DMA to do this, and I have to choose how to configure the major/minor loop parameters. Because the major loop generates the IRQs, I want to configure the DMA channel to transfer exactly 256 32-bit words to the SPI (the same for rx).

Because the SPI needs 32 bits, I figure I need to configure the DMA to do exactly 256 32-bit words. Now with the current setting (based on your example) I set the DMA up to do 1024 32-bit words, and this works perfectly. Earlier I tried MLNO = 1 with 256 in the major loop. That did not work at all. One would expect it to do at least 64 words, but it did not do that either. It seems the programmer is not free to choose the MLNO value, and I want to know why. Maybe it has something to do with the FIFO queue depth of the SPI (also 4), but it just doesn't add up. I cannot find any clue in the datasheet as to why MLNO needs to be 4.

The SPI0_RSER_TFFF_RE_MASK | SPI_RSER_TFFF_DIRS_MASK bits are request-enable bits (TFFF_DIRS routes the request to DMA rather than to an interrupt), and I believe these need to be set only once. But one never knows: it might be a workaround for some other bug I don't know of. Hence the question.

egoodii · ‎09-17-2014

MLNO is the minor-loop count, the 'number of BYTES to move on each request'. AFAIK, for SPI that needs to be one Dword (4 bytes), as the SPI will make a request for EACH transfer. I don't believe the SPI FIFO has 'watermark' controls to reduce the number of requests/expect more per request. So, as you say, there is 'no choice' for this DMA/peripheral combination. And I assume your 'code that works' is set up to transfer 1024 BYTES, as 256 Dwords (source/destination each set to a size of 4 bytes).
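There may also be a second, purely DMA-side reason why MLNO = 1 "did not work at all". If I read the eDMA error-status description in the K60 reference manual correctly (ES register, NCE bit), the engine flags a configuration error and does nothing when NBYTES is not a multiple of both SSIZE and DSIZE, or when CITER is zero. The sketch below mirrors those checks plus the SPI-specific 'one access per request' rule; the function names are hypothetical and the ELINK-mismatch part of the real check is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCD ATTR SSIZE/DSIZE field values: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit. */
static uint32_t attr_size_bytes(uint32_t code)
{
    return 1u << code;
}

/* NBYTES must be a whole number of source and destination accesses,
   and CITER must be non-zero. */
static bool tcd_counts_valid(uint32_t ssize_code, uint32_t dsize_code,
                             uint32_t nbytes, uint32_t citer)
{
    return citer != 0 &&
           nbytes % attr_size_bytes(ssize_code) == 0 &&
           nbytes % attr_size_bytes(dsize_code) == 0;
}

/* The SPI raises one request per FIFO entry, so each minor loop should be
   exactly one register access. */
static bool one_access_per_request(uint32_t dsize_code, uint32_t nbytes)
{
    return nbytes == attr_size_bytes(dsize_code);
}
```

With 32-bit sizes, MLNO = 1 fails the first check, while MLNO = 8 passes it but would move two words per request, which is why 4 is the only value that fits both the engine and the SPI.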

simonbusman · ‎09-17-2014

DMA setup:

#define S1_BUF_SIZE 256

// --- Enable Analog in clocking ---
SIM->SCGC6 |= SIM_SCGC6_ADC0_MASK;   // Enable ADC0 gate clock (moved to system_MK60F12.c)
SIM->SCGC3 |= SIM_SCGC3_ADC1_MASK;   // Enable ADC1 gate clock
SIM->SCGC6 |= SIM_SCGC6_ADC2_MASK;   // Enable ADC2 gate clock
SIM->SCGC3 |= SIM_SCGC3_ADC3_MASK;   // Enable ADC3 gate clock

SIM->SOPT7 = SIM_SOPT7_ADC0TRGSEL(0) |        // 0 = external trigger
             //SIM_SOPT7_ADC0PRETRGSEL_MASK | // 0 = pre-trigger A selected for ADC0; 1 = pre-trigger B selected for ADC0
             //SIM_SOPT7_ADC0ALTTRGEN_MASK |  // 0 = PDB trigger selected for ADC0; 1 = alternate trigger selected for ADC0
             SIM_SOPT7_ADC1TRGSEL(0) |
             //SIM_SOPT7_ADC1PRETRGSEL_MASK |
             //SIM_SOPT7_ADC1ALTTRGEN_MASK |
             SIM_SOPT7_ADC2TRGSEL(0) |
             //SIM_SOPT7_ADC2PRETRGSEL_MASK |
             //SIM_SOPT7_ADC2ALTTRGEN_MASK |
             SIM_SOPT7_ADC3TRGSEL(0);
             //SIM_SOPT7_ADC3PRETRGSEL_MASK |
             //SIM_SOPT7_ADC3ALTTRGEN_MASK;

ADC_Calib();

// channel 6 descriptor: SPI1 POPR -> S1_rxbuf (rx)
DMA0->TCD[6].SADDR = (uint32_t)&SPI1->POPR;
DMA0->TCD[6].SOFF = (uint16_t)0;                 // always read the same POPR register
DMA0->TCD[6].DADDR = (uint32_t)&S1_rxbuf;
DMA0->TCD[6].DOFF = (uint16_t)4;                 // advance one 32-bit buffer entry
DMA0->TCD[6].ATTR = DMA_ATTR_SSIZE(2) |          // 2 = 32-bit source accesses
                    DMA_ATTR_DSIZE(2) |          // 2 = 32-bit destination accesses
                    DMA_ATTR_SMOD(0) |
                    DMA_ATTR_DMOD(0);
DMA0->TCD[6].NBYTES_MLNO = 4;                    // one 32-bit word per SPI request
DMA0->TCD[6].SLAST = 0x00;                       // source address is fixed
DMA0->TCD[6].CITER_ELINKNO = DMA_CITER_ELINKNO_CITER(S1_BUF_SIZE);
DMA0->TCD[6].BITER_ELINKNO = DMA_BITER_ELINKNO_BITER(S1_BUF_SIZE);
DMA0->TCD[6].DLAST_SGA = DMA_DLAST_SGA_DLASTSGA(-(S1_BUF_SIZE * 4)); // rewind DADDR to the buffer start; scatter/gather not used
DMA0->TCD[6].CSR = DMA_CSR_DREQ_MASK |           // clear ERQ after the major loop (one run only); ELINK and scatter/gather off
                   //DMA_CSR_INTMAJOR_MASK |     // enable major-loop interrupt
                   DMA_CSR_BWC(0x0);             // no engine stalls (maximum bandwidth)

// channel 7 descriptor: S1_txbuf -> SPI1 PUSHR_SLAVE (tx)
//DMA0->TCD[7].SADDR = (uint32_t)&S1_txbuf;
DMA0->TCD[7].SOFF = (uint16_t)4;                 // advance one 32-bit buffer entry
DMA0->TCD[7].DADDR = (uint32_t)&SPI1->PUSHR_SLAVE;
//DMA0->TCD[7].DADDR = (uint32_t)&SPI2->PUSHR;
DMA0->TCD[7].DOFF = (uint16_t)0;                 // always write the same PUSHR register
DMA0->TCD[7].ATTR = DMA_ATTR_SSIZE(2) |          // 2 = 32-bit source accesses
                    DMA_ATTR_DSIZE(2) |          // 2 = 32-bit destination accesses
                    DMA_ATTR_SMOD(0) |
                    DMA_ATTR_DMOD(0);
DMA0->TCD[7].NBYTES_MLNO = 4;                    // one 32-bit word per SPI request
DMA0->TCD[7].SLAST = (uint32_t)(-(S1_BUF_SIZE * 4)); // rewind SADDR to the buffer start
DMA0->TCD[7].DLAST_SGA = DMA_DLAST_SGA_DLASTSGA(0);  // destination address is fixed; scatter/gather not used
DMA0->TCD[7].CSR = DMA_CSR_DREQ_MASK |           // clear ERQ after the major loop (one run only); ELINK and scatter/gather off
                   DMA_CSR_INTMAJOR_MASK |       // enable major-loop interrupt
                   DMA_CSR_BWC(0x0);             // no engine stalls (maximum bandwidth)

DMA trigger:

SPI1->MCR |= SPI_MCR_CLR_RXF_MASK |      // clear rx FIFO
             SPI_MCR_CLR_TXF_MASK;       // clear tx FIFO
DMA0->TCD[6].CITER_ELINKNO = DMA_CITER_ELINKNO_CITER(S1_BUF_SIZE); // reload counts so both channels start in sync
DMA0->TCD[6].BITER_ELINKNO = DMA_BITER_ELINKNO_BITER(S1_BUF_SIZE);
DMA0->TCD[7].CITER_ELINKNO = DMA_CITER_ELINKNO_CITER(S1_BUF_SIZE);
DMA0->TCD[7].BITER_ELINKNO = DMA_BITER_ELINKNO_BITER(S1_BUF_SIZE);
SPI1->MCR &= ~SPI_MCR_HALT_MASK;         // clear HALT: SPI starts transferring
DMA0->ERQ |= DMA_ERQ_ERQ6_MASK;          // DMA will respond to SPI rx (RFDF) requests
DMA0->ERQ |= DMA_ERQ_ERQ7_MASK;          // DMA will respond to SPI tx (TFFF) requests
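Unless SADDR/DADDR are reloaded before every run, the end-of-major-loop adjustments SLAST (source) and DLAST_SGA (destination, with scatter/gather off) decide where the next run starts: a value of 0 leaves an incrementing address one buffer length past where it began. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Net change of SADDR or DADDR over one full major loop, before the
   SLAST / DLAST_SGA adjustment is added. */
static int32_t major_loop_advance(uint32_t citer, uint32_t nbytes,
                                  uint32_t size_bytes, int16_t off)
{
    int32_t accesses = (int32_t)(citer * (nbytes / size_bytes));
    return accesses * off;
}

/* SLAST / DLAST_SGA value that returns the address to its starting point. */
static int32_t rewind_adjust(uint32_t citer, uint32_t nbytes,
                             uint32_t size_bytes, int16_t off)
{
    return -major_loop_advance(citer, nbytes, size_bytes, off);
}
```

For the rx buffer (CITER = 256, NBYTES = 4, 4-byte accesses, DOFF = 4) this gives -1024, and for the fixed POPR source (SOFF = 0) it gives 0; the signed value would be written to the 32-bit register as its two's-complement bit pattern.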

SUMMARY:

DMA_ATTR_SSIZE = 4 bytes

MLNO = 4

CITER = 256

With this setup we expect to send (datasheet): 4 * 4 * 256 = 4096 bytes

But in the test it actually does send 1024 bytes.

Any setting of MLNO other than 4 does not work.

Can you confirm this?

egoodii · ‎09-18-2014

DMA_ATTR_SSIZE does NOT 'figure into' the total transfer size; it only defines the address-bus operation size DMA can 'work with' at the source (in this case 32-bit-wide memory, with DSIZE = 4 bytes for the destination, a 32-bit SPI register; both are encoded as field value 2). The transfer count is JUST the 256 iterations (CITER) of the 4-byte minor-loop byte count, for 1024 bytes. Again, MLNO MUST be 4 for the SPI interactions, so that EACH DMA operation transfers exactly ONE Dword (4 bytes) from memory to SPI for each CITER count (DMA request).
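The byte count above as a small worked sketch (function names are hypothetical): the total is NBYTES × CITER, and the access size only determines how many bus writes each request turns into, so 4 × 256 = 1024 bytes rather than 4 × 4 × 256.

```c
#include <stdint.h>

/* Bytes moved by one complete major loop. SSIZE/DSIZE do not appear:
   they only set the width of each individual bus access. */
static uint32_t major_loop_bytes(uint32_t nbytes, uint32_t citer)
{
    return nbytes * citer;
}

/* Destination bus writes performed per request (one minor loop). */
static uint32_t writes_per_request(uint32_t nbytes, uint32_t dsize_bytes)
{
    return nbytes / dsize_bytes;
}
```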

Let's contrast that with, say, DMA to a UART. It is a 'one byte wide' interface, so DSIZE would be 1 byte. So regardless of the number of bytes in any part or whole transaction, they always get broken into single-byte writes to the same address. But here you have some 'watermark' registers, and a potential FIFO size of 8 bytes. So you MIGHT set a watermark to only request when 7 bytes are 'needed', and then set MLNO to 7 so that a single DMA block transfer moves that many bytes on the single 'TX FIFO HAS [at least this much] ROOM' request. Your iteration count (CITER) would then be total-transfer-size/7. I know, UART is 'more complicated', but I bring it up only to show 'how else' MLNO might be used in a 'more general' case.
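A minimal sketch of that UART split, assuming 7 bytes per watermark request, DSIZE = 1 and DOFF = 0. The uart_dma_plan type and plan_uart_tx() are hypothetical; reporting the remainder separately is an added assumption, since total/7 is only exact when the length is a multiple of 7, and how that tail is sent (a second, shorter TCD or the CPU) is left open.

```c
#include <stdint.h>

typedef struct {
    uint32_t nbytes;     /* MLNO: bytes moved per watermark request */
    uint32_t citer;      /* full requests in the major loop */
    uint32_t tail_bytes; /* leftover bytes that do not fill a whole request */
} uart_dma_plan;

/* Split a UART transmit of `total` bytes into `per_request`-byte minor
   loops; each byte is still a separate write to the same data register. */
static uart_dma_plan plan_uart_tx(uint32_t total, uint32_t per_request)
{
    uart_dma_plan p = { per_request, 0, total };
    if (per_request != 0) {
        p.citer = total / per_request;
        p.tail_bytes = total % per_request;
    }
    return p;
}
```

For 100 bytes this gives CITER = 14 (98 bytes) with 2 bytes left over.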

And for your 'memory to memory' DMA-based-buffer-expansion, you can set MLNO to 'any number of bytes you like' up to the whole transaction, BUT note that 'once started' the DMA process will proceed at the MAXIMUM rate allowed by hardware thru the MLNO count of bytes, at the expense of other potential memory-bus users (like the CPU!). If this 'stall latency' might cause you grief, you will want to set up smaller subtransactions for some 'breaks'.

simonbusman · ‎09-18-2014

Tnx! We use the DMA to avoid CPU overhead (typically from handling many IRQs). If DMA memory accesses block CPU RAM access, what are the benefits of using DMA/crossbar/backdoor?

egoodii · ‎09-18-2014

You would have to find someone 'more versed in the details' of both the ARM bus hierarchy and the DMA engine to know 'how much' RAM-access interference is involved. Unless you run your code from RAM, CPU access to RAM is of a 'limited' nature (say 10% of the CPU cycles?), so it only has to 'squeeze a few in every now and then' to avoid a serious bottleneck. Suffice it to say that K20P81M100SF2V2RM.pdf (to pick a reference manual) Table 22-292 shows that an SRAM-to-SRAM transfer can completely consume the SRAM bandwidth, reading then writing a full Dword every 2 clocks. 'Interruptibility' is a question of hardware priorities. I only brought it up as an example of how one should think about MLNO use.

But I will bring up one other 'little' item of note here: the DMA engine is 'just a little CPU' running hard-coded cycles at the CPU clock rate, and as such is not INHERENTLY 'more efficient' at moving data from memory to memory. So if you're 'spinning on waiting for it', there is little advantage in all the setup overhead; it really only helps when the CPU can 'run some other processes, preferably from ROM'.
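A rough worked estimate using the 'one Dword read plus write every 2 clocks' figure quoted above: it approximates how long one large memory-to-memory minor loop could hold the SRAM. This is a lower bound under that single assumption (channel setup and CPU arbitration ignored), and the helper names and the 100 MHz example clock are illustrative.

```c
#include <stdint.h>

/* Clocks for an SRAM-to-SRAM copy at one 32-bit read + write per 2 clocks. */
static uint32_t sram_copy_clocks(uint32_t bytes)
{
    uint32_t dwords = (bytes + 3u) / 4u; /* round a partial Dword up */
    return 2u * dwords;
}

/* The same figure in nanoseconds for a given core clock in MHz. */
static uint32_t sram_copy_ns(uint32_t bytes, uint32_t core_mhz)
{
    return (uint32_t)((uint64_t)sram_copy_clocks(bytes) * 1000u / core_mhz);
}
```

Expanding a 1024-byte buffer in one minor loop would then occupy about 512 clocks (about 5.1 µs at 100 MHz); splitting it into smaller minor loops bounds each stall accordingly.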
