K22 SPI DMAs with "Transmit or Receive" Sources Revisited

matt_boytim · ‎09-18-2019

Has anyone used the suggestion here by Mark Butcher

K22 SPI DMAs with "Transmit or Receive" Sources

for overcoming the single DMA source limitation of some SPIs? Specifically, the suggestion:

3. If both are used, the Rx trigger needs to be selected (the Tx trigger occurs too early and so Rx data would not be ready on first transfer). This can be used to trigger the next Tx transfer, followed by chaining to a second DMA transfer which saves the Rx value.

I tried this but it doesn't seem to work. The problem appears to be that when the RX request is used to trigger a TX transfer, the DMA request doesn't get cleared, so the DMA keeps sending data to the transmitter until, I guess, the chained RX DMA actually transfers the receive data and clears the request. This happens even though I am using a higher DMA priority for the chained-to RX channel.

Looking at the SPI traffic on a scope, I see fewer transfers than I expect, and the state of the (hardware-driven) chip select seems erratic - it depends on which TX transfers are lost/overwritten. For example, if I lose the one that deasserts CS at the end of the SPI transaction, then CS hangs asserted until the next frame.

If my SPI frame consists of only one or two transfers then it works as expected - since the transmitter is double buffered it can't be overrun with only two transfers. But I see this behavior for three or more.

I did not try channel preemption to see if that might allow the RX transfer to occur sooner and avoid the overwritten TX data.

Thanks,

Matt

matt_boytim · ‎09-18-2019

Thanks Mark for the reply. That will work for the DMA but I need a total solution. I had the idea to use the RX to trigger the TX independently, and it leads to a nice total solution - except it didn't work. In searching for other solutions I saw your suggestion to do what I was doing, so I wondered if it could work.

The problem with using the RX to trigger the RX DMA is that each transfer from the SPI receiver needs a separate trigger, so the minor loop must have a loop count of 1. The major loop must then cover a single SPI transaction, since it needs to stop at the end of the SPI frame.

In my application I am streaming sample data from a radio chip continuously at the sample rate. The radio chip issues a sample pulse which triggers the read of one sample via SPI, then on the next sample pulse it does the same, and so on. I want to at least double/ping-pong buffer the data. So the problem is: if the minor loop is one SPI byte/word and the major loop is one SPI frame, then I need a third loop for the double/ping-pong buffer. I could chain the RX to the TX and then, I suppose, chain further to get a ping-pong buffer via some sort of DMA acrobatics (scatter/gather maybe?). I don't really want just a ping-pong of a single SPI frame. I really want a block of, say, 64 samples, so each of the ping and pong buffers isn't one SPI frame but 64 SPI frames.

The TX data just consists of the register address to read within the radio chip and dummy data since you have to send something in order to receive something. But this same data is sent for each SPI transaction - so TX just needs a double loop.

If I chain TX to RX then the RX minor loop is still a single transfer, but the outer loop can be the full ping-pong buffer since it doesn't need to stop at the end of the frame - the TX will stop, so the RX will do only as many transfers as the TX.
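The loop arithmetic behind this arrangement can be sketched in plain C. This is only an illustration of the counting, not driver code: the struct and function names are hypothetical and do not correspond to eDMA TCD register fields, and 8-bit transfers are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one DMA channel's loop structure.
 * Illustrative only; these are not eDMA TCD register names. */
typedef struct {
    size_t bytes_per_request;   /* minor loop: bytes moved per trigger */
    size_t requests_per_major;  /* major loop: triggers before the channel stops or wraps */
} dma_loops_t;

/* TX describes exactly one SPI frame, then stops until re-enabled. */
dma_loops_t tx_loops(size_t frame_bytes)
{
    assert(frame_bytes > 0);
    dma_loops_t l = { 1, frame_bytes };
    return l;
}

/* RX is chained-to from TX, so it never has to stop at a frame boundary.
 * Its major loop can span both halves of the ping-pong buffer. */
dma_loops_t rx_loops(size_t frame_bytes, size_t frames_per_block)
{
    assert(frame_bytes > 0 && frames_per_block > 0);
    dma_loops_t l = { 1, 2 * frame_bytes * frames_per_block };
    return l;
}

/* Which half (0 = ping, 1 = pong) the RX channel is filling after
 * the given number of completed requests. */
int rx_active_half(dma_loops_t rx, size_t requests_done)
{
    size_t pos = requests_done % rx.requests_per_major;
    return (pos < rx.requests_per_major / 2) ? 0 : 1;
}
```

With an assumed 4-byte frame and the 64-frame block mentioned above, rx_loops(4, 64) gives a major count of 512 requests, and rx_active_half() switches from 0 to 1 at request 256, while tx_loops(4) still describes only the 4 requests of one frame.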

So I am using three DMA channels. The first is triggered from the GPIO pin which receives the sample pulse, and it enables (SERQ) the TX DMA, which is chained to the RX DMA. The problem with using the TX trigger is that it immediately triggers twice (because the SPI TX is double buffered), so you get two RX transfers before anything was actually received. It might be possible to pipeline things to offset the transmitter and receiver. I expect the first two values received to be junk, but for some reason the third value is also always junk (it is always a repeat of the second byte) - I assume this is some artifact of overrunning the SPI receiver.
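The early-trigger effect can be illustrated with a toy model in C. It assumes a transmitter that holds two words (per the "double buffered" description) and asserts its request whenever it has room, with each TX write chaining straight to an RX read. The function name and the model itself are hypothetical simplifications, not a description of the real SPI module, and the model does not reproduce the repeated third value.

```c
#include <assert.h>
#include <stddef.h>

#define TX_DEPTH 2  /* assumption: the transmitter holds two words */

/* For each chained RX read i, record how many words had actually been
 * received (fully shifted) at the moment that read took place. */
void simulate_tx_triggered(size_t n_words, size_t received_at_read[])
{
    size_t queued = 0;   /* words currently held by the transmitter */
    size_t shifted = 0;  /* words fully shifted out, i.e. words received */
    size_t written = 0;  /* TX DMA writes performed so far */

    while (written < n_words) {
        if (queued < TX_DEPTH) {
            /* TX request asserted: write one word, then the chained
             * RX channel reads the receiver immediately afterwards. */
            queued++;
            received_at_read[written] = shifted;
            written++;
        } else {
            /* Transmitter full: wait for one word to finish shifting. */
            assert(queued > 0);
            queued--;
            shifted++;
        }
    }
}
```

For a four-word frame this model records 0, 0, 1, 2: the first two RX reads happen before anything has been received, and every later read lags the transmitter by the buffer depth.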

If I could use the RX to trigger the TX then I can completely avoid these problems. I just skew the TX SPI frame and I write the first value to the SPI to 'prime the pump' so something is ready to trigger the DMA. This wouldn't take advantage of the SPI double buffering and might have small gaps between transfers and a small speed penalty as you pointed out, but otherwise it would be a good total solution. But as I said, it doesn't seem to work and the root cause seems to be that you can't trigger the TX from RX because it doesn't clear the DMA request.

Thanks,

Matt

mjbcswitzerland · ‎09-18-2019

Matt

How about using two DMA channels - both triggered by the Rx? Each set to have the transfer count of the complete frame and ping-pong buffers? Since the DMA trigger at the very end of the Rx byte triggers both a read and a write all flags should be cleared.

I think that this is "officially" illegal to do since one should not trigger multiple DMA channels with a single trigger source but in my experience it practically works. (I have not used it in a real product though since I have seen a warning that it 'could' give inconsistent behavior but if it does basically work you could contact NXP to get more details about whether there could "really" be side-effects in your actual case).

Regards

Mark

matt_boytim · ‎09-23-2019

Mark,

I don't think it will work to assign two channels to one source. I have done that accidentally before and it acted like the source was connected to only one channel, as if there is a priority in the DMAMUX. But even if it did work, I don't really want the transmit TCD to describe the full ping-pong buffers because that would require the triple loop. Whatever DMA is triggered by the SPI has to have an inner loop of 1, because it needs to be paced by the SPI, and a next loop of the number of transfers in a frame (bytes, assuming 8-bit transfers). The full ping-pong buffer would therefore take a triple loop, so it's convenient that the transmitter only needs to describe a single SPI frame with a double loop. Since the receiver is chained-to from the transmitter, it also needs an inner loop of 1, but it doesn't need to stop at the end of an SPI frame because the transmitter will stop, so its outer loop can describe the full ping-pong buffer.

Thanks,

Matt

mjbcswitzerland · ‎09-18-2019

Matt

Try chaining the other way around:
1. Use the Rx DMA trigger to read the Rx
2. Now send the next byte by chaining to the next DMA channel

Regards

Mark