Efficient interrupt-driven use of the UART FIFO

scottm
Senior Contributor II

I'm using UART0 on a K22F at 1 Mbps, and will need to go faster later.  Getting an interrupt for every incoming byte is inefficient, so I've got the FIFO RX watermark set to 6 bytes with hardware flow control enabled, to account for the sending device potentially taking a byte or two before honoring RTS.
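
A minimal sketch of that setup, assuming the standard Kinetis device header and register names (baud rate and pin mux configuration omitted):

static void uart0_rx_fifo_init(void)
{
    /* Sketch only - assumes the Kinetis device header (e.g. MK22F51212.h). */
    UART0->C2 &= (uint8_t)~(UART_C2_RE_MASK | UART_C2_TE_MASK); /* RX/TX off while changing FIFO config */
    UART0->PFIFO |= UART_PFIFO_RXFE_MASK;             /* enable the receive FIFO */
    UART0->CFIFO |= UART_CFIFO_RXFLUSH_MASK;          /* flush so the FIFO pointer starts aligned */
    UART0->RWFIFO = 6;                                /* RX watermark: interrupt / RTS threshold */
    UART0->MODEM |= UART_MODEM_RXRTSE_MASK;           /* deassert RTS when the FIFO reaches the watermark */
    UART0->C2 |= UART_C2_RIE_MASK | UART_C2_RE_MASK | UART_C2_TE_MASK; /* RX interrupt on, RX/TX back on */
}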

The trouble is that the FIFO will never generate an interrupt if it only fills partway.  I could poll it periodically, but polling at high frequency defeats the purpose of using the FIFO and polling more slowly produces unacceptable latency, so I'm using the IDLE interrupt to notify my driver that the sender is done and the FIFO should be read.

Clearing the IDLE flag requires reading S1 with IDLE set and then reading D.  If a byte comes in after the FIFO count is read, what ought to be a dummy read to clear IDLE gets actual data and the byte is lost.  To avoid that, I'm checking the FIFO RX underflow flag after reading D to find out if it was real data or not.  That works, but it introduces a new problem because reading D with the FIFO empty causes the FIFO pointer to become misaligned and it needs to be flushed - again potentially losing data that might have just come in.
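
To make the race concrete, the sequence looks roughly like this (a sketch only, using the standard Kinetis register names and vector naming; rx_put() is a placeholder for the driver's buffer routine):

extern void rx_put(uint8_t byte);   /* placeholder: push a byte into the driver's buffer */

void UART0_RX_TX_IRQHandler(void)
{
    uint8_t s1    = UART0->S1;      /* first half of the IDLE/RDRF flag-clearing sequence */
    uint8_t count = UART0->RCFIFO;  /* snapshot of the RX FIFO fill level */

    for (uint8_t i = 0; i < count; i++) {
        rx_put(UART0->D);           /* draining also clears IDLE if anything was buffered */
    }

    if ((s1 & UART_S1_IDLE_MASK) && (count == 0)) {
        /* The FIFO looked empty, so a dummy read of D is needed to clear IDLE.
         * If a byte arrived after RCFIFO was sampled, this read returns real data. */
        uint8_t d = UART0->D;
        if (UART0->SFIFO & UART_SFIFO_RXUF_MASK) {
            /* Genuine dummy read: clear the underflow flag and flush to realign
             * the FIFO pointer - the step that can itself race with new data. */
            UART0->SFIFO = UART_SFIFO_RXUF_MASK;
            UART0->CFIFO |= UART_CFIFO_RXFLUSH_MASK;
        } else {
            rx_put(d);              /* it was real data after all */
        }
    }
}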

What is the best way to make use of the FIFO to generate the fewest possible interrupts with the lowest latency without creating the possibility of losing data due to a race condition?

Thanks,

Scott


mjbcswitzerland
Specialist V

Scott

Since I don't have experience using interrupt-driven RX and the idle line interrupt at the same time, I won't try to answer the actual question. Instead I would suggest also looking at DMA reception, since it makes efficient and reliable operation with low latency easy to achieve, up to the highest rates.

The latency that you mention is not yet defined, however: is it latency at the driver level, maximum latency to avoid overruns (or possibly unnecessary CTS negations), or latency at the application level in reacting to the end of a received "message"?

The simplest DMA-based method (which is interrupt-free and will probably never cause any CTS negation, even at the highest speeds) is free-running DMA reception into a (large) circular buffer. The application can check this periodically, or when there is nothing else to do, and read out whatever data is present to make room for further reception.
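
As a rough sketch of the idea (not taken from any particular driver): with the TCD configured elsewhere so that DADDR wraps over the buffer (for example using the address-modulo feature), the application can work out the fill level directly from DADDR:

/* Free-running UART RX DMA into a circular buffer on eDMA channel RX_DMA_CH.
 * Sketch only - RX_DMA_CH, buffer size and alignment are assumptions. */
#define RX_DMA_CH    0u
#define RX_BUF_SIZE  1024u                 /* power of two, buffer aligned for modulo wrapping */

static uint8_t  rx_buf[RX_BUF_SIZE] __attribute__((aligned(RX_BUF_SIZE)));
static uint32_t rx_read_index;             /* application read position */

/* Bytes waiting in the circular buffer (cannot distinguish completely full
 * from empty - sizing/flow control must prevent the buffer ever filling). */
static uint32_t uart_rx_count(void)
{
    uint32_t write_index = DMA0->TCD[RX_DMA_CH].DADDR - (uint32_t)rx_buf;
    return (write_index - rx_read_index) & (RX_BUF_SIZE - 1u);
}

/* Copy out up to 'max' bytes; call from the main loop or a periodic timer. */
static uint32_t uart_rx_read(uint8_t *dst, uint32_t max)
{
    uint32_t n = uart_rx_count();
    if (n > max) {
        n = max;
    }
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = rx_buf[rx_read_index];
        rx_read_index = (rx_read_index + 1u) & (RX_BUF_SIZE - 1u);
    }
    return n;
}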

I expect the idle line interrupt can also be used in parallel if message triggering is required, to further reduce the application latency in reacting to a complete message (assuming the message is followed by an idle condition). In many situations, though, buffer polling by the application ("background task") will be just as fast, as long as there are not many different 'tasks' operating at the same time.

Even if the optimised FIFO interrupt plus IDLE line interrupt operation is solved and perfected, I don't expect it will be able to match a fairly simple DMA-based strategy with regard to CPU overhead and UART throughput.

Regards

Mark

scottm
Senior Contributor II

Hi Mark,

I find that DMA operation is usually harder to debug and I wanted to have a solid understanding of what's going on with the UART before I try using DMA.  Do you know where I might find some good example code that doesn't rely on KSDK?  I'm stuck on CW 10.7 for most of my projects for the foreseeable future.

The handshaking latency is fine; with hardware flow control enabled, it raises RTS immediately when the threshold is hit and the sending device *usually* honors that immediately.  It's the latency in getting the data to the application that I'm worried about.

I will still need flow control even if I set aside all of the DMA buffer space I can.  Of all of the firm real-time tasks the application needs to handle, the UART data is possibly the only one that can be postponed briefly as long as flow control is working.  Being able to rely on flow control taking care of a rare missed deadline makes the rest of the system quite a bit easier to deal with.

Last time I looked at DMA for this, one of the problems was that to get a transfer after every byte, the FIFO watermark has to be set to 1.  The problem is that the same watermark also controls flow control, so RTS pulses briefly after every byte.

The sending device is a SiLabs WGM110, and their first firmware release was totally broken in its flow control handling: a brief pulse on RTS would frequently cause their software-emulated UART to stop sending permanently, until a hard reset.  The next release would only rarely freeze - after maybe 30-60 seconds of data instead of a few hundred bytes - but that was still too frequent.  I haven't had the latest beta release freeze yet, but I don't have 100% confidence in it, and in any case every microsecond that RTS is raised is a microsecond that it's not sending data.  I also don't like throwing another 1 MHz square wave into the mix when I'm trying to keep noise down.  Using the FIFO reduces that by at least a factor of 6, and since many of the packets coming in are 4-5 bytes it sometimes eliminates it entirely.

Trying to squeeze those last bytes out of the FIFO in DMA mode when the line goes idle seems at least as problematic as getting them out in interrupt mode, though.  I'll deal with the 1 MHz RTS pulses if I have to; if SiLabs has actually fixed the bug, the rest of it can be dealt with.

Background polling in this case is complicated by the fact that when there's not much activity the MCU will be in WFI mode most of the time to save power.

If anyone from NXP is paying attention, what I'd love to see in a future version is a separate flow control watermark, so the interrupt/DMA trigger is decoupled from the flow control trigger, and an option for an idle line to generate an IRQ or DMA request (only once!) when there is data in the FIFO.

Thanks,

Scott

mjbcswitzerland
Specialist V

Scott

RTS/CTS is not needed with DMA (although it can be used if it doesn't cause issues with the other side) since the RX will never miss a character. The SW buffer just has to be large enough that it doesn't overflow (that is, the application SW must on average read fast enough that the circular buffer never becomes full). HW RTS/CTS will not help in this case since it won't trigger.

If you use low power modes you won't be able to receive UART data at high speeds. VLPS will work up to about 56 kBaud due to the time it takes the processor to move from VLPS to RUN, so I don't expect you will be able to use anything lower than WAIT, in which the UART and DMA are still fully functional. There are however various different (incompatible) K22 parts, some with LPUART, so the exact type needs to be known to be sure. Depending on the application latency requirements, a regular HW timer wake-up to check for UART reception may be suitable (e.g. at 8 MBaud, a 16 kByte DMA buffer and a 10 ms HW timer wake-up would allow a safe 10 ms latency, with up to 8k of RX data waiting to be processed).
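
A minimal sketch of such a periodic check using a PIT channel (BUS_CLOCK_HZ and uart_rx_process() are placeholders; standard Kinetis register names assumed):

#define BUS_CLOCK_HZ  60000000u            /* placeholder: actual bus clock of the part */

extern void uart_rx_process(void);         /* placeholder: drain the DMA circular buffer */

void pit0_start_10ms(void)
{
    SIM->SCGC6 |= SIM_SCGC6_PIT_MASK;                    /* clock the PIT module */
    PIT->MCR = 0;                                        /* enable the PIT */
    PIT->CHANNEL[0].LDVAL = (BUS_CLOCK_HZ / 100u) - 1u;  /* 10 ms period */
    PIT->CHANNEL[0].TCTRL = PIT_TCTRL_TIE_MASK | PIT_TCTRL_TEN_MASK;
    NVIC_EnableIRQ(PIT0_IRQn);
}

void PIT0_IRQHandler(void)
{
    PIT->CHANNEL[0].TFLG = PIT_TFLG_TIF_MASK;            /* clear the timer interrupt flag */
    uart_rx_process();                                   /* wakes the part from WAIT and drains the buffer */
}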

There is complete and mature RX DMA in the uTasker Open Source project, which has been proven in intensive industrial use and has no reliance on KSDK, in case you require a reference or immediately complete operation:

Web:      https://github.com/uTasker/uTasker-Kinetis
HTTPS:    https://github.com/uTasker/uTasker-Kinetis.git
SSH:      git@github.com:uTasker/uTasker-Kinetis.git

Regards

Mark

scottm
Senior Contributor II

RX will never miss a character, but the application needs to get to it in time, and I've got only so much RAM to devote to buffers.  It just makes my life easier if I know that one data stream can slip a deadline and not lose data.  In some modes (like streaming audio) slipping a deadline only degrades performance, but missing a byte causes the parser to get de-synchronized with the sender, and it can miss many packets before it finds a valid header again.  I can probably get what I need by setting up a half-complete interrupt or something on the DMA channel and setting RTS/CTS from there.
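
Something along these lines, perhaps (a sketch only; channel 0 and the helper names are placeholders):

#define RX_HIGH_WATER  512u                         /* placeholder threshold, in bytes */

extern uint32_t uart_rx_buffered_bytes(void);       /* placeholder: circular-buffer fill level */
extern void     uart_rts_pause_sender(void);        /* placeholder: deassert RTS via GPIO */

void uart_rx_dma_enable_irqs(void)
{
    /* Interrupt at the half-way and end points of the major loop. */
    DMA0->TCD[0].CSR |= DMA_CSR_INTHALF_MASK | DMA_CSR_INTMAJOR_MASK;
    NVIC_EnableIRQ(DMA0_IRQn);                      /* channel 0 has its own vector on the K22 */
}

void DMA0_IRQHandler(void)
{
    DMA0->CINT = 0;                                 /* clear the channel 0 interrupt request */
    if (uart_rx_buffered_bytes() > RX_HIGH_WATER) {
        uart_rts_pause_sender();                    /* tell the sender to hold off */
    }
}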

WAIT is what I'm using.  It may end up using VLPS at some point but only in standby modes.  The WGM110 is a WiFi module so it's not going to be waking the system up from a very low power mode since it's a power hog itself and will only have unexpected data if it's connected.  Using WAIT during normal operation is an easy 20 mA savings, though.

I'll check out the uTasker code, thank you very much!

Scott

mjbcswitzerland
Specialist V

Scott


The uTasker code has single and half-buffer interrupt options as well. It also supports all low power modes with dynamic switching - WAIT mode is its default, meaning that it will automatically be used when no tasks are in a RUN state.

A further advantage is that it runs the same code on all Kinetis parts, so no redesign (or repeated development cycle) is needed if a design is moved to a lower-power KL device (its KL DMA interface emulates the K DMA interface for compatibility, also emulating half-buffer DMA interrupts).

Regards

Mark

scottm
Senior Contributor II

Hi Mark,

I've got my code using a circular buffer for incoming UART data now, but it seems less than ideal in its current state.

I'm communicating with a WiFi module that has a lot of back-and-forth with small, variable-length packets and then long stretches of data at high speed.  The host must wait for a response to every command packet before issuing another one, so latency on small transactions piles up fast.  At the moment I'm still polling the DMA channel rather frequently to check for received data, and at the very least I have to poll often enough to ensure that I have time to drop RTS if the buffer is filling up.  Doing that is risky unless and until SiLabs fixes the flow control bug, so I can't do it unnecessarily.  Slow external flash access means I have to deal with relatively long periods where I can't take more data while I'm receiving a file, and buffering alone is impractical.  Other data streams are processed on the fly and need high speeds, so reducing the baud rate isn't an option.

Packet headers are always 4 bytes and contain the expected payload length, so I could set up my RX function to take the expected length and use the DMA major-complete interrupt for notification that the transfer is done.  The problem with that is that once the major loop completes the DMA transfer isn't running, and the RX FIFO could overflow if more data comes in before it's restarted.  I could maybe chain the DMA channel to another one that would write any unexpected bytes into the circular buffer, but then I've got the added complexity of having to piece buffers back together again.

I could try to use the UART's status interrupt to get an initial notification of incoming data and then set my UART polling timer only when I know there's data coming, but that brings me back to the original problem of not being able to safely clear the RDRF flag.

What I'd really like is a configurable CITER compare interrupt, so I could have the DMA controller notify me when the expected number of bytes have been received.

Scott

codyhubman
Contributor III

Did you ever get this answered? I'm running into similar UART issues where my UART receive keeps getting invaded by noise while I'm in the process of transmitting.

scottm
Senior Contributor II

I never did find a way to safely use the IDLE interrupt, no.  The system I ended up with is tuned for this specific application, which happens to be interfacing with a SiLabs WGM110 WiFi module.  It's all packetized, and each packet has a 4-byte header and a variable-length payload.  My code has the DMA controller writing received UART data continuously into a circular buffer, and a function call tells the driver how many bytes to expect next - initially 4.

When an expected count is given, it pauses DMA by setting CR[HALT], waiting for CSR[ACTIVE] to clear, setting CERQ, and clearing CR[HALT].  This has to happen in a critical section, and it allows DADDR to be read consistently.  If the requested bytes aren't already in the buffer, the number of remaining bytes is set in TCD[n][BITER] and TCD[n][CITER] with INTMAJOR enabled so that when the rest have been received an interrupt is generated.  DMA transfers are then resumed using SERQ.  Flow control is handled in a separate timer and just relies on having a large enough margin that the buffer won't overrun before it can de-assert CTS.

From what I remember, the tricky part in all of that was reading DADDR to get an accurate count without dropping data from the UART.  There's some kind of undocumented issue with the UART DMA request that can cause it to drop a byte - I know I made a post about it somewhere.  The fix for that was to halt the DMA engine completely, wait for completion, and disable the channel's ERQ.  That means you've got to take care of your DADDR read and restart DMA before the UART's FIFO fills up.
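
In outline, the sequence looks something like this (a sketch of the approach rather than the actual driver code; channel 0 and the helper names are placeholders):

extern uint32_t dma_ring_count(uint32_t daddr);     /* placeholder: bytes already in the circular buffer */
extern void     uart_rx_packet_ready(void);         /* placeholder: notify the caller */

void uart_rx_expect(uint32_t expected)
{
    __disable_irq();                                 /* the whole sequence runs in a critical section */

    DMA0->CR |= DMA_CR_HALT_MASK;                    /* halt the eDMA engine */
    while (DMA0->TCD[0].CSR & DMA_CSR_ACTIVE_MASK) {
        ;                                            /* wait for any in-flight minor loop to finish */
    }
    DMA0->CERQ = 0;                                  /* disable channel 0 hardware requests */
    DMA0->CR &= ~DMA_CR_HALT_MASK;                   /* release the halt for the other channels */

    /* DADDR can now be read consistently to work out how much is buffered. */
    uint32_t buffered = dma_ring_count(DMA0->TCD[0].DADDR);

    if (buffered < expected) {
        uint32_t remaining = expected - buffered;
        /* Retarget the major loop so an interrupt fires after 'remaining' more bytes
         * (one byte per request, so CITER counts bytes). */
        DMA0->TCD[0].CITER_ELINKNO = (uint16_t)remaining;
        DMA0->TCD[0].BITER_ELINKNO = (uint16_t)remaining;
        DMA0->TCD[0].CSR |= DMA_CSR_INTMAJOR_MASK;
    }

    DMA0->SERQ = 0;                                  /* resume channel 0 requests before the FIFO fills */
    __enable_irq();

    if (buffered >= expected) {
        uart_rx_packet_ready();                      /* the requested bytes were already waiting */
    }
}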

It's not the prettiest setup, but it works.

Scott