DISCUSSION: Missing critical RX UART DMA Event Functionality in Freescale Parts

pmt · ‎11-05-2012

I've been through the detailed exercise of writing DMA-driven TX and RX drivers for the Kinetis UARTs. Despite the kitchen sink of functionality included in the UART peripheral, there seems to be some trivial but critical functionality missing in the Kinetis (and Coldfire) implementation that is needed for proper device driver implementation. What’s very disappointing is that these features are quite obvious and are contained in many of the competitors' implementations.

I wanted to create a discussion about why this omission exists (after all this peripheral set has been around for years and has evolved over time), whether these issues are recognized by Freescale as serious deficiencies, and if these problems will be rectified in future parts.

The problems exist on the Receive side. The UART peripheral, along with the DMA engine, is fully capable of transferring all incoming UART data into a memory ring buffer. Once initial DMA and UART setup is performed, no other CPU intervention is ever needed to perform this task. This aspect alone is fantastic, as many vendors screw even this part up in one form or another, usually requiring some impossibly low-latency CPU intervention to chain the next DMA or to wrap a DMA buffer pointer. But Freescale got this right.

What is missing is event (interrupt) notification on certain key UART activity. Event notification is used, particularly in the context of a RTOS, to provide a hardware initiated signal to the device driver that allows the implementation of low latency blocking I/O calls or callbacks.

Simply put, meaningful event notification is non-existent on the UART RX side when used in combination with DMA. This then requires some very cumbersome workarounds, usually held together by shoelaces and bubble gum.

First understand the nature of a basic UART:

- It is a general-purpose asynchronous device
- It is a character oriented stream I/O device

As such, in the most generic sense, no assumptions can be made about the timing of incoming characters, the number of characters, or the content of data. Additionally you often don’t have control of the thing communicating with you in terms of protocol.

Next, the purpose of implementing a DMA in conjunction with the UART is to:

- Reduce or eliminate CPU overhead (and RTOS context switching) needed vs. the classic interrupt driven task of transferring data between the UART and memory.
- Unconditionally eliminate the need for very low latency servicing of the UART, be it at the interrupt or user level, even if this is an occasional interrupt.

With this in mind, there are only two interrupt notifications allowed by the DMA engine:

- Buffer half full
- Major loop completion

Both of these are inadequate as you need to be able to generate an event for activity on the UART as small as a single incoming character. At the same time, you don’t want to generate an event unconditionally on every incoming character. Doing so would defeat the purpose of using DMA in the first place.

In addition to the DMA engine, the UART itself can generate events. Of the most useful ones, most are reserved for ISO 7816 mode functionality. The rest are incompatible with DMA or require control of the protocol (such as line break detection event).

Case in point: The IDLE interrupt functionality would be perfect for event notification. However to clear the IDLE flag you need to:

- Read the S1 STATUS register, which has the side effect of changing critical status bits in the same register that control DMA execution.
- Read the DATA register, which now conflicts with the DMA read, leading to race conditions and FIFO underruns.
- Then you need to re-set the IDLE flag in a way that has its own set of hazards and race conditions.

I’m perplexed why the IDLE flag clearing is this way. There is simply no reason for it. Why not provide a completely independent, bit and instruction atomic way of clearing this flag?

So Freescale really left us hanging. There simply is no meaningful event notification functionality as it relates to the UART RX and DMA. Everything is a workaround, or I am perhaps completely overlooking some obscure feature or technique in the datasheet.

Interestingly enough, there is mention in the datasheet of some orphaned hardware functionality called ILDMAS (do a search for this in the PDF). This allows you to generate a DMA request on an IDLE line condition, having the DMA atomically clear and reset the IDLE flag. Conceivably, you could use this to kick off a dummy DMA transaction that could then trigger an interrupt that could provide event notification. My guess is that this was orphaned along the way to provide some optimization, without realization of the true importance of this feature.

So here’s my list of critical features that are needed:

- IDLE line interrupt: after activity is detected on the line (which enables the counter after a previous clear), then after a programmable number of idle times (say 3 to 512 bits) generate an interrupt. Any activity resets the IDLE counter to zero and defers the interrupt.

- Non-IDLE interrupt: interrupt after a programmable number of consecutive characters assuming an IDLE condition is not otherwise triggered. That is, if say 32 characters in a row are received such that an idle interrupt is not generated, then an interrupt should be generated.

- These flags should have a simple bit and instruction atomic clearing mechanism that works in harmony with the DMA, and independent of any WAKE or SLEEP feature. In fact, this should be true of all the status bits.

Perhaps these features can be combined in a single ‘RxD-edge detect with delayed interrupt’ concept.
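
To make the requested behaviour concrete, here is a minimal software model of the two events in the wish list above. It is only a sketch of the proposed semantics, not of any existing Kinetis hardware: all names (`rx_evt_model_t`, `rx_evt_on_char`, `rx_evt_on_idle_bit`) are hypothetical, and the model assumes it is told about each received character and each idle bit time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the proposed RX event logic (not real hardware). */
typedef enum { RX_EVT_NONE, RX_EVT_IDLE, RX_EVT_COUNT } rx_evt_t;

typedef struct {
    uint32_t idle_limit; /* idle bit times before the IDLE event (say 3 to 512) */
    uint32_t char_limit; /* consecutive characters before the COUNT event (say 32) */
    uint32_t idle_bits;  /* idle bit times seen since the last character */
    uint32_t chars;      /* characters seen since the last event */
    bool     armed;      /* set by line activity, cleared when IDLE fires */
} rx_evt_model_t;

/* A character arrived: arm the idle counter, restart it, count the character. */
rx_evt_t rx_evt_on_char(rx_evt_model_t *m)
{
    m->armed = true;
    m->idle_bits = 0;            /* any activity defers the IDLE event */
    if (++m->chars >= m->char_limit) {
        m->chars = 0;
        return RX_EVT_COUNT;     /* long burst with no idle gap */
    }
    return RX_EVT_NONE;
}

/* One idle bit time elapsed on the line. */
rx_evt_t rx_evt_on_idle_bit(rx_evt_model_t *m)
{
    if (!m->armed) {
        return RX_EVT_NONE;      /* no activity since the last IDLE event */
    }
    if (++m->idle_bits >= m->idle_limit) {
        m->armed = false;        /* fires once, re-armed by the next character */
        m->idle_bits = 0;
        m->chars = 0;
        return RX_EVT_IDLE;
    }
    return RX_EVT_NONE;
}
```

With `idle_limit = 3` and `char_limit = 4`, four back-to-back characters produce one COUNT event, and a single character followed by three idle bit times produces one IDLE event.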

So that’s all that is needed. IMO, there are some simple hardware gate changes that would fix the IDLE interrupt mechanism as it now stands in the Kinetis implementation, and that would go a long way all by itself. But in the general case I’d like to see the two features above.

While I’m at it, I’d like to throw my 2 cents in for some additional UART features which would be really nice. Some of these exist in competitor devices:

- HARDWARE INTERCHARACTER DELAY: The hardware should support inserting a programmable number of idle bits (say 1 to 32) between transmitted characters.
- INDEPENDENT TX/RX BAUD RATES: The TX and RX devices are almost wholly independent peripherals EXCEPT for the baud rate register. No reason it needs to be this way.
- USART: An external hardware CLK line that can be configured as a CLK input or output for USART operation. Having the CLK output is especially helpful when creating a high-speed serial link to an FPGA, for instance. It really simplifies the logic that you need to embed in the FPGA and allows higher speed.
- INTERRUPT ON SPECIFIC CHARACTER: When the UART sees a character such as CR, or 0x00, be able to generate an interrupt (similar to 9-bit or ISO modes but extend these to 8-bit mode as well).

I welcome some discussion on this. I hope Freescale will pick up this thread and provide these features.

Regards,

pmt

michaelguntli · ‎11-17-2015

Very well written; sadly, no one from the Freescale team joined the conversation.

Anyway, if you keep searching, you will find an erratum.

e2584: UART: Possible conflicts between UART interrupt service routines and DMA requests

http://www.freescale.com/files/microcontrollers/doc/errata/KINETIS_4N30D.pdf

pmt · ‎11-18-2015

Michael,

The way I ultimately implemented my RX UART driver is to use a ring buffer using the DMA modulo buffer feature. Incoming characters are fed into the ring buffer completely in hardware. Event notification comes by firing an interrupt on every DMA transfer. The ISR posts a binary semaphore which wakes up the user level RX routine to take characters out of the buffer. I suppress extra semaphore posts by using an interlock flag between the ISR and user service routine.

This creates a lot of interrupts. However, since the ISR does nothing other than notification, no characters are lost even if ISRs are missed during burst transfers.
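
The notification scheme described above could be sketched roughly as follows. This is a host-side simulation under stated assumptions, not the actual driver: `rx_chan_t`, `rx_dma_isr`, `rx_drain` and `sim_dma_byte` are hypothetical names, the DMA write position is modelled as a running byte count, and the binary semaphore is replaced by a simple post counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 64u /* power of two, as the DMA modulo feature requires */

typedef struct {
    uint8_t           buf[RX_RING_SIZE]; /* filled by the DMA in hardware */
    volatile uint32_t dma_count;         /* stand-in for the DMA write position */
    uint32_t          read_count;        /* total bytes taken out by the reader */
    volatile bool     notify_pending;    /* interlock between ISR and reader */
    unsigned          sem_posts;         /* stand-in for binary semaphore posts */
} rx_chan_t;

/* DMA major-loop ISR: notification only, it never touches the data. */
void rx_dma_isr(rx_chan_t *ch)
{
    if (!ch->notify_pending) {
        ch->notify_pending = true;
        ch->sem_posts++;  /* a real driver would post the RTOS semaphore here */
    }
}

/* Simulation helper: what the hardware does for one received byte. */
void sim_dma_byte(rx_chan_t *ch, uint8_t byte)
{
    ch->buf[ch->dma_count % RX_RING_SIZE] = byte;
    ch->dma_count++;
    rx_dma_isr(ch);
}

/* User-level routine, run after the semaphore wakes the task.
 * Call it again if it returns max, since more data may be waiting. */
size_t rx_drain(rx_chan_t *ch, uint8_t *out, size_t max)
{
    size_t n = 0;
    /* Clear the interlock BEFORE sampling the write position, so a byte
     * arriving after the sample still produces a fresh post. */
    ch->notify_pending = false;
    uint32_t wr = ch->dma_count;
    while (ch->read_count != wr && n < max) {
        out[n++] = ch->buf[ch->read_count % RX_RING_SIZE];
        ch->read_count++;
    }
    return n;
}
```

Clearing the flag before reading the write position means the worst case is a spurious wake-up that finds an empty buffer, which is harmless. The opposite order could leave a byte sitting in the buffer with no notification pending.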

PMT

noisternig · ‎06-01-2017

Hi,

I know it has been a long time since you wrote this, but hopefully you can help me a bit:

How did you generate an Interrupt for every received character? By setting minor and major loop of the DMA to 1? How would this work together with the modulo feature?

Wolfgang

pmt · ‎08-23-2017

Just setting the major loop. This gets an interrupt on every character.

// Generate interrupt on major loop (every character in this case)
DmaPtr->TCD[DmaRxChan].CSR = DMA_CSR_INTMAJOR_MASK;

The modulo feature just does the hardware pointer wrapping on the ring buffer (no CPU intervention). As long as you take characters out of the ring buffer faster than the DMA puts them in (before filling up) you are good.
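
As an illustration of the pointer wrapping, here is a small model of the address arithmetic, assuming a power-of-two buffer aligned to its own size so that only the low address bits change. The function names and the `mod_bits` parameter are hypothetical and are not taken from the Kinetis headers; `mod_bits` must be between 1 and 31.

```c
#include <stdint.h>

/* Model of a modulo address update: only the low mod_bits bits of the
 * address may change, the upper bits stay frozen. */
uint32_t dma_modulo_next(uint32_t addr, int32_t offset, unsigned mod_bits)
{
    uint32_t mask = (1u << mod_bits) - 1u;
    return (addr & ~mask) | ((addr + (uint32_t)offset) & mask);
}

/* Bytes waiting in the ring, given the buffer base, the current DMA
 * destination address and the reader's index into the buffer. */
uint32_t ring_available(uint32_t base, uint32_t dma_addr,
                        uint32_t read_idx, unsigned mod_bits)
{
    uint32_t mask = (1u << mod_bits) - 1u;
    uint32_t write_idx = (dma_addr - base) & mask;
    return (write_idx - read_idx) & mask;
}
```

With `mod_bits = 6` (a 64-byte ring), an address at offset 63 wraps back to offset 0. Note that equal read and write indices are treated as empty, so a ring that is allowed to fill completely is indistinguishable from an empty one.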

PMT

scottm · ‎10-20-2017

I'm glad I'm not the only one who's run up against this. It's frustrating. I also settled on using INTMAJOR for notifications, but at 2 Mbps there's no way I can afford an IRQ every byte. Instead, I set the major loop count to something convenient and then I have a periodic timer interrupt that checks for an idle condition. This part is really sub-optimal.
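
A rough sketch of the timer-based idle check, for anyone trying the same thing. It assumes the timer ISR can read the current DMA write index and the reader's index; `idle_poll_t` and `idle_poll_tick` are hypothetical names, and the threshold is counted in timer ticks rather than bit times.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_write_idx; /* DMA write index seen on the previous tick */
    uint32_t quiet_ticks;    /* consecutive ticks with no DMA movement */
    uint32_t idle_threshold; /* quiet ticks that count as an idle line */
    bool     reported;       /* idle already reported for this burst */
} idle_poll_t;

/* Call from the periodic timer ISR. Returns true once per burst, when
 * unread data is pending and the DMA has not moved for idle_threshold ticks. */
bool idle_poll_tick(idle_poll_t *p, uint32_t write_idx, uint32_t read_idx)
{
    if (write_idx != p->last_write_idx) {
        /* New data arrived since the last tick: restart the count. */
        p->last_write_idx = write_idx;
        p->quiet_ticks = 0;
        p->reported = false;
        return false;
    }
    if (write_idx == read_idx || p->reported) {
        return false; /* nothing pending, or the reader was already told */
    }
    if (++p->quiet_ticks >= p->idle_threshold) {
        p->reported = true;
        return true;
    }
    return false;
}
```

The worst-case notification latency is roughly `idle_threshold` timer periods plus one, which is the price paid for not taking an interrupt on every byte.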

I'm close to a solution with the IDLE interrupt, but it relies on CTS/RTS flow control. The problem in my case is that the other device is slow to respond to RTS, and imposing a long enough wait to be sure it's safe to clear IDLE and reset the FIFO kills the performance. If only there were a safe, atomic clear for IDLE, this would be a piece of cake!

My other big complaint is that there's no separation between the FIFO's DMA trigger threshold and the flow control threshold. If you want hardware flow control and you want DMA on every byte, it's going to raise RTS on every byte. This slows things down, and in the case of the SiLabs WGM110 module, it locks up the module completely.

For now I'm using both the periodic timer and INTMAJOR to check if RTS needs to be set. It's not ideal, but it leaves enough margin that a bit of a delay isn't going to mean lost data.

If I have time, I might try using HALFINT and the scatter/gather option or channel linking to set a fallback that raises RTS if the DMA transfer is ever allowed to complete - i.e., if the system doesn't service the first interrupt fast enough and it'd run out of buffer space, it'll hit the end of the transfer and then start a new transfer that sets RTS and hopefully stops the sender before the FIFO is full.

I'm using a protocol that does let me know how many bytes to expect so my next job is to set up the DMA transfers accordingly, but it still has to use the circular buffer since more packets may be coming that might be missed before the transfer is restarted, and that means I have to be able to change the major loop iteration count accurately on the fly. I'm not sure yet if that's going to be a problem.

Scott

mjbcswitzerland · ‎11-06-2012

Hi

Some additional points on the "wish-list" for UART Rx functionality:

- automatic XON/XOFF operation (almost impossible with DMA Rx without the UART recognising the characters and reacting accordingly - see the MC68302's UARTs which can do this - although it does use a dedicated communication processor...)

- HW inter-character delay recognition on reception - Modbus RTU framing is popular (also outside of MODBUS) and requires recognising 1.5 and 3.5 character delays in a reception stream. Again, with Rx DMA, impossible at the moment. See the old ATMEL AT91SAM7X - very nice support for things like this!
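
For reference, the two delays can be worked out as below. This is a sketch that assumes 11 bits per character on the wire, as in Modbus RTU, and follows the Modbus serial line specification's recommendation of fixed values (750 µs and 1.750 ms) above 19200 baud. The function names are hypothetical, results are truncated to whole microseconds, and `baud` must be non-zero.

```c
#include <stdint.h>

/* A Modbus RTU character is 11 bits on the wire:
 * start + 8 data + parity (or second stop) + stop. */
#define MODBUS_BITS_PER_CHAR 11u

/* Inter-character timeout t1.5 in microseconds. */
uint32_t modbus_t15_us(uint32_t baud)
{
    if (baud > 19200u) {
        return 750u; /* fixed value recommended above 19200 baud */
    }
    /* 1.5 chars * 11 bits / baud, scaled by 10 to stay in integers */
    return (uint32_t)((15ull * MODBUS_BITS_PER_CHAR * 1000000ull) / (10ull * baud));
}

/* Inter-frame delay t3.5 in microseconds. */
uint32_t modbus_t35_us(uint32_t baud)
{
    if (baud > 19200u) {
        return 1750u; /* fixed value recommended above 19200 baud */
    }
    return (uint32_t)((35ull * MODBUS_BITS_PER_CHAR * 1000000ull) / (10ull * baud));
}
```

At 9600 baud this gives about 1718 µs and 4010 µs, which is the resolution a hardware inter-character timer would need to offer for this framing to work with Rx DMA.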

- I didn't check whether break condition interrupts are supported (they work on the Coldfires to signal that a complete frame reception has terminated, and so are useful for DMA Rx if the protocol uses them). If they are, great - if not, they could be useful.

Regards

Mark

http://www.uTasker.com

pmt · ‎11-06-2012

Mark,

The 10-bit break condition interrupt, along with all standard Framing Error conditions, is hopelessly incompatible with DMA because they all have the same non-atomic clearing mechanism, and they require a data register read to boot.

Extended breaks (11+ bits, i.e. LIN bus breaks), have atomic clearing with no side effects to DMA.

Again, there are references to deleted DMA functionality in the datasheets that would have reconciled IDLE, Break, and Framing Errors with DMA: ILDMAS (Idle DMA request), LBKDDMAS (Break DMA request), and Framing Error DMA requests. These were probably deleted by an engineer who never wrote a device driver.

Pmt
