Hardware flow (RTS/CTS) on AUARTs on i.MX28 not working.

HectorPalacios
Senior Contributor I

Hello,

I'm trying to communicate from an MX28 AUART port (as transmitter) to another target (as receiver) at baud rates of 115200 or higher (230400, 576000) with hardware flow control, but it doesn't seem to work. It works for some time, but at some point, after receiving a stop signal from the receiver (RTS deasserted on the receiver side), the MX28 stops sending forever, even when the receiver tells it to resume sending (RTS asserted again on the receiver side).

I have backported some patches from mainline (such as commit 00592021010ad86d3b26bac7034034f6af145a2c), but it still fails.

Has anybody verified RTS/CTS hardware flow control on the AUART against a different target? (It could be another MX28 too, but not another port on the same target.) Kernel version 2.6.35, driver mxs-auart.c.

Thanks

--

Héctor

sethbollinger
Contributor II

I'm adding this reply in case it helps anyone in the future.

Sometimes, during a ~6 us CTS pulse, the UART hardware gets confused: you are no longer able to clear the CTSMIS bit, and you stop receiving CTS interrupts. We found that the bit can be cleared by doing the following: __raw_writel(0, s->port.membase + HW_UARTAPP_INTR_CLR). I'm aware that this makes no sense. :) However, if you do this, CTS will function normally until the next 6 us pulse (which hopefully shouldn't happen often).

You cannot remove uart_handle_cts_change(), as Linux depends on it to wake the port if you begin the transfer before CTS is high (to test this, open the port and start sending before the other side has opened its port).


Freescale's response is to clear and set CTSMIEN.  In our testing, this seemed to work as well.
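For reference, MXS-family registers such as HW_UARTAPP_INTR have SET and CLR alias addresses, so "clear and set CTSMIEN" amounts to one write to the CLR alias followed by one write to the SET alias. The user-space sketch below models that sequence against a simulated register. The names sim_reg, reg_set(), reg_clr() and rearm_cts_interrupt() are hypothetical, and the bit positions (CTSMIEN = bit 17, CTSMIS = bit 1) are taken from the mainline mxs-auart driver. The write-0 trick above depends on undocumented silicon behaviour, so this model cannot reproduce it: here a CLR write of 0 simply changes nothing.

```c
#include <stdint.h>

/* Bit positions as used by the mainline mxs-auart driver. */
#define INTR_CTSMIEN (1u << 17)  /* CTS modem interrupt enable */
#define INTR_CTSMIS  (1u << 1)   /* CTS modem interrupt status */

/* Hypothetical simulated register with MXS-style SET/CLR aliases. */
struct sim_reg {
    uint32_t value;
};

/* A write to the SET alias ORs the written bits into the register. */
static void reg_set(struct sim_reg *r, uint32_t bits)
{
    r->value |= bits;
}

/* A write to the CLR alias clears exactly the written bits, so writing 0
 * is a no-op in this model (unlike the undocumented behaviour above). */
static void reg_clr(struct sim_reg *r, uint32_t bits)
{
    r->value &= ~bits;
}

/* Freescale's suggestion: clear CTSMIEN, then set it again. In the driver
 * these would be __raw_writel() calls to HW_UARTAPP_INTR_CLR and to the
 * matching INTR SET alias (that name is an assumption). */
void rearm_cts_interrupt(struct sim_reg *intr)
{
    reg_clr(intr, INTR_CTSMIEN);
    reg_set(intr, INTR_CTSMIEN);
}
```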

YS
Contributor IV

Currently my "best practice" is a minimal modification to mxs_auart_irq_handle() that ignores CTS going inactive (i.e. it never disables transmission from the ISR):

#if 0
                uart_handle_cts_change(&s->port, stat & BM_UARTAPP_STAT_CTS);
#else
                /* Instead of the standard CTS handler, we just check if the
                 * port is blocked and CTS is ready */
                if (stat & BM_UARTAPP_STAT_CTS) {
                        struct uart_port *uport = &s->port;
                        struct tty_port *port = &uport->state->port;
                        struct tty_struct *tty = port->tty;

                        if (tty->hw_stopped) {
                                tty->hw_stopped = 0;
                                uport->ops->start_tx(uport);
                                uart_write_wakeup(uport);
                        }
                }
#endif

The modified code withstood overnight testing on two units, transmitting continuously at 115200 bps with a pulsed CTS input (a 2 usec negative pulse every 100 usec).
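The idea behind this change is that the ISR only ever restarts a stopped transmitter and never stops it, leaving the pause itself to the UART's own CTS logic. A simplified user-space model of that policy is sketched below. The names tx_model and cts_irq_enable_only() are hypothetical stand-ins for the tty and uart_port fields used in the snippet above, and start_calls only counts what would be start_tx() plus uart_write_wakeup() calls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state touched by the modified handler. */
struct tx_model {
    bool hw_stopped;   /* mirrors tty->hw_stopped */
    int  start_calls;  /* number of start_tx() + wakeup sequences issued */
};

/* Enable-only policy: react to CTS becoming active while stopped, and
 * ignore CTS going inactive (the UART hardware is expected to pause TX
 * on its own when hardware flow control is configured). */
void cts_irq_enable_only(struct tx_model *t, bool cts_active)
{
    if (cts_active && t->hw_stopped) {
        t->hw_stopped = false;
        t->start_calls++;
    }
}
```

Because this handler never sets hw_stopped, a short inactive pulse that is missed or reported late cannot leave the port stopped. The trade-off is that a hw_stopped flag set elsewhere in serial_core is only cleared when a CTS-active interrupt is actually seen.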


HectorPalacios
Senior Contributor I

Hi Yuji-san,

Freescale Tech Support built on our patches and provided me with the following one. I don't know whether it resolves the 1-byte and 16-byte transmission issues you mentioned.

Regards

YS
Contributor IV

Thank you, Hector-san. I will try later.

I'm facing a more sinister issue... when I keep transmitting from the i.MX28 AUART overnight with a very short CTS-inactive pulse on the input, it freezes. I'm not sure whether this is related to this particular issue or something completely different. I'm now trying to figure out what causes the freeze.

HectorPalacios
Senior Contributor I

Hi Yuji-san,

The previous patch ignored software flow control. Please see the corrected patch.

YS
Contributor IV

Hector-san,

I tested the Freescale patch, but it does not look very good.

Transmission stops in less than 10 seconds when configured with NO hardware flow control (stty -F /dev/ttySP0 -crtscts). The port is connected to another i.MX28 AUART. I'm pretty sure that calling stop_tx() in mxs_auart_irq_handle() causes this. (Why do they keep this code? There is no need to handle the CTSMIS interrupt when hardware flow control is off!)

With flow control ON it runs longer, but it did not survive the overnight test. Transmission stopped, although it was not the catastrophic deadlock I experienced with my version of mxs-auart.c.

I'm increasingly convinced that we should not use the CTSMIS interrupt at all. This function is very faulty.

HectorPalacios
Senior Contributor I

Hi Yuji-san.

Mmm, I think you are right: software flow control has problems too. I tested with 'stty -F /dev/ttySP4 115200 raw ixon -crtscts' and the transmission stops sooner or later.

I think, however, that the problem is not the patch or the interrupt handler, because with software flow control the CTS line is not used at all. Anyway, I also tried disabling the interrupt and commenting out the start/stop lines in the ISR, and it still happens.

I think this is a different problem. Maybe some data is misinterpreted as a STOP signal?

YS
Contributor IV

I'm investigating a rare, complete system freeze.

When I run a continuous transmission test with a pulsed CTS input (an inactive pulse of about 2 usec every 100 usec), the whole system eventually freezes after several hours (it usually takes an overnight run). No kernel panic message, no console response, no network response, nothing.

In fact, this freeze happened with the completely original mxs-auart.c with hardware flow control turned off. When I ran the same test on the same hardware/software, except without the pulsed CTS input, it survived a 48-hour weekend run.

The freeze happened on two SX-580 units, so faulty hardware is an unlikely cause. I haven't tested with the i.MX28 EVK. I know I'm testing an extreme case, but because our product is intended for the medical/industrial market (as I believe most i.MX28 users' products are), every possibility of a freeze must be eliminated.

I tried moving the start_tx() and uart_write_wakeup() calls out of the interrupt handler into a tasklet, but it still froze. I'm running out of ideas other than not using the CTS interrupt at all.

Your version looks promising. I haven't seen your version of mxs-auart.c freeze when configured with hardware flow control on. However, your interrupt handler still calls uart_handle_cts_change(), and indeed it froze when I tested with no flow control (stty -F /dev/ttySP0 -crtscts).

I also noticed a strange behavior of the i.MX28 UART... I thought detecting and clearing CTSMIS in mxs_auart_irq_handle() would be unnecessary because CTSMIEN is already disabled, but when I removed that part of the code the driver stopped working correctly. The CTSMIS bit is set regardless of the CTSMIEN bit, and it must be cleared by software or flow control does not seem to work right. Weird.


HectorPalacios
Senior Contributor I

I'm sorry, I mixed up my branches and sent you a version of mxs-auart.c containing the first patch I posted to this thread, not the last one. Please find it reattached.

With this one I don't have any problems with hardware flow control.

With respect to your comment:

    "I was not impressed - the second patch has many issues.
    Such as, stty -F /dev/ttySP0 crtscts, turn CTS off, start transmission then turn CTS on. Transmission does not start, because they do not care hw_stopped flag."


Do you think this is a valid test? It looks like you are turning hardware flow control on but then you are manually toggling CTS. Couldn't this confuse the UART logic?


YS
Contributor IV

Hector-san,


>Please find it reattached.

Which patch exactly do you mean by the "correct" one?

"mxs-auart.patch.zip"

"0001-mxs-auart-skip-CTS-handling-for-hardware-flow-contro.patch.zip" 

"0001-mxs-auart-skip-CTS-handling-for-hardware-flow-contro.patch.zip" 

>Do you think this is a valid test? It looks like you are turning hardware flow control on but then you are manually toggling CTS. Couldn't this confuse the UART logic?


I believe so. RTS/CTS (or DTR/DSR) are completely asynchronous to RXD/TXD. The receiver can deassert RTS whenever it does not want to receive, and reassert it whenever it is ready to receive again. There is no guarantee that RTS is active when an application opens the /dev/ttySP* port (I admit that is the most likely case, but it is not guaranteed by the RS-232C protocol).

There is also a much rarer possibility that the CTS input sees an accidental impulse caused by ESD, EMI or a faulty contact. It is an unlikely, irregular case, but high-reliability medical/industrial equipment must survive such irregular signals. That is why I'm testing the i.MX28 with a 2 usec CTS impulse every 100 usec: it is very unlikely to happen in real-life use, but I know accidental impulses do happen (I've been working in embedded computing for more than 20 years, since the days of the 8251, 6850 and Z-80SIO...), and industrial equipment is never excused for freezing because of such irregular input.

HectorPalacios
Senior Contributor I

Yuji-san,

Please clarify: did you test the first Freescale patch I posted or the second one? The first one didn't take software flow control into account; the second one does (reattached here).

My tests ran OK at high baud rates with hardware, software, and no flow control.

The ISR does:

- For HW flow control, it clears the status flag and ACKs the IRQ.
- For SW flow control:
  - if CTS is high and transmission is stopped, it resumes;
  - if CTS is low and transmission is not stopped, it stops.

Notice that with no flow control, the CTS interrupt is never triggered.
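The cases listed above can be restated as a small decision function. This is a hypothetical restatement for clarity, not code from the Freescale patch: the enum and function names are invented, "high"/"low" follow the wording above, and the SW-mode case where nothing needs to change is assumed to be a plain acknowledge.

```c
#include <stdbool.h>

enum flow_mode { FLOW_NONE, FLOW_SW, FLOW_HW };

enum cts_action {
    CTS_ACTION_NONE,      /* CTS interrupt not expected at all */
    CTS_ACTION_ACK_ONLY,  /* clear the status flag, touch nothing else */
    CTS_ACTION_RESUME,    /* restart transmission */
    CTS_ACTION_STOP       /* stop transmission */
};

/* Hypothetical summary of the ISR behaviour described above. */
enum cts_action cts_isr_action(enum flow_mode mode, bool cts_high,
                               bool tx_stopped)
{
    switch (mode) {
    case FLOW_HW:
        return CTS_ACTION_ACK_ONLY;          /* hardware pauses TX itself */
    case FLOW_SW:
        if (cts_high && tx_stopped)
            return CTS_ACTION_RESUME;
        if (!cts_high && !tx_stopped)
            return CTS_ACTION_STOP;
        return CTS_ACTION_ACK_ONLY;          /* assumption: nothing to change */
    case FLOW_NONE:
    default:
        return CTS_ACTION_NONE;              /* CTS interrupt never triggered */
    }
}
```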

I'm attaching my whole mxs-auart.c file, just in case you're missing something else from other patches.

Best regards

YS
Contributor IV

I tested both, but tested the second one more intensively.

I was not impressed: the second patch has many issues.

For example: stty -F /dev/ttySP0 crtscts, turn CTS off, start transmission, then turn CTS on. Transmission does not start, because the patch does not take care of the hw_stopped flag.

YS
Contributor IV

Hector-san,

I'm repeating this experiment. At this point, I think it is true that either the i.MX28 UART logic or the i.MX28 Linux UART driver (interrupt handler) has a flaw that causes TX to stop with CTS flow control.

I set up two i.MX28 boards (silex SX-580 with its serial service process killed), connected them with a crossover cable, and configured /dev/ttySP0 on both with 115200 raw crtscts. I launch "cat /dev/ttySP0" on one side (receiver) and send more than 16 bytes of data from the other side (transmitter), for example: echo "0123456789ABCDEF0123456789ABCDEF" > /dev/ttySP0. Almost instantly the transmission stops, even though CTS is asserted.

After the stoppage, transmission usually resumes when I deassert RTS (the CTS input) and then reassert it. You can do this simply by pressing ^C to stop the receiver process (cat /dev/ttySP0) and then launching it again.

The logic analyzer shows that the RTS deassert pulse from the i.MX28 UART is very short: 20 to 40 usec wide. I guess this short RTS pulse may confuse the i.MX28 UART logic or the Linux driver/interrupt handler. My previous experiment used a software-controlled RTS pulse, whose width is 200-400 usec. I think that's why the issue "disappeared" when I rebooted the PC... I think I unintentionally turned hardware flow control ON yesterday. (I reported this this morning, then deleted the post because it was my mistake.)

The stoppage also happens at lower bit rates, though less often than at 115200 bps (where it stops almost immediately after 16 bytes). I could even make TX stop at 1200 bps! The RTS pulse width is still very short regardless of the bit rate.

HectorPalacios
Senior Contributor I

Hello Yuji-san,

Usually, testing hardware flow control against a PC serial port is a bad idea. PCs are usually fast enough to process incoming data, so they rarely need to use their RTS line to tell the transmitter to stop sending. Besides, they rarely support baud rates beyond 115200. If you can, please test between two embedded devices rather than with a PC.

For my tests I have used one MX28 as transmitter and one MX53 as receiver, and also two different i.MX28 targets (one as receiver and the other as transmitter).

It has sometimes run without failure, usually when the receiver's RTS pulse is wide. If I stop the test and launch it again, the error usually triggers quickly, though only after sending far more than the 16 bytes you mention.

I'm attaching a pair of scope captures showing the width of the CTS pulse (on the MX28 transmitter side) and the TX data line.

The first one (576000 baud) shows the behaviour during normal transmission. The CTS pulse is between 4 and 8 usec. Such a short pulse doesn't seem to stop or otherwise affect the TX data flow, which looks continuous. This may be fine: when the transmitter sees the CTS pulse it may still be flushing data out of the FIFO, and by the time it finishes, the pulse is already over, so it can keep transmitting without stopping.

The second one (230400 baud) shows the behaviour after the error happens. The pulse is again around 8 usec or less. After the CTS pulse, the MX28 flushes the data in its buffer and then stops sending.
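For scale, one 8N1 character (start bit, 8 data bits, stop bit, i.e. 10 bit times) lasts about 17.4 usec at 576000 baud and about 43.4 usec at 230400 baud, so the 4 to 8 usec CTS pulses in these captures are shorter than a single character on the wire. The helper below is only that arithmetic; the function name is hypothetical and 8N1 framing is assumed.

```c
/* Duration of one UART character in microseconds.
 * bits_per_char counts start + data + parity + stop bits (10 for 8N1). */
double char_time_us(unsigned long baud, unsigned bits_per_char)
{
    return 1e6 * (double)bits_per_char / (double)baud;
}
```

At 115200 baud the same formula gives about 86.8 usec per character, so the 2 usec test pulse used earlier in this thread is a small fraction of one character time.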

I'm not sure what on the receiver side makes its RTS pulse short or long. Perhaps a high interrupt load makes the receiver slower at processing the serial data. I would recommend running the test on systems that are mostly idle, with no other processes running.

I have sometimes been able to recover the transmission by generating a new CTS pulse, but most of the time it never recovers and you need to close and reopen the port.

--

Héctor Palacios

YS
Contributor IV

Hector-san,

I think I've got a clue. Take a look at the CTRL2 register when the stoppage happened.

My data shows 0x0022ca01, your data shows 0x0022c201.

In the normal state, my i.MX28 shows 0x0022cb01.

In both stoppage cases, bit 8 of CTRL2 is not set.

Bit 8 of CTRL2 is TXE (Transmit Enable).

It seems the chip stopped transmitting because TXE is 0.
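To make the comparison concrete, the three CTRL2 values can be decoded with the bit positions used by the mainline mxs-auart driver (TXE is bit 8, RXE bit 9, RTSEN bit 14, CTSEN bit 15). The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* HW_UARTAPP_CTRL2 bits as defined in the mainline mxs-auart driver. */
#define CTRL2_TXE    (1u << 8)   /* transmit enable */
#define CTRL2_RXE    (1u << 9)   /* receive enable */
#define CTRL2_RTSEN  (1u << 14)  /* hardware RTS flow control */
#define CTRL2_CTSEN  (1u << 15)  /* hardware CTS flow control */

/* True if the transmitter is enabled in this CTRL2 snapshot. */
bool ctrl2_tx_enabled(uint32_t ctrl2)
{
    return (ctrl2 & CTRL2_TXE) != 0;
}

/* True if hardware CTS flow control is configured. */
bool ctrl2_cts_flow(uint32_t ctrl2)
{
    return (ctrl2 & CTRL2_CTSEN) != 0;
}
```

All three values still have CTSEN and RTSEN set (hardware flow control is configured); only TXE differs, which is consistent with the transmitter having been disabled by software rather than paused by the flow-control logic.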

I looked at the Linux mxs-auart.c driver source code.

The function mxs_auart_irq_handle() detects the CTS interrupt (by checking the CTSMIS bit in the INTR register), then calls uart_handle_cts_change() (actually an inline function defined in include/linux/serial_core.h) with the current CTS status, read from the STAT register, as the second argument. After calling uart_handle_cts_change(), it clears the CTSMIS interrupt.

u32 stat = __raw_readl(s->port.membase + HW_UARTAPP_STAT);
istatus = istat = __raw_readl(s->port.membase + HW_UARTAPP_INTR);

if (istat & BM_UARTAPP_INTR_CTSMIS) {
        /* start_tx()/stop_tx() based on the CTS level sampled above */
        uart_handle_cts_change(&s->port, stat & BM_UARTAPP_STAT_CTS);
        /* the status bit is acknowledged only after the handler ran */
        __raw_writel(BM_UARTAPP_INTR_CTSMIS,
                s->port.membase + HW_UARTAPP_INTR_CLR);
}

Inside uart_handle_cts_change(), it further calls uport->ops->start_tx() or uport->ops->stop_tx() depending on the status argument, which should reflect the "current" CTS status.

I guess the short CTS pulse breaks this algorithm. Two IRQs should happen, but apparently only one (on the rising edge of CTS, disabling TXE) happened, leaving the UART's TX disabled forever. Or perhaps the latency between the IRQ being raised and the CTS state being read in the IRQ handler causes the issue?
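The decision described above can be summarised as a small state machine. The sketch below is a simplified user-space model of the 2.6.35-era inline uart_handle_cts_change() logic: count the change; if CTS flow control is on, resume when CTS returns while stopped, and stop when CTS drops while running. The names cts_model and model_handle_cts_change() are hypothetical, and start_tx()/stop_tx() are reduced to a txe flag, following the CTRL2 observation above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields uart_handle_cts_change() touches. */
struct cts_model {
    bool     cts_flow;    /* CRTSCTS enabled (ASYNC_CTS_FLOW in port flags) */
    bool     hw_stopped;  /* tty->hw_stopped */
    bool     txe;         /* effect of start_tx()/stop_tx() on CTRL2.TXE */
    unsigned cts_count;   /* uport->icount.cts */
};

void model_handle_cts_change(struct cts_model *m, bool cts_active)
{
    m->cts_count++;
    if (!m->cts_flow)
        return;                 /* no CTS flow control: only counted */
    if (m->hw_stopped) {
        if (cts_active) {
            m->hw_stopped = false;
            m->txe = true;      /* start_tx() + uart_write_wakeup() */
        }
    } else if (!cts_active) {
        m->hw_stopped = true;
        m->txe = false;         /* stop_tx() */
    }
}
```

If the second, re-assert edge is never reported, only the first call happens and the model is left with txe == false, which matches the stuck CTRL2 value above.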

HectorPalacios
Senior Contributor I

Yuji-san:

First, let me quote your previous message, as it didn't post properly to the forum thread:

I think I found the cause and the cure.

When CTS is deactivated (rising edge), the CTSMIS interrupt is raised.

The IRQ handler is called, and the CPU starts executing mxs_auart_irq_handle().

If the CTS pulse is very short, CTS will be activated again (falling edge) before the CPU exits the IRQ handler.

When the CPU clears the "current" interrupt, it also erases the second (falling-edge) IRQ, so TXE is left at 0 even though the CTS level is active (low).

I swapped the order of the CTSMIS INTR clear and the call to uart_handle_cts_change().

if (istat & BM_UARTAPP_INTR_CTSMIS) {
        /* uart_handle_cts_change(&s->port, stat & BM_UARTAPP_STAT_CTS); */
        __raw_writel(BM_UARTAPP_INTR_CTSMIS,
                        s->port.membase + HW_UARTAPP_INTR_CLR);
        istat &= ~BM_UARTAPP_INTR_CTSMIS;
        uart_handle_cts_change(&s->port, stat & BM_UARTAPP_STAT_CTS);
}

After this modification, I don't see the stoppage anymore.

I'm not sure this is a complete solution, however... I guess a slight "pinhole" could still exist: if the RTS pulse is just the right width, it ends after the CPU reads the STAT register but before it clears the CTSMIS interrupt. The window is very narrow, maybe well under 1 usec, but the possibility is still there, I think.
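The ordering argument, including the pinhole, can be checked with a small event model. It is a hypothetical simplification and not a model of the real silicon: CTSMIS is treated as a sticky bit that any CTS edge sets, the IRQ stays pending while it is set, and the handler's start_tx()/stop_tx() are reduced to copying the sampled CTS level into a txe flag. The parameter reassert_after selects after which ISR step the short pulse ends (0 = after the STAT read, 1 and 2 = after the next steps, -1 = after the ISR has returned).

```c
#include <stdbool.h>

/* Hypothetical model state: CTS level, sticky CTSMIS latch, and TXE. */
struct race_model {
    bool cts_asserted;
    bool ctsmis;
    bool txe;
};

enum isr_op { OP_READ_STAT, OP_HANDLE, OP_CLEAR };

/* Original driver order vs. the swapped order quoted above. */
static const enum isr_op original_order[3] = { OP_READ_STAT, OP_HANDLE, OP_CLEAR };
static const enum isr_op swapped_order[3]  = { OP_READ_STAT, OP_CLEAR, OP_HANDLE };

/* Any CTS edge latches the interrupt status bit. */
static void set_cts(struct race_model *m, bool asserted)
{
    if (m->cts_asserted != asserted)
        m->ctsmis = true;
    m->cts_asserted = asserted;
}

/* Returns the final TXE state once no interrupt is pending. */
bool simulate_short_pulse(bool clear_first, int reassert_after)
{
    struct race_model m = { true, false, true };
    const enum isr_op *order = clear_first ? swapped_order : original_order;
    bool pulse_over = false;
    bool stat_cts = true;

    set_cts(&m, false);                     /* receiver deasserts: edge #1 */

    for (;;) {
        if (!m.ctsmis) {                    /* no IRQ pending */
            if (pulse_over)
                break;                      /* quiescent: report TXE */
            set_cts(&m, true);              /* long pulse ends outside ISR */
            pulse_over = true;
            continue;
        }
        for (int i = 0; i < 3; i++) {       /* one pass through the ISR */
            switch (order[i]) {
            case OP_READ_STAT: stat_cts = m.cts_asserted; break;
            case OP_HANDLE:    m.txe = stat_cts;          break;
            case OP_CLEAR:     m.ctsmis = false;          break;
            }
            if (!pulse_over && i == reassert_after) {
                set_cts(&m, true);          /* short pulse ends mid-ISR */
                pulse_over = true;
            }
        }
    }
    return m.txe;
}
```

In this model the original order gets stuck whenever the pulse ends before the clear, while the swapped order recovers in every placement except a re-assert between the STAT read and the clear, which is exactly the pinhole described in the quoted message.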

I think we should not manipulate the TXE bit from the interrupt handler at all. It is handled automatically by the chip's hardware logic. I guess this code in uart_handle_cts_change() is provided for damn UART chips that have no hardware flow control logic at all, but it is unnecessary if the chip has hardware flow control logic, and it can even cause undesirable side effects like this.

I think you hit the point here: uart_handle_cts_change() seems to be designed for software flow control, as it manually stops and starts TX. With pure hardware flow control, the driver should not do this. So in my opinion, the driver is doing software flow control work even when configured for hardware flow control.

When hardware flow control is enabled, it is the chip that should pause/resume transmission automatically, without any software action. My first approach was to remove the call to uart_handle_cts_change() from the interrupt handler and have the handler simply acknowledge the IRQ. This worked fine, and then I thought: "if I'm not doing anything at all in the ISR, why the heck do I need the CTS interrupt enabled at all?" So I left the ISR untouched and instead disabled CTSMIEN (the CTS interrupt), but unfortunately this did not work. The reason is that the INTR register reflects the CTS interrupt status even when the CTS interrupt is disabled (this looks like a chip bug).

So I did both: disable the CTS interrupt, and discard the CTS interrupt status if the interrupt is disabled.
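The second half of that approach boils down to masking CTSMIS with its own enable bit before the ISR looks at it. A minimal sketch of such a filter follows, using the bit positions from the INTR dump later in this thread (CTSMIEN = bit 17, CTSMIS = bit 1). The function name is hypothetical and this is not the attached patch.

```c
#include <stdint.h>

#define INTR_CTSMIEN (1u << 17)  /* CTS modem interrupt enable */
#define INTR_CTSMIS  (1u << 1)   /* CTS modem interrupt status */

/* Return the INTR status bits (low half) the ISR should act on.
 * CTSMIS is reported even while CTSMIEN is clear, so it is dropped
 * unless the CTS interrupt is actually enabled. */
uint32_t cts_filtered_status(uint32_t intr)
{
    uint32_t status = intr & 0xffffu;   /* status bits live in the low half */

    if (!(intr & INTR_CTSMIEN))
        status &= ~INTR_CTSMIS;
    return status;
}
```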

In summary, it looks like the chip correctly handles hardware flow control, but the driver is mixing the handling of hardware and software flow control by turning TXE on and off, which causes trouble when the CTS pulse is short enough.

Attached is my suggested patch, which enables the CTS interrupt only when hardware flow control is not enabled.

Please note that I have not tested it extensively.

YS
Contributor IV

Hector-san,

The reason I deleted my previous post was that the UART driver still stops even after changing the order in the interrupt handler. I tried various tricks. I tried removing the TXE disable from the ISR (it still stops), but I didn't think of removing uart_handle_cts_change() entirely. My concern was that the tty->hw_stopped flag belongs to the tty layer rather than the port driver. I saw that hw_stopped is set in several places in serial_core.c. If we completely remove uart_handle_cts_change(), it may cause another kind of deadlock, because tty->hw_stopped is then never cleared on a CTS level change.

(Should we implement our "own" version of uart_handle_cts_change() in the ISR, just to detect CTS becoming active and enable (but never disable) transmission again?)

You are right: implementing software flow control on top of a UART chip with hardware flow control is complete nonsense. However, that damn hw_stopped flag lives in the tty layer, which is part of the Linux serial driver framework. We should not alter the OS framework because of a chip limitation, so we should find some "evasive" way to deal with it...

HectorPalacios
Senior Contributor I

Hi Yuji-san,

I understand your concern, but look at the i.MX53 driver (mxc_uart.c):

1) It only calls uart_handle_cts_change() for software flow control.

2) The driver manually sets tty->hw_stopped to 0 in the set_termios hook if hardware flow control is enabled. So I added this to my previous patch.
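Taken together with the patch description above (CTS interrupt enabled only when hardware flow control is off), the set_termios side could look roughly like the model below. It is a hypothetical sketch, not code from mxc_uart.c or the attached patch: flow_config and configure_flow() are invented, the CTSEN/RTSEN bit positions (15 and 14) come from the mainline mxs-auart driver, and CTSMIEN is bit 17 as in the INTR dump later in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRL2_RTSEN  (1u << 14)
#define CTRL2_CTSEN  (1u << 15)
#define INTR_CTSMIEN (1u << 17)

/* Hypothetical snapshot of what a set_termios hook would program. */
struct flow_config {
    uint32_t ctrl2;       /* CTSEN/RTSEN: the UART pauses/throttles itself */
    uint32_t intr_en;     /* CTSMIEN only when the driver must watch CTS */
    bool     hw_stopped;  /* value left in tty->hw_stopped */
};

struct flow_config configure_flow(bool crtscts, bool hw_stopped_before)
{
    struct flow_config c = { 0, 0, hw_stopped_before };

    if (crtscts) {
        c.ctrl2 |= CTRL2_CTSEN | CTRL2_RTSEN;
        c.hw_stopped = false;   /* as in point 2): never leave it set */
    } else {
        c.intr_en |= INTR_CTSMIEN;
    }
    return c;
}
```

Forcing hw_stopped to false this way is what the next reply questions: with CTS off, writes no longer block the way they do with the stock driver.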

I tested communication using this patch with different combinations of data bits, parity and flow control (none, software, and hardware), and it seems to be working without any problem so far.

YS
Contributor IV

Hector-san,

I tried your patch. I confirmed it can withstand the extreme condition (a continuous 300 kHz ON/OFF pulse on CTS) under which I could not make my version of mxs-auart.c work reliably.

However, I found an issue. Turn CTS off, then send a single byte of data (echo -n A > /dev/ttySP0). It should block until CTS turns ON, or give up after a reasonably long timeout (30 sec). The original Freescale mxs-auart.c works that way, as does the PC Linux 16550 UART driver. However, when I tried the same operation with your patched driver, the single byte was not blocked at all. When I tried longer data (echo -n 0123456789ABCDEF > /dev/ttySP0), it was blocked by CTS, but only for 3 seconds. If I do not activate CTS within 3 seconds, the whole 16 bytes of data are lost.

I think just setting hw_stopped=0 might not be enough to "fool" the tty layer.

Best Regards,

YS
Contributor IV

Okay... I successfully reproduced (what should be) the same issue as yours.

I wrote a short program running on PC Linux that intentionally turns the RTS signal OFF and then ON again after every receive. The core of the program is:

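    /* status must first be initialised with ioctl(fd, TIOCMGET, &status) */
    while (1) {
        len = read(fd, buff, 8);
        if (len < 0)
            break;                          /* read error */
        total_len += len;
        printf("%d\n", total_len);

        status &= ~TIOCM_RTS;               /* drop RTS: ask sender to pause */
        ret = ioctl(fd, TIOCMSET, &status);
        usleep(1 * 1000);                   /* 1 msec */
        status |= TIOCM_RTS;                /* raise RTS: allow sending again */
        ret = ioctl(fd, TIOCMSET, &status);
    }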

With this program on the receiver side at 115200 bps, sure enough, the i.MX28 stops transmitting in less than 1 Kbyte.

I peeked at the UART and ICOLL register status when the stoppage happened.

ICOLL raw0=80000000 raw1=00000001 raw2=00000000 raw3=00000000
ICOLL 112=00000004
UART CTRL0=00030000 CTRL1=00000000 CTRL2=0022ca01
UART LINECTRL0=00680a70 LINECTRL1=00000000
UART INTR=00720002 STAT=f1f00000 DEBUG=00682800

INTR high (0x0072) shows that RTIEN, TXIEN, RXIEN and CTSMIEN are set.

INTR low (0x0002) shows that CTSMIS is set, which suggests a CTS interrupt should be generated.

However, the corresponding ICOLL bit (ICOLL raw3, bit 16) is 0, which shows that no IRQ from UART0 is active.

STAT high (0xf1f0) shows that Present, Highspeed, BUSY, CTS and RXFE are set.

The BUSY bit (bit 29) should NOT be high all the time, but when this issue happens it never goes low.

Does that mean the UART logic is hung up?

I suspected that the IRQ handler was messing up the IRQ mask, but if the UART logic is locked up, this is a serious issue...
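The dump can be decoded mechanically. The sketch below uses the INTR and STAT bit positions from the mainline mxs-auart driver and assumes, as the question above does, that the UART should raise its line toward ICOLL when a status bit and its enable bit are both set. ICOLL on the i.MX28 spreads its 128 sources over four 32-bit RAW registers, which is why source 112 shows up as raw3, bit 16. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* HW_UARTAPP_INTR bits (mainline mxs-auart numbering). */
#define INTR_CTSMIS   (1u << 1)
#define INTR_CTSMIEN  (1u << 17)
#define INTR_RXIEN    (1u << 20)
#define INTR_TXIEN    (1u << 21)
#define INTR_RTIEN    (1u << 22)

/* HW_UARTAPP_STAT bits (mainline mxs-auart numbering). */
#define STAT_RXFE     (1u << 24)
#define STAT_CTS      (1u << 28)
#define STAT_BUSY     (1u << 29)

/* Assumption: a CTS interrupt is expected when status AND enable are set. */
bool cts_irq_expected(uint32_t intr)
{
    return (intr & INTR_CTSMIS) && (intr & INTR_CTSMIEN);
}

/* Which ICOLL RAW register, and which bit in it, holds a given source. */
unsigned icoll_raw_index(unsigned irq) { return irq / 32; }
unsigned icoll_raw_bit(unsigned irq)   { return irq % 32; }
```

Applied to the dump, cts_irq_expected(0x00720002) is true while raw3 bit 16 reads 0, which is the inconsistency Yuji points out.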

I attached the "debuart" Linux driver source code for peeking at the registers.

You have to modify the KERNEL_SRC path in the Makefile to match your build environment.

Once you have built debuart.ko:

# insmod debuart.ko
debuart:register major id=253
# mknod /dev/debuart c 253 0 (* needed only once)
# cat /dev/debuart

This gives you a register dump of the UART and ICOLL registers.
