UART Auto-RTS trouble with MCF5271

ynaught
Contributor III

I'm having trouble with the UART feature in the MCF5271 which is supposed to de-assert RTS when the transmit buffer goes empty.  The trouble is that it works most of the time--not terribly useful in a CPU...

 

I've had code using this feature in place on this hardware for a while, but the failure only became apparent when the device was used as a 2-wire RS-485 slave. In that situation, when our slave device sends a reply message but does not de-assert RTS at the end, its transmitter remains ON, effectively killing the entire RS-485 network.

 

My ports are configured:

UMR1 = 0x13 (RxRTS OFF)

UMR2 = 0x27 (TxRTS ON)

 

When transmitting a message, I turn on RTS by writing to UOP1, then enable the transmitter by writing 0x04 to UCR.  Feeding bytes to the transmitter is done by an ISR.  When the ISR is called and has no more bytes to send, it calls a function which disables the transmitter by writing 0x08 to UCR.
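
In register-macro terms, the start-of-message sequence is roughly this (a sketch using the MCF_UART_* names from the Freescale headers, not our actual code):

/* Start of message: assert RTS, then enable the transmitter.           */
MCF_UART_UOP1(n) = 0x01;                      /* bit 0 sets (asserts) RTS */
MCF_UART_UCR(n)  = MCF_UART_UCR_TX_ENABLED;   /* 0x04: transmitter on     */
/* The TX ISR then feeds UTB; when it runs out of bytes it calls
 * TransmitterDisable(), which writes MCF_UART_UCR_TX_DISABLED (0x08).   */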

 

As I mentioned before, this works to disable RTS when transmission is complete... MOST OF THE TIME.  When our device was the Master on the network, failures were probably dismissed: it would appear as if the slave "did not reply" (when in fact it tried, but was swamped on the network), and the bus would recover when the master sent its next message.

 

The bug appears to happen more frequently when Ethernet traffic is being processed, which does cause extra interrupts (only on RxE, to manage buffer descriptors).  BUT it also happens with no Ethernet communications directed at the product.  With the Ethernet cable disconnected it seldom happens, but it still happens.

 

My current workaround is hideous: foreground code examines both serial ports by calling:

 

void CheckRTS( int CardinalPort) {
    byte Mask;
    static int Counter[4];
    struct PCBT *PCB;

    if( CardinalPort == 1 ) {
        PCB = (struct PCBT *) &PORT1;
        Mask = 0x04;
    } else if( CardinalPort == 2 ) {
        PCB = (struct PCBT *) &PORT2;
        Mask = 0x40;
    } else {
        return;
    }

    if( PCB->RXEPEND ) {                  // Set TRUE on call to TransmitterDisable
        if( !PCB->TXENAB ) {              // Transmitter is not enabled
            if( (__IPSBAR[0x100029] & Mask) == 0 ) { // RTS still on (reading GPIO_PPDSDR_UARTL)
                Counter[CardinalPort]++;
                if( Counter[CardinalPort] == 3 )
                    iprintf( "\t*** Caught a bug (%d)!\n", CardinalPort);
                else if( Counter[CardinalPort] > 5000 ) {
                    iprintf( "\t*** Turning RTS OFF on port %d\n", CardinalPort);
                    MCF_UART_UOP0(CardinalPort-1) = 1;  // UOP0 write negates RTS
                }
                return;
            }
        }
    }
    Counter[CardinalPort] = 0;            // RTS state is sane; reset the count
}

This version of the "patch" leaves the broken situation in place for about 3.6 seconds, so I can observe and 'scope it.

 

I've searched the forums for mention of others running into this problem, but found nothing.  Also, we've looked through the hardware errata for this chip.

 

Any help you can provide would be appreciated.

TomE
Specialist II

Searching this forum for "TXRTS" finds this apparently matching problem from 2005:

 

https://community.freescale.com/message/12175#12175

 

From that, it looks like you have to drop the last character in, wait for the holding register to go empty (TXRDY), and then DISABLE the transmitter before the shift register empties.

 

I'd say you're probably getting other interrupts between dropping that last byte in and seeing it go empty (getting the interrupt), and so you're actually disabling the transmitter AFTER the last bit has gone out.

 

I'd suggest you program a port pin to track your enabling and disabling of the UART, and then hang a CRO on that pin, the transmit data and the RTS pin. I'm guessing that on the occasions where RTS doesn't drop you'll find you've disabled the transmitter too late. This might be a bit tricky to trigger on though.

 

I'd suggest you try (if you're not doing so already) dropping the last character in and then waiting in a hard loop (with interrupts disabled) until it goes TXRDY, and THEN you send it the TRANSMITTER_DISABLE command.
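
In other words, something like this (just a sketch: the MCF_UART_* names are the standard Freescale header macros and asm_set_ipl() comes from mcf5xxx.h, so adapt it to your environment):

/* Hand the LAST byte of a message to the UART and disable the
 * transmitter in the window where it still owns the line.              */
void uart_send_last_byte( int n, uint8_t b ) {
    uint32_t nOldIpl = asm_set_ipl(7);    /* lock out ALL interrupts     */

    MCF_UART_UTB(n) = b;                  /* queue the final character   */

    /* Spin (at most one character time) until the holding register
     * drains into the shift register...                                 */
    while( !(MCF_UART_USR(n) & MCF_UART_USR_TXRDY) )
        ;

    /* ...then disable the transmitter BEFORE the shift register goes
     * empty. The UART finishes the character in the shifter and negates
     * RTS itself at the end of the last stop bit.                       */
    MCF_UART_UCR(n) = MCF_UART_UCR_TX_DISABLED;

    asm_set_ipl(nOldIpl);
}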

 

If there's a reason you can't do that then it might be possible to program the UART with the highest priority interrupt so it can interrupt your Ethernet interrupt service routine. Then you'll have to check all your code for the longest "hard interrupt disable" time and make sure it is shorter than the character transmit time.
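
On the MCF5271 the UARTs are interrupt sources 13, 14 and 15 on INTC0, so that setup looks something like the following. The macro spellings vary between header sets, so treat these names as assumptions and check your own headers:

/* Raise UART0 (INTC0 source 13) above everything else:
 * level 6, highest priority within the level.                          */
MCF_INTC0_ICR13 = MCF_INTC0_ICR_IL(6) | MCF_INTC0_ICR_IP(7);

/* Unmask the source (and the global mask-all bit) in IMRL.             */
MCF_INTC0_IMRL &= ~(MCF_INTC0_IMRL_INT_MASK13 | MCF_INTC0_IMRL_MASKALL);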

 

Tom

 

ynaught
Contributor III

Tom, I'd buy you a beer...

 

I do not yet have the problem *fixed*, but I have confirmed that what you suggested is actually happening.

 

I had no "extra" GPIO with which to signal out to an OScope, but I did modify my ISR that disables the transmitter so that it examines the TxEMP and TXRDY bits before disabling the transmitter, and if either is set increment a counter that is visible to the foreground app.

 

Sure enough, every time my foreground code recognizes the problem of RTS being left on, the TxEMP counter has also incremented!
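
The check looks roughly like this (the USR bit macros are from the Freescale headers; the function signature and counter names are just for illustration):

volatile int LateEmp[3], LateRdy[3];     // watched by the foreground app

void TransmitterDisable( int n ) {
    byte usr = MCF_UART_USR(n);

    if( usr & MCF_UART_USR_TXEMP )       // last bit already gone out:
        LateEmp[n]++;                    // the auto-RTS window was missed
    if( usr & MCF_UART_USR_TXRDY )
        LateRdy[n]++;

    MCF_UART_UCR(n) = MCF_UART_UCR_TX_DISABLED;   // 0x08
}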

 

I had already (blindly) tried setting the serial ports' interrupts to the highest priority of all the enabled interrupts, but that had not fixed the problem.  I've now put them back at the highest priority, for what that's worth.

 

Of course, I can query the TxEMP bit before disabling the transmitter, but there's no guarantee that it won't change between my testing it and disabling the transmitter...

 

So I'm still scratching my head about how to FIX it (permanently), but am very relieved to know what is causing the problem.

 

Thanks again.  If ever you're in Joplin, MO feel free to claim that beer.  With good timing, it could even be a homebrew!

ynaught
Contributor III

At this point, I have decided that the only solution for this problem is for me to call the hardware "broken" and manage RTS with software.  Bleah.

 

I have verified that the serial ports have the highest priority--higher even than the PITR, which I was not comfortable with--and the issue still crops up.

 

In my ISR, I could sample the USR register (in order to test TxEMP and TxRDY afterwards) and disable the transmitter in the immediately following instruction, but even that leaves a very small window for the TxEMP event to occur between the two.

 

In my case, I am often using the RS-485 port for the Modbus RTU protocol, where disabling the transmitter immediately after the last stop bit is less than ideal anyway.  Since Modbus messages are terminated by 3.5 character times of idle on the wire, it's best to leave the transmitter enabled for those 3.5 character times...

 

All of which leads me to the conclusion that I must give up on the hardware RTS control and manage it in software.

TomE
Specialist II

> I have verified that the serial ports have the highest priority--higher even than the PITR, which I was not comfortable with--and the issue still crops up.

 

That tells me that the interrupts "aren't working properly" in your system. If they were working properly, the highest priority interrupt would get a guaranteed minimum latency, and you should be able to guarantee servicing the TX interrupt before it is too late - before the last byte "has left the building".

 

Different parts of your hardware require different "maximum interrupt latencies". Your design, coding and testing have to guarantee these. Violating these "guarantees" is a bug which should be found and fixed.

 

So how long is "too long"?

 

http://en.wikipedia.org/wiki/Modbus#Communication_and_devices

 

"A Modbus RTU message must be transmitted continuously without inter-character hesitations. Modbus messages are framed (separated) by idle (silent) periods."

 

As you said these are 3.5 character times.

 

http://www.rtaautomation.com/modbusrtu/

 

"No specific baud rate is specified by the MODBUS: typical baud rates are 9600 or 19200."

 

Assuming you're running at 19200, a byte is 10 bits at that rate, or 521us.


That means your highest priority interrupts are being delayed for at least 500us on a CPU that normally runs at 150MHz. Or 75,000 instructions. Even if your CPU is running slower and your Modbus running faster it is still a very long time.

 

If the latency can't guarantee to be less than one byte time then your code is likely to have problems obeying the "framed idle" requirements of Modbus too.

 

The most likely cause is another interrupt service routine that disables ALL interrupts. This might be accidental or deliberate. It might have been written by someone coming from a CPU where that is necessary, but the ColdFire handles multilevel interrupts automatically. The problem might also be a delay in mainline code that disables all interrupts and then waits for a long time.

 

It is very easy to find this sort of delay. Put a breakpoint in your UART interrupt routine at the point where you're currently finding you've been delayed "too long". When the code stops, examine the stack (ask the debugger to display the call stack) to see what code was interrupted. It will most likely have been interrupted just after a line of code that re-enables interrupts in the function that caused the problem. So fix that one and test again to find the next one.


The proper way to test this is to have a high priority interrupt running at a high rate (every 10us to 100us). When it runs, it checks a timer to see how long it was delayed. If that is longer than the system's "required latency" it stops and lets the programmer find out where (as per the previous paragraph), or logs the interrupted PC in an error log for later perusal.
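
A minimal sketch of that watchdog, assuming the MCF_PIT_* macros from the Freescale headers; the tick budget, counter names and trip handling are placeholders to adapt:

#define LATENCY_LIMIT_TICKS  250      /* placeholder latency budget      */

volatile uint16_t max_latency_ticks;  /* worst case seen, for perusal    */
volatile uint32_t latency_trips;      /* times the budget was exceeded   */

/* Install as the PIT0 handler at the highest priority in use.  PCNTR
 * reloads from PMR at the moment the interrupt is requested, so
 * (PMR - PCNTR) measures how long this ISR was held off.                */
void pit0_latency_watchdog( void ) {
    uint16_t held_off = MCF_PIT_PMR(0) - MCF_PIT_PCNTR(0);

    if( held_off > max_latency_ticks )
        max_latency_ticks = held_off;

    if( held_off > LATENCY_LIMIT_TICKS )
        latency_trips++;   /* or breakpoint here: the exception stack
                              frame holds the PC of the code that kept
                              interrupts locked out too long             */

    MCF_PIT_PCSR(0) |= MCF_PIT_PCSR_PIF;   /* write 1 to clear the flag  */
}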

 

Tom

 

ynaught
Contributor III

Tom,

The reason I sometimes see such large apparent latency is that several parts of our code disable interrupts to ensure atomic operations.

I do realize that there are other ways to accomplish the same thing, but I don't want to rewrite large bodies of code (our TCP/IP stack, for instance).

Thanks for your input!

--Adam

TomE
Specialist II

> The reason I have such large apparent latency sometimes, is that several parts of our code disable interrupts, to ensure atomic operations.

But why is the "atomic operation" taking so darn long? An "atomic operation" should be "grab the lock flag" or "insert or delete one thing on a list", but it looks like your code might be searching a whole list or array. It shouldn't be too hard to make this better with smarter code and a better design.

> but I don't want to rewrite (our TCP/IP stack, for instance).

Why does a TCP stack lock out interrupts for so long? What is it doing (frankly, what is it doing wrong)?


There's another way: IPL7, the non-maskable interrupt level. I'd normally steer well clear of ever doing this, but I've just found that one of our other products uses IPL7 all over the place without any obvious problems.

If you type "IPL7" into the "Search" you'll find previous posts on this. 12 of them by me I'm surprised to find!

So to support Modbus I'd suggest writing the serial port interrupt service routine to interrupt at IPL7. You can do this all the time, or possibly just switch the level to IPL7 when you have a critical timing requirement.

If the rest of your code can tolerate interrupts being locked out for 3.5 character times (or the other critical periods), then just spin in a hard loop in that ISR to meet the timing requirements. If it can't, have the critical serial port interrupt drop the serial port back to its normal priority and start a PIT timer that interrupts at IPL7 when you need to do the next step of the protocol.

You should be able to design a simple state machine where the state transitions are made by normal, IPL7 and PIT interrupts.

The problem with IPL7 is that the ISRs can end up interrupting each other, and you can't stop that unless you're REALLY careful about clearing the interrupt requests.

Scrub the above. There's a far better way that just needs a small fix in your existing code.

Let's assume the "ensure atomic operations" are to prevent the Ethernet interrupt from getting in the middle of some sensitive mainline code. You don't need to disable ALL interrupts, just the Ethernet one.

So the "atomic lock" should simply disable the Ethernet interrupt in the interrupt controller, and put it back afterwards. Ditto with any other interrupts.

A cleaner way to do the same thing is, instead of totally disabling ALL interrupts, to just set the CPU IPL to the same level as the interrupt level of the device the code needs to "lock" against. So if the Ethernet is interrupting at Level 4, replace the TCP stack's "disable interrupt" calls with code that sets the CPU IPL to 4 and restores it afterwards. Run your UARTs at IPL5 or IPL6 and they won't be delayed.

This may require fixing all of your "disable interrupt" code, which I assume was written by people familiar with simple and stupid CPUs that only have one interrupt level (like 8-bitters and all CPUs derived from them, like x86 and ARM and so on :-). You should find the function "uint32_t asm_set_ipl(uint32_t)" in mcf5xxx.h; use it to change the IPL in all of your code, like this:

    uint32_t nOldIpl = asm_set_ipl(7);
    /* Interrupts now locked out completely, when you really have to */
    ... whatever ...
    asm_set_ipl(nOldIpl);

#define IPL_LEVEL_ETHERNET  (4)
/* Use the above #define when programming the Ethernet interrupt level in the ICR */
...

    nOldIpl = asm_set_ipl(IPL_LEVEL_ETHERNET);
    /* Ethernet interrupts locked out, but higher ones allowed */
    ... whatever ...
    asm_set_ipl(nOldIpl);

Tom
