Overrun detection needed for reliable SCI reception?

implicit · ‎03-21-2013

Basically, I'm having trouble with overruns causing a simple SCI receiver to get stuck in a state where no further interrupts can occur.

It seems that an overrun occurring between polling SCIS1 and reading SCID leaves the peripheral with RDRF cleared but no longer receiving data, until the buffer is flushed by reading SCID.

__interrupt VectorNumber_Vscirx void ReceiveInterrupt(void) {
    char data;
    (void) SCIS1;
    // The problematic overflow occurs here
    data = SCID;
}

For the record this happens on both 9S08QE8 and 9S08SH8 but not in the full-chip simulator.

This has rather caught me by surprise, as I've previously always just ignored UART errors. It leaves me with a lot of potentially vulnerable old code to fix up across various Freescale devices, and possibly on MCUs from other manufacturers too.

So what is the intended way of reliably reading data?

The following is a simple repro case using the loopback mode, which always gets stuck after the overrun.

#include "derivative.h"

// Transmit a character and busy-wait for it to get through
static void transmit(char data) {
    SCID = data;
    data = 0;
    do {
        __RESET_WATCHDOG();
    } while(++data);
}

void main(void) {
    char overrun = 10;

    // Initialize the UART in loopback mode
    SCIBD = 1;
    SCIC1_LOOPS = 1;
    SCIC2 = SCIC2_RE_MASK | SCIC2_TE_MASK;
    (void) SCIS1;
    (void) SCID;

    // Keep transmitting character to ourself
    for(;;) {
        transmit(overrun);

        // Received anything?
        if(SCIS1_RDRF) {
            // Force an intermittent buffer overflow. Imagine the next character
            // arriving in the middle of a delayed interrupt
            if(!--overrun)
                transmit(overrun);

            // Actually fetch the character
            (void) SCID;
        }
    }
}

Lundin · ‎03-22-2013

Please note that (void)SCIS1; doesn't necessarily produce a read of the register, even if SCIS1 has been declared as volatile. Whether a compiler is required to generate such code has been debated; no matter what the standard says, there are plenty of compilers that will not generate any code when encountering that line. I would disassemble the code to ensure that an actual read takes place.

A better, portable way is to do:

volatile uint8_t scis1_dummy;

scis1_dummy = SCIS1;

implicit · ‎03-22-2013

Thanks for the heads-up.

I've double-checked the compiler output and CodeWarrior does indeed generate loads for these statements. Unfortunately storing the result to a volatile dummy variable generates an additional write of the result to the stack, increasing the window of opportunity for this particular issue by a couple of cycles. I don't suppose there's a more efficient way of reliably forcing the desired behaviour?

To be honest I had rather thought this was guaranteed by the standard and universally supported but after my adventures with Microchip's MCC18 I should have learned never to take anything for granted.

bigmac · ‎03-24-2013

Hello Johan,

Johan Forslöf wrote:

I've double-checked the compiler output and CodeWarrior does indeed generate loads for these statements. Unfortunately storing the result to a volatile dummy variable generates an additional write of the result to the stack, increasing the window of opportunity for this particular issue by a couple of cycles. I don't suppose there's a more efficient way of reliably forcing the desired behaviour?

I have never had a problem with the CW compiler providing a read for a volatile register. However, an alternative method would be to provide macros that use HLI assembler to generate the read, without the overhead of creating and writing to a variable.

#define SCIS1_read __asm lda SCIS1

#define SCID_read __asm lda SCID

Regards,

Mac

bigmac · ‎03-22-2013

Hello Johan,

It is likely that the overrun error has already occurred by the time that the ISR code executes. An overrun occurs when a new character has been received by the SCI, but the previous character within the receive buffer has not yet been read, with the new data being lost.

The RDRF flag does not actually clear until both the SCIS1 register has been read, followed by the SCID register.

One possible cause of the overrun is that the commencement of the SCI receive ISR is delayed by the execution of another ISR, possibly one of the TPM interrupts, which have higher priority in a pending interrupt situation. There may be a number of reasons for this:

The SCI baud rate is too high for the bus clock frequency in use.
The SCI receive ISR takes too many execution cycles. Any lengthy manipulation of the received data should be done outside the ISR.
One or more of the other enabled interrupts takes too many execution cycles.

To ensure that overrun does not occur, 10 times the baud period needs to be greater than the execution cycles of the SCI receive ISR itself, plus the worst case execution cycles of the longest "other" ISR. Assuming this is the problem, it demonstrates the need to always keep all ISR code as short as possible.
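As a rough illustration of that budget, here is a minimal sketch in C. The function names, the cycle counts and the 8 MHz bus clock used in the example are all made-up assumptions for the arithmetic, not figures from this thread, and the check ignores interrupt entry latency and any code that runs with interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus cycles occupied by one 10-bit character frame
 * (start bit + 8 data bits + stop bit). */
static uint32_t char_time_cycles(uint32_t bus_hz, uint32_t baud)
{
    assert(baud > 0);
    return (uint32_t)((10ull * bus_hz) / baud);
}

/* True if the receive ISR plus the longest other ISR still fits
 * inside one character time, i.e. the rule of thumb above holds. */
static bool rx_budget_ok(uint32_t bus_hz, uint32_t baud,
                         uint32_t rx_isr_cycles,
                         uint32_t worst_other_isr_cycles)
{
    return char_time_cycles(bus_hz, baud) >
           rx_isr_cycles + worst_other_isr_cycles;
}
```

With an assumed 8 MHz bus at 115200 baud, one character lasts about 694 bus cycles, so a 60-cycle receive ISR can tolerate another ISR of 500 cycles but not one of 700 cycles.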

The state of the OR flag, as well as the RDRF flag, may be tested within the ISR. Whenever an overrun error is detected, you will probably need to flag that an error response needs to be returned, so that the original data can be re-sent by the remote end.

I do not know the reason for the overrun not appearing during FCS, assuming bus frequency and baud rate are the same in both instances.

Regards,

Mac

implicit · ‎03-22-2013

Hello Mac, and thank you for taking a look at my ramblings. However, I think you misunderstood my problem. An occasional overrun during communication is quite tolerable; we use checksumming and a reliable delivery scheme to ensure that data will eventually get through intact.

My real issue is that an overrun which happens to arrive between the polling of SCIS1 and the reading of SCID appears to leave the receive buffer full without RDRF set (and with OR raised).

The effect being that no further incoming data will *ever* be processed, since RDRF cannot be set until the buffer gets flushed.

I can work around the issue easily enough by implementing an overrun interrupt which manually flushes the buffer, but it seems to me that this behaviour leaves the receiver vulnerable to a very subtle and insidious race condition.

This leaves me wondering whether I am reading data in the wrong way, whether this might be a silicon bug in the 9S08 SCI, or whether I am labouring under some other misapprehension.

bigmac · ‎03-22-2013

Hello Johan,

Yes, it would be possible for an overrun condition to occasionally occur after reading SCIS1 register, but prior to reading the SCID register, due to the arrival of the next character. I would expect that this should not occur very frequently with only a few cycles between the two processes (I assume this is actually the case). Here, I would assume that the RDRF flag would be cleared, but the OR flag would remain set.

Possible solution 1:

Immediately after reading SCID, read SCIS1 register again. If any error flag is set, do another dummy read of SCID.

Possible solution 2:

Enable overrun interrupt for SCI error interrupt. The minimum ISR code should read SCIS1 and then read SCID.

Of course, this ignores the reasons why the overruns are occurring in the first instance.
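To make solution 1 concrete, here is a small host-side sketch. It does not touch real registers: sci_model and its helper functions are hypothetical stand-ins that mimic the flag behaviour as described in this thread (a data read clears only the flags seen by the preceding status read, and nothing new is received while the buffer is full or OR is pending). The bit masks are assumed to match SCIS1. On the target, the two read helpers would simply be reads of SCIS1 and SCID.

```c
#include <stdint.h>

#define RDRF 0x20u   /* receive data register full (assumed SCIS1 bit) */
#define OR   0x08u   /* receiver overrun (assumed SCIS1 bit) */
#define ERRS 0x0Fu   /* OR | NF | FE | PF */

/* Hypothetical model of the SCI receiver, for illustration only. */
typedef struct {
    uint8_t status;  /* models SCIS1 */
    uint8_t data;    /* models SCID */
    uint8_t seen;    /* flags observed by the last status read */
} sci_model;

static uint8_t sci_read_status(sci_model *s)
{
    s->seen = s->status;
    return s->status;
}

/* A data read clears only the flags the last status read observed. */
static uint8_t sci_read_data(sci_model *s)
{
    s->status &= (uint8_t)~s->seen;
    s->seen = 0;
    return s->data;
}

/* A character arrives: it is lost if the buffer is full or OR is pending. */
static void sci_receive(sci_model *s, uint8_t c)
{
    if (s->status & (RDRF | OR)) {
        s->status |= OR;
    } else {
        s->data = c;
        s->status |= RDRF;
    }
}

/* Solution 1: fetch the character, then re-read the status and do a
 * dummy data read if any error flag is set. The caller is assumed to
 * have read the status register already, as the receive ISR does. */
static uint8_t sci_fetch(sci_model *s)
{
    uint8_t c = sci_read_data(s);
    if (sci_read_status(s) & ERRS) {
        (void)sci_read_data(s);   /* flush so reception can resume */
    }
    return c;
}
```

In this model, a character that arrives between the status read and the data read leaves OR set with RDRF clear after a plain data read, and later characters are dropped. With sci_fetch the stale OR flag is cleared and the next character sets RDRF again.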

Regards,

Mac

implicit · ‎03-22-2013

Indeed the window of opportunity is very short and the problem occurs exceedingly seldom. In this particular application it would occasionally lock up a node every few days when the receiver interrupt happened to be sufficiently delayed and the stars were in alignment.

After that the operator would need to manually reset the system to recover.

I believe the actual cause of the delays was a FLASH write temporarily disabling interrupts, though it may have been something else. Obviously frequent overruns are not acceptable and I'll certainly strive to avoid them, but guaranteeing minimum interrupt latencies in all possible paths of the application would be an awful lot of work.

Oh well, time to go back and add overrun handling to various old applications. Thanks for confirming this behaviour!

kef · ‎03-22-2013

Johan,

you should have an SCI error interrupt handler to handle SCI overruns. I think this will solve your issue.

BTW there's a bug in your transmit routine. You need to read status register before writing SCID.
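For reference, a minimal sketch of what such an error handler might look like. The register variables below are host-side stand-ins so the fragment compiles off-target; on the MCU the names come from derivative.h and must not be redefined. The ORIE bit position, the Vscierr vector name and the sci_overruns counter are assumptions for illustration, so check them against the device header before use.

```c
#include <stdint.h>

/* Stand-ins for the real registers (assumption: off-target build only). */
static volatile uint8_t SCIS1, SCID, SCIC3;
#define SCIC3_ORIE_MASK 0x08u   /* assumed position of the ORIE bit */

static volatile uint8_t sci_overruns;   /* hypothetical diagnostic counter */

/* Enable the overrun source of the SCI error interrupt. */
static void sci_enable_overrun_irq(void)
{
    SCIC3 |= SCIC3_ORIE_MASK;
}

/* On the target this would be declared along the lines of
 *   __interrupt VectorNumber_Vscierr void ErrorInterrupt(void)
 * (vector name assumed, by analogy with VectorNumber_Vscirx). */
static void ErrorInterrupt(void)
{
    (void) SCIS1;   /* first half of the flag-clearing sequence */
    (void) SCID;    /* second half: flush the buffer, discard the byte */
    if (sci_overruns != 0xFFu) {
        sci_overruns++;         /* saturating count for later inspection */
    }
}
```

The counter is optional. The two reads are the part that gets the receiver going again, and the count only makes it possible to see afterwards how often the condition occurred.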

implicit · ‎03-22-2013

Yes, handling overflow interrupts seems to work, thanks.

As for the transmit routine I'm explicitly not reading the status register in that example to be able to demonstrate the issue through the loopback interface. In the actual application the data is sent by a different host.
