M5225x Spurious Interrupts

mjbcswitzerland
Specialist V

Hi All

 

Does anyone have any words of wisdom concerning the following?

 

1) An M52259 is being used to transfer data between USB and UART0 [UART is 115k and the USB is in CDC class so is seen as a virtual COM port on the PC]. The transfer is bidirectional (at the same time) and during the test case text files are sent between two terminal emulators (several MBytes in size so that the test takes a bit of time).

 

This test runs reliably.

 

2) At the same time there is a web server running on the M52259 and this is tested in parallel by continuously and quickly refreshing a page with a fairly large image so that there are a high number of TCP transfers on each refresh.

 

When the two tests are run at the same time the board will fail at some point (hang).

 

3) The reason why it hangs is due to a spurious interrupt (the spurious interrupt routine hangs so that the cause can be looked into). It is found that the interrupt level, when the spurious interrupt occurs, is level 3.

 

It is also seen that usually, when the spurious interrupt occurs, there are three interrupts pending (UART0 Rx interrupt, UART0 Tx DMA interrupt and Ethernet Rx interrupt - sometimes also the USB-OTG interrupt, but not always). The two UART0 interrupts are interrupt level 3 (but different priorities - also no two interrupts in the system are configured with colliding level/priority). The Ethernet interrupt is level 6.

 

4) Just before the spurious interrupt takes place there is a small amount of data corruption in both directions (at least in one recording).

 

5) When the UART0 Tx operation is changed from DMA based to interrupt driven (interrupt for each transmitted byte rather than DMA based with a single interrupt after a block of bytes) the system runs reliably. It is then not possible to disturb the USB<->UART transmission test by a stress test on the Ethernet interface.

 

 

The question is thus, what could be the cause of the disturbance (resulting in a level 3 spurious interrupt - presumably from the UART0 DMA transmission completion) due to Ethernet activity? Why is this restricted to the UART Tx DMA, whereas in Tx interrupt driven mode there are many more interrupts taking place? What can cause a spurious interrupt (I have to admit to never having seen one before when working with the Coldfires)? Can it be something due to the fact that Ethernet and UART are working with DMA rather than the interrupt itself? Has anyone experienced such a problem and has any ideas as to what should be done?

 

Regards

 

Mark

1 Solution
mjbcswitzerland
Specialist V

Hi Chris and Mike

 

I am pleased to be able to report that the problem is not due to the chip but due to a program error. I checked out the XON/XOFF operation and found that there is indeed a problem when the UART sends an XOFF in DMA mode. I also found that the UART's rx buffer space is set up for only 64 bytes in the test so it doesn't take a great deal for this to fill up and cause an XOFF to be sent - the web server stress test was presumably slowing down the task handling UART reception so that the buffer was sometimes reaching the high water mark and triggering the XOFF transmission.
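The high water mark mechanism described above can be sketched as follows. This is a host-side illustration only, not the actual driver code: the names `rx_fifo_t`, `rx_put` and `rx_get` and the two watermark values are assumptions made for the example. Only the 64-byte buffer size comes from the test setup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u    /* buffer size used in the failing test */
#define HIGH_WATER  48u    /* assumed: send XOFF at or above this fill level */
#define LOW_WATER   16u    /* assumed: send XON at or below this fill level */
#define XON         0x11u
#define XOFF        0x13u

typedef struct {
    uint8_t data[RX_BUF_SIZE];
    size_t  head, tail, count;
    bool    xoff_sent;     /* true while the peer has been told to stop */
} rx_fifo_t;

/* Called from the Rx interrupt. Returns XOFF if that character must now
   be transmitted to the peer, otherwise 0. */
static uint8_t rx_put(rx_fifo_t *f, uint8_t c)
{
    if (f->count < RX_BUF_SIZE) {
        f->data[f->head] = c;
        f->head = (f->head + 1u) % RX_BUF_SIZE;
        f->count++;
    }                      /* else: overrun, the character is dropped */
    if (!f->xoff_sent && f->count >= HIGH_WATER) {
        f->xoff_sent = true;
        return XOFF;
    }
    return 0;
}

/* Called from the task draining the buffer. Returns false if the buffer
   is empty. *flow is set to XON if the peer may resume, otherwise 0. */
static bool rx_get(rx_fifo_t *f, uint8_t *c, uint8_t *flow)
{
    *flow = 0;
    if (f->count == 0u) {
        return false;
    }
    *c = f->data[f->tail];
    f->tail = (f->tail + 1u) % RX_BUF_SIZE;
    f->count--;
    if (f->xoff_sent && f->count <= LOW_WATER) {
        f->xoff_sent = false;
        *flow = XON;
    }
    return true;
}
```

With a 64-byte buffer and a slowed-down consumer task, `rx_put` reaches the high water mark quickly, which is the point at which the driver has to inject an XOFF into a transmit path that may be busy with a DMA transfer.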

 

I stepped through this and found that it was aborting the DMA transfer that was in progress (basically what it should do) and then sending the XOFF together with an enable of the TXRDY interrupt. It didn't give any spurious interrupts when stepping but I can well imagine that the DMA operation mixed with the TXRDY interrupt causes something strange to happen.

 

I disabled flow control and could no longer get it to hang. This means that the strange spurious situation is presumably possible with an incorrect interrupt configuration in combination with using peripheral DMA.

 

The up side is that I have learned (and documented) spurious interrupts well enough that this should never cause any problems in the future. Also I don't need to get involved with DMA arbitration tests...

 

The down side is that I now have to move on to solve a flow control problem in DMA mode (I believe that HW flow control is OK since I did quite a lot of work with that in DMA mode). Probably just throwing a bit more than 64 bytes at the rx buffer will also overcome any potential data loss in normal operation as well - until it is correctly sorted out.

 

Thanks for the ideas along the way - it always helps when there are others thinking out of other perspectives :smileywink:

 

Regards

 

Mark

 

 

PS: There were in fact 2 clues along the way which I overlooked:

1)

Quote "The difficulty in the case of the UART is that it is not possible to read the present interrupt mask state (write-only register) but I keep a backup variable which also shows me the last written state and this was also showing that these are enabled. (I checked very carefully that this was really kept perfectly synchronised to the written value)."

 

Here I saw that both Tx and Rx interrupt were enabled. In DMA mode one doesn't enable Tx interrupt - if I had remembered that it would have helped...

 

 

2)

Quote "I stopped the debugger and put a break point in the UART Rx interrupt and could then see that it was never entering. This can however also be due to the RS232/USB cable used for the tests (it can get locked in XOFF state) ..." The PC RS232/USB cables are pigs when they get stuck in this mode. They have to be removed and reinserted and I also had three blue screens during the tests (2 on Vista and 1 on XP) since USBSER.SYS (standard virtual COM driver) can also screw up totally. So I wonder why I didn't cotton on to the XOFF earlier....

 

 

Message Edited by mjbcswitzerland on 2010-02-08 01:45 AM

scifi
Senior Contributor I

Hi Mark,

 

I'd like to suggest a new line of investigation: the internal bus arbiter module. Personally, I never spent the time to study all the subtleties of its inner workings. But a quick glance suggests that it may have something to do with the problem. I wonder if some kind of internal bus contention is able to cause a spurious interrupt...

 

Regards,

- mike

mjbcswitzerland
Specialist V

Hi Mike

 

There are a few DMA settings that are possible and the only difference that I can see at the moment is that UART 0 is using DMA rather than interrupts. In fact the UART DMA end-of-transmission interrupt is not doing much differently to the Tx character interrupt; they actually call the same routine to disable the interrupt when transmission has terminated (both with interrupts disabled) so I am not sure whether it is really interrupt or DMA related.

 

The memory copy routines are also using DMA (but without an interrupt) and this is called a lot to copy data buffers for transmission by UART, USB and Ethernet, so I also disabled that to be sure (DMA channel 3) but it didn't change anything.

 

Before starting with the details of DMA I did another test. Rather than checking the interrupt mask and pending bits in the debugger (the pending bits can arrive later since the peripherals don't get stopped by the breakpoint in the spurious interrupt routine) I changed the routine to save the values immediately on entry (in the hope that only one pending flag would be set to give more information about the exact source). This is what the interrupt routine looks like:

 

 

static __interrupt__ void spurious_int(void)
{
    // save the interrupt controller state immediately on entry
    test[0] = IC_IPRH_0;
    test[1] = IC_IPRL_0;
    test[2] = IC_IMRH_0;
    test[3] = IC_IMRL_0;
    test[4] = IC_IRLR_0;
    test[5] = IC_IACKLPR_0;
    test[6] = IC_SWIACK_0;
    test[7] = IC_L1IACK_0;
    test[8] = IC_L2IACK_0;
    test[9] = IC_L3IACK_0;
    test[10] = IC_L4IACK_0;
    test[11] = IC_L5IACK_0;
    test[12] = IC_L6IACK_0;
    test[13] = IC_L7IACK_0;

    while (1) {}             // wait here and analyse with debugger....
}

 

and this is what is in the array when it takes place:

 

0x00000000 - no pending interrupts

0x00000000 - no pending interrupts

0xff5fffff - the interrupts which are masked/non-masked as expected

0xf77dddfc - ditto

0 - no interrupts pending

0 - no IACK

0 - no SWI

0x18 - no level 1 interrupt (0x18 is the spurious interrupt vector, which is taken if used)

0x18 - no level 2 interrupt

0x18 - etc.

0x18

0x18

0x18

0x18

 

 

This shows that, although the SR mask has been set to level 3 there is no interrupt pending. The spurious interrupt is taken since there is nothing pending at that level.

 

Now I don't have a reference as to what happens when a spurious interrupt occurs according to its definition but what I miss here is a pending interrupt in the IC_IPRH_0 or IC_IPRL_0 registers - I would have expected that this would be there since it triggered the interrupt but its mask in IMRH (or in its peripheral interrupt mask register) would be blocking the acknowledge cycle from taking place.

It may be that the pending bit was set but then cleared somehow between it causing the interrupt and the register being read, but the exact mechanism involved is not known to me.

 

I was hoping that just one pending bit would be set somewhere so that at least the offending source could be identified. Either the mechanism doesn't allow this or else the spurious interrupt is so spurious that it is causing an interrupt to take place without a mask bit being set...(?)

 

The debugger reads the interrupt controller registers when it breaks (but with a delay) and this shows the values as noted in the previous posts. This means that each of the involved interrupts actually arrives between the save in the spurious interrupt routine and the debugger's read; this also confirms that they can't be blocked (e.g. in their peripheral register) since they otherwise would not become pending shortly afterward. I suppose DMA settings really need to be looked into because I can't think of any coding issues that can lead to this (at the moment...)

 

Regards

 

Mark

 

 

scifi
Senior Contributor I

Hi Mark,

 

Is it true that if you simply ignore the spurious interrupt the system behaves as expected otherwise?

But even if that's the case, this is disturbing. It could be a bug in the silicon.

By the way, are temperature, supply voltage, clock frequency within the allowed range? Power supply not noisy?

 

- mike

mjbcswitzerland
Specialist V

Hi Mike

 

The original problem was reported here:

http://www.utasker.com/forum/index.php?topic=842.0

 

As you can see there, there is in fact some data corruption (in both transfer directions) just before the interrupt arrives (could be due to DMA (?) since USB and UART Tx use it). This means that by ignoring the interrupt the system does continue to run but errors have taken place which cannot be corrected.

 

My tests were with the M52259EVB and I did it with two PCs (swapped and mixed) and it is always the same.

 

The next step will be to see whether the workaround is also acceptable in the project reporting the problem. 

 

I am wondering whether it would be an idea to force a 'typical' spurious interrupt by disabling a peripheral's interrupt when it will take place (eg. a timer running at a high interrupt rate being enabled and disabled without disabling interrupts) and then comparing the state that exists when it fails. If a pending bit is indeed visible I would think that there is a strong argument for something else being the root cause, but it may be that it is not the case - then it would be a case of digging deeper...

 

Regards

 

Mark

 

PS. As a final test (for today) I decided to change the OTG interrupt from level 6 to level 5 since the OTG interrupt was never seen as pending just after the spurious interrupt took place. At level 5 it was then also seen (meaning that it wasn't being shown since the Ethernet Rx IRQ had priority over it and not because it had just caused the fault). Also I let the while loop in the spurious interrupt run a little so that the system TICK interrupt (level 2) also became pending. This gave the following pending interrupts at the various levels:

 

Level 2 = PIT0 (system Tick) - vector 0x77
Level 3 = UART0 Rx interrupt - vector 0x4d
Level 4 = UART Tx DMA complete - vector 0x49

Level 5 = USB-OTG - vector 0x75

Level 6 = Ethernet Rx  - vector 0x5b

 

[Remember also that the spurious interrupt always takes place at level 3 - or follows the level of the UART Rx interrupt if it is changed; there are no further interrupts configured at the same level as it]

 

In the SWIACK0 register the vector number of the highest priority waiting interrupt - Ethernet Rx - is being shown as expected.
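The vector numbers listed above map directly onto bits in the pending registers, which makes it easy to cross-check a register dump against the vectors. A minimal sketch, assuming interrupt controller 0 where the vector number is 64 plus the interrupt source number, with sources 0..31 reported in IPRL and sources 32..63 in IPRH. The type and function names are made up for this example.

```c
#include <stdint.h>

#define INTC0_VECTOR_BASE 64u   /* vector = 64 + source for controller 0 */

typedef struct {
    uint32_t iprh;   /* pending bits for sources 32..63 */
    uint32_t iprl;   /* pending bits for sources 0..31  */
} pending_t;

/* Valid for vectors 64..127 only. */
static unsigned vector_to_source(unsigned vector)
{
    return vector - INTC0_VECTOR_BASE;
}

/* Returns the IPRH/IPRL bit pattern that corresponds to one vector. */
static pending_t pending_from_vector(unsigned vector)
{
    pending_t p = { 0u, 0u };
    unsigned source = vector_to_source(vector);

    if (source < 32u) {
        p.iprl = (uint32_t)1u << source;
    } else {
        p.iprh = (uint32_t)1u << (source - 32u);
    }
    return p;
}
```

For example, vector 0x5b (Ethernet Rx) gives source 27 and therefore bit 27 of IPRL, and vector 0x75 (USB-OTG) gives source 53 and therefore bit 21 of IPRH.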

 

This again reinforces the fact that none of the interrupts involved are masked since they all become pending just after the spurious interrupt takes place. Since the CPU remains in the while loop with no interrupts enabled (NMI is not used) it could not have re-enabled these interrupts; the conclusion (possibly proved by these results) is that there were no possible sources which became masked to cause a spurious interrupt to take place according to the known spurious mechanism.

 

ChrisJohns
Contributor I

Hi Mark,

 

My experience with spurious interrupts has always ended up at the bus cycle level. As stated in the manual it will be due to a request being raised and going away as the IACK cycle is occurring. You will not get the other reasons with this hardware, e.g. BERR being raised. I assume this cycle is not divisible so you could conclude it is a timing-related issue. I also suspect you are right on the boundary of the timing. The Tx DMA would seem to hold the key for 2 reasons. First, you do not get the error when using per char interrupts, and second you should never get corrupted Tx data when using DMA this way. So the question is why do you get corrupted data?

 

Is the Rx interrupt doing something to affect the Tx side?

 

Is the corrupt data the same each time or does it vary ?

 

What baud rate are you using? Does selecting a higher speed affect the time it takes for the problem to occur?

 

mjbcswitzerland
Specialist V

Hi Chris

 

- The Rx interrupt is putting characters into the receive buffer - it can cause an XOFF to be sent if the buffer were getting full but I don't think that this is happening since the throughput is high (especially to the USB link). [This is something that I will look into though]

 

- I have just compared the data recording of the last error case with the one posted on the other forum. It does look to be the same in the UART0 Tx sense but a bit different in the other direction (but still similar).

 

- The UART speed is 115k (USB speed virtual). I can't go higher with the terminal emulator but I could go lower.

 

 

I just did a 'standard' Spurious interrupt test and the result is not good:

 

1) I removed the UART driver, disconnected USB and Ethernet cables so that they didn't generate any interrupts and started a 50us interrupt from PIT1 with Level 1/priority 5. This means that only this and a 50ms TICK on PIT0 are active.

 

2) After this was started I put the code into a forever loop:

 

 

    while (1) {
        IC_IMRH_0 |=  (PIT_1_PIF_INT_H);   // mask interrupt source
        IC_IMRH_0 &= ~(PIT_1_PIF_INT_H);   // unmask interrupt source
    }

 

This is doing exactly that which one must avoid and it does indeed cause a spurious interrupt to take place within quite a short time.

 

3) These are the results:

 

SR = 0x2104 (reflecting the PIT1 interrupt level as expected !)

 

The values saved on entry to the spurious interrupt routine:

 

IC_IPRH_0 = 0x01000000 (as expected showing the PIT1 pending !)

IC_IPRL_0 = 0x00000000

IC_IMRH_0 = 0xff7fffff (as expected showing that the mask is set, causing the spurious interrupt)

IC_IMRL_0 = 0xf77ffffc

 

IC_IRLR_0 = 0

IC_IACKLPR_0

IC_SWIACK_0

IC_L1IACK_0..IC_L7IACK_0  = 0x18

 

The values displayed by the debugger after it has run the while for a short time

 

 

 

IC_IPRH_0 = 0x01800000 (PIT0, the system TICK, has now fired too)

IC_IPRL_0 = 0x00000000

IC_IMRH_0 = 0xff7fffff

IC_IMRL_0 = 0xf77ffffc

 

IC_IRLR_0 = 0

IC_IACKLPR_0

IC_SWIACK_0 = 0x77 (the PIT0 waiting)

 

IC_L1IACK_0 = 0x18

IC_L2IACK_0 = 0x77               (PIT0 is at level 2)

IC_L3IACK_0..IC_L7IACK_0  = 0x18

 

 

I repeated with the UART driver active and sending a test message just before the while was entered. The result was the same apart from the fact that the debugger showed the UART0 TX DMA interrupt pending too in the debug register view.

 

 

Therefore this shows the mechanism involved in the stress test case not to be a typical spurious interrupt based on the fact that the spurious interrupt is almost certainly originating from the UART Rx (level corresponds) but the interrupt is not pending when it happens.
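The comparison made here can be written down as a simple check on the values saved at entry to the spurious interrupt routine. This is a sketch with made-up names (`intc_snapshot_t`, `has_masked_pending_source`). It only looks at the interrupt controller mask registers, so a source that was masked in a peripheral's own mask register would not be detected by it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values captured on entry to the spurious interrupt handler. */
typedef struct {
    uint32_t iprh, iprl;   /* pending registers */
    uint32_t imrh, imrl;   /* mask registers (1 = masked) */
} intc_snapshot_t;

/* True if at least one source is pending and masked at the same time,
   which is the signature of the typical spurious interrupt. */
static bool has_masked_pending_source(const intc_snapshot_t *s)
{
    return ((s->iprh & s->imrh) | (s->iprl & s->imrl)) != 0u;
}
```

Applied to the forced test (IPRH 0x01000000, IMRH 0xff7fffff) the check is true, since PIT1 is pending and masked. Applied to the stress test values (both pending registers 0) it is false, which is the difference being pointed out.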

 

Any possible explanations???

 

Regards

 

Mark

 

 

 

 

 

scifi
Senior Contributor I

Hi Mark,

 

The section 16.3.2 'Interrupt Mask Registers (IMRHn, IMRLn)' of the MCF52259 Reference Manual Rev. 2 has a note discussing spurious interrupts and ways to avoid them. Have you taken that information into account? Or does the spurious interrupt occur despite the right precautions being taken?

 

Regards,

- mike

mjbcswitzerland
Specialist V

Hi Mike

 

Thanks. I believe that all precautions have been taken and I have just carefully checked all interrupt masking which takes place during operation.

 

1) Ethernet: interrupts are never masked during operation

2) USB: the only time that interrupts are masked is when a USB reset takes place or enumeration completes. This doesn't take place during data transfer and all occurrences take place directly in the USB interrupt where the SR mask level is 7.

3) UART Rx. During operation the Rx interrupt is never masked

4) UART HW control line. HW flow control is not being used, but to be sure I checked and could verify that any changes to the control line mask were being performed in a protected code region (SR mask 7).

5) UART DMA complete interrupt. This interrupt disables the transmitter and also disables transmission interrupt but this happens directly in the interrupt routine (with SR mask 7). 

To be absolutely sure I added code in the sub-routine setting the interrupt mask which first checks that interrupts are disabled; this check never reported a call made with the SR mask level lower than 7, so I am certain that this cannot be a cause.
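A guard of the kind described above, together with the ordering that the reference manual note recommends (raise the SR interrupt mask level, change the mask register, then restore SR), might look like the following. This is a host-side simulation and not the project's code: `sim_sr` and `sim_imr` stand in for the real status register and IMRH/IMRL, and all function names are hypothetical. On the target the SR accesses would be done with the appropriate move to/from SR instructions.

```c
#include <assert.h>
#include <stdint.h>

#define SR_IPL_MASK 0x0700u             /* interrupt priority mask field in SR */

static uint16_t sim_sr  = 0x2000u;      /* simulated status register */
static uint32_t sim_imr = 0xffffffffu;  /* simulated mask register (1 = masked) */

/* Extracts the interrupt mask level (0..7) from an SR value. */
static unsigned sr_ipl(uint16_t sr)
{
    return (unsigned)(sr & SR_IPL_MASK) >> 8;
}

/* Guarded variant: refuses to touch the mask unless interrupts are blocked. */
static void imr_mask_guarded(uint32_t bits)
{
    assert(sr_ipl(sim_sr) == 7u);       /* caller must be at SR mask level 7 */
    sim_imr |= bits;
}

/* Self-protecting variant: raise the level, set the mask, restore SR. */
static void imr_mask_protected(uint32_t bits)
{
    uint16_t saved = sim_sr;

    sim_sr = (uint16_t)(sim_sr | SR_IPL_MASK);
    imr_mask_guarded(bits);
    sim_sr = saved;
}
```

The guard only proves that controller mask changes are protected. It says nothing about writes to a peripheral's own interrupt mask register, which have to be protected in the same way.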

 

During the tests the spurious interrupt still took place together with Ethernet load.

 

Regards

 

Mark

 

PS: In the meantime I am quite certain that this problem also only takes place when the USB<->UART transmission is in both directions at the same time. If I test with only data in one direction I haven't been able to provoke such a hang. I can only do it with both directions at the same time.

 

In one single test case that I had I didn't get a spurious interrupt but both directions stopped operating. The Ethernet interface was still working correctly in this state. What happened after this was quite strange:

- I stopped the debugger and put a break point in the UART Rx interrupt and could then see that it was never entering. This can however also be due to the RS232/USB cable used for the tests (it can get locked in XOFF state) so I pulled the cable out and the interrupt arrived (possibly due to change of line state, showing that the PC RS232/USB cable was probably at fault in this instance). I stepped a few lines and saw it handling the reception as a normal rx character and so pressed the debugger Run button. Immediately the spurious interrupt breakpoint was hit...

Not sure whether this helps explain anything but it seems strange.

 

mjbcswitzerland
Specialist V

Hi

 

If I understand the spurious interrupt correctly I suppose that the cause should actually be visible when a break is set on the spurious interrupt handler. Therefore I tried identifying something based on the interrupt registers. These are the contents:

 

SR = 0x2300 - here it is seen that the interrupt level is 3 (supposedly due to UART interrupt)

IPRH0 = 0x00200000 - here it is seen that one interrupt is pending = USB-OTG

IPRL0 = 0x08002200 - here it is seen that 3 interrupts are pending = FEC-RX + UART0 + DMA0

 

IMRH0 = 0xff5fffff - here it is seen that the USB-OTG mask is not set so this can not be a cause

IMRL0 = 0xf77dddfc - here it is seen that the other 3 pending interrupts are not masked, so again no reason for a spurious interrupt in this register.

 

[Note that the interrupt controller 1 is not used]

 

IRLR0 = 0x58 - this shows that interrupts with levels 6, 4 and 3 are presently pending (matches)

IACKLPR0 = 0 - this is supposedly the value when the controller didn't get an acknowledgement, and actually causes the spurious interrupt to execute.
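The SR and IRLR0 values quoted above can be decoded mechanically. A small sketch with made-up helper names, assuming that the interrupt mask level sits in SR bits 10..8 and that bit n of IRLR (n = 1..7) flags a pending request at level n:

```c
#include <stdint.h>

/* Interrupt mask level (0..7) held in SR bits 10..8. */
static unsigned sr_interrupt_level(uint16_t sr)
{
    return (unsigned)(sr >> 8) & 7u;
}

/* Writes the pending levels found in IRLR to levels[], highest first,
   and returns how many were found. Bit 0 of IRLR is ignored. */
static unsigned irlr_levels(uint8_t irlr, unsigned levels[7])
{
    unsigned count = 0;

    for (unsigned level = 7u; level >= 1u; level--) {
        if ((irlr & (1u << level)) != 0u) {
            levels[count++] = level;
        }
    }
    return count;
}
```

For SR = 0x2300 this gives level 3, and for IRLR0 = 0x58 it gives levels 6, 4 and 3, matching the interpretation in the post.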

 

 

Since the second reason for a spurious interrupt is when the interrupt is masked at the peripheral level I also checked the interrupt masks in the peripheral registers and found none masked that were used (that is, none masked that had previously been unmasked by code).

 

The difficulty in the case of the UART is that it is not possible to read the present interrupt mask state (write-only register) but I keep a backup variable which also shows me the last written state and this was also showing that these are enabled. (I checked very carefully that this was really kept perfectly synchronised to the written value).
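The backup variable approach can be sketched like this. It is an illustration rather than the actual driver: `fake_uimr` stands in for the write-only hardware register, the function names are hypothetical, and the two bit positions are assumptions that should be checked against the reference manual. The consistency check reflects the observation in the solution post that the Tx interrupt is not enabled when transmission is handled by DMA.

```c
#include <stdbool.h>
#include <stdint.h>

#define UIMR_TXRDY 0x01u   /* assumed bit position of the Tx ready interrupt enable */
#define UIMR_RXRDY 0x02u   /* assumed bit position of the Rx ready interrupt enable */

static volatile uint8_t fake_uimr;   /* stand-in for the write-only register */
static uint8_t uimr_shadow;          /* last value written, readable by software */

/* Every write goes through here so that the shadow can never drift. */
static void uimr_write(uint8_t value)
{
    uimr_shadow = value;
    fake_uimr   = value;             /* on target: write to the UART's UIMR */
}

static void uimr_set(uint8_t bits)
{
    uimr_write((uint8_t)(uimr_shadow | bits));
}

static void uimr_clear(uint8_t bits)
{
    uimr_write((uint8_t)(uimr_shadow & (uint8_t)~bits));
}

/* False if the Tx interrupt is enabled while Tx is being handled by DMA. */
static bool uimr_consistent_with_tx_dma(bool tx_dma_active)
{
    return !(tx_dma_active && (uimr_shadow & UIMR_TXRDY) != 0u);
}
```

A check like `uimr_consistent_with_tx_dma` could be evaluated in the spurious interrupt handler or in a debug build whenever the mask is changed.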

 

So there are still no traces of an interrupt occurring which is masked (either at peripheral or interrupt controller level), which is, as far as I understand it, the cause of a spurious interrupt. But the spurious interrupt can be quite easily reproduced under the discussed conditions. (And never occurs if UART Tx interrupt is used instead of UART Tx DMA operation).

 

Regards

 

Mark

 

mjbcswitzerland
Specialist V

Hi All

 

One additional piece of information. The spurious interrupt in fact seems to be originating from the UART0 Rx interrupt and not the UART0 DMA end of transmission interrupt. When I change the interrupt level of this interrupt (e.g. from 3 to 2) the spurious interrupt arrives with level 2; that means that it follows the UART0 Rx interrupt level...

 

Note also that past projects which have used the V2 chips to do streaming applications between Ethernet and UARTs (also using DMA) didn't suffer from this problem. Also past projects performing streaming between USB and UART (also using DMA) didn't have any problems. It is only when UART DMA, Ethernet and USB are active together that it has ever been seen.

 

Regards

 

Mark

 
