MCF52259 Spurious interrupt, help tracking it down

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

MCF52259 Spurious interrupt, help tracking it down

Jump to solution
3,284 Views
FridgeFreezer
Senior Contributor I

A bit cheeky as I've already posted in the CW section but not had much luck:

https://community.freescale.com/message/75148#75148

 

We are using an MCF52259 on our own PCB, CW7.2. We have interrupts and ISR's set up for:

QSPI

I2C

UART0,1 & 2

/IRQ1,3,5,7

PIT0,1 (as fixed interval timers)

DTIM0,1,2,3 (variable freq, driving stepper motors)

RTC (seconds)

 

All seem to be working correctly, but at seemingly random intervals we are getting spurious interrupts. There seems to be no pattern to it - we can shove data up the UARTS, drive motors, leave the thing idle, whatever and sometimes it is fine for hours but other times it hits the spurious ISR in seconds. In our current prototype we are not using UART2, but I have left the ISR in place doing a dummy read of the Rx buffer and clearing any flags.

 

I have read MJBSwitzerland's excellent post about spurious interrupts, but nothing contained in it seems to have helped.

 

Does anyone have any wisdom on how to track down the source of the problem?

Labels (1)
1 Solution
1,281 Views
TomE
Specialist II

Glad to help. Check your Forum private mailbox.

 

Tom

 

View solution in original post

0 Kudos
12 Replies
1,281 Views
TomE
Specialist II

> A bit cheeky as I've already posted in the CW section but not had much luck:

 

It would have helped if you'd summarised here all the things you've already tried there. I was going to suggest "masking interrupts around IMR accesses" but you've already tried that:

 

> It seems that common best practice is to mask interrupts by SET_IPL(7)

> before modifying the interrupt mask to prevent spurious interrupts. I'm

> now doing this every time I modify the IMR. I don't know if there are

> other places this should be done, for example when modifying

> peripherals interrupt control registers?

 

Then you have to re-enable interrupts after that somehow. It is better practice to use something like:

 

    saved_isr = SET_IPL(7);

    Change the IMRs;

    SET_IPL(saved_isr);

 

But you need a version of "SET_IPL()" that returns the CURRENT IPL.

 

If your code is simple enough so that you can guarantee that you never call "SET_IPL(7) ; xxx ; SET_IPL(0);" anywhere that the IPL isn't already zero you're OK, but that prevents you from writing code with functions called from both mainline and interrupt-code. Technically it doesn't stop you doing this, it just makes lots of new and nasty bugs.

 

> I don't know if there are

> other places this should be done, for example when modifying

> peripherals interrupt control registers?

 

The interrupt is about to happen, and then the mask is changed so it is blocked. That's like going to sit on a chair and having someone pull it away. That's a "spurious interrupt". Any time you do anything to a control register that might make it "pull the chair away" you have to mask CPU interrupts. So if you're changing any peripheral registers to disable an interrupt you have to do it there too.

 

To find out where it is happening, have your Spurious interrupt routine DUMP THE STACK. You should see the PC on the stack just after the code that disabled the interrupt (that you shouldn't have done). If you're running under a debugger, put a breakpoint in there and then the debugger should show where it was interrupted from.

 

Tom

 

0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

> Then you have to re-enable interrupts after that somehow. It is better practice to use something like:

>

>    saved_isr = SET_IPL(7);

>    Change the IMRs;

>    SET_IPL(saved_isr);

>

> But you need a version of "SET_IPL()" that returns the CURRENT IPL.

 

Sorry, I didn't give the full picture, the code does indeed do:

set_ipl(7)

<twiddle registers>

set_ipl(0)

 

Using the included Freescale routines from "mcf5xxx.c":

void mcf5xxx_irq_enable (void)
{
    asm_set_ipl(0);
}

 

And:

void mcf5xxx_irq_disable (void)
{
    asm_set_ipl(7);
}

 

> If your code is simple enough so that you can guarantee that you never call "SET_IPL(7) ; xxx ; SET_IPL(0);"

> anywhere that the IPL isn't already zero you're OK, but that prevents you from writing code with

> functions called from both mainline and interrupt-code.

> Technically it doesn't stop you doing this, it just makes lots of new and nasty bugs.

 

Can you explain what you mean by this? Are you saying that, for example, in an ISR the IPL may be non-zero and that blindly resetting it to 0 will mess things up? If so, it seems my predecessor and possibly also freescale have made the same mistake in places :-O

 

Unfortunately I've come into this embedded lark from a bit of a sideways direction (most previous coding in PHP!)  so there are some glaring gaps in my knowledge where "everyone knows that" but I really don't. For example, I have no idea how I might dump the stack in C (I realise I can see it in the debugging window though).

0 Kudos
1,281 Views
TomE
Specialist II

I wrote:

>> If your code is simple enough so that you can guarantee that you never call "SET_IPL(7) ; xxx ; SET_IPL(0);"

>> anywhere that the IPL isn't already zero you're OK, but that prevents you from writing code with

>> functions called from both mainline and interrupt-code.

>> Technically it doesn't stop you doing this, it just makes lots of new and nasty bugs

>

> Can you explain what you mean by this? Are you saying that,

> for example, in an ISR the IPL may be non-zero and that

> blindly resetting it to 0 will mess things up?

 

Yes, sure will.

 

The Coldfire CPUs have seven IPLs. Devices can interrupt at any level from 1 to 6, or 7 which is "non maskable" and best avoided.

 

This means that when everything is set up OK, a device interrupting at (say) level 6 can interrupt OTHER INTERRUPT SERVICE ROUTINES running from lower level interrupts. This can be very useful if you need it.

 

Other primitive ancient CPUs (cheap 8-bitters and their derivatives like the Pentium in your PC Smiley Happy have only one level. They can support multiple levels by faking it in the interrupt controller. This means the CPU has to program the interrupt controller with the heirarchy if multi-level nested interrupts are needed and let it know when interrupt service routines end. Whether this should be done in the CPU ir the interrupt controller is somewhat of a "religious argument", and you can see which one I like.

 

The CPU starts of in the "mainline" at Level zero. When a level "N" interrupt happens, the CPU's IPL is set to Level N. That means that device can't interrupt the CPU again. It is really bad news if you're in an ISR (Interrupt Service Routine), get the levels wrong and the device interrupts the SAME service routine in the middle. This is uncontrolled re-entrancy and almost always bad.

 

So if you're running in an interrupt service routine for a device at level "N", and have to set the level to "7" in order to mess with the mask register, it is ESSENTIAL to restore the level back to "N" and *NOT* back to Zero.

 

The ugly and dangerous way to do this is to have to know that a particular ISR is interrupted by a device at a specific hardwired level (say "5"), and then you do::

 

    asm_set_ipl(7);

    Change mask;

    asm_set_ipl(5);

 

This is "ugly" as it means you can change the IPL somewhere else and forget to change it in this routine.

 

It also means you can't write general purpose functions that need to disable the interrupts and be called from mainline (return IPL to zero) and multiple different ISRs where the IPL has to be returned to different values. It also means that if someone cut/pastes that code into a different ISR and doesn't change the "magic numbers" it'll cause more problems.

 

*BUT* if you track down the source code for "asm_set_ipl()" (should be in mfcxxx.s) you should find it returns the previous level. So your code for disabling interrupts (everywhere that you disable and restore interrupts) should be:

 

    int saved_ipl = asm_set_ipl(7);

    Change the IMRs;

    asm_set_ipl(saved_ipl);

 

If you can't find mcf5xxx.s in your distribution, then google for "_mcf5xxx_wr_other_sp" and you'll find a copy somewhere.

 

> For example, I have no idea how I might dump the stack in C (I realise I can see it in the debugging window though).

 

And when you check the debugger, where was the code when the spurious interrupt happened? That's where the bug is. That's the problem you were looking for.

 

Here's the simple way to dump the stack ("DEBUG" is our print function - replace with whatever you're using, and "dump_stack890' dumps frames, but you could just have it print 10 or 20 words and look for addresses manually)::

 

 

__attribute__((interrupt_handler)) static void int_addr_isr( void ){    uint32_t i[1];    unsigned int sr = i[2] & 0xffff;    unsigned int fs = ( ( i[2] >> 16 ) & 0x3 ) | ( ( i[2] >> 24 ) & 0xc );    unsigned int vector = ( i[2] >> 18 ) & 0xff;    unsigned int format = ( i[2] >> 28 );    unsigned int pc = i[3];    DEBUG( 1, "sr=0x%04X fs=0x%01X vector=%d format=0x%02X pc=0x%08X",           sr, fs, vector, format, pc );     dump_stack( &i[1], 1 );

 

 

Note that "i" is a local variable on the stack. So the rest of the stack is "up" from the address of that variable. For each CPU you want to do this with you have to check the manuals to see what the exception stack looks like and where the PC and other things are. Since the "frame pointer" is pushed on the stack it is pretty simple to walk back through all previous stack frames, printing all the previous function addresses too. That's how the debugger does it.

 

But if you've got a debugger it'll do all that for you.

 


0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

Thanks Tom, that was incredibly handy :smileyhappy:

 

I had done some digging along the lines you suggested and found the asm_set_ipl() routine returns a value, is there any reason not to use the following to mask/restore interrupts?

 

 

void TwiddleInts(uint8 onoff){ static uint8 ipl = 0;  if(!onoff) {  ipl = asm_set_ipl(7); } else
        {  asm_set_ipl(ipl); }}

 

 

0 Kudos
1,281 Views
TomE
Specialist II

> is there any reason not to use the following to mask/restore interrupts?

 

Opaque. Hidden variable. Requires strictly calling it in "disable/enable" pairs.

 

Provides no extra functionality to directly calling asm_set_ipl() apart from making bugs more likely and being slower.

 

It doesn't look thread-safe or interrupt safe as there's only one storage location for the saved interrupt (which can't be interrupted, so it is safe, but took me 5 minutes to prove it). So almost by accident, if called properly, it probably is safe.


Don't use "uint8" on this CPU. Just use "int". All parameters are passed as 32 bits and so 16 and 8 bit variables on this CPU are "harder". Ditto storage, unless you have arrays of 8 or 16 bits.

 

"uint8 onoff" should be "bool" or preferably "typedef enum { INTERRUPT_OFF, INTERRUPT_ON } eInterrupt_t;", as otherwise it isn't obvious what the value should be.

 

0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

Tom, just one question about your routine above:

If you define a one-element array uint32 i[1] then the first element is accessed as i[0], the 2nd element is i[1], so should the references to i[2] as stack frame pointer in fact be i[1] ?

 

After waiting for the thing to crash again (things never fail when you want them to) I have a result (using your routine, subject to my query above):

FS = 0x00000000

Vector = 0x00

Format = 0x00

PC = 0x00000001

SR = 0x0105

 

Which doesn't seem quite right to me (as in I don't entirely believe the answer).

0 Kudos
1,281 Views
TomE
Specialist II

 

> Tom, just one question about your routine above:

> If you define a one-element array uint32 i[1] then the first element is accessed

> as i[0], the 2nd element is i[1], so should the references to i[2] as stack

> frame pointer in fact be i[1] ?

 

No. The array is there to get the address of where the stack pointer is.
Backtrack a bit. Say your code is running, there are variables on the stack and the stack pointer is currently pointing to just under the last of them. They you get an interrupt. An "Exception Stack Frame" is pushed onto the stack. Search for that in your Reference Manuals. It consists of one 32-bit word for the interrupted Program Counter and then one word for the Status, Format, Vector and Status Register. Then you end up in your interrupt service routine. It pushes the Frame Pointer on the stack, allocates the local variables (in this case "i[1]"), saves the registers and leaves the stack pointer pointing under that.
Here's the disassembly. You should do the same to your code and make sure it is similar, or if anything else has snuck in, or if it is using a different calling convention. I'm using CodeSourcery's gcc-based build system running under Cygwin, so "m68k-elf-objdump is what I use to get a dissassembly. Your tools may do this from within the gui, or you might be able to disassemble within the debugger more easily:
$ m68k-elf-objdump -S interrupt.o | less
...
__attribute__((interrupt_handler))
static void int_spur_isr( void )
{
   0:   4e56 ffdc       linkw %fp,#-36
   4:   48d7 033f       moveml %d0-%d5/%a0-%a1,%sp@
   8:   242e 0004       movel %fp@(4),%d2 <<<<---- Stacked first word
   c:   701c            moveq #28,%d0
   e:   2602            movel %d2,%d3
  10:   e0ab            lsrl %d0,%d3
  12:   2802            movel %d2,%d4
  14:   103c 0012       moveb #18,%d0
  18:   e0ac            lsrl %d0,%d4
  1a:   2202            movel %d2,%d1
  1c:   103c 0018       moveb #24,%d0
  20:   e0a9            lsrl %d0,%d1
  22:   2002            movel %d2,%d0
  24:   2a2e 0008       movel %fp@(8),%d5 <<<<----- Stacked PC
  28:   2f05            movel %d5,%sp@-
  2a:   2f03            movel %d3,%sp@-
  2c:   760c            moveq #12,%d3
  2e:   4240            clrw %d0
  30:   4840            swap %d0

 

So in this simplistic case, i[0] is the variable, i[1] is outside of the bounds of our array - that 's the trick, but is the stacked Frame Pointer, i[2] is the SR/Status/Format/Vector, i[3] is the program counter and i[4] is looking back into the previous stack frame.

Where it all goes wrong is if you have "indirect interrupt service routines" and you've put your code in a function that is called from the real interrupt service routine. There'll be another stack frame in there in that case. likewise if you have more local variables and the compiler doesn't make "i[1]" the FIRST one, but puts something else in front of it (it is allowed to). You should make the ISR *VERY* simple, only containing "i[1]} and call through to another function to do the printing. You have to disassemble the function to find out where "fp" (the Frame Pointer) is pointing to, and then use i[2], i[3], i[4] or whatever matches your code and compiler and so on.
But you have a debugger so why not set a breakpoint and ask it where you are?

> FS = 0x00000000

> Vector = 0x00

> Format = 0x00

> PC = 0x00000001

> SR = 0x0105

> Which doesn't seem quite right to me (as in I don't entirely believe the answer).

 

That means i[2] is 0x00000105 and i[3] is 0x00000001.

 

I'd expect 0x418020xx for the first word and something within your code space for the second.

 

Why not dump from i[1] to i[30] in hex and look for matching patterns? Some of the words should match to addresses in your code.

 

Can you call in someone who has done this before? Assembly level debugging is a very steep leaning curve. "Higher" language programmers  (even PHP Smiley Happy never need to know this stuff."

 

0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

Thanks again Tom, more excellent advice. It's been a long time since I touched asm (6502 & Z80 when they were still just about current!) I thought the way of things these days was to get a Java machine running on it and then forget all about that low-level nonsense :smileytongue: I'm still amazed by the state of things these days - using an arduino to flash an LED :smileysurprised:

 

I will try your suggestions for proper handling of the mask/unmask routine and debug dump, I'm learning through necessity here (often the best way to learn, if not most relaxing), unfortunately on this project we are "the guys who know about stuff" so have to work it out ourselves.

0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

Well, the latest is that I went through all the code, scattered the proper disable-store-twiddle-enable code liberally about whenever I found the IMR or peripheral interrupt registers being modified and so far there's been no more spurious interrupts.

 

I hesitate to say "problem solved" (very difficult with intermittent faults of any sort) but it certainly looks like a great leap in the right direction. Plus I've learned a thing or two along the way, so a good result all round.

 

Cheers Tom!

0 Kudos
1,282 Views
TomE
Specialist II

Glad to help. Check your Forum private mailbox.

 

Tom

 

0 Kudos
1,281 Views
Nouchi
Senior Contributor II

Hello,

 

In my case , spurious interrupt was generated by a bad UART programming, RX/TX interrupt and was enabled while DMA transfer was enabled too.

DMA channel clear pending UART flags, while interrupt controller knows there's a pending interrupt, but doesn't know which is pending because flag is cleared.

Set breakpoint  inside spurious exception routine, and try to see in IPRL/IPRH register which interrupt is pending, and try to find an explanation :smileyhappy:

 

Emmanuel

0 Kudos
1,281 Views
FridgeFreezer
Senior Contributor I

We don't have DMA configured (we're not using it) but I will double-check the settings.

 

When it drops into the spurious ISR, as per my other thread the registers are as follows:

 

IPRH: 0x00000000

IPRL: 0x00008000

IMRH: 0x7E7FFFFF

IMRL: 0XFF839F56

IRLR: 0x00000000

IACKLPR: 0x00000000

SWIACK: 0x00000000

L1IACK: 0x00000000

L2IACK: 0x00000018

L3IACK: 0x00000018

L4IACK: 0x00000018

L5IACK: 0x00000018

L6IACK: 0x00000018

L7IACK: 0x00000018

0 Kudos