AN3470SW CW 6.3 memmove / tk_switch problem

DarrenSteadman · ‎10-17-2007

This is a bit of a strange one and it has taken me a while to debug things this far. (I'm totally new to Coldfire)

I've been using the demos from the AN3470SW app note to create an application that talks to some of our existing equipment via serial and passes the data along to a PC via Ethernet, so a basic serial-to-Ethernet converter. To get this working properly I had to modify the example slightly, as well as the size of some of the TCP buffers, to get optimal throughput for our application.

I then added the HTTP server example to the code so I could work on a way of configuring the device.

This is when things went wrong. If I try to stream serial data through the device the RTOS crashes because a stack guard word on the "Main" task is being overwritten.

I set a write watchpoint on the guardword and it stops in _tk_switch

_tk_switch:
move.w #0x2700,SR /* disable ints */

   move.l   4(A7),D0         /* save passed tk pointer in D0 */
   move.l   D2,-(A7)         /* push the non-volitile gp registers */
   move.l   D3,-(A7)
   move.l   D4,-(A7)
   move.l   D5,-(A7)
   move.l   D6,-(A7)
   move.l   D7,-(A7)

move.l A1,-(A7) //Stops here

The call stack shows that irq_handler is being called continuously and it is trying to print something to the console. I never actually see this output so I assume that the UART is not being processed because so much time is being spent in the IRQ.

I removed the printf statement to see what would happen and tried again. This time the application didn't crash, however our PC app stopped receiving data completely. If I stopped execution of the ColdFire it was nearly always in the interrupt handler.

I then decided to put a breakpoint in the interrupt handler to see when it was being called. If I run the application but don't make a TCP connection to it, the interrupt handler never gets called. As soon as I make a connection, however, the interrupt handler starts being called.

The interrupt handler always gets called for the first time when the main program is in a memmove operation which is called from "arprcv"

struct arp_hdr * arphdr;
   struct arptabent *   tp;
   arphdr = (struct arp_hdr *)(pkt->nb_buff + ETHHDR_SIZE);

   {
      struct arp_wire * arwp = (struct arp_wire *)arphdr;
      MEMMOVE(&arphdr->ar_tpa, &arwp->data[AR_TPA], 4);   //Could potentially be this line
      MEMMOVE(arphdr->ar_tha, &arwp->data[AR_THA], 6);     //Debugger shows it has stopped here
      MEMMOVE(&arphdr->ar_spa, &arwp->data[AR_SPA], 4);
      MEMMOVE(arphdr->ar_sha, &arwp->data[AR_SHA], 6);
   }

The memory addresses of the variables being used in memmove always seem to be the same as well.

The interesting thing is the "Value" (in debugger) of the dest and src pointer are the same.

Is something going wrong in "arprcv" that is causing a problem with the memmove that could then trigger an interrupt?

Is there any code I could put in the interrupt to try and find out the source of it?

Has anyone else had the same kind of problem?

Thanks for your time

Darren

DarrenSteadman · ‎11-21-2007

I've changed the priorities and so far so good. It has stopped randomly calling the irq_handler function.

Thanks for all the help guys.

TrevorCurry_eu · ‎11-21-2007

As far as the exception handler is concerned, I was worried about how to recover from a watchdog situation where the code has run the stack off the end of RAM. As Paul said, the return from the exception handler just triggered the same exception and the processor appears to lock up.

I placed a soft reset in the exception handler so that it never tries to return. Such exceptions are an indication of something going very wrong, so reset seems the appropriate response for me.
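
A minimal sketch of the "reset instead of return" idea, written so that it can also be exercised on a host. The bit value is an assumption based on my reading of the MCF5223x reset controller (SOFTRST = 0x80 in the Reset Control Register); check it against the reference manual for your part. `request_soft_reset` and `fatal_exception` are hypothetical names, not part of the AN3470 code.

```c
#include <stdint.h>

/* Assumed value: SOFTRST bit of the Reset Control Register (RCR). */
#define RCR_SOFTRST 0x80u

/* Set the soft-reset request bit in the register that rcr points at.
 * On hardware rcr would be the memory-mapped RCR; taking a pointer lets
 * the same function be run against a plain variable on a host. */
void request_soft_reset(volatile uint8_t *rcr)
{
    *rcr |= RCR_SOFTRST;
}

/* Hypothetical handler body: request the reset and never return, so the
 * faulting instruction is not executed again. */
void fatal_exception(volatile uint8_t *rcr)
{
    request_soft_reset(rcr);
    for (;;) {
        /* wait for the reset to take effect */
    }
}
```

On the target, the exception handler would call `fatal_exception` with the address of the real register instead of returning.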

Trevor

DarrenSteadman · ‎11-21-2007

I've changed the priorities on the IRQs and now I seem to have got another problem. I keep getting an illegal instruction in the ISR. I've used the PC value to find out that it is in this bit of code

void
uart_isr(int unit)
{
   struct uart_desc *uart = &uarts[unit];
   int dev = uart->unit;
   uint8 ch = 0;
   uint8 usr = 0;
   int   rx_next, tx_next;

   usr = MCF_UART_USR(dev);
   /* UART receiver */
   if (usr & MCF_UART_USR_RXRDY) //<------PC says the illegal instruction is here
   {
      ch = MCF_UART_URB(dev);

This is when it is calling UART 1 (zero indexed). I changed the priority of UART 0 to 1 and left UART 1 as 2. Are these priorities OK?

Is there a way of finding out why I'm getting the illegal instruction? What is the usual technique used to debug this kind of thing?

Thanks

Darren

mccPaul · ‎11-21-2007

I think that the IRQ level and priority may be a bit of a red herring here - they shouldn't overlap, but it seems unlikely that overlapping would cause an illegal instruction exception. Also, the NicheLite stack has overlapping IRQs all over the place (e.g. FEC) and it seems to mostly work.

You only need to make sure that no two sources have the same combination of level and priority - it is OK to have e.g. UART 0 at level 1 priority 0 and UART 1 at level 1 priority 1.

I think that if you are getting an illegal instruction exception it is because your code is corrupted or you are trying to execute data.

There is no point looking at the C source to try and find out where it is going wrong - all that you are seeing is the source code, not the actual instructions that are being executed.

You need to look at the disassembled code around the PC where you get the exception. This is likely to show that you have garbage where once there was finely crafted code.

The reason this doesn't cause a problem when there is nothing connected to the UART will be because the UART ISR is not being called.

The reason why the code for the UART has been clobbered is going to be harder to discover. The way I would go about this would be to step through my application code from the start of the world with a view of the memory that contains the UART ISR code open. Hopefully as you step through the code you will notice the memory changing as the UART code is clobbered.

DarrenSteadman · ‎11-21-2007

Just as some additional information, if I have an external device attached to the UART when I start debugging the application it goes straight into the exception handler with the problem.

If however there is nothing connected to the UART when it is started it doesn't go straight into the exception handler.

TrevorCurry_eu · ‎11-21-2007

Nothing specific I'm afraid but Freescale did state:

> > ...ICR registers of particular
> > interrupt sources should be always programmed with unique and
> > non-overlapping level & priority.

- have you changed both the level and priority?

I seem to have got away with changing only the level (ICR13 = 1, ICR14= 2) but I may change this to abide by the letter of the advice...

Cheers,
Trevor

TrevorCurry_eu · ‎11-21-2007

...that should of course read "changing only the priority"...

DarrenSteadman · ‎11-21-2007

How do I change the level?

Currently UART 0 is on IRQ 13 priority 1 and UART 1 is on IRQ 14 priority 2

DarrenSteadman · ‎11-21-2007

Sorry, that's wrong. UART 0 is on IRQ 13 with level 1 set with the macro MCF_INTC_ICR_IL and UART 1 is on IRQ 14 with level 2 set with the same macro. I'm not sure what the priorities are.

TrevorCurry_eu · ‎11-21-2007

Sorry, I am getting confused here as well

I also am using MCF_INTC_ICR_IL to set the *levels* the same as you. This sets bits 3-5 in the ICR

MCF_INTC_ICR_IP is used to set the priority in bits 0-2 of the ICR - I have not changed this and assume it has been left at 0?
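
A small sketch of the encoding described here, assuming the field positions given in this post (level in bits 3-5, priority in bits 0-2). The macro and function names are hypothetical stand-ins rather than the CodeWarrior header definitions, so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field encodings as described in the post: level in bits 3-5,
 * priority in bits 0-2 of each 8-bit ICR. */
#define ICR_IL(level)    ((uint8_t)(((level) & 0x7u) << 3))
#define ICR_IP(priority) ((uint8_t)((priority) & 0x7u))

/* Build an ICR value from a level (1-7) and a priority (0-7). */
uint8_t icr_value(unsigned level, unsigned priority)
{
    assert(level >= 1 && level <= 7);
    assert(priority <= 7);
    return (uint8_t)(ICR_IL(level) | ICR_IP(priority));
}

/* Return true if no two ICR values share the same level+priority pair. */
bool icr_values_unique(const uint8_t *icr, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if ((icr[i] & 0x3Fu) == (icr[j] & 0x3Fu)) {
                return false;
            }
        }
    }
    return true;
}
```

With these definitions, level 1 priority 0 encodes as 0x08 and level 2 priority 0 as 0x10, so the two UART settings discussed in this thread do not collide.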

Trevor

DarrenSteadman · ‎11-21-2007

Sounds like you have the same priority and level settings as me then

DarrenSteadman · ‎11-21-2007

Here is the mixed source output of where the problem occurs. The illegal instruction is at address 0x00001c04.

When the problem occurs

if (usr & MCF_UART_USR_RXRDY)
00001BFC: 122EFFE0        move.b   -32(a6),d1
00001C00: 7000            moveq    #0,d0
00001C02: 1001            move.b   d1,d0
00001C04: 028000000001    andi.l   #0x1,d0 //<------- where it has an illegal instruction
00001C0A: 4A80            tst.l    d0
00001C0C: 670001CC        beq.w    uart_isr+0x232 (0x1dda) ; 0x00001dda

When it doesn't

if (usr & MCF_UART_USR_RXRDY)
00001BFC: 122EFFE0        move.b   -32(a6),d1
00001C00: 7000            moveq    #0,d0
00001C02: 1001            move.b   d1,d0
00001C04: 028000000001    andi.l   #0x1,d0
00001C0A: 4A80            tst.l    d0
00001C0C: 670001CC        beq.w    uart_isr+0x232 (0x1dda) ; 0x00001dda

As you can see, the code is exactly the same. Also, since my application executes from flash and not from RAM, is it even possible to corrupt the program code?

I don't think it is my application code that is causing the problem, because when I recorded this information I had just restarted the board and I got the exception before the task system even had a chance to create my tasks.

mccPaul · ‎11-21-2007

What is the exception frame?

DarrenSteadman · ‎11-21-2007

The value of framep when it gets passed into mcf5xxx_exception_handler is

0x20001BF8

From docs etc I believe this is the exception frame. If I'm wrong let me know where I can get the value.

mccPaul · ‎11-21-2007

framep is a pointer, so the exception frame is located at 0x20001BF8 in SRAM.
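
As a rough illustration of what can be read from the memory that framep points to, here is a sketch that decodes the two longwords of a ColdFire exception stack frame. The bit layout is my understanding of the ColdFire programmer's reference (format, fault status, vector and SR in the first longword, PC in the second) and should be checked against the manual. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the two longwords the CPU pushes on an exception. */
struct cf_exception_info {
    unsigned format;        /* bits 31-28 */
    unsigned fault_status;  /* FS[3:2] in bits 27-26, FS[1:0] in bits 17-16 */
    unsigned vector;        /* bits 25-18 */
    uint16_t sr;            /* bits 15-0 */
    uint32_t pc;            /* second longword */
};

/* Split the raw frame into its fields. frame[0] is the format/vector/SR
 * longword and frame[1] is the saved program counter. */
struct cf_exception_info cf_decode_frame(const uint32_t frame[2])
{
    struct cf_exception_info info;
    uint32_t w;

    assert(frame != NULL);
    w = frame[0];
    info.format = (w >> 28) & 0xFu;
    info.fault_status = (((w >> 26) & 0x3u) << 2) | ((w >> 16) & 0x3u);
    info.vector = (w >> 18) & 0xFFu;
    info.sr = (uint16_t)(w & 0xFFFFu);
    info.pc = frame[1];
    return info;
}
```

The vector field tells you which exception was taken (vector 4 is the illegal instruction exception) and the PC field tells you where, which is usually the first thing to compare against the disassembly.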

DarrenSteadman · ‎11-21-2007

CodeWarrior lists variable name, value, location for all the variables

so I have

variable = framep

value = 0x20001BF8

location 0x20001BE4

if I view the location in the memory viewer it does indeed have the value stated above.

DarrenSteadman · ‎11-22-2007

I got hold of the errata sheet for the cf52233 (here) and there is a problem stated about the

"Internal Flash Speculation Address Qualification Incomplete"

It recommends to set FLASHBAR[6] = 1 to turn off the feature. I've looked in the reference manual and FLASHBAR[6] is reserved and "Internal Flash Speculation Address Qualification" is not mentioned anywhere so I assume it is an "un-documented feature", as it is supposed to work.

If I start my application then pause it and modify the FLASHBAR register through the registers window to enable the bit described above it seems to solve my problem (as far as current testing is concerned).

I was wondering if someone could post some code, and say where to put it, to modify the FLASHBAR register in code at startup. From reading some of the docs it looks like it needs to be done in assembly, and none of the CW provided headers seem to have any reference to FLASHBAR. Also it mentions something about being able to determine the state it will be in at reset by reading some registers; the only problem is it doesn't say which registers to read.

If someone could help with this it would be great.

TrevorCurry_eu · ‎12-06-2007

I have just got a reply from Freescale via my distributor regarding errata 2:
"the initial feedback from Freescale is that there is no defined timescale for the errata fix.
...
Workaround 2 offers the best solution as this will not affect the flash speed, but this is reliant on using 224K bytes or less."

I have investigated Workaround 2 for the code in AN3470:
The original setting for the SRAM base address was 0x20000000; to comply with the suggestion I changed this to 0x20038000 in the lcf file (2 places).
However, when I ran this code the FEC had stopped working.

Further reading found that the SCM RAMBAR can only be set on 64k boundaries - hence I had to put the SRAM at 0x20030000.

The Workaround 2 caveat should be: there must be *64K* free at the top of the flash area.
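
The arithmetic behind this can be sketched as follows, assuming the 64K boundary requirement stated above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAMBAR_ALIGNMENT 0x10000u  /* 64K, per the boundary requirement above */

/* True if addr sits exactly on a multiple of alignment (a power of two). */
bool is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Round addr down to the nearest multiple of alignment (a power of two). */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

0x20038000 is not on a 64K boundary, and rounding it down gives 0x20030000, which is the address that worked.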

I hope this is helpful.

DarrenSteadman · ‎11-22-2007

I found the section to modify to set the value.

It is in mcf5223_lo.s

I changed it to this

/* Initialize FLASHBAR */
    move.l #__FLASH,d0
    cmp.l   #0x00000000,d0
    bne     change_flashbar
    /*add.l   #0x21,d0*/               /* original line, now commented out */
    add.l   #0x61,d0                   /* changed: also sets FLASHBAR[6] */
    movec   d0,RAMBAR0
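
For reference, the difference between the two constants is just bit 6. A small C sketch of the same arithmetic, assuming only what is stated in this thread (0x21 is the value the original startup code adds, and the errata workaround sets FLASHBAR[6]); the names are hypothetical and this does not replace the assembly, since the register is written with movec:

```c
#include <assert.h>
#include <stdint.h>

#define FLASHBAR_ORIGINAL_BITS 0x21u      /* value added by the original startup code */
#define FLASHBAR_ERRATA_BIT    (1u << 6)  /* FLASHBAR[6], per the errata workaround */

/* Compute the value to load into FLASHBAR for a given flash base address. */
uint32_t flashbar_value(uint32_t flash_base)
{
    /* The low byte of the base must be clear so the control bits do not
     * collide with address bits. */
    assert((flash_base & 0xFFu) == 0);
    return flash_base | FLASHBAR_ORIGINAL_BITS | FLASHBAR_ERRATA_BIT;
}
```

For a flash base of 0 this gives 0x61, matching the modified add.l above.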

I'm going to do some heavy testing and see if everything works ok.

Freescale lists this particular problem in the errata as "This errata will be fixed". Any idea when this will happen? By the looks of it, it was added in November 2006.

DarrenSteadman · ‎11-21-2007

Ok here is an interesting one for you.

I changed all calls to irq_handler to asm_exception_handler by doing the following in mcf5223_vectors.s

//#define _irq_handler irq_handler
#define _irq_handler asm_exception_handler

This made the program jump into the mcf5xxx_exception_handler once stating there was an illegal instruction somewhere in the uart isr.

The interesting thing is that I then made the debugger continue execution and the irq_handler is still being called. Can you think of any reason as to why the handler is still being called?

mccPaul · ‎11-21-2007

Finally some progress!

I don't understand why your illegal instruction vector would have been pointing to irq_handler in the first place but moving on...

If the CPU encounters an illegal instruction it is usually because code memory has been clobbered by data or you have made a jump to some odd point that doesn't contain code.

The error exceptions are designed so that you can try to correct a fault condition. However, the standard code doesn't fix anything so you will continue to get the same exception generated. Basically, the CPU encounters an illegal instruction, raises an exception, then the exception handler returns to the same point that created the exception in the first place and so on.

Paul.
