Intermittent problem with USB Host Bootloader starting application on K22


2,006 Views
ARQuattr
Contributor IV

I have a bootloader project that was based on the AN4368 source (USB Host bootloader) running on a custom board using the K22FN1M0, and most of the time this works fine.  On some units however, when the bootloader jumps to the application it appears to freeze.

The application is an MQX4.0 project, and presumably since it works most of the time the linker and other settings are OK.  The bootloader never fails to start, as I can see from the message it prints.  I watched the crystal: it always starts oscillating, and it keeps running at the point where the bootloader should be jumping to the application.

Early in the main application there are some other messages printed, and these never appear when this lock-up occurs.  I'm not sure how to debug the main application (or if that's even possible), so I don't know if the main application is starting at all, or if it fails to jump.  I will try to add some code earlier in the boot up process that I can use to determine if it's actually starting (like toggling some pin).

The jumping part of the bootloader is essentially the same as the original code, and includes the following:  (I cleaned it up a bit to simplify this post.)

#define IMAGE_ADDR     ((uint_32_ptr)0x10000)

static void Switch_mode()
{
     /* Get PC and SP of application region */
     New_sp  = IMAGE_ADDR[0];
     New_pc  = IMAGE_ADDR[1];

     if ((New_sp != 0xffffffff) && (New_pc != 0xffffffff)) {
          printf("\nUSB Bootloader - Executing Application...\n"); /* Run the application */
          printf("New_sp = %X\n", New_sp);
          printf("New_pc = %X\n", New_pc);

          asm
          {
               ldr   r4,=New_sp
               ldr   sp, [r4]
               ldr   r4,=New_pc
               ldr   r5, [r4]
               blx   r5
          }
     }
}

I added the print-outs of the New_sp and New_pc values and this is what the output looks like when starting the application.

USB Bootloader - Executing Application...
New_sp = 2000FC00
New_pc = 1AEDD

It looks the same whether the main application actually starts or not.  I'm not sure about these values, but I'm a little unclear about how this jump is done anyway so I'm assuming these values are correct.
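For readers unsure how the jump works: on a Cortex-M part the first word of a vector table is the initial stack pointer and the second is the reset handler address, with bit 0 set to mark Thumb code. The printed values can therefore be sanity-checked before jumping. The sketch below is only an illustration and not part of AN4368; `image_looks_valid` and the four bounds are hypothetical names, with the numbers taken from the application linker excerpt quoted later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds (from the linker excerpt in this thread); adjust to the real map. */
#define APP_FLASH_START 0x00010000u  /* vectorrom origin, same as IMAGE_ADDR */
#define APP_FLASH_END   0x0007FF20u  /* first address past the app's rom region */
#define RAM_START       0x20000000u
#define RAM_END         0x20010000u  /* first address past the ram region */

/* Returns true when sp/pc look like a plausible Cortex-M vector table start. */
bool image_looks_valid(uint32_t sp, uint32_t pc)
{
    if (sp == 0xFFFFFFFFu || pc == 0xFFFFFFFFu) {
        return false;                       /* erased flash */
    }
    if (sp < RAM_START || sp > RAM_END || (sp & 0x3u) != 0u) {
        return false;                       /* stack must sit in RAM, word aligned */
    }
    if ((pc & 0x1u) == 0u) {
        return false;                       /* reset vector must have the Thumb bit set */
    }
    uint32_t entry = pc & ~0x1u;            /* actual instruction address */
    return entry >= APP_FLASH_START && entry < APP_FLASH_END;
}
```

With the values printed above, `image_looks_valid(0x2000FC00, 0x1AEDD)` passes, which fits the observation that the image looks the same on good and bad starts.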

Do you think I should be looking on the bootloader side, or the application code?  Could there be some hardware factor I should investigate?  I saw another forum post that indicated the ARM MSP should be reset to zero, but I don't know if it applies to this situation.  The post is "Re: Problem of initialization of MQX after jumping to the start of MQX application ( _boot() )".

Thanks,

Angelo

EDIT: Update...I tried to do some debugging but couldn't figure out how to debug the main application after being jumped to from the bootloader.  I tried to add some I/O changes early but I think it was too early and the program didn't run at all.  But then I just loaded the original application directly in Flash (with no bootloader) and tried power-cycling the unit.  The main application always starts properly, so it appears that the issue is somehow related to the bootloader.

EDIT2:  Based on the link I included above, I tried to clear the master stack pointer but it didn't seem to make a difference.  This is how I did it:

asm
{
     ldr     r4,=New_sp
     ldr     sp, [r4]
     ldr     r4,=0x00
     ldr     r5, [r4]
     MSR     MSP, r5
     ldr     r4,=New_pc
     ldr     r5, [r4]
     blx     r5
}

I also tried modifying the delay before Switch_mode is called.  Presumably this was added to let the button input settle.  I tried increasing that wondering if the clock or something else needed more time to settle, and even making it wait 10s before jumping doesn't seem to make it more reliable.  In fact it seems that with extra delay (even only ~100ms) it fails more often, but I didn't do a lot of trials to confirm this.  I suspect maybe the clock gating or GPIO initialization code might be leaving it in a bad state.

My other thought, going back to the main application code, is that the clock setup is not executing properly.  I know the ARM devices have a lot of clock options, and I wonder if the same code which reliably gets the clock started when running from power-up (without the bootloader), does not work as well when jumped to from the bootloader.  So maybe it does jump but the application gets stuck in the early clock init.  For example there are various places in bsp_cm.c where it waits indefinitely for PLL lock, etc.
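The indefinite waits mentioned above can be made observable with a bounded poll, so that a PLL that never locks shows up as an error path (where a pin can be toggled) instead of a silent hang. A minimal sketch, assuming a helper named `wait_for_flag` that is not part of the MQX BSP; on target the arguments would be something like the address of the MCG status register and its lock mask, and the poll limit would need tuning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Polls an 8-bit status register until all bits in mask are set,
 * giving up after max_polls reads. Returns true if the bits were seen. */
bool wait_for_flag(volatile const uint8_t *status_reg, uint8_t mask,
                   uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*status_reg & mask) == mask) {
            return true;
        }
    }
    return false;   /* caller decides what to do: toggle a pin, reset, retry */
}
```

A failed wait gives a place to set a breakpoint or drive a GPIO, which helps separate "never jumped" from "jumped and stuck in clock init".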

0 Kudos
10 Replies

1,380 Views
bobpaddock
Senior Contributor III

I'm not familiar with the code you are discussing, so perhaps I'm being Mr Obvious here.

Are interrupts disabled when the code below is run?  If not that will explain many of the rare crashes.

Bootloader IRQs still running before the new main() application has reinitialized for the new environment.

asm
{
     ldr   r4,=New_sp
     ldr   sp, [r4]
     ldr   r4,=New_pc
     ldr   r5, [r4]
     blx   r5
}
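Disabling interrupts globally is one part of a clean hand-over. A commonly used, more thorough sequence also stops SysTick, disables and clears pending NVIC interrupts, and points VTOR at the application's vector table. The sketch below is a hedged illustration of that sequence and not code from AN4368: `core_regs_t` and `quiesce_before_jump` are made-up names, and the registers are passed as pointers so the logic can be tried off-target. The addresses in the comments are the standard ARMv7-M ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register bundle; on target, fill in the fixed core addresses. */
typedef struct {
    volatile uint32_t *nvic_icer;    /* 0xE000E180: interrupt clear-enable */
    volatile uint32_t *nvic_icpr;    /* 0xE000E280: interrupt clear-pending */
    size_t             nvic_words;   /* number of 32-bit words to cover all IRQs */
    volatile uint32_t *systick_ctrl; /* 0xE000E010: SysTick control/status */
    volatile uint32_t *scb_vtor;     /* 0xE000ED08: vector table offset */
} core_regs_t;

/* Call with interrupts already masked (e.g. after "cpsid i" on target). */
void quiesce_before_jump(const core_regs_t *r, uint32_t app_vector_base)
{
    *r->systick_ctrl = 0u;                  /* stop SysTick and its interrupt */
    for (size_t i = 0; i < r->nvic_words; i++) {
        r->nvic_icer[i] = 0xFFFFFFFFu;      /* write-1 disables each IRQ */
        r->nvic_icpr[i] = 0xFFFFFFFFu;      /* write-1 clears each pending IRQ */
    }
    *r->scb_vtor = app_vector_base;         /* application vector table */
}
```

Peripherals the bootloader used (USB, timers, UART) may still need to be shut down separately; this only covers the core side.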
0 Kudos

1,380 Views
adarrow
Contributor I

I'm not sure about the original poster but I know we are disabling interrupts before we jump to the application.

0 Kudos

1,380 Views
ARQuattr
Contributor IV

Using the technique Jorge provided, I did some debugging and found that the bootloader always jumps to the main application.  After repeated iterations and stepping through and setting breakpoints, I found a line of code (line 825 of bsp_cm.c in the bsp_twrk21f120m project) which always gets reached on every reset, but which it does not always get past.  So, if I set only one breakpoint at line 825 it always stops there, but if I set only one breakpoint at line 830 (which is the next line of code) it does not always reach it.  But I also found that if I break at that line 825, after resuming the code executes properly.  It's as though some settling time is needed after executing line 825.  To quickly check this theory, I added a simple for loop between line 825 and 830, and this fixes the problem, i.e. the application code always executes normally.

Although this appears to fix the issue, I feel that I haven't found the root cause, and would like to understand better why this workaround stops the problem.  Maybe it is still something in the bootloader, which for example leaves certain clock registers in a state that lets this symptom appear.  (Remember when the application is loaded directly in Flash without the bootloader, it always starts.)

The code I mentioned is in the __pe_initialize_hardware function.  This is line 825:

  /* System clock initialization */
  /* SIM_CLKDIV1: OUTDIV1=0,OUTDIV2=1,OUTDIV3=3,OUTDIV4=3,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0,??=0 */
  SIM_CLKDIV1 = SIM_CLKDIV1_OUTDIV1(0x00) |
                SIM_CLKDIV1_OUTDIV2(0x01) |
                SIM_CLKDIV1_OUTDIV3(0x03) |
                SIM_CLKDIV1_OUTDIV4(0x03); /* Set the system prescalers to safe value */

and this is line 830:

  /* SIM_SCGC5: PORTD=1,PORTC=1,PORTA=1 */
  SIM_SCGC5 |= SIM_SCGC5_PORTD_MASK |
               SIM_SCGC5_PORTC_MASK |
               SIM_SCGC5_PORTA_MASK;   /* Enable clock gate for ports to enable pin routing */

The line I added between these which fixed the issue was (i is type int):

  for (i=0;i<1000000;i++);

This was my first guess at a loop count and I didn't try lower values to see where it stops working.  I'd rather not need any delay, if I could find the root cause.
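One thing worth checking, offered as a hypothesis and not something confirmed in this thread: each OUTDIVn field divides MCGOUTCLK by n+1, so the values written on line 825 give core /1, bus /2, FlexBus /4 and flash /4. That is comfortable at the reset-default clock of roughly 21 MHz, but if the bootloader leaves the PLL selected at a much higher frequency, the same write could push the flash clock past its limit until the application reconfigures the MCG. The sketch below only does that arithmetic; `clocks_from_clkdiv1` is a hypothetical helper, and any input frequency other than the reset default is an assumed example, not a measured value.

```c
#include <stdint.h>

typedef struct {
    uint32_t core_hz;
    uint32_t bus_hz;
    uint32_t flexbus_hz;
    uint32_t flash_hz;
} clocks_t;

/* Each OUTDIVn field value n means "divide MCGOUTCLK by n + 1". */
clocks_t clocks_from_clkdiv1(uint32_t mcgout_hz, uint32_t outdiv1,
                             uint32_t outdiv2, uint32_t outdiv3,
                             uint32_t outdiv4)
{
    clocks_t c;
    c.core_hz    = mcgout_hz / (outdiv1 + 1u);
    c.bus_hz     = mcgout_hz / (outdiv2 + 1u);
    c.flexbus_hz = mcgout_hz / (outdiv3 + 1u);
    c.flash_hz   = mcgout_hz / (outdiv4 + 1u);
    return c;
}
```

For example, `clocks_from_clkdiv1(120000000u, 0, 1, 3, 3).flash_hz` is 30 MHz, while the same dividers at 20971520 Hz give about 5.2 MHz. The result should be compared against the flash clock limit in the K22 data sheet and against whatever clock the bootloader actually leaves running. This does not by itself explain why a delay helps.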

I noticed that the clock gating is enabled to all ports (A, B, C, D, and E) in the bootloader, and here only for A, C, and D.  Unless this register was cleared earlier, it seems to me that the masks for ports B and E would still be enabled from when the bootloader did it.  I don't see though why that would cause an issue.
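That reading of `|=` holds: the OR only sets bits, so any port clock gates the bootloader enabled stay enabled afterwards. A small sketch of the arithmetic, assuming the usual Kinetis SIM_SCGC5 bit positions (PORTA to PORTE at bits 9 to 13); the macro and function names here are illustrative and are not the ones from the device header.

```c
#include <stdint.h>

/* Assumed SIM_SCGC5 port clock-gate bits (check against the device header). */
#define SCGC5_PORTA (1u << 9)
#define SCGC5_PORTB (1u << 10)
#define SCGC5_PORTC (1u << 11)
#define SCGC5_PORTD (1u << 12)
#define SCGC5_PORTE (1u << 13)
#define SCGC5_ALL_PORTS \
    (SCGC5_PORTA | SCGC5_PORTB | SCGC5_PORTC | SCGC5_PORTD | SCGC5_PORTE)

/* Models the application's "SIM_SCGC5 |= PORTD | PORTC | PORTA" on a given
 * starting value, e.g. whatever the bootloader left in the register. */
uint32_t scgc5_after_app_init(uint32_t value_left_by_bootloader)
{
    return value_left_by_bootloader | SCGC5_PORTD | SCGC5_PORTC | SCGC5_PORTA;
}
```

Starting from `SCGC5_ALL_PORTS` the result is unchanged, so ports B and E remain clocked. Starting from 0, only A, C and D end up set.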

Angelo

0 Kudos

1,380 Views
adarrow
Contributor I

Hi Angelo, did you ever find the root cause of this problem? We are having almost the exact same problem and we are suspecting some sort of clock init/deinit issue when the bootloader jumps to the application.

0 Kudos

1,380 Views
mjbcswitzerland
Specialist V

Angelo

What is not clear is what the random failures on certain boards actually look like:

- are there boards where it never works but others where it always works?

- is it that the loading fails and then the application will always fail (not fully programmed, or programmed with errors?) - that is, if it fails once the application then never starts?

- is it that the loading is successful (the application image in Flash is correct) but the jump to it sometimes works and sometimes doesn't? Or that certain boards never do run (but others do?)

- is it that it fails after loading (after USB has been operating) but works after a reset (if the code is started without enabling any peripherals)?

Note that the uTasker USB-MSD host is now in operation in case you think that it may be the loader part that could be causing unreliability: USB MSD Host Bootloader for K22FN1M0

In case of continued problems I have developed many loaders and supported many projects (also many using MQX applications) so can probably quickly identify any problems in case your project has special needs.

Regards

Mark

Kinetis: µTasker Kinetis support

K22: µTasker Kinetis FRDM-K22F support  / µTasker Kinetis TWR-K22F120M support

For the complete "out-of-the-box" Kinetis experience and faster time to market

0 Kudos

1,380 Views
ARQuattr
Contributor IV

Thanks Mark, sorry I wasn't too clear about the symptoms.

1. We don't have issues loading the S19 from USB to Flash.  That part of the bootloader apparently always works fine, and the bootloader code always seems to start.

2. After the bootloader starts, on most devices it reliably jumps to the application, but on some devices and only in some cases, it will fail to jump to the main application.

3. On those devices where this issue is occurring, it can be easily reproduced by power-cycling.  Sometimes the application will start, and other times it won't.  It seems to be more likely to happen when powering off then on quickly.  This again makes me feel there is some hardware component to the problem, but as mentioned, watching the crystal oscillate with no noticeable delays or interruptions makes me feel it's perhaps some random register state, or other firmware related issue.

I will check out your bootloader and let you know how it works.

Thanks,

Angelo

0 Kudos

1,380 Views
Jorge_Gonzalez
NXP Employee

Hello Angelo:

If your projects are based on CodeWarrior, a very good option for this kind of problem is to debug the bootloader and the application in the same debug session. That way you can narrow down the problem and see whether it's the bootloader or the application that is failing. The following tutorial shows how to do that with CodeWarrior:

Adding Symbols to the CodeWarrior Debugger | MCU on Eclipse

Regards!

Jorge Gonzalez

0 Kudos

1,380 Views
ARQuattr
Contributor IV

Thanks Jorge, that's a great tip!  I will probably try that next.

0 Kudos

1,380 Views
talktrn
Contributor I

A couple of things jump out. First of all, the PC and SP values that you see come from the start of the S19 file, i.e. the first two words of the application's vector table.  These values are set by the linker at link time.

Example S19 file header, 2nd line (spaces added for clarity):

S3510000A000 00FC0020 F1220100....

so app will load at 0x0000A000

SP will be set to 0x2000FC00 (read little Endian)

PC will be set to 0x000122F1 (read little Endian)
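The decoding above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `s3_head` and `s3_head_t` are hypothetical names, the function reads just the address and the first two data words of an S3 record, and it deliberately does not validate the byte count or checksum (the example record above is truncated, so neither could be checked anyway).

```c
#include <stdint.h>

typedef struct {
    int      ok;    /* 1 if the start of the record could be decoded */
    uint32_t addr;  /* load address (big-endian field in the record) */
    uint32_t sp;    /* first data word, stored little-endian */
    uint32_t pc;    /* second data word, stored little-endian */
} s3_head_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;                       /* includes the terminating NUL */
}

static int hex_byte(const char *p)
{
    int hi = hex_nibble(p[0]);
    if (hi < 0) return -1;
    int lo = hex_nibble(p[1]);
    if (lo < 0) return -1;
    return (hi << 4) | lo;
}

s3_head_t s3_head(const char *rec)
{
    s3_head_t h = {0, 0u, 0u, 0u};
    uint8_t b[12];                   /* 4 address bytes + 8 data bytes */

    if (rec[0] != 'S' || rec[1] != '3') return h;
    if (hex_byte(rec + 2) < 0) return h;          /* byte count, not checked */
    for (int i = 0; i < 12; i++) {
        int v = hex_byte(rec + 4 + 2 * i);
        if (v < 0) return h;
        b[i] = (uint8_t)v;
    }
    h.addr = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
             ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    h.sp   =  (uint32_t)b[4]        | ((uint32_t)b[5] << 8) |
             ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24);
    h.pc   =  (uint32_t)b[8]        | ((uint32_t)b[9] << 8) |
             ((uint32_t)b[10] << 16) | ((uint32_t)b[11] << 24);
    h.ok = 1;
    return h;
}
```

Feeding it the start of the example record with the spaces removed, `"S3510000A00000FC0020F1220100"`, yields the address, SP and PC listed above.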

The printf's added to the original AN4368 make it easy to detect an error in these values.  The AN4368 bootloader "lives" in an area of Flash below the app in the memory space, and it must be protected from being overwritten.  So the app must have all of its code past the end of the bootloader.  The bootloader uses IMAGE_ADDR to set the start of usable memory where the app can reside.  It is important that the app's start address in your linker file agrees with IMAGE_ADDR.

For an MQX project that I'm using with AN4368 type loader, I have two linker files, one for debugging without loader, and the other for production.  Below is the start of linker file for a K70 project, basically the default settings kicked out by CW for an MQX example:

   vectorrom   (RX): ORIGIN = 0x00000000, LENGTH = 0x00000400
   cfmprotrom  (R):  ORIGIN = 0x00000400, LENGTH = 0x00000020
   rom         (RX): ORIGIN = 0x00000420, LENGTH = 0x000FFBE0  # Code + Const data
   ram         (RW): ORIGIN = 0x70000000, LENGTH = 0x08000000  # DDR2 - RW data
   sram        (RW): ORIGIN = 0x1FFF0000, LENGTH = 0x00020000  # SRAM - RW data

Now this is the production mapping when using the loader (note change in location of "rom"):

   vectorrom   (RX): ORIGIN = 0x0000A000, LENGTH = 0x00000400  # <---- this agrees with IMAGE_ADDR in the bootloader
   cfmprotrom  (R):  ORIGIN = 0x0000A400, LENGTH = 0x00000020
   rom         (RX): ORIGIN = 0x0000A420, LENGTH = 0x000F5BE0  # Code + Const data
   ram         (RW): ORIGIN = 0x70000000, LENGTH = 0x08000000  # DDR2 - RW data
   sram        (RW): ORIGIN = 0x1FFF0000, LENGTH = 0x00020000  # SRAM - RW data
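The relationship between the two maps can be checked mechanically: the vector region has to start at IMAGE_ADDR, the three flash regions have to follow one another without gaps, and the last one must not run past the end of flash. A minimal sketch of that check, with `region_t` and `app_layout_ok` as hypothetical names and the flash end passed in as a parameter because it depends on the part:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t origin;
    uint32_t length;
} region_t;

/* True when the app's flash regions start at image_addr, are contiguous,
 * and end at or before flash_end (first address past the flash). */
bool app_layout_ok(uint32_t image_addr, region_t vectorrom,
                   region_t cfmprotrom, region_t rom, uint32_t flash_end)
{
    return vectorrom.origin == image_addr
        && vectorrom.origin + vectorrom.length == cfmprotrom.origin
        && cfmprotrom.origin + cfmprotrom.length == rom.origin
        && rom.origin + rom.length <= flash_end;
}
```

With the production values above and a 1 MB flash (end address 0x100000), the check passes for an IMAGE_ADDR of 0xA000. The debug map fails it, as expected, because its vector region starts at 0.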

Also make sure you have enabled RAM usage for the MQX vector table!!

It is possible to debug the startup of the app from the loader, but it's a bit of a nightmare.  However it is essential to do this if you have some kind of clock startup issue in the app, so here is how I've done it:

1. Load up the debug version of app in CW, have it halt at startup entry point (not 'main' in your app, I mean the startup code itself).

2. Step through the startup code, or set breakpoint to halt at the jump to the MQX app (startup code varies with processor, but you will find a 'jsr' to main somewhere)

3. Step into the jsr to main app entry and follow through the code in disassembly mode.  You can use 'step over' for this.

4. When you get accustomed to the disassembly of the app's startup it gets easier.  We really only need to see where the clocks, PLL and GPIO get set up.

5. This is the hard part:  get a printout of the disassembled code.  CW doesn't lend itself to this, so I had to capture screens using Windows ALT-PRTSC to copy to the clipboard and then paste each screenshot into a Word or WordPad document.  Now you have an image of the disassembled MQX startup code.

6. Presuming that you have a good app, set up CW to debug the bootloader.  Put a breakpoint in the 'Switch_mode' func ahead of the jump to app.

7. Run the loader, and let it load the app.  It will halt at 'Switch_mode'.

8. Step into your app from the loader.  Unfortunately, all you will see is the disassembled code (that's why we did printout earlier).

9. The big debug technique:  use 'step over' until you hit something that fails.  It will be obvious because CW will barf up with bad memory space and control will be lost if the clocks are not initialized, or the app will lock up like you said, waiting for PLL lock.

This is all very messy, but I found that my app crashed very early on so it was entirely practical to do the steps above.  When the chips are down (pun intended) you have to resort to this level of debug, which effectively takes us back to the programmer's world of 1980.

Another observation:  if you can, add triggers early in your app, such as a GPIO pin toggle or serial output, so you can see whether the app is running very early on.  If the pulse period or serial output is wrong, then you clearly have a clock setup problem.  The best way to debug this is to run the app by itself and, presuming it runs successfully, halt somewhere, go to the various CPU peripheral regs and capture (screenshots again, CW has no capture or print for this) all the important regs for the successfully running app.  Then compare those regs with the ones from an unsuccessful bootload of the app to see what's different.  You will either be able to explain away the differences, or you will stumble on a critical difference that is the source of the error.
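If the register values can be copied out as numbers instead of screenshots, the comparison itself is easy to automate. The following is a rough sketch, assuming the two snapshots list the same registers in the same order; `reg_sample_t` and `diff_reg_snapshots` are hypothetical names and any values passed in are whatever was read from the debugger.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;   /* register name as shown in the debugger */
    uint32_t    value;  /* value captured at the halt point */
} reg_sample_t;

/* Prints every register whose value differs between the good and the bad
 * start and returns how many differences were found. */
size_t diff_reg_snapshots(const reg_sample_t *good, const reg_sample_t *bad,
                          size_t count)
{
    size_t differences = 0;
    for (size_t i = 0; i < count; i++) {
        if (good[i].value != bad[i].value) {
            printf("%-14s good=0x%08" PRIX32 " bad=0x%08" PRIX32 "\n",
                   good[i].name, good[i].value, bad[i].value);
            differences++;
        }
    }
    return differences;
}
```

Capturing both snapshots at the same point in the startup code keeps the diff short enough to reason about.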

This whole discussion of course relies on the problem being in software, not hardware.  You must verify that VCC and RESET on the CPU are to spec and that there's not some other hardware issue.

0 Kudos

1,380 Views
ARQuattr
Contributor IV

Thanks Doug for your reply.  The S19 I'm loading does follow what you wrote, and the vector is 0x10000, which matches the linker file (copied partially below).  I was mostly confused about the 0x2000FC00 part as I didn't see how it got that from the linker file.

   vectorrom   (RX): ORIGIN = 0x00010000, LENGTH = 0x00000400
   cfmprotrom  (RX): ORIGIN = 0x00010400, LENGTH = 0x00000020
   rom         (RX): ORIGIN = 0x00010420, LENGTH = 0x0006FB00
   ram         (RW): ORIGIN = 0x20000000, LENGTH = 0x00010000
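On the 0x2000FC00 question, one observation from the arithmetic alone: the ram region above ends at 0x20000000 + 0x00010000 = 0x20010000, and the initial SP sits 0x400 bytes below that. Exactly which linker symbol produces the offset is not visible in this excerpt, so treat the sketch below as a check of the numbers and nothing more; `bytes_below_ram_end` is a hypothetical helper.

```c
#include <stdint.h>

/* Distance from the initial stack pointer to the end of the ram region. */
uint32_t bytes_below_ram_end(uint32_t ram_origin, uint32_t ram_length,
                             uint32_t initial_sp)
{
    return (ram_origin + ram_length) - initial_sp;
}
```

`bytes_below_ram_end(0x20000000u, 0x00010000u, 0x2000FC00u)` evaluates to 0x400, i.e. 1 KB reserved above the boot stack.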

Yes, I double-checked that the vector table is not in ROM.  I had tried to set some GPIO early as mentioned above but it didn't work for some reason and I didn't continue to determine what was wrong.  I may try that again.  I looked through your steps for debugging and they look painful, but I may have to give it a try.  Good thought though about comparing registers for the good and bad start cases.

Thanks,

Angelo

0 Kudos