Issues running MQX in DDR on K70

avm · ‎11-27-2013

Happy Thanksgiving to all!

I'm trying to develop an MQX application that is a bootloader that resides in internal flash. It uses eGUI to initialize the LCDC and display status messages on the LCD screen, searches for an MQX application image on the SD card, loads it into DDR memory, and runs it. I'm close, but I'm running into problems: the loaded application starts up, and gets through MQX initialization, then crashes on the first interrupt.

Using the information from this thread, I have a simple MQX test application that the debugger can load into DDR memory, and it runs perfectly that way: https://community.freescale.com/thread/305368

But it doesn't run properly when launched by my bootloader. The BSPs for both the bootloader and application are configured to place the interrupt vectors in RAM. The application BSP has been modified according to the thread referenced above to not re-initialize the clock and DDR controller. I have verified that the bootloader properly loads the image into RAM by comparing the loaded RAM contents with the original image file contents. The bootloader transfers to the application by disabling interrupts, loading LR with 0xffffffff, loading SP with the initial value from the application's start vector, and branching to the PC from the application's start vector.
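One cheap thing to rule out before the jump is a bad vector table. The following is a minimal sketch, not the bootloader's actual code: it checks that the first two words of the loaded image (initial SP and reset handler) look plausible. The helper names are hypothetical, and the SRAM and DDR ranges are assumptions taken from the linker script quoted later in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map (from the MEMORY block quoted later in the thread). */
#define SRAM_BASE 0x1FFF0000u
#define SRAM_SIZE 0x00020000u
#define DDR_BASE  0x70000000u
#define DDR_SIZE  0x04000000u

/* True if addr lies in [base, base + size]. The upper bound is inclusive
   because an initial SP may legitimately point one past the end of RAM. */
static bool in_region(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) <= size;
}

/* Hypothetical helper: vectors[0] is the initial SP, vectors[1] is the
   reset handler address of the loaded application. */
bool image_vectors_look_valid(const uint32_t vectors[2])
{
    uint32_t sp = vectors[0];
    uint32_t pc = vectors[1];

    /* The stack pointer must be word aligned and point into RAM. */
    bool sp_ok = (sp % 4u == 0u) &&
                 (in_region(sp, SRAM_BASE, SRAM_SIZE) ||
                  in_region(sp, DDR_BASE, DDR_SIZE));

    /* Cortex-M runs Thumb code only, so bit 0 of the entry address must
       be set, and the handler itself should live in DDR. */
    bool pc_ok = (pc & 1u) != 0u &&
                 in_region(pc & ~1u, DDR_BASE, DDR_SIZE - 1u);

    return sp_ok && pc_ok;
}
```

A bootloader could call this on the image's first eight bytes and refuse to branch if it returns false.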

My board has a buzzer, and I added a little test code to the loaded application to sound the alarm before it starts up MQX. I do get the beep during startup when the bootloader loads the code then launches the application, so I know it's getting that far. (If I use the full BSP that doesn't have the clock and DDR initialization sections trimmed out as per the above thread, then this beep doesn't happen: presumably re-initializing the clock and/or DDR controller affects the DDR refresh and corrupts the memory.)

If I step through the MQX initialization function _mqx(), I can see that the interrupt vectors get properly set up, and the VTOR register points to them. Everything appears to initialize properly, all the way through creating the auto-start tasks. At the end of _mqx(), once everything is initialized, it calls _sched_start_internal() which simply executes an SVC instruction to generate a software interrupt to enter the scheduler. It is right around here where the application crashes, and interestingly, this is where I see the first difference in behavior between running this application directly from the debugger, and running it from the bootloader.

When running from the debugger, the SVC call properly jumps through the SVCall vector to the _svc_handler() function. But when I step through the SVC call when launched by the bootloader, it either crashes immediately, or it reaches the CPSID instruction to disable interrupts at the beginning of _int_kernel_isr(), and stepping through that instruction causes a crash. So, when launched by the debugger, the SVC instruction reaches _svc_handler() as expected, but when launched from the bootloader it reaches the general interrupt ISR.

Looking at the ICSR register just before stepping through the SVC call, I see it has the value 0x00400000 (ISRPENDING) when launched from the debugger, and 0x0440f000 (PENDSTSET + ISRPENDING + exception 15 pending) when launched from the bootloader. So it looks like a SysTick interrupt is pending when the SVC instruction is executed, so it goes to process that interrupt first? But why does that cause a crash?
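For reference, the two ICSR values can be picked apart mechanically. This is a small sketch that assumes the ARMv7-M bit layout of SCB_ICSR (VECTACTIVE in bits 8:0, VECTPENDING in bits 20:12, ISRPENDING at bit 22, PENDSTSET at bit 26, PENDSVSET at bit 28). The struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCB_ICSR fields, assuming the ARMv7-M layout. */
#define ICSR_VECTACTIVE_MASK   0x000001FFu  /* bits 8:0   */
#define ICSR_VECTPENDING_MASK  0x001FF000u  /* bits 20:12 */
#define ICSR_VECTPENDING_SHIFT 12
#define ICSR_ISRPENDING        (1u << 22)
#define ICSR_PENDSTSET         (1u << 26)
#define ICSR_PENDSVSET         (1u << 28)

typedef struct {
    unsigned active_exception;   /* 0 means Thread mode */
    unsigned pending_exception;  /* highest-priority pending exception, 0 if none */
    bool     isr_pending;
    bool     systick_pending;
    bool     pendsv_pending;
} icsr_fields_t;

/* Split a raw ICSR value into its individual fields. */
icsr_fields_t icsr_decode(uint32_t icsr)
{
    icsr_fields_t f;
    f.active_exception  = icsr & ICSR_VECTACTIVE_MASK;
    f.pending_exception = (icsr & ICSR_VECTPENDING_MASK) >> ICSR_VECTPENDING_SHIFT;
    f.isr_pending       = (icsr & ICSR_ISRPENDING) != 0u;
    f.systick_pending   = (icsr & ICSR_PENDSTSET) != 0u;
    f.pendsv_pending    = (icsr & ICSR_PENDSVSET) != 0u;
    return f;
}
```

Decoding 0x0440f000 this way gives pending exception 15 (SysTick) with nothing active, which matches the reading above.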

Thinking that perhaps it can't handle an interrupt until the first time through the scheduler (maybe a different stack must be selected?), I tried clearing all of the SysTick registers to stop the counter before launching the application. It doesn't make any difference: the application still doesn't seem to get any farther than before. (Trying to step the debugger through the transition from the bootloader in flash to the application in DDR is not completely reliable, and since I've stopped the SysTick counter, I've not been able to step to the SVC call to see what difference that has had in the ICSR register.)

I've reached a brick wall, and have not been able to get past this point for many days.

Does anyone see anything I'm missing or something I'm doing wrong? Anybody got any ideas what I should try next or what I should look at?

Edit: I probably should've mentioned that it's MQX 4.0 and CodeWarrior 10.2, using a custom BSP on a custom board. The DDR memory configuration is identical to that on the K70 Tower board. The custom BSP is derived from the twrk70f120 BSP.

pravinfalcao · ‎03-09-2015

Hi,

I am also facing the same problem. After the bootloader, the application crashes in _sched_start_internal at the SVC call. I think this question is still unanswered? Can anybody throw some light on this issue?

Please help.

Thanks,

Pravin

pravinfalcao · ‎03-13-2015

Hi,

The issue is SVC priority. If interrupts are disabled when the SVC instruction executes, the SVCall exception cannot be taken and you will hit this problem, so make sure that interrupts are enabled before the SVC instruction.
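The rule behind this can be written down as a small model. The sketch below is a simplification of the ARMv7-M priority scheme (it ignores priority grouping) and all names in it are illustrative: PRIMASK raises the execution priority to 0, FAULTMASK raises it to -1, and an SVC is only taken if the SVCall priority is more urgent (numerically lower) than the current execution priority. Otherwise it escalates to HardFault, and if HardFault cannot preempt either, the core locks up.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SVC_TAKEN,
    SVC_ESCALATES_TO_HARDFAULT,
    SVC_LOCKUP
} svc_outcome_t;

/* Lower number = more urgent. Thread mode with nothing masked is less
   urgent than any configurable priority (0..255). */
#define PRIO_THREAD    256
#define PRIO_HARDFAULT (-1)

/* Effective execution priority after applying the mask registers. */
int execution_priority(int active_prio, bool primask, bool faultmask,
                       uint8_t basepri)
{
    int prio = active_prio;
    if (basepri != 0u && (int)basepri < prio) prio = (int)basepri;
    if (primask && prio > 0) prio = 0;
    if (faultmask && prio > PRIO_HARDFAULT) prio = PRIO_HARDFAULT;
    return prio;
}

/* What happens when an SVC instruction executes at exec_prio. */
svc_outcome_t svc_outcome(int exec_prio, uint8_t svcall_prio)
{
    if ((int)svcall_prio < exec_prio) return SVC_TAKEN;
    if (PRIO_HARDFAULT < exec_prio)   return SVC_ESCALATES_TO_HARDFAULT;
    return SVC_LOCKUP;
}
```

With the SVCall priority of 0x10 reported in this thread, the model gives SVC_TAKEN in plain Thread mode, SVC_ESCALATES_TO_HARDFAULT with PRIMASK set, and SVC_LOCKUP with FAULTMASK set.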

avm · ‎12-08-2013

I still need to investigate Bjoern's suggestions, but wanted to share a little more data I've collected in my recent attempts to get this working.

Keying off the fact that the behavior is consistently different under the two running conditions (bootloader and debugger) while the memory contents are the same, there must be some register somewhere that is set up differently and is causing the different behavior. So I went looking for any register content differences.

First off, a tip: In CodeWarrior 10.2 (and I'm sure other versions as well) it's rather easy to get a dump of ALL of the processor registers. In the debugger, go to the Registers display pane, right-click and select the option to add a new register group. In the resulting dialog click the "check all" button. This adds a new group with every register in it. Open the group, right-click on it, and select copy group. You can then paste the contents of all of the registers into a text file, or any other file.

I did this under both my working debugger case, and crashing bootloader case. I pasted both sets of register dumps into a spreadsheet, and looked for differences. Out of more than 3000 registers, I found 89 that are different. Most are related to the flash memory cache, which makes sense: when started from the bootloader (which runs in flash) the flash memory cache will hold the contents of the last few flash accesses just before transferring control to the application in RAM. But when launched from the debugger, no flash accesses are made, and the flash memory cache will have whatever leftovers are in there from whatever was the last thing running from flash. I'm not sure, but I wouldn't think that these differences would cause the behavior I'm seeing.

Another group of changes are related to registers that are expected to be changing: real time counter values like the RTC, SysTick, watchdog, and flexible timer modules; and also GPIO input data registers that respond to varying signal inputs. I can't see how these would directly affect the behavior I'm seeing.

A couple more registers (FP_COMP5 and DEMCR) are directly related to the debugger, and are different because of the slightly different conditions stepping through the code under the two conditions.

That doesn't leave too many other differences. About all that's left are a few LMEM_* local memory controller cache registers. But I don't really understand the cache mechanism well enough to know if they are significant.

I've attached a summary of the 89 register differences I see. If a register is not on this list, it's either because the register has the same value under both running conditions, or it's not a register that is shown under the CodeWarrior 10.2 debugger's register tab.

Does anybody see anything in these differences that could explain the proper running under one scenario and a crash on the other? Besides register contents, what else can trigger different behavior while running the same code?

Any insights are greatly appreciated!

chrissolomon · ‎07-06-2014

Hi, this probably won't be of any use to you now, but just in case, or in case it helps anyone else:

I am using a K70 in a similar way. In my case I have split the symbols for my application and MQX into ROM and DDR sections. The DDR part gets read from an SD card into DDR on boot, and then the majority of the application runs out of DDR: everything that isn't needed to get started and that can cope with the performance hit of running out of DDR.

The one thing we had to add to make our code work was an additional compiler flag, -mlong-calls

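Without this, calls cannot go from the ROM to the DDR: the two regions are too far apart for a direct branch-and-link, so the compiler has to load the target address into a register and call through it.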

Hope this helps!

avm · ‎07-07-2014

Chris,

Thanks for responding, that sounds like a very interesting idea!

After spending WAY too much time trying to get that bootloader to DDR scheme working, I finally gave up. That project is just about ready to be delivered, but it's overspent and out of hours, so it's unlikely I'll be changing it at this point.

But I'm working on another project with a virtually identical K70/DDR/SD Card configuration (just different project-specific I/O) and I'm afraid we're going to run out of flash space. Partitioning the application as you have could be an answer.

I'd like to hear more about just how you did this. I'm guessing a logical split is to essentially put the BSP in flash, and the rest in DDR? When you make updates to the code, how do you make sure nothing in the flash section moves around? Or do you have to reload the flash section every time you rebuild so that it matches the DDR overlay?

It's been way too long (30 years?) since I've had to break code into partitions and overlays to get around memory limits on 8-bit processors. Not only am I rusty on the techniques, but I'm guessing the compiler technology has changed a bit since then. ;-) You wouldn't have a simple example that shows how you did it, sort of a "Hello DDR World!" sample?

I'm intrigued by your comment about needing to add the -mlong-calls flag. I wonder if this would've fixed my problem with the hard fault reset when doing the SVC instruction? Perhaps that instruction doesn't like jumping to DDR? Of course, now that I think about it, if that is the cause, a compiler flag isn't really going to help it, as that's a hardware instruction, not code written by the compiler. If that long call is the issue, the solution is likely to have the SVC handler in flash, so it's not a long call (perhaps the hardware assumes that the SVC vector, or perhaps all vectors, are pointing to flash?) I'm beginning to convince myself that your solution of having a flash resident kernel could be ideal.

Very interesting, I'd love to hear more!

-- Adam

chrissolomon · ‎07-07-2014

Hi Adam,

ok, basically we made some customized ld files.

To start we have a MQX_ROM.ld and an APP_ROM.ld.

These files just contain the objects we need in the ROM to boot, for simplicity only down to the object level for the most part, but we have a couple where just specific symbols need the performance boost of being in ROM.

I'm afraid finding which objects you need can be a bit of trial and error - if you set up the MPU so you can print an exception with the problematic PC if something tries to access the DDR before you've got it going, that really helps, you can just use arm-none-eabi-addr2line to find the culprit.

In the main ld file we have:

MEMORY
{
    vectorrom (RX): ORIGIN = 0x00000000, LENGTH = 0x00000400 /* interrupt vectors */
    fcfmprotrom (R): ORIGIN = 0x00000400, LENGTH = 0x00000010 /* Flash Configuration Field */
    rom_header (R): ORIGIN = 0x00000410, LENGTH = 0x00000020 /* version, checksum */
    romlow (RX): ORIGIN = 0x00000430, LENGTH = 0x0007DBD0 /* lower half of ROM (MQX) */
    permdata (R): ORIGIN = 0x0007E000, LENGTH = 0x00001000 /* permanent data, e.g. serial number */
    filow (R): ORIGIN = 0x0007F000, LENGTH = 0x00001000 /* Flash Indicator for ROM low (swap) */
    ram (RW): ORIGIN = 0x1FFF0000, LENGTH = 0x00020000 /* SRAM - RW data */
    end_of_kd (RW): ORIGIN = 0x2000FFF0, LENGTH = 0x00000000
    bstack (RW): ORIGIN = 0x2000FA00, LENGTH = 0x00000200
    end_bstack (RW): ORIGIN = 0x2000FC00, LENGTH = 0x00000000
    exception (RW): ORIGIN = 0x2000FC04, LENGTH = 0x000003FC /* SRAM - RW data for storing CPU exception info */
    ddr_header (RX): ORIGIN = 0x70000000, LENGTH = 0x00000020 /* magic, version, checksum */
    ddr (RX): ORIGIN = 0x70000020, LENGTH = 0x03FFFFE0 /* LPDDR - 64MB */
}

This gives us various sections - the headers for the ROM and the DDR are used because we use the swap method for upgrading, which means that the actual upgrade is atomic, with very low risk of bricking a unit.

In the sections bit we added:

.rom_header :
{
    __ROM_HEADER = .;
    KEEP(*(.rom_header))
    . = ALIGN (0x4);
} > rom_header

PROVIDE(__rom_header = __ROM_HEADER);

so we can access the header from code - and a similar section for the ddr:

.ddr_header :
{
    __DDR_HEADER = .;
    KEEP(*(.ddr_header))
    . = ALIGN (0x10);
} > ddr_header

PROVIDE(__ddr_base = __EXTERNAL_DDR2_RAM_BASE);
PROVIDE(__ddr_size = __EXTERNAL_DDR2_RAM_SIZE);
PROVIDE(__ddr_header = __DDR_HEADER);
PROVIDE(__ddr_end = __EXTERNAL_DDR2_RAM_END);

Then we specify the parts which must be in ROM:

.rodata :
{
    *(KERNEL)
    *(S_BOOT)
    *(IPSUM)
    *(.eh_frame)
    KEEP (*(.init))
    KEEP (*(.fini))
    . = ALIGN(0x4);
    *main.obj(.rodata*) /* add comment */
    . = ALIGN(0x4);
    *(.rdata*)
    . = ALIGN(0x4);
    *(.exception)
    . = ALIGN(0x4);
    __exception_table_start__ = .;
    __exception_table_end__ = .;
    __sinit__ = .;
} > romlow

.ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } > romlow

.ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
} > romlow

.ctors :
{
    __CTOR_LIST__ = .;
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first. Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard. The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in. */
    KEEP (*crtbegin.o(.ctors))
    /* We don't want to include the .ctor section
       from the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
    __CTOR_END__ = .;
} > romlow

.dtors :
{
    __DTOR_LIST__ = .;
    KEEP (*crtbegin.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
    __DTOR_END__ = .;
} > romlow

.preinit_array :
{
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
} > romlow

.init_array :
{
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
} > romlow

.fini_array :
{
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT(.fini_array.*)))
    KEEP (*(.fini_array*))
    PROVIDE_HIDDEN (__fini_array_end = .);
    ___ROM_AT = .;
} > romlow

.mqx_rom :
{
    . = ALIGN(0x4);
    /* ideally we could do something like libmfs.a(.text*) but
       couldn't get it working, so including the list of .o files */
    INCLUDE ../MQX_ROM.ld
} > romlow

.app_rom :
{
    . = ALIGN(0x4);
    INCLUDE ../APP_ROM.ld
} > romlow

Then we have the catchall, to put any new objects into DDR:

.catchall :
{
    . = ALIGN(0x4);
    *.obj(.text*)
    *.obj(.rodata*)
    . = ALIGN(0x20);
    __DDR_CODE_END = .;
} > ddr

PROVIDE(__ddr_code_end = __DDR_CODE_END);
PROVIDE(__ddr_pool_start = (__DDR_CODE_END + 0x200) & 0xfffffe00);

The final two PROVIDE statements are useful: they tell you where the DDR code image ends and where the DDR memory pool can start. You can either extend the existing memory pool or create another. I made a separate DDR memory pool because in my app I have to cope with external power going away without warning, and losing the DDR, so I keep critical items in the internal memory.
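The arithmetic in the __ddr_pool_start expression is easy to misread, so here it is as plain C. This is only an illustration of the same expression; the function name is invented. Note that it always moves forward to the next 0x200 boundary, so an end address that is already aligned still gets a full 0x200-byte gap.

```c
#include <stdint.h>

#define POOL_ALIGN 0x200u

/* Same computation as the linker expression
   (__DDR_CODE_END + 0x200) & 0xfffffe00:
   advance to the next 0x200-byte boundary strictly after ddr_code_end. */
uint32_t ddr_pool_start(uint32_t ddr_code_end)
{
    return (ddr_code_end + POOL_ALIGN) & ~(POOL_ALIGN - 1u);
}
```

For example, a code end of 0x70012340 gives a pool start of 0x70012400, while a code end of exactly 0x70012400 gives 0x70012600.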

chrissolomon · ‎07-07-2014

Oh, and we used SREC to split the binary into ROM and DDR images and to add CRCs and lengths.

The DDR image isn't stored in the MFS partition, just in a separate raw section, and we have two 'slots' so we can swap internal ROM and it finds the matching DDR image.
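A boot-time check of such an image might look roughly like the sketch below. Everything specific in it is an assumption, not the layout described in this post: the header fields, the magic value, and the choice of the common reflected CRC-32 (polynomial 0xEDB88320) are all hypothetical stand-ins for whatever the real headers use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DDR_MAGIC 0x4D515844u  /* hypothetical magic value */

/* Hypothetical 0x20-byte header placed in front of the DDR image. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t length;       /* payload bytes that follow the header */
    uint32_t crc;          /* CRC-32 of the payload */
    uint32_t reserved[4];  /* pad to 0x20 bytes */
} ddr_header_t;

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final
   XOR 0xFFFFFFFF). Slow but table-free, fine for a boot-time check. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return ~crc;
}

/* Accept the image only if magic, length, and CRC all check out. */
bool ddr_image_valid(const ddr_header_t *hdr, const uint8_t *payload,
                     size_t payload_max)
{
    return hdr->magic == DDR_MAGIC &&
           hdr->length <= payload_max &&
           crc32_ieee(payload, hdr->length) == hdr->crc;
}
```

Checking the length before computing the CRC keeps a corrupted header from sending the CRC loop past the end of the slot.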

Hope that's enough to give you some ideas - I'm not suggesting this is the best solution, this is just the way we ended up going.

Chris

avm · ‎12-01-2013

I've been stuck on this for almost two weeks, and I've made little baby steps of progress, but I'm still getting nowhere.

So far I've determined that I do not get past the "SVC" instruction in _sched_start_internal(). This is the first SVC call in the loaded application. As soon as I try to step over that instruction, the system crashes and restarts.

I've been able to determine that the reason for the crash is a core lockup (RCM_SRS0 is 0x00, and RCM_SRS1 is 0x02). I've also updated the vector table with custom handlers that pulse various I/O pins to let me know if they ever get reached - none of them ever do, so it certainly looks like it never gets past the SVC instruction.
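Reading the reset cause can be wrapped in a small helper so it can be logged at every boot. This is a sketch only: the RCM_SRS1 bit positions are assumed from the Kinetis K-series reference manual and should be checked against the manual for the exact part, and the function name is invented.

```c
#include <stdint.h>

/* RCM_SRS1 flag positions (assumed; verify against the reference manual). */
#define RCM_SRS1_JTAG    (1u << 0)
#define RCM_SRS1_LOCKUP  (1u << 1)
#define RCM_SRS1_SW      (1u << 2)
#define RCM_SRS1_MDM_AP  (1u << 3)
#define RCM_SRS1_EZPT    (1u << 4)
#define RCM_SRS1_SACKERR (1u << 5)

/* Return a short description of the first reset cause flagged in SRS1. */
const char *srs1_reset_cause(uint8_t srs1)
{
    if (srs1 & RCM_SRS1_LOCKUP)  return "core lockup";
    if (srs1 & RCM_SRS1_SW)      return "software reset";
    if (srs1 & RCM_SRS1_JTAG)    return "JTAG reset";
    if (srs1 & RCM_SRS1_MDM_AP)  return "debugger (MDM-AP) reset";
    if (srs1 & RCM_SRS1_EZPT)    return "EzPort reset";
    if (srs1 & RCM_SRS1_SACKERR) return "stop-mode acknowledge error";
    return "no SRS1 cause flagged";
}
```

Under these assumptions, the value 0x02 reported above decodes to "core lockup".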

Trying to figure out why this is happening, I found this thread: Tracking Down Hard Faults -- https://community.freescale.com/thread/306244

The first reason stated in there is "execution of an SVC instruction at a priority equal or higher than SVCall." This sounded very promising, as it sounds like what's happening to me. But I'm not seeing it. The SVCall priority is 0x10. The SCB_ICSR register shows that no interrupt is active, therefore I'm at Thread priority. So there doesn't seem to be a priority issue here. Dang, I was hoping I was on to something.

At the time that I'm about to step on the SVC instruction, these are the values of the SCB registers:

It's interesting to note that I am now getting exactly the same values whether I run my application stand-alone in DDR through the debugger, or if the bootloader loads it from SD card into DDR memory. I don't know what changed from before, when the SCB_ICSR register was 0x00400000 when running stand-alone through the debugger. But at this point, it's behaving exactly as it does under the bootloader, except that it doesn't crash!

From looking at these, it appears I'm in Thread mode, not currently executing any exception or ISR, and according to SCB_ICSR I've got a SysTick interrupt pending (priority 0x40 as per SCB_SHPR3). So it looks like I should be able to process either the pending SysTick interrupt, or the SVC call. But I can't.

Are there other registers I should be looking at that might explain why the SVC instruction fails?

Anybody got any ideas? I'm at my wits' end...

bjoernjohanness · ‎12-07-2013

Adam,

first of all I didn't get anywhere really with configuring the DDR. I just stuck with the "defaults" from the Tower setup. I'm upset with Freescale not documenting the registers fully so I gave up since their support keeps disappointing me.

I read your two posts and I feel like you're sooo close. The first thing that came to mind is that there are two different stack pointers: the main stack pointer (MSP), which exception handlers always use, and the process stack pointer (PSP), which thread mode can be switched to. Unless you set up the handler-mode stack it would (most likely) crash at the first exception entry, which is what you're describing. Please make sure you set up that SP prior to executing the SVC.

Also, the most useful thing when working on things like these (to me) is using a "non-supported-exception handler". Any exception that you don't supply a handler for would execute this handler. I use this to print out all the registers that will let me decipher what went wrong, including the current last ~128 bytes of stack.
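The register dump part of such a handler can be kept separate from the hardware-specific entry code. The sketch below only formats the eight words a Cortex-M core pushes on exception entry (basic frame without FPU state: r0-r3, r12, lr, pc, xPSR). The function name is invented, and obtaining the frame pointer (MSP or PSP, depending on bit 2 of EXC_RETURN) needs a few lines of assembly that are not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Order of the registers stacked by the core on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Format a stacked exception frame into out; returns snprintf's result. */
int format_exception_frame(const uint32_t *frame, char *out, size_t out_size)
{
    return snprintf(out, out_size,
                    "r0=0x%08lx r1=0x%08lx r2=0x%08lx r3=0x%08lx "
                    "r12=0x%08lx lr=0x%08lx pc=0x%08lx xpsr=0x%08lx",
                    (unsigned long)frame[FRAME_R0],
                    (unsigned long)frame[FRAME_R1],
                    (unsigned long)frame[FRAME_R2],
                    (unsigned long)frame[FRAME_R3],
                    (unsigned long)frame[FRAME_R12],
                    (unsigned long)frame[FRAME_LR],
                    (unsigned long)frame[FRAME_PC],
                    (unsigned long)frame[FRAME_XPSR]);
}
```

The stacked pc is usually the most useful value: it is the address of the instruction that was executing (or about to execute) when the exception was taken, and can be fed to a tool such as addr2line.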

BTW, running from ram is very slow compared to internal memory unless you enable cache. At least on my setup.

Good talking to you.

//BJoern

avm · ‎12-08-2013

Bjoern,

Thanks for the reply! Yes, I feel like I'm so close, just not quite there. It's been very frustrating.

You really got me thinking about the two stack pointers. I believe they are both valid, but there might be something there. However, as I look at the register contents from running under both conditions (debugger which works, and bootloader which crashes) both stack pointers appear to be valid, and both have the same contents under both conditions. But I will continue to investigate this idea. One thing I didn't do was to look what was on each stack in the two cases to see if there are any differences.

I've done a basic unexpected interrupt handler in the past, and it has indeed been helpful. In this test application for RAM, I went so far as to put in a dedicated exception handler for every fault and ISR -- none of them get called: It seems to go straight from the SVC instruction to the reset condition with no intermediate stops. That's what has me so intrigued about your stack pointer comments - if there was a stacking issue, I suppose that could prevent the processor from entering the exception handler, and I suppose the end result could be a core lockup that I'm seeing? I'm relatively new to the ARM architecture, so I don't know, but it sounds like a plausible enough theory to pursue it a bit.

As for timing, I obviously haven't gotten far enough to do any timing tests. Fortunately, I think I can get away with slower execution for my application. My code does a lot, but it doesn't have to do it very fast. Rather than churning a lot of high speed data, it's doing mostly low speed control with some long and complicated state machine sequences -- lots of code, but not a lot of data throughput. It's more of a machine controller rather than a data analysis engine. Even running out of slow RAM, I think the K70 will have plenty of processor headroom. Time will tell.

Thanks for your help!

-- Adam