malloc crushing M0 core


755 Views
lpcware
NXP Employee
Content originally posted in LPCWare by metraTec on Tue Mar 01 08:22:07 MST 2016
Hi,

I'm working with an LPC-Link2 board as an evaluation board for the LPC4370.
I noticed a problem with malloc on my system after moving a library from the M4 core to the M0APP core.

I reduced the problem to code that contains only a malloc call (plus a while loop at the start, to give me time to attach the debugger to the M0). When malloc is called, the system crashes and ends up in the hard fault handler. But if the debugger is attached and I step over malloc, the hard fault does not occur. malloc also runs without any problem on the M0SUB and M4 cores.

So it looks like some kind of timing issue to me, but I have no idea how a malloc call in main can depend on timing. I found nothing on Google or in the forum, so now it's time to ask for help :)

Martin

14 Replies

647 Views
lpcware
NXP Employee
bump

Content originally posted in LPCWare by metraTec on Tue Apr 26 06:35:53 MST 2016
The problem is still not solved.
I have switched to Redlib for now, which still works.
I'm not happy with this.

Content originally posted in LPCWare by rocketdawg on Fri Apr 15 08:00:44 MST 2016
malloc is not thread-safe (nor, for that matter, are many of the stdlib functions that keep shared state).
One cannot call such functions from multiple threads, or from both the foreground and the background of a super loop, without the risk of corruption.
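
One common way to protect heap calls against interrupts on a single core is a nestable critical section. newlib calls the hooks `__malloc_lock` and `__malloc_unlock` around its allocator, and the lock may be taken recursively. The sketch below shows only that nesting logic and is written to run on a host: `heap_lock`, `heap_unlock`, `irq_disable_sim` and `irq_enable_sim` are hypothetical names, and the simulated flag stands in for real interrupt masking on the target.

```c
#include <stdbool.h>

/* Simulated global interrupt-enable flag. On a Cortex-M target this would be
   PRIMASK, changed with the toolchain's intrinsics instead of these stand-ins. */
static bool irq_enabled_sim = true;
static unsigned heap_lock_depth = 0;   /* nesting counter */
static bool irq_was_enabled = false;   /* state saved by the outermost lock */

static void irq_disable_sim(void) { irq_enabled_sim = false; }
static void irq_enable_sim(void)  { irq_enabled_sim = true; }

/* Hypothetical body for a malloc lock hook: mask interrupts and count
   nesting, because the allocator may take the lock recursively. */
void heap_lock(void)
{
    bool was_enabled = irq_enabled_sim;
    irq_disable_sim();
    if (heap_lock_depth++ == 0) {
        irq_was_enabled = was_enabled;
    }
}

/* Hypothetical body for a malloc unlock hook: restore the saved interrupt
   state only when the outermost lock is released. */
void heap_unlock(void)
{
    if (heap_lock_depth > 0 && --heap_lock_depth == 0 && irq_was_enabled) {
        irq_enable_sim();
    }
}
```

This only guards against interrupts on the core that owns the heap; it does nothing for a heap shared between the M4 and an M0 core.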

Content originally posted in LPCWare by metraTec on Fri Apr 15 02:54:25 MST 2016
It looks to me like an interrupt during malloc caused the problem.

Content originally posted in LPCWare by metraTec on Fri Apr 15 02:44:52 MST 2016
This WAS indeed a problem, but one I had only just introduced myself, through the length I set while debugging. The initial problem persists.

It seems to depend on the length of the M0 code (which is placed in RAM).

For debugging purposes I needed a delay at the start and added

for (u32 i = 20 * SystemCoreClock / 1000; i; i--)
    nopx1000();

where nopx1000 is a macro that expands to 1000 nop() asm instructions. It's not nice, but the time is easier to estimate than with something like

for (u32 i = 20UL * SystemCoreClock; i; i--)
    nop();

which has a lot of loop overhead.
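
The loop counts in this post can be checked with a little arithmetic. The sketch below assumes a 204 MHz core clock (suggested by the 204000 constant used in the other test loop in this post) and counts only the nop instructions, ignoring loop overhead. `nop_count` and `unrolled_iterations` are hypothetical helper names.

```c
#include <stdint.h>

/* Total number of nop instructions executed by a delay loop, ignoring the
   loop overhead. 64-bit math avoids overflow in the intermediate product. */
uint64_t nop_count(uint64_t iterations, uint64_t nops_per_iteration)
{
    return iterations * nops_per_iteration;
}

/* Iteration count of "20 * SystemCoreClock / 1000" for a given core clock. */
uint64_t unrolled_iterations(uint64_t core_clock_hz)
{
    return 20u * core_clock_hz / 1000u;
}
```

Under that assumption the unrolled loop runs 4,080,000 iterations, or 4.08e9 nops (roughly 20 s if each nop takes one cycle), while a loop of 2000 * 204000 single nops executes 4.08e8 nops plus a comparatively large loop overhead.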

So adding

for (u32 i = 20 * SystemCoreClock / 1000; i; i--)
    nopx1000();

"solved" the problem.

I thought about timing, but as that does not fit with a heap / malloc problem, I tested

for (u32 i = 2000 * 204000UL; i; i--)
    nop();

and the crash was there again.

It does not even enter __check_heap_overflow (breakpoint).


The debugger shows:

M0_HardFault_Handler() at cr_startup_lpc43xx-m0app.c:412 0x1008018c
<signal handler called>() at 0xfffffff9
_malloc_r() at 0x10085c4a
malloc() at 0x10085bd8

by the way...

Content originally posted in LPCWare by metraTec on Wed Apr 13 07:38:28 MST 2016
I think I found the problem and I'm quite happy about it, even though I'm still looking for the setting that makes it work my way. But we will see...

I added

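/* Replacement for newlib's heap check, so that a breakpoint can be set in it.
   Returns 1 if the requested end of heap is at or beyond the heap limit.
   _pvHeapLimit is a linker-defined symbol; only its address is used. */
unsigned __check_heap_overflow(void *new_end_of_heap)
{
    u32 dwEndOfHeap = (u32)(&_pvHeapLimit);
    /* volatile keeps the result visible in the debugger */
    volatile unsigned ret = ((u32)new_end_of_heap >= dwEndOfHeap) ? 1 : 0;
    return ret;
}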

to my code to replace the __check_heap_overflow function in newlib, so that I could debug it. The test should be OK this way. And this is what I saw:

The first malloc triggered the test twice: once with _heapStart + 40 (40 being the size passed to malloc), which obviously passed, and then a second time with the value 0x10089000, which is the next 4 KiB-aligned address. That one obviously did not pass. I changed my linker parameters to:

--defsym=__user_heap_base=0x1008A000
--defsym=_pvHeapLimit=__user_heap_base+0x2000

And as suspected:
One call for 0x1008A032 and one for 0x1008B000. And that was the last one.

So newlib seems to request memory from _sbrk only up to an alignment boundary and manages the rest of that memory with its own internal bookkeeping. To say the least, this is not ideal for embedded programming. On this device, though, the extra space is acceptable for me.
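
The observed addresses are consistent with rounding the requested end of heap up to a 4 KiB boundary before the limit check. The sketch below reproduces that arithmetic as a reading of the observation, not as newlib's actual code; `align_up` and `heap_overflows` are hypothetical helpers, and the comparison mirrors the replacement `__check_heap_overflow` from this post.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u  /* 4 KiB, matching the observed alignment */

/* Round addr up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Same comparison as the replacement heap check: nonzero means overflow. */
unsigned heap_overflows(uint32_t new_end, uint32_t heap_limit)
{
    return (new_end >= heap_limit) ? 1u : 0u;
}
```

With the new settings, `align_up(0x1008A032, SKETCH_PAGE_SIZE)` gives 0x1008B000, which is below the limit 0x1008A000 + 0x2000 and therefore passes. With a 0x800-byte heap that ends at or before 0x10089000 (the exact base address is not given in the post), the aligned request for 0x10089000 fails the check.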

Perhaps it is even possible to reduce the memory alignment to something like 1 KiB or even 256 bytes, for less powerful devices.


I hope this helps if someone else gets the problem, too.


P.S. I'm not sure why it did not work in the first place, but I found one thing: my library used Redlib and the main code used Newlib. This has been fixed now.
Hopefully that's my solution now. :)

Content originally posted in LPCWare by metraTec on Wed Apr 13 04:05:53 MST 2016
I'm using these lines to set the heap symbols:

--defsym=__user_heap_base=_end_noinit
--defsym=_pvHeapLimit=__user_heap_base+0x800

This works fine with Redlib (because of "PROVIDE(_pvHeapStart = DEFINED(__user_heap_base) ? __user_heap_base : .);"?) but not with newlib.

My version and OS (actual and starting) is
Version: LPCXpresso v8.0.0 [Build 526] [2015-11-23]
Operating system: Windows 7

I have a bootloader that starts the firmware. The firmware is built to run from RAM only. The bootloader runs on the M4 and starts the firmware on the M4 (the images for all cores are linked together), and the M4 firmware then starts the M0APP.


But I don't see why this would be a problem, as malloc works fine on the M4 and works with Redlib on both cores.

Content originally posted in LPCWare by vtw.433e on Tue Apr 12 07:22:14 MST 2016
I think the method of defining the heap is different between Redlib and Newlib. Redlib uses _pvHeapStart and newlib uses __user_heap_base. See this thread: https://www.lpcware.com/content/forum/using-userheapbase-doesnt-change-map-file

Content originally posted in LPCWare by metraTec on Tue Apr 12 03:53:17 MST 2016
Switching to Redlib (nohost) seems to solve the problem (but I have thought that once before...). I'd prefer to use newlib. Is there any known malloc problem with newlib?

Content originally posted in LPCWare by vtw.433e on Tue Apr 12 01:14:27 MST 2016
Please post your project, so we can see what you are doing wrong.

Content originally posted in LPCWare by metraTec on Tue Apr 12 00:50:28 MST 2016
Things are not getting better.
The workaround just causes malloc to always return NULL, so it's not useful at all.

The problem only appears on my M0 core (the APP core). The M0SUB has not been tested yet. On the M4, malloc seems to work totally fine.

I use the newlib library.

I'm close to using newlib (none) plus a heap I write myself, but that doesn't sound like fun at all and is perhaps not an ideal solution.

Content originally posted in LPCWare by metraTec on Wed Apr 06 05:46:44 MST 2016
Found this in my .map file:

0x10085a2c                __end_of_heap
0x10085a30                PROVIDE (_pvHeapStart, DEFINED (__user_heap_base)?__user_heap_base:.)

This gives either a range of -4 or, even worse, the whole address range (minus 4 bytes)? I think so.

Will investigate the origin of __end_of_heap.
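
The two readings of that difference can be checked directly. The sketch below takes the post's interpretation at face value, namely that `__end_of_heap` marks the end and `_pvHeapStart` the start of the heap; if `__end_of_heap` is instead a variable that merely lives at that address, the calculation does not apply. `heap_span_unsigned` and `heap_span_signed` are hypothetical helpers.

```c
#include <stdint.h>

/* Heap size as seen in unsigned 32-bit address arithmetic. */
uint32_t heap_span_unsigned(uint32_t start, uint32_t end)
{
    return end - start;   /* wraps modulo 2^32 when end < start */
}

/* The same difference interpreted as a signed quantity. */
int64_t heap_span_signed(uint32_t start, uint32_t end)
{
    return (int64_t)end - (int64_t)start;
}
```

For start 0x10085a30 and end 0x10085a2c, the signed difference is -4 and the unsigned one wraps to 0xFFFFFFFC, which is the "whole address range minus 4 bytes" case.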


Content originally posted in LPCWare by metraTec on Tue Apr 05 05:15:08 MST 2016
Same problem again, and I used the same workaround again. I'm still interested in a real solution.

Heap size should not be the cause: first because of the timing behaviour, but also because malloc should simply return NULL in that case.
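
The expectation that an exhausted heap yields NULL rather than a crash rests on the sbrk-style function reporting failure to the allocator. A minimal host-runnable sketch of that contract is shown below; it is not the toolchain's `_sbrk`. `sketch_sbrk`, `sketch_heap` and the 0x800-byte size are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_SIZE 0x800u               /* arbitrary 2 KiB heap */

static uint8_t sketch_heap[SKETCH_HEAP_SIZE]; /* stands in for the linker-defined region */
static size_t  sketch_break = 0;              /* current end of heap, as an offset */

/* Hypothetical sbrk-style function: grows the heap by incr bytes and returns
   the old break, or (void *)-1 if the request would pass the heap limit. */
void *sketch_sbrk(size_t incr)
{
    if (incr > SKETCH_HEAP_SIZE - sketch_break) {
        return (void *)-1;   /* out of heap: the allocator is expected to return NULL */
    }
    void *old_break = &sketch_heap[sketch_break];
    sketch_break += incr;
    return old_break;
}
```

A hard fault inside malloc therefore points at something other than plain exhaustion, for example a heap start or limit symbol that resolves to an unexpected address.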

Any ideas?

Content originally posted in LPCWare by metraTec on Tue Mar 01 09:34:20 MST 2016
I have found a solution, for now or for good; we will see.

The error no longer occurs since I changed the memory layout from two banks, RAM (32 kB) for code and RAM2 (32 kB) for data, to a single bank (72 kB, which includes the area between the two parts).

If this works, it's OK for this part, but my M4 and M0SUB cores will both have performance issues, so I'm still interested in the cause. Is there any difference in execution between code and data in the same memory block and code and data in different memory blocks?



Regarding the core issue: perhaps there is an error in the generated linker script that produces wrong heap addresses when two memory blocks are used?


Have a nice evening
Martin