
Hard Fault on branch instruction to _sched_execute_scheduler_internal for MQXLite

Question asked by Sam Kreuze on Feb 8, 2017
Latest reply on Mar 12, 2017 by Daniel Chen

We're using an MKE06Z128VLK4 to control industrial equipment. Our code was built using Kinetis Design Studio (3.2.0) with various Processor Expert (3.0.2.b151214) components and MQXLite (v1.1.1).

 

The project has been going well for about a year and a half now, with only a few solvable issues with KDS and PE; it seems we've found our first issue with MQXLite. Recently we've noticed a hard fault when calling the _lwevent_wait_ticks() function from our code. We've caught the hard fault a few times now using the hard fault handler and debugger.

 

The program counter points to the _sched_execute_scheduler_internal function each time. We're able to trace through the stack and find where our code calls the _lwevent_wait_ticks() function. Each time, it's a different call to _lwevent_wait_ticks() from a different task. Here is the stack of the task where the fault most recently happened. I've included the assembly from the objdump at the relevant locations in flash:

0x200013A8 0001218D
0x200013AC 0000E779
0x200013B0 200004E4
0x200013B4 20001088
0x200013B8 00000000
0x200013BC 00000001
0x200013C0 00000000
0x200013C4 20000058
0x200013C8 04692CEE
0x200013CC 046940B1
0x200013D0 00000000
0x200013D4 FFFFFFFF
0x200013D8 00000001
0x200013DC 00000000

0x200013E0 20001238  <-- stack pointer
0x200013E4 00000000
0x200013E8 0000042D
0x200013EC 0000052D
0x200013F0 200004E4
0x200013F4 0000E779 <-- faulting instruction: e774: f003 fcce bl 12114 <_sched_execute_scheduler_internal>
0x200013F8 0000E77A
0x200013FC 00000000
0x20001400 00000001
0x20001404 00000001
0x20001408 1FFFF83C
0x2000140C 200004E4
0x20001410 20001238
0x20001414 2000145C
0x20001418 1FFFF844
0x2000141C 0000E7EF <-- e7ea: f7ff ff79 bl e6e0 <_time_delay_internal>
0x20001420 00000000
0x20001424 00000000
0x20001428 200004E4
0x2000142C 1FFFF83C
0x20001430 20001238
0x20001434 0000C68F <-- c68a: f002 f87d bl e788 <_time_delay_for>
0x20001438 2000145C
0x2000143C 00001770
0x20001440 00000000
0x20001444 200004E4
0x20001448 00001770
0x2000144C 0000C743 <-- c73e: f7ff ff43 bl c5c8 <_lwevent_wait_internal>
0x20001450 00000000
0x20001454 00000001
0x20001458 00000000
0x2000145C 00001770
0x20001460 00000000
0x20001464 00000000
0x20001468 00000000
0x2000146C 00000000
0x20001470 00000000
0x20001474 1FFFF83C
0x20001478 1FFFFB9C
0x2000147C 1FFFF860
0x20001480 00000000
0x20001484 0000B4E1 <-- our code from here on: b4dc: f001 f900 bl c6e0 <_lwevent_wait_ticks>
0x20001488 0000006A

0x2000148C 00000000
0x20001490 00000000
0x20001494 00000000
0x20001498 00000000
0x2000149C 0000B78F <-- b78a: f7ff fe99 bl b4c0 <RunSpeedTask>
0x200014A0 00000000
0x200014A4 0000E201
0x200014A8 00000000
0x200014AC 00000000
0x200014B0 00000000
0x200014B4 00000000
0x200014B8 00000000
0x200014BC 00000000
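
For anyone who wants to double-check the annotations in the dump by hand, here is a minimal sketch in C. It assumes the standard 32-bit Thumb BL encoding (T1) and that a BL leaves the address of the next instruction, with bit 0 set, in the link register. The function names are made up for this post and are not part of MQXLite or any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the branch target of a 32-bit Thumb BL
 * (encoding T1) located at `addr`. hw1 and hw2 are the two halfwords in
 * the order objdump prints them, e.g. "f003 fcce". */
static uint32_t thumb_bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    /* Reject anything that is not a BL: hw1 = 11110xxx..., hw2 = 11x1xxxx... */
    assert((hw1 & 0xF800u) == 0xF000u && (hw2 & 0xD000u) == 0xD000u);

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1    = !(j1 ^ s);   /* I1 = NOT(J1 XOR S) */
    uint32_t i2    = !(j2 ^ s);   /* I2 = NOT(J2 XOR S) */

    /* imm32 = SignExtend(S:I1:I2:imm10:imm11:'0') */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    if (s) {
        imm |= 0xFE000000u;       /* sign-extend from bit 24 */
    }

    /* The offset is relative to the BL address plus 4. */
    return addr + 4u + imm;
}

/* Hypothetical helper: the value a BL at `addr` leaves in LR
 * (next instruction address with the Thumb bit set). */
static uint32_t thumb_bl_return_address(uint32_t addr)
{
    return (addr + 4u) | 1u;
}
```

Fed with the objdump lines quoted in the dump, thumb_bl_target(0xE774, 0xF003, 0xFCCE) gives 0x12114 and thumb_bl_return_address(0xE774) gives 0x0000E779, which matches the word annotated at 0x200013F4.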

The link register is 0xFFFFFFFD, which indicates the faulting context was using the PSP. The PSP is set to 0x200013E0.
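
As a rough sketch of how these two values can be read together, the C below decodes the stack-select bit of the EXC_RETURN value and overlays the eight-word frame that a Cortex-M core pushes on exception entry. It assumes the PSP really did point at that frame when the fault was taken. The type and function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words pushed on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* EXC_RETURN bit 2 selects the stack the frame was pushed to:
 * 1 = process stack (PSP), 0 = main stack (MSP). */
static int exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}

/* Copy the eight words found at the stack pointer into a named frame. */
static exc_frame_t decode_exc_frame(const uint32_t *sp_words)
{
    assert(sp_words != NULL);
    exc_frame_t f = {
        sp_words[0], sp_words[1], sp_words[2], sp_words[3],
        sp_words[4], sp_words[5], sp_words[6], sp_words[7]
    };
    return f;
}

/* xPSR bit 24 is the Thumb state bit (T). */
static int frame_thumb_bit_set(const exc_frame_t *f)
{
    return (int)((f->xpsr >> 24) & 1u);
}
```

Under that assumption, the words at 0x200013E0 through 0x200013FC in the dump decode to a stacked LR of 0x0000E779, a stacked PC of 0x0000E77A, and a stacked xPSR of 0x00000000, so frame_thumb_bit_set() returns 0 for this frame. That last value may be worth a closer look, since ARMv6-M cores only execute in Thumb state.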

The MMAR, BFAR, and PSR are all 0. The DFSR is 2, indicating we're at a breakpoint (which we are).

 

There is plenty of stack left on all tasks and the interrupt stack.

 

This fault happens at random times when running the equipment (it doesn't happen when it's sitting idle). The scheduler obviously runs many times before this fault with no issues, and so do our tasks that call _lwevent_wait_ticks().

 

We noticed that the MKE06 page doesn't link to any of these tools anymore, so we're not sure what to make of that. At this point in the project, changing RTOSes is not really an option.

 

Is this a known issue with the MQXLite scheduler?
