File writing tasks silently failing on K60 and K64F


ironsean
Contributor V

Hello, I've experienced an issue using MFS and a file writing task on both a K60 custom board and a K64F FRDM board: when fopen() or write() is called, the function never returns and the task ceases operating. No errors are thrown, and I don't know where to look to figure out why the function stopped. If I step through line by line in the debugger, eventually the debugger just stops responding as well.

It happened first with a custom K60-based board using MFS on the internal flash (to facilitate developing the file logic until a new board with better storage was developed). When a specialized test task wrote to flash it worked, and when ONLY the file writing task was running (not the webserver, telnet server, sensor reading, or data uploading tasks) it also worked. But when all the tasks were running, the file writing would silently stop the task. After lowering the file writing task to the lowest priority, it began to function properly.

Now, on the K64F FRDM board, we are attempting to refactor that task to use the SD card. The internal flash writing still works. We have also used the SD card example application to verify that SD card writing works. I moved the SD card initialization into our full project and it installs and uninstalls the SD card appropriately, but whenever we try to write to it, the task silently fails again.

Any idea where or how I could try to troubleshoot this, or what might be happening? Because of the task priority and the way it just ceases, I wondered if there's some sort of deadlock between the file writing task and one of the others. Priority doesn't seem to be helping in the SD card case, though.

We're using CodeWarrior Version 10.5, Build Id 130916.
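For anyone reproducing this, here is a minimal sketch of the kind of write path involved, with error checks added so a failure at least reports something before the task stops. It is illustrative only: the "mfs:" device name, file name, and buffer handling are assumptions, not the actual project code.

#include <mqx.h>
#include <bsp.h>
#include <fio.h>

/* Illustrative logging write: "mfs:" assumes MFS was installed under that name. */
static void log_write_once(const char *buf, _mqx_int len)
{
    MQX_FILE_PTR fp;
    _mqx_int     written;

    fp = fopen("mfs:\\log.txt", "a");
    if (fp == NULL) {
        /* _task_get_error() returns the calling task's last MQX error code */
        printf("fopen failed, task error 0x%x\n", _task_get_error());
        return;
    }

    written = write(fp, (char *)buf, len);
    if (written != len) {
        printf("write returned %d, task error 0x%x\n", written, _task_get_error());
    }

    fclose(fp);
}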

28 Replies
asmith
Contributor III

I see something similar to this with MQX 4.1.1 with a K70.

It is my belief that this has something to do with MQX kernel memory being overwritten.

I have debugged into the write functions and I am "hanging" when the indirect function is being called, suggesting that the function address in kernel memory has been zapped somehow.

I am actually using arm-none-eabi-gdb --tui to debug rather than CW, but I see the same debugger hanging issue.

I can often trace further by stepping by assembler instruction rather than C instruction.

ironsean
Contributor V

I took a quick look myself, and found this for an approximate code flow:

log_task.c
io_fopen.c:48-97
mfs_init.c:323-418
mfs_create.c:58
mfs_dir_entry.c:462
mfs_dir_entry.c:58-93
mfs_search.c:121
mfs_rw.c:437 MFS_Read_device_sector
mfs_rw.c:311
mfs_rw.c:256 read()
io_read.c:53
io_read.c:78 res = (*dev_ptr->IO_READ)(file_ptr, data, num);
part_mgr.c:1420 read() io_read, sdcard.c:285
sdcard_esdhc.c:363-408

io_read.c:74 in _mqx_int _io_read:

    if (file_ptr->HAVE_UNGOT_CHARACTER) {
        res = (*dev_ptr->IO_READ)(file_ptr, data + 1, num - 1);
    }
    else {
        res = (*dev_ptr->IO_READ)(file_ptr, data, num);
    }

The debugger hangs on that final call at io_read.c:74 after getting through the sdcard_esdhc.c code. It reaches the if statement at line 74, where file_ptr->HAVE_UNGOT_CHARACTER can be seen as false through the debugger, but stepping over or into that statement doesn't lead to either of the res = calls. I'll see if I can step through the assembly code and get any further.

asmith
Contributor III

You appear to be failing much as I am, at the indirect call

res = (*dev_ptr->xxxx)

My guess is that the memory where dev_ptr points has been trashed.

I have been trying to chase down MQX memory corruption thus far without success.

tobiasmaurer
Contributor I

Hello everyone,

I think I have the same or a similar problem with the esdhc / mfs / filesystem handling on the Vybrid with MQX 4.1.2.

Most SD cards work fine, but some SD cards cause a hard fault after a while. In the trace I also find the following last entries:

_io_read ...
    if (res < 0) return res;
    if (file_ptr->HAVE_UNGOT_CHARACTER) {
    return res;

I am not sure what causes this issue; changes to the task stack do not help. Maybe it is something inside the MQX code?

BR Tobi

ironsean
Contributor V

You might be right; after some further testing, the exact moment it loses it is inconsistent, but it's usually at one of those DEV_PTR calls. And a couple of times when I've checked the variable right before losing it, the contents of the struct are almost all 0x59D.

RadekS
NXP Employee

Unfortunately, it doesn't exactly fit any of the known issues.

Is it possible to place a watchpoint at this DEV_PTR (and the address where it points) and check whether the dev pointer is modified somewhere between fopen and fclose?

Did you try a different SD card? I know about one old and slow SD card here which has a similar issue (it just stops working after a few writes). We suspect there is some issue with timeouts during card transfers, but we haven't had time to investigate it yet.

asmith
Contributor III

In my case I am using a K70.  I am not using an SD.

I am building MQX using GCC from the command line under Ubuntu 14.04. I have used both the Ubuntu multi-arch ARM GCC compiler and the recommended Kinetis GCC toolchain.

I have built for DDR and SDRAM, all with the same results.

Eventually I execute an indirect call inside MQX and things go to hell.

I can play games with the code and delay this, but ultimately it is happening.

I am looking for a clue where to look to try to find this.

I am also trying to build my application with KDS to see if somehow that makes a difference, but it is a fair amount of effort creating a project file.

ironsean
Contributor V

David, what stack size are you allocating for the task doing the file writing? I found out I was getting a stack overflow during the file write call (while actually debugging another task). When I bumped my allocation up from 1000 to 2000, I saw my stack usage jump to 71% during file writing, which is well above what the old limit would have been. Now writing works properly.
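For reference, the per-task stack size is the third field of each entry in the MQX task template list, so the change above is a one-number edit there. A minimal sketch, with illustrative task names, indices, and priorities (not the actual project values):

#include <mqx.h>
#include <bsp.h>

extern void main_task(uint32_t initial_data);   /* uint_32 on MQX 4.0 */
extern void log_task(uint32_t initial_data);    /* hypothetical file writing task */

const TASK_TEMPLATE_STRUCT MQX_template_list[] =
{
    /* index, entry,     stack, prio, name,   attributes,          param, time slice */
    {  1,     main_task, 2000,  8,    "main", MQX_AUTO_START_TASK, 0,     0 },
    /* stack raised from 1000 to 2000 so buffered I/O in the write path fits */
    {  2,     log_task,  2000,  9,    "log",  0,                   0,     0 },
    {  0 }
};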

asmith
Contributor III

I have checked my stacks; that is not the problem. I doubled them anyway - that causes the failure to occur earlier.

All my task stacks are at least 2000; many are much larger.

I can even cause a crash using printf to print out stack usage.

ironsean
Contributor V

I have confirmed that stack overflows were the cause of my issues. I discovered it when another task was crashing on printf calls, and realized the printf was buffering the output and overflowing the stack. After realizing that, I went back and found that was exactly what was happening with the SD card writing task. I used the CodeWarrior task summary and stack usage tools to monitor this.

Since this is not the case for you, I'm less sure. Could you possibly be running out of memory elsewhere?
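The same kind of check can also be done from code rather than from the CodeWarrior tools. A rough sketch, assuming the kernel was built with MQX_MONITOR_STACK enabled; _klog_show_stack_usage() is the stack-usage helper in the MQX 4.x kernel log sources, so treat the exact name as an assumption and verify it against your PSP:

#include <mqx.h>
#include <bsp.h>

/* Periodic monitor task: dumps per-task stack usage and the task error code.
 * MQX_MONITOR_STACK in user_config.h makes the kernel fill each stack with a
 * known pattern so it can measure how much of the stack has been used. */
void stack_monitor_task(uint32_t initial_data)   /* uint_32 on MQX 4.0 */
{
    while (TRUE) {
        _klog_show_stack_usage();            /* assumed MQX klog helper */
        printf("monitor task error: 0x%x\n", _task_get_error());
        _task_set_error(MQX_OK);             /* clear so new errors are visible */
        _time_delay(10 * 1000);              /* check every 10 seconds */
    }
}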

asmith
Contributor III

printf is an interesting clue - printf can use a lot of stack.

I do not believe I have a stack problem - but only because increasing the stack size made the problem worse.

I know that MQX pointers are getting stomped on somehow.

I am not building using CW and cannot easily get to where I can. But I can use the MQX stack APIs. A clue there is that the one that displays stack information directly runs fine, but the one that returns data dies when I try to printf that data myself.

ironsean
Contributor V

If you want, I could attempt to run some of it if I've got the right hardware and see what CodeWarrior sees? Maybe you're also running into issues using all the available memory with stacks that are too large? I'm not sure how else increasing the stack size could make things worse.

Are you able to find the Stack Base, Stack Limit, and Stack Used variables? Maybe your main task could run checks to see what those are for each task? That seems to be how the CodeWarrior tools monitor it.

Also, if you check for task errors, a stack overflow will show up as a task error. You could check the task errors of all tasks from your main or most stable task.

Sean

asmith
Contributor III

I would have little problem believing this was a stack issue, though there is still the question of why kernel data is getting corrupted.

Stacks are supposed to come from free memory. Why would a stack overflow encroach on kernel data?

I am really not happy with the magic numbers in the linker files. Explicitly placing one kind of memory at an absolute address while allocating other memory out of a pool that overlaps seems like a recipe for disaster to me.

asmith
Contributor III

I have tried so many things it is hard to recall details about all of them.

I believe I have had failures with single-task examples and 2000+ stacks.

Right now I am stalled because something caused USB to quit working entirely, and I cannot get back to where it was.

I need to push to the SCM more frequently.

ironsean
Contributor V

My experience with a K60 board failing to write to internal flash via MFS also supports that it isn't an SD card issue. In that case, I could write to the flash when only the testing task was running, but I could not write while my other tasks were running until I changed the file writing task to a lower priority (possibly causing whatever task was wiping the memory to not run during the call anymore?).

As well, if I simply compile an SD card example project I can read and write to the SD card fine, but when I incorporate the same code into my project it fails. I will try to set the watchpoints and see if I can catch it changing for sure.

Sean

ironsean
Contributor V

Anyone else have any ideas where to even look to debug this? How to figure out what happened to the task that ceased operation?

pbanta
Contributor IV

If you pause the debugger when the task is stuck, what does the call stack for the stuck task look like?

ironsean
Contributor V

The thread that is my logging task (ID 0x10006) shows this:

Thread [ID: 0x10006] (Suspended: Signal 'Process Suspended' received.
Description: Process Suspended.)
7 (AsmSection)() dispatch.S:178 0x0000044a
6 _time_delay_internal() time.c:554 0x000299c0
5 _time_delay_for() time.c:484 0x000298fc
4 _lwevent_wait_internal() lwevent.c:1163 0x0002dd54
3 _lwevent_wait_ticks() lwevent.c:1345 0x0002dde4
2 _esdhc_send_command() esdhc.c:557 0x0000fada
1 _esdhc_read() esdhc.c:1509 0x000107da

The Log_task is no longer even shown as the root of the call stack, and my debugger ends up looping through dispatch.S.
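For context, the bottom of that trace is the standard MQX lwevent handshake between the ESDHC driver and its interrupt: the driver waits on an event bit that the SDHC ISR is supposed to set, and the task sits in _lwevent_wait_ticks until it does. A rough sketch of that pattern (names, mask, and timeout are illustrative, not the actual driver code; _lwevent_create() is assumed to have been called on the event first):

#include <mqx.h>
#include <bsp.h>
#include <lwevent.h>

#define TRANSFER_DONE 0x01u

static LWEVENT_STRUCT io_event;     /* illustrative, not the real driver struct */

/* Interrupt handler signals completion of the transfer. */
static void transfer_isr(void *param)
{
    _lwevent_set(&io_event, TRANSFER_DONE);
}

/* Caller blocks until the ISR sets the bit or the timeout (in ticks) expires. */
static _mqx_uint wait_for_transfer(_mqx_uint timeout_ticks)
{
    _mqx_uint result;

    result = _lwevent_wait_ticks(&io_event, TRANSFER_DONE, TRUE, timeout_ticks);
    if (result == LWEVENT_WAIT_TIMEOUT) {
        /* Transfer never completed; the caller gets control back instead of hanging. */
        return result;
    }
    _lwevent_clear(&io_event, TRANSFER_DONE);
    return MQX_OK;
}

If the event, or the driver state around it, has been overwritten, this wait is where the task ends up parked.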

soledad
NXP Employee

Hello Sean,

Which MQX version are you using?

Regards

Sol

ironsean
Contributor V

On the K60 custom board it was MQX 4.0.x (we updated to .4 or .5 as the latest I believe), and with the K64F we are using MQX 4.1.0, from the FRDM board's specific MQX release package.

I'm sorry, this was copied from a service ticket where the MQX versions were already listed as part of the ticket metadata and I forgot to include them here.

Thanks,

Sean
