_io_part_mgr_write fails on large sector

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

_io_part_mgr_write fails on large sector

Jump to solution
1,643 Views
adyr
Contributor V

I am using KSDK 1.3.0 with MQX on the MK66 processor.

Problem 1:

I have found a problem where after a period of time of writing data to an SD card it would become unavailable. It would mount but any read / write / list would fail. I have tracked this down to a problem with the _io_part_mgr_write function. It calls lseek, which returns the selected sector as a int64_t value, but puts the result into a int32_t variable and then checks if the result is greater than or equal to 0 before writing the data. The problem is when the returned sector number is greater than a int32_t can hold it wraps around to be a negative number so the check fails, even though the seek was successful.

I have now changed the call to put the result back in to the location variable then check that. I can now continue to use cards that had previously become "corrupted".

Snippet from _io_part_mgr_write in part_mgr.c

// Adria Rockall:- return is a sector so put into location which is a int64_t else we get overflow
//    result = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET);
    location = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET);
    if (location >= 0)
    {
        result = write(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num);
    }
 else // Adrian Rockall:- return -1 if the seek fails
 {
  errno = MFS_ERROR_SEEK;
  if (error)
  {
   *error = MFS_ERROR_SEEK;
  }
  result = -1;
 }‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Problem 2:

It appears the function has also been updated at some point to take a pointer to an error variable to pass back the error code. However the function itself still puts the error codes in the global errno variable at not in the passed in variable. As the calling function does not initialise the variable that it passes in and copies the value to errno if a failure is detected it results in a completely random error code being reported back to the calling functions. I have now changed the declarations of the error variables in all the calling functions to ensure they are initialised to 0, plus added code, like that shown in lines 11 to 14 above, to all places where errno is set.

Hope this helps anyone having this problem.

Adrian.

Labels (1)
0 Kudos
1 Solution
1,270 Views
adyr
Contributor V

I have just found the same problem exists in the _io_part_mgr_read function as well so I have applied the same change there.

Snippet from _io_part_mgr_read in part_mgr.c:

    /* Perform seek and data transfer */
 // Adria Rockall:- return is a sector so put into location which is a int64_t else we get overflow
//    result = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
    location = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
    if (location >= 0)
    {
        result = _nio_read(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num, error);
    }
 else // Adrian Rockall:- return -1 if the seek fails
 {
  if (error)
  {
   *error = MFS_ERROR_SEEK;
  }
  result = -1;
 }

Also I am not sure why the read function calls _nio_lseek and the write function calls lseek. It seems a bit inconsistent and may be better if both call _nio_lseek as lseek ends up calling that anyway. It would be one less call in the chain. It might even be more efficient if they both called _io_part_mgr_lseek directly as that is where both the other functions seem to end up. Or just set LOCATION directly as that is all that is happening at the end of the chain. If I get time I will investigate all that but if anyone else can see an immediate reason why it is not a good idea I would like to know.

Adrian.

View solution in original post

0 Kudos
4 Replies
1,270 Views
danielchen
NXP TechSupport
NXP TechSupport

Hi Adrain:

Yes, you are right.  This is a bug in MQX for KSDK 1.3. unfortunately there will be no updates for MQX in KSDK.

Thank you very much for your input, it is very helpful.

Regards

Daniel

0 Kudos
1,271 Views
adyr
Contributor V

I have just found the same problem exists in the _io_part_mgr_read function as well so I have applied the same change there.

Snippet from _io_part_mgr_read in part_mgr.c:

    /* Perform seek and data transfer */
 // Adria Rockall:- return is a sector so put into location which is a int64_t else we get overflow
//    result = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
    location = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
    if (location >= 0)
    {
        result = _nio_read(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num, error);
    }
 else // Adrian Rockall:- return -1 if the seek fails
 {
  if (error)
  {
   *error = MFS_ERROR_SEEK;
  }
  result = -1;
 }

Also I am not sure why the read function calls _nio_lseek and the write function calls lseek. It seems a bit inconsistent and may be better if both call _nio_lseek as lseek ends up calling that anyway. It would be one less call in the chain. It might even be more efficient if they both called _io_part_mgr_lseek directly as that is where both the other functions seem to end up. Or just set LOCATION directly as that is all that is happening at the end of the chain. If I get time I will investigate all that but if anyone else can see an immediate reason why it is not a good idea I would like to know.

Adrian.

0 Kudos
1,270 Views
jschepler
Contributor III

Thanks for posting this issue, Adrian.  We are using a 16 GB SD card and I am testing the solution by continuously writing 1 KB to a file.  I am trying to get the file location to pass over the 4GB boundary so it will pass the overflow issue in the _io_part_mgr_write() function. 

I started stepping through the debugger when the location gets to the value of 0x8000_0000.  Now, with the 64-bit variable I expect to return the location 0x8000_0000 when it calls lseek, but the value returned is 0xFFFF_FFFF_8000_0000, which is < 0, so the write fails.  

The root cause of the this issue is the call of lseek.  lseek has a return value of off_t, which on my K64F system is defined as a long in the _types.h file in GNU tools 4.8 2014q3.  Also, the lseek argument "offset" is also an off_t type, so it is 32-bit as well. 

I think I can solve this by calling _nio_lseek directly, which returns an _nio_off_t which is type defined in nio.h as an int64_t.  Also, the _nio_lseek argument "offset" is of type _nio_off_t as well, so it is 64 bits.  

When you tested your solution, did you test write by using lseek, or did you go directly to _nio_lseek?

0 Kudos
1,270 Views
adyr
Contributor V

Hi,

I guess you have an additional problem with that type definition:

The root cause of the this issue is the call of lseek.  lseek has a return value of off_t, which on my K64F system is defined as a long in the _types.h file in GNU tools 4.8 2014q3.  Also, the lseek argument "offset" is also an off_t type, so it is 32-bit as well.

I'm using the IAR tools and off_t is define as int64_t in MyProject\SDK\rtos\mqx\mqx\source\psp\cortex_m\compiler\iar\comp.h so lseek is working OK for me.

0 Kudos