MFS Invalid Sector Errors

jschepler
Contributor III

I am using KSDK 1.3.0 with MQX and MFS.  We are using a 16 GB SD Card.

I am evaluating the reliability of MFS file writes using a single task that writes to files. I originally ran into timeout issues (see here). My system tick is 1 ms, so until I hear back in my other post about what the timeouts should be, I have set them as follows:

#define ESDHC_CMD_TICK_TIMEOUT 40 // 40ms 
#define ESDHC_CMD12_TICK_TIMEOUT 1000 // 1 Sec
#define ESDHC_TRANSFER_TIMEOUT_MS 750 // 750ms

I am still using a similar test procedure to before:

1. Create a new text file.

2. Write 1 KB of data into the file.

3. After 1024 writes (1 MB), close the file.

4. Open the same file and seek to the end.

5. Perform steps 2 - 4 until I write 1,000 MB to the file.

6. After 1,000 MB has been written, close the current file and create a new one. Repeat steps 2-4 until 8 files have been created. (We are using a 16 GB card)
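The test procedure above can be sketched in C roughly as follows. This is a scaled-down model, not the code used in the post: it uses standard C stdio (fopen, fwrite, fseek, fclose) in place of the MQX/NIO calls, and the run_write_test function, the write_test_cfg structure, and the file naming are hypothetical. The comments map each loop back to the numbered steps.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, scaled-down model of the test. Values from the post:
 * 1 KB per write, 1024 writes per open/close session (1 MB),
 * 1,000 sessions per file (1,000 MB), 8 files.
 * Standard C stdio stands in for the MQX/NIO file calls. */
typedef struct {
    size_t chunk_bytes;        /* bytes per write (1024 in the post) */
    size_t chunks_per_session; /* writes before close/reopen (1024) */
    size_t sessions_per_file;  /* close/reopen cycles per file (1000) */
    size_t num_files;          /* files to create (8) */
} write_test_cfg;

/* Returns 0 on success, -1 on the first I/O error. */
static int run_write_test(const char *prefix, const write_test_cfg *cfg)
{
    char chunk[1024];
    char name[256];

    if (cfg->chunk_bytes > sizeof chunk)
        return -1;
    memset(chunk, 'A', cfg->chunk_bytes); /* the post writes 'A' repeatedly */

    for (size_t f = 0; f < cfg->num_files; f++) {
        snprintf(name, sizeof name, "%s%zu.txt", prefix, f);
        FILE *fp = fopen(name, "wb"); /* step 1: create a new file */
        if (fp == NULL)
            return -1;

        for (size_t s = 0; s < cfg->sessions_per_file; s++) {
            if (s > 0) {
                /* step 4: reopen the same file and seek to the end */
                fp = fopen(name, "r+b");
                if (fp == NULL || fseek(fp, 0, SEEK_END) != 0) {
                    if (fp != NULL)
                        fclose(fp);
                    return -1;
                }
            }
            /* step 2: write one chunk, chunks_per_session times */
            for (size_t c = 0; c < cfg->chunks_per_session; c++) {
                if (fwrite(chunk, 1, cfg->chunk_bytes, fp) != cfg->chunk_bytes) {
                    fclose(fp);
                    return -1;
                }
            }
            /* step 3: close the file after each session */
            if (fclose(fp) != 0)
                return -1;
        }
    }
    return 0;
}
```

With the values from the post (1024-byte writes, 1024 writes per session, 1000 sessions per file, 8 files) the loop writes 8,000 MB in total; much smaller values are handy for a quick smoke test on a PC.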

The farthest I have gotten is 5 files written before a failure during the writing of the 6th.  The errors I am receiving are the following:

MFS_SECTOR_NOT_FOUND

MFS_INVALID_CLUSTER_NUMBER

These appear to happen at random.  Again, all I am doing in my program is writing to files.  

The latest error occurred in the function MFS_get_cluster_from_fat. I set a breakpoint there so I could inspect the variables:

[Screenshot: variables at the breakpoint in MFS_get_cluster_from_fat]

drive_ptr information and fat_entry:

[Screenshot: drive_ptr fields and fat_entry]

So fat_entry is 0x0141_4141, but LAST_CLUSTER is 0x001D_1C5B: we are trying to access a cluster beyond the last valid one, which is why the call fails.
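To make the failing check concrete, here is a minimal sketch of how a FAT32 entry can be classified. It follows the general FAT32 rules (0 is a free cluster, 0x0FFFFFF7 marks a bad cluster, values from 0x0FFFFFF8 up mark the end of a chain, and only the low 28 bits are significant); it is not the MFS implementation. The classify_fat32_entry function is hypothetical, and treating LAST_CLUSTER as the highest valid cluster number is an assumption.

```c
#include <stdint.h>

/* Hypothetical classification of a FAT32 entry using the general FAT32
 * rules; this is an illustration, not the MFS code. */
typedef enum {
    FAT_FREE,
    FAT_NEXT_CLUSTER,
    FAT_BAD_CLUSTER,
    FAT_END_OF_CHAIN,
    FAT_INVALID
} fat_entry_kind;

/* last_cluster is assumed to be the highest valid cluster number. */
static fat_entry_kind classify_fat32_entry(uint32_t raw, uint32_t last_cluster)
{
    uint32_t entry = raw & 0x0FFFFFFFu; /* top 4 bits are reserved in FAT32 */

    if (entry == 0)
        return FAT_FREE;
    if (entry >= 0x0FFFFFF8u)
        return FAT_END_OF_CHAIN;
    if (entry == 0x0FFFFFF7u)
        return FAT_BAD_CLUSTER;
    if (entry >= 2 && entry <= last_cluster)
        return FAT_NEXT_CLUSTER;
    return FAT_INVALID; /* e.g. cluster 1, or beyond the end of the volume */
}
```

With the values from the debugger, 0x0141_4141 falls into none of the valid ranges because it exceeds LAST_CLUSTER (0x001D_1C5B), which is consistent with the MFS_INVALID_CLUSTER_NUMBER error listed above.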

However, I am not sure how fat_entry gets its value. 

How did it become greater than the LAST_CLUSTER? 

 

I noticed that in some places "location" is a byte offset, while in others it is converted into a sector number. Could this be related to a 32-bit overflow?

Although I don't plan on creating such large files in my application, I am still concerned about these errors because I do not know their root cause.

Accepted solution
jschepler
Contributor III

The root cause of this issue is an unsigned 32-bit overflow when computing the location of a sector. The code is confusing to follow because the location is sometimes expressed in sectors and sometimes in bytes.

On my SD card, a sector is 512 bytes. In mfs_rw.c, a sector number is converted into a byte location by multiplying it by the number of bytes per sector (implemented as a left shift by SECTOR_POWER). This is all done with 32-bit variables, so the result overflows once the sector number reaches 0x0080_0000: multiplied by 512, it becomes 0x0001_0000_0000, which is stored in 32 bits as 0x0000_0000. Writing to that location corrupts the file system.
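The wrap-around is easy to reproduce on a PC. A minimal sketch, assuming sector_number is a 32-bit unsigned value and SECTOR_POWER is 9 (512-byte sectors); byte_offset_32 mimics the 32-bit arithmetic described above and byte_offset_64 widens the value before shifting. Both function names are hypothetical.

```c
#include <stdint.h>

/* Mimics the original 32-bit conversion: sector << SECTOR_POWER,
 * which wraps modulo 2^32. */
static uint32_t byte_offset_32(uint32_t sector, unsigned sector_power)
{
    return sector << sector_power;
}

/* Mirrors the fix: widen to 64 bits first, then shift. */
static uint64_t byte_offset_64(uint32_t sector, unsigned sector_power)
{
    return (uint64_t)sector << sector_power;
}
```

Sector 0x007F_FFFF still maps to byte 0xFFFF_FE00 (the last sector below 4 GB), but sector 0x0080_0000 maps to byte 0 and sector 0x0080_0001 to byte 0x200, so writes that should land past 4 GB are redirected back to the start of the addressed range.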

By the way, a sector location of 0x0080_0000 with 512 bytes per sector corresponds to 4 GB.  When I tested with a 4 GB SD card I never saw the issue.  We will be using 16 GB going forward.

The following changes pass the test I created as described in the original post:

mfs_format.c

Original

/* sector size is obtained during open operation and stored in the drive structure */
sector_size = drive_ptr->SECTOR_SIZE;
/* get the number of sectors */
error_code = ioctl(drive_ptr->DEV_FILE_PTR, IO_IOCTL_GET_NUM_SECTORS, &num_sectors);
#if !MQX_USE_IO_OLD
if (error_code == -1 && errno == NIO_ENOTSUP)
{
    num_sectors = lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END) / sector_size;
    error_code = MFS_NO_ERROR;
}

Changes

if (error_code == -1 && errno == NIO_ENOTSUP)
{
    //num_sectors = lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END) / sector_size;
    /* lseek returns a 32-bit value, so a device larger than 4 GB would overflow it */

    num_sectors = _nio_lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END, &error) / sector_size;

    error_code = MFS_NO_ERROR;
}

Explanation

With my compiler, off_t (the return type of lseek) is 32 bits. lseek returns the device size in bytes, which overflows for devices larger than 4 GB (or 2 GB, since off_t is a signed type). _nio_lseek returns a 64-bit value.
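A minimal sketch of what a truncated size does to the sector count computed in mfs_format.c. The helpers are hypothetical; the 32-bit path models the lost upper bits as unsigned wrap-around to keep the arithmetic well defined (a signed 32-bit off_t fails even earlier, at 2 GB), and 16,000,000,000 bytes is a made-up round device size, not a measured card.

```c
#include <stdint.h>

/* Sector count if the device size comes back through a 32-bit value:
 * only the low 32 bits of the byte count survive. */
static uint32_t num_sectors_via_32bit(uint64_t device_bytes, uint32_t sector_size)
{
    uint32_t truncated = (uint32_t)device_bytes; /* upper bits are lost */
    return truncated / sector_size;
}

/* Sector count if the full 64-bit byte count is available. */
static uint32_t num_sectors_via_64bit(uint64_t device_bytes, uint32_t sector_size)
{
    return (uint32_t)(device_bytes / sector_size);
}
```

For that size the 32-bit path reports 6,084,176 sectors (about 3.1 GB) instead of 31,250,000, and a device of exactly 16 GiB would report 0.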

mfs_rw.c

Changes are made to the functions MFS_Write_device_sectors and MFS_Read_device_sectors.

Original

uint32_t attempts;
int32_t num, expect_num, seek_loc, shifter;
char *data_ptr;
_mfs_error error;

...

MFS_LOG(printf("MFS_Write_device_sectors %d %d\n", sector_number, sector_count));

if (sector_number > drive_ptr->MEGA_SECTORS)
{
 return MFS_SECTOR_NOT_FOUND;
}

if (drive_ptr->BLOCK_MODE)
{
 shifter = 0;
 seek_loc = sector_number;
 expect_num = sector_count;
}

else
{
 shifter = drive_ptr->SECTOR_POWER;
 seek_loc = sector_number << shifter;
 expect_num = sector_count << shifter;
}

#if MQX_USE_IO_OLD
 fseek(drive_ptr->DEV_FILE_PTR, seek_loc, IO_SEEK_SET);
#else
 lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET);
//TODO: check errno lseek
#endif

Changes

uint32_t attempts;
int64_t seek_loc;    /* Added for the lseek issue: holds a byte location */
int32_t expect_num;  /* Can stay 32-bit: it is a transfer length, not an absolute location */
int32_t num, shifter;
int nio_error;       /* Added for the lseek issue */
char *data_ptr;
_mfs_error error;

...

// MFS_LOG(printf("MFS_Write_device_sectors %d %d\n", sector_number, sector_count));
 
if (sector_number > drive_ptr->MEGA_SECTORS)
{
 return MFS_SECTOR_NOT_FOUND;
}
 
if (drive_ptr->BLOCK_MODE)
{
 shifter = 0;
 seek_loc = sector_number;
 expect_num = sector_count;
}

else
{
 shifter = drive_ptr->SECTOR_POWER;
 seek_loc = (int64_t)sector_number << shifter;  /* Need to typecast to avoid overflow */
 expect_num = sector_count << shifter;
}

#if MQX_USE_IO_OLD
 fseek(drive_ptr->DEV_FILE_PTR, seek_loc, IO_SEEK_SET);
#else
//    lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET);
 _nio_lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET, &nio_error); /* off_t issue */
//TODO: check errno lseek
#endif

Explanation

I commented out MFS_LOG since it is not needed. seek_loc is now int64_t because it holds the byte location converted from a sector number. sector_number is cast to int64_t because it is multiplied by 512 (shifted by SECTOR_POWER); without the cast, the shift is performed in 32 bits and overflows before the result is stored in seek_loc. The call to lseek is replaced with _nio_lseek because of the 32-bit off_t issue.
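The cast is needed even though seek_loc is already int64_t, because C evaluates a shift in the (promoted) type of its left operand and only widens the result when it is assigned. A minimal sketch, assuming sector_number is a 32-bit unsigned value on a platform with 32-bit int (as on Cortex-M with GCC); both function names are hypothetical.

```c
#include <stdint.h>

/* The shift happens in 32 bits; the already-wrapped result is then widened. */
static int64_t seek_loc_without_cast(uint32_t sector_number, int32_t shifter)
{
    int64_t seek_loc = sector_number << shifter;
    return seek_loc;
}

/* The operand is widened first, so the shift happens in 64 bits. */
static int64_t seek_loc_with_cast(uint32_t sector_number, int32_t shifter)
{
    int64_t seek_loc = (int64_t)sector_number << shifter;
    return seek_loc;
}
```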

part_mgr.c

These changes need to be applied to both _io_part_mgr_write and _io_part_mgr_read. (Note: in _io_part_mgr_read, call read instead of write.)

Original

uint64_t location;
uint64_t part_start;
uint64_t part_end;
int32_t result;

...

/* Perform seek and data transfer */
result = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET);

if (result >= 0)
{
 result = write(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num);
}

Changes

int64_t location;    /* _nio_lseek returns a signed 64-bit number. */
int64_t part_start;
int64_t part_end;
int32_t result;

...

/* Perform seek and data transfer */
// result = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET);

/*  The return value is a byte address, so store it in location (an int64_t);
 *  otherwise it would overflow.
 */

location = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);

if (location >= 0)
{
    result = write(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num);
}
// JS: Return -1 if the seek fails
else
{
    if (error)
    {
        *error = MFS_ERROR_SEEK;
    }

    result = -1;
}

Explanation

_nio_lseek returns a signed 64-bit number because a negative value indicates an error, so location, part_start, and part_end are changed to signed as well. My application only addresses up to 16 GB, so switching from unsigned to signed does not affect it.
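Why location has to be signed: if it stayed uint64_t while receiving the return value of _nio_lseek, a failed seek (-1) would convert to 0xFFFF_FFFF_FFFF_FFFF and a location >= 0 test could never catch it. A minimal sketch, with a hypothetical fake_failed_seek standing in for a failing _nio_lseek call:

```c
#include <stdint.h>

/* Hypothetical stand-in for a failing seek; NIO-style calls signal
 * failure with a negative return value. */
static int64_t fake_failed_seek(void)
{
    return -1;
}

/* Signed result: the failure is detected. */
static int seek_failed_signed(void)
{
    int64_t location = fake_failed_seek();
    return location < 0;
}

/* Unsigned result: -1 converts to UINT64_MAX, so a ">= 0" test is always
 * true and the failure passes through as a huge "valid" location. */
static uint64_t seek_result_unsigned(void)
{
    uint64_t location = (uint64_t)fake_failed_seek();
    return location;
}
```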

Additional notes:

With my compiler (GCC), off_t is 32 bits. Another post mentioned that the comp.h file for IAR projects explicitly defines off_t as a signed 64-bit type. I tried redefining off_t in comp.h for the GCC project, but ran into many issues and decided not to pursue them for fear of unintentionally breaking something else. Instead, I replaced all calls to lseek with _nio_lseek.

comp.h can be found at \KSDK_1.3.0\rtos\mqx\mqx\source\psp\cortex_m\compiler\iar
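On a toolchain where off_t stays 32 bits, a quick check can confirm whether plain lseek can address the whole card. A minimal sketch, assuming off_t comes from <sys/types.h> (the header may differ in the MQX GCC port) and is signed, as POSIX requires; off_t_max and off_t_can_address are hypothetical helpers.

```c
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Largest byte offset representable in a signed off_t: 2^(bits-1) - 1. */
static uint64_t off_t_max(void)
{
    return (UINT64_C(1) << (sizeof(off_t) * CHAR_BIT - 1)) - 1;
}

/* Returns 1 if every byte of a device of this size can be addressed
 * through off_t (and therefore through plain lseek). */
static int off_t_can_address(uint64_t device_bytes)
{
    if (device_bytes == 0)
        return 1;
    return device_bytes - 1 <= off_t_max();
}
```

A compile-time alternative in C11 is `_Static_assert(sizeof(off_t) >= 8, "off_t too small for cards over 2 GB");`, which fails the build instead of reporting at run time.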

jschepler
Contributor III

EDIT 2018-04-12: The MFS_FILE_NOT_FOUND value in nio_error came from a different call to _nio_open outside the loop; I had not reset nio_error to 0 before starting the loop. The issue described below does not exist.

Update: I am still trying to find the root cause of this issue. Interestingly, I am repeatedly writing a single character, 'A', to the file, and 'A' is 0x41: the same byte that repeats in the cluster number above (0x0141_4141).
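The match is consistent with 'A' data having overwritten a FAT sector. FAT32 stores each entry as 4 little-endian bytes and uses only the low 28 bits, so four 'A' bytes (0x41414141) decode to 0x0141_4141, exactly the fat_entry value seen in the debugger. A minimal sketch, assuming MFS masks the reserved top 4 bits as the FAT32 format specifies; fat32_entry_from_bytes is a hypothetical helper.

```c
#include <stdint.h>

/* Decode one FAT32 entry from a raw sector buffer: 4 little-endian bytes,
 * of which only the low 28 bits hold the cluster number. */
static uint32_t fat32_entry_from_bytes(const uint8_t *p)
{
    uint32_t raw = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16)
                 | ((uint32_t)p[3] << 24);
    return raw & 0x0FFFFFFFu;
}
```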

When this error occurred again today, I noticed that nio_error was already 0x3002 (MFS_FILE_NOT_FOUND) before the call to write.

How did this happen? I have a check to make sure the file descriptor is not < 0:

chk_error = _nio_close(index_fd, &nio_error);

if(chk_error < 0)
{
  testfunction();
}

index_fd = _nio_open(myfilenames, O_RDWR | O_APPEND, &nio_error);

// Check that we can access the file descriptor
if(index_fd < 0)
{
  testfunction();
}

I added logic to not only check if the file descriptor is OK, but also to check the nio_error value:

index_fd = _nio_open(myfilenames, O_RDWR | O_APPEND, &nio_error);

// Check that we can access the file descriptor
if(index_fd < 0 || nio_error != 0)
{
  testfunction();
//  _task_block();
}

It turns out that _nio_open can return a valid file descriptor even when an error has occurred inside it.

Has anyone else seen a case like this?
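The EDIT at the top of this post traces the extra error to a stale nio_error value left over from an earlier _nio_open call. A minimal sketch of the pitfall and the fix, assuming (as the EDIT implies) that the open call writes its error output only when it fails; fake_open and open_reports_failure are hypothetical and only model that behavior, they are not the real NIO API.

```c
/* Hypothetical NIO-style open: returns a descriptor (>= 0) on success and
 * writes *error only on failure. Not the real _nio_open. */
static int fake_open(int should_fail, int *error)
{
    if (should_fail) {
        *error = 0x3002; /* MFS_FILE_NOT_FOUND, as reported in the post */
        return -1;
    }
    return 3; /* arbitrary valid descriptor */
}

/* Returns 1 if the caller concludes that the open failed. */
static int open_reports_failure(int should_fail, int *error, int reset_first)
{
    if (reset_first)
        *error = 0; /* clear any value left over from an earlier call */
    int fd = fake_open(should_fail, error);
    return fd < 0 || *error != 0;
}
```

Without the reset, a successful open still looks like a failure whenever an earlier call left an error code behind; with the reset, only real failures are reported.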

danielchen
NXP TechSupport

Hi jschepler

I would suggest trying this test with MQX 4.2 first. In KSDK 1.3, the MQX, MFS, and SDHC modules are ported from MQX 4.2; the only difference is that KSDK 1.3 has a NIO layer, which classic MQX 4.2 does not.

My guess is that this issue may be related to a 32-bit overflow somewhere, perhaps in the cluster calculation.

Regards

Daniel
