I am using KSDK 1.3.0 with MQX and MFS. We are using a 16 GB SD Card.
I am evaluating the reliability of the MFS file writes by using 1 task that writes to files. I was running into issues with timeouts originally (see here). I have a tick time of 1 ms for my system so I set the timeouts to the following until I hear back from what they should be set to in my other post:
#define ESDHC_CMD_TICK_TIMEOUT 40 // 40ms
#define ESDHC_CMD12_TICK_TIMEOUT 1000 // 1 Sec
#define ESDHC_TRANSFER_TIMEOUT_MS 750 // 750ms
I am still using a similar test procedure as before:
1. Create a new text file.
2. Write 1 KB of data into the file.
3. After 1024 Writes (1 MB), close the file.
4. Open the same file, seek to the end.
5. Perform steps 2 - 4 until I write 1,000 MB to the file.
6. After 1,000 MB has been written, close the current file and create a new one. Repeat steps 2-4 until 8 files have been created. (We are using a 16 GB card)
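For anyone who wants to reproduce the access pattern off-target, here is a minimal sketch of steps 1 to 5 for a single file, using standard C stdio instead of the MQX NIO calls. The function name `stress_one_file` and its parameters are hypothetical and only illustrate the write / close / reopen-in-append cycle; the real test runs against MFS on the SD card with 1024-byte blocks, 1024 writes per cycle and 1000 cycles per file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of MQX or MFS.
 * Writes `cycles` x `writes_per_cycle` blocks of `block_size` bytes to `path`,
 * closing and reopening the file (append mode) after every cycle.
 * Returns the total number of bytes written, or -1 on the first failure. */
int64_t stress_one_file(const char *path, size_t block_size,
                        unsigned writes_per_cycle, unsigned cycles)
{
    char block[1024];
    int64_t total = 0;
    FILE *fp;

    assert(path != NULL);
    if (block_size == 0 || block_size > sizeof block) {
        return -1;
    }
    memset(block, 'A', block_size);        /* same fill character as the test */

    fp = fopen(path, "wb");                /* step 1: create a new file */
    if (fp == NULL) {
        return -1;
    }
    for (unsigned c = 0; c < cycles; c++) {
        for (unsigned w = 0; w < writes_per_cycle; w++) {
            /* step 2: write one block */
            if (fwrite(block, 1, block_size, fp) != block_size) {
                fclose(fp);
                return -1;
            }
            total += (int64_t)block_size;
        }
        if (fclose(fp) != 0) {             /* step 3: close after one cycle */
            return -1;
        }
        fp = fopen(path, "ab");            /* step 4: reopen, writes go to the end */
        if (fp == NULL) {
            return -1;
        }
    }
    if (fclose(fp) != 0) {
        return -1;
    }
    return total;
}
```

Calling `stress_one_file("test0.txt", 1024, 1024, 1000)` would correspond to one 1,000 MB file; step 6 is then a loop over eight file names.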
The farthest I have gotten is 5 files written before a failure during the writing of the 6th. The errors I am receiving are the following:
MFS_SECTOR_NOT_FOUND
MFS_INVALID_CLUSTER_NUMBER
These appear to happen at random. Again, all I am doing in my program is writing to files.
The latest error happened in the function "MFS_get_cluster_from_fat". I set a breakpoint here so I am able to see the context of variables:
drive_ptr information and fat_entry:
So, fat_entry is a value of 0x0141_4141, but the LAST_CLUSTER is 0x001D_1C5B, so we are trying to write into a cluster that is greater than the max, which is why it fails.
However, I am not sure how fat_entry gets its value.
How did it become greater than the LAST_CLUSTER?
I noticed there are some places where "location" is a byte value, and other places where "location" is converted into a sector number. Could it be related to some 32-bit overflow issue?
Although I don't plan on creating such large files in my application, I am still concerned with the errors that are happening because I do not know the root cause.
The root cause of this issue is an overflow of an unsigned 32-bit number when trying to access the sector. It is confusing to follow through the functions because the LOCATION is sometimes expressed as a SECTOR and sometimes as a BYTE offset.
On my SD card, a sector is 512 bytes. In mfs_rw.c, a sector location is converted into a byte location by multiplying the sector number by the number of bytes in a sector. This is all done with 32-bit variables, so the result overflows once the sector number reaches 0x0080_0000: multiplied by 512, the byte location becomes 0x0001_0000_0000, which does not fit in 32 bits and is stored as 0x0000_0000. Writing to this location corrupts the file system.
By the way, a sector location of 0x0080_0000 with 512 bytes per sector corresponds to 4 GB. When I tested with a 4 GB SD card I never saw the issue. We will be using 16 GB going forward.
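The arithmetic can be checked in isolation with a short sketch. The two helper names below are hypothetical and are not MFS functions; unsigned types are used so that the 32-bit wraparound is well defined in C, whereas the MFS code shown further down stores the result in a signed int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sector-to-byte conversion done entirely in 32 bits.
 * The result wraps modulo 2^32. */
uint32_t byte_offset_32(uint32_t sector, unsigned sector_power)
{
    assert(sector_power < 32);
    return sector << sector_power;
}

/* Hypothetical helper: widen to 64 bits first, then shift. */
uint64_t byte_offset_64(uint32_t sector, unsigned sector_power)
{
    assert(sector_power < 32);
    return (uint64_t)sector << sector_power;
}
```

With 512-byte sectors (a shift of 9), `byte_offset_32(0x00800000u, 9)` evaluates to 0, while `byte_offset_64(0x00800000u, 9)` evaluates to 0x100000000, which is the 4 GB boundary.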
The following changes pass the test I created as described in the original post:
Original
/* sector size is obtained during open operation and stored in the drive structure */
sector_size = drive_ptr->SECTOR_SIZE;
/* get the number of sectors */
error_code = ioctl(drive_ptr->DEV_FILE_PTR, IO_IOCTL_GET_NUM_SECTORS, &num_sectors);
#if !MQX_USE_IO_OLD
if (error_code == -1 && errno == NIO_ENOTSUP)
{
num_sectors = lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END) / sector_size;
error_code = MFS_NO_ERROR;
}
Changes
if (error_code == -1 && errno == NIO_ENOTSUP)
{
//num_sectors = lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END) / sector_size;
/* lseek returns 32-bit value, so if > 4 GB we would have an issue */
num_sectors = _nio_lseek(drive_ptr->DEV_FILE_PTR, 0, SEEK_END, &error) / sector_size;
error_code = MFS_NO_ERROR;
}
Explanation
lseek returns a 32-bit value (off_t) with my compiler. lseek returns a byte offset, so the value overflows once the device is larger than 4 GB (or 2 GB when off_t is signed, which is what POSIX specifies). _nio_lseek returns a 64-bit number.
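To illustrate the effect on num_sectors, here is a small sketch that models a device size being squeezed through a 32-bit return type. Both function names are hypothetical, and the truncation is modelled with uint32_t so the behaviour is well defined; the exact size of a real 16 GB card will differ from the round 16 GiB figure used in the note below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sector count when the device size in bytes has passed
 * through a 32-bit type (models a 32-bit off_t return value). */
uint32_t sectors_via_32bit(uint64_t device_bytes, uint32_t sector_size)
{
    assert(sector_size != 0);
    uint32_t truncated = (uint32_t)device_bytes;   /* keeps only the low 32 bits */
    return truncated / sector_size;
}

/* Hypothetical helper: sector count when the size is kept in 64 bits. */
uint64_t sectors_via_64bit(uint64_t device_bytes, uint32_t sector_size)
{
    assert(sector_size != 0);
    return device_bytes / sector_size;
}
```

For a device of exactly 16 GiB with 512-byte sectors, the 32-bit path yields 0 sectors while the 64-bit path yields 33,554,432.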
Changes are made to the functions MFS_Write_device_sectors and MFS_Read_device_sectors
Original
uint32_t attempts;
int32_t num, expect_num, seek_loc, shifter;
char *data_ptr;
_mfs_error error;
...
MFS_LOG(printf("MFS_Write_device_sectors %d %d\n", sector_number, sector_count));
if (sector_number > drive_ptr->MEGA_SECTORS)
{
return MFS_SECTOR_NOT_FOUND;
}
if (drive_ptr->BLOCK_MODE)
{
shifter = 0;
seek_loc = sector_number;
expect_num = sector_count;
}
else
{
shifter = drive_ptr->SECTOR_POWER;
seek_loc = sector_number << shifter;
expect_num = sector_count << shifter;
}
#if MQX_USE_IO_OLD
fseek(drive_ptr->DEV_FILE_PTR, seek_loc, IO_SEEK_SET);
#else
lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET);
//TODO: check errno lseek
#endif
Changes
uint32_t attempts;
int64_t seek_loc; /* Added for LSEEK issue */
int32_t expect_num; /* Can stay 32-bit because it is a transfer length, not a device offset */
int32_t num, shifter;
int nio_error; /* Added for LSEEK issue */
char *data_ptr;
_mfs_error error;
...
// MFS_LOG(printf("MFS_Write_device_sectors %d %d\n", sector_number, sector_count));
if (sector_number > drive_ptr->MEGA_SECTORS)
{
return MFS_SECTOR_NOT_FOUND;
}
if (drive_ptr->BLOCK_MODE)
{
shifter = 0;
seek_loc = sector_number;
expect_num = sector_count;
}
else
{
shifter = drive_ptr->SECTOR_POWER;
seek_loc = (int64_t)sector_number << shifter; /* Need to typecast to avoid overflow */
expect_num = sector_count << shifter;
}
#if MQX_USE_IO_OLD
fseek(drive_ptr->DEV_FILE_PTR, seek_loc, IO_SEEK_SET);
#else
// lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET);
_nio_lseek(drive_ptr->DEV_FILE_PTR, seek_loc, SEEK_SET, &nio_error); /* off_t issue */
//TODO: check errno lseek
#endif
Explanation
Commented out MFS_LOG since it is not needed. Changed seek_loc to int64_t since it will be converted into a BYTE location from a SECTOR. Type-cast sector_number with an (int64_t) since it will be multiplied by 512 and will overflow if it stays at 32-bit. Changed call from lseek to _nio_lseek due to off_t issue.
These changes need to be applied to the _io_part_mgr_write and _io_part_mgr_read functions. (Note: when editing _io_part_mgr_read, make sure the data transfer call is read, not write.)
Original
uint64_t location;
uint64_t part_start;
uint64_t part_end;
int32_t result;
...
/* Perform seek and data transfer */
result = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET);
if (result >= 0)
{
result = write(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num);
}
Changes
int64_t location; /* _nio_lseek returns a signed 64-bit number. */
int64_t part_start;
int64_t part_end;
int32_t result;
...
/* Perform seek and data transfer */
// result = lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
/* Return could be a byte address so put into location which is an int64_t else
* we will have an overflow issue.
*/
location = _nio_lseek(pm_struct_ptr->DEV_FILE_PTR, location, SEEK_SET, error);
if (location >= 0)
{
result = write(pm_struct_ptr->DEV_FILE_PTR, data_ptr, num, error);
}
// JS: Return -1 if the seek fails
else
{
if(error)
{
*error = MFS_ERROR_SEEK;
}
result = -1;
}
Explanation
_nio_lseek returns a signed 64-bit number because a value less than 0 indicates an error. I changed location, part_start, and part_end to signed as well. My application will only address up to 16 GB, so the signed / unsigned difference will not affect it.
off_t is defined as 32 bits by my compiler (GCC). It was mentioned in another post that the comp.h file for an IAR project explicitly defines off_t as a signed 64-bit type. I tried redefining off_t in comp.h for the GCC project, but I ran into many issues and did not pursue them, fearing I could unintentionally break something else. My solution was to remove all calls to lseek and replace them with _nio_lseek.
comp.h can be found at \KSDK_1.3.0\rtos\mqx\mqx\source\psp\cortex_m\compiler\iar
Update: I am still trying to find the root cause of this issue. Interestingly, I am writing one character repeatedly to the file, 'A', which happens to be a value of 0x41. This is the value seen for the cluster number above.
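The observed value fits the idea that file data ended up where the FAT was expected. This sketch assumes the card is formatted as FAT32 (not stated in the thread) and that the upper 4 bits of each entry are masked off when it is read, which is how FAT32 entries are defined. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAT32_ENTRY_MASK 0x0FFFFFFFu   /* FAT32 uses the low 28 bits of an entry */

/* Hypothetical helper: interpret 4 bytes of a sector buffer as a FAT32 entry.
 * FAT structures are stored little-endian on disk. */
uint32_t fat32_entry_from_bytes(const uint8_t *p)
{
    assert(p != NULL);
    uint32_t raw = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16)
                 | ((uint32_t)p[3] << 24);
    return raw & FAT32_ENTRY_MASK;
}
```

A sector filled with 'A' (0x41) read this way gives 0x41414141 masked to 0x01414141, which matches the fat_entry seen at the breakpoint and is larger than the LAST_CLUSTER of 0x001D1C5B.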
As this error occurred again today, I noticed the error value was 0x3002 (MFS_FILE_NOT_FOUND) before the call to write.
How did this happen? I have a check to make sure the file descriptor is not < 0:
chk_error = _nio_close(index_fd, &nio_error);
if(chk_error < 0)
{
testfunction();
}
index_fd = _nio_open(myfilenames, O_RDWR | O_APPEND, &nio_error);
// Check that we can access the file descriptor
if(index_fd < 0)
{
testfunction();
}
I added logic to not only check if the file descriptor is OK, but also to check the nio_error value:
index_fd = _nio_open(myfilenames, O_RDWR | O_APPEND, &nio_error);
// Check that we can access the file descriptor
if(index_fd < 0 || nio_error != 0)
{
testfunction();
// _task_block();
}
And it turns out that the _nio_open function can return a valid file descriptor even when an error has occurred in the _nio_open function.
Has anyone else seen a case like this?
Hi jschepler
I would suggest you try this test with MQX 4.2 first, because in KSDK 1.3 the MQX, MFS, and SDHC modules are ported from MQX 4.2. The only difference is that KSDK 1.3 has an NIO layer, which classic MQX 4.2 does not.
I guess this issue may be related to a 32-bit overflow somewhere, perhaps in the cluster handling?
Regards
Daniel