ESDHC Timeout Error


1,612 Views
jschepler
Contributor III

I am using MQX for KSDK 1.3.0 on a K64F processor, with the MFS file system writing to an SD card. I noticed a post mentioning a 32-bit / 64-bit issue and designed a simple test to evaluate the changes. (See here) However, during this testing I noticed that writes to the SD card sometimes fail because of a timeout. I traced the timeout into esdhc.c and found that it is the NIO_ETIMEDOUT error. It seems to occur randomly.

My test code for the issue is the following:

1. Create a new text file.

2. Write 1 KB of data into the file.

3. After 1024 writes (1 MB), close the file.

4. Open the same file, seek to the end.

5. Perform steps 2 - 4 until I write 1 GB to the file.

6. After 1 GB has been written, close the current file and create a new one. Repeat steps 2-4 until 8 files have been created. (We are using a 16 GB card)
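The steps above can be sketched roughly as follows. This is not the original test code, which is not shown in the post. It uses the standard C file API as a stand-in for the MQX/MFS calls, and the function name and parameters are hypothetical. Each cycle writes a batch of chunks, closes the file, reopens it, and seeks to the end.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of steps 1-5 for one file (hypothetical helper, standard C I/O
 * used in place of MFS). Returns 0 on success, -1 on any I/O failure. */
int write_test_file(const char *path, size_t chunk_bytes,
                    unsigned chunks_per_open, unsigned open_cycles)
{
    char chunk[1024];
    FILE *f;
    unsigned cycle, i;

    if (chunk_bytes == 0 || chunk_bytes > sizeof chunk) {
        return -1;
    }
    memset(chunk, 'A', chunk_bytes);

    f = fopen(path, "wb");                      /* step 1: create the file */
    if (f == NULL) {
        return -1;
    }

    for (cycle = 0; cycle < open_cycles; ++cycle) {
        for (i = 0; i < chunks_per_open; ++i) { /* step 2: write one chunk */
            if (fwrite(chunk, 1, chunk_bytes, f) != chunk_bytes) {
                fclose(f);
                return -1;
            }
        }
        if (fclose(f) != 0) {                   /* step 3: close after a batch */
            return -1;
        }
        f = fopen(path, "r+b");                 /* step 4: reopen ... */
        if (f == NULL) {
            return -1;
        }
        if (fseek(f, 0L, SEEK_END) != 0) {      /* ... and seek to the end */
            fclose(f);
            return -1;
        }
    }
    return (fclose(f) == 0) ? 0 : -1;
}
```

With chunk_bytes = 1024, chunks_per_open = 1024 and open_cycles = 1024, one call would write 1 GB to a single file. Step 6 would then call it once per file name, eight times in total.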

The error occurs when I write to the file.  The section of code it fails on is the following in esdhc.c (~line 1795):

/* Wait for transfer to complete. Timeout depends on number of blocks. */
 if (_lwevent_wait_ticks(&esdhc_device_ptr->LWEVENT, (ESDHC_LWEVENT_TRANSFER_DONE | ESDHC_LWEVENT_TRANSFER_ERROR), FALSE, /*(esdhc_device_ptr->BUFFERED_CMD.BLOCKS > 1)? */ ESDHC_CMD12_TICK_TIMEOUT /*: ESDHC_CMD_TICK_TIMEOUT*/) != MQX_OK)
 {
   if (error)
   {
     *error = NIO_ETIMEDOUT;
   }
   return -1;
 }

The event timeout value used is ESDHC_CMD12_TICK_TIMEOUT. I found the following definitions in esdhc_prv.h:

#define ESDHC_CMD_TICK_TIMEOUT 20 // 40ms?
#define ESDHC_CMD12_TICK_TIMEOUT 200 //500ms
#define ESDHC_TRANSFER_TIMEOUT_MS 750 //750ms

I am confused by the comments on these defines. A tick timeout of 200 would only be 500 ms if the tick period were 2.5 ms. I think the default tick period for the K64F processor is 5 ms, so 200 * 5 ms = 1 second timeout. I have redefined the tick period for my application to be 1 ms, so the timeout value is actually 200 ms.
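To make the arithmetic explicit, here is a minimal sketch that converts a tick count to milliseconds for a given tick rate. The function name is hypothetical, and tick_hz is assumed to be the number of kernel ticks per second (the quantity the BSP calls BSP_ALARM_FREQ).

```c
#include <stdint.h>

/* Convert a timeout expressed in ticks to milliseconds.
 * tick_hz: kernel ticks per second (assumed to match BSP_ALARM_FREQ). */
uint32_t ticks_to_ms(uint32_t ticks, uint32_t tick_hz)
{
    /* 64-bit intermediate avoids overflow for large tick counts. */
    return (uint32_t)(((uint64_t)ticks * 1000u) / tick_hz);
}
```

With a 5 ms tick (200 Hz), 200 ticks come to 1000 ms and 20 ticks to 100 ms. With a 1 ms tick (1000 Hz), 200 ticks come to 200 ms. The commented values of 40 ms and 500 ms would require tick periods of 2 ms and 2.5 ms respectively, so the two comments do not even agree with each other.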

The data is set up to transfer via ADMA a couple of lines before we wait for it to complete. I added some code in this area to report how long each transfer to the SD card takes. I found that roughly every 400 writes, a write would take more than 10 ms to complete. All other transfers took less than 10 ms.
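The instrumentation itself is not shown in the post. One possible shape for it, as a sketch only, is a small statistics record that is updated after every transfer. The type, the function name, and the idea of passing in an already measured elapsed time are all assumptions. How the elapsed time is obtained (for example from a hardware timer or the kernel tick count) is left open.

```c
#include <stdint.h>

/* Running statistics for transfer durations (hypothetical helper type). */
typedef struct {
    uint32_t count;          /* number of transfers recorded */
    uint32_t over_threshold; /* transfers slower than the threshold */
    uint32_t max_ms;         /* slowest transfer seen so far */
} xfer_stats_t;

/* Record one transfer that took elapsed_ms milliseconds. */
void xfer_stats_record(xfer_stats_t *s, uint32_t elapsed_ms,
                       uint32_t threshold_ms)
{
    s->count++;
    if (elapsed_ms > threshold_ms) {
        s->over_threshold++;
    }
    if (elapsed_ms > s->max_ms) {
        s->max_ms = elapsed_ms;
    }
}
```

Calling this with a threshold of 10 ms after each write would give the count of slow transfers and the worst case observed before a timeout occurs.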

However, occasionally it takes longer than 200 ms to complete the transfer, and the write fails.  I have seen this happen when writing the 87th MB to the file all the way up to the 500th MB.  I have yet to be able to complete all writes to the first file without this timeout occurring.

  

How can there be so much variability in the time it takes to transfer the data?

How was the value of the CMD12 tick timeout assigned?

What should the timeout value, in milliseconds, be for the CMD12 and CMD timeouts?

3 Replies

1,164 Views
jschepler
Contributor III

Our hardware uses an SD card that is inserted into the board at production time and will never be taken out. (It is inside an enclosure and not visible to the customer.)

Each board uses the same manufacturer and part number for the SD card.  Knowing this, I looked up the data sheet for the part number and did not see any information related to timeouts.  I then contacted the manufacturer and asked them to provide me with the specification document.  I explained that I was writing embedded software which must interface with the card and I needed to know timeout values.  I am now waiting for them to send me the document.

I searched online for example SDHC spec documents and found one.  In the document, they have a table that defines suggested maximum timeout values for waiting for a command response, reading data after issuing a command, and a busy status change.  I expect the document provided by the manufacturer for the SD cards we use to provide the same information.

I will also change the #defines for the timeouts to depend on BSP_ALARM_FREQ. I do not understand how the timeouts were allowed to be defined without depending on this value.


The SD manufacturer should provide this timeout in seconds or milliseconds, so I will create two additional #defines:

#define ESDHC_CMD_DATASHT_TIMEOUT_MS     40            // Desired timeout in ms
#define ESDHC_CMD12_DATASHT_TIMEOUT_MS   1000          // Desired timeout in ms

I will then calculate the number of milliseconds per tick based on BSP_ALARM_FREQ, which is defined in bsp_config.h:

#define ESDHC_MS_PER_TICK   ((1000) / BSP_ALARM_FREQ)

I will then change the TICK timeout defines to be dependent upon the tick time:

#define ESDHC_CMD_TICK_TIMEOUT    ((ESDHC_CMD_DATASHT_TIMEOUT_MS) / ESDHC_MS_PER_TICK)
#define ESDHC_CMD12_TICK_TIMEOUT  ((ESDHC_CMD12_DATASHT_TIMEOUT_MS) / ESDHC_MS_PER_TICK)
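One caveat with dividing by a milliseconds-per-tick value: 1000 / BSP_ALARM_FREQ is an integer division, so it truncates when the frequency does not divide 1000 evenly, and it becomes 0 (leading to a division by zero) for frequencies above 1000 Hz. A sketch that avoids this multiplies first and rounds up. The function name is hypothetical. The result is clamped to at least one tick because, per the lwevent.c comment, a timeout of 0 ticks means waiting forever.

```c
#include <stdint.h>

/* Convert a timeout in milliseconds to kernel ticks, rounding up so the
 * wait is never shorter than requested.
 * tick_hz: kernel ticks per second (assumed to match BSP_ALARM_FREQ). */
uint32_t esdhc_ms_to_ticks(uint32_t timeout_ms, uint32_t tick_hz)
{
    uint64_t ticks = ((uint64_t)timeout_ms * tick_hz + 999u) / 1000u;

    /* A result of 0 would be treated as an infinite wait, so return
     * at least one tick. */
    return (ticks == 0u) ? 1u : (uint32_t)ticks;
}
```

For example, 40 ms at 200 Hz gives 8 ticks, and 1000 ms gives 200 ticks at 200 Hz or 1000 ticks at 1000 Hz.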

1,164 Views
danielchen
NXP TechSupport

Hi  jschepler:

 _lwevent_wait_ticks is a blocking function that puts the calling task on the timeout queue. When the timeout expires for the task (measured by the BSP tick timer, a periodic interrupt with a 5 ms period), the task is put back on the ready queue. The RTOS scheduler then runs and selects the next task to run according to its scheduling rules. Which task that is depends on the other ready tasks at that moment, their priorities, and which interrupts are running.

 _lwevent_wait_ticks is specified to guarantee a minimum time, not a maximum. The maximum depends on the application. That is normal RTOS behaviour.

Regarding how to assign the right tick timeout value, I would suggest you check the SD card spec. You can modify the macros according to your requirements.


Have a great day,
Daniel



1,164 Views
jschepler
Contributor III

Hi Daniel,

From reading the MQX reference manual, the user's guide, and the comments within the code, the lwevent wait returns when the event bits are set OR when the timeout, given in ticks, expires. If a timeout of 0 ticks is given, the task will wait indefinitely for the lwevent bits to be set. So the code is waiting for the event bits to be set, OR for the timeout in ticks specified by ESDHC_CMD12_TICK_TIMEOUT. This time is the MAXIMUM number of ticks to wait for the event bits to be set.

If a timeout does occur, _lwevent_wait...() returns LWEVENT_WAIT_TIMEOUT.  

If the specified lw_event bits were set before the timeout, the _lwevent_wait...() function will return MQX_OK.

From lwevent.c:

/*!
 * \brief Used by a task to wait for the event for the number of ticks.
 *
 * \param[in] event_ptr Pointer to the lightweight event.
 * \param[in] bit_mask Bit mask. Each set bit represents an event bit to wait for.
 * \param[in] all TRUE (wait for all bits in bit_mask to be set),
 * FALSE (wait for any bit in bit_mask to be set).
 * \param[in] timeout_in_ticks The maximum number of ticks to wait for the events
 * to be set. If the value is 0, then the timeout will be infinite.
 *
 * \return MQX_OK
 * \return LWEVENT_WAIT_TIMEOUT (The time elapsed before an event signalled.)
 * \return MQX_LWEVENT_INVALID (Lightweight event is no longer valid or was never
 * valid.)
 * \return MQX_CANNOT_CALL_FUNCTION_FROM_ISR (Function cannot be called from an ISR.)
 *
 * \see _lwevent_create
 * \see _lwevent_destroy
 * \see _lwevent_set
 * \see _lwevent_set_auto_clear
 * \see _lwevent_clear
 * \see _lwevent_wait_for
 * \see _lwevent_wait_until
 * \see _lwevent_get_signalled
 * \see LWEVENT_STRUCT
 */

I would like to know how the timeout values were chosen when the function was developed.  What SD card spec was used to determine these timeouts?  I am also failing to understand the commented values of the original timeouts, since the tick time default of 5 ms would not provide the timeout specified by the comment. 

Can you show me an example of an SD specification that would give me enough information to choose these timeout values?
