Lightweight semaphore corruption in USB stack

deadwoods
Contributor II

Good morning,

I'm attempting to integrate the MQX USB host stack (specifically the mass storage device class) onto our bespoke hardware -- currently using a Kinetis K61 part. The aim is to be able to read and write the filesystem on a USB stick. We're using MQX 3.8.0 with CodeWarrior 10.2, and the USB peripheral is the full-speed one, so I'm using the KHCI low-level driver.

We're not using MFS as the filesystem; we're using a third-party solution, but I'm attempting to use the USBMFS layer in the USB host stack, which (as far as I can tell) isn't dependent on the filesystem implementation, but provides a device to access a reasonably generic USB stick.

This all works well, up to a point. I can detect insertion and removal of the USB stick and I can read arbitrary sectors. However, after a few transactions, the USB stack falls over. The symptom is that the wait on the COMMAND_DONE semaphore in the USBMFS layer returns without the semaphore having been posted. I've confirmed that this isn't simply a timeout by waiting indefinitely. On further investigation, the semaphore appears to be getting corrupted: a call to _lwsem_test() flags the COMMAND_DONE semaphore as the invalid one. Interestingly, the semaphore doesn't appear to be corrupt before the first failed wait, but it is afterwards.

The semaphore appears to become corrupt consistently, given the same set of USB transactions. I was wondering if anyone had come across something similar, or if anyone knew of scenarios that can cause lightweight semaphores to become corrupt. It's rather difficult to debug without understanding why this can occur!

Thanks in advance for any help!

c0170
Senior Contributor III

Hello deadwoods,

We have not encountered this issue ourselves.

However, we recommend placing a hardware data breakpoint (watchpoint) on the semaphore structure. When it triggers, you will see exactly which code writes to and overwrites the semaphore structure.

Regards,

MartinK

deadwoods
Contributor II

Thanks for the response. I got to the bottom of the problem -- in brief, we've implemented a rudimentary periodic scheduler. A periodic task is basically a while() loop that calls _task_block() at the end of the loop. A periodic timer fires when the task is next scheduled to run, and if the task is blocked, it calls _task_ready() to restart it. To check if the task is blocked (and thus restartable), the STATE element of the task descriptor is checked for the IS_BLOCKED bit.

I was accessing the USB stick from within one of these periodic tasks. While the USB stack was waiting for the COMMAND_DONE semaphore, the periodic task was placed in the semaphore's wait queue, which set the IS_BLOCKED bit in its state. When the periodic timer fired, it saw that bit and called _task_ready(), which ended the semaphore wait prematurely.

The quick fix was to check for more than just the IS_BLOCKED bit, so that tasks sitting in a wait queue (such as a semaphore's) are not readied by the timer.
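A minimal sketch contrasting the original check with a stricter one. The state names BLOCKED and LWSEM_BLOCKED and their numeric values are hypothetical placeholders, not MQX's real definitions; the only property taken from the post is that every blocked state, including a semaphore wait, carries the IS_BLOCKED bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state encodings (placeholders, not MQX's real values).
 * As in the post, every blocked state shares the IS_BLOCKED bit. */
#define IS_BLOCKED    0x0001u
#define READY         0x0002u
#define BLOCKED       (0x0004u | IS_BLOCKED)  /* set by _task_block() */
#define LWSEM_BLOCKED (0x0008u | IS_BLOCKED)  /* waiting on a lightweight semaphore */

/* Original check: true for ANY blocked state, including a semaphore wait,
 * so the timer would ready a task in the middle of its semaphore wait. */
static bool restartable_bit_only(uint32_t state) {
    return (state & IS_BLOCKED) != 0;
}

/* Stricter check: only restart a task that blocked itself via
 * _task_block(), i.e. whose state is exactly BLOCKED. */
static bool restartable_exact(uint32_t state) {
    return state == BLOCKED;
}
```

Comparing against the exact state that _task_block() sets is one way to "check for more than just the IS_BLOCKED bit". If the real STATE field carries other flag bits, they would need to be masked off before the comparison.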

I've learned a lot about MQX in the last couple of days!