USB Attach Failure i.MX28 Windows CE 6.0 Memory Stick

markwilliams
Senior Contributor I

Hi all,

We have been battling issues with USB memory stick support for some time with the i.MX28 Windows CE BSP. We have implemented a number of 'fixes' ranging from Microsoft updates, Errata Work-arounds and general Windows CE fixes from other BSPs or user experiences found online and in these forums.

We have an issue where, after several insertions of a memory stick, the USB port appears to lock up, and on the next device attachment the 'AttachDevice' process fails at DEVICE_CONFIG_STATUS_SCHEDULING_GET_DEVICE_DESCRIPTOR_TEST. This is the first point in the attach process at which an attempt is made to communicate with the device; it uses endpoint 0 on a control pipe created during earlier stages of the attach process.

Once the port has failed it no longer responds to any other USB device and fails at this same point every time. The pipe has the halt flag set and is non-functional. The only way to recover seems to be to power cycle the unit.

We have found forum posts that recommend increasing the delay between ResetAndEnablePort (the previous step) and sending any data. By default in the BSP this is 10ms, but some devices may still be coming out of reset and not be ready to communicate at this point. We tried increasing it to 100ms and the results initially seemed promising, but we could eventually get it to fail again after repeated insertions.

We have used a USB logger to see what happens on the bus through several insertions to failure. For each successful attach and detach we see no bus errors and can decode the SCSI packets. The last successful transfer on removal was the last SCSI Test Unit Ready packet which is sent to poll for disk removal.

When we insert the disk and there is a failure we just see babbled data out of the USB port. There isn't a single valid packet but random data bytes being received by the analyser following device reset. The analyser cannot tell if the memory stick is babbling or the i.MX28 processor as it is just receiving random bytes constantly.

I have attached the last successful transfer below (note that NAKs and SOF are hidden). The memory stick was unplugged in the middle of this screenshot which is why the analyser detected suspend (no bus activity). The Reset following the suspend is when the stick was re-inserted. You can see that following this we just have babbled data. The bytes are random values.

USB1.png

On a successful remove / attach we see that after the suspend there are two resets and we have the initial 'get descriptor test' followed by the address set and the rest of the successful enumeration.

USB3.png

I have been adding debug messages at various points to try and figure out what could be going on. I am not sure why the USB port has the random data on it. Perhaps there is a corrupt qHead/qTD that is just running through memory sending out random data bytes but the data is not in any of the proper transfer types (SETUP/IN/OUT).

The transfer queue should be torn down on removal and cleared out and then recreated on the first transfer on attach.

Has anyone seen anything similar or could offer some debugging advice?

Thank you in advance, Mark

1 Solution
Bio_TICFSL
NXP TechSupport

The issue can be closed

karina_valencia
NXP Apps Support

Deactivated user, do you have an update on this case?

Radist
Contributor II

Hi Mark,

I have some experience with the iMX28 and USB, and I also had issues with USB sticks on the iMX28.

Please answer the following questions:

1) How many different kinds of sticks have you tried to connect to the board? By the USB standard, a USB memory stick should have a crystal with frequency stability better than 50ppm. Some time ago we found that very cheap USB sticks without a crystal (only an RC oscillator) work very unreliably.

2) Is it a custom board with the iMX28? Do you have a screenshot of the USB traces from the MCU to the USB port? USB should be routed according to the standard, otherwise you will see strange issues that you will never fix in software.

3) How many PCBs have you made, and how many have this issue? On some of our devices that are now in mass production, we found that the BGA and crystal were not mounted well, and there was additional capacitance between the iMX28 package and the PCB. As a result, the frequency deviation from 24MHz was more than 150ppm. I recommend you check that the crystal frequency is inside the range 24MHz +/-50ppm (you can do that with a spectrum analyzer or a pulse counter; an oscilloscope usually will not have the needed accuracy).

4) Can you provide the schematic for the USB host port?
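
For reference, the ppm figures in question 3 translate into absolute frequency as follows: +/-50ppm of 24MHz is +/-1200Hz, and 150ppm is 3600Hz. A minimal C sketch of that arithmetic is below; the helper names `ppm_error` and `within_ppm` are made up for illustration and do not come from any BSP.

```c
#include <stdbool.h>

/* Deviation of a measured frequency from nominal, in parts per million. */
static double ppm_error(double measured_hz, double nominal_hz)
{
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}

/* True if the measured frequency lies within +/- limit_ppm of nominal. */
static bool within_ppm(double measured_hz, double nominal_hz, double limit_ppm)
{
    double e = ppm_error(measured_hz, nominal_hz);
    if (e < 0.0)
        e = -e;
    return e <= limit_ppm;
}
```

A frequency counter reading of 24,003,600Hz, for example, corresponds to the 150ppm case described above and would fail a 50ppm check.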

--

Regards,

Sergii

markwilliams
Senior Contributor I

Sergii,

Thanks for your reply and the pointers above. It's worth mentioning that all of these tests are carried out with the port limited to full speed.

A long time ago, when I first started working with the iMX28 (2012), I had issues with high-speed USB on memory sticks despite signal integrity measurements showing great signal quality. I have since applied numerous Microsoft driver fixes around the SCSI transfers, but the one that made high speed reliable (at the time) was the iMX28 erratum that recommends the ARM swap instruction for register writes. That said, as I was unable to perform enough testing, I carried on with the port limited to full speed.

Also, the issue only occurs during attach, and only with a few memory sticks. The first transfer fails and the controller is babbling data out of the USB port, so I am inclined to think that something in the way the transfer descriptors are set up is causing corruption and making the controller output random data. Previous insertions and data transfers do not have a single error. But of course I am open to any suggestions.

With regards to the trace routing this has been carefully done with controlled impedance lines, length matched from processor to connector in less than 50mm and in accordance with a few different USB guideline documents. There is a USB ESD diode (low enough capacitance for USB high speed) but we already tried removing that in very early tests just in case. Anyway at full-speed limited USB the signal integrity issues are really unlikely to play a part.

Even with the port set to high speed I have performed USB signal integrity tests with a low-capacitance differential probe using the USB test patterns. The eye is nice, open and clean.

I have also run multiple USB attach and transfer tests with a protocol analyser without any error conditions across the bus on successful attach. So this seems to be something relating to the driver being left in a bad state from the last detach, or not being configured correctly on the next attach. There are a couple of chip errata around this.

We have shipped over 5000 boards using this processor and memory layout. There are two boards with different USB routings - one with a 3-port hub and the other with a single direct host port.

I have bought 20 different memory sticks ranging from no-brand vendors up to expensive branded parts with various capacities. Some memory sticks can produce this quicker than others.

Another thought is that the 10ms following port ResetAndEnable might not be enough and the subsequent first endpoint 0 transfer may fail as the device is not ready. If there is not a mechanism to handle this first transfer failing then it could be an issue. There certainly was no mechanism to retry from a control pipe failure until I found some additional code that was added to the iMX27 BSP and ported it across to the iMX28. All this did was allow three attempts at the control transfer but still continuously failed from that point.
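
A retry mechanism of the kind described above can be sketched as follows. This is only an illustration under stated assumptions: the `attach_ops` hooks, `attach_with_retry` and the `fake_*` simulated device are hypothetical names, not functions from the Windows CE USB host driver or from the iMX27/iMX28 BSPs. Each attempt resets the port again and doubles the settle delay before retrying the first endpoint 0 transfer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks; a real driver would plug in its own port-reset,
 * delay and control-transfer routines here. */
typedef struct {
    bool (*reset_and_enable_port)(void *ctx);
    void (*delay_ms)(void *ctx, unsigned ms);
    bool (*get_device_descriptor)(void *ctx);   /* first EP0 transfer */
    void *ctx;
} attach_ops;

/* Each attempt starts with a port reset followed by a settle delay that
 * doubles after every failure. Returns the 1-based number of the attempt
 * that succeeded, or 0 if the reset failed or every attempt failed. */
static unsigned attach_with_retry(const attach_ops *ops,
                                  unsigned max_attempts,
                                  unsigned first_delay_ms)
{
    unsigned delay = first_delay_ms;
    for (unsigned attempt = 1; attempt <= max_attempts; ++attempt) {
        if (!ops->reset_and_enable_port(ops->ctx))
            return 0;                   /* port disabled or device gone */
        ops->delay_ms(ops->ctx, delay);
        if (ops->get_device_descriptor(ops->ctx))
            return attempt;
        delay *= 2;
    }
    return 0;
}

/* Simulated device for illustration only: it answers the descriptor
 * request once the settle delay reaches ready_after_ms. */
typedef struct {
    unsigned ready_after_ms;
    unsigned last_delay_ms;
    unsigned resets;
} fake_dev;

static bool fake_reset(void *ctx)
{
    ((fake_dev *)ctx)->resets++;
    return true;
}

static void fake_delay(void *ctx, unsigned ms)
{
    ((fake_dev *)ctx)->last_delay_ms = ms;
}

static bool fake_get_desc(void *ctx)
{
    const fake_dev *d = ctx;
    return d->last_delay_ms >= d->ready_after_ms;
}
```

A sketch like this only helps when the device is slow to leave reset. It would not recover a controller that is already babbling, which matches the observation that three retries still failed.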

Thank you for the oscillator pointers. I will have to check this spec again (I chose the crystal four years ago) but I recall it being a low PPM part with high speed USB reliability in mind.

Mark

Radist
Contributor II

Hi Mark,

I can see you have researched this in depth.

I am not an expert in USB from the software point of view, so I can't say anything about the ResetAndEnable issue.

Regarding the crystal: we also had a 20ppm crystal, but due to bad mounting and some other issues, the frequency at the crystal was 24MHz + 150ppm.

Try heating your MCU with a hot-air soldering station (to get +70 to +80C at the MCU) and check whether you still have this issue, then repeat the test with freeze spray. It would be interesting to see the results.

Also, if you only have the issue after insertion, what is happening with VBUS power at that moment? Please check that you have a stable +5V without any drops or peaks just after insertion. Maybe you have a big capacitor at the USB port and it needs more than 10ms to charge? In that case you would not have a stable +5V just after plug-in.

As I understand it, you have a USB port power switch that you control from the MCU? Can you try applying +5V directly to the USB host port, without any switches?

Sergii

markwilliams
Senior Contributor I

Hi Sergii,

Thanks for the advice on the crystal.

With regards to power dips we have analysed this. We have a single USB current limited switch (500mA) on the port and the recommended 150uF capacitance to minimise droops on the host PSU when a device is plugged in and needs to charge its internal capacitance.

The 5V supply holds up well. There is a brief glitch (nanoseconds) on the 5V rail at the connector as the device is connected, but not on the system 5V rail; this is likely due to the power supply ferrite beads and the supply's inability to respond to the instantaneous current required when plugging in a new device. The 5V rail as the device sees it just comes up quickly on attach, as expected on a hot plug. There is no after-effect, for example a trigger of the power switch over-current circuitry.

I am not sure why that would cause the controller to just output rubbish data. Once in this state any USB device plugged in is detected but fails when the AttachDevice function tries to schedule its first transfer. If I log any future attachment once in this state then after the port reset there is just random data on the bus - not even SOF packets. At least nothing my protocol analyser can hook onto and decode.

Mark

markwilliams
Senior Contributor I

I am looking to see if I can output the QHead and qTD list for the asynchronous schedule when the port is in the stuck state. The idea is to see whether there are invalid pointers in the transfer schedule. At the moment I am not 100% sure where to do this; I need to fully understand the Microsoft driver structures to know what to output.
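
One possible shape for such a check is sketched below. It assumes the generic EHCI queue head layout (horizontal link pointer in the first dword, overlay token in the seventh, Halted status in bit 6) and models the queue heads as a contiguous pool at a known physical address. The names `qh_t`, `check_async_ring` and `qh_is_halted` are hypothetical; the Microsoft driver keeps its own structures, which would need to be mapped onto something like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_T            0x1u          /* terminate bit */
#define LINK_TYPE_QH      0x1u          /* type field value for a queue head */
#define LINK_ADDR_MASK    0xFFFFFFE0u   /* 32-byte aligned address */
#define QTD_TOKEN_HALTED  (1u << 6)     /* Halted status bit in the token */

/* Queue head as laid out by EHCI, padded to 64 bytes for this sketch. */
typedef struct {
    uint32_t horiz_link;    /* next QH: address | type << 1 | T */
    uint32_t ep_char;
    uint32_t ep_caps;
    uint32_t cur_qtd;
    uint32_t next_qtd;      /* overlay area starts here */
    uint32_t alt_next_qtd;
    uint32_t token;
    uint32_t buf[5];
    uint32_t pad[4];
} qh_t;

static bool qh_is_halted(const qh_t *qh)
{
    return (qh->token & QTD_TOKEN_HALTED) != 0;
}

/* Walk the asynchronous ring starting at head_phys. Returns the number of
 * queue heads in the ring if every link stays inside the pool, is a QH-type
 * link without the T bit, and the ring closes back on the head. Returns -1
 * on the first link that looks corrupt or if the ring never closes. */
static int check_async_ring(const qh_t *pool, size_t count,
                            uint32_t pool_phys, uint32_t head_phys)
{
    uint32_t cur = head_phys;
    for (size_t visited = 1; visited <= count; ++visited) {
        uint32_t off = cur - pool_phys;
        if (cur < pool_phys || off % sizeof(qh_t) != 0 ||
            off / sizeof(qh_t) >= count)
            return -1;                  /* pointer outside the pool */
        uint32_t link = pool[off / sizeof(qh_t)].horiz_link;
        if ((link & LINK_T) || ((link >> 1) & 0x3u) != LINK_TYPE_QH)
            return -1;                  /* async ring must link QH to QH */
        cur = link & LINK_ADDR_MASK;
        if (cur == head_phys)
            return (int)visited;
    }
    return -1;                          /* ring did not close */
}
```

Calling a check like this from a debug message at the point where the attach fails would show whether the ring is still intact and whether the control pipe's queue head has the Halted bit set. The same walk could be extended to follow `next_qtd` for each queue head.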

I am not able to run in a debugger connected to VS, so I am using release builds and debug messages. The debug builds are also too bloated and slow, and the timings can affect what we are trying to test.

Mark

markwilliams
Senior Contributor I

Some more examples of the random data on the port on failure. Note if we disconnect and reconnect this still persists.

uab4.png

Yuri
NXP Employee

Hello,

Please see my comments below.

1. As a first step, please test the DRAM. You may use the memory test provided in the Community, "i.MX28 DDR stress test":
< https://community.freescale.com/message/375263#375263 >

2. Below is a possible new i.MX28 erratum. ERR006308: USB: HOST controller lock-up issue.

Description:
The USB host controller can lock up when a FIFO underrun occurs on a non-32-bit aligned data buffer. This applies to both the Host controller and the OTG controller in host mode.

Workaround:
2.1. Set the Stream Disable bit (SDIS) in the USBMODE register. This will force the controller to load an entire packet into the FIFO before starting to transmit on the USB bus. Hence, the FIFO will never underrun. This will somewhat reduce the maximum bandwidth of the USB, since there will be idle time as the controller waits for the entire packet to be loaded.
2.2. Instead of setting SDIS, the FIFO threshold can be increased so that more data will be in the FIFO before a packet transmit is started. This increases the tolerance to bus latency and avoids FIFO underrun. The threshold can be increased by using higher values for the TXTHRESHOLD field in the TXFILLTUNING register. The default value is 2 bursts (64 bytes if burst size = 8).

3. You may try to decrease the USB speed (to full speed) as described in
< https://community.freescale.com/thread/307678 >
If this helps, the issue may be caused by hardware problems, in particular the following.

4. USB functionality is sensitive to clock accuracy; please try replacing the 24 MHz oscillator.
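
The two workarounds in item 2 come down to simple bit manipulation on the register values. The sketch below is hedged: the bit positions (SDIS at bit 4 of USBMODE, a 6-bit threshold field at bits 21:16 of TXFILLTUNING) are assumptions based on the usual layout of this controller family and should be checked against the i.MX28 reference manual. The helper names are made up, and the functions operate on plain values so that the driver's own register read/write routines can be used around them.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the i.MX28 reference manual. */
#define USBMODE_SDIS               (1u << 4)
#define TXFILLTUNING_THRES_SHIFT   16
#define TXFILLTUNING_THRES_MASK    (0x3Fu << TXFILLTUNING_THRES_SHIFT)

/* Workaround 2.1: return the USBMODE value with stream disable set. */
static uint32_t usbmode_set_sdis(uint32_t usbmode)
{
    return usbmode | USBMODE_SDIS;
}

/* Workaround 2.2: return the TXFILLTUNING value with the FIFO threshold
 * replaced by 'bursts', clamped to the width of the field. Other fields
 * in the register are preserved. */
static uint32_t txfill_set_threshold(uint32_t txfilltuning, uint32_t bursts)
{
    if (bursts > 0x3Fu)
        bursts = 0x3Fu;
    return (txfilltuning & ~TXFILLTUNING_THRES_MASK)
         | (bursts << TXFILLTUNING_THRES_SHIFT);
}
```

In a driver, the result would be written back with whatever register access method the BSP already uses for the USB controller. Because a controller reset may return these registers to their defaults, the setting would probably need to be reapplied after each reset.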


Have a great day,
Yuri

markwilliams
Senior Contributor I

Yuri,

Thank you for your reply above. I have replied with some more detail to Sergii below. The post regarding limiting the port to full speed was mine from quite a few years ago! These issues are happening on attach for a small number of devices on random insertions limited to full speed. If the disk does mount then USB transfers are reliable and I do not see errors on my protocol analyser. If there was an underlying clock or signal integrity issue (even at full speed) then I would expect to see errors in other places and not just a failed attach.

Also once in this state the port is just sending out random data so it does seem to look like something has happened to the controller or its transfer queue pointers to make it babble. I am open to all suggestions though.

With regards to the DDR stress test, this is something that I would like to do independent of the USB issues anyway. I would need to rebuild for my DDR capacity and timings as I am not using the same part as the reference design. I will have a read of the posts - hopefully they explain how to go about rebuilding the test.

With regards to the errata this must be a hidden NXP internal errata! The current errata document for customers is V2 from 2012 and doesn't mention this. Is there somewhere else I need to look for errata for the iMX28 in case I am missing anything else?

I have heard the same USB controller (Chipidea?) is used on other processors but I am not sure if the same errata applies across the iMX range.

I will have a read and test the errata fix suggestion.

Thanks, Mark
