
i.MX7D USB HSIC Hub (USB3503) enumeration variability

Question asked by Bill Gessaman on Jun 5, 2017
Latest reply on Jun 19, 2017 by Mark Ruthenbeck

Hardware / Software Summary:

  • Custom i.MX7D board based on MCIMX7SABRE board
  • Silicon rev 1.2 (MCIMX7D5EVM10SC)
  • BSP L4.1.15-1.0.0-GA primarily but also tested with L4.1.15-2.0.0-GA with difference seen
  • Microchip USB3503T-I/ML 3-port USB Hub with HSIC interface
  • Variation in cold boot / reboot enumeration of USB Hub
  • Two out of 50 systems *always* fail to enumerate USB Hub on cold boot / reboot, but those two systems *always* enumerate USB Hub properly on resume from suspend-to-RAM state (either PMIC standby or LPSR mode).

 

Our application required more USB Host ports than were implemented by the i.MX7 SABRE board, so the Microchip USB3503 Hub was included to add the two additional ports that we needed. The guidelines for hardware design for both the i.MX7D and the USB3503 were taken into account and the critical routing of the DATA and STROBE signals was implemented to be less than 1" with their length matched to within just a little more than 5 mils. Most boards work correctly 99% of the time, but once in a while I have seen a system fail to enumerate the USB Hub. In these cases, a reboot (equivalent to a cold boot because the PMIC is forced off then turned back on again) will result in the USB Hub enumerating correctly.
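To put the length matching in perspective, the mismatch can be converted to a timing skew and compared with the HSIC bit time. This is only a rough sketch: the propagation delay of 170 ps per inch is an assumed rule-of-thumb value for FR-4, not a measurement from this board, and the function names are illustrative. HSIC signals at 480 Mbit/s, so one bit lasts about 2083 ps.

```python
MILS_PER_INCH = 1000.0
HSIC_BIT_RATE_HZ = 480e6          # HSIC signalling rate (480 Mbit/s)
PROP_DELAY_PS_PER_INCH = 170.0    # ASSUMPTION: rule of thumb for FR-4, not measured


def skew_ps(mismatch_mils, delay_ps_per_inch=PROP_DELAY_PS_PER_INCH):
    """Timing skew in picoseconds caused by a trace length mismatch."""
    return mismatch_mils / MILS_PER_INCH * delay_ps_per_inch


def bit_time_ps(bit_rate_hz=HSIC_BIT_RATE_HZ):
    """Duration of one bit in picoseconds."""
    return 1e12 / bit_rate_hz


if __name__ == "__main__":
    skew = skew_ps(5.0)           # 5 mils; the actual mismatch is slightly larger
    bit = bit_time_ps()
    print(f"skew     = {skew:.2f} ps")
    print(f"bit time = {bit:.1f} ps")
    print(f"skew is {100.0 * skew / bit:.3f}% of one bit time")
```

Under that assumption, a mismatch of 5 mils gives a skew of under 1 ps, a tiny fraction of the bit time, so the DATA / STROBE length mismatch alone looks like an unlikely cause.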

 

This was true until I discovered two boards that fail 99.9% of the time on a cold boot / reboot. With these boards, I finally have something reproducible to work with in looking for the root cause. At a high level, the issue shows up in the serial console output like this:

ci_hdrc ci_hdrc.2: EHCI Host Controller
ci_hdrc ci_hdrc.2: new USB bus registered, assigned bus number 2
ci_hdrc ci_hdrc.2: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 4.1.15+ ehci_hcd
usb usb2: SerialNumber: ci_hdrc.2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
usb 2-1: new high-speed USB device number 2 using ci_hdrc
usb 2-1: device no response, device descriptor read/64, error -71
usb 2-1: device no response, device descriptor read/64, error -71
usb 2-1: new high-speed USB device number 3 using ci_hdrc
usb 2-1: device no response, device descriptor read/64, error -71

.......

A normal system with a USB Flash drive on a USB Hub downstream port looks like this:

ci_hdrc ci_hdrc.2: EHCI Host Controller
ci_hdrc ci_hdrc.2: new USB bus registered, assigned bus number 2
ci_hdrc ci_hdrc.2: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 4.1.15+ ehci_hcd
usb usb2: SerialNumber: ci_hdrc.2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
usb 2-1: new high-speed USB device number 2 using ci_hdrc
usb 2-1: New USB device found, idVendor=0424, idProduct=3503
usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
hub 2-1:1.0: USB hub found
hub 2-1:1.0: 2 ports detected
usb 2-1.2: new high-speed USB device number 3 using ci_hdrc
usb 2-1.2: New USB device found, idVendor=0781, idProduct=5595
usb 2-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1.2: Product: Ultra USB 3.0
usb 2-1.2: Manufacturer: SanDisk
usb 2-1.2: SerialNumber: 4C531001641115101474
usb-storage 2-1.2:1.0: USB Mass Storage device detected
.......
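Since the two logs differ only in what follows the "new high-speed USB device" line, a captured console log can be classified automatically, which is handy when checking many boards or many boot cycles. The following is a minimal Python sketch, assuming the console output has been saved as text. The name classify_boot_log and the sample strings are illustrative and not part of any BSP tooling; the message formats are taken from the logs above.

```python
import re

# Message formats copied from the console logs above.
FAIL_RE = re.compile(
    r"usb (\S+): device no response, device descriptor read/64, error (-\d+)"
)
FOUND_RE = re.compile(
    r"usb (\S+): New USB device found, "
    r"idVendor=([0-9a-f]{4}), idProduct=([0-9a-f]{4})"
)

# Vendor / product ID reported for the hub in the good log.
HUB_ID = ("0424", "3503")


def classify_boot_log(text):
    """Return (hub_enumerated, descriptor_read_errors) for one console log."""
    hub_enumerated = False
    errors = 0
    for line in text.splitlines():
        found = FOUND_RE.search(line)
        if found and (found.group(2), found.group(3)) == HUB_ID:
            hub_enumerated = True
        if FAIL_RE.search(line):
            errors += 1
    return hub_enumerated, errors


# Shortened samples taken from the two logs above.
BAD_LOG = """\
usb 2-1: new high-speed USB device number 2 using ci_hdrc
usb 2-1: device no response, device descriptor read/64, error -71
usb 2-1: device no response, device descriptor read/64, error -71
usb 2-1: new high-speed USB device number 3 using ci_hdrc
usb 2-1: device no response, device descriptor read/64, error -71
"""

GOOD_LOG = """\
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb 2-1: new high-speed USB device number 2 using ci_hdrc
usb 2-1: New USB device found, idVendor=0424, idProduct=3503
hub 2-1:1.0: USB hub found
"""

if __name__ == "__main__":
    print("bad :", classify_boot_log(BAD_LOG))
    print("good:", classify_boot_log(GOOD_LOG))
```

The patterns use search rather than match, so lines that carry a kernel timestamp prefix are still recognized.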

Over the last 2 to 3 weeks I have taken the following steps to isolate the cause of the problem:

  1. Verified that the accuracy and jitter of the 24 MHz reference clock from the i.MX7 to the USB3503 is within the specifications for the USB3503.
  2. Verified all related power supplies on both i.MX7 and USB3503.
  3. Verified related signal levels and general timing relationships against a known "good" board.
  4. Technician inspected USB3503 solder quality on both boards that always fail, then both Hubs were reflowed to make sure there were no bad solder joints.  HINT: When these boards were booted very soon after reflow, they both worked correctly!  As they cooled off to room temp again, they returned to always failing.  Further experimentation with heat / cold showed a strong thermal relationship between working / failing respectively.
  5. New USB3503T-I/ML parts were purchased and the Hubs on both "bad" boards were replaced.  There was no change in behavior and they continued to be sensitive to temperature in the same way.
  6. I then started trying to understand what would trigger the "error -71" which is an EPROTO error in the Linux kernel.  I'm not sure why I tried to put a "bad" system into suspend-to-RAM and then resume, but this led to the revelation that the USB Hub *always* enumerates properly during resume!  I have tried this with Linux configured to do a powered "standby" on the USB Hub, and I have also used LPSR mode to suspend / resume, which turns off all power supplies to the USB Hub.  Both work equally well on the "bad" boards to re-initialize the USB Host interface to the USB3503!
  7. I chased the source of the "error -71" to lower levels of the USB drivers and found that the Chipidea HDRC driver sees the following error and retries 32 times before returning the -EPROTO error.
    ci_hdrc ci_hdrc.2: detected XactErr len 0/8 retry 1
  8. Looking deeper into ehci_hcd.c and getting some visibility into which EHCI register read / write operations are being done, I found that on a "bad board" the UEI bit of the USB2_USBSTS register is being set, indicating that a USB error interrupt has been detected. The following debug output (non-standard code added by me) shows the first few register read / write operations when it tries to enumerate the USB3503 Hub.
    On a "bad board":

    usb 2-1: new high-speed USB device number 2 using ci_hdrc
    ehci_readl: 0xf5b30144 = 0x00000080
    ehci_writel: 0xf5b30140 = 0x00010b25
    ehci_readl: 0xf5b30140 = 0x00010b25
    ehci_readl: 0xf5b30144 = 0x00008082           <<-- read of USB2_USBSTS with UEI bit set
    ehci_writel: 0xf5b30144 = 0x00000002           <<-- write of USB2_USBSTS to clear UEI bit
    ehci_readl: 0xf5b30140 = 0x00010b25
    ci_hdrc ci_hdrc.2: detected XactErr len 0/8 retry 1
    ehci_readl: 0xf5b30144 = 0x00008082
    ehci_writel: 0xf5b30144 = 0x00000002
    ehci_readl: 0xf5b30140 = 0x00010b25

    On a normally good board or during a resume from suspend:

    usb 2-1: new high-speed USB device number 6 using ci_hdrc
    ehci_readl: 0xf5b30144 = 0x00000080
    ehci_writel: 0xf5b30140 = 0x00010b25
    ehci_readl: 0xf5b30140 = 0x00010b25
    ehci_readl: 0xf5b30144 = 0x00048081
    ehci_writel: 0xf5b30144 = 0x00000001
    ehci_readl: 0xf5b30140 = 0x00010b25
    usb 2-1: usb_start_wait_urb length=18, retval=0
    ehci_readl: 0xf5b30184 = 0x0a001205
    ehci_writel: 0xf5b30184 = 0x0a001301
    ehci_readl: 0xf5b30140 = 0x00010b25
    usb usb2: usb_start_wait_urb length=0, retval=0

 

At this point, I've run out of places to look for new information, so I'm stuck and need someone's deep insight into what can trigger this error and particularly why it only happens on a cold boot / reboot.  What is done differently on a suspend / resume compared with a cold boot / reboot?  Why does the interface to the USB3503 work flawlessly once you get past the initial enumeration, while the enumeration itself varies from board to board and between boot cycles?

 

Thanks,

Bill Gessaman
