We are investigating a strange USB hang with Linux 4.1.38-fslc (branch 4.1-1.0.x-imx). It was first found when applying EMI to a USB cable between an imx6q-based product and a USB LTE modem. We then discovered that we could reproduce it simply by shorting the data lines of the USB cable for a couple of seconds.
As it is simpler to do, I'll focus on the data-line short method:
- Communicate with a USB device on the USB host port of the imx6q (tried with an LTE modem and a simple USB drive; the key seems to be that there is some ongoing communication)
- Short the USB data lines on the cable connecting the imx6q and the device (white & green wires)
Most of the time nothing shows up in dmesg / syslog. Sometimes, with the LTE modem, I see this: `[ 63.428507] option1 ttyUSB1: option_instat_callback: error -71`. The USB port then seems dead: the devices are not removed from /dev, they simply do not respond at all.
If I unplug and replug the device, nothing happens either. I can short VBUS to ground to trigger an overcurrent condition, which reinitializes part of the USB stack: it then detects that the device is gone and that a device has just been plugged in, but enumeration still fails:
[ 512.041415] usb 2-1: new full-speed USB device number 6 using ci_hdrc
[ 522.501370] usb 2-1: device not accepting address 6, error -110
[ 522.621472] usb 2-1: new high-speed USB device number 7 using ci_hdrc
[ 533.081380] usb 2-1: device not accepting address 7, error -110
[ 533.087572] usb usb2-port1: unable to enumerate USB device
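For readers of these logs, the negative numbers are kernel errno values. Here is a minimal Python sketch to decode them, assuming the standard Linux numbering (71 is EPROTO, 110 is ETIMEDOUT). The helper name `decode_usb_errors` and the short descriptions are illustrative and not part of any kernel tool:

```python
import re

# Linux errno values (asm-generic numbering), hard-coded so the result
# does not depend on the machine that runs this script.
LINUX_ERRNO = {
    71: ("EPROTO", "protocol error"),
    110: ("ETIMEDOUT", "connection timed out"),
}

ERROR_RE = re.compile(r"error (-\d+)")

def decode_usb_errors(log_text):
    """Return (code, name, description) for each 'error -N' found in the log."""
    decoded = []
    for match in ERROR_RE.finditer(log_text):
        code = int(match.group(1))
        name, description = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        decoded.append((code, name, description))
    return decoded
```

Feeding it the `option_instat_callback` line and the two `device not accepting address` lines gives EPROTO for -71 and ETIMEDOUT for -110, which matches a device that is no longer answering on the bus.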
After enabling some EHCI debugging in the kernel, I see a lot of these messages when I short the data lines:
[ 379.546153] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 16
[ 379.552440] ci_hdrc ci_hdrc.1: detected XactErr len 0/1 retry 17
[ 379.558458] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 16
[ 379.564735] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 17
[ 379.571013] ci_hdrc ci_hdrc.1: detected XactErr len 0/10 retry 3
[ 379.577056] ci_hdrc ci_hdrc.1: detected XactErr len 0/1 retry 18
[ 379.583075] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 17
[ 379.589353] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 18
[ 379.595645] ci_hdrc ci_hdrc.1: detected XactErr len 0/1 retry 19
[ 379.601664] ci_hdrc ci_hdrc.1: detected XactErr len 0/4096 retry 18
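To see how far the retries get before the messages stop, the XactErr lines can be summarized per requested transfer length. This is a rough Python sketch that assumes the message format shown above; `summarize_xacterr` is a hypothetical helper name, and distinct transfers that happen to share a length are lumped together:

```python
import re
from collections import defaultdict

# Matches e.g. "detected XactErr len 0/4096 retry 16"
XACTERR_RE = re.compile(r"detected XactErr len (\d+)/(\d+) retry (\d+)")

def summarize_xacterr(log_text):
    """Map requested length -> (message count, highest retry value seen)."""
    counts = defaultdict(int)
    max_retry = defaultdict(int)
    for match in XACTERR_RE.finditer(log_text):
        requested = int(match.group(2))
        retry = int(match.group(3))
        counts[requested] += 1
        max_retry[requested] = max(max_retry[requested], retry)
    return {length: (counts[length], max_retry[length]) for length in counts}
```

Running it over a longer capture (for example the output of `dmesg`) shows the last retry count reached for each length at the moment the stream goes quiet.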
If I keep the data lines shorted, this stream of errors stops at some point. It seems to me that the EHCI controller does see the errors but then somehow hangs.
Has anyone seen this before? Can it be fixed or recovered from? I've tried unbinding/rebinding usb2, and that does not recover the port from this state. If I soft reboot the system (`reboot -n -f` from the command line), it recovers: devices are enumerated and work properly. One thing to note is that this command takes much longer than usual when USB is hung (normally the reboot is instant; with USB hung it takes about 30 seconds).
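For anyone who wants to repeat the unbind/rebind attempt, it is typically done through sysfs. This minimal Python sketch assumes the usual layout where the root hub `usb2` is bound to the `usb` driver under /sys/bus/usb/drivers/usb. The function name, default path, and delay are assumptions for illustration, not necessarily the exact commands used here, and it needs root:

```python
import os
import time

def rebind_usb_device(device="usb2",
                      driver_dir="/sys/bus/usb/drivers/usb",
                      delay=1.0):
    """Unbind, then rebind, a USB device by writing its name to sysfs."""
    for action in ("unbind", "bind"):
        # Writing the device name to <driver_dir>/unbind detaches the driver;
        # writing it to <driver_dir>/bind attaches it again.
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(device)
        time.sleep(delay)  # give the kernel a moment between the two steps
```

Calling `rebind_usb_device()` with the defaults targets `usb2`. In the hung state described above, this kind of rebind was not enough to bring the port back.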