We have experienced a fatal USB error that occurs both on our own custom board, using a modified mainline Linux 3.10, and on the imx28-evk, using the BSP L2.6.35_1.1.0_130130_source.tar.gz.
On our custom board, we have seen it occasionally while the USB driver is loading, and consistently when changing the CPU speed while USB is in use. On the imx28-evk, it happens consistently when changing speed, and we cannot rule out that it happens during driver load as well.
To reproduce it, attach a USB-to-serial adapter and run the following two commands:
cat /dev/ttyUSB0 &
echo 360000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
On the imx28-evk with the BSP kernel:
fsl-ehci fsl-ehci.0: fatal error
fsl-ehci fsl-ehci.0: force halt; handshake c88fe144 00004000 00004000 -> -110
fsl-ehci fsl-ehci.0: HC died; cleaning up
usb 2-1: USB disconnect, address 3
pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
pl2303 2-1:1.0: device disconnected
On mainline 3.10:
ci_hdrc ci_hdrc.1: fatal error
ci_hdrc ci_hdrc.1: fatal command 0010024 (park)=0 ithresh=1 Async period=512 HALT
ci_hdrc ci_hdrc.1: fatal status 8090 Async FATAL
ci_hdrc ci_hdrc.1: HC died; cleaning up
From what I can tell, the error is caused by bit 4 (SEI) in HW_USBCTRL_USBSTS being set to 1 (see drivers/usb/host/ehci-hcd.c). Printing the register (0x80090144) from the function that reports the error shows that the bit is set.
According to the Reference Manual, "This bit is not used in this implementation and will always be set to 0."
Has anyone else experienced this, and is the Reference Manual correct, or does a situation exist that will cause the bit to be set?
My guess is that the speed change affects the USB clocking and the USB controller then needs a reset. You could try two things:
1. If SEI is really not used, as the RM claims (which I doubt), you should be able to mask off SEE and ignore the interrupt. See if the USB keeps working or if it dies. I'm guessing it'll die.
2. Alternatively, when you do get an SEI interrupt, reset the host controller after you halt it (maybe using ehci_reset?). Your application will probably terminate, but you might be able to restart it by catching a signal.
Disabling the interrupt only postponed the problem, and using ehci_reset or other attempts at resetting the controller did not help either. It might be possible to recover the interface entirely that way, with enough work to determine what is needed, but I doubt that it would be possible without losing the active data. As it should be possible to change the speed at any time, e.g. when using an automatic governor, this does not look like a useful solution.
I have determined that the error happens when the assembler function mxs_ram_freq_scale in emi.S is run, by checking the values of HW_USBCTRL_USBSTS and HW_USBCTRL_USBCMD just before and just after the call. If I disable the lines from where BM_DRAM_CTL17_SREFRESH is set to where it is cleared, the code runs without the problem. If the memory is put into self refresh and the EMI controller is stopped, the problem occurs.
This still happens both on our own board with the modified mainline kernel, and on the imx28-evk, using the BSP, so it should be possible to reproduce.
Disabling the interrupt does actually improve the stability. The error still appeared when I brought "cat /dev/ttyUSB0" to the foreground after changing the speed a couple of times, but so far it seems to handle a single change without problems.