We have noticed USB enumeration issues when connecting the i.MX 8QM or i.MX 8QXP in the serial downloader mode to a host PC. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated. We see the issue only with Linux host PCs, but not with Windows hosts. When the issue occurs, we see the following messages in the console of the Linux host:
[ 3124.693526] usb 1-1.1: new high-speed USB device number 57 using xhci-hcd
[ 3124.777880] usb 1-1.1: Device not responding to setup address.
[ 3124.989590] usb 1-1.1: Device not responding to setup address.
[ 3125.201533] usb 1-1.1: device not accepting address 57, error -71
[ 3125.493249] usb 1-1.1: new high-speed USB device number 58 using xhci-hcd
[ 3125.598282] usb 1-1.1: device descriptor read/all, error -71
[ 3125.604069] usb 1-1-port1: attempt power cycle
[ 3126.209538] usb 1-1.1: new high-speed USB device number 59 using xhci-hcd
[ 3126.299633] hid-generic 0003:1FC9:0129.001E: hidraw0: USB HID v1.10 Device [NXP SemiConductor Inc SE Blank 8QM ] on usb-xhci-hcd.1.auto-1.1/input0
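A log like the one above can be summarized automatically when many cycles are run. The following is a minimal sketch, not part of the original report: it counts how many enumeration attempts the host made before the HID device appeared. The regular expressions and the function name `summarize_enumeration` are assumptions based only on the message formats shown in this log; other kernels may print different text. Error -71 corresponds to EPROTO (protocol error) on Linux.

```python
import re

# Message patterns taken from the dmesg excerpt above (assumed stable only for this log).
NEW_DEVICE = re.compile(r"usb (\S+): new \S+ USB device number (\d+)")
ERROR = re.compile(r"usb (\S+): .*error (-\d+)")
ENUMERATED = re.compile(r"hid-generic .*\[(.+?)\s*\] on ")

SAMPLE = """\
[ 3124.693526] usb 1-1.1: new high-speed USB device number 57 using xhci-hcd
[ 3124.777880] usb 1-1.1: Device not responding to setup address.
[ 3124.989590] usb 1-1.1: Device not responding to setup address.
[ 3125.201533] usb 1-1.1: device not accepting address 57, error -71
[ 3125.493249] usb 1-1.1: new high-speed USB device number 58 using xhci-hcd
[ 3125.598282] usb 1-1.1: device descriptor read/all, error -71
[ 3125.604069] usb 1-1-port1: attempt power cycle
[ 3126.209538] usb 1-1.1: new high-speed USB device number 59 using xhci-hcd
[ 3126.299633] hid-generic 0003:1FC9:0129.001E: hidraw0: USB HID v1.10 Device [NXP SemiConductor Inc SE Blank 8QM ] on usb-xhci-hcd.1.auto-1.1/input0
"""


def summarize_enumeration(log_text):
    """Count enumeration attempts and collect error codes until the HID device shows up."""
    attempts = 0
    errors = []
    product = None
    for line in log_text.splitlines():
        if NEW_DEVICE.search(line):
            attempts += 1  # each "new ... USB device number N" line is one attempt
        match = ERROR.search(line)
        if match:
            errors.append(int(match.group(2)))  # e.g. -71 (EPROTO)
        match = ENUMERATED.search(line)
        if match:
            product = match.group(1)  # product string reported by hid-generic
            break
    return {"attempts": attempts, "errors": errors, "product": product}


if __name__ == "__main__":
    print(summarize_enumeration(SAMPLE))
```

A `product` of `None` in the result would indicate that the log ended without a successful enumeration.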
We have seen the issue on our own board, as well as on the NXP reference board MCIMX8QM-CPU (SCH-29420 REV C2). We have noticed the same behavior also with i.MX 8QXP based boards.
To reproduce the issue on the MEK board, I put the strapping to "SERIAL BOOT" and use a USB-A to USB-C cable to plug it into a Linux host PC. The issue appears more often when using the reset button, but we have also seen it when powering up the board.
I have used a Total Phase Beagle USB 480 analyzer for checking the enumeration progress (see attached files). Interestingly, the communication does not always fail at the same stage of the enumeration. Sometimes I just see CRC errors, or I get a STALL message.
Normally, after a few retries, the bus gets enumerated. It is quite annoying, and we would like to know why the boards have so much trouble getting enumerated in Serial Download mode.
The issue cannot be closed. We still need to understand what is causing it and how it can be mitigated.
The internal team is considering how to handle this case.
@igorpadykov can you provide an update on this?
For that issue I did not receive additional updates from the internal team, sorry.
Best regards
igor
I got an additional answer from the team:
---------------------
Regarding the root cause of this issue, please see the following description:
On power-up, a boot monitor timer is initialized. During USB enumeration in serial download mode, the host side may retry enumeration multiple times until it either succeeds or exceeds the maximum number of attempts, which eventually causes an enumeration failure. Under corner conditions, the ROM code may not get a chance to refresh the boot monitor timer because of the USB host behavior, and this causes a device system reset.
Generally, changing to a different host is the most effective way to avoid this USB enumeration failure. In the future, we will consider this case and refresh the boot monitor timer to avoid the system reset.
---------------------
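The mechanism described above can be illustrated with a toy model. This is only a sketch of the general watchdog idea and does not reflect the actual ROM implementation: the function name, the timeout, and the attempt durations are all hypothetical values chosen for illustration.

```python
def survives_enumeration(attempt_durations_s, timeout_s, refresh_on_attempt):
    """Toy watchdog model: return True if the timer never expires during enumeration.

    attempt_durations_s: time spent in each host enumeration attempt (hypothetical).
    timeout_s: watchdog period (hypothetical, not the real boot monitor value).
    refresh_on_attempt: whether the timer is refreshed at the start of each attempt.
    """
    elapsed_since_refresh = 0.0
    for duration in attempt_durations_s:
        if refresh_on_attempt:
            elapsed_since_refresh = 0.0  # timer is kicked before each retry
        elapsed_since_refresh += duration
        if elapsed_since_refresh >= timeout_s:
            return False  # timer expired, the device would reset
    return True


if __name__ == "__main__":
    retries = [1.0, 1.0, 1.0]  # three slow attempts, illustrative only
    print(survives_enumeration(retries, timeout_s=2.5, refresh_on_attempt=False))
    print(survives_enumeration(retries, timeout_s=2.5, refresh_on_attempt=True))
```

In this model, repeated host retries without a refresh accumulate until the timer expires, while refreshing on each attempt keeps the device alive. That matches the behavior and the proposed future fix as described in the answer, but nothing more is claimed.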
Best regards
igor
Hi Igor
Thank you very much. Could you please make sure this gets properly documented in the errata document, so that we can refer to it when our customers ask about this limitation? Thanks!
Cheers
Marcel
In the meantime, we did automated tests for the recovery mode. With the test setup, we can power cycle the MEK board or our own computer module based on the i.MX 8QM and test how many attempts it takes to enumerate the USB in the serial loader. I did this test with two different Linux hosts. For each combination, I did around 3000 power cycles. Here is an overview of the test results:
The interesting part is that with our own hardware, the enumeration is successful on the first attempt only 66% of the time, while on the MEK it is 83%. On the MEK, a maximum of two retries is required for a successful enumeration, and the connection can always be established. On our own board, many more retries are required, and in 1.4% of the cases, the enumeration fails completely.
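Figures like these can be derived from the per-cycle retry counts. The following is a minimal sketch of such a summary, assuming each power cycle is recorded as the number of retries needed, with `None` for a cycle that never enumerated. The function name and the sample list are hypothetical and are not the measured data.

```python
def summarize_cycles(retries_per_cycle):
    """Summarize power-cycle results.

    retries_per_cycle: one entry per power cycle; an int is the number of
    retries needed (0 = success on first attempt), None = never enumerated.
    """
    total = len(retries_per_cycle)
    if total == 0:
        raise ValueError("no cycles recorded")
    first_try = sum(1 for r in retries_per_cycle if r == 0)
    failed = sum(1 for r in retries_per_cycle if r is None)
    succeeded = [r for r in retries_per_cycle if r is not None]
    return {
        "cycles": total,
        "first_try_pct": 100.0 * first_try / total,
        "failed_pct": 100.0 * failed / total,
        "max_retries": max(succeeded) if succeeded else None,
    }


if __name__ == "__main__":
    # Synthetic example data, not taken from the tests described in this thread.
    print(summarize_cycles([0, 0, 1, None]))
```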
Can you please help us understand why we have a different success rate? Do you have any idea what could cause the different behavior? Could it be related to a difference in the PMIC power-up sequence timing?
You mentioned that there are CRC errors that lead to a failing enumeration. Do you have a further background on this? Why is the CRC failing?
We are really concerned about the 1.4% of cases in which the enumeration fails completely. This will cause massive issues during production testing if we cannot rely on USB enumeration in serial loader mode.
Thank you in advance for your help.
-------------------------
The ROM team has some findings and is debugging this issue. We can wait for the ROM team.
-------------------------
----------------------------
I would like the customer to make the code change below in the Linux PC kernel source, rebuild the Linux PC kernel, redo their test, and show me the whole dmesg log from that test.
The code change is in drivers/usb/core/hub.c, in the function hub_port_init():
    for (retries = 0; retries < GET_DESCRIPTOR_TRIES; (++retries, msleep(100))) {
        bool did_new_scheme = false;
        if (0) { //use_new_scheme(udev, retry_counter, port_dev)) {
        }
The above change effectively disables the use_new_scheme() branch, so the code inside it is never run.
----------------------------
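As a side note to the kernel change requested above: mainline usbcore exposes two module parameters, old_scheme_first and use_both_schemes, that feed into the same use_new_scheme() decision. The sketch below only reads their current values from sysfs. Whether setting old_scheme_first=Y and use_both_schemes=N reproduces the exact effect of the patch depends on the host kernel version and is an assumption here, not something confirmed in this thread. The function names are hypothetical.

```python
from pathlib import Path


def read_usbcore_schemes(params_dir="/sys/module/usbcore/parameters"):
    """Read the enumeration-scheme parameters of usbcore, or None if unavailable."""
    result = {}
    for name in ("old_scheme_first", "use_both_schemes"):
        try:
            result[name] = (Path(params_dir) / name).read_text().strip()
        except OSError:
            result[name] = None  # parameter missing or not readable on this host
    return result


def new_scheme_probably_disabled(params):
    """Assumption: with old scheme first and no fallback, the new scheme is never tried."""
    return params.get("old_scheme_first") == "Y" and params.get("use_both_schemes") == "N"


if __name__ == "__main__":
    params = read_usbcore_schemes()
    print(params, new_scheme_probably_disabled(params))
```

Recording these values together with the test results would make runs on different hosts easier to compare.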
Hi @igorpadykov
We changed the Linux kernel on the host PC and ran the automated test again for several thousand cycles. The results are quite interesting:
The success rate for enumerating the USB on the first attempt increased dramatically on the MEK and on our own hardware. On the MEK, the enumeration was successful 99.46% of the time. Only 21 out of more than 3000 cycles required one retry, and the MEK board never needed more than one retry to enumerate.
While our own hardware also had an increased success rate of enumerating the USB without a retry, the cases in which it was not able to enumerate at all massively increased from 1.43% to 7.37%. This is quite alarming. Do you have an idea why the behavior of our own hardware could be different? Could it be a difference in the voltage rail power-up sequence or the routing of the USB signals?
Please find attached some examples of dmesg outputs of the host PC. These are the complete dmesg we get during a test cycle. We do not reboot the host PC during the tests. Therefore, sending the dmesg of the kernel boot does not make much sense. Please let me know if you would like to see different dmesg outputs.
From the team:
----------------------------
I am still waiting for the ROM team.
From the customer's results, Linux host 1 shows no issue. What are the hardware and OS version of Linux host 1?
And what are the hardware and OS version of Linux host 2?
The customer seems to suspect their board's "voltage rail power-up sequence or the routing of the USB signals". If the customer could share the related hardware signal waveforms or related files, we may be able to involve a hardware team member to review what they shared.
----------------------------
Hi @igorpadykov
Since it was easier for the automated test setup, we have been using our own hardware as host computers.
We have also seen the same issues with two laptop PCs of our software developers. However, since they need their PCs for work, we did not run automated test cycles on these machines:
Hi Peter
>We see the issue only with Linux host PCs, but not with Windows hosts.
It seems the issue is not caused by the i.MX8 processor but by the Linux computer. One can try several Linux
computers and check whether quality USB cables are used.
Best regards
igor
Hi @igorpadykov ,
Thank you for your answer. In the meantime, we did further tests with other host PCs. We have seen some dependency on the host PC used. On some Linux computers, the issue appears more often than on others. It also happens more often if the i.MX 8QM is plugged in behind a USB hub, but it also happens when we plug the board directly into the computer's USB port.
We can reproduce the issue easier if the USB cable is plugged in after the board is powered up or the USB cable is replugged. But we definitely see the issue also after a reset or power cycle.
I found an erratum that could be related: ERR050053 ROM: USB HID device cannot be re-enumerated successfully after an unplug/plug USB cable operation. The erratum sounds very close to the issue we also see on reset or power cycles. Could it be that the erratum also affects these procedures?
The issue is crucial for us since we see the enumeration issues on our i.MX 8QM and i.MX 8QXP based products. For automated production testing and programming, it is important that the USB gets enumerated reliably.
Additional answer from the team:
-------------------
Below from ROM team:
"Regarding this issue, we have conducted debugging, and reproduced the two situations you mentioned as follows:
1. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated.
A: Because there is a CRC error in the packet sent by the host, the host resets and enumerates again. Because the wrong packet is sent by the host and the host actively re-enumerates, the ROM code cannot control this.
2. The issue appears more often when using the reset button.
A: If the device is reset during enumeration, the ROM code will restart from the beginning. Obviously, it will not respond to the host at this time, so the host must re-enumerate.
We found that in both cases, the enumeration can be successful in the end.
Do you see the same phenomenon (enumeration can be successful in the end) as ours?"
-------------------
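For context on the CRC errors mentioned above: USB data packets carry a CRC16 over their payload, and a receiver discards a packet whose CRC does not match. The following is a minimal sketch of that check, assuming the standard CRC-16/USB parameters (polynomial 0x8005 processed bit-reflected as 0xA001, initial value 0xFFFF, final XOR 0xFFFF). The SET_ADDRESS payload and the flipped bit are illustrative and are not taken from the attached traces.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16/USB: reflected polynomial 0xA001, init 0xFFFF, final XOR 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF


def payload_intact(payload: bytes, received_crc: int) -> bool:
    """Return True if the CRC received with the packet matches the payload."""
    return crc16_usb(payload) == received_crc


if __name__ == "__main__":
    # Illustrative SET_ADDRESS setup data: bmRequestType=0x00, bRequest=0x05, wValue=57.
    setup = bytes([0x00, 0x05, 57, 0x00, 0x00, 0x00, 0x00, 0x00])
    good_crc = crc16_usb(setup)
    corrupted = bytes([setup[0] ^ 0x01]) + setup[1:]  # single bit flipped on the wire
    print(payload_intact(setup, good_crc), payload_intact(corrupted, good_crc))
```

The check only shows that a packet arrived corrupted. It cannot tell whether the corruption came from the transmitter, the cable, or the receiver's signal path.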
Additional feedback from the team:
---------------------------
We are still waiting for the customer's i.MX8 chip version, and for whether they use the USB 2.0 controller or the USB 3.0 controller.
We got the attached CRC-type error. Attached is the USB protocol log; the customer needs to double-check whether it is the same error they encountered.
---------------------------
> "Regarding this issue, we have conducted debugging, and reproduced the two situations you mentioned as follows:
>
> 1. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated.
> A: Because there is a CRC error in the packet sent by the host, the host resets and enumerates again. Because the wrong packet is sent by the host
> and the host actively re-enumerates, the ROM code cannot control this.
But why exactly is there a CRC error in the first place? We really don't think this may be simply blamed on the host as the exact same host can work without any such issues with non-SCFW based i.MX chips in serial download mode both with older ones (e.g. i.MX 6, 6ULL and 7) as well as later ones (e.g. i.MX 8M Mini and Plus).
> 2. The issue appears more often when using the reset button.
> A: If the device is reset during enumeration, the ROM code will restart from the beginning. Obviously, it will not respond to the host at this time, so the host
> must re-enumerate.
Sure, but we are not talking about the initial enumeration, we are talking about it having to re-enumerate over and over sometimes even leading to no successful enumeration at all.
> We found that in both cases, the enumeration can be successful in the end.
While so far we have not seen otherwise on the NXP MEK reference boards, we do need to conduct more (automated) testing to confirm this.
> Do you see the same phenomenon (enumeration can be successful in the end) as ours?"
No, we see cases where the enumeration really fails and we do need to either power-cycle or reset the board again to trigger a complete re-try in order to ever get to an enumerated state.
As a background, we do have our automated testing infrastructure where all of this can be tried automatically millions of times. And as indicated above, our setup works very reliably using i.MX 6, 6ULL, 7, 8M Mini and 8M Plus while we see lots of problems with i.MX 8 and 8X aka the SCFW based boards.
> We are still waiting for the customer's i.MX8 chip version,
Given our early access partner status, we really do have any and all different versions of your chips. And while we saw this issue much more often on the initial ones (e.g. i.MX 8 A0 silicon as well as i.MX 8X A0 and B0 silicon), the issue persists, though at a somewhat lower rate, with later chip versions (e.g. i.MX 8 B0 silicon and i.MX 8X C0 silicon).
> and for whether they use the USB 2.0 controller or the USB 3.0 controller.
We really tried any and all such variants without seeing any significant difference.
> We got the attached CRC-type error. Attached is the USB protocol log; the customer needs to double-check whether it is
> the same error they encountered.
Yes, this is definitely also something we are seeing when hooking up our USB analyser. But the test results do not seem consistent, meaning this does not seem to be the only issue at hand.
Anyway, we suspect that the boot ROM may only do limited tuning of the USB settings compared to the full USB stack running later in Linux, where we are not seeing any such USB communication issues at all. We are also in the process of running full USB 3.0 compliance testing with our hardware to confirm this. Could you please comment on or confirm this?
BTW: We also noticed that NXP seems to keep further tuning even the regular Linux USB stack (e.g. in the latest NXP BSP 5.10.9-1.0.0). However, we do not have any visibility into what/why exactly NXP is doing this...
I asked internally and got the answer below:
-------------------
Which chip version of the i.MX8QXP or i.MX8QM does the customer use?
What is the reproduction rate on the i.MX8QM or i.MX8QXP? What I need here is the rate when the board is newly powered on each time, not reset.
And since the customer said there is no issue with a Windows host PC, please share one USB protocol log captured with a Windows host PC.
-------------------
>>I found an errata that could be related: ERR050053 ROM: USB HID device cannot be
>>re-enumerated successfully after an unplug/plug USB cable operation. The errata sounds very
>>close to the issue we see also on reset or power cycles. Could it be that the errata also affect
>>these procedures?
>sorry I do not think so.
We were not asking what YOU think about this. We are asking whether or not you can connect us to anybody within NXP who actually does know what exactly this ERR050053 is about!
>>On some Linux computers, the issue appears more often than on others.
>this proves that issue is not caused by i.MX8, but linux host.
No, the only thing this proves is that you obviously have no clue what you are talking about!
>Also situation when usb devices work fine in windows and not in linux is well known
>as described for example below:
And this proves that you are not even any good at googling relevant stuff! An i.MX SoC in serial download mode has absolutely nothing to do with any USB-serial adapters or anything!
We would really appreciate that rather than giving stupid answers you would a) admit that you do not know anything about it and b) forward our queries to actual NXP R&D personnel who does know what they are talking about. Thanks!