USB Enumeration Problems in Serial Downloader Mode

peterlischer
Senior Contributor I

We have noticed USB enumeration issues when connecting the i.MX 8QM or i.MX 8QXP in the serial downloader mode to a host PC. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated. We see the issue only with Linux host PCs, but not with Windows hosts. When the issue occurs, we see the following messages in the console of the Linux host:

[ 3124.693526] usb 1-1.1: new high-speed USB device number 57 using xhci-hcd
[ 3124.777880] usb 1-1.1: Device not responding to setup address.
[ 3124.989590] usb 1-1.1: Device not responding to setup address.
[ 3125.201533] usb 1-1.1: device not accepting address 57, error -71
[ 3125.493249] usb 1-1.1: new high-speed USB device number 58 using xhci-hcd
[ 3125.598282] usb 1-1.1: device descriptor read/all, error -71
[ 3125.604069] usb 1-1-port1: attempt power cycle
[ 3126.209538] usb 1-1.1: new high-speed USB device number 59 using xhci-hcd
[ 3126.299633] hid-generic 0003:1FC9:0129.001E: hidraw0: USB HID v1.10 Device [NXP       SemiConductor Inc  SE Blank 8QM ] on usb-xhci-hcd.1.auto-1.1/input0
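
The retries in a log like the one above can be counted with a short script, which may help when comparing hosts or boards. This is only a minimal sketch: the function name `summarize_enumeration` and the regular expressions are assumptions derived from this excerpt, and other kernel versions may word these messages differently. On Linux, error -71 corresponds to EPROTO (protocol error).

```python
import re

# Patterns derived from the dmesg excerpt in this post (an assumption:
# other kernel versions may phrase these messages differently).
NEW_DEVICE = re.compile(r"usb (\S+): new \S+ USB device number (\d+) using")
ERROR = re.compile(r"usb \S+: .*error (-\d+)")
HID_BOUND = re.compile(r"hid-generic .*hidraw\d+: USB HID")

SAMPLE = [
    "[ 3124.693526] usb 1-1.1: new high-speed USB device number 57 using xhci-hcd",
    "[ 3124.777880] usb 1-1.1: Device not responding to setup address.",
    "[ 3125.201533] usb 1-1.1: device not accepting address 57, error -71",
    "[ 3125.493249] usb 1-1.1: new high-speed USB device number 58 using xhci-hcd",
    "[ 3125.598282] usb 1-1.1: device descriptor read/all, error -71",
    "[ 3125.604069] usb 1-1-port1: attempt power cycle",
    "[ 3126.209538] usb 1-1.1: new high-speed USB device number 59 using xhci-hcd",
    "[ 3126.299633] hid-generic 0003:1FC9:0129.001E: hidraw0: USB HID v1.10 Device "
    "[NXP       SemiConductor Inc  SE Blank 8QM ] on usb-xhci-hcd.1.auto-1.1/input0",
]


def summarize_enumeration(lines):
    """Count enumeration attempts and collect error codes until the HID device binds."""
    attempts = 0
    errors = []
    enumerated = False
    for line in lines:
        if NEW_DEVICE.search(line):
            attempts += 1  # each "new ... USB device number N" line is one attempt
        match = ERROR.search(line)
        if match:
            errors.append(int(match.group(1)))
        if HID_BOUND.search(line):
            enumerated = True  # the serial downloader HID interface showed up
            break
    return {"attempts": attempts, "errors": errors, "enumerated": enumerated}


print(summarize_enumeration(SAMPLE))
```

For the excerpt above, the sketch reports three attempts and two -71 errors before the HID device is bound.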

We have seen the issue on our own board as well as on the NXP reference board MCIMX8QM-CPU (SCH-29420 REV C2). We have also noticed the same behavior with i.MX 8QXP based boards.

To reproduce the issue on the MEK board, I put the strapping to "SERIAL BOOT" and use a USB-A to USB-C cable to plug it into a Linux host PC. The issue appears more often when using the reset button, but we have also seen it when powering up the board.

I have used a Total Phase Beagle USB 480 analyzer for checking the enumeration progress (see attached files). Interestingly, the communication does not always fail at the same stage of the enumeration. Sometimes I just see CRC errors, or I get a STALL message.

Normally, after a few retries, the bus gets enumerated. It is quite annoying, and we would like to know why the boards have so much trouble getting enumerated in Serial Download mode. 

21 Replies

ada_lu
NXP Apps Support

...

peterlischer
Senior Contributor I

The issue cannot be closed; we still need to understand what is causing it and how it can be mitigated.

igorpadykov
NXP Employee

The internal team is considering how to handle this case.

brandon_shibley
Contributor II

@igorpadykov can you provide an update on this?

igorpadykov
NXP Employee

Sorry, I have not received any additional updates on this issue from the internal team.

Best regards
igor

igorpadykov
NXP Employee

I got an additional answer from the team:

---------------------

Regarding the root cause of this issue, please see the following description:

On power-up, a boot monitor timer is initialized. During USB enumeration in serial download mode, the host side may enumerate multiple times until enumeration succeeds or until the maximum number of attempts is exceeded, which eventually causes an enumeration failure. Under corner conditions, the ROM code may not get a chance to refresh the boot monitor timer due to the USB host behavior, which causes a device system reset.

Generally, changing to a different host is the most effective way to avoid this USB enumeration failure. In the future, we will consider this case and refresh the boot monitor timer to avoid the system reset.

---------------------

Best regards
igor

marcelziswiler
Senior Contributor I

Hi Igor

Thank you very much. Could you please make sure this gets properly documented in the errata document so that we can refer to it when our customers ask about this limitation? Thanks!

Cheers

Marcel

peterlischer
Senior Contributor I

In the meantime, we ran automated tests for the recovery mode. With the test setup, we can power cycle the MEK board or our own computer module based on the i.MX 8QM and test how many attempts it takes to enumerate the USB in the serial loader. I did this test with two different Linux hosts. For each combination, I did around 3000 power cycles. Here is an overview of the test results:

[Image: peterlischer_0-1619184738542.png (overview of the test results)]

The interesting part is that with our own hardware, the enumeration is successful on the first attempt only 66% of the time, while on the MEK it is 83%. On the MEK, a maximum of two retries is required for a successful enumeration, and the connection can always be established. On our own board, many more retries are required, and in 1.4% of the cases, the enumeration fails completely.
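
Figures of this kind (first-attempt rate, maximum retries, complete failures) can be derived from per-cycle attempt counts. The following is a minimal sketch, not the test harness used here: the function name `summarize_attempts`, the convention of recording `None` for a cycle that never enumerated, and the sample list are all hypothetical.

```python
from collections import Counter


def summarize_attempts(results):
    """Summarize power-cycle results.

    results: one entry per power cycle, holding the number of enumeration
    attempts that were needed, or None if the device never enumerated.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no test cycles recorded")
    successes = [r for r in results if r is not None]
    histogram = Counter(successes)
    return {
        "cycles": total,
        "first_attempt_rate": histogram.get(1, 0) / total,
        "failure_rate": (total - len(successes)) / total,
        # retries = attempts beyond the first one
        "max_retries": max(successes) - 1 if successes else None,
    }


# Hypothetical data for illustration only, not measurements from this thread.
example = [1, 1, 1, 2, None, 3, 1, 1, 1, 1]
print(summarize_attempts(example))
```

Keeping complete failures as a separate rate, rather than folding them into the retry histogram, makes the two effects discussed in this post easy to tell apart.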

Can you please help us understand why we have a different success rate? Do you have any idea what could cause the different behavior? Could it be related to a difference in the PMIC power-up sequence timing?

You mentioned that there are CRC errors that lead to a failing enumeration. Do you have a further background on this? Why is the CRC failing?

We are really concerned about the 1.4% of cases in which the enumeration fails completely. This will cause massive issues during production testing if we cannot rely on USB enumeration in serial loader mode.

Thank you in advance for your help. 

igorpadykov
NXP Employee

-------------------------

The ROM team has some findings and is debugging this issue; we can wait for the ROM team.

-------------------------

igorpadykov
NXP Employee

----------------------------

I would like the customer to make the code change below in the Linux host PC kernel source, rebuild the host kernel, redo their test, and also send me the complete dmesg log from the test.

The code change is in drivers/usb/core/hub.c, function hub_port_init():

for (retries = 0; retries < GET_DESCRIPTOR_TRIES; (++retries, msleep(100))) {
        bool did_new_scheme = false;

        /* was: if (use_new_scheme(udev, retry_counter, port_dev)) { */
        if (0) {
                /* body of the use_new_scheme() branch: unchanged, but never executed */
        }
        /* remainder of the loop: unchanged */
}

With this change, the code inside the if (use_new_scheme(...)) branch never runs.
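
The mainline Linux kernel also documents two usbcore module parameters that influence which enumeration scheme is tried, `old_scheme_first` and `use_both_schemes`. Whether setting them reproduces the effect of the kernel change above was not tested in this thread, so treat that as an assumption. The sketch below only reads their current values from sysfs on the host; the function name `read_scheme_params` is hypothetical.

```python
from pathlib import Path

# Standard sysfs location for usbcore module parameters on a Linux host.
USBCORE_PARAMS = Path("/sys/module/usbcore/parameters")


def read_scheme_params(base=USBCORE_PARAMS):
    """Return the current enumeration-scheme parameters, or None where unavailable."""
    params = {}
    for name in ("old_scheme_first", "use_both_schemes"):
        try:
            params[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Parameter missing or not readable on this system.
            params[name] = None
    return params
```

Calling `read_scheme_params()` on the host before a test run records which scheme settings were active, which can help when comparing results between hosts.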

----------------------------

peterlischer
Senior Contributor I

Hi @igorpadykov 

We changed the Linux kernel on the host PC and ran the automated test again for several thousand cycles. The results are quite interesting:

[Image: peterlischer_0-1620135275405.png (test results with the modified host kernel)]

The success rate for enumerating the USB on the first attempt increased dramatically on both the MEK and our own hardware. On the MEK, the enumeration was successful 99.46% of the time. Only 21 out of more than 3000 cycles required one retry, and enumerating the MEK board never needed more than one retry.

While our own hardware also had an increased success rate of enumerating the USB without a retry, the cases in which it was not able to enumerate at all massively increased from 1.43% to 7.37%. This is quite alarming. Do you have an idea why the behavior of our own hardware could be different? Could it be a difference in the voltage rail power-up sequence or the routing of the USB signals? 

Please find attached some examples of dmesg outputs of the host PC. These are the complete dmesg outputs we get during a test cycle. We do not reboot the host PC during the tests, so sending the dmesg of the kernel boot does not make much sense. Please let me know if you would like to see different dmesg outputs.

igorpadykov
NXP Employee

From the team:

----------------------------

I am still waiting for the ROM team.

From the customer's results, Linux host 1 does not show any issue. What are the hardware and OS version of Linux host 1?

And what are the hardware and OS version of Linux host 2?

The customer also seems to suspect their board's "voltage rail power-up sequence or the routing of the USB signals". If the customer could share the related hardware signal waveforms or related files, we may be able to involve a hardware team member to review what they shared.

----------------------------

peterlischer
Senior Contributor I

Hi @igorpadykov 

Since it was easier for the automated test setup, we have been using our own hardware as host computers.

  • Linux Host 1 (which works fine) is an Apalis iMX6Q. It is based on the i.MX 6Q SoC and runs the following OS: 
    Linux apalis-imx6 4.1.44-2.7.4+gb1555bf #1 SMP Wed Oct 4 22:39:51 UTC 2017 armv7l GNU/Linux
  • Linux Host 2 (on which we saw the failures) is a Verdin iMX8M Plus. It is based on the i.MX 8M Plus Quad SoC and runs the following OS:
    Linux verdin-imx8mp 5.4.91-06394-g82e97870feb6-dirty #3 SMP PREEMPT Fri Apr 30 11:42:05 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux

We have also seen the same issues with the laptop PCs of two of our software developers. However, since they need their PCs for work, we did not run automated test cycles on these machines:

  • Lenovo T14 Gen 1 (AMD)
    Linux fedora 5.11.15-200.fc33.x86_64 #1 SMP Fri Apr 16 13:41:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Lenovo T470s
    Linux philippe-pc 5.10.32-1-MANJARO #1 SMP Wed Apr 21 14:54:10 UTC 2021 x86_64 GNU/Linux

igorpadykov
NXP Employee

Hi Peter

> We see the issue only with Linux host PCs, but not with Windows hosts.

It seems the issue is caused not by the i.MX 8 processor but by the Linux computer. One can try several Linux computers and check whether good-quality USB cables are used.

Best regards
igor

peterlischer
Senior Contributor I

Hi @igorpadykov ,

Thank you for your answer. In the meantime, we did further tests with other host PCs. We have seen some dependency on the host PC used. On some Linux computers, the issue appears more often than on others. It also happens more often if the i.MX 8QM is plugged in behind a USB hub, but it also happens when we plug the board directly into the computer's USB port.

We can reproduce the issue more easily if the USB cable is plugged in after the board is powered up or if the USB cable is replugged. But we definitely also see the issue after a reset or power cycle.

I found an erratum that could be related: ERR050053 ROM: USB HID device cannot be re-enumerated successfully after an unplug/plug USB cable operation. The erratum sounds very close to the issue we also see on reset or power cycles. Could it be that the erratum also affects these procedures?

The issue is crucial for us since we see the enumeration issues on our i.MX 8QM and i.MX 8QXP based products. For automated production testing and programming, it is important that the USB gets enumerated reliably.

igorpadykov
NXP Employee

Additional answer from the team:

-------------------

Below from the ROM team:

"Regarding this issue, we have conducted debugging, and reproduced the two situations you mentioned as follows:

1. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated.
A: There is a CRC error in the packet sent by the host, so the host resets and enumerates again. Because the bad packet is sent by the host and the host actively re-enumerates, the ROM code cannot control this.
2. The issue appears more often when using the reset button.
A: If the device is reset during enumeration, the ROM code restarts from the beginning. Obviously, it does not respond to the host at this time, so the host must re-enumerate.

We found that in both cases, the enumeration can be successful in the end.
Do you see the same phenomenon (enumeration can be successful in the end) as ours?"

-------------------
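
For readers unfamiliar with the CRC errors mentioned in this answer: USB data packets carry a 16-bit CRC over their payload, and a receiver that computes a different value discards the packet, so the transfer has to be retried. The sketch below shows the standard CRC-16/USB computation (polynomial 0x8005 processed LSB-first, initial value 0xFFFF, final inversion) and how a single flipped bit changes the result. The function name `crc16_usb` and the example payload are illustrative choices, not taken from the attached protocol logs.

```python
def crc16_usb(data: bytes) -> int:
    """CRC-16/USB: reflected polynomial 0xA001, init 0xFFFF, final XOR 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF


# Illustrative payload: a standard GET_DESCRIPTOR(Device) setup request
# asking for 64 bytes (chosen for this example, not captured from the thread).
payload = bytes.fromhex("8006000100004000")

# Simulate a single-bit error on the wire.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(hex(crc16_usb(payload)), hex(crc16_usb(corrupted)))
```

The two printed values differ, which is how the receiving side notices the corruption. The CRC shows that a packet was damaged but not whether the damage came from the host, the cable, or the device's receiver.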

 

igorpadykov
NXP Employee

Additional feedback from the team:

---------------------------

We are still waiting for the customer's i.MX 8 chip version and whether they used the USB 2.0 controller or the USB 3.0 controller.

We see a CRC-type error; the USB protocol log is attached. The customer needs to double-check whether it is the same error they encountered.

---------------------------

marcelziswiler
Senior Contributor I

> "Regarding this issue, we have conducted debugging, and reproduced the two situations you mentioned as follows:
>
> 1. Sometimes it takes several seconds and multiple attempts until the USB port is enumerated.
> A: There is a CRC error in the packet sent by the host, so the host resets and enumerates again. Because the bad packet is sent by the host
> and the host actively re-enumerates, the ROM code cannot control this.

But why exactly is there a CRC error in the first place? We really don't think this can simply be blamed on the host, as the exact same host works without any such issues with non-SCFW-based i.MX chips in serial download mode, both older ones (e.g. i.MX 6, 6ULL and 7) and later ones (e.g. i.MX 8M Mini and Plus).

> 2. The issue appears more often when using the reset button.
> A: If the device is reset during enumeration, the ROM code restarts from the beginning. Obviously, it does not respond to the host at this time, so the host
> must re-enumerate.

Sure, but we are not talking about the initial enumeration, we are talking about it having to re-enumerate over and over sometimes even leading to no successful enumeration at all.

> We found that in both cases, the enumeration can be successful in the end.

While so far we have not seen otherwise on the NXP MEK reference boards, we do need to conduct more (automated) testing to confirm this.

> Do you see the same phenomenon (enumeration can be successful in the end) as ours?"

No, we see cases where the enumeration really fails and we do need to either power-cycle or reset the board again to trigger a complete re-try in order to ever get to an enumerated state.

As a background, we do have our automated testing infrastructure where all of this can be tried automatically millions of times. And as indicated above, our setup works very reliably using i.MX 6, 6ULL, 7, 8M Mini and 8M Plus while we see lots of problems with i.MX 8 and 8X aka the SCFW based boards.

> We are still waiting for the customer's i.MX 8 chip version

Given our early access partner status, we really do have any and all different versions of your chips. And while we saw this issue much more often on the initial ones (e.g. i.MX 8 A0 silicon as well as i.MX 8X A0 and B0 silicon), the issue persists, though at a somewhat lower rate, with later chip versions (e.g. i.MX 8 B0 silicon and i.MX 8X C0 silicon).

> and whether they used the USB 2.0 controller or the USB 3.0 controller.

We really tried any and all such variants without seeing any significant difference.

> We see a CRC-type error; the USB protocol log is attached. The customer needs to double-check whether it is
> the same error they encountered.

Yes, this is definitely also something we are seeing when hooking up our USB analyser. But the test results do not seem consistent, meaning this does not seem to be the only issue at hand.

Anyway, we suspect that the boot ROM may only do limited tuning of the USB settings as compared to the full USB stack running later in Linux, where we are not seeing any such USB communication issues at all. We are also in the process of running full USB 3.0 compliance testing with our hardware to confirm this. Could you please comment on or confirm this?

BTW: We also noticed that NXP seems to keep further tuning even the regular Linux USB stack (e.g. in the latest NXP BSP 5.10.9-1.0.0). However, we do not have any visibility into what/why exactly NXP is doing this...

igorpadykov
NXP Employee

I asked internally and got the answer below:

-------------------

Which version of the i.MX 8QXP or i.MX 8QM chip did the customer use?

What is the reproduction rate on the i.MX 8QM or i.MX 8QXP? What I need here is the rate when the board is newly powered on every time, not reset.

And since the customer said there is no issue with a Windows host PC, please share a USB protocol log captured with a Windows host PC.

-------------------

 

marcelziswiler
Senior Contributor I

>>I found an erratum that could be related: ERR050053 ROM: USB HID device cannot be
>>re-enumerated successfully after an unplug/plug USB cable operation. The erratum sounds very
>>close to the issue we also see on reset or power cycles. Could it be that the erratum also affects
>>these procedures?

>sorry I do not think so.

We were not asking what YOU think about this. We are asking whether or not you can connect us to anybody within NXP who actually does know what exactly this ERR050053 is about!

>>On some Linux computers, the issue appears more often than on others.

>this proves that issue is not caused by i.MX8, but linux host.

No, the only thing this proves is that you obviously have no clue what you are talking about!

>Also situation when usb devices work fine in windows and not in linux is well known
>as described for example below:

And this proves that you are not even any good at googling relevant stuff! An i.MX SoC in serial download mode has absolutely nothing to do with any USB-serial adapters or anything!

We would really appreciate it if, rather than giving stupid answers, you would a) admit that you do not know anything about it and b) forward our queries to actual NXP R&D personnel who do know what they are talking about. Thanks!