LPC55S69 USB device receives corrupt data

mditto1 · ‎01-27-2021

Hello, we need some help solving a USB data corruption problem on the
LPC55S69 (or perhaps the problem is on the i.MX8M Quad).

We are building a system that includes an i.MX8M acting as a USB host
connecting to an LPC55S69 acting as a high speed USB device. We see USB
messages sent from the host and received by the LPC55S69 that are
corrupted by the time they are stored in RAM. I have used a Beagle USB
480 analyzer to capture the bus traffic and it sees intact data,
suggesting that the corruption is happening in the LPC55S69. However,
this corruption never happens when an x86 laptop is used as the host,
and happens very frequently when the i.MX8M is the host. Using the x86
PC as a host I can send (and receive the replies to) hundreds of
thousands of consecutive messages without error, but using the i.MX8M as
a host I get at most one or two correct transfers before a failure. I
have not been able to identify a specific difference between the
messages sent by the PC and the ones sent by the i.MX8M.

We originally encountered the problem with data packets of the DFU
protocol, but have reduced our code in the MCU to a simple
vendor-defined "echo" protocol to allow the problem to be easily
reproduced. It simply receives a message into a buffer and then allows
the host to request the same buffer to be sent back. The CPU does not
read or write the buffer. The host side software for this test is a
small Python 3 program.
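
For readers without the attachments, the host side of such an echo test can be sketched roughly as follows. This is only an illustration, not the attached program: the request codes `REQ_ECHO_WRITE` and `REQ_ECHO_READ` are hypothetical placeholders (the real values are in the attached patch), and `LoopbackDevice` is a stand-in so the sketch runs without hardware. With real hardware, `dev` would be a PyUSB device, for example from `usb.core.find(idVendor=0x1FC9, idProduct=0x0094)`.

```python
# Hypothetical vendor request codes; the real ones are defined by the patch.
REQ_ECHO_WRITE = 0x01
REQ_ECHO_READ = 0x02

# bmRequestType values: vendor request, device recipient.
VENDOR_OUT = 0x40  # host-to-device
VENDOR_IN = 0xC0   # device-to-host


def echo_roundtrip(dev, payload):
    """Send payload with a control OUT transfer, then read it back."""
    written = dev.ctrl_transfer(VENDOR_OUT, REQ_ECHO_WRITE, 0, 0, payload)
    if written != len(payload):
        raise IOError("short write: %d of %d bytes" % (written, len(payload)))
    reply = dev.ctrl_transfer(VENDOR_IN, REQ_ECHO_READ, 0, 0, len(payload))
    return bytes(reply)


class LoopbackDevice:
    """In-memory stand-in exposing the same ctrl_transfer call shape as PyUSB."""

    def __init__(self):
        self.buffer = b""

    def ctrl_transfer(self, bmRequestType, bRequest, wValue=0, wIndex=0,
                      data_or_wLength=None, timeout=None):
        if bmRequestType == VENDOR_OUT:
            self.buffer = bytes(data_or_wLength)
            return len(self.buffer)
        return self.buffer[:data_or_wLength]


if __name__ == "__main__":
    data = bytes(range(80))
    echoed = echo_roundtrip(LoopbackDevice(), data)
    print("Good match." if echoed == data else "Failure: data does not match.")
```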

The particular form of data corruption is usually limited to the first
16 bytes of the received message, and usually affects entire 32-bit
words, but not necessarily all of the 32-bit words in the first 16
bytes. For example, we might see that only the second and fourth word
of the buffer are corrupt. Each word that is corrupt contains a copy of
some other 32-bit word later in the message.

Messages of 64 bytes or shorter do not experience this corruption. It
is easy to reproduce with 80-byte transfers and happens at least up to
512 bytes as well. The problem does not happen when the LPC55S69's
full speed interface is used - only on the high speed interface.

The problem can be reproduced using the MCIMX8M-EVKB as the host,
running the NXP reference Linux (4.9.88-imx) image with the addition of
the Python 3 "usb" module, and the LPCXpresso55S69 development board as
the device.

A typical run of the test looks like this:

root@imx8mqevk:~# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 1fc9:0094 NXP Semiconductors
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
root@imx8mqevk:~# date | ./usbecho
Found 001:002 1FC9:0094 MCU VIRTUAL COM DEMO None
Data payload is 29 bytes.
Good match.
Success: received data matched sent data.
root@imx8mqevk:~# dd if=/dev/urandom bs=80 count=1 of=/tmp/shortfile
1+0 records in
1+0 records out
80 bytes copied, 0.000155521 s, 514 kB/s
root@imx8mqevk:~# ./usbecho < /tmp/shortfile
Found 001:002 1FC9:0094 MCU VIRTUAL COM DEMO None
Data payload is 80 bytes.
Receiving...
Failure: received data does not match sent data.
< d9eeb869024c4f7fac0b9b79331f7d36e7a04b7c159513d0d2277541c523a006698188b05c3c8db46bb6ace9e5c49341c4b603177eff7dee3bac0bca1b690940ed4f333ddc64ccb8b404a109126ca82a
> ed4f333ddc64ccb8b404a109126ca82ae7a04b7c159513d0d2277541c523a006698188b05c3c8db46bb6ace9e5c49341c4b603177eff7dee3bac0bca1b690940ed4f333ddc64ccb8b404a109126ca82a

In this case you can see that the first 16 bytes of the data have not been transferred correctly - specifically, they have been replaced with a copy of the last 16 bytes of the message.
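
The pattern can be checked mechanically. The sketch below compares the sent and received buffers from the run above one 32-bit word at a time and, for each corrupt word, lists the offsets in the sent data where the same value occurs. The function name `diagnose` is made up for this illustration.

```python
SENT = bytes.fromhex(
    "d9eeb869024c4f7fac0b9b79331f7d36"
    "e7a04b7c159513d0d2277541c523a006"
    "698188b05c3c8db46bb6ace9e5c49341"
    "c4b603177eff7dee3bac0bca1b690940"
    "ed4f333ddc64ccb8b404a109126ca82a"
)
RECEIVED = bytes.fromhex(
    "ed4f333ddc64ccb8b404a109126ca82a"
    "e7a04b7c159513d0d2277541c523a006"
    "698188b05c3c8db46bb6ace9e5c49341"
    "c4b603177eff7dee3bac0bca1b690940"
    "ed4f333ddc64ccb8b404a109126ca82a"
)


def diagnose(sent, received, word=4):
    """Return (offset, received_word_hex, source_offsets) for each corrupt word."""
    report = []
    for off in range(0, min(len(sent), len(received)), word):
        got = received[off:off + word]
        if got != sent[off:off + word]:
            # Offsets in the sent data holding the same word value.
            sources = [i for i in range(0, len(sent), word)
                       if sent[i:i + word] == got]
            report.append((off, got.hex(), sources))
    return report


if __name__ == "__main__":
    for off, value, sources in diagnose(SENT, RECEIVED):
        print("offset %2d: got %s, same as sent offset(s) %s" % (off, value, sources))
```

For this capture it reports four corrupt words at offsets 0, 4, 8 and 12, each matching the sent word at offsets 64, 68, 72 and 76 respectively.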

Please find attached:

- a patch to the NXP example program lpcxpresso55s69_dev_cdc_vcom_freertos that adds the "echo" functionality. The patched program, built with MCUXpresso, runs on the LPCXpresso55S69 board.
- a Python 3 program to send a packet to the device, then read it back and check it for correctness. This program runs on x86 desktop Linux or on i.MX8M Linux.
- a Beagle USB data capture file showing a packet on its way to the device and a packet reading back a corrupted copy of the data.
- a screenshot showing the same
- two Beagle USB data capture files showing "good" transfers (no corruption), one from the same i.MX8M board and one from the x86 laptop.

Thanks,

Mike

ZhangJennie · ‎02-23-2021

Here is the workaround we received; please try it.

"

The previous workaround should fix the issue described: if echoBuffer is stored in SRAM or in USB_RAM (unaligned), the allocation and memcpy will be done for all packets (chunked at the maximum packet size), so the issue should not show up.

Anyway, please find attached the patch with the fix (patch for SDK 2.9.0).
Replace the file under <project folder>\usb\device\source\lpcip3511\usb_device_lpcip3511.c

The problem was that, if the buffer size is not a multiple of the maximum packet size, the lpcip3511 driver allocates USB dedicated RAM to receive data from the host when the second-to-last packet transfer is done. The previous code primed the next buffer first (allocating USB dedicated RAM if needed) and only then did the memcpy according to the variable epPacketCopyed, so epState->epBufferStatusUnion[odd].epBufferStatusField.epPacketCopyed was set to 1 before the memcpy ran. Since the control endpoint does not have a double buffer, the variable odd is always 0. Hence, the previous code did the memcpy from a random USB dedicated RAM area to echoBuffer, and echoBuffer was corrupted. The fix is to do the memcpy first and then prime the next buffer for the control endpoint.

This will be included in the upcoming SDK release. I confirmed the patch is working; please test it as well.

With the patch, the user can store echoBuffer in USB_RAM (aligned).

"

Hope this helps,

Jun Zhang
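
The ordering problem described in the quoted explanation can be illustrated with a toy model. This is not the lpcip3511 driver code: it models only a single status slot (odd is always 0) with a flag standing in for epPacketCopyed, one bounce buffer standing in for the USB dedicated RAM, and a destination standing in for echoBuffer. Two things are assumptions made for the sketch: the erroneous copy length is taken to be the size of the bounce allocation (chosen to match the 16 corrupt bytes reported), and the initial bounce contents are passed in as the parameter `stale`, since the thread does not explain how that RAM came to hold the last 16 bytes.

```python
MAX_PACKET = 64  # high speed control endpoint packet size, per the thread


def receive(message, copy_before_prime, stale=b""):
    """Toy model: return what ends up in the destination buffer."""
    dest = bytearray(len(message))                      # stands in for echoBuffer
    bounce = bytearray(stale.ljust(MAX_PACKET, b"\0"))  # USB dedicated RAM
    chunks = [message[i:i + MAX_PACKET]
              for i in range(0, len(message), MAX_PACKET)]
    if not chunks:
        return bytes(dest)
    state = {"copied": False, "size": 0}                # "copied" ~ epPacketCopyed

    def prime(length):
        # A short packet is received through the bounce buffer.
        state["copied"] = length < MAX_PACKET
        state["size"] = length

    def copy_out(offset, length):
        if state["copied"]:
            n = min(length, state["size"])              # assumed copy length
            dest[offset:offset + n] = bounce[:n]

    prime(len(chunks[0]))
    offset = 0
    for i, chunk in enumerate(chunks):
        # "Hardware" stores the packet where the endpoint was primed to put it.
        if state["copied"]:
            bounce[:len(chunk)] = chunk
        else:
            dest[offset:offset + len(chunk)] = chunk
        next_len = len(chunks[i + 1]) if i + 1 < len(chunks) else 0
        if copy_before_prime:          # patched order: memcpy, then prime
            copy_out(offset, len(chunk))
            if next_len:
                prime(next_len)
        else:                          # previous order: prime, then memcpy
            if next_len:
                prime(next_len)
            copy_out(offset, len(chunk))
        offset += len(chunk)
    return bytes(dest)


if __name__ == "__main__":
    msg = bytes(range(80))
    bad = receive(msg, copy_before_prime=False, stale=msg[64:])
    good = receive(msg, copy_before_prime=True, stale=msg[64:])
    print("previous order intact:", bad == msg)
    print("patched order intact: ", good == msg)
```

In this model a 64-byte or 29-byte message survives either ordering, while an 80-byte message has its first 16 bytes overwritten from the bounce buffer under the previous ordering only.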


rijn · ‎05-05-2023

Hi Jennie, I am getting the same problem with the LPC54S018 and USB-HS on the current latest SDK, 2.13.0.

With a PC everything is fine, but when the hardware is connected to an Android tablet, the same problem as in this thread happens. The Android tablet works fine with an STM32 running the same source code.

rijn · ‎05-05-2023

I ran a test with #undef USB_DEVICE_VALUE_RETURN_VALUE_CHECK and now it's better.

The problem still happens on and off, but it does not error out as often as before.

mditto1 · ‎01-28-2021

I wrote:

The particular form of data corruption is usually limited to the first
16 bytes of the received message,

I should clarify that this is when using an 80-byte message. Presumably this has something to do with the 64-byte packet size on high speed control transfers, resulting in the last 16 bytes being sent in a second data packet.
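
As a quick illustration of that split, the sketch below computes the data packet sizes for a transfer of a given length. The helper name `split_into_packets` is made up for this example; the 64-byte default is the high speed control packet size mentioned above.

```python
def split_into_packets(length, max_packet=64):
    """Return the sizes of the data packets needed for `length` bytes."""
    sizes = []
    while length > 0:
        size = min(length, max_packet)
        sizes.append(size)
        length -= size
    return sizes


if __name__ == "__main__":
    for n in (29, 64, 80):
        print(n, split_into_packets(n))
```

An 80-byte message becomes one 64-byte packet followed by one 16-byte packet, while the 29-byte and 64-byte cases fit in a single packet.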

ZhangJennie · ‎02-01-2021

From your description, this problem only happens when the message is longer than 64 bytes, because you said "Messages of 64 bytes or shorter do not experience this corruption."

If you don't use the i.MX8M to send the data, are messages longer than 64 bytes still corrupted? We need to identify whether the problem is on the LPC or the i.MX8M.

Have a nice day,

Jun Zhang

mditto1 · ‎02-01-2021

Messages of any size from a PC are received correctly on the LPC.

Thanks,

Mike

ZhangJennie · ‎02-02-2021

So the problem only occurs with data from the iMX8M. Could you capture the same data sent from the iMX8M and from the PC and compare them with the USB analyzer? What is the difference?

mditto1 · ‎02-02-2021

I attached Beagle capture files to my initial report above. You can view them with the "data center" software from https://www.totalphase.com/products/data-center/.

I did not see any great difference between the successful and corrupted runs. In both cases, the 80-byte transfer is broken into a 64-byte packet and a 16-byte packet as expected (high speed control transfers have a maximum packet size of 64 bytes). The 64-byte packet is re-transmitted one time, and the 16-byte packet is transmitted only once. The data in the first packet is correct (in the initial transmission and in the re-transmission) when captured by the analyzer, but in the LPC memory it is corrupt only in the i.MX8M case. The only thing I noticed was minor timing differences. I'm attaching screenshots showing these two test runs here.

Mike

ZhangJennie · ‎02-05-2021

The problem is most likely on the iMX8M side. What do you think? Please let me know if you disagree.

Can you check why your iMX can't generate the same USB waveform as the PC?

mditto1 · ‎02-05-2021

It's very difficult to say. Clearly the problem only happens when the iMX8M is involved.

But consider this:

Look at evk-evk-bad.png. At index 43, a "good" transfer of 80 bytes goes from host to device. At index 74, the transfer in the opposite direction shows corrupt data. Since the host->device transfer is expanded in this display, we can look at the individual packets. Index 48 shows the first attempt to send bytes 0-63, and this attempt failed (NAK at index 51). Index 52 shows the second attempt to send those same bytes: the OUT packet at index 58 shows the bytes are not yet corrupted, and the NYET packet at index 59 shows the LPC received the packet. We assume that those bytes 0-63 were stored correctly in LPC memory at that point.

But the data transfer is not done yet - it's an 80-byte transfer so there are still 16 more bytes to send. At index 60 there is another OUT transaction with those 16 bytes. At index 66 we see the exact values of those 16 bytes, and the acknowledgement at index 67 (and ACK at index 72 for the entire control transfer). We assume that the final 16 bytes were stored correctly in LPC memory at that point.

But how, exactly, did those 16 bytes ALSO get stored at the beginning of the RAM buffer? Those bytes appeared on the wire exactly once (index 66). But they appear in RAM in two places (offset 0-15 and offset 64-79). Why did the LPC's USB hardware and DMA engine store those bytes in RAM twice?

This mystery suggests a flaw on the LPC side. But it does not explain why the problem happens only when the iMX8M is the host.

To reiterate, here are the bytes the iMX8M sent on the wire:

D9EEB869 024C4F7F AC0B9B79 331F7D36
E7A04B7C 159513D0 D2277541 C523A006
698188B0 5C3C8DB4 6BB6ACE9 E5C49341
C4B60317 7EFF7DEE 3BAC0BCA 1B690940
ED4F333D DC64CCB8 B404A109 126CA82A

And here are the bytes that appeared in RAM on the LPC:

ED4F333D DC64CCB8 B404A109 126CA82A <-- why duplicated?
E7A04B7C 159513D0 D2277541 C523A006
698188B0 5C3C8DB4 6BB6ACE9 E5C49341
C4B60317 7EFF7DEE 3BAC0BCA 1B690940
ED4F333D DC64CCB8 B404A109 126CA82A <-- why duplicated?

I think this should be investigated on the LPC side by someone who understands the USB1 high speed receive and DMA hardware.

Thanks,
Mike