g_serial serial gadget issue

kef2
Senior Contributor IV

Hi

I need to send 10-30 byte packets over /dev/ttyGS to a Windows host PC. The time gaps between adjacent packets are random, and the maximum packet rate is currently quite low, about 400 Hz.

The Linux write() function is used to send data. The problem is that the gadget may stop sending anything to the host unless I insert a usleep(150) call between adjacent write() calls. Data transfer from the host PC to Vybrid is always fine. The write problem arises on both 1N02G and 2N02G maskset chips. Am I alone in having this issue?

My very first application code was write()-ing packets byte by byte, one byte per write() call. The serial gadget stopped transmitting almost immediately. Then I modified the code to send a whole packet with a single write() call. This worked better, but was still problematic. I think the problem happens when two packets are too close to each other. Perhaps erratum e6857 applies? The 150-microsecond gap for the workaround was chosen with the 125 us high-speed USB microframe period in mind. I haven't yet tried to push the usleep() argument down; perhaps it could be lower. 150 us is OK for me. ...Still testing with 150 us and hoping it won't hang any more.

Edward

billpringlemeir
Contributor V

I think this is a common problem with g_serial and not Vybrid specific. USB is packet based, but RS232 is character based. There is some time between the last character being sent and g_serial deciding to flush the buffer and send it across the interface. I think the default is to look for CR/LF and flush on that. In your case, you probably have binary data, and the USB algorithm that creates a packet just keeps on globbing. When you add the delay, it decides there is no more data and sends the packet (your usleep() is acting like a flush()). There are probably better ways around the issue.

kef2
Senior Contributor IV

Hi Bill

Do you imply that g_serial is buggy and I can't expect it to work properly on any HW? I can't agree that the problem is the packet-based nature of USB versus the character-based nature of RS232. 200-300 packets per second of <30 bytes each is enough to make g_serial get stuck. This is not a lot.

I think it could be Vybrid erratum e6857, "Adding dTD to Primed Endpoint May Not Recognized". The first write() could initialize the USB queue head and the first dTD and initiate the transfer. The driver could then fail to add the dTD with the data from the second write() to the HW linked list. The Linux driver may keep waiting forever for the transfer to complete.

Regards

Edward

richard_stulens
NXP Employee

Hi Edward,

I wonder if the root cause may be in the way the gadget driver reclaims (re-uses) the descriptor memory.

First a quick overview of the operation of the controller.

The USB controller operates on a linked list of transfer descriptors (dTDs). When the list is empty, new dTDs are added to the queue head.

When the list is not empty, software adds new dTDs to the last descriptor in the list by setting the next pointer to the address of the new dTD and clearing the Terminate bit at the same time. This simply extends the existing linked list.
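
As a rough illustration of the two cases just described, here is a small userspace model. The struct layout and the names dtd, ep_queue and ep_queue_add are simplifications invented for this sketch and do not match the real descriptor layout or any driver; in hardware the next pointer and the T-bit share one word, which is why a single store can update both at once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a device transfer descriptor. */
struct dtd {
    struct dtd *next;   /* next pointer (a physical address in hardware) */
    bool terminate;     /* T-bit: true means "no valid next pointer" */
    bool active;        /* set by software, cleared by the controller */
};

struct ep_queue {
    struct dtd *head;   /* what the queue head points at */
    struct dtd *tail;   /* last dTD linked so far */
};

/* Queue a new dTD: start a new list if empty, otherwise extend the tail. */
void ep_queue_add(struct ep_queue *q, struct dtd *d)
{
    d->next = NULL;
    d->terminate = true;            /* the new dTD is the end of the list */
    d->active = true;

    if (q->head == NULL) {
        q->head = d;                /* empty list: hang it on the queue head */
    } else {
        q->tail->next = d;          /* set the next pointer ... */
        q->tail->terminate = false; /* ... and clear T on the old tail */
    }
    q->tail = d;
}
```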

The controller processes a dTD by copying its content to the queue head overlay area. This is the working area where intermediate results are stored. In the queue head this dTD is referenced by the current dTD pointer. When all data for the current dTD has been transferred, the controller copies the status information from the queue head back to the dTD memory. At this point, the Active bit will be cleared in the dTD.

If the dTD was the last one in the list (T-bit is set), then the controller will re-read the dTD to check if SW had added a new dTD to the list whilst the last dTD was in progress, and if the T-bit in the next pointer is no longer set, it will use the new next pointer to load the new dTD.

Software will also re-use the memory of completed dTDs. Usually SW will start at the top of the list and walk the list, checking the Active bit, until it finds a dTD with the Active bit still set, or until it finds one with the T-bit set.

If it finds one with the T-bit set, it means it has reached the end of the list.

The issue that can occur is that there may be some time between the controller writing back the status to the dTD and the controller re-reading the dTD. If software re-uses that memory before the controller has re-read the dTD, the memory may no longer hold a valid dTD and the controller can crash on a bus error. This is a non-recoverable error.

The solution to this is to not remove the last completed dTD until a new dTD is added to the queue head.

This is not an actual bug. The last dTD is the current dTD for the controller and as long as that is the case, the dTD memory  should not be re-used.
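
To make the rule concrete, here is a hedged userspace model of a reclaim walk that leaves the last dTD alone. The names dtd_node and reclaim_completed and the freed flag are invented for this sketch; a real driver would return the memory to its allocator where the flag is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor used only for this model. */
struct dtd_node {
    struct dtd_node *next;
    bool terminate;     /* T-bit: this is the last dTD in the list */
    bool active;        /* cleared by the controller on completion */
    bool freed;         /* model only: memory was handed back */
};

/* Walk from head and release completed dTDs, but stop at the dTD whose
 * T-bit is set even if it has completed, because the controller may
 * still re-read it. Returns the new head of the list. */
struct dtd_node *reclaim_completed(struct dtd_node *head)
{
    while (head != NULL && !head->active && !head->terminate) {
        struct dtd_node *done = head;

        head = head->next;
        done->freed = true;     /* a driver would free the memory here */
    }
    return head;
}
```

Once a new dTD has been linked behind the kept one, its T-bit is clear and the next call to reclaim_completed() releases it.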

I'm not sure if this is actually the problem, but there is a fair chance.

My software colleague pointed me to this link for the gadget driver. This may not be suitable for your Linux version, but I guess it can at least serve as an example.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/patch/drivers/usb/chipidea/core...

Best regards,

Richard

kef2
Senior Contributor IV

Richard,

thank you very much for the nice explanation. It is a pity that such details seem to be missing from the Vybrid RM.

Could you please explain how to estimate how long it may take from the moment the controller clears the Active bit in the dTD until the dTD I/O complete interrupt is pending, and also how long it takes from the Active bit being cleared until the controller re-reads the dTD?

Timesys Linux /drivers/usb/gadget/fsl_udc_code.c seems to use dma_pool_alloc() and dma_pool_free() to allocate and free dTD memory. If I understood you properly, software should keep the last used dTD memory untouched, and instead use a different piece of memory for the next dTD to be filled and added to the linked list. The Linux driver does not follow this, so if the time between the dTD Active bit going 1->0 and the controller re-reading the dTD is long enough, the last dma_pool_free() call and the new dma_pool_alloc() may refer to the same physical memory, and the issue you mentioned may take place.

Thanks

Edward

richard_stulens
NXP Employee

Edward,

The delay is primarily determined by the internal memory bus usage of other bus masters.

After the controller writes the dTD update, it has to re-arbitrate for memory bus access. When there are higher-priority masters, they will get the bus first. There may be transfers in progress, for example from the LCD controller, which may take several microseconds before they release the bus.

So it is system dependent, but it can easily be tens of microseconds.

For the last dTD, the recommendation to not re-use the last dTD is just an easy way to avoid reading a corrupted dTD.

After the controller has re-read the memory, it will not access the dTD anymore, but technically the controller still has a pointer to that memory.

If the dTD is replaced with a new dTD before the re-read, then there will not be a problem. It will just continue as if the linked list was extended. The problem is that once the memory is freed, some other process or driver can allocate that memory and use it for something unrelated. That's when a crash can occur.

So, we leave the last dTD allocated until a new dTD is put on the queue head. Then we can free the old dTD memory.

Best regards,

Richard

kef2
Senior Contributor IV

Hi Richard,

Sorry for the delay. It looks like the SDK on my PC got broken, and it took some time to figure out why the kernel refused to compile.

The Timesys kernel driver file /drivers/usb/gadget/fsl_udc_code.c, which I mentioned previously, is actually not included in the kernel build tree. It seems to be an older version of /drivers/usb/gadget/arcotg_udc.c. This newer driver has some #ifdef POSTPONE_FREE_LAST_DTD blocks, which should fix the issue you mentioned.

POSTPONE_FREE_LAST_DTD in the Timesys kernel is enabled only for MX5. I tried defining it for Vybrid as well. Perhaps it is useful for busy systems, but unfortunately it doesn't seem to solve my issue. A row of write( , , 1) calls is still very likely to break the serial gadget.

Thanks

Edward

P.S. To enable POSTPONE_FREE_LAST_DTD one needs to modify line 49 of /drivers/usb/gadget/arcotg_udc.h :

@@ -46,7 +46,7 @@
 #define NEED_IRAM(ep) ((g_iram_size) && \
   ((ep)->desc->bmAttributes == USB_ENDPOINT_XFER_BULK))
-#ifdef CONFIG_ARCH_MX5
+#if defined(CONFIG_ARCH_MX5) || defined(CONFIG_ARCH_MVF)
 #define POSTPONE_FREE_LAST_DTD
 #else
 #undef POSTPONE_FREE_LAST_DTD

richard_stulens
NXP Employee

Hi Edward,

Thanks for posting the fix for the postpone patch.

If you have time to capture endpoint data at the time of the error, we can have a look at what might be happening.

On the other hand, since it is based on Timesys Linux, this should probably go to Timesys support.

Anyway, if you would like to do this, please dump:

  • the queue head for the endpoint
  • the descriptor that is pointed to as current
  • the descriptor that should be next but did not get executed
  • the controller registers

I probably don't have to tell you, but just in case, the USB controller uses physical addresses and the Linux drivers use logical addresses.

For ease of analysis, please try to dump the physical addresses as well.

Best regards,

Richard

kef2
Senior Contributor IV

Hi Richard,

I can't capture this data right now. I hope to try in 1-2 months.

Thanks and Regards,

Edward

sanchayanmaity
Contributor III

Hello Edward,

I came across your post while researching my own issue on the net. The use case I have actively worked on is gadget Ethernet, though.

We also have a Vybrid module and had issues with USB gadget Ethernet. I did test USB storage and serial, but not the way you did. It worked for me in my very limited test case. Recently, after some testing, we arrived at a reliable fix.

The patches for it were sent just a while back:

https://lkml.org/lkml/2014/12/19/107

Maybe this will help you; if it does, please report back if possible. I'm not sure which kernel version you are using or where it comes from, but we did this with a recent 3.18 kernel.

On a different note, with newer kernels you can use ConfigFS for USB gadget functionality.

Linux/Documentation/usb/gadget_configfs.txt - Linux Cross Reference - Free Electrons

https://wiki.tizen.org/wiki/USB/Linux_USB_Layers/Configfs_Composite_Gadget/Usage_eq._to_g_mass_stora...

Regards,

Sanchayan.

kef2
Senior Contributor IV

Hello Sanchayan,

thanks for the reply. At first I thought you were pointing me to a patch for the kernel gadget sources, which could help. But it seems to be a patch for /drivers/usb/chipidea, which is not present in Timesys Linux. Is there a patch for the g_serial sources? Thanks

Regards,

Edward

sanchayanmaity
Contributor III

Hello Edward,

May I ask which kernel version you are using? Are you using the Timesys 3.0 or 3.13 release? These are the two releases I know of.

I cannot be sure, as I didn't test the USB serial gadget thoroughly, but I believe this is a problem with the core Chipidea driver. Based on my testing, it seems Vybrid needs a software workaround for an erratum that is observed only on Vybrid, because it uses the 2.40a version of the core. It is not observed on i.MX devices, as they use the 2.50 version. You can refer to the spinics mailing list link on that lkml page to see the discussion between me and Peter Chen. The gadget function implementations themselves are fine and probably have no issue. Depending on the kernel version you are using, you will have to modify the patch, as the chipidea driver source was cleaned up and unified starting from 3.14 onwards, I believe. I will also check the fix on the 3.0 kernel version, but it will be a while before I get to that.

Regards,

Sanchayan.

kef2
Senior Contributor IV

Hello Sanchayan,

It is Timesys 3.0.15.

Well, it sounds a bit cryptic, since I have no idea what Chipidea is or how it relates to Vybrid. Timesys 3.0.15 Linux uses a USB device driver stored in /drivers/usb/gadget, not in /drivers/usb/chipidea. Anyway, thanks for the help.

Regards,

Edward

sanchayanmaity
Contributor III

Hello,

Chipidea is the name of the IP core which Freescale's Vybrid and i.MX implement. It's basically the name of the company that originally designed the IP core. If the core is the same, the driver will be the same. Kernel development happens at a much faster pace, hence the difference you see between 3.0 and 3.18.

Can you modify the driver in the 3.0 tree with the diff at the link below?

http://www.spinics.net/lists/linux-usb/msg118786.html

If it does not apply cleanly, make the changes by hand. See if that fixes the issue for you. The fix was posted by Matthieu and seems to refer to the ci13xxx driver, which was the Chipidea driver before the cleanup happened. My patch was based on that diff, but with the changes made for the 3.18 source tree. If it looks confusing, let me know and I will write up a rough changeset here which you can use.

Regards,

Sanchayan.

kef2
Senior Contributor IV

Hello Sanchayan,

Thank you very much for the help. Finally I see the diff for the patch you mentioned :-). I was unable to find it previously.

From your patch and the discussion you mentioned, it is clear that what helped you is a workaround for Vybrid erratum e6857. Looking at the kernel sources, it seems that e6857 is not handled in the Timesys 3.0.15 kernel. Unfortunately the 3.0.15 and 3.1x drivers differ a lot, and I can't patch 3.0.15 with the existing 3.1x patches. It is not clear to me how best to fix the 3.0.15 driver. For now I think the best approach is to look for active dTDs and reprime in the arcotg_udc.c done() routine. I hope to try fixing it later.
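
The "look for active dTDs and reprime" idea can be sketched as a small userspace model. This is not code from arcotg_udc.c or from the patch linked above; ep_model, td_model and reprime_if_stalled are hypothetical names, and the hw_primed flag merely stands in for whatever status bit the controller exposes. A real driver would also have to point the queue head at the pending dTD before priming, which is left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one endpoint; not the driver's data structures. */
struct ep_model {
    bool hw_primed;         /* stands in for the endpoint status bit */
    int reprime_count;      /* model only: how often a reprime was needed */
};

struct td_model {
    struct td_model *next;
    bool active;            /* still waiting to be executed */
};

/* Meant to be called from the completion path: if a dTD is still marked
 * active while the hardware reports the endpoint as not primed, assume
 * the controller missed it and prime the endpoint again.
 * Returns true if a reprime was issued. */
bool reprime_if_stalled(struct ep_model *ep, const struct td_model *list)
{
    for (const struct td_model *td = list; td != NULL; td = td->next) {
        if (!td->active)
            continue;               /* already completed */
        if (ep->hw_primed)
            return false;           /* controller is still working on it */
        ep->hw_primed = true;       /* a driver would write the prime register */
        ep->reprime_count++;
        return true;
    }
    return false;                   /* nothing pending */
}
```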

Is there a right place to look up the IP core vendors, names and core versions used in Vybrid? How else could I know in advance that, for example, the kernel /drivers/usb/chipidea folder contains Vybrid drivers? Thanks

Regards,

Edward

sanchayanmaity
Contributor III

Hello Edward,

The diff at the link I gave you in the last post can be applied to the 3.0 series.

If you have a look at the diff, it applies at line 1529. Your source should probably be the same. The git link below is to our older BSP, which was based on the 3.0 Timesys kernel. You should be able to apply the diff, I believe, or maybe just make the changes by hand. I will check that it works on 3.0 soon as well, but I will be testing gadget Ethernet, as it is simpler to test with iperf. Though frankly, I haven't looked for any automated serial testing tools.

http://git.toradex.com/cgit/linux-toradex.git/tree/drivers/usb/gadget/ci13xxx_udc.c?h=colibri_vf#n15...

Addendum: OK, I was partly wrong. I was under the impression that the ci13xxx driver was being used in the 3.0 version, but it seems that is not the case. The arcotg_udc driver is used. This will require a bit of looking into, but the gist of the fix should be the same, I believe. It seems the 3.0 version is really unclean, having multiple drivers for the same IP. The fix was reported by Matthieu Castet, and he seems to be using the ci13xxx driver, while Timesys 3.0 uses the arcotg_udc driver. This will take a bit of time. I will try backporting the fix to arcotg_udc and get back to you if I fix it.

As far as recent kernels go, the driver being used can be found from the device tree descriptions in

http://git.toradex.com/cgit/linux-toradex.git/tree/arch/arm/boot/dts?h=toradex_vf_3.18-next

In older kernels, these can be traced down with board files in

http://git.toradex.com/cgit/linux-toradex.git/tree/arch/arm/mach-mvf?h=colibri_vf

If you want anything more cleared up, let me know.

Regards,

Sanchayan.

kef2
Senior Contributor IV

Hello Sanchayan,

Sorry for the delay and thanks for the response. I'd rather upgrade to a 3.1x kernel. It's too difficult for me to port the 3.1x gadget driver to the 3.0x kernel.

Thanks and regards.

Edward

jackblather
Senior Contributor I

I am also starting to use the g_serial loadable module for VCOMM to a PC. I have not yet encountered the problems that you have, so I'm interested in bug reports and solutions/workarounds.

naoumgitnik
Senior Contributor V

Hello Edward,

I tried to discuss this issue with the Vybrid IC design team, but it is a bit difficult for them to understand it without having it broken down to some block level.

Our assumption is that it is related to the USB block; is that correct?

Regards, Naoum Gitnik.

kef2
Senior Contributor IV

 

Hello Naoum,

thanks for your comment and sorry for the very short description. "Serial gadget" is the Linux USB device function for a virtual COM port. It allows plugging a Vybrid Linux board into a USB port on a host PC and communicating with Vybrid through a virtual COM port.

I thought it must be popular among developers and widely used. But I searched the Vybrid forums and found just a single thread asking about g_serial usage specifics. There's no thread mentioning anything similar to my issue. And the issue is:

The Vybrid Linux box receives some event and makes a write() call to send the event data to /dev/ttyGS0. The maximum average event frequency is less than 10 kHz, but there may be bursts of 2-5 events at up to ~100 kHz. The Windows PC should read the event data from the virtual COM port. The problem is that the serial gadget may stop sending messages until someone replugs the USB cable. Sending data back from the host to Vybrid always works fine. Once the serial gadget stops sending data to the host:

- it doesn't help to restart the software that talks to the virtual COM port on the Windows PC

- it doesn't help to restart the software that talks to /dev/ttyGS0 on Vybrid Linux

- it helps to replug the USB cable

Debugging the Windows and Linux software parts didn't reveal anything dumb. Both parts keep writing and listening for data. The Windows-side SW has been tested for many years using different HW.

The first version of the Linux-side SW called write() for every byte sent to the PC. It was very easy to make the serial gadget get stuck. Then I modified the SW to send the whole event data with a single write() call. This lowered the chances of the serial gadget getting stuck, but it was still possible. This gave me the idea of putting a small delay after the write() call. A 150 us delay seems to solve the issue completely. I tried lowering the delay. At least a 50 us delay is required for robust operation.

I'm quite happy with the usleep(150) workaround, but I would like to know what causes the problem.

 

Thanks

Edward

jackblather
Senior Contributor I

Another guy is having a similar problem. He found it's related to 'blocksize'. He's found a way around it.

[Rfi] g_serial writes block with blocksize of 512 :

I am having some problems with the g_serial gadget (USB port that looks like a serial port to the host) when performing writes that are multiples of 512 bytes.

Whenever I perform a write() on the device-side file descriptor with a size of 512, 1024, or 1536 bytes the write system call eventually blocks indefinitely and nothing at all is received at the host-side. However, writes of 511, 513, 1023, 1025, seem to work fine (though I've not yet checked that the data is correct).

Device-side read() seems to work OK.

This can also be demonstrated using dd: e.g.:

# dd if=/dev/zero of=/dev/ttyGS0 bs=511 # This works

# dd if=/dev/zero of=/dev/ttyGS0 bs=513 # This works

# dd if=/dev/zero of=/dev/ttyGS0 bs=512 # This doesn't

"After a lock-up, the USB has to be unplugged and replugged to make it work again."

He doesn't have a proper solution but he's found that by avoiding certain block sizes, as demonstrated in the quote, the problem does not manifest itself.
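
A rough sketch of what "avoiding certain block sizes" could look like on the device side follows. The name write_avoiding_block and the split strategy are assumptions made for illustration, not something taken from the quoted report, and it is untested on a gadget: the tty layer may well merge the two pieces again before they reach the USB driver. For what it is worth, 512 bytes is also the maximum packet size of a high-speed bulk endpoint, though the quote does not confirm that this is the cause.

```c
#include <stddef.h>
#include <unistd.h>

#define AVOID_BLOCK 512u    /* block size reported as problematic in the quote */

/* Hypothetical helper: write len bytes, but never hand write() a length
 * that is a nonzero multiple of AVOID_BLOCK. Such a request is split into
 * (len - 1) bytes followed by 1 byte.
 * Returns the number of bytes written, or -1 on the first error. */
ssize_t write_avoiding_block(int fd, const unsigned char *buf, size_t len)
{
    size_t first = len;
    ssize_t n, total;

    if (len != 0 && len % AVOID_BLOCK == 0)
        first = len - 1;            /* e.g. 512 becomes 511 + 1 */

    n = write(fd, buf, first);
    if (n < 0)
        return -1;
    total = n;

    /* Send the remaining byte only if the first part went out in full. */
    if ((size_t)n == first && first < len) {
        n = write(fd, buf + first, len - first);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```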

I hope this helps.
