Hi Edward,
I wonder if the root cause may be in the way the gadget driver reclaims (re-uses) the descriptor memory.
First, a quick overview of how the controller operates.
The USB controller operates on a linked list of transfer descriptors (dTDs). When the list is empty, new dTDs are added to the queue head.
When the list is not empty, software adds new dTDs to the last descriptor in the list by setting the next pointer to the address of the new dTD and clearing the Terminate bit at the same time. This simply extends the existing linked list.
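As a rough sketch of that append step: the struct below is my own simplification with only two fields, and the bit positions (T-bit in bit 0 of the next pointer, Active in bit 7 of the token) and the helper names dtd_init and dtd_append are assumptions for illustration, not the gadget driver's actual code.

```c
#include <stdint.h>

#define DTD_TERMINATE 0x1u   /* T-bit in the next pointer (assumed bit 0) */
#define DTD_ACTIVE    0x80u  /* Active bit in the token (assumed bit 7) */

/* Simplified dTD holding only the fields discussed here. */
struct dtd {
    volatile uint32_t next;   /* bus address of the next dTD, OR'ed with the T-bit */
    volatile uint32_t token;  /* status bits, including Active */
};

/* Hypothetical helper: prepare a dTD as the new end of the list. */
static void dtd_init(struct dtd *td)
{
    td->next = DTD_TERMINATE;  /* nothing follows yet */
    td->token = DTD_ACTIVE;    /* hand it to the controller */
}

/*
 * Hypothetical helper: link new_td behind the current tail.
 * new_addr stands in for the bus address of new_td.
 */
static void dtd_append(struct dtd *tail, struct dtd *new_td, uint32_t new_addr)
{
    dtd_init(new_td);
    /* A real driver would need a memory barrier here so the new dTD
     * is fully written before the controller can follow the link. */
    tail->next = new_addr & ~DTD_TERMINATE;  /* one store: new pointer, T-bit cleared */
}
```

The point of the single store is that the controller never sees a next pointer with the T-bit cleared but a stale address.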
The controller processes a dTD by copying its content to the queue head overlay area. This is the working area where intermediate results are stored. In the queue head this dTD is referenced by the current dTD pointer. When all data for the current dTD has been transferred, the controller copies the status information from the queue head back to the dTD memory. At this point, the Active bit is cleared in the dTD.
If the dTD was the last one in the list (T-bit is set), the controller re-reads the dTD to check whether SW added a new dTD to the list while the last dTD was in progress. If the T-bit in the next pointer is no longer set, the controller uses the new next pointer to load the new dTD.
Software will also re-use the memory of completed dTDs. Usually SW will start at the top of the list and walk the list, checking the Active bit, until it finds a dTD with the Active bit still set, or until it finds one with the T-bit set.
If it finds one with the T-bit set, it means it has reached the end of the list.
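To make the walk concrete, here is a minimal host-side sketch. The struct, the bit positions and the function name count_completed are my own assumptions for illustration; the next_td pointer simply stands in for following the bus address in the next pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define DTD_TERMINATE 0x1u   /* T-bit in the next pointer (assumed bit 0) */
#define DTD_ACTIVE    0x80u  /* Active bit in the token (assumed bit 7) */

/* Simplified dTD as software sees it. */
struct dtd {
    uint32_t next;        /* only the T-bit is modelled here */
    uint32_t token;       /* status bits, including Active */
    struct dtd *next_td;  /* software-side link, stands in for the bus address */
};

/*
 * Walk from the top of the list and count the dTDs the controller has
 * completed. The walk stops at the first dTD that is still Active, or
 * after the dTD whose T-bit is set (end of the list).
 */
static size_t count_completed(const struct dtd *head)
{
    size_t n = 0;

    for (const struct dtd *td = head; td != NULL; td = td->next_td) {
        if (td->token & DTD_ACTIVE)
            break;              /* controller has not finished this one */
        n++;
        if (td->next & DTD_TERMINATE)
            break;              /* reached the end of the list */
    }
    return n;
}
```

Note that when every dTD has completed, this walk also counts the last one in the list, the one with the T-bit set.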
The issue that can occur is that there may be some time between the controller writing the status back to the dTD and the controller re-reading the dTD. If software re-uses that memory before the controller has re-read the dTD, the dTD may no longer contain valid data and the controller can fail with a bus error. This is a non-recoverable error.
The solution to this is to not remove the last completed dTD until a new dTD is added to the queue head.
This is not actually a bug in the controller. The last dTD is the current dTD for the controller and as long as that is the case, the dTD memory should not be re-used.
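One way to express that rule in the reclaim walk is sketched below. Again, the struct, the bit positions and the name count_reclaimable are assumptions of mine for illustration and not the driver's real code: the walk stops before a completed dTD that still has the T-bit set, so that dTD stays allocated until something has been linked behind it.

```c
#include <stddef.h>
#include <stdint.h>

#define DTD_TERMINATE 0x1u   /* T-bit in the next pointer (assumed bit 0) */
#define DTD_ACTIVE    0x80u  /* Active bit in the token (assumed bit 7) */

/* Simplified dTD as software sees it. */
struct dtd {
    uint32_t next;        /* only the T-bit is modelled here */
    uint32_t token;       /* status bits, including Active */
    struct dtd *next_td;  /* software-side link, stands in for the bus address */
};

/*
 * Count the dTDs at the top of the list whose memory may be re-used.
 * A completed dTD with the T-bit set is held back: it is still the
 * controller's current dTD and may be re-read at any time.
 */
static size_t count_reclaimable(const struct dtd *head)
{
    size_t n = 0;

    for (const struct dtd *td = head; td != NULL; td = td->next_td) {
        if (td->token & DTD_ACTIVE)
            break;              /* still in progress */
        if (td->next & DTD_TERMINATE)
            break;              /* last in list: keep until a new dTD is linked */
        n++;
    }
    return n;
}
```

With a list of three completed dTDs, this returns 2 rather than 3. The held-back dTD becomes reclaimable on a later pass, once a new dTD has been appended and the T-bit cleared.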
I'm not sure if this is actually the problem, but there is a fair chance.
My software colleague pointed me to this link for the gadget driver. It may not be suitable for your Linux version, but I guess it can at least serve as an example.
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/patch/drivers/usb/chipidea/core...
Best regards,
Richard