USB Bus Reset - Bug on USB Stack - SDK 2.3

取消
显示结果 
显示  仅  | 搜索替代 
您的意思是: 

USB Bus Reset - Bug on USB Stack - SDK 2.3

4,218 次查看
brunoalbrecht
Contributor III

Hello,

I have an application using MK22FN512VLH12, SDK 2.3 and KDS 3.2, FreeRTOS 9.0.0, custom board, Win 10 Pro.

Over USB (device), I implement one USB Mass Storage Device and 2 Virtual COM Ports. The MSD works as should (i.e I can write stuff to the memory and read it back, etc), the VCOM ports also work as expected.

The problem is that the USB BUS is being reset once in a while. I managed to debug the problem and realized the reset happens after a specific command is sent from the host to the MSD:

If I just connect the USB (physically), after the whole enumeration, the host keeps sending to the MSD a "test unit ready" (0x00) command (kUSB_DeviceMscEventTestUnitReady on the class callback function), for which I respond normally. Once in a while, I receive the following commands:

  1. Read Capacity (10) - 0x25 - Callback function is not called
  2. Read Capacity (10) - 0x25 - again
  3. Inquiry - 0x12 - kUSB_DeviceMscEventInquiry
  4. Log Sense - 0x4D - Unsupported command, answered directly by the USB_DeviceMscProcessUfiCommand (callback function is not called)

After the log sense, I receive a bus reset interrupt on the microcontroller. The MSD filesystem continues working on windows and if I weren't having problems, I'd never see those commands (or the bus reset). But the problem is that sometimes I have to stream data through one of the VCOM ports and if the Bus Reset happens during the streaming, the VCOM that is streaming gets stuck.

My question is: why is the bus being reset? How do I prevent that from happening?

10 回复数

3,329 次查看
mjbcswitzerland
Specialist V

Hi Bruno

I have attached a K22FN512 binary that does USB-MSD and 2 x CDC as comparison; it runs from the IRC48M and so should run on any board with this device.
If you have a USB analyser you can check the way that the class requests are handled.
The first VCOM has a command line interface on it and the second will just echo back all reception. The MSD will emulate a disk and show files with various content.

When I use this with Windows 10 I never see a LOG SENSE being requested (this is a SCSI primary request that has seemingly been added in addition to the original MSD UFI class set) so it may be that it is specific to your host (which OS do you use at the host?) or it may be a consequence of how you respond the Inquiry request(?)
In any case, if there is an unsupported request it should be responded to by a stall on that endpoint and then the stall will be cleared on the control endpoint in the standard manner - remembering that a stall is not an error but a handshake method to report of unsupported requests.

Such a bus reset is usually due to erroneous handling or the stall having problems that can't be recovered from and so a USB reset is the host's last resort (usually after a number of seconds of waiting).

You will need to check the low level USB operation to check the STALL handling.
Compare the attached binary to verify that it doesn't have the same problem (and as reference if it correctly handles a LOG SENSE).
If possible, show the analyser recording at the point of interest.


Regards

Mark

Kinetis: http://www.utasker.com/kinetis.html
Kinetis K22:
- http://www.utasker.com/kinetis/FRDM-K22F.html
- http://www.utasker.com/kinetis/TWR-K22F120M.html
- http://www.utasker.com/kinetis/BLAZE_K22.html
- http://www.utasker.com/kinetis/tinyK22.html

USB User's Guide: http://www.utasker.com/docs/uTasker/USB_User_Guide.PDF
Demos and simulation Guide: http://www.utasker.com/docs/uTasker/uTaskerV1.3_USB_Demo.PDF
USB composites: http://www.utasker.com/kinetis/USB_Device.html
USB-CDC host<->device video: https://www.youtube.com/watch?v=XhISV1czIo4&list=PLWKlVb_MqDQFZAulrUywU30v869JBYi9Q&index=16

Free Open Source solution: https://github.com/uTasker/uTasker-Kinetis
Working project in 15 minutes video: https://youtu.be/K8ScSgpgQ6M
Build with: CW10.x, KDS, MCUXpresso, IAR, Keil, Greenhills, Crossworks, CooCox, Atollic, S32 Design Studio, GNU Make and Visual Studio

For better, faster, cheaper product developments consider the uTasker developer's version, professional Kinetis support, one-on-one training and complete fast-track project solutions to set you apart from the herd : http://www.utasker.com/support.html

0 项奖励
回复

3,329 次查看
brunoalbrecht
Contributor III

Thanks Mark!

I'm using Win 10 Pro 1709

I uploaded the binary to my board and monitored the USB using Wireshark with USBPCap. I didn't connect to anything (VCOM) or played with the filesystem, I just let it run in "idle" mode.

I was able to track a Log Sense request in your application too.

Check the attached file (utasker.pcapng - usb.device_address == 1): after enumeration, I see several Test Unit Ready requests and the answers. After a while (req no. 934), there are two Read Capacity (10) requests, an Inquiry LUN and finally a Log Sense (which your application actually answers), followed by some URB_CONTROL frames and than it goes on.

I did the same in my application (file bruno.pcapng - usb.device_address == 1) and the same thing occurs (req. no. 1323), but I have more URB_CONTROL frames.

The answer of an unsupported command is what I would expect (according to the USB/UFI/SCSI doc), but it seems like it never gets to be sent:

  • Sense Key: 0x05 (UFI Illegal Request)
  • ASC: 0x20 (UFI Invalid Command Opcode)
  • ASQ: 0x00 (UFI No Sense)
0 项奖励
回复

3,328 次查看
mjbcswitzerland
Specialist V

Bruno

I don't know whether you can see everything with Wireshark but there is certainly a STALL taking place in the uTasker binary (although I don't get the log sense when I connect to my Win 10 machine I can still simulate it). Comparing the simulation result and the recording that you made:

pastedImage_1.png

The host sends the log sense in line 943.
I don't know how to interpret the lines 944..949 but I suspect it is showing a stall taking place and the stall being unstalled (although the unstall takes pace on endpoint 0 and I don't know whether this is filtered out or not).

950 is the termination of the process where the device is reporting that the previous command failed (the very last 1 in the data packet is the result - CSW_STATUS_COMMAND_FAILED, which is 1). This is exactly what I see when simulating the case and it is normal behavior for an unsupported command. Therefore the answer is "reporting the failure of the last request" and not actually handling it in any specific way.

Looking at your case:

pastedImage_2.png

It looks like the additional traffic is related to the stall process but I don't know how to interpret the content with Wireshark, or whether Wireshark really sees all details. I notice that there is a SET_FEATURE request following it and I suspect that what you are seeing is in fact the USB hub communication which is commanding it to perform a bus reset due to the previous activity causing an unrecoverable error.

My careful conclusion would be that you do have a problem with stalling which doesn't cause the host to wait (and timeout) but immediately declare a serious error and command a reset [maybe we are seeing it make two or three retries at something in the process]. There is a possibility that the actual error is the unstall part (eg. it is trying to unstall but not the correct endpoint that actually stalled - this would be my first point of investigation in fact).

Since I don't know how reliable the Wireshark recordings are (USBpCap looks still be be in the experimental stage) I prefer to use dedicated USB analysers for my work (although I can't exclude that the details are in there if one knows how to interpret correctly).

I would have offered you to send me a binary that I can run on a FRDM-K22F board but the application USB socket has recently broken off mine so I will need to order a new one. If you can build the project for any other board with USB (eg. FRDM-K64F) I could quickly run it though.
Note that you can also verify the stall process by removing support of another typical command for experimental purposes.

Regards

Mark

P.S: If you do have problem solving this in SDK it is also possible to use uTasker USB in FreeRTOS 9.0 projects with further advantages that the project is no longer device dependent and can run on any Kinetis part without porting. I have used it in many industrial projects since 2008 (on Coldfires) and since 2011 on most Kinetis parts; it is proven to be bullet-proof and allows mixing composite devices with almost no effort.

P.P.S. I can also solve SDK problems if needed in emergencies but this would require a contact since it is quite difficult to understand all SDK's USB operating details and so not always as efficient.

0 项奖励
回复

3,328 次查看
brunoalbrecht
Contributor III

Hey Mark,

thanks for the tips. In Wireshark, the addresses (if I understood correctly) are formed as BUS.DEVICE.ENDPOINT. That menas 1.3.* and 1.1.* are different devices and from the pictures, I'd say the board is enumerated with the address 1.1.*.

Wireshark indeed doesn't show much of the communication and I unfortunately don't have a Hardware USB Analyzer. I found, though, another tool that gave me better results: Microsoft Message Analyzer. I redid the experiments with this tools and got the following results:

My firmware

During enumeration, I get some problematic packages (6 packages). They show problems with Stall commands followed by endpoint halt command, but enumeration goes fine.

01-enum stall.png

After enumeration, communication goes on as it was on the Wireshark experiments (several Bulk In and Bulk Out Transfers), until eventually I get the following error: Status Babble. This makes windows close the endpoint and open it with a different address (??) - I don't really understand that, but I do understand there is an error with the Status message.

02-status babble.png

Your Firmware (initial one, without FreeRTOS)

The same Stall problems during enumeration appear, but this time I can see 9 times the problem.

03-enum stall.png

During execution, your application too get some other stall problems followed by endpoint halt, but I can't see any status babble problems.

04-endpoint halt.png

I'll go further into this status babble problem, as it seems to be causing the problem (bus reset).

I attached the trace files.

Cheers,

Bruno

0 项奖励
回复

3,328 次查看
brunoalbrecht
Contributor III

So, I actually manage to narrow the problem even more.

Using the demo application from the SDK 2.3, I get the same problem. The image below is using dev_composite_cdc_msc_freertos_frdmk82f with USB_DEVICE_CONFIG_USE_TASK set to 1, running on a FRDM-K82F board.

10-demo.png

Every ~5min, the Status Babble is reported, the endpoint is closed and the Bus is Reset (one can see on the image above that every time the bus is reset, the host PC gives a new address to the device.

The exact same thing happened with the FRDM-K22F running the simpler demo from SDK 2.3 dev_msc_ramdisk_freertos_frdmk22f with USB_DEVICE_CONFIG_USE_TASK set to 0. The demo dev_msc_ramdisk_bm_frdm_frdmk22f presents the same problem.

10-demo frdmk22f.png

Conclusion: this is a problem with the NXP's USB Stack and not my application.

To reproduce, just flash one of the MSD examples in a dev board and monitor the USB using Microsoft Message Analyzer. You'll see that the  device will receive a new address after one or two minutes of the enumeration and then every 5 min.

I'll try to narrow down where, in the stack's code this problem is originated, but I expect a positioning of NXP on the matter.

Cheers,

Bruno

0 项奖励
回复

3,328 次查看
mjbcswitzerland
Specialist V

Hi Bruno

As I mentioned, stalls are normal operations on the USB bus so stalls themselves don't represent any errors or problems if handled correctly and reliably.

If your goal is to improve the NXP stack quality you might find some workarounds at Erich Styger's blogs [MCU on Eclipse] since I believe I have seen some fixes presented there in the past.

In case you do run into emergency requirements you can request support and analysis with dedicated tools (also for NXP code corrections) at http://www.utasker.com/services.html

Regards

Mark

0 项奖励
回复

3,328 次查看
brunoalbrecht
Contributor III

Thanks Mark,

I'll check Erich's website (already do quite often actually).

Regarding the Stack problem, I found a SDK 2.0.0 in my PC. The dev_msc_ramdisk_freertos_frdmk22f presents the same problems (USB Stack version 1.0.0) - Status Babble and then reset.

I also tested making another SCSI/UFI command unsupported and the result was the same (several bus resets).

I understand it would **usually** not represent a problem for the application or even be noted by the developer, but when I'm streaming data through another endpoint on the same USB connection and don't want to lose packages, having the bus reset for no apparent reason is a big problem.

Cheers,

Bruno

0 项奖励
回复

3,328 次查看
mjbcswitzerland
Specialist V

Bruno

I just saw that a USB babble error is what they call it when the device sends back more data that requested from it. Maybe there is something visible in the recordings that allow you to see this(?)

Since I work with USB analysers for USB development (especially when close to the HW) I am not familiar with the PC tools but could imagine that they don't allow the same level of monitoring since they sit at a higher level. They are probably more suitable for application level work where the low level USB operation can be considered as essentially conform.

If you develop with USB professionally I would invest in a USB analyser (they are not that expensive nowadays) to avoid needing to invest time in guessing. It is especially advisable if you are working with a stack that is not fully robust so that you won't lose as much additional development time (which my prove much more expensive that the investment) when you need to debug and correct it.

Regards

Mark

0 项奖励
回复

3,329 次查看
brunoalbrecht
Contributor III

I think I solved the problem.

That means I don't receive Bus Resets anymore, but I'm not sure if the solution is "formally correct" (i.e. if the device sends what the hosts expects it to send after an unsupported command).

What I saw is that the code that handles the supported commands all use the function USB_DeviceMscUfiThirteenCasesCheck, which, from what I could understand, prepares the packages to be sent/received, according to how much data the host expects to send/receive. The code that handles the unsupported command doesn't call that function and even sets some other variables to 0, which might explain: 1. why was the device not answering the Log Sense command; and 2. why was the device babbling.

The code was changed on the following files:

  • usb_device_msc.c: function USB_DeviceMscProcessUfiCommand, default statement of the switch: commented everything but the call to the USB_DeviceMscUfiUnsupportCommand.
  • usb_device_msc_ufi.c: function USB_DeviceMscUfiUnsupportCommand: returned the result of the function USB_DeviceMscUfiThirteenCasesCheck (nothing else on the function was changed).

Since I'm not an USB expert and don't want to waste more time on this problem, I'll leave the proof of the solution (i.e. the check whether it is formally correct) to NXP. If I have any news, I'll post it here.

Thank you very much Mark for helping me understand the problem.

Cheers,

Bruno

3,329 次查看
mjbcswitzerland
Specialist V

Bruno

I have attached a FreeRTOS project as reference (any K22FN512 but on FRDM-K22F uTasker blink task flashes the green LED at 2.5Hz and a FreeRTROS blink task the red on at about 0.6Hz)
It still has MSD and 2 VCOMs, but I put the second VCOM on a FreeRTOS task:

static void usb_task(void *pvParameters)
{
    QUEUE_TRANSFER length = 0;
    QUEUE_HANDLE usb_handle;
    unsigned char ucRxData[128];
    while ((usb_handle = fnGetUSB_Handle()) == NO_ID_ALLOCATED) {        // get the USB handle
        vTaskDelay(500/portTICK_RATE_MS);                                // wait for 500ms and try again
    }
    FOREVER_LOOP() {
        length = fnRead(usb_handle, ucRxData, sizeof(ucRxData));         // read any usb data received
        if (length != 0) {                                               // if something is available
            fnWrite(usb_handle, (unsigned char *)"FreeRTOS Echo:", 14);  // echo it back
            fnWrite(usb_handle, ucRxData, length);                       // send the data back
            fnWrite(usb_handle, (unsigned char *)"\r\n", 2);             // with termination
        }
        else {                                                           // nothing in the input buffer
            vTaskDelay(1);                                               // wait a single tick to allow other tasks to execute
        }
    }
}

This shows how a USB-CDC task can easily receive bulk USB data on a specific interface and send (in this case echoing it back).

Regards

Mark

0 项奖励
回复