TOK_DNE too long sometimes

Frelon · ‎08-21-2008

Hi,

I use the USB stack provided by Freescale on an MCF52211. It works well almost all the time, but sometimes I have speed problems when I write data to some USB keys (2 out of 16 are problematic for now). I think that the problem may be related to the ColdFire or the USB stack, because these 2 keys work well on Windows.

Here is what I have seen: my write test takes about 3 minutes on non-problematic keys. On problematic keys it can take up to 50 minutes to do the same test. I noticed that the TOK_DNE flag (in the INT_STAT register) takes very long to change its state from 0 to 1, so the code loops in usb_host_start_transaction() for about 237 ms instead of the normal 1 ms (as far as I can see).

The loop is the following:

while((MCF_USB_INT_STAT & (MCF_USB_INT_STAT_TOK_DNE | MCF_USB_INT_STAT_STALL | MCF_USB_INT_STAT_ERROR)) ==0)
{
      if (MCF_USB_INT_STAT & MCF_USB_INT_STAT_USB_RST)
      {
        evt_disconnect();
        tr_error=tre_disconnected;
        return((hcc_u16)-1u);
      }
}

Note that it doesn't enter the inner if block (the USB_RST check), which is OK.

Like I said, these keys are working quickly on Windows and they are USB 2.0 (Full speed) like the other ones that work well. Note that it doesn't seem to relate to a USB key manufacturer.

Maybe I can exit the loop if it's too long to go out of it, but I think that if the token is not in its done state, it can be a risk to exit and send new commands before it's ready.
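
For illustration only, a bounded version of that wait could look like the sketch below. The `wait_result` codes, the `max_polls` budget and the `STAT_*` bit values are assumptions (TOK_DNE = 0x08 matches what is reported later in this thread; the other bits should be checked against the MCF52211 reference manual), and the register is passed in as a pointer so the logic can be tried on a PC. Whether it is safe to issue a new token after a timeout is a separate question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; they are not part of the CMX stack. */
enum wait_result { WAIT_DONE, WAIT_RESET, WAIT_TIMEOUT };

/* Assumed INT_STAT bit values; verify against the reference manual. */
#define STAT_USB_RST 0x01u
#define STAT_ERROR   0x02u
#define STAT_TOK_DNE 0x08u
#define STAT_STALL   0x80u

/* Poll the status register until the token completes, the bus is reset,
 * or the poll budget is used up. */
enum wait_result wait_token(volatile const uint8_t *int_stat, uint32_t max_polls)
{
    assert(int_stat != NULL);
    while (max_polls-- > 0) {
        uint8_t s = *int_stat;
        if (s & (STAT_TOK_DNE | STAT_STALL | STAT_ERROR)) {
            return WAIT_DONE;      /* same exit condition as the original loop */
        }
        if (s & STAT_USB_RST) {
            return WAIT_RESET;     /* caller would handle the disconnect */
        }
    }
    return WAIT_TIMEOUT;           /* token still pending */
}
```

On the target, the call would be something like `wait_token(&MCF_USB_INT_STAT, budget)`, assuming that macro expands to an 8-bit volatile lvalue.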

How can I solve this? Thanks.

RichTestardi · ‎08-22-2008

One more thought on the original post: something you might try is setting MCF_USB_OTG_ENDPT_RETRY_DIS and seeing (now in software, rather than in hardware) if retries are occurring... You'll now get back NAK tokens explicitly (rather than having hardware do a silent retry). Obviously, you also want to notice if you're getting back token statuses of 0 (bus timeout) or 15 (data error) in usb_host_start_transaction().

Frelon · ‎08-22-2008

Hello Rich,

Thanks for your replies. I tried the 3 solutions you suggested:

1- BDT_CTL_DATA

2- Fferraro's thread

3- ENDPT0_RETRY_DIS

Unfortunately, it didn't correct the situation.

Note that ENDPT0_RETRY_DIS was already set. I also disabled it, but it wasn't better.

As for software retries, none occur, because the retry code is outside the problematic loop.

If you take a look at the code in my first post, it loops until MCF_USB_INT_STAT_TOK_DNE equals 1 (or 8 to be exact). The other flags remain at 0, which is fine.

The result is that there is no error, but it takes a long time for MCF_USB_INT_STAT_TOK_DNE to change its state from 0 to 8, so delays are very long.

I will now try changing register settings that may have an impact on MCF_USB_INT_STAT_TOK_DNE.

If you have any other ideas, they are welcome,

Thanks,

Frelon

RichTestardi · ‎08-22-2008

Hi,

> Note that ENDPT0_RETRY_DIS was already set. I also disabled it, but it wasn't better.

> For the software retry, there is no software retry that occurs because it is outside the problematic loop.

So you are saying you tried with RETRY_DIS both set and clear?

The retry actually exactly affects the TOK_DNE loop, since if the target NAK's and hardware retries are enabled (i.e., the bit is clear), then the loop won't complete until the ACK -- the hardware retries automatically as many times as needed.

However, if hardware retries are not enabled (i.e., the bit is set), then the loop will complete on the first NAK -- long before the eventual ACK -- and then software is responsible for initiating as many retries as are needed.

So in the case of a device that was legitimately taking 10ms to do the ACK, if the first NAK came after 1ms, then with the bit set, your loop would complete in 10ms, and with the bit clear, it would complete (for the first time) in 1ms.

Does that make sense?

Hardware retries are really nice (except for interrupt endpoints, obviously, since NAK does not mean you want a retry!), but they can hide long strings of NAKs (or worse yet, infinite strings of NAKs!) from the software, making you wonder what is actually going on.

I don't suppose you can check what's going on with an analyzer?

-- Rich

RichTestardi · ‎08-22-2008

Rich hastily said:

> So in the case of a device that was legitimately taking 10ms to do the ACK,
> if the first NAK came after 1ms, then with the bit set, your loop would
> complete in 10ms, and with the bit clear, it would complete (for the first time)
> in 1ms.
>
> Does that make sense?

It probably would have made more sense if he said exactly the opposite...

With the bit set (hardware retries disabled), then the NAK would be passed back to software in just 1ms. With the bit clear (hardware retries enabled), then only the ACK would be passed back to software, after 10ms.

I don't do double negatives very well... :smileyhappy:
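
To make the corrected statement concrete, here is a small host-side model, offered as a sketch rather than as a description of the real controller. It assumes a device that NAKs every attempt until `ready_ms` and a host that attempts once every `attempt_ms`; the names `first_completion`, `struct completion` and the token enum are made up for this example.

```c
#include <assert.h>

enum token { TOKEN_NAK, TOKEN_ACK };

struct completion {
    enum token status;   /* what software sees when the wait loop first exits */
    unsigned ms;         /* when it sees it */
};

/* Model of the first TOK_DNE seen by software.
 * retry_dis != 0: hardware retries disabled, the first handshake is reported.
 * retry_dis == 0: hardware keeps retrying NAKs, only the ACK is reported. */
struct completion first_completion(int retry_dis, unsigned attempt_ms, unsigned ready_ms)
{
    struct completion c;
    unsigned t = attempt_ms;          /* time of the first handshake */

    assert(attempt_ms > 0);
    if (!retry_dis) {
        while (t < ready_ms) {
            t += attempt_ms;          /* silent hardware retry */
        }
    }
    c.status = (t >= ready_ms) ? TOKEN_ACK : TOKEN_NAK;
    c.ms = t;
    return c;
}
```

With `attempt_ms = 1` and `ready_ms = 10`, the model reports a NAK at 1 ms when the bit is set and an ACK at 10 ms when it is clear.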

-- Rich

Frelon · ‎08-29-2008

Hi Rich,

In my last post I made a mistake: MCF_USB_ENDPT0_RETRY_DIS was set only on interrupt endpoints.

Anyway, I tried MCF_USB_ENDPT0_RETRY_DIS = 1, but then I wasn't able to communicate with any USB key at all. The error returned was something like "no valid USB key inserted". So I think that disabling hardware retry may not be the way to solve this problem.

Note: for an analyser, I have a scope but no USB analyser.

Frelon

RichTestardi · ‎08-29-2008

Hi,

Once you set RETRY_DIS to 1, you will most likely start receiving NAK responses back where you used to only get ACK's back before... The USB device will often respond to a command with a NAK, meaning it wants the host to "try again later"... Then it will go fetch the data that was requested by the command it just NAK'd, and wait for the retry. The retry will then be ACK'd.

So once you set RETRY_DIS to 1, you want to make sure your transaction code is retrying any NAK'd commands it issues (unless they are on an interrupt endpoint).

If you are running the regular CMX stack, you should follow this code path in usb_host_start_transaction():

Code:

    case TOKEN_NAK:
      /* device is not ready */
      MKDBG_TRACE(ev_got_nak, ep);
      if (my_device.eps[ep].type == EPTYPE_INT)
      {
        return(0);
      }
      /* retry */
      break;

That will retry the NAK'd transaction up to 3 times.

I am now wondering if you might be getting many more than 3 NAK's in a row! Which also might be an indicator of your original performance issue... Note that the hardware retry mechanism will retry NAK's indefinitely, so if you wanted to make the software and hardware mechanisms identical, you'd need an infinite retry count.

Can you try changing this line in the same function to a very big number (say, 1000000)?

Code:

  hcc_u8 retry=3;

At that point, you should have software retries behaving just like hardware retries. Then you can instrument your code to see how many NAK's you are actually processing (which are otherwise hidden from your code with the hardware retry mechanism), and then you might have a good clue as to what is going on.
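
As a rough sketch of that instrumentation (not the CMX code itself), the retry loop can be wrapped so it counts NAKs. Everything here is hypothetical: the `xfer_fn` callback stands in for issuing one transaction, the token values are placeholders, and `fake_device` only exists so the logic can be exercised off-target.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder token codes for this example only. */
enum { TOK_ACK = 1, TOK_NAK = 2 };

/* Issues one transaction and returns its token status. */
typedef int (*xfer_fn)(void *ctx);

struct nak_stats {
    unsigned long transactions;  /* calls to transfer_with_retry */
    unsigned long naks;          /* total NAKs seen */
    unsigned long worst_run;     /* longest run of NAKs in a row */
};

/* Retry a NAK'd transaction up to max_retry times, recording NAK counts. */
int transfer_with_retry(xfer_fn issue, void *ctx, unsigned long max_retry,
                        struct nak_stats *st)
{
    unsigned long run = 0;
    int tok;

    assert(issue != NULL && st != NULL);
    st->transactions++;
    for (;;) {
        tok = issue(ctx);
        if (tok == TOK_NAK) {
            st->naks++;
            run++;
            if (run <= max_retry) {
                continue;        /* software retry */
            }
        }
        if (run > st->worst_run) {
            st->worst_run = run;
        }
        return tok;              /* ACK, or NAK after the retry budget */
    }
}

/* Stand-in device: NAKs *ctx more times, then ACKs. */
int fake_device(void *ctx)
{
    unsigned *naks_left = ctx;
    if (*naks_left > 0) {
        (*naks_left)--;
        return TOK_NAK;
    }
    return TOK_ACK;
}
```

Dumping `worst_run` after a write test on a slow key and on a fast key would show whether the slow key is answering with long strings of NAKs.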

-- Rich

RichTestardi · ‎08-29-2008

PS In case it is not clear, getting rid of the hardware retry *won't* make things go faster (typically I find software retries a bit slower, actually), but it helps diagnose *why* a TOK_DNE is taking a long time, because you can see if you're getting one (or more) NAK's before the final ACK that results in the TOK_DNE, thereby implicating the device (as opposed to the driver) as the source of the slowness. The only problem with the hardware retry mechanism is that it hides what is really going on from you, especially if you don't have an analyzer.

BTW, if you have a digital storage scope, you can actually track the USB traffic that way, with a bit of effort... I did so here: http://www.testardi.com/rich/coldfire/bitscope.htm and would be happy to walk you thru the details, if you need. The only thing you really need to do is get a good trigger, and it seems you could just pulse a GPIO pin on the "slow command" completing, and get very close to the issue at hand.
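
One possible way to build such a trigger is sketched below, under the assumption that a free-running microsecond counter and a spare GPIO pin are available. The `slow_trigger` structure, its field names and the `pulse` callback are invented for this example; on the target, `pulse` would toggle the pin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct slow_trigger {
    uint32_t threshold_us;   /* anything at least this long counts as slow */
    void (*pulse)(void);     /* hypothetical hook that pulses a GPIO; may be NULL */
    unsigned long fired;     /* number of slow completions seen */
};

/* Call when a transaction completes, with timestamps taken before and after
 * the TOK_DNE wait. Unsigned subtraction keeps the result correct across one
 * counter wraparound. */
void slow_trigger_report(struct slow_trigger *t, uint32_t start_us, uint32_t end_us)
{
    uint32_t elapsed = end_us - start_us;

    assert(t != NULL);
    if (elapsed >= t->threshold_us) {
        t->fired++;
        if (t->pulse != NULL) {
            t->pulse();      /* scope trigger */
        }
    }
}
```

With a threshold of, say, 100 ms, the 1 ms completions stay silent and only the roughly 237 ms ones raise the trigger.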

Frelon · ‎08-29-2008

Thanks for the hint, I will take a look at that!

Frelon · ‎11-28-2008

Hello, if it can help some people having the same bug, I finally resolved that problem (some time ago now!).

My software receives data from the serial port in 128-byte packets, so I wrote what I received directly to the USB key, 128 bytes at a time, instead of keeping a big buffer in memory.

But sectors on a USB key (or any mass storage device) are 512 bytes long, so I tried waiting for 4 packets, i.e. 512 bytes, before writing to the USB key, and that resolved my problem.

I repeat that the slow transfer occurred on some USB keys only. Now the writing is done correctly using 512 bytes.
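
For anyone wanting to do the same, the buffering can be sketched roughly as follows. This is not the actual application code: `sector_buf`, `sector_buf_push` and the `sector_writer` callback are names chosen for the example, and `count_writer` is just a stand-in for the real write to the key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Writes one full sector; returns a negative value on error. */
typedef int (*sector_writer)(const unsigned char *sector, void *ctx);

struct sector_buf {
    unsigned char data[SECTOR_SIZE];
    size_t used;                      /* bytes currently buffered */
};

/* Append len bytes; call flush() each time a full 512-byte sector is ready. */
int sector_buf_push(struct sector_buf *sb, const unsigned char *src, size_t len,
                    sector_writer flush, void *ctx)
{
    assert(sb != NULL && src != NULL && flush != NULL);
    while (len > 0) {
        size_t room = SECTOR_SIZE - sb->used;
        size_t n = (len < room) ? len : room;

        memcpy(sb->data + sb->used, src, n);
        sb->used += n;
        src += n;
        len -= n;

        if (sb->used == SECTOR_SIZE) {
            int rv = flush(sb->data, ctx);
            if (rv < 0) {
                return rv;            /* sector stays buffered for a later retry */
            }
            sb->used = 0;
        }
    }
    return 0;
}

/* Stand-in writer for testing: counts how many sectors were written. */
int count_writer(const unsigned char *sector, void *ctx)
{
    (void)sector;
    (*(int *)ctx)++;
    return 0;
}
```

Four 128-byte pushes then produce exactly one 512-byte write. A real application would also need to flush a partial last sector when the transfer ends, which this sketch leaves out.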

Frelon

RichTestardi · ‎08-21-2008

There are bugs in the CMX example drivers, and I remember one related to data toggle. They sometimes set the tgl_tx and tgl_rx structure members as if they are booleans, and sometimes as actual byte values. I believe these two lines:

/* After the setup we shall send/receive DATA1 packets. */
my_device.eps[ep].tgl_tx=1;
my_device.eps[ep].tgl_rx=1;

Should change to:

/* After the setup we shall send/receive DATA1 packets. */
my_device.eps[ep].tgl_tx=BDT_CTL_DATA;
my_device.eps[ep].tgl_rx=BDT_CTL_DATA;
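
A small sketch of why the distinction matters, assuming the toggle member is OR'ed straight into the buffer descriptor control byte. The bit values below (OWN = 0x80, DATA0/1 = 0x40) are assumptions to be checked against the stack's headers, and `bd_ctl` and `toggle_next` are illustrative helpers, not CMX functions.

```c
#include <assert.h>
#include <stdint.h>

#define BDT_CTL_OWN  0x80u   /* assumed value */
#define BDT_CTL_DATA 0x40u   /* assumed value of the DATA0/DATA1 select bit */

/* Build a control byte from a stored toggle that holds the bit itself.
 * The assert rejects a boolean-style 1, which would leave the DATA bit
 * clear and send DATA0 where DATA1 was intended. */
uint8_t bd_ctl(uint8_t tgl)
{
    assert(tgl == 0 || tgl == BDT_CTL_DATA);
    return (uint8_t)(BDT_CTL_OWN | tgl);
}

/* Flip between DATA0 and DATA1 after a successful transaction. */
uint8_t toggle_next(uint8_t tgl)
{
    return (uint8_t)(tgl ^ BDT_CTL_DATA);
}
```

Under these assumptions, `BDT_CTL_OWN | 1` gives 0x81, in which the DATA bit is still 0.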

I had a number of issues with the CMX stack, so I eventually wrote my own which is on the web at the bottom of this page, as host2.zip, if you want to see it:

http://www.cpustick.com/downloads.htm

The driver decides between host (talking to MST) and device (exposing FTDI) mode on boot, so you only want the host code paths.

I believe fferraro also found bugs in the CMX stack at:

http://forums.freescale.com/freescale/board/message?board.id=CFCOMM&message.id=4954#M4954

-- Rich

JimDon · ‎08-21-2008

Sorry to borrow your thread.
Rich - Does this code support interrupt endpoints in device mode?
(It looks like it does - you pass in -1 to usb_bulk_transfer for an interrupt endpoint)

Can I use the config tables from the CMX code unchanged?

It might be nice if you made some upper level driver examples available, or explain a bit about the user callbacks.

RichTestardi · ‎08-21-2008

Hi JimDon,

I just added some upper level driver examples to the host2.zip (including scsi.[ch] -- not sure if you saw it before or after I did that). And I'll post some basic commands below. Yes, device mode works with interrupt endpoints (though host mode does not) and the first device I brought up was an accelerometer-based mouse!

I am just learning how inconsistent different USB MST devices are... I just found one where you *have to* send TUR, Request Sense, TUR, and a bunch of other commands before it will allow you to do a Read10 -- a Request Sense directly on the Read10 won't clear the Unit Attention!!!
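
When chasing this kind of behaviour it helps to decode the sense key from the Request Sense data. Below is a minimal sketch for fixed-format sense data (response code 0x70 or 0x71, sense key in the low nibble of byte 2); the helper name `sense_key` is made up for this example and is not part of host2.zip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION 0x06

/* Return the sense key from fixed-format sense data, or -1 if the buffer
 * is too short or is not in fixed format. */
int sense_key(const uint8_t *sense, size_t len)
{
    uint8_t code;

    assert(sense != NULL);
    if (len < 3) {
        return -1;
    }
    code = sense[0] & 0x7Fu;          /* strip the VALID bit */
    if (code != 0x70 && code != 0x71) {
        return -1;
    }
    return sense[2] & 0x0F;
}
```

A bring-up loop could then keep issuing Test Unit Ready and Request Sense while `sense_key()` returns `SENSE_KEY_UNIT_ATTENTION`, with a cap on the number of attempts.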

-- Rich

Code:

        // set interface
        usb_setup(0, SETUP_TYPE_STANDARD, SETUP_RECIP_INTERFACE, 0x0b, 0, 0, 0, &setup);
        rv = usb_control_transfer(&setup, NULL, 0);
        assert(rv == 0);

        // get max lun
        usb_setup(1, SETUP_TYPE_CLASS, SETUP_RECIP_INTERFACE, 0xfe, 0, 0, sizeof(max), &setup);
        rv = usb_control_transfer(&setup, &max, sizeof(max));
        assert(rv == 1 && max == 0);

        // inquiry
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x12;  // inquiry
        cdb[4] = 36;
        rv = scsi_bulk_transfer(1, cdb, 6, inq, sizeof(inq));
        if (rv < 0) {
            return rv;
        }
        assert(rv == sizeof(inq));
        led_happy();

        // test unit ready
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x00;  // test unit ready
        (void)scsi_bulk_transfer(0, cdb, 6, NULL, 0);

        // request sense
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x03;  // request sense
        cdb[3] = sizeof(sense);
        rv = scsi_bulk_transfer(1, cdb, 10, sense, sizeof(sense));
        if (rv < 0) {
            return rv;
        }
        assert(rv);
        led_happy();

        // test unit ready
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x00;  // test unit ready
        rv = scsi_bulk_transfer(0, cdb, 6, NULL, 0);
        if (rv < 0) {
            return rv;
        }
        assert(rv == 0);
        led_happy();

        // read format capacities
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x23;  // read format capacities
        cdb[8] = sizeof(caps);
        rv = scsi_bulk_transfer(1, cdb, 12, caps, sizeof(caps));
        if (rv < 0) {
            return rv;
        }
        assert(rv);
        led_happy();

        // read capacity
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x25;  // read capacity
        rv = scsi_bulk_transfer(1, cdb, 10, cap, sizeof(cap));
        if (rv < 0) {
            return rv;
        }
        assert(rv == sizeof(cap));
        led_happy();

        // read block
        memset(cdb, 0, sizeof(cdb));
        cdb[0] = 0x28;  // read10
        cdb[2] = sector>>24;
        cdb[3] = sector>>16;
        cdb[4] = sector>>8;
        cdb[5] = sector>>0;
        assert(count < 256);
        cdb[8] = count;
        rv = scsi_bulk_transfer(1, cdb, 10, buffer, count*512);
        if (rv < 0) {
            return rv;
        }
        assert(rv == count*512);
        led_happy();
