Lock up of USB EHCI controller in i.MX28

axels

Dear All,

We are experiencing a complete lockup of the EHCI controller in the i.MX28.

When this happens, the USB bus is completely dead and the D+ and D- signals end up in the J state.

More details about the circumstances in which the EHCI locks up:

- It happens with different USB-to-Ethernet dongles (with different drivers).

- We can only reproduce this with FTP for some reason. iperf does not trigger it, even with similar packet sizes and throughput.

- It only happens when the client PC downloads; uploads work fine. (In the failing direction, the bulk of the data is OUT.)

- The AHB transfers use the default INCR mode, not INCR8 or INCR16 (errata 2858 & ENGR119650).

- Putting a hub between the device and the host controller has no effect; the problem still happens. It also happens on both host controllers.

- Varying the number of outstanding URBs doesn't have any impact.

- We've backported the Linux EHCI driver from Linux 3.6 to 2.6.35, since many of the EHCI fixes looked like they might be related, but the issue still occurs.

- A USB analyzer (an Ellisys USB Explorer) always shows the same thing: transfers of 1292 bytes split into 2x 512 and 1x 268 bytes. When it goes wrong, the last transfer is 1x 512 bytes followed by only the first 256 bytes of the second 512-byte packet.

 

Any help or pointers to avoid, fix, or further investigate this issue would be greatly appreciated!

The attached file contains the contents of /sys/kernel/debug/usb/ehci/fsl-ehci.0 before, during, and after the transfer. It also contains a dump of all the USBCTRL registers.

Kind regards,

Axel

Original Attachment has been moved to: usb-debug.tgz

lategoodbye

Hello Axel,

Can you reproduce the problem on the MX28EVK, or only on a custom board?

Does the problem still occur with the latest mainline kernel?

Could you please post the names of some dongles that trigger the problem?

Best regards,

Stefan

axels

Hello Stefan,

Thank you for your response.


We haven't tried it on an MX28EVK yet. It's been a while since we moved from the EVK to our own hardware design. We will start looking into this today.


Is there a BSP available for the latest mainline kernel?  It is our understanding that for the i.MX28 there is only a 2.6.35 based BSP available. That’s also why we did the backport of the Linux EHCI driver from kernel 3.6 to kernel 2.6.35.


Our product uses the LAN9500AI-ABZJ-TR USB to Ethernet Controller from SMSC.

Another dongle that we used to reproduce the problem is the SD-ADU2LAN-M1, which is based on an MCS7830 from Moschip.

Kind regards,

Axel

PeterChan

Hi Axel,

If this problem is caused by a FIFO underrun on the host side, you may try the following:

1. Change the SBUSCFG register to 6 (AHB BURST = U_INCR8). This instructs the USB controller to use INCR8 bursts when possible to access memory, which uses less bus bandwidth.

2. Tune TXFIFOTHRES in the TXFILLTUNING register. This is normally set to 2, which means the FIFO is filled with 2 bursts (64 bytes) before the transfer on the USB bus is started. Setting it to a higher value (e.g. 4 = 128 bytes) provides more tolerance for latency on the memory bus and therefore less risk of FIFO underrun.

Below is the code change to set SBUSCFG=6 and TXFIFOTHRES=8 in kernel 2.6.35 at BSP L2.6.35_10.12.01.

===================================================================================

diff --git a/arch/arm/mach-mx28/usb_dr.c b/arch/arm/mach-mx28/usb_dr.c
index e95b786..6575951 100644
--- a/arch/arm/mach-mx28/usb_dr.c
+++ b/arch/arm/mach-mx28/usb_dr.c
@@ -463,7 +463,7 @@ static int __init usb_dr_init(void)
        pr_debug("%s: \n", __func__);
        dr_utmi_config.change_ahb_burst = 1;
-       dr_utmi_config.ahb_burst_mode = 0;
+       dr_utmi_config.ahb_burst_mode = 6;
#ifdef CONFIG_USB_OTG
        dr_utmi_config.operating_mode = FSL_USB2_DR_OTG;
diff --git a/arch/arm/mach-mx28/usb_h1.c b/arch/arm/mach-mx28/usb_h1.c
index 579ce9a..6066143 100644
--- a/arch/arm/mach-mx28/usb_h1.c
+++ b/arch/arm/mach-mx28/usb_h1.c
@@ -202,6 +202,8 @@ static struct fsl_usb2_platform_data usbh1_config = {
        .phy_lowpower_suspend = _phy_lowpower_suspend,
        .is_wakeup_event = _is_usbh1_wakeup,
        .phy_regs = USBPHY1_PHYS_ADDR,
+       .change_ahb_burst = 1,
+       .ahb_burst_mode = 6,
};
static struct fsl_usb2_wakeup_platform_data usbh1_wakeup_config = {
diff --git a/arch/arm/plat-mxs/include/mach/fsl_usb.h b/arch/arm/plat-mxs/include/mach/fsl_usb.h
index f883248..6da125f 100644
--- a/arch/arm/plat-mxs/include/mach/fsl_usb.h
+++ b/arch/arm/plat-mxs/include/mach/fsl_usb.h
@@ -65,6 +65,15 @@ fsl_platform_set_vbus_power(struct fsl_usb2_platform_data *pdata, int on)
/* Set USB AHB burst length for host */
static inline void fsl_platform_set_ahb_burst(struct usb_hcd *hcd)
{
+       struct fsl_usb2_platform_data *pdata;
+       unsigned int temp;
+
+       pdata = hcd->self.controller->platform_data;
+       if (pdata->change_ahb_burst) {
+               temp = readl(hcd->regs + FSL_SOC_USB_SBUSCFG);
+               writel((temp & (~(0x7))) | pdata->ahb_burst_mode,
+               hcd->regs + FSL_SOC_USB_SBUSCFG);
+       }
}
void fsl_phy_usb_utmi_init(struct fsl_xcvr_ops *this);
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index ea01454..7b1e6a6 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -268,6 +268,9 @@ static int ehci_reset (struct ehci_hcd *ehci)
        if (ehci->debug)
                dbgp_external_startup();
+       ehci_writel(ehci, TXFIFO_DEFAULT,
+               (u32 __iomem *)(((u8 *)ehci->regs) + TXFILLTUNING));
+
        return retval;
}

axels

Hi Peter,

Thank you for these suggestions. The results look very promising so far :)

At first we applied both of your suggestions, and the initial results were good. However, using INCR8 bursts opens the door to one of the errata that we came across along the way, so we also tried without that change and kept seeing good results. Bottom line: the FIFO tuning you suggested seems to do the trick. We are now running extensive tests on it and hope to have a final result by the end of tomorrow.

Did you experience the same or a similar problem?

What we don't fully understand yet is how exactly this failure works. Is it coupled to the bit rate?

I'll let you all know the results of our tests.

Thanks,

Axel

PeterChan

This problem may occur when the USB host runs out of data in the middle of a packet because of a TX FIFO underrun. According to the reference manual, a higher TXFIFOTHRES value can be used in systems with unpredictable latency and/or insufficient bandwidth, where the FIFO may underrun because data is transferred from the latency FIFO to USB before the FIFO can be replenished from system memory.

Using Stream Disable Mode (SDIS = 1 in the USBMODE register) in host mode eliminates overruns/underruns of the latency FIFO in low-bandwidth systems where the RX and TX buffers are large enough to contain the entire packet. It also forces the TX latency FIFO to be filled to capacity before the packet is launched onto the USB, regardless of the TXFIFOTHRES value.

So Stream Disable Mode is the last resort if a higher TXFIFOTHRES value still does not work.

Erratum ENGR119650 says NOT to set SBUSCFG.AHBBRST to S_INCR8 or S_INCR16. I am not sure whether this also affects U_INCR8.

axels

Hi Peter,

Thank you very much for your help and clarification! :) You really helped us get our project back up and running.

Kind regards,

Axel

ThomasBandelier

Hi Axel, Peter,

We recently ran into the same kind of issue on 2.6.35 (with USB CDC-ETHER), but TX FIFO tuning was not enough in our case, and we didn't want to use INCR8 either because of the errata.

The suggestion from FSL was then to disable streaming mode. We did this, and it seems to be working so far. Hope this helps.

BR,

Thomas

axels

Hi Thomas,

Thanks for letting us know about this.  It's good to hear that we are not the only ones running into this problem.  For now the tuning seems to do the trick for us.

Kind regards,

Axel

lategoodbye

Hi Axel,

As far as I know, the MX28EVK has been supported since kernel 3.5 (initially) and has been really practical to use since kernel 3.11. The board support is now described in Device Tree.

If you can reproduce the problem with an MX28EVK and the current mainline kernel, you could also send the problem report to the Majordomo lists at VGER.KERNEL.ORG. With the old Freescale 2.6 kernel, they can't help you at all.

Best regards,

Stefan

axels

Hi Stefan,

Thank you for clarifying.

We did manage to reproduce the problem on the MX28EVK today using the 2.6.35 kernel. Moving to kernel 3.11 is not something we consider feasible right now.

One of Peter's suggestions seems very promising right now (see above). We'll focus on that first.

Thanks,

Axel
