Coldfire Lite DHCP Issues with no network connection

4,706 Views
Dave_at_Mot
Contributor III

I'm chasing down some odd behaviour on a project running the latest V3.2 Coldfire Lite stack on an MCF52233.  It is set up as a TCP client and I've got Eric Gregori's HTTP client described in AN3518 running on top.  I've even added all the updates that "Marc VDH" has put together and still I have the following issue:

 

If I start the unit with the network cable plugged in, everything is fine.  Once it goes through DHCP and all, I can remove and reinstall the cable and it all behaves nicely.   On the other hand if I start the board with no network connection it will at first wait for a valid network connection using:

 

     while (!iniche_net_ready)  
        TK_SLEEP(1);

It will sit here waiting properly for a few minutes, but eventually it will issue a DHCP timeout message on the console and then declare usage of the default IP address, and then things go very bad and I start getting "dtrap" error messages.

 

My guess is that the DHCP timeout is doing something incorrectly.  I haven't dug into this too deeply but wanted to see if anybody else has run into this before I spend a lot of time on it.

 

MQX is starting to sound like something I want to try !

 

Thanks,

Dave

Labels (1)
0 Kudos
Reply
1 Solution
2,339 Views
vier_kuifjes
Senior Contributor I

I found a fix for the problem! During the search for the solution I found some (obvious) bugs in my set of modifications. It appears that the detection of the link status was actually inverted: link_down_detect was set TRUE when the link was available, which is obviously wrong!

 

There were 2 of those bugs present. The first one was in dhcsetup.c

 

void
dhc_setup(void)
{ 
   int      iface;
   uint16 mymrdata;
   ulong    dhcp_started;
   ip_addr  dhcp_saveaddr[STATIC_NETS];
   int      e;
   int      dhcnets = 0;   /* number of nets doing DHCP */

   while(!(fec_mii_read(FEC_PHY0, PHY_REG_SR, &mymrdata))) // read proprietary status register
      ;
   if (mymrdata & PHY_R1_LS)  // detect if link is down
      set_link_down_detect(TRUE);
   else
      set_link_down_detect(FALSE);

   e = dhc_init();

 ...should become...

 

void
dhc_setup(void)
{ 
   int      iface;
   uint16 mymrdata;
   ulong    dhcp_started;
   ip_addr  dhcp_saveaddr[STATIC_NETS];
   int      e;
   int      dhcnets = 0;   /* number of nets doing DHCP */

   while(!(fec_mii_read(FEC_PHY0, PHY_REG_SR, &mymrdata))) // read status register
      ;
   if ((mymrdata & PHY_R1_LS) == FALSE)  // detect if link is down
      set_link_down_detect(TRUE);
   else
      set_link_down_detect(FALSE);

   e = dhc_init();
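
For readers who want to check the polarity in isolation, here is a minimal sketch of the corrected test. It assumes that PHY_R1_LS names the link-status bit of the standard MII status register (register 1, bit 2, set while the link is up); the macro and function names below are made up for the illustration and are not part of the stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 2 of MII status register 1 is the link-status bit and is
 * set while the link is up. Check PHY_R1_LS in your own headers. */
#define MII_SR_LINK_STATUS 0x0004u

/* Returns true when the status word reports that the link is down. */
bool link_is_down(uint16_t mii_status)
{
    return (mii_status & MII_SR_LINK_STATUS) == 0;
}
```

The original code tested the bit without the comparison, so it reported "link down" exactly when the bit said the link was up.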

 

 

The second one resides in timeouts.c:

 

#ifdef DHCP_CLIENT
   while(!(fec_mii_read(FEC_PHY0, PHY_REG_SR, &mymrdata))) // read proprietary status register
      ;
   if (mymrdata & PHY_R1_LS)
      link_down_detect = TRUE;   // detect if link is down
   else
   {
      if (link_down_detect == TRUE)
      {
         if( POWERUP_CONFIG_DHCP_ENABLED )
         {
            for (iface = 0; iface < STATIC_NETS; iface++)
               dhc_state_init(iface, TRUE);
         }
      }
      link_down_detect = FALSE;
   }

   dhc_second();
#endif

 ...should become...

 

#ifdef DHCP_CLIENT
   while(!(fec_mii_read(FEC_PHY0, PHY_REG_SR, &mymrdata))) // read status register
      ;
   if ((mymrdata & PHY_R1_LS) == 0)
      link_down_detect = TRUE;   // detect if link is down
   else
   {
      if (link_down_detect == TRUE)
      {
         if( POWERUP_CONFIG_DHCP_ENABLED )
         {
            for (iface = 0; iface < STATIC_NETS; iface++)
            {
               dhc_set_callback_dhc_main_ipset(iface);
               dhc_state_init(iface, TRUE);
            }
         }
      }
      link_down_detect = FALSE;
   }

   dhc_second();
#endif

 

The call to dhc_set_callback_dhc_main_ipset(iface) in the loop above (shown in green in the original post) is part of the fix. When DHCP fails because no link was available at startup, the IP address etc. is no longer printed on the terminal when the link becomes available again and DHCP has done its work. The added call fixes this. The function dhc_set_callback_dhc_main_ipset() is defined in dhcsetup.c. I have added it near the end:

 

void dhc_set_callback_dhc_main_ipset(int iface)
{
 // If callback is not already in use (by AutoIP) grab it for
 // our printf routine.

 if(dhc_states[iface].callback == NULL)
 {
  dhc_set_callback(iface, dhc_main_ipset);
 }
}

#endif   /* DHCP_CLIENT */
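
The "only take the callback if nobody else has it" idea can be tried outside the stack with a small stand-in. This is a sketch only: the struct, the array, the callback signature and all names are hypothetical and do not match the real dhc_states layout.

```c
#include <stddef.h>

#define NUM_IFACES 1   /* hypothetical; the stack uses STATIC_NETS */

typedef int (*ipset_callback)(int iface, int state);

struct iface_state {
    ipset_callback callback;   /* NULL when nobody has claimed the slot */
};

struct iface_state iface_states[NUM_IFACES];

/* Install cb only if the slot is free. Returns 1 if installed, 0 otherwise. */
int set_callback_if_unused(int iface, ipset_callback cb)
{
    if (iface < 0 || iface >= NUM_IFACES)
        return 0;
    if (iface_states[iface].callback != NULL)
        return 0;              /* already owned, e.g. by AutoIP */
    iface_states[iface].callback = cb;
    return 1;
}

/* Two dummy callbacks standing in for the print routine and AutoIP. */
int print_ip_cb(int iface, int state) { (void)iface; (void)state; return 0; }
int autoip_cb(int iface, int state)   { (void)iface; (void)state; return 1; }
```

With this guard, whichever component registers first keeps the slot, so calling the setter again on every link-up event does no harm.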

 

 

 The DTRAPs on the terminal screen will keep showing up when there's no link available for an extended period of time. This is OK, those DTRAPs are perfectly harmless!

 

I have added the 2 modified source files as attachment in a zip file.

View solution in original post

0 Kudos
Reply
16 Replies
2,339 Views
Dave_at_Mot
Contributor III
Very nice.  Testing at the moment and so far everything looks fine.  Many Thanks once again.
0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

 

Good to hear it's working!

 

I added some more stuff today. As I expected, the behaviour is the same if the physical link is OK but no DHCP server is present. I added some code to automatically restart DHCP if the DHCP client is not in the BOUND state. This check is done every minute.

 

First, we need a way to read the DHCP state from outside DHCPCLNT.C. I added a function to do this in DHCPCLNT.C:

 

/* FUNCTION: dhc_get_state()
 *
 * dhc_get_state() : Get the actual state for the interface
 *
 *
  * PARAM1: int iface
 *
 * RETURNS: unsigned int state
 */

unsigned int dhc_get_state(int iface)
{
   return (unsigned int) dhc_states[iface].state; /* Return the actual state */
}

#endif   /* DHCP_CLIENT - ifdef out whole file */

 

 The rest of the modifications are done in timeouts.c.

We need to know the definitions of the DHCP states in timeouts.c:

 

#ifdef DHCP_CLIENT
#include "dhcpclnt.h"
extern   int dhc_discover(int iface);
extern   int dhc_second(void);
#endif

 Then, we need to define the timeout time for restarting the DHCP sequence, and a variable to keep track of time:

 

#ifdef DHCP_CLIENT

#define DHCP_AUTO_RESTART_TIME (60*TPS) //auto restart DHCP 60 seconds after timeout

static char link_down_detect;
static u_long dhcp_timeout_timer = 0;


void set_link_down_detect(char link_status)
{
 link_down_detect = link_status;
}

char read_link_down_detect(void)
{
 return link_down_detect;
}

#endif

 And last, the code to restart DHCP:

 

#ifdef DHCP_CLIENT
   if (dhc_get_state(0 /*iface*/) != DHCS_BOUND) // (there's only one interface!)
   {
      if ((cticks - dhcp_timeout_timer) > DHCP_AUTO_RESTART_TIME)
      {
         link_down_detect = TRUE;   // this will restart DHCP
         dhcp_timeout_timer = cticks;
      }
   }
   else
      dhcp_timeout_timer = cticks;

   while(!(fec_mii_read(FEC_PHY0, PHY_REG_SR, &mymrdata))) // read status register
      ;
   if ((mymrdata & PHY_R1_LS) == 0)
      link_down_detect = TRUE;   // detect if link is down
   else
   {
      if (link_down_detect == TRUE)
      {
         if( POWERUP_CONFIG_DHCP_ENABLED )
         {
            //for (iface = 0; iface < STATIC_NETS; iface++)
            //{
            dprintf("\nRestarting DHCP...\n");
            dhc_set_callback_dhc_main_ipset(0 /*iface*/); // (there's only one interface!)
            dhc_state_init(0 /*iface*/, TRUE);
            //}
         }
      }
      link_down_detect = FALSE;
   }

   dhc_second();
#endif
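
The timer part of this can be exercised on its own. The sketch below is a stand-alone rework of the "restart if not bound for more than 60 seconds" check; the tick rate, the struct and the function name are assumptions for the illustration, not values taken from the stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_SECOND 200u                       /* assumed tick rate */
#define RESTART_INTERVAL (60u * TICKS_PER_SECOND)   /* 60 seconds */

struct restart_timer {
    uint32_t last_reset;   /* tick count when the timer was last reset */
};

/* Call once per poll. Returns true when DHCP should be restarted: the client
 * is not bound and more than RESTART_INTERVAL ticks have passed since the
 * timer was last reset. Unsigned subtraction keeps the elapsed time correct
 * when the tick counter wraps around. */
bool dhcp_restart_due(struct restart_timer *t, bool bound, uint32_t now)
{
    if (bound) {
        t->last_reset = now;   /* healthy: keep pushing the deadline out */
        return false;
    }
    if ((uint32_t)(now - t->last_reset) > RESTART_INTERVAL) {
        t->last_reset = now;   /* start a fresh interval after each restart */
        return true;
    }
    return false;
}
```

In the posted code the same decision is expressed by setting link_down_detect to TRUE, which makes the existing link-up path re-run dhc_state_init().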

 

Because the DTRAPs that appear after the initial 5-minute timeout make working with the console a little problematic, I removed that DTRAP from the code. It is located in DNSCLNT.C:

 

 

static int
dnc_sendreq(struct dns_querys * dns_entry)
{     
   PACKET pkt;
   struct dns_hdr *  dns;  /* outgoing dns header */
   char *   cp;   /* scratch pointer for building question field */
   int   server;  /* index of server to use */
  
  
   /* figure out which server to try */
   for (server = 0; server < MAXDNSSERVERS; server++)
   {
      if (dns_servers[server] == 0L)
         break;
   }
   if (server == 0)  /* no servers set? */
   {
     //dtrap();
      return ENP_LOGIC;
   }
   server = dns_entry->tries % server;

 

 At least, that was the DTRAP that did it for me...
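
To make the logic around that dtrap easier to follow, here is a stand-alone sketch of the server selection: count the configured servers, bail out if there are none, otherwise rotate through them by attempt number. The names and the table size are hypothetical, and the attempt counter is assumed to be non-negative.

```c
#define MAX_DNS_SERVERS 3     /* hypothetical; the stack uses MAXDNSSERVERS */
#define NO_SERVER (-1)

/* Count the leading non-zero entries, then pick one by attempt number.
 * Returns NO_SERVER when the table is empty. */
int pick_dns_server(const unsigned long servers[MAX_DNS_SERVERS], int tries)
{
    int count;

    for (count = 0; count < MAX_DNS_SERVERS; count++) {
        if (servers[count] == 0UL)
            break;
    }
    if (count == 0)
        return NO_SERVER;     /* nothing configured, e.g. DHCP never completed */
    return tries % count;
}
```

An empty server table is the expected state when DHCP has timed out, which would explain why the trap fires repeatedly while no link is available.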

 

For convenience, I once again added the modified source code in the attached ZIP file.

 

 

It is possible to reduce those initial 5 minutes to ease testing somewhat. Five minutes is a long time to wait! But don't make the time shorter than 2 minutes, as you will then start seeing some severe errors!

 

The time is defined in dhcsetup.c:

 

 

   if(dhcnets == 0)  /* no nets doing DHCP? */
      return;
     
   /* wait for DHCP activity to conclude */
//   while (((cticks - dhcp_started) < (30*TPS*10)) && // 5 minutes
   while (((cticks - dhcp_started) < (120*TPS)) && // 2 minutes
      (dhc_alldone() == FALSE))
   {
      /* let other tasks spin. This is required, since some systems
       * increment cticks in tasks, or use a polling task to receive
       * packets. Without this activity this loop will never exit.
       */
      tk_yield();
#ifndef SUPERLOOP
      /* In non-superloop systems the pktdemux task won't be running
       * yet since the network has not been marked as UP. Force processing
       * of received packets here.
       */
      if (rcvdq.q_len)
         pktdemux();
#endif
   }
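
The loop above is a tick-bounded wait: (30*TPS*10) ticks is 300 seconds and (120*TPS) ticks is 120 seconds, whatever TPS is. A minimal stand-alone sketch of the same pattern follows; the tick rate, the function names and the simulated tick source are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_SECOND 200u   /* assumed tick rate (TPS in the stack) */

/* Wait until done() reports true or timeout_s seconds worth of ticks have
 * elapsed. now() and done() are supplied by the caller.
 * Returns true if done() became true, false on timeout. */
bool wait_until_done(uint32_t (*now)(void), bool (*done)(void), uint32_t timeout_s)
{
    uint32_t start = now();
    uint32_t limit = timeout_s * TICKS_PER_SECOND;

    while ((uint32_t)(now() - start) < limit) {
        if (done())
            return true;
    }
    return done();
}

/* Simulated tick source for illustration: each call advances time one tick. */
uint32_t fake_ticks;
uint32_t fake_now(void) { return fake_ticks++; }
bool never_done(void) { return false; }
bool done_after_1000_ticks(void) { return fake_ticks >= 1000u; }
```

In the real loop the body also has to yield and pump received packets, otherwise cticks may never advance and the wait never ends.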

 

 

 

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

OK, here's a first step in the right (?) direction. I modified mcf5223x_ePHY_init in MCF52235_sysinit.c to reflect the behaviour of the 52259 version. This means that the board initialisation will hang in a loop for as long as no link is available. That link will be 10Mbps half duplex if it is then established.
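
The wait-for-link behaviour can be sketched as below. This is not the code from the attached file: the read function, the poll limit and the simulated PHY are hypothetical, and the link-status bit is assumed to be bit 2 of MII status register 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define MII_SR_LINK_STATUS 0x0004u   /* assumed link-status bit, register 1 */

/* Reads the PHY status register into *value; returns false if the read failed. */
typedef bool (*phy_read_fn)(uint16_t *value);

/* Poll until the link-status bit is set. Returns true once the link is up,
 * or false after max_polls unsuccessful polls (0 means wait forever). */
bool wait_for_link(phy_read_fn read_status, uint32_t max_polls)
{
    uint32_t polls = 0;
    uint16_t status = 0;

    for (;;) {
        if (read_status(&status) && (status & MII_SR_LINK_STATUS))
            return true;
        polls++;
        if (max_polls != 0 && polls >= max_polls)
            return false;
    }
}

/* Simulated PHY for illustration: the link comes up on the fifth read. */
unsigned sim_reads;
bool sim_phy_read(uint16_t *value)
{
    sim_reads++;
    *value = (sim_reads >= 5) ? MII_SR_LINK_STATUS : 0;
    return true;
}
```

Passing 0 for max_polls gives the "hang until a link appears" behaviour described here; a non-zero limit would let the application carry on without a network if that is preferable.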

 

I have attached the modified MCF52235_sysinit.c to this message.

Message Edited by Marc VDH on 2009-05-04 08:55 PM
0 Kudos
Reply
2,339 Views
Dave_at_Mot
Contributor III

That certainly helps.  It at least hides the problem which is good enough for now.  If this runs nice and stable for a while I may just leave it like this until I have time to get back to this.  Many thanks.

 

Still wondering if I should dive deeper into this or give MQX a try.  I have to believe that MQX should be a bit more robust.  

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

MQX looks very tempting to me too, but I'm afraid that my app ported to MQX may not fit within the 128kB limit of CodeWarrior Special Edition. But I think I will give it a try sometime.

 

I have figured out that the dtrap problem has to be located somewhere in the DNS client, but that's all I know for now.

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

I have tried this out tonight, starting my application up with no network connected. (I tried this out on my 52235EVB board but that's essentially the same as on the 52233DEMO board). After a short while the board decided no network is present, at the same time the red "COL" led turned on. On the console I then get IP address and subnet mask 0.0.0.0

 

When I then connect the network cable, the red LED goes off and the LNK LED turns on. Nothing seems to happen for quite a while, and by the time I get the impression that there might be a problem, the board suddenly gets an IP address assigned and my application starts running!

 

I have put my complete coldfire application on my web space so you can download it and check it out if you wish. Maybe there's a difference in my application configuration compared to yours which causes the difference in behaviour. You can download it here:

http://www.mvdh.be/Metaf.zip

 

In case you have an LCD connected as described in AN3518... it will probably not work with my app because I do not use the transistor interface between the demo board and the LCD. Instead, I use some "proper" level shifters that are non-inverting as opposed to the transistor circuit. I have modified the LCD driver software to reflect this non-inverting behaviour.

If it DOES work... well... the texts that appear on screen are in Dutch, my main language...

 

There are 4 different targets in the application. One is for the 52259DEMO board. It will probably not work on the 52233DEMO board. Then there's one for the 52235EVB board (called "testopstelling"). The remaining 2 are for the 52233DEMO board. The only difference between these targets (except for the CPU) was the MAC address, but I eliminated that difference in the version I put up here.

0 Kudos
Reply
2,339 Views
Dave_at_Mot
Contributor III

Marc,

 

Much appreciated.   I will dig in and see what I can find.   Interesting aviation app :)

 

Dave

 

0 Kudos
Reply
2,339 Views
Dave_at_Mot
Contributor III

Marc,

 

I've been going over things with my app and still have the following situation.  If I start the unit without a network cable installed and install it within the five-minute DHCP timeout period, everything works fine.  Once it does successfully get an IP address, I can then remove and reinstall the network cable repeatedly over a long period of time and it always recovers.

 

On the other hand, if I start the unit without a network cable and then wait just a bit more than 5 minutes from the sign-on message on the console before installing the network cable, DHCP times out and I can NEVER get the unit to recover.

 

I've reviewed my network settings and believe I have things set up just as you do.  I'd try your app but at the moment I'm still running the older version 6 of CodeWarrior.

 

Tomorrow I'm going to go back to the bone-stock Coldfire Lite project and see if I can replicate this problem.

 

Does anybody else out there have a Coldfire Lite project running with the HTTP client from Eric Gregori, and if so can they comment on this?

 

Dave

 

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

Yes, you're right. I see it now too, I guess I just didn't wait long enough for it to happen!

Now that I have seen it, I can start looking for it! ;)

0 Kudos
Reply
2,339 Views
Dave_at_Mot
Contributor III

Marc,  that's both good news and bad.   Apparently I didn't break it porting it to my project, so I can skip going back to the "as downloaded" Coldfire Lite project.  On the other hand, a network stack that can't recover from a DHCP timeout is pretty well broken already.  My project is an HVAC black box and really needs to work without any user intervention.  If I can't get past this I have a problem.

 

For the moment I may simply have to detect this condition and restart the stack.   Crude, but it may at least let me continue.   Fortunately the Coldfire isn't doing much more than network comm on this board and a restart wouldn't be too disruptive.

 

FYI, I did try a build with and without AUTOIP enabled and saw no difference.  

 

Any progress you could make would be MUCH appreciated.  I'm not quite enough of an IP expert to dig too far down into this myself.

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

I'll do my best to try to figure out what's going on, and if I find something I will definitely post it here.

 

I am definitely not an expert in this either. I'm just playing with Coldfire for about a year now. In the beginning I hardly knew anything about IP. But I learned a lot by digging through the software!

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

In case this is of any value... the M52259DEMO board does not show this behaviour. When it is powered up with no network attached, it just sits there doing nothing. The moment the network is connected, it immediately comes to life doing what it is supposed to do.

 

I believe this is because the 52259 PHY init routine remains in a wait loop as long as there is no physical network connection detected. As far as I have checked, the 52233 PHY init software does not do this.

 

I believe it is also worth checking what would happen when the physical link is OK at startup, but no DHCP server is available (with the DEMO board configured as DHCP client). I haven't done that test yet.

0 Kudos
Reply
2,339 Views
Dave_at_Mot
Contributor III

That's also very interesting.  It sounds like they are aware of this problem and are coding around it in the 52259 implementation.

 

I could live with the 52259 demo board behavior.  I'm guessing I could patch that in myself.  Perhaps I'll take a look at that later today.  Will also try the missing DHCP server test.

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

Maybe the difference in behaviour has something to do with the bug in the 5223x PHY. The external PHY on the 52259 board automatically negotiates the network link. All the software has to do is wait for the link to become active.

 

The bug in the 5223x PHY causes auto negotiation of the link to fail in some cases (see http://www.freescale.com/files/32bit/doc/errata/MCF52235DE.pdf). The negotiation has to be worked around by software.

 

Just a wild guess...

0 Kudos
Reply
2,339 Views
vier_kuifjes
Senior Contributor I

Hmmm... strange... I'm sure I tried that out already...

I will check this out tonight when I get home.

0 Kudos
Reply