Cannot rejoin Thread network


phantomgz
Contributor III

It is a little complicated to describe this situation.

The test setup is the same as in "Thread Border Router using Linux + KW as Host Controlled Device with Serial TAP for DHCPv6-PD".


1. A REED runs "thr create" to start a new network; this REED then becomes the Leader and commissioner.

2. A "Thread host controlled device" acting as the border router (BR) runs "thr join" and successfully joins the new network.

3. After a few seconds, the BR requests to become a router, and a router ID is assigned.

4. Disconnect the REED (the Leader) from the network, using "thr detach" or "reboot".

5. After a few minutes, the BR changes its role to "Router/Leader".

pastedImage_1.png

6. The REED runs "thr join" again and attaches successfully. It immediately shows "Requesting to become Active Router".

But the REED never gets a router ID assigned, and shows "Requesting to become Active Router" over and over.

The REED tries to join:

pastedImage_2.png

The attached file was captured with Wireshark.

An ED tries to join:

ED join.png

At this point the network seems to be dead.

7. Power off or "reset" the BR. After a moment,

the REED shows "Node has taken the Leader role, (Local) Commissioner Started".

REED taken Leader.png

After that, everything runs smoothly again.

8. An ED runs "thr join" successfully and shows "Attached to network with PAN ID: xxxx".

pastedImage_25.png

9. The BR runs "thr join" and attaches to the network successfully. Later it shows "Device is Router, Router Id assigned".

pastedImage_26.png

10 Replies

jc_pacheco
NXP Employee

Hi Diego,

A new version of the SDK (SDK Tag: rel_conn_ksdk_2.2_kw41z_zigbee_6.0.9_RC4.1) was recently released; please upgrade your SDK.

I was able to confirm that the issue is fixed in the latest SDK, but it requires a few additional patches (attached).

There was major refactoring between 1.1.1.28 and 1.1.1.31. THR_SERIAL_TUN_USE_DHCP6_ADDR is no longer available. To get the same behavior, BR_ROUTER_MODE has to be set and SERIAL_TUN_IF enabled.

By default the application generates a new prefix after each factory reset of the border router; if you want a specific prefix you can of course change it manually. The hard-coded IP addresses (FD01::01, FD01::02) are not needed anymore. They were used by TUN mode, but TAP uses ND and app_border_router.c. The Linux host configures itself via the Router Advertisements from the BR, so make_tap.sh is also updated (attached).

You just need to configure SERIAL_TUN_IF and BR_ROUTER_MODE in /source/config.h:

    #define SERIAL_TUN_IF                1
...
    #define BR_ROUTER_MODE               1
    #define BR_HOST_MODE                 0
...
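
A minimal sketch of a compile-time guard for the settings above, assuming the three macros are plain 0/1 flags. The values are repeated here only so the fragment compiles on its own; in the project they would come from config.h. Treating BR_ROUTER_MODE and BR_HOST_MODE as mutually exclusive is an assumption drawn from the values shown, not something stated in this thread.

```c
#include <assert.h>  /* static_assert (C11) */

/* Values copied from the reply above so this fragment is self-contained;
 * in the project they would come from config.h instead. */
#define SERIAL_TUN_IF   1
#define BR_ROUTER_MODE  1
#define BR_HOST_MODE    0

/* From the reply: BR_ROUTER_MODE has to be set and SERIAL_TUN_IF enabled. */
static_assert(SERIAL_TUN_IF == 1, "SERIAL_TUN_IF must be enabled");
static_assert(BR_ROUTER_MODE == 1, "BR_ROUTER_MODE must be set");

/* Assumption: the two border router modes are mutually exclusive. */
static_assert(!(BR_ROUTER_MODE && BR_HOST_MODE),
              "BR_ROUTER_MODE and BR_HOST_MODE should not both be set");
```

If one of the flags is changed by mistake, the build stops with the corresponding message instead of producing a border router that silently behaves differently.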

The attached files should be available in an upcoming Maintenance release (no ETA yet).

Regards,

JC

jc_pacheco
NXP Employee

Hi Diego,

I was able to reproduce the issue: the BR device becomes unresponsive after becoming a leader again. I noticed that a reset of the Host Controlled Device board restores communication; please use this as a workaround while the issue is addressed.
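
The reset workaround could be automated on the host side. Below is a minimal sketch, assuming the host periodically probes the HCD with some command that expects a reply and records whether an answer arrived. All names are hypothetical and not part of the SDK, and the reset itself (power cycle, reset command, or similar) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side watchdog; none of these names come from the SDK. */
typedef struct {
    unsigned missed;     /* consecutive probes that got no reply */
    unsigned threshold;  /* misses tolerated before asking for a reset */
} hcd_watchdog_t;

void hcd_watchdog_init(hcd_watchdog_t *w, unsigned threshold)
{
    assert(w != NULL && threshold > 0);
    w->missed = 0;
    w->threshold = threshold;
}

/* Record the outcome of one probe.
 * Returns true when the caller should reset the HCD board. */
bool hcd_watchdog_report(hcd_watchdog_t *w, bool got_reply)
{
    assert(w != NULL);
    if (got_reply) {
        w->missed = 0;          /* any reply clears the miss counter */
        return false;
    }
    w->missed++;
    if (w->missed >= w->threshold) {
        w->missed = 0;          /* start counting afresh after the reset */
        return true;
    }
    return false;
}
```

Requiring several consecutive misses before resetting avoids rebooting the board because of a single lost reply.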

Also, is the ULA required in your topology? Have you considered using a regular Host Controlled Device (THR_SERIAL_TUN 0) with a mesh-local address (ML64)? I noticed that the issue does not occur with this topology.

Thanks for the feedback, I'll let you know when I have some news.

Regards,

JC

diegocomin
Contributor IV

Hi JC,

I have another issue related to my topology. I need to keep the tunnel application (Thread_Shell or Thread_KW_Tun) running, and I also need to periodically send "getnodesip" with the Python THCI, as in the example "host_sdk\hsdk-python\src\com\nxp\wireless_connectivity\test\getaddr.py". When the tunnel application is open and I run python getaddr.py /dev/ttyACM0 in the terminal, I do not receive the node IPs properly, but when I close the tunnel application and run python getaddr.py /dev/ttyACM0, I obtain the node IPs without any problem.

I saw in the Thread_Shell source code that the TAP routine is tied to a callback that receives all the RX responses to THCI commands, so when the TAP routine is open the RX path is busy and I receive nothing with getaddr.py. Is there any way to keep the tunnel application (Thread_Shell) running while periodically sending THCI commands from Python and receiving the answers in the Python callbacks?
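
One possible direction, sketched under assumptions: when two programs read the same serial device, each received byte goes to only one of them, so a single process would need to own /dev/ttyACM0 and pass every received frame to whichever consumer registered for it (the tunnel path or a command/response handler). The frame layout and all names below are hypothetical and only illustrate the dispatch pattern; they are not the Thread_Shell or host SDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout, for illustration only. */
typedef struct {
    uint8_t group;           /* opcode group of the frame */
    uint8_t opcode;          /* opcode within the group */
    const uint8_t *payload;
    size_t length;
} frame_t;

typedef void (*frame_handler_t)(const frame_t *frame, void *ctx);

typedef struct {
    uint8_t group;
    uint8_t opcode;
    frame_handler_t handler;
    void *ctx;
} route_t;

#define MAX_ROUTES 8

typedef struct {
    route_t routes[MAX_ROUTES];
    size_t count;
    frame_handler_t fallback;  /* e.g. the tunnel path; may be NULL */
    void *fallback_ctx;
} dispatcher_t;

void dispatcher_init(dispatcher_t *d, frame_handler_t fallback, void *ctx)
{
    assert(d != NULL);
    d->count = 0;
    d->fallback = fallback;
    d->fallback_ctx = ctx;
}

/* Register a handler for one (group, opcode) pair.
 * Returns 0 on success, -1 if the table is full. */
int dispatcher_register(dispatcher_t *d, uint8_t group, uint8_t opcode,
                        frame_handler_t handler, void *ctx)
{
    assert(d != NULL && handler != NULL);
    if (d->count == MAX_ROUTES) {
        return -1;
    }
    d->routes[d->count++] = (route_t){ group, opcode, handler, ctx };
    return 0;
}

/* Deliver one frame. Returns 1 if a registered handler took it,
 * 0 if it went to the fallback (or was dropped because none is set). */
int dispatcher_dispatch(const dispatcher_t *d, const frame_t *frame)
{
    assert(d != NULL && frame != NULL);
    for (size_t i = 0; i < d->count; i++) {
        const route_t *r = &d->routes[i];
        if (r->group == frame->group && r->opcode == frame->opcode) {
            r->handler(frame, r->ctx);
            return 1;
        }
    }
    if (d->fallback != NULL) {
        d->fallback(frame, d->fallback_ctx);
    }
    return 0;
}

/* Example handler: counts the frames it receives through ctx. */
void count_frames(const frame_t *frame, void *ctx)
{
    (void)frame;
    (*(unsigned *)ctx)++;
}
```

Whether this is practical depends on how far Thread_Shell can be extended, which this thread does not settle.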

This question is related to my post here: 

https://community.nxp.com/thread/500356

I am asking you directly because you have correctly answered all the questions I have posted here so far. I feel you are the one who can help me in this area.

Regards,

Diego Comín

diegocomin
Contributor IV

Hi JC,

Thanks for taking care of the issue, I appreciate it. I also noticed that a reset of the HCD restores communication, but I do not think it is a "clean" solution. I would prefer to wait for news about this issue, but I will reset the HCD as a workaround for now.

Yes, the ULA is required in our topology because the Raspberry Pi needs a direct IP connection to the Thread nodes, and this can be achieved only with the static ULA. Our application needs to communicate over CoAP between the Raspberry Pi and the Thread network.

Regards,

Diego Comin

jc_pacheco
NXP Employee

Hi Diego,

What SDK version are you using?

Keep in mind that the "make_tap.sh" command should only be called once, even if you stop the tunnel application (Thread_Shell or Thread_KW_TUN).

Could you also provide some sniffer logs?

diegocomin
Contributor IV

Hi JC,

I am using SDK version 2.2.0 (released 2019-01-16). I run "make_tap.sh" every time my Border Router is powered off (because the Raspberry Pi also turns off and does not keep the TAP tunnel when it powers on again). When I close the tunnel application (Thread_Shell or Thread_KW_TUN) I do not run "make_tap.sh" again, because I know the tunnel is already set up.

I am providing the sniffer logs; I followed exactly the steps I posted before. At step 9, when my REED is requesting to become Active Router in a loop, if I enter "getnodesip" I receive nothing, but I see some logs on the sniffer. However, at step 10, when I enter "getnodesip" from the tunnel application (Thread_Shell) and receive nothing, I cannot see any logs on the sniffer related to my "getnodesip" command. It is as if I were not in the Thread network.

Keep in mind that in this Border Router setup I do not use a DHCPv6-PD router. I use TAP with a static ULA (so I disable the THR_SERIAL_TUN_USE_DHCP6_ADDR define in the HCD program). I do not know whether this affects the network behaviour.

Regards,

Diego Comín

jc_pacheco
NXP Employee

The "Thread Border Router using Linux + KW as Host Controlled Device with Serial TAP for DHCPv6-PD" post has just been updated with the information for the current SDK, and I was unable to reproduce the issue. Can you take a look and verify whether the issue still happens?

diegocomin
Contributor IV

Hi Juan Carlos,

I have more or less the same issue posted by caoxi cai. I created a Border Router using a Raspberry Pi (Raspbian) plus a Host Controlled Device using a TAP tunnel. I followed your tutorial in the https://community.nxp.com/docs/DOC-334294 post, except that I do not use a DHCPv6-PD router (I receive data on the Pi with a CoAP Python library and send data to the cloud using MQTT over the Raspberry Pi's Wi-Fi). I use TAP but keep using a static ULA (so I disable the THR_SERIAL_TUN_USE_DHCP6_ADDR define in the HCD program). I use the SDK released 2019-01-16.

The issue is the following:

1) I have a router_eligible_device (REED) and a Border Router (a Raspberry Pi with Raspbian connected to a Host Controlled Device, HCD).

2) I modified the REED program to auto-start Thread (#define APP_AUTOSTART 1). I power on the REED first, so it starts searching for any Thread network.

pastedImage_3.png

3) Then I power on my Border Router (BR) and create a TAP interface (sudo bash make_tap.sh).

pastedImage_4.png

pastedImage_5.png

4) Then I start Thread_Shell (sudo ./Thread_Shell /dev/ttyACM0 25):

pastedImage_6.png

5) At this point I create a Thread network in Thread_Shell (thr create), and the REED joins the network and becomes a Router:

pastedImage_12.png

6) Then I power off my Border Router, and the REED becomes the leader of the network:

pastedImage_13.png

7) At this point I power on my Border Router and run make_tap.sh and Thread_Shell /dev/ttyACM0, and by default I am in the same Thread network as the REED (I enter "getnodesip" to see the other node, and I ping it from another Raspberry Pi terminal to check that I have IP connectivity with it):

pastedImage_9.png

8) Then I power off the REED, and my Border Router becomes the leader of the network:

pastedImage_10.png

9) Finally I power on the REED. Apparently it is connected to the same network (I run ifconfig and it shows the same ULA IPv6 address), but I get nothing from the "getnodesip" command, and it keeps requesting to become Active Router in a loop.

pastedImage_11.png

10) I enter getnodesip on the BR but receive nothing, so my node has disappeared from the Thread network.

pastedImage_21.png

Is this a bug? If it is, I think the Thread network is not robust: I can easily lose the connections to the Thread nodes.

Can someone give me a solution to this issue?

Regards,

Diego Comín

estephania_mart
NXP TechSupport

Hello, 

I believe the document you are referring to is outdated; we will check it and update it if needed.

The current Thread stack version is Kinetis Thread Stack 1.2.6.

Could you please use the latest SDK and check the Kinetis FSCI Host Application Programming Interface?

Regards, 

Estephania 

phantomgz
Contributor III

I did use the latest SDK, KSDK 2.2.0 (released 2018-09-04).

My guess is that after the BR is promoted to Leader, commissioning switches from native to external.

At that point I don't have an external commissioner, and the referenced document doesn't even mention this.

But the most critical parts of the Thread stack are not open source, so I can't dig in to find out what happened.

In addition, I cannot find any official NXP documentation about the Thread border router.
