We have encountered a rare but persistent problem in our Zigbee network, which consists of a coordinator, three Zigbee routers (one of which is an NXP-based MKW41Z512 Zigbee router that we maintain) and nine Zigbee end devices.
Occasionally (every couple of months or so) one of the sleepy end devices (a third party device not developed by us) becomes unreachable. This device continues to send MAC data poll requests to our NXP-based router and receives ACKs from the router, but the frame pending bit in the ACK is always set to 0. I saw this in a Wireshark log taken after the problem occurred. On the end device screen we see a signal icon, which means it thinks it is connected to the network.
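For anyone checking the same thing in their own capture, here is a minimal sketch of how the frame pending bit can be read from a raw ACK. It assumes the buffer starts at the IEEE 802.15.4 frame control field and excludes any PHY header; the helper name ack_has_frame_pending is hypothetical and not part of the NXP stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IEEE 802.15.4 frame control field (FCF): 16 bits, low byte sent first. */
#define FCF_FRAME_TYPE_MASK 0x0007u /* bits 0-2: frame type */
#define FCF_FRAME_TYPE_ACK  0x0002u /* frame type 2 = acknowledgment */
#define FCF_FRAME_PENDING   0x0010u /* bit 4: frame pending */

/* Hypothetical helper: true only if `frame` is an ACK with frame pending set.
 * `frame` must point at the first FCF byte. */
bool ack_has_frame_pending(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 3) {
        return false; /* need FCF (2 bytes) + sequence number (1 byte) */
    }
    uint16_t fcf = (uint16_t)(frame[0] | ((uint16_t)frame[1] << 8));
    if ((fcf & FCF_FRAME_TYPE_MASK) != FCF_FRAME_TYPE_ACK) {
        return false; /* not an ACK frame */
    }
    return (fcf & FCF_FRAME_PENDING) != 0;
}
```

In the failure case described above, every ACK sent in response to the data poll would make this return false, so the end device never stays awake to receive queued data.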
The end device is configured to report an attribute every two minutes and it appears that the device knows that it cannot reach the coordinator, so it broadcasts a network address request with the coordinator's extended address to attempt to send the report. Although the coordinator receives the network address request, it cannot respond due to a source route failure to this end device. The coordinator then sends a source route request, but no other devices respond, resulting in the device becoming completely unreachable from the network.
Using an LQI request (Mgmt_Lqi_req) to our NXP-based router, I confirmed that this end device is not in the router's child table, yet the end device continues to treat this router as its parent by sending MAC data poll requests. The router's LQI response also shows this end device marked as a sibling rather than a child. Three other end devices connected to the same router were working fine and were listed as children.
Rebooting the NXP-based router solves the problem by causing the end device to rejoin through another router on the network.
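To show how the sibling/child distinction can be read from the response, here is a rough sketch that decodes one neighbor table entry of a Mgmt_Lqi_rsp. The 22-byte entry layout and the bit positions are my reading of the ZDP neighbor table list record format and should be checked against the specification; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: extended PAN ID (8) + extended address (8) +
 * network address (2) + flags (1) + permit joining (1) + depth (1) + LQI (1). */
#define NT_ENTRY_LEN    22u
#define NT_FLAGS_OFFSET 18u

/* Relationship values as assumed from the ZDP neighbor table record. */
enum nt_relationship {
    NT_REL_PARENT = 0,
    NT_REL_CHILD = 1,
    NT_REL_SIBLING = 2,
    NT_REL_NONE = 3,
    NT_REL_PREVIOUS_CHILD = 4
};

/* Returns the relationship field (bits 4-6 of the flags byte), or -1 if short. */
int nt_entry_relationship(const uint8_t *entry, size_t len)
{
    assert(entry != NULL);
    if (len < NT_ENTRY_LEN) {
        return -1;
    }
    return (entry[NT_FLAGS_OFFSET] >> 4) & 0x07;
}

/* Returns the device type (bits 0-1 of the flags byte), or -1 if short.
 * 0 = coordinator, 1 = router, 2 = end device, 3 = unknown. */
int nt_entry_device_type(const uint8_t *entry, size_t len)
{
    assert(entry != NULL);
    if (len < NT_ENTRY_LEN) {
        return -1;
    }
    return entry[NT_FLAGS_OFFSET] & 0x03;
}
```

With this layout, the stuck device shows up as device type 2 (end device) with relationship 2 (sibling), while the three healthy end devices report relationship 1 (child).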
According to Zigbee specification R22, section 3.6.10.4 MAC Data Poll processing, a parent node should send a leave and rejoin request to an end device when it receives a MAC Data Poll from a device that doesn't exist in the neighbour table, but our router for some reason didn't issue this request.
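The behaviour I expected can be sketched roughly as follows. This is only an illustration of my reading of that section, not the NXP stack's code: all type and function names are hypothetical, and treating a sibling entry the same as a missing entry is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    REL_PARENT, REL_CHILD, REL_SIBLING, REL_NONE, REL_PREVIOUS_CHILD
} relationship_t;

/* Hypothetical, simplified neighbor table entry. */
typedef struct {
    uint16_t nwk_addr;
    relationship_t rel;
} neighbor_t;

typedef enum {
    POLL_SERVE_CHILD,           /* normal case: deliver any pending data */
    POLL_SEND_LEAVE_WITH_REJOIN /* tell the poller to leave and rejoin */
} poll_action_t;

/* Decide what a parent should do with a MAC data poll from `src`.
 * Assumption: any entry that is not a child counts as "not found". */
poll_action_t on_data_poll(const neighbor_t *table, size_t count, uint16_t src)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].nwk_addr == src && table[i].rel == REL_CHILD) {
            return POLL_SERVE_CHILD;
        }
    }
    return POLL_SEND_LEAVE_WITH_REJOIN;
}
```

In our case the poller is present in the table only as a sibling, so under this reading the router should have answered with a leave request with rejoin set, which is what we do not observe.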
So my questions are:
Our router is based on NXP Zigbee stack version zigbee 3_0_6.0.9 (SDK v2.2.3).
Thank you for any insights or guidance on how to address this issue.
Hello,
Hope you are doing well. Could you please clarify which example you used as the base for your Router application? What modifications did you make?
Can the behavior you are seeing be reproduced with the example application without modifications?
I would recommend checking sections 5.6.1 and 5.6.2 of the "ZigBee 3.0 Stack User Guide.pdf" (Path: SDK\docs\wireless\Zigbee).
Hope this helps!
Regards,
Ricardo
Hi Ricardo,
Thanks for your reply.
The project started a couple of years ago and I'm not sure at the moment which example we used. I think it was:
SDK_2_2_3_MKW41Z512xxx4\boards\frdmkw41z\wireless_examples\zigbee_3_0\router\freertos\
Regarding the differences we have compared to the example:
// app_zps_cfg.h
#define ZPS_BIND_REQUEST_SERVER_TIME_INTERVAL 11 // was 1
#define ZPS_APS_AIB_INIT_USE_INSTALL_CODE TRUE // was FALSE
#define AF_SIMPLE_DESCRIPTOR_TABLE_SIZE 3 // was 2
// We increased it to fix ZPS_XS_E_NO_FREE_APS_ACK error as per
// https://community.nxp.com/t5/MCUXpresso-General/ZPS-XS-E-NO-FREE-APS-ACK-ZPS-APL-APS-E-ILLEGAL-REQUEST/td-p/958539
#define APS_DCFM_RECORD_POOL_SIZE 15 // was 5
#define ZPS_ROUTE_DISCOVERY_TABLE_SIZE 7 // was 4
#define ZPS_BINDING_TABLE_SIZE 6 // was 4
// Increased it so that it is possible to join 7 child devices to our Router,
// before this change it was possible to join 4 child devices
#define ZPS_CHILD_TABLE_SIZE 8 // was 5

// config.h
#define PDM_APP_ID PDM_APP_BDB_ZC_ID // In example it is equal to PDM_APP_BDB_ZR_ID

// app_mac_cfg.h
#define gMacTaskStackSize_c 900 // was 1300

// app_framework_config.h
#define PoolsDetails_c \
    _block_size_ 64 _number_of_blocks_ 8 _pool_id_(0) _eol_ \
    _block_size_ 180 _number_of_blocks_ 25 _pool_id_(0) _eol_ /* num of blocks was 20 */ \
    _block_size_ 256 _number_of_blocks_ 9 _pool_id_(0) _eol_ /* num of blocks was 6 */

// app_pdum_cfg.h
#define PdumsDetails_c \
    _pdum_handler_name_(pdum_apduZDP) _pdum_block_size_(100) _pdum_queue_size_(6) _eol_ /* queue size was 3 */ \
    _pdum_handler_name_(pdum_apduZCL) _pdum_block_size_(100) _pdum_queue_size_(20) _eol_ /* queue size was 10 */
While writing this reply, I noticed that our PDM_APP_ID is set to PDM_APP_BDB_ZC_ID, whereas the example uses PDM_APP_BDB_ZR_ID. Should we change it to PDM_APP_BDB_ZR_ID?
Could you clarify the behavior of the NXP 3_0_6.0.9 (SDK v2.2.3) router when it receives a MAC data poll from a device marked as a sibling or former child in the neighbor table? Does it send a leave request with rejoin to such a device?
Thank you in advance for your reply!