We have MKW01-based end nodes running the 802.15.4g stack. They are mains powered (so there is no need for low-power modes) and are supposed to stay in RxOnWhenIdle mode all the time. However, about once a week, or sometimes only once a month, one of the nodes stops receiving data. Once we force the node to Tx a packet, it starts receiving again until the next time we hit this bug.
After a lot of trouble trying to reproduce this issue, we finally caught the bug on a node attached to the debugger. The problem, it seems, is that the Phy/Radio somehow went into Idle mode even though we set RxOnWhenIdle to true at init. We went through our event logs and found that this node stopped receiving right after it sent a broadcast packet about its status. Almost all our communication is unicast and Ack-based, but every once in a while (depending on a lot of factors related to the dynamics of the network topology) one of the nodes broadcasts its status. Right after this node sent the broadcast packet, it went into Idle. Note that the same node had been sending the same broadcast packet for the last week or so and did not stop receiving until yesterday, so the problem does not happen every time a packet is broadcast.
I went through some of the Phy code and found that PhyTxPacketSentEvent *might* be suspicious. I could very well be wrong because I don't know the Phy code too well, but it seems that if an Ack is not expected (as is the case for a broadcast packet), the PhyState is set to Idle and the radio never goes back into Rx. Since this code worked for over a week before incorrectly ending up in the Idle state, there must be some other code in the Phy state machine that checks whether RxOnWhenIdle is true and puts the radio back in Rx mode. I suspect there is an unhandled race condition in or around this piece of code, and it is causing us a lot of pain.
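To make the suspicion concrete, here is a minimal sketch of the logic as I understand it. This is not the actual Phy source: the types and function names (phy_t, phy_start_tx, tx_done_suspected, tx_done_defensive) are made up for illustration, and the real PhyTxPacketSentEvent may well look different. The first handler models the behaviour I think I am seeing; the second shows the kind of defensive change I have in mind, where Rx is restored in the same handler instead of relying on another code path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the Phy state.
 * These names are illustrative only, NOT the vendor's identifiers. */
typedef enum { PHY_IDLE, PHY_RX, PHY_TX, PHY_WAIT_ACK } phy_state_t;

typedef struct {
    phy_state_t state;
    bool rx_on_when_idle; /* mirrors the RxOnWhenIdle setting made at init */
} phy_t;

/* Assumed entry point: the radio leaves Rx to transmit a frame. */
void phy_start_tx(phy_t *phy)
{
    phy->state = PHY_TX;
}

/* Suspected behaviour: when no Ack is expected (broadcast), the state
 * drops to Idle and something else is expected to restart Rx later. */
void tx_done_suspected(phy_t *phy, bool ack_expected)
{
    assert(phy->state == PHY_TX);
    phy->state = ack_expected ? PHY_WAIT_ACK : PHY_IDLE;
}

/* Defensive variant: honour rx_on_when_idle inside the handler itself,
 * so there is no window in which the node sits in Idle. */
void tx_done_defensive(phy_t *phy, bool ack_expected)
{
    assert(phy->state == PHY_TX);
    if (ack_expected) {
        phy->state = PHY_WAIT_ACK;
    } else {
        phy->state = phy->rx_on_when_idle ? PHY_RX : PHY_IDLE;
    }
}
```

In this model, a broadcast handled by tx_done_suspected leaves the node in PHY_IDLE even with rx_on_when_idle set, while tx_done_defensive returns it to PHY_RX. If the real code depends on a separate step to re-enter Rx, anything that pre-empts or skips that step would produce the stuck-in-Idle symptom described above.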
Can anyone please help shed some light on this?