Hi Luis,
We've been able to reproduce the problem with a debugger attached again! First of all, the problem isn't that rxOnWhenIdle is 0. It is 1, as expected, but mPhyState is still 0 (gIdle_c), as noted before. I've gathered as much PHY state as I could, and I'd love to dig into this further and get you any other state you'd like. (Fortunately, the debugger is still attached, and I'll try to keep it that way until we're sure we don't need it anymore.)
Here's what I think I've found so far:
1. rxData was:
rxData <array>" " 0x20001A00 uint8_t[254]
[0] '.' (0x02) 0x20001A00 uint8_t
[1] '\0' (0x00) 0x20001A01 uint8_t
[2] 'Œ' (0x8C) 0x20001A02 uint8_t
[3] '\r' (0x0D) 0x20001A03 uint8_t
[4] '.' (0x10) 0x20001A04 uint8_t
[5] '€' (0x80) 0x20001A05 uint8_t
[6] '\0' (0x00) 0x20001A06 uint8_t
[7] '\0' (0x00) 0x20001A07 uint8_t
[8] '.' (0x0F) 0x20001A08 uint8_t
[9] '\0' (0x00) 0x20001A09 uint8_t
[10] 'z' (0x7A) 0x20001A0A uint8_t
[11] '.' (0x1E) 0x20001A0B uint8_t
[12] '.' (0x04) 0x20001A0C uint8_t
[13] '\r' (0x0D) 0x20001A0D uint8_t
[14] '.' (0x10) 0x20001A0E uint8_t
[15] '\0' (0x00) 0x20001A0F uint8_t
[16] '.' (0x01) 0x20001A10 uint8_t
[17] '\0' (0x00) 0x20001A11 uint8_t
[18] 'z' (0x7A) 0x20001A12 uint8_t
[19] '.' (0x1E) 0x20001A13 uint8_t
[20] '.' (0x04) 0x20001A14 uint8_t
Note that the frame control LSB is 0x02, which means the last packet received was an ACK, with dst address = 0x041E7A000F000080 and src address = 0x041E7A000100100D. The src is the co-ordinator of the network that the "hung" device belongs to; the hung device's own address is 0x041E7A00040003B5. It's very likely that everything in rxData after index 2 is left over from the previous packet, but our application logs on the co-ordinator suggest that the radio hung right after device 0x041E7A000F000080 sent a status packet to the co-ordinator and the co-ordinator replied with an ACK.
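In case it helps to double-check the decoding above, here is a minimal sketch of how the frame type and the two addresses can be read out of the dumped bytes. The helper names (frame_type, read_le64, rx_dump) are made up for illustration and are not part of the PHY code. The offsets 5 and 13 are simply where the two quoted addresses line up in the dump, not something taken from the stack's parser.

```c
#include <stddef.h>
#include <stdint.h>

/* IEEE 802.15.4 frame types, carried in bits 0..2 of the first
 * frame-control byte. */
enum {
    FRAME_TYPE_BEACON = 0,
    FRAME_TYPE_DATA   = 1,
    FRAME_TYPE_ACK    = 2,
    FRAME_TYPE_CMD    = 3
};

/* Extract the frame type from the frame-control LSB. */
static uint8_t frame_type(uint8_t fc_lsb)
{
    return (uint8_t)(fc_lsb & 0x07u);
}

/* Assemble a 64-bit value from 8 bytes stored least-significant byte first. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8u * i);
    }
    return v;
}

/* rxData[0..20], copied by hand from the debugger dump. */
static const uint8_t rx_dump[21] = {
    0x02, 0x00, 0x8C, 0x0D, 0x10,
    0x80, 0x00, 0x00, 0x0F, 0x00, 0x7A, 0x1E, 0x04,
    0x0D, 0x10, 0x00, 0x01, 0x00, 0x7A, 0x1E, 0x04
};
```

With these bytes, frame_type(rx_dump[0]) is FRAME_TYPE_ACK, read_le64(&rx_dump[5]) is 0x041E7A000F000080 and read_le64(&rx_dump[13]) is 0x041E7A000100100D.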
2. Just to note again, mPhyState '\0' (0x00) 0x20002AD7 uint8_t
3. The phyLocal state:
phyLocal <struct> 0x1FFFF1D8 Phy_PhyLocalStruct_t
PD_MAC_SapHandler 0x0000B521 0x1FFFF1D8 PD_MAC_SapHandler_t
PLME_MAC_SapHandler 0x0000B543 0x1FFFF1DC PLME_MAC_SapHandler_t
phyTaskEventId <struct> 0x1FFFF1E0 event_t
macPhyInputQueue <struct> 0x1FFFF204 list_t
head 0x00000000 0x1FFFF204 struct listElement_tag *
tail 0x00000000 0x1FFFF208 struct listElement_tag *
size 0 0x1FFFF20C uint16_t
max 0 0x1FFFF20E uint16_t
maxFrameWaitTime 7610 0x1FFFF210 uint32_t
txParams <struct> 0x1FFFF214 phyTxParams_t
numOfCca '\0' (0x00) 0x1FFFF214 uint8_t
ackRequired gPhyRxAckRqd_c 0x1FFFF215 phyAckRequired_t
<union> 0x1FFFF218 union <Unnamed 98>
rxParams <struct> 0x1FFFF218 phyRxParams_t
timeStamp 856552102 0x1FFFF218 uint64_t
psduLength '.' (0x05) 0x1FFFF220 uint8_t
linkQuality '8' (0x38) 0x1FFFF221 uint8_t
headerLength '.' (0x03) 0x1FFFF222 uint8_t
macDataIndex '.' (0x02) 0x1FFFF223 uint8_t
fifoBlockLen '.' (0x05) 0x1FFFF224 uint8_t
phyHeader <struct> 0x1FFFF226 phyPHR_t
<union> 0x1FFFF226 union <Unnamed 69>
mask 24 0x1FFFF226 uint16_t
byteAccess <array>" " 0x1FFFF226 uint8_t[2]
[0] '.' (0x18) 0x1FFFF226 uint8_t
[1] '\0' (0x00) 0x1FFFF227 uint8_t
<struct> 0x1FFFF226 struct <Unnamed 68>
modeSwitch '\0' (0x00) 0x1FFFF226 uint8_t
reserved '\0' (0x00) 0x1FFFF226 uint8_t
fcsType '.' (0x01) 0x1FFFF226 uint8_t
dataWhitening '.' (0x01) 0x1FFFF226 uint8_t
frameLengthRsvd '\0' (0x00) 0x1FFFF226 uint8_t
frameLength '\0' (0x00) 0x1FFFF227 uint8_t
channelParams <struct> 0x1FFFF218 phyChannelParams_t
<union> 0x1FFFF218 union <Unnamed 97>
channelStatus '¦' (0xA6) 0x1FFFF218 phyStatus_t
energyLeveldB '¦' (0xA6) 0x1FFFF218 uint8_t
ccaParam 'ò' (0xF2) 0x1FFFF219 uint8_t
flags <struct> 0x1FFFF228 phyFlags_t
<union> 0x1FFFF228 union <Unnamed 91>
mask 2053 0x1FFFF228 uint32_t
<struct> 0x1FFFF228 struct <Unnamed 89>
rxOnWhenIdle 1 0x1FFFF228 uint32_t
rxFramePending 0 0x1FFFF228 uint32_t
idleRx 1 0x1FFFF228 uint32_t
ccaBfrTX 0 0x1FFFF228 uint32_t
rxAckRqd 0 0x1FFFF228 uint32_t
autoAck 0 0x1FFFF228 uint32_t
panCordntr 0 0x1FFFF228 uint32_t
promiscuous 0 0x1FFFF228 uint32_t
activePromiscuous 0 0x1FFFF228 uint32_t
cslRxEnabled 0 0x1FFFF228 uint32_t
rxEnhAckRqd 0 0x1FFFF228 uint32_t
ccaComplete 1 0x1FFFF228 uint32_t
tschEnabled 0 0x1FFFF228 uint32_t
filterFail 0 0x1FFFF228 uint32_t
rxIsListen 0 0x1FFFF228 uint32_t
reserved 0 0x1FFFF228 uint32_t
startTime 18446744073709551615 0x1FFFF230 uint64_t
phyUnavailableQueuePos 0 0x1FFFF238 uint16_t
phyIndirectQueue <array> 0x1FFFF23A uint16_t[10]
[0] 0 0x1FFFF23A uint16_t
[1] 0 0x1FFFF23C uint16_t
[2] 0 0x1FFFF23E uint16_t
[3] 0 0x1FFFF240 uint16_t
[4] 0 0x1FFFF242 uint16_t
[5] 0 0x1FFFF244 uint16_t
[6] 0 0x1FFFF246 uint16_t
[7] 0 0x1FFFF248 uint16_t
[8] 0 0x1FFFF24A uint16_t
[9] 0 0x1FFFF24C uint16_t
fcs 0x33B0 0x1FFFF24E uint16_t
macPanID <array>" þÿµ " 0x1FFFF250 uint8_t[2]
[0] '\r' (0x0D) 0x1FFFF250 uint8_t
[1] '.' (0x10) 0x1FFFF251 uint8_t
macShortAddress <array>"þÿµ " 0x1FFFF252 uint8_t[2]
[0] 'þ' (0xFE) 0x1FFFF252 uint8_t
[1] 'ÿ' (0xFF) 0x1FFFF253 uint8_t
macLongAddress <array>"µ " 0x1FFFF254 uint8_t[8]
[0] 'µ' (0xB5) 0x1FFFF254 uint8_t
[1] '.' (0x03) 0x1FFFF255 uint8_t
[2] '\0' (0x00) 0x1FFFF256 uint8_t
[3] '.' (0x04) 0x1FFFF257 uint8_t
[4] '\0' (0x00) 0x1FFFF258 uint8_t
[5] 'z' (0x7A) 0x1FFFF259 uint8_t
[6] '.' (0x1E) 0x1FFFF25A uint8_t
[7] '.' (0x04) 0x1FFFF25B uint8_t
currentMacInstance '\0' (0x00) 0x1FFFF25C uint8_t
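As a cross-check on the flags, the raw mask value 2053 (0x805) can be decoded by hand. The sketch below assumes that each field in phyFlags_t is a single bit and that the bits are allocated in the order the debugger lists them, starting at bit 0. That layout is an assumption based on the dump, not on the header, and the FLAG_* names and flag_is_set are made up for illustration.

```c
#include <stdint.h>

/* Assumed bit positions: one bit per field, in the order shown in the
 * phyFlags_t dump (hypothetical names, not the stack's own). */
enum {
    FLAG_RX_ON_WHEN_IDLE    = 0,
    FLAG_RX_FRAME_PENDING   = 1,
    FLAG_IDLE_RX            = 2,
    FLAG_CCA_BFR_TX         = 3,
    FLAG_RX_ACK_RQD         = 4,
    FLAG_AUTO_ACK           = 5,
    FLAG_PAN_CORDNTR        = 6,
    FLAG_PROMISCUOUS        = 7,
    FLAG_ACTIVE_PROMISCUOUS = 8,
    FLAG_CSL_RX_ENABLED     = 9,
    FLAG_RX_ENH_ACK_RQD     = 10,
    FLAG_CCA_COMPLETE       = 11,
    FLAG_TSCH_ENABLED       = 12,
    FLAG_FILTER_FAIL        = 13,
    FLAG_RX_IS_LISTEN       = 14
};

/* Return 1 if the given bit is set in the mask, 0 otherwise. */
static int flag_is_set(uint32_t mask, unsigned bit)
{
    return (int)((mask >> bit) & 1u);
}
```

Under that assumption, 2053 = 2048 + 4 + 1 has exactly bits 0, 2 and 11 set, which agrees with the individual fields shown in the dump: rxOnWhenIdle, idleRx and ccaComplete are 1 and everything else is 0.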
4. To put it all together (this part I'm not very sure about!):
- The radio started receiving an ACK sent by the co-ordinator and meant for another device in the network (because frameControlLsb = 0x02).
- While receiving and parsing this incoming packet, which was not meant for this device, something went wrong right after the 2nd byte was received or when Phy_RxFrameFilter was called for macDataIndex == 2. (I think so because macDataIndex is stuck at 0x02 whereas the frame length, psduLength, is 0x05.)
- I also noted that mSeqNumber is 0xAC on the debugged device but rxData[2] == 0x8C, which indicates that the last ACK/packet received was not meant for this device.
- Maybe the current packet reception was correctly aborted but the radio did not go into Rx as expected?
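To make the suspected condition concrete, the two symptoms above can be written as simple checks on a snapshot of the state. This is only a sketch of the reasoning, not code from the PHY: phy_snapshot_t, PHY_STATE_IDLE and both function names are hypothetical, and the fields just mirror the variables quoted in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the fields quoted in this post; this is not
 * the real PHY data structure. */
typedef struct {
    uint8_t phy_state;       /* mPhyState; 0 corresponds to gIdle_c in the dump */
    bool    rx_on_when_idle; /* phyLocal.flags.rxOnWhenIdle */
    uint8_t mac_data_index;  /* phyLocal.rxParams.macDataIndex */
    uint8_t psdu_length;     /* phyLocal.rxParams.psduLength */
} phy_snapshot_t;

#define PHY_STATE_IDLE 0u

/* True if reception appears to have stopped before the whole PSDU was read. */
static bool rx_was_cut_short(const phy_snapshot_t *s)
{
    return s->mac_data_index < s->psdu_length;
}

/* True if the PHY is idle although it is configured to keep the receiver on. */
static bool idle_but_should_listen(const phy_snapshot_t *s)
{
    return s->phy_state == PHY_STATE_IDLE && s->rx_on_when_idle;
}
```

For the values in the dump (mPhyState = 0, rxOnWhenIdle = 1, macDataIndex = 2, psduLength = 5) both checks return true. The second condition may well be true for a short moment during normal operation too, so on its own it only points at a hang if it persists.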
The device is paused in the debugger, so feel free to ask for any additional details you need to debug this further. I'll try to keep the debugger attached for as long as possible, but I can't guarantee it (I've had issues where our debugger gets a mind of its own and disconnects once in a while). Also, if I'm on the right track with my analysis, you may want to take a look at the part where bad packets are discarded and RxRestart is called. The chances of you finding the bug, if there is one, are way higher than mine! :-)
Also, we've almost run out of time and would like a fix/patch/workaround as soon as possible, because our next batch is being held up by this bug and I'm not sure if and when I can reproduce it again! Besides, we need to test the patch before we can deploy, so that'll need some time as well. So, please hurry! And thanks!