GMAC Context Descriptor Bug

2,594 views
oliver777777
Contributor II

Hi,

I'm testing the GMAC driver under high load (200,000 packets with a period of 100 us) to check the stability of the DMA. After initializing the system time, I noticed that the RX ring descriptor state would break. When the RX channels were odd or when packets overflowed, I observed two back-to-back descriptors in the ring buffer that are both write-back descriptors but both also mark a timestamp as available. So when ReadFrame is called on the first one, it reads the second descriptor as a timestamp even though it is a write-back descriptor.

I was able to verify that this was the bug by patching a peek into Gmac_Ip_ReadTimeStampInfo: it checks whether the next descriptor is actually a timestamp context descriptor and returns early from the function if not.

This is the code: 

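/* Peek at the descriptor after Bd, wrapping to the start of the ring. */
CtxtBd = ((Gmac_Ip_PtrSizeType)&Bd[1U] >= (Gmac_Ip_PtrSizeType)&ListBd[RingLength]) ? ListBd : &Bd[1U];
/* Not a context descriptor: flag it and return before reading a timestamp. */
if ((CtxtBd->Des3 & GMAC_RDES3_CTXT_MASK) == 0U) {
    Info->ErrMask = 0xDEADBEEF; /* debug sentinel */
    return;
}
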
I'm not sure whether this is a known bug with system timestamping or whether I'm doing something wrong on my side. But it seems that the assumption that a write-back descriptor and its context descriptor are always contiguous runs into problems at high load.
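
For readers who want to check the wraparound logic on a host, the same peek can be written as a standalone sketch. The struct layout, the index-based ring, and the names `RxDesc`, `next_index`, and `next_is_context` are hypothetical stand-ins, not the driver's types. The mask value assumes the context flag is bit 30 of Des3, which is consistent with the Des3 = 0x40000000 context descriptor reported later in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4-word RX descriptor; not the driver's type. */
typedef struct {
    uint32_t Des0, Des1, Des2, Des3;
} RxDesc;

/* Assumption: the context-descriptor flag is bit 30 of Des3. */
#define RDES3_CTXT_MASK 0x40000000u

/* Index of the descriptor after idx, wrapping at the end of the ring. */
static size_t next_index(size_t idx, size_t ring_len)
{
    return (idx + 1u >= ring_len) ? 0u : idx + 1u;
}

/* True only when the descriptor following idx is a context descriptor. */
static bool next_is_context(const RxDesc *ring, size_t ring_len, size_t idx)
{
    return (ring[next_index(idx, ring_len)].Des3 & RDES3_CTXT_MASK) != 0u;
}
```

A caller would read the timestamp only when `next_is_context` returns true, and otherwise leave the descriptor untouched.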

11 Replies

2,550 views
oliver777777
Contributor II

I fixed it fully by modifying readFrame to read only when both the buffer descriptor and the context descriptor are available when TSA is enabled, and by modifying peekFrame to likewise return true only when both are available when TSA is enabled. Would it be possible to have this patched soon?
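
The patch itself is not reproduced in this thread, so the following is only a rough sketch of the readiness rule described above, not the actual change. The type `RxDesc`, the function `frame_ready`, and the bit positions (OWN as Des3 bit 31, CTXT as Des3 bit 30, TSA as Des1 bit 14) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4-word RX descriptor; not the driver's type. */
typedef struct {
    uint32_t Des0, Des1, Des2, Des3;
} RxDesc;

/* Assumed bit positions: OWN = Des3[31], CTXT = Des3[30], TSA = Des1[14]. */
#define DES3_OWN  0x80000000u
#define DES3_CTXT 0x40000000u
#define DES1_TSA  0x00004000u

/* A frame at idx is ready only if its buffer descriptor has been written
 * back and, when it announces a timestamp, the following descriptor has
 * also been written back as a context descriptor. */
static bool frame_ready(const RxDesc *ring, size_t ring_len, size_t idx)
{
    const RxDesc *bd = &ring[idx];
    if ((bd->Des3 & DES3_OWN) != 0u) {
        return false; /* still owned by the DMA */
    }
    if ((bd->Des1 & DES1_TSA) == 0u) {
        return true;  /* no timestamp expected for this frame */
    }
    const RxDesc *ctx = &ring[(idx + 1u) % ring_len];
    return ((ctx->Des3 & DES3_OWN) == 0u) && ((ctx->Des3 & DES3_CTXT) != 0u);
}
```

Both the read path and the peek path would consult the same predicate, so that neither advances the descriptor pointer past a buffer whose timestamp has not arrived yet.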

2,588 views
oliver777777
Contributor II

Here is the sample patch

2,520 views
Nhi_Nguyen
NXP Employee

Hi @oliver777777 ,

"I observed in the ring buffer two descriptors back to back that are both write back descriptors but both also mark timestamp available. So when calling ReadFrame on the first one, it reads in the second descriptor as a timestamp even though it is a write back descriptor." Sorry, I don't understand this. As I understand it, if the first descriptor is a write-back descriptor with the TSA bit not set, the driver doesn't read the second descriptor as a timestamp. So your issue can't occur.

Could you please show the first and second descriptors in your case: the first one for the write-back descriptor and the second one for the timestamp? I think you can take a screenshot of the address indicated by Gmac_apxState[Instance]->RxCurrentDesc[Ring] for the first one, and Gmac_apxState[Instance]->RxCurrentDesc[Ring]++ or ListBd for the second one.

Best regards,

Nhi

2,503 views
oliver777777
Contributor II

In the attached screenshot, I show what happens when InitsysTime is called with 5 RX ring descriptors and a breakpoint set at the start of Gmac_Ip_ReadTimeStampInfo.

At Breakpoint 1, RxCurrentDesc points to descriptor 5, which is a write-back descriptor with TSA available (Des3 = 0x340100a3). When I finish the function, RxCurrentDesc has been incremented to descriptor 1, and descriptor 1 is read as the timestamp. However, notice that descriptor 1 is a write-back descriptor, because the context descriptor bit is not set. As a result, when I print Info, the timestamp shows 16453 and 541368704, which is not a valid time but just Des0 and Des1 of a write-back descriptor.

When I finish the ReadFrame function, RxCurrentDesc points to descriptor 2 after the final increment. Notice that descriptor 2 is a timestamp and not a write-back descriptor, because Des3 = 0x40000000, which means the context descriptor bit is set. So the next time I call ReadFrame, it will read this timestamp and return its contents as buffer memory addresses, because it thinks it is a write-back descriptor.
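
The two Des3 values quoted above can be told apart with a small classifier. This is only a sketch: `DescKind` and `classify_des3` are hypothetical names, and the bit positions (OWN as bit 31, CTXT as bit 30) are assumptions that happen to match the values seen in the debugger.

```c
#include <stdint.h>

/* Assumption: OWN is bit 31 and CTXT is bit 30 of Des3. */
#define DES3_OWN  0x80000000u
#define DES3_CTXT 0x40000000u

typedef enum {
    DESC_OWNED_BY_DMA, /* not written back yet */
    DESC_CONTEXT,      /* carries a timestamp */
    DESC_WRITE_BACK    /* carries buffer status */
} DescKind;

/* Classify a descriptor from its Des3 word alone. */
static DescKind classify_des3(uint32_t des3)
{
    if ((des3 & DES3_OWN) != 0u) {
        return DESC_OWNED_BY_DMA;
    }
    if ((des3 & DES3_CTXT) != 0u) {
        return DESC_CONTEXT;
    }
    return DESC_WRITE_BACK;
}
```

Under these assumptions, 0x340100a3 classifies as a write-back descriptor and 0x40000000 as a context descriptor, which matches the reading of descriptors 5, 1, and 2 given above.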

 

I've attached a slideshow that shows, step by step, how the ring buffer leads to a DMA suspend in the long run when context descriptors are enabled and the load is high (meaning the descriptor ring always stays full).

2,351 views
Nhi_Nguyen
NXP Employee

Hi @oliver777777 ,

From my point of view:

2. I tested the case where a received frame is bigger than the Rx buffer length. It has the same issue as with the timestamp. That is, if RxCurrentDesc is 5 and the Rx buffers are full, part of the frame is received into this buffer and the rest is still in the MTL FIFO because no Rx buffer is available. At this time, buffer 1 is holding another frame that hasn't been released yet. So if, after buffer 5 is released (by ReadFrame), the driver sees that there is still more data in the queue and continues reading buffers, it will return wrong data and status. In this case OE isn't raised, so we will not handle anything related to this bit. I will also report this issue to the SW team.

However, I couldn't get the OE (Overflow Error) bit raised, even when I transmitted frames to the Rx MTL with a total length bigger than the Rx MTL FIFO supported by the HW, without reading the Rx buffers (part of a frame is received into the MTL FIFO and the rest isn't received because the MTL FIFO overflows). But from the description of OE, I understand that if this bit is raised, the frame will be damaged in the MTL FIFO and not read to the application, so we don't need to handle it. If you can raise this bit, please show the data of the buffer descriptors.

3. I will send your suggestion to the SW team and report the issues with the timestamp and overflow. They will decide how to handle them.

Best regards,

Nhi

2,477 views
Nhi_Nguyen
NXP Employee

Hi @oliver777777 ,

Could you check that the photos uploaded? I don't see them.

Best regards,

Nhi

2,460 views
oliver777777
Contributor II

Yes, apologies. Here are the screenshot and slideshow.

2,439 views
Nhi_Nguyen
NXP Employee

Hi @oliver777777 ,

First, from my point of view, I agree that this is a bug, because the timestamp information was returned with incorrect values. In the function Gmac_Ip_ReadTimeStampInfo(), the driver should check that the CTXT bit in Des3 of the next buffer descriptor is set, to make sure that it is a context descriptor, before reading the timestamp values.

Second, I didn't see the OVERFLOW bit raised in your case, and per the description in the RM:

[Image: Overflow_ETH.PNG]

So the OE bit just indicates that the buffer doesn't have enough space to store the entire frame and that the frame will be damaged. I don't think we need to handle overflow here.

Third, about the function Gmac_Ip_IsFrameAvailable(): this function is used to check whether other frames are available in the buffers and haven't been read yet. It is always called after the function ReadFrame(). So I don't think we need to change anything in this function.

I'll contact the SW team about the first point. It will take time for the SW team to analyze this because of their workload. If they confirm that this is a driver bug, I think they'll provide a fix in the RTM600 release, which will be released on 10 June. If the SW team says that this is not a bug, I'll come back to you with their response.

Best regards,

Nhi

 

2,404 views
oliver777777
Contributor II

Hi,

A few responses to your second and third points.

2. The overflow bit wasn't shown in the examples I sent, but from testing I found that overflow errors result in similar bugs with the timestamp. In the example above, I had 5 ring descriptors, which is why under high load the last descriptor would always be a partial write (only the buffer, with the timestamp still in the MTL queue). I also tested with 4 ring descriptors, which in theory should never hit the same bug, since the ring will only ever hold 2 buffer-timestamp pairs. However, I noticed that even with 4 ring descriptors I would get an overflow error, which would take up one ring descriptor and leave us with effectively 3 ring descriptors, in which case we hit the same bug as before because of the odd number of ring descriptors.
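
The odd/even argument in point 2 comes down to simple arithmetic, sketched below. The names `RingSplit` and `split_ring`, and the assumption that every timestamped frame occupies exactly one buffer descriptor plus one context descriptor, are illustrative only and not taken from the driver.

```c
#include <stddef.h>

/* Hypothetical result type: how a full ring divides into frame pairs. */
typedef struct {
    size_t pairs;    /* complete buffer + context descriptor pairs */
    size_t unpaired; /* descriptors left holding a buffer with no context */
} RingSplit;

/* Assumes each timestamped frame needs exactly two descriptors. */
static RingSplit split_ring(size_t usable_descs)
{
    RingSplit s = { usable_descs / 2u, usable_descs % 2u };
    return s;
}
```

With 5 usable descriptors this gives 2 pairs and 1 unpaired descriptor. With 4 it gives 2 pairs and none, but losing one descriptor to an overflow leaves 3, which again has 1 unpaired.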

 

3. My impression was that Gmac_Ip_IsFrameAvailable() is a public function that is meant to peek, so that I don't consume a ring descriptor if I'm not ready to process a frame in my code. That means there will be situations where it is fair to call it before ReadFrame as part of some loop.

Thanks for the help. Let me know what their response is; I'm also reachable by email.

2,496 views
oliver777777
Contributor II

I believe this happens because, when the load is high, there is a situation where the buffer has been written but the timestamp is still in the MTL queue. In that case, we should only return a frame and increment the descriptor pointers if both are available, which is what I believe my patch accounts for.

1,927 views
Nhi_Nguyen
NXP Employee

Hi @oliver777777 ,

I missed the note in the UM about the frame reception sequence:

[Image: Nhi_Nguyen_0-1747129093538.png]

So the issue can't occur if you call Gmac_Ip_ProvideRxBuff after each Gmac_Ip_ReadFrame().

Please let me know if you have any response to this.

Best regards,

Nhi
