
We have hit a critical problem on the P3041 v2.0

Question asked by 巍松 郭 on Sep 18, 2014
We run this CPU under linux-3.0.48-rt70 in SMP mode, with glibc 2.13, and run a multi-threaded executable called "nos". In one version of our software, "nos" raises an Illegal Instruction signal (SIGILL) at address 0x122b1ddc, which is the start address of the .plt section of nos. We can reproduce the problem in our test center, but sometimes the program runs without error for a long time (several days), even though this instruction executes many times per second (every time "nos" calls a function in a library). In the core dump produced by the failure, the contents at address 0x122b1ddc are correct and have not been modified. gdb displays them as:

=> 0x122b1ddc:  addis   r11,r11,4651
   0x122b1de0:  lwz     r11,9284(r11)
   0x122b1de4:  mtctr   r11
   0x122b1de8:  bctr

This is not the only version with the problem, but in every failing version the Illegal Instruction signal is raised at the start address of the .plt section of nos. We have not seen the problem on the P3041 v1.1, which we have run on several devices for more than 3 months (although not all of them ran this version). Because the memory contents are correct but an Illegal Instruction signal is still raised, we suspect that the instruction cache (ICACHE) holds corrupted contents that are still being used.

In the bad versions, this address is 0xXXXXXXdc or 0xXXXXXX68; in the (apparently) good versions, it is 0xXXXXXXc8. A global variable that our system modifies frequently is located just before this address. In the bad versions, the variable and this address are in the same cache line; in the good versions, they are not. We suspect the failure happens when this cache line is frequently invalidated because of the data modifications, but we have not verified this.

Can you give us any suggestions about this problem? It first occurred at one of our important customers, so it is critical for us. We are eagerly hoping for your help.
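The cache-line observation above can be checked with simple address arithmetic. The sketch below is not part of our software: it assumes a 64-byte L1 cache line (the size we believe the P3041's e500mc cores use; change CACHE_LINE_SIZE if your core differs), and the helper names are hypothetical. It computes which cache line an address falls in and whether two addresses share a line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte L1 cache lines. Adjust for your core. */
#define CACHE_LINE_SIZE 64u

/* Start address of the cache line that contains addr. */
static uint32_t cache_line_base(uint32_t addr)
{
    /* The mask below is only valid for power-of-two line sizes. */
    assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0u);
    return addr & ~(CACHE_LINE_SIZE - 1u);
}

/* Byte offset of addr within its cache line. */
static uint32_t cache_line_offset(uint32_t addr)
{
    return addr & (CACHE_LINE_SIZE - 1u);
}

/* True if both addresses fall in the same cache line. */
static bool same_cache_line(uint32_t a, uint32_t b)
{
    return cache_line_base(a) == cache_line_base(b);
}
```

Under the 64-byte assumption, the failing address 0x122b1ddc lies 0x1c bytes into the line starting at 0x122b1dc0, and an address ending in 0x68 lies 0x28 bytes into its line, so a variable placed shortly before either one can share the line. An address ending in 0xc8 lies only 0x08 bytes into its line, so a variable more than 8 bytes before it falls in the previous line. Calling same_cache_line() with the variable's address (taken from the link map) and the .plt start address would confirm this for each version.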
