MPC8270 ECC double bit error debug

1,550 views
sai_jagannathan
Contributor II

We are using the MPC8270 processor and are seeing ECC double-bit errors at random: sometimes after 1 hour, sometimes after a couple of days. The only observation we have is that the error occurs when lots of multicast packets are coming in through the Ethernet port. We have not been able to determine what is causing this random failure. Can you suggest some clues to debug this issue? Since the cache is enabled, we are unable to pinpoint the exact instruction that caused the error. We have a local bus which interfaces to an SRAM. We have checked the timing of the waveforms and we don't find anything that violates the timing of the memory device. We are running the 8-byte-wide data bus at a 64 MHz clock.

Also, is there any way the Machine Check Exception can be routed to a processor pin? This seems to be available on the NMI_OUT pin only when the 603e core is disabled. Any other suggestions?

3 Replies

1,388 views
alexander_yakov
NXP Employee

1. Please specify how many (identical) boards you have, and how many of them show such erratic behavior. If, for example, the problem is observed on only one board out of 10, then this may be a quality issue of that particular board rather than a design issue.

2. The MPC8270 is quite an old processor, so I assume the design is also quite old. Please specify what changed in the design or in the BOM before the problem started to appear.

3. You said "We are running the 8 byte wide data bus at 64MHz clock". ECC assumes a 9th chip for ECC data storage. I assume this chip is present in your design, since you see ECC errors only occasionally, but still, please verify that the 9th memory chip is present and has exactly the same part marking as the other 8 memory chips.

4. Using the ECC feature imposes special requirements on data pipelining and CAS latency. Please see the MPC8280 Reference Manual, Sections 11.2.9 and 11.2.4, and confirm that these requirements are taken into account in your memory controller settings.

To answer your question:

MCP cannot be routed out to IRQ0 if the core is not disabled, but you can disable MCP propagation to the core in HID0[EMCP].


Have a great day,
Alexander
TIC


1,388 views
sai_jagannathan
Contributor II

Thanks Alexander.

  1. Please specify how many (identical) boards you have, and how many of them show such erratic behavior. If, for example, the problem is observed on only one board out of 10, then this may be a quality issue of that particular board rather than a design issue.

We have 5 boards with the 8270 and 5 boards with the 8260. We see this random crash on all the boards. While the failure is random, it shows up sooner on the 8270-based boards than on the boards with the 8260: typically within 6 to 12 hours on an MPC8270, but close to a week on the MPC8260 boards. Between the two boards, the only change is the processor.

  2. The MPC8270 is quite an old processor, so I assume the design is also quite old. Please specify what changed in the design or in the BOM before the problem started to appear.

We had to redo the board for obsolescence reasons. The main changes are the SRAM memory and the Ethernet transceiver. We have also moved from MPC8260 to MPC8270.

We checked the timing on the memory devices and there seems to be sufficient timing margin in the signals.

We also notice that no single-bit errors are ever reported.

The one time we were able to log the data just before a crash, the data read from the memory appeared to be correct, but an ECC error was still reported. The data which reportedly had the ECC error (on this occasion) is a value that is written to the RAM only once, early after power-up. It is read multiple times (correctly, with no ECC error) before the module crashes after a few hours.
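As a side illustration of how correct-looking data can still be flagged, here is a minimal sketch of a toy SEC-DED code (an extended Hamming (8,4) code). It is not the ECC code the MPC82xx memory controller uses, and the function names are made up for this example. It only shows a general property of such codes: if two check bits are disturbed while the data bits stay intact, the checker reports an uncorrectable double-bit error even though the data itself reads back correctly. Whether that is what happens on this board is not established.

```python
# Toy extended Hamming (8,4) SEC-DED code, for illustration only.
# Bit i (1..7) of the codeword is Hamming position i; bit 0 is overall parity.
DATA_POS = (3, 5, 6, 7)   # positions that carry data bits
CHECK_POS = (1, 2, 4)     # positions that carry check bits


def syndrome(cw):
    """XOR of the positions (1..7) of all set bits; zero for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if cw & (1 << pos):
            s ^= pos
    return s


def encode(data4):
    """Encode 4 data bits into an 8-bit codeword."""
    cw = 0
    for i, pos in enumerate(DATA_POS):
        if data4 & (1 << i):
            cw |= 1 << pos
    s = syndrome(cw)
    for pos in CHECK_POS:          # set check bits so the syndrome becomes zero
        if s & pos:
            cw |= 1 << pos
    if bin(cw).count("1") % 2:     # make the overall parity even
        cw |= 1
    return cw


def data_bits(cw):
    """Return the 4 data bits exactly as stored, without any correction."""
    d = 0
    for i, pos in enumerate(DATA_POS):
        if cw & (1 << pos):
            d |= 1 << i
    return d


def check(cw):
    """Classify a codeword as 'ok', 'single' (correctable) or 'double'."""
    s = syndrome(cw)
    odd = bin(cw).count("1") % 2
    if s == 0 and not odd:
        return "ok"
    if odd:
        return "single"
    return "double"


good = encode(0xB)
bad = good ^ (1 << 1) ^ (1 << 2)   # disturb two check bits, leave data alone
print(check(good), check(bad), hex(data_bits(bad)))
```

In this toy case the data nibble read from `bad` is still 0xB, yet the checker classifies the word as a double-bit error, because the error sits entirely in the check bits.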

  3. You said "We are running the 8 byte wide data bus at 64MHz clock". ECC assumes a 9th chip for ECC data storage. I assume this chip is present in your design, since you see ECC errors only occasionally, but still, please verify that the 9th memory chip is present and has exactly the same part marking as the other 8 memory chips.

Yes, there is a 9th memory chip present for the ECC memory.

Each time the module fails, we see that the TESCR2 register has a value of 0x00004000.

The memory bank indicated is CS1, which connects to the SRAM. But TESCR2[PB] shows up as 0 for all 8 byte lanes. Do the PB bits get updated on an ECC failure?
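For anyone decoding a logged TESCR2 value offline, here is a minimal sketch using the PowerPC bit-numbering convention, where bit 0 is the most significant bit. The field positions used for PB and BNK are assumptions, chosen only because they are consistent with the reading above (CS1 flagged, PB all zero). They must be checked against the TESCR2 description in the reference manual. The helper names are made up for this example.

```python
def bits_set_msb0(value, width=32):
    """Return the set bit numbers, counting bit 0 as the most significant bit."""
    return [i for i in range(width) if value & (1 << (width - 1 - i))]


def field_msb0(value, first, last, width=32):
    """Extract bits first..last (inclusive, MSB-0 numbering) as an integer."""
    shift = width - 1 - last
    mask = (1 << (last - first + 1)) - 1
    return (value >> shift) & mask


# ASSUMED field positions, not taken from the manual: verify before relying on them.
PB_FIRST, PB_LAST = 8, 15      # hypothetical: one flag per byte lane
BNK_FIRST, BNK_LAST = 16, 27   # hypothetical: one flag per memory bank

tescr2 = 0x00004000            # value logged at each failure
pb = field_msb0(tescr2, PB_FIRST, PB_LAST)
bnk = field_msb0(tescr2, BNK_FIRST, BNK_LAST)
banks = bits_set_msb0(bnk, width=BNK_LAST - BNK_FIRST + 1)

print("set bits (MSB-0):", bits_set_msb0(tescr2))
print("PB field: 0x%02X" % pb)
print("banks flagged:", banks)
```

With bit 0 as the MSB, the only bit set in 0x00004000 is bit 17. Under the assumed layout, that is the second bank flag (bank 1), and the PB field is zero.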

  4. Using the ECC feature imposes special requirements on data pipelining and CAS latency. Please see the MPC8280 Reference Manual, Sections 11.2.9 and 11.2.4, and confirm that these requirements are taken into account in your memory controller settings.

We are using ECC for the SRAM interface. We don't have parity enabled for the SDRAM interface.

  To answer your question:

  MCP cannot be routed out to IRQ0 if the core is not disabled, but you can disable MCP propagation to the core in HID0[EMCP].

We did try this. The module still fails later on, due to execution of an unknown instruction.


1,388 views
alexander_yakov
NXP Employee

Please open a support case and submit your schematic and memory setup for analysis.

https://community.nxp.com/docs/DOC-329745


Have a great day,
Alexander
TIC
