DAR Register is Corrupted


9,153 Views
mohammak
Contributor II

I am investigating an issue that is either in my MPC555 code or in my hardware, and I am using a debugger to track it down.
The issue happens only at cold temperature, so most probably one of the components is upset by the cold.
In the debugger I can see that the address in the DAR register is corrupted to an out-of-range value, and the MCU stops executing.
The debugger gives me no clue why this happens, because it cannot step back through the code or do much tracing. The assembly line where the debugger stops is: lfs f28,0x5C(r12)
It seems the MCU is trying to access an SRAM address at (0x5C+r12), but because r12 is out of range, the MCU stops.
The only anomaly I see in the debugger is the DAR register, which is set to (0x5C+r12) and is out of range.
DAR definition in the MPC555 datasheet: After an alignment exception, the DAR is set to the effective address of a load or store element.
I am still scratching my head over how this address corruption happens, so I need some help from an expert.
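
For reference, the address check described above can be sketched in C. This is only an illustration: the SRAM window (SRAM_BASE, SRAM_SIZE) and all function names are hypothetical and must be replaced with values from the actual memory map. The effective address of a load such as lfs f28,0x5C(r12) is the base register plus the sign-extended 16-bit displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical external SRAM window; substitute the range from your own memory map. */
#define SRAM_BASE 0x00800000u
#define SRAM_SIZE 0x00080000u

/* Effective address of a load such as lfs f28,0x5C(r12):
 * base register plus the sign-extended 16-bit displacement. */
static uint32_t effective_address(uint32_t base_reg, int16_t disp)
{
    return base_reg + (uint32_t)(int32_t)disp;
}

/* True when a 4-byte access at ea lies entirely inside the assumed SRAM window. */
static bool word_in_sram(uint32_t ea)
{
    return ea >= SRAM_BASE && (ea - SRAM_BASE) <= SRAM_SIZE - 4u;
}

/* True when ea is a multiple of 4, i.e. word aligned. */
static bool word_aligned(uint32_t ea)
{
    return (ea & 3u) == 0u;
}
```

Feeding the r12 value captured by the debugger into effective_address() should reproduce the value seen in DAR, which at least confirms that DAR only reflects the bad r12 rather than being corrupted on its own.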

In the HW, the ADDR[8-31] and DATA[0-31] pins are shared between my 3 peripherals: SRAM, FLASH and a Digital Transceiver.
How is the value in DAR related to ADDR[8-31]? Is it possible that when the r12 data is passed over the data bus, it gets corrupted due to a timing issue caused by one of the other devices?

14 Replies

8,834 Views
mohammak
Contributor II

In my case, a machine check exception dialog box appears from the debugger at the time of failure.

When I look at the ECR register, it is 0x00000000, i.e. MCE=0. Does this mean that the failure is caused by a data storage or instruction storage error?

How can I get details about the cause of this machine check exception?

8,863 Views
lukaszadrapa
NXP TechSupport

Hi Kaveh,

did you try comparing the firmware at room temperature and at cold temperature, using a CRC for example? If it is a timing issue or a bus disturbance, I would expect the result to be random. If you still see the same failing instruction and the same failing address, it could be caused by a weak flash cell that reads differently at different temperatures.
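
A minimal sketch of such a comparison, assuming a standard bitwise CRC-32 (reflected polynomial 0xEDB88320) is acceptable; the function name crc32_update is illustrative and not part of any NXP library. The idea is to run it over the same flash range at room temperature and at cold temperature and compare the two results.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Pass 0 as crc for the first chunk and the
 * previous return value for each following chunk. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right one bit; apply the polynomial if the bit shifted out was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

A table-driven variant would be faster, but the bitwise form needs no lookup table in memory, which keeps the check itself independent of the data being verified.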

Regards,

Lukas

8,863 Views
mohammak
Contributor II

I agree with you on this. If it were a timing issue on the bus shared by the SRAM, FLASH and Serial Interface, the MPC555 would fail at different instructions or different addresses.

In my case, the debugger always stops at one particular instruction and address:

00060F8C   lfs   f28, 0x5C(r12)

When I make minor changes to the C code, the debugger stops at a different address (not 00060F8C), but always at the same instruction, "lfs   f28, 0x5C(r12)".

What do you think about the cause? Thanks Lukas for the answers. 

8,863 Views
lukaszadrapa
NXP TechSupport

The root cause of the failing lfs instruction is the wrong value in r12. If you check the instructions executed before the lfs, where is r12 loaded from? Is it loaded from flash (i.e. is it a constant at this point), or is it calculated on the fly?

8,863 Views
mohammak
Contributor II

Can you help me understand how this answer, i.e. whether it is calculated on the fly or loaded from FLASH, will lead me toward the root cause? You mean the internal FLASH, right? It could be loaded from other memories such as SRAM as well.

8,863 Views
lukaszadrapa
NXP TechSupport

Hi Kaveh,

please disregard that message. I can see now it doesn't matter.

Honestly, I'm out of ideas. An option would be to send the part for failure analysis to confirm it is not related to the device itself. Note that such a request must go via the sales path (ask your local sales point for help).

If it is related to signal integrity (i.e. to the external memories), then I probably can't help.

Regards,

Lukas

8,863 Views
mohammak
Contributor II

The CRC is checked continuously during operation, so if there is any discrepancy with the stored default, execution stops in the CRC check function and asserts a failure.
I repeated this check since you mentioned it, and the CRC does not change with temperature.
I still don't understand how the value of the "Data Address Register" relates to the ADDR[8-31] and DATA[0-31] pins.

8,863 Views
lukaszadrapa
NXP TechSupport

The DAR register contains the address as seen by the core. The physical address on the ADDR pins is then given by the configuration of the external bus. When the core accesses an address [base address]+[offset], the [base address] is given by the configuration of the BRn register and you will see only the [offset] on the ADDR pins.
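
The relationship can be sketched in C as a simplified model. It is not taken from the reference manual: it assumes a chip-select bank matches when the masked address equals the masked base, and that ADDR[8-31] carry the low 24 bits of the 32-bit address (bit 0 being the most significant bit in PowerPC numbering). The function names and the example base and mask values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: a bank is selected when the address bits covered by
 * the mask equal the same bits of the base address. */
static bool bank_selects(uint32_t base, uint32_t mask, uint32_t addr)
{
    return (addr & mask) == (base & mask);
}

/* Offset inside the bank: the address bits NOT covered by the mask. */
static uint32_t bank_offset(uint32_t mask, uint32_t addr)
{
    return addr & ~mask;
}

/* Value on ADDR[8-31], assuming these pins carry the low 24 address bits. */
static uint32_t addr_pins(uint32_t addr)
{
    return addr & 0x00FFFFFFu;
}
```

With a hypothetical SRAM bank at base 0x00800000 and mask 0xFFF80000, a DAR value of 0x0080015C selects the bank and yields the offset 0x15C, while a corrupted value such as 0xFFFF005C selects no bank in this model.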

Regards,

Lukas

8,863 Views
mohammak
Contributor II

I am using the SC511660MZP40 from NXP, since the MPC555 from Motorola is obsolete.
Is there any difference in the thermal-electrical behaviour of the SC511660MZP40 compared with the MPC555 that could cause this DAR corruption?
Is the SC511660MZP40 a direct replacement for the MPC555?
Is it manufactured with the same CMOS process and specifications as the MPC555?

8,863 Views
lukaszadrapa
NXP TechSupport

SC511660MZP40 contains mask set 7K83H (known as revision K3). In fact, this is older than the other MPC555 parts that are currently available. The conversion from rev K3 to the latest rev M was done in 2005. All MPC555xxxx parts shipped after WW19 / WW20 2005 (WW stands for work week) are rev M.

If possible, please take a look at the MASKNUM field in the IMMR (Internal Memory Map Register); this will tell you which mask you have. The MASKNUM field will read 0x40 for rev. M and 0x32 for rev. K3.
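
A small sketch of decoding that field in C. The bit layout is an assumption to be checked against the reference manual: it takes MASKNUM to be the second most significant byte of the 32-bit IMMR value. The function names are illustrative, and on the target the raw value would come from reading the IMMR special-purpose register (for example with mfspr) or from the debugger's register view.

```c
#include <stdint.h>

/* Assumed layout: MASKNUM occupies the second most significant byte
 * of the 32-bit IMMR value. Verify against the reference manual. */
static uint8_t immr_masknum(uint32_t immr)
{
    return (uint8_t)((immr >> 16) & 0xFFu);
}

/* Map the MASKNUM values quoted in this thread to revision names. */
static const char *mask_revision(uint8_t masknum)
{
    switch (masknum) {
    case 0x32: return "K3";
    case 0x40: return "M";
    default:   return "unknown";
    }
}
```

For a hypothetical raw IMMR value of 0x30320000, immr_masknum() gives 0x32, which mask_revision() reports as K3.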

The differences are described in this product change notification:

https://www.nxp.com/docs/en/product-change-notice/PCN10601.htm 

Mentioned errata lists are:

https://www.nxp.com/docs/en/errata/MPC555MCE.pdf 

https://www.nxp.com/docs/en/errata/MPC555K3CE.pdf 

Regards,

Lukas

8,863 Views
mohammak
Contributor II

Mine is 0x32, revision K3. Does it mean my chips are ~20 years old? Could this be an aging issue?

Does the errata for K3 chips list the issues that have been fixed in the K3 revision, or the issues that exist in the K3 revision?

When you mention MASKNUM, does this refer to the mask that was used for chip fabrication?

Thanks

8,863 Views
lukaszadrapa
NXP TechSupport

There should be a date code printed on the top of the package. The string is "FAWLYYWWZ", where YY is the year and WW the work week.
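
As a rough sketch, the year and work week can be pulled out of such a string in C. This assumes the marking is exactly nine characters with each letter of "FAWLYYWWZ" standing for one character, so YY sits at positions 4-5 and WW at positions 6-7 (counting from 0). The function names and the example marking are hypothetical.

```c
#include <string.h>

/* Convert two ASCII digits at s[0] and s[1] to a number, or -1 if either is not a digit. */
static int two_digits(const char *s)
{
    if (s[0] < '0' || s[0] > '9' || s[1] < '0' || s[1] > '9') {
        return -1;
    }
    return (s[0] - '0') * 10 + (s[1] - '0');
}

/* Two-digit year (YY) from a 9-character "FAWLYYWWZ" marking, or -1 on error. */
static int date_code_year(const char *code)
{
    if (code == NULL || strlen(code) != 9) {
        return -1;
    }
    return two_digits(code + 4);
}

/* Work week (WW, 1..53) from a 9-character "FAWLYYWWZ" marking, or -1 on error. */
static int date_code_week(const char *code)
{
    if (code == NULL || strlen(code) != 9) {
        return -1;
    }
    int week = two_digits(code + 6);
    return (week >= 1 && week <= 53) ? week : -1;
}
```

For a made-up marking "XXAB0412C", this yields year 04 and work week 12.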

I meant that part numbers MPC555xxxx are shipped as rev M. If you have SC511660MZP40, it's still K3.

The errata list shows issues that exist on that revision.

Yes, MASKNUM tells us which silicon mask set is used.

Regards,

Lukas

8,863 Views
mohammak
Contributor II

What debugger HW and SW do you suggest for tracing this DAR corruption? I need something that lets me go back through the code and find out where the corruption starts.

8,863 Views
lukaszadrapa
NXP TechSupport

Hi Kaveh,

in the past, the only 3rd party vendor that could supply such tools was Lauterbach. They have their TRACE32 tool for the MPC5xx BDM port:
http://www.lauterbach.com/frames.html?bdmppc.html

Please, contact Lauterbach support for more extensive info on their tool.

Regards,

Lukas