questions about e500v2 (P2020) machine check interrupt

GeekFork · ‎09-19-2021

Hi Experts,

I am having some trouble with the P2020’s machine check interrupt. Could someone kindly help answer my questions below? Thanks in advance!

The hardware board is our own design with a P2020 on it. Our software applications run under an embedded OS.

The system raises a machine check exception from time to time. When it happens:
- MCSR (Machine Check Syndrome Register) is 0x8, which indicates “BUS_RBERR” (bus read data bus error).
- MCAR (Machine Check Address Register) is 0xB000xxxx. This is a RapidIO address in our system. However, I am sure the system never reads or writes this address after initialization.
- MCSRR0 (Machine Check Save/Restore Register 0) usually points to a “sync” instruction, and occasionally to a store instruction (the store targets a normal DRAM address used as function stack).
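For readers following along, the syndrome value above can be decoded with a small table. This is only a sketch: the bit masks are assumed to match the e500 MCSR layout as commonly defined (for example in the Linux kernel's Book E register header), and the names `mcsr_flag` and `mcsr_decode` are hypothetical helpers, not part of any vendor API. Check the masks against the core reference manual before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One MCSR flag: bit mask plus mnemonic (masks are assumed e500 values). */
struct mcsr_flag {
    uint32_t mask;
    const char *name;
};

static const struct mcsr_flag mcsr_flags[] = {
    { 0x80000000u, "MCP" },       /* machine check input pin */
    { 0x40000000u, "ICPERR" },    /* instruction cache parity error */
    { 0x20000000u, "DCP_PERR" },  /* data cache push parity error */
    { 0x10000000u, "DCPERR" },    /* data cache parity error */
    { 0x00000080u, "BUS_IAERR" }, /* instruction address error */
    { 0x00000040u, "BUS_RAERR" }, /* read address error */
    { 0x00000020u, "BUS_WAERR" }, /* write address error */
    { 0x00000010u, "BUS_IBERR" }, /* instruction data bus error */
    { 0x00000008u, "BUS_RBERR" }, /* read data bus error */
    { 0x00000004u, "BUS_WBERR" }, /* write data bus error */
    { 0x00000002u, "BUS_IPERR" }, /* instruction parity error */
    { 0x00000001u, "BUS_RPERR" }, /* read parity error */
};

/*
 * Write the space-separated names of all set, known MCSR bits into buf.
 * Returns the number of names written; stops early if buf is too small.
 */
int mcsr_decode(uint32_t mcsr, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof mcsr_flags / sizeof mcsr_flags[0]; i++) {
        if (!(mcsr & mcsr_flags[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", mcsr_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With these masks, a syndrome of 0x8 decodes to the single name BUS_RBERR, which agrees with the reading given in the post.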

So far I have not been able to identify why the machine check happens.

My questions are:
1. What is the meaning of a “Bus read data bus error”? My MCAR and MCSRR0 register values seem to have no relationship with this syndrome type.
2. Do MCSR/MCAR/MCSRR0 always hold correct information about a machine check exception?
3. If I map an address (say 0xc0000000) in the MMU but do not configure it in a LAW (Local Access Window), the system immediately raises a machine check interrupt when 0xc0000000 is read. If 0xc0000000 is written, however, no interrupt or exception is raised at all. Is this the expected behavior of the processor?
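Question 3 depends on whether a physical address is covered by any enabled local access window. The sketch below checks that from a snapshot of LAWBAR/LAWAR values. The field layout is an assumption to verify against the P2020 reference manual: LAWBAR holds the base address shifted right by 12, LAWAR bit 0x80000000 is the enable bit, and the low six bits of LAWAR encode a window of 2^(SIZE+1) bytes, with 0x0B (4 KB) through 0x23 (64 GB) taken as the valid range. The names `law_entry`, `law_covers` and `law_find` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAWAR_EN        0x80000000u /* window enable (assumed position) */
#define LAWAR_SIZE_MASK 0x0000003Fu /* window is 2^(SIZE+1) bytes (assumed) */
#define LAW_SIZE_MIN    0x0Bu       /* 4 KB, assumed smallest encoding */
#define LAW_SIZE_MAX    0x23u       /* 64 GB, assumed largest encoding */

/* Snapshot of one local access window's registers. */
struct law_entry {
    uint32_t lawbar; /* physical base address >> 12 */
    uint32_t lawar;  /* enable, target and size fields */
};

/* True if paddr falls inside this window and the window is enabled. */
bool law_covers(const struct law_entry *law, uint64_t paddr)
{
    uint32_t size_enc = law->lawar & LAWAR_SIZE_MASK;

    if (!(law->lawar & LAWAR_EN))
        return false;
    if (size_enc < LAW_SIZE_MIN || size_enc > LAW_SIZE_MAX)
        return false; /* treat out-of-range encodings as no match */

    uint64_t base = (uint64_t)law->lawbar << 12;
    uint64_t size = 1ull << (size_enc + 1);

    return paddr >= base && paddr - base < size;
}

/* Index of the first window covering paddr, or -1 if none does. */
int law_find(const struct law_entry *laws, size_t n, uint64_t paddr)
{
    for (size_t i = 0; i < n; i++) {
        if (law_covers(&laws[i], paddr))
            return (int)i;
    }
    return -1;
}
```

A result of -1 for an address that the MMU does map corresponds to the situation described in question 3: the core can issue the access, but no window routes it to a target.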

Thanks.
Jerry

ufedor · ‎09-20-2021

1) BUS_RBERR is set because core_fault_in is asserted to the CPU, signaling a fault on the internal bus.

Sources capable of generating core_fault_in are described in the P2020 QorIQ Integrated Processor Reference Manual, Rev. 2, Table 5-1, "Differences between the e500 core and the QorIQ core implementation", under HID1[RFXE].

2) The registers should contain correct data for unsuccessful read (uncorrectable read error) operations.

3) Yes, see 2).

Note:

RFXE should always be 0 for normal operation for the e500v2; it should be set only if it is necessary that the assertion of core_fault_in generate a machine check or a checkstop because peripherals are not properly configured to report bus faults. This would typically occur only during software or firmware development.
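As an illustration of the note above, the RFXE bit can be tested and cleared as follows. This is a sketch under assumptions: the mask 0x00020000 is the HID1[RFXE] value as defined in the Linux kernel's Book E register header, HID1 is taken to be SPR 1009, and the helper names are hypothetical. The accessor functions compile only for PowerPC GCC-style toolchains and must run in supervisor mode; confirm the required synchronization around HID1 writes in the core manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define HID1_RFXE 0x00020000u /* read fault exception enable (assumed mask) */

/* True if HID1[RFXE] is set in the given register value. */
bool hid1_rfxe_enabled(uint32_t hid1)
{
    return (hid1 & HID1_RFXE) != 0;
}

/* Return the HID1 value with RFXE cleared and all other bits unchanged. */
uint32_t hid1_clear_rfxe(uint32_t hid1)
{
    return hid1 & ~HID1_RFXE;
}

#if defined(__powerpc__) && defined(__GNUC__)
/* Target-only accessors; SPR 1009 is assumed to be HID1 on e500. */
static inline uint32_t read_hid1(void)
{
    uint32_t v;
    __asm__ volatile("mfspr %0, 1009" : "=r"(v));
    return v;
}

static inline void write_hid1(uint32_t v)
{
    __asm__ volatile("sync; mtspr 1009, %0; isync" : : "r"(v) : "memory");
}
#endif
```

On the target, `write_hid1(hid1_clear_rfxe(read_hid1()))` would clear the bit, and `hid1_rfxe_enabled(read_hid1())` can be logged from the machine check handler to confirm the setting in effect when the exception fires.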

GeekFork · ‎09-26-2021

Hi ufedor,

Thanks for your reply.

I set HID1[RFXE] to 0, and the machine check still occurs in my test. Is this expected behavior, or is something wrong?

ufedor · ‎09-26-2021

For possible sources of the machine check, please refer to the PowerPC e500 Core Family Reference Manual, Table 5-8, "e500 Machine Check Exception Sources".

GeekFork · ‎09-26-2021

When the exception occurs, the Machine Check Syndrome Register is 0x8 (BUS_RBERR, bus read data bus error), but MCSRR0 and MCAR do not point to an instruction or an address that has anything to do with BUS_RBERR. This is what confuses me. Do you have any other suggestions for tracking down the cause of this exception?

Thanks.

ufedor · ‎09-26-2021

You wrote:

> Machine Check Save/Restore Register 0 usually points to a “sync” instruction

Which instruction is before the "sync"?

GeekFork · ‎09-26-2021

It's a store or load (read) instruction. The address it accesses is in a valid DRAM region.
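When inspecting the words around MCSRR0, a rough classifier like the one below can tell a sync from a plain load or store. It is a sketch, not a full decoder: it recognizes the `sync` encoding 0x7C0004AC and the integer D-form loads and stores (primary opcodes 32 to 47), and reports everything else, including indexed X-form accesses, as "other". The names `insn_kind` and `classify_insn` are hypothetical.

```c
#include <stdint.h>

enum insn_kind { INSN_OTHER, INSN_SYNC, INSN_LOAD, INSN_STORE };

#define PPC_INSN_SYNC 0x7C0004ACu /* sync (msync on e500) */

/* Classify one 32-bit PowerPC instruction word. */
enum insn_kind classify_insn(uint32_t insn)
{
    uint32_t opcode = insn >> 26; /* primary opcode, top six bits */

    if (insn == PPC_INSN_SYNC)
        return INSN_SYNC;

    switch (opcode) {
    case 32: case 33: /* lwz, lwzu */
    case 34: case 35: /* lbz, lbzu */
    case 40: case 41: /* lhz, lhzu */
    case 42: case 43: /* lha, lhau */
    case 46:          /* lmw */
        return INSN_LOAD;
    case 36: case 37: /* stw, stwu */
    case 38: case 39: /* stb, stbu */
    case 44: case 45: /* sth, sthu */
    case 47:          /* stmw */
        return INSN_STORE;
    default:
        return INSN_OTHER; /* X-form and all other instructions */
    }
}
```

For example, 0x90010004 (`stw r0,4(r1)`) classifies as a store and 0x80010004 (`lwz r0,4(r1)`) as a load. Applying this to the word at MCSRR0 and the word at MCSRR0 - 4 gives the kind of answer requested above.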

ufedor · ‎09-27-2021

How many boards were tested?

GeekFork · ‎09-27-2021

Two boards, I think. Both show the same result.

I am now deliberately triggering a machine check exception to see whether the software and hardware report the expected information (the instruction address and the address that instruction is reading).

I'll ping you again when I have further questions.

I much appreciate your reply.
