Hi Experts,
I am having some trouble with the P2020’s machine check interrupt. Could someone kindly help answer my questions below? Thanks in advance!
We designed and built the hardware board ourselves, with a P2020 on it. Our software applications run under an embedded OS.
The system raises a machine check exception from time to time. When it happens:
- Machine Check Syndrome Register (MCSR) is 0x8; this indicates “BUS_RBERR (Bus read data bus error)”.
- Machine Check Address Register (MCAR) is 0xB000xxxx. This is a RapidIO address in our system. However, I am sure the system never reads or writes this address after initialization.
- Machine Check Save/Restore Register 0 (MCSRR0) usually points to a “sync” instruction, and occasionally to a store instruction (the store’s target address is a normal DRAM address used for the function stack).
Currently I cannot determine why the machine check happens.
My questions are:
1. What does a “Bus read data bus error” mean? My MCAR and MCSRR0 values seem to have no relationship to this syndrome type.
2. Do MCSR/MCAR/MCSRR0 always hold correct information about a machine check exception?
3. I configured an address (say 0xc0000000) in the MMU but not in any LAW (Local Access Window). The system immediately raises a machine check interrupt when 0xc0000000 is read, but when 0xc0000000 is written, no interrupt or exception occurs at all. Is this expected behavior for the processor?
Thanks.
Jerry
1) BUS_RBERR gets set because core_fault_in is asserted to the CPU, signaling a fault on the internal bus.
The sources capable of generating core_fault_in are described in the P2020 QorIQ Integrated Processor Reference Manual, Rev. 2, Table 5-1 (Differences between the e500 core and the QorIQ core implementation), under HID1[RFXE].
2) The registers should contain correct data for unsuccessful read (uncorrectable read error) operations.
3) Yes, see 2).
Note:
RFXE should always be 0 for normal operation on the e500v2. It should be set only if the assertion of core_fault_in must generate a machine check or a checkstop because peripherals are not properly configured to report bus faults. This would typically occur only during software or firmware development.
Hi ufedor,
Thanks for your reply.
I set HID1[RFXE] to 0, and the machine check still occurs in my test. Is this expected behavior, or is something wrong?
For possible sources of the machine check, please refer to the PowerPC e500 Core Family Reference Manual, Table 5-8 (e500 Machine Check Exception Sources).
When the exception occurs, the Machine Check Syndrome Register is 0x8 (BUS_RBERR, bus read data bus error), but MCSRR0 and MCAR do not point to any instruction or address that has anything to do with a bus read error. This is what confuses me. Do you have any other suggestions I can follow to find the cause of this exception?
Thanks.
You wrote:
> Machine Check Save/Restore Register 0 usually points to a “sync” instruction
Which instruction is before the "sync"?
It's a store or load instruction. The address it accesses is in a valid DRAM region.
How many boards were tested?
Maybe 2 boards. Both show the same result.
I am now triggering a machine check exception manually, to see whether the software and hardware report the expected information (the address of the faulting instruction and the address that instruction reads).
I'll ping you again if I have other questions.
I really appreciate your reply.