Hi all,
We are using a T1040 in a switch/router design. We boot the board using the CodeWarrior tools, and we run into a problem while loading Linux. The log is as follows:
Machine check in kernel mode.
Machine check in kernel mode.
Caused by (from MCSR=400c0000): Instruction Cache Parity Error
Machine Check Effective Address: 0xc000000000314004
Machine check in kernel mode.
Caused by (from MCSR=400d0000): Instruction Cache Parity Error
Machine Check Effective Address: 0xc00000000004b404
Machine check in kernel mode.
Caused by (from MCSR=400d0000): Instruction Cache Parity Error
Machine Check Effective Address: 0xc00000000013f600
Machine check in kernel mode.
Caused by (from MCSR=400d0000): Instruction Cache Parity Error
Machine Check Effective Address: 0xc00000000004b404
Unable to handle kernel paging request for data at address 0x00000061
Faulting instruction address: 0xc00000000013f604
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=4 CoreNet Generic
We expect to reach the Linux prompt, but instead we get these machine check (instruction cache parity) errors. Please reply if you have any suggestions.
Thank you.
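One thing that can be read off the log above: the final oops address can be compared with the machine check addresses at cache-line granularity. The sketch below does this with a hypothetical helper, `line_base`, and assumes a 64-byte L1 cache line; that size is an assumption to verify against the core reference manual. The addresses are copied from the log.

```python
import re

# Address lines copied from the log in this post.
LOG = """\
Machine Check Effective Address: 0xc000000000314004
Machine Check Effective Address: 0xc00000000004b404
Machine Check Effective Address: 0xc00000000013f600
Machine Check Effective Address: 0xc00000000004b404
Faulting instruction address: 0xc00000000013f604
"""

CACHE_LINE = 64  # assumed L1 cache line size in bytes (not stated in the thread)


def line_base(addr):
    """Return the start address of the cache line containing addr."""
    return addr & ~(CACHE_LINE - 1)


# Collect every machine check address and the single oops address.
mc_addrs = [
    int(m, 16)
    for m in re.findall(r"Machine Check Effective Address: (0x[0-9a-f]+)", LOG)
]
fault = int(
    re.search(r"Faulting instruction address: (0x[0-9a-f]+)", LOG).group(1), 16
)

# Machine check addresses that share a cache line with the faulting instruction.
same_line = [hex(a) for a in mc_addrs if line_base(a) == line_base(fault)]
print(same_line)
```

Under that assumption, the faulting instruction at 0xc00000000013f604 sits in the same line as the machine check reported at 0xc00000000013f600. This may suggest the oops is a consequence of the corrupted instruction line rather than a separate software bug, but it does not prove it.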
For the attached clock, it is not only the frequency that matters but also the peak-to-peak jitter (PPJ), phase noise, and so on.
Please confirm that the clock meets all of the specifications.
The problems start with a cache parity error. Most likely this is caused by overclocking; a noisy, unstable, or out-of-range supply voltage; or external conditions such as a high level of radiation or overheating. The only way software can directly create this condition is described in E5500RM, Section 5.4.5. I do not think Linux has any code for cache error injection, so the suggestions below are really sanity checks:
Check whether the design violates any of the chip errata.
Make sure no customizations have been made to U-Boot and/or Linux that switch cache error protection on or off without invalidating the cache.
Compare the problematic kernel configuration against the SDK default to make sure the build flags are correct for the CPU and no unsupported code is included in the build.
Use the SDK-provided build tools to ensure the kernel is not miscompiled.
Check the general system operating conditions: make sure there is no overheating, overclocking, ESD, or radiation.
Check the power rails for stability and noise. The fact that the problem gets worse with more cores enabled, together with the observation that the error fires when a core leaves the normal idle routine, points to power as the most likely root cause.
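As a cross-check on the diagnosis, the MCSR values printed in the log (0x400c0000 and 0x400d0000) can be decoded bit by bit. The sketch below is a minimal decoder; the bit masks are assumed to match the e500mc-class definitions in the Linux kernel header arch/powerpc/include/asm/reg_booke.h and should be verified against the e5500 reference manual. `decode_mcsr` is a hypothetical helper, not a kernel API.

```python
# Assumed MCSR bit masks (e500mc-class, per Linux reg_booke.h); verify
# against the e5500 reference manual before relying on them.
MCSR_BITS = {
    0x80000000: "MCP",         # machine check input pin
    0x40000000: "ICPERR",      # instruction cache parity error
    0x20000000: "DCPERR",      # data cache parity error
    0x08000000: "L2MMU_MHIT",  # hit on multiple TLB entries
    0x00100000: "NMI",         # non-maskable interrupt
    0x00080000: "MAV",         # MCAR address valid
    0x00040000: "MEA",         # MCAR holds an effective address
    0x00010000: "IF",          # error on instruction fetch
    0x00008000: "LD",          # error on load
    0x00004000: "ST",          # error on store
    0x00002000: "LDG",         # error on guarded load
    0x00000002: "TLBSYNC",     # multiple tlbsyncs detected
    0x00000001: "BSL2_ERR",    # backside L2 cache error
}


def decode_mcsr(value):
    """Return (names of set known bits, leftover bits not in the table)."""
    names = [name for mask, name in MCSR_BITS.items() if value & mask]
    known = 0
    for mask in MCSR_BITS:
        known |= mask
    return names, value & ~known


for raw in (0x400C0000, 0x400D0000):
    names, unknown = decode_mcsr(raw)
    print(f"MCSR={raw:08x}: {' | '.join(names)} (unknown bits: {unknown:#x})")
```

With these masks, both values decode to an instruction cache parity error with a valid effective address in MCAR, and 0x400d0000 additionally flags an instruction fetch. No data cache or backside L2 bits are set, which is consistent with the problem being confined to the instruction side of the L1 cache.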
Thank you for replying. We considered your suggestions and continued debugging. On our cards, all of the hardware and software items you mentioned appear to be in working order, as far as we can observe. We have 30 cards in total based on the T1040RDB; 28 of them have completed U-Boot and Linux installation and are working. We follow the same procedure on every card, but on 2 specific cards we get the machine check parity error. We checked the power supply levels (3.3 V and 1.8 V), and they are within the permissible limits. The clock input to the processor measures 99.9997 MHz (screenshot attached). Are there any other specific tests we could run to further debug the source of the problem?
Thank you!
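To put the measured clock frequency in context, it can be expressed as an offset in parts per million from the nominal value. The sketch below assumes a 100 MHz nominal reference clock, which is not stated in the thread and should be replaced with the actual board value; `ppm_offset` is a hypothetical helper.

```python
def ppm_offset(measured_hz, nominal_hz):
    """Frequency error of measured_hz relative to nominal_hz, in ppm."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


MEASURED_HZ = 99.9997e6  # value reported in the post above
NOMINAL_HZ = 100e6       # assumed nominal reference clock; check the board design

offset = ppm_offset(MEASURED_HZ, NOMINAL_HZ)
print(f"{offset:+.1f} ppm")
```

Under that assumption the offset is about -3 ppm, which is small. A frequency reading of this kind says nothing about jitter or phase noise, though, so the earlier request to check the clock against the full specification still applies.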