
Linux boot hanging and kernel panic with SMP enabled

Question asked by arashaz. on Jul 16, 2013
Latest reply on Jul 23, 2013 by arashaz.

Dear All,
I have a strange problem with symmetric multiprocessing (SMP) on a
P1020-compatible board, and I would appreciate your advice. If I pass
"-nosmp" as a boot argument, Linux boots fine. With SMP enabled (that is,
with "-nosmp" removed), Linux crashes and does not boot. The kernel log at
the time of the crash varies from boot to boot (some examples are below).
Most of the time it reports a "Data Cache Parity Error" or "Data Cache Push
Parity Error", and sometimes "Unable to handle kernel paging request for
instruction fetch" or "Unable to handle kernel paging request for data at
address...". In some cases it halts with no error message at all and simply
hangs in the middle of the boot. So I suspect the error is not tied to any
one of the kernel drivers, which are loaded one by one.
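
Because the failure signature changes from boot to boot, it can help to capture the serial console of many boots and tally which error appears first in each. The following is only a sketch of that idea: the signature strings are copied from the excerpts in this post, while the function names (`classify_boot`, `tally`) and the `no_oops` label are hypothetical. A log with no match could be either a silent hang or a clean boot, so those cases still need to be separated by hand.

```python
import re
from collections import Counter

# Signature strings copied from the console excerpts in this post.
SIGNATURES = {
    "machine_check": re.compile(r"Machine check in kernel mode"),
    "instruction_fetch_fault": re.compile(
        r"Unable to handle kernel paging request for instruction fetch"),
    "data_access_fault": re.compile(
        r"Unable to handle kernel paging request for data at address"),
    "exception_in_kernel_mode": re.compile(r"Oops: Exception in kernel mode"),
}


def classify_boot(log_text):
    """Return the label of the earliest signature in one boot log.

    Returns "no_oops" when none of the known signatures is present.
    """
    earliest = None  # (offset, label) of the first match found so far
    for label, pattern in SIGNATURES.items():
        match = pattern.search(log_text)
        if match and (earliest is None or match.start() < earliest[0]):
            earliest = (match.start(), label)
    return earliest[1] if earliest else "no_oops"


def tally(logs):
    """Count the first failure signature over an iterable of boot logs."""
    return Counter(classify_boot(text) for text in logs)
```

If one signature dominates, or the first error is always a machine check, that narrows the search more than reading individual logs one at a time.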

 

Since the problem appears random, I also tried putting a large fan on the board, using a stronger power supply, and using different Linux images with
different boot arguments such as -noapic, -nomce, ... None of these solved the problem.

Any comment is appreciated.


Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P1020 RDB
Modules linked in:

Machine check in kernel mode.
Caused by (from MCSR=20000000): Data Cache Push Parity Error
Oops: Machine check, sig: 7 [#2]
SMP NR_CPUS=2 P1020 RDB
Modules linked in:

Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=2 P1020 RDB
Modules linked in:

Machine check in kernel mode.
Caused by (from MCSR=20000000): Data Cache Push Parity Error
Oops: Machine check, sig: 7 [#2]
SMP NR_CPUS=2 P1020 RDB
Modules linked in:

Fixing recursive fault but reboot is needed!

Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P1020 RDB
Modules linked in:

Unable to handle kernel paging request for data at address 0x81a403b0
Faulting instruction address: 0xc01d7d88
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P1020 RDB
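
The machine-check lines in the excerpts report MCSR=20000000, which the kernel prints as "Data Cache Push Parity Error". As a rough aid for reading other MCSR values, here is a small decoder sketch. The 0x20000000 mapping is confirmed by the log itself; the other three masks are as I recall them from the kernel's e500 definitions (arch/powerpc/include/asm/reg_booke.h), so please treat them as an assumption and verify them against your kernel source. The function name `decode_mcsr` is hypothetical.

```python
# Upper MCSR bits for e500v1/v2 cores. Only 0x20000000 is confirmed by the
# log in this post; the other masks are assumed from memory of reg_booke.h.
MCSR_BITS = [
    (0x80000000, "Machine Check Signal"),
    (0x40000000, "Instruction Cache Parity Error"),
    (0x20000000, "Data Cache Push Parity Error"),
    (0x10000000, "Data Cache Parity Error"),
]


def decode_mcsr(value):
    """Return the names of the known MCSR bits set in a 32-bit value."""
    names = [name for mask, name in MCSR_BITS if value & mask]
    known = sum(mask for mask, _ in MCSR_BITS)
    leftover = value & ~known & 0xFFFFFFFF
    if leftover:
        # Bits this sketch does not cover (for example bus error bits).
        names.append(f"unrecognized bits 0x{leftover:08x}")
    return names


print(decode_mcsr(0x20000000))
```

Both cache parity conditions point at data held in the core's L1 data cache rather than at a particular driver.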


