T1022 processor based board machine check exception thrown by kernel


1,570 Views
hemwant
Contributor IV

We have a T1022 processor-based board. One of the memory-mapped devices is connected through the IFC bus on chip select 1 (CS1) in GASIC mode. Memory reads and writes to this device sometimes cause the kernel to throw a machine check exception. I am attaching the T104xRDB.h file, which contains the GASIC mode configuration. The kernel exception is as follows:

 

Machine check in kernel mode.
[169160.053749] Caused by (from MCSR=a000): Load Error Report
[169160.057842] Guarded Load Error Report
[169160.060196] Oops: Machine check, sig: 7 [#3]
[169160.063155] SMP NR_CPUS=8 CoreNet Generic
[169160.065857] Modules linked in: [last unloaded: drv_dwdm]
[169160.069870] CPU: 0 PID: 2873 Comm: util_t10xx-32b Tainted: G D W O 4.1.8-rt8 #1
[169160.076656] task: e817d910 ti: e6214000 task.ti: e6214000
[169160.080745] NIP: 10000b54 LR: 10000ad4 CTR: 0fea03a0
[169160.084401] REGS: e6215f10 TRAP: 0204 Tainted: G D W O (4.1.8-rt8)
[169160.090317] MSR: 0002d002 <CE,EE,PR,ME> CR: 28000422 XER: 00000000
[169160.095383] DEAR: b7d363e0 ESR: 00800000
GPR00: 10000ad4 bf9e3be0 b7d3c4c0 00100000 00100000 00000010 00000000 0ffecab0
GPR08: 0ffb71a4 b7d363e0 00100000 0fffffff ffffffff 10019268 10100000 00000000
GPR16: 00000000 100fda04 100fd9f4 10100000 00000000 42222442 10100000 00000000
GPR24: 101115f0 00000000 00000000 0000000f e8220000 0000000f e8220000 bf9e3be0
[169160.126354] NIP [10000b54] 0x10000b54
[169160.128705] LR [10000ad4] 0x10000ad4
[169160.130969] Call Trace:
[169160.132106] ---[ end trace 8e582779e7ba09a3 ]---
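When triaging a dump like the one above, it can help to pull the key registers out of the log programmatically. The following is a minimal sketch and not part of the original post: the helper name `parse_oops` is hypothetical, the regular expressions assume the field layout shown in this particular log, and the `MSR_PR` mask (0x4000) is an assumption based on the Book E MSR layout that should be checked against the core manual. A set PR bit would suggest that the faulting load was issued from user space.

```python
import re

# Excerpt of the log above; only the lines this sketch needs.
LOG = """\
[169160.053749] Caused by (from MCSR=a000): Load Error Report
[169160.090317] MSR: 0002d002 <CE,EE,PR,ME> CR: 28000422 XER: 00000000
[169160.095383] DEAR: b7d363e0 ESR: 00800000
"""

MSR_PR = 0x4000  # assumed Book E "problem state" (user mode) bit


def parse_oops(text):
    """Return MCSR, MSR and DEAR as integers (hypothetical helper)."""
    fields = {}
    for name, pattern in (
        ("mcsr", r"MCSR=([0-9a-fA-F]+)"),
        ("msr", r"\bMSR: ([0-9a-fA-F]+)"),
        ("dear", r"\bDEAR: ([0-9a-fA-F]+)"),
    ):
        match = re.search(pattern, text)
        if match:
            fields[name] = int(match.group(1), 16)
    return fields


regs = parse_oops(LOG)
user_mode = bool(regs["msr"] & MSR_PR)
print({k: hex(v) for k, v in regs.items()}, "user mode:", user_mode)
```

DEAR holds the effective address of the failing access, so comparing it with the address range the application mapped for the CS1 device may confirm which access triggered the machine check.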

 

 

4 Replies

1,201 Views
hemwant
Contributor IV

@ufedor please provide your input.


1,303 Views
hemwant
Contributor IV

@ufedor please respond to the above observation; the issue still persists on our board.


1,561 Views
ufedor
NXP Employee

You wrote:

> read/ write to the memory device sometimes leads to kernel throwing exception

Please check whether IFC_CSORn_GPCM[PTO] is too small; if it is, a timeout during an ASIC read is possible.

Note that an uncorrectable error during a read transaction is reported to the core as an error report machine check.

Refer to the e5500 Core Reference Manual, 4.9.3.1 General Machine Check, Error Report, and NMI Mechanism.


1,460 Views
hemwant
Contributor IV

@ufedor Sorry for the late reply; due to the current pandemic situation we were not able to work on this.

> Please check whether the IFC_CSORn_GPCM[PTO] is small, so timeout during ASIC read is possible.

We have tried both the minimum and the maximum value of IFC_CSORn_GPCM[PTO] and still face the same issue, although increasing GPCM[PTO] sharply reduces how often it occurs.

 

> Refer to the e5500 Core Reference Manual, 4.9.3.1 General Machine Check, Error Report, and NMI Mechanism.

We have referred to the recommended manual and found that MCSR=a000 means the LD and LDG bits are set.

We are still unable to find the root cause of the problem. Refer to the attached image for a description of the LD and LDG bits.
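For reference, the mapping from the raw MCSR value to the LD and LDG bits can be checked with a short script. This is a minimal sketch rather than an authoritative decoder: the bit masks are assumptions taken from the Linux kernel's Book E register definitions (only four error-report bits are listed), the function name `decode_mcsr` is hypothetical, and the values should be verified against the e5500 Core Reference Manual.

```python
# Bit masks assumed from the Linux kernel's Book E register definitions;
# verify them against the e5500 Core Reference Manual before relying on them.
MCSR_BITS = {
    0x00010000: "IF (Instruction Fetch Error Report)",
    0x00008000: "LD (Load Error Report)",
    0x00004000: "ST (Store Error Report)",
    0x00002000: "LDG (Guarded Load Error Report)",
}


def decode_mcsr(value):
    """Return names of the listed bits set in value, plus any unlisted bits."""
    names = [name for mask, name in MCSR_BITS.items() if value & mask]
    known = 0
    for mask in MCSR_BITS:
        known |= mask
    return names, value & ~known


names, unknown = decode_mcsr(0xA000)
print(names, hex(unknown))
```

With the value from the log, only LD and LDG are reported and no unlisted bits remain, which matches the two "Load Error Report" and "Guarded Load Error Report" lines the kernel printed.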

We have one more query regarding this issue. When we float the cycle on the state machine of the ASIC device, there is a chance that the processor and the ASIC device drive the bus at the same time, leading to contention. What happens in that case: will the contention cause a machine check (MCSR) error, or will it switch off the processor?

 

 

 
