Dear community,
I have a custom board with an i.MX8 DualX. U-Boot runs and the Linux kernel boots. I use Linux 5.10.72_2.2.0, built with Yocto, together with SCFW Porting Kit 1.11.0.
Unfortunately, the kernel always crashes, anywhere from a few seconds to half an hour after boot, with varying panic messages. Most often, the message looks like this:
Unable to handle kernel paging request at virtual address 000000000000698b
[ 20.370464] Mem abort info:
[ 20.373259] ESR = 0x96000004
[ 20.376319] EC = 0x25: DABT (current EL), IL = 32 bits
[ 20.381633] SET = 0, FnV = 0
[ 20.384692] EA = 0, S1PTW = 0
[ 20.387835] Data abort info:
[ 20.390712] ISV = 0, ISS = 0x00000004
[ 20.394552] CM = 0, WnR = 0
[ 20.397526] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000085c66000
[ 20.403970] [000000000000698b] pgd=0000000000000000, p4d=0000000000000000
[ 20.410775] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 20.416349] Modules linked in:
[ 20.419414] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.72-lts-5.10.y+ga68e31b63f86 #1
[ 20.427599] Hardware name: Freescale i.MX8DX MEK (DT)
[ 20.432660] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
[ 20.438682] pc : calc_global_load+0x18c/0x210
[ 20.443041] lr : calc_global_load+0x178/0x210
[ 20.447398] sp : ffff800011d5bee0
[ 20.450717] x29: ffff800011d5bee0 x28: ffff800011b52380
[ 20.456042] x27: ffff800011b52380 x26: ffff800011d5c000
[ 20.461367] x25: ffff800011d58000 x24: ffff800011b49360
[ 20.466693] x23: ffff800011cee000 x22: ffff800011cee000
[ 20.472018] x21: ffff800011b46000 x20: ffff800011b46a00
[ 20.477344] x19: 00000004b75a0aee x18: 0000000000000000
[ 20.482669] x17: 0000000000000000 x16: 0000000000000000
[ 20.487995] x15: 0000000fee30533a x14: 00000000000215a2
[ 20.493320] x13: 00000000000007f5 x12: 00000000fffef377
[ 20.498645] x11: 00000000000060cb x10: ffff800011cc88e0
[ 20.503971] x9 : 00000000fffef85a x8 : 0000000000000042
[ 20.509296] x7 : ffff800011cc88c0 x6 : 00000000000000c7
[ 20.514621] x5 : 00000000003fa800 x4 : 000000000002ad29
[ 20.519947] x3 : 0000000000000000 x2 : 0000000000000800
[ 20.525272] x1 : 00000000000004e3 x0 : 0000000000000055
[ 20.530598] Call trace:
[ 20.533055] calc_global_load+0x18c/0x210
[ 20.537076] do_timer+0x20/0x30
[ 20.540222] tick_do_update_jiffies64.part.0+0x78/0x114
[ 20.545449] tick_irq_enter+0xf0/0x130
[ 20.549203] irq_enter_rcu+0x64/0x70
[ 20.552780] irq_enter+0x14/0x20
[ 20.556014] __handle_domain_irq+0x40/0xe0
[ 20.560114] gic_handle_irq+0xc0/0x140
[ 20.563867] el1_irq+0xcc/0x180
[ 20.567014] arch_cpu_idle+0x18/0x30
[ 20.570591] default_idle_call+0x24/0x6c
[ 20.574518] do_idle+0x230/0x2a0
[ 20.577749] cpu_startup_entry+0x24/0x70
[ 20.581675] rest_init+0xd8/0xe8
[ 20.584909] arch_call_rest_init+0x10/0x1c
[ 20.589007] start_kernel+0x4ac/0x4e4
[ 20.592682] Code: d2809c61 9b013129 f90004e9 d5033abf (b948c160)
[ 20.598788] ---[ end trace 5863192a640cb186 ]---
[ 20.603411] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 20.610290] SMP: stopping secondary CPUs
The faulting "virtual address" varies between crashes, and so does the call trace, but in most cases the last function is timer-related.
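For reference, the ESR value printed in the log can be decoded by hand. The following is a minimal sketch in Python, assuming the ARMv8-A ESR_ELx field layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the function name decode_esr is only illustrative and not part of any kernel tooling.

```python
def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # data aborts: 0 = read, 1 = write
        "DFSC": iss & 0x3F,            # data fault status code
    }


# Value taken from the panic message above.
fields = decode_esr(0x96000004)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

For 0x96000004 this yields EC = 0x25 and ISS = 0x4, consistent with the "Mem abort info" lines. DFSC 0x04 is the encoding for a level 0 translation fault on a read, which fits the all-zero pgd entry the kernel reports for the faulting address.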
Sometimes, the panic message is different:
[ 20.901152] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060
[ 20.909965] Mem abort info:
[ 20.912759] ESR = 0x96000004    ...
Two complete boot logs are attached.
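Since the panics differ from boot to boot, it can help to reduce each log to the fault type, the faulting address and the pc symbol so that the crashes can be compared side by side. The following is a rough sketch, assuming the log lines have the same shape as the excerpts above; the helper name summarize_oops and the sample string are hypothetical.

```python
import re

# Patterns modelled on the log excerpts in this post.
FAULT_RE = re.compile(
    r"Unable to handle kernel (.+?) at virtual address ([0-9a-fA-F]+)"
)
PC_RE = re.compile(r"\bpc : (\S+)")


def summarize_oops(log_text):
    """Return one dict per oops: fault kind, faulting address, pc symbol."""
    reports = []
    for line in log_text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            reports.append({
                "kind": fault.group(1),
                "address": int(fault.group(2), 16),
                "pc": None,  # filled in when the matching pc line shows up
            })
            continue
        pc = PC_RE.search(line)
        if pc and reports and reports[-1]["pc"] is None:
            reports[-1]["pc"] = pc.group(1)
    return reports


# Sample built from lines quoted in this post (second oops is truncated).
sample = """\
Unable to handle kernel paging request at virtual address 000000000000698b
[ 20.438682] pc : calc_global_load+0x18c/0x210
[ 20.901152] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060
"""
reports = summarize_oops(sample)
for report in reports:
    print(report["kind"], hex(report["address"]), report["pc"])
```

Running this over several complete boot logs would show whether the faulting addresses or pc symbols cluster in any way, or whether they look random.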
I use 2*512MB (1GB) of DDR3L memory. The DDR stress test ran successfully for 2 hours, which is why I think a hardware issue is unlikely.
The memory node in the device tree is:
memory@80000000 {
   device_type = "memory";
   reg = <0x00000000 0x40000000>;
};
We checked the DCD file several times and did not find any wrong configurations.
RAM-Config in U-Boot is the following:
#define CONFIG_SYS_SDRAM_BASE 0x80000000
#define PHYS_SDRAM_1 0x80000000
#define PHYS_SDRAM_2 0x880000000
#define PHYS_SDRAM_1_SIZE 0x40000000  /* 1 GB */
#define PHYS_SDRAM_2_SIZE 0x00000000  /* 0 GB */
and
CONFIG_NR_DRAM_BANKS=4
Enabling CONFIG_DEBUG_PAGEALLOC=y seemed to improve the behaviour a little, but I am not sure.
What could be the problem? What else could I try?
Regards,
Tobi
Hi @Sanket_Parekh,
no, I don't have an i.MX8DX MEK, so I didn't try it with an MEK.
But I don't think this would be useful, since the RAM configuration is different from that of the MEK, and I also have different peripheral hardware on different pins.
EDIT: Additionally, I found out that the memory node is automatically changed by U-Boot on kernel bootup. After booting, it is:
memory@80000000 {
device_type = "memory";
reg = <0x00 0x80200000 0x00 0x3fe00000>;
};
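The patched node can be checked against the U-Boot defines with a bit of arithmetic. The following is a minimal sketch, assuming the four-cell reg uses #address-cells = <2> and #size-cells = <2> (this is an assumption, the parent node is not shown here); decode_reg is an illustrative helper, not an existing tool.

```python
def decode_reg(cells, address_cells=2, size_cells=2):
    """Turn a flat list of 32-bit 'reg' cells into (base, size) pairs."""
    def join(words):
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("cell count does not match address/size cell widths")
    return [
        (join(cells[i:i + address_cells]),
         join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


# Values from the U-Boot configuration quoted earlier in this thread.
PHYS_SDRAM_1 = 0x80000000
PHYS_SDRAM_1_SIZE = 0x40000000

# reg = <0x00 0x80200000 0x00 0x3fe00000>; as seen after U-Boot's fixup.
base, size = decode_reg([0x00, 0x80200000, 0x00, 0x3FE00000])[0]
end = base + size
reserved = base - PHYS_SDRAM_1

print(f"base={base:#x} size={size:#x} end={end:#x} reserved={reserved:#x}")
```

Under that assumption, the region handed to Linux ends at 0xC0000000, the same end address as PHYS_SDRAM_1 + PHYS_SDRAM_1_SIZE, and starts 0x200000 (2 MB) above the DRAM base. So the node Linux actually sees covers the configured 1 GB bank minus the first 2 MB.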
Regards,
Tobi
Hi @Sanket_Parekh,
I already tried every drive strength available in the RPA, as well as several ODT configurations.
None of those changes had any effect on the crash behaviour.
Regards,
Tobi
Sanket_Parekh
Hi @Tobi_Edu,
An SD card is connected to the MMC1 slot, right?
If so, could you please remove it and try to reproduce the issue?
Please share the log file.
Thanks & Regards.
Sanket Parekh
Hi @Sanket_Parekh,
thanks for your reply.
Yes, there is a microSD card in slot MMC1, and it is also the medium the device boots from.
The image consists of flash.bin, Image.bin, the dtb, and the rootfs on a separate partition. How can I make the device boot without an SD card inserted (that is, with all those files flashed to eMMC)? UUU won't work, will it?
Regards,
Tobi
Sanket_Parekh
Hi @Tobi_Edu,
I hope you are doing well.
From the logs, it seems the kernel crash happened during login.
Have you flashed the imx8dx-mek binaries onto your custom board?
Thanks & Regards.
Sanket Parekh
Hello @Sanket_Parekh,
sometimes it crashes during login; sometimes I can log in and the kernel crashes some seconds or minutes later.
I use the imx8dx-mek sources as the base for my custom code, so I flash my own custom binaries, not the ones built for the MEK.
Regards,
Tobi
