kexec/kdump doesn't work on MX6Q/SDL-SabreSD

do_feel · 12-12-2016

1. My environment:
HW version:

CPU: Freescale i.MX6SOLO rev1.2 at 792 MHz
Board: MX6Q/SDL-SabreSD

DDR: 512M

SW version:

Kernel version: 3.10.53

2. kexec procedure:

a. Set up the system kernel bootargs:
nosmp rootwait root=/dev/mmcblk0p2 console=ttymxc0,115200 crashkernel=50M@0x13000000
b. Run the kexec command:
./kexec -d -p zImage_crash --dtb uImage-imx6dl-sabresd.dtb --command-line="init=/init nosmp rootwait root=/dev/mmcblk0p2 console=ttymxc0,115200"
Output log:

kernel: 0x75d79008 kernel_size: 41a1e0
phys_offset: 0x10000000
get_crash_notes_per_cpu: crash_notes addr = 10fa6790
SYSC_kexec_load: E
Elf header: p_type = 4, p_offset = 0x10fa6790 p_paddr = 0x10fa6790 p_vaddr = 0x0 p_filesz = 0x400 p_memsz = 0x400
vmcoreinfo header: p_type = 4, p_offset = 0x10b78c9c p_paddr = 0x10b78c9c p_vaddr = 0x0 p_filesz = 0x1000 p_memsz = 0x1000
Elf header: p_type = 1, p_offset = 0x10000000 p_paddr = 0x10000000 p_vaddr = 0xc0000000 p_filesz = 0x3000000 p_memsz = 0x3000000
Elf header: p_type = 1, p_offset = 0x16200000 p_paddr = 0x16200000 p_vaddr = 0xc6200000 p_filesz = 0x19e00000 p_memsz = 0x19e00000
elfcorehdr: 0x16100000
crashkernel: [0x13000000 - 0x161fffff] (50M)
memory range: [0x10000000 - 0x12ffffff] (48M)
memory range: [0x16200000 - 0x2fffffff] (414M)
kernel command line: "init=/init nosmp rootwait root=/dev/mmcblk0p2 console=ttymxc0,115200 elfcorehdr=0x16100000 mem=50176K"
kexec_load: entry = 0x13010000 flags = 280001
nr_segments = 3
segment[0].buf = 0x75d79008
segment[0].bufsz = 41a1e0
segment[0].mem = 0x13010000
segment[0].memsz = 41b000
segment[1].buf = 0x17bb850
segment[1].bufsz = cbc7
segment[1].mem = 0x14069000
segment[1].memsz = d000
segment[2].buf = 0x17bb410
segment[2].bufsz = 400
segment[2].mem = 0x16100000
segment[2].memsz = 1000

3. Run the command "echo c > /proc/sysrq-trigger"

After the crash is triggered, the system kernel hits a second oops while running __soft_restart(void *addr).
I have confirmed that the failure point is here:

static void __soft_restart(void *addr)
{
    phys_reset_t phys_reset;

    printk("%s: E addr = 0x%p\n", __func__, addr);

    /* Take out a flat memory mapping. */
    setup_mm_for_reboot();
    printk("%s: E 111\n", __func__);

    /* Clean and invalidate caches */
    flush_cache_all();
    printk("%s: E 222\n", __func__);

    /* Turn off caching */
    cpu_proc_fin();
    printk("%s: E 333\n", __func__);
    printk("%s: E 333 addr = 0x%p\n", __func__, addr);

    /* Push out any further dirty data, and ensure cache is empty */
    flush_cache_all();    /* <============= oops here! */
    printk("%s: E 444\n", __func__);

    /* Switch to the identity mapping. */
    phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
    __phys_debug = (unsigned long)virt_to_phys(imx_uart_debug);
    printk("%s: E 555\n", __func__);
    printk("%s: E phys_reset = 0x%p, __phys_debug = 0x%lx\n", __func__, phys_reset, __phys_debug);
    phys_reset((unsigned long)addr);
    printk("%s: X\n", __func__);

    /* Should never get here. */
    BUG();
}

I don't know why flush_cache_all() causes an oops once caching has been turned off.

Has anyone run into this issue, and how can it be fixed?

Thanks

The runtime log is as follows:

SysRq : Trigger a crash
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 9633c000
[00000000] *pgd=26707831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] PREEMPT ARM
Modules linked in: evbug
CPU: 0 PID: 957 Comm: sh Tainted: G W 3.10.53-1.1.0_ga+gref:re #23
task: 96975a40 ti: 967ce000 task.ti: 967ce000
PC is at sysrq_handle_crash+0x38/0x40
LR is at l2x0_cache_sync+0x44/0x64
pc : [<8027ec84>] lr : [<8001a958>] psr: 40000093
sp : 967cff10 ip : 00000001 fp : 00000000
r10: 00000000 r9 : 00000000 r8 : 00000008
r7 : 60000013 r6 : 00000063 r5 : 80b332f8 r4 : 80b10188
r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 00000001
Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 2633c059 DAC: 00000015
Process sh (pid: 957, stack limit = 0x967ce230)
Stack: (0x967cff10 to 0x967d0000)
ff00: 8027ec4c 8027f278 967ce000 00000002
ff20: 00000001 00000000 00000000 96222180 00000002 8027f760 00000000 80108eb4
ff40: 01747408 967cff80 963b9540 00000002 967ce018 800ba124 00000000 00000001
ff60: 00000000 00000000 963b9540 01747408 00000000 00000002 00000000 800ba79c
ff80: 00000000 00000000 00000006 00000002 01747408 76f65b48 00000004 8000e204
ffa0: 967ce000 8000e080 00000002 01747408 00000001 01747408 00000002 00000000
ffc0: 00000002 01747408 76f65b48 00000004 00000002 01747408 00000002 00000000
ffe0: 00000000 7ec5898c 76e9dae4 76ef22fc 60000010 00000001 00000000 00000000
[<8027ec84>] (sysrq_handle_crash+0x38/0x40) from [<8027f278>] (__handle_sysrq+0xb4/0x198)
[<8027f278>] (__handle_sysrq+0xb4/0x198) from [<8027f760>] (write_sysrq_trigger+0x38/0x48)
[<8027f760>] (write_sysrq_trigger+0x38/0x48) from [<80108eb4>] (proc_reg_write+0x50/0x78)
[<80108eb4>] (proc_reg_write+0x50/0x78) from [<800ba124>] (vfs_write+0xb0/0x1c8)
[<800ba124>] (vfs_write+0xb0/0x1c8) from [<800ba79c>] (SyS_write+0x3c/0x78)
[<800ba79c>] (SyS_write+0x3c/0x78) from [<8000e080>] (ret_fast_syscall+0x0/0x30)
Code: 0a000000 e12fff33 e3a03000 e3a02001 (e5c32000)
crash_kexec: E regs = 967cfec8
crash_kexec: E kexec_crash_image = 968f6a00
machine_crash_shutdown: E
Loading crashdump kernel...
machine_crash_shutdown: X
machine_kexec: E
machine_kexec: E image->control_code_page = 0x80bf0000, image->start = 0x13010000
machine_kexec: E reboot_code_buffer_phys = 0x13000000, reboot_code_buffer = 0x83000000
Bye!
machine_kexec: E soft restart
__soft_restart: E addr = 0x13000000
[wuxy] setup_mm_for_reboot: E idmap_pgd = 0x96064000, &init_mm = 0x80b13e78
[wuxy] setup_mm_for_reboot: E idmap_pgd = 0x96064000, &init_mm = 0x80b13e78
__soft_restart: E 111
__soft_restart: E 222
__soft_restart: E 333
__soft_restart: E 333 addr = 0x13000000
Unable to handle kernel NULL pointer dereference at virtual address 000000e0
pgd = 80004000
[000000e0] *pgd=00000000
Internal error: Oops: 5 [#2] PREEMPT ARM
Modules linked in: evbug
CPU: 0 PID: -2135909344 Comm: Tainted: G W 3.10.53-1.1.0_ga+gref:re #23
task: 80b5002c ti: 80b50000 task.ti: 80b5000c
PC is at do_page_fault+0x40/0x390
LR is at do_DataAbort+0x38/0x98
pc : [<800174a8>] lr : [<800083d8>] psr: 000001d3
sp : 80b52130 ip : 80b52268 fp : 00000005
r10: 804eb170 r9 : 00000000 r8 : 000000e0
r7 : 000000e0 r6 : 80b52268 r5 : 80b52030 r4 : 00000005
r3 : 00000028 r2 : 000001d3 r1 : 00000000 r0 : 000000e0
Flags: nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
Control: 10c52c79 Table: 26064059 DAC: 00000015
Process (pid: -2135909344, stack limit = 0x80b5023c)
Stack: (0x80b52130 to 0x80b5200c)
[<800174a8>] (do_page_fault+0x40/0x390) from [<800083d8>] (do_DataAbort+0x38/0x98)
[<800083d8>] (do_DataAbort+0x38/0x98) from [<8000dc18>] (__dabt_svc+0x38/0x60)
Exception stack(0x80b52268 to 0x80b522b0)
2260: 000000e0 00000000 000001d3 00000028 00000005 80b52030
2280: 80b523e8 000000e0 000000e0 00000000 804eb170 00000005 80b523e8 80b522b0
22a0: 800083d8 800174a8 000001d3 ffffffff

b36401 · 12-14-2016

Here is a patch for the 3.14 kernel:

https://github.com/Freescale/linux-fslc/commit/fee3fd4fd2ad136b26226346c3f8b446cc120bf5

We have not tested it with the 3.10 kernel, but you can try it.

Have a great day,
Victor

do_feel · 12-15-2016

Thanks for your reply.

It seems that the issue resolved by the patch is not the same as mine.

I have confirmed that kexec_load succeeds in our procedure; my issue is that the kernel oopses when flush_cache_all() is called after cpu_proc_fin() has executed.

Thanks

i.MX6Quad