Linux Kernel 3.10 memory issue

julienmorand · ‎05-22-2014

Hi all,

I'm facing an issue with my Linux kernel 3.10.17 (built with Yocto). I'm using U-Boot 2013.10.

The system boots up great and fast but sometimes this error message is printed on my console:

[ 2994.146905] Internal error: Oops - undefined instruction: 0 [#1] ARM

[ 2994.153310] Modules linked in:

[ 2994.156417] CPU: 0 PID: 48 Comm: kjournald Not tainted 3.10.17-yocto-standard #13

[ 2994.163928] task: cf5be600 ti: cec00000 task.ti: cec00000

[ 2994.169372] PC is at __wake_up+0x10/0x50

[ 2994.173330] LR is at journal_commit_transaction+0x288/0x15b8

[ 2994.179016] pc : [<c0046878>] lr : [<c0193b74>] psr: 60000013

[ 2994.179016] sp : cec01e38 ip : 0000001a fp : cec01e5c

[ 2994.190514] r10: 00000000 r9 : c06ad33c r8 : 00000000

[ 2994.195759] r7 : 000002b9 r6 : 1983e43b r5 : cf7e79a4 r4 : cf7f6900

[ 2994.202303] r3 : 00000000 r2 : 00000001 r1 : 00000003 r0 : cf7e7844

[ 2994.208849] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel

[ 2994.216178] Control: 0005317f Table: 4ec34000 DAC: 00000017

[ 2994.221941] Process kjournald (pid: 48, stack limit = 0xcec001b8)

[ 2994.228053] Stack: (0xcec01e38 to 0xcec02000)

[ 2994.232436] 1e20: cf7e79a4 cf7f6900

[ 2994.240647] 1e40: cf7e79a4 cf7f6900 cf7e79a4 1983e43b 000002b9 cf7e7894 cf7e7800 c0193b74

[ 2994.248858] 1e60: cf7f693c 00000000 c003dcc4 cf5be8f0 cf7dde00 cf7e7814 1983e43b 000002b9

[ 2994.257068] 1e80: 1983e43b 000002b9 c0755674 c005b480 cf5be600 cf7e78e4 80000013 cec00000

[ 2994.265279] 1ea0: cf5be600 c0452a5c 00000001 cf7e7894 c06ad33c c06556c8 cf7e78e4 c005b638

[ 2994.273489] 1ec0: cf7e7814 cf7e7800 cf7e7814 cf7e79f8 cec00028 cf7e7894 c06ad33c c06556c8

[ 2994.281698] 1ee0: 00000000 c01983c8 cf5be600 c0452a5c 00000001 00000000 cf5be600 c003dde8

[ 2994.289908] 1f00: cec01f00 cec01f00 60000013 cf45dc74 00000000 cf7e7800 c0198304 00000000

[ 2994.298118] 1f20: 00000000 00000000 00000000 c003d170 cf5be600 00000000 00000001 cf7e7800

[ 2994.306329] 1f40: 00000000 00000001 dead4ead ffffffff ffffffff c06ad1a0 00000000 00000000

[ 2994.314539] 1f60: c0550034 cec01f64 cec01f64 00000000 00000001 dead4ead ffffffff ffffffff

[ 2994.322748] 1f80: c06ad1a0 00000000 00000000 c0550034 cec01f90 cec01f90 cec01fac cf45dc74

[ 2994.330957] 1fa0: c003d0cc 00000000 00000000 c000ea80 00000000 00000000 00000000 00000000

[ 2994.339165] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

[ 2994.347373] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000

[ 2994.355611] [<c0046878>] (__wake_up+0x10/0x50) from [<c0193b74>] (journal_commit_transaction+0x288/0x15b8)

[ 2994.365324] [<c0193b74>] (journal_commit_transaction+0x288/0x15b8) from [<c01983c8>] (kjournald+0xc4/0x270)

[ 2994.375114] [<c01983c8>] (kjournald+0xc4/0x270) from [<c003d170>] (kthread+0xa4/0xb0)

[ 2994.383001] [<c003d170>] (kthread+0xa4/0xb0) from [<c000ea80>] (ret_from_fork+0x14/0x34)

[ 2994.391128] Code: e92d49f0 e28db018 e24dd00c e1a08003 (e1a04000)

[ 2994.397277] ---[ end trace 74a65aa4021594b9 ]---

[ 2999.651371] BUG: spinlock lockup suspected on CPU#0, syslogd/222

[ 2999.657533] lock: 0xcf7e7814, .magic: dead4ead, .owner: kjournald/48, .owner_cpu: 0

[ 2999.665326] CPU: 0 PID: 222 Comm: syslogd Tainted: G D 3.10.17-yocto-standard #13

[ 2999.673849] [<c00142a0>] (unwind_backtrace+0x0/0xe8) from [<c0012014>] (show_stack+0x10/0x14)

[ 2999.682545] [<c0012014>] (show_stack+0x10/0x14) from [<c025f7c4>] (do_raw_spin_lock+0xf4/0x13c)

[ 2999.691400] [<c025f7c4>] (do_raw_spin_lock+0xf4/0x13c) from [<c0191080>] (start_this_handle+0x50/0x3d8)

[ 2999.700948] [<c0191080>] (start_this_handle+0x50/0x3d8) from [<c01915f4>] (journal_start+0xa8/0xec)

[ 2999.710147] [<c01915f4>] (journal_start+0xa8/0xec) from [<c01338e0>] (ext3_dirty_inode+0x28/0x80)

[ 2999.719188] [<c01338e0>] (ext3_dirty_inode+0x28/0x80) from [<c00f1d08>] (__mark_inode_dirty+0x44/0x258)

[ 2999.728744] [<c00f1d08>] (__mark_inode_dirty+0x44/0x258) from [<c00e48d4>] (update_time+0x6c/0x9c)

[ 2999.737857] [<c00e48d4>] (update_time+0x6c/0x9c) from [<c00e4a0c>] (touch_atime+0x108/0x180)

[ 2999.746364] [<c00e4a0c>] (touch_atime+0x108/0x180) from [<c00d8688>] (link_path_walk+0x470/0x854)

[ 2999.755395] [<c00d8688>] (link_path_walk+0x470/0x854) from [<c00daba0>] (path_openat.isra.43+0x84/0x484)

[ 2999.765037] [<c00daba0>] (path_openat.isra.43+0x84/0x484) from [<c00dbbb8>] (do_filp_open+0x2c/0x80)

[ 2999.774330] [<c00dbbb8>] (do_filp_open+0x2c/0x80) from [<c00cd528>] (do_sys_open+0xe4/0x170)

[ 2999.782938] [<c00cd528>] (do_sys_open+0xe4/0x170) from [<c000e9c0>] (ret_fast_syscall+0x0/0x44)

[ 2999.796557] [sched_delayed] sched: RT throttling activated

This kernel runs on a custom board that is close to the i.MX28 EVK board; the only difference is that I'm using the Micron MT47H128M16 (256 MB DDR2).

I've made some modifications as suggested in this thread: How to put i.mx28 with DDR2 256MB ?

and I've also optimised my DDR2 parameters with the Freescale "MX28_DDR2_register_programming.xlsx" file.

I'm trying to trace this error back to its cause, but I don't know where to look. The only things I came up with are:

"RT throttling activated" means that real-time tasks are consuming too much CPU time.
kjournald is the kernel thread that handles ext3 journalling (sometimes I also lose files after a reboot).
syslogd is reported as "tainted" here, but the affected process varies (top, python, etc.).

So, if one of you has already seen this before, or can give me a lead to follow, that would be great!

Thanks a lot. Regards.

johndonnelly · ‎05-22-2014

Hi,

You do realize you might have some memory corruption going on? The backtrace shows an illegal opcode (undefined instruction):

[ 2994.146905] Internal error: Oops - undefined instruction: 0 [#1] ARM

PC is at __wake_up+0x10/0x50

void __wake_up(wait_queue_head_t *q, unsigned int mode,
               int nr_exclusive, void *key)
{
        unsigned long flags;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, 0, key);
        spin_unlock_irqrestore(&q->lock, flags);
}
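One way to check the "bad instruction" reading is to decode the word the kernel printed in parentheses on the "Code:" line, which is the instruction found at PC when the dump was taken. The sketch below is only an illustration: `decode_mov_reg` is a hypothetical helper, not a kernel function, and it recognises just the plain A32 `MOV Rd, Rm` register form. For `e1a04000` it reports `mov r4, r0`, which is a valid instruction. That would suggest the CPU fetched something different from what memory held at dump time, which fits unstable RAM better than a software bug.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: recognise the A32 "MOV Rd, Rm" register form
 * (no shift). Returns 1 and fills rd/rm on a match, 0 otherwise.
 */
static int decode_mov_reg(uint32_t insn, unsigned *rd, unsigned *rm)
{
    assert(rd != 0 && rm != 0);

    /* Condition field 0b1111 is the unconditional space, not MOV. */
    if ((insn >> 28) == 0xfu)
        return 0;

    /*
     * bits 27..21 = 0b0001101 : data-processing, register operand, MOV
     * bits 19..16 = 0         : Rn is unused by MOV
     * bits 11..4  = 0         : operand is Rm with LSL #0
     * bit 20 (S) is ignored so MOVS matches too.
     */
    if ((insn & 0x0fef0ff0u) != 0x01a00000u)
        return 0;

    *rd = (insn >> 12) & 0xfu;
    *rm = insn & 0xfu;
    return 1;
}
```

Calling `decode_mov_reg(0xe1a04000, &rd, &rm)` yields `rd = 4`, `rm = 0`. The preceding word in the dump, `e1a08003`, decodes the same way to `mov r8, r3`.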

fabio_estevam · ‎05-22-2014

Yes, it looks like RAM is not properly configured.

Also, I would expect that the kernel crash log would vary each time, right?

julienmorand · ‎05-23-2014

Hi guys,

John, yes I do. I just don't know how to fix it.

Fabio, yes, the log never looks the same.

Also, here is what I've done in the spl_mem_init.c file in U-Boot:

static uint32_t dram_vals[] = {

#if defined(CONFIG_MX28)

0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000100, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00010101, 0x01010101,

0x000f0f01, 0x0102010a, 0x00000000, 0x00010101,

0x00000100, 0x00000100, 0x00000000, 0x00000002,

0x01010000, 0x07080403, 0x06005003, 0x090000c8,

0x02009c40, 0x0002030b, 0x0036b009, 0x03270612,

0x02030202, 0x00c80029, 0x00000000, 0x00000000,

0x00012100, 0xffff0303, 0x00012100, 0xffff0303,

0x00000003, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000612, 0x01000F02,

0x06120612, 0x00000200, 0x00020007, 0xf4004a27,

0xf4004a27, 0xf4004a27, 0xf4004a27, 0x07000300,

0x07000300, 0x07400300, 0x07400300, 0x00000005,

0x00000000, 0x00000000, 0x01000000, 0x01020408,

0x08040201, 0x000f1133, 0x00000000, 0x00001f04,

0x00001f04, 0x00001f04, 0x00001f04, 0x00001f04,

0x00001f04, 0x00001f04, 0x00001f04, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00010000, 0x00030404,

0x00000003, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x01010000,

0x01000000, 0x03030000, 0x00010303, 0x01020202,

0x00000000, 0x02030303, 0x21002103, 0x00061200,

0x06120612, 0x04420442, 0x04420442, 0x00040004,

0x00040004, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0xffffffff

#endif

};

These values come from the register programming file mentioned in my previous message.

Thank you.
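When a table like this is edited by hand, it can help to diff it against a known-good one (for example the stock EVK table) and review only the entries that changed. The following is a minimal sketch, not part of U-Boot: `diff_dram_tables` is a hypothetical helper, and the printed offset assumes that entry i of the table is written to the i-th 32-bit controller register, i.e. byte offset 4 * i from the controller base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print every entry where two DRAM init tables
 * differ and return how many entries changed.
 * Assumption: entry i maps to register offset 4 * i.
 */
static size_t diff_dram_tables(const uint32_t *ref, const uint32_t *mine,
                               size_t n)
{
    size_t changed = 0;

    assert(ref != NULL && mine != NULL);

    for (size_t i = 0; i < n; i++) {
        if (ref[i] != mine[i]) {
            printf("entry %3zu (offset 0x%03zx): ref 0x%08x  mine 0x%08x\n",
                   i, i * 4, (unsigned)ref[i], (unsigned)mine[i]);
            changed++;
        }
    }
    return changed;
}
```

Each reported entry can then be checked against the corresponding cell of the register programming spreadsheet.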

igorpadykov · ‎05-23-2014

Hi Julien

One can start by running DDR tests and finding new calibration settings:

https://community.freescale.com/message/331721#331721

https://community.freescale.com/docs/DOC-96412

Best regards

chip

julienmorand · ‎05-27-2014

Hi chipexpert!

I wasn't aware of the DDR stress test; it looks like an interesting tool.

I've already tried the "mem test" from this post, https://community.freescale.com/message/375046#375046,

on my imx28EVK, and I'm working on running it on my custom board right now.

Also, do you guys have some DDR stress test files? Or do I need to contact my FAE?

Thanks a lot for your help.
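For readers without access to the vendor tools, the general shape of such a memory test can be sketched as follows. This is an assumption-laden illustration, not the test from the linked post: `test_data_bus` and `test_region` are hypothetical names, and on real hardware the code would have to run from the boot loader against the physical DDR range with the data cache disabled, otherwise the reads may be served from cache rather than from RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walking-ones test on a single location.
 * Returns the first pattern that reads back wrong, or 0 if all pass.
 */
static uint32_t test_data_bus(volatile uint32_t *addr)
{
    assert(addr != NULL);

    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/*
 * Fill the region with an index-dependent pattern, verify it, then
 * repeat with the bitwise inverse so every bit is tested as 0 and 1.
 * Returns the index of the first bad word, or nwords if all pass.
 */
static size_t test_region(volatile uint32_t *base, size_t nwords)
{
    const uint32_t seeds[2] = { 0xa5a5a5a5u, 0x5a5a5a5au };

    assert(base != NULL);

    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < nwords; i++)
            base[i] = (uint32_t)i ^ seeds[pass];
        for (size_t i = 0; i < nwords; i++)
            if (base[i] != ((uint32_t)i ^ seeds[pass]))
                return i;
    }
    return nwords;
}
```

Because the crashes reported here are intermittent, a single clean pass proves little; such a test is usually looped for a long time, ideally while the board is warm.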

igorpadykov · ‎05-27-2014

Hi Julien

Sorry, I missed that you are using the i.MX28 and confused it with the i.MX6 processor.

In general, you can try a more mature release (just as a test):

L2.6.35_1.1.0_ER_SOURCE

Also, what power source are you using: 5V only, or a battery?

This may be important, since the 2.6.35 kernel has patches (listed further down on that web page) for some issues.

It may be useful to check whether the power (if it is provided by the i.MX28) is sufficient

for the DDR2. One can test this by reducing the DDR2 operating frequency.

Best regards

chip
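To get a feel for which DDR2 frequencies are reachable, the EMI clock arithmetic can be sketched as below. This assumes the usual i.MX28 clock tree, where ref_emi is the 480 MHz PLL output multiplied by 18 and divided by a fractional divider in the range 18..35, and the EMI clock is ref_emi divided by an integer divider. `emi_clock_khz` is a hypothetical helper, and the divider values in the usage note are illustrative; they should be checked against the reference manual and the boot loader actually in use.

```c
#include <assert.h>

/* Assumed PLL0 output frequency in kHz. */
#define PLL0_KHZ 480000u

/*
 * Assumed relation: ref_emi = PLL0 * 18 / frac, EMI = ref_emi / div.
 * Returns the EMI clock in kHz, or 0 if frac is outside 18..35.
 */
static unsigned emi_clock_khz(unsigned frac, unsigned div)
{
    assert(div != 0);

    if (frac < 18 || frac > 35)
        return 0;

    return PLL0_KHZ * 18u / frac / div;
}
```

With frac = 21 and div = 2 this gives about 205.7 MHz, and with frac = 27 and div = 2 it gives 160 MHz. Note that DRAM controller timing values expressed in clock cycles generally have to be recalculated for the new frequency.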

julienmorand · ‎06-04-2014

Hi chipexpert,

In fact, the Linux kernel 2.6.35 was the first kernel I used. It boots up fine (so my system is passing the memory test at boot).

I'm using a 5V-only power source, and I have designed my board with an external regulator for the DDR2, so the power should be sufficient.

Maybe I can try to solder a 128MB RAM chip to see if it's really a matter of RAM configuration.

Thank you, have a nice day.

zaheerm · ‎07-02-2014

We are facing similar random kernel crashes on our custom i.MX6D platform running 3.10.17. The board works fine with the 3.10.9 kernel. Any pointers to identify and resolve the crash would be appreciated.

igorpadykov · ‎07-02-2014

Hi Julien

Please try updating the calibration:

https://community.freescale.com/message/331721#331721

https://community.freescale.com/docs/DOC-96412

If the issue persists, please create a new thread, since this is a different

processor from the original topic.

Best regards

chip
