Linux Kernel 3.10 memory issue

julienmorand
Contributor II

Hi all,

I'm facing an issue with my Linux kernel 3.10.17 (built with Yocto). I'm using U-Boot 2013.10.

The system boots up great and fast but sometimes this error message is printed on my console:

[ 2994.146905] Internal error: Oops - undefined instruction: 0 [#1] ARM
[ 2994.153310] Modules linked in:
[ 2994.156417] CPU: 0 PID: 48 Comm: kjournald Not tainted 3.10.17-yocto-standard #13
[ 2994.163928] task: cf5be600 ti: cec00000 task.ti: cec00000
[ 2994.169372] PC is at __wake_up+0x10/0x50
[ 2994.173330] LR is at journal_commit_transaction+0x288/0x15b8
[ 2994.179016] pc : [<c0046878>]    lr : [<c0193b74>]    psr: 60000013
[ 2994.179016] sp : cec01e38  ip : 0000001a  fp : cec01e5c
[ 2994.190514] r10: 00000000  r9 : c06ad33c  r8 : 00000000
[ 2994.195759] r7 : 000002b9  r6 : 1983e43b  r5 : cf7e79a4  r4 : cf7f6900
[ 2994.202303] r3 : 00000000  r2 : 00000001  r1 : 00000003  r0 : cf7e7844
[ 2994.208849] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[ 2994.216178] Control: 0005317f  Table: 4ec34000  DAC: 00000017
[ 2994.221941] Process kjournald (pid: 48, stack limit = 0xcec001b8)
[ 2994.228053] Stack: (0xcec01e38 to 0xcec02000)
[ 2994.232436] 1e20:                                                       cf7e79a4 cf7f6900
[ 2994.240647] 1e40: cf7e79a4 cf7f6900 cf7e79a4 1983e43b 000002b9 cf7e7894 cf7e7800 c0193b74
[ 2994.248858] 1e60: cf7f693c 00000000 c003dcc4 cf5be8f0 cf7dde00 cf7e7814 1983e43b 000002b9
[ 2994.257068] 1e80: 1983e43b 000002b9 c0755674 c005b480 cf5be600 cf7e78e4 80000013 cec00000
[ 2994.265279] 1ea0: cf5be600 c0452a5c 00000001 cf7e7894 c06ad33c c06556c8 cf7e78e4 c005b638
[ 2994.273489] 1ec0: cf7e7814 cf7e7800 cf7e7814 cf7e79f8 cec00028 cf7e7894 c06ad33c c06556c8
[ 2994.281698] 1ee0: 00000000 c01983c8 cf5be600 c0452a5c 00000001 00000000 cf5be600 c003dde8
[ 2994.289908] 1f00: cec01f00 cec01f00 60000013 cf45dc74 00000000 cf7e7800 c0198304 00000000
[ 2994.298118] 1f20: 00000000 00000000 00000000 c003d170 cf5be600 00000000 00000001 cf7e7800
[ 2994.306329] 1f40: 00000000 00000001 dead4ead ffffffff ffffffff c06ad1a0 00000000 00000000
[ 2994.314539] 1f60: c0550034 cec01f64 cec01f64 00000000 00000001 dead4ead ffffffff ffffffff
[ 2994.322748] 1f80: c06ad1a0 00000000 00000000 c0550034 cec01f90 cec01f90 cec01fac cf45dc74
[ 2994.330957] 1fa0: c003d0cc 00000000 00000000 c000ea80 00000000 00000000 00000000 00000000
[ 2994.339165] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2994.347373] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 2994.355611] [<c0046878>] (__wake_up+0x10/0x50) from [<c0193b74>] (journal_commit_transaction+0x288/0x15b8)
[ 2994.365324] [<c0193b74>] (journal_commit_transaction+0x288/0x15b8) from [<c01983c8>] (kjournald+0xc4/0x270)
[ 2994.375114] [<c01983c8>] (kjournald+0xc4/0x270) from [<c003d170>] (kthread+0xa4/0xb0)
[ 2994.383001] [<c003d170>] (kthread+0xa4/0xb0) from [<c000ea80>] (ret_from_fork+0x14/0x34)
[ 2994.391128] Code: e92d49f0 e28db018 e24dd00c e1a08003 (e1a04000)
[ 2994.397277] ---[ end trace 74a65aa4021594b9 ]---
[ 2999.651371] BUG: spinlock lockup suspected on CPU#0, syslogd/222
[ 2999.657533]  lock: 0xcf7e7814, .magic: dead4ead, .owner: kjournald/48, .owner_cpu: 0
[ 2999.665326] CPU: 0 PID: 222 Comm: syslogd Tainted: G      D      3.10.17-yocto-standard #13
[ 2999.673849] [<c00142a0>] (unwind_backtrace+0x0/0xe8) from [<c0012014>] (show_stack+0x10/0x14)
[ 2999.682545] [<c0012014>] (show_stack+0x10/0x14) from [<c025f7c4>] (do_raw_spin_lock+0xf4/0x13c)
[ 2999.691400] [<c025f7c4>] (do_raw_spin_lock+0xf4/0x13c) from [<c0191080>] (start_this_handle+0x50/0x3d8)
[ 2999.700948] [<c0191080>] (start_this_handle+0x50/0x3d8) from [<c01915f4>] (journal_start+0xa8/0xec)
[ 2999.710147] [<c01915f4>] (journal_start+0xa8/0xec) from [<c01338e0>] (ext3_dirty_inode+0x28/0x80)
[ 2999.719188] [<c01338e0>] (ext3_dirty_inode+0x28/0x80) from [<c00f1d08>] (__mark_inode_dirty+0x44/0x258)
[ 2999.728744] [<c00f1d08>] (__mark_inode_dirty+0x44/0x258) from [<c00e48d4>] (update_time+0x6c/0x9c)
[ 2999.737857] [<c00e48d4>] (update_time+0x6c/0x9c) from [<c00e4a0c>] (touch_atime+0x108/0x180)
[ 2999.746364] [<c00e4a0c>] (touch_atime+0x108/0x180) from [<c00d8688>] (link_path_walk+0x470/0x854)
[ 2999.755395] [<c00d8688>] (link_path_walk+0x470/0x854) from [<c00daba0>] (path_openat.isra.43+0x84/0x484)
[ 2999.765037] [<c00daba0>] (path_openat.isra.43+0x84/0x484) from [<c00dbbb8>] (do_filp_open+0x2c/0x80)
[ 2999.774330] [<c00dbbb8>] (do_filp_open+0x2c/0x80) from [<c00cd528>] (do_sys_open+0xe4/0x170)
[ 2999.782938] [<c00cd528>] (do_sys_open+0xe4/0x170) from [<c000e9c0>] (ret_fast_syscall+0x0/0x44)
[ 2999.796557] [sched_delayed] sched: RT throttling activated

This kernel runs on a custom board that is close to the imx28EVK board; the only difference is that I'm using the Micron MT47H128M16 (256 MB DDR2).

I've made some modifications as suggested in this thread: How to put i.mx28 with DDR2 256MB ?

and I've also optimised my DDR2 parameters with the Freescale "MX28_DDR2_register_programming.xlsx" file.

I'm trying to backtrace this error but I don't know where to look. The only things I came up with are:

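  • "RT throttling activated" means that real-time tasks used up their whole CPU time budget for a scheduling period, so something is hogging the CPU.
  • kjournald is the kernel thread that commits the ext3 journal (sometimes I also lose files after a reboot).
  • The second trace is marked "Tainted" in syslogd here, but the process is not always the same (top, python, etc.).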
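
For reference, "sched: RT throttling activated" is printed when real-time tasks exhaust their runtime budget for one scheduling period. The kernel exposes that budget as /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us (950000 and 1000000 by default). Since the message appears here only after the spinlock lockup, it is probably a symptom rather than the cause. A minimal sketch for checking the budget on the target follows; the helper names read_long_file and rt_budget_percent are made up for illustration.

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file; returns 0 on success. */
int read_long_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Percentage of each period that real-time tasks may consume.
 * A negative runtime means throttling is disabled, reported as 100.
 * Returns -1 if the period is not a positive number. */
int rt_budget_percent(long runtime_us, long period_us)
{
    if (runtime_us < 0)
        return 100;
    if (period_us <= 0)
        return -1;
    return (int)(((long long)runtime_us * 100) / period_us);
}
```

Calling read_long_file on the two /proc paths above and passing the results to rt_budget_percent should report 95 on a kernel that still uses the default settings.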

So, if any of you has already seen this before, or can give me a lead to follow, that would be great!

Thanks a lot. Regards.


johndonnelly
Contributor I

Hi,

You do realize you might have some memory corruption going on? The backtrace shows an illegal opcode (bad instruction):

[ 2994.146905] Internal error: Oops - undefined instruction: 0 [#1] ARM


PC is at __wake_up+0x10/0x50


void __wake_up(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
        unsigned long flags;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, 0, key);
        spin_unlock_irqrestore(&q->lock, flags);
}
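
One way to back up the corruption theory is to look at the "Code:" line of the oops, where the word in parentheses is the instruction at the faulting PC. In the log above that word is e1a04000, which decodes to "mov r4, r0", a perfectly valid ARM instruction. An undefined-instruction exception on a valid opcode suggests that the core fetched something other than what the dump later read back, which fits unreliable RAM better than a kernel bug. The sketch below pulls that word out of a log line; the function names are hypothetical and the decoder only recognises a plain register-to-register mov.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extract the parenthesised hex word from an oops "Code:" line.
 * Returns 0 on success, -1 if no "(xxxxxxxx)" word is found. */
int oops_faulting_word(const char *line, uint32_t *word)
{
    const char *open = strchr(line, '(');
    char *end;
    unsigned long v;

    if (open == NULL)
        return -1;
    v = strtoul(open + 1, &end, 16);
    if (end == open + 1 || *end != ')')
        return -1;
    *word = (uint32_t)v;
    return 0;
}

/* True if the word encodes "mov Rd, Rm" with any normal condition code,
 * no shift and no flag update. Rd is bits 15:12, Rm is bits 3:0. */
int arm_is_plain_mov(uint32_t word)
{
    return (word >> 28) != 0xfu &&
           (word & 0x0fff0ff0u) == 0x01a00000u;
}
```

For the line from this thread, oops_faulting_word yields 0xe1a04000 and arm_is_plain_mov returns true, with Rd = 4 and Rm = 0.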

fabio_estevam
NXP Employee

Yes, it looks like RAM is not properly configured.

Also, I would expect that the kernel crash log would vary each time, right?

julienmorand
Contributor II

Hi guys,

John, yes I do. I just don't know how to fix it.

Fabio, yes, the log never looks the same.

Also, here is what I've done in the spl_mem_init.c file in U-Boot:

static uint32_t dram_vals[] = {
#if defined(CONFIG_MX28)
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000100, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00010101, 0x01010101,
  0x000f0f01, 0x0102010a, 0x00000000, 0x00010101,
  0x00000100, 0x00000100, 0x00000000, 0x00000002,
  0x01010000, 0x07080403, 0x06005003, 0x090000c8,
  0x02009c40, 0x0002030b, 0x0036b009, 0x03270612,
  0x02030202, 0x00c80029, 0x00000000, 0x00000000,
  0x00012100, 0xffff0303, 0x00012100, 0xffff0303,
  0x00012100, 0xffff0303, 0x00012100, 0xffff0303,
  0x00000003, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000612, 0x01000F02,
  0x06120612, 0x00000200, 0x00020007, 0xf4004a27,
  0xf4004a27, 0xf4004a27, 0xf4004a27, 0x07000300,
  0x07000300, 0x07400300, 0x07400300, 0x00000005,
  0x00000000, 0x00000000, 0x01000000, 0x01020408,
  0x08040201, 0x000f1133, 0x00000000, 0x00001f04,
  0x00001f04, 0x00001f04, 0x00001f04, 0x00001f04,
  0x00001f04, 0x00001f04, 0x00001f04, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00010000, 0x00030404,
  0x00000003, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0x00000000, 0x00000000, 0x01010000,
  0x01000000, 0x03030000, 0x00010303, 0x01020202,
  0x00000000, 0x02030303, 0x21002103, 0x00061200,
  0x06120612, 0x04420442, 0x04420442, 0x00040004,
  0x00040004, 0x00000000, 0x00000000, 0x00000000,
  0x00000000, 0xffffffff

according to the register programming file as stated in my previous message.
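
The table above holds 190 words. Assuming, as in the stock U-Boot SPL, that entry i is written to DRAM controller register i, a quick way to review a change like this is to diff the modified table against the unmodified one and check each differing register against the spreadsheet. A minimal sketch of such a diff follows; dram_table_diff is a hypothetical helper, not part of U-Boot.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print every index where two register tables differ and return
 * how many entries changed. Both tables must hold n words. */
size_t dram_table_diff(const uint32_t *ref, const uint32_t *mine, size_t n)
{
    size_t i, changed = 0;

    for (i = 0; i < n; i++) {
        if (ref[i] != mine[i]) {
            printf("reg %3zu: 0x%08x -> 0x%08x\n", i,
                   (unsigned)ref[i], (unsigned)mine[i]);
            changed++;
        }
    }
    return changed;
}
```

Run on the host with both tables pasted in, this gives a short list of registers to double-check, which is easier to review than two full 190-entry dumps.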

Thank you.

igorpadykov
NXP Employee

Hi Julien

You can start by running DDR tests and finding new calibration settings:

https://community.freescale.com/message/331721#331721

https://community.freescale.com/docs/DOC-96412

Best regards

chip

julienmorand
Contributor II

Hi chipexpert!

I wasn't aware of the DDR stress test; it looks like an interesting tool.

I've already tried the "mem test" from this post (https://community.freescale.com/message/375046#375046) on my imx28EVK, and I'm working on running it on my custom board right now.
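
For readers who do not have a vendor stress tool at hand, two classic patterns are easy to write by hand: storing each word's own index (catches addressing faults) and walking a single 1 bit across a word (catches stuck or shorted data lines). The sketch below is a generic illustration, not the test from the linked post, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Write each word's own index (optionally inverted) into the buffer,
 * read everything back, and return the number of mismatching words. */
size_t memtest_index_pattern(volatile uint32_t *buf, size_t words, int invert)
{
    size_t i, errors = 0;

    for (i = 0; i < words; i++)
        buf[i] = invert ? ~(uint32_t)i : (uint32_t)i;

    for (i = 0; i < words; i++) {
        uint32_t expect = invert ? ~(uint32_t)i : (uint32_t)i;

        if (buf[i] != expect)
            errors++;
    }
    return errors;
}

/* Walk a single 1 bit across one word.
 * Returns a bitmask of the data bits that did not read back correctly. */
uint32_t memtest_walking_ones(volatile uint32_t *cell)
{
    uint32_t bit, bad = 0;

    for (bit = 1; bit != 0; bit <<= 1) {
        *cell = bit;
        if (*cell != bit)
            bad |= bit;
    }
    return bad;
}
```

To exercise the DDR2 rather than the CPU cache, the region should be much larger than the data cache or the test should run with the cache disabled (for example from the bootloader). A clean pass is also no proof of stability: marginal timing often shows up only under sustained load or at temperature extremes, so repeated runs are worthwhile.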

Also, do you guys have any DDR stress test files? Or do I need to contact my FAE?

Thanks a lot for your help.

igorpadykov
NXP Employee

Hi Julien

Sorry, I missed that you are using the i.MX28 and confused it with the i.MX6 processor.

In general, you can try a more mature release (just as a test):

L2.6.35_1.1.0_ER_SOURCE

Also, what power source are you using: 5V only or battery?

This may be important since the 2.6.35 kernel has patches (below that web page) for some issues.

It may be useful to check whether the power (if it is provided by the i.MX28) is sufficient for the DDR2. You can test this by reducing the DDR2 operating frequency.
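
As a rough guide for picking a lower frequency, the sketch below computes the EMI (DDR2) clock from the two divider fields. It assumes the usual i.MX28 clock tree, where ref_emi = 480 MHz * 18 / EMIFRAC and the EMI clock is ref_emi / DIV_EMI, with EMIFRAC limited to 18..35; please verify this against the reference manual before relying on it. The function name is hypothetical.

```c
#include <stdint.h>

/* EMI clock in kHz for a given EMIFRAC / DIV_EMI pair, assuming
 * ref_emi = 480 MHz * 18 / EMIFRAC and EMI = ref_emi / DIV_EMI.
 * Returns 0 for values outside the assumed valid range. */
uint32_t mx28_emi_khz(uint32_t emifrac, uint32_t div_emi)
{
    if (emifrac < 18 || emifrac > 35 || div_emi == 0)
        return 0;
    return (480000u * 18u) / (emifrac * div_emi);
}
```

Under these assumptions, EMIFRAC = 21 with DIV_EMI = 2 gives about 205.7 MHz, and EMIFRAC = 33 with DIV_EMI = 2 gives about 130.9 MHz. Timing fields in the DRAM controller table that are expressed in clock cycles would need to be recalculated for the new frequency.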

Best regards

chip

julienmorand
Contributor II

Hi chipexpert,

In fact, the Linux kernel 2.6.35 was the first kernel I used. It boots up great (so my system is passing the memory test at boot).

I'm using a 5V-only power source, and I have designed my board with an external regulator for the DDR2 so that the power is sufficient.

Maybe I can try to solder a 128MB RAM chip to see if it's really a matter of RAM configuration.

Thank you, have a nice day.

zaheerm
Contributor I

We are facing similar random kernel crashes on our custom platform running an iMX6D with 3.10.17. The board works fine with the 3.10.9 kernel. Any pointers to identify and resolve the crash would be appreciated.

igorpadykov
NXP Employee

Hi Julien

Please try updating the calibration:

https://community.freescale.com/message/331721#331721

https://community.freescale.com/docs/DOC-96412

If the issue persists, please create a new thread, since this is a different processor from the original topic.

Best regards

chip
