How to debug Oops: Kernel access of bad area, sig: 11 [#1] issue

Contributor III

Hi,

We get the kernel crash below when running "insmod" on a kernel module built for the T1040 processor with a 64-bit toolchain.

root@t1040rdb:/media/ram# insmod linux-kernel-bde.ko
linux_kernel_bde: module license 'Proprietary' taints kernel.
Disabling lock debugging due to kernel taint
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0x80000000001a0758
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=24 CoreNet Generic
Modules linked in: linux_kernel_bde(PO+)
CPU: 0 PID: 2700 Comm: insmod Tainted: P           O 3.12.19-rt30-QorIQ-SDK-V1.6+gc29fe1a #3
task: c000000006937400 ti: c00000000657c000 task.ti: c00000000657c000
NIP: 80000000001a0758 LR: 800000000019b274 CTR: c00000000036ddb4
REGS: c00000000657f7a0 TRAP: 0300   Tainted: P           O  (3.12.19-rt30-QorIQ-SDK-V1.6+gc29fe1a)
MSR: 0000000080029000 <CE,EE,ME>  CR: 44000444  XER: 20000000
SOFTE: 1
DEAR: 0000000000000000, ESR: 0000000000000000

GPR00: 800000000019b268 c00000000657fa20 80000000001a8ad0 000000000000002a
GPR04: 0000000044000444 000000000000000d 0000000000000008 0000000000000008
GPR08: 0000000000000000 0000000000000001 00000001a66b1cbc 0000000000000000
GPR12: 0000000024000442 c00000000fff4000 80000000001a7fc8 0000000000000154
GPR16: 0000000000000018 c000000000b7c518 0000000000000000 0000000000000124
GPR20: c000000000afc210 c00000000657fdc0 0000000000000001 80000000001a0b50
GPR24: c0000000069df1c0 c0000000007d9648 0000000000000001 c000000000b3e980
GPR28: 800000000019d118 80000000001a2068 ffffffffffffffed 800000000019c318
NIP [80000000001a0758] gmodule_get+0x0/0xffffffffffffbad8 [linux_kernel_bde]
LR [800000000019b274] ____versions+0x169ac/0x17968 [linux_kernel_bde]
Call Trace:
[c00000000657fa20] [800000000019b268] ____versions+0x169a0/0x17968 [linux_kernel_bde] (unreliable)
[c00000000657fab0] [c00000000000184c] .do_one_initcall+0x14c/0x1a0
[c00000000657fba0] [c0000000000acfc8] .load_module+0x1ea4/0x2394
[c00000000657fd40] [c0000000000ad564] .SyS_init_module+0xac/0xec
[c00000000657fe30] [c000000000000598] syscall_exit+0x0/0x8c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 75186f9f417e1c86 ]---

Segmentation fault
root@t1040rdb:/media/ram#

However, when the same module is compiled for a 32-bit kernel with a 32-bit toolchain, it works perfectly. Is some configuration missing? Can someone suggest how to debug this further?

Regards,

Chandra Shekhar

12 Replies

NXP Employee

Your module crashed in what appears to be the first instruction of the gmodule_get function (which is oddly reported as having a negative size).  Use gdb or objdump to see what that instruction is, and try to figure out what's going wrong.

Contributor III

Hi Scott,

I took an objdump of the module to investigate the failure, but could not work out where the negative size comes from. Below are the code and objdump snippets:

/* code for module insertion =====> */

int __init
init_module(void)
{
    int rc;

    printk("chanmish[%s,%d]!!!!\n", __FUNCTION__,__LINE__);
    printk("chanmish[%s,%d] %p\n", __FUNCTION__,__LINE__,gmodule_get);

    /* Get our definition */
    _gmodule = gmodule_get();
    if(!_gmodule) return -ENODEV;

/* objdump for above section =====> */

0000000000000000 <.init_module>:
   0: 7c 08 02 a6  mflr    r0
   4: fb e1 ff f8  std     r31,-8(r1)
   8: 3f e2 00 00  addis   r31,r2,0
   c: f8 01 00 10  std     r0,16(r1)
  10: fb 81 ff e0  std     r28,-32(r1)
  14: 3b ff 00 00  addi    r31,r31,0
  18: fb a1 ff e8  std     r29,-24(r1)
  1c: 3f 82 00 40  addis   r28,r2,64
  20: fb c1 ff f0  std     r30,-16(r1)
  24: 3b ff 00 30  addi    r31,r31,48
  28: f8 21 ff 71  stdu    r1,-144(r1)
  2c: 3b 9c 00 40  addi    r28,r28,64
  30: 7f e4 fb 78  mr      r4,r31
  34: 38 a0 01 8f  li      r5,399
  38: 7f 83 e3 78  mr      r3,r28
  3c: 3f a2 00 00  addis   r29,r2,0
  40: 48 00 00 01  bl      40 <.init_module+0x40>
  44: 60 00 00 00  nop
  48: 3d 22 00 00  addis   r9,r2,0
  4c: e8 c9 00 00  ld      r6,0(r9)
  50: 3c 62 00 58  addis   r3,r2,88
  54: 7f e4 fb 78  mr      r4,r31
  58: 38 a0 01 90  li      r5,400
  5c: 38 63 00 58  addi    r3,r3,88
  60: 3b bd 00 00  addi    r29,r29,0
  64: 48 00 00 01  bl      64 <.init_module+0x64>
  68: 60 00 00 00  nop
  6c: 3b c0 ff ed  li      r30,-19
  70: 48 00 00 01  bl      70 <.init_module+0x70>
  74: 60 00 00 00  nop
  78: 2f a3 00 00  cmpdi   cr7,r3,0
  7c: f8 7d 00 08  std     r3,8(r29)
  80: 41 fe 01 10  beq+    cr7,190 <.init_module+0x190>
  84: 7f e4 fb 78  mr      r4,r31
  88: 38 a0 01 95  li      r5,405
  8c: 7f 83 e3 78  mr      r3,r28
  90: 48 00 00 01  bl      90 <.init_module+0x90>

/* code for gmodule_get =====> */

gmodule_t *
gmodule_get(void)
{
    printk(KERN_ERR "chanmish[%s,%d]\n",__FUNCTION__,__LINE__);
    _gmodule.name = _modname;
    return &_gmodule;
}

/* objdump for above section =====> */

0000000000004880 <.gmodule_get>:
    4880: 7c 08 02 a6  mflr    r0
    4884: 3c 82 00 00  addis   r4,r2,0
    4888: f8 01 00 10  std     r0,16(r1)
    488c: 38 84 00 00  addi    r4,r4,0
    4890: f8 21 ff 91  stdu    r1,-112(r1)
    4894: 3c 62 09 60  addis   r3,r2,2400
    4898: 38 84 00 20  addi    r4,r4,32
    489c: 38 a0 0e ad  li      r5,3757
    48a0: 38 63 09 60  addi    r3,r3,2400
    48a4: 48 00 00 01  bl      48a4 <.gmodule_get+0x24>
    48a8: 60 00 00 00  nop
    48ac: 38 21 00 70  addi    r1,r1,112
    48b0: e8 01 00 10  ld      r0,16(r1)
    48b4: 3c 62 00 00  addis   r3,r2,0
    48b8: 38 63 00 00  addi    r3,r3,0
    48bc: 38 63 01 e8  addi    r3,r3,488
    48c0: 7c 08 03 a6  mtlr    r0
    48c4: 4e 80 00 20  blr
    48c8: 00 00 00 00  .long 0x0
    48cc: 00 00 00 01  .long 0x1
    48d0: 80 00 00 00  lwz     r0,0(0)
    48d4: 60 00 00 00  nop
    48d8: 60 00 00 00  nop
    48dc: 60 00 00 00  nop

Please suggest what else I should try to resolve this issue.

Regards,

Chandra Shekhar

NXP Employee

Is this the exact code that corresponds to the crash dump you posted?  You don't get any output from the printk statements?

It's strange that the instruction dump was all XXXXXXXX.

How did you build the module?  You used the same headers and config as the running kernel?

I tried building a simple out-of-tree module with SDK 1.6, and did not have this problem.

Contributor III

Hi Scott,

Below is the correct dump for the code/objdump snippet I shared.  I am getting printk messages from "init_module" but not from "gmodule_get".

root@t1040rdb:/media/ram# insmod linux-kernel-bde.ko
linux_kernel_bde: module license 'Proprietary' taints kernel.
Disabling lock debugging due to kernel taint
chanmish[init_module,399]!!!!
chanmish[init_module,400] 80000000001a0758
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0x80000000001a0758
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=24 CoreNet Generic
Modules linked in: linux_kernel_bde(PO+)
CPU: 0 PID: 2700 Comm: insmod Tainted: P           O 3.12.19-rt30-QorIQ-SDK-V1.6+gc29fe1a #3
task: c000000006937400 ti: c00000000657c000 task.ti: c00000000657c000
NIP: 80000000001a0758 LR: 800000000019b274 CTR: c00000000036ddb4
REGS: c00000000657f7a0 TRAP: 0300   Tainted: P           O  (3.12.19-rt30-QorIQ-SDK-V1.6+gc29fe1a)
MSR: 0000000080029000 <CE,EE,ME>  CR: 44000444  XER: 20000000
SOFTE: 1
DEAR: 0000000000000000, ESR: 0000000000000000

GPR00: 800000000019b268 c00000000657fa20 80000000001a8ad0 000000000000002a
GPR04: 0000000044000444 000000000000000d 0000000000000008 0000000000000008
GPR08: 0000000000000000 0000000000000001 00000001a66b1cbc 0000000000000000
GPR12: 0000000024000442 c00000000fff4000 80000000001a7fc8 0000000000000154
GPR16: 0000000000000018 c000000000b7c518 0000000000000000 0000000000000124
GPR20: c000000000afc210 c00000000657fdc0 0000000000000001 80000000001a0b50
GPR24: c0000000069df1c0 c0000000007d9648 0000000000000001 c000000000b3e980
GPR28: 800000000019d118 80000000001a2068 ffffffffffffffed 800000000019c318
NIP [80000000001a0758] gmodule_get+0x0/0xffffffffffffbad8 [linux_kernel_bde]
LR [800000000019b274] ____versions+0x169ac/0x17968 [linux_kernel_bde]
Call Trace:
[c00000000657fa20] [800000000019b268] ____versions+0x169a0/0x17968 [linux_kernel_bde] (unreliable)
[c00000000657fab0] [c00000000000184c] .do_one_initcall+0x14c/0x1a0
[c00000000657fba0] [c0000000000acfc8] .load_module+0x1ea4/0x2394
[c00000000657fd40] [c0000000000ad564] .SyS_init_module+0xac/0xec
[c00000000657fe30] [c000000000000598] syscall_exit+0x0/0x8c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 75186f9f417e1c86 ]---

Segmentation fault
root@t1040rdb:/media/ram#

Also, the kernel module (linux-kernel-bde) is built with the same toolchain and header files that are used for the Linux kernel (uImage) build. However, the filesystem is built with bitbake.

Regards,

Chandra Shekhar

Contributor III

Hi Scott,

One more thing I noticed: the objdump generated for the .ko module shows an "out of bounds" message in a few places. Could this be the reason for the crash? I have no other clue on how to proceed.

00000000000003f1 <__UNIQUE_ID_vermagic0>:
 3f1: 76 65 72 6d  andis.  r5,r19,29293
 3f5: 61 67 69 63  ori     r7,r11,26979
 3f9: 3d 33 2e 31  addis   r9,r19,11825
 3fd: 32 2e 31 39  addic   r17,r14,12601
 401: 2d 72 74 33  cmpdi   cr2,r18,29747
 405: 30 2d 51 6f  addic   r1,r13,20847
 409: 72 49 51 2d  andi.   r9,r18,20781
 40d: 53 44 4b 2d  rlwimi. r4,r26,9,12,22
 411: 56 31 2e 36  rlwinm  r17,r17,5,24,27
 415: 2b 67 63 32  cmpldi  cr6,r7,25394
 419: 39 66 65 31  addi    r11,r6,25905
 41d: 61 20 53 4d  ori     r0,r9,21325
 421: 50 20 6d 6f  rlwimi. r0,r1,13,21,23
 425: 64 5f 75 6e  oris    r31,r2,30062
 429: 6c 6f 61 64  xoris   r15,r3,24932
 42d: 20 6d 6f 64  subfic  r3,r13,28516
 431: 76 65 72 73  andis.  r5,r19,29299
 435: 69 6f 6e 73  xori    r15,r11,28275
 439: Address 0x0000000000000439 is out of bounds.

Regards,

Chandra

NXP Employee

That looks like you're disassembling something that isn't code.

Could you show precisely how you are building this module?

Contributor III

Hi Scott,

Sorry for the very late reply; I was busy with some other priority issues. The kernel module in question is built using the standard out-of-tree module build process. Below is the Makefile:

MODULE := $(MOD_NAME).o
KMODULE := $(MOD_NAME).ko
PRE_COMPILED_OBJ := obj_$(MOD_NAME).o

obj-m := $(MODULE)
$(MOD_NAME)-y := $(MODULE_SYM) $(PRE_COMPILED_OBJ)

ifeq (,$(CROSS_COMPILE))
# CROSS compiler is  powerpc64-fsl_networking-linux-
export CROSS_COMPILE
endif

SAVE_CFLAGS := ${CFLAGS}

include $(SDK)/make/Make.config

PWD := $(shell pwd)

ifneq ($(ARCH),)
# ARCH is powerpc
A := ARCH=$(ARCH)
export ARCH
endif

# Standard SDK include path for building source files that export
# kernel symbols.
override EXTRA_CFLAGS = -I${SDK}/include -I${SDK}/systems/linux/kernel/modules/include -I${SDK}/systems/bde/linux/include

# The precompiled object needs a dummy command file to avoid warnings
# from the Kbuild scripts (modpost stage).
# Kernels before 2.6.17 do not support external module symbols files,
# so we create a dummy to prevent build failures.

$(KMODULE):
	rm -f *.o *.ko .*.cmd
	rm -fr .tmp_versions
	ln -s $(LIBDIR)/$(MODULE) $(PRE_COMPILED_OBJ)_shipped
	echo "suppress warning" > .$(PRE_COMPILED_OBJ).cmd
	$(MAKE) -C $(KERNDIR) CROSS_COMPILE=$(CROSS_COMPILE) M=$(PWD) modules
	if [ ! -f Module.symvers ]; then echo "old kernel (pre-2.6.17)" > Module.symvers; fi
	cp -f $(KMODULE) $(LIBDIR)
	rm -f $(PRE_COMPILED_OBJ)_shipped

EXTRA_CFLAGS = $(CFLAGS)
CFLAGS := ${SAVE_CFLAGS}

NXP Employee

That is not any "standard out of kernel directory module build process" that I've seen before, nor do I know what "$(SDK)/make/Make.config" is.

See https://www.kernel.org/doc/Documentation/kbuild/modules.txt  for the standard procedure.

Contributor III

Hi Scott,

include $(SDK)/make/Make.config -> this file defines some macros used in the code base. It is specific to the SDK code.

Regards,

Chandra

Contributor I

Have you tried compiling it in-tree? Also as compiled-in (instead of as a module)?

Contributor III

Hi Max,

The module in question is part of a vendor-provided SDK, so it is not possible to compile it in-tree. The same code base compiles and works perfectly with a 32-bit kernel and toolchain. The only issue is when we compile with a 64-bit kernel and toolchain. I suspect there is some issue with linking.

Contributor III

Hi Scott/Max,

I added the gcc flag "-mlongcall" to the build and that got rid of the crash during insmod. What exactly does this flag do?

Regards,

Chandra
