How to debug crash in CFS scheduler in linux kernel

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

How to debug crash in CFS scheduler in linux kernel

596 Views
RockDeng
Contributor I

Recently, I've encounterd a kernel oops on a embedded linux system based on I.MX6ULL and 5.10.9 version of kernel. This issue has bothered me for more than a week. It's really painful.

First of all, let me show the kernel oops for your reference.

<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000001c
pgd = f6125cdd
[0000001c] *pgd=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: lp(O) lrdmwl_sdio(O) lrdmwl(O) mac80211(O) cfg80211(O) compat(O) g_serial usb_f_serial u_serial rfid(O) industrial(O) applicator(O) libcomposite tm(PO) cutter(O) sensor(O) motor(O) pe(O) ui_leds(O) doorbell(O) psu(O) ui_buttons(O) modules(O)
CPU: 0 PID: 1107 Comm: kworker/0:0 Tainted: P           O      5.10.9 #10
Hardware name: Freescale i.MX6 Ultralite (Device Tree)
Workqueue: events sdio_irq_work
PC is at set_next_entity+0x8/0x244
LR is at pick_next_task_fair+0xc0/0x3d0
pc : [<c01496b0>]    lr : [<c0149b00>]    psr: 20000093
sp : c105bc04  ip : 00000000  fp : c105bc8c
r10: c30de198  r9 : 0000000a  r8 : cbfe6bc9
r7 : 00000000  r6 : c0c0df18  r5 : 00000000  r4 : 00000000
r3 : c105a000  r2 : 00000002  r1 : 00000000  r0 : c0c0df18
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 81cd4059  DAC: 00000051
Process kworker/0:0 (pid: 1107, stack limit = 0x4d6559d7)
Stack: (0xc105bc04 to 0xc105c000)
bc00:          c0c0df00 00000000 c0c0df18 00000000 cbfe6bc9 0000000a c30de198
bc20: c105bc8c c0149b00 c30dde00 c0c0df00 cbfe6bc9 0000000a cbfe6bc9 c07274c8
bc40: 40000093 c057d8e8 c18cf380 c057e284 00000000 00000000 0000353a c0727acc
bc60: c105bd6c 00000000 00000000 c30dde00 ffffe000 c105bdb4 c105bdb8 00000002
bc80: 00000001 00000000 c105bc9c c0727acc 7fffffff ffffe000 10000100 c072b4ac
bca0: c1155210 00000013 c18cf380 0000000b c105bd3c 00000000 ffffe000 00000000
bcc0: 10000100 7fffffff ffffe000 c105bdb4 c105bdb8 00000002 00000001 c0728ebc
bce0: 10000100 c30dde00 c105bdb8 c105bdb8 c18cf000 c105bda4 c105bdb4 00000000
bd00: c19aa800 c056669c 00000100 0000ffff 00000000 00000001 c19aa800 c0571cb4
bd20: c0d09494 c09e2bd8 c09e2bb0 c7f34162 00000900 00000100 81e0b900 00000035
bd40: 10000100 00000000 00000000 00000000 00000000 000001b5 00000000 00000000
bd60: 00000000 c105bd6c c105bda4 3b9aca00 00000000 00000100 00000001 00000000
bd80: 00000000 00000200 00000000 00000000 c105bda4 00000001 00000001 c105bd2c
bda0: 00000001 00000000 c105bd3c c105bd6c 00000000 00000000 c105bce8 c105bce8
bdc0: 00000000 c105bdc4 c105bdc4 c0565bac 00000000 00000000 00000000 00000000
bde0: c30de19c 00000100 00000000 c1155400 c1e0ba00 00000000 00000000 00000000
be00: 00000000 c057328c 00000000 c1e0b900 00000000 00000100 c30dde00 00000000
be20: 00000100 00000001 c1e0a800 c1e0b900 c1dc1da0 c1e0a800 c0c0d4d8 c0573480
be40: c1e0b900 00000100 c105beb4 bf1b705c c0c0df00 c11a4000 00000000 c193a8c0
be60: c105beb4 c18cf000 c19aa800 00000001 c7edc200 00000000 c18cf2d8 bf1b8388
be80: 00000000 c0727bd0 c1dc1da0 00000000 00000000 c1dc2da0 c014e830 00000000
bea0: 00000003 c18cf274 c1dc0ca0 00000004 60000013 00000001 00000000 00000000
bec0: 00000002 c046cc94 00000000 c18cf000 c19aa800 00000001 c7edc200 00000000
bee0: c18cf2d8 00000000 c0c0d4d8 c05736a0 00000100 00000122 c7edc200 c18cf000
bf00: c18cf2d4 c18cf000 c18cf2d4 00000000 c7edc200 00000000 c18cf2d8 c0573cb8
bf20: c18cf2d4 c4d14880 00000000 c0137088 c0cdb0a0 ffffe000 c4d14880 c0c0d4d8
bf40: c4d14894 c0cdb0a0 ffffe000 c0c0d4ec 00000008 c0137354 c0d09426 c09dfd28
bf60: c3085300 c3085300 c3085bc0 c105a000 00000000 c0137310 c4d14880 c10dfed4
bf80: c3085320 c013c7ec 00000000 c3085bc0 c013c6a8 00000000 00000000 00000000
bfa0: 00000000 00000000 00000000 c0100148 00000000 00000000 00000000 00000000
bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c01496b0>] (set_next_entity) from [<cbfe6bc9>] (0xcbfe6bc9)
Code: c0c0df18 c0c0df48 e92d4ff0 e1a04001 (e591301c)
---[ end trace b08b1ab27e2a6927 ]---
note: kworker/0:0[1107] exited with preempt_count 2

By using gdb and addr2line, it shows that the NULL pointer dereference happens at the if statment in set_next_entity(), please refer to the folloing code sample, and the parameter se is a NULL pointer.

set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
      /* 'current' is not kept within the tree. */
      if (se->on_rq) {

By a little bit deep diving, it turns out that the NULL pointer "se" is returned by pick_next_entity(cfs_rq, NULL) which called by pick_next_task_fair. The sample code show as below. Thus, something is woring in the pick_next_entity(cfs_rq, NULL) just before calling set_net_entity(cfs_rq, se).

pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
……
    do {
         se = pick_next_entity(cfs_rq, NULL);
         set_next_entity(cfs_rq, se);
         cfs_rq = group_cfs_rq(se);
     } while (cfs_rq);

And then, I've tried to dump some members in cfs_rq in pick_next_entity(cfs_rq, NULL). It shows as below. As you can see, the root node and leftmost of the rb_tree are equal to 0/NULL. So I guess the rb_tree has corrupted sometime early than calling this pick_next_entity().

(pick_next_entity:4505)cfs_rq[0xC0C0DF18]
(pick_next_entity:4507)cfs_rq->curr[0x0]
(pick_next_entity:4508)cfs_rq->next[0x0]
(pick_next_entity:4509)cfs_rq->last[0x0]
(pick_next_entity:4510)cfs_rq->skip[0x0]
(pick_next_entity:4511)cfs_rq->nr_running[1]
(pick_next_entity:4512)cfs_rq->h_nr_running[1]
(pick_next_entity:4513)cfs_rq->idle_h_nr_running[0]
(pick_next_entity:4514)cfs_rq->tasks_timeline.rb_leftmost[0x0]
(pick_next_entity:4515)cfs_rq->tasks_timeline.rb_root.rb_node[0x0]
(pick_next_task_fair:7131) picked next entity.
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 0000001c

As I'm not familiar with the fair scheduler and rb tree of kernel, I have no idea how to detect when the rb tree corrupted and potential reasons cause this issue. But I found a discussion of a similar issue on this url, which provide the folloing patch to detect when the rb tree corrupted.

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 1fd61a9af45c..b4b4df3ad0fc 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,7 +130,28 @@ struct rb_root_cached {
 #define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }

 /* Same as rb_first(), but O(1) */
-#define rb_first_cached(root) (root)->rb_leftmost
+#define __rb_first_cached(root) (root)->rb_leftmost
+
+#ifndef CONFIG_RBTREE_DEBUG
+# define rb_first_cached(root) __rb_first_cached(root)
+# define rbtree_cached_debug(root) do { } while(0)
+
+#else
+static inline struct rb_node *rb_first_cached(struct rb_root_cached *root)
+{
+   struct rb_node *leftmost = __rb_first_cached(root);
+
+   WARN_ON(leftmost != rb_first(&root->rb_root));
+   return leftmost;
+}
+
+#define rbtree_cached_debug(root)                  \
+do {                                   \
+   WARN_ON(rb_first(&(root)->rb_root) != __rb_first_cached((root)));   \
+   WARN_ON(!RB_EMPTY_ROOT(&(root)->rb_root) && !__rb_first_cached((root))); \
+   WARN_ON(RB_EMPTY_ROOT(&(root)->rb_root) && __rb_first_cached((root))); \
+} while (0)
+#endif /* CONFIG_RBTREE_DEBUG */

 static inline void rb_insert_color_cached(struct rb_node *node,
                      struct rb_root_cached *root,
@@ -139,6 +160,8 @@ static inline void rb_insert_color_cached(struct rb_node *node,
    if (leftmost)
        root->rb_leftmost = node;
    rb_insert_color(node, &root->rb_root);
+
+   rbtree_cached_debug(root);
 }

 static inline void rb_erase_cached(struct rb_node *node,
@@ -147,6 +170,8 @@ static inline void rb_erase_cached(struct rb_node *node,
    if (root->rb_leftmost == node)
        root->rb_leftmost = rb_next(node);
    rb_erase(node, &root->rb_root);
+
+   rbtree_cached_debug(root);
 }

 static inline void rb_replace_node_cached(struct rb_node *victim,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2f6fb96405af..62ab9f978bc6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1727,6 +1727,16 @@ config BACKTRACE_SELF_TEST

      Say N if you are unsure.

+config RBTREE_DEBUG
+   bool "Red-Black tree sanity tests"
+   depends on DEBUG_KERNEL
+   help
+     This option enables runtime sanity checks on all variants
+     of the rbtree library. Doing so can cause significant overhead,
+     so only enable it in non-production environments.
+
+     Say N if you are unsure.
+

After implementing this patch, multiple warning message can be found before the kernel oops, and the first warning shows as below. By some analysis of this message, it turns out that the first corruption of rb tree is in rb_first_cached() which called by __dequeue_entity() which called by set_next_entity().

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at ./include/linux/rbtree.h:175 set_next_entity+0x1f4/0x2dc
Modules linked in: lp(O) g_serial usb_f_serial u_serial rfid(O) industrial(O) applicator(O) libcomposite tm(PO) cutter(O) sensor(O) motor(O) pe(O) ui_leds(O) doorbell(O) psu(O) ui_buttons(O) modules(O)
CPU: 0 PID: 0 Comm: swapper Tainted: P           O      5.10.9 #11
Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[<c010bb64>] (unwind_backtrace) from [<c0109ed8>] (show_stack+0x10/0x14)
[<c0109ed8>] (show_stack) from [<c0120ba4>] (__warn+0xe4/0xe8)
[<c0120ba4>] (__warn) from [<c0120c38>] (warn_slowpath_fmt+0x90/0xa0)
[<c0120c38>] (warn_slowpath_fmt) from [<c014987c>] (set_next_entity+0x1f4/0x2dc)
[<c014987c>] (set_next_entity) from [<c0149d40>] (pick_next_task_fair+0xbc/0x3c4)
[<c0149d40>] (pick_next_task_fair) from [<c07286a8>] (__schedule+0x200/0x7b0)
[<c07286a8>] (__schedule) from [<c072905c>] (schedule_idle+0x38/0x7c)
[<c072905c>] (schedule_idle) from [<c01478a8>] (do_idle+0x14c/0x214)
[<c01478a8>] (do_idle) from [<c0147c40>] (cpu_startup_entry+0xc/0x10)
[<c0147c40>] (cpu_startup_entry) from [<c0b00d44>] (start_kernel+0x42c/0x444)
---[ end trace 5b768ae127781c97 ]---

That's all I got at this time, but still have no idea why the rb tree corrupted and not able to get the root cause of this issue. Any suggestion to debug this kind of issue, to get more information related to it for deep analysis will be greatly appreciated. Could it be caused by some defect in device drivers? or it might be a kernel bug?

Labels (1)
0 Kudos
0 Replies