android kernel crash at exit_mmap

dmitry_sidorenkov · ‎01-23-2024

We are seeing a lot of crashes with no clear cause.
They happen more frequently during monkey tests, when there are a lot of actions and eth networking activity.
This call trace was captured during shutdown, but shutdown is not required to trigger it. Here it is:

[    T1] Call trace:
[    T1]  dump_backtrace.cfi_jt+0x0/0x4
[    T1]  show_stack+0x24/0x34
[    T1]  sysrq_handle_showallcpus+0x9c/0xd4
[    T1]  __handle_sysrq+0x108/0x1f4
[    T1]  write_sysrq_trigger+0x11c/0x188
[    T1]  proc_reg_write+0xec/0x210
[    T1]  vfs_write+0x118/0x3cc
[    T1]  ksys_write+0x84/0xf4
[    T1]  __arm64_sys_write+0x28/0x38
[    T1]  invoke_syscall+0x68/0x150
[    T1]  el0_svc_common.llvm.4347159286016627091+0xc0/0x100
[    T1]  do_el0_svc+0x30/0x9c
[    T1]  el0_svc+0x24/0x7c
[    T1]  el0t_64_sync_handler+0x6c/0xb4
[    T1]  el0t_64_sync+0x1b4/0x1b8
[    C0] sysrq: CPU0: backtrace skipped as idling
[    C4] sysrq: CPU4: backtrace skipped as idling
[    C2] sysrq: CPU2:
[    C3] sysrq: CPU3: backtrace skipped as idling
[    C5] sysrq: CPU5: backtrace skipped as idling
[    T1] sysrq: Show Blocked State
[    T1] task:kworker/1:0     state:D stack:14120 pid:30224 ppid:     2 flags:0x00000008
[    T1] Workqueue: events key_garbage_collector.860fd2f9b6d01b497b0f700c85752b22.cfi_jt
[    T1] Call trace:
[    T1]  __switch_to+0x150/0x1e8
[    T1]  __schedule+0x5bc/0x9a8
[    T1]  schedule+0x8c/0x10c
[    T1]  schedule_timeout+0x4c/0x10c
[    T1]  wait_for_common+0xb0/0x140
[    T1]  wait_for_completion+0x24/0x34
[    T1]  __wait_rcu_gp+0x1ac/0x1d8
[    T1]  synchronize_rcu+0x70/0x9c
[    T1]  key_garbage_collector+0x3d8/0x518
[    T1]  process_one_work+0x22c/0x4a4
[    T1]  worker_thread+0x290/0x510
[    T1]  kthread+0x178/0x1e4
[    T1]  ret_from_fork+0x10/0x20
[    T1] sysrq: Kill All Tasks
[  T325] watchdog: watchdog0: watchdog did not stop!
[  T261] printk: ueventd: 2 output lines suppressed due to ratelimiting
[    C2] Call trace:
[    C2]  dump_backtrace.cfi_jt+0x0/0x4
[    C2]  show_stack+0x24/0x34
[    T1] reboot: Set up alarm timer for 5 sec
[    C2]  showacpu+0x70/0xb4
[    C2]  flush_smp_call_function_queue.llvm.7698391963899168790+0x1d4/0x2e4
[    T1] kvm: exiting hardware virtualization
[    C2]  ipi_handler+0x98/0x150
[    C2]  handle_percpu_devid_irq+0xc4/0x318
[    C2]  handle_domain_irq+0x84/0xf8
[    C2]  gic_handle_irq+0x5c/0x124
[    C2]  call_on_irq_stack+0x3c/0x6c
[    C2]  do_interrupt_handler+0x4c/0xa8
[    C2]  el1_interrupt+0x34/0x64
[    C2]  el1h_64_irq_handler+0x1c/0x2c
[    C2]  el1h_64_irq+0x7c/0x80
[    C2]  follow_page_mask+0x70/0x3f8
[    C2]  follow_page+0x38/0x80
[    C2]  munlock_vma_pages_range+0xbc/0x2c0
[    C2]  exit_mmap+0xec/0x2ec
[    C2]  __mmput+0x3c/0x168
[    C2]  mmput+0x3c/0x78
[    C2]  exit_mm+0x1f8/0x334
[    C2]  do_exit+0x1c8/0xa38
[    C2]  do_group_exit+0x90/0xac
[    C2]  get_signal+0x1d0/0x7b0
[    C2]  do_signal+0xa4/0x258
[    C2]  do_notify_resume+0x7c/0x164
[    C2]  el0_svc+0x5c/0x7c
[    C2]  el0t_64_sync_handler+0x6c/0xb4
[    C2]  el0t_64_sync+0x1b4/0x1b8
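
In the log above, the CPU2 backtrace is interleaved with unrelated T1 lines (the `reboot:` and `kvm:` messages), which makes it harder to read. A minimal sketch of a filter that keeps only one context is below. The helper name `filter_ctx` and the file name `dmesg.txt` are made up for illustration, and it assumes the log format shown in this post, where the bracketed tag is the first thing on the line (a log that still has timestamps in front would need a different pattern).

```shell
#!/bin/bash
# Print only the lines logged under one context tag (e.g. C2 or T1),
# so an interleaved backtrace can be read on its own.
# Hypothetical helper; reads the log on stdin.
filter_ctx() {
    # The tag sits in brackets and is left-padded with spaces: "[    C2] ..."
    grep -E "^\[ *$1] "
}
```

For example, `filter_ctx C2 < dmesg.txt` would print the CPU2 trace from `dump_backtrace` down to `el0t_64_sync` without the T1 messages in between.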

I think this may be related: Stack corruption in libart.so art::ClassLinker::ResolveMethod in android automotive

A log file with two crashes is attached.

Bio_TICFSL · ‎01-24-2024

Hello,

I cannot see why this is happening, but it is probably because you are using a non-supported kernel, so please download Android from the official website.

https://www.nxp.com/design/design-center/software/embedded-software/i-mx-software/android-os-for-i-m...

Regards

dmitry_sidorenkov · ‎01-25-2024

I've tested a kernel with minimal changes on our side, and it crashed several times over 84 reboots.
I checked out the tag automotive-12.1.0_1.1.0, added our dts and defconfig files, then made a few small changes to the kernel code (patch attached) to get Android partially working with our SXM application.
For testing I used a simple reboot script; it completed 84 iterations.

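#!/bin/bash
# Reboot the device up to 200 times, waiting 120 s for each boot to finish.
for VAR in {1..200}
do
   echo "n=$VAR"   # print the iteration number
   adb reboot
   sleep 120       # give the device time to boot before the next reboot
done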

This produced a dmesg log file containing several crashes, in 3 blocks.
We frequently see the deadlock from these logs on our main build.
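
To compare runs, it can help to pull the blocked tasks out of a saved dmesg file instead of reading it by eye. The sketch below is only an illustration: the function names are made up, and it assumes the log contains sysrq "Show Blocked State" output in the same `task:... state:D` format as the trace posted earlier in this thread.

```shell
#!/bin/bash
# Hypothetical helpers for a saved dmesg log (path passed as $1).

# Count tasks reported in uninterruptible sleep (state:D).
count_blocked() {
    grep -c 'task:.* state:D ' "$1"
}

# Print the name of each blocked task, one per line.
# Names containing spaces would be cut at the first space.
list_blocked() {
    grep 'task:.* state:D ' "$1" | sed -E 's/.*task:([^ ]+).*/\1/'
}
```

Running `list_blocked` on the first log in this thread would report the `kworker/1:0` task that is waiting in `synchronize_rcu`.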

dmitry_sidorenkov · ‎01-24-2024

These logs are based on Android 12.1.0_1.0.0 (L5.15.52_2.1.0 BSP).
We had to make multiple changes to fix bugs: we added audio codecs and Cypress Wi-Fi drivers, and fixed the fec_main.c driver for two eth interfaces. So the changes are big.
I will try to reproduce the same crash with no kernel code changes.

Android

Linux