We are seeing a lot of crashes with no clear cause.
They happen more frequently during monkey tests, when there are many actions and a lot of Ethernet traffic.
This call trace was captured during shutdown, but a shutdown is not required to trigger the problem. Here it is:
[ T1] Call trace:
[ T1] dump_backtrace.cfi_jt+0x0/0x4
[ T1] show_stack+0x24/0x34
[ T1] sysrq_handle_showallcpus+0x9c/0xd4
[ T1] __handle_sysrq+0x108/0x1f4
[ T1] write_sysrq_trigger+0x11c/0x188
[ T1] proc_reg_write+0xec/0x210
[ T1] vfs_write+0x118/0x3cc
[ T1] ksys_write+0x84/0xf4
[ T1] __arm64_sys_write+0x28/0x38
[ T1] invoke_syscall+0x68/0x150
[ T1] el0_svc_common.llvm.4347159286016627091+0xc0/0x100
[ T1] do_el0_svc+0x30/0x9c
[ T1] el0_svc+0x24/0x7c
[ T1] el0t_64_sync_handler+0x6c/0xb4
[ T1] el0t_64_sync+0x1b4/0x1b8
[ C0] sysrq: CPU0: backtrace skipped as idling
[ C4] sysrq: CPU4: backtrace skipped as idling
[ C2] sysrq: CPU2:
[ C3] sysrq: CPU3: backtrace skipped as idling
[ C5] sysrq: CPU5: backtrace skipped as idling
[ T1] sysrq: Show Blocked State
[ T1] task:kworker/1:0 state:D stack:14120 pid:30224 ppid: 2 flags:0x00000008
[ T1] Workqueue: events key_garbage_collector.860fd2f9b6d01b497b0f700c85752b22.cfi_jt
[ T1] Call trace:
[ T1] __switch_to+0x150/0x1e8
[ T1] __schedule+0x5bc/0x9a8
[ T1] schedule+0x8c/0x10c
[ T1] schedule_timeout+0x4c/0x10c
[ T1] wait_for_common+0xb0/0x140
[ T1] wait_for_completion+0x24/0x34
[ T1] __wait_rcu_gp+0x1ac/0x1d8
[ T1] synchronize_rcu+0x70/0x9c
[ T1] key_garbage_collector+0x3d8/0x518
[ T1] process_one_work+0x22c/0x4a4
[ T1] worker_thread+0x290/0x510
[ T1] kthread+0x178/0x1e4
[ T1] ret_from_fork+0x10/0x20
[ T1] sysrq: Kill All Tasks
[ T325] watchdog: watchdog0: watchdog did not stop!
[ T261] printk: ueventd: 2 output lines suppressed due to ratelimiting
[ C2] Call trace:
[ C2] dump_backtrace.cfi_jt+0x0/0x4
[ C2] show_stack+0x24/0x34
[ T1] reboot: Set up alarm timer for 5 sec
[ C2] showacpu+0x70/0xb4
[ C2] flush_smp_call_function_queue.llvm.7698391963899168790+0x1d4/0x2e4
[ T1] kvm: exiting hardware virtualization
[ C2] ipi_handler+0x98/0x150
[ C2] handle_percpu_devid_irq+0xc4/0x318
[ C2] handle_domain_irq+0x84/0xf8
[ C2] gic_handle_irq+0x5c/0x124
[ C2] call_on_irq_stack+0x3c/0x6c
[ C2] do_interrupt_handler+0x4c/0xa8
[ C2] el1_interrupt+0x34/0x64
[ C2] el1h_64_irq_handler+0x1c/0x2c
[ C2] el1h_64_irq+0x7c/0x80
[ C2] follow_page_mask+0x70/0x3f8
[ C2] follow_page+0x38/0x80
[ C2] munlock_vma_pages_range+0xbc/0x2c0
[ C2] exit_mmap+0xec/0x2ec
[ C2] __mmput+0x3c/0x168
[ C2] mmput+0x3c/0x78
[ C2] exit_mm+0x1f8/0x334
[ C2] do_exit+0x1c8/0xa38
[ C2] do_group_exit+0x90/0xac
[ C2] get_signal+0x1d0/0x7b0
[ C2] do_signal+0xa4/0x258
[ C2] do_notify_resume+0x7c/0x164
[ C2] el0_svc+0x5c/0x7c
[ C2] el0t_64_sync_handler+0x6c/0xb4
[ C2] el0t_64_sync+0x1b4/0x1b8
I think this may be related: Stack corruption in libart.so art::ClassLinker::ResolveMethod in android automotive
A log file with two crashes is attached.
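To pick the blocked tasks out of a long log, a small filter over the "Show Blocked State" output can help. This is only a sketch: the function name list_blocked_tasks is made up for illustration, and it assumes the "task:... state:D ... pid:..." line format shown in the trace above.

```shell
# Sketch: print "name pid" for every task the kernel reported in state D
# (uninterruptible sleep). list_blocked_tasks is a hypothetical helper name.
list_blocked_tasks() {
    awk '/task:.*state:D/ {
        name = ""; pid = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^task:/) name = substr($i, 6)
            if ($i ~ /^pid:/) {
                pid = substr($i, 5)
                # The kernel pads short pids ("pid:   42"), so the number
                # may land in the next field.
                if (pid == "") pid = $(i + 1)
            }
        }
        print name, pid
    }' "$1"
}
```

Usage would be something like `list_blocked_tasks dmesg.log`, where dmesg.log stands for whatever file the log was saved to.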
Hello,
I cannot see why this is happening, but it is probably because you are using an unsupported kernel, so please download Android from the official website and use that.
Regards
I've tested a kernel with minimal changes from our side, and it crashed several times over 84 reboots.
I checked out the tag automotive-12.1.0_1.1.0, then added our dts and defconfig files, then made a few small changes (patch attached) to the kernel code to get Android partially working with our SXM application.
For testing I used a simple reboot script; it completed 84 iterations.
#!/bin/bash
for VAR in {1..200}
do
    echo "n=$VAR"
    adb reboot
    sleep 120
done
The run produced a dmesg log file with several crashes, in 3 blocks.
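A variant of the reboot script that also saves the kernel log after each boot might make it easier to match crashes to iterations. This is a sketch under assumptions: reboot_loop, its arguments, and the dmesg_N.log file names are invented here, and it assumes `adb shell dmesg` is permitted on the build (for example userdebug). It only captures the log of the boot that follows each reboot, so a hang during shutdown itself would most likely still need a serial console.

```shell
# Sketch: reboot COUNT times, wait DELAY seconds after each reboot,
# and store one dmesg dump per iteration in OUTDIR.
# reboot_loop and the file naming are hypothetical, not part of any tool.
reboot_loop() {
    local count="$1" delay="$2" outdir="$3"
    local n=1
    mkdir -p "$outdir"
    while [ "$n" -le "$count" ]; do
        echo "n=$n"
        adb reboot
        sleep "$delay"
        # Block until the device is visible to adb again.
        adb wait-for-device
        # One file per iteration ties a crash to the reboot that preceded it.
        adb shell dmesg > "$outdir/dmesg_$n.log"
        n=$((n + 1))
    done
}
```

With the same numbers as the original script, the call would be `reboot_loop 200 120 ./reboot_logs`.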
We frequently see the deadlock from these logs on our main build.
These logs are based on Android 12.1.0_1.0.0 (L5.15.52_2.1.0 BSP).
We had to make multiple changes there: bug fixes, added audio codecs and Cypress Wi-Fi drivers, and a fix to the fec_main.c driver for two Ethernet interfaces. So the changes are big.
I will try to reproduce the same with no kernel code changes.