Random application crashes due to SIGILL at a fixed offset (ending with 0xdc0)


nikhilutane
Contributor I

Hi,

We are seeing a number of SIGILL crashes in multiple applications (both proprietary and open source) on our system, which is based on the B4860 platform.

The first thing we checked was whether our binaries contain any illegal instructions; we found none.

Coredumps and extra logs indicate that the instruction at the faulting address is zeroed out (*si_addr = 0x0).

Our next suspect was corruption of the code area in flash. To rule this out, we mounted the flash read-only.

A nanddump of the flash partition (where the binaries are stored) taken before a SIGILL and another taken during one turned out to be exactly the same, so we have ruled out flash corruption.
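For anyone who wants to repeat the dump comparison, here is a minimal sketch in Python of a byte-wise comparison that reports the first differing offset instead of only "same/different". The function name and the file names in the usage note are hypothetical; it assumes both dumps were saved as ordinary files.

```python
def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset of the first difference between two files,
    or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Common prefix matches, so one file is simply shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files reached EOF without any difference.
                return None
            offset += len(a)
```

Called as `first_difference("before.nanddump", "during.nanddump")` (hypothetical file names), a result of `None` corresponds to the "exactly the same" outcome described above.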

Since the code that triggers the SIGILL lies in segments of the running process that are write-protected by the MMU, our current suspects are the kernel and drivers (including DMA). The DSP DPAA uses physical addresses directly. However, all DPAA memory regions are mapped into the DSP area, so these pointers may not be from the DPAA area. Shared memory accesses for IPC are also checked for a valid address range.

One lead we have is that the corruption always occurs at an offset that ends with 0xdc0.

For example:

Faulting instruction address: 0x10653dc0 << printed by our application after catching SIGILL

Faulting instruction address: 0x1000ddc0 << printed by our application after catching SIGILL

flash_erase[8557]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001

nandwrite[8561]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001

awk[4448]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001

awk[16002]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001

getStats[20670]: unhandled signal 4 at 0fecfdc0 nip 0fecfdc0 lr 0fecfdbc code 30001

expr[27923]: unhandled signal 4 at 0fe74dc0 nip 0fe74dc0 lr 0fe74dc0 code 30001
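To double-check the pattern, here is a small Python sketch that parses the kernel log lines above and reduces each faulting address to its offset within a page and within a cache line. The 4 KiB page size and 64-byte cache line are assumptions about this system, not values taken from the logs, and the helper names are made up for illustration.

```python
import re

# Kernel log lines copied from the post above.
LOG_LINES = [
    "flash_erase[8557]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001",
    "nandwrite[8561]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001",
    "awk[4448]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001",
    "awk[16002]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001",
    "getStats[20670]: unhandled signal 4 at 0fecfdc0 nip 0fecfdc0 lr 0fecfdbc code 30001",
    "expr[27923]: unhandled signal 4 at 0fe74dc0 nip 0fe74dc0 lr 0fe74dc0 code 30001",
]
# Addresses printed by our own SIGILL handler.
APP_ADDRESSES = [0x10653DC0, 0x1000DDC0]

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages
CACHE_LINE = 64     # assumption: 64-byte cache lines

PATTERN = re.compile(r"unhandled signal 4 at [0-9a-f]+ nip ([0-9a-f]+)")

def fault_addresses(lines):
    """Extract the faulting instruction pointer (nip) from each log line."""
    addrs = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            addrs.append(int(match.group(1), 16))
    return addrs

addrs = fault_addresses(LOG_LINES) + APP_ADDRESSES
page_offsets = {a % PAGE_SIZE for a in addrs}
line_offsets = {a % CACHE_LINE for a in addrs}

print("offsets within page:", sorted(hex(o) for o in page_offsets))
print("offsets within cache line:", sorted(line_offsets))
```

Under those assumptions, every address reduces to the same page offset, and that offset is a multiple of 64, so the zeroed instruction sits at the start of a cache line in each case.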

Has anyone faced this kind of problem before, or can anyone suggest how to proceed with the investigation? Please let me know what additional information you need.

I also posted this on Stack Overflow, but then I came across this forum, where I am more likely to get help.

Looking forward to some pointers.

Thanks in advance.

Nikhil
