I am having a very strange failure on the 3.0.35 kernel. We have a mostly-identical sister board that has no such fault.
Kernel Revision 3.0.35
Patches applied: all distributed ltib patches, plus as many patches up to 3.0.101 as possible.
** It doesn't matter whether I use the straight 3.0.35 ltib-based code or not; the failure does not change. **
Bad Board:
Processor: imx6 solo : MC1MX6S7CVM08AB / XAA1317
DDR3: 3QK17 / D9PSL x 2
DDR3 I/O voltage @ 1.5 volts
Good Board:
Processor: imx6 solo : MC1MX6S7CVM08AB / XAA1350
DDR3: 3UK17 / D9QTF
DDR3 I/O voltage @ 1.35 volts
The failure is:
50] Unhandled fault: imprecise external abort (0x1c06) at 0xf932d7a1.
<0>[ 0.092526] Internal error: : 1c06
This is happening just after the kernel attempts to do a __schedule() of the first threads in the system. It is just after the normal kernel debug line:
"CPU: Testing write buffer coherency: ok."
but I've added debug to get further into the code
__schedule:4293 <<< extra debug to tell me the line number of where I am in __schedule()
<5>[0.092414] Scheduling: swapper <<< extra debug to tell me what thread I am about to schedule, prints out next->comm[] in task structure.
Note that I'm capturing this from RAM after a restart: no serial console or debug output is functioning after the kernel boots.
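For anyone reading raw log-buffer words out of RAM in the same way, here is a minimal sketch of turning the dump lines back into text. It assumes each line is an address followed by four 32-bit words printed in little-endian order, which matches the ASCII column in the crash-log excerpt from this thread. The function name `decode_dump_line` is made up for illustration.

```python
import struct

def decode_dump_line(line):
    """Decode one 'addr: w0 w1 w2 w3 [ascii]' dump line into the text it holds."""
    _addr, _, rest = line.partition(":")
    # Only the first four tokens are hex words; anything after is the ASCII column.
    words = rest.split()[:4]
    # Assumption: words were printed as little-endian 32-bit values.
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    # Show non-printable bytes as '.' so binary debris stays visible.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Sample lines taken from the crash log quoted in this thread.
dump = [
    "106020e0: 616c4620 203a7367 76637a6e 52492020",
    "106020f0: 6f207351 20206666 73514946 206e6f20",
    "10602100: 646f4d20 56532065 32335f43 53492020",
    "10602110: 52412041 5320204d 656d6765 6b20746e",
]

text = "".join(decode_dump_line(line) for line in dump)
print(text)
```

Joining the decoded lines before reading them helps, because a log message can straddle a 16-byte row boundary.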
Any thoughts on why this might be happening? Any and all thoughts are welcome.
Just to clarify the issue a little.
First off, note that several instances (every one we have tested) of the "bad board" fail, and all instances (up to 10 boards) of the "good board" have had no failure. Essentially these are two distinct revisions of the board, with PCB changes in addition to the part changes mentioned above (but NO changes to the DDR3 line lengths or layout).
More details of the crash:
kernel_init is a kernel thread
kernel_init is responsible for executing do_basic_setup
do_basic_setup calls do_initcalls
one of the arch_initcalls is customize_machine
customize_machine would pick the board_file and execute it.
In other words, because the device does not even run kernel_init, I cannot get the board file to execute. This means the problem is so fundamental that we cannot even reconfigure the board as needed.
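To make the dependency explicit, here is a toy model of the call chain described above. It is only a sketch, not kernel code: the level names are a simplified version of the kernel's initcall levels, and `make_boot` and `board_init` are hypothetical names standing in for the real boot path and the board file's init function.

```python
# Simplified initcall levels, in the order they run.
INITCALL_LEVELS = ["pure", "core", "postcore", "arch", "subsys", "fs", "device", "late"]

def make_boot(board_init, log):
    """Build a toy kernel_init() that records the order of calls in `log`."""
    def customize_machine():
        log.append("customize_machine")
        board_init()  # stands in for the board file's machine init hook

    # customize_machine is registered at the arch level.
    initcalls = {"arch": [customize_machine]}

    def do_initcalls():
        log.append("do_initcalls")
        for level in INITCALL_LEVELS:
            for fn in initcalls.get(level, []):
                fn()

    def do_basic_setup():
        log.append("do_basic_setup")
        do_initcalls()

    def kernel_init():
        log.append("kernel_init")
        do_basic_setup()

    return kernel_init

log = []
kernel_init = make_boot(lambda: log.append("board_file"), log)

# Healthy board: the kernel_init thread gets scheduled and the board file runs.
kernel_init()
print(" -> ".join(log))

# Failing board: the abort hits before kernel_init runs, so nothing is called
# and the board file never gets a chance to reconfigure anything.
```

The point of the model is that the board file sits four calls deep inside a thread that never starts on the failing boards.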
I have seen some other references that indicate this might be some kind of u-boot failure...?
Update:
The kernel is crashing exactly at kernel_thread_helper, at the very first instruction, which is:
msr CPSR_c, r7
10601f80: 20637020 3c5b203a 34303038 38306131 pc : [<80041a08
0x80041a08 == kernel_thread_helper
objdump @ kernel_thread_helper
80041a08 <kernel_thread_helper>:
80041a08: e121f007 msr CPSR_c, r7
80041a0c: e1a00004 mov r0, r4
80041a10: e1a0e006 mov lr, r6
80041a14: e1a0f005 mov pc, r5
But also from the crash log:
106020e0: 616c4620 203a7367 76637a6e 52492020 Flags: nzcv IR
106020f0: 6f207351 20206666 73514946 206e6f20 Qs off FIQs on
10602100: 646f4d20 56532065 32335f43 53492020 Mode SVC_32 IS
10602110: 52412041 5320204d 656d6765 6b20746e A ARM Segment k
In other words: the mode is SVC_32, and if I understand correctly, we should be allowed to execute the msr from r7 into CPSR_c without causing a fault.
Now.... since this is an "imprecise" abort, perhaps the crash didn't happen exactly @ 0x80041a08?
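As a cross-check on the "imprecise" reading, the fault status value 0x1c06 from the abort message can be split into its fields. This is a sketch assuming the ARMv7 short-descriptor DFSR layout (FS[3:0] in bits 3:0, domain in bits 7:4, FS[4] in bit 10, WnR in bit 11, ExT in bit 12). The name table is deliberately partial, and `decode_dfsr` is a made-up helper name.

```python
# Partial table of ARMv7 short-descriptor fault status encodings.
FS_NAMES = {
    0b00001: "alignment fault",
    0b00101: "translation fault (section)",
    0b00111: "translation fault (page)",
    0b01000: "synchronous external abort",
    0b10110: "asynchronous (imprecise) external abort",
}

def decode_dfsr(fsr):
    """Split a DFSR value into its fields (short-descriptor format assumed)."""
    fs = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)  # FS[4] is bit 10, FS[3:0] is bits 3:0
    return {
        "fs": fs,
        "name": FS_NAMES.get(fs, "not in this partial table"),
        "domain": (fsr >> 4) & 0xF,
        "wnr": bool(fsr & (1 << 11)),  # write-not-read flag
        "ext": bool(fsr & (1 << 12)),  # implementation-defined external abort type
    }

fields = decode_dfsr(0x1C06)
for key, value in fields.items():
    print(f"{key:>6}: {value}")
```

For an asynchronous abort the reported address and pc do not identify the faulting access, so the msr at 0x80041a08 may simply be where the abort was taken rather than what caused it. Whether the WnR bit is meaningful in this case should be checked against the ARM Architecture Reference Manual.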
Lost and confused.
Hello,
Thank you for your post, however please consider moving it to the right community place (e.g. i.MX Community ) to get it visible for active members.
For details please see general advice Where to post a Discussion? (https://community.freescale.com/docs/DOC-99909 )
Thank you for using Freescale Community.