3.0.35 kernel on iMX6Q Sabre Lite

marcinmiklas
Contributor III

I've recently tried the 3.0.35 kernel from L3.0.35_12.09.03_ER. It is very unstable no matter which kernel configuration I use, even with the kernel provided in the Debian package in L3.0.35_12.09.03_ER.

 

It crashes with errors like:

 

Unable to handle kernel paging request at virtual address ffffffff

pgd = ba2f4000

[ffffffff] *pgd=4fffe821, *pte=00000000, *ppte=00000000

Internal error: Oops: 17 [#1] PREEMPT SMP

Modules linked in:

CPU: 1    Not tainted  (3.0.35 #1)

PC is at anon_vma_clone+0x4c/0x158

LR is at anon_vma_clone+0x3c/0x158

pc : [<800e0504>]    lr : [<800e04f4>]    psr: a00f0013

sp : bff8be78  ip : 0bfd7000  fp : 80ad8d40

r10: 00000000  r9 : ba142cf0  r8 : baa561c8

r7 : bff0e840  r6 : 00000000  r5 : ffffffff  r4 : bfa31f78

r3 : bff8a000  r2 : 80a7fc90  r1 : 8003d004  r0 : bfa31f78

Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user

Control: 10c53c7d  Table: 4a2f404a  DAC: 00000015

Process init (pid: 1, stack limit = 0xbff8a2f0)

Stack: (0xbff8be78 to 0xbff8c000)

be60:                                                       2ab98fff bff0e878

be80: 80ad8d20 ba142cb8 bff0e840 ba142cb8 bff0e754 80acdd20 00000001 bff0e738

bea0: bc0b07dc 800e0630 bff0e840 bfc29040 ba142cb8 bff0e754 80acdd20 8007138c

bec0: bffd2054 00000020 bff0e744 bff0e758 bfc28000 bff8a000 bfc2803c bfc2907c

bee0: bc0178cc ba050000 01200011 80acdd20 bff8bf08 00000000 bff8bfb0 bff8a000

bf00: 7eabd718 80072118 bfd82c80 80a97fe0 ba0501d8 ba0501d0 bff8a000 00000000

bf20: 00000000 ba050124 00000020 00000000 bffd2000 01200011 00000000 2ab91360

bf40: 00000000 800416c4 bff8a000 00000000 2ad08000 80072534 2ab913c8 00000000

bf60: 00000000 bff8bf80 800416c4 7eabd91c 00000008 00000000 7eabd91c 80081910

bf80: 7ffbfeff fffffffe 00000000 00000000 0000000b 2ab913c8 7eabd720 2ab91360

<0>bfa0: 00000078 80041540 00000000 2ab913c8 01200011 00000000 00000000 00000000

bfc0: 2ab913c8 7eabd720 2ab91360 00000078 2ab91820 00000001 00000001 2ad08000

bfe0: 00000078 7eabd718 2ac97edf 2ac41276 000f0030 01200011 00000000 00000000

[<800e0504>] (anon_vma_clone+0x4c/0x158) from [<800e0630>] (anon_vma_fork+0x20/)

[<800e0630>] (anon_vma_fork+0x20/0x130) from [<8007138c>] (dup_mm+0x1a8/0x4d8)

[<8007138c>] (dup_mm+0x1a8/0x4d8) from [<80072118>] (copy_process+0x9e4/0xdb8)

[<80072118>] (copy_process+0x9e4/0xdb8) from [<80072534>] (do_fork+0x48/0x2a4)

[<80072534>] (do_fork+0x48/0x2a4) from [<80041540>] (ret_fast_syscall+0x0/0x30)

Code: e2504000 11a0a006 0a000024 e5985004 (e5956000)

---[ end trace 8479eb65fc3380dc ]---

 

Virtual addresses are different each time.
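
To compare several crashes, it can help to reduce each oops to a few fields. The following is a minimal sketch, assuming the ARM oops layout quoted above; `parse_oops` and its regular expressions are hypothetical helpers written for this illustration, not part of any kernel tooling.

```python
import re

# Patterns cover only the ARM oops layout quoted in this post (an assumption).
FAULT_RE = re.compile(r"Unable to handle kernel .* at virtual address ([0-9a-f]+)")
PC_RE = re.compile(r"PC is at ([\w.]+)\+(0x[0-9a-f]+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] \(([\w.]+)\+")


def parse_oops(text):
    """Extract fault address, faulting symbol, process and call chain."""
    info = {"fault_addr": None, "pc_symbol": None, "pc_offset": None,
            "process": None, "pid": None, "call_chain": []}
    for raw in text.splitlines():
        line = raw.strip()
        m = FAULT_RE.search(line)
        if m:
            info["fault_addr"] = int(m.group(1), 16)
            continue
        m = PC_RE.search(line)
        if m:
            info["pc_symbol"] = m.group(1)
            info["pc_offset"] = int(m.group(2), 16)
            continue
        m = PROC_RE.search(line)
        if m:
            info["process"] = m.group(1)
            info["pid"] = int(m.group(2))
            continue
        # Backtrace lines: keep the first (innermost) symbol of each line.
        m = FRAME_RE.search(line)
        if m:
            info["call_chain"].append(m.group(1))
    return info
```

Running it over each log and comparing the `fault_addr`, `pc_symbol` and `process` fields makes it easier to see whether the crashes share a common site or differ every time.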

 

With kernel 3.0.15 and the same rootfs everything works perfectly. Has anybody experienced a similar issue?

 

Googling the issue suggests that faulty memory might be the cause, but I ran the memcheck program on the 3.0.15 kernel and it found nothing.

 

I would really appreciate any help.

 

Regards,

Marcin Miklas

 

PS. More crashes attached.

Original Attachment has been moved to: 3.0.35_crashes.zip

1 Solution
JasonLiu
NXP Employee

The 12.09.03 release is the i.MX6DL Beta release. Could you please use the i.MX6Q GA release, which was released recently?

50 Replies
daiane_angolini
NXP Employee

After running overnight, my i.MX6 board is still alive!

Loop 23/500:

  Stuck Address       : ok        

  Random Value        : ok

  Compare XOR         : ok

  Compare SUB         : ok

  Compare MUL         : ok

  Compare DIV         : ok

  Compare OR          : ok

  Compare AND         : ok

  Sequential Increment: ok

  Solid Bits          : ok        

  Block Sequential    : testing 175

root@freescale /home$ uname -a

Linux freescale 3.0.35-2039-g267e004 #1 SMP PREEMPT Sun Sep 23 07:23:09 CDT 2012 armv7l GNU/Linux



Do you think one night is enough for the test?

Do you have another board?

Could you please try the prebuilt rootfs, but not the Ubuntu one?

Let's work out what differs between our setups.

RandyKrakora
NXP Employee

Someone mentioned there might be a userspace mismatch, so I would try what Daiane suggested: use the images for U-Boot, kernel and rootfs (non-Ubuntu) from the L3.0.35_12.09.01_GA_images_MX6Q.tar.gz package.

marcinmiklas
Contributor III

I already did. I've tried U-Boot + kernel + rootfs from L3.0.35_12.09.01_GA_images, and the effect is always the same: "Unable to handle kernel paging request at virtual address", sometimes during boot, sometimes after a while. When boot completes without error and you leave the board untouched, it can last for hours. But when you run some applications or interact with them, the error happens almost immediately.

I've also written the kernel directly to the SD card at the 1 MB offset and got the same error.

To summarize:

I've tried kernels from L3.0.35_12.09.01 and L3.0.35_12.09.03, both precompiled by Freescale and compiled by me (using various configurations, both upgrading the working configuration from the 3.0.15 kernel and using imx6_defconfig).

I've tried U-Boot from L3.0.35_12.09.03 and L3.0.35_12.09.01.

I've tried booting the kernel from the rootfs using 6q_bootscript, booting from TFTP and booting from the 1 MB offset of the SD card.

I've tried rootfs images provided by Freescale (rootfs.tar.gz and oneiric.tar.gz) and also my own, based on ubuntu_core_12.04.

I always get the same "Unable to handle kernel paging request at virtual address ..." error.

Daiane has the same revision of the board and for her everything works correctly.

There is surely some difference in the kernel. Perhaps some change makes use of something that is broken on my board? I think I will stay with the 3.0.15 kernel for now, unless someone has a completely different idea to try.

Thanks a lot to all of you for the ideas and testing.

EricNelson
Senior Contributor II

Hi Marcin,

Sorry for all of the issues you're seeing. We're not generally seeing this issue elsewhere with any of the 3.0.35 kernels, so it's not clear whether you're doing something unique and exposing a bug (hardware or software) or whether perhaps there's an issue with your board.

The reason for checking the userspace is that there are sometimes ABI changes between the kernel and userspace (planned or inadvertent) which cause problems if the two don't match.

The first log in the file 3.0.35_crashes.zip above shows that the failure happened in process tutorial4_es20 (one of the Vivante OpenGL examples):

             Process tutorial4_es20 (pid: 5063, stack limit = 0xb99822f0)

Have you seen the crashes without using OpenGL? Since the OpenGL stack and Vivante driver do return kernel memory to userspace, I wonder if something you're doing to test is exposing a bug.

The nature of the crashes (random locations, random processes) makes it appear that something is stepping on kernel data structures.
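
One way to check how random the crashes are is to tally the process and faulting symbol of every oops in a captured console log. This is a rough sketch under stated assumptions: it relies on the `---[ end trace ... ]---` marker and the `PC is at` / `Process` lines seen in the logs in this thread, and `tally_crash_sites` is a hypothetical name chosen for illustration.

```python
import re
from collections import Counter

# Each oops in the console log ends with an "end trace" marker.
END_RE = re.compile(r"^---\[ end trace [0-9a-f]+ \]---", re.MULTILINE)
PC_RE = re.compile(r"PC is at ([\w.]+)\+")
PROC_RE = re.compile(r"Process (\S+) \(pid: \d+")


def tally_crash_sites(console_log):
    """Count (process, faulting symbol) pairs over every oops in a log."""
    tally = Counter()
    for chunk in END_RE.split(console_log):
        pc = PC_RE.search(chunk)
        proc = PROC_RE.search(chunk)
        if pc and proc:
            tally[(proc.group(1), pc.group(1))] += 1
    return tally
```

If nearly every pair shows up only once, that is consistent with memory being corrupted at random; a single dominant pair would point more towards one specific code path.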

It sounds as if you're using (or have used) a stock kernel and U-Boot (or multiple of them) and seeing the issue, but I haven't seen a complete set of testing steps that we could follow. Daiane and Richard have tried, but it seems that they've been unsuccessful in reproducing your results.

I'd like to do the same, but it seems we're missing something here.

marcinmiklas
Contributor III

I've seen the crash without OpenGL also, sometimes even before mounting rootfs.

I suspect some hardware issue with my board. But the mystery is that it works great with 2.6.38 and 3.0.15 kernels.

Can someone suggest some kind of automated test that I can run on the 3.0.15 kernel to check the board for hardware issues?

marcinmiklas
Contributor III

I've got a new Sabre Lite board, rev D, and it works like a charm with the 3.0.35 kernel.

JasonLiu
NXP Employee

Could you please try the attached u-boot.bin and see whether it improves your situation?

EricNelson
Senior Contributor II

Hi Hui,

What's the origin of this U-Boot version?

     U-Boot 2009.08-00619-ge4ab626-dirty (Oct 19 2012 - 15:56:24)

I can't seem to find e4ab626 in the commit logs from git://git.freescale.com/imx/uboot-imx.git.

marcinmiklas
Contributor III

Well, your u-boot is magical. No crashes so far. However, it seems that the OpenGL examples and accelerated video decoding don't work. I need to double-check that.

marcinmiklas
Contributor III

More testing shows that it still crashes, but less frequently.

marcinmiklas
Contributor III

A bad thing happened.

I tried to restore my version of U-Boot, so I ran upgradeu in the U-Boot console to upgrade U-Boot from the SD card as usual. Everything went OK: the memory was erased, the new U-Boot was written and then read back to verify that it was written correctly. The script asked me to reset, and after the reset the board didn't talk to me any more.

Nothing on serial console.

I switched to USB OTG mode, connected a micro-USB cable to the board and the other end to a laptop, and reset the board, but no Freescale device shows up in lsusb and the unbricking instructions don't work. A Windows machine doesn't detect it either.

So I guess, my sabre lite died permanently.

EricNelson
Senior Contributor II

Can you try both 0/1 and 1/0 positions for SW1 to see if you get the device to show up via USB?

marcinmiklas
Contributor III

Thanks for suggesting the 1/0 setting for SW1; I was able to bring the board back to life today. But since I'm 80% sure that I tried that setting on Friday as well, I suspect that the board has some hardware issue.

daiane_angolini
NXP Employee

Hey, try one more time from scratch!

You can't kill your board just by changing the bootloader...

Try again.

RandyKrakora
NXP Employee

When you used the 12.09.01 images, did you burn that uboot into the SPI NOR as well? And then cycle power? Because it sounds like something was mismatched before.

marcinmiklas
Contributor III

Hmm, where can I find the attachment?

JasonLiu
NXP Employee

It's in my previous reply: there is an attachment named u-boot.bin.zip.

daiane_angolini
NXP Employee

Thanks.

I will let the board run memtester overnight. So far I've got:

root@freescale /home$ ./memtester 1024 200

memtester version 4.2.2 (32-bit)

Copyright (C) 2010 Charles Cazabon.

Licensed under the GNU General Public License version 2 (only).

pagesize is 4096

pagesizemask is 0xfffff000

want 1024MB (1073741824 bytes)

got  657MB (689893376 bytes), trying mlock ...locked.

Loop 1/200:

  Stuck Address       : ok        

  Random Value        : ok

  Compare XOR         : ok

  Compare SUB         : ok

  Compare MUL         : ok

  Compare DIV         : ok

  Compare OR          : ok

  Compare AND         : ok

  Sequential Increment: ok

  Solid Bits          : ok        

  Block Sequential    : ok        

  Checkerboard        : ok        

  Bit Spread          : ok        

  Bit Flip            : ok        

  Walking Ones        : ok        

  Walking Zeroes      : ok        

  8-bit Writes        : ok

  16-bit Writes       : ok

Loop 2/200:

  Stuck Address       : ok        

  Random Value        : ok

  Compare XOR         : ok

  Compare SUB         : ok

  Compare MUL         : ok

  Compare DIV         : ok

  Compare OR          : ok

  Compare AND         : ok

  Sequential Increment: ok

  Solid Bits          : ok        

  Block Sequential    : testing   6
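
For an overnight run, scanning the whole log by eye is tedious. Below is a minimal sketch of a summarizer, assuming the log was captured as plain text in the layout shown above; `summarize_memtester` is a hypothetical helper, and treating any line that contains `FAILURE` as a reported error is an assumption about memtester's failure output rather than something shown in this thread.

```python
import re

LOOP_RE = re.compile(r"^Loop (\d+)(?:/(\d+))?:")
# Indented "  Test Name   : status" lines, as printed inside each loop.
TEST_RE = re.compile(r"^\s+(\S[^:]*?)\s*:\s*(\S.*?)\s*$")


def summarize_memtester(log):
    """Summarize a captured memtester log: loops seen, passes, problems."""
    summary = {"last_loop": 0, "passed": 0, "in_progress": None, "problems": []}
    for line in log.splitlines():
        loop = LOOP_RE.match(line)
        if loop:
            summary["last_loop"] = int(loop.group(1))
            continue
        # Assumption: failures are reported on lines containing "FAILURE".
        if "FAILURE" in line:
            summary["problems"].append((summary["last_loop"], line.strip()))
            continue
        test = TEST_RE.match(line)
        if not test:
            continue
        name, status = test.groups()
        if status == "ok":
            summary["passed"] += 1
        elif status.startswith("testing"):
            summary["in_progress"] = name
        else:
            summary["problems"].append((summary["last_loop"], f"{name}: {status}"))
    return summary
```

An empty `problems` list after many loops only says that this test did not catch a fault; as this thread shows, a board can pass memtester and still crash under a different kernel.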


daiane_angolini
NXP Employee

My board is SABER LITE 10-18-11.

I'm using the L3.0.35_12.09.03_ER_images_MX6Q prebuilt image.

I copied the no-padding U-Boot to the 1 KB offset.

I copied the uImage to the 1 MB offset.

And I placed the rootfs.ext2 content into an EXT4 partition on my SD card (my board is "booting" from SD3).

How long does it take for your kernel to crash?
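
The raw offsets above can be sketched as follows. This is only an illustration, assuming the 1 KB and 1 MB byte offsets mentioned in this post and a 512-byte block size; `dd_seek` and `write_at_offset` are hypothetical helpers, and the target is any seekable binary file object rather than a specific device node.

```python
SECTOR = 512                 # assumed block size for dd-style arithmetic

UBOOT_OFFSET = 1024          # "1K" in the post
UIMAGE_OFFSET = 1024 * 1024  # "1M" in the post


def dd_seek(offset, block_size=SECTOR):
    """Return the block count to pass as dd's seek= for a byte offset."""
    if offset % block_size:
        raise ValueError("offset is not a multiple of the block size")
    return offset // block_size


def write_at_offset(target, data, offset):
    """Write data into an open, seekable binary file object at a byte offset."""
    target.seek(offset)
    written = target.write(data)
    target.flush()
    return written
```

With 512-byte blocks, the 1 KB offset corresponds to block 2 and the 1 MB offset to block 2048. Writing to a real card means opening its device node in `r+b` mode, so double-check which device you are about to overwrite.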

marcinmiklas
Contributor III

From 2 seconds to 5 hours (when memtester is running). In 99% of cases it is within the first couple of minutes.
