I have been using the i.MX6Q for a while now, mostly on a Sabrelite board.
We use the latest Freescale LTIB BSP with Linux 3.0.35.
My company now has its own custom board, based on the ARM2 board design, and for board bring-up we also used the ARM2 configuration as the reference Linux kernel configuration.
During and after bring-up, we noticed many crashes with varying symptoms, all of which look like some sort of memory corruption (invalid instructions, page table corruption, etc.). Some unexplained hangs were noted as well.
So far we have found that these crashes are related to:
1. The rate of hardware interrupts, regardless of their origin (I2C, MMC, USB): a low rate gives fewer crashes, a high rate gives more crashes, and a very high rate again gives fewer crashes.
2. The kernel function arch_idle_with_workaround() in system.c.
We managed to work around the problem either by using a newer revision of the i.MX6 (rev 1.2, or AC) or, with the previous silicon (rev 1.1), by running enough processes that the processor does not become "idle" as often.
What puzzles me is why we see these crashes on our board but not on the Sabrelite (which uses rev 1.0 or rev 1.1 silicon)?
I suspect it is one of the following:
1. We have been using the ARM2 Linux configuration as a reference, so perhaps we missed kernel updates that do not affect the ARM2 configuration because it is not a commonly used board.
2. There is a mismatch in our clock configuration between the kernel and the board (e.g. DDR clocks).
Can anyone assist?
Can you suggest other things we should check?
And generally speaking, is it safe to use the Linux kernel ARM2 configuration with the new Freescale LTIB releases?
* We have JTAG gear and software (DSTREAM/DS-5), but have so far failed to use it for debugging the processor on our board.
The latest BSP does not support the old chips (for example, TO1.0/TO1.1 for i.MX6Q and TO1.0 for i.MX6DL).
It will crash on the old chips with the latest FSL BSP release.
If you have an old board with old chips, please try adding enable_wait_mode=off to the kernel command line to disable wait mode.
Actually, we have been using the latest BSP with the old chips (1.0, 1.1) successfully so far on BD.SL (Sabrelite) boards. I suppose that if the BSP runs fine on the Sabrelite, it should also work properly on our custom board. Am I missing something?
We continued debugging the issue and found that with the new i.MX6 revision (TO1.2) we see similar failures.
When we removed all CPU frequency scaling features from the Linux kernel configuration, the board became stable (but very hot, as the CPU frequency is always high).
Contrary to what we expected, a kernel with CPU frequency scaling disabled still crashed from time to time on our TO1.1 board.
As I mentioned before, we suspect the issue is related to the CPU clock rate configuration, probably because we have LPDDR2 rather than DDR3.
Can you please check the following points so that we can narrow down the root cause?
- Does adding the "enable_wait_mode=off" option to the command line help? From your earlier description, the hang happens in arch_idle_with_workaround(), which means wait mode is enabled.
- Have you checked whether it is related to CPU voltage rather than CPU frequency switching? Can you please provide the failure logs?
Eventually, after digging into the kernel configuration, we noticed that LDO bypass was disabled. Enabling it resolved most of our hangs and crashes.
As we are still investigating these hangs (whose root cause might also lie in the board manufacturing), this might not be the correct or only answer, but it certainly made our system more stable.