
Intermittent Hangup on Custom Hardware

Question asked by Steve Anderson on May 4, 2015
Latest reply on Jun 1, 2015 by Steve Anderson

A new and interesting bug has landed in my lap, and I am fishing for ideas to trap it beyond those I list at the bottom of this note:

I THINK this is more MX6 than Android or Linux, but it is a 'fun' bug to try and trap - very elusive so far.

 

Unfortunately it occurs on custom hardware so it won't help anyone else to repro the problem.

I am told that the same versions, from the same software base, built for the SabreSD do not exhibit the problem, and I am taking that as an important clue.

 

The target system has an iMX6q and is using Android version 4.2.2:

- U-Boot 2009.08-00690-gee2c5a6

- Linux version 3.0.35-06147-g1b17dab

 

The problem is intermittent and related to playing media (either audio or video) in a repetitive loop.

It appears to happen when one iteration of the loop tries to start up before the media player has finished shutting down the previous instance, but this is mostly speculation based on what others have observed.


The symptom is pretty much that it stops dead, presenting a black screen - then the watchdog timer resets the system.

However, when I say intermittent, I mean that when running a 1-second video or audio clip back to back, it takes 2-8 hours for the bug to show up.

Therefore it is not something I can just sit and watch for; that is far longer than a typical person can watch attentively.

 

I have modified the watchdog driver to provide the watchdog pre-timeout ISR with a stack dump, and it all works well.

(Tested with a 30-second timeout, a 10-second refresh, and a 28-second pre-timeout interrupt.)

I have that ISR call dump_stack() and for the purposes of development and testing it all works swimmingly well.

I further instrumented the ISR to indicate by LEDs when the ISR is triggered...

 

I then set the watchdog for 30-second timeout and 10-second refresh with a 5-second pre-timeout interrupt...
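As a sanity check on those two configurations, here is a minimal sketch of the timing arithmetic. It assumes the pre-timeout is counted back from the expiry time (so the interrupt fires at timeout minus pre-timeout after the last refresh); the function names are made up for illustration and are not part of any driver.

```c
#include <stdbool.h>

/* Seconds after the last refresh at which the pre-timeout interrupt fires.
 * Assumption: the pre-timeout is counted back from the expiry time. */
int pretimeout_fire_offset(int timeout_s, int pretimeout_s)
{
    return timeout_s - pretimeout_s;
}

/* True if the interrupt fires even though refreshes arrive on schedule,
 * i.e. the fire offset is shorter than the refresh period. */
bool fires_while_healthy(int timeout_s, int pretimeout_s, int refresh_s)
{
    return pretimeout_fire_offset(timeout_s, pretimeout_s) < refresh_s;
}
```

Under that assumption, the test setup (30, 28, 10) fires 2 seconds after every refresh, which is why it was handy for exercising the ISR. The 5-second setup (30, 5, 10) fires 25 seconds after the last refresh, so only once two refreshes in a row have been missed.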

However, when the bug manifests itself, there is no stack dump... and no LED showing the ISR is even called.

Since there is also no illegal instruction trap, or access violation trap, or anything like that, I do conclude that the system is not running off the rails...

Since the pre-timeout ISR provides no stack dump, and the ISR is apparently not called, I also conclude that interrupts are disabled at the time of the failure...

 

I don't think an emulator will help me much: a reset would wipe the register values I need, and the attention-span problem makes it impractical to trap by hand.

Any emulator suggestions? Am I missing an obvious idea here?

 

I am just beginning to explore making the interrupt an FIQ instead of an IRQ.

In this case I want the FIQ for its non-maskable property. Speed is not an issue: I know the system is about to die, and whole seconds have gone by with no activity...

Any pointers or examples?  Any potential problems I am missing calling dump_stack() during an FIQ?

 

One idea suggested was to instrument the player code around startup and shutdown, to see which drivers may be invoked but not returned from...
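A rough sketch of what that instrumentation could look like, written as plain C so it can be tried off-target. Everything here is hypothetical (the `crumb_*` names, the depth, the call names in the usage note); on the real board the record would presumably need to go somewhere that survives the watchdog reset, such as the serial console.

```c
#include <stddef.h>

#define CRUMB_DEPTH 16  /* hypothetical maximum nesting that gets recorded */

/* Names of calls that have been entered but have not yet returned. */
static const char *crumbs[CRUMB_DEPTH];
static size_t crumb_top;

/* Record that a call is about to start. */
void crumb_enter(const char *name)
{
    if (crumb_top < CRUMB_DEPTH)
        crumbs[crumb_top] = name;
    crumb_top++;  /* still counted when too deep to record */
}

/* Record that the most recently entered call has returned. */
void crumb_exit(void)
{
    if (crumb_top > 0)
        crumb_top--;
}

/* Innermost call still outstanding, or NULL if everything has returned. */
const char *crumb_pending(void)
{
    if (crumb_top == 0)
        return NULL;
    if (crumb_top > CRUMB_DEPTH)
        return crumbs[CRUMB_DEPTH - 1];  /* deepest call that was recorded */
    return crumbs[crumb_top - 1];
}
```

Wrapping each player startup and shutdown step in `crumb_enter("some_step")` / `crumb_exit()` means that whatever `crumb_pending()` reports at the moment of the hang is the call that was entered but never came back.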

 

Am I missing some other relatively obvious course of action?

-- no content change, just trying to get rid of the assumed answered, which it wasn't...
