iMX28 audio stalls irrecoverably

1,925 Views
raúlssiles
Contributor II

Hello:

We have developed a device based on the i.MX28 EVK, with some modifications, including a TAS5760 audio codec.

The software is based on the L2.6.35_1.1.0_130130 BSP, which uses the ltib tool to build the image.

On top of that we run our application, which drives the audio device through alsa-lib 1.0.14 (included in the BSP).

After long audio tests (hours), the audio stalls irrecoverably. We tried to recover from the stall through the alsa-lib API in different ways (see below), by closing and reopening the sound device (snd_pcm_close/snd_pcm_open), and even by killing the application and starting it again. The only way to recover sound is to restart the whole system.

This is the sound configuration dump (from alsa-lib) from one of our latest tests.


stream       : PLAYBACK
access       : MMAP_INTERLEAVED
format       : S16_LE
subformat    : STD
channels     : 1
rate         : 48000
exact rate   : 48000 (48000/1)
msbits       : 16
buffer_size  : 16384
period_size  : 2048
period_time  : 42666
tstamp_mode  : NONE
period_step  : 1
avail_min    : 2048
period_event : 0
start_threshold  : 16384
stop_threshold   : 16384
silence_threshold: 0
silence_size : 0
boundary     : 1073741824
state       : PREPARED
trigger_time: 0.000000
tstamp      : 30989.487253764
delay       : 0
avail       : 16384
avail_max   : 0
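
The timing implied by this dump can be checked with a little arithmetic. The following is only a sketch: the helper names `frames_to_us` and `us_to_frames` are hypothetical and are not part of alsa-lib; the numbers come from the dump above and the approximate 100 ms call interval of our audio routine.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of `frames` frames at `rate_hz`, in microseconds (truncated).
 * Hypothetical helper, not an alsa-lib function. */
uint64_t frames_to_us(uint64_t frames, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return frames * 1000000ULL / rate_hz;
}

/* Frames the hardware consumes during `interval_us` at `rate_hz`. */
uint64_t us_to_frames(uint64_t interval_us, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return interval_us * rate_hz / 1000000ULL;
}
```

With rate 48000, `frames_to_us(2048, 48000)` gives 42666, which matches the reported period_time, and `frames_to_us(16384, 48000)` gives 341333, so a full buffer holds roughly 341 ms of audio. A call every 100 ms corresponds to `us_to_frames(100000, 48000)` = 4800 frames, a little more than two periods consumed between calls.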

 

Our audio routine is called approximately every 100 ms. When there is audio to play, we check with snd_pcm_avail whether there is room in the audio ring buffer for more samples; if not, we wait for the next call. We write sample chunks for as long as there is enough room in the ring buffer. We do some basic error checking to handle underruns and audio readiness (snd_pcm_state). Audio also often needs to be stopped; in that case we call snd_pcm_drop.

 

Once the test has started, the audio stalls unpredictably; it may take anywhere from 30 minutes to 10 hours. When the audio stalls, snd_pcm_avail always returns 0 (or a very low value) and snd_pcm_state returns 3 (SND_PCM_STATE_RUNNING), but no audio can be heard. At this point /proc/interrupts shows that the MXS PCM DMA channel interrupt count has also stalled: it no longer increases.

Shortly before this point, while audio still worked, the audio ring buffer was empty and the audio routine filled it until there was no room for another audio chunk. While the audio behaved correctly, the free space in the ring buffer had grown by the time the audio routine was called again, meaning that audio was being sent to the codec.
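
The stall signature described here (stream reported as RUNNING while the DMA interrupt count stops increasing) could be detected from userspace along the following lines. This is a minimal sketch under stated assumptions: all function and struct names are hypothetical, the caller is assumed to read /proc/interrupts line by line and to obtain the running flag from snd_pcm_state, and only the first counter after the colon is parsed, which is sufficient on a single-core part like the i.MX28.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* If `line` (one line of /proc/interrupts) mentions `name`, store the first
 * counter after the colon in *count and return true. Hypothetical helper. */
bool irq_count_from_line(const char *line, const char *name,
                         unsigned long *count)
{
    assert(line && name && count);
    if (strstr(line, name) == NULL)
        return false;
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return false;
    char *end;
    unsigned long value = strtoul(colon + 1, &end, 10);
    if (end == colon + 1)
        return false;               /* no digits after the colon */
    *count = value;
    return true;
}

struct stall_detector {
    unsigned long last_irq_count;   /* count seen at the previous poll */
    unsigned stalled_polls;         /* consecutive polls without progress */
    unsigned threshold;             /* polls without progress = stall */
    bool primed;                    /* false until the first poll */
};

void stall_detector_init(struct stall_detector *d, unsigned threshold)
{
    assert(d && threshold > 0);
    d->last_irq_count = 0;
    d->stalled_polls = 0;
    d->threshold = threshold;
    d->primed = false;
}

/* Feed one poll. `running` is true when the stream is in the RUNNING state.
 * Returns true once the count has not moved for `threshold` polls in a row. */
bool stall_detector_poll(struct stall_detector *d, bool running,
                         unsigned long irq_count)
{
    assert(d);
    if (!d->primed || !running || irq_count != d->last_irq_count) {
        d->stalled_polls = 0;
        d->primed = true;
    } else {
        d->stalled_polls++;
    }
    d->last_irq_count = irq_count;
    return d->stalled_polls >= d->threshold;
}
```

Polling this from a routine that runs every 100 ms with a threshold of a few polls gives a stall indication within well under a second. Note that detection alone does not help recovery, since in our case closing and reopening the device does not bring the audio back.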

 

In case it helps, the kernel occasionally logs a message like:

klogd: [ 5129.800000] mxs_pcm_dma_irq: DMA audio channel 20 (mxs-saif-0) error

 

From the userspace point of view, we tried both the SND_PCM_ACCESS_MMAP_INTERLEAVED and SND_PCM_ACCESS_RW_INTERLEAVED modes. We also tried using snd_pcm_avail_update as opposed to snd_pcm_avail. None of these tests changed the audio behaviour.

 

We would appreciate some directions to solve the issue.

Original Attachment has been moved to: xrun.zip

Original Attachment has been moved to: hw_ptr-skip_lost-interrupt.zip

Original Attachment has been moved to: appcrash_reboot.zip

Original Attachment has been moved to: lost-interrupt.zip

7 Replies

1,218 Views
abhijeetb89
Contributor I

Hello,

We are also facing a similar problem. We are using the OSS driver instead of ALSA. After playing and stopping .wav files for some time, the audio stalls and we can see the message "Is the DMA channel dead?" in the kernel logs.

I would like to know if any solution exists for this problem.

Also, I would appreciate it if you could tell us how to dump the AHB-to-APBX registers.

Thanks.


1,218 Views
raúlssiles
Contributor II

Hello Abhijeet:

Since NXP support did not meet our expectations in this case, I'll try to guide you despite my limited time.

When we researched the issue, we found that it was caused by an undocumented misbehaviour of the chip. Someone else shed light on it, and the problem is actually mitigated in the vanilla kernel. Take a look at Mr. Pargmann's related work:

kernel/git/stable/linux-stable.git - Linux kernel stable tree 

There may be other related commits and some discussion on the linux-arm-kernel mailing list.

[v4,3/5] dma: mxs-dma: Fix channel reset hardware bug - Patchwork 

Mr. Pargmann and Pengutronix deserve all the credit.

Regarding the register dump, we used a userspace physical memory dumper called memtool.

We have not revisited the topic since, so there may be more recent developments we are unaware of.

Hope this helps.


1,218 Views
raúlssiles
Contributor II

We investigated the issue a little further. We took register dumps and found a few things.

This is the register dump *AFTER* an audio stall.

SAIF0

0x80042000: 0x08000801
0x80042010: 0x80000050
0x80042020: 0x00000000
0x80042030: 0x01010000

APBX DMA channel 4
0x800242C0: 0x40800780
0x800242D0: 0x408007E0
0x800242E0: 0x2000000E
0x800242F0: 0x40820020
0x80024300: 0x00010000
0x80024310: 0x0160000D
0x80024320: 0x20001FE0

APBX DMA
0x80024000: 0x00000000
0x80024010: 0xF0530000
0x80024020: 0x00000000
0x80024030: 0x00000000
0x80024040: 0x00000000

After this, we stopped the application and got the following (some comments are added):

0x80024000:  00000000
0x80024010:  00500000
0x80024020:  00000000
0x80024030:  00100000 SAIF0 reset is not automatically cleared (unable to free resources)
0x80024040:  00000000

0x800242C0:  408008A0
0x800242D0:  40800780
0x800242E0:  0008000E
0x800242F0:  40828000
0x80024300:  00020000
0x80024310:  01A00008 State 0x8 READ_FLUSH
0x80024320:  00080000

It looks like the SAIF0 DMA channel reset cannot complete. I wonder why.
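
For anyone reading the dumps, the values can be decoded roughly as follows. This is only a sketch: the function names are hypothetical, and the field layout is an assumption taken from my reading of the i.MX28 reference manual (RESET_CHANNEL in bits 31:16 and FREEZE_CHANNEL in bits 15:0 of the channel control register at 0x80024030, the state machine in bits 4:0 of the channel DEBUG1 register, and per-channel registers starting at offset 0x100 with a stride of 0x70). Please check it against the manual before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APBX_BASE 0x80024000u

/* Address of per-channel register number `index` (0 = first register of
 * the channel block, 5 = DEBUG1). Offsets and stride are assumptions. */
uint32_t apbx_channel_reg_addr(unsigned channel, unsigned index)
{
    assert(channel < 16 && index < 7);
    return APBX_BASE + 0x100u + channel * 0x70u + index * 0x10u;
}

/* True if the reset bit for `channel` is still set in the channel
 * control register value (assumed to live in bits 31:16). */
bool apbx_reset_pending(uint32_t channel_ctrl, unsigned channel)
{
    assert(channel < 16);
    return ((channel_ctrl >> 16) >> channel) & 1u;
}

/* True if the freeze bit for `channel` is set (assumed bits 15:0). */
bool apbx_frozen(uint32_t channel_ctrl, unsigned channel)
{
    assert(channel < 16);
    return (channel_ctrl >> channel) & 1u;
}

/* DMA state machine value from a DEBUG1 register value (assumed bits 4:0). */
unsigned apbx_debug1_state(uint32_t debug1)
{
    return debug1 & 0x1Fu;
}
```

Applied to the dumps above: `apbx_channel_reg_addr(4, 0)` is 0x800242C0 and `apbx_channel_reg_addr(4, 5)` is 0x80024310, matching the channel 4 addresses listed. The value 0x00100000 at 0x80024030 has the reset bit for channel 4 set, and 0x01A00008 at 0x80024310 decodes to state 0x8, in line with the comments added to the second dump.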

Let me know if you need further information.


1,218 Views
igorpadykov
NXP Employee

Hi Raúl 

Could you try the patch from

i.MX28: aplay fails after playing a WAV file many times 

Other options for narrowing down the problem would be to reproduce the issue on the i.MX28 EVK board, to use the OBDS standalone audio test, and additionally to test the memory.

https://community.freescale.com/docs/DOC-1455

https://community.freescale.com/message/375692#375692

Lab and Test Software (1)
On-Board Diagnostic Suit for the i.MX28 (REV 1)
http://www.nxp.com/products/software-and-tools/software-development-tools/i.mx-software-and-tools/i....

Best regards
igor

1,218 Views
raúlssiles
Contributor II

Hi again. I happened to read the i.MX28 Chip Errata and I wonder if we are hitting this one:

ENGR121616
DMA: APBH/APBX DMA channel can stall while waiting to access a APBH/APBX bus peripheral when the channel freeze bit is set

Thanks.


1,218 Views
raúlssiles
Contributor II

Rereading my description of the problem, I noticed that I incorrectly stated that we are using the L2.6.35_1.1.0_130130 SDK, whereas we are actually using L2.6.35_11.09.01_ER_source. I'm sorry about this.

We will therefore give the 1.1.0 release a try, but I'd appreciate it if you could tell us in advance whether the upgrade is likely to be relevant to this issue.

Apologies for the inconvenience.


1,218 Views
raúlssiles
Contributor II

We applied the patch you suggested. We also enabled some sound debug options to make debugging easier, namely:

CONFIG_SND_VERBOSE_PROCFS=y

CONFIG_SND_VERBOSE_PRINTK=y
CONFIG_SND_DEBUG=y
CONFIG_SND_PCM_XRUN_DEBUG=y

After this we eventually got the audio system to stall again. So far the problem is very hard to reproduce, so we are investing some effort in finding a method to reproduce it. Once the stall happens, I can't really say the logs show anything different; what we see are some kernel messages reporting issues that may or may not be relevant to the problem. In any case, I think these latter messages appear because of the kernel debug options enabled above.

I've attached some logs to this thread; I'll summarise what I see in them.
(Note: because of the time the tests take, they ran unattended most of the time, so the behaviour is not fully known.)
- xrun: Possibly one of the calls to snd_pcm_avail went wrong; there was an underrun and the kernel reported it. The application also detected and handled the error. I *think* audio continued after this but eventually stalled.
- lost-interrupt: The kernel reports "mxs_pcm_dma_irq: DMA audio channel 20 (mxs-saif-0) error" and a backtrace dump is triggered; I presume it is triggered by some inconsistency in the audio hw_ptr. Neither the mxs_pcm_dma_irq error nor the later underrun is fatal for the audio subsystem. Playback continues after that before it finally stalls.
- hw_ptr-skip_lost-interrupt: Several "hw_ptr skipping" and "PCM: Lost interrupts?" messages. I can't tell whether sound is still working after these traces, but I thought they might give some clue about the problem.
- appcrash_reboot: In previous tests, the application did nothing once the audio stalled. In this case, when the audio is detected to be stalled, it is restarted, that is: closed (snd_pcm_close), reopened, reconfigured and used again. Unfortunately, this also fails at the first step, since the close is not possible according to the error "Is the DMA channel dead?", which is part of the suggested and applied patch.

As I said, we will focus our efforts on reproducing the issue, first on our system and, if possible, on the EVK. In any case, pointers are welcome.

Thanks and regards,
