Hang in sDMA driver on Linux

matthewcampbell
Contributor III

We are working on a custom design using the i.MX6 SoloLite, based on the MCIMX6SLEVK. Our kernel is based on the Freescale 3.14.52_1.0.0ga release. The application we are developing uses ALSA for audio and is encountering serious hangs that seem to originate in the sDMA driver. When an audio underrun/overrun occurs, the sDMA driver enters an unrecoverable error state. From examining the driver (drivers/dma/imx-sdma.c), it is unclear to us how it could detect error conditions in the sDMA processor, its scripts, etc.

It looks like the DMA Request Error Register (SDMAARM_EVTERR) is the mechanism the sDMA processor uses to report errors to the ARM core, yet the sDMA driver never accesses this register. By reading it manually, we can verify that an error flag for our channel (SSI2) is pending; because the register is read-to-clear, this shows the driver never read it. In addition, the Linux sDMA driver is left in an inconsistent state: it believes the DMA is in progress (sdmac->status == DMA_IN_PROGRESS), but the SDMA processor has reported an error, the SDMA script running on the channel has halted, and the kernel receives no further sDMA interrupts.
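A minimal sketch of how a driver could consume EVTERR, modeled in plain C so it runs outside the kernel. In the driver the read would be something like readl(sdma->regs + SDMA_H_EVTERR) in the interrupt handler; I believe imx-sdma.c already defines SDMA_H_EVTERR as 0x028 even though nothing reads it. Assumptions: bit N corresponds to DMA request (event) N rather than to an SDMA channel number, and the helper names are hypothetical. Because the register clears on read, the snapshot must be taken exactly once and kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the DMA Request Error register in the SDMA block, as
 * (I believe) defined in drivers/dma/imx-sdma.c; verify in your RM. */
#define SDMA_H_EVTERR 0x028

/* Hypothetical helper: is DMA request 'event' flagged in an EVTERR
 * snapshot? Assumes bit N maps to DMA request N. */
bool evterr_event_flagged(uint32_t evterr, unsigned int event)
{
    assert(event < 32);
    return (evterr >> event) & 1u;
}

/* Hypothetical model of a read-to-clear register: 'reg' stands in for
 * the hardware register, which the hardware zeroes once it is read. */
uint32_t evterr_read_and_clear(uint32_t *reg)
{
    uint32_t snapshot = *reg;
    *reg = 0; /* the act of reading clears the hardware flags */
    return snapshot;
}
```

The event number for SSI2 depends on the SoC's DMA request map, so it is deliberately not filled in here; a second read returning 0 is exactly why a manual read after the fact still sees the flag only if the driver never looked.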

The user-facing manifestation of this problem is a permanent audio dropout, followed by occasional messages from ALSA: "capture write error (DMA or IRQ trouble?)".

Strangely, when we check the buffer descriptors, their error bit is not set. We cannot investigate any further because the sDMA scripts and firmware are closed source. We would appreciate any direction or advice on how to debug this problem, and any insight into the architecture of the sDMA driver that we may be missing.

One final note: we saw similar behavior on the i.MX6 Quad (SabreSD board) with the SGTL5000 codec.

Thanks,

~Matt

paul_katarzis
Contributor III

I believe we are experiencing the same problem. We are using a UDOO Quad (i.MX6) with UDOObuntu 2.2.0. The UDOObuntu 2.2.0 kernel provides the fsl-ssi, imx-pcm-dma, and imx-sdma drivers for building software that talks to audio devices through ALSA. For certain period and buffer size combinations (a very small period size and few periods in the stream buffer), we noticed that audio would cut out unexpectedly. When this happened, ALSA reported errors that appeared to indicate a buffer xrun. We found that the errors occurred because snd_pcm_period_elapsed was no longer being called. The SDMA interrupt handler ultimately leads to this function being called, but the interrupt handler itself was no longer being triggered.
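Since the visible symptom is that periods simply stop completing, an application can at least detect the stall instead of waiting on ALSA's eventual error. A minimal sketch, assuming the application records a monotonic timestamp (for example from clock_gettime(CLOCK_MONOTONIC)) each time it observes a new period; the function name and the slack threshold are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: report a stall when no period has completed for
 * more than 'slack_periods' period durations. All times are in
 * microseconds from the same monotonic clock. */
bool period_stalled(uint64_t last_period_us, uint64_t now_us,
                    uint64_t period_us, unsigned int slack_periods)
{
    assert(period_us > 0 && now_us >= last_period_us);
    return now_us - last_period_us > (uint64_t)slack_periods * period_us;
}
```

A generous slack (several periods) avoids false alarms from ordinary scheduling jitter while still flagging the permanent dropout long before a human notices.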

We thought that perhaps there was a race condition between the SDMA ROM scripts and imx-sdma driver modifying the D bits of a channel's BDs. If all D bits become 0, the SDMA ROM scripts would halt and no longer transfer data even if the SSI peripherals want to initiate a transfer. However, we examined the BDs and saw that all of the D bits were set to 1 when the audio cut out.
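For this kind of check, it helps to summarize a captured BD ring mechanically. A minimal sketch: the flag values below mirror the BD_* defines I recall from drivers/dma/imx-sdma.c (verify against your tree), where each BD's flags live in an 8-bit status field (bd->mode.status). The struct and function names are hypothetical; the input is simply the status bytes copied out of the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BD status bits, mirroring the BD_* defines in imx-sdma.c. */
#define BD_DONE 0x01 /* D: owned by the SDMA, ready to transfer */
#define BD_WRAP 0x02 /* W: last BD in the ring, wrap to the first */
#define BD_CONT 0x04 /* C: continue with the next BD */
#define BD_INTR 0x08 /* I: interrupt the ARM when this BD completes */
#define BD_RROR 0x10 /* R: error reported for this BD */
#define BD_LAST 0x20 /* L: last BD of a transfer */
#define BD_EXTD 0x80 /* extended BD format */

struct bd_scan {
    size_t owned_by_sdma; /* BDs with D = 1 */
    size_t errors;        /* BDs with R = 1 */
};

/* Hypothetical helper: count D and R bits across a captured ring. */
struct bd_scan scan_bd_ring(const uint8_t *status, size_t n)
{
    struct bd_scan s = {0, 0};
    assert(status != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (status[i] & BD_DONE)
            s.owned_by_sdma++;
        if (status[i] & BD_RROR)
            s.errors++;
    }
    return s;
}
```

The state described here and in the first post (every D bit set, no R bit set) corresponds to owned_by_sdma == n and errors == 0, which is what rules out simple BD starvation at the moment the ring is inspected.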

We examined the SDMA ROM scripts and found points where the SDMA core could execute a software breakpoint. However, the SDMAARM_ONCE_STAT register did not show that a software breakpoint had been executed when the audio cut out. Instead, SDMAARM_ONCE_STAT indicated that the SDMA core was always sleeping. The SDMA core does not seem to leave this strange sleep state even when SDMAARM_EVTPEND is nonzero.
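The condition being described (core asleep, no software breakpoint, events pending) can be expressed as a single predicate over two register snapshots. A minimal sketch, assuming the SDMAARM_ONCE_STAT layout as I recall it from the i.MX6 reference manual: PST (processor status) in bits 15:12, with 0x6 meaning "Sleep" and 0xE "Sleep after Reset", and SWB (software breakpoint) in bit 8. Treat those positions as assumptions to check against your manual; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SDMAARM_ONCE_STAT fields; verify against the RM. */
#define ONCE_STAT_SWB        (1u << 8)               /* software breakpoint hit */
#define ONCE_STAT_PST(x)     (((x) >> 12) & 0xfu)    /* processor status */
#define PST_SLEEP            0x6u
#define PST_SLEEP_AFTER_RST  0xeu

/* Hypothetical check: the core reports sleep, no software breakpoint
 * was taken, yet at least one channel has an event pending. */
bool sdma_core_stuck_asleep(uint32_t once_stat, uint32_t evtpend)
{
    uint32_t pst = ONCE_STAT_PST(once_stat);
    bool asleep = (pst == PST_SLEEP || pst == PST_SLEEP_AFTER_RST);
    return asleep && !(once_stat & ONCE_STAT_SWB) && evtpend != 0;
}
```

Logging both raw values alongside this verdict when the dropout happens keeps the diagnosis honest if the assumed field layout turns out to be off.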

If we kill our audio application when the audio goes out and run it again, the SDMA works as expected again, but only briefly. We found we could mitigate the problem as follows:

  • If the period size must be small, dramatically increase the number of periods in the stream buffer. This translates to having a large number of BDs that point to small buffers.
  • If the stream buffer must be small, increase the period size and reduce the number of periods. This translates to having a small number of BDs that point to large buffers.
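One way to see why both mitigations help is to compute how far the CPU may fall behind before the SDMA runs out of armed BDs. A minimal sketch under an assumed model (not confirmed by the closed firmware): the SDMA consumes one BD per period, and the driver must re-arm a BD (set its D bit) before the SDMA wraps back to it, so the tolerated delay is about (periods - 1) period durations. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one period in microseconds (integer, truncated). */
uint64_t period_us(uint32_t period_frames, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return (uint64_t)period_frames * 1000000u / rate_hz;
}

/* Assumed model: with 'periods' BDs in the ring, the CPU may be late
 * re-arming BDs by at most (periods - 1) periods before the SDMA
 * reaches a BD it does not own. */
uint64_t rearm_headroom_us(uint32_t period_frames, uint32_t periods,
                           uint32_t rate_hz)
{
    assert(periods >= 2);
    return (uint64_t)(periods - 1) * period_us(period_frames, rate_hz);
}
```

At 48 kHz, 64-frame periods with 2 periods leave roughly 1.3 ms of headroom; raising the count to 32 periods gives about 41 ms, and switching to 2 periods of 1024 frames gives about 21 ms. Under this model, both bullet points work by widening the window in which the tasklet may run late.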
per_orback
Contributor II

Hi Matt,

Have you found a solution for the SDMA hang? We experience the same type of problem using SDMA with the UART on i.MX6DL. Our kernel is based on Freescale 3.14.52_2.0.0ga. After varying periods of use, the SDMA seems to stall and no longer triggers any interrupts.

jonahpetri
Contributor II

This issue seems to affect more than just audio - we can get the same behavior to appear in other systems which use the SDMA driver.

This seems to occur when the sdma_tasklet does not execute in time, or is preempted by some other thread.  This behavior is easily replicated on a busy system, or by introducing manual stalls into sdma_tasklet.

The SDMA script is still opaque, so we are quite stuck on where to go from here.  What information would be useful to provide?

igorpadykov
NXP TechSupport

Hi Matt, Jonah

All SDMA documentation is under NDA and can be provided through a service request.

The available public documentation was in the SDK (attached iMX6_Firmware_Guide.pdf), and it may be useful to look at the tutorial:

Freescale i.MX SDMA tutorial (part I)

Note that the NXP/FSL i.MX6 Sabre/EVK reference boards do not use the SGTL5000; they use the WM8962 and CS42888, which work fine with SDMA in the BSPs provided for these boards (see the attached Linux manual from the package below):

Board Support Packages (29)

L3.14.52_1.1.0_MX6QDLSOLO (REV L3.14.52_1.1.0)

http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/i.mx-applications-process...

For bringing up SDMA with custom BSPs and peripherals, it may be advisable to apply to

NXP Professional Services

Best regards

igor

matthewcampbell
Contributor III

Hi Igor,

I misspoke about the SGTL5000. We did see this issue with the WM8962 on the SabreSD evaluation board. Audio generally worked fine, but the DMA timeout issue described above would appear at random after a few hours of playback.

We currently have an NDA with NXP. Could you let me know whom I can contact about getting the SDMA documentation? We are confident that the bug exists in the Linux SDMA driver, but our efforts to fix it have stalled without further documentation. If we are unable to get the documentation, is there a way we can escalate this bug to a programming team that could confirm it and create a fix?

One last point: the Linux SDMA driver from the NXP BSP does several things that the documentation you provided above explicitly discourages. For example, the documentation states that you should never set both the 'CONT' and 'WRAP' bits on the final buffer descriptor, yet the cyclic DMA configuration of the Linux SDMA driver does exactly this. Additionally, the Linux SDMA driver sets a bit in the buffer descriptor that is marked as reserved in all the documentation we have access to.
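To make the CONT/WRAP point concrete, here is a minimal sketch of how the cyclic ring's status bytes appear to be built, based on my reading of sdma_prep_dma_cyclic() in the 3.14 driver (worth re-checking against your tree): every BD gets BD_DONE | BD_EXTD | BD_CONT | BD_INTR, and the final BD additionally gets BD_WRAP. Whether BD_EXTD is the bit the documentation calls reserved is a guess. The function names are hypothetical; the second one tests for the combination the firmware guide warns against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BD_DONE 0x01
#define BD_WRAP 0x02
#define BD_CONT 0x04
#define BD_INTR 0x08
#define BD_EXTD 0x80

/* Fill 'status' as the cyclic setup appears to: all BDs get
 * DONE | EXTD | CONT | INTR, and the last one also gets WRAP. */
void build_cyclic_status(uint8_t *status, size_t num_periods)
{
    assert(status != NULL && num_periods > 0);
    for (size_t i = 0; i < num_periods; i++) {
        uint8_t param = BD_DONE | BD_EXTD | BD_CONT | BD_INTR;
        if (i + 1 == num_periods)
            param |= BD_WRAP;
        status[i] = param;
    }
}

/* True if the final BD carries both CONT and WRAP. */
bool final_bd_has_cont_and_wrap(const uint8_t *status, size_t n)
{
    assert(status != NULL && n > 0);
    uint8_t last = status[n - 1];
    return (last & BD_CONT) && (last & BD_WRAP);
}
```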

Thank you,

~Matt

igorpadykov
NXP TechSupport

Hi Matt

To obtain SDMA documentation, you can create a service request.

For escalating driver issues, the preferred option is to ask your local FAE to create a ticket in the MPU support group.

Best regards

igor

jonahpetri
Contributor II

Hi Igor,

Thanks for the reply. We will try to file a service request to get some SDMA documentation. Is this issue known? There seem to be many reports of this behavior on this forum and elsewhere, but no solutions have been found. I think it likely that this is a driver problem.

My current theory is that the SDMA script finishes transferring a buffer and finds that no buffer descriptors are marked as DONE (that is, none are owned by the SDMA). My guess is that this situation results in an error in the SDMA script that goes unhandled by the driver. This would be easy to reproduce by stalling the CPU, as I described above.
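The theory can be checked for internal consistency with a toy model. This is a sketch of the hypothesis only, not of the real (closed) SDMA script: each period the SDMA finishes one BD and clears its D bit; the CPU tasklet normally runs once per period and sets D again on every finished BD; if the SDMA reaches a BD with D clear, it halts for good. The tasklet is skipped for a number of periods to mimic the manual stalls described earlier. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BDS 64

/* Returns true if the modeled SDMA halts within 'total_periods' when
 * the tasklet is skipped for 'stall_periods' periods from 'stall_at'. */
bool simulate_ring(size_t num_bds, unsigned int total_periods,
                   unsigned int stall_at, unsigned int stall_periods)
{
    bool done[MAX_BDS]; /* D bit per BD: true = owned by the SDMA */
    size_t cur = 0;     /* BD the SDMA will process next */

    assert(num_bds > 0 && num_bds <= MAX_BDS);
    for (size_t i = 0; i < num_bds; i++)
        done[i] = true;

    for (unsigned int t = 0; t < total_periods; t++) {
        /* SDMA side: halt if the next BD is not owned by the SDMA. */
        if (!done[cur])
            return true;
        done[cur] = false;
        cur = (cur + 1) % num_bds;

        /* CPU side: the tasklet re-arms every finished BD unless stalled. */
        bool stalled = t >= stall_at && t < stall_at + stall_periods;
        if (!stalled)
            for (size_t i = 0; i < num_bds; i++)
                done[i] = true;
    }
    return false;
}
```

In this model a ring of N BDs survives a tasklet stall of N - 1 periods and halts at N. It would also be consistent with the observation earlier in the thread that all D bits read 1 after the dropout: a late tasklet eventually re-arms every BD, but nothing restarts the halted channel.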

Is this correct?  If so, how do we restart the SDMA script at that point?
