imx6 SPI in DMA mode sometimes throws "I/O Error in DMA RX"

Solved

maxmaxmax
Contributor II

Hello,

I'm using an i.MX6 Dual Core running Linux 3.18, with all stock drivers. My application continuously pushes a lot of data over the SPI bus (4 kB blocks every 20 ms).

Everything works most of the time. However, once every 10 to 60 minutes I get the message "I/O Error in DMA RX".

This message is issued by spi-imx because it does not receive a completion callback from the RX DMA channel. I traced the problem back to the imx-sdma.c driver and found that the DMA interrupt handler is never called with the RX interrupt flag set for the corresponding channel.

I tried running SPI at 5 MHz and at 30 MHz; this did not affect how often the "I/O Error in DMA RX" message appears.

Has anyone experienced such issues before?

Any suggestions as to what it could be, or what I should try next to debug it?

Is anyone running continuous SPI communication with the 3.18 kernel?

I would really appreciate any help!

Thank you!

1 Solution
maxmaxmax
Contributor II

After some debugging, the problem was identified. Here is the original code from the spi-imx.c driver in 3.18.4 (https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/spi/spi-imx.c):


939: dma_async_issue_pending(master->dma_tx);
940: dma_async_issue_pending(master->dma_rx);

This code starts the TX DMA channel first and the RX channel second. The TX channel is responsible for driving the clock line; RX just follows the clock, samples MISO, and pushes the data into the buffer. Because TX starts first, if the start of RX is delayed by another interrupt or event, the RX channel starts late and misses some data at the beginning. It therefore never receives the full transfer and sits waiting for more clocks, but TX has already finished at this point. This trips the RX completion timeout, and the driver prints "I/O Error in DMA RX".

It can be fixed by swapping lines 939 and 940 and/or adding a spinlock. Here is what the fix looks like:

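/* Assumed declarations, not shown in the original fix: */
static DEFINE_SPINLOCK(start_lock);   /* at file scope */
unsigned long start_flags;            /* local to the transfer function */

/* Start RX first so it is already running when TX begins clocking;
 * irqsave keeps a local interrupt from landing between the two starts. */
spin_lock_irqsave(&start_lock, start_flags);
dma_async_issue_pending(master->dma_rx);
dma_async_issue_pending(master->dma_tx);
spin_unlock_irqrestore(&start_lock, start_flags);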

I'm not 100% sure whether we need to go as far as changing the DMA driver so that both channels can be enabled with a single write to (sdma->regs + SDMA_H_START). Could the Freescale support engineers review the code and say whether this fix is enough?


The offered solution seems to fix the race condition between the TX and RX DMA channels.

We should be submitting the bug report/patch to kernel.org shortly.

DirkBehme
Contributor III
rendy
NXP Employee

Hi,

TKT238285 is our internal ticket number; it ended up in mainline by mistake. I believe the erratum has been prepared but not yet released, and I cannot tell you more at this time. I will post more information when the assigned engineer returns from vacation.

Rene

DirkBehme, maxmaxmax

chikigai
Contributor II

Hello,

Has the engineer come back from vacation yet?

I am also having trouble getting SPI in DMA mode working on the i.MX6 DualLite, despite trying the fixes mentioned in the posts above.

I would really appreciate it if anyone could say whether the TKT238285 hardware issue prevents DMA mode from working on the i.MX6 DualLite at all.

maxmaxmax
Contributor II

Dirk, thank you for the post! This is what I found out myself last Thursday :(

I use SPI to push 16 kB transfers as often as possible at 15 MHz, and I see this error once every 3 to 5 minutes. At 10 MHz it happens more often, and at 1 MHz almost immediately.

At 30 MHz it happens less than once an hour. I have also noticed that sometimes, during a 16 kB transfer, the TXFIFO pushed not 1 but 2 extra bytes (the extra bytes appear at random places in the 16 kB block, not at the end). Another interesting fact: because of the way the TX and RX channels are set up, if TX clocks 1 or 2 extra bytes, the slave sends 1 or 2 bytes back, and this extra data appears to remain in the RXFIFO at the end of the DMA transfer and become part of the RX buffer during the next transfer.

I switched to PIO mode and it seems to work well: I ran a few overnight loopback tests at 15 MHz and 20 MHz, and the data looks fine. But latency in PIO mode is poor, and I am still trying to do something about it; I would much rather use DMA. Also, since the ECSPI itself works, this looks like a bug in the SDMA (which is programmable), so there is a chance it can be fixed...

Anyway, I join Dirk in asking:

Can anybody help with possible workarounds, or at least provide information about the "TKT238285 hardware issue"? It is not even in the errata. :(

DirkBehme
Contributor III

Does anybody know where to find detailed information about the "TKT238285 hardware issue"?

Looking at the Chip Errata for the i.MX 6Solo/6DualLite (IMX6SDLCE Rev. 5, 12/2014), no "TKT" numbers are used there.

albertovignani
Contributor I

I have to disagree. The DMA support in spi-imx.c is totally broken, and your patch does not change anything for me. On my Wandboard Quad I cannot get correct results from the simple test included in the kernel documentation unless I remove the DMA code entirely. This happens with kernels 3.18.x and 3.19-rc: the bytes I read back with MISO and MOSI shorted are never correctly aligned, bytes are missing or zeroed, and the word length is ignored; on the logic analyzer, clock pulses are missing or badly distorted, etc.

Reverting to PIO mode restores the correct SPI functionality.

A.Vignani

antonb21
Contributor II

Hello Alberto,

This could be interesting for you: https://patchwork.kernel.org/patch/5908171/, https://patchwork.kernel.org/patch/5908181/, https://patchwork.kernel.org/patch/5908191/

These three patches fix DMA mode itself and the timeout caused by long transactions, and add support for 16- and 32-bit SPI words.

albertovignani
Contributor I

Yes, these patches work and fix all my problems (I am now on kernel 4.0.0-rc1). Thank you.

CarlosCasillas
NXP Employee

Hi Max,

Have you checked whether the issue is also present on the Freescale BSP Linux L3.0.35_4.1.0? Linux 3.18 is not officially supported, and the drivers may not be fully integrated, so we recommend testing with that BSP version.


Hope this will be useful for you.
Best regards!
/Carlos

maxmaxmax
Contributor II

Carlos, I believe I looked at the latest Freescale BSP for i.MX, and it seems the SPI driver supplied with it does not support DMA mode.

However, we found the bug in the SPI/DMA driver in 3.18. I will post a solution as a reply to my original post.

Thanks for the suggestion! :)
