Hello,
I'm using an i.MX6 Dual Core running Linux 3.18 with all stock drivers. The application I wrote continuously pushes lots of data over the SPI bus (4 kB blocks every 20 ms).
Everything seems to work most of the time. However, once every 10-60 minutes I get the message: "I/O Error in DMA RX".
This message is issued by spi-imx because it doesn't get a callback from the RX DMA channel. I traced the problem back to the imx-sdma.c driver and found that the DMA interrupt handler is not being called with the RX interrupt flag set for the corresponding channel.
I tried running SPI at 5 MHz and 30 MHz; this did not affect how often the "I/O Error in DMA RX" messages appear.
Has anyone experienced such issues in the past?
Any suggestions as to what it could be, or what I should try next to debug it?
Is anyone running continuous SPI communication with the 3.18 kernel?
I would really appreciate any help!
Thank you!
After some debugging, the problem was identified. Here is the original code from the spi-imx.c driver in 3.18.4 (https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/spi/spi-imx.c):
939: dma_async_issue_pending(master->dma_tx);
940: dma_async_issue_pending(master->dma_rx);
This code starts the TX DMA channel first and the RX channel second. The TX channel is responsible for driving the clock line; RX just watches the clock, reads MISO, and pushes the data into the buffer. The issue is that TX is started first, and if the start of RX is delayed by another interrupt or event, the RX channel starts late and misses some data at the beginning. It therefore never receives the full transfer and sits waiting for more clocks (but TX is already done at this point). This trips the RX_DONE timeout timer, and the driver prints "I/O Error in DMA RX".
It can be fixed by swapping lines 939 and 940 and/or adding a spinlock. Here is what the fix looks like:
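static DEFINE_SPINLOCK(start_lock);  /* assumed: file-scope lock, not shown in the original snippet */
unsigned long start_flags;           /* assumed: local variable for the saved IRQ state */
/* Arm RX before TX starts driving SCLK; local IRQs are off so nothing on this CPU runs in between */
spin_lock_irqsave(&start_lock, start_flags);
dma_async_issue_pending(master->dma_rx);
dma_async_issue_pending(master->dma_tx);
spin_unlock_irqrestore(&start_lock, start_flags);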
I'm not 100% sure whether we need to go to the extent of changing the DMA driver so that both channels can be enabled with a single write to (sdma->regs + SDMA_H_START). Can the Freescale support engineers review the code and suggest whether this fix is enough?
The offered solution seems to fix the race condition between the TX and RX DMA channels.
We should be submitting the bug report/patch to kernel.org shortly.
Hello,
Has the engineer come back from vacation yet?
I am also having issues trying to get SPI in DMA mode working with i.MX6 DualLite despite trying out the fixes mentioned in the posts above.
I would really appreciate it if anyone could say whether the TKT238285 hardware issue is a blocker for getting DMA mode working on i.MX6 DualLite at all.
Dirk, thank you for the post! This is what I found out myself last Thursday.
I use SPI to push 16 kB transfers as often as possible at 15 MHz. I see this error once every 3-5 minutes. At 10 MHz it happens more often, and at 1 MHz almost immediately.
At 30 MHz it happens less than once an hour. I've also noticed that sometimes, during a 16 kB transfer, the TX FIFO pushed not 1 but 2 extra bytes (the extra bytes appear at random places in the 16 kB block, not at the end). Another interesting fact: because of the way the TX and RX channels are set up, if TX clocks 1 or 2 extra bytes, the slave sends 1 or 2 extra bytes, and it seems that at the end of the DMA transfer this extra data remains in the RXFIFO and becomes part of the RX buffer during the next transfer.
After I switched to PIO mode, things seem to work well: I ran a few overnight loop-back tests at 15 MHz and 20 MHz, and the data looks fine. But latency in PIO mode is poor, and I'm still trying to do something about it; I'd much rather use DMA. Also, since the ECSPI itself works, this looks like a bug in the SDMA (which is programmable), so there is a chance it can be fixed...
Anyway, I'm joining Dirk:
Can anybody provide any help with possible workarounds, or at least information about the "TKT238285 hardware issue"? It is not even in the errata.
Does anybody know where to find detailed information about the "TKT238285 hardware issue"?
Looking into the Chip Errata for the i.MX 6Solo/6DualLite (IMX6SDLCE Rev. 5, 12/2014), no "TKT" numbers are used there.
I have to disagree. The DMA support in spi-imx.c is totally broken, and your patch doesn't change anything for me. On my Wandboard Quad I cannot get correct results from the simple test included in the kernel documentation unless I remove the DMA code entirely. This happens with kernels 3.18.x and 3.19-rc: after shorting the MISO and MOSI lines, the bytes I read back are never correctly aligned, bytes are missing or zeroed, and the word length is ignored; on the logic analyzer, clocks are missing or badly deformed, etc.
Reverting to PIO mode restores the correct SPI functionality.
A.Vignani
Hello Alberto,
This could be interesting for you: https://patchwork.kernel.org/patch/5908171/, https://patchwork.kernel.org/patch/5908181/, https://patchwork.kernel.org/patch/5908191/
These three patches fix DMA mode itself, fix the timeout on long transactions, and add support for 16- and 32-bit SPI words.
Hi Max,
Have you verified whether the issue is also present on the Freescale BSP Linux L3.0.35_4.1.0? Linux 3.18 is not officially supported and the drivers may not be fully integrated, so testing with the mentioned BSP version is recommended.
Hope this will be useful for you.
Best regards!
/Carlos
Carlos, I believe I looked at the latest Freescale BSP for i.MX, and it seems the SPI driver supplied with it does not support DMA mode.
However, we found the bug in the SPI/DMA driver in 3.18. I will post a solution as a reply to my original post.
Thanks for the suggestion!