After some debugging, the problem was identified. Here is the original code from the spi-imx.c driver in 3.18.4: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/spi/spi-imx.c
939: dma_async_issue_pending(master->dma_tx);
940: dma_async_issue_pending(master->dma_rx);
This code starts the TX DMA channel first and the RX channel second. The TX channel is responsible for driving the clock line; the RX channel just watches the clock, reads MISO, and pushes the data into the buffer. The issue is that TX is started first: if the start of RX is delayed by another interrupt or event, RX begins late and misses some data at the start of the transfer. It therefore never receives the full transfer and sits waiting for more clocks, but TX is already done at this point. This trips the RX_DONE timeout timer, and the driver reports "I/O Error in RX DMA".
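To make the timing argument concrete, here is a minimal sketch of the race as a toy model. It is not driver code: the tick-based timing and the function names words_captured and rx_completes are assumptions made purely for illustration.

```c
/* Simplified model, not driver code (names and tick timing are hypothetical).
 * TX clocks out one word per tick starting at tx_start; RX captures a word
 * only on ticks at or after rx_start. Returns how many of the n_words
 * RX actually captures. */
static int words_captured(int n_words, int tx_start, int rx_start)
{
    int captured = 0;

    for (int i = 0; i < n_words; i++) {
        int tick = tx_start + i;    /* tick on which word i is clocked out */

        if (tick >= rx_start)
            captured++;
    }
    return captured;
}

/* RX finishes only if it saw every word; otherwise it keeps waiting for
 * clocks that never come and eventually hits the timeout. */
static int rx_completes(int n_words, int tx_start, int rx_start)
{
    return words_captured(n_words, tx_start, rx_start) == n_words;
}
```

In this model, words_captured(8, 0, 3) captures only 5 of 8 words (TX first, RX delayed by three ticks), while words_captured(8, 3, 0) captures all 8 (RX armed first, so a delay before TX is harmless).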
It can be fixed by swapping lines 939 and 940, and/or adding a spin_lock. Here is what the fix looks like:
spin_lock_irqsave(&start_lock, start_flags);
dma_async_issue_pending(master->dma_rx);  /* arm RX first so it is ready before any clock */
dma_async_issue_pending(master->dma_tx);  /* start TX last: it drives the clock */
spin_unlock_irqrestore(&start_lock, start_flags);
I'm not 100% sure whether we need to go to the extent of changing the DMA driver so that both channels can be enabled with a single write to (sdma->regs + SDMA_H_START). Can Freescale support engineers review the code and suggest whether this fix is enough?
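For reference, the single-write idea could be sketched roughly as below. This is a user-space mock, not a patch: fake_h_start stands in for the register at (sdma->regs + SDMA_H_START), the helper names are hypothetical, and it assumes the register starts one channel per set bit and that channel numbers are below 32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SDMA start register; a real driver would
 * write to (sdma->regs + SDMA_H_START) rather than to a variable. */
static uint32_t fake_h_start;

/* Build one mask with a bit per channel (assumes channel numbers < 32). */
static uint32_t start_mask(unsigned int rx_channel, unsigned int tx_channel)
{
    return (UINT32_C(1) << rx_channel) | (UINT32_C(1) << tx_channel);
}

/* Start both channels with one write, so no interrupt can land between
 * the RX start and the TX start. */
static void start_both(unsigned int rx_channel, unsigned int tx_channel)
{
    fake_h_start = start_mask(rx_channel, tx_channel);
}
```

Whether the hardware and the dmaengine API allow this cleanly is exactly the open question above; the sketch only shows the intent.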
The proposed solution seems to fix the race condition between the TX and RX DMA channels.
We plan to submit the bug report and patch to kernel.org shortly.