On our custom i.MX6Q board, we use SPI1 and SPI2 to communicate with other devices.
For a burn-in test, we wrote a program with many threads that access SPI1 and SPI2 simultaneously.
We found that one of the two SPI buses sometimes hangs in the "__spi_sync" function (checked with "cat /proc/xxxx/wchan").
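For anyone who wants the burn-in program itself to detect the hang, the wchan check could be done from C roughly as below. This is only a minimal sketch: the helper names (wchan_path, read_wchan, stuck_in_spi_sync) are made up for illustration, and it assumes the kernel exposes /proc/<pid>/wchan as a symbol name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "/proc/<pid>/wchan" in buf; returns 0 on success, -1 if buf is too small. */
static int wchan_path(char *buf, size_t len, pid_t pid)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/proc/%ld/wchan", (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the wait channel of pid into out; returns 0 on success, -1 on error. */
static int read_wchan(pid_t pid, char *out, size_t len)
{
    char path[64];
    if (len == 0 || wchan_path(path, sizeof(path), pid) != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(out, 1, len - 1, f);
    fclose(f);
    out[n] = '\0';
    /* Drop a trailing newline if the kernel happens to print one. */
    if (n > 0 && out[n - 1] == '\n')
        out[n - 1] = '\0';
    return 0;
}

/* Nonzero if pid is currently sleeping in __spi_sync (hypothetical helper). */
static int stuck_in_spi_sync(pid_t pid)
{
    char w[128];
    return read_wchan(pid, w, sizeof(w)) == 0 && strcmp(w, "__spi_sync") == 0;
}
```

A watchdog thread could call stuck_in_spi_sync() a few times, some seconds apart, and report a hang only if the answer stays nonzero, since a healthy transfer also sleeps there briefly.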
After tracing the SPI driver source code in the kernel, we found the following:
In "spi_imx.c", the function "spi_imx_transfer" calls "clk_enable" and "clk_disable".
But "spi_imx_transfer" is called with a spinlock held (see the "spi_async_locked" function in "spi.c").
I also checked "clock.c" and found that "clk_enable" and "clk_disable" are protected by a mutex.
This means the mutex is used in the wrong context: a mutex may sleep, and sleeping is not allowed while a spinlock is held.
I have now modified the driver to fix the problem and restarted the test. The board has been running for 16 hours without a hang.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update:
I found that "clk_enable" and "clk_disable" are only called from the work queue. It seems OK to take the mutex inside the work queue.
But I think there are still problems in the SPI bus driver.
I have tried two methods to fix the SPI hang issue:
1. Modify the SPI bus driver: enable the clock once in the probe function and delete the "clk_enable" and "clk_disable" calls in "spi_imx_transfer" and "spi_imx_setupxfer".
2. Modify the clk_enable and clk_disable functions to protect them with a spin lock instead of the mutex.
Both of them fix the hang in my test.
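To show the locking shape of method 2, here is a small userspace model. It is a sketch only, not the BSP's clock.c: struct clk_model and all function names are hypothetical, a POSIX spinlock stands in for the kernel spinlock, and it does not reproduce or explain the hang. It only shows a use-counted enable/disable pair whose critical section never sleeps.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for a gated clock; not the BSP's struct clk. */
struct clk_model {
    pthread_spinlock_t lock; /* stand-in for the kernel spinlock of method 2 */
    int usecount;            /* number of outstanding enables */
    int gate_on;             /* 1 while the modelled clock gate is open */
};

static int clk_model_init(struct clk_model *c)
{
    c->usecount = 0;
    c->gate_on = 0;
    return pthread_spin_init(&c->lock, PTHREAD_PROCESS_PRIVATE);
}

/* Open the gate on the first enable; later enables only bump the count. */
static void clk_model_enable(struct clk_model *c)
{
    pthread_spin_lock(&c->lock);
    if (c->usecount++ == 0)
        c->gate_on = 1;
    pthread_spin_unlock(&c->lock);
}

/* Close the gate when the last user disables the clock. */
static void clk_model_disable(struct clk_model *c)
{
    pthread_spin_lock(&c->lock);
    assert(c->usecount > 0); /* an unbalanced disable is a caller bug */
    if (--c->usecount == 0)
        c->gate_on = 0;
    pthread_spin_unlock(&c->lock);
}

struct stress_arg { struct clk_model *clk; int iters; };

static void *stress_worker(void *p)
{
    struct stress_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        clk_model_enable(a->clk);
        clk_model_disable(a->clk);
    }
    return NULL;
}

/* Hammer one clock from several threads; returns the final use count
 * (0 if every enable was balanced) or -1 if the lock could not be set up. */
static int run_stress(int nthreads, int iters)
{
    enum { MAX_THREADS = 16 };
    pthread_t tid[MAX_THREADS];
    struct clk_model clk;
    struct stress_arg arg = { &clk, iters };
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    if (clk_model_init(&clk) != 0)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, stress_worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    int left = clk.usecount;
    pthread_spin_destroy(&clk.lock);
    return left;
}
```

Method 1 avoids this path entirely: with the clock enabled once in probe, the transfer path never touches the enable count at all, at the cost of leaving the SPI clock running while the bus is idle.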
BR
Brian
Hi, Alex,
Did you also get a hang in the spi_sync() function?
The attached patch file is for BSP 4.0.0.
BR
Brian
Does your test run fine if you use the 3.10.17 kernel from FSL or 3.13 from kernel.org?
Regards,
Fabio Estevam
Hi, Fabio
Thanks for your reply.
No, I have only tested on the 3.0.35 kernel.
In our project, we have used kernel 3.0.35 for about three months.
If we moved to another kernel version, we would have to port the source code again and retest all the I/O functions.
Additional information for the SPI hang issue:
1. The issue can be reproduced by using two (or more) SPI buses simultaneously.
2. Using more threads to access the SPI buses makes the issue easier to reproduce.
3. Only one SPI bus driver hangs; the other one keeps working.
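The reproduction conditions above could be sketched as a userspace stress program like the following. It is an assumption that the buses are reachable through spidev nodes; the device paths, the 8-byte transfer, the 1 MHz speed and the names spi_worker and run_burnin are all illustrative and not taken from my actual test program.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/spi/spidev.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_JOBS 32

struct job { const char *dev; int iters; int errors; };

/* One thread: open a spidev node and issue small full-duplex transfers. */
static void *spi_worker(void *arg)
{
    struct job *j = arg;
    uint8_t tx[8] = {0}, rx[8];
    struct spi_ioc_transfer tr;

    int fd = open(j->dev, O_RDWR);
    if (fd < 0) {
        j->errors = j->iters; /* count every planned transfer as failed */
        return NULL;
    }
    memset(&tr, 0, sizeof(tr));
    tr.tx_buf = (unsigned long)tx;
    tr.rx_buf = (unsigned long)rx;
    tr.len = sizeof(tx);
    tr.speed_hz = 1000000;  /* assumed speed, adjust for the slave device */
    tr.bits_per_word = 8;

    for (int i = 0; i < j->iters; i++)
        if (ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 0)
            j->errors++;
    close(fd);
    return NULL;
}

/* Start per_dev threads on each device and return the total error count.
 * A hang shows up as this function never returning. */
static int run_burnin(const char **devs, int ndev, int per_dev, int iters)
{
    pthread_t tid[MAX_JOBS];
    struct job jobs[MAX_JOBS];
    int started[MAX_JOBS];
    int n = 0, total = 0;

    assert(devs != NULL);
    for (int d = 0; d < ndev; d++) {
        for (int t = 0; t < per_dev && n < MAX_JOBS; t++, n++) {
            jobs[n] = (struct job){ devs[d], iters, 0 };
            started[n] = pthread_create(&tid[n], NULL, spi_worker, &jobs[n]) == 0;
            if (!started[n])
                jobs[n].errors = iters;
        }
    }
    for (int i = 0; i < n; i++) {
        if (started[i])
            pthread_join(tid[i], NULL);
        total += jobs[i].errors;
    }
    return total;
}
```

For example, calling run_burnin() with two paths such as "/dev/spidev0.0" and "/dev/spidev1.0" (hypothetical node names) and several threads per device matches points 1 and 2: two buses in use at once, with many threads on each.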
BR
Brian
Hi brianwu, I am sometimes seeing a similar hang when using SPI with BSP 4.1.0. Can you post the patch you used to fix this?
Hi brianwu,
I have a similar issue.
I have a thread that repeatedly calls spi_sync in a while loop to read from a SPI slave device. The amount of data received in each spi_sync call is small (at most 8 bytes).
My problem is that the system freezes after the thread has run for a very short time. The thread works fine if I add a short sleep (msleep of 1 ms) after each spi_sync. I initially thought it was a scheduling issue, but if I comment out the spi_sync and msleep calls, the thread loop runs indefinitely without any issue. Could this be related to the clock issue you faced? Any other thoughts?
Thanks
Sebi
Brian
Has your issue been resolved? If yes, we are going to close the discussion in 3 days. If you still need help, please feel free to reply with an update to this discussion.
Thanks,
Yixing
Dear Yixing,
I couldn't find the root cause of this problem, but there has been no hang on the SPI bus since I modified the SPI bus driver to keep the clock enabled all the time.
BR
Brian
Brian
It is good to know that your issue disappeared after you made the change. I guess we can close your post now. If anyone knows the root cause, he/she can post the resolution here.
Thanks,
Yixing