LPSPI bugs around TCR: More FSL library bugs plus an LPSPI hardware problem
Our application uses an LPSPI module to read from different sensors in quick succession. These sensors require different SPI timing parameters: one sensor uses continuous transfers, and the others use non-continuous transfers with large frame sizes. Operation was unreliable, with frequent faults (bad sensor data returned). This has been traced to further bugs in the FSL library plus an apparent hardware issue. This post discusses the latest problems found and their work-arounds.
Prior Problems
Previously reported problems (which NXP has not fixed) and work-arounds:
https://community.nxp.com/t5/i-MX-RT/LPSPI-driver-bug-Bytes-sent-IN-WRONG-ORDER/m-p/1644776
https://community.nxp.com/t5/i-MX-RT/Further-serious-bugs-in-LPSPI-driver/td-p/1833857
https://community.nxp.com/t5/i-MX-RT/LPSPI-bugs-LPSPI-MasterInit-replacement/m-p/1837991/highlight/f...
TCR Issues
There are several serious problems with the TCR register. Because a TCR write goes into the transmit FIFO and is processed by the LPSPI module some time later, there can be a significant delay between writing TCR and the new value becoming readable (especially with larger LPSPI master clock divisors in the clock tree). The i.MX RT1024 Processor Reference Manual, Rev. 1, 02/2021, section 43.5.1.15.2 TCR Function, documents that erroneous values may be returned and recommends reading repeatedly until successive reads match:
Avoid register reading problems: Reading the Transmit Command Register returns the current state of the command register. Reading the Transmit Command Register at the same time that the Transmit Command Register is loaded from the transmit FIFO, can return an incorrect Transmit Command Register value. It is recommended:
• Read the Transmit Command Register when the transmit FIFO is empty,
• Read the Transmit Command Register more than once and then compare the returned values.
Errata ERR050606 (LPSPI: TCR value does not get resampled when polling the register) describes another hardware problem: successive reads of TCR do not actually resample the register value. The work-around is to read a different LPSPI register before each subsequent read of TCR.
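To make the errata concrete, here is a hypothetical host-side model (not NXP code, and not a timing-accurate model of the silicon). `MockLpspi` and `pollTcrUntil` are illustrative names: the mock latches the TCR value and only refreshes it when the previous access to the module was to a different register. Polling TCR alone never sees a newly loaded command, while reading SR before each TCR poll does.

```cpp
#include <cstdint>

// Hypothetical model of errata ERR050606 (illustration only): a TCR read returns a
// latched copy, refreshed only if the previous module access was to another register.
class MockLpspi {
public:
    explicit MockLpspi(std::uint32_t tcr) : internalTcr_(tcr), latchedTcr_(tcr) {}

    // Simulates the LPSPI core loading a new command word from the TX FIFO.
    void commandTakesEffect(std::uint32_t tcr) { internalTcr_ = tcr; }

    std::uint32_t readTcr() {
        if (!lastAccessWasTcr_) latchedTcr_ = internalTcr_; // resample only after another access
        lastAccessWasTcr_ = true;
        return latchedTcr_;
    }

    std::uint32_t readSr() {
        lastAccessWasTcr_ = false; // any other register access re-arms TCR sampling
        return 0u;
    }

private:
    std::uint32_t internalTcr_;
    std::uint32_t latchedTcr_;
    bool lastAccessWasTcr_ = false;
};

// Poll TCR up to maxPolls times; returns true once it reads `want`.
// With readOtherRegister == true, SR is read before each TCR poll, as the errata recommends.
bool pollTcrUntil(MockLpspi &m, std::uint32_t want, bool readOtherRegister, int maxPolls) {
    for (int i = 0; i < maxPolls; ++i) {
        if (readOtherRegister) (void)m.readSr();
        if (m.readTcr() == want) return true;
    }
    return false;
}
```

In the model, a loop of back-to-back TCR reads keeps returning the stale value no matter how long it runs, which is why the compare-until-stable check from the reference manual is not sufficient on its own.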
FSL Bugs
fsl_lpspi.c routines read TCR without implementing the checks recommended in the reference manual and without taking Errata ERR050606 into account. Because of the unfortunate API design, the routines in this module do not have the desired TCR settings on hand and instead rely on reading back the TCR value. Also, routines such as LPSPI_MasterTransferBlocking repeatedly re-read TCR unnecessarily.
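One way to avoid reading TCR back just to learn the current settings is to keep a software copy of the intended value. The sketch below is a hypothetical `TcrShadow` wrapper (not part of the FSL API), and it only works if every TCR write in the driver goes through it. The FRAMESZ field position (bits 11:0, holding frame size minus one) is stated as an assumption; on target, prefer `LPSPI_TCR_FRAMESZ_MASK` from the device header.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the FSL API): a software shadow of the intended TCR value,
// so the driver never needs to read TCR back just to learn its own settings.
// Only valid if EVERY write to TCR goes through this class.
class TcrShadow {
public:
    // Assumed FRAMESZ field: TCR bits 11:0, holding (frame size in bits - 1).
    static constexpr std::uint32_t kFrameSzMask = 0xFFFu;

    TcrShadow(volatile std::uint32_t *tcr, std::uint32_t initial) : tcr_(tcr), shadow_(initial) {
        *tcr_ = initial;
    }

    // Write a complete new TCR value (pushed into the TX FIFO on real hardware).
    void write(std::uint32_t value) {
        shadow_ = value;
        *tcr_ = value;
    }

    // Read-modify-write based on the shadow copy; the hardware TCR is never read.
    void update(std::uint32_t clearMask, std::uint32_t setMask) {
        write((shadow_ & ~clearMask) | setMask);
    }

    // Example per-sensor setting: frame size in bits (1..4096 assumed).
    void setFrameSizeBits(std::uint32_t bits) {
        update(kFrameSzMask, (bits - 1u) & kFrameSzMask);
    }

    std::uint32_t value() const { return shadow_; }

private:
    volatile std::uint32_t *tcr_;
    std::uint32_t shadow_;
};
```

With one shadow per LPSPI instance, switching between sensor settings becomes a write of a known value rather than a read-modify-write of hardware state. Waiting for a new TCR to actually take effect still requires reading it back.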
Work-Around #1
I implemented LPSPI_Read_TCR_Safely, which performs the checks above, and replaced all relevant direct TCR reads in fsl_lpspi.c with calls to this routine.
```cpp
#include <assert.h>
#include "fsl_lpspi.h" // LPSPI_Type, LPSPI_SR_MBF_MASK, LPSPI_FSR_TXCOUNT_MASK

extern "C" uint32_t LPSPI_Read_TCR_Safely(LPSPI_Type *pLPSPI);
extern int maxLPSPIwaitTries; // diagnostic test only (defined elsewhere)

/// Wait until an LPSPI module is not busy (it can stay busy for a while after changing CS settings,
/// or just sending out data from FIFO).
inline void WaitForLPSPInotBusy(LPSPI_Type *pLPSPI) {
    volatile uint32_t *pSR = &pLPSPI->SR;   // Status Register
    volatile uint32_t *pFSR = &pLPSPI->FSR; // FIFO Status Register (counts of data in FIFOs)
    const int maxTries = 1000000;
    int tries = 0; // Paranoia, prevent an infinite loop
    // In Master mode, Master Busy Flag MBF asserts when there is data to transmit and LPSPI is able to transmit.
    // It negates after the PCS negates and the LPSPI master has waited half the DBT time with no new data to transmit.
    // Wait until no data in TX FIFO and MBF is clear (or until time-out).
    while ((tries++ < maxTries) &&
           (((*pFSR) & LPSPI_FSR_TXCOUNT_MASK) || ((*pSR) & LPSPI_SR_MBF_MASK))) {} // Module Busy flag
    assert(tries <= maxTries); // tries only exceeds maxTries if the loop timed out
    if (tries > maxLPSPIwaitTries) maxLPSPIwaitTries = tries;
}

int maxDifferentTCRvaluesRead; // Diagnostics only: track maximum different TCR values read

/// Read TCR to avoid race conditions as recommended in RM and errata ERR050606
uint32_t LPSPI_Read_TCR_Safely(LPSPI_Type *pLPSPI) {
    /*
    === From i.MX RT1024 Processor Reference Manual, Rev. 1, 02/2021 section 43.5.1.15.2 TCR Function ===
    Avoid register reading problems: Reading the Transmit Command Register returns the
    current state of the command register. Reading the Transmit Command Register at the
    same time that the Transmit Command Register is loaded from the transmit FIFO, can
    return an incorrect Transmit Command Register value. It is recommended:
    - Read the Transmit Command Register when the transmit FIFO is empty,
    - Read the Transmit Command Register more than once and then compare the returned values.
    */
    WaitForLPSPInotBusy(pLPSPI); // includes wait for TX FIFO empty
    uint32_t TCRvalue = pLPSPI->TCR; // first read...
    const int SuccessiveReadsMustMatch = 5 - 1; // i.e. 5 identical reads in a row
    int readMatches = 0;
    int tries = 0;
    while (readMatches < SuccessiveReadsMustMatch) {
        // per errata ERR050606 LPSPI: TCR value does not get re-sampled when polling the register,
        // it's necessary to read a different register between TCR polls
        (void)pLPSPI->SR;
        uint32_t TCRnewVal = pLPSPI->TCR;
        if (TCRnewVal == TCRvalue) {
            readMatches++;
        } else {
            TCRvalue = TCRnewVal;
            readMatches = 0;
            tries++;
        }
    }
    if (tries > maxDifferentTCRvaluesRead) maxDifferentTCRvaluesRead = tries; // diagnostics only
    return TCRvalue;
}
```
Another Hardware Problem, and Work-Around #2
In addition to the above mess, there appears to be another hardware bug: other LPSPI register accesses may return garbage or improper results before the new TCR value is stable, particularly when clearing the CONTC bit. This shows up after a continuous transfer. To avoid it, wherever fsl_lpspi.c clears CONT and CONTC to de-assert CS, after writing the TCR with CONTC=0, wait until the read-back (using LPSPI_Read_TCR_Safely) shows the new TCR with CONTC cleared.
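The Work-Around #2 sequence can be sketched as follows. `LpspiRegsStub`, `readTcrSafelyStub` and `endContinuousTransfer` are hypothetical stand-ins so the sketch builds on a host, and the CONT/CONTC bit positions are assumptions; on target, use `LPSPI_Type`, the real `LPSPI_Read_TCR_Safely`, and `LPSPI_TCR_CONT_MASK`/`LPSPI_TCR_CONTC_MASK` from the device header.

```cpp
#include <cstdint>

// Assumed bit positions (CONTC = bit 20, CONT = bit 21); on target use
// LPSPI_TCR_CONTC_MASK and LPSPI_TCR_CONT_MASK from the device header instead.
constexpr std::uint32_t kTcrContcMask = 1u << 20;
constexpr std::uint32_t kTcrContMask = 1u << 21;

// Hypothetical stand-in for LPSPI_Type with only the two registers this sketch touches.
struct LpspiRegsStub {
    volatile std::uint32_t TCR;
    volatile std::uint32_t SR;
};

// Simplified stand-in for LPSPI_Read_TCR_Safely(): reads SR first (errata ERR050606)
// but omits the busy wait and the repeated-match check of the real routine.
std::uint32_t readTcrSafelyStub(LpspiRegsStub &regs) {
    (void)regs.SR;
    return regs.TCR;
}

// End a continuous transfer: write TCR with CONT and CONTC cleared (de-asserting CS),
// then poll until the read-back shows exactly the new value before touching other
// LPSPI registers. Returns the number of polls needed, or 0 on time-out.
int endContinuousTransfer(LpspiRegsStub &regs, std::uint32_t currentTcr, int maxPolls) {
    const std::uint32_t newTcr = currentTcr & ~(kTcrContMask | kTcrContcMask);
    regs.TCR = newTcr;
    for (int poll = 1; poll <= maxPolls; ++poll) {
        if (readTcrSafelyStub(regs) == newTcr) return poll;
    }
    return 0;
}
```

On a host the write is visible immediately, so the poll succeeds on the first try; on hardware, with a slow LPSPI functional clock, it may take several polls, which is exactly the window in which other register accesses were observed to misbehave.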
What Next?
It appears NXP no longer supports FSL. @EdwinHz took one month to reply to my last bug report (a nice thank-you, but nothing resolved), and replies on GitHub from developers indicate they are unlikely to fix FSL problems. Needless to say, this affects whether we will use NXP products in future: it is extremely costly to chase and fix problems like these. If anyone is interested in the patched code, let me know...
Thanks,
Best Regards, Dave
Hi again @davenadler!
Firstly, I would like to thank you for your efforts on this workaround as well as the previously reported ones. These types of posts are very useful for us and for the NXP community as a whole.
I would also like to address the perception of NXP's efforts on fixing these issues. I understand why you arrived at the conclusion that we no longer support FSL, and I agree that this support is very slow and should definitely be more efficient for the wellbeing of our clients. Currently it is quite difficult for our SDK engineers to chase every single bug in every single board SDK that we have. But the efforts that the NXP community makes, especially compiling a whole set of issues and even actively working on workarounds, are greatly appreciated and will definitely help make the bug-fixing process on our side much more efficient.
We will definitely take in your feedback, not only about the bug reported here, but also about the other bugs reported previously. I believe the second hardware problem you report here has already been reported: Solved: RT1050 LPSPI last bit not completing in continuous mode - NXP Community. I'll take this opportunity to re-emphasize the previously reported issue with the CONTC bit.
Thank you once again for your efforts!
BR,
Edwin.