I've started recoding the LPSPI driver because I've had no response from NXP, and it looks like NXP is no longer supporting FSL. So far I've only replaced LPSPI_MasterInit (still to do: rework the transfer API to remove the unnecessarily long delays between transferred 32-bit words, which have been complained about on this forum but never corrected in FSL). The code below performs all delay calculations at compile time and produces extremely compact code for run-time LPSPI setup.
I'd really appreciate any constructive comments!
Requirements for Efficient Support of LPSPI
- correct timing parameters must be generated (unlike the buggy FSL LPSPI_MasterInit).
- all timing calculations must be done at compile time (timing settings are static; for efficient code size and speed, calculations must not be done at run time).
- compile-time calculations must produce complete register values ready to load.
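As a rough illustration of the second and third requirements, the sketch below rounds a delay up to whole clock cycles in a constexpr function and uses static_assert to show that the result is known to the compiler. The helper names delayToCycles and fitsIn8Bits are hypothetical and are not part of the driver, and the 15ns period is only an example value.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the driver): round a delay up to whole
// clock periods so the generated delay is never shorter than requested.
constexpr uint32_t delayToCycles(uint32_t delayNsec, uint32_t periodNsec) {
    return (delayNsec + periodNsec - 1U) / periodNsec;
}

// Hypothetical helper: the CCR delay and divisor fields are 8 bits wide.
constexpr bool fitsIn8Bits(uint32_t value) {
    return value <= 0xFFU;
}

// Example values: a 40ns delay with a 15ns clock period needs 3 cycles.
constexpr uint32_t exampleCycles = delayToCycles(40U, 15U);

// static_assert only accepts constant expressions, so these lines compile
// only if the arithmetic really was done by the compiler.
static_assert(exampleCycles == 3U, "40ns at 15ns/cycle rounds up to 3 cycles");
static_assert(fitsIn8Bits(exampleCycles), "value must fit an 8-bit CCR field");
static_assert(!fitsIn8Bits(delayToCycles(100000U, 15U)),
              "100us at 15ns/cycle does not fit without a prescaler");
```

Because every value is checked by static_assert, a timing parameter that cannot be represented would show up as a build error rather than as a wrong register value at run time.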
Example of Code Using New Initialization Classes
We need to write simple, efficient code. The following compiles to just a handful of instructions (replacing FSL's nested function calls, whose iterative calculations produce wrong results):
// example LPSPI setups...
BMP581_times.Set_CCR_and_TCR(LPSPI4, BMP581_TCR_initial.TCR);
// or
ND120_times.Set_CCR_and_TCR(LPSPI4, ND120_TCR_initial.TCR);
The clock calculation code must be simple, readable, and compile-time only. The following generates only 3 words in flash for the CCR class and 1 word for the TCR class (no run-time calculations or calculation code):
// Example constant calculations showing exact timing results of calculated CCR values:
// ND120 has *REALLY SLOW* SPI. Big nuisances:
// - 6us clock period is a 166.666kHz clock (i.e. 48us for an 8-bit byte)
// - inter-byte delay of 52us is needed for the minimum byte cycle time of 100us
// - 100us delay required from CS assertion to first clock
// - 20us delay required after last clock until de-asserting CS
/// SPI timing information for ND120 pressure sensor
constexpr static LPSPI_timeCalc_T ND120_times({
    .LPSPIrootClockHz          = BOARD_BOOTCLOCKRUN_LPSPI_CLK_ROOT,
    .maxClockHz                = 166666U, // 6us per bit
    .initialDelayNsec          = 100000U,
    .delayBetweenTransfersNsec = 52000U,  // (100us min byte cycle time - 8*6us) = 52us
    .finalDelayNsec            = 20000U
});
// ND120 times: CCR = 0x29ce6a0b, prescaleExponent=5, SPI clock signal=158931Hz
// ... scaled clock period=484ns, delays=100188ns,52272ns,20328ns
constexpr static LPSI_TCR_T ND120_TCR_initial({
    // Clock SPI mode '01' (CPOL = 0, CPHA = 1)
    .CPOL_SCK_Inactive_High         = 0,
    .CPHA_SCK_Capture_trailing_edge = 1,
    .PRESCALE_exponent              = ND120_times.prescaleExponent,
    .PCS_number                     = 0,
    .CONT_Continuous_Transfer       = 1,
    .CONTC_Continuing_Command       = 0,
    .Frame_Size                     = 8
});
/// SPI timing information for BMP581 pressure sensor
constexpr static LPSPI_timeCalc_T BMP581_times({
    .LPSPIrootClockHz          = BOARD_BOOTCLOCKRUN_LPSPI_CLK_ROOT,
    .maxClockHz                = 12000000U,
    .initialDelayNsec          = 40U, // Missing from manual bst-bmp581-ds004.pdf figure 7 / table 18
                                      // From Bosch tech support: the minimal T_setup_csb value is 40ns.
    .delayBetweenTransfersNsec = 0U,
    .finalDelayNsec            = 40U  // T_hold_csb = 40ns
});
// BMP581 times: CCR = 0x02020004, prescaleExponent=0, SPI clock signal=11111111Hz
// ... scaled clock period=15ns, delays=45ns,30ns,45ns
constexpr static LPSI_TCR_T BMP581_TCR_initial({
    // Clock SPI mode '11' (CPOL = 1, CPHA = 1)
    .CPOL_SCK_Inactive_High         = 1,
    .CPHA_SCK_Capture_trailing_edge = 1,
    .PRESCALE_exponent              = BMP581_times.prescaleExponent,
    .PCS_number                     = 1, // 1-2-3 for the three BMP581
    .CONT_Continuous_Transfer       = 0, // BMP581 IO uses a single frame of 8*nBytes length
    .CONTC_Continuing_Command       = 0,
    .Frame_Size                     = 8  // overridden during IO
});
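The numbers quoted in the comments above can be re-derived on a host PC without the NXP headers. The following is only a sketch under stated assumptions: it assumes a 66MHz LPSPI root clock (consistent with the 15ns and 484ns periods quoted, but not stated in the post), it assumes the CCR layout SCKDIV[7:0], DBT[15:8], PCSSCK[23:16], SCKPCS[31:24] in place of the LPSPI_CCR_*_SHIFT macros, and it uses a hand-written leading-zero count instead of std::countl_zero so that it also builds before C++20. All names in it (calcTiming, TimingResult, and so on) are hypothetical.

```cpp
#include <cstdint>

// Assumption: 66MHz root clock, standing in for BOARD_BOOTCLOCKRUN_LPSPI_CLK_ROOT.
constexpr uint32_t kAssumedRootHz = 66000000U;

struct TimingResult {              // hypothetical result type for this sketch
    uint32_t ccr;                  // CCR register image
    uint32_t prescaleExponent;     // goes into TCR.PRESCALE
    uint32_t periodNsec;           // prescaled clock period
};

// Count leading zero bits of a 32-bit value (stand-in for std::countl_zero).
constexpr uint32_t leadingZeros(uint32_t v) {
    uint32_t n = 0U;
    for (uint32_t mask = 0x80000000U; mask != 0U && (v & mask) == 0U; mask >>= 1) {
        ++n;
    }
    return n;
}

constexpr uint32_t maxOf(uint32_t a, uint32_t b) { return a > b ? a : b; }

// Round a delay up to whole cycles, then subtract the cycles the hardware adds.
constexpr uint32_t delayField(uint32_t delayNsec, uint32_t periodNsec, uint32_t hwOffset) {
    uint32_t cycles = (delayNsec + periodNsec - 1U) / periodNsec;
    return (cycles >= hwOffset) ? cycles - hwOffset : cycles;
}

// Same arithmetic as the LPSPI_timeCalc_T constructor, written as a function.
constexpr TimingResult calcTiming(uint32_t rootHz, uint32_t maxClockHz,
                                  uint32_t initialNsec, uint32_t betweenNsec,
                                  uint32_t finalNsec) {
    uint32_t basePeriodNsec = 1000000000U / rootHz;
    uint32_t maxDelayNsec   = maxOf(maxOf(initialNsec, betweenNsec), finalNsec);
    uint32_t maxDelayCycles = maxDelayNsec / basePeriodNsec;
    uint32_t clockDivisor   = rootHz / maxClockHz;
    uint32_t lz             = leadingZeros(maxOf(maxDelayCycles, clockDivisor));
    uint32_t prescale       = (lz >= 24U) ? 0U : (24U - lz);
    uint32_t periodNsec     = 1000000000U / (rootHz >> prescale);
    uint32_t ccr = (((clockDivisor >> prescale) - 1U) << 0)      // SCKDIV
                 | (delayField(betweenNsec, periodNsec, 2U) << 8)   // DBT
                 | (delayField(initialNsec, periodNsec, 1U) << 16)  // PCSSCK
                 | (delayField(finalNsec,   periodNsec, 1U) << 24); // SCKPCS
    return TimingResult{ccr, prescale, periodNsec};
}

// Resulting SCK frequency: one SCK period is (SCKDIV + 2) prescaled cycles.
constexpr uint32_t sckHz(const TimingResult& t) {
    return (1000000000U / t.periodNsec) / ((t.ccr & 0xFFU) + 2U);
}

constexpr TimingResult nd120  = calcTiming(kAssumedRootHz, 166666U, 100000U, 52000U, 20000U);
constexpr TimingResult bmp581 = calcTiming(kAssumedRootHz, 12000000U, 40U, 0U, 40U);

// These match the values quoted in the comments above.
static_assert(nd120.ccr == 0x29ce6a0bU && nd120.prescaleExponent == 5U, "ND120 CCR");
static_assert(nd120.periodNsec == 484U && sckHz(nd120) == 158931U, "ND120 clock");
static_assert(bmp581.ccr == 0x02020004U && bmp581.prescaleExponent == 0U, "BMP581 CCR");
static_assert(bmp581.periodNsec == 15U && sckHz(bmp581) == 11111111U, "BMP581 clock");
```

With a different root clock the static_assert lines would need new expected values; the point is only that the whole calculation can be checked by the compiler.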
Implementation of Compile-Time Constant Calculation Classes
/// \file LPSPI_driver.hpp
/// \brief
/// Classes to aid replacement of severely buggy NXP LPSPI driver for
/// iMX.RT processors.
/// - Compute timing values needed for a specific SPI device at compile-time
/// - Create a TCR value more easily at compile-time
/// - inline function to initialize (or re-initialize) timing for a specific SPI device
/// - Implementation below completely replaces severely buggy LPSPI_MasterInit
///
/// ToDo SPI: Provide replacements for FSL LPSPI transfer API, especially
/// eliminate absurd 10usec delay between 4-byte chunks of BMP581, etc.
/// \author Dave Nadler
/// \copyright MIT License
/*
* The MIT License (MIT)
*
* Copyright (c) 2024 Dave Nadler (www.nadler.com)
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
*/
#include <bit> // std::countl_zero
#include <algorithm> // std::max
#include <stdio.h> // printf (diagnostic only)
#include <stdint.h> // uint32_t
#include "MIMXRT1024.h" // CMSIS-style register definitions
#include "MIMXRT1024_features.h" // CPU specific feature definitions
/// Parameter structure supports named arguments to LPSPI_timeCalc_T ctor below...
struct LPSPI_timeCalc_params {
    uint32_t LPSPIrootClockHz;          ///< root clock for all LPSPI modules as configured in clocks setup
    uint32_t maxClockHz;                ///< maximum transfer rate for this device
    uint32_t initialDelayNsec;          ///< delay from CS to first clock
    uint32_t delayBetweenTransfersNsec; ///< delay between bytes
    uint32_t finalDelayNsec;            ///< delay from last clock to de-asserting CS
};
/// LPSPI_timeCalc_T calculates all timing parameters needed for an SPI device
/// at compile-time (CCR clock control register value and required clock prescale).
/// The **ONLY** thing placed in ROM is 3 words of timing information;
/// no timing calculation code is placed in ROM nor executed at runtime.
/// Small executable members include:
/// - Set_CCR_and_TCR Write the timing parameters to an LPSPI module, and
/// - Printf Print diagnostic output for debugging only
class LPSPI_timeCalc_T {
public:
    uint32_t CCR; ///< register image ready for loading into CCR, containing delays and clock divisor (but not prescaler value)
    uint32_t prescaleExponent; ///< set in TCR, not CCR...
    uint32_t LPSI_clockPeriodNsec; ///< saved for diagnostics only
    constexpr LPSPI_timeCalc_T(const LPSPI_timeCalc_params &p)
    {
        uint32_t baseLPSIclockPeriodNsec = 1000000000U/p.LPSPIrootClockHz; // base clock period before any scaling...
        // All delays and the clock divisor in CCR work from the prescaled (divided-down) clock...
        // Find maximum required delay in nanoseconds
        uint32_t maxDelayNsec = std::max(std::max(p.initialDelayNsec,p.delayBetweenTransfersNsec),p.finalDelayNsec);
        // Max delay as a multiple of the period of the un-scaled LPSPI root clock
        uint32_t maxDelayCycles = maxDelayNsec / baseLPSIclockPeriodNsec;
        // Clock divisor is also in units of prescaled-clock cycles and required to fit in 8 bits.
        uint32_t clockDivisor = p.LPSPIrootClockHz/p.maxClockHz; // assuming prescale is 0
        // Find the largest value (in units of un-scaled clock cycles) that must be scaled down to fit in 8 bits
        uint32_t maxUnscaledCyclecount = std::max(maxDelayCycles,clockDivisor);
        uint8_t leadingZeroBits = std::countl_zero(maxUnscaledCyclecount);
        // Leading zero bit count of a uint32 must be >= 24 for a value to fit in 8 bits.
        // Otherwise, the number of bits to shift and fit delay in 8 bits is the minimum required exponent.
        prescaleExponent = (leadingZeroBits>=24) ? 0 : (24-leadingZeroBits);
        LPSI_clockPeriodNsec = 1000000000U/(p.LPSPIrootClockHz >> prescaleExponent);
        // precise lambda rounding function ensures minimal delays...
        auto CCRdelayValue = [this](uint32_t delayNsec, uint32_t CCR_offset) {
            uint32_t delay = (delayNsec+LPSI_clockPeriodNsec-1)/LPSI_clockPeriodNsec;
            if(delay>=CCR_offset) delay-=CCR_offset; // Actual delay is 1 or 2 cycles more than CCR value, but don't go negative
            return delay;
        };
        // Guard: if prescaling shifts the divisor down to 0, use SCKDIV=0 rather than letting (0-1) wrap around
        uint32_t scaledDivisor = clockDivisor>>prescaleExponent;
        uint32_t sckDivField = (scaledDivisor>0) ? (scaledDivisor-1) : 0;
        CCR =
            sckDivField                               << LPSPI_CCR_SCKDIV_SHIFT | // More precise rounding here could yield faster clock...
            CCRdelayValue(p.delayBetweenTransfersNsec,2) << LPSPI_CCR_DBT_SHIFT    | // delay between continuous frames
            CCRdelayValue(p.initialDelayNsec,1)          << LPSPI_CCR_PCSSCK_SHIFT | // delay from CS to first clock
            CCRdelayValue(p.finalDelayNsec,1)            << LPSPI_CCR_SCKPCS_SHIFT ; // delay from last clock to de-asserting CS
    }
    /// Set up for a specific SPI device: Set CCR and TCR (disables module first, and re-enables after)
    /// TCR value should be constructed using prescaleExponent
    inline void Set_CCR_and_TCR(LPSPI_Type * pLPSPI, uint32_t TCR_) const {
        pLPSPI->CR &= ~LPSPI_CR_MEN_MASK; // CCR write is not permitted with module enabled
        pLPSPI->CCR = CCR;
        pLPSPI->CR |= LPSPI_CR_MEN_MASK;  // re-enable module to put above into effect
        pLPSPI->TCR = TCR_;               // don't write TCR with module disabled
    }
    /// Printf member to aid debug (show actual delays and SPI clock values computed)...
    /// Format specifiers assume uint32_t is unsigned long, as on arm-none-eabi.
    void Printf(const char* label) const {
        printf("%s times: CCR = 0x%08lx, prescaleExponent=%lu, SPI clock signal=%luHz\n",
            label, CCR, prescaleExponent,
            (1000000000U/LPSI_clockPeriodNsec)/(((CCR&LPSPI_CCR_SCKDIV_MASK)>>LPSPI_CCR_SCKDIV_SHIFT)+2) );
        printf("... scaled clock period=%luns, delays=%luns,%luns,%luns \n",
            LPSI_clockPeriodNsec,
            (((CCR&LPSPI_CCR_PCSSCK_MASK)>>LPSPI_CCR_PCSSCK_SHIFT)+1)*LPSI_clockPeriodNsec,
            (((CCR&LPSPI_CCR_DBT_MASK   )>>LPSPI_CCR_DBT_SHIFT   )+2)*LPSI_clockPeriodNsec,
            (((CCR&LPSPI_CCR_SCKPCS_MASK)>>LPSPI_CCR_SCKPCS_SHIFT)+1)*LPSI_clockPeriodNsec
        );
    }
};
/// LPSI_TCR_params provides a named-parameter argument list to LPSI_TCR_T ctor below.
/// TCR fields we will never ever ever use are not parameterized.
struct LPSI_TCR_params {
/// 31 CPOL Clock Polarity
/// The Clock Polarity field is only updated when PCS negated.
/// See Figure 43-2.
/// - 0b - The inactive state value of SCK is low
/// - 1b - The inactive state value of SCK is high
uint8_t CPOL_SCK_Inactive_High;
/// 30 CPHA Clock Phase
/// The Clock Phase field is only updated when PCS negated.
/// See Figure 43-2.
/// - 0b - Captured. Data is captured on the leading edge of SCK and changed on the following edge of SCK
/// - 1b - Changed. Data is changed on the leading edge of SCK and captured on the following edge of SCK
uint8_t CPHA_SCK_Capture_trailing_edge;
/// 29-27 PRESCALE
/// Prescaler Value (exponent; clock is divided by 2^PRESCALE)
/// For all SPI bus transfers, the Prescaler value applied to the clock configuration register.
/// The Prescaler Value field is only updated when PCS negated.
uint8_t PRESCALE_exponent;
// 26 Reserved
/// 25-24 PCS Peripheral Chip Select
/// Configures the peripheral chip select used for the transfer. The Peripheral Chip Select field is only
/// updated when PCS negated.
/// - 00b - Transfer using PCS[0]
/// - 01b - Transfer using PCS[1]
/// - 10b - Transfer using PCS[2]
/// - 11b - Transfer using PCS[3]
uint8_t PCS_number;
// 23 LSBF LSB First
// - 0b - Data is transferred MSB first
// - 1b - Data is transferred LSB first
// 22 BYSW Byte Swap
// Byte swap swaps the contents of [31:24] with [7:0] and [23:16] with [15:8] for each transmit data word
// read from the FIFO and for each received data word stored to the FIFO (or compared with match
// registers).
// - 0b - Byte swap is disabled
// - 1b - Byte swap is enabled
/// 21 CONT Continuous Transfer
/// - In Master mode, CONT keeps the PCS asserted at the end of the frame size, until a command
/// word is received that starts a new frame.
/// - In Slave mode, when CONT is enabled, LPSPI only transmits the first FRAMESZ bits; after which
/// LPSPI transmits received data (assuming a 32-bit shift register) until the next PCS negation.
/// - 0b - Continuous transfer is disabled
/// - 1b - Continuous transfer is enabled
uint8_t CONT_Continuous_Transfer;
/// 20 CONTC Continuing Command
/// In Master mode, the CONTC bit allows the command word to be changed within a continuous transfer.
/// - The initial command word must enable continuous transfer (CONT = 1),
/// - the continuing command must set this bit (CONTC = 1),
/// - and the continuing command word must be loaded on a frame size boundary.
/// For example, if the continuous transfer has a frame size of 64-bits, then a continuing command word
/// must be loaded on a 64-bit boundary.
/// - 0b - Command word for start of new transfer
/// - 1b - Command word for continuing transfer
uint8_t CONTC_Continuing_Command;
// 19 RXMSK Receive Data Mask
// When set, receive data is masked (receive data is not stored in receive FIFO).
// - 0b - Normal transfer
// - 1b - Receive data is masked
// 18 TXMSK Transmit Data Mask
// When set, transmit data is masked (no data is loaded from transmit FIFO and output pin is tristated).
// In Master mode, the TXMSK bit initiates a new transfer which cannot be aborted by another command word;
// the TXMSK bit is cleared by hardware at the end of the transfer.
// - 0b - Normal transfer
// - 1b - Mask transmit data
// 17-16 WIDTH Transfer Width
// Configures between serial (1-bit) or parallel transfers. For half-duplex parallel transfers, either Receive
// Data Mask (RXMSK) or Transmit Data Mask (TXMSK) must be set.
// - 00b - 1 bit transfer
// - 01b - 2 bit transfer
// - 10b - 4 bit transfer
// - 11b - Reserved
// 15-12 Reserved
/// 11-0 FRAMESZ Frame Size
/// Configures the frame size in number of bits equal to (FRAMESZ + 1).
/// - The minimum frame size is 8 bits
/// - The minimum word size is 2 bits; a frame size of 33 bits is not supported.
/// - If the frame size is larger than 32 bits, then the frame is divided into multiple words of 32-bits;
/// each word is loaded from the transmit FIFO and stored in the receive FIFO separately.
/// - If the size of the frame is not divisible by 32, then the last load of the transmit FIFO and store of the
/// receive FIFO contains the remainder bits. For example, a 72-bit transfer consists of 3 words: the
/// 1st and 2nd words are 32 bits, and the 3rd word is 8 bits.
uint16_t Frame_Size; // 16 bits wide: FRAMESZ is a 12-bit field, so frames longer than 255 bits must be representable
};
/// Construct a constant TCR from a named parameter list
class LPSI_TCR_T {
public:
    uint32_t TCR;
    constexpr LPSI_TCR_T(const struct LPSI_TCR_params &p) {
        TCR =
            LPSPI_TCR_CPOL(p.CPOL_SCK_Inactive_High)         |
            LPSPI_TCR_CPHA(p.CPHA_SCK_Capture_trailing_edge) |
            LPSPI_TCR_PRESCALE(p.PRESCALE_exponent)          |
            LPSPI_TCR_PCS(p.PCS_number)                      |
            LPSPI_TCR_CONT(p.CONT_Continuous_Transfer)       |
            LPSPI_TCR_CONTC(p.CONTC_Continuing_Command)      |
            LPSPI_TCR_FRAMESZ(p.Frame_Size-1)                ;
    }
};
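For anyone wanting to sanity-check the TCR values on a host PC, here is a minimal sketch that packs the same fields by hand. The bit positions are taken from the field comments in LPSI_TCR_params above; it is an assumption that they match what the LPSPI_TCR_* macros in the NXP device header generate. The function name packTcr is hypothetical, and the prescale value of 5 for the ND120 is the one quoted in the ND120 timing comment.

```cpp
#include <cstdint>

// Hypothetical host-side equivalent of the LPSI_TCR_T constructor.
// Field positions: CPOL 31, CPHA 30, PRESCALE 29-27, PCS 25-24,
// CONT 21, CONTC 20, FRAMESZ 11-0 (stored as frame size in bits minus 1).
constexpr uint32_t packTcr(uint32_t cpol, uint32_t cpha, uint32_t prescale,
                           uint32_t pcs, uint32_t cont, uint32_t contc,
                           uint32_t frameSizeBits) {
    return ((cpol     & 0x1U) << 31)
         | ((cpha     & 0x1U) << 30)
         | ((prescale & 0x7U) << 27)
         | ((pcs      & 0x3U) << 24)
         | ((cont     & 0x1U) << 21)
         | ((contc    & 0x1U) << 20)
         | ((frameSizeBits - 1U) & 0xFFFU);
}

// ND120: mode 01, prescale exponent 5, PCS0, continuous transfer, 8-bit frames.
static_assert(packTcr(0U, 1U, 5U, 0U, 1U, 0U, 8U) == 0x68200007U, "ND120 TCR");

// BMP581: mode 11, prescale exponent 0, PCS1, single frame, 8-bit frame size.
static_assert(packTcr(1U, 1U, 0U, 1U, 0U, 0U, 8U) == 0xC1000007U, "BMP581 TCR");

// A 64-byte single-frame transfer needs FRAMESZ = 511, which uses 9 bits.
static_assert((packTcr(1U, 1U, 0U, 1U, 0U, 0U, 8U * 64U) & 0xFFFU) == 511U,
              "512-bit frame");
```

The last check illustrates why the frame size needs more than 8 bits of storage when a whole multi-byte transfer is sent as one frame.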
Hope this helps folks!
Best Regards, Dave