Hello!
We have hit an SPI transfer performance bottleneck in a project based on the S32K3 series MCU and would appreciate your help analyzing it. Our application has very strict real-time requirements. During testing, however, we found that the asynchronous DMA mode took longer than the synchronous CPU mode, which is the opposite of what we expected. The technical background and test data follow:
1. Application Scenario Background
MCU model: NXP S32K3xx
Use case: The LPSPI acts as the SPI master (host) for high-frequency data exchange, and a complete SPI transmit-and-receive transaction must finish inside an external interrupt handler.
Interrupt frequency: 11,200 Hz (period approximately 89.3 µs).
2. Problem Description and Test Data
To find the fastest transfer method, we compared the two main modes of the LPSPI driver. The results are as follows:
Scheme A: Synchronous blocking mode (Lpspi_Ip_SyncTransmit)
Configuration: DMA not used; the CPU polls the LPSPI registers.
Measured execution time: Approximately 34 µs.
Evaluation: Not fast enough for our needs, and the CPU is blocked for the whole transfer, so this is not our preferred approach.
Scheme B: Asynchronous DMA mode (Lpspi_Ip_AsyncTransmit + DMA)
Configuration: TX and RX DMA channels enabled; the asynchronous interface is called, and the follow-up logic runs in the DMA completion interrupt.
Measured execution time: Approximately 50 µs or more (higher than Scheme A).
Anomaly: In theory, DMA should free the CPU and improve efficiency, but the measured execution time within the main interrupt (or the overall response delay) is about 16 µs longer than in synchronous mode.
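For reference, the timing budget implied by these figures can be worked out as below. This is a minimal sketch that uses only the numbers quoted above (11,200 Hz, 34 µs, and 50 µs as the lower bound for Scheme B); the helper names period_us and load_percent are made up for illustration and are not part of any NXP driver.

```c
/* Timing-budget helpers (illustrative names, not NXP APIs). */

/* Interrupt period in microseconds for a given interrupt rate in Hz. */
static double period_us(double rate_hz)
{
    return 1.0e6 / rate_hz;
}

/* Share of one interrupt period consumed by a transfer, in percent. */
static double load_percent(double exec_us, double rate_hz)
{
    return 100.0 * exec_us / period_us(rate_hz);
}

/* Time left in one period after the transfer has finished, in microseconds. */
static double headroom_us(double exec_us, double rate_hz)
{
    return period_us(rate_hz) - exec_us;
}
```

With rate_hz = 11200.0, period_us gives about 89.29 µs. load_percent then gives 38.08 % for Scheme A (34 µs) and 56 % for Scheme B (50 µs), so Scheme B uses more than half of every interrupt period if the measured time is CPU time spent inside the interrupt.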
3. Our Questions
Given the unexpected results above, we would like to confirm the following:
API selection: For an interrupt rate as high as 11.2 kHz, is Lpspi_Ip_AsyncTransmit the right choice? Are there lower-level APIs (such as register-access macros or dedicated fast-path functions) better suited to high-frequency, low-latency use?
Performance bottleneck analysis: Why does the execution time increase once DMA is enabled?
Does Lpspi_Ip_AsyncTransmit perform too many configuration checks, linked-list initializations, or interrupt-masking operations internally?
Is our DMA configuration (for example the TCD settings or link mode) suboptimal, causing excessive startup overhead?
Best-practice recommendation: Which SPI driver architecture does NXP recommend under a hard real-time constraint of < 90 µs?
Is it recommended to bypass the HAL/Ip layer and access the registers directly?
Is there example code for a "zero-copy" or "pre-configured DMA loop" setup on the S32K3 LPSPI module?
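Regarding the performance bottleneck question above: the Scheme B figure may be either the CPU time spent inside the external interrupt or the end-to-end latency until the DMA completion interrupt, so we can record both separately if that helps the analysis. Below is a minimal sketch, assuming a free-running 32-bit cycle counter sampled at three points. The struct, the function names, and the way the counter is read are hypothetical and are not part of the NXP driver.

```c
#include <stdint.h>

/* Cycle-counter samples taken around one asynchronous transfer.
 * Hypothetical layout; how the counter is read is platform specific. */
typedef struct {
    uint32_t isr_entry;    /* on entry to the external interrupt */
    uint32_t call_return;  /* right after Lpspi_Ip_AsyncTransmit returns */
    uint32_t dma_done;     /* inside the DMA completion callback */
} spi_timing_t;

/* Convert a cycle delta to microseconds. Unsigned subtraction keeps
 * the result correct across a single 32-bit counter wrap. */
static double cycles_to_us(uint32_t start, uint32_t end, uint32_t core_hz)
{
    uint32_t delta = end - start;
    return (double)delta * 1.0e6 / (double)core_hz;
}

/* CPU time spent starting the transfer inside the interrupt. */
static double call_overhead_us(const spi_timing_t *t, uint32_t core_hz)
{
    return cycles_to_us(t->isr_entry, t->call_return, core_hz);
}

/* Latency from interrupt entry until the received data is ready. */
static double end_to_end_us(const spi_timing_t *t, uint32_t core_hz)
{
    return cycles_to_us(t->isr_entry, t->dma_done, core_hz);
}
```

As a sanity check with synthetic inputs (not measurements): deltas of 1,600 and 8,000 cycles at an assumed 160 MHz core clock convert to 10 µs and 50 µs. If the call overhead alone turns out to be a large share of the end-to-end figure, the cost is in the driver setup path rather than in the DMA transfer itself.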
4. Requirements
We need the fastest possible solution. If the standard IP driver cannot meet our target of < 40 µs, please provide:
Guidelines for optimizing DMA configuration (how to reduce startup overhead).
Alternatively, recommended replacement functions or low-level access methods.
Looking forward to your professional guidance!