Is anyone really using the PE-generated SPI driver with MQXLite? How?

VictorLorenzo · 05-20-2013

Hi,

Sorry for the post title, but that's simply what came to mind after more than three days of struggling with this and debugging it with a DSO.

We made a custom board based on the MK20DX256VLK100 Kinetis microcontroller. Now I'm trying to use the SPI0 port to control another device; the application is simple enough that MQXLite's functionality is sufficient. That's the end of the easy, good part of the story.

The device I need to control requires CS to be kept asserted (LOW) for the entire transaction (as is usually the case), so I tried the following solutions:

1) Configuring the chipselect list with one entry and properly indicating the CS output pin.

It seems to work when conditions 1.a AND (1.b-1 OR 1.b-2) apply:

1.a) Transactions in ONLY ONE DIRECTION (only read or only write)

1.b-1) Transactions with ONLY ONE CHAR, or

1.b-2) Transactions with VERY LONG DELAY BETWEEN CHARS.

Because one interrupt is required to push EACH AND EVERY char into the Tx FIFO, under some circumstances the delay between one char and the next is greater than the Delay Between Chars PE parameter, so CS is automatically deasserted (driven HIGH) by the silicon IP. And that is totally wrong!
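A simplified way to see why interrupt latency matters here: if the driver pushes the next character only when its interrupt fires, the bus idles for roughly the interrupt latency minus whatever is still queued in the FIFO, and CS drops once that idle gap exceeds the configured hold window. The sketch below encodes that model; it is an assumption about the mechanism, not the driver's code, and all function names are hypothetical. Times are in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Time needed to shift one character out, in ns (truncated). */
static uint32_t char_time_ns(uint32_t bits_per_char, uint32_t baud_hz)
{
    return (uint32_t)(((uint64_t)bits_per_char * 1000000000u) / baud_hz);
}

/* Idle time on the bus while the ISR is delayed. Model assumption: characters
 * still queued in the TX FIFO keep the bus busy for part of the latency. */
static uint32_t idle_gap_ns(uint32_t isr_latency_ns, uint32_t queued_chars,
                            uint32_t char_ns)
{
    uint64_t covered = (uint64_t)queued_chars * char_ns;
    return ((uint64_t)isr_latency_ns > covered)
               ? (uint32_t)(isr_latency_ns - covered)
               : 0u;
}

/* CS is expected to drop when the idle gap exceeds the hold window
 * (e.g. the "CLK to CS" / delay-after-transfer setting in PE). */
static bool cs_would_drop(uint32_t isr_latency_ns, uint32_t queued_chars,
                          uint32_t char_ns, uint32_t cs_hold_ns)
{
    return idle_gap_ns(isr_latency_ns, queued_chars, char_ns) > cs_hold_ns;
}
```

For instance, a 4us interrupt latency against a 2.56us window with nothing queued predicts a drop, while a single queued 8-bit character at 1MHz (8us on the wire) would hide the same latency.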

When the transaction involved TX {n} chars followed by RX {m} chars, it was impossible to keep CS asserted: it was always driven HIGH before reading could even start.

Here is a screen capture from the DSO: CS is Ch1, CK is Ch4 and MOSI is Ch3. This corresponds to one TX transaction, three bytes long.

2) Configuring the component without any chip select output and driving CS manually.

It seemed to work, but relying on the PE function ***_SPI_GetBlockSentStatus() was also disastrous. This function does not reflect the real state of the silicon IP, only the driver's internal state: it reports the transmission as complete before the data in the TX FIFO has actually been streamed out.

Things got even worse when I tried to optimise performance by configuring PE to use the TX/RX FIFOs, for obvious reasons given the above: more data can still be sitting in the FIFO when the driver reports completion.

My Questions:

What should I do?

Should I just forget about the famous Processor Expert and write a real driver myself?

Are there any other solutions out there?

How are you managing to use the SPI port?

Thanks a lot for your comments, Victor

bowerymarc · 01-07-2014

Thanks for this thread. I just ran into this problem (NOT using MQX, just the PE component) when my reads weren't working. Nowhere in the driver documentation or comments is this explained.

IMHO the driver's ReadBlock function should do what its name says: read a block. If a write has to be triggered to generate the clock, that should be handled and hidden within the driver.

Then add a ReadWriteBlock for a simultaneous transfer.

Then either add the flag that folks have hacked in above, or add some useful functions for the very common case of writing a 'register address' to a peripheral before reading or writing it. Call them, maybe, ReadRegisterBlock and WriteRegisterBlock, with an extra buffer holding the size & register (the register could probably be a 'long' instead of a buffer, since I have never seen a register address longer than 4 bytes).

Just my $0.02.

lfschrickte · 05-23-2013

Hi,

I've got the same issue here, with a different setup.

I am not using MQXLite, only PE routines, on the KL14 MKL14Z64VFM4 device. Whether the problem shows up depends on the SPI frequency I select. For example, with a 20MHz bus clock and a 1MHz SPI bus frequency the problem happens; with an 833kHz SPI bus frequency, however, everything works fine. It clearly depends on the three conditions Victor stated above.

I've also disabled chip select toggling, as Kan Li suggested, without success. I am monitoring the signals with a scope as Victor did.

I've also noticed that by raising the bus clock I can use faster SPI bus frequencies without problems, but above some threshold the problem appears again. This is so disappointing! The problem happens with or without interrupt services enabled.

Hope there is some solution!

Thanks!

Kan_Li · 05-23-2013

Hi, I'm sorry, but I failed to reproduce the issue. I ran the test on a TWR-K40X256 board with the following timing settings in my project,

and got the expected result, shown below:

To simulate a higher-priority interrupt, I used PIT0 with priority 0 and added a for loop in its handler to extend its duration.

I did observe delays between some characters within one transaction, but the CS signal still stayed low in that case.

I also tried removing PIT0 and instead adding a for loop in the SPI ISR right before or after it updates SPIx_PUSHR; there were constant delays between characters, but CS still stayed low.

So I suspect there is a difference between our configurations. I have attached my project as well; please refer to it for details.

BTW, I disabled automatic code generation for debugging purposes, so please generate the code manually whenever you modify any configuration in the module.

Please kindly let me know if the issue is still there.

Have a nice weekend!

VictorLorenzo · 05-28-2013

Hi Kan,

I've finally had to tune things, since SPI communication introduces a latency in system response that is totally unacceptable given our application requirements.

I followed your proposed parameter configuration:

The result was 'as expected': chip select is deasserted 2.56us after the last CLK pulse of the first character, and both the clock period (240ns) and CS_to_CLK (240ns) are correct.

The really annoying part was that I was executing two consecutive _Write() operations (one byte each), and at first sight the second byte was 'missing in action'. I found it eventually, just as the configuration parameters would predict, 1.15ms after the first character.

After changing the PE parameters to these (below):

the final result was:

which clearly shows that the two writes are treated as two independent transactions.

I think one important PE driver method is missing here: one that sends data to the Tx FIFO without saying 'this is the end of the frame'.

Finally, I think I'll need to write my own SPI driver.

Kan_Li · 05-28-2013

Hi VictorLorenzo,

Yes, your understanding is right. ???_SPI_SendBlock sends one SPI frame (transaction), and the frame can be 1 byte or longer, so if it is called twice in a row, two SPI frames are sent and two chip select cycles are observed. We therefore recommend that the application prepare the whole SPI frame in a buffer before calling this API, so the driver can be used without modification. If your concern is that the external slave is slow and needs some delay between characters while CS stays asserted, you can add a delay in the interrupt handler, ???_SPI_Interrupt(), just as I did in my previous test. If you need precise timing, use a timer instead of a for loop to generate the delay.
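Preparing the whole frame in one buffer can be as simple as concatenating the header and payload before a single send. A minimal sketch, assuming the command/address bytes and the data are available as separate arrays; `spi_build_frame` is a hypothetical helper, not part of the PE driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the header bytes (command and/or address) and the
 * payload into one contiguous buffer, so a single SendBlock() call sends the
 * whole transaction as one SPI frame. Returns the frame length, or 0 if the
 * destination buffer is too small (or the frame would be empty). */
static size_t spi_build_frame(uint8_t *dst, size_t dst_cap,
                              const uint8_t *hdr, size_t hdr_len,
                              const uint8_t *payload, size_t payload_len)
{
    size_t total = hdr_len + payload_len;

    if (total == 0u || total > dst_cap) {
        return 0u;
    }
    if (hdr_len > 0u) {
        memcpy(dst, hdr, hdr_len);
    }
    if (payload_len > 0u) {
        memcpy(dst + hdr_len, payload, payload_len);
    }
    return total;
}
```

The resulting buffer would then go out in one call, e.g. SM1_SendBlock(SPI_DEV, out, len). The price is an extra copy and a buffer sized for the largest transaction.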

Hope that helps,

Kan

VictorLorenzo · 05-28-2013

Hi Kan, thanks for your comments.

No, it's not a matter of writing to slow devices (or at least not 'that' slow).

I'm working with a device that requires CS to stay low during the whole transaction. A transaction has one or two parts; for example:

1) Single-byte command;
2) Register(s) write: one address byte followed by one or more data bytes;
3) Register(s) read: the first byte is an address byte, and the device outputs one or more bytes according to the pulses sent in through CLK.

In the multi-part cases (2 and 3: an address write followed by a data write or read), CS must be kept asserted for the whole transaction; otherwise the state machine in the chip just resets its communication engine.
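Since SPI is full duplex, every byte read must be paid for with a byte clocked out while CS is low. The three transaction types above can be summarized with a small layout helper; this is a sketch assuming a one-byte address as described, and the type and function names are hypothetical.

```c
#include <stddef.h>

typedef enum { XFER_COMMAND, XFER_REG_WRITE, XFER_REG_READ } xfer_kind_t;

typedef struct {
    size_t bytes_clocked; /* total bytes shifted while CS is held low */
    size_t rx_offset;     /* index in the RX stream where useful data starts */
    size_t rx_len;        /* number of useful bytes received */
} xfer_layout_t;

/* data_len: bytes written (REG_WRITE) or read (REG_READ); ignored for COMMAND. */
static xfer_layout_t xfer_layout(xfer_kind_t kind, size_t data_len)
{
    xfer_layout_t l = {0u, 0u, 0u};

    switch (kind) {
    case XFER_COMMAND:   /* one command byte, nothing useful comes back */
        l.bytes_clocked = 1u;
        break;
    case XFER_REG_WRITE: /* address byte + data bytes */
        l.bytes_clocked = 1u + data_len;
        break;
    case XFER_REG_READ:  /* address byte + one dummy byte per byte read */
        l.bytes_clocked = 1u + data_len;
        l.rx_offset = 1u;
        l.rx_len = data_len;
        break;
    }
    return l;
}
```

The point the helper makes explicit is that a register read of n bytes still needs 1 + n bytes clocked out in the same CS-low window.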

From the auto-generated code and some samples I've seen, it seems that the silicon IP does not generate CLK pulses automatically for reading, so _SendBlock() has to be called after _ReceiveBlock().

Am I right about that? Is there any other way to do write+read transactions without calling _SendBlock() twice?

One side comment: slow interrupt handling code (because of complexity or delays) will almost certainly degrade overall system performance. Adding delays (or even accessing slow devices) in interrupt handlers is usually considered bad programming/design practice.

Kan_Li · 05-29-2013

Thanks for the information! Now I completely understand your concerns and the required timing. You are right about SCK generation in the silicon IP: _ReceiveBlock() only sets up the RX buffer to receive the data, and _SendBlock() is needed to generate the clock with a dummy write. So for your case there are two ways to implement the timing without changing the driver.

For example, the first byte sent is the command, the second is the address, and the third is the data to be read.

Method 1: prepare two buffers of the same length (3 bytes here) and use the following code snippet:

TX_BUFF[0] = command;
TX_BUFF[1] = address;
TX_BUFF[2] = 0; // dummy write
SM1_ReceiveBlock(SPI_DEV, RX_BUFF, 3);
SM1_SendBlock(SPI_DEV, TX_BUFF, 3);

Once the transfer has completed, the application has the data read in RX_BUFF[2].
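Method 1 generalizes to reads of any length: command and address first, then one dummy byte per byte to read, with the data landing in the RX buffer from index 2 onward. A minimal sketch; `prepare_read_frame` and `SPI_DUMMY_BYTE` are hypothetical names, and 0x00 as the dummy value simply follows the snippet above (any value the slave ignores would do).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPI_DUMMY_BYTE 0x00u /* value clocked out while reading (assumption) */

/* Fill tx with command + address + n_read dummy bytes.
 * Returns the frame length, or 0 if tx is too small. After the
 * ReceiveBlock/SendBlock pair completes, the data read starts at rx[2]. */
static size_t prepare_read_frame(uint8_t *tx, size_t cap, uint8_t command,
                                 uint8_t address, size_t n_read)
{
    size_t len = 2u + n_read;

    if (len > cap) {
        return 0u;
    }
    tx[0] = command;
    tx[1] = address;
    memset(tx + 2, SPI_DUMMY_BYTE, n_read);
    return len;
}
```

Usage would mirror the snippet above: call SM1_ReceiveBlock(SPI_DEV, RX_BUFF, len) first, then SM1_SendBlock(SPI_DEV, TX_BUFF, len), and read the result from RX_BUFF[2] onward once the transfer completes.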

Method 2: configure CS as a GPIO output; the application polls SPIx_SR[EOQF] to detect the end of the transfer. See the following for details.

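// CS low (GPIO)
TX_BUFF[0] = command;
SM1_SendBlock(SPI_DEV, TX_BUFF, 1);
TX_BUFF[0] = address;
SM1_SendBlock(SPI_DEV, TX_BUFF, 1);
TX_BUFF[0] = 0 | SPI_PUSHR_EOQ_MASK; // dummy write; enable EOQ as this is the last data of the transaction
// Caution: with 8-bit frames the driver reads TX_BUFF through a uint8_t pointer (OutDataPtr),
// so the EOQ bit (bit 27) is truncated here; it has to be set in the driver's TxCommand word instead.
SM1_ReceiveBlock(SPI_DEV, RX_BUFF, 1);
SM1_SendBlock(SPI_DEV, TX_BUFF, 1);
// poll SPIx_SR[EOQF] in a while loop
// CS high
// the data read is now in RX_BUFF[0]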

Hope that helps,

Kan

VictorLorenzo · 05-29-2013

Hi Kan,

I finally found the time to modify the driver code, and I've reduced the time between the end of the write and the start of the read from 12us to about 1.5us. I think it will get even better with compiler optimisations enabled. Here is a DSO capture:

I've attached the code to this post; the new function is named ModifiedSPI_Tranceive(). It assumes every transaction is a write followed by a read; it could easily be extended to support write-only and read-only transactions.

I should say that this code is for reference only: it has not been fully tested and possibly (most surely) contains errors. And of course there may be better approaches than the one I used. I've also renamed the component root from the one in my project to ModifiedSPI; it compiles, but the renaming may have introduced some errors.
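The attached ModifiedSPI_Tranceive() code is not reproduced here. As an illustration of the underlying idea only, the sketch below builds the sequence of PUSHR command words for a write-then-read transaction: the bytes to write, then one dummy entry per byte to read, with CONT set on every entry except the last, which carries EOQ instead so completion can be detected in hardware. The bit values follow the K20/K40 DSPI PUSHR layout; the PCS0 choice and the function name are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define PUSHR_CONT 0x80000000u /* bit 31: keep PCS asserted after this entry */
#define PUSHR_EOQ  0x08000000u /* bit 27: last entry of the queue, sets SR[EOQF] */
#define PUSHR_PCS0 0x00010000u /* bits 21..16: assumption, slave on PCS0 */

/* Build n_tx + n_rx command words: write bytes first, then dummy bytes that
 * clock in the read data. Every entry keeps CS asserted except the last one,
 * which releases CS and carries EOQ. Returns the number of words written,
 * or 0 if the transaction is empty or `out` is too small. */
static size_t build_write_read_queue(uint32_t *out, size_t cap,
                                     const uint8_t *tx, size_t n_tx,
                                     size_t n_rx, uint8_t dummy)
{
    size_t total = n_tx + n_rx;

    if (total == 0u || total > cap) {
        return 0u;
    }
    for (size_t i = 0u; i < total; ++i) {
        uint8_t data = (i < n_tx) ? tx[i] : dummy;
        uint32_t word = PUSHR_PCS0 | data;
        word |= (i + 1u == total) ? PUSHR_EOQ : PUSHR_CONT;
        out[i] = word;
    }
    return total;
}
```

With one write and two reads (the case from the bug report below), only the third word releases CS, so the whole exchange stays a single transaction.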

Thanks for your help.

Victor

VictorLorenzo · 06-01-2013

Hi,

I just found a bug in the code: CS gets deasserted before the end of the transaction. To reproduce it, start a transaction with 1 write + 2 reads; the second read is treated as an independent transaction.

To correct it, find the following code (in the interrupt handling routine) and insert the condition test marked in red.

else if (DeviceDataPrv->OutTotalWrites > 0) {
    DeviceDataPrv->OutTotalWrites--;
    TxCommand = DeviceDataPrv->TxCommand | DeviceDataPrv->DummyWriteValue;
    if ((DeviceDataPrv->OutTotalWrites == 0) && (0 != (DeviceDataPrv->XferMode & XFER_MODE_FLAG_ENDOFTRANSACT))) {
        TxCommand &= 0x7FFFFFFFU; /* clear CONT (bit 31): release CS after this character */
    }
    SPI_PDD_WriteMasterPushTxFIFOReg(SPI0_BASE_PTR, TxCommand);
}

Kan_Li · 05-29-2013

Hi Victor,

Thanks for sharing! Your solution gives us a different way to refine the PE drivers, and I think it will be very helpful for those who have the same or a similar problem. I also agree with your comments on Method 1: the implementation may be simple, but it is less readable. But have you tried Method 2? The following is my implementation of it; the time between characters has been reduced to 1.39us and could be even less. This solution doesn't need any modification of the PE drivers, so I've also attached it here for reference. Please refer to it for details.

GPIO1_ClearFieldBits(GPIO_DEV, SPI0_CS0, 1); // CS low
TX_BUFF[0] = 0x55; // send command
SM1_SendBlock(SPI_DEV, TX_BUFF, 1);
TX_BUFF[0] = 0xAA; // send address
SM1_SendBlock(SPI_DEV, TX_BUFF, 1);
*(uint32_t*)SPI_DEV |= SPI_PUSHR_EOQ_MASK; // TxCommand is the first field of the device data: mark the next data as the last of this transfer
TX_BUFF[0] = 0xFF; // send dummy
SM1_ReceiveBlock(SPI_DEV, RX_BUFF, 1);
SM1_SendBlock(SPI_DEV, TX_BUFF, 1); // read data
while (!(SPI_PDD_GetInterruptFlags(SPI0_BASE_PTR) & SPI_SR_EOQF_MASK)); // wait for the end of transfer
SPI_PDD_ClearInterruptFlags(SPI0_BASE_PTR, SPI_SR_EOQF_MASK); // clear flag
GPIO1_SetFieldBits(GPIO_DEV, SPI0_CS0, 1); // CS high
*(uint32_t*)SPI_DEV &= ~SPI_PUSHR_EOQ_MASK; // restore TxCommand for later transfers
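The while loop above spins forever if the EOQ entry is never transferred (for example, if the EOQ bit did not make it into the command word). A bounded variant, sketched under the assumption that `sr` points at the SPIx_SR register and using the K20 bit position for EOQF, could look like this; the function name and the poll budget are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_EOQF_MASK 0x10000000u /* SPIx_SR[EOQF], bit 28, write 1 to clear */

/* Poll until the entry tagged with EOQ has been shifted out, then clear the
 * flag by writing 1 to it (bits written as 0 are left unchanged).
 * Returns false on timeout so the caller can release CS and report an error
 * instead of hanging. */
static bool spi_wait_eoq(volatile uint32_t *sr, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*sr & SR_EOQF_MASK) != 0u) {
            *sr = SR_EOQF_MASK; /* w1c: clear EOQF only */
            return true;
        }
    }
    return false;
}
```

The caller would drive CS high in both cases and only trust RX_BUFF when the function returns true.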

Hope it helps,

Thanks and Best Regards,

Kan

VictorLorenzo · 06-01-2013

Thanks Kan,

The SPI_PUSHR_EOQ_MASK method could also do part of the job. The PE driver could be improved by using this (and other) silicon IP features to simplify detecting when the last byte has been streamed out, among other tasks.

VictorLorenzo · 05-29-2013

Hi Kan,

Once again, thanks for taking the time to look at it.

I had already considered the first method as a possible solution; I mentioned it in a previous post. Its major drawbacks are code portability and readability. The middleware code I'm working on has strict requirements on portability and efficient memory usage, and it must also be ported to other controllers. Low-level communication drivers are also constrained in terms of latency and memory-operation overhead.

I've already tested the solution I proposed, based on modifying the driver, and it works fine, but it still needs more fine tuning. The 12.9us (see the capture below) is the time elapsed from the end of the first write (Address -> MOSI) to the start of the second write (the trigger for reading data at MISO). I used the function ??_SPI_GetBlockSentStatus() to know when all data had been pushed to the TX FIFO before continuing with the second 'write'. Using the function ???_SPI_GetSentDataNum() instead, I reduced this time to 9.1us.

I'll make deeper changes in the driver to reduce this time as much as I can. When it's ready, I can post it here if you think it would be useful.

VictorLorenzo · 05-28-2013

Hi,

Here is a proposed solution (I'll test it first thing tomorrow morning), in case anyone is interested in this topic.

Take a look at the generated driver source code (under the generated_code folder). This is what the device data structure looks like in my case; it may vary depending on the PE parameters:

typedef struct {
    uint32_t TxCommand;                   /* Current Tx command */
    LDD_SPIMASTER_TError ErrFlag;         /* Error flags */
    uint16_t InpRecvDataNum;              /* The counter of received characters */
    uint8_t *InpDataPtr;                  /* The buffer pointer for received characters */
    uint16_t InpDataNumReq;               /* The counter of characters to receive by ReceiveBlock() */
    uint16_t OutSentDataNum;              /* The counter of sent characters */
    uint8_t *OutDataPtr;                  /* The buffer pointer for data to be transmitted */
    uint16_t OutDataNumReq;               /* The counter of characters to be send by SendBlock() */
    uint8_t SerFlag;                      /* Flags for serial communication */
    LDD_RTOS_TISRVectorSettings SavedISRSettings_Interrupt; /* {MQXLite RTOS Adapter} Saved settings of allocated interrupt vector */
    LDD_TUserData *UserData;              /* User device data structure */
} ???_SPI_TDeviceData;                    /* Device data structure type */

The first field, TxCommand, holds the initial value of the command word the driver uses to push data into the TX FIFO. The most significant bit of this command (Continuous Peripheral Chip Select Enable) controls whether the CS line stays asserted after the byte written to the FIFO has been streamed out. The driver masks this bit out (clears it) when writing the last byte of the buffer.
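To make the command word concrete, here is a small encoder for the master-mode PUSHR layout described in the K20 reference manual (CONT in bit 31, CTAS in bits 30..28, EOQ in bit 27, PCS in bits 21..16, data in bits 15..0). The helper names are hypothetical; in real code, prefer the masks from your device header.

```c
#include <stdbool.h>
#include <stdint.h>

#define PUSHR_CONT_MASK  0x80000000u               /* bit 31: continuous PCS */
#define PUSHR_CTAS_SHIFT 28u                       /* bits 30..28: CTAR select */
#define PUSHR_CTAS_MASK  (0x7u << PUSHR_CTAS_SHIFT)
#define PUSHR_EOQ_MASK   0x08000000u               /* bit 27: end of queue */
#define PUSHR_PCS_SHIFT  16u                       /* bits 21..16: PCS select */
#define PUSHR_PCS_MASK   (0x3Fu << PUSHR_PCS_SHIFT)

/* Assemble one PUSHR command word from its fields. */
static uint32_t pushr_word(bool cont, uint32_t ctas, bool eoq,
                           uint32_t pcs_bits, uint16_t data)
{
    uint32_t w = 0u;

    if (cont) {
        w |= PUSHR_CONT_MASK;
    }
    w |= (ctas << PUSHR_CTAS_SHIFT) & PUSHR_CTAS_MASK;
    if (eoq) {
        w |= PUSHR_EOQ_MASK;
    }
    w |= (pcs_bits << PUSHR_PCS_SHIFT) & PUSHR_PCS_MASK;
    w |= data;
    return w;
}

/* Equivalent of the driver's `TxCommand &= 0x7FFFFFFFU`: clear CONT so CS is
 * released after this entry. */
static uint32_t pushr_release_cs(uint32_t w)
{
    return w & ~PUSHR_CONT_MASK;
}
```

Seen this way, the "end of frame" decision is a single bit in each FIFO entry, which is why chaining writes only requires controlling when that bit is cleared.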

Modifications are simple and take less than two minutes (literally):

1) Add a boolean field to the device data: bool EndOfTransaction;
2) Add a boolean parameter to the function ???_SPI_SendBlock( ...etc..., bool EndOfTransaction );
3) Add this line of code to the function ???_SPI_SendBlock():
((?????_SPI_TDeviceDataPtr)DeviceDataPtr)->EndOfTransaction = EndOfTransaction;
(Note that it should be added before the call to SPI_PDD_EnableDmasInterrupts().)
4) In the interrupt handling function, ???_SPI_Interrupt(), locate the following lines of code:
if (DeviceDataPrv->OutSentDataNum == DeviceDataPrv->OutDataNumReq) {
    TxCommand &= 0x7FFFFFFFU;
}

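... and add an additional test for EndOfTransaction to that condition, like this:

if ((DeviceDataPrv->OutSentDataNum == DeviceDataPrv->OutDataNumReq) && DeviceDataPrv->EndOfTransaction) {
    TxCommand &= 0x7FFFFFFFU; /* release CS only after the last char of the final block */
}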
Not required at all, but... cross your fingers ;) I haven't tested the code yet; it's just an idea.

When calling the function, writes can be chained by passing FALSE for EndOfTransaction in all writes except the last one.
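The intended behavior of the modified interrupt handler can be checked on a host with a tiny model: CS is released only after the last character of a block, and only if that block was sent with EndOfTransaction set. This is a sketch that ignores timing; the helper names are hypothetical, and `sent` is assumed to be the count including the current character.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-character decision of the modified ISR: clear CONT (release CS) only
 * after the last character of a block flagged as end of transaction. */
static bool release_cs_after(uint32_t sent, uint32_t requested,
                             bool end_of_transaction)
{
    return (sent == requested) && end_of_transaction;
}

/* Count how often CS would be released while sending a series of blocks,
 * each with its own length and EndOfTransaction flag. */
static unsigned count_cs_releases(const uint16_t *lens, const bool *eot,
                                  unsigned n_blocks)
{
    unsigned releases = 0u;

    for (unsigned b = 0u; b < n_blocks; ++b) {
        for (uint32_t sent = 1u; sent <= lens[b]; ++sent) {
            if (release_cs_after(sent, lens[b], eot[b])) {
                ++releases;
            }
        }
    }
    return releases;
}
```

Three one-byte writes chained as FALSE, FALSE, TRUE give a single CS release (one transaction), while passing TRUE every time reproduces the original behavior of three separate transactions.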

One important thing: depending on the CW IDE configuration, PE may be run automatically before every build, so any change made to the auto-generated code will be lost. Kan gives all the details needed to change this project option in his previous post.

It compiles in my project and seems correct with respect to the datasheet, but that doesn't guarantee it works; it still has to be tested.

If someone has tried this approach before, or has another working solution, please let us know.

Kan_Li · 05-21-2013

There is a control bit to keep CS asserted between transfers; the details are below:

You can configure it in PE; please refer to the following for details.

Hope that helps,

VictorLorenzo · 05-23-2013

Hi Kan,

I think I made a mistake when writing use case 1.b-2); it should say "1.b-2) Transactions with VERY SHORT DELAY BETWEEN CHARS." (lowest possible interrupt latency).

I was aware of the "Continuous Peripheral Chipselect Enable" configuration parameter. The DSO capture in fact corresponds to a test run with this parameter enabled. What should be noticed in the screen capture is the timing: in PE I configured the "CLK to CS" parameter as 2.56us (±5%), and CS goes high (deasserted) right after this time has elapsed since the last CLK edge of the previous char.

The DSO capture depicts the situation in a deterministic way. The real problem was hard to debug because it was related to interrupt latency: a higher-priority interrupt was taking too long to return (4us can be too long, of course), and that caused the issue.

In the end, so as not to waste too much time on this right now, I simply manage CS manually and insert a short delay (num_bits*bit_duration) after detecting that the last TX char has been pushed to the TX FIFO.
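That manual delay can be computed from the baud rate. A minimal sketch, assuming the caller knows how many characters may still be in flight (in the FIFO plus the shift register) when the driver reports the last one as sent; the function name is hypothetical, and any configured CS-to-SCK or after-SCK delays would need to be added on top.

```c
#include <stdint.h>

/* Time the bus still needs after the driver reports the last character as
 * sent (i.e. pushed into the TX FIFO). chars_in_flight is supplied by the
 * caller: 1 if only the last character is outstanding, more if the FIFO may
 * still hold data. Rounded up so CS is never released early.
 * baud_hz must be non-zero. */
static uint32_t cs_release_delay_ns(uint32_t chars_in_flight,
                                    uint32_t bits_per_char, uint32_t baud_hz)
{
    uint64_t bits = (uint64_t)chars_in_flight * bits_per_char;
    return (uint32_t)((bits * 1000000000u + baud_hz - 1u) / baud_hz);
}
```

Rounding up trades a few nanoseconds of extra CS-low time for never cutting the last character short.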

The current driver implementation does not properly support the TX and RX FIFOs (from my modest knowledge and point of view).

There is one workaround for write+read transactions, provided you can afford 'long' enough delays between transactions: reserve one buffer of (Tx count + Rx count) bytes, with the first Tx_count bytes holding the data to emit and the rest left for reception. After calling **_SPI_ReceiveBlock( ?,Rx_count ) and **_SPI_SendBlock( ?, Tx_count ), it's just a matter of adjusting (moving) the buffer contents. Not a good enough solution (bad design/practice, in fact), but it works.

Thanks a lot for your time and your comments in your post.