Errata ERR050181: Implementation and Alternate Workaround

parth_rastogi10 · ‎07-11-2022

Hi NXP,

We are using the MKE18F512VLH16 controller in our application. As we are close to the end of development, we have analyzed the erratum "ERR050181: LPIT CVAL cannot be read correctly during timer running" and implemented the Triple Vote method to handle it, as suggested in another thread.

In our application, the timer is read as a 64-bit value using two channels in chain mode. With the Triple Vote method, the timer is read three times and the best value is returned by comparing the three readings. We also added logic to detect whether the erratum occurred and whether the triple vote would have corrected it. After running the system for two consecutive days, however, we never saw the erratum occur; the first read always gave the best value.
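
The poster's Triple Vote code is not shown in this thread. As a rough illustration only, one plausible reading of "read three times and return the best value" is a median-of-three vote: a single outlier among three samples can never be the middle value. The function name median3_u64 is hypothetical, and this is a swapped-in sketch, not NXP's published workaround.

```c
#include <stdint.h>

/* Hypothetical helper: return the middle value of three timer samples.
 * Computes max(min(a, b), min(max(a, b), c)) without sorting all three. */
uint64_t median3_u64(uint64_t a, uint64_t b, uint64_t c)
{
    if (a > b) {                 /* order the first pair so that a <= b */
        uint64_t t = a;
        a = b;
        b = t;
    }
    if (b > c) {                 /* b becomes min(b, c) */
        b = c;
    }
    return (a > b) ? a : b;      /* the larger of a and min(b, c) */
}
```

With the three readings quoted later in this post (0x1B9FFFFFFF7, 0x1B900000001, 0x1BA0000000B), the middle value is 0x1B9FFFFFFF7, so the second reading with the stale upper word is rejected.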

Based on the implementation and these observations, we have the following queries:

1. The erratum says that CVAL may be read incorrectly while the timer is running, but it does not clearly describe in which cases this can occur. If the issue is rare and the cause is unknown, how far off can the CVAL reading be, and how much does that affect the time value in terms of delay?
2. We sometimes observed that when we read the timer CVAL values consecutively and the timer value overflows, the upper 32-bit word is not updated in the second read. For example, we use two channels (Channel 0 and Channel 1) in chain mode to form a 64-bit value. When we read the timer three times and get Timer1: 0x1B9FFFFFFF7, Timer2: 0x1B900000001, Timer3: 0x1BA0000000B, the upper 32-bit word in Timer2 has not been updated; only in the third read does the upper word show the actual rollover count.
- Is this observation the behavior described in erratum ERR050181?
- Or is it expected because the timer values are read consecutively?
3. As we are using the MKE18F512VLH16 controller, does erratum ERR050181 apply to this revision of the controller?

We look forward to a quick response, as this implementation affects system behavior and is delaying our major milestones.

Thanks & Regards,

Parth Rastogi

bobpaddock · ‎07-19-2022

"For Example, we are using 2 channels (Channel 0 and Channel 1) in a chain mode to make it a 64-Bit value. When we read the timer value thrice i.e., Timer1: 0x1B9FFFFFFF7, Timer2: 0x1B900000001, Timer3: 0x1BA0000000B, it is observed that when the Timer2 value is read, the higher 32-Bit timer value is not updated and when the timer value is read for the third time, the higher 32-Bit timer value is updated with the actual rollover count value."

A properly designed micro will have facilities to latch multi-byte values so that they are stable during reads. When chaining things together, there is no such latch across the whole span. This is actually expected behavior, though few expect it. How often the 'bad reads' happen depends on the sampling rate, which in turn depends on system clock speeds, counter frequencies, how often the software reads, where interrupts fall, and so on. That makes the problem look 'random' when it isn't, as well as rare. A two-day sample run is not long enough. It could take months or years for things to fall 'just right'. It comes down to Murphy's Law: it will happen at the worst possible time.

This is the style of code I use for such situations:

#include <stdint.h>

/* Assumed to be defined elsewhere and incremented in the timer interrupt */
extern volatile uint32_t system_ticks_u32;

/*
 * Return system_tick count while accounting for possible timer
 * interrupt happening while reading the counter:
 */
uint32_t system_ticks_get( void )
{
    uint32_t t1_u32 = system_ticks_u32; /* Read twice to make sure that */
    uint32_t t2_u32 = system_ticks_u32; /* an interrupt did not occur during the read */

    while( t1_u32 != t2_u32 ) /* If the readings do not match, */
    {
        t1_u32 = system_ticks_u32; /* try again */
        t2_u32 = system_ticks_u32;
    }

    return( t1_u32 ); /* Return the stable time */
}

I originally had code in there to count split reads.

There never were any, so I removed it to speed up the read function.
Even if I've never seen the issue, I know that it can happen, so I protect against it.

More details here:

http://www.ganssle.com/articles/asynchf.htm
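
For the chained 64-bit case described in the question, the same idea is often written as a high-low-high read: read the upper word, then the lower word, then the upper word again, and retry if the upper word changed. The sketch below is only an illustration under stated assumptions. The register reads are replaced by simulated functions (sim_read_hi, sim_read_lo, and sim_count are hypothetical names), and the simulated counter counts up to match the values quoted in the thread. It covers only the rollover tear between chained channels, not whatever ERR050181 itself does to a single CVAL read.

```c
#include <stdint.h>

/* Hypothetical simulated 64-bit count. On real hardware this would be
 * the two chained channel registers, not a variable. */
uint64_t sim_count = 0x1B9FFFFFFFFULL;

/* Simulated register reads. Reading the low word advances the count by 2
 * to mimic the timer running between the two accesses. */
uint32_t sim_read_hi(void) { return (uint32_t)(sim_count >> 32); }
uint32_t sim_read_lo(void) { sim_count += 2u; return (uint32_t)sim_count; }

typedef uint32_t (*read32_fn)(void);

/* Naive read: upper word, then lower word. If the lower word rolls over
 * between the two reads, the result pairs a stale upper word with a
 * fresh lower word. */
uint64_t read64_naive(read32_fn read_hi, read32_fn read_lo)
{
    uint32_t hi = read_hi();
    uint32_t lo = read_lo();
    return ((uint64_t)hi << 32) | lo;
}

/* High-low-high read: accept the sample only if the upper word was the
 * same before and after the lower word was read; otherwise try again. */
uint64_t read64_stable(read32_fn read_hi, read32_fn read_lo)
{
    uint32_t hi1, lo, hi2;

    do {
        hi1 = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi1 != hi2);

    return ((uint64_t)hi1 << 32) | lo;
}
```

Starting from sim_count = 0x1B9FFFFFFFF, the naive read returns 0x1B900000001, the same torn value as Timer2 in the question, while the stable read retries once and returns 0x1BA00000003. The torn value is off by exactly one rollover of the lower word, that is 2^32 counts.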

parth_rastogi10 · ‎07-18-2022

Hi NXP,

Could you please respond to the above query regarding LPIT erratum ERR050181?

We have to finalize this implementation as this is impacting the System behavior and further delaying our major Milestones for release.

@lukaszadrapa

Thanks & Regards,

Parth Rastogi

PabloAvalos · ‎07-21-2022

Hi @parth_rastogi10

Thanks a lot for reaching out to technical support. I really appreciate your patience.

I think that bobpaddock answered your concern, but if you need more help or an answer of my own, it would be my pleasure to assist you. Do not hesitate to reply to this post.

Please let me know if you have more questions.

Best Regards.
Pablo Avalos.

parth_rastogi10 · ‎07-27-2022

Hi @PabloAvalos,

A Gentle Reminder!

Could you please provide some details on when the erratum occurs?

Thanks & Regards,

Parth Rastogi

PabloAvalos · ‎07-27-2022

Hi @parth_rastogi10

Thanks a lot for your reply, and please accept my apologies for the delay.

I am still working on a precise answer to your concern, so please give me a little more time and I will get back to you as soon as possible.

Thanks in advance!
Sincerely,
Pablo Avalos.

parth_rastogi10 · ‎07-29-2022

Hi @PabloAvalos,

I got your reply through the mail and it seems like we are facing some issues.

We have implemented the alternate workaround for erratum ERR050181 suggested by NXP (the Triple Vote method) and just need to know in what scenario the erratum (an incorrect read of CVAL) may occur.

Also, @bobpaddock's answer mentions that it could take months or years for things to fall "just right". Can we get an explanation of what "just right" means? Does the system need to run continuously for months or years, or can power cycles happen in between?

Please refer to the NXP ticket link below for the suggested alternate workaround; I have also attached the mail response:

https://community.nxp.com/t5/S32K/Tolerance-of-LPIT-CVAL-Current-timer-value/td-p/1273291

Also, please post replies and queries in this ticket only, as it is tracked at multiple levels.

Thanks & Regards,

Parth Rastogi

parth_rastogi10 · ‎08-01-2022

Hi @PabloAvalos

Can you please reply to the above query about the scenario in which the erratum may occur?

Please respond soon, as waiting for this information is delaying our testing and the build release to the customer.

Thanks & Regards,

Parth Rastogi

PabloAvalos · ‎08-02-2022

Hi @parth_rastogi10

Thanks a lot for your replies.

After checking internally and talking with my colleagues: because this is an erratum, we cannot say with certainty when the issue may occur. We only know that it can happen right after a power-on reset or after months or years of running, as you mentioned. A specific scenario does not exist; it might happen at any time, and we cannot force the MCU to reproduce the issue on demand.

I hope this helps. Please let me know if you have more questions or comments.

Thank you in advance.
Sincerely,
Pablo Avalos.

bobpaddock · ‎07-29-2022

With extreme analytics and a supercomputer, one could calculate when this kind of timing problem will happen.

It is best to treat it as a random event that can happen anytime, or never happen at all, and move on with life.

It can happen moments after a power cycle or years after a power cycle.
It comes down to sampling times across multiple clock domains, where interrupts fall, the phase of the Moon etc.

At some point we accept that we have done the best we can possibly do, with the triple read vote or the version of code I posted, and ship the product. Everything in this industry is a tradeoff.

In an extreme safety application we'd be designing our own chips with 64-bit Gray-encoded synchronous counters and proper 64-bit read latches. Someday the chip industry might get us there in commodity chips. Soon, I hope.

bobpaddock · ‎07-29-2022

The S32K issue is one of crossing multiple clock domains; it is different from chaining two 32-bit counters together to make a unified 64-bit counter. Alas, the end result is the same: corrupted counter reads.

The S32K bit issue would be solved if the hardware used Gray encoding rather than a ripple-carry counter. Alas, that is not how the chip is designed.
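
To illustrate why Gray encoding helps: in a Gray-coded counter exactly one bit changes per count, so a read that lands mid-transition can only return the old or the new value. In plain binary, a single increment can flip many bits at once. The sketch below uses the standard binary/Gray conversions in software purely as a demonstration; the function names are hypothetical and nothing here describes how the LPIT hardware is built.

```c
#include <stdint.h>

/* Standard binary-to-Gray conversion. */
uint64_t binary_to_gray(uint64_t n)
{
    return n ^ (n >> 1);
}

/* Inverse conversion: fold the value onto itself with doubling shifts
 * (1, 2, 4, 8, 16, 32), which XORs each bit with all bits above it. */
uint64_t gray_to_binary(uint64_t g)
{
    for (unsigned shift = 1; shift < 64; shift <<= 1) {
        g ^= g >> shift;
    }
    return g;
}

/* Count how many bit positions differ between two values. */
unsigned bits_changed(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For the rollover from 0x1B9FFFFFFFF to 0x1BA00000000, the binary representation changes 34 bits in one step, while the Gray-coded representation changes only one.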

parth_rastogi10 · ‎07-21-2022

Hi @PabloAvalos

Thanks for your reply.

I just need to know in what scenarios this erratum may occur.

Can you please explain more, as we didn't see the issue occur when we ran the system continuously for a couple of days?

Thanks & Regards,

Parth Rastogi
