Hi All

The KL28 includes an MMDVSQ. Today I ran a few tests of its performance and looked at how it could be used to speed up 'standard' code in general.

First of all, this module is a (small) co-processor dedicated to **integer square root** and **integer divide/remainder** calculations. NXP adds it to some select Cortex-M0+ based processors, whose Cortex core does not support these operations, to give them more calculating performance in applications that rely on such calculations.

Below are calculation times measured on a KL28 running at 48 MHz (not its top speed), compared with the time the processor takes to perform the same calculation with traditional code.

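MMDVSQ integer square root:

| Input | Time |
|---|---|
| 0 | 0.77 µs |
| 1 | 0.78 µs |
| 2 | 0.77 µs |
| 9 | 0.78 µs |
| 100 | 0.77 µs |
| 1000 | 0.92 µs |
| 10000 | 0.92 µs |
| 100000 | 0.92 µs |
| 1000000 | 0.92 µs |
| 10000000 | 1.07 µs |
| 100000000 | 1.07 µs |
| 0xffffffff | 1.07 µs |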

These figures mainly show that the time depends slightly on the input value. There is no comparison with a library square root, since that would use floating point rather than integer arithmetic, which makes it uninteresting as a reference. The measured time also includes a small overhead for a subroutine call. The times are interesting when compared with the integer divides below, though: the square root costs only a little more than a divide, so it is evidently efficient.
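For reference, the kind of software routine that the MMDVSQ square root replaces is the classic bit-by-bit (digit-by-digit) integer square root, sketched below. This is one common approach, not necessarily what any particular library or the MMDVSQ itself uses internally, and `isqrt_u32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit (digit-by-digit) integer square root: returns floor(sqrt(x))
 * for any 32-bit unsigned input, using only shifts, adds and compares. */
static uint32_t isqrt_u32(uint32_t x)
{
    const uint32_t input = x;
    uint32_t result = 0;
    uint32_t bit = 1UL << 30;          /* highest power of four in 32 bits */

    while (bit > x) {                  /* start at the highest power of four <= x */
        bit >>= 2;
    }
    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    /* Debug check (compiled out with NDEBUG): result^2 <= input < (result+1)^2 */
    assert((uint64_t)result * result <= input &&
           ((uint64_t)result + 1) * ((uint64_t)result + 1) > input);
    (void)input;
    return result;
}
```

The loop runs once per pair of input bits, so its cost grows with the magnitude of the input, much like the small steps visible in the MMDVSQ table.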

Next are the times for calculating the quotient of an integer division (the integer part of the result, with any fraction discarded):

MMDVSQ signed divide, quotient:

| Operation | Time |
|---|---|
| 1 / 1 | 0.52 µs |
| 0x7fffffff / 3 | 0.52 µs |
| 0x7fffffff / 0x7fffffff | 0.83 µs |
| 2536 / 8827634 | 0.62 µs |
| 63 / 32 | 0.64 µs |

For comparison, traditional code doing the same:

| Operation | Time |
|---|---|
| 1 / 1 | 1.29 µs |
| 0x7fffffff / 3 | 6.45 µs |
| 0x7fffffff / 0x7fffffff | 1.13 µs |
| 2536 / 8827634 | 0.52 µs |
| 63 / 32 | 1.96 µs |

Interestingly, the traditional code is slightly faster when the quotient is 0 (2536 / 8827634), but otherwise the MMDVSQ wins, by anything from about 1.4x up to about 12x (0x7fffffff / 3) depending on the operands.
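The traditional-code times grow with the size of the quotient: 0x7fffffff / 3 is by far the slowest, while a zero quotient is fastest. A shift-subtract loop, which produces one quotient bit per iteration, behaves this way. The following is a minimal unsigned sketch (`soft_udiv` is a hypothetical name); the toolchain's actual runtime helper (for example `__aeabi_idiv` under the ARM EABI) is not reproduced here and may be optimized quite differently. Signed versions typically divide the magnitudes and fix up the signs afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restoring shift-subtract division. The loop runs once per quotient bit,
 * so a zero quotient needs one pass while 0x7fffffff / 3 needs 30. */
static uint32_t soft_udiv(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    assert(divisor != 0);

    uint32_t d = divisor;
    uint32_t quotient = 0;
    int shift = 0;

    /* Align the divisor under the dividend's top bit (2*d <= dividend, no overflow). */
    while (d <= (dividend >> 1)) {
        d <<= 1;
        shift++;
    }
    /* One trial subtraction per quotient bit, most significant first. */
    for (; shift >= 0; shift--) {
        quotient <<= 1;
        if (dividend >= d) {
            dividend -= d;
            quotient |= 1u;
        }
        d >>= 1;
    }
    if (remainder != NULL) {
        *remainder = dividend;        /* what is left over is the remainder */
    }
    return quotient;
}
```

A hardware divider such as the MMDVSQ avoids paying for this loop in software, which is why its times stay nearly flat across the operands.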

Next the remainder calculation is compared, bearing in mind that this is the result of a modulo (`%`) operation.

MMDVSQ signed divide, remainder:

| Operation | Time |
|---|---|
| 1 / 1 | 0.64 µs |
| 0x7fffffff / 3 | 0.96 µs |
| 0x7fffffff / 0x7fffffff | 0.96 µs |
| 2536 / 8827634 | 0.75 µs |
| 63 / 32 | 0.64 µs |

Traditional code for comparison:

| Operation | Time |
|---|---|
| 1 / 1 | 1.77 µs |
| 0x7fffffff / 3 | 6.92 µs |
| 0x7fffffff / 0x7fffffff | 1.60 µs |
| 2536 / 8827634 | 0.95 µs |
| 63 / 32 | 2.44 µs |

The MMDVSQ improves performance in all cases.
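In the traditional-code measurements the remainder consistently takes about 0.45 µs longer than the quotient. One plausible explanation (an assumption, not verified against the library source) is that the remainder is derived from the quotient with an extra multiply and subtract. Note also that C's `%` is a truncating remainder rather than a mathematical modulo, so the two differ for negative operands. The sketch below shows both; `remainder_from_quotient` and `floored_mod` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* C99 and later: '/' truncates toward zero and (a / b) * b + (a % b) == a,
 * so the remainder takes the sign of the dividend. */
static int32_t remainder_from_quotient(int32_t a, int32_t b)
{
    assert(b != 0 && !(a == INT32_MIN && b == -1));  /* undefined cases */
    int32_t q = a / b;       /* one divide ...                     */
    return a - q * b;        /* ... plus a multiply and a subtract */
}

/* Mathematical (floored) modulo: a non-zero result takes the divisor's sign. */
static int32_t floored_mod(int32_t a, int32_t b)
{
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    int32_t r = a % b;
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;              /* shift into the divisor's sign range */
    }
    return r;
}
```

For example, `-7 % 3` is -1 in C, whereas the floored modulo of -7 and 3 is 2. For non-negative operands, such as those in the tables, both give the same result.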

For general-purpose code, the question is how useful the MMDVSQ actually is.

The following is an example of something that is often done in embedded code: calculating the register and bit location in the NVIC from an interrupt ID. Similar code is probably found in many places in an embedded project.

`ptrIntSet += (iInterruptID / 32);           // move to the interrupt enable register in which this interrupt is controlled`

`*ptrIntSet = (0x01 << (iInterruptID & 31)); // enable the interrupt`
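As a host-testable sketch, the NVIC calculation can be wrapped in a function (`nvic_enable_irq` is a hypothetical name, the ID is assumed to be unsigned, and an ordinary array stands in for the NVIC set-enable registers when testing). Each 32-bit set-enable register covers 32 interrupts, so the register offset is `iInterruptID / 32` and the bit within it is `iInterruptID % 32`, equivalently `iInterruptID & 31` for an unsigned ID.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the NVIC code: select the set-enable
 * register for this interrupt, then write its bit. Writing 1 to a
 * set-enable bit enables that interrupt and writing 0 has no effect, so a
 * plain assignment (no read-modify-write) is sufficient. */
static void nvic_enable_irq(volatile uint32_t *ptrIntSet, uint32_t iInterruptID)
{
    ptrIntSet += iInterruptID / 32;              /* register offset      */
    *ptrIntSet = 1UL << (iInterruptID % 32);     /* bit within register  */
}
```

With a constant 32 and an unsigned ID, the compiler turns both operations into a shift and a mask, which is the point of the comparison that follows.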

After adding functions that use the MMDVSQ (as subroutines or inlined), this code can be replaced by:

`ptrIntSet += (fnFastUnsignedIntegerDivide(iInterruptID, 32));       // move to the interrupt enable register in which this interrupt is controlled`

`*ptrIntSet = (0x01 << (fnFastUnsignedModulo(iInterruptID, 32)));    // enable the interrupt`

As a result, this particular calculation (63 / 32 is the representative case in the benchmark measurements) no longer takes typically **70 ns** to execute but around **1 µs**, some 14x longer!

*Therefore, for many typical embedded coding tasks, using the MMDVSQ is not worthwhile, since it greatly reduces efficiency.*

Explanation of the limitation:

The reason is that the compiler does not perform an integer divide or remainder calculation at all when the divisor is a constant power of two (such as 32). Instead it uses a much more efficient shift for the quotient and, essentially, a mask for the remainder. The MMDVSQ always performs a full division and so cannot profit from this.
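The shift-and-mask equivalence that the compiler exploits can be checked on a host with this sketch (`udiv_pow2` and `umod_pow2` are hypothetical names). It holds for unsigned operands; for negative signed operands the compiler has to add a small correction, because an arithmetic shift rounds toward minus infinity while C division truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 2^k with a single shift: equals x / (1u << k) for unsigned x. */
static uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);                    /* shifting by >= 32 is undefined in C */
    return x >> k;
}

/* Remainder by 2^k with a single mask: equals x % (1u << k) for unsigned x. */
static uint32_t umod_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x & ((1u << k) - 1u);
}
```

For the NVIC example, k is 5 (32 = 2^5), so 63 / 32 becomes `63 >> 5` and 63 % 32 becomes `63 & 31`.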

The only places where it makes sense to use the MMDVSQ routines are where the divisor is a variable or a constant that is not a power of two. In these cases it is usually more efficient, as the comparisons show.
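One way to apply the MMDVSQ carefully is to route only non-power-of-two divisors to it. The sketch below assumes a hypothetical `hw_udiv()` standing in for an MMDVSQ-backed routine such as `fnFastUnsignedIntegerDivide()`; here it simply uses the C `/` operator so that the code runs on a host. `is_power_of_two` and `udiv_select` are also hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMDVSQ-backed divide; on the target this
 * would drive the co-processor instead of using the C operator. */
static uint32_t hw_udiv(uint32_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

static inline int is_power_of_two(uint32_t v)
{
    return v != 0 && (v & (v - 1u)) == 0;
}

/* Shift for power-of-two divisors, hardware divider for everything else. */
static inline uint32_t udiv_select(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    if (is_power_of_two(divisor)) {
        unsigned k = 0;
        while ((divisor >> k) != 1u) { /* k = log2(divisor) */
            k++;
        }
        return dividend >> k;
    }
    return hw_udiv(dividend, divisor);
}
```

With a constant divisor and the function inlined, an optimizing compiler can usually fold the power-of-two test away; with a variable divisor the test adds a few instructions per call, which should be weighed against the measured savings.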

Although such places usually exist in general project code (more in analog-oriented than in digital-oriented code), they tend to be much less common than the reference case type.

Therefore the MMDVSQ can increase code efficiency *if used carefully*, but it is not a blanket solution for speeding up all "mod" and "div" usage; applied indiscriminately, it can instead degrade performance!

Regards

Mark

*P.S. To be absolutely fair to the MMDVSQ: when the reference case uses a volatile variable holding the value 32 instead of a constant (forcing real integer divides), the MMDVSQ does win. The time goes down from typically 1.5 µs to around 1.0 µs.*
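The P.S. reference case can be reproduced with something like the following sketch (`divisor32`, `word_index` and `bit_index` are hypothetical names). Because the divisor is read through a `volatile` object, the compiler cannot treat it as the constant 32 and has to emit a real divide and remainder, which on a Cortex-M0+ (no hardware divide instruction) means calling a runtime helper routine.

```c
#include <stdint.h>

/* Reading the divisor through a volatile object hides its value from the
 * optimizer, so no shift/mask substitution is possible. */
static volatile uint32_t divisor32 = 32;

static uint32_t word_index(uint32_t id) { return id / divisor32; }  /* real divide    */
static uint32_t bit_index(uint32_t id)  { return id % divisor32; }  /* real remainder */
```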

Kinetis: http://www.utasker.com/kinetis.html

Kinetis KL28: http://www.utasker.com/kinetis/FRDM-KL28Z.html

In security systems a constant execution time is actually more important than the fastest execution time.

Variability can lead to side channel timing attacks.

Would the square root be beneficial for calculating the square root of the sum of the squares? That comes up often in accelerometer projects.
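For the accelerometer case, the magnitude sqrt(x² + y² + z²) can be computed entirely in integers. The sketch below assumes signed 16-bit samples and uses a software bit-by-bit `isqrt_u32` (a hypothetical name) at the point where a KL28 application could instead hand the final step to the MMDVSQ square root. With 16-bit inputs each square is at most 2^30, so the sum of three squares fits in an unsigned 32-bit value.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: floor(sqrt(x)) for any 32-bit input.
 * This is the step that could be replaced by the MMDVSQ square root. */
static uint32_t isqrt_u32(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = 1UL << 30;          /* highest power of four in 32 bits */

    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

/* Integer magnitude of a 3-axis sample. Each square is at most
 * 32768^2 = 2^30, so the sum is at most 3 * 2^30 < 2^32 (no overflow). */
static uint32_t accel_magnitude(int16_t x, int16_t y, int16_t z)
{
    uint32_t sum = (uint32_t)((int32_t)x * x) +
                   (uint32_t)((int32_t)y * y) +
                   (uint32_t)((int32_t)z * z);
    return isqrt_u32(sum);
}
```

The result is rounded down to a whole count; if more resolution is needed, the samples could be scaled up before squaring, as long as the sum still fits in 32 bits.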