Mark Butcher

MMDVSQ (Memory-Mapped Divide and Square Root)

Discussion created by Mark Butcher on Dec 4, 2017
Latest reply on Dec 5, 2017 by Mark Butcher

Hi All

 

The KL28 includes an MMDVSQ, and today I ran a few tests of its performance and looked at how it could be used to speed up 'standard' code in general.

 

First of all, this module is a (small) co-processor dedicated to integer square root and integer divide/remainder calculations. NXP is adding it to selected Cortex-M0+ based processors, whose core doesn't support these operations as instructions, in order to give them a bit more calculating performance in applications that rely on such calculations.

 

These are some calculation times measured on a KL28 running at 48MHz (not its top speed), each compared with the time the processor needs to perform the same calculation using traditional code.

 

MMDVSQ integer square root
Input      Time
0          0.77us
1          0.78us
2          0.77us
9          0.78us
100        0.77us
1000       0.92us
10000      0.92us
100000     0.92us
1000000    0.92us
10000000   1.07us
100000000  1.07us
0xffffffff 1.07us

 

These values just show the slight dependency on the input value. There is no comparison with a library square root since that would use floating point rather than integer, which is not very interesting for a comparison. The measured time also includes a slight overhead due to a subroutine call. The times are however interesting in comparison to the integer divides, because the integer square root is obviously efficient....
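For anyone wanting to check the results (not the timing) against software, the following is a minimal sketch of an integer square root that returns the rounded-down root of a 32 bit unsigned value. It is a generic bit-by-bit method and the name isqrt32 is just my choice for illustration; it is not a description of how the MMDVSQ works internally.

```c
#include <stdint.h>

/* Software reference: returns floor(sqrt(x)) for a 32 bit unsigned input. */
uint32_t isqrt32(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = (uint32_t)1 << 30;    /* highest power of four that fits in 32 bits */

    while (bit > x) {                    /* skip powers of four larger than the input */
        bit >>= 2;
    }
    while (bit != 0) {                   /* resolve one result bit per pass */
        if (x >= (result + bit)) {
            x -= (result + bit);
            result = ((result >> 1) + bit);
        }
        else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}
```

Calling isqrt32(1000) gives 31 and isqrt32(0xffffffff) gives 65535, which are the values to expect from an integer (rounded-down) square root of the same inputs.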

 

 

Next are some times for calculating the quotient of an integer division (that is the rounded-down divide result):

 

MMDVSQ signed divide quotient
1 / 1                    0.52us
0x7fffffff / 3           0.52us
0x7fffffff / 0x7fffffff  0.83us
2536 / 8827634           0.62us
63 / 32                  0.64us

 

and in comparison to traditional code doing the same:

 

1 / 1                    1.29us
0x7fffffff / 3           6.45us
0x7fffffff / 0x7fffffff  1.13us
2536 / 8827634           0.52us
63 / 32                  1.96us

 

Interestingly, the traditional code is slightly faster in the case where the result is 0, but overall the MMDVSQ is faster, by anything from a small margin to several times (depending on the numbers involved).

 

 

The calculation of the remainder is compared next, bearing in mind that this is the result of a modulo calculation.

 

MMDVSQ signed divide remainder
1 / 1                    0.64us
0x7fffffff / 3           0.96us
0x7fffffff / 0x7fffffff  0.96us
2536 / 8827634           0.75us
63 / 32                  0.64us

 

in comparison to traditional code calculation:

 

1 / 1                    1.77us
0x7fffffff / 3           6.92us
0x7fffffff / 0x7fffffff  1.60us
2536 / 8827634           0.95us
63 / 32                  2.44us

 

The MMDVSQ improves performance in all cases.

 

 

Considering general purpose code, the question is how useful it would be to make use of the MMDVSQ.
The following is an example of something that is often done in embedded code: it is the method used to calculate register and bit locations in the NVIC based on an interrupt ID. Similar code is probably found in many locations in an embedded project.

 

ptrIntSet += (iInterruptID / 32);        // move to the interrupt enable register in which this interrupt is controlled
*ptrIntSet = (0x01 << (iInterruptID % 32));  // enable the interrupt

 

 

After adding the functions to make use of the MMDVSQ (sub-routines or in-lined) this code can now be replaced by

 

ptrIntSet += (fnFastUnsignedIntegerDivide(iInterruptID, 32));        // move to the interrupt enable register in which this interrupt is controlled
*ptrIntSet = (0x01 << (fnFastUnsignedModulo(iInterruptID, 32)));     // enable the interrupt

 

The result is that this particular calculation (the 63 / 32 is a representative reference in the benchmark measurements) no longer takes typically 70ns to execute but instead around 1us, some 14x longer!

 

Therefore the result shows that the use of the MMDVSQ method for many typical embedded code tasks is not of interest since it greatly reduces efficiency.
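For reference, this is a minimal sketch of what unsigned divide and modulo helpers of this kind could look like. It is not the code that was measured. The register order (DEND, DSOR, CSR, RES, RCND), the CSR bit values and the names mmdvsq_regs_t, mmdvsq_udiv and mmdvsq_urem are my assumptions for illustration and should be checked against the MMDVSQ chapter of the reference manual and the device header. The register block is passed as a pointer (its address would come from the device header) so that the sketch also compiles off-target.

```c
#include <stdint.h>

/* Assumed MMDVSQ register layout - verify against the reference manual. */
typedef struct {
    volatile uint32_t DEND;    /* dividend */
    volatile uint32_t DSOR;    /* divisor */
    volatile uint32_t CSR;     /* control/status */
    volatile uint32_t RES;     /* result */
    volatile uint32_t RCND;    /* radicand (square root input) */
} mmdvsq_regs_t;

#define MMDVSQ_CSR_USGN 0x00000002u    /* assumed: unsigned calculation */
#define MMDVSQ_CSR_REM  0x00000004u    /* assumed: return remainder instead of quotient */

/* Unsigned quotient: configure first, then load the operands. */
uint32_t mmdvsq_udiv(mmdvsq_regs_t *regs, uint32_t dividend, uint32_t divisor)
{
    regs->CSR = MMDVSQ_CSR_USGN;                       /* unsigned, quotient */
    regs->DEND = dividend;
    regs->DSOR = divisor;                              /* assumed to start the divide (fast start left enabled) */
    return regs->RES;                                  /* assumed to wait until the result is ready */
}

/* Unsigned remainder: same sequence with the remainder bit set. */
uint32_t mmdvsq_urem(mmdvsq_regs_t *regs, uint32_t dividend, uint32_t divisor)
{
    regs->CSR = (MMDVSQ_CSR_USGN | MMDVSQ_CSR_REM);    /* unsigned, remainder */
    regs->DEND = dividend;
    regs->DSOR = divisor;
    return regs->RES;
}
```

Even in-lined, each call costs three register writes and one read before any calculation time is added, which is why it can't compete with a single shift or mask instruction.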

 

Explanation of limitation:
The reason is that the compiler does not perform an integer divide or remainder calculation when the divisor is a constant power of 2. Instead it performs the operation with a much more efficient shift (or a mask in the case of the remainder). The MMDVSQ always performs a full division and so doesn't profit from this shortcut.
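As a small illustration of this, for unsigned values a divide by 32 gives the same result as a right shift by 5 and a modulo 32 gives the same result as a mask with 31. The function names below are only for illustration; what a particular compiler actually emits depends on the compiler and its optimisation settings, and signed values need a little extra correction code.

```c
#include <stdint.h>

/* Divide and modulo by 32 as they are written in the C source. */
uint32_t div32_plain(uint32_t id) { return (id / 32u); }
uint32_t mod32_plain(uint32_t id) { return (id % 32u); }

/* Equivalent forms for unsigned values, needing no divide at all. */
uint32_t div32_shift(uint32_t id) { return (id >> 5); }
uint32_t mod32_mask(uint32_t id)  { return (id & 31u); }
```

For an interrupt ID of 63 both forms give register offset 1 and bit number 31.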

The only locations where it makes sense to use the MMDVSQ routines are those where the divisor is a variable or a fixed value that is not a power of 2. In these cases it is mostly more efficient, as shown by the comparisons.
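One possible way of applying this rule automatically is sketched below. It assumes a GCC or Clang compatible compiler, since it relies on __builtin_constant_p. The names hw_udiv, IS_CONST_POW2 and FAST_UDIV are hypothetical, and hw_udiv is only a placeholder that uses the normal operator so that the sketch compiles anywhere; on the target it would call the MMDVSQ routine instead.

```c
#include <stdint.h>

/* Placeholder: on the target this would be the MMDVSQ based divide routine. */
uint32_t hw_udiv(uint32_t n, uint32_t d)
{
    return (n / d);
}

/* True only when the divisor is a compile-time constant power of 2. */
#define IS_CONST_POW2(d) \
    (__builtin_constant_p(d) && ((d) != 0) && (((d) & ((d) - 1)) == 0))

/* Leave constant power of 2 divisors to the compiler, route all others to the helper. */
/* Note that the divisor expression is evaluated more than once.                       */
#define FAST_UDIV(n, d) \
    (IS_CONST_POW2(d) ? ((uint32_t)(n) / (uint32_t)(d)) \
                      : hw_udiv((uint32_t)(n), (uint32_t)(d)))
```

With this, FAST_UDIV(iInterruptID, 32) stays a plain divide that the compiler can reduce to a shift, while FAST_UDIV(value, 3) or a divide by a variable goes through the helper.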

 

Although there are usually such locations in general project code (analog oriented rather than digital), they tend to be rather less dominant than the reference case type.

 

Therefore the MMDVSQ can be used to increase code efficiency if used carefully, but it is not a blanket solution for increasing the efficiency of all "mod" and "div" usage, where it can instead degrade performance!

 

Regards

 

Mark

 

P.S. To be absolutely fair to the MMDVSQ, when the reference case does use a volatile variable with the value 32 instead of a fixed value (forcing the integer divides) the MMDVSQ does win. The time goes down from typically 1.5us to around 1.0us....

 

 


Kinetis: http://www.utasker.com/kinetis.html
Kinetis KL28: http://www.utasker.com/kinetis/FRDM-KL28Z.html
