<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic MMDVSQ (Memory-Mapped Divide and Square Root) in Kinetis Microcontrollers</title>
    <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761577#M46406</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The KL28 includes an MMDVSQ and today I ran a few tests of its performance and looked at whether it can be used to speed up 'standard' code in general.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First of all, this module is a (small) co-processor dedicated to performing &lt;STRONG&gt;integer square root&lt;/STRONG&gt; calculations or &lt;STRONG&gt;integer divide/remainder&lt;/STRONG&gt; calculations. NXP is adding it to some select Cortex-M0+ based processors, whose Cortex core does not support these operations as instructions, in order to give them a bit more calculating performance in applications that rely on such calculations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These are some calculation times measured on a KL28 running at 48MHz (not its top speed), compared with the time the processor takes to perform the same calculation using traditional code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ Integer square root sqrt&lt;/SPAN&gt;&lt;BR /&gt;0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.78us&lt;BR /&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.78us&lt;BR /&gt;100&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;1000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;10000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;100000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;1000000&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR 
/&gt;10000000&amp;nbsp;&amp;nbsp; 1.07us&lt;BR /&gt;100000000&amp;nbsp; 1.07us&lt;BR /&gt;0xffffffff 1.07us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These values just show the slight dependency of the calculation time on the input value. There is no comparison with a library square root since that would use floating point rather than integer, which is not very interesting for a comparison. There is also a slight overhead due to a subroutine call included in the measured time. The times in comparison to integer divides are however interesting because the integer square root is obviously efficient....&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Next are some times for calculating the quotient of an integer division (that is, the rounded-down divide result):&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ signed divide quotient&lt;/SPAN&gt;&lt;BR /&gt;1/1 0.52us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 0.52us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 0.83us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.62us&lt;BR /&gt;63 / 32&amp;nbsp; 0.64us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and in comparison to &lt;SPAN style="text-decoration: underline;"&gt;traditional code&lt;/SPAN&gt; doing the same:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1/1 1.29us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 6.45us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 1.13us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.52us&lt;BR /&gt;63 / 32&amp;nbsp; 1.96us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Interestingly, the traditional code is slightly faster in the case where the result is 0 but overall the MMDVSQ is faster, in some cases several times faster (depending on the numbers involved).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The calculation of the remainder is compared next, bearing in mind that this is the result of a modulo calculation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ signed divide remainder&lt;/SPAN&gt;&lt;BR 
/&gt;1/1 0.64us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 0.96us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 0.96us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.75us&lt;BR /&gt;63 / 32&amp;nbsp; 0.64us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;in comparison to &lt;SPAN style="text-decoration: underline;"&gt;traditional code&lt;/SPAN&gt; calculation:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1/1 1.77us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 6.92us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 1.60us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.95us&lt;BR /&gt;63 / 32&amp;nbsp; 2.44us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The MMDVSQ improves performance in all cases.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Considering general purpose code, the question was how useful it would be to make use of the MMDVSQ.&lt;BR /&gt;The following is an example of something that is often done in embedded code - it is the method used to calculate register and bit locations in the NVIC based on an interrupt ID, and similar code is probably found in many locations in an embedded project.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ptrIntSet += (iInterruptID / 32);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // move to the interrupt enable register in which this interrupt is controlled&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;*ptrIntSet = (0x01 &amp;lt;&amp;lt; (iInterruptID % 32));&amp;nbsp; // enable the interrupt&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;After adding the functions to make use of the MMDVSQ (sub-routines or in-lined) this code can now be replaced by&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ptrIntSet += (fnFastUnsignedIntegerDivide(iInterruptID, 32));&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // move to the interrupt enable register in which this interrupt is controlled&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;*ptrIntSet = (0x01 &amp;lt;&amp;lt; 
(fnFastUnsignedModulo(iInterruptID, 32)));&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // enable the interrupt&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The result is that this particular calculation (the 63 / 32 is a representative reference in the benchmark measurements) no longer takes typically &lt;STRONG&gt;70ns&lt;/STRONG&gt; to execute but instead around &lt;STRONG&gt;1us&lt;/STRONG&gt;, some 14&lt;SPAN style="text-decoration: underline;"&gt;x longer!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Therefore the result shows that the use of the MMDVSQ method for many typical embedded code tasks is not of interest since it greatly reduces efficiency.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Explanation of limitation:&lt;/SPAN&gt;&lt;BR /&gt;The reason is that the compiler does not perform integer divide or remainder calculations when the divisor is a constant power of 2. Instead it performs the operation using a much more efficient shift (or a mask for the remainder). The MMDVSQ will &lt;SPAN style="text-decoration: underline;"&gt;always&lt;/SPAN&gt; perform a division and so doesn't profit from this potential.&lt;/P&gt;&lt;P&gt;It only makes sense to use MMDVSQ routines where the divisor is a variable or a fixed value that is not a power of 2. 
In these cases it is mostly more efficient, as shown by the comparisons.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Although there are usually such locations in general project code (analog oriented rather than digital) they tend to be rather less dominant than the reference case type.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Therefore the MMDVSQ can be used to increase code efficiency &lt;EM&gt;if used carefully&lt;/EM&gt; but is not a blanket solution for increasing the efficiency of all "mod" and "div" usage, where it can instead degrade performance!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;P.S. To be absolutely fair to the MMDVSQ, when the reference case does use a volatile variable with the value 32 instead of a fixed value (forcing the integer divides) the MMDVSQ does win. The time goes down from typically 1.5us to around 1.0us....&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN&gt;Kinetis: &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fwww.utasker.com%2Fkinetis.html" rel="nofollow" target="_blank"&gt;http://www.utasker.com/kinetis.html&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;Kinetis KL28: &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fwww.utasker.com%2Fkinetis%2FFRDM-KL28Z.html" rel="nofollow" target="_blank"&gt;http://www.utasker.com/kinetis/FRDM-KL28Z.html&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 04 Dec 2017 18:36:05 GMT</pubDate>
    <dc:creator>mjbcswitzerland</dc:creator>
    <dc:date>2017-12-04T18:36:05Z</dc:date>
    <item>
      <title>MMDVSQ (Memory-Mapped Divide and Square Root)</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761577#M46406</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The KL28 includes an MMDVSQ and today I ran a few tests of its performance and looked at whether it can be used to speed up 'standard' code in general.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First of all, this module is a (small) co-processor dedicated to performing &lt;STRONG&gt;integer square root&lt;/STRONG&gt; calculations or &lt;STRONG&gt;integer divide/remainder&lt;/STRONG&gt; calculations. NXP is adding it to some select Cortex-M0+ based processors, whose Cortex core does not support these operations as instructions, in order to give them a bit more calculating performance in applications that rely on such calculations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These are some calculation times measured on a KL28 running at 48MHz (not its top speed), compared with the time the processor takes to perform the same calculation using traditional code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ Integer square root sqrt&lt;/SPAN&gt;&lt;BR /&gt;0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.78us&lt;BR /&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.78us&lt;BR /&gt;100&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.77us&lt;BR /&gt;1000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;10000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;100000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR /&gt;1000000&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.92us&lt;BR 
/&gt;10000000&amp;nbsp;&amp;nbsp; 1.07us&lt;BR /&gt;100000000&amp;nbsp; 1.07us&lt;BR /&gt;0xffffffff 1.07us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These values just show the slight dependency of the calculation time on the input value. There is no comparison with a library square root since that would use floating point rather than integer, which is not very interesting for a comparison. There is also a slight overhead due to a subroutine call included in the measured time. The times in comparison to integer divides are however interesting because the integer square root is obviously efficient....&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Next are some times for calculating the quotient of an integer division (that is, the rounded-down divide result):&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ signed divide quotient&lt;/SPAN&gt;&lt;BR /&gt;1/1 0.52us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 0.52us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 0.83us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.62us&lt;BR /&gt;63 / 32&amp;nbsp; 0.64us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and in comparison to &lt;SPAN style="text-decoration: underline;"&gt;traditional code&lt;/SPAN&gt; doing the same:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1/1 1.29us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 6.45us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 1.13us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.52us&lt;BR /&gt;63 / 32&amp;nbsp; 1.96us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Interestingly, the traditional code is slightly faster in the case where the result is 0 but overall the MMDVSQ is faster, in some cases several times faster (depending on the numbers involved).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The calculation of the remainder is compared next, bearing in mind that this is the result of a modulo calculation.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;MMDVSQ signed divide remainder&lt;/SPAN&gt;&lt;BR 
/&gt;1/1 0.64us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 0.96us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 0.96us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.75us&lt;BR /&gt;63 / 32&amp;nbsp; 0.64us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;in comparison to &lt;SPAN style="text-decoration: underline;"&gt;traditional code&lt;/SPAN&gt; calculation:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1/1 1.77us&lt;BR /&gt;0x7fffffff / 3&amp;nbsp; 6.92us&lt;BR /&gt;0x7fffffff / 0x7fffffff&amp;nbsp; 1.60us&lt;BR /&gt;2536 / 8827634&amp;nbsp; 0.95us&lt;BR /&gt;63 / 32&amp;nbsp; 2.44us&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The MMDVSQ improves performance in all cases.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Considering general purpose code, the question was how useful it would be to make use of the MMDVSQ.&lt;BR /&gt;The following is an example of something that is often done in embedded code - it is the method used to calculate register and bit locations in the NVIC based on an interrupt ID, and similar code is probably found in many locations in an embedded project.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ptrIntSet += (iInterruptID / 32);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // move to the interrupt enable register in which this interrupt is controlled&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;*ptrIntSet = (0x01 &amp;lt;&amp;lt; (iInterruptID % 32));&amp;nbsp; // enable the interrupt&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;After adding the functions to make use of the MMDVSQ (sub-routines or in-lined) this code can now be replaced by&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ptrIntSet += (fnFastUnsignedIntegerDivide(iInterruptID, 32));&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // move to the interrupt enable register in which this interrupt is controlled&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;*ptrIntSet = (0x01 &amp;lt;&amp;lt; 
(fnFastUnsignedModulo(iInterruptID, 32)));&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // enable the interrupt&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The result is that this particular calculation (the 63 / 32 is a representative reference in the benchmark measurements) no longer takes typically &lt;STRONG&gt;70ns&lt;/STRONG&gt; to execute but instead around &lt;STRONG&gt;1us&lt;/STRONG&gt;, some 14&lt;SPAN style="text-decoration: underline;"&gt;x longer!&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Therefore the result shows that the use of the MMDVSQ method for many typical embedded code tasks is not of interest since it greatly reduces efficiency.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Explanation of limitation:&lt;/SPAN&gt;&lt;BR /&gt;The reason is that the compiler does not perform integer divide or remainder calculations when the divisor is a constant power of 2. Instead it performs the operation using a much more efficient shift (or a mask for the remainder). The MMDVSQ will &lt;SPAN style="text-decoration: underline;"&gt;always&lt;/SPAN&gt; perform a division and so doesn't profit from this potential.&lt;/P&gt;&lt;P&gt;It only makes sense to use MMDVSQ routines where the divisor is a variable or a fixed value that is not a power of 2. 
In these cases it is mostly more efficient, as shown by the comparisons.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Although there are usually such locations in general project code (analog oriented rather than digital) they tend to be rather less dominant than the reference case type.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Therefore the MMDVSQ can be used to increase code efficiency &lt;EM&gt;if used carefully&lt;/EM&gt; but is not a blanket solution for increasing the efficiency of all "mod" and "div" usage, where it can instead degrade performance!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;P.S. To be absolutely fair to the MMDVSQ, when the reference case does use a volatile variable with the value 32 instead of a fixed value (forcing the integer divides) the MMDVSQ does win. The time goes down from typically 1.5us to around 1.0us....&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN&gt;Kinetis: &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fwww.utasker.com%2Fkinetis.html" rel="nofollow" target="_blank"&gt;http://www.utasker.com/kinetis.html&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;Kinetis KL28: &lt;/SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fwww.utasker.com%2Fkinetis%2FFRDM-KL28Z.html" rel="nofollow" target="_blank"&gt;http://www.utasker.com/kinetis/FRDM-KL28Z.html&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
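      <content:encoded><![CDATA[
A minimal sketch of the shift and mask forms that a compiler can substitute for a divide or remainder by the constant 32, which is the effect described in the explanation of the limitation. The helper names are hypothetical and do not come from the post, and the equivalence shown here assumes unsigned values (for signed values a compiler has to add a correction for negative numbers).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names for illustration; they are not from the post. */

/* Register index for an interrupt ID: same result as id / 32. */
static uint32_t nvic_reg_index(uint32_t id)
{
    return id >> 5;            /* 32 == 2^5, so the divide becomes a right shift */
}

/* Bit position within that register: same result as id % 32. */
static uint32_t nvic_bit_pos(uint32_t id)
{
    return id & 31u;           /* the remainder becomes a mask of the low 5 bits */
}

/* Compares the shift/mask forms with a real divide/modulo for IDs 0..max_id.
   Returns 1 if every ID gives identical results, otherwise 0. */
static int shift_mask_matches_divide(uint32_t max_id)
{
    for (uint32_t id = 0; id <= max_id; id++) {
        if (nvic_reg_index(id) != id / 32u || nvic_bit_pos(id) != id % 32u) {
            return 0;
        }
    }
    return 1;
}
```

Each helper is a single ALU operation, whereas a routine built on the MMDVSQ has to write the operands to the module and read the result back, which is consistent with the large difference measured for the 63 / 32 reference case.
]]></content:encoded>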
      <pubDate>Mon, 04 Dec 2017 18:36:05 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761577#M46406</guid>
      <dc:creator>mjbcswitzerland</dc:creator>
      <dc:date>2017-12-04T18:36:05Z</dc:date>
    </item>
    <item>
      <title>Re: MMDVSQ (Memory-Mapped Divide and Square Root)</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761578#M46407</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;In Security systems a constant execution time is actually more important than the fastest execution time.&lt;/P&gt;&lt;P&gt;Variability can lead to side channel timing attacks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Would the Square Root be beneficial to doing the square root of the sum of the squares?&amp;nbsp; Comes up often in Accelerometer projects.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Dec 2017 13:41:59 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761578#M46407</guid>
      <dc:creator>bobpaddock</dc:creator>
      <dc:date>2017-12-05T13:41:59Z</dc:date>
    </item>
    <item>
      <title>Re: MMDVSQ (Memory-Mapped Divide and Square Root)</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761579#M46408</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Bob&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;SW based square root calculations also have an execution time that depends on the input value, because the number of program iterations needed to approximate the result is not always the same.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is also seen in the SW &lt;EM&gt;integer divide&lt;/EM&gt; result, which varies by a factor of 10 in time depending on the input values. With the MMDVSQ the 'jitter' is less and so it would be safer/less predictable.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To achieve constant timing a HW timer can be used to start the calculation and to return the result after a fixed time (longer than the worst case calculation duration). The KL28 also has a TSTMR (Time Stamp Timer Module) which can be used to synchronise such operations to a us resolution.&lt;/P&gt;&lt;P&gt;E.g.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;disable_int();&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;x = TSTMR0_L;&lt;BR /&gt;(void)TSTMR0_H;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;y = (x + 3);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;while (TSTMR0_L == x) {&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; // wait for next us boundary&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (void)TSTMR0_H;&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;}&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(void)TSTMR0_H;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;result = fnIntegerSQRT(input);&amp;nbsp; // takes between approx. 
0.7us and 1.1us&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;while (TSTMR0_L != y) {&lt;/STRONG&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // wait for us match boundary&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; (void)TSTMR0_H;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;}&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;enable_int();&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;return result;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will give 1us jitter due to the synchronisation of the clock to the instruction but the result will always take 2us to be returned, irrespective of its value.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If your sums of squares are integers and you need an integer RMS, the MMDVSQ will do it. The maximum summed square input is limited to 0xffffffff and the maximum RMS result is 0xffff.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;For floating point RMS I use the CMSIS &lt;STRONG&gt;arm_sqrt_f32()&lt;/STRONG&gt;.&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;For accelerometer to velocity to displacement measurements I use CMSIS &lt;STRONG&gt;arm_cfft_f32()&lt;/STRONG&gt; to perform the integration in the frequency domain (and remove DC offsets).&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
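      <content:encoded><![CDATA[
A rough sketch of the integer "square root of the sum of the squares" discussed in this reply, for a 3-axis accelerometer sample. It assumes 16-bit signed samples, and isqrt32() is only a portable software stand-in for the hardware square root (fnIntegerSQRT() in the post); both function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the hardware square root: floor(sqrt(x)) for 32-bit x.
   The result always fits in 16 bits (0xffffffff gives 0xffff). */
static uint16_t isqrt32(uint32_t x)
{
    uint32_t result = 0;
    uint32_t bit = (uint32_t)1 << 30;   /* highest power of 4 in 32 bits */

    while (bit > x) {                   /* start at the highest power of 4 <= x */
        bit >>= 2;
    }
    while (bit != 0) {                  /* resolve one result bit per iteration */
        if (x >= result + bit) {
            x -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)result;
}

/* Integer magnitude of a 3-axis sample: floor(sqrt(x^2 + y^2 + z^2)). */
static uint16_t magnitude3(int16_t x, int16_t y, int16_t z)
{
    /* Each square is at most 2^30, so the sum of three always fits in 32 bits. */
    uint32_t sum = (uint32_t)((int32_t)x * x)
                 + (uint32_t)((int32_t)y * y)
                 + (uint32_t)((int32_t)z * z);
    return isqrt32(sum);
}
```

With 16-bit samples the summed square stays below the 0xffffffff input limit mentioned above, so no saturation check is needed in this case; wider samples or more terms would need one.
]]></content:encoded>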
      <pubDate>Tue, 05 Dec 2017 16:41:56 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/MMDVSQ-Memory-Mapped-Divide-and-Square-Root/m-p/761579#M46408</guid>
      <dc:creator>mjbcswitzerland</dc:creator>
      <dc:date>2017-12-05T16:41:56Z</dc:date>
    </item>
  </channel>
</rss>

