<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Inline asm GCC vs. MW (e500v2) in P-Series</title>
    <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443779#M2571</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Because the first instruction is rlwinm, whose output is a pure output rather than rlwimi's read-modify-write, you don't need the + when you combine them all into one asm statement.&amp;nbsp; Eliminating the + gets rid of the li instruction.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 15 Sep 2015 15:17:43 GMT</pubDate>
    <dc:creator>scottwood</dc:creator>
    <dc:date>2015-09-15T15:17:43Z</dc:date>
    <item>
      <title>Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443774#M2566</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Consider the following function:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;static uint32 u32SwapEndianness(register uint32 val)&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register uint32 ret = 0;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwinm %0, %1, 24, 0,&amp;nbsp; 31" : "=r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwimi %0, %1, 8,&amp;nbsp; 8,&amp;nbsp; 15" : "=r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwimi %0, %1, 8,&amp;nbsp; 24, 31" : "=r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return ret;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This generates different code with the MW compiler and the GCC compiler.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below are the instructions that MW emits. r31=ret, r3=val&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;li r31,0&lt;/P&gt;&lt;P&gt;rotrwi r31,r3,24&lt;/P&gt;&lt;P&gt;rlwimi r31,r3,8,8,15&lt;/P&gt;&lt;P&gt;inslwi r31,r3,8,24&lt;/P&gt;&lt;P&gt;mr r3,r31&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is very clean; it is how I expected the output to be.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below are the instructions that GCC emits. r9=ret, r10=val&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;mr r9,r3&lt;/P&gt;&lt;P&gt;li r30,0&lt;/P&gt;&lt;P&gt;rotrwi r10,r9,24&lt;/P&gt;&lt;P&gt;mr r30,r10&lt;/P&gt;&lt;P&gt;rlwimi r10,r9,8,8,15&lt;/P&gt;&lt;P&gt;mr r30,r10&lt;/P&gt;&lt;P&gt;inslwi r9,r9,8,24&lt;/P&gt;&lt;P&gt;mr r30,r9&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here, &lt;STRONG&gt;first, I don't understand why GCC uses the weird swap/alias between r10/r30 and r9/r3&lt;/STRONG&gt;. It is not necessary, but not wrong.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Second, the operands of the last instruction, inslwi, are wrong&lt;/STRONG&gt;. 
They should be r10,r9...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;GCC is 4.8.2 (gcc-4.8.2-Ee500v2-eabispe) and the MW compiler is 4.3 build 278. Both are supplied with CW 10.4. The GCC option '-Wa,-mregnames' was used.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 11 Sep 2015 12:02:57 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443774#M2566</guid>
      <dc:creator>juhalaukkanen</dc:creator>
      <dc:date>2015-09-11T12:02:57Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443775#M2567</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;What optimization parameter did you pass to GCC?&amp;nbsp; If you didn't enable any optimizations, it's not surprising that the generated code is far from optimal.&amp;nbsp; I'm able to reproduce this with -O0 but not with -O1 or -O2.&amp;nbsp; BTW, with optimizations on, you shouldn't need the register keyword.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Why is using r9 wrong in inslwi?&amp;nbsp; You don't show the code that uses the result.&amp;nbsp; As long as that code uses r9 or r30, it's correct.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 11 Sep 2015 17:55:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443775#M2567</guid>
      <dc:creator>scottwood</dc:creator>
      <dc:date>2015-09-11T17:55:06Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443776#M2568</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, the optimization levels do indeed remove those unnecessary moves.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, I still don't understand how the GCC-generated version can be right. Here is my sample with values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;// MW&lt;/P&gt;&lt;P&gt;r3=0x49fb6, r31=n/a&lt;/P&gt;&lt;P&gt;rotrwi r31,r3,24 //r31=0xb600049f, r3=0x49fb6&lt;/P&gt;&lt;P&gt;rlwimi r31,r3,8,8,15 //r31=0xb69f049f, r3=0x49fb6&lt;/P&gt;&lt;P&gt;inslwi r31,r3,8,24 //r31=0xb69f0400, r3=0x49fb6&lt;/P&gt;&lt;P&gt;mr r3,r31 //r3=ret&lt;/P&gt;&lt;P&gt;//ret -&amp;gt; 0xb69f0400 -&amp;gt; OK!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;// GCC&lt;/P&gt;&lt;P&gt;r9=0x49fb6, r10=n/a&lt;/P&gt;&lt;P&gt;rotrwi r10,r9,24 // r10 = 0xb600049f, r9=0x49fb6&lt;/P&gt;&lt;P&gt;rlwimi r10,r9,8,8,15 // r10 = 0xb69f049f, r9=0x49fb6&lt;/P&gt;&lt;P&gt;inslwi r9,r9,8,24 // r9=0x49f00, r10 = 0xb69f049f&lt;/P&gt;&lt;P&gt;mr r3,r9 //r3=ret&lt;/P&gt;&lt;P&gt;//ret -&amp;gt; 0x49f00 -&amp;gt; WRONG!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, there is a single instruction that does the same thing. It is emitted by GCC's __builtin_bswap32().&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;r9=0x49fb6; r0,r7 give the memory location where the store should go (&amp;amp;ret).&lt;/P&gt;&lt;P&gt;stwbrx r9,r0,r7&lt;/P&gt;&lt;P&gt;//*ret -&amp;gt; 0xb69f0400 -&amp;gt; OK!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 15 Sep 2015 06:58:59 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443776#M2568</guid>
      <dc:creator>juhalaukkanen</dc:creator>
      <dc:date>2015-09-15T06:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443777#M2569</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Juha,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also get 0x49f00 with the code generated by gcc. However, in my opinion the compiler does what you asked. The instructions you wrote are right, but there is a problem with how inline asm (which is error-prone) is used and how the operands are declared.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In each statement, you ask for an operation that writes "ret" with "val" as input, but nothing says that the previously modified "ret" must be reused: "=r" declares a write-only output.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MW happens not to use an additional register, so by luck it reuses the same destination register in the third instruction.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For me, the following code works; "+r" says that "ret" is both read and written, which prevents it from being clobbered:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;static unsigned int u32SwapEndianness(register unsigned int val)&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register unsigned int ret = 0;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwinm %0, %1, 24, 0,&amp;nbsp; 31" : "=r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwimi %0, %1, 8,&amp;nbsp; 8,&amp;nbsp; 15" : "+r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwimi %0, %1, 8,&amp;nbsp; 24, 31" : "+r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return ret;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, the generated code is not very good, and writing independent asm statements like that is not good practice either. 
To be cleaner (even if, again, that is difficult with inline asm), the code should be written as a single asm statement:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;static unsigned int u32SwapEndianness(register unsigned int val)&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; register unsigned int ret = 0;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwinm %0, %1, 24, 0,&amp;nbsp; 31 \n\t"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "rlwimi %0, %1, 8,&amp;nbsp; 8,&amp;nbsp; 15 \n\t"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "rlwimi %0, %1, 8,&amp;nbsp; 24, 31 \n\t" : "+r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return ret;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;That gives something like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r10,r3&lt;/P&gt;&lt;P&gt;&amp;nbsp; li r31,0&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r9,r31&lt;/P&gt;&lt;P&gt;&amp;nbsp; rotrwi r9,r10,24&lt;/P&gt;&lt;P&gt;&amp;nbsp; rlwimi r9,r10,8,8,15&lt;/P&gt;&lt;P&gt;&amp;nbsp; inslwi r9,r10,8,24&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r31,r9&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r9,r31&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r3,r9&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These days (and on this CPU architecture), I think the register keyword is of little use. Here it is even worse, because it forces the use of registers and adds overhead. 
With these keywords removed, the code looks like:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; li r9,0&lt;/P&gt;&lt;P&gt;&amp;nbsp; rotrwi r9,r10,24&lt;/P&gt;&lt;P&gt;&amp;nbsp; rlwimi r9,r10,8,8,15&lt;/P&gt;&lt;P&gt;&amp;nbsp; inslwi r9,r10,8,24&lt;/P&gt;&lt;P&gt;&amp;nbsp; mr r3,r9&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mathias&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 15 Sep 2015 08:54:55 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443777#M2569</guid>
      <dc:creator>mathiasparnaude</dc:creator>
      <dc:date>2015-09-15T08:54:55Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443778#M2570</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you. Indeed, the separate asm statements were not tied together, which resulted in a malformed sequence.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PS: Yes, this pasted block of asm is rather useless on its own, but it is a simplified, reproducible example of a similar concept where using inline asm was applicable.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 15 Sep 2015 13:23:32 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443778#M2570</guid>
      <dc:creator>juhalaukkanen</dc:creator>
      <dc:date>2015-09-15T13:23:32Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443779#M2571</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Because the first instruction is rlwinm, whose output is a pure output rather than rlwimi's read-modify-write, you don't need the + when you combine them all into one asm statement.&amp;nbsp; Eliminating the + gets rid of the li instruction.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 15 Sep 2015 15:17:43 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443779#M2571</guid>
      <dc:creator>scottwood</dc:creator>
      <dc:date>2015-09-15T15:17:43Z</dc:date>
    </item>
    <item>
      <title>Re: Inline asm GCC vs. MW (e500v2)</title>
      <link>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443780#M2572</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Scott,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;That looked right to me, but I tried it: with the single asm statement, if I use "=r" instead of "+r", the result is wrong, because r9 is used as both source and destination for each instruction, so the first rlwinm overwrites val before the rlwimi instructions read it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Another solution is to use the early-clobber modifier (&amp;amp;), like this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;static unsigned int u32SwapEndianness(unsigned int val)&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; unsigned int ret;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; asm volatile("rlwinm %0, %1, 24, 0,&amp;nbsp; 31 \n\t"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "rlwimi %0, %1, 8,&amp;nbsp; 8,&amp;nbsp; 15 \n\t"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "rlwimi %0, %1, 8,&amp;nbsp; 24, 31 \n\t" : "=&amp;amp;r"(ret) : "r"(val));&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return ret;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;At the same time, I removed the unnecessary initialization of ret.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 16 Sep 2015 12:40:45 GMT</pubDate>
      <guid>https://community.nxp.com/t5/P-Series/Inline-asm-GCC-vs-MW-e500v2/m-p/443780#M2572</guid>
      <dc:creator>mathiasparnaude</dc:creator>
      <dc:date>2015-09-16T12:40:45Z</dc:date>
    </item>
  </channel>
</rss>

