A Simple Code Efficiency Comparison Between KE and PT

Document created by Weihua Liang on Nov 4, 2013. Last modified by Weihua Liang on Nov 12, 2013.
Version 3

Authors: Sam Wang & River Liang

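   Note: this document compares bit operations on an 8-bit MCU with the same operations after the system is upgraded to an MCU based on the M0+ core. Customers can use it to estimate the impact of the upgrade on code size and RAM resources.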

 

I) Simple I/O toggle comparison. Test conditions: MCUs MC9S08PT and KE02; development platform CW10.3

      1. The PT code is as follows:

         if (!PORT_PTAD_PTAD0) {
             PORT_PTAD_PTAD0 = 1;   /* pin is low: drive it high */
         } else {
             PORT_PTAD_PTAD0 = 0;   /* pin is high: drive it low */
         }

      The compiled PT code occupies 9 bytes:

               000002D4 000004   BRSET  0,PORT_PTAD,*+7    ;abs = 0x02DB

               000002D7 1000     BSET   0,PORT_PTAD

               000002D9 202E     BRA    *+48       ;abs = 0x0309

               000002DB 1100     BCLR   0,PORT_PTAD

 

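      2. KE is based on the ARM Cortex-M0+ core. The code used is as follows:

                 if (GPIOA_PDOR & 0x1) {
                     GPIOA_PCOR = 0x1;   /* output is high: clear it */
                 } else {
                     GPIOA_PSOR = 0x1;   /* output is low: set it */
                 }

      The compiled result is: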

               00000706:   ldr r3,[pc,#24]

               00000708:   ldr r2,[r3,#0]

               0000070a:   movs r3,#1

               0000070c:   ands r3,r2

               0000070e:   beq main+0x6c (0x718)       ; 0x00000718

               00000710:   ldr r3,[pc,#12]

               00000712:   movs r2,#1

               00000714:   str r2,[r3,#8]

               00000716:   b main+0x20 (0x6cc)       ; 0x000006cc

               00000718:   ldr r3,[pc,#4]

               0000071a:   movs r2,#1

               0000071c:   str r2,[r3,#4]

      This M0+ code occupies 24 bytes after compilation.
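To make the role of the set and clear registers in step 2 concrete, here is a small host-side model. It is only a sketch: `gpio_model_t` and the `gpio_write_*` helpers are hypothetical names invented for illustration, not part of any Freescale header. The model assumes that writing a 1 to a bit of PSOR or PCOR sets or clears the matching output bit, and that 0 bits leave the output unchanged.

```c
#include <stdint.h>

/* Hypothetical host-side model of one GPIO port's output data register. */
typedef struct {
    uint32_t pdor;   /* stands in for the port data output register */
} gpio_model_t;

/* Model of a write to PSOR: each 1 bit in mask sets the matching output bit. */
static void gpio_write_psor(gpio_model_t *port, uint32_t mask)
{
    port->pdor |= mask;
}

/* Model of a write to PCOR: each 1 bit in mask clears the matching output bit. */
static void gpio_write_pcor(gpio_model_t *port, uint32_t mask)
{
    port->pdor &= ~mask;
}

/* The toggle from step 2, expressed against the model. */
static void gpio_toggle_pin0(gpio_model_t *port)
{
    if (port->pdor & 0x1u) {
        gpio_write_pcor(port, 0x1u);
    } else {
        gpio_write_psor(port, 0x1u);
    }
}
```

In this model a single write with a multi-bit mask changes several outputs at once, and no read-modify-write of the data register is needed in the calling code.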

 

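      3. In the KE series, Freescale added the Bit Manipulation Engine (BME) on top of the M0+ core to improve the bit manipulation performance of the ARM core. Code using the BME is as follows:

                    /* LAS1 (0x4C000000), bit 0 (0<<21), address bits A0-A19 of GPIOA_PSOR */
                    #define PTA0_SET   (void)(*((volatile unsigned char *)(0x4C000000+(0<<21)+0xF004)))

                    /* LAS1 (0x4C000000), bit 0 (0<<21), address bits A0-A19 of GPIOA_PCOR */
                    #define PTA0_CLR   (void)(*((volatile unsigned char *)(0x4C000000+(0<<21)+0xFF008)))

                    /* UBFX (0x50000000), bit 0 (0<<23), width 1 (0<<19), address bits A0-A18 of GPIOA_PDOR */
                    #define PTA0       (*((volatile unsigned char *)(0x50000000+(0<<23)+(0<<19)+0xF000)))

               if (!(PTA0)) {
                   PTA0_SET;
               } else {
                   PTA0_CLR;
               }

      The compiled BME code for KE is as follows: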

                   165                if (!(PTA0)){ 

               00000998:   ldr r3,[pc,#24]

               0000099a:   ldrb r3,[r3,#0]

               0000099c:   uxtb r3,r3

               0000099e:   cmp r3,#0

               000009a0:   bne RTC_IRQHandler+0x18 (0x9a8); 0x000009a8

                 166                PTA0_SET;                                             //Using BME

               000009a2:   ldr r3,[pc,#20]

               000009a4:   ldrb r3,[r3,#0]

               000009a6:   b RTC_IRQHandler+0x1c (0x9ac); 0x000009ac

                 168                PTA0_CLR;      //Using FASTER GPIO

               000009a8:   ldr r3,[pc,#16]

               000009aa:   ldrb r3,[r3,#0]

      The compiled code occupies 20 bytes.
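The constants in the three BME macros of step 3 follow a fixed field layout, which the comments next to the macros spell out. The sketch below builds the same decorated addresses from their parts. The function names `bme_las1_addr` and `bme_ubfx_addr` are hypothetical, and the field positions are taken only from those macro comments, so they should be checked against the device reference manual before use.

```c
#include <assert.h>
#include <stdint.h>

/* Decorated-load address for LAS1 (load-and-set 1 bit), per the macro
 * comments: opcode 0x4C000000, bit number shifted to bit 21,
 * peripheral address bits A0-A19. */
static uint32_t bme_las1_addr(uint32_t addr, unsigned bit)
{
    assert(bit < 32u);
    return 0x4C000000u | ((uint32_t)bit << 21) | (addr & 0x000FFFFFu);
}

/* Decorated-load address for UBFX (unsigned bit field extract), per the
 * macro comments: opcode 0x50000000, starting bit shifted to bit 23,
 * (width - 1) shifted to bit 19, peripheral address bits A0-A18. */
static uint32_t bme_ubfx_addr(uint32_t addr, unsigned bit, unsigned width)
{
    assert(bit < 32u && width >= 1u && width <= 16u);
    return 0x50000000u | ((uint32_t)bit << 23)
                       | ((uint32_t)(width - 1u) << 19)
                       | (addr & 0x0007FFFFu);
}
```

For example, `bme_las1_addr(0xFF008u, 0)` gives `0x4C0FF008`, the value inside `PTA0_CLR`, and `bme_ubfx_addr(0xF000u, 0, 1)` gives `0x5000F000`, the value inside `PTA0`. The helpers mask whatever address they are given and do not decide which GPIO address form is correct for a particular part.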

 

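      4. CodeWarrior has settings for optimizing the C compiler output. They are under Project->Properties->C/C++ Build->Settings->GCC C Compiler->Optimization

          With optimization enabled, the code takes 16 bytes in total: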

                 165         if (!(PTA0)){ 

               0000091e:   ldr r3,[pc,#20]

               00000920:   ldrb r3,[r3,#0]

               00000922:   cmp r3,#0

               00000924:   bne RTC_IRQHandler+0x12 (0x92a); 0x0000092a

                 166                         PTA0_SET;                                             //Using BME

               00000926:   ldr r3,[pc,#16]

               00000928:   b RTC_IRQHandler+0x14 (0x92c); 0x0000092c

                 168                         PTA0_CLR;      //Using FASTER GPIO

               0000092a:   ldr r3,[pc,#16]

               0000092c:   ldrb r3,[r3,#0]

 

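      5. Results

       When the registers are accessed through the M0+ core alone, the ratio of KE code size to PT code size is 24:9.

       When the BME is used on KE (with compiler optimization), the ratio is 16:9.

       For testing a bit, the KE-to-PT code size ratio is 8:3.

       For setting a single bit, the KE-to-PT code size ratio is 4:2.

Bit operations on ARM cores such as the M0+ are therefore less code-efficient than on an 8-bit MCU. Using the BME noticeably improves bit manipulation performance.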
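The KE byte counts above can be re-derived from the first and last addresses of each listing, because every instruction shown is a 16-bit Thumb encoding. The helper below is a minimal sketch of that arithmetic; `thumb16_listing_bytes` is a hypothetical name. The counts cover instructions only: the literal-pool words fetched by the `ldr rX,[pc,#n]` instructions also occupy flash and do not appear to be included in the figures.

```c
#include <stdint.h>

/* Size in bytes of a listing made up only of 16-bit Thumb instructions:
 * the distance from the first instruction address to the last one,
 * plus the 2 bytes of the last instruction itself. */
static uint32_t thumb16_listing_bytes(uint32_t first_addr, uint32_t last_addr)
{
    return (last_addr - first_addr) + 2u;
}
```

With the addresses from the three KE listings, `thumb16_listing_bytes(0x706, 0x71c)` is 24, `thumb16_listing_bytes(0x998, 0x9aa)` is 20, and `thumb16_listing_bytes(0x91e, 0x92c)` is 16.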

 

II) Bit operations on a typical variable. Test conditions: MCUs MC9S08PT and KE02; development platform CW10.3

Test code:

      if (xx & 1) {
          xx &= 0xFE;   /* bit 0 is set: clear it */
      } else {
          xx |= 1;      /* bit 0 is clear: set it */
      }
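The test code is a toggle of bit 0 written as test, then clear or set. The sketch below shows the same toggle next to a single exclusive-OR and lets both be compared over every 8-bit value. The function names are hypothetical, and the XOR form was not measured in this comparison, so no code size is claimed for it.

```c
#include <stdint.h>

/* Toggle written as in the test code: test bit 0, then clear or set it. */
static uint8_t toggle_bit0_branch(uint8_t xx)
{
    if (xx & 1u) {
        xx &= 0xFEu;
    } else {
        xx |= 1u;
    }
    return xx;
}

/* The same result with a single exclusive-OR and no branch. */
static uint8_t toggle_bit0_xor(uint8_t xx)
{
    return (uint8_t)(xx ^ 1u);
}
```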

      1. On PT, with xx placed in the zero page, the result is the same as for the I/O toggle above: 9 bytes of code.

 

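   2. On KE, the compiled result is shown below. Without optimization it takes 52 bytes of code (26 instructions, counted as 26 execution cycles).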

                                                    if (xx&1){

               00000a52:   ldr r3,[pc,#64]

               00000a54:   ldrb r3,[r3,#0]

               00000a56:   uxtb r3,r3

               00000a58:   mov r2,r3

               00000a5a:   movs r3,#1

               00000a5c:   ands r3,r2

               00000a5e:   uxtb r3,r3

               00000a60:   cmp r3,#0

               00000a62:   beq main+0x5a (0xa76)       ; 0x00000a76

                    200                         xx&=0xFe;

               00000a64:   ldr r3,[pc,#44]

               00000a66:   ldrb r3,[r3,#0]

               00000a68:   uxtb r3,r3

               00000a6a:   movs r2,#1

               00000a6c:   bics r3,r2

               00000a6e:   uxtb r2,r3

               00000a70:   ldr r3,[pc,#32]

               00000a72:   strb r2,[r3,#0]

                    203         }}

               00000a74:   b main+0x28 (0xa44)       ; 0x00000a44

                    202                         xx|=1;

               00000a76:   ldr r3,[pc,#28]

               00000a78:   ldrb r3,[r3,#0]

               00000a7a:   uxtb r3,r3

               00000a7c:   movs r2,#1

               00000a7e:   orrs r3,r2

               00000a80:   uxtb r2,r3

               00000a82:   ldr r3,[pc,#16]

               00000a84:   strb r2,[r3,#0]

                    203         }}

 

      3. With optimization, it takes 22/20 bytes of code (11/10 instructions, counted as 11/10 execution cycles).

               0000095c:   ldr r3,[pc,#40]

                    199         if (xx&1){

               0000095e:   movs r2,#1

                    197         xx++;

               00000960:   ldrb r1,[r3,#0]

               00000962:   adds r1,#1

               00000964:   uxtb r1,r1

               00000966:   strb r1,[r3,#0]

                    199         if (xx&1){

               00000968:   ldrb r1,[r3,#0]

               0000096a:   tst r1,r2

               0000096c:   beq main+0x34 (0x974)       ; 0x00000974

                    200                         xx&=0xFe;

               0000096e:   ldrb r1,[r3,#0]

               00000970:   bics r1,r2

               00000972:   b main+0x38 (0x978)       ; 0x00000978

                    202                         xx|=1;

               00000974:   ldrb r1,[r3,#0]

               00000976:   orrs r1,r2

               00000978:   strb r1,[r3,#0]

               0000097a:   b main+0x20 (0x960)       ; 0x00000960

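If space is traded for time (a whole byte is used as the flag instead of a single bit), the reference code is as follows.

                if (xx == 0) {
                    xx = 1;
                } else {
                    xx = 0;
                }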

      4. If interrupt nesting has to be considered, a further 4 bytes of code are needed.
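The source does not show what those 4 bytes are. One plausible reading, offered here as an assumption, is a pair of 16-bit instructions that mask and unmask interrupts around the read-modify-write (`cpsid i` and `cpsie i` on a Cortex-M core). The sketch below uses hypothetical helper names and falls back to a simple flag when it is not built for an ARMv6-M target, so it can also be tried on a host.

```c
#include <stdint.h>

static volatile uint8_t xx;
static volatile int irq_masked_model;   /* host-side stand-in, unused on target */

/* Mask interrupts (real instruction on ARMv6-M, modelled elsewhere). */
static inline void irq_disable(void)
{
#if defined(__ARM_ARCH_6M__) && defined(__GNUC__)
    __asm__ volatile ("cpsid i" ::: "memory");
#else
    irq_masked_model = 1;
#endif
}

/* Unmask interrupts again. */
static inline void irq_enable(void)
{
#if defined(__ARM_ARCH_6M__) && defined(__GNUC__)
    __asm__ volatile ("cpsie i" ::: "memory");
#else
    irq_masked_model = 0;
#endif
}

/* Read-modify-write of xx that an interrupt handler cannot split. */
static void toggle_bit0_protected(void)
{
    irq_disable();
    xx ^= 1u;
    irq_enable();
}
```

This simple form always re-enables interrupts on exit. Code that may already run with interrupts masked would need to save and restore the previous mask state instead, which costs more than 4 bytes.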

 

      5. Result: the KE-to-PT code size ratio is at least 20:9.

     For testing a bit, the KE-to-PT code size ratio is 8:3.

 

      6. Using a byte instead of a bit, the compiled result without optimization takes 22 bytes.

                    197         if (xx==0){

               000009e8:   ldr r3,[pc,#44]

               000009ea:   ldrb r3,[r3,#0]

               000009ec:   cmp r3,#0

               000009ee:   bne main+0x38 (0x9f8)       ; 0x000009f8

                    198                         xx=1;

               000009f0:   ldr r3,[pc,#36]

               000009f2:   movs r2,#1

               000009f4:   strb r2,[r3,#0]

               000009f6:   b main+0x3e (0x9fe)       ; 0x000009fe

                    200                         xx=0;

               000009f8:   ldr r3,[pc,#28]

               000009fa:   movs r2,#0

               000009fc:   strb r2,[r3,#0]

 

   7. With optimization, it takes 16/14 bytes of code.

                    197         xx++;

               0000095c:   ldr r3,[pc,#36]

                    202                         xx=1;

               0000095e:   movs r1,#1

                    197         xx++;

               00000960:   ldrb r0,[r3,#0]

               00000962:   adds r0,#1

               00000964:   uxtb r0,r0

               00000966:   strb r0,[r3,#0]

                    199         if (xx){

               00000968:   ldrb r0,[r3,#0]

               0000096a:   cmp r0,#0

               0000096c:   beq main+0x32 (0x972)       ; 0x00000972

                    200                         xx=0;

               0000096e:   strb r2,[r3,#0]

               00000970:   b main+0x20 (0x960)       ; 0x00000960

                    202                         xx=1;

               00000972:   strb r1,[r3,#0]

               00000974:   b main+0x20 (0x960)       ; 0x00000960

 

      8. Result: if RAM space allows, the KE-to-PT code size ratio is at least 12:9.
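The RAM cost behind "if RAM space allows" can be made explicit. The sketch below compares eight flags packed into one byte with eight flags stored one per byte. All type and function names are hypothetical, and the example only illustrates the trade-off without reproducing any measurement from this comparison.

```c
#include <stdint.h>

#define FLAG_COUNT 8u

/* Eight flags packed into the bits of one byte: 1 byte of RAM,
 * but every update is a read-modify-write with a mask. */
typedef struct {
    uint8_t bits;
} packed_flags_t;

/* Eight flags stored one per byte: 8 bytes of RAM,
 * but every update is a plain byte store. */
typedef struct {
    uint8_t flag[FLAG_COUNT];
} byte_flags_t;

/* Set or clear flag n (0..7) in the packed form. */
static void packed_set(packed_flags_t *f, unsigned n, int on)
{
    if (on) {
        f->bits = (uint8_t)(f->bits | (1u << n));
    } else {
        f->bits = (uint8_t)(f->bits & ~(1u << n));
    }
}

/* Set or clear flag n (0..7) in the byte-per-flag form. */
static void byte_set(byte_flags_t *f, unsigned n, int on)
{
    f->flag[n] = on ? 1u : 0u;
}
```

Here `sizeof(packed_flags_t)` is 1 and `sizeof(byte_flags_t)` is 8, so the smaller code comes at eight times the RAM for the flags.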

 

III) Incrementing an 8-bit variable

      1. On PT, incrementing an 8-bit variable takes only 4 bytes.

               24:                 XX++;

               00000014 450000   LDHX   #XX

               00000017 7C       INC    ,X

 

      2. On the M0+, incrementing an 8-bit variable takes 14 bytes without optimization.

                    197         xx++;

               00000a44:   ldr r3,[pc,#48]

               00000a46:   ldrb r3,[r3,#0]

               00000a48:   uxtb r3,r3

               00000a4a:   adds r3,#1

               00000a4c:   uxtb r2,r3

               00000a4e:   ldr r3,[pc,#40]

               00000a50:   strb r2,[r3,#0]

 

      3. With optimization enabled, it takes 12 bytes.

                    197         xx++;

               0000095c:   ldr r3,[pc,#36]

                    202                         xx=1;

               0000095e:   movs r1,#1

                    197         xx++;

               00000960:   ldrb r0,[r3,#0]

               00000962:   adds r0,#1

               00000964:   uxtb r0,r0

               00000966:   strb r0,[r3,#0]

 

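      4. Result: when incrementing an 8-bit variable, the KE-to-PT code size ratio is at least 12:4. This loss of efficiency on 8-bit variables is, however, common to all 32-bit ARM cores.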
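The `uxtb` instructions in the M0+ listings are where the extra cost comes from: the registers are 32 bits wide, so the result of an 8-bit increment has to be truncated back to 8 bits to wrap at 256. The sketch below shows the difference in C. The function names are hypothetical, and the suggestion to use a register-width type where 8-bit wrap-around is not required is a general observation that this comparison did not measure.

```c
#include <stdint.h>

/* 8-bit counter: the result must wrap at 256, so a 32-bit core has to
 * truncate the register value back to 8 bits after the add. */
static uint8_t inc_u8(uint8_t v)
{
    return (uint8_t)(v + 1u);
}

/* Register-width counter: no truncation step is needed. */
static uint32_t inc_u32(uint32_t v)
{
    return v + 1u;
}
```

`inc_u8(255)` wraps to 0, while `inc_u32(255)` gives 256.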

 

IV) 16-bit + 8-bit addition

      1. Compiled result on the 8-bit MCU: 8 bytes.

               00000008 320000    LDHX   xx

               0000000B AF01     AIX    #1

               0000000D 960000   STHX   xx

 

      2. Compiled result on the M0+ without optimization: 10 bytes.

               00000a44:   ldr r3,[pc,#44]

               00000a46:   ldr r3,[r3,#0]

               00000a48:   adds r2,r3,#1

               00000a4a:   ldr r3,[pc,#40]

               00000a4c:   str r2,[r3,#0]

 

      3. Compiled result on the M0+ with optimization: 8 bytes.

               0000095c:   ldr r3,[pc,#20]

               0000095e:   ldr r2,[r3,#0]

               00000960:   adds r2,#1

               00000962:   str r2,[r3,#0]

 

      4. Result: for 16-bit addition the M0+ matches the code efficiency of the 8-bit MCU (8 bytes each).

 

V) Conclusion

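    Therefore, when porting code from PT (or another 8-bit MCU) to KE02, part selection should take full account of the specific operations in the customer's existing code. In principle, the code can become larger after moving to KE.

  On the other hand, a 32-bit M0+ core such as the one in KE is more efficient for multiplication and addition at 16 bits or wider, taking less code space and less execution time.

  In addition, the GPIO control registers on KE offer some features that PT lacks, such as operating on several I/O pins at once, which is a useful capability.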
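As an illustration of the point about 16-bit or wider multiply and add operations, the sketch below shows a multiply-accumulate on 16-bit samples. On a 32-bit core the operands and the accumulator each fit in one register, whereas an 8-bit core has to build the same result from narrower steps. The function names are hypothetical and no measurement from this comparison backs the example.

```c
#include <stdint.h>

/* Multiply two 16-bit samples and add the product to a 32-bit accumulator. */
static int32_t mac16(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;
}

/* Dot product of two 16-bit vectors of length n, accumulated in 32 bits. */
static int32_t dot16(const int16_t *x, const int16_t *y, unsigned n)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        acc = mac16(acc, x[i], y[i]);
    }
    return acc;
}
```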
