Source file encoding


1,443 views
lpcware
NXP Employee
Content originally posted in LPCWare by Alexey Goncharov on Thu Apr 07 05:24:14 MST 2011
Hi there,

I have a source tree that is encoded in UTF-8 by default. I'm now working on a driver for an HD44780-based character LCD to use with FreeRTOS. As you may know, the HD44780 (and similar controllers) has its own code page for non-Latin alphabets. Below is the approach I have used before to recode Cyrillic strings to the HD44780 code page:

unsigned char rus_small_chars[]  = {0x61,0xB2,0xB3,0xB4,0xE3,
                                    0x65,0xB6,0xB7,0xB8,0xB9,
                                    0xBA,0xBB,0xBC,0xBD,0x6F,
                                    0xBE,0x70,0x63,0xBF,0x79,
                                    0xE4,0x78,0xE5,0xC0,0xC1,
                                    0xE6,0xC2,0xC3,0xC4,0xC5,
                                    0xC6,0xC7};

unsigned char rus_big_chars[]    = {0x41,0xA0,0x42,0xA1,0xE0,
                                    0x45,0xA3,0xA4,0xA5,0xA6,
                                    0x4B,0xA7,0x4D,0x48,0x4F,
                                    0xA8,0x50,0x43,0x54,0xA9,
                                    0xAA,0x58,0xE1,0xAB,0xAC,
                                    0xE2,0xAD,0xAE,0x62,0xAF,
                                    0xB0,0xB1};

................................

        hd44780_wait_ready( true );

        if( c >= 'а' && c <= 'я')
        {
            hd44780_outdata( rus_small_chars[ c - 'а'] );
        }
        else if( c >= 'А' && c <= 'Я')
        {
            hd44780_outdata( rus_big_chars[ c - 'А'] );
        }
        else if (c >= 0x20)
        {
            hd44780_outdata( c );
        }

As you can see, my source contains character constants like 'а' and 'я'. The way the recoding is done:

hd44780_outdata( rus_small_chars[ c - 'а'] );
hd44780_outdata( rus_big_chars[ c - 'А'] );

implies that the characters must be encoded in a single-byte code page (such as CP-1251 or KOI8-R). To handle this, I used the additional GCC flag "-fexec-charset=cp1251" with Eclipse in my previous projects. LPCXpresso (Windows version) reports the following error when I try to do the same:

cc1.exe: error: no iconv implementation, cannot convert from UTF-8 to cp1251

I guess this is because the GCC that ships with the LPCXpresso IDE was built without iconv support.

Any advice on how to proceed in this situation would be really appreciated.
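
For context on why the single-byte requirement arises: in UTF-8, every Cyrillic letter from 'А' (U+0410) to 'я' (U+044F) occupies two bytes, so it cannot be held in one char or compared against a plain char constant. One possible alternative that does not rely on -fexec-charset is to decode the two-byte sequence at run time and derive a table index from the code point. The sketch below is only an illustration under that assumption; the name utf8_cyrillic_index is hypothetical and is not part of the driver shown above.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original driver.
 * Looks at the UTF-8 sequence starting at s. If it encodes a Cyrillic
 * letter in U+0410..U+044F ('А'..'я'), returns its offset from U+0410
 * (0..63) and sets *len to 2. For anything else returns -1 and sets
 * *len to 1, so the caller can advance byte by byte. */
static int utf8_cyrillic_index(const char *s, size_t *len)
{
    const unsigned char b0 = (unsigned char)s[0];
    /* Only read the second byte if the first is not the terminator. */
    const unsigned char b1 = b0 ? (unsigned char)s[1] : 0;

    *len = 1;

    /* Two-byte UTF-8: 110xxxxx 10yyyyyy encodes code point xxxxxyyyyyy. */
    if ((b0 & 0xE0) == 0xC0 && (b1 & 0xC0) == 0x80) {
        unsigned cp = ((unsigned)(b0 & 0x1F) << 6) | (unsigned)(b1 & 0x3F);
        if (cp >= 0x0410 && cp <= 0x044F) {
            *len = 2;
            return (int)(cp - 0x0410);
        }
    }
    return -1;
}
```

The returned index counts upper-case letters first (0..31) and lower-case letters second (32..63), so it could be used with a 64-entry table in that order. The letters 'Ё' (U+0401) and 'ё' (U+0451) lie outside this range and would need separate handling.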

Content originally posted in LPCWare by Alexey Goncharov on Fri Apr 08 01:37:55 MST 2011

Quote: Zero
Use symbol 'CR_INTEGER_PRINTF'

See 'Reducing codesize of printf':

http://support.code-red-tech.com/CodeRedWiki/UsingPrintf



Thank you! Total code size drops to 13-14k with -O1.

Content originally posted in LPCWare by Ex-Zero on Fri Apr 08 00:44:04 MST 2011

Quote: Alexey Goncharov

The third question is about the size of the standard Code Red C library. I built my application with -O1 and the nohosting C library, and the total size (.text + .data) immediately jumped to over 20k. For comparison, the same code without the C library weighs only 7 to 8k. I'm using an LPC1343, which has only 32k of flash, so this seems strange and excessive to me. Could you please give me some recommendations? :)



Use symbol 'CR_INTEGER_PRINTF'

See 'Reducing codesize of printf':

http://support.code-red-tech.com/CodeRedWiki/UsingPrintf

Content originally posted in LPCWare by Alexey Goncharov on Fri Apr 08 00:32:30 MST 2011
Hi gbm!

I'm not sure what happened. After reading this article (in Russian) and installing the code page set for iconv as described there, -fexec-charset=cp1251 started to work for me with LPCXpresso on Linux. Cyrillic alphabet support is now available out of the box.


Quote: gbm
It's generally not a good idea to use non-Latin characters in C source. If you plan to update your strings, consider writing a simple utility that converts from the native encoding to the LCD encoding and generates C source files containing the strings encoded in hexadecimal. For your application I would use the target hardware encoding (LCD code page) rather than UTF-8.



Would it be possible to use functions like printf together with this kind of conversion utility?
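
One possible way to combine printf-style formatting with recoding, sketched under assumptions: format into a RAM buffer with vsnprintf, then pass every resulting byte through a recoding function before it is sent to the display. The names lcd_printf, recode_fn, put_fn and the demo_ stand-ins are hypothetical and exist only so the sketch can be exercised on a host; vsnprintf itself is standard C99. The code-size cost of the formatting routines is not addressed here.

```c
#include <stdarg.h>
#include <stdio.h>

typedef unsigned char (*recode_fn)(unsigned char c); /* e.g. CP-1251 -> LCD code */
typedef void (*put_fn)(unsigned char c);             /* e.g. write one byte to the LCD */

/* Hypothetical wrapper: formats like printf, recodes each byte, emits it.
 * Returns the number of bytes emitted; long output is truncated. */
static int lcd_printf(recode_fn recode, put_fn put, const char *fmt, ...)
{
    char buf[41];   /* assumed limit: 40 characters plus terminator */
    va_list ap;
    int n, i;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0) {
        return 0;                       /* formatting error: emit nothing */
    }
    if (n >= (int)sizeof buf) {
        n = (int)sizeof buf - 1;        /* vsnprintf truncated the output */
    }
    for (i = 0; i < n; i++) {
        put(recode((unsigned char)buf[i]));
    }
    return n;
}

/* Demo stand-ins (assumptions): capture output in memory instead of an LCD. */
static unsigned char demo_out[64];
static int demo_len;

static void demo_put(unsigned char c)
{
    if (demo_len < (int)sizeof demo_out) {
        demo_out[demo_len++] = c;
    }
}

/* Recodes only CP-1251 'а' (0xE0) to the LCD code 0x61 from the first post. */
static unsigned char demo_recode(unsigned char c)
{
    return (c == 0xE0) ? 0x61 : c;
}
```

With this arrangement the format string and its arguments would have to be in the single-byte encoding that the recoding function expects, for example lcd_printf(demo_recode, demo_put, "T=%d", 25).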

The third question is about the size of the standard Code Red C library. I built my application with -O1 and the nohosting C library, and the total size (.text + .data) immediately jumped to over 20k. For comparison, the same code without the C library weighs only 7 to 8k. I'm using an LPC1343, which has only 32k of flash, so this seems strange and excessive to me. Could you please give me some recommendations?

P.S.: I forgot to mention: while digging into the problem described above, I found that the Cyrillic letters are contiguous in the CP-1251 code page, with no gaps. So the code in my first post can be simplified to the following:

/* HD44780 codes for 'А'..'Я' followed by 'а'..'я'.
   unsigned char, because many of the values are above 0x7F. */
unsigned char cyrillic_chars[]  = { 0x41, 0xA0, 0x42, 0xA1, 0xE0, 0x45, 0xA3, 0xA4,
   0xA5, 0xA6, 0x4B, 0xA7, 0x4D, 0x48, 0x4F, 0xA8,
   0x50, 0x43, 0x54, 0xA9, 0xAA, 0x58, 0xE1, 0xAB,
   0xAC, 0xE2, 0xAD, 0xAE, 0x62, 0xAF, 0xB0, 0xB1,
   0x61, 0xB2, 0xB3, 0xB4, 0xE3, 0x65, 0xB6, 0xB7,
   0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0x6F, 0xBE,
   0x70, 0x63, 0xBF, 0x79, 0xE4, 0x78, 0xE5, 0xC0,
   0xC1, 0xE6, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7 };

if( c >= 'А' && c <= 'я' )
{
    hd44780_write_byte( cyrillic_chars[c - 'А'], HD44780_DATA );
}
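
A self-contained sketch of the same lookup, written with numeric CP-1251 bounds (0xC0 for 'А', 0xFF for 'я') instead of character constants, so that the result does not depend on how the source file itself is encoded. The function name lcd_recode_cp1251 is an assumption for illustration; the table values are the ones posted above. Unlike the first post, this sketch passes all other bytes through unchanged rather than dropping codes below 0x20.

```c
/* HD44780 codes for 'А'..'Я' followed by 'а'..'я', as posted above. */
static const unsigned char cyrillic_chars[64] = {
    0x41, 0xA0, 0x42, 0xA1, 0xE0, 0x45, 0xA3, 0xA4,
    0xA5, 0xA6, 0x4B, 0xA7, 0x4D, 0x48, 0x4F, 0xA8,
    0x50, 0x43, 0x54, 0xA9, 0xAA, 0x58, 0xE1, 0xAB,
    0xAC, 0xE2, 0xAD, 0xAE, 0x62, 0xAF, 0xB0, 0xB1,
    0x61, 0xB2, 0xB3, 0xB4, 0xE3, 0x65, 0xB6, 0xB7,
    0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0x6F, 0xBE,
    0x70, 0x63, 0xBF, 0x79, 0xE4, 0x78, 0xE5, 0xC0,
    0xC1, 0xE6, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7
};

/* Hypothetical helper: maps one CP-1251 byte to an HD44780 code.
 * CP-1251 places 'А'..'я' at 0xC0..0xFF; other bytes pass through. */
static unsigned char lcd_recode_cp1251(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xFF) {
        return cyrillic_chars[c - 0xC0];
    }
    return c;
}
```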

Content originally posted in LPCWare by gbm on Thu Apr 07 10:33:59 MST 2011
It's generally not a good idea to use non-Latin characters in C source. If you plan to update your strings, consider writing a simple utility that converts from the native encoding to the LCD encoding and generates C source files containing the strings encoded in hexadecimal. For your application I would use the target hardware encoding (LCD code page) rather than UTF-8.
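
A rough sketch of one step of such a host-side generator, assuming the text has already been converted to LCD codes: it formats a byte string as a C array initializer in hexadecimal. The function name format_lcd_array and the output layout are illustrative choices, not an existing tool.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical generator step: writes
 *   const unsigned char NAME[] = {0x.., ..., 0x00};
 * into out. Returns the number of characters written, or -1 if out
 * is too small to hold the whole line. */
static int format_lcd_array(char *out, size_t outsz, const char *name,
                            const unsigned char *bytes, size_t n)
{
    size_t pos, i;
    int w;

    w = snprintf(out, outsz, "const unsigned char %s[] = {", name);
    if (w < 0 || (size_t)w >= outsz) {
        return -1;
    }
    pos = (size_t)w;

    for (i = 0; i < n; i++) {
        w = snprintf(out + pos, outsz - pos, "0x%02X, ", (unsigned)bytes[i]);
        if (w < 0 || (size_t)w >= outsz - pos) {
            return -1;
        }
        pos += (size_t)w;
    }

    /* Terminating zero so the array can be used as a C string. */
    w = snprintf(out + pos, outsz - pos, "0x00};");
    if (w < 0 || (size_t)w >= outsz - pos) {
        return -1;
    }
    return (int)(pos + (size_t)w);
}
```

An array initializer is used here instead of a string literal with \x escapes because a hex escape absorbs any hex digits that follow it, which makes generated string literals easy to get wrong.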

Content originally posted in LPCWare by Alexey Goncharov on Thu Apr 07 08:11:46 MST 2011
Hi, gbm!

Thank you for the quick answer.


Quote:
Simple but not very elegant: use explicit hex codes for non-latin characters. That's what I did in my project.        

Well, I thought about this. How should I define and store constant strings (e.g. for a menu) in this case? Should I replace every character in a string such as:

const char *str_main = "Ужас";

with something like:

const uint16_t str_main[] = {0xD0A3, 0xD0B6, 0xD0B0, 0xD181, 0x0000};

If I'm right, how would I maintain strings like this in the future?


P.S.: I've just checked the Linux version of LPCXpresso (v3.8.1); the behaviour is the same.

Content originally posted in LPCWare by gbm on Thu Apr 07 06:01:52 MST 2011
Simple but not very elegant: use explicit hex codes for non-latin characters. That's what I did in my project.
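
As an illustration of this approach, the menu string "Ужас" discussed in this thread could be stored directly in LCD codes. The byte values below are derived from the tables in the first post and are an assumption to that extent; the name str_main is taken from the thread.

```c
/* "Ужас" in HD44780 codes, per the tables in the first post:
 * 'У' = 0xA9, 'ж' = 0xB6, 'а' = 0x61 ('a'), 'с' = 0x63 ('c').
 * The literal is split in two because a hex escape absorbs every hex
 * digit that follows it: "\xB6ac" would be read as one escape. */
static const char str_main[] = "\xA9\xB6" "ac";
```

Letters that share a glyph with a Latin letter can stay as plain ASCII, which keeps such strings partly readable in the source.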