Source file encoding

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Alexey Goncharov on Thu Apr 07 05:24:14 MST 2011
Hi there,

My source tree is encoded in UTF-8 by default. I'm working on a driver for an HD44780-based character LCD to use with FreeRTOS. As you may know, the HD44780 (and similar controllers) has its own code page for non-Latin alphabets. Below is the way I've previously recoded Cyrillic strings to the HD44780 code page:

unsigned char rus_small_chars[] = {0x61,0xB2,0xB3,0xB4,0xE3,
                                    0x65,0xB6,0xB7,0xB8,0xB9,
                                    0xBA,0xBB,0xBC,0xBD,0x6F,
                                    0xBE,0x70,0x63,0xBF,0x79,
                                    0xE4,0x78,0xE5,0xC0,0xC1,
                                    0xE6,0xC2,0xC3,0xC4,0xC5,
                                    0xC6,0xC7};

unsigned char rus_big_chars[]   = {0x41,0xA0,0x42,0xA1,0xE0,
                                    0x45,0xA3,0xA4,0xA5,0xA6,
                                    0x4B,0xA7,0x4D,0x48,0x4F,
                                    0xA8,0x50,0x43,0x54,0xA9,
                                    0xAA,0x58,0xE1,0xAB,0xAC,
                                    0xE2,0xAD,0xAE,0x62,0xAF,
                                    0xB0,0xB1};

................................

        hd44780_wait_ready( true );

        if( c >= 'а' && c <= 'я')
        {
            hd44780_outdata( rus_small_chars[ c - 'а'] );
        }
        else if( c >= 'А' && c <= 'Я')
        {
            hd44780_outdata( rus_big_chars[ c - 'А'] );
        }
        else if (c >= 0x20)
        {
            hd44780_outdata( c );
        }

As you can see, my source contains character constants like 'а' and 'я'. The way the recoding works:

hd44780_outdata( rus_small_chars[ c - 'а'] );
hd44780_outdata( rus_big_chars[ c - 'А'] );

implies that these constants must compile to single-byte values, i.e. the compiler has to use a single-byte code page such as CP-1251 for them (KOI8-R is single-byte too, but its letters are not in alphabetical order, so the tables above would not match). In my previous projects with Eclipse I fixed this by adding the GCC flag "-fexec-charset=cp1251". When I tried the same in LPCXpresso (Windows version), it threw the following:

cc1.exe: error: no iconv implementation, cannot convert from UTF-8 to cp1251

I guess this is because the GCC shipped with the LPCXpresso IDE was built without iconv support.

Any advice on how to handle this situation would be much appreciated.
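
One way around the missing iconv support, sketched here under assumptions rather than taken from the thread: keep the strings in UTF-8 and decode them at run time, so that no -fexec-charset flag is needed at all. The helper names lcd_from_codepoint and lcd_from_utf8 are hypothetical; the tables are the ones from the post above. Only 1- and 2-byte UTF-8 sequences are handled, which is enough for ASCII and А..я.

```c
#include <stddef.h>
#include <stdint.h>

/* Tables copied from the post above: LCD codes for А..Я and а..я. */
static const unsigned char rus_big_chars[32] = {
    0x41,0xA0,0x42,0xA1,0xE0,0x45,0xA3,0xA4,0xA5,0xA6,0x4B,
    0xA7,0x4D,0x48,0x4F,0xA8,0x50,0x43,0x54,0xA9,0xAA,0x58,
    0xE1,0xAB,0xAC,0xE2,0xAD,0xAE,0x62,0xAF,0xB0,0xB1 };
static const unsigned char rus_small_chars[32] = {
    0x61,0xB2,0xB3,0xB4,0xE3,0x65,0xB6,0xB7,0xB8,0xB9,0xBA,
    0xBB,0xBC,0xBD,0x6F,0xBE,0x70,0x63,0xBF,0x79,0xE4,0x78,
    0xE5,0xC0,0xC1,0xE6,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7 };

/* Hypothetical helper: map one Unicode code point to an LCD code.
   Returns 0 for characters that the tables do not cover. */
static unsigned char lcd_from_codepoint(uint32_t cp)
{
    if (cp >= 0x0410 && cp <= 0x042F)          /* А..Я */
        return rus_big_chars[cp - 0x0410];
    if (cp >= 0x0430 && cp <= 0x044F)          /* а..я */
        return rus_small_chars[cp - 0x0430];
    if (cp >= 0x20 && cp < 0x80)               /* printable ASCII passes through */
        return (unsigned char)cp;
    return 0;
}

/* Hypothetical helper: convert a UTF-8 string to LCD codes.
   Unsupported bytes are skipped. Returns the number of codes written. */
static size_t lcd_from_utf8(const char *src, unsigned char *dst, size_t dst_size)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != 0 && n < dst_size) {
        uint32_t cp;
        unsigned char code;

        if (*s < 0x80) {                        /* 1-byte sequence */
            cp = *s++;
        } else if ((*s & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
            s += 2;                             /* 2-byte sequence */
        } else {
            s++;                                /* not handled in this sketch */
            continue;
        }
        code = lcd_from_codepoint(cp);
        if (code != 0)
            dst[n++] = code;
    }
    return n;
}
```

Each code in the output buffer would then be passed to hd44780_outdata(). The trade-off is a decode loop at run time and two bytes of storage per Cyrillic letter, in exchange for a build that does not depend on iconv.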

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Alexey Goncharov on Fri Apr 08 01:37:55 MST 2011

Quote: Zero
Use symbol 'CR_INTEGER_PRINTF'

See 'Reducing codesize of printf':

http://support.code-red-tech.com/CodeRedWiki/UsingPrintf

Thank you! Total code size dropped to 13-14k with -O1.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Ex-Zero on Fri Apr 08 00:44:04 MST 2011

Quote: Alexey Goncharov

The third question is about the size of the standard Code Red C library. I've built my application with -O1 and the nohost C library. Right after that, the total size (.text + .data) of the application jumped to 20k+. For comparison, the same code without the C library weighs only 7-8k. I'm using an LPC1343, which has only 32k of flash, so this is unexpected and hard to live with. Could you please give me some recommendations? :)

Use symbol 'CR_INTEGER_PRINTF'

See 'Reducing codesize of printf':

http://support.code-red-tech.com/CodeRedWiki/UsingPrintf

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Alexey Goncharov on Fri Apr 08 00:32:30 MST 2011
Hi gbm!

I'm not sure what happened. After reading this article (in Russian) and installing the code page set for iconv as it describes, -fexec-charset=cp1251 started to work for me with LPCXpresso on Linux, and Cyrillic support became available "out of the box".

Quote: gbm
It's generally not a good idea to use non-Latin characters in C source. If you plan to update your strings, consider writing a simple native-encoding to LCD-encoding utility that creates C source files containing the strings encoded in hexadecimal. For your application I would use the target hardware encoding (LCD code page) rather than UTF-8.

Would it be possible to use functions like printf together with that kind of conversion utility?

The third question is about the size of the standard Code Red C library. I've built my application with -O1 and the nohost C library. Right after that, the total size (.text + .data) of the application jumped to 20k+. For comparison, the same code without the C library weighs only 7-8k. I'm using an LPC1343, which has only 32k of flash, so this is unexpected and hard to live with. Could you please give me some recommendations?

P.S.: I forgot to mention: while digging into the problem described above, I found that the Cyrillic letters А..я are laid out in the CP-1251 code page without any gaps. So the code from my first post can be simplified to the following:

/* unsigned char: the codes above 0x7F do not fit in a signed char */
unsigned char cyrillic_chars[]  = { 0x41, 0xA0, 0x42, 0xA1, 0xE0, 0x45, 0xA3, 0xA4,
   0xA5, 0xA6, 0x4B, 0xA7, 0x4D, 0x48, 0x4F, 0xA8,
   0x50, 0x43, 0x54, 0xA9, 0xAA, 0x58, 0xE1, 0xAB,
   0xAC, 0xE2, 0xAD, 0xAE, 0x62, 0xAF, 0xB0, 0xB1,
   0x61, 0xB2, 0xB3, 0xB4, 0xE3, 0x65, 0xB6, 0xB7,
   0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0x6F, 0xBE,
   0x70, 0x63, 0xBF, 0x79, 0xE4, 0x78, 0xE5, 0xC0,
   0xC1, 0xE6, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7 };

if( c >= 'А' && c <= 'я' )
{
    hd44780_write_byte( cyrillic_chars[c - 'А'], HD44780_DATA );
}
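
On the printf question above, one possible arrangement (a sketch, not something confirmed in this thread) is to let the C library format into a buffer first and translate the result byte by byte afterwards. The names lcd_from_cp1251 and lcd_format are hypothetical, the format string is assumed to be CP-1251 encoded (e.g. via -fexec-charset=cp1251), and the C library in use is assumed to provide vsnprintf.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Combined table from the post above: LCD codes for CP-1251 0xC0..0xFF (А..я). */
static const unsigned char cyrillic_chars[64] = {
    0x41, 0xA0, 0x42, 0xA1, 0xE0, 0x45, 0xA3, 0xA4,
    0xA5, 0xA6, 0x4B, 0xA7, 0x4D, 0x48, 0x4F, 0xA8,
    0x50, 0x43, 0x54, 0xA9, 0xAA, 0x58, 0xE1, 0xAB,
    0xAC, 0xE2, 0xAD, 0xAE, 0x62, 0xAF, 0xB0, 0xB1,
    0x61, 0xB2, 0xB3, 0xB4, 0xE3, 0x65, 0xB6, 0xB7,
    0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0x6F, 0xBE,
    0x70, 0x63, 0xBF, 0x79, 0xE4, 0x78, 0xE5, 0xC0,
    0xC1, 0xE6, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7 };

/* Hypothetical helper: translate one CP-1251 byte to an LCD code.
   Bytes below 0xC0 (ASCII, digits, punctuation) pass through unchanged. */
static unsigned char lcd_from_cp1251(unsigned char c)
{
    if (c >= 0xC0)
        return cyrillic_chars[c - 0xC0];
    return c;
}

/* Hypothetical helper: printf-style formatting followed by in-place
   translation. Returns the number of LCD codes stored in dst. */
static size_t lcd_format(unsigned char *dst, size_t dst_size, const char *fmt, ...)
{
    va_list args;
    int len;
    int i;

    if (dst_size == 0)
        return 0;

    va_start(args, fmt);
    len = vsnprintf((char *)dst, dst_size, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;                        /* formatting error */
    if ((size_t)len >= dst_size)
        len = (int)(dst_size - 1);       /* output was truncated */

    for (i = 0; i < len; i++)
        dst[i] = lcd_from_cp1251(dst[i]);
    return (size_t)len;
}
```

The translated bytes could then be sent one at a time with hd44780_write_byte( ..., HD44780_DATA ). Translating after formatting means numbers produced by %d and similar specifiers pass through untouched, since they are plain ASCII.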

lpcware · ‎06-15-2016

Content originally posted in LPCWare by gbm on Thu Apr 07 10:33:59 MST 2011
It's generally not a good idea to use non-Latin characters in C source. If you plan to update your strings, consider writing a simple native-encoding to LCD-encoding utility that creates C source files containing the strings encoded in hexadecimal. For your application I would use the target hardware encoding (LCD code page) rather than UTF-8.
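
The output side of such a utility might look roughly like the sketch below. It covers only the formatting step: the bytes are assumed to be already converted to the LCD code page, and format_lcd_literal is a hypothetical name. Every byte is written as \xNN, so an escape is never followed by a stray hex digit.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical host-side helper: format LCD codes as a C string literal,
   e.g. "\xA9\xB6\x61\x63". Returns 0 on success, -1 if out is too small. */
static int format_lcd_literal(const unsigned char *codes, size_t count,
                              char *out, size_t out_size)
{
    size_t pos = 0;
    size_t i;

    if (out_size < 3)                    /* room for two quotes and the NUL */
        return -1;

    out[pos++] = '"';
    for (i = 0; i < count; i++) {
        /* 4 characters for \xNN, plus closing quote and NUL afterwards */
        if (pos + 4 + 2 > out_size)
            return -1;
        snprintf(out + pos, 5, "\\x%02X", (unsigned)codes[i]);
        pos += 4;
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return 0;
}
```

A generator built around this would print one line per string, for example `const char str_main[] = <literal>;`, into a generated .c file, ideally with the original readable text in a comment next to it.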

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Alexey Goncharov on Thu Apr 07 08:11:46 MST 2011
Hi, gbm!

Thank you for the quick answer.

Quote:
Simple but not very elegant: use explicit hex codes for non-latin characters. That's what I did in my project.

Well, I thought about this. How should I define and store constant strings (e.g. for a menu) in this case? Should I replace every character in a string like:

const char *str_main = "Ужас";

with something like:

const uint16_t str_main[] = {0xD0A3, 0xD0B6, 0xD0B0, 0xD181, 0x0000};

If I'm right, how would I maintain strings like this in the future?

P.S.: I've just checked the Linux version of LPCXpresso (v3.8.1): the behaviour is the same.
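
For comparison, here is a sketch of what explicit hex codes could look like if the strings are stored directly in the LCD code page rather than as UTF-8 pairs. The byte values are read off the rus_big_chars and rus_small_chars tables from the first post; the name str_da and the second word are only illustrative. Keeping the readable text in a comment beside each string is one way to keep this maintainable.

```c
/* "Ужас": У = 0xA9, ж = 0xB6, а = 0x61, с = 0x63 (LCD code page) */
static const char str_main[] = "\xA9\xB6\x61\x63";

/* "Да": Д = 0xE0, а = 0x61, which is also ASCII 'a'.
   The literal is split in two because a hex escape swallows every hex
   digit that follows it: "\xE0a" would be read as the single escape \xE0A. */
static const char str_da[] = "\xE0" "a";
```

Each string here takes one byte per character plus the terminator, and needs no conversion at run time.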

lpcware · ‎06-15-2016

Content originally posted in LPCWare by gbm on Thu Apr 07 06:01:52 MST 2011
Simple but not very elegant: use explicit hex codes for non-latin characters. That's what I did in my project.