Code::Blocks Forums

User forums => Help => Topic started by: mauser on January 01, 2006, 07:32:29 pm

Title: Source Code encoding in UNICODE version
Post by: mauser on January 01, 2006, 07:32:29 pm
I have the Unicode version of revision 1635. I write Win32 apps, and it seems that the source files are UTF-8 encoded, but MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What would you advise? Thanks in advance.
Title: Re: Source Code encoding in UNICODE version
Post by: mauser on January 01, 2006, 07:39:41 pm
Also, C::B didn't show anything when I tried to open a file with Russian characters in the cp1251 charset that had been edited in another editor. It behaved as if the file were simply empty.
Title: Re: Source Code encoding in UNICODE version
Post by: Michael on January 01, 2006, 07:41:42 pm
Quote from: mauser
MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What would you advise? Thanks in advance.

What do you mean by "understand"? Is the problem with loading, displaying, or compiling the sources?

Michael
Title: Re: Source Code encoding in UNICODE version
Post by: anonuser on January 01, 2006, 07:54:05 pm
UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.

Title: Re: Source Code encoding in UNICODE version
Post by: Der Meister on January 01, 2006, 07:59:49 pm
MSVC seems to need a signature for Unicode source files. You can set this in the dialog "File -> Extra save options" (or something similar to that). The required option is "UTF-8 with signature". If you save your file that way, MSVC recognizes it as a Unicode source file and works with it without problems.

But: the signature consists of two or three bytes at the beginning of the file (three for UTF-8: EF BB BF). Some editors show them (or at least some cryptic characters in their place), some don't, and some won't even open such a file. Code::Blocks opens it (at least it did the last time I tried) and seems to have no problems with the signature (it doesn't even show it). But the compilers I tested (gcc and icc, both on Linux) refused to compile the file, complaining about invalid characters. Unfortunately, MSVC (the IDE as well as the compiler) seems to need this signature to handle Unicode source files properly.
The only solution I can give you here: don't use Unicode source files if you want to use them with both MSVC and other editors/compilers. In strings you can still use Unicode characters if you write their code instead of the character itself, i.e. write '\x00E4' instead of 'ä'.
Title: Re: Source Code encoding in UNICODE version
Post by: mauser on January 01, 2006, 08:12:54 pm
Quote from: anonuser
UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.

Well, UTF-8 is not ASCII; it is only ASCII-compatible, in that plain ASCII text is also valid UTF-8.
And I think I just need to switch to GCC, which assumes UTF-8 input by default.
Thanks
Title: Re: Source Code encoding in UNICODE version
Post by: Leviathan on January 02, 2006, 06:03:30 pm
ASCII is a subset of UTF-8: every ASCII character is present in UTF-8 with the same byte value.
But UTF-8 is a multi-byte encoding, so a character may take 1, 2, 3 or 4 bytes; obviously, ASCII does not contain all the characters UTF-8 can encode.
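
A minimal sketch of the point above, in Python (not something anyone in this thread used): it encodes a few sample characters and counts the bytes. The helper name `utf8_length` and the sample characters are only illustrative choices.

```python
# Sample characters written as escapes so this file itself stays pure ASCII.
SAMPLES = ["A", "\u00e4", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X}: {hex_bytes} ({len(encoded)} byte(s))")

# Text that only uses ASCII characters is byte-for-byte identical in UTF-8.
assert "int main()".encode("ascii") == "int main()".encode("utf-8")
```

Only the first sample is an ASCII character, and it is the only one that fits in a single byte.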

Now, more on topic: what "Der Meister" said is absolutely correct. Windows uses UTF-16 internally, so its support for UTF-8 is limited. Also, it expects a BOM (byte order mark) at the beginning of a Unicode file. All other text files are assumed to be (extended) ASCII.
Unix, on the other hand, doesn't expect a BOM, so the BOM bytes (three of them in UTF-8) are interpreted as ordinary characters.

You have two choices: either stick to ASCII as "Der Meister" suggested, or write a (very simple) program to quickly add or remove the signature (the bytes 0xEF 0xBB 0xBF) to/from files.
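
The "very simple program" could look roughly like the Python sketch below. The function names (`add_bom`, `remove_bom`, `convert_file`) are made up for this example; only `codecs.BOM_UTF8` comes from the standard library. It assumes the files are already UTF-8 and does no other conversion.

```python
import codecs

# The UTF-8 signature: the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def add_bom(data: bytes) -> bytes:
    """Prefix the UTF-8 signature unless it is already there."""
    return data if data.startswith(BOM) else BOM + data


def remove_bom(data: bytes) -> bytes:
    """Strip a leading UTF-8 signature if one is present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def convert_file(path: str, with_bom: bool) -> None:
    """Rewrite the file at `path` in place, with or without the signature."""
    with open(path, "rb") as f:
        data = f.read()
    new_data = add_bom(data) if with_bom else remove_bom(data)
    if new_data != data:  # leave untouched files alone
        with open(path, "wb") as f:
            f.write(new_data)
```

One would call `convert_file("main.cpp", with_bom=True)` before building with MSVC and `convert_file("main.cpp", with_bom=False)` before building with gcc; the file name here is just a placeholder.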
Title: Re: Source Code encoding in UNICODE version
Post by: mandrav on January 02, 2006, 06:42:22 pm
You should try with r1648. This should be fixed (for now).
Title: Re: Source Code encoding in UNICODE version
Post by: killerbot on January 02, 2006, 06:55:56 pm
?? why for now ??

otherwise this bug can be closed.

http://sourceforge.net/tracker/index.php?func=detail&aid=1384513&group_id=126998&atid=707416
Title: Re: Source Code encoding in UNICODE version
Post by: mandrav on January 02, 2006, 06:58:47 pm
Quote from: killerbot
?? why for now ??

otherwise this bug can be closed.

http://sourceforge.net/tracker/index.php?func=detail&aid=1384513&group_id=126998&atid=707416

Well, because I didn't add any code to handle "strange" encodings; I just asked it not to do any conversion on the charset. I believe it is fixed now, but I'll wait a while until more people have tested it.
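
For readers wondering why a charset conversion can make a file look empty, here is a hedged Python illustration. It is not Code::Blocks code, and the thread does not confirm that this was the actual cause. It assumes a loader that strictly decodes as UTF-8 and gives up on failure (the hypothetical `try_utf8`), and compares that with a byte-for-byte pass-through (the hypothetical `passthrough`).

```python
# The Russian word "Privet" in Cyrillic, written as escapes to keep this ASCII.
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"
cp1251_bytes = russian.encode("cp1251")  # one byte per letter in cp1251


def try_utf8(data: bytes) -> str:
    """Strictly decode as UTF-8; return an empty string if the bytes are invalid.

    Returning "" on failure is an assumption made for this illustration.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return ""


def passthrough(data: bytes) -> str:
    """No real conversion: map every byte to exactly one character (Latin-1)."""
    return data.decode("latin-1")


print(repr(try_utf8(cp1251_bytes)))   # cp1251 text is not valid UTF-8
print(len(passthrough(cp1251_bytes)))  # every byte survives the pass-through
```

The pass-through shows the wrong glyphs for the Cyrillic letters, but no bytes are lost, so saving the buffer back gives the original file.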
Title: Re: Source Code encoding in UNICODE version
Post by: killerbot on January 02, 2006, 07:01:46 pm
I built it and tested on some as files and Scintilla's editor.cxx, and for those it worked. Let's hope there are no side effects (WinXP SP2 system).
Title: Re: Source Code encoding in UNICODE version
Post by: tiwag on January 02, 2006, 07:06:22 pm
It works for me now with the files that previously didn't open.
We'll see what happens in the future; I wouldn't worry too much for now...
Title: Re: Source Code encoding in UNICODE version
Post by: killerbot on January 02, 2006, 09:17:59 pm
BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it out on:
editor.cxx
as/* files

:-(

Lieven
Title: Re: Source Code encoding in UNICODE version
Post by: mandrav on January 02, 2006, 09:23:30 pm
Quote from: killerbot
BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it out on:
editor.cxx
as/* files

:-(

Lieven

It will be fixed now that I pinpointed the error :)
Just show some patience ;)
Title: Re: Source Code encoding in UNICODE version
Post by: Ceniza on January 02, 2006, 09:29:15 pm
Quote from: mandrav
Just show some patience :wink:

Heh, you should consider adding something like that as your signature now :P
Title: Re: Source Code encoding in UNICODE version
Post by: mandrav on January 02, 2006, 09:30:48 pm
Quote from: Ceniza
Quote from: mandrav
Just show some patience :wink:

Heh, you should consider adding something like that as your signature now :P

You know what? I just might ;)
Title: Re: Source Code encoding in UNICODE version
Post by: killerbot on January 02, 2006, 09:30:58 pm
Quote from: Ceniza
Quote from: mandrav
Just show some patience :wink:

Heh, you should consider adding something like that as your signature now :P

I agree !!!

LOL
Title: Re: Source Code encoding in UNICODE version
Post by: mandrav on January 02, 2006, 09:32:04 pm
I just did, lol :D
Title: Re: Source Code encoding in UNICODE version
Post by: killerbot on January 02, 2006, 09:34:14 pm
Don't forget:

Quote from: mandrav
It will be fixed now that I pinpointed the error :wink:
Title: Re: Source Code encoding in UNICODE version
Post by: tiwag on January 02, 2006, 09:43:22 pm
Quote from: mandrav
You know what? I just might ;)

(http://www.smiliemania.de/smilie132/00000215.gif)