Topic: Source Code encoding in UNICODE version

mauser

  • Guest
Source Code encoding in UNICODE version
« on: January 01, 2006, 07:32:29 pm »
I have the Unicode build of revision 1635. I write Win32 apps, and it seems that the source files are saved as UTF-8, but MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What can you advise? Thanks in advance.

mauser

  • Guest
Re: Source Code encoding in UNICODE version
« Reply #1 on: January 01, 2006, 07:39:41 pm »
Also, C::B didn't show anything when I tried to open a file with Russian characters in the cp1251 charset that had been edited in another editor. It behaved as if the file were empty.

Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Source Code encoding in UNICODE version
« Reply #2 on: January 01, 2006, 07:41:42 pm »
MSVC 2003 does not seem to understand UTF-8 source files. Is this a C::B feature? What can you advise? Thanks in advance.

What do you mean by "understand"? Is the problem in loading, displaying, or compiling the sources?

Michael

anonuser

  • Guest
Re: Source Code encoding in UNICODE version
« Reply #3 on: January 01, 2006, 07:54:05 pm »
UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.


Offline Der Meister

  • Regular
  • ***
  • Posts: 307
Re: Source Code encoding in UNICODE version
« Reply #4 on: January 01, 2006, 07:59:49 pm »
MSVC seems to need a signature for Unicode source files. You can set this in the dialog "file->Extra save options" (or something similar to that). The required option is "UTF-8 with signature". If you save your file that way, MSVC recognizes it as a Unicode source file and works with it without problems.

But: the signature consists of two or three bytes at the beginning of the file (three for UTF-8: EF BB BF). Some editors show them (or at least some cryptic characters for them), some don't, and some won't even open such a file. Code::Blocks opens it (at least it did at my last try) and seems to have no problems with the signature (it doesn't even show it). But the compilers I tested (gcc and icc, both on Linux) refused to compile the file. They complain about invalid characters in the file. Unfortunately, MSVC (the IDE as well as the compiler) seems to need this signature to handle Unicode source files properly.
The only solution I can give you here: don't use Unicode source files if you want to use them with MSVC and other editors/compilers. In strings you can still use Unicode characters if you write their code instead of the character itself, i.e. write '\x00E4' instead of 'ä'.
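
To make the advice above concrete, here is a minimal Python sketch (not part of Code::Blocks or MSVC) that checks whether a file's bytes start with the UTF-8 signature and lists every non-ASCII byte, so you know which characters would need to be replaced by escape codes. The function names `has_utf8_signature` and `non_ascii_positions` are made up for this illustration.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def non_ascii_positions(data: bytes):
    """Return (line, column, byte) for every byte above 0x7F, counted from 1."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


if __name__ == "__main__":
    # A one-line C source containing U+00E4, encoded as UTF-8 without signature.
    source = 'const char* s = "\u00e4";\n'.encode("utf-8")
    print("signature present:", has_utf8_signature(source))
    for line_no, col, byte in non_ascii_positions(source):
        print(f"line {line_no}, column {col}: byte 0x{byte:02X}")
```

A file for which `non_ascii_positions` returns an empty list is plain ASCII and should not depend on the signature at all.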

mauser

  • Guest
Re: Source Code encoding in UNICODE version
« Reply #5 on: January 01, 2006, 08:12:54 pm »
UTF-8 is ascii so it shouldn't have any problems with it.
Now when you bump up to UTF-32 or UTF-16 that's when things get interesting.



Well, UTF-8 is not ASCII. It is only ASCII-compatible in some ways.
And I think I just need to switch to GCC, which assumes UTF-8 input by default.
Thanks

Offline Leviathan

  • Single posting newcomer
  • *
  • Posts: 7
Re: Source Code encoding in UNICODE version
« Reply #6 on: January 02, 2006, 06:03:30 pm »
ASCII is a subset of UTF-8. All symbols in ASCII are present in UTF-8 with the same value.
But UTF-8 is a multi-byte encoding, so a symbol may take 1 to 4 bytes; obviously ASCII doesn't contain all the symbols UTF-8 does.
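
The point about byte lengths can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `utf8_length` and the sample characters are chosen for the example.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # One sample character for each possible UTF-8 length.
    samples = ["A", "\u00e4", "\u20ac", "\U0001F600"]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

    # ASCII text produces exactly the same bytes in both encodings.
    text = "int main() { return 0; }"
    print(text.encode("ascii") == text.encode("utf-8"))
```

The last comparison is why a pure-ASCII source file is also a valid UTF-8 file.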

Now, more on topic: What "Der Meister" said is absolutely correct. Windows uses UTF-16 internally, therefore its support for UTF-8 is limited. Also, it expects a BOM (byte order mark) at the beginning of a Unicode file. All other text files are assumed to be (extended) ASCII.
Unix, on the other hand, doesn't expect a BOM, so the signature bytes at the start of the file are interpreted as ordinary symbols.

You have 2 choices: Either stick to ASCII like "Der Meister" suggested, or write a (very simple) program to quickly add or remove the signature (the bytes 0xEF 0xBB 0xBF) to/from files.
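
Such a program could look roughly like the Python sketch below. It assumes the files are UTF-8, whose signature is the byte sequence EF BB BF; the names `add_signature`, `remove_signature` and `rewrite_file` are invented for this example.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def add_signature(data: bytes) -> bytes:
    """Prepend the UTF-8 signature unless it is already there."""
    return data if data.startswith(BOM) else BOM + data


def remove_signature(data: bytes) -> bytes:
    """Strip the UTF-8 signature if the data starts with it."""
    return data[len(BOM):] if data.startswith(BOM) else data


def rewrite_file(path, transform) -> None:
    """Apply add_signature or remove_signature to a file in place."""
    p = Path(path)
    p.write_bytes(transform(p.read_bytes()))
```

For example, `rewrite_file("main.cpp", add_signature)` before opening the file in MSVC and `rewrite_file("main.cpp", remove_signature)` before compiling elsewhere (the file name is just a placeholder). Both functions leave a file unchanged if it is already in the requested state, so running them twice is harmless.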

Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Source Code encoding in UNICODE version
« Reply #7 on: January 02, 2006, 06:42:22 pm »
You should try with r1648. This should be fixed (for now).
Be patient!
This bug will be fixed soon...

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: Source Code encoding in UNICODE version
« Reply #8 on: January 02, 2006, 06:55:56 pm »

Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Source Code encoding in UNICODE version
« Reply #9 on: January 02, 2006, 06:58:47 pm »
?? why for now ??

otherwise this bug can be closed.

http://sourceforge.net/tracker/index.php?func=detail&aid=1384513&group_id=126998&atid=707416

Well, because I didn't add any code to handle "strange" encodings; I just asked it not to do any conversion on the charset. I believe it is fixed now, but I'll wait a while until more people have tested it.

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: Source Code encoding in UNICODE version
« Reply #10 on: January 02, 2006, 07:01:46 pm »
I built it and tested it on some as files and Scintilla's editor.cxx, and it worked for those. Let's hope there are no side effects. (WinXP SP2 system)

Offline tiwag

  • Developer
  • Lives here!
  • *****
  • Posts: 1196
  • sailing away ...
    • tiwag.cb
Re: Source Code encoding in UNICODE version
« Reply #11 on: January 02, 2006, 07:06:22 pm »
It works for me now with the files that previously didn't open.
We'll see what happens in the future; don't worry too much for now ...

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: Source Code encoding in UNICODE version
« Reply #12 on: January 02, 2006, 09:17:59 pm »
BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it on:
editor.cxx
as/* files

:-(

Lieven

Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Source Code encoding in UNICODE version
« Reply #13 on: January 02, 2006, 09:23:30 pm »
BAD NEWS:

I just built on Linux (SUSE 10) and it seems the problem still occurs there. I tried it on:
editor.cxx
as/* files

:-(

Lieven

It will be fixed now that I pinpointed the error :)
Just show some patience ;)

Offline Ceniza

  • Developer
  • Lives here!
  • *****
  • Posts: 1441
    • CenizaSOFT
Re: Source Code encoding in UNICODE version
« Reply #14 on: January 02, 2006, 09:29:15 pm »
Quote from: mandrav
Just show some patience :wink:

Heh, you should consider adding something like that as your signature now :P