Author Topic: can type unicode character in C:B?  (Read 8094 times)

hakit

  • Guest
can type unicode character in C:B?
« on: November 08, 2005, 06:51:41 am »
Can I use Unicode characters, such as Chinese or Vietnamese, in the C::B editor?


Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
Re: can type unicode character in C:B?
« Reply #1 on: November 08, 2005, 10:09:53 am »
Short answer: No.

Slightly longer answer: In theory, it should be possible (and quite simple, too) by adjusting parameters in wxScintilla. However, I tried that in the past and it had no effect at all.


EDIT:
If you want to experiment, the "correct" way to do it is to call wxScintilla with SetCodePage(wxSCI_CP_UTF8);. This effectively sets the "code.page" property to 65001.
However, I was unable to see any visible difference when doing this.
« Last Edit: November 08, 2005, 10:39:24 am by thomas »
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Offline cyberkoa

  • Plugin developer
  • Almost regular
  • ****
  • Posts: 145
Re: can type unicode character in C:B?
« Reply #2 on: November 08, 2005, 04:02:29 pm »
I have tested under both Windows and Ubuntu Linux, and it seems we can do that.

I have tested with Chinese, Spanish, Japanese and Tamil.

The RC2 binary (Windows) is built without Unicode support, so you need to compile a Unicode version yourself.

However, while testing wxSmith recently, I found that TinyXML does not seem to handle Unicode well: XRC export in wxSmith CANNOT save double-byte characters properly.


takeshimiya

  • Guest
Re: can type unicode character in C:B?
« Reply #3 on: November 08, 2005, 06:15:15 pm »
What is the current state of tinyxml per se? Does it support UTF-8 or not?

If it does, why does C::B talk to tinyxml in ASCII?
If it doesn't, wouldn't it be better to modify tinyxml to use wxStrings?

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
Re: can type unicode character in C:B?
« Reply #4 on: November 08, 2005, 07:10:23 pm »
Quote
What is the current state of tinyxml per se? Does it support UTF-8 or not?
It has been supporting UTF-8 encoded documents for one and a half years.

Quote
If it does, why does C::B talk to tinyxml in ASCII?
The tags used in Code::Blocks consist only of ASCII characters. All ASCII characters are valid UTF-8 characters, no matter how you encode your document later. Using wide strings is in no way better; it only breaks things.

Quote
If it doesn't, wouldn't it be better to modify tinyxml to use wxStrings?
Extremely bad idea. Not only would that require a lot of code changes to tinyXML and thus make updating to newer versions a pain, but it would actually break the code.
wxStrings use wchar_t-sized characters in Unicode builds, and tinyXML explicitly does not support anything like this (see documentation).
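
To illustrate the point about tag names, here is a minimal C++ sketch of UTF-8 encoding. It shows that code points below 0x80 come out as the same single byte, while characters such as ñ or 中 need two or three bytes. The helper names encode_utf8 and is_ascii are made up for this illustration; they are not part of tinyXML, wxWidgets, or Code::Blocks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// This sketch does not reject surrogate code points (U+D800..U+DFFF).
std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // outside the Unicode range is a caller bug
    std::string out;
    if (cp < 0x80) {
        // 7-bit range: the UTF-8 byte is identical to the ASCII byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: true if every byte is below 0x80. Such a string
// is byte-for-byte the same whether it is read as ASCII or as UTF-8.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}
```

For example, encode_utf8('A') returns the single byte "A", encode_utf8(0xF1) (ñ) returns two bytes, and encode_utf8(0x4E2D) (中) returns three. A tag name for which is_ascii() is true needs no conversion at all before it is written into a UTF-8 document.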

takeshimiya

  • Guest
Re: can type unicode character in C:B?
« Reply #5 on: November 08, 2005, 07:39:14 pm »
OK, I thought that since tinyxml supports STL strings it would also support wchar_t, making it easy to replace them with wxString (as wxString supports most of the syntax of the STL strings).

Better to wait for tinyxml to support wide characters, then.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: can type unicode character in C:B?
« Reply #6 on: November 08, 2005, 07:53:41 pm »
But remember! TinyXML still supports the &#nnnn; character "format"! This might be an elegant workaround: transform all non-ASCII characters to the numeric entity format, and then write them out using TinyXML.

However, that's a job for the wxSmith devs.
« Last Edit: November 08, 2005, 07:55:31 pm by rickg22 »
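
The numeric entity workaround could look roughly like this. It is only a sketch, assuming the input is already UTF-8. escape_non_ascii is a hypothetical helper, not a TinyXML or wxSmith function, and it deliberately leaves XML-special characters such as '&' and '<' alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: replace every non-ASCII character in a UTF-8
// string with a decimal numeric character reference such as "&#241;".
// ASCII bytes are copied unchanged. Validation is minimal: overlong
// forms and surrogates are not rejected in this sketch.
std::string escape_non_ascii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {  // plain ASCII: pass through
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // The lead byte tells us the sequence length and its payload bits.
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        out += "&#" + std::to_string(cp) + ";";
        i += len;
    }
    // Postcondition: the result is pure ASCII.
    for (unsigned char c : out) {
        (void)c;
        assert(c < 0x80);
    }
    return out;
}
```

With this sketch, the UTF-8 bytes for "español" become "espa&#241;ol", and the two characters of 中文 become "&#20013;&#25991;". The output is plain ASCII, so it no longer depends on how the XML writer handles multi-byte input.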

Offline cyberkoa

  • Plugin developer
  • Almost regular
  • ****
  • Posts: 145
Re: can type unicode character in C:B?
« Reply #7 on: November 08, 2005, 09:00:25 pm »
Some update: Unicode support for Western languages seems OK. I tried the Spanish word español, and it is correctly saved in the XML file in numeric entity format.

However, East Asian languages with ideographic characters, such as Chinese 中文 and Japanese にほんご, run into a problem: the text is not saved at all (with the latest wxSmith CVS).

I still need some time to explore the problem further.