When will the editor support UTF-8?

kagerato:

--- Quote from: Takeshi Miya on December 16, 2005, 06:13:42 pm ---TinyXml supports UTF-8.
How are you supposed to store UTF-8-encoded text in memory, then...?

--- End quote ---

You mean, what encoding should you use to store your text in RAM? I see two good options:

1.) As UTF-8 (which, once again, is variable-width)
2.) As UTF-16 (Windows and some other systems most often expect Unicode data in this encoding)

#1 makes it simple to move data between disk and RAM, since you will most likely use UTF-8 for both. Option #2 is better if you frequently call system or third-party library functions that require UTF-16. In that situation, the alternative to #2 is keeping multiple copies of the text in different encodings, which is not only messy, tedious, and a potential source of bugs, but also a waste of RAM and processing time.
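
To make the variable-width point concrete, here is a minimal C++ sketch of how many bytes a single code point takes in each encoding. The function names utf8_width and utf16_width are hypothetical, chosen only for illustration, and the sketch assumes the input is a valid Unicode scalar value (not a lone surrogate).

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to encode one code point in UTF-8 (1 to 4).
// Assumes cp is a valid Unicode scalar value (not a surrogate).
std::size_t utf8_width(char32_t cp) {
    assert(cp <= 0x10FFFF);        // beyond the Unicode range
    if (cp < 0x80) return 1;       // ASCII
    if (cp < 0x800) return 2;      // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;    // rest of the BMP, including most CJK
    return 4;                      // supplementary planes (e.g. emoji)
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP, a surrogate pair above it.
std::size_t utf16_width(char32_t cp) {
    assert(cp <= 0x10FFFF);        // beyond the Unicode range
    return cp < 0x10000 ? 2 : 4;
}
```

So pure ASCII text is half the size in UTF-8, while text dominated by BMP CJK characters is smaller in UTF-16 (2 bytes per character instead of 3).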

In any case, thomas sounds like he knows how to handle whatever the problem is (or was). I still do not completely understand the nature of the problem, hence my question.

thomas:
The problem is that we store all text in UTF-16 using wchar[], and we have no choice in the matter. TinyXML does not support wchar, so we convert to UTF-8 just before passing the data to TinyXML.
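
The conversion step described above can be sketched roughly as follows. This is a hand-written, hypothetical utf16_to_utf8 helper, not Code::Blocks' actual code. It takes std::u16string rather than a wchar_t array because wchar_t is 16 bits on Windows but usually 32 bits on Linux and other Unix systems; in a wxWidgets application you would normally rely on the library's own converters (for example wxConvUTF8) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 text to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());  // the output needs at least one byte per UTF-16 unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the following low surrogate if present.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        assert(cp <= 0x10FFFF);  // UTF-16 cannot encode anything larger
        if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes: 11110xxx followed by three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string holds plain UTF-8 bytes, so its c_str() can be handed to a char-based API such as TinyXML.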

Also, wxScintilla might not be completely Unicode-safe. This is only a suspicion and may not be true. While browsing the sources, I have spotted several places where they use chars as indices or compare against const char values. Unless these are only applied to text fragments that have already been converted to UTF-8 (I don't know whether they are), this may be an issue. If so, we will have another problem that is not easily solved.
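
The concern about chars used as indices comes down to the fact that, in UTF-8, a byte index is not a character index. The minimal sketch below, using hypothetical helper names and assuming valid UTF-8 input, shows the difference and how to check whether a byte offset falls on a character boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Number of code points in a UTF-8 string (assumes valid UTF-8).
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s)
        if (!is_continuation(b)) ++n;  // count only lead bytes
    return n;
}

// Byte offset where code point number cp_index starts, or s.size() if past the end.
std::size_t byte_offset_of(const std::string& s, std::size_t cp_index) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (seen == cp_index) return i;
            ++seen;
        }
    }
    return s.size();
}

// True if splitting s at byte_index would not cut a character in half.
bool is_boundary(const std::string& s, std::size_t byte_index) {
    assert(byte_index <= s.size());
    return byte_index == s.size() ||
           !is_continuation(static_cast<unsigned char>(s[byte_index]));
}
```

For the text "aéb" (bytes 61 C3 A9 62), the string is 4 bytes long but holds only 3 code points, and byte offset 2 lands inside é, so cutting or comparing at that index would split the character.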

takeshimiya:
Regarding Scintilla, I once asked the SciTE developers whether support for Unicode filenames was feasible, and they answered that someone had once started working on it, but it wasn't an easy task and required a rather major rewrite.
Anyway, Unicode text in Scintilla seems to work OK, but we can always expect bugs because they have their own string class, and I noticed some const char* usage around the code too, so I'm not sure it fully supports Unicode.

dbtsai:
Well, I would like to try the UTF-8 version of C::B,

but I can't get it to compile properly....

Could anyone release a compiled UTF-8 version??

Then people could try it and see what's going wrong!!

^_^

thomas:

--- Quote from: dbtsai on December 20, 2005, 12:57:21 pm ---Well, I would like to try the UTF-8 version of C::B,

but I can't get it to compile properly....
--- End quote ---
http://forums.codeblocks.org/index.php?topic=1701.0