TinyXml with Unicode thread

thomas:
For what we use XML for, DOM is the one and only good choice. SAX would indeed make my life a lot less enjoyable. Whether it uses a little more or less memory really does not matter: the data needs to be stored anyway, and the extra overhead per node is 28 bytes on a 32-bit machine, not so much really :)

A config file has typically 1200-1500 nodes, so we're talking about 30-50 kilobytes.
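The estimate above can be checked with a couple of lines. This is only a sketch of the arithmetic: the 28-byte figure and the node counts come from the post, while the names `kOverheadPerNode` and `dom_overhead_bytes` are made up for illustration.

```cpp
#include <cstddef>

// Per-node DOM overhead quoted in the post (32-bit build).
constexpr std::size_t kOverheadPerNode = 28;

// Extra bytes the DOM needs for a document with `nodes` nodes.
constexpr std::size_t dom_overhead_bytes(std::size_t nodes) {
    return nodes * kOverheadPerNode;
}
```

For 1200 and 1500 nodes this gives 33,600 and 42,000 bytes, which sits inside the quoted 30-50 kilobyte range.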

Michael:

--- Quote from: thomas on December 20, 2005, 07:28:10 pm ---A config file has typically 1200-1500 nodes, so we're talking about 30-50 kilobytes.

--- End quote ---

Yes, that is not too much. It also should not be an issue if the config file grows a bit in the future.

Michael

takeshimiya:
Back on topic: Unicode support.

@280Z28: I've applied your patch and it's working a lot better.

Here are the things that work (tried with Japanese): :D
-Typing Unicode chars in the cbEditor.
-Copying and pasting Unicode chars between cbEditors.
-Copying Unicode chars between cbEditor and external editors.
-Dragging Unicode chars between cbEditors.
-Dragging Unicode chars between cbEditor and external editors.
-Saving a file that contains the recently typed Unicode chars (it's saved as UTF-8 without a BOM).

Here are the things that don't work: :cry:
-Loading a file that contains Unicode chars. It loads OK, but it's displayed as if it were ASCII, not Unicode, so the Unicode chars become garbage.
-Pasting Unicode chars from an external editor (Notepad) into cbEditor. Somehow, it pastes the word "Hello". :shock:
-It has no notion of what a BOM is.
-Other Unicode encodings such as UTF-16 don't work. Only UTF-8 works.

280Z28:

--- Quote from: Takeshi Miya on December 21, 2005, 12:55:36 am ---Back on topic: Unicode support.

@280Z28: I've applied your patch and it's working a lot better.

Here are the things that work (tried with Japanese): :D
-Typing Unicode chars in the cbEditor.
-Copying and pasting Unicode chars between cbEditors.
-Copying Unicode chars between cbEditor and external editors.
-Dragging Unicode chars between cbEditors.
-Dragging Unicode chars between cbEditor and external editors.
-Saving a file that contains the recently typed Unicode chars (it's saved as UTF-8 without a BOM).

Here are the things that don't work: :cry:
-Loading a file that contains Unicode chars. It loads OK, but it's displayed as if it were ASCII, not Unicode, so the Unicode chars become garbage.
-Pasting Unicode chars from an external editor (Notepad) into cbEditor. Somehow, it pastes the word "Hello". :shock:
-It has no notion of what a BOM is.
-Other Unicode encodings such as UTF-16 don't work. Only UTF-8 works.

--- End quote ---

:lol: :lol: :lol: I had "Hello" hard-coded in the version you downloaded. That's one of the things I fixed when I re-posted the patch. There is a problem with pasting data from other editors that shows up on my computer here at work but not at home, and I was trying to see whether a certain section of code was being executed.

The BOM stuff, etc. is going to be a nightmare. :(

In the end the user MUST (using Rick's terms :) ) have a way to say "This file was opened in the wrong encoding. Reopen it in _____."

Files MUST be saved in the same encoding that they were opened with.

The BOM MUST be preserved if it was present when the file was opened.
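A minimal sketch of the state that would have to travel with each open file to satisfy the two MUSTs above. The names `Encoding`, `FileEncodingState` and `bom_for_save` are hypothetical and not taken from the Code::Blocks sources; only the BOM byte sequences themselves are standard.

```cpp
#include <string>

// Hypothetical names, for illustration only.
enum class Encoding { Ascii, Utf8, Utf16LE, Utf16BE };

// What was observed when the file was opened; kept until it is saved.
struct FileEncodingState {
    Encoding encoding;  // encoding the file was opened with
    bool hadBom;        // true if a BOM was present on disk
};

// Bytes to write at the start of the file on save.
// Empty unless a BOM was present when the file was opened.
std::string bom_for_save(const FileEncodingState& s) {
    if (!s.hadBom) return std::string();
    switch (s.encoding) {
        case Encoding::Utf8:    return std::string("\xEF\xBB\xBF", 3);
        case Encoding::Utf16LE: return std::string("\xFF\xFE", 2);
        case Encoding::Utf16BE: return std::string("\xFE\xFF", 2);
        case Encoding::Ascii:   break;  // plain ASCII has no BOM
    }
    return std::string();
}
```

A "reopen in _____" command would then only need to replace the `encoding` field and re-decode the raw bytes; how the BOM flag should behave after such an override is left open here.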

takeshimiya:
Yes, just like SciTE does: it lets you override the encoding (ASCII, UTF-8, UTF-16LE, UTF-16BE) once the file is opened.

If the file has a BOM, it's easy: we read that, and when the file is going to be saved we check the current encoding.
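Reading the BOM could look roughly like this. It is a sketch in plain standard C++ that only covers the encodings mentioned in this thread; `Bom` and `detect_bom` are made-up names, and `data` is assumed to hold the raw bytes of the file.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file's raw contents for a known BOM.
Bom detect_bom(const std::string& data) {
    // Read byte i as unsigned so comparisons against 0xEF etc. are safe.
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(data[i]); };
    if (data.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) return Bom::Utf8;
    if (data.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE) return Bom::Utf16LE;
    if (data.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}
```

Note that a UTF-32LE file also starts with FF FE, so this check would have to be extended if UTF-32 ever needed to be supported.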

It will only require UTF-8 to UTF-16 type conversions, which I think wxWidgets provides; if not, they're only small functions.
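To give an idea of how small such a function is, here is a hand-rolled UTF-8 to UTF-16 conversion in standard C++. It is only a sketch and does not use wxWidgets; the name `utf8_to_utf16` is made up, and malformed input is simply rejected rather than repaired.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Converts UTF-8 bytes to UTF-16 code units.
// Returns false (leaving `out` partially filled) on malformed input.
bool utf8_to_utf16(const std::string& in, std::u16string& out) {
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + extra >= in.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        i += extra + 1;
        if (cp < min_cp[extra] || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates are not valid in UTF-8
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return true;
}
```

The byte order of the resulting code units (UTF-16LE vs. UTF-16BE) is decided only when they are written out to a file.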

The only difficult case will be when we load a file that doesn't have a BOM, but the simplest solution would be to assume either that it's UTF-8, or ASCII with the current locale.
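One possible way to combine the two assumptions, sketched in standard C++: treat the file as UTF-8 if its bytes are structurally valid UTF-8, and fall back to the current locale otherwise. This goes slightly beyond what is proposed above, and `Guess`, `looks_like_utf8` and `guess_without_bom` are hypothetical names. The check is structural only (lead byte plus the right number of continuation bytes), not a full validator.

```cpp
#include <cstddef>
#include <string>

enum class Guess { Utf8, LocaleDefault };

// Structural check: every lead byte is followed by the right number of
// continuation bytes. Pure ASCII passes, since ASCII is a subset of UTF-8.
bool looks_like_utf8(const std::string& data) {
    std::size_t i = 0;
    while (i < data.size()) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t extra;
        if (c < 0x80)                   extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false;  // continuation byte or invalid lead byte
        if (i + extra >= data.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Guess for a file that has no BOM.
Guess guess_without_bom(const std::string& data) {
    return looks_like_utf8(data) ? Guess::Utf8 : Guess::LocaleDefault;
}
```

Whatever the guess is, the user would still need the manual override discussed earlier, because some byte sequences are valid in more than one encoding.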
