Author Topic: TinyXml with Unicode thread  (Read 21900 times)

thomas
« Reply #15 on: December 20, 2005, 07:28:10 pm »
For what we use XML for, DOM is the one and only good option. SAX would indeed make my life a lot less enjoyable. Whether it uses a little more or less memory really does not matter: the data needs to be stored anyway, and the extra overhead per node is 28 bytes on a 32-bit machine, which is not much really :)

A config file has typically 1200-1500 nodes, so we're talking about 30-50 kilobytes.
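
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 28-byte figure and the 1200-1500 node counts come from the post, while the names `kOverheadPerNode`, `dom_overhead_bytes` and `check_thread_figures` are made up for illustration and are not part of TinyXml or Code::Blocks.

```cpp
#include <cassert>
#include <cstddef>

// Per-node DOM overhead quoted in the post (32-bit build).
constexpr std::size_t kOverheadPerNode = 28;

// Total bookkeeping overhead in bytes for a document with `nodes` nodes.
constexpr std::size_t dom_overhead_bytes(std::size_t nodes) {
    return nodes * kOverheadPerNode;
}

// Sanity check for the node counts mentioned in the thread.
inline void check_thread_figures() {
    assert(dom_overhead_bytes(1200) == 33600);  // 33,600 bytes
    assert(dom_overhead_bytes(1500) == 42000);  // 42,000 bytes
}
```

So the overhead lands between 33,600 and 42,000 bytes, which sits inside the quoted 30-50 kilobyte range.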
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Michael
« Reply #16 on: December 20, 2005, 07:35:44 pm »
Yes, that is not too much. It also should not be an issue if the config file grows a bit in the future.

Michael

takeshimiya
« Reply #17 on: December 21, 2005, 12:55:36 am »
Back on topic: Unicode support.

@280Z28: I've applied your patch and it's working a lot better.

Here are the things that work (tried with Japanese): :D
-Typing Unicode chars in the cbEditor.
-Copying and pasting Unicode chars between cbEditors.
-Copying Unicode chars between cbEditor and external editors.
-Dragging Unicode chars between cbEditors.
-Dragging Unicode chars between cbEditor and external editors.
-Saving a file that contains the recently typed Unicode chars (it's saved in UTF-8 without BOM).

Here are the things that don't work: :cry:
-Loading a file that contains Unicode chars. It loads OK, but it's displayed as if it were ASCII, not Unicode, so the Unicode chars become garbage.
-Pasting Unicode chars from an external editor (Notepad) to cbEditor. Somehow, it pastes the word "Hello". :shock:
-It doesn't have any notion of what a BOM is.
-Other Unicode encodings such as UTF-16 don't work. Only UTF-8 works.

280Z28
« Reply #18 on: December 21, 2005, 08:32:03 pm »
:lol: :lol: :lol: I had "Hello" hard coded in the version you downloaded. That's one of the things I fixed when I re-posted the patch. There is a problem with pasting data from other editors that shows up on my computer here at work but not at home, and I was trying to see if a certain section of code was being executed.

The BOM stuff, etc. is going to be a nightmare. :(

In the end the user MUST (using rick's terms :) ) have a way to say "This file opened in the wrong encoding. Reopen it in _____."

Files MUST be saved in the same encoding that they were opened with.

The BOM MUST be preserved if it was present when the file was opened.
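
One way to honour the two save rules above is to remember, per open file, which encoding it was loaded with and whether it started with a BOM. The following is a rough sketch under that assumption; `Encoding`, `FileEncodingState`, `bom_for` and `save_prefix` are hypothetical names, not existing Code::Blocks or wxWidgets code.

```cpp
#include <string>

// Encodings discussed in the thread, plus a catch-all for locale encodings.
enum class Encoding { Utf8, Utf16LE, Utf16BE, LocaleDefault };

// What the editor would remember about a file from the moment it is opened.
struct FileEncodingState {
    Encoding encoding;  // encoding the file was opened (or reopened) with
    bool had_bom;       // true if a BOM was present on load
};

// BOM byte sequence for an encoding (empty for encodings without one).
inline std::string bom_for(Encoding e) {
    switch (e) {
        case Encoding::Utf8:    return "\xEF\xBB\xBF";
        case Encoding::Utf16LE: return "\xFF\xFE";
        case Encoding::Utf16BE: return "\xFE\xFF";
        default:                return "";
    }
}

// Bytes to write before the encoded text when saving: the BOM is emitted
// only if the file had one when it was opened.
inline std::string save_prefix(const FileEncodingState& state) {
    return state.had_bom ? bom_for(state.encoding) : std::string();
}
```

With this shape, "reopen it in _____" would simply replace `encoding` in the stored state and reload the bytes, and saving never has to guess.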
78 280Z, "a few bolt-ons" - 12.71@109.04
99 Trans Am, "Daily Driver" - 525rwhp/475rwtq
 Check out The Sam Zone :cool:

takeshimiya
« Reply #19 on: December 21, 2005, 09:52:27 pm »
Yes, just like SciTE does: it lets you override the encoding (ASCII, UTF-8, UTF-16LE, UTF-16BE) once the file is opened.

If the file has a BOM, it's easy: we read it, and when the file is going to be saved we check the current encoding.
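
Reading the BOM amounts to comparing the first two or three bytes of the file against the known signatures. A minimal sketch in standard C++, covering only the encodings named in this thread (UTF-32 is deliberately ignored, so a UTF-32LE file would be misreported as UTF-16LE); `Bom` and `detect_bom` are illustrative names:

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file and report which BOM, if any, is present.
inline Bom detect_bom(const std::string& bytes) {
    auto starts_with = [&](const char* sig, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, sig, n) == 0;
    };
    if (starts_with("\xEF\xBB\xBF", 3)) return Bom::Utf8;     // EF BB BF
    if (starts_with("\xFF\xFE", 2))     return Bom::Utf16LE;  // FF FE
    if (starts_with("\xFE\xFF", 2))     return Bom::Utf16BE;  // FE FF
    return Bom::None;
}
```

The result can be stored alongside the document so the same BOM is written back on save.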

It will only require UTF-8 to UTF-16 type conversions, which I think wxWidgets provides; if not, they're only small functions.
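
To give an idea of how small such a function is, here is a sketch of a UTF-8 to UTF-16 converter in plain standard C++. It is not the wxWidgets implementation; the name `utf8_to_utf16` and the choice to throw on malformed input are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert UTF-8 bytes to UTF-16 code units.
// Throws std::invalid_argument on malformed input.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte tells us how long the sequence is.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
        static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

The reverse direction is about the same size; byte order (UTF-16LE versus UTF-16BE) only matters when the code units are written out to a file.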

The only difficult case will be when we load a file that doesn't have a BOM, but the simplest solution would be to assume either that it's UTF-8, or ASCII with the current locale.
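
A common way to choose between those two assumptions is to test whether the bytes are structurally valid UTF-8 and fall back to the locale encoding only if they are not. The sketch below illustrates that heuristic; it is an assumption about how the fallback could work, not what Code::Blocks does, and `Guess`, `looks_like_utf8` and `guess_encoding_without_bom` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

enum class Guess { Ascii, Utf8, LocaleDefault };

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes. (Overlong forms and surrogates are not
// rejected here; this is a heuristic, not a full validator.)
inline bool looks_like_utf8(const std::string& bytes) {
    std::size_t pending = 0;  // continuation bytes still expected
    for (unsigned char c : bytes) {
        if (pending > 0) {
            if ((c & 0xC0) != 0x80) return false;
            --pending;
        } else if (c < 0x80) {
            // plain ASCII byte, nothing to track
        } else if ((c & 0xE0) == 0xC0) {
            pending = 1;
        } else if ((c & 0xF0) == 0xE0) {
            pending = 2;
        } else if ((c & 0xF8) == 0xF0) {
            pending = 3;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
    }
    return pending == 0;
}

// Guess the encoding of a file that has no BOM.
inline Guess guess_encoding_without_bom(const std::string& bytes) {
    bool all_ascii = true;
    for (unsigned char c : bytes) {
        if (c >= 0x80) { all_ascii = false; break; }
    }
    if (all_ascii) return Guess::Ascii;  // identical in UTF-8 and most locales
    return looks_like_utf8(bytes) ? Guess::Utf8 : Guess::LocaleDefault;
}
```

A guess like this can still be wrong, which is why the manual override stays necessary.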

takeshimiya
« Reply #20 on: December 22, 2005, 08:27:19 pm »
The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch does not work.

Anyway, there is a problem with your patches: you provide some here, but not all of them are on the sf.net tracker?

What I'd like to see instead is more small patches, posted and updated on sf.net.

280Z28
« Reply #21 on: December 22, 2005, 08:41:34 pm »
I don't yet have what I consider a "workable solution to the problem." I post patches at SF only once I'm sure they work. :)

I have posted quite a few of the Unicode fixes at SF already.

By the way, CodeBlocks\src\sdk\wxscintilla\src\scintilla\src\Editor.cxx is ASCII. On top of that, there is a character at line 1195 that prevents Code::Blocks from opening it (it opens as blank) if UTF-8 is specified. As a workaround, I changed lines 364, 390, and 405 of codeblocks\src\sdk\globals.cpp from wxConvUTF8 to wxConvLibc. I have no idea if that was a good idea (so I haven't posted a patch), but it worked on my system, so I left it.

Edit: I make no guarantees concerning the multi-patches I post here. :lol:
« Last Edit: December 22, 2005, 08:47:00 pm by 280Z28 »

takeshimiya
« Reply #22 on: December 22, 2005, 08:50:54 pm »
No, I wasn't saying your patches didn't work.

The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch still doesn't work.

280Z28
« Reply #23 on: December 24, 2005, 08:49:30 pm »
:oops: Oh well, 1565's history anyway :lol: