Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => Topic started by: takeshimiya on December 20, 2005, 08:41:19 am

Title: TinyXml with Unicode thread
Post by: takeshimiya on December 20, 2005, 08:41:19 am
Well, the problems of TinyXml and Unicode are still there.

I just found a piece of code that could help here: (http://wxforum.shadonet.com/viewtopic.php?t=1376)

wxtinyxml.cpp
Code
// This crude code is for using wxWidget streams with TinyXML. 
// This is useful, for example, for loading and saving XML directly
// into a ZIP file (with the new ZIP streams in 2.5.4).

// Copyright (c) 2005 Andrew Ziem. All rights reserved.
// This code is licensed under the three licenses:  wxWindows Library Licence, Version 3;
// Zlib license (like TinyXML); and the GNU General Public License version 2 or later.
               

#include "wxtinyxml.h"

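bool wxTiXmlDocument::LoadFile( wxInputStream &istream )
{
        Clear();
        location.Clear();

        wxTextInputStream txt(istream);

        wxString data;

        // Read the whole stream. ReadLine() strips the line ending,
        // so put a newline back to keep lines from running together.
        while (true)
        {
            const wxString s = txt.ReadLine();

            if ( istream.Eof() && s.empty() )
                break;

            data += s;
            data += wxT('\n');
        }

        // Hand TinyXML UTF-8 bytes, matching the TIXML_ENCODING_UTF8 hint.
        Parse( data.mb_str(wxConvUTF8), 0, TIXML_ENCODING_UTF8 );

        return !Error();
}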

bool wxTiXmlDocument::SaveFile (wxOutputStream &ostream)
{
        wxTextOutputStream txt(ostream);

#ifdef TIXML_USE_STL
#error not implemented
#else
        TiXmlOutStream outs;
        StreamOut (&outs);
        const char *c =  outs.c_str();
        txt << wxString(c, wxConvUTF8);
#endif

        // The original fell off the end without returning a value.
        return ostream.IsOk();
}

wxtinyxml.h
Code
#ifndef __WXTINYXML_H__
#define __WXTINYXML_H__

#include "wx/wxprec.h"

#ifdef __BORLANDC__
    #pragma hdrstop
#endif

#ifndef WX_PRECOMP
    #include "wx/wx.h"
#endif

#include "tinyxml.h"
#include "wx/txtstrm.h"

class wxTiXmlDocument : public TiXmlDocument
{
public:
        bool LoadFile( wxInputStream &istream );
        bool SaveFile( wxOutputStream &ostream );
};

#endif

If that doesn't help, let's try another solution so we can finally be done with this bug.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 08:43:42 am
The problems of TinyXML and Unicode are there. Would you like to try my patch that fixes it all?
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 20, 2005, 08:44:55 am
Of course, I'll try that. :)
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 08:47:05 am
http://sourceforge.net/tracker/index.php?func=detail&aid=1385378&group_id=126998&atid=707418

http://sourceforge.net/tracker/index.php?func=detail&aid=1384347&group_id=126998&atid=707418

http://sourceforge.net/tracker/index.php?func=detail&aid=1382579&group_id=126998&atid=707418

There are a couple more smaller issues I fixed, but posting patches doesn't seem very worthwhile these days. Those 3 go back as far as 4 days ago. I have 9 other patches on SF right now; those 3 target Unicode-specific problems.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 08:53:28 am
Hang on, I'll get you a patch right now that fixes the bug where "Code::Blocks v1.0" shows as "C v1". I didn't and won't post it on SF because I was already told it won't get committed.
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 20, 2005, 09:16:18 am
Thanks. I've noticed that the "C v1" bug with Unicode got introduced with autoversionator, right?
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 09:17:51 am
This one has most of my patches applied. At least all of the ones I'm "comfortable with." That includes all 13 I've posted at SourceForge plus a couple that I haven't.

It includes the 3 listed two posts up, but TortoiseSVN should figure that out and merge them without trouble.

Applies in codeblocks\src to rev1565.

Edit: There's a serious bug with copy/paste in what was linked here. It's fixed in the one 3 or 4 posts down.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 09:25:47 am
Quote
Thanks. I've noticed that the "C v1" bug with Unicode got introduced with autoversionator, right?

Actually it's the unfortunate result of this thread (http://forums.codeblocks.org/index.php?topic=1620.0) :(
Title: Re: TinyXml with Unicode thread
Post by: killerbot on December 20, 2005, 10:24:40 am
Sam,

Could you please post all of them on SF?

As I said, I will start nightly builds soon, and I will also try to build a 'user' patched build. That way people can experiment with patches that have not been approved yet, which is good since it might give feedback on side effects before they make it into the official build.

I think the main devs are first sweating out their changes to the menu/toolbar/... stuff before they look back at the patches.


Lieven
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 10:48:04 am
Here is a complete patch of my code against rev 1565.

It includes my updates to the project file for building Unicode.

The only things in it that aren't on SF are maybe 2 lines where "<<" was replaced (see the diff for my syntax highlighting patch on SF for what I mean) and the rectangle copy/paste and drag/drop functionality in wxScintilla. Also, ctrl+mousewheel in my code does a pageup/pagedn instead of zooming. You'll find it amazingly useful.

http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch
Title: Re: TinyXml with Unicode thread
Post by: thomas on December 20, 2005, 10:55:24 am
Quote
Hang on, I'll get you a patch right now that fixes the bug where "Code::Blocks v1.0" shows as "C v1". I didn't and won't post it on SF because I was already told it won't get committed.
Sam, the patch was turned down because it does not fix the bug. It is a patch that reverts Yiannis' changes in r1513. You are welcome to fix the bug, but like I told you, the way to go is to modify the about box so it works with the modifications, not to revert appglobals.h.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 20, 2005, 11:07:35 am
Quote
Hang on, I'll get you a patch right now that fixes the bug where "Code::Blocks v1.0" shows as "C v1". I didn't and won't post it on SF because I was already told it won't get committed.
Sam, the patch was turned down because it does not fix the bug. It is a patch that reverts Yiannis' changes in r1513. You are welcome to fix the bug, but like I told you, the way to go is to modify the about box so it works with the modifications, not to revert appglobals.h.

I may toy with it later. For now, there are some major bugs that don't have solutions at all. As my code stands, that section falls under "don't fix what ain't broke." The former appglobals.h works in ANSI and Unicode, so I don't see why it should change. :? If someone changes it later and it still works and you're happy with it, then I guess that's fine too.
Title: Re: TinyXml with Unicode thread
Post by: Michael on December 20, 2005, 06:43:20 pm
Hello,

Regarding TinyXML and wxWidgets, I found an article in the wxWiki that could be interesting: How To Use TinyXML With WxWidgets (http://wiki.wxwidgets.org/wiki.pl?How_To_Use_TinyXML_With_WxWidgets).

Alternatively, if TinyXML seems to have problems with Unicode, why not try the libxml2 library (http://xmlsoft.org/)? It is mentioned on the wxWiki page Using XML With WxWidgets (http://wiki.wxwidgets.org/wiki.pl?Using_XML_With_WxWidgets), and a wrapper, wxxml2 (http://wxcode.sourceforge.net/components/wxxml2/), exists. There is some information here about libxml2's encoding support (http://xmlsoft.org/encoding.html).

Best wishes,
Michael
Title: Re: TinyXml with Unicode thread
Post by: thomas on December 20, 2005, 06:50:58 pm
Quote
How To Use TinyXML With WxWidgets (http://wiki.wxwidgets.org/wiki.pl?How_To_Use_TinyXML_With_WxWidgets)
That's exactly what we are doing. :)
Quote
why not try the libxml2 library (http://xmlsoft.org/)? It is mentioned on the wxWiki
If we were to use another XML library, we should be using the one built into wx. But that would mean we would have to rewrite an awful lot of code. Plus, the charm of TinyXML is that it is easy to use, small, and fast.
Title: Re: TinyXml with Unicode thread
Post by: Michael on December 20, 2005, 07:12:18 pm
Quote
How To Use TinyXML With WxWidgets (http://wiki.wxwidgets.org/wiki.pl?How_To_Use_TinyXML_With_WxWidgets)
That's exactly what we are doing. :)

OK, I see. Sorry :oops:.

Quote
why not try the libxml2 library (http://xmlsoft.org/)? It is mentioned on the wxWiki
If we were to use another XML library, we should be using the one built into wx. But that would mean we would have to rewrite an awful lot of code. Plus, the charm of TinyXML is that it is easy to use, small, and fast.

Yes, I understand. Changing the XML library is only worthwhile if not too much code has to be modified or rewritten, and if it really is the only library that could solve the problem(s).

I have looked at TinyXML and it is an interesting piece of code. The only problem I see is that it uses the Document Object Model (DOM). For small XML documents DOM is suitable, but for larger ones it could lead to high memory consumption (SAX would be a better alternative in this context).

Michael
Title: Re: TinyXml with Unicode thread
Post by: thomas on December 20, 2005, 07:28:10 pm
For what we use XML for, DOM is the one and only good option. SAX would indeed make my life a lot less enjoyable. Whether it uses a little more or less memory really does not matter: the data needs to be stored anyway, and the extra overhead per node is 28 bytes on a 32-bit machine, which is not much really :)

A config file typically has 1200-1500 nodes, so we're talking about 30-50 kilobytes.
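
As a rough check of that estimate, the arithmetic can be written out. This is only a sketch: the 28-byte figure and the node counts come from the numbers above, while `kOverheadPerNode` and `DomOverheadBytes` are names made up for illustration and are not part of TinyXML.

```cpp
#include <cstddef>

// Per-node bookkeeping overhead quoted above for a 32-bit build (assumed constant).
constexpr std::size_t kOverheadPerNode = 28;

// Extra bytes the DOM tree needs on top of the stored data itself.
constexpr std::size_t DomOverheadBytes(std::size_t nodeCount)
{
    return nodeCount * kOverheadPerNode;
}

// Lower and upper end of the "typical config file" range.
static_assert(DomOverheadBytes(1200) == 33600, "1200 nodes");
static_assert(DomOverheadBytes(1500) == 42000, "1500 nodes");
```

That works out to 33,600 to 42,000 bytes, which sits inside the 30-50 kilobyte range.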
Title: Re: TinyXml with Unicode thread
Post by: Michael on December 20, 2005, 07:35:44 pm
Quote
A config file typically has 1200-1500 nodes, so we're talking about 30-50 kilobytes.

Yes, that is not too much. It also should not be an issue if the config file grows a bit in the future.

Michael
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 21, 2005, 12:55:36 am
Back on topic: Unicode support.

@280Z28: I've applied your patch and it's working a lot better.

Here are the things that work (tried with Japanese): :D
-Typing Unicode chars in the cbEditor.
-Copying and pasting Unicode chars between cbEditors.
-Copying Unicode chars between cbEditor and external editors.
-Dragging Unicode chars between cbEditors.
-Dragging Unicode chars between cbEditor and external editors.
-Saving a file that contains the recently typed Unicode chars (it's saved as UTF-8 without BOM).

Here are the things that don't work: :cry:
-Loading a file that contains Unicode chars. It loads OK, but it's displayed as if it were ASCII, not Unicode, so the Unicode chars become garbage.
-Pasting Unicode chars from an external editor (Notepad) into cbEditor. Somehow, it pastes the word "Hello". :shock:
-It doesn't have any notion of what a BOM is.
-Other Unicode encodings such as UTF-16 don't work. Only UTF-8 works.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 21, 2005, 08:32:03 pm
Quote
Back on topic: Unicode support.

@280Z28: I've applied your patch and it's working a lot better.

Here are the things that work (tried with Japanese): :D
-Typing Unicode chars in the cbEditor.
-Copying and pasting Unicode chars between cbEditors.
-Copying Unicode chars between cbEditor and external editors.
-Dragging Unicode chars between cbEditors.
-Dragging Unicode chars between cbEditor and external editors.
-Saving a file that contains the recently typed Unicode chars (it's saved as UTF-8 without BOM).

Here are the things that don't work: :cry:
-Loading a file that contains Unicode chars. It loads OK, but it's displayed as if it were ASCII, not Unicode, so the Unicode chars become garbage.
-Pasting Unicode chars from an external editor (Notepad) into cbEditor. Somehow, it pastes the word "Hello". :shock:
-It doesn't have any notion of what a BOM is.
-Other Unicode encodings such as UTF-16 don't work. Only UTF-8 works.


:lol: :lol: :lol: I had "Hello" hard-coded in the version you downloaded. That's one of the things I fixed when I re-posted the patch. There is a problem with pasting data from other editors that shows up on my computer here at work but not at home, and I was trying to see whether a certain section of code was being executed.

The BOM stuff, etc. is going to be a nightmare. :(

In the end the user MUST (using rick's terms :) ) have a way to say "this file opened in the wrong encoding. Reopen it in _____."

Files MUST be saved in the same encoding that they were opened with.

The BOM MUST be preserved if it was present when the file was opened.
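
One way to meet the last two requirements is to record, at load time, which BOM (if any) the file started with, and to write the same bytes back on save. The sketch below uses only the standard library. `FileEncoding`, `LoadedFile`, `SplitBom` and `JoinBom` are hypothetical names, not existing Code::Blocks or wxWidgets APIs, and only the UTF-8 and UTF-16 BOMs are recognized.

```cpp
#include <string>

// Hypothetical types for illustration only.
enum class FileEncoding { Unknown, Utf8, Utf16LE, Utf16BE };

struct LoadedFile
{
    FileEncoding encoding; // encoding announced by the BOM, if any
    bool hadBom;           // whether the file started with a BOM
    std::string payload;   // raw bytes after the BOM, still in the file's encoding
};

// Split a raw file image into BOM information and payload.
LoadedFile SplitBom(const std::string& raw)
{
    if (raw.compare(0, 3, "\xEF\xBB\xBF") == 0)
        return { FileEncoding::Utf8, true, raw.substr(3) };
    if (raw.compare(0, 2, "\xFF\xFE") == 0)
        return { FileEncoding::Utf16LE, true, raw.substr(2) };
    if (raw.compare(0, 2, "\xFE\xFF") == 0)
        return { FileEncoding::Utf16BE, true, raw.substr(2) };
    return { FileEncoding::Unknown, false, raw };
}

// Rebuild the file image, restoring the BOM only if one was present on load.
std::string JoinBom(const LoadedFile& file)
{
    if (!file.hadBom)
        return file.payload;
    switch (file.encoding)
    {
        case FileEncoding::Utf8:    return "\xEF\xBB\xBF" + file.payload;
        case FileEncoding::Utf16LE: return "\xFF\xFE" + file.payload;
        case FileEncoding::Utf16BE: return "\xFE\xFF" + file.payload;
        default:                    return file.payload;
    }
}
```

A file without a BOM comes back as `Unknown`, which is exactly the case where the user needs a way to pick the encoding by hand.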
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 21, 2005, 09:52:27 pm
Yes, just like SciTE does: it lets you override the encoding (ASCII, UTF-8, UTF-16LE, UTF-16BE) once the file is opened.

If the file has a BOM, it's easy: we read that, and when the file is going to be saved we check the current encoding.

It will only require conversions of the UTF-8 to UTF-16 kind, which I think wxWidgets provides, but if not, they're only small functions.
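
In case the wxWidgets converters turn out not to be enough, a hand-written UTF-8 to UTF-16 conversion could look roughly like this. It is a sketch that assumes well-formed UTF-8 input and does no error handling; `Utf8ToUtf16` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 and re-encode it as UTF-16 code units.
// Assumption: the input is valid UTF-8; malformed input is not detected.
std::u16string Utf8ToUtf16(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp;   // code point being assembled
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Writing the result out as UTF-16LE or UTF-16BE is then just a matter of the byte order chosen for each code unit.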

The only difficult case will be when we load a file that doesn't have a BOM, but the simplest solution would be to assume either that it's UTF-8, or ASCII with the current locale.
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 22, 2005, 08:27:19 pm
The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch does not work.

Anyway, there is a problem with your patches: you provide some here, but not all of them are on the sf.net tracker?

What I'd like to see instead is more small patches, posted and updated on sf.net.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 22, 2005, 08:41:34 pm
Quote
The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch does not work.

Anyway, there is a problem with your patches: you provide some here, but not all of them are on the sf.net tracker?

What I'd like to see instead is more small patches, posted and updated on sf.net.

For those, I don't yet have what I consider a "workable solution to the problem." I only post patches at SF once I'm sure they work. :)

I have posted quite a few of the Unicode fixes at SF already.

By the way, CodeBlocks\src\sdk\wxscintilla\src\scintilla\src\Editor.cxx is ASCII. On top of that, there is a character at line 1195 that prevents Code::Blocks from opening it (it opens as blank) if UTF8 is specified. As a workaround, I changed lines 364, 390, and 405 of codeblocks\src\sdk\globals.cpp from wxConvUTF8 to wxConvLibc. I have no idea if that was a good idea (so I haven't posted a patch), but it worked on my system so I left it.
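
To track down a byte like the one at line 1195, a small scan for the first sequence that is not well-formed UTF-8 can help. This is a simplified sketch (it does not reject overlong forms or surrogates), and `Utf8Error` and `FindInvalidUtf8` are hypothetical names, not part of Code::Blocks.

```cpp
#include <cstddef>
#include <string>

struct Utf8Error
{
    bool found;         // true if an invalid sequence was seen
    std::size_t line;   // 1-based line number of the offending byte
    std::size_t offset; // byte offset from the start of the buffer
};

// Report the first byte sequence that cannot be UTF-8.
Utf8Error FindInvalidUtf8(const std::string& bytes)
{
    std::size_t line = 1;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra; // continuation bytes expected after the lead byte
        if (b < 0x80)                   extra = 0;
        else if (b >= 0xC2 && b < 0xE0) extra = 1;
        else if (b >= 0xE0 && b < 0xF0) extra = 2;
        else if (b >= 0xF0 && b < 0xF5) extra = 3;
        else return { true, line, i };  // stray continuation or illegal lead byte

        if (extra > bytes.size() - i - 1)
            return { true, line, i };   // sequence cut off at end of buffer

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return { true, line, i }; // expected a 10xxxxxx continuation byte
        }

        if (b == '\n')
            ++line;
        i += extra + 1;
    }
    return { false, 0, 0 };
}
```

A file that fails this check could then be opened with a different converter instead of coming up blank.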

Edit: I make no guarantees concerning my multi-patches I post on here. :lol:
Title: Re: TinyXml with Unicode thread
Post by: takeshimiya on December 22, 2005, 08:50:54 pm
No, I wasn't saying your patches didn't work.

The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch still doesn't work.
Title: Re: TinyXml with Unicode thread
Post by: 280Z28 on December 24, 2005, 08:49:30 pm
Quote
No, I wasn't saying your patches didn't work.

The link http://www.280z28.org/CodeBlocks/patches/complete-rev1565u.patch still doesn't work.

:oops: Oh well, r1565 is history anyway :lol: