wxString support in wxWidgets 3.0 problem?

BlueHazzard:
Is wchar_t 16 bits wide on Windows? If so, can you still use per-character access? I mean, 16 bits are not enough to cover the whole Unicode table.
(UTF-16 is the worst decision you can make for supporting Unicode:
* you have to watch out for the endianness
* If you open a corrupt file, there is no way to repair it...)

ollydbg:

--- Quote from: BlueHazzard on October 17, 2013, 05:23:22 pm ---Is wchar_t 16 bits wide on Windows?

--- End quote ---
As far as I know, yes. Windows does in fact use UTF-16 to encode strings, and wchar_t is 16 bits wide there.


--- Quote ---If so, can you still use per-character access? I mean, 16 bits are not enough to cover the whole Unicode table.
(UTF-16 is the worst decision you can make for supporting Unicode:
* you have to watch out for the endianness
* If you open a corrupt file, there is no way to repair it...)

--- End quote ---
In some cases a character needs four bytes, which means two UTF-16 code units. Under Windows, the user has to handle this special case (called a surrogate pair) themselves.

See the document in: http://docs.wxwidgets.org/trunk/overview_string.html

--- Quote ---Internal wxString Encoding

Since wxWidgets 3.0 wxString internally uses UTF-16 (with Unicode code units stored in wchar_t) under Windows and UTF-8 (with Unicode code units stored in char) under Unix, Linux and Mac OS X to store its content.

For definitions of code units and code points terms, please see the Unicode Representations and Terminology paragraph.

For simplicity of implementation, wxString when wxUSE_UNICODE_WCHAR==1 (e.g. on Windows) uses per code unit indexing instead of per code point indexing and doesn't know anything about surrogate pairs; in other words it always considers code points to be composed by 1 code unit, while this is really true only for characters in the BMP (Basic Multilingual Plane). Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of surrogate pairs himself. (Note however that Windows itself has built-in support for surrogate pairs in UTF-16, such as for drawing strings on screen.)

Remarks
    Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1 resembles UCS-2 encoding, it's not completely correct to refer to wxString as UCS-2 encoded since you can encode code points outside the BMP in a wxString as two code units (i.e. as a surrogate pair; as already mentioned however wxString will "see" them as two different code points)

When instead wxUSE_UNICODE_UTF8==1 (e.g. on Linux and Mac OS X) wxString handles UTF8 multi-bytes sequences just fine also for characters outside the BMP (it implements per code point indexing), so that you can use UTF8 in a completely transparent way:

--- End quote ---

Note that this document appears to be incorrect, since on Unix wxString now uses std::basic_string<wchar_t> by default.
Its description is correct for Windows, though.

Note that we don't currently handle surrogate pairs in C::B; wxWidgets 2.8.12 uses std::basic_string<wchar_t> too. Question: is there any chance that we will encounter a surrogate pair in C++ source code? Maybe in comments? Which characters need a surrogate pair under Windows? I don't have an example.
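
To make the surrogate-pair question concrete: any code point above U+FFFF (outside the BMP) needs two UTF-16 code units, for example U+1F600, which is encoded as 0xD83D 0xDE00. The following is only a minimal sketch of how user code could step over such pairs. It uses std::u16string as a stand-in for a 16-bit wchar_t string, and the helper names (IsHighSurrogate, IsLowSurrogate, NextCodePoint, CountCodePoints) are hypothetical, not part of wxWidgets or C::B.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only (not wxWidgets or C::B API).

// High (leading) surrogates occupy 0xD800..0xDBFF.
inline bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Low (trailing) surrogates occupy 0xDC00..0xDFFF.
inline bool IsLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Decodes the code point that starts at s[i] and advances i past it.
// Precondition: i < s.size().
// An unpaired surrogate is returned unchanged as a single code unit.
inline char32_t NextCodePoint(const std::u16string& s, std::size_t& i)
{
    const char16_t hi = s[i++];
    if (IsHighSurrogate(hi) && i < s.size() && IsLowSurrogate(s[i]))
    {
        const char16_t lo = s[i++];
        return 0x10000
             + ((static_cast<char32_t>(hi) - 0xD800) << 10)
             + (static_cast<char32_t>(lo) - 0xDC00);
    }
    return hi;
}

// Counts code points rather than code units.
inline std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count)
        NextCodePoint(s, i);
    return count;
}
```

With a string such as u"a\U0001F600z", size() reports four code units, while CountCodePoints reports three code points. Per-code-unit indexing, as described in the quoted documentation, would see the pair as two separate elements.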

oBFusCATed:
ollydbg: Are you sure you've disabled building in STL mode?

ollydbg:

--- Quote from: oBFusCATed on October 17, 2013, 06:24:01 pm ---ollydbg: Are you sure you've disabled building in STL mode?

--- End quote ---
Hi, Obf, what does this question mean? I'm sorry, I don't understand it. Do you mean building wxString without the internal std::basic_string support? I don't think that is an option for wx2.9.x+.


--- Quote from: ollydbg on October 17, 2013, 05:35:44 pm ---...
Note that this document appears to be incorrect, since on Unix wxString now uses std::basic_string<wchar_t> by default.

--- End quote ---
I have reported this issue to the wxWidgets mailing list, and it is now fixed in the wx trunk; see this commit: https://groups.google.com/d/msg/wx-commits-diffs/QZDKnpiL3lM/eEX0cFOKS3cJ. The web page http://docs.wxwidgets.org/trunk/overview_string.html needs a few days to synchronize with the trunk change.

Another issue I see: wxString is not NULL-terminated, right? So is the while-condition check below OK, in the function bool ParserThread::GetBaseArgs(const wxString& args, wxString& baseArgs)?

--- Code: ---    while (*ptr != ParserConsts::null)
    {
    ...
    }

--- End code ---
Basically, I think we should use the length of the wxString to limit the pointer range.
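
A length-bounded version of such a loop could look roughly like the sketch below. It is not the real ParserThread::GetBaseArgs: it uses std::wstring as a stand-in for wxString, and CountCommas is a hypothetical helper that only shows the loop condition comparing the pointer against an end pointer instead of testing for a null character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only: counts the commas in args by
// walking a raw pointer that is bounded by the string length rather than
// by a terminating null character.
inline std::size_t CountCommas(const std::wstring& args)
{
    const wchar_t* ptr = args.data();
    const wchar_t* const end = ptr + args.size(); // one past the last character
    std::size_t commas = 0;
    while (ptr != end)
    {
        if (*ptr == L',')
            ++commas;
        ++ptr;
    }
    return commas;
}
```

Because the loop stops at end, it neither runs past the buffer nor stops early if the string happens to contain an embedded null character.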

BlueHazzard:
In modern C++ the pointer way is the bad way ;) It would be better to use iterators...
but I think wx2.8 has no support for string iterators -.-
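
For comparison, an iterator-based scan might look like the sketch below. It again uses std::wstring as a stand-in, since whether a given wxString build offers equivalent iterators is not settled in this thread, and MaxParenDepth is a hypothetical helper, not C::B code.

```cpp
#include <string>

// Hypothetical helper for illustration only: returns the deepest level of
// nested parentheses in args, scanning with iterators instead of a raw
// pointer and a null-character check.
inline int MaxParenDepth(const std::wstring& args)
{
    int depth = 0;
    int maxDepth = 0;
    for (std::wstring::const_iterator it = args.begin(); it != args.end(); ++it)
    {
        if (*it == L'(')
        {
            ++depth;
            if (depth > maxDepth)
                maxDepth = depth;
        }
        else if (*it == L')' && depth > 0)
        {
            --depth; // unmatched closing parentheses are ignored
        }
    }
    return maxDepth;
}
```

The end of the range comes from args.end(), so the termination condition does not depend on how the string is stored internally.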