Is wchar_t 16 bits on Windows?
As far as I know, yes. Windows does in fact use UTF-16 to encode strings, and wchar_t is 16 bits there.
If so, can you still use per-character access? I mean, 16 bits is not enough room for the whole Unicode table.
(UTF-16 is about the worst choice you can make for supporting Unicode:
* you have to watch out for the endianness
* if you open a corrupt file, there is no way to repair it...)
In some cases a character needs four bytes, which means two UTF-16 code units. On Windows, user code has to handle this special case (called a surrogate pair) itself.
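To make the surrogate pair idea concrete, here is a minimal sketch of how one code point is turned into UTF-16 code units. The function name encode_utf16 is made up for this post, and char16_t is used as a stand-in for the 16-bit wchar_t on Windows so that the snippet behaves the same on every platform.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// char16_t stands in for the 16-bit wchar_t used on Windows.
std::u16string encode_utf16(char32_t cp)
{
    // Only valid scalar values: at most U+10FFFF and not a surrogate itself.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // BMP character: fits in a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary character: split the 20-bit offset into two halves.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

For example, encode_utf16(0x41) gives the single unit 0x0041, while encode_utf16(0x1F600) gives the two units 0xD83D 0xDE00.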
See the documentation at:
http://docs.wxwidgets.org/trunk/overview_string.html
Internal wxString Encoding
Since wxWidgets 3.0 wxString internally uses UTF-16 (with Unicode code units stored in wchar_t) under Windows and UTF-8 (with Unicode code units stored in char) under Unix, Linux and Mac OS X to store its content.
For definitions of code units and code points terms, please see the Unicode Representations and Terminology paragraph.
For simplicity of implementation, wxString when wxUSE_UNICODE_WCHAR==1 (e.g. on Windows) uses per code unit indexing instead of per code point indexing and doesn't know anything about surrogate pairs; in other words it always considers code points to be composed by 1 code unit, while this is really true only for characters in the BMP (Basic Multilingual Plane). Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of surrogate pairs himself. (Note however that Windows itself has built-in support for surrogate pairs in UTF-16, such as for drawing strings on screen.)
Remarks
Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1 resembles UCS-2 encoding, it's not completely correct to refer to wxString as UCS-2 encoded since you can encode code points outside the BMP in a wxString as two code units (i.e. as a surrogate pair; as already mentioned however wxString will "see" them as two different code points)
When instead wxUSE_UNICODE_UTF8==1 (e.g. on Linux and Mac OS X) wxString handles UTF8 multi-bytes sequences just fine also for characters outside the BMP (it implements per code point indexing), so that you can use UTF8 in a completely transparent way:
Note that this document does not look entirely correct, since on Unix systems wxString now uses std::basic_string<wchar_t> by default.
But its description is correct for Windows.
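As an illustration of what "the user code has to take care of surrogate pairs" means in practice, here is a rough sketch of walking a UTF-16 buffer code point by code point. This is not wxWidgets code: decode_utf16 is a name invented for this post, char16_t again stands in for the Windows wchar_t, and replacing an unpaired surrogate with U+FFFD is just one possible policy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: walk a UTF-16 buffer and return its code points.
std::vector<char32_t> decode_utf16(const std::u16string& s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t unit = s[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        const bool is_low  = unit >= 0xDC00 && unit <= 0xDFFF;

        if (is_high && i + 1 < s.size()
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Valid pair: combine both halves and skip the second unit.
            const char32_t low = s[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;
        } else if (is_high || is_low) {
            // Unpaired surrogate: substitute U+FFFD (a choice made for this sketch).
            out.push_back(0xFFFD);
        } else {
            // Ordinary BMP character: one code unit is one code point.
            out.push_back(unit);
        }
    }
    return out;
}
```

With this, the string u"a\U0001F600b" has four code units but decodes to three code points, which is exactly the difference between per code unit and per code point indexing described in the quote.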
Note that we currently don't handle surrogate pairs in C::B; wxWidgets 2.8.12 uses std::basic_string<wchar_t> too. Question: is there any chance we will encounter surrogate pairs in C++ source code? Maybe in comments? Which characters need a surrogate pair on Windows? I don't have an example.