I have read this:
wxWidgets: wxString Class Reference (http://docs.wxwidgets.org/trunk/classwx_string.html)
From it I learned that in wxWidgets 3.0 the wxString class uses a different internal "code unit" depending on the platform: UTF-8 on Linux-like systems and UTF-16 on Windows.
Both are variable-length Unicode encodings, so indexed access such as:
wxString s;
s[20]= something;
will be much slower than sequential access through an iterator.
I'm not sure how the current implementation handles this, but could it cause problems in the future?
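To make the concern concrete, here is a minimal sketch of why indexing into a variable-width encoding is linear in the index. It uses plain std::string holding UTF-8 rather than wxString, and Utf8OffsetOfCodePoint is a hypothetical helper of mine, not a wxWidgets function. It also assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wxWidgets): returns the byte offset of the
// n-th code point in a UTF-8 string, or std::string::npos if the string has
// fewer than n + 1 code points. Assumes well-formed UTF-8.
std::size_t Utf8OffsetOfCodePoint(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        const bool starts_code_point =
            (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80;
        if (starts_code_point)
        {
            if (seen == n)
                return i;   // found the start of the n-th code point
            ++seen;
        }
    }
    return std::string::npos;
}
```

Every call has to walk the bytes from the beginning, so a loop that indexes each position this way does quadratic work, while an iterator that remembers its byte offset only moves forward by one code point per step.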
EDIT 2013-10-18:
> I found that in wxWidgets 3.0, the wxString class will internally use different "code unit", which is utf8 under Linux like system, and utf16 under windows.
This is not correct: all platforms now use a fixed-width unit (wchar_t). See this post for an explanation: http://forums.codeblocks.org/index.php/topic,14421.msg126174.html#msg126174
FYI:
I found this message on the wx forum:
DL> It's not really thread-safe since it uses reference counting - I think,
This was true for 2.8 but this question is explicitly about 2.9 and by
default in wx 2.9 (i.e. unless you set wxUSE_STD_STRING to 0) wxString uses
std::basic_string for implementation and so doesn't use reference counting
if the standard class doesn't -- and most, if not all, of them don't use it
any more. So the thread safety of wxString is the same as the thread-safety
of the underlying standard library string class.
Regards,
VZ
So going forward, it seems wxString in 2.9.x/3.x mostly does NOT use reference counting, because the standard library string underneath it doesn't.
Now, the current CodeCompletion plugin source has a lot of functions like:
wxString GetToken();
wxString PeekToken();
These functions will deep-copy the string data, so I'm concerned about performance.
PS: In the wxWidgets 2.8.x implementation, wxString uses reference counting, so returning a wxString object is much faster (it does not deep-copy the string data).
So, what do you think?
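One possible way around the copies, as a rough sketch: return a const reference to a token the tokenizer already stores. SketchTokenizer and its member names are hypothetical stand-ins of mine, not the plugin's real class, and I use std::wstring in place of wxString so the example builds without wx.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the plugin's tokenizer; the real class differs.
class SketchTokenizer
{
public:
    explicit SketchTokenizer(std::wstring current) : m_current(std::move(current)) {}

    // Returns by value: the caller gets its own copy of the characters
    // (unless the string implementation happens to share buffers).
    std::wstring GetTokenCopy() const { return m_current; }

    // Returns a reference to the stored token: no characters are copied, but
    // the reference is only valid while the tokenizer keeps this token.
    const std::wstring& GetTokenRef() const { return m_current; }

private:
    std::wstring m_current;
};
```

The trade-off is lifetime: a caller that wants to keep the text after the tokenizer advances still has to make its own copy.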
I searched Google for a while and found that
GCC's libstdc++ string (as of GCC 4.3, per the link below) is COW (copy-on-write), see
http://stackoverflow.com/questions/1594803/is-stdstring-thead-safe-with-gcc-4-3
This code demonstrates the COW behavior:
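#include <string>
#include <cstdio>

int main()
{
    std::string orig = "I'm the original!";
    std::string copy_cow = orig;          // copy constructor: may share the buffer under COW
    std::string copy_mem = orig.c_str();  // built from const char*: always a fresh buffer
    // With a COW string the first two addresses are equal and the third differs.
    std::printf("%p %p %p\n",
                static_cast<const void*>(orig.data()),
                static_cast<const void*>(copy_cow.data()),
                static_cast<const void*>(copy_mem.data()));
    return 0;
}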
So although wx itself does not do reference counting, I think the std::string underneath it does.
Am I right? Can someone confirm this?
ollydbg: Are you sure you've disabled building in STL mode?
Hi, Obf, what does this question mean? I'm sorry, I don't understand it. Do you mean building wxString without the internal std::basic_string support? I don't think that is an option for wx2.9.x+.
...
Note that this document appears to be incorrect, since UNIX systems now use std::basic_string<wchar_t> by default.
I reported this issue to the wxWidgets mailing list and it has been fixed in wx trunk; see this commit: https://groups.google.com/d/msg/wx-commits-diffs/QZDKnpiL3lM/eEX0cFOKS3cJ. The web page http://docs.wxwidgets.org/trunk/overview_string.html will need a few days to synchronize with the trunk change.
Another issue I see: wxString is not NULL terminated, right? If so, is the while condition below OK? It is in the function: bool ParserThread::GetBaseArgs(const wxString& args, wxString& baseArgs)
while (*ptr != ParserConsts::null)
{
...
}
Basically, I think we should use the length of the wxString to bound the pointer range.
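Something like the following is what I have in mind. It is only a sketch: CountCommas is a hypothetical example function, not the real GetBaseArgs, and it uses std::wstring instead of wxString so it builds without wx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: count the commas in an argument list, stopping at the
// end of the string as reported by size(), not at a terminator character.
std::size_t CountCommas(const std::wstring& args)
{
    const wchar_t* ptr = args.data();
    const wchar_t* const end = ptr + args.size();
    std::size_t commas = 0;
    while (ptr != end)      // bounded by length, so an embedded NUL cannot end the loop early
    {
        if (*ptr == L',')
            ++commas;
        ++ptr;
    }
    return commas;
}
```

With this shape the loop neither stops early on an embedded NUL nor depends on a terminator being present after the last character.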
Another issue is string construction. As you know, all token strings are in fact substrings of the source file. (In some special cases a token is replaced by a macro expansion, but we can create an auxiliary source string to hold all the expanded strings.)
What a lexer does is locate the start and end of each lexeme; for example, in this source code:
int main ( ) { int a; .....
^ $
Note that when a lexeme is found, the lexer (the Quex lexer) knows the start position "^" and the end position "$", and it also has type information (an enum); in this case the type is "identifier". It is up to the user to handle this information, so if you have a Token class like the one below:
class CCToken
{
    std::string name;
    TokenType type;
};
The user has to construct each CCToken instance by copying memory from the source code into the name member variable, and then set the type member variable.
I think a better way is:
class CCToken
{
    int source_index;
    int lexeme_start;
    int lexeme_length;
    TokenType type;
};
Here, the first member is the index of the source buffer, and the other two record the start position and length of the lexeme.
Maybe we can supply a member function like "std::string CCToken::ToStdString()", which returns a genuinely new std::string. In most cases I think we don't need lexeme_start and lexeme_length at all, because we only need to know the TokenType. For example, there are TokenTypes like "keyword_class", "keyword_public", ...