Author Topic: wxString support in wxWidgets 3.0 problem?  (Read 36632 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: wxString support in wxWidgets 3.0 problem?
« Reply #15 on: October 18, 2013, 04:49:21 am »
In C++ times the pointer way is the bad way ;) It would be better to use iterators...
but I think wx2.8 has no support for string iterators -.-
Yes, I agree.

What I think is a better way:
Use std::string, not wxString, to hold the source file buffers internally in the CC plugins. Even the wxWidgets documentation suggests this:
http://docs.wxwidgets.org/trunk/classwx_string.html
Quote
String class for passing textual data to or receiving it from wxWidgets.

Note
    While the use of wxString is unavoidable in wxWidgets program, you are encouraged to use the standard string classes std::string or std::wstring in your applications and convert them to and from wxString only when interacting with wxWidgets.
All the internal source code would then be encoded in UTF-8 (stored in std::string). I have already created a simple and faster lexer with Quex; see the post "Quex lexer grammar, probably can make our tokenizer much faster" for details. The generated lexer is all C/C++ code, and it is about three times faster than a Flex-generated lexer.

Note: ctags internally uses the char type too.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

Re: wxString support in wxWidgets 3.0 problem?
« Reply #16 on: October 18, 2013, 05:13:36 am »
Another issue is string construction. As you know, every token string is in fact a substring of the source file. (In some special cases the token is replaced by a macro expansion, but we can create an auxiliary source string to hold all the expanded strings.)

What a lexer does is locate the start point and the end point of the lexeme, for example in this source code:
Code
int main ( ) { int a; .....
    ^   $
Note that when a lexeme is found, the lexer (the Quex lexer) knows the start position "^" and the end position "$", and it also has the type enum information; in this case, it is an "identifier". It is up to the user to handle this information, so if you have a Token class like the one below:
Code
class CCToken
{
    std::string name; // copy of the lexeme text
    TokenType   type; // e.g. identifier
};
The user has to construct the CCToken instance by copying memory from the source code into the name member variable, and then set the type member variable.

I think a better way is:
Code
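class CCToken
{
    int  source_index;   // which source buffer the lexeme lives in
    int  lexeme_start;   // offset of the first character of the lexeme
    int  lexeme_length;  // number of characters in the lexeme
    TokenType  type;
};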
Here, the first member is the index of the source buffer; the other two record the start position and the length of the lexeme.

Maybe we can supply a member function such as "std::string CCToken::ToStdString()", which returns a genuinely new std::string. In most cases, I think we don't need to use lexeme_start and lexeme_length, because we only need to know the TokenType. For example, there are TokenTypes like "keyword_class", "keyword_public", ...
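The index-based layout could look roughly like the sketch below. It rests on several assumptions that are not in the existing CC plugin code: the SourceBuffers vector standing in for the plugin's source buffers, the TokenType values, and passing the buffers to ToStdString() as a parameter are all hypothetical choices made for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical token types, for illustration only.
enum class TokenType { Identifier, KeywordClass, KeywordPublic };

// Hypothetical container: one UTF-8 buffer per source file
// (or per auxiliary macro-expansion buffer).
using SourceBuffers = std::vector<std::string>;

struct CCToken
{
    int       source_index;   // index into SourceBuffers
    int       lexeme_start;   // offset of the lexeme in that buffer
    int       lexeme_length;  // length of the lexeme
    TokenType type;

    // Materialize the lexeme text only when a caller asks for it.
    std::string ToStdString(const SourceBuffers& buffers) const
    {
        return buffers[source_index].substr(lexeme_start, lexeme_length);
    }
};
```

A parser that only switches on type never calls ToStdString(), so no string is allocated for keywords or punctuation.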


« Last Edit: October 18, 2013, 05:15:18 am by ollydbg »