Something I just remembered.
A question for our CodeCompletion developers: with all the improvements in place, how difficult would it be to get the following working correctly? Currently there is no completion and no type tooltip in certain scopes, namely those where the declaration occurs inside the if/for/... statement itself.
Examples:
TiXmlHandle handle;
if(TiXmlElement* foo = handle.ToElement())
{
foo->doSomething();
}
for(int index =0 ; index < 10; ++index)
{
int bar = index;
}
==> so type information and completion on : index, foo ...
OK, you are right.
I found the relevant logic:
if (token == ParserConsts::kw_for)
{
if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
else
m_Tokenizer.GetToken(); //skip args
m_Str.Clear();
}
That is, when we parse a function body to collect its auto (local) variables, the options are:
m_Options.useBuffer==true
m_Options.bufferSkipBlocks==false
So, in the end, m_Tokenizer.GetToken(); //skip args
is what gets called.
The proposed way was:
if (token == ParserConsts::kw_for)
{
if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
else
//m_Tokenizer.GetToken(); //skip args
GetAutoVariable();
m_Str.Clear();
}
Well, the function GetAutoVariable() will read the args inside the following parentheses and catch the variables in your cases. I think reading the auto variables is much like reading the function arguments.
Any logic error???
There are many possible usages, such as:
//type + variable
for(int a=0;.....)
for(NS::MyClass a=0;...)
// type containing some template info
for(MyNameSpace::MyTempLateClass<X,Y> a=0;...)
// pointer declaration
for(int *a=0;...)
for(int **a=0;...)
// two variables
for(int *a=0, b=0;...)
It is a bit complex. :D
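One heuristic that covers the forms listed above is to split the init-statement at top-level commas and take the last identifier before each initializer. The following is only a sketch under that assumption. DeclaredNames, LastIdentifier and IsIdentChar are made-up names, not existing CodeCompletion functions, and the heuristic will misread initializers that use '<' as a comparison, function-pointer declarators, and init-statements that are plain expressions rather than declarations.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True for characters that may appear in a C++ identifier.
static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Return the last identifier found in `s`, or "" if there is none.
static std::string LastIdentifier(const std::string& s)
{
    std::size_t end = s.size();
    while (end > 0 && !IsIdentChar(s[end - 1]))
        --end;
    std::size_t begin = end;
    while (begin > 0 && IsIdentChar(s[begin - 1]))
        --begin;
    return s.substr(begin, end - begin);
}

// Hypothetical helper: names declared by a for/if init-statement such as
// "int *a=0, b=0". Assumes the text really is a declaration.
std::vector<std::string> DeclaredNames(const std::string& init)
{
    std::vector<std::string> names;
    std::string declarator;      // text of the current declarator, initializer excluded
    int depth = 0;               // nesting inside <>, (), [], {}
    bool inInitializer = false;  // true once '=', '(', '[' or '{' was seen at depth 0

    auto flush = [&]() {
        const std::string name = LastIdentifier(declarator);
        if (!name.empty())
            names.push_back(name);
        declarator.clear();
        inInitializer = false;
    };

    for (std::size_t i = 0; i < init.size(); ++i)
    {
        const char c = init[i];
        const bool isArrow = (c == '>' && i > 0 && init[i - 1] == '-');
        if (depth == 0 && c == ',')
        {
            flush(); // a top-level comma ends one declarator
            continue;
        }
        if (depth == 0 && (c == '=' || c == '(' || c == '[' || c == '{'))
            inInitializer = true;
        if (c == '<' || c == '(' || c == '[' || c == '{')
            ++depth;
        else if ((c == '>' || c == ')' || c == ']' || c == '}') && !isArrow && depth > 0)
            --depth;
        if (!inInitializer)
            declarator += c;
    }
    flush();
    return names;
}
```

For example, DeclaredNames("int *a=0, b=0") yields a and b, and the comma inside MyTempLateClass<X,Y> is not treated as a declarator separator because it sits at nesting depth 1.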
BTW: the current tokenizer can't even distinguish "+" from "++" (Morten's latest patch seems to try a workaround in the parserthread). I think we need a returned token that carries a type ID, instead of a plain wxString token.
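As a rough illustration of a token that carries a type ID, here is a minimal sketch of a lexer that uses longest-match for operators, so "+" and "++" come out as different kinds. TokenKind, TypedToken and Lex are invented for this example and are not part of the current Tokenizer class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class TokenKind { Identifier, Number, Plus, PlusPlus, PlusAssign, Other };

struct TypedToken
{
    TokenKind kind;
    std::string text; // lexeme; kept for every token here to ease debugging
};

// Split `src` into typed tokens. Operators use longest-match:
// "++" and "+=" are tried before the single "+".
std::vector<TypedToken> Lex(const std::string& src)
{
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };

    std::vector<TypedToken> out;
    std::size_t i = 0;
    while (i < src.size())
    {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c))
        {
            ++i; // whitespace only separates tokens
        }
        else if (std::isalpha(c) || c == '_')
        {
            std::size_t j = i;
            while (j < src.size() && isWordChar(src[j]))
                ++j;
            out.push_back({TokenKind::Identifier, src.substr(i, j - i)});
            i = j;
        }
        else if (std::isdigit(c))
        {
            std::size_t j = i;
            while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j])))
                ++j;
            out.push_back({TokenKind::Number, src.substr(i, j - i)});
            i = j;
        }
        else if (c == '+')
        {
            const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
            if (next == '+')
            {
                out.push_back({TokenKind::PlusPlus, "++"});
                i += 2;
            }
            else if (next == '=')
            {
                out.push_back({TokenKind::PlusAssign, "+="});
                i += 2;
            }
            else
            {
                out.push_back({TokenKind::Plus, "+"});
                ++i;
            }
        }
        else
        {
            out.push_back({TokenKind::Other, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

With this, the parser can test tok.kind == TokenKind::PlusPlus directly instead of comparing strings or peeking at the following character.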
Yes, but I would suggest supporting them one by one, in order of increasing complexity.
But all your examples are things that are also possible on a regular line. That's why I had the idea to 'mimic' those declarations as if they were inside the for loop body, and have the parser parse them there (scope-wise that's the correct behavior); line-number-wise one has to remember they are a few lines up.
By the way, don't forget this one ;-)
for(int index = 0; Foo* foofoo = Something.getFooByIndex(index); ++index)
{
// let's do something with foofoo
}
I've been thinking and doing some experiments. I would like to reuse the existing code, so look:
(http://i683.photobucket.com/albums/vv194/ollydbg_cb/2011-04-30225041.png)
Here, DoParse() is our conventional method to collect symbols.
When we handle a "for" statement and meet a "(", we can recursively call another DoParse(); when that call meets an unbalanced ")", it simply returns. Next, if a "{" follows, we do the same thing, but this DoParse() returns at the unbalanced "}".
It is the same thing we do when parsing a class declaration like
class MyClass
{
int m_a;
int m_b;
};
Here, DoParse() will be called when we try to read the class members.
It works quite well in my quex parser project; the code snippet looks like this:
void ParserThread::HandleForWhile()
{
ConsumeToken(); //eat for or while key word
ConsumeToken(); //eat the left parenthesis
PushContext(); //save the old context
m_Context.EndStatement();
DoParse(); // do a parse; it should return on an unbalanced right parenthesis
PopContext(); // restore the old context
RawToken * tok = PeekToken();
if(tok->type_id()==TKN_L_BRACE)
{
ConsumeToken(); //eat {
PushContext(); //save the old context
m_Context.EndStatement();
DoParse(); // do a parse; it should return on an unbalanced right brace
PopContext(); // restore the old context
}
else
SkipStatementBlock();
}
With the quex lexer, parsing is much easier than with the current implementation. :D
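To show the "return on an unbalanced closer" idea in isolation, here is a small self-contained sketch that works on a plain list of string tokens. It is not the quex-based code above: MiniParser, HandleFor and SkipBalanced are invented names, and a variable is crudely recognized as "the token after int", which is an assumption made only to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class MiniParser
{
public:
    explicit MiniParser(std::vector<std::string> tokens) : m_Tokens(std::move(tokens)) {}

    // Variables found so far, as (name, nesting depth) pairs.
    std::vector<std::pair<std::string, int>> variables;

    // Parse until the input ends or an unbalanced ")" / "}" is consumed.
    void DoParse(int depth = 0)
    {
        while (m_Pos < m_Tokens.size())
        {
            const std::string tok = m_Tokens[m_Pos++];
            if (tok == ")" || tok == "}")
                return;                    // closer of the construct that called us
            if (tok == "(")
                SkipBalanced("(", ")");    // ordinary parentheses are skipped
            else if (tok == "{")
                SkipBalanced("{", "}");    // ordinary blocks are skipped
            else if (tok == "for")
                HandleFor(depth);
            else if (tok == "int" && m_Pos < m_Tokens.size())
                variables.emplace_back(m_Tokens[m_Pos++], depth);
        }
    }

private:
    // Parse the header and the body of a for statement one level deeper.
    void HandleFor(int depth)
    {
        if (m_Pos < m_Tokens.size() && m_Tokens[m_Pos] == "(")
        {
            ++m_Pos;            // eat "("
            DoParse(depth + 1); // returns after the matching ")"
        }
        if (m_Pos < m_Tokens.size() && m_Tokens[m_Pos] == "{")
        {
            ++m_Pos;            // eat "{"
            DoParse(depth + 1); // returns after the matching "}"
        }
    }

    // Skip to just past the closer matching an already consumed opener.
    void SkipBalanced(const std::string& open, const std::string& close)
    {
        int level = 1;
        while (m_Pos < m_Tokens.size() && level > 0)
        {
            const std::string& tok = m_Tokens[m_Pos++];
            if (tok == open)
                ++level;
            else if (tok == close)
                --level;
        }
    }

    std::vector<std::string> m_Tokens;
    std::size_t m_Pos = 0;
};
```

Feeding it the tokens of "for ( int index = 0 ; index < 10 ; ++ index ) { int bar = index ; }" records index and bar one level deeper than the enclosing scope, because both the header and the body are parsed by the same recursive routine.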
@oBFusCATed (http://forums.codeblocks.org/index.php?action=profile;u=1071)
There is no such patch, because I use another kind of Token.
CC's current token (what the Tokenizer class supplies) is just a wxString, so token comparison is not very efficient.
The code snippet in DoParse() looks like below. Note: the Tokenizer has a hand-written lexer, which just returns a lexeme (a wxString) without type ID information. Comparing strings is not ideal: we first switch on the token's length, then compare the text again.
case 6:
if (token == ParserConsts::kw_delete)
{
m_Str.Clear();
SkipToOneOfChars(ParserConsts::semicolonclbrace);
}
else if (token == ParserConsts::kw_switch)
{
if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
else
m_Tokenizer.GetToken(); //skip args
m_Str.Clear();
}
else if (token == ParserConsts::kw_return)
{
SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
m_Str.Clear();
}
else if (token == ParserConsts::kw_extern)
...
In my implementation, the Token class carries more precise information. It looks roughly like the following (the Quex lexer does the work of filling in these fields). If the token is an identifier, its text field holds the actual lexeme string; if it is a keyword or punctuation, it only needs a type ID, and its text can be empty.
class Token
{
int type_id;
string text;
int line_number;
int column_number;
};
So, in my implementation, I use code like below:
while (true)
{
RawToken* tk = PeekToken();
switch (tk->type_id())
{
case TKN_L_BRACE: //{
{
SkipBrace();
break;
}
case TKN_R_BRACE: //}
{
// the only time we get to find a } is when recursively called by e.g. HandleClass
// we have to return now...
cout<<"DoParse(): return from"<<*tk<<tk->line_number()<<":"<<tk->column_number()<<endl;
ConsumeToken();
return;
}
case TKN_R_PAREN: //)
{
cout<<"DoParse(): return from"<<*tk<<tk->line_number()<<":"<<tk->column_number()<<endl;
ConsumeToken();
return;
}
case TKN_L_PAREN : // (
{
SkipParentheses();
break;
}
case TKN_FOR:
case TKN_WHILE:
{
TRACE("handling for or while block");
HandleForWhile();
break;
}
.....
As you can see, I can compare type IDs to distinguish different tokens, so it is just an integer comparison instead of a string comparison. The token also supplies both line and column information.
I also use separate layers (parserthread->preprocessor->tokenizer); CC's current implementation does preprocessing and parsing in one class layer, which makes the code hard to read and maintain. :D
I'd say that if we want to adopt a new parser, we would have to change a lot of code...
Please forgive my intrusion.
I am working on a parser for D for a project of my own, and have also concluded that tokens need an initial classification, both for better efficiency and for better preparation for the semantic analysis. Outputting just strings may be handy as an initial approach and sounds like a good idea at first, but some form of "predigestion" is very useful.
Basically, my "tokenizer" (in my case, the class is called Scanner) preliminarily classifies certain tokens such as braces, parentheses, operators, etc. through an enum, and only stores the string in the case of a "word token". Note that it is not yet necessary to distinguish between keywords and symbols at this stage. Doing this reduces the time spent later on comparing strings in the parser.
example:
struct Token
{
TokenType _type;
wxString _word;
};
Many simple operators, parentheses, semicolons, commas and braces (the most common tokens in most source code) can then be skipped with a plain integer comparison, avoiding strcmp()-type operations.
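A minimal sketch of what that buys the parser: skipping ahead to the end of a statement needs no string comparison at all. The enum values, the use of std::string instead of wxString, and the function SkipToStatementEnd are assumptions made for this example, not code from the Scanner described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification done once by the scanner.
enum class TokenType { Word, LBrace, RBrace, LParen, RParen, Semicolon, Comma, Operator };

struct Token
{
    TokenType type;
    std::string word; // only filled for TokenType::Word
};

// Starting at `pos`, return the index of the first ';' or '}' that is not
// inside a nested brace block, or tokens.size() if there is none.
std::size_t SkipToStatementEnd(const std::vector<Token>& tokens, std::size_t pos)
{
    int depth = 0; // brace nesting relative to the starting point
    for (; pos < tokens.size(); ++pos)
    {
        switch (tokens[pos].type) // integer comparison only
        {
            case TokenType::LBrace:
                ++depth;
                break;
            case TokenType::RBrace:
                if (depth == 0)
                    return pos;
                --depth;
                break;
            case TokenType::Semicolon:
                if (depth == 0)
                    return pos;
                break;
            default:
                break; // words and other punctuation are stepped over
        }
    }
    return pos;
}
```

The word strings are never touched in this loop; they only matter later, when the parser needs to tell keywords from identifiers.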
Just to say: ollydbg is spot on, as far as I can see.
Here is my test code:
void MyFunction(int paraA, float paramB)
{
for(int index = 0; Foo* foofoo = Something.getFooByIndex(index); ++index)
{
// let's do something with foofoo
foofoo->DoSomething();
int i;
i++;
}
//type + variable
for(int a=0;a<10;a++)
{
int i;
i++;
};
for(NS::MyClass a=0;a<100;a++)
{
int i;
i++;
};
// type containing some template info
for(MyNameSpace::MyTempLateClass<X,Y> a=0;a.DoSomething>b;a++)
a++;
// pointer declaration
for(int *a=0;a<0x4444;a = a+4)
;
for(int **a=0;a<0x4444;a = a+4)
;
// two variables
for(int *a=0, b=0;...)
;
}
and here is the result:
function MyFunction 1:6
for 3:5
variable index 3:13
variable foofoo 3:29
variable i 7:12
for 12:5
variable a 12:13
variable i 14:12
for 18:5
variable a 18:21
variable i 20:12
for 25:5
variable a 25:43
for 29:5
variable a 29:14
for 31:5
variable a 31:15
for 35:5
variable a 35:14
The xxx:xxx shows a symbol's position as line:column.
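For readers who want to reproduce such positions, here is a minimal sketch of turning a character offset into a 1-based line:column pair. LineColumn is a hypothetical helper, not the code that produced the listing above, and it assumes '\n' line endings and counts a tab as one column.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// 1-based (line, column) of the character at `offset` in `source`.
std::pair<int, int> LineColumn(const std::string& source, std::size_t offset)
{
    int line = 1;
    int column = 1;
    for (std::size_t i = 0; i < offset && i < source.size(); ++i)
    {
        if (source[i] == '\n')
        {
            ++line;     // a newline starts the next line ...
            column = 1; // ... at its first column
        }
        else
        {
            ++column;
        }
    }
    return {line, column};
}
```

Assuming the test function starts on line 1 and the first for loop sits on line 3 with four spaces of indentation, this gives 1:6 for MyFunction, 3:5 for the for keyword and 3:13 for index.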
But I think my parser is still not mature. :D It needs a lot of time and has a long way to go. :D