Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Plugins development => Topic started by: killerbot on April 20, 2011, 03:52:15 pm

Title: CodeCompletion plugin
Post by: killerbot on April 20, 2011, 03:52:15 pm
Something I just remembered.

A question to our CodeCompletion developers : with all the improvements in place, how difficult is it to have the following working correctly [no completion, nor type tooltips on certain scopes : scopes where the declaration occurs in the if/for, ...]

Examples :

Code
TiXmlHandle handle;
if(TiXmlElement* foo = handle.ToElement())
{
    foo->doSomething();
}

for(int index =0 ; index < 10; ++index)
{
    int bar = index;
}


==> so type information and completion on : index,  foo ...
Title: Re: CodeCompletion plugin
Post by: ollydbg on April 20, 2011, 04:54:15 pm
I know the current behaviour: when the parser meets a keyword (if, for or while), it just skips the whole body, and the parentheses after the keyword are skipped too.
Is it possible to just add these auto variables to the top-level function body? Then they can be deleted when the edit caret moves to another function body.
Title: Re: CodeCompletion plugin
Post by: killerbot on April 20, 2011, 08:39:21 pm
Quote
I know the current feature: if the parser meets a keyword (if or for or while), it just skip the whole body. and the parentheses behind these keyword were skipped too.
Is it possible that we just add these auto variables to the top-level function body? so they can be deleted when edit caret changed to another function body.
Does it skip the entire body? I think not; local variables declared in the body do work, right?
So basically, things between the "()" of an if, for or while could be seen as a set of statements, all on the same line, and be treated scope-wise as if they were inside the body?
This sounds easy as an algorithm, but I guess reality is much more complicated?
Title: Re: CodeCompletion plugin
Post by: ollydbg on April 21, 2011, 03:16:52 am
OK, you are right.
I found the logic:
Code
            if (token == ParserConsts::kw_for)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    m_Tokenizer.GetToken(); //skip args
                m_Str.Clear();
            }

That is, when we parse the function body to collect the auto variables, we have these options:

Code
m_Options.useBuffer==true
m_Options.bufferSkipBlocks==false

So, finally
Code
m_Tokenizer.GetToken(); //skip args
will be called.

The proposed way was:
Code
            if (token == ParserConsts::kw_for)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    //m_Tokenizer.GetToken(); //skip args
                    GetAutoVariable();
                m_Str.Clear();
            }

Well, the function GetAutoVariable() will read the args in the following parentheses and catch the variables in your cases. Oh, I think reading the auto variables is much LIKE reading the function arguments.

Any logic errors?

Title: Re: CodeCompletion plugin
Post by: MortenMacFly on April 21, 2011, 07:14:46 am
Code
                    GetAutoVariable();
What is an "auto variable"? :shock:
Title: Re: CodeCompletion plugin
Post by: ollydbg on April 21, 2011, 07:38:29 am
Quote
What is an "auto variable"? :shock:
Oh, sorry, should be: local variable, or automatic variable
http://en.wikipedia.org/wiki/Automatic_variable
 :D

Both cases in killerbot's example are local variables.
Title: Re: CodeCompletion plugin
Post by: killerbot on April 21, 2011, 07:38:34 am
It's like reading the function arguments, with the difference that you can also encounter assignments and regular statements [e.g. ++foo].
Title: Re: CodeCompletion plugin
Post by: ollydbg on April 24, 2011, 06:42:40 am
There are many possible usages, like:
Code
//type + variable
for(int a=0;.....)
for(NS::MyClass a=0;...)

// type containing some template info
for(MyNameSpace::MyTempLateClass<X,Y> a=0;...)

// pointer declaration
for(int *a=0;...)
for(int **a=0;...)

// two variables
for(int *a=0, b=0;...)
It is a bit complex. :D
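
To make the cases above concrete, here is a rough, self-contained sketch of how the declared names could be pulled out of a for-init-statement. It is not CodeCompletion code: SplitTokens, IsIdentifier and DeclaredNames are hypothetical helpers invented for this illustration. The heuristic is deliberately naive: it assumes the initializer contains no top-level "<" or ">" comparisons, and it cannot resolve the classic "x * y" declaration/expression ambiguity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into identifier/number words, "::" and
// single punctuation characters.
static std::vector<std::string> SplitTokens(const std::string& text)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size())
    {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isspace(c))
        {
            ++i;
        }
        else if (std::isalnum(c) || c == '_')
        {
            std::size_t j = i;
            while (j < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[j])) || text[j] == '_'))
                ++j;
            tokens.push_back(text.substr(i, j - i));
            i = j;
        }
        else if (c == ':' && i + 1 < text.size() && text[i + 1] == ':')
        {
            tokens.push_back("::");
            i += 2;
        }
        else
        {
            tokens.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return tokens;
}

static bool IsIdentifier(const std::string& tok)
{
    return !tok.empty() &&
           (std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_');
}

// Hypothetical helper: names declared by an init-statement such as
// "int *a=0, b=0". Returns an empty list for plain expressions like "i = 0".
std::vector<std::string> DeclaredNames(const std::string& init)
{
    std::vector<std::string> names;
    std::vector<std::string> tokens = SplitTokens(init);
    tokens.push_back(",");           // sentinel so the last declarator is flushed
    std::vector<std::string> left;   // tokens of the current declarator before '='
    int depth = 0;                   // nesting level of <>, () and []
    bool inInitializer = false;
    bool first = true;
    for (const std::string& tok : tokens)
    {
        if (tok == "<" || tok == "(" || tok == "[") ++depth;
        else if (tok == ">" || tok == ")" || tok == "]") --depth;

        if (tok == "," && depth == 0)
        {
            const std::size_t n = left.size();
            bool ok = n >= 1 && IsIdentifier(left[n - 1]);
            if (first)  // the first declarator needs a type in front of the name
                ok = ok && n >= 2 && IsIdentifier(left[0]) && left[n - 2] != "::";
            if (!ok)
                return names;
            names.push_back(left[n - 1]);
            left.clear();
            inInitializer = false;
            first = false;
        }
        else if (tok == "=" && depth == 0)
            inInitializer = true;
        else if (!inInitializer)
            left.push_back(tok);
    }
    return names;
}
```

With this sketch, DeclaredNames("int *a=0, b=0") yields a and b, while "index = 0" and "++foo" yield nothing, which is the distinction between declarations and regular statements raised earlier in the thread.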

BTW: The current tokenizer can't even distinguish "+" and "++" (Morten's latest patch seems to work around this in the parserthread). I think we need a returned token bundled with a type id instead of a pure wxString token.
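
As an illustration of a "type id bundled" token, here is a minimal sketch of a lexer that returns a type together with the lexeme and uses longest-match to tell "++" from "+". The names TokenType, TypedToken and Lex are made up for this example (they are not part of the C::B Tokenizer), and std::string stands in for wxString to keep the sketch self-contained.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only.
enum class TokenType { Identifier, Number, Plus, PlusPlus, Other };

struct TypedToken
{
    TokenType   type;  // what kind of token this is
    std::string text;  // the lexeme as it appeared in the source
};

// Splits src into typed tokens; "++" wins over "+" (longest match).
std::vector<TypedToken> Lex(const std::string& src)
{
    std::vector<TypedToken> out;
    std::size_t i = 0;
    while (i < src.size())
    {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c))
        {
            ++i;
        }
        else if (std::isalpha(c) || c == '_')
        {
            std::size_t j = i + 1;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
                ++j;
            out.push_back({TokenType::Identifier, src.substr(i, j - i)});
            i = j;
        }
        else if (std::isdigit(c))
        {
            std::size_t j = i + 1;
            while (j < src.size() && std::isdigit(static_cast<unsigned char>(src[j])))
                ++j;
            out.push_back({TokenType::Number, src.substr(i, j - i)});
            i = j;
        }
        else if (c == '+')
        {
            if (i + 1 < src.size() && src[i + 1] == '+')
            {
                out.push_back({TokenType::PlusPlus, "++"});
                i += 2;
            }
            else
            {
                out.push_back({TokenType::Plus, "+"});
                ++i;
            }
        }
        else
        {
            out.push_back({TokenType::Other, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

A parser consuming these tokens can branch on the type field alone, so "a++ + b" arrives as Identifier, PlusPlus, Plus, Identifier without any string comparison.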

Title: Re: CodeCompletion plugin
Post by: killerbot on April 24, 2011, 07:22:15 am
Yes, but I would suggest supporting them one by one, in order of increasing complexity.
But all your examples are things that are also possible on a regular line. That's why I had the idea 'to mimic' as if those lines are inside the for loop, and have the parser parse them there (scope-wise that's the correct behavior); line-number-wise one has to remember they are a few lines up.


By the way, don't forget this one ;-)

Code
for(int index = 0; Foo* foofoo = Something.getFooByIndex(index); ++index)
{
   // let's do something with foofoo
}
Title: Re: CodeCompletion plugin
Post by: ollydbg on April 30, 2011, 05:10:09 pm
I'm thinking and doing some experiments. I would like to reuse the code, so look:
(http://i683.photobucket.com/albums/vv194/ollydbg_cb/2011-04-30225041.png)
Here, DoParse() is our conventional method to collect symbols.
When we handle a "for" statement and meet a "(", we can recursively call another DoParse(), which simply returns when it meets an unbalanced ")". Next, if there is a "{", we do the same thing, but this DoParse() returns at an unbalanced "}".

It is the same thing we do when we parse a class declaration like
Code
class MyClass
{
   int m_a;
   int m_b;
};
Here, DoParse() will be called when we try to read the class members.

It works quite well in my quex parser project. The code snippet looks like:
Code
void ParserThread::HandleForWhile()
{
    ConsumeToken(); //eat for or while key word
    ConsumeToken(); //eat the left parenthesis
    PushContext();  //save the old context
    m_Context.EndStatement();
    DoParse();      // do a parse, which should return on an unbalanced right parenthesis
    PopContext();   // restore the old context

    RawToken * tok = PeekToken();
    if(tok->type_id()==TKN_L_BRACE)
    {
        ConsumeToken(); //eat {
        PushContext();  //save the old context
        m_Context.EndStatement();
        DoParse();      // do a parse, which should return on an unbalanced right brace
        PopContext();   // restore the old context
    }
    else
        SkipStatementBlock();

}
With the quex lexer, parsing is much easier than with the current implementation.  :D
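
For readers without the quex project at hand, the recursion can be tried out with a toy stand-in. The sketch below is only an illustration under simplifying assumptions: MiniParser is a hypothetical class, the input is pre-split on whitespace, only "int NAME" counts as a declaration, and a scope stack plays the role of PushContext()/PopContext().

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy model of the recursive DoParse() idea (not the real ParserThread).
class MiniParser
{
public:
    explicit MiniParser(const std::string& source)
    {
        std::istringstream in(source);
        for (std::string tok; in >> tok;)
            m_Tokens.push_back(tok);
        m_Scopes.push_back("top");
    }

    // Parses until end of input, or until an unbalanced ")" or "}" is consumed.
    void DoParse()
    {
        while (m_Pos < m_Tokens.size())
        {
            const std::string tok = m_Tokens[m_Pos++];   // consume one token
            if (tok == ")" || tok == "}")
                return;                  // unbalanced closer: hand control back
            if (tok == "(" || tok == "{")
                DoParse();               // nested call eats up to the matching closer
            else if (tok == "for" || tok == "while")
                HandleForWhile(tok);
            else if (tok == "int" && m_Pos < m_Tokens.size())
                m_Found.push_back(m_Scopes.back() + "/" + m_Tokens[m_Pos++]);
        }
    }

    // Declarations found so far, as "scope/name".
    const std::vector<std::string>& Found() const { return m_Found; }

private:
    void HandleForWhile(const std::string& keyword)
    {
        m_Scopes.push_back(keyword);     // plays the role of PushContext()
        if (m_Pos < m_Tokens.size() && m_Tokens[m_Pos] == "(")
        {
            ++m_Pos;                     // eat "("
            DoParse();                   // returns on the unbalanced ")"
        }
        if (m_Pos < m_Tokens.size() && m_Tokens[m_Pos] == "{")
        {
            ++m_Pos;                     // eat "{"
            DoParse();                   // returns on the unbalanced "}"
        }
        else
        {
            // unbraced body: skip the single statement up to and including ";"
            while (m_Pos < m_Tokens.size() && m_Tokens[m_Pos++] != ";")
            {
            }
        }
        m_Scopes.pop_back();             // plays the role of PopContext()
    }

    std::vector<std::string> m_Tokens;
    std::vector<std::string> m_Scopes;
    std::vector<std::string> m_Found;
    std::size_t m_Pos = 0;
};
```

Feeding it "void f ( ) { for ( int i = 0 ; i < 3 ; ++ i ) { int bar = i ; } int after = 0 ; }" and calling DoParse() records for/i, for/bar and top/after, so both the variable in the loop header and the one in the loop body end up in the scope of the for statement.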
Title: Re: CodeCompletion plugin
Post by: killerbot on April 30, 2011, 06:21:09 pm
sounds good. Long live our CC/parsing experts :-)
Title: Re: CodeCompletion plugin
Post by: oBFusCATed on April 30, 2011, 06:56:54 pm
Quote
As using quex lexer, parsing is much easier than the current implementation.  :D
I see no patch that proves it will be good for the CC in C::B  :lol:
Title: Re: CodeCompletion plugin
Post by: ollydbg on May 01, 2011, 04:27:44 am
@oBFusCATed (http://forums.codeblocks.org/index.php?action=profile;u=1071)
There is no such patch, because I use another kind of Token.

CC's current token (what the Tokenizer class supplies) is just a wxString, so token comparison is not very efficient.
Note: the Tokenizer has a hand-written lexer, which just returns a lexeme (a wxString) without type ID information. Comparing strings is not ideal: we first switch on the token's length, then compare the text again. The code snippet in DoParse() looks like this:

Code
case 6:
            if (token == ParserConsts::kw_delete)
            {
                m_Str.Clear();
                SkipToOneOfChars(ParserConsts::semicolonclbrace);
            }
            else if (token == ParserConsts::kw_switch)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    m_Tokenizer.GetToken(); //skip args
                m_Str.Clear();
            }
            else if (token == ParserConsts::kw_return)
            {
                SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                m_Str.Clear();
            }
            else if (token == ParserConsts::kw_extern)
...

In my implementation, the Token class carries more precise information (the Quex lexer does the work of filling it in). If the token is an identifier, its text field holds the actual lexeme string; if it is a keyword or punctuation, it only needs a type ID and its text can be empty. Briefly, the Token class looks like:
Code
class Token
{
    int type_id;
    string text;
    int line_number;
    int column_number;
};

So, in my implementation, I use code like this:
Code
 while (true)
    {
        RawToken* tk = PeekToken();

        switch (tk->type_id())
        {
        case TKN_L_BRACE: //{
        {
            SkipBrace();
            break;
        }
        case TKN_R_BRACE: //}
        {
            // the only time we get to find a } is when recursively called by e.g. HandleClass
            // we have to return now...
            cout<<"DoParse(): return from"<<*tk<<tk->line_number()<<":"<<tk->column_number()<<endl;
            ConsumeToken();
            return;
        }
        case TKN_R_PAREN: //)
        {
            cout<<"DoParse(): return from"<<*tk<<tk->line_number()<<":"<<tk->column_number()<<endl;
            ConsumeToken();
            return;
        }
        case TKN_L_PAREN :       // (
        {
            SkipParentheses();
            break;
        }
        case TKN_FOR:
        case TKN_WHILE:
        {
            TRACE("handling for or while block");
            HandleForWhile();
        }
.....
You can see that I compare type IDs to distinguish different tokens, so it is an int comparison instead of a string comparison. The Token also supplies line/column information.

I also use separate layers (parserthread -> preprocessor -> tokenizer). CC's current implementation does preprocessing and parsing in one class layer, which makes the code hard to read and maintain. :D

I'd like to say: if we want to adopt a new parser, we would have to change a lot of code...
Title: Re: CodeCompletion plugin
Post by: ptDev on May 01, 2011, 04:35:06 pm
Quote
@oBFusCATed (http://forums.codeblocks.org/index.php?action=profile;u=1071)
There is no such patch, because I use another kind of Token.

CC's currently Token (Tokenizer class can supply) is just a wxString, so Token comparation is not quite good.

[..]

I'd like to say, if we need to adopt a new parser, we should change code a lot a lot...

Please forgive my intrusion.

I am working on a parser for D for a project of my own, and have also concluded that tokens need an initial classification, both for better efficiency and for better preparation for semantic analysis. Outputting just strings may be handy as an initial approach and sound like a good idea at first, but some form of "predigestion" is very useful.

Basically, my "tokenizer" (in my case, the class is called Scanner) preliminarily classifies certain tokens such as braces, parentheses, operators, etc. through an enum, and only stores the string in the case of a "word token". Note that it is not necessary to distinguish between keywords and symbols at this stage yet. Doing this reduces the time spent later on comparing strings in the parser.

example:
Code
struct Token
{
   TokenType _type;
   wxString _word;
};

A lot of simple operators, parentheses, semicolons, commas and braces (the most common tokens in most source code) can be skipped, by avoiding strcmp() type operations that can be reduced to comparing an integer.


Just to say: ollydbg is spot on, as far as I can see.
Title: Re: CodeCompletion plugin
Post by: ollydbg on May 01, 2011, 05:10:16 pm
Quote
Note that it is not necessary to distinguish between keywords and symbols at this stage yet. Doing this reduces the time spent later on comparing strings in the parser.
Thanks for the reply.

BTW: I need to say a few words about your idea.
For a fixed keyword group, I think a DFA in the lexer can be much faster. :D Here are my observations.

1, Most compilers' lexers (gcc, clang) work the way you describe. I think this is done for flexibility, and I don't think it is the fastest way to do scanning.  :D
E.g. gcc has to support several languages (C, C++, Objective-C), and different languages have different keyword definitions. So, when the lexer gets a "word token", the parser later checks a symbol table to see whether the "word token" is a keyword in the language or a general identifier.
Usually this symbol table is a hashtable, so searching for the "word token" is quite fast.

2, For my implementation, I use the quex lexer generator, which internally generates a DFA (directly coded, which is much faster than a table-driven lexer like flex). As my parser is definitely a C++ parser, it has a fixed set of keywords which can be defined in the lexer grammar, so the lexer can distinguish a C++ keyword from a general identifier.
When it meets a keyword, it just returns a type id (int value) and no text is needed; this avoids the hashtable search stage.

From my point of view, this way should be faster. The disadvantage is that the DFA is fixed once it is generated and can't vary dynamically, e.g. I can't make the quex-generated lexer recognize a newly added keyword at run time.
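
The two approaches compared above can be put side by side in a small sketch. Everything here is illustrative: KeywordId, LookupKeyword and MatchKeywordDirect are hypothetical names, the keyword set is cut down to three entries, and the hand-written switch is only a stand-in for the kind of directly coded recognizer a generator might emit. It is not Quex output, and the sketch makes no claim about actual speed.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical keyword ids for this sketch only.
enum KeywordId { KW_NONE = 0, KW_FOR, KW_IF, KW_WHILE };

// Approach 1: lex a generic "word token" first, then look it up in a hash
// table. The table could be swapped or extended at run time.
KeywordId LookupKeyword(const std::string& word)
{
    static const std::unordered_map<std::string, KeywordId> table = {
        {"for", KW_FOR}, {"if", KW_IF}, {"while", KW_WHILE}};
    const auto it = table.find(word);
    return it == table.end() ? KW_NONE : it->second;
}

// Approach 2: the keyword set is baked into the control flow. Nothing is
// hashed, but adding a keyword means regenerating or rewriting the code.
KeywordId MatchKeywordDirect(const std::string& w)
{
    const std::size_t n = w.size();
    if (n == 0)
        return KW_NONE;
    switch (w[0])
    {
    case 'f':
        return (n == 3 && w[1] == 'o' && w[2] == 'r') ? KW_FOR : KW_NONE;
    case 'i':
        return (n == 2 && w[1] == 'f') ? KW_IF : KW_NONE;
    case 'w':
        return (n == 5 && w.compare(1, 4, "hile") == 0) ? KW_WHILE : KW_NONE;
    default:
        return KW_NONE;
    }
}
```

Both functions return the same id for the same word; the difference is where the keyword list lives, which is exactly the flexibility trade-off described above.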


Title: Re: CodeCompletion plugin
Post by: ptDev on May 01, 2011, 05:19:10 pm
Quote
Note that it is not necessary to distinguish between keywords and symbols at this stage yet. Doing this reduces the time spent later on comparing strings in the parser.
thanks for the reply.

BTW: I need to say some words about your idea.

[...]

So, it have a fixed keywords definition which can be defined in the lexer grammar. So, the lexer can distinguish a c++ keyword and a general identifier.
When it meets a keyword, it just return a type id (int value), and no text is needed, this can avoid the hashtable search stage.

From my point of view, this way should be more faster.

Searching a wxString in a std::map<wxString,int> would be pretty fast, yes.
And the fewer strings needed for semantic analysis, the better, of course :)
Title: Re: CodeCompletion plugin
Post by: rickg22 on May 01, 2011, 09:32:05 pm
I recently wrote a custom-language parser in JavaScript (to make some JavaScript templates). I came to the same conclusion: replacing string tokens with token ids is much, much faster. As a former CC dev, I applaud this initiative :)
Title: Re: CodeCompletion plugin
Post by: ollydbg on May 03, 2011, 10:21:52 am
Here is my test code:
Code
void MyFunction(int paraA, float paramB)
{
    for(int index = 0; Foo* foofoo = Something.getFooByIndex(index); ++index)
    {
       // let's do something with foofoo
       foofoo->DoSomething();
       int i;
       i++;
    }
    
    //type + variable
    for(int a=0;a<10;a++)
    {
       int i;
       i++;
    };
    
    for(NS::MyClass a=0;a<100;a++)
    {
       int i;
       i++;
    };

    // type containing some template info
    for(MyNameSpace::MyTempLateClass<X,Y> a=0;a.DoSomething>b;a++)
        a++;

    // pointer declaration
    for(int *a=0;a<0x4444;a = a+4)
        ;
    for(int **a=0;a<0x4444;a = a+4)
        ;

    // two variables
    for(int *a=0, b=0;...)
        ;
}

and here is the result:
Code
function MyFunction 1:6
   for  3:5
      variable index 3:13
      variable foofoo 3:29
      variable i 7:12
   for  12:5
      variable a 12:13
      variable i 14:12
   for  18:5
      variable a 18:21
      variable i 20:12
   for  25:5
      variable a 25:43
   for  29:5
      variable a 29:14
   for  31:5
      variable a 31:15
   for  35:5
      variable a 35:14

The xxx:xxx shows a symbol's position as line:column.

But I think my parser is still not mature. :D It needs a long time and has a long way to go :D
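
As a side note on the line:column notation, the sketch below shows one way such a position could be derived from a character offset. It assumes 1-based lines and columns and counts a tab as one column, which is consistent with the listing above but is an assumption; Position and PositionOf are hypothetical names, and a real lexer would normally track the position incrementally while scanning.

```cpp
#include <cstddef>
#include <string>

struct Position
{
    int line;    // 1-based line number
    int column;  // 1-based column number
};

// Hypothetical helper: converts a character offset into line:column by
// counting newlines from the start of the text.
Position PositionOf(const std::string& text, std::size_t offset)
{
    Position pos{1, 1};
    for (std::size_t i = 0; i < offset && i < text.size(); ++i)
    {
        if (text[i] == '\n')
        {
            ++pos.line;
            pos.column = 1;
        }
        else
        {
            ++pos.column;
        }
    }
    return pos;
}
```

For a buffer whose first line starts with "void MyFunction(", the offset of MyFunction maps to 1:6, the same value the result listing shows for the function.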