Author Topic: CodeCompletion plugin  (Read 15203 times)

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
CodeCompletion plugin
« on: April 20, 2011, 03:52:15 pm »
Something I just remembered.

A question for our CodeCompletion developers: with all the improvements in place, how difficult would it be to get the following working correctly? [Currently there is no completion, and no type tooltips, in certain scopes: scopes where the declaration occurs in the if/for condition, ...]

Examples :

Code
TiXmlHandle handle;
if (TiXmlElement* foo = handle.ToElement())
{
    foo->doSomething();
}

for (int index = 0; index < 10; ++index)
{
    int bar = index;
}


==> so: type information and completion for index, foo, ...
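For reference, C++ gives such a condition-declared variable scope over the entire if statement, including the else branch, which is exactly the scope the parser would need to model. A minimal standalone sketch (the `lookup` helper is made up for illustration):

```cpp
#include <cassert>

// Hypothetical stand-in for handle.ToElement(): returns a pointer or null.
int* lookup(bool ok) {
    static int value = 42;
    return ok ? &value : nullptr;
}

// A variable declared in the if condition is scoped to the whole
// if/else statement, so completion should offer `foo` in both branches.
int demo(bool ok) {
    if (int* foo = lookup(ok))
        return *foo;          // foo is visible here...
    else
        return foo ? 1 : 0;   // ...and in the else branch too
    // foo is out of scope here
}
```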

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #1 on: April 20, 2011, 04:54:15 pm »
I know the current behavior: when the parser meets one of those keywords (if, for, or while), it just skips the whole body, and the parentheses after the keyword are skipped too.
Is it possible to simply add these auto variables to the top-level function body, so they can be deleted when the edit caret moves to another function body?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: CodeCompletion plugin
« Reply #2 on: April 20, 2011, 08:39:21 pm »
I know the current behavior: when the parser meets one of those keywords (if, for, or while), it just skips the whole body, and the parentheses after the keyword are skipped too.
Is it possible to simply add these auto variables to the top-level function body, so they can be deleted when the edit caret moves to another function body?
Does it skip the entire body? I think not; local variables declared in the body do work, right?
So basically, the things between the "()" of an if, for, or while could be seen as a set of statements, all on the same line, and be treated scope-wise as if they were inside the body?
This sounds easy as an algorithm, but I guess reality is much more complicated?
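The "treat it as if it were inside the body" idea matches a standard scope equivalence. A minimal sketch of the two forms (the toy `Elem`/`toElement` names are made up for illustration):

```cpp
#include <cassert>

struct Elem { int value = 7; };

// Hypothetical stand-in for handle.ToElement().
Elem* toElement(bool ok) { static Elem e; return ok ? &e : nullptr; }

// Form 1: declaration inside the if condition, as the user writes it.
int useConditionDecl() {
    int result = 0;
    if (Elem* foo = toElement(true))
        result = foo->value;
    return result;
}

// Form 2: the scope-equivalent rewrite the parser could "mimic":
// hoist the declaration into an enclosing scope, then test it.
int useHoistedDecl() {
    int result = 0;
    {
        Elem* foo = toElement(true);
        if (foo)
            result = foo->value;
    }
    return result;
}
```

Both functions behave identically, which is why parsing the condition declaration as if it were a statement inside the body produces the correct scope.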

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #3 on: April 21, 2011, 03:16:52 am »
OK, you are right.
I found the logic:
Code
            if (token == ParserConsts::kw_for)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    m_Tokenizer.GetToken(); //skip args
                m_Str.Clear();
            }

When we are parsing a function body to collect the auto variables, we have the options:

Code
m_Options.useBuffer==true
m_Options.bufferSkipBlocks==false

So, finally
Code
m_Tokenizer.GetToken(); //skip args
will be called.

The proposed way was:
Code
            if (token == ParserConsts::kw_for)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    //m_Tokenizer.GetToken(); //skip args
                    GetAutoVariable();
                m_Str.Clear();
            }

The function GetAutoVariable() would read the contents of the following parentheses and catch the variables in your cases. I think reading the auto variables is much like reading function arguments.

Any logic errors?
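For what it's worth, here is a much-simplified, self-contained sketch of what such a GetAutoVariable() might do (hypothetical code, not CC's API; the real function would read tokens from m_Tokenizer rather than a string):

```cpp
#include <sstream>
#include <string>

// Hypothetical simplification of the proposed GetAutoVariable(): given the
// text between the parentheses of an if/for, extract the declared variable's
// type and name, e.g. "int index = 0" -> { "int", "index" }.
struct AutoVar { std::string type, name; };

AutoVar getAutoVariable(const std::string& condition) {
    std::istringstream in(condition);
    AutoVar v;
    in >> v.type >> v.name;  // first two tokens: type, then declarator
    // Move pointer stars that stuck to the name ("*foo") over to the type.
    while (!v.name.empty() && v.name[0] == '*') {
        v.type += '*';
        v.name.erase(0, 1);
    }
    return v;
}
```

For example, `getAutoVariable("int index = 0")` yields type "int" and name "index", and `getAutoVariable("TiXmlElement* foo = handle.ToElement()")` yields type "TiXmlElement*" and name "foo". The initializer is ignored, just as when reading function arguments with default values.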

« Last Edit: April 21, 2011, 03:20:44 am by ollydbg »

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: CodeCompletion plugin
« Reply #4 on: April 21, 2011, 07:14:46 am »
Code
                    GetAutoVariable();
What is an "auto variable"? :shock:
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #5 on: April 21, 2011, 07:38:29 am »
What is an "auto variable"? :shock:
Oh, sorry, should be: local variable, or automatic variable
http://en.wikipedia.org/wiki/Automatic_variable
 :D

Both cases in killerbot's example are local variables.

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: CodeCompletion plugin
« Reply #6 on: April 21, 2011, 07:38:34 am »
It's like reading function arguments, with the difference that you can also encounter assignments and regular statements [e.g. ++foo].

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #7 on: April 24, 2011, 06:42:40 am »
There are many possible usages, like:
Code
// type + variable
for (int a = 0; .....)
for (NS::MyClass a = 0; ...)

// type containing some template info
for (MyNameSpace::MyTempLateClass<X,Y> a = 0; ...)

// pointer declaration
for (int *a = 0; ...)
for (int **a = 0; ...)

// two variables
for (int *a = 0, b = 0; ...)
It is a bit complex. :D
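A rough illustration of why the two-variable case is the tricky one (a hypothetical helper, not CC code): the declarators share one base type, but each pointer star belongs to an individual declarator, so in "int *a=0, b=0" only a is a pointer.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Extract the declared names from a multi-declarator line such as
// "int *a=0, b=0" -> { "a", "b" }. The base type is written once;
// everything after it is a comma-separated declarator list.
std::vector<std::string> declaredNames(const std::string& decl) {
    std::istringstream in(decl);
    std::string baseType;
    in >> baseType;                        // consume the shared base type
    std::vector<std::string> names;
    std::string declarator;
    while (std::getline(in, declarator, ',')) {
        std::string name;
        for (char c : declarator) {
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '_')
                name += c;                 // collect identifier characters
            else if (c == '=')
                break;                     // stop at the initializer
        }
        if (!name.empty())
            names.push_back(name);
    }
    return names;
}
```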

BTW: the current tokenizer can't even distinguish "+" and "++" (Morten's latest patch seems to attempt a workaround in the parserthread). I think we need a return token that bundles a type ID instead of a pure wxString token.


Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: CodeCompletion plugin
« Reply #8 on: April 24, 2011, 07:22:15 am »
Yes, but I would suggest supporting them one by one, with increasing complexity.
All your examples are things that are also possible on a regular line. That's why I had the idea to 'mimic' as if those lines were inside the for loop, and have the parser parse them there (scope-wise that's the correct behavior); line-number-wise one has to remember they are a few lines up.


By the way, don't forget this one ;-)

Code
for(int index = 0; Foo* foofoo = Something.getFooByIndex(index); ++index)
{
   // let's do something with foofoo
}

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #9 on: April 30, 2011, 05:10:09 pm »
I've been thinking about this and doing some experiments. I would like to reuse the code, so look:

Here, DoParse() is our conventional method to collect symbols.
When handling a "for" statement, when we meet a "(", we can recursively call another DoParse(); when that call meets an unbalanced ")", it just returns. Next, if the following token is a "{", we do the same thing, but that DoParse() returns on an unbalanced "}".

It is the same as when we parse a class declaration like
Code
class MyClass
{
    int m_a;
    int m_b;
};
Here, DoParse() will be called when we try to read the class members.

It works quite well in my quex parser project; the code snippet looks like this:
Code
void ParserThread::HandleForWhile()
{
    ConsumeToken(); // eat the for or while keyword
    ConsumeToken(); // eat the left parenthesis
    PushContext();  // save the old context
    m_Context.EndStatement();
    DoParse();      // do a parse; should return on an unbalanced right parenthesis
    PopContext();   // restore the old context

    RawToken* tok = PeekToken();
    if (tok->type_id() == TKN_L_BRACE)
    {
        ConsumeToken(); // eat {
        PushContext();  // save the old context
        m_Context.EndStatement();
        DoParse();      // do a parse; should return on an unbalanced right brace
        PopContext();   // restore the old context
    }
    else
        SkipStatementBlock();
}
Using the quex lexer, parsing is much easier than with the current implementation.  :D
« Last Edit: May 01, 2011, 04:03:09 am by ollydbg »

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5490
Re: CodeCompletion plugin
« Reply #10 on: April 30, 2011, 06:21:09 pm »
sounds good. Long live our CC/parsing experts :-)

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: CodeCompletion plugin
« Reply #11 on: April 30, 2011, 06:56:54 pm »
Using the quex lexer, parsing is much easier than with the current implementation.  :D
I see no patch that proves it will be good for the CC in C::B  :lol:
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #12 on: May 01, 2011, 04:27:44 am »
@oBFusCATed
There is no such patch, because I use another kind of Token.

The Token that CC currently uses (the one the Tokenizer class supplies) is just a wxString, so token comparison is not very good.
The code snippet in DoParse() looks like below. Note: the Tokenizer has a hand-written lexer, which just returns a lexeme (a wxString) without any type ID information. Comparing strings is not very good: we first switch on the token's length, and then compare the text anyway.

Code
case 6:
            if (token == ParserConsts::kw_delete)
            {
                m_Str.Clear();
                SkipToOneOfChars(ParserConsts::semicolonclbrace);
            }
            else if (token == ParserConsts::kw_switch)
            {
                if (!m_Options.useBuffer || m_Options.bufferSkipBlocks)
                    SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                else
                    m_Tokenizer.GetToken(); //skip args
                m_Str.Clear();
            }
            else if (token == ParserConsts::kw_return)
            {
                SkipToOneOfChars(ParserConsts::semicolonclbrace, true);
                m_Str.Clear();
            }
            else if (token == ParserConsts::kw_extern)
...

In my implementation, the Token class carries more precise information; briefly, it looks like the code below (the quex lexer does the work of filling in this information). If the token is an identifier, its text field holds the actual lexeme string, but if it is a keyword or punctuation, it only needs a type ID, and its text can be empty.
Code
class Token
{
    int type_id;
    string text;
    int line_number;
    int column_number;
};

So, in my implementation, I use code like below:
Code
while (true)
{
    RawToken* tk = PeekToken();

    switch (tk->type_id())
    {
    case TKN_L_BRACE: // {
    {
        SkipBrace();
        break;
    }
    case TKN_R_BRACE: // }
    {
        // the only time we find a } here is when recursively called by e.g. HandleClass,
        // so we have to return now...
        cout << "DoParse(): return from " << *tk << tk->line_number() << ":" << tk->column_number() << endl;
        ConsumeToken();
        return;
    }
    case TKN_R_PAREN: // )
    {
        cout << "DoParse(): return from " << *tk << tk->line_number() << ":" << tk->column_number() << endl;
        ConsumeToken();
        return;
    }
    case TKN_L_PAREN: // (
    {
        SkipParentheses();
        break;
    }
    case TKN_FOR:
    case TKN_WHILE:
    {
        TRACE("handling for or while block");
        HandleForWhile();
        break;
    }
.....
You can see that I can compare on the type ID to distinguish different tokens, so it just does an int comparison instead of a string comparison. Also, the token supplies both line and column information.

I also use separate layers: parserthread -> preprocessor -> tokenizer. CC's current implementation does preprocessing and parsing in one class layer, which makes the code hard to read and maintain. :D

I'd like to say that if we adopt a new parser, we would have to change a lot of code...

Offline ptDev

  • Almost regular
  • **
  • Posts: 222
Re: CodeCompletion plugin
« Reply #13 on: May 01, 2011, 04:35:06 pm »
@oBFusCATed
There is no such patch, because I use another kind of Token.

The Token that CC currently uses (the one the Tokenizer class supplies) is just a wxString, so token comparison is not very good.

[..]

I'd like to say that if we adopt a new parser, we would have to change a lot of code...

Please forgive my intrusion.

I am working on a parser for D for a project of my own, and I too have concluded that tokens need an initial classification, both for better efficiency and as better preparation for the semantic analysis. Outputting just strings may be handy as an initial approach and sound like a good idea at first, but some form of "predigestion" is very useful.

Basically, my "tokenizer" (in my case, the class is called Scanner) preliminarily classifies certain tokens such as braces, parenthesis, operators, etc. through an enum, and only stores the string in the case of a "word token". Note that it is not necessary to distinguish between keywords and symbols at this stage yet. Doing this reduces the time spent later on comparing strings in the parser.

example:
Code
struct Token
{
   TokenType _type;
   wxString _word;
};

A lot of simple operators, parentheses, semicolons, commas, and braces (the most common tokens in most source code) can then be handled cheaply, because strcmp()-style operations are reduced to comparing an integer.
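A minimal sketch of that classification step (using std::string instead of wxString so it stands alone; the TokenType names are made up for illustration):

```cpp
#include <cctype>
#include <string>

// Hypothetical TokenType enum: punctuation gets its own tag, while anything
// word-like is classified once as TT_WORD and keeps its text.
enum TokenType { TT_WORD, TT_L_PAREN, TT_R_PAREN, TT_L_BRACE,
                 TT_R_BRACE, TT_SEMICOLON, TT_COMMA, TT_OTHER };

struct Token {
    TokenType _type;
    std::string _word;   // filled only for TT_WORD tokens
};

Token classify(const std::string& lexeme) {
    if (lexeme == "(") return {TT_L_PAREN, ""};
    if (lexeme == ")") return {TT_R_PAREN, ""};
    if (lexeme == "{") return {TT_L_BRACE, ""};
    if (lexeme == "}") return {TT_R_BRACE, ""};
    if (lexeme == ";") return {TT_SEMICOLON, ""};
    if (lexeme == ",") return {TT_COMMA, ""};
    if (!lexeme.empty() &&
        (std::isalpha(static_cast<unsigned char>(lexeme[0])) || lexeme[0] == '_'))
        return {TT_WORD, lexeme};   // keyword or identifier: keep the text
    return {TT_OTHER, lexeme};
}

// Later, the parser compares integers instead of strings:
//   if (tok._type == TT_SEMICOLON) ...
```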


Just to say: ollydbg is spot on, as far as I can see.
« Last Edit: May 01, 2011, 04:39:36 pm by ptDev »

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: CodeCompletion plugin
« Reply #14 on: May 01, 2011, 05:10:16 pm »
Quote
Note that it is not necessary to distinguish between keywords and symbols at this stage yet. Doing this reduces the time spent later on comparing strings in the parser.
Thanks for the reply.

BTW: I need to say some words about your idea.
For a fixed keyword set, I think a DFA in the lexer can be much faster. :D Here are my observations.

1. Most compilers' lexers (gcc, clang) do it the way you said. I think that is done for flexibility, and I don't think it is the fastest way to scan. :D
E.g. gcc has to support several languages (C, C++, Objective-C), and different languages have different keyword definitions. So when the lexer gets a "word token", the parser later checks a symbol table to see whether that "word token" is a keyword of the language or a general identifier.
Usually this symbol table is a hash table, so searching for the "word token" is quite fast.

2. For my implementation, I use the quex lexer generator, which internally generates a DFA (code-driven, which is much faster than a table-driven lexer like flex). Since my parser is definitely a C++ parser, it has a fixed keyword set, which can be defined in the lexer grammar, so the lexer itself can distinguish a C++ keyword from a general identifier.
When it meets a keyword, it just returns a type ID (an int value) and no text is needed; this avoids the hash-table search stage.

From my point of view, this way should be faster. The disadvantage is that the DFA is fixed after generation and can't vary dynamically; e.g. I can't make the quex-generated lexer identify a newly added keyword at run time.
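To make point 1 concrete, the hash-table stage of a gcc-style lexer looks roughly like this (a sketch; the keyword list here is only a sample, and a real compiler fills the table per language):

```cpp
#include <string>
#include <unordered_set>

// Point 1 in code: the lexer returns every word as a plain "word token",
// and a hash-table lookup decides afterwards whether it is a keyword.
// A fixed-keyword DFA folds this decision into the scanner itself and
// returns a type ID directly, skipping this lookup entirely.
bool isKeyword(const std::string& word) {
    static const std::unordered_set<std::string> keywords = {
        "if", "for", "while", "switch", "return", "class"  // sample only
    };
    return keywords.count(word) != 0;
}
```

The flexibility trade-off is visible here: inserting a new string into the set changes the language at run time, which a generated DFA cannot do.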


« Last Edit: May 01, 2011, 05:16:08 pm by ollydbg »