Topic: Codecompletion parser bug on treating comments

ollydbg
Codecompletion parser bug on treating comments
« on: March 09, 2009, 08:49:58 am »
I found that the parser stops at the comment line in the code below, so only two variables (int a and int b) are recognized.
Code
int a;
int b;

///remove "//" or "/*" blocks

int c;
int d;

int main()
{
    cout << "Hello world!" << endl;
    return 0;
}


thanks :D

« Last Edit: March 09, 2009, 12:08:06 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

MortenMacFly
Re: Codecompletion parser bug on treating comments
« Reply #1 on: March 09, 2009, 12:25:36 pm »
I found that the parser stopped parsing in the comment blocks, and only two variables( int a, int b) were recognized.
That is correct. C::B recognises the /* as the beginning of a multi-line comment which never ends.

If you find a *fast* way to determine whether you are inside a string or not, I will be pleased to apply it. I believe there is no way to handle such cases without a certain slowdown.

BTW: Are you aware that a compiler will issue a warning in such cases about a multi-line comment inside a comment? There is a good reason for that... ;-)
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Jenna
Re: Codecompletion parser bug on treating comments
« Reply #2 on: March 09, 2009, 05:13:07 pm »
I found that the parser stopped parsing in the comment blocks, and only two variables( int a, int b) were recognized.
That is correct. C::B recognises the /* as beginning of a multi-line comment which never ends.

Why do we search for nested braces or other comments while we are skipping to the end of the line inside a comment?

As far as I know, there are only two things that can end a C++-style comment: a newline or EOF.

In other words, what about just "eating" all chars until EOL or EOF?

A patch can look like this:
Code
Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp (Revision 5482)
+++ src/plugins/codecompletion/parser/tokenizer.cpp (Arbeitskopie)
@@ -279,18 +279,21 @@
     {
         while (NotEOF() && CurrentChar() != '\n')
         {
-            if (CurrentChar() == '/' && NextChar() == '*')
+            if(!skippingComment)
             {
-                SkipComment(false); // don't skip whitespace after the comment
-                if (skippingComment && CurrentChar() == '\n')
+                if (CurrentChar() == '/' && NextChar() == '*')
                 {
-                    continue; // early exit from the loop
+                    SkipComment(false); // don't skip whitespace after the comment
+                    if (skippingComment && CurrentChar() == '\n')
+                    {
+                        continue; // early exit from the loop
+                    }
                 }
+                if (nestBraces && CurrentChar() == _T('{'))
+                    ++m_NestLevel;
+                else if (nestBraces && CurrentChar() == _T('}'))
+                    --m_NestLevel;
             }
-            if (nestBraces && CurrentChar() == _T('{'))
-                ++m_NestLevel;
-            else if (nestBraces && CurrentChar() == _T('}'))
-                --m_NestLevel;
             MoveToNextChar();
         }
         wxChar last = PreviousChar();

The patch looks a little bit "unclear", but this is how TortoiseSVN handles changed indentation, at least on my XP machine.
In fact, I only added one if-clause with two braces.

If I have overlooked something or am totally missing the point, please correct me.
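
To show the idea outside of the tokenizer, here is a minimal standalone sketch. It is not the actual Code::Blocks code: it assumes a plain std::string buffer, and the names skipToEol, inLineComment and nestLevel are hypothetical stand-ins for SkipToEOL, skippingComment and m_NestLevel. In line-comment mode every character up to the newline is simply eaten; otherwise block comments are skipped and braces are counted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the tokenizer's skip-to-end-of-line step.
// Returns the index of the terminating '\n', or buf.size() at end of buffer.
std::size_t skipToEol(const std::string& buf, std::size_t pos,
                      bool inLineComment, int& nestLevel)
{
    while (pos < buf.size() && buf[pos] != '\n')
    {
        if (!inLineComment)
        {
            // Outside a "//" comment, "/*" really opens a block comment.
            if (buf[pos] == '/' && pos + 1 < buf.size() && buf[pos + 1] == '*')
            {
                const std::size_t end = buf.find("*/", pos + 2);
                if (end == std::string::npos)
                    return buf.size(); // unterminated block comment
                pos = end + 2;
                continue; // re-check the character after the comment
            }
            if (buf[pos] == '{')
                ++nestLevel;
            else if (buf[pos] == '}')
                --nestLevel;
        }
        ++pos; // inside a "//" comment every character is simply eaten
    }
    return pos;
}
```

Called on the comment line from the first post with inLineComment set to true, the sketch stops at the newline; with false it runs to the end of the buffer, which is the behaviour reported above.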

dje
Re: Codecompletion parser bug on treating comments
« Reply #3 on: March 09, 2009, 05:31:24 pm »
Shouldn't it be
Code
while (NotEOF() && CurrentChar() != '\n' && CurrentChar() != '\r')
for Mac users?
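
A small sketch of what handling all three line-ending conventions could look like. It works on a plain std::string rather than the tokenizer's buffer, and the helper names isEolChar and skipLine are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// True for either character that can start a line break.
inline bool isEolChar(char c)
{
    return c == '\n' || c == '\r';
}

// Advances past the rest of the current line and its line break.
// Handles "\n" (Unix), "\r\n" (Windows) and a lone "\r" (classic Mac OS).
// Returns the index of the first character of the next line.
std::size_t skipLine(const std::string& buf, std::size_t pos)
{
    while (pos < buf.size() && !isEolChar(buf[pos]))
        ++pos;

    if (pos < buf.size() && buf[pos] == '\r')
    {
        ++pos;
        if (pos < buf.size() && buf[pos] == '\n')
            ++pos; // "\r\n" counts as one line break
    }
    else if (pos < buf.size())
    {
        ++pos; // plain "\n"
    }
    return pos;
}
```

Treating "\r\n" as a single break matters if the caller counts lines; checking for '\r' alone in the loop condition would otherwise count Windows line endings twice.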

Dje

MortenMacFly
Re: Codecompletion parser bug on treating comments
« Reply #4 on: March 10, 2009, 07:03:36 am »
In other words, what about just "eating" all chars until EOL or EOF?
Nope - won't work. Consider this:
Code
void MyFun(bool myParam /* = true */, int MyOtherParam /* = 0 */)
{
  int a /* could be b */ = 1; /* probably 0 */
  int b; /* Descr:
           * Nice!
           */ return;
  string "hello
            world";
}
...unless I am missing something...
(Will try the patch though...)
« Last Edit: March 10, 2009, 07:06:15 am by MortenMacFly »

MortenMacFly
Re: Codecompletion parser bug on treating comments
« Reply #5 on: March 10, 2009, 07:08:38 am »
Something else that just came to my mind:
Why don't we "pre-process" the buffer before CC analyses it, removing comments completely? I mean: commented stuff is just useless for CC (unless we want to consider using Doxygen comments or the like), and processing the whole buffer could probably work with a "simple" RegEx?! In the end we would make a lot of comment-checking code obsolete.
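
A rough sketch of such a pre-pass. Note that it uses a small state machine instead of a RegEx, because comment markers inside string literals are hard to handle with a pattern alone. The function name stripComments is hypothetical, and raw string literals and backslash-newline continuations are deliberately ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-pass: returns a copy of src with comments removed.
// Newlines are kept so that line numbers still match the original buffer.
// Simplified: ignores raw string literals and backslash-newline continuations.
std::string stripComments(const std::string& src)
{
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());

    for (std::size_t i = 0; i < src.size(); ++i)
    {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch (state)
        {
        case State::Code:
            if (c == '/' && next == '/')      { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; out += ' '; }
            else
            {
                if (c == '"')       state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;
        case State::LineComment:
            // Only a newline ends a "//" comment; "/*" in here means nothing.
            if (c == '\n') { state = State::Code; out += c; }
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; }
            else if (c == '\n')          out += c;
            break;
        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && i + 1 < src.size()) out += src[++i]; // keep escaped char
            else if (c == (state == State::String ? '"' : '\'')) state = State::Code;
            break;
        }
    }
    return out;
}
```

Each block comment is replaced by a single space so that neighbouring tokens do not merge. The cost is one extra linear pass over the buffer, which is the slowdown that would have to be measured.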

ollydbg
Re: Codecompletion parser bug on treating comments
« Reply #6 on: March 10, 2009, 09:40:49 am »
I built a new CC with Jen's patch, and it solved my problem.

@MortenMacFly
I do think that some comments should be preserved, especially in function declarations. If we do a pre-process, then we will parse a source file twice, which will take more time :D.

I'm not sure why the default argument value was stripped; see the screenshot below. I suggest that the function tip should show "bool skipWhiteAtEnd = true".


[attachment deleted by admin]

Jenna
Re: Codecompletion parser bug on treating comments
« Reply #7 on: March 10, 2009, 10:41:49 am »
In other words, what about just "eating" all chars until EOL or EOF?
Nope - won't work. Consider this:
Code
void MyFun(bool myParam /* = true */, int MyOtherParam /* = 0 */)
{
  int a /* could be b */ = 1; /* probably 0 */
  int b; /* Descr:
           * Nice!
           */ return;
  string "hello
            world";
}
...unless I am missing something...
(Will try the patch though...)

It should still work.
We only call SkipToEOL with the second parameter, skippingComment, set to true if we are in a C++ comment ("//"), not inside a C-style comment ("/*").

MortenMacFly
Re: Codecompletion parser bug on treating comments
« Reply #8 on: March 10, 2009, 03:56:49 pm »
We only call SkipToEOL with second parameter skippingComment set to true, if we are in a c++-comment ("//"), and not inside a c-style comment ("/*").
...unless I am missing something...
:lol: :lol: :lol:

Ceniza
Re: Codecompletion parser bug on treating comments
« Reply #9 on: March 10, 2009, 07:08:22 pm »
... If we do a pre-process, then we will parse a source file twice, which will take more time :D.

Not true. It just divides the parsing into two stages. The preprocessing stage would usually return tokens. You do not need to do much to them to convert them into final tokens to feed a parser. The parser will just read the tag of each token, indicating whether it is a string, integer, identifier, keyword, etc. In other words, think of the preprocessor as a smart lexer. The current implementation, on the other hand, tries to do everything at the same time. Your claim would only be true if the preprocessor just generated a text file; then you would have to "tokenize" the whole thing once more.
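
As a rough illustration of what tagged tokens could look like (a sketch only: TokenKind, Token and lex are hypothetical names, not part of the existing Code::Blocks tokenizer, and std::string is used instead of wxString):

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, String, Punct };

struct Token
{
    TokenKind   kind;
    std::string text;
};

// Hypothetical single-pass lexer: skips whitespace and comments, tags the rest.
// Simplified: no keywords, no multi-character operators, no raw strings.
std::vector<Token> lex(const std::string& src)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    const std::size_t n = src.size();
    auto uc = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };

    while (i < n)
    {
        if (std::isspace(uc(i))) { ++i; continue; }

        if (src[i] == '/' && i + 1 < n && src[i + 1] == '/')
        {
            while (i < n && src[i] != '\n') ++i; // "//" runs to end of line
            continue;
        }
        if (src[i] == '/' && i + 1 < n && src[i + 1] == '*')
        {
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            continue;
        }

        const std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(uc(i)) || src[i] == '_')
        {
            while (i < n && (std::isalnum(uc(i)) || src[i] == '_')) ++i;
            kind = TokenKind::Identifier;
        }
        else if (std::isdigit(uc(i)))
        {
            while (i < n && std::isalnum(uc(i))) ++i; // also eats suffixes like 10u
            kind = TokenKind::Number;
        }
        else if (src[i] == '"')
        {
            ++i;
            while (i < n && src[i] != '"')
                i += (src[i] == '\\' && i + 1 < n) ? 2 : 1; // skip escapes
            if (i < n) ++i; // closing quote
            kind = TokenKind::String;
        }
        else
        {
            ++i;
            kind = TokenKind::Punct;
        }
        tokens.push_back({kind, src.substr(start, i - start)});
    }
    return tokens;
}
```

The source text is scanned only once. Comments are dropped while the tokens are produced, and the parser can switch on kind instead of inspecting the text of every token again.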

ollydbg
Re: Codecompletion parser bug on treating comments
« Reply #10 on: March 11, 2009, 12:55:58 am »
@Ceniza
I do agree with you on the idea of a "smart lexer".

The current implementation of the lexer is located in Tokenizer.h and Tokenizer.cpp. It just returns a simple wxString, with no tag information attached. So the parser (the syntax analyzer; I also use this term in the Code Completion design wiki page) does the whole work (most of the syntax analysis is done in parseThread.cpp and parseThread.h).

A "smart lexer" would feed the parser not just a simple wxString, but an object indicating whether it is a string, a number, or an identifier. Oh, I nearly forgot: the "smart lexer" would also do the preprocessing stage, which is also very important :D
« Last Edit: March 11, 2009, 12:57:34 am by ollydbg »

ollydbg
Re: Codecompletion parser bug on treating comments
« Reply #11 on: March 11, 2009, 01:13:30 am »
Something else that came into my mind just by now:
Why don't we kind of "pre-process" the buffer before CC analyses it in term of removing comments completely. I mean: Commented stuff is just useless for CC (unless we want to consider using Doxygen comments or alike) and probably operating the whole buffer could work with a "simple" RegEx?! In the end we would obsolete a lot of comment checking code.

Comment-checking code is not that difficult.
I checked the source code. Each time we call Tokenizer::DoGetToken(), it first tries to strip the comment:

Code
wxString Tokenizer::DoGetToken()
{
    if (IsEOF())
        return wxEmptyString;

    if (!SkipWhiteSpace())
        return wxEmptyString;

    if (m_SkipUnwantedTokens && !SkipUnwanted())       // ****************Here
        return wxEmptyString;

    // if m_SkipUnwantedTokens is false, we need to handle comments here too
    if (!m_SkipUnwantedTokens)
        SkipComment();                                 //*****************Here
    ........

If m_SkipUnwantedTokens is true (the normal situation), then SkipComment() is called inside the SkipUnwanted() function.

If m_SkipUnwantedTokens is false (a special situation, such as eating the arguments of a template), we must not call SkipUnwanted(), so SkipComment() is called manually.

With all the steps above, I think comments can be stripped quite well. :D Am I right? If I am wrong, please correct me. Thank you!

