Author Topic: Codecompletion parser bug on treating comments (Read 13402 times)

ollydbg · « **on:** March 09, 2009, 08:49:58 am »

I found that the parser stops parsing at the comment line in the code below, so only two variables (int a, int b) are recognized.

Code

int a;
int b;

///remove "//" or "/*" blocks

int c;
int d;

int main()
{
    cout << "Hello world!" << endl;
    return 0;
}

thanks

MortenMacFly · « **Reply #1 on:** March 09, 2009, 12:25:36 pm »

Quote from: ollydbg on March 09, 2009, 08:49:58 am

I found that the parser stopped parsing in the comment blocks, and only two variables( int a, int b) were recognized.

That is correct. C::B recognises the /* as beginning of a multi-line comment which never ends.

If you find a *fast* way to determine whether you are inside a string or not, I'd be pleased to apply it. I believe there is no way to handle this without a certain slowdown.

BTW: Are you aware that a compiler will issue a warning in such cases about a multi-line comment inside a comment? There is a good reason it does... ;-)

Jenna · « **Reply #2 on:** March 09, 2009, 05:13:07 pm »

Quote from: MortenMacFly on March 09, 2009, 12:25:36 pm

Quote from: ollydbg on March 09, 2009, 08:49:58 am
I found that the parser stopped parsing in the comment blocks, and only two variables( int a, int b) were recognized.
That is correct. C::B recognises the /* as beginning of a multi-line comment which never ends.

Why do we search for nested braces or other comments while we are skipping to EndOfLine inside a comment?

As far as I know, there are only two things that can end a C++-style comment: a newline or EOF.

In other words, what about just "eating" all chars until EOL or EOF?

A patch can look like this:

Code

Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp	(Revision 5482)
+++ src/plugins/codecompletion/parser/tokenizer.cpp	(Arbeitskopie)
@@ -279,18 +279,21 @@
     {
         while (NotEOF() && CurrentChar() != '\n')
         {
-            if (CurrentChar() == '/' && NextChar() == '*')
+            if(!skippingComment)
             {
-                SkipComment(false); // don't skip whitespace after the comment
-                if (skippingComment && CurrentChar() == '\n')
+                if (CurrentChar() == '/' && NextChar() == '*')
                 {
-                    continue; // early exit from the loop
+                    SkipComment(false); // don't skip whitespace after the comment
+                    if (skippingComment && CurrentChar() == '\n')
+                    {
+                        continue; // early exit from the loop
+                    }
                 }
+                if (nestBraces && CurrentChar() == _T('{'))
+                    ++m_NestLevel;
+                else if (nestBraces && CurrentChar() == _T('}'))
+                    --m_NestLevel;
             }
-            if (nestBraces && CurrentChar() == _T('{'))
-                ++m_NestLevel;
-            else if (nestBraces && CurrentChar() == _T('}'))
-                --m_NestLevel;
             MoveToNextChar();
         }
         wxChar last = PreviousChar();

The patch looks a little bit "unclear", but this is how TortoiseSVN handles changed indentation, at least on my XP.
In fact I only added one if-clause with two braces.

If I have overlooked something or am missing something entirely, please correct me.
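
The effect of the extra if-clause can be illustrated outside the plugin. What follows is only a minimal sketch working on a std::string, not the actual tokenizer code: the function name SkipToEOLSketch and its signature are assumptions made for illustration, and the brace counting from the patch is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the tokenizer's "skip to end of line" step.
// Returns the index of the '\n' that ends the line, or buf.size() at EOF.
// When skippingComment is true the rest of the line is consumed blindly,
// so a "/*" inside a "//" comment is never treated as a block comment.
std::size_t SkipToEOLSketch(const std::string& buf, std::size_t pos, bool skippingComment)
{
    assert(pos <= buf.size());
    while (pos < buf.size() && buf[pos] != '\n')
    {
        if (!skippingComment && buf[pos] == '/' &&
            pos + 1 < buf.size() && buf[pos + 1] == '*')
        {
            // Outside a line comment a block comment may span lines:
            // jump past the closing "*/".
            const std::size_t end = buf.find("*/", pos + 2);
            if (end == std::string::npos)
                return buf.size(); // unterminated block comment eats the rest
            pos = end + 2;
            continue;
        }
        ++pos;
    }
    return pos;
}
```

Called on the comment line from the first post with skippingComment set to true, the sketch stops at the newline. With the flag set to false it runs to the end of the buffer, which mirrors the reported symptom of everything after the comment being lost.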

dje · « **Reply #3 on:** March 09, 2009, 05:31:24 pm »

Shouldn't it be

Code

while (NotEOF() && CurrentChar() != '\n' && CurrentChar() != '\r')

for Mac users ?

Dje
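
A small sketch of what handling all three line-ending conventions could look like, assuming a plain std::string buffer. FindEOL and SkipEOL are hypothetical helper names and are not part of the tokenizer. A bare '\r' was the line terminator on classic Mac OS; whether the plugin ever sees such files is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first line terminator at or after pos,
// treating '\n', "\r\n" and a bare '\r' all as end of line.
std::size_t FindEOL(const std::string& buf, std::size_t pos)
{
    assert(pos <= buf.size());
    while (pos < buf.size() && buf[pos] != '\n' && buf[pos] != '\r')
        ++pos;
    return pos;
}

// Hypothetical helper: given the index returned by FindEOL, return the
// index of the first character of the next line.
std::size_t SkipEOL(const std::string& buf, std::size_t pos)
{
    assert(pos <= buf.size());
    if (pos < buf.size() && buf[pos] == '\r')
        ++pos; // bare CR, or the CR of a CRLF pair
    if (pos < buf.size() && buf[pos] == '\n')
        ++pos; // LF, or the LF of a CRLF pair
    return pos;
}
```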

MortenMacFly · « **Reply #4 on:** March 10, 2009, 07:03:36 am »

Quote from: jens on March 09, 2009, 05:13:07 pm

In other words, what about just "eating" all chars until EOL or EOF?

Nope - won't work. Consider this:

Code

void MyFun(bool myParam /* = true */, int MyOtherParam /* = 0 */)
{
  int a /* could be b */ = 1; /* probably 0 */
  int b; /* Descr:
           * Nice!
           */ return;
  string "hello
            world";
}

...unless I am missing something...
(Will try the patch though...)

MortenMacFly · « **Reply #5 on:** March 10, 2009, 07:08:38 am »

Something else that just came to my mind:
Why don't we kind of "pre-process" the buffer before CC analyses it, in terms of removing comments completely? I mean: commented stuff is just useless for CC (unless we want to consider using Doxygen comments or the like), and operating on the whole buffer could probably work with a "simple" RegEx?! In the end we would make a lot of comment-checking code obsolete.
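
A rough sketch of such a pre-pass, for illustration only. It uses a small state machine rather than the RegEx suggested above, because a plain regular expression has trouble telling a "/*" inside a string literal from a real comment. StripComments is a hypothetical name. The sketch ignores raw string literals, trigraphs and backslash line continuations, and it keeps newlines so that line numbers stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical pre-pass: returns a copy of src with comments removed.
// A block comment becomes one space (plus any newlines it contained);
// a line comment is dropped up to, but not including, its newline.
std::string StripComments(const std::string& src)
{
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    std::string out;
    out.reserve(src.size());

    for (std::size_t i = 0; i < src.size(); ++i)
    {
        const char c = src[i];
        const char next = (i + 1 < src.size()) ? src[i + 1] : '\0';

        switch (state)
        {
        case State::Code:
            if (c == '/' && next == '/')      { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; out += ' '; ++i; }
            else
            {
                if (c == '"')       state = State::String;
                else if (c == '\'') state = State::Char;
                out += c;
            }
            break;

        case State::LineComment:
            if (c == '\n') { state = State::Code; out += c; }
            break;

        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; }
            else if (c == '\n')          out += c; // preserve line numbering
            break;

        case State::String:
        case State::Char:
            out += c;
            if (c == '\\' && next != '\0') { out += next; ++i; } // escaped char
            else if ((state == State::String && c == '"') ||
                     (state == State::Char && c == '\''))
                state = State::Code;
            break;
        }
    }
    assert(out.size() <= src.size()); // stripping never grows the buffer
    return out;
}
```

Because the result is still plain text, the tokenizer would have to scan it again, which is the cost raised later in the thread.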

ollydbg · « **Reply #6 on:** March 10, 2009, 09:40:49 am »

I built a new CC with Jens's patch, and it solved my problem.

@MortenMacFly
I do think that some comments should be kept, especially in function declarations. If we pre-process, then we will parse a source file twice, which will take more time.

I'm not sure why the default argument value was stripped (see the screenshot below). I suggest that the function tip should show "bool skipWhiteAtEnd = true".

[attachment deleted by admin]

Jenna · « **Reply #7 on:** March 10, 2009, 10:41:49 am »

Quote from: MortenMacFly on March 10, 2009, 07:03:36 am

Quote from: jens on March 09, 2009, 05:13:07 pm
In other words, what about just "eating" all chars until EOL or EOF?
Nope - won't work. Consider this:
Code
void MyFun(bool myParam /* = true */, int MyOtherParam /* = 0 */)
{
  int a /* could be b */ = 1; /* probably 0 */
  int b; /* Descr:
           * Nice!
           */ return;
  string "hello
            world";
}
...unless I am missing something...
(Will try the patch though...)

It should still work.
We only call SkipToEOL with the second parameter, skippingComment, set to true if we are in a C++-style comment ("//"), not inside a C-style comment ("/*").

MortenMacFly · « **Reply #8 on:** March 10, 2009, 03:56:49 pm »

Quote from: jens on March 10, 2009, 10:41:49 am

We only call SkipToEOL with second parameter skippingComment set to true, if we are in a c++-comment ("//"), and not inside a c-style comment ("/*").

Quote from: MortenMacFly on March 10, 2009, 07:03:36 am

...unless I am missing something...

:lol: :lol: :lol:

Ceniza · « **Reply #9 on:** March 10, 2009, 07:08:22 pm »

Quote from: ollydbg on March 10, 2009, 09:40:49 am

... If we do a pre-process, then we will parse a source file twice, which will take more time .

Not true. It just divides the parsing into two stages. The preprocessing stage would usually return tokens. You do not need to do much on them to convert them into final tokens to feed a parser. The parser will just read the tag of the tokens indicating if they are strings, integers, identifiers, keywords, etc. In other words, think of the preprocessor as a smart lexer. The current implementation, on the other hand, tries to do everything at the same time. It would be true if the preprocessor just generated a text file. Then, you would have to "tokenize" the whole thing once more.

ollydbg · « **Reply #10 on:** March 11, 2009, 12:55:58 am »

@Ceniza
I do agree with you on the idea of a "smart lexer".

The current lexer is implemented in Tokenizer.h and Tokenizer.cpp. It just returns a simple wxString with no tag information attached, so the parser (the syntax analyzer; I also use this term in the wiki page on Code Completion design) has to do all the work. Most of the syntax analysis is done in parseThread.cpp and parseThread.h.

A "smart lexer" would feed the parser not just a simple wxString but an object indicating whether the token is a string, a number, an identifier, and so on. Oh, I nearly forgot: the "smart lexer" would also do the preprocessing stage, which is also very important.
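
To make the "smart lexer" idea concrete, here is a minimal sketch of a lexer that returns tagged tokens and drops comments in the same pass. It uses std::string instead of wxString, and TokenKind, Token and Lex are hypothetical names that do not exist in the plugin. Keywords are not distinguished from identifiers, and only integer literals are recognised as numbers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token tags; the real Tokenizer returns a plain wxString.
enum class TokenKind { Identifier, Number, String, Symbol };

struct Token
{
    TokenKind   kind;
    std::string text;
};

// Hypothetical "smart lexer": skips whitespace and comments, tags the rest.
std::vector<Token> Lex(const std::string& src)
{
    std::vector<Token> tokens;
    const std::size_t n = src.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }

        if (c == '/' && i + 1 < n && src[i + 1] == '/')
        {
            while (i < n && src[i] != '\n') ++i; // line comment: eat to EOL
            continue;
        }
        if (c == '/' && i + 1 < n && src[i + 1] == '*')
        {
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            continue;
        }

        const std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(c) || c == '_')
        {
            kind = TokenKind::Identifier;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
        }
        else if (std::isdigit(c))
        {
            kind = TokenKind::Number;
            while (i < n && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
        }
        else if (c == '"')
        {
            kind = TokenKind::String;
            ++i;
            while (i < n && src[i] != '"')
                i += (src[i] == '\\' && i + 1 < n) ? 2 : 1; // skip escapes
            if (i < n) ++i; // closing quote
        }
        else
        {
            kind = TokenKind::Symbol;
            ++i;
        }
        assert(i > start); // every iteration consumes at least one character
        tokens.push_back({kind, src.substr(start, i - start)});
    }
    return tokens;
}
```

A parser built on this would read the kind field instead of re-inspecting the text of each token.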

ollydbg · « **Reply #11 on:** March 11, 2009, 01:13:30 am »

Quote from: MortenMacFly on March 10, 2009, 07:08:38 am

Something else that came into my mind just by now:
Why don't we kind of "pre-process" the buffer before CC analyses it in term of removing comments completely. I mean: Commented stuff is just useless for CC (unless we want to consider using Doxygen comments or alike) and probably operating the whole buffer could work with a "simple" RegEx?! In the end we would obsolete a lot of comment checking code.

The comment-checking code is not that difficult.
I checked the source code. Each time we call Tokenizer::DoGetToken(), it first tries to strip the comment:

Code

wxString Tokenizer::DoGetToken()
{
    if (IsEOF())
        return wxEmptyString;

    if (!SkipWhiteSpace())
        return wxEmptyString;

    if (m_SkipUnwantedTokens && !SkipUnwanted())       // ****************Here
        return wxEmptyString;

    // if m_SkipUnwantedTokens is false, we need to handle comments here too
    if (!m_SkipUnwantedTokens)
        SkipComment();                                 //*****************Here
    ........

If m_SkipUnwantedTokens is true (the normal situation), SkipComment() is called inside the SkipUnwanted() function.

If m_SkipUnwantedTokens is false (a special situation, such as eating the arguments of a template, where we shouldn't call SkipUnwanted()), SkipComment() is called explicitly.

With all these steps, I think comments are stripped quite well.

Am I right? If wrong, please correct me. Thank you!
