Author Topic: Several improvements to Code Completion plugin (Read 204895 times)

ollydbg · « **Reply #30 on:** September 17, 2013, 03:17:42 am »

Quote from: ollydbg on September 16, 2013, 06:09:39 pm

...
I think I need to update the comments in either HandleDefines and ReadToEOL function now.

When reading the function ReadToEOL, I see it first allocate a wxChar buffer, and then Append to a wxString, firstly I concern this code compile under wx 2.9.x, because wx 2.9.x use native internal buffers for different OS (under Linux, it use UTF8, under Windows, it use wchar_t), luckily, It don't have such issue, the reason is below:

Quote

wxChar is defined to be
char when wxUSE_UNICODE==0
wchar_t when wxUSE_UNICODE==1 (the default).

Also, wxString have a member function:

Quote

wxString & Append (const wchar_t *pwz, size_t nLen)
Appends the wide string literal psz with max length nLen.

ollydbg · « **Reply #31 on:** September 17, 2013, 05:25:07 am »

Quote from: ollydbg on September 16, 2013, 06:09:39 pm

I think I need to update the comments in either HandleDefines and ReadToEOL function now.

Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

Code

From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
 like macro definition and variable like definition

---
 src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
 src/plugins/codecompletion/parser/tokenizer.h   |  3 +++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
         wxChar* p = buffer;
         wxString str;
 
+        // loop all the physical lines in reading macro definition
         for (;;)
         {
+            // this while statement end up in a physical EOL '\n'
             while (NotEOF() && CurrentChar() != _T('\n'))
             {
                 while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 if (ch == _T('\n'))
                     break;
 
-                if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+                // if we see two spaces in the buffer, we should drop the second one. Note, if the
+                // first char is space, we should always save it to buffer, this is to distinguish
+                // a function/variable like macro definition, e.g.
+                // #define MYMACRO(A)  ...   -> function like macro definition
+                // #define MYMACRO (A)  ...  -> variable like macro definition, note a space before '('
+                if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
                 {
                     MoveToNextChar();
                     continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 MoveToNextChar();
             }
 
+            // check to see it is a logical EOL, some long macro definition contains a backslash-newline
             if (!IsBackslashBeforeEOL() || IsEOF())
-                break;
+                break; //break the outer for loop
             else
             {
+                //remove the backslash-newline and goto next physical line
                 while (p > buffer && *(--p) <= _T(' '))
                     ;
                 MoveToNextChar();
             }
         }
-
+        // remove the extra spaces in the end of buffer
         while (p > buffer && *(p - 1) <= _T(' '))
             --p;
 
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:
 
     /** return the string from the current position to the end of current line, in most case, this
      * function is used in handling #define, use with care outside this class!
+     * @param nestBraces true if you still need to count the '{' and '}' levels
+     * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+     * more spaces should become one space)
      */
     wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);
 
-- 
1.8.4.msysgit.0

PS: git is vary good at split commit.

EDIT: Here is the test case
Parsing code:

Code

#define FUNCTION_LIKE_DEFINE(A) (A+B)
#define VARIABLE_LIKE_DEFINE (A) (A+B)

Parsing log:

Code

000001. --------------M-a-i-n--L-o-g--------------


000002. -----------I-n-t-e-r-i-m--L-o-g-----------
000003. InitTokenizer() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h', m_FileSize=85.
000004. Init() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000005. C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h
000006. Parse() : Parsing 'C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000007. DoParse() : Loop:m_Str='', token='#'
000008. wxString Tokenizer::ReadToEOL(bool, bool) : line=2, CurrentChar='(', PreviousChar='E', NextChar='A', nestBrace(0)
000009. ReadToEOL(): (END) We are now at line 2, CurrentChar='\n', PreviousChar='\r', NextChar='#'
000010. ReadToEOL(): (A) (A+B)
000011. DoAddToken() : Created token='FUNCTION_LIKE_DEFINE', file_idx=1, line=2, ticket=258
000012. GetTokenBaseType() : Searching within m_Str='(A+B)'
000013. GetTokenBaseType() : Compensated m_Str='(A+B)'
000014. GetTokenBaseType() : Found ''
000015. DoAddToken() : Prepending ''
000016. DoAddToken() : Added/updated token 'FUNCTION_LIKE_DEFINE' (0), kind 'preprocessor', type '(A+B)', actual ''. Parent is  (-1)
000017. DoParse() : Loop:m_Str='', token='#'
000018. wxString Tokenizer::ReadToEOL(bool, bool) : line=3, CurrentChar=' ', PreviousChar='E', NextChar='(', nestBrace(0)
000019. ReadToEOL(): (END) We are now at line 3, CurrentChar='\n', PreviousChar='\r', NextChar='\r'
000020. ReadToEOL():  (A) (A+B)
000021. DoAddToken() : Created token='VARIABLE_LIKE_DEFINE', file_idx=1, line=3, ticket=259
000022. GetTokenBaseType() : Searching within m_Str=' (A) (A+B)'
000023. GetTokenBaseType() : Compensated m_Str=' (A) (A+B)'
000024. GetTokenBaseType() : Found ''
000025. DoAddToken() : Prepending ''
000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is  (-1)
********************************************************

Looks good to me.

ollydbg · « **Reply #32 on:** September 17, 2013, 07:22:13 am »

About this code snippet:

Code

        // Here, we are in the comment body
        while (true)
        {
            if (cstyle) // C style comment
            {
                //FIX(huki), Moved this from below to avoid taking the same '*' for comment begin and end
                // eg, /*// ... */
                if (!MoveToNextChar())
                    break;

                SkipToChar('/');
                if (PreviousChar() == '*') // end of a C style comment
                {
                    MoveToNextChar();
                    break;
                }
            }
            else        // C++ style comment
            {
                TRACE(_T("SkipComment() : Need to call SkipToInlineCommentEnd() here at line = %u"), m_LineNumber);
                SkipToInlineCommentEnd();
                break;
            }
        }

When begin, we have the index:

Code

/*// ... */
  ^

If you run MoveToNextChar() first, you are now deliberately skip one character, and you are here in:

Code

/*// ... */
   ^

Although this code did work correctly, but I think we can have a better method.
Method1: SkipToChar('*'); and see whether there is a '/' after '*'
Method2: if (PreviousChar() == '*' && CheckWeReallyMovedAfterSkipToChar)

I think method1 may be better, what's your opinion.

ollydbg · « **Reply #33 on:** September 17, 2013, 03:19:56 pm »

I do not understand what this code snippet trying to solve, can you explain?

Code

wxString Tokenizer::PeekToken()
{
    if (!m_PeekAvailable)
    {
        m_PeekAvailable = true;

        unsigned int savedTokenIndex = m_TokenIndex;
        unsigned int savedLineNumber = m_LineNumber;
        unsigned int savedNestLevel  = m_NestLevel;

        if (SkipUnwanted())
            m_PeekToken = DoGetToken();
        else
            m_PeekToken.Clear();

        m_PeekTokenIndex             = m_TokenIndex;
        m_PeekLineNumber             = m_LineNumber;
        m_PeekNestLevel              = m_NestLevel;

        //FIX(huki), check whether m_TokenIndex has decreased which implies a ReplaceBufferForReparse() was done.
        // We can also check for change in m_IsReplaceParsing and m_RepeatReplaceCount before and after DoGetToken().
        if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex)
            savedTokenIndex = m_TokenIndex;

        m_TokenIndex                 = savedTokenIndex;
        m_LineNumber                 = savedLineNumber;
        m_NestLevel                  = savedNestLevel;
    }

    return m_PeekToken;
}

I have try to let the cctest project parsing a simple buffer:

Code

/********************************/
/********************************/
/********************************/
/********************************/

_STD_BEGIN

int a;
int b;

_STD_END

With the replacement rule:

Code

    Tokenizer::SetReplacementString(_T("_STD_BEGIN"),                       _T("namespace std {"));
    Tokenizer::SetReplacementString(_T("_STD_END"),                         _T("}"));

But I don't see the if(m_IsReplaceParsing && savedTokenIndex > m_TokenIndex) condition is true.

Huki · « **Reply #34 on:** September 17, 2013, 11:01:45 pm »

Quote from: ollydbg on September 17, 2013, 05:25:07 am

Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

Code

From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
 like macro definition and variable like definition

---
 src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
 src/plugins/codecompletion/parser/tokenizer.h   |  3 +++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
         wxChar* p = buffer;
         wxString str;
 
+        // loop all the physical lines in reading macro definition
         for (;;)
         {
+            // this while statement end up in a physical EOL '\n'
             while (NotEOF() && CurrentChar() != _T('\n'))
             {
                 while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 if (ch == _T('\n'))
                     break;
 
-                if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+                // if we see two spaces in the buffer, we should drop the second one. Note, if the
+                // first char is space, we should always save it to buffer, this is to distinguish
+                // a function/variable like macro definition, e.g.
+                // #define MYMACRO(A)  ...   -> function like macro definition
+                // #define MYMACRO (A)  ...  -> variable like macro definition, note a space before '('
+                if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
                 {
                     MoveToNextChar();
                     continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 MoveToNextChar();
             }
 
+            // check to see it is a logical EOL, some long macro definition contains a backslash-newline
             if (!IsBackslashBeforeEOL() || IsEOF())
-                break;
+                break; //break the outer for loop
             else
             {
+                //remove the backslash-newline and goto next physical line
                 while (p > buffer && *(--p) <= _T(' '))
                     ;
                 MoveToNextChar();
             }
         }
-
+        // remove the extra spaces in the end of buffer
         while (p > buffer && *(p - 1) <= _T(' '))
             --p;
 
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:
 
     /** return the string from the current position to the end of current line, in most case, this
      * function is used in handling #define, use with care outside this class!
+     * @param nestBraces true if you still need to count the '{' and '}' levels
+     * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+     * more spaces should become one space)
      */
     wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);
 
-- 
1.8.4.msysgit.0

Yes, that looks good.

About your test case:

Code

000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is  (-1)

You can see here type ' (A) (A+B)'. Which means the leading space is kept for "type". This problem will be addressed when you apply my parserthread.cpp patches (cc_parser_general.patch file). Here is the relevant part:

Code: in doaddtoken()

Index: src/plugins/codecompletion/parser/parserthread.cpp
===================================================================
--- src/plugins/codecompletion/parser/parserthread.cpp	(revision 9271)
+++ src/plugins/codecompletion/parser/parserthread.cpp	(working copy)
@@ -1294,7 +1345,7 @@
 
     Token* newToken = 0;
     wxString newname(name);
-    m_Str.Trim();
+    m_Str.Trim(true).Trim(false);
     if (kind == tkDestructor)
     {
         // special class destructors case

For this subject, I suggest committing your ReadToEOL() patch and this DoAddToken() patch above.

Huki · « **Reply #35 on:** September 17, 2013, 11:18:04 pm »

Quote from: ollydbg on September 17, 2013, 07:22:13 am

When begin, we have the index:
Code
/*// ... */
  ^
If you run MoveToNextChar() first, you are now deliberately skip one character, and you are here in:
Code
/*// ... */
   ^
Although this code did work correctly, but I think we can have a better method.
Method1: SkipToChar('*'); and see whether there is a '/' after '*'
Method2: if (PreviousChar() == '*' && CheckWeReallyMovedAfterSkipToChar)

I think method1 may be better, what's your opinion.

I don't think we need to change the method. A comment block always starts and ends with two chars, so deliberately skipping one char is a safe "shortcut" IMO. In the extreme case we will have:
Begin:

Code

/**/
  ^

After running MoveToNextChar() first:

Code

/**/
   ^

So it's going to be ok.
But if we really need more readable code, I suggest Method1 too.

Huki · « **Reply #36 on:** September 17, 2013, 11:35:55 pm »

Quote from: ollydbg on September 17, 2013, 03:19:56 pm

I do not understand what this code snippet trying to solve, can you explain?
Code
        //FIX(huki), check whether m_TokenIndex has decreased which implies a ReplaceBufferForReparse() was done.
        // We can also check for change in m_IsReplaceParsing and m_RepeatReplaceCount before and after DoGetToken().
        if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex)
            savedTokenIndex = m_TokenIndex;
I have try to let the cctest project parsing a simple buffer:
With the replacement rule:
But I don't see the if(m_IsReplaceParsing && savedTokenIndex > m_TokenIndex) condition is true.

But with your test buffer, the replacement doesn't happen within PeekToken(). So of course, my fix won't be reached.
If we look at how PeekToken() works, we do the following:

Code

        // 1) First we save existing token index:
        unsigned int savedTokenIndex = m_TokenIndex;
        unsigned int savedLineNumber = m_LineNumber;
        unsigned int savedNestLevel  = m_NestLevel;

        // 2) Then we get new token. Within DoGetToken(), a ReplaceBufferForReparse() may occur, 
        //    either to expand an actual macro, or because of user replacement token. 
        //    This will rewind m_TokenIndex, so the savedTokenIndex above will become incorrect.
        if (SkipUnwanted())
            m_PeekToken = DoGetToken();
        else
            m_PeekToken.Clear();

        // 3) Restore saved index. Because of above mentioned issue, this could end up restoring m_TokenIndex to wrong value. 
        //    This will cause bad problems if UngetToken() is called after this PeekToken().
        m_TokenIndex                 = savedTokenIndex;
        m_LineNumber                 = savedLineNumber;
        m_NestLevel                  = savedNestLevel;

That is a PeekToken() specific fix. In general, we also need to reset m_UndoTokenIndex after a ReplaceBufferForReparse(). This is also done in the cc_fix_tokenizer.patch.

Code

Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp	(revision 9271)
+++ src/plugins/codecompletion/parser/tokenizer.cpp	(working copy)
@@ -1817,6 +1865,7 @@
 
     // Fix token index
     m_TokenIndex -= bufferLen;
+    m_UndoTokenIndex = m_TokenIndex;    //ADD(huki)
 
     // Update the peek token
     if (m_PeekAvailable && updatePeekToken)

oBFusCATed · « **Reply #37 on:** September 18, 2013, 01:02:25 am »

@huki: Can I ask you to provide a patch which adds you to the "help -> about -> thanks to" dialog?

ollydbg · « **Reply #38 on:** September 18, 2013, 04:10:40 am »

Quote from: Huki on September 17, 2013, 11:18:04 pm

...
So it's going to be ok.
But if we really need more readable code, I suggest Method1 too.

Thanks for the comments on this, I have commit to SVN trunk now with "Method1".

Quote from: Huki on September 17, 2013, 11:01:45 pm

...
For this subject, I suggest committing your ReadToEOL() patch and this DoAddToken() patch above.

This is also committed to SVN trunk now.

These are the commits I have done. (My time is UTC+8 based)

Quote

Revision: 9343
Author: ollydbg
Date: 2013-9-18 9:59:45
Message:
* CC: when reading the parentheses, set the correct Tokenizer mode(tsReadRawExpression), also skip comments after the first '(' (thanks Huki)
-------------------------------
M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.cpp

Revision: 9342
Author: ollydbg
Date: 2013-9-18 9:59:07
Message:
* CC: only skip after single equal sign, not double equals sign (thanks Huki)
-------------------------------
M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.cpp

Revision: 9341
Author: ollydbg
Date: 2013-9-18 9:58:29
Message:
* CC: correctly find the end of a c style comment (thanks Huki)
-------------------------------
M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.cpp

Revision: 9340
Author: ollydbg
Date: 2013-9-18 9:57:51
Message:
* CC: add missing breaks in switch case statements (thanks Huki)
-------------------------------
M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.cpp

Revision: 9339
Author: ollydbg
Date: 2013-9-18 9:57:11
Message:
* CC: handle macro definition correctly, distinguish between function like macro definition and variable like definition (thanks Huki)
-------------------------------
M(T ) : /trunk/src/plugins/codecompletion/parser/parserthread.cpp

M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.cpp

M(T ) : /trunk/src/plugins/codecompletion/parser/tokenizer.h

Huki · « **Reply #39 on:** September 19, 2013, 12:52:22 am »

@ollydbg: Thanks! I see one fix related to '==' sign hasn't made it through, in ReadParantheses(). See the code below with my comment.

Code

        case _T('='):
            {
                if (*(p - 1) <= _T(' '))
                {
                    *p = _T('=');
                    // Don't add a space after '=' sign, in case another '=' follows it
                    // (see how the 'else' block below works).
                    //*++p = _T(' ');
                    ++p;
                }
                else
                {
                    switch (*(p - 1))
                    {
                    case _T('='):
                    case _T('!'):
                    case _T('>'):
                    case _T('<'):
                        {
                            *p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;

                    default:
                        {
                            *p = _T(' ');
                            *++p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;
                    }
                }
            }
            break;

        [...]

        // '<', '>' and '!' are handled here.
        default:
            {
                *p = ch;
                ++p;
            }
            break;

And here is the patch:

Code: fixes space between '=='

Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp	(revision 9271)
+++ src/plugins/codecompletion/parser/tokenizer.cpp	(working copy)
@@ -713,7 +718,6 @@
                 if (*(p - 1) <= _T(' '))
                 {
                     *p = _T('=');
-                    *++p = _T(' ');
                     ++p;
                 }
                 else

Huki · « **Reply #40 on:** September 19, 2013, 12:53:20 am »

Quote from: oBFusCATed on September 18, 2013, 01:02:25 am

@huki: Can I ask you to provide a patch which adds you to the "help -> about -> thanks to" dialog?

Sure, here you go:

Code

Index: src/src/dlgabout.cpp
===================================================================
--- src/src/dlgabout.cpp	(revision 9271)
+++ src/src/dlgabout.cpp	(working copy)
@@ -139,6 +139,7 @@
         "Sylvain Prat        : Initial MSVC workspace and project importers\n"
         "Chris Raschko       : Design of the 3D logo for Code::Blocks\n"
         "J.A. Ortega         : 3D Icon based on the above\n"
+        "Huki                : Patches for Code Completion plugin\n"
         "\n"
         "All contributors that provided patches.\n"
         "The wxWidgets project (http://www.wxwidgets.org).\n"

ollydbg · « **Reply #41 on:** September 20, 2013, 02:50:04 pm »

Quote from: Huki on September 19, 2013, 12:52:22 am

@ollydbg: Thanks! I see one fix related to '==' sign hasn't made it through, in ReadParantheses(). See the code below with my comment.

Code

        case _T('='):
            {
                if (*(p - 1) <= _T(' '))
                {
                    *p = _T('=');
                    // Don't add a space after '=' sign, in case another '=' follows it
                    // (see how the 'else' block below works).
                    //*++p = _T(' ');
                    ++p;
                }
                else
                {
                    switch (*(p - 1))
                    {
                    case _T('='):
                    case _T('!'):
                    case _T('>'):
                    case _T('<'):
                        {
                            *p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;

                    default:
                        {
                            *p = _T(' ');
                            *++p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;
                    }
                }
            }
            break;

        [...]

        // '<', '>' and '!' are handled here.
        default:
            {
                *p = ch;
                ++p;
            }
            break;

And here is the patch:

Code: fixes space between '=='

Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp	(revision 9271)
+++ src/plugins/codecompletion/parser/tokenizer.cpp	(working copy)
@@ -713,7 +718,6 @@
                 if (*(p - 1) <= _T(' '))
                 {
                     *p = _T('=');
-                    *++p = _T(' ');
                     ++p;
                 }
                 else

FYI: Morten has applied this patch (rev9351), thanks.

Huki · « **Reply #42 on:** September 21, 2013, 09:29:23 pm »

Ok.. about the issue with ReplaceBufferForReparse(), I think it will be more clear with an example.
The goal is to get UngetToken() to work reliably when macro expansions are involved.

As an example, we have this code:

Code

#define MYMACRO X=0 /*Alternatively, user might have a replacement rule for MYMACRO*/
int X;
MYMACRO;
X = 1;

^ denotes the token index.
_ underlines the current token string.
EOL char is ignored for our example.

CASE I: when ReplaceBufferForReparse() is called in GetToken() / used directly before calling GetToken().
GetToken() Begin:

Code

            int X;MYMACRO;X = 1;
Token:           _^
 Undo:           ^

Step 1: Update undo token and get next token.

Code

            int X;MYMACRO;X = 1;
Token:            _______^
 Undo:            ^

Step 2: ReplaceBufferForReparse(): Replace the macro and reset token index.
BUG HERE: Undo token is still pointing to bad location!

Code

            int X;MYMAX=0;X = 1;
Token:                ^
 Undo:            ^

FIX: Move the undo index to correct location.

Code

            int X;MYMAX=0;X = 1;
Token:                ^
 Undo:                ^

Step 3: Get the next token now after replace.

Code

            int X;MYMAX=0;X = 1;
Token:                _^
 Undo:                ^

CASE II: when ReplaceBufferForReparse() is called in PeekToken(), we have an additional problem.
PeekToken() Begin:
Step 1: Save the token index.

Code

            int X;MYMACRO;X = 1;
Token:           _^
 Save:            ^

Step 2: DoGetToken():
- Get next token.

Code

            int X;MYMACRO;X = 1;
Token:            _______^
 Save:            ^

- Replace the macro and reset token index.

Code

            int X;MYMAX=0;X = 1;
Token:                ^
 Save:            ^

- Get the next token now after replace.

Code

            int X;MYMAX=0;X = 1;
Token:                _^
 Save:            ^

Step 3: Assign Token to Peek, then restore Token from Save.
BUG HERE: Save is still pointing to bad location!

Code

            int X;MYMAX=0;X = 1;
 Peek:                _^
Token:           _^

FIX: Reset the token index to correct location.

Code

            int X;MYMAX=0;X = 1;
 Peek:                _^
Token:           _    ^

Here is an updated patch with fixes for both cases.

Code: cc: reliable working of ungettoken() when macro expansion is involved

Index: src/plugins/codecompletion/parser/tokenizer.cpp
===================================================================
--- src/plugins/codecompletion/parser/tokenizer.cpp	(revision 9271)
+++ src/plugins/codecompletion/parser/tokenizer.cpp	(working copy)
@@ -1135,6 +1146,8 @@
         unsigned int savedLineNumber = m_LineNumber;
         unsigned int savedNestLevel  = m_NestLevel;
 
+        int savedReplaceCount = m_IsReplaceParsing ? m_RepeatReplaceCount : -1;
+
         if (SkipUnwanted())
             m_PeekToken = DoGetToken();
         else
@@ -1144,9 +1157,20 @@
         m_PeekLineNumber             = m_LineNumber;
         m_PeekNestLevel              = m_NestLevel;
 
-        m_TokenIndex                 = savedTokenIndex;
-        m_LineNumber                 = savedLineNumber;
-        m_NestLevel                  = savedNestLevel;
+        // Check whether a ReplaceBufferForReparse() was done in DoGetToken().
+        // We assume m_Undo... have already been reset in ReplaceBufferForReparse().
+        if (m_IsReplaceParsing && savedReplaceCount != (int)m_RepeatReplaceCount)
+        {
+            m_TokenIndex             = m_UndoTokenIndex;
+            m_LineNumber             = m_UndoLineNumber;
+            m_NestLevel              = m_UndoNestLevel;
+        }
+        else
+        {
+            m_TokenIndex             = savedTokenIndex;
+            m_LineNumber             = savedLineNumber;
+            m_NestLevel              = savedNestLevel;
+        }
     }
 
     return m_PeekToken;
@@ -1818,6 +1871,11 @@
     // Fix token index
     m_TokenIndex -= bufferLen;
 
+    // Fix undo position
+    m_UndoTokenIndex = m_TokenIndex;
+    m_UndoLineNumber = m_LineNumber;
+    m_UndoNestLevel = m_NestLevel;
+
     // Update the peek token
     if (m_PeekAvailable && updatePeekToken)
     {

ollydbg · « **Reply #43 on:** September 28, 2013, 05:21:14 pm »

Quote from: Huki on September 21, 2013, 09:29:23 pm

Ok.. about the issue with ReplaceBufferForReparse(), I think it will be more clear with an example.
The goal is to get UngetToken() to work reliably when macro expansions are involved.
...

Hi, thanks for the explanation, your patch is OK from my review.

Besides that, I think an improved UngetToken() can be:

Code

/* peek is always available when we run UngetToken() once, actually the m_TokenIndex is moved
 * backward one step. Note that the m_UndoTokenIndex value is not updated in this function, which
 * means if you are not allowed to run this function twice.
 */
void Tokenizer::UngetToken()
{
    if(m_TokenIndex == m_UndoTokenIndex) //this means we have already run a UngetToken() before.
        return;

    m_PeekTokenIndex = m_TokenIndex;
    m_PeekLineNumber = m_LineNumber;
    m_PeekNestLevel  = m_NestLevel;
    m_TokenIndex     = m_UndoTokenIndex;
    m_LineNumber     = m_UndoLineNumber;
    m_NestLevel      = m_UndoNestLevel;
    m_PeekToken      = m_Token;
    m_PeekAvailable  = true;
}

We only allow the user run UngetToken() once, not twice, right?

ollydbg · « **Reply #44 on:** September 29, 2013, 10:39:27 am »

@Huki, I commit a modified patch: Code: (CC: Reliable working of UngetToken() when macro expansion is involved) in rev9369. Thanks. There are still some patches left for testing, no hurry.

Code::Blocks Forums

News:

Author Topic: Several improvements to Code Completion plugin (Read 204895 times)

ollydbg

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

oBFusCATed

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

Huki

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin

ollydbg

Re: Several improvements to Code Completion plugin