Developer forums (C::B DEVELOPMENT STRICTLY!) > Development

Several improvements to Code Completion plugin

<< < (7/29) > >>

ollydbg:

--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---...
I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)

--- End quote ---
When reading the function ReadToEOL, I see it first allocate a wxChar buffer, and then Append to a wxString, firstly I concern this code compile under wx 2.9.x, because wx 2.9.x use native internal buffers for different OS (under Linux, it use UTF8, under Windows, it use wchar_t), luckily, It don't have such issue, the reason is below:


--- Quote ---wxChar is defined to be
char when wxUSE_UNICODE==0
wchar_t when wxUSE_UNICODE==1 (the default).

--- End quote ---

Also, wxString have a member function:

--- Quote ---wxString &  Append (const wchar_t *pwz, size_t nLen)
  Appends the wide string literal psz with max length nLen.

--- End quote ---

ollydbg:

--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)

--- End quote ---
Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

--- Code: ---From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
 like macro definition and variable like definition

---
 src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
 src/plugins/codecompletion/parser/tokenizer.h   |  3 +++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
         wxChar* p = buffer;
         wxString str;
 
+        // loop all the physical lines in reading macro definition
         for (;;)
         {
+            // this while statement end up in a physical EOL '\n'
             while (NotEOF() && CurrentChar() != _T('\n'))
             {
                 while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 if (ch == _T('\n'))
                     break;
 
-                if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+                // if we see two spaces in the buffer, we should drop the second one. Note, if the
+                // first char is space, we should always save it to buffer, this is to distinguish
+                // a function/variable like macro definition, e.g.
+                // #define MYMACRO(A)  ...   -> function like macro definition
+                // #define MYMACRO (A)  ...  -> variable like macro definition, note a space before '('
+                if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
                 {
                     MoveToNextChar();
                     continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 MoveToNextChar();
             }
 
+            // check to see it is a logical EOL, some long macro definition contains a backslash-newline
             if (!IsBackslashBeforeEOL() || IsEOF())
-                break;
+                break; //break the outer for loop
             else
             {
+                //remove the backslash-newline and goto next physical line
                 while (p > buffer && *(--p) <= _T(' '))
                     ;
                 MoveToNextChar();
             }
         }
-
+        // remove the extra spaces in the end of buffer
         while (p > buffer && *(p - 1) <= _T(' '))
             --p;
 
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:
 
     /** return the string from the current position to the end of current line, in most case, this
      * function is used in handling #define, use with care outside this class!
+     * @param nestBraces true if you still need to count the '{' and '}' levels
+     * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+     * more spaces should become one space)
      */
     wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);
 
--
1.8.4.msysgit.0


--- End code ---

PS: git is vary good at split commit. :)

EDIT: Here is the test case
Parsing code:

--- Code: ---#define FUNCTION_LIKE_DEFINE(A) (A+B)
#define VARIABLE_LIKE_DEFINE (A) (A+B)

--- End code ---
Parsing log:

--- Code: ---000001. --------------M-a-i-n--L-o-g--------------


000002. -----------I-n-t-e-r-i-m--L-o-g-----------
000003. InitTokenizer() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h', m_FileSize=85.
000004. Init() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000005. C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h
000006. Parse() : Parsing 'C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000007. DoParse() : Loop:m_Str='', token='#'
000008. wxString Tokenizer::ReadToEOL(bool, bool) : line=2, CurrentChar='(', PreviousChar='E', NextChar='A', nestBrace(0)
000009. ReadToEOL(): (END) We are now at line 2, CurrentChar='\n', PreviousChar='\r', NextChar='#'
000010. ReadToEOL(): (A) (A+B)
000011. DoAddToken() : Created token='FUNCTION_LIKE_DEFINE', file_idx=1, line=2, ticket=258
000012. GetTokenBaseType() : Searching within m_Str='(A+B)'
000013. GetTokenBaseType() : Compensated m_Str='(A+B)'
000014. GetTokenBaseType() : Found ''
000015. DoAddToken() : Prepending ''
000016. DoAddToken() : Added/updated token 'FUNCTION_LIKE_DEFINE' (0), kind 'preprocessor', type '(A+B)', actual ''. Parent is  (-1)
000017. DoParse() : Loop:m_Str='', token='#'
000018. wxString Tokenizer::ReadToEOL(bool, bool) : line=3, CurrentChar=' ', PreviousChar='E', NextChar='(', nestBrace(0)
000019. ReadToEOL(): (END) We are now at line 3, CurrentChar='\n', PreviousChar='\r', NextChar='\r'
000020. ReadToEOL():  (A) (A+B)
000021. DoAddToken() : Created token='VARIABLE_LIKE_DEFINE', file_idx=1, line=3, ticket=259
000022. GetTokenBaseType() : Searching within m_Str=' (A) (A+B)'
000023. GetTokenBaseType() : Compensated m_Str=' (A) (A+B)'
000024. GetTokenBaseType() : Found ''
000025. DoAddToken() : Prepending ''
000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is  (-1)
********************************************************

--- End code ---

Looks good to me.

ollydbg:
About this code snippet:

--- Code: ---        // Here, we are in the comment body
        while (true)
        {
            if (cstyle) // C style comment
            {
                //FIX(huki), Moved this from below to avoid taking the same '*' for comment begin and end
                // eg, /*// ... */
                if (!MoveToNextChar())
                    break;

                SkipToChar('/');
                if (PreviousChar() == '*') // end of a C style comment
                {
                    MoveToNextChar();
                    break;
                }
            }
            else        // C++ style comment
            {
                TRACE(_T("SkipComment() : Need to call SkipToInlineCommentEnd() here at line = %u"), m_LineNumber);
                SkipToInlineCommentEnd();
                break;
            }
        }

--- End code ---

When begin, we have the index:

--- Code: ---/*// ... */
  ^

--- End code ---
If you run MoveToNextChar() first, you are now deliberately skip one character, and you are here in:

--- Code: ---/*// ... */
   ^

--- End code ---
Although this code did work correctly, but I think we can have a better method.
Method1: SkipToChar('*'); and see whether there is a '/' after '*'
Method2: if (PreviousChar() == '*' && CheckWeReallyMovedAfterSkipToChar)

I think method1 may be better, what's your opinion.

ollydbg:
I do not understand what this code snippet trying to solve, can you explain?

--- Code: ---wxString Tokenizer::PeekToken()
{
    if (!m_PeekAvailable)
    {
        m_PeekAvailable = true;

        unsigned int savedTokenIndex = m_TokenIndex;
        unsigned int savedLineNumber = m_LineNumber;
        unsigned int savedNestLevel  = m_NestLevel;

        if (SkipUnwanted())
            m_PeekToken = DoGetToken();
        else
            m_PeekToken.Clear();

        m_PeekTokenIndex             = m_TokenIndex;
        m_PeekLineNumber             = m_LineNumber;
        m_PeekNestLevel              = m_NestLevel;

        //FIX(huki), check whether m_TokenIndex has decreased which implies a ReplaceBufferForReparse() was done.
        // We can also check for change in m_IsReplaceParsing and m_RepeatReplaceCount before and after DoGetToken().
        if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex)
            savedTokenIndex = m_TokenIndex;

        m_TokenIndex                 = savedTokenIndex;
        m_LineNumber                 = savedLineNumber;
        m_NestLevel                  = savedNestLevel;
    }

    return m_PeekToken;
}

--- End code ---

I have try to let the cctest project parsing a simple buffer:

--- Code: ---/********************************/
/********************************/
/********************************/
/********************************/

_STD_BEGIN

int a;
int b;

_STD_END

--- End code ---

With the replacement rule:

--- Code: ---    Tokenizer::SetReplacementString(_T("_STD_BEGIN"),                       _T("namespace std {"));
    Tokenizer::SetReplacementString(_T("_STD_END"),                         _T("}"));

--- End code ---

But I don't see the if(m_IsReplaceParsing && savedTokenIndex > m_TokenIndex) condition is true.

Huki:

--- Quote from: ollydbg on September 17, 2013, 05:25:07 am ---Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

--- Code: ---From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
 like macro definition and variable like definition

---
 src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
 src/plugins/codecompletion/parser/tokenizer.h   |  3 +++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
         wxChar* p = buffer;
         wxString str;
 
+        // loop all the physical lines in reading macro definition
         for (;;)
         {
+            // this while statement end up in a physical EOL '\n'
             while (NotEOF() && CurrentChar() != _T('\n'))
             {
                 while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 if (ch == _T('\n'))
                     break;
 
-                if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+                // if we see two spaces in the buffer, we should drop the second one. Note, if the
+                // first char is space, we should always save it to buffer, this is to distinguish
+                // a function/variable like macro definition, e.g.
+                // #define MYMACRO(A)  ...   -> function like macro definition
+                // #define MYMACRO (A)  ...  -> variable like macro definition, note a space before '('
+                if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
                 {
                     MoveToNextChar();
                     continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
                 MoveToNextChar();
             }
 
+            // check to see it is a logical EOL, some long macro definition contains a backslash-newline
             if (!IsBackslashBeforeEOL() || IsEOF())
-                break;
+                break; //break the outer for loop
             else
             {
+                //remove the backslash-newline and goto next physical line
                 while (p > buffer && *(--p) <= _T(' '))
                     ;
                 MoveToNextChar();
             }
         }
-
+        // remove the extra spaces in the end of buffer
         while (p > buffer && *(p - 1) <= _T(' '))
             --p;
 
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:
 
     /** return the string from the current position to the end of current line, in most case, this
      * function is used in handling #define, use with care outside this class!
+     * @param nestBraces true if you still need to count the '{' and '}' levels
+     * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+     * more spaces should become one space)
      */
     wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);
 
--
1.8.4.msysgit.0


--- End code ---

--- End quote ---
Yes, that looks good.

About your test case:

--- Code: ---000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is  (-1)
--- End code ---
You can see here type ' (A) (A+B)'. Which means the leading space is kept for "type". This problem will be addressed when you apply my parserthread.cpp patches (cc_parser_general.patch file). Here is the relevant part:

--- Code: (in DoAddToken()) ---Index: src/plugins/codecompletion/parser/parserthread.cpp
===================================================================
--- src/plugins/codecompletion/parser/parserthread.cpp (revision 9271)
+++ src/plugins/codecompletion/parser/parserthread.cpp (working copy)
@@ -1294,7 +1345,7 @@
 
     Token* newToken = 0;
     wxString newname(name);
-    m_Str.Trim();
+    m_Str.Trim(true).Trim(false);
     if (kind == tkDestructor)
     {
         // special class destructors case

--- End code ---

For this subject, I suggest committing your ReadToEOL() patch and this DoAddToken() patch above.

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version