Several improvements to Code Completion plugin

Developer forums (C::B DEVELOPMENT STRICTLY!) > Development

<< < (7/29) > >>

ollydbg:

--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---...
I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)

--- End quote ---
When reading the function ReadToEOL, I see it first allocate a wxChar buffer, and then Append to a wxString, firstly I concern this code compile under wx 2.9.x, because wx 2.9.x use native internal buffers for different OS (under Linux, it use UTF8, under Windows, it use wchar_t), luckily, It don't have such issue, the reason is below:

--- Quote ---wxChar is defined to be
char when wxUSE_UNICODE==0
wchar_t when wxUSE_UNICODE==1 (the default).

--- End quote ---

Also, wxString have a member function:

--- Quote ---wxString & Append (const wchar_t *pwz, size_t nLen)
Appends the wide string literal psz with max length nLen.

--- End quote ---

ollydbg:

--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)

--- End quote ---
Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

--- Code: ---From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
like macro definition and variable like definition

---
src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
src/plugins/codecompletion/parser/tokenizer.h | 3 +++
2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   wxChar* p = buffer;
   wxString str;

+ // loop all the physical lines in reading macro definition
   for (;;)
   {
+ // this while statement end up in a physical EOL '\n'
   while (NotEOF() && CurrentChar() != _T('\n'))
   {
   while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   if (ch == _T('\n'))
   break;

- if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+ // if we see two spaces in the buffer, we should drop the second one. Note, if the
+ // first char is space, we should always save it to buffer, this is to distinguish
+ // a function/variable like macro definition, e.g.
+ // #define MYMACRO(A) ... -> function like macro definition
+ // #define MYMACRO (A) ... -> variable like macro definition, note a space before '('
+ if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
   {
   MoveToNextChar();
   continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   MoveToNextChar();
   }

+ // check to see it is a logical EOL, some long macro definition contains a backslash-newline
   if (!IsBackslashBeforeEOL() || IsEOF())
- break;
+ break; //break the outer for loop
   else
   {
+ //remove the backslash-newline and goto next physical line
   while (p > buffer && *(--p) <= _T(' '))
   ;
   MoveToNextChar();
   }
   }
-
+ // remove the extra spaces in the end of buffer
   while (p > buffer && *(p - 1) <= _T(' '))
   --p;

diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:

   /** return the string from the current position to the end of current line, in most case, this
   * function is used in handling #define, use with care outside this class!
+ * @param nestBraces true if you still need to count the '{' and '}' levels
+ * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+ * more spaces should become one space)
   */
   wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);

--
1.8.4.msysgit.0

--- End code ---

PS: git is vary good at split commit. :)

EDIT: Here is the test case
Parsing code:

--- Code: ---#define FUNCTION_LIKE_DEFINE(A) (A+B)
#define VARIABLE_LIKE_DEFINE (A) (A+B)

--- End code ---
Parsing log:

--- Code: ---000001. --------------M-a-i-n--L-o-g--------------

000002. -----------I-n-t-e-r-i-m--L-o-g-----------
000003. InitTokenizer() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h', m_FileSize=85.
000004. Init() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000005. C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h
000006. Parse() : Parsing 'C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000007. DoParse() : Loop:m_Str='', token='#'
000008. wxString Tokenizer::ReadToEOL(bool, bool) : line=2, CurrentChar='(', PreviousChar='E', NextChar='A', nestBrace(0)
000009. ReadToEOL(): (END) We are now at line 2, CurrentChar='\n', PreviousChar='\r', NextChar='#'
000010. ReadToEOL(): (A) (A+B)
000011. DoAddToken() : Created token='FUNCTION_LIKE_DEFINE', file_idx=1, line=2, ticket=258
000012. GetTokenBaseType() : Searching within m_Str='(A+B)'
000013. GetTokenBaseType() : Compensated m_Str='(A+B)'
000014. GetTokenBaseType() : Found ''
000015. DoAddToken() : Prepending ''
000016. DoAddToken() : Added/updated token 'FUNCTION_LIKE_DEFINE' (0), kind 'preprocessor', type '(A+B)', actual ''. Parent is (-1)
000017. DoParse() : Loop:m_Str='', token='#'
000018. wxString Tokenizer::ReadToEOL(bool, bool) : line=3, CurrentChar=' ', PreviousChar='E', NextChar='(', nestBrace(0)
000019. ReadToEOL(): (END) We are now at line 3, CurrentChar='\n', PreviousChar='\r', NextChar='\r'
000020. ReadToEOL(): (A) (A+B)
000021. DoAddToken() : Created token='VARIABLE_LIKE_DEFINE', file_idx=1, line=3, ticket=259
000022. GetTokenBaseType() : Searching within m_Str=' (A) (A+B)'
000023. GetTokenBaseType() : Compensated m_Str=' (A) (A+B)'
000024. GetTokenBaseType() : Found ''
000025. DoAddToken() : Prepending ''
000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is (-1)
********************************************************

--- End code ---

Looks good to me.

ollydbg:
About this code snippet:

--- Code: --- // Here, we are in the comment body
while (true)
{
if (cstyle) // C style comment
{
//FIX(huki), Moved this from below to avoid taking the same '*' for comment begin and end
// eg, /*// ... */
if (!MoveToNextChar())
break;

SkipToChar('/');
if (PreviousChar() == '*') // end of a C style comment
{
MoveToNextChar();
break;
}
}
else // C++ style comment
{
TRACE(_T("SkipComment() : Need to call SkipToInlineCommentEnd() here at line = %u"), m_LineNumber);
SkipToInlineCommentEnd();
break;
}
}

--- End code ---

When begin, we have the index:

--- Code: ---/*// ... */
^

--- End code ---
If you run MoveToNextChar() first, you are now deliberately skip one character, and you are here in:

--- Code: ---/*// ... */
^

--- End code ---
Although this code did work correctly, but I think we can have a better method.
Method1: SkipToChar('*'); and see whether there is a '/' after '*'
Method2: if (PreviousChar() == '*' && CheckWeReallyMovedAfterSkipToChar)

I think method1 may be better, what's your opinion.

ollydbg:
I do not understand what this code snippet trying to solve, can you explain?

--- Code: ---wxString Tokenizer::PeekToken()
{
if (!m_PeekAvailable)
{
m_PeekAvailable = true;

unsigned int savedTokenIndex = m_TokenIndex;
unsigned int savedLineNumber = m_LineNumber;
unsigned int savedNestLevel = m_NestLevel;

if (SkipUnwanted())
m_PeekToken = DoGetToken();
else
m_PeekToken.Clear();

m_PeekTokenIndex = m_TokenIndex;
m_PeekLineNumber = m_LineNumber;
m_PeekNestLevel = m_NestLevel;

//FIX(huki), check whether m_TokenIndex has decreased which implies a ReplaceBufferForReparse() was done.
// We can also check for change in m_IsReplaceParsing and m_RepeatReplaceCount before and after DoGetToken().
if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex)
savedTokenIndex = m_TokenIndex;

m_TokenIndex = savedTokenIndex;
m_LineNumber = savedLineNumber;
m_NestLevel = savedNestLevel;
}

return m_PeekToken;
}

--- End code ---

I have try to let the cctest project parsing a simple buffer:

--- Code: ---/********************************/
/********************************/
/********************************/
/********************************/

_STD_BEGIN

int a;
int b;

_STD_END

--- End code ---

With the replacement rule:

--- Code: --- Tokenizer::SetReplacementString(_T("_STD_BEGIN"), _T("namespace std {"));
Tokenizer::SetReplacementString(_T("_STD_END"), _T("}"));

--- End code ---

But I don't see the if(m_IsReplaceParsing && savedTokenIndex > m_TokenIndex) condition is true.

Huki:

--- Quote from: ollydbg on September 17, 2013, 05:25:07 am ---Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below:

--- Code: ---From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
like macro definition and variable like definition

---
src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
src/plugins/codecompletion/parser/tokenizer.h | 3 +++
2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   wxChar* p = buffer;
   wxString str;

+ // loop all the physical lines in reading macro definition
   for (;;)
   {
+ // this while statement end up in a physical EOL '\n'
   while (NotEOF() && CurrentChar() != _T('\n'))
   {
   while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   if (ch == _T('\n'))
   break;

- if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+ // if we see two spaces in the buffer, we should drop the second one. Note, if the
+ // first char is space, we should always save it to buffer, this is to distinguish
+ // a function/variable like macro definition, e.g.
+ // #define MYMACRO(A) ... -> function like macro definition
+ // #define MYMACRO (A) ... -> variable like macro definition, note a space before '('
+ if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
   {
   MoveToNextChar();
   continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
   MoveToNextChar();
   }

+ // check to see it is a logical EOL, some long macro definition contains a backslash-newline
   if (!IsBackslashBeforeEOL() || IsEOF())
- break;
+ break; //break the outer for loop
   else
   {
+ //remove the backslash-newline and goto next physical line
   while (p > buffer && *(--p) <= _T(' '))
   ;
   MoveToNextChar();
   }
   }
-
+ // remove the extra spaces in the end of buffer
   while (p > buffer && *(p - 1) <= _T(' '))
   --p;

diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:

   /** return the string from the current position to the end of current line, in most case, this
   * function is used in handling #define, use with care outside this class!
+ * @param nestBraces true if you still need to count the '{' and '}' levels
+ * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+ * more spaces should become one space)
   */
   wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);

--
1.8.4.msysgit.0

--- End code ---

--- End quote ---
Yes, that looks good.

About your test case:

--- Code: ---000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is (-1)
--- End code ---
You can see here type ' (A) (A+B)'. Which means the leading space is kept for "type". This problem will be addressed when you apply my parserthread.cpp patches (cc_parser_general.patch file). Here is the relevant part:

--- Code: (in DoAddToken()) ---Index: src/plugins/codecompletion/parser/parserthread.cpp
===================================================================
--- src/plugins/codecompletion/parser/parserthread.cpp (revision 9271)
+++ src/plugins/codecompletion/parser/parserthread.cpp (working copy)
@@ -1294,7 +1345,7 @@

   Token* newToken = 0;
   wxString newname(name);
- m_Str.Trim();
+ m_Str.Trim(true).Trim(false);
   if (kind == tkDestructor)
   {
   // special class destructors case

--- End code ---

For this subject, I suggest committing your ReadToEOL() patch and this DoAddToken() patch above.

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version