Several improvements to Code Completion plugin
ollydbg:
--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---...
I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)
--- End quote ---
While reading the function ReadToEOL, I see that it first fills a wxChar buffer and then appends that buffer to a wxString. My first concern was whether this code compiles under wx 2.9.x, because wx 2.9.x can use different native internal buffers depending on the OS and build (for example UTF-8 under Linux, wchar_t under Windows). Luckily, there is no such issue, for the reasons below:
--- Quote ---wxChar is defined to be
char when wxUSE_UNICODE==0
wchar_t when wxUSE_UNICODE==1 (the default).
--- End quote ---
Also, wxString has a member function:
--- Quote ---wxString & Append (const wchar_t *pwz, size_t nLen)
Appends the wide string literal psz with max length nLen.
--- End quote ---
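To make the buffer-then-append pattern concrete, here is a minimal sketch in standard C++. It is not the ReadToEOL code: std::wstring stands in for wxString, the alias Char stands in for wxChar in a Unicode build, and the function name CopyThroughBuffer and the buffer size are made up for illustration. The point is only that appending with an explicit length works on a raw character buffer without a terminating NUL.

```cpp
#include <cstddef>
#include <string>

// Assumption for this sketch: a Unicode build, where wxChar would be wchar_t.
using Char = wchar_t;

// Hypothetical helper: copy the input through a small fixed-size buffer,
// flushing the buffer into the result string whenever it fills up.
std::wstring CopyThroughBuffer(const std::wstring& input)
{
    const std::size_t maxBufferLen = 8; // deliberately small to force several flushes
    Char buffer[maxBufferLen];
    Char* p = buffer;
    std::wstring str;

    for (Char ch : input)
    {
        if (p == buffer + maxBufferLen)
        {
            // Same idea as wxString::Append(const wchar_t*, size_t):
            // append exactly maxBufferLen characters, no NUL terminator needed.
            str.append(buffer, maxBufferLen);
            p = buffer;
        }
        *p++ = ch;
    }

    // Flush whatever is left in the buffer.
    str.append(buffer, static_cast<std::size_t>(p - buffer));
    return str;
}
```

Calling CopyThroughBuffer(L"(A) (A+B)") should hand back the same text, regardless of how many times the buffer was flushed on the way.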
ollydbg:
--- Quote from: ollydbg on September 16, 2013, 06:09:39 pm ---I think I need to update the comments in either HandleDefines and ReadToEOL function now. :)
--- End quote ---
OK, I extracted a patch dedicated to ReadToEOL with some comments added, see below:
--- Code: ---From 276d59ccf760ee85a16170db86efa9e362368a0a Mon Sep 17 00:00:00 2001
From: asmwarrior <asmwarrior@gmail.com>
Date: Tue, 17 Sep 2013 11:10:15 +0800
Subject: [PATCH] handle macro handling correctly, distinguish between function
like macro definition and variable like definition
---
src/plugins/codecompletion/parser/tokenizer.cpp | 15 ++++++++++++---
src/plugins/codecompletion/parser/tokenizer.h | 3 +++
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/src/plugins/codecompletion/parser/tokenizer.cpp b/src/plugins/codecompletion/parser/tokenizer.cpp
index 221d5ac..c0c5bbe 100644
--- a/src/plugins/codecompletion/parser/tokenizer.cpp
+++ b/src/plugins/codecompletion/parser/tokenizer.cpp
@@ -438,8 +438,10 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
wxChar* p = buffer;
wxString str;
+ // loop all the physical lines in reading macro definition
for (;;)
{
+ // this while statement end up in a physical EOL '\n'
while (NotEOF() && CurrentChar() != _T('\n'))
{
while (SkipComment())
@@ -449,7 +451,12 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
if (ch == _T('\n'))
break;
- if (ch <= _T(' ') && (p == buffer || *(p - 1) == ch))
+ // if we see two spaces in the buffer, we should drop the second one. Note, if the
+ // first char is space, we should always save it to buffer, this is to distinguish
+ // a function/variable like macro definition, e.g.
+ // #define MYMACRO(A) ... -> function like macro definition
+ // #define MYMACRO (A) ... -> variable like macro definition, note a space before '('
+ if (ch <= _T(' ') && p > buffer && *(p - 1) == ch)
{
MoveToNextChar();
continue;
@@ -475,16 +482,18 @@ wxString Tokenizer::ReadToEOL(bool nestBraces, bool stripUnneeded)
MoveToNextChar();
}
+ // check to see it is a logical EOL, some long macro definition contains a backslash-newline
if (!IsBackslashBeforeEOL() || IsEOF())
- break;
+ break; //break the outer for loop
else
{
+ //remove the backslash-newline and goto next physical line
while (p > buffer && *(--p) <= _T(' '))
;
MoveToNextChar();
}
}
-
+ // remove the extra spaces in the end of buffer
while (p > buffer && *(p - 1) <= _T(' '))
--p;
diff --git a/src/plugins/codecompletion/parser/tokenizer.h b/src/plugins/codecompletion/parser/tokenizer.h
index 3999252..dbc8d1b 100644
--- a/src/plugins/codecompletion/parser/tokenizer.h
+++ b/src/plugins/codecompletion/parser/tokenizer.h
@@ -188,6 +188,9 @@ public:
/** return the string from the current position to the end of current line, in most case, this
* function is used in handling #define, use with care outside this class!
+ * @param nestBraces true if you still need to count the '{' and '}' levels
+ * @param stripUnneeded true if you are going to remove comments and compression spaces(two or
+ * more spaces should become one space)
*/
wxString ReadToEOL(bool nestBraces = true, bool stripUnneeded = true);
--
1.8.4.msysgit.0
--- End code ---
PS: git is very good at splitting commits. :)
EDIT: Here is the test case
Parsing code:
--- Code: ---#define FUNCTION_LIKE_DEFINE(A) (A+B)
#define VARIABLE_LIKE_DEFINE (A) (A+B)
--- End code ---
Parsing log:
--- Code: ---000001. --------------M-a-i-n--L-o-g--------------
000002. -----------I-n-t-e-r-i-m--L-o-g-----------
000003. InitTokenizer() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h', m_FileSize=85.
000004. Init() : m_Filename='C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000005. C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h
000006. Parse() : Parsing 'C:\DOCUME~1\zyh23\LOCALS~2\Temp\cc20.h'
000007. DoParse() : Loop:m_Str='', token='#'
000008. wxString Tokenizer::ReadToEOL(bool, bool) : line=2, CurrentChar='(', PreviousChar='E', NextChar='A', nestBrace(0)
000009. ReadToEOL(): (END) We are now at line 2, CurrentChar='\n', PreviousChar='\r', NextChar='#'
000010. ReadToEOL(): (A) (A+B)
000011. DoAddToken() : Created token='FUNCTION_LIKE_DEFINE', file_idx=1, line=2, ticket=258
000012. GetTokenBaseType() : Searching within m_Str='(A+B)'
000013. GetTokenBaseType() : Compensated m_Str='(A+B)'
000014. GetTokenBaseType() : Found ''
000015. DoAddToken() : Prepending ''
000016. DoAddToken() : Added/updated token 'FUNCTION_LIKE_DEFINE' (0), kind 'preprocessor', type '(A+B)', actual ''. Parent is (-1)
000017. DoParse() : Loop:m_Str='', token='#'
000018. wxString Tokenizer::ReadToEOL(bool, bool) : line=3, CurrentChar=' ', PreviousChar='E', NextChar='(', nestBrace(0)
000019. ReadToEOL(): (END) We are now at line 3, CurrentChar='\n', PreviousChar='\r', NextChar='\r'
000020. ReadToEOL(): (A) (A+B)
000021. DoAddToken() : Created token='VARIABLE_LIKE_DEFINE', file_idx=1, line=3, ticket=259
000022. GetTokenBaseType() : Searching within m_Str=' (A) (A+B)'
000023. GetTokenBaseType() : Compensated m_Str=' (A) (A+B)'
000024. GetTokenBaseType() : Found ''
000025. DoAddToken() : Prepending ''
000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is (-1)
********************************************************
--- End code ---
Looks good to me.
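For readers who want to try the whitespace rule outside the tokenizer, here is a small standalone sketch in standard C++. It is not the Code::Blocks implementation; the names IsBlank, CompressMacroTail and IsFunctionLike are hypothetical, and std::string replaces the wxChar buffer. It applies the same condition as the patched line: a blank is dropped only when the output is non-empty and already ends with that same blank, so a leading space survives.

```cpp
#include <string>

// Hypothetical helper: treat every character up to ' ' as a blank,
// like the "ch <= _T(' ')" test in the patch.
bool IsBlank(char ch)
{
    return static_cast<unsigned char>(ch) <= ' ';
}

// Hypothetical helper: compress the text that follows the macro name.
// Repeated blanks collapse to one, a leading blank is kept, trailing blanks go.
std::string CompressMacroTail(const std::string& tail)
{
    std::string out;
    for (char ch : tail)
    {
        // Drop a blank only if the previous stored character is the same blank.
        if (IsBlank(ch) && !out.empty() && out.back() == ch)
            continue;
        out.push_back(ch);
    }

    // Remove the extra blanks at the end of the buffer.
    while (!out.empty() && IsBlank(out.back()))
        out.pop_back();
    return out;
}

// A macro is function-like only when '(' follows the name with no space between.
bool IsFunctionLike(const std::string& compressedTail)
{
    return !compressedTail.empty() && compressedTail.front() == '(';
}
```

With the two lines from the test case, the tail "(A) (A+B)" is classified as function-like, while " (A) (A+B)" keeps its leading space and is classified as variable-like.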
ollydbg:
About this code snippet:
--- Code: ---// Here, we are in the comment body
while (true)
{
    if (cstyle) // C style comment
    {
        //FIX(huki), Moved this from below to avoid taking the same '*' for comment begin and end
        // eg, /*// ... */
        if (!MoveToNextChar())
            break;
        SkipToChar('/');
        if (PreviousChar() == '*') // end of a C style comment
        {
            MoveToNextChar();
            break;
        }
    }
    else // C++ style comment
    {
        TRACE(_T("SkipComment() : Need to call SkipToInlineCommentEnd() here at line = %u"), m_LineNumber);
        SkipToInlineCommentEnd();
        break;
    }
}
--- End code ---
At the beginning, the index is on the second '/', right after the opening "/*":
--- Code: ---/*// ... */
               ^
--- End code ---
If you run MoveToNextChar() first, you deliberately skip one character, and you are now here:
--- Code: ---/*// ... */
                ^
--- End code ---
Although this code works correctly, I think there is a better method.
Method 1: call SkipToChar('*') and check whether a '/' follows the '*'.
Method 2: if (PreviousChar() == '*' && CheckWeReallyMovedAfterSkipToChar)
I think Method 1 may be better. What's your opinion?
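Method 1 could look roughly like the sketch below. This is standalone standard C++ on a std::string, not the Tokenizer code; the function name SkipCStyleCommentBody and the index-based interface are assumptions made for illustration. Because it searches for '*' and then looks at the following character, the '*' of the opening "/*" can never be mistaken for the start of the closing "*/", provided the search starts after the opening marker.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: 'pos' must point at the first character of the comment
// body, i.e. just after the opening "/*". Returns the index just past the
// closing "*/", or text.size() if the comment is never closed.
std::size_t SkipCStyleCommentBody(const std::string& text, std::size_t pos)
{
    while (pos < text.size())
    {
        pos = text.find('*', pos);
        if (pos == std::string::npos)
            return text.size();      // no more '*', comment is unterminated

        if (pos + 1 < text.size() && text[pos + 1] == '/')
            return pos + 2;          // found "*/", step past it

        ++pos;                       // a lone '*', keep searching
    }
    return text.size();
}
```

For the problematic input "/*// ... */" the search starts at index 2, finds the '*' at index 9, sees the '/' after it, and stops at index 11.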
ollydbg:
I do not understand what this code snippet is trying to solve. Can you explain?
--- Code: ---wxString Tokenizer::PeekToken()
{
    if (!m_PeekAvailable)
    {
        m_PeekAvailable = true;
        unsigned int savedTokenIndex = m_TokenIndex;
        unsigned int savedLineNumber = m_LineNumber;
        unsigned int savedNestLevel = m_NestLevel;
        if (SkipUnwanted())
            m_PeekToken = DoGetToken();
        else
            m_PeekToken.Clear();
        m_PeekTokenIndex = m_TokenIndex;
        m_PeekLineNumber = m_LineNumber;
        m_PeekNestLevel = m_NestLevel;
        //FIX(huki), check whether m_TokenIndex has decreased which implies a ReplaceBufferForReparse() was done.
        // We can also check for change in m_IsReplaceParsing and m_RepeatReplaceCount before and after DoGetToken().
        if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex)
            savedTokenIndex = m_TokenIndex;
        m_TokenIndex = savedTokenIndex;
        m_LineNumber = savedLineNumber;
        m_NestLevel = savedNestLevel;
    }
    return m_PeekToken;
}
--- End code ---
I tried letting the cctest project parse a simple buffer:
--- Code: ---/********************************/
/********************************/
/********************************/
/********************************/
_STD_BEGIN
int a;
int b;
_STD_END
--- End code ---
With the replacement rule:
--- Code: --- Tokenizer::SetReplacementString(_T("_STD_BEGIN"), _T("namespace std {"));
Tokenizer::SetReplacementString(_T("_STD_END"), _T("}"));
--- End code ---
But I never see the condition if (m_IsReplaceParsing && savedTokenIndex > m_TokenIndex) evaluate to true.
Huki:
--- Quote from: ollydbg on September 17, 2013, 05:25:07 am ---Ok, I extract a patch dedicated to ReadToEOL with some of comments added, see below: ...
--- End quote ---
Yes, that looks good.
About your test case:
--- Code: ---000026. DoAddToken() : Added/updated token 'VARIABLE_LIKE_DEFINE' (1), kind 'preprocessor', type ' (A) (A+B)', actual ''. Parent is (-1)
--- End code ---
You can see type ' (A) (A+B)' here, which means the leading space is kept in "type". This problem will be addressed when you apply my parserthread.cpp patches (cc_parser_general.patch file). Here is the relevant part:
--- Code: (in DoAddToken()) ---Index: src/plugins/codecompletion/parser/parserthread.cpp
===================================================================
--- src/plugins/codecompletion/parser/parserthread.cpp (revision 9271)
+++ src/plugins/codecompletion/parser/parserthread.cpp (working copy)
@@ -1294,7 +1345,7 @@
Token* newToken = 0;
wxString newname(name);
- m_Str.Trim();
+ m_Str.Trim(true).Trim(false);
if (kind == tkDestructor)
{
// special class destructors case
--- End code ---
For this subject, I suggest committing your ReadToEOL() patch and this DoAddToken() patch above.
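As a quick illustration of what the Trim(true).Trim(false) chain does to the "type" string, here is a sketch in standard C++. The Trim function below is a hypothetical stand-in that only imitates the fromRight parameter of wxString::Trim on a std::string; it is not the wxWidgets code, and the set of characters treated as whitespace is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for wxString::Trim(bool fromRight = true):
// strips whitespace from the right end when fromRight is true,
// otherwise from the left end. Returns the string to allow chaining.
std::string& Trim(std::string& s, bool fromRight = true)
{
    const char* blanks = " \t\r\n"; // assumption: the characters treated as whitespace
    if (fromRight)
    {
        const std::size_t last = s.find_last_not_of(blanks);
        s.erase(last == std::string::npos ? 0 : last + 1);
    }
    else
    {
        // erase(0, npos) clears the string when it contains only blanks
        s.erase(0, s.find_first_not_of(blanks));
    }
    return s;
}
```

A single Trim(type) leaves " (A) (A+B)" unchanged because only the right end is stripped, whereas Trim(Trim(type, true), false) yields "(A) (A+B)", which matches the intent of the one-line DoAddToken() change.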