We have already enabling some kind of macro usage check in our parser(see rev9932), but that's not fully.
My idea was try to enable the macro checking on each identifier like tokens, I did some tests, and it won't slow down much of the parsing speed, the result is quite good.
But we still have some issues to solve
1, we have different build target, and different build target have different C Preprocessor definitions, if we still parse the whole C::B project, we can get bad result.
E.g. In Codeblocks.cbp, we have one macro definition in one target
#define type(obj) ((obj)._type)
or even bad
And we have some other target which can have some code like:
GdbCmd_SetCatch(DebuggerDriver *driver, const wxString &type, int *resultIndex)
In this case, the "type" was replaced to un-wanted tokens.
2, we have some code like:
// Assumes the buffer bitmap only covers the client area;
// does not prepare the window DC
#define wxBUFFER_CLIENT_AREA 0x02
class WXDLLEXPORT wxBufferedDC : public wxMemoryDC
{
public:
// Default ctor, must subsequently call Init for two stage construction.
wxBufferedDC()
: m_dc(NULL),
m_buffer(NULL),
m_style(0)
{
}
// Construct a wxBufferedDC using a user supplied buffer.
wxBufferedDC(wxDC *dc,
wxBitmap& buffer = wxNullBitmap,
int style = wxBUFFER_CLIENT_AREA)
: m_dc(NULL), m_buffer(NULL)
{
Init(dc, buffer, style);
}
When parsing the function arguments(parameters), if macro expansion is enabled on every tokens, we will get
(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = 0x02)
instead of
(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = wxBUFFER_CLIENT_ARE)
What would you like to see? The former or the later?
3, In the further, I would like to see that all the macro handling was put in the Tokenizer class. (Currently, we have macro handling both in Parserthread and Tokenizer class).
4, Too speed up the macro definition search, I think a hash table (e.g. wxWidgets: wxHashMap Class Reference (http://docs.wxwidgets.org/trunk/classwx_hash_map.html)) is preferred. (Currently, the macro definition is stored in the TokenTree, which is actually a Trie - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Trie))
EDIT: Even in the same build target, we can still have different C preprocessor definitions for different translation file, so this seems not easy to solve. One method is that we try to maintain macro definition for each file, or create a include file hierarchy.
EDIT2:
Patches (git patches serial) attached, comments were welcome.
EDIT3:
About issue 1, I think we have discussed here, see Huki's comments Re: Several improvements to Code Completion plugin (http://forums.codeblocks.org/index.php/topic,18315.msg132491.html#msg132491) and Re: Several improvements to Code Completion plugin (http://forums.codeblocks.org/index.php/topic,18315.msg132527.html#msg132527)
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome :)
I've attached some bug fix patches against your last cc_macro serials. Some description:
Fixed = [] , handling when parsing variable declaration. i.e.,
int var1 = 0; // this var has an assignment
int var2 = 0; // mouse on var2 -> we get "intint var2"
int var3 = 0; // mouse on var3 -> we get "intintint var3"
int num=0, num2=0, num3=0;
// mouse on each variable and we get:
// num -> "int num"
// num2 -> "int, num2"
// num3 -> "int,, num3"
int array[5]; // vars with [] are not parsed
The rest of the patches should already be clear.
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome :)
I've attached some bug fix patches against your last cc_macro serials. Some description:
...
Very nice work, Huki, thanks. I have them applied in my local copy, it fixes some errors of my patch serials.
BTW: I see a trailing spaces in a patch when I run the "git am" command
Applying: CC: fix variable parsing with "=" or "[]"
f:/cb_sf_git/trunk/.git/rebase-apply/patch:52: trailing whitespace.
|| peek==ParserConsts::oparray_chr
warning: 1 line adds whitespace errors.
Do you have the "remove trailing space" option enabled in C::B?
Any suggestion is that the commit message should have a "*" if it is a major function change or "-" if it is a minor change.
So, it should be:
* CC: xxxxxxx
detailed descriptions
You can wrote as many detailed text as possible in the commit log messages, E.g. see my commit r10157 as an example. I just follow Linux's commit message style, which means a simple line as a subject, and after an empty line, write details, like this: Git Commit Messages : 50/72 Formatting (http://stackoverflow.com/questions/2290016/git-commit-messages-50-72-formatting).
EDIT: I have fixed all the mentioned issue above in my local copy, so you don't need to post the patches again. :)
Currently, I see only one regression (as of revision 10341, before my patch and after applying my patch) is here:
--- C:/before.txt Sat Jun 20 14:14:40 2015
+++ C:/after.txt Sat Jun 20 14:09:01 2015
@@ -56,14 +56,14 @@
Total 9 tests, 9 PASS, 0 FAIL
--------------------------------------------------------
********************************************************
Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr.cpp
********************************************************
--PASS: Fun FuncArray
+*FAIL: Fun FuncArray
-PASS: foo foo
--------------------------------------------------------
-Total 2 tests, 2 PASS, 0 FAIL
+Total 2 tests, 1 PASS, 1 FAIL
--------------------------------------------------------
********************************************************
Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr_com_interface.cpp
********************************************************
-PASS: factory-> QueryInterface
Simplify the test case, I get such test file: ccc_bug.cpp
int (*FuncArray[10][20]) ();
//Fun //FuncArray
I will look into it.
OK, I found the cause of this regression, I get the parenthesis is L"(* FuncArray [ 10 ] [ 20 ])", and if without my patch, it don't have a space after the "FuncArray", this is because in my patch serials, I have change the function of reading the parenthesis. (The old code is hard to understand for me, and using raw wxChar pointers).
Then in the function:
// pattern: m_Str AAA (*BBB) (...);
if (peek == ParserConsts::semicolon && pos != wxNOT_FOUND)
{
name.RemoveLast(); // remove ")"
name.Remove(0, pos+1).Trim(false); // remove "(* "
// pattern: m_Str AAA (*BBB[X][Y]) (...);
pos = name.find(ParserConsts::oparray_chr);
if (pos != wxNOT_FOUND)
name.Remove(pos);
TRACE(_T("HandleFunction() : Add token name='")+name+_T("', args='")+args+_T("', return type='") + m_Str+ _T("'"));
Token* newToken = DoAddToken(tkFunction, name, lineNr, 0, 0, args);
We pass the name as "FuncArray " (note a space after "FuncArray".
Now, I think I need to return a more compact form like "(*FuncArray[10][20])".
EDIT: the old code for reading parenthesis dose not consider the macro expansion in its content, it just find the "(" and ")" pair in the raw wxString, while in my patch serials, I do macro expansion on every token if possible.
Now, here comes the function I need to implement, it looks like "space" are added only in some very special cases.
void Tokenizer::ReadParentheses(wxString& str)
{
// we create a local buffer here, so the data is copied from m_Buffer to buffer
// once the local buffer is full, we append the content in local buffer to the result str
static const size_t maxBufferLen = 4093;
// local buffer
wxChar buffer[maxBufferLen + 3];
buffer[0] = _T('$'); // avoid segfault error, because we have some *(p - 1) = x in the code
wxChar* realBuffer = buffer + 1;
// p points to the local buffer
wxChar* p = realBuffer;
int level = 0; // brace level of '(' and ')'
while (NotEOF())
{
while (SkipComment())
;
wxChar ch = CurrentChar();
while (ch == _T('#')) // do not use if
{
const PreprocessorType type = GetPreprocessorType();
if (type == ptOthers)
break;
HandleConditionPreprocessor(type);
ch = CurrentChar();
}
const unsigned int startIndex = m_TokenIndex;
switch(ch)
{
case _T('('):
{
++level;
*p = ch;
++p;
}
break;
case _T(')'):
{
if (*(p - 1) <= _T(' '))
--p; // if previous char is a space, we overwrite the space with a ')', so we save one char
--level;
*p = ch;
++p;
}
break;
case _T('\''):
case _T('"'):
{
MoveToNextChar(); // skip the leading '"' or '\''
SkipToStringEnd(ch); // m_TokenIndex point to the trailing '"' or '\'' now
MoveToNextChar(); // move to the char after trailing char
const size_t writeLen = m_TokenIndex - startIndex; // the string length
const size_t usedLen = p - realBuffer;
if (usedLen + writeLen > maxBufferLen) // there is not enough space left to write the string
{
if (writeLen > maxBufferLen) // to big string?
{
TRACE(_T("ReadParentheses(): Catch exception 1: %lu"), static_cast<unsigned long>(writeLen));
return;
}
if (p != realBuffer) // add the used string to the result str
{
str.Append(realBuffer, usedLen);
p = realBuffer; // adjust the p, as it was an empty buffer
}
// append the string to str (not write the string to the buffer)
str.Append((const wxChar*)m_Buffer + startIndex, writeLen);
}
else
{ // the buffer can hold the string, so copy it to the buffer
memcpy(p, (const wxChar*)m_Buffer + startIndex, writeLen * sizeof(wxChar));
p += writeLen;
}
continue;
}
break;
case _T(','): // if there is a space before the comma, we just remove it, but we should add a space after the comma
case _T('*'):
case _T('&'):
{
if (*(p - 1) <= _T(' '))
--p;
*p = ch;
*++p = _T(' ');
++p;
}
break;
case _T('='):
{
if (*(p - 1) <= _T(' '))
{
*p = _T('=');
// Don't add a space after '=' sign, in case another '=' follows it
// (see how the 'else' block below works).
//*++p = _T(' ');
++p;
}
else // special handling of "==" "=!" "=>" and "=<"
{
switch (*(p - 1))
{
case _T('='):
case _T('!'):
case _T('>'):
case _T('<'):
{
*p = _T('=');
*++p = _T(' ');
++p;
}
break;
default:
{
*p = _T(' ');
*++p = _T('=');
*++p = _T(' ');
++p;
}
break;
}
}
}
break;
case _T(' '): // we only add a space if the former char is not a space, also not a '('
case _T('\r'): // the tab char are regard as space
case _T('\t'):
{
if (*(p - 1) != _T(' ') && *(p - 1) != _T('('))
{
*p = _T(' ');
++p;
}
}
break;
case _T('\n'): // we need keep the \n for records paras correct position
if (*(p - 1) == _T(' '))
--p;
if (*(p - 1) != _T('('))
{
*p = ch;
++p;
}
break;
default: // any other case, we can directly copy the char
{
*p = ch;
++p;
}
break;
}
if (p >= realBuffer + maxBufferLen)
{
str.Append(realBuffer, p - realBuffer);
p = realBuffer;
}
MoveToNextChar();
if (level == 0)
break;
}//while (NotEOF())
if (p > realBuffer)
str.Append(realBuffer, p - realBuffer);
TRACE(_T("ReadParentheses(): %s, line=%u"), str.wx_str(), m_LineNumber);
if (str.Len() > 512)
{ TRACE(_T("ReadParentheses(): Catch exception 2: %lu"), static_cast<unsigned long>(str.Len())); }
}
So, I need to follow this way. :)
One issue for macro replacement parsing is that if you have such code:
#define uint8 unsigned char
void f(uint8 aaa);
Then the parser will see this function declaration:
void f(unsigned char aaa);
This is because we try to expand every macro usage.
Does this cause some issue?
Would you this problem affect the autocompletion, calltip and symbol browser?
Sure, yes. because the macro expansion happens in the lower tokenizer level, so that the high level parserthread only see the expanded tokens.
I take some time to do a simple research, I think macro expansion shouldn't be a problem.
Let me put the example code here again.
Here, I think a good code style is not use the macro definition to give a type a new name, the best way here is using typedef.
So, if we have the code:
typedef unsigned char uint8;
void f(uint8 aaa);
Then, the parser still see the "void f(uint8 aaa);"
I see that under MSVC, many types are actually typedef, such as "BOOL", "LPCTSTR".
BTW: I don't see much response about this patches serials :(, so this is a "ping" for further comments, thanks.
void Function1(void)
{
int some_int;
for (int for_int = 0; for_int < 5; ++for_int)
{
// //some_int
//for //for_int
}
}
void Function2(void)
{
int value1 = 0;
int value2 = 0;
// hover on "value1"
if (!value1) {}
// hover on "value1" and "value2"
if (value1 && value2) {}
// Above tooltips will be shown correctly because the statements
// within if(...), do(...), etc are NOT parsed entirely.
// Revision 10230 "fixes" this problem, but now, parsing these statements
// introduces the following regression.
//if (!value1) {} // adds a new token "! value1"
//if (value1 && value2){} // adds a new token "value1&& value2"
}
It looks like if (xxx), where xxx is an expression.
and the Code::Blocks / Tickets / #145 Code completion ignores parameters of catch-clauses (https://sourceforge.net/p/codeblocks/tickets/145/), the test code I use:
class E
{
int a;
};
int main()
{
try
{
}
catch(E exp123)
{
exp|
}
return 0;
}
Where catch(yyy), here yyy is much like function arguments, which is not an expression.
And the void ParserThread::HandleConditionalArguments() is mainly handling the "xxx", not "yyy". :)
* CC: [huki] parser - fix for function pointer parsing with assignment.
I think remove the ';' check is not quite good, you just want to support some patters like:
m_Str AAA (*BBB) (...);
m_Str AAA (*BBB) (...) = some_function;
Why not just check if the peek is either ';' or '='?
I think there are other cases which could not lead to a function pointer, such as C's Type Casting.
m_Str AAA (*typeA) (*typeB)(*typeC)....
Or maybe, there are other cases.
* CC: [huki] parser - fix for function pointer parsing with assignment.
I think remove the ';' check is not quite good, you just want to support some patters like:
m_Str AAA (*BBB) (...);
m_Str AAA (*BBB) (...) = some_function;
Why not just check if the peek is either ';' or '='?
I think there are other cases which could not lead to a function pointer, such as C's Type Casting.
m_Str AAA (*typeA) (*typeB)(*typeC)....
Or maybe, there are other cases.
I agree, checking for both ';' and '=' should be ok. Modified patch:
From f01ada3e356006475787c481316f405ab47359f9 Mon Sep 17 00:00:00 2001
From: huki <gk7huki@gmail.com>
Date: Wed, 8 Apr 2015 08:12:47 +0530
Subject: [PATCH 15/17] CC: [huki] parser - fix for function pointer parsing
with assignment.
---
src/plugins/codecompletion/parser/parserthread.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/plugins/codecompletion/parser/parserthread.cpp b/src/plugins/codecompletion/parser/parserthread.cpp
index 813bdd5..d3e6cdd 100644
--- a/src/plugins/codecompletion/parser/parserthread.cpp
+++ b/src/plugins/codecompletion/parser/parserthread.cpp
@@ -2227,7 +2227,7 @@ void ParserThread::HandleFunction(wxString& name, bool isOperator, bool isPointe
int pos = name.find(ParserConsts::ptr);
// pattern: m_Str AAA (*BBB) (...);
- if (peek == ParserConsts::semicolon && pos != wxNOT_FOUND)
+ if (pos != wxNOT_FOUND && (peek == ParserConsts::semicolon || peek == ParserConsts::equals))
{
name.RemoveLast(); // remove ")"
name.Remove(0, pos+1).Trim(false); // remove "(* "
@@ -2235,7 +2235,7 @@ void ParserThread::HandleFunction(wxString& name, bool isOperator, bool isPointe
// pattern: m_Str AAA (*BBB[X][Y]) (...);
pos = name.find(ParserConsts::oparray_chr);
if (pos != wxNOT_FOUND)
- name.Remove(pos);
+ name.Remove(pos).Trim(true);
TRACE(_T("HandleFunction() : Add token name='")+name+_T("', args='")+args+_T("', return type='") + m_Str+ _T("'"));
Token* newToken = DoAddToken(tkFunction, name, lineNr, 0, 0, args);
--
2.1.4
I added the Trim(true) for safety, in case the name contains a trailing space. But I see you already handled that case in the tokenizer's ReadParentheses (see your commit (https://github.com/gk7huki/codeblocks_sf/commit/98b728b88beb5ddd4e39c0d7a20aaa918733c823)). So feel free to ignore this part.
Then, the only change is to test for both ";" and "=".
...
Then we can add a parameter "bool parseArguments" to HandleConditionalArguments(), which will be false by default for 'if', 'do', 'while' and can be set to true to handle catch expressions. Is that ok with you?
I totally agree on this idea. Thanks.
...
I agree, checking for both ';' and '=' should be ok. Modified patch:
...
I added the Trim(true) for safety, in case the name contains a trailing space. But I see you already handled that case in the tokenizer's ReadParentheses (see your commit (https://github.com/gk7huki/codeblocks_sf/commit/98b728b88beb5ddd4e39c0d7a20aaa918733c823)). So feel free to ignore this part.
Then, the only change is to test for both ";" and "=".
Thanks for the modified patch, I'm OK with this change. This is what I'm going to commit:
From 23346fed9a26273fbde0f9f859cac9da0e31ec2d Mon Sep 17 00:00:00 2001
From: huki <gk7huki@gmail.com>
Date: Wed, 8 Apr 2015 08:12:47 +0530
Subject: * CC: parser - fix for function pointer parsing with assignment.
(Thanks Huki)
Parsing code pattern:
m_Str AAA (*BBB) (...) = some_function;
See discussion here:
http://forums.codeblocks.org/index.php/topic,19769.msg140852.html#msg140852
diff --git a/src/plugins/codecompletion/parser/parserthread.cpp b/src/plugins/codecompletion/parser/parserthread.cpp
index 813bdd5..d86cfee 100644
--- a/src/plugins/codecompletion/parser/parserthread.cpp
+++ b/src/plugins/codecompletion/parser/parserthread.cpp
@@ -2227,15 +2227,17 @@ void ParserThread::HandleFunction(wxString& name, bool isOperator, bool isPointe
int pos = name.find(ParserConsts::ptr);
// pattern: m_Str AAA (*BBB) (...);
- if (peek == ParserConsts::semicolon && pos != wxNOT_FOUND)
+ // pattern: m_Str AAA (*BBB) (...) = some_function;
+ if (pos != wxNOT_FOUND && (peek == ParserConsts::semicolon || peek == ParserConsts::equals))
{
name.RemoveLast(); // remove ")"
name.Remove(0, pos+1).Trim(false); // remove "(* "
// pattern: m_Str AAA (*BBB[X][Y]) (...);
+ // Trim(true) for safety, in case the name contains a trailing space
pos = name.find(ParserConsts::oparray_chr);
if (pos != wxNOT_FOUND)
- name.Remove(pos);
+ name.Remove(pos).Trim(true);
TRACE(_T("HandleFunction() : Add token name='")+name+_T("', args='")+args+_T("', return type='") + m_Str+ _T("'"));
Token* newToken = DoAddToken(tkFunction, name, lineNr, 0, 0, args);
CC: [huki] parser - fixed handling of assignment within for loop.
I'm OK with this commit.
With this patch, we can handle the pattern:
for(int i = 0; ...; ...)...
And correctly recognize "i" as a variable.
I will commit soon. This is an enhancement on the original patch, see revision 8700, and the related patch page: Patch 3345 - Code::Blocks History (http://alpha0010.github.io/cb-history/patches/3345.html)
But I see in the function void ParserThread::HandleForLoopArguments()'s body:
...
if (peek == ParserConsts::comma)
{
smallTokenizer.GetToken(); // eat comma
createNewToken = true;
}
else if (peek == ParserConsts::colon
|| peek == ParserConsts::semicolon
|| peek.empty())
{
createNewToken = true;
finished = true; // after this point there will be no further declarations
}
...
It looks like handling colon is not quite correct. For example:
for ( SomeNamespace::SomeClass i = 0; ...)
In this case, it should add a token "i", which the type "SomeNamespace::SomeClass".
The current method is try to construct a small buffer(smallTokenizer), which contains:
( SomeNamespace::SomeClass i = 0; ...)
and run some parsing method on it.
My idea could be:
1, Make the buffer "( SomeNamespace::SomeClass i = 0; ...)", and construct a new Parserthread object
2, call DoParse() function recursively on the buffer, which is quite similar with how we parse the class definitions
class AAA
{
// body of the class definition, we recursively to call the DoParse() function.
// So that all tokens are marked as the child token of the Token "AAA".
// once the "}" is met, the Doparse() will returned to its high level DoParse().
}
3, if possible, we can let the Token "i" be a child Token of the current Token(ideally the Token "for"), although all the tokens are temporary tokens.
In the main loop of the DoParse(), the "SomeNamespace::SomeClass i = 0;" will be correctly recognized as a Token "i".
This is the way we can handle function parameters, or the catch(xxx) clause, as we have already discussed.