Author Topic: Check macro usage on every identifier like token: issues and discussions  (Read 56376 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
We have already enabling some kind of macro usage check in our parser(see rev9932), but that's not fully.
My idea was try to enable the macro checking on each identifier like tokens, I did some tests, and it won't slow down much of the parsing speed, the result is quite good.
But we still have some issues to solve
1, we have different build target, and different build target have different C Preprocessor definitions, if we still parse the whole C::B project, we can get bad result.
E.g. In Codeblocks.cbp, we have one macro definition in one target
Code
#define type(obj) ((obj)._type)
or even bad
Code
#define type xxx
And we have some other target which can have some code like:
Code
        GdbCmd_SetCatch(DebuggerDriver *driver, const wxString &type, int *resultIndex)
In this case, the "type" was replaced to un-wanted tokens.

2, we have some code like:
Code
// Assumes the buffer bitmap only covers the client area;
// does not prepare the window DC
#define wxBUFFER_CLIENT_AREA        0x02

class WXDLLEXPORT wxBufferedDC : public wxMemoryDC
{
public:
    // Default ctor, must subsequently call Init for two stage construction.
    wxBufferedDC()
        : m_dc(NULL),
          m_buffer(NULL),
          m_style(0)
    {
    }

    // Construct a wxBufferedDC using a user supplied buffer.
    wxBufferedDC(wxDC *dc,
                 wxBitmap& buffer = wxNullBitmap,
                 int style = wxBUFFER_CLIENT_AREA)
        : m_dc(NULL), m_buffer(NULL)
    {
        Init(dc, buffer, style);
    }
When parsing the function arguments(parameters), if macro expansion is enabled on every tokens, we will get
Code
(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = 0x02)
instead of
Code
(wxDC *dc, wxBitmap& buffer = wxNullBitmap, int style = wxBUFFER_CLIENT_ARE)
What would you like to see? The former or the later?

3, In the further, I would like to see that all the macro handling was put in the Tokenizer class. (Currently, we have macro handling both in Parserthread and Tokenizer class).

4, Too speed up the macro definition search, I think a hash table (e.g. wxWidgets: wxHashMap Class Reference) is preferred. (Currently, the macro definition is stored in the TokenTree, which is actually a Trie - Wikipedia, the free encyclopedia)




EDIT: Even in the same build target, we can still have different C preprocessor definitions for different translation file, so this seems not easy to solve. One method is that we try to maintain macro definition for each file, or create a include file hierarchy.

EDIT2:
Patches (git patches serial) attached, comments were welcome.

EDIT3:
About issue 1, I think we have discussed here, see Huki's comments Re: Several improvements to Code Completion plugin and Re: Several improvements to Code Completion plugin
« Last Edit: November 10, 2014, 03:29:44 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Huki

  • Multiple posting newcomer
  • *
  • Posts: 95
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome  :)
I've attached some bug fix patches against your last cc_macro serials. Some description:

Fixed = [] , handling when parsing variable declaration. i.e.,
Code: assignment parsing
int var1 = 0; // this var has an assignment
int var2 = 0; // mouse on var2 -> we get "intint var2"
int var3 = 0; // mouse on var3 -> we get "intintint var3"
Code: comma parsing
int num=0, num2=0, num3=0;
// mouse on each variable and we get:
// num -> "int num"
// num2 -> "int, num2"
// num3 -> "int,, num3"
Code: arrays parsing
int array[5]; // vars with [] are not parsed

The rest of the patches should already be clear.
« Last Edit: April 09, 2015, 07:16:23 pm by Huki »

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Hi, Huki, I attach the git patch serials against the latest trunk rev 10182.
Comments are welcome  :)
I've attached some bug fix patches against your last cc_macro serials. Some description:
...
Very nice work, Huki, thanks. I have them applied in my local copy, it fixes some errors of my patch serials.
BTW: I see a trailing spaces in a patch when I run the "git am" command
Code
Applying: CC: fix variable parsing with "=" or "[]"
f:/cb_sf_git/trunk/.git/rebase-apply/patch:52: trailing whitespace.
                         || peek==ParserConsts::oparray_chr
warning: 1 line adds whitespace errors.
Do you have the "remove trailing space" option enabled in C::B?

Any suggestion is that the commit message should have a "*" if it is a major function change or "-" if it is a minor change.
So, it should be:
Code
* CC: xxxxxxx

detailed descriptions
You can wrote as many detailed text as possible in the commit log messages, E.g. see my commit r10157 as an example. I just follow Linux's commit message style, which means a simple line as a subject, and after an empty line, write details, like this: Git Commit Messages : 50/72 Formatting.

EDIT: I have fixed all the mentioned issue above in my local copy, so you don't need to post the patches again. :)
« Last Edit: April 11, 2015, 04:47:17 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9701
Well guys it would be nice to follow but the patches don't apply in svn. Any chance you make svn compatible patches or a svn branch or work together on a git repo? The repo that I was referring to is terribly old... Or am I missing something?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Well guys it would be nice to follow but the patches don't apply in svn. Any chance you make svn compatible patches or a svn branch or work together on a git repo? The repo that I was referring to is terribly old... Or am I missing something?
Hi, Morten, I attach the svn style patch which is created from this script.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9701
Hi, Morten, I attach the svn style patch which is created from this script.
That is nice, but why don't you (two) just work on a GIT repo? I could check this out via SVN and create patches myself. However, it should need to stay in sync with trunk though - so a bit more work is needed. On the pro side, Huki can commit directly and you can sync yourself with him easily.

It thought thats what the branch in the "OBF repo" was for...?! ???
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Hi, Morten, this is a reply after two months... How time flies.  :(
I think not much people are testing my patch serials. I'm considering what is the time to let those patches into our SVN repo. I don't have a clean plan, maybe, we can let them in, and we may have more testers through nightly build versions.  ;) What's your or other guy's idea?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13406
    • Travis build status
If you add some tests to the cctest project I am totally fine if you push these in trunk.
Then you can fix bugs and add more tests. So you won't re-introduce old problems by fixing the new ones.

If you're adding tests it should be relatively safe. Also you should be around to be able to fix if something comes up.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
If you add some tests to the cctest project I am totally fine if you push these in trunk.
Then you can fix bugs and add more tests. So you won't re-introduce old problems by fixing the new ones.

If you're adding tests it should be relatively safe. Also you should be around to be able to fix if something comes up.
OK, thanks for the suggestion, I will take two steps.
1, add more tests these days.
2, commit my patch serials only if they don't cause any regression.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Currently, I see only one regression (as of revision 10341, before my patch and after applying my patch) is here:
Code
--- C:/before.txt	Sat Jun 20 14:14:40 2015
+++ C:/after.txt Sat Jun 20 14:09:01 2015
@@ -56,14 +56,14 @@
 Total 9 tests, 9 PASS, 0 FAIL
 --------------------------------------------------------
 ********************************************************
   Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr.cpp
 ********************************************************
--PASS: Fun  FuncArray
+*FAIL: Fun  FuncArray
 -PASS: foo  foo
 --------------------------------------------------------
-Total 2 tests, 2 PASS, 0 FAIL
+Total 2 tests, 1 PASS, 1 FAIL
 --------------------------------------------------------
 ********************************************************
   Testing in file: F:\cb_sf_git\trunk\src\plugins\codecompletion\testing\cc_function_ptr_com_interface.cpp
 ********************************************************
 -PASS: factory->  QueryInterface

Simplify the test case, I get such test file: ccc_bug.cpp
Code

int (*FuncArray[10][20]) ();

//Fun  //FuncArray

I will look into it.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
OK, I found the cause of this regression, I get the parenthesis is L"(* FuncArray [ 10 ] [ 20 ])", and if without my patch, it don't have a space after the "FuncArray", this is because in my patch serials, I have change the function of reading the parenthesis. (The old code is hard to understand for me, and using raw wxChar pointers).

Then in the function:
Code
        // pattern: m_Str AAA (*BBB) (...);
        if (peek == ParserConsts::semicolon && pos != wxNOT_FOUND)
        {
            name.RemoveLast();  // remove ")"
            name.Remove(0, pos+1).Trim(false); // remove "(* "

            // pattern: m_Str AAA (*BBB[X][Y]) (...);
            pos = name.find(ParserConsts::oparray_chr);
            if (pos != wxNOT_FOUND)
                name.Remove(pos);

            TRACE(_T("HandleFunction() : Add token name='")+name+_T("', args='")+args+_T("', return type='") + m_Str+ _T("'"));
            Token* newToken =  DoAddToken(tkFunction, name, lineNr, 0, 0, args);
We pass the name as "FuncArray " (note a space after "FuncArray".

Now, I think I need to return a more compact form like "(*FuncArray[10][20])".

EDIT: the old code for reading parenthesis dose not consider the macro expansion in its content, it just find the "(" and ")" pair in the raw wxString, while in my patch serials, I do macro expansion on every token if possible.
« Last Edit: June 20, 2015, 09:23:41 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Now, here comes the function I need to implement, it looks like "space" are added only in some very special cases.

Code
void Tokenizer::ReadParentheses(wxString& str)
{
    // we create a local buffer here, so the data is copied from m_Buffer to buffer
    // once the local buffer is full, we append the content in local buffer to the result str
    static const size_t maxBufferLen = 4093;
    // local buffer
    wxChar buffer[maxBufferLen + 3];
    buffer[0] = _T('$'); // avoid segfault error, because we have some *(p - 1) = x in the code
    wxChar* realBuffer = buffer + 1;
    // p points to the local buffer
    wxChar* p = realBuffer;

    int level = 0; // brace level of '(' and ')'

    while (NotEOF())
    {
        while (SkipComment())
            ;
        wxChar ch = CurrentChar();

        while (ch == _T('#')) // do not use if
        {
            const PreprocessorType type = GetPreprocessorType();
            if (type == ptOthers)
                break;
            HandleConditionPreprocessor(type);
            ch = CurrentChar();
        }

        const unsigned int startIndex = m_TokenIndex;

        switch(ch)
        {
        case _T('('):
            {
                ++level;
                *p = ch;
                ++p;
            }
            break;

        case _T(')'):
            {
                if (*(p - 1) <= _T(' '))
                    --p; // if previous char is a space, we overwrite the space with a ')', so we save one char
                --level;
                *p = ch;
                ++p;
            }
            break;

        case _T('\''):
        case _T('"'):
            {
                MoveToNextChar();     // skip the leading '"' or '\''
                SkipToStringEnd(ch);  // m_TokenIndex point to the trailing '"' or '\'' now
                MoveToNextChar();     // move to the char after trailing char
                const size_t writeLen = m_TokenIndex - startIndex; // the string length
                const size_t usedLen = p - realBuffer;
                if (usedLen + writeLen > maxBufferLen) // there is not enough space left to write the string
                {
                    if (writeLen > maxBufferLen) // to big string?
                    {
                        TRACE(_T("ReadParentheses(): Catch exception 1: %lu"), static_cast<unsigned long>(writeLen));
                        return;
                    }

                    if (p != realBuffer) // add the used string to the result str
                    {
                        str.Append(realBuffer, usedLen);
                        p = realBuffer;  // adjust the p, as it was an empty buffer
                    }
                    // append the string to str (not write the string to the buffer)
                    str.Append((const wxChar*)m_Buffer + startIndex, writeLen);
                }
                else
                {   // the buffer can hold the string, so copy it to the buffer
                    memcpy(p, (const wxChar*)m_Buffer + startIndex, writeLen * sizeof(wxChar));
                    p += writeLen;
                }

                continue;
            }
            break;

        case _T(','): // if there is a space before the comma, we just remove it, but we should add a space after the comma
        case _T('*'):
        case _T('&'):
            {
                if (*(p - 1) <= _T(' '))
                    --p;

                *p = ch;
                *++p = _T(' ');
                ++p;
            }
            break;

        case _T('='):
            {
                if (*(p - 1) <= _T(' '))
                {
                    *p = _T('=');
                    // Don't add a space after '=' sign, in case another '=' follows it
                    // (see how the 'else' block below works).
                    //*++p = _T(' ');
                    ++p;
                }
                else // special handling of "==" "=!" "=>" and "=<"
                {
                    switch (*(p - 1))
                    {
                    case _T('='):
                    case _T('!'):
                    case _T('>'):
                    case _T('<'):
                        {
                            *p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;

                    default:
                        {
                            *p = _T(' ');
                            *++p = _T('=');
                            *++p = _T(' ');
                            ++p;
                        }
                        break;
                    }
                }
            }
            break;

        case _T(' '): // we only add a space if the former char is not a space, also not a '('
        case _T('\r'): // the tab char are regard as space
        case _T('\t'):
            {
                if (*(p - 1) != _T(' ') && *(p - 1) != _T('('))
                {
                    *p = _T(' ');
                    ++p;
                }
            }
            break;

        case _T('\n'): // we need keep the \n for records paras correct position
            if (*(p - 1) == _T(' '))
                --p;
            if (*(p - 1) != _T('('))
            {
                *p = ch;
                ++p;
            }
            break;

        default: // any other case, we can directly copy the char
            {
                *p = ch;
                ++p;
            }
            break;
        }

        if (p >= realBuffer + maxBufferLen)
        {
            str.Append(realBuffer, p - realBuffer);
            p = realBuffer;
        }

        MoveToNextChar();

        if (level == 0)
            break;
    }//while (NotEOF())

    if (p > realBuffer)
        str.Append(realBuffer, p - realBuffer);
    TRACE(_T("ReadParentheses(): %s, line=%u"), str.wx_str(), m_LineNumber);
    if (str.Len() > 512)
    {   TRACE(_T("ReadParentheses(): Catch exception 2: %lu"), static_cast<unsigned long>(str.Len())); }
}


So, I need to follow this way. :)
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
OK, the only regression is fixed now.
I have attach the full patches (include git style patch serials, or a single svn patch)
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6038
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
One issue for macro replacement parsing is that if you have such code:

Code
#define uint8 unsigned char
void f(uint8 aaa);

Then the parser will see this function declaration:
Code
void f(unsigned char aaa);

This is because we try to expand every macro usage.

Does this cause some issue?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13406
    • Travis build status
Would you this problem affect the autocompletion, calltip and symbol browser?
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]