Author Topic: Macro replacement in CC(tokenizer) suggestion  (Read 11938 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5916
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Macro replacement in CC(tokenizer) suggestion
« on: June 29, 2009, 04:10:29 am »
Hi, In the current architecture, the function ThisOrReplacement(m_Token) was only called in Tokenizer::GetToken()

Code
wxString Tokenizer::GetToken()
{
    m_UndoTokenIndex = m_TokenIndex;
    m_UndoLineNumber = m_LineNumber;
    m_UndoNestLevel  = m_NestLevel;

    if(m_PeekAvailable)
    {
        m_TokenIndex = m_PeekTokenIndex;
        m_LineNumber = m_PeekLineNumber;
        m_NestLevel  = m_PeekNestLevel;
        m_Token   = m_PeekToken;
    }
    else
        m_Token = DoGetToken();

    m_PeekAvailable = false;

    return ThisOrReplacement(m_Token);
}


To accelerate this "Macro replacement", I think it should be moved to DoGetToken.

Here are the reasons:

1, ThisOrReplacement(m_Token) internally use a wxString--> wxString map container, so, it will use a search algorithm in this map(normally this will cause a search on a balanced BST), this will take a lot of time.

2, we can avoid many situations to call this function, for example, when m_Token is '{'  or wxEmptyString or many other string that shouldn't need macro expansion.

Any comments?

Thanks

 

If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5916
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #1 on: June 29, 2009, 04:33:14 am »
Also, I suggest that when the Tokenizer return a Token( wxString ), it should also combined with a "type", which means the parser can use this type information to do Syntax Analysis.

If I can remember, Ceniza call this a "Smart lexer"  :D
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #2 on: June 29, 2009, 09:40:37 am »
internally use a wxString--> wxString map container, so, it will use a search algorithm in this map(normally this will cause a search on a balanced BST), this will take a lot of time.

2, we can avoid many situations to call this function, for example, when m_Token is '{'  or wxEmptyString or many other string that shouldn't need macro expansion.
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5916
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #3 on: June 29, 2009, 09:57:35 am »
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
Not fully understand you comments :(

I mean if we return a wxString from DoGetToken(),  (for example '{') as we know '{' certainly don't need to do macro replacement, so ,we can avoid calling ThisOrReplacement('{');.

Also, there are many wxString like '{' :D



If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #4 on: June 30, 2009, 12:23:17 pm »
I'd be careful with such an optimisation, since a map rarely needs to do more than 4-5 lookups in total, so if you add too many special cases, the resulting code will be slower (and at the same time more difficult to maintain).
Not fully understand you comments :(
I understood that your idea is to catch special cases which cannot possibly be macros, so they need not looked up in the map<wxString,wxString>.

In other words, replace code that looks like:
return the_map.find(token);

with something like:
if(token.IsEmpty() || (token == one_constant) || (token == two_constant) || (token == three_constant))
    return token;
else
    return the_map.find(token);



My point is that maps have O(log(n)) lookup, so unless a source has 20 billion preprocessor defines, it is really nothing to worry about. For "normal" amounts, log(n) will be something like 4, maybe 5. Let's assume the worst case of 5. One "operation" is a compare and a branch.

Adding a line like the above will remove 5 operations done by the map lookup in the best case, at a cost of 1-4 additional operations (average 2). So we save 5-1 = 4 operations in the best, and 5-2 = 3 operations in the average case.
In the worst case, it will add 4 operations to the existing 5, almost doubling the work.
This scenario might still be advantageous, but it's not likely that it will be a big win.

Now, you were talking of "many" cases. Let's say "many" means 10. In this case, we will do 1-10 operations (5 average) to eliminate the 5 lookups done by the map.
So, on the average, we replace 5 operations with 5 operations (zero win, but more complicated code), and in the worst case, we add 10 operations, tripling the amount of work done.
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #5 on: June 30, 2009, 01:37:44 pm »
Another option is to replace std::map with std::tr1::unordered_map, it if is available (gcc > 4.x.y, not sure which is the minimal x thought)

std::tr1::unordered_map is hash map, so the lookup is O(1) operation.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5916
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #6 on: June 30, 2009, 05:13:34 pm »

with something like:
if(token.IsEmpty() || (token == one_constant) || (token == two_constant) || (token == three_constant))
    return token;
else
    return the_map.find(token);



Hi, thomas, Thanks for the full explanation about your point.

But I think you still misunderstand my mind.  :(

Look at the code in DoGetToken(), I have made a pseudo code below. We can add a bool variable

Code

bool NeedReplacement;

if (c == '_' || wxIsalpha(c))
    {         
         ..........
          NeedReplacement = true;
    }
    else if (wxIsdigit(CurrentChar()))
    {
          .........
          NeedReplacement = false;
    }
    else if (CurrentChar() == '"' || CurrentChar() == '\'')
    {
         ......
          NeedReplacement = false;
    }
    else if (CurrentChar() == '(')
    {
         ......
         NeedReplacement = false;
    }
    else
    {
            .......
    }

if(!NeedReplacement)
    return token;
else
    return the_map.find(token);

Because we have already know the information of the current token, we can still save quite a lot of time.  :D

If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5916
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #7 on: June 30, 2009, 05:15:45 pm »
Another option is to replace std::map with std::tr1::unordered_map, it if is available (gcc > 4.x.y, not sure which is the minimal x thought)

std::tr1::unordered_map is hash map, so the lookup is O(1) operation.
This is the great news.
If it is really true that hash map runs faster, I do suggest change to use std::tr1::unordered_map, thanks for the suggestion!!!
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Kazade

  • Multiple posting newcomer
  • *
  • Posts: 73
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #8 on: June 30, 2009, 05:56:14 pm »
Another option is to replace std::map with std::tr1::unordered_map, it if is available (gcc > 4.x.y, not sure which is the minimal x thought)

std::tr1::unordered_map is hash map, so the lookup is O(1) operation.

I personally have been using std::tr1::unordered_map for everything for ages (at least a year), I definitely think if it's available it should be used. Apparently GCC has a __GXX_EXPERIMENTAL_CXX0X__ macro. eg:

Code
#ifdef __GXX_EXPERIMENTAL_CXX0X__
    typedef std::tr1::unordered_map CBMap;
#else
    typedef std::map CBMap;
#endif

CBMap<wxString, wxString> myMap;

or something like that.


Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Macro replacement in CC(tokenizer) suggestion
« Reply #9 on: June 30, 2009, 07:10:47 pm »
Kazade: unordered containers do not require the experimental c++-0x, they are in TR1

(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]