Author Topic: flex is much faster?  (Read 12396 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
flex is much faster?
« on: July 31, 2010, 07:03:24 pm »
I just create a simple lexer by flex. the lex grammar is copied from Codelite's CXXparser.
 in the folder:
http://codelite.svn.sourceforge.net/viewvc/codelite/trunk/CxxParser/



I just do a comparison on parsing a 8M cpp source file.

A:
parsertest project of cc_branch, I just continuously call m_Tokenizer.GetToken() function, then set the state of tokenizer to "SkipNone", in Tokenizer, I comment out all the code about "preprocessor value caculation and macroreplacement", then the result time is:

1100 ms

B: I just create a exe fine from the generated from lex grammar( a little modified to remove the codelite related function call)
see:
expr_lexer.l
and
cpp_lexer.h

Then the lexer of the whole source file (8M) is about 150 ms.

I'm not sure what cause such difference. The former test we used is wxString, but the later flex code is using char*.

if true, we could totally use flex to generate a internal lexer of our Tokenizer class. we can only refactor the Tokenizer class, and let the interface of Tokenizer class keep the same as before.  :D

Any comments??
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #1 on: July 31, 2010, 10:17:10 pm »
Use some profiler to see what is taking most of the time in the C::B's parser/lexer.
Guessing is pointless.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #2 on: August 01, 2010, 05:14:58 am »
Use some profiler to see what is taking most of the time in the C::B's parser/lexer.
Guessing is pointless.

thanks for your hint.I just do a test, both with -pg enabled.

The CB's parser/lexer result:
Quote
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.33      0.08     0.08 19924967     0.00     0.00  Tokenizer::CurrentChar() const
 12.24      0.14     0.06  1590988     0.00     0.00  Tokenizer::GetToken()
  8.16      0.18     0.04  1590988     0.00     0.00  Tokenizer::DoGetToken()
  6.12      0.21     0.03 19966855     0.00     0.00  wxString::GetChar(unsigned int) const
  6.12      0.24     0.03 11617956     0.00     0.00  Tokenizer::IsEOF() const
  6.12      0.27     0.03  8227898     0.00     0.00  Tokenizer::MoveToNextChar(unsigned int)
  6.12      0.30     0.03  3182045     0.00     0.00  wxStringBase::wxStringBase(wxStringBase const&)
  4.08      0.32     0.02 19966855     0.00     0.00  wxStringBase::at(unsigned int) const
  4.08      0.34     0.02  5030899     0.00     0.00  wxStringData::Unlock()
  4.08      0.36     0.02  5030898     0.00     0.00  wxStringBase::~wxStringBase()
  4.08      0.38     0.02  1591019     0.00     0.00  wxStringBase::wxStringBase()
  4.08      0.40     0.02   118328     0.00     0.00  Tokenizer::ReadParentheses(wxString&)
  4.08      0.42     0.02        1    20.00   490.00  ParserThread::DoParse()
  2.04      0.43     0.01  5030898     0.00     0.00  wxString::~wxString()
  2.04      0.44     0.01  3182216     0.00     0.00  wxStringBase::length() const
  2.04      0.45     0.01  3182053     0.00     0.00  wxStringBase::empty() const
  2.04      0.46     0.01  3182045     0.00     0.00  wxString::wxString(wxString const&)
  2.04      0.47     0.01  3108630     0.00     0.00  Tokenizer::SkipComment()
  2.04      0.48     0.01  1591019     0.00     0.00  wxString::wxString()
I'm not sure which value is correct. I just do a time measure around the DoParse function:

Code
        wxStopWatch sw;
        DoParse();
        long t = sw.Time();
        ParserTrace(_T("The long running function took %ldms to execute"),t);
        sw.Pause();

...My DoParse code is just like:

while (m_Tokenizer.NotEOF())
    {
        if (!m_pTokensTree || TestDestroy())
            break;
        m_Tokenizer.GetToken();
        continue;
        ......
   }




The result time is t= 2100ms
But from the profile result:
Quote
 4.08      0.42     0.02        1    20.00   490.00  ParserThread::DoParse()
Which means the time call of Doparse is 490ms???
I suspect the profile result is WRONG!!!


The result of YYlex code:
Quote
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls  ns/call  ns/call  name    
100.00      0.09     0.09  1281519    70.23    70.23  yylex
  0.00      0.09     0.00        3     0.00     0.00  yy_switch_to_buffer
  0.00      0.09     0.00        3     0.00     0.00  yyalloc
  0.00      0.09     0.00        2     0.00     0.00  yy_delete_buffer
  0.00      0.09     0.00        2     0.00     0.00  yy_flush_buffer
  0.00      0.09     0.00        1     0.00     0.00  yy_create_buffer
  0.00      0.09     0.00        1     0.00     0.00  yypop_buffer_state
  0.00      0.09     0.00        1     0.00     0.00  yyrestart
  0.00      0.09     0.00        1     0.00     0.00  yywrap
you can see, the yylex only takes 70.23 ns.

I have just do a time measure around all the yylex call

Code
            CTimer timer;
            while (a = yylex())
            {

                n++;
                if(a==LE_IDENTIFIER)
                    ;//printf("find some id!!= %s\n",yytext);

            }
            printf("n=%d",n);
            int t = timer.GetCurrentTime();
            printf("time = %d",t);

The measure result is:  t = 161ms.

There are some extra keyword matching in the yylex, because it can match all the "keywords" in C++, also, many binary operators were also matched in the yylex grammar, but this was not done in the currently Tokenizer::DoGetToken.

 :D

Any ideas?


« Last Edit: August 01, 2010, 05:17:46 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #3 on: August 01, 2010, 11:24:37 am »
Ollydbg: Keep in mind that gprof measures only user code,
if you have system calls (sleep, file operations) the profile will be wrong.

I hope your test code looks like this:
Code
LoadWholeFile(filename);
DoParse();

If you have file operations inside DoParse() your profile will always be wrong. This a limitation of gprof and won't be fixed (I think).

(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #4 on: August 01, 2010, 12:46:24 pm »
Ollydbg: Keep in mind that gprof measures only user code,
if you have system calls (sleep, file operations) the profile will be wrong.
thanks for the reply. I don't have any system call in the DoParse(), all what I do is operator on a big wxString, which is already loaded before DoParse();

Quote
I hope your test code looks like this:
Code
LoadWholeFile(filename);
DoParse();
If you have file operations inside DoParse() your profile will always be wrong. This a limitation of gprof and won't be fixed (I think).

I'm suspect this centense:
Quote
Each sample counts as 0.01 seconds.
I'm not sure what does this means? a sample rate of 0.01 seconds??
 
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #5 on: August 02, 2010, 07:45:10 am »
I find an even better lexer, it is Lexical Analyzer Generator Quex

I have give it a try and ask the author of quex about the performance, here is his answer:

SourceForge.net: Lexical Analyzer Generator Quex: Topic: Performance question about Quex

See his result compared with flex:

Quote
(*) Results ___________________________________________________________________
Results on "code/linux-2.6.22.17-kernel-dir.c".
(1) lexer-flex:
Compiled with -Os (size optimized) clock_cycles_per_character = {35.004494}, // overhead eliminated
Compiled with -O3 (speed optimized) clock_cycles_per_character = {43.224880}

(2) lexer-quex Compiled with -Os (size optimized) clock_cycles_per_character = {17.738173},
Compiled with -O3 (speed optimized) clock_cycles_per_character = {17.393938}

As you can see, it is also a cache issue--smaller programs perform faster, cause lesser cache misses.
In directory "demo/benchmark/run" you find some helper scripts to run your benchmark.
Note, that the benchmark tries to isolate the cost for lexical analyzis.
An inadvertent use of the std::string class can slow down the performance tremendously.

Nice!!

This also give some hints why our DoParse function in CC runs slower, because it use wxString.  :D
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #6 on: August 02, 2010, 09:50:44 am »
This also give some hints why our DoParse function in CC runs slower, because it use wxString.  :D
The profile just told you this :)

p.s. But keep in mind this: doing optimizations work is worthless if you don't have an automatic regression testing system!
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #7 on: August 02, 2010, 10:20:37 am »
This also give some hints why our DoParse function in CC runs slower, because it use wxString.  :D
The profile just told you this :)
Thanks, but How can you explain the profile result of DoParse() function?? I still can't fully understand the profile result of CC.


Quote
p.s. But keep in mind this: doing optimizations work is worthless if you don't have an automatic regression testing system!
That's correct.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #8 on: August 02, 2010, 10:49:59 am »
Thanks, but How can you explain the profile result of DoParse() function?? I still can't fully understand the profile result of CC.
Looking at the profile we can see many wxString functions.
Do you have the "call graph" (might be name something else) page of the profile?
This page give a bit more info for the flow of the program.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5931
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #9 on: August 03, 2010, 04:18:44 pm »
@obf
thank you very much.
can you help to see the profile result?
I'm sorry I'm not fully understand the result.
See the attachment.
thanks.

BTW: does profile only support single thread? currently, CC use multithread.

change the file to

result.7z.001
result.7z.002
then, unzip them together.

because I can't upload that type of file...

[attachment deleted by admin]

[attachment deleted by admin]
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.