Author Topic: flex is much faster?  (Read 12128 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
flex is much faster?
« on: July 31, 2010, 07:03:24 pm »
I just created a simple lexer with flex. The lex grammar is copied from CodeLite's CxxParser, in this folder:
http://codelite.svn.sourceforge.net/viewvc/codelite/trunk/CxxParser/



I just did a comparison, parsing an 8 MB cpp source file.

A:
The parsertest project of cc_branch: I continuously call the m_Tokenizer.GetToken() function with the tokenizer state set to "SkipNone". In Tokenizer, I commented out all the code for preprocessor value calculation and macro replacement. The resulting time is:

1100 ms

B: I created an exe from the code generated from the lex grammar (slightly modified to remove the CodeLite-related function calls); see:
expr_lexer.l
and
cpp_lexer.h

Lexing the whole source file (8 MB) then takes about 150 ms.

I'm not sure what causes such a difference. The former test uses wxString, but the flex code uses char*.
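The gap between the two access styles can be sketched with a toy comparison (std::string standing in for wxString here; neither function is C::B code): a per-character accessor call versus a raw pointer walk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Per-character accessor calls, the pattern the current Tokenizer uses
// (CurrentChar() -> wxString::GetChar() -> at() on every character):
inline std::size_t count_semicolons_checked(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.length(); ++i)
        if (s.at(i) == ';')      // bounds-checked call per character
            ++n;
    return n;
}

// Raw pointer walk, the pattern flex-generated scanners use:
inline std::size_t count_semicolons_raw(const std::string& s) {
    std::size_t n = 0;
    for (const char* p = s.c_str(); *p; ++p)
        if (*p == ';')           // direct dereference, no call overhead
            ++n;
    return n;
}
```

Both compute the same answer; over millions of characters the first version pays a function call (and, with wxString, possible reference-counting work) per character.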

If this holds up, we could use flex to generate an internal lexer for our Tokenizer class. We would only refactor the Tokenizer internals and keep its interface the same as before.  :D
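A rough sketch of what such a refactor could look like (all names invented; scan_one() is a hand-rolled stand-in for the flex-generated yylex(), which would fill that role in a real refactor):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Keep the public interface the parser already depends on (GetToken/NotEOF),
// but drive it from a char*-based scanner core instead of wxString indexing.
class FlexTokenizer {
public:
    explicit FlexTokenizer(const char* buffer) : m_p(buffer) {}

    bool NotEOF() const { return *m_p != '\0'; }

    // Same signature the callers (e.g. ParserThread::DoParse) already use.
    std::string GetToken() { return scan_one(); }

private:
    std::string scan_one() {
        while (*m_p && std::isspace((unsigned char)*m_p))
            ++m_p;                                   // skip whitespace
        const char* start = m_p;
        if (std::isalpha((unsigned char)*m_p) || *m_p == '_') {
            while (std::isalnum((unsigned char)*m_p) || *m_p == '_')
                ++m_p;                               // identifier/keyword
        } else if (*m_p) {
            ++m_p;                                   // single-char token
        }
        return std::string(start, m_p);
    }

    const char* m_p;   // raw pointer: no per-character wxString calls
};
```

Callers would keep calling GetToken() exactly as before; only the internals change.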

Any comments??
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #1 on: July 31, 2010, 10:17:10 pm »
Use some profiler to see what is taking most of the time in C::B's parser/lexer.
Guessing is pointless.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #2 on: August 01, 2010, 05:14:58 am »
Use some profiler to see what is taking most of the time in C::B's parser/lexer.
Guessing is pointless.

Thanks for your hint. I just did a test of both, built with -pg enabled.

The C::B parser/lexer result:
Quote
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.33      0.08     0.08 19924967     0.00     0.00  Tokenizer::CurrentChar() const
 12.24      0.14     0.06  1590988     0.00     0.00  Tokenizer::GetToken()
  8.16      0.18     0.04  1590988     0.00     0.00  Tokenizer::DoGetToken()
  6.12      0.21     0.03 19966855     0.00     0.00  wxString::GetChar(unsigned int) const
  6.12      0.24     0.03 11617956     0.00     0.00  Tokenizer::IsEOF() const
  6.12      0.27     0.03  8227898     0.00     0.00  Tokenizer::MoveToNextChar(unsigned int)
  6.12      0.30     0.03  3182045     0.00     0.00  wxStringBase::wxStringBase(wxStringBase const&)
  4.08      0.32     0.02 19966855     0.00     0.00  wxStringBase::at(unsigned int) const
  4.08      0.34     0.02  5030899     0.00     0.00  wxStringData::Unlock()
  4.08      0.36     0.02  5030898     0.00     0.00  wxStringBase::~wxStringBase()
  4.08      0.38     0.02  1591019     0.00     0.00  wxStringBase::wxStringBase()
  4.08      0.40     0.02   118328     0.00     0.00  Tokenizer::ReadParentheses(wxString&)
  4.08      0.42     0.02        1    20.00   490.00  ParserThread::DoParse()
  2.04      0.43     0.01  5030898     0.00     0.00  wxString::~wxString()
  2.04      0.44     0.01  3182216     0.00     0.00  wxStringBase::length() const
  2.04      0.45     0.01  3182053     0.00     0.00  wxStringBase::empty() const
  2.04      0.46     0.01  3182045     0.00     0.00  wxString::wxString(wxString const&)
  2.04      0.47     0.01  3108630     0.00     0.00  Tokenizer::SkipComment()
  2.04      0.48     0.01  1591019     0.00     0.00  wxString::wxString()
I'm not sure which value is correct. I did a time measurement around the DoParse function:

Code
        wxStopWatch sw;
        DoParse();
        long t = sw.Time();
        ParserTrace(_T("The long running function took %ldms to execute"),t);
        sw.Pause();

My DoParse code is basically:

Code
while (m_Tokenizer.NotEOF())
{
    if (!m_pTokensTree || TestDestroy())
        break;
    m_Tokenizer.GetToken();
    continue;  // everything below is skipped, so only tokenization is timed
    ......
}




The resulting time is t = 2100 ms.
But from the profile result:
Quote
 4.08      0.42     0.02        1    20.00   490.00  ParserThread::DoParse()
this says the total time in DoParse is 490 ms???
I suspect the profile result is WRONG!!!


The result for the yylex code:
Quote
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls  ns/call  ns/call  name    
100.00      0.09     0.09  1281519    70.23    70.23  yylex
  0.00      0.09     0.00        3     0.00     0.00  yy_switch_to_buffer
  0.00      0.09     0.00        3     0.00     0.00  yyalloc
  0.00      0.09     0.00        2     0.00     0.00  yy_delete_buffer
  0.00      0.09     0.00        2     0.00     0.00  yy_flush_buffer
  0.00      0.09     0.00        1     0.00     0.00  yy_create_buffer
  0.00      0.09     0.00        1     0.00     0.00  yypop_buffer_state
  0.00      0.09     0.00        1     0.00     0.00  yyrestart
  0.00      0.09     0.00        1     0.00     0.00  yywrap
You can see that yylex takes only 70.23 ns per call.
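Multiplying that per-call figure by the call count from the profile recovers the 0.09 s self-seconds column, so the flat profile is at least self-consistent:

```cpp
#include <cassert>
#include <cmath>

// 70.23 ns is the average cost of a single yylex() call, not the whole run.
// Total time = calls * ns_per_call, converted from nanoseconds to seconds.
inline double total_seconds(double calls, double ns_per_call) {
    return calls * ns_per_call * 1e-9;
}
// total_seconds(1281519, 70.23) -> about 0.090 s, matching the profile
```

That ~90 ms also squares roughly with the 161 ms wall-clock measurement below, once loop overhead and unsampled time are accounted for.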

I also measured the time around all the yylex calls:

Code
CTimer timer;
while ((a = yylex()))
{
    n++;
    if (a == LE_IDENTIFIER)
        ; //printf("find some id!! = %s\n", yytext);
}
printf("n=%d", n);
int t = timer.GetCurrentTime();
printf("time = %d", t);

The measured result is: t = 161 ms.

There is some extra work in yylex: it matches all the C++ keywords, and many binary operators are matched in the lex grammar as well, neither of which the current Tokenizer::DoGetToken() does.

 :D

Any ideas?


« Last Edit: August 01, 2010, 05:17:46 am by ollydbg »

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #3 on: August 01, 2010, 11:24:37 am »
Ollydbg: Keep in mind that gprof measures only user code;
if you have system calls (sleep, file operations) the profile will be wrong.

I hope your test code looks like this:
Code
LoadWholeFile(filename);
DoParse();

If you have file operations inside DoParse(), your profile will always be wrong. This is a limitation of gprof and won't be fixed (I think).


Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #4 on: August 01, 2010, 12:46:24 pm »
Ollydbg: Keep in mind that gprof measures only user code;
if you have system calls (sleep, file operations) the profile will be wrong.
Thanks for the reply. I don't have any system calls in DoParse(); all it does is operate on a big wxString, which is already loaded before DoParse() runs.

Quote
I hope your test code looks like this:
Code
LoadWholeFile(filename);
DoParse();
If you have file operations inside DoParse(), your profile will always be wrong. This is a limitation of gprof and won't be fixed (I think).

I'm suspicious of this sentence:
Quote
Each sample counts as 0.01 seconds.
I'm not sure what it means. A sampling period of 0.01 seconds?
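For what it's worth, that line is gprof describing its sampling period: it samples the program counter every 10 ms (100 Hz), and every "seconds" column is just sample_count × 0.01. A quick sketch of how few samples sit behind the numbers (0.08 and 0.49 are the CurrentChar and DoParse figures from the flat profile in Reply #2):

```cpp
#include <cassert>
#include <cmath>

// gprof's "Each sample counts as 0.01 seconds" means 100 Hz PC sampling;
// each reported "seconds" value is sample_count * 0.01.
inline int samples_for(double seconds, double period = 0.01) {
    return (int)std::lround(seconds / period);
}
// samples_for(0.08) == 8  -> only 8 samples behind CurrentChar's self time
// samples_for(0.49) == 49 -> 49 samples behind DoParse's 490 ms total
```

With so few samples, per-call times round to 0.00 and totals can disagree noticeably with wall-clock measurements.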
 

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #5 on: August 02, 2010, 07:45:10 am »
I found an even better lexer generator: Quex, the Lexical Analyzer Generator.

I gave it a try and asked the author of Quex about its performance; here is his answer:

SourceForge.net: Lexical Analyzer Generator Quex: Topic: Performance question about Quex

See his results compared with flex:

Quote
(*) Results ___________________________________________________________________
Results on "code/linux-2.6.22.17-kernel-dir.c".
(1) lexer-flex:
Compiled with -Os (size optimized) clock_cycles_per_character = {35.004494}, // overhead eliminated
Compiled with -O3 (speed optimized) clock_cycles_per_character = {43.224880}

(2) lexer-quex Compiled with -Os (size optimized) clock_cycles_per_character = {17.738173},
Compiled with -O3 (speed optimized) clock_cycles_per_character = {17.393938}

As you can see, it is also a cache issue--smaller programs perform faster, cause lesser cache misses.
In directory "demo/benchmark/run" you find some helper scripts to run your benchmark.
Note, that the benchmark tries to isolate the cost for lexical analyzis.
An inadvertent use of the std::string class can slow down the performance tremendously.

Nice!!

This also gives a hint as to why our DoParse function in CC runs slower: it uses wxString.  :D

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #6 on: August 02, 2010, 09:50:44 am »
This also gives a hint as to why our DoParse function in CC runs slower: it uses wxString.  :D
The profile just told you this :)

P.S. But keep this in mind: doing optimization work is worthless if you don't have an automatic regression testing system!

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #7 on: August 02, 2010, 10:20:37 am »
This also gives a hint as to why our DoParse function in CC runs slower: it uses wxString.  :D
The profile just told you this :)
Thanks, but how do you explain the profile result for the DoParse() function? I still can't fully understand CC's profile output.


Quote
P.S. But keep this in mind: doing optimization work is worthless if you don't have an automatic regression testing system!
That's correct.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: flex is much faster?
« Reply #8 on: August 02, 2010, 10:49:59 am »
Thanks, but how do you explain the profile result for the DoParse() function? I still can't fully understand CC's profile output.
Looking at the profile, we can see many wxString functions.
Do you have the "call graph" page (it might be named something else) of the profile?
That page gives a bit more info about the program's flow.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5914
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: flex is much faster?
« Reply #9 on: August 03, 2010, 04:18:44 pm »
@obf
Thank you very much. Can you help me interpret the profile result? I'm sorry, I don't fully understand it. See the attachment.
Thanks.

BTW: does the profiler only support a single thread? Currently, CC uses multiple threads.

Rename the files to

result.7z.001
result.7z.002

then unzip them together, because I can't upload that file type...

[attachment deleted by admin]

[attachment deleted by admin]