flex is much faster?

ollydbg:
I just created a simple lexer with flex. The lex grammar is copied from CodeLite's CxxParser,
in the folder:
http://codelite.svn.sourceforge.net/viewvc/codelite/trunk/CxxParser/

I just did a comparison by lexing an 8 MB cpp source file.

A:
The parsertest project of cc_branch. I just call m_Tokenizer.GetToken() continuously, with the tokenizer state set to "SkipNone". In Tokenizer, I commented out all the code for "preprocessor value calculation and macro replacement". The resulting time is:

1100 ms

B: I built an exe file from the code generated from the lex grammar (slightly modified to remove the CodeLite-related function calls),
see:
expr_lexer.l
and
cpp_lexer.h

Then lexing the whole source file (8 MB) takes about 150 ms.

I'm not sure what causes such a difference. The former test uses wxString, but the latter flex code uses char*.

If this is true, we could use flex to generate an internal lexer for our Tokenizer class. We would only need to refactor the Tokenizer class and keep its interface the same as before. :D
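To make the "keep the interface, swap the internals" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: RawLexer is a hypothetical hand-written stand-in for a flex-generated yylex(), and the Tokenizer signatures are simplified, not the real Code::Blocks ones. It only shows that callers can keep using NotEOF()/GetToken() while the scanning runs over a plain char buffer.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for a flex-generated scanner: walks a raw char buffer
// and produces one token per call. A real build would call yylex() instead.
class RawLexer {
public:
    RawLexer(const char* begin, const char* end) : cur_(begin), end_(end) {}

    // Returns false at end of input; otherwise stores the next token in `out`.
    bool Next(std::string& out) {
        while (cur_ != end_ && std::isspace(static_cast<unsigned char>(*cur_)))
            ++cur_;
        if (cur_ == end_)
            return false;
        const char* start = cur_;
        if (IsWordChar(*cur_)) {
            while (cur_ != end_ && IsWordChar(*cur_))
                ++cur_;           // identifier, keyword or number
        } else {
            ++cur_;               // any other character is a one-char token
        }
        out.assign(start, cur_);
        return true;
    }

private:
    static bool IsWordChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }
    const char* cur_;
    const char* end_;
};

// Facade that keeps the NotEOF()/GetToken() calls used in this thread.
class Tokenizer {
public:
    explicit Tokenizer(const std::string& buffer)
        : buffer_(buffer),
          lexer_(buffer_.data(), buffer_.data() + buffer_.size()) {
        Advance();
    }
    // Copying would leave the lexer pointing into another object's buffer.
    Tokenizer(const Tokenizer&) = delete;
    Tokenizer& operator=(const Tokenizer&) = delete;

    bool NotEOF() const { return hasToken_; }

    // Returns the current token and pre-fetches the next one.
    std::string GetToken() {
        std::string result = lookahead_;
        Advance();
        return result;
    }

private:
    void Advance() { hasToken_ = lexer_.Next(lookahead_); }

    std::string buffer_;     // owns the text so the raw pointers stay valid
    RawLexer lexer_;
    std::string lookahead_;
    bool hasToken_ = false;
};
```

A caller loop such as `while (tok.NotEOF()) tok.GetToken();` would stay unchanged; only the class internals differ.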

Any comments??

oBFusCATed:
Use some profiler to see what is taking most of the time in the C::B's parser/lexer.
Guessing is pointless.

ollydbg:

--- Quote from: oBFusCATed on July 31, 2010, 10:17:10 pm ---Use some profiler to see what is taking most of the time in the C::B's parser/lexer.
Guessing is pointless.

--- End quote ---

Thanks for your hint. I just did a test, both with -pg enabled.

The CB's parser/lexer result:

--- Quote ---Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 16.33      0.08     0.08 19924967     0.00     0.00  Tokenizer::CurrentChar() const
 12.24      0.14     0.06  1590988     0.00     0.00  Tokenizer::GetToken()
  8.16      0.18     0.04  1590988     0.00     0.00  Tokenizer::DoGetToken()
  6.12      0.21     0.03 19966855     0.00     0.00  wxString::GetChar(unsigned int) const
  6.12      0.24     0.03 11617956     0.00     0.00  Tokenizer::IsEOF() const
  6.12      0.27     0.03  8227898     0.00     0.00  Tokenizer::MoveToNextChar(unsigned int)
  6.12      0.30     0.03  3182045     0.00     0.00  wxStringBase::wxStringBase(wxStringBase const&)
  4.08      0.32     0.02 19966855     0.00     0.00  wxStringBase::at(unsigned int) const
  4.08      0.34     0.02  5030899     0.00     0.00  wxStringData::Unlock()
  4.08      0.36     0.02  5030898     0.00     0.00  wxStringBase::~wxStringBase()
  4.08      0.38     0.02  1591019     0.00     0.00  wxStringBase::wxStringBase()
  4.08      0.40     0.02   118328     0.00     0.00  Tokenizer::ReadParentheses(wxString&)
  4.08      0.42     0.02        1    20.00   490.00  ParserThread::DoParse()
  2.04      0.43     0.01  5030898     0.00     0.00  wxString::~wxString()
  2.04      0.44     0.01  3182216     0.00     0.00  wxStringBase::length() const
  2.04      0.45     0.01  3182053     0.00     0.00  wxStringBase::empty() const
  2.04      0.46     0.01  3182045     0.00     0.00  wxString::wxString(wxString const&)
  2.04      0.47     0.01  3108630     0.00     0.00  Tokenizer::SkipComment()
  2.04      0.48     0.01  1591019     0.00     0.00  wxString::wxString()

--- End quote ---
I'm not sure which value is correct. I measured the time around the DoParse call:

--- Code: ---wxStopWatch sw;
DoParse();
long t = sw.Time(); // elapsed wall-clock time in ms
ParserTrace(_T("The long running function took %ldms to execute"), t);
sw.Pause();

// ... My DoParse code is just like:

while (m_Tokenizer.NotEOF())
{
    if (!m_pTokensTree || TestDestroy())
        break;
    m_Tokenizer.GetToken(); // only lex, discard the token
    continue;
    // ......
}

--- End code ---

The resulting time is t = 2100 ms.
But according to the profile result:

--- Quote --- 4.08 0.42 0.02 1 20.00 490.00 ParserThread::DoParse()
--- End quote ---
Which means the total time of the DoParse call is 490 ms???
I suspect the profile result is WRONG!!!

The result of the yylex code:

--- Quote ---Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
100.00      0.09     0.09  1281519    70.23    70.23  yylex
  0.00      0.09     0.00        3     0.00     0.00  yy_switch_to_buffer
  0.00      0.09     0.00        3     0.00     0.00  yyalloc
  0.00      0.09     0.00        2     0.00     0.00  yy_delete_buffer
  0.00      0.09     0.00        2     0.00     0.00  yy_flush_buffer
  0.00      0.09     0.00        1     0.00     0.00  yy_create_buffer
  0.00      0.09     0.00        1     0.00     0.00  yypop_buffer_state
  0.00      0.09     0.00        1     0.00     0.00  yyrestart
  0.00      0.09     0.00        1     0.00     0.00  yywrap

--- End quote ---
As you can see, each yylex call only takes 70.23 ns on average.

I also measured the time around all the yylex calls:

--- Code: ---CTimer timer;
while ((a = yylex())) // yylex() returns 0 at end of input
{
    n++;
    if (a == LE_IDENTIFIER)
        ; // printf("find some id!!= %s\n", yytext);
}
printf("n=%d", n);
int t = timer.GetCurrentTime();
printf("time = %d", t);

--- End code ---

The measured result is: t = 161 ms.

There is some extra keyword matching in yylex, because it matches all the C++ keywords. Many binary operators are also matched in the yylex grammar, which is not done in the current Tokenizer::DoGetToken.

:D

Any ideas?

oBFusCATed:
Ollydbg: Keep in mind that gprof measures only user code,
if you have system calls (sleep, file operations) the profile will be wrong.

I hope your test code looks like this:

--- Code: ---LoadWholeFile(filename);
DoParse();

--- End code ---

If you have file operations inside DoParse() your profile will always be wrong. This is a limitation of gprof and won't be fixed (I think).

ollydbg:

--- Quote from: oBFusCATed on August 01, 2010, 11:24:37 am ---Ollydbg: Keep in mind that gprof measures only user code,
if you have system calls (sleep, file operations) the profile will be wrong.

--- End quote ---
Thanks for the reply. I don't have any system calls in DoParse(). All I do is operate on a big wxString, which is already loaded before DoParse().

--- Quote ---I hope your test code looks like this:

--- Code: ---LoadWholeFile(filename);
DoParse();

--- End code ---
If you have file operations inside DoParse() your profile will always be wrong. This is a limitation of gprof and won't be fixed (I think).

--- End quote ---

I'm suspicious of this sentence:

--- Quote ---Each sample counts as 0.01 seconds.

--- End quote ---
I'm not sure what this means. A sampling interval of 0.01 seconds??
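For reference, gprof builds the flat profile from periodic samples of the program counter, and that header line gives the time credited to each sample. So every "self seconds" value is a sample count multiplied by 0.01 s. A small sketch of the conversion (both helper names are made up for this example):

```cpp
#include <cmath>

// Number of profiler samples behind a "self seconds" value, given the
// per-sample time printed at the top of the flat profile.
long SamplesFromSeconds(double selfSeconds, double samplePeriodSeconds) {
    return std::lround(selfSeconds / samplePeriodSeconds);
}

// Share of the whole run that a given number of samples represents, in percent.
double PercentOfRun(long samples, long totalSamples) {
    return 100.0 * static_cast<double>(samples) / static_cast<double>(totalSamples);
}
```

Applied to the first profile: SamplesFromSeconds(0.08, 0.01) gives 8 samples for Tokenizer::CurrentChar(), and the 2.04% rows correspond to a single sample each, which is consistent with about 49 samples in total (PercentOfRun(1, 49) is about 2.04, PercentOfRun(8, 49) about 16.33). With so few samples, the per-function numbers are only a coarse estimate.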
