Author Topic: quex lexer grammar, probably can make our tokenizer much faster  (Read 14002 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5913
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
quex lexer grammar, probably can make our tokenizer much faster
« on: September 24, 2010, 11:19:10 am »
As we know, the Quex lexer generator produces a directly coded lexer, which is much faster than a table-driven lexer such as the ones flex generates. See:

Lexical analysis - Wikipedia, the free encyclopedia

Quex - A Fast Universal Lexical Analyzer Generator

and the performance comparison with flex:

SourceForge.net: Lexical Analyzer Generator Quex: Topic: Performance question about Quex post 5

I have just posted a lexer grammar here:
SourceForge.net: Lexical Analyzer Generator Quex: Modify: 3074664 - a c++ quex lexer code

By using a lexer generator, we can remove a lot of hand-crafted code from our current Tokenizer class, and I hope it will make our CC (code completion) much faster.

I have a simple demo that tests Parserthread using the token ID and text information from this lexer. This way, we can avoid a lot of wxString comparisons in Parserthread's code; instead, we compare token IDs, which is just an integer comparison and also allows a table-driven jump (for example, a switch statement).
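
A rough sketch of the idea, not the actual Parserthread code: the token IDs, the `Token` struct and both functions below are hypothetical, and `std::string` stands in for wxString so the example compiles without wxWidgets. The first function dispatches on the token text, the second on the token ID.

```cpp
#include <string>
#include <vector>

// Hypothetical token IDs; a generated lexer would define its own set.
enum TokenId { TKN_CLASS, TKN_STRUCT, TKN_NAMESPACE, TKN_IDENTIFIER, TKN_OTHER };

// Hypothetical token: the ID assigned by the lexer plus the matched text.
struct Token {
    TokenId     id;
    std::string text;
};

// Text-based dispatch: every check is a string comparison.
int countTypeKeywordsByText(const std::vector<Token>& tokens)
{
    int count = 0;
    for (const Token& t : tokens) {
        if (t.text == "class" || t.text == "struct") ++count;
    }
    return count;
}

// ID-based dispatch: an integer switch, which the compiler is free to
// turn into a jump table.
int countTypeKeywordsById(const std::vector<Token>& tokens)
{
    int count = 0;
    for (const Token& t : tokens) {
        switch (t.id) {
            case TKN_CLASS:
            case TKN_STRUCT:
                ++count;
                break;
            default:
                break;
        }
    }
    return count;
}
```

The text is still kept in the token because the parser needs it for identifiers; only the keyword and operator checks move from string comparison to ID comparison.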

The demo test is still quite simple, though; I hope to improve it in the future.




If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

Re: quex lexer grammar, probably can make our tokenizer much faster
« Reply #1 on: September 29, 2010, 08:16:02 am »
but the demo test is quite simple, I hope I can improve it in the future.

The dummy demo test project can be found here (it includes a cbp project file and has only been tested under Windows):

http://code.google.com/p/quexparser/

SVN:
http://code.google.com/p/quexparser/source/checkout

A lot of work still needs to be done (many functions in the Parserthread class are just stubs :D).

Any comments are welcome!!!

Offline Loaden

  • Lives here!
  • ****
  • Posts: 1014
Re: quex lexer grammar, probably can make our tokenizer much faster
« Reply #2 on: September 30, 2010, 07:16:12 am »
Well done! :D