Author Topic: New parser model for Code completion  (Read 87877 times)

anonuser

  • Guest
Re: New parser model for Code completion
« Reply #15 on: December 11, 2005, 06:19:45 am »
Would you like an example state machine?
I have a little state machine for parsing xml across the network. It parses exactly one xml message. There are also fail safes in place so invalid data won't crash it. Let me know and I'll paste an exert.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #16 on: December 11, 2005, 07:12:29 am »
sure, why not :)

Altho i was more interested in replicating the tokenizer.cpp functionality (see source code)... but you're welcome to post your code. Specially when I have NOT designed any state machine yet.

anonuser

  • Guest
Re: New parser model for Code completion
« Reply #17 on: December 11, 2005, 07:27:25 am »
Alright here codes.
This is C++ and its just the statemachine but it should be obvious what's going on. Sorry if anything is sloppy.
The networking is abstracted, getData(false) means do not return a copy of the data to keep things clean.
here are the eStates:
here's the header for clarity's sake
(rid of code to make thread easier to read)
« Last Edit: December 12, 2005, 08:23:11 pm by anonuser »

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #18 on: December 12, 2005, 05:07:59 am »
Hmmm... very interesting. However, I have a question.

Quote
switch(eState) {
case Begin:
...
case WaitOpen:
...
case StoreName:
...
case WaitClose:
...
case Quote:
...
case WaitEscape:

Let me guess - Those are the states of your state machine?

anonuser

  • Guest
Re: New parser model for Code completion
« Reply #19 on: December 12, 2005, 05:31:37 am »
Exactly. State machines are simple by design but can be complex when implementing them.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #20 on: December 12, 2005, 05:30:07 pm »
Well , that's good to know, because you see... that's *NOT* how I plan to implement my state machine. See, your states are HARDWIRED. Just like Codeblocks' tokenizer. What I want to do is a state transition table, and a state transition interpreter.

takeshimiya

  • Guest
Re: New parser model for Code completion
« Reply #21 on: December 12, 2005, 05:43:04 pm »
Rick, perhaps you want to look at Libero:

State machine generators are closely related to the deterministic finite automata (DFA) generated by scanner generators like the one built into ANTLR. However, scanner generators are aimed at language applications and are not well suited for state machine generation outside this area. The Libero state machine generator from iMatix supports state machine generation for a variety of applications. It is released as GPL.

The input to Libero is a state machine diagram expressed in a textural language. The output can be in a variety of languages ranging from Java and C++ to COBOL.


However I think the ANTLR approach of parser generators with LL(k) is a lot better.

anonuser

  • Guest
Re: New parser model for Code completion
« Reply #22 on: December 12, 2005, 06:16:53 pm »
its the same idea, hard wired or not. just an example.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #23 on: December 12, 2005, 07:20:13 pm »
Yes, I see. But if we use a C++ generator, it means we'll have to recompile after adding all the language modules. And we'd need a state machine for EACH of the languages, instead of a global one and reading the states as data, depending on the model.

takeshimiya

  • Guest
Re: New parser model for Code completion
« Reply #24 on: December 12, 2005, 07:50:00 pm »
I think it's easier to recompile C::B to add support for a language, rather than writting entirely news grammar for that language, but that's just me :D

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #25 on: December 12, 2005, 08:27:41 pm »
I see. Well in that case, I'd rather have a hardwired state table than having the full C++ code generated for different languages (which is what i'm trying to avoid, doing that makes the code very hard to maintain)

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
Re: New parser model for Code completion
« Reply #26 on: December 12, 2005, 08:40:55 pm »
Yes, I see. But if we use a C++ generator, it means we'll have to recompile after adding all the language modules.
And in the case of ANTLR, this means you have to do that using Java...
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

takeshimiya

  • Guest
Re: New parser model for Code completion
« Reply #27 on: December 12, 2005, 08:44:47 pm »
And in the case of ANTLR, this means you have to do that using Java...

You only have to use Java if you're using the Java output.
If you're using the C++ output you'll use C++.

Also ANTLR haves support (grammars and lexers) for parsing a lot of languages on earth.
« Last Edit: December 12, 2005, 08:47:57 pm by Takeshi Miya »

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: New parser model for Code completion
« Reply #28 on: December 12, 2005, 08:48:59 pm »
Perhaps making our own customized model would be easier. I mean, we can deduce the "C++ wannabe model" :lol: from Yiannis' code, and later we can expand it to other languages.

I looked at Yiannis' tokenizer (which would be better called 'parser', don't you love all that confusing nomenclature? :P ), and it has more or less a good model for deciding when to add a description for a function, keeps track of opening and closing brackets, etc.

Some of that functions can't be replicated (easily) with a standard FSA, so i'll add a new item to the table: "action". So instead of having only a "next state" given a state and an input, we'll have both an action, and a next state. That'll facilitate things.

EDIT: But it all depends on whether ANTLR has a generator, or it's more a generalized-parser model.

takeshimiya

  • Guest
Re: New parser model for Code completion
« Reply #29 on: December 12, 2005, 09:04:39 pm »
ANTLR is a LL(k) lookahead based parser generator. It generates a recursive descent parser from the grammar.
You can see how an example of how to create a parser from the start here.

Elkhound is a GLR Parser Generator. The parsers it generates use the Generalized LR parsing algorithm. GLR works with any context-free grammar.

Boost Spirit is an object oriented recursive descent parser framework implemented using template meta-programming techniques. Expression templates allow us to approximate the syntax of EBNF completely in C++. Parser objects are composed through operator overloading and the result is a backtracking, top down parser that is capable of parsing rather ambiguous grammars.
The Spirit framework enables a target grammar to be written exclusively in C++. Inline EBNF grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, are immediately executable.