Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => CodeCompletion redesign => Topic started by: takeshimiya on December 08, 2005, 09:15:31 am

Title: New parser model for Code completion
Post by: takeshimiya on December 08, 2005, 09:15:31 am
Don't worry, this is what open source is: you code in your free time when you want, and if you don't have time, no problem.

If you're going to improve the code completion plugin, that will be great, since I consider it the part of C::B that most needs improvement. :)
But given that writing a complete C++ parser is difficult and can take years, I wonder if you had a look at the open source ones I researched here (http://forums.codeblocks.org/index.php?topic=1559.msg11150#msg11150).

Good luck! :: Suerte! :D
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: rickg22 on December 08, 2005, 04:39:03 pm
It's not really hard. Right now I'll only improve the parsing time by eliminating the token-adding overhead. That is, if Yiannis doesn't find out what's wrong first and beat me to a quick hack :P

(Alright, alright, here's the answer, you could use a hash table on the tokens' names. There's your quick hack :P )
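A minimal sketch of that quick hack, assuming the overhead comes from searching the token list by name every time a token is added. The Token struct and the TokenIndex class are hypothetical stand-ins, not the real C::B classes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token record; the real C::B token class has many more fields.
struct Token {
    std::string name;
    int line;
};

// Stores tokens in a vector and indexes them by name with a hash table,
// so looking up a name does not scan the whole list.
class TokenIndex {
public:
    // Adds a token unless the name is already known; returns its position.
    std::size_t add(const Token& t) {
        assert(!t.name.empty());                      // unnamed tokens are not indexed
        auto it = by_name_.find(t.name);
        if (it != by_name_.end()) return it->second;  // already present
        tokens_.push_back(t);
        by_name_.emplace(t.name, tokens_.size() - 1);
        return tokens_.size() - 1;
    }

    // Returns the token with this name, or nullptr if it was never added.
    const Token* find(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : &tokens_[it->second];
    }

    std::size_t size() const { return tokens_.size(); }

private:
    std::vector<Token> tokens_;
    std::unordered_map<std::string, std::size_t> by_name_;
};
```

The hash table stores positions rather than pointers, so the vector can grow without invalidating the index.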

Regarding the parser... that's easy, too. It's just a matter of designing a finite state machine that can be programmed in XML, so we can add custom languages. And no, I'm NOT being sarcastic! I had actually thought of this since I looked at codecompletion for the first time.
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: cyberkoa on December 08, 2005, 05:55:45 pm
Remember to come back later, don't disappear :)
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: takeshimiya on December 08, 2005, 06:30:20 pm
You're the first person I've heard say that writing a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

A parser like the one Visual Assist X has:
-Parses all files in the project, and the ones outside the project too.
-Parses the current file without it needing to be saved.
-Can parse invalid code.
-The comments surrounding functions are kept with the functions (important).
-The parser runs in a background thread.

It would be great if the C++ parser could be separated from the CodeCompletion code, so it could be used for other purposes (better syntax highlight, code analysing).
For example, highlighting int, float, CMyClass, wxWindow, etc. in the same style, without requiring the user to specify user keywords.
Or brace match highlighting whenever you are inside a block (unlike now, where it only highlights when you're at the brace).

Basically I would like the CodeCompletion plugin to become Visual Assist X (http://www.wholetomato.com). It really makes a difference, no joke. Once you get used to its features, it's very difficult to use anything else.

If you ask which features are essential to me, they are:

Hovering Tooltips (http://www.wholetomato.com/products/features/hover.html?more=yes)
Enhanced Listboxes (http://www.wholetomato.com/products/features/members.html)
Better Parameter Info (http://www.wholetomato.com/products/features/parameter.html?more=yes)
Enhanced Syntax Coloring (http://www.wholetomato.com/products/features/color.html?more=yes)
Local Symbols in Bold (http://www.wholetomato.com/products/features/bold.html)
Suggestion Lists (http://www.wholetomato.com/products/features/suggestion.html)
Shorthand (http://www.wholetomato.com/products/features/shorthand.html)
Repair Case (http://www.wholetomato.com/products/features/case.html)
Convert Dot to -> (http://www.wholetomato.com/products/features/dot.html)
Context Field (http://www.wholetomato.com/products/features/context.html?more=yes)
Definition Field (http://www.wholetomato.com/products/features/definition.html)
Navigate Back and Forward (http://www.wholetomato.com/products/features/navigate.html)

Hovering tooltips is almost the most important feature to me:
(http://www.wholetomato.com/products/features/images/hover.gif)
The point of this is that you don't need any documentation; you get the comments for any function directly. It's simply too addictive. :D


Would all of this be feasible with the current C::B parser, or would it require a major rewrite?
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: rickg22 on December 08, 2005, 07:22:54 pm
Quote
You're the first person I've heard say that writing a C++ parser is easy. :shock: I mean, a full C++ parser capable of parsing templates and very big projects (like STL, QT, Mozilla, wx).

Oh no, i don't mean a full C++ parser... that would require Yacc / Bison / etc / eew. Actually, i'm only replicating (and generalizing) the C::B parser's functionality. But I'm gonna rewrite it using FSA's so we can extend it for other languages. And I'm gonna use my work-in-progress search tree, so we end up with something like this:

The current_token_id would be an enum value, such as token_if, token_then, token_class, etc. (I'll have two search trees: one for C++ keywords like "class", "typedef", "public", "for", "if", etc., and another for identifiers like "myvar" and "myclass"; the latter are the ones added to the parser.)

Code
if(is_keyword)
{
  appropriate_action = thisparser->lookup_action_for_keywords(current_state,current_token_id);
  next_state = thisparser->lookup_state_for_keywords(current_state,current_token_id);
}
else
{
  appropriate_action = thisparser->takeaction_for_identifiers(current_token_id);
  next_state = thisparser->lookup_state_for_identifiers(current_state,current_token_id);
}

switch(appropriate_action)
{
 ...
 case id_add_function_declaration:
   add_function_declaration(thistoken->params); /* or something */
   break;
}

current_state = next_state;

See, a finite state machine changes state depending on the current state and the input token. (FSAs, or FSMs as they're also called, can recognize exactly the languages that regular expressions describe.) Additionally, I'm attaching an action to each transition, which modifies the parser's data.
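To make the idea concrete, here is a small sketch of a transition table that yields both an action and a next state. The states, tokens and actions are invented for illustration and are not taken from the C::B sources:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical states, tokens and actions, for illustration only.
enum State  { st_start, st_after_class, st_in_class };
enum TokId  { tok_class, tok_identifier, tok_open_brace, tok_close_brace };
enum Action { act_none, act_add_class, act_enter_scope, act_leave_scope };

struct Transition {
    Action action;
    State  next;
};

using Table = std::map<std::pair<State, TokId>, Transition>;

// The transition table is plain data; the loop in run() never changes.
inline Table make_table() {
    return {
        {{st_start,       tok_class},       {act_none,        st_after_class}},
        {{st_after_class, tok_identifier},  {act_add_class,   st_after_class}},
        {{st_after_class, tok_open_brace},  {act_enter_scope, st_in_class}},
        {{st_in_class,    tok_close_brace}, {act_leave_scope, st_start}},
    };
}

// Feeds the tokens through the table and returns the actions taken, in order.
// A (state, token) pair with no entry is skipped and leaves the state alone,
// which is one simple way to tolerate input the table does not cover.
inline std::vector<Action> run(const Table& table, const std::vector<TokId>& tokens) {
    State state = st_start;
    std::vector<Action> actions;
    for (TokId tok : tokens) {
        auto it = table.find({state, tok});
        if (it == table.end()) continue;
        if (it->second.action != act_none) actions.push_back(it->second.action);
        state = it->second.next;
    }
    return actions;
}

// Usage: the tokens of "class Foo { }" trigger three actions.
inline void example() {
    std::vector<Action> got =
        run(make_table(), {tok_class, tok_identifier, tok_open_brace, tok_close_brace});
    assert(got.size() == 3);
    assert(got.front() == act_add_class);
}
```

In a real parser each action would call into the token store instead of being collected in a vector; collecting them just keeps the sketch easy to check.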

See:

http://www.cs.brown.edu/~jes/book/BOOK/node10.html

Quote
(insert lotsa suggestions here)

Would all of this be feasible with the current C::B parser, or would it require a major rewrite?
I don't know, I haven't thought of the details. But the theory is there :)
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: rickg22 on December 08, 2005, 07:27:30 pm
Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.

http://en.wikipedia.org/wiki/Pushdown_automaton
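As a rough illustration of why the stack matters, the sketch below checks bracket nesting, which a plain finite state machine cannot do for arbitrary depth. The function name balanced is made up, and the sketch assumes no brackets are hidden inside strings or comments:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every '{' and '(' is closed by the matching '}' or ')'.
// If max_depth is given, it receives the deepest nesting seen.
// Assumption for brevity: no brackets inside string literals or comments.
inline bool balanced(const std::string& code, std::size_t* max_depth = nullptr) {
    std::vector<char> open;   // the automaton's stack: brackets still open
    std::size_t deepest = 0;
    bool ok = true;
    for (char c : code) {
        if (c == '{' || c == '(') {
            open.push_back(c);
            if (open.size() > deepest) deepest = open.size();
        } else if (c == '}' || c == ')') {
            const char wanted = (c == '}') ? '{' : '(';
            if (open.empty() || open.back() != wanted) { ok = false; break; }
            open.pop_back();
        }
    }
    if (max_depth) *max_depth = deepest;
    return ok && open.empty();
}

// Usage: a class with one method nests two levels deep.
inline void example() {
    std::size_t depth = 0;
    assert(balanced("class A { void f() { } };", &depth));
    assert(depth == 2);
    assert(!balanced("{ ( } )"));
}
```

A parser would push richer entries (class, function, block) instead of single characters, but the push and pop discipline stays the same.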

Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: Michael on December 08, 2005, 07:43:58 pm
Quote
Oops... did I say Finite State Automaton? I meant PushDown Automaton, because i'll be needing a stack to keep track of if/then blocks and methods, etc.
http://en.wikipedia.org/wiki/Pushdown_automaton
Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

Michael
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: takeshimiya on December 08, 2005, 07:54:39 pm
Great :)

How would attaching the surrounding comments to an identifier (myfunction, myclass, myvariable) fit into that parser?

e.g.
// Save editor contents. Returns true on success, false otherwise.
virtual bool Save() { return true; }

wxString desc; // title of the regex


So when you are in, let's say, the Symbols window (or the completion window, or a tooltip), you can see the identifier with its surrounding comments.
Then automagically you won't need a reference manual for any library anymore. You'll have the API documentation right there :)

Once the comments are attached to their respective identifiers, they could also be parsed (e.g. doxygen or javadoc style comments).
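A very rough sketch of the idea, working line by line rather than with a real parser. It handles only // comments, and declared_name is a crude hypothetical heuristic (the identifier right before the first '(', ';' or '='), so treat it as an illustration only:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Returns s without leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Crude stand-in for a parser: the declared name is taken to be the
// identifier that ends right before the first '(', ';' or '='.
inline std::string declared_name(const std::string& decl) {
    std::size_t end = decl.find_first_of("(;=");
    if (end == std::string::npos) return "";
    while (end > 0 && std::isspace(static_cast<unsigned char>(decl[end - 1]))) --end;
    std::size_t begin = end;
    while (begin > 0 && (std::isalnum(static_cast<unsigned char>(decl[begin - 1])) ||
                         decl[begin - 1] == '_')) --begin;
    return decl.substr(begin, end - begin);
}

// Maps each declared name to the // comments directly above it or trailing it.
inline std::map<std::string, std::string> attach_comments(const std::string& source) {
    std::map<std::string, std::string> docs;
    std::istringstream in(source);
    std::string line, pending;
    while (std::getline(in, line)) {
        std::string text = trim(line);
        if (text.empty()) { pending.clear(); continue; }   // blank line breaks the link
        if (text.compare(0, 2, "//") == 0) {               // comment-only line
            if (!pending.empty()) pending += ' ';
            pending += trim(text.substr(2));
            continue;
        }
        std::string comment = pending;
        pending.clear();
        std::string decl = text;
        std::size_t slash = text.find("//");               // trailing comment, if any
        if (slash != std::string::npos) {
            decl = text.substr(0, slash);
            if (!comment.empty()) comment += ' ';
            comment += trim(text.substr(slash + 2));
        }
        std::string name = declared_name(decl);
        if (!name.empty() && !comment.empty()) docs[name] = comment;
    }
    return docs;
}

// Usage with the two declarations quoted in the post above.
inline void example() {
    auto docs = attach_comments(
        "// Save editor contents. Returns true on success, false otherwise.\n"
        "virtual bool Save() { return true; }\n"
        "\n"
        "wxString desc; // title of the regex\n");
    assert(docs["Save"] == "Save editor contents. Returns true on success, false otherwise.");
    assert(docs["desc"] == "title of the regex");
}
```

The resulting map is what a tooltip or the Symbols window could display next to each identifier.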
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: Urxae on December 08, 2005, 07:55:44 pm
Quote
http://en.wikipedia.org/wiki/Pushdown_automaton
Thank you for adding the PushDown Automaton description to the wiki. Very interesting.

That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed ;)...
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: Michael on December 08, 2005, 07:57:12 pm
Quote
That's not the C::B wiki, that's Wikipedia. And for some reason I suspect that article already existed ;)...
Oops...you're right :oops:
Title: Re: I'm afraid I don't have much time to work on C::B....
Post by: rickg22 on December 08, 2005, 08:01:15 pm
AH HAH! Here's what I wanted to show you! Take special look at the "State Transition table".

http://en.wikipedia.org/wiki/Event_driven_finite_state_machine

This is the kind of automaton i'll be building. Fits like a shoe to the current C::B parser model.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 08, 2005, 08:26:46 pm
Yeah, lots of computer science theory :)

But how would attaching the surrounding comments to an identifier (myfunction, myclass, myvariable) fit into the current parser?

I just hope it won't be very hard to make the parser more general (currently everything is hard coded).
And I hope that by generalizing it you don't reinvent the wheel; there are lots of parser generators out there.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 08, 2005, 08:52:12 pm
lol :lol:
http://www.w3.org/TR/2005/WD-scxml-20050705/
Title: Re: New parser model for Code completion
Post by: rickg22 on December 08, 2005, 10:16:50 pm
Wow, thanks! That'll save me a lot of time! :)
Title: Re: New parser model for Code completion
Post by: rickg22 on December 09, 2005, 12:19:32 am
Some help, guys!

While I'm busy making my SearchTree class, I'd appreciate it if one of you could examine the tokenizer and parser classes (perhaps even parserthread?) and make a state transition (and actions?) table showing which token leads to which state... specifically for Tokenizer::DoGetToken(), Tokenizer::SkipUnwanted(), etc.

Generally, each Skip(char n) function is a state whose only exit transition is the one that breaks out of the loop.
Title: Re: New parser model for Code completion
Post by: anonuser on December 11, 2005, 06:19:45 am
Would you like an example state machine?
I have a little state machine for parsing XML across the network. It parses exactly one XML message. There are also fail-safes in place so invalid data won't crash it. Let me know and I'll paste an excerpt.
Title: Re: New parser model for Code completion
Post by: rickg22 on December 11, 2005, 07:12:29 am
sure, why not :)

Although I was more interested in replicating the tokenizer.cpp functionality (see source code)... but you're welcome to post your code. Especially since I have NOT designed any state machine yet.
Title: Re: New parser model for Code completion
Post by: anonuser on December 11, 2005, 07:27:25 am
Alright, here's the code.
This is C++ and it's just the state machine, but it should be obvious what's going on. Sorry if anything is sloppy.
The networking is abstracted; getData(false) means do not return a copy of the data, to keep things clean.
Here are the eStates:
Here's the header for clarity's sake:
(code removed to make the thread easier to read)
Title: Re: New parser model for Code completion
Post by: rickg22 on December 12, 2005, 05:07:59 am
Hmmm... very interesting. However, I have a question.

Quote
switch(eState) {
case Begin:
...
case WaitOpen:
...
case StoreName:
...
case WaitClose:
...
case Quote:
...
case WaitEscape:

Let me guess - Those are the states of your state machine?
Title: Re: New parser model for Code completion
Post by: anonuser on December 12, 2005, 05:31:37 am
Exactly. State machines are simple by design but can be complex when implementing them.
Title: Re: New parser model for Code completion
Post by: rickg22 on December 12, 2005, 05:30:07 pm
Well, that's good to know, because you see... that's *NOT* how I plan to implement my state machine. See, your states are HARDWIRED, just like Code::Blocks' tokenizer. What I want to build is a state transition table and a state transition interpreter.
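One way to picture a table plus interpreter, as opposed to a hardwired switch: the rules below are loaded from text at run time, so adding a language would mean adding data, not code. The four-column text format and all names are invented for this sketch; XML could carry the same four fields:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Rule {
    std::string action;
    std::string next;
};

using RuleTable = std::map<std::pair<std::string, std::string>, Rule>;

// Loads rules from text, one per line: <state> <input> <action> <next state>.
// The format is an assumption made for this sketch.
inline RuleTable load_rules(const std::string& text) {
    RuleTable table;
    std::istringstream in(text);
    std::string state, input, action, next;
    while (in >> state >> input >> action >> next)
        table[{state, input}] = Rule{action, next};
    return table;
}

// Generic interpreter: it knows nothing about any particular language.
// Inputs with no matching rule are ignored and leave the state unchanged.
inline std::vector<std::string> interpret(const RuleTable& table,
                                          const std::string& start,
                                          const std::vector<std::string>& inputs) {
    std::string state = start;
    std::vector<std::string> actions;
    for (const std::string& input : inputs) {
        auto it = table.find({state, input});
        if (it == table.end()) continue;
        actions.push_back(it->second.action);
        state = it->second.next;
    }
    return actions;
}

// Usage: a four-rule table describing "class <name> { }".
inline void example() {
    RuleTable rules = load_rules(
        "start class none       named\n"
        "named ident add_class  named\n"
        "named {     push_scope body\n"
        "body  }     pop_scope  start\n");
    std::vector<std::string> got = interpret(rules, "start", {"class", "ident", "{", "}"});
    assert(got.size() == 4);
    assert(got[1] == "add_class");
}
```

Strings keep the sketch readable; a real implementation would probably map names to integer ids once at load time so the inner loop stays cheap.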
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 12, 2005, 05:43:04 pm
Rick, perhaps you want to look at Libero (http://www.imatix.com/html/libero/index.htm):

State machine generators are closely related to the deterministic finite automata (DFA) generated by scanner generators like the one built into ANTLR. However, scanner generators are aimed at language applications and are not well suited for state machine generation outside this area. The Libero state machine generator from iMatix supports state machine generation for a variety of applications. It is released as GPL.

The input to Libero is a state machine diagram expressed in a textual language. The output can be in a variety of languages ranging from Java and C++ to COBOL.


However I think the ANTLR approach of parser generators with LL(k) is a lot better.
Title: Re: New parser model for Code completion
Post by: anonuser on December 12, 2005, 06:16:53 pm
It's the same idea, hardwired or not. Just an example.
Title: Re: New parser model for Code completion
Post by: rickg22 on December 12, 2005, 07:20:13 pm
Yes, I see. But if we use a C++ generator, it means we'll have to recompile after adding all the language modules. And we'd need a state machine for EACH of the languages, instead of a global one and reading the states as data, depending on the model.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 12, 2005, 07:50:00 pm
I think it's easier to recompile C::B to add support for a language than to write an entirely new grammar for that language, but that's just me :D
Title: Re: New parser model for Code completion
Post by: rickg22 on December 12, 2005, 08:27:41 pm
I see. Well in that case, I'd rather have a hardwired state table than having the full C++ code generated for different languages (which is what i'm trying to avoid, doing that makes the code very hard to maintain)
Title: Re: New parser model for Code completion
Post by: thomas on December 12, 2005, 08:40:55 pm
Quote
Yes, I see. But if we use a C++ generator, it means we'll have to recompile after adding all the language modules.
And in the case of ANTLR, this means you have to do that using Java...
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 12, 2005, 08:44:47 pm
Quote
And in the case of ANTLR, this means you have to do that using Java...

You only have to use Java if you're using the Java output.
If you're using the C++ output you'll use C++.

Also, ANTLR has support (grammars and lexers) for parsing a lot of languages.
Title: Re: New parser model for Code completion
Post by: rickg22 on December 12, 2005, 08:48:59 pm
Perhaps making our own customized model would be easier. I mean, we can deduce the "C++ wannabe model" :lol: from Yiannis' code, and later we can expand it to other languages.

I looked at Yiannis' tokenizer (which would be better called 'parser', don't you love all that confusing nomenclature? :P ), and it has more or less a good model for deciding when to add a description for a function, keeps track of opening and closing brackets, etc.

Some of those functions can't be replicated (easily) with a standard FSA, so I'll add a new item to the table: "action". So instead of having only a "next state" for a given state and input, we'll have both an action and a next state. That'll facilitate things.

EDIT: But it all depends on whether ANTLR has a generator, or it's more a generalized-parser model.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 12, 2005, 09:04:39 pm
ANTLR (http://www.antlr.org/) is an LL(k) lookahead-based parser generator. It generates a recursive descent parser from the grammar.
You can see an example of how to create a parser from scratch here (http://www.merrells.com/blog/work/archives/2002/01/accent_vol_2_no.html).

Elkhound (http://www.cs.berkeley.edu/~smcpeak/elkhound/) is a GLR Parser Generator. The parsers it generates use the Generalized LR parsing algorithm. GLR works with any context-free grammar.

Boost Spirit (http://spirit.sourceforge.net/) is an object oriented recursive descent parser framework implemented using template meta-programming techniques. Expression templates allow us to approximate the syntax of EBNF completely in C++. Parser objects are composed through operator overloading and the result is a backtracking, top down parser that is capable of parsing rather ambiguous grammars.
The Spirit framework enables a target grammar to be written exclusively in C++. Inline EBNF grammar specifications can mix freely with other C++ code and, thanks to the generative power of C++ templates, are immediately executable.

Title: Re: New parser model for Code completion
Post by: killerbot on December 12, 2005, 09:14:47 pm
Hello parser experts,

Maybe you can shed some light on this bug report of mine :
https://sourceforge.net/tracker/index.php?func=detail&aid=1323191&group_id=126998&atid=707416

The code completion does not kick in on function arguments. Is this a shortcoming of the current strategy/parser, or is it possible with the current mechanism and are we just suffering from a (minor or major) bug? Any ideas for fixes?

Thanks for your time,
Lieven
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 12, 2005, 09:20:50 pm
Rick, I want to say that I encourage your own parser work, but you will certainly have to spend some years to reach a state comparable to the other projects.
Title: Re: New parser model for Code completion
Post by: rickg22 on December 13, 2005, 05:44:44 am
TakeshiMiya: As I said, I only want to replicate Yiannis' parser functionality. Then we might be able to extend it or perhaps use the other parsers.

In any case, I'm still working on my tree model to improve the current parser's speed. I don't think it'll be easy to combine the current parser's memory model with the other parsers. In that case it would be better to start them completely from scratch, using a new memory model adapted to fit them. But I don't think I'm qualified to do that, because I don't know those parsers. On the other hand, you do seem to have knowledge about them :-)

(Besides, I've always wanted to do this, it's something like a personal challenge. Just like byo wanted to write his own RAD editor, I want to write my parser. But I can't guarantee that I'll succeed in time, or even be able to START working on it. As I said, my current circumstances only allow me to work on C::B half an hour daily, or even less. So if anyone wants to try making their own improved code completion based on these parsers, they're welcome. Since my approach is different, I don't think I'll interfere with it.)

Well, it's late now and I have to go to bed. See ya.
Title: Re: New parser model for Code completion
Post by: Michael on December 13, 2005, 11:49:47 am
Hello Rick and parser experts,

In a book I bought some time last year in Bangkok (I knew it would be a waste of money :)), there is a "simple" implementation of a parser for C++. This parser is part of a Mini-Interpreter for C++.

The implementation of the parser looks interesting, and maybe it could be useful for the development of your parser.

The code of the Mini-Interpreter for C++ belongs to the book The Art of C++ (http://books.mcgraw-hill.com/getbook.php?isbn=0072255129&template=osborne) and it is freely available here (http://books.mcgraw-hill.com/downloads/products/0072255129/0072255129_code.zip).

After unzipping the file, use a text editor (such as UltraEdit) to get the files of the Mini-Interpreter for C++ (it is CHAP9.LST).

Best wishes,
Michael
Title: Re: New parser model for Code completion
Post by: rickg22 on December 20, 2005, 07:26:20 pm
Guys, look at this!

http://www.cee.hw.ac.uk/~alison/alg/lectures.html

I think these lectures will benefit us all.
Title: Re: New parser model for Code completion
Post by: thomas on December 20, 2005, 07:34:48 pm
Hmm... that reads almost like Sedgewick's book :)


Oh... lol
Quote
Sedgewick, chapters 19-23
Title: Re: New parser model for Code completion
Post by: Michael on December 20, 2005, 07:42:33 pm
Quote
Guys, look at this!

http://www.cee.hw.ac.uk/~alison/alg/lectures.html

I think these lectures will benefit us all.

Interesting lectures, even if a bit old :). Some external links are unfortunately broken :(.

Michael
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 12:54:55 am
Wouldn't the ideal system do:

1.  use lex to generate a lexer/tokenizer/fsm that would be run to tokenize the source files into tokens.
2.  yacc would then, given language tokens, parse them into valid language syntax and generate an abstract syntax tree (each node is a token, organized in a tree structure to reflect the language grammar, i.e. (x+5) +3 != x + (5 + 3) even though they have the same tokens: x, 5, 3).
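The two steps can be sketched by hand for a toy grammar. This is not lex or yacc output: a hand-written tokenizer and a small recursive descent parser stand in for the generated code, and every name here is made up for illustration. It shows that (x+5)+3 and x+(5+3) share tokens but produce different trees:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Step 1 (the lexer's role): split the text into tokens.
inline std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < src.size();) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t j = i + 1;
        if (std::isalnum(c))   // names and numbers may span several characters
            while (j < src.size() && std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
        out.push_back(src.substr(i, j - i));
        i = j;
    }
    return out;
}

// Step 2 (the parser's role): build a tree from the tokens.
struct Node {
    std::string text;                   // operator or operand
    std::unique_ptr<Node> left, right;  // both empty for operands
};

class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // expr := term ('+' term)*   (left associative)
    std::unique_ptr<Node> expr() {
        auto lhs = term();
        while (pos_ < toks_.size() && toks_[pos_] == "+") {
            ++pos_;
            auto sum = std::make_unique<Node>();
            sum->text = "+";
            sum->left = std::move(lhs);
            sum->right = term();
            lhs = std::move(sum);
        }
        return lhs;
    }

private:
    // term := '(' expr ')' | operand
    std::unique_ptr<Node> term() {
        if (pos_ < toks_.size() && toks_[pos_] == "(") {
            ++pos_;
            auto inner = expr();
            if (pos_ < toks_.size() && toks_[pos_] == ")") ++pos_;
            return inner;
        }
        auto leaf = std::make_unique<Node>();
        if (pos_ < toks_.size()) leaf->text = toks_[pos_++];
        return leaf;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Prints the tree with explicit grouping so its shape is visible.
inline std::string show(const Node& n) {
    if (!n.left) return n.text;
    return "(" + show(*n.left) + " " + n.text + " " + show(*n.right) + ")";
}

// Usage: the grouping survives in the tree even though the tokens match.
inline void example() {
    assert(show(*Parser(tokenize("(x+5)+3")).expr()) == "((x + 5) + 3)");
    assert(show(*Parser(tokenize("x+(5+3)")).expr()) == "(x + (5 + 3))");
}
```

Code completion would walk such a tree (with declarations instead of sums) rather than re-reading the source text.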

So to add a different parser all that would need to be done is:
a) specify the lex input file and run lex on it - this produces a fsm
b) use yacc on a grammar specification based on the tokens lex produces to produce the parser.

The parser will then parse the language specified and give you an abstract syntax tree to work with.  These tree nodes can be annotated with things like comments (although then it has to be agreed that comments must precede source lines unless they are on the same line or ...) or other attributes.

The tree will then be what codeblocks uses to iterate over to get the info it needs for code completion, find declaration/implementation, refactoring?, ...

CB wouldn't need to be recompiled to add a different language - just lex & yacc run on the specification, then maybe an .xml config file edited which CB uses to determine which languages are available.

Isn't this the most robust & simplest solution to get up & running?
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 01:10:32 am
Quote
Isn't this the most robust & simplest solution to get up & running?

No, the most robust and simplest one I found is CodeStore (http://vcfbuilder.org/?q=node/139), which in fact uses ANTLR C++ (which is a perfect replacement for lex/yacc/bison/etc)

If we use ANTLR, we get automatically these grammars done by the community:
-Full and mature, C++ supporting almost everything (templates, namespaces, etc)
-Python
-C#
-Java
-Pascal
-MySQL
-HTML
-CSS
-JavaScript
-Ada
-Verilog
-A lot more

And since it is a parser generator, it's designed so you can add any other language you want.
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 01:34:29 am
So what's the issue with integrating antlr?  It sounds like the java ant makefile compiler mixed with LR parsing...

I've used JavaCC with JCUP before for writing a compiler so I'd be willing to help integrate if needed.
So the output of antlr is a c++ parser or...?
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 01:48:08 am
Quote
So the output of antlr is a c++ parser or...?
Yes, and it's even better, because the ANTLR output can be a C++ parser, a Python parser, a C# parser, or a Java parser.
I'm unsure whether the output languages are built into ANTLR or depend on the grammar developer, but for sure the C++ parser output is C++.

From what I've read ANTLR is like JavaCC but a lot better.

Anyway, take a look at CodeStore, which comes from another open source IDE: it is a set of classes around the ANTLR C++ parser that make the job easier (dealing with the AST, etc.), with the purpose of code completion.

It's very new so you'll have to get it from CVS.
I've tried compiling it (CodeStore with ANTLR C++) in Code::Blocks and compiles almost out-of-the-box, it's all written in portable C++. :)
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 02:55:23 am
If we're talking about VCFBuilder then I don't think the CVS source is complete.  There's includes like:

#   include "../src/CodeStore/src/CodeStore.h"

That aren't in the source tree.  Uses doxygen which is nice, but I can't seem to get it to compile or see the internals of the CodeStore source.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 03:16:36 am
Mm... There are various branches, I don't remember what I used.

Basically I imported the MSVC6 project to C::B.
Then I changed some class assignments because of a bug that GCC 3.4 has with the ANTLR code, and it compiled without errors.

As for the CodeStore part, I remember changing some things to make it work, and I got it working too.
I don't know how complete CodeStore is, but it's surely worth a look; after all, it's very new and done in portable C++.
The best thing would be to join the CodeStore branch and help with its development.
Title: Re: New parser model for Code completion
Post by: Game_Ender on December 21, 2005, 04:34:33 am
How does the CodeStore parser compare to Elsa (http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/index.html)?  I have been looking and looking for a good, feature complete, and tested C++ parser library and Elsa is it.  It even has the same license as CodeStore, and it looks more complete.  It would definitely not be a general solution but then again, from everything I have been reading you can't parse C++ like you can most other languages, so having a separate method of doing the C++ parsing/AST building might not be a bad thing.
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 04:37:11 am
Alright so why don't we just take a stab at making the plugin?
Title: Re: New parser model for Code completion
Post by: Game_Ender on December 21, 2005, 04:44:17 am
Quote
Alright so why don't we just take a stab at making the plugin?

The Code Completion plugin is already made; the issue is that it does not support every part of the C++ language, because it is so very hard to fully parse C++.  I have been looking through the CodeCompletion plugin code for a little while and I am still working out the best way to drop in another system in place of Mandrav's tokenizer/parser combination.

I forgot to mention that Elsa supports XML serialization of its AST, so with a little work you can turn this into a quickly loadable file for code completion.
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 04:45:24 am
Why work with the old one, why not tear it out ?
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 04:58:31 am
As I've said, Elsa and ANTLR C++ are the most complete C++ parsers I could find.
The ANTLR C++ parser has been in development since 1997 and was completely rewritten 3 times, so it is now very mature.

I attach two text files describing the current situation of both parsers and what things are missing.

Additionally, I couldn't compile Elsa on mingw32 (it's not ported yet), but ANTLR compiles successfully.
What I liked about ANTLR C++ is that the code of the parser is very small and takes seconds to compile, despite being very powerful.

[attachment deleted by admin]
Title: Re: New parser model for Code completion
Post by: rickg22 on December 21, 2005, 05:00:06 am
Quote
Why work with the old one, why not tear it out ?
Actually we're searching for a brave warrior who will start the quest of implementing the new parser. But so far everybody seems busy... I want to redo Yiannis' parser with Finite State machines, but I don't know if I'll be able to do it.

If you seem so enthusiastic, you're totally welcome to dive in and start coding with ANTLR or whatever you like :)
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 05:01:23 am
I'm going to see what this would take, I doubt I'll use any of what mandrav did. Others should try too. See who gets it done better and faster ;D
But Yeah I'll try it.
Can anyone show me what I need to tear out?
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 05:03:18 am
First I need Yiannis to put in those FreeBSD changes I submitted    :shock:
So I can go ahead and resubmit my port.
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 05:09:14 am
The problem with elsa (which is an Elkhound-based parser) is if you look at the source, it's absolutely huge.  It looks like getting elkhound to work with a new desired parser (say to parse java or python or whatever) would be a lot of work.  Plus it uses external tools like ast to produce the abstract syntax tree.  This looks all very complex and not modular and pluggable for anything other than c++.

The antlr compiler compiler looks like it might be a better bet.  The tutorials show that it's easy to specify the lexer, the parser, and an AST walker all with antlr.  This would make adding different parsers much more painless as it does it all (including generating the AST instead of using a separate tool for that).

There are already C++, Java, C#, etc. specifications for antlr as well.  So, it looks overall like a better fit.
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 05:14:41 am
Alright then, I'll start by looking over ANTLR and making a few test projects to see how it holds up.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 05:15:30 am
Improving the current one, or creating a new parser, it doesn't matter, as in the end the best one would be used. :)

But for ANTLR C++, which is a complete C++ parser and generates an AST, the entire code weighs 150KB and is only 14 files of pure standard C++.
Unfortunately, I can't say the same for other parsers like Elsa. :)
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 05:19:21 am
You're forgetting this all needs to integrate with the editor.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 05:30:09 am
No, that's why I said to cooperate with the CodeStore author, which is doing exactly that: integrating ANTLR with a CodeCompletion plugin.
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 05:48:11 am
Alright I'll look into it.
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 06:00:39 am
cool, there is a standalone exe: antlr exe (http://www.antlr.org/download.html)
"antlr-2.7.5.exe (Win32 executable made with mingw)"

which works without java installed and is the latest version.  Not sure if this is better or not vs. using the c++ antlr port (which is like 2.7.4 apparently).
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 06:04:33 am
I believe the idea is to have antlr generate the language parser for you and you just use that code.
You provide a language grammar and it spits out a parser.
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 06:25:21 am
yeah but we can just provide a link to antlr.exe and say use this to generate a parser for language X if you want X to be used as a code completion plugin.  makes modular expansion of the parsing framework that much easier.
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 21, 2005, 07:21:32 am
Yeah, that's great! 0 Java dependency 8)

And if we use parsers already generated by ANTLR, like the C++ one, we don't even need the parser generator, because the parser is already generated (unless you want to modify the parser generator itself).

Put simply: C++ for the parser generator, C++ for the parser.
Title: Re: New parser model for Code completion
Post by: Michael on December 21, 2005, 02:08:30 pm
Hello,

please allow me to put a link to the topic where I posted about the expression parser of Mini C++. It is here (http://forums.codeblocks.org/index.php?topic=1637.0).

Michael
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 04:37:46 pm
OK, but what about us Unix folks? I'm all for no Java dependency, but there is a library we can link to instead.
Title: Re: New parser model for Code completion
Post by: grv575 on December 21, 2005, 06:08:13 pm
unix compatibility should be a nonissue - the generated parsing code is portable.
Title: Re: New parser model for Code completion
Post by: anonuser on December 21, 2005, 06:21:28 pm
Well I work on unix and port back to other platforms. So I'll probably end up using Antlr C++
Title: Re: New parser model for Code completion
Post by: grv575 on December 23, 2005, 06:46:21 pm
Any progress?  Getting the cpp parser to compile is a nightmare on mingw and even getting the antlr lib to compile on mingw is a mess.  Using the autoconf stuff doesn't seem to work ("no rule to build Makefile.in", among other errors) and well, I've pretty much given up on antlr2.7.6 lib / antlrcpp / antlr cpp parser combinations.  Maybe the codestore stuff is better in terms of working compiling instructions, etc.  Didn't bother to download vcl yet to try it out, but is the project even active (looked like stuff from 2 years ago iirc)?  Not much luck getting a simple working demonstration of this stuff on windows using mingw gcc.  Would anyone mind posting detailed setup instructions if they've had success getting something working?
Title: Re: New parser model for Code completion
Post by: TDragon on December 23, 2005, 07:34:49 pm
I've been working with ANTLR 2.7.5 (I haven't gotten around to building 2.7.6 yet) and CPP_grammarV3.1 from the ANTLR website, with some good success. Out-of-the-box, it'll correctly parse a GCC-preprocessed file if it doesn't use any GCC extensions or built-in types (i.e. didn't include any standard headers); most of my time has been spent getting to know CPP_parser.g to figure out where and how I should add them. After adding support for most gcc __attribute__ specifiers and adding __builtin_va_list to the basic types, I came VERY close to successfully parsing Quadratic.cpp which includes two standard headers. Of the few remaining errors, most are related to the lack of support for "using" declarations when doing AST resolution lookup, so right now I've moved back to building a custom AST container which I'll drop in before I correct them.

After reading all the installation and usage instructions, only one step wasn't completely obvious, which was that I had to rebuild libantlr.a rather than using the pre-packaged version. Basically,
- Make sure you have MinGW and MSys installed (MSys is only needed to build the antlr library; it's fully MinGW32 compatible after this)
- Download and unzip antlr-x.y.z.tar.gz
- Open MSys, go to the antlr directory, run "./configure --disable-examples" (I added "--prefix=/path/to/antlr-dir" because I wasn't sure where the default was)
- Run "make install"
- If you want to be able to run antlr from the Windows command prompt, also download antlr-x.y.z.exe
- Download and unzip CPP_parserV3.1.zip
- If you want extended trace functionality, copy LLkParser.hpp to the antlr include/antlr directory; otherwise, comment out all references to antlrTrace() in CPP_parser.g
- Run antlr on CPP_parser.g to generate CPPParser.(cpp,hpp), CPPLexer.(cpp,hpp), and STDCTokenTypes.(hpp,txt)
- Compile CPPParser.cpp, CPPLexer.cpp, Dictionary.cpp, LLkParser.cpp (if you want extended trace), and Support.cpp into whatever project you want to use the parser in
- For the test project included with the parser, also compile Main.cpp and MyCode.cpp with MYCODE #defined

That was all from memory, but any problems you run into should be trivial. If you compile the test project, you can run the resulting executable on any preprocessed code (use gcc -E) to get a list of defined functions and optionally a list of declarations.

Hope that helps,
Twilight Dragon
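
For anyone who wants to try the stock grammar before extending it: TDragon's approach above is to teach CPP_parser.g about GCC's `__attribute__` specifiers. A cruder alternative, sketched here, is to filter them out of the preprocessed text before the parser ever sees it. This is a different technique from the one described in the post, `stripGccAttributes` is a hypothetical helper name, and the sketch assumes the keyword never appears inside a string literal, a comment, or a longer identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes every `__attribute__(( ... ))` from `src` by matching parentheses.
// Assumption: the keyword does not occur inside string literals or comments.
std::string stripGccAttributes(const std::string& src) {
    static const std::string keyword = "__attribute__";
    std::string out;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::size_t hit = src.find(keyword, pos);
        if (hit == std::string::npos) {
            out.append(src, pos, std::string::npos);  // no more attributes
            break;
        }
        out.append(src, pos, hit - pos);              // text before the keyword
        std::size_t i = hit + keyword.size();
        while (i < src.size() && (src[i] == ' ' || src[i] == '\t'))
            ++i;                                      // optional blanks before '('
        if (i >= src.size() || src[i] != '(') {
            out.append(keyword);                      // not an attribute list: keep it
            pos = hit + keyword.size();
            continue;
        }
        int depth = 0;
        do {                                          // skip to the matching ')'
            if (src[i] == '(') ++depth;
            else if (src[i] == ')') --depth;
            ++i;
        } while (i < src.size() && depth > 0);
        assert(depth >= 0);                           // we stop as soon as depth hits 0
        pos = i;
    }
    return out;
}
```

Run over `void f() __attribute__((noreturn));` this leaves `void f() ;`. It throws away information the attributes carry, so it is only a stopgap for getting a first parse, not a substitute for proper grammar support.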
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 23, 2005, 07:52:52 pm
The project is actively maintained (last release was from 2 months ago or so).

I've used the MSVC6 projects of ANTLRC++ that come with CodeStore, imported them and only had to change a class instantiation which was a bug in MinGW. Compiled with 0,0 warnings/errors in C::B.
Title: Re: New parser model for Code completion
Post by: grv575 on December 23, 2005, 11:33:56 pm
The project is actively maintained (last release was from 2 months ago or so).

I've used the MSVC6 projects of ANTLRC++ that come with CodeStore, imported them and only had to change a class instantiation which was a bug in MinGW. Compiled with 0,0 warnings/errors in C::B.

What's the link to the codestore download you're using?
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 23, 2005, 11:40:31 pm
No link, I fetched from CVS.  :)
Title: Re: New parser model for Code completion
Post by: grv575 on December 24, 2005, 12:51:55 am
- Open MSys, go to the antlr directory, run "./configure --disable-examples" (I added "--prefix=/path/to/antlr-dir" because I wasn't sure where the default was)
- Run "make install"

See right here is where I get errors about not being able to build Makefile.in.  Could you post your `echo $PATH` in msys?
Title: Re: New parser model for Code completion
Post by: anonuser on December 24, 2005, 01:54:40 am
Alright I've got me antlr going and a working C++ grammar file.
Now just need to work with the AST it gives me.
Title: Re: New parser model for Code completion
Post by: TDragon on December 24, 2005, 04:03:59 am
$ echo $PATH
.:/usr/local/bin:/mingw/bin:/bin:/mingw/bin:/e/PHP:/e/WINDOWS/system32:/e/WINDOWS:/e/WINDOWS/System32/Wbem:/e/Program Files/ATI Technologies/ATI Control Panel:.:/e/Program Files/doxygen/bin:/e/Python24:/e/Program Files/CVSNT/
Title: Re: New parser model for Code completion
Post by: grv575 on December 26, 2005, 09:25:46 pm
Thanks TDragon.  It turned out to be something wrong with my msys configuration.  Reinstalling mingw and msys fixed things.  So getting antlr running wasn't so bad on mingw after all.  I didn't just use the supplied Main.cpp because I wanted to see how easy it would be to generalize that part for other parsers (as long as there is a .g file for the parser definition).  I tried to use as few of the C++-specific extensions written for antlr as possible (e.g. no modding the antlr source to support his tracing extensions).  So it looks like modifying antlr 2.7.6 is not necessary at all - just build it first and then the C++ grammar does plug in fairly nicely.  Steps to get the parser working:

Code
install mingw & msys:

binutils-2.16.91-20050827-1.tar.gz
extra.zip
gcc-core-3.4.4-20050522-1.tar.gz
gcc-g++-3.4.4-20050522-1.tar.gz
gdb-6.3-2.exe
mingw-runtime-3.9.tar.gz
mingw-utils-0.3.tar.gz
mingw32-make-3.80.0-3.tar.gz
MSYS-1.0.10.exe
w32api-3.5.tar.gz

extract all the archives to C:\MinGW, run the gdb installer, run the msys installer (use C:\MinGW for the postinstall prompt)

extract CPP_parserV3.1
copy CPP_parser.g, CPPDictionary.hpp, Dictionary.hpp, DictEntry.hpp, Dictionary.cpp, CPPSymbol.hpp, Support.cpp to a new folder test_folder
download antlr-2.7.5.exe to test_folder
comment out all lines in CPP_parser.g dealing with antlrTrace()
run antlr-2.7.5.exe CPP_parser.g
run msys
extract antlr-2.7.6.tar.gz
run ./configure --disable-examples && make && make install
change back to test_folder

create test.cpp:

---
#include <cstdio>   // printf, sscanf, getchar
#include <cstring>  // strcpy, strcmp
#include <iostream>
#include <string>
#include "CPPLexer.hpp"
#include "CPPParser.hpp"

ANTLR_USING_NAMESPACE(std)
ANTLR_USING_NAMESPACE(antlr)

// The following data is used by process_line_directive(const char*, const char*) below
// I believe this data and function have to be at this level so as to
// be available to both CPPLexer and CPPParser (and Support.cpp)
int this_line = 0; // current line
 
int deferredLineCount = 0;

int include_line = 0; // include file's line number
int include_last_set = 0; // where included file's line number was last set
char currentIncludedFile[128]; // path and name of current included file

int principal_line = 0; // principal file's line number
int principal_last_set = 0; // where principal file's line number was last set
char principal_file[128]; // path and name of principal file
bool in_user_file = false; // true if we are inside the users's source file

int main()
{
try
{
CPPLexer lexer(cin);
CPPParser parser(lexer);
        parser.init();
parser.translation_unit();
}
catch (exception& e)
{
cerr << "exception: " << e.what() << endl;
}
}

// Needed for #line_number directives generated by gcc -E preprocessing or msvc preprocessing
void process_line_directive(const char *includedFile, const char *includedLineNo)
{
// See global interface variables above
// Working variables
static int line, result;
static bool principal_file_set = false;
static int x;
//printf("Main entered\n");
// Extract included file line no.
result = sscanf(includedLineNo, "%d \n", &line);

//printf("Main: line %d\n",line);
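// remove first " from file path+name by shifting all characters left;
// stop at the closing ", at the end of the string, or when the buffer is full
for(x=1;x<(int)sizeof(currentIncludedFile) && includedFile[x]!='"' && includedFile[x]!='\0';x++)
{
currentIncludedFile[x-1] = includedFile[x];
}

// Check path and name are not too long (the copy above was truncated)
if(x>=(int)sizeof(currentIncludedFile) && includedFile[x]!='"')
{
//
printf("Path and name of included file too long\n");
printf("Increase length of currentIncludedFile and\n");
printf("  principal_file to more than %d characters\n",x);
printf("Abandon run\n");
getchar();    
}

// Replace last " from file name with a terminating null character
currentIncludedFile[x-1] = '\0';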

if (!principal_file_set)
{
strcpy (principal_file, currentIncludedFile);
principal_file_set = true;
}

// check for main file name
if (strcmp(principal_file, currentIncludedFile) == 0)
{
principal_line = line;
principal_last_set = this_line;
strcpy(currentIncludedFile, " "); // delete include file name
in_user_file = true; // we are processing users's .C or .CPP file (aka principal_file)
}
else
// Check that this is a genuine path
if(currentIncludedFile[1]==':')
{//printf("main.cpp 222 entered\n");
//printf("main.cpp 223 %s %s\n",principal_file,currentIncludedFile);
include_line = line;
include_last_set = this_line;
in_user_file = false; // we are processing a header file
}
}
---

Then compile:
gcc -I/usr/local/include -L/usr/local/lib *.cpp -lstdc++ -lantlr

To test:
run ./a.exe < Quadratic.i (msvc preprocessor processed file)

(With gcc, use gcc -E to preprocess. Its output is not fully supported yet: support for the gcc __attribute__ specifiers still has to be added to the CPP_parser.g grammar, and __builtin_va_list has to be added to its basic types.)

So it looks fairly complete as a c++ parser.  What would be best is if the grammar is modified to fully support gcc -E for handling preprocessing.  This way there's no bugs as well - tried and true tools like the gcc preprocessor are used to handle #includes, etc.  So code completion would be robust since it would have all the correct symbols defined in the current source file.  And then you could just do gcc -E *.cpp for all cpp project files if global code completion is enabled.
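
Since everything here hinges on the line markers the preprocessor leaves behind (`# 12 "foo.h" 1 3` from gcc -E, `#line 12 "foo.h"` from msvc), here is a sketch of parsing one such marker with `std::string` instead of fixed 128-byte buffers. It is not the code from the parser package: `LineMarker`, `skipBlanks` and `parseLineMarker` are hypothetical names, and the sketch assumes the file name contains no escaped quote characters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct LineMarker {
    bool valid;        // false if the text is not a line marker
    int line;          // line number given by the marker
    std::string file;  // file name between the quotes
};

// Returns the index of the first non-blank character at or after `i`.
static std::size_t skipBlanks(const std::string& s, std::size_t i) {
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t'))
        ++i;
    return i;
}

// Accepts `# <number> "<file>" ...` and `#line <number> "<file>"`.
LineMarker parseLineMarker(const std::string& text) {
    LineMarker result = {false, 0, std::string()};
    std::size_t i = skipBlanks(text, 0);
    if (i >= text.size() || text[i] != '#')
        return result;
    i = skipBlanks(text, i + 1);
    if (text.compare(i, 4, "line") == 0)          // optional "line" keyword
        i = skipBlanks(text, i + 4);
    if (i >= text.size() ||
        !std::isdigit(static_cast<unsigned char>(text[i])))
        return result;
    int line = 0;
    while (i < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[i])))
        line = line * 10 + (text[i++] - '0');
    i = skipBlanks(text, i);
    if (i >= text.size() || text[i] != '"')
        return result;
    std::size_t close = text.find('"', i + 1);    // closing quote of the name
    if (close == std::string::npos)
        return result;
    assert(close > i);
    result.valid = true;
    result.line = line;
    result.file = text.substr(i + 1, close - i - 1);
    return result;
}
```

For `# 12 "foo.h" 1 3` this gives line 12 and file `foo.h`; the trailing gcc flags are ignored. Because the name lives in a `std::string`, there is no length limit to check.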
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 26, 2005, 09:30:58 pm
I would recommend to use a library for pre-processing:

C/C++ pre-processor parsers:
ucpp (http://pornin.nerim.net/ucpp/)
Wave (http://spirit.sourceforge.net/)
Title: Re: New parser model for Code completion
Post by: grv575 on December 26, 2005, 10:49:33 pm
I did check out ucpp which compiles great on mingw but didn't handle simple:

#include <iostream>
or
#include <blah.h>

said it couldn't find them...the standard include path is set to /usr/include and /usr/local/include though.  But the gcc preprocessor does correctly pull in these header sources...
Title: Re: New parser model for Code completion
Post by: takeshimiya on December 26, 2005, 10:53:09 pm
Something must be wrong with the paths then.
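
For reference, the lookup a preprocessor does for `#include <name>` boils down to trying each search directory in order and taking the first hit, so a header is "not found" whenever its directory is missing from the list. A minimal sketch of that lookup follows. It is not ucpp's actual code, `resolveInclude` is a hypothetical name, and the file-existence check is passed in as a callback so the logic can be tried without touching the disk.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the first "<dir>/<name>" for which `exists` reports true,
// or an empty string if no search directory contains the header.
std::string resolveInclude(
        const std::string& name,
        const std::vector<std::string>& searchDirs,
        const std::function<bool(const std::string&)>& exists) {
    assert(exists);  // a usable existence check must be supplied
    for (std::size_t d = 0; d < searchDirs.size(); ++d) {
        std::string candidate = searchDirs[d];
        if (!candidate.empty() && candidate[candidate.size() - 1] != '/')
            candidate += '/';
        candidate += name;
        if (exists(candidate))
            return candidate;
    }
    return std::string();
}
```

If `iostream` sits in neither `/usr/include` nor `/usr/local/include`, this returns an empty string no matter how correct those two paths are. Running `gcc -v -E` on a file prints the list of directories gcc itself searches, which may be a useful list to compare against.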
Title: Re: New parser model for Code completion
Post by: Raindog on January 21, 2006, 12:51:49 am
So what are the plans after acquiring full C++ parsing ability?