Topic: Regular expressions

JGM
Re: Regular expressions
« Reply #30 on: March 11, 2008, 08:52:12 pm »
I saw this on the wiki:

Quote
Questions:

    * Why are we storing filepos_end? Wouldn't it be much more useful to store declaration and definition info?

I think that's an easy way to help with refactoring, and it consumes less memory, unless we use a database system like SQLite to store all the symbol data, which should be the case when implementing the culmination of all these ideas.
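To make the wiki question concrete, here is a minimal sketch (C++17) of a symbol record that keeps separate declaration and definition locations instead of a single filepos_end. `SourceLocation` and `SymbolRecord` are hypothetical names, not existing C::B code; a symbol that is only declared (e.g. a function prototype in a header) simply has no definition.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical location of a symbol occurrence inside the workspace.
struct SourceLocation {
    int file_id;  // id of the file in the workspace
    int line;     // 1-based line number
};

// Hypothetical symbol record: declaration and definition are stored
// separately, so "go to declaration" / "go to definition" and
// refactoring tools can jump straight to either one.
struct SymbolRecord {
    std::string name;
    SourceLocation declaration;
    std::optional<SourceLocation> definition;  // empty if only declared

    bool hasDefinition() const { return definition.has_value(); }

    // Where "go to definition" should jump: the definition if known,
    // otherwise fall back to the declaration.
    SourceLocation gotoTarget() const {
        return definition.value_or(declaration);
    }
};
```

The record stays small, and a refactoring tool can find both ends of a symbol without re-parsing.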

byo
« Reply #31 on: March 11, 2008, 10:21:10 pm »
Quote
Cool to see so much interest and effort in planning a new code completion plugin. I just wanted to point out that eranif has done a great job; I think his library supports everything that has been discussed here. Check out his sample videos at http://codelite.org. His code completion library has evolved into a full C/C++ IDE. Some time ago I talked to him about implementing a plugin for Code::Blocks that used his library, but I got a bit lazy and also ran out of time.

Maybe there's no need to reinvent the wheel if it's good enough  :D

I've looked into it and... well... I'm impressed by what this IDE can do now. I'll look into the source code to see whether it would be hard to integrate with C::B (the user interface and overall app design look very similar to C::B ;) ).

BYO

JGM
« Reply #32 on: March 12, 2008, 09:11:33 pm »
Actually, it uses most of the same components as Code::Blocks, such as Scintilla and wxWidgets, so maybe it's not so hard to use it in Code::Blocks  :)

byo
« Reply #33 on: March 12, 2008, 11:42:46 pm »
Quote from: JGM
Actually, it uses most of the same components as Code::Blocks, such as Scintilla and wxWidgets, so maybe it's not so hard to use it in Code::Blocks  :)

I've done some more investigations - CodeLite works perfectly in many cases, but it still has some problems with templates (for example, I couldn't get function templates to work). And any more complex template-based code won't work - but that's probably just a temporary issue ;)

And there's another problem - CodeLite uses the same approach as C::B for code completion - everything is inside one library, so we would have to isolate some parts (like the parser) first. We could also drop the idea of splitting CC into smaller parts, but I see that as a short-term solution - just imagine that someone wants to improve support for the D language by providing code completion for it - everything would have to be created again: class browsers, symbol storage, etc.

And the next thing - if we decide to try CodeLite's stuff, we will probably have to branch its code. So any further updates made in CodeLite would require some work to bring into C::B - as long as we don't change it much, everything will be fine, but if we have to make bigger changes, then we'll have a problem keeping the code up to date.

One more question for Ceniza: you said something about a new parser; what's the current progress? Any results yet?

Regards
   BYO

Seronis
« Reply #34 on: March 13, 2008, 12:34:52 am »
Just curious, has anyone emailed the CodeLite dev and asked them if they would be interested in trying to integrate it with C::B? They -might- be interested in that and might appreciate the design discussion already going on.

JGM
« Reply #35 on: March 13, 2008, 01:19:30 am »
Quote from: byo
One more question for Ceniza: you said something about a new parser; what's the current progress? Any results yet?

I would also like to know the status on the work of Ceniza  :D

Quote from: Seronis
Just curious, has anyone emailed the CodeLite dev and asked them if they would be interested in trying to integrate it with C::B?

I don't think that's possible, since eranif (the CodeLite author) is really busy working on the CodeLite project.

Jenna
« Reply #36 on: March 13, 2008, 01:28:38 am »
Quote from: JGM
I don't think that's possible, since eranif (the CodeLite author) is really busy working on the CodeLite project.

...and an active C::B forum member.

JGM
« Reply #37 on: March 13, 2008, 01:34:21 am »
Quote from: Jenna
...and an active C::B forum member.

There's some irony in that :roll:

eranif
« Reply #38 on: March 13, 2008, 06:36:49 am »
Quote from: byo
I've done some more investigations - CodeLite works perfectly in many cases, but it still has some problems with templates (for example, I couldn't get function templates to work). And any more complex template-based code won't work - but that's probably just a temporary issue
Indeed, CC for template functions is not supported yet, but it can be added with relatively little effort.

Quote from: JGM
I don't think that's possible, since eranif (the CodeLite author) is really busy working on the CodeLite project.
Indeed, there are many tasks to complete and only one developer... (myself)
However, I will be happy to help integrate it into a C::B plugin.

Quote from: JGM
There's some irony in that
I fail to see the irony in that; for me, hanging out in the C::B forum is like being a member of a forum dealing with my interest: IDEs.
I don't use C::B myself, but the topics here are interesting, nothing more.

About CodeLite CC: it uses ctags only for creating the symbol table (kept in an SQLite database); the actual parsing of expressions and statements is done using yacc & flex. The advantage of using yacc is that the grammar is maintainable and can be updated very easily.
Also, by using the database as the symbol table, you gain the ability to do more neat stuff, such as:
- CodeLite offers code completion for files
- it offers to automatically add include files for classes/functions/structs, etc.
and more.

Eran

Ceniza
« Reply #39 on: March 13, 2008, 09:33:54 am »
Quote from: byo
One more question for Ceniza: you said something about a new parser; what's the current progress? Any results yet?

I'm still working on the preprocessor, and I'm almost done with it (I still need to implement a few minor details). I'm also planning to slightly change the design of a few things (which should take just a few hours of work).

If you are interested in following its development more closely, go to http://svn.cenizasoft.cjb.net/ -> CCP. Oh, and don't be fooled by the SourceParser and SourceFormatter projects... I abandoned those.

Right now the repository is only accessible through SSH, but anonymous access should just be a matter of forwarding a port in the firewall. Just tell me if you are interested in trying the mess... er... sources, so I can look into that.

JGM
« Reply #40 on: March 13, 2008, 03:06:39 pm »
Cool! Could that code be configured to work with other languages?

It would be great to have a C::C SDK where you can define the tokens and what kind of token each one is, so that when you pass them to the parser it takes care of generating a generic symbol table (like the one described by BYO).

For example, in BASIC the token that ends a line of code is '\n' (I think), while in C++ it is ';'.
To begin and end a function body, BASIC uses "sub" or "function" and "end sub" or "end function", while C++ uses '{' and '}'.
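The BASIC vs C++ comparison can be expressed as plain configuration data. Here is a minimal sketch, assuming a hypothetical `LanguageSyntax` struct, in which the same splitting routine handles both languages just by swapping the statement terminator. Block delimiters could be added as further fields the same way, and real lexing concerns (strings, comments, line continuations) are ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-language configuration: only the statement terminator
// is modelled here.
struct LanguageSyntax {
    std::string name;
    char statementTerminator;
};

const LanguageSyntax kBasic{"BASIC", '\n'};
const LanguageSyntax kCpp{"C++", ';'};

// Splits source text into trimmed, non-empty statements using only the
// configured terminator (strings and comments are not handled).
std::vector<std::string> splitStatements(const std::string& src,
                                         const LanguageSyntax& lang) {
    std::vector<std::string> out;
    std::string current;
    auto flush = [&] {
        const auto first = current.find_first_not_of(" \t\r\n");
        if (first != std::string::npos) {
            const auto last = current.find_last_not_of(" \t\r\n");
            out.push_back(current.substr(first, last - first + 1));
        }
        current.clear();
    };
    for (char c : src) {
        if (c == lang.statementTerminator) flush();
        else current += c;
    }
    flush();  // trailing statement without a terminator
    return out;
}
```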

There should be some method (strategy?) to create a C::C SDK that can be configured to fit any language's needs, and that could at least generate a symbol table based on options the developer passes to the framework. I learned here that the yacc and flex tools exist, but they are somewhat complex to use and adapt to different languages.

Maybe we could describe the order or structure in which each token can appear, so that when the parser encounters it, it takes the necessary actions to populate the symbol table in a specified format.

Example:

TokenContainer.Add("(");
TokenContainer["("].Describe(BEGIN_BLOCK);

TokenContainer.Add(")");
TokenContainer[")"].Describe(END_BLOCK("(")); //here we tell it ends the "(" block

TokenContainer.Add("{");
TokenContainer["{"].Describe(BEGIN_BLOCK);

TokenContainer.Add("}");
TokenContainer["}"].Describe(END_BLOCK("{")); //here we tell it ends the "{" block

TokenContainer.Add("-");
TokenContainer["-"].Describe(MINUS_OPERATOR);

TokenContainer.Add("+");
TokenContainer["+"].Describe(PLUS_OPERATOR);

TokenContainer.Add("operator");
TokenContainer["operator"].AddConfiguration("+ || -  && ( ... ) || { ... } || ;");
TokenContainer["operator"].AddConfiguration("* || / && ( ... ) || { ... } || ;");

Parser.AddDirectory("include");
Parser.FileExtensions("h cpp");
Parser.AddTokens(TokenContainer);
Parser.StartParsing();

SymbolsTable Symbols = Parser.GetSymbols();

Legend:
|| = or
&& = and
... = anything that should be parsed and is part of the current token

So when the parser starts working and encounters the "operator" token, it knows the token can be followed by a + or - token, then a pair of parentheses with anything inside, then an optional body {} and an optional ;.
This is just an idea and could be improved.
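The BEGIN_BLOCK / END_BLOCK descriptions in the pseudocode above are already enough to drive a simple block-matching pass. Here is a runnable sketch of just that piece, assuming a hypothetical `TokenContainer` with `describeBegin` / `describeEnd` methods (not the exact `Describe(...)` API written above); it only checks, with a stack, that blocks are opened and closed in properly nested order.

```cpp
#include <cassert>
#include <map>
#include <stack>
#include <string>
#include <vector>

enum class TokenRole { BeginBlock, EndBlock, Other };

// Hypothetical token registry in the spirit of the TokenContainer idea.
class TokenContainer {
public:
    void describeBegin(const std::string& tok) { roles_[tok] = TokenRole::BeginBlock; }
    // `opener` is the token whose block this one closes, e.g. ")" closes "(".
    void describeEnd(const std::string& tok, const std::string& opener) {
        roles_[tok] = TokenRole::EndBlock;
        closes_[tok] = opener;
    }

    TokenRole role(const std::string& tok) const {
        auto it = roles_.find(tok);
        return it == roles_.end() ? TokenRole::Other : it->second;
    }

    // True if every block opened in `tokens` is closed by its matching
    // end token, in properly nested order.
    bool blocksBalanced(const std::vector<std::string>& tokens) const {
        std::stack<std::string> open;
        for (const auto& tok : tokens) {
            switch (role(tok)) {
            case TokenRole::BeginBlock:
                open.push(tok);
                break;
            case TokenRole::EndBlock:
                if (open.empty() || open.top() != closes_.at(tok)) return false;
                open.pop();
                break;
            case TokenRole::Other:
                break;
            }
        }
        return open.empty();
    }

private:
    std::map<std::string, TokenRole> roles_;
    std::map<std::string, std::string> closes_;
};
```

A real parser would attach actions (such as creating a symbol) to these roles instead of only checking nesting.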

There should also be other configuration options to tell the parser how to store what it finds in the symbol table. That's something to think about  :)

Programming languages have many things in common; it would be great if we could wrap all of them up and think in a generic way, but it is difficult  :(
« Last Edit: March 13, 2008, 03:10:21 pm by JGM »

Ceniza
« Reply #41 on: March 13, 2008, 03:54:34 pm »
Quote from: JGM
Cool! Could that code be configured to work with other languages?

Not really. The preprocessing stage is only for C++. I think the multi-language functionality should be implemented differently, perhaps by creating a plugin to which parser plugins can be connected. I don't really see a reason to mix many languages in the same symtab.
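The idea of one code-completion plugin to which per-language parser plugins connect could look roughly like the following sketch. `ParserPlugin`, `ParserHost` and the dispatch by file extension are assumptions made for illustration, not an existing C::B SDK interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface that each language parser plugin would implement.
class ParserPlugin {
public:
    virtual ~ParserPlugin() = default;
    virtual std::string language() const = 0;
    virtual std::vector<std::string> extensions() const = 0;  // e.g. {"cpp", "h"}
};

// Hypothetical host plugin: keeps a registry and routes each file to the
// parser plugin that registered the file's extension.
class ParserHost {
public:
    void registerParser(const std::shared_ptr<ParserPlugin>& p) {
        for (const auto& ext : p->extensions()) byExtension_[ext] = p;
    }

    // Returns nullptr when no plugin handles this kind of file.
    ParserPlugin* parserFor(const std::string& filename) const {
        const auto dot = filename.rfind('.');
        if (dot == std::string::npos) return nullptr;
        const auto it = byExtension_.find(filename.substr(dot + 1));
        return it == byExtension_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::shared_ptr<ParserPlugin>> byExtension_;
};

// Stub C++ parser plugin used to exercise the registry.
class CppParserPlugin : public ParserPlugin {
public:
    std::string language() const override { return "C++"; }
    std::vector<std::string> extensions() const override { return {"cpp", "h"}; }
};
```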

JGM
« Reply #42 on: March 13, 2008, 04:23:59 pm »
Quote from: Ceniza
I don't really see a reason to mix many languages in the same symtab.

To be able to create a generic, customizable parser that, through configuration, is capable of parsing any language? The symbol table would be used for different languages, with some sort of document explaining how it is used, like the tar file header structure, which stores a string at a specific location to indicate whether the archive is ustar or not, while keeping its native format.
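For readers unfamiliar with the tar analogy: a POSIX ustar header is a 512-byte block whose 6-byte `magic` field starts at byte offset 257 and holds "ustar", which lets a reader tell ustar archives apart from older tar formats while the rest of the layout stays the same. A minimal check is sketched below; `isUstarHeader` is a hypothetical helper, not part of any tar library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kTarBlockSize = 512;
constexpr std::size_t kMagicOffset = 257;  // after name, mode, ..., linkname

// Returns true if the 512-byte header block carries the "ustar" magic
// (covers both POSIX "ustar\0" and the older GNU "ustar " variant).
bool isUstarHeader(const std::array<unsigned char, kTarBlockSize>& header) {
    return std::memcmp(header.data() + kMagicOffset, "ustar", 5) == 0;
}
```

A shared symbol table could carry a language tag at a fixed place in the same spirit, leaving the per-language data in its native form.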

It would also serve as an interface (base) that helps developers re-implement it for other languages, speeding up the creation of other parsers.

Maybe there's a way to make the symbol table more customizable.

Example:

=================================================================
SymbolsContainer.SetLanguage("C++");
SymbolsContainer.AddSymbol(theSymbolToAdd);

#include <string>

template <typename Types, typename Flags, typename ChildrensList, typename ExtraList, typename ExtraValues>
class Symbols {
public:
    std::string   name;              // Name of the symbol
    int           id;                // Id of the symbol, should be unique in the workspace
    int           file_id;           // Id of the file where the symbol has been declared
    int           filepos_begin;     // Position where the declaration of the symbol starts
    int           filepos_end;       // Position where the declaration of the symbol ends
    Types         type;              // Type of the symbol: macro / class / typedef / variable / function
    Flags         modifiers;         // Bitfield used to mark some extra properties of the symbol,
                                     // like being static or inline
    int           value_type_id;     // Id of the symbol which represents the C++ type of the current symbol
                                     // (like the type of a variable or the return type of a function)
    int           extra_type_id;     // Extra type used in some cases
    ChildrensList children;          // List of child elements of this symbol (members of a class, etc.)
    ExtraList     extra_lists[3];    // See table below
    ExtraValues   extra_values;      // int -> string map which can keep some extra data
};
=================================================================
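One way the template parameters could be used is to let each language plug in its own kind enumeration and containers while the field layout stays fixed. The sketch below is self-contained, so it redeclares a trimmed version of the class; `CppKind`, `CppModifier`, `CppSymbol` and the chosen container types are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Trimmed, compilable variant of the Symbols template (extra_lists and
// the position fields are left out to keep the sketch short).
template <typename Types, typename Flags, typename ChildrensList, typename ExtraValues>
struct Symbols {
    std::string name;
    int id = 0;
    Types type{};              // language-specific symbol kind
    Flags modifiers{};         // language-specific modifier bits
    ChildrensList children;    // ids of child symbols (members, etc.)
    ExtraValues extra_values;  // int -> string map for extra data
};

// Hypothetical C++-specific choices for the template parameters.
enum class CppKind { Macro, Class, Typedef, Variable, Function };
enum CppModifier : std::uint32_t { ModNone = 0, ModStatic = 1u << 0, ModInline = 1u << 1 };

using CppSymbol = Symbols<CppKind, std::uint32_t, std::vector<int>,
                          std::map<int, std::string>>;
```

A BASIC parser could instantiate the same template with its own kind enumeration (for example Sub, Function, Variable) without touching the field layout.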
« Last Edit: March 13, 2008, 04:26:17 pm by JGM »

byo
« Reply #43 on: March 13, 2008, 09:27:50 pm »
Quote from: Ceniza
Not really. The preprocessing stage is only for C++. I think the multi-language functionality should be implemented differently, perhaps by creating a plugin to which parser plugins can be connected. I don't really see a reason to mix many languages in the same symtab.

The goal is not to mix symbols in one table, but if we could guarantee that most languages can use one common representation of the symbol table, it would be possible to add support for a new language just by writing its parser. Other components, like the symbol browser, completion window, call tooltips, etc., would stay the same, with no need to duplicate code.
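That point can be illustrated with a small sketch: if every parser emits the same records, a single completion routine serves all languages. `CommonSymbol` and `completePrefix` are hypothetical names, and the two "parsers" are just hard-coded lists in the usage; a symbol browser or call-tip provider would consume the same records in the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical language-neutral symbol record shared by all parsers.
struct CommonSymbol {
    std::string name;
    std::string kind;      // "function", "class", "sub", ...
    std::string language;  // which parser produced it
};

// Shared completion logic: case-sensitive prefix match over any parser's
// output, returned sorted by name.
std::vector<std::string> completePrefix(const std::vector<CommonSymbol>& table,
                                        const std::string& prefix) {
    std::vector<std::string> out;
    for (const auto& s : table)
        if (s.name.compare(0, prefix.size(), prefix) == 0) out.push_back(s.name);
    std::sort(out.begin(), out.end());
    return out;
}
```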

Regards
   BYO

stevenkaras
« Reply #44 on: March 13, 2008, 09:48:36 pm »
Wow. I go away for a day or two and look what happens. This is really starting to come together as a team effort.

JGM> As for decoupling the language, the reason you can't simply make the symbol table abstract is that languages are very different. For example, Java doesn't have multiple inheritance of classes, BASIC doesn't support classes, and so on. The syntaxes are very similar, but the actual structure of each language is so different that you'd have a hard time describing conflicting language elements. How would you represent BASIC's GOSUB and procedures compared with C functions? However, as BYO pointed out, as long as the symbol table is abstract enough in its use, extending it to other languages is very likely to succeed.

Right now I think the best step forward would include the following:
1. Do some tests with ctags for the symbol table.
2. Decide on a storage format for the symbols to be provided to the other components.
3. Have someone play with flex and come up with a good language spec for C++ (worry about C later).
4. Have someone else play with bison and work on the C++ file for the parser.
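For step 2, one very simple candidate storage format is one tab-separated line per symbol, which is easy to produce from ctags output and easy to read back. The sketch assumes a hypothetical four-field record (name, kind, file, line) and does no escaping, so names containing tabs or newlines are not supported.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical on-disk record: name, kind, file, line (tab-separated).
struct StoredSymbol {
    std::string name;
    std::string kind;
    std::string file;
    int line = 0;
};

std::string serialize(const StoredSymbol& s) {
    return s.name + '\t' + s.kind + '\t' + s.file + '\t' + std::to_string(s.line);
}

// Returns std::nullopt if a field is missing or the line number is invalid.
std::optional<StoredSymbol> deserialize(const std::string& text) {
    std::istringstream in(text);
    StoredSymbol s;
    std::string lineField;
    if (!std::getline(in, s.name, '\t') || !std::getline(in, s.kind, '\t') ||
        !std::getline(in, s.file, '\t') || !std::getline(in, lineField))
        return std::nullopt;
    try {
        s.line = std::stoi(lineField);
    } catch (...) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    return s;
}
```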

I think that should be about it. Let me know if I forgot something.

Cheers!
steven