Regular expressions


JGM:
Cool! Could that code be configured to work with any other language?

It would be great to have a C::C SDK where you can define the tokens and what kind of token each one is, so that when you pass them to the parser it takes care of generating a generic symbols table (like the one described by BYO).

For example, in BASIC the token that ends a line of code is '\n' (I think), while in C++ it is ';'.
To begin and end a function body, BASIC uses "sub" or "function" and "end sub" or "end function", while C++ uses '{' and '}'.

There should be some method (strategy?) to create a C::C SDK that can be configured to fit the needs of any language, or that can at least generate a symbols table based on configuration options the developer passes to the framework. I learned here that the yacc and flex tools exist, but they are somewhat complex to use and adapt for different languages.

Maybe we could describe the order or structure in which every token can appear, so that when the parser encounters it, it takes the necessary actions to populate the symbols table in a specified format.

Example:

TokenContainer.Add("(");
TokenContainer["("].Describe(BEGIN_BLOCK);

TokenContainer.Add(")");
TokenContainer[")"].Describe(END_BLOCK("(")); //here we tell it ends the "(" block

TokenContainer.Add("{");
TokenContainer["{"].Describe(BEGIN_BLOCK);

TokenContainer.Add("}");
TokenContainer["}"].Describe(END_BLOCK("{")); //here we tell it ends the "{" block

TokenContainer.Add("-");
TokenContainer["-"].Describe(MINUS_OPERATOR);

TokenContainer.Add("+");
TokenContainer["+"].Describe(PLUS_OPERATOR);

TokenContainer.Add("operator");
TokenContainer["operator"].AddConfiguration("+ || - && ( ... ) || { ... } || ;");
TokenContainer["operator"].AddConfiguration("* || / && ( ... ) || { ... } || ;");

Parser.AddDirectory("include");
Parser.FileExtensions("h cpp");
Parser.AddTokens(TokenContainer);
Parser.StartParsing();

SymbolsTable Symbols = Parser.GetSymbols(); // all symbols found by the parser

Legend:
|| = or
&& = and
... = anything that should be parsed and is part of the current token

So when the parser starts working and encounters the "operator" token, it can be followed by a + or - token, followed by a pair of parentheses with anything inside, followed by an optional body {} and an optional ';'.
This is just an idea and could be improved.
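
A minimal C++ sketch of the token-description idea above, not an existing C::B or C::C API. The names TokenContainer, Role, TokenInfo, BlocksBalanced and MakeCppLikeTokens are all hypothetical, and the sketch only covers the BEGIN_BLOCK / END_BLOCK part: tokens are described in a table, and a generic routine checks that every opened block is closed by its matching token.

```cpp
#include <cassert>
#include <map>
#include <stack>
#include <string>
#include <vector>

// Hypothetical token roles, mirroring BEGIN_BLOCK / END_BLOCK / operators.
enum class Role { BeginBlock, EndBlock, Operator };

struct TokenInfo {
    Role role;
    std::string opens;  // for EndBlock: the token that opened the block
};

// Table of token descriptions, filled in per language.
class TokenContainer {
public:
    void Add(const std::string& text, Role role, const std::string& opens = "") {
        assert(!text.empty());  // an empty token can never be matched
        table_[text] = TokenInfo{role, opens};
    }

    // Returns nullptr for tokens that were never described.
    const TokenInfo* Find(const std::string& text) const {
        auto it = table_.find(text);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, TokenInfo> table_;
};

// Language-independent check: true when every block opened in `tokens`
// is closed, in the right order, by the token registered as its end.
bool BlocksBalanced(const TokenContainer& tc,
                    const std::vector<std::string>& tokens) {
    std::stack<std::string> open;
    for (const auto& t : tokens) {
        const TokenInfo* info = tc.Find(t);
        if (!info) continue;  // identifiers, literals, undescribed keywords
        if (info->role == Role::BeginBlock) {
            open.push(t);
        } else if (info->role == Role::EndBlock) {
            if (open.empty() || open.top() != info->opens) return false;
            open.pop();
        }
    }
    return open.empty();
}

// One possible configuration for a C++-like language.
TokenContainer MakeCppLikeTokens() {
    TokenContainer tc;
    tc.Add("(", Role::BeginBlock);
    tc.Add(")", Role::EndBlock, "(");
    tc.Add("{", Role::BeginBlock);
    tc.Add("}", Role::EndBlock, "{");
    tc.Add("+", Role::Operator);
    tc.Add("-", Role::Operator);
    return tc;
}
```

A BASIC-like configuration would reuse BlocksBalanced unchanged and only register different tokens, for example "sub" as a BeginBlock and "end sub" as its EndBlock, provided the tokenizer delivers "end sub" as a single token.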

There should also be other configuration options to tell the parser how to store what it finds in the symbols table. That's something to think about :)

Programming languages have many things in common. It would help if we could wrap all those things up and think in a generic way, but it is difficult :(

Ceniza:

--- Quote ---Cool! Could that code be configured to work with any other language?

--- End quote ---

Not really. The preprocessing stage is only for C++. I think the multi-language functionality should be implemented differently, perhaps creating a plugin where parser plugins can be connected. I don't really see a reason to mix many languages in the same symtab.

JGM:

--- Quote from: Ceniza on March 13, 2008, 03:54:34 pm ---I don't really see a reason to mix many languages in the same symtab.

--- End quote ---

To be able to create a generic and customizable parser that, with configuration, is capable of parsing any language? The symbols table would be used for different languages, with some sort of documentation that explains how it is used, like the tar file structure, which stores a string at a specific location to indicate whether the archive is ustar or not while keeping its native format.

Also, to serve as an interface (base) that helps a developer re-implement it for other languages, accelerating the process of creating other parsers.

Maybe there's a way to make the symbols table more customizable.

Example:
=================================================================
SymbolsContainer.SetLanguage("C++");
SymbolsContainer.AddSymbol(theSymbolToAdd);

template <typename Types, typename Flags, typename ChildrensList, typename ExtraList, typename ExtraValues>
class Symbols {
    string name;              // name of the symbol
    int id;                   // Id of the symbol, should be unique in the workspace
    int file_id;              // Id of file where the symbol has been declared
    int filepos_begin;        // Position where declaration of the symbol starts
    int filepos_end;          // Position where declaration of the symbol ends
    Types type;               // Type of the symbol: macro / class / typedef / variable / function
    Flags modifiers;          // Bitfield used to mark some extra properties of symbol
                              // like that it is static or inline
    int value_type_id;        // Id of symbol which represents C++ type of current symbol
                              // (like type of variable or type of returned value from function)
    int extra_type_id;        // Extra type used in some cases
    ChildrensList children;   // List of child elements of this symbol (members in class etc)
    ExtraList extra_lists[3]; // See table below
    ExtraValues extra_values; // int -> string map which can keep some extra data
};
=================================================================
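
As a rough illustration of how such a templated symbol could be specialized for one language, here is a trimmed-down sketch. It keeps only a few of the fields listed above, and everything in it (Symbol, CppType, CppFlags, CppSymbol, the templated SymbolsContainer and BuildExample) is an assumption made for the example rather than code from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Trimmed-down version of the proposed symbol: the language supplies
// its own symbol kinds (Types) and modifier bitfield (Flags).
template <typename Types, typename Flags>
struct Symbol {
    std::string name;
    int id = -1;                              // assigned by the container
    int file_id = -1;
    Types type{};
    Flags modifiers{};
    std::vector<int> children;                // ids of child symbols
    std::map<int, std::string> extra_values;  // int -> string extra data
};

// Hypothetical C++-specific choices for the two template parameters.
enum class CppType { Macro, Class, Typedef, Variable, Function };
enum CppFlags : unsigned { kNone = 0, kStatic = 1u << 0, kInline = 1u << 1 };
using CppSymbol = Symbol<CppType, unsigned>;

// Container that hands out ids; here an id is simply the symbol's index.
template <typename SymbolT>
class SymbolsContainer {
public:
    int AddSymbol(SymbolT symbol) {
        symbol.id = static_cast<int>(symbols_.size());
        symbols_.push_back(std::move(symbol));
        return symbols_.back().id;
    }

    void AddChild(int parent, int child) {
        assert(parent >= 0 && child >= 0);
        symbols_.at(static_cast<std::size_t>(parent)).children.push_back(child);
    }

    const SymbolT& Get(int id) const {
        return symbols_.at(static_cast<std::size_t>(id));
    }

    std::size_t Count() const { return symbols_.size(); }

private:
    std::vector<SymbolT> symbols_;
};

// Builds a two-symbol table: class Parser with an inline member function.
SymbolsContainer<CppSymbol> BuildExample() {
    SymbolsContainer<CppSymbol> table;

    CppSymbol cls;
    cls.name = "Parser";
    cls.type = CppType::Class;
    int cls_id = table.AddSymbol(cls);

    CppSymbol fn;
    fn.name = "StartParsing";
    fn.type = CppType::Function;
    fn.modifiers = kInline;
    int fn_id = table.AddSymbol(fn);

    table.AddChild(cls_id, fn_id);
    return table;
}
```

Another language would define its own kind enum and flag bits and reuse the same Symbol and SymbolsContainer templates. The cost of this design is that code consuming the table must itself be templated or know which instantiation it is reading.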

byo:

--- Quote from: Ceniza on March 13, 2008, 03:54:34 pm ---Not really. The preprocessing stage is only for C++. I think the multi-language functionality should be implemented differently, perhaps creating a plugin where parser plugins can be connected. I don't really see a reason to mix many languages in the same symtab.

--- End quote ---

The goal is not to mix symbols in one table. If we could guarantee that most languages can use one common representation of the symbol table, it would be possible to add support for a new language just by writing its parser. Other parts such as the symbol browser, completion window, call tooltips, etc. would stay the same, with no need to duplicate code.
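
One way to picture this split, as a sketch only: each language contributes a parser behind a common interface, and shared components such as the completion window read nothing but the common symbol table. The names LanguageParser, ParserRegistry, ToyBasicParser and CompletionList are invented for this example, and the toy parser recognizes only lines of the form "sub NAME" or "function NAME".

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Common, language-neutral symbol representation (heavily simplified).
struct Symbol {
    std::string name;
    std::string kind;
};
using SymbolTable = std::vector<Symbol>;

// Interface each language parser would implement.
class LanguageParser {
public:
    virtual ~LanguageParser() = default;
    virtual SymbolTable Parse(const std::string& source) const = 0;
};

// Toy parser for illustration: records "sub NAME" and "function NAME"
// lines and ignores everything else, including "end sub".
class ToyBasicParser : public LanguageParser {
public:
    SymbolTable Parse(const std::string& source) const override {
        SymbolTable out;
        std::istringstream lines(source);
        std::string line;
        while (std::getline(lines, line)) {
            std::istringstream words(line);
            std::string keyword, name;
            if (words >> keyword >> name &&
                (keyword == "sub" || keyword == "function")) {
                out.push_back({name, "function"});
            }
        }
        return out;
    }
};

// Maps a file extension to the parser registered for it.
class ParserRegistry {
public:
    void Register(const std::string& ext, std::unique_ptr<LanguageParser> p) {
        assert(p != nullptr);
        parsers_[ext] = std::move(p);
    }

    const LanguageParser* Find(const std::string& ext) const {
        auto it = parsers_.find(ext);
        return it == parsers_.end() ? nullptr : it->second.get();
    }

private:
    std::map<std::string, std::unique_ptr<LanguageParser>> parsers_;
};

// Shared consumer standing in for the completion window: it only sees
// the symbol table, never the language that produced it.
std::vector<std::string> CompletionList(const SymbolTable& table,
                                        const std::string& prefix) {
    std::vector<std::string> out;
    for (const auto& s : table) {
        if (s.name.compare(0, prefix.size(), prefix) == 0) out.push_back(s.name);
    }
    return out;
}
```

Adding a language then means registering one more LanguageParser, while CompletionList and any other consumer stay untouched.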

Regards
BYO

stevenkaras:
Wow. I go away for a day or two and look what happens. This is really starting to come together as a team effort.

JGM> As for decoupling the language, the reason you can't simply make the symbol table abstract is that languages are very different. For example, Java doesn't have multiple inheritance, BASIC doesn't support classes, and so on. The syntaxes are very similar, but the actual structure of each language is so different that you'd have a hard time describing conflicting language elements. How would you implement BASIC's GOSUB and PROCEDURES as compared with C functions? However, as BYO pointed out, as long as the symbol table is abstract enough in its use, the chances of successfully extending the language features are good.

Right now I think the best step forward would include the following:
1. do some tests with ctags for the symbol table
2. decide on a storage format for the symbols to be provided to the other components
3. have someone play with flex and come up with a good language spec for C++ (worry about C later)
4. have someone else play with bison and work on the C++ file for the parser

I think that should be about it. Let me know if I forgot something.

Cheers!
steven
