Cool! Could that code be configured to work with any other language?
It would be great to have a C::C SDK where you can define the tokens and what kind of token each one is, so that when you pass them to the parser it takes care of generating a generic symbols table (like the one described by BYO).
For example, in BASIC the token that ends a line of code is '\n' (I think), while in C++ it is ';'.
A function body begins and ends with "sub" or "function" and "end sub" or "end function" in BASIC, while in C++ it begins and ends with '{' and '}'.
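To make the BASIC versus C++ difference concrete, here is a minimal sketch of how such a per-language description might look. Everything in it (LanguageProfile, makeCppProfile, makeBasicProfile, countStructure) is a made-up name for illustration, not part of any existing SDK, and it assumes the input is already split into tokens, with "end sub" arriving as a single token.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical per-language description; none of these names exist in a real SDK.
struct LanguageProfile {
    std::string statementEnd;             // token that finishes a statement
    std::vector<std::string> blockBegin;  // tokens that open a function body
    std::vector<std::string> blockEnd;    // tokens that close a function body
};

inline LanguageProfile makeCppProfile() {
    return {";", {"{"}, {"}"}};
}

inline LanguageProfile makeBasicProfile() {
    return {"\n", {"sub", "function"}, {"end sub", "end function"}};
}

static bool contains(const std::vector<std::string>& list, const std::string& token) {
    return std::find(list.begin(), list.end(), token) != list.end();
}

struct Counts {
    int statements = 0;  // number of statement terminators seen
    int blocks = 0;      // number of function bodies opened
};

// Walks an already tokenized input and counts statements and function bodies,
// using only what the profile says about the language.
Counts countStructure(const std::vector<std::string>& tokens, const LanguageProfile& lang) {
    Counts c;
    for (const auto& t : tokens) {
        if (t == lang.statementEnd) {
            ++c.statements;
        } else if (contains(lang.blockBegin, t)) {
            ++c.blocks;
        }
    }
    return c;
}
```

The same countStructure function works for both languages; only the profile passed in changes, which is the kind of configurability I have in mind.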
There should be some method (strategy?) to create a C::C SDK that can be configured to fit the needs of any language, and that could at least generate a symbols table based on configuration options the developer passes to the framework. I learned here that tools like yacc and flex exist, but they are rather complex to use and to adapt for different languages.
Maybe we could describe the order or structure in which each token can appear, so that when the parser encounters it, it takes the necessary actions to populate the symbols table in a specified format.
Example:
TokenContainer.Add("(");
TokenContainer["("].Describe(BEGIN_BLOCK);
TokenContainer.Add(")");
TokenContainer[")"].Describe(END_BLOCK("(")); //here we tell it ends the "(" block
TokenContainer.Add("{");
TokenContainer["{"].Describe(BEGIN_BLOCK);
TokenContainer.Add("}");
TokenContainer["}"].Describe(END_BLOCK("{")); //here we tell it ends the "{" block
TokenContainer.Add("-");
TokenContainer["-"].Describe(MINUS_OPERATOR);
TokenContainer.Add("+");
TokenContainer["+"].Describe(PLUS_OPERATOR);
TokenContainer.Add("operator");
TokenContainer["operator"].AddConfiguration("+ || - && ( ... ) || { ... } || ;");
TokenContainer["operator"].AddConfiguration("* || / && ( ... ) || { ... } || ;");
Parser.AddDirectory("include");
Parser.FileExtensions("h cpp");
Parser.AddTokens(TokenContainer);
Parser.StartParsing();
SymbolsTable Symbols = Parser.GetSymbols(); //all the symbols found while parsing
Legend:
|| = or
&& = and
... = anything that should be parsed and is part of the current token
So when the parser starts working and encounters the "operator" token, it knows the token can be followed by a + or - token, then a pair of parentheses with anything inside, then an optional body {} and an optional ;
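As a rough sketch of how the parser could act on that description, the code below hard-codes the first "operator" rule from the example (the + and - one) instead of parsing the configuration string. The Role enum, the three-argument Describe, Lookup, SkipBlock, MatchOperator, MakeExampleTokens and kNoMatch are all assumptions of mine for illustration, and the input is assumed to be tokenized already.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token roles, loosely following the names used in the example above.
enum class Role { BeginBlock, EndBlock, Operator, Other };

struct TokenInfo {
    Role role;
    std::string opens;  // for EndBlock: the token that opened the block
};

class TokenContainer {
public:
    void Describe(const std::string& tok, Role role, const std::string& opens = "") {
        table_[tok] = TokenInfo{role, opens};
    }
    TokenInfo Lookup(const std::string& tok) const {
        auto it = table_.find(tok);
        return it == table_.end() ? TokenInfo{Role::Other, ""} : it->second;
    }
private:
    std::map<std::string, TokenInfo> table_;
};

const std::size_t kNoMatch = static_cast<std::size_t>(-1);

// Returns the index one past the block that starts at tokens[i], or kNoMatch
// if tokens[i] does not open a block or the block is never closed properly.
std::size_t SkipBlock(const std::vector<std::string>& tokens, std::size_t i,
                      const TokenContainer& tc) {
    if (i >= tokens.size() || tc.Lookup(tokens[i]).role != Role::BeginBlock) return kNoMatch;
    std::vector<std::string> open;  // stack of block tokens that are still open
    for (; i < tokens.size(); ++i) {
        TokenInfo info = tc.Lookup(tokens[i]);
        if (info.role == Role::BeginBlock) {
            open.push_back(tokens[i]);
        } else if (info.role == Role::EndBlock) {
            if (open.empty() || open.back() != info.opens) return kNoMatch;
            open.pop_back();
            if (open.empty()) return i + 1;
        }
    }
    return kNoMatch;
}

// Matches: "operator", then a described operator sign, then "( ... )",
// then an optional "{ ... }" body and an optional ";".
// Returns the index one past the match, or kNoMatch when the rule does not apply.
std::size_t MatchOperator(const std::vector<std::string>& tokens, std::size_t i,
                          const TokenContainer& tc) {
    if (i + 2 >= tokens.size() || tokens[i] != "operator") return kNoMatch;
    if (tc.Lookup(tokens[i + 1]).role != Role::Operator) return kNoMatch;
    if (tokens[i + 2] != "(") return kNoMatch;
    std::size_t next = SkipBlock(tokens, i + 2, tc);
    if (next == kNoMatch) return kNoMatch;
    if (next < tokens.size() && tokens[next] == "{") {
        next = SkipBlock(tokens, next, tc);
        if (next == kNoMatch) return kNoMatch;
    }
    if (next < tokens.size() && tokens[next] == ";") ++next;
    return next;
}

// Builds the token descriptions used in the example above.
TokenContainer MakeExampleTokens() {
    TokenContainer tc;
    tc.Describe("(", Role::BeginBlock);
    tc.Describe(")", Role::EndBlock, "(");
    tc.Describe("{", Role::BeginBlock);
    tc.Describe("}", Role::EndBlock, "{");
    tc.Describe("-", Role::Operator);
    tc.Describe("+", Role::Operator);
    return tc;
}
```

Calling MatchOperator on the tokens of "operator + ( int a ) { return a ; }" consumes the whole definition, while "operator *" is rejected here because "*" was never described. A real SDK would of course build the matching from the configuration string instead of hard-coding it.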
This is just an idea and could be improved.
Also, there should be other configuration options to tell the parser how to store what it found in the symbols table. That's something to think about.
Programming languages have many things in common. It would be nice if we could wrap all those things and think about them in a generic way, but that is difficult.
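One possible way to configure the storing part, just as a sketch: let the developer attach a small "store action" to each rule, which the parser calls with whatever it matched. The Symbol fields, Match, StoreAction and StoreOperator below are all invented for illustration; I am only guessing at what a symbols table entry would need to hold.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical record; the fields are only a guess at what a symbols table might hold.
struct Symbol {
    std::string name;
    std::string kind;  // e.g. "function" or "operator"
    int line;
};

using SymbolsTable = std::vector<Symbol>;

// What the parser would hand over when a configured rule matches.
struct Match {
    std::string ruleName;             // e.g. "operator"
    std::vector<std::string> tokens;  // the tokens covered by the match
    int line;                         // line where the match started
};

// A store action decides how a match becomes entries in the table.
using StoreAction = std::function<void(const Match&, SymbolsTable&)>;

// Example action: record "operator +" as one symbol of kind "operator".
inline void StoreOperator(const Match& m, SymbolsTable& table) {
    if (m.tokens.size() < 2) return;  // need the keyword and the operator sign
    table.push_back(Symbol{m.tokens[0] + " " + m.tokens[1], "operator", m.line});
}
```

With this shape the parser itself never needs to know the table format; a different language or a different IDE feature could plug in its own action.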