
Code completion doesn't follow #include in struct


Ceniza:

--- Quote from: JGM on March 24, 2011, 09:17:59 pm ---So does this code work to evaluate macro expressions/conditions like these, for example?


--- Code: ---#if VERBOSE >= 2
  print("trace message");
#endif

#if !defined(WIN32) || defined(__MINGW32__)
...
#endif

--- End code ---

I'm not sure if I'm using the correct terminology (sorry for that, I'm a dummy xD). It would be easier to talk to you in Spanish xD, there's a lot of parsing terminology I'm not familiar with :(

--- End quote ---

You need to tokenize and fully macro expand everything before feeding the evaluator.

For the first case you would need to expand VERBOSE to whatever its value is. Supposing it expands to '1', you would feed it:


--- Code: ---[ttNumber, "1"][ttWhiteSpace, " "][ttGreaterEqual, ">="][ttWhiteSpace, " "][ttNumber, "2"][ttEndOfTokens, ""]
--- End code ---

For the second case you would need to expand defined(WIN32) and defined(__MINGW32__). Supposing both are defined, you would feed it:


--- Code: ---[ttNot, "!"][ttNumber, "1"][ttWhiteSpace, " "][ttOr, "||"][ttWhiteSpace, " "][ttNumber, "1"][ttEndOfTokens, ""]
--- End code ---

Since it is a conditional (#if), all you care about is whether the result is 0 or not.

Instead of ttEndOfTokens as the terminator, ttNewLine could also be added and handled the same way (it would have to be added to the list of token types too).

If you want to learn more about the preprocessor in order to know what really needs to be implemented, check the C++0x draft chapters 2 (Lexical conventions) and 16 (Preprocessing directives) here.
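
For reference, this is roughly what such a token stream could look like in code (a minimal sketch; the type and struct names just follow the notation above, and the evaluator itself is assumed to live elsewhere):


--- Code: ---#include <string>
#include <vector>

//Token types used by the preprocessor's expression evaluator
enum TokenType
{
    ttNumber, ttWhiteSpace, ttNot, ttOr, ttGreaterEqual, ttEndOfTokens
};

struct Token
{
    TokenType   type;
    std::string text;
};

//"#if VERBOSE >= 2" with VERBOSE already expanded to 1:
std::vector<Token> stream = {
    { ttNumber, "1" }, { ttWhiteSpace, " " },
    { ttGreaterEqual, ">=" }, { ttWhiteSpace, " " },
    { ttNumber, "2" }, { ttEndOfTokens, "" }
};
--- End code ---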

JGM:

--- Quote from: Ceniza on March 25, 2011, 06:04:05 pm ---...
You need to tokenize and fully macro expand everything before feeding the evaluator.
...
--- End quote ---

Thanks for your guidance! I need to fix some things to generate the preprocessed code at the exact line positions, but without the macros, for normal parsing (to get the same line numbers and columns as the original source code; I'm not sure it is really necessary). Also, I created a parse_expression function that always returns true, since I didn't have code to do the evaluation (I'm planning to write it from scratch xD), so is the code you wrote GPL (in other words, can I use it xD)? I also need to make macro value evaluation recursive (as you mentioned before) for cases like:

--- Code: ---#define blah(x) x*2
#define blah2(y) blah(y)
#define blah3(z) blah2(z)
--- End code ---

Another thing I need to implement is the conversion of expressions like x ## x or # x to a valid string (or an empty string), since this data (I think) is not necessary for general parsing.
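
Roughly, what those two operators have to produce (a hypothetical sketch with made-up helper names, working on plain strings; real code would operate on tokens and handle escaping and re-tokenization):


--- Code: ---#include <string>

//# arg -> a string literal containing the argument's spelling
std::string stringize(const std::string &arg)
{
    return "\"" + arg + "\"";  //real code must escape " and \ inside arg
}

//left ## right -> the two spellings pasted into a single token
std::string paste(const std::string &left, const std::string &right)
{
    return left + right;       //real code must check the result is a valid token
}
--- End code ---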

This is the code that currently handles the processing:


--- Code: ---for(unsigned int position=0; position<lines.size(); position++)
        {
            vector<preprocessor_token> tokens = lines[position];

            //Parse macro
            if(tokens[0].token == "#")
            {
                //Only process directives when the enclosing conditional section is
                //active (note: nested #if*/#endif pairs inside skipped sections are
                //not depth-tracked yet)
                if(deepness == 0 || (deepness > 0 && last_condition_return[deepness]))
                {
                    if(tokens[1].token == "define")
                    {
                        define definition = parse_define(strip_macro_definition(tokens));
                        definition.file = file;
                        definition.line = tokens[2].line;
                        definition.column = tokens[2].column;
                        m_local_defines.push_back(definition);
                    }
                    if(tokens[1].token == "include")
                    {
                        string include_enclosure = tokens[2].token;
                        string include_file = "";
                        file_scope header_scope;

                        if(include_enclosure == "<")
                        {
                            for(unsigned int i=3; i<tokens.size(); i++)
                            {
                                if(tokens[i].token == ">")
                                {
                                    break;
                                }
                                else
                                {
                                    include_file += tokens[i].token;
                                }
                            }

                            m_headers_scope[include_file] = global;
                            header_scope = global;
                        }
                        else
                        {
                            for(unsigned int i=1; i<tokens[2].token.size(); i++)
                            {
                                if(tokens[2].token.at(i) == '"')
                                {
                                    break;
                                }
                                else
                                {
                                    include_file += tokens[2].token.at(i);
                                }
                            }

                            m_headers_scope[include_file] = local;
                            header_scope = local;
                        }

                        if(!is_header_parsed(include_file))
                        {
                            output += parse_file(include_file, header_scope); //To output the processed headers code
                            //parse_file(include_file, header_scope); //parses header without outputting

                            m_headers.push_back(include_file);
                        }
                    }
                    else if(tokens[1].token == "undef")
                    {
                        remove_define(tokens[2].token);
                    }
                    else if(tokens[1].token == "ifdef")
                    {
                        deepness++;
                        if(is_defined(tokens[2].token))
                        {
                            last_condition_return[deepness] = true;
                        }
                        else
                        {
                            last_condition_return[deepness] = false;
                        }
                    }
                    else if(tokens[1].token == "ifndef")
                    {
                        deepness++;
                        if(!is_defined(tokens[2].token))
                        {
                            last_condition_return[deepness] = true;
                        }
                        else
                        {
                            last_condition_return[deepness] = false;
                        }
                    }
                    else if(tokens[1].token == "if")
                    {
                        deepness++;
                        last_condition_return[deepness] = parse_expression(strip_macro_definition(tokens));
                    }
                }

                if(deepness > 0 && (tokens[1].token == "elif" || tokens[1].token == "else" || tokens[1].token == "endif"))
                {
                    if(tokens[1].token == "elif")
                    {
                        //Only take this branch if no earlier branch at this depth matched
                        last_condition_return[deepness] = !last_condition_matched[deepness] && parse_expression(strip_macro_definition(tokens));

                        if(last_condition_return[deepness])
                        {
                            last_condition_matched[deepness] = true;
                        }
                    }
                    else if(tokens[1].token == "else")
                    {
                        //Active only when every previous branch at this depth failed
                        last_condition_return[deepness] = !last_condition_matched[deepness];
                    }
                    else if(tokens[1].token == "endif")
                    {
                        last_condition_return.erase(deepness);
                        last_condition_matched.erase(deepness);
                        deepness--;
                    }
                }
            }

            //Parse code
            else
            {
                if(deepness == 0 || (deepness > 0 && last_condition_return[deepness]))
                {
                    unsigned int column = 1;

                    for(unsigned int i=0; i<tokens.size(); i++)
                    {
                        //Pad with spaces up to the token's original column
                        //(never move backwards, and avoid unsigned underflow)
                        unsigned int columns_to_jump = 0;

                        if(tokens[i].column > column)
                        {
                            columns_to_jump = tokens[i].column - column;
                        }

                        for(unsigned int y=0; y<columns_to_jump; y++)
                        {
                            output += " ";
                        }

                        if(tokens[i].type == identifier && is_defined(tokens[i].token))
                        {
                            output += get_define(tokens[i].token).value;
                        }
                        else
                        {
                            output += tokens[i].token;
                        }

                        column = tokens[i].column + tokens[i].token.size();
                    }

                    output += "\n";
                }
            }
        }

return output;

--- End code ---

As you can see, the code already handles nested conditionals as well as the basic directives (#define, #undef, #include, #ifdef, #ifndef); with your code I would implement the parse_expression function (which, as I said, just returns true for now, no evaluation) for #if and #elif evaluation. The tricky part is going to be the recursive evaluation of macros.
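
For the recursive part, something along these lines could work (a rough sketch under assumed names: "defines" is the macro table, only object-like macros are shown, and real code would work on token lists and substitute function-like macro arguments first):


--- Code: ---#include <map>
#include <set>
#include <string>

std::string expand(const std::string &name,
                   const std::map<std::string, std::string> &defines,
                   std::set<std::string> active = std::set<std::string>())
{
    std::map<std::string, std::string>::const_iterator it = defines.find(name);

    //Not a macro, or currently being expanded: a macro never expands inside
    //its own expansion, which is what stops the recursion
    if(it == defines.end() || active.count(name))
    {
        return name;
    }

    active.insert(name);

    //Rescan the replacement and expand it again (treated here as a single
    //identifier for brevity)
    return expand(it->second, defines, active);
}
--- End code ---

With #define A B, #define B C and #define C 42 in the table, expand("A", defines) would yield "42", while a self-referencing #define A A would simply stop at "A".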

I think I'm worrying too much about printing the parsed code with the same line numbers, since it is impossible in the case of multi-line macros like:

--- Code: ---#define declare_table() class table{ \
int blah;\
};
--- End code ---

These kinds of macros are going to affect the line numbering of the output code.

My worry about keeping the same line positions was due to the plan of also using the library for refactoring, but a solution could be thought of later, I guess.

I will try to read the chapters of the standard draft you mentioned xD (I bought an e-reader to keep me company at night xD)

Thanks again for your feedback!

Edit: I just took a quick look at the C++ draft and I completely forgot about #line, #pragma and #error :shock: But I think these directives can be safely skipped, except for #pragma, which may include headers or things like that. What a pain :lol:

Edit: Trigraph sequences: I knew about them, but who would use those??? xD I also forgot about alternative tokens :P, and I hadn't thought about #include MACRO either :S. Well, this post will remind me of things to do :D
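
(For the record, the #include MACRO form looks like this; the macro and file names are made up:)


--- Code: ---//The operand of #include is itself macro-expanded first
#define CONFIG_HEADER "config.h"

#include CONFIG_HEADER   //behaves like #include "config.h"
--- End code ---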

Ceniza:

--- Quote from: JGM on March 25, 2011, 11:12:31 pm ---... so is the code you wrote GPL (in other words, can I use it xD)?
--- End quote ---

It is not GPL; it is more of a "do as you please, but if it kills your dog, do not blame me" kind of license. I think it can be mixed with code under the GPL, but you had better ask at least 3 lawyers to be sure :P

I would not recommend skipping #line as certain tools make use of it (mostly those that produce code, like lex/yacc or preprocessors themselves), and handling #error would be neat because you could inform the user about it way before hitting the 'build' button (as long as the files are always properly parsed to avoid false positives).
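
(As an illustration of why #line matters, this is the kind of thing generated code contains; the file name here is made up:)


--- Code: ---extern int yylex();    //provided by the generated scanner

//Emitted by a code generator: diagnostics below are reported against the
//original grammar file, not the generated one
#line 128 "parser.y"
int token = yylex();   //an error here shows up as parser.y:128
--- End code ---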

Keeping track of line numbers and files is, of course, extremely important. After all, the idea is for Code::Blocks to make use of it, and that information is vital. I think that making a list of the whole set of tools to be implemented, and what is needed for each of them, is the way to go to really know how fine-grained the stored line numbering needs to be.

JGM:

--- Quote from: Ceniza on March 26, 2011, 10:53:55 am ---It is not GPL; it is more of a "do as you please, but if it kills your dog, do not blame me" kind of license. I think it can be mixed with code under the GPL, but you had better ask at least 3 lawyers to be sure :P

--- End quote ---

That's scary xD


--- Quote from: Ceniza on March 26, 2011, 10:53:55 am ---I would not recommend skipping #line as certain tools make use of it (mostly those that produce code, like lex/yacc or preprocessors themselves), and handling #error would be neat because you could inform the user about it way before hitting the 'build' button (as long as the files are always properly parsed to avoid false positives).

--- End quote ---

Hmm, so with #error the library should throw an exception?


--- Quote from: Ceniza on March 26, 2011, 10:53:55 am ---Keeping track of line numbers and files is, of course, extremely important. After all, the idea is for Code::Blocks to make use of it, and that information is vital. I think that making a list of the whole set of tools to be implemented, and what is needed for each of them, is the way to go to really know how fine-grained the stored line numbering needs to be.

--- End quote ---

Well, for now, when the code is first tokenized, columns and line numbers are stored correctly; what I mean is that the problem shows up when outputting the preprocessed code for the full parse (lexical analysis?).

The output code would need to be re-parsed, with the issue that its line numbers no longer match the original source, unless associations are made back to the previously tokenized original source.

Let's say we have this original code:


--- Code: ---#include <something.h>
#define class_blah class test {\
char variable[50];\
};

class_blah

--- End code ---

But the output of this would look different:

--- Code: ---class something{
float test;
};

class test {
char variable[50];
};

--- End code ---

It would parse as it should, but losing the original positions. We would still know in which files the class definitions were found, but with incorrect line numbers and probably columns. My tiny brain can't think of a solution xD

Ceniza:

--- Quote from: JGM on March 26, 2011, 08:20:46 pm ---Hmm, so with #error the library should throw an exception?

--- End quote ---

Not necessarily. It could just store it somewhere for later retrieval. The parsing should continue in case it is a false positive.
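
(A minimal sketch of that idea, with made-up names: record the diagnostic and keep going, then let the caller query the list after parsing:)


--- Code: ---#include <string>
#include <vector>

struct diagnostic
{
    std::string  file;
    unsigned int line;
    std::string  message;   //the tokens following #error, joined as text
};

//Filled wherever the "#error" directive is handled, instead of throwing
std::vector<diagnostic> m_diagnostics;
--- End code ---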


--- Quote from: JGM on March 26, 2011, 08:20:46 pm ---...
It would parse as it should, but losing the original positions. We would still know in which files the class definitions were found, but with incorrect line numbers and probably columns. My tiny brain can't think of a solution xD
--- End quote ---

The preprocessing stage should output tokens, not text. The job of the C++ parser's lexer would then be extremely simple: concatenate adjacent string literals into a single string literal token and turn numbers into either integral or floating-point tokens (flags may be needed to specify the full type: unsigned, short, int, long, long long, float, double, long double). Identifiers could be turned into keywords here as well, if not done before. Every other token would just pass through to the syntax analysis stage.

This is what the whole thing would, roughly, look like:

Preprocessor's Lexer -> Preprocessor -> Lexer -> Syntax analysis + symtab generation -> Semantic analysis.

Preprocessor's Lexer: Turns text into preprocessor tokens. Integral and floating-point values would be just "numbers". Keywords should be read as plain identifiers, since the preprocessor does not care about them being a separate thing. File and line information is retrieved here.
Preprocessor: Resolves directives (#include, #if*, ...), discards tokens and builds new tokens when necessary (the ## and # operations). Whitespace (spaces, newlines, comments, ...) is, in theory, discarded as well.
Lexer: Converts "numbers" into proper tokens, concatenates contiguous string literals into a single string literal token and turns identifiers into keywords (the ones that are actually keywords, of course).
Syntax analysis: Checks that everything is properly "written" (class_decl ::= ttClass ttIdentifier ttSemiColon). An Abstract Syntax Tree can be built here, plus a symbols table.
Semantic analysis: Checks that everything makes sense: x = 3; // Is x a symbol in the current or a parent scope? Can it be assigned an integral type in any way (x is not const, x is integral, x has an overload of operator= that can be used, 3 can be converted to x's type and assigned, ...)?

That means some token types would never be seen by the preprocessor because its lexer would not produce them, most token types specific to the preprocessor would have been consumed before reaching the lexer (the next stage), and the few that do reach it would be converted before being fed to the syntax analysis stage.

I hope it is clear enough, despite its roughness.
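
In code, the stages could be wired roughly like this (an illustrative sketch, all names made up; the point is that every token carries its original file/line/column from the first stage onward, which is what solves the position-loss problem above):


--- Code: ---#include <string>
#include <vector>

struct pp_token            //produced by the preprocessor's lexer
{
    int          type;     //ppNumber, ppIdentifier, ppPunctuator, ...
    std::string  text;
    std::string  file;     //original position, captured once and passed along
    unsigned int line, column;
};

struct token               //produced by the lexer: fully typed, still positioned
{
    int          type;     //ttIntLiteral, ttKeywordClass, ttIdentifier, ...
    std::string  text;
    std::string  file;
    unsigned int line, column;
};

std::vector<pp_token> pp_lex(const std::string &file);             //text -> pp tokens
std::vector<pp_token> preprocess(const std::vector<pp_token> &in); //#include, #if*, macros
std::vector<token>    lex(const std::vector<pp_token> &in);        //numbers, keywords,
                                                                   //string concatenation
//syntax analysis + symtab generation, then semantic analysis, consume the final stream
--- End code ---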
