I posted this in the bug tracker (http://developer.berlios.de/bugs/?func=detailbug&bug_id=17924&group_id=5358), and was told to post here
object.h:
typedef struct
{
    #include "object_struct.h"
} Object;
object_struct.h:
vec3f pos; ///< The 3D position of the object
vec3f lpos; ///< The 3D position of the object last update
float r; ///< The radius of the Object
uchar type; ///< Used to identify if the Object has been inherited
Quaternion q; ///< The rotation of the Object, as a Quaternion
void *data; ///< Pointer to the data used by this object for rendering
main.c:
#include "object.h"
int main()
{
    Object *object=malloc(sizeof(Object));
    object->ty//HERE
}
After typing "ty" at //HERE, "type" will not be suggested, nor will any other member of Object.
This is even more annoying if you have another object 'inherit' Object:
enemy.h:
typedef struct
{
    #include "object_struct.h"
    float hp;
} Enemy;
main.c:
#include "enemy.h"
int main()
{
    Enemy *enemy=malloc(sizeof(Enemy));
    enemy->//HERE
}
Code completion will automatically complete it with hp (I know this can be turned off, but it's useful in lots of other cases).
zacaj: Why don't you use proper inheritance? Yes, C has it, too.
It is something like:
typedef struct
{
    /* members of A */
} A;

typedef struct
{
    A base; /* must be the first member */
    /* members of B */
} B;
Why do you keep insisting on creating an overcomplicated const expression evaluator? Evaluating a const expression after macro expansion is straightforward using the grammar from the standard. I have pointed to an implementation multiple times, but it is always ignored. All you have to do is feed it the tokens while skipping whitespace (comments belong here too, if retained).
Proper macro expansion is complicated, though, even more so when handling concatenation, "stringification" and recursive replacement. Think of something like this:
#include <cstdio>

#define STR_HELPER(x) # x
#define STR(x) STR_HELPER(x)
#define CONCAT2(x, y) x ## y
#define CONCAT3(x, y, z) CONCAT2(x ## y, z)

int main()
{
    std::printf(STR($%@!&*));
    std::printf(STR(CONCAT3(this, /* COMMENT */ is, a /* another comment */ test)));
}
How many of you can actually tell me what the program is supposed to show on screen without compiling and running it? What if I replaced STR(x) with:
Would it show the same?
So, does this code work to evaluate macro expressions/conditions like these, for example?
#if VERBOSE >= 2
print("trace message");
#endif
#if !defined(WIN32) || defined(__MINGW32__)
...
#endif
I'm not sure if I'm using the correct terminology (sorry for that, I'm a bit dumb xD). It would be easier to talk to you in Spanish xD; there's a lot of parsing terminology I'm not familiar with :(
Well, I documented and cleaned the code up to a point; here are the changes:
http://www.mediafire.com/?17skj2g70c86u50
Also, I included a test case in the test folder. As my development environment is Ubuntu/Linux, I created a shell script, test/test.sh. This script uses the debug binary of the cpp_parser library and parses the test.hpp file, also in the test folder. (I took the example presented in this thread of an #include inside a typedef struct and made it part of the test case.)
The original code in test.hpp is this one:
#ifndef TEST_HPP
#define TEST_HPP
//Has multiply macro
#include "misc.h"
#ifndef MAX_VALUE
#define MAX_VALUE 10000
#endif
#ifndef MAX_VALUE
#define MAX_VALUE ShouldNotOccurre
#endif
typedef struct
{
    //The content of the struct in another file
    #include "object.h"
    //Should only be included once
    #include "object.h"
} Object;

namespace test
{
    int value = MAX_VALUE;
    int test = multiply;

    /**
     * Function one documentation
     * @return true otherwise false
     */
    bool function_one(const char &argument);

    /**
     * Function two documentation
     * @return The amount of characters found
     */
    unsigned int function_two(const char &argument);
};
//We undefine the MAX_VALUE macro
#undef MAX_VALUE
int value = MAX_VALUE;
#endif
and the cpp_parser binary returns this:
//Has multiply macro
typedef struct
{
    //The content of the struct in another file
    unsigned value;
    string test;
    //Should only be included once
} Object;

namespace test
{
    int value = 10000;
    int test = 4*5;

    /**
     * Function one documentation
     * @return true otherwise false
     */
    bool function_one(const char &argument);

    /**
     * Function two documentation
     * @return The amount of characters found
     */
    unsigned int function_two(const char &argument);
};
//We undefine the MAX_VALUE macro
int value = MAX_VALUE;
There are things left to implement and fix, but for now the basic functionality is working :D
You need to tokenize and fully macro expand everything before feeding the evaluator.
For the first case you would need to expand VERBOSE to whatever its value is. Supposing it expands to '1', you would feed it:
[ttNumber, "1"][ttWhiteSpace, " "][ttGreaterEqual, ">="][ttWhiteSpace, " "][ttNumber, "2"][ttEndOfTokens, ""]
For the second case you would need to expand defined(WIN32) and defined(__MINGW32__). Supposing both are defined, you would feed it:
[ttNot, "!"][ttNumber, "1"][ttWhiteSpace, " "][ttOr, "||"][ttWhiteSpace, " "][ttNumber, "1"][ttEndOfTokens, ""]
Since it is a conditional (#if), all you care about is whether the result is 0 or not.
Instead of ttEndOfTokens as the terminator, ttNewLine could also be added and handled just the same (it must be added to the list of token types too).
If you want to learn more about the preprocessor in order to know what really needs to be implemented, check the C++0x draft chapters 2 (Lexical conventions) and 16 (Preprocessing directives) here (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf).
Thanks for your guidance! I need to fix some things to generate the preprocessed code at the exact line positions, but without the macros, for normal parsing (to get the same line numbers and columns as the original source code; I'm not sure how necessary that is). Also, I created a parse_expression function that always returns true, since I didn't have code to do the evaluation (I'm planning to write it from scratch xD), so: is the code you wrote GPL (in other words, can I use it xD)? I also need to make the evaluation of macro values recursive (as you mentioned before) for cases like:
#define blah(x) x*2
#define blah2(y) blah(y)
#define blah3(z) blah2(z)
Another thing I need to implement is the conversion of expressions like x ## x or # x to a valid string (or an empty string), since this data (I think) is not necessary for general parsing.
This is the code that actually handles the processing:
for(unsigned int position=0; position<lines.size(); position++)
{
    vector<preprocessor_token> tokens = lines[position];
    //Parse macro
    if(tokens[0].token == "#")
    {
        if(deepness == 0 || (deepness > 0 && last_condition_return[deepness]))
        {
            if(tokens[1].token == "define")
            {
                define definition = parse_define(strip_macro_definition(tokens));
                definition.file = file;
                definition.line = tokens[2].line;
                definition.column = tokens[2].column;
                m_local_defines.push_back(definition);
            }
            else if(tokens[1].token == "include")
            {
                string include_enclosure = tokens[2].token;
                string include_file = "";
                file_scope header_scope;
                if(include_enclosure == "<")
                {
                    for(unsigned int i=3; i<tokens.size(); i++)
                    {
                        if(tokens[i].token == ">")
                            break;
                        include_file += tokens[i].token;
                    }
                    m_headers_scope[include_file] = global;
                    header_scope = global;
                }
                else
                {
                    for(unsigned int i=1; i<tokens[2].token.size(); i++)
                    {
                        if(tokens[2].token.at(i) == '"')
                            break;
                        include_file += tokens[2].token.at(i);
                    }
                    m_headers_scope[include_file] = local;
                    header_scope = local;
                }
                if(!is_header_parsed(include_file))
                {
                    output += parse_file(include_file, header_scope); //To output the processed headers code
                    //parse_file(include_file, header_scope); //parses header without outputting
                    m_headers.push_back(include_file);
                }
            }
            else if(tokens[1].token == "undef")
            {
                remove_define(tokens[2].token);
            }
            else if(tokens[1].token == "ifdef")
            {
                deepness++;
                last_condition_return[deepness] = is_defined(tokens[2].token);
            }
            else if(tokens[1].token == "ifndef")
            {
                deepness++;
                last_condition_return[deepness] = !is_defined(tokens[2].token);
            }
            else if(tokens[1].token == "if")
            {
                deepness++;
                last_condition_return[deepness] = parse_expression(strip_macro_definition(tokens));
            }
        }
        if(deepness > 0 && (tokens[1].token == "elif" || tokens[1].token == "else" || tokens[1].token == "endif"))
        {
            if(tokens[1].token == "elif" && last_condition_return[deepness] != true)
            {
                last_condition_return[deepness] = parse_expression(strip_macro_definition(tokens));
            }
            else if(tokens[1].token == "else" && last_condition_return[deepness] != true)
            {
                last_condition_return[deepness] = true;
            }
            else if(tokens[1].token == "endif")
            {
                last_condition_return.erase(last_condition_return.find(deepness));
                deepness--;
            }
        }
    }
    //Parse code
    else
    {
        if(deepness == 0 || (deepness > 0 && last_condition_return[deepness]))
        {
            unsigned int column = 1;
            for(unsigned int i=0; i<tokens.size(); i++)
            {
                unsigned int columns_to_jump = tokens[i].column - column;
                if(tokens[i].column <= 0)
                {
                    columns_to_jump = 0;
                }
                else if(tokens[i].column < column)
                {
                    columns_to_jump = column - tokens[i].column;
                }
                for(unsigned int y=0; y<columns_to_jump; y++)
                {
                    output += " ";
                }
                if(tokens[i].type == identifier && is_defined(tokens[i].token))
                {
                    output += get_define(tokens[i].token).value;
                }
                else
                {
                    output += tokens[i].token;
                }
                column = tokens[i].column + tokens[i].token.size();
            }
            output += "\n";
        }
    }
}
return output;
As you can see, the code already handles nested macros correctly, as well as the basic directives (#define, #undef, #include, #ifdef, #ifndef), and with your code I would implement the parse_expression function (which, as I said, returns true for now, with no evaluation) for #if and #elif evaluation. The tricky part is going to be the recursive evaluation of macros.
I think I'm worrying too much about printing the parsed code with the same line numbers, since that is impossible in the case of multi-line macros like:
#define declare_table() class table{ \
int blah;\
};
Since these kinds of macros are going to affect the line numbering in the output code.
My worry about keeping the same line positions was due to wanting to use the library for refactoring as well, but a solution could be thought up later, I guess.
I will try to read the pages of the standard draft you mentioned xD (I bought an e-reader to accompany me at night xD)
Thanks again for your feedback!
Edit: I just took a quick look at the C++ draft and I completely forgot about #line, #pragma and #error :shock: but well, I think these directives can be safely skipped, except for #pragma, which may include headers or things like that; what a pain :lol:
Edit: Trigraph sequences - I knew about them, but who would use those??? xD mmm, I also forgot about alternative tokens :P,
uhhh, I also didn't think about #include MACRO :S; well, this post will remind me of things to do :D
It is not GPL; it is more like a "do as you please, but if it kills your dog do not blame it on me" kind of license. I think it can be mixed with code under the GPL, but you'd better ask at least 3 lawyers to be sure :P
that's scary xD
I would not recommend skipping #line, as certain tools make use of it (mostly those that produce code, like lex/yacc, or preprocessors themselves), and handling #error would be neat because you could inform the user about it way before they hit the 'build' button (as long as the files are always properly parsed, to avoid false positives).
Mmm so with #error the library should throw an exception.
Keeping track of line numbers and files is, of course, extremely important. After all, the idea is for Code::Blocks to make use of it, and that information is vital. I think that making a list of the whole set of tools we want implemented, and what is needed for each of them, is the way to go to really know how fine-grained the line numbering needs to be.
Well, for now, when the code is first tokenized, columns and line numbers are stored correctly; the problem is when outputting the pre-processed code for full parsing (lexical analysis?).
The output code would need to be re-parsed, with the issue that its line numbers differ from the original source, unless associations are made back to the previously tokenized original source.
Let's say we have this original code:

#include <something.h>

#define class_blah class test {\
    char variable[50];\
};

class_blah

But the output of this would look different:

class something{
    float test;
};

class test {
    char variable[50];
};

It would parse as it should, but losing the original positions. We would still know in which files the class definitions were found, but with incorrect line numbers and probably columns. My tiny brain can't think of a solution xD
@ollydbg
I copied the quex-generated code in your repo to eliminate my custom tokenizer and use something that would simplify the process as well as improve it. Also, I was studying the modifications you made to the tokenizer class. Is there a simpler example to follow? If not, I guess I will have to study it more and search the documentation :D
I'm glad that you use the quex-generated lexer now. :D
1. All the lexer code is under the folder /cppparser/lexer. It was generated from the file cpp.qx (this is called the lexer grammar file), and the command to generate it is "cpp.bat" (a Windows command that calls the quex tool). If you just want to use the generated lexer, you do not need to install Python and quex, because the generated code already contains all the files necessary to compile. If you want to modify the cpp.qx file, then you need to install Python and quex.
2. I use the "pointing to a buffer" mode in the generated lexer. That is, when it is initialized, I point the lexer at a dummy buffer:
m_Quex((QUEX_TYPE_CHARACTER*)s_QuexBuffer,4,(QUEX_TYPE_CHARACTER*)s_QuexBuffer+1)
This code is the initialization of the lexer, and s_QuexBuffer is in fact a zero-filled buffer:
QUEX_TYPE_CHARACTER Tokenizer::s_QuexBuffer[4] = {0,0,0,0};
3. When your own buffer is ready, I just "point to" it, see:
bool Tokenizer::ReadFile()
{
    bool success = false;
    cc_string fileName = cc_text("");
    if (m_pLoader)
    {
        fileName = m_pLoader->fileName();
        const char * pBuffer = m_pLoader->data();
        m_BufferLen = m_pLoader->length();
        if( m_BufferLen != 0)
            success = true;
        m_Quex.reset_buffer((QUEX_TYPE_CHARACTER*)pBuffer,
                            m_BufferLen+2,
                            (QUEX_TYPE_CHARACTER*)pBuffer+m_BufferLen+1);
        (void)m_Quex.token_p_switch(&m_TokenBuffer[0]);
        cout<< "set buffer size" << (int)QUEX_SETTING_BUFFER_SIZE <<endl;
    }
    return success;
}
The char buffer is loaded by "m_pLoader", which is a file loader, so it carries two pieces of info: the buffer start address and the length:
const char * pBuffer = m_pLoader->data();
m_BufferLen = m_pLoader->length();
m_TokenBuffer[0] is the Token address set by the user, so when you call
(void)m_Quex.token_p_switch(&m_TokenBuffer[0]);
the lexer will go one step and fill in that Token.
4. My Tokenizer is modified from the Tokenizer class in CC's current implementation, but it has a lot of changes. The normal way to receive a Token is like below:
bool Tokenizer::FetchToken(RawToken * pToken)
{
    (void)m_Quex.token_p_switch(pToken);
    QUEX_TYPE_TOKEN_ID id = m_Quex.receive();
    if( id == TKN_TERMINATION )
    {
        m_IsEOF = true;
        return false;
    }
    return true;
}
You supply a token address, and the lexer fills in the token. Then you can get the Token's id and the Token's text (if it is an identifier), as well as its line and column information.
If you have some problems using quex, feel free to ask me.
After writing a simple function for the evaluation of expressions, I got stuck :P
This is the test case:
#ifndef MAX_VALUE
#define MAX_VALUE 10000
#endif
#ifndef MAX_VALUE
#define MAX_VALUE ShouldNotOccurre
#endif
#if MAX_VALUE > 100 //This is the simple test I want to evaluate
int testy;
#endif
Here is the function I wrote to prepare a list of Tokens for the ConstExprEvaluator class:
/**
 * Converts an expression from a #if, #else, etc. to an array of elements with macros expanded
 * @return Vector with tokens that can be used to evaluate the expression by the ConstExprEvaluator class.
 */
const vector<Token> preprocessor::expand_macro_expression(const vector<preprocessor_token> &expression)
{
    vector<Token> tokens;
    for(unsigned int i=0; i<expression.size(); i++)
    {
        Token token_to_add;
        if(expression[i].type == identifier)
        {
            token_to_add.type = ttNumber;
            if(is_defined(expression[i].token))
            {
                define macro = get_define(expression[i].token);
                //For macro definitions
                if(macro.parameters.size() <= 0)
                {
                    if(macro.value == "")
                    {
                        //The macro is defined but without a predefined value, so we default to 1
                        token_to_add.value = "1";
                    }
                    else
                    {
                        //The macro has a predefined value (we should check that it's a valid number)
                        token_to_add.value = macro.value;
                    }
                }
                //For macro functions, we need to parse the parameters and then evaluate with a recursive function
                else
                {
                    //for now we just return 1, but this is totally wrong
                    token_to_add.value = "1";
                }
            }
            else
            {
                token_to_add.value = "0";
            }
        }
        //Adds any numbers found
        else if(expression[i].type == number)
        {
            token_to_add.type = ttNumber;
            token_to_add.value = expression[i].token;
        }
        //Handle any other operator found
        else
        {
            if     (expression[i].token == "?")  { token_to_add.type = ttQuestion;     token_to_add.value = "?";  }
            else if(expression[i].token == ":")  { token_to_add.type = ttColon;        token_to_add.value = ":";  }
            else if(expression[i].token == "||") { token_to_add.type = ttOr;           token_to_add.value = "||"; }
            else if(expression[i].token == "&&") { token_to_add.type = ttAnd;          token_to_add.value = "&&"; }
            else if(expression[i].token == "|")  { token_to_add.type = ttBitOr;        token_to_add.value = "|";  }
            else if(expression[i].token == "^")  { token_to_add.type = ttBitXOr;       token_to_add.value = "^";  }
            else if(expression[i].token == "&")  { token_to_add.type = ttBitAnd;       token_to_add.value = "&";  }
            else if(expression[i].token == "==") { token_to_add.type = ttEqual;        token_to_add.value = "=="; }
            else if(expression[i].token == "!=") { token_to_add.type = ttNotEqual;     token_to_add.value = "!="; }
            else if(expression[i].token == "<")  { token_to_add.type = ttLess;         token_to_add.value = "<";  }
            else if(expression[i].token == ">")  { token_to_add.type = ttGreater;      token_to_add.value = ">";  }
            else if(expression[i].token == "<=") { token_to_add.type = ttLessEqual;    token_to_add.value = "<="; }
            else if(expression[i].token == ">=") { token_to_add.type = ttGreaterEqual; token_to_add.value = ">="; }
            else if(expression[i].token == "<<") { token_to_add.type = ttLShift;       token_to_add.value = "<<"; }
            else if(expression[i].token == ">>") { token_to_add.type = ttRShift;       token_to_add.value = ">>"; }
            else if(expression[i].token == "+")  { token_to_add.type = ttPlus;         token_to_add.value = "+";  }
            else if(expression[i].token == "-")  { token_to_add.type = ttMinus;        token_to_add.value = "-";  }
            else if(expression[i].token == "*")  { token_to_add.type = ttTimes;        token_to_add.value = "*";  }
            else if(expression[i].token == "/")  { token_to_add.type = ttDivide;       token_to_add.value = "/";  }
            else if(expression[i].token == "%")  { token_to_add.type = ttModulo;       token_to_add.value = "%";  }
            else if(expression[i].token == "!")  { token_to_add.type = ttNot;          token_to_add.value = "!";  }
            else if(expression[i].token == "~")  { token_to_add.type = ttBitNeg;       token_to_add.value = "~";  }
            else if(expression[i].token == "(")  { token_to_add.type = ttLParen;       token_to_add.value = "(";  }
            else if(expression[i].token == ")")  { token_to_add.type = ttRParen;       token_to_add.value = ")";  }
        }
        tokens.push_back(token_to_add);
    }
    Token endToken = {ttEndOfTokens, ""};
    tokens.push_back(endToken);
    return tokens;
}
And this is the function that takes the tokens and passes them to the evaluator:
/**
 * Evaluates a macro expression/condition
 * @param expression The tokenized expression
 * @return true if the condition is true (duh!), false otherwise
 */
const bool preprocessor::parse_expression(const vector<preprocessor_token> &expression)
{
    bool return_value = false;
    vector<Token> tokens = expand_macro_expression(expression);
    for(unsigned int i=0; i<tokens.size(); i++)
    {
        cout << tokens[i].value; //To check the tokens are correct (debugging)
    }
    try
    {
        PCToken pcToken = &tokens[0]; //vector storage is contiguous, no VLA needed
        if(ConstExprEvaluator::eval(&pcToken) > 0)
        {
            return_value = true;
        }
    }
    catch (const PreprocessorError &prepError)
    {
        return_value = false;
        //TODO add this exception to m_error
        std::cerr << "Exception: " << prepError.getMessage() << "\n";
    }
    return return_value;
}
and I'm getting the following exception: Exception: Error parsing constant-expression at token
@ceniza
What's the exact meaning of that exception? And sorry for my noobness :oops:
Edit: The expression to evaluate is 10000 > 100
Edit #2: After drinking a glass of water I thought about the code I wrote, and I figured out I was adding empty tokens in the for loop; sorry for that one!