Would it be possible to create a parser using regular expressions?
Still, I think it is possible to create a parser based on yacc & flex which will do most of the work
namespace MyNS {
    template <typename T, typename Ty>
    class MyClass :
        /* some comment here */
        public Singleton<MyClass> , public Factory<MyClass>, private SomeOtherClass<MyClass>
    {
        //Now make sure you ignore this comment as well
    };
};//NS
class MyClass{};
MyClass cls;
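The snippet above is the kind of input a regex-based approach struggles with: the base-class list is interleaved with both comment styles. As a rough illustration (a hand-written stand-in, not flex output and not the lexer discussed in this thread), a minimal tokenizer that drops // and /* */ comments could look like this. It is a sketch under simplifying assumptions: string and character literals, preprocessor lines and multi-character operators such as :: are not handled, and the name Tokenize is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: splits C++ source into identifier/number tokens and one-character
// punctuation tokens, skipping // and /* */ comments.
// Not handled: string/char literals, preprocessor lines, multi-char operators.
std::vector<std::string> Tokenize(const std::string& src)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    const std::size_t n = src.size();
    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c))
        {
            ++i;
        }
        else if (src.compare(i, 2, "//") == 0)
        {
            // Line comment: skip to the end of the line
            while (i < n && src[i] != '\n')
                ++i;
        }
        else if (src.compare(i, 2, "/*") == 0)
        {
            // Block comment: skip past the closing */ (or to the end if unterminated)
            const std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        }
        else if (std::isalnum(c) || c == '_')
        {
            // Identifier, keyword or number
            const std::size_t start = i;
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
        }
        else
        {
            // Anything else becomes a one-character punctuation token
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

Fed the MyClass declaration, the comment text never reaches the token stream, so a parser built on top only sees class, MyClass, :, public, Singleton, <, and so on.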
Ceniza has done that, worked ok too.
Still, I think it is possible to create a parser based on yacc & flex which will do most of the work
Never heard of that!
So the work of Ceniza will be used for the future code completion plugin?
Of course it's true, why would I make up such a story? It's not even April Fool's Day :)
If that's true, it is great! Then there should not be any other work done by others.
Then there should not be any other work done by others.
There is no reason why someone capable of coming up with a better solution than the present one should not work on it. It can only do good.
JGM: IIRC you speak Spanish. If you want you can check my proposal right here (http://ceniza666.googlepages.com/Anteproyecto.pdf) (which is in Spanish).
It would be great to see a project of that magnitude working
By the way, what does IIRC mean?
If there is something I could do to help, that would be nice.
rather than everyone just saying "let's use regex!" or "it should look like Visual Assist X!"
Provide, in an automatic/seamless way, all possible/likely solutions to "complete" the current identifier.
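At its simplest, that goal is a prefix search over the known names. A minimal sketch, assuming the candidate names have already been collected into a sorted std::vector<std::string> (CompletePrefix is a hypothetical name): std::lower_bound jumps to the first name not less than the typed prefix, and all matches follow it in one contiguous run.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns every name in sortedNames (which must be sorted ascending)
// that starts with prefix. In a sorted range the matches are contiguous.
std::vector<std::string> CompletePrefix(const std::vector<std::string>& sortedNames,
                                        const std::string& prefix)
{
    std::vector<std::string> result;
    std::vector<std::string>::const_iterator it =
        std::lower_bound(sortedNames.begin(), sortedNames.end(), prefix);
    // Collect names while they still begin with the prefix
    for (; it != sortedNames.end() && it->compare(0, prefix.size(), prefix) == 0; ++it)
        result.push_back(*it);
    return result;
}
```

Matching here is case-sensitive; a case-insensitive variant would need the vector sorted with the same comparison that is used for the search.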
#include <cstdio>

template<typename T> class Template
{
public:
    T& GetInstance() { return m_Instance; }
private:
    T m_Instance;
};
class Parameter
{
public:
    void PrintfText() { std::printf("Text"); }
};
int main(int, char**)
{
    Template<Parameter> Object;
    // <<<<< What to complete here? The user has typed "Object.GetInstance().Printf";
    // the expected completion is PrintfText.
    Object.GetInstance().PrintfText();
}
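Answering "what to complete here?" requires substituting the template argument: Object is declared as Template<Parameter> and GetInstance() returns T&, so T has to be replaced by Parameter before looking for members that start with "Printf". The toy model below illustrates only that substitution step. Member, ClassInfo, ClassTable, ResolveMemberType and MembersWithPrefix are hypothetical names, parsing is assumed to have already happened, and '&' is stripped naively instead of modelling references properly.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of parsed classes (not an existing C::B structure).
struct Member
{
    std::string name;
    std::string type; // declared type; for functions, the return type (e.g. "T&")
};

struct ClassInfo
{
    std::vector<std::string> templateParams; // e.g. {"T"} for Template<T>
    std::vector<Member> members;
};

typedef std::map<std::string, ClassInfo> ClassTable;

// Type of memberName in className<args...>: '&' is removed and a bare
// template parameter is replaced by the matching argument. "" if unknown.
std::string ResolveMemberType(const ClassTable& table, const std::string& className,
                              const std::vector<std::string>& args,
                              const std::string& memberName)
{
    ClassTable::const_iterator cls = table.find(className);
    if (cls == table.end())
        return std::string();
    for (std::size_t i = 0; i < cls->second.members.size(); ++i)
    {
        const Member& m = cls->second.members[i];
        if (m.name != memberName)
            continue;
        std::string type;
        for (std::size_t k = 0; k < m.type.size(); ++k)
            if (m.type[k] != '&')
                type += m.type[k];
        // Substitute the template parameter with the actual argument
        for (std::size_t p = 0; p < cls->second.templateParams.size() && p < args.size(); ++p)
            if (type == cls->second.templateParams[p])
                return args[p];
        return type;
    }
    return std::string();
}

// Names of the members of className that start with prefix.
std::vector<std::string> MembersWithPrefix(const ClassTable& table, const std::string& className,
                                           const std::string& prefix)
{
    std::vector<std::string> result;
    ClassTable::const_iterator cls = table.find(className);
    if (cls == table.end())
        return result;
    for (std::size_t i = 0; i < cls->second.members.size(); ++i)
        if (cls->second.members[i].name.compare(0, prefix.size(), prefix) == 0)
            result.push_back(cls->second.members[i].name);
    return result;
}
```

For the example above, ResolveMemberType(table, "Template", {"Parameter"}, "GetInstance") yields "Parameter", and MembersWithPrefix on that type with "Printf" yields PrintfText.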
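#include <list>
#include <string>

// "typedef", "namespace" and "class" are C++ keywords, so those three
// classes are named typedef_decl, namespace_decl and class_decl here.
class class_decl;

class identifier
{
public:
    virtual ~identifier() {}
    std::string name; // the identifier name
    int decl_line;    // line number of declaration (prototype for functions)
    virtual std::string tooltip() const = 0;  // returns what a tooltip should display for the identifier
    virtual std::string listname() const = 0; // returns what the list name should look like
};
class variable : public identifier
{
public:
    std::string type; // the type of the variable
};
class enumeration : public identifier
{
};
class typedef_decl : public identifier
{
public:
    std::string type; // the base type
};
class function : public identifier
{
public:
    int impl_line;         // line number of the definition
    std::string returns;   // the return type
    std::string signature; // the parameter list
};
class preprocdef : public identifier
{
public:
    std::string macro; // the other side of the macro
};
class namespace_decl : public identifier
{
public:
    std::list<variable*> variables;
    std::list<enumeration*> enumerations;
    std::list<typedef_decl*> typedefs;
    std::list<function*> functions;
    std::list<class_decl*> classes;
    std::list<namespace_decl*> namespaces;
    void add_using(identifier* id); // to support the using keyword (it brings something into the current namespace)
};
class class_decl : public identifier
{
public:
    std::list<class_decl*> base_classes;
    std::list<variable*> variables;
    std::list<enumeration*> enumerations;
    std::list<typedef_decl*> typedefs;
    std::list<function*> functions;
    std::list<class_decl*> classes;
    std::list<namespace_decl*> namespaces;
    std::list<variable*> static_variables;
    std::list<enumeration*> static_enumerations;
    std::list<typedef_decl*> static_typedefs;
    std::list<function*> static_functions;
    std::list<class_decl*> static_classes;
    std::list<namespace_decl*> static_namespaces;
};
class file
{
public:
    std::string filepath;   // the filename + path (to open it quickly for reference use)
    namespace_decl* global; // the global namespace (held by pointer, since namespace_decl is abstract)
};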
#include <map>
#include <string>
#include <vector>

// One element of a symbol list: a reference to another symbol.
struct list_entry
{
    int symbol_id; // Id of referenced symbol
    int scope;     // Scope of the symbol (public / private / protected ...), doesn't have to be used
};
typedef std::vector<list_entry> list;

class symbol
{
public:
    std::string name;    // name of the symbol
    int id;              // Id of the symbol, should be unique in the project
    int file_id;         // id of the file where the symbol has been declared
    int filepos_begin;   // Position where declaration of the symbol starts
    int filepos_end;     // Position where declaration of the symbol ends
    int type;            // Type of the symbol: macro / class / typedef / variable / function
    int modifiers;       // Bitfield used to mark some extra properties of the symbol, like static or inline
    int value_type_id;   // Id of the symbol which represents the C++ type of the current symbol (like the type of a variable or the return type of a function)
    int extra_type_id;   // Extra type used in some cases
    list children;       // List of child elements of this symbol (members of a class etc.)
    list extra_lists[3]; // Some extra lists which can provide additional symbols depending on the type of the current
                         // symbol, like the list of base classes or the list of template arguments. Maybe we could
                         // have more than 3 lists, but I haven't found any reason for that yet.
    std::map<int, std::string> extra_values; // int -> string map which can keep some extra data
};
type | modifiers | value_type_id | extra_type | children | extra_lists[0] | extra_lists[1] | extra_lists[2]
---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ----------------
namespace |  |  |  | declarations in namespace | "using" namespaces |  | 
class / struct / union |  |  |  | members of class | base classes | template args | 
variable | extern, static, volatile, const | type of variable |  |  |  |  | 
function | static, inline, const ... | returned value |  | arguments | template arguments |  | 
typedef | pointer, array, reference, pointer_to_member | base type | type of class in pointer_to_member |  |  |  | 
enum |  |  |  | items in enum |  |  | 
enum item |  |  | id of enum |  |  |  | 
macro |  |  |  | macro parts |  |  | 
macro part | arg_to_string, va_args | number of arg or -1 |  |  |  |  | 
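With this layout, listing completion candidates for a class means walking its children and then recursing into the base classes stored in extra_lists[0]. The sketch assumes symbols are kept in a vector indexed by id and that lists hold bare ids (the proposal's list entries also carry a scope, which is ignored here); Symbol and CollectMembers are illustrative names, not part of the proposal.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified symbol: lists hold symbol ids directly.
struct Symbol
{
    std::string name;
    std::vector<int> children;       // for a class: its members
    std::vector<int> extra_lists[3]; // for a class: extra_lists[0] = base classes
};

// Appends the names of all members of class `id`, including inherited ones.
// `symbols` is indexed by symbol id (child ids are assumed valid);
// `visited` stops a malformed, cyclic base-class chain from recursing forever.
void CollectMembers(const std::vector<Symbol>& symbols, int id,
                    std::vector<std::string>& out, std::set<int>& visited)
{
    if (id < 0 || id >= static_cast<int>(symbols.size()) || !visited.insert(id).second)
        return;
    const Symbol& cls = symbols[id];
    for (std::size_t i = 0; i < cls.children.size(); ++i)
        out.push_back(symbols[cls.children[i]].name);
    for (std::size_t i = 0; i < cls.extra_lists[0].size(); ++i)
        CollectMembers(symbols, cls.extra_lists[0][i], out, visited);
}
```

Access specifiers and name hiding between base and derived members are not handled; a real implementation would use the scope field of each list entry for the former.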
symbol type | without static | with static
---------------- | ---------------- | ----------------
global variable | external linkage | internal linkage
local variable | automatic storage | program storage
function | external linkage | internal linkage
class member | only accessed through an object | only accessed through the qualified name
3. JGM> I didn't include the preprocessor definitions because the preprocessor has its own syntax and is only loosely connected to C++. Keep in mind that you're actually programming in 2 languages at the same time.
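#include <vector>

// One entry of the preprocessor symbol table, i.e. one macro definition.
template <class StringType, class TokenType>
struct PPSymTabElement
{
    typedef std::vector<StringType> ParameterListType;  // names of the macro parameters
    typedef std::vector<TokenType> ReplacementListType; // tokens the macro expands to
    bool valid;                      // false until the entry is filled in
    bool isFunc;                     // true for function-like macros, e.g. #define F(x) ...
    ParameterListType parameters;    // only meaningful when isFunc is true
    ReplacementListType replacement;
    PPSymTabElement() : valid(false), isFunc(false) {}
};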
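To show how such an entry could be used, here is a sketch of expanding one call of a function-like macro: each replacement token that names a parameter is swapped for the argument's tokens. It assumes StringType and TokenType are both std::string, repeats the struct so the block compiles on its own, and deliberately leaves out the # and ## operators and rescanning of the result; ExpandFunctionMacro is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same layout as the struct above, repeated so this sketch is self-contained.
template <class StringType, class TokenType>
struct PPSymTabElement
{
    typedef std::vector<StringType> ParameterListType;
    typedef std::vector<TokenType> ReplacementListType;
    bool valid;
    bool isFunc;
    ParameterListType parameters;
    ReplacementListType replacement;
    PPSymTabElement() : valid(false), isFunc(false) {}
};

typedef PPSymTabElement<std::string, std::string> Macro;

// Expands one use of a function-like macro. args[p] holds the tokens of the
// p-th argument. No # / ## handling and no rescanning for nested macros.
std::vector<std::string> ExpandFunctionMacro(const Macro& macro,
                                             const std::vector<std::vector<std::string> >& args)
{
    std::vector<std::string> out;
    for (std::size_t i = 0; i < macro.replacement.size(); ++i)
    {
        const std::string& tok = macro.replacement[i];
        // Is this token one of the macro's parameters?
        std::size_t p = 0;
        while (p < macro.parameters.size() && macro.parameters[p] != tok)
            ++p;
        if (p < macro.parameters.size() && p < args.size())
            out.insert(out.end(), args[p].begin(), args[p].end());
        else
            out.push_back(tok);
    }
    return out;
}
```

With #define SQUARE(x) ((x)*(x)) and the call SQUARE(a+1), the result is the token sequence ( ( a + 1 ) * ( a + 1 ) ).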
6. I think we should back off on storing everything about each variable for a bit, as it's just unnecessary, and focus on getting a basic implementation working. But I would like to mention that we should keep in mind that there are other uses for a symbol table other than code completion: a symbol browser, improved syntax highlighting (that would catch non-identifiers), and code refactoring.
Ceniza, could you explain a bit more about your class? I didn't quite grasp what you tried to do (although it looks sound).
class Stuff
{
public:
    static int m_StaticMember;
};
int Stuff::m_StaticMember = 10;
void func(void)
{
    Stuff object;
    Stuff::m_StaticMember = 11; // This is valid code
    object.m_StaticMember = 12; // This is valid code too
}
1. As for the extra lists, wouldn't it be better to use inheritance to implement that concept, rather than placing it in the base class?
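#include <vector>

typedef std::vector<int> list; // placeholder for the proposal's list type (here: ids of referenced symbols)

class Symbol
{
public:
    list m_ExtraStuff[3];
};
// Wrapper that gives meaningful names to the generic lists of a class symbol
class ClassSymbol
{
    Symbol* m_Symbol;
public:
    ClassSymbol(Symbol* symbol) : m_Symbol(symbol) {}
    list& GetMembers() { return m_Symbol->m_ExtraStuff[0]; }
};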
2. Again, you use the extra list to show use of the using keyword, but I think it'd be simpler to effectively allow the transfer of symbols between namespaces, especially once you consider the various ways you can use the using keyword (using namespace std; using std::cout; using ::myVar; etc.).
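A sketch of the "transfer symbols between namespaces" idea, assuming each namespace is just a map from name to symbol id (NamespaceScope and its methods are hypothetical): a using-declaration such as using std::cout; copies one entry, while a using-directive such as using namespace std; is recorded so that lookup can fall through to it. Real C++ rules (ambiguities, transitive using-directives, argument-dependent lookup) are not modelled.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical namespace scope: its own names plus namespaces made visible
// by using-directives.
struct NamespaceScope
{
    std::map<std::string, int> names;             // name -> symbol id
    std::vector<const NamespaceScope*> usingDirs; // from "using namespace X;"

    // "using X::name;" copies a single symbol into this scope.
    bool UsingDeclaration(const NamespaceScope& from, const std::string& name)
    {
        std::map<std::string, int>::const_iterator it = from.names.find(name);
        if (it == from.names.end())
            return false;
        names[name] = it->second;
        return true;
    }

    // "using namespace X;" makes all of X visible without copying it.
    void UsingDirective(const NamespaceScope& ns) { usingDirs.push_back(&ns); }

    // Looks in this scope first, then (one level deep) in namespaces brought
    // in by using-directives. Returns -1 if the name is not found.
    int Lookup(const std::string& name) const
    {
        std::map<std::string, int>::const_iterator it = names.find(name);
        if (it != names.end())
            return it->second;
        for (std::size_t i = 0; i < usingDirs.size(); ++i)
        {
            it = usingDirs[i]->names.find(name);
            if (it != usingDirs[i]->names.end())
                return it->second;
        }
        return -1;
    }
};
```

Keeping using-directives as references rather than copies means later additions to std stay visible, which matches how using namespace behaves.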
4. BYO> Providing a symbol table is a hard task. But I like to see that everyone has put some thought into it. I got where you were going with the class, trying to avoid using inheritance and the virtual table, but the code can be inefficient at first, and we can always re-implement it later as a monolithic class.
Sure, I'm not the boss here nor the biggest C++ expert :) I just want to have a really good C::C 8)
Sorry for the delay. Real life did what it does best: it reared its ugly head and distracted me from the fun.
2. As for handling multiple lists, yes, 3 does seem to be the most relevant number. In addition, the reason we should keep linkage in mind is that we want to cache the external linkage of each file, not the internal linkage (which I think should be generated on the fly).
5. I'll sit down over the next few days and write a revised proposal, and post it up here once I'm finished.
Looking forward to the proposal :) Maybe it could be put on the wiki so it would be easier to work on it?
Done. Here's the link:
http://wiki.codeblocks.org/index.php?title=Code_Completion_Rewrite
Questions:
* Why are we storing filepos_end? Wouldn't it be much more useful to store declaration and definition info?
Cool to see so much interest and effort in planning a new code completion plugin. I just wanted to point out that eranif has done a great job; I think his library supports everything that has been discussed here. Check his sample videos here: http://codelite.org. His code completion library has evolved into a full C/C++ IDE. Some time ago I talked with him about implementing a plugin for Code::Blocks that used his library, but I have become a bit lazy and also run out of time.
Maybe there's no need to reinvent the wheel if it's good enough :D
Actually, it uses most of the same components as Code::Blocks, such as Scintilla and wxWidgets, so maybe it is not so hard to use it in Code::Blocks :)
One more question to Ceniza: you said something about a new parser. What's the current progress? Any results yet?
Just curious, has anyone emailed the codelite dev and asked them if they would be interested in trying to integrate it with C::B ?
I don't think that's possible since eranif (codelite author) is really busy working on the CodeLite project.
and an active C::B forum member.
I've done some more investigation: CodeLite works perfectly in many cases, but it still has some problems with templates (for example, I couldn't get function templates to work). Any more complex template-based code won't work either, but that's probably just a temporary issue.
Indeed, CC for template functions is not supported yet, but it can be added with relatively little effort.
I don't think that's possible since eranif (codelite author) is really busy working on the CodeLite project.
Indeed, many tasks to complete and only one developer... (myself)
there's some irony in that
I fail to see the irony in that; for me, hanging out in the C::B forum is like being a member of a forum dealing with my interest: IDEs.
Cool! Could that code be configured to work with any other language?
Not really. The preprocessing stage is only for C++. I think the multi-language functionality should be implemented differently, perhaps creating a plugin where parser plugins can be connected. I don't really see a reason to mix many languages in the same symtab.
Right now I think the best step forward would include the following:
...
3. have someone play with flex and come up with a good language spec for C++ (worry about C later)
4. have someone else play with bison, and work on the C++ file for the parser.
PITA (Pain In The Ass, before someone asks)
Wow. I go away for a day or two and look what happens. This is really starting to come together as a team effort.
JGM> As for decoupling the language, the reason you can't simply make the symbol table abstract is that languages are very different.
Yep, I will start having a really bad time, but if there is going to be a different parser for each language, I think it would be good to standardize the parser structures, or create some abstract (interface) classes that others can derive from.
True. Parsing in general is a PITA. My brain is fried from this week, but here goes:
1. Symbols from a language are placed into a symbol table.
2. The user requests a completion of the current symbol.
3. The C::C plugin determines the correct scope of the symbol (global, local, class).
4. Compare the current symbol against the proper scopes.
5. Provide a composite list to the user.
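Steps 3 to 5 can be sketched as follows, assuming the scopes that apply at the caret have already been determined and are passed innermost first (local, class, global). Names are filtered by the typed prefix, and a name found in an inner scope hides the same name further out; Scope and CompositeCompletion are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One scope's symbol names (e.g. local, class or global scope).
typedef std::vector<std::string> Scope;

// scopes[0] is the innermost scope. Matching names are gathered scope by
// scope; a name already found in an inner scope is not added again.
std::vector<std::string> CompositeCompletion(const std::vector<Scope>& scopes,
                                             const std::string& prefix)
{
    std::vector<std::string> result;
    std::set<std::string> seen;
    for (std::size_t s = 0; s < scopes.size(); ++s)
    {
        for (std::size_t i = 0; i < scopes[s].size(); ++i)
        {
            const std::string& name = scopes[s][i];
            if (name.compare(0, prefix.size(), prefix) == 0 && seen.insert(name).second)
                result.push_back(name);
        }
    }
    return result;
}
```

The result keeps the innermost matches first, which is usually what the user wants at the top of the list.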
Also, we need detailed insight into how wxScintilla works for these purposes. If someone who has already worked with it could explain something, that would be great :)
The processing of the completion list is done inside CodeCompletion::DoCodeComplete. If this function detects that we're in a preprocessor block, it calls CodeCompleteIncludes(); otherwise it calls CodeComplete(). I didn't look into these functions either.
What I still wonder is how we know what text the user entered before the symbols '.', '->', '::', etc.
Do we have to scan back from the current row and column until a space is found to see what the user typed, or is this done by re-parsing the file again and again to store the positions of symbols?
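One possible answer to that last question is to scan backwards on the current line from the caret, without re-parsing. The sketch works on a plain std::string line and a caret column (no wxScintilla calls are assumed; CompletionContext and GetContext are hypothetical names). It returns the partially typed identifier, the access operator before it, and the identifier before the operator. Chains such as Object.GetInstance(). need a real expression parser and are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct CompletionContext
{
    std::string prefix; // partially typed identifier right before the caret
    std::string op;     // ".", "->", "::" or empty
    std::string object; // identifier before the operator, if any
};

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Scans `line` backwards from `caret` (index just past the last typed char).
CompletionContext GetContext(const std::string& line, std::size_t caret)
{
    CompletionContext ctx;
    std::size_t pos = caret < line.size() ? caret : line.size();
    std::size_t end = pos;
    while (pos > 0 && IsIdentChar(line[pos - 1]))
        --pos;
    ctx.prefix = line.substr(pos, end - pos);

    // Detect the access operator directly before the prefix
    if (pos >= 2 && (line.compare(pos - 2, 2, "->") == 0 || line.compare(pos - 2, 2, "::") == 0))
    {
        ctx.op = line.substr(pos - 2, 2);
        pos -= 2;
    }
    else if (pos >= 1 && line[pos - 1] == '.')
    {
        ctx.op = ".";
        pos -= 1;
    }
    else
    {
        return ctx; // plain identifier, no member access
    }

    end = pos;
    while (pos > 0 && IsIdentChar(line[pos - 1]))
        --pos;
    ctx.object = line.substr(pos, end - pos);
    return ctx;
}
```

The object name found this way still has to be resolved to a type through the symbol table before its members can be listed.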
"Stay tuned, and stay infected." (I wonder if anyone knows where this quote comes from :D)
I tried to cheat with Google. It linked back to this post. Damn, their spiders are getting fast.
Thanks for the link to the source! :D I was trying to check out the sources from your SVN repository to study them, but I don't know how :oops:
I have a question: I'm curious to know whether the implementation is going to support different file encodings. I think that is something to anticipate :roll:
I wouldn't mind sharing the document, but it's in Spanish. Maybe the library reference would be of more help, yet it's not as interesting to read as the document itself :P
My native language is Spanish :D
I was reading your project documents and really liked the IRC chat conversation :P I felt like the beginner. I haven't finished reading yet (I'm at page 37 and still reading), but I can say that you could be a great professor (you are really good with words) :) Your university faculty should approve it immediately; it's an A++.
The reading was pretty cool; implementing all those things properly should produce a really high-speed C/C++ parser.
The tests used three different types of string: STDString, CCPString<SimpleAllocator> and CCPString<CCPStringAllocator>. Here are the averaged results:
STDString: 0.409 s
CCPString<SimpleAllocator>: 0.193 s
CCPString<CCPStringAllocator>: 0.182 s
For a SWAG, we achieved a 2.12x performance gain using SimpleAllocator and 2.25x using CCPStringAllocator. Not bad!
What about memory usage? Let's take a look:
STDString: 11864 KiB
CCPString<SimpleAllocator>: 11792 KiB
CCPString<CCPStringAllocator>: 18496 KiB
Now the situation changes a bit. Using SimpleAllocator saves less than 1% of memory, but using CCPStringAllocator increases memory usage by 56%, both compared to the memory usage with STDString. I'm still curious about why. It'd be a good idea to play with different values for blockSize and see how the program behaves.
For those who also care about executable size, here are the results for each case:
STDString: 266.87 KiB
CCPString<SimpleAllocator>: 210.80 KiB
CCPString<CCPStringAllocator>: 238.81 KiB
Overall, SimpleAllocator has the best performance/memory usage/executable size ratio. CCPStringAllocator still gives the highest performance, but not by much compared to SimpleAllocator (about 6%).
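For reference, the ratios quoted above follow directly from the measured averages:

```latex
\begin{aligned}
\frac{0.409\,\text{s}}{0.193\,\text{s}} &\approx 2.12 && \text{(STDString vs. SimpleAllocator)}\\
\frac{0.409\,\text{s}}{0.182\,\text{s}} &\approx 2.25 && \text{(STDString vs. CCPStringAllocator)}\\
\frac{0.193\,\text{s}}{0.182\,\text{s}} &\approx 1.06 && \text{(CCPStringAllocator about 6\% faster than SimpleAllocator)}\\
\frac{11792\,\text{KiB}}{11864\,\text{KiB}} &\approx 0.994 && \text{(SimpleAllocator: about 0.6\% less memory)}\\
\frac{18496\,\text{KiB}}{11864\,\text{KiB}} &\approx 1.559 && \text{(CCPStringAllocator: about 56\% more memory)}
\end{aligned}
```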
I'd say we achieved our goal.
This document is copyrighted by me (Paúl Andrés Jiménez). If you reproduce this file or parts of it in your website or a document, print it, save it, borrow it, share it, whatever... you're gonna Burn in Hell (it's actually a nice song by Dimmu Borgir). If you try to take ownership of this file, I'll f*cking sue you! That is, if I ever find out that you did. If burning in Hell is good enough for you, just don't forget to include this disclaimer. Also, all the names of companies, products and such are owned by their owners (doh!).