Regular expressions

stevenkaras:

--- Quote ---6. I think we should back off on storing everything about each variable for a bit, as it's just unnecessary, and focus on getting a basic implementation working. But I would like to mention that we should keep in mind that there are other uses for a symbol table other than code completion: a symbol browser, improved syntax highlighting (that would catch non-identifiers), and code refactoring.
--- End quote ---

I've thought about this even more, and it makes sense to split code completion into several plugins. The basic one would provide the symbol table, along with an API for accessing that table (remind me how document trees are handled by wx?). Others can build on top of that, providing functionality for code completion, a symbol browser, and code refactoring.

Ceniza, could you explain a bit more about your class? I didn't quite grasp what you tried to do (although it looks sound).

Other than that, I'm a bit swamped this week, so at least until Sunday I won't have much time to think for myself :?

Ceniza:

--- Quote from: stevenkaras on March 05, 2008, 05:00:44 pm ---Ceniza, could you explain a bit more about your class? I didn't quite grasp what you tried to do (although it looks sound).

--- End quote ---

I guess you're asking about the code I pasted. If that's the case, it's quite simple: there are two different types of macros. The first type is object-like macros, and the second type is function-like macros. isFunc indicates whether it's a function-like macro or not (obviously, it's an object-like one if false). If it's a function-like macro, then the parameters list (or vector if you prefer) will be populated with the names of all its parameters (the comma-separated list of names that follows the left parenthesis and ends with a right parenthesis), and if it's an object-like macro, then it'll be empty. replacement is what follows the macro name and parameters (if it's a function-like one). valid... well, let me explain something first.

You may be wondering where the heck I'm storing the name of the macro. The name of the macro is the key of a LookupTable (std::map for example), and the element is an instance of PPSymTabElement. A requirement of the LookupTable is to return a reference to the element stored in it. If the element doesn't exist, then it returns a reference to a PPSymTabElement created with the default constructor. Now, valid is set to false by default, and it must be set to true before inserting a new element into the LookupTable. That's how you tell the difference between found and not found. It's surely not the most elegant solution, but, since it's my own design, I can do whatever I want. Furthermore, the whole code is full of strange design decisions just for fun... and to later test the parser's ability to understand itself.
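
A minimal sketch of the scheme described above, for readers who haven't seen the pasted code. The field names (isFunc, valid, parameters, replacement) come from the description; the field types, the LookupTable interface, and the makeMaxMacro helper are assumptions made only for illustration and may not match the real class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Field names follow the description above; the types are assumptions.
struct PPSymTabElement
{
    bool isFunc = false;                  // true for function-like macros
    bool valid = false;                   // stays false for "not found"
    std::vector<std::string> parameters;  // empty for object-like macros
    std::string replacement;              // text after the name (and parameters)
};

// Hypothetical LookupTable built on std::map, keyed by macro name.
class LookupTable
{
public:
    void insert(const std::string& name, const PPSymTabElement& element)
    {
        assert(element.valid);  // callers must set valid before inserting
        table_[name] = element;
    }

    // Always returns a reference; a miss yields a default-constructed element.
    const PPSymTabElement& find(const std::string& name) const
    {
        static const PPSymTabElement notFound;  // valid == false
        auto it = table_.find(name);
        return it != table_.end() ? it->second : notFound;
    }

private:
    std::map<std::string, PPSymTabElement> table_;
};

// Illustrative helper: the element for
// "#define MAX(a, b) ((a) > (b) ? (a) : (b))".
inline PPSymTabElement makeMaxMacro()
{
    PPSymTabElement e;
    e.isFunc = true;
    e.valid = true;
    e.parameters = {"a", "b"};
    e.replacement = "((a) > (b) ? (a) : (b))";
    return e;
}
```

With this shape, a caller checks `table.find(name).valid` instead of comparing against an end iterator or a null pointer.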

I hope it's clearer now.

Now... to finish this post... what everyone was waiting for: pictures of naked women!

LOADING ...

byo:

--- Quote from: stevenkaras on March 05, 2008, 01:30:24 am ---
symbol type | without static | with static
global variable | external linkage | internal linkage
local variable | automatic storage | program storage
function | external linkage | internal linkage
class member | only accessed through an object | only accessed through the qualified name
--- End quote ---

OK, I still don't agree that separate lists are required when you want to have code completion. In most cases it will force us to process several symbol tables instead of one. Let's take an example:

--- Code: ---
class Stuff
{
public:
    static int m_StaticMember;
};
int Stuff::m_StaticMember = 10;

void func(void)
{
    Stuff object;

    Stuff::m_StaticMember = 11; // This is valid code
    object.m_StaticMember = 12; // This is valid code too
}

--- End code ---

Now if we want to complete in the first case (through the scope name), we can skip all non-static variables and functions; in the second case we have to enumerate everything, no matter what modifiers are applied. And it may be helpful to provide the whole list in the first case too, just to give users some hint about the content of the class.
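
To make the single-list argument concrete, here is a rough sketch of how one member list could serve both completion cases. The Member record, the AccessKind enum, and the complete function are hypothetical names, not anything from the existing code completion plugin.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified member record kept in ONE list per class.
struct Member
{
    std::string name;
    bool isStatic;
};

// How the member is being reached:  Stuff::  versus  object.
enum class AccessKind { ThroughScope, ThroughObject };

// Returns completion candidates from a single member list.
// ThroughScope keeps only static members; ThroughObject keeps everything.
std::vector<std::string> complete(const std::vector<Member>& members, AccessKind kind)
{
    std::vector<std::string> result;
    for (const Member& m : members)
    {
        if (kind == AccessKind::ThroughObject || m.isStatic)
            result.push_back(m.name);
    }
    return result;
}

// Usage mirroring class Stuff above; m_Value is a made-up non-static member.
inline void exampleStuff()
{
    const std::vector<Member> stuff = { {"m_StaticMember", true}, {"m_Value", false} };
    assert(complete(stuff, AccessKind::ThroughScope).size() == 1);   // Stuff::
    assert(complete(stuff, AccessKind::ThroughObject).size() == 2);  // object.
}
```

The filter is a flag check on each entry, so no second table is needed to answer the scope-qualified case.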

--- Quote ---1. As for the extra lists, wouldn't it be better to use inheritance to implement that concept, rather than placing it in the base class?

--- End quote ---

That was because I wanted to have one class for any type of symbol that is good enough without any inheritance. And we can always create helper classes that hide the "ugly" interface of the general symbol class behind a nice, well-defined interface, for example:

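--- Code: ---
#include <list>

class Symbol;
// Assumption: the extra lists hold pointers to other symbols.
typedef std::list<Symbol*> SymbolList;

class Symbol
{
public:
    // General-purpose lists; their meaning depends on the symbol type.
    SymbolList m_ExtraStuff[3];
};

// Wrapper that gives a class symbol a well-defined interface.
class ClassSymbol
{
    Symbol* m_Symbol;
public:
    ClassSymbol(Symbol* symbol): m_Symbol(symbol) {}
    SymbolList& GetMembers() { return m_Symbol->m_ExtraStuff[0]; }
};

--- End code ---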

Now we have a few benefits of one unified interface:
* sizeof(Symbol) is always the same, no matter what the symbol contains. This allows us to create arrays of objects; when using inheritance, only arrays of pointers can be created.
* If you look into the current code completion sources, you may find that symbols use a block-allocation scheme for memory. This gives a nice boost when objects are allocated and deallocated frequently, and it cannot be used in the case of inheritance.
* Saving the symbol cache to files and restoring it is really easy. In the case of inheritance we would have to use a different saving and loading method for each derived class, so we wouldn't be able to do this without symbol factories etc.
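
The first and third points can be illustrated with a small sketch. Everything here is an assumption made for the example (the SymbolKind tag, the text format, the save and load names); it also assumes symbol names contain no whitespace, which holds for plain identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tag that replaces a class hierarchy.
enum SymbolKind { KindClass = 0, KindFunction = 1, KindVariable = 2 };

// One fixed-layout record for every kind of symbol.
struct Symbol
{
    SymbolKind kind;
    std::string name;
};

// One save routine covers every kind: write "<kind> <name>" per line.
void save(const std::vector<Symbol>& symbols, std::ostream& out)
{
    for (std::size_t i = 0; i < symbols.size(); ++i)
        out << static_cast<int>(symbols[i].kind) << ' ' << symbols[i].name << '\n';
}

// One load routine as well; no factory is needed to pick a derived class.
std::vector<Symbol> load(std::istream& in)
{
    std::vector<Symbol> symbols;
    int kind = 0;
    std::string name;
    while (in >> kind >> name)
    {
        Symbol s;
        s.kind = static_cast<SymbolKind>(kind);
        s.name = name;
        symbols.push_back(s);
    }
    return symbols;
}

// Usage: symbols live by value in one array and survive a round trip.
inline void exampleRoundTrip()
{
    std::vector<Symbol> symbols(2);
    symbols[0].kind = KindClass;    symbols[0].name = "Stuff";
    symbols[1].kind = KindFunction; symbols[1].name = "func";

    std::stringstream buffer;
    save(symbols, buffer);
    assert(load(buffer).size() == 2);
}
```

Because std::vector<Symbol> stores the records by value, the array-of-objects property from the first point comes for free.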

--- Quote ---2. Again, you use the extra list to show use of the using keyword, but I think it'd be simpler to effectively allow the transfer of symbols between namespaces. Especially once you consider the various ways you can use the using keyword (using namespace std; using std::cout; using ::myVar; etc.)

--- End quote ---

I thought about such a concept, but let's take an example again: you create 10 namespaces and in each of them you put using namespace std; if the symbols were copied into every one of those namespaces, we could run out of memory really quickly.
And one more thing: when a symbol lookup is made, it will usually have to search a few namespaces anyway (in most cases at least 3: the global namespace, local symbols, and class members). So adding a few more to the current list of namespaces shouldn't be very expensive.
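
A rough sketch of that idea, assuming a hypothetical Namespace record that only stores references to the namespaces named in using-directives. It deliberately ignores the finer points of real C++ name lookup and assumes the using-directives do not form cycles.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical namespace record: owns its symbols and only *references*
// the namespaces pulled in with "using namespace ...".
struct Namespace
{
    std::set<std::string> symbols;
    std::vector<const Namespace*> usings;  // no symbols are copied
};

// Searches the namespace's own symbols first, then each referenced namespace.
// Assumption: using-directives do not form cycles in this sketch.
bool contains(const Namespace& ns, const std::string& name)
{
    if (ns.symbols.count(name) != 0)
        return true;
    for (const Namespace* used : ns.usings)
    {
        if (contains(*used, name))
            return true;
    }
    return false;
}

// Walks the list of namespaces that are active at the lookup point.
bool lookup(const std::vector<const Namespace*>& scopes, const std::string& name)
{
    for (const Namespace* scope : scopes)
    {
        if (contains(*scope, name))
            return true;
    }
    return false;
}

// Usage: "mine" sees cout through the reference, but stores nothing itself.
inline void exampleUsing()
{
    Namespace stdNs;
    stdNs.symbols.insert("cout");
    Namespace mine;
    mine.usings.push_back(&stdNs);
    const std::vector<const Namespace*> scopes = { &mine };
    assert(lookup(scopes, "cout"));
    assert(mine.symbols.empty());
}
```

Ten namespaces with the same using-directive then cost ten pointers, not ten copies of the std symbol set.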

--- Quote ---4. BYO>Providing a symbol table is a hard task. But I like to see that everyone has put some thought into it. I got where you were going with the class, trying to avoid using inheritance, and the virtual table, but the code can be inefficient at first, and we can always re-implement it later as a monolithic class.

--- End quote ---
Sure, I'm not the boss here, nor the biggest C++ expert :) I just want to have a really good C::C 8)

--- Quote ---6. I think we should back off on storing everything about each variable for a bit, as it's just unnecessary, and focus on getting a basic implementation working. But I would like to mention that we should keep in mind that there are other uses for a symbol table other than code completion: a symbol browser, improved syntax highlighting (that would catch non-identifiers), and code refactoring.

--- End quote ---

OK, but let's still keep one thing in mind: C++ parsing is a really _complex_ task because the syntax of this language is so messed up. Some mistakes at the design stage may be really painful later ;). But I agree that anything working is better than something that does not.

Regards
BYO

stevenkaras:
Sorry for the delay. Real life did what it does best: rears its ugly head and distracts you from the fun.

In any case, I'm really glad to see this sort of discussion going on. It shows that there are people who care very deeply about C::C and C::B. At this stage, I have a few things to say:

1. BYO> Wonderful reply. You caught some mistakes of mine and rationalized your arguments for the symbol class quite well. Although I did check out the static thing. The reason why I had written that was that I was thinking of typedefs in classes, which must be addressed through a qualified name, rather than an object. Good for us that this gives us an easy way out (there's only one exception, which makes it easier). In addition, the concept of a wrapper class to decouple the implementation of the symbols is wonderful.
2. As for handling multiple lists, yes, 3 does seem to be the most relevant. In addition, the reason why we should keep in mind linkage is that we want to cache the external linkage of each file, not the internal (which I think should be generated on the fly).
3. If we can define a wrapper class, or at least a less ugly interface for the symbol class, this would be a great step in the right direction.
4. We all forgot about friends of classes. These have access to the class scope.
5. I'll sit down over the next few days and write a revised proposal, and post it up here once I'm finished.
6. Again, sorry for the delay, but reality is a bit more important, or at least urgent.

byo:

--- Quote from: stevenkaras on March 09, 2008, 06:23:21 pm ---Sorry for the delay. Real life did what it does best: rears its ugly head and distracts you from the fun.

--- End quote ---

It's rather usual in open-source projects :) We all have to work for food ;)

--- Quote ---2. As for handling multiple lists, yes, 3 does seem to be the most relevant. In addition, the reason why we should keep in mind linkage is that we want to cache the external linkage of each file, not the internal (which I think should be generated on the fly).

--- End quote ---

OK, but that's how the linker sees it. The problem is that C++ is really poor in terms of external / internal linkage. There's really nothing like the "import" keyword from Java in C++ (well, the standard says that some cases of #include may have such behavior, but I haven't found any so far).
Each #include only means: "put the content of file blahblah here". So the only assumption we can make here is that everything declared in header files should be treated as external stuff and everything in cpp files should be treated as internal stuff (as long as nobody includes a cpp file ;) ). So the extern keyword is only a hint to the compiler that such a symbol is only declared here and its definition lives elsewhere (and will be found while linking).

So my suggestion is like this:

* Keep all known header files preprocessed somewhere (memory / files / mixed storage)
* When working on some file, reparse it as it changes (this will require some performance tuning, like skipping the bodies of functions that are not used); if this file is a header, such a change may trigger an update of the cache listed in the previous point
* When working with the file from the second point, each #include directive at global scope (so not inside any namespace, not in classes, etc., because those require special treatment) adds all symbols from the included file into the current lookup structures; if the file is not in the cache or we cannot use the cache, we behave as a normal preprocessor and include the content of the file
Hmm, or maybe I misunderstood something about this external / internal linkage ;)
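
The three points above could be sketched roughly like this. HeaderCache, SymbolSet, and the method names are all hypothetical, and symbols are reduced to plain names to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

typedef std::set<std::string> SymbolSet;

// Hypothetical cache: header path -> symbols found when it was last preprocessed.
class HeaderCache
{
public:
    void store(const std::string& header, const SymbolSet& symbols)
    {
        cache_[header] = symbols;
    }

    // Called when a header is edited; forces a reparse on the next include.
    void invalidate(const std::string& header) { cache_.erase(header); }

    // Merges the cached symbols of `header` into `lookup`.
    // Returns false on a cache miss, in which case the caller would have to
    // fall back to including the file contents like a normal preprocessor.
    bool include(const std::string& header, SymbolSet& lookup) const
    {
        std::map<std::string, SymbolSet>::const_iterator it = cache_.find(header);
        if (it == cache_.end())
            return false;
        lookup.insert(it->second.begin(), it->second.end());
        return true;
    }

private:
    std::map<std::string, SymbolSet> cache_;
};

// Usage: a global-scope #include "stuff.h" pulls the cached names in.
inline void exampleInclude()
{
    HeaderCache cache;
    SymbolSet stuff;
    stuff.insert("Stuff");
    cache.store("stuff.h", stuff);

    SymbolSet current;
    assert(cache.include("stuff.h", current));
    assert(current.count("Stuff") == 1);
}
```

Includes inside namespaces or classes are left out on purpose, since the list above notes they need special treatment.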

--- Quote ---5. I'll sit down over the next few days and write a revised proposal, and post it up here once I'm finished.

--- End quote ---

Looking forward to the proposal :) Maybe it could be put on the wiki so it would be easier to work on it?

Regards
BYO
