Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign

Regular expressions

byo:

--- Quote from: JGM on March 14, 2008, 08:18:11 pm ---Also, we need detailed insight into how wxScintilla works for these purposes; if someone who has already worked with it could explain something :)

--- End quote ---

I've done some investigation in this area and here are the results (the list may not be complete, but I hope it will be useful anyway):

wxScintilla notifies about each of its events using wxScintillaEvent; these events are processed inside the cbEditor class. The most important place for us is cbEditor::OnScintillaEvent (around line 2614 in cbEditor.cpp), which calls the editor hooks. Editor hooks are generally functions from plugins that are called whenever something inside the editor changes.
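To make the hook mechanism more concrete, here is a minimal sketch of the general pattern as I understand it: the editor keeps a list of callbacks registered by plugins and calls each of them for every event. The names (EditorEvent, HookRegistry, RegisterHook, CallHooks) are hypothetical and are not the actual Code::Blocks or wxScintilla API.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the data a wxScintillaEvent carries
struct EditorEvent
{
    int position;    // caret position where the change happened
    char typedChar;  // character the user just entered (0 if none)
};

// Hypothetical registry: plugins register callbacks, the editor calls all of them
class HookRegistry
{
public:
    using Hook = std::function<void(const EditorEvent&)>;

    // Returns the index of the new hook so a plugin could refer to it later
    int RegisterHook(Hook hook)
    {
        m_Hooks.push_back(std::move(hook));
        return static_cast<int>(m_Hooks.size()) - 1;
    }

    // What the editor would do for every event (cf. cbEditor::OnScintillaEvent)
    void CallHooks(const EditorEvent& event) const
    {
        for (const Hook& hook : m_Hooks)
            hook(event);
    }

private:
    std::vector<Hook> m_Hooks;
};
```

A plugin such as code completion would register one hook and then decide inside it whether the event is interesting (for example, whether the typed character is '(' or '.').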

The next place where such a hook arrives is the CodeCompletion::EditorEventHook function (around line 1731 in plugins/codecompletion/codecompletion.cpp). This function is responsible for detecting whether we should show the completion window or the function tooltip.

The function tooltip is fired when the user types '(' and is closed when the user types ')'. The tooltip is prepared in the CodeCompletion::ShowCallTip() function (line 734); it uses NativeParser::GetCallTips() and NativeParser::GetCallTipComas(), but I haven't looked into them yet.

The code completion window may be fired when the user types '<' (for the list of includes), '.', '->' or '::'. Some basic checks are performed to detect whether we can complete, by fetching the style of the entered character (control->GetStyleAt()); this style is generated by wxScintilla while colourising the code.
The code completion window can also be fired when the user has typed the required number of characters of an identifier and the required timeout has elapsed. It can also be fired when the user presses ctrl+space.
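The trigger rules above could be sketched roughly like this. This is a hypothetical function, not the code in CodeCompletion::EditorEventHook: the inCodeStyle flag stands in for the control->GetStyleAt() check (false inside comments and strings), minIdentLen stands in for the configured number of characters, the timeout is not modelled, and the '<' case for includes is left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class Trigger { None, ShowCallTip, CloseCallTip, CompleteMembers, CompleteIdentifier };

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// text: editor buffer; pos: caret position just after the typed character
Trigger DetectTrigger(const std::string& text, std::size_t pos, bool inCodeStyle, std::size_t minIdentLen)
{
    if (!inCodeStyle || pos == 0 || pos > text.size())
        return Trigger::None;
    const char ch = text[pos - 1];
    const char prev = (pos >= 2) ? text[pos - 2] : '\0';
    if (ch == '(') return Trigger::ShowCallTip;
    if (ch == ')') return Trigger::CloseCallTip;
    // '.', "->" and "::" all ask for members of whatever precedes them
    if (ch == '.' || (ch == '>' && prev == '-') || (ch == ':' && prev == ':'))
        return Trigger::CompleteMembers;
    if (IsIdentChar(ch))
    {
        // Count how many identifier characters the user has typed so far
        std::size_t len = 0;
        while (len < pos && IsIdentChar(text[pos - 1 - len]))
            ++len;
        if (len >= minIdentLen)
            return Trigger::CompleteIdentifier;
    }
    return Trigger::None;
}
```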

The processing of the completion list is done inside CodeCompletion::DoCodeComplete. If this function detects that we're in a preprocessor block, it calls CodeCompleteIncludes(); otherwise it calls CodeComplete(). I didn't look into these functions either.
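A minimal sketch of that dispatch might look like this, assuming that "preprocessor block" simply means the caret is on a line whose first non-blank character is '#'. The names are hypothetical, and line continuations with '\' are not handled.

```cpp
#include <cstddef>
#include <string>

enum class CompletionKind { Includes, Code };

// True if the line containing the caret starts (after blanks) with '#'
bool IsPreprocessorLine(const std::string& text, std::size_t pos)
{
    if (pos > text.size())
        pos = text.size();
    std::size_t lineStart = 0;
    if (pos > 0)
    {
        const std::size_t nl = text.rfind('\n', pos - 1);
        if (nl != std::string::npos)
            lineStart = nl + 1;
    }
    const std::size_t first = text.find_first_not_of(" \t", lineStart);
    return first != std::string::npos && first < pos && text[first] == '#';
}

// Mirrors the branch described above: includes completion vs. normal code completion
CompletionKind ChooseCompletion(const std::string& text, std::size_t pos)
{
    return IsPreprocessorLine(text, pos) ? CompletionKind::Includes : CompletionKind::Code;
}
```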

So, in general, I've described what happens in point 2 of steven's list.
Points 3, 4 and 5 are done in CodeCompleteIncludes and CodeComplete.
Point 1 is done either when the project is being loaded or when a file is being saved (I tend to press ctrl+shift+S to save all files frequently, so it forces the symbols to be refreshed ;) ).

Regards
   BYO

JGM:
Nice!  :D


--- Quote from: byo on March 14, 2008, 09:55:09 pm ---The processing of the completion list is done inside CodeCompletion::DoCodeComplete. If this function detects that we're in a preprocessor block, it calls CodeCompleteIncludes(); otherwise it calls CodeComplete(). I didn't look into these functions either.

--- End quote ---

This is an important fact, since we should put the preprocessor symbols in a separate table; as discussed before, we are working with 2 different languages.

What I still wonder is how we know what text the user entered before the symbols '.', '->', '::', etc.

Do we have to scan back from the current row and column until a space is found to see what the user typed, or is this done by re-parsing the file again and again to store the positions of symbols?

For example:

AnObject MyObject;

//When I press the '.' character
MyObject.
        ^ We scan back from here until we find a space, \r or \n to get the word the user typed, and then
          search for it in the symbols table

I don't know if wxScintilla already provides this functionality.
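A minimal sketch of the backwards scan, assuming plain text access to the buffer: step over the operator, skip any whitespace (including \r and \n), then collect identifier characters. Scintilla also has a word-start lookup (SCI_WORDSTARTPOSITION) that could replace the inner loop. GetWordBeforeOperator is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// pos: caret position just after '.', "->" or "::"
// Returns the identifier before the operator, or "" if there is no plain identifier
std::string GetWordBeforeOperator(const std::string& text, std::size_t pos)
{
    if (pos > text.size())
        return "";
    std::size_t i = pos;
    // Step over the operator itself
    if (i >= 1 && text[i - 1] == '.')
        i -= 1;
    else if (i >= 2 && (text.compare(i - 2, 2, "->") == 0 || text.compare(i - 2, 2, "::") == 0))
        i -= 2;
    else
        return "";
    // Skip spaces, tabs and line breaks between the word and the operator
    while (i > 0 && std::isspace(static_cast<unsigned char>(text[i - 1])))
        --i;
    // Collect identifier characters backwards
    const std::size_t end = i;
    while (i > 0 && IsIdentChar(text[i - 1]))
        --i;
    return text.substr(i, end - i);
}
```

This only works for a plain identifier: for expressions like `foo().` or `array[i].` it returns an empty string, and those cases need the real parser.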

Another problem is that the tooltips Code::Blocks shows when you type '(' are not persistent.

For example:

//Function prototype
void MyFunction(string str, int num, double pi, bool isGreat);

int main()
{
    string myString;
    MyFunction(
              ^ OK, here a tooltip is shown with: void MyFunction(string str, int num, double pi, bool isGreat)
}

The problem is that when I write the first parameter, I press ctrl+space to search for myString and the tooltip disappears, so there is also a need to check the source when the user presses ',' to see whether we are inside a function's parameter list.
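One way to handle the ',' case is to scan backwards from the caret, counting commas at nesting depth 0 until the unmatched '(' of the call is found; the comma count gives the index of the argument being typed, and the text before the '(' gives the function name. This is a hypothetical sketch: it ignores commas and parentheses inside strings and comments, and it gives up at ';' or at an unmatched '{' or '['.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

struct CallTipContext
{
    bool inCall;            // caret is inside the argument list of a call
    std::size_t openParen;  // index of the call's '(' (valid only if inCall)
    int argIndex;           // 0-based index of the argument being typed
};

CallTipContext FindCallContext(const std::string& text, std::size_t pos)
{
    int depth = 0;   // brackets closed between the scan point and the caret
    int commas = 0;  // commas seen at depth 0
    for (std::size_t i = std::min(pos, text.size()); i > 0; --i)
    {
        const char c = text[i - 1];
        if (c == ')' || c == ']' || c == '}')
            ++depth;
        else if (c == '(' || c == '[' || c == '{')
        {
            if (depth == 0)
            {
                if (c == '(')
                    return {true, i - 1, commas};
                return {false, 0, 0};  // enclosing '[' or '{', not a call
            }
            --depth;
        }
        else if (c == ',' && depth == 0)
            ++commas;
        else if (c == ';' && depth == 0)
            return {false, 0, 0};      // statement boundary reached
    }
    return {false, 0, 0};
}
```

With this, the tooltip does not have to be closed when the user presses ctrl+space; it can simply be re-shown with argument argIndex highlighted.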

I should launch Code::Blocks, get wxScintilla, and start playing with it and its documentation :oops:

byo:

--- Quote from: JGM on March 15, 2008, 12:02:11 am ---What I still wonder is how we know what text the user entered before the symbols '.', '->', '::', etc.

Do we have to scan back from the current row and column until a space is found to see what the user typed, or is this done by re-parsing the file again and again to store the positions of symbols?


--- End quote ---

The most reliable solution would be this:

* We run the parser from the beginning of the file
* Parse the file up to the position of the newly entered character (each #include directive should try to use some cached list of symbols instead of including the real file content)
* When the parser reaches the current position, it has some local lookup table (or other structures) that would be used when the current/next token is parsed; this table is exactly what we need: the list of symbols that match at the current position.
OK, but this may be time-consuming, so we could optimize it by skipping function bodies, as I mentioned before.
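The "local lookup table" mentioned above could be as simple as a stack of scopes: a new scope is pushed on '{', popped on '}', and each declaration is added to the innermost scope, so when the parser reaches the caret the visible symbols are the union of all scopes, with inner ones shadowing outer ones. ScopedSymbolTable and its methods are hypothetical names for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical local lookup table maintained while the parser walks towards the caret
class ScopedSymbolTable
{
public:
    ScopedSymbolTable() { PushScope(); }                // global scope
    void PushScope() { m_Scopes.emplace_back(); }       // on '{'
    void PopScope()                                     // on '}'
    {
        if (m_Scopes.size() > 1)
            m_Scopes.pop_back();
    }
    void Add(const std::string& name, const std::string& type) { m_Scopes.back()[name] = type; }

    // name -> type for everything visible here; inner scopes overwrite outer ones
    std::map<std::string, std::string> Visible() const
    {
        std::map<std::string, std::string> result;
        for (const auto& scope : m_Scopes)
            for (const auto& entry : scope)
                result[entry.first] = entry.second;
        return result;
    }

    // Visible names starting with prefix (sorted, since std::map iterates in key order)
    std::vector<std::string> Complete(const std::string& prefix) const
    {
        std::vector<std::string> names;
        for (const auto& entry : Visible())
            if (entry.first.compare(0, prefix.size(), prefix) == 0)
                names.push_back(entry.first);
        return names;
    }

private:
    std::vector<std::map<std::string, std::string>> m_Scopes;
};
```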

To do this:

* Start parsing from the beginning
* When the parser detects a function body ('{' token after the function header), it skips tokens until the matching closing '}' token (watch out for nested {} blocks); it should use standard tokenization and preprocessing to skip all '{' and '}' inside comments, strings, macro expansions, etc.
* When we jump out of the function body, we can continue normal parsing
* If during the skipping we pass the current position, this function is the one we have to parse correctly, so we go back (or, if the tokens inside the function were stored somewhere, jump to the beginning of that list) and do proper parsing.
* If the current position is outside any function, we can handle it depending on the symbol's context (like helping with includes)
This can be optimized further by checking what has changed in the file since the previous processing. In most cases, code will be added sequentially, so we would be able to reuse the parser's local symbol tables from the previous processing, only altering them a little.
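The body-skipping step could be sketched as a brace matcher that ignores braces inside comments and string/character literals. This is a simplification under stated assumptions: every top-level {...} block is treated as a function body (namespace and class bodies would need the real parser), and macros are not expanded. FindBodyAt is a hypothetical name; it also accepts a body that is not closed yet, which is the usual situation while typing.

```cpp
#include <cstddef>
#include <string>

struct BodyRange { bool found; std::size_t open; std::size_t close; };

// Finds the top-level {...} block containing caret position pos
BodyRange FindBodyAt(const std::string& text, std::size_t pos)
{
    int depth = 0;
    std::size_t open = 0;
    std::size_t i = 0;
    while (i < text.size())
    {
        const char c = text[i];
        const char next = (i + 1 < text.size()) ? text[i + 1] : '\0';
        if (c == '/' && next == '/')            // line comment: skip to end of line
        {
            while (i < text.size() && text[i] != '\n')
                ++i;
            continue;
        }
        if (c == '/' && next == '*')            // block comment: skip past "*/"
        {
            const std::size_t end = text.find("*/", i + 2);
            i = (end == std::string::npos) ? text.size() : end + 2;
            continue;
        }
        if (c == '"' || c == '\'')              // string or character literal
        {
            ++i;
            while (i < text.size() && text[i] != c)
                i += (text[i] == '\\') ? 2 : 1; // step over escaped characters
            ++i;                                // closing quote
            continue;
        }
        if (c == '{')
        {
            if (depth == 0)
                open = i;
            ++depth;
        }
        else if (c == '}' && depth > 0)
        {
            --depth;
            if (depth == 0 && open < pos && pos <= i)
                return {true, open, i};         // caret is inside this body
        }
        ++i;
    }
    if (depth > 0 && open < pos)                // body still being typed
        return {true, open, text.size()};
    return {false, 0, 0};
}
```

Only the body returned here would need the full parse; everything else can be skipped at brace-matching speed.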

I may be wrong, but the current implementation in C::B looks like this:

* Parse "using namespace" declarations
* Detect what function we're in
* Parse arguments of that function
* Parse body of the function
* Add symbols from function's scope
* Build list of symbols
BYO

stevenkaras:
BYO&JGM> Interesting discussion. The way I see it, there are 2 ways to handle calls:

1. Reparse the entire file in an optimized fashion
2. Move backwards, skip whitespace, then grab the token. Perhaps a quick function that takes a file position and grabs the token at that location would be useful. Note that this can be somewhat dangerous, as it could lead to a quickly outdated symbol table unless all the conditions for adding/removing symbols are carefully specified.

So while we probably want to start out with the first, we can consider the second for a possible extension if optimization of this process is needed.

Also, I thought of another feature and a few more ways C::C should be called. The new feature would be tips on variables, showing the type, and possibly, the scope (if relevant). As for the events, C::C should be called on a mouse hover over an identifier after a timeout. Useful, in a pinch.
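For the hover feature, a "grab the token at a file position" helper plus a symbol lookup would be enough for a first version. A minimal sketch, assuming the symbol table is just a map from variable name to type and scope (VarInfo, GetIdentifierAt and MakeHoverTip are hypothetical names):

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns the identifier containing the character at index pos, or "" if none
std::string GetIdentifierAt(const std::string& text, std::size_t pos)
{
    if (pos >= text.size() || !IsIdentChar(text[pos]))
        return "";
    std::size_t start = pos;
    std::size_t end = pos;
    while (start > 0 && IsIdentChar(text[start - 1]))
        --start;
    while (end < text.size() && IsIdentChar(text[end]))
        ++end;
    return text.substr(start, end - start);
}

// Hypothetical symbol-table entry
struct VarInfo { std::string type; std::string scope; };

// Builds the hover text "type name", plus the scope when it is known
std::string MakeHoverTip(const std::string& text, std::size_t pos,
                         const std::map<std::string, VarInfo>& symbols)
{
    const std::string name = GetIdentifierAt(text, pos);
    const auto it = symbols.find(name);
    if (name.empty() || it == symbols.end())
        return "";
    std::string tip = it->second.type + " " + name;
    if (!it->second.scope.empty())
        tip += "  [" + it->second.scope + "]";
    return tip;
}
```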

I'm updating the wiki with some of this, and rewriting a good chunk of what's already there. I can't sleep, and I've got a full pot of coffee.   :mrgreen:

Almost forgot: I'm also looking for sources of a BNF definition of C++ that we could feed into bison & flex. I've seen some promising stuff.

Edit:
I've been poking around and found some interesting stuff. First off, the C++ grammar is not LALR-compatible, so without some inaccuracy a classic yacc-style LALR(1) parser is unusable, although bison supposedly supports a more general parsing method (GLR). The trade-off is a big loss in efficiency in order to work out all the accuracy issues (which, I'm told, would only show up in somewhat extreme cases).

JGM> I found an interesting read you might like: here

Also, I was able to find a BNF of C++ here, but it's interleaved with hyperlinks, and I'm not in the mood to remove them by hand (or even to write a script to do it for me), so I'll search some more. The GCC page may have something.

Ummm... we have a problem. We've been talking about creating a code completion plugin for C++. Which version of C++? Reading the gcc manual, I remembered that the language is still being developed, and that there are many (at least 4 official, final) standards out there.

In any case, I've found a few files:
1. Appendix A of "The C++ Programming Language", Special Edition, which is apparently C++98, and is a bit convoluted, to say the least.
2. A supposedly broken grammar, with and without actions (no clue)

In any case, apparently gcc switched from a yacc-based parser to a hand-written recursive-descent one a few years ago (around 2002), and has been using that since then. Poking around in a copy of the 4.2.1 branch, I found a few yacc files, but nothing really all that useful. We may need to write a new grammar from scratch (painful to think about, but unless anyone can come up with a better alternative...)

JGM:
Wow, I just read the wiki and I'm getting scared by all those examples in "Complex cases of usage"; this really needs some deep thinking :shock: We should wrap the g++ code and modify it to fit all these needs (just a joke) :lol:

Well, so we have a symbols class and initial requirements. Should we start thinking about a parser class?
Also, are we going to use 2 different parsers, one for the preprocessor and one for normal C++ syntax?

#include <map>
#include <string>
#include <vector>

//Placeholders until the real types exist
enum tokenIdentifier { PLUS, MINUS, SEMICOLON /* ... */ };
typedef std::string stringRepresentation;
struct Symbols { std::string name; std::string type; };

typedef std::map<tokenIdentifier, stringRepresentation> Tokens;
//Maybe not pretty, just a thought: m_Tokens[PLUS] == string found by the parser could give a nice, readable syntax

class PreprocessorsParser
{
public:
    void AddDefine(const std::string& define);
    void AddDirectory(const std::string& directory);
    void SetFilesExtension(const std::string& extensions); //Comma- or space-separated list
    std::vector<Symbols> GetSymbolsTable() const;
    std::string GetParsedFile() const; //Returns the file cleaned of conditional directives such as #if, #ifndef, #elif, #endif
    void StartParsing();

private:
    Tokens m_Tokens;
    std::vector<Symbols> m_SymbolsTable;
    std::vector<std::string> m_Defines;
    std::vector<std::string> m_Directories;
    std::string m_Extensions;
};

class Parser
{
public:
    void AddDirectory(const std::string& directory);
    void SetFilesExtension(const std::string& extensions); //Comma- or space-separated list
    std::vector<Symbols> GetSymbolsTable() const;
    void StartParsing();

private:
    Tokens m_Tokens;
    std::vector<Symbols> m_SymbolsTable;
    std::vector<std::string> m_Directories;
    std::string m_Extensions;
};

Two really incomplete examples, just as a starting point :wink: Let's keep it simple and easy to follow.
I'm really sleepy  :oops:
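SetFilesExtension() takes a comma- or space-separated list, so something has to split it. A possible helper, with the hypothetical names SplitExtensions and HasListedExtension, assuming extensions are given without the leading dot:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a list such as "cpp,h hpp, cxx" into {"cpp", "h", "hpp", "cxx"}
std::vector<std::string> SplitExtensions(const std::string& extensions)
{
    std::vector<std::string> result;
    std::string current;
    for (char c : extensions)
    {
        if (c == ',' || c == ' ' || c == '\t')
        {
            if (!current.empty())
                result.push_back(current);
            current.clear();
        }
        else
            current += c;
    }
    if (!current.empty())
        result.push_back(current);
    return result;
}

// True if the file name's last extension is in the list
bool HasListedExtension(const std::string& fileName, const std::vector<std::string>& exts)
{
    const std::size_t dot = fileName.rfind('.');
    if (dot == std::string::npos)
        return false;
    const std::string ext = fileName.substr(dot + 1);
    for (const std::string& e : exts)
        if (e == ext)
            return true;
    return false;
}
```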
