OK, I think we _MUST_ do something about code completion. I noticed that after compiling the Code::Blocks project, closing the project takes ages. I opened Process Explorer (a free Task Manager replacement for Windows) and noticed that C::B's memory usage had climbed as high as 180 MB. Worse, if I re-parse the project after changing the settings, the UI suddenly becomes clumsy and slow.
This is SO WRONG.
Also, the project to redo code completion from scratch is stalled. Supposedly TakeshiMiya was taking over, but I haven't been able to contact him. I contacted Eranif about his CodeLite, but everything there is so new that it would mean a huge start from scratch. I'm afraid I can't implement it unless I dedicate myself to weeks of study.
So an idea came to mind.
There are some things that CAN be redone in code completion (after all, it was me who revamped it a year ago and diminished its memory usage, so it's the part I know best).
Currently, the areas of improvement are these:
a) All the tokens are in memory, including ones which are possibly NEVER used.
b) The data structures needed to keep the tokens in memory are overly difficult and complicated to work with.
c) The parser architecture isn't well separated into layers (what's that called? Abstraction? Isolation? Whatever), so the project is kinda doomed to failure.
My proposal is:
* Keep the tokens in an SQLite database. Adding and searching for tokens would be done through the database backend, so no memory is used besides the ~250 KB footprint of the SQLite library itself. For this we'll have to...
* Use wrappers so that token searching and adding are handled by a "black box" whose implementation can differ. My idea is to use an object called "TokenDB" as a base class which can later be derived from.
* Start the visual Tokens tree with the minimum set of tokens. When a token is expanded, its subtree is created on the fly; when it's collapsed, the items are disposed of. This way we won't need to keep a tree hundreds of megabytes in size.
* Give the Token structures a "pointer" of only two integers: file id and local token id, plus a pointer to the TokenData, which is the part that will be allocated/deallocated dynamically. The rest of the member functions will depend on whether the data is present: if it's not, use the database functions; if it is, use the in-memory data.
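To make the "TokenDB black box" idea concrete, here is a minimal sketch of what the base class could look like. Everything here is an assumption for illustration: the names TokenDB, AddToken and FindToken are hypothetical, not existing Code::Blocks API, and the in-memory backend below is a stand-in for the real SQLite-backed derived class:

```cpp
#include <map>
#include <string>

// Hypothetical interface: searching and adding go through this "black box",
// so the storage backend (std::map today, SQLite tomorrow) can differ.
class TokenDB
{
public:
    virtual ~TokenDB() {}
    // Registers a token and returns the id assigned to it.
    virtual int AddToken(const std::string& name, const std::string& file) = 0;
    // Returns the token's id, or -1 if the token is unknown.
    virtual int FindToken(const std::string& name) const = 0;
};

// Trivial in-memory backend for the sketch. An SQLite-backed class would
// derive from TokenDB the same way and issue INSERT/SELECT statements
// instead of touching this map.
class InMemoryTokenDB : public TokenDB
{
public:
    int AddToken(const std::string& name, const std::string& /*file*/)
    {
        // The file is ignored in this toy backend; a real one would store it.
        int id = static_cast<int>(m_Tokens.size()) + 1;
        m_Tokens[name] = id;
        return id;
    }
    int FindToken(const std::string& name) const
    {
        std::map<std::string, int>::const_iterator it = m_Tokens.find(name);
        return it == m_Tokens.end() ? -1 : it->second;
    }
private:
    std::map<std::string, int> m_Tokens; // token name -> id
};
```

The point of the base class is that the parser only ever talks to TokenDB*, so swapping the storage engine later is a link-time decision, not a rewrite.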
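The on-demand tree from the third bullet could be sketched like this. The class and function names (LazyTreeNode, FetchChildren) are made up for illustration; FetchChildren() stands in for the database query, and the hard-coded "MyClass" children are fake sample data:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: children exist in memory only while the node is
// expanded; collapsing frees them again, keeping the tree small.
class LazyTreeNode
{
public:
    explicit LazyTreeNode(const std::string& name) : m_Name(name) {}
    ~LazyTreeNode() { Collapse(); }

    void Expand()
    {
        if (!m_Children.empty())
            return; // subtree already materialized
        std::vector<std::string> names = FetchChildren(m_Name);
        for (size_t i = 0; i < names.size(); ++i)
            m_Children.push_back(new LazyTreeNode(names[i]));
    }
    void Collapse()
    {
        for (size_t i = 0; i < m_Children.size(); ++i)
            delete m_Children[i];
        m_Children.clear(); // memory released until the next expansion
    }
    size_t ChildCount() const { return m_Children.size(); }

private:
    LazyTreeNode(const LazyTreeNode&);            // non-copyable: owns
    LazyTreeNode& operator=(const LazyTreeNode&); // raw child pointers

    // Placeholder for "query the token database for this node's members";
    // the returned names here are fake sample data.
    static std::vector<std::string> FetchChildren(const std::string& parent)
    {
        std::vector<std::string> result;
        if (parent == "MyClass")
        {
            result.push_back("MyClass::Foo");
            result.push_back("MyClass::Bar");
        }
        return result;
    }

    std::string m_Name;
    std::vector<LazyTreeNode*> m_Children;
};
```

In the real plugin the Expand/Collapse pair would hang off the tree control's expand/collapse events, but the memory behavior is the same.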
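And the two-integer "pointer" with lazily loaded TokenData from the last bullet might look roughly like this. Again, all names are assumptions, and LoadFromDB() is a placeholder (returning dummy data) for the real SQLite lookup keyed by (file id, token id):

```cpp
#include <cstddef>
#include <string>

// The heavy per-token information; only allocated while it's needed.
struct TokenData
{
    std::string name;
    int line;
};

class Token
{
public:
    Token(int fileId, int tokenId)
        : m_FileId(fileId), m_TokenId(tokenId), m_Data(NULL) {}
    ~Token() { Release(); }

    // Accessors transparently fall back to the database when the data
    // isn't in memory; once loaded, the in-memory copy is used.
    const std::string& GetName()
    {
        EnsureLoaded();
        return m_Data->name;
    }
    bool IsLoaded() const { return m_Data != NULL; }
    void Release() { delete m_Data; m_Data = NULL; } // drop the heavy part

private:
    Token(const Token&);            // non-copyable: owns the raw
    Token& operator=(const Token&); // TokenData pointer

    void EnsureLoaded()
    {
        if (!m_Data)
            m_Data = LoadFromDB(m_FileId, m_TokenId);
    }
    // Placeholder for a SELECT on the tokens table; returns dummy data.
    static TokenData* LoadFromDB(int /*fileId*/, int /*tokenId*/)
    {
        TokenData* d = new TokenData();
        d->name = "dummy";
        d->line = 0;
        return d;
    }

    int m_FileId;      // which file the token lives in
    int m_TokenId;     // id local to that file
    TokenData* m_Data; // NULL until first use
};
```

So a Token sitting idle costs two ints and a null pointer; only the tokens the user actually looks at ever pull their data into memory.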
Hopefully this can be done in a few weeks (we might miss the CB 1.0 launch, but that's better than nothing, and this patch seems much simpler than reimplementing the whole thing).
Later (mid- to long-term) we can improve the parser and tokenizer to support other languages. What do you think?