OK, I think we _MUST_ do something about code completion. I noticed that after compiling the Code::Blocks project, closing the project takes ages. I opened Process Explorer (a free Task Manager replacement for Windows) and noticed that C::B's memory usage had climbed as high as 180 MB. Worse, if I re-parse the project after changing the settings, the UI suddenly becomes clumsy and slow.
This is SO WRONG.
Also, the project to redo code completion from scratch is stalled. Supposedly TakeshiMiya was taking over, but I haven't been able to contact him. I contacted Eranif about his CodeLite, but everything there is so new that it would mean a huge start from scratch. I'm afraid I can't implement it unless I dedicate myself to weeks of study.
So an idea came to mind.
There are some things that CAN be redone in code completion (after all, it was me who revamped it a year ago and diminished its memory usage, so it's the part I know best).
Currently, the areas of improvement are these:
a) All the tokens are in memory, including ones which are possibly NEVER used.
b) The data structures needed to keep the tokens in memory are overly difficult and complicated to work with.
c) The parser architecture isn't well separated into layers (what's that called? Abstraction? Isolation? Whatever), so the project is kinda doomed to failure.
My proposal is:
* Keep the tokens in an SQLite database. Adding and searching for tokens would be done through the database backend, so no memory is used besides the ~250 KB footprint of the SQLite library itself. For this we'll have to...
* Use wrappers so that token searching and adding are handled by a "black box" whose implementation can differ. My idea is to use an object called "TokenDB" as a base class which can later be derived from.
* Start the visual Tokens tree with the minimum set of tokens. When a token is expanded, its subtree is created on the fly; when it's collapsed, the items are disposed of. This way we won't need to keep a tree hundreds of megabytes in size.
* Give the Token structures a "pointer" of only two integers: file id and local token id, plus a pointer to the TokenData, which is the part that will be allocated/deallocated dynamically. The rest of the member functions will depend on whether the data is present: if it's not, use the database functions; if it is, use the in-memory data.
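To make the "TokenDB black box" idea concrete, here is a minimal sketch of what the base class could look like. Everything here is an assumption for illustration: the names TokenDB, AddToken and FindToken are hypothetical, not existing Code::Blocks API, and the in-memory backend below is a stand-in for the real SQLite-backed derived class:

```cpp
#include <map>
#include <string>

// Hypothetical interface: searching and adding go through this "black box",
// so the storage backend (std::map today, SQLite tomorrow) can differ.
class TokenDB
{
public:
    virtual ~TokenDB() {}
    // Registers a token and returns the id assigned to it.
    virtual int AddToken(const std::string& name, const std::string& file) = 0;
    // Returns the token's id, or -1 if the token is unknown.
    virtual int FindToken(const std::string& name) const = 0;
};

// Trivial in-memory backend for the sketch. An SQLite-backed class would
// derive from TokenDB the same way and issue INSERT/SELECT statements
// instead of touching this map.
class InMemoryTokenDB : public TokenDB
{
public:
    int AddToken(const std::string& name, const std::string& /*file*/)
    {
        // The file is ignored in this toy backend; a real one would store it.
        int id = static_cast<int>(m_Tokens.size()) + 1;
        m_Tokens[name] = id;
        return id;
    }
    int FindToken(const std::string& name) const
    {
        std::map<std::string, int>::const_iterator it = m_Tokens.find(name);
        return it == m_Tokens.end() ? -1 : it->second;
    }
private:
    std::map<std::string, int> m_Tokens; // token name -> id
};
```

The point of the base class is that the parser only ever talks to TokenDB*, so swapping the storage engine later is a link-time decision, not a rewrite.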
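The on-demand tree from the third bullet could be sketched like this. The class and function names (LazyTreeNode, FetchChildren) are made up for illustration; FetchChildren() stands in for the database query, and the hard-coded "MyClass" children are fake sample data:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: children exist in memory only while the node is
// expanded; collapsing frees them again, keeping the tree small.
class LazyTreeNode
{
public:
    explicit LazyTreeNode(const std::string& name) : m_Name(name) {}
    ~LazyTreeNode() { Collapse(); }

    void Expand()
    {
        if (!m_Children.empty())
            return; // subtree already materialized
        std::vector<std::string> names = FetchChildren(m_Name);
        for (size_t i = 0; i < names.size(); ++i)
            m_Children.push_back(new LazyTreeNode(names[i]));
    }
    void Collapse()
    {
        for (size_t i = 0; i < m_Children.size(); ++i)
            delete m_Children[i];
        m_Children.clear(); // memory released until the next expansion
    }
    size_t ChildCount() const { return m_Children.size(); }

private:
    LazyTreeNode(const LazyTreeNode&);            // non-copyable: owns
    LazyTreeNode& operator=(const LazyTreeNode&); // raw child pointers

    // Placeholder for "query the token database for this node's members";
    // the returned names here are fake sample data.
    static std::vector<std::string> FetchChildren(const std::string& parent)
    {
        std::vector<std::string> result;
        if (parent == "MyClass")
        {
            result.push_back("MyClass::Foo");
            result.push_back("MyClass::Bar");
        }
        return result;
    }

    std::string m_Name;
    std::vector<LazyTreeNode*> m_Children;
};
```

In the real plugin the Expand/Collapse pair would hang off the tree control's expand/collapse events, but the memory behavior is the same.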
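And the two-integer "pointer" with lazily loaded TokenData from the last bullet might look roughly like this. Again, all names are assumptions, and LoadFromDB() is a placeholder (returning dummy data) for the real SQLite lookup keyed by (file id, token id):

```cpp
#include <cstddef>
#include <string>

// The heavy per-token information; only allocated while it's needed.
struct TokenData
{
    std::string name;
    int line;
};

class Token
{
public:
    Token(int fileId, int tokenId)
        : m_FileId(fileId), m_TokenId(tokenId), m_Data(NULL) {}
    ~Token() { Release(); }

    // Accessors transparently fall back to the database when the data
    // isn't in memory; once loaded, the in-memory copy is used.
    const std::string& GetName()
    {
        EnsureLoaded();
        return m_Data->name;
    }
    bool IsLoaded() const { return m_Data != NULL; }
    void Release() { delete m_Data; m_Data = NULL; } // drop the heavy part

private:
    Token(const Token&);            // non-copyable: owns the raw
    Token& operator=(const Token&); // TokenData pointer

    void EnsureLoaded()
    {
        if (!m_Data)
            m_Data = LoadFromDB(m_FileId, m_TokenId);
    }
    // Placeholder for a SELECT on the tokens table; returns dummy data.
    static TokenData* LoadFromDB(int /*fileId*/, int /*tokenId*/)
    {
        TokenData* d = new TokenData();
        d->name = "dummy";
        d->line = 0;
        return d;
    }

    int m_FileId;      // which file the token lives in
    int m_TokenId;     // id local to that file
    TokenData* m_Data; // NULL until first use
};
```

So a Token sitting idle costs two ints and a null pointer; only the tokens the user actually looks at ever pull their data into memory.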
Hopefully this can be done in a few weeks (we might miss the CB 1.0 launch, but that's better than nothing, and this patch seems much simpler than reimplementing the whole thing).
Later (mid- to long-term) we can improve the parser and tokenizer to support other languages. What do you think?