Author Topic: Suggestion: Using ctags & sqlite for code completion  (Read 85088 times)

Offline eranif

  • Regular
Suggestion: Using ctags & sqlite for code completion
« on: August 21, 2006, 01:44:17 pm »
Hi,

On January 06, 2006 I posted a thread about using ctags as a parser, and many people responded that there were even 3 alternatives to it, some of them under development.

My question is:

What is the status of CodeCompletion? By code completion I mean: symbol tree, hint tips, auto completion and find symbols.

Is it working well?

I am currently working on a complete solution for all of the above, using my old idea of ctags & sqlite.
I redesigned everything to be more object-oriented, and it's currently working very well.

If this is still relevant, talk to me.

Eran
« Last Edit: September 04, 2006, 04:39:24 pm by eranif »

takeshimiya

  • Guest
Re: CodeCompletion - what is the status?
« Reply #1 on: August 21, 2006, 02:15:57 pm »
Hi Eran!

Quote
What is the status of CodeCompletion? By code completion I mean: symbol tree, hint tips, auto completion and find symbols.

Is it working well?

The GUI part is working well for the most part. It's the parser that isn't.
In the last week the parser has been improved, and it continues to be improved. But it is still far from perfect, and remember that it is a hand-crafted one.

Quote
I am currently working on a complete solution for all of the above, using my old idea of ctags & sqlite.
I redesigned everything to be more object-oriented, and it's currently working very well.

I would really like to see you make an alternative CodeCompletion plugin, or adapt the current one to your parser.

Now, about your parser: someone else (ddiego, the author of the VCF library) is working on something very similar (see here: http://vcfbuilder.org/?q=node/139).
His parser is built on Ctags and SQLite, with the addition of the C++ ANTLR parser (which is the most complete and correct parser I could find). ucpp is being used for macro preprocessing.

I've talked to him in this thread: http://vcfbuilder.org/?q=node/143 (you need to register on the forum to see it in full), and I think what he's working on now is the best approach for C++ parsing.
For languages other than C++ it's different: most languages aren't difficult to parse, so a Ctags-only plugin is a good bet.

I really encourage you to make the plugin, and I would also like you to discuss or coordinate with ddiego in the above thread. You can learn from his experience implementing the parser, and vice versa.  :)

Regards,
Takeshi Miya

Offline Game_Ender

  • Lives here!
Re: CodeCompletion - what is the status?
« Reply #2 on: August 21, 2006, 03:17:54 pm »
Please continue with your solution. The plugin-based system of Code::Blocks leaves plenty of room for multiple implementations, especially since I am sure the Code::Blocks team would rather focus their efforts on the compiler redesign.

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #3 on: August 21, 2006, 03:42:03 pm »
Thanks for the motivation  :wink:

I will try to complete the work in the coming weeks, with a sample GUI demonstration for you guys to give feedback on.

When I have something to show you, I will post it here.


Eran


Offline thomas

  • Administrator
  • Lives here!
Re: CodeCompletion - what is the status?
« Reply #4 on: August 21, 2006, 09:07:48 pm »
Hi Eran!

Code completion still works the same as it did almost a year ago (except for a few minor tweaks and other default settings), there is no noticeable improvement to date.

Ceniza has partially implemented (and is still working on) a parser that is a lot faster and more reliable than the current one (and which correctly interprets all language features). Once we're satisfied with it, we will remake the code completion plugin from the bottom up.

Nevertheless, this should not stop you from pursuing your idea. It should (within the limits of ctags) work just fine, and if nothing else, ctags is reliable and reasonably fast.
Personally, I don't think that using SQLite is a good idea because I believe that parsing the SQL and converting data to and from the database's storage format may cause noticeable overhead, but I may well be wrong. Prove me wrong! I'll be happy if you do :)

I think having two approaches at hand is not a bad thing at all. Your approach may be a lot more flexible to support other languages, too.

takeshimiya

  • Guest
Re: CodeCompletion - what is the status?
« Reply #5 on: August 21, 2006, 09:41:07 pm »
Quote
Personally, I don't think that using SQLite is a good idea because I believe that parsing the SQL and converting data to and from the database's storage format may cause noticeable overhead, but I may well be wrong. Prove me wrong! I'll be happy if you do :)

You'll be happy: Eran has already proved that SQLite is very fast for the purpose (you can download his IDE).
ddiego's C++ parser also uses SQLite and Ctags, and he said it was very fast; the code parsing is the bottleneck.

Quote
I think having two approaches at hand is not a bad thing at all. Your approach may be a lot more flexible to support other languages, too.

100% true.


takeshimiya

  • Guest
Re: CodeCompletion - what is the status?
« Reply #6 on: August 21, 2006, 09:48:38 pm »
I'll post the interesting info from http://vcfbuilder.org/?q=node/143 here, since it requires registration:
(The replies are from ddiego)

Quote
Something very similar to what you are aiming for has been done here: http://forums.codeblocks.org/index.php?topic=1889.0
You really want to read that thread.

However, one thing wasn't clear to me: will you be discarding ANTLR entirely and only using CTags?
Or will you use ANTLR to generate the database?

Interesting thread. I still see the need for only 2 "parsers".

The ctags parser is used to create the persistent DB, which will exist in various places, just as Visual Studio does with its .ncb files. The question becomes how often these db files get updated. For system include directories this would be a one-time cost: the db is made once and then not touched unless the system include dirs or the system include files change.

The db for the project would be more volatile and would have to be updated more often, but given the speed of ctags and sqlite I don't see this as a problem.

The difference between the DB data and the parser data is that the DB data would be more sparse. But the result of using either one would be an AST: a graph of CodeNode instances that can be traversed. So whether you parse a single file with the ANTLR-based C++ parser or request some data from the ctags-based DB, both will return this information as a collection of CodeNodes.

...

I am (I'm already coding this right now) using both. The idea is to use ctags to create a DB that has a broad overview of the various AST elements, but use ANTLR to provide an exact view of a specific file/resource.

...

In addition to my earlier comments here are some thoughts on where I'd like to see this whole thing going:

Currently we are relying solely on the C++ parser to handle ALL of the parsing chores. To parse a single file in "real time" (about as fast as you can type), it works OK, but potentially having it parse thousands of files to keep track of all the possible headers in your project, plus system (and other third party) header files, seems unwieldy. It just won't handle this fast enough.


So this got me thinking about how all of this (the parser and the CodeStore "engine") should work. After taking a glance at how Visual Studio seems to do things I've come to some conclusions:

    * First, simplify things by creating a database of all the core elements that we need to display in our class AST. This set of elements is a subset of the entire AST for any given file. We care about things like function declarations, function arguments, templates, template arguments, class declarations, namespace declarations. Putting these into a database makes it easy to search, and provides more potential flexibility for search types.
    * If we have a database of this data, then it makes sense to support more than one. There would be one db per project, and then one (or more) "global" db's for system headers (like the C runtime, or the C++ STL). The global db's would only have to be generated once, since these won't change often (if at all).
    * We would need a schema for the db, a table that has the following columns: 
          o id INTEGER PRIMARY KEY
          o name TEXT
          o filename TEXT
          o line INTEGER
          o kind INTEGER
          o language INTEGER
          o access INTEGER
          o inheritance TEXT
          o parent INTEGER
          o signature TEXT

    This schema would allow for generating a hierarchical display if necessary.

    * To generate these databases, we don't need the full-fledged support of the C++ parser, since we need only a limited number of AST nodes at this level. So what I'm thinking is to use ctags to generate the initial db info, then use SQLite3.2 to create/store the ctags data in a db. This would accomplish most of what we need; the parser would then be used for those cases where the entire AST is needed. Using ctags and SQLite, I can create a DB representation of the entire VC98/Includes directory from scratch in about 1 minute or less (that's about 726,773 lines of code scanned). And this would only have to be done once.

All of the above would be done transparently by the CodeStore engine. SQLite source would become incorporated, and ctags would be used as an exe (we can't use it directly as a library due to GPL issues).
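
To make the ctags-to-DB step a bit more concrete, here is a minimal sketch of turning one line of ctags output into a row shaped like the schema above. It is only an illustration, not code from classdom or CodeLite. It assumes ctags was run so that the third column is a line number (for example with --excmd=number) and that extension fields such as class:, access:, inherits: and signature: were enabled. TagRow, splitOn and parseTagLine are made-up names, and kind/access/parent are kept as text here instead of the INTEGER columns listed above, just for readability.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One row of the proposed table. Column names follow the schema above; the
// struct itself and the use of text for kind/access/parent are assumptions.
struct TagRow {
    std::string name;
    std::string filename;
    int line = 0;
    std::string kind;         // ctags kind letter, e.g. "f" (function), "c" (class)
    std::string access;       // e.g. "public"
    std::string inheritance;  // base classes, as reported by ctags
    std::string parent;       // enclosing class/struct/namespace name
    std::string signature;    // e.g. "(int x)"
};

// Split a line on a single-character delimiter.
static std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, delim)) parts.push_back(part);
    return parts;
}

// Parse one line of a tags file into `out`. Returns false for header lines
// ("!_TAG_...") and for lines that do not look like a numbered tag entry.
bool parseTagLine(const std::string& line, TagRow& out) {
    if (line.empty() || line[0] == '!') return false;
    const std::vector<std::string> cols = splitOn(line, '\t');
    if (cols.size() < 3) return false;

    TagRow row;
    row.name = cols[0];
    row.filename = cols[1];
    try {
        row.line = std::stoi(cols[2]);  // leading digits of e.g. 12;" give 12
    } catch (const std::exception&) {
        return false;  // address was a search pattern, not a line number
    }
    for (std::size_t i = 3; i < cols.size(); ++i) {
        const std::size_t colon = cols[i].find(':');
        if (colon == std::string::npos) { row.kind = cols[i]; continue; }
        const std::string key = cols[i].substr(0, colon);
        const std::string value = cols[i].substr(colon + 1);
        if (key == "kind") row.kind = value;
        else if (key == "access") row.access = value;
        else if (key == "inherits") row.inheritance = value;
        else if (key == "signature") row.signature = value;
        else if (key == "class" || key == "struct" || key == "namespace") row.parent = value;
    }
    out = row;
    return true;
}
```

Each TagRow would then become one INSERT into the table; resolving the parent name to the parent row's id is left out of this sketch.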


----

You can check out the project by doing:
svn co https://svn.sourceforge.net/svnroot/classdom classdom

« Last Edit: August 21, 2006, 09:53:29 pm by Takeshi Miya »

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #7 on: August 21, 2006, 10:24:02 pm »
Hi Takeshi,

Everything you have quoted here from the thread I have already implemented (the database described by Diego is almost identical to mine...)

I uploaded a compressed zip of my current work, including all sources, to my site. I have reached the auto-completion part: all the infrastructure functions are ready, I just need to put them together.

In the zip you can find:
Visual studio workspace 7.1
Three projects: CodeParser, CodeParserTest & CodeParserGUISample
sqlite3.dll

Compile the workspace and run the GUI sample; it is very easy to use.

When running it for the first time, use the option "Add source to database"
and follow the instructions.

Double-clicking an item in the GUI tree on the left will open it in an editor on the right
and place the cursor on the correct line.

All the logic and flow are located in the frame.cpp file.

To make sure it will run, copy ctags.exe to C:\windows\system32

Link to the source files:
www.eistware.com/wxes/codeparser/codeparser.zip

Link to ctags.exe for windows:
www.eistware.com/wxes/codeparser/ctags.zip

Btw, I too once thought of using a real parser for an IDE, but I abandoned the idea since true parsers will throw exceptions when the syntax is incorrect, so we need more of a guessing system.

For example:

When you write:
CBlock block;

as a coder, you automatically assume that CBlock is a class or something like that, but a real parser will fail if it cannot find the declaration for it.
So you need a more tolerant parser.

Anyways, I believe I will complete my work during next week.

Eran



takeshimiya

  • Guest
Re: CodeCompletion - what is the status?
« Reply #8 on: August 21, 2006, 11:11:56 pm »
Quote
Btw, I too once thought of using a real parser for an IDE, but I abandoned the idea since true parsers will throw exceptions when the syntax is incorrect, so we need more of a guessing system.

You're talking about compiler parsers, which are designed to be very correct and to fail at the first incorrect syntax.
The ANTLR-generated C++ parser does not have to behave that way, however. It is designed to be extremely correct, but as it's a generated parser you can control what to do on failure, how to generate the AST, etc.

Quoting ddiego, who has read your previous thread: "I still see the need for only 2 parsers.
The idea is to use ctags to create a DB that has a broad overview of the various AST elements, but use ANTLR to provide an exact view of a specific file/resource."

The need for an exact view of a specific file also becomes evident when we want to use the parser for refactoring. At that point we'll have to use the "exact view".

So that hybrid approach seems to be the best solution.

Quote
Anyways, I believe I will complete my work during next week.

Thank you for your efforts, really looking forward to it!

Regards,
Takeshi Miya

Offline MortenMacFly

  • Administrator
  • Lives here!
Re: CodeCompletion - what is the status?
« Reply #9 on: August 23, 2006, 11:16:48 pm »
Quote
In the zip you can find:
Visual studio workspace 7.1

I read this and would like to have a look at it. Unfortunately I can't get it to compile. Besides the fact that I don't have VC7.1, I tried converting this into a C::B project but... failed! :shock:
Eran: Do you see any chance to provide me (us) with a C::B project file that e.g. uses the wxWidgets libs as they are produced for C::B (please look at: http://wiki.codeblocks.org/index.php?title=Installing_Code::Blocks_from_source_on_Windows#Building)? I ask because you may have this already, so it may not be much work for you...?!
With regards, Morten.

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #10 on: August 24, 2006, 12:29:42 am »
Hi Morten,

As for VC7.1, I don't have other editors here (unless VC8 is good for you ... ). I don't have C::B because I am too lazy to build it from scratch; I am patiently waiting for the official release so I can install it using setup  :D

What I can do is pack everything into a setup.exe and upload it to my site.

EDIT:

Here is the link to the setup.exe installer for the sample. It will install the missing dlls & ctags for you (an uninstaller is also provided):
www.eistware.com/wxes/codeparser/CodeParserSample.exe

While I am writing here, I will report my progress so far  :):

First of all, I want to point out that I am making this as a wxWidgets library and not a C::B plugin. This means that once it is completed, it is up to the developers here to decide whether to use it (and integrate it into C::B), but I will be more than happy to answer/fix anything required from my side.

About the current status:

I have actually made nice progress with it:

- What is now working nicely is the class tree + class tree updating during editing
- Tomorrow I will focus my work on WordCompletion (when you press Ctrl+Space, a list of suitable words will appear, or if only one word is available, it will be completed automatically). This part includes local scope and workspace scope, and should be completed by Friday, I think.

The next step is the code completion. I am familiar with a couple of cases that will cause problems:
1. Casting - 'C' style casting is a pain in the a**. I can parse a simple cast and get the method, but there will be cases where it fails. The simple cases, such as ((Box*)rect)->, will give the correct results for code completion.

2. Inherited members are not taken into consideration, but this one is easy to implement, so I don't expect any problems here.
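
For readers wondering what "parsing a simple cast" could look like, here is a rough sketch. It is not the code from the library; castTypeName, matchingParen and trim are hypothetical helpers written for this illustration. It pulls the type name out of the text in front of the operator for the easy cases only (it deliberately gives up on qualifiers like const and on anything that is not a plain type name), and it cannot tell a cast from a parenthesised variable such as (a) * c, which is exactly why C casts are painful.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Strip leading and trailing spaces/tabs.
static std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Index of the ')' matching the '(' at position `open`, or npos if unbalanced.
static std::size_t matchingParen(const std::string& s, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '(') ++depth;
        else if (s[i] == ')' && --depth == 0) return i;
    }
    return std::string::npos;
}

// Given the text left of "->" or ".", e.g. "((Box*)rect)", return the cast
// target type ("Box"), or an empty string if this is not a simple C cast.
std::string castTypeName(const std::string& expression) {
    std::string expr = trim(expression);
    // Peel parentheses that wrap the whole expression: "((Box*)rect)" -> "(Box*)rect".
    while (!expr.empty() && expr[0] == '(' && matchingParen(expr, 0) == expr.size() - 1)
        expr = trim(expr.substr(1, expr.size() - 2));
    if (expr.empty() || expr[0] != '(') return "";

    const std::size_t close = matchingParen(expr, 0);
    if (close == std::string::npos || close + 1 >= expr.size()) return "";  // nothing is being cast

    std::string type = expr.substr(1, close - 1);
    // Drop pointer and reference markers, keeping only the type name.
    type.erase(std::remove_if(type.begin(), type.end(),
                              [](char c) { return c == '*' || c == '&'; }),
               type.end());
    type = trim(type);
    // Accept only plain (possibly namespace-qualified) identifiers.
    for (char c : type) {
        const bool ok = std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == ':';
        if (!ok) return "";
    }
    return type;
}
```

With the type name in hand, the completion list is just the members stored for that class in the database.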

After code completion comes symbol search (find function declaration: Ctrl + . will jump to the function / member implementation or declaration).

I really hope that by the end of next week everything will be completed for testing (I do expect bugs, but I want it to be stable).

Eran


« Last Edit: August 24, 2006, 01:20:08 am by eranif »

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #11 on: August 27, 2006, 12:52:54 am »
Hi,

I have updated the sample program on my website. It now demonstrates the following:

1. Building up a GUI tree (AKA Symbol tree) using ctags & sqlite
2. The GUI tree is updated upon saving the file. If you change the file content, the tree will be updated once you save it. This is done because ctags accepts files as input, so updating the tree at intervals is too much overhead (creating a temporary file, passing it to ctags and then deleting it is too much)
3. Saving the tree into the database for later reloading
4. Word completion is now completed. What is word completion, you may ask? Well, word completion is step one of the CodeCompletion: it completes words using Ctrl + Space, and the completion considers the local scope and the entire workspace.
How does it work? First it scans the local scope for members that match the partial name under the cursor, and then it scans the database and adds other matches (if we are inside a class, it will add its members / functions) such as GLOBAL variables + functions + classes, structs etc.

If only a single match exists, it will automatically insert it under the cursor; otherwise it will print the matches it found to the debug window. Scope depth is taken into consideration as well, for example:
Code
void foo()
{
    int number = 0;
    if (number == 0)
    {
        wxString name;
    }
    n   // <------ here you press Ctrl + Space
Only number will be available from the local scope, since name was declared at depth=2 while number is at depth=1, the same as n.
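
The depth rule can be sketched roughly like this. This is an illustration only, not the CodeLite implementation: it assumes some earlier step has already reduced the text before the cursor to a list of "block opened", "block closed" and "local declared" events, and ScopeEvent and visibleLocals are invented names. A local counts as visible when the block it was declared in is still open at the cursor, which gives the same answer as the depth comparison in the foo() example.

```cpp
#include <string>
#include <vector>

// A simplified view of the text before the cursor (hypothetical type).
struct ScopeEvent {
    enum Kind { Open, Close, Declare };
    Kind kind;
    std::string name;  // only meaningful for Declare
};

// Return the locals that are still in scope at the cursor and start with
// `prefix`. Declarations made inside a block that has already been closed
// are discarded together with that block.
std::vector<std::string> visibleLocals(const std::vector<ScopeEvent>& events,
                                       const std::string& prefix) {
    std::vector<std::vector<std::string>> openBlocks;  // stack of open { } blocks
    for (const ScopeEvent& ev : events) {
        if (ev.kind == ScopeEvent::Open) {
            openBlocks.emplace_back();
        } else if (ev.kind == ScopeEvent::Close) {
            if (!openBlocks.empty()) openBlocks.pop_back();
        } else if (!openBlocks.empty()) {
            openBlocks.back().push_back(ev.name);
        }
    }

    std::vector<std::string> matches;
    for (const std::vector<std::string>& block : openBlocks) {
        for (const std::string& name : block) {
            // compare() with a length limit is a plain "starts with" test.
            if (name.compare(0, prefix.size(), prefix) == 0) matches.push_back(name);
        }
    }
    return matches;
}
```

For foo() above the events would be: open, declare number, open, declare name, close. Asking for the prefix n at that point keeps number and drops name; matches from the database would be appended afterwards.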

If you would like to check out the sample (I named the library CodeLite & the sample program LiteEditor  :wink:):
http://www.eistware.com/wxes/codeparser/liteeditor.exe

If you want the source files, drop me a message here or in private and I will upload them or email them (to package them I need to close my editor and close all windows so I can tar it ^^)

Eran

grunerite

  • Guest
Re: CodeCompletion - what is the status?
« Reply #12 on: August 27, 2006, 03:27:51 pm »
Quote
If you would like to check out the sample (I named the library CodeLite & the sample program LiteEditor  :wink:):
http://www.eistware.com/wxes/codeparser/liteeditor.exe

Hi Eran,

Pretty cool. I think this is meant to be an example to show that it works, but this example app could be made a valuable tool if you add 1 more feature: the ability to open more than 1 source file at a time. In the file open dialog, I can select many files at once, but it only opens 1 file (last clicked?).

I commonly download some library code, and don't want to load it up in an IDE to browse classes, etc. With this app, I could just load up a bunch of source at once and browse through it to learn the library faster.

Very nice.

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #13 on: August 27, 2006, 03:45:09 pm »
Hi,


Quote
Pretty cool. I think this is meant to be an example to show that it works, but this example app could be made a valuable tool if you add 1 more feature: the ability to open more than 1 source file at a time. In the file open dialog, I can select many files at once, but it only opens 1 file (last clicked?).


Well, my intention was to demonstrate that the library itself works well before I hand it over to some of the developers here to integrate it into C::B.

The reason that you can't open more than one file is that the current API of the library, TagsManager::SourceToTags, supports a single file only. I still need to add some kind of batch operation API, but this is very straightforward to do.

About enhancing it: once the library is completed I will release it, along with the source code of the editor (the sample program), into the public domain.

I published it as setup.exe because one of the developers here said that he can't open the source files using VC7.1 (my working environment), so I created an installer for it using InnoSetup.

I really hope I will complete the CodeCompletion by the end of this week.

A question for users / developers here:

I am considering having the library itself pop up an AutoCompletion box (my own implementation) and a find symbol dialog (for example, to find a function declaration/definition).
What do you guys think?

Eran

Offline eranif

  • Regular
Re: CodeCompletion - what is the status?
« Reply #14 on: August 28, 2006, 11:45:42 pm »
Hi,

I think that the code completion is pretty much ready for integration - if you guys want it.

I will start by listing what I have accomplished so far:

- As described in previous posts: tree view & tree updates using the library thread, which works in the background
- WordCompletion - attempts to complete the word under the cursor when the hot key is pressed (in the demo it is Ctrl+Space). If a single match is found, the word is inserted automatically and no list box is popped up; otherwise, a user selection box pops up (scintilla built-in)
- CodeCompletion - typing an operator . or -> will attempt to parse the expression and evaluate its return value, with no limit on how complex the expression is. For example: GetClassBox().GetBox().GetName(); a popup box will be shown for every operator (if a match is found).
- Partial casting is supported (currently, only 'C' style casting is supported)
- All files were built and tested using g++ 3.4.5 MinGW. A makefile is provided; you may need to alter it a bit, but the hard part of converting the code to compile under g++ is behind me (phew, MSVC warning level 4 didn't do the job: I got hundreds of errors when I attempted to build with g++, especially with templates)
- Batch API was added to allow adding multiple files at a time to the database (in the demo you can select File->Add source file -> and select multiple files)
- Smart file parsing: if, for example, you have only the implementation of a function:
 
Code
int Rectangle::GetTopRight() {}
and the class Rectangle is declared in another file which you did not add to the database, the library will automatically identify that something is missing (in our case Rectangle) and will 'fill' the space with a 'dummy' entry in the tree view and the database. Once the real entry is added it will replace the dummy one.
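
The dummy-entry idea can be sketched in a few lines. This is only an illustration of the behaviour described above, not the library's code; TagTree, ClassEntry and their members are invented for the example, and the real thing would also mirror the change in the database and the tree view.

```cpp
#include <map>
#include <string>
#include <vector>

// What is known about one class (hypothetical type for this sketch).
struct ClassEntry {
    bool placeholder = true;           // true until the real declaration is seen
    std::vector<std::string> members;  // member names collected so far
};

class TagTree {
public:
    // Called for an out-of-class definition such as Rectangle::GetTopRight.
    // If the class is unknown, operator[] creates a placeholder entry for it.
    void addMember(const std::string& className, const std::string& member) {
        classes_[className].members.push_back(member);
    }

    // Called when the real class declaration is parsed. An existing
    // placeholder is upgraded in place, so members gathered earlier survive.
    void addClass(const std::string& className) {
        classes_[className].placeholder = false;
    }

    // Look up a class; returns nullptr if nothing is known about it.
    const ClassEntry* find(const std::string& className) const {
        const auto it = classes_.find(className);
        return it == classes_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, ClassEntry> classes_;
};
```

Adding GetTopRight for an unknown Rectangle leaves a placeholder Rectangle node holding one member; a later addClass("Rectangle") turns that same node into the real entry.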

Still to be added, with minimum effort:
- Scope operator ( :: )
- this pointer & *this
- C++ style casting (static_cast, dynamic_cast, const_cast, reinterpret_cast)
- Identify whether an identifier is a pointer or a reference (or object); currently it responds to both operators regardless of the type (e.g. Box *b; b. <-- will open a popup box)
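
As a rough idea of how the last item could work once the declared type of the identifier is known, here is a small sketch. It is an assumption about one possible approach, not something the library does today; AccessOp, expectedOperator and operatorMatches are hypothetical names. It looks only at the end of the declared type text and ignores smart pointers, classes that overload operator->, and trailing qualifiers such as "Box * const".

```cpp
#include <cstddef>
#include <string>

enum class AccessOp { Dot, Arrow };

// Decide which member-access operator fits a declared type such as "Box *",
// "Box" or "const Box&". A trailing '&' is skipped because a reference is
// used like the object it refers to.
AccessOp expectedOperator(const std::string& declaredType) {
    const std::size_t last = declaredType.find_last_not_of(" \t&");
    if (last != std::string::npos && declaredType[last] == '*') return AccessOp::Arrow;
    return AccessOp::Dot;
}

// True if the operator the user typed matches the declared type, i.e. the
// completion popup should be shown.
bool operatorMatches(const std::string& declaredType, AccessOp typed) {
    return expectedOperator(declaredType) == typed;
}
```

With this check, Box *b; b. would no longer open the popup, while b-> still would.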

I think that the code is well documented and can be easily read and followed.

The source files can be retrieved at:
http://www.eistware.com/wxes/codelite/CodeLite_sources.zip

to build the demo, you will need the following:
wxscintilla, wxsqlite3 and wxcontrols (already provided, under sdk/gcc_lib)

Setup for the demo:
http://www.eistware.com/wxes/codelite/LiteEditor.exe

Eran