Author Topic: Using CTags as main parser  (Read 37344 times)

eranif
« on: January 06, 2006, 02:36:42 pm »
Hi,

I was browsing your forums when I saw that you are not using CTags for tagging your files, but rather wrote your own in-house parser.

My question is: why not use CTags? It supports over 30 languages, parses hundreds of files in seconds ... and it is very reliable.

For an IDE I wrote, I implemented a mechanism that works as follows:

- A thread is created in the main frame - the ClassViewThread
- At a fixed interval, the thread scans all modified files; the results are stored in an SQLite database.
- If the GUI tree is out of date with respect to the new parse results, an event is sent to the ClassView object to update its data (the event contains the differences between the old file data and the new data)
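
The code itself is not shown in this thread, so what follows is only a minimal sketch of the scan step, in Python for brevity. The table layout, the `scan_once` name and the `parse_file` callback (a stand-in for the CTags run) are all assumptions made for illustration.

```python
import os
import sqlite3


def scan_once(conn, paths, parse_file):
    """Re-parse files whose mtime changed; return {path: (added, removed)}.

    parse_file is a hypothetical callback that returns (name, line) pairs,
    standing in for whatever actually runs CTags on the file.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS tags (path TEXT, name TEXT, line INTEGER)")
    changes = {}
    for path in paths:
        mtime = os.path.getmtime(path)
        row = conn.execute("SELECT mtime FROM files WHERE path = ?", (path,)).fetchone()
        if row is not None and row[0] == mtime:
            continue  # unchanged since the last scan
        old = set(conn.execute("SELECT name, line FROM tags WHERE path = ?", (path,)))
        new = set(parse_file(path))
        # Replace the stored tags for this file with the fresh parse results.
        conn.execute("DELETE FROM tags WHERE path = ?", (path,))
        conn.executemany("INSERT INTO tags VALUES (?, ?, ?)",
                         [(path, name, line) for name, line in new])
        conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, mtime))
        # The difference between old and new data is what the GUI would need.
        changes[path] = (new - old, old - new)
    conn.commit()
    return changes
```

In an IDE, a worker thread would presumably call something like this at each interval and post the returned differences to the class view as an event; an empty result means the tree is already up to date.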

What do you think? I will be happy to contribute the code if you want.

Eran



mandrav (Project Leader, Administrator)
« Reply #1 on: January 06, 2006, 03:03:54 pm »
Eran,

the in-house parser was written for things that ctags cannot help you with (like parsing local function variables, function arguments, etc).
Lately, though, I cannot work on it: the rest of the project needs so much work that I have no spare time left for the parser...

So, if you want to contribute a code-completion plugin that would use ctags, please do :)
Many people will be happy.
This should help you get started: http://wiki.codeblocks.org/index.php?title=Creating_a_simple_%22Hello_World%22_plugin
Be patient!
This bug will be fixed soon...

eranif
« Reply #2 on: January 06, 2006, 03:10:10 pm »
I know CTags does not handle local variables. However, the code I offered to contribute also uses a lexer, written with flex, to analyze the current scope (a small scope; the main parsing is done via CTags).

Anyway, I will have a look at the link you gave.

Eran

mandrav
« Reply #3 on: January 06, 2006, 04:54:32 pm »
> I know CTags does not handle local variables. However, the code I offered to contribute also uses a lexer, written with flex, to analyze the current scope (a small scope; the main parsing is done via CTags).

That's interesting :)

Michael
« Reply #4 on: January 06, 2006, 06:33:44 pm »
> - At a fixed interval, the thread scans all modified files; the results are stored in an SQLite database.

I find the use of a database an interesting choice for storing the data :). SQLite seems lightweight and cross-platform enough to be used within C::B. However, if C::B integrated SQLite (or any other embedded database), C::B would depend on it, which IMHO is not necessarily a good thing. It could add a burden around Unicode support and support for Windows, Linux and other OSs...

Michael

rickg22
« Reply #5 on: January 06, 2006, 06:37:39 pm »
And don't forget that I'm STILL working on the codecompletion plugin!

Right now I'm optimizing the class browser; when I have that ready, I'll commit and start working on the parser.
Edit: But it's OK if you want to start working on your parser :) I don't want to hinder anyone.
« Last Edit: January 06, 2006, 06:39:15 pm by rickg22 »

eranif
« Reply #6 on: January 08, 2006, 09:47:26 pm »
My parser is already completed - I have it ready and tested  :D

I wrote the code a year ago for an IDE of mine (which I abandoned).

ctags:

The good thing about using CTags is the speed at which it parses files (it can parse hundreds of files in seconds). I added the additional flex-based parser because CTags can't parse local variables, so I used regular expressions + flex syntax to do that. It works very well (it can handle templates as well).

SQLite:

The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.
The idea is to parse the whole workspace at startup (or reuse the db from a previous run), and then let a thread parse the changes as they happen, at a fixed interval. I think that this is the best way of doing it: a cache (the db) plus runtime updates.

Using a db can be an advantage for other tasks as well:

- For example, find a symbol in the workspace: simply query the db and offer the user the results. Since it uses SQL, it is very straightforward.
- Code completion can simply be taken from the db ...
and other options as well.
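
To illustrate the "find a symbol" case, here is a minimal sketch in Python using an in-memory SQLite db. The single `tags` table, its columns and the sample rows are assumptions made for this example, not the schema of the code described above.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path would persist the cache.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, kind TEXT, file TEXT, line INTEGER)")
conn.execute("CREATE INDEX tags_by_name ON tags (name)")  # keeps lookups by name fast
conn.executemany("INSERT INTO tags VALUES (?, ?, ?, ?)", [
    ("MyClass", "class", "src/my_class.h", 12),
    ("MyClass", "function", "src/my_class.cpp", 30),
    ("Helper", "class", "src/helper.h", 5),
])


def find_symbol(conn, name):
    """Return (file, line, kind) for every tag called `name`, in a stable order."""
    return conn.execute(
        "SELECT file, line, kind FROM tags WHERE name = ? ORDER BY file, line",
        (name,),
    ).fetchall()


hits = find_symbol(conn, "MyClass")
```

Each hit carries the file and line, which is all a "go to symbol" dialog would need to jump to the definition.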

I think that writing a C++ parser from scratch is not a very simple task, not to mention the bugs it will contain ...
CTags is a very mature tool that can help you do the parsing better, so I recommend ctags.

Eran


Michael
« Reply #7 on: January 08, 2006, 10:40:43 pm »
> The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.

Interesting to know. If it is that good, it is worth trying in my project :).

> I think that writing a C++ parser from scratch is not a very simple task, not to mention the bugs it will contain ...
> CTags is a very mature tool that can help you do the parsing better, so I recommend ctags.

Implementing a parser from scratch is, as you correctly said, not a very simple task. However, the complexity of the parser depends on your application's requirements. Implementing a recursive-descent parser is "relatively" easy :). Naturally, it is not as fast as a table-driven parser, but this may not be critical for your application. Some time ago, I posted about the C++ parser of a Mini-C++. See this topic for further information.
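
To give an idea of why recursive descent is considered "relatively" easy, here is a minimal sketch in Python. It is a toy: it evaluates integer expressions with `+`, `-`, `*` and parentheses, and has nothing to do with the Mini-C++ parser or with real C++ grammar. The point is only that each grammar rule becomes one method.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    # Grammar: expr   := term (('+' | '-') term)*
    #          term   := factor ('*' factor)*
    #          factor := NUMBER | '(' expr ')'
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.take()
            value *= self.factor()
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

For example, `evaluate("2 + 3 * (4 - 1)")` follows the call chain expr, term, factor down into the parentheses and back up, which is exactly the "descent" in the name.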

Michael

PS.: Is your parser open source and freely available for testing?

rickg22
« Reply #8 on: January 08, 2006, 11:16:14 pm »
AH, so it's a recursive descent one! That means I still have hope with my rewrite :P

rickg22
« Reply #9 on: January 08, 2006, 11:36:02 pm »
> > The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.
>
> Interesting to know. If it is that good, it is worth trying in my project :).

That's interesting too, because in my version of the CodeCompletion plugin (which, ironically, is incomplete :P) I use a trie to search for the tags. Eranif's implementation uses a database with indexes, which are themselves search trees.
I guess the main difference is that my index resides in memory, while eranif's resides on disk.
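
For readers unfamiliar with tries, here is a minimal sketch of prefix lookup in Python. It is a generic textbook trie, not the structure used in the CodeCompletion plugin, and the class and method names are made up for the example.

```python
class Trie:
    """A node of a character trie; the root node represents the empty prefix."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def with_prefix(self, prefix):
        """Return all inserted words starting with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no word starts with this prefix
        found = []
        stack = [(node, prefix)]
        while stack:  # walk the subtree below the prefix node
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)


tags = Trie()
for name in ("MyClass", "MyList", "Main"):
    tags.insert(name)
matches = tags.with_prefix("My")
```

Reaching the prefix node costs one step per typed character, regardless of how many tags are stored, which is what makes the structure attractive for completion.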

eranif
« Reply #10 on: January 09, 2006, 12:05:49 am »
SQLite can be in-memory too (just set the name of the db in your code to ':memory:'); however, SQLite has some other advantages when used on disk. Let's say, for code completion:

A user types 'My' and then presses ctrl+space (the shortcut for autocompletion). In my case the solution is easy: select name from class_table where name like '%My%'; and that's it! (Of course I run the same select on the other tables, such as function_table, member_table and prototype_table, concatenate the results and then open the autocompletion box.)
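
A minimal sketch of that lookup in Python follows. The four table names come from the post; the single `name` column, the sample rows and the `complete` helper are assumptions. Note that '%My%' matches 'My' anywhere in the name, while 'My%' matches only names that start with it, so the sketch offers both.

```python
import sqlite3

TABLES = ("class_table", "function_table", "member_table", "prototype_table")

conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (name TEXT)")
conn.execute("INSERT INTO class_table VALUES ('MyClass')")
conn.execute("INSERT INTO function_table VALUES ('MyFunction'), ('Other')")
conn.execute("INSERT INTO member_table VALUES ('m_MyValue')")


def complete(conn, typed, anywhere=False):
    """Collect matching names from all tables.

    anywhere=False builds the pattern 'typed%' (prefix match);
    anywhere=True builds '%typed%' (substring match, as in the post).
    LIKE wildcards inside `typed` are not escaped in this sketch.
    """
    pattern = ("%" if anywhere else "") + typed + "%"
    query = " UNION ".join(f"SELECT name FROM {t} WHERE name LIKE ?" for t in TABLES)
    query += " ORDER BY name"
    return [row[0] for row in conn.execute(query, (pattern,) * len(TABLES))]


prefix_hits = complete(conn, "My")
substring_hits = complete(conn, "My", anywhere=True)
```

Using UNION lets SQLite do the concatenation and de-duplication in one statement instead of merging four result sets by hand.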

Btw, sadly wxScintilla has not implemented an AutoComplete box yet (though from the screenshots I saw that you guys created one of your own).

In addition, when you use a db on disk, you don't need to parse the files again when loading the project: they are already parsed!

Moreover, you can parse third-party library headers without even including them in your project: just parse them, create a database and then use it. (This approach can also be used to parse the gcc include files, thus creating code completion for the C++ language itself, and to distribute the resulting databases with the C::B package.)
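
As a rough sketch of the "parse them, create a database" step, the Python below loads lines in the extended ctags tags-file format into SQLite. It assumes the tags file was generated with line-number fields enabled (for Exuberant Ctags, `--fields=+n`); the `tags` table and both function names are made up for the example.

```python
import sqlite3


def parse_tag_line(line):
    """Parse one extended-format tags line; return None for '!_TAG_' header lines."""
    if line.startswith("!_TAG_"):
        return None
    # Layout: name <TAB> file <TAB> address ;" <TAB> extension fields...
    head, _, ext = line.rstrip("\n").partition(';"\t')
    name, path, _address = head.split("\t", 2)
    tag = {"name": name, "file": path, "kind": None, "line": None}
    for field in (ext.split("\t") if ext else []):
        key, sep, value = field.partition(":")
        if not sep:
            tag["kind"] = field  # a bare field is the kind letter
        elif key == "kind":
            tag["kind"] = value
        elif key == "line":
            tag["line"] = int(value)
    return tag


def load_tags(conn, lines):
    """Insert every parsed tag into the (assumed) tags table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags (name TEXT, kind TEXT, file TEXT, line INTEGER)"
    )
    rows = [tag for tag in map(parse_tag_line, lines) if tag is not None]
    conn.executemany("INSERT INTO tags VALUES (:name, :kind, :file, :line)", rows)
    conn.commit()
    return len(rows)
```

Typical use would be `load_tags(sqlite3.connect("thirdparty.db"), open("tags"))`, where both file names are placeholders; the resulting db file could then be shipped alongside the headers it describes.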

Speed is not an issue. I tested the parser on large projects, and selecting from SQLite is very fast (you will not notice that it is on disk).

@Michael:

I just read your other article on the Mini C++ parser. You also mention the navigation drop-down lists. Using the same SQLite approach, I achieved that as well: on the SCN_UPDATEUI event, I update the drop-down lists according to the SQLite db. Since the line numbers and file names are kept in the db, to get the relevant info I simply run the following select:
select scope, name from function_table where line >= <current line number goes here> order by line
This returns the requested info.
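
A minimal sketch of such a lookup in Python is below. The `function_table` name and the `scope`, `name` and `line` columns come from the select above; the `file` column, the sample rows and the helper name are assumptions. The sketch also assumes the goal is the function containing the cursor, so it takes the last function starting at or before the current line, which is a different comparison from the `>=` in the select above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE function_table (scope TEXT, name TEXT, file TEXT, line INTEGER)"
)
conn.executemany("INSERT INTO function_table VALUES (?, ?, ?, ?)", [
    ("MyClass", "MyClass", "my.cpp", 10),
    ("MyClass", "Run", "my.cpp", 25),
    ("", "main", "my.cpp", 60),
])


def function_at(conn, file, line):
    """Return (scope, name) of the last function starting at or before `line`."""
    return conn.execute(
        "SELECT scope, name FROM function_table "
        "WHERE file = ? AND line <= ? ORDER BY line DESC LIMIT 1",
        (file, line),
    ).fetchone()


current = function_at(conn, "my.cpp", 30)
```

A handler for the editor's update event could call this with the caret's file and line and select the returned scope and name in the two drop-down lists; `None` means the caret is above the first function.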

Here is a screenshot of my CLOSED editor (THIS IS NOT A COMMERCIAL, THE PROJECT IS CLOSED). It shows what I achieved with SQLite as the DB:


Eran

« Last Edit: January 09, 2006, 12:21:24 am by eranif »

takeshi miya
« Reply #11 on: January 09, 2006, 03:19:54 am »
Looks great! :D

I actually think the best solution is to have 3 completely different parsers in C::B (for some time).

-The current hand-made C++ parser.
-The CTags parser (for all languages supported).
-The ANTLR parser (for all languages supported).

I've talked with the CodeStore author (which uses ANTLR), and it's almost everything we need for C++.

So, actually I would like to see which one is easier to implement (and handles C++ better).

mandrav
« Reply #12 on: January 09, 2006, 08:41:35 am »
> Btw, sadly wxScintilla has not implemented an AutoComplete box yet (though from the screenshots I saw that you guys created one of your own).

The AutoCompleteBox is implemented. I didn't use it for other reasons.
Btw, could you send me a mail with the code to look around and run some tests? :lol:

Michael
« Reply #13 on: January 09, 2006, 12:07:16 pm »
> I actually think the best solution is to have 3 completely different parsers in C::B (for some time).
>
> -The current hand-made C++ parser.
> -The CTags parser (for all languages supported).
> -The ANTLR parser (for all languages supported).

I think that your suggested solution is good :D. I do not know the exact roadmap for C::B, but IMHO it would be worthwhile, before the release of C::B 1.0, to implement and test the current hand-made C++ parser, the CTags parser and the ANTLR parser. That way it would be possible to decide which parser (or parsers :)) best fits C::B's needs.

@rickg22: If I remember correctly, your parser is table-driven, right? IMO your rewrite is worth considering and implementing :D.

@eranif: Thank you very much for all the explanations. XStudio looks really good :).

Michael

rickg22
« Reply #14 on: January 09, 2006, 11:55:20 pm »
> @rickg22: If I remember correctly, your parser is table-driven, right? IMO your rewrite is worth considering and implementing :D.

That's when I get to *START* the rewrite. My current changes (let's call them "phase 1") are optimizations: I got rid of that stupid 3-second delay when reparsing files, and cut the "updating class browser" delay down to 0.5 seconds (on my machine). The one I'm currently working on is the parser's FindMatches. Unfortunately, in this one my program segfaults; it seems that the tree structure is corrupted :( I may need 3 or 4 days to catch the bug :(

Then comes phase 2: getting rid of the "updating class browser" delay altogether, and finally I'll start the rewrite. It won't take long, I hope. It's just a matter of converting the parserthread functions to states, and voila :)