Author Topic: Using CTags as main parser  (Read 56023 times)

Offline eranif

  • Regular
  • ***
  • Posts: 256
Using CTags as main parser
« on: January 06, 2006, 02:36:42 pm »
Hi,

I was browsing your forums when I saw that you are not using CTags for tagging your files, but rather wrote your own in-house parser.

My question is: why not use CTags? It supports over 30 languages, parses hundreds of files in seconds ... and it is very reliable.

For an IDE I wrote, I implemented a mechanism that works as follows:

- A thread is created in the main frame - the ClassViewThread
- At every interval the thread scans all modified files - the results are stored in an SQLite database.
- If the GUI tree is not up to date with the new parsing results - an event is sent to the ClassView object to update its data (the event contains the differences between the old file data and the new data)

What do you think? I will be happy to contribute the code if you want

Eran



Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Using CTags as main parser
« Reply #1 on: January 06, 2006, 03:03:54 pm »
Eran,

the in-house parser was written for things that ctags cannot help you with (like parsing local function variables, function arguments, etc).
Although, lately I cannot work on it because the amount of work needed for the rest of the project is too much to leave me spare time for the parser...

So, if you want to contribute a code-completion plugin that would use ctags, please do :)
Many people will be happy.
This should help you get started: http://wiki.codeblocks.org/index.php?title=Creating_a_simple_%22Hello_World%22_plugin
Be patient!
This bug will be fixed soon...

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #2 on: January 06, 2006, 03:10:10 pm »
I know CTags does not handle local vars. However, the code I offered to contribute also uses a lexer, written with flex, to analyze the current scope (a small scope; the main parsing is done via CTags)

Anyway, I will have a look at the link you gave

Eran

Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Using CTags as main parser
« Reply #3 on: January 06, 2006, 04:54:32 pm »
I know CTags does not handle local vars. However, the code I offered to contribute also uses a lexer, written with flex, to analyze the current scope (a small scope; the main parsing is done via CTags)

That's interesting :)

Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Using CTags as main parser
« Reply #4 on: January 06, 2006, 06:33:44 pm »
- At every interval the thread scans all modified files - the results are stored in an SQLite database.

I find the use of a database an interesting choice to store data :). SQLite seems light-weight and cross-platform enough to be used within C::B. However, if C::B integrated SQLite (or any other embedded database), this would make C::B dependent on SQLite, which IMHO is not necessarily positive. It could also add the burden of supporting Unicode, Windows, Linux and other OSs, ...

Michael

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #5 on: January 06, 2006, 06:37:39 pm »
And don't forget that I'm STILL working on the codecompletion plugin!

Right now i'm optimizing the class browser, when i have that ready, i'll commit and start working on the parser.
Edit: But it's OK if you want to start working on your parser :) I don't want to hinder anyone.
« Last Edit: January 06, 2006, 06:39:15 pm by rickg22 »

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #6 on: January 08, 2006, 09:47:26 pm »
My parser is already completed - I have it ready and tested  :D

I wrote the code a year ago for an IDE of mine (which I abandoned).

ctags:

The good thing about using CTags is the speed at which it can parse files (it can parse hundreds of files in seconds). I added the additional flex-based parser because CTags can't parse local variables, so I used regular expressions + flex syntax to do that - it works very well (it can handle templates as well)

SQLite:

The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.
The idea is to parse the whole workspace when you start running (or use the db from previous runs), and then let a thread parse the changes as they happen, at a certain interval. I think that this is the best way of doing it - use a cache (db) + runtime parsing

Using db can be an advantage for other tasks as well:

- For example, "find a symbol in workspace" - simply scan the db and offer the user the results; since it uses SQL, it is very straightforward.
- Code completion can be simply taken from the db ...
and other options as well.

I think that writing a C++ parser from scratch is not a very simple task, not to mention the bugs that it will contain ...
CTags is a very mature tool that can help you do the parsing better - so I recommend ctags.

Eran


Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Using CTags as main parser
« Reply #7 on: January 08, 2006, 10:40:43 pm »
The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.

Interesting to know. If it is so good it is worth an implementation-try in my project :).

I think that writing a C++ parser from scratch is not a very simple task, not to mention the bugs that it will contain ...
CTags is a very mature tool that can help you do the parsing better - so I recommend ctags.

Implementing a parser from scratch is, as you correctly said, not a very simple task. However, the complexity of the parser depends on your application's requirements. The implementation of a recursive-descent parser is "relatively" easy :). Naturally, it is not as fast as a table-driven parser, but this may not be so critical for your application. Some time ago, I posted about the C++ parser of a Mini-C++. See this topic for further information.

Michael

PS.: Is your parser open source and freely available for testing?

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #8 on: January 08, 2006, 11:16:14 pm »
AH, so it's a recursive descent one! That means I still have hope with my rewrite :P

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #9 on: January 08, 2006, 11:36:02 pm »
The SQLite DB is very portable and I would recommend it to anyone who wants to use a fast embedded database.

Interesting to know. If it is so good it is worth an implementation-try in my project :).

That's interesting too, because in my version of the CodeCompletion plugin, which ironically, is incomplete :P, I use a trie to search for the tags. Eranif's implementation uses a database with indexes, which are themselves search trees.
I guess the main difference is that my index resides in memory, while eranif's resides on disk.
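To make the comparison concrete, here is a minimal sketch of a prefix lookup over an ordered container, which is roughly what an index organised as a search tree offers. It is only an illustration under assumptions: `complete_prefix` and the flat set of tag names are hypothetical and are taken neither from the CodeCompletion plugin nor from eranif's code.

```cpp
#include <set>
#include <string>
#include <vector>

// Return every tag name that starts with `prefix`, in sorted order.
// std::set is an ordered container, so matching names sit next to each other.
std::vector<std::string> complete_prefix(const std::set<std::string>& tags,
                                         const std::string& prefix) {
    std::vector<std::string> matches;
    // lower_bound jumps to the first name that is not less than the prefix.
    for (auto it = tags.lower_bound(prefix); it != tags.end(); ++it) {
        // Stop as soon as a name no longer begins with the prefix.
        if (it->compare(0, prefix.size(), prefix) != 0) break;
        matches.push_back(*it);
    }
    return matches;
}
```

With the names "Main", "MyClass", "MyFrame" and "wxWindow" in the set, asking for the prefix "My" walks only the two matching entries. A trie reaches the same entries by following one node per typed character instead.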

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #10 on: January 09, 2006, 12:05:49 am »
SQLite can be in-memory too (just set the name of the db in your code to ':memory:'); however, SQLite has some other advantages when used on disk. Take code completion, for example:

A user typed 'My' and then pressed ctrl+space (the shortcut for autocompletion). In my case the solution is easy: select * from class_table where name like '%My%'; and that's it! (Of course I do the same select on the other tables such as function_table, member_table and prototype_table, concatenate the results and then open the autocompletion box.)

Btw, sadly wxScintilla did not implement AutoComplete box yet. (from the screenshots I saw that you guys created one of your own though)

In addition, when you use a disk image, you don't need to parse the files again when loading the project again - they are already parsed!

Moreover, you can parse third-party library headers without even including them in your project - just parse them, create a database and then use it (this approach can also be used to parse the gcc include files, thus creating code completion for the C++ language itself, and to distribute them with the C::B package)

Speed is not an issue. I tested the parser on large projects, and selecting from SQLite is very fast (you will not notice that it is on disk)

@Michael:

I just read your other article on the Mini C++ parser. You also mention the navigation drop-down lists; using the same SQLite method, I achieved that as well. Using the SCN_UPDATEUI event, I update the drop-down lists according to the SQLite db. Since the line numbers and file names are kept in the db, to get the relevant info I simply do the following select call:
select scope, name from function_table  where line >= <current line number goes here> ORDER BY LINE - this returns the requested info.
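The same kind of lookup can be sketched in memory. This is a sketch under assumptions, not the code behind the screenshot: `FunctionIndex` and `enclosing_function` are hypothetical names, and the sketch picks the last function that starts at or before the cursor line, which is one reading of the intended lookup rather than the exact query quoted above. It also knows nothing about where a function ends, so a cursor sitting between two functions is attributed to the previous one.

```cpp
#include <iterator>
#include <map>
#include <string>

// Hypothetical per-file index: start line of a function -> "scope::name".
using FunctionIndex = std::map<int, std::string>;

// Return the function whose start line is the closest one at or before
// `cursor_line`, or an empty string if the cursor is above every function.
std::string enclosing_function(const FunctionIndex& index, int cursor_line) {
    // upper_bound finds the first function that starts AFTER the cursor...
    auto it = index.upper_bound(cursor_line);
    if (it == index.begin()) return std::string();
    // ...so the entry just before it starts at or before the cursor.
    return std::prev(it)->second;
}
```

An editor could call this from its update-UI handler with the current line and show the result in the drop-down list.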

Here is a screenshot of my CLOSED editor (THIS IS NOT A COMMERCIAL, THE PROJECT IS CLOSED) - and see what I achieve with SQLite as DB:


Eran

« Last Edit: January 09, 2006, 12:21:24 am by eranif »

takeshimiya

  • Guest
Re: Using CTags as main parser
« Reply #11 on: January 09, 2006, 03:19:54 am »
Looks great! :D

I actually think the best solution is to have in C::B, 3 completely different parsers (for some time).

-The current hand-made C++ parser.
-The CTags parser (for all languages supported).
-The ANTLR parser (for all languages supported).

I've talked with the CodeStore author (which uses ANTLR), and it's almost everything we need for C++.

So, actually I would like to see which one is easier to implement (and handles C++ better).

Offline mandrav

  • Project Leader
  • Administrator
  • Lives here!
  • *****
  • Posts: 4315
    • Code::Blocks IDE
Re: Using CTags as main parser
« Reply #12 on: January 09, 2006, 08:41:35 am »
Btw, sadly wxScintilla did not implement AutoComplete box yet. (from the screenshots I saw that you guys created one of your own though)

The AutoCompleteBox is implemented. I didn't use it for other reasons.
Btw, could you send me a mail with the code to look around and run some tests? :lol:

Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Using CTags as main parser
« Reply #13 on: January 09, 2006, 12:07:16 pm »
I actually think the best solution is to have in C::B, 3 completely different parsers (for some time).

-The current hand-made C++ parser.
-The CTags parser (for all languages supported).
-The ANTLR parser (for all languages supported).

I think that your suggested solution is good :D. I do not know the exact roadmap for C::B, but IMHO it would be worthwhile, before the release of C::B 1.0, to implement and test the current hand-made C++ parser, the CTags parser and the ANTLR parser. In this way it would be possible to decide which parser (or parsers :)) best fits C::B's needs.

@rickg22: If I remember correctly, your parser is table-driven, right? IMO your rewrite is worthy of consideration and implementation :D.

@eranif: Thank you very much for all the explanations. XStudio looks really good :).

Michael

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #14 on: January 09, 2006, 11:55:20 pm »
@rickg22: If I remember correctly, your parser is table-driven, right? IMO your rewrite is worthy of consideration and implementation :D.

That's when I get to *START* the rewrite. My current changes (let's call them "phase 1") are optimization ones - got rid of that stupid 3 second delay when reparsing files, minimized the "updating class browser" delay down to 0.5 seconds (on my machine), and the one i'm currently working on is the parser's FindMatches. Unfortunately, in this one, my program segfaults, it seems that the tree structure is corrupted :( I may need 3 or 4 days to catch the bug :(

Then comes phase 2: getting rid of the "updating class browser" delay altogether, and finally I'll start with the rewrite. Won't take long, I hope. It's just a matter of converting the parserthread functions to states, and voila :)

Offline killerbot

  • Administrator
  • Lives here!
  • *****
  • Posts: 5491
Re: Using CTags as main parser
« Reply #15 on: January 10, 2006, 12:00:41 am »
any commits after phase 1 , phase 2 , ... ?
delay from 3 sec to 0.5 sec --> delicious  :lol:

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #16 on: January 10, 2006, 12:02:44 am »
I'll commit when I get this bug fixed.
Oops, sorry for "hijacking" the thread. Eranif, you may continue now :)

Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Using CTags as main parser
« Reply #17 on: January 11, 2006, 10:56:49 am »
That's when I get to *START* the rewrite. My current changes (let's call them "phase 1") are optimization ones - got rid of that stupid 3 second delay when reparsing files, minimized the "updating class browser" delay down to 0.5 seconds (on my machine), and the one i'm currently working on is the parser's FindMatches. Unfortunately, in this one, my program segfaults, it seems that the tree structure is corrupted :( I may need 3 or 4 days to catch the bug :(

Maybe you can try to store your search trees in the SQLite db. This could help to prevent corruption. You can check whether wxSQLite would fulfill your needs.

Michael

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Using CTags as main parser
« Reply #18 on: January 11, 2006, 05:39:18 pm »
That's not necessary now, I got it fixed and committed (rev 1711)

Offline Michael

  • Lives here!
  • ****
  • Posts: 1608
Re: Using CTags as main parser
« Reply #19 on: January 11, 2006, 05:56:49 pm »
That's not necessary now, I got it fixed and committed (rev 1711)

Good and fast work :D.

Michael

Offline Game_Ender

  • Lives here!
  • ****
  • Posts: 551
Re: Using CTags as main parser
« Reply #20 on: January 14, 2006, 05:29:52 am »
Any word on the integration of eranif's CodeCompletion implementation?  It looks very exciting. I too had looked at CTags long ago when the discussion first began, but only eranif did the hard part and actually implemented something with it.  It is also important to note that the benefit of having an on-disk SQLite database is that there is no "Saving or Loading Cache"; in fact there would be no cache at all.  I also believe that SQLite supports atomic commits, so a crash of the application will not corrupt the database, at least in theory.
« Last Edit: January 18, 2006, 10:53:06 pm by Game_Ender »

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #21 on: January 17, 2006, 02:29:26 pm »
Hi,

Sorry, I didn't have time to work on it yet. Later today I will start decoupling the parser code from my old project to make it a standalone component. It should not take long, maybe a day or two.

Eran

takeshimiya

  • Guest
Re: Using CTags as main parser
« Reply #22 on: January 17, 2006, 05:45:49 pm »
Hi,

Sorry, I didn't have time to work on it yet. Later today I will start decoupling the parser code from my old project to make it a standalone component. It should not take long, maybe a day or two.

Eran


:D

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #23 on: January 18, 2006, 11:27:56 pm »
Hi,

I started working on decoupling the code parser from my old project; well, it seems like a lot of work  :( ..

So, what I have come up with so far is:

I created a small app that does the following:

- Create an empty db with all indexes defined
- Parse sample input files (in batch mode) and populate the db

In the main frame there is a function called InitDb().
This is the function that should interest you (all the other files can be ignored for the moment; they are used internally, or I will use them later when I update the demo)

In this function what I do is:
- Open the sample input files and load their data
- Create language parser (e.g. CPP) - this can easily be extended to other languages as well
- Create the ctags interface
- Execute
- Store the results into SQLite

The db class is called
- USDb

The DB consists of 4 tables:

- member_table
- function_table
- class_table
- prototype_table (this is special case of prototype function)

to prevent duplicates, there are indexes on the tables - so duplicate entries are not allowed (but of course, same name different prototype is allowed ...)

This example shows how you can use ctags to create a db with all workspace information saved.

Things it can do, but which I did not have time to demonstrate yet (but I will):

- It does not show how to get a local member's type (qualifier).
- It does not show how to get the current function class name from cursor location (like the two combo boxes in VC71 located on top of the file)
- It does not suggest members/functions list
- It does not show how to parse local scope (ctags can't parse local variables, this is where my internal parser comes to help - lex.yy.cpp & USCPPScanner )

Hopefully tomorrow or during the weekend (the weekend starts here tomorrow :)) I will update the sample with more functionality so you will get the impression of the idea.

To run the sample, place the exe "ctags.exe" under a valid path - it can be found under "SQLite_db/bin/ctags.exe"

In addition, in order to view the results, I am also providing an SQLite browser which I found on SF. It is called "SQLite Database Browser" and can be found under "SQLite_db/bin/", or you can use the simple command line "sqlite.exe"

The database name is "./internals/ctags.db" simply open it with the browser and have a look at the results of the parsing.
The parsing input files (it is actually the Tinyxml code) can be found under "Samples/"

The project can be found here:
http://www.eistware.com/wx/ctags/ctags.zip

Let me know if you have problems or questions,
Eran



Offline Game_Ender

  • Lives here!
  • ****
  • Posts: 551
Re: Using CTags as main parser
« Reply #24 on: January 19, 2006, 04:51:39 pm »
Awesome, keep up the good work.  There is nothing holding back Linux integration, is there? I assume you just chose to do a Windows example for ease of use.

On another note, SQLite databases allow access by multiple processes that can read from the database at once, and lock the entire database during writing.  This would make it pretty easy to thread this sort of CodeCompletion.  You would only need to have a thread-safe queue of files for the parser worker thread.  The worker thread just sits parsing all the files in the queue, hopefully doing batch writes to the database and not a bunch of small ones.  Then the main thread can query the database whenever it needs to without fear of any errors occurring.
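A thread-safe queue of files along these lines could look like the sketch below. It is a minimal illustration built on the C++ standard library; the class name `FileQueue` and its methods are hypothetical, and the parsing and the database writes are left out.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Queue of file paths shared between the GUI thread (producer) and the
// parser worker thread (consumer).
class FileQueue {
public:
    // Called by the GUI thread whenever a file needs (re)parsing.
    void push(std::string path) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            files_.push_back(std::move(path));
        }
        ready_.notify_one();  // wake the worker if it is waiting
    }

    // Called by the worker: block until at least one file is queued, then
    // take everything at once so the results can be written as one batch.
    std::vector<std::string> pop_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !files_.empty(); });
        std::vector<std::string> batch(files_.begin(), files_.end());
        files_.clear();
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> files_;
};
```

Draining the whole queue in one call is what makes the batch write possible: the worker parses every file in the returned batch and can then store all results in a single transaction.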

Offline eranif

  • Regular
  • ***
  • Posts: 256
Re: Using CTags as main parser
« Reply #25 on: January 20, 2006, 12:23:27 am »
For keeping the GUI tree up-to-date, I used the following logic:

- The main thread is creating a parser-thread that waits and wakes up every interval
- Once a new workspace is opened, a parsing of all the workspace files is done by the main thread - like in the example
- Whenever the text is modified in the editor (I don't know C::B much, but I guess it uses Scintilla; the change can be caught in SCN_MODIFIED), the text of the currently modified file is put in a map whose entries are pairs of <project/fileName> & data (e.g. let's say you have a project named prj1 and the file name is main.cpp: the key for the map is prj1/main.cpp and the value in the map is the file data)
- The parser thread wakes up, locks the map and copies the content of the map to a local map - so the lock is freed ASAP
- It then reads from the db all the entries belonging to the project + file name (select * from table where project='prj1' and file='main.cpp')
- It calls ctags to parse the modified file data retrieved from the map
- It performs a 'diff' on the two sets of results: the modified file results returned from ctags vs. the values from the db (sqlite)
- An array of all modified entries is created and sent by the parser thread to the class-view GUI object (wxTreeCtrl) using wxPostEvent() call

In this way the ongoing work is not interrupted by the parsing process, and the db is always up to date and consistent.

Btw, the current parsing can handle the following:
Typedefs, enums, namespace, class (including templates), vars, structs, unions, defines and functions

I did it on Windows since I work with VC7.1 - I don't think that there is a problem porting it to Linux (I am not using MFC, WINAPI or anything similar that could limit the code) - I simply like VC71, so I work with it.

Eran


afterain

  • Guest
Re: Using CTags as main parser
« Reply #26 on: April 07, 2006, 07:13:09 am »
- It does not show how to get the current function class name from cursor location (like the two combo boxes in VC71 located on top of the file)
I think maybe we can generate a tags file for the current file. We know the line of the cursor, so we find the nearest function's start line. If the cursor line is between the function's start line and end line, we have found it.
- It does not suggest members/functions list
I took readtags.c and readtags.h from ctags; readtags is very fast -- "Even for an unsorted 24MB tag file, tag searches take about one second."
It can go over the tags file and get all the information (field/class/line etc.) of every entry. If we want to list wxWindow's members/functions, we can do it via the "class" field: if the "class" field equals "wxWindow", the member/function is added to the list.
- It does not show how to parse local scope (ctags can't parse local variables, this is where my internal parser comes to help - lex.yy.cpp & USCPPScanner )
Ctags can parse local variables, like this: "ctags --C++-kinds=+l --C-kinds=+l -R". You can run "ctags --list-kinds" to list all tag kinds for all languages. Kind "l" means local variable.
Example:
test.cpp
int main()
{
   int i;
   wxWindow a;
   a.test(i);
}

void ttse()
{
   bool b;
   map<int, int> t;
}
ctags:
a   .\test.cpp   /^   wxWindow a;$/;"   l
b   .\test.cpp   /^   bool b;$/;"   l
i   .\test.cpp   /^   int i;$/;"   l
main   .\test.cpp   /^int main()$/;"   f
t   .\test.cpp   /^   map<int, int> t;$/;"   l
ttse   .\test.cpp   /^void ttse()$/;"   f

« Last Edit: April 07, 2006, 07:16:22 am by afterain »