Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => CodeCompletion redesign => Topic started by: rickg22 on December 19, 2005, 06:55:56 am

Title: Progress with SearchTree class
Post by: rickg22 on December 19, 2005, 06:55:56 am
hi guys, just want to show some of my current progress.

The Tree class can already add items, and serialize its structure into XML format. It can also dump the node structure in a nice graphical way. Look!

Code
    mytree->AddItem(_T("physics"),_T("1 - uno"));
    mytree->AddItem(_T("physiology"),_T("2 - dos"));
    mytree->AddItem(_T("psychic"),_T("3 - tres"));

results in:

Code
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sychic" (5)

The numbers between parenthesis are the node id's.
Unfortunately, adding another item screws up the tree, so I'm still in the process of debugging. But at least I have something now :)

I also have to thank Thomas for his valuable advice on using the STL classes like vector and map. They saved a lot of work!
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 20, 2005, 08:41:46 am
Good news people! It seems I have fixed all bugs in the tree's adding of nodes.
Code
Adding word: physics
1 items in the tree
- "" (0)
      \- "physics" (1)
---
Adding word: physiology
2 items in the tree
- "" (0)
      \- "physi" (2)
                  +- "cs" (1)
                  \- "ology" (3)
---
Adding word: psychics
3 items in the tree
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sychics" (5)
---
Adding word: psychotic
4 items in the tree
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "ics" (5)
                         \- "otic" (7)
---
Adding word: paranormal
5 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "ics" (5)
                         \- "otic" (7)
---
Adding word: psychiatrist
6 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrist" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychiatrists
7 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychic
8 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychi
9 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: ghosts
10 items in the tree
- "" (0)
      +- "ghosts" (11)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: paranormals
11 items in the tree
- "" (0)
      +- "ghosts" (11)
      \- "p" (4)
              +- "aranormals" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---

I haven't finished testing the class, but just in case, I'm sending a backup of the tree class to the /downloads directory. Its name is searchtree.zip.
Title: Re: Progress with SearchTree class
Post by: takeshimiya on December 20, 2005, 08:47:37 am
I can guess this class is for the CodeCompletion plugin?
Title: Re: Progress with SearchTree class
Post by: Urxae on December 20, 2005, 09:29:39 am
Good news people! It seems I have fixed all bugs in the tree's adding of nodes.

Bad news, I think I found a bug just from those diagrams :(.

Look at the cases where a word is added that starts with another word (psychiatrist/psychiatrists, paranormal/paranormals).

Code
Adding word: psychiatrist
6 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrist" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychiatrists
7 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)

Since the node numbers don't change I presume this is an accurate representation of the internals of the data structure, so where did the node for "psychiatrist" go? Shouldn't there be an extra ending node there?
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 20, 2005, 04:55:48 pm
Yes, it's by design. Edges (words) are only split when needed, to. Notice that there is no node with only one child. This is what allows the path compression and saves space in memory :)

Each node has a list of the items and their corresponding depths. In this case, "psychiatrists" has an item with depth 12 for "psychiatrist", and another with depth 13 for "psychiatrists". When a node is split, the items are recalculated so they belong to the right node (at least in theory, i haven't checked if that one works :P )

The next step is implementing the part that recalls the items from the tree. Eeach item has a value, so it's an associative array with strings for keys. When I implement that, my tree class will be complete, and i'll begin doing the adaptation to hold the class completion tokens :)
Title: Re: Progress with SearchTree class
Post by: Urxae on December 20, 2005, 05:26:47 pm
Yes, it's by design. Edges (words) are only split when needed, to. Notice that there is no node with only one child. This is what allows the path compression and saves space in memory :)

Each node has a list of the items and their corresponding depths. In this case, "psychiatrists" has an item with depth 12 for "psychiatrist", and another with depth 13 for "psychiatrists". When a node is split, the items are recalculated so they belong to the right node (at least in theory, i haven't checked if that one works :P )

Ah ok then, but that wasn't shown in the tree. Maybe you could append something like "[12, 13]" to the tree output for those nodes to show that?
Well it doesn't really matter as it's debug output, but that way you could check if it works at least.
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 20, 2005, 06:47:28 pm
I'll debug in an easier way: associating the strings with themselves. If a string doesn't match, then something is wrong. :)
Title: Re: Progress with SearchTree class
Post by: anonuser on December 20, 2005, 07:28:21 pm
Neat searching. Have you tested searching?
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 21, 2005, 03:15:04 am
Adding a node implies searching for it, so yes, it's definitely tested ;-)
In fact, I realized some of the inserts weren't working right because there was a fault in the search.
Title: Re: Progress with SearchTree class
Post by: anonuser on December 21, 2005, 04:36:03 am
Very neat. This is for the code completion right?
Title: Re: Progress with SearchTree class
Post by: Game_Ender on December 21, 2005, 04:37:53 am
So this is a kind of fast, searchable Abstract Syntax Tree (AST)?  Do you have your parser generate an AST and then build a SearchTree instance in order to the code completion from?
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 21, 2005, 04:46:36 am
No, it's not for an abstract syntax tree ( http://en.wikipedia.org/wiki/Abstract_syntax_tree ), it will only hold the token names. Actually, I just wanted to optimize the addition of tokens :lol:

But this tree will hopefully be used in code completion, to search for all the tokens with a certain prefix.

Title: Re: Progress with SearchTree class
Post by: rickg22 on December 21, 2005, 06:58:00 am
Here's the results, part deux ;-)

I added some items, and then queried the contents of the tree.

The following " => " sequences are the results of the queries.
Code
'physics' => ' 1 - uno'
'physiology' => ' 2 - dos'
'psychics' => ' 3 - tres'
'psychotic' => ' 4 - cuatro'
'paranormal' => ' 5 - cinco'
'psychiatrist' => ' 6 - seis'
'psychiatrists' => ' 7 - siete'
'psychi' => ' 8 - ocho'
'ghosts' => ' 9 - nueve'
'paranormals' => '10 - diez'
'blah' => ''
Adding item: blah => bleh
'blah' => 'bleh'
'' => 'this is the root node :P' (I did this to test if i
could add a value to the empty string)
'psychic' => '' (Notice that here even when 'psychic'
is contained in the tree, it doesn't have a corresponding  item)

Clearing tree...

'' => 'this is the root node :)'
(i assigned this after clearing the tree)
'physics' => ''
'physiology' => ''
'psychics' => ''
'psychotic' => ''
'paranormal' => ''
'psychiatrist' => ''
'psychiatrists' => ''
'psychi' => ''
'ghosts' => ''
'paranormals' => ''
'blah' => ''
1 items in the tree
- "" (0)
---

It seems to work, alright. I'll upload the zip now. I think I'm ready now to start working on the codecompletion improvements :)
Title: Re: Progress with SearchTree class
Post by: TDragon on December 21, 2005, 04:42:04 pm
In case anyone's interested, the name for this type of structure is a "trie" (an odd abbreviation of "retrieval"). I had built one a few days ago for this same purpose, though without serialization functionality. Hmm...maybe I'll templatize it...

If I'm seeming to hijack all the threads in this forum, it's just your imagination.
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 24, 2005, 07:17:44 am
Good news everybody! I'm testing Code::Blocks with the NEW Search Tree! :)

In my dad's machine (I'm on vacations), parsing time (for codeblocks-newbuild) goes down from 4.75 to 1.895 seconds, that's 2.5 times faster! :D

So far, everything seems stable, but I'm having a little problem. Whether I use the new class or not, enabling "follow global includes" seems to shut down Codeblocks in the middle of the parsing. Has it happened to any of you? (Windows, ANSI)
Title: Re: Progress with SearchTree class
Post by: Ceniza on December 24, 2005, 08:06:07 am
Hmmm, I just tried enabling "follow global includes" to parse the NewBuild project file and it didn't shut down.

Times look good though. I'll be waiting for that baby in SVN :)

But wait! I disabled "follow global includes", it re-paresed with no problem, re-enabled that option, started re-parsing but Code::Blocks is now sucking 50% CPU (100% if it hadn't HT enabled) and is just stuck there. Seems like a bug somewhere...
Title: Re: Progress with SearchTree class
Post by: Der Meister on December 24, 2005, 12:10:09 pm
Looks really nice, great work  :)
But you should also add it to the linux make system. Without it the compilation process doesn't abort (I wonder why this is the case) but Code::Blocks failes to load the plugin. After adding the two new source-files to the build system and re-compiling the plugin it seems to work fine. (Yet, I didn't check if the problem with "follow global includes" occures here, too.)
Thus, you should apply this little patch:
Code
Index: src/plugins/codecompletion/Makefile.am
===================================================================
--- src/plugins/codecompletion/Makefile.am      (revision 1584)
+++ src/plugins/codecompletion/Makefile.am      (working copy)
@@ -24,7 +24,8 @@
                        parser/parser.cpp \
                        parser/parserthread.cpp \
                        parser/token.cpp \
-                       parser/tokenizer.cpp
+                       parser/tokenizer.cpp \
+                       parser/searchtree.cpp
 
 noinst_HEADERS = cclist.h \
                        cclistctrl.h \
@@ -37,4 +38,5 @@
                        parser/parser.h \
                        parser/parserthread.h \
                        parser/token.h \
-                       parser/tokenizer.h
+                       parser/tokenizer.h \
+                       parser/searchtree.h
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 24, 2005, 02:25:15 pm
Ah! Thanks for that one, I didn't know you had to do this, too.
Title: Profiling info...
Post by: rickg22 on December 24, 2005, 07:26:46 pm
Here's some profiling on the codecompletion plugin. Results are interesting...


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name   
 15.79      0.12     0.12 32143644     0.00     0.00  wxStringBase::at(unsigned int) const
 13.16      0.22     0.10 32143644     0.00     0.00  wxString::GetChar(unsigned int) const
  7.89      0.28     0.06  7299941     0.00     0.00  Tokenizer::IsEOF() const
  6.58      0.33     0.05    66867     0.00     0.00  Tokenizer::CompactSpaces(wxString&) const
  5.26      0.37     0.04 25016299     0.00     0.00  Tokenizer::CurrentChar() const
  5.26      0.41     0.04    69053     0.00     0.00  Tokenizer::SkipBlock(char const&)
  3.95      0.44     0.03  6016913     0.00     0.00  Tokenizer::MoveToNextChar(unsigned int)
  2.63      0.46     0.02  1454827     0.00     0.00  wxString::append(unsigned int, char)
  2.63      0.48     0.02  1060893     0.00     0.00  wxStringBase::empty() const
  2.63      0.50     0.02   818659     0.00     0.00  wxString::operator char const*() const
  2.63      0.52     0.02   444750     0.00     0.00  Tokenizer::DoGetToken()
  2.63      0.54     0.02   108887     0.00     0.00  SearchTreeNode::GetDeepestMatchingPosition(BasicSearchTree*, wxString const&, unsigned int)
  1.32      0.55     0.01  6489068     0.00     0.00  wxStringBase::size() const
  1.32      0.56     0.01  2880137     0.00     0.00  wxStringBase::at(unsigned int)
  1.32      0.57     0.01  2880137     0.00     0.00  wxString::operator[](unsigned int)
  1.32      0.58     0.01  2809196     0.00     0.00  Tokenizer::NotEOF() const
  1.32      0.59     0.01  1520388     0.00     0.00  wxString::~wxString()
  1.32      0.60     0.01  1454827     0.00     0.00  wxString::operator<<(char)
  1.32      0.61     0.01   687707     0.00     0.00  wxBaseArrayPtrVoid::operator[](unsigned int) const
  1.32      0.62     0.01   669131     0.00     0.00  TokensArray::operator[](unsigned int) const
  1.32      0.63     0.01   460889     0.00     0.00  wxStringData::Lock()
  1.32      0.64     0.01   423075     0.00     0.00  std::_Rb_tree<char, std::pair<char const, unsigned int>, std::_Select1st<std::pair<char const, unsigned int> >, std::less<char>, std::allocator<std::pair<char const, unsigned int> > >::_S_key(std::_Rb_tree_node<std::pair<char const, unsigned int> > const*)
  1.32      0.65     0.01   368749     0.00     0.00  __gnu_cxx::__normal_iterator<SearchTreeNode**, std::vector<SearchTreeNode*, std::allocator<SearchTreeNode*> > >::operator+(int const&) const
  1.32      0.66     0.01   245572     0.00     0.00  std::_Rb_tree<char, std::pair<char const, unsigned int>, std::_Select1st<std::pair<char const, unsigned int> >, std::less<char>, std::allocator<std::pair<char const, unsigned int> > >::end()
  1.32      0.67     0.01   122763     0.00     0.00  std::vector<SearchTreeNode*, std::allocator<SearchTreeNode*> >::size() const
  1.32      0.68     0.01   116830     0.00     0.00  BasicSearchTree::GetNode(unsigned int, bool)
  1.32      0.69     0.01    97276     0.00     0.00  operator==(wxString const&, char const*)
  1.32      0.70     0.01    76280     0.00     0.00  wxString::append(wxString const&)
  1.32      0.71     0.01    25307     0.00     0.00  BasicSearchTree::FindNode(wxString const&, unsigned int, SearchTreePoint*)
  1.32      0.72     0.01    25185     0.00     0.00  SearchTree<TokensArray>::GetItemAtPos(unsigned int)
  1.32      0.73     0.01    14523     0.00     0.00  std::_Rb_tree<unsigned int, std::pair<unsigned int const, unsigned int>, std::_Select1st<std::pair<unsigned int const, unsigned int> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::_M_get_node()
  1.32      0.74     0.01    10563     0.00     0.00  void std::_Construct<std::pair<char const, unsigned int>, std::pair<char const, unsigned int> >(std::pair<char const, unsigned int>*, std::pair<char const, unsigned int> const&)
  1.32      0.75     0.01       11     0.91     2.34  Parser::AddTreeNamespace(wxTreeCtrl&, wxTreeItemId const&, Token*)
  1.32      0.76     0.01        1    10.00    10.29  BasicSearchTree::~BasicSearchTree()
  0.00      0.76     0.00  8478747     0.00     0.00  wxStringBase::GetStringData() const


The rest of the functions and methods give zero time.
Title: Re: Progress with SearchTree class
Post by: 280Z28 on December 24, 2005, 09:00:50 pm
Hey rick rev1587 is not compiling. You have these two lines in searchtree.cpp (223, 232):

Code
            result << _T("    <item depth=\"") << u2s(item->first) << _T("\" itemid=\"" << u2s(item->second) <<  "\"") << _T(" />\n");

            result << _T("    <child char=\"") << SerializeString(string(link->first)) << _T("\" nodeid=\"" << u2s(link->second) <<  "\"") << _T(" />\n");

That I think should be this:

Code
            result << _T("    <item depth=\"") << u2s(item->first) << _T("\" itemid=\"") << u2s(item->second) <<  _T("\"") << _T(" />\n");

            result << _T("    <child char=\"") << SerializeString(string(link->first)) << _T("\" nodeid=\"") << u2s(link->second) <<  _T("\"") << _T(" />\n");
Title: Re: Progress with SearchTree class
Post by: rickg22 on December 25, 2005, 08:44:24 am
Ah, thanks! :) Fixed in rev 1588.