Author Topic: Progress with SearchTree class  (Read 40346 times)

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Progress with SearchTree class
« on: December 19, 2005, 06:55:56 am »
hi guys, just want to show some of my current progress.

The Tree class can already add items, and serialize its structure into XML format. It can also dump the node structure in a nice graphical way. Look!

Code
    mytree->AddItem(_T("physics"),_T("1 - uno"));
    mytree->AddItem(_T("physiology"),_T("2 - dos"));
    mytree->AddItem(_T("psychic"),_T("3 - tres"));

results in:

Code
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sychic" (5)

The numbers between parenthesis are the node id's.
Unfortunately, adding another item screws up the tree, so I'm still in the process of debugging. But at least I have something now :)

I also have to thank Thomas for his valuable advice on using the STL classes like vector and map. They saved a lot of work!

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #1 on: December 20, 2005, 08:41:46 am »
Good news people! It seems I have fixed all bugs in the tree's adding of nodes.
Code
Adding word: physics
1 items in the tree
- "" (0)
      \- "physics" (1)
---
Adding word: physiology
2 items in the tree
- "" (0)
      \- "physi" (2)
                  +- "cs" (1)
                  \- "ology" (3)
---
Adding word: psychics
3 items in the tree
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sychics" (5)
---
Adding word: psychotic
4 items in the tree
- "" (0)
      \- "p" (4)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "ics" (5)
                         \- "otic" (7)
---
Adding word: paranormal
5 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "ics" (5)
                         \- "otic" (7)
---
Adding word: psychiatrist
6 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrist" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychiatrists
7 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychic
8 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychi
9 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: ghosts
10 items in the tree
- "" (0)
      +- "ghosts" (11)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: paranormals
11 items in the tree
- "" (0)
      +- "ghosts" (11)
      \- "p" (4)
              +- "aranormals" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---

I haven't finished testing the class, but just in case, I'm sending a backup of the tree class to the /downloads directory. Its name is searchtree.zip.

takeshimiya

  • Guest
Re: Progress with SearchTree class
« Reply #2 on: December 20, 2005, 08:47:37 am »
I can guess this class is for the CodeCompletion plugin?

Offline Urxae

  • Regular
  • ***
  • Posts: 376
Re: Progress with SearchTree class
« Reply #3 on: December 20, 2005, 09:29:39 am »
Good news people! It seems I have fixed all bugs in the tree's adding of nodes.

Bad news, I think I found a bug just from those diagrams :(.

Look at the cases where a word is added that starts with another word (psychiatrist/psychiatrists, paranormal/paranormals).

Code
Adding word: psychiatrist
6 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrist" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)
---
Adding word: psychiatrists
7 items in the tree
- "" (0)
      \- "p" (4)
              +- "aranormal" (8)
              +- "hysi" (2)
              |          +- "cs" (1)
              |          \- "ology" (3)
              \- "sych" (6)
                         +- "i" (9)
                         |       +- "atrists" (10)
                         |       \- "cs" (5)
                         \- "otic" (7)

Since the node numbers don't change I presume this is an accurate representation of the internals of the data structure, so where did the node for "psychiatrist" go? Shouldn't there be an extra ending node there?

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #4 on: December 20, 2005, 04:55:48 pm »
Yes, it's by design. Edges (words) are only split when needed, to. Notice that there is no node with only one child. This is what allows the path compression and saves space in memory :)

Each node has a list of the items and their corresponding depths. In this case, "psychiatrists" has an item with depth 12 for "psychiatrist", and another with depth 13 for "psychiatrists". When a node is split, the items are recalculated so they belong to the right node (at least in theory, i haven't checked if that one works :P )

The next step is implementing the part that recalls the items from the tree. Eeach item has a value, so it's an associative array with strings for keys. When I implement that, my tree class will be complete, and i'll begin doing the adaptation to hold the class completion tokens :)
« Last Edit: December 20, 2005, 05:02:13 pm by rickg22 »

Offline Urxae

  • Regular
  • ***
  • Posts: 376
Re: Progress with SearchTree class
« Reply #5 on: December 20, 2005, 05:26:47 pm »
Yes, it's by design. Edges (words) are only split when needed, to. Notice that there is no node with only one child. This is what allows the path compression and saves space in memory :)

Each node has a list of the items and their corresponding depths. In this case, "psychiatrists" has an item with depth 12 for "psychiatrist", and another with depth 13 for "psychiatrists". When a node is split, the items are recalculated so they belong to the right node (at least in theory, i haven't checked if that one works :P )

Ah ok then, but that wasn't shown in the tree. Maybe you could append something like "[12, 13]" to the tree output for those nodes to show that?
Well it doesn't really matter as it's debug output, but that way you could check if it works at least.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #6 on: December 20, 2005, 06:47:28 pm »
I'll debug in an easier way: associating the strings with themselves. If a string doesn't match, then something is wrong. :)

anonuser

  • Guest
Re: Progress with SearchTree class
« Reply #7 on: December 20, 2005, 07:28:21 pm »
Neat searching. Have you tested searching?

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #8 on: December 21, 2005, 03:15:04 am »
Adding a node implies searching for it, so yes, it's definitely tested ;-)
In fact, I realized some of the inserts weren't working right because there was a fault in the search.

anonuser

  • Guest
Re: Progress with SearchTree class
« Reply #9 on: December 21, 2005, 04:36:03 am »
Very neat. This is for the code completion right?

Offline Game_Ender

  • Lives here!
  • ****
  • Posts: 551
Re: Progress with SearchTree class
« Reply #10 on: December 21, 2005, 04:37:53 am »
So this is a kind of fast, searchable Abstract Syntax Tree (AST)?  Do you have your parser generate an AST and then build a SearchTree instance in order to the code completion from?

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #11 on: December 21, 2005, 04:46:36 am »
No, it's not for an abstract syntax tree ( http://en.wikipedia.org/wiki/Abstract_syntax_tree ), it will only hold the token names. Actually, I just wanted to optimize the addition of tokens :lol:

But this tree will hopefully be used in code completion, to search for all the tokens with a certain prefix.


Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #12 on: December 21, 2005, 06:58:00 am »
Here's the results, part deux ;-)

I added some items, and then queried the contents of the tree.

The following " => " sequences are the results of the queries.
Code
'physics' => ' 1 - uno'
'physiology' => ' 2 - dos'
'psychics' => ' 3 - tres'
'psychotic' => ' 4 - cuatro'
'paranormal' => ' 5 - cinco'
'psychiatrist' => ' 6 - seis'
'psychiatrists' => ' 7 - siete'
'psychi' => ' 8 - ocho'
'ghosts' => ' 9 - nueve'
'paranormals' => '10 - diez'
'blah' => ''
Adding item: blah => bleh
'blah' => 'bleh'
'' => 'this is the root node :P' (I did this to test if i
could add a value to the empty string)
'psychic' => '' (Notice that here even when 'psychic'
is contained in the tree, it doesn't have a corresponding  item)

Clearing tree...

'' => 'this is the root node :)'
(i assigned this after clearing the tree)
'physics' => ''
'physiology' => ''
'psychics' => ''
'psychotic' => ''
'paranormal' => ''
'psychiatrist' => ''
'psychiatrists' => ''
'psychi' => ''
'ghosts' => ''
'paranormals' => ''
'blah' => ''
1 items in the tree
- "" (0)
---

It seems to work, alright. I'll upload the zip now. I think I'm ready now to start working on the codecompletion improvements :)

Offline TDragon

  • Lives here!
  • ****
  • Posts: 943
    • TDM-GCC
Re: Progress with SearchTree class
« Reply #13 on: December 21, 2005, 04:42:04 pm »
In case anyone's interested, the name for this type of structure is a "trie" (an odd abbreviation of "retrieval"). I had built one a few days ago for this same purpose, though without serialization functionality. Hmm...maybe I'll templatize it...

If I'm seeming to hijack all the threads in this forum, it's just your imagination.
https://jmeubank.github.io/tdm-gcc/ - TDM-GCC compiler suite for Windows (GCC 9.2.0 2020-03-08, 32/64-bit, no extra DLLs)

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Re: Progress with SearchTree class
« Reply #14 on: December 24, 2005, 07:17:44 am »
Good news everybody! I'm testing Code::Blocks with the NEW Search Tree! :)

In my dad's machine (I'm on vacations), parsing time (for codeblocks-newbuild) goes down from 4.75 to 1.895 seconds, that's 2.5 times faster! :D

So far, everything seems stable, but I'm having a little problem. Whether I use the new class or not, enabling "follow global includes" seems to shut down Codeblocks in the middle of the parsing. Has it happened to any of you? (Windows, ANSI)