Author Topic: improvement on the SearchTree<T>::AddItem function  (Read 8389 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
improvement on the SearchTree<T>::AddItem function
« on: July 10, 2009, 02:28:01 pm »
Hi, all.
I'm examining the code in searchtree.h. In these code, I set a breakpoint in the line

"m_Items.resize(itemno);", but I found this breakpoint never be hit.

So, I suspect we can remove this if clause?
or ,we can move these clause to the last comparison.

By the way, this vector, I mean vector<TokenIdxSet> in the Token search Tree



will be grown from 1 -> 2 -> 4-> 8->16->32......

can we find a better way, for example, manually set it's capability like resize(1000) to avoid memory allocation?

Any comments?

Code
template <class T>
size_t SearchTree<T>::AddItem(const wxString& s,T item,bool replaceexisting)
{
    size_t itemno = insert(s);
    if(itemno > m_Items.size())
        m_Items.resize(itemno);   //*************set breakpoint here!!
    else if(itemno == m_Items.size())
        m_Items.push_back(item);
    else if(replaceexisting)
        m_Items[itemno] = item;
    return itemno;
}

Edit


since these code was like:
Code
if
else if
else if
//Where is the last "else" clause???
:D
« Last Edit: July 10, 2009, 02:33:03 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: improvement on the SearchTree<T>::AddItem function
« Reply #1 on: July 18, 2009, 09:17:59 am »
Here is another suggestion of the search_tree sturcture.
Maybe, we can use the patricia tree from the standard stl library.
see here:
http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

And here is one example code:
http://gcc.gnu.org/viewcvs/*checkout*/trunk/libstdc%2B%2B-v3/testsuite/ext/pb_ds/example/trie_prefix_search.cc

Code
/**
 * This example shows how to use a PATRICIA trie for searching
 * for words with a given prefix.
 */

#include <cassert>
#include <iostream>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

using namespace std;
using namespace __gnu_pbds;

// A PATRICIA trie with a prefix-search node-updator type. Note that
// since the node updator is trie_prefix_search_node_update, then the
// container includes its method prefix_range.
typedef null_mapped_type mapped_type;
typedef string_trie_e_access_traits<> cmp_fn;
typedef pat_trie_tag tag_type;

typedef trie<string, mapped_type, cmp_fn, tag_type,
     trie_prefix_search_node_update> trie_type;

// The following helper function takes a trie object and r_key, a
// const reference to a string, and prints all entries whose key
// includes r_key as a prefix.
void
print_prefix_match(const trie_type& t, const string& key)
{
  typedef trie_type::const_iterator const_iterator;
  typedef pair<const_iterator, const_iterator> pair_type;

  cout << "All keys whose prefix matches \"" << key << "\":" << endl;

  const pair_type match_range = t.prefix_range(key);
  for (const_iterator it = match_range.first; it != match_range.second; ++it)
    cout << *it << ' ';
  cout << endl;
}

int main()
{
  trie_type t;

  // Insert some entries.
  assert(t.insert("I").second == true);
  assert(t.insert("wish").second == true);
  assert(t.insert("that").second == true);
  assert(t.insert("I").second == false);
  assert(t.insert("could").second == true);
  assert(t.insert("ever").second == true);
  assert(t.insert("see").second == true);
  assert(t.insert("a").second == true);
  assert(t.insert("poem").second == true);
  assert(t.insert("lovely").second == true);
  assert(t.insert("as").second == true);
  assert(t.insert("a").second == false);
  assert(t.insert("trie").second == true);

  // Now search for prefixes.
  print_prefix_match(t, "");
  print_prefix_match(t, "a");
  print_prefix_match(t, "as");
  print_prefix_match(t, "ad");
  print_prefix_match(t, "t");
  print_prefix_match(t, "tr");
  print_prefix_match(t, "trie");
  print_prefix_match(t, "zzz");

  return 0;
}
result is like below:

Code
All keys whose prefix matches "":
I a as could ever lovely poem see that trie wish
All keys whose prefix matches "a":
a as
All keys whose prefix matches "as":
as
All keys whose prefix matches "ad":

All keys whose prefix matches "t":
that trie
All keys whose prefix matches "tr":
trie
All keys whose prefix matches "trie":
trie
All keys whose prefix matches "zzz":


Process returned 0 (0x0)   execution time : 0.015 s
Press any key to continue.


« Last Edit: July 18, 2009, 09:24:48 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: improvement on the SearchTree<T>::AddItem function
« Reply #2 on: August 12, 2009, 05:20:17 pm »
Continue to my own discussion:

For a trie implementation now in Code completion plugin, the search tree achitecture was designed by Rick22, and it is mainly something like:

Code
std::map<wxString, T>

To record many Token with the same name, we usually use a TokenIndex structure.(which is a map<Index to Token>).

Here is the belief SearchTree structure steps:

1, we know that a Token is allocated from a memory pool, and its address was recorded in a vector<Token*> structure.

2, From the step 1, we can refer to any Token by a index in the vector<Token*>

3, For some Tokens which were different types but have the "same" name, such as:

Code
int abc;
int abc();
class abc{
   XXXXX
}
All of the three Tokens above have its name "abc", so, we use a set<index> which contains all the Token index share the same Token names.

4, Our search tree is just something like  map<wxString, set<index>>, which the key is a wxString. So, actually, we choose a memory efficient data structure called trie, it is like trie<wxString, set<index>

5, given the trie, we can query any Tokens in this trie by its name, or by a pre-fix names.


My question is:

Why do we need a vector<Token*>, is it necessary?

I think we can just skip the step 1,2,3, and directly use trie<wxString, set<Token*>>

Any comments about my suggestion? Thanks. :D








« Last Edit: August 12, 2009, 05:25:12 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.