Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign

patch : from std::map replace to hashmap

<< < (6/9) > >>

Loaden:

--- Quote from: ollydbg on March 22, 2010, 09:15:58 am ---
--- Quote from: koso on March 22, 2010, 09:08:49 am ---I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang  gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

--- End quote ---
en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.

--- End quote ---

--- Code: ---#if __GNUC_MINOR__ < 4 || (__GNUC_MINOR__ >= 4 && __GNUC_PATCHLEVEL__ < 3)
    typedef std::map<wxString, wxString> wxStringHashMap;
#else
    typedef std::unordered_map<wxString, wxString> wxStringHashMap;
#endif
--- End code ---

oBFusCATed:
unordered_map is available on every gcc >= 4.0 (I think), 4.1.2 have it for sure!

koso:
Littlee modification. This version is slower, but should be better when searching mostly for not mapped strings. This version has lower probability for collision -> but it needs to be tweaked using full "key" set. Each mapped key should have different hash value.


--- Code: ---namespace std
{
template<> struct hash< wxString > {
    size_t operator()( const wxString& x ) const {
unsigned long len  = x.Len();
size_t __h  = len;

const wxChar* p = x.c_str();

for (unsigned long i = 1; i < len; i+=6) {
  __h  ^= p[i];
  __h <<= 4;
}
        return __h;
    }
};
}
--- End code ---

PS: before cycle can be added this code:


--- Code: ---if ( p[0]!=wxChar('_') ) {
  return size_t(128 ^ __h);
}
--- End code ---

this will rapidly decrease search time for random strings, BUT only if there are no mapped strings without "_" at the begginning. This conforms your test example, but probably wont real usage. 

ollydbg:
@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:


--- Code: ---    if (c == '_' || wxIsalpha(c))
    {
        // keywords, identifiers, etc.

        // operator== is cheaper than wxIsalnum, also MoveToNextChar already includes IsEOF
        while (    ( (c == '_') || (wxIsalnum(c)) )
               &&  MoveToNextChar() )
            c = CurrentChar(); // repeat

        if (IsEOF())
            return wxEmptyString;

        needReplace = true;
        str = m_Buffer.Mid(start, m_TokenIndex - start);
    }
--- End code ---

koso:

--- Quote from: ollydbg on March 22, 2010, 12:30:20 pm ---@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

--- End quote ---

So if I understand it good, you are checking all words in source code (which can potentially be macro) for replacement -> so there will be many checks, but only small amount of them will lead to success.

More interesting question is, it the list/set of macros beeing replaced is constant, or it is dependent on user config? (Is this related to CC  plugin configuration, where user can add custom replacement rules?). First case would simplify many things, but second will make it little complicated => you won`t be able to add there some "super-fast" filter functions, which will eliminate most false checks. (example of this is check for "_" at the begining -> it is very fast, but once user can add custom macros, it will be probably even slower than without it). 

Little theory:

1. searching in <map> is based on string comparision in R-B trees. Count of operation depends on height of tree, which will be soomething like log2 (number of macros) - in your case not more than 5 comparisions  of wxString.

2. <unordered_map> with theoreticaly good hash function will run hasher function, and then 0 or 1 string comparission.

So if hash function is faster than aprox. 3 or 4 wxString comparision, you should get better results with hashing. This looks like no problem, but effectivity of string comparision is depedent on string length -> if you are checking many short words,  it will be realy hard to design better hash function. And also, functions presented in this topic where more faster than good hashing function, so <unordered_map> will have to use more comparision operations...

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version