Author Topic: patch : from std::map replace to hashmap (Read 45837 times)

Loaden · « **on:** March 21, 2010, 01:21:02 pm »

Hello everyone, I created a patch: it can enable CC faster.
Before, CC macro replaced rules use std::map<wxString, wxString>, Like this:

Code

static ConfigManagerContainer::StringToStringMap s_Replacements;

Now, I change it to :

Code

#include <wx/hashmap.h>
WX_DECLARE_STRING_HASH_MAP(wxString, wxStringHashMap);
...
static wxStringHashMap s_HashReplacements;

This is a test code, the compiler is VC10.

Code

#include <windows.h>
#include <winbase.h>
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <set>
#include <list>

using namespace std;

int main()
{
    cout << "Hello world!" << endl;
    return 0;
}

This is a test result：

Code

9 repl rules:

std::map<wxString, wxString> :
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.719 seconds).
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.625 seconds).
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.656 seconds).
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.609 seconds).
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.906 seconds).
Parsing stage done (165 total parsed files, 47380 tokens in 0 minute(s), 4.656 seconds).
Total Time: 28.171
Average: 4.70

wxStringHashMap :
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.609 seconds).
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.500 seconds).
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.500 seconds).
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.547 seconds).
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.500 seconds).
Parsing stage done (165 total parsed files, 47320 tokens in 0 minute(s), 4.516 seconds).
Total Time: 27.172
Average: 4.53
Rate: (4.70 - 4.53) / 4.70 = 3.6%


18 repl rules:

std::map<wxString, wxString> :
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.47 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.46 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.62 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.63 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.62 seconds).
Total Time: 22.8
Average: 4.53

wxStringHashMap :
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.922 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.938 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.937 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.938 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.953 seconds).
Total Time: 19.688
Average: 3.9376
Rate: (4.53 - 3.9376) / 4.53 = 13.1%

MODIFY：Rename static member variable s_HashReplacements TO s_Replacements

[attachment deleted by admin]

ollydbg · « **Reply #1 on:** March 21, 2010, 03:46:39 pm »

In fact, we have discussed this change from the stand-map to hash-map in this forum several months ago.
As far As I can remember, oBFusCATed has suggest using a hashmap.

Today, you have implement it, so I really appreciate your effort and contribution. I have downloaded and test it in my local, it did work fine.

I hope using the hashmap can boost the performance, as loaden said, it can improve about 13.1%.

Jenna · « **Reply #2 on:** March 21, 2010, 03:56:38 pm »

Here are some test on linux:

I tested it with C::B build with automake-system and inside C::B, I used Loaden's sample-code and the linux 2.6.29 kernel-sources to test.
There is a big difference between automake and C::B-build code, but not between std::map and wxHashmap, automake uses -O2 if I remember right (in fact in some cases the patched C::B was slower, but I think that is statistically noise).

Code

kernel:

build with automake (trunk)

Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 14.438 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 15.590 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 11.838 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 17.893 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 14.574 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 13.328 seconds).
                                                      average: 1 minute     14.610 seconds

build with automake (patched)

Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 13.465 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 13.550 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 16.289 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 16.961 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 14.989 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 1 minute(s), 18.696 seconds).
                                                      average: 1 minute     15.658 seconds

                                                     
build inside C::B (trunk)

Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 28.661 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 28.677 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 27.898 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 29.919 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 25.487 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 26.787 seconds).
                                                      average: 2 minutes    27.905 seconds
build inside C::B (patched)

Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 26.98 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 26.431 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 23.28 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 22.661 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 25.386 seconds).
Parsing stage done (22398 total parsed files, 905754 tokens in 2 minute(s), 22.730 seconds).
                                                      average: 2 minutes    24.578 seconds
	

sample-code:

build with automake (trunk)

Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.395 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.330 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.370 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.342 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.327 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.365 seconds).
                                                               average: 0.355 seconds

build with automake (patched)

Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.379 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.330 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.358 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.338 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.346 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 0.327 seconds).
                                                               average: 0.346 seconds

build inside C::B (trunk)

Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.195 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.135 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.135 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.153 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.153 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.133 seconds).
                                                               average: 1,151 seconds

build inside C::B (patched)

Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.117 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.75 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.51 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.73 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.74 seconds).
Parsing stage done (52 total parsed files, 27554 tokens in 0 minute(s), 1.71 seconds).
                                                               average: 1.593 seconds

ollydbg · « **Reply #3 on:** March 21, 2010, 04:05:32 pm »

@jens, here is Loaden's test result he send to me by email.

Code

18repl:

STD:
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.47 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.46 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.62 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.63 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 4.62 seconds).
Total: 22.8
AVG: 4.53

HASH:
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.922 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.938 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.937 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.938 seconds).
Parsing stage done (155 total parsed files, 41958 tokens in 0 minute(s), 3.953 seconds).
Total: 19.688
Avg: 3.9376
(4.53 - 3.9376) / 4.53 = 13.1%

He use 18 replacement rules.
I personally think that the more the replacement rules we use, the more benefit we can get by using hashmap.

You can compare with codelite's replacement rule:
Re: vector<int> is OK, but string or wstring no-work.
There are a lot of replacement rules.

Loaden · « **Reply #4 on:** March 21, 2010, 04:25:46 pm »

@ jens , Can you test this demo in linux?

Code

#include <map>
#include <wx/hashmap.h>
#include <wx/string.h>

WX_DECLARE_STRING_HASH_MAP(wxString, wxHashString);

int main()
{
    wxHashString hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    
    hashMap[_T("_CXX_STD")] = _T("std");
    hashMap[_T("_CXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_CXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_CXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_CXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_CXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_CXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_CXX_STD2")] = _T("2std");
    hashMap[_T("_CXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_CXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_CXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_CXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_CXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_CXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    
    stdMap[_T("_CXX_STD")] = _T("std");
    stdMap[_T("_CXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_CXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_CXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_CXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_CXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_CXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_CXX_STD2")] = _T("2std");
    stdMap[_T("_CXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_CXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_CXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_CXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_CXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_CXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    for (int i = 0; i < 20000000; ++i)
    {
        hashMap.find(_T("testfindkey")); // you can switch to stdMap for test
    }

    return 0;
}

Thanks!
The attachments is a demo on win32.

[attachment deleted by admin]

Jenna · « **Reply #5 on:** March 21, 2010, 08:18:43 pm »

Quote from: Loaden on March 21, 2010, 04:25:46 pm

@ jens , Can you test this demo in linux?

I changed the code a little bit to get some more output on the console:

Code

#include <iostream>
#include <map>
#include <wx/hashmap.h>
#include <wx/string.h>
#include <wx/stopwatch.h>

WX_DECLARE_STRING_HASH_MAP(wxString, wxHashString);

int main()
{
    wxHashString hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    hashMap[_T("_CXX_STD")] = _T("std");
    hashMap[_T("_CXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_CXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_CXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_CXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_CXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_CXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_CXX_STD2")] = _T("2std");
    hashMap[_T("_CXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_CXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_CXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_CXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_CXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_CXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    stdMap[_T("_CXX_STD")] = _T("std");
    stdMap[_T("_CXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_CXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_CXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_CXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_CXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_CXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_CXX_STD2")] = _T("2std");
    stdMap[_T("_CXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_CXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_CXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_CXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_CXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_CXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");

    int stdMapTime=0;
    int hashMapTime=0;

    wxStopWatch sw;
    for(int j = 0; j < 10 ; j++)
    {
        sw.Start();
        for (int i = 0; i < 20000000; ++i)
        {
            stdMap.find(_T("testfindkey")); // you can switch to stdMap for test
        }
        stdMapTime+=sw.Time();
        std::cout << "stdMap took " << sw.Time() << " ms" << std::endl;

        sw.Start();
        for (int i = 0; i < 20000000; ++i)
        {
            hashMap.find(_T("testfindkey")); // you can switch to stdMap for test
        }

        hashMapTime+=sw.Time();
        std::cout << "hashMap took " << sw.Time() << " ms" << std::endl;
    }
    std::cout << "average (stdMap): " << stdMapTime / 10 << " ms" << std::endl;
    std::cout << "average (hashMap): " << hashMapTime / 10 << " ms" << std::endl;
    std::cout << "100 * (" << hashMapTime << "-" <<  stdMapTime << ") / " << hashMapTime <<  " = : " << 100 * (hashMapTime-stdMapTime) / hashMapTime << " %" << std::endl;

    return 0;
}

Here is the output from debug-version:

Code

stdMap took 6627 ms
hashMap took 5962 ms
stdMap took 6636 ms
hashMap took 5988 ms
stdMap took 6640 ms
hashMap took 6019 ms
stdMap took 6680 ms
hashMap took 6017 ms
stdMap took 6677 ms
hashMap took 6119 ms
stdMap took 6722 ms
hashMap took 6010 ms
stdMap took 6664 ms
hashMap took 6062 ms
stdMap took 6793 ms
hashMap took 6044 ms
stdMap took 6671 ms
hashMap took 6032 ms
stdMap took 6683 ms
hashMap took 6035 ms
average (stdMap): 6679 ms
average (hashMap): 6028 ms
100 * (60288-66793) / 60288 = : -10 %

Process returned 0 (0x0)   execution time : 127.129 s
Press ENTER to continue.

and here the output from the release-version with -O2 optimization:

Code

stdMap took 2599 ms
hashMap took 3312 ms
stdMap took 2624 ms
hashMap took 3320 ms
stdMap took 2590 ms
hashMap took 3315 ms
stdMap took 2585 ms
hashMap took 3318 ms
stdMap took 2580 ms
hashMap took 3310 ms
stdMap took 2577 ms
hashMap took 3297 ms
stdMap took 2614 ms
hashMap took 3298 ms
stdMap took 2577 ms
hashMap took 3302 ms
stdMap took 2573 ms
hashMap took 3304 ms
stdMap took 2568 ms
hashMap took 3303 ms
average (stdMap): 2588 ms
average (hashMap): 3307 ms
100 * (33079-25887) / 33079 = : 21 %

Process returned 0 (0x0)   execution time : 59.013 s
Press ENTER to continue.

It seems that the STL-map can be better optimized (at least on linux), maybe we should think about a real release-version of C::B (for windows), not just a stripped version.

MortenMacFly · « **Reply #6 on:** March 21, 2010, 08:30:03 pm »

Quote from: jens on March 21, 2010, 08:18:43 pm

It seems that the STL-map can be better optimized (at least on linux), maybe we should think about a real release-version of C::B (for windows), not just a stripped version.

What do you mean: compiling C::B with optimization flags enabled?

Jenna · « **Reply #7 on:** March 21, 2010, 08:58:47 pm »

Quote from: MortenMacFly on March 21, 2010, 08:30:03 pm

Quote from: jens on March 21, 2010, 08:18:43 pm
It seems that the STL-map can be better optimized (at least on linux), maybe we should think about a real release-version of C::B (for windows), not just a stripped version.
What do you mean: compiling C::B with optimization flags enabled?

Yes, if automake-system is used, C::B is compiled with -O2 and batch-parsing is significantly faster than without optimization.

You can see it in my first post in this thread (http://forums.codeblocks.org/index.php/topic,12235.msg83070.html#msg83070), batch-parsing of linux kernel-code is about 50 % faster, if C::B is build with automake-system, for Loadens sample code the speedup is nearly 70 %.

oBFusCATed · « **Reply #8 on:** March 21, 2010, 09:53:24 pm »

Hah, you don't use optimizations on windows! :shock:

Keep in mind that gcc can generate wrong code in optimization mode lower that -O2 ... I've seen reports for -O0

Loaden · « **Reply #9 on:** March 22, 2010, 01:30:30 am »

Yes, I don't use -O2, because Code-Completion plugin not use -O2 too. :lol:
Here is another demo, I wonder is: gnu's hash_map is so efficient, even when use -O2.

Code

#include <iostream>
#include <map>
#include <string>

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/hashmap.h>
#include <wx/string.h>
#include <wx/timer.h>

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< std::string >
{
    size_t operator()( const std::string& x ) const
    {
        return hash< const char* >()( x.c_str() );
    }
};
}

//namespace __gnu_cxx
//{
//template<> struct hash< wxString >
//{
//    size_t operator()( const wxString& x ) const
//    {
//        return hash< const wxChar* >()( x.c_str() );
//    }
//};
//}

WX_DECLARE_STRING_HASH_MAP(wxString, wxHashString);

int main()
{
    wxHashString hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    hashMap[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
    hashMap[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");

    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMap[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
    stdMap[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");
    
    std::hash_map<std::string, std::string> hashMap2;
    hashMap2["_GLIBCXX_STD"] = "std";
    hashMap2["_GLIBCXX_BEGIN_NESTED_NAMESPACE"] = "+namespace";
    hashMap2["_GLIBCXX_BEGIN_NAMESPACE"] = "+namespace";
    hashMap2["_GLIBCXX_END_NESTED_NAMESPACE"] = "}";
    hashMap2["_GLIBCXX_END_NAMESPACE"] = "}";
    hashMap2["_GLIBCXX_END_NAMESPACE_TR1"] = "}";
    hashMap2["_GLIBCXX_BEGIN_NAMESPACE_TR1"] = "-namespace tr1 {";
    hashMap2["_GLIBCXX_STD2"] = "2std";
    hashMap2["_GLIBCXX_BEGIN_NESTED_NAMESPACE2"] = "2+namespace";
    hashMap2["_GLIBCXX_BEGIN_NAMESPACE2"] = "2+namespace";
    hashMap2["_GLIBCXX_END_NESTED_NAMESPACE2"] = "2}";
    hashMap2["_GLIBCXX_END_NAMESPACE2"] = "2}";
    hashMap2["_GLIBCXX_END_NAMESPACE_TR12"] = "2}";
    hashMap2["_GLIBCXX_BEGIN_NAMESPACE_TR12"] = "2-namespace tr1 {";
    hashMap2["_XXGLIBCXX_END_NAMESPACE_TR12"] = "X2}";
    hashMap2["_XXGLIBCXX_BEGIN_NAMESPACE_TR12"] = "X2-namespace tr1 {";
    
    std::map<std::string, std::string> stdMap2;
    stdMap2["_GLIBCXX_STD"] = "std";
    stdMap2["_GLIBCXX_BEGIN_NESTED_NAMESPACE"] = "+namespace";
    stdMap2["_GLIBCXX_BEGIN_NAMESPACE"] = "+namespace";
    stdMap2["_GLIBCXX_END_NESTED_NAMESPACE"] = "}";
    stdMap2["_GLIBCXX_END_NAMESPACE"] = "}";
    stdMap2["_GLIBCXX_END_NAMESPACE_TR1"] = "}";
    stdMap2["_GLIBCXX_BEGIN_NAMESPACE_TR1"] = "-namespace tr1 {";
    stdMap2["_GLIBCXX_STD2"] = "2std";
    stdMap2["_GLIBCXX_BEGIN_NESTED_NAMESPACE2"] = "2+namespace";
    stdMap2["_GLIBCXX_BEGIN_NAMESPACE2"] = "2+namespace";
    stdMap2["_GLIBCXX_END_NESTED_NAMESPACE2"] = "2}";
    stdMap2["_GLIBCXX_END_NAMESPACE2"] = "2}";
    stdMap2["_GLIBCXX_END_NAMESPACE_TR12"] = "2}";
    stdMap2["_GLIBCXX_BEGIN_NAMESPACE_TR12"] = "2-namespace tr1 {";
    stdMap2["_XXGLIBCXX_END_NAMESPACE_TR12"] = "X2}";
    stdMap2["_XXGLIBCXX_BEGIN_NAMESPACE_TR12"] = "X2-namespace tr1 {";

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdMap.find(_T("testfindkey"));
        std::cout << "stdMap " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) hashMap.find(_T("testfindkey"));
        std::cout << "hashMap " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }
    
    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdMap2.find("testfindkey");
        std::cout << "stdMap2 " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }
    
    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) hashMap2.find("testfindkey");
        std::cout << "hashMap2 " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    return 0;
}

Code

stdMap 1 : 2281
stdMap 2 : 2266
stdMap 3 : 2281
stdMap 4 : 2266
stdMap 5 : 2265

---------------

hashMap 1 : 2375
hashMap 2 : 2360
hashMap 3 : 2359
hashMap 4 : 2359
hashMap 5 : 2360

---------------

stdMap2 1 : 3578
stdMap2 2 : 3578
stdMap2 3 : 3594
stdMap2 4 : 3578
stdMap2 5 : 3594

---------------

hashMap2 1 : 2421
hashMap2 2 : 2422
hashMap2 3 : 2422
hashMap2 4 : 2422
hashMap2 5 : 2422

Process returned 0 (0x0)   execution time : 53.531 s
Press any key to continue.

However, I can NOT compiled std::hash_map<wxString, wxString>, who knows how to solution?
Thanks!

[attachment deleted by admin]

Loaden · « **Reply #10 on:** March 22, 2010, 01:55:27 am »

If use -O0 like Code-Completion plugins, here are some test data:

Code

stdMap 1 : 4282
stdMap 2 : 4265
stdMap 3 : 4281
stdMap 4 : 4266
stdMap 5 : 4281

---------------

hashMap 1 : 2969
hashMap 2 : 2969
hashMap 3 : 2969
hashMap 4 : 2984
hashMap 5 : 2969

---------------

stdMap2 1 : 5234
stdMap2 2 : 5234
stdMap2 3 : 5235
stdMap2 4 : 5234
stdMap2 5 : 5250

---------------

hashMap2 1 : 3266
hashMap2 2 : 3265
hashMap2 3 : 3266
hashMap2 4 : 3266
hashMap2 5 : 3265

Process returned 0 (0x0)   execution time : 78.906 s
Press any key to continue.

ollydbg · « **Reply #11 on:** March 22, 2010, 02:19:16 am »

Quote from: Loaden on March 22, 2010, 01:55:27 am

If use -O0 like Code-Completion plugins, here are some test data:

Code

stdMap 1 : 4282
stdMap 2 : 4265
stdMap 3 : 4281
stdMap 4 : 4266
stdMap 5 : 4281

---------------

hashMap 1 : 2969
hashMap 2 : 2969
hashMap 3 : 2969
hashMap 4 : 2984
hashMap 5 : 2969

---------------

stdMap2 1 : 5234
stdMap2 2 : 5234
stdMap2 3 : 5235
stdMap2 4 : 5234
stdMap2 5 : 5250

---------------

hashMap2 1 : 3266
hashMap2 2 : 3265
hashMap2 3 : 3266
hashMap2 4 : 3266
hashMap2 5 : 3265

Process returned 0 (0x0)   execution time : 78.906 s
Press any key to continue.

so, the xxx2 version is ANSI version(std::string) and the other is UNICODE version(wxString).

For std-map, UNICODE is better.
For hash-map, ANSI is better.

Oh, wait, I think you don't defined the UNICODE macros, so all the string type is ANSI.

Loaden · « **Reply #12 on:** March 22, 2010, 02:23:53 am »

If you use only seven replacement rules:
No Optimization (-O0)：

Code

stdMap 1 : 3671
stdMap 2 : 3657
stdMap 3 : 3672
stdMap 4 : 3671
stdMap 5 : 3657

---------------

hashMap 1 : 2953
hashMap 2 : 2953
hashMap 3 : 2953
hashMap 4 : 2953
hashMap 5 : 2938

---------------

stdMap2 1 : 4234
stdMap2 2 : 4234
stdMap2 3 : 4235
stdMap2 4 : 4234
stdMap2 5 : 4235

---------------

hashMap2 1 : 3156
hashMap2 2 : 3172
hashMap2 3 : 3172
hashMap2 4 : 3171
hashMap2 5 : 3157

Process returned 0 (0x0)   execution time : 70.281 s
Press any key to continue.

Optimization (-O2):

Code

stdMap 1 : 2156
stdMap 2 : 2156
stdMap 3 : 2156
stdMap 4 : 2156
stdMap 5 : 2157

---------------

hashMap 1 : 2296
hashMap 2 : 2297
hashMap 3 : 2297
hashMap 4 : 2281
hashMap 5 : 2297

---------------

stdMap2 1 : 3047
stdMap2 2 : 3047
stdMap2 3 : 3047
stdMap2 4 : 3031
stdMap2 5 : 3047

---------------

hashMap2 1 : 2391
hashMap2 2 : 2406
hashMap2 3 : 2406
hashMap2 4 : 2391
hashMap2 5 : 2406

Process returned 0 (0x0)   execution time : 49.734 s
Press any key to continue.

ollydbg · « **Reply #13 on:** March 22, 2010, 02:53:02 am »

I suggest changing the variable names. see the code below:

Code

#include <iostream>
#include <map>
#include <string>

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/hashmap.h>
#include <wx/string.h>
#include <wx/timer.h>

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< std::string >
{
    size_t operator()( const std::string& x ) const
    {
        return hash< const char* >()( x.c_str() );
    }
};
}

//namespace __gnu_cxx
//{
//template<> struct hash< wxString >
//{
//    size_t operator()( const wxString& x ) const
//    {
//        return hash< const wxChar* >()( x.c_str() );
//    }
//};
//}

WX_DECLARE_STRING_HASH_MAP(wxString, wxHashString);

int main()
{
    wxHashString wxHashMapwxString;
    wxHashMapwxString[_T("_GLIBCXX_STD")] = _T("std");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    wxHashMapwxString[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    wxHashMapwxString[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    wxHashMapwxString[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    wxHashMapwxString[_T("_GLIBCXX_STD2")] = _T("2std");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    wxHashMapwxString[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    wxHashMapwxString[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    wxHashMapwxString[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    wxHashMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    wxHashMapwxString[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
    wxHashMapwxString[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");

    std::map<wxString, wxString> stdMapwxString;
    stdMapwxString[_T("_GLIBCXX_STD")] = _T("std");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMapwxString[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMapwxString[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMapwxString[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMapwxString[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMapwxString[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMapwxString[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMapwxString[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMapwxString[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMapwxString[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
    stdMapwxString[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");

    std::hash_map<std::string, std::string> stdHashMapstdString;
    stdHashMapstdString["_GLIBCXX_STD"] = "std";
    stdHashMapstdString["_GLIBCXX_BEGIN_NESTED_NAMESPACE"] = "+namespace";
    stdHashMapstdString["_GLIBCXX_BEGIN_NAMESPACE"] = "+namespace";
    stdHashMapstdString["_GLIBCXX_END_NESTED_NAMESPACE"] = "}";
    stdHashMapstdString["_GLIBCXX_END_NAMESPACE"] = "}";
    stdHashMapstdString["_GLIBCXX_END_NAMESPACE_TR1"] = "}";
    stdHashMapstdString["_GLIBCXX_BEGIN_NAMESPACE_TR1"] = "-namespace tr1 {";
    stdHashMapstdString["_GLIBCXX_STD2"] = "2std";
    stdHashMapstdString["_GLIBCXX_BEGIN_NESTED_NAMESPACE2"] = "2+namespace";
    stdHashMapstdString["_GLIBCXX_BEGIN_NAMESPACE2"] = "2+namespace";
    stdHashMapstdString["_GLIBCXX_END_NESTED_NAMESPACE2"] = "2}";
    stdHashMapstdString["_GLIBCXX_END_NAMESPACE2"] = "2}";
    stdHashMapstdString["_GLIBCXX_END_NAMESPACE_TR12"] = "2}";
    stdHashMapstdString["_GLIBCXX_BEGIN_NAMESPACE_TR12"] = "2-namespace tr1 {";
    stdHashMapstdString["_XXGLIBCXX_END_NAMESPACE_TR12"] = "X2}";
    stdHashMapstdString["_XXGLIBCXX_BEGIN_NAMESPACE_TR12"] = "X2-namespace tr1 {";

    std::map<std::string, std::string> stdMapstdString;
    stdMapstdString["_GLIBCXX_STD"] = "std";
    stdMapstdString["_GLIBCXX_BEGIN_NESTED_NAMESPACE"] = "+namespace";
    stdMapstdString["_GLIBCXX_BEGIN_NAMESPACE"] = "+namespace";
    stdMapstdString["_GLIBCXX_END_NESTED_NAMESPACE"] = "}";
    stdMapstdString["_GLIBCXX_END_NAMESPACE"] = "}";
    stdMapstdString["_GLIBCXX_END_NAMESPACE_TR1"] = "}";
    stdMapstdString["_GLIBCXX_BEGIN_NAMESPACE_TR1"] = "-namespace tr1 {";
    stdMapstdString["_GLIBCXX_STD2"] = "2std";
    stdMapstdString["_GLIBCXX_BEGIN_NESTED_NAMESPACE2"] = "2+namespace";
    stdMapstdString["_GLIBCXX_BEGIN_NAMESPACE2"] = "2+namespace";
    stdMapstdString["_GLIBCXX_END_NESTED_NAMESPACE2"] = "2}";
    stdMapstdString["_GLIBCXX_END_NAMESPACE2"] = "2}";
    stdMapstdString["_GLIBCXX_END_NAMESPACE_TR12"] = "2}";
    stdMapstdString["_GLIBCXX_BEGIN_NAMESPACE_TR12"] = "2-namespace tr1 {";
    stdMapstdString["_XXGLIBCXX_END_NAMESPACE_TR12"] = "X2}";
    stdMapstdString["_XXGLIBCXX_BEGIN_NAMESPACE_TR12"] = "X2-namespace tr1 {";

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdMapwxString.find(_T("testfindkey"));
        std::cout << "stdMapwxString " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) wxHashMapwxString.find(_T("testfindkey"));
        std::cout << "wxHashMapwxString " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdMapstdString.find("testfindkey");
        std::cout << "stdMapstdString " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdHashMapstdString.find("testfindkey");
        std::cout << "stdHashMapstdString " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    return 0;
}

And the result is here, I built this project with -O2 :

Code

stdMapwxString 1 : 3094
stdMapwxString 2 : 3094
stdMapwxString 3 : 3094
stdMapwxString 4 : 3109
stdMapwxString 5 : 3110

---------------

wxHashMapwxString 1 : 3296
wxHashMapwxString 2 : 3313
wxHashMapwxString 3 : 3297
wxHashMapwxString 4 : 3312
wxHashMapwxString 5 : 3297

---------------

stdMapstdString 1 : 4719
stdMapstdString 2 : 4734
stdMapstdString 3 : 4766
stdMapstdString 4 : 4734
stdMapstdString 5 : 4735

---------------

stdHashMapstdString 1 : 3515
stdHashMapstdString 2 : 3500
stdHashMapstdString 3 : 3516
stdHashMapstdString 4 : 3516
stdHashMapstdString 5 : 3515

Process returned 0 (0x0)   execution time : 73.344 s
Press any key to continue.

So, you can see:

Quote

stdMapwxString 1 : 3094
stdMapwxString 2 : 3094
stdMapwxString 3 : 3094
stdMapwxString 4 : 3109
stdMapwxString 5 : 3110

---------------

wxHashMapwxString 1 : 3296
wxHashMapwxString 2 : 3313
wxHashMapwxString 3 : 3297
wxHashMapwxString 4 : 3312
wxHashMapwxString 5 : 3297

Optimized stdMap is better than wxHashMap.

Loaden · « **Reply #14 on:** March 22, 2010, 05:07:39 am »

Please look at GNU's hash_map test results. (-O2)

Code

#include <iostream>
#include <map>

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/hashmap.h>
#include <wx/string.h>
#include <wx/timer.h>

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        const wxChar* p = x.wx_str();
        size_t size = x.size();
        while (size) __h = 5 * __h + p[--size];
        return size_t(__h);
    }
};
}

WX_DECLARE_STRING_HASH_MAP(wxString, wxHashString);

int main()
{
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
//    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
//    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
//    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
//    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
//    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
//    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
//    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
//    hashMap[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
//    hashMap[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");

    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
//    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
//    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
//    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
//    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
//    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
//    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
//    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
//    stdMap[_T("_XXGLIBCXX_END_NAMESPACE_TR12")] = _T("X2}");
//    stdMap[_T("_XXGLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("X2-namespace tr1 {");

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) stdMap.find(_T("testfindkey"));
        std::cout << "stdMap " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    std::cout << "\n---------------\n" << std::endl;

    for (int i = 0; i < 5; ++i)
    {
        wxStartTimer();
        for (int j = 0; j < 10000000; ++j) hashMap.find(_T("testfindkey"));
        std::cout << "hashMap " << i + 1 << " : " << wxGetElapsedTime() << std::endl;
    }

    return 0;
}

Code

stdMap 1 : 2203
stdMap 2 : 2188
stdMap 3 : 2203
stdMap 4 : 2203
stdMap 5 : 2203

---------------

hashMap 1 : 2078
hashMap 2 : 2094
hashMap 3 : 2078
hashMap 4 : 2078
hashMap 5 : 2078

Process returned 0 (0x0)   execution time : 21.484 s
Press any key to continue.

If changed TO 16 replace rules ：

Code

stdMap 1 : 2250
stdMap 2 : 2250
stdMap 3 : 2250
stdMap 4 : 2250
stdMap 5 : 2265

---------------

hashMap 1 : 2031
hashMap 2 : 2047
hashMap 3 : 2047
hashMap 4 : 2031
hashMap 5 : 2032

Process returned 0 (0x0)   execution time : 21.672 s
Press any key to continue.

NO OPTI, 16 replacement: (-O0)

Code

stdMap 1 : 4281
stdMap 2 : 4282
stdMap 3 : 4281
stdMap 4 : 4281
stdMap 5 : 4297

---------------

hashMap 1 : 3203
hashMap 2 : 3203
hashMap 3 : 3219
hashMap 4 : 3203
hashMap 5 : 3203

Process returned 0 (0x0)   execution time : 37.609 s
Press any key to continue.

OPTI(-O2), 9 replacement:

Code

stdMap 1 : 2235
stdMap 2 : 2234
stdMap 3 : 2219
stdMap 4 : 2234
stdMap 5 : 2219

---------------

hashMap 1 : 2047
hashMap 2 : 2031
hashMap 3 : 2047
hashMap 4 : 2031
hashMap 5 : 2031

Process returned 0 (0x0)   execution time : 21.609 s
Press any key to continue.

ollydbg · « **Reply #15 on:** March 22, 2010, 05:36:29 am »

Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:

Code

stdMap 1 : 2953
stdMap 2 : 2953
stdMap 3 : 2953
stdMap 4 : 2969
stdMap 5 : 2953

---------------

hashMap 1 : 3000
hashMap 2 : 2968
hashMap 3 : 2969
hashMap 4 : 2969
hashMap 5 : 2969

with 16 map entries. the result is listed below:

Code

stdMap 1 : 3079
stdMap 2 : 3062
stdMap 3 : 3078
stdMap 4 : 3078
stdMap 5 : 3063

---------------

hashMap 1 : 2953
hashMap 2 : 2953
hashMap 3 : 2969
hashMap 4 : 2953
hashMap 5 : 2984

The conclusion is: We should use gnu hash wxString map.

Loaden · « **Reply #16 on:** March 22, 2010, 05:53:52 am »

Quote from: ollydbg on March 22, 2010, 05:36:29 am

Nice!
So, you have implemented a hash function for wxString when using gnu hash map?

Here is my test result, all the test was built in -O2 optimization.

Here is the result for 9 replacement rules:

Try change the hash function TO:

Code

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

Loaden · « **Reply #17 on:** March 22, 2010, 06:22:11 am »

Please test this demo, -O2:

Code

#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code

stdMap 1 : 3375
stdMap 2 : 3407
stdMap 3 : 3390
stdMap 4 : 3375
stdMap 5 : 3407

stdMap Total Time: 16954
stdMap Average Time: 3390
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.520649

Process returned 0 (0x0)   execution time : 25.219 s
Press any key to continue.

18 replacement:

Code

#include <iostream>
#include <map>

#define _BACKWARD_BACKWARD_WARNING_H 0

#ifdef __GNUC__
#include <ext/hash_map>
#else
#include <hash_map>
#endif

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
using namespace __gnu_cxx;
}

namespace __gnu_cxx
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    stdMap[_T("_STD_END2")] = _T("2}");
    
    std::hash_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    hashMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    hashMap[_T("_STD_END2")] = _T("2}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code

stdMap 1 : 4265
stdMap 2 : 4266
stdMap 3 : 4265
stdMap 4 : 4297
stdMap 5 : 4266

stdMap Total Time: 21359
stdMap Average Time: 4271
------------------------------

hashMap 1 : 1625
hashMap 2 : 1625
hashMap 3 : 1625
hashMap 4 : 1625
hashMap 5 : 1625

hashMap Total Time: 8125
hashMap Average Time: 1625
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.619527

Process returned 0 (0x0)   execution time : 29.609 s
Press any key to continue.

Jenna · « **Reply #18 on:** March 22, 2010, 07:08:53 am »

I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.

Loaden · « **Reply #19 on:** March 22, 2010, 07:36:15 am »

Quote from: jens on March 22, 2010, 07:08:53 am

I am (and was) aware of std::hash_map, but I did not use it, because it's declared as deprecated and just supressing the warnings is not a real option to me, because it might supress warnings about other deprecated headers too and what's more it might break build in newer versions of gcc.

OK, I think we can use unordered_map.

Code

This file includes at least one deprecated or antiquated header which \ may be removed without further notice at a future date. Please use a \ non-deprecated interface with equivalent functionality instead. For a \ listing of replacement headers and interfaces, consult the file \ backward_warning.h. To disable this warning use -Wno-deprecated. 


/* A list of valid replacements is as follows: 
Use: Instead of: 
<sstream>, basic_stringbuf <strstream>, strstreambuf
<sstream>, basic_istringstream <strstream>, istrstream
<sstream>, basic_ostringstream <strstream>, ostrstream
<sstream>, basic_stringstream <strstream>, strstream
<unordered_set>, unordered_set <ext/hash_set>, hash_set
<unordered_set>, unordered_multiset <ext/hash_set>, hash_multiset
<unordered_map>, unordered_map <ext/hash_set>, hash_map
<unordered_map>, unordered_multimap <ext/hash_set>, hash_multimap
<functional>, bind <functional>, binder1st
<functional>, bind <functional>, binder2nd
<functional>, bind <functional>, bind1st
<functional>, bind <functional>, bind2nd
<memory>, unique_ptr <memory>, auto_ptr
*/

Here is the demo, need -std=gnu++0x option

Code

#include <iostream>
#include <map>
#include <unordered_map>

#include <wx/string.h>
#include <wx/timer.h>

const size_t LOOPS = 5;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
        for (const wxChar* p = x.c_str(); *p; ++p)
            __h = 5 * __h + *p;
        return size_t(__h);
    }
};
}

int main()
{
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    
    std::unordered_map<wxString, wxString> hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(FINDKEY);
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;
    
    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code

stdMap 1 : 1828
stdMap 2 : 1828
stdMap 3 : 1812
stdMap 4 : 1829
stdMap 5 : 1812

stdMap Total Time: 9109
stdMap Average Time: 1821
------------------------------

hashMap 1 : 703
hashMap 2 : 703
hashMap 3 : 703
hashMap 4 : 688
hashMap 5 : 703

hashMap Total Time: 3500
hashMap Average Time: 700
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.615596

Process returned 0 (0x0)   execution time : 12.797 s
Press any key to continue.

ollydbg · « **Reply #20 on:** March 22, 2010, 08:38:30 am »

@loaden, I have tested and find that there is nearly about 60% speed improvement, really cool!!!!

Code

stdMap 1 : 2203
stdMap 2 : 2188
stdMap 3 : 2203
stdMap 4 : 2203
stdMap 5 : 2203

stdMap Total Time: 11000
stdMap Average Time: 2200
------------------------------

hashMap 1 : 860
hashMap 2 : 875
hashMap 3 : 890
hashMap 4 : 875
hashMap 5 : 891

hashMap Total Time: 4391
hashMap Average Time: 878
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.600909

Process returned 0 (0x0)   execution time : 15.500 s
Press any key to continue.

koso · « **Reply #21 on:** March 22, 2010, 09:08:49 am »

I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

Code

namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
	unsigned long len = __h  = 128 ^ x.Len();
	
	const wxChar* p = x.c_str();
	
	for (long i = 1; i < len; i+=5)
            __h = __h ^ p[i];
	
        return size_t(__h);
    }
};
}

ollydbg · « **Reply #22 on:** March 22, 2010, 09:15:58 am »

Quote from: koso on March 22, 2010, 09:08:49 am

I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.

Loaden · « **Reply #23 on:** March 22, 2010, 09:32:25 am »

Quote from: koso on March 22, 2010, 09:08:49 am

I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.

And btw. Loaden, your hash function is quiet complicated for this purpose, especially when list of possible values is so small. Try this:

Code
namespace std
{
template<> struct hash< wxString >
{
    size_t operator()( const wxString& x ) const
    {
        unsigned long __h = 0;
	unsigned long len = __h  = 128 ^ x.Len();
	
	const wxChar* p = x.c_str();
	
	for (long i = 1; i < len; i+=5)
            __h = __h ^ p[i];
	
        return size_t(__h);
    }
};
}

Thanks! What mean is unsigned long len = __h = 128 ^ x.Len();
This code will crash in my demo.
I changed to :

Code

namespace std
{
template<> struct hash<wxString>
{
    size_t operator()(const wxString& x) const
    {
        size_t len = x.length();
        unsigned long __h = 128 ^ len;
        const wxChar* p = x.c_str();
        for (size_t i = 0; i < len; i += 5)
            __h = __h ^ p[i];
        return size_t(__h);
    }
};
}

This is the lastest report:

Code

stdMap 1 : 1812
stdMap 2 : 1797
stdMap 3 : 1828
stdMap 4 : 1797
stdMap 5 : 1812

stdMap Total Time: 9046
stdMap Average Time: 1809
------------------------------

hashMap 1 : 469
hashMap 2 : 453
hashMap 3 : 469
hashMap 4 : 468
hashMap 5 : 454

hashMap Total Time: 2313
hashMap Average Time: 462
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.74461

koso · « **Reply #24 on:** March 22, 2010, 09:47:15 am »

Yes, there was typo

Btw. it can be even better, but you should analyze, how it iss mostly used in code:

1. when you test this code with some string, which is "mapped", std::map will be slower. But once you try it with some random string, std::map can be faster (depends how well is writen hash function) - So you should find out, how often is it searching for known string, and how often for some other...

2. hash function should be faster, that string comparision used in std::map, so for example, when you try to search for string XXX, wxString comparator will soon find out, that there is no string beggining with XXX and will end -> hash function should use the same system (i.e. check, if first char is "_" and if not, then add to h some penalty value and imediately end hashing)

Loaden · « **Reply #25 on:** March 22, 2010, 09:48:09 am »

Quote from: ollydbg on March 22, 2010, 09:15:58 am

Quote from: koso on March 22, 2010, 09:08:49 am
I have recently used unordered_map and it is very fast. There was only one problem .. it is available in newer comilators (VC2008 with installed Feature Pack ang gcc >4.3), so there must be some kind of fallback solution. In my case i used map, besace they both have compatible interfaces. So you will need only some "macro checks" for preprocessor.
en, if we use unordered_map in the CodeCompletion plugin, then Codeblocks can only be built from gcc > 4.3. This means we abandon several old gcc version.

Code

#if __GNUC_MINOR__ < 4 || (__GNUC_MINOR__ >= 4 && __GNUC_PATCHLEVEL__ < 3)
    typedef std::map<wxString, wxString> wxStringHashMap;
#else
    typedef std::unordered_map<wxString, wxString> wxStringHashMap;
#endif

oBFusCATed · « **Reply #26 on:** March 22, 2010, 10:18:32 am »

unordered_map is available on every gcc >= 4.0 (I think), 4.1.2 have it for sure!

koso · « **Reply #27 on:** March 22, 2010, 11:53:47 am »

Littlee modification. This version is slower, but should be better when searching mostly for not mapped strings. This version has lower probability for collision -> but it needs to be tweaked using full "key" set. Each mapped key should have different hash value.

Code

namespace std
{
template<> struct hash< wxString > {
    size_t operator()( const wxString& x ) const {
	unsigned long len  = x.Len();
	size_t __h  = len;
	
	const wxChar* p = x.c_str();
	
	for (unsigned long i = 1; i < len; i+=6) {
	  __h  ^= p[i];
	  __h <<= 4;
	}
        return __h;
    }
};
}

PS: before cycle can be added this code:

Code

if ( p[0]!=wxChar('_') ) {
	  return size_t(128 ^ __h);
	}

this will rapidly decrease search time for random strings, BUT only if there are no mapped strings without "_" at the begginning. This conforms your test example, but probably wont real usage.

ollydbg · « **Reply #28 on:** March 22, 2010, 12:30:20 pm »

@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

Code

    if (c == '_' || wxIsalpha(c))
    {
        // keywords, identifiers, etc.

        // operator== is cheaper than wxIsalnum, also MoveToNextChar already includes IsEOF
        while (    ( (c == '_') || (wxIsalnum(c)) )
               &&  MoveToNextChar() )
            c = CurrentChar(); // repeat

        if (IsEOF())
            return wxEmptyString;

        needReplace = true;
        str = m_Buffer.Mid(start, m_TokenIndex - start);
    }

koso · « **Reply #29 on:** March 22, 2010, 03:11:01 pm »

Quote from: ollydbg on March 22, 2010, 12:30:20 pm

@koso
I'm not familiar with the customized hash function, but I'm familiar with the situation when we need a macro check. So, please correct me if I'm wrong.

Then only time a wxString will be checked in Tokenizer class is the case like below:

So if I understand it good, you are checking all words in source code (which can potentially be macro) for replacement -> so there will be many checks, but only small amount of them will lead to success.

More interesting question is, it the list/set of macros beeing replaced is constant, or it is dependent on user config? (Is this related to CC plugin configuration, where user can add custom replacement rules?). First case would simplify many things, but second will make it little complicated => you won`t be able to add there some "super-fast" filter functions, which will eliminate most false checks. (example of this is check for "_" at the begining -> it is very fast, but once user can add custom macros, it will be probably even slower than without it).

Little theory:

1. searching in <map> is based on string comparision in R-B trees. Count of operation depends on height of tree, which will be soomething like log₂ (number of macros) - in your case not more than 5 comparisions of wxString.

2. <unordered_map> with theoreticaly good hash function will run hasher function, and then 0 or 1 string comparission.

So if hash function is faster than aprox. 3 or 4 wxString comparision, you should get better results with hashing. This looks like no problem, but effectivity of string comparision is depedent on string length -> if you are checking many short words, it will be realy hard to design better hash function. And also, functions presented in this topic where more faster than good hashing function, so <unordered_map> will have to use more comparision operations...

Loaden · « **Reply #30 on:** March 22, 2010, 05:18:25 pm »

Hi, I write a new wxStringHashMap, It's use random key, like this:

Code

#include <iostream>
#include <map>
#include <vector>
#include <stdlib.h>
#include <conio.h>
#include <time.h>

#include <wx/string.h>
#include <wx/timer.h>
#include <wx/hashmap.h>

const size_t LOOPS = 9;
const size_t TIMES = 10000000;
const wxString FINDKEY = _T("_GLIBCXX_END_NAMESPACE_TR1");

class HashForWxStringMap
{
public:
    HashForWxStringMap() {}
    unsigned long operator()(const wxString& x) const
    {
        size_t len = x.length();
        return size_t((128 ^ len) + *((unsigned int*)(x.c_str() + len - 4)));
    }
    HashForWxStringMap& operator=(const HashForWxStringMap&) { return *this; }
};

class EqualForWxStringMap
{
public:
    EqualForWxStringMap() { }
    bool operator()(const wxString& a, const wxString& b) const
    { return a == b; }

    EqualForWxStringMap& operator=(const EqualForWxStringMap&) { return *this; }
};

WX_DECLARE_HASH_MAP(wxString, wxString, HashForWxStringMap, EqualForWxStringMap, HashForWxStringMapMap);
//WX_DECLARE_STRING_HASH_MAP(wxString, HashForWxStringMapMap);

typedef std::vector<wxString> RandomString;
const wxChar CCH[] = _T("_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_");
const int RANDOM_NUM = 1000000;

void InitRandomString(RandomString& rs)
{
    for (int i = 0; i < RANDOM_NUM + 1; ++i)
    {
        int len = 1 + rand() % (sizeof(CCH) / sizeof(wxChar) - 1);
        wxChar* buf = new wxChar[len + 1];
        for (int j = 0; j < len; ++j)
        {
            int x = rand() % (sizeof(CCH) / sizeof(wxChar) - 1);
            buf[j] = CCH[x];
        }
        buf[len] = 0;
//        std::wcout << buf << std::endl;
        rs.push_back(buf);
        delete[] buf;
    } 
}

int main()
{
    srand((unsigned)time(NULL));
    std::cout << "Please wait for random string..." << std::endl;
    std::cout << std::endl;
    
    RandomString rs;
    InitRandomString(rs);
    
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    stdMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    stdMap[_T("_STD_END2")] = _T("2}");

    HashForWxStringMapMap hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");
    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
    hashMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
    hashMap[_T("_STD_END2")] = _T("2}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(rs.at(i % RANDOM_NUM));
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(rs.at(i % RANDOM_NUM));
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

Code

Please wait for random string...

stdMap 1 : 1265
stdMap 2 : 1266
stdMap 3 : 984
stdMap 4 : 1281
stdMap 5 : 1016
stdMap 6 : 1250
stdMap 7 : 828
stdMap 8 : 1234
stdMap 9 : 1125

stdMap Total Time: 10249
stdMap Average Time: 1138
------------------------------

hashMap 1 : 250
hashMap 2 : 250
hashMap 3 : 297
hashMap 4 : 250
hashMap 5 : 250
hashMap 6 : 250
hashMap 7 : 250
hashMap 8 : 266
hashMap 9 : 250

hashMap Total Time: 2313
hashMap Average Time: 257
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.774165

Process returned 0 (0x0)   execution time : 16.000 s
Press any key to continue.

Loaden · « **Reply #31 on:** March 22, 2010, 05:24:01 pm »

If change the hash function TO:

Code

class HashForWxStringMap
{
public:
    HashForWxStringMap() {}
    unsigned long operator()(const wxString& x) const
    {
        size_t len = x.length();
        if (len >= sizeof(unsigned int) / sizeof(wxChar))
            return size_t((128 ^ len) + *((unsigned int*)(x.c_str() + len - sizeof(unsigned int) / sizeof(wxChar))));
        else if (len >= sizeof(unsigned short) / sizeof(wxChar))
            return size_t((256 ^ len) + *((unsigned short*)(x.c_str() + len - sizeof(unsigned short) / sizeof(wxChar))));
        else
            return size_t((512 ^ len) + *(x.c_str() + len - sizeof(unsigned short)));
    }
    HashForWxStringMap& operator=(const HashForWxStringMap&) { return *this; }
};

AND ONLY 9 replacement, report here:

Code

Please wait for random string...

stdMap 1 : 609
stdMap 2 : 610
stdMap 3 : 562
stdMap 4 : 1063
stdMap 5 : 562
stdMap 6 : 610
stdMap 7 : 1047
stdMap 8 : 765
stdMap 9 : 610

stdMap Total Time: 6438
stdMap Average Time: 715
------------------------------

hashMap 1 : 250
hashMap 2 : 328
hashMap 3 : 250
hashMap 4 : 250
hashMap 5 : 250
hashMap 6 : 234
hashMap 7 : 250
hashMap 8 : 250
hashMap 9 : 250

hashMap Total Time: 2312
hashMap Average Time: 256
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.641958

Loaden · « **Reply #32 on:** March 22, 2010, 06:05:58 pm »

Here is final report:

Code

Please wait for random string...

stdMap 1 : 1032
stdMap 2 : 766
stdMap 3 : 609
stdMap 4 : 766
stdMap 5 : 578
stdMap 6 : 609
stdMap 7 : 1032
stdMap 8 : 1015
stdMap 9 : 609

stdMap Total Time: 7016
stdMap Average Time: 779
------------------------------

hashMap 1 : 250
hashMap 2 : 250
hashMap 3 : 250
hashMap 4 : 250
hashMap 5 : 250
hashMap 6 : 250
hashMap 7 : 250
hashMap 8 : 235
hashMap 9 : 250

hashMap Total Time: 2235
hashMap Average Time: 248
------------------------------

(stdAverage - hashAverage) / stdAverage = 0.681643

Process returned 0 (0x0)   execution time : 12.703 s
Press any key to continue.

DEMO (-O2):

Code

#include <iostream>
#include <map>
#include <vector>
#include <stdlib.h>
#include <conio.h>
#include <time.h>

#include <wx/string.h>
#include <wx/timer.h>
#include <wx/hashmap.h>

const size_t LOOPS = 9;
const size_t TIMES = 10000000;
const wxChar CCH[] = _T("_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_");
const int RANDOM_NUM = 1000000;
typedef std::vector<wxString> RandomString;

class HashForWxStringMap
{
public:
    HashForWxStringMap() {}
    unsigned long operator()(const wxString& x) const
    {
        const size_t len = x.length();
        const size_t intWxChar = sizeof(unsigned int) / sizeof(wxChar);
        const size_t shortWxChar = sizeof(unsigned short) / sizeof(wxChar);
        if (len >= intWxChar)
            return size_t((128 ^ len) + *((unsigned int*)(x.c_str() + len - intWxChar)));
        else if (len >= shortWxChar)
            return size_t((256 ^ len) + *((unsigned short*)(x.c_str() + len - shortWxChar)));
        else
            return size_t((512 ^ len) + *(x.c_str() + len - 1));
    }
    HashForWxStringMap& operator=(const HashForWxStringMap&) { return *this; }
};

class EqualForWxStringMap
{
public:
    EqualForWxStringMap() { }
    bool operator()(const wxString& a, const wxString& b) const
    { return a == b; }

    EqualForWxStringMap& operator=(const EqualForWxStringMap&) { return *this; }
};

WX_DECLARE_HASH_MAP(wxString, wxString, HashForWxStringMap, EqualForWxStringMap, wxStringHashMap);

void InitRandomString(RandomString& rs)
{
    for (int i = 0; i < RANDOM_NUM + 1; ++i)
    {
        int len = 1 + rand() % (sizeof(CCH) / sizeof(wxChar) - 1);
        wxChar* buf = new wxChar[len + 1];
        for (int j = 0; j < len; ++j)
        {
            int x = rand() % (sizeof(CCH) / sizeof(wxChar) - 1);
            buf[j] = CCH[x];
        }
        buf[len] = 0;
        rs.push_back(buf);
        delete[] buf;
    } 
}

int main()
{
    srand((unsigned)time(NULL));
    std::cout << "Please wait for random string..." << std::endl;
    std::cout << std::endl;
    
    RandomString rs;
    InitRandomString(rs);
    
    std::map<wxString, wxString> stdMap;
    stdMap[_T("_GLIBCXX_STD")] = _T("std");
    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    stdMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    stdMap[_T("_STD_END")] = _T("}");
//    stdMap[_T("_GLIBCXX_STD2")] = _T("2std");
//    stdMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
//    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
//    stdMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
//    stdMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
//    stdMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
//    stdMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
//    stdMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
//    stdMap[_T("_STD_END2")] = _T("2}");

    wxStringHashMap hashMap;
    hashMap[_T("_GLIBCXX_STD")] = _T("std");
    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE")] = _T("+namespace");
    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE")] = _T("}");
    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR1")] = _T("}");
    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR1")] = _T("-namespace tr1 {");
    hashMap[_T("_STD_BEGIN")] = _T("-namespace std {");
    hashMap[_T("_STD_END")] = _T("}");
//    hashMap[_T("_GLIBCXX_STD2")] = _T("2std");
//    hashMap[_T("_GLIBCXX_BEGIN_NESTED_NAMESPACE2")] = _T("2+namespace");
//    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE2")] = _T("2+namespace");
//    hashMap[_T("_GLIBCXX_END_NESTED_NAMESPACE2")] = _T("2}");
//    hashMap[_T("_GLIBCXX_END_NAMESPACE2")] = _T("2}");
//    hashMap[_T("_GLIBCXX_END_NAMESPACE_TR12")] = _T("2}");
//    hashMap[_T("_GLIBCXX_BEGIN_NAMESPACE_TR12")] = _T("2-namespace tr1 {");
//    hashMap[_T("_STD_BEGIN2")] = _T("2-namespace std {");
//    hashMap[_T("_STD_END2")] = _T("2}");

    int stdTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) stdMap.find(rs.at(i % RANDOM_NUM));
        sw.Pause();
        int time = sw.Time();
        std::cout << "stdMap " << i + 1 << " : " << time << std::endl;
        stdTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "stdMap Total Time: " << stdTotalTime << std::endl;
    double stdAverage = stdTotalTime / LOOPS;
    std::cout << "stdMap Average Time: " << stdAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    int hashTotalTime = 0;
    for (size_t i = 0; i < LOOPS; ++i)
    {
        wxStopWatch sw;
        for (size_t j = 0; j < TIMES; ++j) hashMap.find(rs.at(i % RANDOM_NUM));
        sw.Pause();
        int time = sw.Time();
        std::cout << "hashMap " << i + 1 << " : " << time << std::endl;
        hashTotalTime += time;
    }
    std::cout << std::endl;
    std::cout << "hashMap Total Time: " << hashTotalTime << std::endl;
    double hashAverage = hashTotalTime / LOOPS;
    std::cout << "hashMap Average Time: " << hashAverage << std::endl;
    std::cout << "------------------------------\n" << std::endl;

    std::cout << "(stdAverage - hashAverage) / stdAverage = " << (stdAverage - hashAverage) / stdAverage;
    std::cout << std::endl;

    return 0;
}

[attachment deleted by admin]

koso · « **Reply #33 on:** March 22, 2010, 06:10:04 pm »

@Loaden: you made the hash function quiet complicated, but i dont understand, what were you trying to do. Are you aware of sizeof(int)=sizeof(wxChar)=4 .. on 32 bit system with UNICODE wxWidgets? At this configuration, your hash function only takes last character of string.

Loaden · « **Reply #34 on:** March 22, 2010, 06:24:57 pm »

Quote from: koso on March 22, 2010, 06:10:04 pm

@Loaden: you made the hash function quiet complicated, but i dont understand, what were you trying to do. Are you aware of sizeof(int)=sizeof(wxChar)=4 .. on 32 bit system with UNICODE wxWidgets? At this configuration, your hash function only takes last character of string.

On 32 bit windows OS, sizeof(wxChar) == 2, but sizeof(int) = 4;
Yes, I only only judge the length and the last one (or two) character, Because has enough for a replacement rules.
You can look at the rules of these replacement：

Code

<TOKEN_REPLACEMENTS>
			<ssmap>
				<QT_BEGIN_HEADER>
					<![CDATA[]]>
				</QT_BEGIN_HEADER>
				<QT_END_HEADER>
					<![CDATA[]]>
				</QT_END_HEADER>
				<Q_GADGET>
					<![CDATA[]]>
				</Q_GADGET>
				<Q_INLINE_TEMPLATE>
					<![CDATA[]]>
				</Q_INLINE_TEMPLATE>
				<Q_OBJECT>
					<![CDATA[]]>
				</Q_OBJECT>
				<Q_OUTOFLINE_TEMPLATE>
					<![CDATA[]]>
				</Q_OUTOFLINE_TEMPLATE>
				<Q_PACKED>
					<![CDATA[]]>
				</Q_PACKED>
				<Q_REQUIRED_RESULT>
					<![CDATA[]]>
				</Q_REQUIRED_RESULT>
				<_GLIBCXX_BEGIN_NAMESPACE>
					<![CDATA[+namespace]]>
				</_GLIBCXX_BEGIN_NAMESPACE>
				<_GLIBCXX_BEGIN_NAMESPACE_TR1>
					<![CDATA[-namespace tr1 {]]>
				</_GLIBCXX_BEGIN_NAMESPACE_TR1>
				<_GLIBCXX_BEGIN_NESTED_NAMESPACE>
					<![CDATA[+namespace]]>
				</_GLIBCXX_BEGIN_NESTED_NAMESPACE>
				<_GLIBCXX_END_NAMESPACE>
					<![CDATA[}]]>
				</_GLIBCXX_END_NAMESPACE>
				<_GLIBCXX_END_NAMESPACE_TR1>
					<![CDATA[}]]>
				</_GLIBCXX_END_NAMESPACE_TR1>
				<_GLIBCXX_END_NESTED_NAMESPACE>
					<![CDATA[}]]>
				</_GLIBCXX_END_NESTED_NAMESPACE>
				<_GLIBCXX_STD>
					<![CDATA[std]]>
				</_GLIBCXX_STD>
				<_STD_BEGIN>
					<![CDATA[-namespace std {]]>
				</_STD_BEGIN>
				<_STD_END>
					<![CDATA[}]]>
				</_STD_END>
				<emit>
					<![CDATA[]]>
				</emit>
			</ssmap>
		</TOKEN_REPLACEMENTS>

thomas · « **Reply #35 on:** March 23, 2010, 12:18:54 pm »

I think you're thinking too complicated

Don't think in wxChars. All that Unicode shit only makes people's heads spin and renders code unnecessarily complex. Think in bytes. Sometimes you have to think in wxChars, but don't do it when it is not necessary. After all, wxChars are only bytes. On some systems, wxChars are several bytes, and on some they are just one byte. To a hash function it does not really matter, you can just treat it as a BLOB of size str.length() * sizeof(wxChar) bytes.

Personally, I wouldn't make it much more complicated than something like that:
unsigned long djb2mod(const unsigned char *str, unsigned int len) { unsigned long hash = 9737333; while(len--) hash = 33 * hash + *str++; return hash; } class HashForWxStringMap { ... unsigned long operator()(const wxString& x) const { return ::djb2mod((const unsigned char*) x.data(), x.length()*sizeof(wxChar); } };

Did not try whether what I wrote there compiles at all, hopefully there aren't too many severe mistakes in it

koso · « **Reply #36 on:** March 23, 2010, 01:10:06 pm »

I alsa think he complicates it too much

There is no need to make it so complex. We have seen, that even lenght and one character are enought effective. I suggest to use only fast arithmetic operations (XOR, bit rotations).

About using bytes, I dont think it is good idea becase when strings are made of ANSI characters (in programming probably 99,9999%) then there are many empty bytes. And also it is much more effective working with native data types (32 bit system -> 32 bit integer), than with bytes. So ideal will be to convert/retype wxChar to (unsigned long) and make some arithmetic operations -> and that was used in previous versions, and also in Loadens, but there is it little lost

thomas · « **Reply #37 on:** March 27, 2010, 03:12:17 pm »

Quote from: koso on March 23, 2010, 01:10:06 pm

About using bytes, I dont think it is good idea becase when strings are made of ANSI characters (in programming probably 99,9999%) then there are many empty bytes.

I beg to differ

Bytes give more iterations in the hash function which is a good thing, as it reduces the likelihood of collisions due to imperfect mixing. There will obviously still be collisions due to the laws of information theory (mapping many to few must have collisions), but those cannot be avoided.
The computional difference is a few hundred or thousand cycles for reasonably long words with a reasonably efficient hash function, which is comparable with the extra cache misses you'll get from collisions due to a less mixed hash function, or comparable with one heap allocation, which we implicitely do many million times during parsing.

koso · « **Reply #38 on:** March 27, 2010, 03:51:05 pm »

Let f be our hash function, which maps all mapped strings to a different value. When searching for some random string s, there are two possible situations:

1. f(s) is found in table (or success or collision). Searching procedure must compare mapped string with string we are trying to find.
2. f(s) is not in table -> we instanly know, that string is not mapped

You are right, that cryptographical-strong hash function function will have less collision, but contrary to that, function will be much more complex. From my example can be clearly seen, that penalty for collision is one extra string comparission. Once comparission is faster than strong hash function, there will be no reason to use it, becase it will be slower, even where there will be much less collisions.

I havent tested it, bot my opinion is, that your hash function is more complex than string comparission. Your function has more cycles, and uses less effective operations (multiplication). Also string comparission is very fast once strings are random, becase procedure ends at first character they differ.

thomas · « **Reply #39 on:** March 27, 2010, 10:31:12 pm »

Quote from: koso on March 27, 2010, 03:51:05 pm

1. f(s) is found in table (or success or collision). Searching procedure must compare mapped string with string we are trying to find.
2. f(s) is not in table -> we instanly know, that string is not mapped

2. is correct, but 1. is not. If you have collisions, there is either a linked list in the respective slot or some other scheme (linear/quadratic probing) is used to place values in different slots.
In either case, this is guaranteed to take more than one string comparison and more than one cache miss on the average. A string compare is a triple dereference, which usually means 3 cache misses if you have enough different strings. Three cache misses are approximately 1500 cpu cycles gone, and this isn't event taking the actual compare in account. You can run a lot of hash function in 1500 cycles

Quote

I havent tested it, bot my opinion is, that your hash function is more complex than string comparission. Your function has more cycles, and uses less effective operations (multiplication).

This function (which is a modification of a funciton written by D.J. Bernstein btw, not me -- I only replaced Wilson's 9th primeth recurrence with the 12th recurrence to flip more bits per step) is not as inefficient as you think, as the compiler will transform the multiplication to a shift and add. Some implementations "optimize" the source code to contain a shift directly, but I prefer to write out he multiplication, which is much cleaner code and exactly the same binary.
It is also not cryptographically secure, but that is not the function's intent. The funciton's intent is to hash arbitrary data nicely with few cpu cycles, and it does that quite well.

koso · « **Reply #40 on:** March 27, 2010, 11:11:36 pm »

Quote from: thomas on March 27, 2010, 10:31:12 pm

2. is correct, but 1. is not. If you have collisions, there is either a linked list in the respective slot or some other scheme (linear/quadratic probing) is used to place values in different slots.
In either case, this is guaranteed to take more than one string comparison and more than one cache miss on the average. A string compare is a triple dereference, which usually means 3 cache misses if you have enough different strings. Three cache misses are approximately 1500 cpu cycles gone, and this isn't event taking the actual compare in account. You can run a lot of hash function in 1500 cycles

You have probably missed my assumption that mapped strings (strings stored in hash table) have all different hash values. There are most probably linked lists (called tokens), but all with only one string. This assumption is broken once user adds some custom replacing rules with "not good" strings.

And about cache I must confess I don't understand what you think with "cache" ... it is something related to wxString? I have not much experience with wx programming, only with standard "direct memory" accessible c/c++ strings ...

And you are right, that multiplication with 33 is not as complex, as i wrote. Now I realized that 33 has hamming weight only 2. Your hash function usable as a general purpose hash function for wxString, but I still thinks It can be more effective, once hash function is optimized to C::B needs (i.e. use knowledge, that mapped string are mostly uppercase, and searched string are mostly lowercase, and at least in my case, they are much shorter).

PS: but even quiet bad hash function seems to by much faster, than used <map>. Used hash function is only detail which can be tweaked later. It will be even easier with testing on real projects, than using synthetic tests and theoretical assumptions

News:

Author Topic: patch : from std::map replace to hashmap (Read 45837 times)