Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign

ParserThread - switch on token strings


Alpha:
In ParserThread::DoParse() there are a number of string comparisons.  What if you could just send it through a switch?

--- Code: ---#include "ugly_macros.h"
switch (HashStr(token))
{
    case ID('f','o','r'):
        break;
    case ID('e','l','s','e'):
        break;
    case ID('e','n','u','m'):
        break;
    default:
        break;
}

--- End code ---
Yes, sort of ugly, but it works.  (If C++11 constexpr were allowed, it could be much more beautiful :) .)  I wrote the following macro/template hacks for another project and thought they might be useful here; however, I do not currently have time to integrate and test them myself.

Is it actually faster?  I have no idea...


--- Code: (ugly_macros.h/cpp) ---/* helper, truncate to 20 args */
#define _GET_20_ARGS(                                                                                  \
    _0,  _1,  _2,  _3,  _4,  _5,  _6,  _7,  _8,  _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19, \
    ... )                                                                                              \
    _0,  _1,  _2,  _3,  _4,  _5,  _6,  _7,  _8,  _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19

/* results in up to 20 arguments, padded with null characters
 * requires at least 1 argument */
#define GET_20_ARGS(...)                                                             \
    _GET_20_ARGS( __VA_ARGS__, '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', \
                   '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' )

/* user interface */
#define ID(...) CompileStrHasher<GET_20_ARGS(__VA_ARGS__)>::value

/* hash function (FNV in reverse) */
template< char  _0, char  _1, char  _2, char  _3, char  _4, char  _5, char  _6, char  _7, char  _8, char  _9,
          char _10, char _11, char _12, char _13, char _14, char _15, char _16, char _17, char _18, char _19  >
struct CompileStrHasher
{
    enum
    {
        value = (_0 ^ ID(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19)) * 16777619u
    };
};

/* seed value */
template<>
struct CompileStrHasher<GET_20_ARGS('\0')>
{
    enum
    {
        value = 2166136261u
    };
};



// example usage

#if __cplusplus < 201103L
// flat solution (run-time execution)
unsigned HashStr(const char* str)
{
    unsigned value = 2166136261u;
    if (!*str)
        return value;
    const char* end = str;
    for (int i = 0; *(end + 1) && i < 19; ++end, ++i) // stop at str[19]: at most 20 chars, matching ID()
        ;
    do
    {
        value = (*end ^ value) * 16777619u;
    } while (str != (end--));
    return value;
}
#else
// recursive C++11 constexpr (compile-time execution)
constexpr unsigned HashStr(const char* str, int depth = 0)
{
    return (*str && depth < 20) ? (*str ^ HashStr(str+1, depth+1)) * 16777619u : 2166136261u;
}
#endif

int main()
{
    switch (HashStr("str"))
    {
        case ID('s','t','r'):
            return 0;

        default:
            break;
    }
    return 1;
}

--- End code ---
(Note: I did not spend time trying to optimize the run-time counterpart much.)

oBFusCATed:
I don't know the exact place you're trying to optimize, but I'm almost 100% sure that HashStr will always be called with a dynamic string not known at compile time.  So you'll have a loop there.

Keep in mind that all the strxxx functions are pretty well optimized in libc!  A simple hand-written for loop will be slower.

Alpha:
Yes, HashStr() will have to be a run-time cost.  However, if the switch itself saves enough over repeated if/else if comparisons on token, it might still be worth it.

Okay, no loops :P ... sort of.  (except strlen())

--- Code: ---#include <cstring>
unsigned HashStr(const char* str)
{
    unsigned value = 2166136261u;
    switch (strlen(str))
    {
        default:
        case 20: value = (str[19] ^ value) * 16777619u;
        case 19: value = (str[18] ^ value) * 16777619u;
        case 18: value = (str[17] ^ value) * 16777619u;
        case 17: value = (str[16] ^ value) * 16777619u;
        case 16: value = (str[15] ^ value) * 16777619u;
        case 15: value = (str[14] ^ value) * 16777619u;
        case 14: value = (str[13] ^ value) * 16777619u;
        case 13: value = (str[12] ^ value) * 16777619u;
        case 12: value = (str[11] ^ value) * 16777619u;
        case 11: value = (str[10] ^ value) * 16777619u;
        case 10: value = (str[ 9] ^ value) * 16777619u;
        case  9: value = (str[ 8] ^ value) * 16777619u;
        case  8: value = (str[ 7] ^ value) * 16777619u;
        case  7: value = (str[ 6] ^ value) * 16777619u;
        case  6: value = (str[ 5] ^ value) * 16777619u;
        case  5: value = (str[ 4] ^ value) * 16777619u;
        case  4: value = (str[ 3] ^ value) * 16777619u;
        case  3: value = (str[ 2] ^ value) * 16777619u;
        case  2: value = (str[ 1] ^ value) * 16777619u;
        case  1: value = (str[ 0] ^ value) * 16777619u;
        case  0:
            return value;
    }
}

--- End code ---


Maybe I can find time this weekend to run a few tests.  We shall see...

oBFusCATed:
Then how would you handle hash collisions?

ollydbg:
Maybe somewhat off topic, but to accelerate token comparison I think we need a fast lexer.  I have C preprocessor grammars for both Quex (A Fast Universal Lexical Analyzer Generator) and the re2c scanner generator on SourceForge.net; both can generate very fast lexers that return not only the lexeme string (like what we currently do in Tokenizer::GetToken()) but also an ID value.  Then we could compare all the keyword tokens by ID, not by literal string.  But using them would change our CC's code a lot :(
