Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign

ParserThread - switch on token strings


Alpha:
In ParserThread::DoParse() there are a number of string comparisons.  What if you could just send it through a switch?

--- Code: ---#include "ugly_macros.h"
switch (HashStr(token))
{
    case ID('f','o','r'):
        break;
    case ID('e','l','s','e'):
        break;
    case ID('e','n','u','m'):
        break;
    default:
        break;
}

--- End code ---
Yes, sort of ugly, but it works.  (If C++11 constexpr were allowed, it could be much more beautiful :) .)  I wrote the following macro/template hacks for another project and thought they might be useful here; however, I do not currently have time to integrate and test them myself.

Is it actually faster?  I have no idea...


--- Code: (ugly_macros.h/cpp) ---/* helper, truncate to 20 args */
#define _GET_20_ARGS(                                                                                  \
    _0,  _1,  _2,  _3,  _4,  _5,  _6,  _7,  _8,  _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19, \
    ... )                                                                                              \
    _0,  _1,  _2,  _3,  _4,  _5,  _6,  _7,  _8,  _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19

/* results in up to 20 arguments, padded with null characters
 * requires at least 1 argument */
#define GET_20_ARGS(...)                                                             \
    _GET_20_ARGS( __VA_ARGS__, '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', \
                   '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' )

/* user interface */
#define ID(...) CompileStrHasher<GET_20_ARGS(__VA_ARGS__)>::value

/* hash function (FNV in reverse) */
template< char  _0, char  _1, char  _2, char  _3, char  _4, char  _5, char  _6, char  _7, char  _8, char  _9,
          char _10, char _11, char _12, char _13, char _14, char _15, char _16, char _17, char _18, char _19  >
struct CompileStrHasher
{
    enum
    {
        value = (_0 ^ ID(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, _15, _16, _17, _18, _19)) * 16777619u
    };
};

/* seed value */
template<>
struct CompileStrHasher<GET_20_ARGS('\0')>
{
    enum
    {
        value = 2166136261u
    };
};



// example usage

#if __cplusplus < 201103L
// flat solution (run-time execution)
unsigned HashStr(const char* str)
{
    unsigned value = 2166136261u;
    if (!*str)
        return value;
    const char* end = str;
    for (int i = 0; *(end + 1) && i < 19; ++end, ++i) // stop at str[19]: at most 20 chars, matching ID()
        ;
    do
    {
        value = (*end ^ value) * 16777619u;
    } while (str != (end--));
    return value;
}
#else
// recursive C++11 constexpr (compile-time execution)
constexpr unsigned HashStr(const char* str, int depth = 0)
{
    return (*str && depth < 20) ? (*str ^ HashStr(str+1, depth+1)) * 16777619u : 2166136261u;
}
#endif

int main()
{
    switch (HashStr("str"))
    {
        case ID('s','t','r'):
            return 0;

        default:
            break;
    }
    return 1;
}

--- End code ---
(Note: I did not spend time trying to optimize the run-time counterpart much.)

oBFusCATed:
I don't know the exact place you're trying to optimize, but I'm almost 100% sure that HashStr will always be called with a dynamic string not known at compile time.  So you'll have a loop there.

Keep in mind that all the strxxx functions are pretty well optimized in libc!  A simple hand-written for loop will be slower.

Alpha:
Yes, HashStr() will have to be a run-time cost.  However, if the switch itself saves enough over repeated if/else if comparisons on token, it might still be worth it.

Okay, no loops :P ... sort of.  (except strlen())

--- Code: ---#include <cstring>
unsigned HashStr(const char* str)
{
    unsigned value = 2166136261u;
    switch (strlen(str))
    {
        default:
        case 20: value = (str[19] ^ value) * 16777619u;
        case 19: value = (str[18] ^ value) * 16777619u;
        case 18: value = (str[17] ^ value) * 16777619u;
        case 17: value = (str[16] ^ value) * 16777619u;
        case 16: value = (str[15] ^ value) * 16777619u;
        case 15: value = (str[14] ^ value) * 16777619u;
        case 14: value = (str[13] ^ value) * 16777619u;
        case 13: value = (str[12] ^ value) * 16777619u;
        case 12: value = (str[11] ^ value) * 16777619u;
        case 11: value = (str[10] ^ value) * 16777619u;
        case 10: value = (str[ 9] ^ value) * 16777619u;
        case  9: value = (str[ 8] ^ value) * 16777619u;
        case  8: value = (str[ 7] ^ value) * 16777619u;
        case  7: value = (str[ 6] ^ value) * 16777619u;
        case  6: value = (str[ 5] ^ value) * 16777619u;
        case  5: value = (str[ 4] ^ value) * 16777619u;
        case  4: value = (str[ 3] ^ value) * 16777619u;
        case  3: value = (str[ 2] ^ value) * 16777619u;
        case  2: value = (str[ 1] ^ value) * 16777619u;
        case  1: value = (str[ 0] ^ value) * 16777619u;
        case  0:
            return value;
    }
}

--- End code ---


Maybe I can find time this weekend to run a few tests.  We shall see...

oBFusCATed:
Then how would you handle hash collisions?

ollydbg:
Maybe somewhat off topic, but to accelerate token comparison I think we need a fast lexer.  I have C preprocessor grammars for both Quex (A Fast Universal Lexical Analyzer Generator) and the re2c scanner generator on SourceForge.net; both can generate very fast lexers that return not only the lexeme string (like what we currently do in Tokenizer::GetToken()) but also an ID value.  Then we could compare all the keyword tokens by ID, not by literal string.  But using them would change our CC's code a lot :(
