@ollydbg
I copied the quex generated code from your repo to eliminate my custom tokenizer and use something that would simplify the process as well as improve it. I was also studying the modifications you made to the tokenizer class. Is there a simpler example to follow? If not, I guess I will have to study it more and search the documentation.
I'm glad that you are using the quex generated lexer now.
1, All the lexer code is under the folder /cppparser/lexer. It was generated from the file cpp.qx (the lexer grammar file), and the command to regenerate it is "cpp.bat" (a Windows batch file that calls quex). If you just want to use the generated lexer, you do not need to install python and quex, because the generated code already contains all the files necessary to compile. If you want to modify the cpp.qx file, then you need to install python and quex.
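For reference, the batch file only runs the quex code generator on the grammar file. The exact options depend on the quex version installed, so treat the following as an assumed sketch rather than the real contents of cpp.bat:

REM hypothetical cpp.bat: regenerate the lexer code from the grammar file
REM extra options (output class name, token prefix, ...) depend on the quex
REM version and on what the project expects
quex -i cpp.qx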
2, I use the "pointing to a buffer" mode of the generated lexer. That is, when the lexer is constructed, I just point it at a dummy buffer filled with zeros.
m_Quex((QUEX_TYPE_CHARACTER*)s_QuexBuffer, 4, (QUEX_TYPE_CHARACTER*)s_QuexBuffer + 1)
This code initializes the lexer, and s_QuexBuffer is in fact just a small buffer of zeros:
QUEX_TYPE_CHARACTER Tokenizer::s_QuexBuffer[4] = {0,0,0,0};
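Putting those two pieces together, here is a minimal sketch of how the dummy buffer and the lexer member could sit in the Tokenizer class. The generated lexer class name quex::CppLexer and the header name are assumptions; use whatever names your cpp.qx/cpp.bat actually produce:

#include "CppLexer"  // the quex generated lexer header; actual file name is an assumption

class Tokenizer
{
public:
    Tokenizer()
        // point the lexer at the dummy zero buffer until a real file is loaded
        : m_Quex((QUEX_TYPE_CHARACTER*)s_QuexBuffer, 4,
                 (QUEX_TYPE_CHARACTER*)s_QuexBuffer + 1)
    {}

private:
    static QUEX_TYPE_CHARACTER s_QuexBuffer[4]; // dummy buffer, all zeros
    quex::CppLexer             m_Quex;          // generated lexer, class name is an assumption
};

QUEX_TYPE_CHARACTER Tokenizer::s_QuexBuffer[4] = {0,0,0,0};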
3, When your own buffer is ready, I just "point to" it, see:
bool Tokenizer::ReadFile()
{
    bool success = false;
    cc_string fileName = cc_text("");
    if (m_pLoader)
    {
        fileName = m_pLoader->fileName();
        // the loader owns the character buffer; the lexer only "points to" it
        const char * pBuffer = m_pLoader->data();
        m_BufferLen = m_pLoader->length();
        if (m_BufferLen != 0)
            success = true;
        // hand the loader's buffer to the lexer instead of copying it
        m_Quex.reset_buffer((QUEX_TYPE_CHARACTER*)pBuffer,
                            m_BufferLen + 2,
                            (QUEX_TYPE_CHARACTER*)pBuffer + m_BufferLen + 1);
        // tell the lexer where the first token should be written
        (void)m_Quex.token_p_switch(&m_TokenBuffer[0]);
        cout << "set buffer size " << (int)QUEX_SETTING_BUFFER_SIZE << endl;
        return true;
    }
    return success;
}
The char buffer is loaded by "m_pLoader", which is a file loader, so it provides two pieces of information: the buffer's start address and its length:
const char * pBuffer = m_pLoader->data();
m_BufferLen = m_pLoader->length();
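Any loader that can hand out those two pieces of information will do. As a minimal sketch (the class and member names below are assumptions, not the real CC loader), a loader could simply read the whole file into memory and keep a couple of extra zero bytes after the content, matching the m_BufferLen + 2 that ReadFile() above passes to reset_buffer():

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class SimpleFileLoader  // hypothetical stand-in for m_pLoader's class
{
public:
    SimpleFileLoader() : m_Length(0) {}

    bool Open(const std::string& name)
    {
        m_FileName = name;
        std::ifstream in(name.c_str(), std::ios::binary);
        if (!in)
            return false;
        // read the whole file into memory
        m_Data.assign(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>());
        m_Length = m_Data.size();
        // keep two extra zero bytes after the content, matching the
        // m_BufferLen + 2 arithmetic in ReadFile() above
        m_Data.push_back(0);
        m_Data.push_back(0);
        return true;
    }

    const char*        data()     const { return m_Data.empty() ? "" : &m_Data[0]; }
    size_t             length()   const { return m_Length; }
    const std::string& fileName() const { return m_FileName; }

private:
    std::string       m_FileName;
    std::vector<char> m_Data;
    size_t            m_Length;
};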
m_TokenBuffer[0] is a Token object owned by the user, so when you call
(void)m_Quex.token_p_switch(&m_TokenBuffer[0]);
you tell the lexer the address where the next Token should be filled in; the lexer will fill that Token on its next step (when receive() is called).
4, My Tokenizer is modified from the Tokenizer class in CC's current implementation, but a lot of things have changed. The normal way to receive a Token is shown below:
bool Tokenizer::FetchToken(RawToken * pToken)
{
    // tell the lexer where the next token should be written
    (void)m_Quex.token_p_switch(pToken);
    // advance the lexer one step; it fills *pToken and returns the token id
    QUEX_TYPE_TOKEN_ID id = m_Quex.receive();
    if (id == TKN_TERMINATION)
    {
        m_IsEOF = true;
        return false;
    }
    return true;
}
You supply a token address and the lexer fills the token. Then you can get the Token's id and the Token's text (if it is an identifier), as well as its line and column information.
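For example, a caller could drive FetchToken() in a loop like the sketch below. I am assuming here that RawToken exposes the default quex token accessors (type_id(), line_number(), column_number()); the real RawToken members may be named differently:

#include <iostream>

// assumes the tokenizer already has a loader attached and ReadFile() was called
void DumpTokens(Tokenizer& tokenizer)
{
    RawToken token;
    while (tokenizer.FetchToken(&token))
    {
        std::cout << "id="    << token.type_id()
                  << " line=" << token.line_number()
                  << " col="  << token.column_number()
                  << std::endl;
    }
}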
If you have some problems using quex, feel free to ask me.