We have such code in our parser:
while (m_Tokenizer.NotEOF() && IS_ALIVE)
{
wxString token = m_Tokenizer.GetToken();
if (token == ParserConsts::kw_class)
{
m_Str.Clear();
if (m_Options.handleClasses)
HandleClass(ctClass);
else
SkipToOneOfChars(ParserConsts::semicolonclbrace, true, true);
}
...
...
}
While, the function "GetToken" just consume the Token from the Token Stream, and advance the Token position.
But I see for a standard implementation, it usually use "PeekToken", and consume the Token later. See here: Recursive descent parser implementation (http://stackoverflow.com/questions/9814528/recursive-descent-parser-implementation)
Any ideas about the difference?
BTW: I have post a question in parsing - Are the peeked token or current token the same thing in recursive decent parser implementation? - Stack Overflow (http://stackoverflow.com/questions/42709627/are-the-peeked-token-or-current-token-the-same-thing-in-recursive-decent-parser).
OK, in this clang based code browser: Parser.h source code [clang/include/clang/Parse/Parser.h] - Woboq Code Browser (https://code.woboq.org/llvm/clang/include/clang/Parse/Parser.h.html#575), I see such code and comments:
class Parser : public CodeCompletionHandler {
...
...
Preprocessor &PP;
/// Tok - The current token we are peeking ahead. All parsing methods assume
/// that this is valid.
Token Tok;
// PrevTokLocation - The location of the token we previously
// consumed. This token is used for diagnostics where we expected to
// see a token following another token (e.g., the ';' at the end of
// a statement).
SourceLocation PrevTokLocation;
...
...
/// GetLookAheadToken - This peeks ahead N tokens and returns that token
/// without consuming any tokens. LookAhead(0) returns 'Tok', LookAhead(1)
/// returns the token after Tok, etc.
///
/// Note that this differs from the Preprocessor's LookAhead method, because
/// the Parser always has one token lexed that the preprocessor doesn't.
///
const Token &GetLookAheadToken(unsigned N) {
if (N == 0 || Tok.is(tok::eof)) return Tok;
return PP.LookAhead(N-1);
}
public:
/// NextToken - This peeks ahead one token and returns it without
/// consuming it.
const Token &NextToken() {
return PP.LookAhead(0);
}
Especially the comments for the Parser::GetLookAheadToken() function, and I see that the Tok member variable is very important, and it stands for the "peeked token", this is the cache. This value is the same thing as I can see from the wikipedia's code: Recursive_descent_parser's C implementation (https://en.wikipedia.org/wiki/Recursive_descent_parser#C_implementation)
But in the Preprocessor(see the PP variable), the peeked token is the one in the Token queue, but this is the one we returned as next token (see const Token &NextToken() in the above code snippet).
Now, back to our build-in parser, I think we may need a cache variable which store the value from the "m_Tokenizer.GetToken()", while we call "m_Tokenizer.PeekToken()", this actually means the "Token &NextToken()" as in clang's Parser implementation.
EDIT: this tutorial also shows the same method(cached Tok), see here: 2. Kaleidoscope: Implementing a Parser and AST — LLVM 5 documentation (http://llvm.org/docs/tutorial/LangImpl02.html)