Developer forums (C::B DEVELOPMENT STRICTLY!) > CodeCompletion redesign

Predictive recursive decent parser question about our build-in parser


We have such code in our parser:

--- Code: ---    while (m_Tokenizer.NotEOF() && IS_ALIVE)
        wxString token = m_Tokenizer.GetToken();
        if (token == ParserConsts::kw_class)
            if (m_Options.handleClasses)
                SkipToOneOfChars(ParserConsts::semicolonclbrace, true, true);

--- End code ---

While, the function "GetToken" just consume the Token from the Token Stream, and advance the Token position.

But I see for a standard implementation, it usually use "PeekToken", and consume the Token later. See here: Recursive descent parser implementation

Any ideas about the difference?

BTW: I have post a question in parsing - Are the peeked token or current token the same thing in recursive decent parser implementation? - Stack Overflow.

OK, in this clang based code browser: Parser.h source code [clang/include/clang/Parse/Parser.h] - Woboq Code Browser, I see such code and comments:

--- Code: ---class Parser : public CodeCompletionHandler {

  Preprocessor &PP;
  /// Tok - The current token we are peeking ahead.  All parsing methods assume
  /// that this is valid.
  Token Tok;
  // PrevTokLocation - The location of the token we previously
  // consumed. This token is used for diagnostics where we expected to
  // see a token following another token (e.g., the ';' at the end of
  // a statement).
SourceLocation PrevTokLocation;

  /// GetLookAheadToken - This peeks ahead N tokens and returns that token
  /// without consuming any tokens.  LookAhead(0) returns 'Tok', LookAhead(1)
  /// returns the token after Tok, etc.
  /// Note that this differs from the Preprocessor's LookAhead method, because
  /// the Parser always has one token lexed that the preprocessor doesn't.
  const Token &GetLookAheadToken(unsigned N) {
    if (N == 0 || return Tok;
    return PP.LookAhead(N-1);
  /// NextToken - This peeks ahead one token and returns it without
  /// consuming it.
  const Token &NextToken() {
    return PP.LookAhead(0);

--- End code ---

Especially the comments for the Parser::GetLookAheadToken() function, and I see that the Tok member variable is very important, and it stands for the "peeked token", this is the cache. This value is the same thing as I can see from the wikipedia's code: Recursive_descent_parser's C implementation

But in the Preprocessor(see the PP variable), the peeked token is the one in the Token queue, but this is the one we returned as next token (see const Token &NextToken()  in the above code snippet).

Now, back to our build-in parser, I think we may need a cache variable which store the value from the "m_Tokenizer.GetToken()", while we call "m_Tokenizer.PeekToken()", this actually means the "Token &NextToken()" as in clang's Parser implementation.

EDIT: this tutorial also shows the same method(cached Tok), see here: 2. Kaleidoscope: Implementing a Parser and AST — LLVM 5 documentation


[0] Message Index

Go to full version