Author Topic: Predictive recursive decent parser question about our build-in parser  (Read 21610 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6036
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
We have such code in our parser:

Code
    while (m_Tokenizer.NotEOF() && IS_ALIVE)
    {
        wxString token = m_Tokenizer.GetToken();
        if (token == ParserConsts::kw_class)
        {
            m_Str.Clear();
            if (m_Options.handleClasses)
                HandleClass(ctClass);
            else
                SkipToOneOfChars(ParserConsts::semicolonclbrace, true, true);
        }
        ...
        ...
    }

While, the function "GetToken" just consume the Token from the Token Stream, and advance the Token position.

But I see for a standard implementation, it usually use "PeekToken", and consume the Token later. See here: Recursive descent parser implementation

Any ideas about the difference?


BTW: I have post a question in parsing - Are the peeked token or current token the same thing in recursive decent parser implementation? - Stack Overflow.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6036
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: Predictive recursive decent parser question about our build-in parser
« Reply #1 on: March 11, 2017, 03:48:06 pm »
OK, in this clang based code browser: Parser.h source code [clang/include/clang/Parse/Parser.h] - Woboq Code Browser, I see such code and comments:

Code
class Parser : public CodeCompletionHandler {
...
...

  Preprocessor &PP;
  /// Tok - The current token we are peeking ahead.  All parsing methods assume
  /// that this is valid.
  Token Tok;
  // PrevTokLocation - The location of the token we previously
  // consumed. This token is used for diagnostics where we expected to
  // see a token following another token (e.g., the ';' at the end of
  // a statement).
SourceLocation PrevTokLocation;

...
...
  /// GetLookAheadToken - This peeks ahead N tokens and returns that token
  /// without consuming any tokens.  LookAhead(0) returns 'Tok', LookAhead(1)
  /// returns the token after Tok, etc.
  ///
  /// Note that this differs from the Preprocessor's LookAhead method, because
  /// the Parser always has one token lexed that the preprocessor doesn't.
  ///
  const Token &GetLookAheadToken(unsigned N) {
    if (N == 0 || Tok.is(tok::eof)) return Tok;
    return PP.LookAhead(N-1);
  }
public:
  /// NextToken - This peeks ahead one token and returns it without
  /// consuming it.
  const Token &NextToken() {
    return PP.LookAhead(0);
}

Especially the comments for the Parser::GetLookAheadToken() function, and I see that the Tok member variable is very important, and it stands for the "peeked token", this is the cache. This value is the same thing as I can see from the wikipedia's code: Recursive_descent_parser's C implementation

But in the Preprocessor(see the PP variable), the peeked token is the one in the Token queue, but this is the one we returned as next token (see const Token &NextToken()  in the above code snippet).

Now, back to our build-in parser, I think we may need a cache variable which store the value from the "m_Tokenizer.GetToken()", while we call "m_Tokenizer.PeekToken()", this actually means the "Token &NextToken()" as in clang's Parser implementation.

EDIT: this tutorial also shows the same method(cached Tok), see here: 2. Kaleidoscope: Implementing a Parser and AST — LLVM 5 documentation
« Last Edit: March 11, 2017, 03:58:13 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.