Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => CodeCompletion redesign => Topic started by: ollydbg on March 10, 2017, 08:09:07 am

Title: Predictive recursive decent parser question about our build-in parser
Post by: ollydbg on March 10, 2017, 08:09:07 am
We have such code in our parser:

Code
    while (m_Tokenizer.NotEOF() && IS_ALIVE)
    {
        wxString token = m_Tokenizer.GetToken();
        if (token == ParserConsts::kw_class)
        {
            m_Str.Clear();
            if (m_Options.handleClasses)
                HandleClass(ctClass);
            else
                SkipToOneOfChars(ParserConsts::semicolonclbrace, true, true);
        }
        ...
        ...
    }

While, the function "GetToken" just consume the Token from the Token Stream, and advance the Token position.

But I see for a standard implementation, it usually use "PeekToken", and consume the Token later. See here: Recursive descent parser implementation (http://stackoverflow.com/questions/9814528/recursive-descent-parser-implementation)

Any ideas about the difference?


BTW: I have post a question in parsing - Are the peeked token or current token the same thing in recursive decent parser implementation? - Stack Overflow (http://stackoverflow.com/questions/42709627/are-the-peeked-token-or-current-token-the-same-thing-in-recursive-decent-parser).
Title: Re: Predictive recursive decent parser question about our build-in parser
Post by: ollydbg on March 11, 2017, 03:48:06 pm
OK, in this clang based code browser: Parser.h source code [clang/include/clang/Parse/Parser.h] - Woboq Code Browser (https://code.woboq.org/llvm/clang/include/clang/Parse/Parser.h.html#575), I see such code and comments:

Code
class Parser : public CodeCompletionHandler {
...
...

  Preprocessor &PP;
  /// Tok - The current token we are peeking ahead.  All parsing methods assume
  /// that this is valid.
  Token Tok;
  // PrevTokLocation - The location of the token we previously
  // consumed. This token is used for diagnostics where we expected to
  // see a token following another token (e.g., the ';' at the end of
  // a statement).
SourceLocation PrevTokLocation;

...
...
  /// GetLookAheadToken - This peeks ahead N tokens and returns that token
  /// without consuming any tokens.  LookAhead(0) returns 'Tok', LookAhead(1)
  /// returns the token after Tok, etc.
  ///
  /// Note that this differs from the Preprocessor's LookAhead method, because
  /// the Parser always has one token lexed that the preprocessor doesn't.
  ///
  const Token &GetLookAheadToken(unsigned N) {
    if (N == 0 || Tok.is(tok::eof)) return Tok;
    return PP.LookAhead(N-1);
  }
public:
  /// NextToken - This peeks ahead one token and returns it without
  /// consuming it.
  const Token &NextToken() {
    return PP.LookAhead(0);
}

Especially the comments for the Parser::GetLookAheadToken() function, and I see that the Tok member variable is very important, and it stands for the "peeked token", this is the cache. This value is the same thing as I can see from the wikipedia's code: Recursive_descent_parser's C implementation (https://en.wikipedia.org/wiki/Recursive_descent_parser#C_implementation)

But in the Preprocessor(see the PP variable), the peeked token is the one in the Token queue, but this is the one we returned as next token (see const Token &NextToken()  in the above code snippet).

Now, back to our build-in parser, I think we may need a cache variable which store the value from the "m_Tokenizer.GetToken()", while we call "m_Tokenizer.PeekToken()", this actually means the "Token &NextToken()" as in clang's Parser implementation.

EDIT: this tutorial also shows the same method(cached Tok), see here: 2. Kaleidoscope: Implementing a Parser and AST — LLVM 5 documentation (http://llvm.org/docs/tutorial/LangImpl02.html)