Author Topic: python pretty printer for Tokenizer class  (Read 8254 times)

Offline ollydbg

python pretty printer for Tokenizer class
« on: April 27, 2015, 11:50:04 am »
Hi, this is my draft of a gdb Python pretty printer for the Tokenizer class, see the code:
Code
import gdb
import re

class TokenizerPrinter(object):
    '''Print a Tokenizer object'''
    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Printing this first adds a line break, so the buffer and token lines align
        print ("The Tokenizer object")
        buffer = self.val['m_Buffer']
        length = int(self.val['m_BufferLen'])
        curr_index = int(self.val['m_TokenIndex'])
        undo_index = int(self.val['m_UndoTokenIndex'])
        peek_index = int(self.val['m_PeekTokenIndex'])
        curr = self.val['m_Token']
        peek = self.val['m_PeekToken']
        # raw character pointer behind the buffer string
        dataAsCharPointer = buffer['m_pchData']
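        # show up to 30 characters on each side of the current index; clamp at 0
        # so a negative start does not wrap around to the end of the buffer
        start = max(curr_index - 30, 0)
        end = curr_index + 30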
        #print (start)
        #print (end)
        #print (dataAsCharPointer)
        s = dataAsCharPointer.string()
        # Escape control characters (\r, \n, ...) so they stay visible on one line.
        # unicode_escape gives the same text on Python 2 and 3, while repr() of an
        # encoded string gains a b'' prefix on Python 3 and shifts the alignment.
        left = s[start:curr_index].encode("unicode_escape").decode("ascii")
        right = s[curr_index:end].encode("unicode_escape").decode("ascii")
        buffer_line = "Buffer: " + left + right
        print (str(buffer_line))
        t_left_space = " "* len(left)
        token = curr['m_pchData'].string()
        left_space = token.rjust(len(left), ' ')
        token_line =  "Token : " + left_space + "^"
        print (str(token_line))
        have_peek = self.val['m_PeekAvailable']
        if have_peek:
            # escaped text between the current token and the end of the peeked token
            p_left = s[curr_index:peek_index].encode("unicode_escape").decode("ascii")
            s_peek = peek['m_pchData'].string()
            #print (len(p_left))
            p_space_left = s_peek.rjust(len(p_left), ' ')
            #print (len(p_space_left))
            peek_line =  "Peek  : " + t_left_space + p_space_left + "^"
            print (peek_line)
        return None

    def display_hint (self):
        return 'string'

def lookup_function (val):
    "Look-up and return a pretty-printer that can print val."
    typename = val.type.tag
    if typename is None:
        return None
    regex = re.compile('^Tokenizer$')
    if regex.match(typename):
        return  TokenizerPrinter(val)
    return None

def register_my_printers (obj):
    if obj is None:
        obj = gdb
    obj.pretty_printers.append (lookup_function)

register_my_printers(None)

So, when debugging, you get the following gdb output on the command line:
Code
$35 = The Tokenizer object
Buffer: /Category.hh"\r\n\r\n\r\n    int abc;\r\n    int def;\r\n    int abc;\r
Token :                                  abc^
Peek  :                                     ;^

or

Code
$36 = The Tokenizer object
Buffer: hh"\r\n\r\n\r\n    int abc;\r\n    int def;\r\n    int abc;\r\n    int d
Token :                                    int^
Peek  :                                        def^

You get this output when you type "p *this" inside a Tokenizer member function, or "p m_Tokenizer" in an upper-layer class.
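
To make the alignment easier to follow, here is a rough sketch of the same idea in plain Python, with no gdb involved. It only takes ordinary strings and indices, so it can be tried in a normal interpreter. The names escape, render and context are made up for this illustration and are not part of the printer above; the indices are assumed to point just past the end of each token, which is how the carets are placed in the output.

```python
def escape(text):
    """Show control characters (CR, LF, tab) as visible backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


def render(buffer_text, token, token_index, peek=None, peek_index=None, context=30):
    """Build the Buffer/Token/Peek lines from plain Python values.

    token_index and peek_index are offsets just past the end of each token.
    """
    start = max(token_index - context, 0)  # clamp so a small index cannot wrap around
    left = escape(buffer_text[start:token_index])
    right = escape(buffer_text[token_index:token_index + context])

    # Right-align the token under the escaped left half, then mark the index
    lines = ["Buffer: " + left + right,
             "Token : " + token.rjust(len(left)) + "^"]
    if peek is not None:
        # Escaped text between the current index and the peek index
        between = escape(buffer_text[token_index:peek_index])
        lines.append("Peek  : " + " " * len(left) + peek.rjust(len(between)) + "^")
    return lines


if __name__ == "__main__":
    text = "int abc;\r\nint def;"
    # "abc" ends at offset 7, the peeked ";" ends at offset 8
    for line in render(text, "abc", 7, ";", 8):
        print(line)
```

The widths are measured on the escaped text, not the raw text, because "\r" takes two columns once it is printed as an escape. That is why each caret ends up under the right column of the Buffer line.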
The source code may be a bit hard to understand, so it needs some refactoring (such as renaming variables)  ;)
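
One possible step in that refactoring, only as a sketch: keep the type matching in a small table, and guard the gdb import so the lookup logic can be loaded and tested outside gdb. The PRINTERS list and the placeholder TokenizerPrinter below are assumptions for illustration; in the real script the placeholder would be the full printer class from the post.

```python
import re

try:
    import gdb  # only available when the script runs inside gdb
except ImportError:
    gdb = None  # lets the lookup logic be imported and tested outside gdb


class TokenizerPrinter(object):
    """Placeholder standing in for the real printer class."""
    def __init__(self, val):
        self.val = val


# (compiled pattern, printer class) pairs; add more rows for other types
PRINTERS = [
    (re.compile(r'^Tokenizer$'), TokenizerPrinter),
]


def lookup_function(val):
    """Return a printer for val, or None if no pattern matches its type tag."""
    typename = val.type.tag
    if typename is None:
        return None
    for pattern, printer in PRINTERS:
        if pattern.match(typename):
            return printer(val)
    return None


if gdb is not None:
    # Same registration as in the post, skipped when gdb is not present
    gdb.pretty_printers.append(lookup_function)
```

With this layout the patterns are compiled once instead of on every lookup, and a fake object with a type.tag attribute is enough to check the matching without starting a debug session.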
If some piece of memory should be reused, turn it into a variable (or a const variable).
If some operations should be reused, turn them into functions.
If they happen together, turn them into classes.