Author Topic: Python lexer  (Read 15151 times)

sethjackson

  • Guest
Python lexer
« on: January 17, 2006, 02:19:26 am »
I have made a Python lexer for C::B. I know zero Python, but mosfet asked whether scripting languages, and specifically Python, were going to be supported, so I wrote a little sample code and (most of) a lexer.
The sample code may be wrong (I peeked at the Python website). Some Python geek will probably find something
wrong with it. :( The forum link I was talking about is here.

http://forums.codeblocks.org/index.php?topic=1980.0

Download the patch from SF.net

http://sourceforge.net/tracker/index.php?func=detail&aid=1407815&group_id=126998&atid=707418

Constructive criticism is always welcome. :)

BTW no offense intended about the python geek part.  :D

Stevo

  • Guest
Re: Python lexer
« Reply #1 on: January 19, 2006, 07:10:13 am »
I've just tried your lexer.  Looks good.

My only comment: can you add "*SConstruct" and "*SConscript" to the filemasks?  These are Python files used by the SCons project (www.scons.org), which is a make replacement.  I've just switched to it from jam, and this lexer syntax-highlights these build scripts fine.
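
To see which file names such a mask list would pick up, here is a minimal sketch. It assumes the masks behave like ordinary shell-style globs (how C::B actually matches them is an assumption here), and `matches_python_lexer` is a made-up helper name, not anything from Code::Blocks.

```python
from fnmatch import fnmatchcase

# Mask list as it would appear in the lexer's filemasks attribute.
FILEMASKS = "*.py,*SConstruct,*SConscript"


def matches_python_lexer(filename, masks=FILEMASKS):
    """Return True if filename matches any of the comma-separated glob masks.

    Assumption: masks are plain shell-style globs, matched case-sensitively.
    """
    return any(fnmatchcase(filename, mask.strip()) for mask in masks.split(","))


for name in ("setup.py", "SConstruct", "src/SConscript", "Jamfile"):
    print(name, matches_python_lexer(name))
```

The leading `*` in `*SConstruct` is what lets the mask match the bare file name as well as a name with a path in front of it.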

Any chance this can get added to the repo as a standard lexer?  It looks as good as the others to me, and Python is a pretty popular language.

Stevo

Actually, I've attempted to enhance the lexer and sample Python file.  They are attached.  Unfortunately, my enhancements didn't do what I expected, and I'm a bit lost as to what to do.  I've added a bunch of definitions to keywords (builtins, etc.), added all the standard Python modules to user, and added the standard exceptions to documentation (so they can be highlighted differently from keywords).  The problem is I don't see these new sets of keywords when I copy these files into /share/codeblocks/lexers/.  If anyone can look at them and provide any advice, I'd appreciate it.

lexer_python.xml:
Code
<?xml version="1.0"?>
<!DOCTYPE CodeBlocks_lexer_properties>
<CodeBlocks_lexer_properties>
        <Lexer name="Python"
                index="2"
                filemasks="*.py,*SConstruct,*SConscript">
                <Style name="Default"
                        index="0"
                        fg="0,0,0"
                        bg="255,255,255"
                        bold="0"
                        italics="0"
                        underlined="0"/>
                <Style name="Comment"
                        index="1"
                        fg="160,160,160"/>
                <Style name="Number"
                        index="2"
                        fg="240,0,240"/>
                <Style name="String"
                        index="3"
                        fg="0,0,255"/>
                <Style name="Character"
                        index="4"
                        fg="224,160,0"/>
                <Style name="Keyword"
                        index="5"
                        fg="0,0,160"
                        bold="1"/>
                <Style name="Triple qutoes"
                        index="6"
                        fg="128,0,0"/>
                <Style name="Triple double quotes"
                        index="7"
                        fg="128,0,128"/>
                <Style name="Class name"
                        index="8"
                        fg="0,0,0"/>
                <Style name="Definition name"
                        index="9"
                        fg="0,160,0"
                        bold="1"/>
                <Style name="Operator"
                        index="10"
                        fg="255,0,0"/>
                <Style name="Identifier"
                        index="11"/>
                <Style name="Comment block"
                        index="12"
                        fg="128,128,255"
                        bold="1"/>
                <Style name="String EOL"
                        index="13"/>
                <Style name="User Keyword"
                        index="14"/>
                <Style name="Decorator"
                        index="15"/>
                <Keywords>
                        <Language index="0"
                                value="and assert break class continue def del elif else except
                                       exec finally for from global if import in is lambda None
                                       not or pass print raise return try while yield

                                       __import__ abs basestring bool callable chr classmethod
                                       cmp compile complex delattr dict dir divmod enumerate
                                       eval execfile file filter float frozenset getattr globals
                                       hasattr hash help hex id input int isinstance issubclass
                                       iter len list locals long map max min object oct open
                                       ord pow property range raw_input reduce reload repr
                                       reversed round set setattr slice sorted staticmethod
                                       str sum super tuple type type unichr unicode vars xrange
                                       zip

                                       apply buffer coerce intern

                                       __dict__ Ellipsis False True NotImplemented
                                       __class__ __bases__ __name__
                                      "/>
                        <User index="1"
                                value="sys gc weakref fpectl atexit types UserDict UserList UserString
                                                 operator inspect traceback linecache pickle cPickle copy_reg
                                                 shelve copy marshal warnings imp zipimport pkgutil modulefinder
                                                 code codeop pprint repr new site user __builtin__ __main__
                                                 __future__

                                                 string re struct difflib fpformat StringIO cStringIO textwrap
                                                 codecs encodings.idna unicodedata stringprep

                                                 pydoc doctest unittest test test.test_support decimal math
                                                 cmath random whrandom bisect collections heapq array sets
                                                 itertools ConfigParser fileinput calendar cmd shlex

                                       os os.path dircache stat statcache statvfs filecmp subprocess
                                       popen2 datetime time sched mutex getpass curses curses.textpad
                                       curses.wrapper curses.ascii curses.panel getopt optparse tempfile
                                       errno glob fnmatch shutil locale gettext logging platform

                                       signal socket select thread threading dummy_thread dummy_threading
                                       Queue mmap anydbm dbhash whichdb bsddb dumbdbm zlib gzip bz2
                                       zipfile tarfile readline rlcompleter

                                       posix pwd grp crypt dl dbm gdbm termios tty pty fcntl pipes
                                       posixfile resource nis syslog commands

                                       hotshot timeit

                                       webbrowser cgi cgitb urllib urllib2 httplib ftplib gopherlib
                                       poplib imaplib nntplib smtplib smtpd telnetlib urlparse
                                       SocketServer BaseHTTPServer SimpleHTTPServer CGIHTTPServer
                                       cookielib Cookie xmlrpclib SimpleXMLRPCServer DocXMLRPCServer
                                       asyncore asynchat

                                       formatter email email.Message email.Parser email.Generator
                                       email.Header email.Charset email.Encoders email.Errors
                                       email.Utils email.Iterators mailcap mailbox mhlib mimetools
                                       mimetypes MimeWriter mimify multifile rfc822 base64 binascii
                                       binhex quopri uu xdrlib netrc robotparser csv

                                       HTMLParser sgmllib htmllib htmlentitydefs xml.parsers.expat
                                       xml.dom xml.dom.minidom xml.dom.pulldom xml.sax
                                       xml.sax.handler xml.sax.saxutils xml.sax.xmlreader xmllib

                                       audioop imageop aifc sunau wave chunk colorsys rgbimg imghdr
                                       sndhdr ossaudiodev

                                       hmac md5 sha

                                       Tkinter Tix ScrolledText turtle

                                       parser symbol token keyword tokenize tabnanny pyclbr
                                       py_compile compileall dis pickletools distutils

                                      "/>
                        <Documentation index="2"
                                value="exception Exception StandardError ArithmeticError
                                       LookupError EnvironmentError AssertionError
                                       AttributeError EOFError FloatingPointError IOError
                                       ImportError IndexError KeyError KeyboardInterrupt
                                       MemoryError NameError NotImplementedError OSError
                                       OverflowError ReferenceError RuntimeError
                                       StopIteration SyntaxError SystemError SystemExit
                                       TypeError UnboundLocalError UnicodeError
                                       UnicodeEncodeError UnicodeDecodeError
                                       UnicodeTranslateError ValueError WindowsError
                                       ZeroDivisionError Warning UserWarning
                                       DeprecationWarning PendingDeprecationWarning
                                       SyntaxWarning RuntimeWarning FutureWarning
                                      "/>
                </Keywords>
                <SampleCode value="lexer_python.sample"/>
        </Lexer>
</CodeBlocks_lexer_properties>
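
Since the new keyword sets did not show up at first, a quick sanity check of the file outside the IDE can help. This is a minimal sketch (not part of the attachment), assuming Python 3 and the file layout shown above. The trimmed XML string and the helper names `keyword_sets` and `duplicates` are made up for illustration. It only confirms that the XML parses and what each list contains; it says nothing about how C::B loads the file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed stand-in for lexer_python.xml; the real file has far longer lists.
LEXER_XML = """<?xml version="1.0"?>
<!DOCTYPE CodeBlocks_lexer_properties>
<CodeBlocks_lexer_properties>
  <Lexer name="Python" index="2" filemasks="*.py,*SConstruct,*SConscript">
    <Keywords>
      <Language index="0" value="and assert str
                                 type type zip"/>
      <User index="1" value="sys os os.path"/>
      <Documentation index="2" value="Exception ValueError"/>
    </Keywords>
  </Lexer>
</CodeBlocks_lexer_properties>
"""


def keyword_sets(xml_text):
    """Map each keyword-set tag to (index, list of words)."""
    root = ET.fromstring(xml_text)
    sets = {}
    for node in root.find("Lexer/Keywords"):
        # split() copes with the newlines and indentation inside value="...".
        sets[node.tag] = (int(node.get("index")), node.get("value", "").split())
    return sets


def duplicates(words):
    """Return the words that occur more than once, sorted."""
    return sorted(w for w, n in Counter(words).items() if n > 1)


for tag, (index, words) in keyword_sets(LEXER_XML).items():
    print(tag, index, len(words), "duplicates:", duplicates(words))
```

Run against the posted Language list, a check like this would flag `type`, which is listed twice.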

lexer_python.sample
Code
# This is a comment
## This is a comment block

>>> "Hello World!"
>>> 'Test'
>>> 2 + 2
>>> '''Triple quotes!'''
>>> """Triple double quotes!"""

month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year

class ClassName:
  def perm(l):
  # Compute the list of all permutations of l
    if len(l) <= 1:
      return [l]
    r = []
    for i in range(len(l)):
      s = l[:i] + l[i+1:]
      p = perm(s)
      for x in p:
        r.append(l[i:i+1] + x)
    return r

  def func(self):
    return  'A string\n'
« Last Edit: January 19, 2006, 01:54:31 pm by Stevo »

Offline thomas

  • Administrator
  • Posts: 3979
Re: Python lexer
« Reply #2 on: January 19, 2006, 04:24:09 pm »
Quote
I've just tried your lexer.  Looks good.
[...]
I've added a bunch of definitions to keywords, for builtins, etc.  I've added all the standard Python modules
Are you a Python geek then? :)
I am asking because I have no clue regarding Python. All I can say is that your thingie does a lot of nice colours in the sample code. Looks good to me, so if somebody tells me that this is really good Python, I'll commit it... :)

Quote
The problem is I don't see these new sets of keywords when I copy these files into /share/codeblocks/lexers/.  If anyone can look at them and provide any advice, I'd appreciate it.
Have you tried clicking on "Reset Defaults" to force the editor to reload them? Otherwise, changes are not visible.

EDIT:
Found one insignificant typo, it says "qutoe" where it should be "quote". Corrected that in my copy, now just waiting for somebody to tell me if this is "good python".
« Last Edit: January 19, 2006, 04:30:22 pm by thomas »
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Stevo

  • Guest
Re: Python lexer
« Reply #3 on: January 20, 2006, 12:55:34 am »
Quote
Are you a Python geek then? :)

No, I'm an aspirant Python geek. I've only just started using Python, because I've started using SCons.  But so far, I'm not hating it.

Quote
I am asking because I have no clue regarding Python. All I can say is that your thingie does a lot of nice colours in the sample code. Looks good to me, so if somebody tells me that this is really good Python, I'll commit it... :)

I'm sure the example could be beefed up.  I have copied some of the examples from the Python tutorial and language reference at www.python.org. Below, I've included a beefier example, with code snippets taken from various places.  All of the syntactic elements (I believe) appear in it.  I got all of the names for the keywords, user keywords and documentation (which I'm using for exception keywords) from the documents on the Python site.

The decorator points out a minor bug (what I believe is a bug, anyway) in the underlying Scintilla lexer.  It highlights the comment following a decorator as part of the decorator. I think it should be a comment (i.e., the decorator should stop where the comment starts), but it isn't a big issue for me.

Quote
The problem is I don't see these new sets of keywords when I copy these files into /share/codeblocks/lexers/.  If anyone can look at them and provide any advice, I'd appreciate it.
Have you tried clicking on "Reset Defaults" to force the editor to reload them? Otherwise, changes are not visible.
That did the trick, thanks.

Quote
EDIT:
Found one insignificant typo, it says "qutoe" where it should be "quote". Corrected that in my copy, now just waiting for somebody to tell me if this is "good python".

As I said, I'm not a Python geek, so I'd be more than happy for anyone to second it, but I feel it is pretty good, based on my research.  Thanks for starting this, BTW.  There are still a couple of issues:

1. I think the following colour elements should be renamed:
String -> Double Quote String
Character -> Single Quote String
Triple Quotes -> Triple Single Quoted String
Triple Double Quotes -> Triple Double Quoted String

String and Character are the wrong names (I think): 'aaa' == "aaa", so both are strings and are interchangeable, and both should be listed as strings.  The change to the other two is just for consistency.

2. Documentation highlighting:
I don't know how to get the words in the documentation area to highlight.  There doesn't seem to be a lexer item for them, so if they can't be independently highlighted, I think they should be incorporated into Keywords.
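
Point 1 is easy to check in the interpreter. A minimal sketch, written for Python 3 (an assumption; the sample file in this thread uses Python 2 print statements), showing that all four quoting styles produce the same kind of value:

```python
# The four literal forms the lexer colours differently.
double = "aaa"
single = 'aaa'
triple_single = '''aaa'''
triple_double = """aaa"""

literals = [double, single, triple_single, triple_double]

# All four are the same type and compare equal; only the source spelling differs.
all_same_type = all(type(s) is str for s in literals)
all_equal = all(s == double for s in literals)
print(all_same_type, all_equal)
```

So the four styles differ only in how the literal is written, which is why naming them all "... String" seems more accurate than "String" versus "Character".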

Anyway, here is the revised example:
Code
# This is a comment
## This is a comment block

import sys, time, string

month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               "Juli",    "Augustus", "September",  # for the months
               "Oktober", "November", "December"]   # of the year

if len(sys.argv)!=2:
    print '''Usage: This is a 'Code::Blocks' Example'''
    print "This String goes to EOL
    sys.exit(0)

class ClassName:
  def perm(l):
  # Compute the list of all permutations of l
    if len(l) <= 1:
      return [l]
    r = []
    for i in range(len(l)):
      s = l[:i] + l[i+1:]
      p = perm(s)
      for x in p:
        r.append(l[i:i+1] + x)
    return r

@classmethod           # This is a decorator
@synchronized(lock)    # And so is this
def func(self):
  try:
    return  """A "Triple Double Quote" String\n"""
  except SystemExit:
    pass


sethjackson

  • Guest
Re: Python lexer
« Reply #4 on: January 20, 2006, 01:35:27 am »
Quote
1. I think the following colour elements should be renamed:
String -> Double Quote String
Character -> Single Quote String
Triple Quotes -> Triple Single Quoted String
Triple Double Quotes -> Triple Double Quoted String

String and Character are the wrong names (I think): 'aaa' == "aaa", so both are strings and are interchangeable, and both should be listed as strings.  The change to the other two is just for consistency.

2. Documentation highlighting:
I don't know how to get the words in the documentation area to highlight.  There doesn't seem to be a lexer item for them, so if they can't be independently highlighted, I think they should be incorporated into Keywords.


1. Rename them. :)
2. About the documentation keywords.

Code: xml
<Documentation index="2"
                  value=""/>

Put all the doc keywords in value="" (separate each item with a space).
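
If the word list is long, the space-separated value can be generated instead of typed by hand. A rough sketch, assuming Python 3 and that the "doc keywords" are the built-in exception names. The helper names `exception_names` and `as_value` are made up for illustration, and a modern interpreter will give a different list from the 2006 one posted in this thread.

```python
import builtins
import textwrap


def exception_names():
    """Names in builtins that are bound to exception classes, sorted."""
    return sorted(
        name
        for name, obj in vars(builtins).items()
        if isinstance(obj, type) and issubclass(obj, BaseException)
    )


def as_value(words, width=60):
    """Join words with single spaces, wrapped so the attribute stays readable."""
    return "\n".join(textwrap.wrap(" ".join(words), width=width))


# Print an element in the same shape as the snippet above.
print('<Documentation index="2"')
print('                  value="' + as_value(exception_names()) + '"/>')
```

Line breaks inside the attribute are fine as separators here, since the posted lexer files already wrap their lists the same way.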

EDIT:

This will help you understand how the lexers work. :)

http://forums.codeblocks.org/index.php?topic=519.0

Stevo

  • Guest
Re: Python lexer
« Reply #5 on: January 20, 2006, 03:27:52 am »
wxScintilla doesn't support documentation keywords for Python :(

I've located the bug in wxScintilla's handling of comments on decorators. Should I patch it and submit the patch here, or to wxScintilla directly?

Attached are fixed-up versions that I'm satisfied with for a Python lexer.

lexer_python.xml
Code
<?xml version="1.0"?>
<!DOCTYPE CodeBlocks_lexer_properties>
<CodeBlocks_lexer_properties>
        <Lexer name="Python"
                index="2"
                filemasks="*.py,*SConstruct,*SConscript">
                <Style name="Default"
                        index="0"
                        fg="0,0,0"
                        bg="255,255,255"
                        bold="0"
                        italics="0"
                        underlined="0"/>
                <Style name="Comment"
                        index="1"
                        fg="160,160,160"/>
                <Style name="Number"
                        index="2"
                        fg="240,0,240"/>
                <Style name="Double Quote String"
                        index="3"
                        fg="0,0,255"/>
                <Style name="Single Quote String"
                        index="4"
                        fg="224,160,0"/>
                <Style name="Keyword"
                        index="5"
                        fg="0,0,160"
                        bold="1"/>
                <Style name="Triple Single Quote String"
                        index="6"
                        fg="128,0,0"/>
                <Style name="Triple Double Quote String"
                        index="7"
                        fg="128,0,128"/>
                <Style name="Class name"
                        index="8"
                        fg="0,0,0"/>
                <Style name="Definition name"
                        index="9"
                        fg="0,160,0"
                        bold="1"/>
                <Style name="Operator"
                        index="10"
                        fg="255,0,0"/>
                <Style name="Identifier"
                        index="11"/>
                <Style name="Comment block"
                        index="12"
                        fg="128,128,255"
                        bold="1"/>
                <Style name="String EOL"
                        index="13"/>
                <Style name="User Keyword"
                        index="14"/>
                <Style name="Decorator"
                        index="15"/>
                <Keywords>
                        <Language index="0"
                                value="and assert break class continue def del elif else except
                                       exec finally for from global if import in is lambda None
                                       not or pass print raise return try while yield

                                       __import__ abs basestring bool callable chr classmethod
                                       cmp compile complex delattr dict dir divmod enumerate
                                       eval execfile file filter float frozenset getattr globals
                                       hasattr hash help hex id input int isinstance issubclass
                                       iter len list locals long map max min object oct open
                                       ord pow property range raw_input reduce reload repr
                                       reversed round set setattr slice sorted staticmethod
                                       str sum super tuple type type unichr unicode vars xrange
                                       zip

                                       apply buffer coerce intern

                                       __dict__ Ellipsis False True NotImplemented
                                       __class__ __bases__ __name__

                                       exception Exception StandardError ArithmeticError
                                       LookupError EnvironmentError AssertionError
                                       AttributeError EOFError FloatingPointError IOError
                                       ImportError IndexError KeyError KeyboardInterrupt
                                       MemoryError NameError NotImplementedError OSError
                                       OverflowError ReferenceError RuntimeError
                                       StopIteration SyntaxError SystemError SystemExit
                                       TypeError UnboundLocalError UnicodeError
                                       UnicodeEncodeError UnicodeDecodeError
                                       UnicodeTranslateError ValueError WindowsError
                                       ZeroDivisionError Warning UserWarning
                                       DeprecationWarning PendingDeprecationWarning
                                       SyntaxWarning RuntimeWarning FutureWarning
                                      "/>
                        <User index="1"
                                value="sys gc weakref fpectl atexit types UserDict UserList UserString
                                    operator inspect traceback linecache pickle cPickle copy_reg
                                    shelve copy marshal warnings imp zipimport pkgutil modulefinder
                                    code codeop pprint repr new site user __builtin__ __main__
                                    __future__

                                    string re struct difflib fpformat StringIO cStringIO textwrap
                                    codecs encodings.idna unicodedata stringprep

                                    pydoc doctest unittest test test.test_support decimal math
                                    cmath random whrandom bisect collections heapq array sets
                                    itertools ConfigParser fileinput calendar cmd shlex

                                       os os.path dircache stat statcache statvfs filecmp subprocess
                                       popen2 datetime time sched mutex getpass curses curses.textpad
                                       curses.wrapper curses.ascii curses.panel getopt optparse tempfile
                                       errno glob fnmatch shutil locale gettext logging platform

                                       signal socket select thread threading dummy_thread dummy_threading
                                       Queue mmap anydbm dbhash whichdb bsddb dumbdbm zlib gzip bz2
                                       zipfile tarfile readline rlcompleter

                                       posix pwd grp crypt dl dbm gdbm termios tty pty fcntl pipes
                                       posixfile resource nis syslog commands

                                       hotshot timeit

                                       webbrowser cgi cgitb urllib urllib2 httplib ftplib gopherlib
                                       poplib imaplib nntplib smtplib smtpd telnetlib urlparse
                                       SocketServer BaseHTTPServer SimpleHTTPServer CGIHTTPServer
                                       cookielib Cookie xmlrpclib SimpleXMLRPCServer DocXMLRPCServer
                                       asyncore asynchat

                                       formatter email email.Message email.Parser email.Generator
                                       email.Header email.Charset email.Encoders email.Errors
                                       email.Utils email.Iterators mailcap mailbox mhlib mimetools
                                       mimetypes MimeWriter mimify multifile rfc822 base64 binascii
                                       binhex quopri uu xdrlib netrc robotparser csv

                                       HTMLParser sgmllib htmllib htmlentitydefs xml.parsers.expat
                                       xml.dom xml.dom.minidom xml.dom.pulldom xml.sax
                                       xml.sax.handler xml.sax.saxutils xml.sax.xmlreader xmllib

                                       audioop imageop aifc sunau wave chunk colorsys rgbimg imghdr
                                       sndhdr ossaudiodev

                                       hmac md5 sha

                                       Tkinter Tix ScrolledText turtle

                                       parser symbol token keyword tokenize tabnanny pyclbr
                                       py_compile compileall dis pickletools distutils

                                      "/>
                        <Documentation index="2"
                                value=""/>
                </Keywords>
                <SampleCode value="lexer_python.sample"/>
        </Lexer>
</CodeBlocks_lexer_properties>

and lexer_python.sample
Code
# This is a comment
## This is a comment block

import sys, time, string

month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               "Juli",    "Augustus", "September",  # for the months
               "Oktober", "November", "December"]   # of the year

if len(sys.argv)!=2:
    print '''Usage: This is a 'Code::Blocks' Example'''
    print "This String goes to EOL
    sys.exit(0)

class ClassName:
  def perm(l):
  # Compute the list of all permutations of l
    if len(l) <= 1:
      return [l]
    r = []
    for i in range(len(l)):
      s = l[:i] + l[i+1:]
      p = perm(s)
      for x in p:
        r.append(l[i:i+1] + x)
    return r

@classmethod           # This is a decorator
@synchronized(lock)    # And so is this
def func(self):
  try:
    return  """A "Triple Double Quote" String\n"""
  except SystemExit:
    pass



Offline mandrav

  • Project Leader
  • Administrator
  • Posts: 4315
    • Code::Blocks IDE
Re: Python lexer
« Reply #6 on: January 20, 2006, 08:47:29 am »
Quote
wxScintilla doesn't support documentation keywords for Python :(

I've located the bug in wxScintilla's handling of comments on decorators. Should I patch it and submit the patch here, or to wxScintilla directly?

If it is a wxScintilla issue and not a Scintilla one, then:
Post it in our patch tracker so we can use it immediately, and then post it at wxScintilla's patch tracker. I'm sure Otto will apply it for the next version.

If, on the other hand, it is a Scintilla issue, post it there directly.
Be patient!
This bug will be fixed soon...

sethjackson

  • Guest
Re: Python lexer
« Reply #7 on: January 20, 2006, 05:42:19 pm »
Hey, this patch was accepted.  8) So could the two files (lexer_python.xml, lexer_python.sample) be added to the .cbp projects now?   :wink:

sethjackson

  • Guest
Re: Python lexer
« Reply #8 on: January 20, 2006, 08:17:16 pm »
Here is a patch for the .cbp project. :)

[attachment deleted by admin]

sethjackson

  • Guest
Re: Python lexer
« Reply #9 on: January 21, 2006, 03:07:59 pm »
@Stevo, are you sure it is a Scintilla bug? Those decorators (in the sample) aren't highlighted because you did not specify a color for them. :) I have a patch that does.

http://sourceforge.net/tracker/index.php?func=detail&aid=1411504&group_id=126998&atid=707418




Stevo

  • Guest
Re: Python lexer
« Reply #10 on: January 22, 2006, 08:12:46 am »
You are, however, correct that I left the default highlight color for decorators as the DEFAULT text color, but that wasn't the problem I was describing.

The problem I was talking about is:

@classmethod           # This is a decorator

The decorator color highlights the entire line up to the EOL.  It should stop at the # that starts a comment, or at EOL, whichever comes first.  The lexer in Scintilla doesn't end the decorator state until EOL; it should also check for a comment and change state.

The code to fix is pretty straightforward; I just haven't had a chance to implement and test it yet.  I actually think it's an underlying Scintilla problem, but I haven't tracked it back to see where it originates yet.
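
The intended behaviour can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Scintilla's actual lexer code (which is C++); `style_line` and the style names are made up for the example.

```python
def style_line(line):
    """Split one source line into (style, text) runs.

    Only the decorator/comment case is handled; every other line is
    returned as a single 'default' run.
    """
    if not line.lstrip().startswith("@"):
        return [("default", line)]

    # The decorator run ends at '#' or at end of line, whichever comes first.
    # Simplification: a '#' inside a string argument would also end it here.
    hash_pos = line.find("#")
    if hash_pos == -1:
        return [("decorator", line)]
    return [("decorator", line[:hash_pos]), ("comment", line[hash_pos:])]


print(style_line("@classmethod           # This is a decorator"))
```

The behaviour being reported corresponds to skipping the `#` check, so the whole line comes back as one decorator run.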

Stevo

sethjackson

  • Guest
Re: Python lexer
« Reply #11 on: January 22, 2006, 03:29:53 pm »
OK, check out the String EOL style in the Python lexer; maybe that will help you. I'm just shooting in the dark, but it seems to me that if it were a bug it would have been found and fixed long ago.

Stevo

  • Guest
Re: Python lexer
« Reply #12 on: January 23, 2006, 03:22:19 am »
Here is a patch to fix the Python lexer in Scintilla, so that decorators end at a comment as well as at the end of the line.

It is a bug. It probably hasn't been found before because it's a pretty minor issue and it doesn't hurt anyone; it's also probably unusual to put comments on decorators.

It looks like the person who added decorator highlighting just copied the logic for comment highlighting, but forgot the case where a decorator is terminated by a comment.

Since it is a bug in Scintilla, the attached patch is also being sent to the wxScintilla and Scintilla projects for completeness.

Stevo

[attachment deleted by admin]

Offline mandrav

  • Project Leader
  • Administrator
  • Posts: 4315
    • Code::Blocks IDE
Re: Python lexer
« Reply #13 on: January 23, 2006, 08:48:03 am »
Stevo, thanks for the patch. Please submit it to our patch tracker, not here...

Stevo

  • Guest
Re: Python lexer
« Reply #14 on: January 24, 2006, 02:51:30 am »
Quote
Stevo, thanks for the patch. Please submit it to our patch tracker, not here...

Done.