Archive for September, 2009

Yajii interpretation

Friday, September 25th, 2009

I got an itch to work on my code reading skills again, so I skimmed through my ancient structured programming book. It covered a technique for interpretation where you start with the smallest, least abstracted parts of the code and work your way up. I’m going to do this with the YAJII library. First, I’m going to get all the class names.

Grepping through all the directories gives 62 distinct classes. That is too many to go through by hand, so let’s start on a Python script to help us.

import sys
import os

def main(directory):
    """Find all the java classes recursively in a directory."""

    # Recursively walk the directory structure, gathering all java files.

    # For each java file found, search through it line by line for class
    # definition headers.

    # Return the list of class names found.

if __name__ == '__main__':
    main(sys.argv[1])

So let’s start first with recursively walking the directories for Java files.

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.

    # Return the absolute paths

Whipping up an implementation gives:

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)

    # Return the absolute paths
    return files

Now, I’m relatively certain that is correct, so I’m going to be “clean room” today and not test it. Now we’re left with main:

def main(directory):
    """Find all the java classes recursively in a directory."""

    # For each java file found, search through it line by line for class
    # definition headers.

    # Return the list of class names found.

Let’s sketch out the searching of each file found.

# For each java file found, search through it line by line for class
# definition headers.
for filename in get_java_files(directory):
    search_for_class(filename)

Very simple. Now let’s define search_for_class (which became search_for_classes along the way). This is how it looks after defining, implementing, and swapping back and forth between abstraction layers a few times:

def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)

    return classnames

Note that it’s a heuristic approach, so it won’t catch all valid Java declarations, but it should catch all declarations that aren’t too crazy. Now, this implementation clearly shows a mistake in our main function: we have to assign the result of search_for_classes to something!

def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []

    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))

    # Return the list of class names found.
    return classnames

Now, this looks pretty complete to me, so let’s try it out.

import sys
import os

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)

    # Return the absolute paths
    return files

def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)

    return classnames

def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []

    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))

    # Return the list of class names found.
    return classnames

if __name__ == '__main__':
    for classname in main(sys.argv[1]):
        print classname

Running it from the command line gives:

  • A syntax error
  • An import error
  • A bad method call to match
  • A bad regex character class
  • A NoneType error because match will return None if nothing matched
  • And no good output
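For reference, here is a minimal sketch of how each of those errors could be repaired, written for Python 3. It is an assumption about what a working version might look like, not the script as it eventually ended up: the CLASS_RE pattern and the choice of re.search are mine, and the pattern is still a heuristic that will also match the word "class" followed by an identifier inside a comment.

```python
import os
import re  # Import fix: the listing used re without importing it.
import sys

# Character class fix: [a-Z_] is an invalid range, [A-Za-z_] is what was meant.
# This pattern is a hypothetical stand-in, not the one from the original script.
CLASS_RE = re.compile(r'\bclass\s+([A-Za-z_]\w*)')


def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                # Syntax fix: the closing parenthesis of append() was missing.
                files.append(os.path.join(dirpath, filename))
    return files


def search_for_classes(filename):
    """Return the names of classes declared on a single line in the file."""
    classnames = []
    with open(filename) as source:
        for line in source:
            # Method call fix: the line has to be passed to the regex.
            # search() is used because match() only looks at the line start.
            match = CLASS_RE.search(line)
            # NoneType fix: only call group() when something matched.
            if match:
                classnames.append(match.group(1))
    return classnames


def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))
    return classnames


if __name__ == '__main__' and len(sys.argv) > 1:
    for classname in main(sys.argv[1]):
        print(classname)
```

Each fix is marked with a comment naming the error from the list it addresses.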

So the clean room experiment was a failure. In what way? The regular expression was not described with enough intermediate steps.

The debugger will tell us what went wrong with everything else: get_java_files works as expected, but the regex isn’t matching correctly. This implies that I need to extract the regex matching part to allow for better testing and design.

def get_classname(line):
    """Extract class name from line with class declaration."""
    match = re.match(r'class\s*([a-zA-Z_]\w*)', line)
    if match:
        return match.group(1)

So, first step, let’s get this under test.

class TestGetClassname(unittest.TestCase):
    def test_get_classname(self):
        line = 'public class FAQ\n'
        classname = get_classname(line)
        self.assertEqual(classname, 'FAQ')
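One detail worth spelling out about the regex version: re.match only matches at the start of the string, so a line that begins with "public" yields None even though it contains a class declaration. The sketch below shows the difference. The function name get_classname_regex and the switch to re.search are my own assumptions for illustration, not the route this post takes.

```python
import re

line = 'public class FAQ\n'
pattern = r'class\s*([a-zA-Z_]\w*)'

# re.match is anchored at position 0, where this line has "public".
anchored = re.match(pattern, line)

# re.search scans the whole line for the first place the pattern fits.
scanned = re.search(pattern, line)


def get_classname_regex(line):
    """Hypothetical regex variant of get_classname that scans the whole line."""
    # \b and \s+ keep identifiers such as "classname" from matching.
    match = re.search(r'\bclass\s+([a-zA-Z_]\w*)', line)
    if match:
        return match.group(1)
    return None


print(anchored)                   # None
print(scanned.group(1))           # FAQ
print(get_classname_regex(line))  # FAQ
```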

I ran it through the debugger to get actual sample data and ended up with this implementation:

def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].strip()

Now, this of course won’t work if the class extends anything. So I add the test:

    def test_get_classname_with_extends(self):
        line = 'public class JelpException extends Exception\n'
        classname = get_classname(line)
        self.assertEqual(classname, 'JelpException')

The following implementation passes both tests:

def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].split()[0].strip()

I’ve also noted all the limitations with NOTE comments in the source and created two files of examples taken from the YAJII project: legitimate class definitions, and comments with the word “class” in them.
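To make those limitations concrete, here is a small sketch that runs the partition-based get_classname over a few lines. The sample lines are made up for illustration; they are not taken from the YAJII example files.

```python
def get_classname(line):
    """The partition-based implementation from above, repeated for the demo."""
    return line.partition('class')[2].split()[0].strip()


# Hypothetical sample lines, not copied from YAJII.
print(get_classname('public class FAQ {\n'))          # FAQ
print(get_classname('public class FAQ{\n'))           # FAQ{
print(get_classname('// This class parses JSON\n'))   # parses

# Nothing follows the word "class" here, so split() returns an empty list.
try:
    get_classname('/** Utility class\n')
except IndexError:
    print('IndexError: nothing after "class"')
```

A brace directly after the name leaks into the result, a comment produces a false positive, and a line ending in "class" raises, which is why the caller needs to filter lines before handing them over.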

DSL in Python

Sunday, September 6th, 2009

Fernando Meyer has made a nice DSL in Python. It’ll be interesting to see where it heads. He modifies the source file using Python’s tokenize module instead of the AST. He hasn’t implemented import hooks to bootstrap it yet, but that should be easy to do.

This is another technique that could be used to implement the AST magic I posted about previously.
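As a rough illustration of the token-rewriting idea, here is a minimal sketch in Python 3. It is not Fernando Meyer’s code: the unless keyword, the translate_unless function, and the sample source are all hypothetical, chosen only to show how the standard tokenize module can rewrite source text before it is executed.

```python
import io
import tokenize


def translate_unless(source):
    """Rewrite the hypothetical keyword `unless` as `if not`, token by token."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == 'unless':
            # Replace the one DSL token with two ordinary Python tokens.
            result.append((tokenize.NAME, 'if'))
            result.append((tokenize.NAME, 'not'))
        else:
            result.append((tok.type, tok.string))
    # Given (type, string) pairs, untokenize regenerates valid source,
    # although its spacing may differ from the input.
    return tokenize.untokenize(result)


dsl_source = "unless done:\n    log.append('working')\n"
namespace = {'done': False, 'log': []}
exec(translate_unless(dsl_source), namespace)
print(namespace['log'])  # ['working']
```

This works because tokenizing is purely lexical: the tokenizer does not care that unless is not a Python keyword, whereas parsing the same text into an AST would fail. The sketch ignores operator precedence, so a condition such as "a and b" would need parentheses added around it.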