YAJII interpretation
Friday, September 25th, 2009
I got an itch to work on my code reading skills again, so I skimmed through my ancient structured programming book. It went over a technique for interpretation where you start with the smallest, least abstracted parts of the code and work your way up. I’m going to do this with the YAJII library. First I’m going to get all the class names.
Grepping through all the directories gives 62 distinct classes. That is too many to go through by hand, so let’s start on a Python script to help us.
import sys
import os

def main(directory):
    """Find all the java classes recursively in a directory."""
    # Recursively walk the directory structure, gathering all java files.
    # For each java file found, search through it line by line for class
    # definition headers.
    # Return the list of class names found.

if __name__ == '__main__':
    main(sys.argv[1])
So let’s start first with recursively walking the directories for Java files.
def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    # Return the absolute paths
Whipping up an implementation gives:
def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)
    # Return the absolute paths
    return files
Now, I’m relatively certain that is correct, so I’m going to be “clean room” today and not test it. Now we’re left with main:
def main(directory):
    """Find all the java classes recursively in a directory."""
    # For each java file found, search through it line by line for class
    # definition headers.
    # Return the list of class names found.
Let’s sketch out the searching of each file found.
    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        search_for_class(filename)
Very simple. Now let’s define search_for_class; this is how it looks after defining, implementing, and swapping back and forth between abstraction layers a few times:
def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)
    return classnames
Note that it’s a heuristic approach, so it won’t catch all valid Java declarations, but it should catch all declarations that aren’t too crazy. Now, this implementation clearly shows a mistake in our main function: we have to assign the result of search_for_classes to something!
def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []
    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))
    # Return the list of class names found.
    return classnames
Now, this looks pretty complete to me, so let’s try it out.
import sys
import os

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)
    # Return the absolute paths
    return files

def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)
    return classnames

def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []
    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))
    # Return the list of class names found.
    return classnames

if __name__ == '__main__':
    for classname in main(sys.argv[1]):
        print classname
Running it from the command line gives:
- A syntax error
- An import error
- A bad method call to match
- A bad regex character class
- A NoneType error because match will return None if nothing matched
- And no good output
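The three regex-related items in that list can be reproduced in isolation. What follows is a minimal sketch in Python 3 syntax (the script above is Python 2). The sample line is an assumption, made up to look like a typical Java class header, and re.search appears only for contrast with re.match.

```python
import re

line = "public class FAQ\n"  # assumed sample line, not taken from a real run

# 1. Bad method call: re.match needs the string to search as well as the pattern.
try:
    re.match(r"class\s(\w+)")
except TypeError as error:
    print("TypeError:", error)

# 2. Bad character class: 'a' sorts after 'Z', so a-Z is an invalid range.
try:
    re.compile(r"[a-Z_]")
except re.error as error:
    print("re.error:", error)

# 3. NoneType error: re.match only matches at the start of the string, so a
#    line that starts with "public" gives None, and None has no .group().
match_result = re.match(r"class\s+(\w+)", line)
search_result = re.search(r"class\s+(\w+)", line)
print(match_result)             # None
print(search_result.group(1))   # FAQ
```

The third case is the easy one to miss: even with a valid pattern and the right arguments, re.match returns None for any line where class is not the first word.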
So the clean room experiment was a failure. In what way? The regular expression was not described with enough intermediate steps.
The debugger tells us what went wrong with everything else: get_java_files works as expected, but the regex isn’t matching correctly. This implies that I need to extract the regex matching into its own function to allow for better testing and design.
def get_classname(line):
    """Extract class name from line with class declaration."""
    match = re.match(r'class\s*([a-zA-Z_]\w*)', line)
    if match:
        return match.group(1)
So, first step, let’s get this under test.
class TestGetClassname(unittest.TestCase):
    def test_get_classname(self):
        line = 'public class FAQ\n'
        classname = get_classname(line)
        self.assertEquals(classname, 'FAQ')
I ran it through the debugger to get actual sample data and ended up with this implementation:
def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].strip()
Now, this of course won’t work if the class extends anything. So I add the test:
    def test_get_classname_with_extends(self):
        line = 'public class JelpException extends Exception\n'
        classname = get_classname(line)
        self.assertEquals(classname, 'JelpException')
The following implementation passes both tests:
def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].split()[0].strip()
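A quick probe with inputs beyond the two tests shows where this heuristic gives up. This is a sketch in Python 3 syntax; the sample lines are made up for illustration and are not taken from the YAJII sources.

```python
def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].split()[0].strip()

# Hypothetical sample lines, chosen to poke at the edges of the heuristic.
samples = [
    'public class FAQ {\n',            # brace separated by a space: fine
    'public class FAQ{\n',             # brace attached: it ends up in the name
    '// this class is deprecated\n',   # comment: the next word is reported
    'public final class\n',            # name on the following line: IndexError
]

for sample in samples:
    try:
        print(repr(sample), '->', repr(get_classname(sample)))
    except IndexError:
        print(repr(sample), '-> IndexError (nothing after "class")')
```

The attached brace and the comment line both produce a wrong name silently, while a declaration split across lines fails loudly. That difference is worth keeping in mind when deciding which limitations deserve a NOTE comment.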
I’ve also noted all the limitations with NOTE comments in the source and created two files of examples from the YAJII project: one of legitimate class definitions and one of comments with the word class in them.
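Putting the pieces together, the whole script might look like the following sketch. It is written in Python 3 syntax rather than the Python 2 used above, closes each file with a with block, and skips lines where nothing follows the word class. Those three choices are assumptions of this sketch, not something taken from the finished source.

```python
import os
import sys


def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename))
    return files


def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].split()[0].strip()


def search_for_classes(filename):
    """Search file at filename line by line for class definition headers."""
    classnames = []
    # NOTE: This will miss classes that have a declaration on multiple lines.
    with open(filename) as java_file:
        for line in java_file:
            if 'class' in line:
                try:
                    classnames.append(get_classname(line))
                except IndexError:
                    # Assumption of this sketch: skip lines with nothing
                    # after the word "class".
                    continue
    return classnames


def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))
    return classnames


# Only run when a directory is given on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    for classname in main(sys.argv[1]):
        print(classname)
```

Because the line filter is still just a substring check, comments and string literals containing the word class will show up in the output alongside real declarations.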



