Redundancy, information content and Basic English

November 1st, 2009

Basic English increases semantic redundancy; I did not know this. Maybe that’s why it’s easier to understand writing that has a constrained vocabulary.

This I learned from Shannon’s paper.
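Shannon defines redundancy as one minus the relative entropy, that is, one minus the ratio of a source's entropy to the maximum entropy it could have with the same symbols. Here is a minimal Python sketch of a first-order estimate. It treats symbols as independent, so it ignores the longer-range structure that a constrained vocabulary adds; the function names and the alphabet_size parameter are my own, not from the paper.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """First-order entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(symbols, alphabet_size=None):
    """One minus the relative entropy of the sequence.

    alphabet_size is the number of symbols the source could have used;
    by default it is the number of distinct symbols actually seen.
    """
    symbols = list(symbols)
    size = alphabet_size if alphabet_size is not None else len(set(symbols))
    if size < 2:
        raise ValueError("need at least two possible symbols")
    return 1.0 - entropy_bits(symbols) / math.log2(size)
```

A sequence that uses its symbols evenly, such as "abab", has zero redundancy by this measure, while a skewed one such as "aaab" scores higher.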

Reading Code As Literature

October 25th, 2009

After reading a few pages on Code Reading I’ve decided to start reading TinyPy as literature. That is, just read to understand and for enjoyment. Hopefully this will boost my code reading ability.

It also gives me a chance to play with CScope and Splint. Currently the TinyPy code doesn’t compile with gcc -ansi -pedantic; eventually I’ll change that and get it to pass through splint with no errors.

I continue to be impressed with cscope’s interface and ability. I wish there were a similar tool for python. Huh, it looks like cscope may be able to do at least some stuff for python (being a dumb “grep” without the ability to understand variables or functions).

I also eventually want to try:

  • BLAST (Berkeley Lazy Abstraction Software verification Tool) — a software model checker for C programs based on lazy abstraction.
  • Frama-C — A static analysis framework for C.
  • Uno — A tool designed to find most common type of programming errors without generating too much output.

Projects to test pythoscope on

October 20th, 2009

Just chatted in #python to get some ideas of what to try pythoscope on.

I guess I’ll start with pydoc first since it probably could be accepted as a patch to python.

Issues found while initializing:

  • Pythoscope needs the ability to exclude directories when initializing.
  • Pythoscope needs to be optimized during initialization — memory use grows to over 800 MB looking at the python standard library. At which point my poor netbook starts to page like mad :(
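As a rough illustration of the first issue, here is how directory exclusion is commonly done with os.walk. This is a sketch only and not Pythoscope's code: the function name and the default excluded names are hypothetical. Yielding paths one at a time instead of building a list also keeps the walk itself cheap on memory, though it says nothing about where Pythoscope's 800 MB actually goes.

```python
import os


def walk_python_files(root, excluded=("tests", ".git")):
    """Yield paths of .py files under root, skipping excluded directory names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for filename in filenames:
            if filename.endswith(".py"):
                yield os.path.join(dirpath, filename)
```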

Testing, active reading and code reading

October 17th, 2009

After reading Cem Kaner’s and James Bach’s slides on exploratory testing I noticed a technique I didn’t know about before: active reading, which can be distilled into SQ3R. Basically, when you have a document to read, start by skimming the headings and topic sentences, determine questions to ask, read the document to answer the questions, distill the information into a form you can recall, and review it.

This fits into spaced repetition quite nicely. After the information is distilled for recall, input it into a spaced repetition system for review.

The emphasis in the exploratory testing slides is to use this technique to rapidly gain information from documentation such as user manuals, design specs, etc. This is important, but I wonder if it can be used in code reading.

For example, for the YAJII project it might be broken into steps like this:

  • Scan the names of public classes
  • Ask questions about functionality
    • What is the public API and how is it used?
    • What are the areas that are highest risk?
    • What are the inflection points where unit tests can easily be added?
  • Read the code in detail while trying to answer these questions.
  • Write unit tests and additional documentation to record the answers found.
  • Reread the new documentation and unit tests, verifying against the code.

I’m going to try to do this in detail for my book on structured programming. Even though I’ve read through it once, it’s very dense and some parts are hard to apply to modern programming languages. Some questions that I’ll want to answer (probably using my type system for objects book) are:

  • What are the terminologies used in structured programming and how do they relate to modern languages and object oriented programming?
  • What techniques do they recommend for reading structured programs that can be applied to modern languages?
  • What techniques need to be adapted?
  • Are there techniques for unstructured program reading and modification that can be applied to modern languages?

YAJII Static Analysis

October 10th, 2009

Ok, so I’ve got YAJII to compile and have the javadoc to help with browsing, so now it’s time to do the static analysis.

I’ll start by doing a callgraph of the Main class in the testbed package. This seems to have been used for doing functional testing by the author, so it’s a good place to start.

This leads me to the DefaultIRCPeer class. The most interesting method there is the sendMessage method, which interacts with the private class MessageEventDispatcher.

I’ve been able to comment out the methods in this class, but it uses fireMessageReceivedEvent and fireMessageSentEvent from the IRCPeer class.

The fireMessage*Event methods call the EventDispatcher.fireEvent methods for the receive and send dispatchers. EventDispatcher.fireEvent simply delegates to an EventHandler class that then delegates to the listeners.
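To keep the delegation chain straight in my head, here is a small Python stand-in for it. YAJII is Java and this is not its API: only the class names and the fireEvent idea come from the code I read, and everything else (constructor arguments, add_listener, handle, the attribute names) is a hypothetical simplification.

```python
class EventHandler:
    """Holds the listeners and passes each event on to them."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def handle(self, event):
        for listener in self._listeners:
            listener(event)


class EventDispatcher:
    """Does no work of its own: fire_event simply delegates to the handler."""

    def __init__(self, handler):
        self._handler = handler

    def fire_event(self, event):
        self._handler.handle(event)


class IRCPeer:
    """Owns one dispatcher for received messages and one for sent messages."""

    def __init__(self):
        self.received_handler = EventHandler()
        self.sent_handler = EventHandler()
        self._receive_dispatcher = EventDispatcher(self.received_handler)
        self._send_dispatcher = EventDispatcher(self.sent_handler)

    def fire_message_received_event(self, message):
        self._receive_dispatcher.fire_event(message)

    def fire_message_sent_event(self, message):
        self._send_dispatcher.fire_event(message)
```

Registering a listener on sent_handler and calling fire_message_sent_event shows the message passing through all three layers while the receive side stays untouched.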

And now I’m going to switch gears and try to find a test server to use…

Nomic in Normalized English

October 8th, 2009

New thing I want to try: Convert the Nomic core rules into normalized English.

YAJII interpretation

September 25th, 2009

I got an itch to work on my code reading skills again, so I skimmed through my ancient structured programming book. It went over a technique for interpretation where you start with the smallest, least abstracted parts of code and work your way up. I’m going to do this with the YAJII library. First I’m going to get all the class names.

Grepping through all the directories gives 62 distinct classes. This is too big to go through by hand, so let’s start on a Python script to help us.

import sys
import os

def main(directory):
    """Find all the java classes recursively in a directory."""

    # Recursively walk the directory structure, gathering all java files.

    # For each java file found, search through it line by line for class
    # definition headers.

    # Return the list of class names found.

if __name__ == '__main__':
    main(sys.argv[1])

So let’s start first with recursively walking the directories for Java files.

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.

    # Return the absolute paths

Whipping up an implementation gives:

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)

    #Return the absolute paths
    return files

Now, I’m relatively certain that is correct, so I’m going to be “clean room” today and not test it. Now we’re left with main:

def main(directory):
    """Find all the java classes recursively in a directory."""

    # For each java file found, search through it line by line for class
    # definition headers.

    # Return the list of class names found.

Let’s sketch out the searching of each file found.

# For each java file found, search through it line by line for class
# definition headers.
for filename in get_java_files(directory):
    search_for_class(filename)

Very simple. Now let’s define search_for_class. This is how it looks after defining, implementing and swapping back and forth between abstraction layers a few times (along the way it was renamed search_for_classes):

def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)

    return classnames

Note that it’s a heuristic approach, so it won’t catch all valid Java declarations, but it should catch all declarations that aren’t too crazy. Now, this implementation clearly shows a mistake in our main function. We have to assign the result of search_for_classes to something!

def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []

    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))

    # Return the list of class names found.
    return classnames

Now, this looks pretty complete to me, so let’s try it out.

import sys
import os

def get_java_files(directory):
    """Recursively walk the directory structure, gathering all java files."""
    files = []
    # Walk the directory structure
    # for each java filename found, gather up the absolute path.
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                files.append(os.path.join(dirpath, filename)

    # Return the absolute paths
    return files

def search_for_classes(filename):
    """Search file at filename line by line for class definition headers and
    return the names of classes found.
    """
    classnames = []
    # Search each line for a class declaration.
    # NOTE: This will miss classes that have a declaration on multiple lines.
    for line in open(filename):
        # If the line has a class declaration, extract the class name.
        if 'class' in line:
            classname = re.match(r'class\s([a-Z_][a-Z_]*)').group(1)
            classnames.append(classname)

    return classnames

def main(directory):
    """Find all the java classes recursively in a directory."""
    classnames = []

    # For each java file found, search through it line by line for class
    # definition headers.
    for filename in get_java_files(directory):
        classnames.extend(search_for_classes(filename))

    # Return the list of class names found.
    return classnames

if __name__ == '__main__':
    for classname in main(sys.argv[1]):
        print classname

Running it from the command line gives:

  • A syntax error
  • An import error
  • A bad method call to match
  • A bad regex character class
  • A NoneType error because match will return None if nothing matched
  • And no good output

So, the clean room experiment is a failure. In what way? The regular expression was not worked out with enough intermediate steps.

The debugger will tell us what went wrong with everything else: get_java_files works as expected, but the regex isn’t matching correctly. This implies that I need to extract the regex matching part out to allow for better testing and design.
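Two of the regex problems from the list above can be reproduced in isolation. This is a sketch in current Python 3 rather than the Python 2 used for the script, and the variable names are only for illustration.

```python
import re

# 'a-Z' is a bad character range: 'a' (code 97) sorts after 'Z' (code 90),
# so the pattern fails to compile at all.
try:
    re.compile(r'class\s([a-Z_][a-Z_]*)')
    range_error = None
except re.error as exc:
    range_error = exc

pattern = re.compile(r'class\s+([a-zA-Z_]\w*)')
line = 'public class FAQ\n'

# re.match only matches at the start of the string, so a leading modifier
# such as 'public' makes it return None.
anchored = pattern.match(line)

# re.search scans the whole line and finds the declaration.
found = pattern.search(line)
```

Calling .group(1) on anchored is what produces the NoneType error, while found.group(1) yields the class name.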

def get_classname(line):
    """Extract class name from line with class declaration."""
    match = re.match(r'class\s*([a-zA-Z_]\w*)', line)
    if match:
        return match.group(1)

So, first step, let’s get this under test.

class TestGetClassname(unittest.TestCase):
    def test_get_classname(self):
        line = 'public class FAQ\n'
        classname = get_classname(line)
        self.assertEquals(classname, 'FAQ')

I ran it through the debugger to get actual sample data and ended up with this implementation:

def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].strip()

Now, this of course won’t work if the class extends anything. So I add the test:

    def test_get_classname_with_extends(self):
        line = 'public class JelpException extends Exception\n'
        classname = get_classname(line)
        self.assertEquals(classname, 'JelpException')

The following implementation passes both tests:

def get_classname(line):
    """Extract class name from line with class declaration."""
    return line.partition('class')[2].split()[0].strip()

I’ve also noted all the limitations with NOTE comments in the source and created two files of examples from the YAJII project of legitimate class definitions and comments with the word class in them.
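For comparison, here is how the pieces might fit together in current Python 3. This is a sketch of one possible cleanup, not the script as I left it: it swaps the partition approach for re.search with a word boundary, the comment check is a crude assumption about how comment lines start, and find_classes is a hypothetical name. It will still be fooled by string literals and by declarations split across lines.

```python
import os
import re

# Heuristic: the keyword `class`, whitespace, then a Java identifier.
CLASS_RE = re.compile(r'\bclass\s+([A-Za-z_]\w*)')


def get_java_files(directory):
    """Recursively walk the directory structure, yielding paths of java files."""
    for dirpath, _, filenames in os.walk(directory):
        for filename in filenames:
            if filename.endswith('.java'):
                yield os.path.join(dirpath, filename)


def get_classname(line):
    """Return the class name declared on line, or None if there is none."""
    stripped = line.strip()
    # NOTE: crude comment filter; assumes comment lines start with //, /* or *.
    if stripped.startswith(('//', '/*', '*')):
        return None
    match = CLASS_RE.search(stripped)
    return match.group(1) if match else None


def find_classes(directory):
    """Return the class names found in all java files under directory."""
    classnames = []
    for path in get_java_files(directory):
        with open(path, encoding='utf-8', errors='replace') as source:
            for line in source:
                classname = get_classname(line)
                if classname:
                    classnames.append(classname)
    return classnames
```

Calling find_classes on the root of a checkout returns the names in the order the files are walked; wrapping the result in sorted(set(...)) would give the distinct classes.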

DSL in Python

September 6th, 2009

Fernando Meyer has made a nice DSL in python. It’ll be interesting to see where it heads. He just modifies the file using python’s tokenize library instead of AST. He hasn’t implemented import hooks to bootstrap it yet, but that should be easy to do.
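To get a feel for the token-rewriting approach, here is a minimal sketch using the standard tokenize module. It is not Fernando's code: the `unless` keyword, the translate function and the sample source are all made up for illustration. The idea is just that the lexer will happily tokenize text that is not valid Python, so unknown names can be swapped for real keywords before compiling.

```python
import io
import tokenize


def translate(source):
    """Rewrite the made-up keyword `unless` as `if not`, token by token."""
    rewritten = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string == "unless":
            rewritten.append((tokenize.NAME, "if"))
            rewritten.append((tokenize.NAME, "not"))
        else:
            rewritten.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize regenerates valid source text,
    # though the spacing will differ from the input.
    return tokenize.untokenize(rewritten)


SOURCE = "unless ready:\n    greeting = 'hello'\n"
```

Running exec(translate(SOURCE), namespace) with ready set to False in the namespace executes the indented block; an import hook would do the same translation when a module is loaded.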

This is another technique that could be used to implement the AST magic I posted about previously.

Endless Poem

August 26th, 2009

This is like the best use of Twitter, ever. An autogenerated poem based on tweets.

LaTeX, Pocket Mod and microlite20

August 25th, 2009

So, I absolutely love the pocketmod versions of microlite20 rules. But, I don’t like how the margins end up when you print them out and fold them, and I’d like to be able to make my own pocket mods without using their software.

I found this example of using postscript tools to make a pocket mod from a full size document. This is great if you’re converting something existing, but I used LyX/LaTeX to create a pocket book of rules on tiny pages (one eighth the size of letter).

So, here’s my example of how to make a pocket mod:

pdf2ps microlite20.pdf microlite20.ps
pstops -w8.5in -h11in '8:0L(1w,0.75h)+1R(0,1h)+2R(0,0.75h)+3R(0,0.5h)+4R(0,0.25h)+5L(1w,0)+6L(1w,0.25h)+7L(1w,0.5h)' microlite20.ps microlite20_pocket.ps
ps2pdf microlite20_pocket.ps

This will work for any document that’s already properly scaled. However, there are some issues with margins.

  • Page 1 left margin too small
  • Page 2 right margin too small
  • Page 3 right margin too small
  • Page 4 good margins
  • Page 5 slightly too small on right (alright though)
  • Page 6 slightly too small on left (alright though)
  • Page 7 good margins
  • Page 8 good margins

These are probably artifacts of how I folded it. But I probably should just make the left/right margins 3 ems instead of only 2.
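The pstops page specification is easier to tweak when it is built from a table. As I read it, the leading 8: groups input pages in blocks of eight, each term is a page index within the block, a rotation (L for left, R for right) and an (x,y) offset in units of the output page width and height, and + overlays the terms on one sheet. A small Python sketch that rebuilds the same string; POCKETMOD_LAYOUT and pstops_spec are my own names.

```python
# (page index, rotation, x offset, y offset) for each of the eight pages.
POCKETMOD_LAYOUT = [
    (0, "L", "1w", "0.75h"),
    (1, "R", "0", "1h"),
    (2, "R", "0", "0.75h"),
    (3, "R", "0", "0.5h"),
    (4, "R", "0", "0.25h"),
    (5, "L", "1w", "0"),
    (6, "L", "1w", "0.25h"),
    (7, "L", "1w", "0.5h"),
]


def pstops_spec(layout=POCKETMOD_LAYOUT):
    """Build the pstops page specification string from the layout table."""
    placements = [f"{page}{rot}({x},{y})" for page, rot, x, y in layout]
    return f"{len(layout)}:" + "+".join(placements)
```

Passing the result as a single argument in a list to subprocess.run, alongside pstops and its other arguments, also sidesteps the shell quoting that the parentheses otherwise need.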