Archive for the ‘computer science’ Category

Sequitor Based Reinforcment Learning Chatbot

Wednesday, June 16th, 2010

A reinforcement learning based chat bot can be implemented by using the context free grammars created by the sequitur algorithm.

As phrases are learned by the chat bot, the sequitar algorithm creates grammar sub trees based on the existing corpus. These sub trees are cataloged and can be used as a state space. Simplistically, a state consists of a vector equal in length to the number of sub trees, with a bit flipped for whether that specific subtree was seen in the input.

Based on the state, various actions can be taken. New phrases could potentially be generated using the same grammar model, or other actions could be taken.

Reinforcement “simulations” or dreams could be executed between each user input.

Primary difficulties: Keep states consistent as new additions to the state space are added, Generating new phrases based on the captured grammar.

DSL in Python

Sunday, September 6th, 2009

Fernando Meyer has made a nice DSL in python. It’ll be interesting to see where it heads. He just modifies the file using python’s tokenize library instead of AST. He hasn’t implemented import hooks to bootstrap it yet, but that should be easy to do.

This is another technique that could be used to implement the AST magic I posted about previously.

YAJII I chose you!

Thursday, August 20th, 2009

So, I’ve basically finished my refactor/resuscitation of PyGP (except for the part where I actually package it and release it). So it’s time to select my next victim.

Since I mostly do Java at work, I thought I’d select a Java project. I’ve found such a project in YAJII, a java implementation for an IRC client. It doesn’t appear to have any unit tests (or a build system) but it does have some documentation. The author’s current webpage is here.

My real goal would be to use this as a project for a coding Kata, potentially to be presented at the OSUG, depending on the interest I get, I could then spin off a sub-group or another meetup for code Katas in general.

Before I’m able to do a Kata I’ll have to do the following:

  1. Get it to build with ant or some other build tool (ant + ivy, maven, buildr, scons…)
  2. Run the “testbed” application through it’s paces.
    • This would follow the context driven testing methodology, maybe a 1/2 hour session to find a big problem to fix. “Go through all the menus to get a feel, read the user manual create a test outline.”
  3. After the problem the big problems from the testing perspective have been found, I’ll use the saff squeeze to isolate it with a unit test. This will be an iterative process of getting the program into a test framework, starting with a “smoke test” covering much of the stack and iterating towards the point that the saff squeeze will lead me to. These will use various techniques from Working Effectively With Legacy Code to get the code unter test.
  4. Finally, use techniques from WEWLC to fix the problem/add the feature found in exploratory testing.

Since this is a Kata, it will mostly focus on getting good technique. It’s okay to go down false allies to figure out what’s happening. The techniques that will be covered are:

  1. Dealing with build systems, especially of code that isn’t under one yet.
  2. Exploratory testing (Context-driven school). A very different skill that most developers don’t have.
  3. Getting code under automated tests.
  4. Converting a exploratory test into a fully automated test. Both acceptance level and unit level.
    1. This includes dependency breaking, the saff squeeze and test doubles.
  5. Fixing code based on automated tests.
  6. Verifying a code fix manually after automated tests.

I think this would hit a lot of the skills needed for software maintenance. This seems to be large enough to break into multiple Katas.

My concern is that it may be too big to do as an introductory Kata, maybe something else would be better? Bowling scores, or maybe one of the problems from S.Lott’s blog?

AST Manipulation fun for when I have time

Sunday, November 2nd, 2008

Nice asserts

Currently python asserts are of the form

assert EXPRESSION, MESSAGE

It’d be really nice if an assert of the form

assert a < b < c

would display display a message equivalent to this (with a = 2, b = 1, c = 3)

AssertaionError: 2 < 1 < 3

I think I can do this for most Boolean expressions by doing an AST transform. So something like

assert a.foo() < g.bar() != f("fizzle")

Would be transformed into:

_1, _2, _3, = a.foo(), g.bar(), f("fizzle")
assert _1 < _2 != _3, "%s < %s != %s" % (_1, _2, _3)

This would happen right before methods/functions were run. Hopefully, by using AST it will be more portable to other python implementations (This is the primary reason cited by Guido for why adding this type of assertion to the CPython implementation wont be supported).

Mutation Testing

Mutation testing plugin for nose. Mutate Boolean expressions and constants akin to Pester, only using an AST transform instead of text substitution. This could potentially increase the types of mutations possible (shuffle orders of statements executed, change parameters, etc.)

Difficulties:

  • Determing what’s under test to change (perhaps use coverage/figleaf to figure out what to change?)
  • Keeping track and reporting mutation results to tester (should be relatively straightforward)
  • Doing this quickly — mutations usually take a long time. (Only mutate files specified by user?)

Genetic programming

AST tree manipulation can also be used to generate syntactically correct statements automatically. Assuming I find or create an acceptable cross-over algorithm (perhaps steal the one from pygp) AST will make this relatively straightforward.

Random stuff so I don’t forget

Wednesday, October 29th, 2008

Freemind management of mnemosyne cards is an awesome idea! Create and organize subject matter in FreeMind, then export (leaves probably) to Mnemosyne cards with containing branches as categories. The tree could also be exported to a LaTeX memoir document, so it’s three in one: Tree, narrative, and notes. OOooh, I really like this idea.

  • Freemind needs a: spellchecker, automagically updated file view and clone nodes.
  • Use python 2.6 ast module to make pretty asserts that use the assert keyword.
  • the ast module could also be used to implement something akin to the Eiffel “old” keyword. But I think a registry of expressions to evaluate before the method and are accessable by name e.g.
    • old(expr1=lambda s: s.attr, expr2=lambda s: s.method()) and accessable by old.expr1, old.expr2
  • would be cleaner to implement.
  • Cmockery is hugs and cuddles, now all I have to do is add it to our make file for OS, mwahahahah!
  • Pythoscope equivalent for C would be interesting, and probably useful. Although really hard to implement…Whoever did it would become really good at C, though.
  • Contracts for python need to handle functions as well as classes (perhaps through wrapper class with __call__ or some such?)
  • Also looking at automake and autotools, makefiles don’t look as brain melting as they used to.
  • Automatically generating Sequence diagrams based on unit test or code execution would be AWESOME, pyumlgraph already does it for class relationship diagrams (generally, at least)

Firefox extension for incremental reading

Monday, October 20th, 2008

What’d be really cool would be a firefox extension that allows you to highlight phrases, pops up a quick dialog box and then creates flashcards for mnemosyne. I think that this would basically give you all you needed for incremental reading. (With the addition of a bookmarking system, hey, Firefox already has that! :))

More thesis ideas and time tracking

Monday, October 6th, 2008

Well, I’ve got a few more ideas for my Master’s thesis: Simulations/Discrete simulation. This may dovetail nicely with bayesian inference, genetic programming and scientific computing. I may also be able to figure out how to associate it with autodidactic software. It’d also probably be a good career choice to develop expertise in simulations.

I looked through the course catalog at UCCS for CS graduate classes and the things that stood out to me were computer learning algorithms and software engineering/design courses (Design By Contract, object oriented problem solving, etc.)

I’ll figure out a great thesis topic eventually. I definitely want to do something new and interesting, not just a review of existing literature.

Oh! Another potential for exploration is procedural content generation.

Doing TDD in Django

Monday, August 18th, 2008

So, as part of this link ranking website project, I’m trying to do TDD In Django, I’ve got a nice development setup with virtualenv with django installed locally in my development folder. My apps are added to sys.path through the PYTHONPATH environment variable and I set the DJANGO_SETTINGS_MODULE variable depending on what I’m developing at the moment. I also used easy_install to get the latest version of nose, my favorite test runner.

So, the first thing I do is write a simple smoke test to make sure that my application is setup correctly:

from django.test import TestCase

class IndexFunctionalTests(TestCase):

    def testIndexReturns200(self):
        response = self.client.get('/')
        self.assertEqual(response.status_code, 200)

This requires the DJANGO_SETTINGS_MODULE environment variable to be set. I copied what django-tagging does, and have a tiny project settings.py specified within the tests folder. I also set up the urls.py file in the linkranking directory and reference it in tests.settings file.

This psuedo project will need to be expanded once I start developing templates. And I’m still running into problem #8358. I really should try this patch to see if it fixes it.

So far using nose with Django is pretty straightforward, if I use the Django TestCase object for all the database and client related stuff.

Functional Tests

Thursday, August 14th, 2008

Here’s the first few functional tests that I’ve based off of the usecases for the new link aggregation website.

Index Page

When an anonymous user visits the main page, the user should see:

  • A Submit Link hyperlink.
  • A tag cloud.
  • A list of the top links from the past week. Each item in the list should have:
    • A hyperlinked title
    • A short description
    • A link to the user profile that submitted the link.
  • Links labeled month, day, year, all time. and a ghosted link labeled week in the upper corner of the list of links.
  • A login/create account link

When a known user visits the main page, the user should see:

  • Everything an anonymous user sees except that the links day, week, month, year, all time should be ghosted based on which date range the user had selected previously
  • A settings link.

Aggregated Design So Far (with more details)

Monday, August 11th, 2008

This is a continuation of this post and this post, showing how the examples gain more specifics as time goes on.

Use Cases

Submitting a Link

Bobby finds a neat link about plate tectonics. he goes to the website “tectonic-plate-o-matics.com” to enter the link to the community. He sees the current links ranked by rating. In a corner, he sees “Submit link”. He clicks it and and is given a form that has the following inputs:

  • Link URL
  • Link title
  • Link Description
  • Tags

He then submits the link and sees it show up on the front page. Note: Since Bobby isn’t logged in when he submits the link, it doesn’t show up for other anonymous users. Just the submitter and any users that have the “See anonymous links” set to true. For Bobby, It is also artificially floated to the top of the link list regardless of its current ranking, since Bobby submitted it.

Visiting A Link

Bobby now clicks on one of the links on the front page. He is taken, in a unobtrusive frame, to the link. The frame has a few buttons or inputs on it, including:

  • rank +1
  • rank -1
  • add tags
  • report link

When he clicks on any of these, he’s taken to a login page, the login page supports either open id or creating a user account on the web page itself. After he’s logged in, he is returned to the page he was on. It also contains a list of current tags for the link.

It also has a prominent “close X” button which cleanly closes the frame.

Ranking Links

Sally is an active member of the link ranking site. She goes there multiple times a day to check out new links about plate tectonics and moderate them, so she always has her login cookie set. When she goes to the homepage, she is presented the top links for the day (since this is the setting she has selected last time, the default is for one week).

She clicks on the link she finds most interesting, “Daffy Ducks view of plate tectonics”, and visits it. She then uses the upper frame to moderate the link with a score of +1, she hits back and is returned to the homepage, the link she moderated is closer to the top of the list because of her moderation.

Note: There needs to be some sort of trust metric (advogato style?) for reputation of moderators, and strength of their moderations. The interface must also be designed so it isn’t like reddit.com, it’s not a news site, just an index.

Reporting Dead Links

Sally then clicks on another link “Tectonic Plates of the ancient world” which brings her to a 404 page, she then clicks the “report bad link” button on the top frame and selects “page not there anymore” from the drop-down menu. When she clicks submit, a status message is displayed in the frame saying something like “Thank you for your attention to detail!”

Note: Other options could be “inappropriate link”, “link moved”, etc. We also need to handle tagging/incorrect tagging somehow…

Main Page

The main page has three main sections:

  • Header, including site logo, title, login, and settings link
  • Sidebar consisting of a Tag Cloud and other navigation links
  • Links, which is a list of links that have been rated in the past week with the highest Bayesian based ranking. Each link has a title and a short description and a link to the profile of the user who submitted it.
    • This also has links to display the links by rank from the last day, month, year and all time.
    • It also shows which, if any, tag is selected for the links.