12.31.2010

Happy New Year

To all my readers, if there are any, have a Happy New Year and a great 2011!

The code is the documentation

Yeah, that's right, it is. If you're a working programmer, you spend a good amount of your time writing in a language few people understand, and then a bunch of time writing in English to help people understand what the hell your run-on, overly complicated login function does. What you probably did is believe that you'd document it later, and then you tried to be clever. That was a mistake; you're not Shakespeare or Chaucer. Your cut-and-paste, cargo-culted crap reads like a 4th grade essay on the death penalty. There needs to be a serious change in how we look at the work products of programming.

When you buy a cabinet, a chair, or anything from a master craftsman, do they design it with some clever way to sit in it and then write a manual in terrible English prose to explain how you should be sitting in it? NO!... Well, at least not most of them; maybe some douche-bag avant-garde retro-Bauhaus designer would, but who cares. Let's face it, in life most of what you do will not be sophisticated or clever or even fun. It will be work. The difference lies not in the opportunity for creativity but in the opportunity for craftsmanship.

There are a few things that I've noticed in my short time in industry:

  • You will do more craft than art in life (even if you're an artist)
  • If you do something at the limits of your comprehension, you will not understand it later
  • Cleverness leads to sadness

After writing those down a few days ago, I've had some time to reflect on what that all means. It also gave me time to decide that the first and last concepts, while important, are just not as useful as the second; I'm throwing them away for now. Really the most important thing you need to remember is:

If you do something at the limits of your comprehension, you will not understand it later

On second thought, maybe this post was the limits of what I can understand; I should stop trying to say something profound...

10.26.2010

Being lazy in Python

One of my favorite features of Haskell is its lazy evaluation of list comprehensions. This makes it possible to take the nth item of a list without first having to calculate the entire list. Pretty sweet if you want to reason about infinite data sets such as the Fibonacci numbers or primes. In a perfect world, I'd be writing Haskell all the time. However, that's just not possible. Python is far more popular, is installed by default on many OSes, and can make some tasks very simple. So really, I want lazy list evaluation in Python. The only way to implement this without hacking up the actual interpreter's object space is to mess with iterators. So I've implemented the Fibonacci sequence as an iterator. Essentially, I use the next method to move over the list of Fibonacci numbers and calculate the next one. I then store any numbers which I have already calculated to prevent extra work. I then use that pre-calculated list, along with further calculation, to implement list subscripting and contains. One thing to note is that itertools.takewhile consumes one item past the failing condition, not up to it, which leaves the iterator one step too far along. Hence the existence of a rollback function. Overall it was a fun exercise in using iterators, and who knows, maybe it's actually useful.

from itertools import takewhile


class Fib(object):

    def __init__(self):
        # Keep all state per-instance; class-level mutable attributes
        # would be shared between every Fib object.
        self.i = 0
        self.items = []
        self.n = 0
        self.n_1 = 1

    def __contains__(self, item):
        if item in self.items:
            return True
        # Extend the cache up through item, then test the last cached value.
        self.items += list(takewhile(lambda x: x <= item, self))
        self._rollback()
        return bool(self.items) and self.items[-1] == item

    def __getitem__(self, key):
        # Subscripting is 1-indexed: Fib()[1] is the first Fibonacci number.
        if key <= len(self.items):
            return self.items[key - 1]
        # Extend the cache until it holds key items.
        self.items += list(takewhile(lambda x: self.i <= key, self))
        self._rollback()
        return self.items[-1]

    def __iter__(self):
        return self

    def next(self):
        value = self.n + self.n_1
        self.n = self.n_1
        self.n_1 = value
        self.i += 1
        return value

    __next__ = next  # Python 3 spelling of the iterator protocol

    def _rollback(self):
        """
        When iteration gets stopped by takewhile we've gone one too far,
        so the iterator state needs to be backed up one step.
        """
        if not self.items:
            # Nothing was kept; reset to the initial state.
            self.n, self.n_1, self.i = 0, 1, 0
            return
        self.n = self.items[-2] if len(self.items) > 1 else 1
        self.n_1 = self.items[-1]
        self.i -= 1

Update: I've got a gist of this going which includes some bug fixes for slicing.
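For comparison, the same lazy behavior (minus the caching) can also be had with a plain generator and itertools.islice; this is just a sketch of the idea, not part of the class above:

```python
from itertools import islice


def fib():
    # Infinite lazy stream of the same sequence Fib produces: 1, 2, 3, 5, 8, ...
    n, n_1 = 0, 1
    while True:
        n, n_1 = n_1, n + n_1
        yield n_1
```

Nothing is computed until you ask: `list(islice(fib(), 6))` pulls exactly the first six values and no more.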

8.16.2010

GSoC's Over

Well, it was fun, but as of today it's pencils down. I've tagged my repo with a snapshot of my summer's work. I'll be continuing dev of this project so don't despair! In addition, I will probably be speaking at pdxfunc in October about the project. Hope to see you there!

8.05.2010

Abstraction is Complexity

I've been thinking more and more about the issues associated with abstraction. I am becoming convinced that abstraction is not always a good choice and certainly should not be undertaken lightly. When developing structures that other developers will rely upon, you're coding for a much more critical and savvy audience. Overly complex and needlessly heavy-handed code is always the wrong way to do it. Your efforts in API design might be completely destroyed by a few small choices. Why wrap up simple things like object constructors when you can expose them instead? Don't abstract unless you have to, and when you do, don't abuse the privilege.

I've been working with JS a lot more, and I think the patterns used in that language to give the appearance of classical inheritance are terrible. Let the language be what it is, not what you wish it was. Most directly, I'm talking about a JS charting library. It is clearly an API designed around object configuration, not object manipulation. In my attempts to provide a simpler API for less experienced developers to use, I've discovered the quirks that are inherent to the choices its designers made. I believe there is too much data hiding. If I want to reconfigure an object on the fly, I need to reconfigure the whole thing, because the API hides my direct access to the configuration functions. I see no reason for these to be hidden, but for these reasons I will abandon my efforts to simplify things further.

API designers, please trust your users more. I've learned this lesson the hard way. It is better to keep an API simple and require the user to do a little more work than to wrap it up in a leaky abstraction; and let's face it, all abstractions are leaky. If you're going to abstract something, make it opaque and simple! Too much sekret sauce might make things look nice, but when it all goes wrong it won't be pretty.

APIs should provide all functions that are needed to manipulate the underlying public data structures. Don't make things private unless you need to; in fact, don't make anything private unless it's for internal use only. For example, if you have a chart with axes, and there is an Axis constructor, make that public. Selfish APIs don't make me want to use your library. We're using your code to make our lives simpler, not harder. If you've gone out of your way to hide important details (I'm looking at you, JavaScript!) no one will thank you.
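A hypothetical sketch of that advice in Python (Axis and Chart are invented names here, not any real charting library): expose the constructor, and on-the-fly reconfiguration becomes trivial.

```python
class Axis(object):
    """Public axis type; users can construct and tweak one directly."""

    def __init__(self, label, min_val=0, max_val=100):
        self.label = label
        self.min_val = min_val
        self.max_val = max_val


class Chart(object):
    """Accepts Axis objects instead of hiding their construction internally."""

    def __init__(self, x_axis, y_axis):
        self.x_axis = x_axis
        self.y_axis = y_axis


chart = Chart(Axis("time (s)"), Axis("speed (m/s)", 0, 60))
# Reconfiguring on the fly needs no rebuild of the whole chart.
chart.y_axis.max_val = 120
```

Had Axis been private, that last line would instead be a full reconfiguration call.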

Remember, you don't know what your audience is assuming; maybe it's better not to assume at all. And in the case that you do keep assumptions in your code, list them out somewhere! Maybe seeing the number of assumptions made is a good exercise in facing the complexity you've created, since your users will need to keep track of them too. Now, I will grant that this opinion is a bit harsh. The ideal is probably somewhere in between. Without assumptions, code is tedious, and tedium leads to mistakes. However, more is not always better, so I will leave you with this thought: magic is capitalizing on assumptions, but the best tricks are usually the simplest.

8.02.2010

FusionSQL: Python and Google's Fusion Tables

This weekend I released a very alpha version of a library I'm working on for connecting with Fusion Tables. I think applications of Fusion Tables will become more interesting as the feature set is improved and they are integrated more with existing Google services. The client and associated libraries are very simple, and it can be installed from pypi (assuming it's actually up!). Here's a sample session:

> SHOW TABLES
table id | name 
----------------
225017   | tests
224386   | neighborhoods_pdx_join.csv
224383   | neighborhoods_pdx.kml
224239   | PDX Crime Incident Data 2009
> DESCRIBE 224386 
column id | name | type
-----------------------
col0      | OBJECTID | number
col1      | PERIMETER | number
col2      | ASSN_ | number
col3      | ASSN_ID | number
col4      | NAME | location
col5      | COMMPLAN | string
col6      | SHARED | string
col7      | COALIT | string
col8      | CHECK_ | string
col9      | HORZ_VERT | string
> SELECT OBJECTID, PERIMETER, NAME FROM 224386 LIMIT 10
OBJECTID | PERIMETER              | NAME               
-------------------------------------------------------
1        | 74951.898437500000000  | ST. JOHNS          
2        | 63622.199218750000000  | HAYDEN ISLAND      
3        | 121064.000000000000000 | LINNTON            
4        | 132532.000000000000000 | FOREST PARK/LINNTON
5        | 44036.101562500000000  | KENTON             
6        | 188868.000000000000000 | FOREST PARK        
7        | 16725.900390620001417  | BRIDGETON          
8        | 35067.199218750000000  | EAST COLUMBIA      
9        | 2857.860107420000077   | MC UNCLAIMED #2    
10       | 26257.099609370001417  | CATHEDRAL PARK   

As you can see, it works much like a SQL command-line client; there is even tab completion for the keywords and table numbers (I'm working on a mapping between saner names and the numbers). In addition, you can use the FusionSQL class to connect to and query Fusion Tables via the query function.
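The column-aligned output above can be reproduced with a small formatter along these lines (a sketch of the idea, not the client's actual code):

```python
def format_table(headers, rows):
    # Pad every cell to its column's widest entry, header included.
    cells = [headers] + [[str(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    lines = [" | ".join(c.ljust(w) for c, w in zip(cells[0], widths))]
    lines.append("-" * len(lines[0]))
    for row in cells[1:]:
        lines.append(" | ".join(c.ljust(w) for c, w in zip(row, widths)))
    return "\n".join(lines)


print(format_table(["table id", "name"], [["225017", "tests"]]))
```

Each column is sized to its widest value, and the dashed rule matches the header width, just like the session output.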

w00t!

7.24.2010

PyMET: a simple api client for Trimet data

I've been hacking on and off on this Trimet data wrapper for about a year now, and I recently decided to give it a big push. My biggest point of contention with the project has been dealing with irregular XML API data and modeling it in Python objects. It's an annoying and frustrating task in any circumstance. You must deal with parsing the XML, and then figure out how you'd like to shove it into your object properties, for every single property. This means you spend more time figuring out how to map the data than actually using it.

I am really tired of this situation; it flat out sucks. That's why I've gotten lazy and tried to make it better. My new Trimet classes are based on a lazy XML API class which handles this mapping for you based on introspection of the returned data. Thanks to the BeautifulSoup project, discovering attributes, text values, and children of XML objects is a very simple task. My only task was to provide the mapping. For this, I recursively loop over the elements and make a few important choices:
  1. choose to either add attributes and descend if any element has children
  2. put all elements which have identical tagname siblings into a list
With these goals for the algorithm, implementation was not too difficult. The main thing to keep track of is when to append data vs. when to just add a new dict object (I am converting the entire xml tree into a dict/list object). I'm also doing some fun stuff with __getattr__ so you can access properties of the returned object based on your schema and the data itself.
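A rough sketch of those two rules, using the stdlib's ElementTree here rather than BeautifulSoup (the function name and sample data are invented for illustration):

```python
import xml.etree.ElementTree as ET


def xml_to_obj(elem):
    # Rule 1: if the element has children, merge its attributes and descend.
    if len(elem) == 0:
        return elem.attrib or (elem.text or "").strip()
    result = dict(elem.attrib)
    for child in elem:
        value = xml_to_obj(child)
        if child.tag in result:
            # Rule 2: identical tag-name siblings collect into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


doc = ET.fromstring(
    "<arrivals route='4'><arrival time='12:01'/><arrival time='12:14'/></arrivals>")
print(xml_to_obj(doc))
# -> {'route': '4', 'arrival': [{'time': '12:01'}, {'time': '12:14'}]}
```

The repeated arrival elements land in a list while the lone route attribute stays a plain value, which is exactly the append-vs-new-key decision described above.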

Now that I have the obvious bugs worked out, adding new XML apis based on this should be trivial. I think civic apps has some which I might try out.

* Warning, this code is still pretty rough and needs some cleanup, but if you know what you're doing it should be easy to follow the examples in arrivals.py

7.15.2010

Recent Presentations

I've decided to start publishing all my presentations and their source to github. Most recently, I spoke at the pdxpython user group about my uses of PLY in Cockerel. The talk went well and I think everyone enjoyed it. I'll also be talking on Sunday at PGDay about the PORTAL project. Should be a good time.

7.11.2010

2 Packages for Flask

I've released 2 extensions for Flask today: Flask-Markdown and Flask-Jinja2Extender. Both are very simple, but handy when you're doing a lot of templating and working with Markdown. Also, it's important to note that both will be going under heavy development and should be considered unstable, blah, blah, blah. Otherwise enjoy!

Edit:
One more thing, you can also grab cockerel from pypi.

7.06.2010

Developing a DSL for logical reasoning

DSLs are a great way to extend a program's accessibility to non-programmer domain experts. Generally, the approach is to reconstruct the operations needed for a given task, but using language and semantics that the end users are familiar with, not the programmer's. It is the programmer's job to design and implement the language. Here in the Cockerel project, I have been undertaking a similar exercise. The goal is to provide a language which can be used by novice logicians and students to form proofs. So far, I've developed a basic syntax:

Lemma ex_6_19 P : (~~P) <-> P.
Universal_Intros.
Split_Eq.
  P_with_CP.
  For P Use IP.
  For False Use (Contr H1 H2).
  P_with_CP.
  P_with_CP.
  For False Use (Contr H1 H2).
Qed.

Hopefully this proof is easily read, but I'll walk through it anyway. We begin by introducing our universally quantified variable [P]. (It's good to note that even though I am doing propositional logic, there is an implicit [forall] in the goal state.) After the introduction, we divide the bi-conditional, [<->], into two implications and begin the main body of the proof. For [~~P -> P], we'll need to use a conditional proof technique. The matching tactical is [P_with_CP]. From there, the [For Use] syntax allows us to write a new hypothesis [~P] for [P] using the Indirect Proof [IP] tactical. Finally, we'll show [False] using a contradiction between our two hypotheses, [~~P, ~P]. The other direction is immediate from the implication.

Ideally, a new student will be able to understand this language since it is very close to the language in their books. One area I am not a fan of is the lack of explication for some of the tacticals, [Universal_Intros, Split_Eq, P_with_CP]. It might be a worthwhile addition to have the student also write out what those tactics are producing; this would be similar behavior to the other tactics.

Well that's it for now, happy proving!

Building out a webapp using Flask

Flask is a new micro-framework by the folks at Pocoo. I have to say I'm super impressed. The build-out is really flexible, which I find refreshing compared to Django. You can organize the application in any fashion you like; I've chosen to do a standard layout for a wsgi application. I've got a module for models, and a webapp which has submodules for my views. A nice feature of Flask is being able to register these modules on boot and instantiate all the routing; I am also initializing and managing the db connection when starting the app using the Flask-SQLAlchemy extension. Finally, I've got Flatland taking care of my forms. This overall architecture seems to hit the sweet spot for my application needs. I'm knocking out tasks and taking names! Naturally all my code is on github by now...

6.30.2010

Moar Parsing

I burned the midnight oil a bit last night working on my output serializer/parser and got it working. The grammar is nearly complete, with only 1 shift/reduce conflict. The parser will take a coqtop response, such as
1 subgoal
  A : Prop
  B : Prop
  C : Prop
  ============================
   A -> ~ ~ A \/ B > C

and turn that into a mostly correct Python object, which for now is just a dict:

{   'goal': {   'expr': (   {   'expr': ('A', '~ ~ A'), 'operator': '->'},
                            {   'expr': ('B', 'C'), 'operator': '>'}),
                'operator': '\\/'},
    'hyp': [   {   'name': 'A', 'type': 'Prop'},
               [{   'name': 'B', 'type': 'Prop'}, {   'name': 'C', 'type': 'Prop'}]],
    'subgoal': '1 subgoal'}
    

It's pretty cool to see it work correctly on my sample strings, but the next step will be to bring it into the request/response lifecycle and see how well these objects will serialize. Still, for now, progress has been made!
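The full grammar lives in the PLY parser, but the hypothesis lines alone are simple enough to sketch with a regex (illustrative only, not the real parser):

```python
import re

# Matches lines like "  A : Prop" -> name "A", type "Prop".
HYP = re.compile(r"^\s*(\w+)\s*:\s*(.+?)\s*$")


def parse_hyps(block):
    # Collect each hypothesis line as a {'name': ..., 'type': ...} dict,
    # skipping the separator and goal lines, which have no "name : type" shape.
    hyps = []
    for line in block.splitlines():
        match = HYP.match(line)
        if match:
            hyps.append({"name": match.group(1), "type": match.group(2)})
    return hyps
```

Running it over the sample response above yields the same 'hyp' entries as the dict output, minus the nesting.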

6.20.2010

Cockerel and Exposing processes to webpages

A large part of this project is allowing a user to connect to a webpage containing a form and interact with a foreign process. By design, I am encouraging remote execution of code on my server. Now, this is a decently solved problem with sites like Code Pad and projects like Geordi, but that doesn't make the particulars of implementation any easier! I've found that a large number of choices need to be made when implementing a system in this way. They range from user access and sandbox environments to tracking state in a web session.

Over the last two weeks, I've completed the basic development for Cockerel to offer a web form which you can prove theorems in. In other words, the site's up! (but only figuratively). There is a server process for multiplexing coqtop sessions called coqd. It is an extension of the Twisted Protocol which spawns a new pexpect coqtop process. Using this alone, a user can send JSON of this schema:

POST
{ "userid": "id",
  "command": "cmd_str"
}

Response
{ "userid": "id",
  "response": "resp"
}
    

For now, the website just passes the response through as plain text, but I am working on an output parser using PLY for better styling. There is no requirement to run my webserver to use coqd.
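Building and decoding those messages is just a couple of json calls (a sketch; the helper names are mine, and coqd's transport details are omitted):

```python
import json


def make_command(userid, command):
    # Serialize a request in the schema coqd expects.
    return json.dumps({"userid": userid, "command": command})


def parse_response(raw):
    # Pull the useful fields back out of a coqd reply.
    msg = json.loads(raw)
    return msg["userid"], msg["response"]
```

Any client that can speak this schema over the wire can drive coqd, webserver or not.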

On the front end, I'm using Flask to build out my website. At the moment there's only one page. It's a simple form to send in proofs. You can only step through them, and it's not going to be too nice to bad proofs. The proofscript line counter I'm using might be a little buggy at the moment, but it works for most cases. Sooner or later I'll need to add Undo.

Well that's all I've got for now, much more to come! Since there is a page that can be used, now's a good time to pull the code and give it a try. I'll update the README for which dependencies are required for install, but a short list is:

flask
twisted
pexpect
    

Enjoy!

6.08.2010

Why Twisted Changed My Life

So at Open Source Bridge I ran into the folks at OSU's Open Source Lab. They were a huge help for fleshing out the high-level architecture of my connection server for Cockerel. Their project, pydra, was on my list of libraries for implementing a non-blocking i/o server. Had I done it that way, I would have been in for some pain. That is definitely not what pydra was designed for; it's great for large processing jobs over networks. Instead, they suggested using Twisted.

For those who don't know, Twisted is a Python library designed for building async network servers. Internal classes include protocols for connections, bare-bones ssh clients, irc clients/servers, and a variety of other utilities for building your async server/client application. The maturity of the libraries allowed me to build out a simple connection protocol in a little over 30 minutes.

Implementing a server protocol in Twisted is very simple. You can begin with the base protocol.Protocol, for example, and then define a dataReceived() method that processes the data. That's pretty much it! Now, dealing with exceptions and other particulars is clearly going to take most of the dev time, but the basics of your server can be completed just like that. I highly recommend trying it out.
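For a feel of the shape, here's the same callback pattern sketched with the standard library's asyncio.Protocol, whose design closely follows Twisted's (a toy echo server, not Cockerel's actual protocol):

```python
import asyncio


class Echo(asyncio.Protocol):
    # Like Twisted's protocol.Protocol: override one callback per event.
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Process each chunk as it arrives; here we just echo it back.
        self.transport.write(data)
```

Twisted's dataReceived() plays the same role as data_received() here: one method, called with each chunk off the wire.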

Thanks again to OSL! Y'ALL ROCK!

6.04.2010

Cockerel: A simpler Proof Checker

I'll be working on the Cockerel web-based proof checker for Google Summer of Code. It is specifically designed to make proof writing and verification a productive process for introductory students. Portland State University is providing mentoring and an initial deployment for this system. The project gets its name from the backend logic engine, Coq. You can track my progress from the project git repo.

git://github.com/dcolish/Cockerel.git

init

Well, I think at least one post is required for my blog to actually be a blog. Here it is.