10.20.2013

Thoughts on Configuration

Recently, I was reviewing the ecosystem of configuration management tools, since
I've worked heavily in Salt and Chef. I am very impressed by the efforts of
everyone to improve how we handle configuration. It is getting a lot of
mind-share and seems to be one of the difficult problems to solve
today. However, I wonder if fine-grained configuration is the right way to build
and deploy large applications. This motivated me to share some ideas I've had on
how we approach large projects and why we need configuration.

The use of large projects is common because the reuse of code for complex
domains is vital to the progress of software engineering. I believe any
particular project grows to be large because of two things: 1. simple projects
become complex when the solution space of a tool expands beyond its original
domain, and 2. tools require configuration when they need to manage conflicting
domains. Projects that solve problems they weren't originally designed for will
conflict with their own design. While that might be useful to the
current user base, it creates flaws in the project's architecture.

Imagine a house which has been architected for one story, and then imagine
adding another story to it. This creates some structural issues for the house;
the foundation and exterior walls were never designed for that much weight. Over
time that house might have issues. A safe thing to do is to shore up the
foundation and strengthen the walls. What if you never consulted an architect
and added another story anyway? This is the state of many software projects.

Let's rewind a bit and talk about blueprints, which many software projects do not
have. In the face of changing requirements, a good architect would hopefully go
back and consult blueprints. A good architect is bound by the cost and time of
construction. [For some reason, we often don't consider the realities of
software costs, but that is for another post.] The plans will be adjusted
significantly before construction starts, but once it is underway very little
adjustment can happen. Certainly, any features added will be added at great
cost. The architect usually does not have a house which can be reconfigured to
fit the whims of a vast array of people once the house is constructed, because a
house is a small project.

Software does not share the same costs associated with physical
construction. Most developers, myself absolutely included, will write software
before the problem is well defined. We will write software just to define a
problem. If not for the incredibly low cost of experimentation, this would be
similar to building a house to see if you thought it would be livable. We have
no other way to experiment with the objects that we create so we write code and
use it. Not using the code you write means you are not writing the right
code. One industry term for this is a 'spike' and it is similar to a first draft
of some house blueprints.

Yet there's a hidden danger in these small projects; spikes are often not thrown
away. Instead, you decide to keep one and cultivate it. Now you're starting to
add on that extra story or two or three. I like to imagine this is like an
architect taking a bungalow and making it a five-bedroom mansion. There's little
thought given to projects the size of a house, but we will use them anyway
because they seem useful. Even less obvious is how much complexity gets added to
these small projects. Rather than only adding on a few bedrooms or an extra
floor, we are able to grow these projects into skyscrapers.

Suppose that you are designing a sports stadium. You know the stadium will be
utilized for many kinds of events only some of which you can plan for now. Given
the size of the project, a requirement for this structure is allowing the space
to be configured for many purposes. If you've never seen a floor change at an
arena like Madison Square Garden, they can take the main area of the arena from
hockey rink to basketball court to a concert hall in a matter of hours. All of
this is possible because the actual structure of the arena is simple and the
important features are configurable.

This stadium is carefully designed to make what must be changed flexible, while
keeping the remainder of the stadium static. Our architect has planned a sports
stadium to host a variety of events by using a loosely specified floor as the
foundation of the space. The administrators are free to fill that blank space
with components such as basketball courts or concert stages and seating. The
outer shell of the stadium is much more rigid and can be that way because it has
the very clear task of seating a majority of the spectators.

Enter the configuration. By giving every user a number of switches they can flip
on and off, the sky-scraping sports stadium can be molded to fit a variety of
uses. How do you even decide what should be configurable and what should be
static? Well, I've seen a variety of approaches, but usually whatever becomes
configurable is based on who wants a feature regardless of its benefit to the
overall project. If you're a developer working on this project, you will
probably "nerd out" and hack configuration into the code. It'll be challenging
and you love challenges; that's why you write software. Alas, I pity the poor
users who end up on the receiving end of projects like these. At this point
they could be using the code because they wanted a house, a skyscraper, a
basketball court, or a concert hall. They will have to accept some complexity
and a lot of compromise with the configuration.

The worst part of large configurable software is that it tries to be everything
to everyone and ends up exposing a poor control mechanism for the solutions it
offers. Configuration is not a good API. It is completely tied to implementation
details of the configured application. It can only be tested through careful and
fragile integration tests. It often suffers from minimal documentation and
documentation drift. A typical user is just not prepared for that.

Configurations are not a rich data source. They are not a formal language which
we can reason about. Treating configuration as code is "bad". For anything
complex you have to write custom code around the existing project which was
never designed to be a library. The problem of configuration is so bad now that
we have entire projects dedicated to managing the configurations of other
projects. There are clearly some things that you would want to manage like user
accounts and network configuration, but there's also a large subset of projects
where your only way to deploy them is to fiddle with their configurations.

When you are designing and publishing libraries that do explicit tasks, you are
giving users all the power. They are free to select and compose your library's
features to get the functionality they require. If they do want to allow
configuration of these features in their system, they are absolutely free to do
that. They are the domain experts for their deployment. Pushing the
configuration upstream to where it is actually used puts the choice of how and
what can be configured in the experts' hands. You no longer need to make any
assumptions about which knobs people will need.
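
As a rough sketch of that idea in Python: the library function below takes
plain arguments and knows nothing about configuration files, and the calling
application decides that one value should be adjustable. Every name here
(fetch_with_retry, make_flaky, APP_FETCH_ATTEMPTS) is made up for illustration
and is not taken from any real project.

```python
import os


def fetch_with_retry(fetch, attempts):
    """Hypothetical library function: call fetch up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except IOError as error:
            last_error = error
    raise last_error


def make_flaky(failures):
    """Build a stand-in fetch that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def fetch():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("temporary failure")
        return "payload"

    return fetch


# The application, not the library, decides that the attempt count is
# configurable and where the value comes from. APP_FETCH_ATTEMPTS is an
# invented environment variable name.
attempts = int(os.environ.get("APP_FETCH_ATTEMPTS", "3"))
result = fetch_with_retry(make_flaky(2), attempts)
```

The library stays small and easy to test, and a different application could
hard-code the value or read it from somewhere else entirely.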

Everything can't be configurable; it's just not sustainable. We need more
libraries. We don't need a command line interface with lots of options for every
tool out there. We don't need more glue layers for the sake of
configuration. Just write a decent library that does a thing or two really well
and test the hell out of it. When you consider the amount of fiddling people do
with configurations, this is a significant simplification for the implementation
of their system. You are putting the control back into the users' hands, although
they might not always see it that way. Remember, when you let someone configure
your code, they're trusting you. Don't make promises you can't keep.

4.27.2012

I mostly write throw away code

I'm pretty sure GitHub (any other social code sharing site is also guilty) has changed my coding habits forever. What was once a mostly private activity followed by some sharing is now extremely public. Why do I choose to push code there that I've spent only a few hours working on? Well, it's hard to resist the energy. You follow all your friends and see all the projects they've been working on and who's committing to what. Regardless of how good anything that is being committed might be, you want to share too. Not committing for even a few days feels like you're falling behind in a race. I don't even know when this race finishes; maybe it's just an endless sprint to be something. Surely my willingness to produce and share the amount of code that I feel is necessary to keep up is taking a toll on my quality of work; how could it not?

Most of my code lacks documentation. I'm not even talking about formal docs; it's lacking those too. No, it usually lacks even the slightest organized overview of what it does and why anyone would really want to use it. A few months later, I might not even remember why I wanted to use it and chances are I don't anymore. This code should never have been released. On the insanely small chance someone else stumbles onto it, they have a near zero clue of what it's for.

I usually wing the API design. Heck, flying by the seat of your pants never hurt anyone, right? This berserk idea that I can get an API design right just by thinking it up on the spot is so wrong. If anyone were to tell me they just thought up a new API and coded it up immediately, I'd know they got it wrong. Why don't I bother setting the same standards for the code I share? I'm stuck maintaining a bad API if anyone else uses it and I'm stuck using the bad API if I use it. It's sadistic and dumb. I have absolutely not taken the time to design a modular, transparent, discoverable library. This is not something that can be done overnight, but I sure act like it can.

Tests. Pfft.

So, why do I share throw away code? Well, sometimes an idea or two in it is worth sharing. Most of the code is short enough that it can be read in a few minutes. I guess I am really sharing so others can read it, not use it. At the end of the day, I love the ideas that I produce more than the code that represents them. Ideas are representable in so many ways, but code speaks the clearest to me; other programmers might agree. Most of what I'm sharing is bad code representing interesting ideas, and if you take an honest look at what's hosted on GitHub, you would see everyone else is doing that too. There is only so much time for doing things the right way. There is a lot of time for sketching ideas and experimentation. Being good is knowing the difference and when it matters.

1.07.2011

Beginnings of a python analyzer

Analyzing the source code you're running is a massive task. An enormous amount of effort goes into searching for and fixing software faults. One thing I've always wondered is how an analyzer (static or runtime) would look if it were examining Python. Being a dynamically typed language, Python can be extremely tricky to analyze from just its syntax. However, thanks to Armin Ronacher it does have one excellent feature which could allow deeper analysis: direct access to a parsed AST through the ast module. With this module you can examine, manipulate or mangle the language any way you see fit.

I've decided to attack what is, in my opinion, the number one bug creator: passing None to a function. If this were C, that would be like passing NULL. I can't really see why there would be a good place to do that, and in my experience it's usually causing a bug. Beyond the basic reasons, it's just semantically wrong to be passing around nothing. It sort of defeats the point of providing arguments at all. In Python, we've got really nice default argument syntax, so if you must have None for an argument, make that the default and don't pass anything. If you can't avoid this due to some API then I'm sorry.
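
To make the default argument point concrete, here is a small sketch. The connect function and its timeout parameter are invented for this example; the only point is that the caller never has to write None.

```python
def connect(host, timeout=None):
    """Hypothetical function: timeout defaults to None, so callers omit it."""
    if timeout is None:
        # Fall back to an arbitrary default chosen for this example
        timeout = 30
    return (host, timeout)


# Callers either leave the argument out or pass a real value
default = connect("example.org")
explicit = connect("example.org", timeout=5)
```

A call like connect("example.org", None) still works, but it is exactly the kind of call the analyzer is meant to flag.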
Anyway, here's how I went after it. First I created a class which extends NodeTransformer and added this method:

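from ast import Constant, NodeTransformer, keyword
from itertools import chain
from warnings import warn

class StaticTransformer(NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        for x in chain(node.args, node.keywords):
            # Keyword arguments wrap the actual node in .value
            x = x.value if isinstance(x, keyword) else x
            # None parses as a Constant on Python 3.8+ (it was a Name on Python 2)
            if isinstance(x, Constant) and x.value is None:
                warn("Passing None Argument to function", RuntimeWarning)
        return node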

Then I just use that class in a short script to create the AST and walk it:

import sys
from ast import parse

with open(sys.argv[1]) as f:
    source = f.read()
tree = parse(source, sys.argv[1])  # named tree so it doesn't shadow the ast module
node = StaticTransformer().visit(tree)

Now when I call this script and pass in any compilable Python code, if there are any warnings I'll know about them and be able to fix them before the code gets used elsewhere. It is easy to add more methods to check AST nodes; you just follow the visit_<Node> pattern for naming the methods in your NodeTransformer class. I've created a gist with the code that I've written so far, feel free to fork and contribute!
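
As an illustration of the visit_<Node> pattern, here is a sketch of one more check. It is not part of the gist: the CompareChecker class, the check helper and the warning text are all made up for this example, and it assumes Python 3.8 or later, where None parses as an ast.Constant. It uses NodeVisitor rather than NodeTransformer because nothing in the tree is modified.

```python
import ast
import warnings


class CompareChecker(ast.NodeVisitor):
    """Hypothetical extra check: flag `== None` and `!= None` comparisons."""

    def visit_Compare(self, node):
        # Keep walking so comparisons nested inside this one are checked too
        self.generic_visit(node)
        for op, right in zip(node.ops, node.comparators):
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_none and isinstance(op, (ast.Eq, ast.NotEq)):
                warnings.warn(
                    "line %d: compare to None with 'is'" % node.lineno,
                    RuntimeWarning,
                )


def check(source):
    """Return the warning messages produced for a piece of source code."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        CompareChecker().visit(ast.parse(source))
    return [str(w.message) for w in caught]


messages = check("a = 1\nif a == None:\n    pass\nif a is None:\n    pass\n")
```

Only the right-hand side of each comparison is inspected, so something like None == a slips through; a fuller version would look at node.left as well.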

12.31.2010

Happy New Year

To all my readers, if there are any, have a Happy New Year and a great 2011!

The code is the documentation

Yeah that's right, it is. If you're a working programmer, you spend a good amount of your time writing in a language few people understand and then a bunch of time writing in English to help people understand what the hell your run-on overly complicated login function does. What you probably did is believe that you'd document it later and then you tried to be clever. That was a mistake; you're not Shakespeare or Chaucer. Your cut and paste, cargo-culted crap reads like a 4th grade essay on the death penalty. There needs to be a serious change in how we look at the work products produced from programming.

When you buy a cabinet, a chair, or anything from a master craftsman, do they design it with some clever way to sit in it and then write a manual in terrible English prose to explain how you should be sitting in it? NO!... Well at least not most of them, maybe some douche-bag avant-garde retro-bauhaus designer would, but who cares. Let's face it, in life most of what you do will not be sophisticated or clever or even fun. It will be work. The difference is not in how you deal with the opportunity for creativity but in how you deal with the opportunity for craftsmanship.

There are a few things that I've noticed in my short time in industry:

  • You will do more craft than art in life (even if you're an artist)
  • If you do something at the limits of your comprehension, you will not understand it later
  • Cleverness leads to sadness

After writing those down a few days ago, I've had some time to reflect on what that all means. It also gave me time to decide that the first and last concepts, while important, are just not as useful as the second; I'm throwing them away for now. Really the most important thing you need to remember is:

If you do something at the limits of your comprehension, you will not understand it later

On second thought, maybe this post was the limits of what I can understand; I should stop trying to say something profound...

10.26.2010

Being lazy in Python

One of my favorite features of Haskell is its lazy evaluation of list comprehensions. This makes it possible to take the nth item of a list without first having to calculate the entire list. Pretty sweet if you want to reason about infinite data sets such as the Fibonacci numbers or primes. In a perfect world, I'd be writing Haskell all the time. However, that's just not possible. Python is far more popular, is installed by default on many OSes and can make some tasks very simple. So really, I want lazy list evaluation in Python. The only way to implement this without hacking up the actual interpreter's object space is to mess with iterators. So I've implemented the Fibonacci sequence as an iterator. Essentially, I use the next method to move over the list of Fibonacci numbers and calculate the next one. I then store any numbers which I have already calculated to prevent extra work. I then use that pre-calculated list along with further calculation to implement list subscripting and contains. One thing to note is that itertools.takewhile consumes the first item that fails the condition instead of stopping just before it. Hence the existence of a rollback function. Overall it was a fun exercise in using iterators and who knows, maybe it's actually useful.

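from itertools import takewhile


class Fib(object):
    """Lazy Fibonacci sequence (1, 2, 3, 5, 8, ...) with a cache of computed values."""

    def __init__(self):
        self.n = 0
        self.n_1 = 1
        self.i = 0  # count of numbers generated so far
        self.items = []  # numbers already calculated, kept per instance

    def __contains__(self, item):
        if item in self.items:
            return True
        # Calculate forward until we pass item, caching everything on the way
        self.items += list(takewhile(lambda x: x <= item, self))
        self._rollback()
        return bool(self.items) and self.items[-1] == item

    def __getitem__(self, key):
        # Positions are 1-based: fib[1] is the first number in the sequence
        if key < 1:
            raise IndexError("Fib positions start at 1")
        if key > len(self.items):
            self.items += list(takewhile(lambda x: self.i <= key, self))
            self._rollback()
        return self.items[key - 1]

    def __iter__(self):
        return self

    def __next__(self):
        value = self.n + self.n_1
        self.n = self.n_1
        self.n_1 = value
        self.i += 1
        return value

    next = __next__  # Python 2 spelling of the iterator protocol

    def _rollback(self):
        """
        When iteration gets stopped by takewhile we've gone one too far
        so it needs to be backed up one
        """
        # Undo the last step arithmetically so this works with an empty cache
        self.n, self.n_1 = self.n_1 - self.n, self.n
        self.i -= 1

Update: I've got a gist going of this which includes some bug fixes for slicing.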
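
For comparison, here is a sketch of the same laziness using a plain generator and itertools instead of a hand-written iterator class. This is an alternative approach, not the code from the gist; the fib and nth names are just for illustration, and this version starts the sequence at 0 and counts positions from zero. Because every call to fib() starts a fresh generator, nothing is cached, but no rollback is needed either.

```python
from itertools import islice, takewhile


def fib():
    """Yield the Fibonacci numbers 0, 1, 1, 2, 3, ... without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def nth(iterable, n):
    """Return the item at zero-based position n of a lazy iterable."""
    return next(islice(iterable, n, None))


# Only as many numbers as are asked for get calculated
first_ten = list(islice(fib(), 10))
below_100 = list(takewhile(lambda x: x < 100, fib()))
tenth = nth(fib(), 10)
```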

8.16.2010

GSoC's Over

Well, it was fun, but as of today it's pencils down. I've tagged my repo with a snapshot of my summer's work. I'll be continuing dev of this project so don't despair! In addition, I will probably be speaking at pdxfunc in October about the project. Hope to see you there!