10.20.2013

Thoughts on Configuration

Recently, I was reviewing the ecosystem of configuration management tools, since
I've worked heavily in Salt and Chef. I am very impressed by the efforts of
everyone to improve how we handle configuration. It is getting a lot of
mind-share and seems to be one of the difficult problems to solve
today. However, I wonder if fine-grained configuration is the right way to build
and deploy large applications. This motivated me to share some ideas I've had on
how we approach large projects and why we need configuration.

Large projects are common because the reuse of code for complex
domains is vital to the progress of software engineering. I believe a
particular project grows large for two reasons: 1. simple projects
become complex when the solution space of a tool expands beyond its original
domain, 2. tools require configuration when they need to manage conflicting
domains. A project that solves problems it wasn't originally designed for will
conflict with its own design. While that might be useful to the
current user base, it creates flaws in the project's architecture.

Imagine a house which has been designed for one story, and then adding another
story to it. This creates some structural issues for the house; the foundation
and exterior walls were never designed for that much weight. Over time that
house might have issues. A safe thing to do is to shore up the foundation and
strengthen the walls. What if you never consulted an architect and added another
story anyway? This is the state of many software projects.

Let's rewind a bit and talk about blueprints, which many software projects do not
have. In the face of changing requirements, a good architect would hopefully go
back and consult blueprints. A good architect is bound by the cost and time of
construction. [For some reason, we often don't consider the realities of
software costs, but that is for another post.] The plans will be adjusted
significantly before construction starts, but once it is underway very little
adjustment can happen. Certainly, any features added will be added at great
cost. The architect usually does not have a house which can be reconfigured to
fit the whims of a vast array of occupants once the house is constructed, because
a house is a small project.

Software does not share the same costs associated with physical
construction. Most developers, myself absolutely included, will write software
before the problem is well defined. We will write software just to define a
problem. If not for the incredibly low cost of experimentation, this would be
similar to building a house to see if you thought it would be livable. We have
no other way to experiment with the objects that we create so we write code and
use it. Not using the code you write means you are not writing the right
code. One industry term for this is a 'spike' and it is similar to a first draft
of some house blueprints.

Yet there's a hidden danger in these small projects; spikes are often not thrown
away. Instead, you decide to keep the spike and cultivate it. Now you're starting to
add on that extra story or two or three. I like to imagine this is like an
architect taking a bungalow and making it a five bedroom mansion. There's little
thought given to projects the size of a house, but we will use them anyway
because they seem useful. Even less obvious is how much complexity gets added to
these small projects. Rather than only adding on a few bedrooms or an extra
floor, we are able to grow these projects into skyscrapers.

Suppose that you are designing a sports stadium. You know the stadium will be
utilized for many kinds of events only some of which you can plan for now. Given
the size of the project, a requirement for this structure is allowing the space
to be configured for many purposes. If you've never seen a floor change at an
arena like Madison Square Garden, they can take the main area of the arena from
hockey rink to basketball court to a concert hall in a matter of hours. All of
this is possible because the actual structure of the arena is simple and the
important features are configurable.

This stadium is carefully designed to make what must be changed flexible, while
keeping the remainder of the stadium static. Our architect has planned a sports
stadium to host a variety of events by using a loosely specified floor as the
foundation of the space. The administrators are free to fill that blank space
with components such as basketball courts or concert stages and seating. The
outer shell of the stadium is much more rigid and can be that way because it has
the very clear task of seating a majority of the spectators.

Enter the configuration. By giving every user a number of switches they can flip
on and off, the sky-scraping sports stadium can be molded to fit a variety of
uses. How do you even decide what should be configurable and what should be
static? Well, I've seen a variety of approaches, but usually whatever becomes
configurable is based on who wants a feature regardless of its benefit to the
overall project. If you're a developer working on this project, you will
probably "nerd out" and hack configuration into the code. It'll be challenging
and you love challenges; that's why you write software. Alas, I pity the poor
users who end up on the receiving end of projects like these. At this point
they could be using the code because they wanted a house, a skyscraper, a
basketball court, or a concert hall. They will have to accept some complexity
and a lot of compromise with the configuration.

The worst part of large configurable software is that it tries to be everything
to everyone and ends up exposing a poor control mechanism for the solutions it
offers. Configuration is not a good API. It is completely tied to implementation
details of the configured application. It can only be tested through careful and
fragile integration tests. It often suffers from minimal documentation and
documentation drift. A typical user is just not prepared for that.
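
To make that concrete, here is a minimal sketch in Python. Everything in it is
hypothetical: the function names and the "worker_pool_size" setting are made up
for illustration and do not come from any real tool. It shows one way a
configuration key behaves worse than an ordinary function argument: a
misspelled key is silently ignored, while a misspelled argument fails
immediately.

```python
# Hypothetical application that reads its settings from a plain dict,
# the way many config-file-driven tools do.
def start_from_config(config):
    # Unknown or misspelled keys are never looked at; the default wins.
    pool_size = config.get("worker_pool_size", 4)
    return {"pool_size": pool_size}


# The same (hypothetical) feature exposed as an ordinary library function.
def start(worker_pool_size=4):
    return {"pool_size": worker_pool_size}


# A typo in the configuration key goes unnoticed: the user asked for 16
# workers but quietly gets the default.
typo_result = start_from_config({"worker_pool_sise": 16})

# The same typo against the function fails right away with a TypeError.
try:
    start(worker_pool_sise=16)
    api_error = None
except TypeError as exc:
    api_error = exc
```

With the dict, the only way to notice the mistake is to run the whole thing and
check how it behaves, which is exactly the kind of integration test I am
complaining about.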

Configurations are not a rich data source. They are not a formal language which
we can reason about. Treating configuration as code is "bad". For anything
complex you have to write custom code around the existing project which was
never designed to be a library. The problem of configuration is so bad now that
we have entire projects dedicated to managing the configurations of other
projects. There are clearly some things that you would want to manage like user
accounts and network configuration, but there's also a large subset of projects
where your only way to deploy them is to fiddle with their configurations.

When you are designing and publishing libraries that do explicit tasks, you are
giving users all the power. They are free to select and compose your library's
features to get the functionality they require. If they do want to allow
configuration of these features in their system, they are absolutely free to do
that. They are the domain experts for their deployment. Pushing the
configuration upstream to where it is actually used puts the choice of how and
what can be configured in the experts' hands. You no longer need to make any
assumptions about which knobs people will need.
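
As a rough sketch of what I mean, assume a tiny library with two explicit
functions and an application built on top of it. All of the names here
(compress, checksum, archive, and the ARCHIVE_LEVEL variable) are invented for
the example. The library has no configuration at all; the application author
decides that exactly one thing should be a knob and how it is read.

```python
import hashlib
import os
import zlib


# --- the (hypothetical) library: explicit tasks, no configuration file ---
def compress(data: bytes, level: int) -> bytes:
    """Compress data with zlib at the given level (0-9)."""
    return zlib.compress(data, level)


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


# --- the user's application: they own what is configurable ---
def archive(data: bytes, environ=os.environ):
    # The application author chose to expose one knob, ARCHIVE_LEVEL,
    # and chose to read it from the environment. The library knows
    # nothing about either decision.
    level = int(environ.get("ARCHIVE_LEVEL", "6"))
    packed = compress(data, level)
    return packed, checksum(packed)


# Example use, passing a plain dict in place of the real environment.
packed, digest = archive(b"hello" * 100, environ={"ARCHIVE_LEVEL": "9"})
```

A different user could wire the same two functions to a command line flag, a
database row, or nothing at all, and the library would not have to change.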

Not everything can be configurable; it's just not sustainable. We need more
libraries. We don't need a command line interface with lots of options for every
tool out there. We don't need more glue layers for the sake of
configuration. Just write a decent library that does a thing or two really well
and test the hell out of it. When you consider the amount of fiddling people do
with configurations, this is a significant simplification for the implementation
of their system. You are putting the control back into the users' hands, although
they might not always see it that way. Remember, when you let someone configure
your code, they're trusting you. Don't make promises you can't keep.