Have you ever worked on a system that was just impossible to maintain?
You spend hours trawling through the code until you finally think you understand what’s going on, but when you make your change things fall apart. You introduce ten new bugs in places you thought had nothing to do with the code you changed.
You wade through line after line of code only to discover the method you are trying to understand isn’t being called anymore. It’s dead weight dragging the codebase down.
It feels like you’re seeing double: there are multiple pieces of code that seem to do almost the same thing, but not quite. They are 90% the same, with a few minor differences.
You are not alone.
This scenario is the norm rather than the exception. Luckily there is an explanation, and a way to avoid having your code end up in the same situation:
You need to tame complexity.
Complexity is the root cause of the majority of software problems today. Problems like unreliability, late delivery, lack of security and poor performance. Taming complexity is the most important task for any good programmer.
In the words of Edsger W. Dijkstra:
… we have to keep it crisp, disentangled and simple if we refuse to be crushed by the complexities of our own making.
So why is complexity so dangerous?
To avoid a problem you must first understand the problem.
The more complex a system is the harder it is to understand. The harder a system is to understand the more likely you are to introduce more unnecessary complexity.
This is the reason complexity is so dangerous. Every other problem in software development is either a side effect of complexity or only a problem because of complexity.
Complexity has the same impact on your codebase as compound interest has on your credit card balance.
… it is important to emphasise the value of simplicity and elegance, for complexity has a way of compounding difficulties.
-Fernando J. Corbató
How complexity impacts understanding
In a previous post I took an in-depth look at how programmers understand code. For the purposes of this discussion we can simplify this to two broad approaches: testing and informal reasoning. Both are useful, but both have limits.
Complexity makes testing less effective
With testing you try to understand the code from the outside. You observe how the system behaves under certain conditions and make assumptions based on that.
The problem with testing is that all it tells you is how a system acts under the conditions you tested. It doesn’t say anything about how it would act under different conditions.
The more complex a system the more potential states it might be in. The more potential states a system has the more tests you need. Unfortunately you can never have enough tests.
…testing is hopelessly inadequate… (it) can be used very effectively to show the presence of bugs but never to show their absence.
-Edsger W. Dijkstra
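A minimal Python sketch of Dijkstra’s point (the `days_in_month` helper and its bug are invented for illustration): every test passes, yet a bug hides in a condition the tests never exercise.

```python
def days_in_month(month, year):
    """Invented example: days in a given month (1-12)."""
    if month == 2:
        return 28  # bug: ignores leap years entirely
    if month in (4, 6, 9, 11):
        return 30
    return 31

# These tests all pass, so the function looks correct from the outside:
assert days_in_month(1, 2021) == 31
assert days_in_month(2, 2021) == 28
assert days_in_month(4, 2021) == 30

# But the untested condition hides the bug:
# days_in_month(2, 2020) should be 29, yet it returns 28.
```

The tests show the presence of correct behaviour under the tested conditions; they say nothing about the leap-year case nobody thought to write a test for.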
Complexity makes informal reasoning more difficult
When using reasoning you try to understand the system from the inside. By using the extra information available you are able to form a more accurate understanding of the program.
If you have a more accurate understanding of the program you are better able to foresee and avoid potential problems.
The more complex a system the more difficult it becomes to hold all that complexity in your mind and make well informed decisions.
While improvements in testing lead to more errors being detected, improvements in reasoning lead to fewer errors being created.
The three main causes of complexity
Before we can avoid complexity we need to understand what creates it.
Some complexity is just inherent in the problem you are trying to solve. Complex business rules for example. Other complexity is accidental and not inherent in the problem.
Your aim as a programmer is to keep accidental complexity to an absolute minimum. A program should be as simple as possible given the requirements.
The three main causes of accidental complexity are: state, control flow and code volume.
State

One of the first things my computer science teacher taught us in high school was to avoid global variables like the plague. They would cause endless bugs. As Forrest Gump would say:
A global variable is like a box of chocolates, you never know what you’re gonna get.
As you reduce the scope of a variable you reduce the damage it can do, but you never really make the problem go away. This is why pure functional languages like Haskell don’t allow mutable state.
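A small Python sketch of the difference in blast radius (the names and numbers are invented for illustration): a global variable lets distant code silently change every caller’s results, while passing the state in explicitly confines it.

```python
discount_pct = 10  # global mutable state: any code in the program can change it

def price_after_discount(price):
    # The result depends on whatever `discount_pct` happens to be right now.
    return price * (100 - discount_pct) // 100

def start_sale():
    global discount_pct
    discount_pct = 50  # a change far away silently alters every caller

print(price_after_discount(100))  # 90
start_sale()
print(price_after_discount(100))  # 50 -- same call, different answer

# Reducing scope limits the damage: pass the state in explicitly.
def price_after_discount_pure(price, discount_pct):
    return price * (100 - discount_pct) // 100
```

The pure version can only be affected by its arguments, so every call site is independently understandable.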
State makes programs hard to test.
Tests tell you about the behaviour of a system in one state and nothing at all about its behaviour in another state.
The more states you have the more tests you need.
Even if you cover all the possible states (which is virtually impossible for any reasonably sized project) you are relying on the fact that the system will always act the same way given a set of inputs regardless of the hidden internal state of that system. If you have ever tried testing a system with even a tiny bit of concurrency you know how dangerous this assumption can be.
State makes programs hard to understand.
Thinking about a program involves a case by case mental simulation of the behaviour of the system.
Since the number of possible states grows exponentially with the number of variables (each new variable multiplies the total by its number of possible values) this mental process buckles very quickly.
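As a rough illustration of that growth, here is a small Python sketch (the flag names are invented for the example): three boolean flags already produce eight combinations, and every extra flag doubles the count.

```python
from itertools import product

# Three boolean flags give 2**3 = 8 combinations to reason about.
flags = ["logged_in", "is_admin", "trial_expired"]  # invented names
states = list(product([False, True], repeat=len(flags)))
print(len(states))  # 8

# Ten flags: 2**10 = 1024 states, far beyond case-by-case mental simulation.
print(2 ** 10)
```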
The other problem is that a stateless procedure that uses any stateful procedure becomes stateful itself. In this way state contaminates the rest of your program once it creeps in. There is an old proverb that says “If you let the camel’s nose into your tent, the rest of him is sure to follow”.
Beware of the camel’s nose.
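A hypothetical Python sketch of this contamination (all names invented): `article_url` looks like a pure string function, but it inherits statefulness from the cache it calls through, so distant code can change its answers.

```python
cache = {}  # shared mutable state

def slug(title):
    # A genuinely pure string transformation.
    return title.lower().replace(" ", "-")

def cached_slug(title):
    # Looks harmless, but reads and writes shared state.
    if title not in cache:
        cache[title] = slug(title)
    return cache[title]

def article_url(title):
    # Pure-looking, yet stateful: it calls cached_slug, so anything
    # that touches `cache` can change its result.
    return "/articles/" + cached_slug(title)

print(article_url("Hello World"))  # /articles/hello-world
cache["Hello World"] = "oops"      # distant mutation of shared state...
print(article_url("Hello World"))  # /articles/oops -- changed without being touched
```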
Control flow

Control flow is any code that determines the order in which things happen.
In any system things happen, and they must happen in a specific order, so at some point this order must be relevant to somebody.
The problem with control flow is that it forces you to care not just about what a system does but also about how it does it. In most languages this is hard to avoid, because the order of operations is implicit in the way the code is written.
Functional languages are slightly better at hiding exactly what is being done than pure imperative languages.
Compare a map function in a functional language to an explicit foreach loop. With the former you just need to know what “map” does, with the latter you need to inspect the loop, and figure out that it is creating a new set of values from an old set.
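In Python, for example, the same transformation can be written both ways (the prices and tax rule are invented for illustration):

```python
prices = [100, 250, 80]

# Imperative: to see that this builds a new list, you must trace the loop.
with_tax = []
for p in prices:
    with_tax.append(p + p // 5)  # add 20% tax

# Declarative: `map` states the intent; the iteration is a hidden detail.
with_tax_mapped = list(map(lambda p: p + p // 5, prices))

print(with_tax)         # [120, 300, 96]
print(with_tax_mapped)  # [120, 300, 96]
```

Both produce the same result, but only the first forces you to think about the mechanics of building the list.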
Since we mostly work in languages where control flow is implicit, we need to learn a few tricks to limit its impact on our code. Unfortunately those tricks are outside the scope of this article; I’ll cover the topic at length in a future one.
Code volume

This is the easiest cause of complexity to measure.

Code volume is often just a side effect of the previous two problems, but it’s worth discussing on its own because it has such a compounding effect. The complexity of a system grows far faster than linearly with the size of its code base: more code makes the system harder to understand, and a harder-to-understand system invites still more code.
This interaction quickly spirals out of control so it’s vital to keep a tight grip on code volume.
Secondary causes of complexity
Besides the three main causes discussed, there are a variety of secondary causes of complexity, including:
- Duplicated code
- Dead code (unused code)
- Missing abstractions
- Unnecessary abstraction
- Poor modularity
- Missing documentation
This list could go on and on, but it can be summarised by three principles:
Complexity breeds complexity
Complexity is so insidious because it multiplies. There are a whole host of secondary causes of complexity that are introduced simply because the system is already complex.
Code duplication is a prime example. The more code you have, the harder it is to know every piece of functionality in the system. Duplication is often introduced simply because you forgot, or never knew, that code already exists that does what you need.
Even if you know there is a piece of code that does something similar to what you want, you are often not sure whether it does exactly what you need. When there is time pressure, and the code is complex enough that understanding it would take significant effort, there is a huge incentive to duplicate.
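A tiny, invented Python example of the 90%-the-same problem, and the shared abstraction that removes it:

```python
# Two near-duplicate functions, 90% the same (an invented example):
def format_invoice(lines):
    total = sum(qty * price for qty, price in lines)
    return f"INVOICE\ntotal: {total:.2f}"

def format_quote(lines):
    total = sum(qty * price for qty, price in lines)
    return f"QUOTE\ntotal: {total:.2f}"

# Extracting the shared 90% leaves exactly one place to fix future bugs:
def format_document(kind, lines):
    total = sum(qty * price for qty, price in lines)
    return f"{kind}\ntotal: {total:.2f}"
```

With the duplicated versions, a bug in the totalling logic must be found and fixed twice; with the shared version, once.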
Simplicity is hard
It often takes significant effort to achieve simplicity.
The first solution is hardly ever the simplest. It takes effort to understand the problem deeply and try a number of approaches until you find the simplest possible way to solve it.
This is hard to do, especially if there is existing complexity or there is time pressure. Luckily we never have to work with legacy code under unreasonable time pressures (note the sarcasm).
Simplicity can only be achieved if it is recognised, sought and prized.
Getting non-technical stakeholders to understand this is difficult, especially since the cost of neglecting simplicity is only paid further down the line.
Power corrupts

If the language you use gives you the option to do something that introduces complexity, at some point you will be tempted to do it.
In the absence of language-enforced constraints, like Haskell’s immutability, mistakes and abuses can and will happen.
Garbage collection is a good example of power traded for simplicity. In garbage-collected languages you lose the power of manual memory management: you give up explicit control of how and when memory is allocated and freed. In return, memory leaks become a far less common problem.
In the end the more powerful a language is the harder it is to understand systems constructed in it.
Simplicity is not optional
Unnecessary complexity is a dangerous thing.
Once it creeps into your solution it grows like a cancer. Over time it strangles your ability to understand, change and maintain your code.
Every decision you make should prioritise simplicity. Learn to recognise complexity when you see it. Strive to find a simpler solution. Value clear code over complex solutions.
Your mission is to pursue simplicity at all costs.