The Economics of Perfect Software

Ask 100 CEOs of software companies if they want to ship software with bugs. What will they say? 50 won’t answer at all, saying something about how bugs are a huge problem in the industry that needs to be addressed; 40 will say “Of course not!” and promptly call their shark tank in preparation for a lawsuit; 9 will hang their heads and say “we can’t help it”; and that last 1 will look you straight in the eye and say “Absolutely.”

I have no idea what that last guy’s doing heading up a software company, because he studied economics.

Software can’t be written bug-free, so if you want to ship perfect software you have to fix the bugs that burrow their way into your code. (And just to head this one off at the pass: No, unit testing, agile processes, scrum, and whatever methodology du jour you may be thinking of won’t prevent all bugs from entering your code base. If I’m wrong, I’m sure you’ll tell me in the comments.)

As you’d expect, the more time and money you throw at fixing bugs, the more bugs you’ll fix. But, unfortunately, our old nemesis from economics, the Law of Diminishing Returns, applies to this process. Formally, the Law states that “the marginal production of a factor of production starts to progressively decrease as the factor is increased, in contrast to the increase that would otherwise be normally expected.” In regular-people English, that just means that what you get out of a process doesn’t keep growing in step with what you put in. Instead, you end up with a quick ramp on output at the low end of input, and a long tail on output at the high end of input.

For example, imagine a program has 100 bugs, and we know it will take 100 units of effort to find and fix all 100 of them. Under the Law of Diminishing Returns, the split might look like this: the first 40 units of effort find the first 70 bugs, the next 30 units find the next 20 bugs, and the last 30 units find the last 10 bugs. This means that the first 70 bugs (the shallow bugs) are cheap to find and squash at only 40 / 70 ≈ 0.571 units of effort per bug (on average). The next 20 bugs (the deep bugs) are significantly more expensive at 30 / 20 = 1.5 units of effort per bug, and the final 10 bugs (the really deep bugs) are astronomically expensive at 30 / 10 = 3 units of effort per bug. The last 10 bugs are more than 5 times more time- and capital-intensive to eliminate per bug than the first 70 bugs. In terms of effort, the difference between eliminating most bugs (say 70%-90%) and all bugs is huge: 100 units versus 40 to 70, or somewhere between 1.4x and 2.5x the effort and cost.
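If you'd rather check that arithmetic than take my word for it, here's a minimal Python sketch. The tier numbers come straight from the example above; the names (TIERS, cost_per_bug, cumulative) are made up for illustration, and the whole thing is only as real as the made-up example it's built on.

```python
# Effort tiers from the example: (tier name, units of effort, bugs fixed).
# These are the illustrative numbers from the text, not measured data.
TIERS = [
    ("shallow", 40, 70),
    ("deep", 30, 20),
    ("really deep", 30, 10),
]


def cost_per_bug(tiers):
    """Return {tier name: average units of effort per bug} for each tier."""
    return {name: effort / bugs for name, effort, bugs in tiers}


def cumulative(tiers):
    """Return a list of (total effort, total bugs fixed) after each tier."""
    totals = []
    effort_so_far = 0
    bugs_so_far = 0
    for _, effort, bugs in tiers:
        effort_so_far += effort
        bugs_so_far += bugs
        totals.append((effort_so_far, bugs_so_far))
    return totals


costs = cost_per_bug(TIERS)
totals = cumulative(TIERS)

for name, cost in costs.items():
    print(f"{name:12s} {cost:.3f} units of effort per bug")

# Compare stopping early against fixing everything.
full_effort = totals[-1][0]
for effort, bugs in totals[:-1]:
    ratio = full_effort / effort
    print(f"stop at {bugs} bugs: fixing all 100 costs {ratio:.2f}x as much")
```

Swap in your own guesses for TIERS and the shape stays the same: the per-bug cost climbs with every tier, and the last few bugs eat a wildly disproportionate share of the budget.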

And in real life it’s actually worse than that. Because you don’t know when you’ve killed the last bug — there’s no countdown sign, like we had in our example — you have to keep looking for more bugs even when they’re all dead just to make sure they’re all dead. If you really want to kill all the bugs, you have to plan for that cost too.

So killing all the bugs in a program is expensive. But let’s imagine for a minute that a software company decides to do it anyway. Software companies don’t set goals like “ship with no bugs” — they set goals like “ship on November 19th” instead — so this new goal would require changes to the company’s testing team and/or development schedule (either planned or unplanned), which in turn would imply an increase in their budget. Now, who do you imagine will pay the difference on their budget? The Company? (Heh.) If you haven’t worked in software, let me give you a hint: uh uh. The company will pass the cost on to the customer. So if you like software you can afford, I have news: you like buggy software. (And Open Source software is the same, by the way, except that instead of having to pay more and wait longer, you’d just have to wait longer. And possibly put up with more-ornery-than-normal developers.)

Now, to be clear, I’m not saying that companies should ship software with lots of big bugs. I’m saying they should ship it with a few little ones.

How do you know whether a bug is big or little? Think about who’s going to hit it, and how mad they’ll be when they do. If a user who goes through three levels of menus, opens an advanced configuration window, checks three checkboxes, and hits the ‘A’ key gets a weird error message for his trouble, that’s a little bug. It’s buried deep, and when the user hits it, he says “huh,” clicks a button, and then goes on his merry way. If your program crashes on launch for a common setup, though, that’s a big bug. Lots of people will hit it, and they will all be pissed.

Ergo, I propose the Golden Rules for Deciding When Your Software Is Ready for Prime Time. The Golden Rules state that you should keep testing your software and fixing bugs until the new bugs you find:

Aren’t embarrassing to your company.
Won’t tick off your customers.
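For the programmers in the audience, the Golden Rules can be sketched as a release check. This is only a toy, assuming somebody on your team has already made the judgment call on each bug; the Bug class, its two flags, and ready_to_ship are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    embarrassing: bool      # judgment call: would this embarrass the company?
    angers_customers: bool  # judgment call: would this tick off customers?


def ready_to_ship(new_bugs):
    """Return True when none of the newly found bugs breaks a Golden Rule."""
    return not any(b.embarrassing or b.angers_customers for b in new_bugs)


# The little bug: buried deep, and the user just says "huh" and moves on.
latest_round = [
    Bug("Weird error after three menus, three checkboxes, and the 'A' key",
        embarrassing=False, angers_customers=False),
]
print(ready_to_ship(latest_round))  # True

# The big bug: lots of people hit it, and none of them are happy.
latest_round.append(
    Bug("Crash on launch for a common setup",
        embarrassing=True, angers_customers=True)
)
print(ready_to_ship(latest_round))  # False
```

The hard part, of course, is filling in those two flags honestly. The code just keeps you from pretending you didn't.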

The cost of fixing all the bugs in your program and then being sure you fixed them all is way too high compared to the cost of having a few users hit some bugs they won’t care about. The mindset here is not to use your customers as your testers (you’re bound to violate the Golden Rules if you do that) but rather to recognize that not all bugs are created equal, and some bugs justify not shipping a product while others don’t. Don’t be afraid to ship software with bugs. If you’ve got a good product that people want, a couple of bugs won’t bother them at all, especially if updates to your product are easy to deploy, as they are with SaaS or a web application.

If your testing passes the Golden Rules, then your customers want your software more than they want you to fix the few little bugs that are left. So release already!

Oh, and don’t forget to ask that last CEO for stock tips. Economists always have the best portfolios.