The search for the perfect software might be never-ending, but it's not pointless
I just reread the classic Zen and the Art of Motorcycle Maintenance (see
reference). One section of the author's journey involves his attempt to define quality.
Significantly paraphrasing his excellent text (and with apologies to his logical
arguments), he says that you can't define quality, but you know it when you see it.
His arguments led me to think about a problem I was having at my company. One of our
services is developing custom software for both embedded systems and PCs. The problem is
how to decide when software is "finished" and acceptable for shipping. The
simplistic approach says to wait until all the bugs are out. However, pristine bug-free
code doesn't exist. All software has bugs waiting for the right combination of conditions
to rear their ugly heads. Hence, my dilemma is really figuring out how to meet an acceptable
quality level while balancing the cost of finding the next bug. To begin this
discussion, I'll consider the morphology of computer bugs and some of the tools available
to help you squash them.
My two decades in this business have taught me that every program has an undiscovered
bug, although I like the word defect better. I guess these years have taught me to accept
these defects but to do everything possible to minimize them. The most fundamental tool in
this minimization process is reducing the potential for defects through good design
methodology. Likewise, good design can help minimize the effect of defects by isolating
them from the rest of the software.
However, not all software defects are equal. Rather, they come in many different
flavors, and their seriousness depends on factors such as the system they're found on, the
application and the intended user. For example, we've all experienced having a PC lock up.
While extremely annoying, such occurrences are rarely a reason for returning a PC to its
manufacturer, and large commercial software vendors typically won't do anything about such
problems, either. Their products work most of the time, and that's all their fine-print
license agreements state. The bottom line is that as long as a desktop computer performs
well enough to justify its existence, you keep using it, defects and all.
In the embedded world, though, software defects take on a more ominous role that
you can't treat so lightly. The next time you reach into a microprocessor-controlled
microwave oven, consider the possibility of a line-voltage hiccup causing the processor to
go haywire, and ZAP, your hand's a bit warmer. Hence, when classifying software defects, I
draw three basic distinctions. First, a severe defect is one that makes the product
unusable and/or dangerous. An example of this type of problem is the infamous Therac-25
and its propensity to burn holes in people (not just their tumors) due to a bug in the
software controlling the radiation exposure. Another example, less dramatic but still
potentially severe, might be a DVM that reports the wrong reading at certain voltages.
Second, an annoying defect might be a car computer that causes the engine to misfire
once every hour. It's not nice, but the system is still usable. Third, a cosmetic problem
might consist of a misspelling in screen text or improper debouncing of a keypad. It won't
affect sales, but it's an embarrassment.
Using this breakdown to characterize problems allows you to concentrate limited
development dollars on problems that truly affect the use or sales of a product (and
therefore your livelihood) and to fix cosmetic problems as time allows.
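To make the triage concrete, the three-way breakdown might look like this in code. This is a minimal sketch; the enum and priority values are my own invention, not something from the column:

```c
#include <assert.h>

/* The column's three defect classes. */
enum defect_class { SEVERE, ANNOYING, COSMETIC };

/* Lower number = fix sooner.  Severe and annoying defects get the
   limited development dollars first; cosmetic ones wait for spare time. */
static int fix_priority(enum defect_class c)
{
    switch (c) {
    case SEVERE:   return 0;   /* unusable and/or dangerous */
    case ANNOYING: return 1;   /* usable, but degraded */
    default:       return 2;   /* embarrassing only */
    }
}
```

The point of encoding the classes at all is that a bug list sorted this way keeps cosmetic items from crowding out the dangerous ones.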
One problem with detecting software defects is that they can arise from many sources,
including coding errors, incorrect specifications, electronic glitches and timing race
conditions. It's one thing to statically check each possible pathway through a program,
but throw in interrupts, DMA, power failures and who knows what else, and a total program
checkout becomes physically (and fiscally) impossible. The resulting situation in some
ways is similar to the conditions I encountered as a youth working in a stockroom at
inventory time. I always got the job of counting the screws and washers. One approach
would have been to spend the rest of my life counting. Instead, I counted out 100 of each
item, weighed that sample, and used the result to approximate the number of items in a
larger batch whose total weight I had measured. The resulting count wasn't 100% accurate, but it gave
an answer close enough to keep the accountants happy.
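The stockroom arithmetic is worth a quick sketch. The units and numbers here are my own illustrative choices, not the stockroom's:

```c
#include <assert.h>

/* Count-by-weight: weigh a hand-counted sample, compute the weight per
   item, then scale up from the total weight of the whole bin. */
static long estimate_count(double sample_weight, int sample_count,
                           double total_weight)
{
    double per_item = sample_weight / sample_count;
    return (long)(total_weight / per_item + 0.5);   /* round to nearest */
}
```

If 100 screws weigh 50 g, a bin weighing 5,000 g holds roughly 10,000 screws; close enough to keep the accountants happy.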
The same principle applies to software. Given that programs routinely run to
100,000 lines or more, program checkout must approach the task from a statistical
viewpoint. Hence, we can't achieve zero defects, but hopefully we can state what level of
defects is acceptable in a given product. For example, Motorola has embarked on a course
of action that will improve the defect levels in its electronics to what the firm calls
Six Sigma: about 3.4 defects/million opportunities, or 99.99966% error free. If
this mark seems extreme, consider for a moment what even 99% accuracy means. If the power
company were 99% successful at delivering power to you, you'd be without electricity about
15 min each day. Likewise, if medical professionals were 99% successful at handling
newborn babies, they'd be dropping about 30,000 little tykes every year. So Motorola has
established Six Sigma as its definition of perfection.
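The 99% figures above are easy to verify. A throwaway sketch of the arithmetic:

```c
#include <assert.h>

/* Minutes of downtime per day at a given success rate:
   a day has 24 * 60 = 1,440 minutes. */
static double downtime_minutes_per_day(double success_rate)
{
    return (1.0 - success_rate) * 24.0 * 60.0;
}
```

At 99%, that works out to 14.4 minutes a day, matching the "about 15 min" above; at Six Sigma's 99.99966%, the equivalent figure drops to a fraction of a second.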
Tools of the trade
You can also view software testing through some statistical models. One technique I tried
a couple of years ago was to evaluate some of my software using a metric called
cyclomatic complexity. It scores code based on the number of independent paths that a
function can take. For example, if some code is ugly and has hundreds of nested IF
statements, it produces a very high score. Conversely, a function without a single
conditional statement garners a very low mark. Obviously, the latter example is seldom
practical, but the former is avoidable. To calculate the score for an entire application, the
evaluation program steps through the code and calculates a score for each function
separately. By graphing these values, you get a feeling for a program's complexity and the
potential for defects.
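For the curious, the scoring idea can be sketched in a few lines using McCabe's rule that a function's complexity equals its decision points plus one. This is a deliberately naive counter of my own; real tools parse the language properly:

```c
#include <string.h>

/* Count non-overlapping occurrences of tok in src. */
static int count_token(const char *src, const char *tok)
{
    int n = 0;
    for (const char *p = src; (p = strstr(p, tok)) != NULL; p += strlen(tok))
        n++;
    return n;
}

/* McCabe's cyclomatic complexity: decision points + 1.  Raw substring
   matching is crude ("iffy" would count as an "if"), so treat this as
   an illustration of the metric, not a real analyzer. */
static int cyclomatic_complexity(const char *src)
{
    static const char *decisions[] = { "if ", "while ", "for ", "case ", "&&", "||" };
    int d = 0;
    for (size_t i = 0; i < sizeof decisions / sizeof decisions[0]; i++)
        d += count_token(src, decisions[i]);
    return d + 1;
}
```

A straight-line function scores 1; every branch, loop or case adds one, which is why a patched-up function full of nested conditionals lights up on the graph.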
As an (admittedly unscientific) example, I put a 65,000-line program of mine through
this process. The scores were very acceptable except in a few cases. As it turns out, the
cases with high scores were areas of the program that I had patched and so had disparate
operations occurring within a single function. I had always felt uneasy about these
sections, and now I had a tool that showed the world my anxiety. Again, this tool won't
find all defects, but it helps to point a statistical finger at potential problem areas.
A decidedly more common tool is lint. This utility goes through source code looking for
a variety of mechanical coding errors such as uninitialized or unused variables, incorrect
subroutine calls and other coding violations; in short, items that are potential problem
indicators but that syntax checkers miss. Although I've used lint for a number of years, I
still question its usefulness compared to strict typing standards and a good compiler.
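As an illustration of the mechanical slips lint hunts for (a contrived fragment of mine, not from any real project), consider a summing routine. The broken version appears in the comment; the corrected version compiles below:

```c
/* What lint would flag in a naive version of this routine:
 *
 *     int sum_readings(const int *r, int n)
 *     {
 *         int total;            <-- lint: possibly used before set
 *         int spare;            <-- lint: declared but never used
 *         for (int i = 0; i < n; i++)
 *             total += r[i];
 *         return total;
 *     }
 *
 * Both slips are legal C, so a plain syntax check passes them.
 * The corrected version: */
static int sum_readings(const int *r, int n)
{
    int total = 0;                     /* initialized before use */
    for (int i = 0; i < n; i++)
        total += r[i];
    return total;
}
```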
Finally, a metric I sometimes find useful is the visceral method. If your stomach
churns every time you power on a system, it's probably not yet time to ship the product.
At my company, when a programmer says that a software package is finished, I then turn it
over to someone else to beat on. This second-tier tester might be an engineer, a secretary
or even my four-year-old. The software must survive at the hands of the testers.
However, I don't have the resources for exhaustive testing (no one does), so we use one
final technique to zero in on potential problem areas. I ask the primary software
developer to imagine that the software has a defect and, based on that assumption, to say
where that defect might be. All software developers have areas in their software they're
less than delighted with, be it in an ISR, a critical timing loop, a keypad handler or even
in the horrendous calculations that some embedded designs perform. Based on this
information, the testers beat on the system some more.
During all this testing, I also keep track of the number of problems the testers find.
As I see the frequency of these hits decrease, I know that we're approaching the point
where trying to find more defects no longer justifies the cost, and it's time to ship the
software to our customers. This technique isn't perfect because our customers do still
find defects. Hopefully, though, this technique finds the vast majority of serious
problems before they reach the field. PE&IN
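The ship/no-ship judgment in that last paragraph can even be mechanized a little. A sketch, where the window and threshold are illustrative choices of my own rather than figures from our shop:

```c
/* Ship when the defect find rate tapers off: sum the defects found in
   the last `window` test sessions and compare against a threshold. */
static int ready_to_ship(const int *finds_per_session, int sessions,
                         int window, int threshold)
{
    if (sessions < window)
        return 0;                       /* not enough history yet */
    int recent = 0;
    for (int i = sessions - window; i < sessions; i++)
        recent += finds_per_session[i];
    return recent <= threshold;
}
```

With find counts of 9, 7, 4, 1, 0, 0 across six sessions and a threshold of one defect over the last three, the rule says ship; early in testing, it says keep beating on the system.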
Pirsig, R.M., Zen and the Art of Motorcycle Maintenance, Bantam (New York, NY),
1975, ISBN 0-553-10310-5.