SLTF Consulting
Technology with Business Sense



The search for the perfect software might be never-ending, but it's not pointless

Scott Rosenthal
March, 1995

I just reread the classic Zen and the Art of Motorcycle Maintenance (see reference). One section of the author's journey involves his attempt to define quality. To paraphrase his excellent text heavily, and with apologies to his logical arguments, he says that you can't define quality, but you know it when you see it.

His arguments led me to think about a problem I was having at my company. One of our services is developing custom software for both embedded systems and PCs. The problem is how to decide when software is "finished" and acceptable for shipping. The simplistic approach says to wait until all the bugs are out. However, pristine bug-free code doesn't exist. All software has bugs waiting for the right combination of conditions to rear their ugly heads. Hence, my dilemma is really figuring out how to meet an acceptable quality level while balancing the cost of finding the next bug. To begin this discussion, I'll consider the morphology of computer bugs and some of the tools available to help you squash them.

Bug morphology

My two decades in this business have taught me that every program has an undiscovered bug, although I like the word defect better. These years have also taught me to accept these defects but to do everything possible to minimize them. The most fundamental tool in this minimization process is to reduce the potential for defects through good design methodologies. Likewise, good design can help minimize the effect of defects by isolating them from the rest of the software.

However, not all software defects are equal. Rather, they come in many different flavors, and their seriousness depends on factors such as the system they're found on, the application and the intended user. For example, we've all experienced having a PC lock up. While extremely annoying, such occurrences are rarely a reason for returning a PC to its manufacturer, and large commercial software vendors typically won't do anything about such problems, either. Their products work most of the time, and that's all their fine-print license agreements promise. The bottom line is that as long as a desktop computer performs well enough to justify its existence, you keep using it, defects and all.

In the embedded world, though, software defects can take on a more ominous role that you can't take as lightly. Next time you reach into a microprocessor-controlled microwave oven, consider the possibility of a line-voltage hiccup causing the processor to go haywire, and ZAP, your hand's a bit warmer. Hence, when classifying software defects, I draw three basic distinctions. First, a severe defect is one that makes the product unusable and/or dangerous. An example of this type of problem is the infamous Therac-25 and its propensity to burn holes in people (not just their tumors) due to a bug in the software controlling the radiation exposure. Another less-serious example, but still potentially severe, might be a digital voltmeter (DVM) that reports the wrong reading at certain voltage levels.

Second, an annoying defect might be a car computer that causes the engine to misfire once every hour. It's not nice, but the system is still usable. Third, a cosmetic problem might consist of a misspelling in screen text or improper debouncing of a keypad. It won't affect sales, but it's an embarrassment.

Using this breakdown to characterize problems allows you to concentrate limited development dollars on problems that truly affect the use or sales of a product (and therefore your livelihood) and to fix cosmetic problems as time allows.
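As a minimal sketch of this breakdown, the three classes can be written down as an ordered type and used to sort a defect list so that limited effort goes to the severe items first. The names Severity and triage, and the sample entries, are illustrative assumptions of mine and not part of any real defect tracker.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three classes described above; a lower value means fix it sooner."""
    SEVERE = 1     # product is unusable and/or dangerous
    ANNOYING = 2   # not nice, but the system is still usable
    COSMETIC = 3   # embarrassing, fix as time allows


def triage(defects):
    """Return (description, severity) pairs ordered most severe first."""
    return sorted(defects, key=lambda defect: defect[1])


# Hypothetical defect list based on the examples in the text.
reported = [
    ("Misspelling in screen text", Severity.COSMETIC),
    ("DVM reports wrong reading at some voltages", Severity.SEVERE),
    ("Engine misfires once every hour", Severity.ANNOYING),
]

for description, severity in triage(reported):
    print(f"{severity.name:9s} {description}")
```

Because the sort is stable, defects of equal severity stay in the order in which they were reported.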

Perfect enough

One problem with detecting software defects is that they can arise from many sources including coding errors, incorrect specifications, electronic glitches and timing race conditions. It's one thing to statically check each possible pathway through a program, but throw in interrupts, DMA, power failures and who knows what else, and a total program checkout becomes physically (and fiscally) impossible. The resulting situation in some ways is similar to the conditions I encountered as a youth working in a stockroom at inventory time. I always got the job of counting the screws and washers. One technique would be to spend the rest of my life counting. Instead, I counted out 100 of each item, weighed that sample, weighed the entire lot, and used the two weights to approximate the number of items in the lot. The resulting count wasn't 100% accurate, but it gave an answer close enough to keep the accountants happy.
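The stockroom shortcut comes down to one division. The sketch below shows the arithmetic; the function name and the weights are made-up values for illustration, not figures from my stockroom days.

```python
def estimate_count(sample_count, sample_weight, bulk_weight):
    """Estimate how many items are in a lot from a counted, weighed sample."""
    if sample_count <= 0 or sample_weight <= 0:
        raise ValueError("sample count and weight must be positive")
    unit_weight = sample_weight / sample_count  # average weight of one item
    return round(bulk_weight / unit_weight)


# Hypothetical numbers: 100 washers weigh 250 g and the full bin weighs 18,400 g.
print(estimate_count(100, 250.0, 18400.0))
```

The estimate is only as good as the sample. If the washers vary in weight, a larger sample narrows the error, which is the same trade-off that statistical software testing faces.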

With software the same principle applies. Given that programs routinely run 100,000 lines or more, program checkout must approach the task from a statistical viewpoint. Hence we can't achieve zero defects, but hopefully we can state what level of defects is acceptable in a given product. For example, Motorola has embarked on a course of action that will improve the defect levels in its electronics to what the firm calls Six Sigma, or about 3.4 defects per million opportunities, which works out to roughly 99.9997% error free. If this mark seems extreme, consider for a moment what even 99% accuracy means. If the power company were 99% successful at delivering power to you, you'd be without electricity about 15 minutes each day. Likewise, if medical professionals were 99% successful at handling newborn babies, they'd be dropping about 30,000 little tykes every year. So Motorola has established Six Sigma as its definition of perfection.
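To check the arithmetic behind these figures, here is a small sketch. The function names are my own; the only inputs are the 99% success rate and the 3.4 defects per million quoted above.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_day(success_rate):
    """Minutes per day without service at a given success rate (0 to 1)."""
    return (1.0 - success_rate) * MINUTES_PER_DAY


def yield_from_dpmo(defects_per_million):
    """Fraction of defect-free opportunities for a defects-per-million rate."""
    return 1.0 - defects_per_million / 1_000_000


print(f"99% power delivery: {downtime_minutes_per_day(0.99):.1f} min/day without power")
print(f"3.4 defects per million: {yield_from_dpmo(3.4):.5%} error free")
```

One percent of a 1,440-minute day is 14.4 minutes, which is where the figure of about 15 minutes comes from.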

Tools of the trade

You can also view software testing with some statistical models. One approach I tried a couple of years ago was to examine some of my software using a metric called cyclomatic complexity. It calculates a score for code based on the number of independent paths that execution can take through a function. For example, if some code is ugly and has hundreds of nested IF statements, it produces a very high score. Conversely, a function without a single conditional statement garners the lowest possible mark. Obviously, the latter example is seldom practical, but the former is avoidable. To calculate the scores for an entire app, the evaluation program steps through the code and calculates a score for each function separately. By graphing these values, you get a feeling for a program's complexity and the potential for defects.
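To make the scoring concrete, here is a minimal sketch that approximates a cyclomatic score as one plus the number of decision points in each function. It is not the evaluation program I used. It analyzes Python source with the standard ast module purely for illustration, and the choice of which constructs count as decisions is a simplifying assumption.

```python
import ast

# Constructs counted as decision points (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_scores(source):
    """Return {function_name: approximate cyclomatic score} for Python source."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # straight-line code has exactly one path
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # each extra operand of and/or adds one more branch
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


SAMPLE = '''
def straight(x):
    return x + 1

def branchy(x, y):
    if x > 0 and y > 0:
        return 1
    for i in range(y):
        if i == x:
            return 2
    return 0
'''

for name, score in cyclomatic_scores(SAMPLE).items():
    print(f"{name}: {score}")
```

In this sketch a nested function's decisions also count toward the enclosing function, which a production tool might handle differently. Sorting or graphing the resulting scores is what points the finger at functions worth a second look.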

As an (admittedly unscientific) example, I put a 65,000-line program of mine through this process. The scores were very acceptable except in a few cases. As it turns out, the cases with high scores were areas of the program that I had patched and so had disparate operations occurring within a single function. I had always felt uneasy about these sections, and now I had a tool that showed the world my anxiety. Again, this tool won't find all defects, but it helps to point a statistical finger at potential problem areas.

A decidedly more common tool is lint. This utility goes through source code looking for a variety of mechanical coding errors such as uninitialized or unused variables, incorrect subroutine calls and other coding violations. In short, it looks for items that are potential problem indicators but that syntax checkers miss. Although I've used lint for a number of years, I still question its usefulness compared to strict typing standards and a good software-standards manual.

Finally, a metric I sometimes find useful is the visceral method. If your stomach churns every time you power on a system, it's probably not yet time to ship the product. At my company, when a programmer says that a software package is finished, I then turn it over to someone else to beat on. This second-tier tester might be an engineer, a secretary or even my four-year-old. The software must survive at the hands of the testers.

However, I don't have the resources for exhaustive testing (no one does), so we use one final technique to zero in on potential problem areas. I ask the primary software developer to imagine that the software has a defect and, based on that assumption, to say where that defect might be. All software developers have areas in their software they're less than delighted with, be it an interrupt service routine (ISR), a critical timing loop, a keypad handler or even the horrendous calculations that some embedded designs perform. Based on this information, the testers beat on the system some more.

During all this testing, I also keep track of the number of problems the testers find. As I see the frequency of these hits decrease, I know that we're approaching the point where trying to find more defects no longer justifies the cost, and it's time to ship the software to our customers. This technique isn't perfect because our customers do still find defects. Hopefully, though, this technique finds the vast majority of serious problems before they reach the field. PE&IN
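The bookkeeping behind this last technique can be sketched in a few lines. The window length, the threshold and the weekly counts below are hypothetical values chosen for illustration; the real decision also rests on judgment about how serious the remaining defects are.

```python
def average_hits(hits_per_week, window):
    """Mean number of defects found per week over the last `window` weeks."""
    recent = hits_per_week[-window:]
    return sum(recent) / len(recent)


def ready_to_ship(hits_per_week, open_severe, window=3, threshold=1.0):
    """Hypothetical rule: no open severe defects and a low recent find rate."""
    if open_severe > 0 or len(hits_per_week) < window:
        return False
    return average_hits(hits_per_week, window) <= threshold


# Made-up tester log: defects found in each week of testing.
weekly_hits = [14, 11, 9, 6, 4, 2, 1, 0]

print(ready_to_ship(weekly_hits[:5], open_severe=0))  # early in testing
print(ready_to_ship(weekly_hits, open_severe=0))      # after the rate drops
```

Averaging over a few weeks keeps one quiet week from triggering a premature release.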


Pirsig, RM, Zen and the Art of Motorcycle Maintenance, Bantam (New York NY) 1975, ISBN 0-553-10310-5.

Copyright © 1998-2012 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.