SLTF Consulting
Technology with Business Sense



A project in the field must adapt to harsh reality

Scott Rosenthal
February, 1997

We've all experienced failures such as cars reporting errors that don't exist, VCRs that flash the time as 12:00 and computerized toys that don't function. The reason is simple: compared to the benign environment of a development lab, the real world is a harsh place for electronic systems. Hence engineers must design, develop and test products with real-world events in mind. Ignoring "cockpit" error, I've found that the three main causes of failures in the field are inadequate signal debouncing, electrostatic discharge (ESD) and line-power problems. Therefore, this month I'll begin addressing how to make embedded systems more robust by looking at these three issues, starting with signal debouncing.

Rebounding logic levels

While many real-world conditions can obviously affect electronic systems adversely, one of the most common problems is transient conditions on input signals. I’m not primarily concerned with debouncing operator controls and switches—those are problems you find in the lab. Instead, I’m talking about a lack of robustness in sensing faulty conditions.

Many systems incorporate status sensors to indicate if a faulty condition exists. Examples include lamp burnout, door open, high-voltage failure and air-pressure faults. Often a system uses a signal from one of these status sensors to abort its operation. This function normally works great in the lab where a designer can emulate the signal input with a clean logic level. In the field, though, problems can arise with this monitoring when contact bounce or other transient conditions fool the monitor software into declaring a fault.

If a sensor signal can stop system operation, the system must then verify that the problem signal really exists. One technique is to debounce all input status signals. The assumption that makes this technique work is that most legitimate faults aren't transient events. Consider a system in which a motor drives a fan; the system then monitors the fan's operation with an air-pressure switch. When the motor starts, a transient event lasting a few milliseconds can occur on the air-pressure signal. This event might result from crosstalk between signals, a ground loop, a mechanical coupling between the fan and sensor or the sensor's construction. The important point is that in most situations, a glitch of a few milliseconds on an air stream is probably meaningless; a true air-pressure fault lasts for a second or more. Using this information, you can design software to ignore fast transients and declare faults only on stable long-term signals.

One way of debouncing a status signal in software is to set up a structure such as the one in Listing 1a. The two general-purpose debouncing functions in Listing 1b, in turn, can employ this structure to determine the validity of any status signal. Listing 2 shows this code in action debouncing an air-pressure switch that the system checks every 100 msec. This code assumes that to declare a failure, the switch must return an error state for 1 sec (ten monitor cycles).
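
The cycle count used in Listing 2 is simply the required fault duration divided by the monitor period. A minimal sketch of that arithmetic follows, assuming a hypothetical helper named DebounceCycles that is not part of the original listings. It rounds up so the effective debounce time is never shorter than requested.

```c
/* Hypothetical helper, not part of the original listings: converts a
 * required fault duration into a number of monitor cycles.
 * Rounds up so the effective debounce time is never too short. */
int DebounceCycles(int fault_ms, int period_ms) {
	if (period_ms <= 0 || fault_ms <= 0) {
		return 1;	/* assumption: fall back to a single cycle */
	}
	return (fault_ms + period_ms - 1) / period_ms;	/* ceiling division */
}
```

For the air-pressure example, DebounceCycles(1000, 100) gives the ten cycles that Listing 2 uses as its hysteresis value.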

The advantage of using the structure and the two support functions is that the software module becomes a generic debouncer for status signals. Just make sure that the hysteresis value multiplied by the repetition period of the code is long enough to truly debounce the signal. This technique can also debounce measurement problems. For example, consider a system monitoring an analog signal. Any analog measurement shows some degree of variability, and as a level approaches an error threshold this variability looks just like contact bounce. The code in Listing 1 debounces this input as well; the only difference is that the code measures an analog rather than a binary signal.
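
The analog case might look something like the sketch below, which applies the same counting scheme to a reading compared against a limit once per monitor cycle. The names TAnalogDebounce, AnalogInit and AnalogCheck are hypothetical, and treating "above the limit" as the bad direction is an assumption made only for illustration.

```c
/* Sketch only: hypothetical names, not from the original listings. */
typedef struct {
	int Limit;	/* readings above this count as bad (assumed direction) */
	int Hysteresis;	/* consecutive bad readings needed */
	int LocalCount;	/* bad readings still required */
	int Error;	/* nonzero once the fault is declared */
} TAnalogDebounce;

void AnalogInit(TAnalogDebounce *ptr, int limit, int hysteresis) {
	ptr->Limit = limit;
	ptr->Hysteresis = hysteresis;
	ptr->LocalCount = hysteresis;
	ptr->Error = 0;
}

/* Call once per monitor cycle with the latest reading. */
int AnalogCheck(TAnalogDebounce *ptr, int reading) {
	if (reading > ptr->Limit) {		/* over the limit this cycle */
		if (ptr->LocalCount > 0) {
			ptr->LocalCount--;	/* one less bad reading needed */
		}
		if (ptr->LocalCount == 0) {
			ptr->Error = 1;		/* stayed bad long enough */
		}
	} else {				/* good reading: start over */
		ptr->LocalCount = ptr->Hysteresis;
		ptr->Error = 0;
	}
	return ptr->Error;
}
```

A reading that hovers around the limit keeps resetting the counter, so only a level that stays past the limit for the full hysteresis count is reported as a fault.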

Baby lightning

After debouncing, the area causing me the most problems over the past year has been ESD. In many companies, ESD testing and protection is an engineering discipline in itself. Obviously, I won't be able to do justice to the entire topic in just part of a column, but I will present two examples—one common and the other fairly esoteric.

Since the mid-1970s, one of the most common chips in embedded designs has been the 8255 (both the NMOS and now the CMOS 82C55). This device goes by many names, including PPI and PIO. For readers not familiar with it, the chip provides three 8-bit I/O ports that software can configure independently as inputs, outputs or special functions.

I've worked with this device many times and have learned that its Reset line is extremely sensitive to any disturbance. A reset returns all I/O pins to their input state and generally kills a system's operation. For years my solution to this problem was to ground the Reset line and reset the chip with software, an approach that worked well until the age of ESD testing.
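
One plausible reading of "reset the chip with software" is rewriting the 8255's mode-set control word at startup. That is an assumption about the approach, not code from the column. The sketch below only builds a mode-0 control word from the desired port directions. The function name is hypothetical, and the write to the chip's control register is board specific and not shown.

```c
/* Sketch: build an 8255 mode-0 control word (mode-set flag, bit 7 = 1).
 * Each argument is nonzero if that port (or port C half) is an input.
 * The function name is hypothetical; writing the result to the chip's
 * control register is board specific and not shown. */
unsigned char Ppi8255Mode0Word(int a_in, int b_in, int c_hi_in, int c_lo_in) {
	unsigned char word = 0x80;	/* bit 7: mode-set flag; both groups mode 0 */
	if (a_in)    word |= 0x10;	/* bit 4: port A is an input */
	if (c_hi_in) word |= 0x08;	/* bit 3: port C upper half is an input */
	if (b_in)    word |= 0x02;	/* bit 1: port B is an input */
	if (c_lo_in) word |= 0x01;	/* bit 0: port C lower half is an input */
	return word;
}
```

With every port set as an input the function returns 0x9B, which matches the all-inputs state that a hardware reset leaves the chip in.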

The first thing I discovered is that the sensitivity of the Reset signal to a disturbance—whether power-supply noise, ESD or whatever—varies tremendously from vendor to vendor. For example, Intel's parts seem extremely sensitive to any type of glitch. When I informed Intel of this problem, that supplier's reaction was to tell us to use someone else's part! Apparently a large number of manufacturers are following this advice because I did a quick check of ESD-certified boards and found they all use the 82C55 from NEC—which is essentially immune to the interference.

The esoteric experience happened to another engineer at my company. During one test sequence, he had to move a module from one piece of equipment to another across the room. When he inserted the module into the second system, he created an error in the device. We traced the problem back to his wheelchair. It turns out that the hubs on his rear wheels build up a static potential of as much as 50 kV. This charge also raised the potential of the module on his lap. When he plugged it into the assembly, the module discharged and thereby caused the failure. This problem pathway helped us uncover and solve an intermittent field problem that occurred during transport of this module. (Incidentally, only one of his two wheelchairs exhibited this problem.)

Power problems

Finally, it's amazing how often line power can cause problems. I remember being in Venezuela with an instrument that required a good ground. The only one available was a water pipe on an outside ledge on the tenth floor. I crawled out onto the ledge, sanded the pipe and used a hose clamp to attach a wire running to the instrument.

In addition, power coming out of the wall might not be what you think it is. Line voltages vary from country to country, so just because a power supply says 115/230 doesn't mean it necessarily works in Japan at 100V or in the UK at 240V. Further, voltage and frequency specs always include a tolerance, which the power company uses to define "good" power. Also remember that even if the power is OK at the line coming into the building, there's no guarantee that it's still good at an outlet.

Beyond voltage/frequency considerations, noise on line power can also be a killer. A company I previously worked for sold measurement instruments to grain co-ops in the Midwest. Most of these units needed conditioning boxes to suppress power problems caused by numerous summer thunderstorms.

In a similar vein, I recently came across a test specification for electronic systems used in a fast-food restaurant that simulates the power problems such a unit might experience in deployment. For example, as the heater for the French fryer switches On and Off, the spec wants to ensure that the milkshake machine doesn't start spitting out fluid onto the counter. All electronic devices purchased for this particular fast-food joint must pass a test consisting of various line-cycle disturbances including missing cycles, voltage spikes, sags and interference. If you really want to harden a device against power-line problems, perform these kinds of tests as well. PE&IN

Listing 1a: Debounce data structure

typedef struct {
	int Hysteresis; 	/* number of consecutive bad readings needed */
	int LocalCount; 	/* bad readings still needed before an error */
	int Error;		/* if not 0, then we have an error */
} TDebounced;


Listing 1b: Debounce code

void ErrorClear(TDebounced *ptr) {
	ptr->Error = 0;				/* no error */
	ptr->LocalCount = ptr->Hysteresis;	/* reinitialize counter */
}


int ErrorSet(TDebounced *ptr) {
	if (ptr->Error == 0) {			/* if no error yet */
		if (ptr->LocalCount != 0) {
			ptr->LocalCount--; 	/* one less time period */
		}
		if (ptr->LocalCount == 0) {
			ptr->Error = 1;	 	/* we have an error */
		}
	}
	return ptr->Error; 			/* return the current error code */
}


Listing 2: Debounce in action

static TDebounced AirError = {10};	/* 10 times in a row is one second */

int AirPressureTask(int flag) {
	if (flag == 1) { 		/* if 1, then initialize */
		ErrorClear(&AirError);	/* this will initialize the debounce info */
	}
	if (PressureSw() == 0) {	/* 0 means a failure */
		ErrorSet(&AirError);	/* an error: see if fully debounced */
	} else {
		ErrorClear(&AirError);	/* no error: reset debounce info */
	}
	return AirError.Error;		/* return the current error status */
}
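
The debounce logic can be tried on a host machine before any hardware exists. The sketch below is one way to do that. It repeats the structure and the two support functions in compilable form and adds FirstFaultCycle, a hypothetical helper (not from the original listings) that replays recorded switch samples, one per monitor cycle, in place of PressureSw().

```c
/* Host-side sketch for trying the debounce logic without hardware.
 * TDebounced, ErrorClear and ErrorSet repeat the listings in compilable
 * form; FirstFaultCycle is a hypothetical test helper. */
typedef struct {
	int Hysteresis;	/* consecutive bad readings needed */
	int LocalCount;	/* bad readings still needed */
	int Error;	/* nonzero once the fault is declared */
} TDebounced;

void ErrorClear(TDebounced *ptr) {
	ptr->Error = 0;
	ptr->LocalCount = ptr->Hysteresis;
}

int ErrorSet(TDebounced *ptr) {
	if (ptr->Error == 0) {
		if (ptr->LocalCount != 0) {
			ptr->LocalCount--;
		}
		if (ptr->LocalCount == 0) {
			ptr->Error = 1;
		}
	}
	return ptr->Error;
}

/* Feeds switch samples (0 = failure, as in Listing 2) through the
 * debouncer, one per monitor cycle. Returns the index of the cycle on
 * which the fault is first declared, or -1 if it never is. */
int FirstFaultCycle(const int *samples, int count, int hysteresis) {
	TDebounced state = {0, 0, 0};
	int i;

	state.Hysteresis = hysteresis;
	ErrorClear(&state);			/* same job as flag == 1 */
	for (i = 0; i < count; i++) {
		if (samples[i] == 0) {
			if (ErrorSet(&state)) {
				return i;	/* fault fully debounced */
			}
		} else {
			ErrorClear(&state);	/* good sample: start over */
		}
	}
	return -1;
}
```

With a hysteresis of 10, a three-sample glitch never produces a fault, while ten consecutive failure samples produce one on the tenth.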

Copyright © 1998-2012 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.