SLTF Consulting
Technology with Business Sense



Off-the-shelf hardware and multimedia bring voice output within easy reach

Scott Rosenthal
December 1996

As discussed in my last column (see reference below), deciding to put a voice into a product involves many decisions, almost none of which involve the technology. After working your way through those implementation issues, it's time to tackle the actual hardware and software. This column reviews how to implement voice output in an embedded project.

Troubled specialization

Every embedded project places constraints on what you can design into the device. For example, everyone ideally wants to avoid sole-source parts. Likewise, if at all possible, it's smart to avoid spending money on specialized development tools. Unfortunately, neither goal is achievable if a designer decides to use commercial speech-processor chips.

A number of companies make chips for voice applications. Some chips speak phonemes, some record voices and store them in a memory device, and others speak from encoded files in memory. Some sound very mechanical (phoneme speech) and some sound quite good (encoded files in memory). However, as far as I can determine, none has a second source. Also, because these parts are specialty items, there's no guarantee they'll still be available even a year from now.

The other problem with these devices concerns their proprietary development systems. For example, one manufacturer whose device at first seemed reasonable to use requires a PC running the Japanese version of DOS to run its software. Other chip manufacturers require that a designer buy (no renting allowed) their development systems for $10k or more. To me, this amount seems like an awful lot of money to support one small facet of a product. Still other companies don't offer development systems at all: you send them the audio information, and they program it into their chips. The catch is outrageous engineering costs and the massive chip volumes they require.

After exploring these scenarios, I decided to implement a solution with truly off-the-shelf parts that allowed for easy and inexpensive development and made for low-cost second-source production.

Voice on a budget

The solution was to implement the voice system using a standard D/A converter, an audio amplifier and an EPROM. To generate speech, the system fetches a byte of data from memory and writes it to the D/A. The audio amp pumps up the converter's output to the point where it can drive a speaker. This approach allows for easy second-sourcing of the components while keeping down both parts and development-tool costs. For the remainder of this column I'll describe how to implement this homebrew design, including some of the tradeoffs made to keep production happy.

The first step was to generate data for the system to play. I decided to store data using the .wav format for two reasons. First, it's easy to record audio using this format with a multimedia PC. In fact, the standard recording application that comes with Windows works just fine for simple voice applications. Second, the format uses PCM encoding, which requires no data decompression. I recorded the 8-bit data file at 11.025 kHz.

This choice, though, also caused the first problem I had to overcome. Have you ever tried to generate an 11.025-kHz clock? This frequency doesn't divide evenly into any common microprocessor clock frequency. For example, assuming a standard 12-MHz clock, you'd have to divide the system clock by 1088.4353 to get the right value. What's more, if you use a custom clock frequency such as 11.995 MHz to derive the proper audio clock, you'll need to add an oscillator to derive the standard serial-communication data rates. In my opinion, the easiest solution is to simply forget the 0.4353 and divide the standard clock by 1088. The result is an audio clock of 11.029 kHz with a frequency error of 0.04%, an amount far too tiny for the human ear to detect.

The next item the speech circuitry needs is D/A conversion. This stage actually consists of two parts. The first turns the digital audio into an analog waveform, whereas the second controls the volume. When reconstructing the audio, an important point to remember is that PCM encoding centers the recording around the D/A's halfway point—in this case 80H. Hence, the audio signal effectively contains a DC bias equal to half the converter's dynamic range. You must remove this bias before sending the analog signal to a power amplifier; failure to do so results in a system that generates little speech but lots of smoke! In my finished system, the audio-output circuitry consists of a simple 2-pole filter to remove sampling artifacts and a series capacitor to AC-couple the signal into the power amplifier.

The second part of this conversion is volume control. A simple way of accomplishing this task is to add a pot at the power amp's input. Adjusting the pot changes the drive level into the amplifier, thereby changing volume. However, if a computer is supposed to control the output volume, this simple approach won't work. In such cases I build the speech system using a dual D/A. The first stage works as just described except the reference voltage comes from the second stage. As program code changes the reference, so does the output volume.

Cranking it out

With all the support stuff in place, you need a way to send data from memory to the D/A. One of my favorite techniques is DMA because the DMA controller continuously moves the voice data to the converter, paced by the sample clock, without processor intervention. For example, assume the microprocessor integrates a DMA controller, as the 80C188EC does. This chip's controller can transfer as many as 65,536 bytes without host intervention. To play a message, my program sets up a starting address in memory as the source, an I/O port as the destination and the message length as the transfer size. The program then starts the DMA controller and resumes its normal business. At the transfer's conclusion, the DMA controller generates an interrupt, and the associated ISR either shuts down the controller or sets it up to play the next part of the message. In this way, with minimal overhead, the microprocessor can easily play audio messages you record using standard off-the-shelf hardware.

A DMA controller isn't mandatory, though. It's still possible to crank out audio data with processors such as the 8051, for which DMA simply doesn't exist. Although it's trickier, I've found a technique that works beautifully. I set up a timer to interrupt the processor every 90 µsec (one period of the 11.025-kHz sample clock). In the ISR, which is coded in assembly for speed, the processor dumps the next audio byte to the I/O port, updates its pointer to the next data and exits. This technique ensures that data going out the I/O port is spaced in time by 90 µsec regardless of the processing time within the ISR. Using this method on a 12-MHz 8051, I've gotten ISR execution time down to less than 45 µsec, which leaves approximately 50% of the processor's bandwidth for the rest of the application. If that's not enough time, a few other options include increasing the clock frequency, changing to a processor that uses fewer states per instruction or adding glue logic to decrease the ISR's responsibilities even further. PE&IN


Rosenthal, S., "The road to loquacious instrumentation is rough but passable," PE&IN, Oct 1996, pp. 72-74.


Copyright © 1998-2014 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.