SLTF Consulting
Technology with Business Sense

It sometimes takes a microcontroller to boot up an FPGA

Scott Rosenthal
November, 1994

Modern ICs are little engineering marvels. Think of all the logic working within those tiny boxes! Can anyone ever fully document how to use these modern miracles? Is it possible to anticipate all operating conditions? In earlier days, references such as the Texas Instruments TTL Databook went into exhaustive detail about the innards of each chip, right down to the types of transistors and resistor values they incorporated. These devices, simplistic from our present perspective, seemed thoroughly documented and characterized. Today, with the proliferation of VLSI, quick turnaround times and short product lifetimes, this attention to detail is sadly missing. I also question how much some of these chip designers know beyond their general logic equations.

The basis for these cynical thoughts lies in some problems I've recently encountered with a Xilinx FPGA. This class of device can supposedly replace oodles of logic. However, as you'll see, this assertion can quickly fall apart when these chips encounter the real, far-from-ideal, world. We take so much for granted that it comes as a surprise when these little chips don't function as we expect. And the solution we had to implement makes me question the whole direction of embedded applications with VLSI parts.

Inadvertent program loss

One of my embedded-instrument designs uses a Xilinx FPGA to replace considerable chip-selection logic. One device it helped control was the flash memory that served as main program storage. For three years this hardware worked terrifically until one day someone forgot to write down some instructions, and the Xilinx house of cards came crashing down around my metaphorical ears.

No problem ever results from a single event, and this one was no exception. My customer's salesperson went to put the instrument into a demonstration mode. However, because no one had written down the procedure, he did it incorrectly. When the demo mode didn't work he got mad at the instrument and quickly toggled the power off and then on, an action that corrupted the flash memory. Of course, the description I got was simply that the instrument lost its program.

With all such strange occurrences, I always try to go back to the source. Hence I talked to the salesperson, sympathized with his plight and helped him remember his normal operation. As it turns out, he said that everyone had problems getting into demo mode and the accepted procedure was to switch the instrument off then on as quickly as possible to hide the "problem" from the customer.

I examined the problematic instrument and, sure enough, quickly toggling power would trash the flash memory's contents. Of course, the first solution that comes to everyone is simple: don't toggle power in that fashion. And the only reason it had happened in this case was the missing piece of paper. So maybe another solution is to provide better training. The problem, as we all know, is that the real world is a tough place for an embedded design, and if a salesperson could break the instrument, you can bet that a customer could do it just as easily.

With an oscilloscope I discovered that the flash-memory programming voltage (Vpp) that's supposed to be off at power up would occasionally be on while the user was quickly cycling power. The control signal for this voltage came from a Xilinx 2018 chip. This symptom didn't make any sense because on power up, all the 2018's I/O pins are supposed to be in a high-impedance state with a weak pull-up to Vcc. The Vpp control requires a Low signal, so it couldn't possibly be on.

Besides, a flash memory requires much more than just a programming voltage to write new data into it. It also requires a chip select signal, a write signal and the proper programming code to kick off its internal state machine. I thought that our program was absolutely secure inside this device. All those signals coming from the Xilinx chip, all High at reset time, couldn't possibly conspire to destroy my data. Wrong!
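The interlock described above can be written down as a simple predicate. What follows is only a minimal sketch of that reasoning, not the instrument's actual logic: the struct, the function name and the 0x40 program-setup command value are assumptions made for illustration. It shows that the flash memory is vulnerable only when all four conditions line up at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the pin levels the flash memory sees.
 * For the control lines, true = logic High and false = logic Low. */
typedef struct {
    bool    vpp_ctrl;  /* Vpp control line; Low switches Vpp on    */
    bool    cs_n;      /* CS/ chip select, active Low              */
    bool    wr_n;      /* WR/ write enable, active Low             */
    uint8_t data;      /* value currently on the data bus          */
} flash_pins_t;

/* Assumed program-setup command; the real value depends on the part. */
#define FLASH_CMD_PROGRAM 0x40u

/* The flash can be altered only if Vpp is on, the chip is selected,
 * a write is in progress AND the data bus carries the command code. */
static bool flash_write_armed(const flash_pins_t *p)
{
    assert(p != 0);
    return !p->vpp_ctrl && !p->cs_n && !p->wr_n
        && p->data == FLASH_CMD_PROGRAM;
}
```

With every control line pulled High, as expected after reset, the predicate is false no matter what appears on the data bus. It takes all three lines going Low plus the right byte on the bus to arm a write.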

As it turns out, the Xilinx chip has a problem at reset time. When you cycle power quickly, the chip sometimes comes up with its outputs on at logic Zero levels. In addition, this anomaly only occurs on certain pins that just happened to lead to crucial parts of my circuits. What happened was this: the Xilinx chip drove the flash memory's CS/ and WR/ (chip select and write enable) pins Low. The end of the Xilinx configuration process (even though the chip never actually configured itself) then booted the instrument's main CPU which, running amok, changed the data lines such that the flash memory saw its programming code.

Of course, as we all do in these situations, Xilinx denied that its part played any role in this domino effect: "Can't happen…, we would've heard about it by now…" You know the scenario. So I took out my trusty oscilloscope and logic analyzer and set out to fix the problem and prove Xilinx wrong.

I found that the instrument was trashing random 1-byte memory locations in the first flash memory chip. With further investigation, I could see on the debugging tools that the Xilinx chip sometimes failed to boot properly, yet it signaled the rest of the system that it was properly configured. Monitoring Vcc while switching power, I noticed that if I switched power back on while Vcc on the computer board had decayed to between 0.5V and 1.0V, I could consistently trash the flash memories. For some reason, the chip wouldn't reset properly.

I then compared my logic-analyzer traces of the power-up process to the chip's timing diagrams, and again everything looked right. However, a note on several timing diagrams states that if Vcc takes longer than 25 msec to go from 2V to Vcc min you must install a special Xilinx reset circuit. Our Vcc rise time was well within this specification, so I basically ignored this advice.
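The rise-time check in that note is easy to express against a captured Vcc trace. The following is a minimal sketch, assuming evenly spaced voltage samples exported from a scope; the function names, the sample format and the Vcc min value passed in are assumptions, while the 2V starting point and the 25 msec limit come from the note on the timing diagrams.

```c
#include <assert.h>
#include <stddef.h>

#define RISE_START_V  2.0   /* measurement starts when Vcc reaches 2V   */
#define MAX_RISE_MS  25.0   /* limit quoted on the timing diagrams      */

/* Time in ms for Vcc to climb from 2V to vcc_min, given samples taken
 * every dt_ms. Returns -1.0 if the trace never reaches both levels. */
static double vcc_rise_ms(const double *v, size_t n, double dt_ms,
                          double vcc_min)
{
    size_t start = n, end = n;

    assert(v != 0 && dt_ms > 0.0);
    for (size_t i = 0; i < n; i++) {
        if (start == n && v[i] >= RISE_START_V)
            start = i;                 /* first sample at or above 2V  */
        if (start != n && v[i] >= vcc_min) {
            end = i;                   /* first sample at Vcc min      */
            break;
        }
    }
    if (start == n || end == n)
        return -1.0;
    return (double)(end - start) * dt_ms;
}

/* Nonzero when the trace is too slow (or incomplete) to rely on the
 * chip's own power-on reset. */
static int needs_reset_circuit(double rise_ms)
{
    return rise_ms < 0.0 || rise_ms > MAX_RISE_MS;
}
```

Note that a supply can pass this check, as this one did, and the chip can still fail to reset when power returns before Vcc has fully decayed. The check covers only the rising edge.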

Telephoning Xilinx again, I proceeded to tell my story to five different people, who all said that there was no way that the chip's outputs could come on before configuration. Finally, I happened upon someone with a good corporate memory. Apparently Xilinx had seen a problem similar to mine many years ago, and each successive generation of devices has gotten better at solving it. The engineer said that we needed to implement a ridiculously complicated boot-up procedure so that the chip would always reset properly. He also said that the 3000 and 4000 Series parts didn't need this technique, only the 2000 Series. As I later discovered, however, the 3000 Series also exhibits this problem.

Russian nesting dolls

The timing diagram for the fix looked simple enough, but from a hardware standpoint it was far from trivial. First of all, I had to retrofit the circuit into an existing instrument. Secondly, I couldn't count on the validity of most signals because they passed through the Xilinx chip. I needed to find a solution that would work every time, fit into the existing space and didn't create its own problems.

The first thing I considered was a PAL, but to implement the logic required more than a garden-variety PAL chip. In addition, these devices normally need an oscillator or some other type of clock circuit to operate, and the last thing this EMI-noisy instrument needed was another high frequency clock leaking all over the place. I even considered using the venerable 555 timer as a sequential 1-shot device, but this method also created its own problems. The solution I finally settled on was to use a simple microcontroller to start the Xilinx chip so it, in turn, could start the main CPU.

What a ridiculous idea: use a microcontroller to start the hardware to start the computer. On further reflection, though, it does make a lot of sense. With the microcontroller, I had infinite latitude to adjust for the crazy Xilinx chip reset timing. What's more, small low-volume microcontrollers like the 87C752 cost almost the same as a PAL. Finally, I had no real speed requirement; a few more microseconds wouldn't hurt anybody.
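To make the idea concrete, here is a minimal sketch of what such a boot sequencer might look like. It is not the procedure Xilinx supplied, which this article doesn't reproduce. The hardware-access structure, the delay values, the retry count and the fpga_config_ok check are all hypothetical, and a small simulated back end stands in for the real pins so the sequence can be exercised on a host machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware hooks; on a real part these would touch pins. */
typedef struct {
    void (*set_cpu_reset)(bool asserted);
    void (*set_fpga_reset)(bool asserted);
    bool (*fpga_config_ok)(void);   /* assumed independent sanity check */
    void (*delay_ms)(unsigned ms);
} boot_hal_t;

/* Assumed timing values, freely adjustable in firmware. */
enum { FPGA_RESET_HOLD_MS = 10, FPGA_CONFIG_WAIT_MS = 50, MAX_ATTEMPTS = 5 };

/* Keep the main CPU in reset until the FPGA has demonstrably come up.
 * Returns false, leaving the CPU in reset, if every attempt fails. */
static bool boot_sequence(const boot_hal_t *hal)
{
    assert(hal != 0);
    hal->set_cpu_reset(true);
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        hal->set_fpga_reset(true);            /* force a clean reset    */
        hal->delay_ms(FPGA_RESET_HOLD_MS);
        hal->set_fpga_reset(false);           /* let it configure       */
        hal->delay_ms(FPGA_CONFIG_WAIT_MS);
        if (hal->fpga_config_ok()) {
            hal->set_cpu_reset(false);        /* now start the CPU      */
            return true;
        }
    }
    return false;
}

/* Simulated back end for host-side testing only. */
static int  sim_bad_boots;         /* attempts that fail before success */
static int  sim_attempts;          /* reset pulses issued so far        */
static bool sim_cpu_reset = true;  /* true while the CPU is held        */

static void sim_set_cpu_reset(bool a)  { sim_cpu_reset = a; }
static void sim_set_fpga_reset(bool a) { if (a) sim_attempts++; }
static bool sim_config_ok(void)        { return sim_attempts > sim_bad_boots; }
static void sim_delay_ms(unsigned ms)  { (void)ms; }

static const boot_hal_t sim_hal = {
    sim_set_cpu_reset, sim_set_fpga_reset, sim_config_ok, sim_delay_ms
};
```

The design choice worth noting is that the CPU's reset line is released only after a positive check, never on a timer alone. Because the chip's own "configured" indication proved untrustworthy in this instrument, the check would ideally look at something only a correctly configured design can produce.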

The only real concern was how to guarantee that the microcontroller would start each time. In the old days, I simply drove a microprocessor's reset input with a resistor, capacitor and maybe a diode. Then I learned the hard way that this type of reset circuit isn't effective in brown-out conditions, so I started using power-supervisory chips to control the micro's reset signal. It gave me a nice known signal appropriately applied at the right times. Now look where I've graduated to. I'm using a power-supervisory chip to start a microcontroller to start a Xilinx chip to start a microcomputer. What a ludicrous scenario. I wonder how many transistors I'm using to just kick off the instrument's operation?

Epilog

With these modifications in place everything now works as it should. My little program in the microcontroller implements the Xilinx reset procedure (modified for another Xilinx problem I found), which in turn now properly resets the Xilinx chip, which now properly starts the microcomputer. All the original flash-memory problems are long gone, and the salespeople have their sheet of paper with the directions.

Just when I finally thought that the power-up problems with the VLSI chips were out of the way, I got a call about a power-on problem (again with the flipping power switches) relating to a Cirrus GD6215 video controller chip. All our signals are fine, but the device enters a test mode.

So what's the real solution? Must we educate instrument users to not switch power off and on too fast? Must we add a circuit to every system that guarantees to drop Vcc to 0.1V every time the power switch goes off? Must we build smart power switches that don't allow a user to quickly flip the power switch (like the switch in the original IBM PC)?

The VLSI chips used in modern PCs are marvels of engineering, but embedded designers need more than functionality; they also need robustness. Because of a missing description on a piece of paper, I ended up adding a microcontroller to a system. What's wrong with this picture? PE&IN




Copyright © 1998-2014 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.