SLTF Consulting
Technology with Business Sense



Tips from the trenches: tweak your way to improved C compiler output

Scott Rosenthal
October, 1998

I'm never satisfied with the status quo. Solutions are primarily compromises based on available facts, costs, skills or time. With this mindset, why would I think that a C compiler's output gives the best results for my application? Loaded down by years of squeezing bits into unimaginably small memory spaces and not wholly trusting the infallibility of technology, I've come to appreciate C compilers, but getting great performance from them takes some knowledge on our end, too. Today's compilers are marvelous and even come with optimizers that tweak functions within the confines set by the compiler designers. You can manually trade off execution speed vs memory size, select how much optimization to apply and even control how to most efficiently use memory and register resources.

My problem isn't necessarily with compilers but with our expectations of them. We expect a compiler to do a fantastic job without any understanding of our intentions, application or logic design. Is this problem really any different from using a spell checker? A properly spelled word used in the wrong context passes the spell checker but still reads as an "error" to the person reading the text.

A compiler can't possibly know the nuances of my application. Thus I can either accept its limitations and hope the processor has enough memory and speed for my code both today and in the future, or I can help the compiler by adopting coding styles that move toward the desired performance. Now, many readers are rightly asking, "Why devote time to tuning C code when time is money?" However, they're not considering the costs of exceeding memory space and of poor speed performance. So before bombarding me with critical e-mail, check out the following suggestions. I've accumulated these techniques, which are independent of processor, compiler or application, from my years writing C code for embedded processors.

  • Look at the assembly language — The compiler's assembly listing is your only way to peek under the hood and see how well it translated your intentions. Most software people, though, ignore this selectable output because of deadline pressures, inexperience with assembly language or simply never having thought of doing so. An assembly listing serves as a valuable tool when optimizing performance.

Mind you, I'm not advocating that designers tweak the assembly code. Instead, use it to check how the compiler translated your C code into machine instructions. A simple review, which doesn't take long, might offer insight into how wasteful certain coding constructs can be.

Note that I don't check the assembly output every time I compile a module. Doing so would waste time and money. Instead, first get a module or function working and then check the assembly output. This approach retains a high-level language's development speed while adding a simple QA check.

What are some of the things to look for? In general, search for assembly constructs that defy the simplicity of the C code. For example, repeatedly assigning the same floating-point constant to different variables is less memory efficient than referencing a single global constant, and the substitution costs little in speed.
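The floating-point-constant case can be sketched as follows; the constant's value and the function name are invented for illustration:

```c
/* Hypothetical sketch: define the repeated floating-point constant once
   so the compiler stores a single copy instead of emitting the value at
   every assignment. */
static const float SCALE = 0.03125f;   /* assumed calibration factor */

float scale_reading(int raw)
{
    return (float)raw * SCALE;   /* every use shares the same constant */
}
```

Whether the compiler actually folds repeated literals on its own is something only the assembly listing can confirm for your toolchain.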

Another example is the Switch/Case statement. With some compilers, that statement's overhead is horrendous. See if it makes sense to recode a small Switch/Case into a conditional statement. On a related note, some compilers allow you to force the Switch argument into a more "natural" representation for the processor, thereby saving code. The bottom line is that without a look at the assembly-language output you can't catch these things.
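As a sketch of that recoding (the command codes and return values here are made up), a small switch and its conditional equivalent look like this; only the listing tells you which form your compiler handles better:

```c
/* A small switch that some compilers translate with heavy
   jump-table or range-check overhead... */
int cost_with_switch(int cmd)
{
    switch (cmd) {
    case 0:  return 10;
    case 1:  return 20;
    default: return -1;
    }
}

/* ...and the same logic recoded as plain conditionals. */
int cost_with_if(int cmd)
{
    if (cmd == 0)
        return 10;
    if (cmd == 1)
        return 20;
    return -1;
}
```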

  • Bit fields — Coming from an assembly-language background, I find it second nature to manipulate bits. So when I found bit-field operations in the C language, I instantly started using them. With some processors — those with bit-manipulation opcodes — the generated code is concise and straightforward. In contrast, consider a processor, such as the 80C188, that doesn't implement bit-manipulation instructions. Every time you want to set, clear or check a bit, the compiler must generate shifts, ANDs and maybe ORs. A simple logic statement to set a bit becomes multiple instructions (the number depending on the bit position) that chew up memory and waste processor cycles. Multiply this loss by the number of times you use a particular bit, and memory starts evaporating.

Today, I avoid using C-language bit fields because of the memory and speed penalties. Instead, I map bits into a word and do comparisons on the entire word, which might be a char, int or long, depending on overall system needs. This approach might not be as intuitive as treating a bit field as a variable, but good comments and variable names minimize this downside.
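The word-mapping approach can be sketched like this; the flag names are invented, and good comments such as these are what make up for the lost readability:

```c
/* Illustrative sketch: map status bits into one word instead of using
   C bit fields, and work on the whole word with masks. */
#define MOTOR_ON   0x01u
#define DOOR_OPEN  0x02u
#define OVER_TEMP  0x04u

unsigned int set_flag(unsigned int word, unsigned int mask)
{
    return word | mask;           /* whole-word OR sets the bit   */
}

unsigned int clear_flag(unsigned int word, unsigned int mask)
{
    return word & ~mask;          /* whole-word AND clears it     */
}

int flag_is_set(unsigned int word, unsigned int mask)
{
    return (word & mask) != 0;    /* whole-word test, no shifting */
}
```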

  • Char vs Int — I once thought that to conserve memory all I had to do was pick the smallest type specifier. Well, it's really never that simple. Again using the 16-bit 80C188 as an example, specifying a char requires the compiler to load the char into the LSB of a 16-bit register and a 0 into the MSB (or sign-extend the LSB). This sequence occurs every time you use the variable, which invariably leads to more ROM space and slower execution. In this case, an int is the preferred natural word size. However, because the 8051 is an 8-bit CPU, its preferred word size is 8 bits, and using a 16-bit value where an 8-bit one would suffice is again wasteful.
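The later C99 standard codified this idea with the <stdint.h> "fast" types, which let each compiler pick its processor's natural width. A sketch, with an invented function:

```c
#include <stdint.h>

/* int_fast16_t means "at least 16 bits, in whatever width is fastest":
   it can map to a full register on a 16-bit part like the 80C188, so
   the counter needs no per-use widening. Names here are illustrative. */
int sum_bytes(const uint8_t *buf, int_fast16_t len)
{
    int_fast16_t i;
    int sum = 0;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```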
  • Signed vs unsigned variables — I couldn't believe that signed vs unsigned would really change the compiler output, but the assembly listing convinced me. The example occurred with a PIC processor: in a For loop, an unsigned loop variable took more opcodes than a signed char. The lesson here is to never take anything for granted.
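The effect is entirely compiler- and target-specific, but the experiment is easy to repeat. This illustrative loop compiles either way; swap the counter's signedness and diff the two listings:

```c
/* Illustrative count-down loop; on the PIC compiler described above, a
   signed char counter produced shorter code than an unsigned one.
   Verify on your own target by comparing the assembly listings. */
int count_iterations(void)
{
    signed char i;
    int n = 0;

    for (i = 9; i >= 0; i--)   /* the >= 0 test works because i is signed */
        n++;
    return n;
}
```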
  • Compound conditional statements — For many reasons, programmers should limit the nesting of compound conditional statements. Nesting should never go more than three deep due to the sheer complexity and difficulty in testing all program pathways. Further, compound conditional statements use lots of memory and create unpredictable timings that might affect performance.

Try to limit these compound statements. However, sometimes they might seem mandatory. In those situations you can probably find an alternative. Sometimes I turn a hash into a table. Specifically, I evaluate each conditional clause and for a True condition set a unique bit in a variable. Then I use the variable as an index into a table, which might contain pointers to different functions, values or outcomes for the program to use. This technique saves memory, gives a more predictable path through the program, allows easy testing of all pathways and is extremely simple to explain during a code review.
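The flags-into-a-table technique described above can be sketched as follows; the conditions, thresholds and actions are all invented for illustration:

```c
/* Each conditional clause sets a unique bit; the combined flag word
   then indexes a table of outcomes, replacing nested conditionals
   with one predictable lookup. */
#define TOO_HOT   0x1u
#define TOO_FAST  0x2u

typedef enum { RUN, COOL_OFF, SLOW_DOWN, SHUT_DOWN } action_t;

/* One entry per flag combination, indexed 0-3. */
static const action_t action_table[4] = {
    RUN,        /* 0: neither condition true */
    COOL_OFF,   /* 1: TOO_HOT only           */
    SLOW_DOWN,  /* 2: TOO_FAST only          */
    SHUT_DOWN   /* 3: both conditions true   */
};

action_t decide(int temp_c, int rpm)
{
    unsigned int flags = 0;

    if (temp_c > 100) flags |= TOO_HOT;    /* each clause sets one bit */
    if (rpm > 5000)   flags |= TOO_FAST;

    return action_table[flags];            /* single table lookup */
}
```

Testing is simple because every table entry — every pathway — can be exercised just by enumerating the flag combinations.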

  • Dereferencing pointers — I'm a big proponent of pointers in C. Some people shy away from them, and others blast them for giving the programmer too much flexibility and leeway, yet they provide a solution to many programming dead ends. One problem, however, is having pointers to pointers all over. It's easy to end up with a statement such as ptr1->ptr2->value. It works perfectly well, but every time you use it the compiler dereferences ptr1 and then ptr2, and each dereference can take a few instructions. Multiply that amount by the number of times this construct appears, and you end up using lots of memory and slowing execution speed.

Instead, use a temporary variable that removes an indirection. For example, assign ptr3 = ptr1->ptr2. You can then write ptr3->value, saving one level of dereferencing on every use.
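With hypothetical structures, the hoisted dereference looks like this:

```c
/* Invented structures for illustration: hoist the common dereference
   into a local pointer so ptr1->ptr2 is chased only once. */
struct stats { int value; int count; };
struct node  { struct stats *ptr2; };

int sum_fields(struct node *ptr1)
{
    struct stats *p = ptr1->ptr2;   /* dereference ptr1->ptr2 exactly once */

    return p->value + p->count;     /* each later use saves one level */
}
```

The more times the double dereference appeared in the original function, the more instructions the local pointer saves.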

The savings add up

These suggestions might not seem like they add up to much, but in the long run they make a big difference. For example, on a recent 68HC11 project, I found that the C library functions ate up roughly 14k bytes of ROM, leaving me roughly 18k bytes for the application software. Of course, I filled it up and then went through the assembly listing to look for savings. By implementing some of the techniques discussed above, I uncovered an additional 5k bytes of ROM space. This amount might not seem like much, but it made the difference between not completing the project and finishing it with all required features. PE&IN

Copyright © 1998-2012 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.