SLTF Consulting
Technology with Business Sense



Language-localization tips aid overseas sales

Scott Rosenthal
October, 1997

You've just finished a project, and the boss comes by and says, "Great, now how easy is it to add additional languages? We’d like to sell it abroad." The knee-jerk response is that it's not terribly difficult—simply translate a few text strings. Once you start, though, it no longer seems so easy. But with a little forethought and some understanding of localization issues, an embedded design can easily work with other languages.

Localization is the act of changing a design so it works properly in the end user's locale. For embedded software, it involves not just language but other regional differences such as time and date formats, currency symbols, and the use of decimal points and commas in numbers. In the old days, you could insist that a customer could have your product in any language, as long as it was English. This attitude won't work today. The international market is large and tough, and adapting a product to each country's needs might be the only way to make it marketable.

Hardware issues

The embedded world puts special constraints on localizing a design compared with the PC world. On a PC, and especially with a graphical interface, issues such as character fonts and the placement of objects left to right, right to left, or top to bottom fade away. In the embedded world, with normally limited display and entry options, system hardware can dictate the extent and implementation of localization efforts. For example, if a product comes with 7-segment displays, it must use a decimal point even where local practice, as in German, calls for a comma. Also, just because you can spell an English word with letters that fit in seven segments (for instance, CAL) doesn't mean you can spell out the equivalent word in French.

Even with a text display using, for example, a 5x7 character cell, another consideration is the availability of correct fonts. The Romance and Germanic languages require a font with support for all appropriate characters with accent marks, umlauts or other diacritics. But if you’re trying to convert a system for use in Asia, the use of a simple display might be impossible. In this case, a GUI on a modern OS may be the only choice.

Even if support for a Western language is all a system needs, additional problems exist. One such hardware issue revolves around the amount of memory a design can devote to localizations. For example, a PC provides a practically infinite sink for storing text messages. An embedded system, in contrast, normally places tight limits on available memory. Nonvolatile memory must store these messages, in essence splitting memory between text and program code. You'll find it absolutely amazing how fast text can chew up memory. With an average word containing five characters (bytes) plus a sixth byte for a space, every 1k byte of ROM holds roughly 170 words. So, for example, this column would require approximately 8k bytes of ROM for storage. That amount might not seem so bad, but multiply it by perhaps five languages, and all of a sudden text storage climbs to more than 40k bytes.
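
The arithmetic above can be captured in a small budgeting helper. This is only a rough sketch: the function name TextRomBytes and the constant BYTES_PER_WORD are made up for illustration, and the six-bytes-per-word figure is the average assumed in the text, not a measured value.

```c
/* Assumed average from the text: five characters plus one space per word. */
#define BYTES_PER_WORD 6UL

/* Hypothetical helper: estimated ROM, in bytes, needed to hold
   `words` words of message text in each of `languages` languages. */
unsigned long TextRomBytes(unsigned long words, unsigned long languages)
{
    return words * BYTES_PER_WORD * languages;
}
```

With about 1,365 words (roughly 8k bytes in one language), five languages come to 40,950 bytes, in line with the 40k-byte figure above.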

So, before you can even begin to handle language translations, you must resolve the following questions when designing an embedded system for localization:

  1. Will the system need localization?
  2. Will the system handle more than one localization without changing the program?
  3. If more than one, how many localizations must a system handle without a program change?
  4. Is there room in ROM for both the program and all the messages for different localizations?
  5. How does the user select the correct language?
  6. Do constraints exist on the user display or printer that might cause trouble? If so, what can you do?
  7. Is there room on the display or printer for message expansion with other languages?

With a little help...

After deciding to localize an embedded system, you face two main issues: incorporating localization within source code, and getting proper language translations for text messages.

Programmers can choose from several ways to incorporate localization within source code. The primary things you must change are date and time formats, decimal points and commas, currency and text messages.

I’m a firm believer in isolating user-interface routines from the rest of the program code. As an example, a system might need to display frequencies in hertz, kilohertz or megahertz. Within a program I keep all frequencies in a single unit, for example hertz. Any time the program must display a frequency, it calls a function that returns a string with the value formatted for the correct representation, such as megahertz. This isolation also helps development when it comes to localization. By making software always call a function for formatting output data, you can use one function to handle a particular localization issue, such as the date format or substituting commas for decimal points.
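
A formatting routine of this kind might look like the following sketch. The name FormatFrequency, its parameters and the three-decimal output are assumptions chosen for illustration. It uses integer arithmetic only, truncates rather than rounds, and relies on the C99 snprintf(), which an older embedded toolchain might not supply.

```c
#include <stdio.h>

/* Hypothetical helper: formats a frequency held internally in hertz,
   choosing Hz, kHz or MHz for display. decimal_sep is the locale's
   decimal character, '.' or ','. Fractions are truncated to 3 digits. */
void FormatFrequency(unsigned long hz, char decimal_sep,
                     char *buf, size_t size)
{
    if (hz >= 1000000UL)
        snprintf(buf, size, "%lu%c%03lu MHz",
                 hz / 1000000UL, decimal_sep, (hz % 1000000UL) / 1000UL);
    else if (hz >= 1000UL)
        snprintf(buf, size, "%lu%c%03lu kHz",
                 hz / 1000UL, decimal_sep, hz % 1000UL);
    else
        snprintf(buf, size, "%lu Hz", hz);
}
```

Because the rest of the program only ever passes hertz to this one function, switching the decimal character for a new locale touches a single argument instead of every display call.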

Another way to handle localization issues is to embed special codes in strings. These codes could stand for the currency symbol, the decimal-point symbol or the date symbol. Before displaying such data to the user, the software passes the string through a localize function that performs substitutions using information for the correct locale.
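
One possible shape for such a localize function is sketched below. Everything named here is hypothetical: the control codes \001 and \002, the Locale structure and the Localize() function are illustrative choices, and the sketch assumes a single-byte character set.

```c
#include <stddef.h>

/* Hypothetical embedded codes. Octal escapes are used so that digits
   following the code in a string are not absorbed into the escape. */
#define SYM_CURRENCY "\001"
#define SYM_DECIMAL  "\002"

typedef struct {
    const char *currency;  /* for example "$" or "DM" */
    char decimal;          /* '.' or ',' */
} Locale;

/* Sample message with codes embedded by string concatenation. */
static const char msg_total[] = "Total: " SYM_CURRENCY "12" SYM_DECIMAL "50";

/* Copies src to dst, replacing each code with the locale's symbol.
   Always terminates dst; output is truncated if dst is too small. */
void Localize(const char *src, const Locale *loc, char *dst, size_t size)
{
    size_t out = 0;
    const char *c;

    if (size == 0)
        return;
    for (; *src != '\0'; src++) {
        if (*src == SYM_CURRENCY[0]) {
            for (c = loc->currency; *c != '\0' && out + 1 < size; c++)
                dst[out++] = *c;
        } else if (out + 1 < size) {
            dst[out++] = (*src == SYM_DECIMAL[0]) ? loc->decimal : *src;
        }
    }
    dst[out] = '\0';
}
```

With a locale of {"$", '.'} the sample message becomes "Total: $12.50"; with {"DM", ','} it becomes "Total: DM12,50". The sketch substitutes symbols in place and does not move the currency symbol to where a given region would normally put it.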

For localizing the date format, another technique I've used with an alphanumeric display is to show it as dd-mmm-yyyy where the month appears as a 3-letter abbreviation specific to each localization. This technique then treats the date format as a text-string substitution and the format works in most regions of the world.
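
A sketch of the dd-mmm-yyyy technique follows. The function name FormatDate and the table names are made up for illustration, and the German abbreviations are placeholders that a translator should confirm.

```c
#include <stdio.h>

/* Month abbreviations live with the other translated strings.
   The German table is illustrative only and not a vetted translation. */
static const char *const months_en[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};
static const char *const months_de[12] = {
    "Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
    "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
};

/* Hypothetical helper: writes the date as dd-mmm-yyyy using the given
   month table. An out-of-range month yields an empty string. */
void FormatDate(int day, int month, int year,
                const char *const months[12], char *buf, size_t size)
{
    if (month < 1 || month > 12) {
        if (size > 0)
            buf[0] = '\0';
        return;
    }
    snprintf(buf, size, "%02d-%s-%04d", day, months[month - 1], year);
}
```

The field order never changes, so the only locale-specific part is the month table, which the translator handles like any other text string.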

The text of it all

To most people, localization means only translating strings to another language. Not quite—you must also address numerous related issues including message expansion, correct text selection and translation accuracy.

For instance, moving from English to another language generally increases the number of characters in a message. Short ones seem to increase the most, whereas longer messages don't enlarge as much. I generally try to leave 50% additional space on a display device for string expansion. Also don't forget to write code that works with variable-length messages instead of hard-coding the length. For example, if a program includes a column of numbers with text labels preceding them, right-justify the labels so all the numbers line up correctly on their decimal points or commas.
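
The label-alignment advice can be sketched as follows. The names LongestLabel and FormatRow are hypothetical, and strlen() is used as the display width, which holds only for a single-byte character set where every character occupies one display cell.

```c
#include <stdio.h>
#include <string.h>

/* Returns the length of the longest label in the current language,
   measured at run time instead of being hard-coded. */
int LongestLabel(const char *const labels[], int count)
{
    int i, longest = 0;

    for (i = 0; i < count; i++) {
        int len = (int)strlen(labels[i]);
        if (len > longest)
            longest = len;
    }
    return longest;
}

/* Writes one row: the label right-justified in `width` columns,
   a space, then the already formatted value. */
void FormatRow(const char *label, const char *value, int width,
               char *buf, size_t size)
{
    snprintf(buf, size, "%*s %s", width, label, value);
}
```

Computing the width from the active language's labels means the values start in the same column no matter how much a translation stretches or shrinks the text.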

One of the more important issues with localization is how to modify text in a program. Anyone who hard-codes the text message into the program's body pays dearly when localizing the design. One technique is to create a text module for each language (such as english.c or french.c). It holds a list of all the text strings in an array, and each language variant of the module holds the same string information at the same array location. The software, when it needs a text string, uses the index.
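
For a single-language build, the per-language text module might be laid out as in this sketch. The enum names, the array name text_table and the LANG_FRENCH build switch are assumptions for illustration, and the French strings are unaccented placeholders, not vetted translations. In a real project the two tables would sit in english.c and french.c with only one of them linked in.

```c
/* messages.h: one index per message, shared by every language module. */
enum {
    MSG_READY,
    MSG_CALIBRATE,
    MSG_DIAGNOSTICS,
    MSG_COUNT           /* number of messages, not a message itself */
};

/* english.c or french.c: each holds the same strings in the same order.
   LANG_FRENCH is a hypothetical build switch standing in for the
   choice of which module to link. */
#ifdef LANG_FRENCH
const char *const text_table[MSG_COUNT] = {
    "Pret.",                     /* MSG_READY */
    "Etalonnage.",               /* MSG_CALIBRATE */
    "Afficher les diagnostics."  /* MSG_DIAGNOSTICS */
};
#else
const char *const text_table[MSG_COUNT] = {
    "Ready.",                    /* MSG_READY */
    "Calibrate.",                /* MSG_CALIBRATE */
    "Display diagnostics."       /* MSG_DIAGNOSTICS */
};
#endif
```

Program code then refers to text_table[MSG_DIAGNOSTICS] and never contains the message text itself. Sizing every table with MSG_COUNT gives the compiler a chance to complain if a language module has too many entries.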

Similarly, for a system with multiple localizations, one technique is for software to call a function that returns a pointer to the correct text string. The following snippet shows an example of this function:

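int language; /* global language selection */

/* One string table per language, each defined in its own module */
extern const char *english[], *swedish[], *spanish[],
                  *german[], *french[];

const char *GetTextPtr(int n)
{
  switch (language) {
    default: /* default is English */
    case 0:
      return english[n];
    case 1:
      return swedish[n];
    case 2:
      return spanish[n];
    case 3:
      return german[n];
    case 4:
      return french[n];
  }
}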

Any function that needs a text string calls GetTextPtr(). Its input argument is an index number into the string array (such as 4 for "Display diagnostics."). The function then branches to the proper language pointer and returns a pointer to the text string.

Translation woes

One of the biggest problems I've had with localization is finding translation staff who can properly put text strings into other languages. Remember, these people generally aren't computer whizzes, so technical terms and markings can throw them. For instance, what do you expect a nonprogrammer to do with %5.2f or %%? Likewise, line continuations and spaces for formatting can also throw off their work. The following programming guidelines should help translators do their work:

  1. Use macros for computer terms in the middle of text strings. Use uppercase characters that definitely don't look like text.
  2. Keep line lengths short. Remember about string-length expansion with other languages.
  3. When programming in C, don't continue lines with a "\"; instead, write adjacent string literals, as in "text 1" "text 2" (no comma between the strings), which the compiler joins into a single string.
  4. Create and give the translators a style guide for the system. Do you want acronyms for various functions or should translators spell them out? Can or does the acronym change with different languages? How much space can a translator use for technical and product-specific terms? Are there any display-size restrictions that might limit how long a single word can be (a non-trivial issue for German and Italian)?
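
The first and third guidelines can be combined as in the sketch below. The macro names FMT_VOLTS, FMT_CHARGE and PERCENT, the message text and the function FormatBattery are all made up for illustration.

```c
#include <stdio.h>

/* Uppercase tokens the translator is told to leave untouched. */
#define FMT_VOLTS  "%5.2f"
#define FMT_CHARGE "%d"
#define PERCENT    "%%"

/* Adjacent string literals are joined by the compiler, so the message
   can span short lines without a backslash continuation. */
static const char msg_battery[] =
    "Battery voltage: " FMT_VOLTS " V, "
    "charge " FMT_CHARGE PERCENT;

/* Hypothetical helper that fills in the values for display. */
void FormatBattery(double volts, int charge, char *buf, size_t size)
{
    snprintf(buf, size, msg_battery, volts, charge);
}
```

The translator sees only plain words and the uppercase tokens, never %5.2f or %%. One limitation of this sketch is that the tokens must stay in the same order in every language, because the arguments are passed to snprintf() in a fixed order.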


Copyright © 1998-2012 SLTF Consulting, a division of SLTF Marine LLC. All rights reserved.