Before I worked at Yahoo!, I had never localized anything (localizing means modifying an application to fit a region’s language and cultural needs). Now that I’ve done it for over 2 years and across 30 different localizations, I’ve learned a few things that are quite important.
- Allow for text expansion and shrinkage
This is rule #1 when localizing. Text in some languages, like German or Russian, can run more than twice as long as the English equivalent. Text in others, like Chinese or Japanese, can shrink by more than 50%, which can equally wreck an inflexible layout. Your HTML and CSS need to be extremely flexible when dealing with localized content. No more fixed-height or fixed-width buttons, tabs or any other content areas.
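One rough way to check a layout before real translations arrive is to pad the English strings artificially. The sketch below is only an illustration: `pseudo_expand`, its `factor` and its `pad_char` are names made up for this post, not part of any localization tool.

```python
def pseudo_expand(text, factor=2.0, pad_char="~"):
    """Pad text to roughly `factor` times its length for layout testing.

    The brackets make it easy to spot strings that get cut off on screen.
    """
    target = int(len(text) * factor)
    # Never remove characters; only add padding when the target is longer.
    padding = pad_char * max(0, target - len(text))
    return f"[{text}{padding}]"


if __name__ == "__main__":
    for label in ("Send", "Inbox", "Compose"):
        print(pseudo_expand(label))
```

Running the UI with padded labels like these will not tell you how a real German string wraps, but it tends to expose fixed-width buttons and tabs quickly.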
- Keep text out of images
Replacing text in HTML is easy and can be done automatically with many localization tools. Replacing text in graphics is not so easy. Do whatever you can to get text out of graphics and into your HTML. This may annoy designers, but it will save everyone time.
- Store configuration data separately from translations
Because your web app will need to behave differently for each locale, you will need to put configuration information somewhere. You may be tempted to put it into your translations, which would let translators/localizers change configuration on their own, but this is a bad idea. Why?
One day a localizer will put a really bad value into your configuration data and cause your app to crash. And it will be your fault because why should a localizer know how your app works and what a correct configuration value should be?
Another benefit of storing configuration data separately is that you can easily write scripts to parse or change it. That is much harder to do when it’s buried in localizations.
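Here is a minimal sketch of what that separation might look like. The locale codes are real, but every key (`date_format`, `results_per_page`, `show_horoscopes`), the `CONFIG_SCHEMA` table and the `validate_config` helper are assumptions invented for illustration. The point is only that configuration sits in its own structure, where a script can check it.

```python
# Hypothetical per-locale configuration, owned by engineers.
LOCALE_CONFIG = {
    "en-US": {"date_format": "%m/%d/%Y", "results_per_page": 10, "show_horoscopes": True},
    "de-DE": {"date_format": "%d.%m.%Y", "results_per_page": 10, "show_horoscopes": False},
}

# Translations live in their own structure, which localizers can edit freely.
TRANSLATIONS = {
    "en-US": {"inbox.title": "Inbox"},
    "de-DE": {"inbox.title": "Posteingang"},
}

# Expected type for every configuration key (an assumption for this sketch).
CONFIG_SCHEMA = {"date_format": str, "results_per_page": int, "show_horoscopes": bool}


def validate_config(config):
    """Return a list of problems found; an empty list means the config is OK."""
    problems = []
    for locale, settings in config.items():
        for key, expected_type in CONFIG_SCHEMA.items():
            if key not in settings:
                problems.append(f"{locale}: missing {key}")
            # Exact type check, so that True is not accepted where an int is expected.
            elif type(settings[key]) is not expected_type:
                problems.append(f"{locale}: {key} should be {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    print(validate_config(LOCALE_CONFIG))
```

A check like this can run before every release, so a bad value gets caught by a script instead of by your users.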
- Just use UTF-8
I’m not an encoding expert, but when every locale uses one encoding and that encoding can handle almost all languages, it will make your life much easier. Tell everyone to use UTF-8 and you won’t be banging your head against the wall every time you see little boxes with question marks in them.
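In practice, "just use UTF-8" mostly means naming the encoding explicitly wherever text is read or written, instead of trusting a platform default. A small sketch, where the helper names `write_strings` and `read_strings` are made up for this example:

```python
import os
import tempfile


def write_strings(path, text):
    # Name the encoding explicitly; the platform default may not be UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_strings(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    sample = "Grüße, Привет, 你好"
    path = os.path.join(tempfile.mkdtemp(), "strings.txt")
    write_strings(path, sample)
    print(read_strings(path) == sample)

    # Decoding the same bytes with the wrong encoding produces garbage
    # rather than an error, which is why these bugs are easy to miss.
    with open(path, "rb") as f:
        print(f.read().decode("latin-1"))
```

The second print shows the kind of mangled text you get when one part of the pipeline assumes a different encoding.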
- Watch out for variable substitution
When dealing with sentences like “You have X new messages”, many languages need different translations depending on the number of messages. 0, 1, 2-3, 4-7 and 8-N may all have different translations. Try handling that in an if-else block times 10 languages. It doesn’t scale.
You can sometimes skip this type of challenge by setting things up like this: “Number of new messages: X”. Sometimes this is OK, sometimes it’s not. There are a lot of different ways to handle it, which go beyond the scope of this post.
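Just to give a flavor of one such approach: instead of if-else blocks scattered through the app, each language gets a small function that maps a number to a plural category, and the translations are keyed by category. Everything below is a simplified sketch. The function and table names are invented for this post, the rules only cover whole numbers in English and Russian, and the Russian strings are illustrative rather than professionally reviewed.

```python
def plural_category_en(n):
    # English: singular for exactly 1, plural for everything else.
    return "one" if n == 1 else "other"


def plural_category_ru(n):
    # Russian (whole numbers only): the form depends on the last digits.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Translators supply one string per category, not one per number.
MESSAGES = {
    "en": {
        "one": "You have {n} new message",
        "other": "You have {n} new messages",
    },
    "ru": {
        "one": "У вас {n} новое сообщение",
        "few": "У вас {n} новых сообщения",
        "many": "У вас {n} новых сообщений",
    },
}


def new_messages_text(locale, n):
    """Pick the translation that matches the plural category of n."""
    category = PLURAL_RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


if __name__ == "__main__":
    for count in (0, 1, 3, 11, 21):
        print(new_messages_text("en", count), "|", new_messages_text("ru", count))
```

Adding a language then means adding one rule function and one set of strings, and the calling code stays the same.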