Software internationalization pitfalls
OR
A note from translator to developer

Dear developer,

It seems like we are a lot alike:

  • You speak one language, but write in another; so do I.
  • Your work pivots around communication; so does mine.
  • You are addressing the world, “Hello World” 😊; so am I.
  • Your everyday is full of codes and tools and platforms and systems; so is mine.

And it also seems that, more and more often, you and I work on the same projects, with me coming in after you. When you finish the software internationalization, I come in for the localization. This is why I decided to write this letter: to tell you that I strive for your work to be world-accessible, world-understood and ultimately world-used, but I need your help. I need you to make your work world-ready!

While localizing your software, I have really delved into the coding or scripting conventions of your people, as well as your particular style and habits, and I would expect that in some way or another, my questions on what “%s” in about 127 strings of code stands for have reached you.

So, here are some ideas on how we can work together on creating and releasing amazing software for the world in each and every person’s native language.

The issues with string concatenation

If you concatenate strings, I am left in the dark. Using your coding tricks to fill in this sentence

“Are you sure you want to delete this ______?”

with whatever value you need each time, be it ‘file’, ‘folder’, ‘image’ etc., may save time during scripting, but it won’t while localizing. English is only mildly synthetic (it sits at the low end of the inflection spectrum), which makes this approach acceptable there. This is not the case with highly inflected and agglutinative languages, like Greek and Turkish. In Greek, articles and adjectives must agree with the noun in gender, number and grammatical case; in Turkish, suffixes attached to the word change with its role in the sentence. The words around your variable therefore depend on the value that fills it.

So, I’d rather you keep sentences in a single string! But if you must use variables, interpolate.
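
To show what I mean, here is a minimal sketch in Python. The catalog layout, the key names and the `confirm_delete` helper are all made up for illustration, and the Greek lines are sample translations. The point is that each object type gets its own complete sentence, so I can pick the right article for each noun, while the only interpolated value is the name itself.

```python
# Hypothetical message catalog: one complete sentence per object type,
# so each sentence can carry the correct article, gender and case.
MESSAGES = {
    "en": {
        "confirm_delete_file": "Are you sure you want to delete the file {name}?",
        "confirm_delete_folder": "Are you sure you want to delete the folder {name}?",
    },
    "el": {
        # "το αρχείο" is neuter, "τον φάκελο" is masculine: the article differs.
        "confirm_delete_file": "Είστε βέβαιοι ότι θέλετε να διαγράψετε το αρχείο {name};",
        "confirm_delete_folder": "Είστε βέβαιοι ότι θέλετε να διαγράψετε τον φάκελο {name};",
    },
}


def confirm_delete(locale: str, kind: str, name: str) -> str:
    """Look up the whole sentence for this object type, then interpolate the name."""
    template = MESSAGES[locale][f"confirm_delete_{kind}"]
    return template.format(name=name)


print(confirm_delete("en", "file", "report.txt"))
print(confirm_delete("el", "folder", "Photos"))
```

With concatenation, the Greek article would have had to be guessed at run time; with one string per sentence, it is simply part of the translation.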

Guessing game with variables

When you use the same variable to call upon different values, I feel like I am looking for a needle in a haystack, trying to guess what it might be every time. When the localizable content is exported to a resource file, it comes from all over the place and is not always in order, so the surrounding strings are not reliable context for such clarifications. When you resort to using variables, please name them uniquely and share the list of names and meanings with me. Comments come in handy too, as most of the time we work in a use-what-you-have, nothing-goes-to-waste mode. And never, I mean never, place button text in a variable string; unless it shows up in my crystal ball, I cannot know what will end up there at run time.

Out of this world communication

When you compound variables in strings, I am secretly laughing while trying to craft messages to aliens with your idea of a sentence; here’s a peek:

“If you connect %s with %s at %d:%d on %s, %s %d, %d, %s will crash.” can stand for

“If you connect 912 Maritima with Earth at 23:48 on Tuesday, December 24, 2089, Earth will crash.”, and I add “Are you sure you want to continue?” just for the fun of it. Thus, I plead with you to phrase things more in the language of humans, so that meaning is still possible in and for our world.
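
One way to make such a string readable for me is to use named placeholders and a short comment. The sketch below is only an illustration in Python: the placeholder names, the comment and the reordered variant are my own assumptions, not taken from any real product.

```python
# Translator note (hypothetical): {source} and {target} are planet names;
# {time} and {date} arrive already formatted for the user's locale.
COLLISION_WARNING = (
    "If you connect {source} with {target} at {time} on {date}, "
    "{target} will crash."
)

# Named placeholders also let a translation reorder the values freely.
# This reordered English variant stands in for a language with different word order.
COLLISION_WARNING_REORDERED = (
    "On {date} at {time}, {target} will crash if you connect it with {source}."
)


def render(template: str, **values: str) -> str:
    """Fill a template by name, so the order of the values no longer matters."""
    return template.format(**values)


values = {
    "source": "912 Maritima",
    "target": "Earth",
    "time": "23:48",
    "date": "Tuesday, December 24, 2089",
}
print(render(COLLISION_WARNING, **values))
print(render(COLLISION_WARNING_REORDERED, **values))
```

Nine anonymous `%s` and `%d` markers become four names I can actually read, and the date and time are formatted once, by code that knows the locale, instead of being assembled from fragments inside my sentence.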

Size matters

Size matters (no pun intended): the more room you allow, the better. Where there is no physical restriction, don’t be thrifty with your string buffer size. Buffer overruns are a notorious source of bugs even in non-localized software, and localization tends to expose more of them, because translated strings are often longer than the originals. So, here are some suggestions to be proactive at the design level by implementing software internationalization best practices: a. consider using dynamic and responsive design, b. allow as much space as possible for text expansion, c. if there have to be character limitations, tell me what the maximum allowed number of characters is and consult with me for the best approach.
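
If limits are unavoidable, a small automated check can flag overflows before they reach the screen. This is a rough sketch in Python under my own assumptions: the keys, the limits and the `find_overflows` helper are hypothetical, and it counts characters rather than rendered width, which is only an approximation.

```python
# Hypothetical per-key character limits supplied by the UI design.
LIMITS = {"btn_save": 10, "btn_settings": 10}

# Sample German translations; "Einstellungen" is longer than "Settings".
GERMAN = {"btn_save": "Speichern", "btn_settings": "Einstellungen"}


def find_overflows(translations: dict[str, str], limits: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (key, length, limit) for every translation longer than its limit."""
    return [
        (key, len(text), limits[key])
        for key, text in translations.items()
        if key in limits and len(text) > limits[key]
    ]


for key, length, limit in find_overflows(GERMAN, LIMITS):
    print(f"{key}: {length} characters, limit is {limit}")
```

A report like this tells both of us early that a label needs more room or a shorter wording.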

Not everyone gets your humor

When you are ‘in’, I am ‘out’. I get it that you want to make error messages fun or that you feel like using colloquial language to boost your users’ engagement. But humor is culture-specific, definitely not international at all, so be prepared for your inspiration to be neutralized – literally and figuratively.

Gaming is one thing, and content in that case is better handled through transcreation or copywriting approaches. But while “you are nailing it” when you are “downloading the upload loader” before you “bake your config in the oven” and “clean the naughty bits” to add that “wow factor”, I really feel like ‘burning it’ to fire up inspiration.

All joking aside, try to avoid elements, be it in design or phrasing, that are bound by cultural perspective, e.g. jargon or slang, religious, gender, ethnic and political references, as well as culture-specific imagery and colors, as these are not always transferable to other cultures and locales and thus conflict with the concept of software internationalization.

Funny fact shared by Microsoft: In earlier UIs a U.S. rural mailbox was used to indicate mail. This was acceptable for use in the United States because many people had grown up using these types of mailboxes. However, when the image was introduced in Europe, most people wanted to know what a breadbox sitting on a pole had to do with mail.

And if you’ve made it this far and I have your attention, I’ll run you through some more aspects of software that are language-bound; I’ll try to be brief but won’t cut corners.

Here are some further considerations when writing world-ready software with software internationalization best practices in mind:

Plurals across languages

Plurals are not handled in the same way across languages. If you read my remarks on language typology above, you’ve probably guessed already that tacking a placeholder or an “s” onto the end of a word won’t work: some languages have no plural form at all, while others have three or more forms chosen by the number itself.
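
Here is a small sketch of the idea in Python. The rules are written by hand for whole numbers only, and the `FORMS` table and `files_message` helper are hypothetical; a real project would rather rely on an established plural-rule source such as gettext or CLDR-based libraries than on hand-written functions like these.

```python
def plural_category_en(n: int) -> str:
    """English has two forms: singular for exactly 1, plural otherwise."""
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    """Russian picks among three forms based on the last digits (integers only)."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical catalog: one full string per plural category.
FORMS = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def files_message(locale: str, n: int) -> str:
    category = RULES[locale](n)
    return FORMS[locale][category].format(n=n)


for n in (1, 3, 11, 21):
    print(files_message("en", n), "|", files_message("ru", n))
```

Note that 21 takes the same Russian form as 1, while 11 does not; no amount of appending a suffix to an English word can express that.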

Sorting rules rule!

Not everything is as straightforward as alphabetical order. There are languages that sort accented vowels first or last, languages rendered by non-Latin scripts that base their sorting on the stroke count or their phonetic representation, and there is also the Turkish-I problem. The thing is that sorting is not something users notice, unless it’s not there or it’s not correct and they cannot find what they are looking for. So, this would be a proactive step for boosting the customer experience.
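
A quick Python illustration of why the default sort is not enough. The `rough_key` function is my own crude, locale-blind stand-in and not real collation: German treats “ä” like “a”, while Swedish sorts it after “z”, so no single key can be right for every locale. Proper sorting needs locale-aware collation data, for example from an ICU-based library.

```python
import unicodedata

words = ["zebra", "Äpfel", "apple", "étude"]

# The default sort compares code points, so accented letters land after "z".
naive = sorted(words)


def rough_key(text: str) -> str:
    """Crude fallback: strip accents and ignore case. Not locale-aware."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


rough = sorted(words, key=rough_key)

print(naive)
print(rough)
```

The first list hides “Äpfel” and “étude” at the very end, which is exactly where a user will not look for them.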

Date/Number internationalization

Calendar, date/time, number and address formats depend on the locale, the religion and world economics.

    • Besides the Gregorian calendar used in most countries of the West, there are also the Japanese, the Hijri, the Taiwan and the Hebrew lunar calendars; you’ve probably heard about the Chinese New Year, which is nowhere close to January 1st. Even within the same calendar, the first day of the week differs between countries: in parts of the Arabic-speaking world, for example, Friday and Saturday are non-working days, so Sunday is the first day of the working week.
    • When it comes to numbers, the thousands and decimal separators are the most common issue, but you should also keep in mind how negatives are written, the shape of the digits and the way digits are grouped.
    • Talking about address fields, don’t make all fields obligatory, and be prepared to handle flexibly any additional data that you might not usually expect to see in an address form; for example, not all countries have states, and there are cases where the address is accompanied by a map.
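
To make the number and date points concrete, here is a hand-rolled Python sketch. The `FORMATS` table and both helpers are hypothetical and cover just two locales; real software should take these conventions from locale data (CLDR, for instance) rather than from a table like mine, which ignores digit shapes, grouping sizes and non-Gregorian calendars.

```python
from datetime import date

# Illustrative conventions for two locales only.
FORMATS = {
    "en-US": {"group": ",", "decimal": ".", "short_date": "%m/%d/%Y"},
    "de-DE": {"group": ".", "decimal": ",", "short_date": "%d.%m.%Y"},
}


def format_number(value: float, locale: str, decimals: int = 2) -> str:
    """Format with the locale's grouping and decimal separators."""
    fmt = FORMATS[locale]
    # Python's format spec always emits "," for grouping and "." for decimals.
    raw = f"{value:,.{decimals}f}"
    # Swap through a placeholder so the two replacements do not collide.
    return (
        raw.replace(",", "\x00")
        .replace(".", fmt["decimal"])
        .replace("\x00", fmt["group"])
    )


def format_short_date(day: date, locale: str) -> str:
    """Format a date with the locale's numeric day/month/year order."""
    return day.strftime(FORMATS[locale]["short_date"])


for locale in FORMATS:
    print(locale, format_number(1234567.891, locale), format_short_date(date(2089, 12, 24), locale))
```

The same value reads as 1,234,567.89 in one locale and 1.234.567,89 in the other, which is why a number should never be baked into a translatable string as plain text.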

Text rendering

Text rendering, that is the process of converting a string to readable text for the user, may be straightforward for simple scripts, but can significantly impact the application design and functionality when not handled correctly for more complex cases.

      • Wrong capitalization can change the meaning of words, e.g. if you capitalize the word for “Wednesday” in Russian, it changes the meaning to “environment”. Small caps do not work well in Greek, because Greek text written entirely in capitals drops its accents. There are languages where there is no one-to-one matching between uppercase and lowercase. There are also writing systems that do not make the distinction at all, e.g. Arabic, Hebrew, Chinese and Japanese.
      • When aligning page content, do mind and prepare for languages that are written right-to-left or top-to-bottom. However, do also exempt content that would create issues if mirrored, e.g. file names and paths, URLs, URIs, server and domain names etc.
      • Not all languages and writing scripts break at a space, tab or hyphen.
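
The capitalization point is easy to check in Python, whose built-in case functions follow the Unicode default mappings and are not locale-aware. The sample words below are my own; the snippet only demonstrates that uppercase and lowercase do not always map one-to-one.

```python
word = "straße"

# German "ß" uppercases to two letters, so the round trip loses information.
upper = word.upper()
round_trip = upper.lower()

# For caseless comparison, casefold() is the safer tool.
same_when_folded = word.casefold() == "STRASSE".casefold()

# The default mapping turns "I" into dotted "i"; Turkish expects dotless "ı".
default_lower_i = "I".lower()

# Turkish capital dotted "İ" lowercases to "i" plus a combining dot: two code points.
dotted_lower_length = len("İ".lower())

print(upper, round_trip, same_when_folded, default_lower_i, dotted_lower_length)
```

So a feature as innocent as “make this label all caps” or “compare these names ignoring case” already needs to know which language it is dealing with.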

My last two cents

And here we are, with you having had a taste of the world’s languages and their idiosyncrasies. I have faith that you will do your best to write a world-ready codebase. So, when you have completed the software internationalization phase, you only have to extract the localizable elements to create resource bundles for me to go in and work my magic. Two tips here that will save both of us time and money:

  1. Group localizable strings separately from non-localizable ones.
  2. Clean up your resources to only include strings that are used.
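
The second tip can be partly automated. The Python sketch below assumes, purely for illustration, that strings are looked up through a helper called `t("key")`; the regular expression, the helper name and the sample data are all hypothetical, and keys built dynamically at run time would escape such a simple scan.

```python
import re

# Hypothetical resource bundle and a snippet of source code that uses it.
RESOURCES = {
    "greeting": "Hello, world!",
    "btn_save": "Save",
    "old_banner": "Try our new beta!",
}

SOURCE_CODE = '''
print(t("greeting"))
label = t( "btn_save" )
'''


def used_keys(source_code: str) -> set[str]:
    """Collect keys passed as string literals to the assumed t("...") helper."""
    return set(re.findall(r'\bt\(\s*"([^"]+)"\s*\)', source_code))


def unused_keys(resources: dict[str, str], source_code: str) -> list[str]:
    """Keys present in the bundle but never referenced in the code."""
    return sorted(set(resources) - used_keys(source_code))


print(unused_keys(RESOURCES, SOURCE_CODE))
```

Every key on that list is a string I would otherwise translate, and you would pay for, without anyone ever seeing it.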

With you on the software internationalization front and me on the localization front, we will conquer the world with software that is easier to translate into many languages, addresses different locales, and is error-free, user-friendly and user-relevant.

What do you think? Are you with me?

The translator