Unicode—It’s What Makes The World Go Round

Anglo-Centric Offshore Developers—An Oxymoron?

The Problem

Many companies needing software solutions turn to organizations in Asian countries because American software developers are usually paid much more.  Unfortunately, I have found that in many cases, you get what you pay for.  For example, I have been very surprised by how Anglo-centric some offshore and H-1B developers are, especially when English is not their first language.  I have had to explain in detail how and why Unicode should be used, even though Unicode, a ubiquitous standard for encoding text in the languages and scripts of the world, has been around for decades.

UTF-16 == UTF-8

One developer working on the Windows version of the Mac OS X product I was working on was adamant that it was okay to declare an XML document’s encoding as UTF-16 while saving the file as UTF-8.  The reason it was “okay”?  His library on the Windows platform was able to detect his mistake and work around it, so he claimed that Apple’s framework, which I relied on to open his data, was buggy for not accepting his error.  Standards are called standards for a reason; if they weren’t, they’d be called suggestions.  Thankfully, he finally figured out how to use Microsoft’s .NET libraries to convert his text to match the encoding he had declared.
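To make the mismatch concrete, here is a minimal Python sketch (Python is my choice for illustration; the original products used .NET and Apple's frameworks).  The sample document and the helper name honors_declaration are hypothetical.  It only shows that bytes written as UTF-8 do not read back correctly when a parser trusts a UTF-16 declaration.

```python
# Sample document, assumed for illustration; its declaration promises UTF-16.
text = '<?xml version="1.0" encoding="UTF-16"?><name>Zoë</name>'

as_utf8 = text.encode("utf-8")    # what the mislabeled file actually contained
as_utf16 = text.encode("utf-16")  # what the declaration promised (BOM included)


def honors_declaration(data: bytes, declared: str) -> bool:
    """Return True if the bytes decode under the declared encoding
    and the result still begins with an XML declaration."""
    try:
        return data.decode(declared).startswith("<?xml")
    except UnicodeDecodeError:
        return False


print(honors_declaration(as_utf16, "utf-16"))  # True
print(honors_declaration(as_utf8, "utf-16"))   # False
print(honors_declaration(as_utf8, "utf-8"))    # True
```

A lenient library can sniff the bytes and recover, as his did, but a parser that takes the declaration at its word has every right to reject the file.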

ISO 639-2 Uniquely Identifies Localizations

One developer working on the backend REST services I relied upon for my iOS app insisted she was only going to accept, store, and transmit localization information as ISO 639-2 codes, and even then, she insisted on the bibliographic form rather than the terminology form.  We were developing software to be used by developers, not producing a book for librarians.

I demonstrated two well-known examples of why this was short-sighted:  first, Chinese comes in two different script systems, Simplified and Traditional, and second, Serbian is regularly written in both Cyrillic and Latin scripts.  Even with such evidence, she insisted there was no ambiguity and no need to use RFC 5646 instead, a standard that iOS, macOS, many web browsers, and many modern operating systems use.  While it’s true the applications that relied upon our REST services were limited to a few localizations, I’ve seen plenty of cases where similar short-sighted decisions eventually needed fixing.
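The ambiguity is easy to show.  The following Python sketch uses a small hand-written lookup table (an assumption for illustration, not a complete registry) and a hypothetical helper, to_iso_639_2b, that reduces an RFC 5646 tag to its ISO 639-2 bibliographic code.  The script subtag, which is exactly what distinguishes the localizations, is lost.

```python
# Hand-written table for illustration only: primary language subtag
# mapped to its ISO 639-2 bibliographic code.
ISO_639_2B = {"zh": "chi", "sr": "srp"}


def to_iso_639_2b(tag: str) -> str:
    """Reduce an RFC 5646 tag to an ISO 639-2/B code,
    discarding the script and any other subtags."""
    primary = tag.split("-")[0].lower()
    return ISO_639_2B[primary]


tags = ["zh-Hans", "zh-Hant", "sr-Cyrl", "sr-Latn"]
collapsed = {tag: to_iso_639_2b(tag) for tag in tags}

for tag, code in collapsed.items():
    print(f"{tag} -> {code}")
```

Four distinct localizations go in and only two codes come out, so a service that stores nothing but the ISO 639-2 code cannot say which script the client asked for.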

ASCII-Randomed UUIDs

One developer working on the Windows version of the macOS product I was working on was tasked with generating a 16-byte random number.  Like me, and following our cross-platform design documents, he generated a UUID (aka GUID).  Rather than taking the 16-byte binary value of the UUID as I did, he took its 32-character ASCII hexadecimal representation without hyphens, discarded the last half, and used the first 16 characters as his random number.  He justified his choice by saying he couldn’t figure out how to use his .NET libraries to get the binary form and that the number was still sufficiently random.  It wasn’t equivalent, though:  each hexadecimal character carries only 4 bits, so his 16 bytes held at most 64 bits of randomness rather than the UUID’s full 128-bit value.
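The difference between the two approaches can be sketched in a few lines of Python, using the standard uuid module as a stand-in for whatever each platform actually used.  The variable names are mine.

```python
import uuid

value = uuid.uuid4()  # a random (version 4) UUID

# What the design called for: the 16-byte binary form.
raw = value.bytes

# What was done instead: the first half of the 32-character hex string,
# stored as ASCII. Still 16 bytes, but every byte is one of only 16 symbols.
truncated = value.hex[:16].encode("ascii")

# Each hex digit encodes 4 bits, so 16 digits carry at most 64 bits.
max_bits_truncated = len(truncated) * 4

print(len(raw), len(truncated), max_bits_truncated)  # 16 16 64
```

For a version 4 UUID the shortfall is slightly worse than 64 bits, because the thirteenth hex digit is the fixed version number and falls inside the half that was kept.  For what it is worth, .NET does expose the binary form through Guid.ToByteArray.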

All English Words, Regardless of Context, Translate The Same

American-born software developers who have graduated from an accredited institution of higher learning within the past decade should also have some familiarity with software globalization.  However, I encountered one such person, an Apple employee of a handful of years, who insisted it was right and necessary to hardcode all UI-facing strings in a fairly new application and then use a separate command-line tool, genstrings, to extract them into a text file that would then be used for translation.

Anyone with any linguistic skill (passing English is a basic requirement for a college education in the U.S.) knows that practically every language has words that mean more than one thing, and when each usage is translated into another language, there is no guarantee of getting the same word.  Even in phrases that use numbers to count things, the order and types of words used may vary significantly.  For example, in English, we say “zero disks were formatted,” “one disk was formatted,” and “two or more disks were formatted.”  In some languages, these differences are even more significant; some have separate “small-many” and “large-many” forms.  While I cannot claim full responsibility, Apple added its string dictionary feature to an earlier version of macOS after I submitted a bug report requesting it; I had found it difficult to manage numeric-valued strings without changing source code.  Application software should not have to know the intricacies of the languages being shown.  That’s why frameworks and resource files are used.
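As a rough illustration of why the plural choice belongs in resources rather than in application logic, here is a Python sketch.  The category names follow the Unicode CLDR convention (“one,” “few,” “many,” “other”), with Russian standing in for a language that has “small-many” and “large-many” forms.  The message tables, the Russian wording, and the function names are my own assumptions for illustration; this is not how Apple’s string dictionaries are implemented.

```python
def plural_category_en(n: int) -> str:
    """English whole numbers: 'one' for 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    """Russian whole numbers: 'one', 'few' (small-many), or 'many'."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


PLURAL_RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Illustrative resource tables; in a real app these live in resource files.
MESSAGES = {
    "en": {
        "one": "{n} disk was formatted",
        "other": "{n} disks were formatted",
    },
    "ru": {
        "one": "Отформатирован {n} диск",
        "few": "Отформатировано {n} диска",
        "many": "Отформатировано {n} дисков",
    },
}


def formatted_disks(locale: str, n: int) -> str:
    """Pick the message for the count's plural category, then fill it in."""
    category = PLURAL_RULES[locale](n)
    return MESSAGES[locale][category].format(n=n)


for count in (0, 1, 3, 11, 21):
    print(formatted_disks("en", count), "|", formatted_disks("ru", count))
```

The calling code passes only a locale and a count.  Which form applies to 3, 11, or 21 is decided by the rules and the resource table, so adding a language never touches the call site.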

I found it surprising that this corner of Apple lacked the understanding that everyone outside Apple had:  genstrings is meant to help a developer start localizing an app that was not localized from the beginning; it is not a tool to be used all the time.  While many identical UI-facing strings may refer to the same thing, there is no guarantee, and as an application grows, more non-equivalent strings may be introduced.  Besides, it’s just not smart to hardcode strings in an application, even in one’s native language:  every wording change requires recompilation, and hardcoded text may introduce misspellings.  I personally prefer using a non-UI-usable key for localizable strings in my source code for the very purpose of ensuring a string gets localized.

One case of non-equivalent words I encountered a few years ago in another application was the word “State.”  On one screen, it was used for the condition of a button, having values of on or off, and in another place, it meant a governmental region like California.  While many applications may not have such collisions, it’s best to do things correctly from the start and expect that problem rather than try to fix things when—never if—they happen.
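The “State” collision can be sketched with keyed lookups.  In this Python example the keys, the table layout, the German translations, and the localized helper are all hypothetical; the point is only that two strings that happen to match in English get separate keys.

```python
# Illustrative string tables keyed by meaning, not by English text.
STRINGS = {
    "en": {
        "button.state.label": "State",
        "address.state.label": "State",
    },
    "de": {
        "button.state.label": "Zustand",       # condition of a control
        "address.state.label": "Bundesstaat",  # governmental region
    },
}


def localized(key: str, locale: str) -> str:
    """Look up a string by key. A missing entry returns the key itself,
    which is unusable as UI text and therefore easy to spot in testing."""
    return STRINGS.get(locale, {}).get(key, key)


print(localized("button.state.label", "en"))   # State
print(localized("address.state.label", "en"))  # State
print(localized("button.state.label", "de"))   # Zustand
print(localized("address.state.label", "de"))  # Bundesstaat
print(localized("address.zip.label", "de"))    # address.zip.label
```

Had the English word itself been the lookup key, both screens would have been forced to share one translation.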

Dates, Currency, Time Zones, etc.

I could go on and on about how I’ve had to fix issues with formatting and conversion of dates, currency, time zones (standard, daylight saving time, summer time, and half-hour differences), etc.  Suffice it to say that too often the people I’ve worked with know only as much as they would have learned in a two-week coding boot camp.

The Solution

Software globalization is here to stay, and the tools to make it easier continually evolve.  Nevertheless, anyone developing an app, regardless of whether it may be used by people in cultures other than their own, needs to be aware of at least the most basic pitfalls that so many others have already worked through.  They should take the time to learn their libraries (or make their own) and do things the right way, not the expedient way.

Failures in software are quite often depicted with the metaphor of a fire needing to be put out.  American statesman Benjamin Franklin once wrote about fire prevention, “An ounce of prevention is worth a pound of cure.”  If companies aren’t willing to put the time and money into preventing software “fires,” then there will always be a need for software “firefighters.”

