Integrating AngularJS and i18n-js, Switching Locales at Runtime

AngularJS has really nice built-in support for number, currency and date formatting. Unfortunately it lacks two significant features:

  • Performing string lookup / interpolation via externalized message bundles, and
  • Allowing the application to change the current locale at runtime.

A colleague of mine recently recommended the i18n-js library for internationalization. i18n-js is a lightweight library (~700 lines) which implements string lookup and formatting and is a suitable replacement for Angular's localization libraries.

I've written a proof-of-concept showing how the two can be integrated and made the source available on GitHub. Below is an explanation of some of the key files. Some of the gists have been trimmed slightly to ease formatting on the page, so refer to GitHub for the full code.

The adapter, I18nAdapter, wraps the core i18n-js APIs and makes it easy to load message bundles for a given locale. I wanted this functionality to stand on its own, so note that this file has no dependency on AngularJS itself.
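As a rough illustration of the idea (not the code from the repository), an adapter along these lines could be sketched as follows. The names createI18nAdapter, addBundle and setLocale are hypothetical, and the small fakeI18n object only mimics the translations, locale and t() members of i18n-js so that the sketch runs on its own. Fetching bundles over HTTP is left out.

```javascript
// Sketch of a framework-agnostic adapter around an i18n-js style object.
// createI18nAdapter, addBundle and setLocale are hypothetical names.
function createI18nAdapter(i18n) {
  return {
    // Register an already-loaded message bundle for a locale.
    addBundle(locale, bundle) {
      i18n.translations[locale] = bundle;
    },
    // Switch the current locale; refuse if its bundle was never registered.
    setLocale(locale) {
      if (!i18n.translations[locale]) {
        throw new Error('No bundle loaded for locale: ' + locale);
      }
      i18n.locale = locale;
    },
    // Look up a string in the current locale.
    translate(key, options) {
      return i18n.t(key, options);
    },
  };
}

// Minimal stand-in for the library (an assumption, not the real i18n-js).
const fakeI18n = {
  locale: 'en',
  translations: {},
  t(key) {
    const bundle = this.translations[this.locale] || {};
    return key in bundle ? bundle[key] : '[missing "' + key + '"]';
  },
};

const adapter = createI18nAdapter(fakeI18n);
adapter.addBundle('en', { greeting: 'Hello' });
adapter.addBundle('da', { greeting: 'Hej' });
adapter.setLocale('da');
const greeting = adapter.translate('greeting');
```

Because the adapter receives the library object as a parameter, it can be exercised without AngularJS or a browser.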

The bootstrap code goes in the root HTML page. It loads the message bundles for the default locale and then bootstraps the Angular application. This ensures all bundles have finished loading before the Angular application initializes, which matters if some application modules require the translations or formatters to be ready during initialization.

The filter code exposes the i18n-js functionality via a few straightforward AngularJS filters. If other application components require access to the I18nAdapter itself, it would be easy to add a service or factory here to make the adapter injectable.
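For readers without the repository at hand, here is a hedged sketch of what such a filter might look like. makeTranslateFilter is a hypothetical name, and stubI18n is a stand-in that imitates only the t() lookup and double-brace interpolation, so the example runs without i18n-js or AngularJS.

```javascript
// Hypothetical factory: returns a filter function bound to an i18n-js style object.
// In an AngularJS app it could be registered along the lines of:
//   angular.module('app').filter('t', function () { return makeTranslateFilter(I18n); });
function makeTranslateFilter(i18n) {
  return function translateFilter(key, options) {
    return i18n.t(key, options);
  };
}

// Stand-in for the library (an assumption for illustration only).
const stubI18n = {
  locale: 'en',
  translations: { en: { welcome: 'Welcome, {{name}}!' } },
  t(key, options = {}) {
    const template = this.translations[this.locale][key];
    if (template === undefined) return '[missing "' + key + '"]';
    // Replace {{placeholders}} that have a matching option; leave others as-is.
    return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
      name in options ? String(options[name]) : match);
  },
};

const t = makeTranslateFilter(stubI18n);
const message = t('welcome', { name: 'Ada' });
```

In a template, a filter registered under the name t would then be applied with the usual pipe syntax, with the key as the expression and the options object as the filter argument.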

If you clone the GitHub repo and run it under a web server, you can see a very simple AngularJS app making use of this code. Enjoy!

Localization Tips, Part 4: Localization Resources on the Web

This is the fourth and final entry in a multi-part series about localization.

Below is a collection of links to various localization resources out on the web.

Getting Started

Download Unicode Data

Unicode Common Locale Data Repository

  1. Download "core.zip" from the "Data" column
  2. Uncompress and open the "common/main" folder
  3. Open the language or locale file of your choice. There is a huge amount of data including:
    • Names of world languages, scripts and countries
    • Exemplar characters
    • Calendar system, date/time formats, time zone names
    • Number systems and names
    • Currencies

Explore Unicode Data

Locale Explorer - an online interface for viewing much of the data above.

Interactive Collation Tool

Interactive Demo (Danish) - interactive tool that allows arbitrary strings of text to be entered and sorted in various ways.

jQuery Globalize and AngularJS Integration in Under 100 Lines

The other day I threw together a quick proof-of-concept for integrating AngularJS with the jQuery Globalize Plugin. The first part is a simple utility Angular "service" which loads the appropriate JSON message bundle.

It is bootstrapped from run() in app.js.

Source code is available on GitHub. Just fire up launch.html to view the sample app initializing with different locales.

I'm still working on a more thorough evaluation of jQuery Globalize itself. Assuming it meets the criteria, a few more improvements come to mind:

  1. Currency filter
  2. Date/time filter
  3. Switching locales at runtime

Hope you found this one interesting!

Localization Tips, Part 3: Sorting and Collating

This is the third entry in a multi-part series about localization.

Sorting and collating data is something everyone does frequently and rarely spends much time thinking about. If you've ever seen a physical Rolodex, you've seen the essence of sorting and collating: putting a list of contacts in a particular order, collated (or grouped) in a manner which everyone agrees upon. For example:

A
Aaron Burr
Alexander Hamilton
B
Benjamin Franklin
C
Carter Braxton
Charles Pickney
D
Daniel Carroll
S
Samuel Adams

Unless you happen to be Danish:

A
Alexander Hamilton
B
Benjamin Franklin
C
Carter Braxton
Charles Pickney
D
Daniel Carroll
S
Samuel Adams
Å
Aaron Burr

Or Czech:

A
Aaron Burr
Alexander Hamilton
B
Benjamin Franklin
C
Carter Braxton
D
Daniel Carroll
CH
Charles Pickney
S
Samuel Adams

So what's going on here? There are a few different considerations on how these characters get sorted.

Exemplar Characters

Each language has a set of "exemplar characters" which contains the commonly used letters of a given modern form of a language. More simply, you can think of them as the "alphabet" or the "native characters" for a given language. Here are a few examples of these characters:

English: a b c d e f g h i j k l m n o p q r s t u v w x y z
Danish: a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å
Czech: a á b c č d ď e é ě f g h ch i í j k l m n ň o ó p q r ř s š t ť u ú ů v w x y ý z ž

Sometimes a sequence of Latin characters will map to a particular native character. In Danish, "AA" is treated as "Å" which sorts after "Z." Sometimes an exemplar character is actually expressed by multiple characters. In Czech, for example, strings starting with "CH" sort after "H" but strings starting with just "C" sort after "B."
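In JavaScript, one way to observe these rules is the standard Intl.Collator API. The sketch below is only an illustration: sortForLocale is a hypothetical helper name, and the exact results depend on the locale data shipped with the runtime.

```javascript
// Return a sorted copy of `names` using the collation rules of `locale`.
function sortForLocale(names, locale) {
  const collator = new Intl.Collator(locale);
  return [...names].sort(collator.compare);
}

const names = ['Daniel Carroll', 'Charles Pickney', 'Carter Braxton'];

const english = sortForLocale(names, 'en');
const czech = sortForLocale(names, 'cs');

console.log(english);
console.log(czech);
```

On a runtime with full locale data, the Czech call should place "Charles Pickney" after "Daniel Carroll", because "ch" is treated as a single letter that follows "h". A runtime without Czech data would fall back to its default ordering.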

Unicode Sorting

The Unicode collation algorithm defines the default sorting applied to the characters supported by Unicode. The order defined here is important because it serves as the basis for how other languages and locales customize collation. However, collating data based on this standard isn't particularly useful to end users who expect data sorted in a manner that is specific to their own conventions.

Non-Native Characters

Different languages define different rules for how non-native characters are sorted and collated. Frequently they appear after the native characters, in the order defined by the Unicode collation algorithm. However, the default Japanese collation places Latin characters first, then Japanese characters, then other non-native characters.

Tips

Here is a brief list of things to keep in mind when dealing with collation in your application:

  1. If your application's user interface displays data in a collated or "grouped" view, ensure it does so in a locale-aware manner. For example, if your application displays a list of contacts with a header row for each letter, the letters should be appropriate for that locale. Define that set of exemplar characters for each language in the appropriate localization files. There typically should also be an "other" section after the last exemplar character for text starting with "non-native" characters.
  2. Ensure your application sorts data according to the user's selected locale. Frequently this will involve passing the proper locale to the API's "sort" function. Using the Unicode collation might seem like a fair compromise but will lead to a result that is displeasing to most non-English speakers.
  3. If your application is multi-tier, ensure that all tiers of the application are sorting in the same manner. For instance, an n-tier application might use JavaScript, Java and MySQL. If MySQL queries are sorted in a locale-aware manner but then JavaScript or Java executes a subsequent locale-agnostic sort, the data will be ordered inconsistently.
  4. If one of the tiers doesn't provide support for locale-aware sorting, remove sorting code from that tier and rely on another tier to perform all sorting.
  5. If the application relies on caching of data, ensure that a locale-aware sort is performed after the data is retrieved from cache. This issue can manifest in many different places, including caching of JSON or XML responses over HTTP.
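The first tip can be sketched in a few lines of JavaScript. This is a simplified illustration under stated assumptions: groupByIndexLabel is a hypothetical helper, the label list stands in for what a localization file would supply, and "#" is an arbitrary choice for the "other" section. Aliases such as Danish "AA" for "Å" are not handled here.

```javascript
// Group names under locale-specific header labels, with an "other" bucket last.
function groupByIndexLabel(names, labels, locale, otherLabel = '#') {
  // Try longer labels first so that "CH" wins over "C".
  const byLength = [...labels].sort((a, b) => b.length - a.length);
  // A Map keeps insertion order: labels as given, then the "other" bucket.
  const groups = new Map(labels.map((label) => [label, []]));
  groups.set(otherLabel, []);
  for (const name of names) {
    const upper = name.toLocaleUpperCase(locale);
    const label = byLength.find((l) => upper.startsWith(l)) || otherLabel;
    groups.get(label).push(name);
  }
  return groups;
}

// A shortened, hypothetical set of Czech header labels for the sample data.
const czechLabels = ['A', 'B', 'C', 'D', 'H', 'CH', 'S'];
const contacts = ['Aaron Burr', 'Carter Braxton', 'Charles Pickney', 'Samuel Adams', '3M'];

const grouped = groupByIndexLabel(contacts, czechLabels, 'cs');
for (const [label, members] of grouped) {
  if (members.length > 0) console.log(label, members);
}
```

Each group would still need a locale-aware sort of its members, as described in the second tip.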

Localization Tips, Part 2: The Evils of Date Formatting

This is the second entry in a multi-part series about localization.

Every modern platform provides a mechanism for formatting dates. We've all dealt with people on the business side who prefer dates formatted as YYYY-mm-dd or dd MMM YY or countless other permutations. The slippery slope of date formatting is that once a date is formatted, it should be formatted correctly for each and every locale that the application supports.

Here is a sample of correct date formatting for 30 common locales:

Even after so many managed to agree on the Gregorian calendar system it's amazing to see the wide variety of date formats. Even if an application only needed to support a few locales, the developer would need to define a few date formats in each translation file and tweak as appropriate for that locale's conventions. I didn't know Romanians formatted their dates as 17.01.2013, did you? :)

While nearly every platform's date formatting APIs provide a way to explicitly set a date format, some also support the notion of "format styles." These styles define a short, medium and long style for formatting date and time for a given locale. Check out this example from the Flash Globalization APIs.

Specifying a date style of SHORT, MEDIUM or LONG is a very convenient way to ensure that dates are always formatted correctly for the current locale. Here's an example snippet of Flash code which prints out the MEDIUM version of the current date for the supported locales:

The output of this script is the same as the snippet of date formats a few paragraphs up. Pretty convenient stuff here. For added flexibility, a developer could also write code to support explicit date formats for certain locales and then fall back to a date style for other locales.
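The same fallback idea can be sketched in JavaScript rather than Flash, using the dateStyle option of the standard Intl.DateTimeFormat API. The explicitFormats table and the formatDate helper are hypothetical names, and the Romanian override is only there to show the mechanism. UTC is used so the example does not depend on the machine's time zone.

```javascript
const pad = (n) => String(n).padStart(2, '0');

// Hypothetical per-locale overrides; locales not listed fall back to a date style.
const explicitFormats = {
  // dd.MM.yyyy built from UTC fields
  ro: (d) => pad(d.getUTCDate()) + '.' + pad(d.getUTCMonth() + 1) + '.' + d.getUTCFullYear(),
};

// Format `date` for `locale`: explicit format if one is defined, else a style.
function formatDate(date, locale, style = 'medium') {
  const explicit = explicitFormats[locale];
  if (explicit) return explicit(date);
  return new Intl.DateTimeFormat(locale, { dateStyle: style, timeZone: 'UTC' }).format(date);
}

const sample = new Date(Date.UTC(2013, 0, 17));
const romanian = formatDate(sample, 'ro');
const american = formatDate(sample, 'en-US');
const german = formatDate(sample, 'de-DE', 'long');

console.log(romanian, american, german);
```

The styled output comes from the runtime's locale data, so it can vary slightly between environments. Only the explicit Romanian format is fixed by the code itself.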

So before you format another date, dig a little deeper into your platform's localization APIs to see what's available. You might be surprised what you find.